Yan Sun
High-Orders Motion Analysis Computer Vision Methods
Yan Sun School of Computer Engineering and Science Shanghai University Shanghai, China
ISBN 978-981-99-9190-7 ISBN 978-981-99-9191-4 (eBook) https://doi.org/10.1007/978-981-99-9191-4 Jointly published with Southeast University Press The print edition is not for sale in China (Mainland). Customers from China (Mainland) please order the print book from: Southeast University Press. © Southeast University Press 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publishers, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
Preface
Most research on motion analysis has not yet considered the basic nature of higher orders of motion such as acceleration. Hence, this book analyses the higher orders of motion flow, and their constituent parts are preliminarily investigated to reveal chaotic motion fields. Naturally, it is possible to extend this notion further: to detect higher orders of image motion. In this respect, this book shows how acceleration, Jerk and Snap fields can be obtained from image sequences. We believe higher-order motion can serve as a general motion descriptor; it shows the capability to differentiate different types of motion on both synthesized data and real image sequences. The derived results on test images and on heel strike detection in gait analysis illustrate the ability of higher-order motion, and provide the basis for further research and applications.

This book has six chapters. Chapter 1 gives the overview and intention of this book. Chapter 2 gives a brief introduction to optical flow, and four benchmark algorithms are compared to demonstrate the advantages and weaknesses of optical flow algorithms. The acceleration algorithms presented in Chap. 3 include experimental results on both synthetic and real-world images. In Chap. 4, optical flow is decomposed into higher orders and their constituent parts. Chapter 5 describes and evaluates the methodology of detecting heel strikes via higher-order motion. Chapter 6 concludes this book; the analysis and results of the higher-order motion descriptors show that they are ripe for further investigation, and potential future application directions are explored.

Thanks for the advice and guidance from Professor Mark S. Nixon and Professor Jonathon S. Hare. This book is funded by the National Natural Science Foundation of China (62002215) and the Shanghai Pujiang Program (20PJ1404400).

Shanghai, China
Yan Sun
Contents
1 Overview .......................................................... 1

2 Describing Motion in Computer Images' Stream: Optical Flow ........ 3
  2.1 Overview ...................................................... 3
  2.2 Optical Flow .................................................. 3
      2.2.1 Data Term ............................................... 4
      2.2.2 Prior Term .............................................. 6
      2.2.3 Learning Methods ........................................ 6
  2.3 Selected Optical Flow Algorithms .............................. 7
      2.3.1 Differential Method ..................................... 7
      2.3.2 Region-Based Method ..................................... 7
      2.3.3 Dense Optical Flow ...................................... 8
      2.3.4 DeepFlow ................................................ 8
  2.4 Preparation for Performance Quantification of Optical Flow Algorithms ... 11
      2.4.1 Synthetic Images with Explicit Motion ................... 11
      2.4.2 Flow Visualization ...................................... 11
      2.4.3 Flow Error Measurements ................................. 13
  2.5 Performance Comparison of Optical Flow Algorithms on Synthetic Images ... 14

3 Analysing Acceleration in Computer Images' Stream ................. 23
  3.1 Overview ...................................................... 23
  3.2 Estimation of Acceleration Flow ............................... 23
      3.2.1 Recovering Acceleration from Optical Flow ............... 23
      3.2.2 Approximating the Derivatives ........................... 28
      3.2.3 Analysing Acceleration Algorithm on Image Sequences ..... 29
  3.3 Estimating Acceleration Flow Via Other Flow Estimation Methods ... 31
      3.3.1 A More Practical Approach ............................... 31
      3.3.2 Evaluating Acceleration Algorithms on Synthetic Images .. 32
      3.3.3 Comparison Between Differential and Variational Acceleration ... 34
  3.4 Tangential and Radial Acceleration ............................ 35
      3.4.1 Decomposing the Resultant Acceleration .................. 35
      3.4.2 Radial and Tangential Acceleration Fields on Image Sequences ... 38
  3.5 Conclusion .................................................... 44

4 Jerk and High-Order Motion in Computer Images' Streams ............ 45
  4.1 Overview ...................................................... 45
  4.2 Jerk, Snap and Higher Order Motion in Kinematics .............. 45
  4.3 Jerk and Snap Field Estimation ................................ 47
  4.4 Applying Multi-orders Flow Fields to Synthetic and Real Images ... 47
  4.5 Conclusion .................................................... 51

5 Detecting Heel Strikes for Gait Analysis Through Higher-Order Motion Flow ... 55
  5.1 Overview ...................................................... 55
  5.2 Detecting Heel Strikes Through Radial Acceleration ............ 56
      5.2.1 Gait Analysis ........................................... 56
      5.2.2 Heel Strikes Detection for Gait Analysis ................ 57
      5.2.3 The Acceleration Pattern on Gait ........................ 58
      5.2.4 Strike Position Estimation and Verification ............. 60
  5.3 Gait Databases ................................................ 62
      5.3.1 The Large Gait Database ................................. 62
      5.3.2 CASIA Gait Database ..................................... 63
      5.3.3 The OU-ISIR Gait Database ............................... 63
  5.4 Experimental Results .......................................... 63
      5.4.1 Key Frame Detection ..................................... 64
      5.4.2 Heel Strike Position Verification ....................... 64
      5.4.3 Detection Performance ................................... 67
      5.4.4 Robustness of Heel Strike Detection Approaches .......... 72
      5.4.5 Detecting Heel Strikes Via Snap and Jerk ................ 73
  5.5 Discussion .................................................... 75
  5.6 Conclusions ................................................... 77

6 More Potential Applications Via High-Order Motion ................. 79
  6.1 Scenes Segmentation ........................................... 79
  6.2 Gait Analysis ................................................. 79

References .......................................................... 81
Abbreviations
AE       Angular Error
AEPE     Average End-Point Error
CASIA    The CASIA Gait Database
EPE      End-Point Error
GT       Ground Truth
k-NN     k-Nearest Neighbors Algorithm
OU-ISIR  The OU-ISIR Gait Database
PR       Precision-Recall
ROI      Region of Interest
SD       Standard Deviations
SIFT     Scale-Invariant Feature Transform
SOTON    The Large Gait Database
Chapter 1
Overview
An image is a snapshot in which all motion is frozen in an instant. This implies that a video involves many motions which coalesce to form the image sequence. In reality, there are many different types of motion: in the simplest sense, there are objects that move with constant velocity and some that move with acceleration; however, many objects have more complicated motions. Higher-order motion is a distinctive motion feature and deserves to be systematically investigated and introduced into computer vision as a baseline approach.

Comparing a person walking at roughly constant velocity with an athlete who is speeding up illustrates the diversity of motion. Moreover, each part of both subjects is experiencing different types of motion, especially the legs. When a person is walking, the body moves at approximately a constant velocity; one leg is stationary to support the body while the other swings forward like a pendulum, as shown in Fig. 1.1. These motions can be identified by acceleration since, once the state of an object changes, there must be acceleration. Therefore, we assume that we can find the legs of a person's body and discriminate between the supporting leg and the swinging one by extracting their acceleration features.

This book introduces algorithms for the higher orders of motion and their constituent parts in the flow fields. Synthetic and real-world test images reveal the different characters of acceleration, Jerk and Snap. The main components of this book are:

• The original Horn-Schunck (Horn et al., 1981) optical flow technique is extended to focus on acceleration. This analysis retains the elegance of Horn and Schunck's formulation with an approach that isolates only acceleration.
• The constraints within the Horn-Schunck acceleration algorithm are too stringent for application to real-world video footage, so other state-of-the-art optical flow algorithms are used as a basis for approximating acceleration with wider applicability in general video.
• Acceleration is decomposed into its constituent parts: radial and tangential acceleration.
Fig. 1.1 A walking cycle (Cunado et al., 2003)
• Radial acceleration can be used to localize the frame and position of heel strikes for gait analysis. Detecting heel strikes via radial acceleration needs only three frames to determine the event, compared with previous techniques which need the whole sequence.
• The experimental results show that the acceleration detector can increase the precision of heel strike localization significantly, especially when combined with simple classification methods such as mean shift.
• The sensitivity of higher-order motion to different imaging conditions is illustrated on a wide range of datasets, as well as on different types of distortion: visual angle, lighting condition, Gaussian noise, occlusion and low resolution.
• Compared with other heel strike detection techniques, radial acceleration is noticeably less sensitive to Gaussian noise, which would probably appear in real CCTV footage, but more sensitive to occlusion in the detection region.
• The change of acceleration, Jerk and Snap flow, and their constituent parts, are preliminarily investigated on both synthetic and real images, and the results show the potential for further application and study.
Chapter 2
Describing Motion in Computer Images’ Stream: Optical Flow
2.1 Overview

This chapter gives a review of optical flow and evaluates the performance of benchmark optical flow techniques on both synthetic and real image sequences. In the comparison, the results show that Horn-Schunck is not good at handling large motion. For Block Matching, the block size is critical to the performance: if the block size is too small it might not be able to integrate all the area, and if it is too large the window will include motion belonging to other groups (Sun, 2013). Farneback (2003) is not good at preserving motion boundaries. DeepFlow appears to outperform the other techniques in most scenarios (Weinzaepfel et al., 2013).
2.2 Optical Flow

The concept of optical flow was first described by James J. Gibson in 1950. Optical flow denotes the apparent motion between the observer and the observed object caused by relative motion (Fortun, 2015). It has been widely used in many fields of image processing such as motion estimation and video compression. For an image, optical flow is the change of brightness patterns. Figure 2.1a, b show two successive frames from the ETH Zurich Computer Vision Lab optical flow dataset, and Fig. 2.1c is the optical flow between them. Many aspects of scene motion can thus be determined by optical flow: the people and the train are highlighted by the optical flow whereas the static objects (e.g., the trees) are not.
Optical flow estimation is one of the earliest and still active research topics in Computer Vision. Many methodological concepts have been introduced and performance has improved gradually since Horn and Schunck proposed the first variational optical flow estimation algorithm; however, the basic assumption
Fig. 2.1 The optical flow between two consecutive frames: (a) frame, (b) frame, (c) optical flow field (video from the ETH Zurich Computer Vision Lab optical flow datasets)
of optical flow has not changed much (Fortun et al., 2015). Most state-of-the-art approaches estimate optical flow by optimizing the weighted sum of two terms, a data term and a prior term (Baker et al., 2011); mathematically:

I = Idata + α Iprior   (2.1)
2.2.1 Data Term

2.2.1.1 Brightness Constancy

The fundamental assumption in optical flow is Brightness Constancy (Horn et al., 1981). Assuming that the intensity at position (x, y) in frame t is I(x, y, t), the constraint is that the intensity of this point is constant between successive moments:
I(x, y, t) = I(x + Δx, y + Δy, t + Δt)
(2.2)
The change of intensity can be expanded by a Taylor series approximation:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + O(Δ²)   (2.3)
If Δt → 0,

(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0   (2.4)

After dividing by dt:

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0   (2.5)
Using u and v to denote dx/dt and dy/dt separately gives the Optical Flow Constraint equation:

(Ix, Iy) · (u, v) = −It   (2.6)
There are two unknowns, u and v, with only one constraint. Therefore, in order to solve the problem, a prior term needs to be introduced to the model. An alternative to the intensity-based data term is colour space: Zimmer et al. (2009) incorporate HSV in their data term.
2.2.1.2 Penalty Function

For estimating the flow, one key step is the penalty function applied to violations of the constraint. The most common choice is the L2 norm (Horn et al., 1981), which simplifies the computation:

ec = ∬ (Ix u + Iy v + It)² dxdy   (2.7)
Equation (2.7) corresponds to a Gaussian assumption and is therefore not robust at boundaries where there is occlusion. Black and Anandan (1996) propose a framework based on robust estimation which is adopted in later work (Wedel et al., 2009; Xu et al., 2012); another popular penalty function is the L1 norm (Brox et al., 2004).
2.2.1.3 Other Features
Apart from the intensity of the frames, robust pairwise features can also be used for constructing the motion fields. Brox et al. combine gradient constancy with brightness (Brox et al., 2004); DeepFlow (Weinzaepfel et al., 2013) and SIFT flow (Liu et al., 2008) both use the Scale-Invariant Feature Transform (SIFT) for matching, which performs the best among the illumination-invariant features.
2.2.2 Prior Term

To make the problem well-posed, an additional constraint needs to be introduced to the algorithm. The most widely used prior term is smoothness, which assumes that neighbouring pixels belonging to the same object have similar motion (Horn et al., 1981; Lucas et al., 1981). Using the L2 norm, the penalty function is:

es = ∬ [ (∂u/∂x)² + (∂u/∂y)² + (∂v/∂x)² + (∂v/∂y)² ] dxdy   (2.8)
Besides the first order, Trobin et al. (2008) use a second-order prior to achieve high-accuracy optical flow, and Wedel et al. (2009) adopt rigid motion as their prior.
2.2.3 Learning Methods

It is hard to ignore learning algorithms among the upcoming optical flow approaches. Gordong and Milman (2006) learn a statistical model of the errors of both the brightness and smoothness constraints. FlowNet first used convolutional neural networks (CNNs) to predict optical flow from a large quantity of training data (Dosovitskiy et al., 2015). Later, Recurrent All-Pairs Field Transforms (RAFT) (Teed et al., 2020) combined CNN and RNN components and achieved better results. Apart from the above methods, a number of optimizations improve the performance of optical flow algorithms; however, this is beyond the intention of this book. The next section gives details of four benchmark algorithms from different categories and compares their performance.
2.3 Selected Optical Flow Algorithms

2.3.1 Differential Method

Horn and Schunck developed the first differential approach to computing optical flow in 1981. It represented the beginning of variational techniques in Computer Vision (Sun et al., 2014), and most upcoming algorithms still build on Horn-Schunck's theory. They estimate optical flow from the spatial-temporal derivatives of image intensity based on brightness constancy and motion smoothness. Combining Eqs. (2.7) and (2.8), the problem becomes one of minimizing the change of the optical flow along both the horizontal and vertical directions:

e = αes + ec = ∬ [ α(∇²u + ∇²v) + (Ix u + Iy v + It)² ] dxdy   (2.9)

where α is the factor of motion smoothness. The solution, the velocities (u, v), is obtained by minimizing the total error (Horn et al., 1981).
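To make the iteration concrete, the sketch below implements the classic Horn-Schunck update implied by Eq. (2.9). It is not the implementation used in this book: the derivative and averaging kernels are common textbook choices, and the smoothness weight alpha is illustrative only.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(I1, I2, alpha=15.0, n_iter=100):
    """Minimal Horn-Schunck optical flow: returns (u, v) mapping I1 -> I2."""
    I1 = I1.astype(np.float32)
    I2 = I2.astype(np.float32)
    # Simple spatial and temporal derivative estimates between the two frames
    kx = np.array([[-1, 1], [-1, 1]], np.float32) * 0.25
    ky = np.array([[-1, -1], [1, 1]], np.float32) * 0.25
    Ix = convolve(I1, kx) + convolve(I2, kx)
    Iy = convolve(I1, ky) + convolve(I2, ky)
    It = 0.25 * (convolve(I2, np.ones((2, 2), np.float32))
                 - convolve(I1, np.ones((2, 2), np.float32)))
    # Neighbourhood-average kernel used for the smoothness term
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], np.float32) / 12.0
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        u_bar = convolve(u, avg)
        v_bar = convolve(v, avg)
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```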
2.3.2 Region-Based Method

Block Matching is one of the most fundamental region-based matching techniques. The algorithm assumes that the intensity of every single pixel remains constant between successive frames if the motion is continuous (there is no occlusion) (Fortun et al., 2015). Optical flow can be computed by determining which block best matches the current block in a selected neighbourhood. The implementation of a region-based matching technique can be achieved by minimizing the sum-of-squared differences (SSD) between blocks in the image (Barron et al., 1994):

SSD = Σ_{(x,y)∈B} ( I(x + Δx, y + Δy, t + Δt) − I(x, y, t) )²   (2.10)
where I(x, y, t) is the pixel intensity of position (x, y) at frame t. The matching block is that where the error is minimum within the search area, and the optical flow is therefore the change in position between the current block and the matched block (Yaakob et al., 2013).
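A brute-force illustration of Eq. (2.10) is sketched below; it is not the book's implementation, and the block size and search range are placeholder values.

```python
import numpy as np

def block_matching_flow(I1, I2, block=8, search=4):
    """Brute-force SSD block matching (Eq. 2.10): one (dx, dy) per block."""
    h, w = I1.shape
    flow = np.zeros((h // block, w // block, 2), np.float32)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            ref = I1[y0:y0 + block, x0:x0 + block].astype(np.float32)
            best, best_d = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    if y1 < 0 or x1 < 0 or y1 + block > h or x1 + block > w:
                        continue
                    cand = I2[y1:y1 + block, x1:x1 + block].astype(np.float32)
                    ssd = np.sum((cand - ref) ** 2)
                    if ssd < best:
                        best, best_d = ssd, (dx, dy)
            flow[by, bx] = best_d
    return flow
```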
2.3.3 Dense Optical Flow

Farneback (2003) developed the most popular dense optical flow algorithm based on polynomial expansion. In the algorithm, the neighbourhood of each pixel can be approximated by a polynomial expansion:

f(x) ~ xᵀAx + bᵀx + c   (2.11)

where x = (i, j)ᵀ, A is a 2 × 2 matrix and b is a 2 × 1 vector. Optical flow assumes that the image intensity is constant. Therefore, if the displacement between f₁(x) and f₂(x) is d:

f₂(x) = f₁(x − d) = (x − d)ᵀA₁(x − d) + b₁ᵀ(x − d) + c₁ = xᵀA₂x + b₂ᵀx + c₂   (2.12)

A₂ = A₁,   b₂ = b₁ − 2A₁d,   c₂ = dᵀA₁d − b₁ᵀd + c₁   (2.13)

Thus if A₁ is non-singular, the displacement d is:

d = −(1/2) A₁⁻¹ (b₂ − b₁)   (2.14)
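In practice the Farneback method is available as a library routine; a minimal usage sketch with OpenCV is shown below. The file names are placeholders and the parameter values are illustrative defaults, not the settings used for the experiments in this book.

```python
import cv2

# Two consecutive greyscale frames (replace the file names with real frames)
prev = cv2.imread("frame10.png", cv2.IMREAD_GRAYSCALE)
nxt = cv2.imread("frame11.png", cv2.IMREAD_GRAYSCALE)

# Dense flow by polynomial expansion
flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
u, v = flow[..., 0], flow[..., 1]   # horizontal and vertical displacement fields
```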
2.3.4 DeepFlow

DeepFlow, developed by Weinzaepfel et al. in 2013, has emerged as a popular optical flow technique in recent years due to its excellent performance on large-displacement estimation and non-rigid matching. DeepFlow made a step towards bridging the gap between descriptor matching algorithms and large-displacement optical flow techniques (Weinzaepfel et al., 2013). It will be introduced in two parts: the deep matching algorithm and the energy minimization framework.
2.3.4.1 DeepMatching

The deep matching algorithm first splits the Scale-Invariant Feature Transform (SIFT) (Lowe, 2004) descriptor, a 128-dimensional real vector, into four quadrants: the gradient orientations of the points of interest are changed from H ∈ R¹²⁸ into H = [H¹ H² H³ H⁴] where Hˢ ∈ R³². In order to maximize the similarity
between the reference and target descriptors, DeepFlow optimizes the positions of the Hˢ on the target descriptor by assuming that each of the quadrants can move independently to some extent rather than keeping them fixed:

sim(H, Q(p)) = Σ_{s=1}^{4} max_{pₛ} Hₛᵀ Q(pₛ)   (2.15)
where Q(p) ∈ R³² is one quadrant of the reference descriptor. By assuming each of the quadrants can move independently, a coarse-to-fine non-rigid matching can be obtained efficiently.
If {P_{i,j}}_{i,j=0}^{L−1} and {P′_{i,j}}_{i,j=0}^{L−1} denote the reference and target descriptors respectively, the optimal warping ω* is the one that maximizes the similarity between pixels:

S(ω*) = max_{ω∈W} S(ω) = max_{ω∈W} Σ_{i,j} sim( P(i, j), P′(ω(i, j)) )   (2.16)
where ω(i ) returns the position of pixel i in P ' . If defined recursively, then we can obtain the optimal warpings that are largely robust to deformation (Weinzaepfel et al., 2013).
2.3.4.2 Energy Minimization Framework

DeepFlow is an energy minimization framework similar to Horn-Schunck. It is based on the same two assumptions, intensity constancy and smooth motion; in addition, an extra term, deep matching, is blended into the framework:

E(ω) = ∫ (E_D + α E_S + β E_M) dx   (2.17)

where E_D is the weighted data term, E_S is the smoothness term and E_M is the matching term. A robust penalizer is applied to each term:

Ψ(s) = √(s² + ε²)   (2.18)
with ε = 0.001, which was determined empirically. The data term consists of two penalizers of brightness:

E_D = δ Ψ( Σ_{i=1}^{c} ωᵀ J̄₀ⁱ ω ) + γ Ψ( Σ_{i=1}^{c} ωᵀ J̄ₓᵧⁱ ω )   (2.19)
where the first term is the penalizer over image channels and the second is the penalization along the x and y axes. ω = (u, v)ᵀ is the flow we seek to estimate and c is the number of image channels. J̄₀ is a tensor normalized by the spatial derivatives:

J̄₀ⁱ = θ₀ (∇₃Iⁱ)(∇₃ᵀIⁱ)   (2.20)

∇₃ is the spatio-temporal gradient (∂x, ∂y, ∂t). θ₀ is the spatial normalization factor (‖∇₂Iⁱ‖² + ξ²)⁻¹, which reduces the impact of locations with small gradients, with ξ = 0.1 preventing the factor from being zero. The gradient constancy penalizer is normalized along the x and y axes respectively:

J̄ₓᵧⁱ = (∇₃Iₓⁱ)(∇₃ᵀIₓⁱ)(‖∇₂Iₓⁱ‖² + ξ²)⁻¹ + (∇₃I_yⁱ)(∇₃ᵀI_yⁱ)(‖∇₂I_yⁱ‖² + ξ²)⁻¹   (2.21)

where Iₓ and I_y are the derivatives with respect to the horizontal and vertical axes. The smoothness term in DeepFlow is a penalization of the flow gradient:

E_S = Ψ( ‖∇u‖² + ‖∇v‖² )   (2.22)

The purpose of the matching term is to keep the flow close to the known vector ω′ introduced in Sect. 2.3.4.1. The difference is estimated by:

E_M = b φ Ψ( ‖ω − ω′‖² )   (2.23)

Due to the matching being semi-dense, a binary term b(x) is added into the matching term; b(x) equals 1 if and only if there is a match at position x. φ(x) is a weight that is low in flat areas. The optical flow we seek to estimate, ω = (u, v)ᵀ, can be obtained by minimising the energy function (Weinzaepfel et al., 2013).
This section introduced the principles of classical optical flow algorithms from different categories as well as the state of the art. Before the evaluation, it is worth describing the performance quantification.
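For experimentation, a DeepFlow implementation is available in the OpenCV contrib modules; a minimal usage sketch is given below, assuming the opencv-contrib-python package (which provides the optflow module) is installed. The file names are placeholders.

```python
import cv2

prev = cv2.imread("frame10.png", cv2.IMREAD_GRAYSCALE)
nxt = cv2.imread("frame11.png", cv2.IMREAD_GRAYSCALE)

# DeepFlow from the opencv-contrib optflow module
deepflow = cv2.optflow.createOptFlow_DeepFlow()
flow = deepflow.calc(prev, nxt, None)   # H x W x 2 field (u, v)
```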
2.4 Preparation for Performance Quantification of Optical Flow Algorithms

2.4.1 Synthetic Images with Explicit Motion

The advantage of synthetic images is that they lack specularity and other types of noise. Also, the motion field and the scene properties can be manipulated as required. To compare against the desired flow, some test images with only simple motion (like linear shift or rotation) are necessary. Two sets of synthetic image sequences were constructed, which can identify where the flow approaches fail in the first place.
The artificial image sequences involving linear translation and rotation are synthesized using images from the Middlebury database (Baker et al., 2011). A subpart of a frame from Mequon (the block of two faces in the middle of the synthetic images shown in Fig. 2.2) is embedded in a frame from the Wooden images. The sub-Mequon shifts along a linear trajectory towards the lower right corner at speeds of 1 and 3 pixels/frame on both the horizontal and vertical axes. The rotation sequence is obtained by rotating the middle square around its centre to form circular motion. Figure 2.2 gives examples of linear shifting when the displacement is 32 pixels from the start position, and of rotation when the rotation is 10° and 30°.
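The construction of such sequences can be sketched as follows; this is an illustration of the idea (embed a patch and translate it by a known amount per frame), not the exact script used to build the test data.

```python
import numpy as np

def make_linear_shift_sequence(background, patch, top_left, velocity, n_frames):
    """Embed a patch in a background and shift it linearly, giving frames
    with a known (ground-truth) displacement per frame."""
    frames = []
    y0, x0 = top_left
    vy, vx = velocity                     # pixels/frame, vertical and horizontal
    ph, pw = patch.shape[:2]
    for t in range(n_frames):
        frame = background.copy()
        y = int(round(y0 + vy * t))
        x = int(round(x0 + vx * t))
        frame[y:y + ph, x:x + pw] = patch  # paste the moving block
        frames.append(frame)
    return frames
```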
2.4.2 Flow Visualization In the early optical flow papers, the flow field is visualized by an arrow which starts from the initial position to the moving direction, and the length of the arrow indicates the magnitude of the displacement. Figure 2.3 gives an example of using arrows to present the motion field. With the development of optical flow, the new approaches can handle more complicated and anisotropic situations, so using arrows might cause confusion in the analysis of a complex motion field. Baker et al. (2011) created a colour coding scheme for visualizing such complex motion when they established their optical flow dataset. The colour scheme is shown in Fig. 2.4: the hue indicates the direction and the saturation represents the intensity of the flow.
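A simple way to reproduce this kind of colour coding is to map direction to hue and magnitude to saturation in an HSV image, as sketched below; this approximates, rather than exactly reproduces, the Middlebury colour wheel.

```python
import cv2
import numpy as np

def flow_to_colour(flow):
    """Visualize a flow field: hue encodes direction, saturation encodes magnitude."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)
    hsv = np.zeros((flow.shape[0], flow.shape[1], 3), dtype=np.uint8)
    hsv[..., 0] = (ang / 2).astype(np.uint8)   # OpenCV hue range is 0-179
    hsv[..., 1] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    hsv[..., 2] = 255
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```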
Fig. 2.2 Examples of synthetic test images
Fig. 2.3 Presenting flow using arrows (Farneback, 2003): (a) a frame of the Yosemite sequence, (b) motion field visualized by arrows
Fig. 2.4 Flow field colour coding (Baker et al., 2011)
2.4.3 Flow Error Measurements

In general, there are two commonly used measurements in optical flow: Angular Error (AE) and End-Point Error (EPE) (Baker et al., 2011). If the ground-truth vector and estimated vector are (uGT, vGT)ᵀ and (u, v)ᵀ, AE measures the angle between the ground-truth flow and the predicted flow vector in a 3D space (pixel, pixel, frame):

AE = cos⁻¹( (1.0 + u·uGT + v·vGT) / ( √(1 + u² + v²) · √(1 + uGT² + vGT²) ) )   (2.24)

where (uGT, vGT, 1)ᵀ and (u, v, 1)ᵀ are the extended 3D vectors. The second measurement, EPE, is the Euclidean distance between the two vectors on the 2D image plane:

EPE = √( (u − uGT)² + (v − vGT)² )   (2.25)
Both of the measurements have their own advantages and biases: AE is more sensitive to errors in small motion whereas it undervalues errors in large motion; EPE strongly penalizes large motion errors but is insensitive to errors in small motion (Fortun et al., 2019). Having given a general sense of how flow can be presented and evaluated, the next section tests the methods introduced in Sect. 2.3 and illustrates the results in the most appropriate way.
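Both measures are straightforward to compute from the flow components; a small sketch following Eqs. (2.24) and (2.25) is given below.

```python
import numpy as np

def flow_errors(u, v, u_gt, v_gt):
    """Per-pixel Angular Error (Eq. 2.24) and End-Point Error (Eq. 2.25)."""
    num = 1.0 + u * u_gt + v * v_gt
    den = np.sqrt(1.0 + u ** 2 + v ** 2) * np.sqrt(1.0 + u_gt ** 2 + v_gt ** 2)
    ae = np.arccos(np.clip(num / den, -1.0, 1.0))    # radians
    epe = np.sqrt((u - u_gt) ** 2 + (v - v_gt) ** 2)  # pixels
    return ae, epe

# Average End-Point Error (AEPE) over a frame:
# aepe = flow_errors(u, v, u_gt, v_gt)[1].mean()
```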
2.5 Performance Comparison of Optical Flow Algorithms on Synthetic Images

The optical flow methods are first tested on the synthetic linear translation sequence; the results produced by simple motion can highlight the drawbacks of the techniques. The collection of flow fields is shown in Fig. 2.5. Although the flow fields are largely self-evident, some discussion is still required.
Beginning with small motion, it can be observed that all the techniques produce reasonably accurate flow fields. When the displacement increases to 3 pixels, Horn-Schunck and Block Matching give poor estimations since they are global algorithms without multi-layer refinement. Block Matching has more error on the smooth area since a local method is ill-posed in homogeneous regions (Sun, 2012); its accuracy relies largely on a block size adequate for the motion and texture of the image. Farneback gives some directional errors along the block edge. As the most advanced approach, DeepFlow performs the best in both scenarios, including the discontinuous areas.
In addition to the synthetic images, the optical flow methods are also evaluated on the famous test image sequence Yosemite, shown in Fig. 2.6. It is a conventional challenge for most optical flow algorithms due to the divergent displacement in different areas, and the edges between the mountains are occluded as the scene moves. The non-uniform motion is caused by the asymmetrical projection of 3D motion in the real world onto the 2D image surface: the upper right corner translates to the right with a speed of 2 pixels/frame and the speed in the lower left area is about 4-5 pixels/frame (Barron et al., 1994). Although the scene is complex, the motion field is simple: the camera moves straight forward smoothly without any rotation or distortion. The results are consistent with those on the synthetic sequences, and the estimation by DeepFlow is linear and evenly distributed along the moving direction of the camera, as expected. Farneback is noisy at the edge of the image whereas the results of Horn-Schunck and Block Matching are poor on this test image.
For a more objective evaluation, we applied these algorithms to seven image sequences (for which GT is available) from Middlebury (Baker et al., 2011) to give a statistical measurement. Table 2.1 summarizes the Average End-Point Error (AEPE), the End-Point Error between the estimated flow and the GT averaged over all points, and Table 2.2 reports the Standard Deviations (SD) of the values shown in Table 2.1. Furthermore, Figs. 2.7 and 2.8 show the results on RubberWhale and Dimetrodon and support Tables 2.1 and 2.2. Different from the manipulated image sequences, the motion fields in Middlebury are more complex than the synthetic sequence, so we opt to use colour coding to illustrate the flow fields.
The measurements on Middlebury give results consistent with the synthetic images: DeepFlow outperforms the other methods. The accuracy benefits largely from using the robust SIFT descriptors in feature matching and from structuring the response pyramids over multiple sizes, whereas the remaining approaches are global algorithms which
Fig. 2.5 Estimated flow fields (Horn-Schunck, Block Matching, Farneback and DeepFlow) on small (1 pixel) and large (3 pixels) rigid motion
Fig. 2.6 The GT of Yosemite and the estimated flow fields: (a) GT of Yosemite, (b) Horn-Schunck, (c) Block Matching, (d) Farneback, (e) DeepFlow
Table 2.1 AEPE on Middlebury datasets

Method           Dimetrodon  Hydrangea  RubberWhale  Urban2  Urban3  Grove2  Grove3  Avg
Horn-Schunck     1.66        3.35       0.61         8.06    7.18    3.16    3.89    3.99
Block Matching   1.78        2.79       0.82         8.09    7.05    1.37    2.98    3.55
Farneback        0.26        0.65       0.21         7.53    6.75    0.47    2.37    2.61
DeepFlow         0.11        0.17       0.13         0.29    0.44    0.18    0.66    0.28
Table 2.2 The SD of AEPE

Method           Dimetrodon  Hydrangea  RubberWhale  Urban2  Urban3  Grove2  Grove3  Avg
Horn-Schunck     0.94        1.6        0.65         8.17    4.83    1.32    2.96    2.92
Block Matching   1.24        2.13       0.88         8.23    4.85    1.57    2.93    3.12
Farneback        0.41        1.48       0.48         8.85    5.53    0.96    3.12    2.98
DeepFlow         0.1         0.36       0.26         0.95    1.44    0.43    1.45    0.71
Fig. 2.7 The input frames of RubberWhale, GT and the estimated flow fields: (a) frame 10, (b) frame 11, (c) GT (Baker et al., 2011), (d) colour coding, (e) Horn-Schunck, (f) Block Matching, (g) Farneback, (h) DeepFlow
Fig. 2.8 The input frames of Dimetrodon, GT and the estimated flow fields: (a) frame 10, (b) frame 11, (c) GT (Baker et al., 2011), (d) colour coding, (e) Horn-Schunck, (f) Block Matching, (g) Farneback, (h) DeepFlow
lack propagation of flow estimation across different scales. The main drawback of DeepFlow is detecting the motion of small objects: as can be observed in Fig. 2.8h, the shape of the dinosaur's head is very blurred. However, the remainder of the DeepFlow result is remarkably consistent with the GT, and it is much closer to the GT than the other (standard) approaches.
Chapter 3
Analysing Acceleration in Computer Images’ Stream
3.1 Overview

In most research on motion analysis in computer vision, only relative movement between consecutive frames has been considered, without considering the higher orders of motion. Acceleration is a more distinctive feature than displacement or velocity, and analysing motion in terms of acceleration can provide a better understanding of the scene. This chapter introduces acceleration detection algorithms and generalizes them to real-world motion. In addition, the acceleration field is decomposed into constituent parts to allow a greater depth of understanding of the motion, and the algorithms have been tested and show their power on a variety of image sequences. The experimental results illustrate the ability of acceleration to discriminate different motions where velocity does not show any obvious difference.
3.2 Estimation of Acceleration Flow

3.2.1 Recovering Acceleration from Optical Flow

Acceleration is a vector describing the magnitude and direction of the change of velocity. Average acceleration is the average rate of change of velocity with respect to a time interval. As with velocity, when the time interval approaches zero, it is termed instantaneous acceleration:

a⃗ = lim_{Δt→0} Δv⃗/Δt = dv⃗/dt   (3.1)
There was little work analysing acceleration before the work determining gait events through acceleration flow (Sun et al., 2018). Since then, there has been contemporaneous research on acceleration which has, as will be shown, significant scope for improvement.
Chen et al. (2015) establish an algorithm based on the combination of the classic Horn-Schunck (1981) and Lucas-Kanade (1981) optical flow algorithms. They assume that the image brightness is constant during a short period. Letting I(x, y, t) denote the image intensity of point (x, y) at time t, then:

I(t − Δt)x−Δx1, y−Δy1 = I(t)x,y = I(t + Δt)x+Δx2, y+Δy2   (3.2)

Expanding the left side of Eq. (3.2) by a Taylor series gives us:

I(t − Δt)x−Δx1, y−Δy1 = I(t)x,y − Ix Δx1 − Iy Δy1 − It Δt + ½ Ixx (Δx1)² + ½ Iyy (Δy1)² + ½ Itt (Δt)² + Ixy Δx1 Δy1 + Ixt Δx1 Δt + Iyt Δy1 Δt + ξ   (3.3)

where Δx1 and Δy1 are the horizontal and vertical displacements between the first frame and the second frame; Ix = ∂I/∂x, Iy = ∂I/∂y and It = ∂I/∂t are the first-order spatial and temporal partial differentials of the image intensity; Ixx, Ixy, Iyy, Ixt, Iyt and Itt are the second-order partial differentials (Ixx = ∂²I/∂x², etc.); and ξ contains the higher-order terms. The right side of Eq. (3.2) is expanded into a similar form as Eq. (3.3); after some rearrangement, we get:

Itt (Δt)² + Ixt (Δx1 + Δx2)Δt + Iyt (Δy1 + Δy2)Δt + Ixy (Δx1 Δy1 + Δx2 Δy2) + ½ Ixx ((Δx1)² + (Δx2)²) + ½ Iyy ((Δy1)² + (Δy2)²) + Ix (Δx2 − Δx1) + Iy (Δy2 − Δy1) + ξ = 0   (3.4)

Equation (3.4) is then divided by (Δt)², and meanwhile the higher-order term ξ is ignored, resulting in:

Itt + Ixt (Δx1/Δt + Δx2/Δt) + Iyt (Δy1/Δt + Δy2/Δt) + Ixy (Δx1 Δy1 + Δx2 Δy2)/(Δt)² + ½ Ixx ((Δx1/Δt)² + (Δx2/Δt)²) + ½ Iyy ((Δy1/Δt)² + (Δy2/Δt)²) + Ix (1/Δt)(Δx2/Δt − Δx1/Δt) + Iy (1/Δt)(Δy2/Δt − Δy1/Δt) = 0   (3.5)

If Δt → 0, then Δx1/Δt, Δx2/Δt, Δy1/Δt and Δy2/Δt are the horizontal and vertical velocities, denoted u1, u2, v1 and v2. In their formulation, the terms (1/Δt)(Δx2/Δt − Δx1/Δt) and (1/Δt)(Δy2/Δt − Δy1/Δt) are considered as the acceleration along the x and y axes separately, denoted au and av:
Itt + Ixt (u1 + u2) + Iyt (v1 + v2) + Ixy (u1 v1 + u2 v2) + ½ Ixx (u1² + u2²) + ½ Iyy (v1² + v2²) + Ix au + Iy av = 0   (3.6)

They assume the velocity is constant, namely u1 = u2 = u and v1 = v2 = v:

Itt + 2Ixt u + 2Iyt v + 2Ixy uv + Ixx u² + Iyy v² + Ix au + Iy av = 0   (3.7)

Eventually, their constraint equation is:

Ix ax + Iy ay + Iv = 0,   where Iv = Itt + 2Ixt u + 2Iyt v + 2Ixy uv + Ixx u² + Iyy v²   (3.8)

By following Lucas-Kanade (1981), Chen et al. (2015) assume that acceleration is constant over a small patch of the flow field to turn the constraint equations into an over-determined problem; the solution is determined by the least-squares algorithm. It seems that the terms (1/Δt)(Δx2/Δt − Δx1/Δt) and (1/Δt)(Δy2/Δt − Δy1/Δt) in Eq. (3.5) probably indicate the derivatives of acceleration, i.e. the Jerk field, rather than the acceleration field.
Later, Dong et al. (2016) aimed to feed acceleration into deep networks as a motion descriptor for detecting violence in videos. In their work, acceleration was estimated by expanding the horizontal velocity field U and the vertical velocity field V at t + Δt by a Taylor series:

U(t + Δt)x+Δx, y+Δy = U(t)x,y + Ux Δx + Uy Δy + Ut Δt
V(t + Δt)x+Δx, y+Δy = V(t)x,y + Vx Δx + Vy Δy + Vt Δt   (3.9)

The higher-order terms are ignored in Eq. (3.9). The change of velocity over time is:

dU/dt = Ux dx/dt + Uy dy/dt
dV/dt = Vx dx/dt + Vy dy/dt   (3.10)

The approach appears to assume that (dx/dt, dy/dt) corresponds to (u, v), and therefore the acceleration is obtained as:

a = (ωΔu, ωΔv)   (3.11)
The acceleration flow is obtained by a second-order differential of neighbouring frames, which is computing flow on optical flow. However, the main drawback of this idea is probably that it is difficult to compute the spatial partial derivatives due to the smoothness of the optical flow field (neighbouring pixels tend to have similar velocities).
The accuracy of optical flow algorithms has improved steadily over the past few years, but the basic formulation has changed little since the pioneering work of Horn and Schunck (Sun et al., 2010). Sun et al. (2017, 2018) build the Variational Acceleration algorithm on Horn and Schunck's work since almost all the state-of-the-art algorithms still use their theory (Sun, 2013). Horn and Schunck assumed that the intensity of a point does not change between two consecutive frames; Sun et al. extend this principle to three frames for estimating acceleration fields. If I(x, y, t) denotes the image intensity at (x, y) at time t, the image intensity is constant at frames t − Δt, t and t + Δt:

I(t − Δt)x−Δx1, y−Δy1 = I(t)x,y
I(t)x,y = I(t + Δt)x+Δx2, y+Δy2   (3.12)

Expanding Eq. (3.12) by a Taylor expansion:

I(t)x,y − Ix Δx1 − Iy Δy1 − It Δt + ξ = I(t)x,y
I(t)x,y = I(t)x,y + Ix Δx2 + Iy Δy2 + It Δt + ξ   (3.13)

Ignoring the high-order terms, we have:

−Ix Δx1 − Iy Δy1 − It Δt = Ix Δx2 + Iy Δy2 + It Δt   (3.14)

Dividing Eq. (3.14) by Δt,

−Ix Δx1/Δt − Iy Δy1/Δt − It = Ix Δx2/Δt + Iy Δy2/Δt + It   (3.15)

Then the gradient constraint is obtained:

∇I · (vt−Δt) − It = ∇I · (vt+Δt) + It   (3.16)
where ∇ = (∂/∂x, ∂/∂y) and v consists of horizontal and vertical components (u, v)ᵀ. If the acceleration changes dynamically from frame to frame, then there is little chance of recovering it. More commonly, motion is smooth, which means acceleration is constant during a small period. Therefore, Sun et al. assume that the acceleration does not change during three consecutive frames. According to Newton's laws:

vt = v0 + aΔt   (3.17)
vt−Δt and vt+Δt in Eq. (3.16) can be substituted by:

vt−Δt = vt − aΔt
vt+Δt = vt + aΔt   (3.18)

where vt represents the velocity vector at time t and the acceleration vector a is composed of horizontal and vertical components (au, av)ᵀ. We have:

∇I(vt − aΔt) − It = ∇I(vt + aΔt) + It   (3.19)

∇I · aΔt + It = 0   (3.20)

Dividing Eq. (3.20) by Δt:

∇I · a + It/Δt = 0   (3.21)

If Δt → 0, the optical flow constraint equation of acceleration is:

∇I · a + Itt = 0   (3.22)
where Itt is the second-order derivative of image intensity with respect to time. We now have two unknowns, a = (au, av)ᵀ, in Eq. (3.22) but only one equation, so there are infinite solutions. Hence, another equation is needed to avoid this ill-posed problem.
Acceleration has smoothness characteristics similar to velocity, in that neighbouring pixels tend to have similar acceleration. This shows a natural linkage between velocity and acceleration analysis in image sequences. The smoothness constraint is estimated by minimizing the squares of the Laplacians of the horizontal and vertical flow:

εs² = ∬ ( ∇²au + ∇²av ) dxdy = ∬ ( ∂²au/∂x² + ∂²au/∂y² + ∂²av/∂x² + ∂²av/∂y² ) dxdy   (3.23)

Combining the error to be minimized with the brightness constraint, we get:

ε² = ∬ ( εd² + λ² εs² ) dxdy   (3.24)

where λ depends on the noise in the pixel intensity, and the error of the data term εd is:

εd = ∬ (∇I · a + Itt) dxdy   (3.25)
Now the problem is well-posed.
3.2.2 Approximating the Derivatives

For estimating acceleration in image sequences, the derivatives are approximated between three consecutive frames. It is important that the derivatives are consistent, so that a pixel at the same image position refers to the same point at each time in the implementation. There are many ways of estimating the differentiation; this book uses the same spatial kernels as Horn-Schunck (Horn et al., 1981). The spatial-temporal relationship is illustrated in Fig. 3.1. To estimate the first-order horizontal and vertical derivatives, we use the kernels shown in Fig. 3.2 (Horn et al., 1981). Since three frames need to be considered in computing acceleration, the spatial derivatives are:
Fig. 3.1 Estimating the partial derivatives for point (x, y, t)
Fig. 3.2 Spatial derivatives kernels
Fig. 3.3 The temporal template for the second-order time derivative: [1, −2, 1]
Ix ≈ (1/6) Σ_y Σ_t ( Ix+1,y,t − Ix,y,t )
Iy ≈ (1/6) Σ_x Σ_t ( Ix,y+1,t − Ix,y,t )   (3.26)
The data constraint (Eq. 3.22) contains the second-order time derivative. A Laplacian operator is chosen to compute the second-order time derivative in the implementation; the template is shown in Fig. 3.3:
Itt ≈ (1/8) Σ_x Σ_y ( Ix,y,t+1 − 2Ix,y,t + Ix,y,t−1 )   (3.27)
where x ∈ {m, m + 1}, y ∈ {n, n + 1} and t ∈ {k − 1, k, k + 1}. By following a solution similar to Horn-Schunck to express the smoothness constraint, the acceleration flow in the image can eventually be determined by:

(Ix² + Iy²)(au − āu) = −Ix [ Ix āu + Iy āv + Itt ]
(Ix² + Iy²)(av − āv) = −Iy [ Ix āu + Iy āv + Itt ]   (3.28)

where āu and āv denote the local (neighbourhood) averages of au and av.
We now have the basis for detecting acceleration, we shall now move to evaluating the algorithms to determine whether we can indeed detect acceleration from image intensity.
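A compact sketch of this three-frame iteration is given below. It is an approximation of the scheme in Eqs. (3.26)-(3.28), not the authors' code: the derivative templates are simplified, and a weighting term lam**2 is added to the denominator for numerical stability.

```python
import numpy as np
from scipy.ndimage import convolve

def variational_acceleration(I0, I1, I2, lam=10.0, n_iter=100):
    """Estimate an acceleration field (au, av) from three consecutive frames,
    following the Variational Acceleration constraint ∇I·a + Itt = 0 (Eq. 3.22)."""
    I0, I1, I2 = (I.astype(np.float32) for I in (I0, I1, I2))
    kx = np.array([[-1, 1], [-1, 1]], np.float32) * 0.25
    ky = np.array([[-1, -1], [1, 1]], np.float32) * 0.25
    # Spatial derivatives averaged over the three frames; Itt from the [1, -2, 1] template
    Ix = (convolve(I0, kx) + convolve(I1, kx) + convolve(I2, kx)) / 3.0
    Iy = (convolve(I0, ky) + convolve(I1, ky) + convolve(I2, ky)) / 3.0
    Itt = I2 - 2.0 * I1 + I0
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], np.float32) / 12.0
    au = np.zeros_like(I1)
    av = np.zeros_like(I1)
    for _ in range(n_iter):
        au_bar = convolve(au, avg)
        av_bar = convolve(av, avg)
        common = (Ix * au_bar + Iy * av_bar + Itt) / (lam ** 2 + Ix ** 2 + Iy ** 2)
        au = au_bar - Ix * common
        av = av_bar - Iy * common
    return au, av
```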
3.2.3 Analysing Acceleration Algorithm on Image Sequences

Figure 3.4 shows the acceleration detection results on synthetic image sequences with rigid motion. The algorithms are tested on sequences without acceleration, with small acceleration and with large acceleration respectively. The synthetic images are constructed from two test images from the Middlebury dataset (Baker et al., 2011). The results detected by the Variational Acceleration algorithm are compared with Dong's algorithm, denoted Dong (Dong et al., 2016), which detects acceleration flow by computing the optical flow on optical flow. To make the comparison fair, Horn-Schunck is used in that implementation, since Variational Acceleration is based on Horn-Schunck as well.
Fig. 3.4 The acceleration fields detected by various methods of synthetic images (Baker et al., 2011)
In the first row, Dong detects an evenly distributed acceleration field on the moving block even though the motion field does not contain any acceleration. The acceleration field becomes denser when the sub-Mequon accelerates with a small value in the second row, and it does not change much when the acceleration becomes large. For Variational Acceleration, the result is acceptable when the sub-Mequon moves at constant velocity: little acceleration flow is detected, apart from some random noise, which is consistent with the expected motion. Variational Acceleration detects more flow as the acceleration increases, but the results tend to become noisier at the same time.
3.3 Estimating Acceleration Flow Via Other Flow Estimation Methods

3.3.1 A More Practical Approach

Since the motion in real images is often large and non-rigid, the results of acceleration based on the assumption of constant intensity are not satisfactory. Therefore, it is worth seeking a more feasible form for recovering acceleration from image sequences. Optical flow is still an active area in computer vision; new algorithms are constantly emerging and performance has improved significantly since the first variational algorithm, Horn-Schunck. Therefore, instead of aiming to improve the basic theory, Sun et al. (2017) believe it is better to use a state-of-the-art algorithm to approximate the acceleration flow. According to Eq. (3.1), the acceleration field can be approximated by the difference of neighbouring velocity fields:

Â(t) = V̂(t, t + Δt) − V̂(t − Δt, t)   (3.29)

where V̂(t, t + Δt) is the velocity field referring frame t to (t + Δt) and V̂(t − Δt, t) is the velocity field referring frame (t − Δt) to t. In the implementation, the resultant optical flow field can be considered as the velocity field due to the fixed frame rate; the unit is pixels/frame. To avoid the error caused by an inconsistent reference along the time axis, Sun et al. proposed referring both flows to the middle frame as the start frame in the temporal template, as explained in Fig. 3.5; they term this approach the Differential Acceleration algorithm (Sun et al., 2017). The acceleration field approximation by Differential Acceleration can be expressed as:

Â(t) = V̂(t, t + Δt) − [ −V̂(t, t − Δt) ]   (3.30)
Fig. 3.5 Computing the acceleration field with the middle frame as the reference (start) frame
Now this more practical algorithm can be deployed on images to see whether this approach can achieve better results than Variational Acceleration. In the implementation of Differential Acceleration, DeepFlow (Weinzaepfel et al., 2013), which is a popular technique with excellent performance for large displacement estimation and non-rigid matching, is used as our fundamental technique for Differential Acceleration.
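The Differential Acceleration idea of Eq. (3.30) reduces to two flow computations referenced to the middle frame; a sketch is given below. Farneback is used here only as a stand-in flow estimator because it ships with core OpenCV, whereas the book uses DeepFlow.

```python
import cv2

def differential_acceleration(frame_prev, frame_mid, frame_next, flow_fn=None):
    """Differential Acceleration (Eq. 3.30): both flows are referenced to the
    middle frame, so the acceleration is forward flow plus backward flow."""
    if flow_fn is None:
        # Stand-in flow estimator; the book uses DeepFlow instead
        flow_fn = lambda a, b: cv2.calcOpticalFlowFarneback(
            a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    v_forward = flow_fn(frame_mid, frame_next)    # V(t, t + Δt)
    v_backward = flow_fn(frame_mid, frame_prev)   # V(t, t − Δt)
    return v_forward + v_backward                 # = V(t, t+Δt) − (−V(t, t−Δt))
```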
3.3.2 Evaluating Acceleration Algorithms on Synthetic Images

The performances of Dong (Dong et al., 2016), Variational Acceleration and Differential Acceleration are evaluated on the famous test sequence Yosemite. There is little acceleration in the original image sequence, so an artificial acceleration field is induced by skipping one frame, to see whether the algorithms are robust enough to estimate acceleration fields in this sequence. The detection results are illustrated in Fig. 3.6.
Since most current optical flow databases (Baker et al., 2011; Geiger, 2013) are targeted at measuring velocity flow, either the test image sequences only have two frames or the GT is only a single velocity field. Obtaining an accurate GT from these benchmark optical flow databases therefore becomes an obstacle for research on acceleration. To evaluate the acceleration algorithms, a pseudo acceleration GT is computed by MDP-Flow2 (Xu et al., 2012), which is a highly ranked method in the evaluation of optical flow algorithms (www.vision.middlebury.edu/flow/eval/) and whose code is publicly available. We use MDP-Flow2 to estimate the pseudo GT of velocity flow first; the pseudo acceleration flow is then computed by Eq. (3.30).
The acceleration in the pseudo GT mostly focusses on the lower part of the manipulated Yosemite sequence (Baker et al., 2011), shown in Fig. 3.6a. The detection
Fig. 3.6 Comparing detection results on Yosemite: (a) pseudo GT estimated by MDP-Flow2 (Xu et al., 2012), (b) Dong (Dong et al., 2016), (c) Variational Acceleration, (d) Differential Acceleration
results of Dong in Fig. 3.6b show flow over the whole mountain area. This is because their method computes the optical flow of the optical flow; however, once the flow has been detected from the image sequence, the field varies in a smooth manner (neighbouring pixels tend to have similar velocities) and the moving objects lose the texture that is important for the majority of optical flow algorithms. In Fig. 3.6c, the results estimated by Variational Acceleration show an improvement over Dong, with the acceleration mostly focussed on the lower left corner of the image, which is consistent with the pseudo GT. However, the result is still very noisy because the motion violates both the data and smoothness constraints (occlusion and large motion); Variational Acceleration is restricted to the linear domain of the image derivatives, which means the displacement must be very small and the motion must be smooth. Differential Acceleration detects evenly distributed acceleration flow in the lower right corner. The result shows a considerable improvement compared with the results
of Dong and Variational Acceleration. The acceleration field in Fig. 3.6d shows less noise and is more consistent with the pseudo GT.
3.3.3 Comparison Between Differential and Variational Acceleration

Apart from Yosemite, Tables 3.1 and 3.2 report statistics of the performance of Dong, Variational Acceleration and Differential Acceleration in estimating acceleration on a number of sequences from Middlebury (Baker et al., 2011), for a more objective evaluation. The statistical results are compared with the pseudo GT, which is estimated by MDP-Flow2 (Xu et al., 2012) since only one ground-truth optical flow field between consecutive frames is available on the evaluation website (to prevent new algorithms from overfitting the test images). AEPE is chosen here to present the error since the motion in the Middlebury image sequences is relatively large in the pseudo GT. The average ranks and the reported AEPE and SD of MDP-Flow2, DeepFlow and Horn-Schunck on the Middlebury evaluation website are shown in Tables 3.3 and 3.4 for reference.
The optical flow evaluation uses the average absolute endpoint error and the standard deviation to present the results. It is difficult to include the uncertainty, although it would give more information, since the error in each part can be very different (e.g., some algorithms are good in discontinuous areas and some are good in textured areas). The benchmark evaluation dataset Middlebury gives separate results for "all", "discontinuous" and "textured"; however, it is difficult to separate the results here because the labels for the edges and textured areas are not available.

Table 3.1 AEPE of estimation algorithms on Middlebury datasets

Method                     Backyard  Dumptruck  Mequon  Schefflera  Walking  Yosemite  Avg
Dong                       3.21      2.58       3.54    3.35        3.13     3.39      3.2
Variational acceleration   2.48      1          3.38    2.89        1.87     2.1       2.29
Differential acceleration  0.35      0.3        0.29    0.37        0.51     0.25      0.35
Table 3.2 SD of EPE between acceleration algorithms

Method                     Backyard  Dumptruck  Mequon  Schefflera  Walking  Yosemite  Avg
Dong                       2.94      2.43       3.01    2.67        2.45     3.26      2.79
Variational acceleration   3.74      2.79       4.18    3.49        2.52     2.62      3.22
Differential acceleration  0.94      1.29       0.78    0.98        0.99     0.32      0.88
Table 3.3 The reported AEPE ranks on the Middlebury evaluation website (http://vision.middlebury.edu/flow/eval/results/results-a1.php)

Method         AEPE average rank  Mequon  Schefflera  Yosemite
MDP-Flow2      11.8               0.15    0.20        0.11
DeepFlow       69.9               0.28    0.44        0.11
Horn-Schunck   113.3              0.61    1.01        0.16
Table 3.4 The reported SD on the Middlebury evaluation website (see Table 3.3)

Method         Mequon  Schefflera  Yosemite
MDP-Flow2      0.40    0.55        0.12
DeepFlow       0.78    1.23        0.12
Horn-Schunck   0.98    1.88        0.16
Examples of results from the Middlebury dataset are illustrated in Fig. 3.7. The motion in the illustrated results is relatively dense, so visualizing the motion field with arrows could cause confusion; the flow field colour coding created by Baker et al. (2011) is therefore used in this book to present the fields. The experimental results show that both the assumption of constant intensity and that of smooth motion in Variational Acceleration are too strong for real motion (most of which is complex), so Differential Acceleration is used to detect and represent acceleration flow in the rest of this book.
3.4 Tangential and Radial Acceleration

3.4.1 Decomposing the Resultant Acceleration

The acceleration of curved motion is composed of two components: tangential and radial acceleration. The tangential component changes the magnitude of the velocity and lies along the tangent line of the trajectory (increasing or decreasing the speed). The radial component (also called centripetal acceleration in circular motion) changes the direction of the velocity and points to the centre of the curved path (normal to the tangent line of the trajectory). Therefore, decomposing radial and tangential acceleration from the resultant acceleration field provides a completely new way to understand and disambiguate motions in image sequences.
If the time interval is sufficiently small, the motion incorporated in images can be treated as either linear or circular. Sun et al. (2017) assume that a moving point following a curved trajectory rotates along the same arc in any three consecutive frames, since three non-collinear points determine one and only one circle. The rotation centre can be calculated from the positions of the pixel in the consecutive frames. Connecting these three points with straight lines and
[Fig. 3.7 More examples of detecting acceleration on real images: (a) Backyard, (b) Mequon and (c) Walking (Baker et al., 2011), each showing the pseudo GT, Variational Acceleration and Differential Acceleration fields]
Suppose the coordinates of a point in three consecutive frames are $P_i(x_i, y_i)$, $i \in \{1, 2, 3\}$; $MO$ and $NO$ are the perpendicular bisectors of $P_1 P_2$ and $P_2 P_3$, and $O(x_o, y_o)$ denotes the centre of the circular motion. Hence:

$$\overrightarrow{MO} \cdot \overrightarrow{P_1 P_2} = \overrightarrow{NO} \cdot \overrightarrow{P_2 P_3} = 0 \qquad (3.31)$$

Then the coordinates of O can be obtained by:
$$O^T = 0.5 \cdot \Phi^{-1} \Psi \qquad (3.32)$$

where

$$\Phi = \begin{bmatrix} x_2 - x_1 & y_2 - y_1 \\ x_3 - x_2 & y_3 - y_2 \end{bmatrix}, \qquad \Psi = \begin{bmatrix} x_2^2 - x_1^2 + y_2^2 - y_1^2 \\ x_3^2 - x_2^2 + y_3^2 - y_2^2 \end{bmatrix} \qquad (3.33)$$
Fig. 3.8 Location of the radial acceleration centre

$\mathbf{a} = (x_2 + a_u, y_2 + a_v)$ represents the coordinates of the acceleration vector in the image plane. The positions of the tangential acceleration $tang(u, v)$ and the radial acceleration $rad(u, v)$ can be estimated by:

$$tang^T = \left[\mathbf{f}(-\theta)\ \mathbf{g}(-\theta)\right]^T \left[\mathbf{p}_2 \cdot \mathbf{f}(\theta)\ \ \mathbf{a} \cdot \mathbf{g}(\theta)\right]^T, \qquad rad^T = \left[\mathbf{f}(-\theta)\ \mathbf{g}(-\theta)\right]^T \left[\mathbf{a} \cdot \mathbf{f}(\theta)\ \ \mathbf{p}_2 \cdot \mathbf{g}(\theta)\right]^T \qquad (3.34)$$

where $\theta$ is the angle between $OP_2$ and the horizontal axis, $\mathbf{f}(\theta) = (\cos\theta, \sin\theta)$ and $\mathbf{g}(\theta) = (-\sin\theta, \cos\theta)$.
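To make the decomposition concrete, the following sketch shows one way Eqs. (3.31)–(3.34) could be implemented for a single pixel tracked over three frames: the rotation centre is recovered from the three positions, and the acceleration vector at the middle position is resolved into tangential and radial parts. The decomposition here uses an equivalent projection onto the radial and tangential directions rather than the explicit rotation form of Eq. (3.34), and all names and values are illustrative assumptions, not code from the original work.

```python
import numpy as np

def rotation_centre(p1, p2, p3):
    """Centre of the circle through three non-collinear points (Eqs. 3.32-3.33)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    phi = np.array([[x2 - x1, y2 - y1],
                    [x3 - x2, y3 - y2]])
    psi = np.array([x2**2 - x1**2 + y2**2 - y1**2,
                    x3**2 - x2**2 + y3**2 - y2**2])
    return 0.5 * np.linalg.solve(phi, psi)          # O = 0.5 * Phi^-1 * Psi

def decompose_acceleration(p2, acc, centre):
    """Split an acceleration vector at p2 into tangential and radial parts."""
    radial_dir = np.asarray(p2, dtype=float) - centre
    radial_dir = radial_dir / np.linalg.norm(radial_dir)      # unit vector along O -> P2
    tangent_dir = np.array([-radial_dir[1], radial_dir[0]])   # perpendicular direction
    radial = np.dot(acc, radial_dir) * radial_dir
    tangential = np.dot(acc, tangent_dir) * tangent_dir
    return tangential, radial

# Usage on a toy trajectory lying approximately on a circle of radius 10.
p1, p2, p3 = (10.0, 0.0), (9.95, 1.0), (9.8, 2.0)
centre = rotation_centre(p1, p2, p3)
acc = np.array([-0.1, 0.0])                         # an example acceleration vector at p2
tang, rad = decompose_acceleration(p2, acc, centre)
print(centre, tang, rad)
```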
3.4.2 Radial and Tangential Acceleration Fields on Image Sequences

The radial and tangential acceleration fields are first shown on synthetic images and then on real images to demonstrate their capability in real applications. The synthetic images are generated from two test images from the Middlebury dataset (Baker et al., 2011). They are classified into four groups: linear shift with constant velocity, linear shift with acceleration, rotation with constant angular velocity, and rotation with angular acceleration. The decomposition algorithm is used to detect the velocity, resultant acceleration, radial and tangential fields for each group separately. The results are illustrated in Fig. 3.9. In the results of the linear shifts, there is little radial acceleration because the direction of the trajectory does not change, and the tangential component only appears when the object is accelerating. The resultant acceleration field shows similar features to the tangential acceleration since it contains only the tangential component in linear displacement; velocity, however, appears in both situations. In the rotation examples, radial acceleration appears both in the rotation with constant angular velocity and in the rotation with angular acceleration, because the direction of motion changes all the time. The magnitude of the radial acceleration increases as the angle through which the object has rotated increases.
[Fig. 3.9 The experimental results on synthetic images: columns show linear shift with constant velocity, linear shift with acceleration, rotation with constant angular velocity, and rotation with angular acceleration; rows show the velocity, resultant acceleration, radial acceleration and tangential acceleration fields]
All the radial acceleration flow points towards the centre of the rotated Mequon patch, since the patch rotates about that centre. The directions of the tangential components lie along the tangent of the rotating trajectory, a result consistent with expectations. Meanwhile, the velocity field does not show any obvious distinction between the groups. There appears to be some noise around the edge of the moving object; this is mainly caused by discontinuous motion in that area. The acceleration detection therefore shows the expected results on artificial scenes, and it reveals features of accelerated motion that velocity analysis lacks. In real image sequences, acceleration flow can help distinguish objects undergoing different motions. Figures 3.10 and 3.11 give different flow fields of two image sequences (Baker et al., 2011) that capture natural real-world motion with a high-speed camera. In the first row of each figure, the velocity fields contain all types of motion, so they
look chaotic; by contrast, the motion in the radial and tangential acceleration fields is more comprehensible. In the first sequence, "Basketball", the man on the left is passing the ball to another person. At the beginning of the throw the ball has acceleration but has not yet started spinning, so the acceleration in frame 9 is mainly tangential, while there is little tangential acceleration flow on the hands of the man waiting to take the pass, since he has not yet raised them. In frame 11, the radial acceleration reveals that the basketball is spinning during its travel, as does its shadow on the wall. The tangential acceleration shown on the hand of the right-hand man indicates that he is raising his hands and is ready to take the pass.
[Fig. 3.10 Different flow fields from the Middlebury dataset, Basketball (frames 9 and 11): velocity, radial acceleration and tangential acceleration]
[Fig. 3.11 Different flow fields from the Middlebury dataset, Backyard (frames 8 and 12): velocity, radial acceleration and tangential acceleration]
The acceleration fields thus distinguish the motion-field features as expected. In the second sequence, "Backyard", the little girls are jumping up. The acceleration flow shows that they have the most tangential acceleration at the start of the jump and little at the end. In Fig. 3.10 a significant amount of radial acceleration appears on the spinning basketball and its shadow on the wall. The accumulated rotation centre in Fig. 3.12 (the blue cross on the right-hand man) is consistent with the direction of radial acceleration on the shadow, since the shadow has a much larger flow field than the basketball itself.
Fig. 3.12 Accumulated radial acceleration rotation centre in test sequence “Basketball”
The centres are accumulated by the proximity algorithm (Bouchrika and Nixon, 2007b) with a radius of 10 pixels. The result demonstrates the power of the radial and tangential acceleration decomposition algorithm, and the information benefits the prediction of the trajectories of moving objects. A further example applies the decomposition approach to detect the acceleration field on gait in a chroma-key laboratory (Shutler et al., 2004), as shown in Fig. 3.13. Acceleration is detected mainly around the limbs of the walking subject and is maximal around the forward-swinging leg while the other leg is in the stance phase, since the limbs appear to have pendulum-like motion when people walk (Cunado et al., 2003). In addition, Fig. 3.14 shows the zoomed-in radial acceleration on the subject's thorax; the direction of the radial acceleration reveals that the upper body also moves like a pendulum, to a small extent, during walking (Cunado et al., 2003). In contrast, the velocity flow is distributed all over the body without notable differences, so the detected acceleration is consistent with the above analysis. The image sequence "Dumptruck" gives another encouraging example in Fig. 3.15. The silver car at the front and the red dump truck behind it are moving at approximately constant velocity while waiting for the traffic light. In contrast, the detection result shows that the other two cars have passed the intersection and both are accelerating. This example demonstrates that acceleration can differentiate accelerating objects from those which are simply moving.
[Fig. 3.13 The acceleration fields of half a walking cycle: radial acceleration, tangential acceleration and velocity]
Fig. 3.14 The zoomed-in radial flow on the walking subject
[Fig. 3.15 The flow fields of Dumptruck: (a) optical flow, (b) acceleration]
3.5 Conclusion

Acceleration can be derived on the basis of Horn-Schunck; however, most real motion violates the global smoothness assumptions made in the original formulation. There is another way of approximating acceleration fields which is more accurate, handles most situations, and appears to improve on the Horn-Schunck-based technique on the standard Yosemite test sequence. The ability of this algorithm is also demonstrated by its capability to support radial and tangential acceleration analysis, providing a completely new way to understand and disambiguate motions in image sequences. Clearly, acceleration is likely to be more sensitive to noise, though the experiments show that this is not a severe limitation; in fact, the radial acceleration error estimates are encouragingly low.
Chapter 4
Jerk and High-Order Motion in Computer Images’ Streams
4.1 Overview

It is well known that velocity measures the change in position over time and acceleration the change of velocity; the term for the change of acceleration is Jerk (Schot, 1978; Eager et al., 2016). It is usually used to analyse chaotic dynamical systems (Eichhorn et al., 1998). Chapter 3 introduced algorithms that separate acceleration from complex motion; it is also worth investigating the higher orders of motion. Examples on synthetic images show that higher-order flow gives a different perspective from acceleration, and Jerk and Snap fields allow more possibilities for analysing complex motion fields in computer vision.
4.2 Jerk, Snap and Higher Order Motion in Kinematics

Acceleration is linked to a force acting on a mass by Newton's Second Law:

$$\vec{F} = m \vec{a} \qquad (4.1)$$
Hence, assuming constant mass, Jerk describes the change of force, and Snap describes the change of Jerk in kinematics. In calculus, Snap is the second derivative of acceleration with respect to time and the fourth derivative of position (Eager et al., 2016). Equation (4.2) describes the time evolution of the position $\vec{r}$:

$$\vec{s}(t) = \frac{d\vec{j}(t)}{dt} = \frac{d^2\vec{a}(t)}{dt^2} = \frac{d^3\vec{v}(t)}{dt^3} = \frac{d^4\vec{r}(t)}{dt^4} \qquad (4.2)$$
Fig. 4.1 The relationship between motion profiles in a straight linear motion (Thompson, 2011)
where $\vec{s}$ represents Snap, $\vec{j}$ denotes Jerk, and $\vec{a}$, $\vec{v}$, $\vec{r}$ and $t$ are acceleration, velocity, position and time respectively. The change of the nth-order flow under limited Snap is shown in Fig. 4.1, which gives a sense of the relationship between the orders. The conventional engineering application of Jerk and Snap is in motion control, since humans have limited tolerance to changes of force. Motion limitation is therefore necessary to stop users losing control during transportation; a Jerk of about 2.0 m/s³ in straight-line transport is acceptable for most people. The most common examples are in lift and vehicle design. Acceleration and Jerk are now widely used to analyse driving behaviour in intelligent driving evaluation: predicting potential risks and guaranteeing passengers' comfort in autonomous driving systems (Kalsoom et al., 2013; Meiring et al., 2015; Murphey et al., 2009). Bagdadi and Várhelyi (2013) found that the braking Jerk of vehicles measured by an accelerometer is highly related to accidents, and their Jerk-based evaluation system is 1.6 times better than longitudinal-acceleration methods. In road and track design, unbounded radial Jerk needs to be avoided on curved sections: the theoretically optimal strategy is to increase the radial acceleration linearly. Another application using acceleration and Jerk is the operation-path evaluation of numerically controlled machines (Bringmann et al., 2009). The square of the magnitude
of Jerk, integrated over time and termed the "Jerk cost", was measured to quantitatively analyse different movements of the human arm (Nagasaki, 1989). Caligiuri et al. (2006) used Jerk to monitor how drug-induced side effects affect patients' handwriting. More recently, detection algorithms for manoeuvring targets using radar have considered analysing Jerk (Kong et al., 2015; Zhang et al., 2017).
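As a simple numerical illustration of the chain of derivatives in Eq. (4.2) (an assumption-laden sketch, not anything from the cited applications), the snippet below estimates velocity, acceleration, Jerk and Snap from a sampled 1-D position signal by repeated finite differencing, which mirrors the way discrete flow fields are differenced later in this chapter.

```python
import numpy as np

dt = 0.01                                   # sampling interval in seconds
t = np.arange(0.0, 1.0, dt)
r = 0.5 * 9.81 * t**2 + 0.2 * t**4          # example position signal

v = np.gradient(r, dt)                      # velocity      ~ dr/dt
a = np.gradient(v, dt)                      # acceleration  ~ d^2 r/dt^2
j = np.gradient(a, dt)                      # Jerk          ~ d^3 r/dt^3
s = np.gradient(j, dt)                      # Snap          ~ d^4 r/dt^4

# For this position signal the analytic Snap is a constant (0.2 * 24 = 4.8),
# so the interior samples of s should sit close to that value.
print(s[10:-10].mean())
```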
4.3 Jerk and Snap Field Estimation

Sun et al. (2021) proposed a Jerk and Snap estimation algorithm following their Differential Acceleration: Jerk is computed by differencing neighbouring acceleration fields:

$$\hat{\mathbf{J}}(t) = \hat{\mathbf{A}}(t, t + \Delta t) - \hat{\mathbf{A}}(t - \Delta t, t) \qquad (4.3)$$
where $(t, t + \Delta t)$ denotes a field estimated from frame $t$ to frame $t + \Delta t$. The Jerk field is resolved into tangential and radial components as well; they are computed in the same manner as Eq. (4.3). The definitions of the tangential and radial components of Jerk in this book indicate that they measure the linear changes of the tangential and radial components of acceleration. In Differential Acceleration it is easy to refer the displacement to the middle frame, avoiding inconsistent start positions, since three points are required to compute acceleration. Estimating Snap involves an odd number of positions (five), which again makes it possible to use the middle point as the start. Therefore, Snap fields are computed in a similar manner to Differential Acceleration in Sect. 3.3:

$$\hat{\mathbf{S}}(t) = \hat{\mathbf{J}}(t, t + \Delta t) - \left[ -\hat{\mathbf{J}}(t, t - \Delta t) \right] \qquad (4.4)$$
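A minimal sketch of this differencing scheme is given below, assuming that acceleration fields have already been estimated (e.g. by Differential Acceleration) and are stored as H × W × 2 arrays. The exact sign and middle-frame reference conventions of Eqs. (4.3) and (4.4) are simplified here, and the names are illustrative assumptions rather than the author's released code.

```python
import numpy as np

def difference_field(later, earlier):
    """Next-order flow as the difference of two neighbouring lower-order fields."""
    return later - earlier

# Toy acceleration fields at four consecutive instants; random arrays stand in
# for real Differential Acceleration estimates.
h, w = 120, 160
acc = [np.random.randn(h, w, 2) for _ in range(4)]

jerk = [difference_field(acc[i + 1], acc[i]) for i in range(3)]   # in the spirit of Eq. (4.3)
snap = difference_field(jerk[1], jerk[0])                          # in the spirit of Eq. (4.4)
print(jerk[0].shape, snap.shape)
```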
Now that algorithms are available to provide the change of acceleration and of Jerk, in the next section they are applied to synthetic and real images to see whether they can reveal different motion features.
4.4 Applying Multi-order Flow Fields to Synthetic and Real Images

The advantages of synthetic images were discussed in Sect. 3.3.2. A set of images is generated to simulate the motion of a Newton's cradle, named the "synthetic cradle". Newton's cradle is a device consisting of a set of swinging spheres, originally used to demonstrate the conservation of momentum and energy. It is a good example for demonstrating the difference between the various flow fields because the different order components change from moment to moment, whereas the entire motion is rather simple.
[Fig. 4.2 Examples of the "synthetic cradle": (a)–(c) show the instants t = −5, 0 and 2]
In the image sequence, the highest point is considered the stationary point, at which t = 0. Swinging towards the lowest point is the positive direction and swinging back towards the stationary position is the negative direction. The change in inclination of the line suspending the swinging ball is computed by:

$$\Delta\theta = 2 \times t \qquad (4.5)$$
where $t \in \{-5, -4, \ldots, 4, 5\}$ and $\Delta\theta$ is the number of degrees the inclination increases between frames. Figure 4.2 gives examples for t = −5, 0 and 2. The nth-order motion flows for t = −1, 0, 1 are presented in Fig. 4.3. When t = −1 the sphere swings to the highest position and remains there for one frame (t = 0), then swings back at t = 1. The acceleration shows similar motion fields in all three frames, with the sphere accelerating towards the lower right with a similar magnitude. On the other hand, the Jerk and Snap flows give different perspectives. When t = −1, the Jerk and Snap flows are in the same direction as the acceleration and the magnitude of Snap is larger, meaning that the acceleration is increasing in the same direction. In the second row, the Jerk and Snap flows are notably reduced compared with the fields at t = −1. When t = 1, the directions of Jerk and Snap are opposite, which shows that the acceleration is actually decreasing. In addition to synthetic images, Jerk and Snap can be disambiguated from real motion fields as well. A few image sequences filmed by a high-speed camera from Middlebury are selected as the test data to reduce the error caused by low frame rates. In Figs. 4.4 and 4.5, the results show that the motion fields indeed change with increasing order. The constituent parts of Snap are too noisy due to the strong constraint (moving along the same arc for five frames), so only the resultant Snap is shown here. In "Beanbags", tangential Jerk has a larger magnitude than tangential acceleration on the left arm in frame 9, whereas there is barely any radial Jerk, which means the main change of acceleration is in magnitude rather than direction. The right hand shows a similar change.
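For completeness, a small sketch of how such a "synthetic cradle" sequence could be generated is shown below: the per-frame increments of Eq. (4.5) are accumulated into an absolute inclination and a suspended sphere is drawn at that angle. The pivot position, pendulum length, sign conventions and the use of OpenCV drawing primitives are all assumptions made for illustration.

```python
import numpy as np
import cv2  # OpenCV is assumed to be available for drawing

def cradle_frames(size=256, pivot=(128, 20), length=150):
    """Render a toy swinging sphere whose inclination changes by 2*t degrees per frame (Eq. 4.5)."""
    increments = [2 * t for t in range(-5, 6)]     # degrees added between consecutive frames
    angles = np.cumsum(increments)                  # absolute inclination of the suspending line
    frames = []
    for angle_deg in angles:
        frame = np.zeros((size, size, 3), dtype=np.uint8)
        angle = np.deg2rad(angle_deg)
        centre = (int(pivot[0] + length * np.sin(angle)),
                  int(pivot[1] + length * np.cos(angle)))
        cv2.line(frame, pivot, centre, (200, 200, 200), 2)   # the suspending line
        cv2.circle(frame, centre, 12, (255, 255, 255), -1)   # the sphere
        frames.append(frame)
    return frames

frames = cradle_frames()
print(len(frames), frames[0].shape)
```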
[Fig. 4.3 Flow fields of the synthetic cradle sequence: acceleration, Jerk and Snap flows]
[Fig. 4.4 The motion fields of "Beanbags" (frames 9 and 11): optical flow, tangential acceleration, radial acceleration, tangential Jerk, radial Jerk and Snap]
In frame 11, the motion is mostly focused on the right hand and the balls in the air: the large radial Jerk flow indicates that the acceleration direction of the right hand is changing. In the results for "DogDance", the flow fields of the different orders are roughly identical, except for the dog's leg in the radial Jerk and radial acceleration. Again, there is a large amount of optical flow, although it is hard to interpret due to the complexity of real motion. In kinematics, obtaining acceleration needs three points on the path, four are needed for Jerk and five for Snap. As the motion order increases, the computation involves more frames since the image sequence is a discrete signal. If the difference between frames is finite and relatively large, it potentially adds noise to the results. Therefore, the accuracy of the motion fields is largely a function of the frame rate and the motion intensity: as the frame interval tends to zero, the motion fields become most accurate.
4.5 Conclusion

In this chapter, the motion field is decomposed into its constituent parts. In particular, the notion of acceleration has been extended to the detection of Jerk and Snap (and their vector components). Analysis of test image sequences shows that the extensions have the power to discriminate the higher orders of acceleration successfully.
[Fig. 4.5 The motion fields of "DogDance" (frames 9 and 11): optical flow, tangential acceleration, radial acceleration, tangential Jerk and radial Jerk]
Clearly, they are ripe for further evaluation and application. The nature of high-order motion detection suggests that the techniques might be more susceptible to noise, which can be exacerbated when detecting high-order motion. This could especially be the case when analysing surveillance video, which is often a target of motion-based analysis. In such data there are often problems with low temporal, spatial and brightness resolution, and these can limit the results of a motion-based detection technique. Any development should therefore include consideration of the smoothing necessary to mitigate the limitations inherent in the original data.
Chapter 5
Detecting Heel Strikes for Gait Analysis Through Higher-Order Motion Flow
5.1 Overview

In gait analysis, heel strikes are an important and preliminary cue because the gait period, step and stride length can be derived accurately from the moment and position of heel strikes. A heel strike refers to the instant when the heel first contacts the ground during the stance phase of the walking cycle (Cunado et al., 2003). This chapter uses higher-order motion fields to detect when and where a heel strikes the floor. When the foot approaches the strike, its motion changes from forward motion to circular motion centred at the heel, so the amount of radial acceleration on the leading foot increases dramatically when the heel hits the floor. Based on this cue, the key frames can be determined from the quantity of acceleration flow within the ROI, and the positions can be found from the centres of rotation implied by the radial acceleration. Compared with other heel strike detection methods, the temporal template using acceleration only requires three consecutive frames for processing, so it also allows near real-time detection with only a single frame of delay. The ability of the higher-order motion method is tested on a number of databases recorded indoors and outdoors with multiple views and walking directions, to evaluate the detection rate under various environments. The results show the ability of high-order motion for both temporal detection and spatial positioning. The robustness of this approach to three anticipated types of noise in real CCTV footage is also evaluated in the experiments. The high-order motion flow detector is less sensitive to Gaussian white noise, whilst being effective for images with low resolution and incomplete body information, when compared with other techniques.
5.2 Detecting Heel Strikes Through Radial Acceleration

5.2.1 Gait Analysis

Gait analysis is the systematic study of human walking. It has mainly been applied in two fields: medical consultation for conditions which affect walking (Whittle, 2007) and human identification (Nixon et al., 2006). Clinical gait analysis usually uses physical data to analyse the walking pattern of patients for diagnosis and treatment. The data is collected by wearable or non-wearable sensors, such as accelerometers and treadmills. In human identification, gait is a behavioural biometric feature that can be obtained at a distance from the camera and is hard to hide or disguise. It is one of the most reliable biometric features in criminal investigation since it is less sensitive to low image quality than other biometric features. It has been demonstrated that gait can be used in criminal investigations either through the body itself (Larsen et al., 2008) or through gait measurements (Bouchrika et al., 2011). The approaches to analysing gait can be classified into three types based on the sensor modalities used to make the measurements: physical-sensor based, depth-image based and standard-image based. Physical-sensor-based techniques measure physical data extracted from gait, mostly kinetic parameters and underfoot forces/pressures (Han et al., 2006; Zeni et al., 2008). The physical sensors can be classified into wearable and non-wearable; the most popular wearable sensors are accelerometers and gyroscopes (Connor et al., 2018; Shull et al., 2014). Djurić-Jovičić et al. (2011) used accelerometers to measure the angles of the leg segments and the ankle. Rueterbories et al. (2010) used gyroscopes, which respond to the Coriolis force on a rotating element, to capture angular displacement and discriminate gait events. In the modality of underfoot force/pressure sensors, features of the Ground Reaction Forces (GRFs) are used for analysing gait, and these are commonly considered the gold standard for gait phase partitioning (Taborri et al., 2016). Derlatka (2013) used Dynamic Time Warping (DTW) to measure the stride difference and then k-Nearest Neighbour (k-NN) classification to identify people. Later, Derlatka and Bogdan partitioned the GRF stance into five sub-phases to achieve a higher classification rate (Derlatka et al., 2015). Depth (RGBD) image-based gait analysis techniques have expanded since the introduction and wide availability of PrimeSense and Kinect sensors. These measurements use the distance between the body parts and the sensor in depth images to analyse gait (O'Connor et al., 2007; Auvinet et al., 2015). Lu et al. (2014) built a gait database named ADSC-AWD based on Kinect data, and O'Connor measured the acceleration of the body using the Kinect. Standard image-based gait recognition has been extensively studied. Most approaches are targeted at the recognition of individual humans, using gait as the biometric signature. The general framework usually consists of background subtraction, feature extraction and classification (Wang et al., 2011). The approaches can be classified into two categories: model-based and model-free. Model-based
approaches have an intimate relationship with the human body and its motion. Świtoński et al. (2011) extracted the velocities and accelerations along the paths of the skeleton root element, feet, hands and head as the gait feature. Yam et al. (2002) presented an analytical gait model which extracted the angles of thigh and lower-leg rotation without parameter selection. Model-free approaches concentrate on the body shape or the motion of the entire gait process, and thus could also be used for the analysis of other moving shapes or mammals. Bobick and Davis (2001) employed the motion-energy and motion-history images of the silhouette; Han and Bhanu (2006) used the gait-energy image for recognition. Model-based methods are view-invariant and scale-invariant, but the computational cost is relatively high and the approaches can be very sensitive to image quality. Model-free approaches are less sensitive to image quality and have lower computational cost, though they are not intrinsically robust to variation in viewpoint and scale (Wang et al., 2011).
5.2.2 Heel Strikes Detection for Gait Analysis

Gait is periodic and most analysis methods rely on accurate gait period detection. The components of one gait cycle are shown in Fig. 5.1: a gait cycle is defined as the interval between two consecutive heel strikes of the same foot, and a heel strike refers to the moment when the heel first strikes the floor. Suppose one gait cycle starts from the heel strike of the right foot: the right foot rotates on the heel to touch the floor ('stance phase') and support the body while the left foot swings forward ('swing phase') until the left heel strikes the floor. Then the roles of the two feet switch: the left foot remains flat on the floor whilst the right foot swings forward. When the right heel strikes the floor again, a gait cycle is complete. Hence the accurate and efficient detection of heel strikes is essential because it partitions walking sequences into cycles composed of stance and swing phases (Zeni et al., 2008). In addition, the stride and step lengths can be derived from the stationary position of the heel at the moment of strike. Heel strikes also show an outstanding ability to disambiguate walking people from other moving objects. The striking foot is stationary for almost half of one gait cycle, as shown in Fig. 5.2. Therefore, previous standard image-based heel strike detection methods usually accumulate the gait sequences based on empirical analysis and find the areas where the desired features are most concentrated. Bouchrika and Nixon (2007b) extracted low-level features, corners, to estimate the strike positions. They used the Harris corner detector to find all the corners in each frame and obtained the corner proximity image with an accumulating proximity algorithm; the positions of the heel strikes on the ground lie around the densest areas of accumulated corners. Jung and Nixon (2013) used the movement of the head to detect the key frame (the frame in which the heel strike takes place). When a person is walking, the vertical positions of the head in the sequence follow an approximately sinusoidal pattern, as shown in Fig. 5.3a. When the heel strikes, the stride length is maximal, so the head is at its lowest point; when the feet are crossing, the head is at its highest point.
Fig. 5.1 The temporal components contained in a gait cycle and step and stride length during the cycle (Cunado et al., 2003)
Fig. 5.2 Foot model of walking (Bouchrika et al., 2006)
Similar to Bouchrika's method, Jung and Nixon accumulated the silhouettes of the whole sequence to find the positions where they remain the longest, as shown in Fig. 5.4.
5.2.3 The Acceleration Pattern on Gait

Torsos move like connected pendulums during walking, and researchers have successfully simulated pathological gait using a linear inverted pendulum model (Komura et al., 2005; Kajita et al., 2001). A pendulum has a regular acceleration pattern, which implies that gait can be described by the acceleration pattern of image-based data.
[Fig. 5.3 The relationship between the trajectory of the head and gait events: (a) the vertical positions of the head, left leg and right leg (Jung et al., 2013); (b) when the heel strikes (Shutler et al., 2004); (c) when the feet are crossing (Shutler et al., 2004)]
[Fig. 5.4 Silhouette accumulation map for a gait sequence (Jung et al., 2013)]
Figure 5.5 shows the acceleration fields of the body during (a) toe off, (b) heel strike and (c) heel rise. They reveal that the legs and feet appear to have more acceleration or deceleration than the other parts of the body during the different gait phases. Likewise, the forearms have acceleration since they behave like swinging pendulums. Therefore, the acceleration pattern of a walking body can be used to indicate the gait phases (Chaquet et al., 2013).
[Fig. 5.5 The radial acceleration flow on a walking person: (a) toe off, (b) heel strike, (c) heel rise]
At the instant of a heel strike, the foot hits the ground, which forces its velocity to cease in a short time. Therefore, the acceleration of the front foot increases dramatically, due to the disappearance of velocity (rapid deceleration). Also, the motion of the striking foot's sole is approximately circular, centred at the heel, during the period from the heel striking the ground to the foot fully touching the ground. Hence, most acceleration caused by heel strikes is radial in nature, and the video frames where the heel strikes can be located from the quantity of radial acceleration. When a person is walking, the motion of the body is similar to several joined pendulums (Cunado et al., 2003). Therefore, the radial acceleration caused by a heel strike might be confused with that caused by other limbs, since the motion of a pendulum incorporates radial acceleration. To reduce this interference, a Region of Interest (ROI) located on the leading foot according to a walking body model is extracted. The size of the ROI is 0.133H × 0.177H, where H represents the height of the subject.
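As an illustration of how such an ROI might be placed, the sketch below crops a region of the stated proportions (0.133H × 0.177H) around the leading foot, taking the subject's bounding box from a binary silhouette. Anchoring the box at the lower leading edge of the silhouette is an assumption about the body model, made only for illustration.

```python
import numpy as np

def leading_foot_roi(silhouette, walking_right=True):
    """Return (top, bottom, left, right) of a 0.133H x 0.177H box at the lower
    leading edge of a binary silhouette, where H is the subject's height in pixels."""
    ys, xs = np.nonzero(silhouette)
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    h = bottom - top + 1
    roi_h, roi_w = int(0.133 * h), int(0.177 * h)
    if walking_right:                       # leading foot assumed at the right edge
        x1, x0 = right, max(left, right - roi_w)
    else:                                   # leading foot assumed at the left edge
        x0, x1 = left, min(right, left + roi_w)
    y1, y0 = bottom, bottom - roi_h
    return y0, y1, x0, x1

# Toy silhouette: a filled rectangle standing in for a segmented walker.
sil = np.zeros((480, 640), dtype=bool)
sil[100:450, 200:320] = True
print(leading_foot_roi(sil))
```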
5.2.4 Strike Position Estimation and Verification

After obtaining the acceleration field A(t), an algorithm F decomposes the resultant acceleration field A(t) into radial and tangential acceleration, computing the tangential acceleration field T(t), the radial acceleration field R(t) and the map of radial acceleration rotation centres C(t):
$$F(\mathbf{A}(t)) = \begin{cases} \mathbf{T}(t) \\ \mathbf{R}(t) \\ \mathbf{C}(t) \end{cases} \qquad (5.1)$$
where A(t) is the acceleration field at time t. If all the detected radial acceleration vectors in the ROI are caused by the circular motion of the foot, then the rotation centres of these vectors should all be located at the heel. Consequently, the densest point of the rotation centre map indicates the strike position. There are numerous algorithms for determining the densest point; three methods are selected for the experiments: weighted sum, accumulated proximity (Bouchrika et al., 2007) and mean shift (Nixon et al., 2020). A weighted sum is a straightforward way to estimate where the rotation centres accumulate. The strike position is determined by:

$$h(t) = \frac{\sum_{i,j}^{m,n} w_{i,j} \times (i, j)}{\sum_{i,j}^{m,n} w_{i,j}} \qquad (5.2)$$
where the weight factor is determined by the density at point (i, j) in the centres map C(t):

$$w_{i,j} = \mathbf{C}(t)_{i,j} \qquad (5.3)$$
Bouchrika and Nixon (2007a) accumulated all the corners in a gait sequence into one image. A dense area corresponds to the strike location, since the striking foot is stationary for half a gait cycle. To estimate the dense area, they measured a density of proximity: the proximity value at a point (i, j) is determined by the number of corners within the surrounding region. Using the same method, the density of proximity of the rotation centres map C(t) is estimated here for evaluation. If $R_{i,j}$ is a sub-region of the ROI with centre (i, j), the proximity value $d_{i,j}$ is computed by:

$$\begin{cases} d_{i,j}^{\,r} = \dfrac{C_r}{r} \\[6pt] d_{i,j}^{\,n-1} = d_{i,j}^{\,n} + \dfrac{C_n}{n} \end{cases} \qquad (5.4)$$
where r is the radius of the sub-region $R_{i,j}$, which is around 20 pixels; $d_{i,j}^{\,n}$ is the proximity value for the ring that is n pixels away from the centre (i, j), and $C_n$ is the sum of rotation centres in the centre density map C(t) over the ring n pixels from the current centre (i, j). Equation (5.4) is applied to each point in the ROI to obtain the density of proximity of the rotation centres; the densest point in the density of proximity is where the heel strikes the ground. Mean shift (Nixon and Aguado, 2020) is a recursive algorithm that allows nonparametric mode-based clustering. It assigns data points to clusters iteratively by shifting points towards the highest density of data points, and it is used here to position the densest point in the rotation centre map. Since Sun's heel strike detection
algorithm only needs one mode in the ROI, the bandwidth needs to be wide; a value of 20 pixels is used in the experiments. The comparison experiments are presented in Sect. 5.4.
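To make the densest-point estimators concrete, the sketch below shows simple implementations of the weighted sum of Eq. (5.2) and a mean-shift-style mode seeker applied to a rotation-centre density map. The 20-pixel bandwidth follows the text, while the function names, convergence criterion and toy data are illustrative assumptions.

```python
import numpy as np

def weighted_sum_position(centre_map):
    """Eq. (5.2): density-weighted mean of the coordinates in the map."""
    ys, xs = np.indices(centre_map.shape)
    total = centre_map.sum()
    return (float((ys * centre_map).sum() / total),
            float((xs * centre_map).sum() / total))

def mean_shift_position(centre_map, bandwidth=20, iterations=50):
    """Shift a point towards the local density peak within the bandwidth."""
    pos = np.array(weighted_sum_position(centre_map))   # start from the global mean
    ys, xs = np.indices(centre_map.shape)
    for _ in range(iterations):
        mask = (ys - pos[0]) ** 2 + (xs - pos[1]) ** 2 <= bandwidth ** 2
        weights = centre_map * mask
        if weights.sum() == 0:
            break
        new_pos = np.array([(ys * weights).sum(), (xs * weights).sum()]) / weights.sum()
        if np.allclose(new_pos, pos):
            break
        pos = new_pos
    return tuple(pos)

# Toy rotation-centre map with a dense cluster near (40, 25) and one spurious centre.
cmap = np.zeros((60, 60))
cmap[38:43, 23:28] = 1.0
cmap[10, 50] = 0.5
print(weighted_sum_position(cmap), mean_shift_position(cmap))
```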
5.3 Gait Databases

The previous sections introduced the technique that uses radial acceleration to detect heel strikes for gait analysis. To show the ability of the algorithm, it is evaluated on several gait benchmark databases: the Large Gait Database (SOTON) (Shutler et al., 2004), the CASIA Gait Database (CASIA) (Wang et al., 2003; Yu et al., 2006; Zheng et al., 2011) and the OU-ISIR Gait Database (OU-ISIR) (Iwama et al., 2012; Takemura et al., 2018). The covariates and number of subjects of each database are summarised in Table 5.1.
5.3.1 The Large Gait Database

The Large Gait Database (SOTON) (Shutler et al., 2004) was built in 2002 by the University of Southampton. Shutler et al. (2004) collected walking sequences from over 100 subjects, both indoors (controlled lighting) and outdoors (uncontrolled lighting). Each subject had 8 sequences of approximately 1.5 gait periods. This database focused on colour and lighting quality: the researchers used two different types of camera and captured three scenarios, each from two different views, at a video frame rate of 25 fps. Since the database focused on the key factors that affect gait recognition other than background segmentation, chroma-keying was used for the indoor recordings. Chroma-keying segments a relatively narrow range of colours; green was used because it is far from skin tone and is less likely to be a component of clothing. This technique helps to remove the background from the subject in the images.

Table 5.1 The summary of gait data used in the experiments
Database   Number of subjects   Data covariates
SOTON      115                  2 views, 3 scenarios (indoor/outdoor track, treadmill)
CASIA-A    20                   3 views
CASIA-B    124                  11 views, clothing, baggage
OU-ISIR    10,307               14 views
5.3.2 CASIA Gait Database

The CASIA Gait Database Dataset A (CASIA-A) (Wang et al., 2003) and Dataset B (CASIA-B) (Yu et al., 2006; Zheng et al., 2011) were built by the Institute of Automation, Chinese Academy of Sciences. Before CASIA there were only a few databases designed for gait recognition, and most contained a small number of subjects and walking environments, which limited the progress of the field. The emergence of this database revealed the key factors of gait recognition and offered a baseline for new algorithms. CASIA-A contains 20 subjects walking in three directions (0°, 45°, 90°) with respect to the camera in an outdoor environment. Four sequences were filmed for each view per subject. These image sequences are 24-bit with resolution 351 × 240 at a frame rate of 25 fps, and each sequence contains 90 frames on average. The videos in CASIA-B were all recorded in an indoor environment. This dataset contains 124 subjects, 93 males and 31 females, in three different categories of data: normal walking, wearing coats and carrying bags. The subjects walked along a specified trajectory enclosed by 11 cameras at different angles.
5.3.3 The OU-ISIR Gait Database

The OU-ISIR Gait Database (OU-ISIR) (Iwama et al., 2012; Takemura et al., 2018) upgraded the scale of subjects in a gait database significantly. It contains 10,307 subjects (5114 males and 5193 females) in total, with ages ranging from 4 to 89 years (Takemura et al., 2018), twice as large as the earlier version of 2012 (Iwama et al., 2012). The huge amount of data significantly benefits the machine learning approaches to gait recognition that have emerged in recent years. Moreover, the diversity of the subjects' ages and genders leads to statistically reliable performance evaluation of gait algorithms.
5.4 Experimental Results

The heel strike detection method is evaluated on three benchmark databases: CASIA (Yu et al., 2009; Wang et al., 2004), SOTON (Shutler et al., 2004) and OU-ISIR (Takemura et al., 2018). The data used in the experiments was collected in various controlled environments. The experiments test around 100 heel strikes in each scenario, and the test data incorporates multiple viewing angles and walking directions with gait sequences recorded indoors and outdoors, as described in Table 5.2. The acceleration decomposition algorithm assumes that the subject moves in the image plane, so it is theoretically most effective when the walking direction is perpendicular to the camera.
Table 5.2 Dataset information

Database        Lighting control   Camera visual angle (°)   Number of subjects   Number of strikes   Frame size
CASIA-A (45°)   No                 45                        13                   96                  240 × 352
CASIA-A (90°)   No                 90                        25                   98                  240 × 352
CASIA-B         Yes                54                        15                   126                 240 × 320
SOTON           Yes                90                        21                   114                 576 × 720
OU-ISIR         Yes                ~75                       15                   120                 480 × 640
Therefore, gait data imaged at multiple views has been used to evaluate the robustness of the acceleration method to other view angles. The GT of key frames and heel strike positions is manually labelled multiple times by three different people. Figure 5.6 shows the variance of the manually labelled GT between different databases for key frames and strike positions. The variance in the key frame labelling is generally low and within one frame. Figure 5.6b shows greater variance on the SOTON dataset as it has the largest ROI compared with the other databases.
5.4.1 Key Frame Detection

The key frame (or moment) of a heel strike is detected according to the quantity of radial acceleration in the ROI, as described in Sect. 5.2.1. The histogram of radial acceleration within a walking sequence shows distinct indications of key frames, and these appear regularly and noticeably, reflecting the periodicity of gait. The framework of this heel strike detection system is illustrated in Fig. 5.7: (a) shows an example of a heel strike key frame, (b) is the area where the dense radial acceleration flow is counted, and (c) shows the histograms of radial acceleration within the ROI during a gait video. Noise has been effectively reduced by applying an empirical threshold. In Fig. 5.7c, the heel strikes clearly occur at frames 13, 27 and 41. There is considerable acceleration flow in frames 54 and 55 because the heel strike took place between them; this suggests that a higher frame rate could improve the accuracy of detection. Figure 5.8 gives the pseudo code of the system.
5.4.2 Heel Strike Position Verification

The ROI is extracted according to gait proportions, so it is not always perfectly located on the leading foot in the sequence, because the shape of the human body changes during a gait cycle.
[Fig. 5.6 GT labelling variance on different databases: (a) key frame, (b) heel strike position]
In addition, there is radial acceleration on other body parts, for example the calf, since the limbs' motion is that of several joined pendulums (Cunado et al., 2003). The rotation centres of these erroneous radial accelerations also form invalid strike position candidates. To reduce the effect of this error, the detected key frames are used to filter the heel strike position candidates. When the heel strikes between two frames, the acceleration quantities are used as weighting factors for deriving the positions. Figure 5.9a shows the detected candidates of heel strike positions in each frame, and Fig. 5.9b is the result after filtering by key frames. The expected periodicity of gait is evident in the result.
Fig. 5.7 An overview of key frame detection
# frame_1, frame_2, frame_3 are the previous, current and next frames
key_frames, strike_positions, rad_history = [], [], []
for t, frame in enumerate(video):
    vel_field_1 = DeepFlow(frame_2, frame_1)          # backward flow from the middle frame
    vel_field_2 = DeepFlow(frame_2, frame_3)          # forward flow from the middle frame
    acc_field = vel_field_1 + vel_field_2             # Differential Acceleration
    rad_field, centre_map = decomp_components(acc_field)
    ROI = extract_region(silhouette_2)                # box on the leading foot
    rotation_centre = density_accumulation(centre_map[ROI])
    # count radial acceleration vectors above an empirical magnitude threshold
    rad_amount = 0
    for each_pixel in ROI:
        if rad_field[each_pixel] > magnitude_thres:
            rad_amount += 1
    rad_history.append(rad_amount)
    if is_local_peak(rad_history):                    # heel strike key frame
        key_frames.append(t)
        strike_positions.append(rotation_centre)
return key_frames, strike_positions
Fig. 5.8 Heel strike detection system
5.4.3 Detection Performance

Bouchrika and Nixon (2007a) proposed a method that accumulates corners within a gait cycle using the Harris corner detector to determine the positions of heel strikes. Theoretically, dense corners should accumulate at the heel positions, since the heels stay at the strike positions for almost half of the gait cycle. The results of the radial acceleration detector are compared against the corner detection method since there are few heel strike detection methods based on standard image sequences with an implementation available. The performance is evaluated by the F-score:

$$F_\beta = \left(1 + \beta^2\right) \frac{p \times r}{r + \beta^2 p} \qquad (5.5)$$
where p stands for precision and r for recall. The F-score favours precision if β is small and recall if β is large (Sokolova et al., 2006). Let TP be the number of true positives, TN the true negatives, FP the false positives and FN the false negatives; p and r are computed by:

$$p = \frac{TP}{TP + FP}, \qquad r = \frac{TP}{TP + FN} \qquad (5.6)$$
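A direct transcription of Eqs. (5.5) and (5.6) is given below as a small sketch; the counts are placeholder values rather than results from the experiments reported here.

```python
def precision_recall(tp, fp, fn):
    """Eq. (5.6): precision and recall from raw counts."""
    return tp / (tp + fp), tp / (tp + fn)

def f_beta(p, r, beta=1.0):
    """Eq. (5.5): F-score weighting recall more heavily as beta grows."""
    return (1 + beta**2) * (p * r) / (r + beta**2 * p)

# Placeholder counts for illustration only.
p, r = precision_recall(tp=86, fp=14, fn=10)
print(round(f_beta(p, r, beta=1.0), 3))
```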
[Fig. 5.9 Heel strike verification process: (a) candidates for heel strikes, (b) detected heel strikes after filtering]
Figures 5.10 and 5.11 illustrate the comparative F1 scores of the acceleration and corner detectors. Since the F-score favours precision if β is small and recall if β is large, β is set to 1 here to balance the result evenly. The results differ from those in (Sun et al., 2018) because the background has been included to give a more realistic implementation scenario. The detection of the heel strike moments (the key frames) and of the positions are evaluated separately, since they are determined individually and describe different events in gait analysis. Since corner detection does not return key frames, an additional condition is applied: for a key frame to be counted as successfully detected, a corner position within ±30 pixels of the GT is considered a true positive. This condition is actually quite generous and leads to an optimistic estimate of the frame for corner detection. For the radial acceleration detector, the criterion for a true positive in Fig. 5.10 is whether the detected frames are within ±2 frames of the GT. For heel positioning with both methods, a distance within ±10 pixels (along both axes) of the GT is considered a true positive in Fig. 5.11.
[Fig. 5.10 F1 score of key frame detection: (a) radial acceleration, (b) Harris corner detector (Bouchrika et al., 2007a)]
[Fig. 5.11 F1 score of heel positioning: (a) radial acceleration, (b) Harris corner detector (Bouchrika et al., 2007a)]
Figure 5.10 illustrates that radial acceleration is able to detect key frames accurately when the camera is nearly perpendicular to the walking direction; the detection rate decreases as the angle between the camera and the walking subject increases. Acceleration is more sensitive to the view angle because the scale of the acceleration changes through the image sequence if the walking trajectory is not perpendicular to the camera. When the angle is large, the magnitude of the acceleration becomes extremely small far from the camera, which causes failures; the last row of Fig. 5.13 shows that misses usually occur when the subject is far from the camera. In Fig. 5.11 the radial acceleration detector provides more precise positioning results than the Harris corner detector for all camera views, especially on SOTON. The main reason is that the image size of SOTON is large, which produces an excessively large accumulation area for the Harris corner detector, so its precision decreases correspondingly. The Precision-Recall curves with respect to the magnitude of acceleration (varied from 0 to 4) and the density of corners (varied from 500 to 1600) are reported in Fig. 5.12. Since the strike positions are filtered by the key frames, they do not show significant sensitivity to the change of threshold; hence only the key frame detection algorithm is evaluated. Radial acceleration reaches a much higher recall rate and therefore has a larger area under the curve than the corner detector. The precision of radial acceleration is steady, staying around 86% as recall changes, while the PR curve shows that the corner detector is more sensitive to its density threshold.
Fig. 5.12 Precision-Recall curves of radial acceleration and Harris corner detector for the key frame detection
Fig. 5.13 Examples of detection results with various databases
Figure 5.13 shows samples of the detection results for the different databases. The radial acceleration detector can locate a precise position and frame of the heel strike when the angle between the camera and the walking subject is small. In CASIA-A45 (the last row in Fig. 5.13) the acceleration detector fails to detect several strikes when the subjects walk away from the camera, and the accuracy of localization also decreases. Positioning is critical for heel strike detection in gait analysis. In the above experiments, the weighted sum is used to estimate the strike positions. To improve performance, the density-of-proximity measure of Bouchrika and Nixon (2007a) and mean shift are also tested for determining the strike position. There is only one strike position in the ROI, so only one mode needs to be determined; the bandwidth is set to 20 pixels in the experiment.
Table 5.3 The results of strike position detection by advanced algorithms

Method                                           F1-score
Density of proximity (Bouchrika et al., 2007a)   0.72
Mean shift                                       0.95
Around 100 randomly selected strike frames from the SOTON database are tested. The condition for a correct prediction in Table 5.3 is that the estimated position is within ±3 pixels of the GT. Mean shift improves the precision of positioning significantly, considering that the condition for the initial results in Fig. 5.11a was within ±10 pixels. The large improvement in determining the strike position by the density of proximity (Bouchrika and Nixon, 2007b) applied to acceleration also shows that radial acceleration is a much better feature for detecting the heel strike than corners.
5.4.4 Robustness of Heel Strike Detection Approaches

Since the performance of a system under adverse imaging conditions is an important issue, the robustness of the heel strike detection technique is evaluated as well. Three different factors affecting image quality, which might reduce the detection rates, are applied to the original sequences: Gaussian zero-mean white noise, occlusion in the detection area, and reduced resolution. These factors reflect some of the difficulties anticipated when detecting heel strikes in real surveillance video. Figure 5.14 illustrates examples of the different types of noise at different levels, and Fig. 5.15 gives the results of testing the acceleration detector's robustness to these factors. Corner detection is also evaluated for comparison. The performance of the acceleration detector reduces slowly and smoothly as the variance of the Gaussian noise increases.
[Fig. 5.14 Examples of added noise and occlusion: (a) Gaussian white noise (1.5%), (b) occlusion in the ROI (40%)]
Corner detection is much more sensitive to increases in the variance of the Gaussian noise, as shown in Fig. 5.15a. The evaluation of robustness to occlusion investigates whether the gait information in real surveillance needs to be completely visible. Occlusion is simulated by progressively covering the ROI from the toe to the heel with a random texture. The performance under occlusion decreases steadily, and the radial acceleration detector fails when more than 30% of the ROI is covered. This is because most high-magnitude acceleration is located around the toe (the toe travels the greatest distance during a heel strike), but the toe is almost completely occluded once the occlusion of the ROI exceeds 30%. Detection by corners does not decrease significantly, since the area where most corners concentrate is the heel, which has not yet been occluded. Acceleration does outperform the corner method when the occlusion is slight, and if the occlusion proceeded from heel to toe, the radial acceleration detector could achieve much better robustness than the corner detector. Resolution reduction investigates whether the resolution of the subject in surveillance footage is sufficient. The original images are down-sampled, and the detection rates of both approaches decrease to a low level when the new pixels are equivalent to 5 × 5 patches in the original image (in which the height of the subject is then around 70 pixels, whereas it was originally 350 pixels). The acceleration and corner detectors show similar characteristics in this situation.
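For reproducibility, the sketch below shows one plausible way to apply the three degradations to a frame before re-running the detector. The exact noise level, occlusion fraction and down-sampling factor are illustrative values, and OpenCV is assumed to be available.

```python
import numpy as np
import cv2  # assumed available for resizing

def add_gaussian_noise(frame, variance=0.015):
    """Zero-mean Gaussian white noise; variance is relative to the [0, 1] intensity range."""
    noisy = frame.astype(np.float32) / 255.0
    noisy += np.random.normal(0.0, np.sqrt(variance), frame.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)

def occlude_roi(frame, roi, fraction=0.4):
    """Cover the front part of the ROI (toe side) with a random texture."""
    y0, y1, x0, x1 = roi
    cover_w = int((x1 - x0) * fraction)
    out = frame.copy()
    out[y0:y1, x1 - cover_w:x1] = np.random.randint(
        0, 256, (y1 - y0, cover_w) + frame.shape[2:], dtype=np.uint8)
    return out

def reduce_resolution(frame, factor=5):
    """Down-sample and up-sample so each new pixel covers a factor x factor patch."""
    small = cv2.resize(frame, (frame.shape[1] // factor, frame.shape[0] // factor),
                       interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)   # placeholder frame
degraded = reduce_resolution(occlude_roi(add_gaussian_noise(frame), (400, 450, 500, 560)))
print(degraded.shape)
```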
5.4.5 Detecting Heel Strikes Via Snap and Jerk

Following the detection system described above, Jerk and Snap are also applied to detect heel strikes. Figure 5.16 shows the normalized radial Jerk and Snap over a walking cycle sampled every 7 frames. There is a considerable amount of flow on the leading foot, appearing periodically when the heel strikes the floor, although Snap is relatively noisy. Since Jerk and Snap are higher orders of motion, their flow fields suggest comparatively intense motion. Logically, we ask whether Jerk and Snap can detect and localize the heel strikes, or improve on the performance of acceleration. Figure 5.17 reports the F1 score of detecting heel strikes through Jerk and Snap; the criteria are the same as for acceleration. As the resolution of all the CASIA data is insufficient and the detection results are very sensitive to the magnitude of the motion flow, these flows are only evaluated on SOTON and OU-ISIR. Jerk shows ability competitive with acceleration on key frame detection and slightly lower on positioning, a sign that Jerk can be adopted in gait analysis or other applications on real images. On the other hand, as the highest order of motion considered, Snap underperforms acceleration and Jerk, and the relatively small ROI in OU-ISIR further increases the difficulty (Fig. 5.18). The Precision-Recall curves suggest that Jerk has a large area under the curve, indicating that it achieves both high recall and high precision; it performs considerably better than Snap in the balance between precision and recall.
[Fig. 5.15 Performance analysis of heel strike detection: (a) robustness to Gaussian white noise, (b) robustness to occlusion, (c) robustness to resolution]
[Fig. 5.16 Noticeable patterns on a gait cycle: (a) Jerk patterns, (b) Snap patterns]
5.5 Discussion

In dynamics, a force acting on a mass causes acceleration, and acceleration changes the motion. Consequently, acceleration is a distinctive cue to the change of motion. Some previous physics-based gait analysis approaches used accelerometers and gyroscopes to measure the acceleration and angular velocity of the body parts in order to determine stance, swing and strike (Connor et al., 2018; Taborri et al., 2016). The same principle is applied here to standard image sequences to detect heel strikes via higher-order motion fields: when the heel approaches the strike, the foot has significant radial acceleration centred at the heel. Experimental results show that higher-order motion is a more powerful way of estimating the positions of the strike than previous standard image-based techniques.
[Fig. 5.17 F1 score of heel strike detection via Jerk and Snap: (a) key frame detection, (b) heel positioning]
[Fig. 5.18 Precision-Recall curves of Jerk and Snap for the key frame detection]
Also, the radial acceleration detector overcomes the problem of real-time detection, as only three frames are needed to estimate the acceleration flow. The evaluation of robustness to different types of noise suggests that acceleration is more robust to Gaussian noise than the previous approach. On the other hand, the main limitation of higher-order motion is its sensitivity to the viewing angle between the camera and the subject. When the camera is orthogonal to the walking direction, higher-order motion performs best since the measurements and decomposition algorithms operate in a 2D plane. The most realistic way of solving this problem is to apply the algorithm to a 3D volume, for example using Kinect
depth images, to replace the standard image sequence; however, the computational complexity would be much higher than for the existing technique. Another weakness of this approach is that it can only be applied to data with a static background and in which the subjects do not overlap, although this is a limitation shared with most existing techniques. Currently, background subtraction and silhouette extraction are essential pre-processing steps for most standard image-based approaches, and the results will be severely affected if the scene is too complex, for example when subjects overlap in a crowded scene. Hence refinement is still necessary before these techniques can be applied to poor-quality images, such as surveillance footage from the underground or videos recorded under adverse illumination.
5.6 Conclusions

This chapter demonstrates that high-order motion flow can be used to detect heel strikes. Cunado et al. (2003) proposed that the limbs appear to have pendulum-like motion in their gait model, and acceleration has been widely used in gait analysis techniques based on physical data, since the motion of a pendulum can easily be discriminated by its radial and tangential acceleration. The ability of the new heel strike detection technique has been compared with one of the few existing techniques. The results show that this new technique not only improves the precision significantly but also enables real-time detection. The experiments also investigate how the camera viewpoint affects performance, as the radial and tangential components are derived in a plane and so assume the subject walks perpendicular to the camera's viewing direction.
Chapter 6
More Potential Applications Via High-Order Motion
6.1 Scene Segmentation

There have been a few acceleration studies focusing on abnormal behaviour detection. Chen et al. (2015) applied acceleration flow to detect abnormal behaviours. Dong et al. (2016) fed optical flow and acceleration flow into a network, both in combination and individually, to detect violence in scenes; their evaluation illustrates that the most reliable feature is the acceleration flow on its own. When people fight, their bodies tend to have large acceleration (in many places and with large magnitudes) because their arms are swinging and their feet are kicking. As such, acceleration appears well suited to the detection of rapid changes, consistent with scenes of violence. In scenes which do not contain episodes of violence there is only a little acceleration; in comparison, there is more optical flow, or velocity flow, associated with more leisurely movement. Thus, the acceleration field might underpin an approach suited to the detection of violent crime in the future.
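As a purely illustrative sketch of this idea (not an implementation from the cited works), a per-frame "acceleration energy" score could be thresholded as a crude cue for rapid, possibly violent, motion; the threshold, field shapes and function names are assumptions.

```python
import numpy as np

def acceleration_energy(acc_field):
    """Mean squared magnitude of an H x W x 2 acceleration field."""
    return float((acc_field ** 2).sum(axis=-1).mean())

def flag_rapid_motion(acc_fields, threshold=0.5):
    """Return the indices of frames whose acceleration energy exceeds the threshold."""
    return [i for i, a in enumerate(acc_fields) if acceleration_energy(a) > threshold]

# Toy sequence: mostly calm frames with one burst of large acceleration.
fields = [np.random.randn(60, 80, 2) * 0.1 for _ in range(10)]
fields[6] = np.random.randn(60, 80, 2) * 2.0
print(flag_rapid_motion(fields))
```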
6.2 Gait Analysis
Accelerometers and gyroscopes have been widely used for gait analysis (Connor & Ross, 2018; Shull et al., 2014), since the movement caused by each gait event is distinctive. Chapter 5 showed that radial acceleration and Jerk appear periodically on the leading foot at the moment of heel strike. Figure 6.1 exhibits the normalized acceleration and Jerk magnitude over a whole walking cycle, sampled every 5 frames. The acceleration and Jerk on each part of the subject differ across the temporal phases of the walking cycle. This potential new work bridges the gap between analysing gait through motion data from physical sensors and analysing it from computer images. Some research on gait analysis has considered velocity and acceleration based on physical data (Taborri et al., 2016),
Fig. 6.1 The acceleration patterns (a) and Jerk patterns (b) over one gait cycle
hence we hypothesise that the multiple orders of flow on each part of the body, extracted from computer images, can discriminate the different phases of walking. Moreover, acceleration can also be used for segmentation: there is rarely substantial acceleration on the upper body, so the legs can presumably be separated from the rest of the body on the basis of this observation.
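A simple sketch of this segmentation idea follows; it assumes that the acceleration magnitude over the legs is markedly larger than over the trunk, and the smoothing and Otsu threshold are illustrative choices rather than a validated procedure.

import cv2
import numpy as np

def leg_mask_from_acceleration(acc, blur_ksize=9):
    """Rough lower-body mask from an H x W x 2 acceleration field.

    The per-pixel magnitude is smoothed, rescaled to 8 bits and thresholded with
    Otsu's method; pixels with large acceleration (typically the legs and feet
    during walking) come out as foreground (255), the rest as background (0).
    """
    magnitude = np.linalg.norm(acc, axis=2).astype(np.float32)
    magnitude = cv2.GaussianBlur(magnitude, (blur_ksize, blur_ksize), 0)
    mag8 = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(mag8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask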
References
Auvinet, E., Multon, F., Aubin, C. E., Meunier, J., & Raison, M. (2015). Detection of gait cycles in treadmill walking using a Kinect. Gait Posture, 41(2), 722–725.
Bagdadi, O., & Várhelyi, A. (2013). Development of a method for detecting jerks in safety critical events. Accident Analysis & Prevention, 50, 83–91.
Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., & Szeliski, R. (2011). A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1), 1–31.
Barron, J. L., Fleet, D. J., & Beauchemin, S. S. (1994). Performance of optical flow techniques. International Journal of Computer Vision, 12(1), 43–77.
Black, M. J., & Anandan, P. (1996). The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1), 75–104.
Bobick, A. F., & Davis, J. W. (2001). The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 257–267.
Bouchrika, I., Goffredo, M., Carter, J., & Nixon, M. (2011). On using gait in forensic biometrics. Journal of Forensic Sciences, 56(4), 882–889.
Bouchrika, I., & Nixon, M. (2006). Markerless feature extraction for gait analysis. In 5th Conference on Advances in Cybernetic Systems (pp. 55–60).
Bouchrika, I., & Nixon, M. (2007a). Gait-based pedestrian detection for automated surveillance. In International Conference on Computer Vision Systems.
Bouchrika, I., & Nixon, M. (2007b). Model-based feature extraction for gait analysis and recognition. In Computer Vision/Computer Graphics Collaboration Techniques (pp. 150–160).
Bringmann, B., & Maglie, P. (2009). A method for direct evaluation of the dynamic 3D path accuracy of NC machine tools. CIRP Annals—Manufacturing Technology, 58(1), 343–346.
Brox, T., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. Springer.
Butler, D. J., Wulff, J., Stanley, G. B., & Black, M. J. (2012). A naturalistic open source movie for optical flow evaluation. Lecture Notes in Computer Science, 7577(6), 611–625.
Caligiuri, M. P., Teulings, H. L., Filoteo, J. V., Song, D., & Lohr, J. B. (2006). Quantitative measurement of handwriting in the assessment of drug-induced parkinsonism. Human Movement Science, 25(4–5), 510–522.
Chaquet, J. M., Carmona, E. J., & Fernández-Caballero, A. (2013). A survey of video datasets for human action and activity recognition. Computer Vision and Image Understanding, 117(6), 633–659.
Chen, C., Shao, Y., & Bi, X. (2015). Detection of anomalous crowd behavior based on the acceleration feature. IEEE Sensors Journal, 15(12), 7252–7261.
Collins, R. T., Gross, R., & Shi, J. (2002). Silhouette-based human identification from body shape and gait. In Proceedings—5th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 366–371).
Connor, P., & Ross, A. (2018). Biometric recognition by gait: A survey of modalities and features. Computer Vision and Image Understanding, 167, 1–27.
Cunado, D., Nixon, M., & Carter, J. N. (2003). Automatic extraction and description of human gait models for recognition purposes. Computer Vision and Image Understanding, 90(1), 1–41.
Datta, A., Shah, M., & Da Vitoria Lobo, N. (2002). Person-on-person violence detection in video data. Proceedings of the International Conference on Pattern Recognition, 16(1), 433–438.
Derlatka, M. (2013). Modified kNN algorithm for improved recognition accuracy of biometrics system based on gait. Lecture Notes in Computer Science, 8104, 59–66.
Derlatka, M., & Bogdan, M. (2015). Ensemble kNN classifiers for human gait recognition based on ground reaction forces. In 8th International Conference on Human System Interaction (HSI) (pp. 88–93).
Djurić-Jovičić, M. D., Jovičić, N. S., & Popović, D. B. (2011). Kinematics of gait: New method for angle estimation based on accelerometers. Sensors, 11(11), 10571–10585.
Dong, Z., Qin, J., & Wang, Y. (2016). Multi-stream deep networks for person to person violence detection in videos. Communications in Computer and Information Science, 662, 517–531.
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758–2766).
Eager, D., Pendrill, A. M., & Reistad, N. (2016). Beyond velocity and acceleration: Jerk, snap and higher derivatives. European Journal of Physics, 37(6).
Eichhorn, R., Linz, S. J., & Hänggi, P. (1998). Transformations of nonlinear dynamical systems to jerky motion and its application to minimal chaotic flows. Physical Review E—Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 58(6), 7151–7164.
Farnebäck, G. (2003). Two-frame motion estimation based on polynomial expansion. Lecture Notes in Computer Science, 2749(1), 363–370.
Fortun, D., Bouthemy, P., & Kervrann, C. (2015). Optical flow modeling and computation: A survey. Computer Vision and Image Understanding, 134, 1–21.
Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11), 1231–1237.
Gordon, G., & Milman, E. (2006). Learning optical flow. Image (Rochester, N.Y.) (pp. 83–97).
Han, J., & Bhanu, B. (2006). Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(2), 316–322.
Horn, B. K. B., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17(1–3), 185–203.
Iwama, H., Okumura, M., Makihara, Y., & Yagi, Y. (2012). The OU-ISIR gait database comprising the large population dataset and performance evaluation of gait recognition. IEEE Transactions on Information Forensics and Security, 7(5), 1511–1521.
Jung, S. U., & Nixon, M. (2013). Heel strike detection based on human walking movement for surveillance analysis. Pattern Recognition Letters, 34(8), 895–902.
Kajita, S., Matsumoto, O., & Saigo, M. (2001). Real-time 3D walking pattern generation for a biped robot with telescopic legs. In Proceedings 2001 ICRA, IEEE International Conference on Robotics and Automation (Vol. 3, pp. 2299–2306).
Kalsoom, R., & Halim, Z. (2013). Clustering the driving features based on data streams. In INMIC (pp. 89–94).
Komura, T., Nagano, A., Leung, H., & Shinagawa, Y. (2005). Simulating pathological gait using the enhanced linear inverted pendulum model. IEEE Transactions on Biomedical Engineering, 52(9), 1502–1513.
Kong, L., Li, X., Cui, G., Yi, W., & Yang, Y. (2015). Coherent integration algorithm for a maneuvering target with high-order range migration. IEEE Transactions on Signal Processing, 63(17), 4474–4486.
Larsen, P. K., Simonsen, E. B., & Lynnerup, N. (2008). Gait analysis in forensic medicine. Journal of Forensic Sciences, 53(5), 1149–1153.
Liu, C., Yuen, J., Torralba, A., Sivic, J., & Freeman, W. T. (2008). SIFT flow: Dense correspondence across different scenes. Lecture Notes in Computer Science (Vol. 5304, pp. 28–42).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. Imaging, 130, 674–679.
Lu, J., Wang, G., & Moulin, P. (2014). Human identity and gender recognition from gait sequences with arbitrary walking directions. IEEE Transactions on Information Forensics and Security, 9(1), 51–61.
Meiring, G. A. M., & Myburgh, H. C. (2015). A review of intelligent driving style analysis systems and related artificial intelligence algorithms. Sensors (Switzerland), 15(12), 30653–30682.
Murphey, Y. L., Milton, R., & Kiliaris, L. (2009). Driver's style classification using jerk analysis. In 2009 IEEE Workshop on Computational Intelligence in Vehicles and Vehicular Systems, CIVVS 2009—Proceedings (pp. 23–28).
Nagasaki, H. (1989). Asymmetric velocity and acceleration profiles of human arm movements. Experimental Brain Research, 74(2), 319–326.
Nievas, E. B., Suarez, O. D., Garcia, G. B., & Sukthankar, R. (2011). Violence detection in video using computer vision techniques. In International Conference on Computer Analysis of Images and Patterns (pp. 332–339).
Nir, T., Bruckstein, A. M., & Kimmel, R. (2008). Over-parameterized variational optical flow. International Journal of Computer Vision, 76(2), 205–216.
Nixon, M., & Aguado, A. (2020). Feature extraction and image processing for computer vision (4th ed.). Waltham, MA: Academic Press.
Nixon, M., Tan, T., & Chellappa, R. (2005). Human identification based on gait. Springer.
O'Connor, C. M., Thorpe, S. K., O'Malley, M. J., & Vaughan, C. L. (2007). Automatic detection of gait events using kinematic data. Gait Posture, 25(3), 469–474.
Rueterbories, J., Spaich, E. G., Larsen, B., & Andersen, O. K. (2010). Methods for gait event detection and analysis in ambulatory systems. Medical Engineering & Physics, 32(6), 545–552.
Schot, S. H. (1978). Jerk: The time rate of change of acceleration. American Journal of Physics, 46(11), 1090–1094.
Saunier, N., & Sayed, T. (2006). A feature-based tracking algorithm for vehicles in intersections. In Third Canadian Conference on Computer and Robot Vision.
Shoaib, M., Bosch, S., Incel, O. D., Scholten, H., & Havinga, P. J. M. (2016). Complex human activity recognition using smartphone and wrist-worn motion sensors. Sensors, 16(4), 426.
Shull, P. B., Jirattigalachote, W., Hunt, M. A., Cutkosky, M. R., & Delp, S. L. (2014). Quantified self and human movement: A review on the clinical impact of wearable sensing and feedback for gait analysis and intervention. Gait Posture, 40(1), 11–19.
Shutler, J. D., Grant, M. G., Nixon, M., & Carter, J. N. (2004). On a large sequence-based human gait database. In Applications and Science in Soft Computing (pp. 339–346).
Sokolova, M., Japkowicz, N., & Szpakowicz, S. (2006). Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In AI 2006: Advances in Artificial Intelligence (pp. 1015–1021).
Sun, D. (2013). From pixels to layers: Joint motion estimation and segmentation. Providence, RI, USA: Brown University.
Sun, D., Roth, S., & Black, M. J. (2010). Secrets of optical flow estimation and their principles. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2432–2439).
Sun, D., Roth, S., & Black, M. J. (2014). A quantitative analysis of current practices in optical flow estimation and the principles behind them. International Journal of Computer Vision, 106(2), 115–137.
Sun, Y., Hare, J., & Nixon, M. (2017). Analysing acceleration for motion analysis. In International Conference on Signal Image Technology & Internet Based Systems, Jaipur, India (pp. 289–295).
Sun, Y., Hare, J., & Nixon, M. (2018). Detecting heel strikes for gait analysis through acceleration analysis. IET Computer Vision, 12(5), 686–692.
Sun, Y., Hare, J., & Nixon, M. (2021). On parameterizing higher-order motion for behaviour recognition. Pattern Recognition, 112.
Sun, Y., Hare, J. S., & Nixon, M. (2016). Detecting acceleration for gait and crime scene analysis. In 7th International Conference on Imaging for Crime Detection and Prevention (pp. 1–6).
Świtoński, A., Polański, A., & Wojciechowski, K. (2011). Human identification based on gait paths. In Lecture Notes in Computer Science (Vol. 6915, pp. 531–542). Springer.
Taborri, J., Palermo, E., Rossi, S., & Cappa, P. (2016). Gait partitioning methods: A systematic review. Sensors (Switzerland), 16(1).
Takemura, N., Makihara, Y., Muramatsu, D., Echigo, T., & Yagi, Y. (2018). Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition. IPSJ Transactions on Computer Vision and Applications, 10(1), 4.
Teed, Z., & Deng, J. (2020). RAFT: Recurrent all-pairs field transforms for optical flow. In ECCV 2020, Lecture Notes in Computer Science (Vol. 12347).
Thompson, P. M. (2011). Snap, crackle and pop. In Proceedings of AIAA Southern California Aerospace Systems and Technology Conference.
Trobin, W., Pock, T., Cremers, D., & Bischof, H. (2008). An unbiased second-order prior for high-accuracy motion estimation. Lecture Notes in Computer Science (Vol. 5096, pp. 396–405).
Wang, L., Ning, H., Tan, T., & Hu, W. (2004). Fusion of static and dynamic body biometrics for gait recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(2), 149–158.
Wang, J., She, M., Nahavandi, S., & Kouzani, A. (2010). A review of vision-based gait recognition methods for human identification. In 2010 International Conference on Digital Image Computing: Techniques and Applications (pp. 320–327).
Wang, L., Tan, T., Ning, H., & Hu, W. (2003). Silhouette analysis-based gait recognition for human identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12), 1505–1518.
Wedel, A., Pock, T., Zach, C., Bischof, H., & Cremers, D. (2009). An improved algorithm for TV-L1 optical flow. In Statistical and Geometrical Approaches to Visual Motion Analysis (Vol. 5604, pp. 23–45). Springer.
Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013). DeepFlow: Large displacement optical flow with deep matching. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1385–1392).
Whittle, M. W. (2007). Normal ranges for gait parameters (4th ed., pp. 223–224). Elsevier.
Xu, L., Jia, J., & Matsushita, Y. (2012). Motion detail preserving optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1744–1757.
Yaakob, R., Aryanfar, A., Halin, A. A., & Sulaiman, N. (2013). A comparison of different block matching algorithms for motion estimation. Procedia Technology, 11, 199–205.
Yam, C., Nixon, M., & Carter, J. (2002). Gait recognition by walking and running: A model-based approach. In Asian Conference on Computer Vision (pp. 1–6).
Yu, S., Tan, D., & Tan, T. (2006). A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. Proceedings—International Conference on Pattern Recognition, 4, 441–444.
Yu, S., Tan, T., Huang, K., Jia, K., & Wu, X. (2009). A study on gait-based gender classification. IEEE Transactions on Image Processing, 18(8), 1905–1910.
Zaki, M., Sayed, T., & Shaaban, K. (2014). Use of drivers' Jerk profiles in computer vision-based traffic safety evaluations. Transportation Research Record: Journal of the Transportation Research Board, 2434, 103–112.
Zeni, J. A., Richards, J. G., & Higginson, J. S. (2008). Two simple methods for determining gait events during treadmill and overground walking using kinematic data. Gait Posture, 27(4), 710–714.
Zhang, J., Su, T., Zheng, J., & He, X. (2017). Novel fast coherent detection algorithm for radar maneuvering target with Jerk motion. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(5), 1792–1803.
Zheng, S., Zhang, J., Huang, K., He, R., & Tan, T. (2011). Robust view transformation model for gait recognition. In Proceedings—International Conference on Image Processing, ICIP (pp. 2073–2076).
Zimmer, H., Bruhn, A., Weickert, J., Valgaerts, L., Salgado, A., Rosenhahn, B., & Seidel, H. (2009). Complementary optic flow. In International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (pp. 207–220). Springer.