Lan and Huttenlocher, 2004). These all represent examples of high-level motion models, since the motion represents a change in configuration rather than the movement of individual parts.
The motion of tracked feature points has been used in the analysis and recognition of quadruped gait (Gibson et al., 2003). In this approach tracking errors are overcome by first splitting the foreground object into quadrants and then analysing the average motion of each quadrant. This approach relies on accurately locating the centre of the foreground object, which makes it sensitive to outliers.
Gait has been detected successfully using approaches such as spatio-temporal features (Schuldt et al., 2004), symmetry cues (Havasi et al., 2007) and probabilistic models learnt from sparse motion features (Song et al., 2001). However, these approaches establish only that gait is present; they do not estimate its phase.
In this work we present a system that exploits the low-level motion of a sparse set of feature points extracted using the Kanade-Lucas-Tomasi (KLT) feature tracker (Shi and Tomasi, 1994). The feature points track both the foreground and the background of the image, so segmentation must be carried out. The feature points also contain tracking errors that are not Gaussian in nature but systematic, caused for example by edge effects. This is particularly apparent during self-occlusion, e.g. as one leg passes another.
We initially learn motion models that represent the expected trajectories for each of the main limbs. Given a set of feature points we use our models to solve two problems simultaneously. The first problem is that of labelling the feature points as belonging to the background or foreground; if a feature is classified as a foreground point, it is also assigned to the limb whose motion model best matches the feature's motion. The second problem is to estimate the phase that the limb must be in to have produced the observed motion.
Once all feature points have been classified we then integrate over all the points and estimate the most likely gait phase for each frame, ensuring that only smooth transitions are allowed between frames. This is achieved without making assumptions about the location of any of the features; each trajectory is classified depending only on its motion, not its position.
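One way to picture the smoothness constraint is as a dynamic-programming search over per-frame phase scores. The sketch below is illustrative only and is not taken from the paper: it assumes that a per-frame log-likelihood for each of the m phases has already been obtained by combining the classified feature points, and that a "smooth" transition means the phase either stays the same or advances by at most `max_step` positions, wrapping around at the end of the cycle. The function name `smooth_phases` and the parameter `max_step` are hypothetical.

```python
import numpy as np

def smooth_phases(frame_loglik, max_step=1):
    """Most likely phase per frame under a restricted-transition assumption.

    frame_loglik: array of shape (T, m), log-likelihood of each of the m
    phases in each of T frames (assumed to be given).
    max_step: assumed limit on how far the phase may advance per frame.
    """
    ll = np.asarray(frame_loglik, dtype=float)
    T, m = ll.shape
    score = ll[0].copy()                  # best score ending in each phase
    back = np.zeros((T, m), dtype=int)    # back-pointers for the best path
    for t in range(1, T):
        new_score = np.empty(m)
        for j in range(m):
            # Allowed predecessors: same phase, or up to max_step earlier (cyclic).
            prev = [(j - s) % m for s in range(max_step + 1)]
            best = max(prev, key=lambda p: score[p])
            back[t, j] = best
            new_score[j] = score[best] + ll[t, j]
        score = new_score
    # Trace the best path backwards from the best final phase.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With this transition rule, a frame whose individual best phase would require a jump of two or more positions is pulled back onto a path that advances one position at a time.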
2 LEARNING
Our objective is to learn a statistical model for each
of the main limbs that represents how we would ex-
pect a point located at that limb to move through time.
To create a motion model we use a representation similar to that of Coughlan et al. (2000), except that we make our representation dependent on orientation; we assume that people walk upright. A different motion model is learnt for each limb and is represented as a chain of $m$ vectors, where each vector represents the mean displacement we would expect to observe between frames. Each vector also defines the centre of a Gaussian that represents the variation in motion we expect. This model can be defined by the parameters $\Theta = (R, \Sigma)$, where $R = \{R_1, \dots, R_m\}$ and $\Sigma = \{\Sigma_1, \dots, \Sigma_m\}$. $R_j$ is the average motion belonging to the $j$th point in the chain and $\Sigma_j$ is the corresponding covariance matrix. This representation is illustrated in Figure 1.
Figure 1: Chain used to represent a motion trajectory, with links $R_1(r_1, \theta_1)$ to $R_4(r_4, \theta_4)$. $r$ is the magnitude of the vector; $\theta$ is the angle relative to the horizontal; $\Sigma$ is the covariance matrix.
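The chain representation can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' implementation: displacements are assumed to be stored as 2D Cartesian vectors (dx, dy) rather than as (r, θ) pairs, and the names `ChainModel`, `polar_to_xy`, `log_likelihood` and `most_likely_phase` are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

def polar_to_xy(r, theta):
    """Convert a link given as magnitude r and angle theta (radians,
    relative to the horizontal) into a Cartesian displacement."""
    return np.array([r * np.cos(theta), r * np.sin(theta)])

@dataclass
class ChainModel:
    means: np.ndarray  # shape (m, 2): mean displacement R_j for each link
    covs: np.ndarray   # shape (m, 2, 2): covariance Sigma_j for each link

    def log_likelihood(self, j, v):
        """Log density of an observed displacement v under the 2D Gaussian
        centred on R_j with covariance Sigma_j."""
        d = np.asarray(v, dtype=float) - self.means[j]
        cov = self.covs[j]
        maha = d @ np.linalg.solve(cov, d)   # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (maha + logdet + 2.0 * np.log(2.0 * np.pi))

    def most_likely_phase(self, v):
        """Index j of the link that best explains a single displacement v."""
        scores = [self.log_likelihood(j, v) for j in range(len(self.means))]
        return int(np.argmax(scores))
```

Storing Cartesian displacements keeps the Gaussian evaluation simple; a model specified in the (r, θ) form of Figure 1 can be converted with `polar_to_xy` first.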
To learn a model for each limb, suppose we have a set of example gait cycles $\{g^1, \dots, g^n\}$, where each gait cycle consists of $m$ temporally ordered vectors $\{v_1, \dots, v_m\}$. We want to learn a model $\Theta^{\max}$ that maximises

$$P(g^1, \dots, g^n \mid \Theta) = \prod_{i=1}^{n} \prod_{j=1}^{m} p(g^i_j \mid \Theta_j) \tag{1}$$

where $g^i_j$ denotes the $j$th vector of the $i$th gait cycle.
This is a maximisation over all the training examples for every position in the model. We see that equation (1) can be maximised by solving for each $\Theta_j$ independently,

$$\Theta^{\max}_j = \arg\max_{\Theta_j} \prod_{i=1}^{n} p(g^i_j \mid \Theta_j) \tag{2}$$
This is the Maximum Likelihood estimate for $\Theta_j$ and can be calculated directly from the training examples. Each position $j$ in the model can be seen as representing a different gait phase.
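For a Gaussian at each chain position, the maximum likelihood estimate has a closed form: the sample mean of the training vectors at that position, and their sample covariance normalised by n. The sketch below illustrates this under the assumption that the training cycles have already been brought to a common length m and stored as 2D displacements. The function name `fit_chain` and the optional `reg` term (a small diagonal regulariser to keep covariances invertible) are additions for illustration and do not come from the paper.

```python
import numpy as np

def fit_chain(cycles, reg=0.0):
    """Per-position maximum likelihood Gaussian fit.

    cycles: array-like of shape (n, m, 2), n gait cycles of m displacement
    vectors each (assumed already resampled to equal length).
    reg: assumed optional diagonal term added to every covariance.
    Returns (means, covs) with shapes (m, 2) and (m, 2, 2).
    """
    g = np.asarray(cycles, dtype=float)
    n, m, d = g.shape
    means = g.mean(axis=0)                  # R_j: mean over the n examples
    centred = g - means                     # deviations from R_j
    # Sigma_j: average outer product of deviations (ML estimate, divides by n).
    covs = np.einsum('nmi,nmj->mij', centred, centred) / n
    covs = covs + reg * np.eye(d)
    return means, covs
```

Because each position is fitted independently, the work is a single pass over the training data, which mirrors the factorisation in equation (2).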
However, our ground truth data consists of coarsely hand-labelled x and y positions of the main limbs throughout the duration of a video clip. To use the method described above we need examples of individual gait cycles, and we need all gait cycles to have the same temporal length.
The data can be cut into individual gait cycles by using a reliable heuristic, for example the turning point in the data that corresponds to when the toes are at their maximum height. To make each gait cycle the same temporal length, the average length is first calculated. A cubic spline is then fitted to each individual