To compute the KL distance on those values, we normalize $p_i$ and $q_i$ as follows:

$$\hat{p}_i = \frac{p_i}{\sum_{j=1}^{n} p_j}, \qquad \hat{q}_i = \frac{q_i}{\sum_{j=1}^{n} q_j}$$

The KL distance is defined as

$$d_{kl} = d(\hat{q}, \hat{p}) = \sum_{i=1}^{n} \hat{q}_i \log \frac{\hat{q}_i}{\hat{p}_i}$$
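The normalization and KL distance can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `kl_distance` and the small constant `eps` (which guards against taking the logarithm of zero) are assumptions that do not appear in the paper.

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """KL distance d(q_hat, p_hat) between normalized likelihood values.

    p, q: non-negative likelihood values p_i and q_i at the n sample points.
    eps:  hypothetical guard against log(0); not part of the paper.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that each set of values sums to 1.
    p_hat = p / p.sum()
    q_hat = q / q.sum()
    # Sum of q_hat_i * log(q_hat_i / p_hat_i) over all sample points.
    return float(np.sum(q_hat * np.log((q_hat + eps) / (p_hat + eps))))
```

Because both inputs are normalized first, rescaling either `p` or `q` by a constant leaves the result unchanged.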
How to select S can be critical. To maximize the difference between the $p_i$'s and $q_i$'s, it is best to use all the points in I; however, the computational cost can be prohibitive. Instead, by sampling points from I, we typically get equivalent results as long as the sampling process is reasonable. We sample points uniformly along pathlength values. In practice, when we choose 100 points, randomly spaced at 1% segments of pathlength, the results are equivalent to using all the data points.
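One way to read this sampling scheme is as stratified sampling: split the pathlength range into 100 equal segments and draw one random point from each. The sketch below follows that reading. The function name `sample_by_pathlength`, its parameters, and the decision to skip empty segments are assumptions, and the paper may implement the sampling differently.

```python
import numpy as np

def sample_by_pathlength(pathlength, n_segments=100, rng=None):
    """Pick one random point index from each segment of the pathlength range.

    pathlength: per-point pathlength values for the points in I.
    Returns the indices of the chosen points (at most n_segments of them).
    """
    rng = np.random.default_rng(rng)
    pathlength = np.asarray(pathlength, dtype=float)
    lo, hi = pathlength.min(), pathlength.max()
    span = hi - lo
    if span == 0:
        # All points share one pathlength value: a single segment.
        seg = np.zeros(pathlength.shape, dtype=int)
    else:
        # Map each point to a segment index in [0, n_segments - 1].
        seg = ((pathlength - lo) / span * n_segments).astype(int)
        seg = np.minimum(seg, n_segments - 1)
    chosen = []
    for s in range(n_segments):
        idx = np.flatnonzero(seg == s)
        if idx.size:  # skip segments that contain no points
            chosen.append(rng.choice(idx))
    return np.array(chosen, dtype=int)
```

With the default `n_segments=100`, each segment covers 1% of the pathlength range, so a region with points in every segment yields 100 samples.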
By examining the KL distance, we can measure how different two distributions are. However, because the $p_i$ and $q_i$ are normalized, this can be problematic. For instance, when $\hat{f}_M(x)$ and $\hat{f}_I(x)$ are uniform distributions over different ranges, all the $p_i$'s are very low and all the $q_i$'s are very high. Although the two distributions are quite different, after normalization $\hat{p}_i$ and $\hat{q}_i$ form almost identical distributions, and $d_{kl}$ is approximately, and misleadingly, 0.
To overcome this limitation, we introduce an additional distance measure that represents the quantitative difference between the $p_i$'s and $q_i$'s:

$$d_r = 1 - \frac{\bar{p}}{\bar{q}} \qquad (3)$$

where $\bar{p} = (\sum_i p_i)/n$ and $\bar{q} = (\sum_i q_i)/n$.
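The failure case described above, and how $d_r$ compensates for it, can be checked numerically. The sketch below assumes that Eq. (3) reads $d_r = 1 - \bar{p}/\bar{q}$ and uses made-up constant likelihood values (0.001 and 0.02) purely for illustration. The name `ratio_distance` is hypothetical.

```python
import numpy as np

def ratio_distance(p, q):
    """d_r = 1 - mean(p) / mean(q), following our reading of Eq. (3)."""
    return 1.0 - float(np.mean(p)) / float(np.mean(q))

# Illustrative values only: all p_i very low, all q_i very high.
p = np.full(100, 0.001)
q = np.full(100, 0.02)

# After normalization both become (nearly) the same uniform distribution,
# so the KL distance collapses to approximately 0.
p_hat, q_hat = p / p.sum(), q / q.sum()
d_kl = float(np.sum(q_hat * np.log(q_hat / p_hat)))

# The ratio-based distance still reports the large gap in magnitude.
d_r = ratio_distance(p, q)
```

Here `d_kl` is numerically zero while `d_r` is 0.95, which is why the two measures are used together.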
3.2 Robust Distance Measure
Human appearance in video streams varies over time.
In outdoor scenes, lighting, human pose variation and
carried objects may lead to changes in the foreground
region. To cope with such variations we employ a
robust estimation norm that adjusts the weighting of
points within the distance metric based on whether
points are inliers or outliers.
For the robust estimation, we employ the general M-estimator of (Huber, 1977), which minimizes the objective function

$$\sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - x_i^T b) \qquad (4)$$

where the $x_i$'s are independent variables, the $y_i$'s are data points, $b$ is a coefficient vector, $\rho$ is the robust loss function (its derivative $\rho'$ is commonly called the influence function), and $n$ is the number of data points.
If we define the weight function $\omega(e) = \rho'(e)/e$ and let $\omega_i = \omega(e_i)$, then minimizing (4) requires solving

$$\sum_{i=1}^{n} \omega_i (y_i - x_i^T b)\, x_i^T = 0 \qquad (5)$$
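As a brief sketch of why (5) follows from (4): differentiating the objective with respect to $b$ and setting the result to zero gives

$$\frac{\partial}{\partial b}\sum_{i=1}^{n}\rho(y_i - x_i^T b) = -\sum_{i=1}^{n}\rho'(e_i)\,x_i^T = 0 .$$

Substituting $\rho'(e_i) = \omega_i e_i$ with $e_i = y_i - x_i^T b$ yields $\sum_{i} \omega_i (y_i - x_i^T b)\, x_i^T = 0$, which is (5). Because the weights $\omega_i$ themselves depend on the residuals, this is a weighted least-squares problem that generally has to be solved iteratively.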
In our approach, we define a new feature, $\delta_i$, using $p_i$ and $q_i$ for each sample point $s_i$:

$$\delta_i = \frac{|q_i - p_i|}{\max(p_i, q_i)}$$
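For illustration, the feature can be computed for all sample points at once. This sketch assumes the numerator is the absolute difference $|q_i - p_i|$, so that $\delta_i$ lies in $[0, 1]$. The name `delta_features` and the guard `eps` against a zero denominator are assumptions, not part of the paper.

```python
import numpy as np

def delta_features(p, q, eps=1e-12):
    """delta_i = |q_i - p_i| / max(p_i, q_i) for every sample point."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # eps is a hypothetical guard for points where both values are 0.
    denom = np.maximum(np.maximum(p, q), eps)
    return np.abs(q - p) / denom
```

For example, `delta_features([0.2, 0.5], [0.2, 0.1])` gives 0 for the first point, where the values agree, and 0.8 for the second.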
When the current instance is correctly matched to a model, most $p_i$'s are similar to the $q_i$'s, leading the $\delta_i$'s to be close to 0. On the other hand, when the instance and model are mismatched, most $\delta_i$'s will be greater than 0. The mean of the $\delta_i$'s represents how well the current instance is matched to the model. We apply the robust fitting (5) to compute the robust mean of the $\delta_i$'s, $\mu$; it can be written as

$$\sum_{i=1}^{n} \omega_i (\delta_i - \mu) = 0$$
Notice that the weights are designed to minimize the influence of outliers. In other words, the weight of each data point depends on how far the point is from the mean: points near the estimated mean get high weights, and points far from the mean get smaller weights.
We used the iteratively reweighted least squares (IRLS) method with the bisquare weight function to solve the equation and obtain a robust mean, as in (Coleman et al., 1980) and (Fox, 2002).
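For a single location parameter, the estimating equation $\sum_i \omega_i(\delta_i - \mu) = 0$ rearranges to $\mu = \sum_i \omega_i \delta_i / \sum_i \omega_i$, which suggests a simple fixed-point iteration. The sketch below is one possible IRLS loop under assumptions the paper does not state: the tuning constant `c = 4.685` and the scale estimate MAD/0.6745 are common defaults in the robust regression literature, and the median is used as the starting value. The function names are hypothetical.

```python
import numpy as np

def _bisquare_weights(delta, mu, c):
    """Bisquare weights for the residuals delta - mu."""
    e = delta - mu
    # Robust scale from the median absolute deviation (assumed choice).
    s = np.median(np.abs(e)) / 0.6745
    if s == 0.0:
        # Degenerate case: at least half the points equal mu exactly.
        return (e == 0.0).astype(float)
    u = e / (c * s)
    # Weight (1 - u^2)^2 inside the cutoff, 0 outside it.
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

def robust_mean(delta, c=4.685, max_iter=50, tol=1e-8):
    """Robust mean of delta by IRLS; returns (mu, final weights)."""
    delta = np.asarray(delta, dtype=float)
    mu = float(np.median(delta))  # robust starting point (assumption)
    for _ in range(max_iter):
        w = _bisquare_weights(delta, mu, c)
        new_mu = float(np.sum(w * delta) / np.sum(w))
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    # Recompute weights at the final estimate so they match mu.
    return mu, _bisquare_weights(delta, mu, c)
```

On data such as `[0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.06, 0.9, 0.95]`, the two large values receive zero weight and the estimate stays close to the bulk of the points, whereas the ordinary mean would be pulled toward the outliers.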
The final weights at the last iteration, after the estimated mean converges, are examined to find inliers. Only data points with a weight greater than a certain threshold value are regarded as inliers. The two distances, $d'_r$ and $d'_{kl}$, are recomputed using only the inliers. Fig. 2 shows examples of outliers and inliers as determined by the robust fitting method for a sample region that has been manually altered by changing its color.
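The inlier step can be sketched as a mask over the final weights followed by a recomputation of both distances. The paper does not give the threshold value, so `threshold=0.5` below is a placeholder. The function name is hypothetical, $d'_r$ is computed under the reading $1 - \bar{p}/\bar{q}$ of Eq. (3), and the sketch assumes strictly positive `p` and `q` and at least one inlier.

```python
import numpy as np

def inlier_distances(p, q, weights, threshold=0.5):
    """Recompute d_r and d_kl using only points whose weight exceeds threshold.

    threshold is a placeholder; the paper does not state the value used.
    Returns (d_r_prime, d_kl_prime, boolean inlier mask).
    """
    p, q, weights = (np.asarray(a, dtype=float) for a in (p, q, weights))
    inliers = weights > threshold
    p_in, q_in = p[inliers], q[inliers]
    # Ratio-based distance on the inliers.
    d_r = 1.0 - p_in.mean() / q_in.mean()
    # KL distance on the re-normalized inlier values.
    p_hat, q_hat = p_in / p_in.sum(), q_in / q_in.sum()
    d_kl = float(np.sum(q_hat * np.log(q_hat / p_hat)))
    return float(d_r), d_kl, inliers
```

In practice the `weights` argument would be the final IRLS weights from the robust mean computation.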
4 SPATIAL ANALYSIS
Sometimes it is possible to improve the accuracy of the models in the gallery and the matching performance by utilizing the relative order of participants. We do this as follows.
For each model, $M_i$, we compute an adjacency matrix, $F_i$, that captures the frequency of spatial ordering among models. The adjacency matrix $F_i$ is $m \times n$, where $n$ is the number of models and $m$ indexes relative positions. For example, if N is the
SIGMAP 2007  International Conference on Signal Processing and Multimedia Applications