3.2 The Canonical Motions
The central step in our algorithm is the approximation of the optical
flow field by a combination of three canonical camera motion flows p, t
and z. These fields are the distinctive optical flows that would ideally
result from imaging a static scene while the camera performs a pan, tilt,
or zoom motion at a specific speed.
The canonical pan flow p, by definition, has p(u) = (1, 0) at every
point u of the image's domain. Similarly, the canonical tilt flow t and
zoom flow z are defined, respectively, as t(u) = (0, 1) and z(u) = 2u
for all u (see figure 2). We use an image coordinate system whose origin
is at the center of each frame, with the x axis pointing left and the
y axis pointing up. The unit of measurement is such that the image
domain D is the rectangle [−0.500, +0.500] × [−0.375, +0.375].
Figure 2: The canonical camera motion flows: pan p(u), tilt t(u),
zoom z(u), sampled on a 7×5 grid of points.
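The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of a grid that includes the border of D are assumptions, since the text does not say how the 7×5 sample points of figure 2 are placed.

```python
# Canonical flows of Section 3.2, evaluated on a regular sample grid.
# Image domain D = [-0.5, +0.5] x [-0.375, +0.375], origin at the frame center.

def pan(u):
    """Canonical pan flow: p(u) = (1, 0) at every point."""
    return (1.0, 0.0)

def tilt(u):
    """Canonical tilt flow: t(u) = (0, 1) at every point."""
    return (0.0, 1.0)

def zoom(u):
    """Canonical zoom flow: z(u) = 2u."""
    x, y = u
    return (2.0 * x, 2.0 * y)

def sample_grid(nx=7, ny=5, half_w=0.5, half_h=0.375):
    """Return nx*ny points covering D, border included (an assumption)."""
    xs = [-half_w + 2.0 * half_w * i / (nx - 1) for i in range(nx)]
    ys = [-half_h + 2.0 * half_h * j / (ny - 1) for j in range(ny)]
    return [(x, y) for y in ys for x in xs]

points = sample_grid()
# Show the three flows at one corner, at the center, and at the opposite corner.
for u in (points[0], points[17], points[-1]):
    print(u, pan(u), tilt(u), zoom(u))
```

Note that the zoom flow vanishes at the center of the frame and grows linearly toward the border, whereas the pan and tilt flows are constant over D.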
The canonical pan flow corresponds to a rotation of the camera around
the local vertical axis that causes the aim point to sweep horizontally
from right to left, just fast enough to completely replace the field of
view from one frame to the next, assuming a fairly small angular field
of view.
3.3 Analyzing the Optical Flow
The next step in our algorithm is to approximate the optical flow f
between the two given frames by a linear combination $\tilde{f}$ of the
canonical flows, namely

$$\tilde{f}(u) = P\,p(u) + T\,t(u) + Z\,z(u) \tag{2}$$

for every u ∈ D.
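As a small illustration of equation (2), the model flow can be evaluated directly from the three coefficients. The function name `model_flow` is hypothetical, and the coefficient values in the example are arbitrary.

```python
def model_flow(P, T, Z, u):
    """Evaluate P*p(u) + T*t(u) + Z*z(u) with p = (1, 0), t = (0, 1), z = 2u."""
    x, y = u
    return (P + 2.0 * Z * x, T + 2.0 * Z * y)

# Arbitrary example coefficients, evaluated at the corner (0.5, 0.375) of D.
corner = model_flow(0.25, -0.5, 0.5, (0.5, 0.375))
center = model_flow(0.25, -0.5, 0.5, (0.0, 0.0))
print(corner, center)
```

At the center of the frame the zoom term vanishes, so the model flow there is simply (P, T).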
The coefficients P, T and Z, to be determined, indicate the amount of
pan, tilt and zoom, respectively, that seems to have occurred between
two consecutive frames. Note that a negative value for a coefficient
means that the apparent motion is opposite to the corresponding
canonical movement (that is, a pan to the right, a tilt up, or a zoom
out, respectively).
We compute the coefficients P, T, Z by a straightforward weighted
least-squares procedure. For that purpose, we define the scalar product
of two flows a and b, with a weight function w, as

$$\langle a \mid b \rangle = \frac{\int_D w(u)\, a(u) \cdot b(u)\, du}{\int_D w(u)\, du} \tag{3}$$
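The discrete version of this formula, assuming that the images are
sampled at points $u_1, u_2, \ldots, u_n$ (with $a_i = a(u_i)$ and
$b_i = b(u_i)$), is

$$\langle\!\langle a \mid b \rangle\!\rangle = \frac{\sum_{i=1}^{n} w_i\, a_i \cdot b_i}{\sum_{i=1}^{n} w_i} \tag{4}$$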
As usual, we also define the norm of a (sampled) flow f as
$\|f\| = \sqrt{\langle\!\langle f \mid f \rangle\!\rangle}$. Formulas (3)
and (4) obviously satisfy the definitions of scalar product and norm, as
long as the weights $w_i$ are all positive.
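The discrete scalar product and the norm translate almost literally into code. The sketch below assumes that a sampled flow is stored as a list of (x, y) pairs and that the weights are a list of positive numbers; the function names are illustrative.

```python
import math

def scalar_product(a, b, w):
    """Weighted scalar product of two sampled flows, as in formula (4)."""
    num = sum(wi * (ai[0] * bi[0] + ai[1] * bi[1])
              for wi, ai, bi in zip(w, a, b))
    return num / sum(w)

def norm(a, w):
    """Norm induced by the weighted scalar product."""
    return math.sqrt(scalar_product(a, a, w))

# Two tiny flows sampled at two points, with weights 1 and 3.
a = [(1.0, 0.0), (0.0, 2.0)]
b = [(3.0, 4.0), (5.0, 6.0)]
w = [1.0, 3.0]
print(scalar_product(a, b, w))  # (1*3 + 3*12) / 4 = 9.75
print(norm(a, w))               # sqrt((1*1 + 3*4) / 4) = sqrt(3.25)
```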
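We seek P, T and Z that minimize the discrepancy between the given
flow f and the ideal flow $\tilde{f}$ of equation (2). The discrepancy
is the flow $d = f - \tilde{f}$, and its overall magnitude can be
measured by the square error
$Q(P, T, Z) = \|d\|^2 = \langle\!\langle f - \tilde{f} \mid f - \tilde{f} \rangle\!\rangle$.
As in standard least-squares fitting, the values of P, T, Z that
minimize Q are found by solving the system of linear equations

$$
\begin{pmatrix}
\langle\!\langle p \mid p \rangle\!\rangle & \langle\!\langle p \mid t \rangle\!\rangle & \langle\!\langle p \mid z \rangle\!\rangle \\
\langle\!\langle t \mid p \rangle\!\rangle & \langle\!\langle t \mid t \rangle\!\rangle & \langle\!\langle t \mid z \rangle\!\rangle \\
\langle\!\langle z \mid p \rangle\!\rangle & \langle\!\langle z \mid t \rangle\!\rangle & \langle\!\langle z \mid z \rangle\!\rangle
\end{pmatrix}
\begin{pmatrix} P \\ T \\ Z \end{pmatrix}
=
\begin{pmatrix}
\langle\!\langle f \mid p \rangle\!\rangle \\
\langle\!\langle f \mid t \rangle\!\rangle \\
\langle\!\langle f \mid z \rangle\!\rangle
\end{pmatrix}
\tag{5}
$$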
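A minimal sketch of how system (5) might be assembled and solved is given below. It is not the authors' implementation: the helper names are hypothetical, the 3×3 system is solved by Cramer's rule purely for brevity, and the 7×5 border-inclusive sample grid is an assumption.

```python
def inner(a, b, w):
    """Discrete weighted scalar product of two sampled flows (formula 4)."""
    num = sum(wi * (ai[0] * bi[0] + ai[1] * bi[1])
              for wi, ai, bi in zip(w, a, b))
    return num / sum(w)

def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_ptz(points, flow, weights):
    """Return (P, T, Z) solving the normal equations (5)."""
    p = [(1.0, 0.0)] * len(points)               # canonical pan
    t = [(0.0, 1.0)] * len(points)               # canonical tilt
    z = [(2.0 * x, 2.0 * y) for x, y in points]  # canonical zoom
    basis = (p, t, z)
    gram = [[inner(a, b, weights) for b in basis] for a in basis]
    rhs = [inner(flow, a, weights) for a in basis]
    d = det3(gram)
    if abs(d) < 1e-12:
        raise ValueError("canonical flows are linearly dependent on these samples")
    solution = []
    for k in range(3):
        m = [row[:] for row in gram]   # copy, then replace column k by rhs
        for r in range(3):
            m[r][k] = rhs[r]
        solution.append(det3(m) / d)
    return tuple(solution)

# Synthetic check: build a flow from known coefficients and recover them.
xs = [-0.5 + i / 6.0 for i in range(7)]
ys = [-0.375 + 0.1875 * j for j in range(5)]
points = [(x, y) for y in ys for x in xs]
true_P, true_T, true_Z = 0.25, -0.5, 0.5
flow = [(true_P + 2.0 * true_Z * x, true_T + 2.0 * true_Z * y)
        for x, y in points]
weights = [1.0] * len(points)
P, T, Z = fit_ptz(points, flow, weights)
print(P, T, Z)
```

For a flow that is an exact combination of the canonical flows, the fit recovers the coefficients up to rounding error. For larger or ill-conditioned problems a library least-squares solver would be a more robust choice than Cramer's rule.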
3.4 Weight Adjustment for Vectors
The least-squares method (5) works well if the scene is stationary.
Moving objects change the optical flow, and therefore introduce errors
in the fitted parameters P, T, and Z, as the least-squares procedure
yields some average of the camera-induced and object-induced flows.
This is not a significant problem if the moving objects cover a small
fraction of the image and/or their speed is small compared to the camera
motion flow. However, if the scene contains fast-moving objects, their
flow may easily dominate the fitted flow $\tilde{f}$.
In order to alleviate this problem, we define the weights $w_i$ as the
reliability weights $\omega_i$ provided by the optical tracking
procedure, divided by the length of the corresponding flow vectors
$f_i$, that is

$$w_i = \frac{\omega_i}{\sqrt{|f_i|^2 + \varepsilon^2}} \tag{6}$$

where ε is a small constant bias, introduced to avoid division by zero
or by very small numbers.
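Formula (6) can be sketched as follows. The function name and the default value of `eps` are assumptions made for illustration, since the text does not give a value for ε.

```python
import math

def adjusted_weights(flow, reliability, eps=0.01):
    """Weights of formula (6): reliability divided by the biased vector length.

    eps is a hypothetical default; the source only says it is small.
    """
    return [om / math.sqrt(fx * fx + fy * fy + eps * eps)
            for (fx, fy), om in zip(flow, reliability)]

# One zero-length vector and one vector of length 0.5, equally reliable.
flow = [(0.0, 0.0), (0.3, 0.4)]
reliability = [1.0, 1.0]
w = adjusted_weights(flow, reliability)
print(w)
```

With equal reliabilities, the zero-length vector receives a weight close to 1/ε, while the longer vector receives a weight close to the reciprocal of its length.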
Note that this formula increases the relative weight of small flow
vectors, while reducing that of large vectors. The justification for
this correction is that small flow vectors are indeed more significant,
statistically, than large ones. If the sampled optical flow f contains
a significant number of very small vectors mixed with some large ones,
the explanation is that the camera is stationary, and the set K of
points with small vectors is part of the background.
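The effect of the weight adjustment can be illustrated on a synthetic case: a stationary camera with a moving object near the center of the frame. Everything in this sketch is an assumption made for the sake of the example (the grid, the object's size and speed, unit reliabilities, ε = 0.01, and the helper names); it is not taken from the authors' experiments.

```python
import math

def inner(a, b, w):
    """Discrete weighted scalar product of two sampled flows (formula 4)."""
    num = sum(wi * (ai[0] * bi[0] + ai[1] * bi[1])
              for wi, ai, bi in zip(w, a, b))
    return num / sum(w)

def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_ptz(points, flow, weights):
    """Return (P, T, Z) solving the normal equations (5) by Cramer's rule."""
    p = [(1.0, 0.0)] * len(points)
    t = [(0.0, 1.0)] * len(points)
    z = [(2.0 * x, 2.0 * y) for x, y in points]
    basis = (p, t, z)
    gram = [[inner(a, b, weights) for b in basis] for a in basis]
    rhs = [inner(flow, a, weights) for a in basis]
    d = det3(gram)
    solution = []
    for k in range(3):
        m = [row[:] for row in gram]
        for r in range(3):
            m[r][k] = rhs[r]
        solution.append(det3(m) / d)
    return tuple(solution)

xs = [-0.5 + i / 6.0 for i in range(7)]
ys = [-0.375 + 0.1875 * j for j in range(5)]
points = [(x, y) for y in ys for x in xs]

# Stationary camera: background flow is zero; a centered object
# covering 9 of the 35 samples moves horizontally at speed 0.2.
flow = [(0.2, 0.0) if abs(x) < 0.2 and abs(y) < 0.2 else (0.0, 0.0)
        for x, y in points]
omega = [1.0] * len(points)   # reliability weights, all equal here
eps = 0.01                    # hypothetical bias value
adjusted = [om / math.sqrt(fx * fx + fy * fy + eps * eps)
            for om, (fx, fy) in zip(omega, flow)]

P_uniform = fit_ptz(points, flow, omega)[0]
P_adjusted = fit_ptz(points, flow, adjusted)[0]
print(P_uniform, P_adjusted)
```

In this setup the uniform-weight fit reports a spurious pan of roughly 0.05, while the adjusted weights bring it well under 0.01, which is closer to the true (zero) camera motion.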