NEW DYNAMIC ESTIMATION OF DEPTH FROM FOCUS
IN ACTIVE VISION SYSTEMS
Data Acquisition, LPV Observer Design, Analysis and Test
Tiago Gaspar and Paulo Oliveira
Institute for Systems and Robotics, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal
Keywords:
Depth estimation, Depth from focus, LPV observers.
Abstract:
In this paper, new methodologies for the estimation of the depth of a generic moving target with unknown dimensions, based upon depth from focus strategies, are proposed. A set of measurements, extracted from real time images acquired with a single pan and tilt camera, is used. These measurements are obtained resorting to the minimization of a new functional, deeply rooted in optical characteristics of the lens system, and combined with additional information extracted from images to provide estimates for the depth of the target. This integration is performed by a Linear Parameter Varying (LPV) observer, whose synthesis and analysis are also detailed. To assess the performance of the proposed system, a series of indoor experimental tests, with a real target mounted on a robotic platform, for a range of operation of up to ten meters, were carried out. A centimetric accuracy was obtained under realistic conditions.
1 INTRODUCTION
Depth estimation plays a key role in a wide variety of domains, such as target tracking (Bar-Shalom et al., 2001), 3D reconstruction (Bertelli et al., 2008), obstacle detection (Discant et al., 2007), and video surveillance (Haritaoglu et al., 2000). In 3D image applications, a common approach consists in using triangulation methods applied to the data collected by two or more cameras. However, there has been work on estimating depth resorting to a single camera, see (Krotkov, 1987) and (Ens and Lawrence, 1993). In addition to the main advantage of requiring just one camera, this technique reduces the impact of the image to image matching problem, as well as the impact of occlusion problems, see (Schechner and Kiryati, 1998). The idea is to explore the relation between the depth of a point in the 3D world and the amount of blur that affects its projection into acquired images. This is done by modelling the influence that some of the camera intrinsic parameters have on images acquired with a small depth of field. Based upon this principle, there are three main strategies that have been explored: depth from blur by focusing, see (Viet et al., 2003) and (Pentland, 1987); depth from blur by zooming, see (Asada et al., 2001); and depth from blur by irising, see (Ens and Lawrence, 1993).
In this paper, we are mainly concerned with depth estimation from blur by focusing. Two different techniques based upon this approach can be found in the literature: depth from defocus, see (Pentland, 1987) and (Ens and Lawrence, 1993), and depth from focus, see (Krotkov, 1987), (Nayar and Nakagawa, 1994), and (Viet et al., 2003). This work is based on the latter method, since this type of approach does not require a mathematical model for the blurring process of the camera, i.e. the point spread function (PSF) responsible for the blurring does not need to be modeled. This is not possible in depth from defocus strategies, where it is common to consider that this function is either a two-dimensional Gaussian or a circle of constant intensity. Moreover, the amount of blur present in an image is a consequence of both the characteristics of the lens and the scene itself, which restricts the applicability of depth from defocus methods to step discontinuities in the scene. There are strategies that tackle this problem by using a minimum of two images of the same scene, acquired with a different depth of field (Pentland, 1987). Since the contribution of the scene to all images is the same, it can be removed. However, measuring the amount of blur with high precision is still a difficult problem, as it is an ill-posed inverse problem.
In this paper, two novelties are proposed: a new algorithm for the estimation of the depth of a target with unknown dimensions is developed, and a strategy
to estimate these dimensions is also described. The depth estimation problem is tackled by combining information present on the target boundary, namely the amount of blur that corrupts this region, with measurements of the dimensions of the projection of the target into acquired images. The dynamics of the depth of the target is written as a function of a parameter that depends on the dimensions of the image of the target, which leads to an LPV observer for the depth of moving targets with unknown dimensions. As for the dimensions of the real target, they are estimated resorting to the depth estimates provided by the observer and to the measurements of the dimensions of the image of the target.
This document is organized as follows: in section 2, some background on the theory of defocus is provided, and in section 3, a new method to estimate the camera focus value that minimizes the amount of blur in an image discontinuity is presented. In section 4, the design and analysis of the proposed LPV observer are detailed, and in section 5, experimental results illustrating the performance of the described depth estimation algorithms are provided. Finally, section 6 summarizes the main conclusions of this work and unveils challenging problems for the future.
2 BACKGROUND ON THEORY OF DEFOCUS
There are two traditional approaches to model the image formation process: one uses geometrical optics and the other physical optics. The first is an approximation that disregards behaviours specifically attributed to the wave nature of light, such as interference and diffraction, and relies on ray tracing. The great simplicity of this approach compensates for its inaccuracies. On the contrary, the second relies on diffraction theory, and its results are exact. In this work, only geometrical effects are considered, since the spatial resolution of the imaging system used makes diffraction effects negligible.
The idea of inferring depth from focus is based on the concept of depth of field, which is a consequence of the inability of cameras to simultaneously focus planes on the scene at different depths. The depth of field of a camera with a given focus value corresponds to the distance between the farthest and the nearest planes on the scene, in relation to the camera, whose points appear in acquired images with a satisfactory definition, according to a certain criterion.

At each instant, a lens can exactly focus points in only one plane, denominated object plane. Considering a thin model for the lens of the camera, see (Hecht, 2001), it is possible to establish a nonlinear relation between the distance z from the lens to the plane that the camera can exactly focus at each instant of time, and the distance v between the lens and the image plane at which the projection of objects in the scene appears sharply focused, see Fig. 1. To complete the
Figure 1: Geometrical optics model for the imaging process of a thin lens.
relation, the focal length f of the lens must be considered. This relation is known as the Gaussian Lens Formula, see (Hecht, 2001), and can be rearranged in the form

z = \frac{f v}{v - f}. \qquad (1)
Considering that the CCD sensor plane is located at a distance v_0 < v from the lens, and using (1) and some trigonometric manipulations, it is possible to write the distance z from the lens to the object plane in the scene as

z = \frac{f v_0}{2 R_c F + v_0 - f}, \qquad (2)

see (Ens and Lawrence, 1993) and (Pentland, 1987) for details, where F is the f-number of the lens and R_c is the effective radius of the point spread function. This expression is valid when v > v_0. An expression similar to this would be easily derived for the case v < v_0.
In practical applications, usually all parameters in the right-hand side of equation (2) are known, except for R_c. Depth from focus methods consist in finding the sensor plane position that minimizes the amount of blur present in image points of interest. This corresponds to finding the camera focus value that leads to R_c = 0, which is solved by optimizing a cost function that depends on the amount of blur present in the points of interest. Depth can then be computed using (1).
3 MINIMUM BLUR FOCUS VALUE
In this section, a method to estimate the camera focus
value that leads to the minimum amount of blur in an image discontinuity is proposed. The cost function used for this purpose is described, as well as the procedure used to search for its minimum.
3.1 Cost Function
The estimation of the camera focus value that minimizes the amount of blur in an image discontinuity requires the definition of a metric that quantifies the sharpness of a transition in an image. Metrics related with high-frequency energy contents in the image, such as the Fourier transform, the image gradient, or the Laplacian, are detailed in (Krotkov, 1987).

There are some properties that are desirable for a cost function: it must preferably be unimodal, vary monotonically with the focus value on either side of the mode, and be robust in the presence of noise. In (Krotkov, 1987), several cost functions were tested, and the maximization of the magnitude of the image intensity gradient proved to achieve the best results with respect to these criteria.
The goal of our system is to estimate the depth of a target, therefore the metric proposed aims to maximize the image gradient magnitude across lines orthogonal to its boundary, which can be found resorting to active contours, see (Blake and Isard, 2000) for details. This approach considers that the real target boundary is on a plane perpendicular to the camera optical axis, which is the plane that appears sharply focused when the camera focus value v_0 (i.e. the distance between the lens and the plane of the CCD sensor of the camera) is the one that optimizes the metric proposed. The plane in which the target boundary is considered to be is the plane that specifies the depth of the target. The problem at hand can be formulated as

\min_{v_0} g(v_0),
where the cost function

g(v_0) = \left[ \frac{1}{N_l} \sum_{i=1}^{N_l} \max_{(x,y) \in l_i} \| \nabla I_{v_0}(x,y) \|^2 \right]^{-1} \qquad (3)

is the inverse of the mean of the image gradient magnitude maximum values across lines orthogonal to the target boundary. Moreover, N_l denotes the number of lines used, l_i the i-th line, \nabla the gradient operator, \|\cdot\| the Euclidean norm, and I_{v_0}(x,y) the intensity of the image acquired with the focus value v_0 at point (x,y). The formulation of this problem as the minimization of g(v_0), instead of the maximization of its inverse, is based on the model that will be proposed for this function in the sequel.
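As a concrete illustration of (3), the following minimal sketch (in Python) computes g(v_0) for one image, under the assumption that a grayscale image array and the set of boundary-normal lines are already available from the segmentation and active-contour stages, which are not shown here:

import numpy as np

def cost_g(image, lines):
    # Cost in (3): inverse of the mean, over lines orthogonal to the
    # target boundary, of the maximum squared gradient magnitude.
    # `image` is a 2-D grayscale array; `lines` is a list of (x, y)
    # integer pixel-coordinate arrays, one per line l_i (assumed inputs).
    gy, gx = np.gradient(image.astype(float))   # components of the image gradient
    grad_sq = gx**2 + gy**2                     # squared gradient magnitude
    maxima = [grad_sq[y, x].max() for (x, y) in lines]
    return 1.0 / np.mean(maxima)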
3.2 Optimization of the Cost Function
The minimization of the cost function proposed in (3) is difficult. The data available is scarce, and to get new information, the acquisition of new images is required. The problem is even more difficult as we want to estimate parameters related with the depth of a moving target. Therefore, a model for the cost function that makes it possible to infer its minimum resorting only to a few images will be derived.
In order to gain some insight into how to model the cost function proposed, consider that, for a given focus value v_0, acquired images are obtained from the convolution of the corresponding sharply focused image I^f_{v_0}(x,y) with the point spread function h(x,y) of the lens system, i.e. with the function that models the camera blurring process:

I_{v_0}(x,y) = \int\!\!\int I^f_{v_0}(\alpha,\beta)\, h(x - \alpha, y - \beta)\, d\alpha\, d\beta.
A common model for the point spread function is a circle of constant intensity. Let, in this situation, the PSF be

h(x,y) = \begin{cases} \frac{1}{\pi R_c^2}, & x^2 + y^2 \le R_c^2 \\ 0, & x^2 + y^2 > R_c^2 \end{cases},

where R_c denotes the radius of the circle, and consider the existence of a vertical step in the sharply focused image of the form I^f_{v_0}(x,y) = a_1 + a_2 u(x - x_0), where u(x - x_0) is the standard unit step function centered at point x_0, a_1 is the intensity of the image when x < x_0, and a_2 is the magnitude of the step. Thus, this approach profits from the target segmentation method used.
In this situation, it is straightforward to show that the partial derivative of I_{v_0}(x,y) with respect to y is 0, since I^f_{v_0}(x,y) does not depend on this variable, and differentiation and convolution are linear operations, thus they commute. Using this fact, and after some mathematical manipulation, it is also possible to show that the partial derivative of I_{v_0}(x,y) with respect to x is

\frac{\partial I_{v_0}}{\partial x}(x,y) = \begin{cases} 0, & |x - x_0| > R_c \\ \frac{2 a_2}{\pi R_c^2} \sqrt{R_c^2 - (x - x_0)^2}, & |x - x_0| \le R_c \end{cases}.
By considering a line l orthogonal to the boundary of the target, it is possible to conclude that

\max_{(x,y) \in l} \| \nabla I_{v_0}(x,y) \|^2 = \left. \| \nabla I_{v_0}(x,y) \|^2 \right|_{x = x_0} = \left( \frac{2 a_2}{\pi R_c} \right)^2.
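This closed-form maximum can be checked numerically. The short sketch below, with arbitrary test values for a_2 and R_c (not taken from the paper), samples a unit-volume pillbox PSF on a grid and compares the peak x-derivative of the blurred step with 2 a_2/(\pi R_c):

import numpy as np

# Numerical check of the maximum-gradient expression (illustrative values).
a2, Rc, N = 50.0, 8.0, 201          # step magnitude, blur radius [px], grid size
x = np.arange(N) - N // 2

# Pillbox PSF sampled on a grid and normalised to unit volume.
X, Y = np.meshgrid(x, x)
h = (X**2 + Y**2 <= Rc**2).astype(float)
h /= h.sum()

# The x-derivative of the blurred step a1 + a2*u(x - x0) equals a2 times
# the column sums of the PSF (the chord-length profile of the circle).
dI_dx = a2 * h.sum(axis=0)

print(dI_dx.max(), 2 * a2 / (np.pi * Rc))   # agree up to sampling error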
From Fig. 1, and resorting to some trigonometric manipulations, it is possible to write the value of R_c as a function of the already defined quantities f, z, and v_0, and the diameter of the lens L, see (Ens and Lawrence, 1993) for details. Replacing R_c in (2a_2/(\pi R_c))^2 by this expression makes it possible to write the cost function proposed in (3) in the form

g(v_0) = \frac{(f - z)^2 v_0^2 + 2 f z (f - z) v_0 + (f z)^2}{\left[ 4 f z a_2 / (L\pi) \right]^2}.
According to the discussion above, which is confirmed by Fig. 2, the cost function in (3) is expected to depend parabolically on v_0. Therefore, the parabolic model g(v_0) = a(v_0 - v)^2 + b, where a, b, and v are parameters to be estimated, was considered for the cost function. In particular, v is the camera focus value that minimizes the cost function. This expression can also be written as g(v_0) = a_0 v_0^2 + b_0 v_0 + c_0, where a_0 = a, b_0 = -2av, and c_0 = av^2 + b. In this form, the model of the cost function depends linearly on the parameters that must be estimated, which simplifies significantly the fitting process described next. This is the reason why the minimization of g(v_0) was considered, instead of the maximization of its inverse, which seemed more intuitive.
Figure 2: Cost function for an AXIS 215 PTZ, when the camera focal length is 29 mm and the target is 3 m away from the lens (parabola model vs. measurements).
Consider that at instant k the focus value of the camera is v_{0_k}, and that a measurement of the cost function g(v_{0_k}) corrupted by additive white Gaussian noise is available. Stacking N of these measurements, a fitting problem can be formulated as \min_y \|Ay - b\|, where \|\cdot\| is the Euclidean norm, A is a matrix with N rows of the form [v_{0_k}^2 \; v_{0_k} \; 1], b is a column vector that stacks the N measurements, and y = [a_0 \; b_0 \; c_0]^T is the vector of parameters to be estimated. The solution to this problem is straightforward, using the least squares method. Given the three unknowns of the model, each minimization of the cost function requires the acquisition of at least three images with different focus values. This procedure must be repeated over time, since the cost function varies with the instantaneous depth of the target. Once the three parameters are estimated, from which the camera focus value v that minimizes the cost function is easily obtainable, the depth z of the target can be computed resorting to (1).
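A minimal sketch of this fitting step is given below, assuming consistent units (millimetres) and purely illustrative sample values: the least squares solution yields the parabola coefficients, the vertex gives v, and (1) converts v into depth.

import numpy as np

def depth_from_focus(v0_samples, g_samples, f):
    # Fit g(v0) = a0*v0^2 + b0*v0 + c0 to N >= 3 noisy cost measurements,
    # recover the minimising focus value v = -b0/(2*a0), and convert it
    # to depth with the Gaussian Lens Formula (1).
    v0 = np.asarray(v0_samples, dtype=float)
    A = np.column_stack([v0**2, v0, np.ones_like(v0)])   # rows [v0^2 v0 1]
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(g_samples, dtype=float), rcond=None)
    a0, b0, _ = coeffs
    v = -b0 / (2.0 * a0)        # vertex of the parabola
    z = f * v / (v - f)         # depth from (1)
    return v, z

# Three images at nearby focus values (illustrative numbers only).
print(depth_from_focus([20.08, 20.10, 20.12], [2.5, 2.1, 2.6], f=20.0))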
The depth estimation method proposed in this section is robust to variations in parameters such as scene illumination or camera zoom and aperture values, which may change the shape of the cost function, see Fig. 3, since the implemented estimation process estimates new parabola coefficients in each iteration of the algorithm, leading to the adaptation of the cost function model to those values.
Figure 3: Luminosity influence on the cost function, for several target depths (results obtained with an AXIS 215 PTZ; f = 45.6 mm).
4 DEPTH LPV OBSERVER
In this section, an observer for the depth of a target with unknown dimensions is pursued. A state-space formulation for the evolution of the target depth is derived in continuous time, and an observer for the state of the resulting LPV system is proposed. The analysis of the observer stability and its discrete-time version are also provided.
4.1 Continuous-time Observer
Considering a pinhole model for the camera, see (Faugeras and Luong, 2001), the relation between the cartesian coordinates of a point in the camera reference frame (x, y, z) and the coordinates (x_p, y_p) of its projection into the image plane is given by x_p = f x/z and y_p = f y/z, where the origin of the camera reference frame was considered to be coincident with the camera optical centre, and the origin of the image frame is in the image centre.
From the expressions of x_p and y_p, it is straightforward to show that the distance R, between two points in a plane at a distance z from the camera, and the distance r, between the projections of these points into the image plane, are related by

r = f R / z. \qquad (4)
In particular, if two points of the real target, lying in the plane in which the target boundary is considered to be, are used to obtain a measure of the real target dimensions, they will verify this relation. However, the use of a distance between two points as a measure of the target dimensions would require a precise identification of those points in each image, which is a very difficult problem to solve, especially when the projection of the target appears with different orientations in different images.
In order to obtain a measure of the target dimensions invariant to rotations of the image of the target, consider that the coordinates x \in R^2 of a point of the curve that describes the target boundary consist of two discrete random variables, and that the covariance of x is \Sigma_x. Moreover, let x_a \in R^2 be the coordinates of a point of the curve that describes the boundary of a target in an image, and x_b = R_x x_a the coordinates of the same point when the target boundary is rotated by R_x, where R_x is an element of the Special Orthogonal group SO(2). Consider also that both quantities are random variables with covariance matrices \Sigma_{x_a} and \Sigma_{x_b}, and that tr(\cdot) denotes the trace of a matrix. If r_a = \sqrt{tr(\Sigma_{x_a})} and r_b = \sqrt{tr(\Sigma_{x_b})} are the dimensions of the image of the target associated with x_a and x_b, respectively, then

r_b = \sqrt{tr(\Sigma_{x_b})} = \sqrt{tr(R_x \Sigma_{x_a} R_x^T)} = \sqrt{tr(\Sigma_{x_a} R_x^T R_x)} = r_a,

since R_x^T R_x = I_{2 \times 2}, where I_{2 \times 2} is the identity matrix of dimensions 2 \times 2. Therefore, the square root of the trace of the covariance matrix associated with the boundary of the image of the target was used as a measure of its dimensions, since this quantity is invariant to rotations of the boundary of the target.
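This invariance is easy to verify numerically. The sketch below, with an arbitrary synthetic contour, compares the square root of the trace of the boundary covariance before and after a rotation:

import numpy as np

rng = np.random.default_rng(0)
boundary = rng.normal(size=(100, 2)) * [12.0, 5.0]   # Nx2 synthetic contour points

theta = 0.7
Rx = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])     # element of SO(2)
rotated = boundary @ Rx.T

r_a = np.sqrt(np.trace(np.cov(boundary.T)))
r_b = np.sqrt(np.trace(np.cov(rotated.T)))
print(r_a, r_b)   # identical up to floating-point error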
According to relation (4), and assuming that the focal length f of the lens and the dimensions R of the real target do not vary over time, it is possible to write the derivative of the depth of the target with respect to time in the form

\dot{z} = -\frac{\dot{r}}{r} z, \qquad (5)

where r and \dot{r} denote the square root of the trace of the covariance matrix associated with the boundary of the image of the target and its derivative with respect to time, respectively. Both quantities follow directly from the boundary of the projection of the target into acquired images, and their measurements are here denoted r_m and \dot{r}_m. Assuming that z_m, r_m, and \dot{r}_m are exact measurements of z, r, and \dot{r}, and denoting the quotient -\dot{r}_m/r_m by a parameter \alpha, a deterministic LPV system with the realization

\dot{z} = \alpha z, \qquad z_m = z
results. An observer for the state z of this system can be written in the form

\dot{\hat{z}} = \alpha \hat{z} + h(z_m - \hat{z}), \qquad \hat{z}(t_0) = \hat{z}_0, \qquad (6)

see (Rugh, 1996), where \hat{z} and \dot{\hat{z}} are the target depth estimate and its derivative with respect to time, respectively, h is the observer gain, t_0 is the initial time instant, and \hat{z}_0 is the initial estimate for the target depth.
From the considerations above, it is straightforward to show that the state estimation error \tilde{z} = z - \hat{z} satisfies the linear state equation

\dot{\tilde{z}} = (\alpha - h)\tilde{z}, \qquad \tilde{z}(t_0) = z_0 - \hat{z}_0, \qquad (7)

where \dot{\tilde{z}} denotes the derivative of the estimation error with respect to time. The values of r_m and \dot{r}_m, and as a consequence the value of \alpha, depend on several variables, such as the target dimensions, the target depth, and the target motion. Therefore, the gain of the observer must be chosen according to the experiment at hand to guarantee the stability of the observer, as shown in Proposition 1.
Proposition 1. The linear state equation (7) is uniformly exponentially stable if the gain h of the observer verifies h \ge \alpha_{max} + \nu/(2q), where \alpha_{max} is the upper bound of \alpha, and \nu and q are finite positive constants.
Proof. Consider the Lyapunov function V(\tilde{z}) = q\tilde{z}^2, where q is a finite positive constant. From the error dynamics in (7), it is possible to show that the derivative of this Lyapunov function with respect to time has the form \dot{V}(\tilde{z}) = 2q(\alpha - h)\tilde{z}^2. According to Lyapunov theory, see (Rugh, 1996), the linear state equation (7) is uniformly exponentially stable if there exists a q that, for all possible values of \alpha, verifies 2q(\alpha - h) \le -\nu, where \nu is a finite positive constant. This relation can be rewritten in the form h \ge \alpha + \nu/(2q). Consider that \alpha has an upper bound \alpha_{max}, which is specified by the values that both r_m and \dot{r}_m can assume. If the gain of the observer is chosen in such a way that h \ge \alpha_{max} + \nu/(2q) is verified, for given values of \nu and q, then the observer-error state equation (7) is guaranteed to be uniformly exponentially stable.
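The behaviour predicted by Proposition 1 can be illustrated with a simple Euler simulation of the observer (6), sketched below for a target receding at constant speed; every number is illustrative, and exact (noise-free) measurements are assumed, as in the derivation above:

import numpy as np

f, R, h, dt = 45.6, 35.76, 0.4, 1e-3     # focal length [mm], target size [mm], gain, step [s]
z, z_hat = 3000.0, 2500.0                # true depth and initial estimate [mm]
for _ in range(int(60 / dt)):            # 60 s of simulated time
    z += 5.0 * dt                        # target motion: 5 mm/s away from the camera
    r = f * R / z                        # image of the target dimensions, from (4)
    r_dot = -f * R * 5.0 / z**2          # exact time derivative of r
    alpha = -r_dot / r                   # LPV parameter
    z_hat += (alpha * z_hat + h * (z - z_hat)) * dt   # observer (6), with z_m = z
print(z, z_hat)                          # the estimation error decays exponentially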
4.2 Discrete-time Observer
According to relation (6), the depth estimates provided by the proposed observer can be rewritten in the form

\dot{\hat{z}}(t) = \underbrace{(\alpha(t) - h)}_{a(t)} \hat{z}(t) + h z_m(t), \qquad (8)

where a is a new parameter. The time variable t, omitted in previous sections, was considered to distinguish the terms that depend on time from the ones that do not.
The solution of the homogeneous equation \dot{\hat{z}}(t) = a(t)\hat{z}(t) is given by

\hat{z}(t) = \underbrace{e^{\int_{\tau}^{t} a(\sigma)\, d\sigma}}_{\Phi(t,\tau)} \hat{z}(\tau),

where \tau is an arbitrary instant of time verifying t \ge \tau. Therefore, the solution of (8) has the form

\hat{z}(t) = \Phi(t,\tau)\hat{z}(\tau) + \int_{\tau}^{t} \Phi(t,\sigma) h z_m(\sigma)\, d\sigma, \qquad t \ge \tau,

see (Rugh, 1996) for more details. Evaluating this expression for t = (k+1)T and \tau = kT, where T is a fixed positive constant and k = k_0, k_0 + 1, \ldots, yields

\hat{z}_{k+1} = F_k \hat{z}_k + \Lambda_k u_k,
where u_k and \hat{z}_k denote the input z_m and the state estimate \hat{z}, respectively, at instant kT. The values of the input z_m and parameter a were assumed constant over the integration range, and the index k_0 is associated with the initial time instant k_0 T. According to the considerations above, we have F_k = e^{a_k T} and \Lambda_k = h(e^{a_k T} - 1)/a_k, where a_k = -\dot{r}_{m_k}/r_{m_k} - h. The variables associated with the subscript k are discrete, with values that correspond to the evaluation of their continuous-time versions at time instant kT.
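A minimal sketch of one iteration of this discrete-time observer follows; the finite-difference computation of the derivative of r_m anticipates the implementation described in section 5, and the argument names are only illustrative:

import numpy as np

def lpv_observer_step(z_hat, z_m, r_m, r_m_prev, T, h):
    # One update z_hat[k+1] = F_k z_hat[k] + Lambda_k u_k, with u_k = z_m.
    r_dot_m = (r_m - r_m_prev) / T      # finite-difference derivative of r_m
    a_k = -r_dot_m / r_m - h            # a_k = alpha_k - h
    F_k = np.exp(a_k * T)
    Lambda_k = h * (F_k - 1.0) / a_k
    return F_k * z_hat + Lambda_k * z_m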
The discrete-time LPV observer derived in this section provides estimates for the depth of a target, with unknown dimensions, moving in a 3D scene. Therefore, this observer is suitable for the tracking system proposed, since it is appropriate for implementation in a digital computer.
5 EXPERIMENTAL RESULTS
In this section, some brief considerations about the implementation of the proposed depth estimation algorithms and experimental results illustrating their performance are presented.
Figure 4: Scheme of the proposed depth estimation algorithms.
Figure 4 depicts a simplified version of the architecture of the proposed depth estimation strategies. In this figure, I_{v_{0_i}} and g(v_{0_i}), i = 1, 2, 3, denote, respectively, the three images used by the depth from focus algorithm and the cost function measurements extracted from these images. The value of v_0^c corresponds to the focus value used to command the focus of the camera.
Results provided in this section were obtained with the 215 PTZ camera from AXIS. Images with a spatial resolution of 704 \times 576 pixels were used. Since image segmentation is itself a very complex domain, which does not correspond to the main focus of this work, targets with easily identifiable colours were considered.
As in most cameras, the value of the distance v_0 between the plane of the CCD sensor and the lens of the camera is not accessible to the operator. Instead, a different parameter, ranging from 1 to 9999, is available. This parameter is specified by the manufacturer and is usually known as the camera focus setting. The use of the depth estimation algorithms proposed requires the calibration of the relation between these two quantities, see (Tarabanis et al., 1992) for details about this procedure.
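In practice, such a calibration can be stored as a lookup table and interpolated in both directions, as in the sketch below; the table values here are placeholders for illustration, not the actual calibration data of the camera:

import numpy as np

# Hypothetical calibration table of (focus setting, v0 [mm]) pairs,
# obtained offline following (Tarabanis et al., 1992).
settings = np.array([1000.0, 3000.0, 5000.0, 7000.0, 9000.0])
v0_mm    = np.array([20.45,  20.52,  20.58,  20.63,  20.68])

def setting_to_v0(s):
    return np.interp(s, settings, v0_mm)       # focus setting -> v0

def v0_to_setting(v0):
    return np.interp(v0, v0_mm, settings)      # v0 -> setting, used to command the camera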
The implementation of the proposed discrete-time observer requires the availability of discrete-time versions of the measurements extracted from images. The value of the target depth at time instant kT, k = k_0, k_0 + 1, \ldots, obtained from the depth from focus algorithm, is denoted z_{m_k}. The dimensions r_{m_k} of the projection of the target into the image acquired at instant kT, and their derivative over time \dot{r}_{m_k}, are computed according to \sqrt{tr(\Sigma_{x_k})} and (r_{m_k} - r_{m_{k-1}})/T, respectively, where \Sigma_{x_k} denotes the covariance matrix associated with the boundary of the projection of the target into the image acquired at instant kT. As stated before, this boundary is estimated resorting to active contours.
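A sketch of this measurement step is given below, assuming the active-contour stage returns the boundary as an N x 2 array of pixel coordinates:

import numpy as np

def target_size_measurement(contour, r_m_prev, T):
    # r_m[k] = sqrt(tr(Sigma_x_k)), with Sigma_x_k the 2x2 covariance of
    # the boundary points, and r_dot_m[k] = (r_m[k] - r_m[k-1]) / T.
    sigma = np.cov(np.asarray(contour, dtype=float).T)
    r_m = np.sqrt(np.trace(sigma))
    r_dot_m = (r_m - r_m_prev) / T
    return r_m, r_dot_m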
In the sequel, two experiments are reported: one in which the target, a balloon attached to a Pioneer P3-DX robot as in Fig. 5, moves along a straight line, and another in which the target describes a circumference. In both experiments, the nominal sampling interval T for the application was set to 1.3 s, due to limitations imposed by the resources available, and the focal length of the lens was set to its maximum, f = 45.6 mm.
The performance of the depth estimates provided by the depth from focus algorithm and discrete-time observer in both experiments is illustrated in Figs. 6 and 7. In Fig. 6, the nominal depth of the target is plotted in blue and the values of the depth estimates provided by the depth from focus algorithm and LPV observer are plotted in red and green, respectively. As can be seen, both estimates converge to the target real depth, i.e. the depth estimation error, depicted in Fig. 7, converges to zero.
Figure 5: Real time target tracking. Left: experimental setup; right: target identification, where the initial guess for the target contour is presented in black, its temporal evolution is presented in red, and the final contour estimate is presented in blue.
Figure 6: Depth estimation (h = 0.4). (a) Straight line trajectory; (b) circular trajectory.
From the standard deviations \sigma_{ss} of the steady-state depth estimation errors presented in this figure, it is possible to confirm that the depth estimates \hat{z} provided by the observer perform better than the measurements z_m obtained directly from the depth from focus strategy. The standard deviations of the steady-state errors associated with the depth estimates provided by the LPV observer (20.7 mm in the straight line trajectory and 37.7 mm in the circular trajectory) are smaller than the ones associated with the depth measurements provided by the depth from focus algorithm (45.5 mm in the straight line trajectory and 79.8 mm in the circular trajectory).
Figure 7: Depth estimation error (h = 0.4). (a) Straight line trajectory; (b) circular trajectory.
There are several reasons that can explain the errors observed in Fig. 7: i) uncertainty associated with the characterization of the real trajectory described by the target; ii) errors resulting from the fitting of the cost function; and iii) uncertainty associated with the calibration of the relation between the focus value and the focus setting of the camera. In particular, part of the errors associated with the estimates provided by the observer result from the fact that the measurements z_m, r_m, and \dot{r}_m are not exact, as assumed in the derivation of the observer, but corrupted by noise.
For the experiments reported in this section, the dimensions of the target do not vary over time. Therefore, it is possible to infer from (4) that an estimate \hat{R} of the target dimensions can be obtained according to \hat{R} = E\{r_m \hat{z}/f\}, where E denotes the expected value operator. The quantities r and z, in (4), were replaced by the measurements r_m and by the estimates \hat{z} of the depth of the target, respectively, since their real values are not known. In discrete time, this expression can be rewritten in the form

\hat{R} = \frac{1}{N_T} \sum_{k=1}^{N_T} \frac{r_{m_k} \hat{z}_k}{f},

where N_T denotes the number of iterations of the experiment, and r_{m_k} and \hat{z}_k the values of r_m and \hat{z}, respectively, at time instant kT. The use of this strategy to estimate the dimensions of the target used in the two experiments leads to \hat{R} = 35.80 mm, when the depth estimates provided by the depth from focus algorithm are used, and to \hat{R} = 35.77 mm, when the depth estimates are provided by the observer. These values are very close to the target real dimensions, R = 35.76 mm.
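This estimate amounts to the sample mean sketched below, under the assumption that the per-iteration measurements are stored in arrays:

import numpy as np

def estimate_target_size(r_m, z_hat, f):
    # R_hat = (1/N_T) * sum_k r_m[k] * z_hat[k] / f, from relation (4).
    return np.mean(np.asarray(r_m) * np.asarray(z_hat)) / f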
6 CONCLUSIONS
In this paper, new methodologies for the estimation of the depth of a moving target with unknown dimensions were proposed. Measurements of the target depth, extracted from real time images acquired with a single pan and tilt camera and based upon depth from focus techniques, were used. These measurements were processed resorting to an LPV observer, whose analysis and synthesis were also provided. The performance of the proposed algorithms was assessed resorting to a series of indoor experimental tests, for a range of operation of up to ten meters. A centimetric accuracy was obtained under realistic conditions. In the near future, this system will be used to generate real time 3D trajectories of marine animals in captivity, for behavioural studies.
ACKNOWLEDGEMENTS
This work was partially funded by FCT (ISR/IST plurianual funding) through the PIDDAC Program. The work of Tiago Gaspar was supported by the PhD Student Scholarship SFRH/BD/46860/2008, from FCT.
REFERENCES
Asada, N., Baba, M., and Oda, A. (2001). Depth from blur by zooming. In Proceedings of the Vision Interface Annual Conference, pages 165–172.

Bar-Shalom, Y., Rong-Li, X., and Kirubarajan, T. (2001). Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software. John Wiley & Sons, Inc.

Bertelli, L., Ghosh, P., Manjunath, B., and Gibou, F. (2008). Robust depth estimation for efficient 3d face reconstruction. 15th IEEE International Conference on Image Processing, pages 1516–1519.

Blake, A. and Isard, M. (2000). Active Contours. Springer, 1st edition.

Discant, A., Rogozan, A., Rusu, C., and Bensrhair, A. (2007). Sensors for obstacle detection - a survey. 30th International Spring Seminar on Electronics Technology, pages 100–105.

Ens, J. and Lawrence, P. (1993). An investigation of methods for determining depth from focus. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(2):97–108.

Faugeras, O. and Luong, Q. (2001). The geometry of multiple images. MIT Press.

Haritaoglu, I., Harwood, D., and Davis, L. (2000). W4: real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):809–830.

Hecht, E. (2001). Optics. Addison-Wesley, 4th edition.

Krotkov, E. (1987). Focusing. International Journal of Computer Vision, 1:223–237.

Nayar, S. and Nakagawa, Y. (1994). Shape from focus. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(8):824–831.

Pentland, A. P. (1987). A new sense for depth of field. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4):523–531.

Rugh, W. (1996). Linear System Theory. Prentice Hall, 2nd edition.

Schechner, Y. Y. and Kiryati, N. (1998). Depth from defocus vs. stereo: How different really are they? In Proceedings of the International Conference on Pattern Recognition, pages 1784–1786.

Tarabanis, K., Tsai, R., and Goodman, D. (1992). Modeling of a computer-controlled zoom lens. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 2, pages 1545–1551.

Viet, H. Q. H., Miwa, M., Maruta, H., and Sato, M. (2003). Recognition of motion in depth by a fixed camera. In VII Digital Image Computing: Techniques and Applications, pages 205–214.