NEW DYNAMIC ESTIMATION OF DEPTH FROM FOCUS

IN ACTIVE VISION SYSTEMS

Data Acquisition, LPV Observer Design, Analysis and Test

Tiago Gaspar and Paulo Oliveira

Institute for Systems and Robotics, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal

Keywords:

Depth estimation, Depth from focus, LPV observers.

Abstract:

In this paper, new methodologies for the estimation of the depth of a generic moving target with unknown

dimensions, based upon depth from focus strategies, are proposed. A set of measurements, extracted from real

time images acquired with a single pan and tilt camera, is used. These measurements are obtained resorting

to the minimization of a new functional, deeply rooted in the optical characteristics of the lens system, and

combined with additional information extracted from images to provide estimates for the depth of the target.

This integration is performed by a Linear Parameter Varying (LPV) observer, whose synthesis and analysis are

also detailed. To assess the performance of the proposed system, a series of indoor experimental tests, with

a real target mounted on a robotic platform, for a range of operation of up to ten meters, were carried out. A

centimetric accuracy was obtained under realistic conditions.

1 INTRODUCTION

Depth estimation plays a key role in a wide vari-

ety of domains, such as target tracking (Bar-Shalom

et al., 2001), 3D reconstruction (Bertelli et al., 2008),

obstacle detection (Discant et al., 2007), and video

surveillance (Haritaoglu et al., 2000). In 3D image

applications, a common approach consists in using

triangulation methods applied to the data collected by

two or more cameras. However, there has been work

on estimating depth resorting to a single camera, see

(Krotkov, 1987) and (Ens and Lawrence, 1993). In

addition to the main advantage of requiring just one

camera, this technique reduces the impact of the im-

age to image matching problem, as well as the impact

of occlusion problems, see (Schechner and Kiryati,

1998). The idea is to explore the relation between

the depth of a point in the 3D world and the amount

of blur that affects its projection into acquired im-

ages. This is done by modelling the inﬂuence that

some of the camera intrinsic parameters have on im-

ages acquired with a small depth of ﬁeld. Based upon

this principle, there are three main strategies that have

been explored: depth from blur by focusing, see (Viet

et al., 2003) and (Pentland, 1987); depth from blur

by zooming, see (Asada et al., 2001); and depth from

blur by irising, see (Ens and Lawrence, 1993).

In this paper, we are mainly concerned with depth

estimation from blur by focusing. Two different tech-

niques based upon this approach can be found in the

literature: depth from defocus, see (Pentland, 1987)

and (Ens and Lawrence, 1993), and depth from focus,

see (Krotkov, 1987), (Nayar and Nakagawa, 1994),

and (Viet et al., 2003). This work is based on the latter

method, since this type of approach does not require

a mathematical model for the blurring process of the

camera, i.e. the point spread function (PSF) respon-

sible for the blurring does not need to be modeled.

This is not possible in depth from defocus strategies,

where it is common to consider that this function is

either a two-dimensional Gaussian, or a circle of con-

stant intensity. Moreover, the amount of blur present

in an image is a consequence of both the characteris-

tics of the lens and the scene itself, which restricts

the applicability of depth from defocus methods to

step discontinuities in the scene. There are strategies

that tackle this problem by using a minimum of two

images of the same scene, acquired with a different

depth of ﬁeld (Pentland, 1987). Since the contribution

of the scene to all images is the same, it can be re-

moved. However, measuring the amount of blur with

high precision is still a difﬁcult problem, as it is an

ill-posed inverse problem.

In this paper, two novelties are proposed: a new

algorithm for the estimation of the depth of a target

with unknown dimensions is developed, and a strategy to estimate these dimensions is also described.

The depth estimation problem is tackled by com-

bining information present on the target boundary,

namely the amount of blur that corrupts this region,

with measurements of the dimensions of the projec-

tion of the target into acquired images. The dynam-

ics of the depth of the target is written as a function

of a parameter that depends on the dimensions of the

image of the target, which leads to a LPV observer

for the depth of moving targets with unknown dimen-

sions. In what concerns the dimensions of the real tar-

get, they are estimated resorting to the depth estimates

provided by the observer and to the measurements of

the dimensions of the image of the target.

This document is organized as follows: in sec-

tion 2, some background on theory of defocus is pro-

vided, and in section 3, a new method to estimate the

camera focus value that minimizes the amount of blur

in an image discontinuity is presented. In section 4,

the design and analysis of the proposed LPV observer

are detailed, and in section 5, experimental results il-

lustrating the performance of the described depth es-

timation algorithms are provided. Finally, section 6

summarizes the main conclusions of this work and

unveils challenging problems for the future.

2 BACKGROUND ON THEORY

OF DEFOCUS

There are two traditional approaches to model the

image formation process: one uses geometrical op-

tics and the other physical optics. The ﬁrst is an

approximation that disregards behaviours speciﬁcally

attributed to the wave nature of light, such as in-

terference and diffraction, and relies on ray tracing.

The great simplicity of this approach compensates

for its inaccuracies. On the contrary, the second re-

lies on diffraction theory, and its results are exact.

In this work, only geometrical effects are considered

since the spatial resolution of the used imaging sys-

tem makes diffraction effects negligible.

The idea of inferring depth from focus is based on

the concept of depth of ﬁeld, which is a consequence

of the inability of cameras to simultaneously focus

planes on the scene at different depths. The depth of

ﬁeld of a camera with a given focus value corresponds

to the distance between the farthest and the nearest

planes on the scene, in relation to the camera, whose

points appear in acquired images with a satisfactory

deﬁnition, according to a certain criterion.

At each instant, a lens can exactly focus points in

only one plane, denominated object plane. Consider-

ing a thin model for the lens of the camera, see (Hecht,

2001), it is possible to establish a nonlinear relation

between the distance z from the lens to the plane that

the camera can exactly focus at each instant of time,

and the distance v between the lens and the image

plane at which the projection of objects in the scene

appears sharply focused, see Fig. 1. To complete the

Figure 1: Geometrical optics model for the imaging process of a thin lens. (Diagram labels: object plane at distance z, lens of diameter L and focal length f, point P, image plane at distance v, sensor plane at distance v0, and blur-circle radius Rc.)

relation, the focal length f of the lens must be con-

sidered. This relation is known as the Gaussian Lens

Formula, see (Hecht, 2001), and can be rearranged in

the form

$$z = \frac{f v}{v - f}. \qquad (1)$$

Considering that the CCD sensor plane is located at a distance $v_0 < v$ from the lens, and using (1) and some trigonometric manipulations, it is possible to write the distance z from the lens to the object plane in the scene as

$$z = \frac{f v_0}{2 R_c F + v_0 - f}, \qquad (2)$$

see (Ens and Lawrence, 1993) and (Pentland, 1987) for details, where F is the f-number of the lens and $R_c$ is the effective radius of the point spread function. This expression is valid when $v > v_0$. A similar expression is easily derived for the case $v < v_0$.

In practical applications, all parameters in the right-hand side of equation (2) are usually known, except for $R_c$. Depth from focus methods consist in finding the sensor plane position that minimizes the amount of blur present in image points of interest. This corresponds to finding the camera focus value that leads to $R_c = 0$, which is solved by optimizing a cost function that depends on the amount of blur present in the points of interest. Depth can then be computed using (1).
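For illustration, relations (1) and (2) can be evaluated directly. The following Python sketch does so under assumed, consistent units; the numeric values are placeholders, not the calibration of the camera used in this work.

```python
# Minimal sketch of the thin-lens relations (1) and (2); the numbers below
# are illustrative, not taken from the paper's experimental setup.

def depth_from_focus_value(f: float, v: float) -> float:
    """Gaussian Lens Formula (1): depth z of the exactly focused plane,
    given focal length f and lens-to-image-plane distance v (same units)."""
    return f * v / (v - f)

def depth_from_blur(f: float, v0: float, F: float, Rc: float) -> float:
    """Relation (2): depth z of the object plane, given the sensor-plane
    distance v0, the f-number F, and the blur-circle radius Rc (v > v0)."""
    return f * v0 / (2.0 * Rc * F + v0 - f)

if __name__ == "__main__":
    f = 45.6   # focal length [mm], as in the experimental section
    v0 = 46.2  # hypothetical sensor-plane distance [mm]
    # With Rc = 0 the two relations coincide: the sensor plane is focused.
    assert abs(depth_from_blur(f, v0, F=2.8, Rc=0.0) -
               depth_from_focus_value(f, v0)) < 1e-9
    print(depth_from_focus_value(f, v0))  # depth in mm
```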

3 MINIMUM BLUR FOCUS

VALUE

In this section, a method to estimate the camera focus


value that leads to the minimum amount of blur in an

image discontinuity is proposed. The cost function

used for this purpose is described, as well as the pro-

cedure used to search for its minimum.

3.1 Cost Function

The estimation of the camera focus value that mini-

mizes the amount of blur in an image discontinuity

requires the deﬁnition of a metric that quantiﬁes the

sharpness of a transition in an image. Metrics re-

lated with high-frequency energy contents in the im-

age, such as the Fourier transform, the image gradient,

or the Laplacian, are detailed in (Krotkov, 1987).

A cost function should ideally be unimodal, vary monotonically with the focus value on either side of the mode, and be robust in the presence of noise. In (Krotkov, 1987), several cost functions were tested, and maximizing the magnitude of the image intensity gradient achieved the best results with respect to these criteria.

The goal of our system is to estimate the depth of

a target, therefore the metric proposed aims to max-

imize the image gradient magnitude across lines or-

thogonal to its boundary, which can be found resort-

ing to active contours, see (Blake and Isard, 2000) for

details. This approach considers that the real target

boundary is on a plane perpendicular to the camera

optical axis, which is the plane that appears sharply focused when the camera focus value $v_0$ (i.e. the distance between the lens and the plane of the CCD sensor of the camera) is the one that optimizes the proposed metric. The plane in which the target boundary is considered to lie is the plane that specifies the depth of the target. The problem at hand can be formulated as

$$\min_{v_0}\; g(v_0),$$

where the cost function

$$g(v_0) = \frac{1}{\frac{1}{N_l}\sum_{i=1}^{N_l}\max_{(x,y)\in l_i}\|\nabla I_{v_0}(x,y)\|^2} \qquad (3)$$

is the inverse of the mean of the maximum values of the squared image gradient magnitude across lines orthogonal to the target boundary. Moreover, $N_l$ denotes the number of lines used, $l_i$ the i-th line, $\nabla$ the gradient operator, $\|\cdot\|$ the Euclidean norm, and $I_{v_0}(x,y)$ the intensity of the image acquired with the focus value $v_0$ at point (x,y). The formulation of this problem as the minimization of $g(v_0)$, instead of the maximization of its inverse, is based on the model that will be proposed for this function in the sequel.
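A measurement of (3) for one acquired image can be computed as in the following sketch (Python; the interface is assumed, with the boundary-normal lines given as precomputed pixel index arrays, e.g. obtained from the active-contour segmentation):

```python
import numpy as np

def cost_measurement(image: np.ndarray, normal_lines) -> float:
    """One measurement of the cost function (3) for an image acquired with
    a given focus value. `normal_lines` is a list of (rows, cols) index
    arrays, each sampling one line orthogonal to the target boundary."""
    gy, gx = np.gradient(image.astype(float))   # image intensity gradient
    grad_sq = gx**2 + gy**2                     # squared gradient magnitude
    maxima = [grad_sq[rows, cols].max() for rows, cols in normal_lines]
    return 1.0 / np.mean(maxima)                # inverse of the mean of maxima
```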

3.2 Optimization of the Cost Function

The minimization of the cost function proposed in

(3) is difﬁcult. The data available is scarce and to

get new information, the acquisition of new images is

required. The problem is even more difﬁcult as we

want to estimate parameters related with the depth of

a moving target. Therefore, a model for the cost func-

tion that allows its minimum to be inferred resorting only to

a few images will be derived.

In order to gain some insight into how to model the cost function proposed, consider that, for a given focus value $v_0$, acquired images are obtained from the convolution of the corresponding sharply focused image $I^f_{v_0}(x,y)$ with the point spread function h(x,y) of the lens system, i.e. with the function that models the camera blurring process:

$$I_{v_0}(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I^f_{v_0}(\alpha,\beta)\, h(x-\alpha,\, y-\beta)\, d\alpha\, d\beta.$$

A common model for the point spread function is a circle of constant intensity. Let, in this situation, the PSF be

$$h(x,y) = \begin{cases} \dfrac{1}{\pi R_c^2}, & x^2 + y^2 \le R_c^2 \\[4pt] 0, & x^2 + y^2 > R_c^2, \end{cases}$$

where $R_c$ denotes the radius of the circle, and consider the existence of a vertical step in the sharply focused image of the form $I^f_{v_0}(x,y) = a_1 + a_2\, u(x-x_0)$, where $u(x-x_0)$ is the standard unit step function centered at point $x_0$, $a_1$ is the intensity of the image when $x < x_0$, and $a_2$ is the magnitude of the step. Thus, this approach profits from the target segmentation method used.

In this situation, it is straightforward to show that the partial derivative of $I_{v_0}(x,y)$ with respect to y is 0, since $I^f_{v_0}(x,y)$ does not depend on this variable, and differentiation and convolution are linear operations, thus they commute. Using this fact, and after some mathematical manipulation, it is also possible to show that the partial derivative of $I_{v_0}(x,y)$ with respect to x is

$$\frac{\partial I_{v_0}}{\partial x}(x,y) = \begin{cases} 0, & |x-x_0| > R_c \\[4pt] \dfrac{2a_2}{\pi R_c^2}\sqrt{R_c^2 - (x-x_0)^2}, & |x-x_0| \le R_c. \end{cases}$$

By considering a line l orthogonal to the boundary of the target, it is possible to conclude that

$$\max_{(x,y)\in l}\|\nabla I_{v_0}(x,y)\|^2 = \|\nabla I_{v_0}(x,y)\|^2\Big|_{x=x_0} = \left(\frac{2a_2}{\pi R_c}\right)^2.$$

From Fig. 1, and resorting to some trigonometric manipulations, it is possible to write the value of $R_c$ as a function of the already defined quantities f, z, and $v_0$, and the diameter of the lens L, see (Ens and Lawrence, 1993) for details. Replacing $R_c$ in $\left(\frac{2a_2}{\pi R_c}\right)^2$ by this expression allows the cost function proposed in (3) to be written in the form

$$g(v_0) = \frac{(f-z)^2 v_0^2 + 2fz(f-z)\,v_0 + (fz)^2}{\left[\,4fza_2/(L\pi)\,\right]^2}.$$

According to the discussion above, which is confirmed by Fig. 2, the cost function in (3) is expected to depend parabolically on $v_0$. Therefore, the parabolic model $g(v_0) = a(v_0 - v)^2 + b$, where a, b, and v are parameters to be estimated, was considered for the cost function. In particular, v is the camera focus value that minimizes the cost function. This expression can also be written as $g(v_0) = a_0 v_0^2 + b_0 v_0 + c_0$, where $a_0 = a$, $b_0 = -2av$, and $c_0 = av^2 + b$. In this form, the model of the cost function depends linearly on the parameters that must be estimated, which simplifies significantly the fitting process described next. This is the reason why the minimization of $g(v_0)$ was considered, instead of the maximization of its inverse, which seemed more intuitive.

Figure 2: Cost function for an AXIS 215 PTZ, when the camera focal length is 29 mm and the target is 3 m away from the lens. (Plot: cost measurements and the fitted parabola model, g(v0) versus v0 [mm].)

Consider that at instant k the focus value of the camera is $v_{0_k}$, and that a measurement of the cost function $g(v_{0_k})$ corrupted by additive white Gaussian noise is available. Stacking N of these measurements, a fitting problem can be formulated as $\min_y \|Ay - b\|$, where $\|\cdot\|$ is the Euclidean norm, A is a matrix with N rows of the form $[\,v_{0_k}^2 \ \ v_{0_k} \ \ 1\,]$, b is a column vector that stacks the N measurements, and $y = [a_0\ b_0\ c_0]^T$ is the vector of parameters to be estimated. The solution to this problem is straightforward, using the least squares method. Given the three unknowns of the model, each minimization of the cost function requires the acquisition of at least three images with different focus values. This procedure must be repeated over time, since the cost function varies with the instantaneous depth of the target. Once the three parameters are estimated, from which the camera focus value v that minimizes the cost function is easily obtained, the depth z of the target can be computed resorting to (1).
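The sketch below illustrates this fitting step under assumed values (the measurements and the focal length are hypothetical, chosen only so that v > f):

```python
import numpy as np

# Hedged sketch of the parabola fit of Section 3.2: three (or more) cost
# measurements g(v0_k) are stacked and the coefficients [a0, b0, c0] are
# obtained by least squares; the vertex gives the minimizing focus value v.

def fit_focus_value(v0: np.ndarray, g: np.ndarray) -> float:
    """Return the focus value v minimizing the fitted parabola
    g(v0) = a0*v0**2 + b0*v0 + c0, from measurements (v0, g)."""
    A = np.column_stack([v0**2, v0, np.ones_like(v0)])
    a0, b0, c0 = np.linalg.lstsq(A, g, rcond=None)[0]
    return -b0 / (2.0 * a0)      # vertex of the parabola, v = -b0/(2*a0)

def depth_from_gaussian_lens(v: float, f: float) -> float:
    """Depth of the target from (1), once the minimizing focus value is known."""
    return f * v / (v - f)

# Illustrative usage with three synthetic measurements (all values made up):
v0 = np.array([20.55, 20.60, 20.65])   # focus values [mm]
g = np.array([2.41, 2.05, 2.38])       # noisy cost measurements
z = depth_from_gaussian_lens(fit_focus_value(v0, g), f=20.0)
```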

The depth estimation method proposed in this sec-

tion is robust to variations in parameters such as scene

illumination or camera zoom and aperture values,

which may change the shape of the cost function, see

Fig. 3, since the implemented estimation process es-

timates new parabola coefﬁcients in each iteration of

the algorithm, leading to the adaptation of the cost

function model to those values.

Figure 3: Luminosity influence on the cost function, for several target depths (results obtained with an AXIS 215 PTZ; f = 45.6 mm). (Surface plot: g(z, v0) for high and low luminosity, over z [mm] and v0 [mm].)

4 DEPTH LPV OBSERVER

In this section, an observer for the depth of a target

with unknown dimensions is pursued. A state-space

formulation for the evolution of the target depth is de-

rived in continuous time, and an observer for the state

of the LPV system that results is proposed. The anal-

ysis of the observer stability and its discrete-time ver-

sion are also provided.

4.1 Continuous-time Observer

Considering a pinhole model for the camera, see (Faugeras and Luong, 2001), the relation between the cartesian coordinates of a point in the camera reference frame (x, y, z) and the coordinates $(x_p, y_p)$ of its projection into the image plane is given by $x_p = f x/z$ and $y_p = f y/z$, where the origin of the camera reference frame was considered to be coincident with the camera optical centre, and the origin of the image frame is in the image centre.

From the expressions of $x_p$ and $y_p$, it is straightforward to show that the distance R, between two points in a plane at a distance z from the camera, and the distance r, between the projections of these points into the image plane, are related by

$$r = fR/z. \qquad (4)$$

In particular, if two points of the real target, lying in the plane in which the target boundary is considered


to be, are used to obtain a measure of the real target

dimensions, they will verify this relation. However,

the use of a distance between two points as a mea-

sure of the target dimensions would require a precise

identiﬁcation of those points in each image, which is

a very difﬁcult problem to solve, especially when the

projection of the target appears with different orienta-

tions in different images.

In order to obtain a measure of the target dimensions invariant to rotations of the image of the target, consider that the coordinates $x \in \mathbb{R}^2$, of a point of the curve that describes the target boundary, consist of two discrete random variables, and that the covariance of x is $\Sigma_x$. Moreover, let $x_a \in \mathbb{R}^2$ be the coordinates of a point of the curve that describes the boundary of a target in an image, and $x_b = R_x x_a$ the coordinates of the same point when the target boundary is rotated by an amount $R_x$, where $R_x$ is an element of the Special Orthogonal group SO(2). Consider also that both quantities are random variables with covariance matrices $\Sigma_{x_a}$ and $\Sigma_{x_b}$, and that tr(·) denotes the trace of a matrix. If $r_a = \sqrt{\mathrm{tr}(\Sigma_{x_a})}$ and $r_b = \sqrt{\mathrm{tr}(\Sigma_{x_b})}$ are the dimensions of the image of the target associated with $x_a$ and $x_b$, respectively, then

$$r_b = \sqrt{\mathrm{tr}(\Sigma_{x_b})} = \sqrt{\mathrm{tr}(R_x \Sigma_{x_a} R_x^T)} = \sqrt{\mathrm{tr}(\Sigma_{x_a} R_x^T R_x)} = r_a,$$

since $R_x^T R_x = I_{2\times 2}$, where $I_{2\times 2}$ is the identity matrix of dimensions 2×2. Therefore, the square root of the trace of the covariance matrix associated with the boundary of the image of the target was used as a measure of its dimensions, since this quantity is invariant to rotations of the boundary of the target.
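This invariance is easy to check numerically; the sketch below uses a synthetic elliptic boundary in place of the active-contour output:

```python
import numpy as np

# Sketch of the rotation-invariant size measure r = sqrt(tr(Sigma)) of the
# boundary-point covariance. The boundary below is synthetic (an ellipse);
# in the paper the boundary comes from an active-contour segmentation.

theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
boundary = np.stack([60.0 * np.cos(theta), 25.0 * np.sin(theta)])  # 2 x N

def size_measure(points: np.ndarray) -> float:
    """Square root of the trace of the covariance of the boundary points."""
    return np.sqrt(np.trace(np.cov(points)))

phi = 0.7                                    # arbitrary rotation angle [rad]
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])  # element of SO(2)

# The measure is unchanged by rotating the boundary: r_b = r_a.
assert np.isclose(size_measure(boundary), size_measure(R @ boundary))
```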

According to relation (4), and assuming that the focal length f of the lens and the dimensions R of the real target do not vary over time, it is possible to write the derivative of the depth of the target with respect to time in the form

$$\dot{z} = -\frac{\dot{r}}{r}\, z, \qquad (5)$$

where r and $\dot{r}$ denote the square root of the trace of the covariance matrix associated with the boundary of the image of the target and its derivative with respect to time, respectively. (Relation (5) follows from differentiating rz = fR, which is constant under the stated assumptions: $\dot{r}z + r\dot{z} = 0$.) Both quantities follow directly from the boundary of the projection of the target into acquired images, and their measurements are here denoted $r_m$ and $\dot{r}_m$. Assuming that $z_m$, $r_m$, and $\dot{r}_m$ are exact measurements of z, r, and $\dot{r}$, and denoting the quotient $-\dot{r}_m/r_m$ by a parameter $\alpha$, a deterministic LPV system with the realization

$$\dot{z} = \alpha z, \qquad z_m = z$$

results. An observer for the state z of this system can be written in the form

$$\dot{\hat{z}} = \alpha\hat{z} + h(z_m - \hat{z}), \qquad \hat{z}(t_0) = \hat{z}_0, \qquad (6)$$

see (Rugh, 1996), where $\hat{z}$ and $\dot{\hat{z}}$ are the target depth estimate and its derivative with respect to time, respectively, h is the observer gain, $t_0$ is the initial time instant, and $\hat{z}_0$ is the initial estimate for the target depth.

From the considerations above, it is straightforward to show that the state estimation error $\tilde{z} = z - \hat{z}$ satisfies the linear state equation

$$\dot{\tilde{z}} = (\alpha - h)\tilde{z}, \qquad \tilde{z}(t_0) = z_0 - \hat{z}_0, \qquad (7)$$

where $\dot{\tilde{z}}$ denotes the derivative of the estimation error with respect to time. The values of $r_m$ and $\dot{r}_m$, and as a consequence the value of $\alpha$, depend on several variables, such as the target dimensions, the target depth, and the target motion. Therefore, the gain of the observer must be chosen according to the experiment at hand to guarantee the stability of the observer, as shown in Proposition 1.

Proposition 1. The linear state equation (7) is uniformly exponentially stable if the gain h of the observer verifies $h \ge \alpha_{max} + \nu/(2q)$, where $\alpha_{max}$ is the upper bound of $\alpha$, and $\nu$ and q are finite positive constants.

Proof. Consider the Lyapunov function $V(\tilde{z}) = q\tilde{z}^2$, where q is a finite positive constant. From the error dynamics in (7), it is possible to show that the derivative of this Lyapunov function with respect to time has the form $\dot{V}(\tilde{z}) = 2q(\alpha - h)\tilde{z}^2$. According to Lyapunov theory, see (Rugh, 1996), the linear state equation (7) is uniformly exponentially stable if there exists a q that, for all possible values of $\alpha$, verifies $2q(\alpha - h) \le -\nu$, where $\nu$ is a finite positive constant. This relation can be rewritten in the form $h \ge \alpha + \nu/(2q)$. Consider that $\alpha$ has an upper bound $\alpha_{max}$, which is specified by the values that both $r_m$ and $\dot{r}_m$ can assume. If the gain of the observer is chosen in such a way that $h \ge \alpha_{max} + \nu/(2q)$ is verified, for given values of $\nu$ and q, then the observer-error state equation (7) is guaranteed to be uniformly exponentially stable.
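A simple numerical illustration of Proposition 1, with hypothetical values for the bound and constants, is the following sketch: for any α(t) ≤ α_max, the error governed by (7) decays exponentially when h is chosen as above.

```python
import numpy as np

# Numerical illustration of Proposition 1 (assumed values throughout):
# with h >= alpha_max + nu/(2q), the error dynamics (7) contract for any
# alpha(t) <= alpha_max, so the estimation error decays exponentially.

alpha_max, nu, q = 0.5, 0.1, 1.0        # hypothetical bound and constants
h = alpha_max + nu / (2.0 * q)          # gain satisfying Proposition 1

rng = np.random.default_rng(0)
dt, z_err = 0.01, 1000.0                # step [s], initial error [mm]
for _ in range(5000):
    alpha = alpha_max * rng.uniform(-1.0, 1.0)  # any alpha <= alpha_max
    z_err += dt * (alpha - h) * z_err           # Euler step of (7)

print(abs(z_err))  # orders of magnitude below the initial error
```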

4.2 Discrete-time Observer

According to relation (6), the depth estimates provided by the proposed observer can be rewritten in the form

$$\dot{\hat{z}}(t) = \underbrace{(\alpha(t) - h)}_{a(t)}\,\hat{z}(t) + h z_m(t), \qquad (8)$$

where a is a new parameter. The time variable t, omitted in previous sections, was considered to distinguish the terms that depend on time from the ones that do not.


The solution of the homogeneous equation $\dot{\hat{z}}(t) = a(t)\hat{z}(t)$ is given by

$$\hat{z}(t) = \underbrace{e^{\int_{\tau}^{t} a(\sigma)d\sigma}}_{\Phi(t,\tau)}\,\hat{z}(\tau),$$

where $\tau$ is an arbitrary instant of time verifying $t \ge \tau$. Therefore, the solution of (8) has the form

$$\hat{z}(t) = \Phi(t,\tau)\hat{z}(\tau) + \int_{\tau}^{t}\Phi(t,\sigma)\, h z_m(\sigma)\, d\sigma, \qquad t \ge \tau,$$

see (Rugh, 1996) for more details. Evaluating this expression for $t = (k+1)T$ and $\tau = kT$, where T is a fixed positive constant and $k = k_0, k_0+1, \ldots$, yields

$$\hat{z}_{k+1} = F_k \hat{z}_k + \Lambda_k u_k,$$

where $u_k$ and $\hat{z}_k$ denote the input $z_m$ and the state estimate $\hat{z}$, respectively, at instant kT. The values of the input $z_m$ and parameter a were assumed constant over the integration range, and the index $k_0$ is associated with the initial time instant $k_0 T$. According to the considerations above, we have $F_k = e^{a_k T}$ and $\Lambda_k = h\,(e^{a_k T} - 1)/a_k$, where $a_k = -h - \dot{r}_{m_k}/r_{m_k}$. The variables associated with the subscript k are discrete, with values that correspond to the evaluation of their continuous-time versions at time instant kT.
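One update of this discrete-time observer can be sketched as follows (the function interface is assumed; h = 0.4 and T = 1.3 s follow the experimental section, while the measurement values are made up):

```python
import numpy as np

# Sketch of one update of the discrete-time LPV observer (names assumed):
# a_k is built from the image measurements r_m and rdot_m, and the state
# estimate is propagated with F_k and Lambda_k derived in the text.

def observer_step(z_hat: float, z_m: float, r_m: float, rdot_m: float,
                  h: float, T: float) -> float:
    """Return z_hat at instant (k+1)T given measurements at instant kT."""
    a_k = -h - rdot_m / r_m                      # a_k = alpha_k - h
    F_k = np.exp(a_k * T)                        # state transition over T
    Lam_k = h * (np.exp(a_k * T) - 1.0) / a_k    # weight of the input z_m
    return F_k * z_hat + Lam_k * z_m

# Illustrative call with hypothetical measurements:
z_next = observer_step(z_hat=3000.0, z_m=3050.0, r_m=40.0,
                       rdot_m=-0.5, h=0.4, T=1.3)
```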

The discrete-time LPV observer derived in this

section provides estimates for the depth of a target,

with unknown dimensions, moving in a 3D scene.

Therefore, this observer is suitable for the tracking

system proposed, since it is appropriate for imple-

mentation in a digital computer.

5 EXPERIMENTAL RESULTS

In this section, some brief considerations about the

implementation of the proposed depth estimation al-

gorithms and experimental results illustrating their

performance are presented.

Figure 4: Scheme of the proposed depth estimation algorithms. (Block diagram: three images I_{v0_1}, I_{v0_2}, I_{v0_3} yield the cost measurements g(v0_1), g(v0_2), g(v0_3) of sec. 3.1; the optimization of sec. 3.2 produces the minimizing focus value v; eq. (1), z = fv/(v − f), converts v into the depth measurement z_m; the LPV observer of sec. 4 outputs the estimate ẑ; v0^c is the commanded camera focus value.)

Figure 4 depicts a simpliﬁed version of the archi-

tecture of the proposed depth estimation strategies.

In this figure, $I_{v_{0_i}}$ and $g(v_{0_i})$, i = 1, 2, 3, denote, respectively, the three images used by the depth from focus algorithm and the cost function measurements extracted from these images. The value of $v_0^c$ corresponds to the focus value used to command the focus of the camera.

Results provided in this section were obtained

with the 215 PTZ camera from AXIS. Images with a spatial resolution of 704×576 pixels were used. Since

image segmentation is itself a very complex domain,

which does not correspond to the main focus of this

work, targets with easily identiﬁable colours were

considered.

As in most cameras, the value of the distance $v_0$ between the plane of the CCD sensor of the camera and its lens is not accessible to the operator. Instead, a different parameter ranging

from 1 to 9999 is available. This parameter is speci-

ﬁed by the manufacturer and is usually known as the

camera focus setting. The use of the depth estima-

tion algorithms proposed requires the calibration of

the relation between these two quantities, see (Tara-

banis et al., 1992) for details about this procedure.

The implementation of the proposed discrete-time

observer requires the availability of discrete-time ver-

sions of the measurements extracted from images.

The value of the target depth at time instant kT, $k = k_0, k_0+1, \ldots$, obtained from the depth from focus algorithm, is denoted $z_{m_k}$. The dimensions $r_{m_k}$ of the projection of the target into the image acquired at instant kT, and its derivative over time $\dot{r}_{m_k}$, are computed according to $\sqrt{\mathrm{tr}(\Sigma_{x_k})}$ and $(r_{m_k} - r_{m_{k-1}})/T$, respectively, where $\Sigma_{x_k}$ denotes the covariance matrix associated with the boundary of the projection of the target into the image acquired at instant kT. As stated before, this boundary is estimated resorting to active contours.
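These measurement computations can be sketched as follows (assumed interface; the 2×N boundary array stands for the active-contour output):

```python
import numpy as np

# Sketch of the discrete-time measurements feeding the observer (assumed
# interfaces): the boundary at instant kT comes from an active contour.

def r_measurement(boundary_k: np.ndarray) -> float:
    """r_m at instant kT: sqrt of the trace of the boundary covariance."""
    return np.sqrt(np.trace(np.cov(boundary_k)))   # boundary_k is 2 x N

def rdot_measurement(r_k: float, r_prev: float, T: float) -> float:
    """Backward finite difference (r_m_k - r_m_{k-1}) / T."""
    return (r_k - r_prev) / T
```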

In the sequel, two experiments are reported: one

in which the target, a balloon attached to a robot Pio-

neer P3-DX as in Fig. 5, moves along a straight line,

and another in which the target describes a circumfer-

ence. In both experiments, the nominal sampling in-

terval T for the application was set to 1.3 s, due to

limitations imposed by the resources available, and

the focal length of the lens was set to its maximum,

f = 45.6 mm.

The performance of the depth estimates provided

by the depth from focus algorithm and discrete-time

observer in both experiments is illustrated in Figs. 6

and 7. In Fig. 6, the nominal depth of the target is

plotted in blue and the value of the depth estimates

provided by the depth form focus algorithm and LPV

observer are plotted in red and green, respectively.

As can be seen, both estimates converge to the target

real depth, i.e. the depth estimation error, depicted

in Fig. 7, converges to zero. From the standard de-


Figure 5: Real time target tracking. Left: experimental setup; right: target identification, where the initial guess for the target contour is presented in black, its temporal evolution is presented in red, and the final contour estimate is presented in blue.

Figure 6: Depth estimation (h = 0.4). (Two panels, (a) straight line trajectory and (b) circular trajectory: target depth [mm] versus t [s], showing the real depth, the measurements z_m, and the estimates ẑ.)

viations $\sigma_{ss}$ of the steady-state depth estimation errors presented in this figure, it is possible to confirm that the depth estimates $\hat{z}$ provided by the observer perform better than the measurements $z_m$ obtained directly from the depth from focus strategy. The standard deviations of the steady-state errors associated with the depth estimates provided by the LPV observer (20.7 mm in the straight line trajectory and 37.7 mm in the circular trajectory) are smaller than the ones associated with the depth measurements provided by the depth from focus algorithm (45.5 mm in the straight line trajectory and 79.8 mm in the circular trajectory).

There are several reasons that can explain the er-

Figure 7: Depth estimation error (h = 0.4). (Two panels, (a) straight line trajectory and (b) circular trajectory: target depth error [mm] versus t [s], for z_m, with σ_ss = 45.5 mm and 79.8 mm respectively, and for ẑ_lpv, with σ_ss = 20.7 mm and 37.7 mm respectively.)

rors observed in Fig. 7: i) uncertainty associated with

the characterization of the real trajectory described

by the target; ii) errors resulting from the ﬁtting of

the cost function, and iii) uncertainty associated with

the calibration of the relation between the focus value

and the focus setting of the camera. In particular, part

of the errors associated with the estimates provided

by the observer result from the fact that the measure-

ments $z_m$, $r_m$, and $\dot{r}_m$ are not exact, as assumed in the derivation of the observer, but corrupted by noise.

For the experiments reported in this section, the

dimensions of the target do not vary over time. There-

fore, it is possible to infer from (4) that an estimate

$\hat{R}$ of the target dimensions can be obtained according to $\hat{R} = E\{(r_m \hat{z})/f\}$, where E denotes the expected value operator. The quantities r and z, in (4), were replaced by the measurements $r_m$ and by estimates $\hat{z}$ of the depth of the target, respectively, since their real values are not known. In discrete-time, this expression can be rewritten in the form

$$\hat{R} = \frac{1}{N_T}\sum_{k=1}^{N_T}\frac{r_{m_k}\hat{z}_k}{f},$$

where $N_T$ denotes the number of iterations of the experiment, and $r_{m_k}$ and $\hat{z}_k$ the values of $r_m$ and $\hat{z}$, respectively, at time instant kT. The use of this strategy to estimate the dimensions of the target used in the two experiments leads to $\hat{R} = 35.80$ mm, when the depth estimates provided by the depth from focus algorithm are used, and to $\hat{R} = 35.77$ mm, when the depth estimates are provided by the observer. These values are very close to the target real dimensions, R = 35.76 mm.
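A sketch of this estimator, with the per-iteration measurements and depth estimates collected in arrays (names assumed):

```python
import numpy as np

# Sketch of the target-dimension estimate R_hat = mean(r_m_k * z_hat_k / f);
# the arrays stand for the per-iteration measurements and observer estimates.

def estimate_target_dimensions(r_m: np.ndarray, z_hat: np.ndarray,
                               f: float) -> float:
    """Sample-mean implementation of R_hat over the N_T iterations."""
    return float(np.mean(r_m * z_hat / f))
```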

6 CONCLUSIONS

In this paper, new methodologies for the estimation

of the depth of a moving target with unknown di-

mensions were proposed. Measurements of the target

depth, extracted from real time images acquired with

a single pan and tilt camera and based upon depth

from focus techniques, were used. These measure-

ments were processed resorting to a LPV observer,

whose analysis and synthesis were also provided. The

performance of the proposed algorithms was assessed

resorting to a series of indoor experimental tests, for

a range of operation of up to ten meters. A centimet-

ric accuracy was obtained under realistic conditions.

In the near future, this system will be used to gener-

ate real time 3D trajectories of marine animals in captivity, for behavioural studies.

ACKNOWLEDGEMENTS

This work was partially funded by FCT (ISR/IST

plurianual funding) through the PIDDAC Program.

The work of Tiago Gaspar was supported by the PhD

Student Scholarship SFRH/BD/46860/2008, from

FCT.

REFERENCES

Asada, N., Baba, M., and Oda, A. (2001). Depth from blur

by zooming. In Proceedings of the Vision Interface

Annual Conference, pages 165–172.

Bar-Shalom, Y., Rong-Li, X., and Kirubarajan, T. (2001).

Estimation with Applications to Tracking and Naviga-

tion: Theory Algorithms and Software. John Wiley &

Sons, Inc.

Bertelli, L., Ghosh, P., Manjunath, B., and Gibou, F. (2008).

Robust depth estimation for efﬁcient 3d face recon-

struction. 15th IEEE International Conference on Im-

age Processing, pages 1516–1519.

Blake, A. and Isard, M. (2000). Active Contours. Springer,

1st edition.

Discant, A., Rogozan, A., Rusu, C., and Bensrhair, A.

(2007). Sensors for obstacle detection - a survey. 30th

International Spring Seminar on Electronics Technol-

ogy, pages 100–105.

Ens, J. and Lawrence, P. (1993). An investigation of meth-

ods for determining depth from focus. IEEE Trans-

actions on Pattern Analysis and Machine Intelligence,

15(2):97–108.

Faugeras, O. and Luong, Q. (2001). The geometry of multi-

ple images. MIT Press.

Haritaoglu, I., Harwood, D., and Davis, L. (2000). W4: real-time surveillance of people and their activities.

IEEE Transactions on Pattern Analysis and Machine

Intelligence, 22(8):809–830.

Hecht, E. (2001). Optics. Addison-Wesley, 4th edition.

Krotkov, E. (1987). Focusing. International Journal of

Computer Vision, 1:223–237.

Nayar, S. and Nakagawa, Y. (1994). Shape from focus.

IEEE Transactions on Pattern Analysis and Machine

Intelligence, 16(8):824–831.

Pentland, A. P. (1987). A new sense for depth of ﬁeld. IEEE

Transactions on Pattern Analysis and Machine Intel-

ligence, 9(4):523–531.

Rugh, W. (1996). Linear System Theory. Prentice Hall, 2nd

edition.

Schechner, Y. Y. and Kiryati, N. (1998). Depth from de-

focus vs. stereo: How different really are they? In

Proceedings of the International Conference on Pat-

tern Recognition, pages 1784–1786.

Tarabanis, K., Tsai, R., and Goodman, D. (1992). Modeling

of a computer-controlled zoom lens. In Proceedings

of the IEEE International Conference on Robotics and

Automation, volume 2, pages 1545–1551.

Viet, H. Q. H., Miwa, M., Maruta, H., and Sato, M. (2003).

Recognition of motion in depth by a ﬁxed camera. In

VII Digital Image Computing: Techniques and Appli-

cations, pages 205–214.
