MONOCULAR VS BINOCULAR 3D REAL-TIME BALL TRACKING FROM 2D ELLIPSES

Nicola Greggio, José Gaspar, Alexandre Bernardino and José Santos-Victor

ARTS Lab - Scuola Superiore S. Anna, Polo S. Anna Valdera, Viale R. Piaggio 34, 56025 Pontedera, Italy
Instituto de Sistemas e Robótica, Instituto Superior Técnico, 1049-001 Lisboa, Portugal
Keywords:
Machine vision, Pattern recognition, Least-square fitting, Algebraic distance.
Abstract:
Real-time tracking of elliptical objects, e.g. a ball, is a well studied field. However, the choice between a monocular and a binocular approach to 3D object localization is still an open issue. In this work we implement a real-time algorithm for 3D ball localization and tracking from 2D image ellipse fitting with calibrated cameras. We exploit both approaches, together with their respective characteristics. Our algorithm has the following key features: (1) real-time video segmentation by means of a Gaussian mixture descriptor; (2) a closed-form ellipse fitting algorithm; and (3) a novel 3D reconstruction algorithm for spheres from the 2D ellipse parameters. We test the algorithm's performance in several conditions, by performing experiments in virtual scenarios with ground truth. Finally, we show the results of monocular and binocular reconstructions and evaluate the influence of having prior knowledge of the ball's dimension, as well as the sensitivity of binocular reconstruction to mechanical calibration errors.
1 INTRODUCTION
3D object localization and tracking are leading subjects in robotics and computer vision. The continuous improvements in hardware capabilities (frame grabbers, motor controllers) demand ever more sophisticated techniques and algorithms to cope with them. This is because the correct object/target identification in terms of 3D position, dimension, velocity, and trajectory is of primary importance to many applications. Vehicle guidance (Stentz, 2001) (Shi et al., 2007), rescue (Carpin et al., 2006) (Balakirsky et al., 2007), soccer robots (Menegatti et al., 2008) (Assfalg et al., 2003), and surveillance systems (Ottlik and Nagel, 2008) (Davis et al., 2006) are only a few examples of fields that exploit this concept. In general unstructured and dynamic environments, tracking techniques must be robust, to cope with world uncertainty and sensor noise, and computationally efficient, to provide fast reaction times to unexpected events.
Robustness is closely linked to the environment in which the application has to operate. For instance, in industrial applications (i.e. structured scenarios) a non-robust algorithm can perform well because of the absence of disturbances. However, in non-constrained scenarios this is no longer true: changes in color, shape and speed of the object can significantly perturb object identification and tracking. For instance, in outdoor vision and robotics applications, like video surveillance and rescue robots, substantial adaptability to changing lighting and color conditions is required. The other fundamental requirement of a visual identification algorithm is computational efficiency. For example, in video surveillance a precise but very slow algorithm leads to high (and sometimes unacceptable) latency between image acquisition and feature extraction. Also, in vision-based control applications, like robotics or vehicle guidance, low latencies are essential for the stability and performance of the control loops.
1.1 Related Work
Stereovision tracking requires the usage of two or more calibrated cameras that capture the same scene at the same time from different known positions and orientations. An illustrative example is the pair of cameras in a humanoid robot. Each image (left and right) is processed separately in order to isolate the target. Then, thanks to the known robot's direct kinematics and encoder information, the target position is retrieved by means of geometric triangulation
(Forsyth and Ponce, 2002). In 2004, Kwolek developed a method for tracking human heads with a mobile stereovision camera (Kwolek, 2004). He characterized faces by first performing a color filtering, and then by modeling the head in the 2D image domain as an ellipse. He then formulated the tracking problem as a probabilistic one, in which a particle filter approximates the probability distribution using a weighted color cue, shape information, and a stereovision sample collection. In 2011, Greggio et al. implemented a 3D tracker for the iCub platform featuring an ellipse pattern recognition algorithm for the target description (Greggio et al., 2011). In their work, the authors performed a simple color based image filtering in order to highlight a green ball. Then, the 2D information obtained with both of the iCub's cameras was processed by the ellipse detector, and the results were triangulated in order to reconstruct the ball's position.
On the other hand, monocular tracking uses a single camera. Therefore all triangulation information is lost, making this category more challenging than the previous one. Many efforts target the tracking of human beings. Face tracking is one of the most important applications, both for video surveillance and for simple people detection. In their work, Gokturk et al. proposed to apply Principal Component Analysis to learn all the possible facial deformations (Gokturk et al., 2001). Then, they tracked the pose and deformation of the face from an image sequence. Fossati et al. used a motion model to infer 3D poses between consecutive detections, identifying key postures for recognizing people seen from arbitrary viewpoints by a single and potentially moving camera (Fossati et al., 2007). In 2010, Andriluka et al. proposed a people spatial pose estimation technique based on three stages: initial estimation of the 2D articulation from single frames, data association across frames, and recovery of the 3D pose (Andriluka et al., 2010).
1.2 Main Contributions
In this work we present a real-time algorithm that robustly segments color blobs modeled by a finite Gaussian mixture model, following the approach proposed in (Greggio et al., 2010a) and in (Greggio et al., 2010b). This is an essential pre-processing step that allows the identification of connected components in the image on which the remaining phases will be applied. Then, we present a novel algorithm to compute the 3D position of a world sphere corresponding to its projected ellipse in the camera. We performed several experiments in the simulator of the iCub robot (Tikhanoff et al., 2008) in order to verify our approach's performance in a realistic context. We compare the reconstructions obtained with monocular and binocular approaches against ground truth data. Besides, we consider the existence or absence of a priori knowledge about the ball's dimensions.
1.3 Paper Organization
This paper is organized as follows. In sec. 2 we
will describe our mapping from 2D to 3D coordinates.
Then, in sec. 3 we will describe our experimental set-
up and we will discuss our results. Finally, we will
conclude and point out directions for future work in
sec. 4.
2 3D RECONSTRUCTION FROM
ELLIPSES
In this section we describe how to reconstruct the 3D
location of the ball given monocular and binocular
images of the ball silhouette and, in the monocular
case, the knowledge of the ball radius.
2.1 Projection Equation

We describe the ball as a quadric surface, i.e. the solution set of the quadratic equation

$$X^T Q X = 0, \qquad (1)$$

where $X = [x\;\, y\;\, z\;\, 1]^T$ is a point of the projective space belonging to the ball surface, and its imaged silhouette (an ellipse) as a conic,

$$m^T C m = 0, \qquad (2)$$

where $m = [u\;\, v\;\, 1]^T$ is a point of the projective plane belonging to the imaged silhouette. The projection of the ball $Q$ to the ellipse $C$ is then simply (Cross and Zisserman, 1998):

$$C^* \simeq P\, Q^*\, P^T \qquad (3)$$

where $P$ is a $3 \times 4$ projection matrix, $C^* = \mathrm{adj}(C)$ and $Q^* = \mathrm{adj}(Q)$ are the dual conic and the dual quadric of $C$ and $Q$ respectively, and $\simeq$ denotes equality up to a scale factor. (To simplify the notation, we refer to the parameter matrices $Q$ and $C$ interchangeably with the associated quadric and conic equations.)
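As a quick numerical check of Eq. (3), the following NumPy sketch builds the quadric of a translated sphere (the form derived in the next subsection), projects its dual through a pinhole camera and recovers the conic of the imaged silhouette. The radius, ball position and intrinsic values are illustrative placeholders, not values from the paper.

    import numpy as np

    # Illustrative sphere and camera (placeholder values)
    R = 0.04                           # ball radius [m]
    t = np.array([0.1, -0.05, 1.2])    # ball center in camera coordinates [m]
    K = np.array([[320.0,   0.0, 160.0],
                  [  0.0, 320.0, 120.0],
                  [  0.0,   0.0,   1.0]])  # intrinsic parameter matrix

    # Point quadric of the translated sphere (Eq. 4) and its dual, Q* ~ Q^-1
    Q = np.block([[np.eye(3), -t[:, None]],
                  [-t[None, :], np.array([[t @ t - R**2]])]])
    Q_dual = np.linalg.inv(Q)

    # Camera coincident with the world frame: P = K [I | 0]
    P = K @ np.hstack((np.eye(3), np.zeros((3, 1))))

    # Eq. (3): dual conic C* ~ P Q* P^T; the silhouette conic is C ~ (C*)^-1
    C_dual = P @ Q_dual @ P.T
    C = np.linalg.inv(C_dual)  # defined up to scale; m^T C m = 0 on the silhouette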
2.2 Monocular Reconstruction

Given the particular case of a spherical ball, the quadric $Q$ has a simplified form. Considering a ball with radius $R$ centered at the origin, one has $Q_0 = \mathrm{diag}(1, 1, 1, -R^2)$. Translation of the ball to a generic 3D location $t = [t_x\;\, t_y\;\, t_z]^T$ is obtained by applying the homogeneous coordinate transformation $T = [I_3\;\, t;\; 0_3^T\;\, 1]$, where $I_3$ is the $3 \times 3$ identity matrix and $0_3$ is a vector of zeros, so that $Q = T^{-T} Q_0 T^{-1}$.
The translated quadric therefore has the form:

$$Q = \begin{bmatrix} I_3 & -t \\ -t^T & t^T t - R^2 \end{bmatrix} \qquad (4)$$
Noting that $C$ and $Q$ are symmetric matrices, their adjoint matrices coincide with their inverses up to a scale factor. Hence, the projection equation Eq. 3 can be written as $C^{-1} \simeq P Q^{-1} P^T$, with:

$$Q^{-1} = \begin{bmatrix} I_3 - t t^T / R^2 & -t / R^2 \\ -t^T / R^2 & -1 / R^2 \end{bmatrix} \qquad (5)$$
Considering, without loss of generality, that the camera has its coordinate frame coincident with the world frame, $P = K [I_3\;\, 0_3]$, where $K$ denotes the intrinsic parameter matrix, one obtains $C^{-1} \simeq K (I_3 - t t^T / R^2) K^T$. Given that $K$ is an invertible matrix, one obtains the normalized projection equation:

$$I_3 - t t^T / R^2 \simeq K^{-1} C^{-1} K^{-T} \qquad (6)$$
Computing the characteristic equation of the LHS of Eq. 6, $\det(I_3 - t t^T / R^2 - \lambda I_3) = 0$, one finds that the LHS matrix has the eigenvalues $\lambda_{1,2,3} = \{1,\; 1,\; 1 - \|t/R\|^2\}$. (Although $\lambda_3 = 1 - \|t/R\|^2$ could in principle be zero, raising questions about the existence of $Q$, this is not a case of practical importance, since $\lambda_3 = 0$ implies $\|t\| = R$, meaning that the ball would touch the camera projection center, which is not possible in practice.) The two unitary eigenvalues of the LHS imply that the RHS also has two equal eigenvalues, though usually not unitary, given that the equality holds only up to a scale factor. However, this observation allows finding a scale for the RHS that makes it equal to the LHS. The scaling just has to impose that the two equal eigenvalues of the RHS become unitary, which can be done by sorting the three eigenvalues and selecting the middle one.

More specifically, defining $H$ as the normalized conic of the RHS of Eq. 6, i.e. $H = K^{-1} C^{-1} K^{-T}$, one removes the scale ambiguity by dividing by the median of the eigenvalues:

$$I_3 - t t^T / R^2 = H / \mathrm{median}(\mathrm{eig}(H)) \qquad (7)$$

where $\mathrm{eig}(H)$ denotes a function returning the set of three eigenvalues of $H$.

With the scale ambiguity removed, Eq. 7 allows solving easily for the direction of the ball location, $v = t/R$.
Denoting $H_2 = I_3 - H / \mathrm{median}(\mathrm{eig}(H))$, one obtains $v v^T = H_2$, and the direction of the ball location in the camera frame is:

$$v = \left[\; s_1 \sqrt{H_2(1,1)} \quad s_2 \sqrt{H_2(2,2)} \quad s_3 \sqrt{H_2(3,3)} \;\right]^T \qquad (8)$$

where $s_1, s_2, s_3$ denote the signs of the components of $v$. Assuming that the ball is in front of the camera, and that the optical axis of the camera is the $z$ axis, pointing forward, then $s_3 = +1$ and one can compute $s_1 = \mathrm{sign}(H_2(1,3))$ and $s_2 = \mathrm{sign}(H_2(2,3))$, where $\mathrm{sign}(\cdot)$ is the sign function returning $+1$, $-1$ or $0$ for positive, negative or null arguments respectively. Finally, given the ball radius $R$, computing the ball location in camera frame coordinates, $t$, is just a scaling of the computed direction:

$$t = R\, v. \qquad (9)$$
Computing the ball location in world coordinates, $^w t$, given a generic projection matrix $P = K [\,{}^c R_w \;\; {}^c t_w\,]$, proceeds as before, first using just the intrinsic parameters $K$ and then correcting the computed position to the world coordinate frame:

$$^w t = {}^c R_w^{-1} \left( t - {}^c t_w \right) \qquad (10)$$

Note that even if $P$ is not given in a factorized form, the intrinsic parameters $K$ and the extrinsic parameters $^c R_w$, $^c t_w$ can be extracted from $P$ using QR factorization, as described in (Hartley and Zisserman, 2000).
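To make the monocular procedure concrete, below is a minimal NumPy sketch of Eqs. (6)-(10). It assumes the image conic C (the 3x3 matrix of the fitted ellipse), the intrinsic matrix K, the ball radius R and, for the world conversion, extrinsics such that X_cam = R_cw X_world + t_cw are given; the function and variable names are illustrative and not taken from the authors' implementation.

    import numpy as np

    def ball_from_ellipse(C, K, R):
        """Monocular 3D ball localization from a fitted image conic (Eqs. 6-9).

        C : 3x3 conic matrix of the ellipse (m^T C m = 0), K : 3x3 intrinsics,
        R : known ball radius. Returns t, the ball center in camera coordinates.
        """
        Kinv = np.linalg.inv(K)
        # Normalized conic of the RHS of Eq. (6): H = K^-1 C^-1 K^-T
        H = Kinv @ np.linalg.inv(C) @ Kinv.T
        # Remove the scale ambiguity with the median eigenvalue (Eq. 7)
        H = H / np.median(np.linalg.eigvalsh(H))
        # H2 = v v^T, with v = t / R
        H2 = np.eye(3) - H
        # Component magnitudes of v from the diagonal of H2 (Eq. 8)
        v = np.sqrt(np.clip(np.diag(H2), 0.0, None))
        # Signs: the ball is assumed to be in front of the camera (s3 = +1)
        v[0] *= np.sign(H2[0, 2])
        v[1] *= np.sign(H2[1, 2])
        # Scale the direction by the known radius (Eq. 9)
        return R * v

    def camera_to_world(t, R_cw, t_cw):
        """Eq. (10): camera-frame ball center to world coordinates, assuming
        extrinsics defined by X_cam = R_cw @ X_world + t_cw."""
        return R_cw.T @ (t - t_cw)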
2.3 Binocular Reconstruction
Considering a binocular vision system, Eq. 3 is replicated for the two cameras:

$$C_1^{-1} \simeq P_1\, Q^{-1}\, P_1^T, \qquad C_2^{-1} \simeq P_2\, Q^{-1}\, P_2^T$$

where $P_1$ and $P_2$ denote the two projection matrices, possibly having different intrinsic matrices, $K_1, K_2$, rotation matrices, ${}^{c1}R_w, {}^{c2}R_w$, and translations, ${}^{c1}t_w, {}^{c2}t_w$.

If the radius of the ball, $R$, is known, the monocular reconstruction methodology can be applied twice, resulting in two estimates of the ball location, $^w t_1$ and $^w t_2$, from which a final estimate is obtained as their mean, $^w t = ({}^w t_1 + {}^w t_2)/2$.
If the radius of the ball is not known, a binocular (stereo) setup still allows obtaining depth estimates by triangulation, provided that the setup has a non-null baseline, i.e. different camera centers, $^w t_{c1} \neq {}^w t_{c2}$, where $^w t_{ci} = -\,{}^{ci}R_w^{-1}\, {}^{ci}t_w = -P_{i(1:3,1:3)}^{-1}\, P_{i(1:3,4)}$ for $i = 1, 2$. Using Eq. 8, one obtains two directions to the ball location, $v_1$ and $v_2$, which
can be converted to a common (world) coordinate system, $^w v_i = {}^{ci}R_w^{-1}\, v_i$, $i = 1, 2$. The ball location can therefore be obtained by scaling these directions, starting at the camera centers, $^w t_{ci}$, and finding the point closest to their intersection in a least squares sense:

$$(\alpha^*, \beta^*) = \arg\min_{\alpha, \beta} \left\| \alpha\, {}^w v_1 + {}^w t_{c1} - \left( \beta\, {}^w v_2 + {}^w t_{c2} \right) \right\|^2. \qquad (11)$$
Collecting $\alpha$, $\beta$ into a vector, the cost function of Eq. 11 can be rewritten as $\left\| A [\alpha\;\, \beta]^T + b \right\|^2$, with $A = [\,{}^w v_1 \;\; -{}^w v_2\,]$ and $b = {}^w t_{c1} - {}^w t_{c2}$, thus having the solution $[\alpha^*\;\, \beta^*]^T = -(A^T A)^{-1} A^T b$. The ball location can be estimated using the scaling factor $\alpha^*$ alone, but it is more convenient to also use $\beta^*$:

$$^w t = \left( \alpha^*\, {}^w v_1 + {}^w t_{c1} + \beta^*\, {}^w v_2 + {}^w t_{c2} \right) / 2. \qquad (12)$$

Similarly, the ball radius could be obtained simply as $R = \alpha^*$, but once more it is more convenient to average the two scalings, $R = (\alpha^* + \beta^*)/2$.
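As an illustration of this triangulation step, a short NumPy sketch of Eqs. (11)-(12) follows, assuming the world-frame directions and camera centers have already been computed as described above; the names are illustrative placeholders.

    import numpy as np

    def triangulate_ball(w_v1, w_tc1, w_v2, w_tc2):
        """Binocular ball localization without a known radius (Eqs. 11-12).

        w_v1, w_v2 : directions to the ball in world coordinates (t/R per camera).
        w_tc1, w_tc2 : camera centers in world coordinates.
        Returns the estimated ball center and radius.
        """
        # Least squares solution of || A [alpha beta]^T + b ||^2
        A = np.column_stack((w_v1, -w_v2))
        b = w_tc1 - w_tc2
        alpha, beta = -np.linalg.solve(A.T @ A, A.T @ b)
        # Average the two ray points (Eq. 12) and the two scalings
        w_t = (alpha * w_v1 + w_tc1 + beta * w_v2 + w_tc2) / 2.0
        R = (alpha + beta) / 2.0
        return w_t, R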
3 EXPERIMENTS
3.1 Experimental Set-up
The experimental set-up consists of a virtual world (the iCub simulator) observed by the robot's stereo cameras. The world is composed of a textured background, a uniformly colored table and uniformly colored objects, including a ball which is the subject of tracking. The ball trajectory is a helix with a diameter of 2 m, starting 1.5 m away from the robot and moving with uniform velocity until 2.5 m.
Figure 1: Left: first image of the sequence used for evaluation. Right: the ball's motion follows a helicoidal trajectory moving away from the robot.
Fig. 1 shows both the simulator scenario and the imposed 3D helicoidal trajectory. The reason for using a simulated world in the evaluation of the algorithm is that it allows us to obtain the ground truth 3D position of the ball and thus compute the absolute tracking error achieved by different methods. Also, the intrinsic and extrinsic camera parameters are known, which is useful to rule out errors that could arise from camera calibration. We show the performance of our image segmentation method with the proposed Gaussian Mixture segmentation (Greggio et al., 2010a), (Greggio et al., 2010b), and the ellipse fitting algorithm (LCSE) (Greggio et al., 2010c). Then we evaluate our method to reconstruct the 3D position of the ball from the 2D ellipse parameters, comparing monocular and binocular approaches.
3.2 Image Segmentation by Means of Gaussian Mixtures

Fig. 2 shows the results of our image segmentation by means of Gaussian Mixtures applied to images captured from the iCub simulator. Here, we collected three sets of images: (1) a scenario containing a large ball and a non-spherical object; (2) a small ball over the table; and (3) a small ball against the background. For each set, we show in Fig. 2: (a) the original image, as captured by the camera; (b) the color segmented image; (c) the cost function as a function of the number of mixture components; and (d) the behavior of the whole log-likelihood as a function of the number of iterations.

Learning the right number of color components (i.e. mixture components) within a colored image is a difficult task. This is because a general colored image contains a huge number of combinations of the three fundamental colors, especially on modern devices. Therefore, the number of mixture components needed to represent the image at best rapidly becomes too high, resulting in an excessive computational burden. Our approach is able to segment the images with good accuracy (all the important features of the image are reproduced), while incurring a lower computational burden than the other state-of-the-art techniques; for a deeper comparison see (Greggio et al., 2010b).
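For illustration only, the following scikit-learn sketch clusters pixel colors with a plain Gaussian mixture. It is not the self-adaptive algorithm used in this paper; in particular, the number of components is fixed by hand here, whereas our method selects it automatically. Function and parameter names are placeholders.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def segment_colors(image_rgb, n_components=5, seed=0):
        """Cluster pixel colors with a Gaussian mixture and return a label map.

        image_rgb : HxWx3 uint8 array.  n_components : number of mixture
        components (fixed here; the paper's method adapts it automatically).
        """
        h, w, _ = image_rgb.shape
        pixels = image_rgb.reshape(-1, 3).astype(np.float64)
        gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                              random_state=seed).fit(pixels)
        labels = gmm.predict(pixels)
        # Color each pixel with the mean of its component for visualization
        segmented = gmm.means_[labels].reshape(h, w, 3).astype(np.uint8)
        return labels.reshape(h, w), segmented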
3.3 Ellipse Fitting
Fig. 3 shows the results of the ellipse fitting algorithm. The images in the top row show the original images with the superimposed ellipse. The ellipse fitting technique performs equally well whether the ellipse is large or small. Of course, when the pattern's resolution decreases (e.g. when the target is small) the fitting may lose some accuracy, although this is merely a question of resolution rather than of the algorithm itself.
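Since the LCSE algorithm itself is detailed in the authors' earlier work, we only sketch here a plain algebraic least-squares conic fit over the silhouette points, without the ellipse-specific constraint of (Fitzgibbon et al., 1999) and without data normalization; it conveys the idea, and its output can be fed directly to the monocular reconstruction sketch of section 2.2. Names are illustrative.

    import numpy as np

    def fit_conic(points):
        """Plain algebraic least-squares conic fit to 2D contour points.

        points : Nx2 array of (u, v) pixel coordinates on the ball silhouette.
        Returns the 3x3 symmetric conic matrix C such that m^T C m ~ 0.
        """
        u, v = points[:, 0], points[:, 1]
        # Design matrix for a*u^2 + b*u*v + c*v^2 + d*u + e*v + f = 0
        D = np.column_stack((u**2, u*v, v**2, u, v, np.ones_like(u)))
        # Minimize ||D theta|| with ||theta|| = 1: smallest right singular vector
        _, _, Vt = np.linalg.svd(D)
        a, b, c, d, e, f = Vt[-1]
        return np.array([[a,   b/2, d/2],
                         [b/2, c,   e/2],
                         [d/2, e/2, f  ]])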
Figure 2: Image segmentation by means of Gaussian Mixtures. (a) The original images, as captured by the camera; (b) the color segmented images; (c) the cost function as a function of the number of mixture components; (d) the behavior of the whole log-likelihood as a function of the number of iterations.
Figure 3: Ball reconstruction image sequence. The images in the top row show the original images with the superimposed ellipse.
3.4 Ball Monocular 3D Localization
In this section we present the results of ball 3D localization using a single camera. Fig. 4 shows the tracking errors over time and the 3D reconstructed trajectories. The ground truth trajectory is shown in red while the estimated trajectory is shown in blue. Due to the similarity among the analyzed ellipse fitting methods, in the following experiments we only use the LCSE ellipse fitting approach (Greggio et al., 2010c).

We can observe that the reconstructed trajectory closely follows the true one. The tracking errors grow slightly with increasing distance, which is a natural consequence of the decreasing size of the ellipse in the image and the consequent increase of discretization errors and estimation variance. In any case, the absolute error always stays below 7 cm.
3.5 Comparison between Monocular
and Binocular Reconstructions
In this section we perform the reconstruction of the ball's position using a binocular approach. The baseline used is 10 cm, matching approximately the structure of the iCub robot head. We consider two scenarios, corresponding to the cases where prior knowledge of the ball radius is absent or present: (i) when the ball radius is unknown we use stereo triangulation, as described in section 2.3; (ii) when the ball radius is known we use the mean of the two monocular reconstructions associated with each of the cameras of the binocular pair. Furthermore, for the binocular case, we consider the possibility of having uncertainty in the vergence angle formed by the cameras. This is a common situation in robots with moving eyes and is due either to mechanical effects (backlash, miscalibration) or to asynchronous acquisition of images and motor angles. Results are shown in Fig. 5. The top row shows the results of stereo triangulation with unknown ball radius. The estimates clearly have a larger variance than in the monocular case, even in the absence of vergence angle errors (compare Fig. 5(a) and (d)). This shows the well known sensitivity of the stereo triangulation algorithm to small errors in the input data when the baseline is small. If the ball size is known, the binocular method copes much better with the vergence angle error (Fig. 5(e) vs (b) and (c)). However, if the ball size is known, the monocular method shows roughly the same performance (Fig. 5(d) vs (e)).
Figure 4: 3D monocular reconstruction of the trajectory of the ball based on the LCSE ellipse fitting. (a) Reconstruction error vs. time; (b) 3D reconstructed trajectory of the ball; (c) 3D trajectory, top view; (d) 3D trajectory, frontal view from behind the camera. Ground truth and reconstructed trajectories are in red (circles) and blue (dots). The average reconstruction error is 2.8 cm for reconstruction depths larger than 1 m.

Figure 5: Comparing monocular and binocular reconstruction. The left set (a-b-c) shows stereo triangulation, when the ball size is unknown: stereo reconstruction with zero, -0.2° or +0.2° vergence errors (a, b and c). The right set (d-e) considers a known ball size: monocular reconstruction (d); average of two monocular reconstructions, with a vergence error of +0.2° (e).

4 CONCLUSIONS
This paper presents a real-time 3D ball tracking system including all processing stages, from image segmentation and 2D feature extraction to 3D reconstruction. All stages of the processing pipeline were developed taking both quality and computation time into account and are carefully described in the paper. The segmentation algorithm uses a clustering based approach in joint color-space coordinates, using a novel greedy optimization of the Gaussian Mixture parameters that outperforms related techniques in video segmentation. The 2D ellipse fitting method was designed for improved robustness to singular cases and is competitive with alternative methods. The 3D reconstruction method is simple, elegant, and makes effective use of all the ellipse parameters obtained from the image. We performed experiments with a realistic simulator of the iCub robot. The full tracking method was evaluated by comparing tracking errors with ground truth values. Monocular and binocular approaches were tested, including the presence of errors in vergence angle measurements. This is a frequent case in robotic heads with moving vergence, and the results obtained confirmed our empirical tests on the iCub robot: vergence angle errors propagate significantly to 3D reconstruction errors in this robot. We showed that a priori knowledge of the ball radius can significantly reduce the variance of the 3D estimates. Although it may not be possible in practice to obtain this knowledge in advance, our future work will focus on the online estimation of the ball size and the demonstration of its influence on the 3D reconstruction quality.
ACKNOWLEDGEMENTS
We thank Cecilia Laschi and Paolo Dario for their support. This work was supported by the European Commission, Project IST-004370 RobotCub and FP7-231640 Handle, and by the Portuguese Government - Fundação para a Ciência e Tecnologia (ISR/IST pluriannual funding) through the PIDDAC program funds and through the projects BIO-LOOK, PTDC/EEA-ACR/71032/2006, and DCCAL, PTDC/EEA-CRO/105413/2008.
REFERENCES
Andriluka, M., Roth, S., and Schiele, B. (2010). Monocular
3d pose estimation and tracking by detection. IEEE
Conference on Computer Vision and Pattern Recogni-
tion (CVPR 2010), San Francisco, USA.
Assfalg, J., Bertini, M., Colombo, C., Del Bimbo, A., and
Nunziati, W. (2003). Semantic annotation of soccer
videos: automatic highlights identification. Computer
Vision and Image Understanding, 92:285–305.
Carpin, S., Lewis, M., Wang, J., Balakirsky, S., and Scrap-
per, C. (2006). Bridging the gap between simulation
and reality in urban search and rescue. In Robocup
2006: Robot Soccer World Cup X.
Cross, G. and Zisserman, A. (1998). Quadric reconstruction
from dual-space geometry. Int. Conf. on Comp. Vision
(ICCV), pages 25–31.
Davis, J. W., Morison, A. W., and Woods, D. D. (2006). An
adaptive focus-of-attention model for video surveil-
lance and monitoring. Machine Vision and Applica-
tions, 18(1):41–64.
Fitzgibbon, A., Pilu, M., and Fisher, R. (1999). Direct least
square fitting of ellipses. IEEE Trans. PAMI, 21:476–
480.
Forsyth, D. A. and Ponce, J. (2002). Computer Vision: A
modern approach. Prentice Hall, NY.
Fossati, A., Dimitrijevic, M., Lepetit, V., and Fua, P. (June
17-22, 2007). Bridging the gap between detection and
tracking for 3d monocular video-based motion cap-
ture. IEEE Conference on Computer Vision and Pat-
tern Recognition, Minneapolis, pages 1–8.
Gokturk, S., Bouguet, J., and Grzeszczuk, R. (2001).
A data-driven model for monocular face tracking.
ICCV01, pages 701–708.
Greggio, N., Bernardino, A., Laschi, C., Dario, P., and
Santos-Victor, J. (2010a). Self-adaptive gaussian mix-
ture models for real-time video segmentation and
background subtraction. IEEE 10th International
Conference on Intelligent Systems Design and Appli-
cations (ISDA), Cairo, Egypt.
Greggio, N., Bernardino, A., Laschi, C., Santos-Victor, J.,
and Dario, P. (2010b). An algorithm for the least
square-fitting of ellipses. IEEE 22nd International
Conference on Tools with Artificial Intelligence (IC-
TAI 2010), Arras, France.
Greggio, N., Bernardino, A., Laschi, C., Santos-Victor, J.,
and Dario, P. (2011). Real-time 3d stereo tracking and
localizing of spherical objects with the icub robotic
platform. Journal of Intelligent & Robotic Systems,
pages 1–30. 10.1007/s10846-010-9527-3.
Greggio, N., Bernardino, A., and Santos-Victor, J. (2010c).
Sequentially greedy unsupervised learning of gaus-
sian mixture models by means of a binary tree struc-
ture. 11-th International Conference on Intelligent Au-
tonomous Systems (IAS-11) 2010 - Aug 30, Sept 1.
Hartley, R. and Zisserman, A. (2000). Multiple view geom-
etry in computer vision. Cambridge University Press.
Kwolek, B. (2004). Real-time head tracker using color,
stereovision and ellipse fitting in a particle filter. IN-
FORMATICA, 15(2):219–230.
Maini, E. S. (2006). Enhanced direct least square fitting of
ellipses. IJPRAI, 20(6):939–954.
Menegatti, E., Silvestri, G., Pagello, E., Greggio, N., Maz-
zanti, F., Cisternino, A., Sorbello, R., and Chella, A.
(2008). 3d realistic simulations of humanoid soccer
robots. International Journal of Humanoid Robotics,
5(3):523–546.
Ottlik, A. and Nagel, H.-H. (2008). Initialization of model-
based vehicle tracking in video sequences of inner-
city intersections. International Journal of Computer
Vision, 80(2):211–225.
Balakirsky, S., Carpin, S., Kleiner, A., Lewis, M., Visser, A.,
Wang, J., and Ziparo, V. (2007). Towards heteroge-
neous robot teams for disaster mitigation: Results and
performance metrics from robocup rescue. Journal of
Field Robotics.
Shi, Y., Qian, W., Yan, W., and Li, J. (2007). Adaptive depth
control for autonomous underwater vehicles based on
feedforward neural networks. International Journal of
Computer Science & Applications, 4(3):107–118.
Stentz, A. (2001). Robotic technologies for outdoor industrial vehicles. In Unmanned Ground Vehicle Technology III, Orlando, FL, USA, 4364:192-199.
Tikhanoff, V., Fitzpatrick, P., Nori, F., Natale, L., Metta,
G., and Cangelosi, A. (2008). The icub humanoid
robot simulator. International Conference on Intel-
ligent RObots and Systems IROS, Nice, France.