ROBUST AUGMENTED REALITY TRACKING BASED VISUAL POSE ESTIMATION

Madjid Maidi, Fakhr-Eddine Ababsa and Malik Mallem

IBISC Laboratory / CNRS FRE 2873

University of Evry

40 Rue du Pelvoux, CE 1455 Courcouronnes, 91020 Evry Cedex, France

Keywords:

Computer vision, augmented reality, calibration, pose estimation, robust tracking.

Abstract:

In this paper, we present a robust fiducial tracking method for real-time Augmented Reality systems. Our approach identifies the target object with the internal barcode of the fiducial and extracts its 2D feature points. Given the 2D feature points and a 3D object model, pose estimation consists in recovering the position and the orientation of the object with respect to the camera. We present two methods for recovering the pose, based on the Extended Kalman Filter and the Orthogonal Iteration algorithm. The first is a sequential estimator that predicts and corrects the state vector, while the latter uses the object-space collinearity error and derives an iterative algorithm that computes orthogonal rotation matrices. Due to lighting or contrast conditions, or occlusion of the target by another object, the tracking may fail. Therefore, we extend our tracking method with a RANSAC algorithm to deal with occlusions. The algorithm is tested with different camera viewpoints under various image conditions and is shown to be accurate and robust.

1 INTRODUCTION

In this study we develop a robust visual tracking-based pose estimation method that handles occlusions in Augmented Reality (AR) applications. Our proposed system tracks coded fiducials, estimates the camera pose sequentially from the input scenes and deals with occlusions of the target object when it is partially or totally hidden.

Among the various existing tracking methods, we can cite three commonly used generic approaches. Gennery's method (Gennery, 1992) is the most intuitive: it primarily consists in projecting the model and adjusting its position in the image. This method is simple to implement but requires a good initialization. The method of Lowe (Lowe, 1992) expresses the error as a function of the pose parameters; it converges quickly but also requires a good initialization of the pose parameters. Harris (Harris and Stennett, 1990) exploits image points in addition to the points of interest to perform tracking; his method appears more precise, but also more complex, than the two previous ones. All the methods presented above track only visible targets and do not handle occlusion. Other authors have addressed the robustness aspect: Naimark and Foxlin (Naimark and Foxlin, 2002) implemented a hybrid vision-inertial self-tracker system which operates in various real-world lighting conditions, the aim being to extract bar-coded fiducials in the presence of very non-uniform lighting. In (Comport et al., 2003), the authors integrate an M-estimator into a visual control law via an iteratively re-weighted least squares implementation; the method proved robust to occlusion, changes in illumination and mis-tracking. In (Chen et al., 2003), the authors also proposed an M-estimator-based algorithm for speeding up template matching and dealing with outliers.

In this work, we define our fiducials, we compare two pose estimators and we implement a RANSAC-based tracking algorithm (Fischler and Bolles, 1981) to deal with target occlusion.

The remainder of the paper is organized as follows. In section 2, the object detection procedure is presented. Section 3 describes the principle of target extraction using bar-coded fiducials. The camera calibration procedure is presented in section 4. In section 5, we present pose estimation using 2 methods, the OI and the EKF algorithms. In section 6, we detail the principle of our robust tracking algorithm. Section 7 shows the obtained results. Section 8 presents our conclusions and future work.

2 OBJECT DETECTION

The first step of our work deals with image segmentation for shape detection. Our goal is to build an identifier model describing the structure of various objects of interest.


In order to reduce the detection error rate, images are pre-processed into an acceptable form before any image analysis is carried out. The image is converted into a black and white image using a suitable threshold. Then, the following operations are applied to process the image and detect the object of interest:

1. Apply Canny ﬁlter to detect contours in image

(Canny, 1986).

2. Smooth the image using a Gaussian filter to eliminate pixel variations along the contour segments.

3. Dilate the smoothed image to remove potential

holes between edge segments.

4. Approximate each contour with an accuracy proportional to its perimeter.

5. Find the number of object vertices.

6. Identify object boundaries as 4 intersecting lines by

testing collinearity of vertices.

7. Find the minimum angle between joint edges; if the cosines of the 4 angles are small (≈ 0), then a square is detected.

Finally, only objects with 4 vertices and right angles are retained and considered as square shapes. Once a square object is detected, the next step is to identify this object and match it with a defined template.
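As an illustration, the following minimal sketch implements this detection pipeline with OpenCV in Python. The threshold values, kernel sizes, approximation accuracy and cosine bound are illustrative assumptions, not the values of our implementation.

```python
import cv2
import numpy as np

def angle_cos(p0, p1, p2):
    # Cosine of the angle at vertex p1, formed by the edges p1-p0 and p1-p2.
    d1 = (p0 - p1).astype(np.float64)
    d2 = (p2 - p1).astype(np.float64)
    return abs(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-12))

def detect_squares(image, cos_thresh=0.3):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, bw = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)  # black/white image
    edges = cv2.Canny(bw, 50, 150)                   # step 1: Canny contours
    edges = cv2.GaussianBlur(edges, (5, 5), 0)       # step 2: Gaussian smoothing
    edges = cv2.dilate(edges, None)                  # step 3: close edge gaps
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    squares = []
    for cnt in contours:
        # step 4: polygonal approximation, accuracy proportional to perimeter
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        # steps 5-7: 4 vertices, convexity, near-right angles (cosine close to 0)
        if len(approx) == 4 and cv2.isContourConvex(approx):
            pts = approx.reshape(4, 2)
            max_cos = max(angle_cos(pts[(i - 1) % 4], pts[i], pts[(i + 1) % 4])
                          for i in range(4))
            if max_cos < cos_thresh:
                squares.append(pts)
    return squares
```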

3 FIDUCIAL DESIGN AND IDENTIFICATION

Our goal is to design fiducials which can be robustly extracted from the scene in real time. Therefore, we use 2 kinds of square fiducials with patterns inside (figure 1); these fiducials contain a barcode used for template matching.

Figure 1: Models of Fiducials.

The internal code of the fiducial is computed by spatial sampling of the 3D fiducial model. Then, we project the sample points onto the 2D image using the homography computed from the 4 vertices of the detected fiducial with the following formula:

$$
\begin{pmatrix} su \\ sv \\ s \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\qquad (1)
$$

From equation 1 we can write

$$
\begin{pmatrix}
x & y & 1 & 0 & 0 & 0 & -xu & -yu \\
0 & 0 & 0 & x & y & 1 & -xv & -yv
\end{pmatrix}
\begin{pmatrix}
h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32}
\end{pmatrix}
=
\begin{pmatrix} h_{33}\,u \\ h_{33}\,v \end{pmatrix}
\qquad (2)
$$

In order to obtain a non-trivial solution, we must fix one of the coefficients $h_{ij}$ to 1. In this study we set $h_{33} = 1$. The number of parameters to estimate is then 8, so we need 8 equations to solve our system. Therefore we use 4 coplanar points, which represent the fiducial vertices in the image. This homography maps a point from the 3D object coordinate frame to the 2D image coordinate frame. We define a sampling grid of the fiducial model (figure 2) and we project these sampling points onto the 2D image using the computed homography.

Figure 2: Fiducial sampling.
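As a sketch, the homography can be obtained by solving equation 2 with $h_{33} = 1$; the function names below are illustrative, and the 4 model vertices are assumed to be given in the same order as their detected image positions.

```python
import numpy as np

def homography_from_vertices(model_pts, image_pts):
    # Solve equation (2) with h33 = 1: two equations per correspondence,
    # so the 4 fiducial vertices give an exactly determined 8x8 system.
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)      # H with h33 fixed to 1

def project_point(H, x, y):
    # Equation (1): map a fiducial model point to image coordinates.
    su, sv, s = H @ np.array([x, y, 1.0])
    return su / s, sv / s
```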

We compute the corresponding code of the fiducial from the sampling grid; this code is composed of 16 bits representing the colors of the fiducial samples. Finally, the fiducial code can take 4 values, corresponding to the 4 possible fiducial orientations.
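A possible sketch of this code computation is given below; the 4 × 4 grid layout, the unit-square model frame and the fixed gray-level threshold are assumptions made for illustration.

```python
import numpy as np

def fiducial_code(gray, H, grid=4, thresh=128):
    # Sample a grid x grid pattern inside the fiducial through the homography
    # H of equation (1) and pack the thresholded samples into a 16-bit code.
    code = 0
    for i in range(grid):
        for j in range(grid):
            x, y = (j + 0.5) / grid, (i + 0.5) / grid   # grid cell center
            su, sv, s = H @ np.array([x, y, 1.0])       # equation (1)
            u, v = int(round(su / s)), int(round(sv / s))
            bit = 1 if gray[v, u] > thresh else 0       # white = 1, black = 0
            code = (code << 1) | bit
    return code
```

Reading the same grid under its 4 rotations then yields the 4 possible code values mentioned above.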

4 CAMERA CALIBRATION

Our algorithm has been applied to estimate the object pose in a sequence of images. In the experiments we use a CCD camera, an IS-800 model from i2S, with an 8 mm focal length. Before starting to work with the camera, we must calibrate it to determine its intrinsic and extrinsic parameters. The calibration procedure simulates the camera by a theoretical model


which describes the transformation of the scene (3D

objects) toward the image (Maidi et al., 2005).

Camera calibration determines the geometrical model of an object and the corresponding image formation system, which is described by the following equation (Zhang, 1998):

$$
\begin{pmatrix} su \\ sv \\ s \end{pmatrix}
= I_x \begin{pmatrix} R & T \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\qquad (3)
$$

where $s$ is an arbitrary scale factor, $(R, T)$, called the extrinsic parameters, is the rotation and translation which relate the world coordinate system to the camera coordinate system, and $I_x$, called the camera intrinsic matrix, is given by

$$
I_x =
\begin{pmatrix}
\alpha_u & 0 & u_0 \\
0 & \alpha_v & v_0 \\
0 & 0 & 1
\end{pmatrix}
\qquad (4)
$$

with $(u_0, v_0)$ the coordinates of the principal point and $\alpha_u$ and $\alpha_v$ the scale factors along the $u$ and $v$ image axes (figure 3).

Figure 3: Coordinate systems used in camera calibration

procedure.

$I_x$ is computed from the camera calibration procedure and remains unchanged during the experiments. The purpose of this work is to compute the extrinsic matrix $(R, T)$, which represents the camera pose.
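For illustration, equations 3 and 4 translate directly into a few lines of Python (a minimal sketch using numpy only):

```python
import numpy as np

def intrinsic_matrix(alpha_u, alpha_v, u0, v0):
    # Camera intrinsic matrix I_x of equation (4).
    return np.array([[alpha_u, 0.0,     u0],
                     [0.0,     alpha_v, v0],
                     [0.0,     0.0,     1.0]])

def project(Ix, R, T, X):
    # Equation (3): s (u, v, 1)^T = I_x (R | T) (X, Y, Z, 1)^T,
    # with X a 3D point in the world coordinate system.
    su, sv, s = Ix @ (R @ X + T)
    return su / s, sv / s
```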

5 POSE ESTIMATION

The main part of this work deals with the 2D-3D pose

estimation problem. Pose estimation is the determi-

nation of the position and orientation of a 3D object

with respect to the camera coordinate frame (ﬁgure

4).

Figure 4: Pose parameters: rotation and translation of the object coordinate frame with respect to the camera coordinate frame.

In section 3, we presented our method of identifying the object of interest. Now, we will develop the

pose estimation algorithm used to compute the posi-

tion and the orientation of the camera according to the

object frame.

We have studied 2 pose estimation algorithms. The first is based on the Extended Kalman Filter (EKF), where the measurement equation models the feature points of the object in the image and the process model predicts the behavior of the system based on the current state, estimating the position and orientation of the object with respect to the camera coordinate frame. The second is the Orthogonal Iteration (OI) algorithm combined with an analytic pose estimator; the OI method uses the object-space collinearity error and derives an iterative algorithm which computes orthogonal rotation matrices.

5.1 The Extended Kalman Filter

We have implemented the EKF algorithm to estimate the transformation between the object and the camera coordinate frames (figure 4). Based on the knowledge of the feature point position in the camera frame, we use the perspective projection matrix of the camera, $M = I_x \cdot (R \;\; T)$, to get the projected image coordinates of this point:

$$
u_i = f(M, P_i), \qquad v_i = f(M, P_i)
\qquad (5)
$$

where $P_i = (X_i, Y_i, Z_i)^T$.

The EKF is composed of two steps. The first step is the time update: the state vector and the error covariance matrix are predicted using initial estimates. Once this step is finished, they become the inputs of the measurement update (correction) step. With the updated information, the time update step projects the state vector and the error covariance matrix to the next


time step (Welch and Bishop, 2004). By applying these two steps recursively, we estimate the state vector.

The object feature points are determined by the fiducial identification algorithm (section 3). For the EKF, we use 6 states representing the rotation angles and the translation vector of the object coordinate frame with respect to the camera coordinate frame, $x = (\text{roll} \;\; \text{pitch} \;\; \text{yaw} \;\; T_x \;\; T_y \;\; T_z)^T$. The measurement inputs are the image data of the feature points coming from the camera, given by $z = (u_1 \; u_2 \; u_3 \; u_4 \; v_1 \; v_2 \; v_3 \; v_4)^T$, where each feature point is represented by $(u_i, v_i)$.
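A minimal sketch of one EKF cycle for this 6-state, 8-measurement formulation is given below. It assumes a static process model between frames and linearizes the measurement function numerically; both are illustrative choices, not necessarily those of our implementation.

```python
import numpy as np

def ekf_pose_step(x, P, z, h, Q, Rm, eps=1e-6):
    # x: state (roll, pitch, yaw, Tx, Ty, Tz); P: state covariance;
    # z: measurements (u1..u4, v1..v4); h: function projecting the 4 model
    # points through equation (5); Q, Rm: process and measurement noise.
    # Time update (prediction), assuming the pose is locally constant.
    x_pred, P_pred = x, P + Q
    # Numerical Jacobian of the measurement function around the prediction.
    H = np.zeros((len(z), len(x)))
    for j in range(len(x)):
        dx = np.zeros(len(x)); dx[j] = eps
        H[:, j] = (h(x_pred + dx) - h(x_pred - dx)) / (2 * eps)
    # Measurement update (correction).
    S = H @ P_pred @ H.T + Rm
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```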

5.2 The Orthogonal Iteration Algorithm

The OI algorithm dynamically determines the external camera parameters using 2D-3D correspondences established by the 2D fiducial tracking algorithm from the current video image (Ababsa and Mallem, 2004). The OI algorithm first computes the object-space collinearity error vector (Lu et al., 2000):

$$
e_i = \left(I - \hat{V}_i\right)(R P_i + T)
\qquad (6)
$$

where $\hat{V}_i$ is the observed line-of-sight projection matrix defined by

$$
\hat{V}_i = \frac{\hat{v}_i \hat{v}_i^t}{\hat{v}_i^t \hat{v}_i}
\qquad (7)
$$

then, a minimization of the squared error is performed:

$$
E(R, T) = \sum_{i=0}^{n} \|e_i\|^2
= \sum_{i=0}^{n} \left\| \left(I - \hat{V}_i\right)(R P_i + T) \right\|^2
\qquad (8)
$$

The OI algorithm converges to an optimum for any set of observed points and any starting point (Ababsa and Mallem, 2004). However, in order to ensure convergence to the correct pose in minimum time, we initialize the pose with another, analytic pose estimator for each acquired image.
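A compact sketch of the OI iteration, following (Lu et al., 2000), is given below; v holds normalized image coordinates (pixel coordinates multiplied by the inverse of I_x) and the stopping tolerance is an illustrative choice.

```python
import numpy as np

def oi_pose(P, v, R, n_iter=50, tol=1e-10):
    # P: (n,3) 3D model points; v: (n,2) normalized image coordinates;
    # R: initial rotation, e.g. from the analytic estimator of equation (10).
    n = len(P)
    vh = np.hstack([v, np.ones((n, 1))])
    # Line-of-sight projection matrices V_i = v v^t / (v^t v), equation (7).
    V = vh[:, :, None] * vh[:, None, :] / (vh * vh).sum(1)[:, None, None]
    Pc = P - P.mean(0)
    e_prev = np.inf
    for _ in range(n_iter):
        # Optimal translation for the current rotation (closed form).
        S = np.einsum('nij,nj->i', V - np.eye(3), P @ R.T)
        T = np.linalg.solve(n * np.eye(3) - V.sum(0), S)
        # Object-space collinearity error E(R, T), equation (8).
        r = np.einsum('nij,nj->ni', np.eye(3) - V, P @ R.T + T)
        e = (r * r).sum()
        if abs(e_prev - e) < tol:
            break
        e_prev = e
        # Rotation update: orthogonal Procrustes between the model points and
        # their projections onto the observed lines of sight.
        Q = np.einsum('nij,nj->ni', V, P @ R.T + T)
        M = (Q - Q.mean(0)).T @ Pc
        U, _, Wt = np.linalg.svd(M)
        R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Wt)]) @ Wt
    return R, T
```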

The analytic pose estimator needs 4 coplanar points to provide a solution for equation 2. Equation 1 represents the transformation between a 3D point and its projection in the image. This transformation, or homography, can be expressed as follows:

$$
H = I_x \begin{pmatrix} R & T \end{pmatrix}
\qquad (9)
$$

From equations 2 and 9, the extrinsic parameters for each image are computed:

$$
R_1 = \lambda I_x^{-1} h_1, \qquad
R_2 = \lambda I_x^{-1} h_2, \qquad
R_3 = R_1 \times R_2, \qquad
T = \lambda I_x^{-1} h_3
\qquad (10)
$$

where $\lambda$ is an arbitrary scalar, $h_i = (h_{1i} \;\; h_{2i} \;\; h_{3i})^T$ and $R_i = (R_{1i} \;\; R_{2i} \;\; R_{3i})^T$.
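As a sketch, the decomposition of equation 10 can be written as follows, fixing λ so that the first rotation column has unit norm (a standard choice, assumed here):

```python
import numpy as np

def pose_from_homography(Ix, H):
    # Analytic pose from a planar homography, equation (10).
    Ix_inv = np.linalg.inv(Ix)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    lam = 1.0 / np.linalg.norm(Ix_inv @ h1)   # lambda such that ||R1|| = 1;
    R1 = lam * Ix_inv @ h1                    # its sign can be fixed by forcing
    R2 = lam * Ix_inv @ h2                    # the object in front of the camera
    R3 = np.cross(R1, R2)                     # third column by orthogonality
    T = lam * Ix_inv @ h3
    return np.column_stack([R1, R2, R3]), T
```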

6 ROBUST TRACKING

In this section we propose a solution to the problem of target occlusion, using both fiducials presented in section 3. The second fiducial model doubles the number of points to track, which makes it possible to compute the homography in the worst case where one of the target objects is completely occluded. The principle of the occlusion handling is explained in the diagram of figure 5.

Figure 5: Robust tracking diagram.

We use 2 different types of fiducial models with 2 distinct codes. Initially, both fiducial models must be visible to the camera so that their 4 feature points can be identified and extracted. Then, the robust tracking algorithm is launched; it is based on the RANSAC estimator and tracks the 8 feature points of the fiducials by computing a rigid transformation between 2 successive images acquired by the camera. RANSAC determines the point correspondences between 2 images and, in particular, allows the feature points of the fiducials to be tracked over time through the images. If one or more points of the fiducials are occluded, RANSAC still tracks the visible points by matching them with points of the previous image. If both fiducials are occluded, RANSAC fails and can no longer track the feature points; we must then re-project the 3D fiducial models onto the current image using the projection matrix of the camera. This allows a re-initialization of the tracking procedure.
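The following sketch illustrates the RANSAC step on the fiducial feature points, here with a 2D rigid transformation estimated from minimal 2-point samples; it assumes putative correspondences between the two frames are already given in matching order, and the iteration count and inlier threshold are illustrative.

```python
import numpy as np

def ransac_rigid_2d(src, dst, n_iter=200, thresh=2.0):
    # src, dst: (n,2) putative corresponding points in two successive images.
    # Returns the inlier mask of the best 2D rigid transform (R, t) found.
    best_inliers = np.zeros(len(src), bool)
    rng = np.random.default_rng(0)
    for _ in range(n_iter):
        i, j = rng.choice(len(src), 2, replace=False)
        # Rotation angle and translation from a minimal 2-point sample.
        a = (np.arctan2(*(dst[j] - dst[i])[::-1])
             - np.arctan2(*(src[j] - src[i])[::-1]))
        R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
        t = dst[i] - R @ src[i]
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

Points rejected as outliers correspond to occluded or mismatched features; tracking continues as long as at least 4 inliers remain, as required for the homography computation.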

7 RESULTS

For the experiments, we have used a single camera

with ﬁxed internal parameters observing an object.

First we test the algorithm of ﬁducial identiﬁcation


and pose estimation. We printed 80 × 80 mm black and white fiducials on a standard laser printer. The identification algorithm detects squares in the image, then computes the barcode of the targets; if it matches a template code, the target is identified (figure 6).

Figure 6: Fiducial objects identification with their barcodes (barcode = 544 and barcode = 608).

We tested the 2 pose estimation algorithms for several camera and fiducial positions in the world. In all configurations the 2 algorithms converge. To measure the pose estimation errors of the OI algorithm and the EKF algorithm, we project the object feature points onto the image coordinate frame using the estimated pose and compare them to the 2D object feature points (image error). We do the same by projecting these 2D feature points into the object coordinate frame using the estimated pose and comparing with the real object position in the object coordinate frame (object error), see figure 7. Table 1 shows some estimated errors, with the standard deviation, variance and mean error of the pose estimation results for a sequence of images, some frames of which are illustrated in figure 8. It summarizes the image and object pose errors obtained from the OI and EKF algorithms. The 2 algorithms converge in a few iterations and the error values are clearly lower for OI than for EKF; the convergence time is also better for the OI method, where it is estimated to be less than 5 ms against 20 ms for the EKF.

Table 1: Object and image errors in pose estimation (units: object errors in mm^2, image errors in pixels^2).

               std       var       mean error
OI    object   0.6154    0.3788     2.6524
      image    5.1e-4    2.6e-7     0.0021
EKF   object   0.6508    0.4236    46.3400
      image    0.1081    0.0117     0.4973

Figure 7: Pose estimation errors: object and image projection errors in a tracking sequence.

We estimated the fiducials' pose in different experiments with real user motion. In the experiments we move the camera around the fiducials; the identification algorithm detects and tracks the targets in the frames, and the OI algorithm computes the position and the orientation of the feature points with respect to the camera. The proposed tracking system was tested with image sequences captured from various camera positions (figure 8).

Figure 8: Overlaying a cube on the identiﬁed targets using

OI algorithm.

The second part of the experiments tested the robust tracking algorithm. Figure 9 shows camera images used during our experiments to analyze the performance of the robust tracking algorithm. We first extracted feature points in the images using the Harris detector. From the found correspondences, we retained only those corresponding to the fiducial feature points computed by the fiducial identification algorithm. Then, we carried out an automatic matching of the fiducial feature points between two successive images using the RANSAC estimator, a robust estimator that rejects outliers. Figure 9 also shows that the fiducials are tracked even when they are occluded by other objects, as long as the number of visible feature points is at least 4. If both fiducials are occluded, the tracking algorithm


stops and we must re-initialize the procedure by re-projecting the fiducials onto the current image using the previous image homography.

Figure 9: Robust ﬁducial tracking.

8 CONCLUSION

In this paper, we addressed the robust, real-time tracking of fiducials in AR applications. We used bar-coded object targets in order to identify fiducials in images and extract their feature points. Thereafter, we established the relationship given by the perspective camera model, and we used 2 pose estimators, the EKF and the OI algorithms, to determine the correct location of the fiducials in the image. The performance of the OI algorithm was clearly better than that of the EKF algorithm in terms of errors and execution time. These performances thoroughly justified the use of the OI algorithm for the experiments, knowing that the OI algorithm was initialized with pose parameters computed by an analytic pose estimator.

We initially proposed a feature point tracking algorithm for a real-time sequence of images based on object identification and pose computation. We showed how to extend this method and make it robust to partial or total occlusion of the target by using another algorithm based on the RANSAC estimator.

The obtained tracking results were precise and robust, and showed the validity of the proposed approach.

As a perspective, we will combine the camera with an inertial measurement unit in order to localize this hybrid system when the two target objects are completely occluded.

REFERENCES

Ababsa, F. and Mallem, M. (2004). Robust camera pose es-

timation using 2d ﬁducials tracking for real-time aug-

mented reality systems. Proceedings of ACM SIG-

GRAPH International Conference on Virtual-Reality

Continuum and its Applications in Industry (VR-

CAI2004).

Canny, J. (1986). A computational approach to edge de-

tection. IEEE Trans. Pattern Anal. Mach. Intell.,

8(6):679–698.

Chen, J. H., Chen, C. S., and Chen, Y. S. (2003). Fast algo-

rithm for robust template matching with m-estimators.

IEEE Transactions on Signal Processing, 51(1):36–

45.

Comport, A. I., Marchand, E., and Chaumette, F. (2003).

A real-time tracker for markerless augmented reality.

ACM/IEEE Int. Symp. on Mixed and Augmented Real-

ity (ISMAR 2003), pages 36–45.

Fischler, M. A. and Bolles, R. C. (1981). Random sample

consensus: a paradigm for model ﬁtting with appli-

cations to image analysis and automated cartography.

Commun. ACM, 24(6):381–395.

Gennery, D. B. (1992). Visual tracking of known three-

dimensional objects. International Journal of Com-

puter Vision, 7(3):243–270.

Harris, C. and Stennett, C. (1990). Rapid – a video rate ob-

ject tracker. In Proc 1st British Machine Vision Conf,

pages 73–78.

Lowe, D. G. (1992). Robust model-based motion tracking

through the integration of search and estimation. Int.

J. Comput. Vision, 8(2):113–122.

Lu, C. P., Hager, G. D., and Mjolsness, E. (2000). Fast and globally convergent pose estimation from video images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(6):610–622.

Maidi, M., Ababsa, F., and Mallem, M. (2005). Vision-inertial system calibration for tracking in augmented reality. Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics, ICINCO 2005, 3:156–162.

Naimark, L. and Foxlin, E. (2002). Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker. IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2002), pages 27–36.

Welch, G. and Bishop, G. (2004). An introduction to the

kalman ﬁlter. Technical Report No. TR 95-041, De-

partment of Computer Science, University of North

Carolina.

Zhang, Z. (1998). A ﬂexible new technique for camera cal-

ibration. Technical Report MSR-TR-98-71.
