VISION-INERTIAL SYSTEM CALIBRATION FOR TRACKING IN
AUGMENTED REALITY
Madjid Maidi, Fakhr-Eddine Ababsa and Malik Mallem
Complexes Systems Laboratory
University of Evry Val d’Essonne
40 Rue du Pelvoux, CE 1455 Courcouronnes, 91020 Evry Cedex, France
Keywords:
Hybrid sensor, tracking system, calibration, computer vision, augmented reality.
Abstract:
High-accuracy registration between real and virtual environments is crucial in Augmented Reality (AR) systems. However, when a vision/inertial hybrid tracker is used, such accuracy depends mostly on the calibration procedure that determines the transformations between the sensor frames. This calibration allows all data to be expressed in a single reference frame. In this paper, we describe a new calibration method for a hybrid tracking system. It consists in rigidly assembling the hybrid tracker to a 6DOF robot in order to simulate the user's head motion while tracking targets in an AR environment. Our approach exploits the robot positioning to obtain a high accuracy for the tracker calibration. Experimental results and accuracy analyses are presented and demonstrate the effectiveness of our approach.
1 INTRODUCTION
Augmented Reality (AR) is the term used to describe
systems in which the user’s view of the real envi-
ronment is enhanced by inserting computer graphics.
These graphics must be generated in such a way that
the user believes that the synthetic objects exist in the
real environment (Jacobs et al., 1997). However, if
there is a misregistration between virtual and real ob-
jects, the augmentation fails. To overcome this problem, research over the past years has focused on the use of hybrid tracking devices in AR systems. Fusing the multiple data sources provided by several sensors yields accurate information for virtual object registration (Azuma, 1997), (Azuma and Bishop, 1995), (Bajura and Neumann, 1995).
Research in this domain employs different sensing technologies for motion tracking systems to compensate for the shortcomings of the individual sensors and to produce robust results (You et al., 1999). However, each technology has its strengths and weaknesses and uses a calibration method which depends on the employed system and the required accuracy of the application.
Azuma and Bishop (Azuma and Bishop, 1995) de-
veloped an optoelectronic tracking system to improve
dynamic registration. For the calibration, the authors directly used the measured viewing parameters, relying on geometric constraints. You, Neumann and
Azuma (You et al., 1999) developed a tracking sys-
tem that integrates inertial and vision-based technol-
ogy to compensate for the limitations in each sys-
tem component. The system was calibrated using a
motion-based calibration (You et al., 1999), (You and
Neumann, 2001). Foxlin (Foxlin, 2003) used a self-tracking system composed of inertial and vision sensors. Auto-calibration algorithms were used to obtain high-accuracy measurements without expensive calibration equipment. Chai, Hoff and Vincent (Chai et al., 2002) used inertial sensors with two cameras for the tracking process. To simplify the kinematic model of the system, the authors made the different sensor frames coincide.
In the present work we develop a new technique
for calibrating a camera with an inertial sensor using
a 6DOF robot. Using both camera and robot position data while observing feature points of the environment, the transformation between the camera frame and the inertial sensor frame is determined.
The remainder of the paper is organized as follows.
Section 2 describes the system components and presents the different sensor frames. The calibration procedure using the robot is presented in Section 3, where the sensor models and the frame transformations are reported. Section 4 shows the experimental setup and
the obtained results. Section 5 presents conclusions and future work.
2 HYBRID TRACKING SYSTEM
Our hybrid system is composed of an inertial sensor
and a CCD camera rigidly mounted onto a robot as
illustrated in Figure 1.
Figure 1: Hybrid system mounted onto the robot.
2.1 Inertial Sensor
The inertial sensor (MT9-B from Xsens) measures accelerations, rate of turn and the earth's magnetic field. All these data are expressed in a right-handed Cartesian coordinate system, {I}, as defined in Figure 2. This coordinate system is body-fixed to the Inertial Measurement Unit (IMU) and is substantially aligned with the external housing of the IMU. The IMU software computes the rotation of the IMU frame, {I}, with respect to a global coordinate system, {G}, defined as a right-handed Cartesian coordinate system (Figure 2) with:
- X positive when pointing to the local magnetic north,
- Y according to the right-handed convention (West),
- Z positive when pointing up.
Figure 2: IMU related frames.
2.2 Camera Model
Our vision sensor is a CCD camera (IS-800 from i2S with an 8 mm focal length). Figure 3 illustrates the different frames used for the camera calibration. The calibration procedure models the camera with a theoretical model that describes the transformation from the scene (3D objects) to the image.
Figure 3: Coordinate systems used in camera calibration.
The camera calibration determines the geometrical
model of an object and the corresponding image for-
mation system which is described by the following
equation
\[
\begin{pmatrix} su \\ sv \\ s \end{pmatrix} = A \, [R \;\; T] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{1}
\]
where $s$ is an arbitrary scale factor, $(R, T)$, called the extrinsic parameters, is the rotation and translation which relate the world coordinate system, {W}, to the camera coordinate system, {C}, and $A$ is the camera intrinsic matrix, given by
\[
A = \begin{pmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} \tag{2}
\]
with $(u_0, v_0)$ the coordinates of the principal point and $\alpha_u$ and $\alpha_v$ the scale factors along the $u$ and $v$ image axes.
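As an illustration of the projection model in (1) and (2), the following Python sketch projects a 3D world point into pixel coordinates. It is only a minimal example: the intrinsic values are those obtained later in Section 4, while the extrinsic parameters and the 3D point are placeholders.

```python
import numpy as np

# Intrinsic matrix A from equation (2), using the values reported in Section 4
alpha_u, alpha_v, u0, v0 = 989.0, 986.0, 380.0, 283.0
A = np.array([[alpha_u, 0.0,     u0],
              [0.0,     alpha_v, v0],
              [0.0,     0.0,     1.0]])

# Placeholder extrinsic parameters (R, T) relating {W} to {C}
R = np.eye(3)
T = np.array([[0.0], [0.0], [500.0]])   # translation (arbitrary units)

# Homogeneous world point (X, Y, Z, 1)
P_w = np.array([[50.0], [30.0], [0.0], [1.0]])

# Equation (1): (su, sv, s)^T = A [R T] (X, Y, Z, 1)^T
p = A @ np.hstack((R, T)) @ P_w
u, v = p[0, 0] / p[2, 0], p[1, 0] / p[2, 0]   # divide by the scale factor s
print(u, v)
```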
3 CALIBRATION PROCEDURE
We have rigidly mounted our hybrid tracker onto a
6DOF robot (LR Mate 200i from FANUC Robotics)
in order to exploit the accuracy of the robot position-
ing in a calibration process. The coordinate frames
used in the calibration procedure are illustrated in
Figure 4. The transformation between two frames is
represented by the rotation matrix and the translation
vector. $(R_{CI}, T_{CI})$ is the transformation of the IMU frame with respect to the camera frame, $(R_{IT}, T_{IT})$ is the transformation of the tool frame with respect to the IMU frame, $(R_{WT}, T_{WT})$ is the transformation of the robot tool frame with respect to the world frame and $(R_{CW}, T_{CW})$ is the transformation of the world frame with respect to the camera frame.
Figure 4: Coordinate frames related to the hybrid system.
The robot used is a manipulator arm (Figure 5) composed of six rotation axes. The robot is fully articulated about its six axes, and its tool (axis 6) is the reference for motion and for all setting operations.
Figure 5: Robot and rotation axes.
The robot provides the coordinates of its tool frame with respect to a user reference frame. Consequently, to determine the position of a new tool frame in a new user frame, it is necessary to teach both frames: the tool frame and the user frame.
3.1 Robot Frames
By default, the user frame is attached to the base of the robot (axis 1). The robot data are expressed as the pose of the tool frame with respect to the robot user frame. However, the transformation between the robot user frame and the world frame of the camera is unknown. For this reason, we define a new user frame, {U}, having the same orientation as the camera world frame, {W}. We also teach a new tool frame, {T}, so that {T} and {I} are coincident. The aim of these frame definitions is to derive the pose of the IMU frame directly from the coordinates computed by the robot.
3.1.1 Definition of the Tool Frame
This frame is attached to the last axis of the robot. Since we want to align the tool frame and the IMU frame, we define the axes of {T} according to the {I} axes using the six-point teaching method.
3.1.2 Definition of the User Frame
The three-point method is used to teach the user frame. It consists in moving the tool to the frame origin and then to two other points on the X and Y axes of the user frame, which in this case is the camera world frame.
3.2 IMU Calibration
3.2.1 IMU Orientation
Method 1 We can determine the IMU orientation with respect to the global frame, which is by definition related to the earth's magnetic field. We can also compute the IMU orientation in an earth-fixed coordinate frame that is different from the global coordinate frame. In this work, we defined a new global coordinate frame, {G}: the IMU has to be oriented in such a way that the sensor axes all point in exactly the same direction as the axes of the global coordinate frame (Figure 6). Afterwards, the orientation output is given with respect to the newly defined global axes. The IMU is used to record the orientation of a 3D object in real time. However, when the IMU is mounted onto an object which contains ferromagnetic materials (for example a camera), the measured magnetic field is distorted, and this causes errors in the orientation measurement.
We can also obtain the position of the IMU by integrating the acceleration data. It is theoretically possible to double-integrate the accelerometer data: after coordinate transformations and subtraction of the acceleration due to gravity, we obtain 3D position data. In practice, however, the following issues are encountered:
- We need a "starting point", a reference 3D position, from which we can start to integrate the 3D acceleration data.
- Noise on the acceleration data, small offset errors and/or incorrectly subtracted acceleration due to gravity are integrated and, over time, cause large drift errors in the position estimate if it is used for longer than a few seconds without any external update of the true position.
Figure 6: New global frame of the IMU.
The conclusion is that both the orientation and the position determined by this method depend very much on the type of motion and on the environment in which we are operating. For the position estimation, typically short-duration motions, preferably cyclical, with known reference positions will work well. We must also take the magnetic perturbations into account: the orientation measured by the IMU is affected by disturbances caused by the ferromagnetic objects present in the environment. These constraints and problems obliged us to choose another method, more appropriate for our application, which needs accurate orientation and position measurements.
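To make the drift problem concrete, the short Python sketch below (illustrative only: the sampling rate, noise level and bias are hypothetical) double-integrates gravity-compensated accelerations for a sensor that is actually at rest. Even a small constant bias grows quadratically in the position estimate.

```python
import numpy as np

dt = 0.01                      # hypothetical IMU sampling period (100 Hz)
t = np.arange(0.0, 10.0, dt)   # 10 seconds of data
bias = 0.01                    # residual bias (m/s^2), e.g. imperfect gravity subtraction
noise = np.random.normal(0.0, 0.02, t.size)
acc = bias + noise             # measured acceleration of a sensor that is at rest

# Naive double integration: velocity, then position
vel = np.cumsum(acc) * dt
pos = np.cumsum(vel) * dt

# A constant bias b alone produces a position error of about 0.5 * b * t^2
print("position error after 10 s: %.2f m (bias alone predicts %.2f m)"
      % (pos[-1], 0.5 * bias * t[-1] ** 2))
```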
Method 2 As mentioned above, the robot gives the position of the tool located at the end of its last axis with respect to a defined user frame. Knowing the position of the IMU with respect to the robot tool, we deduce the transformation between the IMU-related frame, {I}, and the user frame, {U}, where {U} represents the camera world frame.
The IMU rotation with respect to the camera frame
is
\[
R_{CI} = R_{CW} \cdot R_{WT} \cdot R_{TI} \tag{3}
\]
where $R_{CI}$ and $R_{CW}$ are respectively the rotation of the IMU frame and of the world frame with respect to the camera frame, $R_{WT}$ is the rotation of the tool frame with respect to the world frame and $R_{TI}$ is the rotation of the IMU frame with respect to the tool frame.
3.2.2 IMU Translation
For this part, we also use the data provided by the robot, namely the coordinates of its tool frame, {T}, with respect to its user frame, {U}. A direct reading on the robot controller gives the three translation components of the tool with respect to the user frame.
Nevertheless, we need to determine the translation of the IMU frame, {I}, with respect to the camera frame, {C}. It is therefore important to know exactly the position of the IMU frame with respect to the robot tool frame, {T}.
Indeed, the coordinates of the IMU frame origin, $O_{IMU}$, expressed in the tool frame represent the translation of the IMU frame with respect to the tool frame, which we denote $T_{TI}$. This translation is computed from reported measurements and manufacturer data. Once $T_{TI}$ is known, we compute $T_{WI}$, the translation of the IMU frame with respect to the world frame, by applying the following coordinate transformation
\[
T_{WI} = R_{WT} \cdot T_{TI} + T_{WT} \tag{4}
\]
Finally, the translation $T_{CI}$ is given by
\[
T_{CI} = R_{CW} \cdot T_{WI} + T_{CW} \tag{5}
\]
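As a concrete illustration of equations (3)-(5), the following sketch chains the rotations and translations to obtain $(R_{CI}, T_{CI})$. The numerical inputs are placeholders (except $T_{TI}$, which uses the $O_{IMU}$ coordinates reported in Section 4.3); in the real procedure they come from the camera calibration, the robot readings and the IMU mounting.

```python
import numpy as np

def compose(R_ab, T_ab, R_bc, T_bc):
    """Compose frame transformations: (a <- b) followed by (b <- c) gives (a <- c)."""
    return R_ab @ R_bc, R_ab @ T_bc + T_ab

# Placeholder inputs: camera calibration gives (R_CW, T_CW), the robot gives
# (R_WT, T_WT) and the IMU mounting on the tool gives (R_TI, T_TI).
R_CW, T_CW = np.eye(3), np.zeros(3)
R_WT, T_WT = np.eye(3), np.array([100.0, 0.0, 50.0])   # mm (placeholder)
R_TI, T_TI = np.eye(3), np.array([-6.0, 7.8, 75.0])    # mm, O_IMU expressed in {T}

# Equations (4) then (5) for the translation, equation (3) for the rotation
R_WI, T_WI = compose(R_WT, T_WT, R_TI, T_TI)   # T_WI = R_WT . T_TI + T_WT
R_CI, T_CI = compose(R_CW, T_CW, R_WI, T_WI)   # T_CI = R_CW . T_WI + T_CW
print(R_CI)
print(T_CI)
```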
3.3 Camera Calibration
In this work, we have used a calibration method based on Zhang's technique (Zhang, 1998). The camera observes a planar pattern from a few (at least two) different orientations. Either the camera or the planar pattern can be moved, and the motion does not need to be known. The camera intrinsic and extrinsic parameters are solved using an analytical solution, followed by a nonlinear optimization technique based on the maximum likelihood criterion (Zhang, 1998). Radial and tangential lens distortions are also modeled, and very good results have been obtained compared with classical techniques which use two or more orthogonal planes.
From (1), the rotation matrix and the translation vector are computed during the determination of the camera parameters in the calibration procedure. This transformation expresses the orientation and the position of the world frame, {W}, with respect to the camera frame, {C}, i.e. it corresponds to $(R_{CW}, T_{CW})$.
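This calibration step can be reproduced with standard tools; for example, OpenCV's calibrateCamera follows a closed-form solution plus nonlinear refinement in the spirit of Zhang's method. The sketch below is only indicative: the pattern size, square size and image file names are assumptions, not the actual test card used here.

```python
import glob
import cv2
import numpy as np

# Planar pattern model: a grid of 3D points lying in the Z = 0 plane.
cols, rows, square = 8, 5, 30.0           # hypothetical 8x5 corners, 30 mm squares
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square

obj_points, img_points, img_size = [], [], None
for fname in glob.glob("calib_*.png"):    # hypothetical calibration images
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, (cols, rows))
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsics A, distortion coefficients and per-view extrinsics (R_CW, T_CW)
rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points,
                                                 img_size, None, None)
print("RMS reprojection error:", rms)
print("Intrinsic matrix A:\n", A)
```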
4 EXPERIMENTS
4.1 Experimental Setup
The hybrid tracker calibration procedure described in the previous section was carried out experimentally. We determined the rigid transformation between the camera and the IMU frame using the calibration bench illustrated in Figure 7. The hybrid system is mounted onto the robot. First, we fix a metal point to the IMU housing (Figure 1 and Figure 7). The tip of this point defines our tool frame, which represents the motion reference and is used for all frame teaching operations. We place a test card in front of the system to calibrate the camera and to teach the user frame of the robot. Then, we record the robot tool coordinates for each position.
Figure 7: Calibration bench.
4.2 Orientation between IMU Frame
and Camera Frame
Since the first method, which uses the orientation computed by the IMU software, does not give good experimental results, we use the robot data, namely the orientation and the position of the tool frame with respect to the user frame.
As stated above, for kinematic simplicity we define a tool frame, {T}, which has the same orientation as the IMU frame, {I}.
Hence, the rotation between these two frames is the identity matrix
\[
R_{TI} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{6}
\]
The user frame, {U}, is aligned with the camera world frame, {W}. The IMU orientation is therefore directly given by the orientation of {T} with respect to {W}.
The camera orientation is computed by calibration.
Several images taken from different viewpoints were
used for this procedure (Figure 8).
We used a single camera with an 8 mm lens and 640×480 8-bit grayscale images.
Figure 8: Camera images used for the calibration.
For the experiment, we tried various numbers of images. The formulation used needs at least two images in different orientations for the pose estimation. On the other hand, we found that using more than 14 images did not increase the accuracy any further. In each image, there are 40 corner points.
After calibration, the obtained camera internal parameters are: the scale factors $(\alpha_u, \alpha_v) = (989, 986)$ pixels, the image center $(u_0, v_0) = (380, 283)$ pixels, the radial distortions $(k_1, k_2) = (0.2395, 0.3938)$ and the tangential distortions $(t_1, t_2) = (0.0004, 0.0018)$. The extrinsic parameters are represented by the rotation matrix and the translation vector of the pattern positions in the image.
The RMS (Root Mean Square) error between the original and the reconstructed image points is equal to 2.6525 pixels². Naturally, we introduced the radial and tangential distortions into the perspective projection matrix to correct the geometric errors of the camera.
The rotation of the IMU frame, {I}, with respect
to the camera frame, {C}, is determined by
\[
R_{CI} = R_{CW} \cdot R_{WI} \tag{7}
\]
The rotation angles of the IMU with respect to the camera are finally computed from the rotation matrix $R_{CI}$.
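The paper does not state which Euler angle convention is used for $(\psi, \theta, \phi)$; as an illustration only, the sketch below extracts Z-Y-X (yaw-pitch-roll) angles, one common convention, from a rotation matrix such as $R_{CI}$.

```python
import numpy as np

def zyx_euler_deg(R):
    """Extract Z-Y-X (yaw, pitch, roll) angles in degrees from a rotation matrix.
    This is one common convention; the paper's (psi, theta, phi) may differ."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-R[2, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return np.degrees([yaw, pitch, roll])

# Example: a placeholder R_CI (90 degree rotation about the Z axis)
R_CI = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
print(zyx_euler_deg(R_CI))   # -> [90.  0.  0.]
```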
We used 14 positions of our hybrid sensor system and observed that, overall, the obtained rotation angles are practically the same. However, to evaluate the performance of this method properly, the rotation measurement errors were computed (Figure 9).
Evaluation of the Rotation Errors The mean value of each angle is computed and the static error corresponding to each angle measurement is determined.
The mean rotation errors (MRE) of the angles are
\[
\mathrm{MRE}_{\psi} = 0.32^\circ, \quad \mathrm{MRE}_{\theta} = 0.30^\circ, \quad \mathrm{MRE}_{\phi} = 0.18^\circ \tag{8}
\]
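As an illustration of this evaluation step, and assuming the static error is taken as the mean absolute deviation from the mean value (the paper does not state the exact definition), the per-angle errors could be computed as follows with hypothetical measurements.

```python
import numpy as np

# Hypothetical series of psi measurements over 14 experiments (degrees)
psi = np.array([89.7, 89.5, 90.0, 89.6, 89.8, 89.4, 89.9,
                89.7, 89.6, 89.8, 89.5, 89.9, 89.7, 89.6])

mean_psi = psi.mean()                       # mean value of the angle
mre_psi = np.abs(psi - mean_psi).mean()     # mean deviation from the mean value
print(mean_psi, mre_psi)
```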
Figure 9: Variation of the IMU rotation angles (psi, theta and phi, in degrees) over the experiments, showing the angle measures and their mean values.
Finally, the rotation angles of the IMU frame with
respect to the camera frame are given by the following
mean values
\[
\psi = 89.69^\circ, \quad \theta = -0.59^\circ, \quad \phi = 88.63^\circ \tag{9}
\]
4.3 Translation between IMU Frame
and Camera Frame
To compute the translation of the IMU origin, $O_{IMU}$, in the camera frame, it is necessary to know the position of $O_{IMU}$ in the world frame, $T_{WI}$, and then to project it into the camera frame. First, we determine the translation of the IMU frame with respect to the tool frame, which is computed using the projection of the $O_{IMU}$ coordinates into the robot tool frame (see (4) and (5)). The coordinates of $O_{IMU}$ with respect to the tool frame are
\[
O_{IMU} = \begin{pmatrix} -6.0 \\ 7.8 \\ 75.0 \end{pmatrix}_{\{T\}} \text{mm} \tag{10}
\]
For this experiment, we use the same positions and
orientations of the robot which were used to compute
the IMU rotation in the camera frame.
We substitute the values of $T_{WT}$ and $T_{CW}$ into (4) and (5), where $T_{TI}$ is $O_{IMU}$ given in (10).
Evaluation of the Translation Errors We compute the mean value of each component of the $T_{CI}$ coordinates over all robot positions used for this calibration (Figure 10). The mean translation error (MTE) components are
\[
\mathrm{MTE}_{X} = 1.5 \text{ mm}, \quad \mathrm{MTE}_{Y} = 1.5 \text{ mm}, \quad \mathrm{MTE}_{Z} = 1.2 \text{ mm} \tag{11}
\]
Finally, the translation of the IMU frame with re-
spect to the camera frame is given by the following
vector which expresses the coordinates of the IMU
with respect to the camera frame
\[
T_{CI} = \begin{pmatrix} 7.2 \\ 40.8 \\ -41.6 \end{pmatrix} \text{mm} \tag{12}
\]
Figure 10: Variation of the IMU translation coordinates (X, Y and Z positions, in mm) over the experiments, showing the coordinate measures and their mean values.
5 CONCLUSIONS
In this work, we presented a new approach for calibrating a hybrid tracking device for an augmented reality system. The system consists of a camera and an inertial measurement unit rigidly attached together and mounted onto the robot tool axis. The robot moves its tool within a workspace and computes the position and the orientation of the tool frame in a defined reference frame. The camera calibration, combined with the coordinates provided by the robot, determines the transformation between the inertial measurement unit and the camera with high accuracy.
The obtained calibration accuracy is sufficient for the tracking application for which this hybrid system is intended. The evaluation of the numerical results showed the validity and effectiveness of the proposed approach.
Our future work is to test prediction filters with real data provided by this hybrid system. We will integrate the robot data to correct and evaluate the tracking methods before implementing prediction algorithms on a portable AR system.
REFERENCES
Azuma, R. (1997). A survey of augmented reality. Presence: Teleoperators and Virtual Environments, Vol. 6, No. 4, pp. 355-385.

Azuma, R. and Bishop, G. (1995). Improving static and dynamic registration in an optical see-through HMD. In Proceedings of SIGGRAPH 95.

Bajura, M. and Neumann, U. (1995). Dynamic registration correction in augmented reality systems. In Proceedings of the IEEE Virtual Reality Annual International Symposium, pp. 189-196.

Chai, L., Hoff, W., and Vincent, T. (2002). Three-dimensional motion and structure estimation using inertial sensors and computer vision for augmented reality. Presence: Teleoperators and Virtual Environments, Vol. 11, pp. 474-492.

Foxlin, E. (2003). VIS-Tracker: A wearable vision-inertial self-tracker. In Proceedings of IEEE VR2003.

Jacobs, M., Livingston, M., and State, A. (1997). Managing latency in complex augmented reality systems. In Proceedings of the 1997 Symposium on Interactive 3D Graphics.

You, S. and Neumann, U. (2001). Fusion of vision and gyro tracking for robust augmented reality applications. In Proceedings of IEEE VR2001, pp. 71-78.

You, S., Neumann, U., and Azuma, R. (1999). Hybrid inertial and vision tracking for augmented reality registration. In Proceedings of IEEE VR'99, pp. 260-267.

Zhang, Z. (1998). A flexible new technique for camera calibration. Technical Report MSR-TR-98-71.