CATADIOPTRIC MULTIVIEW POSE ESTIMATION

FOR ROBOTIC PICK AND PLACE

Markus Heber, Matthias Rüther and Horst Bischof

Institute for Computer Graphics and Vision, Graz University of Technology, Inffeldgasse 16/II, Graz, Austria

Keywords:

Mirror symmetry, Object pose, Contour matching, Industrial application.

Abstract:

Robotic handling of objects requires exact knowledge of the object pose. In this work, we propose a novel

vision system, allowing robust and accurate pose estimation of objects, which are grasped and held in unknown

pose by an industrial manipulator. For superior robustness, we solely rely on object contour as a visual cue.

We address the apparent problems of object symmetry and ambiguous perspective by acquiring multiple views

of the object cheaply and accurately, through a mirror system. Self-calibration of the mirror setup allows us to

model the mirror geometry and perform metric multiview contour matching with a known 3D model.

1 INTRODUCTION

Automated robotic handling processes rely on exact

knowledge of type and pose of the object to manipu-

late. So the problem of pose estimation is concerned

with determining object position and orientation rela-

tive to a reference coordinate frame. We especially

address the case of uniformly textured, opaque or

specularly reﬂecting objects. Approaches based on

3D reconstruction will fail due to robustness prob-

lems. Here, the only robust geometric cue is the ob-

ject contour, which can be segmented even on trans-

parent objects with specialized illumination. If the 3D

model is known a priori, contour shape matching and

registration techniques are the method of choice, to

avoid laborious appearance teaching.

An early pose estimation approach was introduced by

Phong et al. (Phong et al., 1995). It is based on line

and point correspondences between images of an ob-

ject of different pose. The six extrinsic parameters,

represented by dual quaternions, are estimated by

minimizing a quadratic error function. Their minimization technique is compared with the Newton method and Levenberg-Marquardt optimization. Pose estimation results are compared to ground truth data, as well

as the results of Faugeras and Toscani (Faugeras and

Toscani, 1986).

Byne and Anderson (Byne and Anderson, 1998) in-

troduced a CAD-based method. They combine geo-

metric descriptions of a 3D model, appearance infor-

mation and functional information. Online, they gen-

erate hypotheses from these models, based on either

edge information, or classiﬁcation of surface material

type. Final pose reﬁnement is done by maximizing a

ﬁtting score.

An extensive review of pose estimation methods is

given by Rosenhahn et al. (Rosenhahn et al., 2004).

In (Rosenhahn and Sommer, 2004) Rosenhahn and

Sommer introduced a free-form surface based ap-

proach, where surface models are represented by

three Fourier descriptors. They estimate the corre-

sponding 3D silhouettes and reﬁne the pose using the

iterative closest point algorithm (ICP), introduced by

Zhang (Zhang, 1994). In a subsequent work, Rosen-

hahn et al. (Rosenhahn et al., 2006) compared ICP

and a variational method for shape registration via

level sets. Evaluation results suggest that the variational method is more robust against large pose variations, while ICP is more accurate.

A recent approach to CAD-based pose estimation is

introduced by Ulrich et al. (Ulrich et al., 2009), where

hierarchical views of a CAD model are generated at

multiple scale levels. For shape matching they evalu-

ate a similarity measure based on gradient orientation

differences. Their experiments show considerable ro-

bustness to occlusions, clutter and contrast changes.

Chang et al. (Chang et al., 2009) investigated pose es-

timation and segmentation of specular objects. Spec-

ular reﬂections and specular ﬂow of a known 3D

model are used for localization and pose estimation.

Experiments show the feasibility of their method under sparse highlights and small environmental motion.

Heber M., Rüther M. and Bischof H. (2010). CATADIOPTRIC MULTIVIEW POSE ESTIMATION FOR ROBOTIC PICK AND PLACE. In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 423-426. DOI: 10.5220/0002822704230426. SciTePress.

Most approaches assume the presence of edges or any

kind of local features. Furthermore, object symme-

tries and shape ambiguities are not considered. Our

method in contrast also works with untextured, trans-

parent and shiny objects. They can also have smooth

surface geometry which is not easily approximated by

polyhedral models. In our work, we rely only on ob-

ject contour information. To overcome the problem

of contour symmetry and ambiguous views, we pro-

pose a multi-mirror system in combination with a sin-

gle camera and light source. The result is a cheap,

perfectly synchronized multiview system, which can

be self-calibrated from any known reference object.

Furthermore, our pose estimation procedure is insen-

sitive to local minima, because an exhaustive search

over the space of object contours is performed.

2 CATADIOPTRIC GEOMETRY

A central perspective camera projection matrix is

given by a 3 × 4 matrix P, computed from camera

calibration matrix K, which includes the ﬁve intrin-

sic camera parameters, rotation R and translation t:

P = K[R|t] (1)

A 3D plane is deﬁned as:

$$n^{\top} x + d = 0, \qquad (2)$$

with normal vector n and the distance to the ori-

gin d. Reﬂections in 3D space are Euclidean trans-

formations, which additionally perform orientation

changes. Algebraically, a reflection is given by a 4 × 4 matrix D, which is able to reflect points x, planes Π and cameras P over a reflection plane:

$$x' = D x, \quad \Pi' = D^{-1} \Pi, \quad P' = P D. \qquad (3)$$

Catadioptric systems place reflective devices in a camera's field of view to capture an object from more than

one viewpoint. Additionally, catadioptric stereo has

geometric and radiometric advantages. One geomet-

ric advantage is the reduced number of camera param-

eters. One radiometric advantage is the replication of

light sources due to mirror reﬂections. The relation

between the real camera and its virtual reﬂection is

deﬁned by the mirror reﬂection matrix D, which is

deﬁned by the mirror normal n, the camera-mirror-

distance d and the camera coordinate frame origin

c (Gluckman and Nayar, 2001):

$$D = \begin{bmatrix} I - 2nn^{\top} & c - 2dn \\ 0 & 1 \end{bmatrix} \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}. \qquad (4)$$

A virtual camera is computed from the real camera $P_{real}$ as:

$$P_{virtual} = P_{real} D. \qquad (5)$$
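The relation between the reflection matrix and the virtual camera can be sketched in a few lines. This is a minimal illustration, assuming the camera frame coincides with the world frame as in Section 3.1 (R = I, t = 0, c = 0), so that Eq. 4 reduces to its first factor. The intrinsics, mirror normal and distance below are hypothetical values chosen only for the example.

```python
import numpy as np

def reflection_matrix(n, d):
    """4x4 reflection about the plane n^T x + d = 0 (camera at the origin)."""
    n = np.asarray(n, dtype=float)
    norm = np.linalg.norm(n)
    n, d = n / norm, d / norm          # normalize the plane equation
    D = np.eye(4)
    D[:3, :3] -= 2.0 * np.outer(n, n)  # I - 2 n n^T
    D[:3, 3] = -2.0 * d * n            # c - 2 d n with c = 0
    return D

def virtual_camera(P_real, D):
    """Virtual camera seen through the mirror, P_virtual = P_real D."""
    return P_real @ D

# Hypothetical intrinsics and mirror parameters (not taken from the paper).
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
P_real = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # P = K [I | 0]
D = reflection_matrix(n=[1.0, 0.0, 1.0], d=-200.0)
P_virtual = virtual_camera(P_real, D)

# A point on the mirror plane (x + z = 200) is left unchanged by D.
on_plane = np.array([100.0, 5.0, 100.0, 1.0])
mirrored = D @ on_plane
```

Two properties are worth checking in such a sketch: applying D twice gives the identity, and its determinant is -1, which reflects the orientation change mentioned above.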

3 POSE ESTIMATION

Pose estimation is based on matching measured con-

tours from all mirror views against a set of pre-

generated synthetic contours. Using a known 3D

model, and camera-mirror geometry as obtained from

calibration, the object is rendered in different poses.

From each rendered image, contours are extracted

and added to a database. The set of all rendered im-

ages covers the space of possible object orientations

in front of the camera, sampled in discrete intervals.

Pose estimation subsequently is reduced to an exhaus-

tive search within this database. The classiﬁcation re-

sult is guaranteed to be globally optimal with respect

to the discretized space of orientations.

3.1 System Calibration

We assume the camera projection center to be lo-

cated at the world coordinate origin. Hence, camera

rotation R is the identity matrix and translation t is

zero. Intrinsic calibration is performed as proposed

by Zhang (Zhang, 1999). The mirror calibration pro-

cedure used within our approach is based on the work

of Hu et al. (Hu et al., 2005), where ﬁrst the mir-

ror plane normal n is estimated, followed by camera-

mirror-distance d. For computation of n, two pairs

of corresponding points between real view and each

mirror view are required. These correspondences are

obtained via the object convex hull in the real and mirror views. There are exactly two lines which are tangent to both convex hulls. They are called limitation lines and provide a pair of corresponding points each. Furthermore, their intersection provides the vanishing point vp of n, which coincides with the epipole e of the virtual camera. Mirror normal n is

computed by evaluating the direction of the viewing

ray through e in the image:

$$n = (n_x, n_y, n_z)^{\top} = K^{-1} e. \qquad (6)$$
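As a rough illustration of Eq. 6, the following sketch intersects two lines, each joining a real image point with its mirrored correspondence, and back-projects the intersection through the inverse calibration matrix. It assumes the two correspondences are already given. Finding them via the limitation lines of the convex hulls is not shown. All numeric values and the helper names are hypothetical.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous image line through two points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def mirror_normal(K, pair_a, pair_b):
    """Unit mirror normal (up to sign) from two (real, mirrored) point pairs."""
    e = np.cross(line_through(*pair_a), line_through(*pair_b))  # epipole
    n = np.linalg.solve(K, e)          # viewing ray direction K^{-1} e
    return n / np.linalg.norm(n)

def project(K, X):
    """Pinhole projection of a 3D point given in the camera frame."""
    x = K @ X
    return x[:2] / x[2]

# Synthetic check with a hypothetical mirror plane n^T x + d = 0.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
n_true, d_true = np.array([0.6, 0.0, 0.8]), -500.0

def reflect(X):
    return X - 2.0 * (n_true @ X + d_true) * n_true

X1, X2 = np.array([0.0, 0.0, 400.0]), np.array([50.0, 30.0, 450.0])
pair_a = (project(K, X1), project(K, reflect(X1)))
pair_b = (project(K, X2), project(K, reflect(X2)))
n_est = mirror_normal(K, pair_a, pair_b)
```

Because the epipole is a homogeneous point, the recovered normal is defined only up to sign. A real implementation would pick the sign that places the mirror in front of the camera.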

Camera-mirror-distance d is computed with knowledge of a single object point (x, y) and its mirrored correspondence (x′, y′):

$$d = \frac{\Delta u \, z}{2 (u' - n_x)}, \qquad (7)$$

where (u, v) are normalized image coordinates, and $\Delta u = u' - u$. The nominal distance z between camera center and 3D world point is set to 1, which results

in a system calibration up to an unknown scaling fac-

tor (Hu et al., 2005). In the case of multiple mirrors,

these correspondences cannot be uniquely determined over all views. Hence, an additional point correspondence is established by evaluating the centroid of a sphere. With known sphere radius, the calibration can be upgraded to metric, including exact scale.

VISAPP 2010 - International Conference on Computer Vision Theory and Applications

3.2 Contour Representation and Similarity Metric

The matching process itself needs to be fast and accu-

rate, but it does not have to be scale- or rotation invari-

ant. We therefore choose a simple approach, where

each contour is represented by a set $S_i$ of neighboring contour points $S_i = \{x_1^i, \ldots, x_n^i\}$. Without scale invariance, identical contours have approximately the same number of points, and a similarity metric is computed, using the sum of squared distances of corresponding points:

$$e_{ij} = \sum_{k=1}^{n} \lVert x_k^i - x_k^j \rVert^2, \qquad (8)$$

where $e_{ij}$ is the similarity error between two contours $S_i$ and $S_j$.
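The sum of squared distances between corresponding contour points might be implemented as follows. This is a minimal sketch that pairs points by index and assumes both contours start at a comparable point. Truncating to the shorter contour when the lengths differ slightly is an assumption of this example, not something the text specifies.

```python
import numpy as np

def contour_error(S_i, S_j):
    """Sum of squared distances between index-wise corresponding points."""
    S_i = np.asarray(S_i, dtype=float)
    S_j = np.asarray(S_j, dtype=float)
    n = min(len(S_i), len(S_j))        # assumption: truncate to shorter contour
    diff = S_i[:n] - S_j[:n]
    return float(np.sum(diff ** 2))

# Toy contours: a unit square and the same square shifted by one pixel in x.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 1, y) for x, y in square]
e_same = contour_error(square, square)
e_shift = contour_error(square, shifted)   # four points, each displaced by 1
```

Since the measure is neither translation nor rotation invariant, the shifted square produces a non-zero error, which is the intended behavior when the pose itself is the quantity being estimated.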

3.3 Contour Extraction and Matching

Shape matching is an exhaustive search for the

best matching synthetic contour in all views. The

camera intrinsics, mirror parameters, and a contour

database, which stores the synthetic object con-

tours for all orientations, are given. Experiments

on synthetic contour matching as well as real ob-

ject pose estimation (see Section 4) have shown

that it is sufﬁcient to consider a discrete set of

orientations. Considering the space of all object orientations as the set of all roll, pitch and yaw angles (r, p, y) with r ∈ [0°, 360°], p ∈ [0°, 360°], y ∈ [0°, 360°], we discretize in 12° steps (30 samples per angle, 30³ = 27,000) and have 27k database

entries. Selection of the discretization step depends

on the underlying application as well as available

hardware. Modern graphics cards allow rendering of

six views of a detailed 3D model at over 100fps, and

a database of 27k entries is generated in roughly ﬁve

minutes. To estimate an object pose, the contours are

extracted from an image and compared to entries in

the database, according to the metric in Section 3.2.

To further speed up the matching process, contour

features like length and aspect ratio are used to reject

dissimilar contours early on.
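The database layout and the exhaustive search with early rejection can be sketched as below. The sketch is simplified to a single view per database entry, whereas the described system compares contours from all mirror views. The ellipse contours, the pose labels attached to them, and the 20% length tolerance are hypothetical stand-ins for rendered model contours.

```python
import itertools
import numpy as np

def orientation_grid(step_deg=12):
    """All (roll, pitch, yaw) triples sampled in step_deg increments."""
    angles = range(0, 360, step_deg)
    return list(itertools.product(angles, repeat=3))

def contour_length(S):
    """Perimeter of a closed contour given as an (n, 2) array."""
    S = np.asarray(S, dtype=float)
    return float(np.sum(np.linalg.norm(np.roll(S, -1, axis=0) - S, axis=1)))

def ssd(S_i, S_j):
    """Sum of squared distances between index-wise corresponding points."""
    n = min(len(S_i), len(S_j))
    return float(np.sum((np.asarray(S_i, float)[:n] - np.asarray(S_j, float)[:n]) ** 2))

def best_match(query, database, length_tol=0.2):
    """Exhaustive search over (pose, contour, length) entries."""
    q_len = contour_length(query)
    best_pose, best_err = None, float("inf")
    for pose, contour, c_len in database:
        if abs(c_len - q_len) > length_tol * q_len:   # cheap early rejection
            continue
        err = ssd(query, contour)
        if err < best_err:
            best_pose, best_err = pose, err
    return best_pose, best_err

def ellipse(a, b, n=64):
    """Toy closed contour used in place of a rendered object contour."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.stack([a * np.cos(t), b * np.sin(t)], axis=1)

grid = orientation_grid()              # 30 * 30 * 30 orientations
toy_entries = [((0, 0, 0), ellipse(2.0, 1.0)),
               ((0, 0, 12), ellipse(1.9, 1.1)),
               ((0, 12, 0), ellipse(5.0, 1.0))]
toy_db = [(pose, c, contour_length(c)) for pose, c in toy_entries]
pose, err = best_match(ellipse(1.92, 1.08), toy_db)
```

Storing the contour length alongside each entry keeps the rejection test to a single comparison, so the more expensive point-wise error is only evaluated for plausible candidates.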

4 EXPERIMENTS

We focus on object symmetries and ambiguities, because these are the most challenging cases. Figure 1 shows a typical symmetry case. Our experimental hardware setup consists of a monochrome CCD camera and five planar mirrors. The mirrors are placed transversely in front of the camera. An LED light source illuminates the object coaxially against a defined background to simplify the contour extraction process. Due to the transversal placement of the mirrors, light sources are replicated, which results in approximately diffuse illumination conditions and avoids shading on round object borders.

Figure 1: Two views of a sample test object. Extracted image object contours are similar due to object symmetry.

An object is placed in unknown orientation inside

the catadioptric setup. We evaluated our setup with

three different test objects: (a) a 5 × 10 × 5 mm block, providing a front-back symmetry if only image contours are taken into account, (b) a 5 × 15 × 5 mm slanting block, providing an upside-down symmetry, and

upside-down symmetry.

4.1 Self Calibration

The system is calibrated as described in Section 3.1.

We evaluated the reprojection error (RE) and used it

as a measure of calibration accuracy. Over several calibration runs we achieved an average RE of 0.01 px. This corresponds to a geometric error of 2 µm at a camera-object distance of 350 mm.
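The conversion from a reprojection error in pixels to a geometric error at a given distance can be illustrated with a small-error pinhole approximation. The focal length is not reported in the text, so the sketch below only derives the value that the stated figures would imply under this approximation. Both function names are hypothetical.

```python
def geometric_error_mm(re_px, distance_mm, focal_px):
    """Lateral error at depth distance_mm for an image error of re_px pixels."""
    return re_px * distance_mm / focal_px

def implied_focal_px(re_px, distance_mm, error_mm):
    """Focal length in pixels that relates the three reported quantities."""
    return re_px * distance_mm / error_mm

# Reported values: RE = 0.01 px, geometric error = 2 um, distance = 350 mm.
f_px = implied_focal_px(0.01, 350.0, 0.002)
err_mm = geometric_error_mm(0.01, 350.0, f_px)
```

Under this model the reported numbers are consistent with a focal length of roughly 1750 px. This is an inference from the approximation rather than a calibrated value from the experiments.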

4.2 Pose Estimation

Pose estimation has been evaluated on synthetic contours as well as real objects, with a focus on symmetries that cannot be resolved from a single view.

These include upside-down ﬂips (uds) and front-back

ﬂips (fbs). Additionally, different rotations (rot) have

been evaluated. The upside-down ambiguity can be

resolved with our proposed method. For symmetric blocks like object (a), a front-back symmetry remains, and square blocks would lead to four equivalent poses. In order to evaluate correctness, pose

estimation has also been evaluated on synthetic data.

Synthetic contours with slightly different poses to the

contour database poses were generated. According to

a discretization step of 12°, the residual error should not be greater than 6° for a correct match. As presented in Table 1, our results lie within this expected range. Numerical results on the real test objects are given in Table 2. For each object, (uds), (fbs) as well as different rotations (rot) have been evaluated. For object (a), '?' means a successful match, but a front-back symmetry could not be resolved. Figure 2 shows exemplary matching results.
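The 6° bound follows from snapping each angle to its nearest grid value: the worst case is an angle halfway between two samples. A short sketch, assuming each of the three angles is quantized independently, makes this explicit. The helper names are hypothetical.

```python
def nearest_grid_angle(angle_deg, step_deg=12):
    """Closest grid orientation in degrees, wrapped to [0, 360)."""
    return (round(angle_deg / step_deg) * step_deg) % 360

def angular_residual(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)

# Scan all angles in 0.1 degree increments and record the largest residual.
worst = max(angular_residual(a / 10.0, nearest_grid_angle(a / 10.0))
            for a in range(3600))
```

The scan yields a maximum residual of half the step size, so any per-angle error above 6° would indicate a mismatch rather than a quantization effect.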

Table 1: Results on pose estimation of randomly generated synthetic contours at a discretization step of 12°.

Object   Runs   Avg. rpy errors
(a)      10     r = 5.2°, p = 3.4°, y = 3.5°
(b)      10     r = 6.0°, p = 3.2°, y = 5.4°
                p = 4.6°, y = 4.5°

Table 2: Results on pose estimation of real test objects, where e.g. 1/2 denotes that one out of two runs was correct.

Object   uds   fbs     rot
(a)      5/5   '?'/3   4/4
(b)      4/5   2/3     3/5

Figure 2: Two examples, (a) and (b), of contour matching results. Match results are shown with a translational offset for better visualization.

5 CONCLUSIONS

We have presented a novel method for object pose es-

timation. Our approach beneﬁts from employing mir-

rors in the optical path, due to replicated object illu-

mination, and multiple perfectly synchronized views.

We have shown that object symmetry and ambiguous

perspective are better resolved than with a monocular setup. The problem of object pose estimation was

reduced to an exhaustive search of matching contours.

Experimental results show that this procedure does

not get stuck in local minima. Most object symme-

tries were resolved. Creation and storage of a contour

database is feasible up to a certain discretization step.

Our choice of 12

◦

might not be sufﬁcient to resolve

fine details of some objects, though. Future work includes the intelligent organization of a denser database by clustering of similar views. Furthermore, real objects will be evaluated against ground truth data on their pose. To overcome the discretization error, one could consider further pose refinement by iterative 3D object registration, using methods like ICP.

REFERENCES

Byne, J. and Anderson, J. (1998). A CAD based computer

vision system. IVC, 16(8).

Chang, J. Y., Raskar, R., and Agrawal, A. (2009). 3D pose

estimation and segmentation using specular cues. In

Proc. CVPR 2009, Miami, FL.

Faugeras, O. D. and Toscani, G. (1986). The calibration

problem for stereo. In Proc. CVPR, Miami Beach.

Gluckman, J. and Nayar, S. K. (2001). Catadioptric stereo

using planar mirrors. Int. J. Comput. Vision, 44(1).

Hu, B., Brown, C., and Nelson, R. (2005). Multiple-view

3-D reconstruction using a mirror. Technical report,

University of Rochester.

Phong, T. Q., Horaud, R., Yassine, A., and Tao, P. D. (1995).

Object pose from 2-D to 3-D point and line correspon-

dences. Int. J. Comput. Vision, 15(3).

Rosenhahn, B., Brox, T., Cremers, D., and Seidel, H.-P.

(2006). A comparison of shape matching methods for

contour based pose estimation. Combinatorial Image

Analysis.

Rosenhahn, B., Perwass, C., and Sommer, G. (2004).

CVOnline: Foundations about 2D-3D pose estima-

tion. CVOnline.

Rosenhahn, B. and Sommer, G. (2004). Pose estimation

of free-form objects. In Proc. ECCV 2004, Part I, T.

Pajdla and J. Matas (Eds.).

Ulrich, M., Wiedemann, C., and Steger, C. (2009). CAD-

based recognition of 3D objects in monocular images.

In Proc. ICRA 2009, Kobe, Japan.

Zhang, Z. (1994). Iterative point matching for registration

of free-form curves and surfaces. Int. J. Comput. Vi-

sion, 13(2).

Zhang, Z. (1999). Flexible camera calibration by viewing a

plane from unknown orientations. In Proc. ICCV.
