Particle Filtering for Position based 6DOF Visual Servoing in Industrial
Environments
Aitor Ibarguren, José María Martínez-Otzeta and Iñaki Maurtua
Fundación Tekniker, Apdo. 44 Otaola 20, 20600 Eibar, Gipuzkoa, Spain
Keywords:
Robotics, Visual Servoing, Particle Filter.
Abstract:
Visual servoing allows the introduction of robotic manipulation in dynamic and uncontrolled environments. This paper presents a position-based visual servoing algorithm based on particle filtering. The objective is the grasping of objects with the 6 degrees of freedom of the robot manipulator (position and orientation) in non-automated industrial environments using monocular vision. A particle filter has been added to the position-based visual servoing algorithm to deal with the different noise sources of such environments (the metallic nature of the objects, dirt or illumination problems). This addition makes it possible to handle those uncertainties and to recover from errors in the grasping process. Experiments performed in the real industrial scenario of the ROBOFOOT project showed accurate grasping and a high level of stability in the visual servoing process.
1 INTRODUCTION
Traditional industrial robotic applications, like part
placement or spot welding, require precise informa-
tion about the position of the objects to perform their
task. Visual servoing (Weiss et al., 1987; Hutchinson et al., 1996) can enhance those industrial applications by allowing corrections of the robot trajectories.
Even so, industrial environments raise their own challenges for the inclusion of visual servoing techniques, especially when the production line is not completely automated. Dirt, imprecision in the workpiece placement or changing lighting conditions are some of the problems that must be tackled in this kind of environment, introducing uncertainties in the trajectory correction process.
This paper presents a position-based visual servoing algorithm using particle filtering. Based on the real industrial scenario of the ROBOFOOT project, the paper proposes an algorithm to grasp a workpiece (specifically a shoe last) in an unconstrained workshop, correcting the 6 degrees of freedom of the robot during the visual servoing process.
The paper is organized as follows. Section 2 presents the related work. Section 3 briefly introduces particle filters. Task specification and configuration are described in Section 4. Section 5 is devoted to the proposed approach, while Section 6 presents the experimental results. Finally, Section 7 draws the conclusions as well as the future work.
2 RELATED WORK
Several approaches tackle the use of visual servoing
in industrial environments, posing different industrial
scenarios and approaches.
Sung-Hyun et al. (Han et al., 1999) propose an image-based visual servoing approach based on stereo vision. The use of stereo vision allows guiding the robot manipulator to the desired location without requiring prior knowledge of the relative distance to that location or a model of the object.
Nomura et al. (Nomura and Naito, 2000) describe
a visual servoing system able to track and grasp in-
dustrial parts moving on a conveyor using a 6DOF
robot arm. A hybrid Kalman Filter is also incorpo-
rated to track a moving object stably against visual
data noise. Experiments are also presented, perform-
ing both 3DOF and 6DOF visual servoing.
Finally, Lippiello et al. (Lippiello et al., 2007) presented visual servoing applications in industrial robotic cells. In their setup, composed of two industrial robot manipulators equipped with pneumatic grippers, vision systems and a belt conveyor, a position-based visual servoing is proposed. The system also uses Extended Kalman Filters (EKF) (Julier and Uhlmann, 2004) to manage the occlusions during the multi-arm manipulation.
3 PARTICLE FILTER
Particle filters (A. Doucet and Gordon, 2001; Kotecha and Djuric, 2003), also known as sequential Monte Carlo (SMC) methods, are sequential estimation techniques that allow estimating unknown states $x_t$ from a collection of observations $z_{1:t} = \{z_1, \ldots, z_t\}$. The state-space model is usually described by the state transition and measurement equations

$$x_t = f_t(x_{t-1}, v_{t-1}) \qquad (1)$$

$$z_t = g_t(x_t, u_t) \qquad (2)$$

where $f$ and $g$ are the state evolution and observation model functions respectively, and $v_t$ and $u_t$ denote the process and observation noise respectively.
Based on the previous equations, particle filters approximate the posterior density function (PDF) by means of a set of particles $\{x_t^{(i)}\}_{i=1,\ldots,N}$ using the equation

$$p(x_t \mid z_{1:t}) = \sum_{i=1}^{N} \omega_t^{(i)} \, \delta(x_t - x_t^{(i)}) \qquad (3)$$

where each particle $x_t^{(i)}$ has an associated importance weight $\omega_t^{(i)}$ and $\delta$ is the Dirac delta function. These weights are computed following the equation

$$\omega_t^{(i)} = \omega_{t-1}^{(i)} \, \frac{p(z_t \mid x_t^{(i)}) \, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid x_{0:t-1}^{(i)}, z_{0:t})} \qquad (4)$$

where $p(z_t \mid x_t^{(i)})$ is the likelihood function of the measurements $z_t$ and $q(x_t^{(i)} \mid x_{0:t-1}^{(i)}, z_{0:t})$ is the proposal density function.
Based on these equations, the particle set evolves over time, updating the weights of the particles and resampling them according to the observations.
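To make the generic loop concrete, the following minimal Python sketch (ours, not from the paper; NumPy only) implements one SMC update using the transition prior as proposal density, in which case the weight update of equation (4) reduces to multiplication by the likelihood:

```python
import numpy as np

def particle_filter_step(particles, weights, transition, likelihood, z_t, rng):
    """One generic SMC update: predict, reweight, resample (eqs. 1-4).

    Uses the transition prior as proposal density, so the weight update
    reduces to w_t = w_{t-1} * p(z_t | x_t).
    """
    # Predict: propagate each particle through the state transition model.
    particles = np.array([transition(x, rng) for x in particles])
    # Update: reweight by the measurement likelihood p(z_t | x_t).
    weights = weights * np.array([likelihood(z_t, x) for x in particles])
    weights /= weights.sum()
    # Resample (multinomial) to mitigate weight degeneracy.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```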
4 TASK SPECIFICATION AND
CONFIGURATION
Based on the needs of the ROBOFOOT project, an object grasping task has been designed using real specifications of footwear workshops. The grasping scenario is specified as follows:
- Lasts, foot-shaped forms used to build shoes, are the objects to be grasped. An iron piece (grasping device) has been added to the lasts to allow a precise and stiff grasping (see Fig. 1), as well as to protect the leather during the grasping process. These grasping devices are the objects to be identified during the visual servoing process.

Figure 1: Lasts with the grasping device on the trolley.

- Lasts are carried in specific trolleys mounted on a manovia. The trolleys are designed to allow the placement of lasts of different shapes and sizes.
- Lasts are placed in the trolley by human operators. Due to the previous facts it is not possible to know the pose of the last in the trolley, as seen in Fig. 1.
- A 6DOF robot arm is used, with a gripper and a camera and lighting system mounted on the end-effector in an eye-in-hand configuration.
- Based on the design of the gripper and the grasping device, the grasping process requires a precision of around one millimeter and 1-2 degrees on each axis to grasp the last smoothly. Likewise, the maneuver should take no more than 5-6 seconds.
Based on this scenario, the initial set-up of the system has raised some problems related to the pose estimation of the grasping device:
- Illumination is a key aspect in a vision system. In this industrial scenario it is complicated to place a suitable external illumination, which is why it was decided to put a specific lighting system on the gripper. Even so, the metallic nature of the grasping device makes it difficult to get a good image due to brightness, reflections and the impossibility of lighting the whole image properly.
- Some of the tasks performed by both the human operators and the robots involve the use of ink or wax, or generate dust (roughing process). This dirt can adhere to the grasping device, complicating the visual servoing process.
These points make it difficult to acquire clear images of the grasping device, adding uncertainties to the 6DOF pose estimation that is the basis of the visual servoing process.
ICINCO2012-9thInternationalConferenceonInformaticsinControl,AutomationandRobotics
162
5 PROPOSED APPROACH
Taking into account the needed precision, which implies a high resolution camera, and the demanding image processing due to the unstable image conditions, a dynamic look-and-move approach has been adopted. A particle filter has also been added to the system to manage the uncertainties of the vision system.

The following subsections describe the general structure of the system, as well as the vision module, the pose estimation, the particle filter and the grasping algorithm.
5.1 System Modeling and Architecture
In the described scenario, a pose can be represented by $P \in \mathbb{R}^6$, a set of three positions and three orientations, where $P = [x, y, z, \alpha, \beta, \gamma]^T$. In the same way, this scenario is composed of two different frames, the robot frame $r$ and the camera frame $c$. Given those two frames, the homogeneous transformation matrix, denoted by ${}^{r}T_{c}$, transforms poses from frame $c$ to frame $r$ as:

$$P^r = {}^{r}T_{c} \, P^c \qquad (5)$$
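As a small illustration (ours, not the paper's code), equation (5) applied to the positional part of a pose looks as follows; a full 6DOF transform would additionally compose the rotations:

```python
import numpy as np

def transform_pose(T_rc, P_c):
    """Transform a pose from camera frame c to robot frame r (eq. 5).

    T_rc : 4x4 homogeneous transformation matrix from frame c to frame r.
    P_c  : pose [x, y, z, alpha, beta, gamma] in the camera frame.
    Only the positional part is transformed here; the orientations would
    additionally require composing the rotation matrices.
    """
    xyz_h = np.append(P_c[:3], 1.0)   # homogeneous coordinates
    xyz_r = (T_rc @ xyz_h)[:3]        # rotate and translate
    return np.concatenate([xyz_r, P_c[3:]])
```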
The error of the positioning task involved in the grasping process is represented by a vector $E \in \mathbb{R}^6$, which represents the difference between the pose of the object $P_o^r$ and the pose of the end-effector $P_e^r$, both in the robot frame (6). The grasping process can be seen as a minimization of this error, which is fulfilled when $|E| = 0$.

$$E = P_e^r - P_o^r = [x_e^r - x_o^r,\; y_e^r - y_o^r,\; z_e^r - z_o^r,\; \alpha_e^r - \alpha_o^r,\; \beta_e^r - \beta_o^r,\; \gamma_e^r - \gamma_o^r]^T \qquad (6)$$
For pose estimation, position-based visual servoing systems extract features from the acquired images, estimate the pose of the object $P_o^r$ and perform the corrections. However, the described scenario introduces uncertainties in the feature extraction step (illumination, metallic workpiece...), introducing errors in the pose estimation. To deal with this problem, the use of a particle filter is proposed. From each image, a set of $n$ feature vectors $F_i = \{f_1, f_2, \ldots, f_m\}_{i=1\ldots n}$ is extracted for the pose estimation, each of them related to a specific image analysis procedure. Each of those $n$ vectors is a hypothesis of the values of the $m$ features used for the pose estimation, as it is not possible to extract a unique feature vector from each image due to the uncertainties in the image.

From each feature vector $F_i$, $P_{o_i}^c$ and $P_{o_i}^r$ are calculated as

$$P_{o_i}^c = PE(F_i) \qquad (7)$$

$$P_{o_i}^r = {}^{r}T_{c} \, P_{o_i}^c \qquad (8)$$

where $P_{o_i}^c$ is the i-th hypothesis of the pose of the object in the camera frame, $P_{o_i}^r$ is the i-th hypothesis of the pose of the object in the robot frame, and the pose estimation function $PE$ relates a set of features to a pose of the object in the camera frame.
Figure 2: Dynamic position based look-and-move structure
with particle filter.
Those $n$ poses $P_{o_i}^r$, $i = 1..n$, are the observations of the particle filter, which outputs the final pose estimate of the object in the robot frame $P_o^r$. This final pose is used to calculate the error $E$ between the object and the end-effector, which in turn determines the next robot movement. Fig. 2 shows the structure of the proposed visual servoing system.
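A minimal sketch of one cycle of the loop in Fig. 2 (ours; `camera`, `robot` and `pf` are hypothetical module objects, and the helper functions are sketched in the surrounding sections):

```python
import numpy as np

def look_and_move_iteration(camera, robot, pf, T_rc, target_E, tol):
    """One iteration of the dynamic position-based look-and-move loop (Fig. 2)."""
    image = camera.acquire()
    hypotheses = extract_feature_vectors(image)            # Section 5.2
    poses_c = [pose_estimation(F) for F in hypotheses]     # eq. (7)
    poses_r = [transform_pose(T_rc, P) for P in poses_c]   # eq. (8)
    P_o = pf.update(poses_r)                               # Section 5.4
    E = np.asarray(robot.end_effector_pose()) - P_o        # eq. (6)
    if np.all(np.abs(E - target_E) <= tol):
        return True                                        # ready to grasp
    robot.move_relative(target_E - E)                      # reduce the error
    return False
```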
The following subsections describe the feature extraction, pose estimation, particle filtering and grasping algorithm of the grasping process.
5.2 Feature Extraction
As stated before, one of the challenges of the pre-
sented scenario is the feature extraction for pose es-
timation. The metallic nature of the grasping device
and the illumination problems make it difficult to de-
tect the different features (edges, corners, holes) pre-
cisely. Taking also into account the perspective of the
camera through the grasping process, the image fea-
tures used for pose estimation, shown in Fig. 3, are:
- The centers of the three holes (1, 2, 3) of the grasping device. Only the pixels of the hole centers are used, excluding the size and dimensions of the holes, due to the difficulty of extracting their contours precisely.
- The inclination of the left edge (4) of the grasping device.
Figure 3: Visual features for pose estimation.

To detect those image features, different thresholds, edge detection algorithms and filters are used. Even so, in some images it is not possible to determine the exact position of the three hole centers, as there are various possible circular shapes at each position (e.g. the inner screw, the outer circle and the dirt around it). In those cases it is not possible to define a universal rule to determine which is the real contour of the hole. To overcome this problem, this approach proposes to use all the possible centers of the three holes (left, central and right), creating a set of hypotheses that will be used for pose estimation.
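As an illustrative sketch only (OpenCV-based; the detector choice and parameter values are our assumptions, not the paper's exact pipeline), all plausible circle centers can be collected so that every hypothesis survives to the pose estimation step:

```python
import cv2

def candidate_hole_centers(gray):
    """Return all plausible circle centers found in a grayscale image.

    Keeping every candidate (inner screw, outer circle, dirt rings) lets
    the particle filter arbitrate between hypotheses instead of forcing
    a single, possibly wrong, choice here.
    """
    blurred = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=10, param1=100, param2=30,
                               minRadius=5, maxRadius=60)
    if circles is None:
        return []
    return [(int(x), int(y)) for x, y, r in circles[0]]
```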
Once the centers of the holes and the left edge are detected, a feature vector is calculated for each hypothesis as:

$$F_i = \{c_2, d_{12}, d_{13}, d_{23}, \phi_{12}, \phi_{13}, \phi_{23}, \phi_{edge}, \lambda\} \qquad (9)$$

where $c_2$ is the coordinate in pixels of the central hole, $d_{ij}$ is the distance in pixels between holes $i$ and $j$, $\phi_{ij}$ is the angle between holes $i$ and $j$, $\phi_{edge}$ is the angle of the left side of the grasping device, and $\lambda$ is a coefficient that measures the noise (quality) of the hypothesis based on the similarity of the circular shapes and their alignment, calculated as

$$\lambda = \frac{C_v(p_1, p_2, p_3) + \dfrac{|\phi_{12} - \phi_{13}| + 1}{|\phi_{13}| + 1}}{\min(r_1^{xy}, r_2^{xy}, r_3^{xy})} \qquad (10)$$

where $p_i$ is the perimeter of the i-th hole, $C_v$ is the coefficient of variation of the perimeters, and $r_i^{xy}$ is the x-y axis ratio of the bounding box of the i-th hole.

These are the features used to estimate the pose of the workpiece.
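A direct transcription of equation (10) as a Python sketch (variable names are ours):

```python
import numpy as np

def noise_coefficient(perimeters, phi_12, phi_13, ratios):
    """Quality coefficient lambda of a hypothesis (eq. 10).

    perimeters : perimeters (p1, p2, p3) of the three detected holes.
    phi_12/13  : angles between holes 1-2 and 1-3.
    ratios     : x-y bounding-box aspect ratios of the three holes.
    Lower values indicate similar, well-aligned, round holes.
    """
    p = np.asarray(perimeters, dtype=float)
    cv = p.std() / p.mean()                              # coefficient of variation
    alignment = (abs(phi_12 - phi_13) + 1) / (abs(phi_13) + 1)
    return (cv + alignment) / min(ratios)
```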
5.3 Pose Estimation
Once the image is analyzed and the features are extracted, the pose of the object in the camera frame is calculated for each of the possible hypotheses as

$$P_{o_i}^c = [x_i, y_i, z_i, \alpha_i, \beta_i, \gamma_i]^T = PE(F_i) \qquad (11)$$
where each position and orientation is a quadratic function of some of the features. The coefficients of the quadratic functions are omitted from the paper as they are related to the size of the grasping device and the aberration of the lens.

$$PE(F_i) = \begin{cases} x_i = f(c_2, d_{12}, d_{13}, d_{23}) \\ y_i = g(c_2, d_{12}, d_{13}, d_{23}) \\ z_i = h(d_{12}, d_{13}, d_{23}) \\ \alpha_i = j(\phi_{12}, \phi_{13}, \phi_{23}) \\ \beta_i = k(d_{12}, d_{23}) \\ \gamma_i = l(\phi_{edge}) \end{cases} \qquad (12)$$
Once the hypothetical poses of the object in the camera frame $P_{o_i}^c$ are estimated, the poses in the robot frame $P_{o_i}^r$ are calculated using the homogeneous transformation matrix ${}^{r}T_{c}$. Those hypotheses are the observations of the particle filter.
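Since the coefficients of the quadratic functions in equation (12) are omitted from the paper, the sketch below only illustrates the shape of such a calibrated mapping; the `calib` structure and feature packing are placeholders of ours:

```python
import numpy as np

def quadratic(coeffs, features):
    """Evaluate a generic quadratic function c0 + c1.f + c2.f^2."""
    f = np.asarray(features, dtype=float)
    c0, c1, c2 = coeffs
    return c0 + np.dot(c1, f) + np.dot(c2, f ** 2)

def pose_estimation(F, calib):
    """Map a feature vector to a camera-frame pose (eqs. 11-12, sketch).

    F is a dict with the features of eq. (9); calib holds per-component
    quadratic coefficients obtained offline (placeholder structure).
    """
    d = [F["d12"], F["d13"], F["d23"]]
    phi = [F["phi12"], F["phi13"], F["phi23"]]
    x = quadratic(calib["x"], list(F["c2"]) + d)
    y = quadratic(calib["y"], list(F["c2"]) + d)
    z = quadratic(calib["z"], d)
    alpha = quadratic(calib["alpha"], phi)
    beta = quadratic(calib["beta"], [F["d12"], F["d23"]])
    gamma = quadratic(calib["gamma"], [F["phi_edge"]])
    return np.array([x, y, z, alpha, beta, gamma])
```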
5.4 Particle Filter
Once the possible hypotheses are calculated, it is necessary to merge and fuse this information to perform the grasping process. To this end a particle filter is proposed, as it fits this kind of non-Gaussian problem.

Focusing on the posed problem, the state at time $t$ is defined as a pose of the object in the robot frame

$$X_t = [x_t, y_t, z_t, \alpha_t, \beta_t, \gamma_t]^T \qquad (13)$$

As it is not possible to model the pose estimation error a priori, the state transition is defined as

$$X_t = X_{t-1} + V_{t-1} \qquad (14)$$

where $X_{t-1}$ is the previous state vector and $V_{t-1}$ is the process noise.

The observation, on the other hand, is defined by the set of hypothetical poses of the object in the robot frame

$$Z_t = P_{o_{i=1..n}}^r \qquad (15)$$

Based on this information source, each particle is weighted by the probability $P(X_t \mid Z_t)$.
5.4.1 Probability
To calculate the probability of a state given an observation, first the distances between the poses are calculated as

$$distPos_i = \sqrt{(x_t - x_i)^2 + (y_t - y_i)^2 + (z_t - z_i)^2} \qquad (16)$$

$$distAng_i = \sqrt{(\alpha_t - \alpha_i)^2 + (\beta_t - \beta_i)^2 + (\gamma_t - \gamma_i)^2} \qquad (17)$$

where $distPos_i$ is the Euclidean distance between the $x$, $y$ and $z$ positions of the state at time $t$ and the i-th hypothesis, and $distAng_i$ is the Euclidean distance between the $\alpha$, $\beta$ and $\gamma$ orientations of the state at time $t$ and the i-th hypothesis.

$$P(X_t \mid Z_t) = \prod_{i=1}^{n} e^{-distPos_i \cdot distAng_i \cdot (1 + \lambda_i)} \qquad (18)$$

Based on these distances, the probability of the state is calculated as the product of the exponentials of the distances over all the hypotheses, weighted by the $\lambda_i$ noise coefficient.
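Equations (16)-(18) translate directly into a per-particle weight computation; in the sketch below (ours) the negative sign of the exponent is inferred, since larger distances must yield lower probability:

```python
import numpy as np

def particle_probability(state, hypotheses, lambdas):
    """P(X_t | Z_t) for one particle (eqs. 16-18).

    state      : particle pose [x, y, z, alpha, beta, gamma].
    hypotheses : array (n, 6) of hypothetical object poses.
    lambdas    : noise coefficients of the n hypotheses (eq. 10).
    """
    diff = np.asarray(hypotheses) - np.asarray(state)
    dist_pos = np.linalg.norm(diff[:, :3], axis=1)   # eq. (16)
    dist_ang = np.linalg.norm(diff[:, 3:], axis=1)   # eq. (17)
    # Product over hypotheses of exp(-distPos * distAng * (1 + lambda)).
    return np.exp(-(dist_pos * dist_ang * (1 + np.asarray(lambdas))).sum())
```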
ICINCO2012-9thInternationalConferenceonInformaticsinControl,AutomationandRobotics
164
5.4.2 Particle Filtering Procedure
Finally, the procedure of the particle filter is given as:

1: Find the grasping device in the initial image and initialise $N$ particles $X_0^{(i)}$ randomly with the different hypotheses, where $w_0^{(i)} = 1/N$
2: If $ESS < threshold$ (Effective Sample Size), draw $N$ samples with selection with replacement
3: Predict $x_t^{(i)} = x_{t-1}^{(i)} + v_{t-1}$
4: Replace the particles with the lowest weights by the new hypotheses found in the image
5: Update importance weights $w_t^{(i)} = w_{t-1}^{(i)} P(X_t \mid Z_t)$
6: Normalize weights $w_t^{(i)} = w_t^{(i)} / \sum_{j=1}^{N} w_t^{(j)}$
7: Set $t = t + 1$, go to Step 2

In this procedure the ESS (Effective Sample Size) (Liu et al., 2000) is calculated as

$$cv_t^2 = \frac{var(w_t^{(i)})}{E^2(w_t^{(i)})} = \frac{1}{N} \sum_{i=1}^{N} (N w_t^{(i)} - 1)^2 \qquad (19)$$

$$ESS_t = \frac{N}{1 + cv_t^2} \qquad (20)$$

where $N$ is the number of particles and $w_t^{(i)}$ is the weight of particle $i$ at time $t$. The fourth step has been added to allow a fast convergence.

Based on this discrete approximation of the posterior probability, the object is tracked along the grasping procedure.
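Putting steps 1-7 together with equations (19)-(20), the tracking loop can be sketched as follows (our own transcription; `hypothesis_stream` and the reuse of `particle_probability` from the previous sketch are assumptions):

```python
import numpy as np

def effective_sample_size(w):
    """ESS of eqs. (19)-(20)."""
    n = len(w)
    cv2 = np.sum((n * w - 1.0) ** 2) / n
    return n / (1.0 + cv2)

def track_object(hypothesis_stream, n_particles, noise_std, ess_threshold, rng):
    """Particle filtering procedure of Section 5.4.2 (sketch).

    hypothesis_stream yields, per image, (poses, lambdas) observation pairs.
    """
    # Step 1: initialise particles from the first image's hypotheses.
    poses, lambdas = next(hypothesis_stream)
    idx = rng.choice(len(poses), size=n_particles)
    particles = np.asarray(poses)[idx].astype(float)
    weights = np.full(n_particles, 1.0 / n_particles)

    for poses, lambdas in hypothesis_stream:
        # Step 2: resample with replacement if the ESS drops too low.
        if effective_sample_size(weights) < ess_threshold:
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)
        # Step 3: predict with additive process noise (eq. 14).
        particles = particles + rng.normal(0.0, noise_std, particles.shape)
        # Step 4: replace the worst particles by the new image hypotheses.
        worst = np.argsort(weights)[:len(poses)]
        particles[worst] = poses
        # Step 5: update importance weights with P(X_t | Z_t) (eq. 18).
        weights = weights * np.array(
            [particle_probability(x, poses, lambdas) for x in particles])
        # Step 6: normalize.
        weights /= weights.sum()
        yield particles, weights
```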
5.5 Grasping Algorithm
The feature extraction step has shown that the best images are acquired when the camera is perpendicular to the grasping device and the end-effector is at a 30 mm distance, $E = [0, 0, 30, 0, 0, 0]^T$, as this partly solves the illumination problems. Taking this into account, the grasping algorithm tries to minimize the error until this value is reached, adding a small tolerance of $\pm 1$ mm in position and $\pm 1.5°$ in orientation to avoid an infinite loop. Once this error is reached, the robot makes a final approach along a single axis and performs the grasping.
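A minimal driver for the whole maneuver, reusing the look-and-move sketch from Section 5.1 (the final approach along the camera axis and the helper names are our assumptions):

```python
import numpy as np

# Target error and tolerances from the text: camera perpendicular to the
# grasping device, end-effector at 30 mm (Section 5.5).
TARGET_E = np.array([0.0, 0.0, 30.0, 0.0, 0.0, 0.0])
TOL = np.array([1.0, 1.0, 1.0, 1.5, 1.5, 1.5])  # mm and degrees

def grasp(camera, robot, pf, T_rc):
    """Drive the visual servoing loop to tolerance, then grasp (sketch)."""
    while not look_and_move_iteration(camera, robot, pf, T_rc, TARGET_E, TOL):
        pass
    robot.move_relative([0.0, 0.0, -30.0, 0.0, 0.0, 0.0])  # final approach
    robot.close_gripper()
```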
6 EXPERIMENTAL RESULTS
To test the performance of the proposed approach, an experiment has been designed in order to measure its suitability. These are the specifications of the experiment:

- Six different particle filter configurations have been set up, mixing different state estimation methods and numbers of particles. Specifically, the state estimation methods are (see the sketch after this list):
  - Best particle (the one with maximum weight)
  - Robust mean of the 3 particles with maximum weight (denoted as R.M. 3)
  - Robust mean of the 5 particles with maximum weight (denoted as R.M. 5)
- For each configuration, 250 repetitions have been performed using different shoes and grasping devices. Most of the grasping devices have been dirtied in order to include variety and simulate real conditions. Due to the structure of the manovia, the grasping device is placed in a space of 200x100x200 mm (depending on the shoe and its placement) and with a rotation of ±15° in each axis.
- The grasping process fails if it does not manage to pick up the shoe. There are two reasons for this failure: either the grasping device has not been found in the initial image (e.g. not well illuminated due to its orientation), or a wrong pose estimation leads to a movement that leaves the grasping device out of the scope of the camera.
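As referenced in the configuration list above, the three state estimation methods can be sketched as follows (our transcription; function names are hypothetical):

```python
import numpy as np

def estimate_pose(particles, weights, method="rm5"):
    """Extract a single pose estimate from the weighted particle set.

    'best' : particle with maximum weight.
    'rm3'  : robust mean of the 3 highest-weight particles (R.M. 3).
    'rm5'  : robust mean of the 5 highest-weight particles (R.M. 5).
    """
    order = np.argsort(weights)[::-1]  # indices by descending weight
    if method == "best":
        return particles[order[0]]
    k = 3 if method == "rm3" else 5
    return particles[order[:k]].mean(axis=0)
```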
Table 1 shows the results of the experiment. The first column describes the number of particles and the estimation method, the second one the success rate, the third and fourth columns show the mean (µ) and the standard deviation (σ) of the grasping time in seconds, the fifth and sixth columns the mean (µ) and the standard deviation (σ) of the number of movements required to grasp the shoe, and finally the last column shows the time required to process each cycle of the particle filter in milliseconds.

Table 1: Results of the experiment.

Configuration  | %    | Time µ (s) | Time σ (s) | Mov. µ | Mov. σ | ms/image
50 - Best      | 96.4 | 4.84       | 1.79       | 10.39  | 4.45   | 127.53
100 - Best     | 95.6 | 4.94       | 2.03       | 10.41  | 5.34   | 133.50
50 - R.M. 3    | 97.2 | 4.96       | 1.85       | 10.36  | 4.99   | 133.79
100 - R.M. 3   | 98.0 | 4.98       | 1.71       | 10.27  | 4.68   | 136.71
50 - R.M. 5    | 99.6 | 4.78       | 1.94       | 10.25  | 5.05   | 123.03
100 - R.M. 5   | 99.2 | 4.69       | 1.65       | 9.89   | 4.14   | 127.94
Results show a better performance of the system using the robust mean estimation method with 5 particles, both in success rate and in grasping time. Regarding the success rate, in all the configurations some of the failures were related to the search for the grasping device in the initial image (roughly the same quantity for each configuration, around 1-2%).
ParticleFilteringforPositionbased6DOFVisualServoinginIndustrialEnvironments
165
In the same way, it seems that adding more particles does not improve the success rates, although it does not significantly increase the processing time of each visual servoing iteration either.
7 CONCLUSIONS AND FUTURE
WORK
This paper presents a dynamic position-based look-and-move architecture to perform 6DOF visual servoing in industrial environments. This kind of environment usually suffers from unstable conditions like changing lighting or dirt, introducing uncertainties in the visual servoing process. To overcome this problem, this paper proposes the use of a particle filter to manage multiple hypotheses of the pose of the workpiece to grasp.

The results show a high success rate of the grasping system, reaching around 99% success in the different experiments performed. The use of particle filtering allows using and fusing various hypotheses, overcoming the noise problems of the presented scenario. The system also performs the grasping process in a suitable time, with the particle filter adding little processing time.

As future work, there are two interesting paths to follow. On one hand, testing this approach in similar scenarios (different workpieces, environments, noise sources...) to assess its suitability. On the other hand, one or more sensors could be attached to the end-effector as new data sources, using the particle filter to fuse the information received from the different sources, as done in other robotic applications.
ACKNOWLEDGEMENTS
This work has been performed within the scope of the project "ROBOFOOT: Smart robotics for high added value footwear industry". ROBOFOOT is a Small or Medium-scale focused research project supported by the European Commission in the 7th Framework Programme (260159). For further information see http://www.robofoot.eu
REFERENCES
Doucet, A., De Freitas, N., and Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. Springer-Verlag.

Han, S.-H., Seo, W., Yoon, K., and Lee, M.-H. (1999). Real-time control of an industrial robot using image-based visual servoing. In Intelligent Robots and Systems, 1999 (IROS '99), Proceedings of the IEEE/RSJ International Conference on, volume 3, pages 1762-1767.

Hutchinson, S., Hager, G., and Corke, P. (1996). A tutorial on visual servo control. IEEE Transactions on Robotics and Automation, 12(5):651-670.

Julier, S. and Uhlmann, J. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3):401-422.

Kotecha, J. and Djuric, P. (2003). Gaussian particle filtering. IEEE Transactions on Signal Processing, 51(10):2592-2601.

Lippiello, V., Siciliano, B., and Villani, L. (2007). Position-based visual servoing in industrial multirobot cells using a hybrid camera configuration. IEEE Transactions on Robotics, 23(1):73-86.

Liu, J., Chen, R., and Logvinenko, T. (2000). A theoretical framework for sequential importance sampling and resampling. In Sequential Monte Carlo Methods in Practice, pages 1-24.

Nomura, H. and Naito, T. (2000). Integrated visual servoing system to grasp industrial parts moving on conveyer by controlling 6DOF arm. In Systems, Man, and Cybernetics, 2000 IEEE International Conference on, volume 3, pages 1768-1775.

Weiss, L., Sanderson, A. C., and Neuman, C. P. (1987). Dynamic sensor-based control of robots with visual feedback. IEEE Journal on Robotics and Automation, RA-3(5).
ICINCO2012-9thInternationalConferenceonInformaticsinControl,AutomationandRobotics
166