Novel Pose Estimation System for Precise Robotic Manipulation in
Unstructured Environment
Mario di Castro 1,2, Jorge Camarero Vera 1, Alessandro Masi 2 and Manuel Ferre Pérez 1
1 Centre for Automation and Robotics UPM-CSIC, Madrid, Spain
2 CERN, European Organization for Nuclear Research, Meyrin, Switzerland
Keywords:
Mobile Robots and Intelligent Autonomous Systems, Virtual Environment, Virtual and Augmented Reality,
Perception and Awareness.
Abstract:
Intelligent robotic systems are becoming essential for industry and harsh environments, such as the CERN ac-
celerator complex. Aiming to increase safety and machine availability, robots can help perform repetitive and
dangerous tasks, which humans either prefer to avoid or are unable to do because of hazards, size constraints,
or the extreme environments in which they take place, such as outer space or radioactive experimental areas.
A fundamental part of intelligent robots is the perception of the environment, which can be obtained only by knowing the 6D pose of the objects around the robotic system. In this paper, we present a novel algorithm to estimate the 6D pose of an object that is going to be manipulated by a robot. The proposed algorithm works consistently in unstructured and harsh environments presenting several constraints, such as variable luminosity, difficult accessibility and light reflections. The algorithm detects the position and rotation of an object using 3D cameras. The procedure has been developed using the Point Cloud Library to manage the point cloud created with an RGBD camera. The position and rotation of an object are useful in augmented reality systems to help the tele-operator and for the realization of autonomous or semi-autonomous tasks.
1 INTRODUCTION
Remotely controlled and autonomous mobile robots, able to carry out maintenance work and inspections, are nowadays considered for applications in hostile and hazardous environments in order to reduce human interventions. The mission of tele-robotics at the European Organization for Nuclear Research (CERN) can be summarized as follows: ensuring the safety of personnel while improving the availability of CERN's accelerators. The robots being developed for harsh and unstructured environments should offer visual capabilities, among them the capability to estimate the 6D pose of an object. CERN has identified this need and is in the process of developing several devices for remote inspection, radiation monitoring and machinery maintenance in order to minimize the exposure of personnel to hazards.
There are several challenges to face in deploying tele-operated semi-autonomous systems in harsh environments, such as those found at CERN. Examples of constraints at CERN are the huge spaces or distances that a robot may have to travel, such as the 27 km of the Large Hadron Collider (LHC), the presence of unpredictable obstacles in the robot's path, poor light conditions, communication difficulties especially in the underground areas, unknown environments in areas that are not reachable by humans, radiation and magnetic disturbances that may alter the behavior of hardware components, and so on.
Vision-based 6D pose estimation is still an open problem, despite the enormous advances in computer vision in recent years due to the introduction of deep learning techniques. There are several state-of-the-art algorithms to estimate the 6D pose: LSD-SLAM (Engel et al., 2014), DSO (Engel et al., 2016), LINEMOD (Hinterstoisser et al., 2011), PWP3D (Prisacariu and Reid, 2009) and Sliding Shapes (Song and Xiao, 2014). We have extensively tested these solutions without positive results, mainly due to light reflections caused by metallic objects.
Modern deep learning provides a very powerful
framework for supervised learning. By adding more
layers and more units within a layer, a deep network
can represent functions of increasing complexity.
Deep learning algorithms for 6D pose estimation
have been used in the Amazon Picking Challenge
(APC) with very good results (Zeng et al., 2016). They have also been used to estimate the 6D pose of furniture in rooms (Song and Xiao, 2016). For the training phase, deep learning based solutions need high computing power that cannot be provided on compact mobile robots. In addition, these solutions need training conditions that cannot be obtained in unstructured and harsh environments.
State-of-the-art algorithms (LINEMOD, PWP3D, Sliding Shapes) were tested with negative results: the algorithms were unable to detect the position of the targets due to noise and holes (reflections) in the point cloud, present especially in tests done with metallic materials.
In this paper we present a novel computer vision algorithm for the 6D pose estimation of objects, including metallic ones, that are going to be tele-manipulated by robots. The algorithm works consistently in unstructured and harsh environments presenting several constraints, such as variable luminosity, difficult accessibility and light reflections.
2 SYSTEM OVERVIEW
In recent years, RGBD cameras have improved considerably since the first Kinect camera. We have tested several RGBD cameras: Intel RealSense R200 (Intel, a), Intel RealSense SR300 (Intel, b), Orbbec Astra Pro (Orbbec, a), Kinect and Kinect V2 (Microsoft). For the development of the proposed work we decided to use the Orbbec Astra Pro (Orbbec, b), due to its low noise level, which is suitable for metallic objects, and its dimensions, which are appropriate for mechanical integration on a robotic gripper.
For processing power, an Intel Core i7-3630QM (4 cores/8 threads at 2.4 GHz, 76.8 GFLOPS) was used. The proposed algorithm needs approximately 120 MB of RAM and does not require a graphics card.
The vision system has been integrated on a robot developed at CERN with a mobile base and a Schunk Powerball arm, Figure 1. For the management of the robot and its multiple systems, a dedicated graphical user interface has been developed (Lunghi et al., 2016).
3 PROPOSED ALGORITHM
Figure 1: Robot developed at CERN for surveying tasks.
The estimation of the position and the rotation of the object has been solved by developing the proposed algorithm, which is adaptable to different target objects, Figure 2. The proposed algorithm mainly uses the Point Cloud Library, therefore the input will be a point cloud; this is done in the first block of the flow chart.
The main problem of working with point clouds of metallic objects is that these objects produce reflections that give an incomplete point cloud, Figure 3. This cannot be avoided by modifying the lights, because RGBD cameras work with their own light projector. Therefore, the proposed algorithm must work with missing information in the depth field and with high noise.
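As a concrete illustration of this constraint, the sketch below shows how a cloud captured by the RGBD camera could be pre-processed with the Point Cloud Library before any further step: the invalid points left by reflections are dropped and isolated noise is attenuated. The file name and the filter thresholds are illustrative assumptions, not values taken from the deployed system.

```cpp
// Minimal PCL sketch: load a cloud captured by the RGBD camera, drop the
// invalid (NaN) points left by the reflective black zones of Figure 3 and
// attenuate speckle noise. File name and thresholds are assumptions.
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/filters/filter.h>                      // removeNaNFromPointCloud
#include <pcl/filters/statistical_outlier_removal.h>
#include <vector>

int main()
{
  pcl::PointCloud<pcl::PointXYZ>::Ptr raw(new pcl::PointCloud<pcl::PointXYZ>);
  if (pcl::io::loadPCDFile<pcl::PointXYZ>("capture.pcd", *raw) < 0)
    return 1;                                        // hypothetical capture file

  // Remove the points with no depth measurement (holes caused by reflections).
  std::vector<int> kept;
  pcl::removeNaNFromPointCloud(*raw, *raw, kept);

  // Reduce the high noise typical of metallic surfaces.
  pcl::PointCloud<pcl::PointXYZ>::Ptr cleaned(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
  sor.setInputCloud(raw);
  sor.setMeanK(50);              // neighbours used to estimate local density
  sor.setStddevMulThresh(1.0);   // points beyond one sigma are discarded
  sor.filter(*cleaned);
  return 0;
}
```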
The novel idea of the algorithm is the use of clustering and segmentation to find a point that is robust to changes in the position of the camera; this is done in the Filtering and FindCorners blocks of the flow chart. As this point is robust, it will always be the same point even if the camera is in a different position. Once the point is obtained, the rest of the points of the object can be known by overlapping a point cloud obtained from a CAD model of the object, Figure 4. To match this point cloud with the point detected in the point cloud of the camera, the origin of the CAD point cloud is moved to the same point that is going to be detected.
Finally, the point cloud of the CAD only has to be moved to the XYZ position of the detected point. In this way we can know the position of the object and reconstruct it in 3D. The parts of the target that are lost or occluded appear reconstructed in the point cloud, Figure 5.
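The paper does not detail how the CAD mesh of Figure 4 is converted into a point cloud; one possible realization, shown below as a sketch, is to sample points on the triangles of the mesh using barycentric coordinates. The OBJ file name, the fixed number of samples per triangle (not area-weighted) and the function name meshToCloud are assumptions for illustration.

```cpp
// Sketch: turn a triangulated CAD mesh into a point cloud (cf. Figure 4).
// Hypothetical file name; uniform per-triangle sampling is a simplification.
#include <pcl/io/vtk_lib_io.h>   // pcl::io::loadPolygonFileOBJ (requires VTK)
#include <pcl/conversions.h>
#include <pcl/point_types.h>
#include <Eigen/Core>
#include <cmath>
#include <random>
#include <string>

pcl::PointCloud<pcl::PointXYZ>::Ptr meshToCloud(const std::string& obj_file,
                                                std::size_t samples_per_triangle = 50)
{
  pcl::PolygonMesh mesh;
  pcl::io::loadPolygonFileOBJ(obj_file, mesh);

  pcl::PointCloud<pcl::PointXYZ> vertices;
  pcl::fromPCLPointCloud2(mesh.cloud, vertices);

  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
  std::mt19937 rng(42);
  std::uniform_real_distribution<float> uni(0.0f, 1.0f);

  for (const pcl::Vertices& tri : mesh.polygons)
  {
    const Eigen::Vector3f a = vertices[tri.vertices[0]].getVector3fMap();
    const Eigen::Vector3f b = vertices[tri.vertices[1]].getVector3fMap();
    const Eigen::Vector3f c = vertices[tri.vertices[2]].getVector3fMap();
    for (std::size_t i = 0; i < samples_per_triangle; ++i)
    {
      // Uniformly distributed barycentric sample inside the triangle.
      const float r1 = std::sqrt(uni(rng)), r2 = uni(rng);
      const Eigen::Vector3f p = (1.0f - r1) * a + r1 * (1.0f - r2) * b + r1 * r2 * c;
      cloud->push_back(pcl::PointXYZ(p.x(), p.y(), p.z()));
    }
  }
  return cloud;
}
```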
To locate the robust points, a plane of the object that contains them must be located. This plane is usually found at the top of the objects. Therefore, a series of point filtering steps has to be performed in order to locate that plane. A general filtering algorithm has been developed, whose flow chart is shown in Figure 6. Following the steps of the flow chart, the algorithm obtains a surface from which it is easy to extract the robust points through FindCorners.
Figure 2: Flow Chart of the Algorithm.
Figure 3: The black zones are null data in the depth field
caused by reflections of the light of the projector.
Figure 4: Transformation from CAD mesh to Point Cloud.
Figure 5: Three-dimensional reconstruction of the object using the algorithm of this paper.
Several methods have been developed for the filtering (a code sketch of these steps is given after the list):
DepthFilter: The target objects must be fairly close to the camera, so distant points can be filtered out.
HeightFilter: This method has the same purpose as DepthFilter and allows removing points of the target object that are far from the plane we want to find.
Clustering: This method filters out parts of the point cloud that are of less interest. We keep the larger clusters.
ExtractSlice: This is the main part of the algorithm: it finds the plane by iteratively searching for planes with a certain normal within given intervals. We apply this method twice: first to detect a thick plane, Figure 7, which allows us to filter out many points and avoid detecting false planes, such as the surfaces of small and useless parts; then we apply the method again to extract the surface of the detected thick plane, Figure 8.
FindCorners: Once the desired plane is found, it is necessary to locate the corners; usually we search for the top corners with this method.
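The sketch below illustrates how the DepthFilter, HeightFilter, Clustering and ExtractSlice steps of Figure 6 could be realized with standard PCL components. The distance limits, cluster sizes and the choice of the vertical axis are assumptions made for illustration, not the parameters used on the CERN robot.

```cpp
// Sketch of the filtering chain of Figure 6 with standard PCL components.
// All numeric thresholds and the "vertical = Y" convention are assumptions.
#include <pcl/point_types.h>
#include <pcl/common/io.h>                        // copyPointCloud
#include <pcl/filters/passthrough.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/extract_clusters.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <algorithm>
#include <vector>

using Cloud = pcl::PointCloud<pcl::PointXYZ>;

Cloud::Ptr filterForTopPlane(const Cloud::Ptr& input)
{
  // DepthFilter: keep only points reasonably close to the camera.
  Cloud::Ptr near_pts(new Cloud);
  pcl::PassThrough<pcl::PointXYZ> depth;
  depth.setInputCloud(input);
  depth.setFilterFieldName("z");
  depth.setFilterLimits(0.3f, 1.4f);              // 30 cm .. 140 cm (assumed)
  depth.filter(*near_pts);

  // HeightFilter: keep the band of heights around the expected top plane.
  Cloud::Ptr band(new Cloud);
  pcl::PassThrough<pcl::PointXYZ> height;
  height.setInputCloud(near_pts);
  height.setFilterFieldName("y");                 // camera "up" axis (assumed)
  height.setFilterLimits(-0.5f, 0.5f);
  height.filter(*band);

  // Clustering: keep the largest Euclidean cluster, discard small fragments.
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
  tree->setInputCloud(band);
  std::vector<pcl::PointIndices> clusters;
  pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
  ec.setClusterTolerance(0.02);                   // 2 cm
  ec.setMinClusterSize(500);
  ec.setSearchMethod(tree);
  ec.setInputCloud(band);
  ec.extract(clusters);

  Cloud::Ptr biggest(new Cloud);
  auto largest = std::max_element(clusters.begin(), clusters.end(),
      [](const pcl::PointIndices& a, const pcl::PointIndices& b)
      { return a.indices.size() < b.indices.size(); });
  if (largest != clusters.end())
    pcl::copyPointCloud(*band, *largest, *biggest);

  // ExtractSlice: RANSAC plane whose normal is close to the vertical axis,
  // i.e. the top surface of the target. Run once with a large distance
  // threshold (thick plane, Figure 7) and once more with a small one on the
  // surviving points to obtain the thin surface (Figure 8).
  pcl::ModelCoefficients coeff;
  pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
  pcl::SACSegmentation<pcl::PointXYZ> seg;
  seg.setModelType(pcl::SACMODEL_PERPENDICULAR_PLANE);
  seg.setMethodType(pcl::SAC_RANSAC);
  seg.setAxis(Eigen::Vector3f::UnitY());
  seg.setEpsAngle(0.26);                          // ~15 degrees
  seg.setDistanceThreshold(0.01);                 // thick vs thin slice
  seg.setInputCloud(biggest);
  seg.segment(*inliers, coeff);

  Cloud::Ptr plane(new Cloud);
  pcl::ExtractIndices<pcl::PointXYZ> extract;
  extract.setInputCloud(biggest);
  extract.setIndices(inliers);
  extract.filter(*plane);
  return plane;                                   // input for FindCorners
}
```

When the first RANSAC run does not isolate the desired plane, the segmentation can be repeated with different thresholds, which is the iterative behaviour behind the variable processing times reported in Section 4.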
A priori, the CAD model of the object is taken and its origin is moved to the point that is searched for by the algorithm. The model is then translated to the found position and rotated to fit both the left and right corner points; this is done in the "Translate and rotate the CAD model to the found corners" block of the flow chart in Figure 2. Thus one finally has the three-dimensional reconstruction of the object.
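The following sketch shows one way the "Translate and rotate the CAD model to the found corners" step could be written with PCL and Eigen. It assumes the CAD cloud has been re-centred so that its reference (left) corner lies at the origin and its left-to-right corner direction lies along +X, and that Y is the vertical axis; these conventions, and the function and variable names, are illustrative assumptions.

```cpp
// Sketch of placing the re-centred CAD cloud at the detected corners, giving
// the 3D reconstruction of Figure 5. Frame conventions are assumptions.
#include <pcl/point_types.h>
#include <pcl/common/transforms.h>
#include <Eigen/Geometry>

using Cloud = pcl::PointCloud<pcl::PointXYZ>;

Cloud::Ptr placeCadModel(const Cloud::Ptr& cad_centred,       // left corner at origin
                         const Eigen::Vector3f& left_corner,  // detected in camera frame
                         const Eigen::Vector3f& right_corner)
{
  // Direction between the detected corners, projected onto the ground plane.
  Eigen::Vector3f dir = right_corner - left_corner;
  dir.y() = 0.0f;                                  // assume Y is vertical
  dir.normalize();

  // Rotation mapping the model's +X corner direction onto the detected one,
  // then translation of the model's reference corner to the detected corner.
  const Eigen::Quaternionf rot =
      Eigen::Quaternionf::FromTwoVectors(Eigen::Vector3f::UnitX(), dir);
  Eigen::Affine3f pose = Eigen::Affine3f::Identity();
  pose.translate(left_corner);
  pose.rotate(rot);

  Cloud::Ptr placed(new Cloud);
  pcl::transformPointCloud(*cad_centred, *placed, pose);
  return placed;
}
```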
Finally, if the matching fails, the robot can autonomously move the camera and try again with a new point cloud.
Figure 6: Flow Chart of the Filtering box in Figure 2.
Figure 7: Find the desired plane, filtering out other planes of lesser importance.
4 VALIDATION AND TESTS
Figure 8: Extract the surface of the desired plane.
The objects selected in this project correspond to a collimator (Assmann et al., 2006), Figure 9, and a piece of the collimator system, the separator, Figure 10. The collimators are of vital importance for the correct operation of the LHC and therefore have great importance in technical inspections. In addition, they are one of the radiation hot spots in the LHC.
Figure 9: Photograph of a collimator.
Figure 10: Photograph of a piece of the collimator system, the separator.
This algorithm works under a few restrictions:
The camera is in front of the target.
The targets are always upright; they cannot be lying down.
The targets are rigid.
It has been proven that the developed algorithm works with position errors of less than 1 cm in the case of the separator and between 2 and 3 cm in the case of the collimator. This shows that the error depends
on the distance at which the camera is located: in the case of the collimator, which is larger, the camera was situated between 65 cm and 75 cm further away. This distance is imposed by the limited dimensions of the LHC tunnel, Figure 11.
Figure 11: Schematic of a cross section of the LHC tunnel.
During the tests, Figure 12, it has been verified that the processing time of the algorithm is between 5 and 10 seconds. This depends on the segmentation operation, which must sometimes iterate repeatedly until it finds the right plane. The times are quite acceptable, since the algorithm must be run only once, after which the global position is known.
The test point clouds were taken at distances to the
target between 50 cm and 136 cm, which is the maxi-
mum distance in the tunnel, Figure 11.
5 FUTURE DEVELOPMENT
As seen in the validation tests, on large pieces such as the collimator the error of the estimated position increases, approaching levels that would cause problems in tele-operation. One way to reduce this error is to make a second estimation of the position. Once the first estimation is made, the camera on the robot arm can be brought closer to a predetermined part; this could be done automatically. Using the algorithm on that part of the large piece, it is detected with smaller errors. Since the position of that part with respect to the rest is known a priori, this allows the error on the whole piece to be reduced.
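In terms of rigid transforms, this refinement amounts to composing the close-range estimate of the small part with its offset from the whole piece, known a priori from the CAD model. A minimal sketch follows; the function and frame names are hypothetical.

```cpp
// Sketch of the proposed refinement: chain the close-range pose of a small,
// well-known part with its fixed, a priori known offset to the whole piece.
// Names and frame conventions are illustrative assumptions.
#include <Eigen/Geometry>

// part_in_camera : refined pose of the small part, estimated at close range.
// piece_in_part  : offset of the whole piece relative to that part (from CAD).
Eigen::Affine3f refinePiecePose(const Eigen::Affine3f& part_in_camera,
                                const Eigen::Affine3f& piece_in_part)
{
  // Chain the transforms: camera <- part <- piece.
  return part_in_camera * piece_in_part;
}
```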
6 CONCLUSION
In this paper, a novel algorithm to detect the 6D pose of an object was presented. The novel solution has been shown to be robust enough to be deployed in harsh and unstructured environments, like the CERN accelerator complex. The proposed solution is computationally light and allows the three-dimensional reconstruction of an object. These aspects are fundamental for robotic inspections and telemanipulation, in particular for detecting collisions and performing path planning in areas that 2D cameras detect incompletely.
REFERENCES
Assmann, R., Magistris, M., Aberle, O., Mayer, M., Ruggiero, F., Jiménez, J., Calatroni, S., Ferrari, A., Bellodi, G., Kurochkin, I., et al. (2006). The final collimation system for the LHC. Technical report.
Engel, J., Koltun, V., and Cremers, D. (2016). Direct sparse odometry. arXiv:1607.02565.
Engel, J., Schöps, T., and Cremers, D. (2014). LSD-SLAM: Large-Scale Direct Monocular SLAM, pages 834–849. Springer International Publishing, Cham.
Hinterstoisser, S., Holzer, S., Cagniart, C., Ilic, S., Konolige, K., Navab, N., and Lepetit, V. (2011). Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes.
Intel. Intel realsense camera r200 datasheet. [online].
Intel. Intel realsense camera sr300 datasheet. [online].
Lunghi, G., Prades, R. M., and Castro, M. D. (2016). An ad-
vanced, adaptive and multimodal graphical user inter-
face for human-robot teleoperation in radioactive sce-
narios. In Proceedings of the 13th International Con-
ference on Informatics in Control, Automation and
Robotics (ICINCO 2016) - Volume 2, Lisbon, Portu-
gal, July 29-31, 2016., pages 224–231.
Microsoft. Kinect hardware. [online].
Orbbec. Orbbec astra pro. [online].
Orbbec. Orbbec. how our 3d camera works. [online].
Prisacariu, V. and Reid, I. (2009). PWP3D: Real-time segmentation and tracking of 3D objects. In Proceedings of the 20th British Machine Vision Conference.
Song, S. and Xiao, J. (2014). Sliding Shapes for 3D Object
Detection in Depth Images, pages 634–651. Springer
International Publishing, Cham.
Song, S. and Xiao, J. (2016). Deep Sliding Shapes for
amodal 3D object detection in RGB-D images.
Zeng, A., Yu, K.-T., Song, S., Suo, D., Walker Jr., E., Rodriguez, A., and Xiao, J. (2016). Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge.
Figure 12: Reconstructions with different views of different point clouds of the separator (first row), the collimator (second row), an oiler (third row) and a special socket (fourth row).