Vision-based Robotic System for Object Agnostic Placing Operations
Nikolaos Rofalis¹, Lazaros Nalpantidis¹, Nils Axel Andersen² and Volker Krüger¹
¹Dept. of Mechanical and Manufacturing Engineering, Aalborg University, Aalborg, Denmark
²Dept. of Electrical Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark
Keywords: Industrial Robots, Object Detection, Object Matching, Grasping, Placing, Kitting.
Abstract: Industrial robots are part of almost all modern factories. Even though industrial robots nowadays manipulate a huge variety of objects in different environments, exact knowledge about both the objects and the environment is generally assumed. The aim of this work is to investigate the ability of a robotic system to operate within an unknown environment while manipulating unknown objects. The developed system detects objects, finds matching compartments in a placing box, and ultimately grasps and places the objects there. It exploits 3D sensing and visual feature extraction, and no prior knowledge is provided to the system, neither for the objects nor for the placing box. The experimental evaluation of the developed robotic system shows that a combination of seemingly simple modules and strategies can provide an effective solution to the targeted problem.
1 INTRODUCTION
Even though robots are evolving rapidly, the level of automation in manufacturing can in reality be very low.
As evidenced by the EU project STAMINA (Sustain-
able and reliable robotics for part handling in manu-
facturing automation), part handling across the vari-
ous assembly stages in the automotive industry is the
task with the lowest automation levels—below 30%¹.
This fact comes as the result of two factors. First, pro-
duction lines and handled parts in industry are charac-
terized by large diversity. Second, most robotic sys-
tems deployed in industry require pre-specified struc-
tured environments and can only manipulate a priori
known objects. Thus, it becomes evident that au-
tomated systems need to evolve and become more
flexible requiring less—or even no—prior knowledge
about their environment and the objects to be handled.
In this work we present the development of an in-
dustrial robotic system that is able to operate within
an uncertain environment and manipulate unknown
objects. We focus on automating part handling, as an indicative industrial task that will have an effect on various industrial sectors—including the automotive industry. A relevant such task is kitting—a concept whose automation is pursued in the STAMINA
project. Kitting boxes are placed on the chassis of each car in the production line, containing the parts that will be used for that specific car. Thus, each box contains different parts. This is where the STAMINA robotic system—shown in Fig. 1(left)—comes into the picture. The robot receives information from the Manufacturing Execution System (MES) about the parts required for each kitting box, identifies the requested objects, and places them into specific compartments of the kitting box, as shown in Fig. 1(right).

¹ http://stamina-robot.eu/about-stamina

Figure 1: The STAMINA mobile robot (left) manipulating objects and filling two kitting boxes placed in front of the robotic platform. A closeup of a kitting box structure (right).
We have replicated the STAMINA scenario in the lab using a smaller UR10 robotic arm, 3D-printed parts with complex geometries, and a custom-made kitting box. The system has no prior knowledge
about the objects or the kitting box, apart from their
coarse initial locations. Each object fits in a specific compartment of the kitting box. However, these correspondences are unknown to the system.
Correct correspondences between the objects and the compartments are required before starting to manipulate the former. Afterwards, the robot picks the object—with no knowledge about suitable grasping poses—moves to the matching hole, and finally places it there.
2 RELATED WORK
This work presents a working system that needs to
perform a number of different functions, such as
object detection from visual data, object matching,
grasping and placing of objects. Previous work on
object detection has been conducted using a variety of
methods. (Divvala et al., 2009) conducted an empirical study on object detection; using a standard dataset and a top-performing local appearance model, they evaluated numerous sources of context. Context is also used by (Xiong and Huber, 2010) in order to create semantic 3D models. These 3D models contain information about the geometry and the identity of parts of a facility (floors, walls). Using data from a 3D laser scanner (point clouds), they classify planes discovered in the environment. Similarly, (Koppula et al., 2011) semantically label objects in indoor scenes using 3D point clouds; their graphical model contains information such as visual appearance, shape, and geometric relationships. In contrast, we focus on industrial objects.
Another object segmentation method, by (Nalpantidis et al., 2012), takes advantage of camera movement: it performs edge extraction and a polar domain representation, and integrates them over time. Furthermore, (Fisher and Hanrahan, 2010) developed an algorithm that can search a scene and distinguish the requested object among the others using geometric cues and spatial relationships. Robust real-time object detection was performed by (Viola and Jones, 2001). They introduced a new image representation that allows rapid feature detection, and their learning algorithm is capable of detecting a number of crucial features in the images. Their algorithm utilises classifiers that allow quick and robust background rejection, so as to focus only on the parts of an image that contain features useful for detection. Another comparative study on object detection was performed by (Sapna Varshney et al., 2009), who tested different image segmentation techniques, such as edge-based, k-means clustering, thresholding, and region-based segmentation. Moreover, 3D geometric properties that facilitate object detection can be derived from still images. (Saxena et al., 2008) estimate depth from a single still image: they collect monocular images of outdoor environments together with their corresponding depth maps (ground truth) and apply supervised machine learning to estimate depth from still images. On the contrary, in our work there is no prior knowledge about the objects.
Object matching in two-dimensional images has
been an important issue in computer vision. Work on
object matching similar to the task addressed in the current work is presented by (Flusser, 1995). That article presents work on matching two sets of objects which may differ in translation, rotation and scale. Aiming at accurate matching, local information (a set of invariant features) and object-to-object distances on the plane are used, and matching likelihood coefficients are introduced to indicate the correspondence between objects. Work on shape matching and subsequent object recognition was conducted by (Berg et al., 2005). Using geometric blur point descriptors and the geometric distortion between corresponding feature points, they calculate the aligning transformation that results in solid shape matching. Object matching using a locally affine-invariant constraint was conducted by (Li et al., 2010); the idea behind their work is that each point can be represented by an affine combination of its neighbouring points. (Jiang and Yu, 2009) proposed a linear formulation that finds feature point correspondences and the geometric transformations.
Regarding grasping objects, an extended body of
work has been performed. (Rietzler et al., 2013) presented a grasping method that takes into consideration constraints established by both the local shape and the inertial properties of the object. A combination of human input and an automatic grasping technique is introduced by (Ciocarlie and Allen, 2008). They created a system equipped with an automated grasp planner capable of shaping the artificial hand according to the shape of the object it aims to grasp, letting the user complete the task successfully. Another approach is introduced by (Miller et al., 2003). They simplified the grasping task by simplifying the objects, modelling them as sets of primitive shapes such as cylinders, boxes and spheres. As a result, simpler objects and sets of rules allow the calculation of grasping poses. Grasping objects in conjunction with supervised machine learning was introduced by (Saxena et al., 2006). Their learning algorithm does not require a 3D model of the object, and training is performed on a set of synthetic images. In addition, (Detry et al., 2012) proposed a grasping method that utilizes a set of grasping examples and tries to match the current view with them, or with a part of them. In our work no prior knowledge is available for any of the objects, so training or comparing with a set of predefined grasping poses is not possible.
The work of (Hsiao et al., 2009) introduced a method for grasping objects using optical proximity sensors located inside the fingertips of the gripper; this system could be supplementary to existing grasping algorithms. A combination of different object representations is used by (Brook et al., 2011): instead of using a single representation of the object in order to plan the grasp, all the available representations are combined and the extracted information is used to plan grasping accordingly. Moreover, efficient grasping was presented by (Jiang et al., 2011). Their technique derives information from RGB-D images (normal RGB images that also contain depth information): first, space not suitable for grasping is excluded, and the remaining space is tested with more advanced features until the best grasp is detected. Inspired by human actions is the work conducted by (Dogar and Srinivasa, 2011), who used a library of actions modelled on the actions that humans perform while grasping objects in cluttered environments (e.g. rearranging clutter).
Regarding object placing, (Schuster et al., 2010) developed an algorithm to detect clutter-free planes where objects can safely be placed. Orientation is also essential; thus (Fu et al., 2008), based on geometrical features of the objects, reduced the dimensionality of the orientation to a set of possible orientations that are suitable for an object. Their algorithm focuses on deriving the upright orientation for proper placing on flat areas. Also, (Saxena et al., 2009) focused on deriving object orientation; their algorithm can extract an object's orientation from a single image. (Glover et al., 2012) calculate the pose of an object using sets of local features on partial point clouds. Additionally, (Kouskouridas and Gasteratos, 2012) proposed a method that takes into account both geometrical and appearance-based characteristics in order to extract a reliable 3D pose of an object. Interaction between a human and a robot that places objects was introduced by (Edsinger and Kemp, 2006): the human passes an object to the robot, which afterwards places it on a shelf. The placing task utilizes force control, which leads to a gentle release of the object on the shelf. (Toussaint et al., 2010) integrated planning, control and reasoning for placing objects located on flat surfaces into stacks.
3 DEVELOPED SYSTEM
The complete system, developed in a lab environment, comprises a Universal Robots UR10 robotic arm, a Robotiq 3-finger adaptive robot gripper, and a Primesense Carmine short-range sensor. The Carmine sensor is located on the arm, just before the gripper, in an eye-in-hand configuration. The lab setup of the integrated system can be seen in Fig. 2.

Figure 2: Complete lab setup. The 7 considered objects can be seen at the lower-right part of the picture, while the kitting box is on the left.
In order to place any object into its matching compartment, we initially need to perform object detection, both for the objects and for the compartments of the kitting box. Then, we need to perform matching between the two. However, the two images, of the kitting box and of the objects, do not depict the same physical entities, and as a result matching is not straightforward. In fact, the only common characteristic between the holes and the objects is their outer shape. Hence, the contours of all objects and holes are extracted and compared, and matching is performed on the contours. The resulting correspondences between objects and holes are used as input for the final step: planning and performing grasping and placing. An overview of the structure and flow of the developed algorithm is shown in Fig. 3.
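The flow of Fig. 3 can be summarized in a short sketch. The Python snippet below is purely illustrative: the helper names (detect_contours, match_contours, grasp_object, place_object) are hypothetical stand-ins for the modules described in the rest of this section, not functions of the actual implementation.

```python
# Illustrative sketch of the algorithm flow of Fig. 3.
# All helpers are hypothetical placeholders for the modules described below.

def fill_kitting_box(robot, camera):
    # 1. Observe the objects and the kitting box and extract their contours.
    object_view = camera.capture_objects()   # depth image + point cloud of the objects
    box_view = camera.capture_box()          # depth image + point cloud of the kitting box
    object_contours = detect_contours(object_view)
    hole_contours = detect_contours(box_view)

    # 2. Match every object contour to a compartment (hole) contour.
    matches = match_contours(object_contours, hole_contours)

    # 3. Grasp each object and place it into its matched compartment.
    for obj, hole in matches:
        offset = grasp_object(robot, camera, obj)    # double-grasp strategy, returns the misplacement
        place_object(robot, camera, hole, offset)    # ICP-refined orientation plus offset correction
```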
3.1 Visual Detection and Matching
The used RGB-D sensor generates depth images, which are not sensitive to shading and changes of the lighting conditions as RGB images are. Hence, as a first step, we use the depth images to perform robust edge detection. Even without any preprocessing of the depth images, the edges of the objects were quite clear, containing however some noise, as can be seen in Fig. 4(left). We then applied a dilation-erosion technique to remove most of the noise. Nevertheless, some limited noise still remained after this “cleaning” step, as shown in Fig. 4(right); the improvement was only slight.
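A minimal sketch of this depth-edge step with OpenCV is given below; the Canny thresholds and the kernel size are illustrative values rather than the ones tuned for our setup.

```python
import cv2
import numpy as np

def depth_edges(depth_mm):
    """Detect edges on a depth image and clean them with dilation-erosion."""
    # Normalize the raw depth (e.g. 16-bit millimetres) to an 8-bit image for Canny.
    depth_8u = cv2.normalize(depth_mm, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Edge detection directly on the depth image, which is insensitive to lighting changes.
    edges = cv2.Canny(depth_8u, 50, 150)

    # Dilation followed by erosion removes most of the speckle noise.
    kernel = np.ones((3, 3), np.uint8)
    edges = cv2.dilate(edges, kernel, iterations=1)
    edges = cv2.erode(edges, kernel, iterations=1)
    return edges
```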
Figure 3: Algorithm flow chart.

Figure 4: Edges in a typical image detected by Canny (left) and the same edges after applying dilation-erosion (right).

The next step is to distinguish the objects and the holes in the edge images. Examining the input image of Fig. 4(right), one can notice that most of the noise belongs to the supporting plane on which the objects or the placing box are placed. However, this area is of no interest for our purpose and can be ignored.
Due to the placing of the camera on the gripper,
the latter is always visible in the captured images and
point clouds. In order to get rid of the part of the point
cloud belonging to the gripper itself, any point closer
than 55 cm from the sensor is removed. Furthermore,
all points more than 15 cm farther than the first de-
tected object are removed as well. As a result, we are
left with a truncated point cloud both for very close
and for very distant objects. Of course, the aforemen-
tioned values were chosen based on the specific geo-
metric characteristics of our lab setup and should be
adapted accordingly in different environments.
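The sketch below illustrates this distance-based truncation with NumPy, using the 0.55 m near limit and the 0.15 m margin behind the closest detected surface mentioned above.

```python
import numpy as np

def truncate_cloud(points, near_limit=0.55, margin_behind=0.15):
    """Keep only points between the gripper and slightly behind the first object.

    points: (N, 3) array in the camera frame, with z the distance from the sensor in metres.
    """
    z = points[:, 2]
    # Remove everything closer than 55 cm (the gripper itself).
    beyond_gripper = points[z > near_limit]
    if beyond_gripper.size == 0:
        return beyond_gripper
    # Remove everything more than 15 cm behind the closest remaining surface.
    far_limit = beyond_gripper[:, 2].min() + margin_behind
    return beyond_gripper[beyond_gripper[:, 2] <= far_limit]
```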
Then, the depth image is filtered so as to keep only edges belonging to objects present in the filtered point cloud. Areas where no point cloud information exists (e.g. the homogeneous outer area of Fig. 5(left)) are also filtered out of the depth image. The result, as shown in Fig. 5(right), does not contain much irrelevant information and the objects are clearly depicted.

Figure 5: The filtered point cloud (left) and the filtered detected edges on the depth image (right).
The next step, after removing noise, is to isolate and group the detected edges into contours. The algorithm by (Suzuki et al., 1985) is used for that purpose, as implemented in the OpenCV library. All objects are detected, but as can be seen in the upper part of Fig. 5(right), spurious edges—due to the depth discontinuities at the boundaries of the supporting surface—sometimes occur. The spots in the middle of each contour are their respective centers of mass (Fig. 5(right)). One can notice that the spurious edge in the upper part of the image also gets assigned a center of mass.
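The border-following algorithm of (Suzuki et al., 1985) is exposed in OpenCV as findContours; a minimal sketch of the contour grouping and the centers of mass (from image moments) could look as follows, assuming the OpenCV 4.x Python API and an illustrative minimum-area threshold.

```python
import cv2

def contours_with_centroids(edge_image, min_area=100.0):
    """Group edges into contours and compute each contour's center of mass."""
    contours, _ = cv2.findContours(edge_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for c in contours:
        m = cv2.moments(c)
        if m["m00"] < min_area:      # discard tiny, spurious contours
            continue
        cx = m["m10"] / m["m00"]     # center of mass from the first-order moments
        cy = m["m01"] / m["m00"]
        results.append((c, (cx, cy)))
    return results
```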
We are using image moments to perform the
matching between the contours belonging to objects
and holes. Moments, in general, are widely used to
describe images or shapes. Direct shape matching
uses Hu invariants to compare shapes. We use this
technique as implemented in OpenCV. The Hu mo-
ments are scale invariant. However, this is not help-
ful when trying to distinguish between similar objects
that are different in size. Therefore, an additional fil-
ter regarding the size of the detected contour area was
added to the matching process. As a result, matching between objects and their corresponding holes in the kitting box is established, as can be seen in Fig. 6.
Figure 6: Matched objects and box-holes.
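A sketch of this matching step, combining OpenCV's Hu-moment-based matchShapes with a simple contour-area ratio filter, is shown below; the two thresholds are illustrative, not the values used in our experiments.

```python
import cv2

def match_objects_to_holes(object_contours, hole_contours,
                           max_shape_dist=0.3, max_area_ratio=1.5):
    """Greedily match object contours to hole contours.

    Hu moments are scale invariant, so the additional area-ratio check separates
    similarly shaped contours of different size.
    """
    matches = {}
    for i, obj in enumerate(object_contours):
        best, best_dist = None, max_shape_dist
        obj_area = cv2.contourArea(obj)
        for j, hole in enumerate(hole_contours):
            ratio = cv2.contourArea(hole) / max(obj_area, 1e-6)
            if not (1.0 / max_area_ratio < ratio < max_area_ratio):
                continue                       # sizes differ too much
            dist = cv2.matchShapes(obj, hole, cv2.CONTOURS_MATCH_I1, 0.0)
            if dist < best_dist:
                best, best_dist = j, dist
        if best is not None:
            matches[i] = best
    return matches
```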
3.2 Grasping & Placing
After matching the objects to the appropriate holes,
the arm grasps and places them accordingly. When the RGB-D sensor views an object vertically from above, the precision is better. The positions of the objects are
inferred from their calculated centers of mass. The
arm moves above a randomly selected object to get
a close vertical view and, as a result, a more precise
estimation of its pose.
As we assume no prior knowledge about the ob-
jects, no predefined grasping poses are known to the
system. Our approach is to grasp the object perpen-
dicular to its principal direction, thus maximizing the
grasping surface. When the RGB-D sensor gets a
more precise view of the object it also calculates its
two-dimensional orientation on the supporting plane.
In order to acquire that information, principal com-
ponent analysis (PCA) is performed on the detected
contour of the observed object and the principal di-
rection is extracted.
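A small sketch of this step is given below, computing the principal direction of a contour with a plain eigen-decomposition instead of OpenCV's PCA helpers; the gripper approach would then be set perpendicular to the returned angle so as to maximize the grasping surface.

```python
import numpy as np

def principal_angle(contour):
    """Return the in-plane orientation (radians) of a contour's principal axis."""
    pts = contour.reshape(-1, 2).astype(np.float64)
    centered = pts - pts.mean(axis=0)
    # Eigen-decomposition of the 2x2 covariance matrix of the contour points.
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
    principal = eigenvectors[:, np.argmax(eigenvalues)]   # eigenvector of the largest eigenvalue
    return np.arctan2(principal[1], principal[0])
```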
After acquiring more precise information about
the center of mass and the orientation of the object,
the arm grasps it and releases it again. In most of the
cases during this initial grasping, the object moves
and rotates a bit. The reason for this is that the ex-
tracted center of mass and orientation are rarely per-
fect, but even if they are, the fingers of the gripper can
slide into an object's cavity or along a non-vertical side, resulting in a change of the object's pose. Afterwards, a second observation above the object is performed (using the exact same pose that was utilized to acquire the first measurement) and the new center of mass is extracted. The gripper grasps the object
with the same pose as before. This strategy decreases
the possibility of moving the object once again and
measures the misplacement that occurred after the
first grasping attempt. The measured change of the
position (center of mass misplacement) between the
two grasping attempts will be used later during the
placing of the object.
Fig. 7 displays the output of the PCA algorithm applied to the detected contours of an object before and after the first grasping attempt. It is visible that one axis (the secondary one) has opposite direction, even though it is the same object slightly rotated. At that point this has no effect, since PCA is used just once, before grasping the object for the first time, in order to obtain the orientation that the gripper will use to grasp the object. Thus, the change in orientation before and after the first grasp does not matter at that stage.

Figure 7: Principal Component Analysis (PCA) performed before (left) and after (right) the first grasping attempt.
When the picking task is completed and the mis-
placement after the first grasping is known, the arm
moves to the corresponding box-hole to place the ob-
ject. What is important to know for that step is the
precise placement position and orientation of the cor-
responding hole.
The arm moves directly above the matched box-
hole (using the center of mass of the detected con-
tour). This step is necessary in order to get a new, more precise observation of the hole. The newly detected contour of the box-hole is used to extract a
precise center of mass at this point. Moreover, the cal-
culated misplacement after the first grasping is taken
into consideration when the placing process is per-
formed. An important note is the fact that the camera axes during grasping and during placing are not the same (Fig. 8). There is a rotation of -90° (clockwise); hence the estimated misplacement along the X-axis should be used to make a correction along the Y-axis during placing, and, accordingly, the estimated misplacement along the Y-axis should be considered when correcting along the X-axis.
Figure 8: Comparison between the camera axes during the grasping and placing tasks.
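This remapping amounts to rotating the measured misplacement vector by the same -90° before applying it as a correction; a small sketch is given below, noting that the exact signs depend on the frame conventions of the setup.

```python
import numpy as np

def remap_misplacement(dx, dy, angle_deg=-90.0):
    """Rotate the grasp-frame misplacement into the placing camera frame."""
    a = np.radians(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    # With a -90 degree rotation, the X component of the grasp-frame misplacement
    # corrects the Y axis during placing and vice versa.
    return rot @ np.array([dx, dy])

# Example: a 3 mm offset measured along X during grasping
# becomes a correction applied along Y during placing.
correction = remap_misplacement(0.003, 0.0)
```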
As already stated, precision is highly important during placing. The gap between an object and the corresponding box-hole, for our tested objects and kitting box, is small (less than 5 mm). Apart from the precise position, the exact orientation of the box-hole also needs to be calculated. We refine the initial coarse orientation using the Iterative Closest Point
(ICP) method.
The ICP algorithm aligns the captured point
clouds of the object and of the hole. The object point
cloud is the source and box-hole point cloud the ref-
erence. Both of them were filtered along all three axes to reduce the processed data.
However, the two point clouds do not have parts in
common (Fig. 9) and even though they look similar to
each other, they are in practice complementary. Thus,
ICP cannot be applied directly to them.

Figure 9: The box-hole (left) and the object (right) point clouds input for the ICP algorithm.

As a solution to this problem, artificial point clouds are generated in order to provide input to ICP.
The contours of the object and the box-hole are similar; thus the new point clouds are generated using the points of the detected contours in both cases. This results in clean point clouds with no noise, where the detected contours are clearly visible (Fig. 10). Applying ICP on these new point clouds provides good results concerning rotation. The output translation is not taken into consideration; only the rotation transformation is necessary. It is important to note that, before applying ICP, no initial transformation is applied to the source point cloud (the object). Thus, there might be cases in which ICP “locks” onto a local minimum, providing a wrong angle.
Figure 10: The box-hole (left) and the object (right) artifi-
cial point clouds input for the ICP algorithm.
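A sketch of this alignment step is shown below, using Open3D as a stand-in for the ICP implementation; the library choice, the correspondence distance, and the identity initialization are assumptions, and only the rotation of the result would be used.

```python
import numpy as np
import open3d as o3d

def contour_rotation(object_contour_pts, hole_contour_pts, max_dist=0.01):
    """Estimate the in-plane rotation aligning the object contour to the hole contour.

    Both inputs are (N, 3) arrays built from the detected 2D contours (z = 0),
    i.e. the "artificial" point clouds described above.
    """
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(object_contour_pts)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(hole_contour_pts)

    # Point-to-point ICP starting from the identity; with no initial transform
    # the registration can lock onto a local minimum, as discussed above.
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    T = result.transformation
    # Only the rotation about the vertical axis is used; the translation is discarded.
    return np.arctan2(T[1, 0], T[0, 0])
```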
4 SYSTEM EVALUATION
The developed system has been tested in order to
study its performance and explore its limitations. All
following tests were performed on a single placing
kitting box (Fig. 11(top left)) and a specific set of
seven objects (Fig. 11(top right)). The placing box
contains holes that are shaped exactly as the objects,
the only difference being that the holes are slightly
larger allowing a 4-5 mm gap around placed objects
(Fig. 11(bottom)). The set of objects was chosen and
designed in order to ensure diversity and increasing
shape complexity. Two of the objects have simple
shapes (objects 3 and 6 in Fig. 11(top right)), two are
slightly more complex but similar to each other (ob-
jects 4 and 7 in Fig. 11(top right)), and three have fine
details and complicated geometry (objects 1, 2 and 5
in Fig. 11(top right)). Even if we consider the cho-
sen set of objects challenging—especially since the
kitting box compartments are tight—it would be misleading, if not completely mistaken, to consider it as indicative of the vast majority of industrial parts handled
by robots. However, this work aspires to show that
within some limits a robotic system can be flexible
enough to handle diverse sets of previously unknown
objects.
Figure 11: The objects (top right) placed (bottom) into the
placing box (top left).
The placing box and the objects are located in dif-
ferent positions in the working space and with ran-
dom orientations. We performed 50 iterations of fill-
ing the kitting box with the whole set, resulting in a
total of 350 individual matching, picking and plac-
ing sequences. Notes were taken during each attempt
in order to extract useful information concerning the
system and its performance. In the rest of this work
we will refer to the objects with their assigned num-
bers, as shown in Fig. 11(top right).
The operation of the developed system can be
seen in a captured video showing the placing of the
whole set of objects once². During the tests, both
the holes of the placing box and the objects were
always detected correctly. This fact ensures a good
starting point for the performance of the matching
algorithm. Out of the 350 matching attempts that
were performed, only 16 resulted in mismatches, i.e.
we achieved a 95.43% success rate. Out of these 16
mismatches, 9 were on object No.2, 5 were on ob-
ject No.5, one on No.1, and another one on No.7.
This means that more than half (56.25%) of the mis-
matches were on object No.2 and almost one third
(31.25%) on No.5. Objects No.2 and No.5 were
among the smallest ones, and also are very detailed
around their perimeter. Due to that level of detail
and their small size, the detected contours were not
always sharp, thus making matching difficult and in
some cases erroneous. These results are summarized
in Table 1.
² https://youtu.be/UlqnO0-YZQw
Table 1: Matching algorithm performance.

Correct matches: 334
Wrong matches (Object: Occurrences): No.2: 9, No.5: 5, No.1: 1, No.7: 1
Total attempts: 350
To perform the picking and placing tasks, an exter-
nal path planner was employed (we used the MoveIt!
library) for planning the path of the robot arm avoid-
ing collisions. However, in certain cases the path
planner was failing to converge or to provide a valid
path. More precisely, out of a total of 350 attempts
the planner failed to provide a satisfactory path 72
times. This means that only 79.43% of all attempts were accomplished without any planner-related error. While we have marked those attempts as failures, they are not directly related to our developed
pipeline. Such problems could be avoided by trying
alternative path planners or by properly parametrizing
them. However, this work falls outside our scope and
we consider it as a possible future extension.
When assessing the overall performance of the
system, we do not consider the failures caused by
the planner. Only errors directly related to our de-
veloped systems are taken into account. Furthermore,
only objects that are fully into the appropriate box-
hole are measured as successful placings. Objects
that are partially into the hole (e.g. having one cor-
ner outside the hole) are considered as unsuccess-
ful attempts. Under these assumptions, the overall
performance measured is 76.26%, or 212 successful
placing attempts out of 278. When excluding the at-
tempts where matching was erroneous—thus evalu-
ating the pick and place modules alone— we get a
better success rate of 80.92%, or 212 successful plac-
ing attempts out of 262. An erroneously calculated placing angle was the reason for imprecise object placement in 9 attempts. The wrong angle is due to ICP “locking” onto a local minimum and calculating a wrong rotation for the object. In more than half of these misplacements (5 out of 9, or 55.56%) the involved object was No.7; the remaining cases involved No.6 twice and No.4 twice. These results are summarized in Table 2.
Table 2: Overall system performance.

Assessed modules       Successful attempts   Total attempts   Success rate (%)
Match, Grasp, Place    212                   278              76.26
Grasp, Place           212                   262              80.92

Furthermore, apart from angle miscalculation, other misplacements occurred due to imprecise positioning. There were 41 attempts that did not end up
with a precise placement; in all cases the error was less than 3 mm. Out of the 41 misplacements, 13 involved object No.7 (31.71%). This object can be characterized as tricky regarding grasping. Moreover, 8 out of the 41 attempts involved object No.2 (19.52%). As mentioned regarding mismatching, object No.2 has a high level of detail and a small size. This makes it difficult to extract rigid and detailed contours, and as a consequence the center of mass could not be precise enough for a good placing. It can be observed that more than half of the misplacements (51.23%) due to imprecise positioning happened on objects No.7 and No.2. It is worth mentioning that object No.3 (the cylinder) was never misplaced. Of course, the placement angle is not an issue in this case, but also the lack of detail and its relatively large size (it was among the biggest objects) led to clear contours and, as a result, higher precision for the calculated center of mass. These results are summarized in Table 3.
Table 3: Object misplacement analysis.

Object   Misplacements   %
No.7     13              31.71
No.2     8               19.52
No.6     6               14.63
No.1     5               12.2
No.4     5               12.2
No.5     4               9.76
Total    41              100
5 DISCUSSION AND CONCLUSIONS
Our aim was to test the ability of an industrial robotic
system to manipulate unknown objects within an un-
known environment. The developed system indeed
fulfilled its purpose, by placing objects using a robotic
arm and exploiting computer vision, with an accept-
able success rate. Even though there was no prior
knowledge about the objects, it detected and matched
them to the corresponding box-holes with a good suc-
cess rate.
Regarding edge detection, the Canny algorithm was used due to its robustness throughout the development of the robotic system. Applying it to the depth image instead of the RGB one was a decision made keeping in mind that the system is aimed at industry, where robustness is of utmost importance. Shape matching using OpenCV performed well, resulting in a good success rate during testing. Apart from difficulties related to the matching task, the limited ability of the used RGB-D sensor to resolve all the details around the objects' contours occasionally led to insufficiently precise estimates of the center of mass and hence to increased misplacements. Misplacements also occurred due to erroneous placing orientations, which derive from the tendency of the ICP algorithm to occasionally “lock” onto local minima. Nonetheless, ICP performed better than using the PCA algorithm alone. On the other hand, PCA performed well in the grasping module: it provided information that allowed picking objects even though no prior knowledge about them was available. However, the designed objects proved to be relatively small for the bulky gripper that was used, so grasping was problematic on some occasions. Finally, it is notable that the path planner was responsible for several failures and cancellations during the testing process. This does not affect the reported results, since planner failures were not counted as failures of the developed algorithm, but it is an important issue that lowers the robustness of the system.
In order to improve the developed system, there are changes that could be applied to all its different modules. To deal with the issues due to the planner, one could perform checks on the estimated trajectories before executing them. These checks could consider either the time needed for the trajectory to be executed (too short or too long times should be rejected) or the total distance that the trajectory covers. Also, carefully applying constraints to each joint separately can improve the situation.
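Such a check could be as simple as the following sketch, which rejects a planned trajectory whose duration or joint-space path length falls outside configurable bounds; the thresholds are arbitrary placeholders.

```python
import numpy as np

def trajectory_is_reasonable(joint_waypoints, duration_s,
                             min_time=0.5, max_time=20.0, max_joint_path=12.0):
    """Reject trajectories that are suspiciously short, long, or convoluted.

    joint_waypoints: (N, J) array of joint positions in radians.
    duration_s: planned execution time in seconds.
    """
    if not (min_time <= duration_s <= max_time):
        return False
    # Total joint-space distance travelled along the trajectory.
    path_length = np.abs(np.diff(joint_waypoints, axis=0)).sum()
    return path_length <= max_joint_path
```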
With the current gripper, larger objects would make grasping easier and more solid. In case the system has to manipulate small objects, replacing the gripper with one designed for smaller objects would definitely help. Of course, a more generic solution would be the use of a tool changer. The information extracted by the RGB-D sensor provides enough evidence about each object's size to decide which gripper is most suitable; thus, switching end effectors during the process would result in a more robust system. One gripper cannot be ideal for all objects.
Regarding grasping, an additional algorithm that automatically generates grasping poses leading to solid grasps could make much of a difference; such an algorithm is necessary in order to have a system that does not require prior knowledge. Also, an extension of our double-grasping strategy could be an iterative process of grasping and releasing the object multiple times until the misplacement is within a threshold. This would increase the probability that the object does not move during the last grasping attempt, before the arm moves above the box-hole.
Placing can become more robust using force sensing—the so-called guarded motions. Even if the placing pose is not perfectly calculated, the arm could slide the object around the estimated position and try different orientations close to the estimated one, while monitoring the forces that the robot senses. When these forces are minimized, the object has the proper pose for placing. However, this would require a solid placing box that is rigidly attached to the working plane. Furthermore, placing could be improved by using visual servoing (vision-based robot control). Such solutions would transform the developed system from an open control loop to a closed-loop one.
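A guarded placing motion of this kind could be sketched as follows; the robot and force-sensing interfaces are hypothetical, and the search pattern and threshold are placeholders rather than part of the implemented system.

```python
# Hypothetical sketch of force-guarded placing; robot and pose interfaces are assumed.
def guarded_place(robot, target_pose, offsets_m, angles_rad, contact_force=5.0):
    """Probe small position/orientation perturbations and release where resistance is lowest."""
    best_pose, best_force = None, float("inf")
    for dx, dy in offsets_m:                      # e.g. a small grid around the estimated position
        for dth in angles_rad:                    # small rotations around the estimated orientation
            pose = target_pose.shifted(dx, dy).rotated(dth)    # hypothetical pose helpers
            robot.move_down_until_contact(pose, contact_force) # guarded downward motion
            force = robot.sensed_vertical_force()
            if force < best_force:                # minimal resistance indicates the proper pose
                best_pose, best_force = pose, force
    robot.move_to(best_pose)
    robot.open_gripper()
```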
Lastly, a more precise sensor would definitely boost the accuracy of the center of mass that is calculated for both objects and holes. Taking into account that this system is aimed at industry, testing other sensors is also a necessity. Parts used in industry are mostly metallic, hence reflective, which will significantly decrease the applicability of the current sensor.
ACKNOWLEDGEMENTS
This work has been supported by the European Com-
mission through the research project “Sustainable
and Reliable Robotics for Part Handling in Manu-
facturing Automation (STAMINA)” (FP7-ICT-2013-
10-610917) and by the Innovation Fund Denmark
through the research project “Mini-Picker”.
REFERENCES
Berg, A. C., Berg, T. L., and Malik, J. (2005). Shape match-
ing and object recognition using low distortion corre-
spondences. In Computer Vision and Pattern Recogni-
tion, 2005. CVPR 2005. IEEE Computer Society Con-
ference on, volume 1, pages 26–33. IEEE.
Brook, P., Ciocarlie, M., and Hsiao, K. (2011). Collabo-
rative grasp planning with multiple object represen-
tations. In Robotics and Automation (ICRA), 2011
IEEE International Conference on, pages 2851–2858.
IEEE.
Ciocarlie, M. T. and Allen, P. K. (2008). On-line interactive
dexterous grasping. In Haptics: Perception, Devices
and Scenarios, pages 104–113. Springer.
Detry, R., Ek, C. H., Madry, M., Piater, J., and Kragic, D.
(2012). Generalizing grasps across partly similar ob-
jects. In IEEE International Conference on Robotics
and Automation.
Divvala, S. K., Hoiem, D., Hays, J. H., Efros, A., Hebert,
M., et al. (2009). An empirical study of context in ob-
ject detection. In Computer Vision and Pattern Recog-
nition, 2009. CVPR 2009. IEEE Conference on, pages
1271–1278. IEEE.
Dogar, M. and Srinivasa, S. (2011). A framework for push-
grasping in clutter. Robotics: Science and Systems
VII.
Edsinger, A. and Kemp, C. C. (2006). Manipulation in hu-
man environments. In Humanoid Robots, 2006 6th
IEEE-RAS International Conference on, pages 102–
109. IEEE.
Fisher, M. and Hanrahan, P. (2010). Context-based search
for 3d models. In ACM Transactions on Graphics
(TOG), volume 29, page 182. ACM.
Flusser, J. (1995). Object matching by means of matching
likelihood coefficients. Pattern Recognition Letters,
16(9):893–900.
Fu, H., Cohen-Or, D., Dror, G., and Sheffer, A. (2008).
Upright orientation of man-made objects. In ACM
transactions on graphics (TOG), volume 27, page 42.
ACM.
Glover, J., Bradski, G., and Rusu, R. B. (2012). Monte carlo
pose estimation with quaternion kernels and the bing-
ham distribution. In Robotics: science and systems,
volume 7, page 97.
Hsiao, K., Nangeroni, P., Huber, M., Saxena, A., and Ng,
A. Y. (2009). Reactive grasping using optical prox-
imity sensors. In Robotics and Automation, 2009.
ICRA’09. IEEE International Conference on, pages
2098–2105. IEEE.
Jiang, H. and Yu, S. X. (2009). Linear solution to scale
and rotation invariant object matching. In Computer
Vision and Pattern Recognition, 2009. CVPR 2009.
IEEE Conference on, pages 2474–2481. IEEE.
Jiang, Y., Moseson, S., and Saxena, A. (2011). Efficient
grasping from rgbd images: Learning using a new
rectangle representation. In Robotics and Automa-
tion (ICRA), 2011 IEEE International Conference on,
pages 3304–3311. IEEE.
Koppula, H. S., Anand, A., Joachims, T., and Saxena, A.
(2011). Semantic labeling of 3d point clouds for in-
door scenes. In Advances in Neural Information Pro-
cessing Systems, pages 244–252.
Kouskouridas, R. and Gasteratos, A. (2012). Establishing
low dimensional manifolds for 3d object pose estima-
tion. In Imaging Systems and Techniques (IST), 2012
IEEE International Conference on, pages 425–430.
IEEE.
Li, H., Kim, E., Huang, X., and He, L. (2010). Object
matching with a locally affine-invariant constraint. In
Computer Vision and Pattern Recognition (CVPR),
2010 IEEE Conference on, pages 1641–1648. IEEE.
Miller, A. T., Knoop, S., Christensen, H., Allen, P. K.,
et al. (2003). Automatic grasp planning using shape
primitives. In Robotics and Automation, 2003. Pro-
ceedings. ICRA’03. IEEE International Conference
on, volume 2, pages 1824–1829. IEEE.
Nalpantidis, L., Björkman, M., and Kragic, D. (2012). Yes-
yet another object segmentation: Exploiting camera
movement. In Intelligent Robots and Systems (IROS),
2012 IEEE/RSJ International Conference on, pages
2116–2121. IEEE.
Rietzler, A., Detry, R., Kopicki, M., Wyatt, J. L., and Piater,
J. (2013). Inertially-safe grasping of novel objects. In
Cognitive Robotics Systems: Replicating Human Ac-
tions and Activities (Workshop at IROS 2013).
Sapna Varshney, S., Rajpa, N., and Purwar, R. (2009). Com-
parative study of image segmentation techniques and
object matching using segmentation. In Methods and
Models in Computer Science, 2009. ICM2CS 2009.
Proceeding of International Conference on, pages 1–
6. IEEE.
Saxena, A., Chung, S. H., and Ng, A. Y. (2008). 3-d depth
reconstruction from a single still image. International
journal of computer vision, 76(1):53–69.
Saxena, A., Driemeyer, J., Kearns, J., and Ng, A. Y. (2006).
Robotic grasping of novel objects. In Advances in
neural information processing systems, pages 1209–
1216.
Saxena, A., Driemeyer, J., and Ng, A. Y. (2009). Learn-
ing 3-d object orientation from images. In Robotics
and Automation, 2009. ICRA’09. IEEE International
Conference on, pages 794–800. IEEE.
Schuster, M. J., Okerman, J., Nguyen, H., Rehg, J. M., and
Kemp, C. C. (2010). Perceiving clutter and surfaces
for object placement in indoor environments. In Hu-
manoid Robots (Humanoids), 2010 10th IEEE-RAS
International Conference on, pages 152–159. IEEE.
Suzuki, S. et al. (1985). Topological structural analy-
sis of digitized binary images by border following.
Computer Vision, Graphics, and Image Processing,
30(1):32–46.
Toussaint, M., Plath, N., Lang, T., and Jetchev, N. (2010).
Integrated motor control, planning, grasping and high-
level reasoning in a blocks world using probabilistic
inference. In Robotics and Automation (ICRA), 2010
IEEE International Conference on, pages 385–391.
IEEE.
Viola, P. and Jones, M. (2001). Robust real-time object
detection. International Journal of Computer Vision,
4:51–52.
Xiong, X. and Huber, D. (2010). Using context to create se-
mantic 3d models of indoor environments. In BMVC,
pages 1–11.