ValidNet: A Deep Learning Network for Validation of Surface

Registration

Joy Mazumder

1,3

, Mohsen Zand

1,3

, Sheikh Ziauddin

1,3

and Michael Greenspan

1,2,3

Department of Electrical and Computer Engineering, Queen’s University, Kingston, Ontario, Canada

School of Computing, Queen’s University, Kingston, Ontario, Canada

Ingenuity Labs, Queen’s University, Kingston, Ontario, Canada

Keywords:

Surface Registration, 3D Object Recognition, Validation, Shared mlp Network.

Abstract:

This paper proposes a novel deep learning architecture called ValidNet to automatically validate 3D surface

registration algorithms for object recognition and pose determination tasks. The performance of many tasks

such as object detection mainly depends on the applied registration algorithms, which themselves are suscep-

tible to local minima. Revealing this tendency and verifying the success of registration algorithms is a difﬁcult

task. We treat this as a classiﬁcation problem, and propose a two-class classiﬁer to distinguish clearly between

true positive and false positive instances. Our proposed ValidNet deploys a shared mlp architecture which

works on the raw and unordered numeric data of scene and model points. This network is able to perform two

fundamental tasks of feature extraction and similarity matching using the powerful capability of deep neural

network. Experiments on a large synthetic datasets show that the proposed method can effectively be used in

automatic validation of registration.

1 INTRODUCTION

The rigid registration of 3D surfaces is of central

importance to the processing of 3D data. Since

the initial introduction of the Iterative Closest Point

(ICP) Algorithm in the late 1990’s (Besl and McKay,

1992), there have been many variations proposed to

the basic approach to improve robustness, efﬁciency

and generality (Segal et al., 2009; Rusinkiewicz and

Levoy, 2001). Current research aims to address non-

rigid registration (Paiement et al., 2016; Yu et al.,

2017) and investigate machine learning approaches

(Schwarz et al., 2017; Pais et al., 2019).

The basic approach to rigid 3D surface registra-

tion applies local optimization, and therefore depends

upon initial conditions to resolve to a correct solu-

tion. In the case of ICP and its variants, an initial esti-

mate of the true rigid transformation needs to be pro-

vided which is “close enough” to the true solution,

lest the result be driven to a false positive (i.e., local

vs. global minimum). It is therefore expected that a

3D surface registration task will sometimes fail, by

returning a false positive. This failure mode is often

difﬁcult to recognize and diagnose, as the distribution

of the surface residuals (i.e., distances between cor-

responding surface points and/or surfaces) between

ValidNet

Registration

Valid or Not

Registration

Output

Model

Segment

Figure 1: Overview of ValidNet. We proposed a deep neu-

ral network that takes two raw point clouds (model and seg-

ment) from a registration technique as input and provides

validity of the surface registration.

false positives and true positives can be similar.

There are some systems where a human is in the

loop to inspect the results from a registration task, in

which case the occasional failure may be a nuisance,

but is tolerable. There are also other desired uses of

3D surface registration, however, where registration

failure is less tolerable, and can lead to a catastrophic

system failure. One such desired use is automated

robot grasping and manipulation, which are common

tasks in industrial automation (Mahler and Goldberg,

2017). In this critical application, false positives in

registration can result in damage to equipment, as the

Mazumder, J., Zand, M., Ziauddin, S. and Greenspan, M.

ValidNet: A Deep Learning Networ k for Validation of Surface Registration.

DOI: 10.5220/0008988103890397

In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 4: VISAPP, pages

389-397

ISBN: 978-989-758-402-2; ISSN: 2184-4321

389

robot attempts to grasp a part that was incorrectly

identiﬁed and/or localized. Even small error rates of a

few percentage points, which are commonly accepted

in many general approaches, can be unacceptable in

this automation process.

This paper proposes a new approach for automatic

validation of registration algorithms. In our method,

we focus on the validation of 3D registration results

for rigid parts. A 3D segmentation algorithm is ini-

tially used to extract individual object instances from

a scene. A registration algorithm is then used to lo-

calize object hypotheses in the scene. Each hypoth-

esis is thus a segmented object matched with the ob-

ject model. Our algorithm categorizes each hypothe-

sis as a true positive or false positive. In particular,

the raw points of each segment and the object model

are considered as the inputs to the network. The pro-

posed network, called ValidNet, applies the powerful

capability of a Deep Neural Network (DNN). This

network can effectively extract required features and

match similarities between them. Thus, this archi-

tecture can learn to extract features and classify val-

idation hypotheses from the raw numeric data. The

overview of the proposed method is shown in Fig-

ure 1.

Intuitively, a reliable validation for surface reg-

istration can be useful for general pose determina-

tion. However, the proposed method is evaluated

for the time-sensitive application of robotic bin pick-

ing to emphasize its efﬁciency in real-time scenar-

ios. Some samples of real and synthetic 3D images

of bin picking scenes that are representative as input

to the system are illustrated in Figure 2. It is well

known that a large number (typically thousands to

millions) of image data are required to train DNNs,

and such a volume of ground truth datasets of real

images are not available for this task, and would be

expensive to generate. We therefore use a simple

method to generate synthetic datasets for the bin pick-

ing problem, as illustrated in Figure 2b. Similar to the

real-world data, the generated synthetic datasets con-

tain occlusion, clutter, noise, and missing information

(dropouts), and ground truth samples can be quickly

and easily generated in high volume for training pur-

poses.

2 RELATED WORK

Object recognition has been a major research direc-

tion in computer vision for many years. Its perfor-

mance is crucial in various application scenarios such

as image classiﬁcation, image retrieval, scene under-

standing, visual tracking, and robot grasping and ma-

nipulation (Aldoma et al., 2012). A popular approach

is to transform the given 3D model to the scene, and

look for an accurate alignment of the model with the

objects in the scene (Papazov and Burschka, 2010;

Mian et al., 2006). Each object is ﬁnally recognized

if it is accurately aligned with the model. The ob-

ject recognition problem therefore depends upon sur-

face registration where object models are matched to

each object, and hypotheses veriﬁcation is required to

reject the false detections. Although there are many

registration algorithms (Myronenko and Song, 2010;

Segal et al., 2009), this ﬁnal validation stage remains

a challenge, as it has proven difﬁcult to automati-

cally determine whether registration algorithms such

as ICP have succeeded or failed. Despite the ubiquity

and importance of surface registration in 3D process-

ing, there has been little previous work at registration

validation.

Early methods for validation focused on visual as-

sessment of the registration results. This was per-

formed by viewing image pairs and comparing their

contour overlays, alternate pixel displays, anatomi-

cal landmarks, or analytical models (Guehring, 2001;

Schnabel et al., 2001; Rogelj et al., 2002; Pappas

et al., 2005). These techniques were mostly based

on algorithms such as template matching and pat-

tern classiﬁcation in 2D space. They were also

application-dependent. In industrial inspection, for

instance, (Guehring, 2001) compared the registered

dataset with its CAD description to verify the vali-

dation of a measured part. To this end, a weighted

pose estimation was used to determine the uncer-

tainty in the computed deviation. In medical imag-

ing, Schnabel et al. (Schnabel et al., 2001) proposed

to model tissue properties biomechanically for Mag-

netic Resonance (MR) mammography images. They

constructed ﬁnite element breast models to obtain the

average displacement of the whole breast volume.

A validation method was also presented in (Pappas

et al., 2005) to assess the CT–MR image registration

accuracy locally in all volume regions based on the

correspondence analysis of cortical bone structures on

the original images.

One interesting approach is to consider one hy-

pothesis at a time and use theresholding for veriﬁca-

tion (Mian et al., 2006; Bariya and Nishino, 2010;

Papazov and Burschka, 2010). For example (Mian

et al., 2006) converted the point cloud of a scene into

a triangular mesh and constructed a tensor from ran-

domly selected vertices of this mesh. They ranked a

set of hypotheses according to a similarity measure

between the scene tensor and the tensors of the 3D

models in the library. The validation of each hypothe-

sis was achieved by transforming the 3D model to the

VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications

390

(a)

(b)

Figure 2: Synthetic data compared to the real-world data. (a) shows two real data samples and (b) represents two synthetic

samples which are obtained using our data generation procedure.

scene. (Bariya and Nishino, 2010) used a threshold

set on the number of supporting correspondences and

obtained a collection of hypotheses for each model.

The ﬁnal hypothesis selection was based on the cal-

culation of overlap between the model and the scene.

(Papazov and Burschka, 2010) introduced an accep-

tance function for pruning hypotheses and validating

the model-scene correspondences. They used a hash

table to rapidly retrieve oriented scene and model

points. For each retrieved model point set, a rigid

transform aligning model and scene points was then

computed. The result was a hypothesis that the model

was present in the scene. The validity of this hypoth-

esis was ﬁnally evaluated by the acceptance function.

Instead of considering each model hypothesis indi-

vidually, (Aldoma et al., 2012) took into account the

whole set of hypotheses as a global scene model by

formulating this problem in terms of the minimization

of a global cost function.

In (Yang et al., 2015), the authors proposed a

global optimal solution to 3D registration by search-

ing the entire 3D rigid transformation space. They

also deﬁned upper and lower bounds for the registra-

tion error function (L2 error metric deﬁned in ICP).

However, their method was based on a branch-and-

bound search for the entire space, which was compu-

tationally expensive.

Instead of searching a huge space, we propose to

use a 3D segmentation model to obtain the object hy-

potheses. Thus, the whole object shape is preserved in

the corresponding descriptor. Segmented objects hy-

potheses in the scene will be obtained by applying a

registration algorithm using the object model. These

are the inputs to our ValidNet for validation. In par-

ticular, two main steps which are also common in the

other methods, i.e., feature extraction and similarity

matching are effectively modeled by a deep network

similar to PointNet (Qi et al., 2017) which is a lead-

ing approach that worked directly on the raw point

clouds as inputs. Deep learning strategies are able to

induce spatial–contextual features from the 3D geo-

metric data such as point clouds. Thus, the important

features can be captured using a set of learnable ﬁlters

at training time. Likewise, we employ an unordered

list of points and use a deep network to extract the

local and global features. In particular, we rely upon

the feature engineering of PointNet and use both clas-

siﬁcation and segmentation architectures in order to

effectively extract the local and global features. Sim-

ilarity matching, however, is another essential step,

and we additionally propose to use the dot product

and the max-pooling strategies for obtaining the high-

est similarity scores.

3 ValidNet FOR REGISTRATION

VALIDATION

A point cloud is a set of 3D points and can be repre-

sented by P

|i = 1, . . . , N where N is the total number

of points. Each P

is a vector that contains the position

of the i-th point. It may include any other useful infor-

mation such as color, intensity, or normal, although in

this work, we consider only the positional information

of each point.

Suppose we are given two point clouds as M and

S denoting a model and a segment in the scene, re-

spectively. We use m and n to respectively represent

the number of points in M and S. The transforma-

tion matrix that aligns M to S is found by the reg-

istration algorithm, and is denoted by T . Notably,

the segments are generated by applying a segmenta-

tion algorithm such as region growing (Rabbani et al.,

2006; Vo et al., 2015). Segmentation can beneﬁcially

reduce the search space and make the registration pro-

cess faster.

The objective of validation is to quantify how well

the transformed model TM can represent the segment

S, or equivalently how well the inverse transformed

segment T

−1

S can represent the model M. The bene-

ﬁt of transforming the segment to the model through

−1

S, is that this maps the segment to the same

canonical model pose, so that we do not have to con-

sider different model poses, which reduces and sim-

pliﬁes network training. In other words, the network

ValidNet: A Deep Learning Network for Validation of Surface Registration

391

Similarit

y matrix,

X (m*n)

mlp

(1024-1024-

256)

mlp shared

(64,64)

mlp shared

(64,128,1024)

Maxpool

output

model

segment

1024

mlp shared

(512,256,128)

1024

1088

128

Figure 3: ValidNet for registration validation. The left part is the feature learning network that learns the features of the model

and the segment and the right part is the similarity ﬁnding network that measures the similarities between these features and

provides the validation probability of registration success.

does not need to learn the translation and rotational

invariance properties since the segments and model

are mapped to the same position. Another point is

that our method only uses position characteristics of

the point clouds, and other information such as color,

intensity, or normal are not employed. This not only

allows inputs from different sensors (which only cap-

ture position information) to be applicable, but also

can lead to faster computations.

The main framework of the proposed Validnet net-

work is shown in Figure 3. It consists of two parts,

including feature learning and similarity ﬁnding net-

work. The inputs to the network are vectors of size

(m ∗ 3) + (n ∗ 3) representing the overall points of a

set of segment and model pair.

The feature learning network learns the positional

information and characteristics of each point for both

model and segment. In particular, the local and global

information are accumulated in this part. It is intu-

itively similar to the PointNet segmentation network.

Likewise, we use a multi-layer perceptron network to

aggregate the global information.

The feature learning network consists of shared

mlp layers, skip connections, and symmetric func-

tions. In this network, initially ﬁve shared mlp layers

operate on each point independently. They are also

shared between all the model and the segment points.

These shared mlp layers map all the model points

and the segment points into a high dimensional space.

Two symmetric functions are used on the remapped

point-sets to leverage the global characteristics of the

model and the segment, respectively. Here, the max-

pool is used as a symmetric function. The captured

global characteristics are invariant to the segment’s

and the model’s point order.

After modeling the global characteristics, as

shown in Figure 3, the global vectors are concatenated

to the output of the second mlp layer. Finally, another

set of shared mlp layers are applied on the concate-

nated outputs of the model and the segment points to

generate a new set of point features that contains both

local and global information.

The similarity ﬁnding network measures the sim-

ilarities between obtained features of both model and

segment. It consists of a vector dot product, a sym-

metric function, and a feed forward network. In or-

der to ﬁnd similarities between features learned in the

previous step, a similarity matrix X

m∗n

is calculated

where, each element x

i j

indicates the similarity be-

tween i-th model point and j-th segment point. If p

and q

represent feature vectors for i-th model point

and j-th segment point, respectively, then:

i j

= p

(1)

One essential characteristic is that if i-th model

point is accurately represented by any inversely trans-

formed scene point, a value along the i-th row will be

high. Hence, performing a max-pool along the row

preserves this information and makes the model in-

variant to scene input perturbation. A feed forward

network can ﬁnally classify this vector. As a result,

the output layer can provide the validation probability

of the registration algorithm. More speciﬁcally, the

outputs of the networks are two scores for two classes,

i.e. the positive (correctly registration) and negative

(incorrect registration) class. These scores represent

how accurately the registration method is able to ﬁnd

the transformation matrix.

4 EXPERIMENTAL EVALUATION

In this section, ﬁrst we will explain the data genera-

tion process for training and testing of our proposed

VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications

392

network. After that we will provide detailed exper-

imental results and comparison with two other tech-

niques. We will also present how ValidNet performs

under noisy conditions. Finally we will provide a vi-

sualization of our proposed network.

4.1 Dataset Generation

For training and testing of ValidNet, we generated

a large synthetic dataset containing 191,980 training

and 21,800 test samples. Our dataset generation pro-

cedure starts with modeling of synthetic bin, such as

might occur in a robotic bin picking task. In bin pick-

ing, the multiple objects in each bin are of a single

class, and are randomly arranged such that they ex-

hibit clutter and occlusions. The challenge is to be

able to determine the pose of each object to a high de-

gree of conﬁdence, so that automated robotic grasp-

ing will succeed. For a realistic representation of the

environment for objects falling in a bin, we used the

Bullet Physics library (Coumans, 2015) which is a

physics engine that simulates collision, soft and rigid

body dynamics. Using this library, we modeled var-

ious physical scenarios related to bin picking prob-

lem such as gravity, center of mass, and collision with

other objects. This results in a reasonable modeling of

real-world scenarios with clutter and occlusion in the

scenes. Using the Bullet Physics library, we generated

2500 synthetic scenes with varying number of objects

(min=1, max=15, mean=4.28), and divided this data

into 2,250 training and 250 test scenes. Using the ori-

entations returned by the simulator, we then converted

these scenes into a 2.5D point cloud data. Figure 2b

shows two sample scenes from our dataset.

Figure 4: Two sample outputs of region growing segmenta-

tion.

The next step in our data generation pipeline is ob-

ject segmentation. We used region growing segmenta-

tion (Rabbani et al., 2006) to segment objects from the

Figure 5: Output of RANSAC-based registration.

scene. Two sample outputs from region growing seg-

mentation are shown in Figure 4, where the top two

images are the synthetic scenes and the bottom two

images are the outputs of the segmentation after re-

moving the background. The resulting segments were

then ﬁltered to exclude those containing too many or

too few points.

Next we extracted FPFH (Rusu et al., 2009) fea-

tures from the model and the scene segments. These

features were then used to ﬁnd a RANSAC-based reg-

istration transformation (Buch et al., 2013) between

the model and the segment resulting in an initial pose

estimation for the segment. A few sample outputs

from this step are shown in Figure 5. The model and

the segment are shown by the red and white points, re-

spectively. Note that the bottom left image indicates

an under segmentation case where two objects clus-

tered into a single object.

After RANSAC-based registration, we built a

dataset of 9599 training and 1090 test segments. To

generate more data for better training, we augmented

our dataset by introducing perturbations to the initial

poses of segments. For each segment, we generated

20 new poses by adding small translations and rota-

tions controlled by two Gaussian functions (mean=0,

σ=3 mm for translation, mean=0, σ=0.06 rad. for ro-

tation). This increased the training and test data sizes

to 191,980 and 21,800 samples, respectively.

The next step in the data generation pipeline is to

assign ground-truth labels to the generated segments.

Validation of registration results is a binary classiﬁ-

cation problem where the positive class is the one

for which registration was successful and the nega-

ValidNet: A Deep Learning Network for Validation of Surface Registration

393

tive class indicates unsuccessful registrations. To as-

sign a class label to each segment, we ran the ICP

algorithm (Besl and McKay, 1992) on each model-

segment pair and used the ICP output to assign the

class label. If the centroids and principal axes of the

model and the segment were within some predeﬁned

threshold, we labeled it as a positive class segment,

otherwise a negative class one. We used three differ-

ent thresholds corresponding to three different preci-

sion levels as shown in Table 1.

After that, we aligned the segment to the model’s

initial pose using the transformation matrix returned

by ICP. Finally, before feeding into the network, we

randomly sampled all the segments and models to a

ﬁxed number of points. In our experiments, we used

m = n = 1, 024 points each for both the model and the

segment.

4.2 Training Process

For ValidNet training, we used ReLU activation func-

tion on all the shared mlp layers and fully connected

layers. For optimization, we used Adam optimizer

with a cross-entropy loss function. We used a batch

size of 8 and an initial learning rate of 0.000001 which

was reduced by a factor of 10 every 5 epochs. The net-

work was trained for 60 epochs on a single Amazon-

AWS GPU instance (≈ 8 hours on Tesla V100 GPU).

A dropout of 30% was used only for the fully con-

nected layers.

4.3 Results

In order to evaluate the effectiveness of our proposed

method, we calculated overall accuracy and average

class accuracy on all test samples in our dataset. We

also compared ValidNet results with that of Papazov

et al. (Papazov and Burschka, 2010) and PointNet (Qi

et al., 2017). PointNet is a well-known deep learning

model for point cloud classiﬁcation. In order to use

PointNet for registration validation, we stacked two

input point clouds (segment and model) before pass-

ing to the PointNet network, as shown in Figure 6.

The results in terms of overall accuracy and aver-

age class accuracy for three different levels of match-

ing precision are shown in Table 2. Firstly, it can be

seen that different levels of matching precision does

not have any signiﬁcant impact on the performance

of competing methods which shows that the meth-

ods adapted well to different thresholds. Secondly,

the results show that ValidNet outperforms Papazov

et al. and PointNet for both evaluation criteria and

all three precision levels. We believe that the major

reason for ValidNet’s better performance over Point-

Net is that the characteristics learned by ValidNet’s

feature extraction network is the combination of local

and global characteristics of a point cloud whereas

PointNet captures only global characteristics. For a

small misalignment between the model and the seg-

ment, the overall global information does not change

that much which makes it difﬁcult for PointNet to

classify those cases correctly. On the other hand, the

local characteristics captured by ValidNet change sig-

niﬁcantly resulting in a better classiﬁcation accuracy.

Figure 7 shows some sample inputs for which

ValidNet successfully validated the registration re-

sults. Figure 8 shows some results where ValidNet

failed to validate correctly. By analyzing failure cases

for ValidNet, we noticed that most of the misclassi-

ﬁcations occur near the threshold level. The thresh-

olds we used to separate two classes (based on dis-

tance between centroid and angle between principle

axis) are relatively small making it hard even for hu-

mans to differentiate between two classes resulting in

most misclassiﬁcations near the threshold. In particu-

lar, the classiﬁcation is determined by comparing the

results against the ground truth pose, which is subtle

and which may not be obvious in the human visual

comparison.

4.4 Robustness of ValidNet

Robustness against noise is an important property of

any machine vision system to cater for the unrelia-

bility of the sensor’s characteristics. We performed

an experiment to test the robustness of our proposed

method against noise. For each segment in the test

data, we added Gaussian noise on all the points inde-

pendently with mean 0 and speciﬁed standard devia-

tions in all three dimensions. The comparative results

are shown in Figure 9. Though accuracy of all tech-

niques suffer with increasing noise as expected, the

performance of ValidNet degrades more gracefully as

compared to that of PointNet and Papazov et al. As

noise level σ approaches 2, the discriminative abil-

ity of both PointNet and Papazov et al. reduces to

a random guess whereas ValidNet continues to per-

form well beyond that level as shown in Figure 9.

We believe that this greater robustness of ValidNet

against noise is due to its use of local characteristics

of points. For higher noise levels, points move farther

away from their original positions resulting in a dras-

tic change in the global shape of the object but the

change in points’ local characteristics remains rela-

tively less signiﬁcant.

We saw that ValidNet performed well when it was

trained on noiseless data and tested on noisy data. We

performed another robustness experiment to evaluate

VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications

394

Table 1: Different levels of matching precision.

Precision-level Threshold Class Distribution

Train Test

Translation (mm) Rotation (radian) Positive Negative Positive Negative

High 4 0.12 77572 114408 8777 13023

Medium 4.5 0.14 94699 97281 10666 11134

Low 5 0.16 109920 82060 12395 9405

mlp shared

(64,64)

mlp shared

(64,128,1024)

model

segment

1024

mlp

(1024-1024-

256)

Max

pool

output1024

Figure 6: PointNet for registration validation. It treats the model and the segment point clouds as a single point cloud, captures

their global relationship and provides the validation probability of registration success.

Figure 7: Some samples where ValidNet classiﬁed cor-

rectly. First row shows 3 images for positive class and sec-

ond row shows 3 images for negative class.

its performance when noise is added to training data

as well. The results are shown in Figure 10 . For both

ValidNet and PointNet, the reduction in accuracy with

noise is now much lower which is expected as the

networks have already seen noisy data during train-

ing. ValidNet again shows more robustness by out-

performing PointNet and Papazov et al. for all noise

levels.

4.5 Visualization of Results

To visualize how well ValidNet captures the required

matching information, we display activation output

Figure 8: Some samples where ValidNet failed to classify

correctly. First row shows 3 images for positive class and

second row shows 3 images for negative class.

of the max-pooling layer (after similarity matrix) as

a heat map in Figure 11. In the ﬁgure, registration

outputs along with heat maps of three samples each

for positive and negative classes are shown. The ﬁg-

ure shows two alternate heat map representations for

each segment. The ﬁrst one, that we call vector repre-

sentation, shows the activation output for 1024 points

sequentially in a vector (some width is added for bet-

ter visualization). The second one, that we call point

cloud representation, shows the activation output for

1024 points mapped to their locations in the input

point cloud segment.

The heat maps in Figure 11 shows clear differen-

tiation between the positive and the negative classes.

ValidNet: A Deep Learning Network for Validation of Surface Registration

395

Table 2: Performance comparison on our dataset.

High Precision Medium Precision Low Precision

Avg. Class Acc. Overall Acc. Avg. Class Acc. Overall Acc. Avg. Class Acc. Overall Acc.

Papazov et al. 64.57% 68.05% 65.502% 65.74% 65.89% 64.14%

PointNet 86.40% 86.95% 86.53% 86.60% 85.47% 86.63%

ValidNet 90.36% 89.47% 90.85% 90.81% 89.68% 90.11%

Figure 9: Effect of noise when trained on noiseless data and

tested on noisy data.

Figure 10: Effect of noise when noise is added on the train-

ing data as well

For the positive classes, the heat map is more towards

the yellow-red side (showing higher activation out-

puts) while for the negative classes, the heat map goes

more towards the blue region (showing lower acti-

vation outputs). The ﬁnal fully connected layers of

our network perform the ﬁnal classiﬁcation based on

these values.

5 CONCLUSIONS

We have presented a novel approach for the open

problem of automatic validation. It is a deep learning

network to verify 3D surface registration results for

rigid parts. The proposed method is a shared mlp ar-

chitecture that treats the validation problem as a two-

class classiﬁcation task. It has explored the potentials

of using deep learning for the validation task. Particu-

larly, this work reveals that feature learning and simi-

Figure 11: Visualization of ValidNet.

larity matching using deep neural network can be use-

ful in verifying the success of registration algorithms.

Working on segments instead of the complete scene

also reduces the search space and can be beneﬁcial for

real-time applications. Moreover, it does not need to

learn the translation and rotational invariance proper-

ties since the scene segments are mapped to the same

canonical model pose. This work, however, can be

extended in several ways. For instance, it may be fur-

ther improved by using intensity data or other useful

information. In our future work, we will investigate

more experiments on different objects. We will also

tend to use this method as a post processing phase to

general object recognition to reduce false positives.

ACKNOWLEDGEMENTS

The authors would like to acknowledge Bluewrist

Canada, and the Natural Sciences and Engineering

Research Council of Canada for their support of this

work.

REFERENCES

Aldoma, A., Tombari, F., Di Stefano, L., and Vincze, M.

(2012). A global hypotheses veriﬁcation method for

3d object recognition. In European conference on

computer vision, pages 511–524. Springer.

VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications

396

Bariya, P. and Nishino, K. (2010). Scale-hierarchical 3d

object recognition in cluttered scenes. In 2010 IEEE

computer society conference on computer vision and

pattern recognition, pages 1657–1664. IEEE.

Besl, P. J. and McKay, N. D. (1992). A Method for regis-

tration of 3-D shapes. In Sensor Fusion IV: Control

Paradigms and Data Structures, pages 586–607. In-

ternational Society for Optics and Photonics.

Buch, A. G., Kraft, D., Kamarainen, J.-K., Petersen, H. G.,

and Kr

uger, N. (2013). Pose estimation using local

structure-speciﬁc shape and appearance context. In

2013 IEEE International Conference on Robotics and

Automation, pages 2080–2087. IEEE.

Coumans, E. (2015). Bullet physics simulation. In ACM

SIGGRAPH 2015 Courses, page 7. ACM.

Guehring, J. (2001). Reliable 3d surface acquisition, regis-

tration and validation using statistical error models. In

Proceedings Third International Conference on 3-D

Digital Imaging and Modeling, pages 224–231. IEEE.

Mahler, J. and Goldberg, K. (2017). Learning deep policies

for robot bin picking by simulating robust grasping

sequences. In Conference on Robot Learning, pages

515–524.

Mian, A. S., Bennamoun, M., and Owens, R. (2006).

Three-dimensional model-based object recognition

and segmentation in cluttered scenes. IEEE trans-

actions on pattern analysis and machine intelligence,

28(10):1584–1601.

Myronenko, A. and Song, X. (2010). Point set registra-

tion: Coherent point drift. IEEE transactions on pat-

tern analysis and machine intelligence, 32(12):2262–

2275.

Paiement, A., Mirmehdi, M., Xie, X., and Hamilton, M. C.

(2016). Registration and modeling from spaced and

misaligned image volumes. IEEE Transactions on Im-

age Processing, 25(9):4379–4393.

Pais, G. D., Miraldo, P., Ramalingam, S., Govindu, V. M.,

Nascimento, J. C., and Chellappa, R. (2019). 3dreg-

net: A deep neural network for 3d point registration.

arXiv preprint arXiv:1904.01701.

Papazov, C. and Burschka, D. (2010). An efﬁcient

ransac for 3d object recognition in noisy and occluded

scenes. In Asian Conference on Computer Vision,

pages 135–148. Springer.

Pappas, I. P., Styner, M., Malik, P., Remonda, L., and

Caversaccio, M. (2005). Automatic method to as-

sess local ct–mr imaging registration accuracy on im-

ages of the head. American journal of neuroradiology,

26(1):137–144.

Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017). Pointnet:

Deep learning on point sets for 3d classiﬁcation and

segmentation. In Proceedings of the IEEE Conference

on Computer Vision and Pattern Recognition, pages

652–660.

Rabbani, T., Van Den Heuvel, F., and Vosselman, G. (2006).

Segmentation of point clouds using smoothness con-

straints. In ISPRS commission V symposium: image

engineering and vision metrology, pages 248–253. In-

ternational Society for Photogrammetry and Remote

Sensing (ISPRS).

Rogelj, P., Kovacic, S., and Gee, J. C. (2002). Validation of

a nonrigid registration algorithm for multimodal data.

In Medical Imaging 2002: Image Processing, volume

4684, pages 299–308. International Society for Optics

and Photonics.

Rusinkiewicz, S. and Levoy, M. (2001). Efﬁcient variants

of the ICP algorithm. In 3D Digital Imaging and Mod-

eling (3dim), volume vol. 1, pages 145–153.

Rusu, R. B., Blodow, N., and Beetz, M. (2009). Fast point

feature histograms (fpfh) for 3d registration. In 2009

IEEE International Conference on Robotics and Au-

tomation, pages 3212–3217. IEEE.

Schnabel, J. A., Tanner, C., Smith, A. D. C., Hill, D. L.,

Hawkes, D. J., Leach, M. O., Hayes, C., Degenhard,

A., and Hose, R. (2001). Validation of non-rigid reg-

istration using ﬁnite element methods. In Biennial In-

ternational Conference on Information Processing in

Medical Imaging, pages 344–357. Springer.

Schwarz, M., Milan, A., Lenz, C., Munoz, A., Periyasamy,

A. S., Schreiber, M., Sch

uller, S., and Behnke, S.

(2017). Nimbro picking: Versatile part handling

for warehouse automation. In 2017 IEEE Inter-

national Conference on Robotics and Automation

(ICRA), pages 3032–3039. IEEE.

Segal, A. V., Haehnel, D., and Thrun, S. (2009).

Generalized-icp. In Robotics: science and systems,

pages 435–442.

Vo, A.-V., Truong-Hong, L., Laefer, D. F., and Bertolotto,

M. (2015). Octree-based region growing for point

cloud segmentation. ISPRS Journal of Photogramme-

try and Remote Sensing, 104:88–100.

Yang, J., Li, H., Campbell, D., and Jia, Y. (2015). Go-

icp: A globally optimal solution to 3d icp point-set

registration. IEEE transactions on pattern analysis

and machine intelligence, 38(11):2241–2254.

Yu, W., Tannast, M., and Zheng, G. (2017). Non-rigid free-

form 2d–3d registration using a b-spline-based statis-

tical deformation model. Pattern recognition, 63:689–

699.

ValidNet: A Deep Learning Network for Validation of Surface Registration

397