TriSI: A Distinctive Local Surface Descriptor for 3D Modeling and Object Recognition

Yulan Guo^{1,2}, Ferdous Sohel^{2}, Mohammed Bennamoun^{2}, Min Lu^{1} and Jianwei Wan^{1}

^{1} College of Electronic Science and Engineering, National University of Defense Technology, Changsha, P.R. China
^{2} School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
Keywords: Local Surface Descriptor, 3D Modeling, 3D Object Recognition, Range Image Registration.
Abstract: Local surface description is a critical stage for surface matching. This paper presents a highly distinctive local surface descriptor, namely TriSI. From a keypoint, we first construct a unique and repeatable local reference frame (LRF) using all the points lying on the local surface. We then generate three spin images from the three coordinate axes of the LRF. These spin images are concatenated and further compressed into a TriSI descriptor using the principal component analysis technique. We tested our TriSI descriptor on the Bologna Dataset and compared it to several existing methods. Experimental results show that TriSI outperformed existing methods under all levels of noise and varying mesh resolutions. TriSI was further tested to demonstrate its effectiveness in 3D modeling. Experimental results show that it can accurately perform pairwise and multiview range image registration. We finally used the TriSI descriptor for 3D object recognition. The results on the UWA Dataset show that TriSI outperformed the state-of-the-art methods, including Spin Image, Tensor and Exponential Map. The TriSI based method achieved a high recognition rate of 98.4%.
1 INTRODUCTION
Surface matching is a fundamental research topic in
both 3D Computer Vision and Computer Graphics. It
has a number of applications, including 3D modeling
(Mian et al., 2006), 3D shape retrieval (Shilane et al.,
2004), 3D object recognition (Attene et al., 2011; Guo
et al., 2012; Guo et al., 2013), 3D mapping (Huber
et al., 2000), robotics (Lai et al., 2011b), and reverse
engineering (Williams and Bennamoun, 2000). With
the rapid development of low-cost 3D scanners (e.g.,
Microsoft Kinect), range images are becoming more
available (Lai et al., 2011a; Rusu and Cousins, 2011).
The data availability together with the progress in
high-speed computing devices have increased the de-
mand for efficient and accurate range image represen-
tation techniques (Mian et al., 2010).
There are two basic approaches to represent a
range image, namely global feature and local feature
based approaches (Salti et al., 2011). A global fea-
ture based approach uses a global feature to represent
a surface. It is very popular in 3D shape retrieval, but
it is sensitive to occlusion and clutter. A local feature
based approach, however, uses a set of 3D keypoints
and local surface descriptors to represent a surface. It
is therefore suitable for surface matching in the pres-
ence of occlusion and clutter (Salti et al., 2011). In
the process of surface matching, the distinctiveness
and robustness of the local surface descriptors play a
significant role (Taati and Greenspan, 2011).
A number of papers on local surface descriptors
can be found in the literature (Bustos et al., 2005;
Bronstein et al., 2010; Boyer et al., 2011). Chua and
Jarvis (1997) proposed a Point Signature by record-
ing the signed distances between the neighboring sur-
face points and their correspondences in a fitted plane.
Point Signature is, however, sensitive to noise and
varying mesh resolutions (Mian et al., 2005). Johnson
and Hebert (1999) proposed a Spin Image descriptor
by accumulating the neighboring points into a 2D his-
togram. The spin image is one of the most cited meth-
ods in the literature. It is, however, weakly distinc-
tive and sensitive to varying mesh resolutions (Mian
et al., 2010). Chen and Bhanu (2007) used Local Sur-
face Patches (LSP) to represent a range image. Since
the LSP descriptor requires the calculation of second-
order derivatives of a surface, it is sensitive to noise.
Flint et al. (2007) introduced the THRIFT descriptor
by generating a 1D histogram according to the surface
normal deviations. Tombari et al. (2010) proposed
a Signature of Histograms of OrienTations (SHOT)
by encoding the surface normal deviations in a parti-
tioned spherical neighborhood. The SHOT is highly
descriptive. It is, however, also sensitive to varying
mesh resolutions. Other descriptors include Point’s
Fingerprint (Sun and Abidi, 2001), 3D Shape Context
(Frome et al., 2004), Tensor (Mian et al., 2006), Ex-
ponential Map (EM) (Novatnack and Nishino, 2008)
and Variable-Dimensional Local Shape Descriptors
(VD-LSD) (Taati and Greenspan, 2011).
Most of the existing local surface descriptors suffer from low descriptiveness, poor robustness to noise, or sensitivity to varying mesh resolutions (Bariya et al., 2012). Motivated by these limitations, we
propose a highly distinctive and robust local surface
descriptor called Tri-Spin-Image (TriSI). The TriSI
is an improvement of the Spin Image descriptor. It
first builds a unique and repeatable Local Reference
Frame (LRF) for each keypoint. It then generates
three spin images by spinning one sheet around each
axis of the LRF. The three images are concatenated to
form an overall TriSI descriptor. The TriSI descrip-
tor is further compressed using the Principal Compo-
nent Analysis (PCA) technique. Performance evalu-
ation results show that our proposed TriSI descriptor
is highly descriptive. It is very robust to both noise
and varying mesh resolutions. The effectiveness of
the TriSI descriptor was also demonstrated in 3D mod-
eling including pairwise and multiview range image
registration. The TriSI descriptor was further used for
3D object recognition and was tested on the UWA
Dataset. Experimental results show that TriSI out-
performed the state-of-the-art methods including Spin
Image, Tensor, EM and VD-LSD.
The rest of this paper is organized as follows: Sec-
tion 2 describes the TriSI surface descriptor. Section
3 presents the feature matching performance. Section
4 demonstrates the TriSI based 3D modeling meth-
ods and their experimental results. Section 5 presents
the 3D object recognition results. Section 6 concludes
this paper.
2 TriSI SURFACE DESCRIPTOR
The process of generating a TriSI surface descriptor
includes three modules, i.e., LRF construction, TriSI
generation and TriSI compression.
2.1 LRF Construction
Given a triangular mesh surface $\mathcal{S}$ and a set of keypoints $\{\mathbf{o}_1, \mathbf{o}_2, \cdots, \mathbf{o}_K\}$, a set of local surface descriptors $\{\mathbf{f}_1, \mathbf{f}_2, \cdots, \mathbf{f}_K\}$ should be generated to represent the surface $\mathcal{S}$. Here, $K$ denotes the number of keypoints on the surface $\mathcal{S}$. For a given keypoint $\mathbf{o}_k$, $k = 1, 2, \cdots, K$, we first extract the local surface $\mathcal{L}$ using a sphere of radius $r$ centered at $\mathbf{o}_k$. We then construct an LRF for $\mathbf{o}_k$ using all the points lying on the local surface $\mathcal{L}$ rather than using just the mesh vertices.
Assume that the local surface $\mathcal{L}$ contains $N$ triangles and $M$ vertices. For the $i$th triangle with vertices $\mathbf{q}_{i1}$, $\mathbf{q}_{i2}$ and $\mathbf{q}_{i3}$, we calculate the scatter matrix $S_i$ using the continuous PCA algorithm:

$$S_i = \frac{1}{12} \sum_{j=1}^{3} \sum_{n=1}^{3} (\mathbf{q}_{ij} - \mathbf{o}_k)(\mathbf{q}_{in} - \mathbf{o}_k)^{\mathrm{T}} + \frac{1}{12} \sum_{j=1}^{3} (\mathbf{q}_{ij} - \mathbf{o}_k)(\mathbf{q}_{ij} - \mathbf{o}_k)^{\mathrm{T}}. \quad (1)$$
The overall scatter matrix S of the local surface L
is then calculated as:
$$S = \sum_{i=1}^{N} \gamma_{i1} \gamma_{i2} S_i, \quad (2)$$

where the weights $\gamma_{i1}$ and $\gamma_{i2}$ are respectively defined as:
$$\gamma_{i1} = \frac{\left|(\mathbf{q}_{i2} - \mathbf{q}_{i1}) \times (\mathbf{q}_{i3} - \mathbf{q}_{i1})\right|}{\sum_{i=1}^{N} \left|(\mathbf{q}_{i2} - \mathbf{q}_{i1}) \times (\mathbf{q}_{i3} - \mathbf{q}_{i1})\right|}, \quad (3)$$

$$\gamma_{i2} = \left( r - \left\| \mathbf{o}_k - \frac{\mathbf{q}_{i1} + \mathbf{q}_{i2} + \mathbf{q}_{i3}}{3} \right\| \right)^2. \quad (4)$$
We perform an eigenvalue decomposition on the scatter matrix $S$ to get three orthogonal eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$. These eigenvectors are in the order of decreasing magnitude of their associated eigenvalues. The eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$ form the basis for the LRF. However, their directions are ambiguous. That is, $-\mathbf{v}_1$, $-\mathbf{v}_2$ and $-\mathbf{v}_3$ are also eigenvectors of the scatter matrix $S$. We therefore propose a sign disambiguation technique.
We define the unambiguous vectors $\tilde{\mathbf{v}}_1$ and $\tilde{\mathbf{v}}_3$ as:

$$\tilde{\mathbf{v}}_1 = \mathbf{v}_1 \cdot \mathrm{sgn}\left( \sum_{i=1}^{N} \gamma_{i1} \gamma_{i2} \left( \sum_{j=1}^{3} (\mathbf{q}_{ij} - \mathbf{o}_k) \cdot \mathbf{v}_1 \right) \right), \quad (5)$$

$$\tilde{\mathbf{v}}_3 = \mathbf{v}_3 \cdot \mathrm{sgn}\left( \sum_{i=1}^{N} \gamma_{i1} \gamma_{i2} \left( \sum_{j=1}^{3} (\mathbf{q}_{ij} - \mathbf{o}_k) \cdot \mathbf{v}_3 \right) \right), \quad (6)$$

where $\mathrm{sgn}(\cdot)$ denotes the signum function that extracts the sign of a real number. Vector $\tilde{\mathbf{v}}_2$ is then defined as $\tilde{\mathbf{v}}_3 \times \tilde{\mathbf{v}}_1$.
Finally, we construct an LRF for the keypoint $\mathbf{o}_k$ using $\mathbf{o}_k$ as the origin and the unambiguous vectors $\{\tilde{\mathbf{v}}_1, \tilde{\mathbf{v}}_2, \tilde{\mathbf{v}}_3\}$ as the three coordinate axes.
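To make the construction concrete, the following NumPy sketch implements Equations (1)-(6) under stated assumptions: the local surface is passed as explicit vertex and triangle arrays, and the function name compute_lrf is ours, not the authors'.

```python
import numpy as np

def compute_lrf(vertices, triangles, keypoint, radius):
    """Illustrative sketch of the LRF construction (Eqs. 1-6).

    vertices : (M, 3) vertex positions of the local surface L
    triangles: (N, 3) integer rows indexing `vertices`
    keypoint : (3,) the keypoint o_k
    radius   : support radius r
    """
    # Triangle vertices expressed relative to the keypoint.
    q1 = vertices[triangles[:, 0]] - keypoint
    q2 = vertices[triangles[:, 1]] - keypoint
    q3 = vertices[triangles[:, 2]] - keypoint
    # Weight gamma_i1: relative triangle area, Eq. (3).
    areas = np.linalg.norm(np.cross(q2 - q1, q3 - q1), axis=1)
    g1 = areas / areas.sum()
    # Weight gamma_i2: proximity of the triangle centroid to o_k, Eq. (4).
    g2 = (radius - np.linalg.norm((q1 + q2 + q3) / 3.0, axis=1)) ** 2

    S = np.zeros((3, 3))
    weighted_sum = np.zeros(3)   # accumulates the inner sums of Eqs. (5)-(6)
    for i in range(len(triangles)):
        Q = np.stack([q1[i], q2[i], q3[i]])    # rows are q_ij - o_k
        s = Q.sum(axis=0)
        # Continuous-PCA scatter of one triangle, Eq. (1).
        Si = (np.outer(s, s) + Q.T @ Q) / 12.0
        S += g1[i] * g2[i] * Si                # Eq. (2)
        weighted_sum += g1[i] * g2[i] * s

    # Eigenvectors of S; eigh returns them in ascending eigenvalue order.
    evals, evecs = np.linalg.eigh(S)
    v1, v2, v3 = evecs[:, 2], evecs[:, 1], evecs[:, 0]
    # Sign disambiguation, Eqs. (5)-(6), then v2 = v3 x v1.
    v1 = v1 * np.sign(weighted_sum @ v1)
    v3 = v3 * np.sign(weighted_sum @ v3)
    v2 = np.cross(v3, v1)
    return np.stack([v1, v2, v3])              # rows are the three LRF axes
```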
TriSI:ADistinctiveLocalSurfaceDescriptorfor3DModelingandObjectRecognition
87
Figure 1: An illustration of the generation of a spin image. (Figure best seen in color.)
2.2 TriSI Generation
Given a keypoint $\mathbf{o}_k$, the local surface $\mathcal{L}$ and the LRF vectors $\{\tilde{\mathbf{v}}_1, \tilde{\mathbf{v}}_2, \tilde{\mathbf{v}}_3\}$, we generate three spin images $\{SI_1, SI_2, SI_3\}$ by respectively spinning a sheet about the three axes of the LRF.

We first generate a spin image by spinning a sheet about the $\tilde{\mathbf{v}}_1$ axis. An illustration is shown in Fig. 1. Given the LRF, each point $\mathbf{q}$ on the local surface $\mathcal{L}$ is represented by two parameters $\alpha$ and $\beta$. Here, $\alpha$ is the perpendicular distance of $\mathbf{q}$ from the line which passes through $\mathbf{o}_k$ and is parallel to $\tilde{\mathbf{v}}_1$, and $\beta$ is the signed perpendicular distance to the plane which goes through $\mathbf{o}_k$ and is perpendicular to $\tilde{\mathbf{v}}_1$, that is:
$$\alpha = \sqrt{\|\mathbf{q} - \mathbf{o}_k\|^2 - \left(\tilde{\mathbf{v}}_1 \cdot (\mathbf{q} - \mathbf{o}_k)\right)^2}, \quad (7)$$

$$\beta = \tilde{\mathbf{v}}_1 \cdot (\mathbf{q} - \mathbf{o}_k). \quad (8)$$
We accumulate the parameters $(\alpha, \beta)$ of all the points on $\mathcal{L}$ into a $B \times B$ histogram, where $B$ is the number of bins along each dimension of the histogram. We further bilinearly interpolate this 2D histogram to account for noise. The interpolated histogram is our improved spin image $SI_1$.

We then generate two other spin images $SI_2$ and $SI_3$ in exactly the same way as $SI_1$. That is, $SI_2$ and $SI_3$ are generated by substituting $\tilde{\mathbf{v}}_1$ in Equations (7-8) with $\tilde{\mathbf{v}}_2$ and $\tilde{\mathbf{v}}_3$, respectively. We finally concatenate the three spin images to obtain our TriSI descriptor, that is:

$$\mathrm{TriSI} = \{SI_1, SI_2, SI_3\}. \quad (9)$$
2.3 TriSI Compression
The dimensionality of the TriSI descriptor is $3B^2$, which is too large for efficient feature matching. For example, if $B$ is set to 15, the dimensionality of a TriSI is 675. We therefore project the TriSI to a PCA subspace to get a more compact feature descriptor. The PCA subspace can be learned from a set of training descriptors $\{\mathbf{f}_1, \mathbf{f}_2, \cdots, \mathbf{f}_T\}$, where $T$ is the number of training descriptors. We calculate the scatter matrix $M$ as:

$$M = \sum_{i=1}^{T} (\mathbf{f}_i - \bar{\mathbf{f}})(\mathbf{f}_i - \bar{\mathbf{f}})^{\mathrm{T}}, \quad (10)$$

where $\bar{\mathbf{f}}$ is the mean vector of all these training descriptors.
We then perform an eigenvalue decomposition on $M$:

$$MV = VD, \quad (11)$$

where $D$ is a diagonal matrix whose diagonal entries are the eigenvalues of $M$, and $V$ is a matrix whose columns are the eigenvectors of $M$.

We define the PCA subspace using the eigenvectors corresponding to the $n$ largest eigenvalues. The value of $n$ is chosen such that 95% of the fidelity is preserved in the compressed data.
Therefore, the compressed vector $\hat{\mathbf{f}}_i$ of a feature descriptor $\mathbf{f}_i$ is:

$$\hat{\mathbf{f}}_i = V_n^{\mathrm{T}} \mathbf{f}_i, \quad (12)$$

where $V_n^{\mathrm{T}}$ is the transpose of the first $n$ columns of $V$.

During feature matching, we use the Euclidean distance to measure the distance between any two descriptors $\hat{\mathbf{f}}_i$ and $\hat{\mathbf{f}}_j$.
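A minimal sketch of the compression step (Equations (10)-(12)) might look as follows; the row-wise training matrix layout and the way the 95% fidelity threshold is applied to the eigenvalue spectrum are our assumptions.

```python
import numpy as np

def learn_subspace(F, fidelity=0.95):
    """Learn the PCA subspace from training descriptors, Eqs. (10)-(11).

    F : (T, D) matrix whose rows are raw training TriSI descriptors.
    Returns the (D, n) matrix V_n of the leading eigenvectors.
    """
    mean = F.mean(axis=0)
    M = (F - mean).T @ (F - mean)          # scatter matrix, Eq. (10)
    evals, V = np.linalg.eigh(M)           # ascending eigenvalues
    evals, V = evals[::-1], V[:, ::-1]     # reorder to decreasing eigenvalues
    # Smallest n whose eigenvalues preserve the required fidelity (assumed
    # to mean the fraction of total spectral energy retained).
    n = int(np.searchsorted(np.cumsum(evals) / evals.sum(), fidelity)) + 1
    return V[:, :n]

def compress(Vn, f):
    """Project a raw TriSI descriptor onto the subspace, Eq. (12)."""
    return Vn.T @ f
```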
3 FEATURE MATCHING
PERFORMANCE
We tested the performance of the TriSI descriptor on
the Bologna Dataset (Tombari et al., 2010). We
also compared it with several state-of-the-art descrip-
tors including Spin Image (SpinIm) (Johnson and
Hebert, 1999), Normal Histogram (NormHist) (Het-
zel et al., 2001), Local Surface Patches (LSP) (Chen
and Bhanu, 2007) and SHOT (Tombari et al., 2010).
The experimental results are presented in this section.
3.1 Experimental Setup
The Bologna Dataset (Tombari et al., 2010) contains
six models and 45 scenes. The models were taken
from the Stanford 3D Scanning Repository (Curless
and Levoy, 1996). The scenes were generated by ran-
domly placing different subsets of the models in order
to create clutter and pose variances. The ground-truth
GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications
88
transformations between each model and its instances in the scenes were provided in the dataset.

Table 1: Tuned parameter settings for the five feature descriptors.

Descriptor   Support Radius (mr)   Dimensionality   Length
SpinIm       15                    15 × 15          225
NormHist     15                    15 × 15          225
LSP          15                    15 × 15          225
SHOT         15                    8 × 2 × 2 × 10   320
TriSI        15                    29 × 1           29
We adopt the frequently used Recall vs 1-
Precision Curve (RP Curve) (Ke and Sukthankar,
2004) to measure the performance of a feature de-
scriptor. If the distance between a scene feature de-
scriptor and a model feature descriptor is smaller than
a threshold τ, this pair of feature descriptors is consid-
ered a match. Further, if the two feature descriptors
come from the same physical location, this match is
considered a true positive. Otherwise, it is considered
a false positive. Given the total number of positives a priori, the recall and 1-precision are calculated as in (Ke and Sukthankar, 2004). An RP Curve can therefore be generated by varying the threshold τ.
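For illustration, the RP Curve computation just described can be sketched as follows; the argument names (pairwise match distances, ground-truth flags and the a priori number of positives) are our own framing of the procedure in (Ke and Sukthankar, 2004).

```python
import numpy as np

def rp_curve(distances, is_true_match, total_positives, thresholds):
    """Sketch of the Recall vs 1-Precision computation.

    distances      : (Q,) distance of each scene-to-model descriptor match
    is_true_match  : (Q,) bool, match comes from the same physical location
    total_positives: number of ground-truth correspondences, known a priori
    """
    recall, one_minus_precision = [], []
    for tau in thresholds:
        accepted = distances < tau             # matches below the threshold
        tp = np.sum(accepted & is_true_match)  # true positives
        fp = np.sum(accepted & ~is_true_match) # false positives
        recall.append(tp / total_positives)
        one_minus_precision.append(fp / max(tp + fp, 1))
    return np.array(recall), np.array(one_minus_precision)
```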
We first randomly selected a set of keypoints from
each model (1000 keypoints in our case), and ob-
tained their corresponding points from the scene. We
then used a feature description method (e.g., TriSI) to
extract a set of feature descriptors. The scene feature
descriptors were finally matched against the model
feature descriptors to produce a RP Curve. The pa-
rameters for all the five feature description methods
were tuned by a Tuning Dataset which contains the
six models and their transformed versions (obtained
by resampling to 1/2 of their original mesh resolution and adding 0.1mr Gaussian noise). Note that ‘mr’
denotes the average mesh resolution of the six mod-
els, which is used as the unit for metric parameters.
We also trained the PCA subspace using the TriSI de-
scriptors of the six models. The tuned parameter set-
tings for all feature descriptors are shown in Table 1.
3.2 Robustness to Noise
In order to test the performance of all these descriptors
in the presence of noise, we added different levels of
noise with standard deviations of 0.1mr, 0.3mr, and
0.5mr to the 45 scenes. We generated one RP Curve
for a feature descriptor at each noise level. The RP
Curves of these five descriptors are shown in Fig. 2.
It can be observed that our TriSI descriptor
achieved the best performance at all levels of noise,
while SHOT achieved the second best performance.
The performance of all these descriptors decreased
as the level of noise increased from 0.1mr to 0.5mr.
However, our TriSI was more robust to noise com-
pared to the others. It obtained a high recall of about
0.7 together with a high precision of about 0.7 even
with a high level of noise (with a deviation of 0.5mr), as shown in Fig. 2(c). Moreover, the gap between SHOT and TriSI grew larger as the level of noise
increased. This validated the strong robustness and
consistency of our TriSI descriptor in the presence of
noise.
It can also be observed that NormHist and SpinIm performed well under a low level of noise (with a deviation of 0.1mr), as shown in Fig. 2(a). They, however, failed to work when a medium level of noise (with a deviation of 0.3mr) was added, as shown in Fig. 2(b). This is because the generation of a NormHist or a SpinIm requires surface normals, which are very sensitive to noise. In contrast, LSP was highly susceptible to noise even at the lowest level: it achieved a very low recall and precision under a deviation of just 0.1mr, as shown in Fig. 2(a). This is mainly because LSP is based on the shape index values of the local surface, which are even more sensitive to noise than surface normals.
3.3 Robustness to Varying Mesh Resolutions
In order to test the performance of these descriptors
with respect to varying mesh resolutions, we resampled the noise-free scenes down to 1/2, 1/4 and 1/8 of
their original mesh resolution. We generated one RP
Curve for a feature descriptor at each level of mesh
decimation. The RP Curves of these descriptors are
shown in Fig. 3.
It was found that our TriSI was very robust to varying mesh resolutions, and outperformed the other descriptors by a large margin under all levels of mesh decimation. TriSI achieved a high recall of about 0.95 under 1/2 mesh decimation, as shown in Fig. 3(a). It also achieved a recall of about 0.9 under 1/4 mesh decimation, as shown in Fig. 3(b). TriSI consistently achieved a good performance even under 1/8 mesh decimation, with a recall of more than 0.8, as shown in Fig. 3(c). It is worth noting that the performance of TriSI under 1/8 mesh decimation was even better than the performance of SpinIm under 1/2 mesh decimation, as shown in Figures 3(a) and (c). This clearly justifies the effectiveness and robustness of the TriSI descriptor with respect to varying mesh resolutions.
Figure 2: Recall vs 1-precision curves in the presence of noise. (a) Noise with a deviation of 0.1mr. (b) Noise with a deviation of 0.3mr. (c) Noise with a deviation of 0.5mr.

Figure 3: Recall vs 1-precision curves with respect to varying mesh resolutions. (a) 1/2 mesh decimation. (b) 1/4 mesh decimation. (c) 1/8 mesh decimation.

Moreover, our TriSI descriptor is very compact. As shown in Table 1, the length of a compressed TriSI in this experiment is only 29. In contrast, the second shortest length among the other descriptors is 225, which is larger than the length of TriSI by an order of magnitude. This means that our TriSI can be matched more efficiently compared to the other methods.
4 3D MODELING
In order to demonstrate the effectiveness of TriSI for
3D modeling, we used TriSI to perform pairwise and
multiview range image registration.
4.1 Pairwise Range Image Registration
Given a pair of range images $\{S_1, S_2\}$ of an object, we respectively generate a set of TriSI descriptors for both $S_1$ and $S_2$. We match the TriSI descriptors of $S_1$ against the TriSI descriptors of $S_2$ to obtain a set of feature correspondences $\{C_1, C_2, \cdots, C_{N_c}\}$, where $N_c$ is the number of feature correspondences. For each feature correspondence $C_i$, we can generate a transformation $(R_i, \mathbf{t}_i)$ between $S_1$ and $S_2$ using the LRFs and the point positions. That is, the rotation matrix $R_i$ and translation vector $\mathbf{t}_i$ are calculated as follows:
$$R_i = F_{s1i}^{\mathrm{T}} F_{s2i}, \quad (13)$$

$$\mathbf{t}_i = \mathbf{o}_{s1i} - \mathbf{o}_{s2i} R_i, \quad (14)$$

where $\mathbf{o}_{s1i}$ and $\mathbf{o}_{s2i}$ are respectively the positions of the corresponding points from $S_1$ and $S_2$, and $F_{s1i}$ and $F_{s2i}$ are respectively the LRFs of $\mathbf{o}_{s1i}$ and $\mathbf{o}_{s2i}$.
We calculate a total of $N_c$ transformations from the $N_c$ feature correspondences. These transformations are then grouped into a few clusters. Each cluster center gives a potential transformation between $S_1$ and $S_2$. We sort these clusters based on their sizes and their average feature distances. We verify the several most likely transformations and choose the one which produces the least registration error between $S_1$ and $S_2$.
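As a sketch of Equations (13)-(14), assuming the LRF matrices store the three axes as rows and writing the transform for column vectors (the paper states the same relation in a slightly different vector convention):

```python
import numpy as np

def transform_from_correspondence(F1, o1, F2, o2):
    """Rigid transform implied by one feature correspondence C_i.

    F1, F2 : (3, 3) LRFs whose rows are assumed to be the axes of the
             keypoints o1 (in S1) and o2 (in S2)
    o1, o2 : (3,) corresponding keypoint positions
    """
    R = F1.T @ F2           # Eq. (13): aligns the two local frames
    t = o1 - R @ o2         # Eq. (14), written for column vectors
    return R, t             # a point p in S2 maps to R @ p + t in S1

# Sanity check of the convention: the keypoint o2 lands exactly on o1,
# since R @ o2 + (o1 - R @ o2) == o1.
```

Each correspondence $C_i$ yields one such pair $(R_i, \mathbf{t}_i)$; the clustering and verification then proceed as described above.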
We applied our TriSI based pairwise range image
registration method to two range images of the Chef
model of the UWA Dataset (Mian et al., 2006). Fig.
4(a) shows the feature matching results between the
two range images. It is clear that the majority of fea-
ture matches are true positives. This feature matching
result further indicates the high descriptiveness of our
TriSI descriptor. Although there are a few false positives in the feature matching result, they can easily be eliminated by the process of grouping. As shown in Fig. 4(b), the alignment between the two range images is very accurate.

Figure 4: Pairwise range image registration results. (a) Feature matching result between two range images of the Chef. (b) Pairwise registration result. (Figure best seen in color.)

Figure 5: Multiview range image registration result. (a) Initial set of 22 range images of the Chef model. (b) Coarse registration result by the TriSI based method. (c) The fine 3D model in the UWA Dataset. (Figure best seen in color.)
4.2 Multiview Range Image
Registration
Given a set of range images $\{S_1, S_2, \cdots, S_{N_r}\}$ of an object, we generate a set of TriSI descriptors for each range image. Once the TriSI descriptors are obtained, we use a tree based algorithm to perform multiview range image registration. We first select the range image $S_i$ which has the maximum surface area as the root node of the tree. We then use the aforementioned pairwise registration method to match $S_i$ with the remaining range images in the search space. Once a range image $S_j$ is accurately registered with $S_i$, the range image $S_j$ is added to the tree as a new node and removed from the search space. The arc between the two nodes represents the rigid transformation between $S_j$ and $S_i$. Once all range images have been matched with $S_i$, the algorithm proceeds to the next node. This process continues until all range images in the search space are added to the tree. Once the tree is fully constructed, the transformation between any two connected nodes is already available. Based on these estimated transformations, we transform all range images to the coordinate basis of the range image at the root node. These multiview range images can therefore be aligned without any manual intervention.
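The tree-based procedure above can be sketched as follows. This is an illustrative reading, not the authors' code: pairwise_register and surface_area are hypothetical callables standing in for the pairwise method of Section 4.1 and a mesh-area computation, and the column-vector convention p' = R p + t from the previous sketch is assumed.

```python
import numpy as np

def multiview_register(images, pairwise_register, surface_area):
    """Sketch of tree-based multiview registration.

    images            : list of range images
    pairwise_register : callable(Si, Sj) -> (R, t) mapping S_j into S_i,
                        or None if registration fails
    surface_area      : callable(S) -> surface area, for root selection
    """
    root = max(range(len(images)), key=lambda i: surface_area(images[i]))
    pose = {root: (np.eye(3), np.zeros(3))}   # node -> transform to root
    frontier = [root]
    remaining = set(range(len(images))) - {root}
    while frontier and remaining:
        i = frontier.pop(0)                   # expand the next tree node
        Ri, ti = pose[i]
        for j in sorted(remaining):
            result = pairwise_register(images[i], images[j])
            if result is None:
                continue                      # j stays in the search space
            R, t = result                     # arc of the tree: S_j -> S_i
            pose[j] = (Ri @ R, Ri @ t + ti)   # chain S_j -> S_i -> root
            remaining.discard(j)
            frontier.append(j)
    return pose   # a point p in S_j maps to R @ p + t in the root's basis
```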
We applied our TriSI based multiview range im-
age registration method to 22 range images of the
Chef model of the UWA Dataset. Fig. 5(a) shows the
initial set of range images, and Fig. 5(b) illustrates
our registration result. It is clear that our TriSI based
method achieved a highly accurate alignment without
the use of any a priori information (e.g., the image order
or initial pose of each image). A visual comparison
between the coarse registered range images and the
original fine 3D model shows that the registration re-
sult is almost identical to the original model, as shown
in Figures 5 (b) and (c). This result further demon-
strates the effectiveness of our TriSI based method.
It is expected that the registration results can further
be improved by using a global registration algorithm,
e.g., (Williams and Bennamoun, 2001).
TriSI:ADistinctiveLocalSurfaceDescriptorfor3DModelingandObjectRecognition
91
5 OBJECT RECOGNITION
To further evaluate the performance of our TriSI de-
scriptor, we used TriSI to perform 3D object recogni-
tion on the UWA Dataset (Mian et al., 2006). The
UWA Dataset is one of the most frequently used
datasets. It contains five models and 50 real scenes.
Our 3D object recognition algorithm goes through
two phases: offline pre-processing and online recog-
nition. During the offline pre-processing phase, we
extract our TriSI descriptors from a set of randomly
selected keypoints from all models. We also index
these TriSI descriptors using a k-d tree structure for
efficient feature matching. During the online recogni-
tion phase, we first extract TriSI descriptors from a set
of randomly selected keypoints in a given scene. We
then match the scene descriptors against the model
descriptors using the k-d tree. The feature matching
results are used to vote for candidate models and to
generate transformation hypotheses. The candidate
models are then verified in turn by aligning them with
the scene using a transformation hypothesis. If the
candidate model is aligned accurately with a portion
of the scene, the candidate model and transforma-
tion hypothesis are accepted. Subsequently, the scene
points that correspond to this model are recognized
and segmented. Otherwise, the transformation hy-
pothesis is rejected and the next hypothesis is verified in turn.
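A minimal sketch of the offline indexing and the online matching and voting steps, using SciPy's k-d tree, is given below; the descriptor-stacking layout and the simple vote count are our assumptions about one reasonable realization.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_index(model_descriptors):
    """Offline: index the compressed TriSI descriptors of all models."""
    return cKDTree(model_descriptors)       # (Nm, n) array, one row each

def recognize_votes(tree, model_labels, scene_descriptors):
    """Online: match scene descriptors and vote for candidate models.

    model_labels      : (Nm,) model index of each indexed descriptor
    scene_descriptors : (Ns, n) compressed TriSI descriptors of the scene
    """
    dist, idx = tree.query(scene_descriptors)   # Euclidean nearest neighbors
    votes = np.bincount(model_labels[idx],
                        minlength=model_labels.max() + 1)
    candidates = np.argsort(votes)[::-1]        # most-voted models first
    return idx, dist, candidates
```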
We conducted our experiments using the same
data and experimental setup as in (Mian et al., 2006)
and (Bariya et al., 2012) to achieve a rigorous compar-
ison. The recognition rates of our TriSI based method
are shown in Fig. 6 with respect to varying levels
of occlusion. We also present the recognition results
of Tensor (Mian et al., 2006), SpinIm (Mian et al.,
2006), VD-LSD (Taati and Greenspan, 2011) and EM
(Bariya et al., 2012) based methods.
As shown in Fig. 6, our method achieved the best
recognition results. The average recognition rate of
our TriSI based method was 98.4%. That is, only
three out of the 188 objects in the 50 scenes were
not correctly recognized. The second best results
were produced by the EM based method with an aver-
age recognition rate of 97.5%. They were followed
by Tensor, VD-LSD and SpinIm based methods.
Our TriSI based method is also robust to occlu-
sion. It achieved a 100% recognition rate under up to
80% occlusion. As occlusion further increased, the
recognition rate decreased slightly. It obtained a
recognition rate of more than 80% even under 90%
occlusion. In contrast, the second best recognition
rate under 90% occlusion was only about 50%, which
was reported by the EM based method.
Figure 6: Recognition rates on the UWA Dataset (recognition rate (%) versus occlusion (%) for the Tensor, SpinIm, EM, VD-LSD and TriSI based methods).
6 CONCLUSIONS
In this paper, we presented a distinctive local surface descriptor named TriSI. We evaluated the descriptiveness and robustness of our TriSI descriptor with respect to different levels of noise and mesh resolutions. Experimental results show that TriSI is very robust to noise and varying mesh resolutions, and that it outperformed all of the existing methods evaluated. We also used our TriSI descriptor for both pairwise and multiview range image registration, and TriSI achieved highly accurate registration results. Moreover, we used a TriSI based method to perform 3D object recognition. Experimental results revealed that our method achieved the best recognition rate. Overall, our TriSI based method is robust to noise, occlusion and varying mesh resolutions. It outperformed the state-of-the-art methods.
ACKNOWLEDGEMENTS
This research is supported by a China Scholar-
ship Council (CSC) scholarship (2011611067) and
Australian Research Council grants (DE120102960,
DP110102166).
REFERENCES
Attene, M., Marini, S., Spagnuolo, M., and Falcidieno, B.
(2011). Part-in-whole 3D shape matching and dock-
ing. The Visual Computer, 27(11):991–1004.
Bariya, P., Novatnack, J., Schwartz, G., and Nishino, K.
(2012). 3D geometric scale variability in range im-
ages: Features and descriptors. International Journal
of Computer Vision, 99(2):232–255.
Boyer, E., Bronstein, A., Bronstein, M., Bustos, B., Darom,
T., Horaud, R., Hotz, I., Keller, Y., Keustermans, J.,
GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications
92
Kovnatsky, A., et al. (2011). SHREC 2011: Robust
feature detection and description benchmark. In Euro-
graphics Workshop on Shape Retrieval, pages 79–86.
Bronstein, A., Bronstein, M., Bustos, B., Castellani, U.,
Crisani, M., Falcidieno, B., Guibas, L., Kokkinos, I.,
Murino, V., Ovsjanikov, M., et al. (2010). SHREC
2010: robust feature detection and description bench-
mark. In Eurographics Workshop on 3D Object Re-
trieval, volume 2, page 6.
Bustos, B., Keim, D., Saupe, D., Schreck, T., and Vranić, D.
(2005). Feature-based similarity search in 3D object
databases. ACM Computing Surveys, 37(4):345–387.
Chen, H. and Bhanu, B. (2007). 3D free-form object recog-
nition in range images using local surface patches.
Pattern Recognition Letters, 28(10):1252–1262.
Curless, B. and Levoy, M. (1996). A volumetric method
for building complex models from range images. In
23rd Annual Conference on Computer Graphics and
Interactive Techniques, pages 303–312.
Frome, A., Huber, D., Kolluri, R., Bülow, T., and Malik, J.
(2004). Recognizing objects in range data using re-
gional point descriptors. In 8th European Conference
on Computer Vision, pages 224–237.
Guo, Y., Bennamoun, M., Sohel, F., Wan, J., and Lu, M.
(2013). 3D free form object recognition using rota-
tional projection statistics. In IEEE 14th Workshop on
the Applications of Computer Vision. In press.
Guo, Y., Wan, J., Lu, M., and Niu, W. (2012).
A parts-based method for articulated tar-
get recognition in laser radar data. Optik.
http://dx.doi.org/10.1016/j.ijleo.2012.08.035.
Hetzel, G., Leibe, B., Levi, P., and Schiele, B. (2001). 3D
object recognition from range images using local fea-
ture histograms. In IEEE Conference on Computer
Vision and Pattern Recognition, volume 2, pages II–
394.
Huber, D., Carmichael, O., and Hebert, M. (2000). 3D map
reconstruction from range data. In IEEE International
Conference on Robotics and Automation, volume 1,
pages 891–897. IEEE.
Johnson, A. E. and Hebert, M. (1999). Using spin images
for efficient object recognition in cluttered 3D scenes.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 21(5):433–449.
Ke, Y. and Sukthankar, R. (2004). PCA-SIFT: A more
distinctive representation for local image descriptors.
In IEEE Conference on Computer Vision and Pattern
Recognition, volume 2, pages 498–506.
Lai, K., Bo, L., Ren, X., and Fox, D. (2011a). A large-
scale hierarchical multi-view RGB-D object dataset.
In IEEE International Conference on Robotics and
Automation, pages 1817–1824.
Lai, K., Bo, L., Ren, X., and Fox, D. (2011b). A scalable
tree-based approach for joint object and pose recogni-
tion. In Twenty-Fifth Conference on Artificial Intelli-
gence (AAAI).
Mian, A., Bennamoun, M., and Owens, R. (2005). Au-
tomatic correspondence for 3D modeling: An exten-
sive review. International Journal of Shape Modeling,
11(2):253.
Mian, A., Bennamoun, M., and Owens, R. (2006). Three-
dimensional model-based object recognition and seg-
mentation in cluttered scenes. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
28(10):1584–1601.
Mian, A., Bennamoun, M., and Owens, R. (2010). On the
repeatability and quality of keypoints for local feature-
based 3D object retrieval from cluttered scenes. Inter-
national Journal of Computer Vision, 89(2):348–361.
Novatnack, J. and Nishino, K. (2008). Scale-dependent/
invariant local 3D shape descriptors for fully auto-
matic registration of multiple sets of range images. In
10th European Conference on Computer Vision, pages
440–453.
Rusu, R. and Cousins, S. (2011). 3D is here: Point Cloud Library (PCL). In 2011 IEEE International Conference
on Robotics and Automation, pages 1–4.
Salti, S., Tombari, F., and Stefano, L. (2011). A perfor-
mance evaluation of 3D keypoint detectors. In Inter-
national Conference on 3D Imaging, Modeling, Pro-
cessing, Visualization and Transmission, pages 236–
243.
Shilane, P., Min, P., Kazhdan, M., and Funkhouser, T.
(2004). The Princeton shape benchmark. In Inter-
national Conference on Shape Modeling Applications,
pages 167–178.
Sun, Y. and Abidi, M. (2001). Surface matching by 3D
point’s fingerprint. In 8th IEEE International Confer-
ence on Computer Vision, volume 2, pages 263–269.
Taati, B. and Greenspan, M. (2011). Local shape descriptor
selection for object recognition in range data. Com-
puter Vision and Image Understanding, 115(5):681–
694.
Tombari, F., Salti, S., and Di Stefano, L. (2010). Unique
signatures of histograms for local surface description.
In European Conference on Computer Vision, pages
356–369.
Williams, J. and Bennamoun, M. (2000). A multiple view
3D registration algorithm with statistical error model-
ing. IEICE Transactions on Information and Systems,
83(8):1662–1670.
Williams, J. and Bennamoun, M. (2001). Simultaneous reg-
istration of multiple corresponding point sets. Com-
puter Vision and Image Understanding, 81(1):117–
142.
TriSI:ADistinctiveLocalSurfaceDescriptorfor3DModelingandObjectRecognition
93