A NEW OBJECT RECOGNITION SYSTEM
Nikolai Sergeev and Guenther Palm
Institute of Neural Information Processing, Ulm University, Ulm, Germany
Keywords:
Object recognition system, Invariant object representation.
Abstract:
This paper presents a new 2D object recognition system. The object representation used by the system is invariant to rotation, translation, scaling and reflection. The system is highly robust to partial occlusion, deformation and perspective change; the latter makes it applicable to 3D tasks. Color information can either be ignored or combined with the form representation. The boundary of an object to be recognized does not need to be path-connected. The time needed to learn a new object does not depend on the number of objects already learned. No object segmentation prior to recognition is needed. To evaluate the system the 3D object library COIL-100 was used.
1 INTRODUCTION
1.1 System Architecture
An object recognition system usually consists of three parts. Part one extracts image primitives, e.g. edges (Canny, 1986), lines (Hough, 1962), orientation histograms (Dalal and Triggs, 2005) or moments (Hu, 1962; Reiss, 1993). Part two constructs feature vectors. Finally, part three is responsible for learning and retrieval of the information, e.g. support vector machines (Vapnik, 1998), artificial neural networks (Rosenblatt, 1962; Bishop, 2007) or regression estimators (Györfi et al., 2002). The system introduced in this paper has the same architecture. Its image primitives are half ellipses. One feature vector encodes a combination of half ellipses. To learn and compare the feature vectors, a new storage structure as well as a new retrieval algorithm were developed.
1.2 Motivation
Commonly used affine-invariant object representation methods are either not suitable for objects that are not path-connected, such as Fourier descriptors (Arbter et al., 1990), or need segmentation prior to recognition, such as moments (Reiss, 1993). One common problem of these approaches is discrimination: invariant features deliver no unique description of an object, so two objects with similar features may have nothing in common for a human observer. The representation introduced in this paper overcomes these problems. However, it is not suitable for standard machine learning algorithms such as support vector machines or neural networks. An image to analyze does not produce just a single representation vector but, e.g., more than $50^{10}$ feature vectors. This makes standard retrieval algorithms unusable. For that reason a new type of storage together with a new search algorithm was developed.

In sum, all three characteristic components of this object recognition system (features, representation, machine learning algorithm) are new.
2 OUTLINE OF
IMPLEMENTATION
An object is represented as a set $A$ of half ellipse combinations as shown in Figure 1. The combinations do not need to be of equal length.

For each $a \in A$ the system looks for a corresponding half ellipse combination $b$ in the image to analyze; $b$ should be as long as possible.
More precisely: from the image to analyze the system extracts a set of half ellipses $B$ as shown in Figure 2. For each combination $(a_i)_{i\in\{1,\ldots,n\}} \in A$ a maximal $m \in \{1,\ldots,n\}$ has to be determined for which a subsequence $\pi \in \{1,\ldots,n\}^{\{1,\ldots,m\}}$ with $\pi(1)=1$ and a tuple $(b_i)_{i\in\{1,\ldots,m\}} \in B^m$ exist such that $(a_{\pi(i)})_{i\in\{1,\ldots,m\}}$ can be approximately transformed into $(b_i)_{i\in\{1,\ldots,m\}}$ through translation, rotation, scaling, reflection and perspective change, as shown in Figure 3.
Figure 1: An object to learn and its representation.
Figure 2: An object to analyze with a set of extracted half
ellipses.
Figure 3: A way to transform one combination into another.
All in all, the total number of combination pairs to be compared is
$$\sum_{(a_1,\ldots,a_n)\in A}\;\sum_{m=1}^{n}\binom{n-1}{m-1}\,|B|^{m}. \qquad (1)$$
With $A$ containing only one combination of length $n = 10$ and $B$ consisting of 50 half ellipses, the number of pairs is at least $50^{10}$.
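As a rough plausibility check, the following Python sketch (our own illustration; the system itself is implemented in Java) evaluates the sum in equation (1) for a given list of learned combination lengths and a given number of extracted half ellipses.

from math import comb

def pair_count(combination_lengths, num_extracted):
    # Equation (1): for every learned combination of length n and every
    # subsequence length m there are C(n-1, m-1) * |B|^m candidate pairs.
    return sum(comb(n - 1, m - 1) * num_extracted ** m
               for n in combination_lengths
               for m in range(1, n + 1))

print(pair_count([10], 50))  # one combination of length 10, |B| = 50
print(50 ** 10)              # the lower bound quoted above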
As the system performs this check for every subsequence $(a_{\pi(i)})_{i\in\{1,\ldots,m\}}$, it is robust to partial occlusion.
Without any further extension this representation
is color invariant.
3 INVARIANT
REPRESENTATION OF A
COMBINATION OF HALF
ELLIPSES
3.1 Overview
In this section a number of functions $F^i$ are introduced. They are needed to obtain an invariant representation of a half ellipse combination. To understand their geometrical meaning it is not necessary to read their mathematical description; the corresponding figures are sufficient. The most important one is Figure 7. It shows the rotation, translation and scaling invariant representation of a combination of half ellipses.
3.2 Half Ellipses
Let $C = \mathbb{R}^2$ and let $P(C)$ denote the power set of $C$. A half ellipse is defined as a pair $(e, B) \in C^2 \times P(C)$ with $e_1 \neq e_2$ for which $(a, b, t_0, \delta) \in [0,\infty) \times [0,\infty) \times \mathbb{R} \times \{-1, 1\}$ as well as $(c, \beta) \in C \times \mathbb{R}$ exist so that
$$B = T_c R_\beta \left\{ \begin{pmatrix} a\cos t \\ b\sin t \end{pmatrix} \,\middle|\, t \in [t_0,\, t_0 + \delta\pi] \cup [t_0 + \delta\pi,\, t_0] \right\} \qquad (2)$$
and
$$e_1 = T_c R_\beta \begin{pmatrix} a\cos t_0 \\ b\sin t_0 \end{pmatrix}, \qquad (3)$$
$$e_2 = T_c R_\beta \begin{pmatrix} a\cos(t_0 + \pi) \\ b\sin(t_0 + \pi) \end{pmatrix}. \qquad (4)$$
Here $T_c$ and $R_\beta$ stand for the translation by $c$ and the rotation by $\beta$, respectively. The set of half ellipses will be denoted by $HE$. In other words, a half ellipse consists of endpoints $e_1, e_2 \in C$ and of a set of bow points $B \in P(C)$. The endpoints of a half ellipse play a very important role for the invariant representation.

Figure 4: Examples of half ellipses.

There are mainly two reasons to use half ellipses. First, an affine transformation $A \neq 0$ always maps a half ellipse onto another half ellipse. Second, half ellipses come in a great variety of shapes, as Figure 4 shows.
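A minimal Python sketch of this parametrization (the function name and the sampling density are our own, hypothetical choices) generates the bow points $B$ and the endpoints $e_1$, $e_2$ for given $a$, $b$, $t_0$, $\delta$, $c$, $\beta$.

import numpy as np

def half_ellipse(a, b, t0, delta, c, beta, samples=64):
    # Sample a half ellipse according to equations (2)-(4).
    # Returns (e1, e2, B): the two endpoints and an array of bow points.
    rot = np.array([[np.cos(beta), -np.sin(beta)],
                    [np.sin(beta),  np.cos(beta)]])
    def point(t):
        return rot @ np.array([a * np.cos(t), b * np.sin(t)]) + c
    ts = np.linspace(t0, t0 + delta * np.pi, samples)
    B = np.array([point(t) for t in ts])
    return point(t0), point(t0 + np.pi), B

e1, e2, B = half_ellipse(a=2.0, b=1.0, t0=0.3, delta=1,
                         c=np.array([5.0, 4.0]), beta=0.7)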
3.3 Rotation, Translation and Scaling
Invariant Representation of a Half
Ellipse
Now a unique rotation, translation and scaling invari-
ant representation of a half ellipse will be introduced.
At first two preliminary definitions are needed. For $x, y \in C$ with $x \neq y$ the function $F^1_{x,y}$ is defined as
$$F^1_{x,y}:\; C \to C, \quad z \mapsto \frac{z - x}{y - x}, \qquad (5)$$
where $C$ is identified with the complex plane so that the quotient is a complex division.
Figure 5: Geometric meaning of $F^1_{x,y}(z)$.

Figure 5 shows the geometric meaning of the transformation. $F^1_{x,y}$ is an affine transformation with $F^1_{x,y}(x) = 0$ and $F^1_{x,y}(y) = 1$. The second function $F^2: HE \to C$ is defined as
$$F^2(e, B) = \begin{pmatrix} \max_{x\in B}\big[F^1_{\frac{e_1+e_2}{2},\,e_1}(x)\big]_1 \\[4pt] \max_{x\in B}\big[F^1_{\frac{e_1+e_2}{2},\,e_1}(x)\big]_2 \end{pmatrix}. \qquad (6)$$
Finally the invariant representation $F^3: HE \to C$ is defined in such a way that, for the always existent $x, y \in B$ with
$$F^2(e, B) = \begin{pmatrix} \big[F^1_{\frac{e_1+e_2}{2},\,e_1}(x)\big]_1 \\[4pt] \big[F^1_{\frac{e_1+e_2}{2},\,e_1}(y)\big]_2 \end{pmatrix}, \qquad (7)$$
$$F^3(e, B) = \begin{pmatrix} z - \mathrm{SIGNUM}(z) \\[4pt] \big[F^1_{\frac{e_1+e_2}{2},\,e_1}(y)\big]_2 \end{pmatrix} \qquad (8)$$
with
$$z = \big[F^1_{\frac{e_1+e_2}{2},\,e_1}(x)\big]_1. \qquad (9)$$
For $(e, B)$ in Figure 6,
$$F^3(e, B) = \begin{pmatrix} M_1 - 1 \\ M_2 \end{pmatrix}. \qquad (10)$$

Figure 6: Representation of a half ellipse.

It can be shown that for each $x \in C$ a half ellipse $(e, B) \in HE$ exists with $F^3(e, B) = x$. Additionally, for two half ellipses $(e, B), (\tilde e, \tilde B) \in HE$ with $F^3(e, B) = F^3(\tilde e, \tilde B)$ it can be shown that they can be transformed into each other through translation, rotation and scaling. Conversely, $F^3(e, B)$ is invariant to translation, rotation and scaling of $(e, B)$.
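The following Python sketch is our own illustrative reading of equations (5)-(9), with $C$ identified with the complex plane; a half ellipse is given by its two complex endpoints and a set of complex bow points.

import numpy as np

def F1(x, y, z):
    # Affine map with F1(x) = 0 and F1(y) = 1, equation (5).
    return (z - x) / (y - x)

def F3(e1, e2, bow):
    # e1, e2: complex endpoints; bow: iterable of complex bow points.
    mid = (e1 + e2) / 2
    w = [F1(mid, e1, p) for p in bow]        # normalized bow points
    m1 = max(p.real for p in w)              # first component of F2, eq. (6)
    m2 = max(p.imag for p in w)              # second component of F2
    return np.array([m1 - np.sign(m1), m2])  # equation (8) with z = m1

# Example: a unit half circle with endpoints on the real axis.
ts = np.linspace(0.0, np.pi, 100)
bow = np.exp(1j * ts)                        # bow above the real axis
print(F3(1 + 0j, -1 + 0j, bow))              # approximately [0, 1]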
3.4 Extraction of Half Ellipses
In the literature one can find several methods to extract ellipses. Most of them are based on the Hough transform, e.g. (Tsuji and Matsumoto, 1978). However, they are not suitable for half ellipse detection as they do not deliver endpoints.

The endpoints of a half ellipse to be extracted do not need to be labeled or explicitly visible, e.g. as corners. For that reason the system extracts several half circles from a circle. The exact number depends on the size of the circle; a bigger one can deliver over 100 half circles.
The invariant representation introduced above offers a convenient way to extract a half ellipse. As Figure 6 shows, the system determines two extremes for a chain of edge points. In the next step it calculates the unique half ellipse that would have the same extremes and endpoints. Then it checks whether all the edge points of the chain lie in an $\varepsilon$-neighborhood of the calculated half ellipse.
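A naive sketch of this final acceptance test, under the assumption that the candidate half ellipse is available as a densely sampled point set (the computation of the unique half ellipse from extremes and endpoints is not shown):

import numpy as np

def chain_fits_half_ellipse(chain, candidate, eps):
    # chain, candidate: arrays of shape (N, 2) and (M, 2).
    # Accept the chain if every edge point lies within an
    # eps-neighborhood of the sampled candidate half ellipse.
    for p in chain:
        if np.min(np.linalg.norm(candidate - p, axis=1)) > eps:
            return False
    return True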
3.5 Rotation, Translation and Scaling
Invariant Representation of a
Combination of Half Ellipses
The set of combinations $C = \bigcup_{n\in\mathbb{N}} HE^n$ consists of ordered sequences of half ellipses. The rotation, translation and scaling invariant representation $F^4: C \to \bigcup_{n\in\mathbb{N}} \mathbb{R}^{6n}$ is defined as
$$F^4\big((e^i, B^i)_i\big) = \Big(F^1_{e^1_1,\,e^1_2}(e^i_1),\; F^1_{e^1_1,\,e^1_2}(e^i_2),\; F^3(e^i, B^i)\Big)_i \qquad (11)$$
with $i \in \{1,\ldots,n\}$.

Figure 7: Representation of the endpoints of a combination.

The values of the representation of the endpoints of the combination shown in the left part of Figure 7 can be read off directly from the right part of the figure.
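Building on the F1 and F3 functions from the sketch in Section 3.3, a hypothetical implementation of $F^4$ could look as follows; each half ellipse is passed as a triple (e1, e2, bow) of complex endpoints and bow points, and the frame is fixed by the endpoints of the first half ellipse as in equation (11).

import numpy as np

def F4(combination):
    # combination: list of (e1, e2, bow) triples; returns a vector in R^{6n}.
    ref1, ref2, _ = combination[0]
    features = []
    for e1, e2, bow in combination:
        p1 = F1(ref1, ref2, e1)            # normalized first endpoint
        p2 = F1(ref1, ref2, e2)            # normalized second endpoint
        features.extend([p1.real, p1.imag, p2.real, p2.imag])
        features.extend(F3(e1, e2, bow))   # shape code of the half ellipse
    return np.array(features)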
3.6 Reflection Invariance
The rotation, translation and scaling invariant representation introduced above will now be extended to a reflection invariant one. The purpose is to find a representation which does not change with the axis of reflection. For $n \in \mathbb{N}$ and the diagonal matrix $M_n \in \mathbb{R}^{6n\times 6n}$ defined as
$$M_n = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & \cdots & 0 & 0 \\ & & \ddots & & \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & -1 \end{pmatrix}, \qquad (12)$$
the rotation, translation, scaling and reflection invariant representation $F^5$ is defined as
$$F^5:\; C \to \bigcup_{n\in\mathbb{N}} P(\mathbb{R}^{6n}), \quad c \mapsto \{F^4(c),\; M_n(F^4(c))\}. \qquad (13)$$
In other words, the representation consists of two feature vectors. On the one hand this representation does not change under rotation, translation, scaling and reflection. On the other hand, two combinations with identical representations can be transformed into each other through rotation, translation, scaling and reflection.
An example makes plausible why the additional
vector is invariant to the axis of reflection. Figure
8 shows one combination reflected horizontally and
vertically. Figure 9 shows the identical code of the
endpoints of both reflected combinations.
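A short sketch of equations (12) and (13), assuming the diagonal of $M_n$ alternates between $1$ and $-1$ so that every second coordinate changes sign:

import numpy as np

def F5(f4_vector):
    # f4_vector: F4(c) as an array of length 6n.
    signs = np.tile([1.0, -1.0], len(f4_vector) // 2)  # diagonal of M_n
    return {tuple(f4_vector), tuple(signs * f4_vector)}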
4 ROBUSTNESS TO
PERSPECTIVE CHANGE
Figure 10 offers a sketch of the rather technical formulation and implementation of the viewpoint tolerance of the system.

Figure 8: A combination reflected horizontally and vertically.

Figure 9: Representation of the endpoints of the reflected combinations.

Figure 10: Formulation and implementation of perspective robustness.

As shown in the left part of Figure 10, the task is to recognize an object within the frame when a camera is placed at some point of the sphere above the dark line and its projection surface is parallel to the tangential plane at that point. To model the camera, perspective projection was used as described, e.g., in (Jaehne, 2005).
To solve this task the system builds a coverage of the sphere above the dark line that is minimal in a certain sense, as shown in the right part of the figure. For each point of the coverage the system applies a perspective transformation to the original rotation, translation, scaling and reflection invariant representation and learns the result. Albeit storage intensive, this solution is simple and mathematically precise.
5 THE TASK OF THE
RETRIEVAL ALGORITHM
The storage of the system saves a set $A \subset \bigcup_{n\in\mathbb{N}} \mathbb{R}^{6n}$ of feature vectors representing combinations of half ellipses, not the original combinations.

From an image to analyze the system extracts a set of half ellipses $B \subset HE$. For each $a = (a_i)_{i\in\{1,\ldots,n\}} \in A \cap \mathbb{R}^{6n} = A \cap \prod_{i\in\{1,\ldots,n\}} \mathbb{R}^6$ the retrieval algorithm determines the maximal $m \in \{1,\ldots,n\}$ for which a subsequence $\pi \in \{1,\ldots,n\}^{\{1,\ldots,m\}}$ with $\pi(1) = 1$ and a tuple $b \in B^m$ exist with
$$\forall\, i \in \{1,\ldots,m\}:\quad \big\| a_{\pi(i)} - c_i \big\|_{\max} \le \varepsilon \qquad (14)$$
where $F^4(b) = c \in \mathbb{R}^{6m}$ and $\varepsilon > 0$. In other words, two feature vectors are compared with respect to the maximum norm.
To find such a maximal $m \in \{1,\ldots,n\}$ the system tries all $m$, $\pi$, $b$. The new type of machine learning algorithm makes it possible to compare each $(a_{\pi(i)})_i$ with each $c = F^4(b)$. Thus the task is to find the highest $m$ with a successful comparison.

$\varepsilon > 0$ can be chosen only once, prior to the initialization of the system.

By comparing two feature vectors with respect to the maximum norm, the system tolerates deformations of an object within $\varepsilon$.
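A simplified, brute-force Python sketch of this search (our own illustration, not the authors' Java implementation; reflection and perspective variants are ignored here). It relies on the observation that, by equation (11), the code $c_i$ of an image half ellipse depends only on itself and on the first half ellipse $b_1$ that fixes the frame, so after choosing $b_1$ the remaining learned blocks can be matched independently. F1 and F3 refer to the sketch in Section 3.3.

import numpy as np

def block_code(frame_e1, frame_e2, he):
    # 6-vector of one half ellipse in the frame of the first one (eq. (11)).
    e1, e2, bow = he
    p1 = F1(frame_e1, frame_e2, e1)
    p2 = F1(frame_e1, frame_e2, e2)
    return np.array([p1.real, p1.imag, p2.real, p2.imag, *F3(e1, e2, bow)])

def matched_length(a, B, eps):
    # a: learned feature vector as an array of shape (n, 6);
    # B: list of half ellipses (e1, e2, bow) extracted from the image.
    best = 0
    for frame in B:                       # the choice b_1 = frame, pi(1) = 1
        e1, e2, _ = frame
        if np.max(np.abs(block_code(e1, e2, frame) - a[0])) > eps:
            continue                      # the first learned block must match
        codes = [block_code(e1, e2, he) for he in B]
        m = 1 + sum(any(np.max(np.abs(c - a_i)) <= eps for c in codes)
                    for a_i in a[1:])
        best = max(best, m)
    return best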
6 EXPERIMENTAL RESULTS
6.1 COIL-100
To evaluate the system the well-known database COIL-100 (Columbia Object Image Library) was used. It is available at http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php and is described in (Nene et al., 1996). It contains 7200 color images of 100 3D objects, shown in Figure 11. One image is taken per 5° of rotation.
6.2 Experiment Settings and Results
The computer used in the experiments has an Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40 GHz and 4.00 GB RAM. The system is implemented in Java.
Figure 11: COIL-100 objects.

Two experiments with slightly different parameter settings were carried out.

In the first experiment 18 views (one per 20°) were used to learn each object. The remaining 5400 images were analyzed. A recognition rate of 99.2% was reached. The time needed to learn all objects is 277 seconds. The average time needed to analyze one image is 980 milliseconds.

In the second experiment 8 views (one per 45°) were used to learn each object. The remaining 6400 images were analyzed. A recognition rate of 96.3% was reached. The system needs 142 seconds to learn all objects. The average time needed to analyze a single image is 1593 milliseconds.
6.3 Comparison to other Methods
Table 1 is based on the results described in (Yang et al., 2000) and (Caputo et al., 2000).
Table 1: Comparison with Alternative Results.
Method 18 views 8 views
LAFs 99.9% 99.4%
Half Ellipses 99.2% 96.3%
SNoW / edges 94.1% 89.2%
SNoW / intensity 92.3% 85.1%
Linear SVM 91.3% 84.8%
Spin-Glass MRF 96.8% 88.2%
Nearest Neighbor 87.5% 79.5%
6.4 Color Information
The pure form representation described above was extended with color information. A half ellipse has a first and a last point; hence it also has a right and a left side, as Figure 12 shows. After the extraction of a half ellipse the system determines the arithmetic RGB average along the right side of the half ellipse as well as along the left one. It thus obtains two RGB vectors $l, r \in \mathbb{R}^3$. The color code $c \in \mathbb{R}^6$ is simply the Cartesian product of these two vectors, $c = (l, r)$. A representation vector $a \in \mathbb{R}^{6n}$ of a half ellipse combination $b \in HE^n$ is extended to $\tilde a \in \mathbb{R}^{6n+6n}$ with a color code $(c_i)_{i\in\{1,\ldots,n\}} \in \prod_{i\in\{1,\ldots,n\}} \mathbb{R}^6$, one for each half ellipse of the combination.
Figure 12: The left and the right side of a half ellipse.

An additional threshold value $\tilde\varepsilon > 0$ is used to compare the color information of two representation vectors with respect to the maximum norm. An upcoming article describes the extraction of half ellipses and their color in detail.
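A small sketch of this color extension (hypothetical helper names; the pixels along the two sides are assumed to be given): the color code of one half ellipse is the pair of mean RGB values along its sides, and the form vector is extended by one such code per half ellipse.

import numpy as np

def color_code(left_pixels, right_pixels):
    # left_pixels, right_pixels: arrays of shape (k, 3) with RGB values.
    l = left_pixels.mean(axis=0)
    r = right_pixels.mean(axis=0)
    return np.concatenate([l, r])             # c = (l, r) in R^6

def extend_with_color(a, codes):
    # a: form vector in R^{6n}; codes: list of n color codes in R^6.
    return np.concatenate([a, np.concatenate(codes)])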
6.5 Learning and Recognition Scheme
used for COIL-100
As mentioned above, the system uses e.g. 8 images to learn an object. For one image it constructs e.g. 10 combinations of half ellipses. Each combination is represented by e.g. 6 feature vectors. Each vector is labeled with the number $N \in \{1,\ldots,100\}$ of the object it refers to.

When analyzing an image, the system first determines the maximal length $m \in \mathbb{N}$ of the matched subsequences for each learned feature vector. Let the set of these lengths be denoted by $M$. For $\tilde m = \max M$ the system selects all feature vectors for which subsequences of length $\tilde m$ were matched. The object with the greatest number of such feature vectors is returned as the recognized one. If there are several such objects, the system chooses one of them at random.
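A sketch of this voting scheme, assuming the matched_length function from the sketch in Section 5 and one object label per learned feature vector:

from collections import Counter

def recognize(learned, B, eps):
    # learned: list of (feature_vector, object_label) pairs;
    # B: half ellipses extracted from the image to analyze.
    lengths = [(matched_length(a, B, eps), label) for a, label in learned]
    m_max = max(m for m, _ in lengths)
    votes = Counter(label for m, label in lengths if m == m_max)
    return votes.most_common(1)[0][0]   # ties are broken arbitrarily here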
7 SUMMARY AND FUTURE
WORK
The object recognition system presented in this paper combines several important characteristics. It is capable of handling 3D objects. The half ellipse extraction is at least stable enough to handle COIL-100 images.

The simple color representation used so far has yet to become illumination invariant. The optimization of the running time does not appear to be a great problem, as the central retrieval algorithm is highly parallelizable. The greatest challenge seems to be the reduction of the storage consumption without loss of perspective robustness.
At present the authors are developing a flow estimator based on the comparison of half ellipse combinations. The flow estimator learns thousands of half ellipse combinations on the first frame and tries to match them on the second one. So in the near future the system could gain a universal character, being simultaneously capable of object recognition and flow estimation.
REFERENCES
Arbter, K., Snyder, W. E., Burkhardt, H., and Hirzinger, G. (1990). Application of affine-invariant Fourier descriptors to recognition of 3-D objects. In IEEE Transactions on Pattern Analysis and Machine Intelligence.

Bishop, C. (2007). Neural Networks for Pattern Recognition. Oxford University Press.

Canny, J. (1986). A computational approach to edge detection. In IEEE Transactions on Pattern Analysis and Machine Intelligence.

Caputo, B., Hornegger, J., Paulus, D., and Niemann, H. (2000). A spin-glass Markov random field for 3D object recognition. In NIPS 2000.

Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, San Diego.

Györfi, L., Kohler, M., Krzyzak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer.

Hough, P. V. C. (1962). Method and Means for Recognizing Complex Patterns. US Patent 3069654.

Hu, M. K. (1962). Visual pattern recognition by moment invariants. In IRE Transactions on Information Theory.

Jaehne, B. (2005). Digital Image Processing. Springer-Verlag, Berlin.

Nene, S. A., Nayar, S. K., and Murase, H. (1996). Columbia Object Image Library (COIL-100).

Reiss, T. H. (1993). Recognizing Planar Objects Using Invariant Image Features. Springer-Verlag, Berlin Heidelberg.

Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan, New York.

Tsuji, S. and Matsumoto, F. (1978). Detection of ellipses by a modified Hough transform. In IEEE Transactions on Computers.

Vapnik, V. N. (1998). Statistical Learning Theory. Wiley, New York.

Yang, M. H., Roth, D., and Ahuja, N. (2000). Learning to recognize 3D objects with SNoW. In ECCV 2000.