IMAGE MATCHING BY RANSAC USING MULTIPLE NON-UNIFORM DISTRIBUTIONS COMPUTED FROM IMAGES

Yasushi Kanazawa and Yoshihiro Ito

Department of Knowledge-based Information Engineering, Toyohashi University of Technology

1–1 Hibarigaoka, Tempaku, Toyohashi, Aichi 441-8580 JAPAN

Keywords:

image matching, RANSAC, multiple homographies, planar region detection, uncalibrated stereo.

Abstract:

We propose an accurate method for establishing point correspondences between two images taken by an uncalibrated stereo. We consider a scene containing multiple planes and detect the homographies of the planes with a RANSAC-like algorithm. For the random sampling in RANSAC, we define three nonuniform sampling weights that are computed from the feature points in the images. By introducing these weights, our method detects more accurate matches than the usual methods. Furthermore, our method establishes correspondences stably irrespective of whether the scene is far away or not. We demonstrate the effectiveness of our method with real image examples.

1 INTRODUCTION

Establishing point correspondences over multiple images is the first step of many computer vision applications. Therefore, various matching methods have been proposed (Kanazawa and Kanatani, 2004b; Maciel and Costeira, 2002; Olson, 2002; Zhang et al., 1995).

RANSAC (Fischler and Bolles, 1981) and LMedS (Rousseeuw and Leroy, 1987) are very powerful methods for estimating parameters over images, and they are robust to outliers in the data. Many methods for establishing point correspondences have therefore been built on them (Torr and Davidson, 2003; Torr and Zisserman, 1998; Torr and Zisserman, 2000). These procedures usually sample the data with a uniform distribution, which is reasonable when we want to estimate global parameters over the images. For example, RANSAC and LMedS work very well for estimating the homography used to make a panoramic image from two images, and they also work well for estimating the fundamental matrix of an image pair, because these matrices are global parameters between the two images.

When there are multiple planes in a scene, we can compute the fundamental matrix from the homographies of the planes (Hartley and Zisserman, 2000). Such a fundamental matrix is more accurate than one computed from point matches and can be decomposed into camera parameters stably (Kanazawa et al., 2004). However, if we want to estimate the homographies of small planes in the scene, RANSAC and LMedS with a uniform distribution do not work well, because the probability that four matches chosen by a uniform distribution lie on the same plane is very small. We then need many iterations to estimate such homographies. In addition, we often obtain homographies for non-existing planes. In such cases, we need some knowledge about the existing planes (Dick et al., 2000), a criterion for judging whether a region is planar or not (Kanazawa et al., 2004), or the detection of special features for planar regions (Matas et al., 2002).

In this paper, we propose an accurate method for establishing point correspondences based on detecting the homographies of multiple planes in a scene with a RANSAC-like algorithm. Instead of using a uniform distribution for random sampling, we introduce three nonuniform sampling weights: concentrate likelihoods, coplanarity likelihoods, and corresponding likelihoods. These likelihood distributions are defined from the locations of the feature points and the residuals of template matching. By introducing these likelihoods, our method detects more accurate matches than other methods. Furthermore, our method establishes correspondences stably irrespective of whether the scene is far away or not. We demonstrate the effectiveness of our method with real image examples.


Kanazawa Y. and Ito Y. (2006).

IMAGE MATCHING BY RANSAC USING MULTIPLE NON-UNIFORM DISTRIBUTIONS COMPUTED FROM IMAGES.

In Proceedings of the First International Conference on Computer Vision Theory and Applications, pages 377-382

DOI: 10.5220/0001372103770382

Copyright © SciTePress

Figure 1: The camera model and the coordinate systems.

Figure 2: Coplanarity likelihood. (Labels: image plane, optical center, 3-D surface.)

Figure 3: Planar point likelihood. (Axes: index $\beta$ against the cumulative likelihood $q(P_\beta|P_\alpha)$, with the level $\rho$ and the line $y = a_\alpha x$.)

2 COMPATIBILITY OF FUNDAMENTAL MATRIX AND HOMOGRAPHIES

We assume that the camera model is the pinhole model. We take the first camera as the reference coordinate system and place the second camera in a position obtained by translating the first camera by the vector $\mathbf{t}$ and rotating it around the center of the lens by the matrix $\mathbf{R}$. The two cameras may have different focal lengths $f$ and $f'$.

Let $(x, y)$ be the image coordinates of a feature point $P$ projected onto the image plane of the first camera, and let $(x', y')$ be those for the second camera. We use the following three-dimensional vectors to represent them (the superscript $\top$ denotes transpose):

$$\mathbf{x} = (x/f_0,\ y/f_0,\ 1)^\top, \qquad \mathbf{x}' = (x'/f_0,\ y'/f_0,\ 1)^\top. \tag{1}$$

Here, $f_0$ is a scale factor for stabilizing computation.

We consider the vectors $\mathbf{x}_i$ and $\mathbf{x}'_i$ for feature points $P_i$, $i = 1, ..., N$. As shown in Fig. 1, the vectors $\mathbf{x}_i$ and $\mathbf{x}'_i$ must satisfy the following epipolar equation (Hartley and Zisserman, 2000; Kanatani, 1996):

$$(\mathbf{x}_i, \mathbf{F}\mathbf{x}'_i) = 0. \tag{2}$$

Here, $(\mathbf{a}, \mathbf{b})$ denotes the inner product of vectors $\mathbf{a}$ and $\mathbf{b}$. The matrix $\mathbf{F}$, which is called the fundamental matrix, is a singular matrix of rank 2.
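As a small illustration of Eqs. (1) and (2), the following sketch builds the scaled vectors and evaluates the left-hand side of the epipolar equation for one match. The function names, the value of $f_0$, and the example matrix are assumptions made for this illustration only: the matrix corresponds to a hypothetical pair of identical cameras related by a pure sideways translation, for which matching points share the same image row.

```python
from typing import List, Sequence


def to_vector(x: float, y: float, f0: float) -> List[float]:
    """Eq. (1): scale the image coordinates by f0 and append 1."""
    return [x / f0, y / f0, 1.0]


def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product (a, b)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def epipolar_residual(x, F, x_prime) -> float:
    """Left-hand side of Eq. (2), (x, F x'); it vanishes for a perfect match."""
    return inner(x, mat_vec(F, x_prime))


# Hypothetical setup: F is the skew-symmetric matrix of t = (1, 0, 0).
F_SIDEWAYS = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]
F0 = 600.0  # assumed scale factor, of the order of the image size in pixels

# A match on the same image row satisfies Eq. (2); a match on another row does not.
good = epipolar_residual(to_vector(120.0, 80.0, F0), F_SIDEWAYS,
                         to_vector(95.0, 80.0, F0))
bad = epipolar_residual(to_vector(120.0, 80.0, F0), F_SIDEWAYS,
                        to_vector(95.0, 110.0, F0))
print(good, bad)
```

In practice the residual is compared against a tolerance rather than tested for exact zero, because detected feature positions are noisy.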

When all the points $P_i$ lie on a plane $\Pi$, the vectors $\mathbf{x}_i$ and $\mathbf{x}'_i$ are related in the following form (Hartley and Zisserman, 2000; Kanatani, 1996):

$$\mathbf{x}'_i = Z[\mathbf{H}\mathbf{x}_i]. \tag{3}$$

Here, $Z[\,\cdot\,]$ designates a scale normalization to make the third component 1. The matrix $\mathbf{H}$, which is called the homography, is a nonsingular matrix.
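The mapping of Eq. (3) can be sketched in a few lines. The helper names and the example matrix below are hypothetical and chosen only so that the arithmetic is easy to follow; the guard against a vanishing third component is an addition of this sketch.

```python
from typing import List, Sequence


def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def normalize_z(v: Sequence[float]) -> List[float]:
    """Z[.]: rescale a vector so that its third component becomes 1."""
    if abs(v[2]) < 1e-12:
        raise ValueError("the point is mapped to infinity")
    return [v[0] / v[2], v[1] / v[2], 1.0]


def transfer(H: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Eq. (3): x' = Z[H x]."""
    return normalize_z(mat_vec(H, x))


# Hypothetical homography: a small shift in x followed by a scaling of the
# third component by 2.
H_EXAMPLE = [[1.0, 0.0, 0.1],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 2.0]]

mapped = transfer(H_EXAMPLE, [0.2, 0.4, 1.0])
print(mapped)
```

Here $\mathbf{H}\mathbf{x} = (0.3, 0.4, 2)^\top$, so the normalization halves the first two components.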

When the feature points $P_j$, $j = 1, ..., M$, lie on a plane in the scene, Eqs. (2) and (3) are satisfied simultaneously. In this case, the homography $\mathbf{H}$ is compatible with the fundamental matrix $\mathbf{F}$ (Hartley and Zisserman, 2000), and the matrix product $\mathbf{F}\mathbf{H}$ must be a skew-symmetric matrix:

$$\mathbf{F}\mathbf{H} + \mathbf{H}^\top\mathbf{F}^\top = \mathbf{O}. \tag{4}$$
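The compatibility condition (4) can be checked numerically. In the sketch below, all names and matrices are hypothetical: a synthetic $\mathbf{F}$ is built as the product of a skew-symmetric matrix and the inverse of a chosen $\mathbf{H}$, so that $\mathbf{F}\mathbf{H}$ is skew-symmetric by construction, and a second homography is shown to violate the condition. This construction serves only to produce test data and is not the estimation method used in this paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mat_mul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*m)]


def skew(a: Sequence[float]) -> Matrix:
    """Skew-symmetric matrix of the cross product with a."""
    return [[0.0, -a[2], a[1]],
            [a[2], 0.0, -a[0]],
            [-a[1], a[0], 0.0]]


def compatibility_residual(F, H) -> float:
    """Largest absolute entry of F H + H^T F^T, which Eq. (4) requires to be zero."""
    fh = mat_mul(F, H)
    fht = transpose(fh)
    return max(abs(fh[i][j] + fht[i][j]) for i in range(3) for j in range(3))


# Hypothetical plane homography and its inverse (diagonal, so easy to invert).
H_PLANE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
H_PLANE_INV = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]

# Synthetic F chosen so that F H_PLANE equals a skew-symmetric matrix.
F_SYNTH = mat_mul(skew([1.0, 2.0, 3.0]), H_PLANE_INV)

# A different homography that is not compatible with F_SYNTH.
H_OTHER = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]

res_plane = compatibility_residual(F_SYNTH, H_PLANE)
res_other = compatibility_residual(F_SYNTH, H_OTHER)
print(res_plane, res_other)
```

With estimated matrices the residual is never exactly zero, so a threshold relative to the norms of $\mathbf{F}$ and $\mathbf{H}$ would be needed.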

Using the compatibility (4), we can compute the fundamental matrix $\mathbf{F}$ from two or more homographies $\mathbf{H}_1, ..., \mathbf{H}_K$, $K \geq 2$. In addition, if we compute the homographies by an optimal method (Kanatani et al., 2000) and compute the fundamental matrix from the homographies, the obtained fundamental matrix is more accurate than that directly computed from point matches (Kanazawa et al., 2004).

3 WEIGHTS FOR RANDOM SAMPLING

To detect multiple planes from two images taken by an uncalibrated stereo, we must estimate the homographies of the planes from point matches between the two images. Generally, we can robustly estimate a global homography between two images by RANSAC (Fischler and Bolles, 1981) or LMedS (Rousseeuw and Leroy, 1987). However, if we want to estimate the homographies of local or small planes in the scene, RANSAC and LMedS do not work well, because with a uniform sampling distribution the probability that the four chosen matches lie on the same plane is very small. We therefore need many iterations, and even then we may often obtain homographies for non-existing planes. If, however, we have some knowledge about the planes in the scene and define the sampling weights from that knowledge, we can efficiently choose pairs on the same plane and estimate the homography for them. We therefore define three weights for random sampling, which we compute from the locations of the feature points and the residuals of template matching.
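To give a feeling for how quickly uniform sampling becomes expensive, the following sketch applies the standard RANSAC iteration estimate: if a fraction $r$ of the candidate matches lies on the plane of interest, a uniformly drawn sample of four matches lies entirely on that plane with probability of about $r^4$. The function name, the confidence level, and the example fractions are assumptions made for this illustration and are not values from our experiments.

```python
import math


def uniform_trials_needed(inlier_fraction: float, sample_size: int = 4,
                          confidence: float = 0.99) -> int:
    """Number of uniform samples needed so that, with the given confidence,
    at least one sample consists only of matches on the plane."""
    p_good_sample = inlier_fraction ** sample_size
    if p_good_sample <= 0.0:
        raise ValueError("inlier_fraction must be positive")
    if p_good_sample >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


# Hypothetical fractions of the candidate matches that lie on one plane.
for fraction in (0.5, 0.2, 0.05):
    print(fraction, uniform_trials_needed(fraction))
```

The required number of samples grows roughly like $1/r^4$, which is why a small plane is rarely hit by uniform sampling.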

3.1 Coplanarity Likelihoods

First, we define the coplanarity likelihood between two feature points in an image. The same likelihoods were proposed by Kanazawa and Kawakami (Kanazawa and Kawakami, 2004); in this paper, we add a physical interpretation to them.

Consider two points on a 3-D surface (Fig. 2). When the two points are close to each other, we can regard them as lying on the same plane whether the surface is exactly planar or not. On the other hand, when the distance between the two points is large, we regard them as not lying on the same plane if the surface is not planar. Thus, two points that are close to each other in the image have a high likelihood of being on the same plane. We therefore define the likelihood of coplanarity of two points by the distance between them.

VISAPP 2006 - MOTION, TRACKING AND STEREO VISION

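Let $\mathcal{I}_1$ and $\mathcal{I}_2$ be the sets of all the feature points in the images $I_1$ and $I_2$, respectively. For $P_\alpha, P_\beta \in \mathcal{I}_1$, let $d_{\alpha\beta}$ be the Euclidean distance between them. We define the conditional likelihood $p(P_\beta|P_\alpha)$ by the following equation:

$$p(P_\beta|P_\alpha) = \begin{cases} \dfrac{1}{Z_\alpha}\, e^{-s_\alpha d_{\alpha\beta}^2} & \cdots\ \alpha \neq \beta \\ 0 & \cdots\ \alpha = \beta \end{cases}, \tag{5}$$

where $Z_\alpha = \sum_{\beta \neq \alpha}^{N} e^{-s_\alpha d_{\alpha\beta}^2}$ and $N$ is the number of the feature points in the image $I_1$. We call these likelihoods the coplanarity likelihoods; they indicate how likely it is that $P_\alpha$ and $P_\beta$ are on the same plane. Here, the parameter $s_\alpha$ is determined by solving the following equation:

$$\sum_{\beta=1}^{N} (d_{\alpha\beta} - \bar{d}_\alpha)\, e^{-s_\alpha d_{\alpha\beta}^2} = 0, \qquad \bar{d}_\alpha = \frac{1}{N} \sum_{\beta=1}^{N} d_{\alpha\beta}. \tag{6}$$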
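The coplanarity likelihoods of Eq. (5) for one reference point can be sketched as follows. To keep the example short, the parameter $s_\alpha$ is passed in as a given value instead of being determined from Eq. (6); the function name, the point coordinates, and the value of $s_\alpha$ are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def coplanarity_likelihoods(points: Sequence[Tuple[float, float]],
                            alpha: int, s_alpha: float) -> List[float]:
    """Eq. (5): likelihoods p(P_beta | P_alpha) for all beta, with s_alpha given."""
    xa, ya = points[alpha]
    weights = []
    for beta, (x, y) in enumerate(points):
        if beta == alpha:
            weights.append(0.0)  # the case alpha = beta in Eq. (5)
        else:
            d_sq = (x - xa) ** 2 + (y - ya) ** 2
            weights.append(math.exp(-s_alpha * d_sq))
    z_alpha = sum(weights)  # normalization constant Z_alpha
    if z_alpha == 0.0:
        raise ValueError("all weights underflowed; s_alpha is too large")
    return [w / z_alpha for w in weights]


# Hypothetical feature point positions in pixels.
POINTS = [(10.0, 10.0), (14.0, 12.0), (60.0, 50.0), (12.0, 40.0)]
row = coplanarity_likelihoods(POINTS, alpha=0, s_alpha=0.002)
print(row)
```

The nearest point receives most of the probability mass, and the reference point itself receives none.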

3.2 Planar Point Likelihood

Next, we define a planar point likelihood of the feature point $P_\alpha$ using the coplanarity likelihood $p(P_\beta|P_\alpha)$.

For each $P_\alpha$, we define the following conditional cumulative likelihood

$$q(P_\beta|P_\alpha) = \sum_{\mu=1}^{\beta} p(P_\mu|P_\alpha), \tag{7}$$

where the $p(P_\mu|P_\alpha)$ are sorted in descending order with respect to $P_\mu$ for each $P_\alpha$.

In the space of the cumulative likelihood $q(P_\beta|P_\alpha)$, we consider a line $y = a_\alpha x$ passing through the origin and the point at which $q(P_\beta|P_\alpha) = \rho$ (Fig. 3). Using the coefficient $a_\alpha$ of the line, we define the planar point likelihood $\hat{p}(P_\alpha)$ for the point $P_\alpha$ by

$$\hat{p}(P_\alpha) = \frac{a_\alpha}{\sum_{\alpha \in \mathcal{I}_1} a_\alpha}. \tag{8}$$
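Eqs. (7) and (8) can be sketched as follows. Two choices in this sketch are assumptions: the point at which the cumulative likelihood equals $\rho$ is taken as the first index at which it reaches $\rho$, and the slope $a_\alpha$ is the cumulative value at that index divided by the index. The function names, the example rows, and the value of $\rho$ are hypothetical.

```python
from typing import List, Sequence


def slope_at_rho(p_row: Sequence[float], rho: float) -> float:
    """Slope a_alpha of the line through the origin and the first point at
    which the cumulative likelihood of Eq. (7) reaches rho."""
    ordered = sorted(p_row, reverse=True)  # descending order, as in Eq. (7)
    q = 0.0
    for index, p in enumerate(ordered, start=1):
        q += p
        if q >= rho:
            return q / index
    return q / len(ordered)  # rho was not reached because of rounding


def planar_point_likelihoods(p_rows: Sequence[Sequence[float]],
                             rho: float) -> List[float]:
    """Eq. (8): slopes normalized over all feature points."""
    slopes = [slope_at_rho(row, rho) for row in p_rows]
    total = sum(slopes)
    return [a / total for a in slopes]


# Hypothetical coplanarity likelihood rows: the first point has one very
# close neighbor, the second has equally distant neighbors.
P_ROWS = [
    [0.0, 0.9, 0.05, 0.05],
    [1.0 / 3.0, 0.0, 1.0 / 3.0, 1.0 / 3.0],
]
p_hat = planar_point_likelihoods(P_ROWS, rho=0.8)
print(p_hat)
```

A point whose cumulative likelihood rises steeply, that is, a point surrounded by close neighbors, obtains a large slope and is therefore chosen more often as a seed.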

3.3 Corresponding Likelihood

Finally, we define corresponding likelihoods by the residuals of template matching.

Correlations or residuals obtained by template matching are often used for establishing point correspondences between two images. We cannot trust them completely, because they depend on the positions and the orientations of the two cameras. However, correct pairs usually have high correlation values. We therefore define the likelihood of correspondence for each match by the residual of template matching.

Let $P_\beta$ be a feature point in $I_1$ and $Q_{\beta'}$ be a feature point in the other image $I_2$. Let $j_{\beta\beta'}$ be the residual of template matching between them. Using $j_{\beta\beta'}$, we define the conditional likelihoods $p'(Q_{\beta'}|P_\beta)$ as follows:

$$p'(Q_{\beta'}|P_\beta) = \frac{1}{Z_\beta}\, e^{-t_\beta j_{\beta\beta'}^2}, \qquad Z_\beta = \sum_{\beta'=1}^{M} e^{-t_\beta j_{\beta\beta'}^2}. \tag{9}$$

Here, $M$ is the number of the feature points in the image $I_2$. We call these likelihoods the correspondence likelihoods; they indicate how likely the pair $\{P_\beta, Q_{\beta'}\}$ is to be a correct match. The parameter $t_\beta$ is determined in the same way as for the coplanarity likelihoods:

$$\sum_{\beta'=1}^{M} (j_{\beta\beta'} - \bar{j}_\beta)\, e^{-t_\beta j_{\beta\beta'}^2} = 0, \qquad \bar{j}_\beta = \frac{1}{L} \sum_{\beta'=1}^{L} j_{\beta\beta'}. \tag{10}$$

Here, the residuals $j_{\beta\beta'}$ are sorted in ascending order for each $\beta$, and $L$ is the average index number of the correct matches ($1 \leq L \leq M$).

3.4 RANSAC with the Three

Nonuniform Likelihoods

Using these likelihoods as the weights for random sampling, we can efficiently choose candidate matches that are coplanar in the scene and have high correlation values. We can also avoid a combinatorial explosion in choosing the candidate matches.

In advance, we sort each set of likelihoods in descending order and compute the corresponding cumulative likelihoods. In each random sampling, we first generate one random number $x$ in the range $[0, 1)$ using a uniform distribution, then increase $\beta$ from 1 and find the $\beta$ that satisfies

$$q_{\beta-1} \leq x < q_\beta, \tag{11}$$

where $q_\beta$ is a cumulative likelihood and $q_0 = 0$.
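The sampling rule of Eq. (11) amounts to inverting a cumulative distribution. The sketch below uses a binary search in place of the linear scan over $\beta$; the result is the same. The function names and the example likelihoods are hypothetical.

```python
import bisect
import itertools
import random
from typing import List, Sequence, Tuple


def cumulative(likelihoods: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Sort the likelihoods in descending order and accumulate them.
    Returns the original indices in sorted order and q_1, ..., q_n."""
    ordered = sorted(enumerate(likelihoods), key=lambda kv: kv[1], reverse=True)
    indices = [i for i, _ in ordered]
    q = list(itertools.accumulate(p for _, p in ordered))
    return indices, q


def draw(indices: Sequence[int], q: Sequence[float], x: float) -> int:
    """Return the original index of the item beta with q_{beta-1} <= x < q_beta."""
    k = bisect.bisect_right(q, x)    # first position whose q exceeds x
    k = min(k, len(q) - 1)           # guard against rounding in the last q
    return indices[k]


# Hypothetical likelihoods of three candidates.
INDICES, Q = cumulative([0.1, 0.6, 0.3])

rng = random.Random(0)
sample = draw(INDICES, Q, rng.random())
print(INDICES, Q, sample)
```

Sorting in descending order does not change the sampling distribution; it only shortens the scan when the search over $\beta$ is done linearly.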

The procedure of our method is as follows:

1. Randomly choose a point $P_\alpha$ in $I_1$ using the planar point likelihood $\hat{p}(P_\alpha)$.

2. Choose 4 points $P_{\beta_1}$, $P_{\beta_2}$, $P_{\beta_3}$, and $P_{\beta_4}$ in $I_1$ using the coplanarity likelihood $p(P_\beta|P_\alpha)$ with respect to $P_\alpha$.

3. Choose 4 matches $\{P_{\beta_1}, Q_{\beta'_1}\}$, $\{P_{\beta_2}, Q_{\beta'_2}\}$, $\{P_{\beta_3}, Q_{\beta'_3}\}$, and $\{P_{\beta_4}, Q_{\beta'_4}\}$ using the corresponding likelihoods $p'(Q_{\beta'}|P_{\beta_1})$, $p'(Q_{\beta'}|P_{\beta_2})$, $p'(Q_{\beta'}|P_{\beta_3})$, and $p'(Q_{\beta'}|P_{\beta_4})$, respectively. Here, $Q_{\beta'_1}, Q_{\beta'_2}, Q_{\beta'_3}, Q_{\beta'_4} \in \mathcal{I}_2$.

4. Check whether the chosen 4 matches are skewed or not (Kanazawa and Kawakami, 2004). If the matches are skewed, go back to step 1.

5. Compute a homography $\mathbf{H}_\alpha$ from the chosen 4 matches.

6. Let $S_\alpha$ be the set of the matches $\{P_\gamma, Q_{\gamma'}\}$ that satisfy

   $$E(P_\gamma, Q_{\gamma'}, \mathbf{H}_\alpha) < d \quad \text{and} \quad p'(Q_{\gamma'}|P_\gamma) < t,$$

   where $P_\gamma \in \mathcal{I}_1$ and $Q_{\gamma'} \in \mathcal{I}_2$. Here, $t$ and $d$ are thresholds specified by the user. The function $E(P_\gamma, Q_{\gamma'}, \mathbf{H})$ is an error function (or residual) of the match $\{P_\gamma, Q_{\gamma'}\}$ and the homography $\mathbf{H}_\alpha$, which is obtained from Eq. (3). Then, let $M_\alpha$ be the number of the elements of $S_\alpha$.

7. Repeat the above computation until $M_\alpha$ reaches its maximum.

8. Finally, enforce uniqueness with respect to $E(P_\gamma, Q_{\gamma'}, \mathbf{H}_\alpha)$ on the resulting $S_\alpha$ and recompute the homography $\mathbf{H}_\alpha$ from them.

Figure 4: (a) A stereo image pair and detected feature points. (b) Correspondences and planar regions obtained by the proposed method. (c) 3-D shape (top view) from (b). (d) Correspondences obtained by the method of Kanazawa and Kanatani (Kanazawa and Kanatani, 2004b). (e) 3-D shape (top view) from (d). (f) Correspondences obtained by the standard RANSAC. (g) 3-D shape (top view) from (f).

By repeating the above procedure, we can obtain one

or more homographies of the planes in the scene.
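The overall loop can be sketched in a simplified form as follows. This is an illustration under several assumptions and not our implementation: the candidate matches are taken as already fixed pairs, so step 3 is omitted; the skewness test of step 4 is replaced by rejecting samples whose linear system is singular; the consensus set uses only the transfer error threshold $d$; the homography is computed with its last element fixed to 1; and the loop runs for a fixed number of trials. All names and the synthetic data are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-9:
            raise ValueError("degenerate sample")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(matches):
    """Homography (last element fixed to 1) mapping each (x, y) to (x', y')."""
    a, b = [], []
    for (x, y), (xp, yp) in matches:
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -xp * x, -xp * y])
        b.append(xp)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -yp * x, -yp * y])
        b.append(yp)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def transfer_error(H, match):
    """Distance between the mapped first point and the second point of a match."""
    (x, y), (xp, yp) = match
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return float("inf")
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return math.hypot(u - xp, v - yp)


def draw_distinct(weights, k, rng):
    """Draw k distinct indices with probability proportional to the weights."""
    weights, chosen = list(weights), []
    for _ in range(k):
        if sum(weights) <= 0.0:
            raise ValueError("not enough candidates")
        i = rng.choices(range(len(weights)), weights=weights)[0]
        chosen.append(i)
        weights[i] = 0.0
    return chosen


def weighted_ransac(matches, seed_weights, neighbor_weights, d, trials, rng):
    best_h, best_set = None, []
    for _ in range(trials):
        alpha = draw_distinct(seed_weights, 1, rng)[0]            # step 1
        try:
            sample = draw_distinct(neighbor_weights[alpha], 4, rng)  # step 2
            h = homography_from_4([matches[i] for i in sample])      # step 5
        except ValueError:
            continue                                              # stand-in for step 4
        consensus = [i for i, mt in enumerate(matches)
                     if transfer_error(h, mt) < d]                # step 6
        if len(consensus) > len(best_set):                        # step 7
            best_h, best_set = h, consensus
    return best_h, best_set


# Synthetic data: 8 matches on one plane (a pure shift) and 5 wrong matches.
PLANE = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0),
         (20.0, 5.0), (5.0, 20.0), (15.0, 18.0), (22.0, 21.0)]
MATCHES = [((x, y), (x + 5.0, y + 3.0)) for x, y in PLANE]
MATCHES += [((200.0, 200.0), (10.0, 300.0)), ((212.0, 205.0), (400.0, 20.0)),
            ((221.0, 190.0), (90.0, 90.0)), ((205.0, 223.0), (333.0, 150.0)),
            ((230.0, 215.0), (1.0, 2.0))]
N = len(MATCHES)
SEED_W = [1.0] * N  # uniform seeds here; Eq. (8) would be used instead
NEIGHBOR_W = [[0.0 if a == b else math.exp(-0.01 * (
    (MATCHES[a][0][0] - MATCHES[b][0][0]) ** 2 +
    (MATCHES[a][0][1] - MATCHES[b][0][1]) ** 2)) for b in range(N)]
    for a in range(N)]

best_h, best_set = weighted_ransac(MATCHES, SEED_W, NEIGHBOR_W, d=1.0,
                                   trials=200, rng=random.Random(0))
print(len(best_set))
```

Because the four points are drawn near the seed, almost every sample taken around a seed on the plane is already coplanar, which is the effect the nonuniform weights are meant to achieve.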

We summarize the above procedure as follows. In the first image, we first use the planar point likelihood to choose a “seed” in a region that includes many feature points. Such a region can be regarded as planar in the scene. We then choose 4 points close to the seed by using the coplanarity likelihood with respect to the seed. Next, we choose the corresponding points in the second image for the 4 points chosen from the first image by using the correspondence likelihoods. After computing a homography from the 4 matches, we form the consensus set for the computed homography from the matches that have high correspondence probabilities and agree with the computed homography to the specified degree. By repeating this procedure, we find the consensus set with the maximum number of elements, and we regard all the correspondences in that set as lying in the same planar region. Finally, we compute the homography from them by an optimal method (Kanatani et al., 2000). If we obtain multiple homographies in the scene, we compute the fundamental matrix from the obtained homographies using the compatibility (4). Furthermore, if we need correspondences whose original 3-D points are not on any plane, we can check each candidate match against the epipolar equation (2) using the computed fundamental matrix.

4 EXPERIMENTAL RESULTS

We show some experiments using real images.

Fig. 4 shows a real image example of a scene of brick walls. Fig. 4 (a) shows a stereo image pair and the feature points detected by the Harris operator (Harris and Stephens, 1988). Fig. 4 (b) shows the correspondences and the planar regions obtained by our method. Here, we show only the correspondences on the detected planar regions. Fig. 4 (d) shows the correspondences obtained by the method of Kanazawa and Kanatani¹ (Kanazawa and Kanatani, 2004b). Fig. 4 (f) shows the result obtained by the standard RANSAC using only the epipolar constraint. In these results, we show each correspondence as a line segment whose endpoints are the positions of the pair of points. We can see that the proposed method establishes more correct matches than the other methods.

¹ We used the program code placed at http://www.img.tutkie.tut.ac.jp/programs/index-e.html

Figure 5: (a) A stereo image pair and detected feature points. (b) Result by the proposed method. (c) 3-D shape (top view) from (b). (d) Result by the method of Kanazawa and Kanatani (Kanazawa and Kanatani, 2004b). (e) 3-D shape (top view) from (d). (f) Result by the standard RANSAC. (g) 3-D shape (top view) from (f).

Fig. 4 (c), (e), and (g) show the 3-D shapes reconstructed from the correspondences in (b), (d), and (f), respectively. Here, we use the method of Kanatani and Matsunaga (Kanatani and Matsunaga, 2000) for decomposing the fundamental matrix into the camera parameters. In these 3-D reconstructions, the angles of the walls are about 90 degrees in Fig. 4 (c), 95 degrees in Fig. 4 (e), and 100 degrees in Fig. 4 (g). We see that the fundamental matrix obtained by our method is more accurate than those obtained by the other methods.

Fig. 5 shows an example of a scene of buildings. There are many wrong matches in the results obtained by the other methods, but few wrong matches in the result obtained by our method. The angles of the walls in the 3-D shapes are about 94 degrees in Fig. 5 (c), 80 degrees in Fig. 5 (e), and 69 degrees in Fig. 5 (g). Again, we see that the 3-D shape obtained by the proposed method is more accurate than those obtained by the other methods.

Fig. 6 shows an example of a faraway scene. In a faraway scene, we can generally detect only one plane, and we cannot compute the fundamental matrix because the scene is degenerate. Thus, we cannot obtain the correspondences using the standard RANSAC. We compare the result of our method with that of the image mosaicing method proposed by Kanazawa and Kanatani² (Kanazawa and Kanatani, 2004a). Fig. 6 (b) and (c) show the correspondences obtained by our method and by the method of Kanazawa and Kanatani, respectively. Fig. 6 (d) and (e) show the panoramic (difference) images generated from (b) and (c). We see that our method detects many correct matches and that the generated panoramic image is also accurate.

² We used the program code placed at http://www.img.tutkie.tut.ac.jp/programs/index-e.html

Our method does not need to judge whether the scene is degenerate or not (Kanazawa and Kanatani, 2004b). In other words, our method establishes correspondences stably irrespective of whether the scene is far away or not.

For these examples, we stopped the search in our method when no update had occurred for 20000 consecutive iterations. The total computation times were 335 seconds for Fig. 4, 341 seconds for Fig. 5, and 58 seconds for Fig. 6. We used a Pentium 4 2.4 GHz CPU with 512 MB of main memory, running Linux.
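The stopping rule described above, ending the search after a fixed number of consecutive iterations without an update, can be sketched as follows. The function name and the toy sequence of scores are hypothetical; in our setting the score would be the size of the consensus set and the limit would be 20000.

```python
from typing import Any, Callable, Optional, Tuple


def search_until_stale(propose: Callable[[], Tuple[Any, float]],
                       max_stale: int) -> Tuple[Any, Optional[float]]:
    """Call propose() repeatedly and keep the best-scoring result; stop when
    the best score has not improved for max_stale consecutive calls."""
    best_item, best_score, stale = None, None, 0
    while stale < max_stale:
        item, score = propose()
        if best_score is None or score > best_score:
            best_item, best_score, stale = item, score, 0  # update: reset counter
        else:
            stale += 1
    return best_item, best_score


# Toy sequence of consensus set sizes; the search stops after three
# consecutive proposals that do not improve on the best size seen so far.
scores = iter([3, 5, 5, 4, 9, 1, 1, 1, 7])
best_item, best_score = search_until_stale(lambda: (None, next(scores)),
                                           max_stale=3)
print(best_score)
```

With such a rule the total number of iterations is not fixed in advance, which is one reason why the computation times differ between scenes.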

5 CONCLUSION

We have proposed an accurate method for establishing point correspondences based on detecting one or more planes by random sampling. Instead of using a uniform distribution in random sampling, we introduce three nonuniform likelihoods, which are defined by the feature points and their correlations. By using these likelihoods, our method chooses correct matches efficiently in random sampling, so we can detect more correct matches than the other methods. Furthermore, our method establishes correspondences stably irrespective of whether the scene is far away or not. Using real image examples, we have demonstrated that the proposed method is robust and accurate.

In future work, we must reduce the processing time of the method.


Figure 6: (a) A stereo image pair and detected feature points. (b) Result by the proposed method. (c) Result by the method of

Kanazawa and Kanatani (Kanazawa and Kanatani, 2004a). (d) Panoramic image from (b). (e) Panoramic image from (c).

ACKNOWLEDGEMENTS

This work was supported in part by the Ministry of

Education, Culture, Sports, Science and Technology,

Japan, under the Grant for 21st Century COE Program

“Intelligent Human Sensing.”

REFERENCES

Dick, A., Torr, P., and Cipolla, R. (2000). Automatic 3d modeling of architecture. In Proc. 11th British Machine Vision Conf., pages 372–381, Bristol, U.K.

Fischler, M. and Bolles, R. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. ACM, 24(6):381–395.

Harris, C. and Stephens, M. (1988). A combined corner and edge detector. In Proc. 4th Alvey Vision Conf., pages 147–151, Manchester, U.K.

Hartley, R. and Zisserman, A. (2000). Multiple View Geometry. Cambridge University Press, Cambridge.

Kanatani, K. (1996). Statistical Optimization for Geometric Computation: Theory and Practice. Elsevier Science, Amsterdam.

Kanatani, K. and Matsunaga, C. (2000). Closed-form expression for focal lengths from the fundamental matrix. In Proc. 4th Asian Conf. Comput. Vision, pages 128–133, Taipei, Taiwan.

Kanatani, K., Ohta, N., and Kanazawa, Y. (2000). Optimal homography computation with a reliability measure. IEICE Trans. Inf. & Syst., E83-D(7):1369–1374.

Kanazawa, Y. and Kanatani, K. (2004a). Image mosaicing by stratified matching. Image Vision Comput., 22(2):93–103.

Kanazawa, Y. and Kanatani, K. (2004b). Robust image matching preserving global consistency. In Proc. 6th Asian Conf. Comput. Vision, pages 1128–1133, Jeju Island, Korea.

Kanazawa, Y. and Kawakami, H. (2004). Detection of planar regions with uncalibrated stereo using distributions of feature points. In Proc. 15th British Machine Vision Conf., pages 247–256, London, U.K.

Kanazawa, Y., Sakamoto, T., and Kawakami, H. (2004). Robust 3-d reconstruction using one or more homographies with uncalibrated stereo. In Proc. 6th Asian Conf. Comput. Vision, pages 503–508, Jeju Island, Korea.

Maciel, J. and Costeira, J. (2002). Robust point correspondence by concave minimization. Image Vision Comput., 20(9/10):683–690.

Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. In Proc. 13th British Machine Vision Conf., pages 384–393, Cardiff, U.K.

Olson, C. F. (2002). Maximum-likelihood image matching. IEEE Trans. Patt. Anal. Mach. Intell., 24(6):853–857.

Rousseeuw, P. J. and Leroy, A. (1987). Robust Regression and Outlier Detection. Wiley, New York.

Torr, P. and Davidson, C. (2003). Impsac: Synthesis of importance sampling and random sample consensus. IEEE Trans. Patt. Anal. Mach. Intell., 25(3):354–364.

Torr, P. and Zisserman, A. (1998). Robust computation and parameterization of multiple view geometry. In Proc. 6th Int. Conf. Computer Vision, pages 727–732, Bombay, India.

Torr, P. and Zisserman, A. (2000). Mlesac: A new robust estimator with application to estimating image geometry. Comput. Vis. Image. Understand., 78:138–156.

Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.-T. (1995). A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artif. Intell., 78:87–119.
