LINE SEGMENT BASED STRUCTURE AND MOTION FROM TWO VIEWS

A Practical Issue

Saleh Mosaddegh, David Fofi

Le2i, UMR CNRS 5158, Université de Bourgogne, Le Creusot, France

Pascal Vasseur

MIS, Université de Picardie Jules Verne, 33 Rue Saint Leu, 80039 Amiens, France

Keywords:

Motion estimation, Line segment correspondences, Two views, Optimization.

Abstract:

We present an efficient measure of overlap between two collinear segments which considerably decreases the overall computational time of a segment-based motion estimation and reconstruction algorithm that already exists in the literature. We also discuss the special cases where sparse sampling of the motion space for initialization of the algorithm does not result in a good solution, and suggest using dense sampling instead to overcome the problem. Finally, we demonstrate our work on two real data sets.

1 INTRODUCTION

Using lines for estimating motion is advantageous because they are easier to extract and have less localization error than other features of interest such as points. However, several works in the literature state that motion cannot be determined from line correspondences between only two views (Zhang, 1995; Holt and Netravali, 1996; Netravali and Huang, 2002; Faugeras, 2001). Classical methods such as (Taylor and Kriegman, 1995; Bartoli and Sturm, 2005), which use supporting lines (the geometric abstraction of straight line segments), need many line correspondences across at least three images. To our knowledge, the algorithm introduced in (Zhang, 1995) is, so far, the only work based on only two views; it tries to recover motion and structure by maximizing the total overlap of line segments in correspondence. In this paper we introduce a new, efficient way of computing the overlap between two segments which considerably decreases the overall computational time of this algorithm. We improve the speed of that method by replacing its objective function with a less computationally expensive one without affecting the output of the algorithm. For all the data sets we have in hand, we also found that the sampling strategy of that method is often not dense enough to obtain a good initial guess, and one needs to sample the motion space with very small steps to obtain an acceptable solution. On the other hand, after dense sampling of the motion space, the best initial guess is already close enough to the good solution that further optimization does not improve the estimate considerably. This observation also motivated us to decrease the time needed to calculate the objective function, so that a good solution can be searched for over a densely sampled motion space in a shorter time.

The rest of this text is organized as follows. Section 2 gives a brief explanation of the Zhang method as introduced in (Zhang, 1995). In Section 3 the new objective function is defined and its performance is compared with that of the previous function. Section 4 discusses recovering the motion by dense sampling. The results in Section 5 demonstrate, on a set of real data, how this new function leads to a significant improvement in execution time.

2 MOTION ESTIMATION BY MAXIMIZING OVERLAPS

In this section, we present a brief summary of the algorithm introduced in (Zhang, 1995) for solving the motion problem by maximizing the overlap of line segments. The problem to be solved is the following: given the


Mosaddegh S., Foﬁ D. and Vasseur P..

LINE SEGMENT BASED STRUCTURE AND MOTION FROM TWO VIEWS - A Practical Issue.

DOI: 10.5220/0003357106800684

In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2011), pages 680-684

ISBN: 978-989-8425-47-8

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: Overlap of two line segments in correspondence.

cameras' intrinsic parameters and two sets of line segments that are in correspondence, estimate the camera extrinsic parameters (the motion R and t).

Consider the pair of line segments $(l, l')$ in correspondence as shown in Fig. 1, where $l$ has end points $s$ and $e$ in the first image and $l'$ has end points $s'$ and $e'$ in the second image. The line $l'_s$ in the second image is the epipolar line of end point $s$ from the first image, i.e. $l'_s = E\tilde{s}$, where $E = [t]_\times R$ is the essential matrix (Hartley and Zisserman, 2004). $\tilde{s}$ is the ray which passes through end point $s$ and the center of the first camera; it can be computed easily since the camera intrinsic parameters are assumed to be known. Similarly, the line $l'_e$ is the epipolar line of the other end point $e$. Taking the cross product of each of these two epipolar lines with the line supporting the segment $s'e'$ gives their intersections with it, $s''$ and $e''$. Provided that the epipolar geometry (i.e. the matrix $E$, or the motion $(R, t)$) between the two images is correct, $s$ and $s''$ correspond to a single point in space; so do $e$ and $e''$.
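The construction of the two intersections can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: image points are taken to be in normalized homogeneous coordinates (x, y, 1), the rotation is passed as a 3x3 matrix, and the helper names (`skew`, `essential`, `epipolar_intersections`) are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors (joins two points or meets two lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def mat_mul(a, b):
    """Product of two 3x3 matrices."""
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(3))
                       for c in range(3)) for r in range(3))


def skew(t):
    """Skew-symmetric matrix [t]x such that [t]x v = t x v."""
    return ((0.0, -t[2], t[1]),
            (t[2], 0.0, -t[0]),
            (-t[1], t[0], 0.0))


def essential(rotation, t):
    """Essential matrix E = [t]x R for a hypothesized motion (R, t)."""
    return mat_mul(skew(t), rotation)


def epipolar_intersections(e_mat, s, e, s_prime, e_prime):
    """Return (s'', e''): where the epipolar lines of s and e meet the line of l'."""
    line_prime = cross(s_prime, e_prime)          # supporting line of l'
    s_pp = cross(mat_vec(e_mat, s), line_prime)   # meet of E s with that line
    e_pp = cross(mat_vec(e_mat, e), line_prime)   # meet of E e with that line
    # Dehomogenize. The division fails or becomes ill-conditioned when the
    # segment is (nearly) aligned with the epipolar lines.
    return ((s_pp[0] / s_pp[2], s_pp[1] / s_pp[2]),
            (e_pp[0] / e_pp[2], e_pp[1] / e_pp[2]))


# Usage: pure translation along x, one 3D segment seen in both views.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
E = essential(IDENTITY, (1.0, 0.0, 0.0))
s_pp, e_pp = epipolar_intersections(
    E, (0.0, 0.0, 1.0), (0.5, 0.5, 1.0), (0.5, 0.0, 1.0), (1.0, 0.5, 1.0))
```

In this toy configuration the hypothesized motion is the true one, so the two intersections fall exactly on the end points of the segment in the second image.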

Thus, the statement that two line segments $l$ and $l'$ share a common part of a 3D line segment is equivalent to saying that line segment $s''e''$ and line segment $s'e'$ (i.e. $l'$) overlap. The overlap length, $L'$, for two line segments in correspondence can then be computed from:

\[
L' =
\begin{cases}
\min(\|e' - s'\|, \|e'' - s'\|, \|e' - s''\|, \|e'' - s''\|) & \text{if } (s'' - s') \cdot (e' - s'') > 0 \\
 & \text{or } (e'' - s') \cdot (e' - e'') > 0 \\
 & \text{or } (s' - s'') \cdot (e'' - s') > 0 \\
 & \text{or } (e' - s'') \cdot (e'' - e') > 0 \\
-\min(\|e' - s''\|, \|e'' - s'\|) & \text{otherwise}
\end{cases}
\tag{1}
\]

where $\cdot$ stands for the dot product of two vectors. The overlap length is positive if the two line segments overlap; otherwise it is negative. We assume that the orientation information of a line segment is not available (i.e. the correspondence between end points of the segments is not known). The overlap length in the first image, denoted by $L$, can be computed in exactly the same way. Since a small overlap length for a short line segment is as important as a large overlap length for a long line segment, we should use the relative overlap lengths $L'/\|l'\|$ and $L/\|l\|$ to measure the overlap of the pair of line segments. The relative overlap length takes a value between 0 and 1 when two segments overlap; otherwise it will be negative.
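As a rough illustration of Eq. 1 and of the relative overlap length, the following Python sketch evaluates both for collinear 2D segments. It rests on assumptions: the four tests are written as "one end point of a segment lies strictly between the end points of the other", which is our reading of the four OR conditions, and the function and argument names are hypothetical (`s1`, `e1` stand for $s'$, $e'$ and `s2`, `e2` for $s''$, $e''$).

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def overlap_length(s1, e1, s2, e2):
    """Signed overlap length in the spirit of Eq. 1 (positive: overlap, negative: gap)."""
    overlapping = (
        dot(sub(s2, s1), sub(e1, s2)) > 0      # s'' strictly between s' and e'
        or dot(sub(e2, s1), sub(e1, e2)) > 0   # e'' strictly between s' and e'
        or dot(sub(s1, s2), sub(e2, s1)) > 0   # s' strictly between s'' and e''
        or dot(sub(e1, s2), sub(e2, e1)) > 0   # e' strictly between s'' and e''
    )
    if overlapping:
        return min(dist(e1, s1), dist(e2, s1), dist(e1, s2), dist(e2, s2))
    return -min(dist(e1, s2), dist(e2, s1))


def relative_overlap(s1, e1, s2, e2):
    """Overlap length divided by the length of the segment s'e'."""
    return overlap_length(s1, e1, s2, e2) / dist(s1, e1)


# Usage: two segments on the x axis, first overlapping, then separated by a gap.
partial = overlap_length((0.0, 0.0), (4.0, 0.0), (1.0, 0.0), (6.0, 0.0))
gap = overlap_length((0.0, 0.0), (4.0, 0.0), (5.0, 0.0), (7.0, 0.0))
ratio = relative_overlap((0.0, 0.0), (4.0, 0.0), (1.0, 0.0), (6.0, 0.0))
```

In the first call the common part runs from x = 1 to x = 4, so the overlap length is 3 and the relative overlap is 3/4; in the second call the segments are separated by a gap of length 1, so the result is negative.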

We define the relative non-overlap length between two corresponding segments $l$ and $l'$ in the second image as:

\[
H' = 1 - \frac{L'}{\|l'\|}
\tag{2}
\]

which is 0 when the two segments completely overlap, between 0 and 1 when they partially overlap, and greater than 1 when there is a gap between them. The corresponding quantity in the first image is denoted by $H$. We can now formulate the motion problem as estimating the camera motion parameters $(R, t)$ by minimizing the following non-linear objective function:

\[
F = \sum_{i=1}^{n} (H'_i + H_i)
\]

where n is the number of lines. The algorithm can be

summarized as the following pseudo-code:

Sample the rotation space and the translation space with sufficiently small steps.

for each sample R(i) in the rotation space do

    for each sample t(j) in the translation space do

        For the hypothesized motion $E = [t(j)]_\times R(i)$, calculate the objective function $F(i, j) = \sum_{k=1}^{n} (H'_k + H_k)$

    end for

end for

for each of the 10 best solutions in matrix F do

    Using the downhill simplex method, minimize F starting with that solution as the initial guess.

end for

If the motion space is sampled with sufficiently small steps, at least one of the ten minimization runs in the last loop converges to a good solution.
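The exhaustive search stage of the pseudo-code can be sketched as follows. This is a minimal illustration, not the original implementation: the objective used here is a toy stand-in (squared distance to a hidden motion) rather than the overlap-based $F$, and the names `best_initial_guesses` and `toy_objective` are hypothetical.

```python
import heapq
from itertools import product


def best_initial_guesses(rotations, translations, objective, keep=10):
    """Evaluate objective(R, t) on every sample pair and return the `keep` smallest."""
    scored = ((objective(r, t), r, t)
              for r, t in product(rotations, translations))
    return heapq.nsmallest(keep, scored, key=lambda item: item[0])


# Toy stand-in for F: squared distance to a hidden motion (NOT the overlap objective).
TRUE_R, TRUE_T = (0.05, 0.35, 0.0), (0.9, 0.1, 0.0)


def toy_objective(r, t):
    return (sum((a - b) ** 2 for a, b in zip(r, TRUE_R))
            + sum((a - b) ** 2 for a, b in zip(t, TRUE_T)))


# Coarse samples: rotation vectors on a 3x3x3 grid, three translation directions.
rotations = list(product((-0.4, 0.0, 0.4), repeat=3))
translations = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

candidates = best_initial_guesses(rotations, translations, toy_objective)
best_value, best_r, best_t = candidates[0]
```

Each returned candidate would then serve as the starting point of a downhill simplex run; one readily available option for that step is `scipy.optimize.minimize` with `method="Nelder-Mead"`, which is left out here to keep the sketch dependency-free.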

3 THE NEW MEASURE OF OVERLAP

In the above algorithm, $H'$ (or $H$), the function for computing the relative non-overlap length for each line correspondence, is the most frequently called function, and reducing its computational time can largely decrease the overall computational time of the algorithm. Consider the two possible configurations of two collinear line segments as shown in Fig. 2. The coordinates of the two endpoints of the overlapping part, $(X_{min}, Y_{min})$ and $(X_{max}, Y_{max})$, can be found by:

\[
X_{min} = \max(\min(s'_x, e'_x), \min(s''_x, e''_x)),
\]

\[
X_{max} = \min(\max(s'_x, e'_x), \max(s''_x, e''_x)),
\]


Figure 2: Two possible configurations of two collinear line segments.

\[
Y_{min} = \max(\min(s'_y, e'_y), \min(s''_y, e''_y)),
\]

\[
Y_{max} = \min(\max(s'_y, e'_y), \max(s''_y, e''_y)),
\]

therefore, the overlap length can be expressed through its Cartesian components as:

\[
L' = \sqrt{{L'_x}^2 + {L'_y}^2}
\]

where

\[
L'_x = X_{max} - X_{min}, \quad L'_y = Y_{max} - Y_{min}
\]
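The min/max construction above can be sketched in Python as follows. This is an illustrative sketch under assumptions: the segments are taken to be collinear, the names `overlap_extent` and `overlap_length_cartesian` are hypothetical, and the Euclidean combination of the two components is only meaningful here when the segments actually overlap (for disjoint segments both components come out negative and the square root would hide the sign).

```python
import math


def overlap_extent(s1, e1, s2, e2):
    """Return (Lx, Ly): the x and y extents of the common part of two collinear segments.

    s1, e1 are the end points of the first segment, s2, e2 those of the second.
    Both values are negative when the segments are separated by a gap.
    """
    x_min = max(min(s1[0], e1[0]), min(s2[0], e2[0]))
    x_max = min(max(s1[0], e1[0]), max(s2[0], e2[0]))
    y_min = max(min(s1[1], e1[1]), min(s2[1], e2[1]))
    y_max = min(max(s1[1], e1[1]), max(s2[1], e2[1]))
    return x_max - x_min, y_max - y_min


def overlap_length_cartesian(s1, e1, s2, e2):
    """Overlap length from its Cartesian components (overlapping case only)."""
    lx, ly = overlap_extent(s1, e1, s2, e2)
    return math.hypot(lx, ly)


# Usage: two segments on the line y = x / 2.
extent = overlap_extent((0.0, 0.0), (4.0, 2.0), (2.0, 1.0), (6.0, 3.0))
length = overlap_length_cartesian((0.0, 0.0), (4.0, 2.0), (2.0, 1.0), (6.0, 3.0))
gap_extent = overlap_extent((0.0, 0.0), (4.0, 2.0), (6.0, 3.0), (8.0, 4.0))
```

In the first case the common part runs from (2, 1) to (4, 2), so the extents are 2 and 1 and the length is $\sqrt{5}$. No branching on the relative position of the end points is needed.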

Note that the output of this new measure of overlap length is exactly equal to that of Eq. 1, but without the need for an if-then construct with four OR conditions. While computing the relative non-overlap length, the computational time can be reduced by a further half by considering only one of the Cartesian components of the overlapping part and of the segment in the second image (we chose the x component), based on the relation shown in Fig. 3:

\[
H' = 1 - \frac{L'_x}{\|l'_x\|}
\tag{3}
\]

where $\|l'_x\|$ is the length of the x component of the segment $l'$.

However, care should be taken when the segment is vertical, in which case the y components should be used to avoid the undefined division 0/0. In order to have a very accurate comparison between the two measures, we carefully counted the number of CPU cycles needed to run the assembly instructions for the new non-overlap length measure defined by Eq. 3 versus the measure defined by Eq. 2, both compiled using a C compiler on an Intel Pentium machine. Our new non-overlap length measure needs 302 clockticks (including the conditional if for the proper treatment of vertical segments). The original measure of non-overlap needs a minimum of 720 clockticks (if the first of the four inequality conditions in Eq. 1 is satisfied) and a maximum of 858 clockticks (if none of the four inequalities is satisfied). We cannot use the average number of clockticks, since for the majority of samples in the motion space all but a few lines exhibit no overlap, and therefore the computation of the objective function for these samples requires the maximum number of clockticks. This means our new measure can be computed on average slightly less than 858/302 = 2.841 times faster than

Figure 3: The relation between the Cartesian components of the overlap part and the segment in the second image. The ratios of corresponding sides of the two right triangles are constant.

the measure introduced in (Zhang, 1995), assuming that the variation of the line segments is random. Refer to the results section for a comparison using real data.
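A possible Python rendering of the single-component measure of Eq. 3, including the special treatment of vertical segments, is sketched below. It is an illustration rather than the code that was timed: the function names are hypothetical, the segments are assumed collinear, and the vertical case is detected by an exact equality test on the x coordinates.

```python
def non_overlap_component(s1, e1, s2, e2, axis):
    """Relative non-overlap computed from one Cartesian component only.

    s1, e1: end points of the segment in the second image (s', e').
    s2, e2: end points of the projected segment (s'', e'').
    axis:   0 for the x component, 1 for the y component.
    """
    low = max(min(s1[axis], e1[axis]), min(s2[axis], e2[axis]))
    high = min(max(s1[axis], e1[axis]), max(s2[axis], e2[axis]))
    extent = abs(e1[axis] - s1[axis])   # component of the segment s'e' itself
    return 1.0 - (high - low) / extent


def relative_non_overlap(s1, e1, s2, e2):
    """Use the x component, falling back to y when the segment is vertical."""
    axis = 0 if s1[0] != e1[0] else 1
    return non_overlap_component(s1, e1, s2, e2, axis)


# Usage on a slanted line (y = x / 2) and on a vertical line (x = 1).
h_partial = relative_non_overlap((0.0, 0.0), (4.0, 2.0), (2.0, 1.0), (6.0, 3.0))
h_gap = relative_non_overlap((0.0, 0.0), (4.0, 2.0), (6.0, 3.0), (8.0, 4.0))
h_full = relative_non_overlap((0.0, 0.0), (4.0, 2.0), (0.0, 0.0), (4.0, 2.0))
h_vertical = relative_non_overlap((1.0, 0.0), (1.0, 4.0), (1.0, 1.0), (1.0, 3.0))
```

For the partially overlapping pair the x extents are 2 out of 4, giving 0.5, which matches the value $1 - \sqrt{5}/\sqrt{20}$ obtained from the full lengths. A variant that picks whichever component of the segment is larger would avoid near-zero divisors for almost vertical segments; that is a design choice of this sketch's author, not something taken from the text.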

4 RECOVERING MOTION BASED ON DENSE SAMPLING

Sparse sampling of the 5-dimensional motion space (3 dimensions for the rotation and 2 for the translation) followed by refinement of the best samples, as suggested in (Zhang, 1995), is problem dependent: depending on how far the best initial guesses are from the global minimum, the optimization stage can need a considerable number of iterations to converge to a good solution, or it may not converge at all. Thanks to our faster method for calculating the objective function, we are able to sample the motion space more densely, resulting in a better initial guess closer to the global minimum and less time required by the optimization algorithm to converge to a good solution. The results in the next section demonstrate how this new approach can help to recover the motion for examples where sparse sampling followed by a refinement of the best samples cannot converge to a good solution.
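To make the size of such a search concrete, the sketch below generates samples of the 5-dimensional motion space in Python. It is only one possible scheme and makes several assumptions: rotations are sampled as rotation vectors on a regular grid, and the translation directions are spread over a hemisphere with a Fibonacci spiral, used here as a simple stand-in for the icosahedron-based subdivision of the original algorithm. The function names and the parameters `half_range`, `step` and `count` are hypothetical.

```python
import math
from itertools import product


def rotation_grid(half_range, step):
    """Rotation vectors whose three components each span [-half_range, half_range]."""
    n = int(round(2.0 * half_range / step))
    values = [-half_range + i * step for i in range(n + 1)]
    return list(product(values, repeat=3))


def hemisphere_directions(count):
    """Roughly uniform unit vectors with positive z (Fibonacci spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for k in range(count):
        z = (k + 0.5) / count              # heights spread evenly over (0, 1)
        radius = math.sqrt(1.0 - z * z)    # radius of the circle at that height
        phi = k * golden_angle
        directions.append((radius * math.cos(phi), radius * math.sin(phi), z))
    return directions


# Usage: halving the rotation step multiplies the number of rotation samples
# by roughly eight, which is why a cheap objective function matters.
coarse = rotation_grid(0.5, 0.25)
fine = rotation_grid(0.5, 0.125)
translations = hemisphere_directions(90)
evaluations = len(fine) * len(translations)
```

Only two parameters are needed for the translation because its magnitude cannot be recovered from two views, so only a unit direction is sampled.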

5 RESULTS

We have already shown the efficiency of the new objective function in terms of execution cycles. In this section, we give the results on two real data sets; for the second set, the original algorithm fails to recover the motion due to sparse sampling of the motion space. The first set of real data is an image pair of a bakery (Fig. 4). The position and rotation of the second camera with respect to the first one were obtained through a very careful setup and the use of a gyroscope:

R = [−0.0073,−0.3049, −0.0036],


$T_{tra}$ = [0.9318, −0.0123, 0.3629],

where the translation $T_{tra}$ is normalized and the rotation $R$ is represented by a 3D vector whose direction is that of the rotation axis and whose norm is equal to the rotation angle. The segments which are aligned with the epipolar lines are neglected when computing the total overlap, and later for the scene reconstruction, since in this case the computed intersections $s''$ and $e''$ are unstable and the resulting overlap can be unreasonably large, which can eliminate a good solution.
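The rotation-vector representation can be turned into a rotation matrix with Rodrigues' formula. The Python sketch below shows this conversion under the assumption that the angle is expressed in radians; the function name `rotation_matrix` is hypothetical, and the vector used in the example is the ground-truth rotation of the bakery pair quoted above.

```python
import math


def rotation_matrix(rvec):
    """Rodrigues' formula: rotation vector (axis * angle, radians) to a 3x3 matrix."""
    theta = math.sqrt(sum(c * c for c in rvec))
    if theta == 0.0:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in rvec)     # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


# Usage with the rotation vector reported for the bakery image pair.
R_BAKERY = rotation_matrix((-0.0073, -0.3049, -0.0036))
trace = R_BAKERY[0][0] + R_BAKERY[1][1] + R_BAKERY[2][2]
recovered_angle = math.acos((trace - 1.0) / 2.0)   # equals the norm of the vector

# A quarter turn about the z axis maps the x axis onto the y axis.
R_QUARTER = rotation_matrix((0.0, 0.0, math.pi / 2.0))
```

The angle recovered from the trace of the matrix equals the norm of the input vector, which is a quick consistency check on the conversion.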

For speed comparison, we applied the algorithm with both the original and the new objective functions to this data. We manually extracted and matched 85 lines between the two views. Searching for the initial motion estimate with the sampling strategy described in the original algorithm (i.e. sampling the range

[−

] with steps equal to

for the rotation and 40

uniform samples of a Gauss hemisphere based on the icosahedron for the translation), only 1 of the 10 best samples converged to the good solution. The result of the best solution is shown in Fig. 4. The whole process (excluding the time for line extraction and matching) took 3159 seconds, composed of 1479 seconds for evaluating the objective function over the motion space and 1680 seconds for the optimization of the 10 best initial guesses (both algorithms were implemented in Matlab and executed on the same computer).

Figure 4: Top row: Two images of a bakery with 85 matched line segments superimposed on the images (in green). Bottom row: 3D reconstruction of the bakery by the structure from motion technique described in (Zhang, 1995). The right image corresponds to the top view.

Although replacing the function for computing the relative non-overlap with our function does not alter the output of the algorithm, it reduces the overall computational time to 1215 seconds (around 2.6 times faster, including all overhead computations). The error in the translation direction is 1.0847°. The error in the rotation angle is 0.3087° and the error in the rotation axis is 1.9147°.
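The text does not spell out how these three errors are computed. One plausible set of definitions, given here purely as an assumption, measures the angle between the estimated and true translation directions, the absolute difference of the rotation angles (norms of the rotation vectors), and the angle between the rotation axes. The function names below are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def angle_between_deg(u, v):
    """Angle between two 3D vectors, in degrees."""
    cosine = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    cosine = max(-1.0, min(1.0, cosine))   # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cosine))


def motion_errors_deg(r_est, t_est, r_true, t_true):
    """Return (translation direction, rotation angle, rotation axis) errors in degrees.

    r_est, r_true are rotation vectors (axis * angle in radians);
    t_est, t_true are translation directions (any non-zero scale).
    """
    translation_error = angle_between_deg(t_est, t_true)
    angle_error = math.degrees(abs(norm(r_est) - norm(r_true)))
    axis_error = angle_between_deg(r_est, r_true)
    return translation_error, angle_error, axis_error


# Usage with made-up values (not the data of the paper).
errors = motion_errors_deg((0.0, 0.2, 0.0), (1.0, 0.0, 0.0),
                           (0.0, 0.3, 0.0), (0.0, 1.0, 0.0))
```

In the made-up example the translation directions are perpendicular, the rotation axes coincide, and the rotation angles differ by 0.1 rad.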

Fig. 5 shows the second pair of images, taken from the real stereo image data set available at (INRIA). The transformation from the first camera to the second camera is:

Figure 5: Top row: A stereo pair with 104 matched line segments superimposed on the images (in green). Middle row: A perspective view of the 3D reconstruction by classical stereo including two camera image planes. Bottom row: a view from the side.

R = [−0.0004,0.3133, 0.0717],

$T_{tra}$ = [−0.9859, −0.0441, 0.1617].

We manually extracted and matched 104 lines between the two views. When searching for the initial motion estimate with the original sampling, none of the best samples converged to a good solution. The reconstruction corresponding to the initial guess with the smallest value of the objective function is shown in Fig. 6.

As a matter of fact, this is an example of a scene


Figure 6: 3D reconstruction of the scene by the best solution of the structure from motion technique described in (Zhang, 1995). The bottom image corresponds to a side view. This is clearly not a good solution.

where it can be shown that the global minimum is located close to many local minima and only a fine sampling of the motion space can result in a good solution. Therefore we applied our dense sampling strategy with 90 samples of the translation space and 1330 samples of the rotation space. The evaluation of the objective function for all these samples takes around 3120 seconds. The reconstruction of the best solution is shown in Fig. 7. The error in the translation direction is 1.9°. The errors in the rotation angle and rotation axis are 0.94° and 1.986° respectively. This result is already very good and further optimization is not necessary. One can notice that, even though we benefit from a faster objective function, the evaluation of all samples in the dense motion space is quite time consuming; however, apart from the unavoidable evaluation of all these samples, there is no other deterministic approach for such particular examples.

6 CONCLUSIONS

We introduced a new measure of overlap which increases the speed of calculating the overlap between two line segments in correspondence. It also allows a denser sampling of the motion space when finding initial guesses for the optimization of the non-linear objective function used to recover motion from line segment correspondences, and therefore facilitates the search for a good solution where, due to the nature of the scene, sparse sampling followed by optimization does not converge to a good solution. We demonstrated this situation by giving results on two real data sets, including a scene where the original algorithm fails to recover the motion.

Figure 7: 3D reconstruction of the scene corresponding to the sample with the minimum value of the objective function from the densely partitioned motion space.

REFERENCES

Bartoli, A. and Sturm, P. (2005). Structure-from-motion using lines: Representation, triangulation, and bundle adjustment. CVIU, 100(3):416–441.

Faugeras, O. (2001). Three-dimensional computer vision: a geometric viewpoint. MIT Press, Cambridge, Mass.

Hartley, R. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision. Cambridge University Press.

Holt, R. and Netravali, A. (1996). Uniqueness of solutions to structure and motion from combinations of point and line correspondences. In JVCIR, volume 7, pages 126–136.

INRIA. Syntim stereo images. http://perso.lcpc.fr/tarel.jean-philippe/syntim/paires.html.

Netravali, A. and Huang, T. (2002). Motion and structure from feature correspondences: A review. In AIPU02, pages 331–348.

Taylor, C. and Kriegman, D. (1995). Structure and motion from line segments in multiple images. PAMI, 17(11):1021–1032.

Zhang, Z. (1995). Estimating motion and structure from correspondences of line segments between two perspective images. In ICCV95, pages 257–262.
