ACCURATE IMAGE REGISTRATION BY COMBINING
FEATURE-BASED MATCHING AND GLS-BASED MOTION
ESTIMATION
Raul Montoliu and Filiberto Pla
Computer Vision Group. Jaume I University. 12071 Castellon. Spain
Keywords:
Image Registration, Motion Estimation, Generalized Least-Squares Estimation, SIFT.
Abstract:
This paper presents an accurate Image Registration method. It combines a feature-based method, which can recover large motions between images, with a Generalized Least-Squares (GLS) motion estimation technique, which estimates the motion parameters accurately. The feature-based method gives an initial estimate of the motion parameters, which is then refined by the GLS motion estimator. Our approach has been tested on challenging real images using both affine and projective motion models.
1 INTRODUCTION
Image registration (Brown, 1992) is a key problem in many applications in computer vision and image processing. We refer to Image Registration as the process of finding the correspondence between all the pixels of two images of the same scene captured at different times, with different sensors or from different viewpoints. When more than two images are involved, this problem is closely related to the creation of panoramic images (Szeliski, 2004).
The computer vision and image processing literature shows two main research directions in Image Registration: feature-based and optimization-based.
The main limitation of feature-based methods is their strong dependence on how the features are detected and extracted from the images. This can affect the accuracy of the registration when interest point detectors with a low repeatability rate are used. However, important advances have been made in this area. Many researchers have developed interest point detectors and descriptors that are invariant to large rotations, changes of scale and illumination changes, and even partially invariant to affine changes. See (Mikolajczyk et al., 2005) and (Mikolajczyk and Schmid, 2005) for comparative studies of scale and affine invariant interest point detectors and of local descriptors, respectively. Szeliski (2004) maintains that if the features are well distributed over the image and the descriptors are reasonably designed for repeatability, enough correspondences to permit image registration can usually be found. This is the case when using the feature detectors and descriptors reported in (Mikolajczyk et al., 2005; Mikolajczyk and Schmid, 2005), which make it possible to register images with large deformations.
This paper has been partially supported by project ESP2005-07724-C05-05 from Spanish CICYT.
Optimization methods, on the other hand, work directly on the grey levels of all pixels and estimate a vector of parameters that minimizes (or maximizes) an objective function. Their main advantage is estimation accuracy: the large volume of data means that parameter estimation for image registration is heavily over-constrained, since a small number of parameters (6 for the affine motion model) is estimated from a large number of constraints. However, because they are iterative, they suffer from initialization problems: the initial parameters must not be far from the solution, or the method may fall into a local minimum. A well-known way to cope with this initialization problem is a hierarchical (coarse-to-fine) technique. However, even with hierarchical techniques, optimization methods cannot cope with very large motion.
Montoliu R. and Pla F. (2007). ACCURATE IMAGE REGISTRATION BY COMBINING FEATURE-BASED MATCHING AND GLS-BASED MOTION ESTIMATION. In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IFP/IA, pages 386-389. Copyright © SciTePress.
Given these two approaches to image registration, which is preferable? One possible answer is to combine both, feature-based and optimization-based, into an accurate image registration technique able to cope with large deformations. That is the strategy our approach follows. First, a feature-based method is used to obtain initial motion parameters that are not far from the true solution. Starting from this initialization, an optimization-based algorithm is applied in a second step to refine the estimate of the motion parameters up to the accuracy level desired by the user.
In the first step, the SIFT technique (Lowe, 2004) is used to detect and describe interest points because of its excellent performance (Mikolajczyk and Schmid, 2005); it copes with changes of scale, rotations and illumination changes, and is partially invariant to affine changes. As the main contribution of this paper, we propose to use a Generalized Least-Squares (GLS) motion estimation method in the second step. As will be shown, a GLS estimator is an effective way of solving regression problems and yields accurate estimates of the parameters in Image Registration.
The rest of the document is organized as follows. The next section explains the GLS estimator for general problems and, in particular, for motion estimation. Section 3 describes the proposed algorithm in detail. Section 4 presents the experiments performed with our approach, and the last section presents the main conclusions drawn from this work.
2 GLS MOTION ESTIMATION
In general, the GLS estimation problem can be expressed as follows:
\[ \text{minimize } [\Theta_\upsilon = \upsilon^t \upsilon] \quad \text{subject to } F_i(\chi, L_i) = 0, \ \forall L_i \tag{1} \]
where $\upsilon$ is the vector of residuals of the observations, $\chi = (\chi_1, \ldots, \chi_p)$ is a vector of $p$ parameters, each $L_i$ is an observation vector with $n$ components $L_i = (L_i^1, \ldots, L_i^n)$, and $F_i$ is a set of $f$ functions that depend on the common vector of parameters $\chi$ and on an observation vector $L_i$, with $i = 1 \ldots r$.
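Briefly summarizing, in the GLS method, the iterative optimization is started with an initial guess of the parameters $\chi_0$. At each iteration $j$, the algorithm estimates $\Delta\chi$ to update the parameters as follows: $\chi_j = \chi_{j-1} + \Delta\chi$. The increment $\Delta\chi$ is calculated (Danuser and Stricker, 1998) based on the partial derivatives of the functions $F_i$ with respect to the parameters, $\chi$, and the observation vectors $L_i$, using the following expressions:
\[ \Delta\chi = \left( \sum_{i=1 \ldots r} N_i \right)^{-1} \left( \sum_{i=1 \ldots r} R_i \right) \tag{2} \]
where $N_i = A_i^T (B_i B_i^T)^{-1} A_i$ and $R_i = A_i^T (B_i B_i^T)^{-1} E_i$, with
\[
B_i = \begin{pmatrix}
\dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial L_i^1} & \cdots & \dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial L_i^n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial L_i^1} & \cdots & \dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial L_i^n}
\end{pmatrix}_{(f \times n)}
\]
\[
A_i = \begin{pmatrix}
\dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial \chi_1} & \cdots & \dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial \chi_p} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial \chi_1} & \cdots & \dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial \chi_p}
\end{pmatrix}_{(f \times p)}
\]
\[
E_i = \begin{pmatrix}
-F_i^1(\chi_{j-1}, L_i) \\
\vdots \\
-F_i^f(\chi_{j-1}, L_i)
\end{pmatrix}_{(f \times 1)}
\tag{3}
\]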
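The update of equation 2 can be illustrated with a minimal Python sketch for the scalar case $f = 1$, where $B_i B_i^T$ reduces to a single number. The function names `solve_linear` and `gls_increment`, the toy line-fitting problem and the convention $E_i = -F_i$ are assumptions made for this illustration; they are not taken from the original implementation.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def gls_increment(rows):
    """One GLS update for f = 1.

    rows yields (A_i, B_i, E_i): A_i has p entries, B_i has n entries,
    E_i is a number. Returns delta_chi = (sum N_i)^-1 (sum R_i).
    """
    rows = list(rows)
    p = len(rows[0][0])
    N = [[0.0] * p for _ in range(p)]
    R = [0.0] * p
    for A, B, E in rows:
        w = 1.0 / sum(b * b for b in B)  # (B_i B_i^T)^-1 is a scalar when f = 1
        for a in range(p):
            R[a] += A[a] * w * E
            for b in range(p):
                N[a][b] += A[a] * w * A[b]
    return solve_linear(N, R)


# Toy problem (hypothetical): fit y = m*x + c with
# F_i = y_i - (m*x_i + c) and observation vector L_i = (x_i, y_i).
def line_rows(points, m, c):
    for x, y in points:
        A = [-x, -1.0]          # dF/dm, dF/dc
        B = [-m, 1.0]           # dF/dx, dF/dy
        E = -(y - (m * x + c))  # assumed convention E_i = -F_i
        yield A, B, E


points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
m, c = 0.0, 0.0
for _ in range(5):
    dm, dc = gls_increment(line_rows(points, m, c))
    m, c = m + dm, c + dc
```

In this toy problem the data lie exactly on a line, so the increment vanishes after the first iteration; with image data the loop would instead stop once the increment falls below a chosen threshold.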
In motion estimation problems, the objective function is usually based on the assumption that the gray level of all the pixels of a region remains constant between two consecutive images in a sequence. This is the Brightness Constancy Assumption (BCA): the changes in gray level between the reference image and the test image are due only to motion.
In order to use the BCA directly instead of its linearized version, i.e. the optical flow equation, a non-linear estimator should be used. The GLS estimator can be used in this context. In our formulation of the motion estimation problem, the function $F_i$ is expressed as follows (note that in this case the number of functions $f$ is 1):
\[ F_i = I_1(x_i, y_i) - I_2(x'_i, y'_i), \tag{4} \]
where $I_1(x_i, y_i)$ is the gray level of the first image in the sequence (reference image) at the point $[x_i, y_i]$, and $I_2(x'_i, y'_i)$ is the gray level of the second image in the sequence (test image) at the transformed point $[x'_i, y'_i]$. In this case, each observation $L_i$ is related to one pixel $[x_i, y_i]$, with $r$ being the number of pixels in the area of interest.
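For context, the linearized version of the BCA mentioned above can be obtained by a first-order Taylor expansion. The sketch below assumes a small displacement $(u_i, v_i)$ of pixel $i$, a symbol that does not appear in the original text, and writes $I_2^x$, $I_2^y$ for the spatial gradients of the test image at $(x_i, y_i)$.

```latex
\begin{align*}
I_1(x_i, y_i) &= I_2(x_i + u_i,\, y_i + v_i)
  && \text{(BCA)} \\
&\approx I_2(x_i, y_i) + u_i\, I_2^x(x_i, y_i) + v_i\, I_2^y(x_i, y_i)
  && \text{(first-order Taylor expansion)} \\
\Rightarrow\quad 0 &\approx u_i\, I_2^x + v_i\, I_2^y
  + \bigl( I_2(x_i, y_i) - I_1(x_i, y_i) \bigr)
  && \text{(optical flow equation)}
\end{align*}
```

The approximation holds only for small displacements, which is one reason why working with the non-linear BCA directly calls for a non-linear estimator.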
Let us consider the test image ($I_2$) as the data model to match, and the reference image ($I_1$) as observation data. For each pixel $i$, let us define the observation vector as $L_i = (x_i, y_i, I_1(x_i, y_i))$, which has
three elements (n = 3): column, row (pixel coordinates) and gray level of the reference image at these coordinates. The gray level of the reference image has been selected as an element of the observation vector because it is the observation that we want to match with the corresponding gray level in the test image using the BCA. The spatial coordinates have also been selected as part of the observations, since the image acquisition process can introduce inaccuracy in their measurement.
In order to calculate the matrices $A_i$, $B_i$ and $E_i$ (see equation 3), the partial derivatives of the function $F_i$ with respect to the parameters and with respect to the observations must be worked out.
For instance, using affine motion, the terms $B_i$, $A_i$ and $E_i$ are expressed as follows:
\[ B_i = \left( I_1^x - a_1 I_2^x - a_2 I_2^y, \ \ I_1^y - b_1 I_2^x - b_2 I_2^y, \ \ 1.0 \right)_{(1 \times 3)} \]
\[ A_i = \left( -x_i I_2^x, \ -y_i I_2^x, \ -I_2^x, \ -x_i I_2^y, \ -y_i I_2^y, \ -I_2^y \right)_{(1 \times 6)} \]
\[ E_i = \left( -\left( I_1(x_i, y_i) - I_2(x'_i, y'_i) \right) \right)_{(1 \times 1)} \tag{5} \]
where $I_1^x$, $I_1^y$, $I_2^x$ and $I_2^y$ have been introduced to simplify notation as: $I_1^x = I_1^x(x_i, y_i)$, $I_1^y = I_1^y(x_i, y_i)$, $I_2^x = I_2^x(x'_i, y'_i)$ and $I_2^y = I_2^y(x'_i, y'_i)$, with $I_1^x(x_i, y_i)$, $I_1^y(x_i, y_i)$ being the gradients of the reference image and $I_2^x(x'_i, y'_i)$, $I_2^y(x'_i, y'_i)$ the gradients of the test image.
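The per-pixel terms of equation 5 can be checked numerically with a short Python sketch. It assumes the affine parametrization $x'_i = a_1 x_i + b_1 y_i + c_1$, $y'_i = a_2 x_i + b_2 y_i + c_2$ with parameter order $(a_1, b_1, c_1, a_2, b_2, c_2)$, which is inferred from the form of $B_i$ and $A_i$ rather than stated in the text, and the convention $E_i = -F_i$. The analytic test "images" and all function names are hypothetical; real images would need interpolation and numerical gradients instead.

```python
import math


def affine_warp(x, y, params):
    """Map (x, y) to (x', y') under the assumed affine model."""
    a1, b1, c1, a2, b2, c2 = params
    return a1 * x + b1 * y + c1, a2 * x + b2 * y + c2


def affine_gls_rows(x, y, params, I1, grad_I1, I2, grad_I2):
    """Return (A_i, B_i, E_i) for one pixel, with F_i = I1(x, y) - I2(x', y')."""
    a1, b1, c1, a2, b2, c2 = params
    xp, yp = affine_warp(x, y, params)
    I1x, I1y = grad_I1(x, y)      # gradients of the reference image at (x, y)
    I2x, I2y = grad_I2(xp, yp)    # gradients of the test image at (x', y')
    A = [-x * I2x, -y * I2x, -I2x, -x * I2y, -y * I2y, -I2y]   # dF/dparams
    B = [I1x - a1 * I2x - a2 * I2y, I1y - b1 * I2x - b2 * I2y, 1.0]
    E = -(I1(x, y) - I2(xp, yp))  # assumed convention E_i = -F_i
    return A, B, E


# Smooth analytic stand-ins for the two images (hypothetical test data).
def I1(x, y):
    return math.sin(0.1 * x) + 0.05 * y


def grad_I1(x, y):
    return 0.1 * math.cos(0.1 * x), 0.05


def I2(x, y):
    return math.sin(0.1 * x) * math.cos(0.05 * y)


def grad_I2(x, y):
    return (0.1 * math.cos(0.1 * x) * math.cos(0.05 * y),
            -0.05 * math.sin(0.1 * x) * math.sin(0.05 * y))


params = [1.1, 0.05, 2.0, -0.03, 0.95, -1.0]
A, B, E = affine_gls_rows(10.0, 20.0, params, I1, grad_I1, I2, grad_I2)
```

Because the images here are closed-form functions, each entry of `A` and the first two entries of `B` can be compared against central finite differences of $F_i$, which is a convenient way to catch sign errors in the derivatives.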
3 ALGORITHM PROPOSED
Our proposed algorithm can be summarized in these four sequential steps:
1. Detection and description of interest points: The SIFT technique is applied to detect and describe the interest points in both images.
2. Matching of interest points: For each interest point in the first image, a K-NN search is performed to find the k closest interest points in the second image. At the end of this process, a set of point pairs is obtained.
3. First approximation using random sampling: A random sampling technique is used to obtain a good initial estimate of the motion parameters from the point pairs.
4. Final motion estimation using GLS: The GLS motion estimator is applied, using as observations all the pixels in the overlapped area, in order to reach a more accurate solution. The process finishes when ∆χ is close to 0, which usually happens within a few iterations.
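Step 3 does not specify which random sampling technique is used. As one possible reading, the Python sketch below uses a RANSAC-style scheme for the affine model: it repeatedly draws three point pairs, solves for the six parameters exactly, and keeps the model with the most inliers. The function names, the tolerance, the number of iterations, the synthetic point pairs and the parameter order $(a_1, b_1, c_1, a_2, b_2, c_2)$ are all assumptions made for this illustration.

```python
import random


def solve3(M, v):
    """Solve a 3x3 system by Cramer's rule; return None if (near) singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(M)
    if abs(d) < 1e-9:
        return None  # the three source points are (almost) collinear
    sol = []
    for k in range(3):
        Mk = [row[:k] + [v[r]] + row[k + 1:] for r, row in enumerate(M)]
        sol.append(det(Mk) / d)
    return sol


def affine_from_3_pairs(pairs):
    """Exact affine parameters (a1, b1, c1, a2, b2, c2) from three point pairs."""
    M = [[x, y, 1.0] for (x, y), _dst in pairs]
    row_x = solve3(M, [xp for _src, (xp, _yp) in pairs])
    row_y = solve3(M, [yp for _src, (_xp, yp) in pairs])
    if row_x is None or row_y is None:
        return None
    return row_x + row_y


def random_sampling_affine(pairs, iterations=200, tol=1.0, seed=0):
    """Keep the sampled affine model that explains the most point pairs."""
    rng = random.Random(seed)
    best_params, best_inliers = None, -1
    for _ in range(iterations):
        params = affine_from_3_pairs(rng.sample(pairs, 3))
        if params is None:
            continue
        a1, b1, c1, a2, b2, c2 = params
        inliers = 0
        for (x, y), (xp, yp) in pairs:
            ex = a1 * x + b1 * y + c1 - xp
            ey = a2 * x + b2 * y + c2 - yp
            if ex * ex + ey * ey <= tol * tol:
                inliers += 1
        if inliers > best_inliers:
            best_params, best_inliers = params, inliers
    return best_params, best_inliers


# Synthetic matches (hypothetical): 12 correct pairs plus 3 gross mismatches.
true_params = (1.2, -0.3, 5.0, 0.4, 0.9, -2.0)
pairs = [((x, y), (1.2 * x - 0.3 * y + 5.0, 0.4 * x + 0.9 * y - 2.0))
         for x in (0.0, 10.0, 20.0, 30.0) for y in (0.0, 15.0, 30.0)]
pairs += [((5.0, 5.0), (80.0, -40.0)),
          ((25.0, 10.0), (-60.0, 90.0)),
          ((12.0, 28.0), (3.0, 150.0))]
params, n_inliers = random_sampling_affine(pairs)
```

The estimate obtained this way only needs to be close enough to the true motion for the GLS refinement of step 4 to converge; a projective model would need four pairs per sample instead of three.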
4 EXPERIMENTS
In order to test our approach in Image Registration problems, a set of challenging image pairs has been selected. They can be downloaded from Oxford’s Visual Geometry Group web page (footnote 2). They present five types of changes between images in 8 different sets of images: blur (bikes and tree sets), illumination (leuven set), JPEG compression (ubc set), zoom+rotation (bark and boat sets), and viewpoint (graf and wall sets). To check the accuracy of the registration, the normalized correlation coefficient (NCC) similarity measure has been calculated using the pixels of the overlapped area of both images. The NCC gives values from -1.0 (low similarity) to 1.0 (high similarity). The NCC is expressed as follows, with $\mu_1$, $\mu_2$ being the average gray levels of both images in the overlapped area:
\[ NCC(I_1, I_2) = \frac{\sum_{(x_i, y_i)} \left[ (I_1 - \mu_1)(I_2 - \mu_2) \right]}{\sqrt{\sum_{(x_i, y_i)} (I_1 - \mu_1)^2 \ \sum_{(x_i, y_i)} (I_2 - \mu_2)^2}} \tag{6} \]
$I_1$ and $I_2$ have been introduced to simplify notation as: $I_1 = I_1(x_i, y_i)$, $I_2 = I_2(x'_i, y'_i)$.
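Equation 6 can be sketched in a few lines of Python. The function name `ncc` and the flat-list representation of the overlapped pixels are assumptions for this illustration; the second list is expected to hold the test image gray levels already sampled at the transformed points.

```python
import math


def ncc(values_1, values_2):
    """Normalized correlation coefficient of two equally long gray-level lists.

    values_1[i] is I1(x_i, y_i) and values_2[i] is I2(x'_i, y'_i) for the
    pixels of the overlapped area.
    """
    n = len(values_1)
    if n == 0 or n != len(values_2):
        raise ValueError("inputs must be non-empty and of equal length")
    mu1 = sum(values_1) / n
    mu2 = sum(values_2) / n
    num = sum((a - mu1) * (b - mu2) for a, b in zip(values_1, values_2))
    den = math.sqrt(sum((a - mu1) ** 2 for a in values_1)
                    * sum((b - mu2) ** 2 for b in values_2))
    if den == 0.0:
        raise ValueError("NCC is undefined for a constant region")
    return num / den


reference = [10.0, 20.0, 30.0, 40.0]
brighter = [2.0 * v + 5.0 for v in reference]   # gain and offset change only
inverted = [255.0 - v for v in reference]       # contrast inversion
high = ncc(reference, brighter)
low = ncc(reference, inverted)
```

Since the means are subtracted and the result is normalized, the measure is insensitive to a gain and offset change in gray level, which suits image sets with illumination differences.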
We have focused on showing the results for challenging situations such as the zoom+rotation and viewpoint sets of image pairs, in particular the boat, bark, graf and wall sets. The affine motion model has been used for images from the bark and boat sets, since there is no viewpoint change. The main difficulty of these sets is the presence of large rotations and changes of scale. The moderate and large viewpoint changes in the graf and wall sets make it necessary to use the projective motion model instead of the affine one.
Table 1 shows the average NCC calculated for the experiments performed with the images belonging to each set, after the initial estimation and after the final GLS estimation. In general, the feature-based technique provides a good but not excellent (in terms of accuracy) initial estimate of the motion parameters, which the GLS estimation then improves: the average NCC rises in all four sets, for example from 0.64 to 0.88 for graf.
Figures 1 and 2 show the results of the registration process for some of the most difficult pairs from the four studied sets. The dashed white line marks the boundary of the reference image (i.e. the first image of the pair).
2 http://www.robots.ox.ac.uk/~vgg/research/affine/index.html
VISAPP 2007 - International Conference on Computer Vision Theory and Applications
Table 1: Average NCC obtained for each set.
                           bark   boat   graf   wall
After initial estimation   0.84   0.81   0.64   0.85
After GLS estimation       0.95   0.91   0.88   0.92
Figure 1: Registration results for images (1,4) from the boat set.
Figure 2: Registration results for images (1,3) from the graf set.
In general, the proposed method is able to register all the images from the bark and boat sets, but it has difficulties with strong viewpoint transformations, as when registering the last images of the wall and graf sets, for instance images 1 and 5 of the graf set. The main problem in those cases is that there are not enough good feature point matches for the initial registration, because of the limitations of SIFT under strong viewpoint changes.
The registration of images from graf has an additional difficulty: the car changed its position between the two image captures (see the bottom-right corner of the images). Therefore, the NCC obtained is not as good as that obtained for the other images (see Table 1), since those pixels are also included in the calculation of the NCC. However, those pixels do not affect the accurate estimation of the motion parameters.
5 CONCLUSIONS
In this paper an image registration approach has been presented. It uses a feature-based method, which can cope with large changes in scale, rotation and viewpoint, combined with an accurate Generalized Least-Squares motion estimation technique, which takes the result of the feature-based method as initialization and refines the estimate of the motion parameters.
The proposed approach has been successfully tested on challenging real pairs of images with illumination changes, different blur levels, different JPEG compression, large changes of scale, large rotations and moderate viewpoint changes, obtaining high accuracy in the estimation of the motion parameters. However, some problems arise in the presence of very large viewpoint changes, because the interest point detector used is not viewpoint invariant. As part of our future work, we will introduce a viewpoint invariant point detector to overcome this shortcoming.
REFERENCES
Brown, L. G. (1992). A survey of image registration techniques. ACM Computing Surveys, 24(4):325–376.
Danuser, G. and Stricker, M. (1998). Parametric model fitting: From inlier characterization to outlier detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):263–280.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
Mikolajczyk, K. and Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10).
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and Gool, L. V. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2).
Szeliski, R. (2004). Image alignment and stitching: A tutorial. Technical Report MSR-TR-2004-92, Microsoft Research.