P-HAF: Homography Estimation using Partial Local Affine Frames

Daniel Barath

Machine Perception Research Laboratory, MTA SZTAKI, Kende utca 13-17, Budapest, Hungary

barath.daniel@sztaki.mta.hu

Keywords:

Homography, Minimal Problem, Local Affine Transformation, Stereo Vision.

Abstract:

We propose an algorithm, called P-HAF, to estimate planar homographies using partially known local affine transformations. This general theory is able to exploit the affine components obtained by the commonly used partially affine covariant detectors, such as SIFT or SURF, in a real-time capable way. As a minimal solver, P-HAF can estimate the homography using two SIFT correspondences; moreover, it can deal with any number of point pairs as an overdetermined system. It is validated on both synthesized and publicly available datasets that exploiting all information leads to more accurate estimates and makes multi-homography estimation less ambiguous.

1 INTRODUCTION

Estimating planar correspondences is a crucial part of several vision tasks, e.g. robot vision (Zhou and Li, 2006; Chen et al., 2006), camera calibration (Zhang, 2000; Ueshiba and Tomita, 2003; Chuan et al., 2003), 3D reconstruction (Zhang and Hanson, 1996; Werner and Zisserman, 2002) and augmented reality applications (Prince et al., 2002). Between two views, a planar correspondence is described by a 3 × 3 homography matrix which is a $\mathbb{P}^2 \to \mathbb{P}^2$ perspective transformation. Even though the most popular estimation techniques are based on point correspondences (Hartley and Zisserman, 2003), a homography is estimable from line (Hartley and Zisserman, 2003), region (Tanacs et al., 2014), contour (Jain and Jawahar, 2006), or affine correspondences (Köser, 2009; Chum and Matas, 2012; Barath and Hajder, 2016). Most of these algorithms include data normalization (Hartley and Zisserman, 2003), and numerical optimization to minimize the effect of the noise.

In this paper, we assume that not only the point locations but several affine components and the fundamental matrix are known¹. Local affinities are usually represented by elliptical features. We adopt the definition used in (Chum and Matas, 2012; Barath and Hajder, 2016) where a local affine transformation is defined as the first-order Taylor approximation of the related homography.

¹ The pre-estimation of the fundamental matrix for rigid scenes using point correspondences is a usual step in computer vision pipelines. The proposed theory is straightforward to generalize for multiple rigid motions.

Local affine transformations have become more popular in the last decade. Matas et al. (2002) showed that local affinities can support stereo matching. The 3D camera pose can also be estimated using a corresponding point pair and the related affinity, as proposed by Köser and Koch (2008). Staying on the topic of 3D reconstruction, these transformations can facilitate the recovery of spatial point coordinates (Köser, 2009). Current 3D reconstruction pipelines exploit point correspondences as well as patches (Furukawa and Ponce, 2010; Bódis-Szomorú et al., 2014; Raposo and Barreto, 2016) to compute realistic 3D models of real-world objects. Bentolila and Francos (2014) proved that affine transformations put constraints on the epipoles in stereo images. Barath et al. (2016b) showed that a one-to-one relationship exists between the surface normal and the local affinity.

Even though local afﬁne transformations are use-

ful and can signiﬁcantly improve the quality of the

estimation, it is time consuming to recover them –

e. g. by afﬁne covariant detectors which cannot be

applied in real time. Even so, most of the detec-

tors obtain some part of these local afﬁnities, such as

SIFT (Lowe, 1999) or SURF (Bay et al., 2006) recov-

ering the rotational and scale components. Therefore,

using solely the translation part (the point location)

causes information loss. The motivation of this re-

search is to formulate a general theory about the usage

of the afﬁne components obtained by partially afﬁne

Barath D.

P-HAF: Homography Estimation using Partial Local Afﬁne Frames.

DOI: 10.5220/0006130302270235

In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017), pages 227-235

ISBN: 978-989-758-227-1


covariant feature detectors. The main contributions:

1. A general theory to exploit the afﬁne components

obtained by partially afﬁne covariant detectors

which is real time capable.

2. The proposed method estimates the homography from two SIFT correspondences if the fundamental matrix is known. To our knowledge, the minimum number of required point pairs was previously three.

1.1 Theoretical Background

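Homography is defined in this paper as a mapping $\mathbb{P}^2 \to \mathbb{P}^2$ which maps each vector $\mathbf{p}_1^i = [u_1^i \; v_1^i \; 1]^T$ to its corresponding location $\mathbf{p}_2^i = [u_2^i \; v_2^i \; 1]^T$ as $[u_2^i \; v_2^i \; 1]^T \sim H [u_1^i \; v_1^i \; 1]^T$. The lower and upper indices denote the index of the current image ($\in \{1,2\}$) and the number of the feature point ($i \in [1,n]$), respectively.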

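Homography and Fundamental Matrix. The well-known relationship

$$ [\mathbf{e}]_\times H = F, $$

where

$$ F = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}, \quad H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} $$

and $\mathbf{e} = [e_x \; e_y \; 1]^T$ are the fundamental matrix, the homography and the epipole on the second image, respectively, decreases the degrees of freedom of the homography estimation to three, as shown in detail in (Barath and Hajder, 2016). As a consequence, homography $H$ is determined by its last row ($h_{31}$, $h_{32}$, and $h_{33}$) as follows:

$$ h_{1j} = e_x h_{3j} + f_{2j}, \quad h_{2j} = e_y h_{3j} - f_{1j}, \quad j \in \{1,2,3\}. \tag{1} $$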

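To form a three-point solver, called 3PT in the further sections, Eq. 1 is substituted into the system given by $\mathbf{p}_2^i \sim H \mathbf{p}_1^i$. The obtained Direct Linear Transform-like (DLT) system is

$$ \begin{aligned} (e_x u_1^i - u_1^i u_2^i)\, h_{31} + (e_x v_1^i - v_1^i u_2^i)\, h_{32} + (e_x - u_2^i)\, h_{33} &= -f_{21} u_1^i - f_{22} v_1^i - f_{23}, \\ (e_y u_1^i - u_1^i v_2^i)\, h_{31} + (e_y v_1^i - v_1^i v_2^i)\, h_{32} + (e_y - v_2^i)\, h_{33} &= f_{11} u_1^i + f_{12} v_1^i + f_{13}. \end{aligned} \tag{2} $$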

Note that one point pair yields only one independent equation: the fundamental matrix reduces the DoF of the correspondence problem to one, since the point pairs have to lie on the related epipolar lines. Thus three correspondences are enough to estimate the homography.

Homography estimation using three corre-

spondences and the fundamental matrix is well-

known (Hartley and Zisserman, 2003).
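As an illustration of Eq. 1, the following minimal Python sketch rebuilds the full homography from its last row, the fundamental matrix and the epipole, using the row-wise form of $[\mathbf{e}]_\times H = F$. The function names (`skew`, `matmul`, `homography_from_last_row`) and the nested-list matrix representation are assumptions made for this example only, and the epipole is assumed to be scaled so that its third coordinate is 1.

```python
def skew(e):
    """Cross-product matrix [e]_x of a 3-vector e."""
    ex, ey, ez = e
    return [[0.0, -ez, ey],
            [ez, 0.0, -ex],
            [-ey, ex, 0.0]]


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def homography_from_last_row(F, e, h3):
    """Rebuild H from its last row h3 (hypothetical helper).

    Rows follow from [e]_x H = F with e = (e_x, e_y, 1):
      h_1j = e_x * h_3j + f_2j
      h_2j = e_y * h_3j - f_1j
    """
    ex, ey = e[0], e[1]
    row1 = [ex * h3[j] + F[1][j] for j in range(3)]
    row2 = [ey * h3[j] - F[0][j] for j in range(3)]
    return [row1, row2, list(h3)]
```

Under these assumptions only three numbers (the last row) have to be estimated; the remaining six entries of the homography follow from the known epipolar geometry.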

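Local Affine Transformation regarding the $i$-th correspondence,

$$ A^i = \begin{bmatrix} a_{11}^i & a_{12}^i & a_{13}^i \\ a_{21}^i & a_{22}^i & a_{23}^i \end{bmatrix}, \tag{3} $$

is defined as the partial derivative of the related homography $H$ (Köser, 2009; Chum and Matas, 2012; Molnár and Chetverikov, 2014) as follows:

$$ \begin{aligned} a_{11}^i &= \frac{\partial u_2^i}{\partial u_1^i} = \frac{h_{11} - h_{31} u_2^i}{s}, & a_{12}^i &= \frac{\partial u_2^i}{\partial v_1^i} = \frac{h_{12} - h_{32} u_2^i}{s}, \\ a_{21}^i &= \frac{\partial v_2^i}{\partial u_1^i} = \frac{h_{21} - h_{31} v_2^i}{s}, & a_{22}^i &= \frac{\partial v_2^i}{\partial v_1^i} = \frac{h_{22} - h_{32} v_2^i}{s}, \end{aligned} \tag{4} $$

where $s = \mathbf{h}_3^T [u_1^i, v_1^i, 1]^T$ and $\mathbf{h}_j^T$ is the $j$-th row of $H$ ($j \in \{1, 2, 3\}$). These four parameters of $A^i$ are responsible for horizontal and vertical scales, shear and rotation. The last column of $A^i$ determines the translation as $a_{13}^i = u_2^i - u_1^i$ and $a_{23}^i = v_2^i - v_1^i$.

Homography $H$ defines the correspondence between the coordinates in the first ($u_1^i$ and $v_1^i$) and second ($u_2^i$ and $v_2^i$) images as

$$ u_2^i = \frac{\mathbf{h}_1^T [u_1^i, v_1^i, 1]^T}{s}, \quad v_2^i = \frac{\mathbf{h}_2^T [u_1^i, v_1^i, 1]^T}{s}. $$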
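The relations of Eq. 4 can be sketched in a few lines of Python. The snippet below is only an illustration: the function names `project` and `local_affinity`, and the use of nested lists for $H$, are assumptions of this example. It returns the 2 × 2 linear part of the local affinity (the partial derivatives of the projected point with respect to the source coordinates).

```python
def project(H, u1, v1):
    """Map (u1, v1) by homography H; also return the projective depth s."""
    s = H[2][0] * u1 + H[2][1] * v1 + H[2][2]
    u2 = (H[0][0] * u1 + H[0][1] * v1 + H[0][2]) / s
    v2 = (H[1][0] * u1 + H[1][1] * v1 + H[1][2]) / s
    return u2, v2, s


def local_affinity(H, u1, v1):
    """Linear part [[a11, a12], [a21, a22]] of the local affinity (Eq. 4)."""
    u2, v2, s = project(H, u1, v1)
    return [
        [(H[0][0] - H[2][0] * u2) / s, (H[0][1] - H[2][1] * u2) / s],
        [(H[1][0] - H[2][0] * v2) / s, (H[1][1] - H[2][1] * v2) / s],
    ]
```

A simple sanity check for such a routine is to compare its output with central finite differences of `project` around the same point.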

2 HOMOGRAPHY ESTIMATION FROM AFFINE TRANSFORMATION

In this section, we show that the problem becomes

much simpler if the epipolar geometry and local afﬁne

transformations are known.

2.1 Homography from Afﬁnities

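As shown in the Homography from Affine transformation and Fundamental matrix (HAF) method (Barath and Hajder, 2016), the estimation of a homography can be written in an inhomogeneous, linear form if at least one local affine transformation and the epipolar geometry are known. The coefficient matrix $C$ is as follows:

$$ C = \begin{bmatrix} a_{11}^i u_1^i + u_2^i - e_x & a_{11}^i v_1^i & a_{11}^i \\ a_{12}^i u_1^i & a_{12}^i v_1^i + u_2^i - e_x & a_{12}^i \\ a_{21}^i u_1^i + v_2^i - e_y & a_{21}^i v_1^i & a_{21}^i \\ a_{22}^i u_1^i & a_{22}^i v_1^i + v_2^i - e_y & a_{22}^i \end{bmatrix}. \tag{5} $$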

VISAPP 2017 - International Conference on Computer Vision Theory and Applications


The equation system can be formed as $C\mathbf{y} = \mathbf{d}$, where vector $\mathbf{d} = [f_{21}, f_{22}, -f_{11}, -f_{12}]^T$ is the inhomogeneous part while $\mathbf{y} = [h_{31}, h_{32}, h_{33}]^T$ is the vector of the unknown parameters. The optimal solution in the least squares sense is given by $\mathbf{y} = C^\dagger \mathbf{d}$, where $C^\dagger$ is the Moore-Penrose pseudo-inverse of matrix $C$. Note that augmenting this system with the equations of the point locations (Eqs. 2) leads to more robust estimation.
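To make the structure of the linear system concrete, the sketch below assembles the coefficient matrix of Eq. 5 and the inhomogeneous vector for a single affine correspondence. The names `haf_system` and `residuals`, and the argument layout (points as pairs, the 2 × 2 linear part of the affinity, the epipole as `(e_x, e_y)`), are assumptions of this example rather than an interface defined in the paper.

```python
def haf_system(p1, p2, A, F, e):
    """Build C (4x3) and d (4) of the system C y = d for one correspondence.

    p1, p2 : (u, v) in the first and second image
    A      : [[a11, a12], [a21, a22]], linear part of the local affinity
    F      : 3x3 fundamental matrix as nested lists
    e      : (e_x, e_y), epipole in the second image (third coordinate 1)
    """
    (u1, v1), (u2, v2), (ex, ey) = p1, p2, e
    (a11, a12), (a21, a22) = A
    C = [
        [a11 * u1 + u2 - ex, a11 * v1, a11],
        [a12 * u1, a12 * v1 + u2 - ex, a12],
        [a21 * u1 + v2 - ey, a21 * v1, a21],
        [a22 * u1, a22 * v1 + v2 - ey, a22],
    ]
    d = [F[1][0], F[1][1], -F[0][0], -F[0][1]]
    return C, d


def residuals(C, d, y):
    """Residual C y - d for a candidate last row y = (h31, h32, h33)."""
    return [sum(c * yi for c, yi in zip(row, y)) - di
            for row, di in zip(C, d)]
```

For noise-free data generated from a homography that satisfies $[\mathbf{e}]_\times H = F$, the residual of the true last row is zero up to rounding, which is a convenient check before plugging the system into a least-squares solver.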

2.2 Afﬁne Transformation Model

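Let us denote the affine transformation related to the $i$-th ($i \in [1,N]$) point pair without the translation part as follows:

$$ A^i = \begin{bmatrix} \cos(\alpha_i) & -\sin(\alpha_i) \\ \sin(\alpha_i) & \cos(\alpha_i) \end{bmatrix} \begin{bmatrix} s_{i,x} & w_i \\ 0 & s_{i,y} \end{bmatrix} = \begin{bmatrix} s_{i,x}\cos(\alpha_i) & w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i) \\ s_{i,x}\sin(\alpha_i) & w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i) \end{bmatrix}. \tag{6} $$

Variables $\alpha_i$, $s_{i,x}$, $s_{i,y}$, and $w_i$ are the rotational angle, the scales along the $x$ and $y$ axes, and the shear parameter, respectively.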
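The model of Eq. 6 can be written down and inverted directly. The following sketch is an illustration under the assumption that the horizontal scale is positive; the function names `compose_affinity` and `decompose_affinity` are hypothetical and not taken from the paper.

```python
import math


def compose_affinity(alpha, sx, sy, w):
    """Rotation by alpha times the upper-triangular [[sx, w], [0, sy]] (Eq. 6)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[sx * c, w * c - sy * s],
            [sx * s, w * s + sy * c]]


def decompose_affinity(A):
    """Recover (alpha, sx, sy, w) from a 2x2 affinity, assuming sx > 0."""
    (a11, a12), (a21, a22) = A
    alpha = math.atan2(a21, a11)   # direction of the first column
    sx = math.hypot(a11, a21)      # length of the first column
    c, s = math.cos(alpha), math.sin(alpha)
    w = c * a12 + s * a22          # entry (1,2) of R^T A
    sy = -s * a12 + c * a22        # entry (2,2) of R^T A
    return alpha, sx, sy, w
```

The first column of the matrix depends only on the angle and the horizontal scale, which is why a detector that provides only rotation and scale still constrains two of the four affine parameters.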

2.3 Homography from Partially Known Affine Transformation

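In this section, it is shown that the full local affinity is not necessary for homography estimation: its parts, obtained by e.g. SIFT or another partially affine covariant detector, can also be exploited. In the rest of this paper the proposed method is called P-HAF as the abbreviation of Partial HAF. Let us substitute Eqs. 6 into Eqs. 5 as

$$ h_{31}\left(s_{i,x}\cos(\alpha_i)\,u_1^i + u_2^i - e_x\right) + h_{32}\, s_{i,x}\cos(\alpha_i)\,v_1^i + h_{33}\, s_{i,x}\cos(\alpha_i) = f_{21}, \tag{7} $$

$$ h_{31}\left(w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i)\right)u_1^i + h_{32}\left(\left(w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i)\right)v_1^i + u_2^i - e_x\right) + h_{33}\left(w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i)\right) = f_{22}, \tag{8} $$

$$ h_{31}\left(s_{i,x}\sin(\alpha_i)\,u_1^i + v_2^i - e_y\right) + h_{32}\, s_{i,x}\sin(\alpha_i)\,v_1^i + h_{33}\, s_{i,x}\sin(\alpha_i) = -f_{11}, \tag{9} $$

$$ h_{31}\left(w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i)\right)u_1^i + h_{32}\left(\left(w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i)\right)v_1^i + v_2^i - e_y\right) + h_{33}\left(w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i)\right) = -f_{12}. \tag{10} $$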

These four equations contain the afﬁne transforma-

tion in an easy-to-handle form. For a given part of the

afﬁnity, e. g. rotation and scale, the appropriate equa-

tions can be selected and used. After the selection,

the given system is linear, inhomogeneous and can be

solved as in Section 2.1.

2.4 Specialization to SIFT Features

The popular SIFT (Lowe, 1999) detector obtains rotation and scale covariant features; therefore, the proposed theory can be specialized to use SIFT. Besides the point locations, the rotation and the scale are given for each feature point. After the matching process, the related parts of the local affine transformation are as follows:

$$ s = \frac{s_2}{s_1}, \quad \alpha = \alpha_2 - \alpha_1, $$

where $s_1$, $s_2$, $\alpha_1$, and $\alpha_2$ are the scales and angles on the two images, respectively. Here, we assume $s$ as horizontal scale, thus only Eqs. 7, 9 have to be kept.

Even though one SIFT correspondence yields three equations (one from the locations and two from the affinity), the two regarding the affine parts are linearly dependent. As a consequence, two SIFT correspondences are enough for homography estimation, and the system is already overdetermined.

For $N \geq 2$ point pairs, an overdetermined, inhomogeneous, linear system is formed.
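The SIFT specialization can be sketched end to end as follows. This is a minimal illustration, not the reference implementation: it stacks the two rows of Eqs. 2 with Eqs. 7 and 9 for every correspondence, solves the system through the normal equations (a simplification that coincides with the pseudo-inverse solution only when the coefficient matrix has full column rank), and then applies Eq. 1. All function names and the argument layout are assumptions of this example; `scale` and `angle` are the relative scale and rotation between the two matched features.

```python
import math


def sift_equations(p1, p2, scale, angle, F, e):
    """(coefficients, right-hand side) rows for one SIFT correspondence."""
    (u1, v1), (u2, v2), (ex, ey) = p1, p2, e
    a11 = scale * math.cos(angle)  # first column of the affinity model
    a21 = scale * math.sin(angle)
    return [
        # Eqs. 2: point location (dependent rows for noise-free data)
        ([(ex - u2) * u1, (ex - u2) * v1, ex - u2],
         -(F[1][0] * u1 + F[1][1] * v1 + F[1][2])),
        ([(ey - v2) * u1, (ey - v2) * v1, ey - v2],
         F[0][0] * u1 + F[0][1] * v1 + F[0][2]),
        # Eq. 7 and Eq. 9: rotation and scale
        ([a11 * u1 + u2 - ex, a11 * v1, a11], F[1][0]),
        ([a21 * u1 + v2 - ey, a21 * v1, a21], -F[0][0]),
    ]


def solve_least_squares(rows):
    """Solve the 3-unknown least-squares problem via normal equations."""
    N = [[sum(c[i] * c[j] for c, _ in rows) for j in range(3)]
         for i in range(3)]
    b = [sum(c[i] * d for c, d in rows) for i in range(3)]
    M = [N[i] + [b[i]] for i in range(3)]  # augmented matrix
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    y = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        y[i] = (M[i][3] - sum(M[i][k] * y[k] for k in range(i + 1, 3))) / M[i][i]
    return y


def p_haf_sift(points1, points2, scales, angles, F, e):
    """Estimate H from N >= 2 SIFT correspondences and known F, epipole e."""
    rows = []
    for p1, p2, s, a in zip(points1, points2, scales, angles):
        rows.extend(sift_equations(p1, p2, s, a, F, e))
    h3 = solve_least_squares(rows)
    ex, ey = e
    return [[ex * h3[j] + F[1][j] for j in range(3)],   # Eq. 1
            [ey * h3[j] - F[0][j] for j in range(3)],
            h3]
```

In this sketch the estimated homography inherits its scale from the supplied fundamental matrix, so a comparison with a ground-truth homography should be made up to scale unless both were generated consistently.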

2.5 Normalization

As is well known, normalization of the input data is a usual and important part of homography estimation (Hartley and Zisserman, 2003) due to numerical instability. Let us denote the normalization transformations by $T_1$ and $T_2$, where the normalized homography is calculated as $\hat{H} = T_2 H T_1^{-1}$. The transformation matrices $T_1$ and $T_2$ are special affine transformations: they consist of translation and scale. The horizontal and vertical scales of the two transformations are denoted by $l_{k,x}$ and $l_{k,y}$ ($k \in \{1,2\}$), respectively.

Normalization of Point Pairs and Fundamental Matrix. The normalized point pairs are calculated on the first and second images as $\hat{\mathbf{p}}_1^i = T_1 \mathbf{p}_1^i$ and $\hat{\mathbf{p}}_2^i = T_2 \mathbf{p}_2^i$, respectively. The normalization formula for the fundamental matrix is written (Hartley and Zisserman, 2003) as

$$ \hat{F} = T_2^{-T} F\, T_1^{-1}. \tag{11} $$

Normalization of the Afﬁne Transformation. The

normalizing transformation modiﬁes the basic equa-

tions written in Eqs. 4. For example,



+ h



ˆa

−

(12)


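where $l_{k,x} = T_{k,11}$ and $l_{k,y} = T_{k,22}$ ($k \in \{1,2\}$) are the horizontal and vertical scales of the $k$-th normalizing transformation, respectively. The left side of Eq. 12 is the multiplication of the projective depth and the first affine parameter in the normalized system. After elementary modification, it is straightforward to prove that the affine parameters are modified as

$$ \hat{a}_{11}^i = \frac{l_{2,x}}{l_{1,x}}\, a_{11}^i, \quad \hat{a}_{12}^i = \frac{l_{2,x}}{l_{1,y}}\, a_{12}^i, \quad \hat{a}_{21}^i = \frac{l_{2,y}}{l_{1,x}}\, a_{21}^i, \quad \hat{a}_{22}^i = \frac{l_{2,y}}{l_{1,y}}\, a_{22}^i. $$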

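The normalized affine transformations modify Eqs. 7–10 as

$$ h_{31}\left(\frac{l_{2,x}}{l_{1,x}}\, s_{i,x}\cos(\alpha_i)\,u_1^i + u_2^i - e_x\right) + h_{32}\,\frac{l_{2,x}}{l_{1,x}}\, s_{i,x}\cos(\alpha_i)\,v_1^i + h_{33}\,\frac{l_{2,x}}{l_{1,x}}\, s_{i,x}\cos(\alpha_i) = f_{21}, \tag{13} $$

$$ h_{31}\,\frac{l_{2,x}}{l_{1,y}}\left(w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i)\right)u_1^i + h_{32}\left(\frac{l_{2,x}}{l_{1,y}}\left(w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i)\right)v_1^i + u_2^i - e_x\right) + h_{33}\,\frac{l_{2,x}}{l_{1,y}}\left(w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i)\right) = f_{22}, \tag{14} $$

$$ h_{31}\left(\frac{l_{2,y}}{l_{1,x}}\, s_{i,x}\sin(\alpha_i)\,u_1^i + v_2^i - e_y\right) + h_{32}\,\frac{l_{2,y}}{l_{1,x}}\, s_{i,x}\sin(\alpha_i)\,v_1^i + h_{33}\,\frac{l_{2,y}}{l_{1,x}}\, s_{i,x}\sin(\alpha_i) = -f_{11}, \tag{15} $$

$$ h_{31}\,\frac{l_{2,y}}{l_{1,y}}\left(w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i)\right)u_1^i + h_{32}\left(\frac{l_{2,y}}{l_{1,y}}\left(w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i)\right)v_1^i + v_2^i - e_y\right) + h_{33}\,\frac{l_{2,y}}{l_{1,y}}\left(w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i)\right) = -f_{12}. \tag{16} $$

If the system is combined with Eqs. 2, an inhomogeneous, linear system of equations is obtained. Note that the normalized correspondences and $\hat{F}$ are used in Eqs. 13–16.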
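The effect of the normalization on the affine parameters is a simple rescaling, which the short sketch below illustrates. The function name `normalize_affinity` and the way the scales are passed (as pairs of horizontal and vertical scale for the first and second normalizing transformation) are assumptions of this example.

```python
def normalize_affinity(A, scales1, scales2):
    """Rescale the 2x2 linear part of an affinity for normalized coordinates.

    scales1 : (l_1x, l_1y), scales of the first normalizing transformation
    scales2 : (l_2x, l_2y), scales of the second normalizing transformation
    """
    (a11, a12), (a21, a22) = A
    l1x, l1y = scales1
    l2x, l2y = scales2
    return [[l2x / l1x * a11, l2x / l1y * a12],
            [l2y / l1x * a21, l2y / l1y * a22]]
```

The translation parts of the normalizing transformations do not appear, since the affinity is a derivative and constant offsets vanish under differentiation.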

2.6 Algorithmic Details

Alg. 1 shows the P-HAF algorithm specialized to SIFT features. The required input is a set of point correspondences P together with the related rotation R and scale S components of each correspondence. The output is the homography. Note that functions called with parameter (...) receive all the available variables.

2.7 Processing Time

Time Demand of the Algorithm. The processing

time of the proposed algorithm depends on the

solution of the inhomogeneous, linear system which

can be carried out via Moore-Penrose pseudo-

inverse. On a serial processor its time complexity is

O(m

) + O(r

) where m and r are the number of rows of the coefficient matrix C and its rank, respectively.

Algorithm 1: P-HAF for SIFT points.

Input:  P: points on the first and second images
        R, S: rotation and scale for each point pair
        F: fundamental matrix
Output: H: homography

1: P, R, S := Normalization(...);          // Sec. 2.5
2: C, b := BuildCoefficientMatrix(...);    // Eqs. 7, 9
3: x := C† b;                              // † is the Moore-Penrose pseudo-inverse
4: H := HomographyFromFundMat(x, F);       // Eq. 1

Remark that it is reduced to O(m) + O(r

) in paral-

lel computing (Courrieu, 2008). Therefore, P-HAF is

computable in a few milliseconds (see Table 2).

Time Demand of RANSAC. Augmenting

RANSAC (Fischler and Bolles, 1981) or other

robust statistics (Maronna et al., 2006) with P-HAF

signiﬁcantly reduces the iteration number, thus higher

processing speed is achieved. Table 1 reports the

required iteration number (Hartley and Zisserman,

2003) of RANSAC to converge using different

minimal methods (columns) as engine. Rows show

the ratio of the outliers.

Table 1: Required iteration number of RANSAC augmented with minimal methods (columns) with 95% probability on different outlier levels (rows).

                 # of required points
outlier ratio      2       3       4
50%               11      23      47
80%               74     373    1871

It can be seen that using two points leads to significantly fewer iterations, thus speeding up the process, especially for high outlier ratios.
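The entries of Table 1 are consistent with the standard RANSAC iteration bound $k = \lceil \log(1 - p) / \log(1 - (1 - \varepsilon)^n) \rceil$, where $p$ is the required confidence, $\varepsilon$ the outlier ratio and $n$ the size of the minimal sample. The small Python sketch below evaluates this bound; the function name `ransac_iterations` is an assumption of this example.

```python
import math


def ransac_iterations(outlier_ratio, sample_size, confidence=0.95):
    """Iterations needed to draw one all-inlier sample with given confidence."""
    all_inlier_prob = (1.0 - outlier_ratio) ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - all_inlier_prob))


for ratio in (0.5, 0.8):
    print(ratio, [ransac_iterations(ratio, n) for n in (2, 3, 4)])
# 0.5 [11, 23, 47]
# 0.8 [74, 373, 1871]
```

Because the sample size appears in the exponent, reducing it from four to two matters most when the outlier ratio is high.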

3 EXPERIMENTAL RESULTS

The aim of this section is to show that the proposed

theory works both on synthetic and real world data.

All algorithms end with a numerical refinement stage using the Levenberg-Marquardt optimization technique (Moré, 1978) to minimize the re-projection error.

Table 2: The processing time (in milliseconds) of normalized P-HAF, including normalization, implemented in Matlab and C++. The first row shows the time of P-HAF applied to a minimal subset (two correspondences). The second one reports the mean time on all pairs of the AdelaideRMF and Multi-H datasets. On average, P-HAF is applied to 27 SIFT point pairs as an overdetermined system.

             Matlab (ms)    C++ (ms)
2 points        0.336         0.005
N points        1.106         0.012


The competitor methods are the Direct Linear Trans-

formation (DLT) and Three Point Method (3PT) ap-

plied to normalized data.

3.1 Synthesized Tests

For synthesized testing, two perspective cameras are generated by their projection matrices $P_1$ and $P_2$. Their positions are randomized, using uniform distribution, on a plane represented by function $S(u,v) = [u \; v \; 60]^T$, ($u, v \in [-20, 20]$). Both cameras point towards the origin. Their common focal length and principal point are 600 and $[300 \; 300]^T$, respectively. Fundamental matrix $F$ is computed from projection matrices $P_1$ and $P_2$ (Hartley and Zisserman, 2003).

A plane passing through the origin is generated with random orientation and sampled in 50 different locations; these points are projected onto cameras $P_1$ and $P_2$. Zero-mean Gaussian noise is added to the point coordinates. The local affinity related to each point pair is calculated from the plane parameters (Barath and Hajder, 2016) and the noisy point locations, then decomposed into the form

$$ A^i = \begin{bmatrix} s_{i,x}\cos(\alpha_i) & w_i\cos(\alpha_i) - s_{i,y}\sin(\alpha_i) \\ s_{i,x}\sin(\alpha_i) & w_i\sin(\alpha_i) + s_{i,y}\cos(\alpha_i) \end{bmatrix}, $$

and angle $\alpha_i$ and scale $s_{i,x}$ are kept. Tests are repeated 500 times on every noise level.

Fig. 1(a) and Fig. 1(b) visualize the mean and me-

dian errors of the normalized DLT, 3PT and P-HAF

methods plotted as the function of the σ value of the

zero-mean Gaussian-noise. P-HAF achieves the low-

est mean and median errors. Fig. 1(c) shows the effect

of the normalization. Even though the difference is

not signiﬁcant, the normalized algorithm is the most

accurate estimator.

3.2 Homography Estimation

In order to test P-HAF on real world images Ade-

laideRMF (Wong et al., 2011) and Multi-H (Barath

et al., 2016a) datasets are used. They consist of

images of different sizes and point correspondences

assigned to planes. Figure 2 shows four exam-

ple images – the ﬁrst one from each stereo pair –

from the datasets. The left column is from Multi-

H, pairs boxesandbooks and glasscasea, and the

right one from AdelaideRMF – pairs elderhalla

and bonhall. Points are painted by circles and each

is assigned to a plane by color.

Annotations contain no information about the rotational or scale components; therefore, the SIFT detector is applied to each image pair. Then the closest detected feature is paired to every annotated one. If the distance is greater than 5 pixels, the point pair is omitted from the evaluation. The fundamental matrix F is estimated by the RANSAC eight-point technique (Hartley and Zisserman, 2003) with threshold value set to 1.0, followed by a Levenberg-Marquardt optimization minimizing the symmetric epipolar distance. Every homography is estimated using 25% of the correspondences; however, the reported re-projection errors are computed using all of them.

(a) Comparison of methods, mean error.
(b) Comparison of methods, median error.

Figure 1: Re-projection error (vertical axis) calculated from 500 tests on each noise level. Parameter σ of the zero-mean Gaussian noise added to the point coordinates is shown on the horizontal axis.

Table 3: The mean re-projection error (in pixels) of the methods applied to the AdelaideRMF and Multi-H datasets. Each row represents an image pair and each column consists of the re-projection errors of a method. Homographies are estimated using 25% of the correspondences; the re-projection error is computed w.r.t. all of them.

Test case        P-HAF     DLT     3PT
barrsmith        27.01   35.98   27.22
bonhall           0.97    0.82    0.99
bonython          1.33    1.35    1.35
boxesandbooks     2.06    8.46    2.12
elderhalla        3.26    3.50    3.20
elderhallb        4.73    5.34    5.21
glasscasea        7.85   26.76    9.63
glasscaseb        9.63   21.29    7.95
graffiti          0.92    1.01    0.96
hartley           1.98    1.61    2.01
johnssona        10.78   10.39   11.29
johnssonb         5.28    6.34    5.67
ladysymon         7.55    7.58    7.50
library           4.82    6.03    4.97
napiera          15.01   14.48   17.78
napierb          17.74   30.61   17.28
neem              4.31    5.44    5.64
nese              4.35    6.90    4.66
sene              4.07    7.88    4.73
unihouse          8.80    5.38    5.58
unionhouse        7.01    7.53    7.01
mean              7.11   10.22    7.27
median            4.82    6.90    5.58

In Table 3, the mean re-projection errors (in

pixels) are reported. Rows represent different test

pairs from the AdelaideRMF and Multi-H datasets,

columns show the related errors. It can be seen that

the mean errors of P-HAF and 3PT are quite simi-

lar, even so, P-HAF is slightly better. The median er-

ror of P-HAF is signiﬁcantly better than that of DLT

and 3PT. This is expected since DLT and 3PT use a

smaller part of the underlying afﬁne transformation –

the translation – while P-HAF exploits all the avail-

able information.

Figure 2: Example images from the image pairs of Multi-

H (left column) and AdelaideRMF (right column) datasets.

Points are marked by circles, planes are denoted by color.

3.3 Multiple Homography Fitting

One of the main advantages of P-HAF is the required minimal point number, as it is able to estimate a homography from only two SIFT correspondences. DLT needs four and 3PT three of those. Most of the robust

model ﬁtting techniques, e. g. RANSAC, are based on

minimum subsets consisting of the minimum number

of data to estimate a given model. Using as few data

as possible makes the estimation faster, less ambigu-

ous, and possibly more accurate.

In this section, a multi-model ﬁtting technique,

PEARL (Isack and Boykov, 2012), is augmented with

different model initialization methods: normalized

DLT and P-HAF. The same datasets are used as in

the previous experiments, AdelaideRMF and Multi-H. AdelaideRMF mainly consists of buildings while Multi-H contains smaller planar objects.

Fig. 3 shows the results of multi-homography fitting. Each row consists of the first image of a selected test pair. The left column shows the original image and the other ones show the planar labellings obtained by PEARL with different hypothesis generation techniques: normalized DLT (middle) or P-HAF (right). The same parameters are used for all the tests and the same number of hypotheses is generated. The reported misclassification error (ME) is the percentage of points assigned to a wrong plane. It can be seen that PEARL augmented with P-HAF is significantly more accurate than the one using normalized DLT.


(a) Test: neem. 1. Original image, 2. by DLT (ME = 29.46%), 3. by P-HAF (ME = 10.63%)

(b) Test: nese. 1. Original image, 2. by DLT (ME = 19.90%), 3. by P-HAF (ME = 13.78%)

(d) Test: napierb. 1. Original image, 2. by DLT (ME = 38.22%), 3. by P-HAF (ME = 23.17%)

Figure 3: The results of multiple homography fitting to point correspondences. Each row is the first image of a test pair from the AdelaideRMF dataset and the results of PEARL. Columns report the planar labellings obtained by the PEARL method with different hypothesis generation techniques: normalized DLT or P-HAF. The same parameters are used for all the tests and the same number of hypotheses is generated. The reported misclassification error (ME) is the percentage of points assigned to a wrong plane. Points are painted by circles and planes are marked by color.

4 CONCLUSION

A novel minimal method is presented in this paper to improve the general point-based homography estimation by exploiting the information yielded by the commonly used feature detectors. The proposed P-HAF method is able to estimate the homography using at least two SIFT correspondences and is applicable in real time. The main message of this paper is that there is usually more information about the underlying homography than only the point coordinates; e.g. SIFT and SURF obtain the rotational component and the scale as well. Neglecting this information yields information loss. We see no reason to use the four-point algorithm instead of P-HAF for rigid scenes if SIFT or SURF features are given.

REFERENCES

Barath, D. and Hajder, L. (2016). Novel ways to estimate homography from local affine transformations. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP, pages 432–443.

Barath, D., Hajder, L., and Matas, J. (2016a). Multi-h: Ef-

ﬁcient recovery of tangent planes in stereo images. In

BMVC 2016, 27th British Machine Vision Conference,

19-22 September, York, England, volume 28, page 32.

Barath, D., Molnar, J., and Hajder, L. (2016b). Novel meth-

ods for estimating surface normals from afﬁne trans-

formations. In Computer Vision, Imaging and Com-

puter Graphics Theory and Applications, pages 316–

337. Springer International Publishing.

Bay, H., Tuytelaars, T., and Van Gool, L. (2006). Surf:

Speeded up robust features. In European conference

on computer vision, pages 404–417. Springer.

Bentolila, J. and Francos, J. M. (2014). Conic epipolar con-

straints from afﬁne correspondences. Computer Vision

and Image Understanding, 122:105–114.

Bódis-Szomorú, A., Riemenschneider, H., and Gool, L. V. (2014). Fast, approximate piecewise-planar modeling based on sparse structure-from-motion and superpixels. In IEEE Conference on Computer Vision and Pattern Recognition.

Chen, J., Dixon, W. E., Dawson, D. M., and McIntyre, M.

(2006). Homography-based visual servo tracking con-

trol of a wheeled mobile robot. Robotics, IEEE Trans-

actions on, 22(2):406–415.

Chuan, Z., Long, T. D., Feng, Z., and Li, D. Z. (2003). A

planar homography estimation method for camera cal-

ibration. In Computational Intelligence in Robotics

and Automation, 2003. Proceedings. 2003 IEEE In-

ternational Symposium on, volume 1, pages 424–429.

IEEE.

Chum, O. and Matas, J. (2012). Homography estimation

from correspondences of local elliptical features. In

Pattern Recognition (ICPR), 2012 21st International

Conference on, pages 3236–3239. IEEE.

Courrieu, P. (2008). Fast computation of moore-penrose

inverse matrices. arXiv preprint arXiv:0804.4809.

Fischler, M. A. and Bolles, R. C. (1981). Random sample

consensus: a paradigm for model ﬁtting with appli-

cations to image analysis and automated cartography.

Communications of the ACM, 24(6):381–395.

Furukawa, Y. and Ponce, J. (2010). Accurate, dense, and

robust multi-view stereopsis. IEEE Trans. on Pattern

Analysis and Machine Intelligence, 32(8):1362–1376.

Hartley, R. I. and Zisserman, A. (2003). Multiple View Ge-

ometry in Computer Vision. Cambridge University

Press.

Isack, H. and Boykov, Y. (2012). Energy-based geometric

multi-model ﬁtting. International journal of computer

vision, 97(2):123–147.

Jain, P. K. and Jawahar, C. (2006). Homography estima-

tion from planar contours. In 3D Data Processing,

Visualization, and Transmission, Third International

Symposium on, pages 877–884. IEEE.

Köser, K. (2009). Geometric Estimation with Local Affine Frames and Free-form Surfaces. Shaker.

Köser, K. and Koch, R. (2008). Differential spatial resection - pose estimation using a single local image feature. In ECCV, pages 312–325.

Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on, volume 2, pages 1150–1157. IEEE.

Maronna, R., Martin, D., and Yohai, V. (2006). Robust

statistics. John Wiley & Sons, Chichester. ISBN.

Matas, J., Obdržálek, S., and Chum, O. (2002). Local affine frames for wide-baseline stereo. In ICPR, Quebec, Canada, August 11-15, 2002, pages 363–366.

Molnár, J. and Chetverikov, D. (2014). Quadratic transformation for planar mapping of implicit surfaces. Journal of Mathematical Imaging and Vision, 48:176–184.

Moré, J. J. (1978). The Levenberg-Marquardt algorithm: implementation and theory. In Numerical analysis, pages 105–116. Springer.

Prince, S. J., Xu, K., and Cheok, A. D. (2002). Augmented

reality camera tracking with homographies. Computer

Graphics and Applications, IEEE, 22(6):39–45.

Raposo, C. and Barreto, J. P. (2016). Theory and practice of

structure-from-motion using afﬁne correspondences.

Tanacs, A., Majdik, A., Molnar, J., Rai, A., and Kato, Z. (2014). Establishing correspondences between planar image patches. In Digital Image Computing: Techniques and Applications (DICTA), 2014 International Conference on, pages 1–7. IEEE.

Ueshiba, T. and Tomita, F. (2003). Plane-based calibra-

tion algorithm for multi-camera systems via factor-

ization of homography matrices. In Computer Vision,

2003. Proceedings. Ninth IEEE International Confer-

ence on, pages 966–973. IEEE.

Werner, T. and Zisserman, A. (2002). New techniques for automated architectural reconstruction from photographs. In Computer Vision - ECCV 2002, pages 541–555. Springer.

Wong, H. S., Chin, T.-J., Yu, J., and Suter, D. (2011).

Dynamic and hierarchical multi-structure geometric

model ﬁtting. In International Conference on Com-

puter Vision (ICCV).

Zhang, Z. (2000). A ﬂexible new technique for camera cal-

ibration. IEEE Transactions on Pattern Analysis and

Machine Intelligence, 22(11):1330–1334.

Zhang, Z. and Hanson, A. R. (1996). 3d reconstruction

based on homography mapping. Proc. ARPA96, pages

1007–1012.


Zhou, J. and Li, B. (2006). Robust ground plane detection

with normalized homography in monocular sequences

from a robot platform. In Image Processing, 2006

IEEE International Conference on, pages 3017–3020.

IEEE.
