Camera Self-Calibration from Two Views with a Common Direction

Yingna Su^{1,2}, Xinnian Guo^{1,2} and Yang Shen

College of Information Engineering, Suqian University, Suqian, China

Suqian Key Laboratory of Visual Inspection and Intelligent Control, Suqian University, Suqian, China

Industrial Technology Research Institute, Suqian University, Suqian, China

Keywords:

Camera Self-Calibration, Gravity Direction, Homography Constraints, Principal Point Estimation.

Abstract:

Camera calibration is crucial for enabling accurate and robust visual perception. This paper addresses the challenge of recovering intrinsic camera parameters from two views of a planar surface, a problem that has received limited attention due to its inherent degeneracy. For cameras equipped with Inertial Measurement Units (IMUs), such as those in smartphones and drones, the cameras' y-axes can be aligned with the gravity direction, reducing the relative orientation to one degree of freedom (1-DoF). A key insight is that the ground plane is generally orthogonal to the gravity direction. Leveraging this ground plane constraint, the paper introduces new homography-based minimal solutions for camera self-calibration with a known gravity direction. We derive 2.5- and 3.5-point camera self-calibration algorithms for points in the ground plane that enable simultaneous estimation of the camera's focal length and principal point. The paper demonstrates the practicality and efficiency of these algorithms and compares them to existing state-of-the-art methods, confirming their reliability under various levels of noise and different camera configurations.

1 INTRODUCTION

In the field of computer vision, camera calibration plays a fundamental role in enabling accurate and robust visual perception. Planar structures are ubiquitous in man-made environments and are widely used in geometric model estimation tasks. Zhang (Zhang, 2000) employed a known planar target to derive a closed-form solution for the camera calibration problem. Fitzgibbon (Fitzgibbon, 2001) introduced a minimal solver for the estimation of two-view homography with consistent distortion. Kukelova et al. (Kukelova et al., 2015) assumed different distortions for the two cameras and formulated algorithms for estimating the corresponding homography and distortion parameters. Nonetheless, the challenge of recovering intrinsic camera parameters from two views of a planar surface has received limited attention, primarily because this configuration is degenerate for most algorithms (Nistér, 2004).

Recent research by Ding et al. (Ding et al., 2022) has demonstrated the feasibility of resolving this problem when the two views share a common direction. This finding is particularly relevant given the prevalence of smartphones, tablets, and camera systems in applications such as automobiles and unmanned aerial vehicles (UAVs), which commonly feature IMUs capable of measuring the gravity vector. Given an uncalibrated smart device, e.g., a smartphone, we can capture images together with the corresponding IMU data, which can be used to measure the gravity direction. As shown in (Kukelova et al., 2010; Guan et al., 2018), the angles between the axes of the camera and the IMU are usually 0°, 90° or 180°. In this case, the rotation between the camera and the IMU of a smart device is known without calibrating the camera against the IMU. We can align the y-axes of the cameras with the gravity direction, reducing the relative orientation to a 1-DoF rotation around the gravity direction (Fig. 1). A crucial insight is that the ground plane is generally orthogonal to the gravity direction. This assumption is fulfilled in many man-made environments and has been used successfully in many computer vision tasks (Dibene et al., 2023; Li et al., 2023). Leveraging this ground plane constraint, we propose new homography-based minimal solutions for camera self-calibration with a known gravity direction. The proposed framework is depicted in Fig. 2. The main contributions of this paper are:

(i) By exploiting the ground plane assumption, we show that the Euclidean homography matrix has special properties that allow us to derive new constraints and solve the homography-based camera self-calibration problem efficiently.

(ii) Based on the new homography-based constraints, we derive a 2.5-point algorithm for points in the ground plane to estimate the focal length of the camera.

(iii) Moreover, we propose a 3.5-point algorithm to estimate the focal length and principal point coordinates of the camera simultaneously.

Figure 1: The y-axis of the camera is orthogonal to the ground plane after being aligned with the gravity direction.

Su, Y., Guo, X. and Shen, Y.
Camera Self-Calibration from Two Views with a Common Direction.
DOI: 10.5220/0012438100003660
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2024) - Volume 3: VISAPP, pages 680-685
ISBN: 978-989-758-679-8; ISSN: 2184-4321

2 OUR APPROACH

2.1 Homography-Based Constraints

Suppose two image points $m = [u, v, 1]^\top$ and $m' = [u', v', 1]^\top$ are given for a point on a plane in 3D space with respect to two camera frames. The Euclidean homography matrix $H$ that transforms one into the other satisfies

$$\lambda K^{-1} m' = H K^{-1} m, \tag{1}$$

where $\lambda$ is a scaling factor and $K$ is the camera intrinsic matrix. The Euclidean homography matrix $H$ is related to the rotation matrix $R$, the translation vector $T$, the distance $d$ from the camera frame to the target plane, and the normal $N$ of the plane according to

$$H = R - \frac{1}{d} T N^\top. \tag{2}$$

Since the gravity direction can be calculated from the IMU data, without loss of generality, we can align the y-axes of the cameras with the gravity direction (Fig. 1). After alignment, the rotation between the two camera views reduces from 3-DoF to 1-DoF and can be represented as

$$R_y = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}. \tag{3}$$
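The alignment step can be illustrated with a small sketch. It assumes that the IMU gravity vector is already expressed in the camera frame, and it builds a rotation that maps this vector onto the y-axis using Rodrigues' formula. The function name `alignment_rotation` and the sample reading `g_cam` are hypothetical and not taken from the paper. Depending on the convention, the paper's alignment rotation may correspond to the transpose of the matrix returned here.

```python
import numpy as np

def alignment_rotation(gravity):
    """Rotation R_align with R_align @ (gravity / |gravity|) = [0, 1, 0]."""
    g = np.asarray(gravity, dtype=float)
    g = g / np.linalg.norm(g)
    y = np.array([0.0, 1.0, 0.0])
    axis = np.cross(g, y)
    s = np.linalg.norm(axis)   # sine of the angle between g and y
    c = float(g @ y)           # cosine of the same angle
    if s < 1e-12:
        # g is parallel to y (nothing to do) or anti-parallel (half turn about x)
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    k = axis / s
    k_hat = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
    # Rodrigues' rotation formula for a rotation about k by the angle between g and y
    return np.eye(3) + s * k_hat + (1.0 - c) * (k_hat @ k_hat)

g_cam = np.array([0.05, 0.98, -0.19])   # hypothetical accelerometer reading
R_align = alignment_rotation(g_cam)
g_aligned = R_align @ (g_cam / np.linalg.norm(g_cam))
```

After this step only the rotation about the y-axis, i.e. the angle θ in Eq. (3), remains unknown between the two aligned views.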

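Applying the alignment rotations to the normalized image points, Eq. (1) becomes

$$\lambda R_2^\top K^{-1} m' = H_y R_1^\top K^{-1} m, \tag{4}$$

with

$$H_y = R_y - t n^\top, \tag{5}$$

where $R_1$ and $R_2$ are the rotation matrices of the two cameras for the alignment, and $t = [t_x, t_y, t_z]^\top$ and $n$ are the translation and plane parameters after the alignment. Based on the assumption that the ground plane is orthogonal to the gravity direction, the plane normal $n$ is equal to $[0\ 1\ 0]^\top$ when the points lie in a horizontal plane. Then Eq. (5) can be formulated as

$$H_y = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} - \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} \cos\theta & -t_x & \sin\theta \\ 0 & 1 - t_y & 0 \\ -\sin\theta & -t_z & \cos\theta \end{bmatrix}. \tag{6}$$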

Obviously, $H_y$ obeys 4 constraints:

$$h_{21} = 0, \quad h_{23} = 0, \quad h_{11} - h_{33} = 0, \quad h_{13} + h_{31} = 0, \tag{7}$$

where $h_{ij}$ are the elements of the matrix $H_y$. These constraints allow us to derive minimal solutions for camera self-calibration more efficiently.
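As a quick numerical illustration of these constraints, the sketch below builds a homography of the form "rotation about the y-axis minus translation times the ground-plane normal [0, 1, 0]" and evaluates the four expressions that should vanish. The helper names and the sample values of θ and t are hypothetical choices for illustration only.

```python
import numpy as np

def rot_y(theta):
    """Rotation about the y-axis by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def ground_homography(theta, t):
    """Homography for a horizontal plane with normal n = [0, 1, 0]."""
    n = np.array([0.0, 1.0, 0.0])
    return rot_y(theta) - np.outer(t, n)

def constraint_residuals(H):
    """The four expressions h21, h23, h11 - h33, h13 + h31 (1-based indices)."""
    return np.array([H[1, 0], H[1, 2], H[0, 0] - H[2, 2], H[0, 2] + H[2, 0]])

H_y = ground_homography(0.4, np.array([0.3, 0.2, -0.5]))
res = constraint_residuals(H_y)   # all four entries vanish
```

Only the middle column of the matrix depends on the translation, which is why the four constraints hold for any t.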

2.2 Unknown Focal Length (2.5-point)

For most modern CCD and CMOS cameras, it is reasonable to assume unit aspect ratio and that the principal point coincides with the image center (Hartley and Li, 2012). In this case, the only unknown intrinsic camera parameter is the focal length $f$. We propose a 2.5-point algorithm for estimating $f$.

In general, Eq. (1) can be written as

$$\lambda m' = G m, \tag{8}$$

where $G$ transforms the image points. Given one point correspondence $(m, m')$, Eq. (8) can also be written as

$$\begin{bmatrix} 0 & 0 & 0 & -u & -v & -1 & v'u & v'v & v' \\ u & v & 1 & 0 & 0 & 0 & -u'u & -u'v & -u' \end{bmatrix} g = 0, \quad g = [g_1\ g_2\ \cdots\ g_9]^\top, \tag{9}$$

where $g_1, \ldots, g_9$ are the elements of the 2D homography matrix $G$. Each point correspondence gives two linearly independent constraints. By stacking the constraints for $\kappa$ point correspondences, Eq. (9) leads to a system of equations of the form

$$A g = 0, \tag{10}$$


Figure 2: The overall framework of the proposed method.

where $A$ is a $2\kappa \times 9$ matrix. Then $g$ and the 2D homography matrix $G$ can be found from the null space of $A$. With 2.5 point correspondences (note that we still need to use three point correspondences, but only need one equation from the last correspondence), the general solution of $g$ in Eq. (10) is a 4-dimensional null space which can be written as

$$g = \alpha g_a + \beta g_b + \gamma g_c + g_d, \tag{11}$$

where $g_a, \ldots, g_d$ are basis vectors of the null space and $\alpha, \beta, \gamma$ are the coefficients. Based on Eq. (4) and Eq. (8), the Euclidean homography matrix $H_y$ can be formulated as

$$H_y = R_2^\top K^{-1} G K R_1, \tag{12}$$

where $R_1$ and $R_2$ are the alignment rotations of the first and second views.
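The null-space construction can be sketched as follows: the two rows contributed by each correspondence are stacked, only five rows are kept for the 2.5-point case, and the last four right singular vectors from the SVD are taken as a basis. The matrix `G_true`, the sample points and the helper names are hypothetical test values, not data from the paper.

```python
import numpy as np

def dlt_rows(m, m_prime):
    """Two linear constraints on g = vec(G) (row-major) from one correspondence."""
    u, v = m
    up, vp = m_prime
    return np.array([
        [0, 0, 0, -u, -v, -1, vp * u, vp * v, vp],
        [u, v, 1, 0, 0, 0, -up * u, -up * v, -up],
    ], dtype=float)

def null_space_basis(A, dim):
    """Last `dim` right singular vectors of A, returned as columns."""
    _, _, Vt = np.linalg.svd(A)   # full SVD: Vt is 9 x 9
    return Vt[-dim:].T

# Hypothetical homography and three points, used only to create test data.
G_true = np.array([[1.0, 0.02, 5.0], [-0.01, 0.95, -3.0], [1e-4, 2e-4, 1.0]])
pts = np.array([[10.0, 20.0], [-35.0, 40.0], [60.0, -15.0]])
proj = np.column_stack([pts, np.ones(3)]) @ G_true.T
pts_prime = proj[:, :2] / proj[:, 2:3]

rows = np.vstack([dlt_rows(m, mp) for m, mp in zip(pts, pts_prime)])
A = rows[:5]                      # 2.5 points: drop one equation of the third point
N = null_space_basis(A, 4)        # 9 x 4 basis of the null space

# The true g must lie in the span of the basis.
g_true = G_true.ravel()
coeffs, *_ = np.linalg.lstsq(N, g_true, rcond=None)
span_residual = np.linalg.norm(N @ coeffs - g_true)
```

Fixing the coefficient of one basis vector to 1, as in Eq. (11), removes the overall scale ambiguity of $g$.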

Let $K = \mathrm{diag}(f, f, 1)$, so that $K^{-1} = \mathrm{diag}(1/f, 1/f, 1)$. Substituting Eq. (11) into Eq. (12), we can parameterize $H_y$ using $\{\alpha, \beta, \gamma, f\}$. Substituting this formulation into the constraints of Eq. (7), we obtain 4 polynomial equations in the 4 unknowns $\{\alpha, \beta, \gamma, f\}$:

$$a_i\, [1, \alpha, \beta, \gamma, f, \alpha f, \beta f, \gamma f, f^2, \alpha f^2, \beta f^2, \gamma f^2]^\top = 0, \tag{13}$$

where $\{a_i \mid i = 1, 2, 3, 4\}$ are coefficient vectors. The system of equations Eq. (13) can be solved using the Gröbner basis method (Cox et al., 2006). For more details about the Gröbner basis method and the polynomial eigenvalue solution we refer the reader to (Kukelova et al., 2012; Larsson et al., 2017b; Larsson et al., 2017a; Larsson et al., 2018). There are up to 4 real solutions. Solutions with negative $f$ can be discarded.
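To make the role of the constraints concrete, the following sketch solves a simplified variant of the problem. It is not the 2.5-point Gröbner basis solver described above: it assumes that the full 2D homography $G$ is already known (for example from four or more correspondences), that image coordinates are centered on the principal point, and that the cameras are tilted so that the constraint $h_{21} = 0$ actually depends on $f$. Under these assumptions, $f K^{-1} G K$ is a quadratic matrix polynomial in $f$, so $h_{21} = 0$ becomes a quadratic equation. All function names and numeric values are hypothetical.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def residuals(H):
    """Four constraint expressions on a scale-normalised homography."""
    H = H / np.linalg.norm(H)
    return np.array([H[1, 0], H[1, 2], H[0, 0] - H[2, 2], H[0, 2] + H[2, 0]])

def focal_from_G(G, R1, R2):
    # f * K^-1 G K = M0 + f*M1 + f^2*M2 for K = diag(f, f, 1)
    M0 = np.zeros((3, 3)); M0[:2, 2] = G[:2, 2]
    M1 = np.zeros((3, 3)); M1[:2, :2] = G[:2, :2]; M1[2, 2] = G[2, 2]
    M2 = np.zeros((3, 3)); M2[2, :2] = G[2, :2]
    # h21 = 0 gives a quadratic in f (coefficients ordered from f^2 down to 1)
    coeffs = [(R2.T @ M @ R1)[1, 0] for M in (M2, M1, M0)]
    roots = np.roots(coeffs)
    cands = [r.real for r in roots
             if abs(r.imag) < 1e-9 * max(1.0, abs(r)) and r.real > 0]

    def cost(f):
        H = R2.T @ (M0 + f * M1 + f * f * M2) @ R1
        return np.linalg.norm(residuals(H))

    # keep the positive real root that best satisfies all four constraints
    return min(cands, key=cost)

# Synthetic test case (hypothetical values).
f_true = 3442.0
K = np.diag([f_true, f_true, 1.0])
R1 = rot_x(0.3) @ rot_z(0.1)
R2 = rot_x(-0.2) @ rot_z(0.25)
H_y = rot_y(0.4) - np.outer([0.3, 0.2, -0.5], [0.0, 1.0, 0.0])
G = K @ R2 @ H_y @ R1.T @ np.linalg.inv(K)
f_est = focal_from_G(G, R1, R2)
```

The minimal solver avoids the need for a full homography by keeping the null-space coefficients as additional unknowns, which is what leads to the polynomial system of Eq. (13).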

2.3 Unknown Focal Length and Principal Point (3.5-point)

However, the principal point may not always coincide with the image center. In this case, the unknown camera intrinsic parameters are the focal length $f$ and the principal point $(u_0, v_0)$. Let

$$K = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \quad K^{-1} = \begin{bmatrix} 1/f & 0 & -u_0/f \\ 0 & 1/f & -v_0/f \\ 0 & 0 & 1 \end{bmatrix}. \tag{14}$$

We derive a 3.5-point algorithm to estimate the camera intrinsic parameters. With 3.5 point correspondences, the general solution of $g$ in Eq. (10) is a 2-dimensional null space which can be written as

$$g = \alpha g_a + g_b. \tag{15}$$

Substituting Eq. (14) and Eq. (15) into Eq. (12), we can parameterize $H_y$ using $\{\alpha, f, u_0, v_0\}$. Substituting this formulation into the constraints of Eq. (7), we obtain 4 polynomial equations in the 4 unknowns $\{\alpha, f, u_0, v_0\}$:

$$b_i\, [1, \alpha, f, u_0, v_0, \alpha f, \alpha u_0, \alpha v_0, f u_0, f v_0, \cdots, \alpha f u_0, \alpha f v_0, \alpha u_0 v_0, f^2, \alpha f^2, \alpha u_0^2, \alpha v_0^2]^\top = 0, \tag{16}$$

where $\{b_i \mid i = 1, 2, 3, 4\}$ are coefficient vectors. Here we recommend using an automatic generator to solve the system of polynomial equations, e.g., (Larsson et al., 2017a). We obtain a Gauss-Jordan elimination template of size 79 × 91, and there are up to 12 real solutions.
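The effect of the principal point on the constraints can be checked numerically. The sketch below evaluates the four constraint expressions for $H_y = R_2^\top K^{-1} G K R_1$ with a full intrinsic matrix, once with the true principal point and once with a wrongly assumed one. It only evaluates residuals and is not the 3.5-point solver itself; the function names, alignment rotations and motion parameters are hypothetical.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def intrinsics(f, u0, v0):
    return np.array([[f, 0.0, u0], [0.0, f, v0], [0.0, 0.0, 1.0]])

def constraint_residuals(G, K, R1, R2):
    """Constraint expressions for H_y = R2^T K^-1 G K R1, scaled to unit norm."""
    H = R2.T @ np.linalg.inv(K) @ G @ K @ R1
    H = H / np.linalg.norm(H)
    return np.array([H[1, 0], H[1, 2], H[0, 0] - H[2, 2], H[0, 2] + H[2, 0]])

# Synthetic homography generated with an off-center principal point.
K_true = intrinsics(3442.0, 2016.0, 1512.0)
R1 = rot_x(0.3) @ rot_z(0.1)
R2 = rot_x(-0.2) @ rot_z(0.25)
H_y = rot_y(0.4) - np.outer([0.3, 0.2, -0.5], [0.0, 1.0, 0.0])
G = K_true @ R2 @ H_y @ R1.T @ np.linalg.inv(K_true)

res_true = constraint_residuals(G, K_true, R1, R2)
# Same focal length, but principal point assumed at the center of a 3225 x 2419 image.
res_center = constraint_residuals(G, intrinsics(3442.0, 1612.5, 1209.5), R1, R2)
```

With the true intrinsics the residuals vanish up to rounding, whereas a wrongly assumed principal point leaves a clearly nonzero residual, which is the motivation for estimating $(u_0, v_0)$ jointly with $f$.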

3 EXPERIMENTS

We use the following setup to generate synthetic data for the self-calibration evaluation. It contains two image sequences. The simulated cameras of both sequences have the same parameters: the focal length $f_g$ of the camera is set to 3442 pixels, and the principal point $(u_g, v_g)$ is set to (2016, 1512). The image resolution of the first sequence is 4032 × 3024, i.e., the principal point of the camera coincides with the image center. The image resolution of the second sequence is 3225 × 2419, so the principal point is not located at the center of the image. 100 3D points are distributed on the ground plane, which is orthogonal to the image plane of the first view. Each 3D point is observed by two camera views to generate an image pair. This is similar to (Fraundorfer et al., 2010; Saurer et al., 2017; Ding et al., 2022). We generate 1,000 pairs of images for each sequence to evaluate

the performance.

Figure 3: Histograms of the relative focal length error $E_f$ distribution for 10,000 runs with the first and the second image sequences, respectively. (a) the first sequence; (b) the second sequence.

The relative focal length error is defined as

$$E_f = |f_e - f_g| / f_g, \tag{17}$$

where $f_e$ denotes the estimated focal length and $f_g$ the ground truth. The principal point error is formulated as

(

− u

)

) ∗ (

− v

)

), (18)

where $(u_e, v_e)$ denotes the estimated coordinates of the principal point and $(u_g, v_g)$ is the true one.

3.1 Numerical Precision

Figure 3 shows the histograms of the relative focal length error $E_f$ of the proposed algorithms for 10,000 runs with the first and the second image sequences, respectively. '2.5pt' denotes the 2.5-point algorithm using the Gröbner basis solution. '3.5pt' denotes the 3.5-point algorithm using the Gauss-Jordan elimination template of size 79 × 91. The error distribution in Fig. 3(a) shows that the 2.5-point algorithm performs as expected for focal length estimation when the principal point of the camera coincides with the image center. The 3.5-point algorithm is less stable than the 2.5-point algorithm, but it does not produce large errors and is sufficient for real applications. As shown in Fig. 3(b), the 3.5-point algorithm is more reliable than the 2.5-point algorithm when the principal point of the camera is not located at the center of the image. Figure 4 shows the principal point error of the 3.5-point algorithm for 10,000 runs with the first and the second image sequences, respectively. As shown, our method estimates the principal point of the camera robustly on both image sequences.

Figure 4: Histograms of the principal point error $E_p$ distribution for 10,000 runs with the first and the second image sequences, respectively. (a) the first sequence; (b) the second sequence.

3.2 Stability of the Solutions Compared to Other Methods

In this section we compare the proposed methods with the state of the art. '6pt' denotes the two-view 6-point algorithm proposed in (Kukelova et al., 2017). '4pt' denotes the 4-point homography-based algorithm proposed in (Ding et al., 2022). Because the 2.5- and 3.5-point algorithms still need to sample 3 and 4 points in practice, we use SVD to compute the null space with 3 and 4 points, respectively. These algorithms are evaluated under increasing levels of image noise (point location) from 0 to 1 pixel. In addition, the gravity direction measured by the accelerometers is not perfect in real environments. Thus we also simulate the noisy case with increasing roll and pitch noise (gravity direction) and constant image noise of 0.5 pixel standard deviation. The maximum standard deviation of the roll and pitch noise is set to 0.5°, because smartphone IMUs typically have noise of less than 0.5° (Sweeney et al., 2014). Note that for our algorithms we use the noisy roll and pitch angles to compute the full rotation.
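The two noise sources can be simulated along the following lines. This is a minimal sketch under stated assumptions: image noise is zero-mean Gaussian on the pixel coordinates, roll is modeled as a rotation about the camera z-axis and pitch as a rotation about the camera x-axis, and the perturbation is applied on the right of the alignment rotation. These conventions and the function names are assumptions of the sketch and are not specified in the text.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def add_image_noise(points, sigma_px, rng):
    """Add zero-mean Gaussian noise (standard deviation in pixels) to N x 2 points."""
    return points + rng.normal(0.0, sigma_px, size=points.shape)

def perturb_alignment(R, roll_sigma_deg, pitch_sigma_deg, rng):
    """Perturb an alignment rotation by small random roll and pitch angles."""
    roll = np.deg2rad(rng.normal(0.0, roll_sigma_deg))
    pitch = np.deg2rad(rng.normal(0.0, pitch_sigma_deg))
    return R @ rot_z(roll) @ rot_x(pitch)

rng = np.random.default_rng(0)
pts = np.array([[100.0, 200.0], [300.0, 400.0]])
pts_noisy = add_image_noise(pts, 0.5, rng)           # 0.5 pixel image noise
R_noisy = perturb_alignment(np.eye(3), 0.5, 0.5, rng)  # 0.5 degree roll and pitch noise
```

Setting one of the two angular standard deviations to zero reproduces the separate roll-only and pitch-only experiments.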

Figure 5 shows the median focal length error for the first image sequence (first row) and the second image sequence (second row) with increasing image noise (left column), roll noise (middle column) and pitch noise (right column), respectively. As expected, the proposed 2.5-point algorithm performs better than the other methods under perfect IMU data, and the 3.5-point algorithm also achieves promising results for focal length estimation when the principal point of the camera is located at the center of the image (Fig. 5(a)). Figure 5(b) shows that the proposed 3.5-point algorithm is more accurate than the other three methods when the principal point of the camera does not coincide with the image center. The 6pt algorithm is not influenced by the roll and pitch noise because it does not need IMU data. Overall, the proposed 2.5- and 3.5-point algorithms are slightly better than the other methods at focal length estimation when the principal point of the camera does and does not coincide with the image center, respectively.


Figure 5: Boxplot of relative focal length error. (a) the first image sequence; (b) the second image sequence. The results of the first column are with increasing image noise from 0 to 1 pixel. The results of the second column are with increasing roll noise from 0 to 0.5° and constant image noise of 0.5 pixel. The results of the last column are with increasing pitch noise from 0 to 0.5° and constant image noise of 0.5 pixel.

Table 1: The principal point error $E_p$ of the 3.5-point algorithm with the synthetic data for 10,000 runs under both image sequences.

Noise type            Level   the first sequence           the second sequence
                              mean          median         mean          median
Image noise (pixel)   0       1.153e-09     1.8685e-13     1.2959e-09    1.8688e-13
                      0.5     0.0396        0.012          0.0405        0.0119
                      1.0     0.0704        0.0241         0.0692        0.0237
Roll noise (°)        0.1     0.0496        0.0193         0.0462        0.0187
                      0.3     0.1060        0.0457         0.1134        0.0534
                      0.5     0.1503        0.0666         0.1537        0.0782
Pitch noise (°)       0.1     0.0471        0.0192         0.0460        0.0184
                      0.3     0.0705        0.0305         0.0687        0.0295
                      0.5     0.0919        0.0438         0.0877        0.0433

3.3 Evaluation of the Principal Point

To the best of our knowledge, no homography-based two-view method has been proposed to estimate the principal point, so we report the statistical results of our method without comparison to other methods. Table 1 gives the principal point error $E_p$ of the 3.5-point algorithm with the synthetic data for 10,000 runs under both image sequences. As before, we evaluate the 3.5-point algorithm under increasing levels of image noise and of roll and pitch noise. The third and fourth columns show the results for the first image sequence (the principal point of the camera coincides with the image center). The fifth and sixth columns give the results for the second image sequence (the principal point is not located at the image center). The third to fifth rows show the principal point error $E_p$ with increasing image noise from 0 to 1 pixel. The sixth to eighth rows show the results with roll noise increasing up to 0.5° and constant image noise of 0.5 pixel standard deviation. The last three rows show the results with pitch noise increasing up to 0.5° and constant image noise of 0.5 pixel. As shown, the mean error stays below 0.16 and the median error below 0.08 in all tested noise cases and for both sequences.

In general, based on the simulation experiments, we find that the proposed 2.5- and 3.5-point algorithms are comparable to the existing methods for focal length and principal point estimation under different noise cases. To the best of our knowledge, good IMUs today can have noise levels of around 0.06 degrees in the computed angles (Fraundorfer et al., 2010). In this case, our algorithms are practical and can be used as alternatives in camera self-calibration pipelines for smartphones and tablets.


4 CONCLUSION

This paper proposes a self-calibration method for estimating the camera focal length and principal point based on the orthogonality assumption and homography constraints. Leveraging IMU data and the orthogonality between the ground plane and the gravity direction, new homography constraints are derived. Based on these constraints, 2.5-point and 3.5-point methods for estimating the camera focal length and principal point are presented. Thanks to the simplified constraints, the proposed algorithms perform slightly better than alternative approaches in our simulations while remaining efficient. We believe that the proposed method can serve as an alternative algorithm for camera self-calibration in intelligent vehicle applications, further enhancing the performance of intelligent vehicle systems.

ACKNOWLEDGEMENTS

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work is supported by the Suqian Science and Technology Plan Project (Nos. K202233, K202229, K202231, H202117) and the Suqian Natural Science Foundation (No. M202305).

REFERENCES

Cox, D. A., Little, J., and O'Shea, D. (2006). Using algebraic geometry, volume 185. Springer Science & Business Media.

Dibene, J. C., Min, Z., and Dunn, E. (2023). General planar motion from a pair of 3d correspondences. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8060–8070.

Ding, Y., Barath, D., Yang, J., and Kukelova, Z. (2022). Relative pose from a calibrated and an uncalibrated smartphone image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12766–12775.

Fitzgibbon, A. (2001). Simultaneous linear estimation of multiple view geometry and lens distortion. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

Fraundorfer, F., Tanskanen, P., and Pollefeys, M. (2010). A minimal case solution to the calibrated relative pose problem for the case of two known orientation angles. In The European Conference on Computer Vision (ECCV).

Guan, B., Yu, Q., and Fraundorfer, F. (2018). Minimal solutions for the rotational alignment of imu-camera systems using homography constraints. Computer vision and image understanding.

Hartley, R. and Li, H. (2012). An efficient hidden variable approach to minimal-case camera motion estimation. IEEE transactions on pattern analysis and machine intelligence.

Kukelova, Z., Bujnak, M., and Pajdla, T. (2010). Closed-form solutions to minimal absolute pose problems with known vertical direction. In Asian Conference on Computer Vision.

Kukelova, Z., Bujnak, M., and Pajdla, T. (2012). Polynomial eigenvalue solutions to minimal problems in computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Kukelova, Z., Heller, J., Bujnak, M., and Pajdla, T. (2015). Radial distortion homography. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Kukelova, Z., Kileel, J., Sturmfels, B., and Pajdla, T. (2017). A clever elimination strategy for efficient minimal solvers. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Larsson, V., Åström, K., and Oskarsson, M. (2017a). Efficient solvers for minimal problems by syzygy-based reduction. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Larsson, V., Åström, K., and Oskarsson, M. (2017b). Polynomial solvers for saturated ideals. In The IEEE International Conference on Computer Vision (ICCV).

Larsson, V., Oskarsson, M., Åström, K., Wallis, A., Kukelova, Z., and Pajdla, T. (2018). Beyond Gröbner bases: Basis selection for minimal solvers. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Li, H., Zhao, J., Bazin, J.-C., Kim, P., Joo, K., Zhao, Z., and Liu, Y.-H. (2023). Hong kong world: Leveraging structural regularity for line-based slam. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Nistér, D. (2004). An efficient solution to the five-point relative pose problem. IEEE transactions on pattern analysis and machine intelligence.

Saurer, O., Vasseur, P., Boutteau, R., Demonceaux, C., Pollefeys, M., and Fraundorfer, F. (2017). Homography based egomotion estimation with a common direction. IEEE transactions on pattern analysis and machine intelligence.

Sweeney, C., Flynn, J., and Turk, M. (2014). Solving for relative pose with a partially known rotation is a quadratic eigenvalue problem. 3DV.

Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE Transactions on pattern analysis and machine intelligence, 22.
