Line-based Registration of Photogrammetric Point Clouds with 3D City

Models by Means of Mixed Integer Linear Programming

Steffen Goebbels and Regina Pohle-Fr¨ohlich

Institute for Pattern Recognition, Faculty of Electrical Engineering and Computer Science,

Niederrhein University of Applied Sciences, Reinarzstr. 49, 47805 Krefeld, Germany

Keywords:

Point Cloud Registration, Linear Programming, Structure from Motion, Building Reconstruction, CityGML.

Abstract:

This paper describes a method to align photogrammetric point clouds with CityGML 3D city models. Amongst

others, we use photogrammetric point clouds that are generated from videos taken from the driver’s perspective

of a car. Clouds are computed with the Str ucture-from-Motion algorithm. We detect wall planes to rotate these

clouds so that walls become vertical. This allows us to ﬁnd buildings’ footprints by accumulating points that

are orthogonally projected to the ground. Thus, the main alignment step can be performed in 2D. To this end,

we match detected footprints with corresponding footprints of CityGML models in a x-y-plane based on line

segments. These line segments are detected using a probabilistic Hough transform. Then we apply a Mixed

Integer Linear Program to ﬁnd a maximum number of matching line segment pairs. Using a Linear Program,

we optimize a rigid afﬁne transformation to align the lines of these pairs. Finally, we use height information

along CityGML terrain intersection lines to estimate scaling and translation in z-direction. By combining the

results, we obtain an afﬁne mapping that aligns the point cloud wi th the city model. Linear Programming is

not widely applied to registration problems; however the technique presented is a fast alternative to Iterative

Closest Point algorithms t hat align photogrammetric point clouds with clouds sampled from cit y models.

1 INTRODUCTION

Virtual 3D city models are used for simulations (eg

noise maps, lighting models, solar poten tial analy-

zes, ﬂood maps, h e at requireme nt mapping) as well

as for planning purposes like Building Information

Modeling (BIM). CityGML is the XML b ased des-

cription standard for city mode ls. It offers a conc e pt

of Level of Detail (LoD), see (Gr¨oger et al., 2012)

Most models currently are given in LoD 2 with de-

ﬁned roof facets and walls but without detailed fa-

cade informatio n. Window and door objects belong

to LoD 3. Facade models do not only serve for visua-

lization or planning purp oses. For example, gateways

and routes for rescue workers are very interesting in

smart-city applications. Whereas airborne laserscan-

ning point clouds are provided as open da ta in o ur

country, they only show roofs but no facades. Facade

data have to be acquired individually. An inexpensive

way to obtain photog rammetric point clou ds of faca-

des from videos or overlapping photos is to use the

Structure-from-Motion algorithm (SfM).

We generate dense point clouds from videos taken

either through the fron t window of a car or b y UAVs

with the SfM tool Agisoft Photoscan, see Figure 1.

The tool delivers a c loud of colored po ints, estima -

ted camera position and camer a parameters for video

frames, and a textur e d mesh.

It is difﬁcult to do a manual r egistration of pho-

togrammetric point clouds with sufﬁcient precision to

cleanly project points or textured meshes to CityGML

walls. Therefore , we apply an automatic precise re-

gistration of the point cloud with the city model. As

a p reliminary, the cloud has to be coarsely registe-

red with UTM coo rdinates, either manually or using

GPS/GLONASS informatio n. The next section dis-

cusses some approaches to do such a precise registra-

tion. In Sections 3–5 we in detail descr ibe ou r ap-

proach to match line segments by solving a combi-

natorial proble m with fast Linear and Mixed Integer

Linear Program s. Section 6 summarizes results.

2 VARIOUS REGISTRATION

APPROACHES

There exists a large variety of point cloud registration

algorithm s, see (Maiseli et al., 2 017). Most of them

can b e classiﬁed as non-feature- or feature-based. Ho-

Goebbels, S. and Pohle-Fröhlich, R.

Line-based Registration of Photogrammetric Point Clouds with 3D City Models by Means of Mixed Integer Linear Programming.

DOI: 10.5220/0006533002990306

In Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018) - Volume 4: VISAPP, pages

299-306

ISBN: 978-989-758-290-5

299

Figure 1: SfM point cloud, and cloud sampled from Ci-

tyGML walls and merged with terrain points from airborne

laserscanning.

wever, our input consist of only one (colored) point

cloud. The city model basically is a set of poly-

gons. The easiest way to apply non- feature-based al-

gorithms is to sample a point cloud from the city mo-

del. We work with 100 points/ m

. Experiments sho-

wed that increased sampling rates only do have mi-

nor inﬂuence on registration results. In contrast to

simultaneou s loc alization and mapping (SLAM) ap-

plications, the point clouds might be very different:

The photogrammetric point cloud covers ground and

vegetation whereas our city models do no t. With re-

gard to the ground, the problem can be healed by mer-

ging the sampled model cloud with a ﬁltered cloud

from airb orne la serscanning that only consists of last

pulse terrain p oints, see see Figure 1. Another pro-

blem is that buildings ar e modeled separately in Ci-

tyGML. Thus, there exist walls between neighboring

buildings that are not visible from the outside. The-

refore, we only consid e r model walls for sampling if

they are visible from outside the buildings. We also

limit the sampled scene roughly to the area that is co-

vered by the photogrammetric point cloud.

The standard me ans to align point clouds is the

non-feature-ba sed Iterative Closest Point (ICP) algo-

rithm, see (Rusinkiewicz a nd Levoy, 200 1).

We did several experiments with Point

Cloud Library’s ICP implementations (version

1.8.0) in point-to-point and in point-to-plane

mode (see (Holz et al., 2015) and tu torials o n

http://pointclouds.org/documentation/tutorials), both

with a maxim um corresponden c e distance of 5m that

ﬁts with data. Whereas the base implementa tion of

point-to-point ICP migh t converge very slowly or get

stuck in a sub-optimum, ICP based on Levenberg-

Marquar d optimization and poin t-to-plane ICP ﬁnd

good solution s. In p oint-to-plain mode, distances

are not measured between points but betwe en points

and planes deﬁned by estimated local normal vectors

(see (Chen and Medioni, 1992)). This allows for

sliding along wall planes and is better suited to

match segments of planes like facades. Whereas the

Levenberg-Mar quard based point-to-point algor ithm

takes 4507 seconds

to align the clouds shown in

Figure 1, poin t-to-plane I CP ﬁnishes in 719 seconds.

However, our proposed method terminates in less

than 30 seconds (see St. Anton street scenario in

Ta ble 1).

A somewhat related approach is the Normal Dis-

tribution Transform (NDT). Its idea is to replace all

cloud points within a grid cell of the target cloud by

a normal distribution that describes the probability of

ﬁnding a point at a certain po sition. Instead of ma-

tching single points to each other, the probability of

points being at the right place (with regard to the tar-

get cloud) can be optimized with Newton iterations,

see (Magnusson et al., 2009). This technique elimina-

tes the time of ICP’s ne arest neighbor search. With a

resolution pa rameter set to 5 an d a step size parame-

ter set to 2.5 Point Cloud L ibrary’s algorithm aligns

the two previously investigated clo uds in 3 96 seconds.

For chosen settings, this indeed is faster than ICP in

point-to-plane mode but still signiﬁcant slower than

our proposed method. In our scenario, success and

running time of NDT heavily depend on choice of

parameters. For examp le, the method converges to

a wrong alignment for step size 3 but converges to the

correct global optimu m for step sizes 1, 2.5, a nd 5 .

For clouds of structured environments, examples

of (Ma et al., 2016) show that there might be problems

with iterative non-feature- based methods. Such pro-

blems occur if clouds overlap only partially or initial

coarse alignment is bad. Also, runn ing times strongly

depend on cloud size and parameters. Another dis-

advantage is that we need additional in formation like

digital terrain mode ls to en rich sampled clouds.

In our scenarios, walls are dominant structures

and reference surfaces are very simple. Therefore,

we concentrate on pairing spe c iﬁc geometric primiti-

ves as do most feature-based registration methods (cf.

(Chuang and Jaw, 2015)). One could directly detect

plane segments that represent walls and align them

with CityGML wall po lygons. However, both de-

tected segments and CityGML polygo ns are only ap-

proxim ations of real walls. Also CityGML polygons

are simpliﬁed due to their level of detail. But most

CityGML models use high quality building footprints

from cadastral data. Therefore, we work with walls’

footpr ints. Based on RANSAC estimates of wall pla-

nes, we rotate the point cloud so that walls become

straightened up and ground is oriented parallel to the

x-y-plane . By or thogonally projecting points to the x-

y-plane, walls become visible as dense contours. This

Running times are measured on a single core of a

2.4GHz i5 processor (2013) with 4GB RAM.

VISAPP 2018 - International Conference on Computer Vision Theory and Applications

300

basically simpliﬁes the 3D alignment problem to a 2D

task of aligning these contours with CityG ML buil-

ding footprints. The RANSAC-based rotation step is

similar to the use of angle-f e atures in the general pur-

pose point cloud alignm e nt algorithm of (Ma et al.,

2016). They compute a rotation matrix based on an

angle histogram. Rotation leads to translation of the

histogram that, for example, can be detected using the

Fourier transform shift property

. However, angle-

features become disturbed if d ifferent scaling factors

are used for different coordin ate directions du ring co-

arse registration. This also is a problem if one tries to

detect rotation between 2D footprint images based on

an angle histogram of Hough lines. In our scenario,

Hough space registration (cf. (Zhao, 2006)) additio-

nally suffers from different lengths of line segments.

Established alignment proc edures for 2D ima ges

mostly avoid to solve an NP-complete combinatorial

optimization problem. But footprint images of point

clouds and city models are quite different so that com-

binatorial o ptimization promises to deliver results that

are more r obust against distortions. We investiga-

ted combinatorial alignment procedures based on cor-

ners of footprints, based on line segments of foot-

prints and based on straight lines covering line seg-

ments. For UAV po int clouds covering larger urban

areas, corner-based alignment works well if there are

sufﬁcient many building corners (of different height)

in a scene. In our experiments, we detected building

corners using Harris corner detector and aligned them

with vertices of the city model using a Mixed Integer

Linear Program (MIP) similar to the one in Section

4, see Figure 2. Unfo rtunately, the videos that we use

Figure 2: Left: Corner-based matching of building walls

detected from a UAV point cloud (see Section 6) with Ci-

tyGML model’s walls. Right: L ines that are detected in the

point cloud are matched with CityGML model lines.

for facade detec tion mostly are taken from street level.

They cover more detail, and corner detection does not

only ﬁnd signiﬁcant building corners but also corners

of smaller structures. Also, in a straight street with

Sarvaiya, J., Patnaik, S., and Kothari K.: Feature Ba-

sed Image Registration Using Hough Transform. http://

psrcentre.org/images/extraimages/48.%2050.pdf (Accessed

07 July 2017)

few intersections, th e number of signiﬁcant corners is

small. Therefore, our implemen ta tion of corner based

matching does not work well with this type of point

clouds. Instead, straight lines of building footprints

are signiﬁcant. Line-based matching ap proaches have

been used successfully to align images with digital

surface mo dels a nd air borne laserscanning data, see

for example (Avbelj et al., 2 013; Cui et al., 2017).

They have been also used to align two p oint clouds,

see for example (Li et al., 2012). One can either

match unbounded straight lines or the short bounded

line segments that are part o f footprints. There might

be several line segments appr oximately lying on the

same straight unbounded line. Thus joining line seg-

ments to u nbounded lines reduces the number of pos-

sible matching comb inations. Whereas match ing o f

unbounded straight lines works for simple street sce-

narios (cf. Figure 2), it might fail for complex UAV

point clouds that lead to a variety of similar lines.

Therefore, we propose to match line segments (cf.

Figure 4). Where as (Avbelj et al., 2013) use a sta-

tistical, accumu lator based approach to match lines,

(Li et al., 2012) present an iteration sch e me and (Cui

et al., 2017) apply non-linear least-squa res optimiza -

tion, we u se a simpler Mixed Integer Linear Program

to ﬁnd matching line segmen t pa irs in combination

with a Linear Program (LP) to compute a 2D trans-

formation matrix. However, this requires pre- and

post-processing steps that u tilize the vertical nature

of walls.

3 PRE-PROCESSING

First, we both translate photogrammetric cloud and

city model into a local coo rdinate system with orig in

at −

t ∈ R

to avoid large numerical erro rs: Th e cor-

respond ing translation is given by

T =



(0,0,0) 1



∈ R

4×4

with un ity matrix E ∈ R

3×3

. Then we rotate the pho -

togrammetric cloud to correctly align with the z-axis

and generate a 2D building footprint ima ge a s fol-

lows:

Figure 3: Density of point cloud and binary image of likely

wall footprints superposed with CityGML building foot-

prints.

Line-based Registration of Photogrammetric Point Clouds with 3D City Models by Means of Mixed Integer Linear Programming

301

• To generate a preliminary binary image of li-

kely wall footprints, a resolution of 9 pixels per

square meter is sufﬁcient for ou r data, see Fi-

gure 3 . We compute minimum and maximum z-

coordinates (height values) of all points with x-

and y-coord inates within the p ixel’s area. If these

values at least differ 3.5 m in height (one building

level) and if the re exist at least eight points with

z-coordinates pairwise belonging to disjoint inter-

vals of width 0.5m then we classify the pixel as

being part of a wall footpr int. One could also

generate a density image by counting the po ints

above the pixel’s area. Then thresho lding cou ld

give a wall map. Unfor tunately, we work with

point clouds of very different local densities so

that it is difﬁcult to ﬁnd or generate a suitable

threshold value.

• Walls might not be exactly vertical. Before we re -

duce the cloud to wall and ground points, we have

to rotate it with a matrix D to make walls upright.

To this end, we divide the ground into 10 m ×10 m

sections. For each section we iteratively apply a

RANSAC algorithm to the section’s subset of the

cloud that also corre sponds roughly with previ-

ously comp uted p ixels o f wall footprints. With

RANSAC we estimate nearly vertical planes for

each section. We collect th e nor mal vectors of

the third of planes with largest number of inliers.

Let N be the set of these normals. We estimate

the common upward-direction of all walls with a

RANSAC algorithm as well: We iteratively select

a plane through the origin and through two non-

collinear points in N. Out of all selected planes

we choose one with the largest n umber of inliers

p, p ∈ N. The n we apply a rotation D,

D =







cos(β) −sin(β)sin(α) − sin(β)cos(α) 0

0 cos(α) −sin(α) 0

sin(β) cos(β)sin(α) cos(β)cos(α) 0

0 0 0 1







to the point cloud that aligns this plane’s normal

vector (n.x,n.y,n.z), n.z > 0, with the z-axis. Let

h = sin(α) · n.y + cos(α) · n.z. Angles are

α = sign(n.y)arccos(n.z/

n.y

+ n.z

β = sign(n.x)arccos(h/

n.x

+ h

The appr oach requires the existence of walls with

different orientations.

• Now we compute a sharper version of the binary

wall footprint image. Since walls should exactly

point upwards after applying D, we can reduce

noise by ﬁltering for even larger height differen-

ces ( 5m instead of 3 .5m) to detect walls.

Our goal is to match the footprint image with a si-

milar image that we obta in from the city model. To

this end, we draw a single picture of ﬁlled footprints

of a ll CityGML buildings, on ly considering the area

of the photogrammetric point cloud. Then we detect

edges with the Can ny operator. These edges corr e-

spond with facades but not with walls between hou-

ses.

Both on the footprint image and on the edge pic-

ture, we apply a probabilistic Hough transform to de-

tect line segments, see Figure 4. In the f ollowing

section we use the sets P and Q that contain line seg-

ments of the footprint image and the model’s edg e

picture, respectively.

4 LINEAR PROGRAMS

Linear and Mixed Integer Linear Programming have

been used for registration purpo ses, see for exam-

ple (Sakakubara et al., 2007; Wang et al., 2017).

However, most alignment procedures u se n on-linear

optimization. Linear optimization isn’t even listed

amongst the optimization methods for point cloud

alignment in the overview article (Tam et al., 2013) .

But if one seeks for a linear or afﬁne transformation

and if one is allowed to measure er rors in l

-norm

(sum of abso lute values) instead of the wid e ly used

-norm (least squares) then Linear Progra mming is a

very powerful tool. MIP and LP also have been consi-

dered for the generation of CityGML models and 3D

modeling. For example, (Boulch et al., 2014) use a

MIP to reconstruct surfaces from point clouds.

The difﬁculty of our r egistration task is that we

have to select from a candidate set of line pairs

before we can compute a linear tr a nsformatio n to

align line segments of P with line segments of

Q. Let P = {(p

1,1

, p

1,2

),... , (p

m,1

, p

m,2

)} and Q =

{(q

1,1

1,2

),... , (q

n,1

n,2

)}. Each line segment is de-

ﬁned by its two endpoints that are given in homogen e -

ous coordinates, for example p

i,k

= (p

i,k

.x, p

i,k

.y,1)

⊤

Figure 4: First two pictures: Detected line segments of point

cloud and CityGML model; third picture: Matching line

segments are connected with thin bright lines.

VISAPP 2018 - International Conference on Computer Vision Theory and Applications

302

Now we have to determine a linear tran sform L

that align s the largest possible subset of P with a cor-

respond ing subset of Q by u sin g translation, scaling

and rotation as feasible operatio ns.

A common method to ﬁnd large correspondin g

sets of line features is to use RANSAC in Hough

space. For example, (Colleu et al., 2 008) use this ap-

proach to match video fr ames with city model data. In

contrast to this we match bounded line segments with

a M IP that automatically also c omputes an initial ver-

sion of the transf ormation matrix.

First, w e have to ﬁnd matching candidate pairs

between P and Q. If a binary variable x

i, j

, 1 ≤

i ≤ m, 1 ≤ j ≤ n, equals one, then the pair

((p

i,1

, p

i,2

),(q

j,1

j,2

)) is selected for matching. It is

not selected if x

i, j

= 0. We can a-priori exclude pairs

by setting their bina ry variables ﬁxed to z e ro. We do

this if their lines have a distance in Hough space above

a threshold value. Also, we compute a bounding bo x

for each line segment and extend it by the expected

error of coarse alignment. If bounding boxes of two

line segments do not inter sect th en we also exclude

the pair. This is the sole step in our alignment method

that might cause it to only ﬁnd a local but no global

optimum . In contrast to ICP and other non-feature-

based algorithms, what follows describes global op ti-

mization.

We have to maximize an objective function like

∑

i=1

∑

j=1

i, j

(1)

subject to the restriction that there is a linear mapping

L =





cos(α) −s

sin(α) d

sin(α) s

cos(α) d

0 0 1





∈ R

3×3

that approximately maps line of segment (p

i,1

, p

i,2

)

onto line of (q

j,1

j,2

) if x

i, j

= 1. Manual coarse re-

gistration should be done so that scaling s

in x- and

scaling s

in y- direction is equal: s := s

= s

. Then

there is less uncertainty and more stability.

There need to exist scalars r

1,i, j

and r

2,i, j

such that

for k ∈ {1,2} and

k,i, j

− d

−

k,i, j

:= q

j,1

+ r

k,i, j

j,2

− q

j,1

) − Lp

i,k

(2)

there ho lds true 0 ≤ d

k,i, j

.x ≤ ε, 0 ≤ d

k,i, j

.y ≤ ε.

Thus, the co ordinate- w ise distance between tr a nsfor-

med points Lp

i,1

, Lp

i,2

and the straight line through

j,1

and q

j,2

has to be bounded by a ﬁxed threshold

value ε. We work with ε := 0.5 for selection of candi-

date pairs (distanc e s will be further minimized in a se-

cond optimization step). The source line segment has

to be approximately mapped onto a straight line going

through the target line segment. There is no need to

actually hit the target segment. This allows for sli-

ding like in ICP p oint-to-plane mode. Nevertheless,

we match segments and not straight lines because we

a-prior i exclude pairs (i, j) of distant segments by set-

ting x

i, j

= 0.

To obtain a linear pr oblem, we do no t compute

sine and cosine functions but seek for a ma trix

L =





1,1

1,2

1,3

2,1

2,2

2,3

0 0 1





under restrictions

1,1

= l

2,2

1,2

= −l

2,1



if s

= s

, (3)

−µ ≤ l

1,1

− l

2,2

≤ µ,

−µ ≤ l

1,2

+ l

2,1

≤ µ



if s

6= s

, (4)

1 − δ ≤ l

1,1

≤ 1 + δ, −δ ≤ l

1,2

≤ δ, (5)

where δ = 0.3 and µ = 0.1 are small threshold values.

These restrictions avoid that L describ es mirroring .

Instead of using simple objective function (1) we

consider the le ngths of point cloud line segments as

weights. Let w

be th e length o f the i-th point cloud

line segment. We extend the objective function to

∑

i=1

∑

j=1

i, j

. Maximizatio n favors selection of

long line segments.

We write the optimization problem as an Integer

Linear Program with the help of a large number M.

M is used to restrict the distance condition to selected

pairs of line segments while keeping the problem li-

near. Let distances d

k,i, j

, d

−

k,i, j

∈ (R

≥0

)

and matrix

coefﬁcients l

r,c

∈ R, 1 ≤ r ≤ 2, 1 ≤ c ≤ 3:

Max.

∑

i=1

∑

j=1

i, j

s.t.

∑

i=1

i, j

≤ 1 for 1 ≤ j ≤ n,

∑

j=1

i, j

≤ 1 for 1 ≤ i ≤ m,

conditions (2) and (( 3) or (4)) and (5), and

max{d

k,i, j

.x,d

k,i, j

.y,d

−

k,i, j

.x,d

−

k,i, j

.y} + Mx

i, j

≤ ε+ M.

A MIP in genera l is NP complete. It very much

depends on the number of candidate corresponden-

ces and on the size of ε how long a solver takes to

ﬁnd a solution. We use the GNU Linear Programming

Kit library GLPK (Makhorin, 2009) to solve LPs and

MIPs. I n most scenarios of this pap e r, solutions of

the MIPs are found in less than a second. Neverthe-

less, running times can be decreased. It turns out that

the exclusion of distant line segments from the mat-

ching task allows f or an LP relaxation of the MIP. In

the relaxed problem, binary variables x

i, j

are repla-

ced by real-valued variables x

i, j

∈ [0, 1]. It turns out

Line-based Registration of Photogrammetric Point Clouds with 3D City Models by Means of Mixed Integer Linear Programming

303

that values of x

i, j

become indeed very close either to

zero or to one. We then round th em. The tr adeoff of

relaxation is that one has to deal with a few wrong

matchings during the following second optimization

step that improves the mapping L:

Based on an optimal solution of the MIP, we de-

ﬁne the set R of pairs (i, j) for which x

i, j

= 1 so that

for all (i, j) ∈ R distan ces fulﬁll 0 ≤ d

k,i, j

.x,d

k,i, j

.y ≤

ε. To further minimize the errors, we apply another

LP. By u sin g the same variable names as for the MIP,

this LP h a s to minimize

∑

(i, j)∈R

∑

k=1

k,i, j

.x + d

k,i, j

.y + d

−

k,i, j

.x + d

−

k,i, j

.y)

subject to (2) for (i, j) ∈ R, and conditions (3) or ( 4),

and (5).

Weights w

now enforce large line segments to be

matched with higher precision than short ones.

Based on L, we can align the poin t cloud in the

x-y-plane using m a trix

A :=







1,1

1,2

0 l

1,3

2,1

2,2

0 l

2,3

0 0 1 0

0 0 0 1







5 POST-PROCESSING

It remains to estimate an ad ditional z-scaling factor

and a z-translation . To this end, we look at te rrain in-

tersection points of CityGML models. For each such

point, we have a true z-value z

of the ground and also

determine the height z

of th e roof ab ove this point.

Figure 5: Height maps showing minimum and maximum

values of adjusted photogrammetric point cloud and sam-

pled model cloud

Figure 6: A video frame used for point cloud generation,

projection of point cloud onto city model prior to alignment,

and projection of aligned point cloud onto city model.

We also determ ine a corresponding lowest and hig -

hest point of the pre-proc essed point c loud that has

been tr ansformed with A, see Figure 5. Let z

and z

be the z-coordinates of these two points, respectively.

Then we get a local z-scalin g factor

−z

. If the same

scaling factor was applied to all directions during ma-

nual coarse registration, z-scaling factor should be

s =

1,1

· l

2,2

− l

1,2

· l

2,1

. We allow a small deviation

by considering all local scaling factors within the in-

terval [s − 0.2,s + 0.2 ]. To avoid outliers, we compute

the median value z

of these feasible local scaling fac-

tors. Using z

we determine the z-translation value z

as the media n of all lo c al translation values z

− z

·z

Then we ﬁnally align the cloud with the city mo del by

multiplying its points with

−1

ZADT, Z :=







1 0 0 0

0 1 0 0

0 0 z

0 0 0 z







Figure 7: Perspective projection of video frame areas onto

CityGML walls.

Figure 8: A city model is drawn onto a video frame using

the frame’s SfM camera transformation and estimated ca-

mera projection. The upper picture is based on a course

manual alignment between SfM point cloud and UTM coor-

dinate system. The second picture shows optimized align-

ment.

After aligning the point cloud with the city model,

we c an match model walls with areas in video frames

or orthogonally p roject points to walls, see Figures 6

and 7. Fortunately, the SfM tool provides camera pa-

rameters and, in a bundle output ﬁle, for each frame

k ∈ {1,... , n} a translation vector

∈ R

and a rota-

tion ma trix R

∈ R

3×3

that tog e ther de ﬁne a transfor-

mation C

of the orig inal, non-aligne d photogramme-

tric cloud into the frame’s camera coordinate system:



(0,0,0) 1



After applying this mapping, the camera’s position is

the o rigin and the camera looks into the d irection of

the negative z-axis. We multiply the photogramme-

tric point cloud with matrix T

−1

ZADT to align with

VISAPP 2018 - International Conference on Computer Vision Theory and Applications

304

the city model. Thus, the transform a tion of the c ity

model into the camera’s coordinate system is given

via m a trix C

−1

ZADT )

−1

= C

−1

T .

We can now ren der the model using cam e ra para me-

ters so that it exactly ﬁts with the came ra frame ( see

Figure 8 for the UAV city center scenario that is des-

cribed in Section 6). Vice versa, CityGML walls can

be textured either based on the point cloud (Figure 6),

or based on the corresp onding textured mesh or based

on single vide o fra mes (Figure 7).

6 RESULTS

We apply the algorithm to three different point clouds.

So far, we h ave used a dense point cloud (now deno-

ted as St. Anton street) a s example. The point clo ud

of Knight street (see Figure 9) also is taken throug h

the front window o f a car, but it is much thinner than

the one of St. Anton street. The third clo ud (see Fi-

gure 10) originates from a UAV video of a city center.

This scenario also allows for alignment by matchin g

corners, see Figure 2. To visualize transformations,

we worsened coa rse manual registration by shifting

the clouds by 4 mete rs in x- and y-directions, respec ti-

vely. Table 1 an d Figures 9 and 10 summarize r esults

of comp utation.

Precision of the registration co rresponds with re-

solution 1/3 meter of the pictures used to detect wall

lines. It can be slightly impr oved by increasing pictu-

res’ resolution. However, if the resolution is too high,

it becomes difﬁcult to detect walls.

To investigate the lim its of our algorithm, we align

sampled model point clouds with transf ormed versi-

ons of them selves in a local coordinate system. These

small clouds are free of noise and lead to a high num-

ber of correspondences. Th is increases running time

of our MIP optimization step whereas such clouds

are simple to ma tc h for ICP in point-to-plane mode.

For example, our algorithm takes 82 seconds to (ap-

proxim ately) undo the transfor mation B

−1

(rotations

by 0.01 degrees around all axes, scaling factor 0.99

and translation with (−4,−4,4)) applied to the mo-

del cloud of Figure 1. It computes a matrix

B that is

a good approximation of ground truth ma trix B. In

compariso n, ICP in point-to-plane mode

converges

to a corresponding ma trix

B in 89 seconds. In this ex-

ample, ou r algorithm ﬁnds 136 candid a te pairs of cor-

respond ing line segments. The single MIP step runs

67 seconds. If we do not compute the exact solution

of the MIP but use a solution of the relaxed MIP th e n

Point Cloud Library’s class IterativeClosestPointWith-

Normals is used with parameters TransformationEpsilon =

−8

and EuclidianFitnessEpsilon = 0.1.

Table 1: Three i nvestigated scenarios.

St. Anton Knight UAV city

street street scenario

number of points 8.593.746 1.995.498 5.017.262

line segments of point cloud 114 71 68

line segments of city model 61 73 338

candidate pairs of line 71 95 56

segments

pairs selected by relaxa tion 24 24 24

of MIP

pairs selected by MIP 17 20 21

model vertices used for 32 48 33

z-operations

running time pre-processing 14s < 4s < 1s

running time MIP a nd LP < 1s < 1s < 1s

running time post-processing 15s 6s 2s

overall running time decreases to 14 seco nds and the

optimization steps ﬁnish in less than a second w ith

outcome matrix

B. The overall ru nning tim e can be

further decreased by limiting the number of RANSAC

plane estimates in th e pre-processing step.

Figure 9: Knight street scenario: matching between line

segments, thin point cloud, points projected to model walls.

Figure 10: Left: UAV point cloud and points projected to

model walls. Right: matching between line segments.

The l

-vector norm is deﬁned via k(a

,... ,a

+ ·· · + a

, and the l

-matrix (spectral) norm

kB −

:= sup



k(B −

B) ·~ak

k~ak

0 6=~a ∈ R



is a mea sure for the quality of the c omputed align ment

mapping

B. Let ~v ∈ R

be a point of the cloud and

~e = (e.x,e.y, e.z) ∈ R

the difference between the cor-

Line-based Registration of Photogrammetric Point Clouds with 3D City Models by Means of Mixed Integer Linear Programming

305

rectly and ap proximately aligned versions of ~v. Then

k(e.x,e.y,e.z,0)k



(B −

B)· (v.x,v.y,v.z,1)

⊤



≤ kB −

k~vk

+ 1.

Our non-re la xed algorithm aligns best with kB −

≈ 0.1407 in the current scenario, f ollowed by

the relaxed version (kB −

≈ 0.2069) an d b y ICP

(kB −

≈ 0.288).

7 CONCLUSIONS

One can utilize the vertical orientation of walls to re-

duce the problem of aligning photogrammetric point

clouds with 3D city models to two space dimensi-

ons if at least two wall segments with linear inde-

pendent directions are detected. This might be given

if the scene covers an intersectio n of stree ts. Th en,

instead of ICP, featur e-based alignment using Linear

Programming is a suitable means. Useful results

are obtained by matching line segments of wall foot-

prints. While the algorithm is designed to align with

city models, it can also be used to align two poin t

clouds in which walls are dominant. Although not

so fast, the point-to-plane version of ICP also aligns

well with sampled CityGM L models.

REFERENCES

Avbelj, J., Iwaszczuk, D., M¨uller, R., Reinartz, P., and

Stilla, U. (2013). L ine-based registration of DSM and

hyperspectral images. ISPRS - International Archives

of the Photogrammetry, Remote Sensing and Spatial

Information Sciences, XL-1/W1:13–18.

Boulch, A., de La Gorce, M., and Marlet, R. (2014).

Piecewise-planar 3D reconstruction with edge and

corner regularization. Computer Graphics Formum,

33(5):55–64.

Chen, Y. and Medioni, G. (1992). Object modelling by re-

gistration of multiple range images. Image and Vision

Computing, 10(3):145–155.

Chuang, T.-Y. and Jaw, J.-J. (2015). Automated 3D feature

matching. The Photogrammetric Record, 30(149):8–

29.

Colleu, T., Sourimant, G., and Morin, L. (2008). Automa-

tic initialization for the registration of GIS and video

data. In Proc. 2008 3DTV Conference: The True Vi-

sion - Capture, Transmission and Display of 3D Video,

pages 49–52, Washington, DC. IEEE.

Cui, T., Ji, S., Shan, J., Gong, J. , and Liu, K. (2017). Line-

based registration of panoramic images and lidar point

clouds for mobile mapping. Sensors, 17(1).

Gr¨oger, G., Kolbe, T. H., Nagel, C., and H¨afele, K. H.

(2012). OpenGIS City Geography Markup Language

(CityGML) Encoding Standard. Version 2.0.0. Open

Geospatial Consortium.

Holz, D., Ichim, A. E., Tombari, F. , Rusu, R. B., and

Behnke, S. (2015). Registration with the point cloud

library: A modular framework for aligning in 3-D.

IEEE Robotics Automation Magazine, 22(4):110–124.

Li, W., Li, X., Bian, Y., and Zhao, H. (2012). Multiple view

point cloud registration based on 3D lines. In Proc.

International Conference on Image Processing, Com-

puter Vision, and Pattern Recognition (IPCV), pages

1–5.

Ma, Y., Guo, Y., Zhao, J., Lu, M., Zhang, J., and Wan, J.

(2016). Fast and accurate registration of structured

point clouds with small overlaps. In 2016 IEEE Con-

ference on Computer Vision and Pattern Recognition

Workshops (CVPRW), pages 643–651.

Magnusson, M., N¨uchter, A., L¨orken, C., Lilienthal, A. J.,

and Hertzberg, J. (2009). Evaluation of 3D regis-

tration reliability and speed — a comparison of ICP

and NDT. In Proc. IEEE International Conference on

Robotics and Automation (ICRA), pages 3907–3912,

Washington, DC. IEEE.

Maiseli, B., Gu, Y., and Gao, H. (2017). Recent deve-

lopments and trends in point set registration methods.

Journal of Visual Communication and Image Repre-

sentation, 46:95–106.

Makhorin, A. (2009). The GNU Linear Programming Kit

(GLPK). F ree Software Foundation, Boston, MA.

Rusinkiewicz, S. and Levoy, M. (2001). Efﬁcient variants of

the ICP algorithm. In Proc. Third International Con-

ference on 3-D Digital Imaging and Modeling, pages

145–152.

Sakakubara, S., Kounoike, Y., Shinano, Y., and Shimizu,

I. (2007). Automatic range i mage registration using

mixed integer linear programming. In Yagi, Y., Kang,

S. B., Kweon, I. S., and Zha, H., editors, Proc. 8th

Asian Conference on Computer Vision, Tokyo 2007,

Part II, pages 424–434, Berlin. Springer.

Tam, G. K., Cheng, Z .-Q., Lai, Y.-K., Langbein, F. C., Liu,

Y., Marshall, D., Martin, R. R., Sun, X.-F., and Rosin,

P. L. (2013). Registration of 3D point clouds and mes-

hes: A survey from rigid to non-rigid. IEEE Trans.

Visualization and Computer Graphics, 19(7):1–20.

Wang, Y., Moreno-Centeno, E., and Ding, Y. (2017).

Matching misaligned two-resolution metrology data.

IEEE Transactions on Automation Science and Engi-

neering, 14(1):222–237.

Zhao, S. (2006). Hough-domain image registration by me-

taheuristics. In Proc. 9th International Conference on

Control, Automation, Robotics and Vision 2006, pages

1–5.

VISAPP 2018 - International Conference on Computer Vision Theory and Applications

306