ON APPLICATIONS OF SEQUENTIAL MULTI-VIEW DENSE
RECONSTRUCTION FROM AERIAL IMAGES
Dimitri Bulatov, Peter Wernerus and Hermann Gross
Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB
Gutleuthausstr. 1, Ettlingen, Germany
Keywords:
Depth map, Point cloud, Urban terrain modeling.
Abstract:
Because of an increasing need for and rapid progress in the development of (unmanned) aerial vehicles and
optical sensors that can be mounted onboard these platforms, there is also considerable progress
in the 3D analysis of air- and UAV-borne video sequences. This work presents a robust method for multi-camera
dense reconstruction as well as two important applications: creation of dense point clouds with precise 3D
coordinates and, in the case of videos with Nadir perspective, a context-based method for urban terrain mod-
eling. This method, which represents the main contribution of this work, includes automatic generation of
digital terrain models (DTM), extraction of building outlines, modeling and texturing roof surfaces. A simple
interactive method for vegetation segmentation is described as well.
1 INTRODUCTION
Automatic detection and reconstruction of buildings
and vegetation from aerial images has a wide field of
applications (e.g. urban planning, surveillance, disas-
ter rescue). In this field, unmanned aerial vehicles
(UAVs) have become an increasingly attractive tool,
because of their low cost and easy use. From the
mathematical point of view, however, this poses the additional challenge of distinguishing between 2.5D
and 3D situations. The first case comprises Nadir flights or flights at sufficient altitudes, restricted
depth ranges, and a relatively high potential of model-based approaches (Fischer et al., 1998; Gross et al.,
2005). The second case implies a relatively high resolution of building walls together with the surrounding
terrain, so that large depth ranges must be taken into account and generic approaches for building
reconstruction (Bulatov and Lavery, 2010; Curless and Levoy, 1996) from geometric primitives (points, lines,
or triangulated depth maps) obtained in several (ref-
erence) images have clear advantages.
In the present paper, we show how high quality
depth maps can be obtained from short image se-
quences and used to accomplish both tasks. Our input
is thus given by a monocular video or image sequence
processed by a structure-from-motion method, such
that additionally to the camera positions and orienta-
tions, we have a sparse, but precise and reliable set
of 3D points that will be used for dense reconstruc-
tion. After a brief overview of related work in Sec. 2,
the approach of (Bulatov et al., 2011) for dense depth map computation supported by triangular meshes is
summarized in Sec. 3. A depth map assigns a spatial
coordinate to a dense pixel set of an image. A union
of several such depth maps is a 3D point cloud, which,
visualized in a suitable way (see Sec. 4), is often suf-
ficient to perceive the structure of the scene. Never-
theless, for the special case of Nadir images, the as-
sumption of a 2.5D graph (terrain skin) z(x,y) can be
made. We provide in Sec. 5 a model-based approach
tied up with related work (Gross et al., 2005), which, in its original idea, has a LIDAR point cloud as input.
We show qualitative results of the reconstruction in
Sec. 6 and give concluding remarks in Sec. 7.
2 PREVIOUS WORK
Since the goal of this work is to present the main ap-
plications of depth map extraction rather than depth
map extraction itself, we refer to a survey (Scharstein
and Szeliski, 2002) for a detailed overview of state-of-
the-art algorithms on dense stereo. Since depth val-
ues are usually discretized and the discretization arti-
facts are undesirable in scenes with many non-fronto-
parallel surfaces, which are typical for UAV videos,
triangular meshes from already available points will
be extensively used in the course of this work to re-
place discretization with triangular interpolation.
If 3D point clouds are dense and accurate enough,
they can not only be directly visualized on different
levels of detail but also processed with techniques
mentioned in survey (Kobbelt and Botsch, 2004).
Also, we refer to the work of (Pock et al., 2011) for an overview of existing functionals allowing 2.5D-based
depth map fusion. Finally, numerous approaches for
building extraction from images exist. Since it is
hardly possible to obtain heights of buildings from
only one image, those reconstruction pipelines that
work with single images, e.g. those reviewed by
(Mayer, 1999), are less stable than the process of ob-
taining building outlines from image sequences with
partial overlaps. The work of (Rottensteiner, 2010)
presupposes a color segmentation of a pair of images
and uses LIDAR point clouds (sparse, but homoge-
neously distributed in the images) to determine ini-
tial orientation of planes. The non-trivial parts in-
clude grouping the segments into planes and gener-
alizing this approach to video sequences with hun-
dreds of frames. In (Baillard and Zisserman, 2000),
the (roof) planes are associated with an induced ho-
mography with three degrees of freedom between cor-
responding images. If a correspondence of lines bor-
dering this plane is established, the number of degrees
of freedom is reduced to one, namely, the inclination
angle of the (half)-plane. This angle is estimated by
means of error minimization algorithms; the initial-
ization is computed for points with high response of
a "cornerness" operator (Harris and Stephens, 1988)
in order to facilitate search for correspondences and
then refined for the rest of pixels presumed to lie in
the half-plane. In the next step, neighboring relations
are extensively exploited for grouping of lines, delin-
eation and fusing of planes, etc. However, the tasks of detecting and matching edges are not always feasible
for optical images of low quality. The approach of
(Mayer and Bartelsen, 2008) consists of determining
building walls from vertical planes. The algorithm is
very simple and fast because a pixel-wise depth cal-
culation is not performed. However, without a com-
plete visibility analysis, it is not possible to determine
the borders of the walls. Determination of roofs is
also not performed. To our knowledge, the majority
of state-of-the-art approaches does not use dense 3D
point clouds from passive sensors for obtaining build-
ings and vegetation. Hence, we strive to make use
of the rapid progress in depth map calculation from
image sequences and adopt different features of algo-
rithms originally elaborated for LIDAR point clouds
(Geibel and Stilla, 2000; Gross et al., 2005; Rotten-
steiner, 2010).
3 SEQUENTIAL MULTI-VIEW
DENSE RECONSTRUCTION
We consider a sequence of 5 to 10 images I_k, the corresponding camera matrices P_k, and a sparse 3D point cloud that was obtained by a structure-from-motion approach from characteristic points detected and tracked in the images. The desired output is a dense 3D point cloud corresponding to every pixel of the reference image (typically in the middle of the sequence), as depicted in Fig. 1. In the following, we give a description of the algorithm (Bulatov et al., 2011), in which a detailed insight into the choice of relevant terms and parameters is provided.
Figure 1: A multi-view configuration. Cameras are de-
picted by triangles, the object surface is below, already re-
constructed points are shown by red circles and the triangu-
lation by red lines. The unknown depth value is determined
by projecting the corresponding 3D point into other images
and comparing intensities of projected (dashed lines) points.
For any pixel x = x_m of the reference image, there is only one degree of freedom for its position x_mk in another image I_k. This degree of freedom is given by the depth value d of x (Hartley and Zisserman, 2000). The depth is the distance from the corresponding 3D point X to the principal plane and, in the case of a classical pinhole camera with calibration matrix K, rotation matrix R and translation vector C, the coordinates of X are given as a function of d by the relation:

X = d · (KR)^{-1} x + C.    (1)
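For illustration, the back-projection (1) can be written in a few lines of Python; this is a minimal sketch with numpy, where the function name and argument layout are our own choices:

import numpy as np

def backproject(x, d, K, R, C):
    # Homogeneous coordinates (u, v, 1) of the pixel in the reference image.
    x_h = np.array([x[0], x[1], 1.0])
    # Eq. (1): X = d * (KR)^(-1) x + C; np.linalg.solve avoids an explicit inverse.
    return d * np.linalg.solve(K @ R, x_h) + C

Evaluating this function for a pixel grid and its estimated depths yields the dense point cloud of Sec. 4.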
A reasonable depth range is discretized into depth labels d_j. For every pixel x_m of the reference image, every label d_j and every other image I_k of the sequence, the windows I_k(w(x_mk(d_j))) are compared with I(w(x_m)). Here x_mk is the projection of X from (1) by camera matrix P_k, and the comparison function between two such windows w can be, e.g., a truncated sum of absolute values of intensity differences (our choice) or normalized cross-correlation. The data is aggregated into a cost matrix E_data(m, j). If x lies in the convex hull of already available points,
or, more precisely, in a triangle T of their Delaunay triangulation in image I_0, we add to E_data a triangle-based smoothness term E_mesh that biases the cost values of x to be equal to the depth d_{T,x} resulting from the intersection of the reprojection ray at x with the support plane of T. In other words, E_mesh(m, j) can be any non-decreasing function of ||d_j − d_{T,x_m}||, where

d_{T,x} = a·d_a + b·d_b + c·d_c,    (2)

a, b, c are the local barycentric coordinates of x in T and d_a, d_b, d_c are the depth values at the triangle vertices. This triangle-based smoothing reduces matching cost ambiguities in untextured areas.
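A minimal sketch of this mesh term, assuming the barycentric coordinates and vertex depths are already known; the quadratic penalty below is just one admissible choice of non-decreasing function:

import numpy as np

def mesh_cost(depth_labels, bary, vertex_depths, weight=1.0):
    # Eq. (2): depth predicted at x by the support plane of triangle T.
    a, b, c = bary
    d_a, d_b, d_c = vertex_depths
    d_T = a * d_a + b * d_b + c * d_c
    # E_mesh(m, j) as a non-decreasing function of |d_j - d_T|.
    return weight * (np.asarray(depth_labels) - d_T) ** 2

The resulting values are simply added to the corresponding row of E_data before the optimization.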
The second step consists of the non-local optimization; the smoothness function E_smooth and the optimization method are chosen according to (Hirschmüller, 2008). This method performs, in a reasonable time, quite well also in scenes with many slanted surfaces.
Finally, triangles consistent with the surface are selected. To achieve this, the percentage of pixels x with minimum cost values similar to the one resulting from d_{T,x} is measured for every triangle T. In other words, one checks for x_m ∈ T whether min_j E(m, j)/E(m, j_T) > r, where E = E_data + E_mesh + E_smooth, j_T is the depth label corresponding to d_{T,x}, and r is a threshold close to (but below) 1. When the percentage of such pixels in a triangle T exceeds a threshold, all pixels within T are assigned depth values from (2). This evaluation will facilitate the normal vector extraction in Sec. 5.1.
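This selection can be sketched as follows; the helper name and the two thresholds are our own assumptions, and E denotes the aggregated cost volume:

import numpy as np

def triangle_is_consistent(E, j_T, pixels, r=0.95, min_fraction=0.7):
    # E: cost volume of shape (num_pixels, num_labels);
    # j_T: per-pixel depth label induced by the support plane of T via (2);
    # pixels: indices m of the pixels x_m lying in T.
    m = np.asarray(pixels)
    ratio = E[m].min(axis=1) / E[m, j_T[m]]
    # A pixel supports T if its minimum cost is close to the plane-induced cost.
    return np.mean(ratio > r) > min_fraction

If the test succeeds, all pixels within T receive the depth values from (2).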
4 FUSION, FILTERING AND
VISUALIZATION OF DENSE
POINT CLOUDS
A typical UAV-borne video contains many overlap-
ping images and provides a sufficient coverage of
the scene. Coordinates of 3D points corresponding
to pixels of different reference images are simultane-
ously calculated from the corresponding depth maps
using (1). Unfortunately, the depth estimation is
error-prone, although the number of outliers is greatly
reduced by means of the multi-view reconstruction
presented in Sec. 3. In order to reduce the number of
outliers in the resulting point set, the following assumption has been made: given a sufficiently high overlap of depth maps, pixels consistent with the surface can be expected within the neighborhood of correctly estimated points not only in the same depth map, but also in other depth maps. Outliers, on the other hand, tend to have isolated positions. As a consequence, the local density at a 3D point X and the quality of X are strongly correlated. We assign X the accumulator value

W(X) = Σ_{X_N ∈ N} exp((ρ − ||X − X_N||)² / σ²), where N = {X_N : ||X − X_N|| < ρ},    (3)
and ρ, σ are empirical constants. We define X to be consistent with the surface if the quantile value of the accumulator function W(X) exceeds a given threshold; this threshold is, however, not global, but an increasing function of the point density in different regions of the computation domain. Doing so takes into account the fact that different regions of the scene are covered by different numbers of depth maps.
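Under these assumptions, the accumulator can be sketched as follows, using scipy's cKDTree for the fixed-radius neighbor search; the exponent follows our reading of (3), and the region-dependent thresholding is omitted:

import numpy as np
from scipy.spatial import cKDTree

def accumulator_values(points, rho, sigma):
    # points: (n, 3) array of fused 3D points from all depth maps.
    tree = cKDTree(points)
    W = np.zeros(len(points))
    for i, idx in enumerate(tree.query_ball_point(points, r=rho)):
        dist = np.linalg.norm(points[idx] - points[i], axis=1)
        # Eq. (3): nearer neighbors contribute exponentially larger weights.
        W[i] = np.sum(np.exp((rho - dist) ** 2 / sigma ** 2))
    return W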
The output of this procedure is a relatively precise
and homogeneously distributed dense 3D point cloud.
In our OpenGL interface, these 3D points, colored according to the corresponding reference images, can be directly visualized and manipulated. Three main applications that take these point clouds as input are: multi-modal registration (Bodensteiner et al., 2010), generic surface reconstruction, and the context-based approach described in Sec. 5.1.
5 MODEL-BASED URBAN
TERRAIN RECONSTRUCTION
5.1 Building Extraction
We now consider the situation where (nearly) Nadir views of the terrain are given. In order to work with Euclidean units, we project the points from Sec. 4 into the xy-plane and group their z-values into cells (rasterization). In order to segment buildings from the surrounding (not necessarily planar) terrain, a Digital Terrain Model (DTM) is extracted. At the beginning, cells corresponding to the ground, namely those with minimum altitude within a circular filter, are fixed, whereby the circle radius corresponds to the largest dimension of the smallest building. To cope with the few remaining outliers, the original approach of (Gross et al., 2005), which proposes the solution of a Neumann differential equation, can be replaced by one of the robust cost functions mentioned in (Pock et al., 2011). We chose the 2.5D-based L1-spline solution due to (Bulatov and Lavery, 2010).
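As a sketch of the first step, the ground cells can be fixed with a circular minimum filter over the rasterized heights (scipy); the tolerance tol is our own assumption:

import numpy as np
from scipy.ndimage import minimum_filter

def fix_ground_cells(raster, radius, tol=0.2):
    # Circular footprint whose radius corresponds to the largest
    # dimension of the smallest building (in cells).
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = x ** 2 + y ** 2 <= radius ** 2
    local_min = minimum_filter(raster, footprint=disk)
    # A cell is fixed as ground if it (nearly) attains the local minimum.
    return raster <= local_min + tol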
For the sake of computation time, the Digital Surface Model (DSM) is given by a low-pass filtering of the rasterized image. The height information, given by the difference of DSM and DTM, is used in a three-step procedure for determining the shape and height of the buildings. The aerial image is needed to detect trees and to texture the terrain model and the roofs. The procedure is briefly described in the following three paragraphs and visualized in Fig. 2.
Figure 2: Three steps of context-based building modeling.
The main input is given by (a fragment of) the depth map, followed by extraction of building outlines, modeling of roof surfaces and texturing. Data set Bonnland, see Sec. 6.2.
Extraction of Building Outlines. The segmenta-
tion process for buildings delivers regions whose bo-
undaries are approximated by rectangular polygons.
If there are small convexities or indentations in the
building contour, short edges are removed by mod-
ifying the object contour through iterative general-
ization. The area is changed as little as possible by
adding to or removing from the object rectangular
subparts. As a result, building outlines are created.
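A strongly simplified sketch of one generalization step: the shortest edge below a length threshold is collapsed to its midpoint, which changes the enclosed area only locally; the re-orthogonalization of the resulting outline, which the full procedure requires, is omitted here:

import numpy as np

def collapse_short_edges(poly, min_len):
    # poly: (n, 2) vertex array of a closed building outline.
    poly = np.asarray(poly, dtype=float).copy()
    while len(poly) > 4:
        lengths = np.linalg.norm(np.roll(poly, -1, axis=0) - poly, axis=1)
        i = int(np.argmin(lengths))
        if lengths[i] >= min_len:
            break
        # Merge the two endpoints of the shortest edge at their midpoint.
        poly[i] = 0.5 * (poly[i] + poly[(i + 1) % len(poly)])
        poly = np.delete(poly, (i + 1) % len(poly), axis=0)
    return poly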
Roof Plane Modeling. To model roof planes, the approach of (Geibel and Stilla, 2000) was incorporated into our work. The normal vector of every internal building pixel x is determined by computing a local adaptive operator in a small window around x. Contrary to the original approach of (Gross et al., 2005), which derived roof plane orientations by extracting dominant directions of a weighted histogram over all pixels in the interrelated areas of a building, this task is now solved by k-means-based clustering of these normal vectors and grouping connected pixels into regions. The roof surfaces are described by polygons afterwards. A polygon encloses the entire roof surface including disturbed areas; its borders are determined by intersections of the approximated roof plane with its neighbor planes. Finally, the walls of the buildings are constructed through the outer polygon edges of the roof surfaces (upper edge) and through the terrain height (lower edge) available from the depth map.
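A sketch of the clustering and grouping steps with scipy; the number of clusters k is an assumption and would in practice depend on the building:

import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.ndimage import label

def roof_regions(normal_image, mask, k=4):
    # normal_image: (H, W, 3) per-pixel unit normals; mask: building interior.
    normals = normal_image[mask]
    _, labels = kmeans2(normals, k, minit='++')
    cluster_img = np.full(mask.shape, -1)
    cluster_img[mask] = labels
    # Group connected pixels of the same normal cluster into roof regions.
    regions = []
    for j in range(k):
        comp, n = label(cluster_img == j)
        regions.extend(comp == i for i in range(1, n + 1))
    return regions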
Texturing. The roofs and terrain are textured by
means of the aerial image. If calibrated terrestrial
views are available, the process of texturing can be
extended to the building walls, see e.g. (Haala, 2005).
5.2 An Interactive Tree Detection
Approach
The determination of the building contour is often disturbed by vegetation, especially if the roof is partially occluded by trees. Since tree classification by first/last echo is impossible for these point clouds, classification is done in the rasterized image. In the aerial image, some tree regions are interactively defined. For each band (RGB), the mean value and standard deviation inside the defined tree regions are calculated. All pixels whose color values deviate from the mean by less than the standard deviation in each band are declared treelike. These pixels of the depth map are excluded from the building reconstruction. In sufficiently large treelike areas, trees are added to the model. To model a tree, we first create an image V illustrating a tree with transparent background. The tree color can be modified to match the season or the color of the detected tree regions. Finally, two such images V are placed vertically and perpendicularly to each other into the model.
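The per-band test translates into a few lines of Python; the helper name is our own, and the morphological post-processing mentioned in Fig. 4 is omitted:

import numpy as np

def treelike_mask(image, tree_samples):
    # image: (H, W, 3) aerial RGB image; tree_samples: (n, 3) pixel values
    # taken from the interactively defined tree regions.
    mean = tree_samples.mean(axis=0)
    std = tree_samples.std(axis=0)
    # A pixel is treelike if every band deviates from the mean
    # by less than the corresponding standard deviation.
    return np.all(np.abs(image - mean) < std, axis=-1)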
6 COMPUTATIONAL RESULTS
6.1 Model-free Dense Reconstruction
We first consider a video sequence representing a
rather complicated building – the cathedral of Speyer
(Germany), recorded by a hand-held camera onboard a Cessna. The camera is inclined by about 30 degrees in order to also cover building walls. We ob-
tained a relative orientation of the camera trajectory
and a sparse point cloud with (Bulatov, 2008). Depth
map computation was performed by the method de-
scribed in Sec. 3 from seven images. A reference
frame and the corresponding depth map are depicted in Fig. 3, top, while the bottom of the figure represents
two views of a point cloud, before and after fusion,
obtained from seven such depth maps and visualized
in our OpenGL-interface.
6.2 Urban Terrain Modeling
The input data set of this section is a video taken during a UAV flight over the village of Bonnland, Germany. After a structure-from-motion algorithm (Bu-
latov, 2008), the depth maps supported by triangu-
lar meshes were obtained from five reference frames.
One of these reference images and the correspond-
ing depth map are presented in Fig. 4, top. From the
Figure 3: Dense reconstruction of the data set Speyer. Top
row: A video frame and the corresponding depth map. Two
views of the dense point cloud before and after filtering are
shown in the second and last row, respectively. The number
of outliers in the last row is greatly reduced.
Figure 4: Input and intermediate results of the reconstruc-
tion of sequence Bonnland. Top row: a reference image
and the corresponding depth map. Bottom left: the syn-
thetic image obtained by the procedure of Sec. 4; bottom right: the corresponding terrain skin map z(x,y). In the syn-
thetic image, interactively determined regions of vegetation
are depicted in dark-red, those automatically detected and
post-processed by morphological operations are violet.
depth maps, the z-coordinates of the 3D points are obtained by the procedure described in Sec. 4 and resampled on a rectangular, equally-spaced grid (x_k, y_l), k = 0, ..., 470, l = 0, ..., 480. The values for x_0, x_470, y_0, y_480 are given by the minimum and maximum of the x and y coordinates of the data points, respectively, while the value of the terrain skin map z(k, l) is the median of the z-coordinates of all data points (x, y) such that x_k ≤ x < x_{k+1}, y_l ≤ y < y_{l+1}; see the illustration in Fig. 4, bottom.
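A sketch of this median rasterization; the grid dimensions follow the numbers above, no interpolation of empty cells is attempted, and the helper name is our own:

import numpy as np

def rasterize_median(points, nx=471, ny=481):
    # points: (n, 3) array of 3D points from the fused depth maps.
    x, y, z = points.T
    kx = np.minimum(((x - x.min()) / (x.max() - x.min()) * nx).astype(int), nx - 1)
    ky = np.minimum(((y - y.min()) / (y.max() - y.min()) * ny).astype(int), ny - 1)
    grid = np.full((ny, nx), np.nan)
    for k, l in set(zip(kx, ky)):
        # z(k, l) is the median of the z-values of all points in cell (k, l).
        grid[l, k] = np.median(z[(kx == k) & (ky == l)])
    return grid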
This is the input image for the algorithm described in Sec. 5. Two views of the scenery are
depicted in Fig. 5. From the illustration, it becomes
clear that in the small, exemplary sample of the data
set, all four buildings were detected and correctly re-
constructed. For building reconstruction from larger
data sets, the steps of Euclidean reconstruction and
depth map computation must be performed for dif-
ferent, overlapping parts of the terrain and then fused
by means of the rasterization procedure. The compu-
tation of the DTM is then carried out by the domain-
decomposition routine of (Lin et al., 2006) while the
building reconstruction procedure does not have such limitations with respect to the number of buildings or the size of the rasterized image.
Figure 5: Views of the model of the data set Bonnland.
Building walls are textured according to regional traditions
while the trees can be modeled according to season. In the
densely wooded regions (in the bottom left corner in all im-
ages of Fig. 4), trees from Sec. 5.2 with a constant diameter
are instantiated until they fill the region.
7 CONCLUSIONS AND
OUTLOOK
A robust and automatic approach for extraction of
dense 3D point clouds from several images was pre-
sented. We improved the performance of non-local methods by overcoming biases towards fronto-parallel surfaces and by achieving a more reliable reconstruction in textureless areas through the consideration of triangular meshes.
In order to obtain correct depths for pixels that either
lie outside the convex hull spanned by already avail-
able points or in triangles inconsistent with the sur-
face, non-local optimization methods can be used. Although the semi-global algorithm with 16 optimization paths, as proposed in (Hirschmüller, 2008), usu-
ally provides good results, the implementation of the
software is very flexible. New cost and aggregation
functions, but also triangular-based smoothness terms
and non-local algorithms can easily be added as addi-
tional modules.
Two applications of sequential multi-view dense
reconstruction were discussed. First we presented the
creation and visualization of dense point clouds from
several reference images. Remaining outliers were re-
moved according to the local density (accumulator)
function (3). Further integration of color and con-
fidence information will provide additional stability to the approach. The second application con-
cerns building modeling. The three-step procedure of (Gross et al., 2005), with the two modifications of DTM modeling by means of a robust cost function (L1-splines) and k-means-based normal vector clustering, also automatically processes dense point clouds obtained by passive sensors from light UAVs in Nadir view. It is thereby shown that methods for large-scale range data with homogeneously distributed samples can be adapted to relatively low-quality, sequentially obtained data of theoretically infinite length.
In the majority of cases, urban structures are recon-
structed well, as one can see from Fig. 5. To per-
form an accurate quantitative evaluation of complete-
ness and correctness of the procedure in comparison
with other procedures, such as (Rottensteiner, 2010),
reconstruction of either several high-resolution aerial
images or a larger video sequence must be performed.
These goals are currently being pursued, but they are beyond the scope of this work. Further consideration
of image information (e.g. segmentation) will be a
topic of future work. One can additionally filter out vegetation; analyzing the reference image by means of training data is the only interactive part of the ap-
proach. The trees can then be found in larger regions
of the image (sequence); their height is given by the
depth map. Here, too, future efforts must be directed at using color and gradient information in the input images as well as confidence maps for better building contour extraction and roof analysis.
REFERENCES
Baillard, C. and Zisserman, A. (2000). A plane-sweep strat-
egy for the 3D reconstruction of buildings from mul-
tiple images. ISPRS Congress and Exhibition in Ams-
terdam (Netherlands).
Bodensteiner, C., Hebel, M., and Arens, M. (2010). Ac-
curate single image multi-modal camera pose esti-
mation. Workshop on Reconstruction and Modeling
of Large-Scale 3D Virtual Environments. European
Conference on Computer Vision (ECCV).
Bulatov, D. (2008). Towards Euclidean reconstruction from
video sequences. Int. Conf. on Computer Vision Theory
and Applications (2), pages 476–483.
Bulatov, D. and Lavery, J. (2010). Reconstruction and tex-
turing of 3D urban terrain from uncalibrated monocu-
lar images using L1 Splines. Photogrammetric Engi-
neering and Remote Sensing, 75(10):439–450.
Bulatov, D., Wernerus, P., and Heipke, C. (2011). Multi-
view dense matching supported by triangular meshes.
ISPRS Journal of Photogrammetry and Remote Sens-
ing, accepted for publication.
Curless, B. and Levoy, M. (1996). A volumetric method for
building complex models from range images. Proc.
ACM SIGGRAPH, 30:303–312.
Fischer, A., Kolbe, T., Lang, F., Cremers, A., Förstner, W., Plümer, L., and Steinhage, V. (1998). Extracting
buildings from aerial images using hierarchical aggre-
gation in 2D and 3D. Computer Vision and Image
Understanding, 72(2):185–203.
Geibel, R. and Stilla, U. (2000). Segmentation of Laser-
altimeter data for building reconstruction: Compari-
son of different procedures. Int. Arch. of Photogram-
metry and Remote Sensing, 33 part B3:326–334.
Gross, H., Thönnessen, U., and v. Hansen, W. (2005). 3D-
Modeling of urban structures. Joint Workshop of IS-
PRS/DAGM Object Extraction for 3D City Models,
Road Databases, and Traffic Monitoring CMRT05,
Int. Arch. of Photogrammetry and Remote Sensing, 36,
Part 3W24:137–142.
Haala, N. (2005). Multi-Sensor-Photogrammetrie - Vision oder Wirklichkeit? Habilitation, Deutsche Geodätische Kommission, München, C589.
Harris, C. G. and Stephens, M. J. (1988). A combined corner
and edge detector. Proc. of 4th Alvey Vision Confer-
ence, pages 147–151.
Hartley, R. and Zisserman, A. (2000). Multiple view geom-
etry in computer vision. Cambridge University Press.
Hirschmüller, H. (2008). Stereo processing by semi-global
matching and mutual information. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
30(2):328–341.
Kobbelt, L. and Botsch, M. (2004). A survey of point-
based techniques in computer graphics. Computers
& Graphics, 28(6):801–814.
Lin, Y.-M., Zhang, W., Wang, Y., Fang, S.-C., and Lavery, J.
E. (2006). Computationally efficient models of urban
and natural terrain by non-iterative domain decompo-
sition with l1-smoothing splines. Proc. 25th Army Sci-
ence Conf., Department of the Army, Washington DC,
USA.
Mayer, H. (1999). Automatic object extraction from aerial
imagery – A Survey focusing on buildings. Computer
Vision and Image Understanding, 74(2):139–149.
Mayer, H. and Bartelsen, J. (2008). Automated 3D re-
construction of urban areas from networks of wide-
baseline image sequences. The Int. Arch. of the Pho-
togrammetry, Remote Sensing and Spatial Informa-
tion Sciences, 37, Part B5:633–638.
Pock, T., Zebedin, L., and Bischof, H. (2011). TGV-fusion. Rainbow of Computer Science. Springer-Verlag, 6570/2011:245–258.
Rottensteiner, F. (2010). Roof plane segmentation by com-
bining multiple images and point clouds. Proc. of
Photogrammetric Computer Vision and Image Analy-
sis Conference, Int. Arch. of Photogrammetry and Re-
mote Sensing, 38, Part 3A:245–250.
Scharstein, D. and Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7–42. Images and ground truth can be downloaded at: http://bj.middlebury.edu/schar/stereo/data/Tsukuba/.