Image based 3D Reconstruction in Cultural Heritage Preservation
Alessandro Cefalu, Mohammed Abdel-Wahab, Michael Peter, Konrad Wenzel and Dieter Fritsch
Institute for Photogrammetry, University of Stuttgart, Geschwister-Scholl-Straße 24D, Stuttgart, Germany
Keywords: 3D Reconstruction, Structure-from-Motion, Dense Image Matching, Cultural Heritage.
Abstract: Documentation of the current state of an object is often the first crucial step in cultural heritage
preservation. Especially for large-scale objects such as buildings, this task becomes complex and time-consuming.
Hence, there is a growing interest in new, more efficient techniques which ease the process and reduce the
financial impact of surveying actions. In the case of façade restoration, experts need to map damages and plan
the corresponding measures before the actual restoration can take place. Here, two-dimensional CAD
drawings depicting each single stone serve as a basis. Traditionally, these plans are derived from classical
surveying. Often a photogrammetric approach is chosen to reduce the effort on site. Still, image
processing, including image registration and point measurements, is carried out manually, point by point.
For about three years, our institute has supported the introduction of modern image processing tool chains to
the application field of heritage preservation. Recently we participated in the restoration of the tower
facades of the St. Martin dome in Rottenburg/Neckar, Germany. We combined laser scans, terrestrial
imagery and images captured from a UAV platform, incorporating structure-from-motion, dense image
matching, point cloud registration and production of orthographic projections, from which the CAD
drawings could be derived.
1 INTRODUCTION
Cultural heritage preservation has received
increasing attention in recent years, as there is a
growing awareness of the need to maintain monuments
and other artefacts for future generations. As an
initial step, most preservation actions include a
documentation of the current state of the object of
interest. In the case of façade restoration,
standard surveying methods are usually chosen to
record the façade's geometry. Enough points need to
be measured to allow a mapping of each single stone
and of other details important for the restoration
task. From these measurements 2D CAD drawings are
derived, which in turn enable civil or structural
engineers and architects to map damages, plan
corresponding countermeasures and estimate costs.
Tachymetry, as a classical surveying method,
provides very accurate but only point-wise
measurements, each of which needs to be triggered
manually. Thus laser scanning is often preferred due
to its high measurement density and fast acquisition
rate. Photogrammetry also provides fast on-site
acquisition. When used in a modern, highly
automated manner, incorporating techniques such as
structure-from-motion and dense image matching, it
can provide results comparable to laser scanning at
a much lower hardware cost.
Admittedly, it has the drawback of being a
triangulating measurement technique, so a point in
space must be observed from more than one station.
However, a camera station can be changed without
much effort. In fact, the camera can be mounted on a
moving platform such as a crane or a hoisting
platform and reach areas of a building which are not
accessible from the ground. Due to its relatively
low weight it can also be carried by small UAVs,
which improves the flexibility of the approach even
further.
Since both laser scanning and dense image matching
observe arbitrary points in space with a high
density, orthographic projections (orthophotos) of
the data are an adequate basis for the final CAD
drawings.
For more than three years, our institute has
promoted the introduction of modern image processing
strategies into the field of heritage preservation
through practical application of internal and
external developments. Within the presented project,
which aims at the restoration of the facades of the
St. Martin dome in Rottenburg am Neckar, Germany,
we took over the task of documenting the status quo.
Cefalu A., Abdel-Wahab M., Peter M., Wenzel K. and Fritsch D.
Image based 3D Reconstruction in Cultural Heritage Preservation.
DOI: 10.5220/0004475302010205
In Proceedings of the 10th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2013), pages 201-205
ISBN: 978-989-8565-70-9
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
2 DATA ACQUISITION
The dome is located in the city's historical centre
and is mostly surrounded by dense and irregular
housing. The dome's tower rises from the dome's
southern roof surface and reaches roughly 70 m in
height. The facades have a width of roughly 7.50 m.
Regardless of the measurement technique used, the
few possible terrestrial survey stations are mostly
found in the surrounding alleys and always suffer
from a large distance, a poor viewing angle or
occlusion. Although there is a market square west of
the dome which provides a lot of space, the dome's
entrance strongly occludes the view of the west side
of the tower (Figure 1).
Figure 1: View of the west side of the dome from the
market square.
As indicated before, dense image matching was to be
used for as many parts of the tower as possible. The
orientation of the images was computed without any
a priori information by means of
structure-from-motion algorithms. To guarantee high
coverage and quality of the derived point clouds,
several aspects need to be taken into account. First,
the image data set needs to provide enough overlap
between the images to enable a stable orientation of
the camera stations. Here, a minimum of 60% is
usually chosen. Second, this overlap also needs to
meet the requirements of dense image matching. Here,
a higher overlap and hence greater similarity of the
images is preferable. This results in a trade-off
between matching completeness and triangulation
accuracy, because a higher overlap means a shorter
base between stations and therefore smaller
intersection angles, for which triangulation
accuracy gets worse. We usually try to provide an
overlap of 80%, which is a good compromise.
Additionally, we make use of the given redundancy
by applying multi-view-stereo triangulation to our
data sets.
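The trade-off can be made tangible with the textbook normal-case stereo relation, in which the depth standard deviation of a single image pair is Z² / (B · c) times the parallax measurement precision (Z: object distance, B: base, c: focal length). The following Python sketch is only an illustration under simplifying assumptions (parallel viewing directions, one stereo pair, overlap measured along the base direction). The camera values are hypothetical and are not those used in the project.

```python
def base_length(distance_m, sensor_width_mm, focal_length_mm, overlap):
    """Base between two neighbouring stations for a given overlap (normal case)."""
    # Width of the object area covered by one image at the given distance.
    footprint_m = distance_m * sensor_width_mm / focal_length_mm
    return (1.0 - overlap) * footprint_m


def depth_sigma(distance_m, base_m, focal_length_mm, pixel_size_mm, sigma_px=1.0):
    """Depth standard deviation of one stereo pair: Z^2 / (B * c) * sigma_d."""
    sigma_d_mm = sigma_px * pixel_size_mm  # parallax precision in the image plane
    return distance_m ** 2 / (base_m * focal_length_mm) * sigma_d_mm


# Hypothetical values, chosen only for illustration.
Z, SENSOR_W, FOCAL, PIXEL = 20.0, 36.0, 35.0, 0.006
for overlap in (0.6, 0.8):
    b = base_length(Z, SENSOR_W, FOCAL, overlap)
    s = depth_sigma(Z, b, FOCAL, PIXEL)
    print(f"overlap {overlap:.0%}: base {b:.2f} m, depth sigma {s * 1000:.1f} mm")
```

Under these assumptions, raising the overlap from 60% to 80% halves the base and doubles the depth uncertainty of a single pair. The redundancy exploited by multi-view triangulation is what compensates for this loss.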
Figure 2: The rich details of the tower helmet, which
should be captured from an elevated position.
In terms of object coverage, images from elevated
stations were clearly needed to obtain data for areas
that are occluded when viewed from below. Hence, we
hired an external company to collect the images for
us using an octocopter (Figure 3). We planned to
cover the tower in twelve vertical flight routes to
guarantee horizontal linkage of the four facades. The
camera was triggered every two seconds.
Unfortunately, the GPS stabilisation of the
octocopter was disturbed in lower areas by the
surrounding housing, so the pilot decided to steer
the UAV entirely manually. This, in turn, was made
difficult by wind shear around the tower. The strong
and fast steering manoeuvres, together with the high
payload, caused one of the rotors to break during the
second vertical flight route. Fortunately, the
emergency landing was carried out without injuring
anyone or damaging anything except the octocopter
itself. Nevertheless, it was not possible to complete
the flight. Since only a short time remained between
the date of receiving the flight permission and the
date of raising a scaffold around the tower, it was
not possible to organize a crane or a similar
platform to reach the upper parts of the tower
(Figure 2). Instead, we decided to obtain as much
terrestrial imagery as possible and collected very
close images from the scaffold itself, accepting the
foreseeable complications caused by a non-optimal
choice of camera stations. In addition, scaffolds are
usually built in a manner which occludes the view
from one level to the next, and the strong scale
differences between the UAV and terrestrial images on
the one hand and the images captured from the
scaffold on the other were expected to make an
image-based registration impossible. To overcome this
challenge, we additionally laser scanned the tower in
order to perform a point-cloud-based registration of
the data.
All in all, we collected laser scans from twelve
stations, roughly 1900 terrestrial and UAV images
ICINCO2013-10thInternationalConferenceonInformaticsinControl,AutomationandRobotics
202
and roughly 2400 very close images from the six
uppermost scaffold levels.
Figure 3: The octocopter collecting images at the south-
west side of the tower.
3 DATA PROCESSING
As mentioned above, our goal is to identify existing
software solutions and apply them in practice to
heritage preservation. In this case, the tasks which
we needed to perform range from image registration,
dense point measurement and registration of point
clouds to the production of orthographic projections
and the derivation of CAD drawings from the latter.
To our knowledge, there is no software package
available which can handle all these tasks in one
pipeline, but there is a variety of packages which
provide solutions for single sub-tasks. For
structure-from-motion as well as dense image matching
we applied our in-house developments. For point cloud
processing we mostly used freely available (open
source) packages.
3.1 Structure-from-Motion
This step proved to be the most sensitive part of
the process chain. We applied our internal
development (Abdel-Wahab et al., 2012) and
cross-checked it with results obtained from the free
software VisualSFM (Wu et al., 2011). The basic
workflow of both implementations consists of:
1. Keypoint detection and descriptor extraction (per
image)
2. Keypoint matching and estimation of relative
orientation (for each image pair)
3. Point tracking and connectivity computation
4. Iterative triangulation of points contained in
oriented images and resection of new images,
followed by a (small-scale) bundle adjustment
5. Final (global) bundle adjustment
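As an illustration of step 3, the following Python sketch merges pairwise keypoint matches into multi-image tracks with a union-find structure. It is a minimal sketch under assumptions: the function name `build_tracks` and the input format are hypothetical, and the source does not state how either implementation builds its tracks.

```python
from collections import defaultdict


def build_tracks(pairwise_matches):
    """Merge pairwise keypoint matches into multi-image tracks (union-find).

    pairwise_matches: iterable of ((image_a, keypoint_a), (image_b, keypoint_b)).
    Returns a sorted list of tracks, each a sorted list of (image, keypoint)
    observations. Tracks containing two different keypoints of the same image
    are dropped as inconsistent.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union the two observations of every pairwise match.
    for obs_a, obs_b in pairwise_matches:
        root_a, root_b = find(obs_a), find(obs_b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect all observations that share a root into one track.
    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)

    tracks = []
    for members in groups.values():
        images = [image for image, _ in members]
        if len(images) == len(set(images)):  # at most one observation per image
            tracks.append(sorted(members))
    return sorted(tracks)


matches = [
    (("img0", 5), ("img1", 9)),
    (("img1", 9), ("img2", 3)),
    (("img0", 7), ("img1", 2)),
    (("img1", 2), ("img0", 8)),  # links two keypoints of img0: inconsistent
]
print(build_tracks(matches))
```

The number of tracks shared by two images can then serve as a simple connectivity measure between them, for example when choosing the initial image pair.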
For completeness, it should be mentioned that in
some cases optical flow is used instead of the
extraction of distinct features. However, optical
flow is better suited to video streams, where
parallaxes between consecutive images stay small.
The selection of images, the choice of the initial
image pair and the setting of program parameters
have a strong impact on the results. While both
implementations yield comparable results, problems
regarding connectivity or orientation accuracy occur
in different parts of the dataset.
Both implementations use a 2-parameter radial
distortion model and allow either assigning one
camera model to all images or assigning a separate
model to each image. While the latter is especially
useful for cases in which images are obtained from
Internet databases or with a zoom lens, the former
option should be preferred when the images are
taken with a single camera with fixed focal length.
In our case four cameras were used, some with zoom
lenses and some with fixed focal length. Assigning
separate models per image was the most practicable
solution for us and delivered reasonable results.
Nonetheless, a more flexible assignment of camera
models is a desirable feature for future
developments, as it can be expected to make the
structure-from-motion process more robust in some
cases. The same might hold for embedding more
complex distortion models.
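For readers unfamiliar with such models, the following Python sketch shows one common form of a two-coefficient polynomial radial distortion and its numerical inversion. The exact parametrisation used by the two implementations is not stated here, so the form below, the coefficient names `k1` and `k2` and the fixed-point inversion are assumptions made for illustration.

```python
def distort(x, y, k1, k2):
    """Apply two-coefficient radial distortion to normalised image coordinates."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the model by fixed-point iteration (adequate for moderate distortion)."""
    x, y = xd, yd  # start from the distorted position
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


# Hypothetical coefficients and point, for illustration only.
xd, yd = distort(0.3, -0.2, k1=-0.1, k2=0.01)
print(undistort(xd, yd, k1=-0.1, k2=0.01))
```

With one shared camera model, a single pair of coefficients is estimated from all images. With per-image models, every image adds its own pair of unknowns to the bundle adjustment.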
However, the major drawback of many approaches to
structure-from-motion is their iterative nature.
Errors introduced in an early iteration are
propagated through the process and can lead to
drift. This is especially easy to observe in
circular camera station configurations, where it can
lead to the well-known loop-closure problem: the
propagated errors cause the linkage between images
introduced in an early iteration and images
introduced in a later iteration to be rejected. As
mentioned, this problem occurred with both software
variants, but in different parts of the data. This
might be explained by the different approaches to
evaluating connectivity and, accordingly, by
different behaviour in rejecting images and a
different order of processing the images.
Imagebased3DReconstructioninCulturalHeritagePreservation
203
The connectivity of the images yields a network
of relative orientations. While the described
technique can be understood as an attempt to find
optimal paths through this network, some recent
developments aim at solving the issue as a global
network optimization problem (Crandall et al.,
2011). The basic idea is that the concatenation of
the relative transformations along a cycle of the
network needs to result in an identity
transformation. As we see a lot of potential in this
approach, our future developments will head in this
direction.
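The cycle condition can be sketched in a few lines of Python. The example below is restricted to rotations, uses hypothetical helper names and hypothetical angles, and only measures how far a cycle deviates from the identity. It is not the optimization of Crandall et al. (2011).

```python
import math


def rot_z(angle_deg):
    """3x3 rotation matrix about the z-axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def cycle_error_deg(relative_rotations):
    """Rotation angle (degrees) of the product of relative rotations along a cycle.

    For error-free relative orientations the product is the identity and the
    returned angle is zero.
    """
    product = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for rotation in relative_rotations:
        product = matmul(rotation, product)
    trace = product[0][0] + product[1][1] + product[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


consistent = [rot_z(120.0), rot_z(120.0), rot_z(120.0)]
drifted = [rot_z(120.0), rot_z(120.0), rot_z(118.0)]
print(cycle_error_deg(consistent), cycle_error_deg(drifted))
```

A cycle whose residual angle clearly exceeds the expected noise level indicates that at least one of its relative orientations is unreliable.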
Figure 4: Sparse point cloud and stations of terrestrial and
UAV images (VisualSFM output).
Nevertheless, it was possible to process all
terrestrial and UAV images in one project and obtain
a single connected scene (Figure 4). As expected, it
was not possible to successfully include the close-up
images collected from the scaffold. Moreover, the
dense matching procedure revealed that the
registration accuracy was insufficient. Processing
smaller subsets of the data instead delivered better
results, which were co-registered in an extra step.
3.2 Dense Image Matching
The undistorted images and their orientations were
passed to our dense image matching software SURE
(Rothermel et al., 2012), which combines a variant
of semi-global matching with a multi-view stereo
triangulation step. The quality of the obtained
point clouds varies considerably. As expected, the
point clouds derived from the UAV images and those
derived from the images collected from the scaffold
(Figure 5) are of good to very good quality. In
contrast, the point clouds derived from terrestrial
images are of medium to poor quality. Nonetheless,
redundancy and the application of different filters
to the point clouds delivered a sufficient overall
data quality.
Figure 5: The six point clouds captured from the scaffold
registered into a common coordinate system.
3.3 Point Cloud Registration
In a first step, the point clouds obtained by laser
scanning were registered using the ICP (iterative
closest point) functionality provided by the free
software package MeshLab. Here, an initial rigid
body transformation is computed from a few manually
chosen tie points and is subsequently refined by the
actual ICP algorithm. The transformation can be set
to keep one of the original coordinate systems
fixed.
In a second step, a selection of the vast number
of point clouds obtained from structure-from-motion
and dense image matching was registered to the final
laser data. Here, the available scaling option was
activated to obtain metric results, since point
clouds oriented without a priori information have an
arbitrary scale.
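To illustrate what a scale estimate between two point sets involves, the following Python sketch derives the scale factor from corresponding points as the ratio of their spreads around the respective centroids, as in closed-form absolute orientation methods. It is a sketch under assumptions: correspondences are taken as known, the function names are hypothetical, and it does not reproduce MeshLab's implementation.

```python
import math


def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def spread(points):
    """Root of the summed squared distances of the points from their centroid."""
    c = centroid(points)
    return math.sqrt(sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points))


def estimate_scale(model_points, reference_points):
    """Scale that maps the spread of the model onto the spread of the reference.

    Both lists must contain corresponding points in the same order. The ratio
    does not depend on the rotation or translation between the two point sets.
    """
    if len(model_points) != len(reference_points) or len(model_points) < 2:
        raise ValueError("need at least two corresponding points")
    return spread(reference_points) / spread(model_points)


# Hypothetical image-based model and its counterpart in the metric laser frame
# (rotated by 90 degrees about z, scaled by 2.5 and shifted).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
reference = [(10.0, -3.0, 7.0), (10.0, -0.5, 7.0), (7.5, -3.0, 7.0), (10.0, -3.0, 9.5)]
print(estimate_scale(model, reference))
```

In an ICP setting the correspondences are not known in advance and are re-estimated from nearest neighbours in every iteration, so a reasonable initial alignment remains essential.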
3.4 Orthophoto and CAD Drawing
In order to produce metric orthographic projections,
a small software tool had to be developed which
allows a local coordinate system to be defined for
the projection and is capable of dealing with the
relatively high amount of noise. The density of
points is tracked along the projection ray and the
first maximum found is chosen as the correct depth.
Points in the neighbourhood of this solution are
aggregated to further reduce noise. These simple
features are usually not available in other tools.
The orthophotos served as a basis for manually
deriving the requested contour lines as a collection
of 15 CAD drawings
(Figure 6, right and Figure 7, right). Not only
orthophotos depicting the colour information (Figure
7, left) of the images were used, but also ortho-
ICINCO2013-10thInternationalConferenceonInformaticsinControl,AutomationandRobotics
204
images showing the normal vector directions (Figure
6, left) proved useful.
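The depth selection described above can be sketched for a single orthophoto cell as follows. This Python example is a simplified reading of that description and not the tool itself: the bin size, the minimum point count, the aggregation window and the assumption that the ray is traversed from the viewer towards the object are all hypothetical.

```python
def first_density_peak_depth(depths, bin_size=0.05, min_count=3, window=0.05):
    """Select the depth of one orthophoto cell from the points projecting into it.

    depths: distances of the points along the projection ray (smaller values
    are closer to the projection plane). Returns the mean depth of the points
    around the first local density maximum, or None if no bin reaches
    min_count. The window should be at least half the bin size.
    """
    if not depths:
        return None
    start = min(depths)
    n_bins = int((max(depths) - start) / bin_size) + 1
    histogram = [0] * n_bins
    for d in depths:
        histogram[int((d - start) / bin_size)] += 1

    # Walk along the ray and stop at the first sufficiently dense bin whose
    # successor is not denser, i.e. the first local maximum.
    for i, count in enumerate(histogram):
        if count < min_count:
            continue  # sparse bins are treated as noise
        next_count = histogram[i + 1] if i + 1 < n_bins else 0
        if count >= next_count:
            centre = start + (i + 0.5) * bin_size
            near = [d for d in depths if abs(d - centre) <= window]
            return sum(near) / len(near)  # aggregate neighbours to reduce noise
    return None


# One stray point, a facade cluster near 2.02 m and a background cluster near 3.03 m.
cell = [1.00, 2.01, 2.02, 2.02, 2.03, 2.04, 3.01, 3.02, 3.02, 3.03, 3.03, 3.04]
print(first_density_peak_depth(cell))
```

Choosing the first maximum rather than the global one favours the surface facing the projection plane, even when a denser structure lies behind it.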
Figure 6: Left: Highlighting geometrical contours by
colour-coding normal vector directions (eastern facade).
Right: Drawing of the south-western side of the tower
helmet.
Figure 7: Left: Orthophoto of the tower’s south façade,
derived from UAV imagery. Right: The corresponding
CAD drawing.
4 CONCLUSIONS
Although adverse conditions in the data acquisition
step made processing of the data difficult in some
respects, we can state that in general it is possible
to apply the described techniques to heritage
applications. To make the procedures applicable for
a broader range of users, there is a need to further
increase the robustness of the software solutions and
to integrate a wider range of functionalities into
the software packages. When processing point clouds,
usability is often reduced by the large amount of
data. Room for improvement can be found in all parts
of the processing chain, but improvements are
especially needed in structure-from-motion, since it
stands at the beginning of the chain and influences
all subsequent results. Nonetheless, we observe that
these techniques are becoming more and more
attractive for practical application.
REFERENCES
Abdel-Wahab, M., Wenzel, K., Fritsch, D., 2012.
Automated and Accurate Orientation of Large
Unordered Image Datasets for Close-Range Cultural
Heritage Data Recording. Photogrammetrie –
Fernerkundung – Geoinformation (PFG), Vol. 6, 2012,
pp. 679-690.
Crandall, D., Owens, A., Snavely, N., Huttenlocher, D.,
2011. Discrete-Continuous Optimization for Large-
Scale Structure from Motion. In Proceedings
Conference on Computer Vision and Pattern
Recognition (CVPR). Colorado Springs.
Rothermel, M., Wenzel, K., Fritsch, D., Haala, N., 2012.
SURE: Photogrammetric Surface Reconstruction from
Imagery. In Proceedings LC3D Workshop. Berlin.
Wu, C., Agarwal, S., Curless, B., Seitz, S. M., 2011.
Multicore Bundle Adjustment. In Proceedings
Conference on Computer Vision and Pattern
Recognition (CVPR). Colorado Springs.
Imagebased3DReconstructioninCulturalHeritagePreservation
205