Real-Time Tree Extraction and Rendering
Based on Seasonal Large Scale Aerial Pictures
Timo Ropinski, Frank Steinicke, Jennis Meyer-Spradow and Klaus Hinrichs
Institut für Informatik, Westfälische Wilhelms-Universität Münster, Germany
Keywords: 3D city visualization, treetop extraction, massive texture rendering, aerial photography.
In this paper we introduce visualization techniques for massive multimodal texture datasets. These techniques
work on registered texture datasets and have been developed with the goal of improving the rendering of these
datasets when used in virtual landscape or city environments. We briefly discuss how to render these multimodal
datasets at interactive frame rates, and we present an interactive treetop extraction technique, which
allows segmenting treetops and visualizing them as 3D objects to increase realism.
With the widespread use of geographic information
systems and the increasing availability of image acquisition
systems, the amount of generated geo-spatial
data has increased dramatically in recent years.
Systems like Google Earth have demonstrated that the
interactive exploration of these datasets finds many
application areas in the domains of tourist informa-
tion, way finding, climatology and many more. Two
aspects of applications supporting the exploration of
geo-spatial datasets are interactivity and visual repre-
sentation. Usually there is a trade-off between these
two aspects, since in general visually appealing representations
require more complex computations, which
reduce frame rates and thus interfere with
the interactive experience of the user. In this paper
we will propose interactive visualization techniques
for large scale raster data which generate appealing
visual representations at interactive frame rates.
Since nowadays more than one aerial image is available
for many locations, multiple information layers
can be taken into account for the presentation, possibly
showing a location during different seasons and/or
weather conditions. Since the information required
for a location may be contained in different aerial im-
ages, novel visualization techniques are needed to in-
teractively analyze and fuse this multimodal content.
We have developed techniques which support the in-
teractive exploration of multiple registered large scale
aerial images, each having a size of possibly several
gigabytes. In particular we will show how to combine
the information stored in two registered aerial images
in order to achieve a more appealing visual represen-
tation in geo-virtual environments. Our techniques
use a summer and a winter aerial image to extract tree-
tops during runtime. These extracted 2D treetops are
extended to 3D and visualized as objects of the geo-
virtual environment.
All concepts proposed within this paper have been
tested using two registered aerial image datasets (see
Figure 1). The datasets cover a 21km × 24km area
around the City of Münster at a resolution of 10cm ×
10cm. They have been acquired in August 2001 and
in January 2005; in the following we will simply refer
to them as summer texture resp. winter texture. Although
we deal with two large scale texture datasets,
having sizes of 4.33GB resp. 7.80GB, all techniques
proposed in this paper can be applied during runtime
at interactive frame rates. Thus it is possible to inte-
grate them into client side applications streaming data
from map servers without requiring interface adaptations
with the data provider.
Ropinski T., Steinicke F., Meyer-Spradow J. and Hinrichs K. (2007).
AUTOMATIC INTEGRATION OF FOLIAGE INTO 3D CITY MODELS - Real-Time Tree Extraction and Rendering Based on Seasonal Large Scale Aerial Pictures.
In Proceedings of the Second International Conference on Computer Graphics Theory and Applications - GM/R, pages 299-304.
DOI: 10.5220/0002081402990304
Figure 1: Parts of the summer (a) and the winter (b) texture.
Several techniques have been developed with the goal
of achieving interactive frame rates when rendering
large scale texture datasets (Hua et al., 2004; Broder-
sen, 2005). These techniques mainly differ in the
supported feature set as well as in implementation
and runtime complexity. To achieve interactive frame
rates when rendering multimodal texture datasets, we
have decided to implement the clipmapping technique
originally proposed by Tanner et al. (Tanner et al.,
1998). To exploit clipmapping for our visualization
techniques we have extended the implementation to
maintain more than one clipmap hierarchy, i.e., in
our case one for each modality. For a more realis-
tic representation aerial textures need to be projected
onto terrain models. Therefore we have integrated
the geometry clipmap proposed by Asirvatham and
Hoppe (Asirvatham and Hoppe, 2005) into our terrain
rendering engine.
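The extension to multiple registered hierarchies can be sketched as follows. This is a minimal illustration in which a per-level window center is the only state tracked; all class and method names are hypothetical:

```python
class ClipmapHierarchy:
    """One clipmap hierarchy: each level caches a fixed-size window
    of its mip level, described here only by the window center."""

    def __init__(self, num_levels):
        self.centers = [(0.0, 0.0)] * num_levels

    def recenter(self, level, center):
        self.centers[level] = center


class MultimodalClipmaps:
    """One clipmap hierarchy per registered modality (e.g. summer
    and winter), addressed through the same machinery."""

    def __init__(self, modalities, num_levels):
        self.hierarchies = {m: ClipmapHierarchy(num_levels) for m in modalities}

    def recenter_all(self, level, center):
        # registered datasets share the same world coordinates,
        # so all hierarchies are recentered identically
        for h in self.hierarchies.values():
            h.recenter(level, center)
```

Keeping the hierarchies in lockstep like this is what allows the segmentation stage to fetch corresponding summer and winter texels for any position.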
Nowadays remote sensing technologies allow an
exact classification of certain usage areas (Castel
et al., 2001; Lee et al., 2004; Schlerf and Atzberger,
2006). Although, it is possible to extract forest cov-
erage as well as treetops, most of the techniques ei-
ther require additional registered LIDAR (= light de-
tection and ranging) datasets or are not applicable in
Many algorithms for the interactive rendering of
trees and foliage have been presented in the last years.
Since this paper addresses neither natural modeling
of trees nor specialized LoD representations, we
simply refer to (Lluch et al., 2003; Deussen et al.,
2004) for an overview of some approaches.
An important aspect of LoD rendering techniques is
the determination of the appropriate LoD for a given
point or region in space. Since the LoD is view-dependent,
the view frustum defined by the camera’s
position and orientation as well as further properties
such as the associated clipping planes have to be con-
sidered when determining the correct LoD. Since Tan-
ner et al. (Tanner et al., 1998) have not covered LoD
determination in their description of clipmapping, we
will describe our approaches to find the correct LoD
for a given point or region in space. We distinguish
between two different cases of camera control in or-
der to provide an optimal arrangement of clipmapping
textures: (a) in the 2D case, the view direction of the
camera is orthogonal to the ground plane, whereas (b)
in the 3D case, the camera is arbitrary. The goal is to
arrange the clipmapping textures in such a way that
most of the ground plane within the view frustum is
covered with textures of the highest quality. For sim-
plicity, we reduce the region for which a LoD has to
be determined to a single point in three-dimensional
space, which is located on the ground plane. We apply
a function height(x, y) that returns the terrain height at
the position (x, y). In the following, cmc denotes this
optimal clipmap center position in 3D world coordinates.
3.1 3 DoF Top-View Camera
For the simple case in which the camera is constrained
to three degrees of freedom (DoF), i.e., the viewer explores
the virtual environment from a top-view perspective
where the view direction is focussed along
the negative z-axis towards the ground plane, the camera
position p = (p_x, p_y, p_z) is the only camera attribute
that has to be considered. We define the function

lod(p) = min({l | x_l^min ≤ p_x ≤ x_l^max ∧ y_l^min ≤ p_y ≤ y_l^max})

that returns the appropriate LoD for p, i.e., the level of
the clipmap hierarchy, such that lod(p) ∈ [0, l_max], where
0 resp. l_max is the index of the clipmap level representing
the highest resp. lowest resolution, and the clipmap at
level k covers the area reaching from (x_k^min, y_k^min)
to (x_k^max, y_k^max).
With these prerequisites the determination of the
correct LoD can be reduced to the calculation of the
optimal clipmap center cmc for a given camera con-
figuration. We assume that the base plane of the
height field onto which the aerial photograph has to
be projected is always parallel to the xy-plane, having
an averaged surface normal ~n = (0, 0, −1). Hence,
assuming that the viewer focuses on the center of
the viewport, we assign the clipmap center
cmc_l = (p_x, p_y, height(p_x, p_y)) for all levels l of the clipmap
hierarchy, and we can determine the correct LoD for
each camera position p. Thus all clipmap textures are
arranged concentrically. The result is shown in Figure 2(a),
where red, green and blue are used to color
the first three levels of the clipmap hierarchy.

Figure 2: Visualization of different LoD determinations for (a) a camera centered 3 DoF view, (b) a camera centered 6 DoF view, and (c) our technique in a 6 DoF view. The LoDs are color coded: highest/red, middle/green, lowest/blue.
3.2 6 DoF Camera
Calculation of the optimal clipmap center in the 6
DoF case is more complicated. While in the aforementioned
case we have assumed the optimal clipmap
center to be cmc_l = (p_x, p_y, height(p_x, p_y)), i.e., the
orthogonal projection of the camera position onto the
ground plane, such a clipmap center is not sufficient for
the general 6 DoF case.
As can be seen in Figure 2(b), a camera setup
with a large angle between the view direction and
~n = (0, 0, −1) results in large areas of the highest
LoD that cannot be seen by the user because they lie
outside the view frustum. Since areas in front of the
user are perceived best, low resolution in these areas
results in unsatisfying visualizations. Thus the strat-
egy to display the highest LoD in the area around the
viewport center as done in the simple top-view case is
not sufficient for the general case.
For a sophisticated visualization when using a 6
DoF camera we have to extend the clipmapping tech-
nique to allow different clipmap centers for the levels
of the hierarchy. We redefine the set of clipmap centers
cmc_l, where l again denotes the increasing index
of a level in the clipmap hierarchy, without the constraint
of concentric clipmap levels, and we perform a
clipmap center calculation with the goal of maximizing
the areas in image space showing the highest LoD.

Figure 3: Schematic side view of the view frustum showing the configuration of clipmaps on the ground plane.
Figure 3 illustrates the procedure when arranging
the different clipmap textures. As illustrated, we assume
that the angle α between the camera’s view direction
~d and ~n is between 0 and 90 degrees. Since in
this case the area which is projected directly above
the bottom plane of the view frustum can be perceived
best by the user, we have to assure that the
highest level of the clipmap hierarchy is presented in
this area. Therefore we have to determine the edge
formed by the intersection of the ground plane and
the bottom plane of the view frustum, which is determined
by b in the near plane, where t denotes the distance
to the top of the field of view in the near plane
(see Figure 3). This edge can be calculated by using
the angle α as well as the ground vector ~gvec from
(p_x, p_y, height(p_x, p_y)) to the intersection point hit of
the view direction ~d with the height field. The length of ~gvec
is given by the distance from (p_x, p_y, height(p_x, p_y))
to the corresponding edge of the view frustum using
b (see Figure 3). The center of this edge is used
to position the clipmap with the highest quality at cmc_0.
Moreover, we add an offset defined by half of the
clipmap texture’s size to shift the texture along the
ground vector ~gvec in order to ensure that the user can
always see the entire clipmap. The next clipmap centers
cmc_1, cmc_2, etc. are shifted analogously, as illustrated
in Figure 3. As depicted, the clipmap textures
do not overlap concentrically, but the front edges of the
clipmap textures, i.e., those closest to the viewer, coincide.
Hence, with decreasing distance between the viewer
and the virtual environment displayed on the ground
plane, the quality increases up to the highest resolu-
tion shown directly in front of the user.
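The clipmap center placement for the 6 DoF case can be sketched as follows. The sketch assumes a flat ground plane at z = 0 (a height-field intersection would replace the ray/plane test), a view direction pointing downward with 0 ≤ α < 90 degrees, and, for simplicity, anchors the shared front edge at the camera's ground footprint rather than at the exact bottom frustum edge; all names are hypothetical:

```python
import math

def clipmap_centers(p, d, sizes, threshold_deg=15.0):
    """Place one clipmap center per level for a free 6 DoF camera.

    p: camera position (px, py, pz); d: normalized view direction,
    pointing downward (dz < 0); sizes: world-space edge length of
    each clipmap level, finest first.
    """
    px, py, pz = p
    dx, dy, dz = d
    # angle between the view direction and the downward normal (0, 0, -1)
    alpha = math.degrees(math.acos(max(-1.0, min(1.0, -dz))))
    foot = (px, py)  # orthogonal projection of the camera onto the ground
    if alpha < threshold_deg:
        # near top-down view: concentric arrangement around the footprint
        return [foot for _ in sizes]
    # intersect the view ray with the ground plane to get the hit point
    t = -pz / dz
    hit = (px + t * dx, py + t * dy)
    # unit ground vector from the footprint towards the hit point
    gx, gy = hit[0] - foot[0], hit[1] - foot[1]
    glen = math.hypot(gx, gy)
    gx, gy = gx / glen, gy / glen
    # shift each level by half its size along the ground vector so that
    # the front edges of all levels coincide in front of the viewer
    return [(foot[0] + gx * s / 2.0, foot[1] + gy * s / 2.0) for s in sizes]
```

Subtracting half the level size from each returned center along the ground vector recovers the shared front edge, mirroring the arrangement in Figure 3.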
As explained in Section 3.1, when the user focuses
on a region orthogonally under her position, we use
the concentric arrangement of the clipmap textures
around the hit point hit = (p_x, p_y, height(p_x, p_y)).
Therefore, when the angle α gets smaller than a certain
threshold, we switch to the approach described in
Section 3.1. We achieved the best results for a threshold angle
of about 15 degrees (see Figure 2(c)).

Figure 4: Two images showing how trees are integrated in today's 3D city models: (a) trees projected onto the ground, (b) a 3D model rendered on top of the projected trees.
When combining 2D aerial pictures with 3D city
models, a common problem is that real objects
contained in the aerial picture are visualized as projections
on the ground as long as no 3D model is provided.
In cases where a 3D model is visualized
at the appropriate position, this may not be an issue,
since the projection shown on the aerial picture is either
not visible because it is occluded by the model,
or the aerial picture is projected onto the model, providing
it with a realistic coloring. The second alternative
is commonly used in 3D city models in order
to texture buildings. However, in the areas where no
3D geometry is present, the projections on the aerial
pictures look unrealistic. Especially when using aerial
pictures containing many trees, this issue becomes
obvious and results in a very unnatural appearance
(see Figure 4(a)). The commonly used strategy to deal
with this problem is to place a 3D object, in this case
a tree, at the appropriate position of the aerial picture
(see Figure 4(b)). Unfortunately when using 3D tree
models, in general the trunk does not cover the whole
area of the aerial picture which is covered by the tree
as seen from above. With the technique presented in
this section it is possible to visualize trees that do
not have to be positioned manually on the aerial picture.
To achieve this we perform an automated tree
segmentation on the fly and visualize the trees in an
additional rendering pass.
4.1 Tree Segmentation
In order to visualize the trees contained within a 3D
city model without requiring manual user interac-
tions, a tree segmentation is needed. With the seg-
mentation presented in this subsection we were able
to extract the parts of the textures which correspond
to deciduous trees. The idea is to simply compare the
content of two registered textures showing the same
area of interest during different seasons, i.e., summer
and winter. As can be seen in Figure 1, the summer
texture is visually more appealing, while the corresponding
winter texture does not contain the treetops,
since the leaves are gone in winter. Thus we can
assume that a texel t_s in the summer texture represents
part of a treetop if the following condition is satisfied:

max(t_s) = g ∧ max(t_w) ≠ g.

Here t_s denotes the texel in the summer texture, t_w
the corresponding texel in the winter texture, g the
green channel, and max(t) returns the color channel with the maximum
intensity. Although this looks quite easy in
theory, in practice several problems arise. The main
problems are:
- Because lighting is present in both aerial pictures,
treetops are shaded and do not contain a green hue
across their whole extent.
- Since the registered aerial pictures have been acquired
during different seasons and possibly during
different times of the day, the lighting conditions
may differ drastically.

Due to the first issue, no sufficient segmentation
can be obtained when using a simple pixel-based
comparison to identify treetop areas (see Figure 5(a)).
Thus we have extended the tree segmentation not to
process a texel t in isolation, but to also consider
its k × k neighborhood N_k(t), k odd. Thus a texel is
identified as a treetop if the following assertion evaluates to true:

|{t' ∈ N_k(t_s) | max(t') = g}| > k²/2  ∧  |{t' ∈ N_k(t_w) | max(t') ≠ g}| > k²/2.
Obviously the size k of the filter mask is dependent on
the resolution of the aerial picture as well as the light-
ing conditions. For our aerial pictures having a res-
olution of 10cm × 10cm we achieved a good tradeoff
between segmentation results and frame rate, when
using a filter covering 3 × 3 texels (see Figure 5(b)).
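The neighborhood-based classification can be sketched as follows; with k = 1 it degenerates to the simple pixel-based comparison. The majority threshold and all names are assumptions of this sketch:

```python
def max_channel(texel):
    """Index of the channel with maximum intensity: 0 = r, 1 = g, 2 = b."""
    return max(range(3), key=lambda c: texel[c])

def is_treetop(summer, winter, x, y, k=3, green=1):
    """Classify texel (x, y) as part of a treetop: green must dominate
    in the majority of its k x k summer neighborhood and must not
    dominate in the majority of the corresponding winter neighborhood.

    summer/winter are registered 2D grids of (r, g, b) texels; k odd.
    """
    h, w = len(summer), len(summer[0])
    r = k // 2
    green_summer = not_green_winter = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # clamp the neighborhood at the texture borders
            nx = min(max(x + dx, 0), w - 1)
            ny = min(max(y + dy, 0), h - 1)
            if max_channel(summer[ny][nx]) == green:
                green_summer += 1
            if max_channel(winter[ny][nx]) != green:
                not_green_winter += 1
    return green_summer > k * k // 2 and not_green_winter > k * k // 2
```

In the actual system this test runs per fragment on the GPU; the sketch only illustrates the logic.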
To address the second problem we have calculated
a histogram equalization which can be accessed when
fetching the textures. This can be done in real-time,
and thus we can ensure that the original image data
does not have to be modified, as would be necessary
when applying the more time-consuming lighting
estimation techniques performed in a preprocessing
step. Figure 5(c) shows the segmentation results
we achieved when combining the neighborhood
average operator with this color transformation.

Figure 5: Application of different tree segmentation strategies: simple pixel-based segmentation, averaged neighborhood segmentation, and averaged neighborhood segmentation with color transformation (from left to right).
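The per-channel histogram equalization used to compensate for the differing lighting conditions can be sketched as follows; a GPU implementation would apply the same lookup table during texture fetching. Names are hypothetical:

```python
def equalize(channel, levels=256):
    """Histogram-equalize one color channel, given as a flat list of
    intensities in [0, levels). Returns the remapped channel without
    modifying the original image data."""
    hist = [0] * levels
    for v in channel:
        hist[v] += 1
    # cumulative distribution function of the intensities
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    n = len(channel)
    # lookup table: map each intensity through the normalized CDF
    lut = [round((levels - 1) * c / n) for c in cdf]
    return [lut[v] for v in channel]
```

Because only the lookup table differs between the two aerial images, the comparison in the segmentation step can operate on equalized values at no extra storage cost.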
We achieved good segmentation results by using
the described method. However, there are also draw-
backs. One problem obviously arises when a non-
treetop texel is green in the summer texture and of
different color in the winter texture. This may be the
case for green cars and other green objects present in
the summer texture but not in the winter texture or
for green objects colored differently in the winter tex-
ture due to the lighting conditions. This problem can
be solved to a certain extent by increasing the size k
of the neighborhood used during the color averaging
process. However, increasing the neighborhood re-
sults in less performance. Another problem comes up,
when coniferous trees are present. Since these trees
are green all year round, we are not able to segment
them. Thus in cases where a more robust segmentation
is required, an offline method has to be used.
4.2 Visualizing Trees
Using the segmentation technique described above,
we can determine for each aerial image texel whether
it belongs to a treetop or not. In the visualization process
we use the winter texture, which contains no tree-
tops, for the ground, and we send some extra geom-
etry down the rendering pipeline in order to render
the treetops extracted from the summer texture. For
rendering of the ground with the winter texture we
use a GPU-based implementation of the geometry
clipmaps (Asirvatham and Hoppe, 2005). In the fol-
lowing we will describe the approach for rendering
the trees.
To get a realistic appearance for the general 6 DoF
case, we use a layered geometry consisting of several
stacked geometry clipmaps. Each geometry clipmap
is elevated to a certain height, while the average altitude
of all these clipmaps represents the average tree
height in the geo-virtual environment.
For processing each clipmap of the clipmap stack,
we can either compute the tree extraction in an ini-
tial rendering pass, or we can vary the segmentation
process for each geometry clipmap. While the first
approach allows higher frame rates, the second approach
results in more convincing visual representations.
A reasonable approach for achieving realistic
visualizations is to alter the size k of the neighbor-
hood area based on the current height during the seg-
mentation process (see Subsection 4.1). Increasing
k with increasing distance from the ground results in
more naturally shaped treetops. Furthermore, to enhance
realism, we inserted a slight texture coordinate
shift when rendering each geometry clipmap.
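The per-layer variation of the filter size k and the texture coordinate shift can be sketched as follows; the growth rate, the jitter magnitude and all names are assumptions of this sketch:

```python
import random

def render_tree_layers(num_layers, base_k=3, seed=42):
    """Per-layer parameters for the stacked geometry clipmaps that
    form the treetops: each layer gets an odd segmentation filter
    size k that grows with its height above the ground, plus a small
    texture coordinate jitter."""
    rng = random.Random(seed)
    layers = []
    for layer in range(num_layers):
        k = base_k + 2 * layer  # stricter segmentation higher up: 3, 5, 7, ...
        jitter = (rng.uniform(-0.5, 0.5) / 512.0,
                  rng.uniform(-0.5, 0.5) / 512.0)  # slight texel-scale shift
        layers.append({"k": k, "uv_offset": jitter})
    return layers
```

A stricter (larger-k) test passes for fewer texels, so upper layers shrink towards the crown center, which is what rounds the treetops.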
A major drawback of the described technique for
rendering trees is the fact that we only render the foliage
and not the trunks of the trees. While this is irrelevant
for bird's-eye perspectives (see Figure 6(a) and
Figure 6(b)), a different perspective would make the
lack of trunks obvious. Thus in cases where arbitrary
perspectives are required, we combine our visualiza-
tion technique with the rendering of tree trunks (see
Figure 6(c)). However, positioning these trunks is a
critical aspect possibly involving user interaction, although
for areas where cadastral data is available, the
positions can be extracted automatically.
Figure 6: Visualizations of the extracted trees, without and with additional trunks inserted: (a) and (b) trees visualized from a top view, (c) additionally integrated trunks.
In this paper we have presented an efficient imple-
mentation which supports the usage of multimodal
clipmaps on standard graphics hardware. Based on
this clipmapping implementation we have presented
a new approach for estimating the clipmap center to
determine the LoD for the general 3D case. Due
to the efficiency of the proposed implementation we
were able to develop interactive visualization tech-
niques which improve realism as well as exploration
of interactive geo-virtual environments. We have presented
an interactive treetop segmentation technique
which extracts treetops from aerial images and visualizes
them as 3D elements.
In the future it should be investigated if more ro-
bust offline segmentation algorithms may improve the
results and how some of the segmented data can be
stored with the aerial pictures.
The authors would like to thank the reviewers for their
valuable comments. Furthermore we thank the City of Münster
for providing the texture data and the cadastral
data, as well as the students contributing to the 3D city
visualization project.
Asirvatham, A. and Hoppe, H. (2005). Terrain rendering
using GPU-based geometry clipmaps. In GPU Gems 2.
Brodersen, A. (2005). Real-time visualization of large tex-
tured terrains. In GRAPHITE ’05: Proceedings of the
3rd international conference on Computer graphics
and interactive techniques in Australasia and South
East Asia, pages 439–442, New York, NY, USA.
ACM Press.
Castel, T., Beaudoin, A., Floury, N., Toan, T. L., Caraglio,
Y., and Barczi, J. (2001). Deriving forest canopy pa-
rameters for backscatter models using the amap ar-
chitectural plant model. IEEE Transactions on Geo-
science and Remote Sensing, 39(3):571–583.
Deussen, O., Ebert, D. S., Fedkiw, R., Musgrave, F. K.,
Prusinkiewicz, P., Roble, D., Stam, J., and Tessendorf,
J. (2004). The elements of nature: interactive and
realistic techniques. In SIGGRAPH ’04: ACM SIG-
GRAPH 2004 Course Notes, page 32, New York, NY,
USA. ACM Press.
Hua, W., Zhang, H., Lu, Y., Bao, H., and Peng, Q. (2004).
Huge texture mapping for real-time visualization of
large-scale terrain. In VRST ’04: Proceedings of the
ACM symposium on Virtual reality software and tech-
nology, pages 154–157, New York, NY, USA. ACM Press.
Lee, K.-S., Cohen, W. B., Kennedy, R. E., Maiersperger,
T. K., and Gower, S. T. (2004). Hyperspectral ver-
sus multispectral data for estimating leaf area index
in four different biomes. Remote Sensing of Environ-
ment, 91(3-4):508–520.
Lluch, J., Camahort, E., and Vivó, R. (2003). Pro-
cedural multiresolution for plant and tree rendering.
In AFRIGRAPH ’03: Proceedings of the 2nd interna-
tional conference on Computer graphics, virtual Real-
ity, visualisation and interaction in Africa, pages 31–
38, New York, NY, USA. ACM Press.
Schlerf, M. and Atzberger, C. (2006). Inversion of a forest
reflectance model to estimate structural canopy vari-
ables from hyperspectral remote sensing data. Remote
Sensing of Environment, 100(3):281–294.
Tanner, C. C., Migdal, C. J., and Jones, M. T. (1998). The
clipmap: a virtual mipmap. In SIGGRAPH ’98: Pro-
ceedings of the 25th annual conference on Computer
graphics and interactive techniques, pages 151–158,
New York, NY, USA. ACM Press.