DISPLACEMENT PATCHES FOR GPU-ORIENTED VIEW-DEPENDENT RENDERING

Yotam Livny, Gilad Bauman and Jihad El-Sana

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Keywords:

Level-of-detail, View-dependent rendering, Displacement mapping.

Abstract:

In this paper we present a new approach for interactive view-dependent rendering of large polygonal datasets, which relies on advanced features of modern graphics hardware. Our preprocessing algorithm starts by generating a simplified representation of the input mesh. It then builds a multiresolution hierarchy for the simplified model. For each face in the hierarchy, it generates and assigns a displacement map that resembles the original surface represented by that face. At runtime, the multiresolution hierarchy is used to select a coarse view-dependent level-of-detail representation, which is sent to the graphics hardware. The GPU then refines the coarse representation by replacing each face with a planar patch, which is elevated according to the assigned displacement map. Initial results show that our implementation achieves quality images at high rates.

1 INTRODUCTION

Polygonal meshes dominate the representations of 3D graphics models due to their compactness and simplicity. Recent advances in design, modeling, and acquisition technologies have simplified the generation of 3D models, which has led to the generation of large 3D models. These models consist of millions of polygons and often exceed the rendering capabilities of advanced graphics hardware. Therefore, there is a need to reduce the complexity of these models to match the hardware's rendering capability, while maintaining their visual appearance. Numerous algorithms have been developed to reduce the complexity of graphics models. These include level-of-detail rendering with multiresolution hierarchies, occlusion culling, and image-based rendering.

View-dependent rendering approaches change the mesh structure at each frame to adapt to the appropriate level of detail. Traditional view-dependent rendering algorithms rely on the CPU to extract a level-of-detail representation. However, within the duration of a single frame, the CPU often fails to extract the frame's geometry. In addition, communication between the CPU and the graphics hardware often forms a severe transportation bottleneck. These limitations usually result in unacceptably low frame rates.

Cluster-based multiresolution algorithms overcome the CPU limitation by subdividing the dataset into disjoint regions called clusters or patches, which are simplified independently. These algorithms manage to reduce the time required to extract an adaptive level of detail. However, the partition into patches often does not take fine object-space error into account, and the generated clusters usually fail to reduce memory requirements.

In this paper, we present a novel cluster-based multiresolution approach and an efficient view-dependent rendering algorithm. In an off-line stage, our algorithm simplifies the input model to reach a coarse representation. It then constructs a multiresolution hierarchy for the simplified model. Each face in the hierarchy includes a compact representation of a patch of the original model.

The currently available graphics hardware provides advanced functionalities, such as a programmable pipeline and vertex texturing. These features necessitate the development of new algorithms and techniques to leverage the introduced capabilities.

Our algorithm utilizes advanced graphics hardware to efficiently represent patches as displacement maps, which are generated by sampling the surface of the original model. At runtime, the CPU extracts a coarse view-dependent level-of-detail representation from the multiresolution hierarchy and sends it to the graphics hardware. Within the graphics hardware, the GPU refines each face by replacing it with a cached planar mesh, and elevating the inserted vertices using the assigned displacement map.

Livny Y., Bauman G. and El-Sana J. (2008). DISPLACEMENT PATCHES FOR GPU-ORIENTED VIEW-DEPENDENT RENDERING. In Proceedings of the Third International Conference on Computer Graphics Theory and Applications, pages 181-190. DOI: 10.5220/0001095501810190. Copyright © SciTePress.

Figure 1: The wireframe and shaded representations of a coarse front of the Armadillo model (a, b) and of the refined representation (c, d).

Our approach uses an error-guided subdivision to partition the input model into disjoint patches, whose geometry is compactly encoded using displacement maps. Processing a coarse representation on the CPU and applying mesh refinement on the GPU dynamically balance the rendering load between the CPU and the GPU, and dramatically reduce the communication load between them. Our approach manages to seamlessly stitch adjacent patches without adding extra dependencies or sliver polygons.

In the rest of this paper, we discuss closely related work. Then we present our framework, followed by implementation details and results. Finally, we summarize our work, present conclusions, and suggest potential directions for future work.

2 RELATED WORK

In this section, we briefly discuss closely related research.

View-dependent rendering schemes usually rely on multiresolution hierarchies that encode various levels of detail of the original model. Earlier approaches (Hoppe, 1997; Luebke and Erikson, 1997; De Floriani et al., 1998; Pajarola, 2001) assume that the multiresolution hierarchy fits entirely into local memory and that the extraction of adaptive levels of detail is performed by the CPU.

Several approaches accelerated view-dependent rendering by relaxing the dependency constraints used to validate split and merge operations (Kim and Lee, 2001), or by integrating occlusion culling within view-dependent rendering frameworks (El-Sana et al., 2001; Yoon et al., 2003). To handle large datasets that do not fit in local memory, several external-memory view-dependent rendering algorithms were developed (El-Sana and Chiang, 2000; DeCoro and Pajarola, 2002).

As the rendering capability of graphics hardware improves and the size of datasets increases, the extraction of appropriate levels of detail within the duration of a single frame becomes impractical for the CPU. To overcome this limitation, cluster-based approaches have been introduced (Erikson and Manocha, 2001; Cignoni et al., 2004; Yoon et al., 2004).

The advances in graphics hardware have led to the development of algorithms that inherently utilize the introduced hardware capabilities. Several approaches used the fragment processor to perform mesh subdivisions (Losasso et al., 2003; Bolz and Schröder, 2005). The vertex processor was used to interpolate different-resolution meshes in a view-dependent manner (Southern and Gain, 2003), to deform displacement maps (Schein et al., 2005), and to map relief textures onto polygonal models (Policarpo et al., 2005). Displacement maps and the fragment processor were also used to accelerate image-based rendering (Baboud and Décoret, 2006; Kautz and Seidel, 2001).

Most GPU-based level-of-detail algorithms were designed for height fields and terrain datasets (Livny et al., 2007; Cignoni et al., 2003; Wagner, 2004; Dachsbacher and Stamminger, 2004; Hwa et al., 2005; Asirvatham and Hoppe, 2005; Schneider and Westermann, 2006). Little work has been done on general 3D models. The programmable GPU and displacement maps were used to approximate general meshes (Doggett and Hirche, 2000; Guskov et al., 2000) and for adaptive real-time rendering (Ji et al., 2005; Donnelly, 2005; Hirche et al., 2004).

3 OUR APPROACH

In this section we present our framework for interactive rendering of large polygonal datasets, which leverages modern GPU capabilities, such as programmability, vertex texturing, and geometry caching. In a preprocessing stage, our algorithm simplifies the input mesh, and then generates a multiresolution hierarchy for the simplified representation. For each face f of the multiresolution hierarchy, the algorithm constructs and assigns a displacement map that resembles the patch of the original surface corresponding to f. At runtime, the hierarchy is used to select an appropriate view-dependent level-of-detail representation (of the simplified mesh), which we call the front mesh. In each frame, the front mesh is sent to the graphics hardware for rendering. Within the graphics hardware, the GPU refines the front faces by replacing each face with a cached triangular planar mesh, which we call the generic tile. Finally, the vertices of the triangular mesh are elevated according to the assigned displacement map (see Figure 3).

3.1 Terminology

Let us define the normal n_i at an internal point q_i in the triangle t as the interpolation of the three normals at its vertices, weighted according to the distance of q_i from the vertices. We shall refer to the normal n_i as the interpolated normal.

Let M be a triangular mesh, and S a simplified version of M. A vertex u_i ∈ S defines its corresponding point x_i on M by shooting a ray from u_i toward the mesh M, along the normal n_i at u_i. An edge (u_i, u_j) is projected onto the shortest polyline on M connecting the corresponding points x_i and x_j.

The triangle t_i ∈ S with vertices {u_i, u_j, u_k} determines a patch p_i on M, which is defined by the three intersection points x_i, x_j, and x_k. If the patches p_0, ..., p_l do not overlap, and every point of M is covered by exactly one patch (except along the boundary polylines), we say that S corresponds to M, and that the triangle t_i ∈ S corresponds to the patch p_i ⊆ M defined by t_i.

A patch p_i is said to be an elevation of its corresponding triangle t_i if and only if a ray shot from any point q_i ∈ t_i along the normal at q_i intersects the patch p_i at exactly one point (see Figure 2). A polygonal mesh M is an elevation of another polygonal mesh M̂ if and only if M̂ corresponds to M and every patch p_i ⊆ M is an elevation of its corresponding triangle t_i ∈ M̂. A simplification algorithm preserves elevation if the input mesh is an elevation of every approximation generated by this algorithm (see Figure 2).

3.2 Model Simpliﬁcation

Our framework requires the use of a simplification algorithm that preserves elevation to generate an initial approximation. This requirement is essential to define a correspondence between the triangles of the simplified mesh and the patches of the original mesh, where every patch is an elevation of its corresponding triangle (see Figure 3).

Figure 2: The preserve-elevation property: (a) not preserving elevation, and (b) preserving elevation.

Figure 3: An 8-face front for a sphere model, its generated subdivision, and its refined representation.

In typical view-dependent rendering, the selection of the view-dependent level-of-detail representation is performed by the CPU. However, the CPU is often not capable of traversing and updating large adaptive meshes within the duration of a single frame. For that reason, we simplify the input model to reach an initial representation and a small multiresolution hierarchy.

To generate an initial model, our algorithm uses the half-edge collapse operator, with collapses ordered by the quadric error metric (Garland and Heckbert, 1997). However, using these in conjunction, we risk obtaining a model that does not preserve elevation. Therefore, during the simplification process, we maintain a normal cone for each vertex v, which encodes the normals of its adjacent triangles and those in its subtree. A half-edge collapse is defined as valid if it does not result in a normal cone (for any affected vertex) that exceeds a half sphere. The simplification algorithm executes only valid half-edge collapses, one by one, ordered by their quadric error values. The algorithm proceeds until it reaches the target polygon count, or until no valid collapses are left. The resulting mesh is used as the initial model for the construction of the multiresolution hierarchy. To generate a multiresolution hierarchy for the simplified mesh, one can use any of the previously developed schemes (see Section 2).
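As an illustration of the validity test (not the authors' implementation; all names are ours), the per-vertex normal cone can be maintained with a standard bounding-cone merge, and a collapse rejected as soon as any affected cone reaches a half sphere:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def merge_cones(axis_a, angle_a, axis_b, angle_b):
    """Merge two normal cones (unit axis, half-angle in radians) into one
    cone that bounds both; returns (axis, half_angle)."""
    d = math.acos(max(-1.0, min(1.0, dot(axis_a, axis_b))))
    if angle_a >= d + angle_b:          # cone A already contains cone B
        return axis_a, angle_a
    if angle_b >= d + angle_a:          # cone B already contains cone A
        return axis_b, angle_b
    angle = 0.5 * (d + angle_a + angle_b)   # spans both extreme directions
    t = (angle - angle_a) / d               # slide the axis toward axis_b
    axis = normalize(tuple((1.0 - t) * a + t * b
                           for a, b in zip(axis_a, axis_b)))
    return axis, angle

def collapse_is_valid(affected_cones):
    """A half-edge collapse is valid if no affected vertex ends up with a
    normal cone exceeding a half sphere (half-angle >= 90 degrees)."""
    return all(angle < math.pi / 2 for _axis, angle in affected_cones)
```

In this sketch, each candidate collapse would merge the cones of the merged vertex's neighborhood and subtree and call collapse_is_valid on the result.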

Our approach requires preserving the elevation property between consecutive simplification steps, which means that the mesh M_i should be an elevation of the mesh M_{i-1}. In such a scheme, the elevation property may not be preserved with respect to the original model, as that requirement may prevent reaching the necessary coarse resolutions.

3.3 Generic Tile Structure

The generic tile is a triangle uniformly subdivided into smaller triangles such that the number of vertices along each of its edges is the same. We shall refer to the number of vertices along an edge of a generic tile as the degree of the tile. A generic tile of degree k includes (k-1)^2 triangles and k(k+1)/2 vertices.
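The tile's structure can be sketched as follows (an illustrative construction, not the authors' code; the row-major lattice order is our choice). Each lattice vertex is stored as barycentric weights, and counting one row of i + 1 points per row i yields k(k+1)/2 vertices and (k-1)^2 triangles, matching the triangle counts in Table 2:

```python
def generic_tile(k):
    """Build a generic tile of degree k (k vertices along each edge):
    vertices as barycentric weights, triangles as vertex-index triples."""
    assert k >= 2
    verts, index = [], {}
    for i in range(k):                  # row i carries i + 1 lattice points
        for j in range(i + 1):
            index[(i, j)] = len(verts)
            verts.append(((k - 1 - i) / (k - 1),   # weight of corner v0
                          (i - j) / (k - 1),       # weight of corner v1
                          j / (k - 1)))            # weight of corner v2
    tris = []
    for i in range(k - 1):
        for j in range(i + 1):
            # upward triangle between rows i and i + 1
            tris.append((index[i, j], index[i + 1, j], index[i + 1, j + 1]))
            if j < i:                   # the inverted triangle between rows
                tris.append((index[i, j], index[i + 1, j + 1], index[i, j + 1]))
    return verts, tris
```

For k = 9 this produces 45 vertices and 64 triangles; for k = 33, 1024 triangles.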

The generic tile is used in two different steps of our framework: the off-line preprocessing and the real-time rendering (see Sections 3.4 and 3.6). These two steps involve a geometric transformation, mapping the generic tile to match the scale and orientation of the processed face.

Figure 4: The mapping of a generic tile onto a front face, and the mapping of a tile vertex w to T(w) on the refined front face.

We position the generic tile such that its three corner vertices are (1,0,0), (0,1,0), and (0,0,1), and the coordinates of the remaining vertices are determined accordingly.

The transformation T of the generic tile into an input triangle t is performed by mapping the vertices of the generic tile into the triangle t (refer to Figure 4). The position of each vertex in the mapped tile is calculated using Equation 1a, where w_x, w_y, and w_z are the coordinates of the tile vertex w, and v_0, v_1, and v_2 are the vertices of the triangle t. Note that in such a scheme, the corner vertices of the generic tile are mapped to the three vertices of the triangle (T([1,0,0]) = v_0, T([0,1,0]) = v_1, and T([0,0,1]) = v_2). The normals at the mapped vertices are calculated in a similar manner using Equation 1b.

T(w) = w_x · v_0 + w_y · v_1 + w_z · v_2    (1a)

N(w) = w_x · n_0 + w_y · n_1 + w_z · n_2    (1b)
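Equations 1a and 1b are plain barycentric combinations, as in this minimal sketch (function and variable names are ours). Because the result depends only on the weights and the three triangle vertices, corner tile vertices land exactly on the triangle corners:

```python
def T(w, v0, v1, v2):
    """Equation 1a: position of tile vertex w = (wx, wy, wz) mapped into
    the triangle (v0, v1, v2)."""
    return tuple(w[0] * a + w[1] * b + w[2] * c
                 for a, b, c in zip(v0, v1, v2))

def N(w, n0, n1, n2):
    """Equation 1b: interpolated normal at the mapped vertex (renormalize
    before using it as a direction)."""
    return tuple(w[0] * a + w[1] * b + w[2] * c
                 for a, b, c in zip(n0, n1, n2))
```

For example, T((0.5, 0.5, 0), v0, v1, v2) is the midpoint of the edge (v0, v1).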

3.4 Displacement Maps

In the preprocessing stage, a displacement map is assigned to each face of the multiresolution hierarchy to enable the recovery of the corresponding patch during real-time rendering. The generation of these displacement maps is performed using the generic tile.

For a generic tile of degree k, we define a displacement map that includes k(k+1)/2 values, matching the number of vertices in the generic tile. The generic tile is then mapped onto the input face, and an elevation value is computed and assigned to each vertex v_i of the mapped tile. The elevation value for v_i is determined by shooting a ray along the interpolated normal at v_i, and computing the distance between v_i and the sample point x_i on the patch. The sample point x_i is determined by the sampling method (refer to Section 3.5). In addition, we store the sampling error ε for each face, and consider it later in the real-time level-of-detail extraction.

Note that vertices along an edge e = (v_i, v_j) are determined solely by the two vertices v_i and v_j (refer to Equation 1a). As a result, refined vertices along common edges have exactly the same positions and the same normals, which provides a common polyline for every two adjacent patches (see Figure 5).

Figure 5: The refinement of the common edge of two adjacent faces results in exactly the same polyline.

3.5 Patch Sampling

Generating the displacement map for an input face is essentially a sampling of its corresponding patch. According to Shannon's sampling theorem (Shannon, 1948), faithful sampling of an input function f requires sampling at more than twice the frequency of the highest-frequency component of f. In our approach, the sampling resolution is dictated by the degree of the generic tile. Since our error-guided partition scheme usually results in patches with low local curvature, sampling artifacts are barely noticeable. However, sometimes the constructed subdivision fails to resemble the original surface, and naive sampling does not provide quality images.

To overcome the above sampling limitation, we consider the neighborhood of each sample when determining its value. We have developed two sampling algorithms – geometric-simplification guided and interpolation based – to determine the appropriate value of a sample. Both approaches compute the intersections of the rays with the patch, and define the neighborhood of a sample as the region bounded by its eight adjacent rays.

The geometric-simplification-based approach simplifies the neighborhood of an intersection point and recomputes the intersection of the ray with the resulting surface. The second intersection is taken as the sample's value. The simplification is performed only on the neighborhood, while allowing limited simplification on its boundary. To achieve faithful simplification, we use the quadric error metric to order the full edge collapses and to determine the resulting vertex positions.

Our interpolation-based sampling algorithm interpolates the triangles within the neighborhood δ of the sample point p. To compute such an interpolation, one could bilinearly interpolate the centroids c(t) of the triangles within δ, weighted by w(t), which depends on the size of the triangle t and its Euclidean distance from p (see Equation 2). The weight of a triangle is proportional to its size, and inversely proportional to its distance from p. However, it is not easy to distribute the triangle weight between the two factors – triangle size and distance from p. For this reason, in our current implementation we uniformly subdivide the neighborhood into almost equal cells. The value of each cell is defined as the average of the centroids of the triangles, or fractions of triangles, that intersect the cell, weighted by their size. Since the cells are almost the same size, we only need to consider their distance from the sample point p when interpolating the neighborhood of p.

p = ∑_{t_i ∈ δ(p)} c(t_i) w(t_i)    (2)
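Since the paper leaves the exact split of w(t) between size and distance open, the following sketch assumes one plausible normalized weight, w(t) ∝ area(t) / dist(c(t), p); the function names are ours and the choice is illustrative only:

```python
import math

def centroid(tri):
    a, b, c = tri
    return tuple((a[i] + b[i] + c[i]) / 3.0 for i in range(3))

def area(tri):
    # Half the magnitude of the cross product of two edge vectors.
    a, b, c = tri
    u = tuple(b[i] - a[i] for i in range(3))
    v = tuple(c[i] - a[i] for i in range(3))
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def interpolated_sample(neighborhood, p):
    """Equation 2 with an assumed concrete weight: proportional to triangle
    size, inversely proportional to distance from the sample point p."""
    total, acc = 0.0, [0.0, 0.0, 0.0]
    for tri in neighborhood:
        c = centroid(tri)
        w = area(tri) / (math.dist(c, p) + 1e-9)
        total += w
        for i in range(3):
            acc[i] += w * c[i]
    return tuple(x / total for x in acc)
```

Two equal triangles placed symmetrically about p, for instance, average out to a point on the symmetry plane, as expected of a size- and distance-weighted mean.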

3.6 Real-Time Rendering

In real time, the CPU relies on the multiresolution hierarchy to extract appropriate fronts, based on the view parameters and illumination. The front should be detailed enough to represent the visualized model with respect to the view parameters, but coarse enough to be extracted within the duration of one frame.

In each frame, the adaptation process, which runs on the CPU, traverses the front nodes. For each active node n, it determines whether n needs to be refined, simplified, or kept at its current level. The updated front is sent to the graphics hardware for rendering.

Within the graphics hardware, a single instance of the generic tile is cached in video memory. In each frame, the received front is refined by mapping the generic tile onto each face. Equation 1a and Equation 1b are used to determine the positions of the refined vertices and their interpolated normals, respectively. For each vertex v_i on the refined face, a displacement value d_i is fetched from the assigned displacement map and used to elevate v_i along its normal. As a result, each face is replaced with (k-1)^2 triangles, where k is the degree of the generic tile (refer to Section 3.3). The total number of rendered triangles is (k-1)^2 times the number of triangles in the front. In order to maintain interactivity, the total number of rendered triangles should not exceed the rendering capabilities of the graphics hardware.
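The per-vertex work of this refinement can be mocked on the CPU as follows (an illustrative sketch of the vertex-shader logic, not the actual Cg program; names are ours): each cached tile vertex is mapped with Equations 1a/1b and then elevated by its fetched displacement along the normalized interpolated normal.

```python
import math

def refine_face(tile, dmap, face_verts, face_normals):
    """Map each tile vertex w into the front face and elevate it by its
    displacement d along the normalized interpolated normal."""
    v0, v1, v2 = face_verts
    n0, n1, n2 = face_normals
    refined = []
    for w, d in zip(tile, dmap):
        pos = [w[0] * v0[i] + w[1] * v1[i] + w[2] * v2[i] for i in range(3)]  # Eq. 1a
        nrm = [w[0] * n0[i] + w[1] * n1[i] + w[2] * n2[i] for i in range(3)]  # Eq. 1b
        ln = math.sqrt(sum(c * c for c in nrm))
        refined.append(tuple(pos[i] + d * nrm[i] / ln for i in range(3)))
    return refined
```

On the GPU, dmap would be a vertex texture fetch and tile the cached VBO; only the face's corner positions and normals change per draw.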

3.7 External Video Memory

The size of large polygonal datasets often exceeds the capacity of main memory. Even though we assume the simplified representation fits in main memory, the displacement maps for large datasets may not fit in video memory. Handling large datasets usually involves two uploading stages – from external media into main memory, and from main memory into video memory. Uploading data from external media into main memory has been widely studied (see Section 2), and is beyond the scope of this paper.

The limited size of video memory in current hardware calls for schemes that load data from main memory into video memory. Our External Video Memory (EVM) manager uses a single texture as a buffer, and manages data replacement effectively.

The EVM manager allocates a video buffer B, which is implemented as a texture that fits in video memory. Each cached displacement map is stored as one row of the video buffer. For every cached displacement map, the EVM maintains an id, a priority, and an index: the id is the displacement map identifier, the priority measures the probability of reusing it, and the index is a reference to the row of the video buffer where the data is cached. The EVM caches the displacement maps required for the next frame. In addition, it caches displacement maps assigned to several tree levels above and below the front level of detail, based on the available memory space. The priority of a displacement map is determined by its distance from the front along the levels of the hierarchy, and by its assigned error ε (refer to Section 3.4). When a displacement map must be replaced, the entry with the lowest priority is replaced by the newly loaded one.
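The bookkeeping side of this scheme can be sketched as a lowest-priority-eviction cache (an illustration with our own names; the priority values themselves would come from the hierarchy distance and ε described above):

```python
class EVMManager:
    """Fixed number of buffer rows; each cached map keeps (priority, row).
    Loading a new map evicts the lowest-priority resident entry."""

    def __init__(self, rows):
        self.free = list(range(rows))
        self.cached = {}                    # map id -> (priority, row index)

    def load(self, map_id, priority, upload):
        if map_id in self.cached:           # temporal coherence: already resident
            row = self.cached[map_id][1]
            self.cached[map_id] = (priority, row)
            return row
        if self.free:
            row = self.free.pop()
        else:                               # evict the lowest-priority map
            victim = min(self.cached, key=lambda k: self.cached[k][0])
            row = self.cached.pop(victim)[1]
        upload(row, map_id)                 # copy one row into the buffer texture
        self.cached[map_id] = (priority, row)
        return row
```

Here upload stands in for the actual texture-row transfer; re-requesting a resident map only refreshes its priority, which is what makes frame-to-frame coherence cheap.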

DISPLACEMENT PATCHES FOR GPU-ORIENTED VIEW-DEPENDENT RENDERING

185

Figure 6: Asian Dragon model: (a) the partition into patches, (b) the sampling error using interpolation-based sampling, and (c) the final image after GPU refinement (green and red represent minimum and maximum error, respectively).

4 IMPLEMENTATION DETAILS

We have implemented our algorithm in C++ and Cg, and adopted view-dependence trees (VDT) to maintain a multiresolution hierarchy with displacement maps and to support view-dependent rendering. We label this VDT with displacement maps as DM-VDT. The quadric error metric (Garland and Heckbert, 1997) is used to order the execution of the half-edge collapses.

A single generic tile, which is cached in video memory as a Vertex Buffer Object (VBO), is represented by two arrays: one array encodes the vertices' coordinates, and the other encodes their triangle-strip order for efficient rendering.

Since all displacement maps have the same size, they can easily be stored in an array of displacement maps, where each displacement map occupies a one-dimensional array of elevation values. Such a scheme avoids storing triangular maps in a rectangular texture and, thus, simplifies memory management.
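Flattening a triangular map into a one-dimensional row only requires a lattice ordering; assuming a row-major order over the triangle (our choice, for illustration) with one value per vertex, the addressing is:

```python
def dm_index(i, j):
    """Offset of lattice vertex (row i, position j within the row) inside
    the flattened one-dimensional displacement map."""
    assert 0 <= j <= i
    return i * (i + 1) // 2 + j          # i*(i+1)/2 vertices precede row i

def dm_size(k):
    """Number of elevation values for a tile of degree k: one per vertex."""
    return k * (k + 1) // 2
```

Enumerating the lattice row by row visits offsets 0 through dm_size(k) - 1 exactly once, so every map occupies one contiguous, fixed-size row of the buffer texture.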

To maintain the original visual appearance, we enable the refinement process to recover the original surface using approximately the same number of triangles. It is important to note that the recovered surface is a sampling of the original one, which may result in small differences between them.

To achieve quality images, the screen-space error must be determined for the selected front. Given a screen-space error τ and a generic tile of degree k, one can use kτ as the screen-space error to determine the front. The refinement of this front would lead to a final screen-space error of τ. Unfortunately, in such a scheme the actual screen-space error may exceed τ as a result of under-sampling. To overcome this limitation, we take the sampling error ε into account during front extraction. The screen-space error of a face f depends on its distance from the viewpoint, its orientation, and its sampling error ε.
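The paper does not give its exact formula; as one standard way to fold the sampling error into the projection, an object-space bound inflated by ε can be projected to pixels with the usual perspective scale (names and formula are ours, for illustration):

```python
import math

def screen_space_error(geometric_err, sampling_err, distance, fov_y, viewport_h):
    """Project an object-space deviation, inflated by the sampling error
    epsilon, into pixels for a viewer at the given distance."""
    # Pixels covered by one world-space unit at this depth.
    pixels_per_unit = viewport_h / (2.0 * distance * math.tan(fov_y / 2.0))
    return (geometric_err + sampling_err) * pixels_per_unit
```

As expected, the error shrinks with distance and grows with ε, so a face whose displacement map was sampled poorly is refined earlier than its geometric deviation alone would suggest.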

5 RESULTS

We have tested our implementation on various datasets with different complexities and have obtained impressive results. In this section, we report samples of these tests and their results, which were obtained using a PC with a 2.13 GHz Pentium Dual-Core, 2 GB of memory, and an NVIDIA GeForce 8800 GTX graphics card with 768 MB of video memory.

5.1 Performance

Table 1 reports the off-line construction time of view-dependence trees. The first two columns present the properties of the tested models. The VDT and DM-VDT columns show the preprocessing time and memory requirements of the classic view-dependence trees (El-Sana and Varshney, 1999) and of the proposed approach. As can be seen, the total DM-VDT preprocessing time is less than that of the classic VDT, as a result of starting with a simplified representation. In addition, the space complexity of our hierarchy is less than 1/7 of that of the classic VDT, which results from compactly representing the lower levels using displacement maps (more in Section 5.2).

Table 2 summarizes real-time performance for datasets that fit entirely in video memory. The first two columns report the generic tile properties. The last two columns show the performance, measured in millions of triangles per second. These results were computed by averaging the rendering of 1000 frames. The peak performance of our graphics hardware is 280M ∆/sec when rendering a cached VBO of indexed triangle strips. The peak performance drops to 267M ∆/sec when fetching one elevation value per vertex. As can be seen, the transmission cost of a front is inversely proportional to the tile's degree. In practice, our algorithm becomes GPU-bound at tiles of degree k = 33. The difference between the peak performance of 280M ∆/sec and the achieved performance of 267M ∆/sec is a result of unavoidable CPU work and a minor additional CPU-GPU communication load. Therefore, tiles of higher degrees, such as k = 65, also receive 267M ∆/sec, and are not useful because they reduce the flexibility of the level-of-detail selection. Moreover, tiles of degree k = 65 and higher exceed the maximal texture size supported by modern GPUs. According to our results, the processing time of a frame is distributed as follows: 4% for level-of-detail selection, 24% for front transportation, and 72% for GPU-based refinement.

Table 1: Preprocessing time, memory requirement, and real-time performance.

Dataset   | Size  | VDT Time (min:sec) | VDT Space (MB) | DM-VDT Time (min:sec) | DM-VDT Space (MB) | Face-Quality ME | Face-Quality MSE | Runtime ME | Runtime MSE | Runtime fps
A. dragon | 7.2M  | 63:29  | 327  | 49:09  | 49.7  | 0.44 | 0.010 | 0.67 | 0.12 | 130
T. statue | 10.0M | 81:23  | 461  | 68:10  | 63.6  | 0.38 | 0.033 | 1.52 | 0.45 | 89
Lucy      | 28.1M | 217:45 | 1271 | 191:58 | 178.1 | 1.23 | 0.024 | 1.70 | 1.08 | 64
David     | 56.2M | 420:09 | 2568 | 332:20 | 365.7 | 0.80 | 0.017 | 1.68 | 1.21 | 64

Table 2: The performance with respect to tile degrees.

Tile degree | Triangles in a tile | VBO (∆/sec) | Vertex fetch (∆/sec)
9           | 64                  | 154M        | 146M
17          | 256                 | 216M        | 212M
33          | 1024                | 278M        | 264M

Our algorithm is designed to utilize the NVIDIA GeForce 8 series. However, vertex texturing has been supported since the NVIDIA GeForce 6 series. We have tested our implementation on an NVIDIA GeForce 7800 GTX graphics card with 256 MB of video memory, and compared it with the GeForce 8800 GTX. We have found that the triangle throughput of the GeForce 7 series is only 20% of the throughput of the GeForce 8 series. This is a direct result of the differences in the architectures of the GPUs. Our algorithm relies heavily on vertex texturing, and therefore its performance is directly affected by the number of vertex processors. The GeForce 7800 GTX (like previous graphics cards) separates the vertex processors from the fragment processors. Because of this, the GeForce 7800 GTX has only a few vertex processors, which limits the speed of vertex texturing. In contrast, the GeForce 8800 GTX has a unified architecture in which the processors are general-purpose; each processor can serve as a vertex processor, which accelerates vertex texturing. Since our algorithm utilizes the vertex processors optimally, the peak performance of the card is still achieved despite the limitations of the GeForce 7800 GTX architecture.

5.2 Image Quality

The quality of the resulting images is measured using two error metrics – the sampling-quality error and the screen-space error. The sampling-quality error for a face f is defined as the average distance between its refined mesh and its corresponding patch. The screen-space error is the screen projection of this average distance.

Table 1 reports the error and rendering time for various datasets. The sampling-quality and screen-space error columns show the maximal error (ME) and the mean square error (MSE). The last column reports the resulting frame rates (fps) for a screen-space error τ ≤ 2 pixels. As can be seen, our algorithm achieves high image quality at interactive rates. As a result of selecting coarse fronts, our approach requires only 8% extra triangles (after face refinement), in comparison to the classic VDT, to reach the same image quality. The last two rows of Table 1 report the same performance (fps) for the Lucy and David models, as a result of using the same screen-space error τ in the tests, and thus rendering approximately the same number of triangles for the two models. Using a uniform subdivision for faces sometimes results in an overly large number of triangles. This occurs when the patch represented by a face is close to planar. To prevent such cases, our algorithm uses runtime error control to adapt the selected level of detail to the required quality. Therefore, when a face is close to planar, our algorithm selects a coarser representation of the same area.

Figure 1 shows four images that illustrate the flow of our approach. The images in Figures 1(a) and 1(b) show the selected coarse front, and the images in Figures 1(c) and 1(d) show the same front after being refined by the GPU. Figure 9 shows a color coding of the accuracy of the three sampling algorithms – naive, simplification guided, and interpolation based (see Section 3.5) – using the David model. As can be seen, the interpolation-based approach provides the best results, and the naive approach provides the least accurate sampling. Figure 10 illustrates the flow and performance of our approach using the Lucy model. The models in Figures 9 and 10 were rendered at approximately 64 fps with a pixel error of less than 2. Figure 7 shows a close-up view of the David model. In order to emphasize potential screen-space error, we used a large error, τ = 15. Figure 6 shows the coarse front (which matches the mesh partition into patches), the sampling error, and the final image (after GPU refinement) of the Asian Dragon model (7.2M faces). Figure 8 presents the tradeoff between the degree of the generic tile and the visual quality (as pixel error), as measured on the Lucy model.

Figure 7: A close-up of the David model: (a) a shaded view of the model, and (b) the screen-space errors (green and red represent minimum and maximum error, respectively).

Figure 8: The screen-space error for Lucy as a function of the number of triangles, for different tile degrees k.

5.3 External Video Memory

Table 3 presents the performance of our external

video memory scheme with respect to different tile

degrees. The third column reports the number of dis-

placement maps that ﬁt into a 16MB EVM buffer (a

single texture). The face binding column shows the number of displacement maps we were able to upload into the video memory within a second when EVM is disabled. Without EVM, it was not possible to achieve interactive rates for a front that contained more than 100 faces. The EVM column reports the number of displacement maps uploaded into the video memory within a second when EVM was enabled. The EVM manager clearly improves caching speed by a factor of 30. In addition, the EVM manager exploits temporal coherence among consecutive frames: our results show that at least 94% of the displacement maps required for the next frame are already cached in the video buffer. When comparing the caching efficiency of different tile degrees (measured as the number of elevation values cached per second), we conclude that the larger the generic tile, the better the utilization of the communication channel.

Table 3: The performance of the EVM manager.

Tile degree | Faces in a tile | Entries in a buffer | Face binding | EVM
9           | 64              | 373K                | 3.5K         | 147K
17          | 256             | 94K                 | 2.7K         | 78K
33          | 1024            | 29K                 | 2.5K         | 30K
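The EVM manager's caching behavior can be sketched as an LRU cache keyed by face. The class and method names below are illustrative assumptions, not the paper's code; temporal coherence shows up directly as the cache hit ratio across consecutive fronts:

```python
from collections import OrderedDict

class EVMManager:
    """Sketch of an external-video-memory cache for displacement maps."""
    def __init__(self, capacity):
        self.capacity = capacity          # maps that fit in the EVM buffer
        self.cache = OrderedDict()        # face id -> displacement map
        self.hits = self.misses = 0

    def fetch(self, face_id, load_map):
        if face_id in self.cache:         # already resident: no upload needed
            self.cache.move_to_end(face_id)
            self.hits += 1
        else:                             # upload, evicting the LRU entry
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)
            self.cache[face_id] = load_map(face_id)
            self.misses += 1
        return self.cache[face_id]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With fronts that overlap heavily between frames, most fetches are hits, which mirrors the 94% reuse rate reported above.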

5.4 Comparison with Other Algorithms

Next, we compare our approach with some previously developed level-of-detail rendering approaches, based on their published results.

The HLOD (Erikson and Manocha, 2001), Tetrapuzzles (Cignoni et al., 2004), and Quick-VDR (Yoon et al., 2004) algorithms each uniformly subdivide an input 3D model into disjoint patches, which are simplified, ordered in triangle-strip format, and stored in a VBO. These algorithms explicitly store the geometry of the input model in a hierarchy and utilize only limited geometry-transfer formats (several non-optimized triangle strips for a single patch). In

contrast, our approach represents patches as displace-

ment maps. Using displacement maps for implicit

representation of a model saves approximately 85%

of the application's memory. Moreover, the generic tile is cached as an optimized triangle strip, which implies a single transfer and rendering pass for

each face. We compare our algorithm to those algorithms using models that fit entirely into main memory. In such a scenario, these algorithms are GPU-bound and render approximately 280M∆/sec on our PC. As a result of using an error-guided subdivision scheme, our algorithm can utilize CPU-based patch culling at runtime. Using CPU-based culling for patches in GPU-bound algorithms increases the performance beyond the maximum rendering capability of the GPU. Our algorithm increases the performance by 35% and renders approximately 356M∆/sec. In addition, our algorithm requires only 87% of the triangles required by the above-mentioned

GRAPP 2008 - International Conference on Computer Graphics Theory and Applications


approaches to reach a two-pixel-error image quality. This also means that our algorithm renders 16% faster.
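The CPU-based patch culling mentioned above is cheap because the front contains a modest number of patches, each with a precomputed bound. A minimal sketch (the data layout is assumed for illustration, not taken from the paper) tests each patch's bounding sphere against the view-frustum planes:

```python
def cull_patches(patches, frustum_planes):
    """Keep patches whose bounding sphere is not entirely outside any
    frustum plane. Planes are (nx, ny, nz, d) with inward normals;
    patches are (center, radius) pairs."""
    visible = []
    for center, radius in patches:
        outside = False
        for nx, ny, nz, d in frustum_planes:
            # Signed distance from the sphere center to the plane.
            dist = nx * center[0] + ny * center[1] + nz * center[2] + d
            if dist < -radius:        # whole sphere behind this plane
                outside = True
                break
        if not outside:
            visible.append((center, radius))
    return visible
```

Only the surviving patches (and their displacement maps) are sent to the GPU, which is how culling raises throughput beyond the raw rendering rate.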

Ji et al. (Ji et al., 2005) suggested a GPU-based view-dependent rendering scheme that selects and renders a level of detail within the GPU. Owing to the complexity of their algorithm, they emulated the GPU with a CPU-based implementation rather than fully implementing their approach on the GPU. The simplicity of our approach leads to a straightforward implementation (using several Cg instructions) that efficiently utilizes the processing power of the GPU.
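The refinement step itself is compact enough to sketch. The following CPU-side Python version is illustrative only (the actual implementation runs as a vertex program, and the triangular displacement layout is an assumption): it places a barycentric grid of degree k over a face and elevates each grid vertex along the face normal by its sampled displacement:

```python
def refine_face(v0, v1, v2, normal, disp, k):
    """Tile a triangular face with a degree-k barycentric grid (k >= 2)
    and elevate each sample along the face normal by its displacement
    value, given as a triangular 2-D list disp[i][j]."""
    verts = []
    for i in range(k):
        for j in range(k - i):
            a = i / (k - 1)
            b = j / (k - 1)
            c = 1.0 - a - b
            # Barycentric interpolation of the three corner positions.
            base = tuple(a * p0 + b * p1 + c * p2
                         for p0, p1, p2 in zip(v0, v1, v2))
            h = disp[i][j]            # sampled elevation for this vertex
            verts.append(tuple(p + h * n for p, n in zip(base, normal)))
    return verts
```

With zero displacements the grid reproduces the flat face; nonzero samples reconstruct the detail of the original surface over the coarse front.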


Figure 9: The accuracy of (a) naive, (b) geometric-simplification-guided, and (c) interpolation-based sampling, using the David model (green and red represent minimum and maximum error, respectively).


Figure 10: The Lucy model: (a) an 8K-face coarse front, (b) the mesh partition imposed by the coarse front, and (c) the refined mesh using a generic tile of degree 33.

6 CONCLUSIONS AND FUTURE

WORK

We have presented a framework for a GPU-based

view-dependent rendering of general 3D polygonal

datasets. The simpliﬁed representation is an error-

guided subdivision of the input model into disjoint

patches. The geometry of these patches is encoded

into displacement maps. Processing smaller multiresolution hierarchies removes the CPU bottleneck and makes the CPU available for other general-purpose computations. In addition, the transmis-

sion of a coarse level-of-detail along with displace-

ment maps reduces the communication load between

the CPU and the GPU. This is a result of utilizing

video memory for caching and transmitting only un-

cached primitives. In summary, our approach man-

ages to seamlessly stitch adjacent patches without

adding extra dependencies or sliver polygons.

We observe several possibilities for future en-

hancements to our algorithm and for further research.

One might wish to eliminate the case of overly large

triangulations of planar patches. It would be interest-

ing to implement a GPU-based adaptive subdivision,

instead of a uniform patch subdivision. Furthermore,

for another boost in performance, the integration of

existing occlusion culling techniques into our algo-

rithm may be considered.

ACKNOWLEDGEMENTS

This work is supported by the Lynn and William

Frankel Center for Computer Sciences and the Tuman

Fund. In addition, we would like to thank the review-

ers for their constructive comments.

REFERENCES

Asirvatham, A. and Hoppe, H. (2005). GPU Gems 2,

chapter Terrain rendering using GPU-based geome-

try clipmaps., pages 27–45. Addison-Wesley Profes-

sional.

Baboud, L. and Décoret, X. (2006). Rendering geometry

with relief textures. In Proceedings of Graphics Inter-

face ’06, pages 195–201.

Bolz, J. and Schröder, P. (2005). Evaluation of subdivision

surfaces on programmable graphics hardware. Sub-

mitted.

Cignoni, P., Ganovelli, F., Gobbetti, E., Marton, F., Pon-

chio, F., and Scopigno, R. (2003). P-BDAM – planet-

sized batched dynamic adaptive meshes. In Proceed-

ings of Visualization ’03, pages 147–155.


Cignoni, P., Ganovelli, F., Gobbetti, E., Marton, F., Pon-

chio, F., and Scopigno, R. (2004). Adaptive TetraPuz-

zles – efﬁcient out-of-core construction and visualiza-

tion of gigantic polygonal models. ACM Transactions

on Graphics, 23(3):796–803.

Dachsbacher, C. and Stamminger, M. (2004). Rendering

procedural terrain by geometry image warping. In

Proceedings of Eurographics/ACM SIGGRAPH Sym-

posium in Geometry Processing, pages 138–145.

De Floriani, L., Magillo, P., and Puppo, E. (1998). Efﬁcient

implementation of multi-triangulations. In Proceed-

ings of Visualization ’98, pages 43–50.

DeCoro, C. and Pajarola, R. (2002). Xfastmesh: Fast view-

dependent meshing from external memory. In Pro-

ceedings of Visualization ’02, pages 363–370.

Doggett, M. and Hirche, J. (2000). Adaptive view de-

pendent tessellation of displacement maps. In Pro-

ceedings of the ACM SIGGRAPH/EUROGRAPHICS

workshop on Graphics hardware ’00, pages 59–66.

Donnelly, W. (2005). GPU Gems 2, chapter Per-Pixel Dis-

placement Mapping with Distance Functions, pages

123–136. Addison-Wesley Professional.

El-Sana, J. and Chiang, Y. (2000). External memory view-

dependent simpliﬁcation. Computer Graphics Forum,

19(3):139–150.

El-Sana, J., Sokolovsky, N., and Silva, C. (2001). Integrating occlusion culling with view-dependent rendering.

In Proceedings of Visualization ’01, pages 371–378.

El-Sana, J. and Varshney, A. (1999). Generalized view-

dependent simpliﬁcation. Computer Graphics Forum,

18(3):C83–C94.

Erikson, C. and Manocha, D. (2001). Hierarchical levels

of detail for fast display of large static and dynamic

environments. In Proceedings of symposium on Inter-

active 3D graphics ’01, pages 111–120.

Garland, M. and Heckbert, P. (1997). Surface simpliﬁca-

tion using quadric error metrics. In Proceedings of

SIGGRAPH ’97, pages 209–216.

Guskov, I., Vidimče, K., Sweldens, W., and Schröder, P. (2000). Normal meshes. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques '00, pages 95–102.

Hirche, J., Ehlert, A., Guthe, S., and Doggett, M. (2004).

Hardware accelerated per-pixel displacement map-

ping. In Proceedings of Graphics Interface ’04, pages

153–158.

Hoppe, H. (1997). View-dependent reﬁnement of progres-

sive meshes. In Proceedings of SIGGRAPH ’97, pages

189–198.

Hwa, L., Duchaineau, M., and Joy, K. (2005). Real-time op-

timal adaptation for planetary geometry and texture:

4-8 tile hierarchies. IEEE Transactions on Visualiza-

tion and Computer Graphics, 11(4):355–368.

Ji, J., Wu, E., Li, S., and Liu, X. (2005). Dynamic LOD on GPU. In Computer Graphics International '05, pages

108–114.

Kautz, J. and Seidel, H. (2001). Hardware accelerated dis-

placement mapping for image based rendering. In

Proceedings of Graphics Interface ’01, pages 61–70.

Kim, J. and Lee, S. (2001). Truly selective reﬁnement of

progressive meshes. In Proceedings of Graphics In-

terface ’01, pages 101–110.

Livny, Y., Sokolovsky, N., Grinshpoun, T., and El-Sana, J. (2007). A GPU persistent grid mapping for terrain rendering. The Visual Computer. To appear.

Losasso, F., Hoppe, H., Schaefer, S., and Warren, J. (2003).

Smooth geometry images. In Proceedings of Euro-

graphics/ACM SIGGRAPH Symposium in Geometry

Processing, pages 138–145.

Luebke, D. and Erikson, C. (1997). View-dependent simpli-

ﬁcation of arbitrary polygonal environments. In Pro-

ceedings of SIGGRAPH ’97, pages 199–207.

Pajarola, R. (2001). Fastmesh: efﬁcient view-dependent

meshing. In Proceedings of Paciﬁc Graphics ’01,

pages 22–30.

Policarpo, F., Oliveira, M., and Comba, J. (2005). Real-

time relief mapping on arbitrary polygonal surfaces.

In Proceedings of the 2005 symposium on Interactive

3D graphics and games ’05, pages 155–162.

Schein, S., Karpen, E., and Elber, G. (2005). Real-

time geometric deformation displacement maps us-

ing programmable hardware. The Visual Computer,

21(8):791–800.

Schneider, J. and Westermann, R. (2006). GPU-friendly

high-quality terrain rendering. Journal of WSCG,

14(1-3):49–56.

Shannon, C. E. (1948). A mathematical theory of commu-

nication. Bell System Technical Journal, 27:379–423,

623–656.

Southern, R. and Gain, J. (2003). Creation and con-

trol of real-time continuous level of detail on pro-

grammable graphics hardware. Computer Graphics

Forum, 22(1):35–48.

Wagner, D. (2004). Terrain geomorphing in the vertex

shader. ShaderX2: Shader Programming Tips &

Tricks with DirectX 9.

Yoon, S., Salomon, B., and Manocha, D. (2003). Interac-

tive view-dependent rendering with conservative oc-

clusion culling in complex environments. In Proceed-

ings of Visualization ’03.

Yoon, S. E., Salomon, B., Gayle, R., and Manocha, D.

(2004). Quick-VDR: Interactive view-dependent ren-

dering of massive models. In Proceedings of Visual-

ization ’04, pages 131–138.
