A Hybrid System for Real-time Rendering of Depth of Field Effect in Games

Yu Wei Tan (https://orcid.org/0000-0002-7972-2828), Nicholas Chua, Nathan Biette (https://orcid.org/0000-0001-7827-1538) and Anand Bhojan (https://orcid.org/0000-0001-8105-1739)

School of Computing, National University of Singapore, Singapore
Keywords:
Real-Time, Depth of Field, Ray Tracing, Post-processing, Hybrid Rendering, Games.
Abstract:
Real-time depth of field in game cinematics tends to approximate the semi-transparent silhouettes of out-of-
focus objects through post-processing techniques. We leverage ray tracing hardware acceleration and spatio-
temporal reconstruction to improve the realism of such semi-transparent regions through hybrid rendering,
while maintaining interactive frame rates for immersive gaming. This paper extends our previous work with a
complete presentation of our technique and details on its design, implementation, and future work.
1 INTRODUCTION
We present the design and evaluation of a novel real-
time hybrid rendering approach for the Depth of Field
(DoF) effect which combines post-process based
DoF with temporally and spatially reconstructed ray
trace based DoF. By adaptively combining the output
of different passes, we achieve more accurate semi-
transparencies of foreground geometry to reveal back-
ground objects. We believe that our hybrid DoF tech-
nique is the first to integrate a ray-traced output with
a traditional post-processing pipeline.
Building on our previous work (Tan et al., 2020a),
the key contributions of this paper are as follows.
- Design and implementation of a real-time hybrid rendering pipeline for DoF.
- Visual quality evaluation of the hybrid method, specifically, the accuracy of semi-transparencies.
- Performance evaluation and trade-offs in the use of ray tracing for DoF.
1.1 Background Information
Current DoF implementations in game engines
typically use the thin lens model (Potmesil and
Chakravarty, 1982) to approximate the behaviour of
cameras. The zone of focus is the part of the scene
where the objects look sharp. The Circle of Confusion (CoC) (Demers, 2004) of a point in the zone of
focus is smaller than a cell on the sensor, yielding a single pixel in the image, whereas a point outside the zone of focus appears as a spot on the image whose size is given by its CoC. For such points lying on the same object, an overall blur of the object is produced.
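As a point of reference, the per-pixel CoC implied by the thin lens model can be computed as in the short Python sketch below. This is a standard thin-lens formulation consistent with the zone-of-focus bounds used later in Section 3.3; the parameter values in the example are purely illustrative and not taken from our implementation.

```python
def coc_pixels(z, a, f, d, w_s, w_i):
    """Thin-lens CoC diameter (in pixels) of a point at depth z.

    a: aperture diameter, f: focal length, d: focus distance (all metric);
    w_s: sensor width in metres, w_i: image width in pixels.
    """
    coc_metric = a * f * abs(z - d) / (z * (d - f))  # spot diameter on the sensor
    return coc_metric * (w_i / w_s)                  # convert metres to pixels

# Example: a point 1 m away, camera focused at 3 m with a 50 mm f/2.5 lens.
print(coc_pixels(z=1.0, a=0.02, f=0.05, d=3.0, w_s=0.036, w_i=1920))
```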
Bokeh shapes, which are bright spots created by
a beam of unfocused light hitting the camera sensor,
appear in areas out of the zone of focus. They usually take the shape of the camera's aperture and can be circular or polygonal depending on the number of blades in the aperture diaphragm.
Blurred foreground objects also have a slightly
transparent silhouette through which background
colour can be observed. These semi-transparent edges
cannot be properly rendered in games with post-
processing as the image does not store any informa-
tion behind a foreground object (Kraus and Strengert,
2007). However, such approaches are widely used in
real-time rendering as images produced by rasteriza-
tion are in sharp focus (McGraw, 2015). According to
Jimenez (2014), many techniques can only perform
an approximation of the background colour locally
using neighbouring pixels like in Abadie (2018) or
grow blur out of the silhouette of foreground objects
onto background colour, reusing foreground informa-
tion to avoid reconstructing the missing background.
However, shifting the blur outwards from foreground
objects produces inaccuracies with regard to their ac-
tual geometries, especially when the amount of ex-
tended area is comparable to the size of the objects
themselves. Objects with more elaborate shapes also appear inflated and deformed in areas with large CoC.
Nonetheless, such inaccuracies do not exist in ray-
traced DoF (Cook et al., 1984) as we can simulate
a thin lens and query the scene for intersections, not
being limited to what is rendered in the rasterized im-
age. However, achieving interactive frame rates with ray tracing is difficult due to the high computational cost of calculating ray-geometry intersections and shading multiple samples per pixel, even with the latest GPUs developed for ray tracing. Hence, hybrid
rendering, which aims to combine existing rasteriza-
tion techniques with ray tracing, is being researched.
2 RELATED WORK
2.1 Hybrid Rendering
Examples of hybrid rendering on related effects in-
clude Macedo et al. (2018) and Marrs et al. (2018)
which invoke ray tracing for reflections and anti-
aliasing respectively only on pixels where rasteri-
zation techniques are unable to achieve realistic or
desirable results. Beck et al. (1981), Hertel et al.
(2009) and Lauterbach and Manocha (2009) employ
the same strategy to produce accurate shadows.
The concept of hybrid rendering can also be ex-
tended to general rendering pipelines. For example,
Cabeleira (2010) uses rasterization for diffuse illu-
mination and ray tracing for reflections and refrac-
tions. Barré-Brisebois et al. (2019) is also one such
pipeline that has replaced effects like screen-space re-
flections with their ray trace counterparts to achieve
better image quality. Another commonly-used ap-
proach is Chen and Liu (2007), the substitution of
primary ray generation with rasterization in recursive
ray tracing by Whitted (1979). Andrade et al. (2014)
improves upon this technique by observing a render
time limit through the prioritization of only the most
important scene objects for ray tracing.
2.2 DoF
Many DoF rendering techniques have been devised
over the years. Potmesil and Chakravarty (1982)
first introduced the concept of CoC for a point based
on a thin lens model which simulates the effects of
the lens and aperture of a physical camera. It em-
ploys a post-processing technique that converts sam-
pled points into their CoCs. The intensity distribu-
tions of CoCs overlapping with each pixel are then
accumulated to produce the final colour for the pixel.
Haeberli and Akeley (1990) integrates images ren-
dered from different sample points across the aper-
ture of the lens with an accumulation buffer. On the
other hand, Cook et al. (1984) traces multiple rays
from these different sample points on the lens into the
scene using a technique now commonly known as dis-
tributed ray tracing, for which improvements in ray
budget have been made in Hou et al. (2010) and Lei
and Hughes (2013).
For rendering with real-time performance con-
straints, spatial reconstruction and temporal accumu-
lation approaches have also been developed. For in-
stance, Dayal et al. (2005) introduces adaptive spatio-
temporal sampling, choosing to sample more based
on colour variance in the rendered image with selec-
tive rendering by Chalmers et al. (2006) and favouring
newer samples for temporal accumulation in dynamic
scenes. Schied et al. (2017) also uses temporal ac-
cumulation to raise the effective sample count on top
of image reconstruction guided by variance estima-
tion. Such techniques have been applied for DoF such
as in Hach et al. (2015), Leimkühler et al. (2018),
Weier et al. (2018), Yan et al. (2016) and Zhang et al.
(2019). More advanced reconstruction techniques for
DoF have also been introduced, such as Belcour et al.
(2013), Lehtinen et al. (2011), Mehta et al. (2014) and
Vaidyanathan et al. (2015) which sample light fields
as well as Shirley et al. (2011) which selectively blurs
pixels of low frequency content in stochastic sam-
pling. A more adaptive temporal accumulation ap-
proach from Schied et al. (2018) which is responsive
to changes in sample attributes such as position and
normal has also been proposed to mitigate ghosting
and lag in classic temporal accumulation approaches.
Micropolygon-based techniques have also proven
to be capable of producing DoF, as in Fatahalian et al. (2009)
and Sattlecker and Steinberger (2015). Catmull
(1984) solves for per-pixel visibility by performing
depth sorting on overlapping polygons for each pixel.
Following this, approaches based on multi-layer im-
ages like Franke et al. (2018), Kraus and Strengert
(2007), Lee et al. (2008), Lee et al. (2009) and Sel-
grad et al. (2015) have also been introduced where
the contributions from each layer are accumulated to
produce the final image. Such layered approaches
are computationally expensive although they can gen-
erate relatively accurate results in terms of semi-
transparencies. Bukowski et al. (2013), Jimenez
(2014), Valient (2013) and state-of-the-art Unreal En-
gine approach Abadie (2018) divide the scene into the
background and foreground, and run a gathering fil-
ter separately for each. We adopt such a technique,
which performs better in terms of rendering time even
in comparison to Yan et al. (2016), which avoids the
problem of separating the scene by depth by factoring
high-dimensional filters into 1D integrals.
Hach et al. (2015) acquires a rich lens archive de-
rived from a real target cinematic lens and uses it to
synthesize a point spread function (PSF) for convo-
lution in blurring. For each pixel, Leimkühler et al.
(2018) splats its PSF using a sparse representation
of its Laplacian. Time-dependent edge functions by Akenine-Möller et al. (2007) and complex plane phasors by Garcia (2017) have also been used to pro-
duce DoF. Such approaches involve complex compu-
tations and seem to be more suitable for offline ren-
dering. More recently, convolutional neural network
approaches like Zhang et al. (2019) perform post-
processing for DoF by predicting the amount of blur
to generate through the analysis of past frames in real-
time but require copious amounts of training data.
McGraw (2015) and McIntosh et al. (2012)
describe post-process techniques that produce polygonal
bokeh shapes based on the silhouette of the camera
aperture. McGraw also supports bokeh shapes of non-
uniform intensities, including bokeh shapes which are
lighter or darker at the rim due to spherical aberration
of the lens. Our approach currently generates circu-
lar bokeh shapes of uniform intensities but can be ex-
tended to produce alternative bokeh shapes such as
polygons by changing the shape of our sampling ker-
nel, and bokeh shapes of varying intensities by adjust-
ing the relative weight of samples within the kernel.
Our hybrid DoF technique is novel as we augment
conventional post-process approaches with ray trac-
ing, generating more accurate semi-transparencies of
foreground geometry in real-time.
3 DESIGN
Our approach in Figure 1 combines post-process
based DoF with temporally-accumulated and
spatially-reconstructed ray trace based DoF, to
produce a hybrid DoF effect that recreates accurate
semi-transparencies. Using deferred shading, a Ge-
ometry Buffer (G-Buffer) is first produced, together
with textures containing other derived information
needed for the post-process and ray trace stages. A
sharp all-in-focus rasterized image of the scene is
also generated. This image subsequently undergoes
post-process filtering while parts of the scene deemed
inaccurate with post-processing undergo distributed
ray tracing augmented with spatio-temporal recon-
struction. The images are finally composited together
with a temporal anti-aliasing (TAA) pass.
We split our scene into the near field and the far
field. Points in the scene in front of the focus plane
are in the near field, and points behind the focus plane
are in the far field further away, as shown in Figure 2.
We perform this split for both the post-process and ray
trace images, in order to merge post-processed colour
with ray trace colour on a per-field basis later on.
3.1 Post-process
For our post-process technique, we adapted the DoF
implementation by Jimenez (2014) which uses a gath-
ering approach directly inspired by Sousa (2013) for
filtering to produce blur. Following Jimenez, the ini-
tial rasterized image is downscaled to half its resolu-
tion to speed up the filtering process.
3.1.1 Prefilter Pass
For circular bokeh shapes, Jimenez uses a 49-tap 3-
ring main filter kernel scaled to the size of the max-
imum CoC in the tile neighbourhood of the target
pixel. However, to fight undersampling, a downsam-
pling 9-tap bilateral prefilter is first applied to fill the
gaps of the main filter. We decided to use 81 taps with
an additional ring of samples as shown in Figure 3 for
better visual quality. Hence, our prefilter kernel has
a diameter of 1/8 instead of 1/6 the maximum CoC
size as in the original design. In cases where the max-
imum CoC is too small as most pixels in the neigh-
bourhood are in focus, the size of the prefilter kernel
is capped at √2 (the diagonal length of one pixel) to avoid sampling the same pixel multiple times.
3.1.2 Main Filter Pass
Jimenez performs alpha blending on the foreground
and background layers with the normalized alpha of
the foreground. However, the result in our implementation was unsatisfactory as the normalized alpha calculated
was too small, producing an overly transparent fore-
ground. Hence, we used a normalized weighted sum
of foreground and background contributions for the
post-process colour v_p instead, as shown:

$v_p = \dfrac{v_f + v_b}{\sum_{i=1}^{81} D(0, i) \cdot \mathrm{sampleAlpha}(r_i)}$   (1)
In the above equation, r_i refers to the CoC radius of sample i while v_f and v_b represent the total accumulated colour for the foreground and background respectively. D(0, i) refers to the comparison of the CoC of sample i to its distance to the centre tap of the kernel. If the radius of the sample's CoC is greater than its distance to the kernel centre, the sample contributes to the target pixel's colour.
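As a concrete illustration, the following Python sketch gathers the kernel taps and applies the rejection test and normalization of Equation (1). The tuple layout and the exact form of sampleAlpha are assumptions made for illustration; the actual pass runs as a half-resolution pixel shader.

```python
import numpy as np

def sample_alpha(coc_radius):
    # Hypothetical per-sample weight: larger CoCs spread their energy over a
    # larger area, so each tap contributes less (assumed form, not from the paper).
    return min(1.0, 1.0 / (np.pi * coc_radius * coc_radius + 1e-6))

def main_filter_gather(samples):
    """Sketch of the 81-tap gather and the normalization in Eq. (1).

    `samples` is a hypothetical list of (colour, coc_radius, dist_to_centre,
    is_foreground) tuples, one per tap of the main filter kernel.
    """
    v_f = np.zeros(3)   # accumulated foreground contribution
    v_b = np.zeros(3)   # accumulated background contribution
    weight = 0.0        # denominator of Eq. (1)
    for colour, coc_radius, dist, is_foreground in samples:
        if coc_radius < dist:   # D(0, i): reject taps whose CoC misses the centre
            continue
        a = sample_alpha(coc_radius)
        if is_foreground:
            v_f += a * np.asarray(colour)
        else:
            v_b += a * np.asarray(colour)
        weight += a
    return (v_f + v_b) / max(weight, 1e-6)
```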
To combat aliasing, we jitter the camera’s position
with pseudorandom values. We also gather
the proportion of samples with high specular values
for each pixel to be used to composite ray trace and
post-process colour on bright bokeh shapes later on.
Figure 1: Hybrid rendering pipeline for DoF.
Figure 2: Demarcation of the near field (a) and far field (b).
3.1.3 Postfilter Pass
Finally, as recommended by Jimenez, we apply a
3×3 median postfilter at half resolution to upscale
the image back to full resolution like in Sousa (2013).
The median postfilter, based on a GPU-optimized min-max network (Smith, 1996), helps to remove noise from the main pass filtering by rejecting outlier pixels, smoothing out the result.
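For reference, a minimal and unoptimized Python sketch of such a 3×3 median pass is shown below; the shader version replaces the full per-pixel median with a small fixed min/max exchange network, and whether the median is taken per channel is an implementation detail assumed here.

```python
import numpy as np

def median3x3(image):
    """Sketch of a 3x3 median postfilter over an (H, W, C) image."""
    h, w = image.shape[:2]
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.empty_like(image)
    for y in range(h):
        for x in range(w):
            neighbourhood = padded[y:y + 3, x:x + 3].reshape(9, -1)
            out[y, x] = np.median(neighbourhood, axis=0)  # per-channel median
    return out
```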
3.2 Ray Trace
3.2.1 Ray Mask
We shoot a variable number of rays into the scene by
creating an adaptive ray mask based on the gradient of
surface normals. Employing a selective rendering ap-
proach (Chalmers et al., 2006) for better performance,
we aim to shoot more rays at edges to create clean
semi-transparencies but fewer at regions with less detail, such as relatively flat surfaces.
Our ray mask utilizes a 5×5 Sobel convolution
kernel to estimate how extreme an edge is. Adopt-
ing ideas from Canny Edge Detection (Canny, 1986),
we apply a Gaussian filter on the G-Buffer before per-
forming the Sobel operator so as to reduce noise and
jaggies along diagonal edges. The Sobel kernel is
Figure 3: 9-Tap bilateral prefilter for 81-tap main filter.
then applied to the filtered G-Buffer at a lower reso-
lution to get an approximate derivative of the gradient
associated with each target pixel, based on the depth
and surface normal of itself and surrounding pixels
which are readily available from rasterization. The
depth derivatives capture the separation between overlapping objects, where the colour of one object might be revealed behind the other. On the other hand, normal
derivatives can detect significant variations in the ori-
entation of primitive faces within objects themselves
near their silhouettes, where semi-transparencies are
observed in DoF. The per-pixel output of this filter is:
$x = (\delta_d + \delta_n) \cdot s, \quad s \in [0, 1]$   (2)

$x_n = \mathrm{saturate}\left(1 - \dfrac{1}{x + 1}\right)$   (3)
Here, δ_d and δ_n refer to the magnitudes of the depth and surface normal derivatives surrounding the pixel respectively, based on the Sobel filter. x_n refers to the normalized x, and s is a user-specified variable to scale down the result, since it is hard to perform normalization with respect to the entire scene, hence this compromise.
To account for temporal variation and reduce noise in the output, we also shoot more rays at regions of high variance in luminance, as inspired by Schied et al. (2017). Hence, the ray mask is complemented with a temporally-accumulated variance estimate σ², explained later in Section 3.2.3. Although this variance is small, it is able to detect edges of foreground objects and specular bokeh shapes. We favour shooting more rays in these regions for a cleaner image by scaling the variance by a large weight of 100000. This amplified variance is then used with x_n to determine the final ray count as follows.

$x_f = \mathrm{saturate}(x_n + \sigma^2 \cdot 100000) \cdot m$   (4)
Figure 4: Comparison of the scaled variance σ² · 100000 (a) with the final ray count x_f (b).
The value of the large weight was chosen based on our
observation of the final ray mask generated in relation
to our variance as shown in Figure 4. Instead of se-
lectively updating specific pixels like in Dayal et al.
(2005), we shoot at least one ray per pixel in the near
field but increase this number based on the variance
gathered over time for enhanced visual quality.
Figure 5: Graph of x_f against σ for different m at x_n = 0.5.
The number of rays to be shot per pixel is capped
at m, a scene-dependent value that can be tuned for
performance or accuracy. Appropriate values were
chosen for scenes SUN TEMPLE and PINK ROOM.
As shown in Figure 5, more rays are shot as the esti-
mated luminance variance increases. However, after a
certain threshold, this number plateaus and never ex-
ceeds m which is exactly the ray budget for the scene.
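Putting Equations (2)-(4) together, a rough CPU-side sketch of how the per-pixel ray budget could be derived from the G-Buffer is given below, assuming SciPy for the Gaussian and Sobel filters. Filter sizes, the treatment of the normal channels, and rounding of the budget are illustrative assumptions; the actual implementation runs as a reduced-resolution pass with a 5×5 Sobel kernel.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def ray_mask(depth, normals, variance, s=0.8, m=10, variance_weight=1e5):
    """Sketch of the adaptive ray mask (Eqs. 2-4).

    `depth` is an HxW array and `normals` an HxWx3 array from the G-Buffer;
    `variance` is the temporally accumulated luminance variance from the
    previous frame.
    """
    # Gaussian prefilter (Canny-style) to suppress noise and diagonal jaggies.
    depth_s = gaussian_filter(depth, sigma=1.0)
    normals_s = gaussian_filter(normals, sigma=(1.0, 1.0, 0.0))

    # Sobel-based gradient magnitudes for depth and normals.
    delta_d = np.hypot(sobel(depth_s, axis=0), sobel(depth_s, axis=1))
    delta_n = sum(np.hypot(sobel(normals_s[..., c], axis=0),
                           sobel(normals_s[..., c], axis=1)) for c in range(3))

    x = (delta_d + delta_n) * s                      # Eq. (2)
    x_n = np.clip(1.0 - 1.0 / (x + 1.0), 0.0, 1.0)   # Eq. (3), saturate = clip to [0, 1]
    x_f = np.clip(x_n + variance * variance_weight, 0.0, 1.0) * m   # Eq. (4)
    return x_f   # per-pixel ray budget; at least one ray is shot in the near field
```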
3.2.2 Shooting Rays
We follow the distributed ray tracing DoF technique
by Cook et al. (1984) in shooting a ray from a point
on the lens to a calculated focus point. We consider
the final image as the camera sensor and the position
of the camera in the scene as the centre of the lens.
After determining the direction from a pixel on
the “sensor” (the image being rendered) to the cen-
tre of the lens, we then compute the focus point of
this pixel on the focus plane. Then, we take ran-
dom positions on the lens and spawn rays from those
positions towards the focus point. As illustrated in
Kraus and Strengert (2007), we take multiple samples
within a circle to produce random ray origins within
the lens’ area. On edges of foreground objects, some
rays will hit the object while others will “go around”
and sample the scene behind, yielding the effect of
semi-transparent silhouettes of the blurred object in
the foreground, out of the zone of focus.
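A minimal sketch of how one such ray might be spawned per lens sample is given below, assuming a camera-space convention with the lens centre at the origin and the view direction along -z; the function and variable names are illustrative rather than taken from our shaders.

```python
import numpy as np

def lens_ray(pixel_dir_to_lens_centre, focus_distance, lens_radius, rng):
    """Sketch of generating one DoF ray in the style of Cook et al. (1984).

    `pixel_dir_to_lens_centre` is the normalized direction from the pixel on
    the virtual sensor through the lens centre (the pinhole ray), in camera
    space with the lens centre at the origin and the view direction along -z.
    """
    # Focus point: where the pinhole ray meets the focus plane.
    focus_point = pixel_dir_to_lens_centre * (focus_distance /
                                              abs(pixel_dir_to_lens_centre[2]))

    # Uniformly sample a point on the lens disc as the ray origin.
    r = lens_radius * np.sqrt(rng.uniform())
    theta = 2.0 * np.pi * rng.uniform()
    origin = np.array([r * np.cos(theta), r * np.sin(theta), 0.0])

    direction = focus_point - origin
    return origin, direction / np.linalg.norm(direction)

# Example: one lens sample for a pixel slightly off the optical axis.
rng = np.random.default_rng(0)
pinhole_dir = np.array([0.1, 0.0, -1.0]) / np.linalg.norm([0.1, 0.0, -1.0])
o, d = lens_ray(pinhole_dir, focus_distance=3.0, lens_radius=0.05, rng=rng)
```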
We ray trace at half resolution as a trade-off for
speed and upscale later using the same median filter
as our post-process stage in Section 3.1.3, helping to
reduce sampling noise especially on very bright ar-
eas of foreground objects. The ray-traced colours and
their respective calculated CoC sizes are then sorted
by depth into the near and far field. To have a smooth
transition between the near and far field, we split the
contribution of each ray trace colour per pixel to each
field based on its distance from the separating focus
plane. A hit ratio is also stored, which is the number
of rays contributing to the near field colour divided by
the total rays shot.
3.2.3 Accumulation and Reprojection
To increase our sample count, we use temporal accu-
mulation adapted from TAA (Karis, 2014). We ac-
cumulate the near and far field ray trace colours over
time and use an exponential moving average to blend
between history and current frames. By default, we
use a high blend factor of 0.95 to stabilize the image.
However, to account for movement, we leverage
per-pixel depth and motion information for reprojec-
tion like in Lehtinen et al. (2011). As such, we use
motion vectors calculated based on Rosado (2008) to
reproject near field pixels. As for the far field, we re-
quire an approximation of the far field world position
of our target pixel. We first attempt to compute the av-
erage world position of our target pixel from far field
hits of our ray trace pass. Under low ray counts, in the
event that there is no far field hit, the target pixel does
not give any depth information for far field reprojec-
tion. In such cases, we obtain the average far field
world position of neighbouring pixels in a 3×3 region
instead. Then, we use the computed world position
to calculate the previous screen space position of the
target pixel. This approximates reprojection for oc-
cluded objects appearing in semi-transparent regions.
Since reprojection is different for the near and far
fields, we normalize the final colour based on the ac-
cumulated hit ratio h. Otherwise, we run the risk of
having varying colour intensities in our merged result.
Hence, we perform linear interpolation (lerp) on the
near and far field colours based on h. We also lerp the
new average near and far field CoC sizes accordingly
to get an approximate CoC size for the current frame
for spatial reconstruction later. However, during mo-
tion, we lerp based on the latest hit ratio rather than h
to prevent the ghosting (or smearing) of the far field
within the silhouette of near field objects.
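The sketch below illustrates one reading of how the accumulated near and far field colours, CoC sizes and hit ratio could be blended and merged per pixel; the dictionary layout, the merge order and the treatment of motion are simplifying assumptions based on this section rather than the exact shader logic.

```python
def lerp(a, b, t):
    return a + (b - a) * t

def accumulate_and_merge(history, current, moving, blend=0.95):
    """Sketch of per-pixel temporal accumulation and near/far merging.

    `history` and `current` are hypothetical dicts holding 'near' and 'far'
    colours, the near-field 'hit' ratio and the average 'coc' per pixel.
    """
    # Exponential moving average between reprojected history and the current frame.
    near = lerp(current['near'], history['near'], blend)
    far = lerp(current['far'], history['far'], blend)
    hit = lerp(current['hit'], history['hit'], blend)
    coc = lerp(current['coc'], history['coc'], blend)

    # Merge the two fields by the hit ratio so colour intensities stay consistent;
    # during motion, use the latest hit ratio to avoid far-field ghosting.
    h = current['hit'] if moving else hit
    merged = lerp(far, near, h)
    return merged, hit, coc
```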
Like Dayal et al. (2005), we sample more for ar-
eas of large colour variance. Borrowing from Schied
et al. (2017), we calculate variance estimates using lu-
minance values of the final merged ray trace image to
identify regions of high noise. This includes specular
bokeh shapes that are difficult to converge as well as
newly ray-traced regions. The variance texture then
undergoes a Gaussian blur before it is used to deter-
mine the number of rays to be shot in the next frame.
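Concretely, a per-pixel variance estimate of this kind can be maintained from running first and second moments of luminance, as sketched below following the moment-based idea of Schied et al. (2017); the blend factor used for the moments is an assumption. The resulting variance texture is then Gaussian-blurred before driving the ray mask.

```python
def update_variance(moments, luminance, blend=0.95):
    """Sketch of the temporally accumulated luminance variance estimate.

    `moments` holds the running first and second moments of luminance for
    one pixel; `blend` is an assumed history weight.
    """
    m1 = blend * moments['m1'] + (1.0 - blend) * luminance
    m2 = blend * moments['m2'] + (1.0 - blend) * luminance * luminance
    variance = max(m2 - m1 * m1, 0.0)   # sigma^2 = E[L^2] - E[L]^2
    return {'m1': m1, 'm2': m2}, variance
```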
3.2.4 Spatial Reconstruction
Before spatial reconstruction, the ray trace colour is
median filtered (Section 3.1.3), which helps to re-
move sparse unconverged bokeh shapes formed by
small specular highlights, trading accuracy for im-
age quality. For reconstruction, we use a circular
kernel to gather the surrounding colour contributions
of neighbouring pixels. The kernel is scaled for
sampling using the average CoC size of the current
frame collected from temporal accumulation, which
is also used to determine the mip level or level-of-
detail (LOD) for the sampling. Samples with CoC
radius smaller than their distance to the target pixel
are rejected, similar to the post-process filtering ap-
proach as described earlier. Finally, we lerp the orig-
inal colour and hit ratio of the target pixel with that
of its neighbours by clamped variance estimates b to
avoid over-blurring in converged regions, as shown in
the equation below.
$b = \mathrm{clamp}(\sigma^2 \cdot 2000,\ 0,\ 0.9)$   (5)
In the above equation, we are calculating how ag-
gressive our variable size blur is for the ray-traced
output. The variance is scaled by a large weight of
2000 to blur any pixels with a small variance. Once
again, this value is chosen based on observation of
output as opposed to physically correct rendering.
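A simplified sketch of this reconstruction step for a single pixel is shown below; the data layout of the gathered neighbours and the unweighted neighbourhood average are assumptions for illustration, while the blend factor follows Equation (5).

```python
def reconstruct_pixel(centre, neighbours, variance):
    """Sketch of spatial reconstruction for one ray-traced pixel.

    `centre` is a (colour, hit_ratio) pair and `neighbours` a list of
    (colour, hit_ratio, coc_radius, dist) tuples gathered with the circular
    kernel scaled by the accumulated CoC.
    """
    def lerp(a, b, t):
        return a + (b - a) * t

    # Accept only neighbours whose CoC actually reaches this pixel.
    accepted = [(c, h) for c, h, coc, dist in neighbours if coc >= dist]
    if not accepted:
        return centre

    avg_colour = sum(c for c, _ in accepted) / len(accepted)
    avg_hit = sum(h for _, h in accepted) / len(accepted)

    # Eq. (5): blend towards the neighbourhood only where variance is high,
    # so converged regions are not over-blurred.
    b = min(max(variance * 2000.0, 0.0), 0.9)
    return lerp(centre[0], avg_colour, b), lerp(centre[1], avg_hit, b)
```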
3.3 Composite
For the final image, we apply the ray trace, post-
process and sharp rasterized colours onto pixels based
on their z-distances in relation to the zone of focus.
To determine the depth range of the zone of focus, we
first compute the range for which the CoC size of pix-
els is less than √2. If the CoC of a point is smaller
than a “pixel” on our camera sensor (or a sensor cell),
it will appear as a single pixel in the final image and
can be considered in focus. Hence, we determined
that the zone of focus is the set of z-values where:
$\dfrac{a \cdot f \cdot d}{a \cdot f + \sqrt{2}\,(d - f) \cdot \frac{w_s}{w_i}} \;\leq\; z \;\leq\; \dfrac{a \cdot f \cdot d}{a \cdot f - \sqrt{2}\,(d - f) \cdot \frac{w_s}{w_i}}$   (6)
Here, a refers to the aperture diameter of the camera lens, f its focal length, d the distance between the lens and the focus plane, w_s the sensor's width in metric units and w_i the image's width in pixels.
Within the zone of focus, the full resolution un-
blurred rasterized colour is applied instead of the fil-
tered ray trace or post-process colours upscaled from
half resolution. Outside the zone of focus, for near
field objects and their silhouettes, we then apply the
ray trace colour to form accurate semi-transparent
silhouettes. However, for bright bokeh shapes, we
favour the post-process over the ray trace colour based
on the bokeh shape intensity, i.e., the proportion of
samples with high specular values gathered from the
main filter pass of the post-process stage. Through
this, we minimize noise and ghosting artifacts from
the ray trace colour in specular bokeh shapes.
For far field geometry out of the zone of focus, we
adaptively blend the ray trace colour with the post-
process colour using the hit ratio. If the hit ratio is
high, we favour the post-process colour as fewer rays
hit the far field. On the other hand, if the hit ra-
tio is low, this means that the number of hits in the
far field is comparable to that of the near field, so
more ray trace colour is used. However, there are also
fewer foreground hits closer to the edges of the ray
mask. Hence, when blending ray trace colour with
post-process colour, using our hit ratio as-is causes
blur discontinuities and tiling artifacts from our ray
mask. To minimize them while trying to retain the
ray-traced semi-transparencies, we only blend at hit
ratios below 0.3 to produce a smooth transition be-
tween the ray trace and post-process colours, as illus-
trated in the formula below. If we bias towards the
post-process colours (> 0.3), we will lose the semi-
transparencies rendered by our ray trace pass.
h = smoothstep(0, 0.3, h) (7)
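The per-pixel composite logic can be summarized by the sketch below, which is a simplified reading of this section: the zone-of-focus test follows Equation (6), the far-field blend follows Equation (7), and the way the bokeh intensity drives the near-field blend is an assumption for illustration.

```python
import math

def smoothstep(edge0, edge1, x):
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def in_focus(z, a, f, d, w_s, w_i):
    """Zone-of-focus test from Eq. (6): CoC smaller than sqrt(2) pixels.
    Assumes a*f > sqrt(2)*(d - f)*w_s/w_i, i.e. a bounded focus range."""
    k = math.sqrt(2.0) * (d - f) * w_s / w_i
    return a * f * d / (a * f + k) <= z <= a * f * d / (a * f - k)

def composite(z, sharp, post, ray_near, ray_far, hit_ratio, bokeh_intensity,
              a, f, d, w_s, w_i):
    """Sketch of the per-pixel composite of sharp, post-process and ray trace colours."""
    def lerp(x, y, t):
        return x + (y - x) * t

    if in_focus(z, a, f, d, w_s, w_i):
        return sharp                          # full-resolution rasterized colour
    if z < d:                                 # near field and its silhouette
        # Favour post-process colour on bright bokeh shapes to hide ray trace noise.
        return lerp(ray_near, post, bokeh_intensity)
    # Far field: blend ray trace into post-process only at low hit ratios, Eq. (7).
    h = smoothstep(0.0, 0.3, hit_ratio)
    return lerp(ray_far, post, h)
```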
3.4 TAA
Fireflies are artifacts appearing due to the sampling
of very bright pixels which get spread out during spa-
tial low-pass filtering but are not temporally stable.
As the camera moves, the bright spots tend to flicker
from one frame to another. This persistent flickering
is likely due to the temporal instability of the initial
rasterized image. Because of its low-pass filtering
stages, post-processed DoF is particularly sensitive to
pixel flicker and will tend to spread the small tempo-
rally unstable highlights, creating firefly artifacts.
Following the example from Abadie (2018) to re-
move flickering bright pixels, we resolve our final im-
age with TAA (Karis, 2014), reprojecting previous
frames onto the newest frames. To further stabilize
our result, we also apply TAA to the initial rasterized
image before filtering, as well as to the image gener-
ated after filtering to stabilize camera jitter.
4 IMPLEMENTATION
4.1 Falcor
We used the NVIDIA Falcor real-time rendering
framework (Benty et al., 2020) with DirectX 12 back-
end to utilize ray tracing acceleration and develop our
hybrid DoF pipeline. We also made use of Falcor’s
graphics techniques library to apply Gaussian blur on
our G-Buffer and luminance variance in generating
our ray mask, as well as TAA to stabilize our result.
4.2 Scene-dependent Values
The scenes used for testing our hybrid DoF imple-
mentation are THE MODERN LIVING ROOM (Wig42,
2014) and UE4 SUN TEMPLE (Epic Games, 2017).
We also tested our approach on AMAZON LUMBER-
YARD BISTRO (Amazon Lumberyard, 2017).
For all 3 scenes, we used s = 0.8 for scaling in
our ray mask as it worked well to identify edges
when coupled with our variance estimation. Increas-
ing the value would bias to shooting more rays even
at relatively flat surfaces. For THE MODERN LIVING
ROOM (or PINK ROOM), maximum ray values m of
10 to 20 per pixel worked well. Increasing the number
of rays had diminishing returns for the visual quality
of bokeh shapes. For SUN TEMPLE and the exterior
scene of AMAZON LUMBERYARD BISTRO (BISTRO
EXTERIOR), we had to keep our maximum ray count
at 10 to get interactive frame rates for large ray masks.
However, as SUN TEMPLE and BISTRO EXTERIOR
are more complex with textures, any noise observed
was less noticeable as compared to PINK ROOM.
Regarding the depth range for our post-process
main filter pass, as compared to 100 feet as suggested
by Jimenez, we found that a value of 10 cm worked
best for our test scenes due to their relatively close
geometries. As for our mip sampling in spatial recon-
struction, we used a simple clamped linear mapping
to determine the LOD to sample from, as illustrated in
the formula below, where c_t refers to the temporally-accumulated CoC of each pixel. This linear mapping was devised from our observation of the amount of blur in specular bokeh shapes for our test scenes.

$\mathrm{lod} = \mathrm{clamp}(c_t \cdot 0.05,\ 0,\ 3)$   (8)
We hope to further simplify these scene-
dependent variables for ease of use by artists.
5 RESULTS
5.1 Graphics Quality Comparison
We evaluate our hybrid DoF method (c) against our
adaptation of fully post-processed DoF (Jimenez,
2014) without local background reconstruction (a),
Unreal Engine 4 DoF (Abadie, 2018) (b) and fully
ray-traced DoF (Cook et al., 1984) (d) in this section
as well as our demo video (link).
Figure 6: Semi-transparencies of foreground silhouettes.
Semi-transparencies are of much better quality
with hybrid DoF, as seen in Figure 6. For the adapted
post-process DoF, the foreground blur is extended out
of the object’s silhouette to overwrite the background
colour. However, in hybrid DoF, we can see the back-
ground object, specifically, the bottom right corner of
the cushion appear along the edge of the cushion in
front. This same corner is not visible in the UE4 result.
To quantitatively compare the visual quality of our
hybrid result with the ground truth (Cook et al., 1984),
we chose a shot of PINK ROOM with very blurred
foreground geometry that exhibits highly specular
bokeh shapes and semi-transparent silhouettes in Fig-
ure 7. When compared to ground truth DoF, increas-
ing the maximum number of rays of hybrid DoF for
the shot at relatively low ray counts increases the
structural similarity index (SSIM), suggesting that our
Figure 7: Top: reference image at m = 10. Bottom: graph
of SSIM and ray trace pass timings against ray budget m.
reconstruction filter might be effective in improving
image quality. However, at higher ray counts, there
are diminishing returns. At this stage, our technique is
also unable to reach ground truth quality even at high
ray counts, due to some approximated values used in
our post-process stage and composition artifacts.
5.2 Performance
Using the Falcor API helped speed up implementa-
tion by handling the loading of scene assets, setting
up the rendering pipeline and creating ray tracing ac-
celeration structures. However, the API hides many
low-level details from Direct3D, making it difficult to
optimize rendering. Consequently, frame rate figures
are not representative of how our algorithm would
perform if professionally implemented and properly
optimized for games. Nonetheless, we believe that
our technique can be adapted and optimized for in-
teractive rendering and motivate further research in
this direction. We have achieved relatively interactive
frame rates without extensive optimization, validating
hybrid rendering as a proof of concept.
Our measurements are taken with the Nsight pro-
filing tool on an Intel Core i7-8700K CPU with 16 GB RAM and an NVIDIA GeForce RTX 2080 Ti GPU.
The shots used include a background-dominant (BG)
Table 1: Shots used for profiling (rows: BG, M and FG shots; columns: m = 1, 30 and 50).
wide shot and a foreground-dominant (FG) close-up
of PINK ROOM, as well as a mixed (M) medium shot
of BISTRO EXTERIOR with different m as shown in
Table 1, with their performances in Table 2. In partic-
ular, the ray trace pass duration is calculated from the
combined GPU timings of the Direct3D API calls for
building acceleration structures and dispatching rays.
It can be noted that post-processing performances
are on par with implementations in a production en-
gine. As a reference, Unreal Engine achieves 1.59
ms on a GTX 1080 (Abadie, 2018). While the dura-
tion of other passes remains relatively constant, the
cost of the ray trace pass increases with additional
rays shot per pixel except for BG. This is because
background geometry is predominantly rendered with
post-processing, which means that few rays are traced
at all. For M, part of the moped is in the foreground
and is ray-traced, while the cushion and parts of the
sofa are completely ray-traced for FG. Hence, an in-
creasing m gives a dip in frame rate for FG but not M.
The cost of one ray for M and FG is 0.04 ms and 0.07 ms respectively. These observations allow us to de-
sign content-based adaptive trade-offs between qual-
ity and performance, which we defer to future work.
5.3 Limitations and Future Work
Although our technique improves the quality of semi-
transparencies at the silhouettes of blurry foreground
geometry, we acknowledge that it currently might not
fare as well as other more efficient state-of-the-art
post-processing approaches in the following aspects.
We hope to continue working on performance opti-
mizations alongside visual enhancements.
Figure 8: Artifacts: (a) ghosting; (b) tiling.
Ghosting artifacts currently appear from temporal
accumulation as shown in Figure 8a. Using colour-
based neighbourhood clamping to combat ghosting as
explained in Karis (2013) reintroduces noise as noted
in Schied et al. (2017). While we have reduced the
blend factor during motion, ghosting is still not elim-
inated especially at low frame rates as it takes longer
for temporal accumulation to converge to the new
colour. Potentially, we could adopt ideas from Schied
et al. (2018) which manages to eliminate ghosting ar-
tifacts by estimating per-pixel blend factors.
Due to our insufficiently-sized ray mask, tiling ar-
tifacts are also observed when foreground objects are
too close to the camera, as seen in Figure 8b. Ideally,
our ray mask should be scaled based on CoC size to
account for the sizeable blur of objects close to the
camera. Hence, for our technique, we could scale our
ray mask by the maximum CoC in the neighbourhood.
With one sample per pixel, the noise generated is inher-
ently difficult to remove. Adopting a final blur like
Table 2: Pass durations (in ms) and frame rates.

Pass                 Shot   m=1    m=30   m=50
Rasterization        BG     1.70   1.72   1.70
                     M      2.07   2.05   2.04
                     FG     1.90   1.89   1.89
Post-Process         BG     1.43   1.44   1.42
                     M      1.63   1.62   1.69
                     FG     1.68   1.67   1.66
G-Buffer & σ² Blur   BG     0.64   0.64   0.65
                     M      0.66   0.66   0.66
                     FG     0.65   0.65   0.64
Ray Trace            BG     0.18   0.19   0.18
                     M      0.26   1.36   2.08
                     FG     0.40   2.08   3.25
Accumulation         BG     0.60   0.60   0.60
                     M      0.60   0.60   0.60
                     FG     0.61   0.61   0.61
Median               BG     0.35   0.34   0.34
                     M      0.33   0.34   0.34
                     FG     0.34   0.34   0.34
Recon-Composite      BG     0.42   0.42   0.42
                     M      0.81   0.79   0.81
                     FG     1.42   1.40   1.42
Final TAA            BG     0.53   0.53   0.53
                     M      0.53   0.53   0.53
                     FG     0.53   0.53   0.53
Others               BG     0.64   0.64   0.66
                     M      0.64   0.61   0.65
                     FG     0.64   0.62   0.65
Our Pass             BG     2.19   2.19   2.19
                     M      2.66   3.75   4.49
                     FG     3.42   5.08   6.26
Total Duration       BG     6.49   6.52   6.50
                     M      7.53   8.56   9.40
                     FG     8.17   9.79   10.99
Frame Rate (fps)     BG     181    179    180
                     M      96     97     97
                     FG     146    122    110
Barré-Brisebois et al. (2019) for reflections resulted
in a loss of detail in semi-transparent areas. However,
considering that we only ray trace within a ray mask,
using a variable number of rays is a good compromise
for better image quality.
As many post-process effects remain to be en-
hanced with ray tracing, we are also exploring hybrid
rendering for motion blur. Post-processed motion blur
poses similar issues of semi-transparencies. Hence,
we are also investigating the use of ray tracing to
uncover true background information behind motion-
blurred foreground objects (Tan et al., 2020b).
6 CONCLUSION
We present a hybrid real-time rendering technique for
the DoF effect in games. Our ray trace pass attains
better image quality by rendering more accurate semi-
transparencies with minimal artist overhead. Addi-
tionally, our ray mask and adaptive ray count, even
when unoptimized, allow us to achieve relatively in-
teractive frame rates. In future, we hope to augment
and incorporate other effects like motion blur into our
hybrid real-time rendering pipeline. Our hybrid ren-
dering engine will be open-sourced for the benefit of
the research community and the industry.
ACKNOWLEDGEMENTS
We thank Wyman (2018) for the Falcor scene file
of THE MODERN LIVING ROOM (CC BY) as well
as the NVIDIA ORCA for that of UE4 SUN TEM-
PLE (CC BY-NC-SA) and AMAZON LUMBERYARD
BISTRO (CC BY). This work is supported by the
Singapore Ministry of Education Academic Research
grant T1 251RES1812, “Dynamic Hybrid Real-time
Rendering with Hardware Accelerated Ray-tracing
and Rasterization for Interactive Applications”.
REFERENCES
Abadie, G. (2018). Advances in real-time rendering in
games: A life of a bokeh. In ACM SIGGRAPH 2018
Courses, SIGGRAPH ’18, New York, NY, USA. As-
sociation for Computing Machinery.
Akenine-Möller, T., Munkberg, J., and Hasselgren, J.
(2007). Stochastic rasterization using time-continuous
triangles. In Proceedings of the 22nd ACM SIG-
GRAPH/EUROGRAPHICS Symposium on Graphics
Hardware, GH ’07, pages 7–16, Goslar, DEU. Euro-
graphics Association.
Amazon Lumberyard (2017). Amazon lumberyard bistro,
open research content archive (orca).
Andrade, P., Sabino, T. L., and Clua, E. (2014). Towards a
heuristic based real time hybrid rendering - a strategy
to improve real time rendering quality using heuristics
and ray tracing. In 2014 International Conference on
Computer Vision Theory and Applications (VISAPP),
volume 3, pages 12–21.
Barré-Brisebois, C., Halén, H., Wihlidal, G., Lauritzen, A., Bekkers, J., Stachowiak, T., and Andersson, J. (2019). Hybrid rendering for real-time ray tracing. In Haines, E. and Akenine-Möller, T., editors, Ray Trac-
ing Gems, chapter 25. Apress. http://raytracinggems.
com.
Beck, S., Bernstein, A. C., Danch, D., and Fröhlich, B.
(1981). Cpu-gpu hybrid real time ray tracing frame-
work. volume 0, pages 1–8. The Eurographics Asso-
ciation and Blackwell Publishing Ltd.
Belcour, L., Soler, C., Subr, K., Holzschuch, N., and Du-
rand, F. (2013). 5d covariance tracing for efficient de-
focus and motion blur. ACM Trans. Graph., 32(3).
Benty, N., Yao, K.-H., Clarberg, P., Chen, L., Kallweit,
S., Foley, T., Oakes, M., Lavelle, C., and Wyman, C.
(2020). The Falcor rendering framework.
Bukowski, M., Hennessy, P., Osman, B., and McGuire, M.
(2013). The Skylanders SWAP Force depth-of-field
shader. In GPU Pro 4: Advanced Rendering Tech-
niques, pages 175–184.
Cabeleira, J. P. G. (2010). Combining rasterization and ray
tracing techniques to approximate global illumination
in real-time. Master’s thesis, Portugal.
Canny, J. (1986). A computational approach to edge de-
tection. IEEE Trans. Pattern Anal. Mach. Intell.,
8(6):679–698.
Catmull, E. (1984). An analytic visible surface algorithm
for independent pixel processing. SIGGRAPH Com-
put. Graph., 18(3):109–115.
Chalmers, A., Debattista, K., and dos Santos, L. P. (2006).
Selective rendering: Computing only what you see.
In Proceedings of the 4th International Conference
on Computer Graphics and Interactive Techniques
in Australasia and Southeast Asia, GRAPHITE ’06,
pages 9–18, New York, NY, USA. ACM.
Chen, C.-C. and Liu, D. S.-M. (2007). Use of hardware
z-buffered rasterization to accelerate ray tracing. In
Proceedings of the 2007 ACM Symposium on Applied
Computing, SAC ’07, pages 1046–1050, New York,
NY, USA. ACM.
Cook, R. L., Porter, T., and Carpenter, L. (1984). Dis-
tributed ray tracing. SIGGRAPH Comput. Graph.,
18(3):137–145.
Dayal, A., Woolley, C., Watson, B., and Luebke, D. (2005).
Adaptive frameless rendering. In ACM SIGGRAPH
2005 Courses, SIGGRAPH ’05, New York, NY, USA.
ACM.
Demers, J. (2004). Depth of field: A survey of techniques.
In GPU Gems, chapter 23. Pearson Higher Education.
Epic Games (2017). Unreal engine sun temple, open re-
search content archive (orca).
Fatahalian, K., Luong, E., Boulos, S., Akeley, K., Mark,
W. R., and Hanrahan, P. (2009). Data-parallel ras-
terization of micropolygons with defocus and motion
blur. In Proceedings of the Conference on High Per-
formance Graphics 2009, HPG ’09, page 59–68, New
York, NY, USA. Association for Computing Machin-
ery.
Franke, L., Hofmann, N., Stamminger, M., and Selgrad, K.
(2018). Multi-layer depth of field rendering with tiled
splatting. Proc. ACM Comput. Graph. Interact. Tech.,
1(1).
Garcia, K. (2017). Circular separable convolution depth of
field. In ACM SIGGRAPH 2017 Talks, SIGGRAPH
’17, New York, NY, USA. Association for Computing
Machinery.
Hach, T., Steurer, J., Amruth, A., and Pappenheim, A.
(2015). Cinematic bokeh rendering for real scenes.
In Proceedings of the 12th European Conference on
Visual Media Production, CVMP ’15, New York, NY,
USA. Association for Computing Machinery.
Haeberli, P. and Akeley, K. (1990). The accumulation
buffer: Hardware support for high-quality rendering.
In Proceedings of the 17th Annual Conference on
Computer Graphics and Interactive Techniques, SIG-
GRAPH ’90, page 309–318, New York, NY, USA. As-
sociation for Computing Machinery.
Hertel, S., Hormann, K., and Westermann, R. (2009). A
hybrid gpu rendering pipeline for alias-free hard shad-
ows. In Ebert, D. and Krüger, J., editors, Eurographics 2009 Areas Papers, pages 59–66, München, Germany.
Hou, Q., Qin, H., Li, W., Guo, B., and Zhou, K. (2010). Mi-
cropolygon ray tracing with defocus and motion blur.
ACM Trans. Graph., 29(4).
Jimenez, J. (2014). Advances in real-time rendering in
games, part i: Next generation post processing in call
of duty: Advanced warfare.
Karis, B. (2013). Tone mapping.
Karis, B. (2014). High quality temporal supersampling.
Kraus, M. and Strengert, M. (2007). Depth-of-field render-
ing by pyramidal image processing. Comput. Graph.
Forum, 26:645–654.
Lauterbach, C. and Manocha, D. (2009). Fast hard and soft
shadow generation on complex models using selective
ray tracing. Technical report tr09-004, UNC CS.
Lee, S., Eisemann, E., and Seidel, H.-P. (2009). Depth-of-
field rendering with multiview synthesis. ACM Trans.
Graph., 28(5):1–6.
Lee, S., Kim, G., and Choi, S. (2008). Real-time depth-of-
field rendering using point splatting on per-pixel lay-
ers. Comput. Graph. Forum, 27:1955–1962.
Lehtinen, J., Aila, T., Chen, J., Laine, S., and Durand,
F. (2011). Temporal light field reconstruction for
rendering distribution effects. ACM Trans. Graph.,
30(4):55:1–55:12.
Lei, K. and Hughes, J. F. (2013). Approximate depth of field
effects using few samples per pixel. In Proceedings of
the ACM SIGGRAPH Symposium on Interactive 3D
Graphics and Games, I3D ’13, page 119–128, New
York, NY, USA. Association for Computing Machin-
ery.
Leimkühler, T., Seidel, H.-P., and Ritschel, T. (2018).
Laplacian kernel splatting for efficient depth-of-field
and motion blur synthesis or reconstruction. ACM
Trans. Graph., 37(4).
Macedo, D. V. D., Serpa, Y. R., and Rodrigues, M. A. F.
(2018). Fast and realistic reflections using screen
space and gpu ray tracing—a case study on rigid and
deformable body simulations. Comput. Entertain.,
16(4).
Marrs, A., Spjut, J., Gruen, H., Sathe, R., and McGuire, M.
(2018). Adaptive temporal antialiasing. In Proceed-
ings of the Conference on High-Performance Graph-
ics, HPG ’18, pages 1:1–1:4, New York, NY, USA.
ACM.
McGraw, T. (2015). Fast bokeh effects using low-rank lin-
ear filters. Vis. Comput., 31(5):601–611.
McIntosh, L., Riecke, B. E., and DiPaola, S. (2012). Effi-
ciently simulating the bokeh of polygonal apertures in
a post-process depth of field shader. Comput. Graph.
Forum, 31(6):1810–1822.
Mehta, S. U., Yao, J., Ramamoorthi, R., and Durand, F.
(2014). Factored axis-aligned filtering for rendering
multiple distribution effects. ACM Trans. Graph.,
33(4).
Potmesil, M. and Chakravarty, I. (1982). Synthetic im-
age generation with a lens and aperture camera model.
ACM Trans. Graph., 1(2):85–108.
Rosado, G. (2008). Chapter 27. motion blur as a post-
processing effect. In Nguyen, H., editor, GPU Gems
3, chapter 27.
Sattlecker, M. and Steinberger, M. (2015). Reyes rendering
on the gpu. In Proceedings of the 31st Spring Confer-
ence on Computer Graphics, SCCG ’15, page 31–38,
New York, NY, USA. Association for Computing Ma-
chinery.
Schied, C., Kaplanyan, A., Wyman, C., Patney, A., Chai-
tanya, C. R. A., Burgess, J., Liu, S., Dachsbacher,
C., Lefohn, A., and Salvi, M. (2017). Spatiotempo-
ral variance-guided filtering: Real-time reconstruction
for path-traced global illumination. In Proceedings
of High Performance Graphics, HPG ’17, pages 2:1–
2:12, New York, NY, USA. ACM.
Schied, C., Peters, C., and Dachsbacher, C. (2018). Gra-
dient estimation for real-time adaptive temporal filter-
ing. Proc. ACM Comput. Graph. Interact. Tech., 1(2).
Selgrad, K., Reintges, C., Penk, D., Wagner, P., and Stam-
minger, M. (2015). Real-time depth of field using
multi-layer filtering. In Proceedings of the 19th Sym-
posium on Interactive 3D Graphics and Games, i3D
’15, page 121–127, New York, NY, USA. Association
for Computing Machinery.
Shirley, P., Aila, T., Cohen, J., Enderton, E., Laine, S., Lue-
bke, D., and McGuire, M. (2011). A local image re-
construction algorithm for stochastic rendering. In
Symposium on Interactive 3D Graphics and Games,
I3D ’11, page 9–14, New York, NY, USA. Associa-
tion for Computing Machinery.
Smith, J. L. (1996). Implementing median filters in xc4000e
fpgas. Xcell, 23(1).
Sousa, T. A. (2013). Advances in real-time rendering in
games, part ii: Graphics gems from cryengine 3.
Tan, Y. W., Chua, N., Biette, N., and Bhojan, A. (2020a).
Hybrid dof: Ray-traced and post-processed hybrid
depth of field effect for real-time rendering. In ACM
SIGGRAPH 2020 Posters, SIGGRAPH ’20, New
York, NY, USA. Association for Computing Machin-
ery.
Tan, Y. W., Xiaohan, C., and Bhojan, A. (2020b). Hybrid
mblur: Using ray tracing to solve the partial occlusion
artifacts in real-time rendering of motion blur effect.
In ACM SIGGRAPH 2020 Posters, SIGGRAPH ’20,
New York, NY, USA. Association for Computing Ma-
chinery.
Vaidyanathan, K., Munkberg, J., Clarberg, P., and Salvi, M.
(2015). Layered light field reconstruction for defocus
blur. ACM Trans. Graph., 34(2).
Valient, M. (2013). Killzone shadow fall.
Weier, M., Roth, T., Hinkenjann, A., and Slusallek, P.
(2018). Foveated depth-of-field filtering in head-
mounted displays. ACM Trans. Appl. Percept., 15(4).
Whitted, T. (1979). An improved illumination model
for shaded display. SIGGRAPH Comput. Graph.,
13(2):14–.
Wig42 (2014). The modern living room.
Wyman, C. (2018). Introduction to directx raytracing. In
ACM SIGGRAPH 2018 Courses, SIGGRAPH ’18.
Yan, L.-Q., Mehta, S. U., Ramamoorthi, R., and Durand, F.
(2016). Fast 4d sheared filtering for interactive render-
ing of distribution effects. ACM Trans. Graph., 35(1).
Zhang, X., Matzen, K., Nguyen, V., Yao, D., Zhang, Y., and
Ng, R. (2019). Synthetic defocus and look-ahead aut-
ofocus for casual videography. ACM Trans. Graph.,
38(4).