Study of Coding Units Depth for Depth Maps Quality Scalable

Compression Using SHVC

Dorsaf Sebai

, Faouzi Ghorbel and Sounia Messbahi

Cristal Laboratory, National School of Computer Science, Manouba University, Tunisia

Keywords:

Scalable High Efﬁciency Video Coding, Depth Maps, Coding Units, Quality Scalability.

Abstract:

Scalable High Efﬁciency Video Coding (SHVC) is used to adaptively encode texture images. SHVC archi-

tecture is composed of Base and Enhancement Layers (BL and EL), with an interlayer picture processing

module between them. In order to ensure effective encoding, each picture is divided into a certain number of

Coding Units (CUs), with different depths, composing the Coding Tree Unit (CTU). Being initially dedicated

to texture images, SHVC does not provide the same efﬁciency when applied to depth maps. To understand the

causes behind, we propose to study the SHVC CTU partitioning for depth maps. This can be a starting point

to propose an efﬁcient 3D video scalable compression. Main observations of this study show that the depth

of most CUs is 2 and 3 for texture images. However, this depth is either 0 or 1 for depth maps. Moreover,

CUs depths frequently change when passing from the base and enhancement layers of SHVC for the non-ﬂat

regions. This is not the case for the smooth regions that generally preserve the same CUs depths in the two

SHVC layers.

1 INTRODUCTION

3D video has become increasingly popular with ad-

vances in communication, display and related areas.

Today, 3D video leads to the emergence of new tech-

nologies, namely virtual, augmented and mixed reali-

ties that ﬁnd their applications in several ﬁelds such as

health, education, and industry. Even for Internet of

Things (IoT), the future is for the 3D vision and IoT

with depth to allow machines, such as autonomous

cars, robots and drones, a deep perception like human

beings. This is all the more true as cameras, which

simultaneously capture the image and its depth, be-

come more and more accessible to the general public

thanks to their integration in smartphones.

The digital age has greatly changed the consump-

tion of 3D video content, deﬁning new trends and con-

straints that the standards of compression must face.

3D videos have in fact become accessible on many

devices, such as television, computer, smartphones,

IoT gadgets and by many transmission media such

as the Internet, mobile, terrestrial and satellite net-

works. At the same time, users are increasingly de-

manding good quality. This is especially true with the

emergence of new video formats, such as Ultra High

Deﬁnition (UHD), High Dynamic Range (HDR) and

High Frame Rate (HFR). Faced to these challenges,

https://orcid.org/0000-0001-7720-2741

scalable compression is required to provide multiple

streams of the same video that meet the heteroge-

neous needs of the receivers.

Being a scalable extension of the High Efﬁ-

ciency Video Coding (HEVC) (Sullivan et al., 2012)

video compression standard, the Scalable High Efﬁ-

ciency Video Coding standard (SHVC) (Boyce et al.,

2016) makes it possible to perform scalable encod-

ing. SHVC is dedicated to the scalable compression

of conventional 2D videos whose only component is

texture images. It is not adapted to depth maps as it in-

duces damaging visual artifacts at sharp depth discon-

tinuities (Sebai, 2020). Further, 3D High Efﬁciency

Video Coding (3D-HEVC) (Tech et al., 2016), is the

latest standard dedicated to the compression of depth

maps. But, it does not allow scalable compression of

these latter. However, the need for scalable compres-

sion also persists for 3D video used by many appli-

cations, such as virtual, augmented and mixed reality.

In this paper, we propose a study of the SHVC CTU

partitioning for depth images in order to evaluate its

efﬁciency in 3D context. Firstly, we aim to anal-

yse the CTUs depth in texture images versus depth

maps. Secondly, we analyze the CTU construction in

Base Layer (BL) versus Enhancement Layer (EL) of

SHVC.

The rest of this paper is organized as follows. In

Section 2 and 3, we present the concepts required to

114

Sebai, D., Ghorbel, F. and Messbahi, S.

Study of Coding Units Depth for Depth Maps Quality Scalable Compression Using SHVC.

DOI: 10.5220/0011706400003417

In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP, pages

114-120

ISBN: 978-989-758-634-7; ISSN: 2184-4321

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

understand the study of CUs structure which is de-

tailed in Section 4. The conclusion is drawn in Sec-

tion 5.

2 MULTIVIEW VIDEO PLUS

DEPTH

Multiview video plus depth (MVD) (Merkle et al.,

2007) is a 3D video format which consists of more

than two streams videos each including two compo-

nents, texture images and their corresponding depth

maps. 3D-HEVC, the latest 3D video coding stan-

dards, is developed to efﬁciently compress MVD

data, especially depth maps. These latter are two-

dimensional representation of the scene geometry.

For each texture pixel, a corresponding pixel exists in

the depth map. The depth pixel value is the distance

between a point in 3D space, captured by the corre-

sponding texture pixel, and the camera. Knowing the

intrinsic and extrinsic parameters of the camera, this

depth value makes it possible to project a pixel into

3D space. If a view synthesis is desired, the pixel is

then projected onto a virtual viewing plane thereby

generating a new virtual view.

As shown in Fig. 1, depth maps are represented

as 2D grayscale images. This representation allows

256 depth values, ranging from 0 to 255. The inten-

sity value 255 deﬁnes the closest scene objects to the

camera. The farthest objects correspond to the gray

level 0. Being different from texture images, depth

maps are distinguished by their piecewise planar def-

inition and the impact of depth discontinuities on the

quality of the synthesized views. Indeed, each plane

corresponds to an object of the scene where intensi-

ties of its coplanar pixels vary in a regular way; and

contours, placed at the objects edges, reproduce sharp

depth discontinuities between objects in foreground

and background (Yea and Vetro, 2009). These dis-

continuities must be preserved as their compression

results in a highly visible degradation of the synthe-

sized views. However, it is not the case for smooth re-

Figure 1: Example of a texture image (left) and its associ-

ated depth map (right).

gions. The distortions of compression errors in these

regions have a less noticeable or even limited effect.

3 SCALABLE VIDEO

COMPRESSION

Scalability consists in extracting several versions

from a stream according to the users needs, termi-

nals capacities, and networks conditions (Tohidypour,

2016). In the particular case of images, a stream is

said to be scalable when parts of this latter can be

deleted, so that the resulting sub-stream forms another

valid stream for a target decoder. A Base Layer (BL)

is produced, and other Enhancement Layers (EL) are

used to best adapt the capabilities of the receiver.

3.1 Scalable High Efﬁciency Video

Coding

The SHVC (Boyce et al., 2016) encoder is the exten-

sion of HEVC which currently offers different types

of scalability such as temporal scalability, spatial scal-

ability and quality scalability also known as Signal

to Noise Ratio (SNR) scalability. The principle of

SHVC consists in encoding each image of the input

video into a hierarchy of layers, starting from a BL

that is gradually reﬁned by the addition of ELs.

SHVC encodes input images in the BL bit stream

using the HEVC or Advanced Video Coding (AVC)

codec (Wiegand et al., 2003). The encoding of the EL

is performed using HEVC while leveraging the data

already encoded in the lower reference layers in order

to eliminate redundancies. This Inter Layer Process-

ing (ILP) block improves the efﬁciency of SHVC in

terms of compression ratio. It ensures that data al-

ready encoded in the previous layers is no longer en-

coded in the current layer. The bit streams of the BL

and EL are ﬁnally multiplexed according to the in-

creasing order of the layers indices. On decoding,

these streams are demultiplexed in order to decode

each of them taking into consideration the redundan-

cies retained by the ILP step.

3.2 Coding Tree Unit

In SHVC, a picture is divided into a quadtree, known

as CTU. This latter is composed of CUs that are

similar to macroblocks used in the previous AVC.

Whereas macroblocks can span 4× 4 to 16× 16 block

sizes, CTUs can process as many as 64 × 64 blocks.

This gives it the ability to compress information more

efﬁciently. Fig. 3 represents an illustration of the

Study of Coding Units Depth for Depth Maps Quality Scalable Compression Using SHVC

115

Figure 2: Architecture of SHVC: (a) SHVC encoder with two layers and (b) SHVC decoder with two layers (Gudumasu et al.,

2015).

SHVC CTU structure. Each CU can be splitted into

four sub-CUs due to a Rate Distortion Optimiza-

tion (RDO) decision function (Sullivan and Wiegand,

1998). This ergodic process looks for all possible

CUs to choose the ones with the smallest rate dis-

tortion cost. The maximum depth partitioning is set

equal to 4, numbered from 0 to 3. The CTU depth

(CTU Depth) 0 corresponds to the CU size of 64× 64,

CTU depth 1 to CU size of 32 × 32, CTU depth 2 to

CU size of 16 × 16 and CTU depth 3 to CU size of

8 × 8. A CU will be divided into Prediction Units

(PUs) that are used to carry the information related to

the prediction modes. To choose the best CTU struc-

ture, the Rate Distortion Cost (RDCost) is calculated

for each mode, and, at the end of the process, the par-

tition that introduces the lowest RDCost is selected.

Figure 3: Illustration of the Coding Tree Unit structure in

SHVC.

4 STUDY OF CODING TREE

UNIT STRUCTURE

Since it was originally designed for texture images,

SHVC is analyzed for depth maps. We aim speciﬁ-

cally to study the size of a CU between the base and

enhancement layers on the one hand and in the tex-

ture image and depth map on the other hand. For this

syudy, We use twenty images, ten texture images and

their corresponding depth maps as shown in Fig. 4.

To encode these latter, the SHVC Test Model refer-

ence encoder (SHM 12.0) is used. For each of the

images, two layers BL and EL are respectively coded

at QP values of 26 and 22.

Figure 4: Test images: Texture images (ﬁrst row) and their

associated depth maps (second row).

4.1 Coding Tree Unit Structure: Base

Layer vs. Enhancement Layer

In this present section, we seek to evaluate the amount

of depth maps regions for which the CTU depth does

not change between BL and EL. Fig. 5 and Table 1

show the percentages of CTUs that maintain the same

depth when passing from BL to EL (% Regions CTU

Depth BL EL). For smooth regions, we note a high

correlation between base and enhancement layers. In

VISAPP 2023 - 18th International Conference on Computer Vision Theory and Applications

116

fact, 85% of CTUs maintain their depth since object

plans of depth maps are mostly encoded from the BL

and no more partitioning is required for the EL. Un-

like smooth regions, less than 29% of contours re-

gions maintain the same CTU depth in EL relatively

to BL. This type of regions often needs to be more

splitted in EL, as the CTU depth in BL is insufﬁcient.

This observation is relevant to reduce the computa-

tional complexity of SHVC for depth maps coding. A

further work can, for example, aim at proposing an

algorithm that skips the CTU partitioning for smooth

regions in ELs.

Figure 5: Percentage of CTUs maintaining the same depth

between base layer and enhancement layer.

Table 1: Percentage of CTUs maintaining the same depth

between base layer and enhancement layer.

Sequences Regions with edges Smooth regions

Statue 18.45% 95.80%

Helicopter 16.00% 90.35%

Vase 15.21% 85.23%

Chess 11.06% 99.01%

Champagne 22.06% 92.02%

Breakdancers 6.11% 87.66%

Balloons 8.49% 89.20%

Ballet 10.95% 97.89%

Autumn 17.03% 87.39%

Armchair 29.14% 98.11%

4.2 Coding Tree Unit Structure:

Texture Images vs. Depth Maps

In order to study the most commonly used CTU

depths (CTU Depths) in SHVC quadtree partition-

ing, we present the percentage of each CTU depth,

i.e. 0, 1, 2 and 3, for both texture and depth im-

ages in BL (cf. Fig. 6) and EL (cf. Fig. 7). Ac-

cording to obtained results, we note that CTU Depths

2 and 3 are more frequently adopted, with an aver-

age percentage of 71.5%, in the texture images. They

are, however, less frequently adopted in depth maps,

with an average percentage of 46.74%. As texture im-

ages contain many and various details and patterns,

the quadtree decomposition is needed to be parti-

tioned to small sized coding units, e.g. 16 × 16 and

8 × 8. Thus, larger CTU depths, i.e. 2 and 3, are

commonly reached. However, this is not the case of

depth maps that mostly integrate smooth plans of dis-

tances to capture camera. Therefore, CTUs do not

need to be very partitioned and high depths are not

commonly reached. This implies that depth maps en-

coded using SHVC are generated with some missing

information; and that is what can impact the quality

of depth discontinuities and then the quality of syn-

thesized views. Here, integrating one or more of the

3D-HEVC DMMs, besides the conventional SHVC

prediction modes, can remedy this shortcoming. Fur-

thermore, when comparing the percentage of CTUs

of depth 3 in BL to those in EL, we note an increase

from an average of 46.17% to 53.33% for texture im-

ages and 14.96% to 24.16% for depth maps. This can

be explained by the decrease of the QP value from

26 to 22 when passing from BL to EL. In fact, more

details need to be detected in EL than in BL. This

involves using smaller CU of 8 × 8 size in order to

achieve higher quality.

5 CONCLUSION

Image and video compression is a multidisciplinary

and cross-cutting ﬁeld that is at the crossroads of

most innovative applications and domains and spans

nearly every image-centric industry, where both com-

pact and relevant information is needed. From health-

care to education, agriculture to manufacturing, and

beyond, stakeholders have to rely on compression to

help store and communicate their data. The study

carried in this paper provides a statistical analysis of

SHVC CTU partitioning for depth maps SNR scalable

compression. We typically look for CTU depths in

SHVC base and enhancement layers for depth maps,

as well as CTU depths in texture images and their cor-

responding depth ones. The obtained results lead to

the following conclusions:

• A high percentage of CTUs, that correspond to

ﬂat depth regions, preserves the same depth from

BL to EL. However, this is not the case of non-ﬂat

depth regions.

• CTU depths reached when encoding depth maps

are smaller than those reached for texture im-

ages. This can badly affect the depth disconti-

nuities preservation, and consequently the synthe-

sized views quality.

These interpretations can be very useful for our fu-

ture research directions that aim to adapt SHVC, orig-

inally conceived for texture, to depth.

Study of Coding Units Depth for Depth Maps Quality Scalable Compression Using SHVC

117

(a) Statue. (b) Helicopter.

(e) Champagne. (f) Breakdancers.

(g) Balloons. (h) Ballet.

(i) Autumn. (j) Armchair.

Figure 6: Percentage of CU depths for both texture images and depth maps in the base layer.

VISAPP 2023 - 18th International Conference on Computer Vision Theory and Applications

118

(a) Statue. (b) Helicopter.

(e) Champagne. (f) Breakdancers.

(g) Balloons. (h) Ballet.

(i) Autumn. (j) Armchair.

Figure 7: Percentage of CU depths for both texture images and depth maps in the Enhancement layer.

Study of Coding Units Depth for Depth Maps Quality Scalable Compression Using SHVC

119

REFERENCES

Boyce, J. M., Ye, Y., Chen, J., and Ramasubramonian,

A. K. (2016). Overview of SHVC: Scalable Exten-

sions of the High Efﬁciency Video Coding Standard,

volume 26. IEEE Transactions on Circuits and Sys-

tems for Video Technology.

Gudumasu, S., He, Y., Ye, Y., and Xiu, X. (2015). Video

streaming with SHVC to HEVC transcoding. SPIE,

Applications of Digital Image Processing.

Merkle, P., Smolic, A., Muller, K., and Wiegand, T.

(2007). Multi-View Video Plus Depth Representation

and Coding. IEEE International Conference on Image

Processing.

Sebai, D. (2020). Performance analysis of hevc scalable ex-

tension for depth maps, volume 92. Springer Journal

of Signal Processing Systems for Signal, Image, and

Video Technology.

Sullivan, G. J., Ohm, J., Han, W., and Wiegand, T.

(2012). Overview of the High Efﬁciency Video Cod-

ing (HEVC) Standard, volume 22. IEEE Transactions

on Circuits and Systems for Video Technology.

Sullivan, G. J. and Wiegand, T. (1998). Rate-distortion op-

timization for video compression, volume 15. IEEE

Transactions on Circuits and Systems for Video Tech-

nology.

Tech, G., Chen, Y., M

uller, K., Ohm, J., Vetro, A., and

Wang, Y. (2016). Overview of the Multiview and

3D Extensions of High Efﬁciency Video Coding, vol-

ume 26. IEEE Transactions on Circuits and Systems

for Video Technology.

Tohidypour, R. (2016). Complexity reduction schemes for

video compression. dissertation doctorale.

Wiegand, T., Sullivan, G. J., Bjontegaard, G., and Luthra,

A. (2003). Overview of the H.264/AVC video coding

standard, volume 13. IEEE Transactions on Circuits

and Systems for Video Technology.

Yea, S. and Vetro, A. (2009). Multi-layered coding of depth

for virtual view synthesis. Picture Coding Sympo-

sium.

VISAPP 2023 - 18th International Conference on Computer Vision Theory and Applications

120