Performance of Trusted Computing in Cloud Infrastructures
with Intel SGX
Anders T. Gjerdrum, Robert Pettersen, Håvard D. Johansen and Dag Johansen
UiT: The Arctic University of Norway, Tromsø, Norway
Keywords:
Privacy, Security, Cloud Computing, Trusted Computing, Performance.
Abstract:
Sensitive personal data is to an increasing degree hosted by third-party cloud providers. This generates strong
concerns about data security and privacy, as the trusted computing base is expanded to include hardware
components not under the direct supervision of the administrative entity responsible for the data. Fortunately,
major hardware manufacturers now include mechanisms promoting secure remote execution. This paper studies
Intel's Software Guard Extensions (SGX) and experimentally quantifies how basic usage of this instruction
set extension will affect how cloud-hosted services must be constructed. Our experiments show that correct
partitioning of a service's functional components will be critical for performance.
1 INTRODUCTION
Sensors and mobile devices record ever more aspects
of our daily lives. This is causing an influx of data
streams that feed into potentially complex analytical
pipelines hosted remotely by various cloud providers.
Not only are the sheer amounts of data generated
cumbersome to store and analyze at scale; data might
also be accompanied by strict privacy requirements,
as is the case with smart home and health monitoring
devices (Gjerdrum et al., 2016).
Processing of sensitive and personal data in the
cloud requires the design of new Software-as-a-
Service (SaaS) architectures that are able to en-
force rigid privacy and security policies (Johansen
et al., 2015) throughout the entire hardware and soft-
ware stack, including the underlying cloud-provided
Infrastructure-as-a-Service (IaaS) components. Although
commodity hardware mechanisms for trusted
computing have been available for some time (TCG
Published, 2011; Osborn and Challener, 2013), these
are often beset by performance and functionality
restrictions. Prior implementations by Intel, like
Trusted Platform Modules (TPM) and Trusted Execu-
tion Technology (TXT), are able to establish trust and
guarantee integrity of software, the latter also support-
ing rudimentary secure code execution.
Software Guard Extensions (SGX) (Anati et al.,
2013) is Intel’s new trusted computing platform that,
together with similar efforts by both ARM and AMD,
is quickly making general trusted computing a com-
modity. Fundamentally, SGX is an instruction set ex-
tension introduced with the Skylake generation of In-
tel’s Core architecture, supporting confidentiality, in-
tegrity and attestation of trusted code running on un-
trusted platforms. SGX is able to counter a multi-
tude of different software and physical attacks by the
construction of secure enclaves consisting of trusted
code and data segments. While SGX should be considered
an iterative technology built on previous efforts,
it surpasses its predecessors in both
performance and functionality. SGX is designed to
provide general secure computing facilities allowing
developers to easily port existing legacy applications
into SGX-enabled enclaves. These properties
make SGX an attractive technology for cloud-based
SaaS architectures that handle privacy-sensitive data.
SGX is a proprietary platform, and public knowledge
about it is based on the limited documentation describing its
architecture. Furthermore, little is known about the
performance of the primitives provided by the SGX
platform and how to author software utilizing these
primitives while maximizing performance.
In this paper we analyze the performance charac-
teristics of the SGX technology currently available to
better understand how such technologies can be used
to enforce privacy policies in cloud hosted SaaS archi-
tectures. We analyze SGX primitives at a fine-grained
level and provide a detailed performance evaluation of
the core mechanisms in SGX. The paper is structured
as follows: Section 2 outlines the relevant parts of the
SGX microarchitecture, while Section 3 describes the
details of our microbenchmarks. Section 4 provides
an informed discussion of our findings, and Section 5
details related work before offering concluding remarks.
2 INTEL SOFTWARE GUARD
EXTENSIONS (SGX)
SGX allows regular application threads to transition
into secure enclaves by issuing the special EENTER
instruction to a logical processor. Entry is
initiated by performing a controlled jump into the enclave
code, analogous to how entry into virtual machine
contexts occurs. A process can only enter an enclave
from ring 3, i.e., user level, and threads running
in enclave mode are not allowed to trigger software
interrupts, which also prohibits the use of system calls.
An application which requires access to common Op-
erating System (OS) provided services, like the file
system, must be carefully designed so that its threads
exit enclave mode through application-defined interfaces
before invoking any system calls. Since SGX's
Trusted Computing Base (TCB) does not include the
underlying OS, all such transitions, parameters, and
responses, must be carefully validated by the applica-
tion designer.
SGX allows multiple threads to execute inside the
same enclave. For each logical processor executing
inside an enclave a Thread Control Structure (TCS)
is needed. These data structures must be provisioned
before enclave startup, and are stored in the Enclave
Page Cache (EPC) main-memory pages set aside for
enclaves. Among other things, the TCS contains the
OENTRY field which is loaded into the instruction
pointer when entering an enclave. Before doing so,
SGX stores the execution context of the untrusted
code into regular memory, by using the XSAVE in-
struction, which then again is restored when exiting
the enclave. Stack pointers are not modified when
entering an enclave, however (Costan and Devadas,
2016) suggests that to avoid the possibility of exploits,
it is expected that each enclave set their stack pointer
to an area fully contained withing EPC memory. Pa-
rameter input to the enclave is marshalled into buffers,
and once the transition is done, enclave code can copy
data directly from untrusted DRAM memory. This is
not part of the native SGX implementation, rather a
convenience provided by the application SDK.
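To make this concrete, the sketch below shows what such an interface definition might look like in the EDL format used by Intel's SGX SDK, which generates the marshalling stubs described above. The function names are illustrative and not taken from our benchmarks; the [in, size=len] attribute instructs the generated stubs to copy len bytes across the enclave boundary on each transition.

```
enclave {
    trusted {
        /* ECALL: enter the enclave; the generated stub copies len bytes
           of the [in] buffer into enclave memory before the call. */
        public void ecall_process([in, size=len] const uint8_t *buf,
                                  size_t len);
    };

    untrusted {
        /* OCALL: exit to untrusted code, e.g., so the host can perform
           a file-system write on the enclave's behalf. */
        void ocall_write([in, size=len] const uint8_t *buf, size_t len);
    };
};
```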
Threads exit enclaves either voluntarily through
synchronous exit instructions, or asynchronously by
service of a hardware interrupt occurring on the af-
fected logical core. A synchronous exit through the
EEXIT instruction causes the logical processor to
leave enclave mode. The instruction pointer and the
stack pointers are restored to the values they held
before entering the enclave. SGX does not clear
any registers on enclave exit, so it is the author's
responsibility to clear them to avoid leaking
secret information. In the case of an Asynchronous
Enclave Exit (AEX), a hardware interrupt such as a
page fault causes the processor to exit the enclave and
jump down to the kernel in order to service the fault.
Prior to this, SGX saves the execution context into
EPC memory for safekeeping, before clearing it so
that the OS is not able to infer any execution state
from the enclave. When the interrupt handler is done,
SGX restores the execution context and resumes exe-
cution.
2.1 The Enclave Page Cache
Memory used by enclaves is separated at boot time
from regular process DRAM memory into what is
called Processor Reserved Memory (PRM). This con-
tiguous region of memory is divided into 4 KB pages,
collectively referred to as the Enclave Page Cache
(EPC). EPC memory is only accessible inside an enclave
or via the SGX instruction set. Neither system
software running at protection ring 0 (kernel mode)
nor application code at ring 3 (user mode) is able to
access its memory contents directly, and any attempt
to read or write to it is ignored. Furthermore, DMA
access to PRM memory is prohibited by hardware to
guard against malicious peripheral devices attempt-
ing to tap the system bus. The confidentiality of the
enclave is guarded by Intel’s Memory Encryption En-
gine (MEE), which encrypts and decrypts memory at
the CPU package boundary, on the system bus right
after the L3 cache.
Similar to virtual memory, EPC page management
is handled entirely by the OS. However, EPC memory
is not directly accessible in any processor mode, and
each page assignment is done through SGX instructions.
The OS is responsible for assigning pages to
particular enclaves and for evicting pages to regular DRAM.
The current generation of SGX hardware only sup-
ports a maximum PRM size of 128 MB, but through
swapping, there are no practical limits to the size
of an enclave. The integrity of pages swapped out
is guaranteed by always checking an auxiliary data
structure also residing in PRM, called the Enclave
Page Cache Map (EPCM). This data structure contains the correct
mappings between virtual addresses and physical
PRM memory, as well as integrity checks for each
page. Each page can only belong to one enclave,
and as a consequence, shared memory between enclaves
is prohibited. Enclaves are, however, able to share
DRAM memory if residing inside the same process’
address space, and enclave memory is allowed to read
and write to untrusted memory inside that process.
The page eviction instruction also generates a liveness
challenge for each evicted page, storing it in special EPC
pages for later comparison. These precautions guard
against a malicious OS trying to subvert an enclave by
either manipulating the address translation, explicitly
manipulating pages, or serving old pages back to the
enclave (replay attacks).
In order to guard against stale address translations
for executing enclaves, the processor does a coarse-
grained TLB shootdown for pages being evicted.
Page faults targeting a particular enclave will cause
the kernel to issue an Inter-Processor Interrupt (IPI)
to all logical cores running inside the enclave in
question. This will cause each thread to perform an AEX,
as mentioned above, and trap down to the kernel page
fault handler. Moreover, the lowermost 12 bits of the
faulting virtual address, stored in the CR2 register, are
cleared so that the OS cannot infer any access patterns.
To amortize the cost of interrupting all cores execut-
ing inside a particular enclave for each page eviction,
the SGX implementation supports batching up to 16
page evictions together at a time.
2.2 Enclave Creation
SGX supports multiple mutually distrusting enclaves
on a single machine either within the same process’
address space or in different processes. Enclaves are
created by system software on behalf of an appli-
cation, issuing an ECREATE instruction. This will
cause SGX to allocate a new EPC page for the SGX
Enclave Control Structure (SECS), which stores meta-
data for each enclave. This is used by SGX instruc-
tions to identify enclaves, and among other things
map enclaves to physical EPC pages via the EPCM
structure. Before the enclave is ready to execute
code, each initial code and data segment must be
added to enclave memory via the OS issuing specially
crafted instructions to the SGX implementation for
each page. The same instruction is also used to create
the TCS for each expected thread inside the enclave.
In addition, the OS driver issues updates for enclave
measurements used for software attestation. We re-
fer to the SGX developer manual for a description
of the SGX attestation process. When all pages are
loaded, the enclave is initialized, and the enclave receives
a launch token from a special pre-provisioned
enclave entrusted by Intel. At this point, the enclave
is considered fully initialized and no further memory
allocations may happen. Intel's revised specification
for SGX version 2 includes support for expanding enclaves
after initial creation through dynamic paging.
However, we refrain from further explanation, as
hardware supporting this specification has not been
released at this point.
When an enclave is destroyed, the inverse happens:
the OS marks each page used by the enclave
as invalid using the EREMOVE instruction. Before free-
ing the page, SGX makes sure that no logical pro-
cessor is executing inside the enclave that owns the
particular page. Finally, the SECS is deallocated if all
pages in the EPCM referring to that particular enclave
are deallocated.
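To illustrate this lifecycle from the untrusted application's side, the minimal sketch below uses the Intel SGX SDK's untrusted runtime: sgx_create_enclave() drives the ECREATE/EADD/EINIT sequence, including launch-token handling, and sgx_destroy_enclave() triggers EREMOVE for the enclave's pages. The enclave image name is a placeholder.

```c
#include <stdio.h>
#include <sgx_urts.h>

int main(void) {
    sgx_launch_token_t token = {0};   /* updated on first use */
    int token_updated = 0;
    sgx_enclave_id_t eid = 0;

    /* Drives ECREATE, EADD for each page, and EINIT.
       "enclave.signed.so" is a placeholder for a signed enclave image. */
    sgx_status_t rc = sgx_create_enclave("enclave.signed.so", SGX_DEBUG_FLAG,
                                         &token, &token_updated, &eid, NULL);
    if (rc != SGX_SUCCESS) {
        fprintf(stderr, "enclave creation failed: 0x%x\n", (unsigned)rc);
        return 1;
    }

    /* ... issue ECALLs into the enclave here ... */

    sgx_destroy_enclave(eid);         /* EREMOVE for each enclave page */
    return 0;
}
```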
3 EXPERIMENTS
To gain experience in how the next generation cloud-
based SaaS systems should be architected to best take
advantage of the SGX features in modern processors,
we ran a series of micro benchmarks on SGX-enabled
hardware. Our experimental setup consists of a Dell
Optiplex workstation with an Intel Core i5-6500 CPU
@ 3.20 GHz with four logical cores and 2 × 8 GB of
DDR3 DIMM DRAM. To avoid inaccuracies caused
by dynamic frequency scaling, Intel SpeedStep and
C-states were disabled in all our experiments. To
measure the peak performance of the architecture, we
also set the PRM size to the maximum allowed
128 MB in the hardware setup. We ran the experiments
on Ubuntu 14.04 using the open-source kernel module
for Intel SGX (https://github.com/01org/linux-sgx-driver),
which we instrumented to record the operational costs. Based on
our understanding of the system, we derived different
benchmarks testing various features of the platform.
Common to all experiments is the observation that
running more iterations did not yield a lower deviation. This
may be attributed to noise generated by the rest of the
system. This noise is subtle, but significant since we
are measuring at fine-grained time intervals.
Note that the current iteration of SGX prohibits
use of the RDTSC instruction inside of enclaves, and
as such no native timing facilities are available
inside enclaves. A later release reveals that the
updated specification for SGX version 1 does support
RDTSC inside enclaves, and hints suggest that this
capability might be distributable by means of a
microcode update. We were, however, unsuccessful
in obtaining this update. Time measurements
performed throughout our experiments must therefore
exit the enclave before being captured. As a consequence,
we can only measure the total time taken between
entering and exiting an enclave, described as the
sequence of events depicted in Figure 1.
[Figure 1: Sequence of events involved in measuring time spent inside enclaves. The application enters the enclave via enclave_ecall(); the enclave issues t0_ocall() and t1_ocall(), each invoking get_time() outside the enclave, and Δt is the interval between the two timestamps.]
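A minimal host-side sketch of this measurement pattern might look as follows; the OCALL names mirror Figure 1, the ECALL/OCALL stubs generated by the SDK are assumed, and all timestamps are taken outside the enclave, since RDTSC cannot be used inside SGX v1 enclaves.

```c
#include <stdint.h>
#include <time.h>

static uint64_t t0_ns, t1_ns;

/* Host clock, read outside the enclave. */
static uint64_t get_time(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* OCALLs issued by the enclave right after entry and just before exit. */
void t0_ocall(void) { t0_ns = get_time(); }
void t1_ocall(void) { t1_ns = get_time(); }

/* After the ECALL returns, t1_ns - t0_ns approximates the interval
   between the two OCALLs, as depicted in Figure 1. */
```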
3.1 Entry and Exit Costs
In our first experiments, we look at the cost of enter-
ing and leaving an enclave. Understanding this cost
is important, as it dictates how SGX-enabled SaaS services
can partition their functionality between enclaved
and non-enclaved execution to minimize TCB size. A
prohibitively large cost of entry would necessitate a
reduction in the number of entry calls, and thus increase
the amount of code and data residing inside
the enclave, enlarging the required TCB. The extreme
case is a full library OS that includes almost
all the functionality an application requires within the
enclave (Baumann et al., 2014). The Intel Software
Developer Manual (https://software.intel.com/en-us/articles/intel-sdm)
suggests that the cost of entering
an enclave is also a function of the size of the data
copied into the enclave as part of the entry. Thus,
if experiments show that the cost of entering large
amounts of data into the enclave is prohibitively large,
only data requiring confidentiality should be copied into
the enclave.
[Figure 2: Enclave transition cost as a function of buffer size; time in ms versus buffer size in bytes, from 0 to 458,752.]
Figure 2 shows the measured cost as a function
of increasing buffer sizes. As shown in the figure,
the cost of transitioning into enclaves increases linearly
with the buffer size. This experiment only uses
buffers as parameters while transitioning into the enclave.
To be able to host the buffer inside the enclave,
its heap size must be sufficiently large. The observed
baseline cost with no buffer is the bare transition cost
for entering enclaves. This baseline quickly becomes
insignificant as the buffer size increases. The linear
growth is expected, as the transition cost includes
copying the buffers into enclave memory, which invokes
the MEE for all memory written to the enclave. To our
surprise, however, we observed that the baseline cost
only increased above 64 KB. One possible explanation
for this is that the pages may already be present in
EPC memory for buffer sizes smaller than 64 KB.
For larger buffers, the increased cost can also be
attributed to page faults caused by enclave memory
previously evicted to DRAM. This issue is further explored
in the next experiment.
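A sketch of how such a measurement can be structured is shown below. ecall_copy_in() stands in for an ECALL declared with an [in, size=len] buffer parameter (as in the EDL sketch in Section 2); the function name and iteration count are illustrative.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sgx_urts.h>

/* Hypothetical stub generated from an EDL declaration such as:
   public void ecall_copy_in([in, size=len] const uint8_t *buf, size_t len); */
extern sgx_status_t ecall_copy_in(sgx_enclave_id_t eid,
                                  const uint8_t *buf, size_t len);

static void bench_transitions(sgx_enclave_id_t eid) {
    for (size_t len = 0; len <= 458752; len += 65536) {
        uint8_t *buf = malloc(len ? len : 1);   /* dummy input buffer */
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 0; i < 1000; i++)          /* repeat to amortize noise */
            ecall_copy_in(eid, buf, len);
        clock_gettime(CLOCK_MONOTONIC, &b);
        double ns = (double)(b.tv_sec - a.tv_sec) * 1e9
                  + (double)(b.tv_nsec - a.tv_nsec);
        printf("%zu bytes: %.0f ns per transition\n", len, ns / 1000.0);
        free(buf);
    }
}
```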
3.2 Paging
Another probable architectural trade-off follows from the
logical assumption that an increase in TCB would reduce
enclave transitions but require more PRM. As mentioned
in Section 2.1, the PRM is a very limited resource
in comparison to regular DRAM; the system
has a total of 128 MB of it. Moreover, any enclave
is subject to the system software evicting EPC
pages when PRM resources become scarce. Any sys-
tem using SGX should factor in the cost of swapping
pages between PRM and regular DRAM. Figure 3 il-
lustrates this cost in enclaves as observed by both the
kernel and the user level enclave.
The y-axis is the discrete cost in nanoseconds,
while the x-axis is the time elapsed in the experiment.
We instrumented the OS kernel driver to measure the
time taken to evict pages out of EPC into DRAM de-
noted by red dots, as well as the total time spent inside
the page fault handler, shown by the black line.
[Figure 3: Paging overhead in nanoseconds as a function of time elapsed while writing sequentially to enclave memory; plotted series: user-level memory access, EPC kernel page fault, and EPC kernel page eviction events.]
The green line denotes user level instrumentation
measuring the time it takes to write to a particular
address in EPC memory. As in the prior experiment,
we are prohibited from making timing measurements
inside enclaves. Therefore, the user level mea-
surements include the baseline cost of entry and exit
of an enclave, notably with 4 byte buffers transition-
ing each way.
To induce enclave page faults, we set the total enclave
heap size to 256 MB, which is larger than the total
amount of available EPC memory. Furthermore, to
hit each page, we write sequentially to each address
of every 4 KB page along the allocated memory
space inside the enclave. As mentioned in Section 2,
the only time enclave memory can be allocated is
before the EINIT instruction is invoked, by issuing
EADD for each page. Therefore, all memory
must be allocated before enclave execution begins.
At the beginning of the experiment, we clearly see
an increase in page faults, occurring when trying to fit
256 MB of enclave memory into at most 128 MB
of physical EPC memory.
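The enclave-side write loop of this experiment can be sketched as follows, assuming the enclave's heap limit has been configured to 256 MB at build time (since, as noted, all EPC pages must be added before EINIT); ecall_touch_pages() is an illustrative name.

```c
#include <stddef.h>
#include <stdlib.h>

#define HEAP_BYTES (256u * 1024 * 1024)    /* exceeds the 128 MB EPC */

/* Hypothetical ECALL body (enclave side). The SDK's in-enclave malloc()
   draws from the enclave heap, whose maximum size is fixed before EINIT. */
void ecall_touch_pages(void) {
    unsigned char *mem = malloc(HEAP_BYTES);
    if (mem == NULL)
        return;                            /* heap limit set too low */
    for (size_t off = 0; off < HEAP_BYTES; off++)
        mem[off] = (unsigned char)off;     /* sequential writes hit every
                                              address of every 4 KB page */
    free(mem);
}
```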
Correlating the different events happening at user
level and kernel level, we observe a strong relation-
ship between eviction events and increase in write
time at user level. One property of the system that
might increase this cost is the fact that evicting pages
causes AEX events for any logical processor execut-
ing within an enclave, as explained in Section 2.
We also observe, judging by the number of page
faults occurring during execution, that the kernel
driver is operating very conservatively in terms of
assigning EPC pages to enclaves. Moreover, as mentioned in
Section 2, the 12 lower bits of the virtual page fault
address are cleared by SGX before trapping down to
the page fault handler. Therefore, the driver is not
able to make any assumptions about memory access
patterns inside enclaves. Furthermore, as Section 2 explains,
liveness challenge data might also be evicted
from EPC memory, causing a cascade of page loads to
occur from DRAM. It is worth noting that our experiment
only uses one thread, and that the IPIs issued by
page evictions therefore only interrupt this single thread.
It is clear that high performance applications
might want to tune the OS support for paging to their
needs. If an application can predict a specific ac-
cess pattern, the kernel paging support should adapt
to this. Moreover, by optimizing towards exhaustive
use of the EPC memory, applications running inside
enclaves might be subject to fewer page faults.
Furthermore, initial setup will keep large amounts
of the enclave in memory, which might eliminate the
overhead of paging for some enclaves. This also reduces
the overhead caused by IPIs triggering AEXs in
the given enclave. Initially, the creation of large enclaves
triggers memory allocations by the kernel, and it might
be necessary for application developers to offset this
initial cost by pre-provisioning enclaves.

[Figure 4: Latency as a function of the number of enclaves created simultaneously, for enclave sizes of 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, and 4 MB.]
3.3 Enclave Provisioning
Modern distributed system architectures increasingly
rely on modular programming paradigms and multi-component
software with possibly differing trust domains.
Such distributed systems often consist of several
third-party open-source components, both trusted
and untrusted. Moreover, separating the units of failure
and trust in such systems is often a good idea.
SGX supports the creation of multiple mutually
distrusting enclaves which can be used in such a mod-
ular design. As mentioned in Section 2, the SGX
programming model allows enclaves to communicate
with the outside using well-defined interfaces, which
lends itself to an architecture where trust is compartmentalized
into separate enclaves. Figure 4 illustrates
the additional cost, in terms of provisioning latency,
as a function of the number of enclaves created simultaneously;
we can clearly observe that the added cost of enclave
creation increases linearly. Through multiple
iterations of this experiment, we observe the added
cost of increasing enclave sizes. As demonstrated,
this added cost becomes increasingly significant when
provisioning multiple enclaves exceeding 256 KB in
size. As mentioned in Section 2, enclaves are created
by issuing specially crafted instructions for each page of
code and data being allocated inside enclave memory.
It is worth mentioning that we observed a significant
number of page faults occurring during enclave creation,
and it is reasonable to assume that this also
contributes to the cost. Furthermore, the observations
made about entry cost for buffer sizes less than
64 KB, shown in Figure 2, are further corroborated by
the fact that for enclave sizes less than 64 KB, the
provisioning costs are nearly identical.
For application software requiring low-latency operation,
it might be necessary to pre-provision enclaves
to offset this latency cost. However, this
approach might cause additional problems with collocating
enclaves in EPC memory if the individual enclaves
are sufficiently large.
4 DISCUSSION
From our experiments in Section 3, we have identified
several important performance idiosyncrasies of SGX
that should be considered when constructing SGX-enabled
cloud services: the cost of entering and exiting
enclaves, the cost of data copying, the cost of
provisioning new enclaves, and the cost of memory usage.
As mentioned in Section 2, entry and exit procedures
do similar amounts of work and have similar costs. As
our experiments show, the most significant cost factor
of transitioning is the size of the buffer passed as an
argument to the transition, whether entering or exiting.
In particular, we observed a steep rise in data copy cost
when buffer sizes exceed 64 KB. Our recommendation
is therefore that:
Recommendation 1. Applications should partition
their functional components to minimize data copied
across enclave boundaries.
One possible component architecture that follows
the guideline of Recommendation 1 would be to co-
locate all functionality into one single enclave, mak-
ing it largely self-sufficient. An example of a system
following such an approach is Haven (Baumann et al.,
2014), which reduces the interface between trusted
and untrusted code by co-locating a larger part of
the system software stack inside a single enclave by
means of a library OS. The efficiency of this ap-
proach, however, directly contradicts the observation
we made in Section 3.2, where we measured the over-
head associated with enclave memory being paged
in and out to regular DRAM. Because the EPC is a
scarce resource, system software aggressively pages
out enclave memory not being used. However, as our
experiments show, the page fault handler is overeager
and fails to utilize EPC memory exhaustively.
Because of security concerns, the kernel is not
given the exact faulting address of each enclave page
fault, and can therefore not make any assumptions
about the memory access patterns. We therefore recommend
that:
Recommendation 2. The size of an enclave should
not exceed 64 KB.
Recommendation 3. Prior knowledge about an
application's memory consumption and access patterns
should be used to modify the SGX kernel module in
order to reduce memory page evictions.
As our experiments show, enclave creation is
costly and time-consuming. To hide some of this cost,
the underlying OS can pre-provision enclaves whenever
usage patterns can be predicted. However, once
used, an enclave might be tainted with secret data.
Recycling used enclaves through a common pool can
therefore potentially leak secrets from one process to the
next, invalidating the isolation guarantees. We therefore
recommend that:
Recommendation 4. Application authors that can
accurately predict before-the-fact usage of enclaves
should pre-provision enclaves in a disposable pool of
resources that guarantees no reuse between isolation
domains.
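A minimal sketch of such a disposable pool, built on the SDK's untrusted runtime, is shown below; the pool size, image name, and function names are illustrative. The key design choice is that pool_release() destroys the enclave rather than recycling it, so no state can survive into another isolation domain.

```c
#include <sgx_urts.h>

#define POOL_SIZE 8

static sgx_enclave_id_t pool[POOL_SIZE];
static int next_free = 0;

/* Pre-provision POOL_SIZE enclaves up front to hide creation latency. */
int pool_init(const char *image) {
    sgx_launch_token_t token = {0};
    int updated = 0;
    for (int i = 0; i < POOL_SIZE; i++)
        if (sgx_create_enclave(image, SGX_DEBUG_FLAG, &token, &updated,
                               &pool[i], NULL) != SGX_SUCCESS)
            return -1;
    return 0;
}

/* Hand out a fresh, never-used enclave; returns 0 on success. */
int pool_acquire(sgx_enclave_id_t *eid) {
    if (next_free >= POOL_SIZE)
        return -1;                 /* pool exhausted */
    *eid = pool[next_free++];
    return 0;
}

/* Destroy after use instead of returning it to the pool:
   no reuse across isolation domains. */
void pool_release(sgx_enclave_id_t eid) {
    sgx_destroy_enclave(eid);
}
```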
The cost of enclave creation must also factor in
the added baseline cost of the metadata structures associated
with each enclave. Provisioning an enclave
must at least account for its SECS, one TCS
for each logical core executing inside the enclave, and
one State Save Area (SSA) for each thread performing AEX. (Costan
and Devadas, 2016) explain that, to simplify the implementation,
most of these structures are allocated at
the beginning of an EPC page wholly dedicated to
that instance. Therefore, it is not out of line to consider
an enclave serving 4 logical cores as having 9 pages
(1 SECS + 4 TCS + 4 SSA, or 36 KB) allocated to it, excluding code and data segments.
Applications should consider the added memory
cost of separate enclaves in conjunction with the
relatively small amount of available EPC. Furthermore, to offset
the cost of having multiple enclaves, application
authors should consider security separation on a continuous
scale. Some security models might be content
with role-based isolation, rather than calling for explicit
isolation of each individual user. We therefore
recommend that:
Recommendation 5. Application authors should
carefully consider the granularity of isolation required
for their intended use, as finer granularity
incurs the added cost of enclave creation.
At the time of writing, the only available hardware
supporting SGX is the Skylake generation of Core
chips with SGX version 1. As our experiments show,
paging has a profound impact on performance, and
a natural follow-up would be to measure the perfor-
mance characteristics of the dynamic paging support
proposed in the SGX V2 specifications. However,
as mentioned earlier, Intel has yet to release any information
regarding the arrival of SGX V2-enabled
chips. The imminent 7th generation Kaby Lake chips
do not include support for it, and the earliest likely release
will therefore be as part of Cannon Lake in Q4 2017.
SGX supports attestation of software running on
top of untrusted platforms, by using signed hard-
ware measurements to establish trust between parties.
Attestation can occur either locally, between two distinct
enclaves executing on the same hardware, or remotely
with the help of Intel's attestation service. In the future, it
would be interesting, in light of the large cost of en-
clave transition demonstrated above, to examine the
performance characteristics of a secure channel for
communication between enclaves.
5 RELATED WORK
Several previous works quantify various aspects of
the overhead associated with composite architectures
based on SGX. Haven (Baumann et al., 2014) implements
shielded execution of unmodified legacy applications
by hosting a library OS entirely inside
SGX enclaves. This effort resulted in architectural
changes to the SGX specification to include, among
other things, support for dynamic paging. The proof-of-concept
implementation of Haven is only evaluated
in terms of legacy applications running on top
of the system. Furthermore, Haven was built on a
pre-release, emulated version of SGX, so its performance
evaluation is not directly comparable to real-world
deployments. Overshadow (Chen et al., 2008)
provides capabilities similar to Haven's, but does not
rely on dedicated hardware support.
SCONE (Arnautov et al., 2016) implements support
for secure containers inside SGX enclaves.
The design of SCONE is driven by experiments on
container designs pertaining to the TCB size inside
enclaves, ranging from, at the most extreme, an entire
library OS down to a minimal stub interface to
application libraries. The evaluation of
SCONE is, much like the evaluation of Haven, based
on running legacy applications inside SCONE containers.
While (Arnautov et al., 2016) draw the same
conclusions as (Baumann et al., 2014) with regard to
TCB size versus memory usage and enclave transition
cost, they do not quantify this cost. Despite this,
SCONE supplies a solution to the entry/exit problem
we outline in Section 3: threads are pinned
inside enclaves and do not transition to the outside.
Rather, communication happens by means of the enclave
threads writing to a dedicated queue residing in
regular DRAM memory. This approach is still, however,
vulnerable to threads being evicted from enclaves
by AEXs caused by IPIs issued as part of page faults.
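Conceptually, this exitless interface amounts to a single-producer/single-consumer queue in untrusted memory, as in the sketch below (the field and type names are ours, not SCONE's): the enclave thread enqueues requests without leaving enclave mode, while a host thread dequeues them and issues the actual system calls.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS 64

struct req { uint64_t syscall_no; uint64_t args[3]; };

struct ring {                      /* allocated in untrusted DRAM */
    _Atomic uint32_t head, tail;   /* consumed / produced counters */
    struct req slots[SLOTS];
};

/* Enclave side: enqueue a request without leaving enclave mode.
   Returns -1 if the ring is full. */
static int ring_push(struct ring *r, const struct req *q) {
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t - h == SLOTS)
        return -1;                 /* full; host thread has fallen behind */
    r->slots[t % SLOTS] = *q;
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return 0;
}
```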
(Costan and Devadas, 2016) describe the architecture
of SGX based on prior art, released developer
manuals, and patents. Furthermore, they conduct a
comprehensive security analysis of SGX, falsifying
some of its guarantees by explaining in detail exploitable
vulnerabilities within the architecture. This
work is mostly orthogonal to our efforts; however,
we base much of our knowledge of SGX on this
treatment of the topic. These prior efforts led
(Costan et al., 2016) to implement Sanctum, an
alternative hardware architectural extension providing
many of the same properties as SGX, but targeting
the Rocket RISC-V chip architecture. Sanctum
evaluates its prototype on simulated hardware,
against an insecure baseline without the proposed
security properties. (McKeen et al., 2016) introduce
dynamic paging support to the SGX specifications.
This prototype hardware was not available to us.
Ryoan (Hunt et al., 2016) attempts to solve
the same problems outlined in the introduction by
implementing a distributed sandbox facilitating
untrusted computation on secret data residing on third-party
cloud services. Ryoan proposes a new request-oriented
data model where processing modules are
activated once without persisting the data input to them.
Furthermore, through remote attestation, Ryoan is able to
verify the integrity of sandbox instances and protect
execution. By combining sandboxing techniques with
SGX, Ryoan is able to create a shielding construct
supporting mutual distrust between the application
and the infrastructure. Again, Ryoan is benchmarked
with real-world applications and, just like other
prior work, does not quantify the exact
overhead attributed to SGX primitives. Furthermore,
large parts of its evaluation are conducted in an
SGX emulator based on QEMU, which has been
retrofitted with delays and TLB flushes based upon
real hardware measurements to better mirror real
SGX performance. These hardware measurements
cover the EENTER and EEXIT instructions;
however, they do not account for the cost of moving argument
data into and out of enclave memory. Moreover,
Ryoan speculates on the cost of SGX V2 paging
support, although strictly based on emulated measurements
and assumptions about physical cost.
ARM TrustZone is a hardware security archi-
tecture that can be incorporated into ARMv7-
A, ARMv8-A and ARMv8-M on-chip sys-
tems (Ngabonziza et al., 2016; Shuja et al.,
2016). Although the underlying hardware design,
features, and interfaces differ substantially from SGX's,
both essentially provide the same key concepts of
hardware isolated execution domains and the ability
to bootstrap attested software stacks into those
enclaves. However, the TrustZone hardware can only
distinguish between two execution domains, and
relies on having a software-based trusted execution
environment for any further refinements.
6 CONCLUSION
Online services are increasingly relying on third-party
cloud providers to host sensitive data. This tendency
brings forth strong concerns for the security and pri-
vacy of data owners as cloud providers cannot fully
be trusted to enforce the restrictive usage policies that
often govern such data. Intel SGX provides hardware
support for general trusted computing in commodity
hardware. These extensions to the x86 instruction set
establish trust through remote attestation of code and
data segments provisioned on untrusted infrastructure,
and furthermore guarantee the confidentiality and
integrity of these segments against potentially malicious
system software.
Prior efforts demonstrate the capabilities of SGX
through rigorous systems capable of hosting large
legacy applications securely inside enclaves. These
systems, however, do not quantify the exact cost as-
sociated with using SGX. This paper evaluates the mi-
cro architectural cost of entering and exiting enclaves,
the cost of data copying, the cost of provisioning new
enclaves and the cost of memory usage. From this,
we have derived five recommendations for application
authors wishing to secure their cloud-hosted privacy-sensitive
data using SGX.
ACKNOWLEDGMENTS
This work was supported in part by the Norwegian
Research Council, project number 231687/F20. We
would like to thank the anonymous reviewers for their
useful insights and comments.
REFERENCES
Anati, I., Gueron, S., Johnson, S., and Scarlata, V. (2013).
Innovative technology for CPU based attestation and
sealing. In Proceedings of the 2nd International Workshop
on Hardware and Architectural Support for Security
and Privacy, volume 13.
Arnautov, S., Trach, B., Gregor, F., Knauth, T., Martin, A.,
Priebe, C., Lind, J., Muthukumaran, D., O’Keeffe, D.,
Stillwell, M. L., Goltzsche, D., Eyers, D., Kapitza,
R., Pietzuch, P., and Fetzer, C. (2016). SCONE: Secure
Linux containers with Intel SGX. In 12th USENIX
Symposium on Operating Systems Design and Imple-
mentation (OSDI 16), pages 689–703, GA. USENIX
Association.
Baumann, A., Peinado, M., and Hunt, G. (2014). Shielding
applications from an untrusted cloud with Haven. In
11th USENIX Symposium on Operating Systems De-
sign and Implementation (OSDI ’14). USENIX – Ad-
vanced Computing Systems Association.
Chen, X., Garfinkel, T., Lewis, E. C., Subrahmanyam, P.,
Waldspurger, C. A., Boneh, D., Dwoskin, J., and
Ports, D. R. (2008). Overshadow: A virtualization-
based approach to retrofitting protection in commod-
ity operating systems. In Proc. of the 13th Interna-
tional Conference on Architectural Support for Pro-
gramming Languages and Operating Systems, ASP-
LOS XIII, pages 2–13, New York, NY, USA. ACM.
Costan, V. and Devadas, S. (2016). Intel SGX explained.
Cryptology ePrint Archive.
Costan, V., Lebedev, I., and Devadas, S. (2016). Sanctum:
Minimal hardware extensions for strong software iso-
lation. In USENIX Security, volume 16, pages 857–
874.
Gjerdrum, A. T., Johansen, H. D., and Johansen, D. (2016).
Implementing informed consent as information-flow
policies for secure analytics on eHealth data: Princi-
ples and practices. In Proc. of the IEEE Conference
on Connected Health: Applications, Systems and En-
gineering Technologies: The 1st International Work-
shop on Security, Privacy, and Trustworthiness in
Medical Cyber-Physical System, CHASE ’16. IEEE.
Hunt, T., Zhu, Z., Xu, Y., Peter, S., and Witchel, E.
(2016). Ryoan: A distributed sandbox for untrusted
computation on secret data. In Proceedings of the
12th USENIX Conference on Operating Systems De-
sign and Implementation, OSDI’16, pages 533–549,
Berkeley, CA, USA. USENIX Association.
Johansen, H. D., Birrell, E., Van Renesse, R., Schneider,
F. B., Stenhaug, M., and Johansen, D. (2015). Enforc-
ing privacy policies with meta-code. In Proceedings
of the 6th Asia-Pacific Workshop on Systems, page 16.
ACM.
McKeen, F., Alexandrovich, I., Anati, I., Caspi, D., Johnson,
S., Leslie-Hurd, R., and Rozas, C. (2016). Intel®
Software Guard Extensions (Intel® SGX) support for
dynamic memory management inside an enclave. In
Proceedings of the Hardware and Architectural Support
for Security and Privacy 2016, page 10. ACM.
Ngabonziza, B., Martin, D., Bailey, A., Cho, H., and Mar-
tin, S. (2016). TrustZone explained: Architectural fea-
tures and use cases. In Collaboration and Internet
Computing (CIC), 2016 IEEE 2nd International Con-
ference on, pages 445–451. IEEE.
Osborn, J. D. and Challener, D. C. (2013). Trusted platform
module evolution. Johns Hopkins APL Technical Di-
gest, 32(2):536–543.
Shuja, J., Gani, A., Bilal, K., Khan, A. U. R., Madani, S. A.,
Khan, S. U., and Zomaya, A. Y. (2016). A survey of
mobile device virtualization: taxonomy and state of
the art. ACM Computing Surveys (CSUR), 49(1):1.
TCG Published (2011). TPM main part 1 design principles.
Specification Version 1.2 Revision 116, Trusted Com-
puting Group.