Privacy Preserving Delegated Word Search in the Cloud

Kaoutar Elkhiyaoui, Melek Önen and Refik Molva

EURECOM, Sophia-Antipolis, France

Keywords:

Privacy Preserving Keyword Search, Delegation, Cloud.

Abstract:

In this paper, we address the problem of privacy preserving delegated word search in the cloud. We consider

a scenario where a data owner outsources its data to a cloud server and delegates the search capabilities to a

set of third party users. In the face of semi-honest cloud servers, the data owner does not want to disclose any

information about the outsourced data; yet it still wants to beneﬁt from the highly parallel cloud environment.

In addition, the data owner wants to ensure that delegating the search functionality to third parties does not

allow these third parties to jeopardize the conﬁdentiality of the outsourced data, neither does it prevent the

data owner from efﬁciently revoking the access of these authorized parties. To these ends, we propose a word

search protocol that builds upon techniques of keyed hash functions, oblivious pseudo-random functions and

Cuckoo hashing to construct a searchable index for the outsourced data, and uses private information retrieval

of short information to guarantee that word search queries do not reveal any information about the data to

the cloud server. Moreover, we combine attribute-based encryption and oblivious pseudo-random functions to

achieve an efﬁcient revocation of authorized third parties. The proposed scheme is suitable for the cloud as it

can be easily parallelized.

1 INTRODUCTION

The cloud computing paradigm offers clients the ease

of outsourcing the storage of their massive data with

the advantage of reducing cost and assuring availabil-

ity. Large-scale cloud infrastructures bring up severe

security and privacy issues: apart from traditional security challenges, the outsourced storage of "big data"

raises the challenge of processing it at the cloud in a

secure and privacy preserving manner while consider-

ing the cloud provider itself as a potential adversary.

While data owners (i.e. clients) can simply en-

crypt their data before outsourcing it to the cloud, tra-

ditional conﬁdentiality mechanisms fall short when

it comes to mining/processing the data. Recently,

several solutions have been proposed to allow the

search of words over encrypted data. In this paper

however, we address the problem of delegated word

search whereby in addition to the data owner itself,

some authorized third-parties can perform search op-

erations over private data. In addition to security and

privacy properties that classical search solutions as-

sure under a semi-honest (i.e., honest-but-curious) se-

curity model, a privacy preserving delegated word

search mechanism includes the delegation and revo-

cation operations: The data owner should be able to

remove the search capability of a third party at any

point in time through an efﬁcient revocation mecha-

nism.

We propose a new privacy preserving word search

solution whereby as in (Chor et al., 1997), the data

owner constructs a searchable index with all words

listed in its ﬁles and similarly to (Blass et al., 2012),

it applies a private information retrieval to guaran-

tee that the adversary including the cloud itself does

not discover any information about the search query

and its result. The newly proposed solution out-

performs existing ones thanks to a combination of

Cuckoo hashing with private information retrieval for

the search operation. The use of Cuckoo hashing

helps in assigning one word to a unique position in

the index, thus removing the probability of collisions

within the index: The data owner ﬁrst constructs a

conﬁdential index where each particular element cor-

responds to a unique word and ﬁlls it in with some

private information derived from the actual word.

The search operation consists of the computation of

the position corresponding to the queried word using

Cuckoo hashing, and building the corresponding PIR

query to be sent to the cloud provider.

Moreover, the delegation operation is assured

thanks to the use of attribute based encryption (ABE)


Elkhiyaoui K., Önen M. and Molva R..

Privacy Preserving Delegated Word Search in the Cloud.

DOI: 10.5220/0005054001370150

In Proceedings of the 11th International Conference on Security and Cryptography (SECRYPT-2014), pages 137-150

ISBN: 978-989-758-045-1

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

which only allows users holding certain ”attributes”

to search over the data. For example, when compa-

nies outsource their logs over the cloud, they can al-

low some data protection commissioner to search over

them under an audit operation. Efficient revocation, in turn, is achieved by a combination of ABE and oblivious pseudo-random functions. The revocation

operation does not imply the re-encryption of the out-

sourced data and only requires an update of the access

policy by the data owner which can be considered as

a negligible cost.

The major contributions of the paper can be sum-

marized as follows:

• We propose a new word search protocol which

is based on an efﬁcient word-index construction

thanks to the use of Cuckoo hashing and the trans-

formation of PIR into privacy preserving word

search.

• The newly proposed solution also includes del-

egation and revocation capabilities thanks to the

use of Attribute Based Encryption and Oblivious

Pseudo Random Functions. The revocation oper-

ation does not incur any cost except for the update

of the access policy by the data owner.

• We deﬁne the main privacy requirements and fur-

ther provide a formal analysis of these properties.

Section 2 introduces the generic problem of pri-

vacy preserving delegated word search and the appli-

cation scenario. The different privacy requirements

are formally deﬁned in section 3. The ﬁrst version

of the privacy preserving word search solution is de-

scribed in section 4. The entire solution including the

delegation and revocation operations is presented in

section 5. We analyze the new solution in terms of se-

curity and performance in Sections 6 and 7. Finally,

Section 8 reviews the state of the art.

2 BACKGROUND

We consider a scenario where a data owner outsources

some privacy sensitive data to a cloud server and

wishes to later on perform some operations over it

without revealing any details about the data. The op-

eration we are focusing on is word search over en-

crypted data and in our scenario the data owner may

wish to delegate part of the search operations to au-

thorized third parties. An illustrative example of such

a requirement can be a scenario wherein due to regu-

latory matters, some data (such as logs) still need to

be searchable by third parties such as data protection

commissioners. The three entities involved in a pri-

vacy preserving delegated word search and the main

algorithms are formally deﬁned in the following sec-

tions.

2.1 Entities

A privacy preserving delegated word search involves

the following entities:

• Data Owner. O: It possesses a large file F that it outsources to the cloud server S. Without loss of generality, we assume that the number of distinct words in F is n and the corresponding set is defined as $L_\omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$. Similarly to previous work such as (Curtmola et al., 2006; Blass et al., 2012), we assume that once O outsources a file F, it will no longer modify it.

• Cloud Server. S: It stores an encrypted version of the outsourced file F and a searchable index I of the set $L_\omega$ of "distinct" words present in F.

• Authorized User. U: It has access to a set of

credentials that enable it to perform search queries

on F. This authorized user could be an auditor

which as part of its auditing task has to search the

activity logs of O. We also note that in some cases

an authorized user could correspond to the data

owner that wants to perform word search on its

outsourced data.

2.2 Privacy Preserving Delegated

Word-search

In accordance with the work of (Curtmola et al.,

2006), a privacy preserving delegated word-search

comprises the following algorithms:

• Setup(ζ) → (MK,P ): It is a randomized algo-

rithm that is executed by the data owner O. It

takes as input the security parameter ζ, and out-

puts a master key MK and a set of public parame-

ters P that will be used by subsequent algorithms

to perform the word-search.

• Encrypt(MK,F) → C: This algorithm is run by

O. It has as input the master key MK and the ﬁle

F, and outputs an encryption C of ﬁle F.

• BuildIndex(MK, F) → I: This algorithm has as input the master key MK and a file F and outputs an index I of distinct words $\omega_i$ present in F. This algorithm is generally run by the data owner O.

• Delegate(MK, $St_O$, $id_U$) → $K_U$: This algorithm is executed by O to delegate the search capabilities on its files to some third party user. On input of the master key MK, the current state $St_O$ of O and the identifier $id_U$ of some user U, Delegate outputs a secret key $K_U$ that will be provided to U.

SECRYPT2014-InternationalConferenceonSecurityandCryptography

138

• Token(ω, $St_U$) → τ: This algorithm is executed by authorized users or the data owner O to generate a search token for some word ω. It takes as input the word ω, the current state $St_U$ of authorized user U and the key $K_U$ and outputs a search token τ.

• Query(τ) → Q : It is a randomized algorithm that

is run by authorized users to generate word search

queries. On input of a token τ, Query outputs a

word search query Q that will be forwarded to

cloud server S.

• Response(Q, I) → R: This algorithm is invoked by S whenever S receives a word search query Q. It takes as input Q and the index I and outputs a word search response R.

• Verify(R, $St_U$) → b: It is a deterministic algorithm run by authorized users to verify S's responses. On input of S's response R and the current state $St_U$ of authorized user U, Verify outputs a bit b = 1 if ω ∈ F and b = 0 otherwise.

• Revoke(MK, $St_O$, $id_U$) → ($St'_O$, $St'_S$): This algorithm is run by the data owner O to revoke the access of previously authorized users. It has as input the master key MK, the current state $St_O$ of data owner O and the identifier $id_U$ of some previously authorized user U, and it outputs an updated state $St'_O$ for O and an updated state $St'_S$ for cloud server S.
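To make the division of labour between the three entities easier to follow, the nine algorithms listed above can be written down as an abstract interface. This is only an illustrative sketch: the class name `DelegatedWordSearch`, the method names and the Python types are assumptions introduced here, not part of the paper's formalism.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class DelegatedWordSearch(ABC):
    """Hypothetical interface mirroring the nine algorithms of Section 2.2."""

    # --- run by the data owner O ---
    @abstractmethod
    def setup(self, security_parameter: int) -> Tuple[Any, Any]:
        """Return (MK, P): master key and public parameters."""

    @abstractmethod
    def encrypt(self, mk: Any, file: bytes) -> bytes:
        """Return the encryption C of file F."""

    @abstractmethod
    def build_index(self, mk: Any, file: bytes) -> Any:
        """Return the searchable index I of the distinct words of F."""

    @abstractmethod
    def delegate(self, mk: Any, state_owner: Any, user_id: str) -> Any:
        """Return the secret key K_U handed to user U."""

    @abstractmethod
    def revoke(self, mk: Any, state_owner: Any, user_id: str) -> Tuple[Any, Any]:
        """Return the updated states (St'_O, St'_S)."""

    # --- run by an authorized user U (or by O itself) ---
    @abstractmethod
    def token(self, word: str, state_user: Any) -> Any:
        """Return the search token tau for a word."""

    @abstractmethod
    def query(self, token: Any) -> Any:
        """Return a (randomized) word search query Q."""

    @abstractmethod
    def verify(self, response: Any, state_user: Any) -> bool:
        """Return True iff the searched word is in F."""

    # --- run by the cloud server S ---
    @abstractmethod
    def response(self, query: Any, index: Any) -> Any:
        """Return the word search response R."""
```

A concrete scheme would subclass this interface; the grouping by comments shows which party is expected to call each method.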

3 ADVERSARY MODEL

The crucial privacy challenge to address when design-

ing a privacy preserving delegated word search is as-

suring privacy against a misbehaving cloud server. In-

deed, the cloud server may attempt to infer sensitive

information about the outsourced ﬁles (and their own-

ers thereof) from the ciphertexts and indexes it keeps.

It may also try to derive information about those ﬁles

from the word search queries it processes. Thus, it

is of utmost importance to ensure that the ciphertexts

and the indexes that the cloud stores together with the

word search queries it processes do not leak any in-

formation about the data owners’ ﬁles.

Furthermore, the delegation of search capabilities

to third party users inherently raises the requirements

of access authorization and revocation, and therewith

the requirement of privacy against revoked users. For

example, a previously authorized user may exploit the information it collected from the word searches it performed while it was still authorized in order to conduct lookup operations after its revocation, and thereby learn new information about the outsourced files.

Therefore, one should ensure that even if revoked

users can still issue valid search queries to the cloud

server, they should not be able to decode the cloud

server’s responses.

Along these lines, we provide in the subsequent

sections formal models for the notions of both pri-

vacy against cloud servers and privacy against re-

voked users, which we will employ to assess the se-

curity of our scheme in the appendix of this paper.

Of course, solutions protected against misbehaving

clouds and revoked users are inherently secure against

any other type of external adversaries.

3.1 Privacy Against Cloud Server

In accordance with the work of (Blass et al., 2012)

and (Curtmola et al., 2006), we assume that the cloud

server S is semi-honest: Although interested in dis-

covering the content of the data and the queries, S

still performs all the required operations correctly.

A privacy preserving delegated word search

should ensure that the semi-honest cloud server S

does not discover any information about the content

of an outsourced ﬁle from either its encryption or its

index. This means that in addition to not being able

to break the conﬁdentiality of the outsourced data, S

should neither be able to mount statistical attacks on

the outsourced ﬁles (e.g. occurrence of words) nor to

tell whether two ﬁles contain (or do not contain) the

same words. In compliance with the work of (Blass

et al., 2012), we refer to this requirement as storage

privacy. Moreover, a solution for privacy preserv-

ing delegated word search should as well guarantee

query privacy: during the lookup phase, cloud server

S should not be able to derive any useful informa-

tion about the queries of authorized users. Namely,

S should not be able to tell whether any two word

search queries were issued for the same word or not

(cf. (Blass et al., 2012)).

To formally capture the adversarial capabilities of

S in the subsequent privacy deﬁnitions, we assume

that S is given access to the following oracles:

• $O_{encrypt}(F, MK) \to C$: This oracle takes a file F and the master key MK of some data owner O as inputs and computes an encryption C of file F by calling the algorithm Encrypt.

• $O_{index}(F, MK) \to I$: On inputs of file F and the master key MK, this oracle executes the algorithm BuildIndex and returns the index I associated with file F.

• $O_{search,s}(I, \omega) \to view_s$: Cloud server S invokes this oracle whenever it wants to receive and process a word search query. On inputs of index I and word ω, this oracle starts an execution of the word search protocol with cloud server S to check whether ω is in I or not. At the end of the word search operation, $O_{search,s}$ returns the view $view_s = (St_S, rand_S, M_{1,s}, M_{2,s}, \ldots, M_{l,s})$ of cloud server S during the word search, where $St_S$ is the current state of cloud server S, $rand_S$ is its internal randomness that it used to generate its word search response and $M_{i,s}$ is the $i$-th message that S received during the word search from oracle $O_{search,s}$.

Algorithm 1: Learning phase of the storage privacy game.

    // S calls oracles $O_{encrypt}$ and $O_{index}$ a polynomial number of times
    $F_i \leftarrow S$;
    $C_i \leftarrow O_{encrypt}(F_i, MK)$;
    $I_i \leftarrow O_{index}(F_i, MK)$;
    // S returns a challenge word
    $\omega^* \leftarrow S$;

Algorithm 2: Challenge phase of the storage privacy game.

    // Let $F^*_0$ and $F^*_1$ be two files s.t. $F^*_1$ contains $\omega^*$ while $F^*_0$ does not
    $b \leftarrow \{0,1\}$;
    $C^*_b \leftarrow O_{encrypt}(F^*_b, MK)$;
    $I^*_b \leftarrow O_{index}(F^*_b, MK)$;
    $b^* \leftarrow S$;

3.1.1 Storage Privacy

We deﬁne storage privacy using an

indistinguishability-based game that comprises

two phases: A learning phase (cf. Algorithm 1) and a

challenge phase (cf. Algorithm 2). The goal of cloud

server S in this game is to tell whether a challenge file $F^*_b$ contains some word $\omega^*$. To this effect, cloud server S calls the oracles $O_{encrypt}$ and $O_{index}$ a polynomial number of times in the learning phase. By the end of this phase, S outputs a challenge word $\omega^*$. Let $F^*_0$ and $F^*_1$ be two files such that $F^*_1$ contains $\omega^*$ while $F^*_0$ does not.

Now in the challenge phase, cloud server S is provided with the encryption $C^*_b$ and the index $I^*_b$ of file $F^*_b$, where b is picked randomly from {0,1}. At the end of the challenge phase, S outputs its guess $b^*$ for the bit b. We say that S succeeds in the storage privacy game if $b = b^*$.

Definition 1. [Storage Privacy] Let $\Pi_{success}$ denote the probability that S succeeds in the storage privacy game. We say that a word search protocol assures storage privacy, iff for any cloud server S, $\Pi_{success} \leq \frac{1}{2} + \epsilon$, where ε is a negligible function in the security parameter ζ.

Algorithm 3: Learning phase of the query privacy game.

    // S calls oracles $O_{encrypt}$, $O_{index}$, and $O_{search,s}$ a polynomial number of times
    $(F_i, \omega_i) \leftarrow S$;
    $C_i \leftarrow O_{encrypt}(F_i, MK)$;
    $I_i \leftarrow O_{index}(F_i, MK)$;
    $view_{s,i} \leftarrow O_{search,s}(I_i, \omega_i)$;
    // S outputs a challenge file $F^*$ and two distinct words $\omega^*_0$ and $\omega^*_1$
    $(F^*, \omega^*_0, \omega^*_1) \leftarrow S$;

Algorithm 4: Challenge phase of the query privacy game.

    $C^* \leftarrow O_{encrypt}(F^*, MK)$;
    $I^* \leftarrow O_{index}(F^*, MK)$;
    $b \leftarrow \{0,1\}$;
    $view^*_s \leftarrow O_{search,s}(I^*, \omega^*_b)$;
    $b^* \leftarrow S$;

3.1.2 Query Privacy

Similarly to storage privacy, we formalize query pri-

vacy through an indistinguishability-based game that

runs in two phases: A learning phase and a challenge

phase. In the learning phase, as depicted in Algorithm 3, cloud server S adaptively picks a polynomial number of file and word pairs $(F_i, \omega_i)$. For each selected pair $(F_i, \omega_i)$, S first calls the oracles $O_{encrypt}$ and $O_{index}$ to encrypt $F_i$ and build the corresponding index $I_i$ respectively, then it queries the oracle $O_{search,s}$ to receive and process a search query for word $\omega_i$. At the end of the learning phase, S outputs a challenge file $F^*$ and two challenge words $\omega^*_0$ and $\omega^*_1$.

In the challenge phase (cf. Algorithm 4), cloud server S queries the oracles $O_{encrypt}$ and $O_{index}$ which provide S with the encryption and the index of the challenge file $F^*$ respectively. Then, the oracle $O_{search,s}$ executes an instance of the word search protocol for word $\omega^*_b$ with S, where b is a randomly selected bit. Finally, S outputs its guess $b^*$ for the bit b. We say that S succeeds in the query privacy game if $b = b^*$.

Definition 2. Let $\Pi_{success}$ denote the probability that S succeeds in the query privacy game. We say that a word search protocol ensures query privacy, iff for any cloud server S, $\Pi_{success} \leq \frac{1}{2} + \epsilon$, where ε is a negligible function in the security parameter ζ.
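Both the storage privacy and the query privacy games have the same shape: a hidden bit b is drawn, the adversary receives a challenge that depends on b, and it wins if its guess equals b. The sketch below is a toy harness for that shape only. The two challenge functions are hypothetical stand-ins (one deliberately leaks b, the other is independent of b) and do not model the actual protocol; they merely illustrate why success is measured against the 1/2 baseline.

```python
import secrets
from typing import Callable


def run_game(challenge: Callable[[int], bytes],
             guess: Callable[[bytes], int],
             trials: int = 4000) -> float:
    """Estimate the adversary's success probability Pr[b* = b]."""
    wins = 0
    for _ in range(trials):
        b = secrets.randbelow(2)          # challenger's hidden bit
        b_star = guess(challenge(b))      # adversary's guess from its view
        wins += int(b_star == b)
    return wins / trials


def leaky_challenge(b: int) -> bytes:
    """Toy 'scheme' whose challenge reveals b outright (insecure)."""
    return bytes([b])


def hiding_challenge(b: int) -> bytes:
    """Toy 'scheme' whose challenge is independent of b."""
    return secrets.token_bytes(1)


def read_low_bit(view: bytes) -> int:
    """Adversary strategy: output the low bit of what it sees."""
    return view[0] & 1


if __name__ == "__main__":
    print("leaky :", run_game(leaky_challenge, read_low_bit))
    print("hiding:", run_game(hiding_challenge, read_low_bit))
```

Against the leaky challenge the adversary always wins, whereas against the hiding challenge its empirical success rate stays close to 1/2, which is the situation the two definitions require up to a negligible ε.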

3.2 Privacy Against Revoked Users

(”Forward Privacy”)

Ideally, a privacy preserving delegated word search

should assure that when an authorized user is revoked,

it can no longer look for words in the cloud server’s

ﬁles (this does not imply that the revoked user can-

not query the server’s database, rather it means that

it cannot successfully interpret the cloud server’s re-

sponses). In other words, a privacy preserving dele-

gated word search should make sure that even if a re-

voked user is able to issue word search queries, it can-

not infer any new information about the outsourced

ﬁles that it did not learn before its revocation. This

requirement resembles the notion of forward secrecy

whereby a user cannot have access to any data after its

revocation. In the context of word search in addition

to the content of the data, the revoked user should not

infer any additional information from future queries

as well.

Since in this paper we only focus on static data

(i.e. the data owner does not update its ﬁle once out-

sourced to the cloud server), we argue that the above

intuition can be captured by assuring that revoked

users cannot look up a word for which they did not

issue a search query when they were still authorized.

Without loss of generality, we assume that there

is a data owner O that outsources its ﬁle F and the

corresponding index I to cloud server S , and that a

user U is interested in searching the ﬁle F even after

its revocation. To this effect, U may behave mali-

ciously during the execution of the word search pro-

tocol. Namely, U may provide bogus word search

queries to cloud server S.

In order to formalize privacy against revoked

users, we use a privacy game that similarly to the two

previous games consists of a learning and a challenge

phase. In addition to the oracles $O_{encrypt}$ and $O_{index}$, user U has access to the following oracles.

• $O_{delegate}(MK) \to K_U$: On input of the data owner O's master key MK, the oracle $O_{delegate}$ executes the algorithm Delegate to allow U to perform word search on O's file F and outputs the secret key $K_U$.

• $O_{revoke}$: This oracle revokes the right of U to search the file F by executing the algorithm Revoke which updates the states of data owner O and cloud server S.

• $O_{search,u}(I, \omega) \to view_u$: U calls this oracle whenever it wants to perform a word search on the index I. It takes as input an index I and a word ω and outputs the view $view_u = (St_U, rand_U, M_{1,u}, M_{2,u}, \ldots, M_{l',u})$ of user U during the word search, where $St_U$ is the current state of user U and $rand_U$ is its internal randomness that it used to generate its word search query, whereas $M_{i,u}$ corresponds to the $i$-th message that U received from $O_{search,u}$ during the word search.

• $O_{chal,u}(I, \omega) \to chal_{u,b}$: When called with an index I and word ω, this oracle flips a random coin b ∈ {0,1}. If b = 1, then $O_{chal,u}$ returns the actual view $chal_{u,1} = view_u = (St_U, rand_U, M_{1,u}, M_{2,u}, \ldots, M_{l',u})$ of user U during the word search for ω, such that $St_U$ is the current state of user U and $rand_U$ is its internal randomness, whereas $M_{i,u}$ corresponds to the $i$-th message that U received from $O_{search,u}$ during the word search. If b = 0, then $O_{chal,u}$ outputs $chal_{u,0} = (St_U, rand_U, M_{1,u}, M_{2,u}, \ldots, M_{l',u})$, where $St_U$ is the current state of user U and $rand_U$ is its internal randomness, and the $M_{i,u}$ are generated randomly by $O_{chal,u}$.

Once user U enters the learning phase of the privacy game (see Algorithm 5), it first calls the oracle $O_{index}$ with a file F of its choosing to get the corresponding index I. Next user U invokes the oracle $O_{delegate}$ which supplies U with the secret key $K_U$. This key will enable U to execute the word search protocol with cloud server S on the index I and therewith on file F. Then user U queries the oracle $O_{search,u}$ for a polynomial number of words $\omega_i$ of its choosing. Next, the oracle $O_{revoke}$ revokes U. After the revocation, U can still issue a polynomial number of word search queries on file F by calling $O_{search,u}$. Finally, U outputs a challenge word $\omega^*$ that is not present in file F.

In the challenge phase (see Algorithm 6), U queries the oracle $O_{chal,u}$ with the word $\omega^*$ and the index $I^*$ that corresponds to $F \cup \{\omega^*\}$. The oracle $O_{chal,u}$ in turn flips a random coin b ∈ {0,1} and outputs the challenge view $chal^*_{u,b}$. At the end of the challenge phase, revoked user U outputs a guess $b^*$ for bit b.

We say that U succeeds in the game of privacy against revoked users if i.) $b = b^*$ and if ii.) U did not issue a search query for the challenge word $\omega^*$ before calling the oracle $O_{revoke}$ (i.e. $\omega^* \neq \omega_i$, $\forall i$).

Definition 3. Let $\Pi_{success}$ denote the probability that U succeeds in the privacy game against revoked users. We say that a delegated word search mechanism provides privacy against revoked users iff for any revoked user U, $\Pi_{success} \leq \frac{1}{2} + \epsilon$, where ε is a negligible function in the security parameter ζ.

Algorithm 5: Learning phase of the privacy game against revoked users.

    $I \leftarrow O_{index}(F, MK)$;
    $K_U \leftarrow O_{delegate}(I)$;
    // U calls $O_{search,u}$ for a polynomial number of times
    $\omega_i \leftarrow U$;
    $view_{u,i} \leftarrow O_{search,u}(I, \omega_i)$;
    $O_{revoke}(U)$;
    // U calls $O_{search,u}$ for a polynomial number of times after revocation
    $\omega'_i \leftarrow U$;
    $view'_{u,i} \leftarrow O_{search,u}(I, \omega'_i)$;
    // U returns a challenge word that is not in file F
    $\omega^* \leftarrow U$;

Algorithm 6: Challenge phase of the privacy game against revoked users.

    $I^* \leftarrow O_{index}(F \cup \{\omega^*\}, MK)$;
    $chal^*_{u,b} \leftarrow O_{chal,u}(I^*, \omega^*)$;
    $b^* \leftarrow U$;

4 PRIVACY PRESERVING WORD SEARCH

In this section, we describe the ﬁrst version of the

proposed word search solution which does not offer

any delegation capabilities and therefore only assures

privacy against honest-but-curious cloud providers.

Similarly to (Chor et al., 1997; Blass et al., 2012),

to assure query privacy against a semi-honest cloud

server, we rely on Private Information Retrieval (PIR)

to build our word-search scheme. Actually, PIR al-

lows a user to retrieve a data block from a server’s

database without disclosing any information about the

sought block. However, PIR protocols assume that

the user knows beforehand the position in the database

of the data block to be retrieved, and therefore, they

cannot be used directly in privacy preserving word

search wherein a user only holds a list of words to

look for. Fortunately, (Chor et al., 1997) proposed a

technique that transforms any PIR mechanism into a

protocol for private information retrieval by keyword,

and thereby, into a privacy preserving word-search.

The main idea is to ﬁrst construct an index of all the

distinct words present in the outsourced data and then

apply a PIR to this index. As shown in (Chor et al.,

1997), this can be achieved by representing the index

by a hash-table that maps each word to a unique po-

sition in the table. During the search phase, the user

ﬁrst computes the position of the requested word in

the hashtable (i.e. the index) and further runs PIR

to fetch the block stored at that position. While the

construction of (Chor et al., 1997) can be easily trans-

formed into a privacy preserving word search, we be-

lieve that it can be further optimized by using Cuckoo

hashing to build the hashtables (i.e. the indexes) of

the words in the outsourced ﬁles.

Along these lines, we ﬁrst formalize and describe

the PIR and the Cuckoo hashing algorithms that will

underpin our word search solution.

4.1 Building Blocks

4.1.1 Trapdoor Private Information Retrieval

For efﬁciency purposes, we opt for a PIR mechanism

called trapdoor PIR which was proposed by (Trostle

and Parrish, 2010), and whose security is based on the

trapdoor group assumption. We stress however that

this particular PIR can be interchanged by any other

efﬁcient PIR algorithm.

In compliance with the work of (Trostle and Par-

rish, 2010), we model the server’s database on which

private information retrieval is performed by a binary

(k,l)-matrix M. Trapdoor PIR allows a user to retrieve the bit b at position (x,y) in M as follows:

• PIRQuery(x) → $\vec{\alpha}$: The user picks a secret large number p (typically |p| = 200 bits) and selects randomly $u \in \mathbb{Z}_p^*$ and k other values $a_i \in \mathbb{Z}_p$. Next, it computes the k following values: $e_x = 1 + 2 \cdot a_x$ and $\forall i \neq x$, $e_i = 2 \cdot a_i$, and sends the vector $\vec{\alpha} = (\alpha_i)_{i=1}^{k} = (u \cdot e_i \bmod p)_{i=1}^{k}$ to the cloud.

• PIRResponse($\vec{\alpha}$, M) → $\vec{\beta}$: On receiving $\vec{\alpha}$, the server computes the matrix product $\vec{\beta} = (\beta_1, \beta_2, \ldots, \beta_l) = \vec{\alpha} \cdot M$.

• PIRAnalysis($\vec{\beta}$, y) → b: After receiving the server's response $\vec{\beta} = (\beta_1, \beta_2, \ldots, \beta_l)$, the user computes $\gamma_y = \beta_y \cdot u^{-1} \bmod p$, and retrieves b by computing $\gamma_y \bmod 2$.
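The three steps above can be traced with a small sketch. It is a toy illustration under stated assumptions, not the reference implementation of (Trostle and Parrish, 2010): rows are indexed from 0, p is taken as a random odd 200-bit integer with u chosen coprime to it, and the values a_i are drawn from a bounded range (below p/(4k)) so that the integer sum of the selected e_i stays below p and the final parity is not disturbed by a modular wrap-around. The function names and the explicit passing of u and p to the analysis step are choices made here for readability.

```python
import math
import secrets
from typing import List, Tuple

P_BITS = 200  # size of the secret modulus, as suggested in the text


def pir_query(x: int, k: int) -> Tuple[List[int], int, int]:
    """Build the query vector alpha for row x of a k-row matrix.

    Returns (alpha, u, p); u and p remain secret on the user side.
    """
    p = secrets.randbits(P_BITS) | (1 << (P_BITS - 1)) | 1   # odd, 200 bits
    while True:
        u = secrets.randbelow(p - 1) + 1
        if math.gcd(u, p) == 1:          # u must be invertible modulo p
            break
    bound = p // (4 * k)                 # assumption: keeps sum(e_i) < p
    a = [secrets.randbelow(bound - 1) + 1 for _ in range(k)]
    # e_x is odd, every other e_i is even
    e = [2 * a_i + (1 if i == x else 0) for i, a_i in enumerate(a)]
    alpha = [(u * e_i) % p for e_i in e]
    return alpha, u, p


def pir_response(alpha: List[int], matrix: List[List[int]]) -> List[int]:
    """Server side: integer vector-matrix product beta = alpha * M."""
    columns = len(matrix[0])
    return [sum(alpha[i] * matrix[i][y] for i in range(len(matrix)))
            for y in range(columns)]


def pir_analysis(beta: List[int], y: int, u: int, p: int) -> int:
    """User side: unmask column y and read the parity, i.e. bit M[x][y]."""
    gamma_y = (beta[y] * pow(u, -1, p)) % p   # modular inverse, Python 3.8+
    return gamma_y % 2


if __name__ == "__main__":
    M = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
    alpha, u, p = pir_query(x=2, k=len(M))
    beta = pir_response(alpha, M)
    print([pir_analysis(beta, y, u, p) for y in range(3)])  # row 2 of M
```

Note that a single query recovers the whole row x: the user can read any column y from the same response, which is what the index construction later exploits by stacking t matrices.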

4.1.2 Cuckoo Hashing

Cuckoo hashing was ﬁrst proposed by (Pagh and

Rodler, 2004) to build efﬁcient and practical data in-

dexes. It ensures worst-case constant look-up and

deletion time and amortized constant insertion time

while minimizing the storage requirements.

In order to store n elements in some index I, Cuckoo hashing uses two hash tables T and T′ containing L entries each, and two hash functions $H : \{0,1\}^* \to \{1, 2, \ldots, L\}$ and $H' : \{0,1\}^* \to \{1, 2, \ldots, L\}$. Now, an element $\tau_i$ is either stored in entry $H(\tau_i)$ in hash table T, or in entry $H'(\tau_i)$ in hash table T′ but never in both.

The lookup operation in I is therefore simple: When given an element $\tau_i \in \{0,1\}^*$, the two entries at positions $H(\tau_i)$ and $H'(\tau_i)$ are queried in tables T and T′ respectively. To delete an element $\tau_i$ from I, the entry corresponding to $\tau_i$ is removed. Finally, to insert a new element $\tau_i \in \{0,1\}^*$ into I, we first check whether the entry of T at position $H(\tau_i)$ is empty. If it is the case, then $\tau_i$ is inserted in this entry of T and the insertion algorithm converges. Otherwise, if that entry is already occupied by another element $\tau_j$, then $\tau_j$ will be removed from its current entry in T, so that $\tau_i$ can take its place, and relocated to its other possible entry $H'(\tau_j)$ in T′. Now, if there is an element $\tau_k$ in the entry $H'(\tau_j)$ of T′, then $\tau_j$ will be inserted in entry $H'(\tau_j)$ in table T′ while $\tau_k$ will be moved to its other possible entry $H(\tau_k)$ in T. This insertion process is repeated iteratively until the insertion of all elements in either T or T′. If this process of insertion does not converge (i.e., there is an element that cannot be inserted), or it takes too long to converge, then all the elements in I will be rehashed with new hash functions H and H′.

An analysis of Cuckoo hashing (Pagh, 2001) shows that if L ≥ n, then there is a family of universal hash functions that guarantees a small rehashing probability of order $O(\frac{1}{n})$ and a constant expected time for insertion. For a more comprehensive analysis of the performance of Cuckoo hashing, the reader may refer to (Pagh and Rodler, 2004).
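The insertion and lookup procedure described above can be sketched as follows. This is a minimal illustration, not the construction analysed by Pagh and Rodler: the two hash functions are simulated with salted SHA-256 rather than drawn from a universal family, positions are 0-based, and the class name `CuckooIndex`, the `max_kicks` bound and the seed-based rehash are assumptions made for the example.

```python
import hashlib
from typing import List, Optional


class CuckooIndex:
    """Two tables T (index 0) and T' (index 1) with `size` entries each."""

    def __init__(self, size: int, max_kicks: int = 64) -> None:
        self.size = size
        self.max_kicks = max_kicks      # eviction bound before rehashing
        self.seed = 0                   # changing the seed changes H and H'
        self.tables: List[List[Optional[bytes]]] = [[None] * size,
                                                    [None] * size]

    def _pos(self, which: int, item: bytes) -> int:
        """H (which=0) or H' (which=1), simulated with salted SHA-256."""
        salt = bytes([which]) + self.seed.to_bytes(4, "big")
        digest = hashlib.sha256(salt + item).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def lookup(self, item: bytes) -> bool:
        """Worst-case constant time: exactly two probes."""
        return any(self.tables[w][self._pos(w, item)] == item for w in (0, 1))

    def insert(self, item: bytes) -> None:
        if self.lookup(item):
            return
        which = 0
        for _ in range(self.max_kicks):
            pos = self._pos(which, item)
            # Place the item and pick up whatever occupied the entry.
            item, self.tables[which][pos] = self.tables[which][pos], item
            if item is None:
                return                  # entry was empty: insertion converged
            which = 1 - which           # evicted element goes to its other table
        self._rehash(item)              # too many evictions: new hash functions

    def _rehash(self, pending: bytes) -> None:
        items = [e for t in self.tables for e in t if e is not None]
        items.append(pending)
        self.seed += 1
        self.tables = [[None] * self.size, [None] * self.size]
        for e in items:
            self.insert(e)


if __name__ == "__main__":
    index = CuckooIndex(size=64)
    for w in (b"alpha", b"beta", b"gamma"):
        index.insert(w)
    print(index.lookup(b"beta"), index.lookup(b"delta"))
```

The lookup touches one entry in each table regardless of how many evictions happened at insertion time, which is the property the word search protocol relies on: a search needs only two position computations.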

4.2 Protocol Description

We recall that in this first version, the data owner O wants to upload a large file F to cloud server S and, once its data is uploaded, to search for some words within the file without revealing any information to the semi-honest cloud server. The set of all distinct words within F is defined as $L_\omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$. The proposed protocol can be divided into two main phases:

• During the upload phase, before outsourcing its data, O builds the index corresponding to the n distinct words present in file F and encrypts F using a semantically secure symmetric encryption.

• During the search phase, O computes the position of the requested word ω in F's index and performs a PIR query to retrieve the information stored at that position in the index. Upon reception of server S's PIR response, O verifies this response and decides accordingly whether ω is present in F or not.

4.2.1 Setup

The data owner O calls the Setup algorithm which

takes as input the security parameter ζ and outputs a

master key MK and a set of public parameters P such

that:

• The master key MK is composed of a symmetric encryption key $K_{enc}$ and a MAC key $K_{mac}$.

• The public parameters P comprise a MAC $H_{mac}$, which maps a key and a message in $\{0,1\}^*$ to a fixed-length tag, and a cryptographic hash function $H : \{0,1\}^* \to \{0,1\}^t$.

4.2.2 Upload

The file upload phase consists of i.) encrypting the file F using a semantically secure encryption such as AES in counter mode (cf. Encrypt) and ii.) building a searchable index for $L_\omega$ (cf. BuildIndex).

The data owner O first generates a unique file identifier fid for file F and then encrypts F by calling the algorithm Encrypt. This algorithm takes as inputs secret key $K_{enc}$ and file F and outputs a semantically secure encryption $C = Enc(K_{enc}, F)$ of F. Next, O invokes the algorithm BuildIndex which on input of master key MK (more precisely MAC key $K_{mac}$), file identifier fid and the list of distinct words $L_\omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ present in F outputs a list of MACs $L_h = \{h_1, \ldots, h_n\}$, such that $h_i = H_{mac}(K_{mac}, \omega_i \| fid)$ where $\|$ denotes concatenation. Then the algorithm BuildIndex constructs an index I for $L_h = \{h_1, \ldots, h_n\}$ using Cuckoo hashing. In order to optimize the performance of the PIR underlying our word-search scheme, our index will differ from traditional Cuckoo hashing indexes by comprising two sets of t binary (rectangular) matrices $\{M_j\}_{j=1}^{t}$, $\{M'_j\}_{j=1}^{t}$ of size (k,l) rather than two hash-tables T and T′. Namely, instead of using two hash functions that hash into {1, 2, ..., L}, we employ two hash functions H and H′ that hash into {1, 2, ..., k} × {1, 2, ..., l}. For an element $h \in \{0,1\}^*$, the hash function H (H′ resp.) returns a position (x, y) ((x′, y′) resp.) in matrices $\{M_j\}$ ($\{M'_j\}$ resp.). More precisely, the algorithm BuildIndex executes the following:

• First, BuildIndex generates two sets of t binary matrices $\{M_j\}$ and $\{M'_j\}$ (1 ≤ j ≤ t) of size (k,l) each, where each element is initialized to 0.

• BuildIndex then picks two hashes $\mathcal{H}$ and $\mathcal{H}'$ that map each element $h_i$ in $L_h$ to either a position $(x_i, y_i) = \mathcal{H}(h_i)$ in matrices $\{M_j\}$ or to a position $(x'_i, y'_i) = \mathcal{H}'(h_i)$ in matrices $\{M'_j\}$, by following the Cuckoo hashing algorithm described in Section 4.1.2. We recall that in order to ensure worst-case constant look-up using Cuckoo hashing, k and l have to be chosen such that kl ≥ n, where n is the size of $L_h$.

• BuildIndex subsequently fills the binary matrices $\{M_j\}$ and $\{M'_j\}$ (1 ≤ j ≤ t) as follows:

– For each $h_i$, BuildIndex computes $H(h_i) = (b_{i,1}, b_{i,2}, ..., b_{i,t})$, where H is a t-bit cryptographic hash function.

– Now, if $h_i$ is mapped to a position $(x_i, y_i) = \mathcal{H}(h_i)$ in $M_j$ (or to a position $(x'_i, y'_i) = \mathcal{H}'(h_i)$ in $M'_j$ resp.), then the bit at position $(x_i, y_i)$ in $M_j$ (the bit at position $(x'_i, y'_i)$ in $M'_j$ resp.) will be set to $b_{i,j}$. Hence, if $h_i$ is mapped to a position $(x_i, y_i) = \mathcal{H}(h_i)$ in $\{M_j\}$ (1 ≤ j ≤ t), then: $H(h_i) = (M_1(x_i, y_i), ..., M_t(x_i, y_i))$.

• Finally, BuildIndex outputs the searchable index $I = \{\mathcal{H}, \mathcal{H}', M, M'\}$ such that $M = \{M_1, ..., M_t\}$ and $M' = \{M'_1, ..., M'_t\}$.

At the end of this phase, data owner O sends the

ﬁle identiﬁer ﬁd, the encryption C and the index I to

cloud server S.
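As an illustration, the index construction of this section can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes HMAC-SHA256 as the MAC, salted SHA-256 for the two Cuckoo position hashes, the first t bits of SHA-256 for the t-bit hash H, and 0-based positions. All function names (`mac`, `position`, `bits`, `cuckoo_place`, `build_index`, `contains`) and the rehash-on-failure strategy are hypothetical. The helper `contains` reads the index directly, whereas the protocol fetches the same bits through PIR.

```python
import hashlib
import hmac

def mac(key, word, fid):
    """h = H_mac(K_mac, word || fid); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, (word + fid).encode(), hashlib.sha256).digest()

def position(h, salt, k, l):
    """Cuckoo position hash: maps h to a 0-based cell (x, y) of a (k, l) matrix."""
    d = int.from_bytes(hashlib.sha256(salt + h).digest(), "big")
    return (d // l) % k, d % l

def bits(h, t):
    """Stand-in for the t-bit hash H: the first t bits of SHA-256 (t <= 256)."""
    d = hashlib.sha256(b"bits" + h).digest()
    return [(d[j // 8] >> (7 - j % 8)) & 1 for j in range(t)]

def cuckoo_place(macs, salts, k, l, max_kicks):
    """Assign every (distinct) MAC to one cell of table 0 or table 1, or return None."""
    tables = ({}, {})
    for h in macs:
        cur, side = h, 0
        for _ in range(max_kicks):
            pos = position(cur, salts[side], k, l)
            evicted = tables[side].get(pos)
            tables[side][pos] = cur
            if evicted is None:
                break
            cur, side = evicted, 1 - side  # move the evicted MAC to the other table
        else:
            return None  # too many evictions: the caller retries with new hashes
    return tables

def build_index(macs, k, l, t, max_kicks=64, max_rehash=100):
    """Return (salts, M, Mp): the two position-hash salts and two lists of t bit matrices."""
    for attempt in range(max_rehash):
        salts = (b"H-%d" % attempt, b"H'-%d" % attempt)
        tables = cuckoo_place(macs, salts, k, l, max_kicks)
        if tables is not None:
            break
    else:
        raise RuntimeError("Cuckoo placement failed; increase k * l")
    M = [[[0] * l for _ in range(k)] for _ in range(t)]
    Mp = [[[0] * l for _ in range(k)] for _ in range(t)]
    for mats, table in ((M, tables[0]), (Mp, tables[1])):
        for (x, y), h in table.items():
            for j, b in enumerate(bits(h, t)):
                mats[j][x][y] = b  # bit j of H(h) goes into matrix j at cell (x, y)
    return salts, M, Mp

def contains(index, h, k, l, t):
    """Direct (non-private) lookup: compare the t stored bits with H(h) at both cells."""
    salts, M, Mp = index
    target = bits(h, t)
    x, y = position(h, salts[0], k, l)
    xp, yp = position(h, salts[1], k, l)
    return ([M[j][x][y] for j in range(t)] == target
            or [Mp[j][xp][yp] for j in range(t)] == target)
```

Storing bit j of H(h) in matrix j means that a lookup only ever needs one cell per matrix, which is what lets the search phase retrieve a t-bit value with t parallel single-bit PIR instances.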

4.2.3 Word Search

The search phase is divided into the three following

steps:

Search Query. To look for a word ω in file F, O calls the algorithm Token which computes the MAC $h = H_{mac}(K_{mac}, \omega \| fid)$. Further, O runs the algorithm Query which computes $\mathcal{H}(h) = (x, y)$ and $\mathcal{H}'(h) = (x', y')$. We recall that (x, y) and (x′, y′) correspond to the potential positions of h in $\{M_j\}$ and $\{M'_j\}$ respectively. Next, algorithm Query outputs two PIR queries $\vec{\alpha} = PIRQuery(x) = (\alpha_1, \alpha_2, ..., \alpha_k)$ and $\vec{\alpha}' = PIRQuery(x') = (\alpha'_1, \alpha'_2, ..., \alpha'_k)$ that will allow O to retrieve the $x$-th and $x'$-th rows respectively of (k,l) binary matrices, as depicted in Section 4.1.1. Finally, O sends its search query $Q = (\vec{\alpha}, \vec{\alpha}')$ to server S.

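Search Response. On receiving O's search query $Q = (\vec{\alpha}, \vec{\alpha}')$, S runs algorithm Response which on input of Q, $M = \{M_1, ..., M_t\}$ and $M' = \{M'_1, ..., M'_t\}$, computes two sets of t PIR responses $R = \{\vec{\beta}_1, ..., \vec{\beta}_t\}$ and $R' = \{\vec{\beta}'_1, ..., \vec{\beta}'_t\}$ such that for all 1 ≤ j ≤ t:

$\vec{\beta}_j = PIRResponse(\vec{\alpha}, M_j) = \vec{\alpha} \cdot M_j$

$\vec{\beta}'_j = PIRResponse(\vec{\alpha}', M'_j) = \vec{\alpha}' \cdot M'_j$

S then sends its word search response $\mathcal{R} = \{R, R'\}$ to O.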
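The server-side computation is plain linear algebra: one vector-matrix product per matrix. The sketch below shows only that structure and is deliberately not private. It assumes an unblinded unit vector as a stand-in for PIRQuery, so the server would learn the row index; the blinding and unblinding steps of the PIR in Section 4.1.1 are not reproduced. The names `pir_response`, `respond` and `unit_query` are hypothetical.

```python
def pir_response(alpha, M):
    """Return the vector-matrix product alpha . M for a (k, l) matrix M."""
    k, l = len(M), len(M[0])
    return [sum(alpha[i] * M[i][c] for i in range(k)) for c in range(l)]

def respond(alpha, matrices):
    """One response vector per matrix M_j, for j = 1..t."""
    return [pir_response(alpha, M) for M in matrices]

def unit_query(x, k):
    """Unblinded stand-in for PIRQuery(x): the x-th unit vector (0-based)."""
    return [1 if i == x else 0 for i in range(k)]

# Two (3, 2) binary matrices, i.e. t = 2, k = 3, l = 2.
matrices = [
    [[0, 1], [1, 0], [1, 1]],
    [[1, 1], [0, 0], [0, 1]],
]
responses = respond(unit_query(2, 3), matrices)  # selects row 2 of every matrix
column_bits = [r[0] for r in responses]          # element y = 0 of each response
```

Because the t products are independent of one another, the server can compute them in parallel.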

Verification. To verify whether ω is in file F, the data owner O runs the algorithm Verify. When called, algorithm Verify unblinds the $y$-th element of each vector $\vec{\beta}_j$ by executing PIRAnalysis(y) and the $y'$-th element of each vector $\vec{\beta}'_j$ by running PIRAnalysis(y′), as was shown in Section 4.1.1. This allows Verify to derive a bit $b_j$ from $\vec{\beta}_j$ and a bit $b'_j$ from $\vec{\beta}'_j$ respectively for all 1 ≤ j ≤ t.

We denote by $\vec{b}$ and $\vec{b}'$ the strings of bits $(b_1, ..., b_t)$ and $(b'_1, ..., b'_t)$ respectively. After obtaining $\vec{b}$ and $\vec{b}'$, algorithm Verify computes the hash H(h) and checks whether $\vec{b} = H(h)$ or $\vec{b}' = H(h)$. If so, then Verify outputs 1 meaning that ω ∈ F; otherwise, Verify outputs 0.
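The role of the parameter t can be made concrete with a short estimate of the false-positive rate of Verify. The bound below is a sketch that assumes H behaves like a random function with t-bit outputs; this assumption is ours and is not stated in this section. If ω is not in F, Verify outputs 1 only when the t bits found at one of the two candidate positions happen to equal H(h), so a union bound gives:

```latex
\Pr[\mathsf{Verify} = 1 \mid \omega \notin F]
  \;\le\; \Pr[\vec{b} = H(h)] + \Pr[\vec{b}' = H(h)]
  \;\le\; 2 \cdot 2^{-t} \;=\; 2^{1-t}
```

For t = 80 this is $2^{-79} \approx 1.7 \times 10^{-24}$.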

5 PRIVACY PRESERVING WORD

SEARCH WITH DELEGATION

In this section we describe the entire solution includ-

ing the delegation capabilities. We recall that data

owner O wants to: i.) upload a large file F that contains n distinct words $L_\omega = \{\omega_1, \omega_2, ..., \omega_n\}$ to cloud server S, ii.) delegate the search capabilities on file F

to third party users and ﬁnally iii.) be able to revoke

these third party users at any point of time. There-

fore the ﬁnal solution involves in addition to the pre-

viously mentioned two phases from the basic proto-

col (i.e. Upload and WdSearch), a Delegation and a

Revocation phase. We modify the Upload and Word

Search phases so as to allow the data owner to up-

load the necessary material that will enable authorized

users to perform search operations, whereas during

the newly deﬁned Delegation phase, the data owner

provides authorized users with the MAC key used to

build the index. Finally, the Revocation phase is de-

ﬁned in order to grant the data owner the capability to

revoke authorized users efﬁciently.

The additional two phases are defined thanks to the use of Ciphertext-Policy Attribute-Based Encryption (CP-ABE) and Oblivious Pseudo-Random Functions (OPRF). We stress here that by combining OPRF and ABE, we not only allow for seamless revocation but also ensure the anonymity of authorized users. As opposed to traditional access control mechanisms, the proposed solution does not require authorized users to identify and authenticate themselves to the cloud server.

Before providing a detailed description of our

scheme, we summarize and formalize in the next sec-

tion the algorithms underlying CP-ABE and OPRFs.

SECRYPT2014-InternationalConferenceonSecurityandCryptography

144

5.1 Building Blocks

5.1.1 Ciphertext-policy Attribute-based Encryption

A ciphertext-policy attribute-based encryption allows a user to encrypt a message M under some access policy AP in such a way that only parties possessing attributes that match AP can derive M from the ciphertext. Actually, a CP-ABE consists of the following algorithms, cf. (Bethencourt et al., 2007):

• $Setup_{abe}(\zeta) \to (MK_{abe}, P_{abe})$: It is a randomized algorithm that takes as input a security parameter ζ, and outputs a master key $MK_{abe}$ and a set of public parameters $P_{abe}$ that will be used by subsequent algorithms.

• $Enc_{abe}(M, AP) \to C$: It is a randomized algorithm that takes as input a message M and some access policy AP, and outputs a ciphertext $C = Enc_{abe}(M, AP)$ such that only users holding the attributes satisfying the access policy AP can decrypt C.

• $CredGen_{abe}(MK_{abe}, A_i) \to cred_i$: It is a randomized algorithm which on input of master key $MK_{abe}$ and a set of attributes $A_i$, generates a set of credentials $cred_i$ that are associated with $A_i$. This algorithm is generally executed by a trusted third party (for instance a certification authority) whose aim is to define a set of admissible attributes $\mathcal{A}$ and to issue credentials $cred_i$ to any user possessing attributes $A_i \subset \mathcal{A}$.

• $Dec_{abe}(C, cred_i) \to \hat{M}$: It is a deterministic algorithm that takes as input a ciphertext C and a set of credentials $cred_i$. Assume that C encrypts a message M under the access policy AP (i.e., $C = Enc_{abe}(M, AP)$) and that the credentials $cred_i$ are associated with the set of attributes $A_i$. If the attributes $A_i$ satisfy the access policy AP, then $Dec_{abe}$ decrypts C successfully and outputs $\hat{M} = Dec_{abe}(C, cred_i) = M$. Otherwise, the decryption fails and $Dec_{abe}$ outputs $\hat{M} = \perp$.

5.1.2 Oblivious Pseudo-random Functions

An OPRF (Freedman et al., 2005; Jarecki and Liu, 2009) is a two-party protocol that allows a sender S with input δ and a receiver R with input h to compute jointly the function $f_\delta(h)$ for some pseudo-random function family $f_\delta$, in such a way that receiver R only learns the value $f_\delta(h)$, whereas sender S learns nothing from the protocol interaction.

Definition 4 (Oblivious Pseudo-Random Function (Freedman et al., 2005)). A two-party protocol π between a sender S of input δ and a receiver R of input h is said to be an oblivious pseudo-random function (OPRF), if there is some pseudo-random function family $f_\delta$ such that at the end of the execution of π:

• Receiver R gets $f_\delta(h)$ while learning nothing about S's input δ.

• Sender S learns nothing about R's input h or the value of $f_\delta(h)$.

In the following, we provide a quick overview of the generic algorithms underpinning an OPRF that evaluates the output of some pseudo-random function family $f_\delta$:

• $Setup_{oprf}(\zeta) \to (\delta, P_{oprf})$: It is a randomized algorithm that is run by the sender S. It takes as input the security parameter ζ and outputs an OPRF secret key δ and a set of public parameters $P_{oprf}$ that will be used by subsequent algorithms.

• $Query_{oprf}(h) \to Q_{oprf}$: It is a randomized algorithm that is executed by the receiver R whenever R wants to generate an OPRF query. This algorithm has as input a fixed-length bit string h and outputs a matching OPRF query $Q_{oprf}$ that will be sent later to sender S.

• $Response_{oprf}(Q_{oprf}, \delta) \to R_{oprf}$: It is a randomized algorithm which is operated by sender S whenever S receives an OPRF query. On input of an OPRF query $Q_{oprf}$, the algorithm $Response_{oprf}$ returns the corresponding OPRF response $R_{oprf}$ that will be forwarded to the receiver.

• $Result_{oprf}(R_{oprf}, St) \to f_\delta(h)$: It is a deterministic algorithm that is run by receiver R and takes as input an OPRF response $R_{oprf}$ and the current state St of R. Without loss of generality, we assume that R received the response $R_{oprf}$ as a follow-up to a previous OPRF query that was generated for the input h. Accordingly, the algorithm $Result_{oprf}$ outputs $f_\delta(h)$, i.e. the evaluation of the pseudo-random function $f_\delta$ at point h.

In the remainder of this paper, we employ the OPRF proposed by (Jarecki and Liu, 2009) which allows a receiver R and a sender S to compute jointly the evaluation of the pseudo-random function $f_\delta(h) = g^{1/(\delta+h)}$ for any $h \in \mathbb{Z}_N^*$, where N is an RSA safe modulus and g is a random generator of a group G of order N. However, for ease of exposition, we will omit the implementation details of this OPRF and we will only refer to the generic OPRF algorithms when describing our scheme.
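To make the shape of the function $f_\delta(h) = g^{1/(\delta+h)}$ concrete, the following toy evaluation may help. It is a sketch under loud assumptions: a single party holds both δ and h, so nothing here is oblivious; the group is a small prime-order subgroup of $\mathbb{Z}_p^*$ rather than the composite-order group of order N used by (Jarecki and Liu, 2009); and the parameters p = 2039, q = 1019, g = 4 are illustrative values of ours that offer no security. The name `prf` is hypothetical. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
# Toy group: p = 2q + 1 with p and q prime; g = 4 generates the subgroup of order q.
P, Q, G = 2039, 1019, 4

def prf(delta, h, p=P, q=Q, g=G):
    """f_delta(h) = g^(1/(delta+h)), with the exponent inverted modulo the group order q."""
    e = (delta + h) % q
    if e == 0:
        raise ValueError("delta + h is not invertible modulo the group order")
    return pow(g, pow(e, -1, q), p)

delta = 123              # OPRF secret key (toy value)
tau = prf(delta, 456)    # token for the MAC value h = 456 (toy value)
```

Raising the token to the power δ + h gives back g, which is a convenient sanity check on the exponent inversion.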

5.2 Protocol Description

In the sequel of this paper and in accordance with

the work of (Curtmola et al., 2006), we assume that

PrivacyPreservingDelegatedWordSearchintheCloud

145

the cloud server does not collude with revoked users.

We indicate that if such a collusion happens, then our

protocol will not be able to deter revoked users from

searching the outsourced ﬁles.

Without loss of generality, we also assume that there is some certification authority which is in charge of: i.) defining the universe of admissible attributes $\mathcal{A} = \{att_1, att_2, ...\}$, ii.) providing potential data owners and potential authorized users with their credentials $cred_i$ that match their attributes $A_i \subset \mathcal{A}$ following for instance the CP-ABE scheme proposed by (Bethencourt et al., 2007).

5.2.1 Setup

As in the first version of the protocol, the data owner O calls the Setup algorithm which takes as input the security parameter ζ and outputs a master key MK and a set of public parameters P such that:

• The master key MK is composed of a symmetric encryption key $K_{enc}$, a MAC key $K_{mac}$ and an OPRF secret key δ.

• The new public parameters P comprise a MAC $H_{mac}$, which maps a key and a message in $\{0,1\}^*$ to an element of $\mathbb{Z}_N^*$ (where N is a safe RSA modulus), a cryptographic hash function $H : \{0,1\}^* \to \{0,1\}^t$ and the public parameters $P_{oprf}$ of the OPRF $f_\delta(h) = g^{1/(\delta+h)}$.

5.2.2 Upload

The file upload phase amounts to i.) encrypting the file F using AES encryption (cf. Encrypt) and ii.) building a searchable index for $L_\omega$ (cf. BuildIndex). Now instead of building the index I based on $L_h = \{h_1, ..., h_n\}$ as was done previously, the index will be constructed using the OPRF values $f_\delta(h_i) = g^{1/(\delta+h_i)}$. Since the computation of OPRF is deemed to be demanding, we suggest that BuildIndex be executed jointly by O and the semi-honest cloud server S in such a way that O is only required to compute symmetric operations (e.g. hash functions and AES encryption) whereas the cloud server performs the more computationally intensive operations (i.e. OPRF and Cuckoo hashing). Henceforth, we denote by $BuildIndex_O$ the sub-algorithm of BuildIndex that is executed by data owner O and by $BuildIndex_S$ the sub-algorithm of BuildIndex that is operated by cloud server S.

Processing at the Data Owner. As in the previous protocol, data owner O first generates a unique file identifier fid for file F and then encrypts F by calling the algorithm Encrypt which outputs an AES encryption $C = Enc(K_{enc}, F)$ of F. Then, O invokes the algorithm $BuildIndex_O$ which outputs a list of MACs $L_h = \{h_1, ..., h_n\}$, such that $h_i = H_{mac}(K_{mac}, \omega_i \| fid)$. Next, O defines the access policy AP that will be associated with file F and finally forwards (via a secure channel) the file identifier fid, the encryption C, the list of MACs $L_h = \{h_1, ..., h_n\}$, the access policy AP and the OPRF secret key δ to cloud server S.

Processing at the Cloud. The processing at the cloud comprises two operations. The first one is to compute OPRF over the MACs in $L_h = \{h_1, ..., h_n\}$ using the secret key δ. The second operation is to build an index with the resulting values using Cuckoo hashing. More precisely, upon receipt of file identifier fid, ciphertext C, list of keyed hashes $L_h = \{h_1, ..., h_n\}$, access policy AP associated with C and the OPRF key δ, S calls the algorithm $BuildIndex_S$ which proceeds as explained below:

• First, $BuildIndex_S$ computes $\tau_i = f_\delta(h_i) = g^{1/(\delta+h_i)}$ for all 1 ≤ i ≤ n.

• $BuildIndex_S$ prepares an index I for $T = \{\tau_1, \tau_2, ..., \tau_n\}$ using Cuckoo hashing. Namely, $BuildIndex_S$ generates two sets of t binary matrices $\{M_j\}$ and $\{M'_j\}$ (1 ≤ j ≤ t) of size (k,l) each, where each element is initialized to 0. $BuildIndex_S$ then selects two hashes $\mathcal{H}$ and $\mathcal{H}'$ that map each element $\tau_i$ in T to either a position $(x_i, y_i) = \mathcal{H}(\tau_i)$ in matrices $\{M_j\}$ or to a position $(x'_i, y'_i) = \mathcal{H}'(\tau_i)$ in matrices $\{M'_j\}$, by executing the Cuckoo hashing algorithm.

• $BuildIndex_S$ fills the binary matrices $\{M_j\}$ and $\{M'_j\}$ (1 ≤ j ≤ t) similarly to the previous version of the protocol. The only difference is that instead of storing the hashes $H(h_i)$ in $\{M_j\}$ and $\{M'_j\}$, we store the hashes $H(\tau_i)$.

• Finally, $BuildIndex_S$ outputs the searchable index $I = \{\mathcal{H}, \mathcal{H}', M, M'\}$ such that $M = \{M_1, ..., M_t\}$ and $M' = \{M'_1, ..., M'_t\}$.

5.2.3 Delegation

To delegate the word search capabilities on the encrypted file F to third party users, data owner O encrypts its MAC key $K_{mac}$ under its access policy AP using attribute-based encryption and provides cloud server S with the resulting ciphertext $C_{mac} = Enc_{abe}(K_{mac}, AP)$. Thereafter, S publishes the ciphertext $C_{mac}$ and the file identifier fid.

We note that an authorized user U will in principle possess a set of attributes A (and therewith a set of credentials cred) that satisfy the access policy AP. Hence, U will be able to decrypt the ciphertext $C_{mac}$ using cred and derive the MAC key $K_{mac}$. This MAC key $K_{mac}$ will then be used by U to perform word search on O's file as will be shown in the next section.

5.2.4 Word Search

To search the encrypted file C for some word ω, the authorized user U performs the following operations:

Token Generation. The token generation phase consists of executing an OPRF protocol between the authorized user U and the cloud server S, where U corresponds to the receiver R and S to the sender S (following the notations in Section 5.1.2). Consequently, to generate a token τ for word ω, U executes algorithm Token as follows:

• On inputs of the word ω, the file identifier fid and the MAC key $K_{mac}$, the algorithm Token first computes $h = H_{mac}(K_{mac}, \omega \| fid)$. Then it calls the algorithm $Query_{oprf}$ which on input of h outputs an OPRF query $Q_{oprf}$ to evaluate $f_\delta(h) = g^{1/(\delta+h)}$. Next, the algorithm Token forwards the OPRF query $Q_{oprf}$ to cloud server S.

• Upon receipt of $Q_{oprf}$, S calls the OPRF algorithm $Response_{oprf}$. This algorithm uses the secret OPRF key δ and the OPRF query $Q_{oprf}$ to output an OPRF response $R_{oprf}$. Here instead of sending the OPRF response $R_{oprf}$ in clear to U, S will obfuscate it in such a way that only an authorized (i.e. non-revoked) user will be able to derive $R_{oprf}$. This obfuscation is performed as follows:

– S picks randomly a symmetric encryption key $K'_{enc}$ and encrypts the OPRF response $R_{oprf}$ using $K'_{enc}$ and the semantically secure encryption Enc. This will result in a ciphertext $C' = Enc(K'_{enc}, R_{oprf})$.

– Then it computes a CP attribute-based encryption $C_{enc} = Enc_{abe}(K'_{enc}, AP)$ of the encryption key $K'_{enc}$ under the access policy AP of the data owner O.

Notice that in this manner, we make sure that only authorized users will be able to decrypt the OPRF response and therewith obtain the token $\tau = f_\delta(h) = g^{1/(\delta+h)}$ necessary to perform the word search. At the end of this step, S forwards the ciphertexts $C'$ and $C_{enc}$ to authorized user U.

• On receiving the ciphertexts $C'$ and $C_{enc}$, the algorithm Token first decrypts $C_{enc}$ using the credentials cred that U obtained from the CA and gets $K'_{enc} = Dec_{abe}(C_{enc}, cred)$. Then it recovers the OPRF response $R_{oprf}$ by decrypting the ciphertext $C'$ using the secret key $K'_{enc}$. Next, the algorithm Token calls the OPRF algorithm $Result_{oprf}$ which takes as input $R_{oprf}$ and outputs consequently the word search token $\tau = f_\delta(h) = g^{1/(\delta+h)}$.

Search Query. After obtaining the token τ corresponding to the word ω, U runs the algorithm Query which first computes $\mathcal{H}(\tau) = (x, y)$ and $\mathcal{H}'(\tau) = (x', y')$. Then, as in the previous solution, it computes two PIR queries $(\vec{\alpha}, \vec{\alpha}')$ to retrieve the $x$-th and the $x'$-th row of a (k,l) binary matrix and sends the word search query $Q = (\vec{\alpha}, \vec{\alpha}')$ to cloud server S.

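Search Response. On receiving U's search query $Q = (\vec{\alpha}, \vec{\alpha}')$, cloud server S runs algorithm Response which computes the two sets of t PIR responses $R = \{\vec{\beta}_1, ..., \vec{\beta}_t\}$ and $R' = \{\vec{\beta}'_1, ..., \vec{\beta}'_t\}$ such that for all 1 ≤ j ≤ t:

$\vec{\beta}_j = PIRResponse(\vec{\alpha}, M_j) = \vec{\alpha} \cdot M_j$

$\vec{\beta}'_j = PIRResponse(\vec{\alpha}', M'_j) = \vec{\alpha}' \cdot M'_j$

S then sends its word search response $\mathcal{R} = \{R, R'\}$ to U.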

Verification. To verify whether ω is in the encrypted file C, the authorized user U runs the original algorithm Verify as described in Section 4.2.3. But after obtaining $\vec{b}$ and $\vec{b}'$, algorithm Verify computes the hash H(τ) instead of the hash H(h) and checks accordingly whether $\vec{b} = H(\tau)$ or $\vec{b}' = H(\tau)$. If it is the case, then Verify outputs 1 meaning that ω ∈ F; otherwise, Verify outputs 0.

5.2.5 Revocation

For the sake of simplicity, we assume that the data owner O revokes attributes $att_j \in \mathcal{A}$ instead of individual users U. We believe that this assumption is sufficient in the context of our application as described in Section 2, where the data owner delegates the word search capabilities to regulators or auditors that are not identified by their identities but by their attributes.

Now to revoke an attribute $att_j$, O runs the algorithm Revoke which outputs a new access policy AP′ that will be given to the cloud server S. For instance, if we assume that the initial access policy AP of O states that auditors from the EU and the US can perform word search on O's files, then a revocation of attribute US will lead to a new access policy AP′ that says that only auditors from the EU can perform word search. In this manner, auditors from the US will no longer have access to O's file.
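The EU/US example can be modelled as a small rewrite of a boolean policy. The sketch below captures only the policy logic; it assumes a policy is a nested tuple of "and"/"or" nodes over attribute names, and the functions `satisfies` and `revoke` are hypothetical. In the actual scheme the policy is enforced cryptographically by CP-ABE, not by a check such as `satisfies`.

```python
def satisfies(policy, attrs):
    """True if the attribute set attrs satisfies the policy tree."""
    if isinstance(policy, str):
        return policy in attrs
    op, *children = policy
    results = [satisfies(c, attrs) for c in children]
    return all(results) if op == "and" else any(results)

def revoke(policy, att):
    """Return the policy with attribute att removed, or None if it becomes unsatisfiable."""
    if isinstance(policy, str):
        return None if policy == att else policy
    op, *children = policy
    kept = [revoke(c, att) for c in children]
    if op == "and":
        if any(c is None for c in kept):
            return None            # an AND node needs every branch
    else:
        kept = [c for c in kept if c is not None]
        if not kept:
            return None            # an OR node needs at least one branch
    return kept[0] if len(kept) == 1 else (op, *kept)

AP = ("and", "auditor", ("or", "EU", "US"))   # auditors from the EU or the US
AP_new = revoke(AP, "US")                      # auditors from the EU only
```

Once the server encrypts token responses under `AP_new`, credentials that only carry the US attribute no longer open them, while the stored file and index are untouched.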

6 PRIVACY ANALYSIS

In this section, we brieﬂy analyze the privacy prop-

erties of the proposed scheme. The interested reader

may refer to the full version of this paper (Elkhiyaoui

et al., 2014) for a more formal analysis.

6.1 Storage Privacy

Our scheme ensures storage privacy thanks to the use of semantically secure encryption and message authentication codes during the upload phase. Actually,

the semantically secure encryption assures that cloud

server S cannot derive any information about the ﬁle

F from its encryption C. In addition, by computing

MACs that not only depend on the words present in

the ﬁle but also on its unique identiﬁer, we ensure that

the index I does not leak any information about the

outsourced ﬁle. Notably, cloud server S cannot tell

whether two outsourced ﬁles have words in common

or not, based on their indexes.

6.2 Query Privacy

Query privacy is assured by the use of both OPRF and PIR. On the one hand, OPRF allows authorized user U to generate a word search token τ without disclosing anything to cloud server S about the word ω that U is interested in. On the other hand, PIR enables U to perform word search on S's database while making sure that S learns nothing about the word search queries or their corresponding results.

6.3 Privacy Against Revoked Users

Since in this paper, we only focus on the case where data owner O revokes attributes instead of individual users, it follows that using for instance the CP-ABE scheme proposed by (Bethencourt et al., 2007) suffices to ensure efficient revocation. As shown in the previous section, revocation is achieved by updating the access policy associated with file F and by exploiting the properties of OPRF: obfuscating S's responses during the token generation phase (cf. Section 5.2) stops a revoked user from deriving new word search tokens and consequently from verifying S's responses.

Note also that even if revoked users gain access to the cloud server's database, they cannot decrypt the content of the outsourced files as they do not have access to the encryption key $K_{enc}$. All they can achieve is performing a dictionary attack on the index I using the MAC key $K_{mac}$ and the OPRF secret key δ, which can be computationally intensive.

7 PERFORMANCE EVALUATION

During the upload phase, the data owner is only required to encrypt the file to be outsourced using a symmetric encryption and to compute a MAC $h_i$ for each word $\omega_i \in L_\omega$. On the other hand, the cloud server computes the OPRFs (i.e. tokens) $\tau_i = f_\delta(h_i)$ and builds the corresponding index I by following the algorithm of Cuckoo hashing. Although the computation of the OPRF proposed in (Jarecki and Liu, 2009) may be deemed computationally demanding as it calls for exponentiations, it can be efficiently parallelized at the cloud server. Actually, if the cloud server possesses N machines for instance, it can provide each one of its machines with a 1/N fraction of the list of MACs $L_h = \{h_1, ..., h_n\}$ supplied by the data owner. Each machine will consequently compute n/N exponentiations whose results will be given back to the cloud server to construct the index I.
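The partitioning described above can be sketched as follows. This is an illustration only: worker threads stand in for the N machines, `evaluate` is a placeholder for the per-MAC exponentiation, and the names `split` and `parallel_tokens` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, n_machines):
    """Split items into n_machines contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_machines)
    chunks, start = [], 0
    for m in range(n_machines):
        size = base + (1 if m < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def parallel_tokens(macs, evaluate, n_machines):
    """Apply evaluate to every MAC, one worker per chunk, preserving the input order."""
    def work(chunk):
        return [evaluate(h) for h in chunk]
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        parts = list(pool.map(work, split(macs, n_machines)))
    return [tau for part in parts for tau in part]

# Toy run: squaring stands in for the exponentiation g^(1/(delta+h)).
tokens = parallel_tokens([1, 2, 3, 4, 5], lambda h: h * h, 2)
```

Since each token depends on a single MAC, the chunks need no coordination; only the final Cuckoo placement has to see all tokens together.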

While some would argue that using PIR to compute the responses of the cloud server to word search queries is computationally intensive, we note that this computation consists of matrix multiplications which can easily be parallelized. Actually, the cloud server can store at each one of its machines a 1/N fraction of the binary matrices $\{M_j\}$ and $\{M'_j\}$. Upon receipt of a word search query, S forwards the PIR queries it receives to its N machines which accordingly compute the corresponding PIR responses.

Furthermore, we emphasize that in this paper we employ PIR to retrieve a hash of word search tokens instead of their actual values. This fact drastically enhances the computation and the communication performances of our scheme. For example, if we instantiate the OPRF in the token generation phase with the OPRF presented in (Jarecki and Liu, 2009), then we will end up with tokens of size 1024 bits. This means that if we retrieve the actual values of the token to perform word search, then each search query will consist of retrieving 1024 bits which is far from being practical. Instead in our protocol, each search operation consists of fetching a t-bit hash (t is typically 80). We note also that setting the size (k, l) of the matrices $\{M_j\}$ and $\{M'_j\}$ to $(\sqrt{tn}, \sqrt{n/t})$ results in a minimal communication cost of $O(\sqrt{tn})$.
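The choice of (k, l) can be checked with a short calculation. This is a sketch under assumptions that are not spelled out in this section: each of the two PIR queries is a vector of k elements, each of the 2t PIR responses is a vector of l elements, all elements have comparable size, and the matrices are dimensioned with kl = n.

```latex
\begin{align*}
\text{cost}(k) &\propto 2k + 2tl = 2\left(k + \frac{tn}{k}\right) && (l = n/k) \\
\frac{d}{dk}\left(k + \frac{tn}{k}\right) &= 1 - \frac{tn}{k^{2}} = 0
  \;\Longrightarrow\; k = \sqrt{tn}, \quad l = \frac{n}{k} = \sqrt{n/t} \\
\text{cost}\left(\sqrt{tn}\right) &\propto 2\left(\sqrt{tn} + \sqrt{tn}\right) = 4\sqrt{tn} = O\left(\sqrt{tn}\right)
\end{align*}
```

The query and the response then contribute equally to the total, which is the usual signature of this kind of balancing argument.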

Finally, we stress that contrary to related work

(Curtmola et al., 2006), revocation in our protocol

SECRYPT2014-InternationalConferenceonSecurityandCryptography

148

does not require the re-encryption of the outsourced

ﬁles. Rather, it only calls for an update of the access

policy of the data owner at the cloud server.

8 RELATED WORK

As opposed to the proposed solution, most existing word search mechanisms, whether asymmetric (Bellare et al., 2007; Boneh et al., 2004; Waters et al., 2004) or symmetric (Curtmola et al., 2006; Kamara et al., 2012; Song et al., 2000; Golle et al., 2004), seem to guarantee query privacy only partially: in these solutions, although the outsourced data and queries are encrypted, the cloud can discover the response to any encrypted query. Furthermore, very few current solutions (Curtmola et al., 2006; Dong et al., 2008) offer the ability to delegate the search operation; unfortunately, these solutions provide the authorized user with the data encryption key and therefore revocation of a user requires the re-encryption of the entire outsourced data and the distribution of the new key to the authorized users.

The first solution which transforms an original PIR mechanism into a privacy preserving word-search solution was proposed by Chor et al. (Chor et al., 1997). Similarly to our solution, in (Chor et al.,

1997), the owner of the data constructs an index based

on all distinct words in the outsourced ﬁle. This index

is a hash-table that is ﬁlled according to the perfect

hashing algorithm of (Fredman et al., 1984). Our so-

lution outperforms the solution in (Chor et al., 1997)

thanks to the use of Cuckoo hashing instead of perfect

hashing. Namely, in the scheme of (Chor et al., 1997),

a word search query consists of three PIR queries,

whereas in our protocol it is composed of two PIR

queries. Additionally, the PIR queries in the case of

Cuckoo hashing are independent. This implies that

the server can execute the two PIR instances in paral-

lel to respond to the word search query.

Another solution that resembles the proposed solution is PRISM (Blass et al., 2012) where the cloud constructs some binary matrices in which each cell represents one or more words without knowing their content and the owner sends PIR requests to retrieve the content of one of these cells. Thanks to the use of Cuckoo hashing, our solution outperforms the original PRISM mechanism without lowering the security level. PRISM defines a matrix in which each cell corresponds to one or more words; therefore, two words can turn out to be represented by the same cell. In order to decrease the probability of such collisions, the data owner sends multiple (q) queries for the same word. In the newly proposed mechanism, the probability of collisions within the binary matrices is 0 and the data owner and/or the authorized user need to send a single query for each word. Additionally, PRISM does not offer any delegation capability and a straightforward delegation operation would require the distribution of the data encryption key to authorized users, which can increase privacy risks.

9 CONCLUSION

We introduced a protocol for privacy preserving del-

egated word search in the cloud. This protocol al-

lows a data owner to outsource its encrypted data to a

cloud server, while empowering the data owner with

the capability to delegate word search operations to

third parties. By employing keyed hash functions and

oblivious pseudo-random functions, we ensure that

authorized users only learn whether a given word is

in the outsourced ﬁles or not. In addition, we use pri-

vate information retrieval to make sure that the cloud

server cannot infer any information about the out-

sourced ﬁles from the execution of the word search

protocol. Furthermore, we combine attribute-based

encryption and oblivious pseudo-random functions to

accommodate efﬁcient revocation. Finally, the data

owner in our protocol is only required to perform

symmetric operations, whereas the computationally

intensive computations are performed by the cloud

server, and they can easily be parallelized.

ACKNOWLEDGEMENT

This work was partially funded by the Cloud Ac-

countability project - A4Cloud (grant EC 317550).

REFERENCES

Bellare, M., Boldyreva, A., and O'Neill, A. (2007). Deterministic and efficiently searchable encryption. In Proceedings of the 27th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO'07), pages 535–552.

Bethencourt, J., Sahai, A., and Waters, B. (2007).

Ciphertext-policy attribute-based encryption. In Secu-

rity and Privacy, 2007. SP ’07. IEEE Symposium on,

pages 321–334.

Blass, E.-O., di Pietro, R., Molva, R., and Önen, M. (2012). PRISM - Privacy-Preserving Search in MapReduce. In Proceedings of the 12th Privacy Enhancing Technologies Symposium (PETS 2012). LNCS.

Boneh, D., Di Crescenzo, G., Ostrovsky, R., and Persiano, G. (2004). Public key encryption with keyword search. In Proceedings of Eurocrypt 2004, volume 3027, pages 506–522. LNCS.

Chor, B., Gilboa, N., and Naor, M. (1997). Private informa-

tion retrieval by keywords.

Curtmola, R., Garay, J., Kamara, S., and Ostrovsky, R.

(2006). Searchable symmetric encryption: improved

deﬁnitions and efﬁcient constructions. In Proceedings

of the 13th ACM conference on Computer and com-

munications security, CCS ’06, pages 79–88. ACM.

Dong, C., Russello, G., and Dulay, N. (2008). Shared and searchable encrypted data for untrusted servers. In Proceedings of the 22nd annual IFIP WG 11.3 working conference on Data and Applications Security, pages 127–143, Berlin, Heidelberg. Springer-Verlag.

Elkhiyaoui, K., Önen, M., and Molva, R. (2014). Privacy Preserving Delegated Word Search in the Cloud.

Fredman, M. L., Komlós, J., and Szemerédi, E. (1984). Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM, 31(3):538–544.

Freedman, M., Ishai, Y., Pinkas, B., and Reingold, O.

(2005). Keyword search and oblivious pseudorandom

functions. In Proceedings of the Second international

conference on Theory of Cryptography, TCC’05,

pages 303–324, Berlin, Heidelberg. Springer-Verlag.

Golle, P., Staddon, J., and Waters, B. (2004). Secure

conjunctive keyword search over encrypted data. In

Jakobsson, M., Yung, M., and Zhou, J., editors, Proc.

of the 2004 Applied Cryptography and Network Secu-

rity Conference, pages 31–45. LNCS 3089.

Jarecki, S. and Liu, X. (2009). Efﬁcient Oblivious Pseudo-

random Function with Applications to Adaptive OT

and Secure Computation of Set Intersection. In The-

ory of Cryptography, volume 5444 of Lecture Notes

in Computer Science, pages 577–594. Springer Berlin

Heidelberg.

Kamara, S., Papamanthou, C., and Roeder, T. (2012). Dy-

namic searchable symmetric encryption. In Proceed-

ings of the 2012 ACM conference on Computer and

communications security, CCS ’12, pages 965–976,

New York, NY, USA. ACM.

Pagh, R. (2001). On the cell probe complexity of member-

ship and perfect hashing. In Proceedings of the thirty-

third annual ACM symposium on Theory of comput-

ing, STOC ’01, pages 425–432, New York, NY, USA.

ACM.

Pagh, R. and Rodler, F. (2004). Cuckoo hashing. Journal of

Algorithms, 51(2):122–144.

Song, D. X., Wagner, D., and Perrig, A. (2000). Prac-

tical techniques for searches on encrypted data. In

Proceedings of the 2000 IEEE Symposium on Secu-

rity and Privacy, SP ’00, pages 44–, Washington, DC,

USA. IEEE Computer Society.

Trostle, J. and Parrish, A. (2010). Efﬁcient Computation-

ally Private Information Retrieval from Anonymity

or Trapdoor Groups. In Proceedings of Conference

on Information Security, pages 114–128, Boca Raton,

USA.

Waters, B. R., Balfanz, D., Durfee, G., and Smetters, D. K.

(2004). Building an encrypted and searchable audit

log. In Proceedings of NDSS’04.

SECRYPT2014-InternationalConferenceonSecurityandCryptography

150