Privacy Preserving Delegated Word Search in the Cloud
Kaoutar Elkhiyaoui, Melek Önen and Refik Molva
EURECOM, Sophia-Antipolis, France
Keywords:
Privacy Preserving Keyword Search, Delegation, Cloud.
Abstract:
In this paper, we address the problem of privacy preserving delegated word search in the cloud. We consider
a scenario where a data owner outsources its data to a cloud server and delegates the search capabilities to a
set of third party users. In the face of semi-honest cloud servers, the data owner does not want to disclose any
information about the outsourced data; yet it still wants to benefit from the highly parallel cloud environment.
In addition, the data owner wants to ensure that delegating the search functionality to third parties neither
allows these third parties to jeopardize the confidentiality of the outsourced data nor prevents the
data owner from efficiently revoking the access of these authorized parties. To these ends, we propose a word
search protocol that builds upon techniques of keyed hash functions, oblivious pseudo-random functions and
Cuckoo hashing to construct a searchable index for the outsourced data, and uses private information retrieval
of short information to guarantee that word search queries do not reveal any information about the data to
the cloud server. Moreover, we combine attribute-based encryption and oblivious pseudo-random functions to
achieve an efficient revocation of authorized third parties. The proposed scheme is suitable for the cloud as it
can be easily parallelized.
1 INTRODUCTION
The cloud computing paradigm offers clients the ease of outsourcing the storage of their massive data with the advantage of reducing cost and assuring availability. Large-scale cloud infrastructures, however, raise severe security and privacy issues: apart from traditional security challenges, the outsourced storage of "big data" raises the challenge of processing it in the cloud in a secure and privacy preserving manner while considering the cloud provider itself as a potential adversary.
While data owners (i.e. clients) can simply en-
crypt their data before outsourcing it to the cloud, tra-
ditional confidentiality mechanisms fall short when
it comes to mining/processing the data. Recently,
several solutions have been proposed to allow the
search of words over encrypted data. In this paper
however, we address the problem of delegated word
search whereby in addition to the data owner itself,
some authorized third-parties can perform search op-
erations over private data. In addition to security and
privacy properties that classical search solutions as-
sure under a semi-honest (i.e., honest-but-curious) se-
curity model, a privacy preserving delegated word
search mechanism includes the delegation and revo-
cation operations: The data owner should be able to
remove the search capability of a third party at any
point in time through an efficient revocation mecha-
nism.
We propose a new privacy preserving word search
solution whereby as in (Chor et al., 1997), the data
owner constructs a searchable index with all words
listed in its files and similarly to (Blass et al., 2012),
it applies a private information retrieval to guaran-
tee that the adversary including the cloud itself does
not discover any information about the search query
and its result. The newly proposed solution out-
performs existing ones thanks to a combination of
Cuckoo hashing with private information retrieval for
the search operation. The use of Cuckoo hashing
helps in assigning one word to a unique position in
the index, thus removing the probability of collisions
within the index: The data owner first constructs a
confidential index where each particular element cor-
responds to a unique word and fills it in with some
private information derived from the actual word.
The search operation consists of the computation of
the position corresponding to the queried word using
Cuckoo hashing, and building the corresponding PIR
query to be sent to the cloud provider.
Moreover, the delegation operation is assured thanks to the use of attribute based encryption (ABE), which only allows users holding certain "attributes" to search over the data. For example, when companies outsource their logs to the cloud, they can allow a data protection commissioner to search over them as part of an audit operation. Efficient revocation, in turn, is achieved by a combination of ABE and oblivious pseudo random functions. The revocation operation does not imply the re-encryption of the outsourced data and only requires an update of the access policy by the data owner, which can be considered a negligible cost.
Elkhiyaoui K., Önen M. and Molva R.
Privacy Preserving Delegated Word Search in the Cloud.
DOI: 10.5220/0005054001370150
In Proceedings of the 11th International Conference on Security and Cryptography (SECRYPT-2014), pages 137-150
ISBN: 978-989-758-045-1
Copyright © 2014 SCITEPRESS (Science and Technology Publications, Lda.)
The major contributions of the paper can be sum-
marized as follows:
• We propose a new word search protocol which is based on an efficient word-index construction thanks to the use of Cuckoo hashing and the transformation of PIR into privacy preserving word search.
• The newly proposed solution also includes delegation and revocation capabilities thanks to the use of Attribute Based Encryption and Oblivious Pseudo Random Functions. The revocation operation does not incur any cost except for the update of the access policy by the data owner.
• We define the main privacy requirements and further provide a formal analysis of these properties.
Section 2 introduces the generic problem of privacy preserving delegated word search and the application scenario. The different privacy requirements are formally defined in Section 3. The first version of the privacy preserving word search solution is described in Section 4. The entire solution including the delegation and revocation operations is presented in Section 5. We analyze the new solution in terms of security and performance in Sections 6 and 7. Finally, Section 8 reviews the state of the art.
2 BACKGROUND
We consider a scenario where a data owner outsources
some privacy sensitive data to a cloud server and
wishes to later on perform some operations over it
without revealing any details about the data. The op-
eration we are focusing on is word search over en-
crypted data and in our scenario the data owner may
wish to delegate part of the search operations to au-
thorized third parties. An illustrative example of such
a requirement can be a scenario wherein due to regu-
latory matters, some data (such as logs) still need to
be searchable by third parties such as data protection
commissioners. The three entities involved in a pri-
vacy preserving delegated word search and the main
algorithms are formally defined in the following sec-
tions.
2.1 Entities
A privacy preserving delegated word search involves
the following entities:
• Data Owner O: It possesses a large file $F$ that it outsources to the cloud server $S$. Without loss of generality, we assume that the number of distinct words in $F$ is $n$ and the corresponding set is defined as $L_\omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$. Similarly to previous work such as (Curtmola et al., 2006; Blass et al., 2012), we assume that once $O$ outsources a file $F$, it will no longer modify it.
• Cloud Server S: It stores an encrypted version of the outsourced file $F$ and a searchable index $I$ of the set $L_\omega$ of "distinct" words present in $F$.
• Authorized User U: It has access to a set of credentials that enable it to perform search queries on $F$. This authorized user could be an auditor which, as part of its auditing task, has to search the activity logs of $O$. We also note that in some cases an authorized user could correspond to the data owner that wants to perform word search on its outsourced data.
2.2 Privacy Preserving Delegated Word-search
In accordance with the work of (Curtmola et al., 2006), a privacy preserving delegated word-search comprises the following algorithms:
• $\text{Setup}(\zeta) \to (MK, P)$: It is a randomized algorithm that is executed by the data owner $O$. It takes as input the security parameter $\zeta$, and outputs a master key $MK$ and a set of public parameters $P$ that will be used by subsequent algorithms to perform the word-search.
• $\text{Encrypt}(MK, F) \to C$: This algorithm is run by $O$. It has as input the master key $MK$ and the file $F$, and outputs an encryption $C$ of file $F$.
• $\text{BuildIndex}(MK, F) \to I$: This algorithm has as input the master key $MK$ and a file $F$ and outputs an index $I$ of distinct words $\omega_i$ present in $F$. This algorithm is generally run by the data owner $O$.
• $\text{Delegate}(MK, St_o, id_u) \to K_u$: This algorithm is executed by $O$ to delegate the search capabilities on its files to some third party user. On input of the master key $MK$, the current state $St_o$ of $O$ and the identifier $id_u$ of some user $U$, Delegate outputs a secret key $K_u$ that will be provided to $U$.
SECRYPT2014-InternationalConferenceonSecurityandCryptography
138
• $\text{Token}(\omega, St_u, K_u) \to \tau$: This algorithm is executed by authorized users or the data owner $O$ to generate a search token for some word $\omega$. It takes as input the word $\omega$, the current state $St_u$ of authorized user $U$ and the key $K_u$ and outputs a search token $\tau$.
• $\text{Query}(\tau) \to Q$: It is a randomized algorithm that is run by authorized users to generate word search queries. On input of a token $\tau$, Query outputs a word search query $Q$ that will be forwarded to cloud server $S$.
• $\text{Response}(Q, I) \to R$: This algorithm is invoked by $S$ whenever $S$ receives a word search query $Q$. It takes as input $Q$ and the index $I$ and outputs a word search response $R$.
• $\text{Verify}(R, St_u) \to b$: It is a deterministic algorithm run by authorized users to verify $S$'s responses. On input of $S$'s response $R$ and the current state $St_u$ of authorized user $U$, Verify outputs a bit $b = 1$ if $\omega \in F$ and $b = 0$ otherwise.
• $\text{Revoke}(MK, St_o, id_u) \to (St_o, St_s)$: This algorithm is run by the data owner $O$ to revoke the access of previously authorized users. It has as input the master key $MK$, the current state $St_o$ of data owner $O$ and the identifier $id_u$ of some previously authorized user $U$, and it outputs an updated state $St_o$ for $O$ and an updated state $St_s$ for cloud server $S$.
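To make the roles of these algorithms easier to see at a glance, the nine signatures can be written down as an abstract interface. The following is only a sketch: the class name DelegatedWordSearch, the method names and the placeholder types (Any, bytes) are our own assumptions and are not prescribed by the definitions above.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class DelegatedWordSearch(ABC):
    """Signatures only; every type below is a hypothetical placeholder."""

    @abstractmethod
    def setup(self, zeta: int) -> Tuple[Any, Any]:
        """Data owner: return (MK, P) for security parameter zeta."""

    @abstractmethod
    def encrypt(self, mk: Any, file: bytes) -> bytes:
        """Data owner: return the encryption C of file F."""

    @abstractmethod
    def build_index(self, mk: Any, file: bytes) -> Any:
        """Data owner: return the index I of the distinct words of F."""

    @abstractmethod
    def delegate(self, mk: Any, st_o: Any, id_u: str) -> Any:
        """Data owner: return the secret key K_u for user id_u."""

    @abstractmethod
    def token(self, word: str, st_u: Any, k_u: Any) -> Any:
        """Authorized user: return a search token tau for the word."""

    @abstractmethod
    def query(self, tau: Any) -> Any:
        """Authorized user (randomized): return a query Q for token tau."""

    @abstractmethod
    def response(self, q: Any, index: Any) -> Any:
        """Cloud server: return the response R to query Q on index I."""

    @abstractmethod
    def verify(self, r: Any, st_u: Any) -> int:
        """Authorized user: return 1 if the word is in F, 0 otherwise."""

    @abstractmethod
    def revoke(self, mk: Any, st_o: Any, id_u: str) -> Tuple[Any, Any]:
        """Data owner: return the updated states (St_o, St_s)."""
```

Grouping the methods by caller (data owner, authorized user, cloud server) mirrors the three entities of Section 2.1.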
3 ADVERSARY MODEL
The crucial privacy challenge to address when design-
ing a privacy preserving delegated word search is as-
suring privacy against a misbehaving cloud server. In-
deed, the cloud server may attempt to infer sensitive
information about the outsourced files (and their own-
ers thereof) from the ciphertexts and indexes it keeps.
It may also try to derive information about those files
from the word search queries it processes. Thus, it
is of utmost importance to ensure that the ciphertexts
and the indexes that the cloud stores together with the
word search queries it processes do not leak any in-
formation about the data owners’ files.
Furthermore, the delegation of search capabilities to third party users inherently raises the requirements of access authorization and revocation, and therewith the requirement of privacy against revoked users. For example, after its revocation, a previously authorized user may exploit the information it collected during the word search operations it performed while still authorized, in order to conduct lookup operations and learn new information about the outsourced files. Therefore, one should ensure that even if revoked users can still issue valid search queries to the cloud server, they are not able to decode the cloud server's responses.
Along these lines, we provide in the subsequent
sections formal models for the notions of both pri-
vacy against cloud servers and privacy against re-
voked users, which we will employ to assess the se-
curity of our scheme in the appendix of this paper.
Of course, solutions protected against misbehaving
clouds and revoked users are inherently secure against
any other type of external adversaries.
3.1 Privacy Against Cloud Server
In accordance with the work of (Blass et al., 2012)
and (Curtmola et al., 2006), we assume that the cloud
server S is semi-honest: Although interested in dis-
covering the content of the data and the queries, S
still performs all the required operations correctly.
A privacy preserving delegated word search
should ensure that the semi-honest cloud server S
does not discover any information about the content
of an outsourced file from either its encryption or its
index. This means that in addition to not being able
to break the confidentiality of the outsourced data, S
should neither be able to mount statistical attacks on
the outsourced files (e.g. occurrence of words) nor to
tell whether two files contain (or do not contain) the
same words. In compliance with the work of (Blass
et al., 2012), we refer to this requirement as storage
privacy. Moreover, a solution for privacy preserv-
ing delegated word search should as well guarantee
query privacy: during the lookup phase, cloud server
S should not be able to derive any useful informa-
tion about the queries of authorized users. Namely,
S should not be able to tell whether any two word
search queries were issued for the same word or not
(cf. (Blass et al., 2012)).
To formally capture the adversarial capabilities of
S in the subsequent privacy definitions, we assume
that S is given access to the following oracles:
• $\mathcal{O}_{encrypt}(F, MK) \to C$: This oracle takes a file $F$ and the master key $MK$ of some data owner $O$ as inputs and computes an encryption $C$ of file $F$ by calling the algorithm Encrypt.
• $\mathcal{O}_{index}(F, MK) \to I$: On inputs of file $F$ and the master key $MK$, this oracle executes the algorithm BuildIndex and returns the index $I$ associated with file $F$.
• $\mathcal{O}_{search,s}(I, \omega) \to view_s$: Cloud server $S$ invokes this oracle whenever it wants to receive and process a word search query. On inputs of index $I$ and word $\omega$, this oracle starts an execution of the word search protocol with cloud server $S$ to check whether $\omega$ is in $I$ or not. At the end of the word search operation, $\mathcal{O}_{search,s}$ returns the view $view_s = (St_s, rand_s, M_{1,s}, M_{2,s}, \ldots, M_{l,s})$ of cloud server $S$ during the word search, where $St_s$ is the current state of cloud server $S$, $rand_s$ is the internal randomness that it used to generate its word search response and $M_{i,s}$ is the $i$-th message that $S$ received during the word search from oracle $\mathcal{O}_{search,s}$.

Algorithm 1: Learning phase of the storage privacy game.
// $S$ calls oracles $\mathcal{O}_{encrypt}$ and $\mathcal{O}_{index}$ a polynomial number of times
$F_i \leftarrow S$;
$C_i \leftarrow \mathcal{O}_{encrypt}(F_i, MK)$;
$I_i \leftarrow \mathcal{O}_{index}(F_i, MK)$;
// $S$ returns a challenge word
$\omega^* \leftarrow S$;

Algorithm 2: Challenge phase of the storage privacy game.
// Let $F_0$ and $F_1$ be two files s.t. $F_1$ contains $\omega^*$ while $F_0$ does not
$b \leftarrow \{0,1\}$;
$C_b \leftarrow \mathcal{O}_{encrypt}(F_b, MK)$;
$I_b \leftarrow \mathcal{O}_{index}(F_b, MK)$;
$b^* \leftarrow S$;
3.1.1 Storage Privacy
We define storage privacy using an indistinguishability-based game that comprises two phases: a learning phase (cf. Algorithm 1) and a challenge phase (cf. Algorithm 2). The goal of cloud server $S$ in this game is to tell whether a challenge file $F_b$ contains some word $\omega^*$. To this effect, cloud server $S$ calls the oracles $\mathcal{O}_{encrypt}$ and $\mathcal{O}_{index}$ a polynomial number of times in the learning phase. By the end of this phase, $S$ outputs a challenge word $\omega^*$.
Let $F_0$ and $F_1$ be two files such that $F_1$ contains $\omega^*$ while $F_0$ does not.
Now in the challenge phase, cloud server $S$ is provided with the encryption $C_b$ and the index $I_b$ of file $F_b$ where $b$ is picked randomly from $\{0,1\}$. At the end of the challenge phase, $S$ outputs its guess $b^*$ for the bit $b$. We say that $S$ succeeds in the storage privacy game if $b = b^*$.

Algorithm 3: Learning phase of the query privacy game.
// $S$ calls oracles $\mathcal{O}_{encrypt}$, $\mathcal{O}_{index}$, and $\mathcal{O}_{search,s}$ a polynomial number of times
$(F_i, \omega_i) \leftarrow S$;
$C_i \leftarrow \mathcal{O}_{encrypt}(F_i, MK)$;
$I_i \leftarrow \mathcal{O}_{index}(F_i, MK)$;
$view_{s,i} \leftarrow \mathcal{O}_{search,s}(I_i, \omega_i)$;
// $S$ outputs a challenge file $F^*$ and two distinct words $\omega_0$ and $\omega_1$
$(F^*, \omega_0, \omega_1) \leftarrow S$;

Algorithm 4: Challenge phase of the query privacy game.
$C^* \leftarrow \mathcal{O}_{encrypt}(F^*, MK)$;
$I^* \leftarrow \mathcal{O}_{index}(F^*, MK)$;
$b \leftarrow \{0,1\}$;
$view_s \leftarrow \mathcal{O}_{search,s}(I^*, \omega_b)$;
$b^* \leftarrow S$;

Definition 1. [Storage Privacy] Let $\Pi^{S}_{success}$ denote the probability that $S$ succeeds in the storage privacy game. We say that a word search protocol assures storage privacy, iff for any cloud server $S$, $\Pi^{S}_{success} \leq \frac{1}{2} + \epsilon$, where $\epsilon$ is a negligible function in the security parameter $\zeta$.
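To illustrate how the storage privacy game is played, the sketch below runs it against a deliberately insecure toy scheme whose index is a set of unkeyed word hashes. Everything here is an assumption made for illustration: LeakyScheme, HashingAdversary and storage_privacy_game are hypothetical names, files are modelled as lists of words, and we let the adversary supply $F_0$ and $F_1$ itself (the game above only requires that $F_1$ contains the challenge word while $F_0$ does not). The toy scheme is not the protocol proposed in this paper.

```python
import hashlib
import secrets


class LeakyScheme:
    """Toy scheme (NOT the paper's): the index is a set of unkeyed hashes."""

    def setup(self):
        return secrets.token_bytes(16)  # master key, unused by the index

    def encrypt(self, mk, words):
        return b""  # ciphertext omitted: the leak below comes from the index

    def build_index(self, mk, words):
        return {hashlib.sha256(w.encode()).hexdigest() for w in words}


class HashingAdversary:
    """Semi-honest server that recomputes the public hash of its word."""

    def learn(self, oracle):
        # The learning-phase oracle is available but not needed here.
        self.word = "secret"
        return self.word, ["alpha", "beta"], ["alpha", "secret"]

    def guess(self, c_b, i_b):
        return int(hashlib.sha256(self.word.encode()).hexdigest() in i_b)


def storage_privacy_game(scheme, adversary, trials=100):
    """Return the fraction of trials in which the adversary guesses b."""
    wins = 0
    for _ in range(trials):
        mk = scheme.setup()

        def oracle(f):
            # O_encrypt and O_index bundled into a single call
            return scheme.encrypt(mk, f), scheme.build_index(mk, f)

        word, f0, f1 = adversary.learn(oracle)
        assert word in f1 and word not in f0
        b = secrets.randbelow(2)  # challenge bit
        f_b = (f0, f1)[b]
        guess = adversary.guess(scheme.encrypt(mk, f_b),
                                scheme.build_index(mk, f_b))
        wins += int(guess == b)
    return wins / trials
```

Against this toy scheme the adversary wins every trial, far above the $\frac{1}{2} + \epsilon$ bound of Definition 1. An index derived from the words under a key that the server does not know is meant to remove this particular attack, since the server can no longer recompute the index entry of its challenge word.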
3.1.2 Query Privacy
Similarly to storage privacy, we formalize query privacy through an indistinguishability-based game that runs in two phases: a learning phase and a challenge phase. In the learning phase as depicted in Algorithm 3, cloud server $S$ picks adaptively a polynomial number of file and word pairs $(F_i, \omega_i)$. For each selected pair $(F_i, \omega_i)$, $S$ first calls the oracles $\mathcal{O}_{encrypt}$ and $\mathcal{O}_{index}$ to encrypt $F_i$ and build the corresponding index respectively, then it queries the oracle $\mathcal{O}_{search,s}$ to receive and process a search query for word $\omega_i$ in $F_i$. At the end of the learning phase, $S$ outputs a challenge file $F^*$ and two challenge words $\omega_0$ and $\omega_1$.
In the challenge phase (cf. Algorithm 4), cloud server $S$ queries the oracles $\mathcal{O}_{encrypt}$ and $\mathcal{O}_{index}$ which provide $S$ with the encryption and the index of the challenge file $F^*$ respectively. Then, the oracle $\mathcal{O}_{search,s}$ executes an instance of the word search protocol for word $\omega_b$ with $S$, where $b$ is a randomly selected bit. Finally, $S$ outputs its guess $b^*$ for the bit $b$. We say that $S$ succeeds in the query privacy game if $b = b^*$.
Definition 2. Let $\Pi^{S}_{success}$ denote the probability that $S$ succeeds in the query privacy game. We say that a word search protocol ensures query privacy, iff for any cloud server $S$, $\Pi^{S}_{success} \leq \frac{1}{2} + \epsilon$, where $\epsilon$ is a negligible function in the security parameter $\zeta$.
3.2 Privacy Against Revoked Users ("Forward Privacy")
Ideally, a privacy preserving delegated word search
should assure that when an authorized user is revoked,
it can no longer look for words in the cloud server’s
files (this does not imply that the revoked user can-
not query the server’s database, rather it means that
it cannot successfully interpret the cloud server’s re-
sponses). In other words, a privacy preserving dele-
gated word search should make sure that even if a re-
voked user is able to issue word search queries, it can-
not infer any new information about the outsourced
files that it did not learn before its revocation. This
requirement resembles the notion of forward secrecy
whereby a user cannot have access to any data after its
revocation. In the context of word search in addition
to the content of the data, the revoked user should not
infer any additional information from future queries
as well.
Since in this paper we only focus on static data
(i.e. the data owner does not update its file once out-
sourced to the cloud server), we argue that the above
intuition can be captured by assuring that revoked
users cannot look up a word for which they did not
issue a search query when they were still authorized.
Without loss of generality, we assume that there
is a data owner O that outsources its file F and the
corresponding index I to cloud server S , and that a
user U is interested in searching the file F even after
its revocation. To this effect, U may behave mali-
ciously during the execution of the word search pro-
tocol. Namely, U may provide bogus word search
queries to cloud server S.
In order to formalize privacy against revoked users, we use a privacy game that similarly to the two previous games consists of a learning and a challenge phase. In addition to the oracles $\mathcal{O}_{encrypt}$ and $\mathcal{O}_{index}$, user $U$ has access to the following oracles.
• $\mathcal{O}_{delegate}(MK) \to K_u$: On input of the data owner $O$'s master key $MK$, the oracle $\mathcal{O}_{delegate}$ executes the algorithm Delegate to allow $U$ to perform word search on $O$'s file $F$ and outputs the secret key $K_u$.
• $\mathcal{O}_{revoke}$: This oracle revokes the right of $U$ to search the file $F$ by executing the algorithm Revoke which updates the states of data owner $O$ and cloud server $S$.
• $\mathcal{O}_{search,u}(I, \omega) \to view_u$: $U$ calls this oracle whenever it wants to perform a word search on the index $I$. It takes as input an index $I$ and a word $\omega$ and outputs the view $view_u = (St_u, rand_u, M_{1,u}, M_{2,u}, \ldots, M_{l,u})$ of user $U$ during the word search, where $St_u$ is the current state of user $U$ and $rand_u$ is the internal randomness that it used to generate its word search query, whereas $M_{i,u}$ corresponds to the $i$-th message that $U$ received from $\mathcal{O}_{search,u}$ during the word search.
• $\mathcal{O}_{chal,u}(I, \omega) \to chal_{u,b}$: When called with an index $I$ and word $\omega$, this oracle flips a random coin $b \in \{0,1\}$. If $b = 1$, then $\mathcal{O}_{chal,u}$ returns the actual view $chal_{u,1} = view_u = (St_u, rand_u, M^{1}_{1,u}, M^{1}_{2,u}, \ldots, M^{1}_{l,u})$ of user $U$ during the word search for $\omega$, such that $St_u$ is the current state of user $U$ and $rand_u$ is its internal randomness, whereas $M^{1}_{i,u}$ corresponds to the $i$-th message that $U$ received from $\mathcal{O}_{search,u}$ during the word search. If $b = 0$, then $\mathcal{O}_{chal,u}$ outputs $chal_{u,0} = (St_u, rand_u, M^{0}_{1,u}, M^{0}_{2,u}, \ldots, M^{0}_{l,u})$, where $St_u$ is the current state of user $U$ and $rand_u$ is its internal randomness, and the messages $M^{0}_{i,u}$ are generated randomly by $\mathcal{O}_{chal,u}$.
Once user $U$ enters the learning phase of the privacy game (see Algorithm 5), it first calls the oracle $\mathcal{O}_{index}$ with a file $F$ of its choosing to get the corresponding index $I$. Next user $U$ invokes the oracle $\mathcal{O}_{delegate}$ which supplies $U$ with the secret key $K_u$. This key will enable $U$ to execute the word search protocol with cloud server $S$ on the index $I$ and therewith on file $F$. Then user $U$ queries the oracle $\mathcal{O}_{search,u}$ for a polynomial number of words $\omega_i$ of its choosing. Next, the oracle $\mathcal{O}_{revoke}$ revokes $U$. After the revocation, $U$ can still issue a polynomial number of word search queries on file $F$ by calling $\mathcal{O}_{search,u}$. Finally, $U$ outputs a challenge word $\omega^*$ that is not present in file $F$.
In the challenge phase (see Algorithm 6), $U$ queries the oracle $\mathcal{O}_{chal,u}$ with the word $\omega^*$ and the index $I^*$ that corresponds to $F \cup \{\omega^*\}$. The oracle $\mathcal{O}_{chal,u}$ in turn flips a random coin $b \in \{0,1\}$ and outputs the challenge view $chal_{u,b}$. At the end of the challenge phase, revoked user $U$ outputs a guess $b^*$ for bit $b$.
We say that $U$ succeeds in the game of privacy against revoked users if i.) $b = b^*$ and if ii.) $U$ did not issue a search query for the challenge word $\omega^*$ before calling the oracle $\mathcal{O}_{revoke}$ (i.e. $\omega^* \neq \omega_i$, $\forall i$).
Definition 3. Let $\Pi^{U}_{success}$ denote the probability that $U$ succeeds in the privacy game against revoked users. We say that a delegated word search mechanism provides privacy against revoked users iff for any revoked user $U$, $\Pi^{U}_{success} \leq \frac{1}{2} + \epsilon$, where $\epsilon$ is a negligible function in the security parameter $\zeta$.

Algorithm 5: Learning phase of the privacy game against revoked users.
$I \leftarrow \mathcal{O}_{index}(F, MK)$;
$K_u \leftarrow \mathcal{O}_{delegate}(I)$;
// $U$ calls $\mathcal{O}_{search,u}$ a polynomial number of times
$\omega_i \leftarrow U$;
$view_{u,i} \leftarrow \mathcal{O}_{search,u}(I, \omega_i)$;
$\mathcal{O}_{revoke}(U)$;
// $U$ calls $\mathcal{O}_{search,u}$ a polynomial number of times after revocation
$\omega_i \leftarrow U$;
$view_{u,i} \leftarrow \mathcal{O}_{search,u}(I, \omega_i)$;
// $U$ returns a challenge word that is not in file $F$
$\omega^* \leftarrow U$;

Algorithm 6: Challenge phase of the privacy game against revoked users.
$I^* \leftarrow \mathcal{O}_{index}(F \cup \{\omega^*\}, MK)$;
$chal_{u,b} \leftarrow \mathcal{O}_{chal,u}(I^*, \omega^*)$;
$b^* \leftarrow U$;
4 PRIVACY PRESERVING WORD SEARCH
In this section, we describe the first version of the
proposed word search solution which does not offer
any delegation capabilities and therefore only assures
privacy against honest-but-curious cloud providers.
Similarly to (Chor et al., 1997; Blass et al., 2012),
to assure query privacy against a semi-honest cloud
server, we rely on Private Information Retrieval (PIR)
to build our word-search scheme. Actually, PIR al-
lows a user to retrieve a data block from a server’s
database without disclosing any information about the
sought block. However, PIR protocols assume that
the user knows beforehand the position in the database
of the data block to be retrieved, and therefore, they
cannot be used directly in privacy preserving word
search wherein a user only holds a list of words to
look for. Fortunately, (Chor et al., 1997) proposed a
technique that transforms any PIR mechanism into a
protocol for private information retrieval by keyword,
and thereby, into a privacy preserving word-search.
The main idea is to first construct an index of all the
distinct words present in the outsourced data and then
apply a PIR to this index. As shown in (Chor et al.,
1997), this can be achieved by representing the index
by a hash-table that maps each word to a unique po-
sition in the table. During the search phase, the user
first computes the position of the requested word in
the hashtable (i.e. the index) and further runs PIR
to fetch the block stored at that position. While the
construction of (Chor et al., 1997) can be easily trans-
formed into a privacy preserving word search, we be-
lieve that it can be further optimized by using Cuckoo
hashing to build the hashtables (i.e. the indexes) of
the words in the outsourced files.
Along these lines, we first formalize and describe
the PIR and the Cuckoo hashing algorithms that will
underpin our word search solution.
4.1 Building Blocks
4.1.1 Trapdoor Private Information Retrieval
For efficiency purposes, we opt for a PIR mechanism
called trapdoor PIR which was proposed by (Trostle
and Parrish, 2010), and whose security is based on the
trapdoor group assumption. We stress however that
this particular PIR can be replaced by any other
efficient PIR algorithm.
In compliance with the work of (Trostle and Parrish, 2010), we model the server's database on which private information retrieval is performed by a binary $(k, l)$ matrix $M$. Trapdoor PIR allows a user to retrieve the bit $b$ at position $(x, y)$ in $M$ as follows:
• $\text{PIRQuery}(x) \to \vec{\alpha}$: The user picks a secret large number $p$ (typically $|p| = 200$ bits) and selects randomly $u \in \mathbb{Z}_p$ and $k$ other values $a_i \in \mathbb{Z}_p$. Next, it computes the $k$ following values: $e_x = 1 + 2 \cdot a_x$ and $\forall i \neq x$, $e_i = 2 \cdot a_i$, and sends the vector $\vec{\alpha} = (\alpha_i)_{i=1}^{k} = (u \cdot e_i \bmod p)_{i=1}^{k}$ to the cloud.
• $\text{PIRResponse}(\vec{\alpha}, M) \to \vec{\beta}$: On receiving $\vec{\alpha}$, the server computes the matrix product $\vec{\beta} = (\beta_1, \beta_2, \ldots, \beta_l) = \vec{\alpha} \cdot M$.
• $\text{PIRAnalysis}(\vec{\beta}, y) \to b$: After receiving the server's response $\vec{\beta} = (\beta_1, \beta_2, \ldots, \beta_l)$, the user computes $\gamma_y = \beta_y \cdot u^{-1} \bmod p$, and retrieves $b$ by computing $\gamma_y \bmod 2$.
4.1.2 Cuckoo Hashing
Cuckoo hashing was first proposed by (Pagh and
Rodler, 2004) to build efficient and practical data in-
dexes. It ensures worst-case constant look-up and
deletion time and amortized constant insertion time
while minimizing the storage requirements.
In order to store $n$ elements in some index $I$, Cuckoo hashing uses two hash tables $T$ and $T'$ containing $L$ entries each, and two hash functions $H: \{0,1\}^* \to \{1, 2, \ldots, L\}$ and $H': \{0,1\}^* \to \{1, 2, \ldots, L\}$. Now, an element $\tau_i$ is either stored in entry $H(\tau_i)$ in hash table $T$, or in entry $H'(\tau_i)$ in hash table $T'$, but never in both.
The lookup operation in $I$ is therefore simple: when given an element $\tau_i \in \{0,1\}^*$, the two entries at positions $H(\tau_i)$ and $H'(\tau_i)$ are queried in tables $T$ and $T'$ respectively. To delete an element $\tau_i$ from $I$, the entry corresponding to $\tau_i$ is removed. Finally, to insert a new element $\tau_i \in \{0,1\}^*$ into $I$, we first check whether the entry of $T$ at position $H(\tau_i)$ is empty. If it is the case, then $\tau_i$ is inserted in this entry of $T$ and the insertion algorithm converges. Otherwise, if that entry is already occupied by another element $\tau_j$, then $\tau_j$ will be removed from its current entry in $T$, which $\tau_i$ takes over, and relocated to its other possible entry $H'(\tau_j)$ in $T'$. Now, if there is an element $\tau_k$ in the entry $H'(\tau_j)$ of $T'$, then $\tau_j$ will be inserted in entry $H'(\tau_j)$ in table $T'$ while $\tau_k$ will be moved to its other possible entry $H(\tau_k)$ in $T$. This insertion process is repeated iteratively until all elements are inserted in either $T$ or $T'$. If this process of insertion does not converge (i.e., there is an element that cannot be inserted), or it takes too long to converge, then all the elements in $I$ will be rehashed with new hash functions $H$ and $H'$.
An analysis of Cuckoo hashing (Pagh, 2001) shows that if $L \geq n$, then there is a family of universal hash functions that guarantees a small rehashing probability of order $O(\frac{1}{n})$ and a constant expected time for insertion. For a more comprehensive analysis of the performance of Cuckoo hashing, the reader may refer to (Pagh and Rodler, 2004).
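The insertion, lookup and rehashing steps described above can be sketched as follows. This is a minimal illustration under our own assumptions: the class name CuckooIndex is hypothetical, elements are byte strings, the two hash functions are instantiated with salted SHA-256 (a stand-in for the universal hash family of the analysis), positions are 0-indexed, and "takes too long to converge" is modelled by a fixed bound max_kicks on the number of relocations.

```python
import hashlib


class CuckooIndex:
    """Two tables of `size` entries each; every element lives in exactly one."""

    def __init__(self, size, max_kicks=64):
        self.size = size
        self.max_kicks = max_kicks
        self.seed = 0                 # changing the seed yields new H and H'
        self.t1 = [None] * size       # table T
        self.t2 = [None] * size       # table T'

    def _h(self, which, item):
        # which = 1 plays the role of H, which = 2 the role of H'.
        data = f"{self.seed}:{which}:".encode() + item
        return int.from_bytes(hashlib.sha256(data).digest(), "big") % self.size

    def lookup(self, item):
        # Worst-case constant time: only two entries are inspected.
        return (self.t1[self._h(1, item)] == item
                or self.t2[self._h(2, item)] == item)

    def insert(self, item):
        if self.lookup(item):
            return
        cur = item
        for _ in range(self.max_kicks):
            pos = self._h(1, cur)
            cur, self.t1[pos] = self.t1[pos], cur   # place in T, evict occupant
            if cur is None:
                return
            pos = self._h(2, cur)
            cur, self.t2[pos] = self.t2[pos], cur   # evicted one moves to T'
            if cur is None:
                return
        self._rehash(cur)                           # did not converge

    def _rehash(self, pending):
        items = [e for e in self.t1 + self.t2 if e is not None] + [pending]
        self.seed += 1
        self.t1 = [None] * self.size
        self.t2 = [None] * self.size
        for e in items:
            self.insert(e)
```

For example, an index created with CuckooIndex(size=128) comfortably holds a few dozen elements; choosing the table size well above the number of elements keeps rehashing rare.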
4.2 Protocol Description
We recall that in this first version, the data owner $O$ wants to upload a large file $F$ to cloud server $S$ and, once its data is uploaded, $O$ wants to further search for some words within the file without revealing any information to the semi-honest cloud server. The set of all distinct words within $F$ is defined as $L_\omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$. The proposed protocol can be divided into two main phases:
• During the upload phase, before outsourcing its data, $O$ builds the index corresponding to the $n$ distinct words present in file $F$ and encrypts $F$ using a semantically secure symmetric encryption.
• During the search phase, $O$ computes the position of the requested word $\omega$ in $F$'s index and performs a PIR query to retrieve the information stored at that position in the index. Upon reception of server $S$'s PIR response, $O$ verifies this response and decides accordingly whether $\omega$ is present in $F$ or not.
4.2.1 Setup
The data owner O calls the Setup algorithm which
takes as input the security parameter ζ and outputs a
master key MK and a set of public parameters P such
that:
• The master key $MK$ is composed of a symmetric encryption key $K_{enc}$ and a MAC key $K_{mac}$.
• The public parameters $P$ comprise a MAC $H_{mac}: \{0,1\}^{\zeta} \times \{0,1\}^{*} \to \{0,1\}^{\kappa}$ and a cryptographic hash function $\mathcal{H}: \{0,1\}^{*} \to \{0,1\}^{t}$.
4.2.2 Upload
The file upload phase consists of i.) Encrypting the
file F using a semantically secure encryption such as
AES in counter mode (cf. Encrypt) and ii.) building
a searchable index for L
ω
(cf. BuildIndex).
The data owner O first generates a unique file
identifier fi d for file F and then encrypts F by call-
ing the algorithm Encrypt. This algorithm takes
as inputs secret key K
enc
and file F and outputs
a semantically secure encryption C = Enc(K
enc
,F)
of F. Next, O invokes the algorithm BuildIndex
which on input of master key MK (more precisely
MAC key K
mac
), file identifier fid and the list of dis-
tinct words L
ω
= {ω
1
,ω
2
,..., ω
n
} present in F out-
puts a list of MACs L
H
= {h
1
,h
2
...,h
n
}, such that
h
i
= H
mac
(K
mac
,ω
i
||fid) where || denotes concatena-
tion. Then the algorithm BuildIndex constructs an
index I for L
H
= {h
1
,h
2
...,h
n
} using Cuckoo hash-
ing. In order to optimize the performance of the
PIR underlying our word-search scheme, our index
will differ from traditional Cuckoo hashing indexes
by comprising two sets of t binary (rectangular) ma-
trices {M
j
}
t
j=1
,{M
j
}
t
j=1
of size (k,l) rather than
two hash-tables T and T
. Namely, instead of us-
ing two hash functions that hash into {1, 2,...,L}, we
employ two hash functions H and H
that hash into
{1,2,..., k}×{1,2,..., l}. For an element h {0,1}
,
the hash function H (H
resp.) returns a position (x, y)
((x
,y
) resp.) in matrices {M
j
} ({M
j
} resp.). More
precisely, the algorithm BuildIndex executes the fol-
lowing:
• First, BuildIndex generates two sets of $t$ binary matrices $\{M_j\}$ and $\{M'_j\}$ ($1 \leq j \leq t$) of size $(k,l)$ each, where each element is initialized to 0.
• BuildIndex then picks two hashes $\mathcal{H}$ and $\mathcal{H}'$ that map each element $h_i$ in $L_H$ to either a position $(x_i, y_i) = \mathcal{H}(h_i)$ in matrices $\{M_j\}$ or to a position $(x'_i, y'_i) = \mathcal{H}'(h_i)$ in matrices $\{M'_j\}$, by following the Cuckoo hashing algorithm described in Section 4.1.2. We recall that in order to ensure worst-case constant look-up using Cuckoo hashing, $k$ and $l$ have to be chosen such that $kl \geq n$, where $n$ is the size of $L_H$.
• BuildIndex subsequently fills the binary matrices $\{M_j\}$ and $\{M'_j\}$ ($1 \leq j \leq t$) as follows:
  - For each $h_i$, BuildIndex computes $H(h_i) = (b_{i,1}, b_{i,2}, \ldots, b_{i,t})$, where $H$ is a $t$-bit cryptographic hash function.
  - Now, if $h_i$ is mapped to a position $(x_i, y_i) = \mathcal{H}(h_i)$ in $M_j$ (or to a position $(x'_i, y'_i) = \mathcal{H}'(h_i)$ in $M'_j$ resp.), then the bit at position $(x_i, y_i)$ in $M_j$ (the bit at position $(x'_i, y'_i)$ in $M'_j$ resp.) will be set to $b_{i,j}$. Hence, if $h_i$ is mapped to a position $(x_i, y_i) = \mathcal{H}(h_i)$ in $\{M_j\}$ ($1 \leq j \leq t$), then, writing $M_j^{(x,y)}$ for the bit at position $(x,y)$ of $M_j$:
$$H(h_i) = \left(M_1^{(x_i,y_i)}, M_2^{(x_i,y_i)}, \ldots, M_t^{(x_i,y_i)}\right)$$
• Finally, BuildIndex outputs the searchable index $I = \{\mathcal{H}, \mathcal{H}', M, M'\}$ such that $M = \{M_1, M_2, \ldots, M_t\}$ and $M' = \{M'_1, M'_2, \ldots, M'_t\}$.
At the end of this phase, data owner O sends the
file identifier fid, the encryption C and the index I to
cloud server S.
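The index construction above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not prescribed by the scheme: HMAC-SHA256 stands in for $H_{mac}$, truncated SHA-256 for the $t$-bit hash $H$, salted SHA-256 for the two Cuckoo hashes, positions are 0-based, and the function names (`mac`, `hash_bits`, `cell`, `build_index`, `lookup`) are hypothetical. The list of MACs is assumed to contain distinct elements.

```python
import hashlib
import hmac

def mac(key: bytes, word: str, fid: str) -> bytes:
    """h = H_mac(K_mac, word || fid); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, (word + fid).encode(), hashlib.sha256).digest()

def hash_bits(h: bytes, t: int) -> list:
    """First t bits of SHA-256: a stand-in for the t-bit hash H (t <= 256)."""
    d = hashlib.sha256(b"bits" + h).digest()
    return [(d[j // 8] >> (7 - j % 8)) & 1 for j in range(t)]

def cell(h: bytes, salt: bytes, k: int, l: int) -> tuple:
    """Stand-in Cuckoo hash: maps h to a 0-based position (x, y) of a k x l matrix."""
    v = int.from_bytes(hashlib.sha256(salt + h).digest(), "big")
    return divmod(v % (k * l), l)

def _insert(h, tables, salts, k, l, max_kicks):
    """Standard Cuckoo insertion: evict the occupant and move it to the other side."""
    side = 0
    for _ in range(max_kicks):
        pos = cell(h, salts[side], k, l)
        evicted = tables[side].get(pos)
        tables[side][pos] = h
        if evicted is None:
            return True
        h, side = evicted, 1 - side
    return False

def build_index(macs, k, l, t, max_kicks=200, max_rehash=50):
    """Return (salts, M, M2): the two hash salts and two lists of t binary matrices."""
    for attempt in range(max_rehash):
        salts = (b"A%d" % attempt, b"B%d" % attempt)
        tables = ({}, {})  # position -> element, one dict per side
        if all(_insert(h, tables, salts, k, l, max_kicks) for h in macs):
            mats = tuple([[[0] * l for _ in range(k)] for _ in range(t)]
                         for _ in range(2))
            for side in (0, 1):
                for (x, y), h in tables[side].items():
                    for j, bit in enumerate(hash_bits(h, t)):
                        mats[side][j][x][y] = bit  # bit j of H(h) goes into matrix j
            return salts, mats[0], mats[1]
    raise RuntimeError("Cuckoo hashing failed; increase k * l")

def lookup(h, salts, M, M2, k, l, t):
    """Non-private membership check, used here only to sanity-check the index."""
    bits = hash_bits(h, t)
    x, y = cell(h, salts[0], k, l)
    x2, y2 = cell(h, salts[1], k, l)
    return (bits == [M[j][x][y] for j in range(t)]
            or bits == [M2[j][x2][y2] for j in range(t)])
```

In the protocol itself the look-up is of course not performed in the clear as in `lookup`; the rows are fetched through PIR, as described in the next section.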
4.2.3 Word Search
The search phase is divided into the three following
steps:
Search Query. To look for a word $\omega$ in file $F$, $O$ calls the algorithm Token which computes the MAC $h = H_{mac}(K_{mac}, \omega \| \mathit{fid})$. Further, $O$ runs the algorithm Query which computes $\mathcal{H}(h) = (x, y)$ and $\mathcal{H}'(h) = (x', y')$. We recall that $(x, y)$ and $(x', y')$ correspond to the potential positions of $h$ in $\{M_j\}$ and $\{M'_j\}$ respectively. Next, algorithm Query outputs two PIR queries $\vec{\alpha} = \mathrm{PIRQuery}(x) = (\alpha_1, \alpha_2, \ldots, \alpha_k)$ and $\vec{\alpha}' = \mathrm{PIRQuery}(x') = (\alpha'_1, \alpha'_2, \ldots, \alpha'_k)$ that will allow $O$ to retrieve the $x$-th and $x'$-th rows respectively of $(k,l)$ binary matrices, as depicted in Section 4.1.1. Finally, $O$ sends its search query $Q = (\vec{\alpha}, \vec{\alpha}')$ to server $S$.
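Search Response. On receiving $O$'s search query $Q = (\vec{\alpha}, \vec{\alpha}')$, $S$ runs algorithm Response which, on input of $Q$, $M = \{M_1, M_2, \ldots, M_t\}$ and $M' = \{M'_1, M'_2, \ldots, M'_t\}$, computes two sets of $t$ PIR responses $R = \{\vec{\beta}_1, \vec{\beta}_2, \ldots, \vec{\beta}_t\}$ and $R' = \{\vec{\beta}'_1, \vec{\beta}'_2, \ldots, \vec{\beta}'_t\}$ such that for all $1 \leq j \leq t$:
$$\vec{\beta}_j = \mathrm{PIRResponse}(\vec{\alpha}, M_j) = \vec{\alpha} \cdot M_j$$
$$\vec{\beta}'_j = \mathrm{PIRResponse}(\vec{\alpha}', M'_j) = \vec{\alpha}' \cdot M'_j$$
$S$ then sends its word search response $\mathcal{R} = \{R, R'\}$ to $O$.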
Verification. To verify whether $\omega$ is in file $F$, the data owner $O$ runs the algorithm Verify. When called, algorithm Verify unblinds the $y$-th element of each vector $\vec{\beta}_j$ by executing $\mathrm{PIRAnalysis}(y)$ and the $y'$-th element of each vector $\vec{\beta}'_j$ by running $\mathrm{PIRAnalysis}(y')$, as was shown in Section 4.1.1. This allows Verify to derive a bit $b_j$ from $\vec{\beta}_j$ and a bit $b'_j$ from $\vec{\beta}'_j$ respectively for all $1 \leq j \leq t$.
We denote by $\vec{b}$ and $\vec{b}'$ the strings of bits $(b_1, b_2, \ldots, b_t)$ and $(b'_1, b'_2, \ldots, b'_t)$ respectively. After obtaining $\vec{b}$ and $\vec{b}'$, algorithm Verify computes the hash $H(h)$ and checks whether $\vec{b} = H(h)$ or $\vec{b}' = H(h)$. If so, then Verify outputs 1, meaning that $\omega \in F$; otherwise, Verify outputs 0.
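The final comparison performed by Verify can be sketched as follows. The sketch assumes that the PIR responses have already been decoded (the PIRAnalysis step is not modelled), so that `rows[j]` holds the retrieved row of $M_j$ and `rows2[j]` the retrieved row of $M'_j$ as plain bit lists. Truncated SHA-256 is an assumed stand-in for the $t$-bit hash $H$, column indices are 0-based, and the function names are hypothetical.

```python
import hashlib

def hash_bits(h: bytes, t: int) -> list:
    """First t bits of SHA-256(h): an assumed stand-in for the t-bit hash H."""
    d = hashlib.sha256(h).digest()
    return [(d[j // 8] >> (7 - j % 8)) & 1 for j in range(t)]

def verify(h: bytes, rows, rows2, y: int, y2: int, t: int) -> int:
    """Return 1 if the bits read at column y (or y2) spell out H(h), else 0."""
    b = [rows[j][y] for j in range(t)]      # one bit per matrix M_j
    b2 = [rows2[j][y2] for j in range(t)]   # one bit per matrix M'_j
    target = hash_bits(h, t)
    return 1 if (b == target or b2 == target) else 0
```

A match in either of the two candidate positions is sufficient, which mirrors the fact that Cuckoo hashing stores each element in exactly one of its two possible locations.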
5 PRIVACY PRESERVING WORD
SEARCH WITH DELEGATION
In this section we describe the entire solution including the delegation capabilities. We recall that data owner $O$ wants to: (i) upload a large file $F$ that contains $n$ distinct words $L_\omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ to cloud server $S$, (ii) delegate the search capabilities on file $F$ to third party users and finally (iii) be able to revoke these third party users at any point in time. Therefore the final solution involves, in addition to the two previously mentioned phases of the basic protocol (i.e. Upload and WdSearch), a Delegation and a Revocation phase. We modify the Upload and Word Search phases so as to allow the data owner to upload the necessary material that will enable authorized users to perform search operations, whereas during the newly defined Delegation phase, the data owner provides authorized users with the MAC key used to build the index. Finally, the Revocation phase is defined in order to grant the data owner the capability to revoke authorized users efficiently.
The two additional phases are defined thanks to the use of Ciphertext-Policy Attribute-Based Encryption (CP-ABE) and Oblivious Pseudo-Random Functions (OPRF). We stress here that by combining OPRF and ABE, we not only allow for seamless revocation but also ensure the anonymity of authorized users. As opposed to traditional access control mechanisms, the proposed solution does not require authorized users to identify and authenticate themselves to the cloud server.
Before providing a detailed description of our
scheme, we summarize and formalize in the next sec-
tion the algorithms underlying CP-ABE and OPRFs.
SECRYPT2014-InternationalConferenceonSecurityandCryptography
144
5.1 Building Blocks
5.1.1 Ciphertext-policy Attribute-based
Encryption
A ciphertext-policy attribute-based encryption allows
a user to encrypt a message M under some access pol-
icy AP in such a way that only parties possessing at-
tributes that match AP can derive M from the cipher-
text. Actually, a CP-ABE consists of the following
algorithms, cf. (Bethencourt et al., 2007):
• $\mathrm{Setup}_{abe}(\zeta) \to (MK_{abe}, P_{abe})$: It is a randomized algorithm that takes as input a security parameter $\zeta$, and outputs a master key $MK_{abe}$ and a set of public parameters $P_{abe}$ that will be used by subsequent algorithms.
• $\mathrm{Enc}_{abe}(M, AP) \to C$: It is a randomized algorithm that takes as input a message $M$ and some access policy $AP$, and outputs a ciphertext $C = \mathrm{Enc}_{abe}(M, AP)$ such that only users holding the attributes satisfying the access policy $AP$ can decrypt $C$.
• $\mathrm{CredGen}_{abe}(MK_{abe}, A_i) \to cred_i$: It is a randomized algorithm which, on input of master key $MK_{abe}$ and a set of attributes $A_i$, generates a set of credentials $cred_i$ that are associated with $A_i$. This algorithm is generally executed by a trusted third party (for instance a certification authority) whose aim is to define a set of admissible attributes $\mathcal{A}$ and to issue credentials $cred_i$ to any user possessing attributes $A_i \subseteq \mathcal{A}$.
• $\mathrm{Dec}_{abe}(C, cred_i) \to \hat{M}$: It is a deterministic algorithm that takes as input a ciphertext $C$ and a set of credentials $cred_i$. Assume that $C$ encrypts a message $M$ under the access policy $AP$ (i.e., $C = \mathrm{Enc}_{abe}(M, AP)$) and that the credentials $cred_i$ are associated with the set of attributes $A_i$. If the attributes $A_i$ satisfy the access policy $AP$, then $\mathrm{Dec}_{abe}$ decrypts $C$ successfully and outputs $\hat{M} = \mathrm{Dec}_{abe}(C, cred_i) = M$. Otherwise, the decryption fails and $\mathrm{Dec}_{abe}$ outputs $\hat{M} = \perp$.
5.1.2 Oblivious Pseudo-random Functions
An OPRF (Freedman et al., 2005; Jarecki and Liu, 2009) is a two-party protocol that allows a sender $S$ with input $\delta$ and a receiver $R$ with input $h$ to compute jointly the function $f_\delta(h)$ for some pseudo-random function family $f_\delta$, in such a way that receiver $R$ only learns the value $f_\delta(h)$, whereas sender $S$ learns nothing from the protocol interaction.
Definition 4 (Oblivious Pseudo-Random Function (Freedman et al., 2005)). A two-party protocol $\pi$ between a sender $S$ with input $\delta$ and a receiver $R$ with input $h$ is said to be an oblivious pseudo-random function (OPRF) if there is some pseudo-random function family $f_\delta$ such that at the end of the execution of $\pi$:
• Receiver $R$ gets $f_\delta(h)$ while learning nothing about $S$'s input $\delta$.
• Sender $S$ learns nothing about $R$'s input $h$ or the value of $f_\delta(h)$.
In the following, we provide a quick overview of the generic algorithms underpinning an OPRF that evaluates the output of some pseudo-random function family $f_\delta$:
• $\mathrm{Setup}_{oprf}(\zeta) \to (\delta, P_{oprf})$: It is a randomized algorithm that is run by the sender $S$. It takes as input the security parameter $\zeta$ and outputs an OPRF secret key $\delta$ and a set of public parameters $P_{oprf}$ that will be used by subsequent algorithms.
• $\mathrm{Query}_{oprf}(h) \to Q_{oprf}$: It is a randomized algorithm that is executed by the receiver $R$ whenever $R$ wants to generate an OPRF query. This algorithm has as input an element $h \in \{0,1\}^{\kappa}$ and outputs a matching OPRF query $Q_{oprf}$ that will be sent later to sender $S$.
• $\mathrm{Response}_{oprf}(Q_{oprf}, \delta) \to R_{oprf}$: It is a randomized algorithm which is operated by sender $S$ whenever $S$ receives an OPRF query. On input of an OPRF query $Q_{oprf}$ and the secret key $\delta$, the algorithm $\mathrm{Response}_{oprf}$ returns the corresponding OPRF response $R_{oprf}$ that will be forwarded to the receiver.
• $\mathrm{Result}_{oprf}(R_{oprf}, St_r) \to f_\delta(h)$: It is a deterministic algorithm that is run by receiver $R$ and takes as input an OPRF response $R_{oprf}$ and the current state $St_r$ of $R$. Without loss of generality, we assume that $R$ received the response $R_{oprf}$ as a follow-up to a previous OPRF query that was generated for $h \in \{0,1\}^{\kappa}$. Accordingly, the algorithm $\mathrm{Result}_{oprf}$ outputs $f_\delta(h)$, i.e. the evaluation of the pseudo-random function $f_\delta$ at point $h$.
In the remainder of this paper, we employ the OPRF proposed by (Jarecki and Liu, 2009) which allows a receiver $R$ and a sender $S$ to compute jointly the evaluation of the pseudo-random function $f_\delta(h) = g^{1/(\delta+h)}$ for any $h \in \mathbb{Z}_N$, where $N$ is an RSA safe modulus and $g$ is a random generator of a group $G$ of order $N$. However, for ease of exposition, we will omit the implementation details of this OPRF and we will only refer to the generic OPRF algorithms when describing our scheme.
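To make the function $f_\delta(h) = g^{1/(\delta+h)}$ concrete, the following sketch evaluates it directly, i.e. without the oblivious two-party protocol of (Jarecki and Liu, 2009), which is not modelled here. It rests on several simplifying assumptions: a toy prime-order subgroup (safe prime $p = 2039$, subgroup order $q = 1019$, generator $g = 4$) replaces the composite-order group of the actual construction, the exponent $1/(\delta+h)$ is computed as a modular inverse modulo $q$, and the name `prf` is hypothetical. The parameters are far too small for any real use.

```python
P = 2039  # toy safe prime, P = 2 * Q + 1 (assumption, for illustration only)
Q = 1019  # prime order of the subgroup of squares modulo P
G = 4     # generator of that subgroup (4 = 2^2 is a square different from 1)

def prf(delta: int, h: int) -> int:
    """Evaluate f_delta(h) = G^(1/(delta + h)) in the toy group."""
    e = (delta + h) % Q
    if e == 0:
        raise ValueError("delta + h is not invertible modulo the group order")
    # pow(e, -1, Q) is the modular inverse of e (Python 3.8+).
    return pow(G, pow(e, -1, Q), P)
```

A simple consistency check is that raising the output to the power $\delta + h$ recovers the generator, since the two exponents cancel modulo the group order.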
5.2 Protocol Description
In the remainder of this paper and in accordance with the work of (Curtmola et al., 2006), we assume that the cloud server does not collude with revoked users. We note that if such a collusion happens, then our protocol will not be able to deter revoked users from searching the outsourced files.
Without loss of generality, we also assume that there is some certification authority which is in charge of: (i) defining the universe of admissible attributes $\mathcal{A} = \{att_1, att_2, \ldots\}$, (ii) providing potential data owners and potential authorized users with their credentials $cred_i$ that match their attributes $A_i \subseteq \mathcal{A}$, following for instance the CP-ABE scheme proposed by (Bethencourt et al., 2007).
5.2.1 Setup
As in the first version of the protocol, the data owner $O$ calls the Setup algorithm which takes as input the security parameter $\zeta$ and outputs a master key MK and a set of public parameters $P$ such that:
• The master key MK is composed of a symmetric encryption key $K_{enc}$, a MAC key $K_{mac}$ and an OPRF secret key $\delta$.
• The new public parameters $P$ comprise a MAC $H_{mac} : \{0,1\}^{\zeta} \times \{0,1\}^* \to \mathbb{Z}_N$ (where $N$ is a safe RSA modulus), a cryptographic hash function $H : \{0,1\}^* \to \{0,1\}^t$ and the public parameters $P_{oprf}$ of the OPRF $f_\delta(h) = g^{1/(\delta+h)}$.
5.2.2 Upload
The file upload phase amounts to (i) encrypting the file $F$ using AES encryption (cf. Encrypt) and (ii) building a searchable index for $L_\omega$ (cf. BuildIndex). Now, instead of building the index $I$ based on $L_H = \{h_1, h_2, \ldots, h_n\}$ as was done previously, the index is constructed using the OPRF values $f_\delta(h_i) = g^{1/(\delta+h_i)}$. Since the computation of the OPRF is deemed to be demanding, we suggest that BuildIndex be executed jointly by $O$ and the semi-honest cloud server $S$ in such a way that $O$ is only required to compute symmetric operations (e.g. hash functions and AES encryption), whereas the cloud server performs the more computationally intensive operations (i.e. OPRF and Cuckoo hashing). Henceforth, we denote by $\mathrm{BuildIndex}_O$ the sub-algorithm of BuildIndex that is executed by data owner $O$ and by $\mathrm{BuildIndex}_S$ the sub-algorithm of BuildIndex that is operated by cloud server $S$.
Processing at the Data Owner. As in the previous protocol, data owner $O$ first generates a unique file identifier fid for file $F$ and then encrypts $F$ by calling the algorithm Encrypt which outputs an AES encryption $C = \mathrm{Enc}(K_{enc}, F)$ of $F$. Then, $O$ invokes the algorithm $\mathrm{BuildIndex}_O$ which outputs a list of MACs $L_H = \{h_1, h_2, \ldots, h_n\}$ such that $h_i = H_{mac}(K_{mac}, \omega_i \| \mathit{fid})$. Next, $O$ defines the access policy $AP$ that will be associated with file $F$ and finally forwards (via a secure channel) the file identifier fid, the encryption $C$, the list of MACs $L_H = \{h_1, h_2, \ldots, h_n\}$, the access policy $AP$ and the OPRF secret key $\delta$ to cloud server $S$.
Processing at the Cloud. The processing at the cloud comprises two operations. The first one is to compute the OPRF over the MACs in $L_H = \{h_1, h_2, \ldots, h_n\}$ using the secret key $\delta$. The second operation is to build an index with the resulting values using Cuckoo hashing. More precisely, upon receipt of the file identifier fid, the ciphertext $C$, the list of keyed hashes $L_H = \{h_1, h_2, \ldots, h_n\}$, the access policy $AP$ associated with $C$ and the OPRF key $\delta$, $S$ calls the algorithm $\mathrm{BuildIndex}_S$ which proceeds as explained below:
• First, $\mathrm{BuildIndex}_S$ computes $\tau_i = f_\delta(h_i) = g^{1/(\delta+h_i)}$ for all $1 \leq i \leq n$.
• $\mathrm{BuildIndex}_S$ prepares an index $I$ for $T = \{\tau_1, \tau_2, \ldots, \tau_n\}$ using Cuckoo hashing. Namely, $\mathrm{BuildIndex}_S$ generates two sets of $t$ binary matrices $\{M_j\}$ and $\{M'_j\}$ ($1 \leq j \leq t$) of size $(k,l)$ each, where each element is initialized to 0.
• $\mathrm{BuildIndex}_S$ then selects two hashes $\mathcal{H}$ and $\mathcal{H}'$ that map each element $\tau_i$ in $T$ to either a position $(x_i, y_i) = \mathcal{H}(\tau_i)$ in matrices $\{M_j\}$ or to a position $(x'_i, y'_i) = \mathcal{H}'(\tau_i)$ in matrices $\{M'_j\}$, by executing the Cuckoo hashing algorithm.
• $\mathrm{BuildIndex}_S$ fills the binary matrices $\{M_j\}$ and $\{M'_j\}$ ($1 \leq j \leq t$) similarly to the previous version of the protocol. The only difference is that instead of storing the hashes $H(h_i)$ in $\{M_j\}$ and $\{M'_j\}$, we store the hashes $H(\tau_i)$.
• Finally, $\mathrm{BuildIndex}_S$ outputs the searchable index $I = \{\mathcal{H}, \mathcal{H}', M, M'\}$ such that $M = \{M_1, M_2, \ldots, M_t\}$ and $M' = \{M'_1, M'_2, \ldots, M'_t\}$.
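The division of labour between the data owner and the cloud server during upload can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 reduced modulo a toy group order stands in for $H_{mac}$, the OPRF values are computed in the same kind of toy prime-order group as a stand-in for the real construction, and the function names `build_index_owner` and `oprf_values_server` are hypothetical. The subsequent Cuckoo hashing step is unchanged from the basic protocol and is not repeated.

```python
import hashlib
import hmac

P, Q, G = 2039, 1019, 4  # toy group parameters (assumption, not secure)

def build_index_owner(k_mac: bytes, words, fid: str) -> list:
    """Owner side: one MAC per distinct word, mapped into Z_Q (symmetric ops only)."""
    macs = []
    for w in sorted(set(words)):
        digest = hmac.new(k_mac, (w + fid).encode(), hashlib.sha256).digest()
        macs.append(int.from_bytes(digest, "big") % Q)
    return macs

def oprf_values_server(delta: int, macs) -> list:
    """Server side, first step: tau_i = G^(1/(delta + h_i)) for every MAC h_i.

    Raises ValueError if some delta + h_i is not invertible modulo Q.
    """
    return [pow(G, pow((delta + h) % Q, -1, Q), P) for h in macs]
```

The owner only hashes, while every exponentiation is carried out by the server, which is the property the joint execution of BuildIndex is meant to provide.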
5.2.3 Delegation
To delegate the word search capabilities on the encrypted file $F$ to third party users, data owner $O$ encrypts its MAC key $K_{mac}$ under its access policy $AP$ using attribute-based encryption and provides cloud server $S$ with the resulting ciphertext $C_{mac} = \mathrm{Enc}_{abe}(K_{mac}, AP)$. Thereafter, $S$ publishes the ciphertext $C_{mac}$ and the file identifier fid.
We note that an authorized user $U$ will in principle possess a set of attributes $A$ (and therewith a set of credentials $cred$) that satisfy the access policy $AP$. Hence, $U$ will be able to decrypt the ciphertext $C_{mac}$ using $cred$ and derive the MAC key $K_{mac}$. This MAC key $K_{mac}$ will then be used by $U$ to perform word search on $O$'s file, as will be shown in the next section.
5.2.4 Word Search
To search the encrypted file C for some word ω, the
authorized user U performs the following operations:
Token Generation. The token generation phase
consists of executing an OPRF protocol between the
authorized user U and the cloud server S, where U
corresponds to the receiver R and S to the sender S
(following the notations in Section 5.1.2). Conse-
quently, to generate a token τ for word ω, U executes
algorithm Token as follows:
• On input of the word $\omega$, the file identifier fid and the MAC key $K_{mac}$, the algorithm Token first computes $h = H_{mac}(K_{mac}, \omega \| \mathit{fid})$. Then it calls the algorithm $\mathrm{Query}_{oprf}$ which, on input of $h$, outputs an OPRF query $Q_{oprf}$ to evaluate $f_\delta(h) = g^{1/(\delta+h)}$. Next, the algorithm Token forwards the OPRF query $Q_{oprf}$ to cloud server $S$.
• Upon receipt of $Q_{oprf}$, $S$ calls the OPRF algorithm $\mathrm{Response}_{oprf}$. This algorithm uses the secret OPRF key $\delta$ and the OPRF query $Q_{oprf}$ to output an OPRF response $R_{oprf}$.
• Here, instead of sending the OPRF response $R_{oprf}$ in clear to $U$, $S$ obfuscates it in such a way that only an authorized (i.e. non-revoked) user will be able to derive $R_{oprf}$. This obfuscation is performed as follows:
  - $S$ randomly picks a fresh symmetric encryption key $K_{enc}$ and encrypts the OPRF response $R_{oprf}$ using $K_{enc}$ and the semantically secure encryption Enc. This results in a ciphertext $C' = \mathrm{Enc}(K_{enc}, R_{oprf})$.
  - Then it computes a CP attribute-based encryption $C_{enc} = \mathrm{Enc}_{abe}(K_{enc}, AP)$ of the encryption key $K_{enc}$ under the access policy $AP$ of the data owner $O$.
  Notice that in this manner, we make sure that only authorized users will be able to decrypt the OPRF response and therewith obtain the token $\tau = f_\delta(h) = g^{1/(\delta+h)}$ necessary to perform the word search.
  At the end of this step, $S$ forwards the ciphertexts $C'$ and $C_{enc}$ to authorized user $U$.
• On receiving the ciphertexts $C'$ and $C_{enc}$, the algorithm Token first decrypts $C_{enc}$ using the credentials $cred$ that $U$ obtained from the CA and gets $K_{enc} = \mathrm{Dec}_{abe}(C_{enc}, cred)$. Then it computes the OPRF response $R_{oprf}$ by decrypting the ciphertext $C'$ using the secret key $K_{enc}$. Next, the algorithm Token calls the OPRF algorithm $\mathrm{Result}_{oprf}$ which takes as input $R_{oprf}$ and consequently outputs the word search token $\tau = f_\delta(h) = g^{1/(\delta+h)}$.
Search Query. After obtaining the token $\tau$ corresponding to the word $\omega$, $U$ runs the algorithm Query which first computes $\mathcal{H}(\tau) = (x, y)$ and $\mathcal{H}'(\tau) = (x', y')$. Then, as in the previous solution, it computes two PIR queries $(\vec{\alpha}, \vec{\alpha}')$ to retrieve the $x$-th and the $x'$-th row of a $(k,l)$ binary matrix and sends the word search query $Q = (\vec{\alpha}, \vec{\alpha}')$ to cloud server $S$.
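Search Response. On receiving $U$'s search query $Q = (\vec{\alpha}, \vec{\alpha}')$, cloud server $S$ runs algorithm Response which computes the two sets of $t$ PIR responses $R = \{\vec{\beta}_1, \vec{\beta}_2, \ldots, \vec{\beta}_t\}$ and $R' = \{\vec{\beta}'_1, \vec{\beta}'_2, \ldots, \vec{\beta}'_t\}$ such that for all $1 \leq j \leq t$:
$$\vec{\beta}_j = \mathrm{PIRResponse}(\vec{\alpha}, M_j) = \vec{\alpha} \cdot M_j$$
$$\vec{\beta}'_j = \mathrm{PIRResponse}(\vec{\alpha}', M'_j) = \vec{\alpha}' \cdot M'_j$$
$S$ then sends its word search response $\mathcal{R} = \{R, R'\}$ to $U$.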
Verification. To verify whether $\omega$ is in the encrypted file $C$, the authorized user $U$ runs the original algorithm Verify as described in Section 4.2.3. However, after obtaining $\vec{b}$ and $\vec{b}'$, algorithm Verify computes the hash $H(\tau)$ instead of the hash $H(h)$ and checks accordingly whether $\vec{b} = H(\tau)$ or $\vec{b}' = H(\tau)$. If this is the case, then Verify outputs 1, meaning that $\omega \in F$; otherwise, Verify outputs 0.
5.2.5 Revocation
For the sake of simplicity, we assume that the data owner $O$ revokes attributes $att_i \in \mathcal{A}$ instead of individual users $U$. We believe that this assumption is sufficient in the context of our application as described in Section 2, where the data owner delegates the word search capabilities to regulators or auditors that are not identified by their identities but by their attributes.
Now, to revoke an attribute $att_i$, $O$ runs the algorithm Revoke which outputs a new access policy $AP'$ that will be given to the cloud server $S$. For instance, if we assume that the initial access policy $AP$ of $O$ states that auditors from the EU and the US can perform word search on $O$'s files, then a revocation of attribute US will lead to a new access policy $AP'$ that says that only auditors from the EU can perform word search. In this manner, auditors from the US will no longer have access to $O$'s file.
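The EU/US example can be mirrored by a small sketch of the policy update. It models an access policy as a list of conjunctive clauses (a disjunction of attribute sets), which is an assumption made here for illustration; the helper names `satisfies` and `revoke` are hypothetical, and the check is a plain set comparison, whereas in the protocol the policy is enforced cryptographically by CP-ABE.

```python
def satisfies(attributes, policy) -> bool:
    """True if the attribute set covers at least one conjunctive clause of the policy."""
    owned = set(attributes)
    return any(clause <= owned for clause in policy)

def revoke(policy, attribute):
    """Return the new policy AP' in which every clause mentioning `attribute` is dropped."""
    return [clause for clause in policy if attribute not in clause]

# Initial policy AP: auditors from the EU or from the US may search.
ap = [frozenset({"auditor", "EU"}), frozenset({"auditor", "US"})]
# Revoking attribute "US" yields AP': only auditors from the EU may search.
ap_new = revoke(ap, "US")
```

Only the policy held by the cloud server changes; nothing stored for the file itself is touched by this operation.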
6 PRIVACY ANALYSIS
In this section, we briefly analyze the privacy prop-
erties of the proposed scheme. The interested reader
may refer to the full version of this paper (Elkhiyaoui
et al., 2014) for a more formal analysis.
6.1 Storage Privacy
Our scheme ensures storage privacy thanks to the use of semantically secure encryption and message authentication codes during the upload phase. The semantically secure encryption assures that cloud server $S$ cannot derive any information about the file $F$ from its encryption $C$. In addition, by computing MACs that depend not only on the words present in the file but also on its unique identifier, we ensure that the index $I$ does not leak any information about the outsourced file. Notably, cloud server $S$ cannot tell whether two outsourced files have words in common or not, based on their indexes.
6.2 Query Privacy
Query privacy is assured by the use of both OPRF and PIR. On the one hand, OPRF allows authorized user $U$ to generate a word search token $\tau$ without disclosing anything to cloud server $S$ about the word $\omega$ that $U$ is interested in. On the other hand, PIR enables $U$ to perform word search on $S$'s database while making sure that $S$ learns nothing about the word search queries or their corresponding results.
6.3 Privacy Against Revoked Users
Since in this paper we only focus on the case where data owner $O$ revokes attributes instead of individual users, it follows that using for instance the CP-ABE scheme proposed by (Bethencourt et al., 2007) suffices to ensure efficient revocation. As shown in the previous section, revocation is achieved by updating the access policy associated with file $F$ and by exploiting the properties of OPRF: obfuscating $S$'s responses during the token generation phase (cf. Section 5.2) stops a revoked user from deriving new word search tokens and consequently from verifying $S$'s responses.
Note also that even if revoked users gain access to the cloud server's database, they cannot decrypt the content of the outsourced files as they do not have access to the encryption key $K_{enc}$. All they can achieve is performing a dictionary attack on the index $I$ using the MAC key $K_{mac}$ and the OPRF secret key $\delta$, which can be computationally intensive.
7 PERFORMANCE EVALUATION
During the upload phase, the data owner is only required to encrypt the file to be outsourced using a symmetric encryption and to compute a MAC $h_i$ for each word $\omega_i \in L_\omega$. On the other hand, the cloud server computes the OPRFs (i.e. tokens) $\tau_i = f_\delta(h_i)$ and builds the corresponding index $I$ by following the algorithm of Cuckoo hashing. Although the computation of the OPRF proposed in (Jarecki and Liu, 2009) may be deemed computationally demanding as it calls for exponentiations, it can be efficiently parallelized at the cloud server. Actually, if the cloud server possesses $N$ machines for instance, it can provide each one of its machines with a $\frac{1}{N}$ fraction of the list of MACs $L_H = \{h_1, h_2, \ldots, h_n\}$ supplied by the data owner. Each machine will consequently compute $\frac{n}{N}$ exponentiations whose results will be given back to the cloud server to construct the index $I$.
While some would argue that using PIR to compute the responses of the cloud server to word search queries is computationally intensive, we note that this computation consists of matrix multiplications which can easily be parallelized. Actually, the cloud server can store at each one of its machines a $\frac{1}{N}$ fraction of the binary matrices $\{M_j\}$ and $\{M'_j\}$. Upon receipt of a word search query, $S$ forwards the PIR queries it receives to its $N$ machines which accordingly compute the corresponding PIR responses.
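One way to picture this parallelization is to split each matrix by columns, let every machine multiply the query vector with its own block, and concatenate the partial results. The sketch below makes several assumptions: the column-wise partition is only one possible reading of the "$\frac{1}{N}$ fraction" above, a local thread pool stands in for the $N$ machines, the product is computed over plain integers (the modular arithmetic of the underlying PIR scheme is not modelled), and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def vec_mat(alpha, M):
    """Row vector times matrix: beta[c] = sum over r of alpha[r] * M[r][c]."""
    return [sum(a * row[c] for a, row in zip(alpha, M)) for c in range(len(M[0]))]

def split_columns(M, n_machines):
    """Cut M into at most n_machines blocks of consecutive columns."""
    l = len(M[0])
    size = max(1, -(-l // n_machines))  # ceil(l / n_machines)
    return [[row[c:c + size] for row in M] for c in range(0, l, size)]

def parallel_response(alpha, M, n_machines):
    """Compute alpha * M block by block and concatenate the partial responses."""
    blocks = split_columns(M, n_machines)
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        parts = list(pool.map(vec_mat, [alpha] * len(blocks), blocks))
    return [value for part in parts for value in part]
```

Since each output coordinate depends on a single column of the matrix, the blocks can be processed independently and no coordination between machines is needed beyond the final concatenation.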
Furthermore, we emphasize that in this paper we employ PIR to retrieve a hash of word search tokens instead of their actual values. This fact drastically enhances the computation and the communication performances of our scheme. For example, if we instantiate the OPRF in the token generation phase with the OPRF presented in (Jarecki and Liu, 2009), then we will end up with tokens of size 1024 bits. This means that if we retrieve the actual values of the token to perform word search, then each search query will consist of retrieving 1024 bits, which is far from being practical. Instead, in our protocol, each search operation consists of fetching a $t$-bit hash ($t$ is typically 80). We note also that setting the size $(k, l)$ of the matrices $\{M_j\}$ and $\{M'_j\}$ to $(\sqrt{tn}, \sqrt{n/t})$ results in a minimal communication cost of $O(\sqrt{tn})$.
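The choice of $(k, l)$ can be motivated by a short calculation. The following is only a sketch under an assumed cost model that is not spelled out in the text: the communication of one PIR instance is counted as the $k$ elements of the query plus $t$ responses of $l$ elements each, element sizes and constant factors are ignored, and the matrices are taken to satisfy $kl = n$.

```latex
\begin{align*}
\mathrm{cost}(k) &= k + t\,l = k + \frac{tn}{k}
  && \text{(using } l = n/k \text{)} \\
\frac{d}{dk}\,\mathrm{cost}(k) &= 1 - \frac{tn}{k^{2}} = 0
  \;\Longrightarrow\; k = \sqrt{tn}, \qquad l = \frac{n}{k} = \sqrt{\frac{n}{t}} \\
\mathrm{cost}\!\left(\sqrt{tn}\right) &= \sqrt{tn} + t\sqrt{\frac{n}{t}} = 2\sqrt{tn} = O\!\left(\sqrt{tn}\right)
\end{align*}
```

Under this model a word search, which runs two such PIR instances, exchanges about $4\sqrt{tn}$ elements, which remains $O(\sqrt{tn})$.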
Finally, we stress that contrary to related work (Curtmola et al., 2006), revocation in our protocol does not require the re-encryption of the outsourced files. Rather, it only calls for an update of the access policy of the data owner at the cloud server.
8 RELATED WORK
As opposed to the proposed solution, most existing word search mechanisms, be they asymmetric (Bellare et al., 2007; Boneh et al., 2004; Waters et al., 2004) or symmetric (Curtmola et al., 2006; Kamara et al., 2012; Song et al., 2000; Golle et al., 2004), seem to guarantee query privacy only partially: in these solutions, although the outsourced data and queries are encrypted, the cloud can discover the response to any encrypted query. Furthermore, very few current solutions (Curtmola et al., 2006; Dong et al., 2008) offer the ability to delegate the search operation; unfortunately, these solutions provide the authorized user with the data encryption key, and therefore revocation of a user requires the re-encryption of the entire outsourced data and the distribution of the new key to the authorized users.
The first solution which transforms an original PIR mechanism into a privacy preserving word-search solution was proposed by Chor et al. in (Chor et al., 1997). Similarly to our solution, in (Chor et al., 1997), the owner of the data constructs an index based on all distinct words in the outsourced file. This index is a hash table that is filled according to the perfect hashing algorithm of (Fredman et al., 1984). Our solution outperforms the solution in (Chor et al., 1997) thanks to the use of Cuckoo hashing instead of perfect hashing. Namely, in the scheme of (Chor et al., 1997), a word search query consists of three PIR queries, whereas in our protocol it is composed of two PIR queries. Additionally, the PIR queries in the case of Cuckoo hashing are independent. This implies that the server can execute the two PIR instances in parallel to respond to the word search query.
Another solution that resembles the proposed solution is PRISM (Blass et al., 2012), where the cloud constructs some binary matrices in which each cell represents one or more words without knowing their content, and the owner sends PIR requests to retrieve the content of one of these cells. Thanks to the use of Cuckoo hashing, our solution outperforms the original PRISM mechanism without lowering the security level. PRISM defines a matrix in which each cell corresponds to one or more words; therefore, two words can turn out to be represented by the same cell. In order to decrease the probability of such collisions, the data owner sends multiple ($q$) queries for the same word. In the newly proposed mechanism, the probability of collisions within the binary matrices is 0 and the data owner and/or the authorized user need to send a single query for each word. Additionally, PRISM does not offer any delegation capability, and a straightforward delegation operation would require the distribution of the data encryption key to authorized users, which can increase privacy risks.
9 CONCLUSION
We introduced a protocol for privacy preserving del-
egated word search in the cloud. This protocol al-
lows a data owner to outsource its encrypted data to a
cloud server, while empowering the data owner with
the capability to delegate word search operations to
third parties. By employing keyed hash functions and
oblivious pseudo-random functions, we ensure that
authorized users only learn whether a given word is
in the outsourced files or not. In addition, we use pri-
vate information retrieval to make sure that the cloud
server cannot infer any information about the out-
sourced files from the execution of the word search
protocol. Furthermore, we combine attribute-based
encryption and oblivious pseudo-random functions to
accommodate efficient revocation. Finally, the data
owner in our protocol is only required to perform
symmetric operations, whereas the computationally
intensive operations are performed by the cloud
server and can easily be parallelized.
ACKNOWLEDGEMENT
This work was partially funded by the Cloud Ac-
countability project - A4Cloud (grant EC 317550).
REFERENCES
Bellare, M., Boldyreva, A., and O’Neill, A. (2007). Deterministic and efficiently searchable encryption. In Proceedings of the 27th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO’07), pages 535–552.
Bethencourt, J., Sahai, A., and Waters, B. (2007).
Ciphertext-policy attribute-based encryption. In Secu-
rity and Privacy, 2007. SP ’07. IEEE Symposium on,
pages 321–334.
Blass, E.-O., di Pietro, R., Molva, R., and Önen, M. (2012). PRISM - Privacy-Preserving Search in MapReduce. In Proceedings of the 12th Privacy Enhancing Technologies Symposium (PETS 2012). LNCS.
Boneh, D., Crescenzo, G. G., Ostrovsky, R., and Persiano, G. (2004). Public key encryption with keyword search. In Proceedings of Eurocrypt 2004, volume 3027, pages 506–522. LNCS.
Chor, B., Gilboa, N., and Naor, M. (1997). Private informa-
tion retrieval by keywords.
Curtmola, R., Garay, J., Kamara, S., and Ostrovsky, R.
(2006). Searchable symmetric encryption: improved
definitions and efficient constructions. In Proceedings
of the 13th ACM conference on Computer and com-
munications security, CCS ’06, pages 79–88. ACM.
Dong, C., Russello, G., and Dulay, N. (2008). Shared and searchable encrypted data for untrusted servers. In Proceedings of the 22nd annual IFIP WG 11.3 working conference on Data and Applications Security, pages 127–143, Berlin, Heidelberg. Springer-Verlag.
Elkhiyaoui, K., Önen, M., and Molva, R. (2014). Privacy Preserving Delegated Word Search in the Cloud.
Fredman, M. L., Komlós, J., and Szemerédi, E. (1984). Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM, 31(3):538–544.
Freedman, M., Ishai, Y., Pinkas, B., and Reingold, O.
(2005). Keyword search and oblivious pseudorandom
functions. In Proceedings of the Second international
conference on Theory of Cryptography, TCC’05,
pages 303–324, Berlin, Heidelberg. Springer-Verlag.
Golle, P., Staddon, J., and Waters, B. (2004). Secure
conjunctive keyword search over encrypted data. In
Jakobsson, M., Yung, M., and Zhou, J., editors, Proc.
of the 2004 Applied Cryptography and Network Secu-
rity Conference, pages 31–45. LNCS 3089.
Jarecki, S. and Liu, X. (2009). Efficient Oblivious Pseudo-
random Function with Applications to Adaptive OT
and Secure Computation of Set Intersection. In The-
ory of Cryptography, volume 5444 of Lecture Notes
in Computer Science, pages 577–594. Springer Berlin
Heidelberg.
Kamara, S., Papamanthou, C., and Roeder, T. (2012). Dy-
namic searchable symmetric encryption. In Proceed-
ings of the 2012 ACM conference on Computer and
communications security, CCS ’12, pages 965–976,
New York, NY, USA. ACM.
Pagh, R. (2001). On the cell probe complexity of member-
ship and perfect hashing. In Proceedings of the thirty-
third annual ACM symposium on Theory of comput-
ing, STOC ’01, pages 425–432, New York, NY, USA.
ACM.
Pagh, R. and Rodler, F. (2004). Cuckoo hashing. Journal of
Algorithms, 51(2):122–144.
Song, D. X., Wagner, D., and Perrig, A. (2000). Prac-
tical techniques for searches on encrypted data. In
Proceedings of the 2000 IEEE Symposium on Secu-
rity and Privacy, SP ’00, pages 44–, Washington, DC,
USA. IEEE Computer Society.
Trostle, J. and Parrish, A. (2010). Efficient Computation-
ally Private Information Retrieval from Anonymity
or Trapdoor Groups. In Proceedings of Conference
on Information Security, pages 114–128, Boca Raton,
USA.
Waters, B. R., Balfanz, D., Durfee, G., and Smetters, D. K.
(2004). Building an encrypted and searchable audit
log. In Proceedings of NDSS’04.
SECRYPT2014-InternationalConferenceonSecurityandCryptography
150