Framework for Searchable Encryption with SQL Databases
Monir Azraoui, Melek
¨
Onen and Refik Molva
EURECOM, Sophia-Antipolis, France
Keywords:
Searchable Encryption, SQL Database, Data Confidentiality.
Abstract:
In recent years, the increasing popularity of outsourcing data to third-party cloud servers sparked a major
concern towards data breaches. A standard measure to thwart this problem and to ensure data confidentiality
is data encryption. Nevertheless, organizations that use traditional encryption techniques face the challenge of
how to enable untrusted cloud servers perform search operations while the actually outsourced data remains
confidential. Searchable encryption is a powerful tool that attempts to solve the challenge of querying data
outsourced at untrusted servers while preserving data confidentiality. Whereas the literature mainly considers
searching over an unstructured collection of files, this paper explores methods to execute SQL queries over
encrypted databases. We provide a complete framework that supports private search queries over encrypted
SQL databases, in particular for PostgreSQL and MySQL databases. We extend the solution for searchable
encryption designed by Curtmola et al., to the case of SQL databases. We also provide features for evaluating
range and boolean queries. We finally propose a framework for implementing our construction, validating its
practicality.
1 INTRODUCTION
Outsourcing data storage and processing to third-
party servers, such as cloud servers, has become a
common procedure. However, when it comes to sto-
ring personal or business confidential data, the usage
of cloud services is currently questioned as as po-
tentially malicious cloud servers may try to learn in-
formation from the data they store and the queries
they process or even leak them to some unauthori-
zed parties. Encryption is a standard approach to en-
sure the confidentiality of data outsourced at honest-
but-curious cloud servers. However, “traditional” en-
cryption schemes deprive the users of functionalities
over the data, such as searching. In this case, to se-
arch words in encrypted outsourced data, the users
are required to retrieve the entire dataset from the
cloud, decrypt it and operate the search locally. This
naive approach poses serious performance concerns
which cancel out the benefit of outsourcing. There-
fore, cloud servers should be able to perform the se-
arch operation in outsourced databases, while the ac-
tual databases are encrypted.
The problem of searching over encrypted data has
received much interest from both academia (Song
et al., 2000; Hacig
¨
um
¨
us¸ et al., 2002; Agrawal et al.,
2004; Curtmola et al., 2006; Chase and Kamara,
2010; Popa et al., 2012; Cash et al., 2013; Kamara and
Papamanthou, 2013) and industry (Bitglass
1
, Cipher-
Cloud
2
or Skyhigh
3
). Research on this topic mainly
focuses on the scenario of a user who outsources an
encrypted collection of documents (such as e-mails,
medical records, etc.) and would like to further se-
arch keywords over this encrypted dataset. While this
theoretical setting is valid, in practice, many organi-
zations such as governments, hospitals or companies
store data in relational databases which structure data
into tables according to a set of attributes. The po-
pular SQL language (Chamberlin and Boyce, 1974)
enables users to store, query and update their data in
a user-friendly manner. SQL databases ensure fast re-
cord search and retrieval provided that the SQL ser-
ver is able to read data content. As mentioned before,
encryption typically impedes the server from reading
data which makes search over encrypted databases
challenging. In particular, a cryptographic data pro-
tection mechanism for searching over encrypted data
stored in a SQL database should allow the server to ef-
ficiently process search queries without having access
to the plaintext data. Besides, querying an encrypted
SQL database should be user-friendly: protected que-
1
https://www.bitglass.com/ [Accessed Oct. 09, 2017].
2
https://ciphercloud.com/ [Accessed Oct. 09, 2017].
3
https://www.skyhighnetworks.com/ [Accessed Oct. 09,
2017].
Azraoui, M., Önen, M. and Molva, R.
Framework for Searchable Encryption with SQL Databases.
DOI: 10.5220/0006666100570067
In Proceedings of the 8th International Conference on Cloud Computing and Services Science (CLOSER 2018), pages 57-67
ISBN: 978-989-758-295-0
Copyright
c
2019 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
57
ries should adopt the SQL syntax and simulate the se-
arch functionality as if data was not encrypted. A di-
rect application of solutions for searching an encryp-
ted collection of files to SQL databases is not straight-
forward. Indeed, existing solutions (Curtmola et al.,
2006; Chase and Kamara, 2010; Cash et al., 2013;
Kamara and Papamanthou, 2013) build an index of
the keywords. In the case of SQL, data is arranged
into tables, and records are queried based on a condi-
tion over one or more attributes. Therefore, keywords
should preserve this notion of attribute. Furthermore,
SQL allows performing comparisons of data (range
queries) which are not always addressed in existing
work.
To solve these challenges, in (Popa et al., 2012), the
authors proposed CryptDB, a framework that sup-
ports SQL queries over encrypted data. This solu-
tion relies on layers of different property-preserving
encryptions, such as deterministic (DET) and order-
preserving encryption (OPE), applied to a column
of an SQL table. To query the encrypted database,
CryptDB converts the plaintext SQL query into its
encrypted equivalent and decrypts the targeted lay-
ers. The major drawback of CryptDB lies in the fact
that whenever one layer is removed, the encryption
scheme becomes weak. In light of this, the major pro-
blem is to deliver a practical solution for searching
over encrypted databases that does not suffer from the
leakage occurring in CryptDB and that enables trans-
parent processing of complex queries over encrypted
SQL databases.
In this work, we tackle this problem and propose
a practical construction for searching over encrypted
SQL databases which limits information leakage. Our
solution builds upon the searchable encryption techni-
que designed in (Curtmola et al., 2006) which applies
to unstructured documents. This mechanism builds
an inverted search index of keywords in the database
to enable keyword search queries over encrypted data.
We make the following contributions:
We define a solution that is SQL-compatible, so as
to empower user and server to use SQLs operati-
ons of database creation, data insertion and search.
We propose a framework for implementing our so-
lution on the PostgreSQL backend;
Our solution achieves practicality thanks to the use
of the cuckoo hashing technique which renders the
search in the index efficient;
Our solution supports complex queries, namely
boolean and range queries;
We evaluate the practicality of our solution by tes-
ting it over an encrypted e-health database.
The rest of the paper is organized as follows.
Section 2 defines searchable encryption and SQL
databases. We describe our solution in Section 3.
Section 4 presents the proposed implementation fra-
mework and a performance evaluation. We review
existing work in Section 5. We conclude the paper
and give some direction for future work in Section 6.
2 PRELIMINARIES
2.1 Symmetric Searchable Encryption
We consider the following scenario: a user U wants
to store a set D = {d
1
,d
2
,...,d
n
} of confidential data
at a server S. D contains a list of searchable key-
words W = {ω
1
,ω
2
,..., ω
N
}. To preserve data confi-
dentiality, U encrypts D to obtain C = {c
1
,c
2
,..., c
n
},
which is outsourced to S. A symmetric searchable en-
cryption mechanism allows searching the keywords
directly in C , without compromising data confidenti-
ality nor query privacy. In (Curtmola et al., 2006), the
authors formally define symmetric searchable encryp-
tion (SSE) schemes by the following algorithms:
KeyGen(1
κ
) K : It is a probabilistic algorithm
executed by user U and, given a security parame-
ter κ, outputs a secret key K .
BuildIndex(D,K ) I : This algorithm is run by
user U to generate a search index. It takes key
K and dataset D as inputs and outputs index I .
When the execution of BuildIndex is completed,
index I is stored at server S.
Trapdoor(ω,K ) τ
ω
: This algorithm, executed by
user U, issues a search token, called trapdoor,
to search for a given keyword ω. It takes secret
key K and keyword ω as inputs and generates
τ
ω
. This trapdoor allows the cloud to perform the
search over the encrypted database for the corre-
sponding keyword ω.
Search(I , τ
ω
) D(ω): Run by server S, this algo-
rithm processes the search query on ω and gene-
rates the search result. It takes index I and trap-
door τ
ω
as inputs and outputs D(ω), the list of
identifiers of documents containing ω.
2.2 SQL Databases
Our work focuses on data structured in SQL databa-
ses such as MySQL
4
or PostgreSQL
5
. An SQL data-
base D organizes the data in a set of two-dimensional
tables (T
1
,T
2
,.., T
n
). Each table row is a record, the
columns are attributes, and each table entry a value.
4
https://www.mysql.com/ [Accessed Oct. 09, 2017].
5
https://www.postgresql.org/ [Accessed Oct. 09, 2017].
CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science
58
We assume the existence of a view V , resulting from
a query applied to the data in D, which represents a
subset of the records, selected from different tables in
D, in such a way that queries can directly be proces-
sed on V . We mainly consider the SELECT operation
that can be expressed as follows:
SELECT attributes FROM table WHERE conditions.
This query retrieves from an SQL table (or a view) the
records, and more precisely, the attributes specified
after SELECT, that match conditions. In this work,
we are interested in conditions that can be written as
a boolean expression Q : r {0,1}, which, evalua-
ted on a record r of D, returns Q (r) = 1, if r satisfies
the expression or Q (r) = 0 otherwise. Q consists of
several predicates P
i
that express conditions over one
or more attributes in D and are combined with bool-
ean operators: OR and AND. Each of the P
i
involves
operators such as =,>,<,, and relates a database
attribute to a search criterion.
Table 1: Example of table: PATIENT.
ID Name Surname Age Gender Occupation
001 David Smith 38 M Farmer
002 Mary Grant 27 F Lawyer
003 Julie David 19 F Student
004 Daniel Farmer 45 M Lawyer
For example, let us consider table PATIENT (see
Table 1). An example of predicate P
1
can be
h Name=‘Mary’i where Name is an attribute and Mary
the value to be searched for. In this paper, we consider
the following queries, listed in Table 2:
Simple keyword queries: They contain only one pre-
dicate Q = P
1
= hattr=vali. The query is
expressed as SELECT attributes FROM table
WHERE attr=val, attr is an attribute and val
any possible value.
Conjunctive queries (AND): Predicates are combined
with an AND operator: Q = P
1
P
2
... P
m
.
Disjunctive queries (OR): Predicates are combined
with an OR operator: Q = P
1
P
2
... P
m
.
Range queries: They involve one or two predicates
combined with an AND operator, and apply to nu-
meric data to find value within a range.
Complex queries: They combine conjunctions, dis-
junctions and range queries.
3 OUR SSE CONSTRUCTION
FOR SQL DATABASES
To clarify the need for an SSE scheme dedicated to
SQL databases, we first present an e-health use case
that involves medical data. From this use case, we
identify the gaps between the theoretical aspect of an
SSE scheme and the practical requirement for an en-
crypted database.
3.1 Use Case
Let us consider a hospital that collects data from their
patients. This data may consist of their personal in-
formation (names, age, etc.), their biological data
(height, weight, body temperature, etc.) or the care
they received (surgery operation, etc.). This data is
managed in an SQL database (such as in Table 1).
The hospital’s information technology (IT) depart-
ment would like to pay the service of a cloud provider
to store and enable access to the data.
However, since the e-health data is confidential
and because the hospital must comply with strong re-
gulations with respect to the storage and processing
of medical data (such as the recent European Gene-
ral Data Protection Regulation (GPDR, 2016)), the IT
staff encrypts the data before its outsourcing. Nevert-
heless, the hospital does not want to lose the functio-
nalities offered by the SQL language. Therefore, the
obfuscation of the e-health database should preserve
(i) the format and the arrangement of the data, and
(ii) the SQL functionalities such as creating and sto-
ring the database, or searching and retrieving particu-
lar records based on one or more attributes. It is clear
that this use case falls into the model of searchable
encryption as defined in Section 2.1.
3.2 Background
We base our work on the scheme described in (Curt-
mola et al., 2006). Unlike our use case, the authors
do not consider SQL databases but focus on an un-
structured collection of files D. To enable user U to
efficiently search for a keyword ω in the encrypted da-
taset C , this scheme builds an inverted search index I ,
that links each keyword ω with D(ω), the set of iden-
tifiers of documents containing ω. I is also obfusca-
ted so that it does not disclose any information about
the data. Only the holder of the secret key can issue
a search query. The search time is linear in the size
of D(ω), which is believed to be optimal (Curtmola
et al., 2006). Besides, the authors proved that their
scheme is secure under IND-CKA1 (security against
chosen-keyword attack). In a nutshell, this security
definition requires that an adversary should learn no-
thing from the outsourced data and the index beyond
what can be inferred from the search results and the
access pattern, even if the adversary chooses the ke-
ywords. In particular, the search tokens do not leak
information about the keywords.
Framework for Searchable Encryption with SQL Databases
59
Table 2: Example of queries over table PATIENT.
Query Conditions Results
Simple Surname=‘Grant’ h002, Mary, Grant, 27, F, Lawyeri
Conjunctive Name=‘Mary’ AND Surname=‘Grant’ h002, Mary, Grant, 27, F, Lawyeri
Disjunctive Gender=‘F’ OR Occupation=‘Lawyer’
h002, Mary, Grant, 27,F, Lawyeri
h003, Julie, David, 19, F, Studenti
h004, Daniel, Farmer, 45, M, Lawyeri
Range
Age<45 AND Age>=35 h001, David, Smith, 38, M, Farmeri
Age<20 h003, Julie, David, 19, F, Studenti
Complex (Gender=‘F’ OR Occupation=‘Lawyer’) AND Age<20 h003, Julie, David, 19, F, Studenti
In what follows, we outline this SSE scheme:
KeyGen(1
κ
) K : Key K consists of three pseudo-
randomly generated keys K
ψ
, K
π
and K
ϕ
used for
two pseudo-random permutations
6
(ψ and π) and
one pseudo-random function
6
ϕ respectively.
BuildIndex(D,K ) I : As shown in Algorithm 1, it
constructs three data structures:
B Linked Lists L
i
: For each keyword ω
i
W,
1 i N, the list L
i
is an encrypted form of
D(ω
i
). Each node N
i, j
of L
i
(1 j |D(ω
i
)|)
contains a record identifier in D(ω
i
), the key
k
i, j
used to encrypt node N
i, j+1
and the address
in array A of N
i, j+1
. Each node is encrypted
using any semantically secure symmetric en-
cryption
7
algorithm denoted E. Node N
i,1
is en-
crypted using key k
i,0
.
B Array A: It stores nodes {N
i, j
} in a pseudo-
random order. U uses key K
ψ
as input of ψ to
compute the position of each N
i, j
in A.
B Look-up table T : Each entry in T is associa-
ted to a keyword ω
i
W and stores information
about N
i,1
, namely its address in A and the key
k
i,0
. This information is encrypted using a XOR
operation with a key K
i
= ϕ(K
ϕ
,ω
i
). T stores
the encrypted information in a pseudo-random
order, using π seeded with key K
π
. In (Curt-
mola et al., 2006), the authors suggest, without
giving further details, to implement T via a
FKS dictionary (Fredman et al., 1984).
At the end of BuildIndex, index I = (A,T ) is sto-
red at server S.
Trapdoor(ω,K ) τ
ω
: τ
ω
consists of the position
pos of the entry in table T associated with ω and
the key K
ω
to decrypt T [pos]. We recall that, if
ω exists in D (without loss of generality, we as-
sume that this keyword is ω
i
, for 1 i N), then
6
A pseudo-random function is a polynomial-time function
such that any probabilistic polynomial-time adversary can-
not distinguish it from a truly random function.
7
Informally, a semantically symmetric encryption scheme
(E,D) is a non-deterministic encryption algorithm which
yields different ciphertexts for the same plaintext mes-
sage. (E, D) denotes the pair of encryption and decryption
functions.
T [pos
i
] stores the address in array A of N
i,1
and
the key k
i,0
. Therefore, τ
ω
consists of two values:
pos = π(K
π
,ω) and K
ω
= ϕ(K
ϕ
,ω).
Search(I , τ
ω
) D(ω): As depicted in Algorithm 2,
this algorithm parses τ
ω
as pos and K
ω
. Then it lo-
cates entry T [pos] and XOR-decrypts it using K
ω
.
It obtains the address in array A of N
i,1
and the
key k
i,0
. Starting from N
i,1
, algorithm Search can
decrypt all the nodes N
i, j
, 1 j |L
i
| of L
i
. Fi-
nally, the algorithm outputs the list D(ω
i
). If one
of these steps fails, then algorithm Search aborts
and indicates that the keyword was not found.
Algorithm 1: I BuildIndex(D,K ).
Inputs : Database D, Secret Key K
Output: Index I
1 Create dictionary W
2 for ω
i
W do determine D(ω
i
)
3
4 Set global counter ctr = 1
// Array A creation
5 Initialize an empty array A of size m
6 for ω
i
W do
// Build a linked list L
i
with nodes N
i, j
7 Generate random key k
i,0
8 for 1 j |D(ω
i
)| do
9 Generate random key k
i, j
10 N
i, j
= hrecordID
j
||k
i, j
||ψ
K
ψ
(ctr + 1)i
11 Encrypt N
0
i, j
= E
k
i, j1
(N
i, j
)
12 Store in A[ψ
K
ψ
(ctr)]
13 Set counter ctr = ctr +1
14 end
15 For the last node of L
i
, set address of next
node to NULL
16 end
17 Let m
0
=
ω
i
W
|D(ω
i
)|
18 if m
0
< m then fill the (m m
0
) remaining entries
of A with random values
19
// Look-up table T creation
20 for ω
i
W do
T [π
K
π
(ω
i
)] = haddr[A(N
i,1
)]||k
i,0
i ϕ
K
ϕ
(ω
i
)
21
22 Output I = (A,T )
CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science
60
Algorithm 2: D(ω) Search(I ,τ
ω
).
Inputs : Index I , Trapdoor τ
ω
Output: Search Results D(ω)
1 Parse τ
ω
= (pos,K
ω
)
2 Retrieve θ = T [pos]
3 Parse hα||ki = θ K
ω
4 Decrypt linked list L whose first node is in A[α]
and encrypted under key k
5 Output each decrypted record ID
3.3 Requirements for
SQL-compatibility
Given the scenario described in Section 3.1, we iden-
tify five requirements for designing an SSE solution
that would support SQL databases:
R1: Dictionary Creation. The first requirement
concerns the choice of the keywords W. In
(Curtmola et al., 2006), the authors only mention
to “scan D to build the set of distinct words”. In
SQL databases, the data is structured into tables,
where each record contains several attributes that
take specific values. Hence, one can imagine
that the dictionary would contain all the possible
values in D. We show in Section 3.5 that such
a dictionary does not work in the scope of SQL
databases. Therefore, we specify an algorithm
CreateDictionary to create an appropriate W.
R2: Practical Look-up Table T . The authors in
(Curtmola et al., 2006) mention the use of an
FKS dictionary (Fredman et al., 1984), without
giving much details about its implementation.
FKS dictionaries are known to be memory
expensive and finding the good hash that prevents
hash collisions (as a matter of fact, a perfect
hash function) when inserting a new item is a
demanding process. For these reasons, we opt
for another data structure, simpler to implement,
more practical and more flexible, namely a
cuckoo hash table. We give an overview of this
data structure in Section 3.4.
R3: Structured-data Encryption. In the case of un-
structured datasets, each document is encrypted
independently. In the case of SQL databases,
the organization of data into SQL tables requi-
res that each value in the tables is encrypted inde-
pendently, using a semantically secure encryption
algorithm. This will preserve the functionality
of selecting specific attributes from an SQL table
using the instruction SELECT attributes. The-
refore, we will define two additional algorithms
EncryptTable and DecryptResults.
R4: Transparent SQL Queries. In (Curtmola et al.,
2006), communication details between user and
server are omitted. With SQL databases, querying
data is a well-defined process that involves spe-
cific SQL instructions such as CREATE, INSERT
or SELECT. In this paper, we require that a user
transparently queries the database, meaning that
no change is needed in the user’s application pro-
gram. Even in the case of encrypted data, our SSE
scheme should respect the SQL language expres-
siveness, available in the case of plain databases.
R5: Complex Queries. Last but not least, the SSE in
(Curtmola et al., 2006) does not tackle the pro-
blem of complex queries that contain range, con-
junctive or disjunctive queries. We extend this so-
lution with this functionality.
Before delving into the details of our proposal
for a symmetric searchable encryption scheme, we
outline in the next section the definition of cuckoo
hashing.
3.4 Cuckoo Hash
In a cuckoo hash table T (Pagh and Rodler, 2004),
an item v can be stored in one of two possible
entries, each located in two different tables, as-
sociated with two independent hash functions and
a key x. The look-up operation, later denoted
CuckooLookup(T ,x), only requires to examine these
two locations. The insertion of a new element v with
key x, operation denoted CuckooInsert(T ,x,v), eva-
luates the first hash function over x to give the first
possible slot in the first table. If this slot is already
assigned, then the item currently occupying this lo-
cation is “kicked out” and v can be inserted in the
emptied slot. The removed item is then moved to
its alternative location assigned with the second hash
function. This move may encounter another collision,
thus requiring another element to be kicked out from
its current location. This procedure is repeated until
the last kicked-out item finds a free slot or when an
endless loop is detected (or when the number of kick-
outs reaches a predefined maximum). In the latter
case, the authors in (Pagh and Rodler, 2004) suggest
to rehash the data structure, that is, to use two new
hash functions.
3.5 Our Construction for Simple
Keyword Queries
Our solution is built upon the technique described in
(Curtmola et al., 2006), but implements several modi-
fications in order to satisfy the previously mentioned
Framework for Searchable Encryption with SQL Databases
61
requirements. Let us consider a database D = {T }.
Table T contains t attributes and R records. Without
loss of generality, for databases with multiple tables,
we assume that we can reason on a view V extracted
from a subset of these tables.
Our first proposal provides a solution to sa-
tisfy requirement R1. We define an algorithm
CreateDictionary to construct a dictionary W from
D. As required by algorithm BuildIndex, the search
index, built upon W, stores information about the re-
cords that contain each of the keywords. Besides, a
basic SQL query contains a clause WHERE of the form
hattribute=valuei, meaning that we are interested
in the records where the attribute has the specified
value. Therefore, the search index must indicate that
information. Hence, the dictionary W contains ke-
ywords under the form attribute=value. We de-
note n the size of W (n t × R). Let us take the ex-
ample of the table PATIENT presented in Table 1 and
consider the query SELECT * FROM PATIENT WHERE
Name=‘David’. If W only listed the distinct words in
the table, as it is the case in (Curtmola et al., 2006),
then the index would have lost the information that
David is a possible value for the attribute Name. Note
that David is also a value for the attribute Surname.
In this case, the search over the index would have re-
turned all the records that contain the word David,
regardless of the attributes. To cope with this is-
sue, we suggest to store the keywords ω of the form
Name=‘David’ in order to preserve the information
on the attribute and to prevent the search from retur-
ning wrong results. Algorithm 3 depicts the operati-
ons to build dictionary W.
Algorithm 3: {W, {D(ω)}
ωW
} CreateDictionary(D).
Inputs : Database D
Output: Dictionary W, Sets {D(ω)}
ωW
1 W =
/
0
2 for record D do
3 for attribute do
4 ω hattribute=valuei
5 if ω / W then W = W {ω}
6
7 end
8 end
9 for ω
i
W do determine D(ω
i
)
10
To build index I = (A,T ), we first permute the
rows in the database and then we follow the procedure
described in Algorithm 4. In our case, we consider the
record identifier, denoted as RecordID, as an additio-
nal attribute of the considered SQL table. To improve
the efficiency of (Curtmola et al., 2006) and meet re-
quirement R2, we construct a cuckoo hash table T for
efficient lookups as depicted in Algorithm 4.
At the end of BuildIndex, I is stored in an SQL
table called INDEX as a binary object (for example,
using the type bytea in PostgreSQL).
Algorithm 4: I BuildIndex(W,K ).
Inputs : Keywords W, Secret Key K
Output: Index I
1 Set global counter ctr = 1
// Array A creation: same operations as in Algorithm 1
// Cuckoo Hash Table T creation:
2 for ω
i
W do
3 value = haddr[A(N
i,1
)]||k
i,0
i ϕ
K
ϕ
(ω
i
)
4 CuckooInsert(T , π
K
π
(ω
i
),value)
5 end
6 Output I = (A,T )
The next operation in our SSE construction en-
crypts the SQL table with a semantically secure en-
cryption scheme E = (E,D). As opposed to (Curt-
mola et al., 2006), we define two encryption and
decryption algorithms
8
(Algorithm 5). Algorithm
EncryptTable encrypts each entry in the table separa-
tely, and because of the semantically secure property
of the encryption scheme, each encrypted entry is in-
distinguishable from the others. Note that the attri-
butes and the table names are also encrypted, but the
extra attribute RecordID is left unencrypted, which
does not impact the security of our scheme. Algo-
rithm DecryptResults decrypts the encrypted records
returned by the server after the execution of Search.
These two algorithms take as input secret key K , out-
put by a modified algorithm KeyGen, that generates
an additional key K
enc
.
Algorithm 5: C EncryptTable(D,K ).
Inputs : Table T , Secret Key K
enc
Output: Encrypted table C
1 encrypted table = E(K
enc
,table name)
2 for attribute do
3 encrypted attribute =
E(K
enc
,attribute)
4 end
5 for record in T do
6 for attribute do
7 encrypted value = E(K
enc
,value)
8 end
9 end
To retrieve the records from the encrypted da-
tabase based on a search criterion denoted ω, the
user translates a plain SQL query such as SELECT *
FROM PATIENT WHERE Name=‘David’ (here ω cor-
responds to Name=‘David’) into an encrypted SQL
query, denoted SE query. This SE query requests the
8
Algorithm DecryptResults is the inverse of EncryptTable
showed in Algorithm 5, where encryption function E is
replaced by decryption function D.
CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science
62
server to execute algorithm Search over the search in-
dex I using the search token generated with algorithm
Trapdoor. Therefore, the search procedure operates
in a challenge-response interaction between user U
and server S with the following steps:
1. U executes algorithm Trapdoor to generate the se-
arch token τ
ω
for keyword ω. Then, U forms the
following SE query:
SELECT * FROM encrypted table WHERE
RecordsID IN Search(I, τ
ω
) (1)
The most important and challenging part of this
SE query is RecordsID IN Search(I , τ
ω
). The
WHERE condition requests the execution of algo-
rithm Search by the SQL server. Since this algo-
rithm outputs D(ω), the WHERE condition collects
the records whose RecordID is in the output of
algorithm Search.
2. On execution of the SE query (1), S runs algo-
rithm Search with the specified trapdoor τ
ω
on the
indicated index. Algorithm Search is identical to
the one defined in (Curtmola et al., 2006), except
the fact that we resort to algorithm CuckooLookup
to look-up table T , as depicted in Algorithm 6.
The SE query (1) outputs the encrypted records
that match the search criterion ω. The last step of
our protocol decrypts the obtained records by calling
algorithm DecryptResults.
Algorithm 6: D(ω) Search(I ,τ
ω
)
Inputs : Index I , Trapdoor τ
ω
Output: Search Results D(ω)
1 Parse τ
ω
= (γ,η)
2 Retrieve θ = CuckooLookup(T , γ)
3 Parse hα||ki = θ K
ω
4 Decrypt linked list L whose first node is in A[α]
and encrypted under key k
5 Output each decrypted record ID
Proposed Framework. In addition to the algo-
rithms we specified in the abovementioned para-
graphs, we suggest that our SSE construction follows
the framework depicted in Figure 2. Specifically, this
framework allows to run the secure search function,
namely algorithm Search, on a widely adopted SQL
server, namely PostgreSQL.
The architecture of the proposed framework con-
sists of two zones:
Trusted Zone: It includes user U with an unmodi-
fied SQL client application. We define a proxy
between U and S, which executes all the user-
side functions of our protocol. The keys generated
with KeyGen are stored in a keystore.
Untrusted Zone: It consists of the untrusted cloud
server S running PostgreSQL that stores the en-
crypted database and the secure search index.
This PostgreSQL server is in charge of executing
the function Search defined in Algorithm 6.
As shown in Figure 1, our implemented scheme is
divided into three phases:
Upload: The user outsources her database by exe-
cuting the CREATE and INSERT INTO queries.
The proxy receives them, generates keys, runs
CreateDictionary, BuildIndex and EncryptTables
and converts the queries into an SE query. This SE
query includes the encrypted version of CREATE
and INSERT INTO queries for the data and the
CREATE and INSERT INTO queries for storing the
index into an additional table denoted INDEX.
Query: The user executes a SELECT-FROM-WHERE
query to retrieve some records based on some
criteria (the WHERE conditions). The proxy re-
ceives this query, extracts the criteria, retrieves
the keys and executes Trapdoor for each of these
criteria. Then, the proxy forms an SE query
that converts the SQL query into its “encrypted”
version: The proxy replaces the FROM argument
with the encrypted table name and replaces the
WHERE conditions with the clause RecordID IN
Search(index, trapdoor), where Search is the
algorithm defined in Algorithm 6, trapdoor is the
token generated by Trapdoor and index is retrie-
ved with SELECT index FROM INDEX.
Retrieval: Upon reception of this SE query, the
server executes Search over the index and the
trapdoor specified in the query. Thanks to the
FROM and WHERE parameters, the server retrieves
the encrypted records whose identifiers are output
by Search. The proxy then calls DecryptResults
and sends to the user the decrypted search results.
3.6 Boolean Queries
We propose to transform boolean queries into several
subqueries. The server operates the intersection (in
case of conjunction of keywords) or the union (in case
of disjunction of keywords) of the search results and
returns them to the user. Formally, our solution trans-
forms a plain query Q : r {0,1} into an SE query
Q
0
: r {0,1}, where Q
0
consists of the predicates P
0
i
which are the “obfuscated versions” of the predicates
P
i
defining Q . The P
0
i
are of the form RecordID IN
Search(index, trapdoor
i
) and are combined with the
same boolean operators as the P
i
. Here trapdoor
i
de-
notes the token generated by algorithm Trapdoor for
the search criteria P
i
. Let us consider the query
SELECT * FROM PATIENT WHERE Name=‘Mary’
AND Surname=‘Grant’.
It contains two predicates P
1
= hName=‘Mary’i and
Framework for Searchable Encryption with SQL Databases
63
Figure 1: Workflow of our scheme.
Figure 2: Architecture of our SSE solution.
P
2
= hSurname=‘Grant’i, linked with the boolean
operator AND. To transform this SQL query into an
SQL-SE query, the proxy generates the two following
trapdoors: trapdoor
1
= Trapdoor(Name=‘Mary’) and
trapdoor
2
= Trapdoor(Surname=‘Grant’) and exe-
cutes the SE query:
SELECT * FROM encrypted table WHERE
RecordsID IN Search(index, trapdoor
1
) AND
RecordsID IN Search(index, trapdoor
2
).
3.7 Range Queries
Let us consider the attribute Age from the data-
base example depicted in Table 1 and a range query
that requests to retrieve the records where Age > 40
and Age 45. Using the SSE solution depicted in
Section 3.6, we can simply generate trapdoors for
each of the values between 41 and 45, so five trap-
doors for the keywords Age = 41, Age = 42, etc. For
each trapdoor, the server calls algorithm Search and
computes and sends back to the user the union of
the search results. In case of wider ranges (such as
Age > 18 and Age 75), this solution would not be
efficient because of the execution of several instances
of Trapdoor and Search.
The idea of our solution for range queries is to re-
duce a range query problem into a keyword search
problem. To a particular value of an attribute (for
example: value 45 for attribute Age), we associate
a value (for example: [45 50[) to a new attribute
Range Age. Therefore, we define a new keyword
hRange Age = [45 50[i that will be inserted in in-
dex I during the execution of BuildIndex. As a matter
of fact, we extend the initial database D = T with a
new attribute
9
Range attribute. For each record r
in D, we let the user specify the range value of the
corresponding attribute. Table 3 shows table PATIENT
extended with the attribute Range
Age. Note that the
user selects its preferences on the range interval and
the initial value, based on the content of the database
and the possible queries she may issue, depending on
the use cases
10
.
Table 3: Table PATIENT with Range column.
ID Name Surname Age Range Age Gender Occupation
001 David Smith 38 [35;40[ M Farmer
002 Mary Grant 27 [25;30[ F Lawyer
003 Julie David 19 [15;20[ F Student
004 Daniel Farmer 45 [45; 50[ M Lawyer
When user U wants to retrieve the records that ma-
tch a range query, the proxy issues an SE query that
contains the trapdoor(s) associated with the specified
range. Different cases can be encountered:
1. The range query exactly corresponds to one of
the range included in the search index. If we
consider the example of Table 3, this case ap-
plies to search queries such as SELECT * FROM
PATIENT WHERE Age 35 AND Age < 40. The
range query hAge 35 AND Age < 40i exactly
matches the range we specified for the record de-
aling with patient David Smith. In this case, the
proxy calls algorithm Trapdoor for the keyword
hRange Age = [35; 40[i. This case is reduced to a
simple keyword search query.
2. The range query is different from the ranges
9
Practically, this column will not be stored at the server.
10
The specified ranges can have different interval length de-
pending on the frequency of items in the interval. We can
use statistical tools for defining the range intervals but this
topic is out of scope of this paper.
CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science
64
included in the search index. Let us take the fol-
lowing range query: SELECT * FROM PATIENT
WHERE Age 18 AND Age < 25. The interval
[18;25[ is not one of the ranges we included
in Table 3. Therefore, the proxy factorizes the
search range [18;25[ into a union of ranges
(inserted in the search index during the execution
of BuildIndex) and singletons of discrete values.
In our example [18; 25[= {18} {19} [20; 25[.
Hence, the proxy calls algorithm Trapdo or for
the two singletons hAge = 18i and hAge = 19i as
well as for hRange Age = [20;25[i. Therefore,
instead of 7 search tokens, our solution only
generates 3 trapdoors and the corresponding SE
query is a disjunction of clauses RecordID IN
Search(index,trapdoor).
4 IMPLEMENTATION AND
PERFORMANCE EVALUATION
Implementation. Our SSE scheme is implemen-
ted according to the architecture shown in Figure 2.
All the functions at the proxy are implemented in
Java. The server runs PostgreSQL server (version
9.5). Search is coded in Java with PL/Java
11
, an add-
on for server-side procedures. The code is packed into
a JAR that is further loaded into the PostgreSQL bac-
kend, so as to execute Search directly in PostgreSQL.
We tested our prototype on a single machine that si-
mulates both the proxy and the server. The machine
has four 3.20GHz Intel Core
TM
i5-3470 processors,
32GB of RAM and runs Ubuntu 14.04 LTS.
E-Health Database. This work is done in the context
of a collaborative project named CLARUS
12
, which
collected data from anonymized medical records of a
hospital. This database consists of eight tables and
queries are performed over a view (in the SQL termi-
nology) called LAB. This view lists 64.440 records and
comprises eight attributes, including a patient identi-
fier (pat id), a patient name (pat name), a patient
last name (composed of two parts pat last1 and
pat last2) and an identifier of an episode of care
(ep id).
Results. We measured the time consumed by the pha-
ses described above over the e-health database. We
let our program automatically add range attributes for
pat id. The length of the ranges is 10: we insert
Range pat id values of the form [0,9], [10,19], etc.
We ran 100 executions of the functions, and computed
the average time, listed in Tables 4 and 6. We com-
11
https://tada.github.io/pljava/. [Accessed Oct. 09, 2017]
12
http://www.clarussecure.eu
pare these measurements with the case where upload
(resp. search) is performed without the application of
our SSE mechanism in Table 5 (resp. Table 7).
Table 4: Upload with SSE.
Operation Time (s)
Index Creation 22.1 s
Encryption 5,60 s
Server storage 4.60 s
Table 5: Upload without SSE.
Operation Time (s)
Upload 9.735 s
Server storage 1.238 s
The time needed by the server for storage is gre-
ater than the case where no encryption is performed,
since the server is required to store the index toget-
her with the encrypted data. We evaluated single ke-
yword queries (keywords with different numbers of
occurrences) and range queries. The Retrieval phase
is fast: less than 1 second to decrypt the search re-
sults. We recall that Search first finds the entry in ta-
ble T associated with the searched keyword and then
decrypts the corresponding linked list. In average,
Search has acceptable costs, given that the data is en-
crypted. For the most frequent keyword, Search de-
crypts 4275 nodes which yields the time reported in
Table 6. In the case of range queries, we tested the
query pat id 50 AND pat id < 150, which ge-
nerates 10 trapdoors of the form hRange pat id =
[50,59]i, ..., hRange pat id = [140,149]i (since we
configured the attribute Range pat id with intervals
of size 10). Thus, Search is executed 10 times. The
Upload phase is relatively long (22 s) since BuildIndex
scans the database value by value to create the index
and involve a high amount of permutations. Nevert-
heless, these operations are performed only once for
an unbounded number of queries.
Table 6: Query and Retrieval.
Query Trapdoor |D(ω)| Search Decryption
Single Keyword
pat last2=‘GARCIA’ 2 ms 4275 122 s 0,37 s
pat name=‘RAUL’ 1 ms 178 5.6 s 0,02 s
pat name=‘MORAD’ 1 ms 11 0.8 s 0,002 s
pat last1=‘DUC’ 9 ms 0 0.5 s
Range Query 2,0 ms 198 9.87 s 0,04 s
Table 7: SQL search without SSE.
Query SQL search
Single Keyword
pat last2=‘GARCIA’ 3.5 ms
pat name=‘RAUL’ 8.0 ms
pat name=‘MORAD’ 7 ms
pat last1=‘DUC’ 7 ms
Range Query 45 ms
Framework for Searchable Encryption with SQL Databases
65
5 RELATED WORK
We reviewed the literature in the area of searchable
encrypted database systems. The design of such sys-
tems always trades off between the level of security,
the query functionalities preserved in spite of encryp-
tion and the performance.
Boolean Queries. In (Hacig
¨
um
¨
us¸ et al., 2002), the
authors proposed a heuristic to execute SQL queries
over encrypted data where the querier user is requi-
red to perform a heavy processing of the search re-
sults, whereas in our case, query and retrieval are lig-
htweight operations. Besides, this solution encrypts
records as a whole whereas our construction encrypts
records attribute-wise. The authors do not provide
any security guarantee. In (Cash et al., 2013), an
SSE scheme for boolean queries, named OXT, is tes-
ted with a MySQL database. This solution builds two
indexes, which increases the storage overhead at the
server. This scheme supposes the knowledge of the
frequency of the keywords to perform efficient search.
The user first queries the least frequent keyword of a
conjunctive keyword query and then filters the results
for the other keywords. In (Pappas et al., 2014), Blind
Seer is proposed as a system which enables boolean
SQL queries via a tree-based search index. It resorts
to Yao’s garbled circuit (Yao, 1986), which requires
the user and the server to jointly parse the tree to pro-
cess the search query. Our construction only incurs a
single round of interaction between the user and the
server.
Range Queries. In (Agrawal et al., 2004), the
authors defined order-preserving encryption (OPE)
which enables range queries, but reveals the order
of numeric data. Our scheme avoids this leakage
by using a semantically secure encryption algorithm.
ARX (Poddar et al., 2016) is based on a tree to evalu-
ate range queries over encrypted data, and uses Yao’s
garbled circuit to traverse the tree to respond to range
queries. Whenever such a query is computed, nodes
of the tree must be updated, which incurs complexity
overhead at the user-side. Besides, the authors tested
this solution on a NoSQL database, which does not
store structured data as in MySQL or PostgreSQL.
CryptDB. Proposed by (Popa et al., 2012), this sy-
stem allows equality, range and boolean queries over
encrypted SQL databases, thanks to onion encryp-
tion, that encrypts each attribute with one or more on-
ion layers. Each of the encryption layers preserves a
particular functionality. For simple keyword search,
CryptDB applies a deterministic encryption, disclo-
sing to the server the occurrences of a particular ke-
yword. CryptDB also preserves the functionality of
range queries by adding a layer of OPE, which also
leaks some information to adversaries. As opposed
to CryptDB, even if the same data is queried several
times, our solution preserves data and query privacy.
6 CONCLUSION
This paper proposes an SSE scheme for SQL data-
bases. The proposed solution builds upon the alre-
ady existing searchable encryption proposed by Curt-
mola et al. (Curtmola et al., 2006), which is trans-
formed to an SQL-compatible scheme. Our solu-
tion supports several query functionalities including
range and Boolean queries. Fruthermore, thanks to
the use of cuckoo hashing, the search operation beco-
mes more efficient. We finally present a framework
for implementation which embeds the search algo-
rithm into PostgreSQL and that converts plain SQL
queries into “encrypted” SQL queries directly execu-
table by the Postgres server. This framework is evalu-
ated in terms of performance using an e-health data-
base.
Our future work consists in developing an optimi-
zed system for databases with million of records. We
plan to conduct an in-depth performance evaluation
including comparison with existing -comparable- im-
plementations, if any, to show its practicality in real-
world scenarios.
ACKNOWLEDGMENTS
This work was partially funded by European Com-
mission through the H2020 project CLARUS (grant
No. 644024). We particularly thank the Fundaci
´
o
Cl
´
ınic per a la Recerca Biom
`
edica (FCRB) for pro-
viding our test e-health database. The authors also
would like to thank Mr. Yiadh Tlijani for his help in
simulations.
REFERENCES
Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. (2004).
Order Preserving Encryption for Numeric Data. In
Proceedings of the 2004 ACM SIGMOD international
conference on Management of data, pages 563–574.
ACM.
Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Ros¸u, M.-
C., and Steiner, M. (2013). Highly-Scalable Searcha-
ble Symmetric Encryption with Support for Boolean
Queries. In Advances in Cryptology–CRYPTO 2013,
pages 353–373. Springer.
Chamberlin, D. D. and Boyce, R. F. (1974). SEQUEL: A
Structured English Query Language. In Proceedings
CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science
66
of the 1974 ACM SIGFIDET (Now SIGMOD) Works-
hop on Data Description, Access and Control, SIGFI-
DET ’74, pages 249–264. ACM.
Chase, M. and Kamara, S. (2010). Structured Encryption
and Controlled Disclosure. In International Confe-
rence on the Theory and Application of Cryptology
and Information Security, pages 577–594. Springer.
Curtmola, R., Garay, J., Kamara, S., and Ostrovsky, R.
(2006). Searchable Symmetric Encryption: Improved
Definitions and Efficient Constructions. Cryptology
ePrint Archive, 2006:210.
Fredman, M. L., Koml
´
os, J., and Szemer
´
edi, E. (1984).
Storing a Sparse Table with 0(1) Worst Case Access
Time. J. ACM, 31(3):538–544.
GPDR (2016). General Data Protection Regulation. Official
Journal of the European Union, L119:1–88.
Hacig
¨
um
¨
us¸, H., Iyer, B., Li, C., and Mehrotra, S. (2002).
Executing SQL over Encrypted Data in the Database-
service-provider Model. In Proceedings of the 2002
ACM SIGMOD international conference on Manage-
ment of data, pages 216–227. ACM.
Kamara, S. and Papamanthou, C. (2013). Parallel and Dyn-
amic Searchable Symmetric Encryption. In Internati-
onal Conference on Financial Cryptography and Data
Security, pages 258–274. Springer.
Pagh, R. and Rodler, F. F. (2004). Cuckoo Hashing. Journal
of Algorithms, 51(2):122–144.
Pappas, V., Krell, F., Vo, B., Kolesnikov, V., Malkin, T.,
Choi, S. G., George, W., Keromytis, A., and Bellovin,
S. (2014). Blind Seer: A Scalable Private DBMS. In
2014 IEEE Symposium on Security and Privacy (SP),
pages 359–374. IEEE.
Poddar, R., Boelter, T., and Popa, R. A. (2016). Arx: A
Strongly Encrypted Database System. IACR Crypto-
logy ePrint Archive, 2016:591.
Popa, R. A., Redfield, C., Zeldovich, N., and Balakrishnan,
H. (2012). CryptDB: Processing Queries on an En-
crypted Database. Communications of the ACM,
55(9):103–111.
Song, D. X., Wagner, D., and Perrig, A. (2000). Practical
Techniques for Searches on Encrypted Data. In Pro-
ceeding 2000 IEEE Symposium on Security and Pri-
vacy. S P 2000, pages 44–55.
Yao, A. C.-C. (1986). How to Generate and Exchange Se-
crets. In 27th Annual Symposium on Foundations of
Computer Science, 1986, pages 162–167. IEEE.
Framework for Searchable Encryption with SQL Databases
67