A Multiple-server Efficient Reusable Proof of Data Possession from
Private Information Retrieval Techniques
Juan Camilo Corena, Anirban Basu, Yuto Nakano, Shinsaku Kiyomoto and Yutaka Miyake
Information Security Group, KDDI R&D Labs Inc. Fujimino, Saitama, Japan
Keywords:
Cloud Storage, Proof of Data Possession, Private Information Retrieval.
Abstract:
A Proof of Data Possession (PDP) allows a client to verify that a remote server is still in possession of a file
entrusted to it. One way to design a PDP is to compute a function depending on a secret and the file. Then,
during the verification stage, the client reveals the secret input to the server who recomputes the function and
sends the output back to the client. The client can then compare both values to determine if the server is still
in possession of the file. The problem with this approach is that once the server knows the secret, it is not
useful anymore. In this article, we present two PDP schemes inspired by Multiple-Server Private Information
Retrieval (MSPIR) protocols. In a traditional MSPIR protocol, the goal is to retrieve a given block of the file
from a group of servers storing identical copies of it, without telling the servers what block was retrieved.
In contrast, our goal is to let servers evaluate a function using an input that is not revealed to them. We
show that our constructions are secure, practical and that they can complement existing approaches in storage
architectures using multiple cloud providers. The amount of transmitted information during the verification
stage of the protocols is proportional to the square root of the length of the file.
1 INTRODUCTION
The popularity of cloud-based storage services has
fostered the development of primitives to guarantee
that the owners of the information can retrieve their
data when needed. Given the pay-as-you-go model
for storage in cloud providers, it is necessary to per-
form this task efficiently to minimize cost and resource usage. Several solutions exist in the literature for this problem, besides the obvious solution of downloading the entire file and performing a computation over it locally. The two most relevant are Proofs of Data Possession (PDPs) (Ateniese et al., 2007) and Proofs of Retrievability (PORs) (Shacham and Waters, 2008). The difference between these primitives is that, while both check that blocks of a given file are stored correctly, the latter additionally applies an erasure code to guarantee that the file is actually retrievable.
We consider a scenario where there is a set of
users U that stores data blocks at a set of remote
servers S through a local trustworthy proxy P, similar
to an enterprise setting with an in-house proxy, or a
website using cloud infrastructure. The requirement
for several remote servers (or clouds) comes from
a redundancy perspective, given that cloud providers also suffer outages (Raphael, 2013).
A PIR protocol (Chor et al., 1998) allows a client
to query a replicated database, in such a way that no
server knows what record was retrieved by the client.
We wish to apply ideas from MSPIR schemes to the
problem of reusing secrets securely in PDP schemes.
Even though the idea is very natural, it is usually be-
lieved that PIR protocols are too slow to be used in
practice (Sion and Carbunar, 2007). For this reason,
the approach has not been developed fully and has
been deemed only of theoretical interest (Hanser and
Slamanig, 2013). However, recent advances in the
area (Olumofin and Goldberg, 2012) have made PIR
more practical even for the single server scenario.
We show that PIR can be used in a real system in
the context of proving data possession, based on the
following observations: data storage in cloud servers involves replication, making fast MSPIR schemes such as (Chor et al., 1998) feasible for this scenario; cloud providers have incentives not to cooperate with each other (e.g., market share); and some efficient PIR schemes can be extended not just to retrieve blocks from the servers, but also to apply a function over the entire file. We present a construction achieving this in Section 4.
In this work we present two approaches. The first of them is very intuitive: when the proxy has access to a block, it simply stores a hash of the block locally. The verification procedure involves using PIR
to download some blocks, in order to verify that their
hashes match the ones stored by the proxy. The in-
tuition for security is that by using PIR, the servers
have to compute a function involving all the blocks.
If a server is storing even a single corrupt block, this
will be reflected in the output of the function.
One advantage of this approach is that it requires
no additional storage at the server. Processing at the
proxy is light since it only involves the computation
of a dot product. In addition, it can support dynamic
files efficiently. The drawback of this approach is that
for the non-retrieved blocks, it does not check that the
blocks are those stored by P, but simply that all the
servers are storing the same value. It is also possible
to reduce the local storage at P by storing hashes of
blocks locally with a probability q.
To overcome the previous security drawback, we
designed an additional scheme that uses PIR tech-
niques not just for retrieval but also for computing a
secret function over the data. The construction can
be explained with a toy example: assume we want to
perform PIR to recover $b_4$ over a database $B$ with five elements $b_1, b_2, b_3, b_4, b_5$; the database is stored at two servers $S_1$ and $S_2$. The client sends to each server the following vectors:

$S_1: V^{[1]} = (2, 1, 5, 2, 1)$
$S_2: V^{[2]} = (-2, -1, -5, -1, -1)$    (1)
Note that $V^{[1]} + V^{[2]} = (0, 0, 0, 1, 0) = E^{[4]}$, where $E^{[i]}$ is a vector consisting of 0s in all coordinates except at coordinate $i$ where it is 1. Now, each server computes the dot product $\cdot$ between the received vector and its local version of $B$. Given the properties of the dot product, we have that:

$B \cdot V^{[1]} + B \cdot V^{[2]} = B \cdot (V^{[1]} + V^{[2]}) = B \cdot E^{[4]} = b_4.$    (2)
Therefore, $b_4$ can be recovered by adding the responses from each server. In this sense, this PIR protocol is computing the dot product of a secret vector $E^{[i]}$ and the database $B$. The idea of our PDP is to select a random vector $R$ and compute $R \cdot B$ before uploading the file to the servers. To verify, we select two random vectors such that $V^{[1]} + V^{[2]} = R$. Since each vector $V^{[i]}$ does not give any information about $R$, we can verify many times without leaking significant information about our secret vector. Given that all the elements of $B$ are used in the computation, any change or deletion at any of the servers will be detected with high probability. In the current scheme, the client must upload a number of elements equal to the size of the database $|B|$. However, by representing the database as a square of side $\sqrt{|B|}$, it is possible to reduce the total transmission to $2|S|\sqrt{|B|}$, where $|S|$ is the number of servers. The security assumption in the schemes is that the servers do not communicate among themselves.
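To make the splitting concrete, the following Java sketch reproduces it over a prime field with BigInteger arithmetic. The class and method names are ours and purely illustrative; the prime is the 128-bit field prime used later in Section 5.

import java.math.BigInteger;
import java.security.SecureRandom;

// Toy illustration: split a secret vector R into two random shares whose sum
// is R (mod P), and check that the per-server dot products add up to R . B.
public class AdditiveSplitToy {
    static final BigInteger P =
        new BigInteger("170141183460469231731687303715884105757"); // 128-bit prime
    static final SecureRandom RNG = new SecureRandom();

    static BigInteger dot(BigInteger[] a, BigInteger[] b) { // a . b mod P
        BigInteger acc = BigInteger.ZERO;
        for (int i = 0; i < a.length; i++)
            acc = acc.add(a[i].multiply(b[i])).mod(P);
        return acc;
    }

    public static void main(String[] args) {
        int n = 5;
        BigInteger[] data = new BigInteger[n]; // the replicated database B
        BigInteger[] r = new BigInteger[n];    // the secret vector R kept by the client
        BigInteger[] v1 = new BigInteger[n];   // share sent to S_1
        BigInteger[] v2 = new BigInteger[n];   // share sent to S_2
        for (int i = 0; i < n; i++) {
            data[i] = new BigInteger(64, RNG);
            r[i] = new BigInteger(P.bitLength(), RNG).mod(P);
            v1[i] = new BigInteger(P.bitLength(), RNG).mod(P);
            v2[i] = r[i].subtract(v1[i]).mod(P); // v1 + v2 = r (mod P)
        }
        // Each server computes a dot product with its local copy of B; the
        // client adds the two answers and obtains R . B, while neither server
        // learns anything about R from its own share.
        BigInteger combined = dot(data, v1).add(dot(data, v2)).mod(P);
        System.out.println(combined.equals(dot(data, r))); // prints true
    }
}

Using $E^{[j]}$ in place of $R$ turns the same code into the block retrieval of the toy example above.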
1.1 Contributions
The contributions of this work are as follows.
1. We present a novel way to create PDPs, by extending current ideas in multi-server PIR protocols.
2. Our constructions are reusable, can test several
servers simultaneously and one of them is very ef-
ficient for dynamic files. Even though we are not
the first ones to propose a system with these prop-
erties (see (Le and Markopoulou, 2012) for a sys-
tem involving multiple servers), our constructions
are simpler and easier to implement for practition-
ers.
3. We show that our PIR-based constructions are
practical given the current trends in remote infor-
mation storage.
Compared to existing schemes, the drawback of our constructions is that they do not achieve a property
called Public Verifiability. For this property to hold,
anyone should be able to verify the file is stored, re-
gardless of the file’s access control policies. Since our
proposals may reveal the file contents to the verifier,
they should not be used by unauthorized third parties.
The rest of the article is organized as follows: In
Section 2 we present the problem scenario; in Section
3 we present existing work related to our proposal;
in Section 4 we present our PIR-based constructions
and their proof of security; in Section 5 we present
the results of the simulation of our proposal and ex-
isting constructions; finally, in Section 6 we present
the conclusions of this work.
2 PROBLEM STATEMENT AND
NOTATION
There is a set of users U that connects to a set of remote servers $S = \{S_1, \ldots, S_s\}$ through a local proxy P. P
forwards all user requests to S and it is assumed to be
trustworthy. By trustworthy we mean that the results
reported by P about its operations reflect its view of
the system accurately. On the other hand, members of
S might want to hide data loss/corruption from mem-
bers of U. Even though P has storage capabilities,
SECRYPT2014-InternationalConferenceonSecurityandCryptography
308
the amount of storage available at the members of S is
significantly larger. We also assume that P is able to
perform computations to help members of U to verify
that their remote data is stored as intended.
Operations will be performed at S and P in units called blocks. For practical purposes, these blocks are around 4 or 8 KB, which is the usual block size for file systems. It is possible to interact with the storage service using three operations, namely:
write(pos, data, length): this operation writes length blocks stored in data starting at block pos.
read(pos): reads the block stored at position pos.
delete(pos): deletes the block stored at position pos.
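For reference, these operations correspond to an interface of roughly the following shape (a sketch; the names and types are ours and not tied to any particular cloud API):

// Block-oriented storage interface assumed throughout (illustrative only).
public interface BlockStore {
    // Writes `length` blocks taken from `data`, starting at block position `pos`.
    void write(long pos, byte[][] data, int length);

    // Returns the block stored at position `pos`.
    byte[] read(long pos);

    // Deletes the block stored at position `pos`.
    void delete(long pos);
}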
Our goal is to verify the correctness and complete-
ness of the read, write, delete operations in S. By cor-
rectness we mean that the information is stored as it
was sent by the users. By completeness, we mean that
S is returning the requested information in its current
state.
The types of attack that S can perform are the following:
A read operation returns a random value, or a previous value for the block.
A write operation writes different data.
A delete operation might not be executed.
We will denote vectors and matrices by bold capital letters (e.g. V, B). Unless otherwise defined, the elements of a given vector will be represented by lower indices, such as $V_1$. Positions in a matrix will be given by two lower indices enclosed in square brackets and separated by a comma; thus, $B_{[k,j]}$ represents position $k, j$ of matrix B. A column j of a matrix will be represented by $B_{[:,j]}$; conversely, the k-th row will be represented by $B_{[k,:]}$. Upper indices will be used to denote vectors and matrices that are used or stored by a given server. According to this, $V^{[i]}$ is the version of vector V used by server $S_i$. Similarly, $B^{[i]}_{[k,j]}$ represents the element $k, j$ of the matrix at server $S_i$. The need to represent different versions of a given vector arises from possible local variations due to corruption.

Another use of upper indices is to represent vectors of a given class, such as $E^{[i]}$, which denotes the i-th row of the identity matrix. The operator $|V|$ denotes the number of coordinates of a vector. When used on a set (e.g. $|S|$), it denotes the number of elements of the set. Finally, when used on a function, the operator denotes the size of the output of the function.
3 EXISTING WORK
Existing work in this area includes several ap-
proaches. From a general perspective, there has been significant research on authenticated data structures (Tamassia, 2003). These structures can be used to verify that the elements returned by the remote server satisfy certain properties, such as being part of a file.
Proofs of Data Possession (PDPs) allow a client to check a file remotely without downloading it. To date, many constructions are available, including trapdoor functions based on discrete logarithms (Ateniese et al., 2007) and proofs based on vector operations and pseudorandom functions (Shacham and Waters, 2008); both of these schemes can be set up for public verifiability. Other constructions include: adversarial error-correcting codes (Bowers et al., 2009), commitment schemes over linear functions (Xu and Chang, 2012), authenticated encryption (Ateniese et al., 2008) and hardness amplification (Dodis et al., 2009). Other lines of research include: multiuser batch authentication of files (Wang et al., 2010), where a third party can perform tests on behalf of many users simultaneously; audits for dynamic files (Zhu et al., 2013); guaranteeing that multiple encrypted copies can be recovered without additional setup processing (Curtmola et al., 2008); simultaneous public and private verifiability (Hanser and Slamanig, 2013); and verification for encoded files (Le and Markopoulou, 2012), (Corena and Ohtsuki, 2013). None of these approaches use PIR techniques to create reusable schemes. Schemes based on Oblivious RAM (ORAM) have also been proposed (Cash et al., 2013), (Apon et al., 2014); their goal is to hide the access pattern of the file, but their overhead is considerable.
A related primitive to proofs of data possession
is Private Information Retrieval (PIR) (Chor et al.,
1998), where a client wishes to retrieve records from
a server without the server knowing what item was
retrieved. In particular, we are interested in proto-
cols where there are several servers storing the same
database and that are not allowed to communicate
among themselves, such as the one from (Chor et al.,
1998). This protocol is similar to those described in
the introduction.
Efficient single server PIR is also possible. In
(Trostle and Parrish, 2011) Trostle and Parrish apply
a set of random coefficients to the database to return
a single noisy value. The noise can be subtracted in
an oblivious way. Single-server PIR does not lend itself well to our constructions, since the mechanism used for cancelling the noise works regardless of the correctness of a given block.
AMultiple-serverEfficientReusableProofofDataPossesionfromPrivateInformationRetrievalTechniques
309
4 PROPOSAL
In this section we present two proposals: the basic one uses a PIR protocol to sample some blocks and check them against values stored by P; the second one computes a function over all the blocks.
4.1 Sampling Scheme
The general idea of our construction is to store at P a
function of the blocks, and then ask the cloud servers
for the blocks to verify them locally. The reason we
need PIR to achieve this is twofold. First, PIR proto-
cols apply a function over all the blocks participating
in the test. Second, we do not want the cloud servers
to know what blocks have been requested by P. Oth-
erwise, a non-persistent attacker that is able to write
the same value at a given position for several servers,
can reduce the detection capabilities of the scheme.
We will now present our approach based on sampling:
Setup: A user sends to P a block B and a value i
denoting the absolute position of this block in the
file. P computes H(s,B||i) and stores it locally.
Here, H is a MAC function (e.g. HMAC) using a secret s over the data B||i. The operator || denotes concatenation. Finally, P uploads B to S.
Challenge: Each member of S models the L blocks $B_i$ of the file as a matrix; we call this matrix the matrix representation of the file:

$\mathbf{B} = \begin{pmatrix} B_1 & \cdots & B_{\sqrt{L}} \\ \vdots & \ddots & \vdots \\ B_{L-\sqrt{L}+1} & \cdots & B_L \end{pmatrix}.$    (3)

If L is not a square number, then fill the remaining positions of the matrix with a padding scheme over the incomplete columns. Then, P creates |S| random vectors $V^{[1]}, \ldots, V^{[|S|]}$ of length $\sqrt{L}$ such that

$\sum_{i=1}^{|S|} V^{[i]} = E^{[j]}$    (4)

where j is the row of the matrix to be retrieved, and $E^{[j]}$ is a vector full of 0s except at coordinate j where it is 1. Now P sends $V^{[i]}$ to $S_i$, $1 \le i \le |S|$, and asks each of them to compute the product $B^{[i]}_{[j,k]} = V^{[i]} \cdot B_{[:,k]}$ for all the columns $1 \le k \le \sqrt{L}$.
Verification: Once P has received all the responses from each server, it adds them to obtain the returned row j for each column k:

$B_{[j,k]} = \sum_{i=1}^{|S|} B^{[i]}_{[j,k]}.$    (5)

For each element $B_{[j,k]}$, P inverts the mapping function used for the matrix representation to obtain the real index $i'$ of this block. Next it computes $H(s, B_{i'} \| i')$ and verifies that it has been stored locally. If the previous test does not hold, then we know there is a problem in one of the blocks mapped to the corresponding column in the matrix representation. The column is considered correct otherwise.
Even though it would be tempting not to store the output of H at P for each block and to store it at the servers instead, that would make the system vulnerable to an attack where a previous version of $B_{[j,k]}$ is returned by the servers. In such a case, the MAC would verify correctly, without detecting that there is a new version of the block. The purpose of storing the output of H locally is therefore to guarantee freshness in the retrieved blocks.
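A minimal Java sketch of the proxy-side verification for one column is given below. It assumes the challenge shares were generated as in the sketch in the Introduction but summing to $E^{[j]}$, and treats a block as a single field element for brevity; in practice a 4 KB block spans many field elements and the same steps apply element-wise. All names are ours.

import java.math.BigInteger;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Proxy-side verification of one column in the sampling scheme.
public class SamplingVerifySketch {
    static final BigInteger P =
        new BigInteger("170141183460469231731687303715884105757"); // field prime (Sec. 5)

    // responses[i] is server S_i's answer V^[i] . B^[i]_[:,k] for column k.
    // key is the proxy's MAC secret s, absoluteIndex is the block's position i'
    // in the file, and storedMac is the value H(s, B_{i'} || i') kept locally.
    static boolean verifyColumn(BigInteger[] responses, byte[] key,
                                long absoluteIndex, byte[] storedMac) throws Exception {
        // Adding the per-server answers recovers the block B_[j,k] of row j.
        BigInteger block = BigInteger.ZERO;
        for (BigInteger r : responses) block = block.add(r).mod(P);

        // Recompute the keyed MAC over block || index and compare it with the
        // value stored at the proxy during setup.
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        mac.update(block.toByteArray());
        mac.update(Long.toString(absoluteIndex).getBytes("UTF-8"));
        return Arrays.equals(mac.doFinal(), storedMac);
    }
}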
Regarding the detection capabilities of this
scheme, it is different from a naive sampling scheme
where blocks are retrieved at random without PIR.
The reason is that all blocks involved in the proof are
included in the computation. If any single part of any
block at any of the servers has the wrong value, it will
be detected with high probability. In contrast, naive
sampling can only detect problems in the blocks that
were downloaded by P. We will now formalize this claim.
Claim 1. Our proposed system can detect bit decay
or adversarial modification of the file by an adversary
who does not corrupt all servers, with significantly
higher probability than the naive sampling scheme as
the file grows.
Proof. A scheme that samples random blocks from a file with L blocks can detect at least one defective block out of d defective blocks with probability:

$1 - \left(\frac{L-d}{L}\right)^{\tau}$    (6)

where $\tau$ is the number of sampled blocks. The expression follows from complementing the probability of always selecting a good element, given by $(L-d)/L$, on all the $\tau$ tests.

As the file length L grows, the detection probability of the naive sampling approach tends to 0, given that:

$\lim_{L \to \infty} 1 - \left(\frac{L-d}{L}\right)^{\tau} = 0.$    (7)
Now consider our method that retrieves the row j from the file matrix using PIR. Seen from the perspective of a single column k, the block $B_{[j,k]}$ is retrieved by computing:

$B_{[j,k]} = V^{[1]} \cdot B^{[1]}_{[:,k]} + \ldots + V^{[|S|]} \cdot B^{[|S|]}_{[:,k]}$    (8)

where $B^{[i]}_{[:,k]}$ is the k-th column of matrix B at the i-th server and $\sum_{i=1}^{|S|} V^{[i]} = E^{[j]}$. Suppose some element $B^{[i]}_{[r,k]}$ is different at one of the |S| servers, and write the modified element as $B^{[i]}_{[r,k]} = B_{[r,k]} + N_{[r,k]}$, where N is a column noise vector full of zeroes except at $N_{[r,k]} \neq 0$. The contribution of the i-th server to the sum becomes:

$V^{[i]} \cdot \left( B^{[i]}_{[:,k]} + N \right).$    (9)

By combining (9) and (8), it is possible to verify that the result retrieved by the client for the k-th column is:

$B_{[j,k]} + V^{[i]} \cdot N.$    (10)

Since $N_{[r,k]} \neq 0$, $V^{[i]} \cdot N$ can take any possible value of the finite field F where computations are being performed. The probability of getting a value that, when applied to H, provides the right result is less than or equal to $c/|H|$, where c is the maximum number of elements from F assigned to a single output of H. By selecting a proper size for |H|, c can be made close to 1 with overwhelming probability. Hence, we can detect random bit decay with probability:

$\frac{|F| - c}{|F|}.$    (11)

This proves that the detection capabilities of our scheme are better than in the naive sampling approach, regardless of the sampling parameters selected, for small d in the asymptotic case.
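As a concrete illustration, take the simulation parameters of Section 5 purely as an example: $L = 32768$ blocks, a single defective block $d = 1$, and $\tau = 182$ inspected blocks (roughly one row of the matrix representation). Naive sampling then detects the corruption with probability

$1 - \left(\frac{32767}{32768}\right)^{182} \approx 0.0055,$

whereas, for the 128-bit field of Section 5 and $c$ close to 1, the detection probability $(|F| - c)/|F|$ of (11) is approximately $1 - 2^{-128}$.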
It is important to note that the previous proof does not imply that the file is safe from an adversary that can modify the information at all the servers. Consider an adversary that modifies $B^{[i]}_{[r,k]}$ for all $i \in S$ and some $r \neq j$ with the same value. Since the challenge phase of our construction is designed to cancel the contribution of each of the servers when $r \neq j$, our test is only verifying that the non-retrieved rows store the same value at all the servers. Therefore, this method can only detect such a smart adversary when the exact row that was modified is retrieved. In terms of detection probability for this scenario, our scheme would be equivalent to the naive sampling scheme over each column of the matrix representation of the file.
4.2 A Scheme Against Smart
Adversaries
To address the concern of a smart adversary that modifies blocks using a well-defined strategy aimed at fooling the verifier, we can modify the system to return a result that includes information from all the given blocks in a column. The main observation of this scheme is that in a PIR protocol, we want random vectors with this property:

$\sum_{i=1}^{|S|} V^{[i]} = E^{[j]}.$    (12)

However, for our purposes of verifying information, what we want to compute is the result of applying a random vector V to the columns of the matrix representing the file, hence

$\sum_{i=1}^{|S|} V^{[i]} = V.$    (13)

This is equivalent to applying a dot product using replication as a way to mask the secret vector V from the servers.
The modified scheme is as follows:

Setup: P wishes to upload a file, which is modeled as a matrix B in the same way as in the Challenge phase of the sampling protocol. Then, P generates a random vector V and computes the dot product between V and each of the columns of B to generate the values

$\sigma_k = V \cdot B_{[:,k]}$    (14)

where $B_{[:,k]}$ represents the k-th column of matrix B. Then, P uploads the file to the members of S and stores the $\sigma_k$ values either locally or encrypted at S.

Challenge: P creates challenge vectors $V^{[i]}$ for each server such that

$\sum_{i=1}^{|S|} V^{[i]} = V.$    (15)

Each server applies vector $V^{[i]}$ to all the columns of its local version of the file $B^{[i]}$ and returns the values $\sigma^{[i]}_k$ for each column k.

Verification: Once P has received all the responses $\sigma^{[i]}_k$ from all the servers, it adds them in the following way:

$\sigma'_k = \sum_{i=1}^{|S|} \sigma^{[i]}_k$    (16)

to obtain the result of computing the MAC over the given column. If $\sigma_k = \sigma'_k$ for every column k, then all the servers are correct; otherwise, there is an error.
Similar to the previous scheme, since no information about V is revealed to any server, the same V can be reused many times without compromising its security.
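The proxy-side computation of this scheme amounts to the short routine below, a sketch under the same single-element-per-block simplification as before; all names are ours.

import java.math.BigInteger;

// Setup and verification at the proxy P for the scheme of Section 4.2.
public class TagSchemeSketch {
    static final BigInteger P =
        new BigInteger("170141183460469231731687303715884105757"); // field prime

    // Setup: one tag per column, sigma_k = V . B_[:,k] mod P.
    static BigInteger[] setupTags(BigInteger[] v, BigInteger[][] columns) {
        BigInteger[] sigma = new BigInteger[columns.length];
        for (int k = 0; k < columns.length; k++) {
            BigInteger acc = BigInteger.ZERO;
            for (int r = 0; r < v.length; r++)
                acc = acc.add(v[r].multiply(columns[k][r])).mod(P);
            sigma[k] = acc;
        }
        return sigma;
    }

    // Verification: responses[i][k] is sigma^[i]_k from server S_i; the sum
    // over servers must match the stored tag sigma_k for every column k.
    static boolean verify(BigInteger[][] responses, BigInteger[] sigma) {
        for (int k = 0; k < sigma.length; k++) {
            BigInteger acc = BigInteger.ZERO;
            for (BigInteger[] server : responses) acc = acc.add(server[k]).mod(P);
            if (!acc.equals(sigma[k])) return false; // corruption detected
        }
        return true;
    }
}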
AMultiple-serverEfficientReusableProofofDataPossesionfromPrivateInformationRetrievalTechniques
311
Compared to the previous scheme, it is not possible for an adversary to set a random value $B^{[i]}_{[r,k]}$ for all $i \in S$, because it would alter the total sum $\sigma_k$. This happens because $V_r \neq 0$ with high probability. We can avoid the possibility of $V_r = 0$ by selecting a larger field, or by sampling a different number from a pseudorandom function if the output in the sequence is 0.
One difference between this scheme and the previous one is that we are not returning actual blocks from the file, but rather a function of the blocks. For this reason, we need to prove that passing the test implies that the servers are storing a copy of the file.
Claim 2. A set of servers computing the function cor-
rectly can recover the file with high probability.
Proof. From the scheme's description it is possible to see that servers who have the correct file can compute the function correctly. Now consider a scenario where at least one server is missing one correct block in a column. There are two options for these servers to provide a satisfactory response:

1. Send a random value and expect that the result adds to the correct $\sigma_k$. This happens with probability $1/|F|$ because of the properties of the dot product and the combination procedure performed at P. By selecting a larger field size, the probability of this option succeeding becomes negligible.

2. Use previous responses to infer the right answer. This is possible whenever the new challenge vectors sent by P are linearly dependent on the previous ones. Assume the result for previous vectors $V^{[i]}, W^{[i]}$ was

$a_k = B^{[i]}_{[:,k]} \cdot V^{[i]}$ and $b_k = B^{[i]}_{[:,k]} \cdot W^{[i]}$.    (17)

Then, for a given vector $\alpha V^{[i]} + \beta W^{[i]}$, where $\alpha, \beta$ are coefficients, the result is $\alpha a_k + \beta b_k$. This result can be computed by a server even when $B^{[i]}_{[:,k]}$ is not available. Now, assume $q = |F|$ and $n = \sqrt{L}$. Then, in order to reply to any query from P, the server needs to have n vector-response pairs. However, if this information is available, the file can be recovered using Gaussian elimination. If fewer than n vector-response pairs are available, a copy of the file is still needed to reply to most queries.
To understand why this reasoning is true, consider the best scenario for a cheating server, that is: having the output of $n-1$ linearly independent vector-response pairs. In such a case, there are still $\lambda = q^n - q^{n-1}$ vectors that cannot be produced as a linear combination of the $n-1$ vector-response pairs owned by the server. Here $q^n$ is the total number of vectors of length n over F and $q^{n-1}$ is the number of linear combinations that can be formed with $n-1$ vectors. If we select a vector at random, the probability of choosing a vector for which the server does not have the necessary information to reply is:

$\frac{\lambda}{q^n} = \frac{q^n - q^{n-1}}{q^n} = 1 - \frac{1}{q}.$    (18)

Therefore, the probability of selecting a vector for which the server can reply correctly without having a copy of the file is:

$1 - \left( 1 - \frac{1}{q} \right) = \frac{1}{q}.$    (19)

This probability becomes very small as q grows. In addition, the vector-response representation of the file is not advantageous for the server, since it requires more storage than storing the column itself. The previous argument also holds even if the server uses smaller subvectors of the columns.
The drawback of this scheme is that when blocks are overwritten, we need to subtract the contribution of the previous version of the block $B_{[j,k]}$ from the corresponding $\sigma_k$, and then add the contribution of the new version of the block $B'_{[j,k]}$. The procedure is illustrated in the following equation:

$\sigma'_k = \sigma_k - V_j B_{[j,k]} + V_j B'_{[j,k]}.$    (20)

Unfortunately, this procedure involves downloading the current block $B_{[j,k]}$ from some member of S. For this reason, we believe this construction is better suited for systems where the blocks do not change often, as any update operation involves one additional read operation.
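For completeness, the tag maintenance of equation (20) is a one-line update once the old block has been fetched (a sketch; variable names are ours):

import java.math.BigInteger;

// Update sigma_k after block (j,k) changes from oldBlock to newBlock.
// vJ is the j-th coordinate of the secret vector V and p the field prime.
public class TagUpdateSketch {
    static BigInteger updateTag(BigInteger sigmaK, BigInteger vJ,
                                BigInteger oldBlock, BigInteger newBlock, BigInteger p) {
        // sigma'_k = sigma_k - v_j * oldBlock + v_j * newBlock  (mod p)
        return sigmaK.subtract(vJ.multiply(oldBlock))
                     .add(vJ.multiply(newBlock)).mod(p);
    }
}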
Up to this point, both proposals can detect whether
at least one of the servers is storing corrupt data, but
they do not identify which one exactly. In the next
section we will see how to find the corrupt servers
from the received responses.
4.3 Finding the Corrupt Servers
The two proposed schemes involve hiding the secret vector by splitting it into several vectors; shares of this vector are sent to the different nodes. This is a particular case of a Threshold Scheme (Shamir, 1979), where n shares are created and at least $t + 1 \le n$ of them are needed to recover the secret.

One could try several approaches to find the corrupt server, such as repeating the test with a different subset of them. Unfortunately, this is not efficient. A better approach is presented in (Goldberg, 2007): assume that our secret vector V has |V| coordinates $v_1, \ldots, v_{|V|}$. For each coordinate of V, create a polynomial of degree $t < |S| - 1$:

$f_j(x) = a_{j,t} x^t + \ldots + a_{j,1} x + a_{j,0}$    (21)

where $a_{j,0} = v_j$ (the secret to be shared) and the other coefficients $a_{j,k}$, $k \neq 0$, are selected randomly. Here, $t + 1$ denotes the minimum number of servers needed to recover V. Each server receives a vector of the form

$\left( f_1(c_i), \ldots, f_{|V|}(c_i) \right)$    (22)

where $c_i$ is a random evaluation point used for server i for this particular test. P receives the dot product of this vector with the column k of the file matrix:

$B^{[1]}_{[j,k]} f_j(c_1), \ldots, B^{[|S|]}_{[j,k]} f_j(c_{|S|}).$    (23)

By selecting any $t + 1$ of them, it is possible to interpolate and recover $B_{[j,k]} a_{j,0}$, and given that $a_{j,0}$ is known by P, it is possible to recover $B_{[j,k]}$. Since we are working in a scenario where we have evaluations of the same polynomial at different points, we need to solve an interpolation-with-errors problem. This is equivalent to decoding Reed-Solomon codes.
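The sketch below illustrates the sharing of one coordinate $v_j$ and the Lagrange interpolation of the free term from $t+1$ responses; it ignores error correction (which, as noted, requires Reed-Solomon decoding) and all names are ours.

import java.math.BigInteger;
import java.security.SecureRandom;

// Shamir-style sharing of a single coordinate v_j and recovery of the free
// term f_j(0) = v_j from t+1 evaluations (no error correction shown).
public class PolynomialShareSketch {
    static final BigInteger P =
        new BigInteger("170141183460469231731687303715884105757");
    static final SecureRandom RNG = new SecureRandom();

    // Coefficients a_{j,0..t} with a_{j,0} = secret and the rest random.
    static BigInteger[] share(BigInteger secret, int t) {
        BigInteger[] a = new BigInteger[t + 1];
        a[0] = secret;
        for (int k = 1; k <= t; k++) a[k] = new BigInteger(P.bitLength(), RNG).mod(P);
        return a;
    }

    // Horner evaluation of the polynomial at point c (what server S_i receives).
    static BigInteger eval(BigInteger[] a, BigInteger c) {
        BigInteger y = BigInteger.ZERO;
        for (int k = a.length - 1; k >= 0; k--) y = y.multiply(c).add(a[k]).mod(P);
        return y;
    }

    // Lagrange interpolation at x = 0 from points (c[i], y[i]), recovering a_{j,0}.
    static BigInteger interpolateAtZero(BigInteger[] c, BigInteger[] y) {
        BigInteger result = BigInteger.ZERO;
        for (int i = 0; i < c.length; i++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int m = 0; m < c.length; m++) {
                if (m == i) continue;
                num = num.multiply(c[m].negate().mod(P)).mod(P); // (0 - c_m)
                den = den.multiply(c[i].subtract(c[m])).mod(P);  // (c_i - c_m)
            }
            result = result.add(y[i].multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        return result;
    }
}

In the protocol every coordinate of V is shared in this way, and each server returns the dot product of its evaluation vector with a file column, as in equations (22)-(24); interpolating the free term of the responses then yields the quantities used in the verification.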
Given the previous presentation of Goldberg's scheme for PIR, it is clear that it can be applied to our sampling scheme. For the scheme considering smart adversaries, the element returned from the k-th column of the i-th server will be of the form

$\sum_{j=1}^{|V|} B^{[i]}_{[j,k]} f_j(c_i).$    (24)

Once we select the result of column k at $t + 1$ servers, the result of this sum over the free term, once interpolation is performed, is given by

$\sum_{j=1}^{|V|} B^{[i]}_{[j,k]} a_{j,0} = \sum_{j=1}^{|V|} B^{[i]}_{[j,k]} v_j = \sigma_k,$    (25)

which is the same value we expected to obtain in the scheme using vector addition. If some nodes return different values, then a decoding procedure that can find the errors can be applied to identify the corrupt servers.
In terms of bandwidth usage, the advantage of using a more general secret sharing scheme is that we only need to send |S| vectors to the servers and we can determine which nodes are transmitting correct values to P. In terms of security, it prevents corrupt servers from performing attacks based on P's reaction to wrong responses. One instance of such an attack is presented in (Patterson and Sassaman, 2007), where a covert channel for the servers is created. The disadvantage of this scheme is that we now need to evaluate a degree-t polynomial |S| times for every coordinate of the secret vector V. This makes the protocol more computationally intensive for P.
5 SIMULATION
The simulation environment consisted of a Windows
8.1 machine with an Intel Core i7 3770 CPU run-
ning at 3.4 GHz and 16 GB of RAM. The program-
ming environment was the Java JDK 1.7.0 release
21. We used the JDK’s BigInteger class for large
number arithmetic and set the prime for the field to 170141183460469231731687303715884105757, the first prime number larger than $2^{127}$. The purpose of this implementation was to show that the proposal is practical, rather than to provide an optimized version of it.
We used a file with 32768 blocks of size 4 KB. For these parameters, the matrix representation had 182 blocks per column. Each column had in total e = 46592 elements, each of size 17 bytes. These parameters correspond to a file with a length of roughly 128 MB. We experimented with Shamir's scheme involving polynomial evaluation and with the scheme where vectors are added. Shamir's scheme was considerably slower, taking 2421 ms for 10 servers and 542 ms for 3 servers; the vector scheme took 990 ms and 400 ms, respectively. The evaluation routine for Shamir's scheme was performed using Horner's rule. The dot product step on the matrix took 5172 ms, for an effective throughput of 24.74 MB/s. The total amount of information transmitted for verification in the 3-server scenario was 4.26 MB, corresponding to 3.3% of the total file. The time needed to upload and download the data was not included, since it varies according to the network.
To compare our construction with an existing one,
we implemented the private verification scheme from
(Shacham and Waters, 2008). We set the transmis-
sion overhead to a constant at the cost of increasing
the storage at the server to twice the original size of
the file. This was done to compare against the most
transmission-efficient version of the scheme. The
field used for computations was the same. Throughput on the server side was 12.1 MB/s, given that the server needs to generate random numbers. Generation of the challenge by the client is significantly faster, since it only involves sending the seeds of a random generator.
AMultiple-serverEfficientReusableProofofDataPossesionfromPrivateInformationRetrievalTechniques
313
6 CONCLUSIONS
We presented two PDPs based on fast multi-server
PIR that have several desirable properties and whose
complexity is sublinear in the size of the file. We
showed that the proposals can detect data corruption
due to random failures with high probability. One of
the proposals can work with dynamic files and has a
very fast setup stage that only involves a hash func-
tion. Its drawback is that it cannot detect corrup-
tion when an attacker modifies the servers in a coordi-
nated fashion. This drawback is solved in the second
scheme; however, the scheme pays a penalty when it
is used for dynamic files, by requiring an additional
read operation. The downside of both schemes is the
size of the transmitted information and lack of secure
public verifiability.
REFERENCES
Apon, D., Katz, J., Shi, E., and Thiruvengadam, A.
(2014). Verifiable oblivious storage. In Public-Key
Cryptography–PKC 2014, pages 131–148. Springer.
Ateniese, G., Burns, R., Curtmola, R., Herring, J., Kissner,
L., Peterson, Z., and Song, D. (2007). Provable data
possession at untrusted stores. In Proceedings of the
14th ACM Conference on Computer and Communica-
tions Security, CCS ’07, pages 598–609.
Ateniese, G., Di Pietro, R., Mancini, L. V., and Tsudik, G.
(2008). Scalable and efficient provable data posses-
sion. In Proceedings of the 4th international confer-
ence on Security and privacy in communication net-
works, page 9. ACM.
Bowers, K. D., Juels, A., and Oprea, A. (2009). Hail: a
high-availability and integrity layer for cloud storage.
In Proceedings of the 16th ACM conference on Com-
puter and communications security, pages 187–198.
ACM.
Cash, D., Küpçü, A., and Wichs, D. (2013). Dynamic
proofs of retrievability via oblivious ram. In Advances
in Cryptology–EUROCRYPT 2013, pages 279–295.
Springer.
Chor, B., Kushilevitz, E., Goldreich, O., and Sudan, M.
(1998). Private information retrieval. Journal of the
ACM (JACM), 45(6):965–981.
Corena, J. C. and Ohtsuki, T. (2013). Proofs of data posses-
sion and pollution checking for regenerating codes. In
Global Communications Conference (GLOBECOM),
2013 IEEE, pages 2717–2722.
Curtmola, R., Khan, O., Burns, R., and Ateniese, G. (2008).
Mr-pdp: Multiple-replica provable data possession.
In Distributed Computing Systems, 2008. ICDCS’08.
The 28th International Conference on, pages 411–
420. IEEE.
Dodis, Y., Vadhan, S., and Wichs, D. (2009). Proofs of
retrievability via hardness amplification. In Theory of
Cryptography, pages 109–127. Springer.
Goldberg, I. (2007). Improving the robustness of private
information retrieval. In Security and Privacy, 2007.
SP’07. IEEE Symposium on, pages 131–148. IEEE.
Hanser, C. and Slamanig, D. (2013). Efficient simultaneous
privately and publicly verifiable robust provable data
possession from elliptic curves. In SECRYPT 2013,
pages 15–26. SciTePress.
Le, A. and Markopoulou, A. (2012). Nc-audit: Auditing for
network coding storage. In Network Coding (NetCod),
2012 International Symposium on, pages 155–160.
Olumofin, F. and Goldberg, I. (2012). Revisiting the com-
putational practicality of private information retrieval.
In Financial Cryptography and Data Security, pages
158–172. Springer.
Patterson, M. L. and Sassaman, L. (2007). Subliminal chan-
nels in the private information retrieval protocols. In
Proceedings of the 28th Symposium on Information
Theory in the Benelux, NL.
Raphael, J. (2013). The worst cloud outages of 2013 (so far). http://www.infoworld.com/slideshow/107783/the-worst-cloud-outages-of-2013-so-far-221831. Accessed: April 9th, 2014.
Shacham, H. and Waters, B. (2008). Compact proofs of
retrievability. In Advances in Cryptology-ASIACRYPT
2008, pages 90–107. Springer.
Shamir, A. (1979). How to share a secret. Communications
of the ACM, 22(11):612–613.
Sion, R. and Carbunar, B. (2007). On the computational
practicality of private information retrieval. In Pro-
ceedings of NDSS.
Tamassia, R. (2003). Authenticated data structures. In
Algorithms-ESA 2003, pages 2–5. Springer.
Trostle, J. and Parrish, A. (2011). Efficient computationally
private information retrieval from anonymity or trap-
door groups. In Information Security, pages 114–128.
Springer.
Wang, C., Wang, Q., Ren, K., and Lou, W. (2010). Privacy-
preserving public auditing for data storage security in
cloud computing. In INFOCOM, 2010 Proceedings
IEEE, pages 1–9.
Xu, J. and Chang, E.-C. (2012). Towards efficient proofs of
retrievability. In Proceedings of the 7th ACM Sympo-
sium on Information, Computer and Communications
Security, pages 79–80. ACM.
Zhu, Y., Ahn, G.-J., Hu, H., Yau, S. S., An, H. G., and Hu,
C.-J. (2013). Dynamic audit services for outsourced
storages in clouds. Services Computing, IEEE Trans-
actions on, 6(2):227–238.
SECRYPT2014-InternationalConferenceonSecurityandCryptography
314