Solving Set Relations with Secure Bloom Filters Keeping Cardinality Private

Louis Tajan¹, Dirk Westhoff¹ and Frederik Armknecht²

¹ Hochschule Offenburg, Offenburg, Germany
² University of Mannheim, Mannheim, Germany

Keywords: Bloom Filters, Set Operations, Set Relations, Outsourced Computation.

Abstract: We propose a protocol to solve privacy-preserving set relations performed by a third party in an outsourced configuration. We argue that solving the disjointness relation based on Bloom filters is a new contribution, in particular because it adds another layer of privacy on the sets' cardinality. We propose to compose the set relations in a slightly different way by applying a keyed hash function. Besides discussing the correctness of the set relations, we analyze how this impacts the privacy of the sets' content as well as how it provides privacy on the sets' cardinality. We are in particular interested in how overlapping bits in the Bloom filters impact the privacy level of our approach. Finally, we present our results with real-world parameters in two concrete scenarios.

1 INTRODUCTION

In the work at hand, we propose a protocol to solve what we call the private outsourced disjointness test. We consider two parties, Alice and Bob, each owning a dataset of elements, respectively A and B. They would like to know whether their datasets are disjoint, i.e. whether $A \cap B = \emptyset$. To do so, they ask a third party, Server, to perform the verification. Alice (resp. Bob) does not want to reveal any information on her (resp. his) dataset, including its size, to any other party.

We propose a solution based on the Bloom filter (Bloom, 1970) representation along with keyed hash functions such as HMAC. We also propose to apply this approach to a more straightforward set relation, the private outsourced inclusiveness test. Bloom filters are space-efficient data structures that represent sets and allow set membership checks. One may argue that a simple pseudonymization could be sufficient for the above sketched scenario, i.e. solely applying a keyed hash function to the sets (Churches and Christen, 2004). However, even if the pseudonymization function remains private to any party other than the Bloom filter owners, one may directly gain knowledge of the number of common elements of two Bloom filters. Such a naive approach also reveals which pseudonym is present in none, one or both sets. In contrast, the Bloom filter representation has the particular feature of adding obfuscation to the sets. We argue that, contrary to multi-party based solutions (Kissner and Song, 2005), it is relevant for such a protocol class to be non-interactive. For instance, it could be applied to scenarios of mobile user tracking (Tajan and Westhoff, 2019) or to cloud auditing use cases (Tajan et al., 2016) where a third party auditor should perform verification on logfiles and whitelists.

The concrete contributions of our work are threefold. Firstly, we allow a third party entity to compute set relations, namely inclusiveness and disjointness, in an outsourced model. To do so, we tune the Bloom filter approach by enhancing its privacy with respect to the sets' content. Such an approach also provides privacy on the sets' cardinality. Secondly, we present an attack to gain the sets' cardinality in the present configuration. By analyzing the behavior of overlapping bits in the Bloom filter environment, we show what amount of information such an attack may provide. Finally, we implemented our solution and present the results obtained for the concrete cloud security audit on access control use case with real-world parameters.

2 PRELIMINARIES

In this section we introduce the type of operations we are performing on sets and how we represent them using the Bloom filter approach.

Tajan, L., Westhoff, D. and Armknecht, F.

Solving Set Relations with Secure Bloom Filters Keeping Cardinality Private.

DOI: 10.5220/0009835904430450

In Proceedings of the 17th International Joint Conference on e-Business and Telecommunications (ICETE 2020) - SECRYPT, pages 443-450

ISBN: 978-989-758-446-6

Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

2.1 Set Operations and Relations

Multiple types of operations can be performed on sets, but in this work we aim to test set relations. Some of them can be reduced to computing the cardinality of the result of a set operation. For instance, being able to compute the cardinality of the intersection of two sets indicates whether they have elements in common or whether one set is included in the other. For privacy reasons, it could be of interest to reveal solely the cardinality of the intersection instead of the intersection itself. Therefore, we propose a solution to solve two kinds of set relations, namely inclusiveness and disjointness.

2.2 Bloom Filters

A Bloom filter is a data structure introduced by Burton Howard Bloom in 1970 (Bloom, 1970). It is used to represent a set of elements. With a Bloom filter representing a certain set, one can verify whether an element is a member of this set. Such a data structure consists of an array of m bits which is associated with $n_{key}$ public hash functions. At first, all m bits are initialized to 0. Two functions, add() and test(), are available. To add an element to the Bloom filter, one computes the hashes of this element with each of the $n_{key}$ hash functions and sets the bit to 1 at each position corresponding to a hash value. To test whether an element is included in the Bloom filter with the test() function, one computes, in the same manner, the hash values of this element and verifies whether the respective bits are set to 1. If at least one of these bits is set to 0, then with certainty the tested element is not a member of the set represented by the Bloom filter (i.e. test() cannot produce a false negative). In contrast, with some probability, the test() function can return a false positive: even if all the bits that have been verified are set to 1, the tested element may not be part of the set represented by the Bloom filter. To express the probability of having a false positive when performing test(), we introduce the notion of an overlapping bit.

Definition 1 (Overlapping Bit). An overlapping bit occurs when, while adding an element to a Bloom filter, a certain bit has to be set to 1 but this bit is already set to 1.

The probability of having an overlapping bit is zero when the Bloom filter is still blank, and it grows with the number of inserted elements. We express this probability as follows, with $X_{BF_A}$ the number of bits already set to 1 in $BF_A$ at a specific point in time:

$$P_{ob} = \frac{X_{BF_A}}{m} \qquad (1)$$

We can then express the average number of different bits added to the Bloom filter when adding one new element to it:

$$X_{add} = n_{key} + \sum_{i=1}^{n_{key}} (-1)^i \cdot \frac{\sum_{j=i}^{n_{key}-1} \binom{j}{i}}{m^i} \qquad (2)$$

We can generalize it to the average number of bits added to the Bloom filter when adding N new elements to it:

$$X_{add}(N) = n_{key} \cdot N + \sum_{i=1}^{n_{key} \cdot N} (-1)^i \cdot \frac{\sum_{j=i}^{(n_{key} \cdot N)-1} \binom{j}{i}}{m^i} \qquad (3)$$
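As a sanity check on Equations (2) and (3), the following minimal Python sketch evaluates the alternating series exactly and compares it with the standard closed form $m \cdot (1 - (1 - 1/m)^t)$ for $t = n_{key} \cdot N$ uniform hash evaluations into an initially empty filter. Reading the stacked "j over i" term as the binomial coefficient $\binom{j}{i}$ is an assumption of this sketch, and the function names `x_add` and `x_add_closed` are hypothetical.

```python
from fractions import Fraction
from math import comb


def x_add(n_key: int, m: int, n_elements: int = 1) -> Fraction:
    """Series form of Eq. (2)/(3): expected number of distinct bits set."""
    t = n_key * n_elements  # total number of hash evaluations
    total = Fraction(t)
    for i in range(1, t + 1):
        # Inner sum over j = i .. t-1 (empty, hence 0, when i == t).
        inner = sum(comb(j, i) for j in range(i, t))
        total += Fraction((-1) ** i * inner, m ** i)
    return total


def x_add_closed(n_key: int, m: int, n_elements: int = 1) -> Fraction:
    """Closed form m * (1 - (1 - 1/m)^t), used only as a cross-check."""
    t = n_key * n_elements
    return m * (1 - (1 - Fraction(1, m)) ** t)


if __name__ == "__main__":
    # Three hash evaluations into m = 10 bits: 3 - 3/10 + 1/100.
    print(x_add(3, 10), x_add_closed(3, 10))
```

Both forms agree exactly for small parameters, which suggests that the closed form can be used when $n_{key} \cdot N$ is too large for the double sum to be practical.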

By observing the current state of a Bloom filter representing a finite set A of $n_A$ elements, one can express the exact number of overlapping bits as the value $Y_{BF_A}$:

$$Y_{BF_A} = (n_A \cdot n_{key}) - X_{BF_A} \qquad (4)$$

2.3 Adversary Models

We describe in this section how the different parties

should behave regarding the security of the protocol.

2.3.1 Alice and Bob - Honest-but-Curious Adversaries

We consider that Alice and Bob act according to the protocol description. Even if learning information on the other party's dataset could be of interest to them, we consider that Alice's and Bob's main objective is to retrieve the result of the set relations, and therefore they faithfully provide correct inputs.

2.3.2 Server - Malicious Adversary

In contrast, the third party is not trustworthy and may behave arbitrarily. Slightly differently from the classic definition of a malicious adversary (Goldreich, 2004), we consider here solely its privacy aspect. Indeed, we do not consider any alteration of the final result by the server. We assume that the server's objective is to gain any information on the sets of Alice and Bob, and not to refuse or abort the protocol prematurely. Therefore, to find information on their datasets, the server (or any other malicious adversary) could try to generate its own Bloom filter and perform the set relations on it along with a Bloom filter from one of the parties.

3 RELATED WORK

To the best of our knowledge, this work is the first to solve the set disjointness relation in an outsourced

SECRYPT 2020 - 17th International Conference on Security and Cryptography


configuration using a Bloom filter construction while providing privacy on the sets' sizes. Computing the disjointness test as a two-party secure computation problem has been proposed in several papers, based on homomorphic encryption and Pedersen commitments (Freedman et al., 2004; Kiayias and Mitrofanova, 2005), on “testable and homomorphic commitments” on polynomial representations (Hohenberger and Weis, 2006), and on the Sylvester matrix and Lagrange interpolation (Ye et al., 2008), but none of these is adapted to a third party scenario or provides privacy on the sets' sizes. Using Bloom filters to process operations on sets has already been proposed. In addition to not providing any solution to the private disjointness test, (Burkhart and fontas, 2012) proposes a solution for multiparty computation, while (Dong et al., 2013; Pinkas et al., 2014) target a two-party protocol. In (Kerschbaum, 2012), Kerschbaum proposes an “outsourced” version of his protocol, but it requires homomorphic multiplications between Bloom filters of encrypted elements. Also, the protocol prevents the server from learning the intersection size. Solving the set intersection cardinality implies solving the disjointness problem, as in (Egert et al., 2015), but this work relies on knowing the sizes of the sets and does not fit our outsourced configuration.

Adding privacy to Bloom filters has been investigated in several works. We explain in what way these existing solutions do not fit our requirements and how our solution therefore brings novelty. First of all, we highlight the fact that none of the following solutions provides privacy on the sets' cardinality. In (Goh, 2003), Goh associates Bloom filters with a keyed pseudo-random function to allow private membership testing in the Bloom filter. This is in particular one operation we do not want the third party to be able to perform. In (Li and Gong, 2012), the authors present a construction of Bloom filters along with HMAC in a wireless sensor aggregation scenario. Their approach is somewhat similar, but the base station shares HMAC keys directly with each of the nodes. Therefore, merging Bloom filters from different nodes does not allow any operation, since different keys are used. In (Qiu et al., 2007), still by combining the Bloom filter approach with a keyed hash function, the authors propose a solution to compute the membership of elements in a set. They manipulate Bloom filters of single elements, which leaks the number of elements and could be very costly, especially when considering thousands of them.

4 PROTOCOL

We emphasize that, to process the two set relations, the considered Bloom filters must be generated in the same way, namely with the same size m, keyed hash function and set of keys K. First we recall the two privacy enhancements obtained by tuning the classical use of Bloom filters. Then we present the two set relations before explaining how the parameters should be selected to guarantee a certain level of correctness.

4.1 Privacy Enhancements

Our approach to make such a technique suitable for privacy-sensitive use cases is based on the use of a public keyed collision-resistant hash function (e.g. a MAC) with a set of $n_{key}$ private keys instead of the $n_{key}$ public hash functions. Without loss of generality, we use an HMAC function to solve the two set relations. Consequently, any party that does not hold the keys cannot use the test() function to directly verify whether a specific element is included in the Bloom filter. The other security benefit of using an HMAC function is that, even if the function is publicly released, any party that does not hold the keys cannot add additional elements. More formally, we define a Bloom filter of a set $A = \{a_1, \dots, a_{n_A}\}$ as an array of m bits, with a set of $n_{key}$ keys $K = \{k_1, \dots, k_{n_{key}}\}$ and an HMAC function $h_{k_\kappa} : \{0, 1\}^* \to \{1, \dots, m\}$ with $k_\kappa \in K$, as:

$$BF_{A,(h_{k_\kappa})_{k_\kappa \in K}} = \left(bf_A[j]\right)_{1 \le j \le m} \qquad (5)$$

where $bf_A[j] = 1$ if $\exists\, (i, \kappa)$ s.t. $h_{k_\kappa}(a_i) = j$, and $bf_A[j] = 0$ otherwise.

In the remaining parts of this work, we use the simplified notation $BF_A$ (resp. $BF_B$) to represent the Bloom filter of set A (resp. set B). The second privacy enhancement we add to the use of Bloom filters consists in keeping parameter $n_{key}$ private to avoid revealing the sets' cardinalities. Indeed, the naive technique to retrieve the cardinality of a set by looking at its Bloom filter would be to divide its number of bits set to 1 by parameter $n_{key}$. There exists an optimized technique introduced by Swamidass and Baldi (Swamidass and Baldi, 2007) which computes $n^*_A$, an approximation of the number of distinct elements inserted in $BF_A$, with $X_{BF_A}$ the number of bits set to 1 in the Bloom filter:

$$n^*_A = -\frac{m}{n_{key}} \ln\left[1 - \frac{X_{BF_A}}{m}\right] \qquad (6)$$

Such a technique requires even more overlapping bits to mislead the attacker. We argue that, by keeping parameter $n_{key}$ private, an attacker is no longer able to compute $n^*_A$. One may object to the complexity of keeping the size of K private or to the effort of storing a large number of keys. We could then slightly modify the protocol to use a unique key k: the output of $h_k(x)$ is divided into $n_{key}$ equal-size fragments, and each fragment indicates an index of the Bloom filter to set.
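The single-key variant can be illustrated with a short Python sketch. It assumes HMAC-SHA512 as the keyed function, splits the 64-byte digest into $n_{key}$ equal-size byte fragments, and reduces each fragment modulo m to obtain an index in 0..m-1. The modulo reduction, the choice of SHA-512 and the name `fragment_indexes` are assumptions made for illustration and are not taken from the protocol description.

```python
import hashlib
import hmac


def fragment_indexes(key: bytes, element: bytes, n_key: int, m: int) -> list[int]:
    """Derive n_key Bloom filter indexes from a single HMAC output."""
    digest = hmac.new(key, element, hashlib.sha512).digest()  # 64 bytes
    frag_len = len(digest) // n_key  # equal-size fragments, remainder dropped
    if frag_len == 0:
        raise ValueError("digest too short for this many fragments")
    return [
        # Assumption: each fragment is mapped to an index by reduction mod m.
        int.from_bytes(digest[i * frag_len:(i + 1) * frag_len], "big") % m
        for i in range(n_key)
    ]


if __name__ == "__main__":
    print(fragment_indexes(b"shared-secret", b"192.0.2.17", n_key=8, m=1000))
```

A single 64-byte digest only yields a few dozen fragments, so for $n_{key}$ in the hundreds a longer keyed output would be required; how that output would be obtained is left open here.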

4.2 Initialization

$h, n_{key}, m, K \leftarrow$ Setup: Alice first chooses and generates the Bloom filter parameters: the dimension m, the HMAC function h, the number of keys $n_{key}$ and the set of keys $K = \{k_1, \dots, k_{n_{key}}\}$. She generates the parameters by performing the following protocol:

• Randomly choose $n_{key} \in [n^L_{key}; n^U_{key}]$ with integers $n_{key}$, $n^L_{key}$ and $n^U_{key}$.
• Set m such that $X_{\cap=\emptyset} < n^L_{key}$.

Values $n^L_{key}$ and $n^U_{key}$ are public and represent the value space of $n_{key}$. We determine them later considering correctness and privacy in Sections 4.5 and 5. The restriction on parameter m corresponds to a correctness consideration on $X_{\cap=\emptyset}$ which we explain in more detail in Section 4.5.2. Then Alice selects the public HMAC function h, generates its $n_{key}$ respective keys and privately shares parameters $\{h, n_{key}, m, K\}$ with Bob.

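$BF_A \leftarrow$ Create(A): Alice (resp. Bob) generates the Bloom filter of her dataset $A = \{a_1, \dots, a_{n_A}\}$ (resp. his dataset $B = \{b_1, \dots, b_{n_B}\}$):

$$BF_A = BF_{A,(h_{k_\kappa})_{k_\kappa \in K}} = \left(bf_A[j]\right)_{1 \le j \le m}$$

$$\text{(resp. } BF_B = BF_{B,(h_{k_\kappa})_{k_\kappa \in K}} = \left(bf_B[j]\right)_{1 \le j \le m}\text{)}$$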

4.3 Inclusiveness Protocol

This operator allows to verify whether one set is included in another. It operates directly on the Bloom filters of the respective sets. To determine whether A is included in B, we define $BF_{A \subseteq B} \leftarrow INC(BF_A, BF_B)$:

$$\left(bf_{A \subseteq B}[j]\right)_{1 \le j \le m} \leftarrow INC(BF_A, BF_B) \qquad (7)$$

where $bf_{A \subseteq B}[j] \leftarrow 0$ if $(bf_A[j] = 1 \wedge bf_B[j] = 0)$, and $bf_{A \subseteq B}[j] \leftarrow 1$ otherwise.

We remark that this operator is equivalent to the following combination of bit-wise binary operators:

$$INC(BF_A, BF_B) \equiv \neg(BF_A) \;\mathrm{OR}\; BF_B \qquad (8)$$

Server first computes the inclusion protocol on the two respective Bloom filters of sets A and B to test whether $A \subseteq B$, namely whether all the elements from Alice's set are included in Bob's set:

$$INC(BF_A, BF_B) = BF_{A \subseteq B} = \left(bf_{A \subseteq B}[j]\right)_{1 \le j \le m}$$

Then Server computes $X_{A \subseteq B}$, which corresponds to the number of bits set to 1 in the resulting Bloom filter:

$$X_{A \subseteq B} = \sum_{j=1}^{m} bf_{A \subseteq B}[j] \qquad (9)$$

Server tests whether $X_{A \subseteq B} = m$, in which case it concludes that $A \subseteq B$ provided no false positive occurred. Otherwise we have $A \nsubseteq B$ with certainty.
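The server-side inclusiveness test of Equations (7) to (9) reduces to two bit-wise operations and a population count. The sketch below assumes that each Bloom filter is given as a Python integer whose bit j-1 holds $bf[j]$; the function names are hypothetical.

```python
def inc(bf_a: int, bf_b: int, m: int) -> int:
    """INC(BF_A, BF_B) = NOT(BF_A) OR BF_B, restricted to m bits (Eq. 8)."""
    mask = (1 << m) - 1  # keep only the m filter positions after negation
    return (~bf_a & mask) | bf_b


def is_included(bf_a: int, bf_b: int, m: int) -> bool:
    """True when X_{A subset B} = m, i.e. A is included in B up to false positives."""
    return bin(inc(bf_a, bf_b, m)).count("1") == m


if __name__ == "__main__":
    bf_a = 0b00010100
    bf_b = 0b10010110  # every bit of bf_a is also set here
    bf_c = 0b00000110  # bit 4 of bf_a is missing here
    print(is_included(bf_a, bf_b, 8), is_included(bf_a, bf_c, 8))
```

A `False` result is certain, while a `True` result may be a false positive, as discussed for the correctness of the inclusiveness relation.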

4.4 Disjointness Protocol

This set relation allows to verify that no element of one set is included in another set. In other words, it allows to claim that two sets are disjoint. This test function is not trivial: with Bloom filters, it is not sufficient to highlight the cases where a bit 1 has been inserted at the same index in the two respective Bloom filters. We define this operator as $BF_{A \cap B = \emptyset} \leftarrow DIS(BF_A, BF_B)$:

$$\left(bf_{A \cap B = \emptyset}[j]\right)_{1 \le j \le m} \leftarrow DIS(BF_A, BF_B) \qquad (10)$$

where $bf_{A \cap B = \emptyset}[j] \leftarrow 1$ if $(bf_A[j] = 1 \wedge bf_B[j] = 1)$, and $bf_{A \cap B = \emptyset}[j] \leftarrow 0$ otherwise.

We remark that this operator is equivalent to the bit-wise logical-and operator:

$$DIS(BF_A, BF_B) \equiv BF_A \;\mathrm{AND}\; BF_B \qquad (11)$$

To verify that no element from Alice's dataset is included in Bob's, Server performs the disjointness relation on the respective Bloom filters of A and B:

$$DIS(BF_A, BF_B) = BF_{A \cap B = \emptyset} = \left(bf_{A \cap B = \emptyset}[j]\right)_{1 \le j \le m}$$

Then Server computes $X_{A \cap B = \emptyset}$, which corresponds to the number of bits set to 1 in the resulting Bloom filter:

$$X_{A \cap B = \emptyset} = \sum_{j=1}^{m} bf_{A \cap B = \emptyset}[j] \qquad (12)$$

Server compares it with $n^L_{key}$:

if $X_{A \cap B = \emptyset} < n^L_{key}$ then A and B are disjoint;
if $X_{A \cap B = \emptyset} \ge n^L_{key}$ then A and B have at least one element in common.

Indeed, for each element which is included in both sets, we get $n_{key}$ times a bit set to 1 in the resulting Bloom filter. However, we could still get a bit set to 1 in the result due to a bit set to 1 in both $BF_A$ and $BF_B$ stemming from different elements originally added to the Bloom filters. We call such a case a false negative for the disjointness relation, since the auditor will state that the sets are not disjoint while they are. We discuss its probability of occurrence in the following sections.
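The disjointness test of Equations (10) to (12) can be sketched in the same style. The Bloom filters are assumed to be given as Python integers used as bit vectors, and `n_key_lower` stands for the public bound $n^L_{key}$. Treating a count exactly equal to the bound as "not disjoint" is an assumption of this sketch, as are the function names.

```python
def dis(bf_a: int, bf_b: int) -> int:
    """DIS(BF_A, BF_B) = BF_A AND BF_B (Eq. 11)."""
    return bf_a & bf_b


def are_disjoint(bf_a: int, bf_b: int, n_key_lower: int) -> bool:
    """Compare the number of common 1-bits (Eq. 12) with the public bound."""
    x = bin(dis(bf_a, bf_b)).count("1")
    # Fewer than n^L_key common bits cannot stem from a shared element,
    # since every shared element contributes n_key >= n^L_key bit writes.
    return x < n_key_lower


if __name__ == "__main__":
    bf_a = 0b00110101
    print(are_disjoint(bf_a, 0b11000101, n_key_lower=3))  # 2 common bits
    print(are_disjoint(bf_a, 0b00110111, n_key_lower=3))  # 4 common bits
```

Note that the server only needs the public value $n^L_{key}$ for this comparison, not the secret $n_{key}$ itself.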


4.5 Correctness of the Set Relations

In this section we consider the correctness of our two proposed relations. We recall that the Bloom filter approach allows false positives but no false negatives for the test() function. Nevertheless, we focus here on the overlapping bits of the Bloom filters resulting from our set relations.

4.5.1 Correctness of the Inclusiveness Relation

For the inclusiveness relation, only false positives can happen, not false negatives. Indeed, after performing $INC(BF_A, BF_B)$, if there is an index j with $bf_{A \subseteq B}[j] = 0$, we have $bf_A[j] = 1$ and $bf_B[j] = 0$; then, with certainty, at least one element from A does not belong to B. Concretely, if the outcome of the auditing process states that $A \nsubseteq B$, then we have a probability of correctness of 1. On the other hand, if we get $A \subseteq B$ as result, this outcome is not necessarily correct and we get a probability of correctness equal to $1 - P_{FP}$, with $P_{FP}$ the probability of having a false positive. $P_{FP}$ can be expressed in terms of parameters $n_{key}$, m and $n_B$, the number of elements inserted in $BF_B$. The probability that our inclusiveness relation outputs a false positive whereas one element $a_i$ from A is not in B is equivalent to the probability that test(B, $a_i$) returns true with the same parameters. We detail the value of $P_{FP}$.

First, the probability that, after inserting $n_B$ elements, a certain bit is equal to 1 is:

$$1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_B} \qquad (13)$$

If we consider that $Z_{A,B}$ elements from A are not included in B, the probability of having a false positive after computing the inclusiveness relation is:

$$P_{FP} \ge \left(1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_B}\right)^{n_{key} \cdot Z_{A,B}} \qquad (14)$$

$$\simeq \left(1 - \left(1 - \frac{1}{m}\right)^{X_{add}(n_B)}\right)^{X_{add}(Z_{A,B})}$$

4.5.2 Correctness of the Disjointness Relation

For the disjointness relation, we have in contrast no case of false positive, but a false negative may happen. Indeed, if we get $X_{A \cap B = \emptyset} < n^L_{key}$, then $BF_A$ and $BF_B$ have fewer than $n^L_{key}$ (and thus fewer than $n_{key}$) indexes i where $bf_A[i] = 1$ and $bf_B[i] = 1$. It is then not possible that A and B have common elements. A false negative can happen if we get too many resulting overlapping bits in the resulting Bloom filter $BF_{A \cap B = \emptyset}$.

Definition 2 (Resulting Overlapping Bit). A resulting overlapping bit occurs at a specific index i when $bf_A[i] = bf_B[i] = 1$ and these two bits stem from different elements.

Therefore, a false negative consists of a case where A and B have no element in common but Server gets $X_{A \cap B = \emptyset} \ge n^L_{key}$, i.e. at least $n^L_{key}$ resulting overlapping bits occurred. To avoid such a case, we have to tune the parameters accurately such that, for disjoint sets A and B, the respective value $X_{A \cap B = \emptyset}$ will never (with acceptable probability) be greater than or equal to $n^L_{key}$. To do so, Alice has to carefully select the parameters $n_{key}$ and m such that $X_{\cap=\emptyset} < n^L_{key}$. Value $X_{\cap=\emptyset}$ represents the expected value of $X_{A \cap B = \emptyset}$ when performing the disjointness protocol on two disjoint sets A and B. To express value $X_{\cap=\emptyset}$, we first give the probability of having a bit set to 1 at any index j in both Bloom filters $BF_A$ and $BF_B$, knowing that A and B are disjoint:

$$p(bf_A[j] = 1 \wedge bf_B[j] = 1) \qquad (15)$$

$$= p(bf_A[j] = 1) \cdot p(bf_B[j] = 1)$$

$$= \left(1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_A}\right) \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_B}\right)$$

Finally, the expected number of bits set to 1 in both $BF_A$ and $BF_B$ at the same index, resulting from distinct set elements, is:

$$X_{\cap=\emptyset} = m \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{X_{add}(n_A)}\right) \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{X_{add}(n_B)}\right) \qquad (16)$$

When we have $Z'_{A,B}$ common elements inserted in both Bloom filters, we get $X_{A \cap B = \emptyset} \simeq Z'_{A,B} \cdot n_{key} + X_{\cap=\emptyset}$. Therefore, if Alice takes care that $X_{\cap=\emptyset}$ never gets greater than or equal to $n^L_{key}$, then Server can notice when the two sets have common elements even in the case of $Z'_{A,B} = 1$.
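The constraint $X_{\cap=\emptyset} < n^L_{key}$ can be checked numerically before fixing m. The sketch below evaluates the expected number of resulting overlapping bits for disjoint sets, using the exponent $n_{key} \cdot n$ of Equation (15) in place of $X_{add}(n)$; this substitution and the function names are assumptions of the sketch. The powers are computed through `log1p` and `expm1` so that the result stays accurate for m in the order of $10^9$.

```python
import math


def p_bit_set(m: int, insertions: int) -> float:
    """Probability 1 - (1 - 1/m)^insertions that a given bit is set (Eq. 13)."""
    return -math.expm1(insertions * math.log1p(-1.0 / m))


def expected_overlap(m: int, n_key: int, n_a: int, n_b: int) -> float:
    """Expected number of indexes set in both filters of two disjoint sets."""
    return m * p_bit_set(m, n_key * n_a) * p_bit_set(m, n_key * n_b)


def m_is_acceptable(m: int, n_key: int, n_a: int, n_b: int, n_key_lower: int) -> bool:
    """Check the Setup constraint: expected overlap below the public bound."""
    return expected_overlap(m, n_key, n_a, n_b) < n_key_lower


if __name__ == "__main__":
    # First parameter row of Table 1: sets of 10^3 elements, n_key = 733.
    print(expected_overlap(1_180_000_000, 733, 1000, 1000))
    print(m_is_acceptable(1_180_000_000, 733, 1000, 1000, n_key_lower=500))
```

With the first parameter row of Table 1, the expected overlap evaluates to roughly 455, which is below the lower bound $5 \cdot 10^2$ of the $n_{key}$ interval.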

4.5.3 Choosing Parameters Regarding Correctness

In the classical use of Bloom filters as presented in (Bloom, 1970), some usage recommendations are made to generate parameters $n_{key}$ and m:

$$m = -\frac{n_A \cdot \ln(P_{FP})}{(\ln 2)^2} \qquad (17)$$

$$n_{key} = \frac{m}{n_A} \cdot \ln 2 \qquad (18)$$
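For reference, the classical recommendations (17) and (18) can be evaluated as in the following sketch. Rounding m up and $n_{key}$ to the nearest integer is an assumption made here to obtain usable parameters, and the function name is hypothetical.

```python
import math


def classical_parameters(n_elements: int, p_fp: float) -> tuple[int, int]:
    """Return (m, n_key) following Eq. (17) and Eq. (18)."""
    m = math.ceil(-n_elements * math.log(p_fp) / (math.log(2) ** 2))
    n_key = max(1, round(m / n_elements * math.log(2)))
    return m, n_key


if __name__ == "__main__":
    # 1000 elements with a targeted false positive probability of 1%.
    print(classical_parameters(1000, 0.01))  # (9586, 7)
```

These values target the membership test only; the parameters used for the set relations are constrained differently, as the much larger m values in Table 1 indicate.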

We recall that Bloom filters were not initially designed to support relation tests such as inclusiveness or disjointness. Therefore, several considerations enter into the generation of $n_{key}$ and m.


5 PRIVACY ANALYSIS

In this section we show how our solutions fulfill privacy in terms of content and cardinality.

5.1 Distribution of the Overlapping Bits

In this section we analyze the characteristics of overlapping bits occurring throughout the basic step of Bloom filter generation. We obtain such a distribution by running the generation of $10^3$ Bloom filters for each parameter configuration. From these distributions we can notice several characteristics. First, the more elements we add to the Bloom filter, the larger the range of overlapping bits. For instance, if we follow the recommendations from (17) and (18) and insert only 10 elements, we get a range of overlapping bits of approximately 10. With 100 inserted elements, the range increases to approximately 40. Since our protocols use an HMAC function whose outputs are uniformly distributed, we can consider that the overlapping bits follow a normal distribution. The parameters could thus be set so as to tune the distribution and obtain an overlapping bits range that is acceptable regarding the targeted level of privacy.

As a second characteristic, we observe that when we have two sets with highly distant cardinalities $n_A \ll n_B$ (or resp. $n_B \ll n_A$), the number of overlapping bits in the Bloom filter of the smaller set $Y_{BF_A}$ (resp. $Y_{BF_B}$) substantially decreases and the one of the larger set substantially increases. Having too few overlapping bits in a Bloom filter could be problematic, especially if this is even predictable by the attacker. By running tests we notice that, no matter which $n_{key}$ is picked or how many elements are inserted in the Bloom filters, if the ratio $\frac{n_A}{n_B}$ remains the same, then the expected numbers of overlapping bits in $BF_A$ and $BF_B$ remain approximately the same. Moreover, the situation gets even worse if we keep decreasing the ratio $\frac{n_A}{n_B}$. One solution to keep an acceptable range of overlapping bits in the Bloom filter representation of the smaller set, even with a significant difference in the cardinalities, could be to use a greater domain $[n^L_{key}; n^U_{key}]$. Indeed, for the same ratio $\frac{n_A}{n_B}$, we get greater overlapping bits ranges. In (Tajan et al., 2019), some results obtained by testing the overlapping bits distribution are presented.
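The kind of experiment described in this section can be approximated with a small simulation. The sketch below replaces the HMAC outputs by uniform random draws (relying on the uniformity assumption stated above), generates a number of Bloom filters for one parameter configuration, and reports the minimum, maximum and mean number of overlapping bits according to Equation (4). The function names, the seed and the example parameters are hypothetical.

```python
import random


def overlapping_bits(n_elements: int, n_key: int, m: int, rng: random.Random) -> int:
    """Y = n_A * n_key - X for one simulated Bloom filter (Eq. 4)."""
    positions = {rng.randrange(m) for _ in range(n_elements * n_key)}
    return n_elements * n_key - len(positions)


def overlap_range(n_elements: int, n_key: int, m: int,
                  runs: int = 1000, seed: int = 0) -> tuple[int, int, float]:
    """Return (min, max, mean) of the overlapping bits over `runs` filters."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    samples = [overlapping_bits(n_elements, n_key, m, rng) for _ in range(runs)]
    return min(samples), max(samples), sum(samples) / runs


if __name__ == "__main__":
    # 100 elements with m and n_key close to the classical recommendations.
    print(overlap_range(100, 7, 959))
```

The difference between the reported maximum and minimum gives the overlapping bits range for the chosen configuration.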

5.2 Privacy on the Content

First, we claim that no attacker can determine which concrete elements from A are included in B. This holds by means of the Bloom filter construction. Indeed, each element of the sets is mapped with the HMAC function, which is constructed from a cryptographic hash function and therefore benefits from its one-wayness. The only straightforward manner to get any knowledge on the Bloom filter content would be to use the test() function, which is only computable by Alice and Bob. More concretely, Server does not know the HMAC keys and cannot generate its own Bloom filter or add any element to an existing one and then perform the set relations. Indeed, the set relations require that all the considered Bloom filters are generated with the same keys. Also, Server is not able to learn from $BF_A$ or $BF_B$ whether a specific element from A is also included in B. All elements inserted in a Bloom filter are mixed together and it is not possible, even from the same Bloom filter, to distinguish them.

5.3 Privacy on the Cardinality

In this section, we focus on the ability of any attacker to retrieve the cardinality of the sets from one or multiple versions of the Bloom filter representation. The overlapping bits property of Bloom filters allows to hide the exact number of elements in sets A and B. However, Server is able to determine the number of bits set to 1 in the Bloom filters. It could then deduce the following information: $n_A \ge \frac{X_{BF_A}}{n_{key}}$. By keeping parameter $n_{key}$ secret from Server, we consider the cardinalities obfuscated to a certain level.

We recall that there exists an optimized manner to get the cardinality of a set from its Bloom filter representation, as explained in Section 4.1: the S&B technique. Without any overlapping bit, getting the result is therefore straightforward. In contrast, having multiple overlapping bits will lead any non-authorized party to misinterpret the cardinality. To ensure that, the ratio of the number of overlapping bits over parameter $n_{key}$ should be large.

We also notice that having an acceptable probability of false negative and an acceptable level of privacy for the set cardinalities are conflicting goals. Indeed, our approach to solve the disjointness set relation is based on reducing the number of overlapping bits to avoid confusing them with bits stemming from common elements.

5.4 Sets Cardinality Attack

We present here how an attacker could aim to retrieve cardinalities $n_A$ and $n_B$. To do so, Server will first try to determine parameter $n_{key}$ used by Alice and Bob. Server knows that $n_{key} \in [n^L_{key}; n^U_{key}]$ and that $n_{key}$ is a factor of the number of bits inserted in both Bloom filters. The candidates list for $n_{key}$ is represented as $L_{n_{key}} = \{l_1, \dots, l_{\lambda_{n_{key}}}\}$, with $\lambda_{n_{key}}$ a security parameter that represents the size of this list. We also consider the two sub-lists $L^A_{n_{key}} = \{l_1, \dots, l_{\lambda_A}\}$ and $L^B_{n_{key}} = \{l_1, \dots, l_{\lambda_B}\}$, which correspond to the lists of factors regarding $BF_A$ and $BF_B$ before the cross-checking that leads to $L_{n_{key}}$. We set $Y_{BF_A} \in [ob^A_1; ob^A_2]$ and $Y_{BF_B} \in [ob^B_1; ob^B_2]$ the numbers of overlapping bits in the Bloom filters. In each Bloom filter, some overlapping bits could have occurred; therefore the attacker knows that, regarding $BF_A$, $n_{key}$ could be a factor of $X_{BF_A}$ or $(X_{BF_A} + 1)$ or $(X_{BF_A} + 2)$, etc. The same holds for $BF_B$. It means that $L^A_{n_{key}}$ (resp. $L^B_{n_{key}}$) is composed of elements $l_j$ which satisfy the two properties:

$$l_j \in [n^L_{key}; n^U_{key}] \quad \text{and} \quad l_j \mid x_A \qquad (19)$$

with $x_A \in [X_{BF_A} + ob^A_1; X_{BF_A} + ob^A_2]$ (resp. $l_j \mid x_B$ with $x_B \in [X_{BF_B} + ob^B_1; X_{BF_B} + ob^B_2]$).

Finally, we have $L_{n_A} = (l_i)_{i \in [1; \lambda_{n_A}]}$ the list of candidates for $n_A$, with $\lambda_{n_A}$ the number of elements in $L_{n_A}$.

Table 1: Execution of the two set relations with $n_{key}$ selected in $[5 \cdot 10^2; 2 \cdot 10^3]$. Columns $n_W$ to $n_{key}$ are the parameters, the INC and DIS columns give execution times in seconds, and $\lambda_{n_{key}}$ relates to the attack.

| | $n_W$ | $n_{L_1}$ | $n_{L_2}$ | m | $n_{key}$ | $INC(L_1, W)$ | $DIS(L_2, W)$ | $\lambda_{n_{key}}$ |
|---|---|---|---|---|---|---|---|---|
| Use Case 1 | $10^3$ | $10^3$ | $10^3$ | $1.18 \cdot 10^9$ | 733 | $2.57 \cdot 10^{-1}$ | $2.16 \cdot 10^{-1}$ | 406 |
| | $10^3$ | $10^3$ | $10^3$ | $7.62 \cdot 10^9$ | 1861 | $2.51 \cdot 10^{-1}$ | $7.44 \cdot 10^{-1}$ | 397 |
| | $10^4$ | $9 \cdot 10^3$ | $2 \cdot 10^2$ | $9.47 \cdot 10^9$ | 1468 | $2.16 \cdot 10^{-1}$ | $8.67 \cdot 10^{-1}$ | 29 |
| Use Case 2 | $10^2$ | $10^4$ | / | $4.40 \cdot 10^9$ | 1416 | / | $7.41 \cdot 10^{-1}$ | 32 |
| | $10^2$ | $5 \cdot 10^4$ | / | $8.13 \cdot 10^9$ | 861 | / | $3.36 \cdot 10^1$ | 37 |
| | $10^2$ | $5 \cdot 10^5$ | / | $1.27 \cdot 10^{10}$ | 561 | / | $1.9 \cdot 10^2$ | 32 |

The first step of the attack consists of listing all the factors of $\{X_{BF_A}, (X_{BF_A} + 1), (X_{BF_A} + 2), \dots\}$ and of $\{X_{BF_B}, (X_{BF_B} + 1), (X_{BF_B} + 2), \dots\}$ to generate lists $L^A_{n_{key}}$ and $L^B_{n_{key}}$. Then, Server intersects the two lists to generate the candidates list $L_{n_{key}}$.

The second phase of the attack is to translate $L_{n_{key}}$ into lists $L_{n_A}$ and $L_{n_B}$. Server can use the S&B technique (6) to approximate size $n_A$; since parameter m is public and value $X_{BF_A}$ is directly computable, we have the following function:

$$n^*_A(n_{key}) = -\frac{m}{n_{key}} \ln\left[1 - \frac{X_{BF_A}}{m}\right] \qquad (20)$$

When we look closely at $L_{n_{key}}$, we notice that consecutive candidates are translated to the same candidate for $n_A$. In other words, multiple elements from $L_{n_{key}}$ correspond to the same element from $L_{n_A}$, thus $\lambda_{n_A} \le \lambda_{n_{key}}$.
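The two phases of the attack can be sketched as follows. The sketch assumes that the attacker fixes an interval of plausible overlapping bit counts for each filter, enumerates the candidates for $n_{key}$ according to (19), intersects the two lists, and then groups the surviving candidates by the rounded estimate of Equation (20). Rounding the estimate to an integer and all function names are assumptions made for illustration.

```python
import math


def n_key_candidates(x_bf: int, ob_min: int, ob_max: int,
                     n_key_lower: int, n_key_upper: int) -> set[int]:
    """Candidates l in [n^L_key; n^U_key] dividing some x in [X + ob_min; X + ob_max]."""
    return {
        l
        for l in range(n_key_lower, n_key_upper + 1)
        if any((x_bf + ob) % l == 0 for ob in range(ob_min, ob_max + 1))
    }


def swamidass_baldi(x_bf: int, m: int, n_key: int) -> float:
    """Cardinality estimate of Eq. (20) for one candidate n_key."""
    return -(m / n_key) * math.log(1 - x_bf / m)


def cardinality_candidates(x_bf_a: int, x_bf_b: int, ob_a: tuple[int, int],
                           ob_b: tuple[int, int], bounds: tuple[int, int],
                           m: int) -> dict[int, list[int]]:
    """Map each candidate for n_A to the n_key candidates that produce it."""
    common = (n_key_candidates(x_bf_a, *ob_a, *bounds)
              & n_key_candidates(x_bf_b, *ob_b, *bounds))
    groups: dict[int, list[int]] = {}
    for l in sorted(common):
        groups.setdefault(round(swamidass_baldi(x_bf_a, m, l)), []).append(l)
    return groups


if __name__ == "__main__":
    # Toy values: 98 and 118 set bits, up to 2 overlapping bits, n_key in [5; 12].
    print(cardinality_candidates(98, 118, (0, 2), (0, 2), (5, 12), m=10_000))
```

The number of keys of the returned dictionary plays the role of $\lambda_{n_A}$, and the total number of listed candidates that of $\lambda_{n_{key}}$.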

6 RESULTS

The Bloom filter-based set relations have various applications in practice. We selected two of them and applied our solutions. We implemented our protocols in Java, and the measurements were made with a CPU configuration of Intel Core i5 2.40GHz x4.

6.1 Results on a Cloud Auditing Use Case

We test our solution with parameters suiting a cloud security auditing use case from (Tajan et al., 2016). A third party auditor should verify that a cloud service provider (CSP) correctly performed access control on a client's data stored online. The auditor thus performs the set relations on logfiles $L_1$ and $L_2$ from the CSP, composed of $10^2$ to $10^4$ IP addresses, and a whitelist $W$ from the client, composed of $10^3$ to $10^4$ IP addresses. The tested parameter configurations are presented in Table 1, where the two set relations INC($L_1$, $W$) and DIS($W$, $L_2$) are each tested $10^4$ times. In both cases we obtain 0.00% false positives and false negatives. We remark that the set relations are equivalent to bit-wise operations on the Bloom filters. We also notice that the running times, with set cardinality privacy taken into account, are well within acceptable limits, especially in an auditing use case. Finally, we express the privacy on the sets' cardinality by $\lambda_{n_{key}}$.
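To illustrate the remark that the set relations reduce to bit-wise operations, the following Java sketch evaluates both relations on two filters stored as `java.util.BitSet`. It is an assumption-laden illustration, not our measured implementation: the class and method names are hypothetical, the filters are built from explicit bit positions instead of keyed hashes, and the disjointness check uses the simplest possible rule (an empty bit-wise AND), which may differ from the exact decision rule defined for DIS earlier in the paper.

```java
import java.util.BitSet;

/** Hypothetical sketch: set relations as bit-wise operations on Bloom filters. */
final class BloomRelations {

    /** Inclusion check: every bit set in a is also set in b, i.e. (a AND b) == a. */
    static boolean inc(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone(); // keep the input filter unchanged
        common.and(b);
        return common.equals(a);
    }

    /** Simplified disjointness check: no position is set in both filters. */
    static boolean dis(BitSet a, BitSet b) {
        return !a.intersects(b);
    }

    /** Helper for the example: builds a filter from explicit bit positions. */
    static BitSet filter(int... positions) {
        BitSet bits = new BitSet();
        for (int p : positions) {
            bits.set(p);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet log = filter(1, 5);
        BitSet whitelist = filter(1, 5, 9);
        System.out.println(inc(log, whitelist));          // true
        System.out.println(dis(whitelist, filter(2, 6))); // true
        System.out.println(dis(whitelist, filter(5, 6))); // false
    }
}
```

Both checks are linear in the filter length $m$ and operate on whole machine words, which is consistent with the short running times observed.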

6.2 Results on a Mobile Devices Tracking Use Case

In this use case from (Tajan and Westhoff, 2019), there are three different sub-use cases where a third party should verify whether any suspect user from a government agency's whitelist has been connected to a specific wireless access point with one of their devices. In Table 1 we give examples of relevant parameters that produced successful computations, along with the running time of the disjointness function in seconds. We notice a significant decrease of the sets' cardinality privacy when the sets have different sizes, as explained in Section 5.1. To overcome this privacy weakness, the parties could agree on a default size and add dummy elements if necessary.
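The padding countermeasure could look as follows in Java. This is only a sketch under stated assumptions: the class name, the `dummy:` prefix and the per-party tag are hypothetical choices made so that dummy elements collide neither with real elements nor with the other party's dummies, which matters for the disjointness test in particular.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: padding a set to an agreed default size with dummy elements. */
final class SetPadding {

    /**
     * Returns a copy of the input set, filled with random dummy elements
     * until it contains exactly defaultSize elements.
     */
    static Set<String> pad(Set<String> elements, int defaultSize,
                           String partyTag, SecureRandom rng) {
        if (elements.size() > defaultSize) {
            throw new IllegalArgumentException("set is larger than the agreed default size");
        }
        Set<String> padded = new HashSet<>(elements);
        while (padded.size() < defaultSize) {
            // The party tag keeps the dummies of Alice and Bob apart (assumed format).
            padded.add("dummy:" + partyTag + ":" + Long.toHexString(rng.nextLong()));
        }
        return padded;
    }

    public static void main(String[] args) {
        Set<String> real = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.2"));
        Set<String> padded = pad(real, 5, "A", new SecureRandom());
        System.out.println(padded.size()); // 5
    }
}
```

After padding, both parties insert the same number of elements into their filters, so the numbers of set bits no longer reflect the difference between the original set sizes.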

7 CONCLUSION

We showed how to compute two specific set relations, namely the private outsourced inclusiveness test and the private outsourced disjointness test, using the space-efficient Bloom filter data representation. In addition to preserving the privacy of the content, we provided a certain level of privacy on the cardinality of the Bloom filter's data structure. Our implementation's results validate an acceptable level of privacy, for instance when applied to a cloud security audit on access control. Such an approach based on Bloom filters could also be easily adapted to other set relations or operations like equality or relative complement.

REFERENCES

Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422–426.

Burkhart, M. and Dimitropoulos, X. (2012). Fast private set operations with SEPIA.

Churches, T. and Christen, P. (2004). Some methods for blindfolded record linkage. BMC Med. Inf. & Decision Making.

Dong, C., Chen, L., and Wen, Z. (2013). When private set intersection meets big data: an efficient and scalable protocol. In Sadeghi, A., Gligor, V. D., and Yung, M., editors, 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS’13, Berlin, Germany, November 4-8, 2013, pages 789–800. ACM.

Egert, R., Fischlin, M., Gens, D., Jacob, S., Senker, M., and Tillmanns, J. (2015). Privately computing set-union and set-intersection cardinality via bloom filters. In Foo, E. and Stebila, D., editors, Information Security and Privacy - 20th Australasian Conference, ACISP 2015, Brisbane, QLD, Australia, June 29 - July 1, 2015, Proceedings, volume 9144 of Lecture Notes in Computer Science. Springer.

Freedman, M. J., Nissim, K., and Pinkas, B. (2004). Efficient private matching and set intersection. In Cachin, C. and Camenisch, J., editors, Advances in Cryptology - EUROCRYPT 2004, International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, May 2-6, 2004, Proceedings, volume 3027 of Lecture Notes in Computer Science. Springer.

Goh, E. (2003). Secure indexes. IACR Cryptology ePrint Archive, 2003:216.

Goldreich, O. (2004). The Foundations of Cryptography - Volume 2: Basic Applications. Cambridge University Press.

Hohenberger, S. and Weis, S. A. (2006). Honest-verifier private disjointness testing without random oracles. In Danezis, G. and Golle, P., editors, Privacy Enhancing Technologies, 6th International Workshop, PET 2006, Cambridge, UK, June 28-30, 2006, Revised Selected Papers, volume 4258 of Lecture Notes in Computer Science, pages 277–294. Springer.

Kerschbaum, F. (2012). Outsourced private set intersection using homomorphic encryption. In Youm, H. Y. and Won, Y., editors, 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS ’12, Seoul, Korea, May 2-4, 2012. ACM.

Kiayias, A. and Mitrofanova, A. (2005). Testing disjointness of private datasets. In Patrick, A. S. and Yung, M., editors, Financial Cryptography and Data Security, 9th International Conference, FC 2005, Roseau, The Commonwealth of Dominica, February 28 - March 3, 2005, Revised Papers, Lecture Notes in Computer Science, pages 109–124. Springer.

Kissner, L. and Song, D. X. (2005). Privacy-preserving set operations. In Shoup, V., editor, Advances in Cryptology - CRYPTO 2005: 25th Annual International Cryptology Conference, Santa Barbara, California, USA, August 14-18, 2005, Proceedings, volume 3621 of Lecture Notes in Computer Science. Springer.

Li, Z. and Gong, G. (2012). Efficient data aggregation with secure bloom filter in wireless sensor networks.

Pinkas, B., Schneider, T., and Zohner, M. (2014). Faster private set intersection based on OT extension. In Fu, K. and Jung, J., editors, Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, August 20-22, 2014, pages 797–812. USENIX Association.

Qiu, L., Li, Y., and Wu, X. (2007). Preserving privacy in association rule mining with bloom filters. J. Intell. Inf. Syst., 29(3):253–278.

Swamidass, S. J. and Baldi, P. (2007). Mathematical correction for fingerprint similarity measures to improve chemical retrieval. Journal of Chemical Information and Modeling, 47(3):952–964. PMID: 17444629.

Tajan, L. and Westhoff, D. (2019). Retrospective tracking of suspects in GDPR conform mobile access networks datasets. In Proceedings of the Third Central European Cybersecurity Conference, CECC 2019, Munich, Germany, November 14-15, 2019, pages 16:1–16:6. ACM.

Tajan, L., Westhoff, D., and Armknecht, F. (2019). Private set relations with bloom filters for outsourced SLA validation. IACR Cryptology ePrint Archive, 2019:993.

Tajan, L., Westhoff, D., Reuter, C. A., and Armknecht, F. (2016). Private information retrieval and searchable encryption for privacy-preserving multi-client cloud auditing. In 11th International Conference for Internet Technology and Secured Transactions, ICITST 2016, Barcelona, Spain, December 5-7, 2016. IEEE.

Ye, Q., Wang, H., Pieprzyk, J., and Zhang, X. (2008). Efficient disjointness tests for private datasets. In Mu, Y., Susilo, W., and Seberry, J., editors, Information Security and Privacy, 13th Australasian Conference, ACISP 2008, Wollongong, Australia, July 7-9, 2008, Proceedings, volume 5107 of Lecture Notes in Computer Science, pages 155–169. Springer.

SECRYPT 2020 - 17th International Conference on Security and Cryptography
