Solving Set Relations with Secure Bloom Filters Keeping Cardinality Private
Louis Tajan (1), Dirk Westhoff (1) and Frederik Armknecht (2)
(1) Hochschule Offenburg, Offenburg, Germany
(2) University of Mannheim, Mannheim, Germany
Keywords:
Bloom Filters, Set Operations, Set Relations, Outsourced Computation.
Abstract:
We propose in this work to solve privacy-preserving set relations computed by a third party in an outsourced
configuration. We argue that solving the disjointness relation based on Bloom filters is a new contribution, in
particular because it adds another layer of privacy on the sets' cardinality. We propose to compose the set relations
in a slightly different way by applying a keyed hash function. Besides discussing the correctness of the set
relations, we analyze how this impacts the privacy of the sets' content as well as the privacy of the
sets' cardinality. We are in particular interested in how overlapping bits in the Bloom filters impact
the privacy level of our approach. Finally, we present our results with real-world parameters in two concrete
scenarios.
1 INTRODUCTION
In the work at hand, we propose a protocol to solve
what we call the private outsourced disjointness test.
We suppose two parties, Alice and Bob, each
owning a dataset of elements, respectively $A$ and $B$.
They would like to know if their datasets are disjoint,
i.e. if $A \cap B = \emptyset$. To do so, they will ask a third party,
Server, to perform the verification. Alice (resp. Bob)
does not want to reveal any information on her dataset,
including its size, to any other party.
We propose a solution based on the Bloom filter (Bloom, 1970) representation
along with keyed hash functions such as HMAC. We also propose to apply
this approach to a more straightforward set relation,
the private outsourced inclusiveness test. Bloom filters are space-efficient
data structures used to represent sets and to perform set membership
checks. One may argue that a simple pseudonymization could be sufficient
for the above sketched scenario, i.e. solely applying a keyed hash function
to the sets (Churches and Christen, 2004). However, even if
the pseudonymization function remains private to any
party other than the data owners, one may directly gain knowledge
of the number of common elements of the two pseudonymized sets.
Such a naive approach will also reveal which pseudonym is present in none,
one or both sets. On the contrary, the Bloom filter representation
has the particular feature of adding obfuscation to the sets.
We argue that, contrary to multi-party based solutions (Kissner and Song, 2005),
it is relevant for such a protocol class to be non-interactive.
For instance, it could be applied to scenarios of mo-
bile users tracking (Tajan and Westhoff, 2019) or to
cloud auditing use cases (Tajan et al., 2016) where
a third party auditor should perform verification on
logfiles and whitelists. The concrete contributions
of our work are threefold. Firstly, we allow a third
party entity to compute set relations, namely inclusiveness
and disjointness, in an outsourced model. To
do so, we tune the Bloom filter approach by enhancing
its privacy with respect to the sets' content. Such
an approach also provides privacy on the sets'
cardinality. Secondly, we present an attack to gain
the sets' cardinality in the present configuration.
By analyzing the behavior of overlapping bits in the
Bloom filter environment we show what amount of
information such an attack may provide. Finally, we
implemented our solution and present the results obtained
for a concrete cloud security audit on access
control use case with real-world parameters.
2 PRELIMINARIES
In this section we introduce the type of operations we
are performing on sets and how we represent them
using the Bloom filters approach.
Tajan, L., Westhoff, D. and Armknecht, F.
Solving Set Relations with Secure Bloom Filters Keeping Cardinality Private.
DOI: 10.5220/0009835904430450
In Proceedings of the 17th International Joint Conference on e-Business and Telecommunications (ICETE 2020) - SECRYPT, pages 443-450
ISBN: 978-989-758-446-6
Copyright (c) 2020 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved
2.1 Set Operations and Relations
Multiple types of operations could be performed on
sets but in this work, we aim to test set relations.
Some of them could be reduced to compute the cardi-
nality of some operations. For instance, being able to
compute the cardinality of the intersection of two sets
indicates whether they have elements in common or if
one set is included in the other one. For privacy con-
cerns, it could be of interest to solely reveal its cardi-
nality instead of the intersection itself. Therefore, we
propose a solution to solve two kinds of set relations
namely the inclusiveness and the disjointness.
2.2 Bloom Filters
A Bloom filter is a data structure introduced by Bur-
ton Howard Bloom in 1970 (Bloom, 1970). It is used
to represent a set of elements. With a Bloom filter
representing a certain set, one can verify whether an
element is a member of this set. Such a data structure
consists of an array of $m$ bits which is associated
with $n_{key}$ public hash functions. At first, all the $m$ bits
are initialized to 0. Moreover, two functions, namely
add() and test(), are available. To add an element to the
Bloom filter, one has to compute the hashes of this element
with each of the respective $n_{key}$ hash functions.
Then, set the bit to 1 for each position corresponding
to a hash value. To test whether one element is
included in the Bloom filter with the test() function,
one has, by the same manner, to compute the respec-
tive hash values of this element and verify if the re-
spective bits are set to 1. If at least one of these bits is
set to 0, then with certainty the tested element is not a
member of the set represented by the Bloom filter (i.e.
no false negative for test() could happen). On the con-
trary, with some probability, the test() function could
retrieve a false positive. Indeed, even if all the bits
that have been verified are set to 1, the tested element
may not be part of the set represented by the Bloom
filter. To express the probability of having a false pos-
itive when performing function test() we introduce the
notion of an overlapping bit.
Definition 1 (Overlapping Bit). When adding an ele-
ment to a Bloom filter, a certain bit has to be set to 1
but this bit is already set to 1.
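The add() and test() functions and the overlapping bit of Definition 1 can be illustrated with a minimal sketch. It assumes the $n_{key}$ public hash functions are derived by salting SHA-256 with an index and that indices run from 0 to $m-1$; the class name `BloomFilter` and the `overlaps` counter are hypothetical and not taken from the paper's implementation.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter: an array of m bits and n_key public hash functions."""

    def __init__(self, m: int, n_key: int):
        self.m = m
        self.n_key = n_key
        self.bits = [0] * m
        self.overlaps = 0  # overlapping bits observed so far (Definition 1)

    def _positions(self, element: str):
        # Assumption: the i-th public hash function is SHA-256 salted with i.
        for i in range(self.n_key):
            digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, element: str) -> None:
        for pos in self._positions(element):
            if self.bits[pos]:
                self.overlaps += 1  # the bit to set was already 1
            self.bits[pos] = 1

    def test(self, element: str) -> bool:
        # False means "certainly absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(element))
```

After inserting N distinct elements, `sum(bits) + overlaps` equals `n_key * N`, which is the relation between set bits and overlapping bits used later in this section.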
The probability of having an overlapping bit is
null when the Bloom filter is still blank, and it grows
along with the number of inserted elements. We
express this probability as follows, with $X_{BF_A}$ the
number of bits already set to 1 in $BF_A$ at a specific
point in time:
$$P_{ob} = \frac{X_{BF_A}}{m} \tag{1}$$
We could then express the average number of distinct
bits added to the Bloom filter when adding one new
element to it:
$$X_{add} = n_{key} + \sum_{i=1}^{n_{key}} (-1)^i \cdot \frac{\sum_{j=i}^{n_{key}-1} \binom{j}{i}}{m^i} \tag{2}$$
And we could generalize it to the average number of
bits added to the Bloom filter when adding $N$ new elements to it:
$$X_{add}(N) = n_{key} \cdot N + \sum_{i=1}^{n_{key} \cdot N} (-1)^i \cdot \frac{\sum_{j=i}^{(n_{key} \cdot N)-1} \binom{j}{i}}{m^i} \tag{3}$$
By observing the current state of a Bloom filter representing
a finite set $A$ of $n_A$ elements, one can express
the exact number of overlapping bits as the value $Y_{BF_A}$:
$$Y_{BF_A} = (n_A \cdot n_{key}) - X_{BF_A} \tag{4}$$
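As a sanity check of the alternating sum in Equation (3), the following sketch evaluates it with exact rational arithmetic and compares it with the closed form $m \cdot (1 - (1 - 1/m)^{n_{key} \cdot N})$. This closed form is the standard expected number of distinct cells hit by $n_{key} \cdot N$ uniform throws into $m$ cells, starting from a blank filter; it is not stated in the text above and is used here only as an assumption for cross-checking. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def x_add_series(n_key: int, n_elems: int, m: int) -> Fraction:
    """Equation (3), evaluated exactly term by term."""
    t = n_key * n_elems
    total = Fraction(t)
    for i in range(1, t + 1):
        inner = sum(comb(j, i) for j in range(i, t))  # j = i .. t-1
        total += Fraction((-1) ** i * inner, m ** i)
    return total


def x_add_closed(n_key: int, n_elems: int, m: int) -> Fraction:
    """Closed form m * (1 - (1 - 1/m)^(n_key * N)) of the same expectation."""
    t = n_key * n_elems
    return m * (1 - (1 - Fraction(1, m)) ** t)


def overlapping_bits(n_a: int, n_key: int, x_bf_a: int) -> int:
    """Equation (4): Y_BF_A = n_A * n_key - X_BF_A."""
    return n_a * n_key - x_bf_a
```

The two forms agree because $\sum_{j=i}^{t-1} \binom{j}{i} = \binom{t}{i+1}$, which turns the series into the binomial expansion of the closed form.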
2.3 Adversary Models
We describe in this section how the different parties
should behave regarding the security of the protocol.
2.3.1 Alice and Bob - Honest-but-Curious
Adversaries
We consider Alice and Bob acting normally accord-
ing to the protocol description. Even if knowing any
information on the other party’s dataset could be of
interest, we consider that Alice’s and Bob’s main ob-
jective is to retrieve the result of the sets relations and
therefore they will faithfully provide correct inputs.
2.3.2 Server - Malicious Adversary
On the contrary, the third party is not trustworthy and
may behave arbitrarily. Slightly differently than the
classic definition of a malicious adversary (Goldre-
ich, 2004), we consider here solely its privacy aspect.
Indeed, we do not care about any alteration of the fi-
nal result by the server. We consider that the server’s
objective is to gain any information on the sets from
Alice and Bob and not to refuse or abort the protocol
prematurely. Therefore, to find information on their
datasets, the server (or any other malicious adversary)
could try to generate its own Bloom filter and perform
the set relations on it along with another Bloom filter
from one of the parties.
3 RELATED WORK
To the best of our knowledge, this work is the first
to solve the set disjointness relation in an outsourced
SECRYPT 2020 - 17th International Conference on Security and Cryptography
configuration using Bloom filter construction and pro-
viding privacy on the sets’ sizes. Computing the dis-
jointness test as a two-party secure computation prob-
lem has been proposed in several papers based on
homomorphic encryption and Pedersen commitments
for (Freedman et al., 2004; Kiayias and Mitrofanova,
2005), on “testable and homomorphic commitments”
on polynomial representations for (Hohenberger and
Weis, 2006) and Sylvester matrix and Lagrange in-
terpolation for (Ye et al., 2008) but none of these are
adapted to a third party scenario or provide privacy
on the sets' sizes. Using Bloom filters to process
operations on sets has already been proposed.
In addition to not providing any solution
to the private disjointness test, (Burkhart and Dimitropoulos,
2012) proposes a solution for multiparty computation
while (Dong et al., 2013; Pinkas et al., 2014) target a
two-party protocol. In (Kerschbaum, 2012),
Kerschbaum proposes an “outsourced” version
of his protocol, but it requires homomorphic multiplications
between Bloom filters of encrypted elements.
Also, the protocol prevents the server from learning
the intersection size. Solving the set intersection
cardinality implies solving the disjointness problem
as in (Egert et al., 2015), but this work relies on
knowing the sizes of the sets and does not fit our outsourced
configuration. Adding privacy to Bloom
filters has been investigated in several works. We
explain in what way these existing solutions do not
fit our requirements and therefore our solution brings
novelty. First of all, we highlight the fact that none
of the following solutions provide privacy on the sets
cardinality. In (Goh, 2003) Goh associates Bloom fil-
ters with a keyed pseudo-random function to allow
a private membership test in the Bloom filter. This
is precisely one capability we do not want the third
party to have. In (Li and Gong, 2012) the authors
expose a construction of Bloom filters along
with HMAC protocol in a wireless sensor aggrega-
tion scenario. Their approach is somehow similar but
the base station shares HMAC keys directly with each
of the nodes. Therefore, the merging of Bloom fil-
ters from different nodes does not allow any operation
since different keys are used. In (Qiu et al., 2007), still
by combining the Bloom filter approach with a keyed
hash function, the authors propose a solution to com-
pute the membership of elements in a set. Therefore,
they manipulate Bloom filters of unique elements that
leads to data leakage regarding the amount of ele-
ments and could be very costly, especially when con-
sidering thousands of them.
4 PROTOCOL
We emphasize that, to process the two set
relations, the considered Bloom filters should be generated
in the same way, namely with the same size $m$, keyed
hash function and set of keys $K$. First we recall the
two privacy enhancements obtained by tuning the classical
use of Bloom filters. Then we present the two set relations
before explaining how the parameters should be
selected to guarantee a certain level of correctness.
4.1 Privacy Enhancements
Our approach to make such a technique fitting for
privacy-sensitive use cases, is based on the use of
a public keyed collision-resistant hash function (e.g.
MAC) with a set of $n_{key}$ private keys instead of the
$n_{key}$ public hash functions. Without loss of generality,
we use an HMAC function to solve the two set rela-
tions. That being said, any party that does not hold the
keys cannot use the test() function to directly verify if
a specific element is included in the Bloom filter. The
other security benefit when using an HMAC function
is that even if the function is publicly released, any
party that does not hold the keys cannot add additional
elements. More formally, we define a Bloom filter of
a set $A = \{a_1, \ldots, a_{n_A}\}$ as an array of $m$ bits, with
a set of $n_{key}$ keys $K = \{k_1, \ldots, k_{n_{key}}\}$ and an HMAC
function $h_{k_\kappa} : \{0, 1\}^* \to \{1, \ldots, m\}$ with $k_\kappa \in K$ as:
$$BF_{A,(h_{k_\kappa})_{k_\kappa \in K}} = (bf_A[j])_{1 \le j \le m} \tag{5}$$
where $bf_A[j] = 1$ if $\exists (i, \kappa)$ s.t. $h_{k_\kappa}(a_i) = j$,
and $bf_A[j] = 0$ otherwise.
In the remaining parts of this work, we use the simplified
notation $BF_A$ (resp. $BF_B$) to represent the Bloom
filter of set $A$ (resp. set $B$). The second privacy enhancement
we add to the use of Bloom filters is
to keep parameter $n_{key}$ private to avoid revealing
the sets' cardinalities. Indeed, the naive technique
to retrieve the cardinality of a set by looking
at its respective Bloom filter would be to divide its
number of bits set to 1 by parameter $n_{key}$. There exists
an optimized technique introduced by Swamidass
and Baldi (Swamidass and Baldi, 2007) which computes
$n_A$, an approximation of the number of distinct
elements inserted in $BF_A$, with $X_{BF_A}$ the number of bits
set to 1 in the Bloom filter:
$$n_A = -\frac{m}{n_{key}} \ln\left[1 - \frac{X_{BF_A}}{m}\right] \tag{6}$$
Such a technique requires even more overlapping bits
to mislead the attacker. We argue that by making parameter
$n_{key}$ private, one is no longer able to compute
$n_A$. One may object that keeping the size of $K$ private
is complex, or that storing a large number of keys is costly.
We could then slightly modify the
protocol to use a unique key $k$: the outcome
of $h_k(x)$ is divided into $n_{key}$ equal-size fragments
and each fragment indicates an index of the Bloom filter to set.
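The keyed construction of Equation (5) can be sketched as follows. This is an illustration under stated assumptions, not the authors' Java implementation: HMAC-SHA256 stands in for $h$, each digest is reduced modulo $m$, indices run from 0 to $m-1$ instead of 1 to $m$, and the helper names (`generate_keys`, `keyed_positions`, `create_bloom_filter`, `keyed_test`) are hypothetical.

```python
import hashlib
import hmac
import secrets


def generate_keys(n_key: int, key_bytes: int = 32):
    """Draw n_key independent random HMAC keys (the private set K)."""
    return [secrets.token_bytes(key_bytes) for _ in range(n_key)]


def keyed_positions(element: bytes, keys, m: int):
    """Indices h_k(element) for every key k in K, reduced modulo m."""
    positions = []
    for key in keys:
        digest = hmac.new(key, element, hashlib.sha256).digest()
        positions.append(int.from_bytes(digest, "big") % m)
    return positions


def create_bloom_filter(elements, keys, m: int):
    """Create(A): set bf[j] = 1 whenever some h_k(a_i) equals j."""
    bits = [0] * m
    for element in elements:
        for pos in keyed_positions(element, keys, m):
            bits[pos] = 1
    return bits


def keyed_test(bits, element: bytes, keys) -> bool:
    """test(): only computable by a party holding the keys K."""
    return all(bits[pos] for pos in keyed_positions(element, keys, len(bits)))
```

A party that sees only `bits` can count the bits set to 1, but without `keys` it can neither run `keyed_test` nor add elements, and without knowing `len(keys)` it cannot turn the bit count into a cardinality.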
4.2 Initialization
• $\{h, n_{key}, m, K\} \leftarrow$ Setup: Alice should first choose and
  generate the Bloom filter parameters: the dimension $m$,
  the HMAC function $h$, the number of keys $n_{key}$
  and the set of keys $K = \{k_1, \ldots, k_{n_{key}}\}$. She
  generates the parameters by performing the following protocol:
  - Randomly choose $n_{key} \in [n^L_{key}; n^U_{key}]$ with integers
    $n_{key}$, $n^L_{key}$ and $n^U_{key}$.
  - Set $m$ such that $X_{=\emptyset} < n^L_{key}$.
  Values $n^L_{key}$ and $n^U_{key}$ are public and represent the
  value space of $n_{key}$. We determine them later considering
  correctness and privacy in Sections 4.5 and 5.
  The restriction on parameter $m$ corresponds
  to a correctness consideration on $X_{=\emptyset}$ which we
  explain in more detail in Section 4.5.2. Then Alice
  selects the public HMAC function $h$, generates
  its $n_{key}$ respective keys and privately shares
  parameters $\{h, n_{key}, m, K\}$ with Bob.
• $BF_A \leftarrow$ Create($A$): Alice (resp. Bob) generates the
  Bloom filter of her dataset $A = \{a_1, \ldots, a_{n_A}\}$
  (resp. $B = \{b_1, \ldots, b_{n_B}\}$):
  $$BF_A = BF_{A,(h_{k_\kappa})_{k_\kappa \in K}} = (bf_A[j])_{1 \le j \le m}$$
  (resp. $BF_B = BF_{B,(h_{k_\kappa})_{k_\kappa \in K}} = (bf_B[j])_{1 \le j \le m}$)
4.3 Inclusiveness Protocol
This operator makes it possible to verify whether one set is included in
another. It operates directly on the Bloom filters of
the respective sets. To determine if $A$ is included in
$B$ we define $BF_{A \subseteq B} \leftarrow INC(BF_A, BF_B)$:
$$(bf_{A \subseteq B}[j])_{1 \le j \le m} \leftarrow INC(BF_A, BF_B) \tag{7}$$
where $bf_{A \subseteq B}[j] = 0$ if $(bf_A[j] = 1 \wedge bf_B[j] = 0)$,
and $bf_{A \subseteq B}[j] = 1$ otherwise.
We remark that this operator is equivalent to the bit-wise
binary operator combination:
$$INC(BF_A, BF_B) \equiv \neg(BF_A) \text{ OR } BF_B \tag{8}$$
Server first computes the inclusion protocol on the
two respective Bloom filters of sets $A$ and $B$ to test if
$A \subseteq B$, namely if all the elements from Alice's set are
included in Bob's set:
$$INC(BF_A, BF_B) = BF_{A \subseteq B} = (bf_{A \subseteq B}[j])_{1 \le j \le m}$$
Then Server computes $X_{A \subseteq B}$, which corresponds to
the number of bits set to 1 in the resulting Bloom filter:
$$X_{A \subseteq B} = \sum_{j=1}^{m} bf_{A \subseteq B}[j] \tag{9}$$
Server tests if $X_{A \subseteq B} = m$ and can conclude that $A \subseteq B$
if no false positive occurred. Otherwise we have
$A \nsubseteq B$ with certainty.
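The inclusiveness test of Equations (7) to (9) reduces to a bitwise operation followed by a bit count. The sketch below assumes Bloom filters are plain Python lists of 0/1 values of equal length; the function names `inc` and `is_included` are hypothetical.

```python
def inc(bf_a, bf_b):
    """INC(BF_A, BF_B): bitwise (NOT BF_A) OR BF_B, as in Equation (8)."""
    if len(bf_a) != len(bf_b):
        raise ValueError("both Bloom filters must have the same size m")
    return [(1 - a) | b for a, b in zip(bf_a, bf_b)]


def is_included(bf_a, bf_b) -> bool:
    """True when X equals m, i.e. A is included in B unless a false positive occurred."""
    result = inc(bf_a, bf_b)
    return sum(result) == len(result)
```

A single 0 in the result marks an index set in $BF_A$ but not in $BF_B$, which is why a negative answer is certain while a positive one is only probable.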
4.4 Disjointness Protocol
This set relation makes it possible to verify that no element
from one set is included in another set. In other
words, it allows claiming that two sets are disjoint.
This test function is not trivial: with
Bloom filters it is not sufficient to highlight the cases
where a bit 1 has been inserted at the same index in
the two respective Bloom filters. We define this operator
as $BF_{A \cap B = \emptyset} \leftarrow DIS(BF_A, BF_B)$:
$$(bf_{A \cap B = \emptyset}[j])_{1 \le j \le m} \leftarrow DIS(BF_A, BF_B) \tag{10}$$
where $bf_{A \cap B = \emptyset}[j] = 1$ if $(bf_A[j] = 1 \wedge bf_B[j] = 1)$,
and $bf_{A \cap B = \emptyset}[j] = 0$ otherwise.
We remark that this operator is equivalent to the bit-wise
logical-and operator:
$$DIS(BF_A, BF_B) \equiv BF_A \text{ AND } BF_B \tag{11}$$
To verify that no element from Alice's dataset is included
in Bob's, Server performs the disjointness
relation on the respective Bloom filters of $A$ and $B$:
$$DIS(BF_A, BF_B) = BF_{A \cap B = \emptyset} = (bf_{A \cap B = \emptyset}[j])_{1 \le j \le m}$$
Then Server computes $X_{A \cap B = \emptyset}$, which corresponds to
the number of bits set to 1 in the resulting Bloom filter:
$$X_{A \cap B = \emptyset} = \sum_{j=1}^{m} bf_{A \cap B = \emptyset}[j] \tag{12}$$
Server compares it such that:
• if $X_{A \cap B = \emptyset} < n^L_{key}$ then $A$ and $B$ are disjoint
• if $X_{A \cap B = \emptyset} \ge n^L_{key}$ then $A$ and $B$ have at least one element in common
Indeed, for each element which is included in both
sets, we get $n_{key}$ times a bit set to 1 in the resulting
Bloom filter. However, we could still get such a bit
set to 1 due to a bit set to 1 in $BF_A$ and $BF_B$ stemming
from different elements originally added to the Bloom
filters. We call such a case a false negative for the disjointness
relation since the auditor will state that the
sets are not disjoint while they are. We will discuss its
probability of occurrence in the following sections.
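The disjointness decision of Equations (10) to (12) can be sketched in the same style. Bloom filters are assumed to be lists of 0/1 values, and the public lower bound $n^L_{key}$ is passed as `n_key_lower`; both function names are hypothetical.

```python
def dis(bf_a, bf_b):
    """DIS(BF_A, BF_B): bitwise AND of the two Bloom filters, Equation (11)."""
    if len(bf_a) != len(bf_b):
        raise ValueError("both Bloom filters must have the same size m")
    return [a & b for a, b in zip(bf_a, bf_b)]


def are_disjoint(bf_a, bf_b, n_key_lower: int) -> bool:
    """Server's decision: disjoint when fewer than n^L_key common bits are set."""
    x_common = sum(dis(bf_a, bf_b))
    return x_common < n_key_lower
```

Server only needs the public bound $n^L_{key}$ for this decision, never the private value $n_{key}$ itself.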
4.5 Correctness of the Set Relations
In this section we consider the correctness of our two
proposed relations. We recall that the Bloom filter
approach allows false positives but no false negative
on the test() function. Nevertheless, we focus on the
overlapping bits of the Bloom filters resulting from
our set relations.
4.5.1 Correctness of the Inclusiveness Relation
For the inclusiveness relation, we notice that only
false positives could happen and no false negatives. Indeed,
after performing $INC(BF_A, BF_B)$, if there is an
index $j$ with $bf_{A \subseteq B}[j] = 0$, we have $bf_A[j] = 1$ and
$bf_B[j] = 0$, then with certainty, at least one element
from $A$ does not belong to $B$. Concretely, if the outcome
of the auditing process states that $A \nsubseteq B$ then
we have a probability of correctness of 1. On the other
hand, if we get $A \subseteq B$ as result, this outcome is not
necessarily correct and we get a probability of correctness
equal to $1 - P_{FP}$, with $P_{FP}$ the probability of having
a false positive. $P_{FP}$ could be expressed in terms of
parameters $n_{key}$, $m$ and $n_B$ denoting the number of elements
inserted in $BF_B$. The probability that our inclusiveness
relation outputs a false positive whereas
one element $a_i$ from $A$ is not in $B$ is equivalent to the
one of having test($B$, $a_i$) return true with the same
parameters. We detail the value of $P_{FP}$:
First, the probability that after inserting
$n_B$ elements, a certain bit is equal to 1 is:
$$1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_B} \tag{13}$$
If we consider that $Z_{A,B}$ elements from $A$ are not included
in $B$, the probability of having a false positive
after computing the inclusiveness relation is:
$$P_{FP} \ge \left(1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_B}\right)^{n_{key} \cdot Z_{A,B}} \simeq \left(1 - \left(1 - \frac{1}{m}\right)^{X_{add}(n_B)}\right)^{X_{add}(Z_{A,B})} \tag{14}$$
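Equations (13) and (14) are straightforward to evaluate numerically. The sketch below computes the first (lower-bound) expression of Equation (14), using `log1p` and `expm1` because $1/m$ is tiny for the filter sizes considered in this work; the function names are hypothetical.

```python
import math


def bit_set_probability(m: int, n_key: int, n_b: int) -> float:
    """Equation (13): 1 - (1 - 1/m)^(n_key * n_B)."""
    return -math.expm1(n_key * n_b * math.log1p(-1.0 / m))


def false_positive_lower_bound(m: int, n_key: int, n_b: int, z_ab: int) -> float:
    """First expression of Equation (14) for Z_{A,B} elements of A that are not in B."""
    return bit_set_probability(m, n_key, n_b) ** (n_key * z_ab)
```

Each additional element of $A$ missing from $B$ multiplies the bound by the same factor, so the false-positive probability decays geometrically in $Z_{A,B}$.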
4.5.2 Correctness of the Disjointness Relation
For the disjointness relation, we have on the contrary
no case of false positive but a case of false negative
may happen. Indeed, if we get $X_{A \cap B = \emptyset} < n^L_{key}$
then it means that $BF_A$ and $BF_B$ have fewer than $n^L_{key}$
(and thus fewer than $n_{key}$) indexes $i$ where $bf_A[i] = 1$
and $bf_B[i] = 1$. It is then not possible that $A$ and $B$
have common elements. Regarding the false negative
scenario, it could happen if we get too many
resulting overlapping bits in the resulting Bloom filter
$BF_{A \cap B = \emptyset}$.
Definition 2 (Resulting Overlapping Bit). When there
exists a specific index $i$, where $bf_A[i] = bf_B[i] = 1$ and
these two bits are coming from different elements.
Therefore, a false negative consists of a case
where $A$ and $B$ have no element in common but
Server gets $X_{A \cap B = \emptyset} \ge n^L_{key}$, i.e. at least $n^L_{key}$
resulting overlapping bits happened. To avoid such a case,
we have to accurately tune the parameters such that in
a case of disjoint sets $A$ and $B$, the respective value
$X_{A \cap B = \emptyset}$ will never (with acceptable probability) be
greater than or equal to $n^L_{key}$. To do so, Alice has to carefully
select the parameters $n_{key}$ and $m$ such that $X_{=\emptyset} < n^L_{key}$.
Value $X_{=\emptyset}$ represents the expected value of $X_{A \cap B = \emptyset}$
when performing the disjointness protocol on two disjoint
sets $A$ and $B$. To express value $X_{=\emptyset}$, we first
give the probability of having a bit set to 1 for any
index $j$ in both Bloom filters $BF_A$ and $BF_B$, knowing
that $A$ and $B$ are disjoint:
$$p(bf_A[j] = 1 \wedge bf_B[j] = 1) = p(bf_A[j] = 1) \cdot p(bf_B[j] = 1) = \left(1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_A}\right) \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{n_{key} \cdot n_B}\right) \tag{15}$$
Finally, the expected number of bits set to 1 in both
$BF_A$ and $BF_B$ at the same index resulting from distinct
set elements is:
$$X_{=\emptyset} = m \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{X_{add}(n_A)}\right) \cdot \left(1 - \left(1 - \frac{1}{m}\right)^{X_{add}(n_B)}\right) \tag{16}$$
When we have $Z'_{A,B}$ common elements inserted in
both Bloom filters, we get $X_{A \cap B = \emptyset} \simeq Z'_{A,B} \cdot n_{key} + X_{=\emptyset}$.
Therefore, if Alice takes care that $X_{=\emptyset}$ never
gets greater than or equal to $n^L_{key}$, then Server could notice
when the two sets have common elements even in the
case of $Z'_{A,B} = 1$.
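The constraint $X_{=\emptyset} < n^L_{key}$ suggests a simple way to pick $m$ during Setup. The sketch below is an assumption-laden illustration rather than the authors' procedure: it estimates $X_{=\emptyset}$ as $m$ times the probability of Equation (15), i.e. with exponents $n_{key} \cdot n_A$ and $n_{key} \cdot n_B$, which is slightly more conservative than the $X_{add}$ exponents of Equation (16), and it doubles $m$ until the constraint holds. The function names and the starting value of the search are hypothetical.

```python
import math


def bit_set_probability(m: int, inserted: int) -> float:
    """1 - (1 - 1/m)^inserted, computed stably for very large m."""
    return -math.expm1(inserted * math.log1p(-1.0 / m))


def expected_false_overlap(m: int, n_key: int, n_a: int, n_b: int) -> float:
    """Estimate of X_{=0}: m times the probability of Equation (15)."""
    p_a = bit_set_probability(m, n_key * n_a)
    p_b = bit_set_probability(m, n_key * n_b)
    return m * p_a * p_b


def choose_m(n_key: int, n_key_lower: int, n_a: int, n_b: int) -> int:
    """Double m, starting from n_key * max(n_A, n_B), until X_{=0} < n^L_key."""
    m = n_key * max(n_a, n_b)
    while expected_false_overlap(m, n_key, n_a, n_b) >= n_key_lower:
        m *= 2
    return m
```

For the first configuration of Table 1 ($n_{key} = 733$, sets of $10^3$ elements, $m = 1.18 \cdot 10^9$), `expected_false_overlap` evaluates to roughly 455, which is below the lower bound $n^L_{key} = 5 \cdot 10^2$ of the interval used there.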
4.5.3 Choosing Parameters Regarding
Correctness
In the classical use of Bloom filters as presented
in (Bloom, 1970), some usage recommendations are
made to generate parameters $n_{key}$ and $m$:
$$m = -\frac{n_A \cdot \ln(P_{FP})}{(\ln 2)^2} \tag{17}$$
$$n_{key} = \frac{m}{n_A} \cdot \ln 2 \tag{18}$$
We recall that Bloom filters were not initially designed
to support relation tests such as inclusiveness or disjointness.
Therefore, the considerations on the generation
of $n_{key}$ and $m$ are manifold.
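As a worked instance of the classical recommendations (17) and (18), the following sketch rounds $m$ up and $n_{key}$ to the nearest integer; the rounding choices and the function name are assumptions made for illustration.

```python
import math


def classical_parameters(n_a: int, p_fp: float):
    """Equations (17) and (18): filter size m and number of hash functions n_key."""
    m = math.ceil(-n_a * math.log(p_fp) / (math.log(2) ** 2))
    n_key = max(1, round((m / n_a) * math.log(2)))
    return m, n_key


# 1000 elements with a target false-positive probability of 1%.
print(classical_parameters(1000, 0.01))  # prints (9586, 7)
```

These values are several orders of magnitude below the sizes in Table 1, which reflects how much stricter the disjointness constraint on $m$ is than the classical false-positive target.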
5 PRIVACY ANALYSIS
In this section we show how our solutions fulfill pri-
vacy in terms of content and cardinality.
5.1 Distribution of the Overlapping Bits
In this section we analyze the characteristics of overlapping
bits occurring throughout the basic step of
Bloom filter generation. We obtain such a distribution
by running the generation of $10^3$ Bloom filters for
each parameter configuration. From these distributions
we could notice several characteristics. First, the
more elements we add to the Bloom filter, the larger
the overlapping bits range. For instance, if we follow
recommendations from (17) and (18), and we insert
only 10 elements, we get a range of overlapping bits
of approximately 10. When we have 100 inserted elements
the range increases to approximately 40. Since
our protocols use an HMAC function which generates
a uniform random distribution, we could consider that
the overlapping bits follow a normal distribution. The
parameters could therefore be set with the objective of
tuning this distribution, so as to obtain an overlapping bits
range that is acceptable for the targeted level of privacy.
As a second characteristic, we observe that when
we have two sets with highly distant cardinalities
$n_A \ll n_B$ (or resp. $n_B \ll n_A$), the number of overlapping
bits in the Bloom filter of the smaller set $Y_{BF_A}$
(resp. $Y_{BF_B}$) substantially decreases and the one of
the larger set substantially increases. Having too few
overlapping bits in a Bloom filter could be problematic,
especially if it could even be predictable by the
attacker. By running tests we notice that no matter
which $n_{key}$ is picked or how many elements are inserted
in the Bloom filters, if the ratio $\frac{n_A}{n_B}$ remains
the same, then the expected numbers of overlapping
bits in $BF_A$ and $BF_B$ remain approximately the same.
Moreover, we see that it is even worse if we keep decreasing
the ratio $\frac{n_A}{n_B}$. One solution to keep having an
acceptable range of overlapping bits in the Bloom filter
representation of the smaller set, even if we have
a significant difference in the cardinalities, could be to
use a greater domain $[n^L_{key}; n^U_{key}]$. Indeed, for the same
ratio $\frac{n_A}{n_B}$, we get greater overlapping bits ranges. In
(Tajan et al., 2019), some results obtained by testing
the overlapping bits distribution are presented.
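The kind of experiment described above can be approximated with a small simulation. The sketch assumes that HMAC outputs behave like independent uniform indices in $[0, m)$, so it draws indices from a seeded pseudo-random generator instead of hashing real elements; the function names and the default of 1000 runs are illustrative choices.

```python
import random


def overlapping_bits_sample(m: int, n_key: int, n_elems: int, rng: random.Random) -> int:
    """One generated Bloom filter: return Y = n_elems * n_key - X (Equation (4))."""
    set_bits = set()
    for _ in range(n_elems * n_key):
        set_bits.add(rng.randrange(m))  # assumed-uniform stand-in for h_k(a_i)
    return n_elems * n_key - len(set_bits)


def overlapping_bits_distribution(m: int, n_key: int, n_elems: int,
                                  runs: int = 1000, seed: int = 0):
    """Minimum, mean and maximum number of overlapping bits over several runs."""
    rng = random.Random(seed)
    samples = [overlapping_bits_sample(m, n_key, n_elems, rng) for _ in range(runs)]
    return min(samples), sum(samples) / runs, max(samples)
```

Calling `overlapping_bits_distribution(9586, 7, 100)` uses the classical parameters for 1000 elements at a 1% false-positive target and shows how wide the range of overlapping bits is when only 100 elements are inserted.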
5.2 Privacy on the Content
First, we claim that no attacker could determine
which concrete elements from $A$ are included in $B$.
This holds by means of the Bloom filter construction.
Indeed, each element from the sets is mapped
with the HMAC function constructed from a cryptographic
hash function and therefore benefits from
its one-wayness characteristic. The only straightforward
manner to get any knowledge on the Bloom filter
content would be to use the test() function which is
only computable by Alice and Bob. More concretely,
Server does not know the HMAC’s keys and cannot
generate its own Bloom filter or add any element to
an existing one and perform the set relations. Indeed,
they require that all the considered Bloom filters are
generated with the same keys. Also, Server is not able
to learn from BF
A
or BF
B
if a specific element from
A is also included in B. All elements inserted in a
Bloom filter are mixed together and it is not possible,
even from the same Bloom filter, to distinguish them.
5.3 Privacy on the Cardinality
In this section, we focus on the ability of any attacker
to retrieve the cardinality of the sets from one or multiple
versions of the Bloom filter representation. The
overlapping bits property of the Bloom filters makes it possible
to hide the exact number of elements in sets $A$ and $B$.
However, Server is able to determine the number of
bits set to 1 in the Bloom filters. It could then deduce
the following information: $n_A \ge \frac{X_{BF_A}}{n_{key}}$. By keeping
parameter $n_{key}$ secret from Server, we consider the cardinalities
obfuscated to a certain level.
We recall that there exists an optimized manner
to get the cardinality of a set from its Bloom filter
representation as explained in Section 4.1, the S&B
technique. Without any overlapping bit, getting the
result is therefore straightforward. On the contrary,
having multiple overlapping bits will lead any non-authorized
party to misinterpret the cardinality. To
ensure that, the ratio of the number of overlapping bits
over parameter $n_{key}$ should be large.
We also notice that having an acceptable probability
of false negative and an acceptable level of privacy
for the set cardinalities are conflicting objectives.
Indeed, our approach to solve the disjointness set relation
is based on reducing the number of overlapping
bits, so that they are not confused with common elements.
5.4 Sets Cardinality Attack
We present here how an attacker could aim to retrieve
cardinalities $n_A$ and $n_B$. To do so, Server will first
try to determine parameter $n_{key}$ used by Alice and
Bob. Server knows that $n_{key} \in [n^L_{key}; n^U_{key}]$ and that $n_{key}$
is a factor of the number of bits inserted in both Bloom
filters. The candidates list for $n_{key}$ is represented as
$L_{n_{key}} = \{l_1, \ldots, l_{\lambda_{n_{key}}}\}$ with $\lambda_{n_{key}}$ a security parameter
that represents the size of this list. We also consider
the two sub-lists $L^A_{n_{key}} = \{l_1, \ldots, l_{\lambda_A}\}$ and
$L^B_{n_{key}} = \{l_1, \ldots, l_{\lambda_B}\}$ which correspond to the lists of factors
regarding $BF_A$ and $BF_B$ before the cross-checking
that leads to $L_{n_{key}}$. We set $Y_{BF_A} \in [ob^A_1; ob^A_2]$ and
$Y_{BF_B} \in [ob^B_1; ob^B_2]$ the numbers of overlapping bits in
the Bloom filters. In each Bloom filter, some overlapping
bits could have occurred, therefore the attacker
knows that regarding $BF_A$, $n_{key}$ could be a factor of
$X_{BF_A}$ or $(X_{BF_A} + 1)$ or $(X_{BF_A} + 2)$ ... The same holds
for $BF_B$. It means that $L^A_{n_{key}}$ (resp. $L^B_{n_{key}}$) is composed
of elements $l_j$ which verify the two characteristics:
$$l_j \in [n^L_{key}; n^U_{key}] \text{ and } l_j \mid x_A \tag{19}$$
with $x_A \in [X_{BF_A} + ob^A_1; X_{BF_A} + ob^A_2]$
(resp. $l_j \mid x_B$ with $x_B \in [X_{BF_B} + ob^B_1; X_{BF_B} + ob^B_2]$).
Finally, we have $L_{n_A} = (l_i)_{i \in [1; \lambda_{n_A}]}$ the list of candidates
for $n_A$ with $\lambda_{n_A}$ the number of elements in $L_{n_A}$.

Table 1: Execution of the two set relations with $n_{key}$ selected in $[5 \cdot 10^2; 2 \cdot 10^3]$. Columns $n_W$ to $n_{key}$ are the parameters, the INC and DIS columns give execution times in seconds, and $\lambda_{n_{key}}$ is the outcome of the attack.

| | $n_W$ | $n_{L_1}$ | $n_{L_2}$ | $m$ | $n_{key}$ | $INC(L_1, W)$ | $DIS(L_2, W)$ | $\lambda_{n_{key}}$ |
|---|---|---|---|---|---|---|---|---|
| Use Case 1 | $10^3$ | $10^3$ | $10^3$ | $1.18 \cdot 10^9$ | 733 | $2.57 \cdot 10^1$ | $2.16 \cdot 10^1$ | 406 |
| | $10^3$ | $10^3$ | $10^3$ | $7.62 \cdot 10^9$ | 1861 | $2.51 \cdot 10^1$ | $7.44 \cdot 10^1$ | 397 |
| | $10^4$ | $9 \cdot 10^3$ | $2 \cdot 10^2$ | $9.47 \cdot 10^9$ | 1468 | $2.16 \cdot 10^1$ | $8.67 \cdot 10^1$ | 29 |
| Use Case 2 | $10^2$ | $10^4$ | / | $4.40 \cdot 10^9$ | 1416 | / | $7.41 \cdot 10^1$ | 32 |
| | $10^2$ | $5 \cdot 10^4$ | / | $8.13 \cdot 10^9$ | 861 | / | $3.36 \cdot 10^1$ | 37 |
| | $10^2$ | $5 \cdot 10^5$ | / | $1.27 \cdot 10^{10}$ | 561 | / | $1.9 \cdot 10^2$ | 32 |

The first step of the attack consists of listing all the
common factors of $\{X_{BF_A}, (X_{BF_A} + 1), (X_{BF_A} + 2), \ldots\}$
and $\{X_{BF_B}, (X_{BF_B} + 1), (X_{BF_B} + 2), \ldots\}$ to generate
lists $L^A_{n_{key}}$ and $L^B_{n_{key}}$. Then, Server will intersect the
two lists to generate the candidates list $L_{n_{key}}$.
The second phase of the attack is to translate $L_{n_{key}}$
into lists $L_{n_A}$ and $L_{n_B}$. Server could use the S&B technique
(6) to approximate size $n_A$ and since parameter
$m$ is public and value $X_{BF_A}$ is directly computable, we
have the following function:
$$n_A(n_{key}) = -\frac{m}{n_{key}} \ln\left[1 - \frac{X_{BF_A}}{m}\right] \tag{20}$$
When we look closely at $L_{n_{key}}$, we could notice that
if some elements follow each other, they are
translated to the same $n_A$ candidate. In other words,
multiple elements from $L_{n_{key}}$ correspond to the same
element from $L_{n_A}$, thus $\lambda_{n_A} \le \lambda_{n_{key}}$.
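The two phases of the attack can be sketched as follows. The code assumes the attacker fixes an interval $[ob_1; ob_2]$ of plausible overlapping-bit counts for each filter and rounds the S&B estimate of Equation (20) to the nearest integer; the function names and the rounding step are hypothetical, not taken from the authors' implementation.

```python
import math


def key_candidates(x_bits: int, ob_lo: int, ob_hi: int, n_lo: int, n_hi: int):
    """Equation (19): every l in [n_lo, n_hi] dividing some x in [X + ob_lo, X + ob_hi]."""
    candidates = set()
    for x in range(x_bits + ob_lo, x_bits + ob_hi + 1):
        for l in range(n_lo, n_hi + 1):
            if x % l == 0:
                candidates.add(l)
    return candidates


def sb_estimate(m: int, n_key: int, x_bits: int) -> float:
    """Equation (20): S&B cardinality estimate for one n_key candidate."""
    return -(m / n_key) * math.log(1.0 - x_bits / m)


def cardinality_candidates(m, x_a, x_b, ob_a, ob_b, n_lo, n_hi):
    """Phase 1: intersect both factor lists. Phase 2: map each candidate to n_A."""
    l_key = sorted(key_candidates(x_a, ob_a[0], ob_a[1], n_lo, n_hi)
                   & key_candidates(x_b, ob_b[0], ob_b[1], n_lo, n_hi))
    l_na = sorted({round(sb_estimate(m, l, x_a)) for l in l_key})
    return l_key, l_na
```

With toy values $m = 10^6$, $X_{BF_A} = 599$, $X_{BF_B} = 838$, overlap intervals $[0; 3]$ and $n_{key} \in [10; 20]$, the first phase keeps the candidates 10, 12, 14, 15 and 20, so a true $n_{key} = 12$ stays hidden among five possibilities; the size of this list plays the role of $\lambda_{n_{key}}$.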
6 RESULTS
The Bloom filter-based set relations have various ap-
plications in practice. We selected two of them and
applied our solutions. We implemented our protocols
in Java and the measurements have been made with a
CPU configuration of Intel Core i5 2.40GHz x4.
6.1 Results on a Cloud Auditing Use
Case
We test our solution with parameters suiting a cloud
security auditing use case from (Tajan et al., 2016). A
third party auditor should verify that a cloud service
provider (CSP) correctly performed access control
on data from a client stored online. The auditor
thus performs the set relations on logfiles $L_1$ and $L_2$
from the CSP, composed of $10^2$ to $10^4$ IP addresses, and
a whitelist $W$ from the client composed of $10^3$ to $10^4$
IP addresses. The tested parameter configurations
are presented in Table 1 where the two set relations
$INC(L_1, W)$ and $DIS(W, L_2)$ are tested $10^4$ times.
In both cases we obtain 0.00% of false positives and
false negatives. We remark that the set relations are
equivalent to bit-wise operations on the Bloom filters.
We also notice that the execution times obtained while
preserving the set cardinality privacy are acceptable,
especially in an auditing use case. Finally, we express
the privacy on the sets' cardinality by $\lambda_{n_{key}}$.
6.2 Results on a Mobile Devices
Tracking Use Case
In this use case from (Tajan and Westhoff, 2019), there are three different sub-use cases in which a third party should verify whether any suspect user from a government agency's whitelist has been connected to a specific wireless access point with one of their devices. In Table 1 we give examples of relevant parameters that produced successful computations, along with the running time of the disjointness function in seconds. We notice a significant decrease in the sets' cardinality privacy when the sets have different sizes, as explained in Section 5.1. To overcome this privacy weakness, the parties could agree on a default size and add dummy elements if necessary.
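The padding countermeasure can be sketched as follows, under stated assumptions. The class name PaddingSketch, the "dummy:" prefix, the per-party tag, and the use of 128-bit random values are hypothetical choices, not part of the protocol. The sketch targets the disjointness test only: dummies are tagged per party so that two padded sets cannot overlap on a dummy, which would otherwise turn disjoint sets into intersecting ones.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

class PaddingSketch {
    // Pads a set with random dummy elements up to the agreed default size.
    static Set<String> pad(Set<String> real, int defaultSize, String partyTag) {
        if (real.size() > defaultSize) {
            throw new IllegalArgumentException("set exceeds the agreed default size");
        }
        SecureRandom rnd = new SecureRandom();
        Set<String> padded = new HashSet<>(real);
        while (padded.size() < defaultSize) {
            // Assumption: a party-specific prefix plus 128 random bits keeps dummies
            // apart from real elements and from the other party's dummies.
            padded.add("dummy:" + partyTag + ":" + new BigInteger(128, rnd).toString(16));
        }
        return padded;
    }

    public static void main(String[] args) {
        Set<String> devices = Set.of("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02");
        Set<String> padded = pad(devices, 8, "alice");
        System.out.println("padded size = " + padded.size());
    }
}
```

Each party would then build its Bloom filter from the padded set, so that both filters represent the same number of elements regardless of the real set sizes.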
7 CONCLUSION
We showed how to compute two specific set relations, namely the private outsourced inclusiveness test and the private outsourced disjointness test, using the space-efficient Bloom filter data representation. In addition to preserving the privacy of the content, we provided a certain level of privacy on the cardinality of the Bloom filter's data structure. Our implementation's results validate an acceptable level of privacy, for instance when applied to a cloud security audit on access control. Such an approach based on Bloom filters could also be easily adapted to other set relations or operations such as equality or relative complement.
REFERENCES
Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422–426.
Burkhart, M. and Dimitropoulos, X. (2012). Fast private set operations with SEPIA.
Churches, T. and Christen, P. (2004). Some methods for blindfolded record linkage. BMC Med. Inf. & Decision Making.
Dong, C., Chen, L., and Wen, Z. (2013). When private set intersection meets big data: an efficient and scalable protocol. In Sadeghi, A., Gligor, V. D., and Yung, M., editors, 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS'13, Berlin, Germany, November 4-8, 2013, pages 789–800. ACM.
Egert, R., Fischlin, M., Gens, D., Jacob, S., Senker, M., and Tillmanns, J. (2015). Privately computing set-union and set-intersection cardinality via bloom filters. In Foo, E. and Stebila, D., editors, Information Security and Privacy - 20th Australasian Conference, ACISP 2015, Brisbane, QLD, Australia, June 29 - July 1, 2015, Proceedings, volume 9144 of Lecture Notes in Computer Science. Springer.
Freedman, M. J., Nissim, K., and Pinkas, B. (2004). Efficient private matching and set intersection. In Cachin, C. and Camenisch, J., editors, Advances in Cryptology - EUROCRYPT 2004, International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, May 2-6, 2004, Proceedings, volume 3027 of Lecture Notes in Computer Science. Springer.
Goh, E. (2003). Secure indexes. IACR Cryptology ePrint Archive, 2003:216.
Goldreich, O. (2004). The Foundations of Cryptography - Volume 2: Basic Applications. Cambridge University Press.
Hohenberger, S. and Weis, S. A. (2006). Honest-verifier private disjointness testing without random oracles. In Danezis, G. and Golle, P., editors, Privacy Enhancing Technologies, 6th International Workshop, PET 2006, Cambridge, UK, June 28-30, 2006, Revised Selected Papers, volume 4258 of Lecture Notes in Computer Science, pages 277–294. Springer.
Kerschbaum, F. (2012). Outsourced private set intersection using homomorphic encryption. In Youm, H. Y. and Won, Y., editors, 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS '12, Seoul, Korea, May 2-4, 2012. ACM.
Kiayias, A. and Mitrofanova, A. (2005). Testing disjointness of private datasets. In Patrick, A. S. and Yung, M., editors, Financial Cryptography and Data Security, 9th International Conference, FC 2005, Roseau, The Commonwealth of Dominica, February 28 - March 3, 2005, Revised Papers, Lecture Notes in Computer Science, pages 109–124. Springer.
Kissner, L. and Song, D. X. (2005). Privacy-preserving set operations. In Shoup, V., editor, Advances in Cryptology - CRYPTO 2005: 25th Annual International Cryptology Conference, Santa Barbara, California, USA, August 14-18, 2005, Proceedings, volume 3621 of Lecture Notes in Computer Science. Springer.
Li, Z. and Gong, G. (2012). Efficient data aggregation with secure bloom filter in wireless sensor networks.
Pinkas, B., Schneider, T., and Zohner, M. (2014). Faster private set intersection based on OT extension. In Fu, K. and Jung, J., editors, Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, August 20-22, 2014, pages 797–812. USENIX Association.
Qiu, L., Li, Y., and Wu, X. (2007). Preserving privacy in association rule mining with bloom filters. J. Intell. Inf. Syst., 29(3):253–278.
Swamidass, S. J. and Baldi, P. (2007). Mathematical correction for fingerprint similarity measures to improve chemical retrieval. Journal of Chemical Information and Modeling, 47(3):952–964. PMID: 17444629.
Tajan, L. and Westhoff, D. (2019). Retrospective tracking of suspects in GDPR conform mobile access networks datasets. In Proceedings of the Third Central European Cybersecurity Conference, CECC 2019, Munich, Germany, November 14-15, 2019, pages 16:1–16:6. ACM.
Tajan, L., Westhoff, D., and Armknecht, F. (2019). Private set relations with bloom filters for outsourced SLA validation. IACR Cryptology ePrint Archive, 2019:993.
Tajan, L., Westhoff, D., Reuter, C. A., and Armknecht, F. (2016). Private information retrieval and searchable encryption for privacy-preserving multi-client cloud auditing. In 11th International Conference for Internet Technology and Secured Transactions, ICITST 2016, Barcelona, Spain, December 5-7, 2016. IEEE.
Ye, Q., Wang, H., Pieprzyk, J., and Zhang, X. (2008). Efficient disjointness tests for private datasets. In Mu, Y., Susilo, W., and Seberry, J., editors, Information Security and Privacy, 13th Australasian Conference, ACISP 2008, Wollongong, Australia, July 7-9, 2008, Proceedings, volume 5107 of Lecture Notes in Computer Science, pages 155–169. Springer.
SECRYPT 2020 - 17th International Conference on Security and Cryptography