VIP Blowﬁsh Privacy in Communication Graphs

Mohamed Nassar

1 a

, Elie Chicha

2,3

, Bechara AL Bouna

and Richard Chbeir

Computer Science Department, American University of Beirut, Lebanon

TICKET Lab., Antonine University, Hadat-Baabda, Lebanon

Univ. Pau & Pays Adour, UPPA - E2S, LIUPPA, Anglet, France

Keywords:

Blowﬁsh Privacy, Social Networks.

Abstract:

Communication patterns analysis is becoming crucial for global health security especially with the spread of

epidemics such as COVID-19 by the means of social contact. At the same time, personal privacy is considered

an essential human right. Privacy-preserving frameworks enable communication graph analysis within formal

privacy guarantees. In this paper, we present a summary of Blowﬁsh privacy and explore the possibility

of applying it in the context of undirected communication graphs. Communication graphs represent social

contact or call detail records databases. We deﬁne the notions of neighborhood, discriminative secrets, and

policies for these graphs. We study several examples of queries and compute their sensitivity. Even though not

addressed in the original Blowﬁsh privacy paper, we explore the idea of having a discriminative secret graph

per individual. This allows us to treat some persons as VIP and put their privacy on top priority, where other

persons can have lower privacy constraints. This may help to offer privacy as a service and increase the utility

of the anonymized communication graph to an appropriate level.

1 INTRODUCTION

Communication patterns analysis is becoming crucial

for global health security especially with the spread

of epidemics such as COVID-19 by the means of

social contact. In the same time, personal privacy

is considered as an essential human right. Privacy-

preserving frameworks enable communication graph

analysis within formal privacy guarantees. Differen-

tial Privacy (DP) provides ways for trading-off the

privacy of individuals in a statistical database for the

utility of data analysis.

DP has a single tuning knob, namely ε, sometimes

two (ε and δ). For example, increasing ε means more

utility and less privacy. The idea of Blowﬁsh privacy

(BP) is to provide more tuning knobs by introducing

policies He et al. (2014). In BP, a policy speciﬁes:

• secrets: information that must be kept secret. Since not

all the information has to be secret, we can increase the

utility of the data by lessening the protection of certain

properties.

• and constraints: known properties about the data. Con-

straints add protection against an adversary who knows

these constraints.

DP can be considered as an instance of BP where:

https://orcid.org/0000-0001-8857-4436

Bob

Alice

Eve

Carol

id tuple

Bob (0, 1, 1, 1)

Alice

(1, 0, 1, 0)

Eve (1, 1, 0, 1)

Carol (1, 0, 1, 0)

Figure 1: Communication graph and its database.

• every property about an individual’s record is protected,

• every individual is independent of all the other individ-

uals in the dataset. There is no correlations.

Because of its generalized framework and powerful

expressiveness of adversarial knowledge, we expect

that BP can solve privacy challenges in graph-based

databases. In this paper, we explore the application of

BP to communication graphs such as social networks

and call detail records databases. We model the se-

crets and the auxiliary knowledge in terms of the BP

model and give numerous examples.

2 BP FOR COMMUNICATION

GRAPHS

A communication graph is a graph where vertices

represent individuals and an edge between two in-

Nassar, M., Chicha, E., Al Bouna, B. and Chbeir, R.

VIP Blowﬁsh Privacy in Communication Graphs.

DOI: 10.5220/0009875704590467

In Proceedings of the 17th International Joint Conference on e-Business and Telecommunications (ICETE 2020) - SECRYPT, pages 459-467

ISBN: 978-989-758-446-6

459

dividuals exists if a communication has happened

in between the individuals corresponding to the ver-

tices. Social networks and call detail records can

be modeled as communication graphs. One way to

anonymize the data of a communication graph is to

remove the identiﬁers at the vertices. The goal of an

adversary is therefore to discover the individual cor-

responding to a node in the graph.

A communication graph G

) can be repre-

sented as a database D of size n = |V

| where each

tuple t of D corresponds to an individual id. The tu-

ple dimension is m = |V

| as well. The ith attribute

of t is 1 if t. id has communicated with the individual

corresponding to node i, and 0 otherwise. A row in D

represents the ego network of a vertex.

An example of a communication graph and its cor-

responding database is shown in Figure 1. Note that

we consider a binary communication event (0 or 1).

Other models might be explored in future work, for

instance annotating the edges with call frequency, av-

erage duration, call time or other meta-data. since the

BP framework is deﬁned over categorical data, bin-

ning might be used if the meta data is not categorical.

The BP notation is based on the DP notation as

summarized in Table 1.

2.1 Secrets

In addition, BP deﬁnes secrets and discriminative

pairs of secrets as shown in Table 1. We give exam-

ples of secrets and pairs of secrets over a communica-

tion graph in Table 2.

Guess

the real t

Adversary

Pick a secret pair

) from S

pairs

Challenger

id,t

b(0 or 1)

Repeat

Figure 2: Discriminative pair of secrets as a game.

The discriminative secret graph generalizes the speci-

ﬁcation of discriminative pairs of secrets. It is a graph

where vertices represent secrets and edges link only

the discriminative pairs of secrets. More formally it

is denoted G = (V, E) where V = T and E ⊆ T × T .

Even though not addressed in the original BP paper,

we explore the idea of having a discriminative secret

graph per individual. This allows us to treat some

persons as VIP and put their privacy on top prior-

ity, where other persons can have lower privacy con-

straints. This may help increase the utility of the com-

munication graph to an appropriate level.

Table 1: Notation of BP, secrets, and discriminative pairs.

Symbol Description

D Database of n tuples

T = A

×. .. ×A

Domain of m categorical attributes

t ∈ T A single tuple

t. id Id of the tuple’s real owner

t.A

Value of the ith attribute in tuple t

Set of all possible datasets with size n (|D| =

) ∈ N

In DP, D

and D

are neighbors, they differ

in the value of one tuple

A randomized mechanism, for example

adding random noise to the result of a query

S ⊆

range(M )

A set of the outputs generated by M

ε-DP

For every S and every two neighbors

): Pr[M (D

) ∈ S] ≤ e

Pr[M (D

) ∈

f : I

−→ R

A function that takes a database as input and

returns a vector of real numbers as output,

for example a countIf query

S( f )

The global sensitivity of f is the max

Manhattan distance between the outputs

for any two neighbor databases: S( f ) =

max

)∈N

|| f (D

) − f (D

)||

Lap

The Laplace Mechanism adds η ∈ R

f (D), where η is a vector of independent

random variables. Each η

is drawn from the

Laplace distribution with parameter S( f )/ε:

Pr[η

= z] ∝ e

−z.ε/S( f )

P =

,. .. ,P

)

A partitioning of the domain T

: I

−→ Z

A histogram query. h

(D) outputs for each

the number of times values in P

appear

in D. The sensitivity of histogram queries is

S(h

) = 2 since replacing a tuple by another

one may decrease the count of a partition and

increase the count of another partition.

The complete histogram query, it outputs for

each t ∈ T the number of times it appears in

(D)

The expected mean squared error of M :

(D) =

∑

E[( f

(D) −

(D))

] where

(D) and

(D) are the ith components of

the true answer and the noisy answer, re-

spectively. For Laplace mechanism and his-

togram queries, this error is: E

Lap

(D) =

|T |.E(Laplace(2/ε))

= 8|T |/ε

. A large

epsilon means less error, hence more utility.

An arbitrary statement over the val-

ues in the database. Example1:

t. id = ’Bob’ ∧ t.disease = ’cancer’.

Example2: t.

id = ’Bob’ ∧ t

. id =

’Alice’ ∧t

.disease = t

.disease

A set of secrets that the data owner would

like to protect, e.g. {Example1, Example2}

SECRYPT 2020 - 17th International Conference on Security and Cryptography

460

Table 1: Notation of BP, secrets, and discriminative pairs.

(cont.)

Symbol Description

(s,s

) ∈ S ×S A pair of secrets, e.g. (Example1, Example2)

A discrimi-

native pair of

secrets (s,s

)

A mutually exclusive pair of secrets. Two

statements that cannot be true at the same

time. An adversary must not be able to dis-

tinguish which one is true and which one

is false, e.g. (t. id = ’Bob’ ∧ t = x, t. id =

’Bob’ ∧t = y)

The secret t. id = i ∧t = x where x ∈ T , e.g.

’Bob’

(’cancer’,65)

pairs

A set of discriminative pairs of secrets, e.g.

full

pairs

, S

attr

pairs

, S

pairs

, S

d,θ

pairs

A set of discriminative pairs of se-

crets based on graph G(V,E), i.e.

{(s

)|∀i,∀(x, y) ∈ E}

Full domain:

full

pairs

For every individual, the value

is not known to be x or y, i.e.

{(s

)|∀i,∀(x, y) ∈ T × T }

Attributes:

attr

pairs

For every individual and every two tu-

ples differing in the value of only one

attribute A where one of them is real,

the real tuple is not known. The privacy

deﬁnition is weaker than in full domain

full

pairs

since the real tuple is distinguish-

able if more than one attribute differs, i.e.

{(s

)|∀i,∃A, x[A] 6= y[A] ∧ x[

A] = y[

A]}

Partitioned:

pairs

For every individual and every two tuples

coming from the same partition where one

of them is real, the real tuple is not known,

i.e.

{(s

)|∀i,∃ j,(x , y) ∈ P

× P

This

privacy deﬁnition is very useful for location

data.

Distance

threshold:

d,θ

pairs

For every individual and every two tuples

having their distance less than or equal

to a threshold θ where one of them is

real, the real tuple is not known, i.e.

{(s

)|∀i,d(x,y) ≤ θ}

In this direction, the idea of a discriminative secret is

very similar to what consists a game in cryptography.

We prefer to call it a privacy game here and represent

it as shown in Figure 2. In this game, a challenger

picks an Id (e.g. Bob) and a pair of discriminative se-

crets at random (e.g. ”Bob has called Alice” or ”Bob

has not called Alice”). The pair is represented by two

tuples, or an edge in the discriminative secret graph

of the Id. The edge vertices identify the two tuples.

The challenger sends the Id and the two tuples to the

adversary (e.g. which one does belong to Bob?). The

adversary has to guess which of the two tuples be-

longs to the id and responds with only 1 bit b. b = 0

is chosen for t

and b = 1 for t

Our goal is to make the probability of the adver-

sary guessing the assumed right tuple not signiﬁcantly

different than a coin ﬂip.

An important remark about undirected communi-

Table 2: Examples of notions of secrets for a communica-

tion database.

Symbol Description - Example

Secret: s

Bob has talked to Alice: t

. id = ’Bob’ ∧

. id = ’Alice’ ∧t

[ j] = t

[i] = 1

A discrimi-

native pair of

secrets (s,s

)

Given two communication tuples (ego net-

works), we cannot distinguish which one of

them belongs to Bob, for example, (t. id =

’Bob’ ∧ t = (0,1, 1,1), t. id = ’Bob’ ∧ t =

(0,0, 0,1))

The secret where individual i has ego net-

work x, for example, s

’Bob’

(0,1,1,1)

full

pairs

For an individual, all ego networks are dis-

criminative

attr

pairs

For an individual and two vectors that differ

in only one communication, we cannot tell

which one is real.

pairs

For an individual and two tuples belonging

to the same partition, we cannot tell which

tuple is the real one.

(d,θ)

pairs

Given a distance metric and a threshold. The

privacy game is to challenge the adversary

with one individual and two records having

their distance less than or equal to thresh-

old. A suitable distance for communication

graphs is the Hamming distance (or the num-

ber of different bits), which is equivalent to

the Manhattan distance in this case.

cation graphs is that not all the graphs are feasible. If

Bob has talked to Alice, it means that Alice has talked

to Bob. The database matrix is symmetric. Another

constraint is that t

[i] must be 0, and all other entries

are either 0 or 1. The BP framework allows to de-

ﬁne constraints about the dataset, and redeﬁnes the

notion of neighborhood databases by excluding inter-

mediate, yet infeasible ones. Therefore, we suggest

that BP is a more suitable framework for communica-

tion graphs than its DP predecessor.

2.2 Auxiliary Knowledge

Auxiliary knowledge is usually formalized using cor-

relations, for example c(R = r

) + c(R = r

) = a

where c(r

) is the count of records having the attribute

R equal to r

, c(r

) is the count of records having

the attribute R equal to r

, and a

is known. BP sug-

gests to formalize auxiliary knowledge in terms of a

set of constraints Q that a database D must satisfy.

It denotes I

⊂ I

the subset of all possible database

instances. In the case of undirected communication

graphs, we have two inherent constraints:

• the matrix of D is symmetric: t

= t

• the ego attributes are zero: t

= 0

It is also possible to use directed communication

graphs where a directed edge from Bob to Alice

VIP Blowﬁsh Privacy in Communication Graphs

461

means that Bob has called Alice. In this case the

ﬁrst constraint above is not considered. Additional

constraints which are not necessarily inherent to the

graph representation can be considered, for example:

• Count Queries. The number of individuals that have 5

neighbors.

• Marginal Constraints. A marginal is the projection of

the database on a given subset of columns. Rows hav-

ing the same projection are grouped in one record along

with their count. In our context we project on a subset

of nodes. For example let’s project the database in Fig-

ure 1 on Bob and Eve only (columns 1 and 3). Alice and

Carol have the same projection since both have called

Bob and Eve. Therefore the projection have 3 rows:

(Bob,1), (Eve,1) and (Alice-Carol,2).

• Meta-node Constraints. A meta-node is a node repre-

senting a sub-graph or a group of individuals. Meta-

node auxiliary knowledge is for example the number of

people calling a group of individuals, or the number of

calls in between two groups of individuals. The adver-

sary may know that the group Bob-Carol and the group

Alice-Eve have three calls linking them.

• Clique Constraints. A clique is a complete graph. The

adversary may know that a group of nodes makes a

clique. For example Bob, Alice and Eve form a clique.

2.3 Policy and Privacy Deﬁnitions

To apply BP, one must deﬁne a policy P(T ,G,I

)

which is composed of a set of tuples T , a discrimi-

native secret graph G(V,E) based on sets of discrim-

inative pairs S

pairs

, and a set of possible database in-

stances I

under the auxiliary knowledge constraints.

One also has to devise a randomized mechanism M

that satisﬁes (ε,P)-BP. Concretely, for every pair of

neighboring databases, denoted (D

) ∈ N(P), and

every set of outputs S ⊆ range(M ), we have:

Pr[M (D

) ∈ S] ≤ e

Pr[M (D

) ∈ S]

To see how it differs from DP, let’s consider D

D ∪ {x} and D

= D ∪ {y}, two databases that dif-

fer in one tuple, and suppose P = (T ,G,I

), i.e., no

constraints. D

and D

are not considered neighbors

unless (s

) ∈ S

pairs

. Otherwise, having M that sat-

isﬁes (ε,P)-BP means that:

Pr[M (D

) ∈ S] ≤ e

ε.d

(x,y)

Pr[M (D

) ∈ S]

since BP is shown to satisfy sequential composition.

Similarly to increasing ε, the chance of an attacker

to distinguish between pairs farther apart in the graph

is higher. We gain overall utility by scarifying local

privacy of some users.

The Laplace mechanism M

Lap

ensures

(ε, P(T , G, I

))-BP for any query function,

f : I

−→ R

, by outputting f (D) + η where

η ∈ R

is a vector of independent random num-

bers drawn from Lap(S( f ,P)/ε). S( f , P) is the

policy-speciﬁc global sensitivity and is deﬁned as

max

)∈N(P)

f (D

) − f (D

)

Following the deﬁnition of neighbors in He et al.

(2014), let T (D

) the set of discriminative pairs

) such as the ith tuples in D

and D

are x and

y. Let ∆(D

) = D

∪ D

. D

and D

are neighbors, if: (1) they both comply to the con-

straints, (2) T 6=

0, and (3) T has the smallest size,

there is no feasible database D

such that T (D

) ⊂

T (D

) or T (D

) = T (D

) & ∆(D

) ⊂

∆(D

). In our communication graph representa-

tion, two databases are candidate neighbors if they

differ by the ego network of one individual, and this

difference is represented in the security graph of that

individual. Note that this means that one or several

edges might be added or removed between two neigh-

bor communication graphs.

To give an example, the two graphs in Figure 3

are different in three tuples: |∆(D

)| = 6. If only

one of the different pairs is in the security graph, for

instance Bob’s pairs, we have |T | = 1. There is no

database having a non-empty subset of T , and no fea-

sible database with same T and a subset of ∆. (To do

so, we need to make Alice’s ego network indifferent,

or Eve’s ego network indifferent, which is not possi-

ble due to symmetry constraints). We consider that

these two graphs are neighbors.

Bob

Alice

Eve

Carol

id tuple

Bob (0, 0, 1, 1)

Alice (0, 0, 1, 0)

Eve (1, 1, 0, 1)

Carol (1, 0, 1, 0)

Bob

Alice

Eve

Carol

id tuple

Bob (0, 1, 0, 1)

Alice (1, 0, 1, 0)

Eve (0, 1, 0, 1)

Carol (1, 0, 1, 0)

Figure 3: Neighboring graphs and their databases. The ego

network of Bob has changed, and Bob has a G

full

policy.

Under P(T ,G

full

), we can obtain two neigh-

bor communication graphs by taking one vertex and

changing its ego network. Any two communication

graphs that differ in n + 1 tuples where n tuples differ

in one bit and one tuple differs in n bits are considered

neighbors under G

full

To make the concept of neighbor databases

used throughout the paper more straightforward, we

demonstrate the following result:

Theorem 1. Given G

), its database/matrix rep-

resentation M(G

) and the policy P(T , G,I

), where T

represents all binary vectors of size |V

|, G represents the

SECRYPT 2020 - 17th International Conference on Security and Cryptography

462

overall graph of discriminative secret graphs for all the

nodes, and I

constrains the possible databases to have:

(1) ∀i 6= j,M

i, j

= M

j,i

= 0 or M

i, j

= M

j,i

= 1 and (2)

∀i,M

i,i

= 0. If G = G

attr

or G is any non-empty subset of

attr

, we have that: Two graphs G

and G

are neighbors,

i.e., (G

) ∈ N(P), if they differ by one and only one edge

e(i, j) = e( j, i) and for at least one vertex of the edge (either

i or j) the discriminative secret pair (s

) (where x and y

differ at the bit j) or (s

) (where a and b differ at the bit

i) is in the security graph G.

Proof. G

attr

means that two tuples form a discriminative

secret pair if they differ by only one attribute. This differ-

ence is reﬂected in the communication graph by the addition

or removal of one edge.

If G

and G

differ by one or more edges that do not

correspond to discriminative pairs in the security graph G,

then T (G

) =

0 and the graphs are not neighbors.

If G

and G

differ by many edges that affect many

secret pairs in G, then we can build a graph G

that takes

only one of these edges that affects one (or two) secret pairs

in G to form a subset of T (G

), and therefore the two

graphs are not neighbors.

For the case where G

and G

differ by many edges and

for only one of them e(i, j ) we have (s

) ∈ G or (s

) ∈

G or both, then T (G

) is the minimal possible set. But

we can ﬁnd a sub-graph G

of G

where T (G

) =

T (G

) by removing the extra edges which do not have

any discriminative secret pairs that belong to the security

graph. Then, G

and G

are not neighbors.

For the case where G

and G

differ by only one edge

e(i, j), and we have (s

) ∈ G or (s

) ∈ G or both, then

T (G

) is minimal and there is no feasible intermediate

database. Only in this case G

and G

are neighbors.

2.4 BP with Individualized Security

Graphs

We consider the possibility that different individuals

may have different security graphs. For example, we

can divide the users into two extreme sub-groups: VIP

and Standard. The discriminative secret graph for a

VIP user is complete or attribute-based. The discrim-

inative secret graph for a standard user has 0 edges.

The application of Theorem 1 to the case where

standard nodes’ security graph is G

empty

and VIP

nodes’ security graph is G

attr

can be explained as fol-

lows. Take two communication graphs that differ by

only one edge:

• Case I. If the vertices of the edge are standard nodes,

then T (G

) =

0 and the graphs are not neighbors.

• Case II. If one of the vertices is VIP and the other is

standard, then the size of T is 1 and the graphs are

neighbors.

• Case III. If the vertices of the edge are two VIP nodes,

then the size of T is 2 and the graphs are neighbors. Any

intermediate database that makes | T |= 1 is infeasible.

3 EXAMPLES OF QUERIES AND

BP MECHANISMS

To apply BP given a query or a function f over the

protected database D, one has to determine ﬁrst the

global sensitivity Dwork et al. (2006) of f , based on

the privacy policy P = (T ,G,I

S( f ,P) = max

)∈N(P)

|| f (D

) − f (D

)||

Once the global sensitivity S( f , P) is identiﬁed, out-

putting f (D) + η ensures (ε,P)-BP if η ∈ R

is a

vector of independent random numbers drawn from

Lap(S( f ,P)/ε).

3.1 Example 1: Histogram Query for

Degrees of Vertices

Under P

attr

, two graphs G

and G

are neighbors if

= G

∪ {e}. Assume DV = dv

,...,dv

|DV |

is the

set of all degrees for the vertices in these graphs.

If the edge e is added between node a having de-

gree dv

and node b with degree dv

, i 6= j, then the

count of dv

and dv

will decrease each by 1 while

i+1

and dv

j+1

will increase each by 1. Also the

cumulative count of dv

(respectively dv

) decreases

by 1, yet the cumulative count of dv

i+1

(respectively

j+1

) stays unchanged, as shown in Figure 4.

If the edge e is added between two nodes both

having the same degree dv

, then the count of dv

will decrease by 2 and dv

i+1

will increase by 2.

Also the cumulative count of dv

decreases by 2, yet

the cumulative count of dv

i+1

stays unchanged, as

shown in Figure 5. Taking both cases into account,

the global sensitivity of complete histogram query is

S( f

complete

attr

) = 4, and the global sensitivity of a

cumulative histogram query is S( f

cumulative

attr

) = 2.

Under P

full

, two graphs G

and G

are neighbors

if they differ by the ego network of one vertex. In the

worst case, the vertex passes from degree 0 to degree

n − 1, where n is the number of vertices in the graph.

All the other vertices have their degrees shifted by +1.

In total, 2n bins are affected and the sensitivity is 2n.

In cumulative histogram query, in a worst case sce-

nario, n vertices change their degrees and move from

one bin to another, however the receptive bin does not

change its count. The sensitivity is n.

VIP Blowﬁsh Privacy in Communication Graphs

463

3.2 Example 2: Histogram of Degrees of

Vertices for Standard Nodes

In this exercise, we divide the communication graph

vertices into two groups: VIP nodes and standard

nodes. The discriminative secret graph for a VIP node

is built as follows: There is no edge between two

tuples if they differ by more than one attribute (i.e.

attr

). In addition, we consider only attributes that

belong to a VIP vertex. Two tuples differing by an at-

tribute corresponding to a standard node are not con-

nected in the discriminative secret graph. We denote

this set of secret pairs: S

attr,VIP

pairs

Consider the following query: ”Histogram of de-

grees of vertices for standard nodes”. To compute

their sensitivity we examine the three cases: (a) the

edge we add/remove is between two VIP nodes: noth-

ing will change in the histogram of the query; (b) the

edge we add/remove is between one VIP node and

one standard node: one of the bins in the histogram

will decrease by 1 and its right-hand neighbor will in-

crease by 1; (c) the edge we add/remove is between

two standard nodes. This edge does not correspond to

a secret pair. It means that this case will not occur for

two neighbor graphs and can therefore be ignored.

It follows that the sensitivity of this query under

attr,VIP

pairs

is only 2. The sensitivity is reduced by 50%

in comparison to the full histogram query. By limiting

the privacy focus to the VIP nodes, we gain in terms

of utility for queries over the standard nodes.

3.3 Example 3: Histogram of the

Number of Connections between

VIP Nodes and Standard Nodes

Bob

Alice

Eve

Carol

degree 0 1 2 3 0 1 2 3

count 1 2 1 0 0 3 0 1

cumul-

ative count

1 3 4 4 0 3 3 4

Figure 4: Counts and cumulative counts of node degree for

graphs G

and G

(with dashed red edge), the added edge

e connects two nodes of different degrees.

A similar query is the ”Histogram of the number of

connections between a VIP node and standard nodes”

or ”Histogram of the number of connections between

a standard node and VIP nodes” . To compute their

sensitivity we examine the three cases:

(a) the edge we add/remove is between two VIP nodes:

nothing will change in the query’s result,

(b) the edge we add/remove is between one VIP node and

Bob

Alice

Eve

Carol

degree 2 3 2 3

count 2 2 0 4

cumulative

count

2 4 0 4

Figure 5: Counts and cumulative counts of node degree for

graphs G

and G

(with dashed red edge), the added edge

e connects two nodes of the same degree.

one standard node: two of the bins in the histogram

will vary by ±1,

nodes. This edge does not correspond to a secret pair

and does not change the query result in the same time.

The sensitivity of these queries under S

attr,VIP

pairs

is 2.

These queries are useful in a graph where the stan-

dard nodes are the members of a company’s support

team and the VIP nodes are the customers. The calls

between a support team member and the customers

are the target. We aim to study, for example, if a load

balancing strategy works well, or how many clients a

support member is serving in average. At the same

time, we are protecting the privacy of the customers.

4 EXPERIMENTS AND RESULTS

In this section, we present the results of our exper-

iments to evaluate the BP on graphs both in terms

of utility and privacy. We compare G

attr

and G

full

for histogram queries. In addition, we show the

utility of some queries under G

attr,VIP

. Our exper-

iments use three graph datasets collected from the

music streaming service Deezer Rozemberczki et al.

(2019). These datasets represent the friendship Net-

work of users from Croatia (HR) of 54,573 nodes and

498,202 edges, Hungary (HU) of 47,538 nodes and

222,887 edges and Romania (RO) of 41,773 nodes

and 125,826 edges.

4.1 MSE of Complete Histogram

The Mean Squared Error (MSE) for complete his-

togram queries under G

attr

and G

full

are:

Lap

attr

(D) = bE[Laplace(4/ε)

] = 32b/ε

Lap

full

(D) = bE[Laplace(2n/ε)

] = 8n

b/ε

where n is the number of vertices and b is the number

of bins. We empirically sample the MSE of the com-

plete histogram query for a given ε by generating the

real histogram of each graph (HR, HU, and RO) and

k noisy versions. We compute the MSE of the noisy

SECRYPT 2020 - 17th International Conference on Security and Cryptography

464

(a) Full Policy

Complete Histogram.

(b) Attribute Policy

Complete Histogram.

Figure 6: MSE of complete histograms queries under Full

and Attribute Policies (k = 10).

versions as follows:

MSE(H,H

) =

∑

i=1

mean

[(H(i) − H

(i))

]

where H is the original histogram and H

is its

ε-noisy version. We choose ten epsilon values:

0.1, 0.2, ..., 1. The number of bins b is equal to

421, 113 and 113 for the graphs HR, HU and RO re-

spectively. The results are shown in Figure 6a under

full

and in Figure 6b under G

attr

. G

full

has null util-

ity whereas for G

attr

we expect the standard deviation

per bin to be around 32 to 56 (depending on ε). Using

coarser bins may reduce this error by decreasing b.

4.2 MSE of Cumulative Histogram

MSE for cumulative histogram under G

attr

and G

full

Lap

attr

(D) = bE(Laplace(2/ε))

= 8b/ε

Lap

full

(D) = bE(Laplace(n/ε))

= 2n

b/ε

The difference in MSE under G

full

and G

attr

is also

clear for cumulative histogram queries as shown in

Figure 7a and Figure 7b.

4.3 Simulating Sensitivity Results

In this experiment, we sample neighbors of our input

graphs and compute the difference in the values of the

histogram queries. We compare the obtained values

with our derived sensitivity formulas. We sample the

sensitivity values for our input graphs as follows:

(a) take an input graph, and compute its histogram H,

(b) take a vertex at random,

full

, change the value of only one edge for G

attr

(d) compute the histogram query for the obtained graph

, and compute the L1-norm ||H − H

(e) repeat starting at (c),

(a) Full Policy

Cumulative Histogram.

(b) Attribute Policy

Cumulative Histogram.

Figure 7: MSE of cumulative histograms queries under Full

and Attribute Policies (k = 10).

(f) take max ||H −H

and compare to the corresponding

sensitivity: 2n for G

full

and just 4 for G

attr

(g) repeat starting at (a),

Different strategies can be adopted to change the ego

network under G

full

• Take-out: The chosen vertex is taken out by removing

all its connections similarly to node-based DP.

• Random Ego Network: The chosen vertex samples

a Bernoulli distribution with predeﬁned probability p

(e.g. 0.5) to decide on linking to each node in the graph.

• Flipped Ego Network: The chosen vertex deletes all its

neighbors and connects to all non-neighbors.

We apply the above strategies to the complete his-

togram queries and vertices of different degrees. The

results are shown in Figure 8a. In addition we show

the derived sensitivity formula and the worst case

take-out difference (2 × (deg

+ 1)). It is clear that

full

is unreasonably pessimistic about the sensitiv-

ity of complete histograms queries. We can gain

more utility by deriving more speciﬁc secret graphs

(based on the strategy of ego network manipulation)

or adding constraints and auxiliary knowledge to the

policy. For instance, it is unreasonable that a single

node would be connected to all the nodes in the graph.

Similar experiments for the cumulative histogram

queries are shown in 8b. In contrast, random and

ﬂipped ego network values are very close to the de-

rived sensitivity formula.

4.4 Extrapolation of Queries under

attr

We have shown that limiting privacy to some VIP

nodes provides utility for queries such as ”Histogram

of degrees of vertices for standard nodes” and ”His-

togram of the number of connections between a VIP

node and standard nodes”. In this experiment, we

show that these queries can be exploited to esti-

mate information about the complete graph. For in-

stance, the histogram of degrees of vertices for stan-

VIP Blowﬁsh Privacy in Communication Graphs

465

(a) Complete Histogram. (b) Cumulative Histogram.

Figure 8: Simulation of the queries under Full Policy.

Figure 9: MSE of extrapolation for complete histogram

queries under VIP/Standard partition.

dard nodes can be extrapolated to estimate the his-

togram of degrees of vertices for all the nodes. The

histogram of the number of connections between a

VIP node and standard nodes can be extrapolated to

estimate the histogram of degrees of VIP nodes, and

so on. The histogram of degrees of vertices for stan-

dard nodes can be extrapolated as follows:

all nodes

(i) = H(i)

standard

100

100−%

V IP

The MSE of the extrapolated histogram query com-

pared to the complete histogram query is shown in

Figure 9. Naturally, the error depends on the ratio of

the number of VIP nodes to the total number of ver-

tices in the graph.

5 RELATED WORK

The authors of this paper are recently conducting re-

search in secure outsourcing of computations Nassar

et al. (2013, 2016) and privacy preserving applica-

tions Barakat et al. (2016); Chicha et al. (2018). To

our knowledge, no previous work concerning BP to-

wards graph datasets has been considered in the liter-

ature. However, many DP mechanisms were prosped

for graphs. In Hay et al. (2009), Hay et al. divide

these mechanisms into two types: edge-DP and node-

DP.

Usually, a node in a graph represents a person

while an edge represents a connection between two

persons. The purpose of edge-DP Blocki et al. (2012)

is to prevent the usage of these connections for reveal-

ing the identity of a person. On the other hand, node-

DP achieve similar data protection by blurring node

appearance in the graph. Node-DP is much more sen-

sitivity than edge-DP, which is usually preferable.

Graph projection is a technique to apply node-DP.

A parameter α is used to transform a graph to be α-

degree-bounded. In Kasiviswanathan et al. (2013), all

the nodes with a higher degree than α are removed,

which causes a much higher number of edges to be

removed than necessary.

In Raskhodnikova and Smith (2015), Raskhod-

nikova et al. use a Lipschitz extension tool and a

generalized exponential mechanism to release an ap-

proximate histogram of degree distribution in a graph

under node-DP with a sensitivity of 6α. Many works

propose to generate noisy degree distributions from

graphs under DP. Generative methods are then used

to create output graphs fulﬁlling noisy input distribu-

tions Day et al. (2016). Qin et al. Qin et al. (2017)

propose LDPGen for decentralized social networks.

It collects neighbor lists of the nodes and reconstructs

the graph in two phases under local edge-DP.

Local and smooth sensitivity achieves less noise

than the global sensitivity Nissim et al. (2007). Fi-

nally, Karma et al. Karwa et al. (2011) presents ef-

ﬁcient algorithms to provide noisy answers to sub

graph counting queries under a relaxed version of

edge-DP. Our work is a departure from previous

works, ﬁrst because it employs the framework of

BP, and second because it allows a per-identity cus-

tomized privacy policy for individuals in the graph.

6 CONCLUSION

In this paper, we have summarized BP, its formal

model and the enhanced privacy-utility trade-off that

it brings with respect to its predecessor, DP. We en-

rolled examples of its application to communication

graph databases and their typical queries. We further

studied the idea of privacy as a service with differ-

entiation among different groups of individuals. We

showed that this relaxation is formally feasible and

proved its utility through the enrollment of several

queries and computing their sensitivity. We work un-

der the settings of binary differentiation (standard vs.

VIP) and binary communication status (0 or 1).

ACKNOWLEDGMENTS

The authors would like to acknowledge the National

Council for Scientiﬁc Research of Lebanon (CNRS-

SECRYPT 2020 - 17th International Conference on Security and Cryptography

466

L) and Univ. Pau & Pays Adour, UPPA - E2S,

LIUPPA for granting a doctoral fellowship to Elie

Chicha. This work is also partially suported by a grant

from the University Research Board of the Ameri-

can University of Beirut (URB-AUB-2019/2020), the

National Council for Scientiﬁc Research of Lebanon,

and the Antonine University.

REFERENCES

S. Barakat, B. A. Bouna, M. Nassar, and C. Guyeux. On

the evaluation of the privacy breach in disassociated

set-valued datasets. arXiv preprint arXiv:1611.08417,

2016.

J. Blocki, A. Blum, A. Datta, and O. Sheffet. The johnson-

lindenstrauss transform itself preserves differential

privacy. In Foundations of Computer Science (FOCS),

2012 IEEE 53rd Annual Symposium on, pages 410–

419. IEEE, 2012.

E. Chicha, B. Al Bouna, M. Nassar, and R. Chbeir. Cloud-

based differentially private image classiﬁcation. Wire-

less Networks, pages 1–8, 2018.

W.-Y. Day, N. Li, and M. Lyu. Publishing graph degree dis-

tribution with node differential privacy. In Proceed-

ings of the 2016 International Conference on Manage-

ment of Data, pages 123–138, 2016.

C. Dwork, F. McSherry, K. Nissim, and A. Smith. Cali-

brating noise to sensitivity in private data analysis. In

Theory of cryptography conference, pages 265–284.

Springer, 2006.

M. Hay, C. Li, G. Miklau, and D. Jensen. Accurate esti-

mation of the degree distribution of private networks.

In Data Mining, 2009. ICDM’09. Ninth IEEE Interna-

tional Conference on, pages 169–178. IEEE, 2009.

X. He, A. Machanavajjhala, and B. Ding. Blowﬁsh pri-

vacy: Tuning privacy-utility trade-offs using policies.

In Proceedings of the 2014 ACM SIGMOD inter-

national conference on Management of data, pages

1447–1458. ACM, 2014.

V. Karwa, S. Raskhodnikova, A. Smith, and G. Yaroslavt-

sev. Private analysis of graph structure. Proceedings

of the VLDB Endowment, 4(11):1146–1157, 2011.

S. P. Kasiviswanathan, K. Nissim, S. Raskhodnikova, and

A. Smith. Analyzing graphs with node differential pri-

vacy. In Theory of Cryptography Conference, pages

457–476. Springer, 2013.

M. Nassar, A. Erradi, F. Sabri, and Q. M. Malluhi. Secure

outsourcing of matrix operations as a service. In 2013

IEEE Sixth International Conference on Cloud Com-

puting, pages 918–925. IEEE, 2013.

M. Nassar, N. Wehbe, and B. Al Bouna. K-nn classi-

ﬁcation under homomorphic encryption: application

on a labeled eigen faces dataset. In 2016 IEEE Intl

Conference on Computational Science and Engineer-

ing (CSE) and IEEE Intl Conference on Embedded

and Ubiquitous Computing (EUC) and 15th Intl Sym-

posium on Distributed Computing and Applications

for Business Engineering (DCABES), pages 546–552.

IEEE, 2016.

K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sen-

sitivity and sampling in private data analysis. In Pro-

ceedings of the thirty-ninth annual ACM symposium

on Theory of computing, pages 75–84. ACM, 2007.

Z. Qin, T. Yu, Y. Yang, I. Khalil, X. Xiao, and K. Ren. Gen-

erating synthetic decentralized social graphs with lo-

cal differential privacy. In Proceedings of the 2017

ACM SIGSAC Conference on Computer and Commu-

nications Security, pages 425–438, 2017.

S. Raskhodnikova and A. Smith. Efﬁcient lipschitz ex-

tensions for high-dimensional graph statistics and

node private degree distributions. arXiv preprint

arXiv:1504.07912, 2015.

B. Rozemberczki, R. Davies, R. Sarkar, and C. Sutton.

Gemsec: Graph embedding with self clustering. In

Proceedings of the 2019 IEEE/ACM International

Conference on Advances in Social Networks Analysis

and Mining 2019, pages 65–72. ACM, 2019.

VIP Blowﬁsh Privacy in Communication Graphs

467