Exploring Differential Privacy in Practice
Davi Grossi Hasuda and Juliana de Melo Bezerra
Computer Science Department, ITA, São José dos Campos, Brazil
Keywords:
Privacy, Differential Privacy, Classification Algorithms, Accuracy, Data Analysis.
Abstract:
Every day an unimaginable amount of data is collected from Internet users. All this data is essential for
designing, improving and suggesting products and services. In this frenzy of capturing data, privacy is often
put at risk. Therefore, there is a need to consider, at the same time, capturing relevant data and preserving the
privacy of each person. Differential Privacy is a method that adds noise to data in a way that preserves privacy. Here
we investigate Differential Privacy in practice, aiming to understand how to apply it and how it can affect data
analysis. We conduct experiments with four classification techniques (Decision Tree, Naïve Bayes,
MLP and SVM), varying the privacy degree in order to analyze their accuracy. Our initial results show that low
noise guarantees high accuracy; larger data size is not always better in the presence of noise; and noise in the
target does not necessarily disrupt accuracy.
1 INTRODUCTION
It is noticeable that data is an important asset of the
globalized world. Every moment, a lot of new data
is being generated by users of the Internet worldwide,
which is actually useful for many companies that in-
vest in storing and processing all the data they can col-
lect (World, 2018). Amazon.com, for instance, devel-
oped its Recommender System by searching for users
with similar interests, and made suggestions based on
this similarity (Smith and Linden, 2017). Generally,
for companies to understand how the user experience
is evolving with the product or service, they have to
collect user data (Havir, 2017). Another important
factor of the globalized world is the fact that some of
this data is used to train neural networks, and very
often this training requires a great amount of data.
Face ID, for example, Apple's system to recognize someone's face and authenticate based on it,
took over 1 billion images to train its neural network
(Apple, 2017b).
In the midst of the frenzy of collecting data,
many times privacy is jeopardized (Buffered.com,
2017). One of the most emblematic cases was the
scandal involving Facebook and Cambridge Analyt-
ica (Granville, 2018), where Facebook provided the
data and Cambridge Analytica used it improperly to
influence the presidential run in the United States.
Another example involved Tanium, a cybersecurity startup, which exposed a client's network without
permission (Winkler, 2017). Concerned about these
privacy scandals, there are some efforts emerging in
order to preserve privacy. GDPR, for instance, is the
General Data Protection Regulation (GDPR, 2018)
from the European Union that rewrites how data shar-
ing must work on the Internet. GDPR describes con-
straints and rules when accessing and sharing user
data. Another effort, which is the focus of our pa-
per, is Differential Privacy (DP) (Dwork, 2006). DP
establishes constraints on algorithms that concentrate
data in a statistical database. Such constraints limit
the privacy impact on individuals whose data is in the
database.
Simple anonymization processes can be very ineffective for assuring privacy. For example, there is the
case of the 2006 Netflix Prize, a competition promoted by Netflix where competitors had to develop an algorithm
to predict ratings from users. For that, Netflix shared a dataset with over 100 million ratings by over
480 thousand users. All the names were removed, and some fake ratings were added. But as shown later,
this was not enough, since a de-anonymization process was possible by comparing the Netflix dataset with an
IMDb dataset (Dwork and Roth, 2014). So, the process that privatizes data must be linkage attack-proof.
Besides, it must not compromise the final result of the machine learning algorithms and statistical studies
where the data is used. It means that, after the privatization process, the data is still expected to be useful
(Abadi et al., 2016). Fortunately, Differential Privacy already takes that into account. Moreover, DP provides a
way of measuring privacy (Dwork and Roth, 2014).
Most of the papers regarding DP focus on theory,
considering definition, foundations and algorithms re-
lated to DP (Dwork, 2006; Dwork and Roth, 2014;
Dwork et al., 2006; Dwork and Rothblum, 2016).
(Jain and Thakurta, 2014) propose a privacy preserv-
ing algorithm for risk minimization. (McSherry and
Talwar, 2007) indicate that participants have limited
effect on the outcome of the DP mechanism. (Minami
et al., 2016) focus on the privacy of Gibbs posteriors
with convex and Lipschitz loss functions. (Mironov,
2017) discusses a new definition for differential privacy. (Foulds et al., 2016) try to bring a practical perspective of DP; however, they focus on the Variational
Bayes family of methods. (Apple, 2017a) present how they determined the most used emoji while preserving users' privacy. We observed that more pragmatic approaches on how to implement and use DP algorithms are missing.
In this paper, we apply Differential Privacy in
practice. There are two main types of privatization:
Online (or adaptive or interactive) and Offline (or
batch or non-interactive) (Dwork and Roth, 2014).
The online type depends on the queries made and the
number of them (which can be limited). The offline
type of privatization does not make assumptions about
the number or type of queries made to the dataset, so
all the data can be stored already privatized. We focus
on offline methods, specifically on the Laplace mech-
anism (Dwork and Roth, 2014). We study the impact
of this DP mechanism on data analysis. Four classification algorithms were considered: Decision Tree,
Naïve Bayes, Multi-Layer Perceptron Classifier (MLP) and Support Vector Machines (SVM).
We are then able to compare the accuracy of each algorithm when using non-privatized data or data with
different degrees of privatization.
This paper is organized as follows. Section 2
briefly presents DP and related methods. Section 3
presents our programming support, methodology, re-
sults and discussions. Section 4 summarizes contri-
butions and outlines future work.
2 BACKGROUND
In this section, we describe the coin method, which
is a simple example of DP. Later, the definition of DP
is presented. We also show an important DP mechanism called the Laplace mechanism, which in turn is
a particular case of the Exponential mechanism (Dwork, 2006; Dwork and Roth, 2014).
2.1 Coin Method
(Warner, 1965) describes a simple DP method. In
this experiment, the goal was to collect data that may be sensitive, so that people might be willing to give a
false answer in order to preserve their privacy. Let's suppose we want to make a survey to know how many
people make use of illegal drugs. It is expected that many people who do use illegal drugs might lie in their
answer. But to get a clear look at the percentage of people that use illegal drugs, we can use the coin
mechanism while preserving people's privacy.
It goes according to Figure 1: when registering
someone’s answer, first a coin is tossed. If the result
of the first toss is Heads, we register the answer the
person gave us (represented by A). On the other hand,
if the result of the first toss is Tails, we toss the coin
again. Being the second result Heads, we register Yes
(the person does use illegal drugs); being Tails, we
register No (the person does not use illegal drugs).
Figure 1: Coin mechanism diagram.
By the end of the experiment, there will be a database
with answers from all the subjects, but it is expected
that 50% (assuming that the coin has a 50% chance of
getting each result) were artificially generated. So, if
we look at the answer of a single person, there will be
no certainty if that was the true answer.
At the same time, if we subtract 25% of the total answers from the Yes count and 25% of the total answers from the No count, we can have a clear view
of the percentage of the population that makes use of illegal drugs. It is possible, concomitantly, to have a
statistically accurate result (assuming that there were enough people involved in the study) and to preserve
everyone's privacy.
2.2 Differential Privacy
The basic structure of a DP method consists of a
mechanism that takes the non-privatized data as input
and outputs the privatized data. DP establishes constraints that this mechanism must conform to, in order
to limit the privacy impact on individuals whose data
is in the dataset.
A mechanism $M$ with domain $\mathbb{N}^{|X|}$ is $\varepsilon$-differentially private if, for all $S \subseteq \text{Range}(M)$ and for all $x, y \in \mathbb{N}^{|X|}$ such that $\|x - y\|_1 \leq 1$:

$$P[M(x) \in S] \leq \exp(\varepsilon) \, P[M(y) \in S]$$

In this definition, $P[E]$ is the probability of a certain event $E$ happening and $\|x\|_1 = \sum_{i=1}^{|X|} |x_i|$.
What this definition does is compare two neighboring datasets $x$ and $y$ (i.e., $\|x - y\|_1 \leq 1$) and require that the probabilities of any privatized output are alike. The mechanism itself is simply adding noise to the data.
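As a sanity check of this definition, consider the coin mechanism of Section 2.1 applied to a single respondent: the two "neighboring" inputs are a true answer of Yes and a true answer of No, and the output probabilities differ by a factor of at most 3, so the mechanism is ln(3)-differentially private. The short sketch below (our own illustration, not taken from the original experiments) verifies this worst-case ratio:

```python
import math

# Fair coins: P[report Yes | true Yes] = 0.5 + 0.5 * 0.5 = 0.75
#             P[report Yes | true No ] = 0.5 * 0.5       = 0.25
p_out = {("Yes", "Yes"): 0.75, ("Yes", "No"): 0.25,
         ("No", "Yes"): 0.25, ("No", "No"): 0.75}   # key: (output, true answer)

# Worst-case ratio P[M(x) = s] / P[M(y) = s] over outputs s and inputs x, y.
worst = max(p_out[(s, x)] / p_out[(s, y)]
            for s in ("Yes", "No") for x in ("Yes", "No") for y in ("Yes", "No"))
print(worst, math.log(worst))   # 3.0 and ~1.0986, i.e. epsilon = ln(3)
```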
2.3 Exponential Mechanism
One of the most commonly used DP mechanisms is the Exponential Mechanism. Let's consider the formal definitions below.
$D$: domain of input datasets
$R$: range of 'noisy' outputs
$\mathbb{R}$: real numbers
Let's define a scoring function $f: D \times R \to \mathbb{R}^k$ that returns a real-valued score for each dimension it wants to evaluate, given an input dataset $x \in D$ and an output $r \in R$. In simple terms, such a score tells us how 'good' the output $r$ is for the input $x$. Given all of that, the Exponential Mechanism $M(x, f, \varepsilon)$ outputs $r$ with probability proportional to $\exp\!\left(\frac{\varepsilon f(x,r)}{2\Delta}\right)$, or simply:

$$P[M(x, f, \varepsilon) = r] \propto \exp\!\left(\frac{\varepsilon f(x, r)}{2\Delta}\right)$$
Here $\Delta$ is the sensitivity of the scoring function. Formally, we can define the sensitivity of a function as follows: for every $x, y \in D$ such that $\|x - y\|_1 = 1$, $\Delta$ is the maximum possible value of $\|f(x) - f(y)\|_1$.
The sensitivity value helps us understand our data and balances the scale of the noise that must be added,
so that it makes sense for the data we are analyzing. Imagine a case where the data we want to add noise to is a
colored image, with RGB values for each pixel ranging from 0 to 255 for each of the three colors. We need
to add noise to each subpixel that can (with no difficulty) reach values from -255 to 255, so that when we
look at the value of a single subpixel, we don't know what value it was initially. But if we apply this exact
same noise to a black and white image, where each pixel can be either 0 (black) or 1 (white), the noise
will be much bigger than the data, and almost all the utility will be lost. For this case, the noise must have
a smaller scale. The sensitivity balances the scale of the noise with the possible values the data can reach.
We are not going to demonstrate here that the Exponential Mechanism is $\varepsilon$-differentially private, since the proof
can be found in the literature and it would deviate from the purpose of this paper.
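The sketch below illustrates the Exponential Mechanism over a finite output range. It is a generic illustration under the definitions above, not code from the authors' lab; the example scoring function (a frequency count, whose sensitivity is 1) is our own choice.

```python
import numpy as np

def exponential_mechanism(x, outputs, score, sensitivity, epsilon, rng=None):
    """Sample one output r with probability proportional to
    exp(epsilon * score(x, r) / (2 * sensitivity))."""
    rng = rng or np.random.default_rng()
    scores = np.array([score(x, r) for r in outputs], dtype=float)
    # Subtracting the maximum score keeps the weights numerically stable
    # without changing the resulting probabilities.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return outputs[rng.choice(len(outputs), p=probs)]

# Example: privately pick the most frequent class label in a tiny dataset.
data = [1, 1, 2, 3, 1, 2]
labels = [1, 2, 3, 4, 5, 6, 7]
count = lambda x, r: x.count(r)   # score = frequency of label r; sensitivity 1
print(exponential_mechanism(data, labels, count, sensitivity=1, epsilon=1.0))
```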
2.4 Laplace Mechanism
The equation that describes the Laplace distribution is:

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\!\left(-\frac{|x - \mu|}{b}\right)$$

If the value of $b$ is increased, for example, the curve becomes less concentrated and more spread. The $\mu$ value is the mean of the distribution. This distribution can be useful in DP for adding noise to the original dataset.
The Laplace Mechanism is simply a type of Exponential Mechanism, which makes it easier to understand. To get to the Laplace Mechanism, we first use the Exponential Mechanism, but with a specific scoring function. Let's consider the following scoring function:

$$f(x, r) = -2\,|x - r|$$

With this scoring function, we are saying that the output $r = x$ is the best possible output. This implies that the format (structure) of the output and the input are the same. If the input is an array of ten zeros, for instance, the best output is the same array of ten zeros. With that, we can define the mechanism as respecting the equation:

$$P[M(x, f, \varepsilon) = r] \propto \exp\!\left(-\frac{\varepsilon\,|x - r|}{\Delta}\right)$$
Using mathematical manipulation, it is possible to obtain another form of the Laplace Mechanism, as follows:

$$M(x, f, \varepsilon) = x + (Y_1, \ldots, Y_k)$$

where the $Y_i$ are independent and identically distributed random variables drawn from $\text{Lap}(\Delta/\varepsilon)$. As this mechanism is simply a kind of Exponential Mechanism, we can say by extension that the Laplace Mechanism is $\varepsilon$-differentially private.
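In code, the Laplace Mechanism reduces to adding i.i.d. Laplace noise with scale $\Delta/\varepsilon$ to each numeric value. Below is a minimal sketch using NumPy; it is our own illustration of the formula above, not the lab's laplacePrivatizer.

```python
import numpy as np

def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Return values + (Y_1, ..., Y_k) with Y_i ~ Lap(sensitivity / epsilon)."""
    rng = rng or np.random.default_rng()
    values = np.asarray(values, dtype=float)
    scale = sensitivity / epsilon                     # b = Delta / epsilon
    return values + rng.laplace(0.0, scale, size=values.shape)

# Example: privatizing one row of RGB-like attributes (sensitivity 255, as in
# the example of Section 2.3) for a strong and a weak privacy level.
row = np.array([12.0, 200.0, 37.0])
print(laplace_mechanism(row, sensitivity=255, epsilon=0.5))   # heavy noise
print(laplace_mechanism(row, sensitivity=255, epsilon=10.0))  # light noise
```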
3 APPLYING DIFFERENTIAL
PRIVACY
In this section, we present the environment to support
programming with DP. We discuss the developed experiments, presenting and analyzing the results.
3.1 Differential Privacy Lab
As we want to get a better sense of how to use and
apply DP mechanisms, the first step was to build a
lab in which we could run all of our experiments.
For that, we developed an open source code that
implements the Laplace Mechanism and is struc-
tured to easily integrate different tests, datasets and
data analysis techniques. The chosen programming
language was Python 3, which is very common in
data analysis. Furthermore, we implemented some
of it inside Jupyter Notebooks, which is an interac-
tive environment suitable for running the experiments
we want to build. The source code is available at
https://github.com/dhasuda/Differential-Privacy-Lab.
The code was built to be easily extended: you can
plug in a new dataset or your own DP mechanism. In
Figure 2, there is a UML diagram representing the architecture. Each Jupyter Notebook has
basically 3 dependencies: DataProvider, Adapter and
Privatizer. The DataProvider is the dataset, with all
the information needed and available to the analysis.
Adapter is simply a class that adapts the format of the
data from DataProvider to the format of the Priva-
tizer.
Privatizer is the class responsible for implement-
ing the DP Mechanism. All privatizers must inherit
from the AbstractPrivatizer abstract class. The ab-
stract class implements the privatize method, that es-
timates the sensitivity of the data (when it is not pro-
vided) and, for each value, adds the noise. The shape
of the noise is implemented inside privatizeSingleAn-
swer. We then implemented the Laplace Mechanism,
which is available in privatizers/laplacePrivatizer.py.
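To illustrate how a new mechanism plugs into this structure, the sketch below implements a hypothetical privatizer. The class and method names (AbstractPrivatizer, privatize, privatizeSingleAnswer) mirror the description above, but the exact signatures are our assumption and may differ from the code in the repository.

```python
import numpy as np

class AbstractPrivatizer:
    """Assumed shape of the lab's base class: privatize() walks the data and
    delegates the noise shape to privatizeSingleAnswer()."""
    def __init__(self, sensitivity, epsilon):
        self.sensitivity = sensitivity
        self.epsilon = epsilon

    def privatize(self, data):
        return [self.privatizeSingleAnswer(value) for value in data]

    def privatizeSingleAnswer(self, value):
        raise NotImplementedError

class MyLaplacePrivatizer(AbstractPrivatizer):
    """Example of a pluggable mechanism: Laplace noise with scale Delta/epsilon."""
    def privatizeSingleAnswer(self, value):
        return value + np.random.laplace(0.0, self.sensitivity / self.epsilon)

print(MyLaplacePrivatizer(sensitivity=1.0, epsilon=2.0).privatize([0.0, 1.0, 2.0]))
```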
Figure 2: DP lab architecture.
Tests are really important in any code development; that is why there are tests for most of the .py files. The
script that runs all of the tests is runAllTests.sh. Every time you change something in the existing
code, make sure all the tests pass.
3.2 Methodology and Results
The dataset for the experiments is fetch_covtype, a tree-cover type dataset
(the predominant type of tree cover) available in scikit-learn, an open source, simple and efficient
set of tools for data analysis. This dataset contains 581,012 samples, with a dimensionality of 54 and 7
possible classes. For the initial investigation, we add noise to both attributes and target (classes).
We implement the analysis of the data for the following classification techniques: Decision Tree, Naïve
Bayes, MLP and SVM. We use the libraries avail-
able in scikit-learn. Inside the notebooks (one for
each technique), there is a section where we adjust
the size of the training data. We can define an array
with all training data sizes to test, and for each size of
choice, there is a random selection of samples from
the database. All the samples that are not used in the
training are then considered in the testing, in order to
measure the accuracy of the algorithm.
First, the model is trained using the raw data. After that, the model is trained multiple times on data privatized by the Laplace Mechanism, which
is ε-differentially private, using different values of ε. All the values of ε are defined in an array as well. After training the model
with the noisy data for the different values of ε, all the accuracy results are printed in graphs. Here we
show the results for a fixed size of training data. We use 1,000 samples for the Decision Tree, SVM and MLP
techniques. We use 100 samples for training in the Naïve Bayes algorithm, due to limited processing time. The
varying value in this experiment is then the ε value.
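The sketch below reproduces the spirit of this experiment loop for the Decision Tree case: sample a training set from fetch_covtype, train once on raw data, then retrain on attribute data privatized with Laplace noise for several ε values, and compare test accuracies. It is a simplified illustration (for brevity it privatizes only the attributes and estimates sensitivity as the per-column value range), so the actual notebooks may differ in detail.

```python
import numpy as np
from sklearn.datasets import fetch_covtype
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = fetch_covtype(return_X_y=True)
rng = np.random.default_rng(0)
train_idx = rng.choice(len(X), size=1_000, replace=False)   # random training samples
test_mask = np.ones(len(X), dtype=bool)
test_mask[train_idx] = False                                 # everything else is test data

def accuracy_for(X_train):
    model = DecisionTreeClassifier().fit(X_train, y[train_idx])
    return accuracy_score(y[test_mask], model.predict(X[test_mask]))

baseline = accuracy_for(X[train_idx])                        # model trained on raw data
sensitivity = X[train_idx].max(axis=0) - X[train_idx].min(axis=0) + 1e-9  # per-column estimate
for eps in [0.1, 1.0, 10.0]:
    noisy = X[train_idx] + rng.laplace(0.0, sensitivity / eps, X[train_idx].shape)
    print(f"eps={eps}: raw accuracy {baseline:.3f}, privatized accuracy {accuracy_for(noisy):.3f}")
```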
For the Decision Tree algorithm, results are in Figure 3. We can observe that the bigger the ε value
(which means less privacy), the closer the accuracy of the model is to that of the model trained on
non-privatized data. It is possible to see the convergence of values. Besides, for very small values of ε the
accuracy is less consistent. For the Naïve Bayes algorithm, presented in Figure 4, the behavior is very similar to
the Decision Tree experiment, with the difference of a faster convergence of values when decreasing the
privacy level (i.e. increasing the ε value).
According to Figure 5, the SVM experiment
shows the same convergence pattern, but with a
slower convergence compared to the previous experi-
ments and also a more stable response for very small
values of ε. The MLP experiment (presented in Figure
Figure 3: Accuracy of Decision Tree algorithm.
Figure 4: Accuracy of Naïve Bayes algorithm.
Figure 5: Accuracy of SVM algorithm.
Figure 6: Accuracy of MLP algorithm.
6) is the one with the least stable results, and also the one
with the biggest difference in accuracy between the models trained with privatized and non-privatized input data.
But it is still possible to recognize a convergence pattern, even though it is not as uniform as in the previous
experiments. In all experiments there was a common pattern, as expected: less privacy means closer results
between using privatized and non-privatized data.
3.3 Analysis
We already know that privacy in differentially private
algorithms can be measured by the ε value. But what values of ε in an ε-differentially private mechanism are
good and really preserve privacy? How can we understand the impact of the ε value? Thinking about this
question, we propose a more intuitive way of understanding such a value, which we call the D Coefficient. The D
Coefficient is based on the Coin Mechanism (Warner, 1965), described in Section 2.1.
The Coin Mechanism considers that there are two
possible responses (for instance, Yes or No) of a per-
son to a question. When registering someone's answer, a first coin is tossed. If the result is Heads, the
answer the person gave is registered. On the other hand, if the result is Tails, the coin is tossed again. Being
the second result Heads, Yes is registered; being Tails, the response is then considered as No.
To define the D Coefficient, we modified the first coin toss of the Coin Mechanism. Instead of a 50% chance of getting Heads, the first toss now has a probability D of getting Heads. In other words, the probability of saving the true answer A (and not generating it artificially) is D, as illustrated in Figure 7. This probability D is itself the D Coefficient, which can be calculated, based on the ε value we want to achieve, as:

$$D(\varepsilon) = \frac{\exp(\varepsilon) - 1}{\exp(\varepsilon) + 1}$$
D Coefficient represents the chance of saving the orig-
inal answer, if a Coin Mechanism with the same pri-
vacy level was used. The entire demonstration of this
formula is out of the scope of this paper. Considering
the data from the four classification algorithms previ-
ously presented, we calculated the value of ε*. We
define ε* as the minimal ε that gives us less than 10%
difference between accuracy without privacy and ac-
curacy with privacy. We chose 10% in order to have
low interference of DP in the analysis, which means
that it would be possible to achieve similar findings
using privatized data.
We then calculate D(ε*). For Naïve Bayes, we
found ε* = 4 and D(ε*) = 0.964028. For the other
classification algorithms, we found ε* = 10 and D(ε*)
= 0.999909. We observed that D value was very close
to 1, which is not a good finding. It means that if we
use ε* as the privacy level in the Coin Mechanism, the
chances of the first coin toss outputting Heads would
be incredibly high (over 95% for all the experiments).
So, the data would be not privatized as expected, since
the majority of records would keep the original data.
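A small helper makes the D Coefficient easy to compute; the values below reproduce the ones quoted above for ε* = 4 and ε* = 10 (our own illustration of the D(ε) formula, not code from the lab).

```python
import math

def d_coefficient(epsilon: float) -> float:
    """Probability of keeping the true answer in an equivalent coin mechanism."""
    return (math.exp(epsilon) - 1.0) / (math.exp(epsilon) + 1.0)

print(round(d_coefficient(4), 6))    # 0.964028 (Naive Bayes, eps* = 4)
print(round(d_coefficient(10), 6))   # 0.999909 (other algorithms, eps* = 10)
```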
Figure 7: Coin mechanism diagram with variable coin prob-
ability D in the first toss.
3.4 More Investigations
One of the parameters that was kept constant in the
last experiment was the training data size, i.e. how
many samples we used to train each model. For each
classification technique, we did not vary the amount of data used
in the training. Now we investigate whether more samples
can give us better accuracy when dealing with DP.
Given the results from the D Coefficient, here we use values of ε between 1 and 3 (which are privacy values
of interest to us).
We calculate the relative accuracy for different
training data sizes and different ε values. The relative
accuracy is defined by dividing the accuracy of the al-
gorithm that used privatized data by the accuracy of
the algorithm trained with the raw data. This mea-
surement (relative accuracy) was chosen because we
expect the accuracy of the model with non-privatized
data to increase as the training data increases in size.
We ran each experiment five times, and the output is
the average of the obtained values.
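The relative-accuracy measurement itself is simple to express; a minimal sketch of the averaging procedure, assuming a hypothetical run_experiment() helper that returns the accuracies obtained with privatized and raw training data:

```python
import numpy as np

def relative_accuracy(run_experiment, epsilon, n_samples, rounds=5):
    """Average, over several rounds, of accuracy(privatized) / accuracy(raw).
    run_experiment(epsilon, n_samples) is a hypothetical helper returning the
    pair (accuracy_with_privatized_data, accuracy_with_raw_data)."""
    ratios = []
    for _ in range(rounds):
        acc_private, acc_raw = run_experiment(epsilon, n_samples)
        ratios.append(acc_private / acc_raw)
    return float(np.mean(ratios))
```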
Figure 8 shows the relative accuracy for the Deci-
sion Tree algorithm, where N represents the number
of samples in the training dataset. We found no im-
provement in the relative accuracy when increasing
the size of the input data. Actually, for the biggest
input size we got the worst results. The Naïve Bayes
algorithm, presented in Figure 9, does not follow the
behavior of the Decision Tree when comparing the
relative accuracy. Here we have a more optimistic re-
sult: the best results are the ones with the biggest in-
put sizes. For the MLP algorithm, shown in Figure 10,
we observed an unexpected behavior, where it is pos-
sible to highlight that the worst result came from the
biggest training dataset. Results of SVM algorithm,
presented in Figure 11, also indicate that the bigger
the input size, the worse is the relative accuracy.
In the chosen database, there are seven possible
classifications of the predominant type of tree cover (integer values from 1 to 7). The noise, on the other
hand, is a real number drawn from a random variable (derived from the Laplace mechanism). In the first
experiment, we added noise (related to the ε value) also to the classification number, and then rounded the
resulting number to match one of the possible classifications. Here we investigate the application of noise
only to the attributes and not to the target (classes). We then compare three values here: accuracy with
no privatization, accuracy with full privatization and accuracy with 'semi-privatization' (no noise in the
target). We keep the training dataset size fixed.
We made five rounds to get each value, and the results
present the average value of all these rounds. For the Decision Tree algorithm presented in Figure 12 (for
N = 10,000), both lines where data is somehow privatized are close to each other, suggesting that keeping
the original values for the target data does not add utility to the privatized data. A similar, but even more
optimistic, pattern emerges in the Naïve Bayes experiment, shown in Figure 13 (for N = 100).
The fully privatized data gets better results compared to the semi-privatized data in most points of the chart.
The MLP algorithm, shown in Figure 14 (for N = 10,000), has points with better results for the fully
privatized data compared to the semi-privatized data. For the SVM algorithm presented in Figure 15 (for N
= 1,000), the results are very close to those of the Decision Tree, with not much gain in utility from removing the
noise from the target. Similar results were found for all the analyzed classification algorithms, which means
that removing the noise from the target data does not translate to a gain in overall data utility. It is important
to point out that not privatizing the target data does decrease the level of privacy of the data. It makes it easier,
for instance, to implement a linkage attack.
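The difference between the two privatization modes can be sketched as follows: full privatization adds Laplace noise to the attributes and to the target (rounding the noisy target back to a valid class), while 'semi-privatization' leaves the target untouched. This is our own illustration under simple assumptions (per-column range as attribute sensitivity, number of classes minus one as target sensitivity), not the lab's exact code.

```python
import numpy as np

def privatize_attributes(X, epsilon, rng):
    sensitivity = X.max(axis=0) - X.min(axis=0) + 1e-9     # simple per-column estimate
    return X + rng.laplace(0.0, sensitivity / epsilon, X.shape)

def privatize_target(y, epsilon, rng, n_classes=7):
    noisy = y + rng.laplace(0.0, (n_classes - 1) / epsilon, y.shape)
    return np.clip(np.rint(noisy), 1, n_classes).astype(int)  # round back to a class 1..7

rng = np.random.default_rng(0)
X = rng.random((100, 54))
y = rng.integers(1, 8, size=100)                      # dummy labels in 1..7
X_priv = privatize_attributes(X, epsilon=2.0, rng=rng)
y_full = privatize_target(y, epsilon=2.0, rng=rng)    # full privatization
y_semi = y                                            # semi-privatization keeps the target
```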
Figure 8: Relative accuracy for Decision Tree algorithm by
varying dataset size.
4 CONCLUSIONS
We studied how the theory of Differential Privacy re-
lates to its application, by analysing the impact of DP
mechanisms in the utility of the data for different data
analysis algorithms. In fact, we used four classification techniques: Decision Tree, Naïve Bayes, SVM
and MLP. By utility we mean the possibility of ex-
Figure 9: Relative accuracy for Naïve Bayes algorithm by
varying dataset size.
Figure 10: Relative accuracy for MLP algorithm by varying
dataset size.
Figure 11: Relative accuracy for SVM algorithm by varying
dataset size.
tracting good results when using privatized data in-
stead of non-privatized data. In other words, we were
interested in knowing whether the classification techniques could
keep their accuracy in the presence of data privatized
by a DP mechanism.
The path to understand all of these impacts in-
cluded the development of a Differential Privacy Lab.
We designed it with extensibility in mind. Every part
is modular and can be easily replaced. With that decision, we aimed to make a product flexible enough to fit into the workflow of anyone starting to
develop a Differential Privacy solution. We demonstrated
the capabilities of this lab with the experiments we
ran on it and with the conclusions we drew using
this tool.
During the analysis of the experiment results,
while thinking about the level of privacy, the D Co-
efficient emerged as a more intuitive way of under-
standing the level of privacy of a DP mechanism. We
Figure 12: Accuracy of Decision Tree algorithm consider-
ing distinct application of noise.
Figure 13: Accuracy of Naïve Bayes algorithm considering
distinct application of noise.
Figure 14: Accuracy of MLP algorithm considering distinct
application of noise.
Figure 15: Accuracy of SVM algorithm considering distinct
application of noise.
found that even small amounts of noise can lead to a big
drop in utility. This leads to a concern regarding data
privatization: even with the application
of DP mechanisms, we cannot always say that the data is both
properly privatized and maintaining its utility.
Besides, we found that increasing the number of
samples in the training dataset does not always improve accuracy. When removing the noise from the
target data, we observed that there was no significant
gain in utility. In fact, privacy is somewhat damaged when
data is not fully privatized. Therefore, we encourage
adding noise even to the target data. Of course, more
experiments need to be conducted to confirm our initial findings.
We believe that the presented experiments, as well
as the developed lab, are a sound basis for understanding
DP and applying it in practice. As future work, we intend to design new experiments considering different
datasets. The impact of privatization on techniques other than classification ones can also be studied.
It is also possible to make other types of exploration,
such as combining different DP mechanisms for distinct parts of the data.
REFERENCES
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B.,
Mironov, I., Talwar, K., and Zhang, L. (2016). Deep
learning with differential privacy. In ACM SIGSAC
Conference on Computer and Communications Secu-
rity.
Apple (2017a). Learning with privacy at scale. [Online;
accessed 05-Oct-2019].
Apple (2017b). An on-device deep neural network for face
detection. [Online; accessed 11-June-2019].
Buffered.com (2017). The biggest privacy & security scan-
dals of 2017. [Online; accessed 11-June-2019].
Dwork, C. (2006). Differential privacy. In International
Colloquium on Automata, Languages, and Program-
ming (ICALP).
Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In
Theory of Cryptography Conference (TCC).
Dwork, C. and Roth, A. (2014). The algorithmic founda-
tions of differential privacy. Foundations and Trends
in Theoretical Computer Science, 9(3-4):211–407.
Dwork, C. and Rothblum, G. N. (2016). Concentrated dif-
ferential privacy.
Foulds, J., Geumlek, J., Welling, M., and Chaudhuri,
K. (2016). On the theory and practice of privacy-
preserving bayesian data analysis. In Thirty-Second
Conference on Uncertainty in Artificial Intelligence
(UAI).
GDPR (2018). General data protection regulation (gdpr).
Technical report, Official Journal of the European
Union. [Online; accessed 11-June-2019].
Granville, K. (2018). Facebook and cambridge analytica:
What you need to know as fallout widens. [Online;
accessed 11-June-2019].
Havir, D. (2017). A comparison of the approaches to cus-
tomer experience analysis. Economics and Business
Journal, 31(1):82–93.
Jain, P. and Thakurta, A. (2014). (near) dimension indepen-
dent risk bounds for differentially private learning. In
31st International Conference on International Con-
ference on Machine Learning (ICML).
McSherry, F. and Talwar, K. (2007). Mechanism design via
differential privacy. In 48th Annual IEEE Symposium
on Foundations of Computer Science (FOCS).
Minami, K., Arai, H., Sato, I., and Nakagawa, H. (2016).
Differential privacy without sensitivity. In 30th Inter-
national Conference on Neural Information Process-
ing Systems (NIPS).
Mironov, I. (2017). Rényi differential privacy. In IEEE 30th
Computer Security Foundations Symposium (CSF).
Smith, B. and Linden, G. (2017). Two decades of recom-
mender systems at amazon.com. IEEE Internet Com-
puting, 21(3):12–18.
Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of
the American Statistical Association, 60(309):63–69.
Winkler, R. (2017). Cybersecurity startup tanium exposed
california hospital’s network in demos without per-
mission. [Online; accessed 11-June-2019].
World, D. I. (2018). How much data is generated per
minute? the answer will blow your mind away. [On-
line; accessed 11-June-2019].