The Use of De-identification Methods for Secure and Privacy-enhancing
Big Data Analytics in Cloud Environments
Gloria Bondel, Gonzalo Munilla Garrido, Kevin Baumer and Florian Matthes
Chair for Software Engineering for Business Information Systems, Faculty of Informatics, Technical University of Munich,
Boltzmannstrasse 3, Garching, Germany
Keywords:
Security, Privacy, Big Data Analytics, Cloud Environments.
Abstract:
Big data analytics are interlinked with distributed processing frameworks and distributed database systems,
which often make use of cloud computing services providing the necessary infrastructure. However, storing
sensitive data in public clouds leads to security and privacy issues, since the cloud service presents a central
point of attack for external adversaries as well as for administrators and other parties which could obtain neces-
sary privileges from the cloud service provider. To enable data security and privacy in such a setting, we argue
that solutions using de-identification methods are most suitable. Thus, this position paper presents the starting
point for our future work aiming at the development of a privacy-preserving tool based on de-identification
methods to meet security and privacy requirements while simultaneously enabling data processing.
1 PROBLEM STATEMENT
Smartphones, social media, and the Internet of Things (IoT) are just some of the technical developments that have led to the digitalization of our day-to-day lives.
The utilization of these new technologies results in
the creation of a vast amount of data, so-called big
data, which is collected by the organizations provid-
ing digital services and products. To generate use-
ful information from the collected data, organizations
use different analytics approaches. These analytics
approaches enable the testing of hypotheses and the
identification of patterns, ultimately allowing orga-
nizations to design better products and services, en-
hance the user experience, or optimize internal pro-
cesses.
Organizations often use traditional business in-
telligence approaches to analyze data. Business in-
telligence approaches transform data into a prede-
fined structure and store it on a central server (i.e.,
a data warehouse) for future processing. However,
for big data, these business intelligence approaches
are no longer applicable due to the properties of big data: it is available in large quantities, requires quasi-real-time processing, and comprises many different types of structured and unstructured data (Laney, 2001). Instead, approaches for big data
analytics based on distributed data processing have
emerged. Distributed processing frameworks enable
faster data processing by storing data in a distributed
file system and moving the processing activities into
the data nodes, thereby enabling parallelization.
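To make this programming model concrete, the following minimal sketch uses PySpark in local mode (the framework choice and the tiny in-memory dataset are our own illustrative assumptions); each partition is processed where it resides, and only the aggregated result is collected.

    from pyspark.sql import SparkSession

    # Local mode stands in for a cluster in which each executor works on the
    # data partitions stored on its own node.
    spark = SparkSession.builder.master("local[4]").appName("wordcount-sketch").getOrCreate()
    lines = spark.sparkContext.parallelize(
        ["big data", "big cloud", "cloud data"], numSlices=4)

    counts = (lines.flatMap(lambda line: line.split())   # map phase, runs on the partitions
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))     # shuffle and reduce phase

    print(counts.collect())  # e.g. [('big', 2), ('data', 2), ('cloud', 2)]
    spark.stop()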
Distributed data processing is storage intensive,
making a large computational infrastructure neces-
sary. Therefore, big data analytics and cloud comput-
ing, offering efficient management and reduced cost
of IT infrastructures, are often associated (Hashem
et al., 2015; Liu et al., 2015; Ma et al., 2013). Nowadays, cloud providers even offer special products for distributed big data processing, e.g., Cloudera (https://www.cloudera.com/), Amazon EMR (https://aws.amazon.com/de/emr/), Microsoft Azure HDInsight (https://azure.microsoft.com/de-de/services/hdinsight/), or IBM BigInsights (https://www.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.product.doc/doc/c0057605.html).
However, big data processing combined with
cloud computing leads to security and privacy con-
cerns (Liu et al., 2015; Stergiou and Psannis, 2017).
Even if the data is encrypted while being transferred
over a network, the data needs to be stored in the
cloud in plaintext to enable data processing. This
leads to two major security issues. First of all, the
cloud presents a central point of attack for external
attackers. For example, a hacker could manage to
penetrate the cloud and gain access to sensitive data.
The second security concern originates from within
the cloud service provider’s (CSP) organization. Ad-
ministrators of the CSP have certain access privileges to the data stored in the cloud, which are necessary to perform maintenance activities as well as to prevent misuse of provided resources (Li et al., 2013). However, these administrative privileges can be abused for personal benefit, as shown by an incident in 2010 in which Google had to fire an engineer who had broken into the Gmail and Google Voice accounts of several children (Krazit, 2010). Furthermore, the cloud provider
could give access to cloud resources to third parties,
e.g., to government entities for reasons of legal pros-
ecution.
Overall, the risks of cloud computing inhibit the use of big data analytics (Liu et al., 2015; Li et al., 2013), which prevents organizations from realizing its full potential. Therefore, several approaches that address security and privacy concerns in the context of big data analytics in cloud environments exist. However, these
approaches reduce processing performance, are often
described in a very abstract manner, do not contain
precise instructions for action, or are outdated. Thus,
it is difficult for organizations to get an overview of
current approaches, their advantages, and their disad-
vantages.
Therefore, a persisting problem in big data analyt-
ics can be formulated as follows:
How can data be protected in big data cloud environ-
ments while enabling a maximum of processing func-
tionality and minimizing performance constraints as
well as utility loss?
In this context, the conflicting goals of security and
privacy, on the one hand, and preserving the util-
ity of data to enable analytics, on the other hand,
should be evaluated (Tomashchuk et al., 2019; Is-
ley, 2018). As a first step to address this prob-
lem, we identified five approaches for secure big data
analytics in cloud environments. These approaches are homomorphic encryption, partial encryption in combination with trusted hardware, partial encryption in combination with a trusted client/hybrid cloud, de-identification, and privacy-preserving cloud architectures. After analyzing the advantages and disadvantages of these approaches, we argue that de-identification is a promising approach since it enables a multitude of analysis functionalities while simultaneously realizing security and privacy objectives. This assumption is additionally confirmed by the emergence of several commercial tools that aim at implementing de-identification for data processing in cloud environments.
However, to the best of the authors’ knowl-
edge, a holistic view of the impact of different de-
identification methods on the trade-off between secu-
rity and privacy versus data analytics capabilities does
not exist. Therefore, in our future work, we aim at an-
alyzing this trade-off on the use cases of data gener-
ated by wearables and vehicles. Furthermore, we aim
at developing a privacy-enhancing tool for the easy
application of different de-identification methods.
In the following, we will first present a short def-
inition and delineation of the terms security and pri-
vacy in section 2. This is followed by an overview
of existing approaches to secure big data analytics
in cloud environments including their advantages and
disadvantages in section 3. This section also includes a short rationale for why we consider de-identification methods the most suitable approach. Afterwards, in section 4, we present existing approaches using de-identification methods from research and practice. Finally, section 5 concludes the paper.
2 FOUNDATIONS
In this section we will introduce the concepts of secu-
rity and privacy as well as their interrelation.
We adapt the definition of security provided by (Fink et al., 2018), who define security as ”[...] a set
of measures to ensure that a system will be able to ac-
complish its goal as intended, while mitigating unin-
tended negative consequences”. Thus, security aims
to prevent vulnerabilities of software and hardware,
making it resilient against malicious attacks, natural
disasters, unplanned disruptions, and the unintended
use of computational resources (Hurlburt et al., 2009).
In general, privacy can be defined as ”[...] free-
dom from observation, disturbance, or unwanted pub-
lic attention [...]” (Fink et al., 2018). However, to
make the term privacy more actionable in the con-
text of computer science, the threat-based definition
of privacy provided by (Wu, 2012) is adopted: ”[Pri-
vacy] is defined not by what it is, but by what it is
not - it is the absence of a privacy breach that defines
a state of privacy”. Thus, privacy is about identify-
ing and characterizing relevant privacy threats as well
as protecting information against these threats (Wu,
2012; Solove, 2015).
Different views on the relationship between the
terms security and privacy exist (Hurlburt et al.,
2009). In most cases, security and privacy are inter-
preted as overlapping concepts. The overlapping area
between security and privacy is often referred to as in-
formation security. Information security aims at pro-
tecting different kinds of information and data from
destructive forces and unwanted actions (Mukherjee
et al., 2015). The three principles of confidentiality, integrity, and availability are known as the CIA triad
that characterizes information security (Fink et al.,
2018; Domingo-Ferrer et al., 2019). These princi-
ples do not only support and shape the theoretical un-
derstanding of information security, but they are also
often used as a basis for defining privacy rules and
for protecting electronic health information (Samonas
and Coss, 2014).
However, in other cases, privacy is interpreted as
an aspect of security (Hurlburt et al., 2009). This re-
sults from the observation that some security methods
have a direct effect on privacy (Fink et al., 2018).
In summary, privacy and security have in common that both are concerned with the appropriate use and protection of information. However, the two concepts vary concerning the scope and rationale of the protection (Fink et al., 2018; Hurlburt et al., 2009).
3 EXISTING APPROACHES
In this section, we present existing approaches for secure big data analytics in cloud environments as well as their advantages and disadvantages. Fig. 1 presents a schematic overview of the different approaches. Due to the tremendous potential of big data for organizations, several approaches to solving the security issues of big data analytics in the cloud have been proposed in the past:
Homomorphic Encryption. This approach describes
the use of an encryption scheme that allows the processing of encrypted data (see Fig. 1, a). Fully homomorphic encryption would allow arbitrary processing of encrypted data, but existing schemes are currently far too inefficient for practical use. However, the scientific literature presents several partially homomorphic algorithms that allow a limited set of operations on encrypted data, e.g., data aggregation (Paillier, 1999; Castelluccia et al., 2005; Lu et al., 2012), cosine similarity (Lu et al., 2014), and order-preserving search (Agrawal et al., 2004). Due to this limited functionality as well as losses in processing performance, homomorphic encryption schemes are not relevant in practice.
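To illustrate what a partially homomorphic scheme permits, the following sketch implements textbook additive Paillier encryption (Paillier, 1999) with deliberately tiny primes; it is purely illustrative and not a secure implementation, and the parameter choices are our own.

    import math
    import random

    def keygen(p, q):
        """Toy Paillier key generation; real deployments need large random primes."""
        n = p * q
        lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
        mu = pow(lam, -1, n)                                 # modular inverse, Python 3.8+
        return (n, n + 1), (lam, mu)                         # public (n, g), private (lam, mu)

    def encrypt(pub, m):
        n, g = pub
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:                           # r must be coprime with n
            r = random.randrange(1, n)
        return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

    def decrypt(pub, priv, c):
        n, _ = pub
        lam, mu = priv
        return (((pow(c, lam, n * n) - 1) // n) * mu) % n

    pub, priv = keygen(293, 433)
    c1, c2 = encrypt(pub, 40), encrypt(pub, 2)
    # Multiplying two ciphertexts adds the underlying plaintexts (additive homomorphism).
    print(decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2)))     # 42

Any operation outside the supported set (here, addition) still requires decryption, which is why the approach remains of limited use for general analytics.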
Partial Encryption in Combination with Trusted
Hardware. The approach is based on the integra-
tion of a trusted hardware device into the public cloud
infrastructure. The trusted device runs as an au-
tonomous compute element that the cloud administra-
tor cannot access (Bajaj and Sion, 2014) (see Fig. 1,
b). First, the user of this approach splits the data into sensitive and non-sensitive data, which are uploaded to the public cloud in ciphertext and plaintext, respectively. The processing of non-sensitive data is done
directly in the public cloud. However, if the sensitive
data needs to be processed, it is first transferred onto
the trusted hardware, where it is decrypted, processed,
encrypted, and sent back to the cloud. The public
cloud passes the resulting plaintext and the ciphertext
on to the client, who has to decrypt the processing
results of the sensitive data and merge it with the re-
sults of the non-sensitive data. This way, the sensitive
data is never in the public cloud without being en-
crypted. Since the trusted hardware is integrated into
the cloud infrastructure, this approach can only be im-
plemented by the CSP, which requires trust that the
trusted hardware is not compromised. Furthermore,
splitting a dataset into two parts and processing each
part individually leads to performance losses. Again,
several examples are presented in research (Bajaj and
Sion, 2014; Eguro and Venkatesan, 2012; Arasu et al.,
2013; Pires et al., 2016), but few implementations ex-
ist in practice, e.g., Intel SGX (https://www.intel.de/content/www/de/de/architecture-and-technology/software-guard-extensions.html).
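The following simplified sketch simulates the resulting data flow (the record layout, helper names, and the use of symmetric Fernet encryption from the Python cryptography package are illustrative assumptions; a real deployment would rely on the trusted device's own sealed keys).

    import json
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()          # shared only between client and trusted device
    fernet = Fernet(key)

    record = {"customer_id": 17, "purchases": [12.5, 7.0], "iban": "DE89370400440532013000"}
    sensitive = {"iban": record["iban"]}                              # split the record
    non_sensitive = {k: v for k, v in record.items() if k != "iban"}
    ciphertext = fernet.encrypt(json.dumps(sensitive).encode())      # upload only ciphertext

    def trusted_device(token):
        """Runs inside the trusted hardware: decrypt, process, re-encrypt."""
        data = json.loads(fernet.decrypt(token))
        result = {"iban_country": data["iban"][:2]}                  # computation on plaintext
        return fernet.encrypt(json.dumps(result).encode())

    # The public cloud processes the plaintext part and forwards the ciphertext.
    public_result = {"total": sum(non_sensitive["purchases"])}
    private_result = json.loads(fernet.decrypt(trusted_device(ciphertext)))

    print({**public_result, **private_result})   # client merges: {'total': 19.5, 'iban_country': 'DE'}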
Partial Encryption in Combination with Trusted
Client/Hybrid Cloud. Similarly to partial encryp-
tion in conjunction with trusted hardware, the data set
is split into sensitive and non-sensitive data. How-
ever, this time, the sensitive data is not sent to the
public cloud but stored and processed in a private
cloud (Hacigümüş et al., 2002; Zhang et al., 2013)
(see Fig. 1, c). Disadvantages of this approach are that
the client has to maintain one or more local servers af-
ter all. Moreover, performance issues due to inter-site
communication arise.
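A minimal sketch of this vertical split is shown below (the column names and the query are illustrative assumptions): sensitive columns never leave the private side, and the client joins the partial results via a shared record key, which is where the inter-site overhead arises.

    records = [
        {"id": 1, "age": 34, "diagnosis": "J45", "city": "Munich"},
        {"id": 2, "age": 51, "diagnosis": "E11", "city": "Berlin"},
    ]
    SENSITIVE = {"diagnosis"}

    # Vertical split: sensitive columns stay in the private cloud, the rest is outsourced.
    public_part = [{k: v for k, v in r.items() if k not in SENSITIVE} for r in records]
    private_part = [{k: r[k] for k in ["id"] + sorted(SENSITIVE)} for r in records]

    # Each side answers the query on the columns it holds ...
    public_answer = {r["id"]: r["age"] for r in public_part}          # public cloud
    private_answer = {r["id"]: r["diagnosis"] for r in private_part}  # private cloud

    # ... and the client merges the partial results, paying for the extra round trip.
    merged = [{"id": i, "age": public_answer[i], "diagnosis": private_answer[i]}
              for i in public_answer]
    print(merged)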
De-identification of Data. De-identification methods
are approaches that make it difficult to restore the link
between an individual and his or her data by removing
or transforming specific data points (Kushida et al.,
2012). Before uploading the data to the public cloud,
it is sanitized using different de-identification meth-
ods, e.g., pseudonymization, generalization, charac-
ter masking, or suppression (see Fig. 1, d). The
risk of using de-identification techniques is that ”re-
identification attacks” can be launched to identify
specific individuals (Kushida et al., 2012). Privacy
models provide a means for measuring the likeli-
hood of re-identification attacks and thus for defin-
ing different levels of privacy (Tomashchuk et al.,
2019). Privacy models are, for example, k-anonymity
(Sweeney, 2002), l-diversity (Machanavajjhala et al.,
2006), t-closeness (Li et al., 2007), and differential
privacy (Dwork, 2006). However, re-identification attacks can still circumvent privacy models by linking additional data to the anonymized dataset, making individual entries unique and thereby establishing a link
back to an individual’s identity.
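To make the listed methods tangible, the following sketch applies pseudonymization, generalization, character masking, and suppression to a single wearable-style record before upload (the field names and the keyed-hash construction are our own illustrative choices, not a prescribed scheme).

    import hashlib
    import hmac

    SECRET = b"keep-on-premise"   # pseudonymization key must never reach the cloud

    def pseudonymize(value):
        return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]

    def generalize_age(age):
        low = (age // 10) * 10
        return f"{low}-{low + 9}"

    def mask(value, visible=2):
        return value[:visible] + "*" * (len(value) - visible)

    def de_identify(record):
        return {
            "user": pseudonymize(record["name"]),   # pseudonymization
            "age": generalize_age(record["age"]),   # generalization
            "zip": mask(record["zip"]),             # character masking
            "steps": record["steps"],               # analytic payload kept as-is
            # "email" is suppressed entirely
        }

    print(de_identify({"name": "Ada Lovelace", "age": 36, "zip": "85748",
                       "email": "ada@example.org", "steps": 10412}))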
Privacy-Preserving Cloud Architecture. The cloud architecture presented by (Li et al., 2013) removes control rights of the provider, ensuring that the CSP cannot access any dataset stored in the cloud (see Fig. 1, e). However, CSPs want to keep their control rights to prevent misuse of their cloud infrastructures. Similar approaches are proposed by (Pacheco et al., 2017; Jr. et al., 2016).
All the solutions presented have certain advan-
tages and disadvantages. One of our underlying as-
sumptions is that the CSP is not trustworthy. Thus,
we can exclude the approach of partial encryption in
combination with trusted hardware. Also, we omit
the approach of a cloud architecture that preserves
data privacy, since to the best of the authors’ knowl-
edge, currently no cloud provider implements such an
architecture. The limited functionality and reduced
processing performance of homomorphic encryption
hamper the exploitation of big data analytics. Thus,
we do not consider homomorphic encryption for se-
cure and privacy-enhancing cloud computing any fur-
ther. Although the approach of partial encryption in
combination with a trusted client/hybrid cloud
does not realize many of the advantages of using
public clouds, this approach is not completely ex-
cluded. For example, small amounts of data, such as
keys used in encryption, can be stored on on-premise
databases.
In summary, in our future work, we will focus
on de-identification methods to implement privacy for
big data analytics in cloud environments. This ap-
proach enables a multitude of functionalities while
simultaneously realizing security as well as privacy
objectives and does not require the client to trust
the CSP. Moreover, depending on the security and pri-
vacy requirements, a higher or lower level of secu-
rity and privacy can be achieved by the selected de-
identification methods, which makes it possible to im-
plement different use cases. Finally, it is also possi-
ble to assess the risk of re-identification attacks using
privacy models, and thus to evaluate the choice of de-
identification methods and, if necessary, adjust them
accordingly.
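As an example of such an assessment, the sketch below computes the k of the k-anonymity model (Sweeney, 2002) for a small released table, i.e., the size of the smallest group of records sharing the same quasi-identifier values (the table and the chosen quasi-identifiers are illustrative).

    from collections import Counter

    def k_anonymity(records, quasi_identifiers):
        """k is the size of the smallest equivalence class over the quasi-identifiers."""
        groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        return min(groups.values())

    released = [
        {"age": "30-39", "zip": "857**", "steps": 10412},
        {"age": "30-39", "zip": "857**", "steps": 8321},
        {"age": "50-59", "zip": "803**", "steps": 4510},
    ]
    print(k_anonymity(released, ("age", "zip")))
    # 1 -> the last record is unique and needs further generalization or suppression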
4 RELATED WORK
In this section, we present existing approaches to se-
curity and privacy for big data analytics in cloud en-
vironments, as presented in research and practice.
Many scientific publications emphasize the rele-
vance of secure big data analytics in cloud environ-
ments, e.g., (Liu et al., 2015; Stergiou and Psan-
nis, 2017; Zissis and Lekkas, 2012; Neves et al.,
2016). However, when focusing on solutions based on de-identification methods, the number of publications is significantly smaller. Work on non-homomorphic encryption for secure big data analytics in the cloud is limited to the use of the Advanced Encryption Standard (AES) (Sachdev and Bhansali, 2013). There are also examples of the generalization of
data, such as (Prasser et al., 2017; Wan et al., 2015;
Dankar et al., 2012; Prasser et al., 2016), that mainly
deal with health data and are affiliated with the prod-
uct ARX Data Anonymization Tool.
A concrete context in which the de-identification of data plays a vital role is the area of protected health information (PHI) in the USA. In 2002, the revised version of the HIPAA Privacy Rule was adopted, which sets national standards for the protection of medical records and other personal health information (U.S. Department of Health and Human Services, n.d.). Under HIPAA, personal health information, if de-identified as required, may be used and disclosed for any purpose.
The Safe Harbor Model, not to be confused with the entirely different EU-US Safe Harbor framework that has since been invalidated, specifies which data elements must be removed or generalized. Several initiatives automate
HIPAA de-identification, such as ZIPpy Safe Har-
bor De-Identification Macros for SAS (Chatfield and
Parker, 2018). However, HIPAA Safe Harbor de-
identification methods are limited and have been
shown to be insufficient for protection against re-
identification attacks (Sweeney et al., 2017).
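For illustration, two typical Safe Harbor generalizations can be sketched as follows (a deliberately incomplete, hypothetical helper; the rule covers 18 identifier categories and additional conditions that are omitted here).

    from datetime import date

    def generalize_zip(zip_code):
        """Keep only the first three ZIP digits; Safe Harbor additionally requires
        000 for prefixes covering fewer than 20,000 people, which is omitted here."""
        return zip_code[:3] + "00"

    def generalize_date(d):
        """Reduce date elements related to an individual to the year only."""
        return d.year

    print(generalize_zip("02139"), generalize_date(date(1974, 5, 17)))   # 02100 1974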
In the meantime, however, other tools have emerged that were originally developed for the purpose of implementing HIPAA Safe Harbor but now also offer more extensive de-identification methods. These include Privacy Analytics Eclipse (https://privacy-analytics.com/software/privacy-analytics-eclipse), the ARX Data Anonymization Tool (https://arx.deidentifier.org), IQVIA (https://www.iqvia.com/solutions/real-world-value-and-outcomes/privacy-preservation-and-data-linkage), or the Google Cloud Healthcare API (https://cloud.google.com/healthcare). However, all of these products still have a strong focus on health data and are therefore often based on standards for the interoperability of health data, e.g., FHIR (https://www.hl7.org/fhir/index.html) and DICOM (https://www.dicomstandard.org/dicomweb/restful-structure).
Figure 1: Overview of existing approaches for privacy-securing big data analytics in cloud environments.
Other products that offer the de-identification of data commercially are Anonos (https://www.anonos.com), IBM InfoSphere Optim Data Privacy (https://www.ibm.com/us-en/marketplace/infosphere-optim-data-privacy), Data Sunrise (https://www.datasunrise.com/data-masking), and Privitar (https://www.privitar.com).
However, these products offer only a single or a limited set of de-identification methods (e.g., dynamic pseudonymization in Anonos, character masking in Data Sunrise), and only selected databases can be used as big data infrastructure.
5 CONCLUSION
In this position paper, we address a persisting prob-
lem in big data analytics, which is concerned with
the trade-off between protecting security and privacy
while at the same time enabling analysis functional-
ity. We present existing approaches for preserving the
security and privacy of big data analytics in cloud en-
vironments and argue that de-identification provides
the most promising approach. This assumption is supported by the emergence of commercial tools that apply de-identification methods to enable security and privacy in cloud environments. However, to the best of the au-
thors’ knowledge, no holistic analysis of the trade-off
between different de-identification methods and anal-
ysis functionality currently exists. Therefore, our fu-
ture work aims at analyzing this trade-off in a use case
focusing on data generated by wearables as well as on
vehicle-generated data. These results will provide the
basis for the implementation of a privacy-enhancing
tool applying different de-identification methods.
ACKNOWLEDGMENTS
This work was sponsored by the German Federal
Ministry of Education and Research (BMBF) grant
01IS17049 / X-DACE. The responsibility for the con-
tent of this publication lies with the authors.
REFERENCES
Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. (2004).
Order preserving encryption for numeric data. In Pro-
ceedings of the 2004 ACM SIGMOD International
Conference on Management of Data, SIGMOD ’04,
pages 563–574, New York, NY, USA. ACM.
Arasu, A., Blanas, S., Eguro, K., Kaushik, R., Kossmann,
D., Ramamurthy, R., and Venkatesan, R. (2013).
Orthogonal security with cipherbase. In 6th Bien-
nial Conference on Innovative Data Systems Research
(CIDR’13).
Bajaj, S. and Sion, R. (2014). Trusteddb: A trusted
hardware-based database with privacy and data confi-
dentiality. IEEE Transactions on Knowledge and Data
Engineering, 26(3):752–765.
Castelluccia, C., Mykletun, E., and Tsudik, G. (2005). Effi-
cient aggregation of encrypted data in wireless sensor
networks. In The Second Annual International Con-
ference on Mobile and Ubiquitous Systems: Network-
ing and Services, pages 109–117.
Chatfield, A., Parker, J., and Egeler, P. (2018).
Zippy safe harbor de-identification macros. In SAS
Conference Proceedings: SAS Global Forum 2018.
Dankar, F. K., El Emam, K., Neisa, A., and Roffey, T.
(2012). Estimating the re-identification risk of clin-
ical data sets. BMC medical informatics and decision
making, 12(1):66.
Domingo-Ferrer, J., Farràs, O., Ribes-González, J., and Sánchez, D. (2019). Privacy-preserving cloud computing on sensitive data: A survey of methods, products and challenges. Computer Communications, 140-141:38–60.
Dwork, C. (2006). Differential privacy. In 33rd Inter-
national Colloquium on Automata, Languages and
Programming, part II (ICALP 2006), volume 4052
of Lecture Notes in Computer Science, pages 1–12.
Springer Verlag.
Eguro, K. and Venkatesan, R. (2012). Fpgas for trusted
cloud computing. In 22nd International Confer-
ence on Field Programmable Logic and Applications
(FPL), pages 63–70.
Fink, G. A., Song, H., and Jeschke, S., editors (2018). Se-
curity and privacy in cyber-physical systems: Founda-
tions, principles, and applications. Wiley IEEE Press,
Hoboken, NJ, first edition edition.
Hacigümüş, H., Iyer, B., Li, C., and Mehrotra, S. (2002).
Executing sql over encrypted data in the database-
service-provider model. In Proceedings of the 2002
ACM SIGMOD International Conference on Manage-
ment of Data, SIGMOD ’02, pages 216–227, New
York, NY, USA. ACM.
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S.,
Gani, A., and Khan, S. U. (2015). The rise of “big
data” on cloud computing: Review and open research
issues. Information Systems, 47:98 – 115.
Hurlburt, G. F., Miller, K. W., Voas, J. M., and Day, J. M.
(2009). Privacy and/or security: Take your pick. IT
Professional, 11(4):52–55.
Isley, P. (2018). ISO/IEC 20889 first edition 2018-11: Privacy enhancing data de-identification terminology and classification of techniques. Standard.
Jr., E. C. B., Monteiro, J. M., Reis, R., and Machado, J. C.
(2016). A flexible mechanism for data confidential-
ity in cloud database scenarios. In Proceedings of the
18th International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 359–368. INSTICC, SciTePress.
Krazit, T. (2010). Google fired engineer for privacy breach. https://www.cnet.com/news/google-fired-engineer-for-privacy-breach/. Website, accessed 13 July 2018.
Kushida, C., Nichols, D., Jadrnicek, R., Miller, R.,
Walsh, J., and Griffin, K. (2012). Strategies for de-
identification and anonymization of electronic health
record data for use in multicenter research studies.
Medical care, 50 Suppl:S82–101.
Laney, D. (2001). 3D data management: Controlling data
volume, velocity, and variety. Technical report, META Group, Gartner.
Li, M., Zang, W., Bai, K., Yu, M., and Liu, P. (2013).
Mycloud: Supporting user-configured privacy pro-
tection in cloud computing. In Proceedings of the
29th Annual Computer Security Applications Confer-
ence, ACSAC ’13, pages 59–68, New York, NY, USA.
ACM.
Li, N., Li, T., and Venkatasubramanian, S. (2007).
t-closeness: Privacy beyond k-anonymity and l-
diversity. In 2007 IEEE 23rd International Confer-
ence on Data Engineering, pages 106–115.
Liu, C., Yang, C., Zhang, X., and Chen, J. (2015). External
integrity verification for outsourced big data in cloud
and iot: A big picture. Future Generation Computer
Systems, 49:58 – 67.
Lu, R., Liang, X., Li, X., Lin, X., and Shen, X. (2012).
Eppa: An efficient and privacy-preserving aggrega-
tion scheme for secure smart grid communications.
IEEE Transactions on Parallel and Distributed Sys-
tems, 23(9):1621–1631.
Lu, R., Zhu, H., Liu, X., Liu, J. K., and Shao, J. (2014).
Toward efficient and privacy-preserving computing in
big data era. IEEE Network, 28(4):46–50.
Ma, M., Wang, P., and Chu, C. (2013). Data management
for internet of things: Challenges, approaches and op-
portunities. In 2013 IEEE International Conference
on Green Computing and Communications and IEEE
Internet of Things and IEEE Cyber, Physical and So-
cial Computing, pages 1144–1151.
Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasub-
ramaniam, M. (2006). L-diversity: privacy beyond k-
anonymity. In 22nd International Conference on Data
Engineering (ICDE’06), pages 24–24.
Mukherjee, J., Datta, B., Banerjee, R., and Das, S.
(2015). Dwt difference modulation based novel
steganographic algorithm. In Jajodia, S. and Mazum-
dar, C., editors, Information systems security, Lecture
Notes in Computer Science, pages 573–582. Springer,
Cham and Heidelberg and New York and Dordrecht
and London.
Neves, P. C., Schmerl, B. R., Cámara, J., and Bernardino,
J. (2016). Big data in cloud computing: Features and
issues. In IoTBD.
Pacheco, L., Alchieri, E., and Solis, P. (2017). Architecture
for privacy in cloud of things. In Proceedings of the
19th International Conference on Enterprise Information Systems - Volume 2: ICEIS, pages 487–494. INSTICC, SciTePress.
Paillier, P. (1999). Public-key cryptosystems based on com-
posite degree residuosity classes. In Stern, J., editor,
Advances in Cryptology EUROCRYPT ’99, pages
223–238, Berlin, Heidelberg. Springer Berlin Heidel-
berg.
Pires, R., Pasin, M., Felber, P., and Fetzer, C. (2016).
Secure content-based routing using intel software
guard extensions. In Proceedings of the 17th In-
ternational Middleware Conference, Middleware ’16,
pages 10:1–10:10, New York, NY, USA. ACM.
Prasser, F., Kohlmayer, F., and Kuhn, K. A. (2016). The
importance of context: Risk-based de-identification of
biomedical data. Methods of information in medicine,
55(04):347–355.
Prasser, F., Kohlmayer, F., Spengler, H., and Kuhn, K. A.
(2017). A scalable and pragmatic method for the safe
sharing of high-quality health data. IEEE Journal of
Biomedical and Health Informatics, PP:1–1.
Sachdev, A. and Bhansali, M. (2013). Enhancing cloud
computing security using aes algorithm. International
Journal of Computer Applications, 67:19–23.
Samonas, S. and Coss, D. (2014). The CIA strikes back: Redefining confidentiality, integrity and availability in security. Journal of Information System Security, 10(3):21–45.
Solove, D. J. (2015). The meaning and value of privacy.
In Roessler, B. and Mokrosinska, D., editors, Social
Dimensions of Privacy, pages 71–82. Cambridge Uni-
versity Press, Cambridge.
Stergiou, C. and Psannis, K. E. (2017). Efficient and se-
cure big data delivery in cloud computing. Multimedia
Tools and Applications, 76(21):22803–22822.
Sweeney, L. (2002). k-anonymity: A model for protecting
privacy. International Journal of Uncertainty, Fuzzi-
ness and Knowledge-Based Systems, 10(05):557–570.
Sweeney, L., Yoo, J. S., Perovich, L., Boronow, K. E.,
Brown, P., and Brody, J. G. (2017). Re-identification
risks in hipaa safe harbor data: A study of data from
one environmental health study. Technology science,
2017.
Tomashchuk, O., van Landuyt, D., Pletea, D., Wuyts, K.,
and Joosen, W. (2019). A data utility-driven bench-
mark for de-identification methods. In Gritzalis, S.,
Weippl, E. R., Katsikas, S. K., Anderst-Kotsis, G.,
Tjoa, A. M., and Khalil, I., editors, Trust, Privacy and
Security in Digital Business, volume 11711 of Lecture
Notes in Computer Science, pages 63–77. Springer In-
ternational Publishing, Cham.
U.S. Department of Health and Human Services (n.d.). Summary of the HIPAA Privacy Rule. https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html. Accessed 16 July 2019.
Wan, Z., Vorobeychik, Y., Xia, W., Clayton, E. W., Kantar-
cioglu, M., Ganta, R., Heatherly, R., and Malin, B. A.
(2015). A game theoretic framework for analyzing re-
identification risk. PloS one, 10(3):e0120592.
Wu, F. T. (2012). Defining privacy and utility in data sets.
84 University of Colorado Law Review 1117 (2013);
2012 TRPC, pages 1117–1177.
Zhang, J. Y., Wu, P., Zhu, J., Hu, H., and Bonomi, F.
(2013). Privacy-preserved mobile sensing through hy-
brid cloud trust framework. In 2013 IEEE Sixth Inter-
national Conference on Cloud Computing, pages 952–
953.
Zissis, D. and Lekkas, D. (2012). Addressing cloud com-
puting security issues. Future Generation Computer
Systems, 28(3):583 – 592.