The Concept of Identifiability in ML Models

Stephanie von Maltzan

Karlsruhe Institute of Technology, Centre for Applied Legal Studies, Vincenz-Prießnitz-Str. 3, 76137 Karlsruhe, Germany

Keywords: Anonymisation, Pseudonymisation, ML Model, Adversarial Attacks, Privacy, Utility.

Abstract: Recent research indicates that the machine learning process can be reversed by adversarial attacks. These

attacks can be used to derive personal information from the training. The supposedly anonymising machine

learning process represents a process of pseudonymisation and is, therefore, subject to technical and

organisational measures. Consequently, the unexamined belief in anonymisation as a guarantor for privacy

cannot be easily upheld. It is, therefore, crucial to measure privacy through the lens of adversarial attacks and

precisely distinguish what is meant by personal data and non-personal data and above all determine whether

ML models represent pseudonyms from the training data.

1 INTRODUCTION

The debate about Privacy Preserving and in particular

anonymisation techniques has intensified as a result

of an increasing demand for stronger and more

comprehensive protection of personal data. In recent

years many companies have considered

anonymisation to be the answer to all data protection

and privacy issues. Companies exploiting

anonymisation assume that it cannot breach privacy.

This premise poses, nevertheless, some challenges.

Two crucial assumptions should, therefore, be

considered. The first point to note is that personal data

is only considered anonymous, if it is not possible to

identify an individual. The General Data Protection

Regulation (GDPR) assumes that there is some

leeway in considering data to be anonymous.

Determining anonymity is, therefore, fraught with

difficulties and depends on criteria that change

according to technical progress or even specific

analysis, as the GDPR takes into account technical

developments to determine anonymous data. This

leads to constant uncertainty in the anonymisation

process. Government standards are, therefore,

indispensable. Secondly, a large body of research on

the volatility and vulnerability of machine learning

(ML) models points to the problem that training data

used for ML models have higher probability of re-

identification as a result of adversarial attacks

(Fredrikson et al., 2014; Shokri et al., 2017). Based

on the premise that anonymisation is not a risk-free

mechanism, as increasingly acknowledged (Brasher,

2018; Mehmood et al., 2016; Piras et al., 2019;

Pomares-Quimbaya et al., 532019), adversarial

attacks against ML models became the focus of

research. Overall, the review showed that ML models

memorise sensitive information of the data used for

training, indicating serious privacy risks (Carlini et

al.; Fredrikson et al., 2015; Hayes et al., 2019;

Hilprecht et al., 2019; Jayaraman & Evans, 2019;

Pyrgelis et al.; Shokri et al., 2017; Song et al.; Yeom

et al., 2017). This emerge to the problem that ML

models might be personal data and fall, therefore,

under the scope of the GDPR. Anonymised data is

exempt from the GDPR. If the data is not personal

data, as previously assumed for ML models (and the

output), the GDPR does not apply.

Machine learning algorithms are regularly trained

and evaluated on disjoint data sets. Hence, research

and industry have been under the erroneous belief that

it is not possible to retrospectively draw conclusions

from the ML model about the data used for training.

However, some ML techniques - as the above

mentioned research has shown - can remember the

training data of the model antiparallel to the

predefined learning process. Despite its “artificial”

nature the ML process contains some characteristics

of the properties, patterns and correlations from the

data used for training and, thus, does not protect from

linkage and attribute inference. As indicated above,

this raises the question of whether the ML models

represent pseudonyms from the training data and

could, therefore, fall under the definition of personal

data. Classifying models as personal data raises

von Maltzan, S.

The Concept of Identiﬁability in ML Models.

DOI: 10.5220/0011081600003194

In Proceedings of the 7th International Conference on Internet of Things, Big Data and Security (IoTBDS 2022), pages 215-222

ISBN: 978-989-758-564-7; ISSN: 2184-4976

215

further far-reaching problems concerning the widely

used ML as a Service (MLaaS). In the event that

malicious users were able to re-identify data used to

train these models, the resulting information leakage

and privacy breach would cause serious issues.

Therefore, the unexamined belief in anonymisation as

a guarantor for privacy cannot be easily upheld.

As a result, the surprising ease of identifying

individuals or information about individuals in

supposedly anonymous – even synthetic (Stadler et

al., 2020b) – datasets creates a great deal of

uncertainty about which technical measures are

adequate to both legal standards and practical

expectations. The well-known tension between utility

and privacy is thus amplified. It is crucial to measure

privacy through the lens of adversarial attacks and

precisely distinguish what is meant by personal data

and non-personal data and above all determine

whether ML models represent pseudonyms from the

training data. It is, therefore, preferable to adopt an

approach that incorporates the above mentioned

vulnerability and volatility of the models into the

training while retaining utility. A GDPR-compliant

use of ML models requires technical measures

specified by government standards (yet to be

developed).

Hence, this work will first address the concept of

identifiability and the scope of privacy criteria that

lead to effective anonymisation (or

pseudonymisation) and transfer these findings to ML

models. Building on the principles arising from the

GDPR, the Guidelines from the Art. 29 Data

Protection Working Group, nowadays known as the

European Data Protection Board (EDPB) and the

European Union Agency for Cybersecurity (ENISA)

as well as the CJEU’s Breyer judgement were used to

provide the underlying rationale. Further guidance –

albeit with a contrary view – was above all also

provided by Mourby (Mourby et al., 2018), Stalla-

Bourdillon (Hu et al., 2017; Stalla-Bourdillon &

Knight, 2017) and Groos (Groos & van Veen, 2020).

Based on this research, the aim of this paper is not

only to outline the legally relevant scope of

identifiability – which has not been discussed so far

in the context of ML models – but also to combine the

respective concepts in an interdisciplinary manner.

This can only be achieved by drawing on existing

research and established legal definitions and

concepts. To the best of my knowledge, there is no

conceptual and cross-cutting work that is in line with

the recent research concerning the legal outcome of

adversarial attacks and anonymising effects of ML

models and above all the identification of ML models

as pseudonyms of the data used for training.

2 THE CONCEPT OF

IDENTIFIABILITY

Approaches to define identifiability can be found in

the GDPR and are closely linked to the concept of

personal data. Personal data under Art. 4 (1) GDPR

means any information relating to an identified or

identifiable natural person (data subject); an

identifiable natural person is one who can be

identified, directly or indirectly. The terms identified

and identifiable are, therefore, of crucial importance

to distinguish the different types of data and to

determine (Stalla-Bourdillon & Knight, 2017)

whether data should be considered personal data.

These vague criteria allow different

interpretations, which leads to a dynamic and thus

different understanding and perception of the

definition. The notion of personal data is, therefore,

quite difficult to define. However, the objective

depends on the question of what personal data is.

Recital 26 expands the notion of identifiability and

makes a distinction between personal data and

anonymised data, excluding anonymised data from

the scope of the GDPR. In view of the

aforementioned risk of re-identification and in order

to avoid misunderstandings and conceptual

ambiguities, the distinction from anonymisation is of

crucial importance. This is primarily due to the fact

that uncertainties exist regarding the classification of

pseudonymised data as personal data or the

classification of technical and organisational

measures that are considered pseudonymisation

measures (Mourby et al., 2018) and not

anonymisation measures. In light of the existing

uncertainties, a differentiation of the anonymising (or

pseudonymising) effect is necessary.

2.1 Anonymising Effect of ML Models

Anonymisation effectively serves as a privacy

protection technique and as a way to remove the

personal character of the data. However, as is

increasingly acknowledged, anonymisation is not a

risk-free mechanism (Brasher, 2018; Mehmood et al.,

2016; Piras et al., 2019; Pomares-Quimbaya et al.,

532019), especially with regard to ML models; and as

demonstrated by Stadler (Stadler et al., 2020a), this

also applies to synthetic data.

Anonymisation is regarded as a process whereby

a data subject can no longer be identified, directly or

indirectly, either by the controller or by a third party

on the basis of irreversibly altered personal data. The

key factor is that a person is not or no longer

identifiable after the data has been anonymised. With

IoTBDS 2022 - 7th International Conference on Internet of Things, Big Data and Security

216

the criterion of all means reasonably likely to be used

Recital 26 provides guidance in addressing this issue.

The decisive factor is whether identifying the data

subjects is possible with the data and the additional

knowledge. However, the extent to which additional

knowledge and means of third parties should be

included is a matter of dispute. The previous attempts

to define the concept of identifiability and

anonymisation from literature and judiciary do not

draw a consistent picture.

Following the European Court of Justice’s

(CJEU) Breyer ruling, the additional knowledge of

third parties has to be attributed to the controller if the

additional knowledge "constitutes a means which

may reasonably be used to identify the data subject"

(Case C-582/14, Patrick Breyer V Bundesrepublik

Deutschland, 2016). This criterion is not met,

according to the CJEU, if the identification of the data

subject is prohibited by law or practically impossible

because it requires a disproportionate effort in terms

of time, costs and manpower, so that the risk of

identification appears to be insignificant. The legally

permissible means are therefore the decisive criterion

to be discussed.

Favouring a broad interpretation of personal data

the European Data Protection Board (EDPB, formerly

Article 29 Working Party) still refers to its Opinion

5/2014 on anonymisation techniques (Working Paper

216), in which it proposes a high threshold for

achieving successful anonymisation and refers to a

technique comparable to permanent erasure, i.e. "it

must not be possible to further process the personal

data" (Article 29 Working Party, 2014a). According

to this opinion, inference is considered one of the key

risks, which is why a very broad definition has been

chosen (Article 29 Working Party, 2014a). As far as

anonymisation is concerned, WP 216 states that the

data set can be considered anonymous if the

controller aggregates the data at a level where

individual events are no longer identifiable.

This Opinion has not become obsolete after the

Breyer decision. In determining whether a natural

person is identifiable, all the means reasonably

available, either to the controller or to any other

person, to identify the natural person directly or

indirectly should be considered. This includes all

objective factors such as the cost and time required

for identification, the technology available at the time

of processing and technological developments.

The potential additional knowledge of the

controller or a third party is, therefore, relevant. The

determining factor is whether there are (unlawful)

means that can reasonably be used to link the data

held by the controller with the additional information

from the third party to enable re-identification.

Some researchers assume that only the

capabilities of the data controller should be taken into

account, thus excluding the capabilities of third

parties or at least seeing such capabilities as

insignificant (Groos & van Veen, 2020) in regard to

time, cost and manpower. Recital 26 states means

reasonably likely to be used (...) by the controller or

another person (…), which emphasis not only the

controller’s but also the third person’s capabilities.

Notwithstanding the fact that non-mandatory law is

not binding under Article 288 of the Treaty on the

Functioning of the European Union, recitals add a

layer of understanding and define what the rules mean

in the context of a particular case. The recitals must,

therefore, be respected.

Contrary to the court's opinion and statements

from the literature, I argue that even unlawful means

should be included in the discussion. Attacks by third

parties should not be ignored here, as there are often

legally impermissible but technically easy to

implement measures for re-identifying individuals.

For the evaluation of the re-identification potential, it

is, therefore, important to consider not only the

legally permissible measures, but also the technically

possible ones. The evaluation of the likelihood must

apply an objective criterion, i.e. it must not depend on

the motivation or the intention to obtain the means or

to actually use it in a particular case. Taking into

account the high risk of re-identification mentioned

above it seems questionable whether the CJEU took

this into account in its deliberations. The risks of re-

identification, therefore, also address the way

attackers can identify data subjects in data sets. This

means that if prohibited means allow re-identification

the data is not considered anonymised. With the

attack methods mentioned above it becomes evident

that despite disjoint data sets in the ML process

inferences cannot be completely prevented. The

generated additional knowledge can be accomplished

with reasonable effort. With Privacy Preserving ML

– as will be shown – there are nevertheless measures

that mitigate the risk of re-identification (Article 29

Working Party, 2007) and can be used to a reasonable

extent. By implementing technical and organisational

measures as part of the training process, and

especially by considering the inherent risk of

adversarial attacks, all means that could reasonably

be used to identify the data subject are taken into

account in determining the risks.

The ML models can consequently be interpreted

as personal data by re-identifying the data contained

in the training data through an adversarial attack. All

The Concept of Identiﬁability in ML Models

217

information that relates to a person - no matter how

trivial or banal it may seem - is considered personal

data. If one follows this line of reasoning, information

obtained through an adversarial attack is also personal

data.

2.2 Pseudonymising Effect of ML

Models

As already indicated the concept of additional

information is closely linked to that of

pseudonymisation. A pseudonym is considered a

piece of information that – depending on the

pseudonymisation function – are associated to an

identifier of a data subject (with different degrees of

linkability) and, therefore, carries the risk of being

subject to a re-identification attack, as those described

above. Pseudonymisation's decisive feature is that,

according to Art. 4 No. 5 GDPR, pseudonyms can no

longer be associated with a specific person without

the use of additional information (ENISA European

Union Agency for Network and Information Security,

2019).

Both background knowledge and personal

knowledge must be included in the criterion of

additional knowledge. The latter is the information

that could be kept separately from the dataset by

technical and organisational measures whereas

background knowledge corresponds to the

knowledge that is publicly accessible to an average,

reasonably competent individual which cannot be

physically separated from the dataset and can have a

high impact on re-identification risk. Personal

knowledge, on the other hand, can vary from one

person to another and represents information that is

not publicly accessible to an average, reasonably

competent individual, but to some qualified

individuals. In combination with anonymised data,

this personal knowledge in conjunction with the

derived attribute(s) can lead to re-identification or at

least disclosure of (potentially sensitive) information

about an individual. Therefore, the use of additional

information is central to the re-identification risk and

on the same time strength of the pseudonymisation.

This process can be more or less complex depending

on the pseudonymisation function.

It remains to be ascertained if ML models

represent pseudonyms.

It is, therefore, sufficient if the data subject can be

identified and statements can be made about his or her

factual and personal circumstances (Article 29

Working Party, 2007). In the training process certain

properties of the training datasets are stored in the

model as feature vectors - regardless of whether they

are labelled or stored, which basically depends on the

application or learning technique. Support Vector

Machines or k-nearest neighbour classification

methods store the feature vectors whereas neural

networks, for example, do not, but can remember

them unintentionally (Carlini et al.). In the latter, a

model inversion attack, for instance, generates feature

vectors similar to those used to train the model by

using the outputs obtained from the model. Such

training data sets, which consist of a set of features

and an associated output, may contain sensitive

information - like medical records or images - and

thus have quasi-identifiers or values of other features

that can be used to identify individuals. According to

the GDPR, this information is considered personal

data. The ML process is reversible, insofar as an

external assignment rule remains and thus a general

possibility of re-identification exists.

Based on this, the following constellation was

developed to illustrate the effects of Model Inversion

Attacks:

It is assumed that there is a generated data set

with personal data A and an ML model B

trained with personal data B. Access is given

either via the model directly (white box) or via

an interface (black box).

In such a constellation model B represents the

pseudonymised version of the training data set B

while data set A represents the key or the assignment

rule which can be used to (partially) re-identify this

data. If an attacker has access to A and model B and

the model inversion attack is successful, it seems

possible to consider not only A but also model B (and

its output) as personal data. If model B has been

published and A as well as model B are kept by

different persons, model B is also considered personal

data.

This constellation can also be applied to

Membership Inference Attacks and Model

Manipulation Attacks. In contrast to Model Inversion

Attacks, in a Membership Inference Attack, the

shadow models comparatively represent the key or

assignment rule, whereas in a (black box) Model

Manipulation Attack, the key or assignment rule is

considered to be the enriched randomly generated but

unique data that can be used to retrieve the

information stored in the labels.

This information cannot be obfuscated in ML

models that are vulnerable to adversarial attacks to

the extent that re-identification is no longer possible.

Reversibly pseudonymised data is considered

indirectly identifiable information about individuals -

even if the disclosure is not made consciously

IoTBDS 2022 - 7th International Conference on Internet of Things, Big Data and Security

218

possible - with the exception of the model

manipulation attack - under previously specified

conditions. Conceptually, the above is thus close to

the notion of pseudonymisation in the GDPR.

Consequently, a ML model attacked by

adversarial attacks can, therefore, no longer be

considered anonymous. Re-identification of the

training data seems to be within the realm of

possibility. Such an ML model should, therefore, be

subject to similar principles and regulations that apply

to identifiable data.

2.3 Technical and Organisational

Measures

These results are extremely problematic from the

perspective of a researcher or company that wants to

use ML models and attribute an anonymising effect

to them.

As described above, the personal identifiability of

data depends on the context. Therefore, it is necessary

to regularly assess whether data can (still) be

considered anonymous. As a consequence, personal

identifiability has to be determined dynamically and

risk-dependently. A change in the situation can also

lead to a change in the risk of (re-)identification. This

is affected, for example, by the knowledge of third

parties or by new developments in (de-

)anonymisation techniques (Article 29 Working

Party, 2014b) A non-recurrent risk analysis is

therefore insufficient. Anonymisation in general is,

therefore, subject to a number of uncertainties,

especially with regard to the fact that the relevant

technical and social circumstances can change rapidly

over time. Despite anonymisation, a residual risk may

remain for the data subject (Article 29 Working Party,

2014b). These risks also apply to ML models.

Inferences cannot be completely prevented, as seen

above. It is possible to retrospectively draw

conclusions from the ML model to the data used for

training. Adversarial attacks can, therefore, lead to a

different classification of the (output of) ML models

that were previously considered anonymous.

Consequently, if vulnerable ML models do not

represent anonymous data, methods must be found

that can guarantee both utility and data protection.

This also applies to the training process of ML

models.

The question is how one can reliably defend the

above mentioned attacks on pseudonymisation.

Firstly, the whole training dataset and all data values

should be considered. Secondly, any knowledge and

inferences should be eliminated if possible.

Therefore, an effective privacy protection technique

should be applied. However, the challenge is to

ensure privacy protection without reducing utility.

The trade-off between protection and utility is

apparent.

Based on the risks highlighted above, it is

essential to be cautious when using ML models to

process sensitive information. One has to consider not

only what kind of model is used and how it is

provided, but also how the data should be prepared

before the training process. Many algorithms

commonly used are based on the assumption that they

need raw data. However, with Privacy Preserving

ML, there are methods (Al-Rubaie & Chang, 2019;

Gambs et al., 2021; Jia et al., 2019; Milad Nasr et al.,

2018; Mukherjee et al., 2021; Nasr et al., 2019) to

reduce the effectiveness of the above attacks while

preserving utility. There are several approaches

depending on the application and model. Gambs

(Gambs et al., 2021) demonstrate, for example, an

approach for synthesised data based on Differential

Privacy that reduces the risk of adversarial attacks

while preserving utility whereas Mukherjee

(Mukherjee et al., 2021) optimise current approaches

to mitigate Membership Inference Attacks on GAN

models that previously resulted in poorer generated

sample quality. The authors stated not only that their

method provide protection against Membership

Inference Attacks “while leading to negligible loss in

downstream performances” (Mukherjee et al., 2021)

but also that their algorithm prevent memorisation of

the training data set. In order to prevent Membership

Inference Attacks it is also proposed to limit the

number of classes that a model can predict to the most

commonly used classes. Avoiding overfitting a model

can also be beneficial (Yang et al., 2020). The use of

regularisation techniques like dropout (Srivastava et

al., 2014) may contribute to prevent overfitting and

also to strengthen privacy (Jain et al., 2015) in neural

networks. However, no guarantee exists that a model

is completely invulnerable to attack. In some cases, it

has been shown that such attacks are successful even

without overfitting the model (Yeom et al., 2017).

However, overfitting is not the only reason that

causes Membership Inference Attacks. Even if ML

models are overfitted they could leak different

amounts of membership information. Specifically,

due to their different structures, they might remember

information about the data used for training.

Nonetheless, if no raw data is used for training,

the risk assessment of whether personal data is

affected could be completely different. However, it

should clearly be stated that previous approaches to

defend against Membership Inference Attacks have

limited effect on Model Inversion Attacks. To my

The Concept of Identiﬁability in ML Models

219

knowledge, there is no known method that adequately

defends both attacks.

3 CONCLUSION

As outlined above, the ambiguous terms of the GDPR

as well as the classification of ML models as

pseudonyms cause a number of problems which need

to be addressed. It is, therefore, important that

adjusted and comprehensive guidelines and best

practices are elaborated to eliminate the uncertainty

as to which technical offerings meet legal standards

as well as adequately match practical expectations.

Measuring privacy through the lens of adversarial

attacks is therefore crucial. As stated earlier, one

should not rely solely on the belief in a data

anonymising ML model but also that the model be

assessed with respect to a privacy test based on

adversarial attacks to quantify the privacy protection

provided by the processing method. It is, therefore, of

crucial importance that the question of whether a data

set is considered to be personal, pseudonymised or

anonymised can be answered without ambiguity. The

GDPR could apply depending on the outcome.

Concerning ML, anonymous data itself does not

guarantee privacy without the support of other

techniques. Which does not mean that anonymisation

is a useless tool, but it must be applied with the

support of other Privacy Preserving mechanisms. In

order to properly assess and mitigate privacy threats,

a risk-based approach to be evaluated regularly

should be adopted, taking into account the purpose

and overall context of the processing of personal data,

as well as the degree of utility and scalability.

Choosing technical and organisational measures

depends on various parameters, e.g. the level of data

protection and the utility of the pseudonymised data,

which may lead to different approaches or even

variations of approaches. The trade-off between

utility and data protection should carefully be

analysed. On one side, utility need to be optimised for

the intended purposes while keeping a strong data

protection. This field of privacy preserving ML is

gradually becoming a highly debated topic and is a

challenging one, with a high dependency on matters

of context, involved entities, data types and additional

knowledge. There is consequently no single approach

that fit for all possible scenarios. A one-size-fits-all

solution is not sufficient. Applying robust

pseudonymisation to reduce the risk of adversarial

attacks and maintain the utility of the pseudonymised

data requires a high level of competence.

It is therefore necessary to develop a holistic and

legally binding concept consisting of governmental

and technical measures. The following criteria can

provide preliminary orientation.

The Data Protection Authorities as well as the

EDPB should, therefore, provide practical guidance

with regard to the assessment of the risk and best

practices in the field of pseudonymisation and

anonymisation. The definition and explanation of the

state of the art is of crucial importance. Furthermore,

the notion of identifiability and anonymisation needs

to be readdressed. The adversarial attacks are

evolving which leads to more and more challenging

anonymisation and pseudonymisation process. The

authorities should, therefore, extend the current

techniques to more advanced solutions addressing the

special challenges appearing with ML models. The

relevant EU institutions should provide support and

disseminate these efforts. To achieve more legal

certainty, manageable standards for anonymisation

associated with presumption rules in the event of

compliance should be established at EU level.

Considering the advancing technical development, it

would also be advisable to provide the standards with

a temporal validity. Furthermore, standardised test

procedures should be developed to check supposedly

anonymous data for personal identifiability.

Guidance on appropriate or inappropriate techniques

is therefore indispensable. The Article 29 Data

Protection Working Party as well as ENISA provided

guidance about the use of various privacy

technologies – including Differential Privacy. The

anonymisation and pseudonymisation techniques

should be revisited concerning the aforementioned

highly probably adversarial attacks and the inherent

flexible determinability of the degree of anonymity.

Orientation could be formulated at EU level by means

of a guideline for determining suitable anonymisation

procedures, as well as addressing criteria for

determining appropriate procedural parameters. The

regularly updated technical guideline of the

Bundesamt für Sicherheit in der Informationstechnik

(Federal Office for Information Security in Germany)

„Kryptographische Verfahren: Empfehlungen und

Schlüssellängen“ (Cryptographic Procedures:

Recommendations and Key Lengths), which gives

recommendations for cryptographic procedures could

be the role model in this respect.

This work has the limitation of being purely

theoretical. Nonetheless, it provides not only a

revisited evaluation of the notion of identifiability and

the classification of ML models as pseudonymised

data but also an insight into the inherent risk for ML

models as well as sufficient technical and

IoTBDS 2022 - 7th International Conference on Internet of Things, Big Data and Security

220

organisational measures which has to be standardised.

Overall, the work provided ample background

information on relevant concepts concerning

anonymisation and pseudonymisation and how to

deal with the fact that the ML process does not have

an anonymising effect. Clearly, more practical

interdisciplinary work linking up adversarial attacks

and Privacy Preserving techniques with regulation

and data protection efforts needs to ramp up. Above

all, it is important that the uncertainty associated with

adversarial attacks is surmounted by governmental

and technical standards, which will be developed in

the future.

REFERENCES

Al-Rubaie, M., & Chang, J. M. (2019). Privacy-Preserving

Machine Learning: Threats and Solutions. IEEE

Security & Privacy Magazine, 17(2), 49–58.

https://doi.org/10.1109/MSEC.2018.2888775

Article 29 Working Party. (2007). WP 136: Opinion 4/2007

on the concept of personal data.

Article 29 Working Party. (2014a). Opinion 05/2014 on

Anonymisation Techniques WP216.

Article 29 Working Party. (2014b). WP 217: Opinion

06/2014 on the notion of legitimate interests of the data

controller under Article 7 of Directive 95/46/EC,

Brasher, E. A. (2018). Addressing the Failure of

Anonymization: Guidance from the European Union's

General Data Protection Regulation. Columbia

Business Law Review, 2018, 209. https://heinonline.

org/HOL/Page?handle=hein.journals/colb2018&id=21

5&div=&collection=

Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., & Song, D. The

secret sharer: evaluating and testing unintended

memorization in neural networks. In Proceedings of the

28th USENIX Conference on Security Symposium

(SEC’19). USENIX Association.

Case C-582/14, Patrick Breyer v Bundesrepublik

Deutschland. (2016). https://eur-lex.europa.eu/legal-

content/EN/TXT/?uri=CELEX%3A62014CN0582

ENISA European Union Agency for Network and

Information Security (2019). Guidelines-on-shaping-

technology-according-to-GDPR-provisions.

https://www.ledecodeur.ch/wp-

content/uploads/2019/12/Guidelines-on-shaping-

technology-according-to-GDPR-provisions.pdf

Fredrikson, M., Jha, S., & Ristenpart, T. (2015). Model

Inversion Attacks that Exploit Confidence Information

and Basic Countermeasures. In I. Ray, N. Li, & C.

Kruegel (Eds.), Proceedings of the 22nd ACM SIGSAC

Conference on Computer and Communications

Security - CCS '15 (pp. 1322–1333). ACM Press.

https://doi.org/10.1145/2810103.2813677

Fredrikson, M., Lantz, E., Jha, S., Lin, S., Page, D., &

Ristenpart, T. (2014). Privacy in Pharmacogenetics: An

End-to-End Case Study of Personalized Warfarin

Dosing. In K. Fu (Ed.), 23rd USENIX Security

Symposium: August 20 - 22, 2014, San Diego, CA (pp.

17–32). USENIX Association.

Gambs, S., Ladouceur, F., Laurent, A., & Roy-Gaumond,

A. (2021). Growing synthetic data through

differentially-private vine copulas. Proceedings on

Privacy Enhancing Technologies, 2021(3), 122–141.

https://doi.org/10.2478/popets-2021-0040

Groos, D., & van Veen, E. (2020). Anonymised Data and

the Rule of Law. European Data Protection Law

Review, 6(4), 498–508. https://doi.org/10.21552/

edpl/2020/4/6

Hayes, J., Melis, L., Danezis, G., & Cristofaro, E. D. (2019).

LOGAN: Membership Inference Attacks Against

Generative Models. Proceedings on Privacy Enhancing

Technologies, 2019(1), 133–152. https://doi.org/10.

2478/popets-2019-0008

Hilprecht, B., Härterich, M., & Bernau, D. (2019). Monte

Carlo and Reconstruction Membership Inference

Attacks against Generative Models.

Proceedings on

Privacy Enhancing Technologies, 2019(4), 232–249.

https://doi.org/10.2478/popets-2019-0067

Hu, R., Stalla-Bourdillon, S., Yang, M., Schiavo, V., &

Sassone, V. (2017). Bridging Policy, Regulation, and

Practice? A Techno-Legal Analysis of Three Types of

Data in the GDPR.

Jain, P., Kulkarni, V., Thakurta, A., & Williams, O. (2015,

March 6). To Drop or Not to Drop: Robustness,

Consistency and Differential Privacy Properties of

Dropout. https://arxiv.org/pdf/1503.02031

Jayaraman, B., & Evans, D. (2019). Proceedings of the 28th

USENIX Security Symposium: August 14-16, 2019,

Santa Clara, CA, USA. USENIX Association.

https://atc.usenix.org/system/files/sec19-

jayaraman.pdf

Jia, J., Salem, A., Backes, M., Zhang, Y., & Gong, N. Z.

(2019). Memguard. In L. Cavallaro (Ed.), ACM Digital

Library, Proceedings of the 2019 ACM SIGSAC

Conference on Computer and Communications

Security (pp. 259–274). Association for Computing

Machinery. https://doi.org/10.1145/3319535.3363201

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., &

Guo, S. (2016). Protection of Big Data Privacy. IEEE

Access, 4, 1821–1834. https://doi.org/10.1109/

ACCESS.2016.2558446

Milad Nasr, Reza Shokri, & Amir Houmansadr (2018).

Machine Learning with Membership Privacy using

Adversarial Regularization. In ACM Conference on

Computer and Communications Security (CCS).

https://www.researchgate.net/publication/326782376_

Machine_Learning_with_Membership_Privacy_using

_Adversarial_Regularization

Mourby, M., Mackey, E., Elliot, M., Gowans, H., Wallace,

S. E., Bell, J., Smith, H., Aidinlis, S., & Kaye, J. (2018).

Are ‘pseudonymised’ data always personal data?

Implications of the GDPR for administrative data

research in the UK. Computer Law & Security Review,

34(2), 222–233. https://doi.org/10.1016/j.clsr.2018.0

1.002

The Concept of Identiﬁability in ML Models

221

Mukherjee, S., Xu, Y., Trivedi, A., Patowary, N., & Ferres,

J. L. (2021). privGAN: Protecting GANs from

membership inference attacks at low cost to utility.

Proceedings on Privacy Enhancing Technologies,

2021(3), 142–163. https://doi.org/10.2478/popets-

2021-0041

Nasr, M., Shokri, R., & Houmansadr, A. (2019).

Comprehensive Privacy Analysis of Deep Learning:

Passive and Active White-box Inference Attacks

against Centralized and Federated Learning. In 2019

IEEE Symposium on Security and Privacy: Sp 2019 :

San Francisco, California, USA, 19-23 May 2019 :

Proceedings (pp. 739–753). IEEE.

https://doi.org/10.1109/SP.2019.00065

Piras, L., Al-Obeidallah, M. G., Praitano, A., Tsohou, A.,

Mouratidis, H., Gallego-Nicasio Crespo, B., Bernard, J.

B., Fiorani, M., Magkos, E., Sanz, A. C., Pavlidis, M.,

D’Addario, R., & Zorzino, G. G. (2019). DEFeND

Architecture: A Privacy by Design Platform for GDPR

Compliance. In S. Gritzalis, E. R. Weippl, S. K.

Katsikas, G. Anderst-Kotsis, A. M. Tjoa, & I. Khalil

(Eds.), Trust, Privacy and Security in Digital Business

(pp. 78–93). Springer International Publishing.

Pomares-Quimbaya, A., Sierra-Múnera, A., Mendoza-

Mendoza, J., Malaver-Moreno, J., Carvajal, H., &

Moncayo, V. (532019). Anonylitics: From a Small Data

to a Big Data Anonymization System for Analytical

Projects. In Proceedings of the 21st International

Conference on Enterprise Information Systems (pp. 61–

71). SCITEPRESS - Science and Technology

Publications. https://doi.org/10.5220/00076852006100

Pyrgelis, A., Troncoso, C., & Cristofaro, E. D. Knock

Knock, Who's There? Membership Inference on

Aggregate Location Data. http://arxiv.org/pdf/1708.

06145v2

Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017,

May 22–26). Membership Inference Attacks Against

Machine Learning Models. In 2017 IEEE Symposium

on Security and Privacy (SP) (pp. 3–18). IEEE.

https://doi.org/10.1109/SP.2017.41

Song, C., Ristenpart, T., Shmatikov, V., & 2017. Machine

Learning Models that Remember Too Much. In

Thuraisingham, Evans et al. (Hg.) 2017 – Proceedings

of the 2017 ACM (pp. 587–601). https://doi.org/10.11

45/3133956.3134077

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., &

Salakhutdinov, R. (2014). Dropout: A Simple Way to

Prevent Neural Networks from Overfitting. Journal of

Machine Learning Research. https://dl.acm.org/doi/

pdf/10.5555/2627435.2670313

Stadler, T., Oprisanu, B., & Troncoso, C. (2020a,

November 13). Synthetic Data -- Anonymisation

Groundhog Day. https://arxiv.org/pdf/2011.07018.pdf

Stadler, T., Oprisanu, B., & Troncoso, C. (2020b,

November 13). Synthetic Data -- Anonymisation

Groundhog Day. https://arxiv.org/pdf/2011.07018

Stalla-Bourdillon, S., & Knight, A. (2017). Anonymous

Data v. Personal Data — A False Debate: An EU

Perspective on Anonymization, Pseudonymization and

Personal Data. Wisconsin International Law Journal,

34(2), 285–322. http://hosted.law.wisc.edu/wordpress/

wilj/files/2017/12/Stalla-Bourdillon_Final.pdf

Yang, Z., Shao, B., Xuan, B., Chang, E.C., & Zhang, F.

(2020, May 8). Defending Model Inversion and

Membership Inference Attacks via Prediction

Purification. https://arxiv.org/pdf/2005.03915v1.pdf

Yeom, S., Giacomelli, I., Fredrikson, M., & Jha, S. (2017,

September 5). Privacy Risk in Machine Learning:

Analyzing the Connection to Overfitting.

https://arxiv.org/pdf/1709.01604

IoTBDS 2022 - 7th International Conference on Internet of Things, Big Data and Security

222