Security Evaluation of Decision Tree Meets Data Anonymization
Ryousuke Wakabayashi 1, Lihua Wang 2, Ryo Nojima 3,2 and Atsushi Waseda 1
1 Department of Informatics, Tokyo University of Information Sciences, 4-1 Onaridai, Wakaba-ku, Chiba 265-8501, Japan
2 Cybersecurity Research Institute, National Institute of Information and Communications Technology, 4-2-1 Nukui-Kitamachi, Koganei, Tokyo 184-8795, Japan
3 College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu, Shiga 525-8577, Japan
Keywords: Privacy Preserving, Decision Tree, k-Anonymity.
Abstract: This paper focuses on the relationship between decision trees, a typical machine learning method, and data anonymization. We first demonstrate that the information leakage from trained decision trees can be evaluated using well-studied data anonymization techniques. We then show that decision trees can be strengthened against specific attacks using data anonymization techniques. Specifically, we propose two decision tree pruning methods to improve security against uniqueness and homogeneity attacks, and we evaluate the accuracy of these methods experimentally.
1 INTRODUCTION
Recently, with the rapid evolution of machine learning technology and the expansion of data driven by developments in information technology, it has become increasingly important for companies to determine how best to utilize big data effectively and efficiently. However, big data often includes personal and private information; thus, careless utilization of such sensitive information may lead to unexpected penalties.
Many privacy-preserving technologies, e.g., data anonymization and secure computation, have been proposed to utilize data while preserving user privacy. However, such technologies inherently involve a trade-off between security and usability.
Therefore, in this paper, we evaluate the privacy leakage of decision trees, a fundamental machine learning method, trained on data containing personal information. In particular, we examine the extent to which personal information is leaked from the trained model from a data anonymization perspective. Historically, data anonymization research has progressed from pseudonymization to
k-anonymity (Sweeney, 2002), l-diversity (Machanavajjhala et al., 2006), t-closeness (Li et al., 2007), and so on. Currently, researchers are focusing on membership privacy and differential privacy (Stadler et al., 2022).
In this paper, we first discuss the common structure shared by decision trees and data anonymization. We then demonstrate that attacks previously proposed against anonymization can also be applied to decision trees. Specifically, we demonstrate that
(1) the uniqueness attack against anonymization
(pseudonymization),
(2) the homogeneity attack and background knowl-
edge attack against k-anonymity,
which are representative attacks, can also be applied
to decision trees. In addition, we discuss:
(3) how to prevent privacy information leakage from
a learned decision tree using data anonymization
techniques.
Specifically, we employ k-anonymity as a means
to enhance the security of decision trees. It is note-
worthy that similar methods have been proposed in
previous studies. For example, Slijepcevic et al. pro-
vided a systematic comparison and detailed investi-
gation into the effects of k-anonymized data on the
results of machine learning models. However, they
did not investigate the impact of k-anonymization on
Table 1: Dataset.
Zip Age Nationality Disease
13053 28 Russian Heart
13068 29 American Heart
13068 21 Japanese Flu
13053 23 American Flu
14853 50 Indian Cancer
14853 55 Russian Heart
14850 47 American Flu
14850 59 American Flu
13053 31 American Cancer
13053 37 Indian Cancer
13068 36 Japanese Cancer
13068 32 American Cancer
Table 2: k-anonymity (k = 4).
Zip Age Nationality Disease
130** < 30 * Heart
130** < 30 * Heart
130** < 30 * Flu
130** < 30 * Flu
1485* > 40 * Cancer
1485* > 40 * Heart
1485* > 40 * Flu
1485* > 40 * Flu
130** 30-40 * Cancer
130** 30-40 * Cancer
130** 30-40 * Cancer
130** 30-40 * Cancer
trained decision trees (Slijepcevic et al., 2021). No-
jima and Wang proposed a method that employs k-
anonymity to enhance randomized decision trees, re-
sulting in satisfactory levels of differential privacy.
The advantage of this proposed method lies in its abil-
ity to achieve differential privacy without introducing
Laplace noise (Nojima and Wang, 2023). Their work
differs significantly from existing differentially pri-
vate decision tree protocols (Friedman and Schuster,
2010; Jagannathan et al., 2012; Bai et al., 2017) which
require adding noise to the tree model, although a lim-
itation of their approach is the need for multiple trees.
These results suggest that there is a deep relationship between data anonymization and decision trees, and that investigating anonymization techniques, including k-anonymity, is beneficial for analyzing and improving the privacy protection mechanisms of decision trees.
The remainder of this paper is organized as fol-
lows. Section 2 introduces relevant preliminary in-
formation, e.g., anonymization methods and decision
trees. Section 3 demonstrates how to convert at-
tack methods against data anonymization into attacks
against decision trees. In addition, relevant experi-
mental results of these attacks are also discussed in
Section 3. In Section 4, we demonstrate how much se-
curity and accuracy can be practically realized when
the decision tree is strengthened using a method that
is similar to k-anonymity. Finally, the paper is con-
cluded in Section 5, including a brief discussion of
potential future issues.
2 PRELIMINARIES
2.1 Data Anonymization
When providing personal data to a third party, it is
necessary to modify the data to preserve user privacy.
Here, modifying the user’s data (i.e., a record) such
that an adversary cannot re-identify a specific indi-
vidual is referred to as data anonymization. As a ba-
sic technique, to prevent re-identification, an identi-
fier, e.g., a name or employee number, is deleted or
the data holder replaces it with a pseudonym ID. This
process is referred to as pseudonymization. However,
simply modifying identifiers does not imply privacy
preservation. In some cases, individuals can be re-
identified by a combination of features (i.e., a quasi-
identifier); thus, it is necessary to modify both the
identifier and the quasi-identifier to reduce the risk
of re-identification. In most cases, the identifiers themselves are not used for data analysis; thus, removing identifiers does not significantly degrade the quality of the dataset. However, if we modify quasi-identifiers in the same manner, although the data may become anonymous, it will also become useless. A
typical anonymization technique for quasi-identifiers
is to “roughen” the numerical values.
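As a minimal illustration (assuming pandas; the toy records and column names follow Table 1 and are not the paper's code), the following sketch roughens the quasi-identifiers and checks whether the resulting groups satisfy k-anonymity:

```python
import pandas as pd

# Toy records modeled on the first eight rows of Table 1
# (quasi-identifiers: Zip, Age, Nationality; sensitive value: Disease).
df = pd.DataFrame({
    "Zip": [13053, 13068, 13068, 13053, 14853, 14853, 14850, 14850],
    "Age": [28, 29, 21, 23, 50, 55, 47, 59],
    "Nationality": ["Russian", "American", "Japanese", "American",
                    "Indian", "Russian", "American", "American"],
    "Disease": ["Heart", "Heart", "Flu", "Flu", "Cancer", "Heart", "Flu", "Flu"],
})

def generalize(df):
    """'Roughen' the quasi-identifiers in the spirit of Table 2."""
    out = df.copy()
    out["Zip"] = out["Zip"].astype(str).str[:3] + "**"          # 13053 -> "130**"
    out["Age"] = pd.cut(out["Age"], bins=[0, 30, 40, 120],
                        labels=["<30", "30-40", ">40"])          # coarse age bands
    out["Nationality"] = "*"                                     # suppress entirely
    return out

anon = generalize(df)
# k-anonymity check: every quasi-identifier combination must occur at least k times.
k = 4
group_sizes = anon.groupby(["Zip", "Age", "Nationality"], observed=True).size()
print(anon)
print("smallest group:", group_sizes.min(),
      "-> k-anonymous for k =", k, ":", bool(group_sizes.min() >= k))
```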
2.2 Attacks Against Data Anonymization
2.2.1 Attacks Against Pseudonymization
A simple attack is possible against pseudonymized
data from which identifiers, e.g., names, have been
removed. In this attack, the attacker uses the quasi-identifier of a user u. If this attacker obtains the pseudonymized data, then, by searching for user u's quasi-
Figure 1: Homogeneity attack.
Figure 2: Background knowledge attack.
identifier in the pseudonymized data, the attacker can obtain sensitive information about u. For example, if the attacker obtains the dataset shown in Table 1 and knows that u's ZIP code is 13068, age is 29, and nationality is American, then, by searching the dataset, the attacker can identify that user u is suffering from some heart-related disease. This attack is referred to as the uniqueness attack.
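A minimal sketch of this attack (assuming pandas and the Table 1 layout; the lookup values are the example from the text) is:

```python
import pandas as pd

# Pseudonymized data: identifiers removed, quasi-identifiers and sensitive value kept.
records = pd.DataFrame({
    "Zip": [13053, 13068, 13068, 13053],
    "Age": [28, 29, 21, 23],
    "Nationality": ["Russian", "American", "Japanese", "American"],
    "Disease": ["Heart", "Heart", "Flu", "Flu"],
})

# Background knowledge about user u (quasi-identifier only).
match = records[(records["Zip"] == 13068) &
                (records["Age"] == 29) &
                (records["Nationality"] == "American")]

if len(match) == 1:
    # A single match means u is uniquely re-identified and the sensitive value leaks.
    print("uniqueness attack succeeds:", match["Disease"].iloc[0])
```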
k-Anonymity: k-anonymity is a countermeasure against the uniqueness attack. In k-anonymity, features are divided into quasi-identifiers and sensitive information, and the quasi-identifiers are modified such that every record shares its quasi-identifier values with at least k − 1 other records. Table 2 shows data that has been k-anonymized (k = 4) over the quasi-identifiers, i.e., ZIP code, age, and nationality.
2.2.2 Homogeneity Attack
At a cursory glance, k-anonymity appears to be secure; however, even if k-anonymity is employed, a homogeneity attack is still feasible. This attack becomes possible when all the sensitive values in a group sharing the same quasi-identifier are identical. Consider the k-anonymized dataset shown on the right side of Figure 1 and the attacker shown on the left side of Figure 1. Here, the attacker knows that (Zip, Age) = (13053, 35) for Bob; since all sensitive values in the matching records are cancer, it can be revealed that Bob has cancer.
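The check behind this attack is mechanical; a small sketch (again assuming pandas, with a hand-built group mimicking the 30-40 block of Table 2) is:

```python
import pandas as pd

# k-anonymized release in the style of Table 2 (quasi-identifiers already generalized).
release = pd.DataFrame({
    "Zip": ["130**"] * 4, "Age": ["30-40"] * 4, "Nationality": ["*"] * 4,
    "Disease": ["Cancer", "Cancer", "Cancer", "Cancer"],
})

# The attacker knows Bob's quasi-identifier falls into the group (130**, 30-40, *).
group = release[(release["Zip"] == "130**") & (release["Age"] == "30-40")]
if group["Disease"].nunique() == 1:
    # Every record in Bob's group carries the same sensitive value, so it leaks.
    print("homogeneity attack succeeds:", group["Disease"].iloc[0])
```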
Figure 3: Decision tree.
2.2.3 Background Knowledge Attack
Homogeneity attacks highlight a problem that arises when records with the same quasi-identifier have the same sensitive information; however, a previous study (Li et al., 2007) also argued that there is a problem even in cases where the sensitive values are not all the same. The k-anonymized dataset on the right side of Figure 2 shows four records with quasi-identifiers (130**, < 30, *) and two types of sensitive information, i.e., (Heart, Flu). Here, assume that the attacker has background knowledge similar to that shown on the left side of Figure 2. In this case, both Heart and Flu are certainly possible; however, if the probability of a Japanese person experiencing heart disease is extremely low, Umeko's disease is estimated to be flu. Thus, it must be acknowledged that k-anonymity does not provide a high degree of security.
2.3 Decision Trees
Decision trees are supervised learning methods that are primarily used for classification tasks; a tree structure is created while learning from data (Figure 3). When predicting the label y of x, the process begins at the root of the tree, and the corresponding leaf is searched for while referring to each feature of x. Finally, through this referral process, y is predicted. The label output by a leaf is determined by
the dataset D used to generate the tree structure. In other words, after the tree structure is created, for each element $(x_i, y_i)$ in dataset D, the corresponding leaf is found and the value of $y_i$ is stored. If $y_i \in Y = \{0, 1\}$, then, in each leaf $\ell$, the number of $y_i$ that were 0 and the number of $y_i$ that were 1 are preserved. More precisely, $[\ell_0, \ell_1]$ is preserved for each leaf $\ell$, where $\ell_0$ and $\ell_1$ represent the numbers of data with label y equal to 0 and 1, respectively. Table 3 shows the notations used in the paper.
For the prediction given x, we first search for the corresponding leaf $\ell$, and x may be judged as 1 if $\ell_1 / (\ell_0 + \ell_1) > 1/2$, and 0 otherwise. Here, the threshold 1/2 can be set flexibly depending on where the decision
Table 3: Notations.
k          Anonymization parameter
D          Dataset $\{(x_i, y_i)\}$
x          Data $(x_1, \ldots, x_f)$ with f features
y          Label of data x
Y          Label space; $Y = \{0, 1\}$ in this paper
$\ell$     Leaf or leaves
$\ell_i$   The number of data classified to leaf $\ell$ with label $i \in Y$
$n_\ell$   The number of data classified to leaf $\ell$, i.e., $n_\ell = \ell_0 + \ell_1$
s          The pruning threshold, which is set to k − 1 in the experiments
$N_u$      Total number of users who can be identified by a homogeneity attack
$N_\ell$   Number of leaves on which a homogeneity attack can be performed
tree is applied, and when providing the learned decision tree to a third party, it is possible to pass $\ell_0$ and $\ell_1$ together for each leaf $\ell$. In this paper, we consider the security of decision trees in such situations.
Generally, the deeper the tree structure, the more likely it is to overfit; thus, we frequently prune the tree, and this technique is employed to realize privacy preservation in this paper.
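To make the leakage concrete, the following sketch (assuming scikit-learn and synthetic data; an illustration, not the paper's experimental code) reads the per-leaf pair $[\ell_0, \ell_1]$ back out of a trained DecisionTreeClassifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-labelled data standing in for a dataset with personal information.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

tree = clf.tree_
leaves = np.where(tree.children_left == -1)[0]       # nodes without children are leaves

for leaf in leaves:
    n_leaf = int(tree.n_node_samples[leaf])           # n_l = l0 + l1
    stats = tree.value[leaf, 0]                        # per-class statistics stored in the model
    # Depending on the scikit-learn version, `value` holds raw class counts or class
    # fractions; normalizing and rescaling by n_node_samples recovers (l0, l1) either way.
    l0, l1 = (int(round(v)) for v in stats / stats.sum() * n_leaf)
    print(f"leaf {leaf}: l0={l0}, l1={l1}, n={n_leaf}")
```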
3 SECURITY OF DECISION
TREES FROM DATA
ANONYMIZATION
PERSPECTIVE
3.1 Security Analysis
Generally, a decision tree is constructed from a given
dataset; however, we show that it is also possible
to partially reconstruct the dataset from the decision
tree. Table 4 shows an example of re-constructing a
dataset from the decision tree shown in Figure 3. As
can be seen, it is impossible to reconstruct the original
data completely from a binary tree model; however, it
is possible to extract some of the data. By exploiting
this essential property, it is possible to mount some at-
tacks against reconstructed data, as discussed in Sec-
tion 2. In the following, using Table 4 as an example,
we discuss specific cases of how each attack can be
applied.
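Before walking through the individual attacks, the following sketch (assuming scikit-learn; the toy features, names, and values are invented for illustration) shows how a Table-4-style generalized dataset can be read out of a trained tree by following each root-to-leaf path:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny toy dataset in the spirit of Table 4: features are (height, weight, age) and
# the label is a binary health flag. All values are invented for illustration.
X = np.array([[165, 55, 40], [168, 58, 35], [160, 50, 20], [166, 70, 45],
              [169, 75, 50], [180, 80, 12], [185, 90, 30], [175, 85, 40],
              [178, 95, 25]], dtype=float)
y = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0])
feature_names = ["height", "weight", "age"]

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
tree = clf.tree_

def leaf_rules(node=0, conditions=()):
    """Collect, for each leaf, the generalized 'record' (rule) that reaches it."""
    if tree.children_left[node] == -1:                 # a leaf: emit its rule and statistics
        yield list(conditions), tree.value[node, 0], int(tree.n_node_samples[node])
        return
    name, thr = feature_names[tree.feature[node]], tree.threshold[node]
    yield from leaf_rules(tree.children_left[node], conditions + (f"{name} <= {thr:.1f}",))
    yield from leaf_rules(tree.children_right[node], conditions + (f"{name} > {thr:.1f}",))

# Each printed line is one generalized record in the style of Table 4,
# together with the per-class statistics stored at that leaf.
for conds, stats, n in leaf_rules():
    print(" AND ".join(conds), "->", stats, f"(n={n})")
```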
Uniqueness Attack: In the dataset (Table 4) recovered from the model, there is one user whose height is at least 170 and who is under 15 years of age (the sixth row); thus, it is possible to perform a uniqueness attack against this user.
Table 4: Example of conversion from decision tree to anonymized data.

Height   Weight   Age    Health
< 170    < 60     *      yes
< 170    < 60     *      yes
< 170    < 60     *      no
< 170    ≥ 60     *      no
< 170    ≥ 60     *      no
≥ 170    *        < 15   yes
≥ 170    *        ≥ 15   yes
≥ 170    *        ≥ 15   no
≥ 170    *        ≥ 15   no
Note that pruning decision trees can be an effec-
tive mechanism to prevent uniqueness attacks.
Homogeneity Attack: Similarly, in the fourth and fifth rows, height < 170, weight ≥ 60, and the health status is the same (i.e., "unhealthy"); thus, a homogeneity attack is possible.
Background Knowledge Attack: Similarly, the seventh, eighth, and ninth rows contain three users whose data satisfy both height ≥ 170 and age ≥ 15. Among these users, one is healthy (yes) and two are unhealthy (no). As an attacker, we can consider the following:
(Background knowledge of user A) Height: 173, Age: 33, Healthy.
(Background knowledge of target user B) Height: 171, Age: 19.
In this case, if the adversary knows that user A is healthy, he/she can identify that user B is unhealthy.
3.2 Experiments
In this study, we used three datasets to evaluate the vulnerability of decision trees against uniqueness and homogeneity attacks, i.e., the Nursery dataset (Rajkovic, 1997), the Loan dataset (Mahdi Navaei), and the Adult dataset (Becker and Kohavi, 1996). In these experiments, we used Python 3 and the scikit-learn library to train the decision trees. The characteristics of each dataset are described as follows:
Nursery Dataset: The Nursery dataset contains
12,960 records with 8 features, with a maximum
of five values for each feature.
Loan Dataset: The Loan dataset contains 5,000
records with 12 features. Each feature has many
possible values, and the number of records is
small.
Adult Dataset: The Adult dataset contains
48,842 records with 14 features. Here, each fea-
Table 5: Number of leaves for which a uniqueness attack is
possible.
Tree Depth Nursery Loan Adult
3 0 0 0
4 0 0 0.5
5 0 0 1.8
6 0 3.7 5.1
7 0 6 11
Table 6: Number of leaves ($N_\ell$) on which a homogeneity attack can be performed and total number of users ($N_u$) who can be identified by a homogeneity attack.

              Nursery            Loan               Adult
Tree Depth    $N_\ell$   $N_u$   $N_\ell$   $N_u$   $N_\ell$   $N_u$
3             1          3448    3.8        3653    0.3        0.7
4             1          3448    4.8        3303    2.4        210
5             3          5057    7.9        3729    7.3        830
6             11.2       6822    16.7       3746    19         1623
7             24         7863    27         3837    34.4       1918
ture has more possible values than in the Nursery and Loan datasets, and there are more records.
3.2.1 Uniqueness Attack Experiment
In this experiment, the tree depths were set to 3, 4,
5, 6, and 7. We divided each dataset into a training
set and an evaluation set. The training set, which was
used to train the decision tree, contained 80% of the
records in the dataset. Here, the decision tree was
trained 10 times and the average was taken. The num-
ber of leaves for which a uniqueness attack is possi-
ble for each dataset is shown in Table 5. On the Adult
dataset, there are cases where an individual can be
identified by taking only four features. Thus, it is pos-
sible to perform a uniqueness attack from the trained
decision tree. In other words, the risk of information
leakage is possible. In addition, the Nursery dataset
has a small number of value types for each feature;
thus, the risk of uniqueness attacks is low.
3.2.2 Homogeneity Attack Experiments
Table 6 shows the results of the homogeneity attack experiments. As in the previous experiment, here, we
set the tree depth to 3, 4, 5, 6, and 7, and we divided
each dataset into a training set (80%) and an evalua-
tion set. The decision tree was trained 10 times, and
averages of the following numbers were computed.
Number of leaves $\ell$ such that $(\ell_0, \ell_1) = (0, z)$ or $(z, 0)$, where $z > 0$, and
Number of users who can be identified by a homogeneity attack.
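A sketch of how these quantities can be computed is given below (assuming scikit-learn; synthetic data stands in for the Nursery, Loan, and Adult datasets, and $\ell_0$, $\ell_1$ are recomputed from the training records via clf.apply rather than read from the model). It is an illustration under these assumptions, not the code used for the experiments:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the paper uses the Nursery, Loan, and Adult datasets instead.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, _, y_train, _ = train_test_split(X, y, train_size=0.8, random_state=0)

clf = DecisionTreeClassifier(max_depth=7, random_state=0).fit(X_train, y_train)
leaf_of = clf.apply(X_train)                       # leaf index for every training record

uniq_leaves, homo_leaves, n_u = 0, 0, 0
for leaf in np.unique(leaf_of):
    labels = y_train[leaf_of == leaf]
    l0, l1 = int((labels == 0).sum()), int((labels == 1).sum())
    if l0 + l1 == 1:
        uniq_leaves += 1                           # uniqueness attack: a single user in the leaf
    if (l0 == 0) != (l1 == 0):                     # exactly one class present, i.e., (0, z) or (z, 0)
        homo_leaves += 1                           # homogeneity attack possible on this leaf
        n_u += l0 + l1                             # every user in the leaf is exposed

print(f"leaves open to uniqueness attack: {uniq_leaves}")
print(f"leaves open to homogeneity attack (N_leaf): {homo_leaves}, exposed users (N_u): {n_u}")
```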
Figure 4: Tree depth & number of users who can be identi-
fied by a homogeneous attack.
On all datasets, even if the tree depth is small, information can be leaked by a homogeneity attack. In addition, similar to the uniqueness attack, susceptibility to homogeneity attacks increases as the tree depth increases, as shown in Figure 4.
4 USING ANONYMIZATION TO
STRENGTHEN THE DECISION
TREE
In this section, we show that data anonymization techniques can strengthen decision trees.
4.1 Applying k-Anonymity
In a previous study (Nojima and Wang, 2023), k-anonymization was achieved by "removing leaves with a small number of users" for a randomized decision tree. Note that a similar method can be applied to the decision trees. Specifically, by setting s = k − 1 and pruning leaves such that $n_\ell = \ell_0 + \ell_1 \le s$, a method that is similar to k-anonymity can be realized. Two corresponding methods are illustrated in Figures 5 and 6. Here, after training, we modify the trained decision tree as follows:
Method 1 (Figure 5): Leaves that result in $n_\ell \le s$ are pruned.
Method 2 (Figure 6): For nodes with at least one child with $n_\ell \le s$, both children are pruned, and the parent node is made a leaf node.
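The following sketch simulates both methods at prediction time on a fitted scikit-learn tree rather than physically modifying the stored model. It is a sketch under the stated assumptions (in particular, returning no prediction when a query reaches a leaf pruned by Method 1), not the implementation used in the experiments:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def predict_pruned(clf, X_query, s, method=2):
    """Simulate pruning Methods 1 and 2 at prediction time on a fitted tree.

    Method 1: if the query reaches a leaf holding n_l <= s records, the leaf is
    treated as pruned and no prediction is returned (None).
    Method 2: if either child of the current node holds n_l <= s records, stop
    there and predict the majority label of that (parent) node.
    """
    tree = clf.tree_
    n = tree.n_node_samples                            # n_l for every node of the tree
    preds = []
    for x in np.asarray(X_query):
        node = 0
        while tree.children_left[node] != -1:          # descend until a leaf is reached
            left, right = tree.children_left[node], tree.children_right[node]
            if method == 2 and (n[left] <= s or n[right] <= s):
                break                                  # Method 2: the parent becomes a leaf
            node = left if x[tree.feature[node]] <= tree.threshold[node] else right
        if method == 1 and n[node] <= s:
            preds.append(None)                         # Method 1: pruned leaf, no prediction
        else:
            preds.append(int(np.argmax(tree.value[node, 0])))  # majority label at the node
    return preds

# Minimal usage with synthetic data standing in for the experimental datasets.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
clf = DecisionTreeClassifier(max_depth=7, random_state=0).fit(X, y)
print(predict_pruned(clf, X[:5], s=10, method=1))
print(predict_pruned(clf, X[:5], s=10, method=2))
```

Whether abstentions under Method 1 are counted as errors or excluded when measuring accuracy is a design choice; the sketch leaves that decision to the caller.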
4.2 Experiments
We conducted experiments to verify the difference
in accuracy and the possibility of attack for pruning
Methods 1 and 2.
Here, a decision tree was trained in the manner
described previously, and the impact on accuracy with
Figure 5: Decision tree pruning Method 1.
Figure 6: Decision tree pruning Method 2.
Figure 7: The pruning threshold s & accuracy.
a tree depth of 7 is shown in Table 7. For Method 1,
the impact on accuracy was large when s = 5 on the
Adult dataset. In addition, accuracy decreased as the
threshold value s increased.
In terms of Method 2, the influence on accuracy was small, and the number of users who can be identified by homogeneity attacks also decreased. We found that Method 2 exhibited better accuracy and was more effective against homogeneity attacks than Method 1, as shown in Figure 7. For Method 1, the records assigned to a pruned leaf are deleted and are not used when predicting labels. In contrast, for Method 2, they contribute to the predicted label at the parent node, which appears to explain the difference in accuracy observed in the experiments.
5 CONCLUSION
Contribution: In this paper, we evaluated the pos-
sibility of information leakage of data from trained
decision trees by discussing the relationship between
decision trees and data anonymization. We found that information leakage is possible via three distinct attacks, i.e., uniqueness, homogeneity, and background knowledge attacks. We verified that the risk associated with uniqueness attacks was high when the total number of feature combinations was large. In contrast, we found that the risk of homogeneity
Table 7: Anonymization experiments for the proposed methods. The tree depth is 7 and the pruning threshold (i.e., k − 1) is s = 5, 10, 20, 30, 40, and 50, where "original" represents the original decision tree without pruning. U and H denote uniqueness and homogeneity attacks, respectively; ACC denotes accuracy; $N_u$ and $N_\ell$ denote the numbers of users and leaves, respectively.

              Nursery                                    Loan                                       Adult
s (= k − 1)   U ($N_u$)  H ($N_u$)  H ($N_\ell$)  ACC    U ($N_u$)  H ($N_u$)  H ($N_\ell$)  ACC    U ($N_u$)  H ($N_u$)  H ($N_\ell$)  ACC
Experiment Result for Method 1
original 0 7863.9 24 0.9370 6 3837 27 0.9809 11 1918.9 34.4 0.8545
5 0 7863 24 0.9370 0 3803.5 11 0.9815 0 1881.5 14.7 0.8120
10 0 7863 24 0.9370 0 3784.5 8.5 0.9599 0 1854.3 11.4 0.8072
20 0 7863 24 0.9370 0 3752.3 6.1 0.9469 0 1811.5 8.6 0.7954
30 0 7861 23.9 0.9363 0 3743.7 5.8 0.9294 0 1750 6.1 0.7926
40 0 7777 21.7 0.9333 0 3711.8 4.9 0.9294 0 1735.9 5.7 0.7921
50 0 7696 19.8 0.9324 0 3684.5 4.3 0.9277 0 1693.7 4.8 0.7900
Experiment Result for Method 2
original 0 7863.9 24 0.9370 6.7 3838.3 27 0.9810 11 1918.9 34.4 0.8544
5 0 7863.9 24 0.9370 0 3763.9 7.8 0.9813 0 1269.1 10 0.8548
10 0 7860.2 23.9 0.9370 0 3648.2 5.5 0.9820 0 673.1 3.5 0.8542
20 0 7654 19.6 0.9373 0 3555.5 3.9 0.9809 0 673.1 3.5 0.8535
30 0 7601.5 18.5 0.9363 0 3169.5 3.1 0.9792 0 652 2.6 0.8535
40 0 7571.1 17.7 0.9355 0 3159.1 2.9 0.9791 0 676.1 2.2 0.8530
50 0 7528.5 17.3 0.9348 0 3108 2.7 0.9774 0 551.7 1.2 0.8527
attacks was high when the total number of feature combinations was small.
In addition, we presented two different decision tree pruning methods. We found that, when the number of samples in a leaf is less than some threshold s, it is better, in terms of both anonymization strength and accuracy, to turn the parent node of such a leaf into a leaf than to simply prune the leaf. We also confirmed that, although these methods reduce the number of records that can be attacked using homogeneity attacks to some degree, it is impossible to eliminate such records entirely.
Extensibility: The experimental results suggest
that the attacks against decision trees presented in this
paper can also be applied to extended decision tree
variants, e.g., randomized decision tree (Fan et al.,
2003). A randomized decision tree selects features
of x at random, creates multiple trees, and produces
a prediction result for each tree. This differs from a
conventional decision tree in that the prediction is de-
termined via a majority vote or the average value. The
randomized decision tree in the literature (Fan et al.,
2003) has a counter (UpdateStatistics). This in-
formation can be used to implement the attacks dis-
cussed in this paper. In addition, although a conventional decision tree consists of a single tree, a randomized decision tree utilizes multiple trees; thus, the attacks described in this paper may work even more effectively.
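As a rough illustration (not the construction of Fan et al.; scikit-learn's ExtraTreesClassifier is used here only as a stand-in for an ensemble of randomized trees), the following sketch shows that every member tree exposes the same kind of per-leaf statistics, so the single-tree attacks apply to each tree individually:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
forest = ExtraTreesClassifier(n_estimators=5, max_depth=6, random_state=0).fit(X, y)

for t, member in enumerate(forest.estimators_):
    tree = member.tree_
    leaves = np.where(tree.children_left == -1)[0]
    # Leaves with a single training record (uniqueness) or a single class (homogeneity).
    singletons = int((tree.n_node_samples[leaves] == 1).sum())
    pure = int(sum(1 for leaf in leaves if np.min(tree.value[leaf, 0]) == 0))
    print(f"tree {t}: {len(leaves)} leaves, {singletons} singleton, {pure} single-class")
```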
Future Research Direction: The results of this
paper demonstrate that a vulnerability similar to
that of anonymization is inherent in decision trees.
Historically, anonymization has progressed from
pseudonymization to k-anonymization (Sweeney,
2002), l-diversity (Machanavajjhala et al., 2006), and
t-closeness (Li et al., 2007). Currently, membership
privacy and differential privacy are attracting increas-
ing attention (Blum et al., 2005; Fletcher and Islam,
2017; Fletcher and Islam, 2019; Friedman and Schus-
ter, 2010; Nojima and Wang, 2023; Patil and Singh,
2014; Stadler et al., 2022); thus, decision trees that
satisfy differential privacy while maintaining suffi-
cient accuracy will be required in the future.
ACKNOWLEDGEMENTS
This work was supported in part by JST CREST Grant
Number JPMJCR21M1, and JSPS KAKENHI Grant
Number JP20K11826, Japan.
REFERENCES
Bai, X., Yao, J., Yuan, M., Deng, K., Xie, X., and Guan,
H. (2017). Embedding differential privacy in decision
tree algorithm with different depths. Sci. China Inf.
Sci., 60(8):082104:1–082104:15.
Becker, B. and Kohavi, R. (1996). Adult.
UCI Machine Learning Repository. DOI:
https://doi.org/10.24432/C5XW20.
Blum, A., Dwork, C., McSherry, F., and Nissim, K.
(2005). Practical privacy: the SuLQ framework. In
Li, C., editor, Proceedings of the Twenty-fourth ACM
SIGACT-SIGMOD-SIGART Symposium on Principles
of Database Systems, June 13-15, 2005, Baltimore,
Maryland, USA, pages 128–138. ACM.
Fan, W., Wang, H., Yu, P. S., and Ma, S. (2003). Is ran-
dom model better? on its accuracy and efficiency. In
Proceedings of the 3rd IEEE International Conference
on Data Mining (ICDM 2003), 19-22 December 2003,
Melbourne, Florida, USA, pages 51–58. IEEE Com-
puter Society.
Fletcher, S. and Islam, M. Z. (2017). Differentially private
random decision forests using smooth sensitivity. Ex-
pert Syst. Appl., 78:16–31.
Fletcher, S. and Islam, M. Z. (2019). Decision Tree Clas-
sification with Differential Privacy: A survey. ACM
Comput. Surv., 52(4):83:1–83:33.
Friedman, A. and Schuster, A. (2010). Data mining with
differential privacy. In Rao, B., Krishnapuram, B.,
Tomkins, A., and Yang, Q., editors, Proceedings of
the 16th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Washington,
DC, USA, July 25-28, 2010, pages 493–502. ACM.
Jagannathan, G., Pillaipakkamnatt, K., and Wright, R. N.
(2012). A practical differentially private random deci-
sion tree classifier. Trans. Data Priv., 5(1):273–295.
Li, N., Li, T., and Venkatasubramanian, S. (2007). t-
Closeness: Privacy beyond k-Anonymity and l-
Diversity. In Chirkova, R., Dogac, A., Özsu, M. T.,
and Sellis, T. K., editors, Proceedings of the 23rd In-
ternational Conference on Data Engineering, ICDE
2007, The Marmara Hotel, Istanbul, Turkey, April 15-
20, 2007, pages 106–115. IEEE Computer Society.
Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasub-
ramaniam, M. (2006). l-Diversity: Privacy Beyond
k-Anonymity. In Liu, L., Reuter, A., Whang, K., and
Zhang, J., editors, Proceedings of the 22nd Interna-
tional Conference on Data Engineering, ICDE 2006,
3-8 April 2006, Atlanta, GA, USA, page 24. IEEE
Computer Society.
Mahdi Navaei. Bank Personal Loan Modelling. https://www.kaggle.com/datasets/ngnnguynthkim/bank-personal-loan-modellingcsv.
Nojima, R. and Wang, L. (2023). Differential private (ran-
dom) decision tree without adding noise. In Luo, B.,
Cheng, L., Wu, Z., Li, H., and Li, C., editors, Neu-
ral Information Processing - 30th International Con-
ference, ICONIP 2023, Changsha, China, November
20-23, 2023, Proceedings, Part IX, volume 1963 of
Communications in Computer and Information Sci-
ence, pages 162–174. Springer.
Patil, A. and Singh, S. (2014). Differential private ran-
dom forest. In 2014 International Conference on Ad-
vances in Computing, Communications and Informat-
ics, ICACCI 2014, Delhi, India, September 24-27,
2014, pages 2623–2630. IEEE.
Rajkovic, V. (1997). Nursery. UCI Machine Learning
Repository. DOI: https://doi.org/10.24432/C5P88W.
Slijepcevic, D., Henzl, M., Klausner, L. D., Dam, T., Kiese-
berg, P., and Zeppelzauer, M. (2021). k-anonymity
in practice: How generalisation and suppression af-
fect machine learning classifiers. Comput. Secur.,
111:102488.
Stadler, T., Oprisanu, B., and Troncoso, C. (2022). Syn-
thetic Data - Anonymisation Groundhog day. In But-
ler, K. R. B. and Thomas, K., editors, 31st USENIX
Security Symposium, USENIX Security 2022, Boston,
MA, USA, August 10-12, 2022, pages 1451–1468.
USENIX Association.
Sweeney, L. (2002). k-anonymity: A model for protecting
privacy. Int. J. Uncertain. Fuzziness Knowl. Based
Syst., 10(5):557–570.