Optimal Noise Injection on Training Data: A Defense Against Membership Inference Attacks

Radia Kassa^{1,2}, Kamel Adi^{2} and Myria Bouhaddi^{2}

^{1} Laboratoire LITAN, École supérieure en Sciences et Technologies de l'Informatique et du Numérique, RN 75, Amizour 06300, Bejaia, Algeria
^{2} Computer Security Research Laboratory, University of Quebec in Outaouais, Gatineau, Quebec, Canada
Keywords:
Membership Inference Attacks, Data Privacy, Machine Learning, Defense Mechanism, Optimal Noise
Injection, Prediction Entropy, Black-Box Defense, Optimized Noise, Shapley Values.
Abstract:
Membership inference attacks (MIAs) present a serious risk to data privacy in machine learning (ML) models,
as they allow attackers to determine whether a given data point was included in the training set. Although
various defenses exist, they often struggle to effectively balance privacy and utility. To address this challenge,
we propose in this paper a novel defense mechanism based on Optimal Noise Injection during the training
phase. Our approach involves injecting a carefully designed and controlled noise vector into each training
sample. This optimization maximizes prediction entropy to obscure membership signals while leveraging
Shapley values to preserve data utility. Experiments on benchmark datasets show that our method reduces
MIA success rates significantly without sacrificing accuracy, offering a strong privacy-utility trade-off for
black-box scenarios.
1 INTRODUCTION
Deep learning has significantly transformed the field
of artificial intelligence, achieving remarkable per-
formance in tasks such as autonomous driving, natu-
ral language processing, and medical diagnostics. Its
ability to automatically extract complex features from
large datasets has made it essential for sensitive appli-
cations in areas such as edge computing, finance, and
healthcare. However, this success relies on access to
vast amounts of often sensitive or private data, expos-
ing these models to considerable privacy risks. In-
deed, many studies have shown that Machine Learn-
ing (ML) models can inadvertently memorize sensi-
tive information from training data, primarily due to
overfitting. This memorization makes models vulner-
able to sophisticated attacks aimed at extracting con-
fidential information. One of the most critical threats
is Membership Inference Attacks (MIAs), which en-
able adversaries to infer whether a specific data point
was included in a model’s training set by analyzing
subtle cues in its outputs, such as prediction vec-
tors. MIAs exploit the fact that models often pro-
duce overconfident predictions for training samples
compared to unseen data. This difference in confi-
dence creates a vulnerability that attackers can use to
distinguish between training set members and non-
members. Authors in (Shokri et al., 2017) were the
first to demonstrate the vulnerability of widely used
Machine Learning as a Service (MLaaS) platforms,
such as the Google Prediction API and Amazon ML,
to membership inference attacks (MIAs). Their work
highlighted how easily attackers could extract infor-
mation about training data using only model outputs.
Since then, more sophisticated variants of MIAs have
been proposed to target different model architectures
and data types. In response to these attacks, vari-
ous defense mechanisms have been developed, gener-
ally falling into two categories: provable defenses and
empirical defenses. Provable defenses are grounded
in formal privacy guarantees, most notably through
differential privacy (DP). Although such approaches
provide strong theoretical protection, they often lead
to significant accuracy degradation. Empirical de-
fenses, on the other hand, aim to protect privacy while
maintaining high model accuracy. These methods
focus on obfuscating the signals that MIAs exploit,
without providing formal privacy guarantees. Among
the commonly used strategies are regularization tech-
niques, masking of confidence scores, and knowledge
distillation. However, despite their apparent effec-
tiveness, these defenses have significant limitations
against sophisticated attacks and do not always en-
sure an effective trade-off between privacy and utility.
This trade-off highlights the urgent need for new de-
fense mechanisms capable of balancing privacy and
utility effectively without degrading model perfor-
mance.
To achieve this goal, we propose a novel defense
mechanism based on optimal noise injection. The
main idea is to inject a carefully designed and con-
trolled noise vector into each training sample during
the training phase. This noise is optimized through
the following mechanisms:
- Maximizing prediction entropy to minimize the confidence gap between member and non-member samples.
- Leveraging Shapley values to guide feature-adaptive noise injection.
- Enhancing local robustness to mitigate misclassifications of inputs near the decision boundary.
In contrast to existing methods that apply noise
uniformly, our defense strategy adaptively perturbs
inputs according to feature influence, minimizing per-
turbations on salient features to preserve accuracy,
while amplifying them on less critical features to in-
troduce targeted uncertainty.
We evaluated our defense on the Purchase100 and
Texas100 datasets, demonstrating a notable reduction
in black-box MIA success rates and achieving a su-
perior privacy-utility trade-off. These results con-
firm the effectiveness of optimal noise injection for
robust and practical protection in privacy-preserving
machine learning, while effectively addressing the
limitations of existing approaches.
In summary, the key contributions of this paper are
as follows.
- We propose a new defense mechanism based on optimal noise injection, which involves injecting a carefully designed and optimized noise vector into each training sample during the training phase.
- We show that our defense effectively mitigates black-box MIAs by confounding the attacker's inference classifier into a state of uncertainty, while still achieving a favorable trade-off between privacy and utility.
2 RELATED WORK
We provide a comprehensive overview of Member-
ship Inference Attacks and the corresponding defense
mechanisms.
2.1 Membership Inference Attacks
MIAs can target all types of ML model de-
ployments, including black-box scenarios, where the
attacker has access only to the target model’s predic-
tion vectors, without any knowledge of its internal pa-
rameters. (Shokri et al., 2017) introduced one of the
first black-box MIAs against ML models. Their ap-
proach involves building multiple shadow models to
mimic the behavior of the target model, followed by
training an attack model using the prediction vectors
generated by these shadow models. The attack model
is then used to infer whether a given input sample is
a member of the target model’s training set. (Salem
et al., 2018) later relaxed many of the assumptions
of (Shokri et al., 2017) and demonstrated that even
a single shadow model can be sufficient to mount an
effective MIA. Subsequently, a new class of metric-based MIAs was proposed. These attacks determine membership by com-
puting specific metrics (prediction correctness, pre-
diction entropy, prediction confidence, or loss) on the
model’s output for a given sample and comparing the
result to a predefined threshold. This approach was
first introduced by (Yeom et al., 2018) and further ex-
plored by (Salem et al., 2018) and (Song et al., 2019).
Later, (Song and Mittal, 2021) summarized and ex-
tended these works, and introduced a state-of-the-
art metric-based attack called the privacy risk score.
Other studies, such as (Choquette-Choo et al., 2021),
investigated variants of MIAs that rely solely on pre-
dicted labels, without access to confidence scores.
By leveraging input perturbation techniques, these at-
tacks exploit the observation that the predicted class
of a member instance is generally more resistant to
changes than that of a non-member, as member pre-
dictions tend to be more stable. Furthermore, Carlini
et al. (Carlini et al., 2022) introduced LiRA (Likeli-
hood Ratio Attack), which employs a likelihood ra-
tio test to compare a model’s outputs when a sample
is included in the training set versus when it is ex-
cluded. They also proposed evaluating attack effec-
tiveness using the True Positive Rate (TPR) at very
low False Positive Rates (FPR), a robust metric that
has since become a standard benchmark in member-
ship inference research.
2.2 Defenses Against MIAs
Several defense mechanisms have been proposed to
mitigate the risks of MIAs by obscuring statistical dif-
ferences between members and non-members. These
include differential privacy, confidence score mask-
ing, regularization, and knowledge distillation, each
offering a distinct privacy-utility trade-off.
Differential Privacy: involves adding noise dur-
ing training to prevent disclosure of member-specific
information. Some approaches implement DP by
adding noise to the model’s objective function (Wang
et al., 2017), while others apply it directly to gradi-
ents during optimization (Abadi et al., 2016; Pichap-
ati et al., 2019). Despite offering strong privacy guarantees, DP often results in a significant degradation of model accuracy.
Confidence Score Masking: is another technique
aimed at reducing the amount of information leaked
through a model’s predictions, thereby hindering
membership inference. One variant limits outputs to
the top-k predicted classes (Shokri et al., 2017) rather
than the complete confidence vector. Another strategy
perturbs the output probabilities by injecting noise to
mislead attackers. For instance, (Bouhaddi and Adi, 2023; Jia et al., 2019) add carefully crafted adversarial noise to the prediction vectors after the model's computation, so that accuracy is preserved. However,
(Song and Mittal, 2021) showed that even with such
output perturbations, models can remain vulnerable to
label-only attacks and metric-based attacks.
Regularization Techniques: aim to mitigate model
overfitting and improve generalization capabilities.
Many works (Leino and Fredrikson, 2020; Salem
et al., 2018; Shokri et al., 2017) have demonstrated
that overfitting is a key factor contributing to the ef-
fectiveness of MIAs. In this context, (Shokri et al.,
2017) highlighted that L1 and L2 regularization can effectively reduce the success rate of MIAs. Sub-
sequently, various regularization methods have been
explored to counter these attacks, including dropout
(Srivastava et al., 2014), model stacking (Salem et al.,
2018), and early stopping (Song and Mittal, 2021).
(Nasr et al., 2018) introduced Adversarial Regular-
ization, a min-max optimization approach that incor-
porates an adversarial regularizer into the loss func-
tion to jointly minimize prediction loss and maximize
membership privacy. Furthermore, (Li et al., 2021)
introduced a regularization method that combines
Mixup with Maximum Mean Discrepancy (MMD).
This technique interpolates between pairs of training
samples and aligns the prediction output distributions
of members and non-members using MMD.
Knowledge Distillation: is a technique that leverages
the output of a teacher model to train a student model,
which is then made public. The objective is to enable
the student model to achieve accuracy close to that of
the teacher model, without directly exposing sensitive
data. (Shejwalkar and Houmansadr, 2021) proposed
distillation for membership privacy, which uses pub-
lic unlabeled datasets to train a student model with
soft labels generated by a teacher model. However,
its effectiveness is limited by the availability of suit-
able public datasets. To overcome this, complemen-
tary knowledge distillation and pseudo complemen-
tary knowledge distillation (Zheng et al., 2021), along
with knowledge cross-distillation (Chourasia et al.,
2021), perform knowledge distillation directly using
private data. Additionally, two variants based on self-distillation have been introduced, namely SELENA (Tang et al., 2022) and SEDMA (Nakai et al., 2024), which enable a single model to improve its performance by
reusing its own predictions as pseudo-labels during
training.
Despite the progress made, current defenses
against MIAs still have limitations, and the privacy-
utility trade-off remains a major challenge. It is
therefore essential to develop more adaptive and
lightweight defenses capable of ensuring an optimal
privacy-utility trade-off.
3 PRELIMINARIES AND
PROBLEM FORMULATION
In this section, we present the key concepts together
with their respective notations used in this paper, and
then describe the threat model adopted for our study.
3.1 Preliminaries and Notation
Supervised ML. In this paper, we study supervised ML for classification tasks. We consider a classification model, denoted by $f : \mathbb{R}^d \to \mathbb{R}^k$, where $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ represents an input feature vector and $f(x) \in \mathbb{R}^k$ corresponds to a predicted probability distribution over the $k$ possible classes. The model, parameterized by $\theta$, is trained on a dataset $D_{tr} = \{(x^{(n)}, y^{(n)})\}_{n=1}^{|D_{tr}|}$, where $x^{(n)} \in \mathbb{R}^d$ denotes an input feature vector and $y^{(n)}$ is the corresponding ground truth label. The training objective is to minimize the average prediction loss over $D_{tr}$:

$\min_{\theta} \; \frac{1}{|D_{tr}|} \sum_{n=1}^{|D_{tr}|} \mathcal{L}\big(f_{\theta}(x^{(n)}), y^{(n)}\big) \quad (1)$

where $|D_{tr}|$ denotes the size of the training set and $\mathcal{L}$ is the prediction loss function (e.g., cross-entropy loss) measuring the discrepancy between the model's output and the ground truth labels. The model's output $f(x) \in \mathbb{R}^k$ satisfies $\sum_{j=1}^{k} f(x)_j = 1$, meaning it represents a valid probability distribution over the $k$ possible classes. The predicted label is then obtained as $\hat{y} = \arg\max_j f(x)_j$.
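To make the objective in Eq. (1) concrete, the following minimal PyTorch sketch performs one training epoch with the cross-entropy loss; the model, data loader, and optimizer are generic placeholders rather than the exact setup used in the paper.

```python
import torch.nn.functional as F

def train_epoch(model, loader, optimizer):
    """One pass over D_tr, minimizing the average prediction loss of Eq. (1)."""
    model.train()
    for x, y in loader:                    # x: (batch, d) features, y: (batch,) class indices
        optimizer.zero_grad()
        logits = model(x)                  # unnormalized scores over the k classes
        loss = F.cross_entropy(logits, y)  # L(f_theta(x), y), averaged over the batch
        loss.backward()
        optimizer.step()
```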
Prediction Entropy: measures the uncertainty of a model with respect to its predictions. Low entropy indicates that the model is highly confident, while high entropy implies that the model is uncertain.
Let $f(x) = (y_1, \ldots, y_k)$ be the model's prediction vector for a sample $x$. The prediction entropy $H(f(x))$ is then calculated as:

$H\big(f(x)\big) = -\sum_{j=1}^{k} y_j \log y_j \quad (2)$
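As a small illustration of Eq. (2), prediction entropy can be computed directly from a probability vector; this helper is a generic sketch, not code from the paper.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """H(f(x)) = -sum_j y_j log y_j for a probability vector f(x) (Eq. 2)."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(probs * np.log(probs + eps)))  # eps guards against log(0)

# A confident prediction has low entropy, a near-uniform one has high entropy.
print(prediction_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.17
print(prediction_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.39 (= log 4)
```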
Shapley Values: are used to assess the importance of each feature $x_i$ in a model's prediction by quantifying its average marginal contribution across all possible subsets of features. The Shapley value $\phi_i(x)$ of a feature $x_i$ is defined as:

$\phi_i(x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \Big( f(x_{S \cup \{i\}}) - f(x_S) \Big) \quad (3)$

where $N$ represents the set of all features in the model, defined as $N = \{1, 2, \ldots, d\}$. The subset $S$ is a selection of features that excludes $x_i$, denoted $S \subseteq N \setminus \{i\}$. The function $f(x_S)$ corresponds to the prediction of the model using only the features in $S$, while $f(x_{S \cup \{i\}})$ represents the prediction after including the feature $x_i$. The difference $f(x_{S \cup \{i\}}) - f(x_S)$ quantifies the marginal contribution of $x_i$ to the prediction of the model. Finally, the weighting factor $\frac{|S|!\,(|N|-|S|-1)!}{|N|!}$ ensures a fair evaluation of the contribution of each feature across all possible subsets.
A high Shapley value for $x_i$ indicates a strong influence on the prediction of the model, whereas a low value suggests a minor contribution.
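Computing Eq. (3) exactly requires evaluating the model on all $2^{d-1}$ feature subsets, which is infeasible for the high-dimensional inputs used later in our experiments. A common practical alternative is a Monte Carlo permutation estimate (also available in off-the-shelf libraries such as shap); the sketch below assumes a baseline vector is used to represent "removed" features, and all names are illustrative.

```python
import numpy as np

def shapley_values_mc(predict_fn, x, baseline, n_samples=200, seed=None):
    """Monte Carlo estimate of the Shapley values of Eq. (3) for one input x.

    predict_fn: maps a feature vector to a scalar model output
                (e.g., the probability of the predicted class).
    baseline:   reference vector used to 'remove' features
                (e.g., the training-set mean) -- a practical convention.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        z = baseline.copy()
        prev = predict_fn(z)
        for i in perm:              # add features one by one in random order
            z[i] = x[i]
            curr = predict_fn(z)
            phi[i] += curr - prev   # marginal contribution of feature i
            prev = curr
    return phi / n_samples
```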
3.2 Threat Model
Adversarial Capabilities. We consider an adversary
with black-box access to the target model f , meaning
they can query the model via a prediction API and
receive the corresponding probability vectors. The
adversary’s objective is to carry out an MIA
(Salem et al., 2018; Shokri et al., 2017) to determine
whether a given sample x was part of the model’s
training dataset. By issuing multiple queries, the ad-
versary gathers sufficient information to train a bi-
nary classifier, referred to as the "attack classifier" A,
which predicts the membership status of a sample us-
ing confidence scores. Formally, the attack classifier
A is defined as follows:
$A(x, f(x)) \in [0, 1] \quad (4)$
A takes as input a sample x and its prediction vec-
tor f (x), then outputs a probability indicating whether
x belongs to the training set. A value close to 0 sug-
gests that x is a non-member, whereas a value close to
1 indicates a high likelihood of membership.
Adversarial Knowledge. We assume that the adver-
sary does not have access to the internal details of the
target model, such as its parameters, weights, or the
full training dataset, except for a small subset of sam-
ples that can be leveraged to perform an MIA.
However, it is assumed that the adversary is aware
of the model architecture and the deployed defense
mechanism. These assumptions allow us to consider
powerful adversaries and evaluate the robustness of
our approach against advanced threats.
4 PROPOSED DEFENSE
MECHANISM
The proposed defense mechanism aims to mitigate
MIA by introducing a carefully controlled noise per-
turbation η into training samples during the training
phase. The goal is to obscure statistical patterns that
an attacker might exploit while ensuring that the mod-
ified samples remain useful for classification. This
requires balancing security and utility through well-
defined constraints that guide the noise injection pro-
cess. An overview of our defense mechanism is illus-
trated in Figure 1.
Figure 1: Overview of the Proposed Defense Mechanism
Against Black-Box Membership Inference Attacks.
The fundamental idea of our approach is to in-
troduce perturbations that increase the uncertainty in
the model’s predictions, thereby making it harder for
an attacker to infer membership information. At the
same time, the noise should be controlled so that it
does not significantly degrade the model’s classifica-
tion performance. To achieve this, we define two key
constraints: a security constraint, which ensures suf-
ficient uncertainty, and a utility constraint, which pre-
serves the correctness of the classification.
The security constraint ensures that noise in-
creases the entropy of the prediction of the model,
making it more difficult for an attacker to detect
the membership status. The intuition behind this is
that low-entropy predictions provide clear confidence
scores that can be exploited by an attacker, whereas
higher entropy introduces uncertainty, masking the
membership signal. We formalize this constraint as
follows.
$H\big(f(x+\eta)\big) \geq \tau \quad (5)$
where τ is a predefined entropy threshold that ensures
a minimum level of uncertainty in the predictions of
the model.
Next, to preserve utility, we impose a constraint
that ensures that the perturbed sample remains classi-
fied in the same category as the original. The intuition
here is that, while we want to introduce uncertainty
for an attacker, we must ensure that the model still
recognizes the sample correctly. This leads to the fol-
lowing constraint:
$\arg\max_j f(x+\eta)_j = \arg\max_j f(x)_j \quad (6)$
(6)
where argmax returns the argument that produces the
maximum value of a function. This guarantees that
the perturbation does not change the decision bound-
ary in a way that causes misclassification.
To refine this constraint, we incorporate Shapley
values to control how noise is applied to different
features. Since not all features contribute equally
to the model prediction, perturbing critical features
too much could lead to misclassification, while per-
turbing less important features can provide the neces-
sary uncertainty without affecting the accuracy. Let $\eta = (\eta_1, \ldots, \eta_d)$ be the perturbation vector to be applied to $x = (x_1, \ldots, x_d)$. We formulate the constraint as follows:

$|\eta_i| \leq \varepsilon \cdot \left(1 - \beta \cdot \frac{\phi_i(x)}{\max_j \phi_j(x)}\right) \quad (7)$

where $\varepsilon$ is a noise budget, $\beta$ is a weighting parameter, and $\phi_i(x)$ represents the Shapley value of feature $i$. This ensures that highly influential features receive minimal noise, preserving classification accuracy while still introducing controlled uncertainty.
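As an illustration of how Eq. (7) translates into per-feature noise budgets, the helper below maps a vector of Shapley values to the corresponding bounds; taking the values in absolute magnitude is our own assumption, since Shapley values can be negative.

```python
import numpy as np

def per_feature_noise_budget(phi, eps, beta):
    """Per-feature bound |eta_i| <= eps * (1 - beta * phi_i / max_j phi_j) from Eq. (7).

    phi:  Shapley values of the features for the current sample.
    eps:  global noise budget; beta in [0, 1] controls how strongly
          influential features are shielded from perturbation.
    """
    phi = np.abs(np.asarray(phi, dtype=float))   # assumption: influence taken in magnitude
    scale = phi / max(phi.max(), 1e-12)          # normalize to [0, 1]
    return eps * (1.0 - beta * scale)            # large phi_i -> small budget
```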
However, for samples located near the decision
boundary, small perturbations may still flip the pre-
dicted class. To prevent this, we introduce an addi-
tional utility-preserving constraint based on the local
robustness of the model, which limits the overall mag-
nitude of the perturbation according to the local sen-
sitivity and confidence margin of the model. Thus, we
formulate the constraint as follows:
$\|\eta\|_2 \leq \frac{f_{gap}(x)}{\|\nabla f(x)\|_2} \quad (8)$

where $f_{gap}(x)$ denotes the margin between the top two predicted class scores, and $\|\nabla f(x)\|_2$ is the norm of the input gradient, indicating the model's sensitivity to perturbations. This constraint ensures that if the model is highly sensitive or if the decision margin is small, the injected noise remains small to prevent altering the prediction. Critical features are minimally perturbed, and the total noise remains within a safe margin, preventing misclassification.
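One way to evaluate the right-hand side of Eq. (8) is to take the confidence gap between the top-two softmax scores and the norm of its gradient with respect to the input; the PyTorch sketch below reflects one reasonable reading of the sensitivity term, not necessarily the authors' exact computation.

```python
import torch

def local_robustness_bound(model, x):
    """Upper bound on ||eta||_2 from Eq. (8): f_gap(x) / ||grad_x f(x)||_2.

    f_gap is the margin between the top-two softmax scores; its gradient
    with respect to the input is used as the sensitivity term (one possible
    interpretation of the paper's input-gradient norm).
    """
    x = x.clone().detach().requires_grad_(True)
    probs = torch.softmax(model(x.unsqueeze(0)), dim=1).squeeze(0)
    top2 = torch.topk(probs, k=2).values
    gap = top2[0] - top2[1]                        # f_gap(x)
    gap.backward()                                 # d(gap)/dx
    grad_norm = x.grad.norm(p=2).clamp_min(1e-12)  # avoid division by zero
    return (gap / grad_norm).item()
```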
Bringing everything together, our final objective is
to minimize the magnitude of the perturbation while
satisfying both the security and utility constraints.
$\begin{aligned}
\min_{\eta} \quad & \|\eta\|_2 \\
\text{s.t.} \quad & H\big(f(x+\eta)\big) \geq \tau, \\
& |\eta_i| \leq \varepsilon \cdot \left(1 - \beta \cdot \frac{\phi_i(x)}{\max_j \phi_j(x)}\right), \quad i = 1, \ldots, d, \\
& \|\eta\|_2 \leq \frac{f_{gap}(x)}{\|\nabla f(x)\|_2}.
\end{aligned} \quad (9)$
This formulation ensures that noise remains mini-
mal while satisfying security and utility constraints,
striking a trade-off between privacy protection and
model performance.
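The paper does not spell out a solver for Eq. (9); one simple heuristic, sketched below under our own assumptions, is projected gradient ascent on the prediction entropy with projections onto the per-feature box of Eq. (7) and the L2 ball of Eq. (8), stopping as soon as the entropy threshold is reached so that the perturbation stays small.

```python
import torch

def optimize_noise(model, x, budget_per_feat, l2_bound, tau, steps=50, lr=0.05):
    """Heuristic for Eq. (9): find a small eta whose prediction entropy exceeds
    tau while respecting the per-feature box (Eq. 7, `budget_per_feat` as a
    tensor) and the global L2 ball (Eq. 8). A sketch, not the exact algorithm
    of the paper."""
    eta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([eta], lr=lr)
    for _ in range(steps):
        probs = torch.softmax(model((x + eta).unsqueeze(0)), dim=1).squeeze(0)
        entropy = -(probs * torch.log(probs + 1e-12)).sum()
        if entropy.item() >= tau:                  # security constraint satisfied
            break
        loss = -entropy                            # ascend on entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                      # project onto the feasible set
            eta.clamp_(-budget_per_feat, budget_per_feat)   # Eq. (7) box
            norm = eta.norm(p=2)
            if norm > l2_bound:
                eta.mul_(l2_bound / norm)                   # Eq. (8) ball
    return eta.detach()
```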
5 EXPERIMENTATION
In this section, we present an experimental evalua-
tion of our proposed defense mechanism. We first de-
scribe the experimental setup and the evaluation met-
rics used to assess the trade-off between privacy and
utility. Then, we analyze the experimental results to
demonstrate the effectiveness of our approach in mit-
igating MIAs.
5.1 Experimental Setup
Datasets. To evaluate the effectiveness of our pro-
posed defense mechanism, we conduct experiments
on two widely used benchmark datasets for MIAs:
Purchase100 and Texas100.
Purchase100. Consists of 197,324 customer pur-
chase records, each with 600 binary features indicat-
ing whether a specific item was purchased. The clas-
sification task is to predict customer shopping patterns
in 100 classes.
Texas100. Contains 67,330 hospital discharge
records, each with 6,170 binary features representing
the presence or absence of specific symptoms. The
goal is to predict the medical procedure assigned to
the patient among 100 classes.
Target Model. We use a fully connected neural net-
work as the target model for the Purchase100 and
Texas100 datasets. The architecture includes four
hidden layers with sizes [1024, 512, 256, 128]. Each
hidden layer uses the ReLU activation function, while
the output layer applies a softmax function to predict
the probabilities over the 100 classes. The models are trained using the Adam optimizer with the cross-entropy loss function and a learning rate of 0.001, over 100 epochs. The datasets used in our experi-
ments are summarized in Table 1.
Table 1: Dataset splits used in our experiments: Train is
used to train the target model; Test, to evaluate its accu-
racy. Known denotes the subset of training data accessible
to the adversary for building the attack model. Target is
used to evaluate membership inference attacks and contains
an equal number of member and non-member samples.
Dataset Train Test Known Target
Purchase100 20,000 20,000 10,000 10,000
Texas100 10,000 10,000 5,000 5,000
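For reference, the target architecture described above (four ReLU hidden layers of sizes [1024, 512, 256, 128] and a 100-way output) could be written in PyTorch as follows; keeping the output as logits and applying softmax inside the loss is our implementation choice, not a detail stated in the paper.

```python
import torch.nn as nn

class TargetModel(nn.Module):
    """Fully connected target network: four ReLU hidden layers
    [1024, 512, 256, 128] and a 100-way output layer."""
    def __init__(self, in_dim, num_classes=100):
        super().__init__()
        dims = [in_dim, 1024, 512, 256, 128]
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        layers.append(nn.Linear(128, num_classes))  # softmax applied in the loss
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # logits; probabilities via softmax when needed

# Purchase100: TargetModel(in_dim=600); Texas100: TargetModel(in_dim=6170)
```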
Inference Attack Model. In our evaluation, we adopt
a black-box MIA setup where a shadow model is
trained on part of the target model’s training data and
non-member samples from the same distribution. The
purpose of this model is to generate outputs for train-
ing an attack model.
The attack model consists of three fully con-
nected subnets, each operating on the prediction vec-
tor, the one-hot encoded label, and their concatena-
tion. Each subnetwork uses a ReLU activation func-
tion, with weights initialized from a normal distribu-
tion N (0, 0.01) and biases initialized to zero. The
model is trained using the Adam optimizer, learning
rate of 0.001 for 100 epochs, with the cross-entropy
loss function. The final output is a membership prob-
ability that indicates the likelihood that a given sam-
ple belongs to the target model’s training data.
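A possible realization of this attack model in PyTorch is sketched below; the three subnetworks, the ReLU activations, and the N(0, 0.01) initialization follow the description above, while the hidden widths and the exact wiring of the concatenation are our assumptions.

```python
import torch
import torch.nn as nn

class AttackModel(nn.Module):
    """Membership classifier: one subnet encodes the prediction vector, one
    encodes the one-hot label, and a third subnet maps their concatenated
    embeddings to a membership probability. Hidden sizes are illustrative."""
    def __init__(self, num_classes=100, hidden=64):
        super().__init__()
        self.pred_net = nn.Sequential(nn.Linear(num_classes, hidden), nn.ReLU())
        self.label_net = nn.Sequential(nn.Linear(num_classes, hidden), nn.ReLU())
        self.joint_net = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),    # membership probability in [0, 1]
        )
        for m in self.modules():                   # N(0, 0.01) weights, zero biases
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, mean=0.0, std=0.01)
                nn.init.zeros_(m.bias)

    def forward(self, pred_vec, onehot_label):
        h = torch.cat([self.pred_net(pred_vec), self.label_net(onehot_label)], dim=1)
        return self.joint_net(h)
```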
Defense Model. Our defense model adopts the same
architecture as the target model. However, rather than being trained directly on the original data, it is trained on a noisy dataset, generated by applying our proposed feature-adaptive noise injection mechanism to each input sample. Specifically, for each training instance $x$, a noise vector $\eta$ is computed under the security and utility constraints and added to generate a perturbed input $x' = x + \eta$. Each perturbed input $x'$ is associated with a soft label $y' = f(x')$, representing the probability distribution output by the target model when evaluated on the noisy sample. Simultaneously, the original hard label $y$ is retained to ensure that classification performance is preserved.
To train the defense model, we use a combined
loss function, which includes two components: the
cross-entropy loss (CE) and the Kullback–Leibler
(KL) divergence. The loss function is defined as fol-
lows:
$\mathcal{L}(x') = \alpha \cdot \mathrm{KL}\big(f'(x') \,\|\, y'\big) + (1 - \alpha) \cdot \mathrm{CE}\big(f'(x'), y\big)$

where $f'$ denotes the defense model and $\alpha \in [0, 1]$ is a hyperparameter that balances the trade-off between privacy and utility. The defense model is trained using the Adam optimizer, with a learning rate of 0.001 for 100 epochs.
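A direct implementation of this combined loss, written to match the stated direction KL(f'(x') || y'), might look as follows; the batch-mean reduction is our assumption.

```python
import torch
import torch.nn.functional as F

def defense_loss(defense_logits, soft_label, hard_label, alpha, eps=1e-12):
    """alpha * KL(f'(x') || y') + (1 - alpha) * CE(f'(x'), y), averaged over the batch.

    defense_logits: defense model outputs f'(x') on the perturbed inputs (logits).
    soft_label:     y' = f(x'), target-model probability vectors on the same inputs.
    hard_label:     y, original ground-truth class indices.
    """
    probs = F.softmax(defense_logits, dim=1)  # f'(x')
    kl = (probs * (torch.log(probs + eps) - torch.log(soft_label + eps))).sum(dim=1).mean()
    ce = F.cross_entropy(defense_logits, hard_label)
    return alpha * kl + (1.0 - alpha) * ce
```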
Evaluation Metrics. To evaluate our defense, we use
four key metrics: Inference Accuracy and attack AUC
(Area Under the ROC Curve) to evaluate privacy pro-
tection, where values near 0.5 indicate a strong de-
fense against MIAs, and Test Accuracy and General-
ization Gap to measure model utility and generaliza-
tion. These metrics together provide a comprehensive
understanding of the privacy-utility trade-off.
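These privacy metrics can be computed from the attack classifier's membership scores with scikit-learn, as in the hedged sketch below; the generalization gap is simply the difference between the defended model's training and test accuracy.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

def attack_metrics(membership_scores, is_member):
    """Privacy metrics for an MIA: attack AUC and inference accuracy at a 0.5
    threshold. Values near 0.5 for both indicate that the attack is close to
    random guessing."""
    scores = np.asarray(membership_scores, dtype=float)
    labels = np.asarray(is_member, dtype=int)          # 1 = member, 0 = non-member
    auc = roc_auc_score(labels, scores)
    acc = accuracy_score(labels, (scores >= 0.5).astype(int))
    return {"attack_auc": auc, "inference_accuracy": acc}
```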
5.2 Experimental Results
In this section, we evaluate the effectiveness of our
proposed mechanism against black-box MIAs. To
this end, we conducted a comparative study involv-
ing three models under the same training and attack
configurations: an undefended model, trained with-
out any privacy mechanism and serving as a baseline;
a uniform noise defense model, in which a fixed per-
turbation is applied equally to all training samples re-
gardless of feature importance; and our adaptive noise
defense, which injects feature-wise optimized noise
guided by utility and security constraints.
To assess the utility of each model, we track their
classification performance on unseen data using test
accuracy over training epochs. As illustrated in Fig-
ure 2, the adaptive noise defense achieves test accu-
racy close to that of the undefended model and con-
sistently outperforms the uniform noise defense. This
indicates that our method preserves classification per-
formance by selectively perturbing features in a way
that minimizes the impact on critical decision compo-
nents.
Next, we examine the privacy protection offered
by each model using Attack AUC, which measures
the effectiveness of the MIA across all possible de-
cision thresholds. As shown in Figure 3, the unde-
fended model yields high AUC values, which con-
firms its vulnerability. The uniform noise defense
offers limited mitigation, whereas our adaptive noise
defense significantly reduces the AUC, approaching
the ideal baseline value of 0.5, which corresponds
to random guessing and thus provides strong privacy
protection.
Finally, to understand the privacy-utility trade-off,
we analyze the relationship between inference attack
accuracy and the generalization gap. As presented in
Figure 4, the undefended model exhibits both a high
generalization gap and high inference accuracy, indi-
cating strong overfitting and exposure to MIAs. In
contrast, our adaptive noise defense reduces both the
gap and the success of the attack, suggesting that low-
ering the overfitting improves privacy while retaining
generalization. The uniform noise defense falls be-
tween the two, with moderate performance on both
fronts.
These results confirm that our adaptive noise in-
jection method provides a balanced defense by safe-
guarding sensitive membership information while
maintaining the classification utility of the model.
Figure 2: Test Accuracy over Epochs for Different Defense
Strategies.
Figure 3: Attack AUC Under Different Defenses.
Figure 4: Privacy-Utility Trade-off: Inference Accuracy Vs.
Generalization Gap.
6 CONCLUSION
In this work, we introduced a novel defensive ap-
proach against black-box MIAs, based on an opti-
mized and feature-adaptive noise injection mecha-
nism. The originality of our approach lies in its ability
to adjust the injected perturbation for each feature ac-
cording to its influence on the decision of the model,
as measured by the Shapley values.
Our method is built upon two key constraints. The
first is a utility constraint, guided by Shapley values
and local robustness, which aims to preserve the most
influential features and prevent misclassification. The
second is a security constraint that increases the pre-
diction entropy to introduce controlled uncertainty.
Together, these constraints strike a balance between
maintaining model accuracy and mitigating the risk
of membership inference, thereby enhancing privacy
protection.
Our experimental results demonstrate that the pro-
posed method achieves a favorable trade-off between
privacy and utility, effectively reducing the risk of
membership inference without substantially degrad-
ing classification performance. These findings high-
light the potential of combining utility-aware and
entropy-based constraints to enhance privacy in ma-
chine learning models.
Future work will focus on refining these con-
straints, particularly through the development of more
adaptive noise injection strategies. We also aim to ex-
tend our approach to other threat models, including
white-box attacks, and to evaluate its effectiveness on
more complex architectures and across diverse appli-
cation domains.
REFERENCES
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B.,
Mironov, I., Talwar, K., and Zhang, L. (2016). Deep
learning with differential privacy. In Proceedings of
the 2016 ACM SIGSAC Conference on Computer and
Communications Security, pages 308–318.
Bouhaddi, M. and Adi, K. (2023). Mitigating membership
inference attacks in machine learning as a service. In
IEEE International Conference on Cyber Security and
Resilience, CSR 2023, Venice, Italy, July 31 - Aug. 2,
2023, pages 262–268. IEEE.
Carlini, N., Chien, S., Nasr, M., Song, S., Terzis, A., and
Tramèr, F. (2022). Membership inference attacks from
first principles. In 2022 IEEE Symposium on Security
and Privacy (SP), pages 1897–1914. IEEE.
Choquette-Choo, C. A., Tramer, F., Carlini, N., and Pa-
pernot, N. (2021). Label-only membership inference
attacks. In Proceedings of the 38th International
Conference on Machine Learning, volume 139, pages
1964–1974. PMLR.
Chourasia, R., Enkhtaivan, B., Ito, K., Mori, J., Teran-
ishi, I., and Tsuchida, H. (2021). Knowledge cross-
distillation for membership privacy. arXiv preprint,
available at https://arxiv.org/abs/2111.01363.
Jia, J., Salem, A., Backes, M., Zhang, Y., and Gong, N. Z.
(2019). Memguard: Defending against black-box
membership inference attacks via adversarial exam-
ples. In Proceedings of the 2019 ACM SIGSAC Con-
ference on Computer and Communications Security,
pages 259–274.
Leino, K. and Fredrikson, M. (2020). Stolen memories:
Leveraging model memorization for calibrated White-
Box membership inference. In Proceedings of the
29th USENIX Security Symposium (USENIX Security
20), pages 1605–1622.
Li, J., Li, N., and Ribeiro, B. (2021). Membership infer-
ence attacks and defenses in classification models. In
Proceedings of the Eleventh ACM Conference on Data
and Application Security and Privacy, pages 5–16.
Nakai, T., Wang, Y., Yoshida, K., and Fujino, T. (2024).
Sedma: Self-distillation with model aggregation for
membership privacy. Proceedings on Privacy En-
hancing Technologies.
Nasr, M., Shokri, R., and Houmansadr, A. (2018). Ma-
chine learning with membership privacy using adver-
sarial regularization. In Proceedings of the 2018 ACM
SIGSAC Conference on Computer and Communica-
tions Security, pages 634–646.
Pichapati, V., Suresh, A. T., Yu, F. X., Reddi, S. J., and Ku-
mar, S. (2019). Adaclip: Adaptive clipping for private
sgd. arXiv preprint, available at https://arxiv.org/abs/
1908.07643.
Salem, A., Zhang, Y., Humbert, M., Berrang, P., Fritz, M.,
and Backes, M. (2018). Ml-leaks: Model and data in-
dependent membership inference attacks and defenses
on machine learning models. arXiv preprint, available
at https://arxiv.org/abs/1806.01246.
Shejwalkar, V. and Houmansadr, A. (2021). Membership
privacy for machine learning models through knowl-
edge transfer. In Proceedings of the AAAI Conference
on Artificial Intelligence, pages 9549–9557.
Shokri, R., Stronati, M., Song, C., and Shmatikov, V.
(2017). Membership inference attacks against ma-
chine learning models. In 2017 IEEE symposium on
security and privacy (SP), pages 3–18. IEEE.
Song, L. and Mittal, P. (2021). Systematic evaluation of
privacy risks of machine learning models. In 30th
USENIX Security Symposium (USENIX Security 21),
pages 2615–2632. USENIX.
Song, L., Shokri, R., and Mittal, P. (2019). Privacy risks of
securing machine learning models against adversarial
examples. In Proceedings of the 2019 ACM SIGSAC
Conference on Computer and Communications Secu-
rity, pages 241–257. ACM.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. (2014). Dropout: A simple way
to prevent neural networks from overfitting. The Jour-
nal of Machine Learning Research, 15(1):1929–1958.
Tang, X., Mahloujifar, S., Song, L., Shejwalkar, V., Nasr,
M., and Houmansadr, A. (2022). Mitigating mem-
bership inference attacks by Self-Distillation through
a novel ensemble architecture. In Proceedings of the
31st USENIX Security Symposium (USENIX Security
22), pages 1433–1450.
Wang, D., Ye, M., and Xu, J. (2017). Differentially private
empirical risk minimization revisited: Faster and more
general. In Advances in Neural Information Process-
ing Systems, volume 30.
Yeom, S., Giacomelli, I., Fredrikson, M., and Jha, S. (2018).
Privacy risk in machine learning: Analyzing the con-
nection to overfitting. In 2018 IEEE 31st Computer
Security Foundations Symposium (CSF), pages 268–
282. IEEE.
Zheng, J., Cao, Y., and Wang, H. (2021). Resisting member-
ship inference attacks through knowledge distillation.
Neurocomputing, 452:114–126.