
Target Model. We use a fully connected neural network as the target model for the Purchase100 and Texas100 datasets. The architecture comprises four hidden layers with sizes [1024, 512, 256, 128]. Each hidden layer uses the ReLU activation function, while the output layer applies a softmax function to produce a probability distribution over the 100 classes. The models are trained with the Adam optimizer and the cross-entropy loss function, using a learning rate of 0.001 for 100 epochs. The datasets used in our experiments are summarized in Table 1.
Table 1: Dataset splits used in our experiments: Train is used to train the target model; Test, to evaluate its accuracy. Known denotes the subset of training data accessible to the adversary for building the attack model. Target is used to evaluate membership inference attacks and contains an equal number of member and non-member samples.

Dataset       Train    Test     Known    Target
Purchase100   20,000   20,000   10,000   10,000
Texas100      10,000   10,000    5,000    5,000
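For concreteness, the following PyTorch sketch shows one way the target model described above could be implemented. The layer sizes, activation, optimizer, learning rate, and loss follow the text; the input dimensionality and the remaining details are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class TargetModel(nn.Module):
    """Fully connected network with hidden layers [1024, 512, 256, 128]."""
    def __init__(self, in_features, num_classes=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),  # logits; softmax is applied at prediction time
        )

    def forward(self, x):
        return self.net(x)

# Training setup as described in the text (Adam, lr = 0.001, cross-entropy, 100 epochs).
model = TargetModel(in_features=600)  # input size is dataset-dependent (600 assumed here)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()  # operates on logits (applies log-softmax internally)
```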
Inference Attack Model. In our evaluation, we adopt a black-box MIA setup in which a shadow model is trained on the part of the target model's training data known to the adversary, together with non-member samples drawn from the same distribution. The purpose of this model is to generate the outputs used to train the attack model.
The attack model consists of three fully connected subnetworks operating on the prediction vector, the one-hot encoded label, and their concatenation, respectively. Each subnetwork uses the ReLU activation function, with weights initialized from a normal distribution N(0, 0.01) and biases initialized to zero. The model is trained with the Adam optimizer at a learning rate of 0.001 for 100 epochs, using the cross-entropy loss function. The final output is a membership probability indicating the likelihood that a given sample belongs to the target model's training data.
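As a hedged illustration, the sketch below shows one possible reading of this three-subnetwork design in PyTorch. The hidden width, the sigmoid output, and the interpretation of "their concatenation" as the concatenation of the two subnetwork embeddings are assumptions; the initialization, optimizer, and loss settings follow the text.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Weights drawn from N(0, 0.01) (interpreted here as std = 0.01), biases set to zero.
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.zeros_(m.bias)

class AttackModel(nn.Module):
    def __init__(self, num_classes=100, hidden=64):
        super().__init__()
        # Subnetworks for the prediction vector and the one-hot encoded label.
        self.pred_net = nn.Sequential(nn.Linear(num_classes, hidden), nn.ReLU())
        self.label_net = nn.Sequential(nn.Linear(num_classes, hidden), nn.ReLU())
        # Third subnetwork on the concatenation of the two embeddings.
        self.combine_net = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.apply(init_weights)

    def forward(self, pred_vector, one_hot_label):
        h = torch.cat([self.pred_net(pred_vector),
                       self.label_net(one_hot_label)], dim=1)
        return torch.sigmoid(self.combine_net(h))  # membership probability

# Trained with Adam (lr = 0.001) for 100 epochs using (binary) cross-entropy loss.
attack = AttackModel()
optimizer = torch.optim.Adam(attack.parameters(), lr=0.001)
criterion = nn.BCELoss()
```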
Defense Model. Our defense model adopts the same architecture as the target model. However, rather than being trained directly on the original data, it is trained on a noisy dataset generated by applying our proposed feature-adaptive noise injection mechanism to each input sample. Specifically, for each training instance x, a noise vector η is computed under security and utility constraints and added to generate a perturbed input x′ = x + η. Each perturbed input x′ is associated with a soft label y′ = f(x′), representing the probability distribution output by the target model when evaluated on the noisy sample. Simultaneously, the original hard label y is retained to ensure that classification performance is preserved.
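The data-generation step can be summarized by the short sketch below. Here compute_adaptive_noise is only a placeholder for the feature-adaptive mechanism described in this paper (which computes η under the security and utility constraints); the rest is a minimal illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def build_noisy_dataset(target_model, inputs, labels, compute_adaptive_noise):
    """Build (x', y', y) triples from the original training data.

    `compute_adaptive_noise` stands in for the paper's feature-adaptive noise
    mechanism, which is not reproduced here.
    """
    target_model.eval()
    with torch.no_grad():
        eta = compute_adaptive_noise(inputs)                   # noise vector per sample
        x_noisy = inputs + eta                                 # x' = x + eta
        soft_labels = F.softmax(target_model(x_noisy), dim=1)  # y' = f(x')
    return x_noisy, soft_labels, labels                        # hard labels y are retained
```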
To train the defense model, we use a combined loss function, which includes two components: the cross-entropy loss (CE) and the Kullback–Leibler (KL) divergence. The loss function is defined as follows:

L(x′) = α · KL(f′(x′) ∥ y′) + (1 − α) · CE(f′(x′), y)
where f′(x′) denotes the output of the defense model f′ on the perturbed input and α ∈ [0, 1] is a hyperparameter that balances the trade-off between privacy and utility. The defense model is trained using the Adam optimizer with a learning rate of 0.001 for 100 epochs.
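The combined loss translates directly into a few lines of PyTorch. The sketch below assumes the defense model outputs logits and that the noisy input x′ and soft label y′ have already been produced; it is a minimal illustration of the formula above, not the authors' implementation.

```python
import torch.nn.functional as F

def defense_loss(defense_logits, soft_label, hard_label, alpha=0.5):
    """L(x') = alpha * KL(f'(x') || y') + (1 - alpha) * CE(f'(x'), y)."""
    p = F.softmax(defense_logits, dim=1)   # f'(x'): the defense model's distribution
    q = soft_label.clamp_min(1e-12)        # y': the target model's soft label on x'
    kl = (p * (p.clamp_min(1e-12).log() - q.log())).sum(dim=1).mean()
    ce = F.cross_entropy(defense_logits, hard_label)  # against the original hard label y
    return alpha * kl + (1 - alpha) * ce
```

Intuitively, values of α closer to 1 emphasize matching the soft labels obtained on noisy inputs, while values closer to 0 emphasize the original hard labels and thus classification accuracy.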
Evaluation Metrics. To evaluate our defense, we use four key metrics: Inference Accuracy and Attack AUC (Area Under the ROC Curve) to assess privacy protection, where values near 0.5 indicate a strong defense against MIAs; and Test Accuracy and Generalization Gap to measure model utility and generalization. Together, these metrics provide a comprehensive view of the privacy-utility trade-off.
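For reference, these metrics can be computed from per-sample attack scores and the model's train/test accuracies roughly as follows; the sketch assumes scikit-learn, a 0.5 decision threshold for inference accuracy, and illustrative variable names.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(membership_scores, is_member, train_acc, test_acc):
    # Attack AUC: values near 0.5 mean the attack cannot separate members from non-members.
    attack_auc = roc_auc_score(is_member, membership_scores)
    # Inference accuracy at the usual 0.5 decision threshold.
    inference_acc = np.mean((membership_scores >= 0.5) == is_member)
    # Generalization gap: train accuracy minus test accuracy.
    gen_gap = train_acc - test_acc
    return {"attack_auc": attack_auc, "inference_acc": inference_acc,
            "test_acc": test_acc, "gen_gap": gen_gap}
```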
5.2 Experimental Results
In this section, we evaluate the effectiveness of our
proposed mechanism against black-box MIAs. To this end, we conduct a comparative study involving three models under the same training and attack
configurations: an undefended model, trained with-
out any privacy mechanism and serving as a baseline;
a uniform noise defense model, in which a fixed per-
turbation is applied equally to all training samples re-
gardless of feature importance; and our adaptive noise
defense, which injects feature-wise optimized noise
guided by utility and security constraints.
To assess the utility of each model, we track their
classification performance on unseen data using test
accuracy over training epochs. As illustrated in Fig-
ure 2, the adaptive noise defense achieves test accu-
racy close to that of the undefended model and con-
sistently outperforms the uniform noise defense. This
indicates that our method preserves classification per-
formance by selectively perturbing features in a way
that minimizes the impact on critical decision compo-
nents.
Next, we examine the privacy protection offered
by each model using Attack AUC, which measures
the effectiveness of the MIA across all possible de-
cision thresholds. As shown in Figure 3, the unde-
fended model yields high AUC values, which con-
firms its vulnerability. The uniform noise defense
offers limited mitigation, whereas our adaptive noise
defense significantly reduces the AUC, approaching
the ideal baseline value of 0.5, which corresponds
to random guessing and thus provides strong privacy
protection.
Finally, to understand the privacy-utility trade-off,
we analyze the relationship between inference attack