
Table 3: Entropy of the real data and of the synthetic data for each dataset. Synthetic-data (fake) entropy is reported under
various privacy budgets ε and for the GAN, together with the average over these five settings.
Dataset      Real Entropy   Fake Entropy                                           Fake Entropy
                            ε = 0.5    ε = 1      ε = 2      ε = 10     GAN        Average
Adult        9.2969         10.3115    10.3636    10.3742    10.2718    10.3306    10.3303
Credit       4.6713         6.0720     6.0913     6.0595     6.0561     6.0619     6.0682
Mushroom     8.5522         8.5704     8.5642     8.5737     8.5565     8.5631     8.5656
Heart        4.5593         5.1202     5.1318     5.1892     5.1805     5.1609     5.1565
Bankruptcy   1.4395         8.7104     8.6884     8.6949     8.6709     8.2461     8.6021
Diabetes     4.5782         6.4472     6.4791     6.5277     6.5613     6.5278     6.5086
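The Average column of Table 3 can be recomputed from the five synthetic-data columns, and subtracting the real entropy gives a per-dataset entropy gain. The following is a minimal sketch of that arithmetic; the numbers are transcribed from Table 3, while the names TABLE3 and summarize are illustrative and do not come from the paper.

```python
# Values transcribed from Table 3. Each entry holds:
#   (real entropy,
#    [fake entropy at eps = 0.5, 1, 2, 10, and for the GAN],
#    reported average of the five fake-entropy values)
TABLE3 = {
    "Adult":      (9.2969, [10.3115, 10.3636, 10.3742, 10.2718, 10.3306], 10.3303),
    "Credit":     (4.6713, [6.0720, 6.0913, 6.0595, 6.0561, 6.0619], 6.0682),
    "Mushroom":   (8.5522, [8.5704, 8.5642, 8.5737, 8.5565, 8.5631], 8.5656),
    "Heart":      (4.5593, [5.1202, 5.1318, 5.1892, 5.1805, 5.1609], 5.1565),
    "Bankruptcy": (1.4395, [8.7104, 8.6884, 8.6949, 8.6709, 8.2461], 8.6021),
    "Diabetes":   (4.5782, [6.4472, 6.4791, 6.5277, 6.5613, 6.5278], 6.5086),
}


def summarize(table):
    """Return {dataset: (recomputed average, average minus real entropy)}."""
    summary = {}
    for name, (real, fake, _reported) in table.items():
        avg = sum(fake) / len(fake)
        summary[name] = (avg, avg - real)
    return summary


if __name__ == "__main__":
    for name, (avg, gain) in summarize(TABLE3).items():
        print(f"{name:<11} average={avg:.4f}  gain={gain:+.4f}")
```

Under this reading of the table, the gain is largest for Bankruptcy (about 7.16) and smallest for Mushroom (about 0.01).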
Additionally, these differences can be attributed
to the nature, complexity, and feature diversity of
the datasets. Simpler datasets like Mushroom, with
limited attributes and predictable distributions, leave
GAN and DPGAN models little room to introduce
additional diversity, resulting in minimal entropy changes
(8.5522 for the real data versus 8.5656 on average for
the synthetic data). In contrast, complex datasets like
Bankruptcy, with heterogeneous features and less
predictable patterns, drive GAN models to generate
more diverse synthetic data, leading to higher entropy
(1.4395 versus 8.6021 on average) and a larger privacy
gain. Notably, Bankruptcy is the only dataset showing
a clear increase in entropy as ε decreases (from 8.2461
for the GAN to 8.7104 at ε = 0.5), likely due
to its structural complexity enabling greater diversity
under tighter privacy constraints. However, this
does not mean similar effects are absent in other
datasets, as entropy alone may not fully reflect privacy
improvements, especially in simpler or categorical
datasets.
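To illustrate why a more diverse dataset yields a higher entropy value, the sketch below computes the Shannon entropy of the empirical distribution over records. It assumes that entropy is measured over whole records and in bits, which may differ from the exact definition used for Table 3; the function name shannon_entropy and the toy records are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(records):
    """Shannon entropy (in bits) of the empirical distribution of `records`.

    Assumption: each record is a hashable value (e.g. a tuple of feature
    values) and entropy is taken over whole records.
    """
    counts = Counter(records)
    n = len(records)
    # Sum p * log2(1 / p) over distinct records, with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical toy data: each tuple is one record (row) of a dataset.
low_diversity = [("a", 0)] * 6 + [("b", 1)] * 2           # two distinct records
high_diversity = [("a", 0), ("a", 1), ("b", 0), ("b", 1),
                  ("c", 0), ("c", 1), ("d", 0), ("d", 1)]  # eight distinct records

if __name__ == "__main__":
    print(f"low diversity:  {shannon_entropy(low_diversity):.4f} bits")
    print(f"high diversity: {shannon_entropy(high_diversity):.4f} bits")
```

In this toy case the eight equally likely records give 3 bits, while the skewed two-record sample gives roughly 0.81 bits, mirroring the intuition that more varied synthetic data is harder to link back to individual real records.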
6 CONCLUSIONS
In this paper, we explored the use of GANs and
Differentially Private GANs (DPGANs) based on the
Wasserstein distance to generate synthetic datasets
that balance utility and privacy for classifier training.
Comparing classifiers trained on real and synthetic
data, we found that although there is a slight
accuracy trade-off, synthetic data from GAN and
DPGAN effectively preserves privacy without major
performance loss. Additionally, higher entropy levels
in synthetic data reflect greater randomness and
diversity, making it harder to link synthetic samples
to real data and enhancing privacy protection.
SECRYPT 2025 - 22nd International Conference on Security and Cryptography