cessible tools. Synthetic data generation is becoming
increasingly important across a variety of domains.
Facilitating a collaborative environment, where state-
of-the-art tools are available through a simple Python
interface, is crucial for fostering the growth of the
community, including the less technologically profi-
cient domains where there is growing interest in and
adoption of synthetic data.
The success of Synthesizers will largely depend
on the community taking an interest in its develop-
ment. Therefore, we encourage input and requests for
improvements and contributions from all sources. In-
tegration of other training and/or evaluation adapters
is of special importance to help the meta-framework
grow, and we look forward to collaborating with other
active research environments and users.
ICSOFT 2024 - 19th International Conference on Software Technologies