Evaluating Synthetic Data Generation Techniques for Medical Dataset

Takayuki Miura, Eizen Kimura, Atsunori Ichikawa, Masanobu Kii, Juko Yamamoto

2024

Abstract

Anticipation surrounds the use of real-world data for data analysis in medicine and healthcare, yet handling sensitive data demands ethical review and safety management, which creates bottlenecks that slow the progress of research. Consequently, numerous techniques have emerged for generating synthetic data that preserves the features of the original data. Nonetheless, the quality of such synthetic data, particularly in the context of real-world data, has yet to be sufficiently examined. In this paper, we conduct experiments with a Diagnosis Procedure Combination (DPC) dataset to evaluate the quality of synthetic data generated by statistics-based, graphical-model-based, and deep-neural-network-based methods. Further, we apply differential privacy for theoretical privacy protection and assess the resulting degradation of data quality. The findings indicate that a statistics-based method called Gaussian Copula and a graphical-model-based method called AIM yield high-quality synthetic data in terms of statistical similarity and machine learning model performance. Drawing on the experimental results, the paper also summarizes issues pertinent to the practical application of synthetic data.


Paper Citation


in Harvard Style

Miura T., Kimura E., Ichikawa A., Kii M. and Yamamoto J. (2024). Evaluating Synthetic Data Generation Techniques for Medical Dataset. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF; ISBN 978-989-758-688-0, SciTePress, pages 315-322. DOI: 10.5220/0012314500003657


in Bibtex Style

@conference{healthinf24,
author={Takayuki Miura and Eizen Kimura and Atsunori Ichikawa and Masanobu Kii and Juko Yamamoto},
title={Evaluating Synthetic Data Generation Techniques for Medical Dataset},
booktitle={Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF},
year={2024},
pages={315--322},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012314500003657},
isbn={978-989-758-688-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF
TI - Evaluating Synthetic Data Generation Techniques for Medical Dataset
SN - 978-989-758-688-0
AU - Miura T.
AU - Kimura E.
AU - Ichikawa A.
AU - Kii M.
AU - Yamamoto J.
PY - 2024
SP - 315
EP - 322
DO - 10.5220/0012314500003657
PB - SciTePress