PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research
Moad Hani, Nacim Betrouni, Saïd Mahmoudi, Mohammed Benjelloun
2025
Abstract
Longitudinal datasets like the Parkinson’s Progression Markers Initiative (PPMI) face critical challenges from missing data and privacy constraints. This paper introduces PPMI-Benchmark, the first comprehensive framework evaluating 12 imputation methods and 6 synthetic data generation techniques across clinical, demographic, and biomarker variables in Parkinson’s disease research. We implement advanced methods including HyperImpute (ensemble optimization), VaDER (variational deep embedding), and conditional tabular GANs (CTGAN), evaluating them through novel metrics integrating sliced Wasserstein distance (dSW = 0.039 ± 0.012), temporal consistency analysis, and clinical validity constraints. Our results demonstrate HyperImpute’s superiority in imputation accuracy (MAE = 5.16 vs. 5.19–5.57 for baselines), while CTGAN achieves optimal distribution fidelity (SWD = 0.039 vs. 0.062–0.146). Crucially, we reveal persistent demographic biases in cognitive scores, with age-related imputation errors increasing by 23% for patients over 70, and propose mitigation strategies. The framework provides actionable guidelines for selecting data completion strategies based on missingness patterns (MCAR/MAR/MNAR), computational constraints, and clinical objectives, advancing reproducibility and fairness in neurodegenerative disease research. Validated on 1,483 PPMI participants, our work addresses emerging needs in healthcare AI governance and synthetic data interoperability for multi-center collaborations.
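As a rough illustration of the sliced Wasserstein distance named in the abstract, the sketch below estimates it between two equally sized samples by averaging one-dimensional Wasserstein-1 distances over random projection directions. This is a generic Monte Carlo estimator and is not taken from the paper: the function name `sliced_wasserstein`, the number of projections, the choice of the order-1 distance, and the equal-sample-size restriction are all assumptions made for the example.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-1 distance.

    Hypothetical helper for illustration; assumes x and y are 2-D arrays
    with the same shape (n_samples, n_features).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must be 2-D arrays of identical shape")

    rng = np.random.default_rng(seed)
    # Draw random directions and normalise them onto the unit sphere.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project both samples onto every direction, then sort each projection.
    # For equal sample sizes, the 1-D Wasserstein-1 distance is the mean
    # absolute difference between the sorted projections.
    px = np.sort(x @ directions.T, axis=0)
    py = np.sort(y @ directions.T, axis=0)
    return float(np.mean(np.abs(px - py)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    real = rng.normal(size=(500, 4))        # stand-in for real records
    synthetic = rng.normal(size=(500, 4))   # stand-in for generated records
    print(sliced_wasserstein(real, synthetic))
```

Under these assumptions, a lower value indicates that the synthetic sample is distributionally closer to the real one. Values computed this way depend on feature scaling and on the number of projections, so they are not directly comparable to the figures reported in the abstract.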
Paper Citation
in Harvard Style
Hani M., Betrouni N., Mahmoudi S. and Benjelloun M. (2025). PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 246-259. DOI: 10.5220/0013649700003967
in Bibtex Style
@conference{data25,
author={Moad Hani and Nacim Betrouni and Saïd Mahmoudi and Mohammed Benjelloun},
title={PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={246-259},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013649700003967},
isbn={978-989-758-758-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research
SN - 978-989-758-758-0
AU - Hani M.
AU - Betrouni N.
AU - Mahmoudi S.
AU - Benjelloun M.
PY - 2025
SP - 246
EP - 259
DO - 10.5220/0013649700003967
PB - SciTePress