PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research

Moad Hani, Nacim Betrouni, Saïd Mahmoudi, Mohammed Benjelloun

2025

Abstract

: Longitudinal datasets like the Parkinson’s Progression Markers Initiative (PPMI) face critical challenges from missing data and privacy constraints. This paper introduces PPMI-Benchmark, the first comprehensive framework evaluating 12 imputation methods and 6 synthetic data generation techniques across clinical, demographic, and biomarker variables in Parkinson’s disease research. We implement advanced methods including HyperImpute (ensemble optimization), VaDER (variational deep embedding), and conditional tabular GANs (CTGAN), evaluating them through novel metrics integrating sliced Wasserstein distance (dSW = 0.039 ± 0.012), temporal consistency analysis, and clinical validity constraints. Our results demonstrate HyperImpute’s superiority in imputation accuracy (MAE=5.16 vs. 5.19–5.57 for baselines), while CTGAN achieves optimal distribution fidelity (SWD=0.039 vs. 0.062–0.146). Crucially, we reveal persistent demographic biases in cognitive scores, with age-related imputation errors increasing by 23% for patients over 70, and propose mitigation strategies. The framework provides actionable guidelines for selecting data completion strategies based on missingness patterns (MCAR/MAR/MNAR), computational constraints, and clinical objectives, advancing reproducibility and fairness in neurodegenerative disease research. Validated on 1,483 PPMI participants, our work addresses emerging needs in healthcare AI governance and synthetic data interoperability for multi-center collaborations.

Download


Paper Citation


in Harvard Style

Hani M., Betrouni N., Mahmoudi S. and Benjelloun M. (2025). PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 246-259. DOI: 10.5220/0013649700003967


in Bibtex Style

@conference{data25,
author={Moad Hani and Nacim Betrouni and Saïd Mahmoudi and Mohammed Benjelloun},
title={PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={246-259},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013649700003967},
isbn={978-989-758-758-0},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - PPMI-Benchmark: A Dual Evaluation Framework for Imputation and Synthetic Data Generation in Longitudinal Parkinson's Disease Research
SN - 978-989-758-758-0
AU - Hani M.
AU - Betrouni N.
AU - Mahmoudi S.
AU - Benjelloun M.
PY - 2025
SP - 246
EP - 259
DO - 10.5220/0013649700003967
PB - SciTePress