XPCA Gen: Extended PCA Based Tabular Data Generation Model

Sreekala Padinjarekkara, Jessica Alecci, Mirela Popa

2024

Abstract

The proposed method, XPCA Gen, introduces a novel approach to synthetic tabular data generation that utilises relevant patterns present in the data. This is performed using principal components obtained through XPCA (a probabilistic interpretation of standard PCA) decomposition of the original data. Since new data points are obtained by synthesizing the principal components, the generated data is an accurate and noise redundant representation of the original data with a good diversity of data points. The experimental results obtained on benchmark datasets (e.g. CMC, PID) demonstrate performance on ML utility metrics (accuracy, precision, recall), showing the method's ability to capture inherent patterns in the dataset. Along with the ML utility metrics, a high Hausdorff distance indicates diversity in the generated data without compromising statistical properties. Moreover, unlike complex neural networks, this is not a data-hungry method. Overall, XPCA Gen emerges as a promising solution for data privacy preservation and robust model training with diverse samples.
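The general idea in the abstract (decompose the data, sample in the component space, map back, then measure diversity) can be illustrated with a minimal sketch. This is not the authors' implementation: plain PCA via SVD stands in for XPCA, the Gaussian sampling of component scores is an assumption, and the helper names (`fit_pca`, `generate`, `hausdorff`) and the toy data are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Plain PCA via SVD (a stand-in for XPCA, which is not reproduced here)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # rows are orthonormal directions
    # Standard deviation of the scores along each retained component.
    score_std = S[:n_components] / np.sqrt(X.shape[0] - 1)
    return mean, components, score_std

def generate(mean, components, score_std, n_samples, rng):
    """Assumption: scores are drawn from independent zero-mean Gaussians."""
    scores = rng.normal(size=(n_samples, len(score_std))) * score_std
    return scores @ components + mean

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (rows are points)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

rng = np.random.default_rng(0)
# Hypothetical toy table: 200 rows, 4 correlated numeric columns.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(200, 4))

mean, components, score_std = fit_pca(X, n_components=2)
X_syn = generate(mean, components, score_std, n_samples=200, rng=rng)
h = hausdorff(X, X_syn)  # larger values mean the two sets are further apart
```

In this sketch the Hausdorff distance is zero only when the two point sets coincide, so a value well above zero shows that the synthetic rows are not copies of the originals. Whether the statistical properties are preserved has to be checked separately, for example with the ML utility metrics the abstract mentions.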


Paper Citation


in Harvard Style

Padinjarekkara S., Alecci J. and Popa M. (2024). XPCA Gen: Extended PCA Based Tabular Data Generation Model. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 141-151. DOI: 10.5220/0012568600003654


in Bibtex Style

@conference{icpram24,
author={Sreekala Padinjarekkara and Jessica Alecci and Mirela Popa},
title={XPCA Gen: Extended PCA Based Tabular Data Generation Model},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={141-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012568600003654},
isbn={978-989-758-684-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - XPCA Gen: Extended PCA Based Tabular Data Generation Model
SN - 978-989-758-684-2
AU - Padinjarekkara S.
AU - Alecci J.
AU - Popa M.
PY - 2024
SP - 141
EP - 151
DO - 10.5220/0012568600003654
PB - SciTePress