Authors:
Elif Özcan 1,2; Ruşen Akkuş Halepmollası 1,2 and Yusuf Yaslan 2
Affiliations:
1 TÜBİTAK Informatics and Information Security Research Center, Kocaeli, Turkey; 2 Faculty of Computer and Informatics Engineering, Istanbul Technical University, Turkey
Keyword(s):
Finance, Synthetic Data, Federated Learning, Artificial Intelligence.
Abstract:
Financial services generate vast, complex, and diverse datasets, yet data privacy concerns pose significant challenges for secure usage and collaborative analysis. Synthetic data generation offers a way to use such data without exposing sensitive information, and federated learning enables collaborative model training across clients while maintaining data privacy. In this study, we use the Default Credit Card dataset and employ diffusion-based synthetic data generation to evaluate its impact on centralized and federated learning approaches. To this end, we provide a comprehensive benchmark of synthetic, real, and hybrid datasets using four machine learning classifiers in both centralized and federated settings. Our findings demonstrate that synthetic data improves results, especially when combined with real data. We also conduct client-specific experiments in federated learning to address highly imbalanced or incomplete class distributions. Moreover, we evaluate the FedF1 aggregation method, which aims to improve global model performance by optimizing the F1-score. To the best of our knowledge, this is the first study to integrate synthetic data generation and federated learning on a financial dataset, providing valuable insights for secure and collaborative learning.