Authors:
Simin Yu 1; Victor Chang 1; Gia Linh Huỳnh 2; Vitor Jesus 3 and Jiabin Luo 1
Affiliations:
1 Department of Business Analytics and Information Systems, Aston Business School, Aston University, Birmingham, U.K.
2 Becamex Business School, Eastern International University, Binh Duong, Vietnam
3 School of Computer Sci and Digital Tech, College of Engineering and Physical Sci, Aston University, Birmingham, U.K.
Keyword(s):
Money Laundering, Machine Learning, Data Imbalance, Oversampling, Undersampling.
Abstract:
The rapid growth of online transactions has made payments more convenient but has also increased risks such as money laundering, which threatens financial systems. Financial institutions use machine learning to detect suspicious activity, but imbalanced datasets, in which laundering cases are far outnumbered by legitimate transactions, degrade algorithm performance. To address this, this study applies resampling techniques (SMOTE, ADASYN, Random Undersampling, NearMiss) and ensemble algorithms (XGBoost, CatBoost, Random Forest) to a simulated money laundering dataset provided by IBM (2023). Our findings reveal that each resampling technique offers its own advantages and trade-offs. CatBoost consistently outperforms XGBoost and Random Forest across sampling techniques, achieving the best balance between precision and recall while maintaining strong ROC curve scores. This performance could reduce the number of transactions banks must examine, because investigations would focus only on the predicted laundering cases.
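
To make the two resampling families named in the abstract concrete, the following is a minimal standard-library sketch of random undersampling and a SMOTE-style oversampler. It is not the authors' implementation, which the abstract does not describe; the function names `random_undersample` and `smote_like_oversample`, the parameter `k`, and the toy two-feature data are all assumptions made for illustration, and the oversampler is a simplified variant of SMOTE rather than a library-grade version.

```python
import random
from collections import Counter


def random_undersample(X, y, majority_label, seed=0):
    """Randomly drop majority rows until both classes have the same size.

    Assumes a binary problem in which the majority class is the larger one.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    keep = sorted(rng.sample(majority_idx, len(minority_idx)) + minority_idx)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like_oversample(X, y, minority_label, k=3, seed=0):
    """Add synthetic minority rows until the classes are balanced.

    Each synthetic row lies on the segment between a random minority row
    and one of its k nearest minority neighbours (the core SMOTE idea).
    Assumes at least two minority rows.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(base, target)])
        y_out.append(minority_label)
    return X_out, y_out


# Hypothetical toy data: 95 "legitimate" rows (label 0), 5 "laundering" rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(95)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_undersample(X, y, majority_label=0)
X_over, y_over = smote_like_oversample(X, y, minority_label=1)
print(Counter(y_under), Counter(y_over))
```

Undersampling discards most of the majority class, while oversampling keeps every original row and enlarges the dataset with interpolated points; this is one source of the trade-offs between techniques that the abstract mentions. In either case, resampling would normally be applied to the training split only, so that evaluation still reflects the original class imbalance.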