Evaluating Machine Learning Strategies for Credit Risk Anomaly Detection

Chenxi Hu

2025

Abstract

This study addresses class imbalance in credit risk classification and seeks to optimize model performance. During preprocessing, categorical features were one-hot encoded, and Principal Component Analysis (PCA) was then tried for dimensionality reduction. Cost-Sensitive Learning (CSL) was used to handle class imbalance. Five machine learning models were evaluated: Support Vector Classifier (SVC), Logistic Regression, XGBoost, Random Forest, and Neural Networks. SVC and Logistic Regression achieved the best ROC-AUC and F1-Score, effectively balancing precision and recall. XGBoost achieved the highest recall, but its lower precision resulted in a lower F1-Score. Random Forest and Neural Networks showed balanced performance without outperforming SVC and Logistic Regression, so SVC and Logistic Regression are the recommended models for credit risk classification. PCA did not significantly improve model performance: although some models, such as Neural Networks, showed slight gains in ROC-AUC and F1-Score, PCA generally lowered precision and recall. CSL improved recall significantly but reduced precision and accuracy. The study mitigated this trade-off by optimizing feature weights, achieving higher recall with minimal precision loss and offering a balanced solution for high-recall scenarios.
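
The precision/recall trade-off that the abstract attributes to cost-sensitive learning can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the toy labels, the two hypothetical classifiers, and the inverse-frequency weighting heuristic are not taken from the paper, which does not state its weights or data in this abstract.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    A common heuristic for cost-sensitive learning; assumed here, not
    confirmed as the weighting used in the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (default/anomaly) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy imbalanced labels (hypothetical): 8 good loans (0), 2 defaults (1).
y_true = [0] * 8 + [1] * 2

# The minority class receives the larger weight.
weights = balanced_class_weights(y_true)

# Hypothetical unweighted model: catches only one of the two defaults.
y_conservative = [0] * 8 + [1, 0]

# Hypothetical cost-sensitive model: catches both defaults,
# but raises two false alarms on good loans.
y_recall_oriented = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

conservative = precision_recall_f1(y_true, y_conservative)
recall_oriented = precision_recall_f1(y_true, y_recall_oriented)
```

In this toy case the recall-oriented predictions reach full recall at the cost of halved precision, while F1 stays the same for both models. This is one reason to report precision, recall, and F1 together rather than any one of them alone.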



Paper Citation


in Harvard Style

Hu C. (2025). Evaluating Machine Learning Strategies for Credit Risk Anomaly Detection. In Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE; ISBN 978-989-758-765-8, SciTePress, pages 436-441. DOI: 10.5220/0013699000004670


in Bibtex Style

@conference{icdse25,
author={Chenxi Hu},
title={Evaluating Machine Learning Strategies for Credit Risk Anomaly Detection},
booktitle={Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE},
year={2025},
pages={436-441},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013699000004670},
isbn={978-989-758-765-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE
TI - Evaluating Machine Learning Strategies for Credit Risk Anomaly Detection
SN - 978-989-758-765-8
AU - Hu C.
PY - 2025
SP - 436
EP - 441
DO - 10.5220/0013699000004670
PB - SciTePress