precision, leading to more false positives. Random
Forest performed well in feature importance analysis
but had lower recall, while Neural Networks delivered
balanced performance across metrics, though slightly
behind SVC and Logistic Regression in the
precision-recall trade-off.
Applying PCA to credit risk anomaly detection caused
significant information loss: it disrupted critical
feature relationships and lowered accuracy, F1-Score,
and precision. Although PCA is intended to reduce
dimensionality and improve efficiency, in this context
it degraded performance rather than optimizing it. In
contrast, CSL significantly improved recall across all
models, addressing the need to detect minority-class
cases, but at the expense of precision, which
increased false positives.
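The recall-versus-precision effect described for CSL can be illustrated with a minimal sketch, assuming CSL denotes cost-sensitive learning and that it is applied by moving the decision threshold on a predicted default probability. This is only one possible realization and not necessarily the one used in the experiments. When a false negative costs `cost_fn` and a false positive costs `cost_fp`, the expected-cost-minimizing rule flags a case when its probability is at least `cost_fp / (cost_fp + cost_fn)`. The labels, scores, and the 5:1 cost ratio below are hypothetical illustration values, not results from this study.

```python
# Hypothetical data: 1 marks the minority (default) class, and each score is
# an assumed predicted probability of default from some classifier.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.90, 0.60, 0.40, 0.20, 0.70, 0.30, 0.25, 0.10, 0.10, 0.05, 0.05, 0.02]


def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost."""
    return cost_fp / (cost_fp + cost_fn)


def precision_recall(labels, scores, threshold):
    """Precision and recall for the positive class at a given threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Equal costs give the usual 0.5 threshold.
base_precision, base_recall = precision_recall(labels, scores, cost_threshold(1, 1))

# Assumed 5:1 cost ratio: a missed default is five times as costly as a false alarm.
csl_precision, csl_recall = precision_recall(labels, scores, cost_threshold(1, 5))

print(f"equal costs:    precision={base_precision:.2f} recall={base_recall:.2f}")
print(f"cost-sensitive: precision={csl_precision:.2f} recall={csl_recall:.2f}")
```

With these made-up numbers, recall rises from 0.50 to 1.00 while precision falls from about 0.67 to about 0.57, which mirrors the direction of the trade-off reported above.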
A promising solution to these challenges lies in
optimizing the most important features identified
through multi-model integration. By leveraging the
strengths of various models to pinpoint key attributes,
this approach retains essential information and
reduces redundancy, enabling models to achieve a
better balance between recall and precision. This
strategy addresses the practical need for high recall
while mitigating the limitations posed by PCA and
CSL, offering a robust pathway for credit risk
anomaly detection.
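One way to combine feature rankings from several models is sketched below. The aggregation rule (normalize each model's absolute importance scores to sum to one, then average across models) is an assumption made for illustration, since the text does not specify how the integration is performed. The model names, feature names, and scores are likewise hypothetical.

```python
from statistics import mean


def normalize(scores):
    """Scale absolute importance scores so they sum to 1 within one model."""
    total = sum(abs(v) for v in scores.values())
    return {name: abs(v) / total for name, v in scores.items()}


def aggregate_importance(per_model):
    """Average the normalized importance of each feature across all models."""
    normalized = [normalize(scores) for scores in per_model.values()]
    features = list(normalized[0])
    return {f: mean(n[f] for n in normalized) for f in features}


def top_k(scores, k):
    """Return the k feature names with the highest aggregated importance."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical per-model importances (e.g. impurity-based scores for a tree
# ensemble, coefficient magnitudes for linear models); not values from the study.
per_model = {
    "random_forest": {"credit_limit": 0.10, "pay_delay": 0.50,
                      "bill_amount": 0.30, "age": 0.10},
    "logistic_regression": {"credit_limit": -0.40, "pay_delay": 1.20,
                            "bill_amount": 0.20, "age": 0.20},
    "svc": {"credit_limit": 0.60, "pay_delay": 0.90,
            "bill_amount": 0.30, "age": 0.20},
}

combined = aggregate_importance(per_model)
selected = top_k(combined, k=2)
print(selected)
```

Averaging normalized scores keeps a feature that only one model rates highly from dominating the selection. Rank-based aggregation would be a reasonable alternative when importance scales differ strongly between models.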