A Linear-dependence-based Approach to Design Proactive Credit Scoring Models

Roberto Saia, Salvatore Carta


The main aim of a credit scoring model is to classify loan customers into two classes, reliable and unreliable, on the basis of their potential capability to keep up with their repayments. Credit scoring models are increasingly in demand because of the growth of consumer credit. Such models are usually designed on the basis of past loan applications and are used to evaluate new ones. Defining them is a hard challenge for several reasons, the most important of which is the imbalanced class distribution of the data (i.e., the number of default cases is much smaller than the number of non-default cases), which reduces the effectiveness of the most widely used approaches (e.g., neural networks, random forests, and so on). The Linear Dependence Based (LDB) approach proposed in this paper offers a twofold advantage. First, it evaluates a new loan application on the basis of the linear dependence of its vector representation with respect to a matrix composed of the vector representations of the past non-default applications; because it uses only one class of data, it overcomes the imbalanced class distribution issue. Second, since it does not exploit defaulting loans, it allows us to operate proactively and also addresses the cold-start problem. We validate our approach on two real-world datasets characterized by a strongly unbalanced distribution of data, comparing its performance with that of one of the best state-of-the-art approaches: random forests.
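The abstract does not spell out how linear dependence is measured, so the following is only a minimal sketch of the general idea, not the LDB algorithm itself. It assumes a stand-in measure: the absolute cosine between the new application and each past non-default application, which equals 1 exactly when two vectors are linearly dependent (scalar multiples of each other). The function names, the averaging step, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def collinearity(u: Sequence[float], v: Sequence[float]) -> float:
    """Absolute cosine of the angle between u and v.

    Equals 1.0 when the two vectors are linearly dependent
    (one is a scalar multiple of the other) and 0.0 when they
    are orthogonal. This is an assumed stand-in measure.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vectors have no defined direction")
    return abs(dot) / (norm_u * norm_v)


def dependence_score(history: Sequence[Sequence[float]],
                     candidate: Sequence[float]) -> float:
    """Average collinearity between the candidate and every row of
    the non-default history matrix (only one class of data is used)."""
    if not history:
        raise ValueError("history must contain at least one application")
    return sum(collinearity(row, candidate) for row in history) / len(history)


def classify(history: Sequence[Sequence[float]],
             candidate: Sequence[float],
             threshold: float = 0.9) -> str:
    """Label the candidate by comparing its score with a hypothetical
    threshold; a real system would have to tune this value."""
    score = dependence_score(history, candidate)
    return "reliable" if score >= threshold else "unreliable"
```

In this sketch only the history of non-default applications is needed, which mirrors the one-class, proactive setting described above: no defaulting loans have to be observed before new applications can be scored.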



Paper Citation

in Harvard Style

Saia R. and Carta S. (2016). A Linear-dependence-based Approach to Design Proactive Credit Scoring Models. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 111-120. DOI: 10.5220/0006066701110120

in Bibtex Style

author={Roberto Saia and Salvatore Carta},
title={A Linear-dependence-based Approach to Design Proactive Credit Scoring Models},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},

in EndNote Style

JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - A Linear-dependence-based Approach to Design Proactive Credit Scoring Models
SN - 978-989-758-203-5
AU - Saia R.
AU - Carta S.
PY - 2016
SP - 111
EP - 120
DO - 10.5220/0006066701110120