Authors:
Başak Gültekin and Betül Erdoğdu Şakar
Affiliation:
Faculty of Engineering and Natural Sciences, Bahçeşehir University, Beşiktaş, Turkey
Keyword(s):
Credit Scoring, Default Prediction, Feature Selection, Classification, Boruta, Logistic Regression, Random Forest, Artificial Neural Network.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Business Analytics; Business Intelligence; Cardiovascular Technologies; Computing and Telecommunications in Cardiology; Data Analytics; Data Engineering; Data Manipulation; Data Mining; Databases and Information Systems Integration; Datamining; Decision Support Systems; Decision Support Systems, Remote Data Analysis; Enterprise Information Systems; Health Engineering and Technology Applications; Health Information Systems; Human-Computer Interaction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Methodologies and Methods; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Software Engineering; Statistics Exploratory Data Analysis; Symbolic Systems
Abstract:
In this study, several data mining techniques were applied to a real credit data set from a public bank to provide automated and objective credit scoring. A two-step methodology was used: first, determining the variables to be included in the model, and second, choosing the model that classifies a credit application as “bad credit” (default) or “good credit” (no default). The phrases “bad credit” and “good credit” are used as class labels because this is how they are used in banking jargon in Turkey. For this two-step procedure, variable selection algorithms such as Random Forest and Boruta and machine learning algorithms such as Logistic Regression, Random Forest, and Artificial Neural Network were tried. At the end of the feature selection phase, the CRA_Score and III_Score variables were identified as the most important variables. Occupation and the number of bank products were also found to be predictors. In the classification phase, the Neural Network model was the best model, with higher accuracy and lower average squared error, and the Random Forest model performed better than the Logistic Regression model.
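The two comparison criteria named in the abstract, accuracy and average squared error, can be illustrated with a minimal sketch. The labels and predicted probabilities below are made-up values, not the study's data. The function names, the 0.5 decision threshold, and the definition of average squared error as the mean squared difference between the 0/1 label and the predicted default probability are assumptions for illustration; the paper may define or compute these measures differently.

```python
def accuracy(y_true, p_default, threshold=0.5):
    """Share of applications whose thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in p_default]
    hits = sum(1 for y, y_hat in zip(y_true, predictions) if y == y_hat)
    return hits / len(y_true)


def average_squared_error(y_true, p_default):
    """Mean squared gap between the 0/1 label and the predicted probability."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_default)) / len(y_true)


# Hypothetical labels: 1 = "bad credit" (default), 0 = "good credit".
y_true = [1, 0, 0, 1]

# Hypothetical predicted default probabilities from two candidate models.
model_a = [0.9, 0.2, 0.1, 0.6]
model_b = [0.6, 0.4, 0.6, 0.4]

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(y_true, probs), average_squared_error(y_true, probs))
```

Under these assumptions, a model is preferred when it has higher accuracy and lower average squared error, which is how the abstract ranks the three classifiers.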