extend churn prediction accuracy and improve
programme usage and analysis through scrutinising
such practices as well as their respective limits.
(Meryem Chajia et al. 2024)
The models were evaluated using accuracy,
recall, F1-score, and precision as
performance metrics, and the Random Forest
classifier obtained an accuracy of 96.12%, which is
higher than that of Decision Trees. The limitations
include reliance on structured data, potential bias, and
the exclusion of advanced deep learning methods.
According to the study, data preprocessing, feature
selection, and model evaluation techniques play
important roles in improving churn prediction
accuracy. These findings may guide future exploration
of ensemble learning, deep learning models, and
cost-sensitive learning that could further refine
predictive ability. (Aditi Chaudhary et al. 2023).
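For illustration, the four metrics named above can be derived from confusion-matrix counts, as in the following minimal Python sketch. The function names and the toy labels are hypothetical and are not taken from the reviewed study; churners are assumed to be encoded as 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1-score as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example (made-up labels): 3 churners among 10 customers.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case the accuracy is 0.8 while precision and recall are both 2/3, which shows why accuracy alone can look optimistic when churners are the minority class.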
This study aims to classify customers in order to
predict churn using machine learning techniques.
The research focuses on imbalanced datasets and
mitigates the imbalance with CTGAN (Conditional
Tabular GAN) and SMOTE (Synthetic Minority
Oversampling Technique). An HSLR model is
proposed, based on hybrid stacking with logistic
regression (LR) as the meta-classifier and random
forest (RF), extreme gradient boosting (XGB),
adaptive boosting (ADA), and light gradient
boosting (LGBM) as base classifiers. The
performance is measured with accuracy, precision,
recall, F1-score, MCC, and ROC score, where
SMOTE-generated data gives better results (94.06%
accuracy). Possible limitations of these methods
include the absence of deep learning techniques, the
potential bias introduced by the synthetic datasets,
and the need for real-time implementation. The
findings indicate a need for future research on deep
learning techniques, real-time churn prediction, and
ethical concerns around AI. (Nomanahmad et al.
2024)
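The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the reviewed study: the function names, the choice of k, and the toy points are hypothetical, and only numeric features are handled.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between
    a randomly chosen minority point and one of its k nearest minority
    neighbours (simplified SMOTE; hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: every other minority point, nearest first.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: squared_distance(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class (made-up values): (monthly charge, tenure in months).
churners = [(70.0, 2.0), (85.0, 5.0), (90.0, 3.0), (78.0, 1.0)]
new_points = smote_sketch(churners, n_new=4, k=2)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class, which is also the source of the synthetic-data bias mentioned above.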
This study examined customer churn prediction using
machine learning algorithms, primarily Support
Vector Machines (SVM). The study highlights factors
that might predict churn, including service quality,
pricing, customer satisfaction, and influence from
competitors. Data preprocessing, feature selection,
and regression methods are then applied to
customer attrition prediction. The SVM model
finds separating hyperplanes and maps data to
higher-dimensional spaces using kernel functions,
enhancing classification accuracy. The approach has
some limitations, such as dependence on structured
data, an inability to adjust to new data in real time,
and the exclusion of deep learning models. Future
studies can leverage deep learning, real-time analytics,
and more sophisticated feature engineering to
improve churn prediction accuracy. (RajaGopal et al.
2021).
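The mapping to a higher-dimensional space can be made concrete with a small worked example. The sketch below assumes a degree-2 polynomial kernel on two-dimensional inputs; the reviewed study does not state which kernel it used, and all names here are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit 3-D feature map whose dot product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


# The kernel value matches the dot product in the mapped space, so an
# SVM never has to build phi(x) explicitly.
x, y = (1.0, 2.0), (3.0, -1.0)
same = math.isclose(poly2_kernel(x, y), dot(phi(x), phi(y)))

# XOR-like points are not linearly separable in 2-D, but the middle
# coordinate of phi separates them by sign.
class_a = [(1.0, 1.0), (-1.0, -1.0)]
class_b = [(1.0, -1.0), (-1.0, 1.0)]
a_side = [phi(p)[1] > 0 for p in class_a]
b_side = [phi(p)[1] < 0 for p in class_b]
```

Here `same` is true, and both `a_side` and `b_side` contain only true values, so a single hyperplane in the mapped space separates the two classes.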
The paper entitled "Analysis and Prediction of
Bank User Churn Based on Ensemble Learning
Algorithm" examines customer churn prediction
in a bank using three ensemble algorithms:
CatBoost, LightGBM, and Random Forest. On
quarterly user data, the model achieves 90% accuracy
and more than 80% AUC, which is useful not only for
customer retention but also for refining the bank's
marketing strategies. Indeed, as the study indicates,
ensemble learning combined with appropriate data
handling can improve the prediction results, but issues
such as overfitting and data optimization must still be
addressed. (Yihui Deng et al. 2021).
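Since AUC is reported alongside accuracy, a short sketch of how it can be computed may help. The version below uses the pairwise-ranking interpretation: AUC is the fraction of (churner, non-churner) pairs in which the churner receives the higher score, with ties counted as one half. The function name and the toy scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Runs in O(P * N) time, which is fine for illustration but slow
    for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 2 churners, 3 non-churners.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(y_true, scores)  # 5 of 6 pairs ranked correctly
```

Unlike accuracy, this measure does not depend on a decision threshold, which is one reason the two figures can differ noticeably on the same model.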
The paper titled "Prediction of Player Churn and
Disengagement Based on User Activity Data of a
Freemium Online Strategy Game" explores
predicting player churn in "The Settlers Online" using
machine learning algorithms like random forests,
decision trees, and neural networks. By analyzing
player activity data and employing methods such as
sliding windows and quartile approaches, the
researchers achieved high accuracy, with AUC values
exceeding 0.99 and prediction accuracies over 97%.
However, the study acknowledges limitations in
generalizing the results to other games and highlights
potential biases and the need for fine-tuning labeling
approaches and feature selection. The findings are
particularly relevant for game developers seeking to
retain players in freemium games. (Karsten
Rothmeier et al. 2020)
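A sliding-window construction of training samples from activity logs might look like the sketch below. The window length, the prediction horizon, the three features, and the rule "no activity in the horizon means churn" are all assumptions made for illustration; the reviewed paper's exact labeling scheme is not reproduced here.

```python
def sliding_window_samples(daily_activity, window=7, horizon=7):
    """Turn one player's daily activity counts into (features, label)
    pairs. Each sample observes `window` days and is labelled 1 (churn)
    if the following `horizon` days show no activity at all."""
    samples = []
    last_start = len(daily_activity) - window - horizon
    for start in range(last_start + 1):
        observed = daily_activity[start:start + window]
        future = daily_activity[start + window:start + window + horizon]
        features = {
            "total_actions": sum(observed),
            "active_days": sum(1 for d in observed if d > 0),
            "last_day_actions": observed[-1],
        }
        label = 1 if sum(future) == 0 else 0
        samples.append((features, label))
    return samples


# Toy activity log (made-up counts): the player goes quiet after day 6.
activity = [5, 3, 0, 4, 2, 1, 0, 0, 0, 0, 0, 0]
samples = sliding_window_samples(activity, window=3, horizon=3)
labels = [label for _, label in samples]
```

With this toy log the labels come out as `[0, 0, 0, 1, 1, 1, 1]`. Overlapping windows from the same player are strongly correlated, so any train/test split should be made per player rather than per sample to avoid inflated accuracy.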
The paper "Development of Churn Prediction
Model using XGBoost - Telecommunication Industry
in Sri Lanka" explores customer churn prediction
using machine learning algorithms like Decision
Tree, Logistic Regression, SVM, ANN, Random
Forest, AdaBoost, and XGBoost. Analyzing data
from 10,000 postpaid users, XGBoost achieved the
highest accuracy of 82.90%, improving to 83.13%
after hyperparameter tuning. The study highlights the
effectiveness of ensemble methods but notes the need
for better feature selection and data pre-processing to
address potential overfitting (Prasanth Senthan et al.
2021).
3 PROPOSED SYSTEM
The system uses supervised machine learning
algorithms, particularly Random Forest, to predict