
Figure 1: Pipeline of proposed methodology
sion prediction in hospitals. The set is divided into
training and test subsets. Several models, namely MLP,
XGBoost, and CatBoost, are trained, and their accuracy
is then verified on the test set. Finally, the system applies
an ensemble learning technique that combines the outputs
of the individual models to improve the prediction of
hospital readmission.
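The combination step described above can be sketched as a weighted average of the probabilities produced by each base model. This is a minimal illustration, not the paper's implementation: the function name, the probability values, the equal default weights, and the 0.5 decision threshold are all assumptions made for the example.

```python
def weighted_average_ensemble(model_probs, weights=None, threshold=0.5):
    """Combine per-model readmission probabilities into one score per patient.

    model_probs: dict mapping a model name to a list of probabilities.
    weights: optional dict of per-model weights (equal weights if omitted).
    """
    names = list(model_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}  # plain averaging
    total_weight = sum(weights[name] for name in names)
    n_patients = len(model_probs[names[0]])

    scores = []
    for i in range(n_patients):
        weighted_sum = sum(weights[name] * model_probs[name][i] for name in names)
        scores.append(weighted_sum / total_weight)

    # Turn the averaged probability into a readmitted / not readmitted label.
    labels = [1 if score >= threshold else 0 for score in scores]
    return scores, labels


# Hypothetical outputs of three base models for four patients (illustrative only).
probs = {
    "mlp": [0.9, 0.4, 0.2, 0.6],
    "xgboost": [0.8, 0.7, 0.1, 0.3],
    "catboost": [0.7, 0.6, 0.3, 0.3],
}
scores, labels = weighted_average_ensemble(probs)
print(labels)
```

Passing a `weights` dictionary, for example one derived from each model's validation performance, turns the plain average into a weighted one.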
The paper is divided into five sections. Section 2 reviews
the machine learning algorithms currently available for
hospital readmission prediction, including MLP, XGBoost,
and CatBoost, together with an overview of several ensemble
learning methods. Section 3 covers the process of preparing
patient data, training models, and combining their predictions
using ensemble methods such as voting or weighted averaging
to produce the final result. Section 4 presents the experimental
results, comparing the performance of the ensemble model with
that of the individual models on important metrics such as
F1-score, recall, and accuracy. Section 5 discusses the
implications of the results and outlines future approaches
for developing strong ensemble learning techniques to improve
hospital readmission prediction.
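For reference, the three metrics named above can be computed from the confusion-matrix counts as in the following sketch. The label vectors are made-up values for illustration and are not results from the paper; the function name is likewise an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = readmitted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only (not data from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall is often emphasized in readmission settings because a missed high-risk patient (a false negative) is typically more costly than a false alarm, while F1 balances recall against precision.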
2 BACKGROUND STUDY
Predicting hospital readmissions is a crucial area
of healthcare analytics that has been studied extensively
using a variety of machine learning methods. Because
they are simple to analyze and implement, traditional
models such as logistic regression (Leonard et al., 2022)
have been used frequently. Logistic regression is a linear
model that performs well when there is a clear correlation
between the input factors (such as age and clinical history)
and the output (readmission risk). However, traditional
models may find it difficult to represent the complex,
non-linear interactions between variables seen in healthcare
data. For instance, interactions between different medical
disorders and treatments can give rise to non-linear
relationships that linear models cannot represent accurately.
As a result, these models frequently lack predictive ability
when dealing with the complexity of healthcare datasets.
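To make the linearity assumption concrete, the sketch below scores a patient with a logistic regression model: each feature contributes additively to the log-odds, which are then passed through the sigmoid function. The feature names, weights, and bias are hypothetical values chosen for illustration, not coefficients fitted to any dataset.

```python
import math


def readmission_probability(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the input features."""
    # Each feature adds weight * value to the log-odds, independently of the others.
    log_odds = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients (assumptions for this example only).
weights = {"age_decades": 0.2, "prior_admissions": 0.5}
bias = -3.0

low_risk = readmission_probability(
    {"age_decades": 4, "prior_admissions": 0}, weights, bias)
high_risk = readmission_probability(
    {"age_decades": 8, "prior_admissions": 3}, weights, bias)
print(round(low_risk, 3), round(high_risk, 3))
```

Because the contributions are purely additive, an effect that appears only when two factors occur together cannot be captured unless an explicit interaction feature is added by hand, which is the limitation the paragraph above describes.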
In the area of hospital readmission prediction, effective
tree-based algorithms such as XGBoost (Hidayaturrohman
and Hanada, 2024; Chen et al., 2023) and CatBoost
(Safaei et al., 2022; Quan and Gopukumar, 2023) have
emerged. XGBoost applies gradient boosting to efficiently
handle structured data with missing values and complex
feature interactions. With its ordered boosting technique,
CatBoost handles categorical features well without requiring
extensive preprocessing. While these models have shown
promise, their computational complexity is frequently a
challenge in resource-constrained settings and in real-time
applications.
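One idea behind CatBoost's treatment of categorical features is ordered target statistics, a component closely related to, but distinct from, ordered boosting: each row's category is replaced by a smoothed average of the target computed only from rows that precede it, so a row never sees its own label. The sketch below is a simplified illustration under stated assumptions. It uses the given row order rather than the random permutations CatBoost employs, and the function name, the `prior` value, and the example data are hypothetical.

```python
def ordered_target_statistics(categories, targets, prior, prior_weight=1.0):
    """Encode each category using only the targets of earlier rows."""
    running_sum = {}    # sum of targets seen so far, per category
    running_count = {}  # number of rows seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        s = running_sum.get(category, 0.0)
        c = running_count.get(category, 0)
        # Smoothed mean of earlier targets; equals the prior for a first occurrence.
        encoded.append((s + prior_weight * prior) / (c + prior_weight))
        # Only now add the current row, so its own label is never leaked.
        running_sum[category] = s + target
        running_count[category] = c + 1
    return encoded


# Hypothetical admission types and readmission labels (illustrative only).
admission_type = ["ER", "ER", "elective", "ER"]
readmitted = [1, 0, 0, 1]
encoded = ordered_target_statistics(admission_type, readmitted, prior=0.5)
print(encoded)
```

Excluding the current row's label is what distinguishes this from a plain per-category target mean, which would leak the label into the feature.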
Deep learning approaches, such as Recurrent Neural
Networks (RNN) (Chopra et al., 2017) and MLP
(Ti’jay Goudjerkan, ; Teo et al., 2023), offer accurate
techniques for handling large and complex datasets.
Such models can capture the sequential patterns and
non-linear relationships in patient data. However,
several factors limit their broader clinical use, including
high computational costs, significant preprocessing needs,
and limited interpretability.
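As a toy illustration of how a hidden layer captures a non-linear relationship, the sketch below hand-sets the weights of a tiny MLP with one hidden layer of two ReLU units so that it outputs 1 only when exactly one of two binary factors is present (an XOR pattern, which no single linear decision boundary can reproduce). The weights are chosen by hand for the example, not learned, and the two inputs are hypothetical binary risk factors.

```python
def relu(x):
    """Rectified linear unit activation."""
    return max(0.0, x)


def tiny_mlp(x1, x2):
    """One hidden layer with two ReLU units and a linear output unit."""
    h1 = relu(x1 + x2)        # active when at least one factor is present
    h2 = relu(x1 + x2 - 1.0)  # active only when both factors are present
    return h1 - 2.0 * h2      # linear output layer combining the hidden units


# Evaluate all four combinations of the two binary inputs.
outputs = {(a, b): tiny_mlp(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

In a trained MLP these weights would be found by gradient descent rather than set manually, and the resulting hidden units are generally much harder to interpret.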
By combining the strengths of multiple models, ensemble
learning techniques (Mienye and Sun, 2022; Yu and Xie,
2019) can significantly increase predictive performance.
To increase stability and reduce variance, techniques such
as voting, stacking, and bagging combine predictions from
various models. Studies have shown that ensembles frequently
perform better than individual models when managing the
complexity of healthcare datasets. However, a number of
current ensemble approaches may be inefficient because they
lack sufficient diversity among their base models.
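The role of diversity among base models can be illustrated with hard (majority) voting and a simple pairwise disagreement measure. In this sketch, which uses made-up predictions and hypothetical function names, three models that each err on a different patient are corrected by the vote, whereas three identical models gain nothing from being combined.

```python
from itertools import combinations


def majority_vote(predictions):
    """Hard voting: predict 1 when more than half of the models predict 1."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def mean_pairwise_disagreement(predictions):
    """Average fraction of patients on which two models give different labels."""
    pairs = list(combinations(predictions, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(1 for x, y in zip(a, b) if x != y) / len(a)
    return total / len(pairs)


# Illustrative labels and predictions (not data from the study).
y_true = [1, 0, 1, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0]  # wrong on the fifth patient
model_b = [1, 0, 0, 0, 1, 0]  # wrong on the third patient
model_c = [1, 1, 1, 0, 1, 0]  # wrong on the second patient

diverse_vote = majority_vote([model_a, model_b, model_c])
identical_vote = majority_vote([model_a, model_a, model_a])
diversity = mean_pairwise_disagreement([model_a, model_b, model_c])
print(diverse_vote == y_true, identical_vote == model_a, diversity)
```

Disagreement alone is not sufficient, since the base models must also be individually accurate, but without some disagreement a vote cannot correct any single model's errors.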
Other machine learning techniques, such as Naive Bayes
(Rao and Battula, 2019), Random Forests (Bleich et al.,
2021; Kalusivalingam et al., 2012), and Support Vector
Machines (SVM) (Wang and Paschalidis, 2019), have also
been used to predict hospital readmissions. These methods
work well on smaller datasets or in specific applications,
but they are unlikely to cope with large volumes of
complex healthcare data.
Although the previously discussed studies are
helpful, there are circumstances where they fall short
in terms of generalization, data handling, and model
interpretability. It is challenging to deal with limited,
imbalanced datasets and categorical features. Neural
networks and other high-accuracy models frequently