
2 RELATED WORKS
Machine learning-based heart disease prediction has
become one of the most extensively studied fields.
A major reason for this is the Random Forest (RF)
algorithm, which is well suited to high-dimensional
data, captures feature interactions, and is resistant
to overfitting. There have been many
studies that used RF to predict cardiovascular-related
outcomes, in most cases showing it to perform better
than classical methods. Xuanyi Tao evaluates various
machine learning models, with a focus on the
Random Forest model, for predicting
cardiovascular diseases based on critical health
indicators such as age, blood pressure, cholesterol
levels, and heart rate. Yu Wan et al. identify the
challenges of diagnosing heart failure purely from
clinical symptoms and emphasize the importance
of applying data-driven methods for early detection.
From the analysis of a dataset consisting
of key health determinants like creatinine
phosphokinase (CPK), serum creatinine (SCR),
ejection fraction (EF), age, and follow-up time
intervals, the study demonstrates that CPK is the most
significant indicator of heart failure. Mienye and
Yanxia Sun employ a range of machine learning
algorithms, including Decision Trees, Logistic
Regression, Support Vector Machines, Random Forest,
XGBoost, and Adaptive Boosting (AdaBoost), to find
out how well they predict heart disease. In Zerui
Jiang, Logistic Regression has better classification
accuracy and predictive ability than Random Forest.
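The kind of head-to-head comparison reported in these studies can be sketched as follows. This is only an illustrative sketch, not the procedure of any cited paper: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for real patient records, and the helper name `compare_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular heart-disease dataset (assumption:
# 500 patients, 10 numeric indicators, binary outcome).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Two of the model families compared in the studies above.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy of each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Which model comes out ahead depends on the dataset and the evaluation protocol, which is one plausible reason the cited studies do not agree on a single best algorithm.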
Thoutireddy Shilpa and Anal Paul propose a CVD
Prediction Framework (CVDPF) that combines
machine learning algorithms with HFS,
an aggregation of multiple filter-based
methods to make predictions more accurate. Hui
Yuan et al. examine four essential biomarkers,
namely CK-MB, BNP, Galectin-3 (Gal-3), and sST2,
and use the Random Forest algorithm to
enhance the precision of predictions. Anamta
Siddiqui and Syed Wajahat Abbas Rizvi combine
a variety of models, including Random
Forest, Decision Trees, Support Vector Machine
(SVM), K-Nearest Neighbors (KNN), and
Logistic Regression, to analyze patient data and find
risk factors linked to heart disease. Shagufta Rasheed
et al. use Random Forest, Support Vector
Machine (SVM), AdaBoost, Logistic Regression, and
Naive Bayes methods to analyze cardiovascular and
clinical information, with a focus on the optimization
of hyperparameters using GridSearchCV to enhance
the accuracy of the models. Muhammad Yoga et al.
combine filter- and wrapper-based feature
selection techniques such as Chi-Square (CS),
Correlation-Based Selection of Features (CSF), and
Forward Selection (FS) to tackle practical
issues like noisy features, high-dimensional datasets,
and premature convergence. Nesma Elsayed et al.
find that the Random Forest model
outperforms the rest of the models with the best
accuracy, precision, and recall. Peiyang Yu et al.
report that applying Particle Swarm Optimization (PSO)
to improve the Transformer model increased
classification accuracy to 96.5%, surpassing the
performance of traditional machine learning
techniques. Kalaivani B and Ranichitra A
reduced the dimensionality of the data and improved
the classification efficiency by combining the
LASSO technique with differential entropy-based
Information Gain for feature selection. Proshanta
Kumar Bhowmik et al. report that
Logistic Regression achieved the highest ROC-AUC
value, balancing true positives against
false positives, while Support Vector Machine
(SVM) had the highest accuracy. Ochim Gold and
Agaji Iorshase created and compared models
using WEKA software; the J48 and AdaBoost
combination achieved an accuracy of
92.3%, beating the Random Forest model, which
recorded an accuracy of 89.2%. In Joel Paul, both
models, Support Vector Machines (SVM) and
Random Forest (RF), show strong predictive
performance, but Random Forest outperforms SVM
in terms of accuracy and generalizability. In
Madhumita Pal and Smita Parija, the results of the study
reveal that the Random Forest algorithm is an efficient
machine learning model for classifying heart disease.
Subsequent studies may look at other models
such as Naive Bayes, Decision Trees, and K-Nearest
Neighbors (KNN) to enhance accuracy further.
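Several of the studies above rank models by accuracy, precision, recall, or ROC-AUC, and these metrics do not always agree. The following is a minimal pure-Python sketch of how they are computed for a binary classifier. The labels and scores are made-up illustrative values, not data from any cited study, and the function names and the 0.5 decision threshold are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (label 1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random
    negative; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Threshold the scores, then report all metrics together."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    result = classification_metrics(y_true, y_pred)
    result["roc_auc"] = roc_auc(y_true, scores)
    return result


# Illustrative values only: four patients with disease, four without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05]        # hypothetical model A
scores_b = [0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]  # hypothetical model B

metrics_a = evaluate(y_true, scores_a)
metrics_b = evaluate(y_true, scores_b)
print("model A:", metrics_a)
print("model B:", metrics_b)
```

In this toy example model A has the higher accuracy at the 0.5 threshold (0.75 against 0.5), while model B ranks every positive above every negative and so has the higher ROC-AUC (1.0 against 0.875). A study can therefore name a different best model depending on the metric it reports.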
Ramanathan Gopalakrishnan and Jagadeesha
assess candidate models using metrics such as
F1-Score, ROC-AUC, and accuracy, identifying the
most effective method for CAD prediction. L.
Vindhya et al. show that the maximum accuracy
rate of 85.5% was attained by the Support Vector
Machine (SVM) using a hybrid feature selection
strategy that combines Information Gain,
Symmetrical Uncertainty, and Correlation-based
Feature Selection (CFS). This demonstrates how
important effective feature selection is for
significantly improving model performance. In Didik
Setiyadi et al. and Tsehay Admassu et al., the Support
Vector Machine (SVM) was found to have the highest
accuracy of 85%, outperforming Random Forest (RF)
and Neural Networks (NN). The results of the
research show that SVM is a reliable tool in
ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,
COMMUNICATION, AND COMPUTING TECHNOLOGIES