Machine Learning for Heart Disease Identification in E-Healthcare

Shaik Mohammad Riyaz Naik, Anapati Sai Venkat, Kandukuri Dhanush, Potu Mani Kanta,

Palukuru Sumanth Royal and Akula Naveen Kumar

Department of Computer Science and Engineering, Santhiram Engineering College, Nandyal‑518501, Andhra Pradesh,

India

Keywords: Cardiovascular Disease, Heart Disease, Machine Learning App, ML Algorithms, SDG 3, SHAP, SMOTE.

Abstract: Heart disease is a leading cause of mortality worldwide, with early detection playing a critical role in reducing

death rates. Accurate prediction of heart disease remains challenging due to complex medical data and the

inability to provide continuous monitoring. Utilizing the heart disease dataset, various feature selection

techniques, including ANOVA F-statistic (ANOVA FS), Chi-squared test (Chi2 FS), and Mutual Information

(MI FS), were employed to identify significant predictors. Synthetic Minority Oversampling Technique

(SMOTE) was applied to address data imbalance and enhance model performance. A comprehensive

classification approach was undertaken using diverse machine learning models and ensemble methods.

Among these, a Stacking Classifier combining Boosted Decision Trees, Extra Trees, and LightGBM achieved

superior results, delivering 100% accuracy across all feature selection techniques. The high performance

highlights the effectiveness of advanced ensemble learning in achieving reliable heart disease predictions,

emphasizing the potential of integrating robust feature selection with sophisticated classification models for

precise medical data analysis. This approach demonstrates the capacity to support early diagnosis and

improved patient outcomes.

1 INTRODUCTION

Cardiac diseases are one of the leading causes of

mortality worldwide with heart disease having a

major share. The muscular heart is responsible for

pumping blood through the rest of the body as part of

the circulatory system. This complexity is composed

of arteries, veins and capillaries through which

oxygen and nutrients flow to organs and tissues.

CVD: If the normal flow of blood is disturbed, it

leads to different types of heart disorder called

cardiovascular diseases (CVD). Cardiovascular

disease and stroke kill more than 17.5 million people

around the world each year, more than 75% of these

deaths take place in low and middle-income

countries, according to the World Health

Organization (WHO). This alarming statistic

emphasizes the public health challenge of heart

disease that is escalating all around the world, as

myocardial infarctions and strokes result in 80% of

total CVD-related deaths.

The burden of cardiovascular diseases has led to a

global focus on early detection, prevention, and

treatment strategies. In line with the United Nations'

Sustainable Development Goal 3, which emphasizes

the importance of health and well-being, addressing

cardiovascular diseases has become a priority for

improving global health outcomes. Common risk

factors for of heart disease risk factors include

smoking, older age, family history of heart disease,

high cholesterol, lack of exercise, high blood

pressure, obesity, diabetes, and stress. Quitting

smoking, exercising on a regular basis, maintaining a

healthy weight, and managing stress are all lifestyle

choices that can decrease the risk of heart disease. In

addition to lifestyle interventions, you may take tests

like ECGs, echocardiograms, MRIs and blood tests to

find heart disease. Medical procedures such as

angioplasty, bypass, and using implanted devices,

such as pacemakers or defibrillators can also be

necessary for treatment.

Advancements in healthcare technology,

particularly in the realm of Big Data and Electronic

Health Records (EHRs), have made it possible to

leverage vast amounts of patient data for predictive

modeling. Machine learning (ML) techniques are

Naik, S. M. R., Venkat, A. S., Dhanush, K., Kanta, P. M., Royal, P. S. and Kumar, A. N.

Machine Learning for Heart Disease Identiﬁcation in E: Healthcare.

DOI: 10.5220/0013907000004919

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies (ICRDICCT‘25 2025) - Volume 3, pages

861-870

ISBN: 978-989-758-777-1

861

increasingly being used to analyze large datasets from

healthcare systems, extracting meaningful insights to

predict the likelihood of heart disease. By processing

and analyzing data from various patient

demographics, risk factors, and diagnostic results,

machine learning can assist healthcare professionals

in identifying patients at high risk and enable early

intervention. This approach is transforming the

landscape of healthcare by offering more precise and

efficient methods of diagnosis, prediction, and

personalized treatment plans.

2 RELATED WORK

Cardiovascular diseases (CVD), including coronary

heart disease, have remained a global health concern,

accounting for a significant proportion of worldwide

mortality. With the rise in healthcare data and

advancements in machine learning (ML) techniques,

there has been a surge in efforts to predict and

diagnose heart disease more effectively. The ability

to analyze large datasets using machine learning

offers new avenues for identifying risk factors,

predicting outcomes, and improving early detection.

This literature survey discusses various studies that

have employed different machine learning models

and techniques to predict heart disease and highlights

their findings.

Yang et al. conducted a study using machine

learning to identify risk factors for coronary heart

disease. Their work focused on big data analysis,

demonstrating that machine learning models such as

decision trees, random forests, and support vector

machines (SVM) could effectively identify key risk

factors from patient data. The study found that high

cholesterol, age, and family history were among the

most significant risk factors contributing to the

prediction of heart disease. Furthermore, the study

underscored the importance of data preprocessing,

feature selection, and the need for a robust dataset in

improving model performance. The results confirmed

that machine learning could be a powerful tool for the

early detection of heart disease, especially when

combined with large, comprehensive datasets.

In a similar vein, Ngufor et al. reviewed several

machine learning algorithms for heart disease

prediction. The authors provided a comparative

analysis of popular techniques like SVM, decision

trees, k-nearest neighbors (KNN), and artificial

neural networks (ANNs). Their findings indicated

that ensemble methods, such as bagging and boosting,

provided superior predictive performance over

individual models. Additionally, the study

emphasized the importance of feature selection, as

irrelevant features can degrade model accuracy. This

review highlighted that while various algorithms

could predict heart disease, the choice of technique

depends heavily on the dataset characteristics, the

computational resources available, and the specific

requirements of the prediction task.

Farag et al. focused on improving heart disease

prediction using boosting and bagging techniques.

Boosting algorithms, such as AdaBoost, and bagging

techniques, like Random Forest, were tested to assess

their ability to enhance prediction accuracy. The

study found that ensemble techniques were more

robust than standalone classifiers in terms of reducing

variance and improving the stability of predictions.

Furthermore, the study suggested that the

combination of boosting and bagging could mitigate

the overfitting problem, which is often encountered in

heart disease prediction models. This work

underlined the importance of using multiple

classifiers in conjunction to achieve optimal

performance.

Zhang et al. investigated the application of

XGBoost, a gradient boosting algorithm, in the

clinical prediction of coronary heart disease. Their

study showed that XGBoost outperformed traditional

methods like logistic regression and SVM in terms of

both accuracy and interpretability. XGBoost’s ability

to handle imbalanced datasets, which is a common

issue in heart disease prediction, made it particularly

suitable for medical data. The study also pointed out

that hyperparameter tuning played a crucial role in

maximizing the model's performance. XGBoost's

superior performance in clinical environments can be

attributed to its efficiency, scalability, and ability to

provide a reliable prediction while reducing the risk

of overfitting.

Liu et al. conducted a comparative analysis of

several machine learning algorithms, including

decision trees, SVM, and random forests, for heart

disease prediction. Their study found that SVM with

radial basis function (RBF) kernels provided the

highest prediction accuracy among the algorithms

tested. However, they also noted that decision trees

and random forests offered better interpretability,

which is crucial in a medical setting. The study

concluded that while SVM showed the best accuracy,

the choice of algorithm should be based on the trade-

off between accuracy and interpretability, especially

when the model needs to be used by healthcare

professionals for decision-making.

Hussein et al. compared several machine learning

techniques for heart disease diagnosis, including

KNN, decision trees, and artificial neural networks.

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

862

The study aimed to assess the models' diagnostic

abilities using a dataset of patient health records.

Their results indicated that KNN and decision trees

performed well with lower computational overhead,

making them suitable for real-time applications in

clinical settings. While artificial neural networks

provided higher accuracy, they required more

computational resources and were harder to interpret.

The research emphasized that the adoption of

machine learning models in healthcare systems must

consider both performance and resource constraints.

Akbar et al. conducted a critical review of various

machine learning approaches for heart disease

prediction. They evaluated techniques such as

decision trees, KNN, SVM, random forests, and

neural networks, and provided an overview of their

strengths and weaknesses in predicting heart disease.

The authors concluded that ensemble methods,

particularly Random Forest, achieved the highest

accuracy due to their ability to reduce overfitting and

handle noisy data. However, the review also

highlighted the challenge of selecting the most

relevant features from large datasets, as feature

selection plays a crucial role in the model’s

performance. Additionally, the study emphasized the

importance of handling missing data and ensuring

that the dataset used for training is both representative

and balanced.

Zarshenas et al. performed a comparative study of

machine learning algorithms for predicting heart

disease, focusing on classifiers such as SVM,

decision trees, KNN, and logistic regression. Their

results showed that SVM and Random Forest

achieved the best predictive performance, with SVM

offering higher accuracy in certain cases. They also

highlighted the importance of preprocessing steps,

such as normalization and feature scaling, in

improving the performance of machine learning

models. The study suggested that hybrid models,

which combine the strengths of multiple algorithms,

could be a promising direction for future research in

heart disease prediction.

A common theme in the studies reviewed is the

importance of feature selection and data

preprocessing in enhancing the performance of heart

disease prediction models. Many studies emphasize

the use of ensemble techniques like bagging and

boosting to improve model accuracy, while others

suggest that hybrid models combining different

machine learning algorithms could yield better

results. Additionally, while traditional models like

decision trees and logistic regression are still widely

used, more recent studies have shown that advanced

algorithms like XGBoost and neural networks can

outperform these methods, particularly when dealing

with complex and high-dimensional data.

The increasing availability of healthcare data and

the advent of machine learning techniques have

opened new possibilities for the early detection and

diagnosis of heart disease. Machine learning models

can help healthcare providers identify patients at risk,

enabling timely interventions and reducing the

burden of heart disease. However, challenges such as

data quality, feature selection, model interpretability,

and computational efficiency must be addressed to

fully realize the potential of these technologies in

clinical practice. The combination of advanced

machine learning models with domain-specific

knowledge from healthcare professionals will likely

drive the next wave of innovations in heart disease

prediction.

3 MATERIALS AND METHODS

The system under design aims at building an

efficient heart disease prediction model with the

support of machine learning and advanced census

strategies. The Heart Disease data is preprocessed

and important features are selected by the methods,

ANOVA F-statistic (ANOVA FS), Chi-squared

(Chi2 FS) and Mutual Information (MI FS). To deal

with class imbalance, SMOTE is employed to balance

the data distribution.

The system evaluates multiple machine learning

algorithms, including Naïve Bayes, Support Vector

Machines (SVM), XGBoost, AdaBoost, Bagging

Classifier, Decision Tree, K-Nearest Neighbor

(KNN), Random Forest, and Logistic Regression. An

ensemble Voting Classifier combines the predictions

from these models to improve overall accuracy and

robustness. Additionally, a Stacking Classifier

integrates Boosted Decision Tree, Extra Trees, and

LightGBM to exploit their complementary strengths.

This hybrid approach aims to enhance the precision

and reliability of heart disease prediction, aiding early

detection and better clinical decision-making.

Figure 1: Proposed architecture.

Machine Learning for Heart Disease Identiﬁcation in E: Healthcare

863

This image (Figure 1) depicts a flowchart for a

heart disease prediction model. It starts with data

preparation and preprocessing from heart disease

datasets. Then, feature selection methods (ANOVA

FS, Chi2 FS, MIFS) are applied. The dataset is then

fed into various machine learning classifiers (NB,

KNN, SVM, RF, XGBoost, LR, AdaBoost, Voting

Classifier, Bagging, Stacking Classifier, DT). The

model undergoes performance analysis and

evaluation to assess its accuracy and effectiveness in

predicting heart disease.

3.1 Dataset Collection

The dataset used for heart disease prediction consists

of 303 samples with 14 features, including both

numerical and categorical variables. These features

capture critical patient information, which include

age, sex, chest pain type (cp), resting blood pressure

(trestbps), serum cholesterol in mg/dl (chol), fasting

blood sugar (fbs), resting ecg results (restecg), max

heart rate (thalach), exercise induced angina (exang),

ST depression induced by exercise (oldpeak), slope

of excercise ST segment (slope), number of major

vessels colored by fluoroscopy (ca), and thalassemia

(thal). The target is a binary variable representing

whether or not the heart disease is present. After

applying feature selection techniques, such as

ANOVA F-statistic (ANOVA FS), Chi-squared test

(Chi2 FS), and Mutual Information (MI FS), different

sets of features were selected to improve model

accuracy and efficiency, ensuring that the dataset is

optimized for predicting heart disease outcomes.

Figure 2 shows Dataset Collection Table – Heart

Disease Data.

Figure 2: Dataset collection table – Heart disease data.

3.2 Pre-Processing

Pre-processing is a crucial step in preparing the

dataset for machine learning. It involves cleaning and

transforming data to ensure accuracy, efficiency, and

relevance. Proper handling of missing values,

encoding, and feature selection significantly

enhances model performance and robustness.

3.2.1 Data Processing

The data processing begins with cleaning, which

involves removing missing values and correcting

inconsistencies. Unwanted columns are dropped to

streamline the dataset. Label encoding is applied to

categorical variables, and the features are separated

into input (X) and output (y) datasets, ensuring proper

structuring for analysis. These steps ensure the

dataset is ready for model training.

3.2.2 Data Visualization

Data visualization helps to understand the

relationships between variables and uncover hidden

patterns. A correlation matrix is created to explore the

relationships between numerical features, while

sample outcomes are visualized to check for data

distribution and trends. This aids in identifying

relevant features and understanding how they impact

the target variable.

3.2.3 Label Encoding

Label encoding transforms categorical labels into

numerical values, enabling models to process non-

numeric data. This process converts each category

into a unique integer, making it suitable for machine

learning algorithms that require numerical inputs.

Label encoding is particularly useful when there is an

inherent order in the categorical data.

3.2.4 Oversampling

In order to tackle this, we would use SMOTE

(Synthetic Minority Over-sampling Technique) to

generate synthetic examples for the minority class.

This method is suggested for balancing the dataset as

sometimes the model may overfit the majority class.

It's a smart way to improve generalization and

performance of models, especially on unbalanced

datasets.

3.2.5 Feature Selection

Feature selection helps to identify the most relevant

variables for model training. Techniques like

ANOVA F-statistic, Chi-squared test, and Mutual

Information Feature Selection (MIFS) are applied to

filter out irrelevant features, improving the model’s

efficiency and accuracy. By reducing the number of

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

864

features, the model complexity is lowered, leading to

faster computation and better generalization.

3.3 Training & Testing

Data is splitted into training and test sets to assess

model performance. An 80/20 ratio is maintained:

80% of the data is used to train the model’s

classifier(s), while 20% is reserved for testing of that

model. This division ensures there is enough data for

the model to learn from and some completely separate

data for validation. The separation is important to

evaluate how well the model can generalize and

perforhttp://www.jmlr.org/papers/volume10/cawr07

a/cawr07a.pdf m on unknown test data.

3.4 Algorithms

Naive Bayes is employed for its simplicity and

efficiency in handling large datasets. It leverages

Bayes' theorem to classify heart disease risk based on

feature probabilities, making it particularly effective

for categorical data.

Support Vector Machine (SVM) is used to find the

optimal hyperplane that separates different classes. It

excels in high-dimensional spaces, making it suitable

for complex feature interactions in heart disease

prediction.

XGBoost is implemented for its powerful boosting

capabilities, enhancing model accuracy through

iterative learning. It combines weak learners into a

strong predictive model, making it highly effective

for predicting heart disease risk.

AdaBoost focuses on improving weak classifiers by

emphasizing misclassified instances. This iterative

approach increases prediction accuracy, making it a

valuable technique for robust heart disease

classification in the model.

Bagging Classifier is utilized to reduce variance and

enhance model stability. By combining predictions

from multiple models trained on different subsets of

data, it helps improve heart disease risk predictions.

Decision Tree algorithm is employed for its

interpretability and ease of understanding. It splits

data based on feature values, providing clear insights

into decision-making processes for heart disease

prediction.

K-Nearest Neighbors (KNN) is used for its

straightforward approach to classification based on

proximity. It assesses the nearest data points to

classify heart disease risk, leveraging similarity

among instances.

Random Forest combines multiple decision trees to

enhance prediction accuracy and control overfitting.

This ensemble method is effective for heart disease

prediction, providing robust results across various

datasets.

Logistic Regression is implemented to model the

probability of heart disease occurrence. It estimates

relationships between dependent and independent

variables, making it suitable for binary classification

tasks in the system.

Voting Classifier aggregates predictions from

multiple models, including Naive Bayes, SVM, and

others. This ensemble approach enhances overall

prediction accuracy by leveraging the strengths of

various algorithms for heart disease classification.

Stacking Classifier combines predictions from a

Boosted Decision Tree and ExtraTree with

LightGBM. This layered approach integrates multiple

models, optimizing performance and accuracy for

heart disease predictions by capturing complex

patterns in data.

4 RESULTS & DISCUSSION

Accuracy: The validity of a test is to distinguish the

positive from the negative patients and healthy

human being. The true positive and true negative

proportion out of all examined cases value is what we

should use to get the idea of what to expect from a

test. Mathematically, we have the following:

𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 =

TP + TN

TP + FP + TN + FN

(1)

Precision: Precision measures the ratio of correctly

classified instances or samples of a class among the

ones that have been classified as that class. So, the

precision can be calculated using the following

formula:

𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 =

True Positive

True Positive + False Positive

(2)

Recall: Recall is a value from machine learning that

tells you how good a model is at finding all the things

of a particular type. It gives you an idea of when it is

the case that a class behaves properly inside the

model (its (sufficiency) The ratio of true positive to

the actual positive is all about a model’s

comprehensiveness (how good a model is at capturing

a given class)

𝑅𝑒𝑐𝑎𝑙𝑙 =

TP + FN

3

Machine Learning for Heart Disease Identiﬁcation in E: Healthcare

865

F1-Score: F1 score is an evaluation metric used to

assess the performance of machine learning models.

It is based both on the precision and recall of a

model. Accuracy returns how many times a model

made the correct prediction over the whole dataset.

𝐹1 𝑆𝑐𝑜𝑟𝑒 = 2 ∗

𝑅𝑒𝑐𝑎𝑙𝑙 X 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛

𝑅𝑒𝑐𝑎𝑙𝑙 + 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛

∗ 100 (4)

Tables (1, 2& 3) assess the metrics of accuracy,

precision, recall, and F1-scorefor each algorithm.

The Stacking Classifier significantly outperforms all

the other algorithms in all the measures. The tables

provide similar comparisons of the metrics for the

other algorithms.

Figure 3: Comparison graphs for Anova FS.

Table 1: Performance Evaluation Metrics for Anova FS.

Model Accuracy Precisio

Recall F1

Score

Naive

0.848 0.850 0.848 0.849

SVM 0.682 0.686 0.682 0.682

XGBoost 0.818 0.820 0.818 0.818

AdaBoost 0.833 0.845 0.833 0.834

Bagging 0.818 0.826 0.818 0.819

Decision

Tree

0.788 0.790 0.788 0.788

KNN 0.727 0.729 0.727 0.728

Random

Forest

0.864 0.868 0.864 0.864

Logistic

Regressio

0.864 0.864 0.864 0.864

Voting 0.818 0.818 0.818 0.818

Stacking 1.000 1.000 1.000 1.000

Table 2: Performance evaluation metrics for Chi2 FS.

Model Accuracy Precision Recall F1

Score

Naive

0.788 0.790 0.788 0.788

SVM 0.652 0.656 0.652 0.652

XGBoost 0.818 0.835 0.818 0.820

AdaBoost 0.773 0.773 0.773 0.773

Bagging 0.803 0.815 0.803 0.804

Decision

Tree

0.727 0.735 0.727 0.728

KNN 0.621 0.622 0.621 0.621

Random

Forest

0.879 0.886 0.879 0.879

Logistic

ression

0.803 0.807 0.803 0.803

Voting 0.818 0.818 0.818 0.818

Stacking 1.000 1.000 1.000 1.000

Figure 4: Comparison graphs for HHO FS in Chi2 FS.

0.2

0.4

0.6

0.8

1.2

Naive Bayes

SVM

XGBoost

AdaBoost

Bagging

Decision Tree

KNN

Random Forest

Logistic…

Voting

Stacking

COMPARISON GRAPHS FOR ANOVA FS

Accuracy Precision Recall F1 Score

0.2

0.4

0.6

0.8

1.2

Naive Bayes

SVM

XGBoost

AdaBoost

Bagging

Decision Tree

KNN

Random Forest

Logistic Regression

Voting

Stacking

COMPARISON GRAPHS FOR

CHI2 FS

Accuracy Precision Recall F1 Score

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

866

Table 3: Performance evaluation metrics for MI FS.

Model Accurac

Preci

sion

Reca

Scor

Naive

Bayes

0.848 0.850 0.848 0.849

SVM 0.636 0.636 0.636 0.636

XGBoos

0.833 0.845 0.833 0.834

AdaBoos

0.833 0.834 0.833 0.833

Bagging 0.833 0.834 0.833 0.833

Decision

Tree

0.773 0.784 0.773 0.774

KNN 0.682 0.682 0.682 0.682

Random

Forest

0.879 0.881 0.879 0.879

Logistic

Regressi

0.864 0.864 0.864 0.864

Voting 0.864 0.864 0.864 0.864

Stacking 1.000 1.000 1.000 1.000

Figure 5: Comparison graphs for MI FS.

We use figure 3,4 & 5 to show the visualization

of the accuracy in light blue, recall in grey, precision

in orange and f1-Score in light yellow. In contrast to

the other models, our Stacking Classifier performed

best on all metrics and reached the maximum values

overall. These findings emerge in the graphs above.

Figure 6: Home page.

In the above figure 6, this is a user interface

dashboard, it is a welcome message for navigating

page.

Figure 7: ANOVA dataset loading.

In the above figure 7, this is a user input page, using

this user can upload ANOVA dataset for testing.

Figure 8: Test result.

0.2

0.4

0.6

0.8

1.2

Naive Bayes

SVM

XGBoost

AdaBoost

Bagging

Decision Tree

KNN

Random Forest

Logistic…

Voting

Stacking

COMPARISON GRAPHS

FOR MI FS

Accuracy Precision Recall F1 Score

Machine Learning for Heart Disease Identiﬁcation in E: Healthcare

867

In the above figure 8, this is a result screen, in this

user will get output for loaded input data.

Figure 9: CHI2 dataset loading.

In the above figure 9, this is a user input page, using

this user can upload CHI2 dataset for testing.

Figure 10: Test result.

In the above figure 10, this is a result screen, in this

user will get output for loaded input data.

Figure 11: MI dataset loading.

In the above figure 11, this is a user input page, using

this user can upload MI dataset for testing.

Figure 12: Test result.

In the above figure 12, this is a result screen, in this

user will get output for loaded input data.

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

868

5 CONCLUSIONS

In conclusion, the proposed system demonstrates the

effectiveness of using advanced machine learning

techniques for predicting heart disease with high

accuracy. By utilizing feature selection methods such

as ANOVA F-statistic, Chi-squared test, and Mutual

Information, the system successfully identifies key

predictors, enhancing the overall performance of the

model. The application of SMOTE for addressing

class imbalance further improves the model's ability

to detect heart disease cases, ensuring balanced and

reliable predictions.

Among the various algorithms tested, the

Stacking Classifier, which combines Boosted

Decision Trees, Extra Trees, and LightGBM,

achieved the highest performance, delivering a

remarkable 100% accuracy across all feature

selection techniques. This result underscores the

power of ensemble methods in combining the

strengths of individual classifiers to improve

predictive accuracy. By integrating robust feature

selection with sophisticated ensemble learning, the

proposed system significantly contributes to the

accurate and early detection of heart disease,

demonstrating its potential for real-world clinical

applications and decision-making in healthcare.

In future work, additional techniques such as deep

learning models, including neural networks, can be

explored to improve prediction accuracy. The use of

advanced ensemble methods, like Gradient Boosting

or stacking with more diverse base classifiers, could

offer further improvements. Incorporating additional

feature selection methods, such as Recursive Feature

Elimination (RFE) or L1 regularization, may refine

the model's performance. Exploring time-series data

and incorporating temporal factors could also provide

more comprehensive insights for predicting heart

disease outcomes.

REFERENCES

(2023). World Health Organization. Cardiovascular

Diseases (CVDs). Accessed: May 5,

2023.[Online].Available:https://www.afro.who.int/

health-topics/cardiovascular-diseases

A. Farag, A. Farag, and A. Sallam, ‘‘Improving heart

disease prediction using boosting and bagging

techniques,’’ in Proc. Int. Conf. Innov. Trends Comput.

Eng. (ITCE), Mar. 2016, pp. 90–96.

A. Zarshenas, M. Ghanbarzadeh, and A. Khosravi, ‘‘A

comparative study of machine learning algorithms for

predicting heart disease,’’ Artif. Intell. Med., vol. 98,

pp. 44–54, Oct. 2019.

C. Ngufor, A. Hossain, S. Ali, and A. Alqudah, ‘‘Machine

learning algorithms for heart disease prediction: A

survey,’’ Int. J. Comput. Sci. Inf. Secur., vol. 14, no. 2,

pp. 7–29, 2016.

C. Gupta, A. Saha, N. S. Reddy, and U. D. Acharya,

‘‘Cardiac disease prediction using supervised machine

learning techniques,’’ J. Phys., Conf. Ser., vol. 2161,

no. 1, 2022, Art. no. 012013.

Chaitanya, V. Lakshmi. "Machine Learning Based

Predictive Model for Data Fusion Based Intruder Alert

System." journal of algebraic statistics 13.2 (2022):

2477-2483.

I. Kaur G. Singh, ‘‘Comparative analysis of machine

learning algorithms for heart disease prediction,’’ J.

Biomed. Inform., vol. 95, Jul. 2019, Art. no. 103208.

J. Wu, J. Roy, and W. F. Stewart, ‘‘A comparative study of

machine learning methods for the prediction of heart

disease,’’ J. Healthcare Eng., vol. 2017, Jan. 2017, Art.

no. 7947461.

K. Shameer, ‘‘Machine learning predictions of

cardiovascular disease risk in a multi-ethnic population

using electronic health record data,’’ Int. J. Med.

Inform., vol. 146, Feb. 2021, Art. no. 104335.

M. Yang, X. Wang, F. Li, and J. Wu, ‘‘A machine learning

approach to identify risk factors for coronary heart

disease: A big data analysis,’’ Comput. Methods

Programs Biomed., vol. 127, pp. 262–270, Apr. 2016.

Mandalapu, Sharmila Devi, et al. "Rainfall prediction using

machine learning." AIP Conference Proceedings.Vol.

3028.No. 1.AIP Publishing, 2024.

N. Samadiani, A. M. E Moghadam, and

C.Motamed,‘‘SVM-based classification of

cardiovascular diseases using feature selection: A high-

dimensional dataset perspective,’’ J. Med. Syst., vol.

40, no. 11, p. 244, Nov. 2016.

N. S. Hussein, A. Mustapha, and Z. A. Othman,

‘‘Comparative study of machine learning techniques

for heart disease diagnosis,’’ Comput. Sci. Inf. Syst.,

vol. 17, no. 4, pp. 773–785, 2020.

ParadesiSubbaRao,”Detecting malicious Twitter bots using

machine learning” AIP Conf. Proc. 3028,

020073(2024),https://doi.org/10.1063/5.0212693

ParumanchalaBhaskar, et al. "Incorporating Deep Learning

Techniques to Estimate the Damage of Cars During the

Accidents" AIP Conference Proceedings. Vol.

3028.No. 1.AIP Publishing, 2024.

S. Akbar, R. Tariq, and A. Basharat, ‘‘Heart disease

prediction using different machine learning approaches:

A critical review,’’ J. Ambient Intell. Humanized

Comput., vol. 11, no. 5, pp. 1973–1984, 2020.

S. Gour, P. Panwar, D. Dwivedi, and C. A. Mali, ‘‘Machine

learning approach for heart attack prediction,’’ in

Intelligent Sustainable Systems. Singapore: Springer,

2022, pp. 741–747.

Suman, Jami Venkata, et al. "Leveraging natural language

processing in conversational AI agents to improve

healthcare security." Conversational Artificial

Intelligence (2024): 699-711.

X. Chen, Z. Hu, and Y. Cao, ‘‘Heart disease diagnosis using

decision tree and naïve Bayes classifiers,’’ World

Machine Learning for Heart Disease Identiﬁcation in E: Healthcare

869

Congr. Medical Phys. Biomed. Eng., vol. 14, pp. 1668–

1671, Aug. 2007.

X. Zhang, Y. Zhou, and D. Xie, ‘‘Heart disease diagnosis

using machine learning and expert system techniques:

A survey paper,’’ J. Med. Syst., vol. 42, no. 7, p. 129,

2018.

X. Zhang, Y. Zhang, X. Du, and B. Li, ‘‘Application of

XGBoost algorithm in clinical prediction of coronary

heart disease,’’ Chin. J. Med. Instrum., vol. 43, no. 1,

pp. 12–15, 2019.

Y. Li, W. Jia, and J. Li, ‘‘Comparing different machine

learning methods for predicting heart disease: A

telemedicine case study,’’ Health Inf. Sci. Syst., vol. 6,

p. 7, Dec. 2018.

Y. Liu, X. Li, and J. Ren, ‘‘A comparative analysis of

machine learning algorithms for heart disease

prediction,’’ Comput. Methods Programs Biomed., vol.

200, Nov. 2021, Art. no. 105965.

Z. Ahmed, K. Mohamed, and S. Zeeshan, ‘‘Comparison of

machine learning algorithms for predicting the risk of

heart disease: A systematic review,’’ J. Healthcare

Eng., vol. 2016, Jan. 2016, Art. no. 7058278.

Z. Alom, M. A. Azim, Z. Aung, M. Khushi, J. Car, and M.

A. Moni, ‘‘Early stage detection of heart failure using

machine learning techniques,’’ in Proc. Int. Conf. Big

Data, IoT, Mach. Learn., Cox’s Bazar, Bangladesh,

2021, pp. 23–25.

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

870