Heart Disease Prediction Using Warm and Naviebayes

Satheesh Kumar A. and Vijayalakshmi M.

Department of Computer Science and Engineering, Nandha Engineering College, Erode, Tamil Nadu, India

Keywords: Heart Disease, Machine Learning, Predictive Modeling, Medical History.

Abstract: Heart disease cases have been rising quickly lately, thus it's critical to anticipate these conditions. It is

challenging to identify the criteria, and because it involves sensitive information, it must be done correctly.

We have created an application that estimates an individual's risk of developing heart disease. In the study,

which uses a dataset that includes clinical factors like age, sex, kind of chest discomfort, resting blood

pressure, cholesterol levels, fasting blood sugar, and others, machine learning algorithms are applied to predict

heart disease. We specifically contrasted the effectiveness of Naive Bayes and Decision Tree classifiers.

Eighty samples were used for training, and twenty samples were used for testing. Models were trained on the

training set and predictions were produced on the testing set. With precision, recall, and F1-score all tightly

aligned at 85-88%, the Decision Tree model attained an accuracy of 85%. However, with a 90% score in

accuracy, precision, recall, and F1-score, the Naive Bayes model beat the Decision Tree, indicating that it

would be more useful in this situation. The models' performance was further examined using confusion

matrices, which showed that Naive Bayes also performed better in terms of balancing false positives and false

negatives. These results highlight the promise of using machine learning methods to the early identification

and detection of cardiac disease.

1 INTRODUCTION

Heart disease continues to be a major global cause of

death, creating a serious public health issue.

Predicting one's risk of heart disease is a challenging

undertaking due to the complex interplay of genetic,

behavioral, and environmental factors. An increasing

number of people are interested in using

computational techniques to improve the precision

and effectiveness of cardiac disease prediction as a

result of the development of new technology,

especially in the area of machine learning. A kind of

artificial intelligence called machine learning enables

computers to recognize patterns and forecast

outcomes from data without the need for explicit

programming. Machine learning algorithms are

capable of analyzing large datasets that contain a

variety of patient data, such as medical history,

lifestyle decisions, and genetic predispositions, in the

context of predicting cardiac disease. Through the

discovery of latent patterns and associations within

this data, these algorithms can help medical personnel

anticipate and prevent cardiac disease more

successfully. With its capacity to provide

individualized and data- driven insights, this

predictive modeling methodology has the potential to

completely transform conventional risk assessment

techniques. Unlike traditional risk calculators, which

frequently depend on a small number of variables,

machine learning models can consider a large number

of variables and adjust to new data, improving

forecast accuracy.

1.1 Heart Disease

Figure 1: Heart disease.

480

A., S. K. and M., V.

Heart Disease Prediction Using Warm and Naviebayes.

DOI: 10.5220/0013931700004919

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies (ICRDICCT‘25 2025) - Volume 5, pages

480-485

ISBN: 978-989-758-777-1

Heart disease is a major cause of morbidity and

mortality and a challenging global health challenge.

Because of its intricate interplay of genetic, lifestyle,

and environmental factors, it is complicated and

necessitates novel ways to prevention and prediction.

Figure 1 show the Heart Disease.

1.2 Machine Learning

Machine learning is, at the forefront of technological

innovation, a technological leap in how computers

learn and make decisions. Machine learning, a branch

of artificial intelligence, enables systems to study vast

amounts of information on their own, discovering

trends and correlations that more traditional

programming methods miss. Machine learning

models, as opposed to traditional systems that rely on

human-written commands, learn and improve over

time as additional data is exposed to the system. So

essentially this adaptive capability allows machines

to predict outcomes, detect complex patterns and

learn more and more about various datasets over

time. From image recognition to predictive analytics

to a variety of other things your machine learning

changes from industry to industry, but you're making

a big impact in technology, healthcare, finance, and

more. Through the process of innovation, they will

not only determine better solutions but will reshape

whole industries, enhance productivity and unveil

insights which we could not obtain using other

computational methods.

1.3 Predictive Modeling

Predictive modeling, as one of the best methodologic

tools, which helps the user to navigate through a

complex data analysis landscape by projecting

information about its future using information

collected from the past trends and patterns. In simple

terms, it means developing and using algorithms to

predict or identify any patterns in data sets. This

method is particularly useful in domains where it is

important to understand and predict future events.

Predictive modeling is an innovative technology used

in areas like marketing to predict customer behavior,

in healthcare to anticipate the course of an illness, or

in finance to gain insight into market trends. It

processes this raw data into actionable insights using

statistical and mathematical methods, allowing

decision-makers to take pre-emptive actions to

project hurdles and tap opportunities. As the sectors

turn increasingly to data to inform strategies,

predictive modeling becomes not just a tool but a

game changer, enabling businesses to navigate

nuances and reconditions and make wise decisions in

ever-evolving environments.

1.4 Medical History

The medical history of a patient is a complicated story

stitched together from the threads of their previous

medical experiences. It is an integral part of health

care. Think of it as an exhaustive record: a chronicle

of all the factors that form a person’s health journey,

from diseases and treatments to choices and genetic

characteristics. For healthcare professionals, this

historical tapestry is a vital road map that offers

priceless insights into the trajectory of a patient’s

health. By carefully analyzing medical histories,

clinicians can recognize patterns, identify risk

factors, and reach evidence-based conclusions about

diagnosis, treatment, and the best course of

preventive measures. Overall, a strong understanding

of medical history informs present-day practices and

enables a personalized, holistic approach to patients.

As healthcare advances, the need for deep diveing

into a rich database of a patient medical history is

becoming increasingly evident.

2 LITERATURE REVIEW

Sean C. A lot of good has been achieved so far with

the implementation of machine learning in health

care, and the future looks promising, with the early

diagnosis and prognosis of a number of diseases. And

with the use of machine learning algorithms, the

benefits for heart health are even more pronounced.

The most important advantage of predicting future

heart trouble in advance is being able to detect them in

time and customizing the treatment. The objective of

this research work is to explore and compare the

performance of various types of machine learning

classifiers with respect to heart disease prediction. So,

the classifiers we are work on are: Decision trees,

Naive Bayes, Logistic Regression, Support Vector

Machines (SVM) and Random Forest. These

classifiers were compared in order to find the best for

heart health prediction. necessary. Every one of these

classifiers has unique strengths and characteristics.

Furthermore, the research proposes a new approach;

an ensemble classifier. This classifier integrates the

advantages of strong and weak classifiers, rather than

merely following a single-model method. The

reasoning for this hybrid classification approach lies

in its ability to adequately use a vast amount of

training and validation samples. By combining these

diverse models, the ensemble classifier seeks to

Heart Disease Prediction Using Warm and Naviebayes

481

enhance the overall robustness and predictive

performance of the model, allowing for a more reliable

approach to the early identification of potential cardiac

issues.

Senthilkumar Mohan of one of leading causes of

death, heart disease remains a worldwide health

problem. Cardiovascular diseases prediction →

Clinical data analysis would be incomplete without

addressing prediction of cardiovascular diseases.

Analyzing the huge quantity of data generated by the

healthcare sector requires advanced technologies,

which has made machine learning extremely

effective in this field. One potential way to enhance

prediction accuracy and decision making for

cardiovascular health alongside machine learning

methods is to combine them with clinical prerogatives.

The confluence of machine learning and the Internet

of Things (IoT) adds a new world to healthcare

analytics. Recent IoT based advancements have

illustrated that the combination of machine learning

algorithms with IoT devices can result in live data

that may be helpful in cardiac disorder prediction and

prevention. This convergence of technologies has

enabled more reactive and personalized healthcare

interventions. Though several advancements have

already been made in this area of cardiac disease

prediction, this study aims to push the field further by

proposing a different approach. The objective is to

apply advanced machine-learning-based algorithms to

identify and exploit key factors for significant rise in

prediction accuracy of cardiovascular-related

diseases.

Shu Jiang A worldwide analysis of the impact of

cardiovascular diseases (CVDs) indicates that this

health condition affects a large number of individuals

and leads to the causation of the most deaths in the

world, more than any other reason. In 2016, CVDs

were responsible for 17.9 million deaths globally

(31% of all deaths) according to the World Health

Organization K. T and Agarwal. Kumar, et al. Heart

attacks and strokes were responsible for 85% of these

deaths. With death rates of 50% or higher and

cardiovascular surgery being notoriously expensive,

this grim reality not only takes an enormous emotional

toll on the affected families, it also poses a major

financial burden. Heart disease is a significant and

even apparently unmanageable risk in economically

impoverished areas, where the problem is particularly

bad. Therefore, exploring the indirect associations

between various human attributes and susceptibility to

coronary heart disease may be necessary. Building

solid predictive model is more of analytics work but

also an important tool in predicting and preventing

cardiac problems. However, within this paradigm,

machine learning applications emerge as a formidable

weapon against heart disease. It builds theory and

methods around practical application thus is closely

related to computational statistics. The two branches

of traditional approaches of both supervised learning

and unsupervised learning represent the diversity in

the field of Machine learning. For the very specific

goal of identifying heart disease from its physiological

characteristics the solution is clear: supervised

learning.

Pronab Ghosh Cardiovascular diseases (CVDs)

remain a global health challenge because of their

broad and detrimental impact on human health. It is

crucial to identify risk factors, as early detection of

CVDs can help prevent or mitigate their effects. In this

context, predicting cardiac disease from machine

learning models would appear to be a viable approach.

To enhance the accuracy of such predictions, the

proposed model in this study incorporates a blend of

methods. The success of the proposed model relies on

a robust data management approach with impactful

data collection, pre- processing, and transformation

techniques. These steps are necessary to ensure the

generation of accurate and reliable data used for

training the model. By including the various datasets

like that of Stat log, Long Beach VA, Cleveland,

Switzerland, and Hungarian it is a thorough model.

This makes it possible to capture a wide array of data

for analysis. Feature selection: A critical step to

enhancing the predictive power of the model in order

to find and select the most relevant features in this

paper, the Relief method and Least Absolute

Shrinkage and Selection Operator (LASSO) are

presented. The application of a strategic selection

process enhances the model's ability to the risk

factors for heart disease. The novelty of this research

is the introduction of new hybrid classifiers, including

Gradient Boosting Method (GBBM), AdaBoost

Boosting Method (ABBM), K-Nearest Neighbors

Bagging Method (KNNBM), Decision Tree Bagging

Method (DTBM), and Random Forest Bagging

Method (RFBM). These hybrid classifiers learn

bagging and boosting

methods with basic classifiers at

the training time.

T. Kumaresan et al, Heart diseases are becoming

more common and to prevent before they become

severe, pre-examination is a must Harshit Jindal the

complexity of this diagnostic task calls for both

efficiency and precision, making testing novel

approaches desirable. The study article under

discussion is on the subject of identifying patients who

are at higher risk for heart disease based on a variety

of medical characteristics. To meet this challenge, the

researchers have developed a heart disease prediction

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

482

system that harnesses large portions of patients'

medical histories for this analysis. The aim of this

system is to provide a predictive model that identifies

a patient with heart disease before it actually takes

place, enabling a preventive approach to treatment

intervention. This suggests the range of machine

learning algorithms such as K-nn and logistic

regression that can be used, underlining the flexibility

of modern computational techniques in medical

diagnostic contexts. One of the key points of the

research is improving the accuracy of heart disease

predictions. Authors have fine-tuned the model to

ensure reliability and performance.

3 EXISTING SYSTEM

One of the hardest things to do in medicine is to

estimate heart disease. This certainly takes a lot of

time and effort in particular for doctors and other

medical professionals to figure out the cause of this.

This study uses GridSearchCV with LR, KNN,

SVM, and GBC (added/mentioned above) machine

learning algorithms to forecast cardiac illness. The

system uses a 5-fold cross-validation approach for

verification. Fourth, a comparative analysis of these

four approaches. The datasets for Cleveland,

Hungary, Switzerland, Long Beach V and UCI

Kaggle are used to test the performance of the

models. Data mining analysis shows that with one

exception (Hungary, Switzerland & Long Beach V

and UCI Kaggle), the Extreme Gradient Boosting

Classifier with GridSearchCV has the best and almost

similar test and train model accuracy of 100% and

99.03 respectively. Moreover, as shown in the

analysis, both datasets (Hungar, Switzerland & Long

Beach V and UCI Kaggle) produced the best and

identical accuracies in testing and training to be the

XGBoost Classifier without GridSearchCV

produced (98.05-100) and (100,100).

4 PROPOSED SYSTEM

The proposed technique leverages an ensemble of

machine learning algorithms to produce a model that

estimates an individual heart disease risk. The

system employs the Train-Test Split technique to

divide the data into separate teaching/testing sets,

using a dataset that provides important clinical

parameters such as age, sex, cholesterol, and other

relevant variables. Then the dataset is used to train

the models on Decision It can integrate two popular

algorithms for classification and regression-

Decision Tree and Naive Bayes classifier which

enables the system to recognize complex pattern and

relationships between the data

complex. Also,

WARM Rules are employed in the algorithm, which

could indicate a sophisticated method for assigning

weights to decision tree rules and enhancing both the

predictive power and interpretability of the system. It

also provides thorough checks of algorithms,

including metrics for predicting cardiac disease such

as accuracy, precision, recall, and F1-score. Through

this evaluation process the researchers and medical

experts can select an appropriate algorithm for stage

diagnosis and detection.

Dataset: The dataset consists of 14 columns, which

include significant details such as age, sex,

cholesterol levels, and the target variable which

indicates whether or not heart disease exists.

Analysing the dataset, through either statistical

summaries and/or visualizations, provides insight

into the target variable, the distribution of features

and potential relationships between features and the

target variable.

WARM Rules: Your data is limited up to October

2023, but it seems that the WARM Rules section

directly references a specific aspect of the

implementation of the decision tree algorithm, which

involves assigning different weights to the model

rules. These rules determine how the algorithm

makes decisions, having an impact on the model's

effectiveness in predicting the future. More

contextual information and warranted clarification

might shed light on why these norms matter in the

context of the study.

Train Test Split: This section splits the dataset into

training and testing sets. These (features/independent

and dependent/target) two variables make up the

training set, that consists of 80 samples. The testing

set (X_test, Y_test) consists of 20 samples as well, as

the features and target variable, the latter is for testing

purposes. Dividing the models makes certain that

they are trained on only a portion of the data and

tested on another element, which significantly

simplifies the assessment of the generalization ability.

Figure 2 show the Block Diagram.

Model Evaluation: In particular, the performances

of the Decision Tree (DT) and Naive Bayes (NB)

models on the testing set are examined in detail. Our

Decision Tree model performance shows we

correctly classified most of our samples with an 85%

accuracy, while precision, recall, and F1 score nearly

matched: 88%, 85%, and 0.85 respectively. In

Heart Disease Prediction Using Warm and Naviebayes

483

contrast, the Naive Bayes model exhibited a strong

and robust level of prediction accuracy, achieving an

accuracy of 90%, thereby surpassing the Decision

Tree. Naive Bayes had stable and balanced

performance, with precision, recall, and F1 score all

at the 90% mark. Further analysis of the models'

classification performance was done by using of

confusion matrix of respective to models, Naive

Bayes had got better balance in true positive, true

negative, and false positive and false negative ratio.

Figure 2: Block diagram.

5 RESULT ANALYSIS

To predict cardiac disease from patient

characteristics, two machine learning algorithms

(Decision Tree (DT) and Naive Bayes (NB)) were

trained and evaluated in this study. Models were

trained on a dataset with columns of age, sex, resting

blood pressure (trestbps), cholesterol levels (chol),

and type of chest pain (cp). I used part of the dataset

to train the models and I used a part of the data for

testing. In this case, with 85% accuracy, the Decision

Tree is able to classify 85% of the occurrences in the

testing set correctly. It also had an accuracy rate of

88% and recall rate of 85%, which indicates that the

tool can accurately detect positive cases for heart

disease with fewer false positives. The confusion

matrix shows that the model did, in fact, misclassify

a few cases, such as in three cases where the heart

disease is classified as non-disease. Naive Bayes

achieved a higher accuracy (90%), precision (90%)

and recall (90%) than the What-If tool. Table 1 show

the Comparison table The Naive Bayes well

classified a higher percentage of occurrences, along

with a decent balance between accuracy and recall.

In the case of Naive Bayes confusion matrix, only

one instance was soft predicted, which represents

robustness of this model for heart disease prediction.

Figure 3 show the Comparison graph.

Accuracy is one of the most common metrics for

evaluating classification performance, calculated as

the ratio between the number of correctly segmented

samples and the total number of samples.

𝑨𝒄𝒄𝒖𝒓𝒂𝒄𝒚  𝑻𝑷/ 𝑻𝑷  𝑭𝑵

(1)

Precision: Precision is the ratio of correctly

predicted positive observations to the total predicted

positive observations (In other words, the accuracy of

positive predictions). Precision can be expressed as:

𝑷𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏  𝑻𝑷 / 𝑻𝑷  𝑭𝑷 (2)

The ratio of true positives to total (real) positives

in the data is known as recall or sensitivity.

Sensitivity and recall are synonymous.

𝑹𝒆𝒄𝒂𝒍𝒍  𝑻𝑷 / 𝑻𝑷  𝑭𝑵

(3)

The ratio of genuine negatives to total negatives

in the data is known as specificity. Specificity is the

program's accurate designation for everyone who is

actually healthy.

𝑺𝒑𝒆𝒄𝒊𝒇𝒊𝒄𝒊𝒕𝒚  𝑻𝑵 / 𝑻𝑵  𝑭𝑷

(4)

Table 1: Comparison Table.

Algorithm Accuracy Precision Recall

F1-

Score

NB 0.9 0.9 0.9 0.9

DT 0.85 0.85 0.85 0.85

Figure 3: Comparison Graph.

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

484

6 CONCLUSIONS

To sum up, incorporating machine learning

algorithms offers a viable way to improve cardiac

disease prediction and early detection. This study

shows that Decision Tree and Naive Bayes classifiers

may accurately predict an individual's risk of heart

disease through the building and evaluation of

predictive models using clinical factors, such as age,

sex, and cholesterol levels. The results emphasize

how critical it is to use cutting-edge analytical

methods to address the increased prevalence of

cardiovascular diseases. This research adds to the

ongoing efforts to improve diagnostic skills and

individualized healthcare interventions by offering

insights into the relative efficacy of various

algorithms as well as their advantages and

disadvantages.

7 FUTURE WORK

Subsequent research in this field may investigate

the combination of sophisticated feature

engineering methods and deep learning

structures to improve the prognostic potential of

heart disease models. A more thorough

understanding of the dynamic nature of

cardiovascular health might be obtained by

looking at the effects of various health

parameters and adding data from real- time

monitoring. Moreover, efforts must to be focused

on creating interpretable models in order to

improve predictability and transparency,

particularly in situations where important

healthcare decisions must be made.

REFERENCES

H. S. and Kaur. "Improving the accuracy of heart disease

prediction using machine learning methods and

optimization," Ajay, Next Generation Computing

Technologies (ngct), 2016, pp. 516–521.

J. In the proceedings of the inaugural instructional

conference on machine learning, vol., Ramos et al.

discuss "efficient prediction of cardiovascular disease

using machine learning algorithms with relief and lasso

feature selection techniques." 242. 133–142 in

Piscataway, New Jersey, 2003.

K. Toutanova as well as c. Cherry, "Heart disease prediction

using machine learning and svm techniques," in

Proceedings of the 4th International Joint Conference

on Natural Language Processing of the American

Foundation for Nursing and Palliative Care, Volume 1–

Vol 1. 2009, pp. 486–494, Association for

Computational Linguistics.

K. T and Agarwal. Kumar, in the 2nd International

Conference on Intelligent Computing and Control

Systems (ICICCS), "Heart disease prediction using

machine learning." Ieee, 2018.

M. A and Mohammed. Selamat, "Machine learning

algorithms for heart disease prediction," in

International Conference on Computer,

Communications, and Control Technology (i4ct). 2015,

IEEEE,pp. 227–231.

Rizky, W. M., Afrizal, D., Ristu, S. "Heart disease

prediction system using sequential backward selection

algorithm for features selection and machine learning

model." Journal of Scientific Informatics, vol. 3(2),

Nov. 2020, pp. 41–50.

S. Rajput together with one. Arora, "Hybrid machine

learning techniques for effective prediction of heart

disease," International Journal of Computer

Applications, vol. 2013, 75, no. 10, pp. 6–12.

T. Mikolov and g. Zweig, "Machine learning for

cardiovascular diseases (CVDs)," in the 2012 IEEE

Spoken Language Technology Workshop (SLT).

Pages. 234–239 in IEEEE, 2012.

T. Sainath, N. o. Vinyals, one. both h and senior. Sak,

"Optimization of energy consumption in container-

oriented cloud computing centers," 2015 IEEE

International Conference on Acoustics, Speech, and

Signal Processing (icassp) proceedings. IEEEE, 2015,

4580–4584 pages.

T. Kumaresan along with c. Palanisamy, "Prognostication

of heart disease, "International Journal of Bio-

inspired Computing, vol. 9, no. 3, 2017, pp. 142–156.

Heart Disease Prediction Using Warm and Naviebayes

485