and disadvantages. Logistic Regression is known for
its simplicity and scalability, making it very
appealing in the medical field where quick decisions
need to be made. However, because it is a linear model, Logistic Regression cannot capture all the complex, nonlinear relationships in health data. Improvements such as combining it with regularization methods may help increase its accuracy and precision. In contrast, KNN struggles with high-dimensional or imbalanced datasets, which limits its performance in complex clinical cases. Applying optimization techniques and data preprocessing steps, such as feature normalization, may improve its robustness and scalability for health applications.
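The effect of normalization on KNN can be illustrated with a minimal pure-Python sketch. It assumes two hypothetical numeric features (age in years and cholesterol in mg/dL) with made-up toy values, z-score standardization fitted on the training rows, and Euclidean distance; it is not the preprocessing pipeline used in this study. Because cholesterol spans a much wider numeric range than age, it dominates unscaled distances.

```python
import math
from collections import Counter


def zscore_fit(rows):
    """Per-feature mean and (population) standard deviation of the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # A constant feature has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return means, stds


def zscore_apply(rows, means, stds):
    """Standardize rows with statistics fitted on the training data."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows closest to `query` (Euclidean)."""
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [age (years), cholesterol (mg/dL)], label 1 = disease.
X = [[30, 300], [35, 250], [70, 150], [75, 200]]
y = [0, 0, 1, 1]
query = [72, 280]

print("raw features:", knn_predict(X, y, query, k=1))
means, stds = zscore_fit(X)
X_scaled = zscore_apply(X, means, stds)
query_scaled = zscore_apply([query], means, stds)[0]
print("standardized:", knn_predict(X_scaled, y, query_scaled, k=1))
```

On the raw features the nearest neighbour is chosen almost entirely by cholesterol; after standardization both features contribute on a comparable scale, so the nearest neighbour, and hence the prediction, changes. The scaling statistics should be fitted on training data only and reused unchanged for new patients.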
Because outcomes must be clear for decision making in clinical settings, Decision Trees are often favored for their interpretability: healthcare workers can follow the model's reasoning easily. The major issue with Decision Trees is that they are prone to overfitting, particularly on small or noisy datasets, and they generalize less well.
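Depth limits are one common way to curb this overfitting. The sketch below is a minimal CART-style classifier using Gini impurity, written in plain Python for illustration rather than taken from the study; `max_depth` and `min_leaf` are its hypothetical pre-pruning parameters, and the one-feature dataset with a single deliberately flipped (noisy) label is invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, min_leaf):
    """(feature, threshold) with the lowest weighted Gini impurity, or None
    if no split (with at least `min_leaf` rows per side) beats the parent."""
    best, best_score = None, gini(y)
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, min_leaf=1, depth=0):
    """Grow a tree; internal nodes are (feature, threshold, left, right) tuples
    and leaves are class labels. `min_leaf` must be at least 1."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, min_leaf)
    if split is None:
        return majority
    f, t = split

    def grow(part):
        rows, labels = zip(*part)
        return build_tree(list(rows), list(labels), max_depth, min_leaf, depth + 1)

    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t, grow(left), grow(right))


def predict(node, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def accuracy(tree, X, y):
    return sum(predict(tree, row) == lab for row, lab in zip(X, y)) / len(y)


# Invented one-feature data; the label at x = 3 is deliberately flipped (noise).
X = [[x] for x in range(1, 11)]
y = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

deep = build_tree(X, y, max_depth=10)
shallow = build_tree(X, y, max_depth=1)
print("training accuracy (deep, shallow):", accuracy(deep, X, y), accuracy(shallow, X, y))
print("prediction for unseen x = 2.5:", predict(deep, [2.5]), predict(shallow, [2.5]))
```

On this toy data the unrestricted tree reaches perfect training accuracy by memorizing the noisy label at x = 3, and therefore predicts class 1 for an unseen x = 2.5, whereas the depth-1 tree ignores that point. Library implementations expose similar controls, for example `max_depth` and `min_samples_leaf` in scikit-learn's `DecisionTreeClassifier`.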
Despite their relatively low accuracy and precision, their interpretability remains an advantage when such models are used in clinical practice. One of the biggest hurdles in predicting heart disease has been finding a trade-off between model complexity and the interpretability necessary for clinical decisions.
DNNs, while often able to achieve very high accuracy, are still considered "black boxes": they leave their decisions unexplained and are therefore poorly suited to medical applications where interpretability is of utmost importance. Logistic Regression and Decision Trees, by contrast, are simpler models that offer better transparency, but they may not capture the more complicated patterns in the data.
Further research could pursue hybrid models that combine the strengths of both types, together with explainable AI (XAI) methods that increase the transparency of more complex models while maintaining their accuracy.
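No specific XAI method is prescribed here. As one model-agnostic example, permutation feature importance treats the model as a black box: it shuffles one feature at a time and measures how much accuracy drops. The sketch below assumes a hypothetical `toy_model` and invented patient rows; any classifier exposing a per-row prediction function could be substituted.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows where predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is randomly shuffled.

    The model is treated as a black box: only `predict(row)` is called.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature f replaced by its shuffled value.
            X_perm = [row[:f] + [value] + row[f + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for an opaque model: it only looks at age (feature 0)."""
    return 1 if row[0] > 50 else 0


# Invented rows: [age (years), cholesterol (mg/dL)]
X = [[30, 200], [40, 250], [45, 180], [50, 300],
     [55, 190], [60, 260], [65, 240], [70, 210]]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Shuffling cholesterol leaves the predictions unchanged, so its importance is exactly zero, while shuffling age lowers accuracy. In practice the importances would be computed on held-out data rather than the training set.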
Moreover, data imbalance is common in medical datasets, because healthy individuals typically outnumber sick ones, and this imbalance degrades model performance. Oversampling techniques such as SMOTE (Synthetic Minority Over-sampling Technique), or cost-sensitive learning, can help balance the classes, which in turn can increase the sensitivity and precision of heart disease detection. Lastly, future work should seek to integrate ML models into clinical decision support systems. Integrating predictive models into front-line clinical systems would streamline diagnostic workflows and help clinicians identify high-risk patients, flag likely readmissions, or raise alarms for time-sensitive decisions. Close collaboration between data scientists and medical professionals will also be vital to ensure that models are not only accurate but also useful in actual clinical applications.
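The two rebalancing strategies named above, SMOTE and cost-sensitive learning, can be sketched in plain Python. Here `smote` implements the basic SMOTE idea (each synthetic minority sample is a random interpolation between a minority sample and one of its k nearest minority neighbours), and `inverse_frequency_weights` gives per-class weights for a cost-sensitive loss using the n_samples / (n_classes * class_count) heuristic. Both functions and the toy data are illustrative assumptions, not the study's code; production work would normally rely on a tested library such as imbalanced-learn.

```python
import math
import random
from collections import Counter


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: interpolate between a random minority row and one of
    its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def inverse_frequency_weights(labels):
    """Class weights n_samples / (n_classes * class_count) for a cost-sensitive loss."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {cls: n / (n_classes * count) for cls, count in counts.items()}


# Invented, imbalanced toy data: [age, cholesterol]; 8 healthy rows, 3 diseased rows.
healthy = [[40, 190], [42, 200], [45, 185], [47, 210],
           [50, 195], [52, 205], [38, 180], [44, 198]]
diseased = [[63, 260], [67, 280], [70, 250]]

new_rows = smote(diseased, n_synthetic=5, k=2)
print(len(diseased) + len(new_rows), "diseased rows after oversampling")
print(inverse_frequency_weights([0] * len(healthy) + [1] * len(diseased)))
```

Oversampling should be applied to the training split only, so that synthetic patients never leak into the evaluation data. The class-weight alternative leaves the data untouched and instead makes errors on the rare class cost more during training.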
4 CONCLUSIONS
This study explored how well different machine learning models predict cardiovascular disease using the UCI Heart Disease dataset.
This research used Logistic Regression, KNN, Decision Trees, ANN and DNN models to study the significant risk factors for predicting heart disease. The workflow comprised extensive data preprocessing followed by evaluation with metrics such as accuracy and precision. The experimental results showed that Logistic Regression was the most accurate, whereas ANN/DNN had better pattern-recognition ability despite suffering from overfitting. Decision Trees provided good interpretability but generalized poorly, and their performance depended on data characteristics. Further work should strive to improve the generalizability of these models and to tackle the overfitting and data imbalance problems. Algorithms and strategies such as data augmentation or transfer learning will be investigated to get more out of neural networks. Second, future work will explore the integration of XAI methods to further improve model interpretability and make these state-of-the-art techniques more suitable for clinical use. It will also be necessary to further increase accuracy and reduce false positives before these methods are ready for use in practical health facilities.
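Because accuracy alone can hide poor performance on the minority (diseased) class, the metrics named above, together with sensitivity and the false-positive rate, can be computed from confusion-matrix counts. The sketch below is a minimal illustration on invented labels, with 1 = heart disease and 0 = healthy; it is not the study's evaluation code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the disease class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def clinical_metrics(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(num, den):
        return num / den if den else 0.0  # report 0.0 when the metric is undefined

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # recall on the disease class
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Invented labels: 3 patients with disease, 7 healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
always_healthy = [0] * len(y_true)

print(clinical_metrics(y_true, model_pred))
print(clinical_metrics(y_true, always_healthy))
```

A classifier that always predicts "healthy" still reaches 70% accuracy on these labels while detecting none of the patients, which is why sensitivity and the false-positive rate matter alongside accuracy and precision.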