machine learning models such as XGBoost, SVM,
Random Forest and kNN, and demonstrate
hyperparameter tuning techniques for each. XGBoost
emerges as the most successful model in terms of
predictive performance, according to our evaluation.
This research not only attains high accuracy but,
more importantly, prioritizes model interpretability,
a critical requirement for healthcare AI adoption.
We use SHAP (SHapley Additive exPlanations) and
LIME (Local Interpretable Model-Agnostic
Explanations) to increase model transparency so that
key risk factors (such as age, blood pressure and
lifestyle habits) can be better understood. This allows
healthcare practitioners to accurately interpret, and
respond to, the model outputs. Moreover, in line with
the ever-increasing role of security in healthcare
applications, our web-based system employs HTTPS
encryption for safe data transfer and uses a Task
Limiter module to mitigate Distributed
Denial-of-Service (DDoS) attacks and help maintain
reliability in high-traffic scenarios.
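To make the idea behind SHAP concrete, the following minimal sketch computes exact Shapley values for a single prediction by enumerating feature coalitions. It is an illustration only, not the implementation used in this work: the `toy_risk` function, its coefficients, and the patient and reference values are hypothetical, and a feature absent from a coalition is simply replaced by a reference value, which is one common simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    A feature outside the coalition takes its baseline value (an assumption
    of this sketch; other ways of "removing" a feature exist).
    """
    n = len(instance)
    phi = [0.0] * n

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score over (age, systolic blood pressure, smoker flag).
def toy_risk(x):
    age, bp, smoker = x
    return 0.01 * age + 0.002 * bp + 0.1 * smoker + 0.001 * age * smoker


patient = [70, 160, 1]     # illustrative values only
reference = [40, 120, 0]   # illustrative reference input
phi = shapley_values(toy_risk, patient, reference)
```

By construction, the attributions in `phi` sum to the difference between the prediction for the patient and the prediction for the reference input. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.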
To this end, the main contributions of this paper are
as follows:
• Construction of a high-accuracy, interpretable stroke
prediction system using XGBoost.
• SHAP combined with LIME for transparency,
enabling clinicians to understand risk factors.
• SMOTE applied to imbalanced data, resulting in
more reliable and fairer predictions, aided by better
parameter tuning.
• Security hardening using HTTPS encryption and
DDoS protection, moving the system toward
production readiness.
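The core idea of SMOTE listed above can be sketched in a few lines: a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration under stated assumptions, not the pipeline used in this work; the function name `smote_sketch`, its parameters, and the example rows are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours sorted by Euclidean distance, excluding the base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class rows: (age, average glucose level).
stroke_cases = [(67.0, 228.7), (80.0, 105.9), (74.0, 171.2), (61.0, 202.2)]
new_cases = smote_sketch(stroke_cases, n_new=6)
```

In practice a maintained implementation, such as `SMOTE` from the imbalanced-learn package, would normally be preferred. Oversampling should be applied to the training split only, so that synthetic points do not leak into evaluation.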
This research integrates advanced machine learning,
explainable AI, and cybersecurity, moving stroke
prediction beyond traditional statistical approaches.
The proposed methodology not only improves early
detection of stroke but also fosters a more reliable and
practical AI-based healthcare system.
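The internals of the Task Limiter module are not described here. As a hedged illustration of how request throttling can work in general, the sketch below implements a token bucket, a common rate-limiting scheme. The class name, its parameters, and the injected clock are assumptions made for this example and are not taken from the system itself.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=5, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(7)]       # requests beyond capacity are refused
fake_now[0] += 1.0                               # one second passes
after_wait = [bucket.allow() for _ in range(3)]  # only refilled tokens are spent
```

A deployment would typically keep one such bucket per client identifier, so that a single noisy client cannot exhaust the capacity available to others.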
2 LITERATURE REVIEW
Stroke is still a severe global health problem with a
high burden of mortality and chronic disability.
Development of accurate and interpretable stroke
prediction models is critical for timely therapeutic
intervention. Recent machine learning (ML)
algorithms, including deep neural networks (DNNs),
have shown great potential in analysing complex
patterns in large healthcare datasets for medical
diagnostics. However, there are still issues regarding
clinical interpretability, data imbalance, and
real-world application.
Numerous studies on stroke prediction have used
different ML techniques. Sharma et al. introduced
ensemble learning to improve prediction accuracy,
but this type of model tends to be unfriendly to
clinicians because its results are not easy to interpret.
Incorporating explainable AI methods such as SHAP
or LIME would improve the model by helping
clinicians trust its outputs. Meanwhile, Patel and
Verma reported 92.5% accuracy using Random Forest
and XGBoost, but the model’s generalizability across
different populations was limited by dataset
constraints.
Stroke risk prediction has also been studied using
deep learning. Kumar et al. used traditional ML
classifiers together with neural networks and reported
a good ROC score of 0.94. However, their model is
computationally heavy and hence not deployable in
resource-constrained healthcare settings. Das and
Mehta examined SVM, KNN and XGBoost for stroke
classification and showed that XGBoost
outperformed SVM and KNN. However, they did not
address data imbalance, which could lead to
predictions biased towards the majority class.
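The risk of ignoring class imbalance can be shown with a small worked example. The sketch below uses a hypothetical cohort of 100 patients with 5 stroke cases (the numbers are illustrative and not drawn from any of the cited studies) and scores a trivial classifier that always predicts "no stroke".

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class cases that were predicted positive."""
    positives = sum(1 for t in y_true if t == positive)
    if positives == 0:
        return 0.0
    hits = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    return hits / positives


# Hypothetical cohort: 5 stroke cases (label 1) among 100 patients.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # classifier that never predicts stroke

acc = accuracy(y_true, always_negative)
rec = recall(y_true, always_negative)
```

Here the trivial classifier reaches 95% accuracy while detecting none of the stroke cases, which is why recall and other class-sensitive metrics matter when evaluating models on imbalanced data.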
Gupta et al. examined the role of lifestyle in the
prediction of strokes, which may aid in risk
stratification of patients through behavior-based
patterns. However, the lack of longitudinal data
prevented the model from capturing long-term
trends. Using boosting algorithms (XGBoost and
AdaBoost), Raj and Reddy demonstrated their
usefulness in stroke prediction. However, they did
not include interpretability measures in their study,
which would help make the reasoning behind the
model’s decisions clear to medical professionals.
Ahmed et al. analyzed animal data, where feature
importance modeling identified hypertension and
cholesterol as key predictors. However, their work
did not consider socioeconomic factors, which might
have led to a more comprehensive picture of stroke
risk. Das et al. found AdaBoost to perform best
among a number of classifiers, but the lack of
diversity in the datasets used limits the model’s
generalizability to other demographic groups.
Despite these advances, stroke prediction models
are still constrained by issues concerning clinical
interpretability, computational feasibility, and dataset
bias. Future research should focus on designing
models that are not only accurate and scalable but
also transparent, combining domain knowledge with
machine learning techniques for real-world
application.