
diabetic. This research is not limited to medical data;
it can be adapted to any numeric dataset that requires
a classification task.
7.3 Future Work
Future research will broaden the scope of our
investigation. We plan to incorporate larger and more
diverse datasets covering varied populations, to improve
the generalizability and robustness of the predictive
models presented here. We also anticipate further
exploration of advanced Deep Learning methods such as
Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), and hybrid Deep Learning models. These
architectures could uncover deeper patterns within
complex medical data, potentially offering substantial
improvements in predictive performance. Moreover,
preprocessing techniques, including advanced feature
engineering and dimensionality reduction methods such as
Principal Component Analysis (PCA) and t-distributed
Stochastic Neighbor Embedding (t-SNE), will be examined
to optimize the feature set for higher Accuracy scores.
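As a minimal illustration of the dimensionality reduction step mentioned above, the following sketch standardizes a small numeric feature table and extracts its first principal component by power iteration. It is not the pipeline used in this study: the feature values are made up for illustration and are not taken from any dataset, the function names (standardize, covariance, top_component) are hypothetical, and the code is written in plain Python only to stay dependency-free. In practice an established implementation such as scikit-learn's PCA would normally be preferred.

```python
import math
import random


def standardize(rows):
    """Center each column to zero mean and scale it to unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500, seed=0):
    """Power iteration: leading eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


# Hypothetical numeric features (three columns, six records); illustrative only.
rows = [
    [90.0, 24.0, 25.0],
    [150.0, 33.5, 48.0],
    [110.0, 27.2, 31.0],
    [170.0, 36.1, 55.0],
    [100.0, 25.5, 29.0],
    [130.0, 30.0, 40.0],
]

z = standardize(rows)
cov = covariance(z)
v, eigval = top_component(cov)

# Share of total variance captured by the first principal component.
ratio = eigval / sum(cov[i][i] for i in range(len(cov)))

# One-dimensional representation of each record (projection onto PC1).
scores = [sum(r[j] * v[j] for j in range(len(v))) for r in z]

print(f"explained variance ratio of PC1: {ratio:.3f}")
```

The explained variance ratio indicates how much information a reduced feature set retains, which is one way the trade-off between dimensionality and predictive performance could be examined. Note that t-SNE, unlike PCA, is mainly a visualization technique and does not provide a reusable projection for unseen records.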
Addressing the interpretability of these models
will also be an essential aspect of our future work,
aiming to produce models that are not only accurate
but also easily interpretable by healthcare
professionals. In addition to technical advancements,
we intend to deploy our model on accessible and
user-friendly platforms, such as interactive web
applications or mobile apps. This approach aims to
facilitate real-time diabetes risk assessment tools for
healthcare professionals and patients alike, promoting
proactive health management.
Collaborations with clinical institutions and
healthcare providers will also be pursued to validate
our model further through extensive real-world clinical
trials and implementations, ensuring practical relevance
and efficiency in varied clinical domains.
ACKNOWLEDGMENT
The authors would like to thank the Department
of Computer Science at California State University,
Sacramento, for providing the necessary resources
and guidance throughout the course of this project.
We also extend our gratitude to the faculty members
and mentors who offered valuable insights and support
during the research and experimentation phases.
Special thanks to the creators and managers of the
Pima-Indian Diabetes Dataset for making the dataset
publicly available, which served as a critical
foundation for our study. Finally, we would like to
acknowledge the contributions of all team members
whose dedication, collaboration, and shared
problem-solving efforts were instrumental in the
successful completion of this project.
REFERENCES
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,
W. P. (2002). SMOTE: Synthetic minority over-sampling
technique. Journal of Artificial Intelligence
Research, 16:321–357.
Demidova, L. and Klyueva, I. (2017). SVM classification:
Optimization with the SMOTE algorithm for the class
imbalance problem. In 2017 6th Mediterranean
Conference on Embedded Computing (MECO), pages 1–4.
Kaliappan, J., Kumar, I. J. S., Sundaravelan, S., Anesh,
T., Rithik, R. R., Singh, Y., Vera-Garcia, D. V.,
Himeur, Y., Mansoor, W., Atalla, S., and Srinivasan,
K. (2024). Analyzing classification and feature
selection strategies for diabetes prediction across
diverse diabetes datasets. Frontiers in Artificial
Intelligence, 7:1421751.
Lugat, V. (2021). Pima Indians diabetes - EDA and
prediction (0.906).
https://www.kaggle.com/code/vincentlugat/pima-indians-diabetes-eda-prediction-0-906.
Accessed: 2025-05-18.
Mooney, P. T. (2018). Predict diabetes from medical
records.
https://www.kaggle.com/code/paultimothymooney/predict-diabetes-from-medical-records.
Accessed: 2025-05-18.
Nnamoko, N. and Korkontzelos, I. (2020). Efficient
treatment of outliers and class imbalance for
diabetes prediction. Artificial Intelligence in
Medicine, 104:101815.
Olisah, C. C., Smith, L. N., and Smith, M. L. (2022).
Diabetes mellitus prediction and diagnosis from a
data preprocessing and machine learning perspective.
Computer Methods and Programs in Biomedicine,
220:106773.
Poornima, V. and R., R. (2024). A hybrid model for
prediction of diabetes using machine learning
classification algorithms and random projection.
Wirel. Pers. Commun., 139:1437–1449.
Santos, M. S., Soares, J. P., Abreu, P. H., Araújo, H., and
Santos, J. A. M. (2018). Cross-validation for
imbalanced datasets: Avoiding overoptimistic and
overfitting approaches [research frontier]. IEEE
Computational Intelligence Magazine, 13:59–76.
Zhang, Z., Ahmed, K. A., Hasan, M. R., Gedeon, T., and
Hossain, M. Z. (2024). A deep learning approach to
diabetes diagnosis. In Asian Conference on
Intelligent Information and Database Systems,
pages 87–99. Springer.
Predictive Modelling for Diabetes Mellitus with Respect to Basic Medical History