Predictive Modelling for Diabetes Mellitus with Respect to Basic Medical History

Patrick Purta, Aryan Mishra, Vishal Reddy Vadde, Ruthvika Bojjala, Gopichand Jagarlamudi, Bonaventure Chidube Molokwu

2025

Abstract

In this work, we examine how three (3) common oversampling techniques (SMOTE, SMOTE-ENN, and SVM-SMOTE) affect the performance of Machine Learning (ML) models that predict diabetes risk on the Pima-Indian (Akimel O’odham) Diabetes dataset. Our aim was to determine whether using these methods to mitigate class imbalance in a medical dataset causes the ML models to overfit; that is, to perform very well on the training data but lose accuracy on new data. The project began from a simple question: “Can oversampling fix class imbalance in a given dataset without hurting the model’s ability to generalize?” Previous studies have shown that oversampling can help balance target classes within a dataset, but they do not always address the risk of overfitting. To answer our question, we combined each oversampling technique with three (3) ensemble methods (Extra Trees, Gradient Boosting, and Random Forest) and compared their performance using cross-validated evaluation metrics. Our results reveal that, although each oversampling technique improves the metrics on the training data, the resulting models tend to under-perform slightly on unseen test data. This suggests that while oversampling is a useful strategy, it must be applied with caution to avoid overfitting. These insights are important for refining predictive models, especially in healthcare contexts where reliable performance is critical.
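To make the oversampling step concrete, the following is a minimal, standard-library-only sketch of the basic SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority-class neighbours. It is not the authors' implementation. The function names, the parameter `k`, and the toy data are assumptions chosen for illustration, and the SMOTE-ENN and SVM-SMOTE variants are not covered. The sketch oversamples the training split only, which is one common way to keep synthetic points out of the held-out data when checking for overfitting.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def balance_training_split(X_train, y_train, minority_label=1, k=3, seed=0):
    """Oversample the minority class of the training split only, so that
    held-out data never contains synthetic points."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote_oversample(minority, n_new, k=k, seed=seed)
    X_bal = list(X_train) + new_points
    y_bal = list(y_train) + [minority_label] * len(new_points)
    return X_bal, y_bal


# Made-up toy data (hypothetical values, not taken from the Pima dataset):
# six majority-class records (label 0) and three minority-class records (label 1).
X_train = [(1.0, 1.0), (1.5, 0.5), (2.0, 1.5), (0.5, 2.0), (1.2, 1.8), (2.2, 0.8),
           (4.0, 4.0), (4.5, 5.0), (5.0, 4.2)]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance_training_split(X_train, y_train)
print(y_bal.count(0), y_bal.count(1))  # prints: 6 6
```

In a cross-validation setting, a function like `balance_training_split` would be applied inside each fold, to the training portion only. Comparing training-fold scores against validation-fold scores then exposes the kind of train/test gap the abstract describes.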


Paper Citation


in Harvard Style

Purta P., Mishra A., Vadde V., Bojjala R., Jagarlamudi G. and Molokwu B. (2025). Predictive Modelling for Diabetes Mellitus with Respect to Basic Medical History. In Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-772-6, SciTePress, pages 343-349. DOI: 10.5220/0013692100003985


in Bibtex Style

@conference{webist25,
author={Patrick Purta and Aryan Mishra and Vishal Vadde and Ruthvika Bojjala and Gopichand Jagarlamudi and Bonaventure Molokwu},
title={Predictive Modelling for Diabetes Mellitus with Respect to Basic Medical History},
booktitle={Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2025},
pages={343-349},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013692100003985},
isbn={978-989-758-772-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Predictive Modelling for Diabetes Mellitus with Respect to Basic Medical History
SN - 978-989-758-772-6
AU - Purta P.
AU - Mishra A.
AU - Vadde V.
AU - Bojjala R.
AU - Jagarlamudi G.
AU - Molokwu B.
PY - 2025
SP - 343
EP - 349
DO - 10.5220/0013692100003985
PB - SciTePress