Diabetes Risk Assessment: A Logistic Regression Modelling Study Based on Large-Scale Data
Haotian Sun, Haoning Tian, Fan Zhang
2025
Abstract
Diabetes is a major global public health problem, and its early diagnosis and accurate risk assessment are of great significance for disease prevention and control. Using machine learning methods, this study systematically examines the effectiveness and clinical value of logistic regression (LR) and random forest (RF) algorithms for diabetes prediction. Prediction models were built on a clinical dataset of 768 observations from key health indicators: blood glucose level, body mass index (BMI), age, number of pregnancies, and the diabetes pedigree function. The LR model performed well, with an accuracy of 78.26%, and blood glucose level, BMI, number of pregnancies, and the diabetes pedigree function were identified as its most statistically significant predictors. The RF model (500 decision trees) was better able to capture nonlinear relationships, achieving an accuracy of 74.03% and an AUC of 0.831. Feature importance analysis showed that blood glucose, BMI, and age contributed most to prediction. LR offers clear clinical interpretability, helping physicians understand the effect of each risk factor, whereas RF can effectively identify complex interactions between variables.
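The abstract reports accuracy and AUC without describing how they are computed. As a hedged illustration only, the standard-library Python sketch below fits a logistic regression by batch gradient descent on the mean log-loss, then computes accuracy at a 0.5 threshold and ROC AUC as the probability that a randomly chosen positive case scores above a randomly chosen negative one (ties counted as one half). The data are synthetic and hypothetical: two standardized predictors named `glucose_z` and `bmi_z` stand in for blood glucose and BMI. The learning rate, epoch count and 150/50 train/test split are illustrative assumptions, not the authors' settings. The sketch gives point estimates only and does not reproduce the study's significance tests. Raw clinical features would typically need to be standardized before gradient descent of this kind.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (t == 1) for p, t in zip(probs, y_true))
    return hits / len(y_true)


def roc_auc(y_true, scores):
    """P(random positive scores above random negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical synthetic data (NOT the study's 768 observations): two
# standardized predictors standing in for blood glucose and BMI.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    glucose_z, bmi_z = rng.gauss(0, 1), rng.gauss(0, 1)
    latent = 1.5 * glucose_z + 0.8 * bmi_z + rng.gauss(0, 1)
    X.append([glucose_z, bmi_z])
    y.append(1 if latent > 0 else 0)

# Illustrative 150/50 train/test split.
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
w, b = fit_logistic(X_train, y_train)
test_probs = predict_proba(X_test, w, b)
print(f"coefficients (glucose_z, bmi_z) = {w}, intercept = {b:.3f}")
print(f"test accuracy = {accuracy(y_test, test_probs):.3f}")
print(f"test AUC = {roc_auc(y_test, test_probs):.3f}")
```

Because `roc_auc` needs only labels and scores, the same helper could be applied unchanged to predicted probabilities from a random forest; fitting the 500-tree forest itself is outside the scope of this sketch.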
Paper Citation
in Harvard Style
Sun H., Tian H. and Zhang F. (2025). Diabetes Risk Assessment: A Logistic Regression Modelling Study Based on Large-Scale Data. In Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA; ISBN 978-989-758-774-0, SciTePress, pages 464-469. DOI: 10.5220/0013827600004708
in BibTeX Style
@conference{iampa25,
author={Haotian Sun and Haoning Tian and Fan Zhang},
title={Diabetes Risk Assessment: A Logistic Regression Modelling Study Based on Large-Scale Data},
booktitle={Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA},
year={2025},
pages={464-469},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013827600004708},
isbn={978-989-758-774-0},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA
TI  - Diabetes Risk Assessment: A Logistic Regression Modelling Study Based on Large-Scale Data
SN  - 978-989-758-774-0
AU  - Sun H.
AU  - Tian H.
AU  - Zhang F.
PY  - 2025
SP  - 464
EP  - 469
DO  - 10.5220/0013827600004708
PB  - SciTePress