Precision: The proportion of predicted positive samples that are actually positive.
Recall: The proportion of actual positive samples that are correctly predicted to be positive.
F1-score: The harmonic mean of precision and recall, suitable for scenarios where both precision and recall need to be considered.
Accuracy: The proportion of correctly classified samples among all samples. It is informative for data sets with balanced categories but can be misleading for data sets with unbalanced categories.
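The four definitions above can be made concrete with a short sketch. The function name `classification_metrics` and the toy label lists are assumptions introduced for illustration only; they are not taken from the paper's experiments. The second call illustrates why accuracy alone can mislead on unbalanced categories: a classifier that always predicts "negative" still reaches 0.8 accuracy while its recall is 0.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1-score, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels: 2 positives among 10 samples (unbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that finds one of the two positives and raises one false alarm.
y_mixed = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
mixed = classification_metrics(y_true, y_mixed)

# A trivial classifier that always predicts the negative class.
y_all_negative = [0] * 10
trivial = classification_metrics(y_true, y_all_negative)

print(mixed)
print(trivial)
```

Both toy classifiers have the same accuracy, yet only precision, recall, and F1-score reveal that the second one never detects a positive case.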
3.3 Discussion
As can be seen from the above table, a comparison of the prediction results on the test set shows that, in terms of precision, the naive Bayes model is higher than the other five models, while the recall and F1-score of the decision tree model are better than those of the other five models. In terms of accuracy, SVM is higher than the other five models.
4 CONCLUSIONS
Diabetes is one of today's most serious chronic
illnesses, and early detection may significantly
enhance a patient's chances of managing it. This paper
constructs a prediction model based on various
machine learning algorithms, which can be applied to
predicting diabetes risk based on user input
characteristic data. The study takes a diabetes data set as the research object, and 2000 valid records are obtained through data preprocessing.
Data feature analysis shows that diabetes prevalence has the strongest correlation with glucose, while insulin and the diabetes pedigree function have the weakest correlation. Following data preprocessing and feature analysis, prediction models based on six classification algorithms (KNN, naive Bayes, SVM, decision tree, random forest, and logistic regression) were constructed to achieve diabetes risk prediction. Finally, the test set was utilized to assess
the predictive models' performance. Analysis of precision, recall, F1-score, and accuracy showed that the model constructed using the SVM algorithm achieved the highest accuracy, the recall and F1-score of the decision tree model were superior to those of the other five models, and the precision of the naive Bayes model was higher than that of the other five models. In future studies, a similar
approach could be applied to other disease datasets,
such as cardiovascular disease.