Predictive Modeling of Diabetes using EMR Data

Hasan Zafari, Jie Li, Farhana Zulkernine, Leanne Kosowan, Alexander Singer

2022

Abstract

As the prevalence of diabetes continues to increase globally, an efficient diabetes prediction model based on Electronic Medical Records (EMR) is critical to ensure the well-being of patients and reduce the burden on the healthcare system. Predicting diabetes at an early stage and analyzing its risk factors can enable primary and secondary prevention of diabetes. The objective of this study is to explore various classification models for identifying diabetes using EMR data. We extracted patient information, disease, health conditions, billing, and medication data from the EMR. We used six machine learning algorithms: three ensemble classifiers (XGBoost, Random Forest, and AdaBoost) and three non-ensemble classifiers (Logistic Regression, Naive Bayes, and K-Nearest Neighbor (KNN)). We experimented with training the models on both imbalanced data with the original class distribution and artificially balanced data. Our results indicate that the Random Forest model outperformed the other models overall. When applied to the imbalanced data (112,837 instances), it achieved the highest specificity (0.99) and F1-score (0.84); when trained with balanced data (35,858 instances), it achieved better sensitivity (1.00) and AUC (0.96). Analyzing feature importance, we identified a set of features that are more impactful in deciding the outcome, including comorbid conditions such as hypertension, dyslipidemia, osteoarthritis, chronic kidney disease (CKD), and depression, as well as medication codes such as A10, D08, C10, and C09.
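The abstract reports sensitivity, specificity, and F1-score on imbalanced and artificially balanced data, but does not say how the balancing was done. The following is a minimal sketch, not the authors' pipeline: it assumes random undersampling of the majority class (an assumption) and computes the three metrics from a binary confusion matrix. The function names `undersample` and `binary_metrics` and the toy labels are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: the abstract does not state the balancing method;
    random undersampling is used here purely as an example.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n = min(len(by_class[0]), len(by_class[1]))
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, n):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 for binary labels (1 = diabetes)."""
    counts = Counter(zip(y_true, y_pred))
    tp, fn = counts[(1, 1)], counts[(1, 0)]
    tn, fp = counts[(0, 0)], counts[(0, 1)]
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Toy data invented for illustration; not taken from the study.
rows = list(range(10))
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
bal_rows, bal_labels = undersample(rows, labels)

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On an imbalanced test set, a classifier can reach high specificity simply because negatives dominate, which is one reason to report sensitivity and F1-score alongside it.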


Paper Citation


in Harvard Style

Zafari H., Li J., Zulkernine F., Kosowan L. and Singer A. (2022). Predictive Modeling of Diabetes using EMR Data. In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 5: HEALTHINF; ISBN 978-989-758-552-4, SciTePress, pages 211-218. DOI: 10.5220/0010908900003123


in Bibtex Style

@conference{healthinf22,
author={Hasan Zafari and Jie Li and Farhana Zulkernine and Leanne Kosowan and Alexander Singer},
title={Predictive Modeling of Diabetes using EMR Data},
booktitle={Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 5: HEALTHINF},
year={2022},
pages={211-218},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010908900003123},
isbn={978-989-758-552-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 5: HEALTHINF
TI - Predictive Modeling of Diabetes using EMR Data
SN - 978-989-758-552-4
AU - Zafari H.
AU - Li J.
AU - Zulkernine F.
AU - Kosowan L.
AU - Singer A.
PY - 2022
SP - 211
EP - 218
DO - 10.5220/0010908900003123
PB - SciTePress