Machine Learning Algorithms for Predicting Chronic Obstructive Pulmonary Disease from Gene Expression Data with Class Imbalance

Kunti Robiatul Mahmudah, Bedy Purnama, Fatma Indriani, Kenji Satou

2021

Abstract

Chronic obstructive pulmonary disease (COPD) is a progressive inflammatory lung disease that causes breathlessness and can lead to serious illnesses, including lung cancer. COPD is estimated to have caused 5% of all deaths globally in 2015, making it the third leading cause of death worldwide. This study proposes methods that use gene expression data from microarrays to predict the presence or absence of COPD. The proposed methods can assist in determining better treatments to lower fatality rates. Microarray data of small airway epithelium cells from 135 samples, comprising 23 smokers with COPD (9 GOLD stage I, 12 GOLD stage II, and 2 GOLD stage III), 59 healthy smokers, and 53 healthy nonsmokers, were selected from a GEO dataset. The machine learning and regression algorithms applied in this study were Random Forest, Support Vector Machine, Naïve Bayes, Gradient Boosting Machines, Elastic Net Regression, and Multiclass Logistic Regression. After the effect of class imbalance was reduced using SMOTE, the classification algorithms were run on 825 selected features. High AUC scores were achieved by elastic net regression and multiclass logistic regression, at 89% and 90%, respectively. Both classifiers also outperformed the others in accuracy, specificity, and sensitivity.
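
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is interpolated between an existing minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote`, the parameter `k=5`, the choice to oversample every class up to the largest one, and the two random toy features are all assumptions made for illustration; only the class sizes (23, 59, 53) come from the abstract.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper, written for this sketch only.)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy stand-in for expression profiles: two made-up random features per
# sample. Only the class sizes are taken from the abstract.
rng = random.Random(42)
counts = {"COPD": 23, "healthy smoker": 59, "healthy nonsmoker": 53}
data = {
    label: [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    for label, n in counts.items()
}

# Assumption: oversample every class up to the size of the largest class.
target = max(counts.values())
balanced = {
    label: rows + smote(rows, target - len(rows))
    for label, rows in data.items()
}
balanced_counts = {label: len(rows) for label, rows in balanced.items()}
```

In this sketch every class ends up with 59 samples. In a real evaluation, oversampling would normally be applied to the training folds only, so that synthetic points do not leak into the test data.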


Paper Citation


in Harvard Style

Mahmudah K., Purnama B., Indriani F. and Satou K. (2021). Machine Learning Algorithms for Predicting Chronic Obstructive Pulmonary Disease from Gene Expression Data with Class Imbalance. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-490-9, SciTePress, pages 148-153. DOI: 10.5220/0010316500002865


in Bibtex Style

@conference{bioinformatics21,
author={Kunti Robiatul Mahmudah and Bedy Purnama and Fatma Indriani and Kenji Satou},
title={Machine Learning Algorithms for Predicting Chronic Obstructive Pulmonary Disease from Gene Expression Data with Class Imbalance},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS},
year={2021},
pages={148-153},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010316500002865},
isbn={978-989-758-490-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS
TI - Machine Learning Algorithms for Predicting Chronic Obstructive Pulmonary Disease from Gene Expression Data with Class Imbalance
SN - 978-989-758-490-9
AU - Mahmudah K.
AU - Purnama B.
AU - Indriani F.
AU - Satou K.
PY - 2021
SP - 148
EP - 153
DO - 10.5220/0010316500002865
PB - SciTePress