Investigating Random Forest Classification on Publicly Available Tuberculosis Data to Uncover Robust Transcriptional Biomarkers

Carly A. Bobak, Alexander J. Titus, Jane E. Hill

Abstract

Concern is growing in the scientific community about a reproducibility crisis, particularly in the field of bioinformatics. Published research results often fail to translate into clinical success. One proposed explanation is that findings from homogeneous cohort studies do not generalize to an inherently heterogeneous population. In this work, we integrate data from 4 distinct tuberculosis (TB) cohorts, totaling 1164 samples, to find common differentially regulated genes that may be used to distinguish active TB from latent TB, treated TB, other diseases, and healthy controls. Using random forest, we selected 25 genes that achieved an AUC of 0.89 on our training data and 0.86 on our test data. Of these 25 genes, 18 had previously been associated with TB in independent studies, suggesting that data integration may be an important tool for improving the reproducibility of microarray research.
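The AUC figures reported above summarize how well classifier scores rank active-TB samples above all other samples. As a minimal sketch of how such a value can be computed, the example below uses the pairwise (Mann-Whitney) formulation of AUC. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper, which does not describe its evaluation code here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney identity).

    labels: 1 for the positive class (e.g. active TB), 0 for everything else.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is fine for a small illustration; a rank-based implementation would be preferable for larger sample sets.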


Paper Citation


in Harvard Style

Bobak C., Titus A. and Hill J. (2018). Investigating Random Forest Classification on Publicly Available Tuberculosis Data to Uncover Robust Transcriptional Biomarkers. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: AI4Health, ISBN 978-989-758-281-3, pages 695-701. DOI: 10.5220/0006752406950701


in Bibtex Style

@conference{ai4health18,
author={Carly A. Bobak and Alexander J. Titus and Jane E. Hill},
title={Investigating Random Forest Classification on Publicly Available Tuberculosis Data to Uncover Robust Transcriptional Biomarkers},
booktitle={Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: AI4Health},
year={2018},
pages={695-701},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006752406950701},
isbn={978-989-758-281-3},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: AI4Health
TI - Investigating Random Forest Classification on Publicly Available Tuberculosis Data to Uncover Robust Transcriptional Biomarkers
SN - 978-989-758-281-3
AU - Bobak C.
AU - Titus A.
AU - Hill J.
PY - 2018
SP - 695
EP - 701
DO - 10.5220/0006752406950701