Prediction of Essential Genes based on Machine Learning and Information Theoretic Features

Dawit Nigatu, Werner Henkel

2017

Abstract

Computational tools have enabled relatively simple prediction of essential genes (EGs), a task that would otherwise require costly and tedious gene-knockout experiments. We present a machine-learning-based predictor that uses information-theoretic features derived exclusively from DNA sequences: entropy, mutual information, conditional mutual information, and Markov chain models. Using a support vector machine (SVM) classifier, we predicted the EGs in 15 prokaryotic genomes. Fivefold cross-validation on the bacteria E. coli, B. subtilis, and M. pulmonis yielded AUC scores of 0.85, 0.81, and 0.89, respectively. In cross-organism prediction, where the EGs of a given bacterium are predicted using a model trained on the remaining bacteria, AUC scores ranged from 0.66 to 0.9, with an average of 0.8. In one-to-one prediction among E. coli, B. subtilis, and Acinetobacter, the classifier achieved an average AUC of 0.85. The performance of our predictor is comparable with that of recent, state-of-the-art predictors. Considering that we used only sequence information for such a complex problem, the achieved results are very good.
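The information-theoretic features named in the abstract can be illustrated with a minimal sketch. It assumes plug-in (frequency-count) estimates computed over overlapping positions of a single gene sequence; the function names, the gap parameter `d`, and the choice of conditioning on the middle nucleotide of each triplet are hypothetical illustrations, not the authors' exact feature definitions. The Markov chain features and the SVM stage are not shown.

```python
import math
from collections import Counter


def entropy(seq, k=1):
    """Shannon entropy (bits) of the overlapping k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    counts = Counter(kmers)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_information(seq, d=1):
    """I(X;Y) in bits between the nucleotide at position i and the one at i + d.

    The gap d is an illustrative parameter (assumption, not from the paper).
    """
    pairs = [(seq[i], seq[i + d]) for i in range(len(seq) - d)]
    n = len(pairs)
    cxy = Counter(pairs)
    cx = Counter(x for x, _ in pairs)
    cy = Counter(y for _, y in pairs)
    # p(x,y) / (p(x) p(y)) expressed with raw counts: c(x,y) * n / (c(x) c(y))
    return sum(c / n * math.log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())


def conditional_mutual_information(seq):
    """I(X;Z|Y) in bits for consecutive triplets (x_i, x_{i+1}, x_{i+2}).

    Conditioning on the middle nucleotide is a hypothetical choice.
    """
    triples = [tuple(seq[i:i + 3]) for i in range(len(seq) - 2)]
    n = len(triples)
    cxyz = Counter(triples)
    cxy = Counter((x, y) for x, y, _ in triples)
    cyz = Counter((y, z) for _, y, z in triples)
    cy = Counter(y for _, y, _ in triples)
    # p(x,y,z) p(y) / (p(x,y) p(y,z)) expressed with raw counts
    return sum(c / n * math.log2(c * cy[y] / (cxy[(x, y)] * cyz[(y, z)]))
               for (x, y, z), c in cxyz.items())


if __name__ == "__main__":
    gene = "ATGCGTACGTTAGCCATGAAGTAA"  # toy sequence, not real data
    features = [entropy(gene), entropy(gene, k=2),
                mutual_information(gene, d=1), mutual_information(gene, d=3),
                conditional_mutual_information(gene)]
    print(features)
```

Each gene would yield one such feature vector for a classifier. Plug-in estimates are biased upward for short sequences, so a real pipeline may need length normalization or bias correction.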



Paper Citation


in Harvard Style

Nigatu D. and Henkel W. (2017). Prediction of Essential Genes based on Machine Learning and Information Theoretic Features. In BIOINFORMATICS (BIOSTEC 2017). SciTePress. DOI: 10.5220/0006165700001488


in Bibtex Style

@conference{bioinformatics17,
author={Dawit Nigatu and Werner Henkel},
title={Prediction of Essential Genes based on Machine Learning and Information Theoretic Features},
booktitle={BIOINFORMATICS (BIOSTEC 2017)},
year={2017},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006165700001488},
}


in EndNote Style

TY - CONF

JO - BIOINFORMATICS (BIOSTEC 2017)
TI - Prediction of Essential Genes based on Machine Learning and Information Theoretic Features
AU - Nigatu D.
AU - Henkel W.
PY - 2017
DO - 10.5220/0006165700001488