A Machine Learning-based Approach for the Categorization of MicroRNAs to Their Species of Origin

Luise Odenthal, Jens Allmer, Malik Yousef

2020

Abstract

Many diseases are driven by dysregulated gene expression. MicroRNAs (miRNAs) are key players in post-transcriptional gene regulation. miRBase contains miRNAs from about 200 species organized into about 70 clades. It has been shown that not all miRNAs collected in the database are likely to be real; therefore, novel routes to distinguish between correct and false miRNAs should be explored. Here, a novel approach that assigns an unknown miRNA to its most likely clade/species of origin is presented. A simple way to filter new data would be to ensure that a novel miRNA categorizes closely to the species it is said to originate from. The approach presented here makes this assignment automatically. To that end, an ensemble classifier of multiple two-class random forests was designed, where each random forest was trained on one species/clade pair. The approach was tested with different sampling methods on a dataset taken from miRBase and evaluated using a hierarchical F-measure. Depending on the sampling method, the approach predicted 81% to 94% of the test data correctly. This is the first classifier that can classify miRNAs to their species of origin.
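
The abstract names a hierarchical F-measure as the evaluation metric but does not say which variant was used. As a rough illustration only, the sketch below assumes the common ancestor-set formulation, in which each true and predicted label is extended with all of its ancestors in the taxonomy before precision and recall are computed over the overlap. The taxonomy, the function names, and the example labels are hypothetical and are not taken from the paper or from miRBase.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy taxonomy (child -> parent); illustrative only,
# not the clade structure used in the paper.
PARENT: Dict[str, str] = {
    "Homo sapiens": "Hominidae",
    "Pan troglodytes": "Hominidae",
    "Hominidae": "Primates",
    "Mus musculus": "Rodentia",
}


def with_ancestors(label: str, parent: Dict[str, str]) -> Set[str]:
    """Return the label together with all of its ancestors."""
    nodes = {label}
    while label in parent:
        label = parent[label]
        nodes.add(label)
    return nodes


def hierarchical_f1(
    pairs: Iterable[Tuple[str, str]], parent: Dict[str, str]
) -> Tuple[float, float, float]:
    """Compute hierarchical precision, recall, and F1.

    `pairs` holds (true_label, predicted_label) tuples. This is the
    assumed ancestor-set variant; the paper may use a different one.
    """
    overlap = pred_total = true_total = 0
    for true_label, pred_label in pairs:
        true_set = with_ancestors(true_label, parent)
        pred_set = with_ancestors(pred_label, parent)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    precision = overlap / pred_total if pred_total else 0.0
    recall = overlap / true_total if true_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    example = [
        ("Homo sapiens", "Homo sapiens"),     # exact match
        ("Homo sapiens", "Pan troglodytes"),  # wrong species, right clade
        ("Mus musculus", "Homo sapiens"),     # wrong clade
    ]
    print(hierarchical_f1(example, PARENT))
```

Under this formulation a prediction that lands in the correct clade but the wrong species still earns partial credit, which is why a hierarchical measure suits a clade/species classifier better than plain accuracy.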


Paper Citation


in Harvard Style

Odenthal L., Allmer J. and Yousef M. (2020). A Machine Learning-based Approach for the Categorization of MicroRNAs to Their Species of Origin. In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-398-8, SciTePress, pages 150-157. DOI: 10.5220/0008975001500157


in Bibtex Style

@conference{bioinformatics20,
author={Luise Odenthal and Jens Allmer and Malik Yousef},
title={A Machine Learning-based Approach for the Categorization of MicroRNAs to Their Species of Origin},
booktitle={Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS},
year={2020},
pages={150-157},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008975001500157},
isbn={978-989-758-398-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS
TI - A Machine Learning-based Approach for the Categorization of MicroRNAs to Their Species of Origin
SN - 978-989-758-398-8
AU - Odenthal L.
AU - Allmer J.
AU - Yousef M.
PY - 2020
SP - 150
EP - 157
DO - 10.5220/0008975001500157
PB - SciTePress