Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data

Mirele Costa, João Oliveira, Waldeyr C. da Silva, Rituparno Sen, Jörg Fallmann, Peter Stadler, Maria Walter

Abstract

Machine learning (ML) methods are often used to identify members of non-coding RNA classes such as microRNAs or snoRNAs. However, ML methods have not been used successfully for homology search tasks. A systematic evaluation of ML for homology search requires large, controlled test sets with known ground truth, and therefore methods for constructing large, realistic artificial data sets. Here we describe a method for producing arbitrarily large and diverse sets of snoRNA sequences based on artificial evolution. These are then used to evaluate supervised ML methods (Support Vector Machine, Artificial Neural Network, and Random Forest) for snoRNA detection in a chordate genome. Our results indicate that ML approaches can indeed be competitive for homology search as well.
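The abstract does not specify the evolutionary model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy C/D-box-like seed sequence, point substitutions at a fixed rate, and conserved box motifs that are never mutated. The seed, the flanks, the mutation rate, and all function names are hypothetical; the dinucleotide counts are one simple feature choice that a supervised classifier could consume.

```python
import random
from itertools import product

ALPHABET = "ACGU"
C_BOX = "AUGAUGA"  # one instance of the C box consensus RUGAUGA
D_BOX = "CUGA"     # D box consensus


def build_seed(spacer):
    """Assemble a toy seed (flank + C box + spacer + D box + flank).

    Returns the sequence and the set of positions covered by the boxes.
    The layout is an illustrative assumption, not taken from the paper.
    """
    left, right = "GC", "GC"
    seq = left + C_BOX + spacer + D_BOX + right
    c_start = len(left)
    d_start = c_start + len(C_BOX) + len(spacer)
    protected = set(range(c_start, c_start + len(C_BOX)))
    protected |= set(range(d_start, d_start + len(D_BOX)))
    return seq, protected


def mutate(seq, protected, rate, rng):
    """Substitute each unprotected position with probability `rate`."""
    out = []
    for i, base in enumerate(seq):
        if i not in protected and rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def evolve(seed, protected, generations, rate, rng):
    """Return the seed followed by one mutated descendant per generation."""
    lineage = [seed]
    for _ in range(generations):
        lineage.append(mutate(lineage[-1], protected, rate, rng))
    return lineage


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a fixed order, giving a feature vector."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return [counts[m] for m in kmers]
```

Calling `build_seed` with a spacer string, passing the result to `evolve` with a seeded `random.Random`, and mapping `kmer_counts` over the lineage would yield a labeled positive set of arbitrary size. A real study would also need negative examples and a more realistic model (insertions, deletions, structure-aware constraints), which this sketch leaves out.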

Paper Citation


in Harvard Style

Costa M., Oliveira J., C. da Silva W., Sen R., Fallmann J., Stadler P. and Walter M. (2021). Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, ISBN 978-989-758-490-9, pages 176-183. DOI: 10.5220/0010346001760183


in Bibtex Style

@conference{bioinformatics21,
author={Mirele Costa and João Oliveira and Waldeyr C. da Silva and Rituparno Sen and Jörg Fallmann and Peter Stadler and Maria Walter},
title={Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS},
year={2021},
pages={176-183},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010346001760183},
isbn={978-989-758-490-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS
TI - Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data
SN - 978-989-758-490-9
AU - Costa M.
AU - Oliveira J.
AU - C. da Silva W.
AU - Sen R.
AU - Fallmann J.
AU - Stadler P.
AU - Walter M.
PY - 2021
SP - 176
EP - 183
DO - 10.5220/0010346001760183