Distinguishing between MicroRNA Targets from Diverse Species using Sequence Motifs and K-mers

Malik Yousef, Waleed Khalifa, İlhan Erkin Acar, Jens Allmer

Abstract

A disease phenotype is often due to dysregulation of gene expression. Post-transcriptional regulation of protein abundance by microRNAs (miRNAs) is, therefore, of high importance in, for example, cancer studies. MicroRNAs provide a complementary sequence to their target messenger RNA (mRNA) as part of a complex molecular machinery. Known miRNAs and targets are listed in miRTarBase for a variety of organisms. The experimental detection of such pairs is convoluted; their computational detection is therefore desirable, but it is complicated by the lack of negative data. For machine learning, many features for parameterization of the miRNA targets are available, and k-mers and sequence motifs have previously been used. Unrelated organisms, such as intracellular pathogens and their hosts, may communicate via miRNAs; we therefore investigated whether miRNA targets from one species can be differentiated from miRNA targets of another. To this end, we employed target information of one species as positive and of the other as negative training and testing data. Models of species with higher evolutionary distance generally achieved better results of up to 97% average accuracy (mouse versus Caenorhabditis elegans), while more closely related species did not lead to successful models (human versus mouse; 60%). In the future, when more targeting data become available, models can be established that will be able to more precisely determine miRNA targets in host-pathogen systems using this approach.
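
As an illustration of the labelling scheme described in the abstract, the following minimal sketch turns target sequences into normalized k-mer frequency vectors and labels one species as positive and the other as negative. The function names (`kmer_features`, `build_dataset`), the choice of k = 1 to 3, and the toy sequences are assumptions made for this example; the abstract does not specify the feature set, the motif features, or the classifier that were actually used.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA input is converted below


def kmer_features(seq, k_values=(1, 2, 3)):
    """Return normalized k-mer frequencies for one target sequence.

    The k values are an assumption for illustration only.
    """
    seq = seq.upper().replace("T", "U")
    features = {}
    for k in k_values:
        # Slide a window of width k over the sequence.
        windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        counts = Counter(windows)
        total = max(len(windows), 1)  # avoid division by zero on short input
        # Emit every possible k-mer so all vectors share the same layout.
        for kmer in map("".join, product(ALPHABET, repeat=k)):
            features[kmer] = counts[kmer] / total
    return features


def build_dataset(positive_targets, negative_targets, k_values=(1, 2, 3)):
    """Label targets of one species 1 and targets of the other species 0."""
    rows, labels = [], []
    for label, sequences in ((1, positive_targets), (0, negative_targets)):
        for seq in sequences:
            feats = kmer_features(seq, k_values)
            rows.append([feats[key] for key in sorted(feats)])
            labels.append(label)
    return rows, labels


# Hypothetical toy sequences, not real miRTarBase entries.
rows, labels = build_dataset(["ACGUACGU"], ["UUUUAAAA"])
```

The resulting `rows` and `labels` could then be passed to any two-class classifier and evaluated with repeated random train/test splits.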



Paper Citation


in Harvard Style

Yousef M., Khalifa W., Acar İ. and Allmer J. (2017). Distinguishing between MicroRNA Targets from Diverse Species using Sequence Motifs and K-mers. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017) ISBN 978-989-758-214-1, pages 133-139. DOI: 10.5220/0006137901330139


in Bibtex Style

@conference{bioinformatics17,
author={Malik Yousef and Waleed Khalifa and İlhan Erkin Acar and Jens Allmer},
title={Distinguishing between MicroRNA Targets from Diverse Species using Sequence Motifs and K-mers},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={133-139},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006137901330139},
isbn={978-989-758-214-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)
TI - Distinguishing between MicroRNA Targets from Diverse Species using Sequence Motifs and K-mers
SN - 978-989-758-214-1
AU - Yousef M.
AU - Khalifa W.
AU - Acar İ.
AU - Allmer J.
PY - 2017
SP - 133
EP - 139
DO - 10.5220/0006137901330139