Ensemble Learning-based Prediction of Drug-pathway Interactions based on Features Integration

Mingyuan Xin, Jun Fan, Zhenran Jiang

2017

Abstract

Computational methods for exploring drug-pathway interactions have recently attracted attention for their potential to reveal unknown targets and mechanisms of drug action. However, extracting suitable features of drugs and pathways remains a challenge for existing prediction methods. This paper presents an ensemble learning-based method that predicts potential drug-pathway interactions by integrating different drug-based and pathway-based features. The method uses the Relief algorithm for feature selection and three ensemble methods (AdaBoost, Bagging and Random Subspace) as classifiers. Cross-validation results showed that the AdaBoost algorithm based on the Decision Tree classifier achieved a higher prediction accuracy, which indicates the effectiveness of ensemble learning. Moreover, some newly predicted interactions were validated by database searching, which demonstrates the method's potential for guiding further biological experiments.
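
As a rough illustration of the feature-selection step, the sketch below implements basic two-class Relief weighting in plain Python. The function names, the toy data, and the choice to visit every instance once instead of sampling at random are assumptions made for illustration; the abstract does not state which Relief variant or implementation the authors used.

```python
def relief_weights(X, y):
    """Basic two-class Relief (illustrative sketch, not the authors' code).

    A feature gains weight when it differs between an instance and its
    nearest neighbour of the other class (nearest miss), and loses weight
    when it differs from its nearest neighbour of the same class (nearest hit).
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for f in range(d):
        column = [row[f] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):  # assumption: use every instance once, no sampling
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = X[min(hits, key=lambda j: distance(X[i], X[j]))]
        miss = X[min(misses, key=lambda j: distance(X[i], X[j]))]
        for f in range(d):
            weights[f] += (diff(f, X[i], miss) - diff(f, X[i], hit)) / n
    return weights


def select_top_features(weights, k):
    """Return the indices of the k highest-weighted features."""
    ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
         [0.8, 0.8], [0.9, 0.2], [1.0, 0.4]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_weights = relief_weights(X_toy, y_toy)
top_feature = select_top_features(toy_weights, 1)
```

On this toy data the class-separating feature receives the larger weight. In a pipeline like the one the abstract describes, the retained features would then be passed to the ensemble classifiers (AdaBoost, Bagging, Random Subspace) and evaluated by cross-validation.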



Paper Citation


in Harvard Style

Xin M., Fan J. and Jiang Z. (2017). Ensemble Learning-based Prediction of Drug-pathway Interactions based on Features Integration. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017) ISBN 978-989-758-214-1, pages 117-124. DOI: 10.5220/0006096701170124


in Bibtex Style

@conference{bioinformatics17,
author={Mingyuan Xin and Jun Fan and Zhenran Jiang},
title={Ensemble Learning-based Prediction of Drug-pathway Interactions based on Features Integration},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={117-124},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006096701170124},
isbn={978-989-758-214-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)
TI - Ensemble Learning-based Prediction of Drug-pathway Interactions based on Features Integration
SN - 978-989-758-214-1
AU - Xin M.
AU - Fan J.
AU - Jiang Z.
PY - 2017
SP - 117
EP - 124
DO - 10.5220/0006096701170124