Complementary Domain Prioritization: A Method to Improve Biologically Relevant Detection in Multi-Omic Data Sets

Benjamin A. Neely, Paul E. Anderson

Abstract

As the speed and quality of different analytical platforms increase, it is more common to collect data across multiple biological domains in parallel (i.e., genomics, transcriptomics, proteomics, and metabolomics). There is growing interest in algorithms and tools that leverage heterogeneous data streams in a meaningful way. Since these domains are typically non-linearly related, we evaluated whether results from one domain could be used to prioritize another domain to increase the power of detection, maintain the type I error rate, and highlight biologically relevant changes in the secondary domain. To perform this feature prioritization, we developed a methodology called Complementary Domain Prioritization that utilizes the underpinning biology to relate complementary domains. Herein, we evaluate how proteomic data can guide transcriptomic differential expression analysis by analyzing two published colorectal cancer proteotranscriptomic data sets. The proposed strategy improved detection of cancer-related genes compared to standard permutation-invariant filtering approaches and did not increase type I error. Moreover, this approach detected differentially expressed genes that would not have been detected using filtering alone, while also highlighting pathways that might otherwise have been overlooked. These results demonstrate how this strategy can effectively prioritize transcriptomic data and drive new hypotheses, though subsequent validation studies are still required.
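
The abstract does not spell out the prioritization procedure, so the following is only a generic sketch of why testing a prioritized subset can raise detection power under multiple-testing correction. It assumes Benjamini-Hochberg adjustment, which the abstract does not name, and it assumes the prioritized set was chosen from the complementary domain without looking at the transcript-level test statistics. The gene names, p-values, and the functions `benjamini_hochberg` and `significant_genes` are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(pvalues_by_gene, alpha=0.05, prioritized=None):
    """Genes whose adjusted p-value is <= alpha.

    If `prioritized` is given, only those genes are tested, so the
    correction is applied over a smaller family of hypotheses.
    """
    genes = [g for g in pvalues_by_gene
             if prioritized is None or g in prioritized]
    adjusted = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q <= alpha}


# Made-up transcript-level p-values for ten hypothetical genes.
transcript_pvalues = {
    "GENE_A": 0.004, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.2,
    "GENE_E": 0.3, "GENE_F": 0.4, "GENE_G": 0.5, "GENE_H": 0.6,
    "GENE_I": 0.7, "GENE_J": 0.8,
}
# Hypothetical set flagged by the complementary (proteomic) domain.
protein_flagged = {"GENE_A", "GENE_B", "GENE_D"}

all_hits = significant_genes(transcript_pvalues)
prioritized_hits = significant_genes(transcript_pvalues,
                                     prioritized=protein_flagged)
print(sorted(all_hits), sorted(prioritized_hits))
```

With these made-up numbers, correcting over all ten genes leaves only GENE_A below the 0.05 threshold, whereas correcting over the three prioritized genes also retains GENE_B. The gain depends entirely on the prioritized set being informative and selected independently of the statistic being tested; otherwise error control may no longer hold.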



Paper Citation


in Harvard Style

Neely B. and Anderson P. (2017). Complementary Domain Prioritization: A Method to Improve Biologically Relevant Detection in Multi-Omic Data Sets. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017) ISBN 978-989-758-214-1, pages 68-80. DOI: 10.5220/0006151500680080


in Bibtex Style

@conference{bioinformatics17,
author={Benjamin A. Neely and Paul E. Anderson},
title={Complementary Domain Prioritization: A Method to Improve Biologically Relevant Detection in Multi-Omic Data Sets},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={68-80},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006151500680080},
isbn={978-989-758-214-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)
TI - Complementary Domain Prioritization: A Method to Improve Biologically Relevant Detection in Multi-Omic Data Sets
SN - 978-989-758-214-1
AU - Neely B.
AU - Anderson P.
PY - 2017
SP - 68
EP - 80
DO - 10.5220/0006151500680080