Generalized Association Rules for Connecting Biological Ontologies

Fernando Benites, Elena Sapozhnikova

2013

Abstract

The constantly increasing volume and complexity of available biological data require new methods for managing and analyzing them. An important challenge is integrating information from different sources in order to discover hidden relations between already known data. In this paper we introduce a data mining approach that relates biological ontologies by mining generalized association rules connecting their categories. To select only the most important rules, we propose a new interestingness measure that is especially well suited to hierarchically organized rules. To demonstrate the approach, we applied it to the bioinformatics domain and, more specifically, to the analysis of data from Gene Ontology, Cell type Ontology and GPCR databases. Association rules found in this way, each connecting two biological ontologies, can provide the user with new knowledge about the underlying biological processes. Preliminary results show that the produced rules represent meaningful and fairly reliable associations among the ontologies and help infer new knowledge.
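
To make the idea of generalized association rules between two ontologies more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard ancestor-extension idea: each record's labels are closed under the taxonomy's parent relation, and rules are then scored with plain support and confidence. The taxonomies, records, thresholds, and function names are hypothetical toy values, and the interestingness measure proposed in the paper is not described in this abstract and is therefore not reproduced.

```python
from itertools import product

# Hypothetical toy taxonomies: child -> parent (None marks a root).
GO_PARENT = {
    "chemokine receptor activity": "receptor activity",
    "receptor activity": None,
}
CELL_PARENT = {
    "CD4+ T cell": "T cell",
    "T cell": "lymphocyte",
    "lymphocyte": None,
}

# Hypothetical annotated records: (labels from ontology 1, labels from ontology 2).
RECORDS = [
    ({"chemokine receptor activity"}, {"CD4+ T cell"}),
    ({"chemokine receptor activity"}, {"T cell"}),
    ({"receptor activity"}, {"lymphocyte"}),
    ({"receptor activity"}, set()),
]


def with_ancestors(labels, parent):
    """Return the labels together with all of their ancestors."""
    closed = set()
    for label in labels:
        while label is not None and label not in closed:
            closed.add(label)
            label = parent[label]
    return closed


def mine_rules(records, min_support=0.25, min_confidence=0.6):
    """Mine single-item rules A -> B, A from ontology 1 and B from ontology 2."""
    # Extend every record with ancestors so rules can form at any level.
    extended = [
        (with_ancestors(a, GO_PARENT), with_ancestors(b, CELL_PARENT))
        for a, b in records
    ]
    n = len(extended)
    left_items = set().union(*(a for a, _ in extended))
    right_items = set().union(*(b for _, b in extended))
    rules = []
    for lhs, rhs in product(sorted(left_items), sorted(right_items)):
        lhs_count = sum(1 for a, _ in extended if lhs in a)
        both_count = sum(1 for a, b in extended if lhs in a and rhs in b)
        support = both_count / n
        confidence = both_count / lhs_count  # lhs occurs at least once
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


if __name__ == "__main__":
    for lhs, rhs, support, confidence in mine_rules(RECORDS):
        print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

On this toy data the sketch keeps both "chemokine receptor activity -> T cell" and its more general counterpart "chemokine receptor activity -> lymphocyte". Redundancy of this kind between a rule and its ancestor rules is what makes a hierarchy-aware selection step useful.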



Paper Citation


in Harvard Style

Benites F. and Sapozhnikova E. (2013). Generalized Association Rules for Connecting Biological Ontologies. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 229-236. DOI: 10.5220/0004327102290236


in Bibtex Style

@conference{bioinformatics13,
author={Fernando Benites and Elena Sapozhnikova},
title={Generalized Association Rules for Connecting Biological Ontologies},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={229-236},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004327102290236},
isbn={978-989-8565-35-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - Generalized Association Rules for Connecting Biological Ontologies
SN - 978-989-8565-35-8
AU - Benites F.
AU - Sapozhnikova E.
PY - 2013
SP - 229
EP - 236
DO - 10.5220/0004327102290236