Formal Concept Analysis for the Interpretation of Relational Learning Applied on 3D Protein-binding Sites

Emmanuel Bresso, Renaud Grisoni, Marie-Dominique Devignes, Amedeo Napoli, Malika Smaïl-Tabbone

2012

Abstract

Inductive Logic Programming (ILP) is a powerful learning method that allows an expressive representation of the data and produces explicit knowledge in the form of a theory, i.e., a set of first-order logic rules. However, ILP systems have a drawback: they return a single theory that depends on heuristic, user-chosen parameter settings, and may therefore miss potentially relevant rules. To address this, we propose an original approach based on Formal Concept Analysis (FCA) for effective interpretation of the learned theories, with the possibility of adding domain knowledge. We apply the approach to the characterization of three-dimensional (3D) protein-binding sites, the portions of a protein on which interactions with other proteins take place. In this context, we define a relational and logical representation of 3D patches and formalize the task as a concept learning problem using ILP. We report the results obtained with ILP followed by FCA-based interpretation on a particular category of protein-binding sites, namely phosphorylation sites.
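
As a rough illustration of what an FCA-based reading of a rule set can look like, the sketch below enumerates the formal concepts of a small binary context. It assumes, without confirmation from the abstract, that objects are example patches and attributes are the rules that cover them; the patch names, rule names, and helper functions are all hypothetical, and the brute-force enumeration is a toy method, not the authors' implementation.

```python
from itertools import combinations

# Hypothetical formal context: objects are example patches, attributes are
# the rules that cover them. All names and values are invented for illustration.
context = {
    "patch1": {"rule_a", "rule_b"},
    "patch2": {"rule_a", "rule_c"},
    "patch3": {"rule_a", "rule_b", "rule_c"},
    "patch4": {"rule_c"},
}


def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def covered_objects(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects, so suitable for a toy context only.
    """
    all_attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = covered_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
# List concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair groups the examples that share exactly one set of rules, which is the kind of grouping a concept lattice makes explicit. Real contexts would call for a dedicated FCA algorithm rather than this exhaustive closure.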

Paper Citation


in Harvard Style

Bresso E., Grisoni R., Devignes M., Napoli A. and Smaïl-Tabbone M. (2012). Formal Concept Analysis for the Interpretation of Relational Learning Applied on 3D Protein-binding Sites. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 111-120. DOI: 10.5220/0004149901110120


in Bibtex Style

@conference{kdir12,
author={Emmanuel Bresso and Renaud Grisoni and Marie-Dominique Devignes and Amedeo Napoli and Malika Smaïl-Tabbone},
title={Formal Concept Analysis for the Interpretation of Relational Learning Applied on 3D Protein-binding Sites},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={111-120},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004149901110120},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI - Formal Concept Analysis for the Interpretation of Relational Learning Applied on 3D Protein-binding Sites
SN - 978-989-8565-29-7
AU - Bresso E.
AU - Grisoni R.
AU - Devignes M.
AU - Napoli A.
AU - Smaïl-Tabbone M.
PY - 2012
SP - 111
EP - 120
DO - 10.5220/0004149901110120