Syntactic, Semantic and Referential Patterns in Biomedical Texts: towards in-depth text comprehension for the purpose of bioinformatics

Barbara Gawronska, Björn Erlendsson

2005

Abstract

An essential part of bioinformatics research is the iterative process of validating hypotheses by analyzing facts stored in databases and in the published literature. This process can be enhanced by automatic in-depth text understanding, which in turn requires adequate syntactic and semantic analysis. The paper presents the results of a syntactic, semantic, and textual analysis of a corpus of biomedical abstracts. It focuses on the ways in which relevant molecular interactions are referred to in the abstracts, and proposes a strategy for linking natural language expressions to the standard notation used in the Kyoto Encyclopedia of Genes and Genomes (KEGG). The syntactic and semantic regularities observed in the language of biomedicine are also discussed from a cognitive point of view.


Paper Citation


in Harvard Style

Gawronska B. and Erlendsson B. (2005). Syntactic, Semantic and Referential Patterns in Biomedical Texts: towards in-depth text comprehension for the purpose of bioinformatics. In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005) ISBN 972-8865-23-6X, pages 68-77. DOI: 10.5220/0002566900680077


in Bibtex Style

@conference{nlucs05,
author={Barbara Gawronska and Björn Erlendsson},
title={Syntactic, Semantic and Referential Patterns in Biomedical Texts: towards in-depth text comprehension for the purpose of bioinformatics},
booktitle={Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)},
year={2005},
pages={68-77},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002566900680077},
isbn={972-8865-23-6X},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)
TI - Syntactic, Semantic and Referential Patterns in Biomedical Texts: towards in-depth text comprehension for the purpose of bioinformatics
SN - 972-8865-23-6X
AU - Gawronska B.
AU - Erlendsson B.
PY - 2005
SP - 68
EP - 77
DO - 10.5220/0002566900680077