Automatic Generation of Semantic Patterns using Techniques of Natural Language Processing

Pablo Suarez, Valentín Moreno, Anabel Fraga, Juan Llorens

2013

Abstract

Within the discipline of natural language processing there are different approaches to analyzing large text corpora. Identifying patterns with semantic elements in a text lets us classify and examine the corpus, which facilitates the interpretation and management of information by computers. This paper proposes the development of a software tool that generates index patterns automatically, using various algorithms for lexical, syntactic and semantic analysis of text, and that integrates the results into other projects in the research area and into other ontological formats. The algorithms in the system implement various types of analysis in the context of natural language processing, so they can identify the grammatical categories and semantic characteristics of the words that make up index patterns. The results obtained are a list of patterns sorted by frequency of occurrence; the list takes intermediate optional elements into account, which determines the relevance and usefulness of the patterns to other projects. The developed system proposes a model for the generation and storage of patterns, and a control interface that allows parameters to be specified and reports to be run.
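
The abstract gives no implementation details, so the following Python fragment is only a minimal sketch of the general idea of ranking patterns of grammatical categories by frequency of occurrence. Everything in it is an assumption made for illustration: the names `category_ngrams` and `rank_patterns`, the tag set (DET, NOUN, VERB, ADJ), the hand-tagged toy corpus, the use of fixed-length n-grams as patterns, and the reading of "intermediate optional elements" as categories that may be skipped before patterns are counted. It is not the tool described in the paper.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

# A sentence is a sequence of (word, grammatical category) pairs.
# The tags below are hypothetical and hand-assigned, not the paper's tag set.
TaggedSentence = Sequence[Tuple[str, str]]
Pattern = Tuple[str, ...]


def category_ngrams(
    sentence: TaggedSentence,
    n: int,
    optional: FrozenSet[str] = frozenset(),
) -> List[Pattern]:
    """Return every run of n consecutive categories in the sentence.

    Categories listed in `optional` are dropped first, so a pattern can
    match with or without them (one possible reading of "optional elements").
    """
    tags = [tag for _, tag in sentence if tag not in optional]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def rank_patterns(
    corpus: Sequence[TaggedSentence],
    n: int = 3,
    optional: FrozenSet[str] = frozenset(),
) -> List[Tuple[Pattern, int]]:
    """Count category patterns over the corpus and sort them by frequency."""
    counts: Counter = Counter()
    for sentence in corpus:
        counts.update(category_ngrams(sentence, n, optional))
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus, tagged by hand for the example.
CORPUS: List[TaggedSentence] = [
    [("The", "DET"), ("system", "NOUN"), ("generates", "VERB"), ("patterns", "NOUN")],
    [("The", "DET"), ("tool", "NOUN"), ("stores", "VERB"), ("reports", "NOUN")],
    [("A", "DET"), ("parser", "NOUN"), ("reads", "VERB"), ("long", "ADJ"), ("text", "NOUN")],
]

if __name__ == "__main__":
    for pattern, count in rank_patterns(CORPUS, n=3, optional=frozenset({"ADJ"})):
        print(count, " ".join(pattern))
```

Treating ADJ as optional lets the third sentence contribute to the NOUN VERB NOUN pattern, so that pattern is counted three times instead of two. A real system would obtain the categories from a tagger and would likely attach semantic information to each element as well.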



Paper Citation


in Harvard Style

Suarez P., Moreno V., Fraga A. and Llorens J. (2013). Automatic Generation of Semantic Patterns using Techniques of Natural Language Processing. In Proceedings of the 4th International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2013), ISBN 978-989-8565-76-1, pages 34-44. DOI: 10.5220/0004641500340044


in Bibtex Style

@conference{sky13,
author={Pablo Suarez and Valentín Moreno and Anabel Fraga and Juan Llorens},
title={Automatic Generation of Semantic Patterns using Techniques of Natural Language Processing},
booktitle={Proceedings of the 4th International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2013)},
year={2013},
pages={34-44},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004641500340044},
isbn={978-989-8565-76-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2013)
TI - Automatic Generation of Semantic Patterns using Techniques of Natural Language Processing
SN - 978-989-8565-76-1
AU - Suarez P.
AU - Moreno V.
AU - Fraga A.
AU - Llorens J.
PY - 2013
SP - 34
EP - 44
DO - 10.5220/0004641500340044