EFFICIENT LITERATURE RESEARCH BASED ON SEMANTIC TAGNETS - Implemented and Evaluated for a German Text-corpus

Uta Christoph, Daniel Götten, Karl-Heinz Krempels

2010

Abstract

In this paper we present an approach that automatically generates semantic tagnets for a given set of German tags (keywords) and an arbitrary text corpus using three different analysis methods. The resulting tagnets are used to estimate similarities between texts that have been manually tagged with keywords from the given tag set. This approach can be used in digital libraries to provide an efficient and intuitive interface for literature research. Although it is mainly optimized for the German language, the proposed methods can easily be extended to generate tagnets for a given set of English keywords.



Paper Citation


in Harvard Style

Christoph U., Götten D. and Krempels K. (2010). EFFICIENT LITERATURE RESEARCH BASED ON SEMANTIC TAGNETS - Implemented and Evaluated for a German Text-corpus. In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST, ISBN 978-989-674-025-2, pages 48-54. DOI: 10.5220/0002805400480054


in Bibtex Style

@conference{webist10,
author={Uta Christoph and Daniel Götten and Karl-Heinz Krempels},
title={EFFICIENT LITERATURE RESEARCH BASED ON SEMANTIC TAGNETS - Implemented and Evaluated for a German Text-corpus},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST},
year={2010},
pages={48-54},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002805400480054},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST
TI - EFFICIENT LITERATURE RESEARCH BASED ON SEMANTIC TAGNETS - Implemented and Evaluated for a German Text-corpus
SN - 978-989-674-025-2
AU - Christoph U.
AU - Götten D.
AU - Krempels K.
PY - 2010
SP - 48
EP - 54
DO - 10.5220/0002805400480054