A Dictionary based Stemming Mechanism for Polish

Michał Korzycki

2012

Abstract

In this paper we present and evaluate a robust stemming mechanism for Polish. We use the Polish Inflection Dictionary to build a Rule Based Stemmer and a Generative Reversed Rule Stemmer. Combining the two stemmers into the Hybrid Stemmer described here yields a high-precision stemming mechanism that is able to match human performance. This claim is supported by an experiment whose results are presented.


Paper Citation


in Harvard Style

Korzycki M. (2012). A Dictionary based Stemming Mechanism for Polish. In Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012) ISBN 978-989-8565-16-7, pages 143-150. DOI: 10.5220/0004100301430150


in Bibtex Style

@conference{nlpcs12,
author={Michał Korzycki},
title={A Dictionary based Stemming Mechanism for Polish},
booktitle={Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)},
year={2012},
pages={143-150},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004100301430150},
isbn={978-989-8565-16-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2012)
TI - A Dictionary based Stemming Mechanism for Polish
SN - 978-989-8565-16-7
AU - Korzycki M.
PY - 2012
SP - 143
EP - 150
DO - 10.5220/0004100301430150