Authors: György Orosz and Attila Novák

Affiliation: Pázmány Péter Catholic University, Hungary

Keyword(s): Part-of-Speech Tagging, Morphological Disambiguation, Lemmatization, Agglutinative Languages, Open Source.

Abstract: This paper presents PurePos, a new open source morphological tagger based on a Hidden Markov model. It has an interface to an integrated morphological analyzer and thus performs full disambiguated morphological analysis, including lemmatization of words both known and unknown to the analyzer. The tagger is implemented in Java and has a permissive LGPL license, so it is easy to integrate and modify. It is fast to train and use, while its accuracy is on par with slow-to-train Maximum Entropy or Conditional Random Field based taggers. Full integration with morphology and an incremental training feature make it suited for integration in web-based applications. We show that the integration with morphology boosts our tool's accuracy in every respect, especially in full morphological disambiguation, when used for morphologically complex agglutinating languages. We evaluate PurePos on Hungarian data, demonstrating its state-of-the-art performance in terms of tagging precision and accuracy of full morphological analysis.

CC BY-NC-ND 4.0

Paper citation in several formats:
Orosz, G. and Novák, A. (2012). PurePos: An Open Source Morphological Disambiguator. In Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science (ICEIS 2012) - NLPCS; ISBN 978-989-8565-16-7, SciTePress, pages 53-63. DOI: 10.5220/0004090300530063

@conference{nlpcs12,
author={György Orosz and Attila Novák},
title={PurePos: An Open Source Morphological Disambiguator},
booktitle={Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science (ICEIS 2012) - NLPCS},
year={2012},
pages={53-63},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004090300530063},
isbn={978-989-8565-16-7},
}

TY - CONF

JO - Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science (ICEIS 2012) - NLPCS
TI - PurePos: An Open Source Morphological Disambiguator
SN - 978-989-8565-16-7
AU - Orosz, G.
AU - Novák, A.
PY - 2012
SP - 53
EP - 63
DO - 10.5220/0004090300530063
PB - SciTePress