
Authors: Andrés Tomás Hohendahl 1; José Francisco Zelasco 2 and Judith Donayo 2

Affiliations: 1 Laboratorio de Estereología y Mecánica Inteligente, Universidad de Buenos Aires; Instituto de Ingeniería Biomédica, Facultad de Ingeniería, Universidad de Buenos Aires, Argentina; 2 Facultad de Ingeniería, Universidad de Buenos Aires, Argentina

Abstract: We present a robust multilingual morphologic tagger and tokenizer for highly inflected languages such as Spanish, with efficient spelling correction and ‘sound-like’ word inference, which obtains some semantic information even from parasynthetic and unknown words. The algorithm combines rules and statistical best-affix-fit with a language estimator. A rich flag set controls the internal behaviour. The system is designed for efficiency and a low memory footprint, using data structures based on simple, available affixing rules. Packed with a Spanish dictionary of 83k lemmas and 5k rules, our system recognizes 2.2M exact words; the word space reachable by guessing is many times larger.
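
As a rough illustration of how a lemma dictionary plus affixing rules can cover both exact and unknown words, here is a minimal sketch. It is not the authors' implementation: the `SuffixRule` type, the toy `LEMMAS` and `RULES` data, the tag strings, and the `analyze` function are all hypothetical, and a plain longest-suffix-first ordering stands in for the statistical best-affix-fit that the abstract mentions.

```python
from typing import NamedTuple


class SuffixRule(NamedTuple):
    strip: str  # ending of the lemma that the rule removes
    add: str    # ending that replaces it in the inflected form
    tag: str    # morphological tag for the resulting form (made-up labels)


# Hypothetical toy data, not the paper's 83k-lemma dictionary or 5k rules.
LEMMAS = {"cantar", "hablar"}
RULES = [
    SuffixRule("ar", "o", "V.1sg.pres"),
    SuffixRule("ar", "amos", "V.1pl.pres"),
    SuffixRule("ar", "aba", "V.sg.imperf"),
]


def analyze(word):
    """Return (lemma, tag, known) candidates for a word.

    Rules are tried longest suffix first (an assumed stand-in for a
    statistical fit). Candidates whose lemma is in the dictionary are
    listed before guesses for unknown stems.
    """
    results = []
    for rule in sorted(RULES, key=lambda r: len(r.add), reverse=True):
        if word.endswith(rule.add) and len(word) > len(rule.add):
            lemma = word[: len(word) - len(rule.add)] + rule.strip
            results.append((lemma, rule.tag, lemma in LEMMAS))
    # Stable sort: exact dictionary hits first, guesses afterwards.
    results.sort(key=lambda r: not r[2])
    return results


print(analyze("cantamos"))  # dictionary lemma
print(analyze("tuiteaba"))  # unknown stem, analysis is a guess
```

Storing only lemmas and rules, rather than every inflected form, is what keeps the memory footprint small in a design of this kind; the same rules, applied to stems absent from the dictionary, yield the guessed analyses.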

CC BY-NC-ND 4.0

Paper citation in several formats:
Tomás Hohendahl, A.; Zelasco, J. and Donayo, J. (2010). Robust Morphologic Analyzer for Highly Inflected Languages. In Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science (ICEIS 2010) - NLPCS; ISBN 978-989-8425-13-3, SciTePress, pages 112-118. DOI: 10.5220/0003015301120118

@conference{nlpcs10,
author={Andrés {Tomás Hohendahl} and José Francisco Zelasco and Judith Donayo},
title={Robust Morphologic Analyzer for Highly Inflected Languages},
booktitle={Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science (ICEIS 2010) - NLPCS},
year={2010},
pages={112-118},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003015301120118},
isbn={978-989-8425-13-3},
}

TY - CONF
JO - Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science (ICEIS 2010) - NLPCS
TI - Robust Morphologic Analyzer for Highly Inflected Languages
SN - 978-989-8425-13-3
AU - Tomás Hohendahl, A.
AU - Zelasco, J.
AU - Donayo, J.
PY - 2010
SP - 112
EP - 118
DO - 10.5220/0003015301120118
PB - SciTePress
ER - 