ArabiaNer: A System to Extract Named Entities from Arabic Content

Mohammad Hudhud, Hamed Abdelhaq, Fadi Mohsen

Abstract

The extraction of named entities from unstructured text is a crucial component of numerous Natural Language Processing (NLP) applications, such as information retrieval, question answering, and machine translation. Named-Entity Recognition (NER) aims to locate proper nouns in unstructured text and classify them into a predefined set of types, such as persons, locations, and organizations. There has been extensive research on improving the accuracy of NER for English text. For other languages such as Arabic, extracting named entities is considerably more challenging because of the language's morphological structure. In this paper, we introduce ArabiaNer, a system that employs the Conditional Random Field (CRF) learning algorithm with extensive feature engineering to extract Arabic named entities effectively. ArabiaNer achieved state-of-the-art results, with an F1-score of 91.31% on the ANERcorp dataset.


Paper Citation


in Harvard Style

Hudhud M., Abdelhaq H. and Mohsen F. (2021). ArabiaNer: A System to Extract Named Entities from Arabic Content. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI, ISBN 978-989-758-484-8, pages 489-497. DOI: 10.5220/0010382404890497


in Bibtex Style

@conference{nlpinai21,
author={Mohammad Hudhud and Hamed Abdelhaq and Fadi Mohsen},
title={ArabiaNer: A System to Extract Named Entities from Arabic Content},
booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI},
year={2021},
pages={489-497},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010382404890497},
isbn={978-989-758-484-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI
TI - ArabiaNer: A System to Extract Named Entities from Arabic Content
SN - 978-989-758-484-8
AU - Hudhud M.
AU - Abdelhaq H.
AU - Mohsen F.
PY - 2021
SP - 489
EP - 497
DO - 10.5220/0010382404890497