Authors: Adelle Abdallah 1 ; Hussein Awdeh 1 ; Youssef Zaki 1 ; Gilles Bernard 1 and Mohammad Hajjar 2

Affiliations: 1 LIASD Lab, Paris 8 University, 2 rue de la Liberté 93526 Saint-Denis, Cedex, France ; 2 Faculty of Technology, Lebanese University, Hisbeh Street, Saida, Lebanon

Keyword(s): Arabic Language, Arabic Natural Language Processing, Validation Information Retrieval, Silver Standard Corpus.

Abstract: Many methods have been applied to the automatic construction or expansion of lexical semantic resources. Most follow the distributional hypothesis applied to the lexical context of words, eliminating the grammatical context (stopwords). This paper shows that the grammatical context can yield information about the semantic properties of words, provided the corpus is large enough. To this end, we present an unsupervised pattern-based model that builds semantic word categories from large corpora and is devised for resource-poor languages. We divide the vocabulary into high-frequency and lower-frequency items, and explore the patterns formed by high-frequency items in the neighborhood of lower-frequency words. Word categories are then created by clustering. This is done on a very large Arabic corpus and, for comparison, on a large English corpus; results are evaluated with direct and indirect evaluation methods. We compare the results with state-of-the-art lexical models for performance and for computation time.
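
The pipeline outlined in the abstract (frequency split, patterns of high-frequency items around lower-frequency words, then clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the one-token window, the count-based frequency cutoff, and the greedy cosine-threshold clustering are all assumptions chosen for brevity, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict

def split_vocabulary(tokens, n_high):
    """Split the vocabulary into the n_high most frequent items and the rest.
    The cutoff rule is an assumption; the abstract does not state one."""
    counts = Counter(tokens)
    high = {w for w, _ in counts.most_common(n_high)}
    return high, set(counts) - high

def pattern_profiles(tokens, high, window=1):
    """For each lower-frequency word, count the patterns formed by
    high-frequency items in its neighborhood. Non-high-frequency
    neighbors are replaced by the wildcard '*'; '_' marks the target."""
    profiles = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok in high:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        mask = lambda ws: tuple(w if w in high else "*" for w in ws)
        profiles[tok][mask(left) + ("_",) + mask(right)] += 1
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(profiles, threshold=0.5):
    """Greedy leader clustering (a stand-in for the paper's clustering step):
    a word joins the first cluster whose first member is similar enough."""
    clusters = []
    for word in sorted(profiles):
        for members in clusters:
            if cosine(profiles[word], profiles[members[0]]) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters

tokens = "the cat sat on the mat the dog sat on the rug".split()
high, low = split_vocabulary(tokens, n_high=3)
categories = cluster(pattern_profiles(tokens, high))
print(categories)
```

On this toy input, "cat" and "dog" share the pattern ("the", "_", "sat") and fall into one category, while "mat" and "rug" stay apart. A corpus of realistic size would call for a more careful frequency threshold and a scalable clustering method.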

CC BY-NC-ND 4.0

Paper citation in several formats:
Abdallah, A., Awdeh, H., Zaki, Y., Bernard, G. and Hajjar, M. (2021). Unsupervised Grammatical Pattern Discovery from Arabic Extra Large Corpora. In Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - NCTA; ISBN 978-989-758-534-0; ISSN 2184-3236, SciTePress, pages 211-220. DOI: 10.5220/0010651700003063

@conference{ncta21,
author={Adelle Abdallah and Hussein Awdeh and Youssef Zaki and Gilles Bernard and Mohammad Hajjar},
title={Unsupervised Grammatical Pattern Discovery from Arabic Extra Large Corpora},
booktitle={Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - NCTA},
year={2021},
pages={211-220},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010651700003063},
isbn={978-989-758-534-0},
issn={2184-3236},
}

TY - CONF

JO - Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - NCTA
TI - Unsupervised Grammatical Pattern Discovery from Arabic Extra Large Corpora
SN - 978-989-758-534-0
IS - 2184-3236
AU - Abdallah, A.
AU - Awdeh, H.
AU - Zaki, Y.
AU - Bernard, G.
AU - Hajjar, M.
PY - 2021
SP - 211
EP - 220
DO - 10.5220/0010651700003063
PB - SciTePress