Towards the Enrichment of Arabic WordNet with Big Corpora

Topics: Applications: Image Processing and Artificial Vision, Pattern Recognition, Decision Making, Industrial and Real World applications, Financial Applications, Neural Prostheses and Medical Applications, Neural based Data Mining and Complex Information Processing, Neural Network Software and Applications, Applications of Deep Neural networks, Robotics and Control Applications; Learning Paradigms and Algorithms; Self-Organization and Emergence

Authors: Georges Lebboss 1 ; Gilles Bernard 1 ; Noureddine Aliane 1 and Mohammad Hajjar 2

Affiliations: 1 LIASD and Paris 8 University, France ; 2 Lebanese University and IUT, Lebanon

ISBN: 978-989-758-274-5

ISSN: 2184-2825

Keyword(s): Semantic Relations, Semantic Arabic Resources, Arabic WordNet, Synsets, Arabic Corpus, Data Preprocessing, Word Vectors, Word Classification, Self Organizing Maps.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Biomedical Engineering ; Biomedical Signal Processing ; Computational Intelligence ; Health Engineering and Technology Applications ; Human-Computer Interaction ; Learning Paradigms and Algorithms ; Methodologies and Methods ; Neural Networks ; Neurocomputing ; Neurotechnology, Electronics and Informatics ; Pattern Recognition ; Physiological Computing Systems ; Self-Organization and Emergence ; Sensor Networks ; Signal Processing ; Soft Computing ; Theory and Methods

Abstract: This paper presents a method for enriching Arabic WordNet with semantic clusters extracted from a large general corpus. As the Arabic language is poor in open digital linguistic resources, we built such a corpus (more than 7.5 billion words) with ad hoc tools. We then applied GraPaVec, a new method for word vectorization using automatically generated frequency patterns, as well as the state-of-the-art Word2Vec and GloVe methods. Word vectors were fed to a Self Organizing Map neural network model; the resulting clusterings were then evaluated against the existing Arabic WordNet synsets (sets of synonymous words). The evaluation yields an F-score of 82.1% for GraPaVec, 55.1% for Word2Vec's Skip-gram, 52.2% for CBOW and 56.6% for GloVe, which at least suggests the value of the context that GraPaVec takes into account. We conclude by discussing parameters and possible biases.
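
The abstract does not say how the F-score between the clusterings and the synsets is computed. As an illustration only, the following minimal Python sketch assumes a pairwise definition: two words count as a predicted pair when they fall in the same cluster, and as a reference pair when they share a synset. The function names and the transliterated toy words are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def same_group_pairs(groups):
    """Return the set of unordered word pairs that share at least one group."""
    pairs = set()
    for group in groups:
        # Sorting makes (a, b) and (b, a) map to the same tuple.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f_score(clusters, synsets):
    """Pairwise precision, recall and F-score of clusters against synsets.

    Assumption: this pairwise scheme is one common way to score a
    clustering; the paper's actual evaluation protocol may differ.
    """
    predicted = same_group_pairs(clusters)
    reference = same_group_pairs(synsets)
    if not predicted or not reference:
        return 0.0, 0.0, 0.0
    correct = len(predicted & reference)
    precision = correct / len(predicted)
    recall = correct / len(reference)
    if correct == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy data (transliterated words, for illustration only).
clusters = [["kitab", "sifr", "qalam"], ["bayt", "dar"]]
synsets = [["kitab", "sifr"], ["bayt", "dar", "manzil"]]

precision, recall, f_score = pairwise_f_score(clusters, synsets)
print(precision, recall, f_score)
```

In this toy case two of the four predicted pairs are also reference pairs, and two of the four reference pairs are recovered, so precision, recall and F-score are all 0.5.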


Paper citation in several formats:
Lebboss, G.; Bernard, G.; Aliane, N. and Hajjar, M. (2017). Towards the Enrichment of Arabic WordNet with Big Corpora. In Proceedings of the 9th International Joint Conference on Computational Intelligence - Volume 1: IJCCI, ISBN 978-989-758-274-5, ISSN 2184-2825, pages 101-109. DOI: 10.5220/0006505701010109

author={Georges Lebboss and Gilles Bernard and Noureddine Aliane and Mohammad Hajjar},
title={Towards the Enrichment of Arabic WordNet with Big Corpora},
booktitle={Proceedings of the 9th International Joint Conference on Computational Intelligence - Volume 1: IJCCI},


JO - Proceedings of the 9th International Joint Conference on Computational Intelligence - Volume 1: IJCCI
TI - Towards the Enrichment of Arabic WordNet with Big Corpora
SN - 978-989-758-274-5
AU - Lebboss, G.
AU - Bernard, G.
AU - Aliane, N.
AU - Hajjar, M.
PY - 2017
SP - 101
EP - 109
DO - 10.5220/0006505701010109
