Sentence Transformers and DistilBERT for Arabic Word Sense Induction

Rakia Saidi, Fethi Jarray

2023

Abstract

Word sense induction (WSI) is a fundamental task in natural language processing (NLP) that consists of discovering the sense associated with each instance of a given ambiguous target word. In this paper, we propose a two-stage approach to Arabic WSI. In the first stage, we encode each input sentence into a contextual representation using a Transformer-based encoder such as BERT or DistilBERT. In the second stage, we cluster the embedded corpus obtained in the first stage using K-Means and Hierarchical Agglomerative Clustering (HAC). We evaluate the proposed method on the Arabic WSI summarization task. Experimental results show that our model achieves new state-of-the-art results on both the Open Source Arabic Corpus (OSAC) (Saad and Ashour, 2010) and SemEval Arabic (2017).
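The second stage can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D vectors below are hypothetical stand-ins for the contextual embeddings an encoder such as BERT or DistilBERT would produce, the function name `kmeans` is chosen for illustration, and seeding the centroids with the first k vectors is a simplification assumed here to keep the example deterministic.

```python
import math

def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors seed the centroids (a simplifying assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its assigned vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical embeddings of four occurrences of one ambiguous target word.
# In a real pipeline these would come from the stage-one encoder.
embeddings = [
    (0.9, 0.1),  # occurrence 1
    (0.1, 0.9),  # occurrence 2
    (1.0, 0.0),  # occurrence 3
    (0.0, 1.0),  # occurrence 4
]

labels = kmeans(embeddings, k=2)
print(labels)  # one induced sense id per occurrence
```

Each cluster id plays the role of an induced sense: occurrences 1 and 3 fall into one cluster and occurrences 2 and 4 into the other. Hierarchical agglomerative clustering could be swapped in at the same point, since it consumes the same embedding vectors.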


Paper Citation


in Harvard Style

Saidi R. and Jarray F. (2023). Sentence Transformers and DistilBERT for Arabic Word Sense Induction. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-623-1, pages 1020-1027. DOI: 10.5220/0011891700003393


in Bibtex Style

@conference{icaart23,
author={Rakia Saidi and Fethi Jarray},
title={Sentence Transformers and DistilBERT for Arabic Word Sense Induction},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2023},
pages={1020-1027},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011891700003393},
isbn={978-989-758-623-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Sentence Transformers and DistilBERT for Arabic Word Sense Induction
SN - 978-989-758-623-1
AU - Saidi R.
AU - Jarray F.
PY - 2023
SP - 1020
EP - 1027
DO - 10.5220/0011891700003393