Paper

Authors: Amina Amara, Mohamed Ali Hadj Taieb and Mohamed Ben Aouicha

Affiliation: Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Tunisia

Keyword(s): Topic Modeling, Latent Dirichlet Allocation, Word Representation Learning, Covid-19, Multilingual.

Abstract: The value of user-generated content on social media platforms has been well established and acknowledged since their rich and subjective information allows for favorable computational analysis. Nevertheless, social data are often text-heavy and unstructured, thereby complicating the process of data analysis. Topic models act as a bridge between social science and unstructured social data analysis to provide new perspectives for interpreting social phenomena. Latent Dirichlet Allocation (LDA) is one of the most used topic modeling techniques. However, the LDA-based topic models alone do not always provide promising results and do not consider the recent advancement in the natural language processing field by leveraging word embeddings when learning latent topics to capture more word-level semantic and syntactic regularities. In this work, we extend the LDA model by mixing the Skip-gram model with Dirichlet-optimized sparse topic mixtures to learn dense word embeddings jointly with the Dirichlet distributed latent document-level mixtures of topic vectors. The embeddings produced through the proposed model were submitted to experimental evaluation using a Covid-19 based multilingual dataset extracted from the Facebook social network. Experimental results show that the proposed model outperforms all compared baselines in terms of both topic quality and predictive performance.
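
The combination described in the abstract (Skip-gram embeddings learned jointly with Dirichlet-distributed document mixtures over topic vectors) can be illustrated with a small numerical sketch. The formulation below is an assumption, not the authors' confirmed method: it follows one common way of combining the two ideas, in which a context vector is the sum of a word vector and a softmax-weighted mixture of topic vectors, trained with a negative-sampling loss plus a Dirichlet penalty. All function names, shapes, and the value of `alpha` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def log_sigmoid(x):
    """log(sigmoid(x)) computed without overflow."""
    return -np.logaddexp(0.0, -x)


def context_vector(word_vec, doc_weights, topic_matrix):
    """Pivot word vector plus the document's mixture of topic vectors."""
    proportions = softmax(doc_weights)      # shape (n_topics,), sums to 1
    doc_vec = proportions @ topic_matrix    # shape (dim,)
    return word_vec + doc_vec


def dirichlet_penalty(doc_weights, alpha):
    """Negative Dirichlet log-density up to a constant.

    With alpha < 1 the penalty is lower for sparse topic proportions.
    """
    proportions = softmax(doc_weights)
    return -(alpha - 1.0) * np.sum(np.log(proportions))


def skipgram_loss(context, target_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (context, target) pair."""
    positive = log_sigmoid(context @ target_vec)
    negative = np.sum(log_sigmoid(-(negative_vecs @ context)))
    return -(positive + negative)


# Toy example with random parameters (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
n_topics, dim = 4, 8
topic_matrix = rng.normal(size=(n_topics, dim))   # one vector per topic
doc_weights = rng.normal(size=n_topics)           # unnormalized document weights
word_vec = rng.normal(size=dim)                   # pivot word embedding
target_vec = rng.normal(size=dim)                 # observed neighbouring word
negative_vecs = rng.normal(size=(5, dim))         # sampled negative words

context = context_vector(word_vec, doc_weights, topic_matrix)
total_loss = (skipgram_loss(context, target_vec, negative_vecs)
              + dirichlet_penalty(doc_weights, alpha=0.5))
```

In a real model the word vectors, topic vectors, and document weights would be updated by gradient descent on `total_loss` summed over a corpus; this sketch only evaluates the objective once to show how the two terms fit together.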

CC BY-NC-ND 4.0

Paper citation in several formats:
Amara, A.; Hadj Taieb, M. and Ben Aouicha, M. (2024). Joining LDA and Word Embeddings for Covid-19 Topic Modeling on English and Arabic Data. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4; ISSN 2184-433X, SciTePress, pages 275-282. DOI: 10.5220/0012320900003636

@conference{icaart24,
author={Amina Amara and Mohamed Ali {Hadj Taieb} and Mohamed {Ben Aouicha}},
title={Joining LDA and Word Embeddings for Covid-19 Topic Modeling on English and Arabic Data},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={275-282},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012320900003636},
isbn={978-989-758-680-4},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Joining LDA and Word Embeddings for Covid-19 Topic Modeling on English and Arabic Data
SN - 978-989-758-680-4
IS - 2184-433X
AU - Amara, A.
AU - Hadj Taieb, M.
AU - Ben Aouicha, M.
PY - 2024
SP - 275
EP - 282
DO - 10.5220/0012320900003636
PB - SciTePress