Comparative Evaluation of Zero-Shot, Latent Dirichlet Allocation, and Similarity-Based Methods for Automatic Topic Labeling in News Texts
Dilara Adıgüzel, Burcu Yalçıner, Işıl Karabey Aksakallı
2025
Abstract
This study presents a comparative analysis of supervised and unsupervised methods for automatic topic labeling in news articles, emphasizing models that work with unlabeled data. The Reuters-21578 dataset was used to evaluate three approaches: topic modeling, zero-shot classification (ZSC), and similarity-based classification. In the first phase, topic modeling was performed using Latent Dirichlet Allocation (LDA) on 6,440 documents; the best coherence score, 0.5122, was achieved with the number of topics set to 15. In the second phase, zero-shot classification was performed without labeled training data, using two pre-trained natural language inference (NLI) models, BART-large-MNLI and DeBERTa-v3-MNLI-FEVER. This approach yielded 63.06% accuracy, 74.12% precision, 63.06% recall, and an F1-score of 66.15%. Three-fold stratified cross-validation produced a consistent average F1-score of 67.96 ± 1.24%, indicating good generalization. In the final phase, similarity-based classification was performed using vector representations derived from Term Frequency-Inverse Document Frequency (TF-IDF), Bag-of-Words (BoW), and Word2Vec embeddings. Of these, the TF-IDF-based approach performed best, achieving 94.47% accuracy and 97.03% precision. The findings reveal the relative strengths and limitations of each approach under different conditions and offer practical guidance for researchers and practitioners who need automatic topic classification of unlabeled or weakly labeled text in resource-constrained scenarios.
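To make the similarity-based phase concrete, the following is a minimal sketch of TF-IDF labeling by cosine similarity, written with the Python standard library only. It is not the authors' implementation: the abstract does not say what each article is compared against, so the short keyword descriptions in `LABELS`, the whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical label descriptions (assumption; not taken from the paper).
LABELS = {
    "grain": "wheat corn grain harvest crop",
    "crude": "oil crude barrel opec petroleum",
    "earn": "profit earnings dividend quarter net",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(token_lists):
    """Smoothed inverse document frequency over the reference texts."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms outside the IDF vocabulary are ignored."""
    term_freq = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in term_freq.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_by_similarity(text, label_descriptions):
    """Return the most similar label and the score for every label."""
    label_tokens = {lab: tokenize(desc) for lab, desc in label_descriptions.items()}
    idf = build_idf(list(label_tokens.values()))
    label_vecs = {lab: tfidf_vector(toks, idf) for lab, toks in label_tokens.items()}
    doc_vec = tfidf_vector(tokenize(text), idf)
    scores = {lab: cosine(doc_vec, vec) for lab, vec in label_vecs.items()}
    return max(scores, key=scores.get), scores


best, scores = label_by_similarity(
    "opec raised crude oil output by one million barrel", LABELS
)
print(best, scores)
```

A Bag-of-Words variant of the same sketch would replace the IDF weights with 1.0, and a Word2Vec variant would compare averaged word embeddings instead of sparse term vectors.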
Paper Citation
in Harvard Style
Adıgüzel D., Yalçıner B. and Karabey Aksakallı I. (2025). Comparative Evaluation of Zero-Shot, Latent Dirichlet Allocation, and Similarity-Based Methods for Automatic Topic Labeling in News Texts. In Proceedings of the 2nd International Conference on Advances in Electrical, Electronics, Energy, and Computer Sciences - Volume 1: ICEEECS; ISBN 978-989-758-783-2, SciTePress, pages 74-81. DOI: 10.5220/0014363100004848
in Bibtex Style
@conference{iceeecs25,
author={Dilara Adıgüzel and Burcu Yalçıner and Işıl Karabey Aksakallı},
title={Comparative Evaluation of Zero-Shot, Latent Dirichlet Allocation, and Similarity-Based Methods for Automatic Topic Labeling in News Texts},
booktitle={Proceedings of the 2nd International Conference on Advances in Electrical, Electronics, Energy, and Computer Sciences - Volume 1: ICEEECS},
year={2025},
pages={74-81},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0014363100004848},
isbn={978-989-758-783-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Advances in Electrical, Electronics, Energy, and Computer Sciences - Volume 1: ICEEECS
TI - Comparative Evaluation of Zero-Shot, Latent Dirichlet Allocation, and Similarity-Based Methods for Automatic Topic Labeling in News Texts
SN - 978-989-758-783-2
AU - Adıgüzel D.
AU - Yalçıner B.
AU - Karabey Aksakallı I.
PY - 2025
SP - 74
EP - 81
DO - 10.5220/0014363100004848
PB - SciTePress