Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions

Daniel Atzberger, Tim Cech, Willy Scheibel, Jürgen Döllner, Tobias Schreck

2024

Abstract

Text spatializations for text corpora often rely on two-dimensional scatter plots generated from topic models and dimensionality reductions. Topic models are unsupervised learning algorithms that identify clusters, so-called topics, within a corpus, representing the underlying concepts. Furthermore, topic models transform documents into vectors that capture their association with topics. A subsequent dimensionality reduction creates a two-dimensional scatter plot that illustrates the semantic similarity between the documents. A recent study by Atzberger et al. has shown that topic models are beneficial for generating two-dimensional layouts. However, in their study, the hyperparameters of the topic models are fixed, and thus the study does not analyze the impact of the topic models’ quality on the resulting layout. Following the methodology of Atzberger et al., we present a comprehensive benchmark comprising (1) text corpora, (2) layout algorithms based on topic models and dimensionality reductions, (3) quality metrics for assessing topic models, and (4) metrics for evaluating two-dimensional layouts’ accuracy and cluster separation. Our study involves an exhaustive evaluation of numerous parameter configurations, yielding a dataset that quantifies the quality of each combination of dataset and layout algorithm. Through a rigorous analysis of this dataset, we derive practical guidelines for effectively employing topic models in text spatializations. As a main result, we conclude that the quality of a topic model, measured by coherence, is positively correlated with the layout quality in the case of Latent Semantic Indexing and Non-Negative Matrix Factorization.
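The two-stage pipeline described in the abstract (topic model, then dimensionality reduction) can be illustrated with a minimal sketch. This is not the authors' benchmark code. It assumes a toy four-document corpus, whitespace tokenization, Latent Semantic Indexing computed as a truncated SVD of a raw count matrix, and PCA as a simple stand-in for the dimensionality reductions the study evaluates. All function names and the choice of three topics are hypothetical.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "topic models find topics in text",
    "topic models map documents to vectors",
    "scatter plots show documents as points",
    "dimensionality reduction creates scatter plots",
]


def term_document_matrix(texts):
    """Build a documents x terms count matrix from whitespace tokens."""
    vocab = sorted({tok for t in texts for tok in t.split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for tok in t.split():
            counts[i, index[tok]] += 1
    return counts, vocab


def lsi_document_vectors(counts, n_topics):
    """LSI sketch: keep the top singular directions of the count matrix."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :n_topics] * s[:n_topics]


def pca_2d(vectors):
    """Project vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


counts, vocab = term_document_matrix(docs)
doc_topic = lsi_document_vectors(counts, n_topics=3)  # documents x topics
layout = pca_2d(doc_topic)                            # documents x 2
```

Each row of `layout` is the two-dimensional position of one document. In a setting like the one the abstract describes, the number of topics and the choice of reduction would be among the parameters varied, with topic quality and layout quality measured separately.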

Paper Citation


in Harvard Style

Atzberger D., Cech T., Scheibel W., Döllner J. and Schreck T. (2024). Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: IVAPP; ISBN 978-989-758-679-8, SciTePress, pages 593-602. DOI: 10.5220/0012391100003660


in Bibtex Style

@conference{ivapp24,
author={Daniel Atzberger and Tim Cech and Willy Scheibel and Jürgen Döllner and Tobias Schreck},
title={Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: IVAPP},
year={2024},
pages={593-602},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012391100003660},
isbn={978-989-758-679-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: IVAPP
TI - Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions
SN - 978-989-758-679-8
AU - Atzberger D.
AU - Cech T.
AU - Scheibel W.
AU - Döllner J.
AU - Schreck T.
PY - 2024
SP - 593
EP - 602
DO - 10.5220/0012391100003660
PB - SciTePress