Paper

Authors: Daniel Atzberger 1 ; Tim Cech 2 ; Willy Scheibel 1 ; Jürgen Döllner 1 and Tobias Schreck 3

Affiliations: 1 Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany ; 2 Digital Engineering Faculty, University of Potsdam, Germany ; 3 Graz University of Technology, Austria

Keyword(s): Topic Modeling, Dimensionality Reductions, Text Spatializations.

Abstract: Text spatializations for text corpora often rely on two-dimensional scatter plots generated from topic models and dimensionality reductions. Topic models are unsupervised learning algorithms that identify clusters, so-called topics, within a corpus, representing the underlying concepts. Furthermore, topic models transform documents into vectors, capturing their association with topics. A subsequent dimensionality reduction creates a two-dimensional scatter plot, illustrating semantic similarity between the documents. A recent study by Atzberger et al. has shown that topic models are beneficial for generating two-dimensional layouts. However, in their study, the hyperparameters of the topic models are fixed, and thus the study does not analyze the impact of the topic models’ quality on the resulting layout. Following the methodology of Atzberger et al., we present a comprehensive benchmark comprising (1) text corpora, (2) layout algorithms based on topic models and dimensionality reductions, (3) quality metrics for assessing topic models, and (4) metrics for evaluating two-dimensional layouts’ accuracy and cluster separation. Our study involves an exhaustive evaluation of numerous parameter configurations, yielding a dataset that quantifies the quality of each dataset-layout algorithm combination. Through a rigorous analysis of this dataset, we derive practical guidelines for effectively employing topic models in text spatializations. As a main result, we conclude that the quality of a topic model measured by coherence is positively correlated with the layout quality in the case of Latent Semantic Indexing and Non-Negative Matrix Factorization.
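The layout pipeline the abstract describes (topic model, then document-topic vectors, then dimensionality reduction to a 2D scatter plot, then a layout-quality metric) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' benchmark: Non-Negative Matrix Factorization via Lee-Seung multiplicative updates stands in for the topic models, PCA computed by power iteration stands in for the studied dimensionality reductions, and the k-nearest-neighbour preservation score is a simple stand-in for layout accuracy, not necessarily one of the paper's metrics. The toy corpus and all function names (`nmf`, `pca_2d`, `neighbourhood_preservation`, ...) are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Approximate a nonnegative document-term matrix V (docs x terms) by W @ H
    with Lee-Seung multiplicative updates for the squared Frobenius loss.
    W (docs x k) holds document-topic weights, H (k x terms) topic-term weights."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """Reconstruction error ||V - W @ H|| (Frobenius norm)."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)))

def topic_distributions(W):
    """Normalize each document's topic weights so they sum to 1."""
    return [[w / (sum(row) or 1.0) for w in row] for row in W]

def pca_2d(X, iters=500, seed=0):
    """Project rows of X onto their top two principal components,
    found by power iteration with deflation on the covariance matrix."""
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    Xc = [[x - mu for x, mu in zip(row, mean)] for row in X]
    C = [[c / n for c in row] for row in matmul(transpose(Xc), Xc)]
    rng = random.Random(seed)
    comps = []
    for _ in range(2):
        # Random start: a uniform vector would be orthogonal to simplex data.
        v = [rng.random() - 0.5 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in C]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        lam = sum(vi * sum(c * x for c, x in zip(row, v)) for vi, row in zip(v, C))
        comps.append(v)
        # Deflate: remove the component just found from the covariance matrix.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in Xc]

def knn(X, i, k):
    """Indices of the k nearest neighbours of row i (Euclidean distance)."""
    others = [j for j in range(len(X)) if j != i]
    return set(sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k])

def neighbourhood_preservation(high, low, k=1):
    """Mean fraction of each point's k nearest neighbours in topic space that
    remain among its k nearest neighbours in the 2D layout (1.0 = all kept)."""
    n = len(high)
    return sum(len(knn(high, i, k) & knn(low, i, k)) / k for i in range(n)) / n

# Hypothetical toy corpus: 6 documents x 6 terms of raw counts, three themes.
V = [
    [3, 2, 0, 0, 0, 0],
    [2, 3, 0, 0, 0, 0],
    [0, 0, 3, 2, 0, 0],
    [0, 0, 2, 3, 0, 0],
    [0, 0, 0, 0, 3, 2],
    [0, 0, 0, 0, 2, 3],
]
W, H = nmf(V, k=3)
theta = topic_distributions(W)   # document-topic vectors
layout = pca_2d(theta)           # 2D scatter-plot coordinates
score = neighbourhood_preservation(theta, layout, k=1)
print(f"reconstruction error: {frobenius_error(V, W, H):.3f}, layout score: {score:.2f}")
```

Repeating this loop over many topic-model hyperparameters and projection techniques, and recording a topic-quality and a layout-quality score per configuration, would mirror in miniature the kind of parameter sweep the abstract describes.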

CC BY-NC-ND 4.0

Paper citation in several formats:
Atzberger, D.; Cech, T.; Scheibel, W.; Döllner, J. and Schreck, T. (2024). Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - IVAPP; ISBN 978-989-758-679-8; ISSN 2184-4321, SciTePress, pages 593-602. DOI: 10.5220/0012391100003660

@conference{ivapp24,
author={Daniel Atzberger and Tim Cech and Willy Scheibel and Jürgen Döllner and Tobias Schreck},
title={Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - IVAPP},
year={2024},
pages={593-602},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012391100003660},
isbn={978-989-758-679-8},
issn={2184-4321},
}

TY  - CONF
JO  - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - IVAPP
TI  - Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions
SN  - 978-989-758-679-8
SN  - 2184-4321
AU  - Atzberger, D.
AU  - Cech, T.
AU  - Scheibel, W.
AU  - Döllner, J.
AU  - Schreck, T.
PY  - 2024
SP  - 593
EP  - 602
DO  - 10.5220/0012391100003660
PB  - SciTePress
ER  - 