Authors:
Pawandeep Kaur¹ and Dora Kiesel²
Affiliations:
¹ Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University, Jena, Germany
² Bauhaus University, Faculty of Media, VR Group, Weimar, Germany
Keyword(s):
Caption Analysis, Chart Classification, Natural Language Processing, NLP, Caption Classification, Visualization Recommendation, Chart Recognition
Abstract:
Chart type classification through caption analysis is a new area of study. Two kinds of caption keywords serve as parameters for training a text classifier:

- distinct keywords that relate to the visualization vocabulary (e.g., for scatterplots: dot, y-axis, x-axis, bubble);
- keywords from the specific domain (e.g., species richness, species abundance, and phylogenetic associations in biodiversity research).

To comprehend charts better, a classifier should understand these parameters as well as the visual characteristics of the chart. Such conceptual (semantic) chart classifiers would then be useful not only for chart classification but also for other visualization studies. One application is a domain knowledge-assisted visualization recommendation system. In such a system, the text classifier recommends visualization types by classifying the text provided along with the dataset. Motivated by this use case, in this paper we explore our idea of semantic chart classifiers. Using state-of-the-art natural language processing (NLP) and computer vision algorithms, we create a biodiversity domain-based visualization classifier. With an average test accuracy (F1-score) of 92.2% over all 15 classes, we show that our classifiers can differentiate between chart types both conceptually and visually.
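To illustrate how caption keywords can act as training signal for a text classifier, here is a minimal sketch under stated assumptions. It is not the authors' method. The paper uses state-of-the-art NLP and computer vision models, whereas this sketch swaps in a plain bag-of-words multinomial naive Bayes with add-one smoothing. The class `CaptionNaiveBayes`, the function `tokenize`, the three labels, and all training captions are hypothetical toy data and do not come from the paper's 15-class biodiversity dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a caption and split it into word tokens, keeping hyphenated
    terms such as 'x-axis' as single tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


class CaptionNaiveBayes:
    """Hypothetical multinomial naive Bayes over caption words (add-one smoothing)."""

    def fit(self, captions, labels):
        self.class_counts = Counter(labels)          # documents per chart type
        self.word_counts = defaultdict(Counter)      # word frequencies per chart type
        self.vocab = set()
        for caption, label in zip(captions, labels):
            tokens = tokenize(caption)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, caption):
        tokens = tokenize(caption)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy captions mixing visualization vocabulary with biodiversity terms.
TRAIN = [
    ("Scatterplot of species richness against latitude; each dot is a plot, "
     "x-axis latitude, y-axis richness", "scatter"),
    ("Bubble chart: dot size shows species abundance along the x-axis and y-axis",
     "scatter"),
    ("Bar chart of species abundance per site; bars show counts", "bar"),
    ("Grouped bars comparing mean abundance across habitats", "bar"),
    ("Phylogenetic tree showing associations among taxa; branches and nodes", "tree"),
    ("Cladogram with branch lengths indicating phylogenetic distance between species",
     "tree"),
]

model = CaptionNaiveBayes().fit([c for c, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    for query in ["Each dot marks a site; y-axis shows species richness",
                  "Bars show mean species abundance per habitat",
                  "Phylogenetic tree of species"]:
        print(query, "->", model.predict(query))
```

On this toy data, the visualization terms (dot, y-axis, bars, tree) dominate the decision, and the shared domain terms (species, abundance) contribute to several classes at once. A real recommender would need far more captions and, as the paper suggests, a stronger language model.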