Combining Image and Caption Analysis for Classifying Charts in Biodiversity Texts

Pawandeep Kaur, Dora Kiesel

2020

Abstract

Chart type classification through caption analysis is a new area of study. Distinct keywords in captions, both from the visualization vocabulary (e.g., for a scatterplot: dot, y-axis, x-axis, bubble) and from the specific domain (e.g., species richness, species abundance, and phylogenetic associations in biodiversity research), serve as parameters to train a text classifier. To comprehend charts better, a classifier should understand these parameters well, in addition to the visual characteristics of the chart. Such conceptual/semantic chart classifiers are then useful not only for chart classification but also for other visualization studies. One application is a domain knowledge-assisted visualization recommendation system, in which the text classifier recommends visualization types based on its classification of the text provided along with the dataset. Motivated by this use case, in this paper we explore the idea of semantic chart classifiers. We use state-of-the-art natural language processing (NLP) and computer vision algorithms to create a visualization classifier for the biodiversity domain. With an average test accuracy (F1-score) of 92.2% over all 15 classes, we show that our classifiers can differentiate between chart types both conceptually and visually.
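As a rough illustration of how caption keywords can signal a chart type, the sketch below scores a caption against per-class keyword lists. It is not the classifier described in the paper, which is trained with NLP and computer vision methods. The function names, the "bar chart" and "map" classes, and their keyword lists are assumptions made for this example; only the scatterplot terms are taken from the abstract.

```python
import re
from collections import Counter

# Hypothetical keyword lists. Only the scatterplot terms come from the
# abstract; the other classes and their terms are illustrative assumptions.
KEYWORDS = {
    "scatterplot": {"dot", "x-axis", "y-axis", "bubble"},
    "bar chart": {"bar", "bars", "stacked"},    # assumed
    "map": {"map", "region", "latitude"},       # assumed
}


def tokenize(caption):
    """Lowercase the caption and split it into words, keeping hyphenated terms."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", caption.lower())


def classify_caption(caption):
    """Return the chart type whose keywords occur most often, or None if none occur."""
    counts = Counter(tokenize(caption))
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    caption = "Each dot shows species richness (y-axis) against elevation (x-axis)."
    print(classify_caption(caption))
```

A trained text classifier would learn such associations from labeled captions rather than from hand-written lists, and could also weight domain terms such as "species richness" that this sketch ignores.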

Paper Citation


in Harvard Style

Kaur P. and Kiesel D. (2020). Combining Image and Caption Analysis for Classifying Charts in Biodiversity Texts. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 3: IVAPP; ISBN 978-989-758-402-2, SciTePress, pages 157-168. DOI: 10.5220/0008946701570168


in Bibtex Style

@conference{ivapp20,
author={Pawandeep Kaur and Dora Kiesel},
title={Combining Image and Caption Analysis for Classifying Charts in Biodiversity Texts},
booktitle={Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 3: IVAPP},
year={2020},
pages={157-168},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008946701570168},
isbn={978-989-758-402-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 3: IVAPP
TI - Combining Image and Caption Analysis for Classifying Charts in Biodiversity Texts
SN - 978-989-758-402-2
AU - Kaur P.
AU - Kiesel D.
PY - 2020
SP - 157
EP - 168
DO - 10.5220/0008946701570168
PB - SciTePress