Word-sized Visualizations for Exploring

Discussion Diversity in Social Media

Franziska Huth

1 a

, Tanja Blascheck

1 b

, Steffen Koch

1 c

, Sonja Utz

2 d

and Thomas Ertl

1 e

University of Stuttgart, Germany

Leibniz-Institut f

ur Wissensmedien T

ubingen and University of T

ubingen, Germany

Keywords:

Word-sized Visualization, Visualization of Categories, Social Media Analysis.

Abstract:

In this paper, we explore the design space of word-sized visualizations—small graphics, usually the same

size as a word, that visualize data in or related to a text—for displaying and exploring categories in social

media feeds such as Twitter streams. Social media contributions are typically microposts, which allow us

to attach word-sized visualizations to show category assignment, diversity, or development. We consider

and combine word-sized visualizations made up of basic marks and visual variables, existing word-sized

visualization concepts, as well as large text visualizations. In an application example we show how word-

sized visualizations can evince context changes within a discussion on Twitter and reveal topic diversity.

1 INTRODUCTION

Context change is a common phenomenon in social

media—text content, photos, and videos are often torn

out of context or changed while sharing. This re-

quires extreme caution when reading or sharing digi-

tal content in times of “fake news” and “social bots”

(Vosoughi et al., 2018). Especially because the gate-

keeping inﬂuence of journalists has diminished, con-

tent is moderated less and less while it is spreading

at an ever increasing rate. To the purpose of help-

ing people investigate context change as well as dis-

cussion diversity within (controversial) discussions in

social media, usually large visual analytics systems

are proposed. Often the target audience of visual an-

alytics systems are experts, as they give a top-down

overview of data and involve a learning phase.

Typically, social media feeds compile a number of

microposts, e.g., Twitter messages, which are usually

short and there is not much space for larger visualiza-

tions. Therefore, these microposts can be enriched by

placing word-sized visualizations, i.e., small visual-

izations of word size, next to them. We investigate the

design space of word-sized visualizations for convey-

https://orcid.org/0000-0003-1393-5641

https://orcid.org/0000-0003-4002-4499

https://orcid.org/0000-0002-8123-8330

https://orcid.org/0000-0002-7979-3554

https://orcid.org/0000-0003-4019-2505

ing discussion diversity of a discourse in a social me-

dia feed, helping people understand context change

and observe how a discourse changes over time.

Our goal is to give an overview about discussion

diversity and embed this information into each micro-

post, to enable a person to place this micropost into a

larger context without having to leave the social me-

dia feed. Our approach permits a person to investigate

the driving factors within a discussion, inspect con-

text change at a glance within a social media feed, ob-

serve how diverse the discourse is during a discussion,

examine where in that discourse individual contribu-

tions are located, as well as follow the development

and change of a discussion over time. This can help

people realize how easily content on social media is

placed out of context or in a different context.

We explore which word-sized visualizations are

suitable for social media data. We assume that the

individual microposts within a social media feed can

be categorized and each micropost can be assigned to

one speciﬁc category. For example, Twitter messages

from a political campaign could be labeled based on

the topics discussed (e.g., education, climate change,

economy, pandemic). The word-sized visualizations

we imagine show the category for each micropost as

well as the larger context it is discussed in.

We use a systematic approach by exploring poten-

tial word-sized visualizations made from scratch by

exploiting the basic marks and visual variables from

visualization design. In addition, we look at the exist-

256

Huth, F., Blascheck, T., Koch, S., Utz, S. and Ertl, T.

Word-sized Visualizations for Exploring Discussion Diversity in Social Media.

DOI: 10.5220/0010328602560265

In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 3: IVAPP, pages

256-265

ISBN: 978-989-758-488-6

ing collection of word-sized visualizations (Latif and

Beck, 2018) and discuss how they can be adapted to

our needs. Last, we examine the suitability of large

visualizations and visual analytics systems available

for text visualizations to investigate if the visual el-

ements used in these visualizations can be scaled to

word size. With an application example, we demon-

strate how different word-sized visualizations are in-

tegrated into a Twitter stream and describe what peo-

ple can observe about the diversity of the discussion.

2 RELATED WORK

We discuss the related work of the research areas that

are most relevant to this paper—how discourse in so-

cial media can be analyzed with the help of visualiza-

tions and visual analytics systems, as well as design

of word-sized visualizations and their applications.

2.1 Visual Analysis of Discourse in

Social Media

For visualizing topics, sentiment, or discourse, as well

as their development in social media, in contrast to

our bottom-up approach usually top-down visual an-

alytics systems are proposed. Several versions of vi-

sualizing topic development as streams exist (Cuenca

et al., 2018; Havre et al., 2002; Xu et al., 2013), with

variations such as adding keywords and glyph mark-

ers for critical events (Cui et al., 2011), or grouping by

sub-topics (Heimerl et al., 2016a). When geographi-

cal data is available or extracted from the text corpus,

in addition to the categorical data that we use, spatio-

temporal visualizations can be implemented, for ex-

ample, chart overlays on a map (Bosch et al., 2013;

He and Chen, 2016) to show topics in certain regions.

Due to their visual complexity, however, those are

more difﬁcult to scale to word size and might require

some abstraction.

For the task of analyzing discourse in social media

or blogs, Heimerl et al. (2016b) and Kim et al. (2017)

use point clouds to show sub-groups of topics in two-

dimensional arrangements, while Brandes and Cor-

man (2016) and Liu et al. (2014) employ node-link

diagrams for displaying how topics are connected. El-

Assady et al. (2018) contribute a timeline that is con-

nected with arcs to show conversation threads. How-

ever, these works all focus on different aspects of data

from social media feeds than we do, e.g., connections

between individual social media contributions on a

topic, or the back-and-forth of a discussion.

To track the progression of a discourse, El-Assady

et al. (2016) place speakers of a conversation in one

of multiple topic-corners of a circle, with trails across

the circle to signal which topic they moved from

last. The visualization is interactive and has differ-

ent small visual elements, again designed as a top-

down approach and while effective for their use-case,

not transferable to our data. Circular graphs are fur-

ther used by Liu et al. (2016) and Xu et al. (2017)

to visualize opinion in the center and topics arranged

around it. However, in general a circular graph is not

well suited as a word-sized visualization because of

its small aspect ratio. When scaled to word size, it

becomes narrow and details are not distinguishable.

Coordinated views combine several of the above

visual elements, e.g., by Lu et al. (2018). In con-

trast, our goal is to show one or two word-sized vi-

sualizations instead of multiple visualizations at the

same time.

2.2 Word-sized Visualizations

The ﬁrst versions of word-sized visualizations were

introduced by Tufte (2006), who called them

sparklines. Building on that, Latif and Beck (2018)

as well as Gofﬁn et al. (2014, 2017) discuss design

considerations for word-sized visualizations in gen-

eral, which we expand on in this paper, while Heer

et al. (2009) explore options for reducing the height of

line charts to ﬁt text size. In addition to deﬁning de-

sign aspects, we explore in this paper which of these

word-sized visualizations are suitable for visualizing

discussion diversity within a larger discussion. Gof-

ﬁn et al. (2015, 2020) investigate how word-sized vi-

sualizations affect the reading behavior of people and

present various interaction techniques.

Supplementary to those general considerations,

several usage scenarios for word-sized visualizations

have been proposed, like visualizing eye tracking

data (Beck et al., 2017), code understanding and

code quality visualization (Hoffswell et al., 2018),

and better understanding of scientiﬁc texts (Beck and

Weiskopf, 2017). VIS author proﬁles (Latif and Beck,

2019) and sparkClouds (Lee et al., 2010) focus on

using word-sized visualizations for development over

time. However, none of those employ word-sized vi-

sualizations for social media data.

2.3 Applications

Real-world applications of word-sized visualizations

include visualizing stock price development with

spark lines, as introduced by Tufte (2006) ,

or analyzing soccer matches as in Perin et al. (2013).

Word-sized graphics that are not data driven, and thus

less interesting for our paper, are widely used. Those

Word-sized Visualizations for Exploring Discussion Diversity in Social Media

257

include emoticons , especially in social media

contributions, and icons, for example, for showing

readers which keys to press when giving software re-

lated instructions

Ctrl

+ +

Esc

3 DESIGN SPACE

Before discussing word-sized visualizations embed-

ded into microposts of social media feeds, we de-

scribe the data source we consider, the structure of the

data, speciﬁc design aspects, tasks, and requirements

that our word-sized visualizations need to fulﬁll and

to conﬁne our design space.

3.1 Data Source

The underlying data source for the word-sized vi-

sualizations are social media feeds with microposts,

such as Twitter

messages, and social news aggre-

gation platforms like Reddit

, Instagram

stories, or

Wikipedia

articles. We deﬁne a corpus as a subset of

such a social media feed based on a common connec-

tion, for example, a hashtag, a community, an event,

a personal feed, or an item that is shared. We refer

to this common connection as a digital fragment, that

can be a link to an image, a video, or a news article.

Each micropost in the social media feed is considered

as an individual data item. The word-sized visualiza-

tion is embedded into this data item, for example, it is

placed at the end of a Twitter message (cf. Figure 1).

3.2 Data Structure

A corpus is a collection of data items, with exactly

one data item representing the digital fragment. We

categorize, i.e., label, the data items to analyze the di-

versity of a corpus. For example, a categorization can

be based on a clustering approach, topic modeling,

sentiment analysis, or manual labeling. Therefore,

our ﬁrst dimension is a category. Each data item is as-

signed to exactly one category. Because we consider

microposts, such as Twitter messages, which are short

text messages, we assume that each micropost only

contains one aspect. Each data item has a time stamp,

allowing us to investigate a speciﬁc point in time or

a time-series with a range of time stamps. Therefore,

our second dimension is time. In addition, the word-

sized visualizations should show the data item explic-

www.twitter.com

www.reddit.com

www.instagram.com

www.wikipedia.org

itly and potentially also the digital fragment. Thus,

these two form the third and fourth dimension.

3.3 Design Aspects

A category contains a number of data items from the

corpus. The number of possible categories can be be-

tween two and possibly inﬁnite. If the number of cate-

gories becomes too large, however, the representation

becomes difﬁcult due to the limited size of the word-

sized visualizations as width and height are typically

constrained to a few hundred pixels. Therefore, if a

large number of categories is represented, the space

per category might be limited to one pixel. However,

we categorize the data items to decrease the complex-

ity of the corpus and assume that an appropriate num-

ber of categories is chosen.

Based on the chosen categorization scheme, the

categories can have an inherent order (e.g., political

parties are often ordered from left to right based on

their political orientation) or not (e.g., if clustering is

used the order is arbitrary). If an inherent order is

given, the word-sized visualization should represent

this order. When the categories have no inherent or-

der, an order should be chosen and should not change

to preserve the mental map a person builds.

Table 1 shows the dimensions with example word-

sized visualizations for each of them. Representing

the size—the number of data items in each category—

again requires some consideration. For example, one

category can be overly large or small in comparison

to the other categories. In either case a minimum or

maximum width can be deﬁned, however, this may

come with incorrectly representing the exact number

of data items. A tooltip showing the exact number or

a visual mark indicating this discrepancy (e.g., using

a color gradient ) can be added. Also, if many

categories are small they should not be placed next

to each other to avoid them blurring together into one

category .

In addition, the corpus can change over time.

Therefore, we are both interested in representations

for a speciﬁc point in time or a representation depict-

ing a time range. If we consider a time range, we can

either focus on how the number of data items per cat-

Table 1: Dimensions of our word-sized visualizations. Cat-

egory can represent the number of data items (size) or not.

Time can be a point in time or a range. The data item is

shown as a circle or as multiple circles connected by lines.

category

no size size

time

point

range

IVAPP 2021 - 12th International Conference on Information Visualization Theory and Applications

258

egory changes over time or we can investigate how

the category of an individual data item changes over

time. Categories changing over time can mean that

new categories are added or removed, but also that

the number of data items changes (e.g., more/less data

items belong to a category) . Again, the order-

ing should remain as stable as possible to not disturb

the mental map an analyst has of the categories. New

categories should be added at the top and categories

that are removed at the bottom, similar to Sankey Di-

agrams (Kennedy and Sankey, 1898). If the category

of an individual data item changes over time, multi-

ple markers need to be added at different time steps

. This could lead to overcrowding and using a

second encoding such as value to indicate older points

in time can reduce this effect .

We assume that each data item can be assigned

to a category. In cases for which there is no cate-

gory that ﬁts, an “other” category could be used. This

special type of category should then be highlighted to

indicate that something is different about it (e.g., sep-

arating the “other” category from the rest by adding

some horizontal space ). Another option is to

not represent the data items marker in the word-sized

visualization, or show the closest match and encode it

in a speciﬁc way to indicate this mismatch .

Last, the digital fragment that deﬁned the corpus

can be explicitly shown or not. For example, if the

digital fragment is a link to an image that is shared

and discussed on Twitter, this digital fragment is also

assigned to a category, because it is also a data item in

the corpus. Therefore, we need a separate representa-

tion for this speciﬁc data item to show, for example,

where the digital fragment is positioned in compari-

son to an individually inspected data item (up-

per and lower marker, respectively).

Other aspects we need to consider when design-

ing word-sized visualizations due to their small form-

factor are deﬁned by Latif and Beck (2018). Word-

sized visualizations do not show labels, grid lines,

axes, or legends. Their size should ﬁt into an as-

pect ratio that is 45

◦

for the average orientation of

a line segment. Latif and Beck (2018) also recom-

mend using a border to enhance interpretability and

readability. Another challenge is the use of colors

in the context of word-sized visualizations. Using

large color or brightness contrasts helps to discern

colors for small areas (Latif and Beck, 2018). In addi-

tion, we are limited to twelve colors that humans can

distinguish (Ware, 2013) and should use color-blind

friendly colors. In all our examples, we use the color

scheme provided by Harrower and Brewer (2003) for

qualitative data.

3.4 Tasks and Requirements

We assume a lay person as our primary analyst. The

main tasks we consider are that the analyst wants to

get an overview (T1) about different aspects within a

corpus (e.g., inspect the diversity of a discussion); in-

vestigate the driving factor within a discussion (T2)

(e.g., if the digital fragment drives the discussion); in-

spect a change of context (T3) from a digital frag-

ment (e.g., is someone drawing attention to a differ-

ent issue); examine the development and change of a

discussion over time (T4) (e.g., which issues are dis-

cussed at what point in time and how frequently).

Based on these tasks, the description and the data

structure presented above, the following requirements

for our word-sized visualizations ensue. First, a lim-

ited number of categories are always represented with

a stable ordering that is predeﬁned (R1). Next, the

current data item is represented and shown at the cat-

egory it belongs to (R2) and the digital fragment can

optionally be shown (R3). The word-sized visualiza-

tion should either show one speciﬁc point in time or

a time range (R4). Word-sized visualization speciﬁc

design considerations need to be followed (R5) (e.g.,

width, height, etc.), and perception issues should be

considered (R6) (e.g., color-blind friendly).

4 WORD-SIZED VISUALIZATION

DESIGNS

Based on the design aspects and requirements, we

designed and collected word-sized visualizations that

would represent our different dimensions (category,

time, data item, and digital fragment). Our ﬁrst step is

to use the basic marks and visual variables for design-

ing novel visualizations and explore how to use them

to represent the dimensions. Next, we look at exist-

ing word-sized visualizations (Latif and Beck, 2018)

and how we can adapt them to our needs. Last, we in-

spect state-of-the-art text visualizations (Kucher and

Kerren, 2015) ﬁtting for our dimensions and discuss

if and how we can shrink them to word size.

Our main dimension is category, therefore, the fo-

cus is to ﬁnd a representation of the categories ﬁrst,

since we always want to represent our complete cor-

pus to set the data item into this larger scope instead

of only depicting the data item with its category (R1).

Therefore, after ﬁnding a suitable representation for

the categories, we add a reference to the speciﬁc data

item (R2). Optionally, the digital fragment is shown

together with the data item (R3). In a ﬁrst iteration,

we focus on a speciﬁc point in time. Later, we look at

Word-sized Visualizations for Exploring Discussion Diversity in Social Media

259

word-sized visualizations that could be used to repre-

sent our data depicting changes over time (R4).

4.1 Visual Variables

Table 2 shows examples of different visual vari-

ables (Bertin, 1967) suitable for representing multi-

ple categories and a data item using the area mark.

All marks can represent categories in the word-sized

visualizations. However, at word scale we need to

ensure that the point is large enough, e.g., at least

the size of the letter “o”. Lines too close together

lead to blurring effects, therefore, enough space be-

tween the lines needs to be added . Area can be a

square, a horizontal bar, or vertical bar. Since word-

sized visualizations are most often elongated, squares

or vertical bars use space bet-

ter than horizontal bars . A more compact rep-

resentation can be used if the order of the categories

is not relevant .

For representing categories, the chosen visual

variable needs to be selective, i.e., easy to distinguish

from the others, and have a certain length, i.e., the

number of values that can be differentiated. Possible

visual variables are position , color , tex-

ture , and shape (Carpendale,

2003). Each of them has some drawbacks: position is

ordered, which is only a slight drawback in this case

because some order needs to be chosen. Color, al-

though theoretically inﬁnite, is limited in its length to

about 12 colors, which is the number of colors a hu-

man can distinguish (Ware, 2013). Texture is not well

suited to be scaled to word size. Shape, although po-

tentially inﬁnite in length, in practice is limited when

scaled to word size because similar shapes, for exam-

ple, a pentagon and a circle , can be hard to

distinguish when too small. If order is of relevance

for the category, then position is the most effective

visual variable to represent this . Even though

value and size are also visual variables that

can depict order, they are less suited for categorical

data (Wolfe and Horowitz, 2004). If the number of

data items in a category is to be encoded as well, the

most effective visual variables are position , value

, and size as they show which category has

more or less data items (Wolfe and Horowitz, 2004).

The beneﬁt of value over size is that we are less lim-

ited in the categories becoming small (=1px).

For an appropriate representation of the data item,

selectiveness is particularly important, which all vi-

sual variables (partially) fulﬁll (Carpendale, 2003).

However, some combinations are less suited than oth-

ers. For example, if using the area mark for the

data item, position (on the y-axis) leads to vertically

squeezed word-sized visualizations . If color is

not used for the categories, this is a good choice for

the data item . Changing the x-position of a line,

point , or also shape based on a data items cate-

gory are good choices as well.

As an addition, the digital fragment can also be

explicitly shown. If we extend the word-sized visu-

alizations we need another visual variable to repre-

sent the digital fragment. This could be, for example,

position—the digital fragment is always shown at the

upper half and the data item in the lower half of the

word-sized visualization using a line marker for both

. We can also use two distinct colors ,

textures , shapes , values , or sizes

for each item. One challenge to consider when

choosing a visual variable is the case in which the data

item and the digital fragment are shown together. Ei-

ther we need a double encoding, for example, position

and color, or we need a visual variable that is easily

distinguishable, for example, shape.

If we consider a time range, we can either focus on

how the number of data items per category changes

over time or we can investigate how the category of

an individual data item changes over time. Time is

represented either on the x- or y-axis. The other axis

can then be used to encode the categories with size

or without size changes, in this case the change

of the data item is of relevance . Another option

is to use value to indicate a change over time. This is

both suitable for representing the change of categories

for an individual data item or the size of the

categories for multiple time steps .

4.2 Existing Word-sized Visualizations

Latif and Beck (2018) discuss word-sized visualiza-

tions for different data types: univariate, multivariate,

spatial, as well as relational data. We inspect each of

the word-sized visualizations they propose for these

data types and discuss if and how they could be ap-

plied to our data. We refer to the visual variables and

word-sized visualizations mentioned in Section 4.1

when appropriate.

A simple quantity chart has limited options,

only color and area, for showing the data. It cannot vi-

sualize multiple categories and might, at best, be used

to show the total number of categories found or the

number of data items within a category in compari-

son to the number of data items in the corpus. Much

better suited is the subdivided area chart with which

we can display multiple categories. This can be done

either with ﬁxed-width elements

to show which

categories there are, or with elements whose width is

data-driven to show, for example, the size (the

IVAPP 2021 - 12th International Conference on Information Visualization Theory and Applications

260

Table 2: Word-sized visualizations representing multiple categories and one data item using different visual variables (Bertin,

1967) (except orientation) for a speciﬁc point in time.

category

position color texture shape value size

data item

position

color x

texture x

shape x

value x

size x

number of data items) of the categories in relation to

each other. If color is used for the different categories,

then shape, value, or texture markers can be added to

show which categories the data item and digital frag-

ment belong to . Similarly this holds if any of

the other visual variables are used for the categories.

However, if a category is small, it is hard to see. This

can be remedied by enforcing a minimum width in

the visualization, although this either leads to a dis-

tortion of the relations of the categories to each other,

or makes the whole visualization wider.

Line and area charts can be used to

show the development of the size of one category over

time. For showing multiple categories, line compari-

son and stacked area charts can be use-

ful and allow marking the categories of both data item

and digital fragment with texture, value, shape, or size

(line width). Due to the added time component, they

are, however, slightly harder to read than the subdi-

vided area chart. Similarly, small categories can be-

come barely visible. Enforcing a minimum category

size in the visualization, however, leads to more dis-

tortion than for the subdivided are chart.

When using a bar chart , the underlying data

can either be the size of the categories or one cat-

egory over time in time intervals, which makes it a

histogram . For a bar comparison chart ,

the data is the size of multiple categories over time,

for which color value is used to depict time. How-

ever, for a medium or large number of categories, it

becomes prone to visual overcrowding. If there are

small categories, the same problem arises as with line

charts, as word-sized bar charts cannot extend in the

y-direction. As for markers, for the bar chart texture,

value, shape and size (border around the bar) and for

the bar comparison chart texture, shape and size are

possible and reasonably selective.

A table

can represent the number of data

items per time step, encoded using grayscale or color

values—the darker, the more data items belong to a

category. The row or individual cells can be high-

lighted (using texture, shape, or border size) to show

which category the data item belongs to. If the digital

fragment is represented as well, texture or shape are in

our opinion the best options. However, the individual

cells are not distinguishable well and a heatmap effect

occurs, yet a data overview might still be possible.

Finally, some of the existing word-sized visual-

izations do not ﬁt our data types, such as scatter plots

, maps , and trajectories . In a box

plot , data on the y-axis is the category, but plot-

ting the data for category size on the x-axis would be

a point rather than a box. It could be more suitable,

however, when there is uncertainty in the data, e.g.,

from clustering of the categories.

4.3 Large Visualizations

With the help of the Text Visualization Browser

(Kucher and Kerren, 2015) we categorized visual ele-

ments of existing visualizations and visual analytics

systems whose goal it is to show discussion diver-

sity. We surveyed publications that focus on discourse

analysis in online forums, blogs, or social media.

Some of the visualizations and visual analytics

systems mentioned in Section 2.1 use visual elements

already discussed in the previous section, like sub-

divided area charts by Hoque and Carenini

(2014). However, most either have a different under-

lying data structure than our categorical data, e.g., ge-

ographical data or network data, different tasks that

focus on a top-down approach, or are too complex to

scale well to word size.

One of the most popular visualizations for dis-

playing discussion development in social media or

blogs are stream graphs. Widely used are glyphs

and text overlays of events or category characteris-

tics, which when scaled to word size become unread-

able

. Without those overlays, however, stream

graphs are suitable word-sized visualizations for our

data . They are similar to stacked area charts

that we discussed in Section 4.2, except that their y-

Word-sized Visualizations for Exploring Discussion Diversity in Social Media

261

baseline typically is not at the bottom, but at the mid

line of the chart area. Stream graphs suffer from the

same limitations as stacked area charts, i.e., it is difﬁ-

cult to display small category sizes or a large number

of categories while maintaining readability, though

they otherwise fulﬁll our requirements deﬁned in Sec-

tion 3.4 and are a suitable candidate.

4.4 Summary

Based on the different word-sized visualizations dis-

cussed in the previous sections, we suggest the word-

sized visualizations we think represent our data the

best as well as fulﬁll the requirements. For represent-

ing categories without depicting size together with the

data item and digital fragment for one point in time,

our suggested word-sized visualization is a subdi-

vided area chart with ﬁxed-width elements . We

choose a subdivided area chart with variable-width el-

ements or a bar chart for best represent-

ing categories with size information together with the

data item and digital fragment for one point in time.

For representing categories without depicting size in-

formation together with the data item and digital frag-

ment for a time range, our suggested word-sized visu-

alization is a stacked area chart with ﬁxed-sizes and

an overlay for the category of the data item or digital

fragment . For representing categories with size

information together with the data item and digital

fragment for a time range, we advise using a stacked

area chart or a stream graph .

5 USAGE EXAMPLE

There are multiple scenarios in which a person might

want to see word-sized visualizations integrated into

a social media feed. First, a social media feed for a

hashtag or a feed compiled based on a speciﬁc search

result. Then, there is one digital fragment which is the

hashtag or search term. Second, their personal social

media feed, in which case there are multiple digital

fragments. Third, a social media feed of a speciﬁc

group or organization. Then multiple digital frag-

ments are relevant again. In the following we focus

on the ﬁrst scenario.

To illustrate how word-sized visualizations can

be integrated into a social media feed, we collected

251.146 Twitter messages for a digital fragment—a

video about the explosion in Beirut in August 2020

(cf. Figure 1). For demonstration purposes, we

use the tf-idf features of the corpus to cluster the

tweets with a simple k-means++ algorithm, with a

data-adaptive number of clusters (Davies–Bouldin in-

dex (Davies and Bouldin, 1979) and elbow method

(Thorndike, 1953)), yielding k = 3. We create the

word-sized approaches that show temporal evolution

by binning the tweets of each cluster into intervals ac-

cording to their timestamp.

The intention of the word-sized visualizations is

that they are placed within an individual data item,

i.e., Twitter message. Where to place the word-

sized visualization exactly within a text has been thor-

oughly discussed by Gofﬁn et al. (2014), hence we

follow their recommendations and suggestions. In a

social media feed, the word-sized visualization should

always be at the same position in all data items to en-

able comparison. For our demonstration, we decided

to place the word-sized visualization next to the link

of the digital fragment, because the categories are re-

lated to it. We show the Twitter stream similar to the

way it would appear online and place the digital frag-

ment tweet as well as a permanently visible legend

with color coding and the most salient words for each

category at the top of the stream of Twitter messages

with integrated word-sized visualizations.

To showcase visualization variants, we chose the

subdivided area chart as well as a stream graph. For

comparison of suitability of the word-sized visualiza-

tions in our example, we added one of the tweets with

a pie chart and a stacked area chart at the bottom of

Figure 1, each separated by a horizontal line. The sub-

divided area chart shows variable-width elements rep-

resenting the size (number of data items) of the cate-

gory. We set a minimum width of 5 pixels for the cat-

egories to ensure readability as well as a maximum to-

tal width of 75 pixels for the word-sized visualization.

The pie chart represents category size as arc length, its

diameter is set to 1.5 times the height of the surround-

ing text. The stream graph shows the amount of data

items in each category over time through variable-

height streams, while the ratio of category sizes over

time becomes visible with the height of the areas in

the stacked area chart.

The category of the digital fragment is marked

with a line from the top of the chart in the subdivided

area charts and with a circle in the other charts. The

data item is marked with a line from the bottom and a

triangle, respectively. It is visible at a glance whether

those marks are in the same category, i.e., they touch

in case of the subdivided area chart or not.

This indicates whether the data item is close to the

digital fragment, or if they are in different categories

and this speciﬁc tweet seems to be discussing a dif-

ferent topic and a change in context may be arising

that might be worth investigating. Further, with the

number of categories explicitly represented, analysts

can get an overview about how many different con-

IVAPP 2021 - 12th International Conference on Information Visualization Theory and Applications

262

Figure 1: Comparison of visualizing categories in a Twit-

ter stream with subdivided area charts and pie charts, which

are semantically similar but show a clear difference in vi-

sual readability, in addition to stream graphs and stacked

area charts. The digital fragment (a video about the Beirut

explosion in August 2020) and the legend with ten most

salient words per cluster are shown on top. The same ex-

ample tweets from several clusters are shown for the word-

sized visualization versions. We removed real user names

and proﬁle pictures and replaced them with pseudonyms.

text categories a digital fragment is discussed in, and

through the category sizes how prevalent each con-

text category is (T1). If the category of a digital frag-

ment is the largest, the digital fragment is the driving

factor in a discussion (T2). In our example, the dig-

ital fragment is in the largest cluster 2 , whereas

clusters 1 and 3 have less tweets assigned to

them. Therefore, the context of the discussion does

not seem to have changed into a different direction

much, which can be further examined by inspecting

individual tweets.

When a data item is in a different category than

the digital fragment, it is an indicator that someone

is trying to change the discussion and draw attention

to a different issue (T3). An analyst can detect those

context changes, for example, cluster 3 (prayers,

stay, safe). Examining the discourse over time (T4)

is possible when time is added as a variable into the

word-sized visualizations, as with the stream graph

or the stacked area chart. To depict temporal devel-

opment, without adding time as a variable to the chart

directly and to keep the visualization from being over-

loaded, two charts of the same type could be placed

before and after the digital fragment of the data item,

that show data that temporally occurred before and

after. This way, the data of two time points or time

ranges can be observed, and whether the discussion

changed. Another option is to place word-sized visu-

alizations showing different kinds of categories, e.g.,

topics and sentiment, next to each other, with the pos-

sibility to ﬁlter the data of one through interaction

with the other, showing sentiment within a category.

There are several possibilities for interaction, on

various elements of the word-sized visualization, i.e.,

on the categories, markers, legend, single words, or

the whole visualization. In Section 4.2 we elaborate

on why fulﬁlling (R6) is often an issue, i.e., maintain-

ing readability of the visualizations at word size, as,

for example, with cluster 3 in Figure 1. A possi-

ble remedy is through interaction, for example, with a

focus lens as a mouse-over interaction, although this

hinders an overview at a glance of the social media

feed. For a subdivided area chart, a possibility to add

temporal data is through interaction by adding a tem-

poral timeline overlay over the surrounding text, with

time on the y-axis and the top of the visualization de-

picting the latest state , which resembles the order-

ing of items in a social media feed. A way to omit the

legend is showing the characteristics of each category

on mouse-over. Mouse interaction, maybe altered by

pressing a key, e.g.,

Ctrl

, while hovering or clicking,

can further be used to lead a person to a visual ana-

lytics system that enables a more detailed review, to

Word-sized Visualizations for Exploring Discussion Diversity in Social Media

263

switch from the text-centric view to a visualization-

centric view. In general, interaction with the word-

sized-visualization should show additional data and

further enable navigation through the corpus, to allow

ﬁnding ways out of a potential ﬁlter bubble and un-

derstanding details about the different categories.

6 CONCLUSION

In this paper, we discussed the design aspects of

word-sized visualizations for discourse analysis in so-

cial media, showed which word-sized visualizations

are suitable for this task, and suggested word-sized

visualizations for several example applications. Fur-

ther, we described an example Twitter stream with in-

tegrated word-sized visualizations.

One of the main challenges of word-sized visu-

alizations is that their visual elements should be rec-

ognizable at word scale. In our example with real-

world data this is evident for several visualization

types. While the font size of this paper is 11 pixels,

social media sites often use larger font sizes. Twitter

messages, for example, have a font size of 15 pixels,

which slightly helps overcome this challenge.

We see the main advantage of word-sized visual-

izations in a social media feed in their possibility to

give an overview of discussion diversity directly in

a data item, without leaving the environment of the

feed. Further, through interaction, word-sized visual-

izations can lead analysts to a visual analytics system

for a more detailed analysis of the discourse.

In addition to social media feeds, word-sized visu-

alizations can similarly be embedded in news articles,

e.g., for showing topics throughout the article.

To supplement our discussion, we are planning

evaluations with lay people as well as experts, com-

paring different word-sized visualizations on discus-

sion diversity analysis tasks. Though we expect mul-

tiple versions to be equally effective, studies will be

helpful in validating this assumption.

ACKNOWLEDGEMENTS

This work has been funded by the University of

Stuttgart as part of the project Context Changes In

Social Media Contributions in cooperation with the

Leibniz-Institut f

ur Wissensmedien as part of the

Leibniz Wissenschaftscampus ”Cognitive Interfaces”.

Tanja Blascheck is indebted to the European Social

Fund and the Ministry of Science, Research, and Arts

Baden-W

urttemberg.

REFERENCES

Beck, F., Blascheck, T., Ertl, T., and Weiskopf, D.

(2017). Word-Sized Eye-Tracking Visualizations. In

Burch, M., Chuang, L., Fisher, B., Schmidt, A., and

Weiskopf, D., editors, Eye Tracking and Visualiza-

tion, Mathematics and Visualization, pages 113–128.

Springer.

Beck, F. and Weiskopf, D. (2017). Word-Sized Graphics for

Scientiﬁc Texts. IEEE Transactions on Visualization

and Computer Graphics, 23(6):1576–1587.

Bertin, J. (1967). S

emiologie graphique. Paris,

Mouton/Gauthier-Villars, 1st edition.

Bosch, H., Thom, D., Heimerl, F., P

uttmann, E., Koch, S.,

uger, R., W

orner, M., and Ertl, T. (2013). Scat-

terBlogs2: Real-Time Monitoring of Microblog Mes-

sages through User-Guided Filtering. IEEE Trans-

actions on Visualization and Computer Graphics,

19(12):2022–2031.

Brandes, U. and Corman, S. R. (2016). Visual Unrolling

of Network Evolution and the Analysis of Dynamic

Discourse. Information Visualization.

Carpendale, M. S. T. (2003). Considering visual variables

as a basis for information visualisation. Technical re-

port, University of Calgary.

Cuenca, E., Sallaberry, A., Wang, F. Y., and Poncelet, P.

(2018). MultiStream: A Multiresolution Streamgraph

Approach to Explore Hierarchical Time Series. IEEE

Transactions on Visualization and Computer Graph-

ics, 24(12):3160–3173.

Cui, W., Liu, S., Tan, L., Shi, C., Song, Y., Gao, Z., Qu,

H., and Tong, X. (2011). TextFlow: Towards Bet-

ter Understanding of Evolving Topics in Text. IEEE

Transactions on Visualization and Computer Graph-

ics, 17(12):2412–2421.

Davies, D. L. and Bouldin, D. W. (1979). A Cluster Separa-

tion Measure. IEEE Transactions on Pattern Analysis

and Machine Intelligence, PAMI-1(2):224–227.

El-Assady, M., Gold, V., Acevedo, C., Collins, C., and

Keim, D. (2016). ConToVi: Multi-Party Conversa-

tion Exploration using Topic-Space Views. Computer

Graphics Forum, 35(3):431–440.

El-Assady, M., Sevastjanova, R., Keim, D., and Collins,

C. (2018). ThreadReconstructor: Modeling Reply-

Chains to Untangle Conversational Text through

Visual Analytics. Computer Graphics Forum,

37(3):351–365.

Gofﬁn, P., Boy, J., Willett, W., and Isenberg, P. (2017). An

Exploratory Study of Word-Scale Graphics in Data-

Rich Text Documents. IEEE Transactions on Visual-

ization and Computer Graphics, 23(10):2275–2287.

Gofﬁn, P., Isenberg, P., Blascheck, T., and Willett, W.

(2020). Interaction Techniques for Visual Exploration

Using Embedded Word-Scale Visualizations. In Pro-

ceedings of the 2020 CHI Conference on Human Fac-

tors in Computing Systems, pages 1–13.

Gofﬁn, P., Willett, W., Bezerianos, A., and Isenberg, P.

(2015). Exploring the effect of word-scale visual-

izations on reading behavior. In Proceedings of the

33rd Annual ACM Conference Extended Abstracts on

IVAPP 2021 - 12th International Conference on Information Visualization Theory and Applications

264

Human Factors in Computing Systems, pages 1827–

1832. ACM.

Gofﬁn, P., Willett, W., Fekete, J.-D., and Isenberg, P.

(2014). Exploring the Placement and Design of Word-

Scale Visualizations. IEEE Transactions on Visualiza-

tion and Computer Graphics, 20(12):2291–2300.

Harrower, M. and Brewer, C. (2003). ColorBrewer.org: An

online tool for selecting colour schemes for maps. The

Cartographic Journal, 40(1):27–37.

Havre, S., Hetzler, E., Whitney, P., and Nowell, L. (2002).

ThemeRiver: visualizing thematic changes in large

document collections. IEEE Transactions on Visual-

ization and Computer Graphics, 8(1):9–20.

He, J. and Chen, C. (2016). Spatiotemporal Analytics of

Topic Trajectory. In Proceedings of the 9th Interna-

tional Symposium on Visual Information Communi-

cation and Interaction, VINCI ’16, pages 112–116.

ACM.

Heer, J., Kong, N., and Agrawala, M. (2009). Sizing the

horizon: the effects of chart size and layering on the

graphical perception of time series visualizations. In

Proceedings of the SIGCHI Conference on Human

Factors in Computing Systems, CHI ’09, pages 1303–

1312.

Heimerl, F., Han, Q., Koch, S., and Ertl, T. (2016a). Ci-

teRivers: Visual Analytics of Citation Patterns. IEEE

Transactions on Visualization and Computer Graph-

ics, 22(1):190–199.

Heimerl, F., John, M., Han, Q., Koch, S., and Ertl, T.

(2016b). DocuCompass: Effective exploration of doc-

ument landscapes. In 2016 IEEE Conference on Vi-

sual Analytics Science and Technology (VAST), pages

11–20.

Hoffswell, J., Satyanarayan, A., and Heer, J. (2018). Aug-

menting Code with In Situ Visualizations to Aid Pro-

gram Understanding. In Proceedings of the 2018 CHI

Conference on Human Factors in Computing Systems,

CHI ’18, pages 1–12. ACM.

Hoque, E. and Carenini, G. (2014). ConVis: A Visual Text

Analytic System for Exploring Blog Conversations.

Computer Graphics Forum, 33(3):221–230.

Kennedy, A. and Sankey, R. (1898). The thermal efﬁciency

of steam engines. In Minutes of the Proceedings of

the Institution of Civil Engineers, volume 134, pages

278–312. Thomas Telford-ICE Virtual Library.

Kim, M., Kang, K., Park, D., Choo, J., and Elmqvist,

N. (2017). TopicLens: Efﬁcient Multi-Level Visual

Topic Exploration of Large-Scale Document Collec-

tions. IEEE Transactions on Visualization and Com-

puter Graphics, 23(1):151–160.

Kucher, K. and Kerren, A. (2015). Text visualization tech-

niques: Taxonomy, visual survey, and community

insights. In IEEE Paciﬁc Visualization Symposium,

pages 117–121. IEEE Computer Society.

Latif, S. and Beck, F. (2018). Visually Augmenting Docu-

ments With Data. Computing in Science Engineering,

20(6):96–103.

Latif, S. and Beck, F. (2019). VIS Author Proﬁles: Interac-

tive Descriptions of Publication Records Combining

Text and Visualization. IEEE Transactions on Visual-

ization and Computer Graphics, 25(1):152–161.

Lee, B., Riche, N. H., Karlson, A. K., and Carpendale,

S. (2010). SparkClouds: Visualizing Trends in Tag

Clouds. IEEE Transactions on Visualization and

Computer Graphics, 16(6):1182–1189.

Liu, S., Wang, X., Chen, J., Zhu, J., and Guo, B. (2014).

TopicPanorama: A full picture of relevant topics. In

2014 IEEE Conference on Visual Analytics Science

and Technology (VAST), pages 183–192.

Liu, X., Xu, A., Gou, L., Liu, H., Akkiraju, R., and Shen,

H.-W. (2016). SocialBrands: Visual analysis of public

perceptions of brands on social media. In 2016 IEEE

Conference on Visual Analytics Science and Technol-

ogy (VAST), pages 71–80.

Lu, Y., Wang, H., Landis, S., and Maciejewski, R. (2018).

A Visual Analytics Framework for Identifying Topic

Drivers in Media Events. IEEE Transactions on Visu-

alization and Computer Graphics, 24(9):2501–2515.

Perin, C., Vuillemot, R., and Fekete, J.-D. (2013). Soccer-

Stories: A Kick-off for Visual Soccer Analysis. IEEE

Transactions on Visualization and Computer Graph-

ics, 19(12):2506–2515.

Thorndike, R. L. (1953). Who belongs in the family? Psy-

chometrika, 18(4):267–276.

Tufte, E. R. (2006). Beautiful Evidence. Graphics Press, 1

edition.

Vosoughi, S., Roy, D., and Aral, S. (2018). The spread of

true and false news online. Science, 359(6380):1146–

1151.

Ware, C. (2013). Information Visualization: Perception for

Design. Morgan Kaufmann Publishers Inc., 3 edition.

Wolfe, J. M. and Horowitz, T. S. (2004). What attributes

guide the deployment of visual attention and how do

they do it? Nature reviews neuroscience, 5(6):495–

501.

Xu, J., Tao, Y., Lin, H., Zhu, R., and Yan, Y. (2017). Ex-

ploring controversy via sentiment divergences of as-

pects in reviews. In 2017 IEEE Paciﬁc Visualization

Symposium (PaciﬁcVis), pages 240–249.

Xu, P., Wu, Y., Wei, E., Peng, T.-Q., Liu, S., Zhu, J. J., and

Qu, H. (2013). Visual Analysis of Topic Competition

on Social Media. IEEE Transactions on Visualization

and Computer Graphics, 19(12):2012–2021.

Word-sized Visualizations for Exploring Discussion Diversity in Social Media

265