Visualization of Swedish News Articles: A Design Study

Kostiantyn Kucher (1, a), Nellie Engström, Wilma Axelsson, Berkant Savas (1, 2, b) and Andreas Kerren (1, 3, c)

(1) Department of Science and Technology, Linköping University, Norrköping, Sweden
(2) iMatrics AB, Linköping, Sweden
(3) Department of Computer Science and Media Technology, Linnaeus University, Växjö, Sweden

Keywords:

Information Visualization, Text Visualization, Natural Language Processing, News, Editorial Media, Swedish

Language, Journalism.

Abstract:

The amount of available text data has increased rapidly in the past years, making it difficult for many users to find relevant information. To address this, natural language processing (NLP) and text visualization methods have been developed; however, they typically focus on English texts only, while the support for low-resource languages is limited. The aim of this design study was to implement a visualization prototype for exploring a large number of Swedish news articles (made available by industrial collaborators), including the temporal and relational data aspects. Sketches of three visual representations were designed and evaluated through user tests involving both our collaborators and end-users (journalists). Next, an NLP pipeline was designed in order to support dynamic and hierarchical topic modeling. The final part of the study resulted in an interactive visualization prototype that uses a variation of area charts to represent topic evolution. The prototype was evaluated through an internal case study and user tests with two groups of participants with backgrounds in journalism and NLP. The evaluation results reveal the participants' preference for the representation focusing on top topics rather than the topic hierarchy, while suggestions for future work relevant for Swedish text data visualization are also provided.

1 INTRODUCTION

In the modern digitized society, a large amount of text data is generated daily in different application areas, such as product reviews, posts on social media, research papers, and news articles. Such large-scale data poses many challenges for the reader, such as finding and extracting relevant information, gaining insights, getting an overview, grasping the overall meaning of the data, and getting details on demand. To handle large digitized text corpora, methods which involve Natural Language Processing (NLP) / Text Mining, and further Artificial Intelligence (AI), have been developed to extract valuable information automatically. The areas of Visual Analytics (VA) and Visual Text Analytics (VTA) have also attracted growing interest, as Information Visualization (InfoVis), text visualization, and text analysis methods have been documented in an increasing number of papers over the years (Kucher and Kerren, 2015; Liu et al., 2019; Alharbi and Laramee, 2019). However, the majority of these papers have been based on English texts, and the research field of using NLP and visualization techniques for lower-resourced languages, such as Swedish, remains less explored, presenting challenges and opportunities for both academic research and industrial applications (for instance, the average word length in Swedish is greater than in English, affecting designs that rely heavily on text labels).

a https://orcid.org/0000-0002-1907-7820
b https://orcid.org/0000-0002-1542-2690
c https://orcid.org/0000-0002-0519-2537

In this paper, which is based on a thesis project (Axelsson and Engström, 2023), we contribute to the less explored area of VTA for Swedish text data (more specifically, news articles as well as associated metadata) based on the data provided by our collaborators from iMatrics, a company located in Linköping, Sweden. As they explore the opportunities of using visualization for internal use as well as for products available for their clients (often with non-technical background, for instance, journalists), the design of our proposed solution takes the respective constraints and requirements into account. Our design study generally focuses on the challenges and necessary trade-offs of designing a temporal visualization, showing the relations between the topics of a large-scale text corpus, that is simple enough for a non-technical user to understand, while also considering the text genre aspect (e.g., articles focused on a specific story/event or a specific place/region) and target audience (e.g., the value of the interactive visualization prototype as perceived by the users with non-technical vs NLP background).

Kucher, K., Engström, N., Axelsson, W., Savas, B. and Kerren, A.
Visualization of Swedish News Articles: A Design Study.
DOI: 10.5220/0012398600003660
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2024) - Volume 1: GRAPP, HUCAPP and IVAPP, pages 670-677
ISBN: 978-989-758-679-8; ISSN: 2184-4321

The rest of this paper is organized as follows: we

discuss the related work as well as the methodology

of this study in Sections 2 and 3. The ﬁrst iteration

of the study involving sketches and user feedback is

described in Section 4. The backend and the frontend

of our prototype are presented in Sections 5 and 6, re-

spectively. Evaluation of the resulting prototype then

follows in Section 7, and the discussion and conclu-

sions are provided in Sections 8 and 9.

2 RELATED WORK

Both computational and visual/interactive perspec-

tives are relevant to this study.

2.1 Natural Language Processing

NLP is a ﬁeld that focuses on various text analysis

techniques and methods used, among other tasks, to

extract useful information such as keywords or top-

ics (Chowdhary, 2020). Topic modeling is an un-

supervised method used to uncover underlying top-

ics, or themes, in a large collection of documents and

group the documents according to these different top-

ics (Tolegen et al., 2022). This is typically a soft clus-

tering method, where each document belongs to each

topic with a certain probability (Aggarwal and Zhai,

2012). Some of the more widely used topic modeling methods are Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF).

There are also different extensions and applications of

topic modeling, for example, Dynamic Topic Model-

ing (DTM) and Hierarchical Topic Modeling (HTM).

DTM is a topic modeling approach which models the

evolution of topics over time, with the ability to cre-

ate a temporal overview of a large collection of doc-

uments (Blei and Lafferty, 2006). HTM focuses on

hierarchical clustering/grouping of topics, including

such methods as hierarchical Latent Dirichlet Alloca-

tion (hLDA) and Pachinko Allocation Model (PAM),

for instance (Liu et al., 2016).
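The soft clustering idea mentioned above can be illustrated with a minimal sketch. It assumes hypothetical non-negative document-topic weights (of the kind a method such as NMF might produce); the documents, weights, and function names are invented for illustration and are not taken from the study.

```python
from typing import Dict, List


def to_topic_distribution(weights: List[float]) -> List[float]:
    """Normalize non-negative topic weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        # No evidence for any topic: fall back to a uniform distribution.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def dominant_topic(distribution: List[float]) -> int:
    """Return the index of the most probable topic (a hard assignment)."""
    return max(range(len(distribution)), key=lambda k: distribution[k])


# Hypothetical document-topic weights for three documents and three topics.
doc_topic_weights: Dict[str, List[float]] = {
    "doc_a": [4.0, 1.0, 0.0],
    "doc_b": [1.0, 1.0, 2.0],
    "doc_c": [0.0, 0.0, 0.0],
}

distributions = {
    doc: to_topic_distribution(weights)
    for doc, weights in doc_topic_weights.items()
}
```

Each document keeps a probability for every topic, and a hard cluster label (via `dominant_topic`) is derived only when a single label is needed, e.g., for coloring or counting.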

2.2 Visualization

InfoVis focuses on gaining insights into (abstract) data with the use of various (interactive) visualization techniques (Spence, 2014). Prior works have established the basic stages of creating a visualization (Ware, 2021) as well as workflows for supporting user tasks from overview to details (Shneiderman, 1996) and design study methodology (Sedlmair et al., 2012). Visual Analytics is a related field, wherein interactive visualizations are designed based on computational data analysis methods, with the aim to explore and understand especially large and complex data sets (Keim et al., 2010), taking concerns such as the data characteristics, the users, and their tasks (Miksch and Aigner, 2014) into account.

Regarding the special data types that have strong

implications for the visualization design process, text

is one prominent example (Cao and Cui, 2016). As of

today, there are many respective techniques to choose

from, depending on the speciﬁc data and task at hand

(Kucher and Kerren, 2015; Liu et al., 2019; Alharbi

and Laramee, 2019). When designing a visualiza-

tion for temporal data, multiple design choices must

also be made. A type of stacked graph, named a

Streamgraph, has been a prominent example of a vi-

sual representation for time series (Byron and Watten-

berg, 2008). This graph comprises individual layers,

stacked upon each other, with different colours and

labels to reﬂect separate data series. The thickness of

the stack is then set to represent the total sum of the

layers’ corresponding time series. Furthermore, there

can be a need of visualizing data which contains many

different distributions at once. A few viable options

for this are ridgeline plots, violin plots, and boxplots

(Wilke, 2019). For example, the ridgeline plot is es-

pecially useful when visualizing overall trends in dis-

tributions. Each distribution in the ridgeline plot is

displayed in the form of an area chart, where the area

chart is represented through a density estimate.
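The stacked layout described above can be sketched in a few lines. The example below computes the lower and upper boundary of each layer per time step, either on a zero baseline (a traditional stacked area chart) or centered around the horizontal axis (a symmetric, ThemeRiver-style layout). This is a simplified assumption for illustration: the Streamgraph work cited above discusses more elaborate baselines, which are not reproduced here, and the function name and sample values are hypothetical.

```python
from typing import List, Tuple

Series = List[float]
Band = Tuple[float, float]  # (lower boundary, upper boundary)


def stack_layers(layers: List[Series], centered: bool = True) -> List[List[Band]]:
    """Return the (lower, upper) boundaries of each layer at each time step."""
    n_steps = len(layers[0])
    if any(len(layer) != n_steps for layer in layers):
        raise ValueError("all layers must cover the same time steps")

    bands: List[List[Band]] = [[] for _ in layers]
    for t in range(n_steps):
        total = sum(layer[t] for layer in layers)
        # A centered baseline places half of the total thickness below zero.
        lower = -total / 2.0 if centered else 0.0
        for i, layer in enumerate(layers):
            upper = lower + layer[t]
            bands[i].append((lower, upper))
            lower = upper  # the next layer starts where this one ends
    return bands


# Two hypothetical time series (e.g., article counts for two topics).
layers = [[1.0, 3.0, 2.0], [2.0, 1.0, 4.0]]
bands = stack_layers(layers)
```

In both variants the overall thickness of the stack at a time step equals the sum of the layer values, which is the property described in the text.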

There have been many previous implementations of VA tools that include visualizations of topics over time, or relationships between topics, based on results from topic modeling. For example, ThemeRiver is a visualization using the river-based-flow / stacked graph metaphor to present, e.g., themes and patterns of a large document corpus over time (Havre et al., 2002). Visual Backchannel is a multi-faceted interface that visualizes events over time by providing the user with three types of visualizations: a stacked graph, a spiral, and an image cloud (Dörk et al., 2010). StoryFlow is a storyline visualization system developed for visualizing the evolution of stories over time and hierarchical relationships between a large set of entities (Liu et al., 2013). It should be mentioned that evaluation of such approaches is considered a difficult challenge (Lam et al., 2012; Elmqvist and Yi, 2015). This is due to visualizations being designed to support activities such as gaining insights or drawing conclusions from the visualized data, which are generally complex and context-dependent tasks, also often limited to a particular target user audience. There are also several challenges with evaluating VTA approaches specifically, due to the complexity of combining NLP and visualization, which themselves can consist of systems that are not perfect (Kucher et al., 2022). Still, methods such as semi-structured interviews as well as heuristic evaluation (Stasko, 2014) can be applied to get a glimpse, if not conclusive evidence, of the validity of the proposed approach and potential improvements.

Figure 1: Workflow explaining the different parts and actors of the study and how these relate to each other.

3 DESIGN STUDY METHODOLOGY

The main steps and actors of this study are presented

in Figure 1. During the initial phase, the general

requirements were discussed with the collaborating

company representative, namely, developing a visu-

alization prototype for the tasks of representing and

exploring interesting data aspects (including temporal

and relational) from a large number of news articles

in Swedish. The end-users were not strictly deﬁned,

however, as such a prototype could be interesting for

various audiences, including non-technical ones.

The ﬁrst major step consisted of a visualization

sketch design and evaluation process (the green block

in Figure 1). Here, it was important to consider the

requirements mentioned in the earlier stage, which

was partially done by deﬁning the data, users, and

tasks (Miksch and Aigner, 2014) in the abstraction

step. Besides the preliminary exploration of the avail-

able data and review of the prior work, sketches of the

entire visualization prototype were prepared and eval-

uated through user tests using the think-aloud method,

in order to gain feedback from possible end-users and

decide on which design should be further developed.

The second major step (the blue block in Figure 1)

was to design and implement the NLP pipeline, while

internally evaluating it. The next steps (the two red

blocks) were to implement the visualization prototype

and evaluate it. The visualization front-end was de-

veloped in an iterative manner along with the NLP

pipeline. Evaluations were then performed on the vi-

sualization prototype, which would test how the par-

ticipants perceived the ﬁnal visualization, while tak-

ing their background (non-technical vs NLP) into ac-

count. Additionally, a smaller case study was per-

formed by visualizing two use cases through the pro-

totype and comparing these to see if there were any

noticeable visual differences for articles of different

types, e.g., articles related to a certain place or event.

IVAPP 2024 - 15th International Conference on Information Visualization Theory and Applications

Figure 2: The initial visualization interface sketches: (a) the streamgraph sketch with glyphs, keywords, and an info box shown; (b) the network graph sketch with an info box for an edge shown; and (c) the storyline sketch with an info box describing the similarities between two topics.

The data used in this project was based on a larger collection of news articles from the years 2019–2022 (in total, ≈200,000 articles). However, due to performance concerns, each year was initially handled separately and only the articles and features interesting for the scope of this design study were extracted, including the headline, text, timestamps, and article tags. The tags had been automatically generated by iMatrics and manually approved by their clients, i.e., journalists. All of the data described above was cleaned and preprocessed, with only articles containing texts in Swedish kept for further analyses. In order to extract two use cases for the case study performed later and to reduce the computation time, two smaller (sub)sets of data were extracted. These two data sets, one with the story tag “COVID-19” and another with the city tag “Kalmar”, contained around 10,000 and 38,000 articles, respectively.
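The subset extraction described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Article` record, its field names, and the language code `"sv"` are hypothetical, since the actual data schema and preprocessing code are not described in the text.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Article:
    """Hypothetical article record holding the extracted features."""
    headline: str
    text: str
    published: str  # ISO 8601 date string (assumed format)
    language: str   # language code, e.g., "sv" for Swedish (assumed)
    tags: List[str] = field(default_factory=list)


def extract_use_case(
    articles: Iterable[Article], tag: str, language: str = "sv"
) -> List[Article]:
    """Keep only the articles in the given language that carry the given tag."""
    return [a for a in articles if a.language == language and tag in a.tags]


# Invented sample records, used only to exercise the filter.
articles = [
    Article("Headline 1", "Text 1", "2020-03-15", "sv", ["COVID-19", "Kalmar"]),
    Article("Headline 2", "Text 2", "2021-06-01", "en", ["COVID-19"]),
    Article("Headline 3", "Text 3", "2019-11-20", "sv", ["Kalmar"]),
]

covid_subset = extract_use_case(articles, "COVID-19")
kalmar_subset = extract_use_case(articles, "Kalmar")
```

Note that an article can carry several tags, so the two subsets may overlap, as the first sample record does here.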

4 INITIAL DESIGN

Based on the initial analyses and discussions, three concepts were chosen as the basis for interactive sketches created with Figma, in order to make the user tests more time efficient and provide the participants with interactions (such as hovering and clicking for details on demand) to test as well.

4.1 Initial Sketches Design

The decision was made to explore the visualization

possibilities (and to get the initial user feedback)

before implementing any NLP methods, since the

choice of visual representations and interactions—

and the required underlying information—highly af-

fects the NLP pipeline design. Thus, the initial

sketches relied on a combination of data from two

ﬁxed time intervals with more general mock-up topics

such as “Crime” or “Politics”, while the participants

were instead asked to imagine that the visualization

prototype could eventually display both general and

more speciﬁc news topics. Overview of the resulting

interactive Figma sketches is provided in Figure 2.

Figma: https://www.figma.com/

4.2 Initial Sketches Evaluation

To evaluate the sketches, user tests were performed

with a focus on how easy or difﬁcult the sketches were

to interpret, the usefulness of the proposed representa-

tions/interfaces, and what information the participants

were interested in seeing in such a potential future

tool. Each participant of the user test was tested in-

dividually, in person at their respective workplaces,

and each user test took around 40–60 minutes.

The user test was performed with two different groups of participants: one group consisted of three journalists (two investigative reporters with 8 and 14 years of experience and a journalist/photographer with 27 years of experience), while the other group consisted of two iMatrics staff members (one entrepreneur with around 6–7 years of experience and a head of marketing with around half a year of experience in that role). The journalists all confirmed being generally used to visualizations such as line graphs and pie charts, and the staff members also came in contact with visualizations often or daily, while their attitude towards using technical aids was either neutral or positive.

To summarize the outcomes brieﬂy, the network

graph was considered as the best alternative with

respect to simplicity and representation of relations

(but not over time)—however, it was also consid-

ered the worst in showing valuable information, and

it lacked the support for temporal aspects. The sto-

ryline graph was overall considered quite simple to

understand and the most useful on average, while be-

ing the best in showing relations over time. However,

it was considered difﬁcult to interpret by the partici-

pants. The streamgraph was overall considered valu-

able, yet difﬁcult to understand at a ﬁrst glance; how-

ever, one participant commented that the difﬁculty

may be caused by the lack of familiarity.


As a result, we made the decision to focus on a streamgraph for the rest of this design study. However, several changes would be made in order to simplify the visualization (e.g., the participants had a hard time understanding what the streamgraph “branches” represented) and to better match the expected NLP pipeline results. Overall, features such as visualizing coverage of topics, details in the form of access to original articles, and more visual clarity of the relations between topics were considered important to support, as well as scalability.

5 NLP PIPELINE

Based on the chosen streamgraph sketch, the NLP pipeline and eventually the visualization frontend could then be designed. In order to analyze and represent the evolution of topics, we intended to apply Dynamic Topic Modeling; to address the relations between topics, while being able to scale to a larger number of topics, we also decided to support Hierarchical Topic Modeling, as inspired by the HierarchicalTopics tool (Dou et al., 2013), for instance. Due to its performance, but also its support for HTM and flexibility in customization, we chose BERTopic (Grootendorst, 2022) for our implementation. To set the different parameters or model options used by BERTopic, e.g., the sentence transformer used to generate document representations (embeddings), evaluations were carried out to compare different choices. For these comparisons, the COVID-19 use case data was used. The metric used to evaluate the results from the NLP pipeline in an unsupervised fashion was the silhouette score (Rousseeuw, 1987). For this project, two different sentence transformers were tested: a multilingual one (Reimers and Gurevych, 2019) and a Swedish sentence transformer from KB Lab (Rekathati, 2021). When using the former, the training of the model took 8 minutes and 54 seconds and 131 topics were generated. Meanwhile, the latter took 63 minutes and 182 topics were generated. The silhouette score for the multilingual sentence transformer was around 0.565, while the Swedish sentence transformer received a slightly higher silhouette score of 0.648. Based on these results, we can see that the multilingual transformer was considerably faster, but it also gave slightly worse clustering results. Other than the time and the silhouette scores, a third important aspect to take into account was the more subjective quality of the topics. The quality was investigated for topics from both transformers, and one typical example is how the multilingual sentence transformer produced a topic with representative words such as “worst”, “increases”, and “most”, while the Swedish sentence transformer led to a topic with more descriptive words such as “elderly homes” or “the public health authority”. Based on all of these results, it was decided that the Swedish sentence transformer would be used for the final pipeline and any further evaluations.

BERTopic: https://maartengr.github.io/BERTopic/
Multilingual sentence transformer: https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Swedish sentence transformer: https://huggingface.co/KBLab/sentence-bert-swedish-cased
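For readers unfamiliar with the silhouette score used above, the following is a minimal, dependency-free sketch of its standard definition (Rousseeuw, 1987). It is not the evaluation code of the study: the Euclidean distance, the toy 2D points, and the handling of single-member clusters are assumptions, and the text does not state how documents that BERTopic may leave unassigned (outliers) were treated.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores: List[float] = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of 2D points (stand-ins for embeddings).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

Scores lie between -1 and 1, with higher values indicating more compact and better separated clusters, which is how the values of 0.565 and 0.648 reported above can be read.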

6 VISUALIZATION PROTOTYPE

The resulting interactive visualization prototype is implemented using D3.js. As Figure 3 demonstrates,

the main visual representation is a stack of area charts

resembling a ridgeline plot, which typically visualizes

the distribution of multiple groups over time or space

(however, we avoid the overdrawing applied in typi-

cal ridgeline plots for vertical space compression and

speciﬁc aesthetics, even though this design decision

leads to increased vertical space usage and vertical

scrolling within the user interface). This representa-

tion was deemed to contain multiple similarities with

the Figma sketch, e.g., in the forms of visualizing dis-

tribution of each topic over time and showing multiple

topics at once, while avoiding the issues related to in-

terpretation of streamgraph “branches”, as reported in

Section 4.2. The height scale is the same for all topics

and is determined by the topic with the highest fre-

quency peak (i.e., the values are normalized against

the global maximum count). Each graph also uses a

separate categorical color for a more distinct repre-

sentation and supports a magic lens for exploring the

topical keywords over time, as well as an additional

pop-up dialog with details on demand. The user can

adjust the displayed time range by using a range slider

at the bottom of the interface (divided into 15 tempo-

ral bins as a trade-off between a cluttered x-axis and

an overly coarse level of detail). Furthermore, the user

can toggle the representation of locally- (as opposed

to globally-) normalized topic counts over time (as a

dotted outline) in order to observe trends easier, es-

pecially for less prominent topics. The prototype is

showcased through two different views/perspectives

(chosen by the user): a top 10 topics view (based on

DTM only) and a hierarchical topics view (with an

option to navigate to the nested child topics).
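The difference between globally and locally normalized topic counts can be sketched as follows. The prototype itself is implemented with D3.js; this Python sketch only illustrates the arithmetic under stated assumptions (equal-width bins, numeric timestamps, and hypothetical function names), with the default of 15 bins taken from the description above.

```python
from typing import Dict, List


def bin_counts(timestamps: List[float], start: float, end: float, n_bins: int = 15) -> List[int]:
    """Count timestamps in n_bins equal-width bins covering [start, end]."""
    if end <= start:
        raise ValueError("end must be greater than start")
    counts = [0] * n_bins
    width = (end - start) / n_bins
    for ts in timestamps:
        if not start <= ts <= end:
            continue  # ignore values outside the selected time range
        # The right edge of the range falls into the last bin.
        index = min(int((ts - start) / width), n_bins - 1)
        counts[index] += 1
    return counts


def normalize(topic_counts: Dict[str, List[int]]) -> Dict[str, Dict[str, List[float]]]:
    """Scale counts by the global maximum and by each topic's own maximum."""
    global_max = max(max(counts) for counts in topic_counts.values()) or 1
    result: Dict[str, Dict[str, List[float]]] = {}
    for topic, counts in topic_counts.items():
        local_max = max(counts) or 1
        result[topic] = {
            "global": [c / global_max for c in counts],
            "local": [c / local_max for c in counts],
        }
    return result


# Invented counts for two topics over two bins.
scaled = normalize({"topic_a": [10, 5], "topic_b": [2, 1]})
binned = bin_counts([0.0, 1.0, 14.5, 15.0], start=0.0, end=15.0)
```

With global scaling, the less prominent `topic_b` never exceeds a fifth of the available height, whereas local scaling stretches it to the full height, which makes its trend easier to see at the cost of comparability between topics.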

These examples are translated from Swedish.

https://d3js.org/


Figure 3: The resulting visualization prototype showing the top topics for the COVID-19 use case.

7 EVALUATION

Finally, with the implementation complete, a smaller

case study and user tests were performed.

7.1 Case Study

The case study consisted of visualizing and analyz-

ing the two use cases deﬁned earlier, i.e., articles

tagged with either a story tag “COVID-19” or a place

tag “Kalmar”. These two data sets were both sepa-

rately processed through the NLP pipeline with the

same parameters. The prototype was used internally

with the available data, leading to some interesting

observations: for example, some of the hierarchical

topic groups were quite peculiar, such as one group

for the COVID-19 data that was described by the top

keywords “rehabilitation”, “funerals”, “fever”, “pa-

tients”, and “long-term covid”—by drilling down into

the details, we found out that this group contained a

mixture of topics related to both COVID-19-related

aspects and crime. The use cases were then compared

in order to discover if there are any visual differences,

with only the top 10 topics compared. As demon-

strated in Figure 4, there were more visible changes

in the COVID-19 use case, i.e., there were clear ups

and downs regarding how much each topic had been

written about (further details in Figure 3), including

the “School” and “Vaccinations” topics, for instance,

while the Kalmar use case demonstrated more stable

behavior. There was also a smaller difference regard-

ing the time periods of the use cases, as the Kalmar

data stretched over a wider period (2019–2022) than

the COVID-19 use case data (2020–2022).

7.2 User Tests

The user tests were performed to evaluate the prototype in its final state. Each user test took in total around 40–60 minutes and consisted of an introduction, demo, two introductory mini tasks (identifying peaks and navigating & exploring the respective topical keywords), and multiple interview questions focusing on the users' interpretation of the prototype, which were inspired by ICE-T questions (Stasko, 2014; Wall et al., 2019). Similar to the sketch evaluation sessions described in Section 4.2, the user tests were divided into two groups of participants (and the questions adapted accordingly): (1) participants with journalistic experience (conducted remotely due to different locations; four participants in total with 7, 20, 35, and 14 years of journalistic work experience, respectively, and weekly exposure to visualizations), and (2) participants with technical roles/experience (conducted in person; five participants in total with 2, 28, and between 2–5 years of experience with NLP, and daily or weekly exposure to visualizations). To summarize the outcomes of these evaluation sessions briefly, both groups were able to succeed in the mini tasks with ease. Both groups were interested in investigating somewhat similar questions regarding the data set of articles, such as the topic evolution over time; and both groups mentioned that the prototype supports this task, which in itself fulfills one of the main purposes of the prototype. The visualization was stated to convey a visual overview of the articles written, compared to the archive search otherwise used. The prototype could also be useful in some other contexts, e.g., making sure to not miss follow-ups from previous years. Regarding the hierarchical topic view, the results were considered somewhat confusing by both groups, and thus the value of this functionality was considered lower than that of the top 10 topics view.

Figure 4: Topic trend differences from (a) the COVID-19 use case and (b) the Kalmar use case.

8 DISCUSSION

One of the interesting observations related to the ini-

tial sketches’ evaluation is that the streamgraph was

considered difﬁcult to understand, e.g., through the

branches showcasing the relations, but it was also

found most useful by a majority of the participants.

An impression was also that the same participants

took a longer time to investigate the sketch and that

resulted in a deeper discussion, e.g., regarding the

graph’s potential and improvement possibilities; how-

ever, raising the level of complexity of the solution too

high also has a risk of discouraging the users.

With respect to the ﬁnal prototype, the results of

the evaluation show that the prototype is promising in

its current state, as it provides the user with insights

through, e.g., showing trends and helping the user to

draw conclusions from a larger set of articles over a

longer period of time. The participants did, however, suggest many improvements, such as including real-time data, the articles' full text, a search function, and a map showing the extent of the topics, some of which are beyond the scope of a prototype as opposed to a full-fledged tool/product.

The overall feedback of the participants was positive

with respect to the top 10 topics view—the partici-

pants from the ﬁeld of journalism mentioned that it

could, with some improvements, aid them in their

work. The hierarchical topics view had, however, in

general lower value and quality for both groups, while

the group of people working closely to NLP under-

stood why some unexpected hierarchical topics ap-

peared. This shows that including hierarchies of top-

ics can be difﬁcult for an end-user to perceive.

While the design space of all possible representa-

tions, interactions, and NLP methods potentially ap-

plicable for news articles data is enormous, only a

part of that design space was considered due to the

limited scope of this project; thus, this study cannot

claim to provide deﬁnitive answers and design guide-

lines, but rather contribute to the existing body of

knowledge, especially with respect to the user feed-

back for various visual representations that can be

considered well-known and trivial within the visual-

ization research community, while being unfamiliar

to the end-users. Additionally, some questions about

the scalability of the approach as well as its generaliz-

ability towards other languages could be part of future

work. Finally, the number of participants involved in

evaluations was limited and some additional concerns

could be considered (e.g., remote vs on-site partici-

pation), which could be addressed to some extent by

further evaluation efforts.

9 CONCLUSIONS

The aim to develop a web-based visualization pro-

totype, which can be used to explore a large set

of Swedish news articles, was fulﬁlled through this

project. In this project, it was of interest to include

both temporal and relational data, which challenged

the design and choice of visual representations, es-

pecially when considering non-technical end-users.

Therefore, there is a challenge in trying to adapt the

visualizations, yet include as much valuable informa-

tion as possible while still not overwhelming the user.

Secondly, an important trade-off when designing a vi-

sualization for a large-scale text corpus is that it is not

possible to show all of the data at once, yet the data vi-

sualized should fulﬁll the user’s needs. The end-users

saw value in the prototype as it gave a visual aspect

of the articles and could be helpful when, e.g., do-

ing research or writing follow-up stories. Meanwhile,

people working closely to the NLP ﬁeld did perceive


some value in the features of the prototype, but did

not relate it as clearly to areas of application.

Developing a visualization prototype for a low-resource language such as Swedish (in comparison to English) can be considered a positive contribution, as it makes such tools accessible to further audiences.

Thus, the lessons learned from this design study as

well as its limitations and identiﬁed suggestions for

improvements could lead to the future work in that

direction from the perspective of visual text analytics,

within and beyond the academic community.

ACKNOWLEDGEMENTS

This work was partially supported through (1) the EL-

LIIT environment for strategic research in Sweden

and (2) the Wallenberg AI, Autonomous Systems and

Software Program (WASP) funded by the Knut and

Alice Wallenberg Foundation. We are also thankful

to all user test participants.

REFERENCES

Aggarwal, C. C. and Zhai, C. (2012). An introduction to text

mining. In Mining Text Data, pages 1–10. Springer.

Alharbi, M. and Laramee, R. (2019). SoS TextVis: An ex-

tended survey of surveys on text visualization. Com-

puters, 8(1).

Axelsson, W. and Engström, N. (2023). Large-scale exploratory text visualisation. Master's thesis, Linköping University.

Blei, D. M. and Lafferty, J. D. (2006). Dynamic topic mod-

els. In Proc. of ICML, pages 113–120. ACM.

Byron, L. and Wattenberg, M. (2008). Stacked graphs —

Geometry & aesthetics. IEEE TVCG, 14(6):1245–

1252.

Cao, N. and Cui, W. (2016). Introduction to Text Visualiza-

tion. Atlantis Press.

Chowdhary, K. R. (2020). Natural language processing.

In Fundamentals of Artiﬁcial Intelligence, pages 603–

649. Springer.

Dou, W., Yu, L., Wang, X., Ma, Z., and Ribarsky, W.

(2013). HierarchicalTopics: Visually exploring large

text collections using topic hierarchies. IEEE TVCG,

19(12):2002–2011.

Dörk, M., Gruen, D., Williamson, C., and Carpendale, S. (2010). A visual backchannel for large-scale events. IEEE TVCG, 16(6):1129–1138.

Elmqvist, N. and Yi, J. S. (2015). Patterns for visualization

evaluation. Information Visualization, 14(3):250–269.

Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv Preprints, arXiv:2203.05794.

Havre, S., Hetzler, E., Whitney, P., and Nowell, L. (2002).

ThemeRiver: Visualizing thematic changes in large

document collections. IEEE TVCG, 8(1):9–20.

Keim, D., Kohlhammer, J., Ellis, G., and Mansmann, F.

(2010). Mastering the Information Age: Solving Prob-

lems with Visual Analytics. Eurographics.

Kucher, K. and Kerren, A. (2015). Text visualization tech-

niques: Taxonomy, visual survey, and community in-

sights. In Proc. of PaciﬁcVis, pages 117–121. IEEE.

Kucher, K., Sultanum, N., Daza, A., Simaki, V., Skeppstedt,

M., Plank, B., Fekete, J.-D., and Mahyar, N. (2022).

An interdisciplinary perspective on evaluation and ex-

perimental design for visual text analytics: Position

paper. In Proc. of BELIV. IEEE.

Lam, H., Bertini, E., Isenberg, P., Plaisant, C., and Carpen-

dale, S. (2012). Empirical studies in information visu-

alization: Seven scenarios. IEEE TVCG, 18(9):1520–

1536.

Liu, L., Tang, L., He, L., Zhou, W., and Yao, S. (2016). An

overview of hierarchical topic modeling. In Proc. of

IHMSC, pages 391–394. IEEE.

Liu, S., Wang, X., Collins, C., Dou, W., Ouyang, F., El-

Assady, M., Jiang, L., and Keim, D. A. (2019). Bridg-

ing text visualization and mining: A task-driven sur-

vey. IEEE TVCG, 25(7):2482–2504.

Liu, S., Wu, Y., Wei, E., Liu, M., and Liu, Y. (2013).

StoryFlow: Tracking the evolution of stories. IEEE

TVCG, 19(12):2436–2445.

Miksch, S. and Aigner, W. (2014). A matter of time: Ap-

plying a data–users–tasks design triangle to visual an-

alytics of time-oriented data. Comput. & Graphics,

38:286–290.

Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sen-

tence embeddings using Siamese BERT-networks. In

Proc. of EMNLP-IJCNLP, pages 3982–3992. ACL.

Rekathati, F. (2021). The KBLab blog: Introducing a

Swedish sentence transformer. Online resource.

Rousseeuw, P. (1987). Silhouettes: A graphical aid to the in-

terpretation and validation of cluster analysis. J. Com-

put. Appl. Math., 20:53–65.

Sedlmair, M., Meyer, M., and Munzner, T. (2012). Design

study methodology: Reﬂections from the trenches and

the stacks. IEEE TVCG, 18(12):2431–2440.

Shneiderman, B. (1996). The eyes have it: A task by

data type taxonomy for information visualizations. In

Proc. of VL, pages 336–343. IEEE.

Spence, R. (2014). Information Visualization: An Introduc-

tion. Springer.

Stasko, J. (2014). Value-driven evaluation of visualizations.

In Proc. of BELIV, pages 46–53. ACM.

Tolegen, G., Toleu, A., Mussabayev, R., and Krassovitskiy,

A. (2022). A clustering-based approach for topic mod-

eling via word network analysis. In Proc. of UBMK,

pages 192–197. IEEE.

Wall, E., Agnihotri, M., Matzen, L., Divis, L., Haass, M.,

Endert, A., and Stasko, J. (2019). A heuristic approach

to value-driven evaluation of visualizations. IEEE

TVCG, 25(1):491–500.

Ware, C. (2021). Information Visualization: Perception for

Design. Morgan Kaufmann, 4th edition.

Wilke, C. (2019). Fundamentals of Data Visualization: A

Primer on Making Informative and Compelling Fig-

ures. O’Reilly Media.
