Visual Analytics for Narrative Text

Visualizing Characters and their Relationships as Extracted from Novels

Markus John, Steffen Lohmann, Steffen Koch, Michael Wörner and Thomas Ertl

Institute for Visualization and Interactive Systems, University of Stuttgart, Universitätsstraße 38, Stuttgart, Germany

Keywords:

Text Visualization, Visual Text Analytics, Digital Humanities, Distant Reading.

Abstract:

The study of novels and the analysis of their plot, characters and other entities are time-consuming and com-

plex tasks in literary science. The digitization of literature and the proliferation of electronic books provide

new opportunities to support these tasks with visual abstractions. Methods from the ﬁelds of computational lin-

guistics can be used to automatically extract entities and their relations from digitized novels, which can then

be visualized to ease exploration and analysis tasks. This paper presents a web-based approach that combines

automatic analysis methods with effective visualization techniques. Different views on the extracted entities

are provided and relations between them across the plot are indicated. Two usage scenarios show successful

applications of the approach and demonstrate its beneﬁts and limitations.

1 INTRODUCTION

Common tasks in literary science are studying novels

and analyzing their plot, characters and other entities.

Literary scholars are interested in getting an overview

of the plot and its characters, the relationships be-

tween them and their evolution during the plot (Phe-

lan, 1989). The digitization of literature and the pro-

liferation of electronic books (ebooks) provide new

means to support these tasks with visual abstractions

that are automatically generated from ebooks.

Traditionally, literary scholars read and analyze

novels in a sequential way by using so-called close

reading. In contrast to this, Moretti introduced the

idea of distant reading (Moretti, 2005). Instead of

carefully reading and analyzing a literary work, dis-

tant reading abstracts the text by providing visualiza-

tions such as graphs that depict the genre change of

historical novels, maps to represent geographical as-

pects of the plot, or trees to classify various types of

detective stories (Jänicke et al., 2015). These visual

abstractions can convey useful information and as-

sist in exploring and understanding complex relation-

ships, verifying hypotheses as well as forming new

research ideas.

In order to provide visual abstractions for lit-

erary works, a combination of automatic methods

and interactive visualization techniques is required.

When dealing with ebooks, natural language process-

ing methods are the ﬁrst choice for automatic analy-

sis. Using these methods, entities such as characters

and places can be extracted from the ebooks. This enables the development of visual abstractions that allow users to explore these entities and their relationships in

more detail. In particular, a large interest has grown in

web-based systems for literature analysis that are easy

to use and do not require any skills in computational

linguistics (Plaisant et al., 2006).

This paper presents a web-based approach that

aims to provide literary scholars with visual abstrac-

tions to facilitate character analysis in novels. It uti-

lizes automatic named entity extraction and visualizes

relationships between characters and places based on

co-occurrence analyses. The basic idea of the ap-

proach is to highlight patterns, such as speciﬁc char-

acters and places or groups of characters that interact

with each other at certain places over time. By offer-

ing several views, such patterns are made easily rec-

ognizable with our approach and provide the starting

point for a deeper analysis. This can result in a better

understanding of the plot, in particular related to the

characters and their relationships.

The main contributions of this work are: 1) A web-

based approach that offers a wide range of interac-

tive features to facilitate character analysis in nov-

els. 2) Several interrelated visual abstractions that en-

able the interactive switching to the corresponding

text sources in the document.

The rest of the paper is structured as follows: Section 2 summarizes related work before our approach is detailed in Section 3. This is followed by two use cases demonstrating the applicability and usefulness of the approach in Section 4. Section 5 provides a discussion of the approach and Section 6 concludes the paper with a summary and outlook on future work.

John, M., Lohmann, S., Koch, S., Wörner, M. and Ertl, T.

Visual Analytics for Narrative Text - Visualizing Characters and their Relationships as Extracted from Novels.

DOI: 10.5220/0005669800270038

In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016) - Volume 2: IVAPP, pages 29-40

ISBN: 978-989-758-175-5

2 RELATED WORK

Since our approach is concerned with the visual ab-

straction of text, we ﬁrst summarize existing work in

this area. Next, we report on visual analytics attempts

in the ﬁeld of literary science and review the systems

that are most closely related to our approach.

2.1 Visual Text Abstraction

Several techniques for visually abstracting and sum-

marizing text documents have been developed over

the last years. One compact visualization method

related to our approach is literature ﬁngerprinting

(Keim and Oelke, 2007), which uses a pixel-based

technique that represents each text unit as a single

pixel and visually groups them into higher level units.

A related technique is used in Seesoft (Eick et al.,

1992), which has been designed as a visual ﬁnger-

print summarization of source code to graphically

represent software statistics. The intention of Tilebars

(Hearst, 1995) is the visual representation of search

results similar to the ﬁngerprint idea, while Feature-

Lens (Don et al., 2007) also uses a pixel-based at-

tempt to explore interesting text patterns and to ﬁnd

co-occurrences in texts.

Another popular technique to visually summarize text is the word cloud (Viégas and Wattenberg, 2008; Clement et al., 2009). Word clouds usually depict the

most frequently used words of a text with the font

size scaled according to the word frequencies. Word

clouds enable literature scholars to get a ﬁrst impres-

sion of the main terms and topics of a text (Heimerl

et al., 2014).

To visualize relational information of a text document, approaches such as WordTrees (Wattenberg and Viégas, 2008) and PhraseNets (Van Ham et al.,

2009) were suggested. They depict either syntactic,

lexical or hierarchical relationships that exist between

the words of a text as node-link diagrams. In contrast,

(Oelke et al., 2013) use an adjacency matrix to encode

the development of relations between entities across a

text document.

Inspired by Munroe’s hand-drawn illustration

“Movie Narrative Charts” (Munroe, 2009), a new visual technique known as storyline or plot view visualization has emerged (Liu et al., 2013). It aims to

portray the dynamic relationship between entities in a

story over time. (Tanahashi and Ma, 2012) as well as

(Liu et al., 2013) propose design considerations and

an efﬁcient optimization approach for generating aes-

thetically appealing storyline visualizations.

2.2 Visual Text Analytics

In recent years, several approaches for visual text an-

alytics have been introduced in different domains.

Examples can be found in social media (Dou et al.,

2012), opinion mining (Oelke et al., 2009) or patent

analysis (Koch et al., 2011), among others.

There are also quite a number of visual analytics

approaches in the field of literary science. (Jänicke et al., 2014) propose several techniques for the visualization and comparison of text that is reused in different documents in order to support literary scholars

in discovering and exploring intertextual similarities.

(Abdul-Rahman et al., 2013) present a rule-based so-

lution for poetry visualization allowing for high-level

interactions with the end users in a closed loop. They

use glyphs to encode phonetic units and visual links

to show phonetic and semantic relationships.

The VarifocalReader (Koch et al., 2014) sup-

ports literary scholars by combining distant and close

reading and by enabling intra-document explorations

through advanced navigation concepts. It integrates

machine learning techniques, search mechanisms and

several visual abstractions.

(Oelke et al., 2013) discuss the analysis of prose

literature by using the aforementioned literature ﬁn-

gerprinting technique. Their approach visually ab-

stracts implicit relationships between characters and,

at the same time, encodes their development within

the analyzed novel. However, it does not allow users to work directly with the text resource.

(Vuillemot et al., 2009) present the system

POSvis, which extracts named entities from literary

text and focuses on the exploration of networks of

characters. POSvis offers multiple coordinated views,

including word clouds and self-organizing graphs,

equipped with ﬁlter methods to review the vocabu-

lary of novels. While this is closely related to our

work, we do not pursue the goal to review the vocab-

ulary in the context of one or more entities ﬁltered by

part of speech. Instead, we aim to support the analy-

sis of characters and their relationships in the story-

line of a novel, based on named entity extraction and

co-occurrence analyses.

Another system closely related to our work is Jig-

saw (Stasko et al., 2008), which has been designed

to support analysts during foraging and sense-making

activities in collections of textual reports and other sets of documents. It provides multiple coordinated views including lists, scatter plots, word clouds and graph visualizations that allow tracking entities and exploring their relationships across the document collections.

IVAPP 2016 - International Conference on Information Visualization Theory and Applications

Jigsaw has been designed primarily for inter-

document analysis, whereas we are interested in intra-

document analysis, i.e., we support the analysis of a

single text document at a time. Apart from that, Jig-

saw follows a rather generic approach that does not

focus on ﬁctional literature and the analysis of char-

acters but provides general-purpose visualizations for

different kinds of entities extracted from the docu-

ments.

3 VISUAL ANALYSIS

Our approach has been implemented as a web appli-

cation that is easily available to literary scholars and

other user groups, as it does not require any installa-

tion on the user’s side. The implementation is based

on standard web technologies and can be run with a

modern web browser supporting HTML5, SVG, CSS,

and JavaScript.

It provides different visual abstrac-

tions representing speciﬁc characteristics of the ana-

lyzed novel and highlighting search results that illus-

trate the development of characters in a storyline.

The web application offers automatic methods for

importing novels, extracting characters and places

and visualizing them. The developed visualizations

include word clouds, ﬁngerprints of characters and

places, a graph representation indicating connections

between characters and a plot view that illustrates the

relationships between characters and places in a story

over time.

3.1 Text Processing

There are a variety of formats for the digital represen-

tation of novels. One widely used format is EPUB,

which is a free and open standard that encodes struc-

ture and layout information besides the actual text

of the novel. Many digital libraries, such as Project Gutenberg, offer ebooks in EPUB format or alternatively as plain text.

EPUB and plain text are also the two formats supported by our approach. It does not require the plain text to be structured in a specific way. However, if the structure of chapters or other metadata should be considered in the analysis, this information must be given in the text file. We therefore utilize some simple markup to structure ebooks provided in plain text, such as those by Project Gutenberg. The markup can be used to add chapter headings, line breaks and comments as well as other metadata (e.g., the title, author(s), publication date, publisher, edition or genre of a book) manually or automatically.

A public demo of the web application is available at: http://textvis.visualdataweb.org

Project Gutenberg: http://www.gutenberg.org

Once the ebook is loaded into the system, it is

processed in a linguistic analysis pipeline, consisting

of tokenization, sentence splitting and named-entity

recognition. The implementation offers three different analysis tools that users can choose from: Stanford CoreNLP, OpenNLP and ANNIE. All three tools perform state-of-the-art natural language processing (NLP) but use different techniques that each

have their beneﬁts and limitations. Depending on the

use case and type of novel, users can select the NLP

tool that is most suitable for the analysis. As this is

often not clear from the start, they can also run the

linguistic analysis several times with all three tools,

compare the different outputs and choose the one that

produces the best results. The current implementation

only supports the processing of English texts; how-

ever, it can be extended to other languages if required.
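The three pipeline steps can be illustrated with a minimal Python sketch. It is not the implementation used by the web application, which delegates these steps to Stanford CoreNLP, OpenNLP or ANNIE. The regular expressions, the small gazetteer and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical gazetteer; a real NER tool infers such labels from context.
GAZETTEER = {"Fogg": "PERSON", "Passepartout": "PERSON", "London": "PLACE"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Abbreviations such as 'Mr.' would be split incorrectly; real tools
    handle these cases.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Naive tokenizer: words (with inner apostrophes) and single symbols."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]", sentence)


def recognize_entities(tokens):
    """Label every token that is listed in the gazetteer."""
    return [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]


def analyze(text):
    """Run sentence splitting, tokenization and entity lookup in sequence."""
    records = []
    for index, sentence in enumerate(split_sentences(text)):
        tokens = tokenize(sentence)
        records.append({
            "index": index,
            "tokens": tokens,
            "entities": recognize_entities(tokens),
        })
    return records
```

Each record keeps the sentence index, so later steps such as co-occurrence analysis can refer back to the position of an entity in the text.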

The users can set several other parameters for the

analysis and visualization. Most importantly, they can

control whether stop words are removed and whether

person and place names starting with a lowercase let-

ter should be considered. However, these parameters

are only shown on demand and are intended for the

experienced user. By default, ANNIE is used for NLP

processing and unlikely character and place names

are removed, as these settings, in our experience, pro-

duced the best results for most of the novels we tested.

3.2 Overview Page

After an ebook has been linguistically analyzed, an

overview page is shown, listing metadata about the

book and providing links to the visual abstractions.

As an example, the overview page of the classic ad-

venture novel “Around the World in 80 Days” by Jules

Verne is shown in Figure 1.

Some of the metadata (e.g., the title, author, and release date) is directly taken from the text source (if provided), while the determination of other metadata (e.g., the number of words and chapters) requires some basic text analysis. Yet other metadata, such as the main characters listed on the overview page, can only be determined by using advanced text analysis, in this case named-entity recognition. This advanced text analysis is computationally complex and can take some time depending on the size of the novel. For instance, the advanced analysis of the novel “Around the World in 80 Days” with ANNIE requires around 50 seconds on the current server that hosts the demo application (CPU: Intel Core i7-4930K with 3.4 GHz, 4 GB RAM), whereas the basic analysis is completed in less than 5 seconds. As a general strategy, we therefore decided to show the results of each analysis step as soon as they are available; for example, the web application already shows the results of the basic analysis while the advanced analysis is still running.

Figure 1: Overview page showing metadata about the novel “Around the World in 80 Days” by Jules Verne.

Stanford CoreNLP: http://nlp.stanford.edu/software/corenlp.shtml

OpenNLP: http://opennlp.apache.org/

ANNIE: https://gate.ac.uk/ie/annie.html

This strategy also applies to the different visu-

alizations provided by our approach. For instance,

while a basic word cloud visualization can quickly be

generated, the more advanced word cloud as well as

most other visualizations require more sophisticated

NLP processing and can therefore not be shown be-

fore this processing is completed.

The interactive visualizations are available via the menu on the left or via the icons on the right of the overview page (Figure 1). The fingerprint visualizations of the main characters can be opened directly from the overview page by clicking on the corresponding links. The main characters are determined by counting their occurrences in the novel; those that appear most often are assumed to be the protagonists. This simple measure worked surprisingly well for the novels we tested, in particular since we also consider variations of the

character names as detailed in Section 3.4.
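The frequency-based ranking can be sketched in a few lines of Python. The alias table that maps name variations to one canonical name, the function name and the `top_n` cut-off are assumptions for illustration; the text does not specify how the application resolves name variations.

```python
from collections import Counter


def main_characters(mentions, aliases, top_n=7):
    """Rank characters by how often they are mentioned.

    mentions: character names in order of appearance in the novel.
    aliases:  hypothetical mapping from a name variation to a canonical name.
    top_n:    assumed number of characters treated as main characters.
    """
    # Fold every variation into its canonical name before counting.
    counts = Counter(aliases.get(name, name) for name in mentions)
    return [name for name, _ in counts.most_common(top_n)]
```

For example, if "Fogg" and "Mr. Fogg" are both mapped to "Phileas Fogg", all three spellings contribute to the same count.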

The application assigns a unique color to each

of the main characters. This color is shown in the

overview page and consistently used for that character

on all pages and in all visualizations. We created two color schemes: one for users with color vision deficiencies, consisting of four distinct colors determined with ColorBrewer 2.0, and one for users with normal vision, consisting of seven distinct colors taken from the categorical color scheme of D3. If there are more than seven characters in a novel, the rest of them

are shown in a gray color.
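A possible color assignment is sketched below in Python. The seven hex values are the first seven entries of D3's `category10` palette; whether the application uses exactly these entries, and the particular gray value, are assumptions.

```python
# First seven colors of D3's category10 palette (assumed choice).
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728",
           "#9467bd", "#8c564b", "#e377c2"]
FALLBACK_GRAY = "#999999"  # assumed gray for all remaining characters


def assign_colors(ranked_characters, palette=PALETTE, fallback=FALLBACK_GRAY):
    """Give each of the top-ranked characters a unique color.

    Characters beyond the palette size all share the fallback gray.
    """
    return {
        name: (palette[rank] if rank < len(palette) else fallback)
        for rank, name in enumerate(ranked_characters)
    }
```

Because the mapping is computed once from the ranking, the same color can be reused for a character in every view.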

3.3 Characters and Places

To get an overview of the extracted characters and

places, users can open either the characters or places

view. Initially, the most frequently occurring entity is

preselected in both views, complemented by a list of

all extracted entities (Figure 2), where it is possible to search for and switch between entities.

For each selected entity, a proﬁle is provided,

containing information about its occurrences, the de-

tected gender and alternative names (for characters),

and listing other entities that co-occur most frequently

with that entity (Figure 2). In addition, the fingerprint visualization of the selected entity is shown (Figure 2).

3.4 Fingerprint Visualization

After users have obtained a first overview, they can further analyze the characters and places with the fingerprint visualization. It shows the temporal distribution of the entity occurrences in the novel. Blue and red bars represent parts and chapters of a book. Black blocks depict text segments where the entities occur. By clicking on the toggle separators button, the part and chapter bars can be hidden to get the ‘plain’ fingerprint.

Figure 2: Character view listing information about the character Mr. Fogg extracted from the novel.

ColorBrewer 2.0: http://colorbrewer2.org

D3 categorical colors: https://github.com/mbostock/d3/wiki/Ordinal-Scales#categorical-colors

Longer blocks indicate that entities occur often in

that segment of the book, while shorter blocks appear

when a character or place is only brieﬂy mentioned.

However, a longer block does not necessarily mean that entities are mentioned in every consecutive sentence; it is sufficient if they are mentioned every

few sentences.
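The block construction described above can be sketched as a merge of nearby sentence indices. The function name and the gap threshold of three sentences are assumptions; the text only states that mentions "every few sentences" are sufficient to form one block.

```python
def fingerprint_blocks(sentence_indices, max_gap=3):
    """Merge the sentence indices of an entity into (start, end) blocks.

    Two occurrences belong to the same block if they are at most
    max_gap sentences apart (assumed threshold).
    """
    blocks = []
    for index in sorted(set(sentence_indices)):
        if blocks and index - blocks[-1][1] <= max_gap:
            blocks[-1][1] = index          # extend the current block
        else:
            blocks.append([index, index])  # start a new block
    return [tuple(block) for block in blocks]
```

With this sketch, occurrences in sentences 1, 2 and 4 form one longer block, whereas an isolated mention in sentence 40 yields a block of length one.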

Users can highlight a block segment by hovering

over it to determine the respective chapter, shown in

a tooltip. By clicking on it, they can jump to the cor-

responding text passage in the novel, which is opened

in the text view. In that view, all occurrences are high-

lighted with the assigned speciﬁc color of the entity.

This supports users in ﬁnding and analyzing text pas-

sages faster.

In addition, they can select multiple characters or

places to get a combined ﬁngerprint, as depicted in

Figure 3. The example shows the conjunction of two

selected characters (Passepartout and Fix). This way,

users can easily determine text passages where se-

lected characters co-occur.

By default, we define that entities co-occur if they appear together in at least one sentence, which is a

common co-occurrence measure. However, other co-

occurrence measures (number of words, etc.) are also

possible and can be set as internal system parameters

if required.
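Under the sentence-level definition, the combined fingerprint reduces to a set intersection. The following Python sketch assumes that the sentence indices of each entity are already available; the data layout and the function name are illustrative.

```python
def cooccurrence_sentences(occurrences, selected):
    """Return the sentence indices in which all selected entities appear.

    occurrences: dict mapping an entity name to the set of sentence
                 indices in which it is mentioned (assumed data layout).
    selected:    names of the entities chosen by the user.
    """
    sets = [set(occurrences.get(name, ())) for name in selected]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

A different measure, such as a window of a fixed number of words, would only change how the index sets are built, not the intersection step.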

3.5 Character Network

To further investigate the character relations, users

can switch to the character network view. This view

contains a force-directed graph visualization that rep-

resents connections between characters, as depicted in

Figure 4. The nodes of the graph represent the charac-

ters and the edges the number of sentences in which

each pair of characters co-occurs. The node size and

edge thickness are scaled proportionally to the charac-

ters’ individual and co-occurrence frequency respec-

tively. This helps to get a quick overview of the main

characters and their connections.

The character network is once again complemented by a fingerprint visualization to indicate where the characters occur in the novel, and by a range slider that lets users limit the view to a certain range of the novel (e.g., a single chapter) (Figure 4). This has the advantage that users can analyze not only the overall structure of a novel but also the course of the relationships between characters, at least on the level of character co-occurrences. Users can select an edge in the graph to display the co-occurrences of the two related characters in the combined fingerprint visualization.

Initially, up to seven characters are preselected for

the graph visualization, based on their occurrence fre-

quency. The list of characters is again available in that

view (Figure 4), so that users can search, select and

deselect the characters that are shown in the graph

visualization. That way, the graph visualization can

be dynamically adapted according to the goals of the

user. It can also be panned, zoomed, and rearranged

to further support the analysis.
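The data behind the character network can be derived as sketched below. The sketch assumes that the characters of each sentence are given as a set and that the individual frequency is counted per sentence; the `start` and `end` parameters mimic the range slider. All names are illustrative.

```python
from collections import Counter
from itertools import combinations


def build_network(sentence_characters, start=0, end=None):
    """Compute node and edge weights for the co-occurrence graph.

    sentence_characters: list with one set of character names per sentence.
    start, end:          sentence range to analyze (assumed slider values).
    Returns (nodes, edges): nodes counts sentences per character, edges
    counts sentences per alphabetically ordered character pair.
    """
    nodes, edges = Counter(), Counter()
    for characters in sentence_characters[start:end]:
        nodes.update(characters)
        # Sorting makes (a, b) and (b, a) map to the same edge key.
        edges.update(combinations(sorted(characters), 2))
    return nodes, edges
```

Node size and edge thickness can then be scaled from these counts, and restricting the sentence range recomputes both for the selected part of the novel.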


Figure 3: Combined ﬁngerprint visualization showing two selected characters: Passepartout and Fix.

Figure 4: Character network visualization representing the

co-occurrence graph of a selected text segment. This exam-

ple represents connections between the main characters of

the novel “The Lord of the Rings”. The nodes of the graph

represent the characters and the edges the number of sen-

tences in which they co-occur.

3.6 Word Cloud

If users are interested in what is being said near the

characters or in getting a ﬁrst impression of the con-

tents of the novel, they can switch to the word cloud

view. Word clouds are commonly used by literary

scholars, as they are considered easy to understand

(Clement et al., 2009) despite all their limitations

(McNaught and Lam, 2010). The font size of the vi-

sualized words is scaled proportionally to their oc-

currence frequency, which is additionally shown in a

tooltip on demand.

The user can switch between a global word cloud

representing the entire novel and local word clouds

for the individual characters. The latter show the

words that co-occur most often with the characters,

as depicted in Figure 5 for Mr. Fogg. This gives users

some ﬂexibility in their analysis, by providing a visu-

ally appealing overview of the novel or a novel char-

acter as well as supporting the discovery of new ideas

and hypotheses.
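A local word cloud can be approximated by counting the words of all sentences that mention the character. The following Python sketch assumes tokenized sentences, single-token character names and a small stop word list; these choices and the maximum font size are illustrative.

```python
from collections import Counter

# Assumed, deliberately small stop word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "he", "she", "his"}


def local_word_counts(sentences, character):
    """Count words in all sentences that mention the given character.

    sentences: list of token lists, one per sentence.
    """
    counts = Counter()
    for tokens in sentences:
        if character in tokens:
            counts.update(
                token.lower() for token in tokens
                if token != character
                and token.isalpha()
                and token.lower() not in STOP_WORDS
            )
    return counts


def font_sizes(counts, max_px=40):
    """Scale font sizes proportionally to the word frequencies."""
    if not counts:
        return {}
    top = max(counts.values())
    return {word: max_px * count / top for word, count in counts.items()}
```

The global word cloud corresponds to the same counting step without the filter on the character.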

3.7 Plot View

Finally, users can switch to the plot view to get a better idea of the dynamic relationships between characters. It reuses and extends an implementation from the University of Waterloo, which takes annotation data and automatically generates narrative charts in the spirit of the aforementioned “Movie Narrative Charts” (Munroe, 2009).

Figure 5: Word cloud view of the character Mr. Fogg.

Our approach adapts the visualization and dis-

plays the ten most frequently occurring characters as

lines and every chapter as a node (cf. Figure 6). The

horizontal axis represents the plot of the novel and the

vertical grouping of lines indicates which characters

co-occur in the chapters. If two or more lines share

a node, this means that the corresponding characters

co-occur frequently in that chapter. When hovering

over a node, a tooltip lists the characters and places

which co-occur in that chapter. Hovering over a line

highlights the whole line as well as the name of the

corresponding character.
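One way to decide which lines share a node in a chapter is to group characters that co-occur in enough sentences of that chapter. The Python sketch below uses connected components over frequent pairs; the threshold `min_shared` and the grouping strategy are assumptions, since the text does not define when characters "co-occur frequently".

```python
from collections import Counter
from itertools import combinations


def chapter_groups(chapter_sentences, min_shared=2):
    """Group the characters of one chapter by frequent co-occurrence.

    chapter_sentences: list with one set of character names per sentence.
    min_shared:        assumed minimum number of shared sentences.
    Returns a sorted list of sorted name groups.
    """
    present, pair_counts = set(), Counter()
    for characters in chapter_sentences:
        present.update(characters)
        pair_counts.update(combinations(sorted(characters), 2))

    # Union-find over characters linked by a frequent pair.
    parent = {name: name for name in present}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (first, second), shared in pair_counts.items():
        if shared >= min_shared:
            parent[find(first)] = find(second)

    groups = {}
    for name in present:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

Each returned group would correspond to one node of the chapter, and characters in separate groups would be drawn as separate lines.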

The plot view supports users in getting a rough idea of the course of the storyline. It allows users to quickly identify when and where characters come together or go separate ways and whether groups of characters exist. In the plot view of Figure 6, one can see, for example, that Mr. Fogg and Fix interact for the first time in chapter seven, while Aouda and John Bunsby enter the plot in the middle of the novel.

University of Waterloo implementation: http://csclub.uwaterloo.ca/~n2iskand/?page_id=13/

Figure 6: Plot view of Jules Verne’s novel “Around the World in 80 Days”.

3.8 Text View

To support literary scholars in their common workflow, we also provide a text view where they can directly work with the text. Recognized chapters are listed as hyperlinks on the left, while the text is presented on the right (Figure 7). The focused chapter is emphasized with bold type. After clicking on a chapter, the text view jumps to the beginning of that chapter. If the user reached the text view from the fingerprint, character network or word cloud view, the selected entity or entities are highlighted in the assigned color. Furthermore, users can search for any other word or text passage and reset the highlighting with a reset button.

The text view displays a vertical ﬁngerprint next

to its scrollbar. The idea is to provide both a visual

representation of the distribution of entities and the

possibility to inspect a text passage in detail, in or-

der to support a distant and close reading analysis.

When hovering over the fingerprint blocks, the corresponding text passage is displayed in a tooltip (Figure 7), and after clicking on one, the text view

jumps to the corresponding position. Additionally, the

literary scholars can jump to the next or previous oc-

currence of an entity by clicking on the up and down

arrow buttons.

4 USAGE SCENARIOS

In the following, we present two usage scenarios that demonstrate the applicability and usefulness of our approach. In the ePoetics project, we developed approaches to support the analysis of German poetics, a form of early scholarly works on literature that formed the foundation of modern literary science. During our collaboration with colleagues from the literature department, we discussed the usefulness of direct access to text sources when using visual abstractions. This discussion inspired the development of an approach for analyzing narrative texts in a similar way, but taking into account the specific aspects of the plot and the social network of characters described in the text.

ePoetics project: http://www.epoetics.de

Although this was not the primary focus of the

project, our colleagues emphasized its usefulness for

their own work and for teaching. We therefore decided to set up a corresponding student project in order to find out whether creating such visual abstractions from text mining results would be possible using off-the-shelf NLP techniques, and tested the applicability on well-known novels.

For the usage scenarios, we selected a modern and

an old English novel for analysis by a ﬁctitious lit-

erary scholar. She has previous knowledge about the

novels, since she read them some time ago, and is now

trying to retrace the storyline and the important events

by means of our approach.

4.1 Analysis of “Harry Potter and the

Half-Blood Prince”

In our ﬁrst usage scenario, we present an analysis of

the novel “Harry Potter and the Half-Blood Prince”

by J. K. Rowling. It is the sixth and penultimate novel

in the Harry Potter series and was published in 2005.

The series chronicles the adventures of the young wiz-

ard Harry Potter and his quest to defeat the dark wiz-

ard Lord Voldemort, who strives to rid the wizarding

world of Muggle (non-magical) heritage.

In a first step, the literary scholar explores and analyzes the character and network view. That way, she gets a quick overview of the main characters and their relationships. During the analysis, she encounters the name Slughorn and is surprised because she cannot remember him. To find out more about Slughorn, she selects the name in the character view (Figure 8) and uses the fingerprint visualization to jump directly to its first occurrence, which is opened in the text view.

Figure 7: Text view with selected character Fix.

Figure 8: Some of the visualizations used by the literary scholar in the analysis of the novel “Harry Potter and the Half-Blood Prince”: character view, word cloud, fingerprint visualization and text view.

Figure 9: Plot view showing some of the main characters of J. R. R. Tolkien’s novel “The Hobbit”.

While reading some paragraphs in the text view,

she ﬁnds out that Dumbledore, the headmaster of the

wizarding school Hogwarts, convinced Slughorn to

return as potions teacher. Afterwards, she vaguely re-

members that Dumbledore tasked Harry Potter with

retrieving a memory from Slughorn that contains cru-

cial information about Voldemort. To pursue this as-

sumption, she switches to the word cloud view and

explores the word clouds of Slughorn and Voldemort (Figure 8).

In the word cloud of Voldemort, she identiﬁes the

term Horcrux (an object in which a dark wizard has

hidden a fragment of his soul for the purpose of at-

taining immortality) and remembers a conversation

between Slughorn and Harry Potter at the home of

Hagrid, the gamekeeper of Hogwarts.

In order to ﬁnd the text passage, she uses the

ﬁngerprint view and selects the three characters

Slughorn, Hagrid and Harry Potter. She determines

that all three characters co-occur in only one text passage (Figure 8). Consequently, she jumps to that

text passage and ﬁnds that Harry Potter succeeds in

retrieving the memory, which shows Voldemort ask-

ing for information on creating Horcruxes.

Next, she switches to the plot view since she is interested in examining the course of the storyline again. She still remembers that Dumbledore dies

at the end of the book and that Draco Malfoy, the son

of one of Voldemort’s followers, and Severus Snape,

a professor at Hogwarts, are involved in his death. By

analyzing the different chapters (nodes) and occurring

characters (lines) in the plot view, she quickly recog-

nizes the chapter of Dumbledore’s death and jumps

directly to the text.

With the aid of the vertical ﬁngerprints next to the

text view’s scroll bar, she can easily analyze the relevant text passages as depicted in Figure 8. She confirms her recollection that Draco Malfoy was chosen

by Voldemort to kill Dumbledore. Furthermore, she

ﬁnds out that Malfoy was unable to bring himself to

do it and that Snape accomplished it with a deadly

curse.

4.2 Analysis of “The Hobbit”

In the second usage scenario, our ﬁctitious literary

scholar analyzes the children’s book “The Hobbit” by

J. R. R. Tolkien. It was published in 1937 and is about

Bilbo, a hobbit and the protagonist, and his adventures

with dwarfs, elves, trolls and a dragon.

To reproduce the course of the novel, she starts

her analysis on the plot view, as depicted in Figure 9.

By scanning the view, she gets a quick overview of

the plot and remembers that Bilbo’s adventure begins

at his home with Gandalf, a wizard, and 13 dwarfs

(Figure 9).

After jumping to the text and reading some pas-

sages in the text view, she remembers that they want

to recover the treasure from Erebor (also known as the

Lonely Mountain, former home to the greatest dwarf

kingdom) and Bilbo is hired as their “burglar”, since

hobbits are small and unobtrusive.

Once she returns to the plot view, she notices the name of the creature Gollum (Figure 9), originally a hobbit, and recalls that Bilbo wins a magical ring from him in a riddle contest. However, she is unsure which further role Gollum plays in the plot and

whether he co-occurs with other characters. To gain

insights into this question, she switches to the graph view and immediately recognizes that Gollum has a relationship only with Bilbo, as depicted in Figure 10. Subsequently, she activates the character

view of Gollum to examine his ﬁngerprint visualiza-

tion, and thus conﬁrms that he only appears at this

point within the novel.

As the next step, she further analyzes the plot view

and discovers that Smaug (Figure 9), a great fire

dragon, enters the plot at the end of the novel. The

literary scholar knows that Smaug invaded the dwarf

kingdom of Erebor a long time ago and now guards

the treasure. She vaguely remembers that Bilbo and

the dwarfs are searching for a speciﬁc treasure ob-

ject. To inspect this assumption, she switches to the

text view and jumps to the text passage where Smaug

occurs for the ﬁrst time in the plot. While reading,

she ﬁnds out that the searched-for object is the Arken-

stone, a great jewel.

In order to get more information on the Arken-

stone, she uses the keyword search to highlight all

occurrences as vertical ﬁngerprints next to the text

view’s scroll bar. By analyzing the text passages, she

determines, for example, that the Arkenstone is an heirloom of the dwarf kings and that, at the end of the novel, it is placed on the chest of Thorin, the legitimate king of Erebor, in his tomb deep under the Lonely Mountain.

The usage scenarios show that our approach pro-

vides literary scholars with visual abstractions that fa-

cilitate character analysis in novels. Through the de-

veloped set of visual and automatic methods, we sup-

port them in conﬁrming and generating hypotheses,

conﬁrming recollections and gaining insights.

5 DISCUSSION AND FUTURE

WORK

Our approach was developed to support intra-document analysis by visualizing the dynamic relations between characters, not for the comparative analysis of several novels. However, we provide several visual abstractions, which offer different views on the text to

support literary scholars in their analysis.

The approach can be flexibly extended by additional visual abstractions. For example, views that support literary scholars with visualizations of geographical information (Jänicke et al., 2012) could be integrated if the novel’s setting is based on the real world or if the novel comes with its own cartographic material.

Figure 10: Character network view showing some of the main characters of “The Hobbit” and their co-occurrence frequencies.

In addition, the visual abstractions could be im-

proved and extended. One such extension could be

different graph layouts for the character network to

get a better impression of the constellation of char-

acters or to add more information, such as semantic

relations between characters. This would enable liter-

ary scholars to recognize relationships between char-

acters even better.

A further possibility might be to expand the plot

view with automatically detected speciﬁc events (e.g.,

the “Battle of Helm’s Deep” in case of the novel “The

Lord of the Rings”). Thereby, literary scholars could

better track and identify the course of the story.

Our visual abstractions are based on out-of-the-

box natural language processing toolkits and we are

interested in examining how far we can get in sup-

porting literary scholars with their analysis. Natural

language processing methods have continuously im-

proved and show strong robustness. Typically, they

are rule-based, or rely on statistical and machine

learning approaches, which have been trained on

available modern newspaper or journal article texts,

since large training sets are most often available for

these types of texts. As a consequence, natural lan-

guage processing techniques could be less effective

when applied to a very speciﬁc type of text, such as

historical or ﬁctional novels.

This can lead to uncertainties and errors in the

visual abstractions. As a consequence, uncritical in-

terpretation of such visual representations can cause

confusion and misunderstandings. Visualization can

play an important role in addressing these problems


in the future. We intend to provide visual cues on the

quality of automatic analyses (if available), for exam-

ple, through color and opacity. This could help literary

scholars to interpret automatically generated results.

Furthermore, we want to let users steer and adapt

techniques interactively according to their needs. As

a result, automatic methods can be improved and ap-

plied to similar texts.
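
The idea of encoding analysis quality through opacity can be illustrated with a minimal sketch. It assumes that the extraction step yields a confidence score between 0 and 1 for each entity mention; this score, the function name confidence_to_rgba and the lower bound min_alpha are hypothetical and not part of the presented implementation.

```python
def confidence_to_rgba(base_rgb, confidence, min_alpha=0.2):
    """Map a confidence score in [0, 1] to a CSS rgba() color string.

    Uncertain mentions are drawn more transparently, but never fully
    invisible: the opacity is interpolated between min_alpha and 1.0.
    """
    clamped = min(1.0, max(0.0, confidence))  # guard against out-of-range scores
    alpha = min_alpha + (1.0 - min_alpha) * clamped
    r, g, b = base_rgb
    return f"rgba({r}, {g}, {b}, {alpha:.2f})"


if __name__ == "__main__":
    # A mention recognized with medium confidence, drawn in a reddish hue.
    print(confidence_to_rgba((200, 30, 30), 0.5))
```

Keeping a nonzero minimum opacity is a design choice in this sketch: mentions with low confidence remain visible, so that scholars can still inspect and correct them.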

The current detection of co-occurrences between entities is based on named entity recognition. We consider two entities to co-occur if they both appear in the same sentence at least once. This approach works well in many cases. However, it can be improved in various respects. One option could be to let literary scholars adjust the co-occurrence range themselves, for example to several sentences or a whole paragraph. Additionally, methods for coreference resolution might be integrated to detect more

instances and compute more precise frequency val-

ues for entities. Coreference resolution is the task of

resolving noun phrases to the entities that they re-

fer to and there already exist robust methods (Raghu-

nathan et al., 2010; Stoyanov et al., 2010). Existing

approaches are usually trained on non-ﬁctional texts

and have to be adapted (where appropriate) to the

users’ needs.
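
The sentence-based co-occurrence criterion and the suggested adjustable range can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation used in our system: sentences are split with a naive regular expression, entities are matched as plain substrings instead of being taken from a named entity recognizer, and the names split_sentences, cooccurrences and window are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrences(text, entities, window=1):
    """Count entity pairs that appear together in a sliding sentence window.

    window=1 corresponds to same-sentence co-occurrence; larger values
    approximate a range of several sentences. Each window position in
    which a pair is present adds 1 to that pair's count.
    """
    sentences = split_sentences(text)
    counts = Counter()
    for start in range(max(1, len(sentences) - window + 1)):
        span = " ".join(sentences[start:start + window])
        # Substring matching stands in for real entity mentions (assumption).
        present = sorted(e for e in entities if e in span)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    text = "Gollum met Bilbo. Bilbo left. Smaug slept."
    names = ["Bilbo", "Gollum", "Smaug"]
    print(cooccurrences(text, names, window=1))
    print(cooccurrences(text, names, window=2))
```

Because the windows overlap for window values above 1, a single joint mention can contribute to several window positions. Frequencies obtained with different range settings are therefore not directly comparable.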

In the future, we plan to take the semantics of the

text into account in order to better investigate the re-

lationships between entities. Again, already existing

approaches (Gildea and Jurafsky, 2002; Ruiz-Casado

et al., 2007) could be integrated into our approach

to provide the literary scholars with more informa-

tion for their analysis. In general, we aim at support-

ing a broader spectrum of natural language processing

methods in the future.

Another issue is the challenge of scalability when

working with arbitrarily long novels. Most of our vi-

sual abstractions scale very well. However, the ﬁnger-

print visualization and the plot view become less use-

ful with longer text. Since the available screen space

remains the same, they are increasingly compressed

until they overlap. To address this problem in the fu-

ture, we could integrate a focus+context technique,

such as ﬁsheye distortion (Bederson, 2000), to present

detailed information in context, or an overview+detail

approach to present multiple views with different lev-

els of abstractions (Cockburn et al., 2009).
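
A one-dimensional fisheye distortion of positions along the text axis could look like the following sketch. It assumes positions normalized to [0, 1] and uses a commonly used distortion function of the form g(t) = (d + 1)t / (dt + 1); the function name fisheye and the default distortion factor d are illustrative choices and not part of our implementation.

```python
def fisheye(x, focus, d=3.0):
    """Distort a normalized position x in [0, 1] around a focus point.

    Positions close to the focus are spread apart, positions far away
    are compressed. d >= 0 controls the strength; d = 0 is the identity.
    The end points 0 and 1 and the focus itself stay fixed.
    """
    if d < 0:
        raise ValueError("distortion factor d must be non-negative")

    def g(t):
        # Monotonic magnification function mapping [0, 1] onto [0, 1].
        return (d + 1) * t / (d * t + 1)

    if x >= focus:
        span = 1.0 - focus  # distance from the focus to the right border
        if span == 0:
            return x
        return focus + g((x - focus) / span) * span
    span = focus  # distance from the focus to the left border
    return focus - g((focus - x) / span) * span


if __name__ == "__main__":
    # Chapter boundaries at regular intervals, focus in the middle of the text.
    for pos in (0.0, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0):
        print(pos, round(fisheye(pos, focus=0.5), 3))
```

Applied to the fingerprint visualization or the plot view, such a mapping would give the passage under the cursor more screen space while keeping the rest of the novel visible in compressed form.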

6 CONCLUSION

In this work, we presented an easily accessible web-

based approach for visualizing the relation of charac-

ters and places in a novel. Our approach includes dif-

ferent natural language processing toolkits to extract

named entities and possibilities to set parameters for

variable analyses. Furthermore, we provide several vi-

sual abstractions, which support literary scholars with

a ﬂexible and comprehensive analysis of the novel

characters. In particular, our approach facilitates distant reading and provides a starting point for new ideas,

hypotheses and further analyses. All views enable lit-

erary scholars to jump to the corresponding text pas-

sage and thus allow for working with the text directly.

We presented an implementation of the approach and

provided two usage scenarios that illustrate its appli-

cability and usefulness.

ACKNOWLEDGEMENTS

We would like to thank our students Sanjeev Balakr-

ishnan, Felix Do, Sebastian Frank, Paul Kuznecov,

Vincent Link, Eduard Marbach, Jan Melcher, Chris-

tian Richter, Marc Weise, and Marvin Wyrich who

implemented the approach in a student project. This

work has partly been funded by the German Federal

Ministry of Education and Research (BMBF) as part

of the ‘ePoetics’ project.

REFERENCES

Abdul-Rahman, A., Lein, J., Coles, K., Maguire, E., Meyer,

M., Wynne, M., Johnson, C. R., Trefethen, A., and

Chen, M. (2013). Rule-based visual mappings – with a

case study on poetry visualization. Computer Graph-

ics Forum, 32(3pt4):381–390.

Bederson, B. (2000). Fisheye menus. In Proceedings of

the 13th Annual ACM Symposium on User interface

Software and Technology, pages 217–225. ACM.

Clement, T., Plaisant, C., and Vuillemot, R. (2009). The

story of one: Humanity scholarship with visualization

and text analysis. In Proceedings of the Digital Hu-

manities Conference, DH ’09.

Cockburn, A., Karlson, A., and Bederson, B. (2009). A re-

view of overview+detail, zooming, and focus+context

interfaces. ACM Computing Surveys, 41(1):1–31.

Don, A., Zheleva, E., Gregory, M., Tarkan, S., Auvil, L.,

Clement, T., Shneiderman, B., and Plaisant, C. (2007).

Discovering interesting usage patterns in text collec-

tions: Integrating text mining with visualization. In

Proceedings of the 16th ACM Conference on Information and Knowledge Management,

CIKM ’07, pages 213–222. ACM.

Dou, W., Wang, X., Skau, D., Ribarsky, W., and Zhou, M.

(2012). Leadline: Interactive visual analysis of text

data through event identiﬁcation and exploration. In

Proceedings of the IEEE Conference on Visual Ana-

lytics Science and Technology, VAST ’12, pages 93–

102.


Eick, S., Steffen, J., and Sumner, Jr., E. E. (1992). Seesoft

– a tool for visualizing line oriented software statis-

tics. IEEE Transactions on Software Engineering,

18(11):957–968.

Gildea, D. and Jurafsky, D. (2002). Automatic labeling of

semantic roles. Computational linguistics, 28(3):245–

288.

Hearst, M. A. (1995). Tilebars: Visualization of term dis-

tribution information in full text information access.

In Proceedings of the SIGCHI Conference on Human

Factors in Computing Systems, CHI ’95, pages 59–66.

ACM/Addison-Wesley.

Heimerl, F., Lohmann, S., Lange, S., and Ertl, T. (2014).

Word cloud explorer: Text analytics based on word

clouds. In 47th Hawaii International Conference

on System Sciences, HICSS ’14, pages 1833–1842.

IEEE.

Jänicke, S., Franzini, G., Cheema, M. F., and Scheuermann,

G. (2015). On Close and Distant Reading in Digi-

tal Humanities: A Survey and Future Challenges. In

Eurographics Conference on Visualization – STARs,

EuroVis ’15. Eurographics Association.

Jänicke, S., Geßner, A., Büchler, M., and Scheuermann, G.

(2014). Visualizations for text re-use. In Proceedings

of the 5th International Conference on Information

Visualization Theory and Applications, IVAPP ’14,

pages 59–70. INSTICC.

Jänicke, S., Heine, C., Stockmann, R., and Scheuermann,

G. (2012). Comparative visualization of geospatial-

temporal data. In Proceedings of the 3rd International

Conference on Information Visualization Theory and

Applications, IVAPP ’12, pages 613–625. INSTICC.

Keim, D. and Oelke, D. (2007). Literature ﬁngerprinting:

A new method for visual literary analysis. In Pro-

ceedings of the IEEE Symposium on Visual Analytics

Science and Technology, VAST ’07, pages 115–122.

Koch, S., Bosch, H., Giereth, M., and Ertl, T. (2011). It-

erative integration of visual insights during scalable

patent search and analysis. IEEE Transactions on Vi-

sualization and Computer Graphics, 17(5):557–569.

Koch, S., John, M., Wörner, M., Müller, A., and Ertl, T.

(2014). Varifocalreader – in-depth visual analysis of

large text documents. IEEE Transactions on Visual-

ization and Computer Graphics, 20(12):1723–1732.

Liu, S., Wu, Y., Wei, E., Liu, M., and Liu, Y. (2013).

Storyﬂow: Tracking the evolution of stories. IEEE

Transactions on Visualization and Computer Graph-

ics, 19(12):2436–2445.

McNaught, C. and Lam, P. (2010). Using wordle as a

supplementary research tool. The qualitative report,

15(3):630–643.

Moretti, F. (2005). Graphs, maps, trees: abstract models for

a literary history. Verso.

Munroe, R. (2009). Movie narrative charts.

http://xkcd.com/657/.

Oelke, D., Hao, M., Rohrdantz, C., Keim, D., Dayal, U.,

Haug, L., and Janetzko, H. (2009). Visual opinion

analysis of customer feedback data. In Proceedings

of the IEEE Symposium on Visual Analytics Science

and Technology, VAST ’09, pages 187–194.

Oelke, D., Kokkinakis, D., and Keim, D. A. (2013). Fin-

gerprint matrices: Uncovering the dynamics of social

networks in prose literature. Computer Graphics Fo-

rum, 32(3pt4):371–380.

Phelan, J. (1989). Reading people, reading plots: Charac-

ter, progression, and the interpretation of narrative.

University of Chicago Press.

Plaisant, C., Rose, J., Yu, B., Auvil, L., Kirschenbaum,

M. G., Smith, M. N., Clement, T., and Lord, G.

(2006). Exploring erotics in emily dickinson’s corre-

spondence with text mining and visual interfaces. In

Proceedings of the 6th ACM/IEEE-CS Joint Confer-

ence on Digital libraries, pages 141–150. ACM.

Raghunathan, K., Lee, H., Rangarajan, S., Chambers, N.,

Surdeanu, M., Jurafsky, D., and Manning, C. (2010).

A multi-pass sieve for coreference resolution. In Pro-

ceedings of the 2010 Conference on Empirical Meth-

ods in Natural Language Processing, EMNLP ’10,

pages 492–501. ACL.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2007).

Automatising the learning of lexical patterns: An ap-

plication to the enrichment of wordnet by extract-

ing semantic relationships from wikipedia. Data &

Knowledge Engineering, 61(3):484–499.

Stasko, J., Görg, C., and Liu, Z. (2008). Jigsaw: Supporting investigative analysis through interactive visualization. Information Visualization, 7(2):118–132.

Stoyanov, V., Cardie, C., Gilbert, N., Riloff, E., Buttler, D.,

and Hysom, D. (2010). Coreference resolution with

reconcile. In Proceedings of the ACL 2010 Conference

Short Papers, pages 156–161. ACL.

Tanahashi, Y. and Ma, K.-L. (2012). Design considera-

tions for optimizing storyline visualizations. IEEE

Transactions on Visualization and Computer Graph-

ics, 18(12):2679–2688.

Van Ham, F., Wattenberg, M., and Viegas, F. (2009). Map-

ping text with phrase nets. IEEE Transactions on Visu-

alization and Computer Graphics, 15(6):1169–1176.

Viégas, F. B. and Wattenberg, M. (2008). Tag clouds and

the case for vernacular visualization. interactions,

15(4):49–52.

Vuillemot, R., Clement, T., Plaisant, C., and Kumar, A.

(2009). What’s being said near “martha”? exploring

name entities in literary text collections. In Proceed-

ings of the IEEE Symposium on Visual Analytics Sci-

ence and Technology, 2009, VAST ’09, pages 107–

114.

Wattenberg, M. and Viegas, F. (2008). The word tree, an in-

teractive visual concordance. IEEE Transactions on

Visualization and Computer Graphics, 14(6):1221–

1228.

IVAPP 2016 - International Conference on Information Visualization Theory and Applications