A BROWSER EXTENSION
FOR PROVIDING VISUALLY IMPAIRED USERS
ACCESS TO THE CONTENT OF BAR CHARTS ON THE WEB
Stephanie Elzer, Edward Schwartz
Department of Computer Science, Millersville University, POB 1002, Millersville, PA 17551, USA
Sandra Carberry, Daniel Chester, Seniz Demir, Peng Wu
Department of Computer & Information Sciences, University of Delaware, 103 Smith Hall, Newark, DE 19716, USA
Keywords:
Accessibility, assistive technology, plan recognition, Bayesian networks, graph understanding.
Abstract:
This paper presents the SIGHT (Summarizing Information GrapHics Textually) system that enables visually
impaired users to gain access to information graphics that appear on a web page. The user interface is im-
plemented as a browser extension that is launched by a keystroke combination. The output of SIGHT is a
textual summary of the graphic, the core content of which is the hypothesized intended message of the graphic
designer. The textual summary of the graphic is then conveyed to the user by screen reading software (such as
JAWS). This approach has the benefits of 1) not requiring any action on the part of the web page developer, and
2) providing the user with the message and knowledge that one would gain from viewing the graphic rather
than requiring the user to form a mental map of the graphic.
1 INTRODUCTION
Information graphics (non-pictorial graphics, such as
bar charts, line graphs and pie charts) are an impor-
tant component of many documents. Unfortunately,
these visual presentations present serious informa-
tion access challenges for visually impaired users.
Documents on the Web can be accessed by visually
impaired users through screen reading software that
reads the information on a computer screen using syn-
thesized speech. If the developer of the web page has
supplied alternative text (or “alt text”) for graphics in
the html, most screen readers will read this text to the
user. However, the vast majority of web pages are
developed without broad accessibility in mind, and
alt text is not supplied, thus making the content of
the document’s graphics inaccessible to a visually im-
paired user.
It has been our observation that the majority of
information graphics that appear in formal reports,
newspapers, and magazines are intended to convey a
message or communicative intention. We conducted
a corpus study (Carberry et al., 2006) whose primary
goal was to determine the extent to which the mes-
sage conveyed by an information graphic in a multi-
modal document is also conveyed by the document’s
text. We analyzed 100 randomly selected graphics
from our collected corpus of information graphics,
along with the articles in which they appeared. The
selected articles were taken from magazines (such as
Newsweek, Business Week, Fortune, and Time) and
local and national newspapers. In 39% of the in-
stances, the text was judged to fully or mostly con-
vey the message of the information graphic. How-
ever, in 26% of the instances, the text conveyed only
a little of the graphic’s message. Most surprising was
the observation that in 35% of the instances in our
analyzed corpus, the text failed to convey any of the
message. These findings make it crucial that mech-
anisms be developed for providing visually impaired
individuals with alternative access to the content of
information graphics.
A number of projects have attempted to make im-
ages accessible to visually impaired viewers by repro-
ducing the image in an alternative medium, such as
sound (Meijer, 1992; Alty and Rigas, 1998), touch
(Ina, 1996) or a combination of the two (Ramloll
et al., 2000; Yu et al., 2002; Kennel, 1996). One par-
ticularly interesting project is that of Yu and Brew-
ster (Yu et al., 2002). In this project, they investigate
the development and usefulness of web-based mul-
timodal graphs which use haptic devices and audi-
tory output to communicate the contents of the graphs
to the users. The web pages containing the graph-
ics must be properly formatted with the coordination
of embedded haptic and audio features controlled by
Cascading Style Sheets (Yu et al., 2002). Although
the evaluation of their system does demonstrate the
usefulness of the approach when compared to tradi-
tional tactile diagrams, they note that the process of
preparing the graphics is laborious (Yu et al., 2002).
Aside from the use of sound and touch, there is
some research by Kurze (Kurze, 1995) that generates
text in a presentation tool used to convey the content
of a graphic. In this system, a verbal description of
the diagram’s properties, such as the style of the dia-
gram, the number of data sets, the labels of axes and
the ranges of axes, is output through a text-to-speech
device (Kurze, 1995).
However, all of these approaches require the user
to build a “mental map” of the diagram – a task that is
very difficult for the congenitally blind because they
have no personal knowledge regarding the appearance
of information graphics (Kennel, 1996). In addition,
many of the other systems require 1) special equip-
ment or 2) preparation work (model creation) to be
done by a sighted individual. Consequently, existing
systems have not been successful in solving the acces-
sibility issue for visually impaired individuals. Thus
it is imperative that novel approaches be investigated.
Rather than providing alternative access to what
the graphic looks like or a listing of all of the data
points contained in the graphic, our SIGHT system
provides the user with the message and knowledge
that one would gain from viewing the graphic. We
hypothesize that the core message of an information
graphic (the primary overall message that the graphic
conveys) can serve as the basis for an effective tex-
tual summary of the graphic. For example, given the
graphic in Figure 1, rather than simply presenting the
heights of the bars and their values, SIGHT infers
that the graphic’s intended message is that there is
an increase in Jan ’99 in the dollar value of 6-month
growth in consumer revolving credit in contrast with
the decreasing trend from July ’97 to July ’98 and
conveys this message to the user via speech. We even-
tually envision SIGHT as an interactive natural lan-
guage system that will provide a richer textual sum-
mary, with the inferred message as the core content,
and that can respond to follow-up questions from the
user or requests for justification of the system’s in-
ferred message.
Figure 1: Graphic with a Contrast Point with Trend Message. (Bar chart titled “CREDIT-CARD DEBT LEAPS HIGHER”, showing 6-month growth in consumer revolving credit, in billions of dollars, from July ’97 to Jan ’99. Graphic from BusinessWeek, April 5, 1999 issue.)

The remainder of the paper presents our implemented SIGHT system. First we present the browser extension that we have constructed to enable visually impaired users to gain access to the information provided by information graphics that appear on a web page. Next we outline how images are translated into an XML representation of the graphic, and then we present our overall methodology for inferring the intended message of an information graphic. Note that all of the examples presented in this paper have been successfully run on the SIGHT system. We then conclude with a discussion of the future directions of our work.
2 THE SIGHT SYSTEM
The architecture of our SIGHT system is shown in
Figure 2. The following subsections discuss the
various components with particular emphasis on the
browser extension which allows visually impaired
users to access textual summaries of information
graphics.
2.1 The Browser Extension
Because our target users are visually impaired, our
user interface cannot rely on any of the usual visual
mechanisms. Instead, the users must be able to launch
our application from the keyboard. We have imple-
mented our browser extension specifically for JAWS
7.10 (the most recently released version of JAWS)
and Internet Explorer (preferably version 6.0 or later).
JAWS, produced by Freedom Scientific, is the most
widely used screen reading software for Windows,
and JAWS users are encouraged to use Microsoft’s Internet Explorer for web browsing, so this pair of applications should be ideal for a large portion of our target users. However, the concepts applied here are extensible to other implementation platforms, as explained in Section 2.2.3.

Figure 2: SIGHT System Architecture. (The Browser Helper Object (BHO) receives the user’s request on a web page and passes the graphic’s image file to the Visual Extraction Module (VEM); the VEM’s XML output is augmented by the Preprocessing and Caption Tagging Module (CTM) and passed to the Intention Recognition Module (IRM), which draws on the Analysis of Perceptual Task Effort (APTE) and produces the textual summary displayed to the user in a new window.)
When navigating a web page, JAWS users have
many options. When the web page is initially opened,
JAWS begins reading the content of the web page,
from top to bottom. The actual content that JAWS
reads is highly configurable by the user, but typically
includes any text on the page, the screen text per-
taining to links and buttons, and the alternative text
associated with graphics. So that we do not con-
flict with the existing navigation commands in JAWS,
we chose CONTROL+Z as the key combination for
launching our interface. If the user comes across a
bar chart during their navigation of a web page, they
can hit CONTROL+Z to launch our application and
receive a textual summary of the information con-
veyed by the bar chart. For example, if the user en-
countered the graphic shown in Figure 3, they could
hit CONTROL+Z and a new window containing the
summary of the graphic would appear. For this par-
ticular graphic, SIGHT produces the summary
This bar chart titled ’The notebook spiral’
shows that the dollar value of average laptop
prices fell from 2000 to 2003 and then falls
more slowly until 2007.
However, this type of interaction requires a very
tight coupling between our application and the web
browser, because our application needs to be able to
determine which graphic is currently in focus within
the web browser. We achieved the proper level of
integration by implementing our user interface as a
Browser Helper Object for Internet Explorer.
Figure 3: Graphic with a Changing Trend Message. (Bar chart titled “THE NOTEBOOK SPIRAL”, showing average laptop prices in dollars from ’00 through ’07. Graphic from BusinessWeek, September 5, 2005.)

Browser Helper Objects are special add-on components that enable the customization of Internet Explorer (version 4.0 or later). BHOs are tied to the
main window of the browser and are created as soon
as a new browser window is created. BHOs are im-
plemented as in-process Component Object Model
(COM) components, and they run in the same process
space as the browser; this means that they can perform
virtually any action on the available windows and
modules. Our BHO hooks onto Internet Explorer’s
message loop and captures all of the keyboard events
within the browser, looking for the CONTROL+Z
combination. Upon detecting the CONTROL+Z com-
bination, the BHO queries the Document object of the
Internet Explorer instance to determine which object
is currently in focus within the browser. If the ob-
ject in focus appears to be a graphic containing a bar
chart, our system then attempts to infer the intended
message of the bar chart. The resultant textual sum-
mary is presented to the user in a new window. The
text in that new window is then read to the user by
JAWS, and the user can subsequently close the win-
dow using standard JAWS/Windows keystrokes (typ-
ically ALT+F4).
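The control flow just described can be rendered as the following minimal sketch. It is illustrative Python rather than the deployed in-process COM component, and the names (Browser, Element, looks_like_bar_chart, summarize_bar_chart) are hypothetical stand-ins for the corresponding SIGHT pieces.

```python
# Schematic sketch of the BHO's keystroke dispatch (not the actual COM code).
from dataclasses import dataclass

@dataclass
class Element:
    tag: str               # e.g. "IMG"
    image_path: str = ""   # the file behind the element, if any

@dataclass
class Browser:
    focused: Element       # element in focus per the browser's Document object

CTRL_Z = frozenset({"CTRL", "Z"})

def looks_like_bar_chart(image_path: str) -> bool:
    # Stand-in for the image-processing check described in Section 2.2.1.
    return image_path.endswith(".png")

def summarize_bar_chart(image_path: str) -> str:
    # Stand-in for the VEM -> CTM -> IRM pipeline of Sections 2.3-2.5.
    return "This bar chart titled ... shows that ..."

def on_key_event(browser: Browser, keys: frozenset) -> str:
    """Handle one captured keystroke; return the text JAWS should read."""
    if keys != CTRL_Z:
        return ""                              # all other keys pass through
    el = browser.focused
    if el.tag != "IMG" or not looks_like_bar_chart(el.image_path):
        return "The selected graphic does not appear to be a bar chart."
    return summarize_bar_chart(el.image_path)  # displayed in a new window

print(on_key_event(Browser(Element("IMG", "chart.png")), CTRL_Z))
```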
2.2 Implementation Issues
The procedure described above works well as long as
1) the object currently in focus in the browser is a bar
chart, and 2) the object in focus in the JAWS inter-
face is the same as the object in focus according to
the browser. This section addresses these issues.
2.2.1 Identifying Bar Charts
Since our BHO is operating within Internet Explorer,
and web pages contain many graphical elements that
are not bar charts, it is entirely possible that the user
may attempt to launch our system to process a graphic
that is not a bar chart. Therefore, when the CON-
TROL+Z keystroke is detected, the BHO runs a brief
image processing algorithm to determine whether the
selected graphic has particular attributes that identify
it as a possible bar chart, such as whether the graphic
has 20 or fewer gray levels, and whether the graphic
contains at least two rectangles with a common begin-
ning row or column. If the graphic does not appear
to be a bar chart, the message “The selected graphic does not appear to be a bar chart” is read to the user by JAWS.
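Only two of the screening attributes are named above (at most 20 gray levels; at least two rectangles sharing a beginning row or column). A rough Python approximation of such a screen, with invented thresholds, might look like this:

```python
# Assumption-laden sketch of the bar-chart screening heuristic; the exact
# attributes and thresholds used by SIGHT are simplified here.
import numpy as np
from PIL import Image

def might_be_bar_chart(path: str, max_gray_levels: int = 20) -> bool:
    gray = np.asarray(Image.open(path).convert("L"))

    # Attribute 1: bar charts tend to use few distinct gray levels.
    if len(np.unique(gray)) > max_gray_levels:
        return False

    # Attribute 2: look for at least two solid vertical runs (candidate
    # bars) whose bottom ends lie on a common row (a shared baseline).
    dark = gray < 128                       # crude foreground mask
    bottoms = []
    for col in range(dark.shape[1]):
        rows = np.flatnonzero(dark[:, col])
        if len(rows) > 10:                  # a reasonably tall column of ink
            bottoms.append(int(rows.max()))  # lowest dark pixel in the column
    if not bottoms:
        return False
    counts = np.bincount(np.asarray(bottoms))
    return bool(counts.max() >= 20)         # >= 2 bars, each >= ~10 px wide
```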
2.2.2 Focusing On Bar Charts
When browsing a web page in Internet Explorer,
JAWS users have the option of “tabbing” through the
content of the page. By repeatedly hitting the tab key, the user directs JAWS to traverse the elements on the page according to the browser’s default tab order for that page or the tab order specified in the html code.
user is traversing a web page in JAWS using the tab
keys, the focus in Internet Explorer is updated along
with JAWS’ focus. For graphics that are in the tab
order, JAWS will read the alt text (if any) associ-
ated with the graphic when the user traverses to the
graphic. If the user hits CONTROL+Z when JAWS
reaches the graphic, our BHO will be activated.
However, many graphics are not included in the
tab order of the page, since there is typically not an ac-
tion associated with them (such as a link to follow or
a text box to fill in). To address this issue, when a web
page is opened in Internet Explorer, our BHO per-
forms a scan of all of the graphics on that page, per-
forming the bar chart detection logic described pre-
viously. If an image appears to be a bar chart, our
BHO will insert that graphic into the tab order of the
page, and will append “This graphic appears to be a
bar chart” to the alt text (if any). This pre-processing
of the web page ensures that the user will not acci-
dentally overlook bar charts that could be processed
by our system while traversing the page.
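As a sketch, this pre-processing pass might be organized as below. The page model (ImgElement with alt and tab_index fields) is a hypothetical stand-in for Internet Explorer’s DOM, and the detector is the screening heuristic sketched in Section 2.2.1.

```python
# Illustrative page pre-processing: insert candidate bar charts into the tab
# order and announce them via appended alt text.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ImgElement:
    src: str
    alt: str = ""
    tab_index: Optional[int] = None   # None: not in the page's tab order

def preprocess_page(images: List[ImgElement],
                    detect: Callable[[str], bool],
                    first_tab_index: int) -> None:
    tab = first_tab_index
    for img in images:
        if not detect(img.src):           # e.g. might_be_bar_chart
            continue
        if img.tab_index is None:         # make the graphic tab-reachable
            img.tab_index = tab
            tab += 1
        img.alt = (img.alt + " " if img.alt else "") + \
            "This graphic appears to be a bar chart"

imgs = [ImgElement("banner.gif"), ImgElement("sales_chart.png", alt="Sales")]
preprocess_page(imgs, detect=lambda src: src.endswith("chart.png"),
                first_tab_index=10)
print(imgs[1].alt, imgs[1].tab_index)
```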
When a web page is opened in Internet Explorer
while JAWS is running, JAWS automatically begins
reading the content of the page from top to bottom
(this is sometimes called “say all” mode). It is gen-
erally more convenient for JAWS users to utilize the
“say all” mode rather than tabbing through the content
of the page. If the BHO’s pre-processing has detected
a potential bar chart, the user will be alerted to its
presence by JAWS reading the alt text, “This graphic appears to be a bar chart,” that was inserted into the
document by our BHO. Unfortunately, while in the
“say all” mode, the focus in JAWS is not reflected
in Internet Explorer. This obviously poses a problem
since our BHO relies on the Document object in Inter-
net Explorer to identify the graphic in which the user
is interested. Fortunately, the latest release of JAWS
allows the user to set the application focus to the lo-
cation of the JAWS virtual cursor by entering CON-
TROL+INSERT+DELETE. After entering this while
JAWS’ virtual cursor has the graphic containing a bar
chart in focus, the user may then enter CONTROL+Z
to hear the summary of the bar chart.
2.2.3 Extensibility of the Browser Extension
While the current version of the user interface has
been designed specifically with JAWS and Internet
Explorer in mind, we expect similar solutions to work
for other applications. For example, extensions sim-
ilar to BHOs can be developed for Mozilla’s Firefox
browser using the Cross Platform Component Object
Model (XPCOM). Regarding the use of screen read-
ers other than JAWS, our BHO in Internet Explorer
will work with any screen reader; it is simply a mat-
ter of investigating how the focus of Internet Explorer
and the screen reading software interact and of ensur-
ing that the keystroke combination does not conflict
with existing screen reader functionality. For visually
impaired users who primarily use a screen magnifier
(such as ZoomText), rather than a screen reader, the
text produced by our BHO can be handled in the same
manner as text in any other application.
2.3 Processing the Image
Once the browser component of the SIGHT system
has detected that the user would like to access a partic-
ular graphic, it sends the image to the Visual Extrac-
tion Module. VEM is responsible for analyzing the
graphic’s image file and producing an XML represen-
tation containing information about the components
of the information graphic including the graphic type
(bar chart, pie chart, etc.) and the textual pieces of the
graphic (such as its caption). For a bar chart, the rep-
resentation includes the number of bars in the graph,
the labels of the axes, and information for each bar
such as the label, the height of the bar, the color of the
bar, and so forth (Chester and Elzer, 2005). This mod-
ule currently handles only electronic images produced
with a given set of fonts and no overlapping charac-
ters. In addition, the VEM currently assumes standard
placement of labels and axis headings. Work is under-
way to remove these restrictions. But even with these
restrictions removed, the VEM can assume that it is
dealing with a simple bar chart, and thus the problem
of recognizing the entities in a graphic is much more
constrained than typical computer vision problems.
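For concreteness, the XML for a graphic like the one in Figure 3 might look roughly as follows. The element and attribute names here are invented for illustration (the actual representation is described in (Chester and Elzer, 2005)), and the heights are placeholders rather than values read from the image.

```xml
<!-- Illustrative only: invented schema and placeholder values. -->
<graphic type="bar_chart">
  <caption>THE NOTEBOOK SPIRAL</caption>
  <dependent-axis label="DOLLARS"/>
  <bars count="8">
    <bar label="'00" height="..." color="..." annotated="no"/>
    <bar label="'01" height="..." color="..." annotated="no"/>
    <!-- ... one element per bar ... -->
  </bars>
</graphic>
```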
The XML representation is then passed to the Pre-
processing and Caption Tagging Module (CTM). The
preprocessing augments the XML with salience infor-
mation such as a bar that is colored differently from
other bars in the graphic or a bar that has an anno-
tation when the other bars do not. The caption tag-
ging extracts information from the caption (discussed
later) and then passes the augmented XML repre-
sentation to the intention recognition module (IRM),
which is described in the next section.
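The two salience checks just described can be sketched as follows, over a simple in-memory bar representation (the Bar type is hypothetical; the real module annotates the XML).

```python
# Sketch of the preprocessor's salience marking: a differently colored bar,
# or a lone annotated bar, is flagged as salient.
from dataclasses import dataclass
from collections import Counter
from typing import List

@dataclass
class Bar:
    label: str
    color: str
    annotation: str = ""   # e.g. a value printed on or above the bar
    salient: bool = False

def mark_salient(bars: List[Bar]) -> None:
    majority_color, _ = Counter(b.color for b in bars).most_common(1)[0]
    for b in bars:
        if b.color != majority_color:          # colored differently
            b.salient = True
    annotated = [b for b in bars if b.annotation]
    if len(annotated) == 1 and len(bars) > 1:  # annotated when others are not
        annotated[0].salient = True

bars = [Bar("NBC", "gray"), Bar("ABC", "gray"), Bar("CBS", "red")]
mark_salient(bars)
print([b.label for b in bars if b.salient])    # ['CBS']
```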
2.4 IRM: A Bayesian Inference System
The IRM is responsible for recognizing the intended
message of the information graphic, which we hy-
pothesize can serve as the basis for an effective sum-
mary of the graphic. While the browser extension will
work for any type of information graphic, the scope
of the work currently implemented for the image pro-
cessing and IRM components is limited to the pro-
cessing of simple bar charts. By simple bar charts, we
mean bar charts that display the values of a single in-
dependent attribute and the corresponding values for a
single dependent attribute. Although our system cur-
rently handles only simple bar charts, we believe that
our methodology is broadly applicable and extensible
to other types of graphics.
2.4.1 Communicative Signals
We view information graphics that appear in popular
media as a form of language with a communicative
intention. The IRM reasons about the communica-
tive signals present in a graphic in order to identify
its intended message. We have identified three kinds
of communicative signals that appear in simple bar
charts.
Our first communicative signal is the relative effort required for different perceptual and cognitive tasks. (By perceptual tasks (Kerpedjiev and Roth, 2000), we mean tasks that are performed by viewing the graphic, such as comparing the heights of two bars; by cognitive tasks, we mean tasks that require a mental computation, such as interpolating between two labelled values on the dependent axis in order to determine the value represented by a bar whose top is not aligned with a labelled value.) Here we are extending a hypothesis of the AutoBrief group (Kerpedjiev and Roth, 2000). The AutoBrief project was concerned with generating information graphics, and they hypothesized that a graphic designer chooses a design that best facilitates the perceptual and cognitive tasks that a viewer will need to perform on the graphic. Thus, given a graphic, our hypothesis is that the relative difficulty of different perceptual tasks serves as a signal about which tasks the
viewer was expected to perform in deciphering the
graphic’s message. This correlates with Larkin and
Simon’s (Larkin and Simon, 1987) observation that
graphics that are informationally equivalent are not necessarily computationally equivalent: for example, if a set of bars is arranged in order of increasing height, then it will be much easier to identify the rank of an individual bar than if the bars were arranged in alphabetical order of their labels.
To rank tasks in terms of effort, we constructed a
set of effort estimation rules. Each rule represents a
task that can be performed on a graphic and is com-
prised of a set of condition-computation pairs: each
condition captures possible features of the graphic
(such as whether the top of a bar is aligned with a la-
belled tick mark on the dependent axis), and the com-
putation captures the effort required for the task given
that set of graphic features. The effort computations
are based on work by cognitive psychologists, and our
effort estimation rules have been validated via a set of
eye-tracking experiments (Elzer et al., 2006).
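The shape of such a rule can be illustrated as a list of condition-computation pairs. The conditions and effort values below are invented for the sketch rather than taken from our actual rule set.

```python
# Illustrative effort-estimation rule in the condition-computation style
# described above; the numbers are placeholders, not SIGHT's real estimates.
def effort_perceive_bar_value(graphic: dict, bar: dict) -> int:
    """Effort units to determine the value represented by one bar."""
    rule = [
        # (condition over graphic features,        effort computation)
        (lambda g, b: b["annotated_with_value"],    lambda g, b: 100),
        (lambda g, b: b["top_aligned_with_tick"],   lambda g, b: 300),
        # otherwise: interpolate between two labelled tick marks
        (lambda g, b: True,                         lambda g, b: 750),
    ]
    for condition, computation in rule:
        if condition(graphic, bar):
            return computation(graphic, bar)

bar = {"annotated_with_value": False, "top_aligned_with_tick": True}
print(effort_perceive_bar_value({}, bar))   # -> 300
```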
A second communicative signal is the salience of
entities in the graphic. Our system recognizes a num-
ber of ways in which an entity in a bar chart becomes
salient. These include coloring or annotating a bar
differently from the other bars in a bar chart (as is the
case for the bar labelled CBS in Figure 4), since these
design strategies draw attention to the bar. It also in-
cludes mentioning a bar’s label in the caption, since
this also draws attention to the bar. The preprocessor
and caption tagging module are responsible for identi-
fying salient entities in the graphic. The preprocessor
analyzes the XML representation of the graphic and
augments it to indicate entities that are salient due to
graphic design decisions. The Caption Tagging Mod-
ule uses a part-of-speech tagger to identify nouns in
the caption, and then it augments the XML represen-
tation to indicate any bars that match a noun in the
caption.
A third communicative signal is the presence of
certain verbs and adjectives in a caption. In (Elzer
et al., 2005a) we present a corpus study showing that
(1) captions are often very general or uninformative,
and (2) even when captions convey something about
the graphic’s intended message, the caption is often
ill-formed or requires extensive analogical reasoning.
Thus we have chosen to perform a shallow analysis of
captions that extracts communicative signals but does
not attempt to understand the caption. For example,
the verb lag in the caption “American Express’ to-
tal billings still lag” suggests a message about an en-
tity’s rank with respect to some measure. Similarly,
we found that nouns derived from verbs, such as rise
in the caption “Cable on the Rise”, and adjectives
also suggest the general category of message. Using
WordNet and a thesaurus, we identified verbs and ad-
jectives that were similar in meaning and might sig-
nal one or more categories of message and organized
them into verb classes. The Caption Tagging Mod-
ule uses a part-of-speech tagger and a stemmer to an-
alyze captions and extract nouns, adjectives, and the
root form of verbs, adjectives, and nouns derived from
verbs, and further augments the XML representation
of the graphic to indicate the presence of one of our
identified verb or adjective classes in the caption.
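For illustration, this shallow caption analysis can be sketched as follows, here assuming NLTK for tagging and stemming and a tiny invented sample standing in for our full verb classes:

```python
# Sketch of the Caption Tagging Module's shallow analysis (NLTK assumed;
# requires NLTK's tokenizer and tagger data to be downloaded).
import nltk
from nltk.stem import PorterStemmer

# Invented sample lexicon: stemmed form -> suggested message category.
VERB_CLASSES = {"rise": "Change-Trend", "fall": "Change-Trend",
                "lag": "Get-Rank"}

def tag_caption(caption: str, bar_labels: list) -> dict:
    stem = PorterStemmer().stem
    tagged = nltk.pos_tag(nltk.word_tokenize(caption))
    nouns = [w for w, t in tagged if t.startswith("NN")]
    salient_bars = [lbl for lbl in bar_labels
                    if any(n.lower() == lbl.lower() for n in nouns)]
    classes = {VERB_CLASSES[stem(w.lower())]
               for w, _ in tagged if stem(w.lower()) in VERB_CLASSES}
    return {"salient_bars": salient_bars, "message_classes": sorted(classes)}

print(tag_caption("Cable on the Rise", ["CBS", "NBC", "Cable"]))
# -> {'salient_bars': ['Cable'], 'message_classes': ['Change-Trend']}
```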
2.4.2 Reasoning About the Graphic’s Intention
Having developed a means of extracting the evidence
provided by the communicative signals in a graphic,
we need to use this evidence in reasoning about the
intended message of an information graphic. For this
purpose, we dynamically construct a Bayesian net-
work for each new information graphic. The top level
of the network captures the various categories of mes-
sages that can be conveyed by a bar chart, such as con-
veying a change in trend (Change-Trend), conveying
the rank of an entity in a bar chart (Get-Rank), com-
paring two entities (Relative-Difference), etc. Below
each category of message are nodes that capture the
different possible instantiations of that message cate-
gory. For example, if a graphic has five bars as in Fig-
ure 4, then the children of the Get-Rank node would
be Get-Rank(BAR1,LABEL1), . . ., Get-Rank(BAR5,
LABEL5).
We use plan operators to specify how a graphic de-
signer’s communicative goal can be achieved via the
viewer performing certain perceptual and cognitive
tasks. For example, consider the goal of the viewer
getting the rank of a bar, given that the bar is salient,
as in Figure 4. The operator for achieving this goal
decomposes the goal into three subgoals: perceiving
whether the bars are sorted in order of height, per-
ceiving (i.e., finding) the label associated with the bar,
and perceiving the rank of that bar with respect to bar
height. Each subgoal in an operator is either a prim-
itive with an associated effort rule or has an operator
that decomposes it into a set of simpler subgoals. The
operators determine the structure of our Bayesian net-
work, in that subgoals in an operator become children
of their goal node in the Bayesian network. For exam-
ple, Figure 5 displays the piece of the Bayesian net-
work produced by the Get-Rank operator. The entire
network is built dynamically for a new graphic.
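The way an operator induces network structure can be sketched as follows. The operator encoding is a simplification of our own, but the Get-Rank decomposition mirrors Figure 5.

```python
# Sketch: subgoals in a plan operator become children of their goal node.
OPERATORS = {
    "Get-Rank(_bar,_label)": [
        "Perceive-if-bars-are-sorted",
        "Perceive-label(_bar,_label)",
        "Perceive-rank(_bar)",
    ],
}

def expand(goal: str, edges: list) -> None:
    """Recursively add (parent, child) edges for a goal and its subgoals."""
    for subgoal in OPERATORS.get(goal, []):   # primitives have no operator
        edges.append((goal, subgoal))
        expand(subgoal, edges)

edges = []
expand("Get-Rank(_bar,_label)", edges)
print(edges)   # the three parent->child edges shown in Figure 5
```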
Figure 4: Graphic with a Get-Rank Message. (Bar chart titled “Advertisers Pay More for Youth”, showing the average price of an ad, in thousands of dollars, for NBC, ABC, CBS, FOX, and WB, with the bar for CBS highlighted. Graphic from BusinessWeek, April 5, 1999. Note that in its original form, the graph was not a simple bar chart, because there was a secondary value (average age of viewers) also displayed on the bars, so it has been adapted to display only a single dependent value.)
Figure 5: A Piece of Network Structure. (The goal node Get-Rank(_bar,_label) has three children: Perceive-if-bars-are-sorted, Perceive-label(_bar,_label), and Perceive-rank(_bar).)
Evidence nodes must be added to the Bayesian
network to reflect the evidence provided by the infor-
mation graphic about its intended message. Evidence
nodes reflecting the amount of effort required for a
perceptual task (categorized as low, medium, high,
or impossible) and evidence nodes reflecting whether
a parameter of a perceptual task is salient (via high-
lighting, annotating it, etc.) are attached to percep-
tual task nodes. For example, consider the graphic in
Figure 4, the piece of network structure shown in Fig-
ure 5, and the evidence nodes that would be attached
to the instantiated perceptual task node Perceive-
rank(BAR4). The effort evidence node would indi-
cate that little effort is required for this task since the
bars are sorted according to height. The highlighting
evidence node would indicate that the instantiated pa-
rameter BAR4 is highlighted in the graphic. The an-
notation and noun-in-caption evidence nodes would
indicate respectively that no bars have special anno-
tations and that none of the bar labels are part of the
graphic’s caption. Evidence nodes reflecting the pres-
ence of one of our verb or adjective classes in the cap-
tion provide evidence for a general category of mes-
sage and thus are attached to the top level node in the
network.
Associated with each child node in a Bayesian
network is a conditional probability table that gives
the conditional probability for each value of the child
node given the value of the parent node. In our
Bayesian network, the value of the parent node is ei-
ther that it is or is not part of the plan that the graphic
designer has for the viewer to deduce the graphic’s
message. The conditional probability tables for net-
work nodes are learned from our corpus of graphics.
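A toy, two-evidence-node version of such a network, with invented probabilities (recall that the real tables are learned from our corpus), can be assembled with an off-the-shelf library such as pgmpy:

```python
# Toy illustration only: structure and probabilities are invented, and the
# node set is far smaller than in SIGHT's dynamically constructed networks.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Parent: is Get-Rank(BAR4) part of the designer's plan? (state 0 = yes)
model = BayesianNetwork([("GetRank", "Highlighted"), ("GetRank", "Effort")])

model.add_cpds(
    TabularCPD("GetRank", 2, [[0.1], [0.9]]),        # prior
    TabularCPD("Highlighted", 2,                     # P(highlighted | parent)
               [[0.8, 0.05],
                [0.2, 0.95]],
               evidence=["GetRank"], evidence_card=[2]),
    TabularCPD("Effort", 2,                          # P(low effort | parent)
               [[0.7, 0.3],
                [0.3, 0.7]],
               evidence=["GetRank"], evidence_card=[2]),
)
assert model.check_model()

posterior = VariableElimination(model).query(
    ["GetRank"], evidence={"Highlighted": 0, "Effort": 0})
print(posterior)   # belief that Get-Rank(BAR4) is the intended message
```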
Once the network with its evidence nodes is built,
the probabilities propagate through the network to hy-
pothesize the intended message of the graphic. For
the graphic in Figure 4, SIGHT infers that the graphic
is conveying the rank of CBS and produces the nat-
ural language “This bar chart titled ‘Advertisers pay
more for youth’ shows that CBS has the second lowest
rank in terms of the dollar value of average price of
Ad compared with NBC, ABC, FOX, and WB.” (For further detail on the Bayesian network, see Elzer et al. (2005b).)
2.4.3 Evaluation
The performance of the message inference within our
SIGHT system has been evaluated in two ways. First,
using a corpus of 110 simple bar charts that had pre-
viously been annotated with their primary message by
two human coders, we evaluated our approach using
leave-one-out cross validation in which each graphic
is selected once as the test graphic, and the other 109
graphics are used to compute the conditional proba-
bility tables for the Bayesian network. We viewed
the system as successful in recognizing the graphic’s
message if its top-rated hypothesis matched the mes-
sage assigned by the coders and the system-assigned
probability for the hypothesis exceeded 50%. The
system’s overall success rate, 79.1%, is the average
of the results of all 110 experiments.
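Schematically, this is the standard leave-one-out loop; in the sketch below, train_cpts and infer_message are hypothetical stand-ins for CPT estimation and Bayesian inference, and the success criterion is the one stated above.

```python
# Schematic of the leave-one-out evaluation protocol (stand-in helpers).
def leave_one_out(graphics: list) -> float:
    successes = 0
    for i, test in enumerate(graphics):
        training = graphics[:i] + graphics[i + 1:]   # the other 109 graphics
        cpts = train_cpts(training)                  # learn CPTs from corpus
        message, prob = infer_message(test, cpts)    # top-rated hypothesis
        if message == test.coded_message and prob > 0.5:
            successes += 1
    return successes / len(graphics)                 # 79.1% reported overall
```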
We also performed a qualitative evaluation in or-
der to determine whether the intentions being inferred
by our system would meet the approval of users. Sev-
enteen human subjects rated a posited message for
each of 27 bar charts. In 20 of the 27 cases, the
posited message matched the message inferred by our
system. Subjects rated the messages from 0 (strongly
disagree) to 4 (strongly agree). For the 20 messages
matching the system’s output, the rating was 3.33
with a standard deviation of 1.02 and a 95% confi-
dence interval of .108, whereas for the 7 messages
that differed from the system’s output, the rating was
only 1.19 with a standard deviation of 1.46 and a 95%
confidence interval of .261.
From these evaluations, we conclude that our sys-
tem has a high degree of success at recognizing the
primary message of a simple bar chart, and that using
the recognized message as the basis for a graphic’s
summary should produce summaries that would be
satisfactory to a majority of users.
2.5 Generating the Summary
Once the intended message has been inferred by our
Bayesian inference system, it is used as the core con-
tent of a textual summary of the graphic. One of the
most challenging aspects of generating coherent nat-
ural language has been determining the full label for
the measurement (or value) axis. In examining our
corpus of bar charts, taken from a variety of maga-
zines and newspapers, we have found that the mea-
surement axis label might be very abbreviated and
that full rendering of the label often requires extrac-
tion of words from text within the graphic (such as
AVERAGE LAPTOP PRICES in Figure 3) or from the
caption and/or second-tier descriptive text below the
caption. We have constructed a set of heuristics for
ranking the graphic’s components in terms of where
to look for the measurement axis label and how to
extract it from these textual pieces. Other heuristics
augment the wording of the label. For example, one
heuristic states that if the graphic’s text contains a sin-
gle proper noun that does not match the label of a
bar, then the measurement axis label should gener-
ally be preceded with that proper noun in possessive
case. Consider, for example, the graphic in Figure 6.
Here the measurement axis is labelled as Percentage
of unauthorized workers and the unit of measurement
is also captured by the % sign after the annotated val-
ues, but “Percentage of unauthorized workers” must
be preceded with the proper noun “United States” in
order to fully describe what is being measured. The
natural language generation component of our sys-
tem (to be presented in a separate paper) uses a set
of heuristics to generate a complete rendering of the
measurement axis label, which is then used in tem-
plates to generate the appropriate wording for the par-
ticular category of inferred message. For example,
SIGHT hypothesizes that the graphic in Figure 6 is
conveying a Rank-of-all message and generates the
textual summary “This bar chart titled ’Workers without papers’ compares the entities Farming, Cleaning, Construction, and Food preparation with respect to United States’s percentage of unauthorized workers.”
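That particular heuristic can be sketched as follows; render_axis_label and its inputs are illustrative simplifications of our actual generation component.

```python
# Sketch of one axis-label heuristic: exactly one proper noun in the
# graphic's text that is not also a bar label becomes a possessive prefix.
from typing import List

def render_axis_label(axis_label: str,
                      proper_nouns: List[str],
                      bar_labels: List[str]) -> str:
    candidates = [p for p in proper_nouns if p not in bar_labels]
    if len(candidates) == 1:
        return f"{candidates[0]}'s {axis_label[0].lower()}{axis_label[1:]}"
    return axis_label

print(render_axis_label("Percentage of unauthorized workers",
                        ["United States"],
                        ["Farming", "Cleaning", "Construction",
                         "Food preparation"]))
# -> "United States's percentage of unauthorized workers"
```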
3 CONCLUSION
SIGHT has been implemented and tested for simple
bar charts. We eventually envision SIGHT as an in-
teractive natural language system which infers the in-
tended message of an information graphic, provides
a summary that includes the intended message along
with notable features of the graphic, and then re-
sponds to follow-up questions from the user. These
additional features would obviously make for a richer and more robust user interface. We are also working to make the image processing in VEM more robust, and to extend the SIGHT system to other kinds of information graphics, such as line graphs and pie charts, and to complex graphics, such as grouped bar charts.

Figure 6: Graphic with a Rank-of-All Message. (Bar chart titled “Workers without papers”, showing the percentage of unauthorized workers: Farming 24%, Cleaning 17%, Construction 14%, and Food preparation 12%. Accompanying text: “Industries that require manual labor and little formal education draw heavily on illegal immigrants’ labor. About 5% of workers in the United States are illegal immigrants.” Graphic from USA Today, July 11, 2006 issue.)
Information graphics are an important part of
many documents available on the world-wide web,
yet they are largely inaccessible to visually impaired
users. This paper has presented a novel implemented
interface that enables visually impaired users to gain
access to the information provided by simple bar
charts that appear on a web page. Our approach of
presenting the message conveyed by the information
graphic, rather than rendering the graphic in a differ-
ent medium, has significant advantages — it provides
the user with easy access to the communicative intent
of the graphic, and does not require specialized hard-
ware or for the user to construct a mental map of the
graphic. Moreover, our system does not require any
action on the part of the web page developer.
REFERENCES
Alty, J. L. and Rigas, D. (1998). Communicating graphical
information to blind users using music: The role of
context. In Proc. of CHI-98, Human Factors in Com-
puter Systems, p. 574–581, Los Angeles. ACM Press.
Carberry, S., Elzer, S., and Demir, S. (2006). Information
graphics: An untapped resource for digital libraries.
In Proc. of SIGIR 2006, Seattle, WA.
Chester, D. and Elzer, S. (2005). Getting computers to see
information graphics so users do not have to. In Proc.
of the 15th Int’l Symposium on Methodologies for In-
telligent Systems, LNAI 3488, p. 660–668. Springer-
Verlag.
Elzer, S., Carberry, S., Chester, D., Demir, S., Green, N.,
Zukerman, I., and Trnka, K. (2005a). Exploring and
exploiting the limited utility of captions in recognizing
intention in information graphics. In Proc. of the 43rd
Annual Meeting of the ACL, p. 223–230.
Elzer, S., Carberry, S., Zukerman, I., Chester, D., Green,
N., and Demir, S. (2005b). A probabilistic framework
for recognizing intention in information graphics. In
Proc. of IJCAI, p. 1042–1047.
Elzer, S., Green, N., Carberry, S., and Hoffman, J. (2006).
A model of perceptual task effort for bar charts and
its role in recognizing intention. User Modeling and
User-Adapted Interaction, 16(1):1–30.
Ina, S. (1996). Computer graphics for the blind. ACM SIG-
CAPH Computers and the Physically Handicapped,
55:16–23.
Kennel, A. R. (1996). Audiograf: A diagram-reader for the
blind. In Second Annual ACM Conference on Assistive
Technologies, p. 51–56.
Kerpedjiev, S. and Roth, S. (2000). Mapping communica-
tive goals into conceptual tasks to generate graphics
in discourse. In Proc. of the Int’l Conf. on Intelligent
User Interfaces, p. 60–67.
Kurze, M. (1995). Giving blind people access to graph-
ics (example: Business graphics). In Proc. Software-
Ergonomie ’95 Workshop Nicht-visuelle graphische Benutzungsoberflächen, Darmstadt, Germany.
Larkin, J. and Simon, H. (1987). Why a diagram is (some-
times) worth ten thousand words. Cognitive Science,
11:65–99.
Meijer, P. B. (1992). An experimental system for auditory
image representations. IEEE Transactions on Biomed-
ical Engineering, 39(2):112–121.
Ramloll, R., Yu, W., Brewster, S., Riedel, B., Murton, M.,
and Dimigen, G. (2000). Constructing sonified haptic
line graphs for the blind student: First steps. In Proc.
of ASSETS 2000, p. 17–25, Arlington, VA.
Yu, W., Reid, D., and Brewster, S. (2002). Web-based mul-
timodal graphs for visually impaired people. In Proc.
of the 1st Cambridge Workshop on Universal Access
and Assistive Techn., p. 97–108.