COMPUTATIONAL MODEL FOR PROBABILITY PREDICTION OF

SCAN PATHS IN STATIC SCENES

Yorie Nakahira

and Nakayama Minoru

Control and Systems Engineering, Tokyo Institute of Technology, Tokyo, Japan

Department of Human System Science, Tokyo Institute of Technology, Tokyo, Japan

Keywords:

Eye Movements, Scan Path, Fixation, Static Scenes, Probability Prediction.

Abstract:

We develop a computational model of scan paths for viewing static images. The proposed scan path model generates a dynamic distribution of visual attention using multiple image processing algorithms based on biological principles. The probability of any scan path is computed from this distribution of visual attention at each subsequent numbered fixation. The validity of our model is tested using eye movement data. Our results support the feasibility of modeling scan paths for static images, which has conventionally been considered infeasible.

1 INTRODUCTION

Despite the promise of scan path prediction (Robert

et al., 2003), a two-fold scan path prediction prob-

lem exists for static images. First, scan paths are

rarely coherent from person to person (Bohme et al.,

2004). The possible accuracy of any conventional

model which outputs a single probable scan path can-

not exceed the low coincidence level of scan paths

from different viewers (Privitera and Stark, 2000). In

addition, for the purpose of usability evaluation, it

is unreasonable to ignore many other possible scan

paths merely because they are not the most probable

ones. Second, processing an image with an algorithm can only yield static parameters for visual conspicuousness, whereas eye movements are dynamic over time, and so is their distribution. Estimating dynamic change from static parameters is therefore self-limiting.

This paper suggests a novel approach to scan path prediction that is based on two premises: (1) a scan path prediction model should yield the probability (distribution) of visual attention and scan paths instead of only the most probable scan path; (2) the model should incorporate several algorithms for eye movement prediction so that its accuracy can be optimized over the temporal sequence of fixations (subsequent numbered fixations).

By assuming the computability of the statistical

distribution of scan paths, we compute the distribu-

tion of visual attention along the temporal sequence

of ﬁxations. In this way, variations in personal visual

behavior can be suitably expressed through statisti-

cal expression, instead of as aberrations. The com-

putation of the distribution of visual attention is quite

useful, as it makes a previously impossible prediction

possible for scan paths when static images are viewed.

A model of scan path prediction is proposed in Sec-

tion 2. Section 3 describes the experimental protocol

to acquire eye movement data. Section 4 presents the

experimental results. Section 5 summarizes the study presented here.

2 PROPOSED MODEL

2.1 Attention Distribution Prediction

The displayed image was first labeled according to each region of interest. Take Figure 1 as an example. The pie graph represents a data set with three components, each labeled from 1 to 3. Fixating at the positions labeled 1, 2 and 3 is defined as state 1, state 2 and state 3, respectively. Conditions other than states 1 to 3 are brief enough to be categorized as intervals during which the state of visual attention changes.

Attention Distribution Ratio (ADR) is deﬁned as

the ratio of attention to each position on the image.

Each pixel value of an image is labeled from 1 to z.

The number of labels, z, is determined by the number

of objects in the image, or the number of positions

which represent different meanings. If the ratio of subjects that fixate at labels 1, 2 and 3 is 1 : 2 : 1 (experimental ADR), the ideal computational algorithms for computed ADR are expected to generate a value close to "state1 : state2 : state3 = 1 : 2 : 1".

Nakahira Y. and Nakayama M.

COMPUTATIONAL MODEL FOR PROBABILITY PREDICTION OF SCAN PATHS IN STATIC SCENES.

DOI: 10.5220/0003847504270430

In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2012), pages 427-430

ISBN: 978-989-8565-03-7

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: Example of a scan path and its labeling. Each pixel value of the image is labeled from 1 to 3. The cases when a subject fixates at the positions labeled 1, 2, 3 are defined as state 1, state 2, and state 3 respectively. The scan path noted in this figure, for example, is "1 → 3 → 2 → 2 → 1 → 3 → 2 → 1".

This algorithmically computed ADR is calculated

using the following three procedures.

I. Converting an image (static visual environment) into maps using image processing algorithms (IPAs) whose pixel values denote the visual conspicuousness, in other words the likelihood of being viewed. The IPAs used are defined in Section 2.2.

II. Each pixel value of the map is labeled from 1 to

z according to the meaning it represents. z is the

number of objects in the image. This labeling sim-

pliﬁes the large image size into a small number

of sections and makes the computational proce-

dure less demanding. For the experiments in this

study, an example of labeling is explained in Sec-

tion 5.1.1.

III. The visual conspicuousness represented by the

pixel value is summed up separately for each la-

beled region. The ratio of the summation values

of differently labeled regions is expected to de-

note the relative amounts of visual attention each

region is likely to receive. This quantitatively

expressed likelihood to be viewed is deﬁned as

cADR (computed Attention Distribution Ratio).

An important point is that conventional models usually consist only of procedure I and output only a single scan path. Procedures II and III are significant because they allow the model to take individual differences in scan paths into consideration.
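Procedures II and III can be sketched in a few lines of Python. The sketch assumes that the conspicuity map from one IPA and the label map are given as equally sized 2-D lists, and that label 0 marks unlabeled background. The function name `compute_cadr` and the toy values are illustrative and are not taken from the experiments.

```python
from collections import defaultdict


def compute_cadr(conspicuity, labels):
    """Return {label: share of total conspicuity} for one IPA map.

    conspicuity: 2-D list of non-negative floats (output of one IPA).
    labels:      2-D list of ints with the same shape; 0 marks unlabeled
                 background (this convention is an assumption of the sketch).
    """
    sums = defaultdict(float)
    for c_row, l_row in zip(conspicuity, labels):
        for value, label in zip(c_row, l_row):
            if label != 0:
                sums[label] += value  # procedure III: sum per labeled region
    total = sum(sums.values())
    if total == 0:
        raise ValueError("conspicuity is zero over all labeled regions")
    # Normalize so that the cADR values of all labels sum to 1.
    return {label: s / total for label, s in sorted(sums.items())}


# Toy 4x4 map with three labeled regions (hypothetical values).
conspicuity = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
]
labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 3],
    [3, 3, 3, 3],
]
cadr = compute_cadr(conspicuity, labels)
print(cadr)  # {1: 0.25, 2: 0.5, 3: 0.25}
```

The toy map reproduces the ratio "state1 : state2 : state3 = 1 : 2 : 1" used as the example above.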

2.2 Computing Scan Path Probability

Since the distribution of visual attention changes over time, it is unreasonable to expect a single IPA to yield accurate predictions across all fixation numbers. Consequently, our model optimizes itself over time; in other words, over the subsequent numbered fixations. This optimization is achieved by using each IPA only for the fixations at which its accuracy is highest.

Figure 2: Computation of cADR to predict scan paths when viewing a static image. First, the image is processed using IPAs to yield the cADR for each subsequent numbered fixation, which indicates the probability of being in state $s$ ($s = 1, \dots, z$). The resultant cADR indicates the statistical distribution of state $s$ at the $k$th fixation ($k = 1, \dots, n$) (Section 2.1). $P(n, z)$ is defined to be the probability of state $z$ at the $n$th fixation. The probability of the scan path "$s_1, \dots, s_n$" can then be calculated as $P(1, s_1) \cdots P(n, s_n)$ (Section 2.2).

An overview of our model for computing cADR from a set of IPAs across the subsequent numbered fixations is shown in Figure 2. $P(k, s)$ is defined to be the computed probability of eye movement being in state $s$ (subjects fixating at the position labeled $s$) at the $k$th fixation. $P(k, s)$ is computed for the $k$th fixation ($k = 1, \dots, n$) from the IPAs that are most suitable for the $k$th fixation. The biological principles represented by the IPAs include intuitive tendencies of eye movement, such as fixating at the centers of images and fixating at salient positions, as well as the task-oriented cognitive model. The choice of IPAs for each subsequent numbered fixation is made on a case-by-case basis, depending on the context in which viewers are placed. The probability of the scan path "state: $s_1 \to s_2 \to \cdots \to s_n$" is computed as

$$P(1, s_1) \cdot P(2, s_2) \cdot P(3, s_3) \cdots P(n, s_n) \qquad (1)$$
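As an illustration of equation (1), the following sketch enumerates every possible scan path and ranks the paths by their computed probability. The values in `P` are hypothetical and the helper names are not from the paper. Note that the product form treats the state at each fixation as independent of the preceding states.

```python
from itertools import product
from math import prod


def path_probability(P, path):
    """Equation (1): product of P(k, s_k) over the fixations in `path`.

    P[k - 1][s] holds P(k, s); `path` is a sequence of states (s_1, ..., s_n).
    """
    return prod(P[k][s] for k, s in enumerate(path))


def rank_paths(P, top=4):
    """Enumerate all z**n paths and return the `top` most probable ones."""
    states = sorted(P[0])
    scored = [(path_probability(P, path), path)
              for path in product(states, repeat=len(P))]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


# Hypothetical P(k, s) for n = 4 fixations and z = 3 states.
P = [
    {1: 0.2, 2: 0.6, 3: 0.2},
    {1: 0.2, 2: 0.6, 3: 0.2},
    {1: 0.5, 2: 0.25, 3: 0.25},
    {1: 0.5, 2: 0.25, 3: 0.25},
]
all_paths = list(product([1, 2, 3], repeat=4))
total = sum(path_probability(P, p) for p in all_paths)
best_prob, best_path = rank_paths(P, top=1)[0]
print(len(all_paths), best_path)
```

Because each row of `P` sums to 1, the probabilities of all enumerated paths also sum to 1, which is a convenient sanity check on any set of cADR values.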

We used the following algorithms (IPAs) in the new model: C, a center-surround map; D, an Attention Distribution Map; and S, a saliency map (Itti et al., 1998). All three IPAs are based on the bottom-up, image-based intuitive cognitive process or on a task-oriented attention element.

C: A Center-Surround map was generated by placing a Gaussian kernel at the center of an image. The half-height width of the Gaussian is determined according to the area over which fixation can be said to exist. The biological basis for this algorithm is the central fixation bias noted in previous studies (Buswell, 1935; Tatler, 2007).

D: An Attention Distribution Map combines the saliency map from Itti's model and a task relevance map. For Itti's model, the three conspicuity maps are normalized and summed in equal proportion to form a saliency map (S). For the task relevance map, given that the images are graphs, the quantitative data that each graph represents is used as the task relevance. For each differently labeled section, the comparative amount represented by that region is used such that the larger the data value, the higher the conspicuity value and the larger the cADR assigned to the region.
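The map for IPA C can be sketched as follows. The sketch reads "half-height width" as the full width at half maximum (FWHM) of the Gaussian, which is an interpretation rather than something the text states explicitly, and it uses the standard relation $\sigma = \mathrm{FWHM} / (2\sqrt{2 \ln 2})$. The function name, the image size and the width are all hypothetical.

```python
import math


def center_map(width, height, fwhm):
    """Isotropic Gaussian centered on the image.

    fwhm: full width at half maximum in pixels (assumed meaning of the
          "half-height width" in the text).
    Returns a 2-D list indexed as map[y][x] with a peak value of 1.0.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


# Small example: 9 x 7 pixels, FWHM of 4 pixels.
c_map = center_map(width=9, height=7, fwhm=4.0)
print(c_map[3][4], round(c_map[3][6], 3))
```

A map produced this way can be summed per labeled region in the same manner as any other conspicuity map to obtain the cADR for C.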

3 EXPERIMENTAL METHOD

During the experiment, eye movements were measured using an eye tracker (nac: EMR-NL, 640x460 pixel resolution, 60 Hz). Images of graphs were displayed on a computer screen for three seconds each, followed by a one-second interval during which subjects fixated on a central cross. The coordinates of the positions on the images where subjects were looking were recorded for later analysis. The subjects were seated in front of a screen with their heads secured in a chin rest. The viewing distance was approximately 65 cm; the stimulus size was about 20 cm x 25 cm. In this paper, a fixation is defined as a stable eye position with a velocity below the threshold of 20 degrees per second (Robert et al., 2003), and a scan path as the spatial arrangement of a sequence of fixations.
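The velocity-based fixation definition can be illustrated with a simple threshold detector. This is a minimal sketch under stated assumptions, not the procedure used by the eye tracker software: it assumes gaze samples are already expressed in degrees of visual angle, taken at a fixed 60 Hz rate, and it groups consecutive samples whose sample-to-sample velocity stays below 20 degrees per second. The function names are hypothetical.

```python
import math


def _summarize(samples, indices):
    """Return (start_index, end_index, mean_x, mean_y) for one fixation."""
    xs = [samples[i][0] for i in indices]
    ys = [samples[i][1] for i in indices]
    return (indices[0], indices[-1], sum(xs) / len(xs), sum(ys) / len(ys))


def detect_fixations(samples, rate_hz=60.0, threshold_deg_s=20.0):
    """Group consecutive slow gaze samples into fixations.

    samples: list of (x_deg, y_deg) gaze positions in degrees of visual angle.
    Returns a list of (start_index, end_index, mean_x, mean_y), end inclusive.
    """
    fixations, current = [], []
    for i in range(1, len(samples)):
        (x0, y0), (x1, y1) = samples[i - 1], samples[i]
        velocity = math.hypot(x1 - x0, y1 - y0) * rate_hz  # degrees per second
        if velocity < threshold_deg_s:
            if not current:
                current = [i - 1]  # a new fixation starts at the previous sample
            current.append(i)
        elif current:
            fixations.append(_summarize(samples, current))
            current = []
    if current:
        fixations.append(_summarize(samples, current))
    return fixations


# Hypothetical samples: a fixation near (0, 0), a saccade, a fixation near (5, 0).
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 0.0), (5.1, 0.0), (5.0, 0.1)]
fixations = detect_fixations(samples)
print([(start, end) for start, end, _, _ in fixations])
```

Mapping each fixation's mean position to the label of the region it falls in then yields a scan path such as "1 → 3 → 2".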

Forty-five different images of graphs were used. There were five types of graphs with different designs: two types of bar graphs and three types of pie charts. For each graph design, there were 9 graphs representing 9 different sets of data. Each of these 9 sets has three components. Six subjects (3 male, 3 female) participated. Each subject repeated the experiment 3 times.

4 RESULTS

4.1 Selection of IPAs for Each Subsequent Numbered Fixation

The best IPAs for each subsequent numbered fixation in the experimental situation are determined from the strength of the correlation between eADR (experimental Attention Distribution Ratio) and cADR (computed Attention Distribution Ratio) from the 1st fixation to the 7th fixation. The cADR for the regions labeled from 1 to 3 by the method in section 5.1.1 is calculated for all images (45 images x 3 IPAs). The eADR of each subsequent numbered fixation on each image during the three-second presentation is also calculated. Then, the correlation coefficients between the computational data (cADR) and the experimental data (eADR) are computed for each fixation from the 1st to the 7th.
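The eADR at a given fixation number can be sketched as the share of recorded scan paths whose kth fixation falls on each label. The recordings below are hypothetical, and ignoring scan paths that have fewer than k fixations is an assumption of this sketch rather than a rule stated in the text.

```python
from collections import Counter


def compute_eadr(scan_paths, k, labels=(1, 2, 3)):
    """Share of scan paths whose k-th fixation (1-based) is on each label.

    Scan paths with fewer than k fixations are ignored (an assumption).
    """
    states = [path[k - 1] for path in scan_paths if len(path) >= k]
    if not states:
        raise ValueError(f"no scan path has a fixation number {k}")
    counts = Counter(states)
    return {label: counts[label] / len(states) for label in labels}


# Hypothetical label sequences recorded for one image.
scan_paths = [
    [2, 3, 1, 1],
    [2, 2, 3],
    [1, 2, 2, 3],
    [2, 3, 1],
]
eadr_1 = compute_eadr(scan_paths, k=1)
eadr_4 = compute_eadr(scan_paths, k=4)
print(eadr_1)  # {1: 0.25, 2: 0.75, 3: 0.0}
print(eadr_4)  # {1: 0.5, 2: 0.0, 3: 0.5}
```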

Figure 3 plots the dynamic changes in the correlation value (prediction accuracy) of the IPAs from the 1st fixation to the 7th fixation. Figure 3 confirms the expectation that each IPA peaks at a different subsequent numbered fixation. These data imply that the distribution of the scan path can be modeled by C from the 1st to the 2nd fixation, and by D from the 3rd to the 5th. In this way, the dynamic changes in the eADR over fixation number are incorporated in the model by switching between the static cADRs computed by each IPA. The effectiveness of this approach is apparent when the correlation values are compared with those of the existing prediction model (S), which defines a single absolute path from a single cADR.

The cADR computed from the above-mentioned model changes its value over the sequence of fixations by switching the IPA. Thus, the model for ADR prediction can be defined as follows: IPA C calculates P(k, s) for k = 1, 2; IPA D calculates P(k, s) for k = 3, 4. The scan path probability is then calculated by equation 1.
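The selection rule described in this section can be sketched as follows: for each fixation number, compute the Pearson correlation between each IPA's static cADR values and the eADR values, then keep the IPA with the highest correlation. The numbers are hypothetical and pooled over labels and images purely for illustration; `select_ipas` is not a name used in the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


def select_ipas(cadr_by_ipa, eadr_by_fixation):
    """For each fixation number k, pick the IPA whose cADR best matches eADR.

    cadr_by_ipa:      {ipa_name: [cADR values]} (static, one list per IPA)
    eadr_by_fixation: {k: [eADR values]} in the same order as the cADR lists
    """
    schedule = {}
    for k, eadr in eadr_by_fixation.items():
        scores = {}
        for ipa, cadr in cadr_by_ipa.items():
            r = pearson(cadr, eadr)
            if not math.isnan(r):  # skip IPAs with an undefined correlation
                scores[ipa] = r
        if scores:
            schedule[k] = max(scores, key=scores.get)
    return schedule


# Hypothetical values for two IPAs and two fixation numbers.
cadr_by_ipa = {
    "C": [0.5, 0.3, 0.2, 0.4, 0.4, 0.2],
    "D": [0.2, 0.5, 0.3, 0.6, 0.1, 0.3],
}
eadr_by_fixation = {
    1: [0.6, 0.3, 0.1, 0.5, 0.4, 0.1],
    3: [0.1, 0.6, 0.3, 0.7, 0.1, 0.2],
}
schedule = select_ipas(cadr_by_ipa, eadr_by_fixation)
print(schedule)  # {1: 'C', 3: 'D'}
```

The resulting schedule plays the role of the assignment "C for k = 1, 2; D for k = 3, 4" given above.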

4.2 Accuracy of the Model

Table 1 compares the probability of the computationally predicted scan paths (see equation 1) with the experimental probability when subjects are viewing a particular image. There are 81 ($3^4$) possible state sequences in total for the subsequent numbered fixations from 1 to 4. The table shows the 4 scan paths with the highest computed probabilities. This result is encouraging because it suggests the validity of our proposed method of predicting the probability of scan paths.

The predicted probability of each scan path and the experimental probability of the scan path are compared using the correlation coefficient. Among the 45 images, the correlation value is above 0.6 for 25 images and above 0.4 for 35 images.

Figure 3: Changes in accuracy over the subsequent numbered fixations. This figure plots the correlation coefficient between eADR and the cADR computed from IPA C, D, or S on the y axis, and the subsequent numbered fixations on the x axis. This figure implies that cADR can be optimized by predicting the 1st to the 2nd fixations by C and the 3rd to the 5th fixations by D.

Table 1: Computationally predicted scan paths and their experimental counterparts. This table compares the algorithmically computed scan path probability with its experimental value.

Scan path    Computed    Experimental
2-3-1-1      0.335       0.389
2-2-3-2      0.166       0.056
2-2-2-3      0.166       0.167
2-2-2-2      0.082       0.111

We also used the correlational coefﬁcients be-

tween cADR and eADR as the index for the accuracy

of the computational prediction. The high correlation

between cADR and eADR means that the computa-

tionally predicted scan path probability could yield an

accurate output.

Figure 4 plots the correlation value between cADR and eADR on the x axis and the variance of cADR on the y axis. Figure 4 suggests that images with a high variance of cADR are generally predicted accurately, while images with a low variance of cADR may not be reliably predicted. The biological meaning of a high variance in cADR is that visual attention is likely to focus on a couple of labeled regions rather than on all of the existing regions. Therefore, the variance of the cADR values can serve as an index of model accuracy.
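The variance index can be computed directly from the cADR values of one image. The sketch below uses the population variance across the labeled regions; whether the population or the sample variance was used in the study is not stated, so this choice is an assumption, and the cADR values are hypothetical.

```python
from statistics import pvariance


def cadr_variance(cadr):
    """Population variance of the cADR values across labeled regions."""
    return pvariance(list(cadr.values()))


# Hypothetical cADR values for two images.
focused = {1: 0.1, 2: 0.8, 3: 0.1}   # attention concentrated on one region
uniform = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # attention spread evenly

print(round(cadr_variance(focused), 4), round(cadr_variance(uniform), 4))
```

Under the interpretation given above, the first image would be expected to yield a more reliable prediction than the second.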

5 SUMMARY

In summary, this paper proposed a novel method of scan path prediction. The computability of the distribution of 'idiosyncratic' scan paths is confirmed. The feasibility of the computational prediction of scan paths is validated by an eye movement experiment. It is suggested that the accuracy of the model can also be estimated from the quantitative parameters explained in this study. A future direction would be to apply the scan path calculation to a longer sequence of fixations by finding the IPAs that are applicable to each part of the temporal sequence of fixations.

Figure 4: Accuracy of cADR vs. the range of variance of cADR. This figure illustrates the relationship between the variance of cADR on the y axis and the correlation values between cADR and eADR on the x axis for data sets 1 to 9. For each set of data, the maximum and minimum variance are denoted by '*', and the range of variance is denoted by vertical lines. The accuracy (correlation level) for all images as a whole is 0.73.

REFERENCES

Bohme, M., Dorr, M., Krause, C., Martinetz, T., and Barth,

E. (2004). Eye movement prediction of natural videos.

Neurocomputing, Vol 69, 16-18, 1996-2004.

Buswell, G. (1935). How people look at pictures: A study

of the psychology of perception in art. University of

Chicago Press, Chicago.

Itti, L., Koch, C., and Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 20(11), 1254-1259.

Privitera, C. and Stark, L. (2000). Algorithms for defining visual regions-of-interest: Comparison with eye fixations. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 22(9), 970-982.

Robert, J., Jacob, K., and Karn, K. (2003). Eye tracking

in human-computer interaction and usability research:

ready to deliver promises. In The Mind’s Eye. Elsevier

Science BV.

Tatler, B. (2007). The central ﬁxation bias in scene viewing:

selecting an optimal viewing position independently

of motor biases and image feature distributions. Jour-

nal of vision 7(14):4, 1-7.
