IDEA: Index of Difficulty for Eye Tracking Applications - An Analysis Model for Target Selection Tasks

Mohsen Parisay, Charalambos Poullis and Marta Kersten-Oertel

Department of Computer Science, Concordia University, Montreal, Canada

PERFORM Center, Montreal, Canada

Keywords: Fitts’ Law, Eye Tracking, NASA TLX, Midas Touch.

Abstract: Fitts’ law is a prediction model for measuring the difficulty of target selection with pointing devices. However, emerging devices and interaction techniques require more flexible parameters to adapt the original Fitts’ law to new circumstances and case scenarios. We propose the Index of Difficulty for Eye tracking Applications (IDEA), which integrates Fitts’ law with users’ feedback from the NASA TLX to measure the difficulty of target selection. The COVID-19 pandemic has shown the necessity of contact-free interaction on public and shared devices. In this work, we therefore propose a model for evaluating contact-free interaction techniques that can accurately measure the difficulty of eye tracking applications and can be adapted to children, users with disabilities, and the elderly without requiring the acquisition of physiological sensory data. We tested the IDEA model using data from a three-part user study with 33 participants that compared two eye tracking selection techniques: dwell-time and a multi-modal eye tracking technique using voice commands.

1 INTRODUCTION

In this paper we introduce IDEA: Index of Difficulty for Eye tracking Applications, an integrated prediction model of task workload and performance for target selection tasks. The IDEA model combines the effective contact-free target selection of eye tracking with direct feedback on the user’s experience obtained from NASA TLX scores. The IDEA model calculates a prediction index value based on objective technical specifications, such as the target’s size and distance, and subjective measures from the NASA TLX questionnaire obtained from user studies. To demonstrate the efficacy of IDEA, we measured target selection performance with data from three user studies that compared two eye tracking interaction techniques (dwell-time and selection by voice commands) and showed that our predictions correlate with the throughput and movement time of the Fitts’ prediction model.

1.1 Fitts’ Law

Paul Morris Fitts introduced a mathematical prediction model to measure the difficulty of target selection in 1954 (Fitts, 1954). This model, which has been extensively applied to interface evaluations in user studies (Gori et al., 2017), relates the movement time (MT) required to activate a target to the target’s size (W) and its distance (D). Fitts’ law is formulated as $MT = a + b \cdot ID$ with $ID = \log_2\left(\frac{2D}{W}\right)$, where ID denotes the index of difficulty, and a and b are empirically determined constants. In the field of HCI, the Shannon formulation, $ID = \log_2\left(1 + \frac{D}{W}\right)$, is most commonly used to calculate the index of difficulty, as described in (MacKenzie, 1989). Fitts’ law has been applied effectively in numerous user studies to analyse the performance of selecting specific targets such as buttons (e.g. (Crossman and Goodeve, 1983), (Keele and Posner, 1968)). One of the earliest applications of Fitts’ law in HCI was a comparison of four devices (mouse, joystick, step keys and text keys) for text selection on a monitor (Card et al., 1978). Researchers have also proposed variations that extend the original Fitts’ law. For example, MacKenzie and Buxton extended Fitts’ law from a one-dimensional to a two-dimensional model for target acquisition tasks to improve the accuracy of the index of difficulty measure for interactive computer systems (MacKenzie and Buxton, 1992).
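The two formulations of the index of difficulty can be compared with a few lines of Python. This is a minimal sketch rather than code from the study: the function names are hypothetical, the logarithm is assumed to be base 2 (so that ID is in bits), and the regression constants a and b in the usage lines are made-up placeholders, since real values must be fitted to measured data.

```python
import math


def fitts_id(distance: float, width: float) -> float:
    """Original Fitts formulation: ID = log2(2D / W), in bits."""
    return math.log2(2.0 * distance / width)


def shannon_id(distance: float, width: float) -> float:
    """Shannon formulation: ID = log2(1 + D / W), in bits."""
    return math.log2(1.0 + distance / width)


def movement_time(a: float, b: float, index_of_difficulty: float) -> float:
    """Fitts' law: MT = a + b * ID."""
    return a + b * index_of_difficulty


if __name__ == "__main__":
    # Placeholder constants for illustration only; they are not fitted values.
    a, b = 0.2, 0.1
    d, w = 512.0, 128.0  # one distance/width pair, in pixels
    print("Fitts ID:  ", fitts_id(d, w))
    print("Shannon ID:", shannon_id(d, w))
    print("MT:        ", movement_time(a, b, fitts_id(d, w)))
```

For the same target, the Shannon formulation always yields a positive ID, whereas the original formulation becomes negative once the width exceeds twice the distance.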

1.2 Cognitive Workload

Cognitive workload refers to the amount of mental effort a person uses to perform a task. The NASA Task Load Index (TLX) questionnaire is a well-known method to measure subjective workload in user studies (Hart and Staveland, 1988) and has been shown to be an effective tool to measure cognitive workload (Ruiz-Rabelo et al., 2015). The questionnaire covers physical demand, mental demand, temporal demand, effort, performance, and frustration, each with a maximum of 100 points (Group, 1986). Although physiological data (e.g. electroencephalogram or EEG) can be used to measure subjects’ workload, these methods, while accurate in detecting brain activity, require specialized and sometimes cumbersome equipment. In addition, these techniques are intrusive for users and are therefore restricted to controlled environments such as laboratories (Zagermann et al., 2016). Thus, in our model, we focus on the NASA TLX.

Parisay, M., Poullis, C. and Kersten-Oertel, M.

IDEA: Index of Difficulty for Eye Tracking Applications - An Analysis Model for Target Selection Tasks.

DOI: 10.5220/0010195701350144

In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 2: HUCAPP, pages 135-144

ISBN: 978-989-758-488-6

1.3 Midas Touch Problem

Eye tracking, like many emerging technologies, has its challenges. The Midas touch problem, which refers to the unintended activation of functions when eye gaze is used to select a target, is one of the major challenges to be considered when dealing with eye tracking applications. According to Jacob (1990), this problem occurs because the eyes are used to look around an object or to scan a scene, often without any intention to activate a command or function. Thus, numerous studies have focused on solving the Midas touch problem for gaze-based interactions (e.g. (Pi and Shi, 2017), (Velichkovsky et al., 2014), (Velloso et al., 2016), and (Schenk et al., 2017)).

2 RELATED WORK

Both Fitts’ law and the NASA TLX are popular tools for user studies. Felton et al. applied these tools to study mental workload during brain-computer interactions (Felton et al., 2012). Kim et al. applied Fitts’ law in a driving safety simulation to analyze the usability of touch-key sizes (Kim et al., 2014). Hansen et al. made use of Fitts’ law to analyze the performance of gaze and head tracking for point-and-select tasks when using head-mounted displays (HMDs) (Hansen et al., 2018). In addition, Fitts’ law was applied to reduce dwell-time for gaze-based selection techniques by considering the estimated target acquisition time and the actual eye movement time (Isomoto et al., 2018). Researchers have investigated the relation between eye blinks and mental workload among surgeons, finding that shorter blink duration and lower blink frequency indicate an increase in mental workload (Zheng et al., 2012). Borghini et al. studied the brain activity and heart rate of car drivers and also found that lower blink rates correlate with mental workload (Borghini et al., 2012). Lanthier et al. studied the correlation between fixations and eye fatigue during visual search tasks and found that fixation duration increases with fatigue (Lanthier et al., 2013). Abdulin and Komogortsev showed that the distance drift of fixation points in response to a stimulus can reveal physical eye fatigue (Abdulin and Komogortsev, 2015) and calculated this using the fixation qualitative score (FQlS) (Komogortsev et al., 2010). Another study developed a metric based on fixation points and the NASA TLX to determine the possibility of eye fatigue in gaze-based interactions (Parisay et al., 2020). There are also approaches that measure eye fatigue based on saccades; however, the analysis of saccades requires expensive eye trackers, and these approaches are not applicable to budget-friendly devices (Abdulin and Komogortsev, 2015) such as the one used in our study.

Building on previous work, we propose a non-invasive approach that can be applied on any remote eye tracker without the need to analyze the raw data of specific eye tracking sensors. We apply eye tracking for target selection from a safe distance and assess the difficulty levels, including subjects’ ratings, independently of device abilities or tracking techniques. The primary purposes of IDEA are (1) to compare different eye tracking applications, and (2) to enable the adaptation of eye tracking applications to different user groups such as children, users with disabilities, and the elderly. Furthermore, IDEA has the potential to be applied to eye fatigue assessment and stress level measures based on target selection tasks. To the best of our knowledge, there are no models that integrate the index of difficulty of Fitts’ law (ID) and the NASA TLX scores for eye tracking applications without the need for technical parameters such as blink rates, fixation duration, average number of fixations, and saccade duration.

3 INDEX OF DIFFICULTY (IDEA)

Users’ perceived ratings are one of the most valuable sources of data in any user study, and the NASA TLX questionnaire is a valid tool for collecting them. Fitts’ law, on the other hand, can reflect the difficulty and performance of target selection tasks based on test specifications. Therefore, we integrated users’ feedback into the Fitts’ law model to obtain a combined value reflecting both technical and experimental aspects of target selection tasks for eye tracking applications.

HUCAPP 2021 - 5th International Conference on Human Computer Interaction Theory and Applications

In addition, the entire workload of a task (subjective rating) can be modulated by a selection ratio parameter (selection distance divided by screen diameter), which is determined by test conditions, users’ ability to select targets, and interaction techniques. The purpose of modulating the technical factor with the experimental factor is to combine the importance of both into a single index value. In other words, the multiplication combines technical aspects, which are bound to case scenarios, with the subjective understanding of the actual functions. This results in a single value for comparison. Thus, the IDEA analysis model is a novel, simple-to-calculate compound model for eye tracking techniques, based on Fitts’ law (Fitts, 1954) and the NASA TLX questionnaire (Group, 1986), to measure the difficulty of target selection tasks. IDEA is device-independent, can be applied on any eye tracker, and depends on the following parameters:

• All scores from the NASA TLX questionnaire: physical demand (PD), mental demand (MD), temporal demand (TD), effort (E), performance (P), and frustration (F).

• Diameter of screen (D): represents the longest distance on screen, $D = \sqrt{x^2 + y^2}$, where x and y represent screen width and height.

• Selection ratio (S): represents the difficulty of target selection (distance to target) with regard to the screen diameter (see Figure 1).

• DISTANCES: the set of target distances from each other.

• WIDTHS: the set of target sizes (widths).

The conditions and range of each of the parameters are given by:

1. $\forall a \in \{PD, MD, TD, E, P, F\} : a \in \mathbb{Z} \land 1 \le a \le 100$
All NASA TLX scores are integers in the range of 1 to 100.

2. $D \in \mathbb{Z} \land D > 0$
Diameter of screen is an integer value greater than 0 in pixels.

3. $r \in \mathbb{R} \land 0 \le r \le D$
The distance to target (r) is a real number between 0 and the screen diameter in pixels (see Figure 1a).

4. $S = \frac{r+1}{D} \land S \in \mathbb{R} \land S > 0$
Selection ratio (S) is the ratio of distance to target (r) over diameter of the screen (D). The constant value of 1 is added to the numerator to avoid the 0 case for distance to target (see Figure 1b).

5. $DISTANCES = \{d \mid d \in \mathbb{R} \land d > 0\}$
DISTANCES is the set of real numbers greater than 0 containing the distances of targets from each other.

6. $WIDTHS = \{w \mid w \in \mathbb{R} \land w > 0\}$
WIDTHS is the set of real numbers greater than 0 containing the widths (sizes) of targets.

7. $m = |WIDTHS| \land m \ge 1$
m is the number of members of the WIDTHS set, which is greater than or equal to 1.

8. $n = |DISTANCES| \land n \ge 1$
n is the number of members of the DISTANCES set, which is greater than or equal to 1.

9. $\text{Technical Factor} \in \mathbb{R} \land \text{Technical Factor} > 0$
The technical factor (Equation 1) is the sum, over all distances ($d \in DISTANCES$) and all widths ($w \in WIDTHS$), of the doubled distance divided by the width, derived from Fitts’ law (Fitts, 1954). This results in a real number greater than 0 which resembles the argument of the index of difficulty of Fitts’ law, $ID = \log_2\left(\frac{2D}{W}\right)$. The technical factor represents the precondition of target properties (distances and widths).

$$\text{Technical Factor} = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{2 d_i}{w_j} \tag{1}$$

10. $R \in \mathbb{R} \land 1 \le R \le 100$
The subjective rating (R) is the mean of all TLX scores, which is a real number between 1 and 100, as shown in Equation 2.

$$R = \frac{PD + MD + TD + E + P + F}{6} \tag{2}$$

11. $\text{Experimental Factor} \in \mathbb{R} \land \text{Experimental Factor} > 0$
The experimental factor (Equation 3) is defined as the product of the calculated selection ratio (S), depicted in Figure 1, and the subjective rating (R). Since S > 0 and R ≥ 1, this results in a real number greater than 0.

$$\text{Experimental Factor} = S \times R \tag{3}$$

12. $IDEA \in \mathbb{R} \land IDEA > 1$
The proposed index of difficulty for eye tracking applications (IDEA) is calculated by multiplying (a) the technical factor and (b) the experimental factor, offsetting the product by a constant value of 2, and taking the logarithm, which results in a real number greater than 1 (Equation 4). The offset of 2 covers the case in which the technical and experimental factors are close to zero, so that the calculated IDEA value starts from 1.x. Figure 2 shows the 3D visualization of IDEA and its factors.

$$IDEA = \log_2\Bigg( \underbrace{\Big( \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{2 d_i}{w_j} \Big)}_{\text{Tec. Fac.}} \times \underbrace{S \times R}_{\text{Exp. Fac.}} + 2 \Bigg) \tag{4}$$
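The parameter definitions and Equations 1 to 4 can be turned into a short calculation. The following Python sketch is an illustration under stated assumptions, not the authors’ implementation: all function names are hypothetical, the logarithm is assumed to be base 2, the offset of 2 is assumed to be added to the product inside the logarithm (which matches the stated lower bound of 1), and the TLX scores in the usage lines are invented placeholder values.

```python
import math
from itertools import product


def screen_diameter(width_px: int, height_px: int) -> int:
    """Screen diagonal D in pixels, rounded up to an integer."""
    return math.ceil(math.hypot(width_px, height_px))


def selection_ratio(r: float, diameter: int) -> float:
    """Item 4: S = (r + 1) / D."""
    return (r + 1.0) / diameter


def technical_factor(distances, widths) -> float:
    """Equation 1: sum of 2d / w over every (distance, width) pair."""
    return sum(2.0 * d / w for d, w in product(distances, widths))


def subjective_rating(tlx_scores) -> float:
    """Equation 2: mean of the six NASA TLX scores (PD, MD, TD, E, P, F)."""
    if len(tlx_scores) != 6:
        raise ValueError("expected six NASA TLX scores")
    return sum(tlx_scores) / 6.0


def idea(distances, widths, r, diameter, tlx_scores) -> float:
    """Equation 4, read as log2(technical factor * S * R + 2) (assumption)."""
    tech = technical_factor(distances, widths)
    exp = selection_ratio(r, diameter) * subjective_rating(tlx_scores)  # Equation 3
    return math.log2(tech * exp + 2.0)


if __name__ == "__main__":
    d_px = screen_diameter(1920, 1080)
    # Distances and widths are the ones used in the study; the distance to
    # target and the TLX scores below are placeholders for one participant.
    value = idea([256, 384, 512], [96, 128], r=35.0, diameter=d_px,
                 tlx_scores=[30, 30, 30, 30, 30, 30])
    print(d_px, value)
```

Because the argument of the logarithm is always greater than 2, the result of this reading is always greater than 1, in line with item 12.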


Figure 1: (a) Overview of the dart test to measure Euclidean distance, and (b) the concept of selection ratio with regard to the diameter of the screen (D) and the selection distance (r).

Figure 2: 3D illustration of the IDEA model.

4 METHODOLOGY

We conducted a three-part repeated-measures user study with 33 participants (20 male, 22 to 35 years old, mean = 26.06) to evaluate the efficacy of our proposed model. Subjects were asked to navigate to and select highlighted targets (see Figure 4) using two gaze-based interaction techniques: (1) dwell-time with a 500 ms threshold, and (2) eye tracking using voice commands. Prior to the experiments, participants were informed about the objectives of the user study, trained on each of the interaction techniques, and asked to fill out a pre-test questionnaire. Before the tests, the built-in eye tracking software was used to calibrate eye positions for each participant. The order of interaction techniques was randomly selected for each participant. Overall, the user study took a participant 8 minutes on average to finish. At the end of the two experiments measuring the Fitts’ law parameters (Figure 4), participants were asked to fill out a post-test questionnaire consisting of the NASA TLX questionnaire.

4.1 Interaction Techniques

We applied two eye tracking techniques (single-modal and multi-modal interaction) to evaluate the efficacy of our proposed model. We ran both interaction techniques on an Intel i7 PC with a 64-bit Windows operating system. Figure 3 illustrates the test setting and gives an overview of the interaction techniques.

4.1.1 Dwell-time

The dwell-time method selects a target by eye gaze fixation alone, once a predefined threshold is reached. We set the target selection threshold to 500 milliseconds, which is within the typically accepted range of 300-1100 milliseconds (Špakov and Miniotas, 2004) and has been shown to be the best-suited threshold for the dwell-time method (MacKenzie, 2012), (Špakov and Miniotas, 2004). In other words, when a subject focuses on a target for 0.5 seconds it gets selected, and any gaze movement outside the target boundaries prior to that threshold restarts the target selection process.
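The dwell-time logic described above can be sketched as a small state machine that is fed one gaze sample at a time. This Python sketch is illustrative only (the study's eye tracking module was written in C++); the class names, the rectangular target model, and the millisecond timestamps are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

DWELL_THRESHOLD_MS = 500.0  # threshold used in the study


@dataclass
class Target:
    """Axis-aligned rectangular target in screen pixels (assumed shape)."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, gx: float, gy: float) -> bool:
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)


class DwellSelector:
    """Selects the target once gaze has stayed inside it for the threshold."""

    def __init__(self, target: Target, threshold_ms: float = DWELL_THRESHOLD_MS):
        self.target = target
        self.threshold_ms = threshold_ms
        self._dwell_start_ms: Optional[float] = None

    def update(self, gx: float, gy: float, timestamp_ms: float) -> bool:
        """Process one gaze sample; return True when a selection fires."""
        if not self.target.contains(gx, gy):
            # Gaze left the target boundaries: restart the selection process.
            self._dwell_start_ms = None
            return False
        if self._dwell_start_ms is None:
            self._dwell_start_ms = timestamp_ms
        if timestamp_ms - self._dwell_start_ms >= self.threshold_ms:
            # Reset so that a further selection requires a new full dwell.
            self._dwell_start_ms = None
            return True
        return False
```

Resetting the timer whenever a sample falls outside the target is what makes short glances harmless, while an uninterrupted fixation of 500 ms triggers the selection.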

4.1.2 Eye Tracking with Voice Recognition

The voice recognition method operates in two phases: (1) a pointing phase using the eye tracker, and (2) a selection phase using voice commands. Figure 3b gives an overview of these phases. Voice recognition was implemented using the built-in Windows 10 speech recognition functionality provided in the Microsoft .NET framework. We developed a C# application that captures the user’s activation command ‘select’ and triggers a left mouse click.

Figure 3: (a) Test setting and equipment, and (b) system overview and workflow of both interaction techniques.

4.2 Interaction Modules

Eye Tracking: We used the Tobii 4C eye tracker to capture the mouse pointer position so that users could interact with the system with their gaze. Moreover, we employed the Tobii SDK to obtain users’ gaze locations (2D coordinates) on the screen and to synchronize the mouse pointer with these coordinates in pixels. The eye tracking module for both interaction techniques was developed in C++ and integrated into the Tobii SDK as a new plug-in. The samples were recorded at a distance of 60 cm (24 in) from the eye tracker, with a sampling rate of 90 Hz, on a 24 inch screen with a resolution of 1920 × 1080 pixels. The dwell-time technique relies solely on the eye tracking module.

Voice Processing: We used a headset microphone (Logitech H370) to capture the user’s voice commands in the presence of artificial ambient noise of around 50 dB, played by stereo speakers (Figure 3a) to simulate a typical working office. The voice recognition module received the commands in real time and triggered a left mouse click when it recognized the keyword ‘select’.

4.3 Hypotheses

Based on the previous literature, which has demonstrated the effectiveness of Fitts’ law (Crossman and Goodeve, 1983), (Keele and Posner, 1968), (Card et al., 1978), (MacKenzie and Buxton, 1992) and of the NASA TLX questionnaire (Hart and Staveland, 1988), (Hart, 2006), (Ruiz-Rabelo et al., 2015), we propose a compound, simple-to-calculate mathematical model to measure the difficulty level of eye tracking applications independently of device type and technical capabilities during user studies. This model enables the analysis of eye tracking applications based on user groups and their abilities to interact with an eye tracking device or interaction technique. Specifically, we hypothesize that:

1. When IDEA is higher on average for an interaction technique, the calculated throughput based on Fitts’ law will be lower, and vice versa.

2. When IDEA is higher on average for an interaction technique, the calculated movement time based on Fitts’ law will be higher as well, and vice versa.

3. When IDEA is higher on average for an interaction technique, the registered error rates will be higher as well, and vice versa.

4.4 User Study

The user study described above was used to analyze the two eye tracking interaction techniques and to evaluate the proposed IDEA model, according to well-established academic standards. We measured four parameters in our three-part study: (1) distance to target, (2) throughput, (3) movement time, and (4) error rates. We developed a dart-like application (Figure 1a) to measure distance to target. To record the remaining measures, we used the FittsStudy application, version 4.2.7, developed by Wobbrock et al. (Wobbrock et al., 2011), with two widths (96 and 128 pixels) and three distances (256, 384, and 512 pixels).

4.4.1 Dart Test

The stimulus consisted of three circles: green from 0 to 30 pixels, blue from 30 to 60 pixels, and red from 60 to 90 pixels in radius, as illustrated in Figure 1a. Any selection outside of the colored dart circles was recorded with the fixed maximum distance of 90 pixels. The purpose of this experiment was to measure the Euclidean distance to target to be applied in Equation 3, by calculating the fraction of distance (r) over diameter of screen (D) as shown earlier ($S = \frac{r+1}{D}$). Subjects were asked to select the center of a dart target, as accurately as possible, using both interaction methods. Since eye tracking has different accuracy in different regions of a screen (Feit et al., 2017), we calculated the average of five trials for each interaction technique, in which the stimulus moved randomly to different areas around the center of the screen. Each random trial started in two second intervals, enabling subjects to change their gaze before the distance measures were recorded. A countdown timer with intervals of 100 ms was displayed from 5 to 0 to show subjects the remaining time.
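The distance measure of the dart test can be sketched in a few lines of Python. This is a minimal illustration, not the dart application itself: the function names and the coordinate tuples are hypothetical, and the only rules taken from the text are the Euclidean distance, the cap at 90 pixels for selections outside the colored circles, and the averaging over trials.

```python
import math
from statistics import mean

MAX_RADIUS_PX = 90.0  # outer radius of the red ring


def distance_to_target(selection, center, max_radius=MAX_RADIUS_PX):
    """Euclidean distance r in pixels, capped at the outer dart radius."""
    r = math.dist(selection, center)
    return min(r, max_radius)


def mean_distance(selections, centers):
    """Average capped distance over the trials of one technique."""
    return mean(distance_to_target(s, c) for s, c in zip(selections, centers))


if __name__ == "__main__":
    # Invented coordinates for five trials; not data from the study.
    centers = [(960, 540), (900, 500), (1000, 600), (940, 560), (980, 520)]
    picks = [(975, 560), (905, 512), (1200, 600), (940, 560), (990, 540)]
    print(mean_distance(picks, centers))
```

Capping at 90 pixels bounds the influence of a single stray selection on the per-participant average that feeds the selection ratio.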

4.4.2 Ribbon-shaped Test

The stimulus contains two vertical bars to be selected (clicked), one at a time, as shown in Figure 4a. The variations of distances and widths were chosen randomly by the FittsStudy application (Wobbrock et al., 2011), and the order of the interaction methods for each participant was also chosen randomly.

4.4.3 Circle-shaped Test

This test is the same as the ribbon-shaped test, but with the circular targets illustrated in Figure 4b. This experiment measures two variations of throughput: (1) with uni-variate endpoint deviation ($SD_x$) along one axis, and (2) with bi-variate endpoint deviation ($SD_{x,y}$) along both axes, which results in a better Fitts’ law model (Wobbrock et al., 2011). The stimulus contains equally-sized circles with different distances and widths to be selected (clicked), one at a time, as shown in Figure 4b. The variations of distances and widths were chosen randomly by the FittsStudy application (Wobbrock et al., 2011), and the order of the interaction methods for each participant was also chosen randomly.
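The difference between the two throughput variations lies in how the spread of selection endpoints is measured. The Python sketch below assumes the commonly used effective-width formulation, in which the effective width is 4.133 times the endpoint deviation and throughput is the effective index of difficulty divided by movement time; the text does not state the exact formulas computed by FittsStudy, so this is an assumption, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def univariate_sd(xs):
    """SD_x: sample standard deviation of endpoints along the task axis."""
    return stdev(xs)


def bivariate_sd(points):
    """SD_x,y: spread of 2D endpoints around their centroid."""
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / (len(points) - 1))


def throughput(effective_distance, endpoint_sd, movement_time_s):
    """Throughput in bits/sec, assuming W_e = 4.133 * SD and the Shannon ID."""
    effective_width = 4.133 * endpoint_sd
    effective_id = math.log2(effective_distance / effective_width + 1.0)
    return effective_id / movement_time_s
```

Since the bivariate deviation also counts scatter perpendicular to the task axis, it is never smaller than the univariate one for the same endpoints, which leads to a throughput that is equal or lower.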

4.4.4 Workﬂow and Parameters

The user study was conducted on a screen with a resolution of 1920 × 1080 pixels, which results in a diameter (D) of 2203 pixels (rounded up). Distances of 256, 384, and 512 pixels between targets were used with target widths of 96 and 128 pixels. The distance to target (r) for both interaction techniques was measured by the dart test application (Figure 1) based on the Euclidean distance in pixels. Lastly, the selection ratio was calculated as the measured distance to target over the screen diameter ($S = \frac{r+1}{D}$). The constant value of 1 is added to the measured distance to cover the case of a selection exactly in the middle of the target, which results in a distance to target of 0.

Figure 4: The FittsStudy application (Wobbrock et al., 2011). (a) Ribbon-shaped, and (b) Circle-shaped targets.

5 RESULTS

The results of our experiments were analyzed using paired-sample t-tests with the JASP software. Figure 5b shows the NASA TLX scores and the calculated average workload based on Equation 2 for both interaction techniques from the post-test questionnaire.

As per Equation 1, the technical factor, which is 42, was the same for both interaction techniques, as it depends on distances and widths, which were constant in our user study: both interaction techniques were evaluated on the same device with the same screen resolution and the same target distances and widths. However, the technical factor can differ in other case scenarios. A paired-sample t-test was applied to test the effect of interaction technique on the experimental factor (Equation 3). A significant difference (t(32)=2.86, p < .05) was found between dwell-time (M = 0.48, SE = 0.06) and voice recognition (M = 0.65, SE = 0.06). Specifically, dwell-time had a lower experimental factor than the voice recognition technique. This suggests that the product of users’ selection ratio on screen (S) and their rating scores (R) is significantly lower for the dwell-time method than for the voice recognition technique.
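The reported technical factor and selection ratios can be checked by hand. The worked calculation below assumes that Equation 1 is a double sum of $2d/w$ over every distance and width pair, which reproduces the reported value of 42; the selection ratios follow from the mean distances to target (35.30 and 29.27 pixels) and the screen diameter of 2203 pixels.

```latex
\[
\text{Technical Factor}
  = \sum_{i=1}^{3} \sum_{j=1}^{2} \frac{2 d_i}{w_j}
  = 2\,(256 + 384 + 512)\left(\frac{1}{96} + \frac{1}{128}\right)
  = 24 + 18 = 42
\]
\[
S_{\text{dwell}} = \frac{35.30 + 1}{2203} \approx 0.016,
\qquad
S_{\text{voice}} = \frac{29.27 + 1}{2203} \approx 0.014
\]
```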

A paired-sample t-test was applied to test the effect of interaction technique on the index of difficulty based on Equation 4, shown in Figure 5a and Table 1. A significant difference (t(32)=3.19, p < .05) was found between dwell-time (M = 4.17, SE = 0.15) and voice recognition (M = 4.66, SE = 0.15). This suggests that the dwell-time method has a significantly lower IDEA value than the voice recognition technique. Dwell-time can thus be considered an easier eye tracking technique for our subjects compared to voice recognition.

JASP: https://jasp-stats.org/
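For readers who want to reproduce this kind of comparison outside JASP, the paired-sample t statistic can be computed directly from per-participant differences. The Python sketch below is a generic illustration with a hypothetical function name; the numbers in the usage lines are made up and are not data from the study. With 33 participants, the degrees of freedom are 32, matching the t(32) values reported here.

```python
import math
from statistics import mean, stdev


def paired_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two paired samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    # Standard error of the mean difference; zero if all differences are equal.
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


if __name__ == "__main__":
    # Made-up scores for four participants under two conditions.
    condition_a = [4.9, 4.4, 5.1, 4.6]
    condition_b = [4.1, 4.0, 4.5, 4.3]
    t_value, df = paired_t(condition_a, condition_b)
    print(f"t({df}) = {t_value:.2f}")
```

The p-value is then read from the t distribution with the returned degrees of freedom, a step that statistics packages such as JASP perform automatically.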

Table 1: Summary of IDEA calculations.

Measure                   Dwell-Time   Voice Recog.
Distance (r, pixels)      35.30        29.27
Selection ratio (S)       0.016        0.014
Tech. factor              42           42
Exp. factor               0.48         0.65
IDEA                      4.17         4.66

Dart Test: Paired-sample t-tests were performed to study the effect of interaction type on (1) distance to target, and (2) selection ratio. A significant difference (t(32)=2.88, p < .05) was found between dwell-time (M = 35.30 pixels, SE = 2.11 pixels) and voice recognition (M = 29.27 pixels, SE = 2.07 pixels) on distance to target (r), depicted in Figure 6a. This shows that the voice recognition technique has a higher target selection accuracy (lower distance to target) than the dwell-time method. This is likely because this method splits pointing (eye tracking) and selecting (voice command) into different modalities.

A paired-sample t-test was also applied to test the effect of interaction technique on the selection ratio (S), depicted in Figure 6b. A significant difference (t(32)=2.88, p < .05) was found between dwell-time (M = 0.016, SE = 9.620e-4) and voice recognition (M = 0.014, SE = 9.409e-4). As the selection ratio is a distance divided by the screen diameter, it has no unit. This means that users select targets more accurately with the voice recognition technique than with dwell-time.

Ribbon-shaped Test: Paired-sample t-tests were performed to study the effect of interaction type on (1) throughput, (2) movement time, and (3) error rate. There was a significant difference (t(32)=5.96, p < .001) in throughput between dwell-time (M = 3.30 bits/sec, SE = 0.36 bits/sec) and voice recognition (M = 1.16 bits/sec, SE = 0.09 bits/sec), as seen in Figure 6c. This confirms our first hypothesis that a lower IDEA value for an interaction technique reflects a higher throughput.

A paired-sample t-test was applied to test the effect of interaction technique on movement time, depicted in Figure 6d. A significant difference (t(32)=15.13, p < .001) was found between dwell-time (M = 0.60 sec, SE = 0.01 sec) and voice recognition (M = 2.01 sec, SE = 0.08 sec). This confirms our second hypothesis that a lower IDEA value for an interaction technique reflects a lower movement time.

A paired-sample t-test was applied to test the effect of interaction technique on error rate, depicted in Figure 6e. A significant difference (t(32)=4.84, p < .001) was found between dwell-time (M = 0.28 errors, SE = 0.03 errors) and voice recognition (M = 0.11 errors, SE = 0.02 errors). This rejects our third hypothesis that an interaction technique with a lower IDEA value should cause a lower error rate. As explained above, errors in eye tracking applications are mostly due to the Midas touch problem (Jacob, 1990). Since the dwell-time method relies solely on eye tracking and selection is based on fixation time, this method had higher error rates than the multi-modal voice method, in which selection is based on a voice command.

Circle-shaped Test: Paired-sample t-tests were performed to study the effect of interaction type on (1) throughput with two variations, (2) movement time, and (3) error rate. For univariate throughput (illustrated in Figure 7a) there was a significant difference (t(32)=7.98, p < .001) between dwell-time (M = 3.91 bits/sec, SE = 0.31 bits/sec) and voice recognition (M = 1.48 bits/sec, SE = 0.09 bits/sec). This confirms our first hypothesis that an interaction technique with a lower IDEA value should reach a higher throughput.

A paired-sample t-test was applied to test the effect of interaction technique on bivariate throughput, illustrated in Figure 7b. A significant difference (t(32)=7.19, p < .001) was found between dwell-time (M = 2.51 bits/sec, SE = 0.22 bits/sec) and voice recognition (M = 1.01 bits/sec, SE = 0.06 bits/sec). This also confirms our first hypothesis that an interaction technique with a lower IDEA value should reach a higher throughput.

A paired-sample t-test was applied to test the effect of interaction technique on movement time, illustrated in Figure 7c. A significant difference (t(32)=11.31, p < .001) was found between dwell-time (M = 0.64 sec, SE = 0.02 sec) and voice recognition (M = 2.12 sec, SE = 0.13 sec). This confirms our second hypothesis that an interaction technique with a lower IDEA value should reach a lower movement time.

A paired-sample t-test was applied to test the effect of interaction technique on error rate, illustrated in Figure 7d. A significant difference (t(32)=2.26, p < .05) was found between dwell-time (M = 0.23 errors, SE = 0.03 errors) and voice recognition (M = 0.13 errors, SE = 0.02 errors). This rejects our third hypothesis that an interaction technique with a lower IDEA value should have a lower error rate. As described above, errors in eye tracking applications are mostly due to the Midas touch problem, and thus the single-mode method, which requires gaze for both pointer movement and selection, is more error prone.

6 DISCUSSION

The results reflect the efficacy of our two-factor model in measuring the performance of eye tracking applications independently of device type. We showed that our model can predict the difficulty of eye tracking applications based solely on Fitts’ law and the NASA TLX scores. Further, we showed that our model correlates with the standard measures (throughput and movement time) described by Fitts’ law. The global COVID-19 pandemic showed the importance of computer interaction from a safe distance without physical contact. Eye tracking applications, specifically the dwell-time method, are suitable candidates for enabling safe interaction on shared and public devices for selection tasks. Therefore, our proposed model can be applied in pilot studies to measure the usability and performance of selection techniques for different user groups such as children, users with disabilities, or the elderly, based on the experimental factor, which reflects (a) subjective ratings (NASA TLX scores), and (b) perceived difficulty levels of interaction techniques or user groups. Although we only studied voice recognition as a multi-modal interaction technique, the results of the user studies confirm our first and second hypotheses regarding the correlation between the throughput and movement time calculated by Fitts’ law and the predictions of our proposed model. However, eye tracking applications suffer from the Midas touch problem, and since the dwell-time method relies on eye gaze only, it reached higher error rates than the multi-modal selection technique using voice recognition, which has separate modalities for pointing and selection. The analysis of our results emphasizes the potential of our two-factor prediction model on two similar eye tracking interaction techniques. We hope this experiment leads to more innovation in multi-dimensional compound models for gaze-based interactions.

Figure 5: (a) Index of difficulty for eye tracking applications (IDEA) based on Equation 4 for both interaction techniques (p < .05). (b) Results of the NASA TLX scores. Error bars represent SE.

Figure 6: (a) Euclidean distance to target measure (r). (b) Calculated selection ratio ($S = \frac{1}{r+1}$) for both interaction techniques. (c) Throughput (TP), (d) Movement time (MT), and (e) Error rates (ER) for both interaction techniques of the ribbon-shaped test. Error bars represent SE. (p < .05 on (a) and (b) measures, p < .001 on (c), (d), and (e) measures).

Figure 7: Calculated measures of the circle-shaped test. (a) Univariate throughput (TP) (p < .001), (b) Bivariate TP (p < .001), (c) Movement time (MT) (p < .001), and (d) Error rates for both interaction techniques (p < .05). Error bars represent SE.

7 CONCLUSION AND FUTURE WORK

In this paper we proposed the Index of Difficulty for Eye tracking Applications (IDEA), a compound two-factor model to measure the performance and usability of selection techniques based on calculations of Fitts’ law and the results of a NASA TLX questionnaire. As emerging interaction techniques are required to cope with emerging user demands, the need for more complex models to compare different techniques requires more attention. We present our model to assess the efficacy of eye tracking applications in pilot studies with different user groups such as children, users with disabilities, or the elderly. Our configurable model can be applied to case scenarios as well as to discriminate between specific interaction techniques. In addition, we presented an in-depth analysis of the dwell-time method based on the Fitts’ law measures. Although our model was developed to address eye tracking interactions, it can be applied to any selection technique to measure difficulty levels based on test specifications (target size and distance) and users’ subjective ratings. Further, we showed that eye tracking techniques can be compared without analysis of technical raw data such as fixation duration and blink rates. This enables researchers to run pilot studies independently of device type. We predict a transition from conventional interaction techniques, such as keyboard and mouse, to contact-free techniques operated from a safe distance, driven by the latest global outbreak of viral infections, especially for equipment in healthcare sectors and shared public devices. IDEA enables researchers to run user studies based on video eye tracking techniques via remote webcams to comply with restrictions caused by viral diseases, which limit the physical presence of participants in laboratories or the attachment of sensory equipment to record users’ feedback. We plan to apply our proposed model to AR and VR headsets with internal eye trackers to study the usability of target selection in future work.

HUCAPP 2021 - 5th International Conference on Human Computer Interaction Theory and Applications


ACKNOWLEDGEMENT

This work was supported by the Natural Sciences and Engineering Research Council of Canada Grants DG-N01670 and DG-N06722.
