Where is that Button Again?! – Towards a Universal GUI Search Engine

Sven Hertling

, Markus Schr

oder

, Christian Jilek

and Andreas Dengel

1,2

German Research Center for Artiﬁcial Intelligence (DFKI) GmbH, Trippstadter Straße 122,

67663 Kaiserslautern, Germany

Knowledge-based Systems Group, Department of Computer Science, University of Kaiserslautern,

P.O. Box 3049, 67653 Kaiserslautern, Germany

Keywords:

Information Retrieval, GUI automation, Accessibility Interface.

Abstract:

In feature-rich software a wide range of functionality is spread across various menus, dialog windows, tool

bars etc. Remembering where to ﬁnd each feature is usually very hard, especially if it is not regularly used.

We therefore provide a GUI search engine which is universally applicable to a large number of applications.

Besides giving an overview of related approaches, we describe three major problems we had to solve, which

are analyzing the GUI, understanding the users’ query and executing a suitable solution to ﬁnd a desired UI

element. Based on a user study we evaluated our approach and showed that it is particularly useful if a not

regularly used feature is searched for. We already identiﬁed much potential for further applications based on

our approach.

1 INTRODUCTION

“Where is that button again?!” – this and other com-

plaints are very common among users of feature-rich

software. Typically in such software a high amount of

features is spread across various menus, dialog win-

dows, tool bars etc. in the graphical user interface

(GUI). Remembering where to ﬁnd each feature is

usually very hard especially if it is not regularly used.

The GUI, consisting of graphical elements like but-

tons, checkboxes, input ﬁelds etc., provides some tex-

tual clues of the functionality it offers. For example,

a button’s label reveals its utility and an additional

tooltip may give detailed information about its usage.

Depending on the implementation there may be many

or just few of such hints. In order to ﬁnd a certain

software function, the user has to browse its graphi-

cal front-end as well as read and understand the tex-

tual representations. (Cockburn and Gutwin, 2009)

show that novice users need more time to retrieve el-

ements within hierarchical menus. Using software

having such complex graphical interfaces can lead to

despair. One approach to help expert users is Com-

mandMaps (Scarr et al., 2012). Like the name sug-

gests, it unfolds several menus in order to cover the

whole screen. Users may then ﬁnd the demanded

menu items with the help of their spatial memory.

In this paper, we concentrate on helping novice

users ﬁnding the corresponding software features.

Figure 1: A user wants to rotate an upside down video. Our

search engine retrieves a list of graphical elements by the

given search term “rotate” (left-hand side). By selecting the

ﬁrst entry, the system preforms necessary steps like mouse

and keyboard interactions automatically to get the element

visible. Therefore it traverses the appropriate menu item

as well as several tab controls to ﬁnally present the Rotate

feature (right).

This is done by a search engine for widgets within the

GUI. After the user clicks on a search result, the ap-

propriate mouse and keyboard actions are performed

to make the element visible on screen.

To offer such a support, an approach ﬁrst has to

grasp information and structure of the user interface.

Since a graphical interface is rendered for humans, an

obvious solution could be based on image processing

(Yeh et al., 2009). However, in order to also extract

the texts of the graphical elements, OCR-based ap-

Hertling S., SchrÃ˝uder M., Jilek C. and Dengel A.

Where is that Button Again?! â

A ¸S Towards a Universal GUI Search Engine.

DOI: 10.5220/0006201402170227

In Proceedings of the 9th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2017), pages 217-227

ISBN: 978-989-758-220-2

217

proaches have to be applied. Beside the error rate,

they also lack the ability to get hidden information

like texts in tooltips.

Even if this is solved, we are still facing the chal-

lenges of understanding which feature the user is cur-

rently looking for (handled by (Adar et al., 2014)) and

how to guide them there. Concerning the latter, an

obvious solution would be to provide a textual ex-

planation what has to be done in order to navigate

to the element in question. For even higher conve-

nience we offer a fully automated execution of the

necessary steps. Figure 1 shows an example scenario

of our application. The user searches for a software

functionality called “rotate”. As a result the search

engine provides several user interface elements. They

are rendered like typical search results but also con-

tain images and more detailed descriptions of the UI

element. The right hand side shows the correspond-

ing application. All necessary steps like mouse and

keyboard interactions are performed automatically.

For the proposed approach we identify three main

problems: The need for a sufﬁciently detailed GUI

representation, a possibility to derive what the user is

currently looking for and a method for searching the

GUI model for it. Finally, the system has to perform

the necessary steps to reach the desired graphical ele-

ment. Additionally, we also focus on covering a wide

range of software, not depending on any speciﬁc im-

plementation details. Therefore 16 of the most fre-

quently used software application are tested with our

approach.

This paper is structured as follows: The next sec-

tion contains an overview of related work. Next, we

present our method in more detail according to the

various subtasks it solves in order to implement a uni-

versal GUI search engine (Section 3). Section 4 is

about a user study that was conducted to evaluate our

approach. In Section 5 we conclude this paper and

give an outlook on possible future work.

2 RELATED WORK

In the past, several approaches of analyzing and pro-

viding assistance in the GUI have been proposed. We

therefore concentrate on four topics: GUI represen-

tation & testing, mapping user queries to commands,

GUI assistance and product-speciﬁc GUI search en-

gines.

2.1 GUI Representation & Testing

In the area of GUI testing, which veriﬁes speciﬁca-

tions in graphical front-ends, there is a need for gener-

ating GUI models motivated by model-based testing.

As a consequence there are already solutions regard-

ing the extraction of information and structure from

the graphical user interface. Writing GUI test cases is

an expensive task making their automation a worth-

while goal. For that reason (Memon et al., 2003) show

an approach for reverse engineering graphical user in-

terfaces which is called GUI Ripping. Memon et al.

use the Windows API to crawl the complete GUI in-

formation. In (Memon, 2007) the extracted informa-

tion is used to build an event-ﬂow model of the GUI

for testing purposes.

(Aho et al., 2013) and (Aho et al., 2014) trans-

ferred GUI testing to the industrial domain. They in-

troduce the platform-independent Murphy tool set for

extracting state models of GUI applications. (Grilo

et al., 2010) cope with incomplete models and add an

additional manual step to complete and validate them.

There is also a .NET-based framework to auto-

mate rich client applications which can be based on

Win32, WinForms, WPF, Silverlight and SWT plat-

forms (TestStack, 2016). It hides some of the com-

plexity of the used frameworks like Microsoft UI Au-

tomation and windows messages. Commercial vari-

ants for GUI testing are Ranorex ((Ranorex GmbH,

2016) and (Dubey and Shiwani, 2014)) as well as

TestComplete (SmartBear Software, 2016).

All previously described approaches use APIs for

user interface accessibility like UI Automation. But

there are also others that rely only on the graphi-

cal representation of UI widgets. Sikuli (Yeh et al.,

2009), for example, retrieves elements by screen-

shots. If the corresponding position is found, the tool

will execute mouse and keyboard events. Moreover

they provide a visual scripting API to easily create

GUI automation tasks. Another approach called Pre-

fab (Dixon and Fogarty, 2010) reverse engineers the

user interface on a pixel-based level. In (Dixon et al.,

2011) they extended their solution to also model hier-

archical structures of complex widgets.

2.2 Mapping User Queries to

Commands

In some cases the user describes their tasks in a

different vocabulary than the actual applications do-

main language. For ﬁnding these mappings, (Four-

ney et al., 2011) introduce query-feature graphs (QF-

graphs). Search queries and system features are both

represented as vertices. Each edge symbolizes a map-

ping between them. QF-graphs are constructed using

search query logs, search engine results, web page

content, and localization data from interactive sys-

tems.

ICAART 2017 - 9th International Conference on Agents and Artiﬁcial Intelligence

218

In addition, the CommandSpace system (Adar

et al., 2014) ﬁnds these matches based on a large cor-

pus of Web documents about the application. They

use deep learning techniques to create them and eval-

uated their approach on a single application, Adobe

Photoshop.

ShowMeHow (Ramesh et al., 2011) translates

user interface instructions between similar applica-

tions. Since software rapidly changes, tutorials are

often out of date or not available at all. Thus, the goal

is to locate commands in one application using the in-

terface language of a similar application.

2.3 GUI Assistance

Regarding assistance in the graphical user interface

there exist several approaches: CommandMaps (Scarr

et al., 2012; Scarr et al., 2014) is a command selection

interface which makes use of the user’s spatial mem-

ory to faster interact with UI elements. They demon-

strated that their approach is signiﬁcantly faster than

Ribbon and menu interfaces for experienced users.

Another strategy seeks for preventing false feedfor-

ward in menus (Lafreniere et al., 2015) and propose a

task-centric interface (Lafreniere et al., 2014) which

shows a textual workﬂow with actionable commands

included. Both approaches share the intention of get-

ting the user to the desired graphical element faster.

PixelTone (Laput et al., 2013) provides a mul-

timodal interface for photo editing. With their ap-

proach it is possible to select a region of an image and

label it, e.g. “This is a shirt”. GEKA (Hendy et al.,

2010) created a graphically enhanced keyboard accel-

erator which enables users to call software functions

by entering commands like in a typical command line

interface.

(Ekstrand et al., 2011) developed an approach for

ﬁnding better software learning resources (documen-

tation, tutorials and the like), in particular by exploit-

ing the user’s current context in a speciﬁc application.

The Process Guide (IMC Information Multimedia

Communication AG, 2016) by IMC directs the user

through a graphical front-end by giving practical us-

age hints in a context sidebar. These hints are gener-

ated manually for each workﬂow.

2.4 Product-speciﬁc GUI Search

Engines

The following approaches already solved the prob-

lem of ﬁnding graphical elements in their application

by providing an embedded search engine. They are,

however, product-speciﬁc and do not work in other

applications. One example is the settings search of

the Chrome web browser (Google Inc., 2016). All

content of the settings pages can be searched for,

even deeply nested pages (e.g. search for “cookies”).

In Microsoft Visual Studio, QuickLaunch (Microsoft

Corporation, 2016b; Naboulsi, 2012) is embedded for

fast retrieval of commonly used functionality. An-

other similar search engine called Search Commands

(Microsoft Corporation, 2011) is an add-in for the Mi-

crosoft Ofﬁce product suite. The successor Tell Me

(Microsoft Corporation, 2016a) is available since Mi-

crosoft Ofﬁce 2016. A more general approach is Help

Menus in Apple’s OS X operating system (Miser,

2008, p. 62), (Ness, 2011, p. 13), (Pogue, 2012, p.

67). It allows for searching menu items in the menu

bar in all OS X applications. Unfortunately, all other

graphical elements are not searchable. This is also the

case for the settings search in Windows 10 and Ap-

ple iOS. Cortana (Microsoft Corporation, 2016c) can

search for application names but not texts or names

within them.

3 APPROACH

Helping the user in ﬁnding the graphical elements

they are looking for, several problems have to be tack-

led. As a basis we ﬁrst have to grasp the structure and

information of the GUI. Then, the appropriate func-

tionality has to be derived from terms entered by the

user when conducting a search. Additionally, the sys-

tem has to perform all mouse and keyboard interac-

tions to navigate to the UI element in question.

3.1 Software Basis

As a basis for the approach, a good sampling of

most often used software is necessary. Therefore we

analyzed websites which offer commonly used soft-

ware for downloading. A collection of 35 websites is

reduced to the 11 most accessed ones based on web

trafﬁc analyzers like Alexa, Compete, Semrush(de),

Semrush(en) and PageRank. The resulting set of

download websites is

• amazon.com • amazon.de

• cnet.com • sourceforge.net

• softonic.com • zdnet.com

• chip.de • computerbild.de

• heise.de • netzwelt.de

• ﬁlehippo.com

We chose software which is among the top ten in

the ranking of at least two sites. After the removal

of programs like “Flash Player” (web-plugin) and

“Minecraft” (game) which are irrelevant for our use

Where is that Button Again?! â

A¸S Towards a Universal GUI Search Engine

219

case the resulting set is:

• Word 2013 (Trial) • Excel 2013 (Trial)

• PowerPoint 2013 (Trial) • Open Ofﬁce Writer

• Open Ofﬁce Calc • Open Ofﬁce Impress

• VLC media player • CCleaner

• Google Chrome • Mozilla Firefox

• 7-Zip • WinRAR

• Skype • PhotoScape

• AntiVir 2014 - Avira • NortonInternetSecurity

• Adobe Reader • avast! Free Antivirus

Unfortunately, “Norton Internet Security” as well

as “avast! Free Antivirus” (2 of 18 applications or

11.11 % of the sample set) do neither support the UI

Automation framework nor the MSAA framework.

The remaining 16 applications are used in this paper

to build software models and conduct the evaluation.

Figure 2 shows a more detailed analysis of the

download count of cnet.com (depicted as a log-log

plot). We see that it best ﬁts a log-normal distribu-

tion with the parameters of x

min

= 554, µ = 3.656 and

σ = 3.499. The best ﬁtting parameters for a power law

distribution are x

min

= 57033 and α = 1.813. They are

all calculated using the poweRlaw package (Gillespie

et al., 2015). The log-normal distribution also applies

for the chip.de website. Unfortunately all other pages

don’t provide absolute download counts. This anal-

ysis validates that there are only few software tools

which are often downloaded and a large number of

those that are demanded less frequently. Thus, with an

appropriate selection of applications, a huge amount

of users could beneﬁt from an approach like ours.

3.2 GUI Analysis

Analyzing the graphical user interface is a difﬁcult

task. OCR or other image-based approaches did not

fulﬁll our needs due to the rudimentary amount of in-

formation and error-proneness. We therefore utilize

accessibility interfaces since they provide precise and

detailed information about an application’s content

and structure. They are typically used in screen read-

ers which help visually impaired people navigating

their applications. In particular, we use “Microsoft

Active Accessibility” (MSAA) (Microsoft Corpora-

tion, 2000) and its successor “Microsoft UI Automa-

tion” (UIA) (Microsoft Corporation, 2016d) which

are two prominent accessibility APIs. Based on this

powerful technology we are able to exploit a lot of

software applications without further ado.

The graphical user interface of Microsoft Win-

dows is represented as a tree providing a hierarchical

structured list of graphical elements. The APIs al-

low access to these elements’ structure (i.e. child and

Figure 2: Log-log plot of downloaded software from

cnet.com.

parent relations) and properties (i.e. name, position,

size, etc). However, they do not offer functionality

which extracts a complete software model. In particu-

lar, they only allow for crawling the currently visible

UI tree. Consequently, parts of the graphical front-

end have to be made visible in order to capture them.

To make further elements visible one has to interact

with another element, for example a menu is opened if

the corresponding menu item is clicked. That is why

we decided to observe GUI interactions (i.e. mouse

clicks or keyboard shortcuts) to detect appearing ele-

ments and thereby grasp the complete software. As a

beneﬁt we recognize which element is responsible to

make other elements visible. This insight is later use-

ful to execute appropriate interactions automatically

to navigate to a certain element.

We consider two observable interaction sources:

On the one hand we can observe users solving daily

tasks using the graphical front-end. We can thereby

understand GUI usage behavior, but fail in obtaining a

complete software model which would be preferable.

To solve this we implemented an algorithm called

click monkey that automatically explores a given ap-

plication. This is done by clicking all graphical ele-

ments in a depth-limited traversal strategy ensuring a

high UI element coverage of the software. We imple-

mented an observer which is able to capture UI inter-

actions regardless whether they are performed by the

user or the click monkey. In each record we consider

(a) the causing action, (b) an application software’s

GUI tree and (c) the graphical element corresponding

to the current cursor location. By collecting a lot of

records over time we thus receive a detailed interac-

tion log serving as a basis for an expressive software

model.

Gathering GUI data is only a preliminary step.

Next, we have to understand what the user is cur-

rently looking for by interpreting their search query

and matching it to known UI elements.

ICAART 2017 - 9th International Conference on Agents and Artiﬁcial Intelligence

220

3.3 Interpreting User Queries

To solve the problem of having to browse software

manually, we have to construct a searchable GUI

model. We therefore make use of information re-

trieval technologies indexing all observed graphical

elements, especially their names and metadata. Us-

ing this textual information we try to match given

user queries to suitable UI elements. A document in

the search index consists of a ﬁeld which contains

concatenated values of the properties Name, Lega-

cyIAccessibleDescription, HelpText and Localized-

ControlType of the corresponding element. We call

this ﬁeld “UI text”. Additionally, information like

ProductName, CompanyName, FileDescription of the

corresponding software is added.

Three approaches for searching UI elements are

developed. The baseline approach analyzes the “UI

text” ﬁeld based on the StandardAnalyzer of Lucene

which generates tokens based on a Unicode Text Seg-

mentation algorithm (speciﬁed in Unicode Standard

Annex #29 (Davis and Iancu, 2016)) and lowercases

all tokens.

The second approach called “language” further

analyzes the “UI text” ﬁeld resulting in three addi-

tional ﬁelds in the index:

(1) standard tokenizer, HTMLStripCharFilter (be-

cause HTML is contained in UI elements’ text

or description) and the following ﬁlters: lower-

case, synonyms (English: WordNet (Fellbaum,

1998; Miller, 1995), German: OpenThesaurus

(Naber, 2004; Naber, 2005)), ASCIIFolding

(Lucene), stopword removal (based on list of elas-

ticsearch (Elasticsearch reference, 2016)), lan-

guage dependent ﬁlter like GermanNormaliza-

tionFilter (Lucene), stemming (based on (Savoy,

2006) for French, Portuguese, German and Hun-

garian languages and (Porter, 1980) for English),

(2) standard tokenizer and the following ﬁlters: low-

ercase, NGramTokenFilter (min:1, max:20),

(3) same as (2) but with EdgeNGramTokenFilter.

The query is a boolean should clause query of

“baseline” and three “language” ﬁelds. This implies

that terms which do not fully match a term in the in-

dex are still counted using for example their stemmed

version (and thus increase recall). Nevertheless we

also consider precision because more matching ﬁelds

will increase the score.

The “context” approach uses the result of the pre-

vious one and re-ranks the documents considering the

overall desktop context. In particular, search results

corresponding to already opened (i.e. currently in

use) applications receive a higher score.

For all approaches TF-IDF is used as a similarity

metric. (Okapi BM25 has also been tested but was

omitted due to not having a considerable difference

compared to TF-IDF).

In the current implementation the mapping be-

tween user queries and application vocabulary is

based on term matching and synonyms. It would

also be possible to include other approaches like

QF-graphs (Fourney et al., 2011) or CommandSpace

(Adar et al., 2014) as additional resources for syn-

onyms.

The information retrieval system we described so

far still lacks the ability of executing mouse and key-

board interactions to get the graphical element visible.

That is why the next section covers another software

model which is able to derive such information.

3.4 Solution Execution

As already mentioned, one has to interact with an el-

ement to make further elements visible. This relation

plays a key role in the following graph-based software

model. By using a graph of connected UI elements, a

navigation is reduced to a shortest path between two

elements. For explaining a step by step solution a path

is appropriate: Ideally, it starts with an already visible

element and describes what has to be done in order to

open them successively. The ﬁnal element will be the

one the user is looking for.

In particular, our model is a directed, weighted

graph having graphical element vertices and inter-

action edges. In short, an edge encodes what ele-

ment causes another one’s occurrence. Because of

this property we call this model a Show New Ele-

ment Graph (SNEG). The edge weight is used to store

the reliability that the interaction will correctly lead

to the promised graphical element. (This is induced

by the observation process not being completely reli-

able.) Based on evidences the model can learn which

paths have proven to be correct.

Models like the one just described explain how to

reach elements. Users still have to read and under-

stand the explanation and navigate though the GUI

on their own. Although it helps them to learn about

the software’s structure, we additionally want to pro-

vide an automation for convenience. Thus, we im-

plemented an algorithm which executes interactions

automatically. To do this, it uses the path acquired

from the previous model, which encodes two types of

information: the expected graphical elements and the

interactions which have to be executed on them. Ac-

cordingly, the algorithm has to master two subtasks:

a reliable recovery of graphical elements and the exe-

cution of the interactions.

Where is that Button Again?! â

A¸S Towards a Universal GUI Search Engine

221

UI Element Recovering. For recovering a UI el-

ement, an evaluation with the 16 selected software

tools was conducted (see Section Software Basis). All

UI elements and all their properties from each start

screen are recorded. In total there are 714 elements

which should clearly be differentiated. Table 1 shows

those properties that best distinguish these elements.

Each property is tested separately and the count

of resulting equivalence classes is used to ﬁnd out

which feature combination is best suited. Most of

them are volatile and thus change over time like the

BoundingRectangle, ProcessId etc. The following set

can be used to rediscover UI elements:

• Name • HelpText

• AutomationId • AccessKey

• ClassName • LocalizedControlType

• ControlType • LegacyIAccessibleRole

• LegacyIAccessibleChildId

• LegacyIAccessibleDescription

In GUI testing also UI elements are identiﬁed.

The identiﬁcation characteristics are determined at

test recording time. Since the playback is executed in

the very same environment, this is sufﬁcient. For the

proposed approach the environment can change dras-

tically and it is therefore necessary to collect suitable

properties for identiﬁcation.

It is possible that the values of one or more of the

properties stated above can change over time. Each

element can thus also have dynamic properties which

can not be used for identiﬁcation. Dynamic properties

are automatically detected when UI elements with the

same RuntimeId are compared.

4 EVALUATION

To evaluate our approach we conducted a user study

with 10 participants (5 male, 5 female, average age

35.2, std. deviation 16.3). First, the participants were

asked about their computer usage ranging from “al-

most no use” (1) to “very frequent use” (4) which

resulted in a mean of 3.3 with a standard deviation

of 0.67. In order to classify the participants in two

classes, they were asked for each application if they

frequently use it (expert), or if they rarely use it

(novice).

In order to test our GUI search engine under real

conditions we extracted 16 most frequently used soft-

ware applications from the most commonly visited

download sites (see Section Software Basis). For

each of them a ﬁctional task and a corresponding pic-

togram is deﬁned which illustrates the task at hand

Table 1: All UIA properties sorted by their resulting equiv-

alence classes (table shows only properties with 10 or more

classes). In total there are 714 elements which should be

distinguished.

Property name ID Count of

equi-

valence

classes

RuntimeId 30000 583

BoundingRectangle 30001 570

Name 30005

299

LegacyIAccessibleName 30092

ProviderDescription 30107 156

NativeWindowHandle 30020 139

HelpText 30013

100

LegacyIAccessibleHelp 30097

AutomationId 30011 92

AccessKey 30007

LegacyIAccessible-

KeyboardShortcut

30098

ClassName 30012 62

LegacyIAccessible-

Description

30094 45

LegacyIAccessibleState 30096 35

LegacyIAccessibleRole 30095 32

LocalizedControlType 30004 31

LegacyIAccessibleValue 30093 28

ControlType 30003 27

ValueValue 30045 25

ProcessId 30002 15

LegacyIAccessible-

DefaultAction

30100 13

LegacyIAccessibleChildId 30091 10

without inﬂuencing the user’s search terms. They are

listed in Table 2.

The evaluation setup for each participant was re-

alized as follows: In a random sequence a participant

processed one task after the other. For each task four

subtasks had to be accomplished:

(Subtask 1) Initially, the task’s pictogram as well

as the related program’s icon and name were dis-

played. Without inﬂuencing the user’s search terms,

the pictogram helps them to derive the envisioned

task. This method wants to simulate an intention to

perform a speciﬁc task in a certain application.

(Subtask 2) Afterwards, the participant had to

formulate a textual query describing the task. Us-

ing our proposed information retrieval system a list

of retrieved graphical element was shown to the

participant. This randomized list consists of the

top 15 graphical elements of the three introduced

search strategies (pooling strategy of TREC evalua-

ICAART 2017 - 9th International Conference on Agents and Artiﬁcial Intelligence

222

Table 2: Table of programs, tasks and corresponding pictograms used in the user study.

Word 2013

Change paragraph

Exel 2013

Sort

PowerPoint 2013

Insert video

OpenOfﬁce Writer

Double underline

OpenOfﬁce Calc

Insert frame

OpenOfﬁce Impress

Change orientation

VLC media player

Rotate video

CCleaner

Update

Google Chrome

Zoom

Mozilla Firefox

Print page

7Zip File manager

Delete ﬁle

WinRar

Rename ﬁle

Skype

Change sound

Photoscape

Red eye effect

Adobe Reader

Fullscreen

Avira Free Antivirus

Password

tion (Buckley and Voorhees, 2005)). Based on the

participant’s relevance feedback we later evaluated

the strategies’ precision and recall with the trec eval

evaluation package (Buckley, 2004).

(Subtask 3) Then, the participant is supposed to

search for the graphical element on their own in the

graphical front-end. We encouraged them to use any

form of external assistance (e.g. internet, manual,

etc.). While doing so, time and click count is cap-

tured. The user had to indicate if the element was

found.

(Subtask 4) Finally, our GUI Search Engine must

be used to ﬁnd the correct graphical element. Again,

time and click count was recorded while using the

tool. In particular, this also includes the formulation

of the query, the selection of an element and the exe-

cution of the tool. Once more, the user has to state if

the tool found the graphical element in question.

Finally each participant had to ﬁll out a user expe-

rience questionnaire (UEQ) (Laugwitz et al., 2008).

Based on the user study’s acquired data, we an-

alyzed the usefulness of our approach and the usage

experience.

Users can mainly beneﬁt in two ways: The GUI

search engine may ﬁnd the UI elements faster and it

may retrieve results when the user has already given

up manual search. In particular, a confusion matrix

describing the dependencies is shown in Table 4(b).

It distinguishes whether the tool and/or the user found

the desired UI element or not. In case both found it,

the matrix further shows in how many cases using the

tool was faster. We differentiate 160 observations –

10 participants performing 16 tasks each with another

application.

The measured number of clicks and the time to

complete the tasks are visualized in Figure 3. We or-

dered the tasks according to their difﬁculty, i.e. the

average user time to fulﬁll the task. We assume that

easier tasks are solved faster while more difﬁcult ones

need more time. Figure 4(a) especially focuses on

those cases in which the tool was helpful. It shows

that the search engine helped users coping with the

most difﬁcult and moderately difﬁcult tasks. The

green bars indicate the fraction of participants per task

for whom the tool was particularly helpful, since they

could either ﬁnd elements faster (green) or they were

able ﬁnd the elements after manual search failed (dark

green). If the tool was not helpful, we differentiate the

following three cases: The user was faster when not

using the tool (orange), tasks could not be completed

at all – whether the tool was used or not – (red) and

only participants not using the tool could ﬁnd the el-

ement (dark red). Unexpectedly, the tool could not

efﬁciently support participants in Tasks 5, 4 and 1. In

case of Task 5 the GUI search engine could not ﬁnd

any graphical element at all due to an incorrect ob-

Where is that Button Again?! â

A¸S Towards a Universal GUI Search Engine

223

(a) Average number of clicks to complete each task differentiating between tool, expert and novice users.

(b) Average time in seconds to complete each task differentiating between tool, expert and novice users.

Figure 3: Analysis of the average click count and time to successfully execute a task (ordered by the tasks’ difﬁculties).

served software model. Although the tool found the

elements in Tasks 4 and 1, it took more time than a

manual search. Regardless of the necessary time, our

GUI search engine could ﬁnd the desired elements in

55.6% of all 160 cases.

In order to evaluate the users’ perceptions of the

search engine, we additionally conducted a user expe-

rience questionnaire (UEQ) (Laugwitz et al., 2008).

Please note that there might be a slight bias since

the participants are personally known to the authors.

Based on the collected data it derives the following

six factors: attractiveness, perspicuity, efﬁciency, de-

pendability, stimulation, and novelty. The results are

depicted in Figure 5 and are discussed in the follow-

ing. The excellent average value in novelty shows

that the participants consider the approach as cre-

ative, innovative and leading-edge. Although there

are product-speciﬁc search engines, presented in re-

lated work section, they seem to be unknown to the

users. Ergonomic quality aspects like efﬁciency, per-

spicuity and dependability are rated above average,

whereas the latter two achieve a slightly higher score.

We assume that the slightly lower rating of efﬁciency

is due to the technical immaturity of the current pro-

totype. However, there is still potential for improve-

ment.

Another minor criterion of the evaluation is the

performance of the graphical element retrieval sys-

tem which lists suitable UI elements based on a user’s

search term. Since this is information retrieval tech-

nology, we will investigate its recall and precision.

Based on relevance feedback in the user study, we

created recall-precision-curves for each of the three

search strategies that are depicted in Figure 6. We see

that more analytical effort results in higher recall and

precision values.

In the next section we conclude this paper and give

an outlook on possible future work.

ICAART 2017 - 9th International Conference on Agents and Artiﬁcial Intelligence

224

(a) The colored bars indicate the fraction of participants per task for ﬁve aspects: The tool helped the users because they could

not ﬁnd the element on their own (dark green) or because using the tool was faster (green). In cases the tool was not helpful we

differentiate three cases: Not using the tool was faster (orange), element could not be found at all (whether the tool was used

or not, red) and only participants not using the tool found the element (dark red).

User with tool

successful failed Total

User w/o tool

successful 33,1% 15,0% (tool faster) 31,9% 80%

failed 7,5% 12,5% 20%

Total 55,6% 44,4% 100%

(b) Confusion Matrix depicting in how many cases the tool and/or user found the UI element

in question. Please note that we split up the case of both groups being successful: we

additionally record whether using the tool was faster or not.

Figure 4: Analysis of the approach’s usefulness.

-1,00

-0,50

0,00

0,50

1,00

1,50

2,00

2,50

Attractiveness Perspicuity Efficiency Dependability Stimulation Novelty

Excellent

Good

Above Average

Below Average

Bad

Mean

Figure 5: User experience questionnaire results.

5 CONCLUSION AND OUTLOOK

In this paper we presented the ﬁrst universal GUI

search engine, i.e. one that works for a wide range

of applications.

For 16 of 18 very frequently used programs we

showed that our approach supports users in easily

ﬁnding software functionality, especially if it is a fea-

ture that is not used regularly. We provide the auto-

matic execution of required steps like mouse and key-

Where is that Button Again?! â

A¸S Towards a Universal GUI Search Engine

225

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Recall

Precision

Baseline

Language

Context

Figure 6: Recall/precision curves of the three search strate-

gies based on relevance feedback in the user study.

board actions. To obtain the necessary software mod-

els we implemented the click monkey, a mechanism

to automatically explore the user interfaces of given

applications. Concerning our information retrieval

system we implemented a ﬁrst version and identiﬁed

some potential for improvement. We also showed that

rather minor enhancements already resulted in notice-

able positive effects.

In the two remaining cases our solution could not

be used, since the applications did not support the

Windows accessibility API. Using the Java Access

Bridge in these cases seems promising and will be im-

plemented in future versions.

We also intend to build an in-application tutor-

ing system based on our approach. Additionally, the

possibility of deriving usability implications from the

given software models could be investigated. Porting

our approach to mobile platforms like Android is an-

other future task.

The GUI usage behavior can further be exploited

in order to share expert knowledge about software

which is contained therein.

ACKNOWLEDGEMENTS

This research was funded in part by the German Fed-

eral Ministry of Education and Research under grant

no. 01IS12050 (project SuGraBo). The responsibility

for this publication lies with the authors.

REFERENCES

Adar, E., Dontcheva, M., and Laput, G. (2014). Com-

mandspace: modeling the relationships between tasks,

descriptions and features. In Proceedings of the 27th

annual ACM symposium on User interface software

and technology, pages 167–176. ACM.

Aho, P., Suarez, M., Kanstren, T., and Memon, A. (2013).

Industrial adoption of automatically extracted GUI

models for testing. In Proceedings of the 3rd Interna-

tional Workshop on Experiences and Empirical Stud-

ies in Software Modelling. Springer Inc.

Aho, P., Suarez, M., Kanstren, T., and Memon, A. (2014).

Murphy tools: Utilizing extracted GUI models for in-

dustrial software testing. In Software Testing, Veriﬁca-

tion and Validation Workshops (ICSTW), 2014 IEEE

Seventh International Conference on, pages 343–348.

Buckley, C. (2004). Trec eval ir evaluation package.

http://trec.nist.gov/trec eval/. accessed on 2016-10-

25.

Buckley, C. and Voorhees, E. M. (2005). Retrieval system

evaluation. TREC: Experiment and evaluation in in-

formation retrieval, pages 53–75.

Cockburn, A. and Gutwin, C. (2009). A predictive model

of human performance with scrolling and hierarchical

lists. Human Computer Interaction, 24(3):273–314.

Davis, M. and Iancu, L. (2016). Unicode text segmentation.

http://unicode.org/reports/tr29/. accessed on 2016-10-

25.

Dixon, M. and Fogarty, J. (2010). Prefab: implementing

advanced behaviors using pixel-based reverse engi-

neering of interface structure. In Proceedings of the

SIGCHI Conference on Human Factors in Computing

Systems, pages 1525–1534. ACM.

Dixon, M., Leventhal, D., and Fogarty, J. (2011). Content

and hierarchy in pixel-based methods for reverse en-

gineering interface structure. In Proceedings of the

SIGCHI Conference on Human Factors in Computing

Systems, pages 969–978. ACM.

Dubey, N. and Shiwani, M. S. (2014). Studying and

comparing automated testing tools; ranorex and test-

complete. International Journal Of Engineering And

Computer Science, 3:5916–5923.

Ekstrand, M., Li, W., Grossman, T., Matejka, J., and Fitz-

maurice, G. (2011). Searching for software learning

resources using application context. In Proceedings

of the 24th annual ACM symposium on User interface

software and technology, pages 195–204. ACM.

Elasticsearch reference (2016). Stop token ﬁlter.

http://www.elasticsearch.org/guide/en/elasticsearch/

reference/current/analysis-stop-tokenﬁlter.html.

accessed on 2016-10-25.

Fellbaum, C. (1998). WordNet – An Electronic Lexical

Database. MIT Press.

Fourney, A., Mann, R., and Terry, M. (2011). Query-feature

graphs: bridging user vocabulary and system func-

tionality. In Proceedings of the 24th annual ACM

symposium on User interface software and technol-

ogy, pages 207–216. ACM.

ICAART 2017 - 9th International Conference on Agents and Artiﬁcial Intelligence

226

Gillespie, C. S. et al. (2015). Fitting heavy tailed distribu-

tions: The powerlaw package. Journal of Statistical

Software, 64(i02).

Google Inc. (2016). Chrome-browser features.

https://www.google.com/chrome/browser/features.

html. accessed on 2016-10-25.

Grilo, A. M., Paiva, A. C., and Faria, J. P. (2010). Reverse

engineering of gui models for testing. In 5th Iberian

Conference on Information Systems and Technologies,

pages 1–6. IEEE.

Hendy, J., Booth, K. S., and McGrenere, J. (2010). Graph-

ically enhanced keyboard accelerators for guis. In

Proceedings of Graphics Interface 2010, pages 3–10.

Canadian Information Processing Society.

IMC Information Multimedia Communication AG

(2016). Electronic performance support system imc

process guide. https://www.im-c.de/en/learning-

technologies/performance-support. accessed on

2016-10-25.

Lafreniere, B., Bunt, A., and Terry, M. (2014). Task-centric

interfaces for feature-rich software. In Proceedings

of the 26th Australian Computer-Human Interaction

Conference on Designing Futures: the Future of De-

sign, pages 49–58. ACM.

Lafreniere, B., Chilana, P. K., Fourney, A., and Terry, M. A.

(2015). These aren’t the commands you’re looking

for: Addressing false feedforward in feature-rich soft-

ware. In Proceedings of the 28th Annual ACM Sympo-

sium on User Interface Software & Technology, pages

619–628. ACM.

Laput, G. P., Dontcheva, M., Wilensky, G., Chang, W.,

Agarwala, A., Linder, J., and Adar, E. (2013). Pixel-

tone: a multimodal interface for image editing. In Pro-

ceedings of the SIGCHI Conference on Human Fac-

tors in Computing Systems, pages 2185–2194. ACM.

Laugwitz, B., Held, T., and Schrepp, M. (2008). Con-

struction and evaluation of a user experience ques-

tionnaire. Springer.

Memon, A. M. (2007). An event-ﬂow model of GUI-based

applications for testing. Software Testing, Veriﬁcation

and Reliability, 17(3):137–157.

Memon, A. M., Banerjee, I., and Nagarajan, A. (2003). Gui

ripping: Reverse engineering of graphical user inter-

faces for testing. In WCRE, volume 3, page 260.

Microsoft Corporation (2000). Microsoft active accessi-

bility: Architecture. http://msdn.microsoft.com/en-

us/library/ms971310.aspx. accessed on 2016-10-25.

Microsoft Corporation (2011). Ofﬁce labs: Search com-

mands. http://www.microsoft.com/en-us/download/

details.aspx?id=28559. accessed on 2016-10-25.

Microsoft Corporation (2016a). Do things quickly with

tell me. https://support.ofﬁce.com/en-us/article/Do-

things-quickly-with-Tell-Me-f20d2198-17b8-4b09-

a3e5-007a337f1e4e. accessed on 2016-10-25.

Microsoft Corporation (2016b). Quick launch.

http://msdn.microsoft.com/en-us/library/

hh417697.aspx. accessed on 2016-10-25.

Microsoft Corporation (2016c). What is cortana?

https://support.microsoft.com/en-us/help/17214/

windows-10-what-is. accessed on 2016-10-25.

Microsoft Corporation (2016d). Windows automa-

tion api. http://msdn.microsoft.com/en-us/library/win

dows/desktop/ff486375(v=vs.85).aspx. accessed on

2016-10-25.

Miller, G. A. (1995). Wordnet: a lexical database for en-

glish. Communications of the ACM, 38(11):39–41.

Miser, B. (2008). Mac OS X Leopard in Depth. Que Pub-

lishing.

Naber, D. (2004). Openthesaurus: Building a thesaurus

with a web community. Citeseer, 3:2005.

Naber, D. (2005). OpenThesaurus: ein offenes deutsches

Wortnetz. Beitr

age zur GLDV-Tagung, pages 422–

433.

Naboulsi, Z. (2012). Visual studio 2012 new fea-

tures: Quick launch. http://blogs.msdn.com/b/

zainnab/archive/2012/06/26/visual-studio-2012-new-

features-quick-launch.aspx. accessed on 2016-10-25.

Ness, R. (2011). Mac OS X Lion in Depth. Que Publishing.

Pogue, D. (2012). OS X Mountain Lion: The Missing Man-

ual. O’Reilly Media.

Porter, M. F. (1980). An algorithm for sufﬁx stripping. Pro-

gram, 14(3):130–137.

Ramesh, V., Hsu, C., Agrawala, M., and Hartmann, B.

(2011). Showmehow: translating user interface in-

structions between applications. In Proceedings of the

24th annual ACM symposium on User interface soft-

ware and technology, pages 127–134. ACM.

Ranorex GmbH (2016). Ranorex hompage. http://www.

ranorex.com. accessed on 2016-10-25.

Savoy, J. (2006). Light stemming approaches for the french,

portuguese, german and hungarian languages. In Pro-

ceedings of the 2006 ACM symposium on Applied

computing, pages 1031–1035. ACM.

Scarr, J., Cockburn, A., Gutwin, C., and Bunt, A. (2012).

Improving command selection with commandmaps.

In Proceedings of the SIGCHI Conference on Human

Factors in Computing Systems, pages 257–266. ACM.

Scarr, J., Cockburn, A., Gutwin, C., Bunt, A., and

Cechanowicz, J. E. (2014). The usability of com-

mandmaps in realistic tasks. In Proceedings of the

32nd annual ACM conference on Human factors in

computing systems, pages 2241–2250. ACM.

SmartBear Software (2016). TestComplete hompage.

https://smartbear.com/product/testcomplete/overview/.

accessed on 2016-10-25.

TestStack (2016). White. https://github.com/Test

Stack/White. accessed on 2016-10-25.

Yeh, T., Chang, T.-H., and Miller, R. C. (2009). Sikuli:

using gui screenshots for search and automation. In

Proceedings of the 22nd annual ACM symposium on

User interface software and technology, pages 183–

192. ACM.

Where is that Button Again?! â

A¸S Towards a Universal GUI Search Engine

227