Authors:
Mădălina Dicu 1; Enol García González 2; Camelia Chira 1 and José R. Villar 2
Affiliations:
1 Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Str. Mihail Kogălniceanu nr. 1, Cluj-Napoca 400084, Romania
2 Department of Computer Science, University of Oviedo, C. Jesús Arias de Velasco, s/n, Oviedo 33005, Spain
Keyword(s):
User Interface Recognition, Dataset, Computer Vision, ChatGPT, Object Detection.
Abstract:
The identification of elements in user interfaces is a problem of growing interest, given how much users now interact with machines: digital technologies are used to carry out almost any daily task. Accurately identifying the elements that make up a graphical interface with computer vision can be helpful in applications such as accessibility, testing, or automatic code generation. This paper focuses on a problem that affects almost any Deep Learning and computer vision task: the generation and annotation of datasets. Few contributions in the literature provide datasets for training vision models to recognize interface elements. Moreover, most existing datasets consist of images of mobile applications, all in English. In this paper, we propose GenGUI, a new dataset of desktop applications with varied content, including multiple languages. Furthermore, we train different versions of YOLO models on GenGUI to test the dataset's quality, obtaining reasonably good results.