
ity. Thus, this study contributes by presenting a viable
approach to generating alternative texts for images
that lack any description.
Unlike the tools examined in previous studies (Azure
Computer Vision Engine, Amazon Rekognition, Cloudsight,
and Auto Alt-Text for Google Chrome) (Leotta et al.,
2023), the MLLM was judged by sighted individuals to
yield descriptions at least equivalent to those created
by humans. However, further research is needed to
determine whether the method meets the expectations of
people with visual disabilities, as reported by Jung
et al. (2022).
8 CONCLUSION AND FUTURE WORK
The use of MLLMs to generate alternative texts for
Web images shows substantial potential for enhancing
accessibility for individuals with visual impairments.
The study suggests that MLLM-generated descriptions
can serve as valuable alternatives when human-written
texts are unavailable, without compromising the
information conveyed to visually impaired users. Using
contextual data did not produce results superior to
descriptions generated solely from the image, which
indicates that the method is versatile across various
environments. Because it requires no additional
context, the method can generate alternative texts for
standalone images, highlighting the potential of MLLMs
to address accessibility challenges and foster a more
inclusive digital environment for all.
Possible future work includes: (1) testing the approach
with pages covering diverse topics and images; (2)
validating the results with visually impaired
individuals and with a larger, more diverse group of
participants; (3) using alternative resources for
generating descriptions, such as other MLLMs like GPT-4
and more contextual data; and (4) modifying prompt
parameters to assess the MLLM's capacity to produce
more precise results. Such research could contribute
significantly to reducing Web accessibility barriers.
REFERENCES
Aljedaani, W., Eler, M. M., and Parthasarathy, P. D.
(2025). Enhancing accessibility in software engineering
projects with large language models (LLMs).
In Proceedings of the 56th ACM Technical Sympo-
sium on Computer Science Education V. 1, SIGCSETS
2025, page 25–31, New York, NY, USA. Association
for Computing Machinery.
Bigham, J. P., Jayant, C., Ji, H., Little, G., Miller, A., Miller,
R. C., Miller, R., Tatarowicz, A., White, B., White, S.,
and Yeh, T. (2010). VizWiz: nearly real-time answers
to visual questions. In Proceedings of the 23rd annual
ACM symposium on User interface software and
technology, UIST ’10, page 333–342, New York, NY,
USA. Association for Computing Machinery.
Brasil (2015). Lei nº 13.146, de 06 de julho de 2015.
Institui a Lei Brasileira de Inclusão da Pessoa com
Deficiência (Estatuto da Pessoa com Deficiência).
Brasil, Brasília, DF.
Faisal, F., Salam, M. A., Habib, M. B., Islam, M. S., and
Nishat, M. M. (2022). Depth estimation from video
using computer vision and machine learning with hy-
perparameter optimization. In 2022 4th International
Conference on Smart Sensors and Application (IC-
SSA), pages 39–44, Kuala Lumpur, Malaysia. IEEE.
Gleason, C., Carrington, P., Cassidy, C., Morris, M. R.,
Kitani, K. M., and Bigham, J. P. (2019). “It’s almost like
they’re trying to hide it”: How user-provided image
descriptions have failed to make Twitter accessible. In
The World Wide Web Conference, WWW ’19, page
549–559, New York, NY, USA. Association for Com-
puting Machinery.
Gleason, C., Pavel, A., McCamey, E., Low, C., Carrington,
P., Kitani, K. M., and Bigham, J. P. (2020). Twitter
A11y: A browser extension to make Twitter images
accessible. In Proceedings of the 2020 CHI Conference
on Human Factors in Computing Systems, CHI
’20, page 1–12, New York, NY, USA. Association for
Computing Machinery.
Guinness, D., Cutrell, E., and Morris, M. R. (2018). Caption
crawler: Enabling reusable alternative text descrip-
tions using reverse image search. In Proceedings of
the 2018 CHI Conference on Human Factors in Com-
puting Systems, CHI ’18, page 1–11, New York, NY,
USA. Association for Computing Machinery.
Hafeth, D. A., Lal, G., Al-Khafajiy, M., Baker, T., and
Kollias, S. (2023). Cloud-IoT application for scene
understanding in assisted living: Unleashing the potential
of image captioning and large language model (ChatGPT).
In 2023 16th International Conference on Developments
in eSystems Engineering (DeSE), pages 150–155,
Istanbul, Turkiye. IEEE.
Hajizadeh Saffar, A., Sitbon, L., Hoogstrate, M., Abbas,
A., Roomkham, S., and Miller, D. (2024). Human and
large language model intent detection in image-based
self-expression of people with intellectual disability.
In Proceedings of the 2024 Conference on Human In-
formation Interaction and Retrieval, CHIIR ’24, page
199–208, New York, NY, USA. Association for Com-
puting Machinery.
Huh, M., Peng, Y.-H., and Pavel, A. (2023). GenAssist:
Making image generation accessible. In Proceedings
of the 36th Annual ACM Symposium on User Interface
Software and Technology, UIST ’23, New York, NY,
USA. Association for Computing Machinery.
Inal, Y., Mishra, D., and Torkildsby, A. B. (2022). An
analysis of web content accessibility of municipality
websites for people with disabilities in Norway:
Web accessibility of Norwegian municipality web-
WEBIST 2025 - 21st International Conference on Web Information Systems and Technologies