Multimodal Large Language Models for Portuguese Alternative Text Generation for Images

Víctor Alexsandro Elisiário, Willian Massami Watanabe

2025

Abstract

Since the creation of the Web Content Accessibility Guidelines (WCAG), the Web has become increasingly accessible to people with disabilities. However, related works report that Web developers are not always aware of accessibility specifications, and many Web applications still contain accessibility barriers. Therefore, this work proposes using Multimodal Large Language Models (MLLMs), specifically the Gemini-1.5-Pro model, to generate alternative texts for images, leveraging Google’s Cloud Vision API and contextual information extracted from the HTML of Web pages. To evaluate this approach, a case study was conducted to analyze the perceived relevance of the generated descriptions. Six Master’s students in Computer Science participated in a blind analysis, assessing the relevance of the descriptions produced by the MLLM alongside the original alternative texts provided by the page authors. The evaluations were then compared to measure the relative quality of the two sets of descriptions. The results indicate that the descriptions generated by the MLLM are at least equivalent in perceived relevance to those written by humans. Notably, the best performance was achieved without incorporating additional contextual data. These findings suggest that alternative texts generated by MLLMs can effectively meet the needs of blind and visually impaired users, thereby enhancing their access to Web content.
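
The pipeline described above combines image labels with context taken from the page's HTML before prompting the MLLM. The abstract does not specify which HTML context was extracted or how the prompt was worded, so the following is only a minimal sketch under stated assumptions: it uses Python's standard `html.parser` to collect the page title and the nearest preceding heading for each image, and it assumes Cloud Vision labels have already been obtained elsewhere and are passed in as plain strings (no Google API is called). The names `ImageContextParser` and `build_prompt`, the prompt wording, and the `include_context` flag are all hypothetical. The flag reflects the reported finding that the best results came without additional context.

```python
from html.parser import HTMLParser


class ImageContextParser(HTMLParser):
    """Hypothetical helper: collects the page title and, for each <img>,
    the text of the most recent heading that precedes it.

    The choice of context (title + heading) is an assumption for
    illustration, not the paper's documented extraction procedure.
    """

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []          # one dict per <img>: {"src": ..., "heading": ...}
        self._current = None      # tag whose text is currently being captured
        self._buffer = []
        self._last_heading = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []
        elif tag == "img":
            # The author's existing alt attribute is deliberately ignored,
            # since it is the human baseline the generated text is compared to.
            src = dict(attrs).get("src") or ""
            self.images.append({"src": src, "heading": self._last_heading})

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # normalize whitespace
            if tag == "title":
                self.title = text
            else:
                self._last_heading = text
            self._current = None


def build_prompt(vision_labels, title="", heading="", include_context=True):
    """Assemble a hypothetical text instruction to send with the image."""
    parts = ["Write a concise alternative text in Portuguese for the attached image."]
    if vision_labels:
        parts.append("Labels detected by a vision API: " + ", ".join(vision_labels) + ".")
    if include_context:
        if title:
            parts.append(f"Page title: {title}.")
        if heading:
            parts.append(f"Section heading: {heading}.")
    return "\n".join(parts)


# Usage with a toy page; the labels stand in for a Cloud Vision response.
html = """<html><head><title>Recipes</title></head>
<body><h2>Carrot cake</h2><img src="cake.jpg" alt="photo"></body></html>"""
parser = ImageContextParser()
parser.feed(html)
image = parser.images[0]
prompt = build_prompt(["Cake", "Food"], parser.title, image["heading"])
prompt_no_context = build_prompt(["Cake", "Food"], parser.title, image["heading"],
                                 include_context=False)
print(prompt)
```

Keeping context extraction separate from prompt assembly makes it easy to compare the "with context" and "without context" variants on the same images, which is the kind of comparison the case study reports.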

Paper Citation


in Harvard Style

Elisiário V. and Watanabe W. (2025). Multimodal Large Language Models for Portuguese Alternative Text Generation for Images. In Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-772-6, SciTePress, pages 493-501. DOI: 10.5220/0013673800003985


in Bibtex Style

@conference{webist25,
author={Víctor Elisiário and Willian Watanabe},
title={Multimodal Large Language Models for Portuguese Alternative Text Generation for Images},
booktitle={Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2025},
pages={493-501},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013673800003985},
isbn={978-989-758-772-6},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Multimodal Large Language Models for Portuguese Alternative Text Generation for Images
SN - 978-989-758-772-6
AU - Elisiário V.
AU - Watanabe W.
PY - 2025
SP - 493
EP - 501
DO - 10.5220/0013673800003985
PB - SciTePress