Paper

Authors: Carlos Rocha ; Jonatas Grosman ; Fernando Correia ; Venicius Rego and Hélio Lopes

Affiliation: Department of Informatics, PUC-Rio, Marquês de São Vicente, 225 RDC, 4th floor - Gávea, Rio de Janeiro, Brazil

Keyword(s): Data Annotation, Large Language Model, Visual Question-Answering, Documents, Machine Learning.

Abstract: Documents are crucial to economic and academic systems, yet extracting information from them can be complex and time-consuming. Visual Question Answering (VQA) models address this challenge by using natural language prompts to extract information. However, their development depends on annotated datasets, which are costly to produce. To address this cost, we propose a four-step process that combines Computer Vision Models and Large Language Models (LLMs) for VQA data annotation in financial reports. The method starts with Document Layout Analysis and Table Structure Extraction to identify document structures. It then uses two distinct LLMs, one to generate question-answer pairs and one to evaluate them, automating the construction and selection of the best pairs for the final dataset. We found Mixtral-8x22B and GPT-4o mini to be the most cost-effective models for generating pairs, while Claude 3.5 Sonnet performed best for evaluation, aligning closely with human assessments.
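The four steps named in the abstract can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `annotate_page`, `QAPair`, the `min_score` threshold, and all `toy_*` functions are hypothetical names, and the toy functions stand in for the computer vision models and LLMs, which this page does not specify in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class QAPair:
    question: str
    answer: str
    score: float = 0.0  # filled in by the evaluator model


def annotate_page(
    page: str,
    detect_layout: Callable[[str], List[str]],
    extract_tables: Callable[[str, Sequence[str]], List[str]],
    generate_pairs: Callable[[Sequence[str], Sequence[str]], List[QAPair]],
    score_pair: Callable[[QAPair], float],
    min_score: float = 0.5,  # assumed cut-off; the abstract gives no threshold
) -> List[QAPair]:
    """Run the four steps in order and keep the best-scored pairs."""
    regions = detect_layout(page)                 # 1. Document Layout Analysis
    tables = extract_tables(page, regions)        # 2. Table Structure Extraction
    candidates = generate_pairs(regions, tables)  # 3. generator LLM
    for pair in candidates:                       # 4. evaluator LLM
        pair.score = score_pair(pair)
    kept = [p for p in candidates if p.score >= min_score]
    return sorted(kept, key=lambda p: p.score, reverse=True)


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_layout(page: str) -> List[str]:
    # Treat every non-blank line as one layout region.
    return [line for line in page.splitlines() if line.strip()]


def toy_tables(page: str, regions: Sequence[str]) -> List[str]:
    # Treat regions containing a pipe as table rows.
    return [r for r in regions if "|" in r]


def toy_generate(regions: Sequence[str], tables: Sequence[str]) -> List[QAPair]:
    pairs = []
    for row in tables:
        label, value = (cell.strip() for cell in row.split("|")[:2])
        pairs.append(QAPair(f"What does the row '{label}' report?", value))
    return pairs


def toy_score(pair: QAPair) -> float:
    # Reject pairs whose answer is empty; accept everything else.
    return 1.0 if pair.answer else 0.0


result = annotate_page(
    "Annual report\nRevenue | 120\nNotes | ",
    toy_layout, toy_tables, toy_generate, toy_score,
)
```

Passing the four stages in as callables keeps the generator and evaluator models independent, which mirrors the abstract's use of two distinct LLMs and makes it easy to swap one model for another when comparing cost and quality.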

CC BY-NC-ND 4.0

Paper citation in several formats:
Rocha, C., Grosman, J., Correia, F., Rego, V. and Lopes, H. (2025). A Data Annotation Approach Using Large Language Models. In Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-749-8; ISSN 2184-4992, SciTePress, pages 748-755. DOI: 10.5220/0013280100003929

@conference{iceis25,
author={Carlos Rocha and Jonatas Grosman and Fernando Correia and Venicius Rego and Hélio Lopes},
title={A Data Annotation Approach Using Large Language Models},
booktitle={Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2025},
pages={748-755},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013280100003929},
isbn={978-989-758-749-8},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Data Annotation Approach Using Large Language Models
SN - 978-989-758-749-8
IS - 2184-4992
AU - Rocha, C.
AU - Grosman, J.
AU - Correia, F.
AU - Rego, V.
AU - Lopes, H.
PY - 2025
SP - 748
EP - 755
DO - 10.5220/0013280100003929
PB - SciTePress