Authors:
Carlos Rocha; Jonatas Grosman; Fernando Correia; Venicius Rego and Hélio Lopes
Affiliation:
Department of Informatics, PUC-Rio, Marquês de São Vicente, 225 RDC, 4th floor - Gávea, Rio de Janeiro, Brazil
Keyword(s):
Data Annotation, Large Language Model, Visual Question-Answering, Documents, Machine Learning.
Abstract:
Documents are crucial to economic and academic systems, yet extracting information from them can be complex and time-consuming. Visual Question Answering (VQA) models address this challenge by using natural language prompts to extract information. However, their development depends on annotated datasets, which are costly to produce. To address this challenge, we propose a four-step process that combines Computer Vision models and Large Language Models (LLMs) for VQA data annotation in financial reports. The method starts with Document Layout Analysis and Table Structure Extraction to identify document structures. It then uses two distinct LLMs to generate and evaluate question-answer pairs, automating the construction and selection of the best pairs for the final dataset. As a result, we found Mixtral-8x22B and GPT-4o mini to be the most cost-effective for generating pairs, while Claude 3.5 Sonnet performed best for evaluation, aligning closely with human assessments.
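Illustrative note (not part of the paper): the abstract describes a generate-then-evaluate annotation loop over extracted document regions. The minimal Python sketch below only illustrates that flow under stated assumptions; extract_regions, generator_llm, evaluator_llm, and the 0.8 acceptance threshold are hypothetical stand-ins, not the authors' implementation or any specific model API.

    # Sketch of a two-LLM generate-and-evaluate annotation loop (hypothetical stand-ins throughout).
    from dataclasses import dataclass

    @dataclass
    class QAPair:
        question: str
        answer: str
        score: float = 0.0

    def extract_regions(document_path):
        # Placeholder for Document Layout Analysis + Table Structure Extraction.
        return [{"type": "table", "text": "Revenue 2023: 1.2M"}]

    def generator_llm(region):
        # Placeholder for the generation LLM; returns candidate question-answer pairs for a region.
        return [QAPair(question="What was the revenue in 2023?", answer="1.2M")]

    def evaluator_llm(pair, region):
        # Placeholder for the evaluation LLM; returns a quality score for a candidate pair.
        return 0.9

    def build_dataset(document_path, threshold=0.8):
        dataset = []
        for region in extract_regions(document_path):
            for pair in generator_llm(region):
                pair.score = evaluator_llm(pair, region)
                if pair.score >= threshold:  # keep only pairs the evaluator rates highly
                    dataset.append(pair)
        return dataset

    if __name__ == "__main__":
        print(build_dataset("report.pdf"))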