Paper

Authors: Carlos Alberto Alvares Rocha (1, 2); Marcos Vinícius Pinheiro Dib (1, 3); Li Weigang (1, 3); Andrea Ferreira Portela Nunes (1, 4); Allan Victor Almeida Faria (1, 5); Daniel Oliveira Cajueiro (1, 6); Maísa Kely de Melo (1, 7) and Victor Rafael Rezende Celestino (8, 1)

Affiliations: (1) LAMFO - Lab. of ML in Finance and Organizations, University of Brasilia, Campus Darcy Ribeiro, Brasilia, Brazil; (2) PPMEC, Faculty of Technology, University of Brasilia, Federal District, Brazil; (3) TransLab, Department of Computer Science, University of Brasilia, Campus Darcy Ribeiro, Brasilia, Brazil; (4) Ministry of Science, Technology and Innovation of Brazil, Federal District, Brazil; (5) Department of Statistics, University of Brasília, Federal District, Brazil; (6) Department of Economics, University of Brasilia, Federal District, Brazil; (7) Department of Mathematics, Instituto Federal de Minas Gerais Campus Formiga, Formiga, Brazil; (8) Department of Business Administration, University of Brasilia, Federal District, Brazil

Keyword(s): CNN, Deep Learning, MCTI, Longformer, Web Long-text Classification, LSTM, Transfer-learning, Word2vec.

Abstract: Text classification is a traditional problem in Natural Language Processing (NLP). Most state-of-the-art implementations require large volumes of high-quality labeled data. Models pre-trained on large corpora have proven beneficial for text classification and other NLP tasks, but they can accept only a limited number of symbols as input. This real-world case study explores different machine learning strategies for classifying a small amount of long, unstructured, and uneven data, with the aim of finding a method that performs well. The collected data consist of texts describing financing opportunities that international R&D funding organizations published on their websites. The main goal is to identify international R&D funding for which Brazilian researchers are eligible; the work is sponsored by the Ministry of Science, Technology and Innovation (MCTI). We use pre-training and word embedding solutions to learn word relationships from other, larger datasets of considerable similarity. Then, using the acquired features and the dataset available from MCTI, we apply transfer learning combined with deep learning models to improve the comprehension of each sentence. On the available datasets, the baseline reached 81% accuracy and a Transformer-based approach reached 85%, while the Word2Vec-based approach improved accuracy to 88%. The results serve as a successful case of artificial intelligence in a federal government application.
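The abstract describes reusing word embeddings learned on larger, similar corpora as features for deep learning classifiers such as CNNs and LSTMs. Unlike a pre-trained Transformer, this representation is not bound to a fixed input window. The sketch below illustrates that step under stated assumptions. It is not the authors' pipeline.

- `EMBEDDINGS` is a hypothetical toy table standing in for pre-trained Word2Vec vectors.
- `tokenize`, `to_matrix` and `mean_vector` are illustrative helper names.
- Truncation plus zero-padding and mean pooling are common generic techniques. They are not confirmed as the paper's exact preprocessing.

```python
import re
from typing import Dict, List

# Hypothetical toy vectors standing in for pre-trained Word2Vec embeddings;
# a real pipeline would load vectors trained on a larger, similar corpus.
EMBEDDINGS: Dict[str, List[float]] = {
    "funding": [0.9, 0.1, 0.0],
    "research": [0.8, 0.2, 0.1],
    "call": [0.5, 0.5, 0.0],
    "deadline": [0.1, 0.9, 0.3],
}
DIM = 3  # dimensionality of the toy vectors


def tokenize(text: str) -> List[str]:
    # Lowercase the text and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def to_matrix(text: str, embeddings: Dict[str, List[float]],
              max_len: int, dim: int) -> List[List[float]]:
    """Map a text of any length to a max_len x dim matrix for a CNN/LSTM.

    Out-of-vocabulary words are skipped; long texts are truncated to the
    first max_len known words and short texts are zero-padded.
    """
    vectors = [list(embeddings[t]) for t in tokenize(text) if t in embeddings]
    vectors = vectors[:max_len]
    padding = [[0.0] * dim for _ in range(max_len - len(vectors))]
    return vectors + padding


def mean_vector(text: str, embeddings: Dict[str, List[float]],
                dim: int) -> List[float]:
    """Average the known word vectors: a fixed-size summary of any-length text."""
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    doc = "Research funding call: deadline for the funding call is May."
    print(to_matrix(doc, EMBEDDINGS, max_len=4, dim=DIM))
    print(mean_vector(doc, EMBEDDINGS, DIM))
```

With `max_len=4`, `to_matrix` keeps only the first four known words of the example text. Raising `max_len` covers more of a long funding call at the cost of a larger input. `mean_vector` never truncates, but it discards word order, which sequence models such as LSTMs would otherwise exploit.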

CC BY-NC-ND 4.0

Paper citation in several formats:
Rocha, C.; Dib, M.; Weigang, L.; Nunes, A.; Faria, A.; Cajueiro, D.; Kely de Melo, M. and Celestino, V. (2022). Using Transfer Learning To Classify Long Unstructured Texts with Small Amounts of Labeled Data. In Proceedings of the 18th International Conference on Web Information Systems and Technologies - WEBIST, ISBN 978-989-758-613-2; ISSN 2184-3252, pages 201-213. DOI: 10.5220/0011527700003318

@conference{webist22,
author={Carlos Alberto Alvares Rocha and Marcos Vinícius Pinheiro Dib and Li Weigang and Andrea Ferreira Portela Nunes and Allan Victor Almeida Faria and Daniel Oliveira Cajueiro and Maísa {Kely de Melo} and Victor Rafael Rezende Celestino},
title={Using Transfer Learning To Classify Long Unstructured Texts with Small Amounts of Labeled Data},
booktitle={Proceedings of the 18th International Conference on Web Information Systems and Technologies - WEBIST},
year={2022},
pages={201-213},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011527700003318},
isbn={978-989-758-613-2},
issn={2184-3252},
}

TY  - CONF
JO  - Proceedings of the 18th International Conference on Web Information Systems and Technologies - WEBIST
TI  - Using Transfer Learning To Classify Long Unstructured Texts with Small Amounts of Labeled Data
SN  - 978-989-758-613-2
SN  - 2184-3252
AU  - Rocha, C.
AU  - Dib, M.
AU  - Weigang, L.
AU  - Nunes, A.
AU  - Faria, A.
AU  - Cajueiro, D.
AU  - Kely de Melo, M.
AU  - Celestino, V.
PY  - 2022
SP  - 201
EP  - 213
DO  - 10.5220/0011527700003318
ER  - 