Authors: Maximilian Meidinger and Matthias Aßenmacher

Affiliation: Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany

Keyword(s): Benchmark, Multi-label Classification, Open-ended Responses, Transfer Learning, Pre-trained Language Models.

Abstract: To evaluate transfer learning models for Natural Language Processing on common ground, numerous general-domain benchmark data sets (and collections of them) have been established over the last few years. The proposed tasks are primarily classification (binary or multi-class), regression, or language generation. However, no benchmark data set for (extreme) multi-label classification relying on full-text inputs has been proposed to date in the area of social science survey research. This is an important gap, as a common data set for algorithm development in this field could lead to more reproducible, sustainable research. We therefore provide a transparent and fully reproducible preparation of the 2008 American National Election Study (ANES) data set, which can be used for benchmark comparisons of different NLP models on the task of multi-label classification. In contrast to other data sets, ours comprises full-text inputs rather than bag-of-words or similar representations. Furthermore, we provide baseline performances of simple logistic regression models as well as performance values for recently established transfer learning architectures, namely BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019), and XLNet (Yang et al., 2019).

CC BY-NC-ND 4.0

Paper citation in several formats:
Meidinger, M. and Aßenmacher, M. (2021). A New Benchmark for NLP in Social Sciences: Evaluating the Usefulness of Pre-trained Language Models for Classifying Open-ended Survey Responses. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-484-8; ISSN 2184-433X, SciTePress, pages 866-873. DOI: 10.5220/0010255108660873

@conference{icaart21,
author={Maximilian Meidinger and Matthias Aßenmacher},
title={A New Benchmark for NLP in Social Sciences: Evaluating the Usefulness of Pre-trained Language Models for Classifying Open-ended Survey Responses},
booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2021},
pages={866-873},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010255108660873},
isbn={978-989-758-484-8},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - A New Benchmark for NLP in Social Sciences: Evaluating the Usefulness of Pre-trained Language Models for Classifying Open-ended Survey Responses
SN - 978-989-758-484-8
IS - 2184-433X
AU - Meidinger, M.
AU - Aßenmacher, M.
PY - 2021
SP - 866
EP - 873
DO - 10.5220/0010255108660873
PB - SciTePress