MCCD: Generating Human Natural Language Conversational Datasets

Topics: Collaborative and Social Interaction; Conversational Agents; Coupling and Integrating Heterogeneous Data Sources; Data Mining; Guidelines, Principles, Patterns and Standards; Natural Language Interfaces to Intelligent Systems; Tools, Techniques and Methodologies for System Development; User Behavior Analysis; Web Services

Authors: Matheus F. Sanches 1 ; Jader M. C. de Sá 1 ; Allan M. de Souza 1 ; Diego A. Silva 2 ; Rafael R. de Souza 1 ; Julio C. dos Reis 1 and Leandro A. Villas 1

Affiliations: 1 Institute of Computing, University of Campinas, São Paulo, Brazil ; 2 CI&T, Brazil

Keyword(s): Natural Language Processing, Data Wrangling, Data Acquisition, Human Conversation, Model Learning, Tool.

Abstract: In recent years, state-of-the-art problems related to Natural Language Processing (NLP) have been extensively explored, including better models for text generation and text understanding. These solutions depend heavily on data, such as dialogues, to train models. The scarcity of data in a specific language significantly limits the datasets available, and the problem worsens when large amounts of data are required to build solutions for a particular domain. This investigation proposes MCCD, a methodology to extract human conversational datasets from several data sources. MCCD identifies different answers to the same message, differentiating the various conversation flows, which enables the resulting dataset to be used in more applications. Datasets generated by MCCD can train models for different purposes, such as Questions & Answers (QA) and open-domain conversational agents. We developed a complete software tool to implement and evaluate our proposal, and we applied our solution to extract human conversations from two datasets in the Portuguese language.

CC BY-NC-ND 4.0

Paper citation in several formats:
Sanches, M.; C. de Sá, J.; M. de Souza, A.; Silva, D.; R. de Souza, R.; Reis, J. and Villas, L. (2022). MCCD: Generating Human Natural Language Conversational Datasets. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS; ISBN 978-989-758-569-2; ISSN 2184-4992, SciTePress, pages 247-255. DOI: 10.5220/0011077400003179

@conference{iceis22,
author={Matheus F. Sanches and Jader M. C. de Sá and Allan M. de Souza and Diego A. Silva and Rafael R. de Souza and Julio C. dos Reis and Leandro A. Villas},
title={MCCD: Generating Human Natural Language Conversational Datasets},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2022},
pages={247-255},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011077400003179},
isbn={978-989-758-569-2},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - MCCD: Generating Human Natural Language Conversational Datasets
SN - 978-989-758-569-2
IS - 2184-4992
AU - Sanches, M.
AU - C. de Sá, J.
AU - M. de Souza, A.
AU - Silva, D.
AU - R. de Souza, R.
AU - Reis, J.
AU - Villas, L.
PY - 2022
SP - 247
EP - 255
DO - 10.5220/0011077400003179
PB - SciTePress