Hybrid Model of Data Augmentation Methods for Text Classification Task

Jia Hui Feng, Mahsa Mohaghegh

2021

Abstract

Data augmentation techniques have been increasingly explored in natural language processing to generate additional textual data for training. However, the performance gain of existing techniques is often marginal. This paper explores combining two EDA (Easy Data Augmentation) methods, random swap and random deletion, and evaluates the resulting hybrid model on text classification. The classification tasks were conducted using a CNN text classifier on a portion of the SST-2 (Stanford Sentiment Treebank) dataset. The results show that this hybrid model achieves lower accuracy than the benchmark. The research can be continued with different combinations of methods and with experiments on larger datasets.
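The two EDA operations named in the abstract can be sketched in Python as follows. This is a minimal sketch that assumes the standard EDA definitions: random swap exchanges the positions of two randomly chosen words, and random deletion drops each word independently with probability p. The abstract does not say how the two methods are combined, so `hybrid_augment`, the swap-then-delete order, and the `alpha`, `n_aug` and `seed` parameters are hypothetical choices for illustration, not the authors' configuration.

```python
import random
from typing import List


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """EDA random swap: exchange two randomly chosen word positions, n_swaps times."""
    new_words = list(words)
    if len(new_words) < 2:
        return new_words  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(new_words)), 2)
        new_words[i], new_words[j] = new_words[j], new_words[i]
    return new_words


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA random deletion: drop each word independently with probability p."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    # Keep at least one word so the augmented sentence is never empty.
    if not kept:
        return [rng.choice(words)]
    return kept


def hybrid_augment(sentence: str, alpha: float = 0.1, n_aug: int = 4,
                   seed: int = 0) -> List[str]:
    """Hypothetical hybrid (assumption): random swap, then random deletion, per copy.

    alpha sets both the number of swaps (as a fraction of sentence length)
    and the deletion probability; n_aug is the number of augmented copies.
    """
    rng = random.Random(seed)
    words = sentence.split()
    n_swaps = max(1, int(alpha * len(words)))
    augmented = []
    for _ in range(n_aug):
        swapped = random_swap(words, n_swaps, rng)
        deleted = random_deletion(swapped, alpha, rng)
        augmented.append(" ".join(deleted))
    return augmented


if __name__ == "__main__":
    original = "a gripping and often funny look at life"
    for s in hybrid_augment(original):
        print(s)
```

Applying deletion after swapping means each augmented copy is both reordered and shortened. Applying each method to separate copies of the sentence would be an equally plausible reading of "hybrid"; the sketch does not settle which one the paper used.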



Paper Citation


in Harvard Style

Feng J. and Mohaghegh M. (2021). Hybrid Model of Data Augmentation Methods for Text Classification Task. In Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 3: KMIS; ISBN 978-989-758-533-3, SciTePress, pages 194-197. DOI: 10.5220/0010688500003064


in Bibtex Style

@conference{kmis21,
author={Jia Hui Feng and Mahsa Mohaghegh},
title={Hybrid Model of Data Augmentation Methods for Text Classification Task},
booktitle={Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 3: KMIS},
year={2021},
pages={194-197},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010688500003064},
isbn={978-989-758-533-3},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 3: KMIS
TI  - Hybrid Model of Data Augmentation Methods for Text Classification Task
SN  - 978-989-758-533-3
AU  - Feng J.
AU  - Mohaghegh M.
PY  - 2021
SP  - 194
EP  - 197
DO  - 10.5220/0010688500003064
PB  - SciTePress
ER  - 