SMOTE: Are We Learning to Classify or to Detect Synthetic Data?

Nada Boudegzdame, Karima Sedki, Rosy Tspora, Jean-Baptiste Lamy

2024

Abstract

Oversampling algorithms are used as a preprocessing step in machine learning when data are highly imbalanced, in an attempt to balance the number of samples per class and thereby improve the quality of the learned models. While oversampling can improve the performance of classification models on minority classes, it can also introduce several problems. Our work revealed that the models learn to detect the noise added by the oversampling algorithms instead of the underlying patterns. In this article, we define oversampling and present the most common techniques before proposing a method for evaluating oversampling algorithms.


Paper Citation


in Harvard Style

Boudegzdame N., Sedki K., Tspora R. and Lamy J. (2024). SMOTE: Are We Learning to Classify or to Detect Synthetic Data? In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4, SciTePress, pages 283-290. DOI: 10.5220/0012325300003636


in Bibtex Style

@conference{icaart24,
author={Nada Boudegzdame and Karima Sedki and Rosy Tspora and Jean-Baptiste Lamy},
title={SMOTE: Are We Learning to Classify or to Detect Synthetic Data?},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={283-290},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012325300003636},
isbn={978-989-758-680-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - SMOTE: Are We Learning to Classify or to Detect Synthetic Data?
SN - 978-989-758-680-4
AU - Boudegzdame N.
AU - Sedki K.
AU - Tspora R.
AU - Lamy J.
PY - 2024
SP - 283
EP - 290
DO - 10.5220/0012325300003636
PB - SciTePress