SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network

Reda Al-Bahrani, Dipendra Jha, Qiao Kang, Sunwoo Lee, Zijiang Yang, Wei-Keng Liao, Ankit Agrawal, Alok Choudhary

Abstract

Machine learning models trained on imbalanced datasets tend to produce sub-optimal results because the learning of the minority classes is dominated by the learning of the majority class. Recommendations to overcome this obstacle include oversampling the minority class by synthesizing new instances and using different performance measures. We propose a novel approach to handling imbalance in datasets that uses a sequence-to-sequence recurrent neural network to synthesize minority class instances. The generative neural network is first trained on the minority class instances to learn their data distribution. It is then used to synthesize new minority class instances, which are added to the original dataset to balance the minority class. We evaluate the proposed approach on several imbalanced datasets. We train Decision Tree models on the original and augmented datasets and compare the results against the Synthetic Minority Over-sampling TEchnique (SMOTE), Adaptive Synthetic sampling (ADASYN) and Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC). All results are averaged over multiple runs and compared across four different performance metrics. SIGRNN performs well compared to SMOTE and ADASYN, particularly at lower percentage increments to the minority class. SIGRNN also outperforms SMOTE-NC on datasets with nominal features.
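For readers unfamiliar with the oversampling baselines named in the abstract, the following is a minimal plain-Python sketch of SMOTE-style interpolation and of adding synthetic instances as a percentage increment to the minority class. It does not implement SIGRNN's recurrent generator. The function name `smote_like_oversample`, the parameters `percent`, `k` and `seed`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def smote_like_oversample(minority, percent, k=5, seed=0):
    """Create synthetic minority points by SMOTE-style interpolation.

    percent is the size of the increment relative to the minority class,
    e.g. percent=50 adds half as many synthetic points as there are
    original minority instances (an assumed convention for this sketch).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances to interpolate")
    rng = random.Random(seed)
    n_new = round(len(minority) * percent / 100)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen instance, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two continuous features (hypothetical data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, percent=50, k=2)
augmented_minority = minority + new_points
```

Because every synthetic point lies on a line segment between two existing minority instances, this kind of interpolation applies only to continuous features. Nominal features need a variant such as SMOTE-NC, which the abstract lists as a separate baseline.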

Paper Citation


in Harvard Style

Al-Bahrani R., Jha D., Kang Q., Lee S., Yang Z., Liao W., Agrawal A. and Choudhary A. (2021). SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network. In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-486-2, pages 349-356. DOI: 10.5220/0010348103490356


in Bibtex Style

@conference{icpram21,
author={Reda Al-Bahrani and Dipendra Jha and Qiao Kang and Sunwoo Lee and Zijiang Yang and Wei-Keng Liao and Ankit Agrawal and Alok Choudhary},
title={SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network},
booktitle={Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2021},
pages={349-356},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010348103490356},
isbn={978-989-758-486-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network
SN - 978-989-758-486-2
AU - Al-Bahrani R.
AU - Jha D.
AU - Kang Q.
AU - Lee S.
AU - Yang Z.
AU - Liao W.
AU - Agrawal A.
AU - Choudhary A.
PY - 2021
SP - 349
EP - 356
DO - 10.5220/0010348103490356