A COMPREHENSIVE STUDY OF THE EFFECT OF CLASS IMBALANCE ON THE PERFORMANCE OF CLASSIFIERS

Rodica Potolea, Camelia Lemnaru

Abstract

Class imbalance is one of the significant issues that affect the performance of classifiers. In this paper we systematically analyze the effect of class imbalance on several standard classification algorithms. The study is performed on benchmark datasets and relates the effect of imbalance to concept complexity, the size of the training set, and the ratio between the number of instances and the number of attributes in the training data. The evaluation considers six different metrics. The results indicate that the multilayer perceptron is the most robust to imbalance in the training data, while the support vector machine’s performance is the most affected. We also found that unpruned C4.5 models perform better than their pruned versions.
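
As an illustration of two of the dataset properties named in the abstract, the sketch below computes an imbalance ratio and the ratio between the number of instances and the number of attributes. The function name `dataset_descriptors` and the definition of the imbalance ratio as majority count divided by minority count are assumptions made for this example; the abstract does not state the exact definitions used in the paper.

```python
from collections import Counter


def dataset_descriptors(labels, n_attributes):
    """Return (imbalance_ratio, instances_per_attribute) for a training set.

    Assumption: imbalance ratio = size of the largest class divided by the
    size of the smallest class. The paper may define it differently.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    if n_attributes <= 0:
        raise ValueError("n_attributes must be positive")

    majority = max(counts.values())
    minority = min(counts.values())
    imbalance_ratio = majority / minority

    # Ratio between the number of instances and the number of attributes.
    instances_per_attribute = len(labels) / n_attributes
    return imbalance_ratio, instances_per_attribute


# Hypothetical training set: 90 negative and 10 positive instances, 20 attributes.
labels = ["neg"] * 90 + ["pos"] * 10
ir, ipa = dataset_descriptors(labels, n_attributes=20)
print(ir, ipa)
```

Under these assumed definitions, a larger first value means a more skewed class distribution, and a smaller second value means fewer training instances are available per attribute.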



Paper Citation


in Harvard Style

Potolea R. and Lemnaru C. (2011). A COMPREHENSIVE STUDY OF THE EFFECT OF CLASS IMBALANCE ON THE PERFORMANCE OF CLASSIFIERS. In Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8425-53-9, pages 14-21. DOI: 10.5220/0003415800140021


in Bibtex Style

@conference{iceis11,
author={Rodica Potolea and Camelia Lemnaru},
title={A COMPREHENSIVE STUDY OF THE EFFECT OF CLASS IMBALANCE ON THE PERFORMANCE OF CLASSIFIERS},
booktitle={Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2011},
pages={14-21},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003415800140021},
isbn={978-989-8425-53-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A COMPREHENSIVE STUDY OF THE EFFECT OF CLASS IMBALANCE ON THE PERFORMANCE OF CLASSIFIERS
SN - 978-989-8425-53-9
AU - Potolea R.
AU - Lemnaru C.
PY - 2011
SP - 14
EP - 21
DO - 10.5220/0003415800140021