Authors: Victoria López (1); Alberto Fernández (2); María José del Jesus (2) and Francisco Herrera (1)

Affiliations: (1) University of Granada, Spain; (2) University of Jaén, Spain

Keyword(s): Classification, Imbalanced Data-sets, Preprocessing, Sampling, Cost-Sensitive Learning, Hybridizations

Related Ontology Subjects/Areas/Topics: Applications; Classification; Cost-Sensitive Learning; Learning and Adaptive Control; Pattern Recognition; Software Engineering; Theory and Methods

Abstract: Classification with imbalanced data-sets has posed a serious challenge for researchers in recent years. The main difficulty is that in a large number of real applications one class of the problem has far fewer examples than the other, which makes it harder to learn correctly; moreover, this minority class is usually the one of greatest interest. To address this problem, two main methodologies have been proposed for stressing the significance of the minority class and achieving good discrimination for both classes: preprocessing of instances and cost-sensitive learning. The former rebalances the class distribution either by replicating or creating new instances of the minority class (oversampling) or by removing some instances of the majority class (undersampling); the latter assigns higher misclassification costs to samples of the minority class and seeks to minimize the high-cost errors. Both solutions have been shown to be valid for dealing with the class imbalance problem but, to the best of our knowledge, no comparison between the two approaches has ever been performed. In this work, we carry out an exhaustive analysis of these two methodologies, also including a hybrid procedure that tries to combine the strengths of both. By means of a statistical comparative analysis over a large collection of more than 60 imbalanced data-sets, we show that no single approach stands out over the rest, and we discuss the use of hybridizations as a potential research line for achieving better solutions to the imbalanced data-set problem.
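The preprocessing and cost-sensitive strategies contrasted in the abstract can be illustrated with a minimal Python sketch. It assumes a binary problem with features and labels held in plain lists; the function names (`random_oversample`, `random_undersample`, `misclassification_cost`) are hypothetical and do not come from the paper. Oversampling here only replicates existing minority examples at random (synthetic generation, e.g. SMOTE, is not shown), and using the imbalance ratio as the minority misclassification cost is a common heuristic assumed for illustration, not necessarily the authors' exact setup.

```python
import random
from collections import Counter


def class_counts(y):
    """Return a dict mapping each class label to its number of examples."""
    return dict(Counter(y))


def random_oversample(X, y, minority, seed=0):
    """Replicate randomly chosen minority examples until the classes are equal in size.
    Assumes a binary problem (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority_size = max(n for label, n in counts.items() if label != minority)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # Draw minority indices with replacement to fill the gap.
    extra = [rng.choice(minority_idx) for _ in range(majority_size - counts[minority])]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


def random_undersample(X, y, minority, seed=0):
    """Randomly discard majority examples until the classes are equal in size.
    Assumes a binary problem (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Keep a random subset of the majority class, without replacement.
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = sorted(minority_idx + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


def misclassification_cost(y_true, y_pred, minority, minority_cost, majority_cost=1.0):
    """Total cost when an error on a minority example costs `minority_cost`
    and an error on a majority example costs `majority_cost`."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t != p:
            cost += minority_cost if t == minority else majority_cost
    return cost


# Toy data: 8 majority examples (label 0) and 2 minority examples (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
ir = 8 / 2  # imbalance ratio, assumed here as the minority misclassification cost

X_over, y_over = random_oversample(X, y, minority=1)
print(class_counts(y_over))    # {0: 8, 1: 8}

X_under, y_under = random_undersample(X, y, minority=1)
print(class_counts(y_under))   # {0: 2, 1: 2}

# A trivial classifier that always predicts the majority class.
y_pred = [0] * 10
print(misclassification_cost(y, y_pred, minority=1, minority_cost=ir))  # 8.0
```

The trivial majority-class predictor reaches 80% accuracy on this toy set, yet under the assumed costs its two minority errors weigh as much as eight majority errors would, which is the pressure cost-sensitive learning places on the learner instead of altering the data.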

CC BY-NC-ND 4.0

Paper citation in several formats:
López, V.; Fernández, A.; José del Jesus, M. and Herrera, F. (2012). COST SENSITIVE AND PREPROCESSING FOR CLASSIFICATION WITH IMBALANCED DATA-SETS: SIMILAR BEHAVIOUR AND POTENTIAL HYBRIDIZATIONS. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-8425-99-7; ISSN 2184-4313, SciTePress, pages 98-107. DOI: 10.5220/0003751600980107

@conference{icpram12,
author={Victoria López and Alberto Fernández and María {José del Jesus} and Francisco Herrera},
title={COST SENSITIVE AND PREPROCESSING FOR CLASSIFICATION WITH IMBALANCED DATA-SETS: SIMILAR BEHAVIOUR AND POTENTIAL HYBRIDIZATIONS},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={98-107},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003751600980107},
isbn={978-989-8425-99-7},
issn={2184-4313},
}

TY  - CONF
JO  - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI  - COST SENSITIVE AND PREPROCESSING FOR CLASSIFICATION WITH IMBALANCED DATA-SETS: SIMILAR BEHAVIOUR AND POTENTIAL HYBRIDIZATIONS
SN  - 978-989-8425-99-7
IS  - 2184-4313
AU  - López, V.
AU  - Fernández, A.
AU  - José del Jesus, M.
AU  - Herrera, F.
PY  - 2012
SP  - 98
EP  - 107
DO  - 10.5220/0003751600980107
PB  - SciTePress
ER  - 