COST SENSITIVE AND PREPROCESSING FOR CLASSIFICATION WITH IMBALANCED DATA-SETS: SIMILAR BEHAVIOUR AND POTENTIAL HYBRIDIZATIONS

Victoria López; Alberto Fernández; María José del Jesus; Francisco Herrera

Topics: Classification; Cost-Sensitive Learning; Learning and Adaptive Control

In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, 98-107, 2012, Vilamoura, Algarve, Portugal

Authors: Victoria López ¹; Alberto Fernández ²; María José del Jesus ² and Francisco Herrera ¹

Affiliations: ¹ University of Granada, Spain; ² University of Jaén, Spain

Keyword(s): Classification, Imbalanced Data-sets, Preprocessing, Sampling, Cost-Sensitive Learning, Hybridizations

Related Ontology Subjects/Areas/Topics: Applications ; Classification ; Cost-Sensitive Learning ; Learning and Adaptive Control ; Pattern Recognition ; Software Engineering ; Theory and Methods

Abstract: Classification with imbalanced data-sets has posed a serious challenge for researchers in recent years. The main difficulty arises in the large number of real applications in which one class of the problem has far fewer examples than the other, making it harder to learn correctly; more importantly, this minority class is usually the one of greatest interest. To address this problem, two main methodologies have been proposed for stressing the significance of the minority class and achieving good discrimination for both classes: preprocessing of instances and cost-sensitive learning. The former rebalances the classes by replicating or creating new instances of the minority class (oversampling) or by removing some instances of the majority class (undersampling), whereas the latter assigns higher misclassification costs to samples in the minority class and seeks to minimize the high-cost errors. Both solutions have been shown to be valid for dealing with the class imbalance problem but, to the best of our knowledge, no comparison between the two approaches has ever been performed. In this work, we carry out an exhaustive analysis of these two methodologies, also including a hybrid procedure that tries to combine the best of these models. We show, by means of a statistical comparative analysis carried out on a collection of more than 60 imbalanced data-sets, that no single approach stands out from the rest, and we discuss the use of hybridizations as a potential research line for achieving better solutions to the imbalanced data-set problem.
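The two families of methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the binary-class setting, the purely random sampling, and the choice of setting the minority misclassification cost equal to the imbalance ratio are all assumptions made here for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, minority, seed=0):
    """Replicate randomly chosen minority instances until both classes match in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    n_extra = max(0, n_majority - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, minority, seed=0):
    """Drop randomly chosen majority instances until both classes match in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n_keep = min(len(majority_idx), len(minority_idx))
    kept = rng.sample(majority_idx, n_keep)
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def misclassification_costs(y, minority):
    """Assumed cost scheme: minority errors cost the imbalance ratio, majority errors cost 1."""
    counts = Counter(y)
    n_minority = counts[minority]
    n_majority = len(y) - n_minority
    return {
        label: (n_majority / n_minority if label == minority else 1.0)
        for label in counts
    }


def total_cost(y_true, y_pred, costs):
    """Sum the cost of every error, charged at the cost of the true class."""
    return sum(costs[t] for t, p in zip(y_true, y_pred) if t != p)


if __name__ == "__main__":
    # Toy data: 2 minority examples (label 1) and 8 majority examples (label 0).
    X = [[i] for i in range(10)]
    y = [1, 1] + [0] * 8

    _, y_over = random_oversample(X, y, minority=1)
    _, y_under = random_undersample(X, y, minority=1)
    costs = misclassification_costs(y, minority=1)

    print(Counter(y_over))
    print(Counter(y_under))
    print(costs)
    # A classifier that always predicts the majority class makes only two errors,
    # but each one is charged at the higher minority cost.
    print(total_cost(y, [0] * len(y), costs))
```

Under this assumed cost scheme, always predicting the majority class and always predicting the minority class incur the same total cost on the toy data, which is one way to see how cost-sensitive learning removes the incentive to ignore the minority class.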

CC BY-NC-ND 4.0

Paper citation in several formats:

López, V.; Fernández, A.; José del Jesus, M. and Herrera, F. (2012). COST SENSITIVE AND PREPROCESSING FOR CLASSIFICATION WITH IMBALANCED DATA-SETS: SIMILAR BEHAVIOUR AND POTENTIAL HYBRIDIZATIONS. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-8425-99-7; ISSN 2184-4313, SciTePress, pages 98-107. DOI: 10.5220/0003751600980107

@conference{icpram12,
author={Victoria López and Alberto Fernández and María {José del Jesus} and Francisco Herrera},
title={COST SENSITIVE AND PREPROCESSING FOR CLASSIFICATION WITH IMBALANCED DATA-SETS: SIMILAR BEHAVIOUR AND POTENTIAL HYBRIDIZATIONS},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={98-107},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003751600980107},
isbn={978-989-8425-99-7},
issn={2184-4313},
}

TY - CONF

JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - COST SENSITIVE AND PREPROCESSING FOR CLASSIFICATION WITH IMBALANCED DATA-SETS: SIMILAR BEHAVIOUR AND POTENTIAL HYBRIDIZATIONS
SN - 978-989-8425-99-7
IS - 2184-4313
AU - López, V.
AU - Fernández, A.
AU - José del Jesus, M.
AU - Herrera, F.
PY - 2012
SP - 98
EP - 107
DO - 10.5220/0003751600980107
PB - SciTePress