Authors:
M. Julia Flores and José A. Gámez
Affiliation:
University of Castilla - La Mancha, Spain
Keyword(s):
Bayesian Networks, Supervised Classification, Data Mining, Imbalanced Datasets, Naive Bayes.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Bayesian Networks; Computational Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems; Uncertainty in AI
Abstract:
In this paper we present a study of the behaviour of some representative Bayesian Networks Classifiers
(BNCs) when the dataset they are learned from is imbalanced, that is, when there are far fewer cases
labelled with one class value than with the other (assuming binary classification problems). This
is a typical source of trouble in some datasets, and the development of more robust techniques is currently
very important. In this study, we have selected a benchmark of 129 imbalanced datasets and carried out an
analysis focusing on BNCs. Our results show good performance of these classifiers, which outperform
decision trees (C4.5). Finally, an algorithm to improve the performance of any BNC is also given. We
have carried out experiments in which the minority class is oversampled to
achieve a desired value of the imbalance ratio (IR), which is the number of cases in the
majority class divided by the number of cases in the minority class. From this work we can conclude that BNCs show very
good performance on imbalanced datasets, and that our proposal enhances their results on the datasets
on which they performed poorly.