Authors:
Sašo Karakatič; Marjan Heričko and Vili Podgorelec
Affiliation:
UM FERI, Slovenia
Keyword(s):
Classification, Genetic Algorithm, Instance selection, Weighting, Bagging.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Genetic Algorithms; Informatics in Control, Automation and Robotics; Intelligent Control Systems and Optimization; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
An imbalanced or inappropriate dataset can have a negative influence on classification model training. In
this paper we present an evolutionary method that effectively weights or samples the tuples from the training
dataset and tries to minimize the negative effects of inappropriate datasets. A genetic algorithm with a
real-valued genotype is used to evolve the weight or the occurrence number of each learning tuple in the
dataset. This technique is used with individual classifiers and in combination with the ensemble technique of
bagging, where multiple classification models work together in the classification process. We present two
variations: weighting the tuples and sampling the classification tuples. Both variations are experimentally tested
in combination with individual classifiers (the C4.5 and Naive Bayes methods) and in combination with a bagging
ensemble. Results show that both variations are promising techniques, as they produced better classification
models than methods without weighting or sampling, which is also supported by statistical analysis.
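The abstract does not give the algorithm's details, so the following is only a minimal sketch of the idea under stated assumptions. Each individual is a vector of real-valued genes, one per training tuple. In the "weight" variation a gene is used directly as the tuple's weight; in the "sample" variation it is rounded to an integer occurrence count. Fitness is assumed to be validation accuracy. The operators (tournament selection, uniform crossover, Gaussian mutation, elitism), the gene range [0, 2], the bagging combination, and all function names (`train_centroids`, `evolve`, `bagging_predict`, etc.) are hypothetical. A weighted nearest-centroid classifier stands in for C4.5 and Naive Bayes purely to keep the example short and dependency-free.

```python
import random

# Hypothetical stand-in classifier (NOT C4.5 or Naive Bayes): weighted nearest centroid.
def train_centroids(X, y, w):
    """Per-class weighted mean of the tuples; tuples with weight <= 0 are ignored."""
    sums, totals = {}, {}
    for xi, yi, wi in zip(X, y, w):
        if wi <= 0:
            continue
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += wi * v
        totals[yi] = totals.get(yi, 0.0) + wi
    return {c: [v / totals[c] for v in s] for c, s in sums.items()}


def predict(model, x):
    # nearest class centroid by squared Euclidean distance
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def accuracy(model, X, y):
    if not model:
        return 0.0  # every tuple got weight 0: no usable model
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)


def to_weights(genes, mode):
    # "weight": real-valued genes are used directly as tuple weights.
    # "sample": genes are rounded to integer occurrence counts; for this
    # model a count k is equivalent to repeating the tuple k times.
    return list(genes) if mode == "weight" else [round(g) for g in genes]


def tournament(rng, scored, k=3):
    # return the genes of the fittest of k random (fitness, genes) pairs
    return max(rng.sample(scored, k), key=lambda t: t[0])[1]


def evolve(X_tr, y_tr, X_val, y_val, mode="weight", pop=30, gens=40,
           upper=2.0, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X_tr)

    def fitness(genes):
        # assumed fitness: validation accuracy of a model trained with these weights
        model = train_centroids(X_tr, y_tr, to_weights(genes, mode))
        return accuracy(model, X_val, y_val)

    population = [[rng.uniform(0.0, upper) for _ in range(n)] for _ in range(pop)]
    population[0] = [1.0] * n  # include the unweighted baseline
    for _ in range(gens):
        scored = sorted(((fitness(g), g) for g in population),
                        key=lambda t: t[0], reverse=True)
        next_pop = [g for _, g in scored[:2]]  # elitism: keep the two best
        while len(next_pop) < pop:
            a, b = tournament(rng, scored), tournament(rng, scored)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [min(upper, max(0.0, g + rng.gauss(0.0, 0.2)))  # Gaussian mutation
                     if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        population = next_pop
    best = max(population, key=fitness)
    return to_weights(best, mode), fitness(best)


def bagging_predict(X_tr, y_tr, weights, x, n_bags=11, seed=0):
    # Assumed combination with bagging: each bag is a bootstrap sample of
    # tuple indices, every sampled tuple keeps its evolved weight, and the
    # bag models vote by simple majority.
    rng = random.Random(seed)
    n = len(X_tr)
    votes = {}
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        model = train_centroids([X_tr[i] for i in idx], [y_tr[i] for i in idx],
                                [weights[i] for i in idx])
        if model:
            label = predict(model, x)
            votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get) if votes else None
```

Because the all-ones baseline is seeded into the initial population and the two best individuals are always carried over, the evolved weights in this sketch can never score worse on the validation set than unweighted training. That is a property of the elitist design, not a claim about the paper's results.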