Authors:
Camelia Vidrighin Bratu and Rodica Potolea
Affiliation:
Technical University of Cluj-Napoca, Romania
Keyword(s):
Preprocessing, Unified Methodology, Feature Selection, Data Imputation.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Biomedical Engineering; Biomedical Signal Processing; Business Analytics; Computational Intelligence; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Engineering and Technology Applications; Health Information Systems; Human-Computer Interaction; Methodologies and Methods; Neural Network Software and Applications; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Theory and Methods
Abstract:
Data-related issues represent the main obstacle to obtaining a high-quality data mining process. Existing strategies for preprocessing the available data usually focus on a single aspect, such as incompleteness, dimensionality, or filtering out “harmful” attributes. In this paper we propose a unified methodology for data preprocessing, which considers several aspects at the same time. The novelty of the approach consists in enhancing the data imputation step with information from the feature selection step, and performing both operations jointly, as two phases of the same activity. The methodology performs data imputation only on the attributes which are optimal for the class (from the feature selection point of view). Imputation is performed using machine learning methods. When imputing values for a given attribute, the optimal subset of features for that attribute is considered. The methodology is not restricted to the use of a particular technique, but can be applied using any existing data imputation and feature selection methods.
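The flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract leaves both techniques open, so this sketch assumes a simple correlation-ranking filter for feature selection and a k-nearest-neighbour mean for imputation. All names (`select_features`, `knn_impute`, `preprocess`, `k_class`, `k_attr`) and the toy data are hypothetical, and restricting each attribute's predictors to the other features (excluding the class) is likewise an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def select_features(columns, target, candidates, k):
    """Assumed stand-in filter: keep the k candidates most correlated with target."""
    ranked = sorted(candidates,
                    key=lambda c: abs(pearson(columns[c], target)),
                    reverse=True)
    return ranked[:k]

def knn_impute(columns, attr, predictors, n_neighbors=2):
    """Fill missing values of attr with the mean of the nearest donor rows,
    measuring distance only on the given predictor columns."""
    values = columns[attr]
    rows = range(len(values))

    def usable(i):
        return all(columns[p][i] is not None for p in predictors)

    donors = [i for i in rows if values[i] is not None and usable(i)]
    filled = list(values)
    for i in rows:
        if values[i] is None and usable(i) and donors:
            nearest = sorted(
                donors,
                key=lambda j: sum((columns[p][i] - columns[p][j]) ** 2
                                  for p in predictors),
            )[:n_neighbors]
            filled[i] = sum(values[j] for j in nearest) / len(nearest)
    return filled  # rows with missing predictors stay missing

def preprocess(columns, labels, k_class=2, k_attr=1):
    """Phase 1: pick the feature subset for the class.
    Phase 2: impute only those attributes, each from its own predictor subset."""
    names = list(columns)
    class_subset = select_features(columns, labels, names, k_class)
    result = {}
    for attr in class_subset:
        others = [c for c in names if c != attr]
        predictors = select_features(columns, columns[attr], others, k_attr)
        result[attr] = knn_impute(columns, attr, predictors)
    return result

# Toy data: None marks a missing value.
columns = {
    "a": [1.0, 2.0, None, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "noise": [3.0, 1.0, 2.0, 3.0, 1.0],
}
labels = [0, 0, 1, 1, 1]
result = preprocess(columns, labels)
print(result)
```

On this toy data the filter drops `noise`, and the missing value in `a` is filled from its two nearest rows as measured on `b`. Either stand-in could be swapped for another selection or imputation method without changing the structure of `preprocess`.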