Authors:
Nuo Zhang and Toshinori Watanabe
Affiliation:
Graduate School of Information Systems, The University of Electro-Communications, Japan
Keyword(s):
Document representation, PRDC, Independent component analysis, Feature space, Clustering, Data compression.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Data Manipulation; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Health Engineering and Technology Applications; Human-Computer Interaction; Knowledge Representation and Reasoning; Methodologies and Methods; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems
Abstract:
Bag-of-words and N-gram models are two well-known feature representation methods that have been widely used in natural language processing, text mining, and web document analysis. A novel Pattern Representation scheme using Data Compression (PRDC) has been proposed for data representation. PRDC can effectively process not only linguistic text but also other multimedia data. Although PRDC outperforms the traditional methods in some situations, it still suffers from the problems of dictionary selection and feature space construction. In this study, we propose a method that allows PRDC to construct an independent compressibility space, and we compare the proposed method with the two other representation methods and with PRDC itself. Performance is compared in terms of clustering ability. Experimental results show that the proposed method performs better than PRDC and the other two methods.