Authors: Natascha Hoebel and Stanislav Kreuzer
Affiliation: Goethe University Frankfurt, Germany
Keyword(s): User profile analysis, Clustering, Ordinal data, Optimization, Web mining.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Data Engineering; e-Business and e-Commerce; Internet Technology; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Ontologies and the Semantic Web; Society, e-Business and e-Government; Soft Computing; Symbolic Systems; Web Information Systems and Technologies; Web Interfaces and Applications; Web Mining; Web Personalization; Web Services and Web Engineering
Abstract:
This paper presents CORD, a hybrid clustering system that combines modified versions of three modern clustering approaches to efficiently process very large sets of ordinal data. The Self-Organizing Maps algorithm for categorical data by Chen and Marques (NCSOM) is used for a rough pre-clustering that determines the initial number and positions of the centroids. The main clustering task is performed by a k-modes algorithm and its fuzzy-set extension for categorical data using fuzzy centroids, as described by Kim et al. Finally, to handle large amounts of data, the BIRCH algorithm for efficient clustering of very large databases (VLDBs), described by Zhang et al., is adapted to ordinal data. BIRCH can serve as a preliminary phase for both Fuzzy Centroids and NCSOM. Both algorithms benefit from this symbiosis, since their iterative computations can then run on data that is held entirely in main memory. By combining these approaches, the resulting system can efficiently extract significant information even from very large datasets. The reference implementation of the hybrid system presented here shows good results. The aim is to cluster and visually analyze large numbers of user profiles, which should help in understanding Web user behavior and in personalizing advertising.
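The fuzzy-centroid variant of k-modes named in the abstract can be sketched roughly as follows, based on our reading of Kim et al.'s formulation. It is not CORD's implementation. Each cluster centroid stores, per attribute, a fuzzy set of category weights. The distance from an object to a centroid is the sum over attributes of one minus the weight of the object's category. Memberships are then updated as in fuzzy c-means. The function name `fuzzy_centroids`, the fuzzifier `m=1.5`, the iteration count and the random initial memberships are all illustrative assumptions. In CORD the starting point would come from the SOM pre-clustering instead.

```python
import random


def fuzzy_centroids(data, k, m=1.5, iters=20, seed=0, eps=1e-12):
    """Sketch of fuzzy k-modes with fuzzy centroids (after Kim et al.).

    data: list of equal-length tuples of categorical (or ordinal) values.
    Returns (w, centroids): w[i][l] is the membership of object i in
    cluster l; centroids[l][j] maps each category of attribute j to a weight.
    Hypothetical helper for illustration, not CORD's implementation.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])

    # Assumption: random initial memberships (CORD would seed from the SOM step).
    w = []
    for _ in range(n):
        row = [rng.random() for _ in range(k)]
        total = sum(row)
        w.append([v / total for v in row])

    centroids = []
    for _ in range(iters):
        # Centroid update: for each cluster and attribute, a fuzzy set whose
        # category weights are normalised sums of w^m over matching objects.
        centroids = []
        for l in range(k):
            wm = [w[i][l] ** m for i in range(n)]
            total = sum(wm) or 1.0  # guard against an empty cluster
            cent = []
            for j in range(p):
                weights = {}
                for i in range(n):
                    a = data[i][j]
                    weights[a] = weights.get(a, 0.0) + wm[i]
                cent.append({a: v / total for a, v in weights.items()})
            centroids.append(cent)

        # Membership update (fuzzy c-means style). Distance to a fuzzy
        # centroid = sum over attributes of 1 - weight(object's category);
        # eps avoids division by zero when an object matches a crisp centroid.
        for i in range(n):
            d = [
                max(eps, sum(1.0 - centroids[l][j].get(data[i][j], 0.0)
                             for j in range(p)))
                for l in range(k)
            ]
            w[i] = [
                1.0 / sum((d[l] / d[z]) ** (1.0 / (m - 1.0)) for z in range(k))
                for l in range(k)
            ]
    return w, centroids


# Toy data: two obvious groups of ordinal-coded user profiles.
profiles = [("low", "low", "low")] * 4 + [("high", "high", "high")] * 4
w, centroids = fuzzy_centroids(profiles, k=2)
labels = [max(range(2), key=lambda l: row[l]) for row in w]
print(labels)
```

This sketch treats ordinal levels as unordered categories and therefore ignores their order. It also clusters raw objects, whereas the abstract describes running the iterative step on BIRCH-reduced data held in main memory.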