Authors:
Pei Ling Lai 1; Yang Jin Liang 1 and Alfred Inselberg 2
Affiliations:
1 Southern Taiwan University, Taiwan; 2 Tel Aviv University, Israel
Keyword(s):
Classification, Divide and Conquer, Parallel Coordinates, Visualization.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Data Analytics; Data Engineering; Data Management and Quality; Data Manipulation; Data Mining; Data Modeling and Visualization; Data Visualization; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Modeling and Managing Large Data Systems; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems
Abstract:
From the Nested Cavities (abbr. NC) classifier (Inselberg and Avidan, 2000) a powerful new classification approach emerged. For a dataset P and a subset S ⊂ P, the classifier constructs a rule distinguishing the elements of S from those in P − S. The NC is a geometrical algorithm which builds a sequence of nested unbounded parallelepipeds of minimal dimensionality containing disjoint subsets of P, and from which a hypersurface (the rule) containing the subset S is obtained. The partitioning of P − S and S into disjoint subsets is very useful when the original rule obtained is either too complex or imprecise. As illustrated with examples, this separation reveals exquisite insight into the dataset's structure. Specifically, in one of the problems we studied, two different types of water mines were separated. From another dataset, two distinct types of ovarian cancer were found. This process is developed and illustrated on a (sonar) dataset with 60 variables and two categories ("mines" and "rocks"), resulting in significant understanding of the domain and simplification of the classification rule. Such a situation is generic and occurs with other datasets, as illustrated with a similar decomposition of a financial dataset producing two sets of conditions determining gold prices. The divide-and-conquer extension can be automated and also allows the classification of the sub-categories to be done in parallel.