ROBUST, GENERALIZED, QUICK AND EFFICIENT AGGLOMERATIVE CLUSTERING

Manolis Wallace, Stefanos Kollias

2004

Abstract

Hierarchical approaches, dominated by the generic agglomerative clustering algorithm, are suitable when the number of distinct clusters in the data is not known a priori, which is not rare in real data. Their application, however, raises important problems: high complexity and susceptibility to errors in the initial steps, which propagate all the way to the final output. Moreover, as with all other clustering techniques, their efficiency decreases as the dimensionality of the input increases. In this paper we propose a robust, generalized, quick and efficient extension to the generic agglomerative clustering process. Robust refers to the proposed approach’s ability to overcome the classic algorithm’s susceptibility to errors in the initial steps; generalized, to its ability to consider multiple distance metrics simultaneously; quick, to its suitability for larger datasets, achieved by applying the computationally expensive components to only a subset of the available data samples; and efficient, to its ability to produce results comparable to those of trained classifiers, largely outperforming the generic agglomerative process.
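For orientation, the generic agglomerative process that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the classic algorithm only, not of the extension proposed in the paper. The function names, the choice of average linkage, the Euclidean metric and the distance-threshold stopping rule are all assumptions made for the example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2, points, metric):
    """Average-linkage distance between two clusters of point indices."""
    total = sum(metric(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, threshold, metric=euclidean):
    """Generic agglomerative clustering with a distance-threshold stop.

    Starts from singleton clusters and repeatedly merges the closest pair.
    Merging stops once the closest pair is farther apart than `threshold`,
    so the number of clusters does not have to be known in advance.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Exhaustive search over all cluster pairs: simple but expensive.
        (a, b), best = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j], points, metric))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        # a < b always holds here, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(agglomerate(data, threshold=3.0))
```

Two properties noted in the abstract are visible in this sketch. Every merge is final, so an early mistaken merge carries through to the output, and the exhaustive pairwise search in each round is what makes the naive process costly on larger datasets.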


Paper Citation


in Harvard Style

Wallace M. and Kollias S. (2004). ROBUST, GENERALIZED, QUICK AND EFFICIENT AGGLOMERATIVE CLUSTERING. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-00-7, pages 409-416. DOI: 10.5220/0002639604090416


in Bibtex Style

@conference{iceis04,
author={Manolis Wallace and Stefanos Kollias},
title={ROBUST, GENERALIZED, QUICK AND EFFICIENT AGGLOMERATIVE CLUSTERING},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2004},
pages={409-416},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002639604090416},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - ROBUST, GENERALIZED, QUICK AND EFFICIENT AGGLOMERATIVE CLUSTERING
SN - 972-8865-00-7
AU - Wallace M.
AU - Kollias S.
PY - 2004
SP - 409
EP - 416
DO - 10.5220/0002639604090416