A Comparative Study of Clustering versus Classification over Reuters Collection

Leandro Krug Wives, Stanley Loh, José Palazzo Moreira de Oliveira

2008

Abstract

People have plenty of information at their disposal. The problem is that, even with the advent of search engines, it is still difficult to analyze, understand and select relevant information. In this sense, clustering techniques are promising because they group related information in an organized way. This paper addresses some problems of existing document clustering techniques and presents the “best star” algorithm, which can be used to group and understand chunks of information and find the most relevant ones.



Paper Citation


in Harvard Style

Krug Wives L., Loh S. and Palazzo Moreira de Oliveira J. (2008). A Comparative Study of Clustering versus Classification over Reuters Collection. In Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008) ISBN 978-989-8111-42-5, pages 231-236. DOI: 10.5220/0001736202310236


in Bibtex Style

@conference{pris08,
author={Leandro Krug Wives and Stanley Loh and José Palazzo Moreira de Oliveira},
title={A Comparative Study of Clustering versus Classification over Reuters Collection},
booktitle={Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008)},
year={2008},
pages={231-236},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001736202310236},
isbn={978-989-8111-42-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008)
TI - A Comparative Study of Clustering versus Classification over Reuters Collection
SN - 978-989-8111-42-5
AU - Krug Wives L.
AU - Loh S.
AU - Palazzo Moreira de Oliveira J.
PY - 2008
SP - 231
EP - 236
DO - 10.5220/0001736202310236