Authors: Yong Wang and Julia Hodges

Affiliation: Mississippi State University, United States

Abstract: Document clustering is a widely used strategy for information retrieval and text data mining. This paper describes the preliminary work for ongoing research of document clustering problems. A prototype of a document clustering system has been implemented and some basic aspects of document clustering problems have been studied. Our experimental results demonstrate that the average-link inter-cluster distance measure and TFIDF weighting function are good methods for the document clustering problem. Other investigators have indicated that the bisecting K-means method is the preferred method for document clustering. However, in our research we have found that, whereas the bisecting K-means method has advantages when working with large datasets, a traditional hierarchical clustering algorithm still achieves the best performance for small datasets.
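To make the two methods named in the abstract concrete, the following is a minimal sketch of TFIDF weighting combined with average-link agglomerative clustering. It is not the authors' prototype. The whitespace tokenizer, the `tf * log(N / df)` weighting variant, the use of cosine distance, and all function names (`tfidf_vectors`, `cosine_distance`, `average_link`, `agglomerative`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumed variant: raw term frequency times log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors (1.0 if either is all zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def average_link(a, b, dist):
    """Average-link distance: mean pairwise distance between clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerative(vectors, k):
    """Merge the two closest clusters (average link) until k clusters remain."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per document
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average-link distance.
        _, p, q = min(
            (average_link(clusters[p], clusters[q], dist), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

Calling `agglomerative(tfidf_vectors(docs), k)` on a list of strings returns clusters as lists of document indices. The sketch recomputes every inter-cluster distance at each merge, which is only reasonable for small collections, the setting in which the abstract reports that hierarchical clustering performs best.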

CC BY-NC-ND 4.0

Paper citation in several formats:
Wang, Y. and Hodges, J. (2005). A Comparison of Document Clustering Algorithms. In Proceedings of the 5th International Workshop on Pattern Recognition in Information Systems (ICEIS 2005) - PRIS; ISBN 972-8865-28-7, SciTePress, pages 186-191. DOI: 10.5220/0002557501860191

@conference{pris05,
author={Yong Wang and Julia Hodges},
title={A Comparison of Document Clustering Algorithms},
booktitle={Proceedings of the 5th International Workshop on Pattern Recognition in Information Systems (ICEIS 2005) - PRIS},
year={2005},
pages={186-191},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002557501860191},
isbn={972-8865-28-7},
}

TY - CONF

JO - Proceedings of the 5th International Workshop on Pattern Recognition in Information Systems (ICEIS 2005) - PRIS
TI - A Comparison of Document Clustering Algorithms
SN - 972-8865-28-7
AU - Wang, Y.
AU - Hodges, J.
PY - 2005
SP - 186
EP - 191
DO - 10.5220/0002557501860191
PB - SciTePress