Ki Jun Lee, Myungjin Lee, Wooju Kim



With the recent exponential growth of blogs, a vast amount of important data has become available in blog form. However, the dynamic, autonomous, and personal nature of blogs makes blog pages differ from general web pages in many respects, which causes problems that general search engines cannot handle properly. The problem we focus on in this study is that blog pages are inherently poorly organized and heavily duplicated, so blog search engines can only return poorly organized, duplicated results. To address this problem, we propose a blog classification method using K-means and present an approach to reorganizing blog search results based on this method. First, we review the current state of blogs and the performance of blog search engines. Second, we adopt the K-means algorithm as a base algorithm and devise a blog title classification method to reorganize the blog titles returned by a search engine. Finally, we implement a prototype system of our algorithm, evaluate its effectiveness, and present conclusions and directions for future work. We expect this algorithm to improve the usability of current blog search engines.
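To make the idea of grouping search-result titles with K-means concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how titles are represented or how centroids are initialized, so the bag-of-words vectors, the length normalization, the random seeding, and the helper names (`tokenize`, `vectorize`, `kmeans`, `group_titles`) are all assumptions made for illustration.

```python
import math
import random
import re
from collections import Counter


def tokenize(title):
    # Assumption: lowercase alphanumeric tokens are a good enough representation.
    return re.findall(r"[a-z0-9]+", title.lower())


def vectorize(titles):
    # Build length-normalized term-count vectors over a shared vocabulary.
    vocab = sorted({w for t in titles for w in tokenize(t)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in titles:
        v = [0.0] * len(vocab)
        for w, count in Counter(tokenize(t)).items():
            v[index[w]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    # Standard Lloyd iteration: assign each vector to the nearest centroid,
    # then recompute each centroid as the mean of its members.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_titles(titles, k, seed=0):
    # Reorganize a flat result list into groups of titles, one per cluster.
    labels = kmeans(vectorize(titles), k, seed=seed)
    groups = {}
    for title, label in zip(titles, labels):
        groups.setdefault(label, []).append(title)
    return list(groups.values())
```

In this sketch, exact duplicate titles always fall into the same group because they map to identical vectors. How well distinct topics separate depends on the chosen `k` and on the initial centroids, which is one reason a real system would need the kind of evaluation the abstract describes.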



Paper Citation

in Harvard Style

Lee K. J., Lee M. and Kim W. (2009). BLOG CLASSIFICATION USING K-MEANS. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS, ISBN 978-989-8111-87-6, pages 61-67. DOI: 10.5220/0001949600610067

in Bibtex Style

author={Ki Jun Lee and Myungjin Lee and Wooju Kim},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS},

in EndNote Style

JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS
SN - 978-989-8111-87-6
AU - Lee K. J.
AU - Lee M.
AU - Kim W.
PY - 2009
SP - 61
EP - 67
DO - 10.5220/0001949600610067