Ahmed Rafea, Ahmed El Kholy Sherif, G. Aly


This paper proposes applying the bisecting K-means algorithm to cluster social network discussion groups and to provide a meaningful label for each resulting cluster. The clustering is based on the heterogeneous meta-features that define each group, e.g. title, description, type, sub-type, and network. The main idea is to represent each group as a tuple of multiple feature vectors, construct a suitable similarity measure for each feature space, and then perform the clustering using the proposed bisecting K-means algorithm. The main key phrases are extracted from the titles and descriptions of the discussion groups in a given cluster and combined with the main meta-features to build a phrase label for the cluster. Analysis of the experimental results showed that combining more than one feature produced better clustering in terms of quality and of the interrelationship between the discussion groups of a given cluster. Some features, such as the network, improved the compactness and tightness of the objects within clusters, while others, such as the type and sub-type, improved the separation between clusters.
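The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature weights, the exact-match similarity for categorical features, the medoid-based two-way split, the choice of always splitting the largest cluster, and the word-frequency labels (standing in for real key-phrase extraction) are all assumptions made for illustration, as are the function names and the sample groups.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical per-feature weights; the paper's actual values are not given here.
WEIGHTS = {"text": 0.5, "type": 0.2, "subtype": 0.1, "network": 0.2}
CATEGORICAL = ("type", "subtype", "network")

def make_group(title, description, type_, subtype, network):
    """Represent a discussion group as a tuple of features (here, a dict)."""
    words = (title + " " + description).lower().split()
    return {"text": Counter(words), "type": type_, "subtype": subtype, "network": network}

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity(g1, g2):
    """Weighted sum of one similarity per feature space (assumed combination rule)."""
    sim = WEIGHTS["text"] * cosine(g1["text"], g2["text"])
    for key in CATEGORICAL:
        sim += WEIGHTS[key] * (1.0 if g1[key] == g2[key] else 0.0)
    return sim

def medoid(groups, members):
    """Index of the member most similar, in total, to its own cluster."""
    return max(members, key=lambda i: sum(similarity(groups[i], groups[j]) for j in members))

def bisect(groups, members, iterations=5):
    """Split a cluster (list of indices) in two around two medoids."""
    # Seed with the least similar pair so the split is deterministic.
    seeds = min(combinations(members, 2),
                key=lambda p: similarity(groups[p[0]], groups[p[1]]))
    for _ in range(iterations):
        left, right = [], []
        for i in members:
            s0 = similarity(groups[i], groups[seeds[0]])
            s1 = similarity(groups[i], groups[seeds[1]])
            # Each seed stays on its own side, so neither side is ever empty.
            to_left = i == seeds[0] or (i != seeds[1] and s0 >= s1)
            (left if to_left else right).append(i)
        new_seeds = (medoid(groups, left), medoid(groups, right))
        if new_seeds == seeds:
            break
        seeds = new_seeds
    return left, right

def bisecting_kmeans(groups, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(range(len(groups)))]
    while len(clusters) < k:
        largest = max(clusters, key=len)
        if len(largest) < 2:
            break
        clusters.remove(largest)
        clusters.extend(bisect(groups, largest))
    return clusters

def label(groups, members, top=2):
    """Majority type and network plus the most frequent words in the cluster."""
    words = Counter()
    for i in members:
        words.update(groups[i]["text"])
    meta = [Counter(groups[i][k] for i in members).most_common(1)[0][0]
            for k in ("type", "network")]
    return meta + [w for w, _ in words.most_common(top)]

# Made-up sample data, for illustration only.
groups = [
    make_group("Python developers", "share python code tips", "Common Interest", "Programming", "Cairo"),
    make_group("Python beginners", "learn python code basics", "Common Interest", "Programming", "Cairo"),
    make_group("Football fans", "discuss football matches", "Sports", "Football", "Global"),
    make_group("Football league", "weekly football scores", "Sports", "Football", "Global"),
]
clusters = bisecting_kmeans(groups, 2)
for members in clusters:
    print(members, label(groups, members))
```

Because each group is compared through a combined similarity rather than a single vector space, the sketch uses medoids in place of centroids; a centroid-based variant would need a mean defined for every feature space.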



Paper Citation

in Harvard Style

Rafea A., El Kholy Sherif A. and Aly G. (2011). LABEL ORIENTED CLUSTERING FOR SOCIAL NETWORK DISCUSSION GROUPS. In Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8425-53-9, pages 205-210. DOI: 10.5220/0003488402050210
