LABEL ORIENTED CLUSTERING FOR SOCIAL NETWORK DISCUSSION GROUPS

Ahmed Rafea, Ahmed El Kholy Sherif, G. Aly

2011

Abstract

This paper proposes applying the bisecting K-means algorithm to cluster social network discussion groups and to provide a meaningful label for each resulting cluster. The clustering is based on the heterogeneous meta-features that define each group, e.g. title, description, type, sub-type, and network. The main idea is to represent each group as a tuple of multiple feature vectors, construct a suitable similarity measure for each feature space, and then perform the clustering using the proposed bisecting K-means algorithm. Key phrases are extracted from the titles and descriptions of the discussion groups in a given cluster and combined with the main meta-features to build a phrase label for the cluster. Analysis of the experimental results showed that combining more than one feature produced better clustering, both in quality and in the interrelationship between the discussion groups of a given cluster. Some features, such as the network, improved the compactness of the clusters, while others, such as the type and sub-type, improved the separation between clusters.
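The clustering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measures or feature weights, so this example assumes a term-frequency vector for the title, one-hot vectors for type and network, and a weighted sum of per-feature squared Euclidean distances. All names (`embed`, `bisecting_kmeans`, `demo_groups`, `demo_weights`) and the sample data are hypothetical.

```python
import math
import random
from collections import Counter

def text_vector(text, vocab):
    """L2-normalised term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def one_hot(value, categories):
    return [1.0 if value == c else 0.0 for c in categories]

def embed(groups, weights):
    """Concatenate per-feature vectors, scaling each block by sqrt(weight) so
    that squared Euclidean distance on the result equals the weighted sum of
    the per-feature squared distances (an assumed combination rule)."""
    vocab = sorted({t for g in groups for t in g["title"].lower().split()})
    types = sorted({g["type"] for g in groups})
    nets = sorted({g["network"] for g in groups})
    rows = []
    for g in groups:
        blocks = [(weights["title"], text_vector(g["title"], vocab)),
                  (weights["type"], one_hot(g["type"], types)),
                  (weights["network"], one_hot(g["network"], nets))]
        rows.append([math.sqrt(w) * v for w, block in blocks for v in block])
    return rows

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sse(rows):
    """Sum of squared distances of the rows to their centroid."""
    c = centroid(rows)
    return sum(sq_dist(r, c) for r in rows)

def two_means(rows, idx, rng, iters=20):
    """Split the index list idx into two parts with plain 2-means."""
    centers = [rows[i] for i in rng.sample(idx, 2)]
    parts = ([], [])
    for _ in range(iters):
        parts = ([], [])
        for i in idx:
            nearer_first = sq_dist(rows[i], centers[0]) <= sq_dist(rows[i], centers[1])
            parts[0 if nearer_first else 1].append(i)
        if not parts[0] or not parts[1]:
            break  # degenerate split; the caller discards it
        centers = [centroid([rows[i] for i in p]) for p in parts]
    return parts

def bisecting_kmeans(rows, k, trials=5, seed=0):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(range(len(rows)))]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        target = max(candidates, key=lambda c: sse([rows[i] for i in c]))
        best = None
        for _ in range(trials):  # keep the trial split with the lowest total SSE
            a, b = two_means(rows, target, rng)
            if a and b:
                score = sse([rows[i] for i in a]) + sse([rows[i] for i in b])
                if best is None or score < best[0]:
                    best = (score, a, b)
        if best is None:
            break
        clusters.remove(target)
        clusters.extend([best[1], best[2]])
    return clusters

# Hypothetical discussion groups, for illustration only.
demo_groups = [
    {"title": "python programming tips", "type": "tech", "network": "A"},
    {"title": "python programming help", "type": "tech", "network": "A"},
    {"title": "football fans club", "type": "sports", "network": "B"},
    {"title": "football match fans", "type": "sports", "network": "B"},
]
demo_weights = {"title": 1.0, "type": 1.0, "network": 1.0}
print(bisecting_kmeans(embed(demo_groups, demo_weights), k=2))
```

Raising the weight of one feature in `demo_weights` makes that feature dominate the distance, which is one simple way to explore how individual meta-features affect compactness and separation.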

Paper Citation


in Harvard Style

Rafea A., El Kholy Sherif A. and Aly G. (2011). LABEL ORIENTED CLUSTERING FOR SOCIAL NETWORK DISCUSSION GROUPS. In Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8425-53-9, pages 205-210. DOI: 10.5220/0003488402050210


in Bibtex Style

@conference{iceis11,
author={Ahmed Rafea and Ahmed El Kholy Sherif and G. Aly},
title={LABEL ORIENTED CLUSTERING FOR SOCIAL NETWORK DISCUSSION GROUPS},
booktitle={Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2011},
pages={205-210},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003488402050210},
isbn={978-989-8425-53-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - LABEL ORIENTED CLUSTERING FOR SOCIAL NETWORK DISCUSSION GROUPS
SN - 978-989-8425-53-9
AU - Rafea A.
AU - El Kholy Sherif A.
AU - Aly G.
PY - 2011
SP - 205
EP - 210
DO - 10.5220/0003488402050210

