KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks

Wu Bin, Dong Yuxiao, Qin Lei, Ke Qing, Wang Bai

2011

Abstract

Social network analysis is the mapping and measuring of relationships and flows between people, groups, computers and other information or knowledge entities. The continued exponential growth in the scale of social networks poses a new challenge to social network analysis: some of these graphs have millions of nodes and billions of edges. In this paper, we present KANGAROO, a distributed system for huge-scale social networks built on two main computing models, one for finding common neighbours and one for finding maximal cliques. KANGAROO is implemented on top of the Hadoop platform, the open-source implementation of MapReduce. The system implements most social network analysis algorithms, including basic statistics, community detection, link prediction and network evolution, on the MapReduce computing framework. KANGAROO has also been applied to a real-world huge-scale social network. The application scenarios, including degree distribution, a linear projection algorithm for community detection, and community visualization in the presentation layer, demonstrate that KANGAROO is efficient, scalable and effective.
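
As an illustration of the common-neighbour computing model named in the abstract, the following is a minimal single-machine sketch in Python of how such a computation can be phrased as a map step and a reduce step. It is not KANGAROO's implementation and does not use Hadoop; the function names (`map_phase`, `reduce_phase`, `common_neighbours`) and the toy edge list are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(adjacency):
    """Map: each node emits every pair of its neighbours, keyed by the pair.

    A node adjacent to both u and v is, by definition, a common
    neighbour of u and v.
    """
    for node, neighbours in adjacency.items():
        for u, v in combinations(sorted(neighbours), 2):
            yield (u, v), node


def reduce_phase(pairs):
    """Shuffle and reduce: group emitted nodes by their (u, v) key."""
    grouped = defaultdict(set)
    for key, node in pairs:
        grouped[key].add(node)
    return dict(grouped)


def common_neighbours(edges):
    """Build an undirected adjacency list, then run map and reduce."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return reduce_phase(map_phase(adjacency))


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
result = common_neighbours(edges)
```

In a real MapReduce job the grouping done inside `reduce_phase` would be performed by the framework's shuffle, and the number of emitted pairs grows quadratically with node degree, so high-degree nodes would need special handling at scale.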



Paper Citation


in Harvard Style

Bin W., Yuxiao D., Lei Q., Qing K. and Bai W. (2011). KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks. In Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8425-52-2, pages 404-409. DOI: 10.5220/0003387304040409


in Bibtex Style

@conference{closer11,
author={Wu Bin and Dong Yuxiao and Qin Lei and Ke Qing and Wang Bai},
title={KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks},
booktitle={Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2011},
pages={404-409},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003387304040409},
isbn={978-989-8425-52-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - KANGAROO: A DISTRIBUTED SYSTEM FOR SNA - Social Network Analysis in Huge-Scale Networks
SN - 978-989-8425-52-2
AU - Bin W.
AU - Yuxiao D.
AU - Lei Q.
AU - Qing K.
AU - Bai W.
PY - 2011
SP - 404
EP - 409
DO - 10.5220/0003387304040409

