DON’T FOLLOW ME - Spam Detection in Twitter

Alex Hai Wang

2010

Abstract

The rapidly growing social network Twitter has been infiltrated by a large amount of spam. In this paper, a spam detection prototype system is proposed to identify suspicious users on Twitter. A directed social graph model is proposed to explore the “follower” and “friend” relationships among Twitter users. Based on Twitter’s spam policy, novel content-based and graph-based features are also proposed to facilitate spam detection. A Web crawler is developed using API methods provided by Twitter. In total, around 25K users, 500K tweets, and 49M follower/friend relationships are collected from publicly available data on Twitter. A Bayesian classification algorithm is applied to distinguish suspicious behaviors from normal ones. I analyze the data set and evaluate the performance of the detection system. Classic evaluation metrics are used to compare the performance of various traditional classification methods. Experimental results show that the Bayesian classifier has the best overall performance in terms of F-measure. The trained classifier is also applied to the entire data set. The results show that the spam detection system can achieve 89% precision.
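To make two ideas from the abstract concrete, the following minimal Python sketch shows how per-user follower and friend counts could be read off a directed follow graph, and how precision, recall, and F-measure are computed from labeled predictions. It is an illustration under stated assumptions, not the paper's implementation: the function names, the edge-list representation, and the "spam"/"ham" labels are hypothetical, and the abstract does not specify which graph-based features the system actually uses.

```python
from collections import defaultdict


def degree_features(follow_edges):
    """Count followers and friends per user in a directed follow graph.

    follow_edges is an iterable of (a, b) pairs meaning "a follows b"
    (an assumed representation). Then a is a follower of b, and b is a
    friend of a.
    """
    followers = defaultdict(int)
    friends = defaultdict(int)
    for a, b in follow_edges:
        friends[a] += 1    # a follows one more account
        followers[b] += 1  # b gains one more follower
    users = set(followers) | set(friends)
    return {
        u: {"followers": followers[u], "friends": friends[u]}
        for u in users
    }


def precision_recall_f1(true_labels, predicted_labels, positive="spam"):
    """Standard precision, recall, and F-measure for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    edges = [("a", "b"), ("c", "b"), ("b", "a")]
    print(degree_features(edges))
    truth = ["spam", "spam", "ham", "ham"]
    predicted = ["spam", "ham", "spam", "ham"]
    print(precision_recall_f1(truth, predicted))
```

Counts like these could serve as inputs to a classifier such as the Bayesian one described above, and the metric function reflects the usual definition of F-measure as the harmonic mean of precision and recall.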

Paper Citation


in Harvard Style

Wang, A. H. (2010). DON’T FOLLOW ME - Spam Detection in Twitter. In Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010) ISBN 978-989-8425-18-8, pages 142-151. DOI: 10.5220/0002996201420151


in Bibtex Style

@conference{secrypt10,
author={Alex Hai Wang},
title={DON’T FOLLOW ME - Spam Detection in Twitter},
booktitle={Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010)},
year={2010},
pages={142-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002996201420151},
isbn={978-989-8425-18-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010)
TI - DON’T FOLLOW ME - Spam Detection in Twitter
SN - 978-989-8425-18-8
AU - Hai Wang A.
PY - 2010
SP - 142
EP - 151
DO - 10.5220/0002996201420151