Information Quality in Online Social Networks: A Fast Unsupervised Social Spam Detection Method for Trending Topics

Mahdi Washha, Dania Shilleh, Yara Ghawadrah, Reem Jazi, Florence Sedes

2017

Abstract

Online social networks (OSNs) provide valuable data for a wide range of applications such as search engines and recommendation systems. However, their easy-to-use interactive interfaces and low barriers to publication have exposed various information quality (IQ) problems, decreasing the quality of user-generated content (UGC) in such networks. The existence of a particular kind of ill-intentioned user, the so-called social spammer, makes it challenging to maintain an acceptable level of information quality. Social spammers misuse the services provided by social networks to post spam content in an automated way. In response, various detection methods have been designed that inspect individual posts or accounts for the existence of spam. The major limitation of these methods is that they are based on supervised learning and therefore require ground-truth data sets. Moreover, account-based detection methods are not practical for processing large "crawled" collections of social posts, requiring months to process such collections. Post-level detection methods, although applicable in terms of processing time, have another drawback: they do not adapt robustly to the dynamic behavior of spammers because their features discriminate weakly between spam and non-spam. Hence, in this paper, we introduce the design of an unsupervised learning approach dedicated to detecting spam accounts (or users) in large collections of trending topics from a collective perspective. More precisely, our method leverages the simple meta-data available about users and the published posts (tweets) related to a topic, as heuristic information, to find any correlation among spam users acting as a spam campaign. Compared to supervised learning methods, our experimental evaluation demonstrates the efficiency of our method in predicting spam accounts (users) in terms of accuracy, precision, recall, and F-measure.
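
The four performance metrics named in the abstract have standard definitions in terms of true/false positives and negatives. The following is a minimal sketch of how they could be computed for a binary spam / non-spam labeling. The label names ("spam", "ham"), the function name `evaluate`, and the six sample labels are hypothetical illustrations, not data or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F-measure for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts, treating `positive` as the spam class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return Metrics(accuracy, precision, recall, f_measure)


# Hypothetical labels for six accounts (illustrative only, not the paper's data).
truth = ["spam", "spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "spam", "ham", "ham"]
result = evaluate(truth, predicted)
print(result)
```

Here the F-measure is taken as the harmonic mean of precision and recall (the F1 variant); the abstract does not state which weighting the authors used, so that choice is an assumption.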


Paper Citation


in Harvard Style

Washha M., Shilleh D., Ghawadrah Y., Jazi R. and Sedes F. (2017). Information Quality in Online Social Networks: A Fast Unsupervised Social Spam Detection Method for Trending Topics. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-248-6, pages 663-675. DOI: 10.5220/0006372006630675


in Bibtex Style

@conference{iceis17,
author={Mahdi Washha and Dania Shilleh and Yara Ghawadrah and Reem Jazi and Florence Sedes},
title={Information Quality in Online Social Networks: A Fast Unsupervised Social Spam Detection Method for Trending Topics},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2017},
pages={663-675},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006372006630675},
isbn={978-989-758-248-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - Information Quality in Online Social Networks: A Fast Unsupervised Social Spam Detection Method for Trending Topics
SN - 978-989-758-248-6
AU - Washha M.
AU - Shilleh D.
AU - Ghawadrah Y.
AU - Jazi R.
AU - Sedes F.
PY - 2017
SP - 663
EP - 675
DO - 10.5220/0006372006630675