MR-SAT: A MapReduce Algorithm for Big Data Sentiment Analysis on Twitter

Nikolaos Nodarakis, Spyros Sioutas, Athanasios K. Tsakalidis, Giannis Tzimas

Abstract

Sentiment analysis on Twitter data has attracted much attention recently. Users tend to express their feelings freely, which makes Twitter an ideal source of opinions on a wide range of topics. In this paper, we develop a novel method to harvest sentiment knowledge in the MapReduce framework. Our algorithm uses the hashtags and emoticons inside a tweet as sentiment labels and classifies diverse sentiment types in a parallel and distributed manner. Moreover, we use Bloom filters to reduce the storage size of intermediate data and boost the performance of our algorithm. Through an extensive experimental evaluation, we show that our solution is efficient, robust and scalable, and we confirm the quality of our sentiment identification.
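The two building blocks named in the abstract, labels taken from hashtags and emoticons and a Bloom filter for compact set membership, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `sentiment_labels`, the class `BloomFilter`, the emoticon list, and the filter parameters are all assumptions chosen for illustration, and the MapReduce distribution itself is not shown.

```python
import hashlib
import re

HASHTAG_RE = re.compile(r"#(\w+)")
# Hypothetical emoticon list; the label set used in the paper is not given here.
EMOTICONS = (":)", ":(", ":D", ";)")


def sentiment_labels(tweet):
    """Return lowercased hashtags, then any listed emoticons, found in a tweet."""
    labels = [tag.lower() for tag in HASHTAG_RE.findall(tweet)]
    labels += [emo for emo in EMOTICONS if emo in tweet]
    return labels


class BloomFilter:
    """Fixed-size bit array; membership tests may give false positives,
    but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


tweet = "Loving the new phone :) #happy #Tech"
labels = sentiment_labels(tweet)

# Store a tweet's words in a filter instead of keeping the full word list.
word_filter = BloomFilter()
for word in tweet.lower().split():
    word_filter.add(word)
```

Here the filter occupies 128 bytes regardless of how many words are added, which is the kind of space saving a Bloom filter trades against a small false-positive rate.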



Paper Citation


in Harvard Style

Nodarakis N., Sioutas S., Tsakalidis A. and Tzimas G. (2016). MR-SAT: A MapReduce Algorithm for Big Data Sentiment Analysis on Twitter. In Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-186-1, pages 140-147. DOI: 10.5220/0005850401400147


in Bibtex Style

@conference{webist16,
author={Nikolaos Nodarakis and Spyros Sioutas and Athanasios K. Tsakalidis and Giannis Tzimas},
title={MR-SAT: A MapReduce Algorithm for Big Data Sentiment Analysis on Twitter},
booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2016},
pages={140-147},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005850401400147},
isbn={978-989-758-186-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - MR-SAT: A MapReduce Algorithm for Big Data Sentiment Analysis on Twitter
SN - 978-989-758-186-1
AU - Nodarakis N.
AU - Sioutas S.
AU - Tsakalidis A.
AU - Tzimas G.
PY - 2016
SP - 140
EP - 147
DO - 10.5220/0005850401400147