Facebook Posts Text Classification to Improve Information Filtering

Randa Benkhelifa, Fatima Zohra Laallam

2016

Abstract

Facebook is one of the most widely used social networking sites. It is more than a simple website; it is a popular communication tool. Social network users communicate by exchanging several kinds of content, including free text, images and video. Today, social media users have a distinctive way of expressing themselves. They have created a new language known as “internet slang”, which conveys the same meaning using different lexical units. This unstructured text has specific characteristics: it is massive, noisy and dynamic. It therefore requires novel preprocessing methods adapted to those characteristics so that classification algorithms can work effectively. Most previous work on social media text classification eliminates stopwords and classifies posts by topic (e.g. politics, sport, art). In this paper, we propose to classify posts at a lower level into diverse pre-chosen classes using three machine learning algorithms: SVM, Naïve Bayes and K-NN. To improve classification, we propose a new preprocessing approach based on stopwords, internet slang and other specific lexical units. Finally, we compare all the results obtained for each classifier, and then compare the results across classifiers.
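
The kind of preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the slang dictionary, the stopword set, the `squeeze_repeats` helper and the `keep_stopwords` switch are all hypothetical, chosen only to show how slang normalization and a stopword policy might be applied to a post before it reaches a classifier such as SVM, Naïve Bayes or K-NN.

```python
import re

# Hypothetical slang lexicon; the abstract does not list the actual entries.
SLANG = {
    "u": "you",
    "gr8": "great",
    "thx": "thanks",
    "luv": "love",
}

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = frozenset({"the", "a", "is", "to", "and"})

TOKEN_RE = re.compile(r"[a-z0-9']+")


def squeeze_repeats(token):
    """Collapse runs of three or more identical characters to one."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def preprocess(post, keep_stopwords=True):
    """Lowercase, tokenize, squeeze elongated words and map slang to a
    canonical form. Stopwords are kept by default (an assumption here),
    since the abstract contrasts its approach with earlier work that
    removes them."""
    tokens = []
    for tok in TOKEN_RE.findall(post.lower()):
        tok = squeeze_repeats(tok)
        tok = SLANG.get(tok, tok)
        if not keep_stopwords and tok in STOPWORDS:
            continue
        tokens.append(tok)
    return tokens


print(preprocess("Thx u r gr8!!! Sooooo good"))
print(preprocess("The movie is gr8", keep_stopwords=False))
```

The resulting token lists could then be turned into feature vectors (for example term counts or tf-idf weights) and passed to any of the three classifiers; running the same classifier with `keep_stopwords` on and off is one way to compare preprocessing variants.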



Paper Citation


in Harvard Style

Benkhelifa R. and Laallam F. (2016). Facebook Posts Text Classification to Improve Information Filtering. In Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-186-1, pages 202-207. DOI: 10.5220/0005907702020207


in Bibtex Style

@conference{webist16,
author={Randa Benkhelifa and Fatima Zohra Laallam},
title={Facebook Posts Text Classification to Improve Information Filtering},
booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2016},
pages={202-207},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005907702020207},
isbn={978-989-758-186-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Facebook Posts Text Classification to Improve Information Filtering
SN - 978-989-758-186-1
AU - Benkhelifa R.
AU - Laallam F.
PY - 2016
SP - 202
EP - 207
DO - 10.5220/0005907702020207