Arabic Sentiment Analysis using WEKA a Hybrid Learning Approach

Sarah Alhumoud, Tarfa Albuhairi, Mawaheb Altuwaijri

2015

Abstract

Data has become the currency of this era, and it continues to grow rapidly in both volume and generation rate. Large data sets generated by organisations’ e-transactions or by individuals on social networks can be of great value when analysed properly. This research presents an implementation of a sentiment analyser for tweets from Twitter, one of the largest public and freely available big data sources. It analyses Arabic tweets in the Saudi dialect to extract sentiment toward a specific topic, using a dataset of 3000 tweets collected from Twitter. The collected tweets were analysed using two machine learning approaches: a supervised approach, trained on the collected dataset, and the proposed hybrid learning approach, trained on a dictionary of single words. Two algorithms are used: Support Vector Machine (SVM) and K-Nearest Neighbors (KNN). The results obtained by cross-validation on the same dataset show that the hybrid learning approach outperforms the supervised approach.
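The difference between the two training regimes described in the abstract can be illustrated with a minimal sketch. It is not the authors' WEKA pipeline: it is plain Python, it covers only a similarity-weighted KNN (not SVM), and the English toy tweets, the toy lexicon, and all function names are hypothetical placeholders chosen for illustration. The supervised variant is scored by k-fold cross-validation on labelled tweets, while the hybrid variant is trained on single dictionary words and then scored on the same tweets.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenised text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_predict(train, text, k=3):
    """Similarity-weighted vote among the k nearest training items.

    Returns None when no neighbour shares a word with the text
    (a choice made for this sketch, not taken from the paper).
    """
    query = vectorize(text)
    scored = sorted(((cosine(vectorize(t), query), label) for t, label in train),
                    key=lambda pair: pair[0], reverse=True)
    weights = Counter()
    for sim, label in scored[:k]:
        weights[label] += sim
    if not weights or max(weights.values()) == 0:
        return None
    return max(weights, key=weights.get)

def accuracy(train, test, k=3):
    """Fraction of test items whose predicted label matches the true label."""
    return sum(knn_predict(train, text, k) == label for text, label in test) / len(test)

def cross_validate(data, folds=3, k=3):
    """Supervised regime: k-fold cross-validation on the labelled tweets."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]  # every item whose index is i modulo folds
        train = [item for j, item in enumerate(data) if j % folds != i]
        correct += sum(knn_predict(train, text, k) == label for text, label in test)
    return correct / len(data)

# Hypothetical toy data; the paper uses 3000 Arabic (Saudi dialect) tweets.
tweets = [
    ("great service fast delivery", "pos"),
    ("love this great phone", "pos"),
    ("happy with fast support", "pos"),
    ("bad service slow delivery", "neg"),
    ("hate this slow phone", "neg"),
    ("angry with bad support", "neg"),
]
# Hypothetical single-word dictionary used as training data in the hybrid regime.
lexicon = [("great", "pos"), ("love", "pos"), ("happy", "pos"), ("fast", "pos"),
           ("bad", "neg"), ("hate", "neg"), ("angry", "neg"), ("slow", "neg")]

print("supervised (cross-validated):", cross_validate(tweets))
print("hybrid (trained on lexicon):", accuracy(lexicon, tweets))
```

The scores printed for this toy data say nothing about the paper's results; the sketch only shows where the training data comes from in each regime.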



Paper Citation


in Harvard Style

Alhumoud S., Albuhairi T. and Altuwaijri M. (2015). Arabic Sentiment Analysis using WEKA a Hybrid Learning Approach. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 402-408. DOI: 10.5220/0005616004020408


in Bibtex Style

@conference{kdir15,
author={Sarah Alhumoud and Tarfa Albuhairi and Mawaheb Altuwaijri},
title={Arabic Sentiment Analysis using WEKA a Hybrid Learning Approach},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={402-408},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005616004020408},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Arabic Sentiment Analysis using WEKA a Hybrid Learning Approach
SN - 978-989-758-158-8
AU - Alhumoud S.
AU - Albuhairi T.
AU - Altuwaijri M.
PY - 2015
SP - 402
EP - 408
DO - 10.5220/0005616004020408