A Lexicon-based Approach for Sentiment Classification of Amazon Books Reviews in Italian Language

Franco Chiavetta, Giosuè Lo Bosco, Giovanni Pilato

2016

Abstract

We present a system for automatically classifying the sentiment orientation expressed in book reviews written in Italian. The system is founded on a lexicon-based approach and uses NLP techniques to take into account the linguistic relations between terms in the analyzed texts. A review is classified according to the average sentiment strength of its sentences, while each sentence is classified through a parsing process that inspects, for each term, a window of preceding items to detect particular combinations of elements that invert or vary polarity. The score of a single word depends on all of its associated meanings, also considering semantically related concepts such as synonyms and hypernyms. The concepts associated with words are extracted from a suitable stratification of linguistic resources, which we adopt to address the lack of an opinion lexicon specifically tailored to the Italian language. The system has been prototyped in Python and tested on a dataset of reviews crawled from Amazon.it, the Italian Amazon website. Experiments show that the proposed system is able to automatically classify both positive and negative reviews, with an average accuracy above 82%.
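The pipeline outlined in the abstract (word scores averaged over meanings, a window of preceding terms that can invert or scale polarity, and a review label taken from the average sentence score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy lexicon, the negator and intensifier lists, the window size of 2, the summing of word scores within a sentence, the "neutral" fallback, and all function names are assumptions chosen only for illustration.

```python
import re

# Hypothetical toy lexicon: word -> polarity of each associated meaning.
# The paper builds this from layered linguistic resources; these values are made up.
LEXICON = {
    "bello": [1.0, 0.5],
    "ottimo": [1.0],
    "interessante": [0.5],
    "noioso": [-0.5, -1.0],
    "brutto": [-1.0, -0.5],
}
NEGATORS = {"non", "mai"}                    # assumed polarity inverters
INTENSIFIERS = {"molto": 1.5, "poco": 0.5}   # assumed polarity modifiers
WINDOW = 2                                   # assumed number of preceding tokens inspected


def word_score(word):
    """Average the polarity over all meanings listed for the word, or None if unknown."""
    senses = LEXICON.get(word)
    return sum(senses) / len(senses) if senses else None


def sentence_score(sentence):
    """Sum word scores, adjusting each one by negators/intensifiers in the preceding window."""
    tokens = re.findall(r"\w+", sentence.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        score = word_score(token)
        if score is None:
            continue
        for previous in tokens[max(0, i - WINDOW):i]:
            if previous in NEGATORS:
                score = -score                    # polarity inversion
            elif previous in INTENSIFIERS:
                score *= INTENSIFIERS[previous]   # polarity variation
        total += score
    return total


def classify_review(review):
    """Label a review from the average score of its sentences."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    if not sentences:
        return "neutral"  # assumed fallback; the paper reports positive/negative classes
    average = sum(sentence_score(s) for s in sentences) / len(sentences)
    if average > 0:
        return "positive"
    if average < 0:
        return "negative"
    return "neutral"


print(classify_review("Un libro molto bello. Il finale non è noioso."))
print(classify_review("Libro brutto. Non è interessante!"))
```

In this sketch "non è noioso" ends up with a positive score because the negator falls inside the two-token window before "noioso"; a wider or narrower window would change which combinations are detected.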



Paper Citation


in Harvard Style

Chiavetta F., Lo Bosco G. and Pilato G. (2016). A Lexicon-based Approach for Sentiment Classification of Amazon Books Reviews in Italian Language. In Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-758-186-1, pages 159-170. DOI: 10.5220/0005915301590170


in Bibtex Style

@conference{webist16,
author={Franco Chiavetta and Giosuè Lo Bosco and Giovanni Pilato},
title={A Lexicon-based Approach for Sentiment Classification of Amazon Books Reviews in Italian Language},
booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2016},
pages={159-170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005915301590170},
isbn={978-989-758-186-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - A Lexicon-based Approach for Sentiment Classification of Amazon Books Reviews in Italian Language
SN - 978-989-758-186-1
AU - Chiavetta F.
AU - Lo Bosco G.
AU - Pilato G.
PY - 2016
SP - 159
EP - 170
DO - 10.5220/0005915301590170