CSL: A Combined Spanish Lexicon - Resource for Polarity Classification and Sentiment Analysis

Luis G. Moreno-Sandoval, Paola Beltrán-Herrera, Jaime A. Vargas-Cruz, Carolina Sánchez-Barriga, Alexandra Pomares-Quimbaya, Jorge A. Alvarado-Valencia, Juan C. García-Díaz

2017

Abstract

Opinion mining and sentiment analysis of texts from social networks such as Twitter have gained great importance during the last decade. Quality lexicons for the sentiment analysis task are easy to find for languages such as English; however, this is not the case for Spanish. For this reason, we propose CSL, a Combined Spanish Lexicon approach for sentiment analysis that uses an ensemble of six Spanish lexicons and a weighted bag-of-words strategy. To build CSL, we used 68,019 tweets previously classified by researchers at the Spanish Society of Natural Language Processing (SEPLN), obtaining a precision of 62.05 and a recall of 60.75 on the validation set, showing improvements in both measures. Additionally, we compare the results of CSL with a well-known commercial software package for sentiment analysis in Spanish, finding an improvement of 10 points in precision and 15 points in recall.
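
The abstract does not spell out how the lexicons are combined, so the following is only a minimal sketch of one plausible weighted bag-of-words ensemble, not the authors' implementation. The toy lexicons, their scores, the per-lexicon weights, the neutral threshold, and all function names are hypothetical; CSL itself uses six lexicons, while two are shown here for brevity.

```python
import re
from typing import Dict, List

# Hypothetical type: a lexicon maps a word to a polarity score in [-1, 1].
Lexicon = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of Spanish letters as tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def combined_score(tokens: List[str], lexicons: List[Lexicon],
                   weights: List[float]) -> float:
    """Sum each lexicon's bag-of-words polarity, scaled by that lexicon's weight."""
    total = 0.0
    for lexicon, weight in zip(lexicons, weights):
        # Words missing from a lexicon contribute nothing to its score.
        total += weight * sum(lexicon.get(tok, 0.0) for tok in tokens)
    return total


def classify(text: str, lexicons: List[Lexicon], weights: List[float],
             threshold: float = 0.0) -> str:
    """Map the combined score to a polarity label (threshold is an assumption)."""
    score = combined_score(tokenize(text), lexicons, weights)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Toy lexicons and weights, invented for illustration only.
LEXICON_A: Lexicon = {"bueno": 1.0, "excelente": 1.0, "malo": -1.0}
LEXICON_B: Lexicon = {"excelente": 0.8, "terrible": -0.9, "malo": -0.7}
WEIGHTS = [0.6, 0.4]

if __name__ == "__main__":
    for tweet in ["Un servicio excelente", "Fue terrible y malo", "Hola mundo"]:
        print(tweet, "->", classify(tweet, [LEXICON_A, LEXICON_B], WEIGHTS))
```

In a sketch like this, the weights would typically be tuned on labeled data such as the SEPLN tweets, with precision and recall measured on a held-out validation set.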

Paper Citation


in Harvard Style

Moreno-Sandoval L., Beltrán-Herrera P., Vargas-Cruz J., Sánchez-Barriga C., Pomares-Quimbaya A., Alvarado-Valencia J. and García-Díaz J. (2017). CSL: A Combined Spanish Lexicon - Resource for Polarity Classification and Sentiment Analysis. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-247-9, pages 288-295. DOI: 10.5220/0006336402880295


in Bibtex Style

@conference{iceis17,
author={Luis G. Moreno-Sandoval and Paola Beltrán-Herrera and Jaime A. Vargas-Cruz and Carolina Sánchez-Barriga and Alexandra Pomares-Quimbaya and Jorge A. Alvarado-Valencia and Juan C. García-Díaz},
title={CSL: A Combined Spanish Lexicon - Resource for Polarity Classification and Sentiment Analysis},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2017},
pages={288-295},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006336402880295},
isbn={978-989-758-247-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - CSL: A Combined Spanish Lexicon - Resource for Polarity Classification and Sentiment Analysis
SN - 978-989-758-247-9
AU - Moreno-Sandoval L.
AU - Beltrán-Herrera P.
AU - Vargas-Cruz J.
AU - Sánchez-Barriga C.
AU - Pomares-Quimbaya A.
AU - Alvarado-Valencia J.
AU - García-Díaz J.
PY - 2017
SP - 288
EP - 295
DO - 10.5220/0006336402880295