Markov Chain based Method for In-Domain and Cross-Domain Sentiment Classification

Giacomo Domeniconi, Gianluca Moro, Andrea Pagliarani, Roberto Pasolini

2015

Abstract

Sentiment classification assigns a positive, negative or neutral polarity to textual opinions, and is a way to understand what people think about products, services, persons, organisations, and so on. Labelling the polarity of text data correctly is costly when performed by human experts. To cut this labelling cost, cross-domain approaches have been developed, whose goal is to automatically classify the polarity of an unlabelled target text set from one domain, for example movie reviews, using a labelled source text set from another domain, such as book reviews. Language heterogeneity between the source and target domains is the trickiest issue in the cross-domain setting, so a preliminary transfer learning phase is generally required. The best-performing techniques addressing this point are generally complex and require onerous parameter tuning each time a new source-target pair is involved. This paper introduces a simpler method based on Markov chain theory to accomplish both the transfer learning and the sentiment classification tasks. This straightforward technique requires less parameter calibration effort. Experiments on popular text sets show that our approach achieves performance comparable with other works.


Paper Citation


in Harvard Style

Domeniconi G., Moro G., Pagliarani A. and Pasolini R. (2015). Markov Chain based Method for In-Domain and Cross-Domain Sentiment Classification. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 127-137. DOI: 10.5220/0005636001270137


in Bibtex Style

@conference{kdir15,
author={Giacomo Domeniconi and Gianluca Moro and Andrea Pagliarani and Roberto Pasolini},
title={Markov Chain based Method for In-Domain and Cross-Domain Sentiment Classification},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={127-137},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005636001270137},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Markov Chain based Method for In-Domain and Cross-Domain Sentiment Classification
SN - 978-989-758-158-8
AU - Domeniconi G.
AU - Moro G.
AU - Pagliarani A.
AU - Pasolini R.
PY - 2015
SP - 127
EP - 137
DO - 10.5220/0005636001270137