Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning

Minato Sato, Ryohei Orihara, Yuichi Sei, Yasuyuki Tahara, Akihiko Ohsuga

2017

Abstract

Temporal (one-dimensional) Convolutional Neural Networks (Temporal CNNs, ConvNets) are an emerging technology for text understanding. The input to a ConvNet can be either a sequence of words or a sequence of characters. In the latter case, no language-dependent natural language processing, such as morphological analysis, is required. Past studies showed that character-level ConvNets work well for news category classification and sentiment analysis/classification tasks on English and romanized Chinese text corpora. In this article we apply character-level ConvNets to Japanese text understanding. We also attempt to reuse meaningful representations that the ConvNets learn from a large-scale dataset in the form of transfer learning, inspired by its success in the field of image recognition. On news category classification and sentiment analysis/classification tasks over Japanese text corpora, the ConvNets outperformed N-gram-based classifiers. In addition, our ConvNet transfer learning framework worked well for a task similar to the one used for pre-training.
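
To make the approach concrete, the sketch below shows a character-level temporal CNN and a layer-reuse transfer learning step of the kind described above. It is only an illustration under stated assumptions, not the authors' reported architecture: the framework (Keras), vocabulary size, sequence length, filter sizes, and the choice of which layers to freeze are all assumptions.

```python
from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, MaxPooling1D,
                          Flatten, Dense, Dropout)

VOCAB_SIZE = 4000   # assumed number of distinct characters (kana, kanji, ASCII)
MAX_LEN = 300       # assumed fixed document length in characters (pad/truncate)

def build_char_cnn(num_classes):
    """Character-level temporal (1-D) CNN; layer sizes are illustrative."""
    model = Sequential([
        # Characters are fed directly as integer indices; no morphological
        # analysis or word segmentation is needed.
        Embedding(input_dim=VOCAB_SIZE, output_dim=128, input_length=MAX_LEN),
        Conv1D(256, 7, activation='relu'),
        MaxPooling1D(3),
        Conv1D(256, 5, activation='relu'),
        MaxPooling1D(3),
        Flatten(),
        Dense(512, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Transfer learning in the spirit of the abstract: pre-train on a large source
# dataset, then reuse the learned lower layers for a related target task.
source_model = build_char_cnn(num_classes=5)        # e.g. news categories
# source_model.fit(x_source, y_source, epochs=10, batch_size=128)

target_model = build_char_cnn(num_classes=2)        # e.g. binary sentiment
for src, tgt in zip(source_model.layers[:-1], target_model.layers[:-1]):
    tgt.set_weights(src.get_weights())              # copy pre-trained weights
    tgt.trainable = False                           # freeze; train only the top
target_model.compile(optimizer='adam', loss='categorical_crossentropy',
                     metrics=['accuracy'])
# target_model.fit(x_target, y_target, epochs=5, batch_size=128)
```

The specific layer sizes, pooling, and freezing scheme here are placeholders; the point is only the overall character-in, category-out pipeline and the weight-reuse step between a source task and a similar target task.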



Paper Citation


in Harvard Style

Sato M., Orihara R., Sei Y., Tahara Y. and Ohsuga A. (2017). Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 175-184. DOI: 10.5220/0006193401750184


in Bibtex Style

@conference{icaart17,
author={Minato Sato and Ryohei Orihara and Yuichi Sei and Yasuyuki Tahara and Akihiko Ohsuga},
title={Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2017},
pages={175-184},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006193401750184},
isbn={978-989-758-220-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning
SN - 978-989-758-220-2
AU - Sato M.
AU - Orihara R.
AU - Sei Y.
AU - Tahara Y.
AU - Ohsuga A.
PY - 2017
SP - 175
EP - 184
DO - 10.5220/0006193401750184