Adjusting Word Embeddings by Deep Neural Networks

Xiaoyang Gao, Ryutaro Ichise


Continuous-representation language models have gained popularity in many NLP tasks. To measure the similarity of two words, we calculate the cosine similarity between their vectors. However, the quality of word embeddings depends on the selected corpus. For Word2Vec, we observe that the vectors are far apart from each other. Furthermore, synonyms that occur rarely or have multiple meanings are even farther apart. In these cases, cosine similarity is not an appropriate measure of how similar the words are. Moreover, most language models are not as deep in structure as one might suppose. Based on these observations, we implement a mixed deep neural network with two kinds of architectures. We show that word embeddings can be adjusted in both unsupervised and supervised ways. Remarkably, this approach improves the cases mentioned above by largely increasing the similarities of almost all synonyms. It is also easy to train and can be adapted to specific tasks by changing the training target and dataset.
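
The similarity measure the abstract refers to can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `cosine_similarity` and the three-dimensional toy vectors are assumptions made up for the example, and real Word2Vec embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for illustration only.
toy_embeddings = {
    "car": [1.0, 0.0, 0.0],
    "automobile": [0.0, 1.0, 0.0],
    "vehicle": [1.0, 1.0, 0.0],
}

# Orthogonal vectors score 0 even if the words are synonyms.
sim_car_automobile = cosine_similarity(
    toy_embeddings["car"], toy_embeddings["automobile"]
)
# Vectors 45 degrees apart score 1/sqrt(2).
sim_car_vehicle = cosine_similarity(
    toy_embeddings["car"], toy_embeddings["vehicle"]
)
```

In this toy setup, "car" and "automobile" receive a similarity of zero because their vectors happen to be orthogonal. That is the kind of situation the abstract describes, where synonyms whose vectors lie far apart score poorly under cosine similarity.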


  1. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137-1155, 2003.
  2. David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993-1022, 2003.
  3. Tomas Mikolov, Stefan Kombrink, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. Extensions of recurrent neural network language model. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5528-5531. IEEE, 2011.
  4. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013a.
  5. George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41, 1995.
  6. Andriy Mnih and Geoffrey E Hinton. A scalable hierarchical distributed language model. In Advances in neural information processing systems, pages 1081-1088, 2009.
  7. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal representations by error propagation. Technical report, DTIC Document, 1985.
  8. Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web, pages 373-374. ACM, 2014.
  9. Richard Socher, John Bauer, Christopher D Manning, and Andrew Y Ng. Parsing with compositional vector grammars. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 455-465, 2013.

Paper Citation

in Harvard Style

Gao X. and Ichise R. (2017). Adjusting Word Embeddings by Deep Neural Networks. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 398-406. DOI: 10.5220/0006120003980406

in Bibtex Style

author={Xiaoyang Gao and Ryutaro Ichise},
title={Adjusting Word Embeddings by Deep Neural Networks},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},

in EndNote Style

JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Adjusting Word Embeddings by Deep Neural Networks
SN - 978-989-758-220-2
AU - Gao X.
AU - Ichise R.
PY - 2017
SP - 398
EP - 406
DO - 10.5220/0006120003980406