Chinese Word Similarity Computation based on Automatically Acquired Knowledge

Yuteng Zhang, Wenpeng Lu, Hao Wu


This paper describes our methods for Chinese word similarity computation based on automatically acquired knowledge, submitted to NLPCC-ICCPOL 2016 Task 3. All of the methods use off-the-shelf tools and data, which makes them easy to replicate. We use the Sogou corpus to train word vectors for Chinese words and use Baidu to obtain Web page counts for word pairs. Both the word vectors and the Web page counts can be acquired automatically. None of our methods uses a dictionary or manually annotated knowledge, which avoids heavy human labor. Among the four submitted results, three systems achieve similar Spearman correlation coefficients (0.327 with word vectors, 0.328 with word vectors and PMI, 0.314 with word vectors and Dice). In addition, when all English letters are converted to lowercase, the best performance of our methods improves to 0.372, obtained with word vectors and Dice. All of the compared methods and experiments are described in the paper.
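
As a rough illustration of the three measures named above, the following Python sketch computes cosine similarity between two word vectors, and the standard Dice and PMI scores from Web page counts. It is a minimal sketch under stated assumptions, not the submitted systems: the function names, the `total_pages` estimate of the indexed Web size, and the zero-count fallbacks are all hypothetical, and the way the paper combines the vector score with PMI or Dice is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an all-zero vector as "no similarity"
    return dot / (norm_u * norm_v)


def dice(count_x, count_y, count_xy):
    """Standard Dice coefficient from page counts for x, y, and the pair (x, y)."""
    if count_x + count_y == 0:
        return 0.0  # assumption: fallback when neither word returns any page
    return 2.0 * count_xy / (count_x + count_y)


def pmi(count_x, count_y, count_xy, total_pages):
    """Standard pointwise mutual information (base 2) from page counts.

    total_pages is a hypothetical estimate of the number of indexed pages.
    """
    if min(count_x, count_y, count_xy) <= 0:
        return 0.0  # assumption: fallback when any count is zero
    return math.log2(count_xy * total_pages / (count_x * count_y))


def normalize(word):
    """Lowercase English letters; Chinese characters are left unchanged."""
    return word.lower()
```

The `normalize` helper mirrors the lowercasing step mentioned above; applying it before the vector lookup and the page-count query keeps case variants of the same English word from being treated as different words.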


