Authors:
Dolça Tellols (1); Takenobu Tokunaga (1) and Hikaru Yokono (2)
Affiliations:
(1) School of Computing, Tokyo Institute of Technology, Tokyo, Meguro, Ôokayama 2-12-1, Japan
(2) School of Information Science, Meisei University, Tokyo, Hino-shi, Hodokubo 2-1-1, Japan
Keyword(s):
Vocabulary Assessment, Vocabulary Volume, Word Difficulty, Semantic Diversity, Natural Language Processing, Word Embedding.
Abstract:
This paper presents Vocabulary Volume, a new metric for assessing vocabulary knowledge. Existing metrics for vocabulary knowledge assessment rely on word difficulty, which is often defined in terms of word usage frequency. In addition to word difficulty, our proposed metric considers the semantic diversity of words. To formalise semantic diversity, every word is transformed into a vector representation in a semantic space using word embedding techniques developed in natural language processing research. Semantic diversity is defined as the volume of the convex hull that covers all points corresponding to the words. The Vocabulary Volume score (VVS) is calculated from both semantic diversity and word difficulty. To validate our proposed metric, we conducted experiments using data gathered from Japanese language learners and native Japanese speakers. The experiments explored various options for each component in calculating VVS: word embeddings, dimension reduction methods, and word difficulty scales. The metric was evaluated on its ability to distinguish between responses from learners at different levels of language proficiency. The experimental results suggested the best configuration of the components and showed that our proposed metric performs better than an existing metric that considers only word difficulty.
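As an illustration of the semantic diversity component only, the following is a minimal sketch and not the authors' implementation. It assumes each word has already been embedded and reduced to two dimensions, so the convex hull "volume" becomes an area. The function names and the coordinates are hypothetical, and the combination with word difficulty into VVS is not shown because the abstract does not specify it.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull of 2-D points (shoelace formula)."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0  # fewer than three non-collinear points span no area
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical 2-D word coordinates (assumed, not taken from the paper).
narrow_vocab = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (0.0, 0.1), (0.05, 0.05)]
broad_vocab = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]

print(hull_area(narrow_vocab) < hull_area(broad_vocab))
```

Under these assumptions, a response whose words spread further apart in the semantic space yields a larger hull and therefore a higher diversity value. In higher dimensions a library routine for convex hull volume would be the more practical choice.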