Large Scale Intent Detection in Turkish Short Sentences with Contextual Word Embeddings

Enes Burak Dündar, Osman Fatih Kılıç, Tolga Çekiç, Yusufcan Manav, Onur Deniz

2020

Abstract

We have developed a large-scale intent detection method for our Turkish conversation system in the banking domain to understand the problems of our customers. Recent advances in natural language processing (NLP) allow machines to interpret words in context through low-dimensional vector representations, also known as contextual word embeddings. We therefore use two language model architectures that provide contextual embeddings: ELMo and BERT. We trained ELMo on Turkish corpora and used a pretrained Turkish BERT model. To evaluate these models on an intent classification task, we collected and annotated 6453 customer messages in 148 intents. Furthermore, another Turkish document classification dataset, Kemik News, is used to compare our method with state-of-the-art models. Experimental results show that using contextual word embeddings boosts Turkish document classification performance on various tasks. Moreover, converting Turkish characters to their English counterparts results in slightly better performance. Lastly, an experiment is conducted to find out which BERT layer is most effective for the intent classification task.
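
The character conversion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact character pairs or the preprocessing code, so the mapping table and the function name `asciify_turkish` below are assumptions chosen for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the mapping below is an assumed set of
# Turkish-specific letters and their closest ASCII counterparts.
TURKISH_TO_ASCII = str.maketrans({
    "ç": "c", "Ç": "C",
    "ğ": "g", "Ğ": "G",
    "ı": "i", "İ": "I",
    "ö": "o", "Ö": "O",
    "ş": "s", "Ş": "S",
    "ü": "u", "Ü": "U",
})


def asciify_turkish(text: str) -> str:
    """Replace Turkish-specific letters with ASCII counterparts.

    All other characters are left unchanged.
    """
    return text.translate(TURKISH_TO_ASCII)


if __name__ == "__main__":
    # Hypothetical customer message, not taken from the paper's dataset.
    print(asciify_turkish("Kredi kartı şifremi unuttum"))
```

A step like this would typically run before tokenization, so that messages typed with and without Turkish diacritics map to the same surface forms.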

Paper Citation


in Harvard Style

Dündar E., Kılıç O., Çekiç T., Manav Y. and Deniz O. (2020). Large Scale Intent Detection in Turkish Short Sentences with Contextual Word Embeddings. In Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR; ISBN 978-989-758-474-9, SciTePress, pages 187-192. DOI: 10.5220/0010108301870192


in Bibtex Style

@conference{kdir20,
author={Enes Burak Dündar and Osman Fatih Kılıç and Tolga Çekiç and Yusufcan Manav and Onur Deniz},
title={Large Scale Intent Detection in Turkish Short Sentences with Contextual Word Embeddings},
booktitle={Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR},
year={2020},
pages={187-192},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010108301870192},
isbn={978-989-758-474-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR
TI - Large Scale Intent Detection in Turkish Short Sentences with Contextual Word Embeddings
SN - 978-989-758-474-9
AU - Dündar E.
AU - Kılıç O.
AU - Çekiç T.
AU - Manav Y.
AU - Deniz O.
PY - 2020
SP - 187
EP - 192
DO - 10.5220/0010108301870192
PB - SciTePress