Simultaneous Flexible Keyword Detection and Text-dependent Speaker Recognition for Low-resource Devices

Hiroshi Fujimura; Ning Ding; Daichi Hayakawa; Takehiko Kagoshima

doi:10.5220/0008903202970307

2020

Abstract

This paper proposes a new method for simultaneous flexible keyword detection and text-dependent speaker identification using the recognized keyword. The goal is to identify a speaker from among a set of pre-registered speakers on the basis of a short command utterance, in an office or home, on low-resource chip devices. The first contribution is a processing pipeline that combines a neural network (NN) and a customized Viterbi-based algorithm for flexible keyword detection with Gaussian mixture models (GMMs) for speaker identification. The outputs of a middle layer of the NN, together with the alignment information obtained during keyword detection, are also used to create the feature vectors for the speaker GMMs. The second contribution is the application of DropConnect to capture speaker-modeling uncertainty in the Bayesian NN used for speaker recognition. This yields robust speaker models even when only a few enrollment utterances are available. Evaluation was conducted using 39 Japanese keywords spoken by 100 speakers. Recognition performance was measured in terms of false acceptances and false rejections on keyword utterances, and speaker identification among the 100 pre-registered speakers was evaluated simultaneously on the recognized keywords. The identification rate of a conventional i-vector method was 71.22%, whereas the proposed method achieved 89.29% while using low-cost resources.
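In the speaker-identification step, features from the detected keyword are scored against one GMM per pre-registered speaker. The sketch below shows one common way to perform that scoring. It is not the authors' implementation, and it makes several assumptions. Each GMM uses diagonal covariances. The decision score is the average per-frame log-likelihood. Each input frame is a feature vector for the detected keyword segment, for example a middle-layer NN output selected using the keyword alignment. All function names (`log_gaussian_diag`, `gmm_frame_loglik`, `identify_speaker`) are hypothetical. The NN, the Viterbi alignment, GMM training and DropConnect are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of one diagonal-covariance Gaussian component:
# (mixture weight, mean vector, variance vector).
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_frame_loglik(x: Sequence[float], gmm: List[Component]) -> float:
    """log p(x) = logsumexp_k [log w_k + log N(x; mu_k, var_k)], computed stably."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(
    frames: List[List[float]], models: Dict[str, List[Component]]
) -> Tuple[str, Dict[str, float]]:
    """Pick the speaker whose GMM gives the highest average frame log-likelihood.

    `frames` is assumed to hold the feature vectors of the detected keyword
    segment; `models` maps each registered speaker to that speaker's GMM.
    """
    scores = {
        name: sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)
        for name, gmm in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy 2-D example with made-up models: "alice" is centred at the origin,
    # while "bob" is a two-component mixture far from the test frames.
    models = {
        "alice": [(1.0, [0.0, 0.0], [1.0, 1.0])],
        "bob": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 2.0], [1.0, 1.0])],
    }
    frames = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]
    best, _ = identify_speaker(frames, models)
    print(best)  # alice
```

The score is averaged over frames, so keywords of different lengths produce comparable scores. Log-sum-exp is used to avoid underflow when summing component likelihoods.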

Paper Citation

in Harvard Style

Fujimura H., Ding N., Hayakawa D. and Kagoshima T. (2020). Simultaneous Flexible Keyword Detection and Text-dependent Speaker Recognition for Low-resource Devices. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-397-1, pages 297-307. DOI: 10.5220/0008903202970307

in Bibtex Style

@conference{icpram20,
author={Hiroshi Fujimura and Ning Ding and Daichi Hayakawa and Takehiko Kagoshima},
title={Simultaneous Flexible Keyword Detection and Text-dependent Speaker Recognition for Low-resource Devices},
booktitle={Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2020},
pages={297-307},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008903202970307},
isbn={978-989-758-397-1},
}

in EndNote Style

TY - CONF

JO - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM,
TI - Simultaneous Flexible Keyword Detection and Text-dependent Speaker Recognition for Low-resource Devices
SN - 978-989-758-397-1
AU - Fujimura H.
AU - Ding N.
AU - Hayakawa D.
AU - Kagoshima T.
PY - 2020
SP - 297
EP - 307
DO - 10.5220/0008903202970307
ER -