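Paper: Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier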

Authors: Shinnosuke Isobe (1); Satoshi Tamura (2); Yuuto Gotoh (3) and Masaki Nose (3)

Affiliations: (1) Graduate School of Natural Science and Technology, Gifu University, Gifu, Japan; (2) Faculty of Engineering, Gifu University, Gifu, Japan; (3) Ricoh Company, Ltd., Kanagawa, Japan

Keyword(s): Scene Classification, Audio-visual Speech Recognition, Multi-angle Lipreading, Anomaly Detection, Neural Vocoder.

Abstract: Recently, Audio-Visual Speech Recognition (AVSR), one of the Automatic Speech Recognition (ASR) approaches that is robust against acoustic noise, has been widely researched. AVSR combines ASR and Visual Speech Recognition (VSR). For real applications, VSR must accept both frontal and non-frontal face images, and the computational time for image processing must be reduced. In this paper, we propose an efficient multi-angle AVSR method using a Parallel-WaveGAN-based scene classifier. The classifier estimates whether given speech data were recorded in a clean or a noisy environment. If the scene classifier detects a noisy environment, multi-angle AVSR is conducted to enhance recognition accuracy; if it predicts clean speech data, only ASR is performed to avoid increasing the processing time. We evaluated our framework using two multi-angle audio-visual databases: an English corpus, OuluVS2, with 5 views, and a Japanese phrase corpus, GAMVA, consisting of 12 views. Experimental results show that the scene classifier worked well, and that multi-angle AVSR achieved higher recognition accuracy than ASR. In addition, our approach saved processing time by switching recognizers according to the noise condition.
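
The switching behaviour described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Utterance`, `recognize`, `is_noisy`, `asr`, and `avsr` are hypothetical, and the classifier and both recognizers are passed in as plain callables so that nothing is assumed about how the Parallel-WaveGAN-based scene classifier or the recognition models work internally.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Utterance:
    """Hypothetical container for one recording: audio samples plus face images."""
    audio: Sequence[float]
    video_frames: Sequence[object]


def recognize(
    utt: Utterance,
    is_noisy: Callable[[Sequence[float]], bool],
    asr: Callable[[Sequence[float]], str],
    avsr: Callable[[Sequence[float], Sequence[object]], str],
) -> str:
    """Pick a recognizer from the scene classifier's decision on the audio.

    is_noisy stands in for the scene classifier (assumed to return True
    for a noisy recording). Video frames are only handed to a recognizer
    on the noisy branch, so the clean branch skips image processing.
    """
    if is_noisy(utt.audio):
        # Noisy scene: use audio and multi-angle video together.
        return avsr(utt.audio, utt.video_frames)
    # Clean scene: audio-only recognition, no image processing cost.
    return asr(utt.audio)
```

Passing the three components as arguments keeps the sketch independent of any particular model; in a real system they would be replaced by the trained classifier and recognizers.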

CC BY-NC-ND 4.0

Paper citation in several formats:
Isobe, S.; Tamura, S.; Gotoh, Y. and Nose, M. (2022). Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-549-4; ISSN 2184-4313, SciTePress, pages 449-460. DOI: 10.5220/0010846000003122

@conference{icpram22,
author={Shinnosuke Isobe and Satoshi Tamura and Yuuto Gotoh and Masaki Nose},
title={Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2022},
pages={449-460},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010846000003122},
isbn={978-989-758-549-4},
issn={2184-4313},
}

TY - CONF

JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier
SN - 978-989-758-549-4
IS - 2184-4313
AU - Isobe, S.
AU - Tamura, S.
AU - Gotoh, Y.
AU - Nose, M.
PY - 2022
SP - 449
EP - 460
DO - 10.5220/0010846000003122
PB - SciTePress