Authors: Keisuke Yamazaki 1; Satoshi Tamura 2; Yuuto Gotoh 3 and Masaki Nose 3
Affiliations: 1 Graduate School of Natural Science and Technology, Gifu University, Gifu, Japan; 2 Faculty of Engineering, Gifu University, Gifu, Japan; 3 Ricoh Company, Ltd., Kanagawa, Japan
Keyword(s):
Voice Activity Detection, Human Motion, Speaker Diarization, Dynamic Image, Multi-modal Transfer Module, Conference Video Processing.
Abstract:
In this paper, we propose a visual-only Voice Activity Detection (VAD) method based on human movements. Although audio VAD is commonly used in many applications, it is not robust in noisy environments. In such cases, multi-modal VAD using speech and mouth information is effective. However, due to the current pandemic situation, people wear masks, so mouths cannot be observed. On the other hand, a video capturing a speaker's entire body is useful for visual VAD, because gestures and motions may help identify speech segments. In our scheme, we first obtain dynamic images, which represent the motion of a person. Second, we fuse the dynamic and original images using a Multi-Modal Transfer Module (MMTM). To evaluate the effectiveness of our scheme, we conducted experiments using conference videos. The results show that the proposed model outperforms the baseline. Furthermore, through model visualization, we confirmed that the proposed model focuses much more on speakers.
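The abstract does not give implementation details for the dynamic-image step. As an illustration only, the sketch below shows one common way to collapse a clip into a single dynamic image, approximate rank pooling, where frame t of T (1-indexed) is weighted by alpha_t = 2t - T - 1; the function name and the NumPy-array input format are assumptions, not the authors' code.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a clip of T frames, shaped (T, H, W) or (T, H, W, C),
    into one image via approximate rank pooling.

    Each frame t (1-indexed) is weighted by alpha_t = 2t - T - 1,
    so later frames contribute positively and earlier frames
    negatively, encoding the direction of motion in a single image.
    """
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2.0 * t - T - 1.0  # e.g. T=4 -> [-3, -1, 1, 3]
    # Broadcast the per-frame weights over the spatial (and channel) axes.
    alpha = alpha.reshape((T,) + (1,) * (frames.ndim - 1))
    return (alpha * frames).sum(axis=0)
```

Note that the weights sum to zero, so a static clip (identical frames) yields an all-zero dynamic image, while any temporal change leaves a nonzero motion trace.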