Sign Language Recognition Based on Subspace Representations in the Spatio-Temporal Frequency Domain

Ryota Sato; Suzana Beleza; Erica Shimomoto; Matheus Silva de Lima; Nobuko Kato; Kazuhiro Fukui

doi:10.5220/0012577000003654

2024

Abstract

This paper proposes a subspace-based method for sign language recognition in videos. Typical subspace-based methods represent a video as a low-dimensional subspace generated by applying principal component analysis (PCA) to a set of images from the video. This representation is compact and practical for motion recognition when little training data is available. However, given the complex motion and structure of sign languages, the performance of subspace-based methods is limited because they do not consider temporal information such as the order of frames. To address this issue, we propose processing time-domain information in the frequency domain by applying the three-dimensional fast Fourier transform (3D-FFT) to sign videos. A sign video is then represented as a 3D amplitude spectrum tensor, which is invariant to shifts of target objects in the spatial and temporal directions. Further, the 3D amplitude spectrum tensor is regarded as one point on the Product Grassmann Manifold (PGM). By unfolding the tensor along all three dimensions, the PGM can account for the temporal information. Finally, we calculate video similarity using the distance between the two corresponding points on the PGM. The effectiveness of the proposed method is demonstrated on private and public sign language recognition datasets, showing a significant performance improvement over conventional subspace-based methods.
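The pipeline outlined in the abstract (3D-FFT, amplitude spectrum tensor, unfolding along three modes, one subspace per mode, distance on the PGM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the subspace dimension `dim=3`, the use of uncentered PCA via SVD, the geodesic Grassmann distance, and the root-sum-of-squares combination across the three factor manifolds are all assumptions made here for illustration; the abstract does not specify these choices. Videos are assumed to be grayscale arrays of identical shape `(T, H, W)`.

```python
import numpy as np

def amplitude_spectrum(video):
    """3D amplitude spectrum |FFT| of a (T, H, W) video array."""
    return np.abs(np.fft.fftn(video, axes=(0, 1, 2)))

def unfold(tensor, mode):
    """Mode-k unfolding: rows index the chosen axis, columns all other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def subspace_basis(matrix, dim):
    """Orthonormal basis of the dominant column subspace (uncentered PCA via SVD)."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim]

def video_to_pgm_point(video, dim=3):
    """Represent a video as three subspaces, one per unfolding (hypothetical helper)."""
    spec = amplitude_spectrum(video)
    return [subspace_basis(unfold(spec, k), dim) for k in range(3)]

def grassmann_distance(a, b):
    """Geodesic distance from principal angles (an assumed choice of metric)."""
    s = np.linalg.svd(a.T @ b, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.sqrt(np.sum(theta ** 2)))

def pgm_distance(p, q):
    """Combine the three factor distances (assumed: root of summed squares)."""
    return float(np.sqrt(sum(grassmann_distance(a, b) ** 2 for a, b in zip(p, q))))

# Synthetic stand-ins for sign videos; real data would be loaded from files.
rng = np.random.default_rng(0)
video_a = rng.random((8, 16, 16))
video_b = rng.random((8, 16, 16))
# Circularly shifted copy of video_a in time and space.
video_shifted = np.roll(video_a, shift=(2, 3, -1), axis=(0, 1, 2))

point_a = video_to_pgm_point(video_a)
point_b = video_to_pgm_point(video_b)
point_shifted = video_to_pgm_point(video_shifted)

print("a vs shifted a:", pgm_distance(point_a, point_shifted))
print("a vs b:        ", pgm_distance(point_a, point_b))
```

In this sketch the circularly shifted copy lies at a distance near zero from the original, because a circular shift changes only the phase of the Fourier transform and leaves the amplitude spectrum unchanged. Real videos are shifted non-circularly, so the invariance would hold only approximately there.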

Paper Citation

in Harvard Style

Sato R., Beleza S., Shimomoto E., Silva de Lima M., Kato N. and Fukui K. (2024). Sign Language Recognition Based on Subspace Representations in the Spatio-Temporal Frequency Domain. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 152-159. DOI: 10.5220/0012577000003654

in Bibtex Style

@conference{icpram24,
author={Ryota Sato and Suzana Beleza and Erica Shimomoto and Matheus Silva de Lima and Nobuko Kato and Kazuhiro Fukui},
title={Sign Language Recognition Based on Subspace Representations in the Spatio-Temporal Frequency Domain},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={152-159},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012577000003654},
isbn={978-989-758-684-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Sign Language Recognition Based on Subspace Representations in the Spatio-Temporal Frequency Domain
SN - 978-989-758-684-2
AU - Sato R.
AU - Beleza S.
AU - Shimomoto E.
AU - Silva de Lima M.
AU - Kato N.
AU - Fukui K.
PY - 2024
SP - 152
EP - 159
DO - 10.5220/0012577000003654
PB - SciTePress