Authors:
Monika Schak
and
Alexander Gepperth
Affiliation:
Fulda University of Applied Sciences, 36037 Fulda, Germany
Keyword(s):
Hand Gestures, Dataset, Multimodal Data, Data Fusion, Sequence Detection.
Abstract:
We present a new large-scale multi-modal dataset for free-hand gesture recognition. The freely available
dataset consists of 79,881 sequences, grouped into six classes representing typical hand gestures in human-machine interaction. Each sample contains four independent modalities (arriving at different frequencies)
recorded from two independent sensors: a fixed 3D camera for video, audio and 3D, and a wearable acceleration sensor attached to the wrist. The gesture classes are specifically chosen with investigations on multi-modal
fusion in mind. For example, two gesture classes can be distinguished mainly by audio, while the other four
exhibit no audio signal beyond white noise. An important point concerning this dataset is that it is
recorded from a single person. While this reduces variability somewhat, it virtually eliminates the risk of incorrectly performed gestures, thus enhancing the quality of the data. By implementing a simple LSTM-based
gesture classifier in a live system, we can demonstrate that generalization to other persons is nevertheless high.
In addition, we show the validity and internal consistency of the data by training LSTM and DNN classifiers,
each relying on a single modality, to high precision.
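To illustrate the kind of single-modality LSTM classifier the abstract refers to, the sketch below implements a minimal one-layer LSTM followed by a linear head over the last hidden state. This is not the authors' implementation: all dimensions, the random initialization, and the linear head are illustrative assumptions; only the six output classes come from the dataset description.

```python
# Minimal, dependency-free sketch of a single-modality LSTM gesture
# classifier. Hyperparameters and initialization are assumptions, not
# values from the paper; only the six gesture classes are from the dataset.
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """One-layer LSTM with a linear head over the final hidden state."""
    def __init__(self, n_in, n_hidden, n_classes):
        self.n_in, self.n_hidden = n_in, n_hidden
        def mat(rows, cols):
            return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        # One weight matrix per gate (input, forget, output, cell candidate),
        # acting on the concatenated [frame features, previous hidden] vector.
        self.W = {g: mat(n_hidden, n_in + n_hidden) for g in "ifoc"}
        self.b = {g: [0.0] * n_hidden for g in "ifoc"}
        self.W_head = mat(n_classes, n_hidden)

    def _gate(self, g, xh):
        return [sum(w * v for w, v in zip(row, xh)) + b
                for row, b in zip(self.W[g], self.b[g])]

    def forward(self, seq):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in seq:                      # one feature vector per frame
            xh = x + h
            i = [sigmoid(v) for v in self._gate("i", xh)]
            f = [sigmoid(v) for v in self._gate("f", xh)]
            o = [sigmoid(v) for v in self._gate("o", xh)]
            g = [math.tanh(v) for v in self._gate("c", xh)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        # Class scores for the whole sequence, from the last hidden state.
        return [sum(w * v for w, v in zip(row, h)) for row in self.W_head]

# A dummy 20-frame sequence with 4 features per frame (e.g. one modality).
model = TinyLSTM(n_in=4, n_hidden=8, n_classes=6)
seq = [[random.random() for _ in range(4)] for _ in range(20)]
scores = model.forward(seq)
print(len(scores))   # 6 scores, one per gesture class
```

In a real system each modality (video, audio, 3D, acceleration) would feed its own such classifier, which is what makes per-modality consistency checks like those in the abstract possible.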