Authors:
Dimitrios Koutrintzes 1; Eirini Mathe 2,1 and Evaggelos Spyrou 3,1
Affiliations:
1 Institute of Informatics and Telecommunications, National Center for Scientific Research - “Demokritos,” Athens, Greece
2 Department of Informatics, Ionian University, Corfu, Greece
3 Department of Computer Science and Telecommunications, University of Thessaly, Lamia, Greece
Keyword(s):
Human Activity Recognition, Multimodal Fusion.
Abstract:
Contemporary human activity recognition approaches are heavily based on deep neural network architectures, since the latter require neither significant domain knowledge nor complex feature-extraction algorithms, while demonstrating strong performance. As a result, handcrafted features are nowadays rarely used. In this paper we demonstrate that such features provide complementary representations of the input data and can boost the performance of deep approaches, i.e., when deep and handcrafted features are fused. To this goal, we choose an existing set of handcrafted features, extracted from 3D skeletal joints. We compare its performance with two approaches: the first is based on a visual representation of skeletal data, while the second applies rank pooling to raw RGB data. We show that fusing both types of features significantly increases overall performance. We evaluate our approach on a publicly available, challenging dataset of human activities.
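The fusion described in the abstract, combining deep and handcrafted skeletal features, is commonly realized by concatenating the two feature vectors before classification. The paper gives no implementation details here, so the snippet below is only a minimal, hypothetical sketch of feature-level fusion; all names and dimensions are assumptions, not the authors' actual pipeline:

```python
import numpy as np

def fuse_features(deep_feats: np.ndarray, hand_feats: np.ndarray) -> np.ndarray:
    """Feature-level (early) fusion by concatenation.

    deep_feats: per-sample embeddings from a deep network, shape (n, d1)
    hand_feats: handcrafted descriptors from 3D skeletal joints, shape (n, d2)
    Returns a fused representation of shape (n, d1 + d2), which a
    downstream classifier would consume.
    """
    # Concatenate along the feature axis so each sample keeps both views.
    return np.concatenate([deep_feats, hand_feats], axis=-1)

# Hypothetical dimensions: 512-d deep embedding, 64-d handcrafted descriptor.
deep = np.random.rand(8, 512)
hand = np.random.rand(8, 64)
fused = fuse_features(deep, hand)
print(fused.shape)  # (8, 576)
```

In practice, normalizing each feature type (e.g., z-scoring) before concatenation is often needed so neither modality dominates the fused representation.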