Authors:
Matthew Chapman; Ardhendu Behera; Anthony G. Cohn and David C. Hogg
Affiliation:
University of Leeds, United Kingdom
Keyword(s):
Egocentric Activity Recognition, Histogram of Oriented Pairwise Relations (HOPR), Spatio-temporal Relationships, Pairwise Qualitative Relations, Bag-of-visual-words.
Related Ontology Subjects/Areas/Topics:
Computer Vision, Visualization and Computer Graphics; Image and Video Analysis; Motion, Tracking and Stereo Vision; Video Surveillance and Event Detection; Visual Attention and Image Saliency
Abstract:
This paper presents an approach for recognising activities using video from an egocentric (first-person view) setup. Our approach infers activity from the interactions of objects and hands. In contrast to previous approaches to activity recognition, we do not require an intermediate step such as object detection or pose estimation. Recently, it has been shown that modelling the spatial distribution of visual words corresponding to local features further improves the performance of activity recognition using the bag-of-visual-words representation. Inspired by this philosophy, our method is based on global spatio-temporal relationships between visual words. We consider the interaction between visual words by encoding their spatial distances, orientations and alignments. These interactions are encoded using a histogram that we name the Histogram of Oriented Pairwise Relations (HOPR). The proposed approach is robust to occlusion and background variation and is evaluated on two challenging egocentric activity datasets consisting of manipulative tasks. We introduce a novel representation of activities based on interactions of local features and experimentally demonstrate its superior performance in comparison to standard activity representations such as bag-of-visual-words.
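As a rough illustration of the idea of histogramming pairwise relations between local features, the sketch below bins the distance and orientation of every ordered pair of 2D feature locations in a single frame. It is not the authors' HOPR definition: the function name, the bin counts, the normalisation, and the omission of visual-word labels, alignments and the temporal dimension are all simplifying assumptions made here for illustration.

```python
import numpy as np

def pairwise_relation_histogram(points, n_dist_bins=4, n_angle_bins=8, max_dist=None):
    """Normalised histogram over (distance bin, orientation bin) for all
    ordered pairs of 2D points. Hypothetical helper, not the paper's HOPR."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    hist = np.zeros((n_dist_bins, n_angle_bins))
    if n < 2:
        return hist.ravel()

    # diff[i, j] is the vector from point i to point j.
    diff = points[None, :, :] - points[:, None, :]
    off_diagonal = ~np.eye(n, dtype=bool)  # drop self-pairs
    dx = diff[..., 0][off_diagonal]
    dy = diff[..., 1][off_diagonal]

    dist = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)  # orientation in [0, 2*pi)

    # Assumption: distances are scaled by the largest pairwise distance
    # unless the caller supplies max_dist.
    if max_dist is None:
        max_dist = dist.max()
    if max_dist == 0:
        max_dist = 1.0

    # Quantise and clamp so boundary values fall in the last bin.
    d_bin = np.minimum((dist / max_dist * n_dist_bins).astype(int), n_dist_bins - 1)
    a_bin = np.minimum((angle / (2 * np.pi) * n_angle_bins).astype(int), n_angle_bins - 1)

    # Accumulate one count per ordered pair, then normalise to sum to 1.
    np.add.at(hist, (d_bin, a_bin), 1)
    return hist.ravel() / hist.sum()

# Example: feature locations (in pixels) detected in one frame.
descriptor = pairwise_relation_histogram([(10, 20), (40, 25), (30, 60)])
```

A descriptor of this kind could be computed per visual-word pair and concatenated, or pooled over frames, before being passed to a classifier; which of these the paper does is not stated in the abstract.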