Authors:
Josep Maria Carmona and Joan Climent
Affiliation:
Barcelona Tech (UPC), Spain
Keyword(s):
R Transform, Action Recognition, PHOW, Projection Templates.
Related Ontology Subjects/Areas/Topics:
Applications and Services; Computer Vision, Visualization and Computer Graphics; Enterprise Information Systems; Features Extraction; Human and Computer Interaction; Human-Computer Interaction; Image and Video Analysis; Image and Video Coding and Compression; Image Formation and Preprocessing; Motion, Tracking and Stereo Vision; Video Surveillance and Event Detection
Abstract:
The objective of this paper is the automatic recognition of human actions in video sequences. The use of spatio-temporal features for action recognition has become very popular in recent literature. Instead of extracting the spatio-temporal features from the raw video sequence, some authors propose projecting the sequence to a single template first.
As a contribution, we propose the use of several variants of the R transform for projecting the image sequences to templates. The R transform projects the whole sequence to a single image, retaining information concerning movement direction and magnitude. Spatio-temporal features are extracted from the template, combined using a bag-of-words paradigm, and finally fed to an SVM for action classification.
The method presented is shown to improve on the state-of-the-art results on the standard Weizmann action dataset.
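
For readers unfamiliar with the R transform named in the abstract, the following is a minimal sketch of its commonly cited base definition: the Radon projection of a shape is squared and summed over the radial coordinate, giving one value per angle. It is an illustration under stated assumptions, not the variants proposed in the paper: it works on a single binary silhouette rather than a whole sequence, it approximates the Radon transform by binning foreground pixels to the nearest integer offset, and the function name `r_transform` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def r_transform(mask, n_angles=180):
    """Approximate R transform of a binary mask (list of rows of 0/1).

    Assumed base definition: R(theta) = sum over rho of T(rho, theta)^2,
    where T is the Radon projection of the mask. The projection is
    approximated by counting foreground pixels per integer rho bin.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Measure coordinates from the image centre.
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    points = [
        (x - cx, y - cy)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]

    result = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles  # angles in [0, pi)
        c, s = math.cos(theta), math.sin(theta)
        projection = defaultdict(int)
        for x, y in points:
            # Signed distance of the pixel along the projection direction.
            rho = round(x * c + y * s)
            projection[rho] += 1
        # Square each projection bin and sum over rho.
        result.append(sum(count * count for count in projection.values()))
    return result


# Usage: a horizontal bar of five pixels in a 5x5 image.
mask = [[0] * 5 for _ in range(5)]
mask[2] = [1] * 5
signature = r_transform(mask, n_angles=180)
```

In this toy case the signature is smallest where the bar's pixels spread over distinct bins and largest where they collapse into one bin, which is how the transform encodes orientation. How the paper extends this to a whole sequence, and how the resulting template feeds the bag-of-words and SVM stages, is not specified in the abstract and is not sketched here.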