ACTION RECOGNITION BASED ON MULTI-LEVEL REPRESENTATION OF 3D SHAPE

Binu M. Nair, Vijayan K. Asari

Abstract

A novel algorithm is proposed for recognizing human actions using a combination of shape descriptors. Every human action is treated as a 3D space-time shape, which is represented by shape descriptors based on the 3D Euclidean distance transform and the Radon transform. The Euclidean distance transform gives a suitable internal representation in which the interior values correspond to the action performed. By taking the gradient of this distance transform, the space-time shape is divided into a number of levels, each level representing a coarser version of the original space-time shape. Then, at each level and at each frame, the Radon transform is applied and action features are extracted from it. The action features are the R-Transform feature set, which gives the posture variations of the human body with respect to time, and the R-Translation vector set, which gives the translatory variation. The combination of these action features is used in a nearest neighbour classifier for action recognition.
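
As a rough illustration of the per-frame feature step, the sketch below computes an R-transform, R(θ) = Σ_ρ T(ρ, θ)², from a nearest-bin discrete Radon projection T of one binary silhouette, and feeds it to a 1-nearest-neighbour classifier. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the centroid-centred binning, the division by the peak value, and the toy bar-shaped masks are all hypothetical choices, and the multi-level split from the distance transform gradient and the R-Translation vectors are left out.

```python
import math


def r_transform(mask, n_angles=180):
    """Normalised R-transform of a binary 2D silhouette (list of rows of 0/1)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no foreground pixels")
    # Assumption: rho is measured from the silhouette centroid, so shifting the
    # silhouette inside the frame leaves the binning unchanged (up to rounding).
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    half = int(math.ceil(math.hypot(len(mask), len(mask[0])))) + 1
    features = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        # Nearest-bin Radon projection: count the pixels that fall on each rho.
        projection = [0] * (2 * half + 1)
        for x, y in pts:
            rho = (x - cx) * c + (y - cy) * s
            projection[int(math.floor(rho + 0.5)) + half] += 1
        # R-transform value at this angle: sum of squared projection values.
        features.append(sum(p * p for p in projection))
    peak = max(features)
    return [f / peak for f in features]  # assumption: divide by the peak value


def nearest_neighbour(query, training):
    """Return the label of the training feature closest to query (Euclidean)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(training, key=lambda item: sq_dist(query, item[0]))[1]


def bar_mask(height, width, top, left, bar_h, bar_w):
    """Hypothetical toy silhouette: a filled rectangle inside an empty frame."""
    return [[1 if top <= y < top + bar_h and left <= x < left + bar_w else 0
             for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    training = [
        (r_transform(bar_mask(12, 12, 0, 0, 1, 6), n_angles=4), "wide"),
        (r_transform(bar_mask(12, 12, 0, 0, 6, 1), n_angles=4), "tall"),
    ]
    query = r_transform(bar_mask(12, 12, 4, 3, 1, 7), n_angles=4)
    print(nearest_neighbour(query, training))
```

In a fuller pipeline, one such feature vector would be computed for every frame and every level of the space-time shape, and the vectors would be concatenated before classification.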



Paper Citation


in Harvard Style

Nair B. M. and Asari V. K. (2011). ACTION RECOGNITION BASED ON MULTI-LEVEL REPRESENTATION OF 3D SHAPE. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011) ISBN 978-989-8425-47-8, pages 378-386. DOI: 10.5220/0003376303780386


in Bibtex Style

@conference{visapp11,
author={Binu M. Nair and Vijayan K. Asari},
title={ACTION RECOGNITION BASED ON MULTI-LEVEL REPRESENTATION OF 3D SHAPE},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011)},
year={2011},
pages={378-386},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003376303780386},
isbn={978-989-8425-47-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011)
TI - ACTION RECOGNITION BASED ON MULTI-LEVEL REPRESENTATION OF 3D SHAPE
SN - 978-989-8425-47-8
AU - Nair, Binu M.
AU - Asari, Vijayan K.
PY - 2011
SP - 378
EP - 386
DO - 10.5220/0003376303780386