Action-centric Polar Representation of Motion Trajectories for Online Action Recognition

Fabio Martinez, Antoine Manzanera, Michèle Gouiffès, Thanh Phuong Nguyen

Abstract

This work introduces a novel action descriptor that represents activities instantaneously, in each frame of a video sequence, for action recognition. The proposed approach first characterizes the video by computing kinematic primitives along trajectories obtained by semi-dense point tracking. A frame-level characterization is then achieved by computing a spatial, action-centric polar representation from these trajectories. This representation quantizes the image space and groups the trajectories into radial and angular regions. Motion histograms are temporally aggregated in each region to form a kinematic signature of the current trajectories, and histograms with several time depths can be computed to obtain different versions of the motion characterization. The histograms are updated at each time step to reflect the kinematic trend of the trajectories in each region. The action descriptor is then defined as the collection of motion histograms from all the regions in a specific frame. Classic support vector machine (SVM) models carry out the classification for each time depth. The proposed approach is easy to implement and very fast, and its multi-level representation of motion primitives lets it consistently encode a broad variety of actions. The approach was evaluated on different public action datasets, showing competitive results (accuracies of 94% and 88.7% on the KTH and UT datasets, respectively) and an efficient computation time.
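The frame-level step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of the trajectory centroid as the action centre, the function names `polar_region` and `frame_descriptor`, the numbers of rings, sectors and orientation bins, the magnitude weighting and the final normalization are all assumptions made for illustration. Temporal aggregation over several time depths and the SVM stage are left out.

```python
import math

TWO_PI = 2.0 * math.pi

def polar_region(x, y, cx, cy, r_max, n_rings=2, n_sectors=4):
    """Map a point to a (ring, sector) index relative to the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    radius = math.hypot(dx, dy)
    # Clamp so that points lying exactly on the outer radius fall in the last ring.
    ring = min(int(n_rings * radius / r_max), n_rings - 1)
    angle = math.atan2(dy, dx) % TWO_PI
    sector = min(int(n_sectors * angle / TWO_PI), n_sectors - 1)
    return ring, sector

def frame_descriptor(points, velocities, n_rings=2, n_sectors=4, n_bins=8):
    """Concatenate one velocity-orientation histogram per polar region.

    points     -- current (x, y) position of each tracked trajectory
    velocities -- current (vx, vy) velocity of each trajectory
    """
    # Assumption: the action centre is the centroid of the tracked points.
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    r_max = max(math.hypot(px - cx, py - cy) for px, py in points) or 1.0

    hists = [[0.0] * n_bins for _ in range(n_rings * n_sectors)]
    for (px, py), (vx, vy) in zip(points, velocities):
        ring, sector = polar_region(px, py, cx, cy, r_max, n_rings, n_sectors)
        theta = math.atan2(vy, vx) % TWO_PI
        b = min(int(n_bins * theta / TWO_PI), n_bins - 1)
        # Assumption: each vote is weighted by the velocity magnitude.
        hists[ring * n_sectors + sector][b] += math.hypot(vx, vy)

    flat = [v for h in hists for v in h]
    total = sum(flat)
    return [v / total for v in flat] if total > 0 else flat

if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    vels = [(1.0, 0.0)] * 4
    print(len(frame_descriptor(pts, vels)))
```

With these illustrative defaults the descriptor has 2 × 4 × 8 = 64 entries per frame. A multi-depth variant could keep one such set of histograms per time depth and update each as new frames arrive, which is the part of the method this sketch does not attempt.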



Paper Citation


in Harvard Style

Martinez F., Manzanera A., Gouiffès M. and Nguyen T. (2016). Action-centric Polar Representation of Motion Trajectories for Online Action Recognition. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 442-448. DOI: 10.5220/0005730404420448


in Bibtex Style

@conference{visapp16,
author={Fabio Martinez and Antoine Manzanera and Michèle Gouiffès and Thanh Phuong Nguyen},
title={Action-centric Polar Representation of Motion Trajectories for Online Action Recognition},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={442-448},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005730404420448},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Action-centric Polar Representation of Motion Trajectories for Online Action Recognition
SN - 978-989-758-175-5
AU - Martinez F.
AU - Manzanera A.
AU - Gouiffès M.
AU - Nguyen T.
PY - 2016
SP - 442
EP - 448
DO - 10.5220/0005730404420448