Information Fusion for Action Recognition with Deeply Optimised Hough Transform Paradigm

Geoffrey Vaquette, Catherine Achard, Laurent Lucat

Abstract

Automatic human action recognition is a challenging and widely explored domain. In this work, we focus on action segmentation with the Hough Transform paradigm and, more precisely, with the Deeply Optimised Hough Transform (DOHT). First, we apply DOHT to video sequences using the well-known dense trajectory features; then, we propose to extend the method to efficiently merge information coming from various sensors. We introduce three different ways to perform fusion, depending on the level at which information is merged. The advantages and disadvantages of these solutions are presented in terms of both performance and ease of use. Notably, one of the fusion levels remains suitable even when one or more sensors are out of order or disturbed.
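To illustrate the idea of merging sensor information at the voting level, the sketch below shows a toy score-level (late) fusion of per-sensor Hough vote maps. This is not the authors' DOHT implementation; the functions `hough_vote_map` and `late_fusion`, the vote representation, and the per-sensor weights are all illustrative assumptions, used only to show how a fused vote map can still be formed when one sensor drops out.

```python
import numpy as np

def hough_vote_map(feature_votes, length):
    """Toy per-sensor Hough voting: each local feature casts a weighted
    vote for the temporal position of an action centre.
    `feature_votes` is a list of (time_index, weight) pairs."""
    votes = np.zeros(length)
    for t, w in feature_votes:
        votes[t] += w
    return votes

def late_fusion(vote_maps, sensor_weights=None):
    """Score-level fusion: weighted sum of per-sensor vote maps,
    normalised by the total weight of the sensors actually present.
    A missing sensor (None) is simply skipped, which mirrors the
    robustness to a failing sensor mentioned in the abstract."""
    if sensor_weights is None:
        sensor_weights = [1.0] * len(vote_maps)
    available = [(m, w) for m, w in zip(vote_maps, sensor_weights)
                 if m is not None]
    total_weight = sum(w for _, w in available)
    return sum(w * m for m, w in available) / total_weight

# Two hypothetical sensors voting over a 5-frame window:
votes_a = hough_vote_map([(2, 1.0), (3, 2.0)], 5)
votes_b = hough_vote_map([(3, 1.0)], 5)
fused = late_fusion([votes_a, votes_b])
peak = int(np.argmax(fused))   # fused action-centre hypothesis
```

Even in this toy form, dropping the second sensor (`late_fusion([votes_a, None])`) still yields a valid vote map, whereas a fusion scheme operating at the feature level would need retraining.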

References

  1. Cai, J., Merler, M., Pankanti, S., and Tian, Q. (2015). Heterogeneous semantic level features fusion for action recognition. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, pages 307-314. ACM.
  2. Chan-Hon-Tong, A., Achard, C., and Lucat, L. (2013a). Deeply optimized hough transform: Application to action segmentation. In Image Analysis and Processing-ICIAP 2013, pages 51-60. Springer.
  3. Chan-Hon-Tong, A., Achard, C., and Lucat, L. (2014). Simultaneous segmentation and classification of human actions in video streams using deeply optimized hough transform. Pattern Recognition, 47(12):3807-3818.
  4. Chan-Hon-Tong, A., Ballas, N., Achard, C., Delezoide, B., Lucat, L., Sayd, P., and Prêteux, F. (2013b). Skeleton point trajectories for human daily activity recognition. In International Conference on Computer Vision Theory and Application.
  5. Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886-893. IEEE.
  6. Dalal, N., Triggs, B., and Schmid, C. (2006). Human detection using oriented histograms of flow and appearance. In Computer Vision-ECCV 2006, pages 428-441. Springer.
  7. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. (2010). Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627-1645.
  8. Gall, J. and Lempitsky, V. (2009). Class-specific hough forests for object detection. In International Conference on Computer Vision and Pattern Recognition.
  9. Gall, J., Yao, A., Razavi, N., Van Gool, L., and Lempitsky, V. (2011). Hough forests for object detection, tracking, and action recognition. Pattern Analysis and Machine Intelligence.
  10. Girshick, R., Shotton, J., Kohli, P., Criminisi, A., and Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In International Conference on Computer Vision and Pattern Recognition.
  11. Han, J., Shao, L., Xu, D., and Shotton, J. (2013). Enhanced computer vision with microsoft kinect sensor: A review. Cybernetics, IEEE Transactions on, 43(5):1318-1334.
  12. Hough, P. V. (1962). Method and means for recognizing complex patterns. Technical report.
  13. Hu, J.-F., Zheng, W.-S., Lai, J., and Zhang, J. (2015). Jointly learning heterogeneous features for rgb-d activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5344-5352.
  14. Kosmopoulos, D. I., Papoutsakis, K., and Argyros, A. A. (2011). Online segmentation and classification of modeled actions performed in the context of unmodeled ones. Trans. on PAMI, 33(11):2188-2202.
  15. Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B. (2008). Learning realistic human actions from movies. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8. IEEE.
  16. Leibe, B., Leonardis, A., and Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Workshop on Statistical Learning in Computer Vision.
  17. Lin, Y.-Y., Hua, J.-H., Tang, N. C., Chen, M.-H., and Liao, H.-Y. M. (2014). Depth and skeleton associated action recognition without online accessible rgb-d cameras. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2617-2624. IEEE.
  18. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110.
  19. Maji, S. and Malik, J. (2009). Object detection using a max-margin hough transform. In International Conference on Computer Vision and Pattern Recognition.
  20. Peng, X., Wang, L., Wang, X., and Qiao, Y. (2014). Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. arXiv preprint arXiv:1405.4506.
  21. Song, Y., Liu, S., and Tang, J. (2015). Describing trajectory of surface patch for human action recognition on rgb and depth videos. Signal Processing Letters, IEEE, 22(4):426-429.
  22. Sun, J., Wu, X., Yan, S., Cheong, L., Chua, T., and Li, J. (2009). Hierarchical spatio-temporal context modeling for action recognition. In International Conference on Computer Vision and Pattern Recognition.
  23. Tenorth, M., Bandouch, J., and Beetz, M. (2009). The tum kitchen data set of everyday manipulation activities for motion tracking and action recognition. In International Conference on Computer Vision Workshops.
  24. Tian, Y., Sukthankar, R., and Shah, M. (2013). Spatiotemporal deformable part models for action detection. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE.
  25. Ullah, M. M., Parizi, S. N., and Laptev, I. (2010). Improving bag-of-features action recognition with non-local cues. In BMVC, volume 10, pages 95-1. Citeseer.
  26. Wang, H., Kläser, A., Schmid, C., and Liu, C.-L. (2011). Action Recognition by Dense Trajectories. In IEEE Conference on Computer Vision & Pattern Recognition, pages 3169-3176, Colorado Springs, United States.
  27. Wang, H., Kläser, A., Schmid, C., and Liu, C.-L. (2013). Dense trajectories and motion boundary descriptors for action recognition. International journal of computer vision, 103(1):60-79.
  28. Wang, H. and Schmid, C. (2013). Action recognition with improved trajectories. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 3551-3558. IEEE.
  29. Wang, J., Liu, Z., Wu, Y., and Yuan, J. (2012). Mining actionlet ensemble for action recognition with depth cameras. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1290-1297. IEEE.
  30. Wang, J., Nie, X., Xia, Y., Wu, Y., and Zhu, S.-C. (2014). Cross-view action modeling, learning, and recognition. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2649-2656. IEEE.
  31. Wohlhart, P., Schulter, S., Kostinger, M., Roth, P., and Bischof, H. (2012). Discriminative hough forests for object detection. In British Machine Vision Conference.
  32. Xia, L. and Aggarwal, J. (2013). Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 2834-2841. IEEE.
  33. Xiaohan Nie, B., Xiong, C., and Zhu, S.-C. (2015). Joint action recognition and pose estimation from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1293-1301.
  34. Yao, A., Gall, J., Fanelli, G., and Van Gool, L. (2011). Does human action recognition benefit from pose estimation? In British Machine Vision Conference.
  35. Yao, A., Gall, J., and Van Gool, L. (2010). A hough transform-based voting framework for action recognition. In International Conference on Computer Vision and Pattern Recognition.
  36. Zhang, Y. and Chen, T. (2010). Implicit shape kernel for discriminative learning of the hough transform detector. In British Machine Vision Conference.


Paper Citation


in Harvard Style

Vaquette G., Achard C. and Lucat L. (2016). Information Fusion for Action Recognition with Deeply Optimised Hough Transform Paradigm. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 423-430. DOI: 10.5220/0005725604230430


in Bibtex Style

@conference{visapp16,
author={Geoffrey Vaquette and Catherine Achard and Laurent Lucat},
title={Information Fusion for Action Recognition with Deeply Optimised Hough Transform Paradigm},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={423-430},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005725604230430},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Information Fusion for Action Recognition with Deeply Optimised Hough Transform Paradigm
SN - 978-989-758-175-5
AU - Vaquette G.
AU - Achard C.
AU - Lucat L.
PY - 2016
SP - 423
EP - 430
DO - 10.5220/0005725604230430