Graph-based Kernel Representation of Videos for Traditional Dance Recognition

Christina Chrysouli, Vasileios Gavriilidis, Anastasios Tefas

2014

Abstract

In this paper, we propose a novel graph-based kernel method that uses similarity measures to construct histograms for a bag-of-words approach to activity recognition. Bag of words is the most popular framework for classifying video data. It represents a video, however, as an orderless collection of features. We propose a better way to encode action in videos by altering the histograms. These histograms are created with kernel methods inspired by graph theory and are computed at little additional computational cost. Moreover, constructing the histograms with the proposed algorithm yields a richer representation of videos. We conducted experiments on folk dance recognition, comparing histograms extracted from a typical bag-of-words framework against histograms of the proposed method; the proposed method provided promising results on this challenging task.
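
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: start from an ordinary bag-of-words histogram and alter it using similarities between codewords, treated as a graph. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the RBF edge weights with bandwidth `sigma`, the fully connected codeword graph, and the random-walk propagation with `steps` steps.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard-assignment bag-of-words histogram, L1-normalised."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the closest codeword per descriptor
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def codeword_transition_matrix(codebook, sigma=1.0):
    """Row-stochastic random-walk matrix over a codeword similarity graph."""
    d2 = ((codebook[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))  # assumed RBF edge weights
    return weights / weights.sum(axis=1, keepdims=True)

def graph_smoothed_histogram(hist, transition, steps=1):
    """Spread histogram mass along graph edges for `steps` random-walk steps."""
    return hist @ np.linalg.matrix_power(transition, steps)

# Toy data standing in for local video descriptors and a learned codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 3))
descriptors = rng.normal(size=(40, 3))

h = bow_histogram(descriptors, codebook)
P = codeword_transition_matrix(codebook)
h_graph = graph_smoothed_histogram(h, P, steps=2)
```

Because the assumed transition matrix is row-stochastic, the altered histogram still sums to one, and the extra cost is a single small matrix product per video, in the spirit of the low overhead the abstract mentions. The actual kernel and graph construction used by the authors may differ.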

Paper Citation


in Harvard Style

Chrysouli C., Gavriilidis V. and Tefas A. (2014). Graph-based Kernel Representation of Videos for Traditional Dance Recognition. In Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014) ISBN 978-989-758-054-3, pages 195-202. DOI: 10.5220/0005076101950202


in Bibtex Style

@conference{ncta14,
author={Christina Chrysouli and Vasileios Gavriilidis and Anastasios Tefas},
title={Graph-based Kernel Representation of Videos for Traditional Dance Recognition},
booktitle={Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014)},
year={2014},
pages={195-202},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005076101950202},
isbn={978-989-758-054-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014)
TI - Graph-based Kernel Representation of Videos for Traditional Dance Recognition
SN - 978-989-758-054-3
AU - Chrysouli C.
AU - Gavriilidis V.
AU - Tefas A.
PY - 2014
SP - 195
EP - 202
DO - 10.5220/0005076101950202