On-line Hand Gesture Recognition to Control Digital TV using a Boosted and Randomized Clustering Forest

Ken Yano, Takeshi Ogawa, Motoaki Kawanabe, Takayuki Suyama

2015

Abstract

Behavior recognition is an active topic in computer vision and its applications. Popular appearance-based behavior classification methods typically extract sparse, salient spatio-temporal features and then quantize them into visual words using a visual word dictionary. Visual word assignment based on K-means clustering is effective and performs well for general behavior classification. However, such pipelines demand high computational power in the low-level visual feature extraction and visual word assignment stages, which makes them unsuitable for real-time recognition tasks. To avoid the costly K-means and nearest-neighbor processing, an ensemble approach can be used instead. For real-time recognition, an ensemble of random trees seems particularly suitable as a visual dictionary owing to its simplicity, speed, and performance. In this paper, we focus on real-time recognition using a random clustering forest and verify its effectiveness by classifying various hand gestures. In addition, we propose a boosted random clustering forest that shortens training time with minimal negative impact on the recognition rate. As an application, we demonstrate real-time gesture recognition by controlling a digital TV with hand gestures.
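
The idea of using an ensemble of random trees as a visual dictionary can be illustrated with a minimal sketch. This is not the authors' implementation: the class `RandomTree`, the function `encode`, the descriptor dimension, and the purely random, unsupervised split rule are all assumptions made for illustration, and the boosting step mentioned in the abstract is not shown. Each leaf plays the role of one visual word, so assigning a descriptor costs a few threshold comparisons per tree instead of a nearest-neighbor search over K-means centers.

```python
import random


class RandomTree:
    """One randomly grown tree; each leaf acts as one visual word (illustrative)."""

    def __init__(self, descriptors, max_depth, rng):
        self.n_leaves = 0
        self.root = self._grow(descriptors, max_depth, rng)

    def _new_leaf(self):
        leaf = ("leaf", self.n_leaves)
        self.n_leaves += 1
        return leaf

    def _grow(self, points, depth, rng):
        if depth == 0 or len(points) < 2:
            return self._new_leaf()
        # Assumption: random dimension and random threshold, no label information.
        dim = rng.randrange(len(points[0]))
        lo = min(p[dim] for p in points)
        hi = max(p[dim] for p in points)
        if lo == hi:
            return self._new_leaf()
        thr = rng.uniform(lo, hi)
        left = [p for p in points if p[dim] < thr]
        right = [p for p in points if p[dim] >= thr]
        if not left or not right:
            return self._new_leaf()
        return (dim, thr,
                self._grow(left, depth - 1, rng),
                self._grow(right, depth - 1, rng))

    def leaf_index(self, x):
        """Drop a descriptor down the tree and return the index of its leaf."""
        node = self.root
        while node[0] != "leaf":
            dim, thr, left, right = node
            node = left if x[dim] < thr else right
        return node[1]


def encode(forest, descriptors):
    """Concatenate per-tree leaf-count histograms into one feature vector."""
    hist = []
    for tree in forest:
        counts = [0] * tree.n_leaves
        for d in descriptors:
            counts[tree.leaf_index(d)] += 1
        hist.extend(counts)
    return hist


# Synthetic 8-dimensional descriptors stand in for real spatio-temporal features.
rng = random.Random(0)
train = [[rng.random() for _ in range(8)] for _ in range(200)]
forest = [RandomTree(train, max_depth=4, rng=rng) for _ in range(5)]
clip = [[rng.random() for _ in range(8)] for _ in range(30)]
hist = encode(forest, clip)
```

The resulting `hist` vector would then be passed to a classifier of choice; every descriptor contributes exactly one count per tree, so the histogram always sums to the number of descriptors times the number of trees.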



Paper Citation


in Harvard Style

Yano K., Ogawa T., Kawanabe M. and Suyama T. (2015). On-line Hand Gesture Recognition to Control Digital TV using a Boosted and Randomized Clustering Forest. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-090-1, pages 220-227. DOI: 10.5220/0005263502200227


in Bibtex Style

@conference{visapp15,
author={Ken Yano and Takeshi Ogawa and Motoaki Kawanabe and Takayuki Suyama},
title={On-line Hand Gesture Recognition to Control Digital TV using a Boosted and Randomized Clustering Forest},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={220-227},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005263502200227},
isbn={978-989-758-090-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)
TI - On-line Hand Gesture Recognition to Control Digital TV using a Boosted and Randomized Clustering Forest
SN - 978-989-758-090-1
AU - Yano K.
AU - Ogawa T.
AU - Kawanabe M.
AU - Suyama T.
PY - 2015
SP - 220
EP - 227
DO - 10.5220/0005263502200227