Adding Model Constraints to CNN for Top View Hand Pose Recognition in Range Images

Aditya Tewari, Frederic Grandidier, Bertram Taetz, Didier Stricker

2016

Abstract

We introduce a new dataset for hand pose recognition. It consists of top-view images of the palm captured with a Time of Flight (ToF) camera, recorded in an experimental setting with twelve participants performing six hand poses. We evaluate the dataset with a dedicated Convolutional Neural Network (CNN) architecture for Hand Pose Recognition (HPR). The architecture contains a model layer: a small layer that gives the network a funnel shape, adds a priori knowledge, and constrains the network by modelling the degrees of freedom of the palm, so that it learns palm features. We demonstrate that this network performs better than a similar network without the added prior. We also describe a two-phase learning scheme that allows the model to be trained on the full dataset even when the classification problem is confined to a subset of the classes. The best model achieves an accuracy of 92%. Finally, we show the feature transfer capability of the network, compare the features extracted by various networks, and discuss their usefulness for various applications.
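The funnel shape and the two-phase idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer sizes or training details, so the class `FunnelNet`, the method `start_phase_two`, the feature size 256, the model-layer size 20, and the three-class subset are all hypothetical. Only the six poses come from the abstract. The convolutional stages are left out and replaced by a plain feature vector, and phase two is assumed to freeze the shared layers and attach a fresh classifier for the class subset.

```python
import random


def dense(n_in, n_out, rng):
    """Weight matrix with n_out rows of n_in small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply(weights, x, relu=True):
    """Fully connected layer without bias, optionally followed by ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out


class FunnelNet:
    """Wide feature vector -> narrow model layer -> class scores (hypothetical)."""

    def __init__(self, n_features, n_model, n_classes, seed=0):
        self.rng = random.Random(seed)
        self.n_model = n_model
        self.to_model = dense(n_features, n_model, self.rng)  # funnel bottleneck
        self.head = dense(n_model, n_classes, self.rng)       # classifier
        self.trainable = {"to_model", "head"}                 # phase 1: train everything

    def model_layer(self, x):
        """Low-dimensional output of the bottleneck (the 'model layer')."""
        return apply(self.to_model, x)

    def scores(self, x):
        """Unnormalised class scores computed from the model layer."""
        return apply(self.head, self.model_layer(x), relu=False)

    def start_phase_two(self, n_subset_classes):
        """Assumed phase 2: freeze shared layers, attach a fresh head for the subset."""
        self.trainable = {"head"}
        self.head = dense(self.n_model, n_subset_classes, self.rng)


# Hypothetical sizes; only the six poses are taken from the abstract.
net = FunnelNet(n_features=256, n_model=20, n_classes=6)
rng = random.Random(1)
x = [rng.random() for _ in range(256)]

before = net.model_layer(x)
net.start_phase_two(n_subset_classes=3)
after = net.model_layer(x)
print(len(before), len(net.scores(x)), before == after)
```

In this sketch the bottleneck weights are shared by both phases, so the features learned on all classes remain available when the classifier is restricted to a subset.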



Paper Citation


in Harvard Style

Tewari A., Grandidier F., Taetz B. and Stricker D. (2016). Adding Model Constraints to CNN for Top View Hand Pose Recognition in Range Images. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 170-177. DOI: 10.5220/0005660301700177


in Bibtex Style

@conference{icpram16,
author={Aditya Tewari and Frederic Grandidier and Bertram Taetz and Didier Stricker},
title={Adding Model Constraints to CNN for Top View Hand Pose Recognition in Range Images},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={170-177},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005660301700177},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Adding Model Constraints to CNN for Top View Hand Pose Recognition in Range Images
SN - 978-989-758-173-1
AU - Tewari A.
AU - Grandidier F.
AU - Taetz B.
AU - Stricker D.
PY - 2016
SP - 170
EP - 177
DO - 10.5220/0005660301700177