Upper Body Detection and Feature Set Evaluation for Body Pose Classification

Laurent Fitte-Duval, Alhayat Ali Mekonnen, Frédéric Lerasle

2015

Abstract

This work investigates visual functionalities required in Human-Robot Interaction (HRI) to evaluate the intention of a person to interact with another agent (robot or human). Analyzing the upper part of the human body, which includes the head and the shoulders, provides essential cues about the person’s intention. We propose a fast and efficient upper body detector and an approach to estimate the upper body pose in 2D images. The upper body detector, derived from a state-of-the-art pedestrian detector, identifies people using Aggregated Channel Features (ACF) and fast feature pyramids, whereas the upper body pose classifier uses a sparse representation technique to recognize their shoulder orientation. The proposed detector exhibits state-of-the-art results on a public dataset in terms of both detection performance and frame rate. We also present an evaluation of different feature set combinations for pose classification using upper body images and report promising results despite the associated challenges.
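
As an illustration of the sparse representation idea mentioned in the abstract, the following is a minimal sketch of sparse-representation classification and not the authors' implementation. It assumes each training image has already been turned into a feature vector (a column of `D`) with a pose label, and it swaps in a simple greedy orthogonal matching pursuit for the sparse coding step instead of an l1 solver. The names `omp`, `src_classify`, the pose labels, and the toy data are all hypothetical.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit: approximate y using few columns of D."""
    residual = y.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < 1e-10:
            break  # y is already reconstructed
        # Pick the unused atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Least-squares refit of y on all atoms selected so far.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def src_classify(D, labels, y, n_nonzero=3):
    """Assign y to the class whose training atoms reconstruct it best."""
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    coef = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    # Per-class residual: keep only the coefficients of that class.
    residuals = [
        np.linalg.norm(y - D[:, labels == c] @ coef[labels == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]


# Toy dictionary (assumed data): 4-D features, two atoms per pose class.
D = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.0, 0.1],
])
labels = np.array(["front", "front", "side", "side"])

query = np.array([0.95, 0.05, 0.0, 0.0])
predicted = src_classify(D, labels, query)
print(predicted)
```

The decision rule compares class-wise reconstruction residuals, so the query is assigned to the pose whose training samples explain it best. In practice the columns of `D` would be feature vectors extracted from upper body images.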


Paper Citation


in Harvard Style

Fitte-Duval L., Mekonnen A. and Lerasle F. (2015). Upper Body Detection and Feature Set Evaluation for Body Pose Classification. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-090-1, pages 439-446. DOI: 10.5220/0005313104390446


in Bibtex Style

@conference{visapp15,
author={Laurent Fitte-Duval and Alhayat Ali Mekonnen and Frédéric Lerasle},
title={Upper Body Detection and Feature Set Evaluation for Body Pose Classification},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={439-446},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005313104390446},
isbn={978-989-758-090-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)
TI - Upper Body Detection and Feature Set Evaluation for Body Pose Classification
SN - 978-989-758-090-1
AU - Fitte-Duval L.
AU - Mekonnen A.
AU - Lerasle F.
PY - 2015
SP - 439
EP - 446
DO - 10.5220/0005313104390446