Watch Where You’re Going! - Pedestrian Tracking Via Head Pose

Sankha S. Mukherjee, Rolf H. Baxter, Neil M. Robertson


In this paper we improve pedestrian tracking using robust, real-time human head-pose estimation in low-resolution RGB data, without any smoothing motion priors such as direction of motion. This paper presents four principal novelties. First, we train a deep convolutional neural network (CNN) for head-pose classification with data from various sources, ranging from high to low resolution. Second, this classification network is then fine-tuned on the continuous head-pose manifold for regression, using a subset of the data. Third, we attain state-of-the-art performance on public low-resolution surveillance datasets. Finally, we present improved tracking results using a Kalman-filter-based intentional tracker, which fuses instantaneous head-pose information into the motion model to improve tracking based on the predicted future location. Our implementation computes the head pose for a head image in 1.2 milliseconds on commercial hardware, making it real-time and highly scalable.
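The idea of fusing instantaneous head pose into a Kalman filter's motion model can be illustrated with a minimal sketch. This is not the authors' exact formulation (their adaptive motion model is developed in the paper itself); the state layout, noise settings, and the blending weight `alpha` below are illustrative assumptions. The prediction step rotates part of the pedestrian's speed toward the gaze direction given by the head-pose estimate:

```python
import numpy as np

class HeadPoseKalmanTracker:
    """Constant-velocity Kalman filter whose prediction is biased toward
    the instantaneous head-pose direction (illustrative sketch only)."""

    def __init__(self, x, y, dt=1.0, alpha=0.3):
        self.state = np.array([x, y, 0.0, 0.0])          # [x, y, vx, vy]
        self.P = np.eye(4) * 10.0                        # state covariance
        self.F = np.array([[1, 0, dt, 0],                # constant-velocity
                           [0, 1, 0, dt],                # transition model
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                 # we observe the
                           [0, 1, 0, 0]], dtype=float)   # position only
        self.Q = np.eye(4) * 0.01                        # process noise
        self.R = np.eye(2) * 1.0                         # measurement noise
        self.alpha = alpha                               # head-pose blend weight

    def predict(self, head_pose_rad=None):
        # Standard constant-velocity prediction.
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        if head_pose_rad is not None:
            # Fuse head pose: steer a fraction of the current speed
            # toward the direction the pedestrian is looking.
            speed = np.hypot(self.state[2], self.state[3])
            gaze = np.array([np.cos(head_pose_rad), np.sin(head_pose_rad)])
            self.state[2:] = ((1 - self.alpha) * self.state[2:]
                              + self.alpha * speed * gaze)
        return self.state[:2]

    def update(self, zx, zy):
        # Standard Kalman measurement update with an observed position.
        z = np.array([zx, zy])
        innovation = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Because the head-pose term enters only the velocity components of the predicted state, the filter degrades gracefully to plain constant-velocity tracking when no pose estimate is available for a frame (`head_pose_rad=None`).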



Paper Citation

in Harvard Style

Mukherjee S., Baxter R. and Robertson N. (2016). Watch Where You’re Going! - Pedestrian Tracking Via Head Pose. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2016), ISBN 978-989-758-175-5, pages 573-579. DOI: 10.5220/0005786905730579

in Bibtex Style

@conference{visapp16,
author={Sankha S. Mukherjee and Rolf H. Baxter and Neil M. Robertson},
title={Watch Where You’re Going! - Pedestrian Tracking Via Head Pose},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={573-579},
doi={10.5220/0005786905730579},
isbn={978-989-758-175-5},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2016)
TI - Watch Where You’re Going! - Pedestrian Tracking Via Head Pose
SN - 978-989-758-175-5
AU - Mukherjee S.
AU - Baxter R.
AU - Robertson N.
PY - 2016
SP - 573
EP - 579
DO - 10.5220/0005786905730579
ER -