Knowing What You Don’t Know - Novelty Detection for Action Recognition in Personal Robots

Thomas Moerland, Aswin Chandarr, Maja Rudinac, Pieter Jonker

2016

Abstract

Novelty detection is essential for personal robots to continuously learn and adapt in open environments. This paper studies novelty detection in the context of action recognition. To detect unknown (novel) human action sequences, we propose a new method called background models, which is applicable to any generative classifier. Our closed-set action recognition system combines a new skeleton-based feature with a Hidden Markov Model (HMM)-based generative classifier, a combination that has previously shown good results in action recognition. Novelty detection is then approached from both a posterior likelihood and a hypothesis testing view, which we unify as background models. We investigate a diverse set of background models: the sum over competing models, filler models, flat models, anti-models, and several reweighted combinations. Our standard recognition system achieves an inter-subject recognition accuracy of 96% on the Microsoft Research Action 3D dataset. Moreover, the novelty detection module combining anti-models with flat models achieves 78% accuracy in novelty detection while maintaining 78% standard recognition accuracy. Our methodology can make any current HMM-based action recognition system more robust in open environments, and is a first step towards an incrementally learning system.
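
Below is a minimal sketch, in Python, of the background-model idea described in the abstract; it is an illustration under stated assumptions, not the authors' implementation. It assumes at least two known action classes, each with a trained generative scorer returning log P(sequence | class model), for instance the log-likelihood of a trained HMM; the names classify_with_novelty, logsumexp, flat_loglik, and threshold are all hypothetical. A test sequence is assigned to the best-scoring class model and rejected as novel when that score fails to clearly beat a background score, here the log-mean over the competing class models (the "sum over competing models" variant), optionally combined with a flat model.

    import numpy as np

    def logsumexp(values):
        # Numerically stable log(sum(exp(v) for v in values)).
        m = max(values)
        return m + np.log(sum(np.exp(v - m) for v in values))

    def classify_with_novelty(sequence, class_loglik, threshold, flat_loglik=None):
        # class_loglik: dict mapping each known action label to a callable
        # returning log P(sequence | class model), e.g. a trained HMM's
        # log-likelihood. Assumes at least two known classes.
        # flat_loglik (optional): callable for a flat background model that
        # assigns the same probability regardless of the observations.
        scores = {label: f(sequence) for label, f in class_loglik.items()}
        best = max(scores, key=scores.get)

        # Background score: log of the mean likelihood over the competing
        # (non-best) class models -- the "sum over competing models" variant.
        competitors = [s for label, s in scores.items() if label != best]
        background = logsumexp(competitors) - np.log(len(competitors))

        # Optionally let a flat model take over when all class models score
        # poorly, loosely mirroring the paper's combined background models.
        if flat_loglik is not None:
            background = max(background, flat_loglik(sequence))

        # Log-likelihood-ratio test: accept the best known class only when it
        # clearly beats the background hypothesis; otherwise flag as novel.
        ratio = scores[best] - background
        return ("novel", ratio) if ratio < threshold else (best, ratio)

In practice the per-class scorers could come from an HMM library (e.g. the score method of a trained hmmlearn GaussianHMM returns such a log-likelihood), and threshold would be tuned on held-out known and unknown actions to trade novelty detection accuracy against closed-set recognition accuracy, which is the trade-off behind the 78%/78% operating point reported above.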

Paper Citation


in Harvard Style

Moerland T., Chandarr A., Rudinac M. and Jonker P. (2016). Knowing What You Don’t Know - Novelty Detection for Action Recognition in Personal Robots. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 317-327. DOI: 10.5220/0005677903170327


in Bibtex Style

@conference{visapp16,
author={Thomas Moerland and Aswin Chandarr and Maja Rudinac and Pieter Jonker},
title={Knowing What You Don’t Know - Novelty Detection for Action Recognition in Personal Robots},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={317-327},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005677903170327},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Knowing What You Don’t Know - Novelty Detection for Action Recognition in Personal Robots
SN - 978-989-758-175-5
AU - Moerland T.
AU - Chandarr A.
AU - Rudinac M.
AU - Jonker P.
PY - 2016
SP - 317
EP - 327
DO - 10.5220/0005677903170327
ER -