Behavior Recognition in Mouse Videos using Contextual Features Encoded by Spatial-temporal Stacked Fisher Vectors

Zheheng Jiang, Danny Crookes, Brian Desmond Green, Shengping Zhang, Huiyu Zhou

Abstract

Manual measurement of mouse behavior is highly labor intensive and prone to error. This investigation aims to recognize individual mouse behaviors efficiently and accurately in both action videos and continuous videos. In our system, each mouse action video is represented as a collection of spatio-temporal interest points. We extract both appearance and contextual features from the interest points collected from the training datasets, and then learn two Gaussian Mixture Model (GMM) dictionaries, one for the visual features and one for the contextual features. The two GMM dictionaries are leveraged by our spatial-temporal stacked Fisher Vector (FV) to represent each mouse action video. A neural network is used to classify mouse actions and is finally applied to annotate continuous video. The novelty of our proposed approach is threefold: (i) our method exploits contextual features from spatio-temporal interest points, leading to enhanced performance, (ii) we encode contextual features and then fuse them with appearance features, and (iii) location information of a mouse is extracted from spatio-temporal interest points to support mouse behavior recognition. We evaluate our method against the database of Jhuang et al. (2010) and the results show that our method outperforms several state-of-the-art approaches.
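The FV encoding step at the heart of the pipeline can be sketched as follows. This is a generic improved Fisher Vector (gradients with respect to GMM means and variances, with power and L2 normalization, as in Perronnin et al., 2010), not the authors' code; the descriptor dimensionality, number of Gaussians, and random data below are purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Improved Fisher Vector of local descriptors under a diagonal-covariance GMM."""
    X = np.atleast_2d(descriptors)                            # (N, D) local descriptors
    N, _ = X.shape
    q = gmm.predict_proba(X)                                  # (N, K) soft assignments
    pi, mu, var = gmm.weights_, gmm.means_, gmm.covariances_  # diag covariances: (K, D)
    # Normalized deviations of each descriptor from each Gaussian center
    diff = (X[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]   # (N, K, D)
    # Gradients w.r.t. means and variances, averaged over descriptors
    g_mu = (q[:, :, None] * diff).sum(0) / (N * np.sqrt(pi)[:, None])
    g_var = (q[:, :, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * pi)[:, None])
    fv = np.hstack([g_mu.ravel(), g_var.ravel()])             # length 2 * K * D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalization
    return fv / max(np.linalg.norm(fv), 1e-12)                # L2 normalization

# Usage with synthetic descriptors (D = 16, K = 4 Gaussians -> FV of length 128)
rng = np.random.default_rng(0)
train_descriptors = rng.standard_normal((500, 16))
gmm = GaussianMixture(n_components=4, covariance_type='diag',
                      random_state=0).fit(train_descriptors)
fv = fisher_vector(rng.standard_normal((40, 16)), gmm)
print(fv.shape)  # (128,)
```

In the stacked variant described in the abstract, such FVs would be computed over spatio-temporal sub-volumes of the video and encoded again at a second layer; the sketch above covers only the single-layer encoding.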

References

  1. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
  2. Burgos-Artizzu, X. P., Dollár, P., Lin, D., Anderson, D. J., and Perona, P. (2012, June). Social behavior recognition in continuous video. IEEE Conference on Computer Vision and Pattern Recognition.
  3. Chatfield, K., Lempitsky, V. S., Vedaldi, A., and Zisserman, A. (2011, September). The devil is in the details: an evaluation of recent feature encoding methods. British Machine Vision Conference (Vol. 2, No. 4, p. 8).
  4. Dankert, H., Wang, L., Hoopfer, E. D., Anderson, D. J., and Perona, P. (2009). Automated monitoring and analysis of social behavior in Drosophila. Nature Methods, 6(4), 297-303.
  5. Dollár, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005, October). Behavior recognition via sparse spatiotemporal features. 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.
  6. Jaakkola, T. S., and Haussler, D. (1999). Exploiting generative models in discriminative classifiers. Advances in Neural Information Processing Systems, 487-493.
  7. Jhuang, H., Garrote, E., Yu, X., Khilnani, V., Poggio, T., Steele, A. D., and Serre, T. (2010). Automated homecage behavioural phenotyping of mice. Nature Communications, 1, 68.
  8. Jhuang, H., Serre, T., Wolf, L., and Poggio, T. (2007, October). A biologically inspired system for action recognition. IEEE 11th International Conference on Computer Vision (pp. 1-8).
  9. Laptev, I. (2005). On space-time interest points. International Journal of Computer Vision, 64(2-3), 107-123.
  10. Roughan, J. V., Wright-Williams, S. L., and Flecknell, P. A. (2009). Automated analysis of postoperative behaviour: assessment of HomeCageScan as a novel method to rapidly identify pain and analgesic effects in mice. Laboratory Animals, 43(1), 17-26.
  11. Rousseau, J. B. I., Van Lochem, P. B. A., Gispen, W. H., and Spruijt, B. M. (2000). Classification of rat behavior with an image-processing method and a neural network. Behavior Research Methods, Instruments, and Computers, 32(1), 63-71.
  12. Sánchez, J., Perronnin, F., Mensink, T., and Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105(3), 222-245.
  13. Simonyan, K., Vedaldi, A., and Zisserman, A. (2013). Deep Fisher networks for large-scale image classification. Advances in Neural Information Processing Systems (pp. 163-171).
  14. Steele, A. D., Jackson, W. S., King, O. D., and Lindquist, S. (2007). The power of automated high-resolution behavior analysis revealed by its application to mouse models of Huntington's and prion diseases. Proceedings of the National Academy of Sciences, 104(6), 1983-1988.
  15. Perronnin, F., and Larlus, D. (2015). Fisher vectors meet neural networks: A hybrid classification architecture. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3743-3752).
  16. Perronnin, F., Sánchez, J., and Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. European Conference on Computer Vision (pp. 143-156). Springer Berlin Heidelberg.
  17. Peng, X., Wang, L., Wang, X., and Qiao, Y. (2014). Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. arXiv preprint arXiv:1405.4506.
  18. Peng, X., Wang, L., Qiao, Y., and Peng, Q. (2014, August). A joint evaluation of dictionary learning and feature encoding for action recognition. 22nd International Conference on Pattern Recognition (pp. 2607-2612).
  19. Wang, H., Kläser, A., Schmid, C., and Liu, C. L. (2013). Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision, 103(1), 60-79.
  20. Wang, H., and Schmid, C. (2013). Action recognition with improved trajectories. Proceedings of the IEEE International Conference on Computer Vision (pp. 3551-3558).
  21. Wang, H., Ullah, M. M., Kläser, A., Laptev, I., and Schmid, C. (2009). Evaluation of local spatio-temporal features for action recognition. British Machine Vision Conference (pp. 124.1-124.11).
  22. Wang, H., Oneata, D., Verbeek, J., and Schmid, C. (2015). A robust and efficient video representation for action recognition. International Journal of Computer Vision, 1-20.
  23. Willems, G., Tuytelaars, T., and Van Gool, L. (2008). An efficient dense and scale-invariant spatio-temporal interest point detector. European Conference on Computer Vision (pp. 650-663). Springer Berlin Heidelberg.


Paper Citation


in Harvard Style

Jiang Z., Crookes D., Green B., Zhang S. and Zhou H. (2017). Behavior Recognition in Mouse Videos using Contextual Features Encoded by Spatial-temporal Stacked Fisher Vectors. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-222-6, pages 259-269. DOI: 10.5220/0006244602590269


in Bibtex Style

@conference{icpram17,
author={Zheheng Jiang and Danny Crookes and Brian Desmond Green and Shengping Zhang and Huiyu Zhou},
title={Behavior Recognition in Mouse Videos using Contextual Features Encoded by Spatial-temporal Stacked Fisher Vectors},
booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2017},
pages={259-269},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006244602590269},
isbn={978-989-758-222-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Behavior Recognition in Mouse Videos using Contextual Features Encoded by Spatial-temporal Stacked Fisher Vectors
SN - 978-989-758-222-6
AU - Jiang Z.
AU - Crookes D.
AU - Green B.
AU - Zhang S.
AU - Zhou H.
PY - 2017
SP - 259
EP - 269
DO - 10.5220/0006244602590269