Recognizing Human Actions based on Extreme Learning Machines

Grégoire Lefebvre, Julien Cumin

Abstract

In this paper, we tackle the challenge of action recognition by building robust models with Extreme Learning Machines (ELM). Applied to reduced, preprocessed feature vectors from the Microsoft Research Cambridge-12 (MSRC-12) Kinect gesture dataset, this approach outperforms state-of-the-art results, with an average correct classification rate of 0.953 over 20 runs when the 6,244 action instances are split into two equal subsets for training and testing. This ELM-based proposal, which uses a multi-quadric radial basis activation function, is compared with other classical classification strategies such as Support Vector Machines (SVM) and Multi-Layer Perceptron (MLP), and improvements in execution time are also reported.
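
To make the ELM idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual ELM recipe: hidden-node parameters are drawn at random and left untrained, and only the output weights are solved by regularized least squares. The hidden nodes use the multiquadric function sqrt(||x - a||^2 + b^2). The class name `ELM`, the parameters `n_hidden` and `reg`, the sampling ranges, and the toy two-cluster data are all illustrative assumptions; the paper's features, hyperparameters, and dataset are not reproduced here.

```python
import numpy as np

def multiquadric(X, centers, widths):
    """Hidden-layer outputs sqrt(||x - a||^2 + b^2) for every sample/node pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2 + widths ** 2)

class ELM:
    """Illustrative ELM classifier (hypothetical name and defaults)."""

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n_classes = int(y.max()) + 1
        # Random hidden-node parameters; they are never trained (assumed ranges).
        self.centers = rng.uniform(X.min(axis=0), X.max(axis=0),
                                   size=(self.n_hidden, X.shape[1]))
        self.widths = rng.uniform(0.1, 1.0, size=self.n_hidden)
        H = multiquadric(X, self.centers, self.widths)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights from ridge-regularized least squares.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        H = multiquadric(X, self.centers, self.widths)
        return (H @ self.beta).argmax(axis=1)

# Toy data: two well-separated 2-D clusters, split into two equal halves.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 0.5, (100, 2)), rng.normal(2.0, 0.5, (100, 2))])
y = np.repeat([0, 1], 100)
idx = rng.permutation(200)
train, test = idx[:100], idx[100:]

model = ELM().fit(X[train], y[train])
acc = (model.predict(X[test]) == y[test]).mean()
print(f"held-out accuracy on toy data: {acc:.3f}")
```

Because training reduces to one linear solve, fitting is fast compared with iterative gradient-based training, which is the kind of execution-time advantage the abstract refers to. The equal train/test split here only mirrors the protocol described above on synthetic data.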


Paper Citation


in Harvard Style

Lefebvre G. and Cumin J. (2016). Recognizing Human Actions based on Extreme Learning Machines. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016), ISBN 978-989-758-175-5, pages 478-483. DOI: 10.5220/0005675004780483


in Bibtex Style

@conference{visapp16,
author={Grégoire Lefebvre and Julien Cumin},
title={Recognizing Human Actions based on Extreme Learning Machines},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={478-483},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005675004780483},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Recognizing Human Actions based on Extreme Learning Machines
SN - 978-989-758-175-5
AU - Lefebvre G.
AU - Cumin J.
PY - 2016
SP - 478
EP - 483
DO - 10.5220/0005675004780483