Fang, H., Xie, S., Tai, Y., Lu, C., 2017. RMPE: Regional 
Multi-person Pose Estimation. IEEE International 
Conference on Computer Vision, 2353-2362. 
Feichtenhofer, C., Pinz, A., Zisserman, A., 2016. 
Convolutional Two-Stream Network Fusion for Video 
Action Recognition. IEEE Conference on Computer 
Vision and Pattern Recognition, 1933-1941. 
He, K., Gkioxari, G., Dollár, P., Girshick, R. B., 2017. 
Mask R-CNN. IEEE International Conference on 
Computer Vision, 2980-2988. 
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep Residual 
Learning for Image Recognition. IEEE Conference on 
Computer Vision and Pattern Recognition, 770-778. 
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, 
K., 2015. Spatial Transformer Networks. Conference 
on Neural Information Processing Systems, 2017-2025. 
Redmon, J., Farhadi, A., 2018. YOLOv3: An Incremental 
Improvement. arXiv preprint arXiv:1804.02767. 
Kim, T. S., Reiter, A., 2017. Interpretable 3D Human Action 
Analysis with Temporal Convolutional Networks. 
Computer Vision and Pattern Recognition Workshops, 
1623-1631. 
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T., 
2011. HMDB: A Large Video Database for Human 
Motion Recognition. IEEE International Conference 
on Computer Vision, 2556-2563. 
Li, L., Zheng, W., Zhang, Z., Huang, Y., Wang, L., 2018. 
Skeleton-Based Relational Modeling for Action 
Recognition. arXiv preprint arXiv:1805.02556. 
Liu, J., Shahroudy, A., Xu, D., Wang, G., 2016. Spatio-
Temporal LSTM with Trust Gates for 3D Human 
Action Recognition. European Conference on 
Computer Vision, 816-833. 
Paszke, A., Gross, S., Massa, F. et al., 2019. PyTorch: An 
Imperative Style, High-Performance Deep Learning 
Library. Advances in Neural Information Processing 
Systems, 8024-8035. 
Qiu, Z., Yao, T., Mei, T., 2017. Learning Spatio-Temporal 
Representation with Pseudo-3D Residual Networks. 
IEEE International Conference on Computer Vision, 
5534-5542. 
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., 
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, 
M. S., Berg, A. C., Fei-Fei, L., 2015. ImageNet Large Scale 
Visual Recognition Challenge. International Journal of 
Computer Vision, 211-252. 
Shahroudy, A., Liu, J., Ng, T., Wang, G., 2016. NTU 
RGB+D: A Large Scale Dataset for 3D Human Activity 
Analysis. IEEE Conference on Computer Vision and 
Pattern Recognition, 1010-1019. 
Shi, L., Zhang, Y., Cheng, J., Lu, H., 2018. Non-Local 
Graph Convolutional Networks for Skeleton-Based 
Action Recognition. arXiv preprint arXiv:1805.07694. 
Shi, L., Zhang, Y., Cheng, J., Lu, H., 2019. Skeleton-Based 
Action Recognition with Directed Graph Neural 
Networks. IEEE Conference on Computer Vision and 
Pattern Recognition, 7912-7921. 
Simonyan, K., Zisserman, A., 2014. Two-Stream 
Convolutional Networks for Action Recognition in 
Videos. Conference on Neural Information 
Processing Systems, 568-576. 
Song, S., Lan, C., Xing, J., Zeng, W., Liu, J., 2017. An End-
to-End Spatio-Temporal Attention Model for Human 
Action Recognition from Skeleton Data. AAAI 
Conference on Artificial Intelligence, 4263-4270. 
Soomro, K., Zamir, A. R., Shah, M., 2012. UCF101: A 
Dataset of 101 Human Actions Classes from Videos in 
The Wild. arXiv preprint arXiv:1212.0402. 
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., 
Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, 
A., 2015. Going Deeper with Convolutions. IEEE 
Conference on Computer Vision and Pattern 
Recognition, 1-9. 
Tang, Y., Tian, Y., Lu, J., Li, P., Zhou, J., 2018. Deep 
Progressive Reinforcement Learning for Skeleton-
Based Action Recognition. IEEE Conference on 
Computer Vision and Pattern Recognition, 5323-5332. 
Tran, D., Bourdev, L. D., Fergus, R., Torresani, L., Paluri, 
M., 2015. Learning Spatiotemporal Features with 3D 
Convolutional Networks. IEEE International 
Conference on Computer Vision, 4489-4497. 
Tran, D., Wang, H., Torresani, L., Feiszli, M., 2019. Video 
Classification with Channel-Separated Convolutional 
Networks. arXiv preprint arXiv:1904.02811. 
Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., 
Paluri, M., 2018. A Closer Look at Spatiotemporal 
Convolutions for Action Recognition. IEEE 
Conference on Computer Vision and Pattern 
Recognition, 6450-6459. 
Wang, H., Schmid, C., 2013. Action Recognition with 
Improved Trajectories. IEEE International Conference 
on Computer Vision, 3551-3558. 
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., 
Van Gool, L., 2016. Temporal Segment Networks: 
Towards Good Practices for Deep Action Recognition. 
European Conference on Computer Vision, 20-36. 
Xie, S., Girshick, R. B., Dollár, P., Tu, Z., He, K., 2017. 
Aggregated Residual Transformations for Deep Neural 
Networks. IEEE Conference on Computer Vision and 
Pattern Recognition, 5987-5995. 
Xie, S., Sun, C., Huang, J., Tu, Z., Murphy, K., 2018. 
Rethinking Spatiotemporal Feature Learning for Video 
Understanding.  European Conference on Computer 
Vision, 318-335. 
Yan, S., Xiong, Y., Lin, D., 2018. Spatial Temporal Graph 
Convolutional Networks for Skeleton-Based Action 
Recognition. AAAI Conference on Artificial 
Intelligence, 7444-7452.