
models and advanced data augmentation techniques
to further enhance performance.
ACKNOWLEDGEMENTS
This research has been supported by the ESPOL
project “Reconocimiento de patrones en im
´
agenes us-
ando t
´
ecnicas basadas en aprendizaje”.
REFERENCES
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov,
A., and Zagoruyko, S. (2020). End-to-end object de-
tection with transformers.
Carreira, J. and Zisserman, A. (2017). Quo vadis, action
recognition? a new model and the kinetics dataset.
In proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 6299–6308.
Cioppa, A., Deli
`
ege, A., Giancola, S., Ghanem, B.,
Droogenbroeck, M. V., Gade, R., and Moeslund, T. B.
(2020). A context-aware loss function for action spot-
ting in soccer videos.
Feichtenhofer, C., Fan, H., Malik, J., and He, K. (2019).
Slowfast networks for video recognition. In Proceed-
ings of the IEEE/CVF international conference on
computer vision, pages 6202–6211.
Gao, Y., Lu, J., Li, S., Li, Y., and Du, S.
(2024). Hypergraph-based multi-view action recog-
nition using event cameras. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
46(10):6610–6622.
Giancola, S., Amine, M., Dghaily, T., and Ghanem, B.
(2018). Soccernet: A scalable dataset for action spot-
ting in soccer videos. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion workshops, pages 1711–1721.
Gu, A. and Dao, T. (2024). Mamba: Linear-time sequence
modeling with selective state spaces.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Held, J., Cioppa, A., Giancola, S., Hamdi, A., Ghanem, B.,
and Van Droogenbroeck, M. (2023). Vars: Video as-
sistant referee system for automated soccer decision
making from multiple views. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition, pages 5086–5097.
Hu, Y., Zeng, Z., Yin, L., Wei, X., Zhou, X., and Huang,
T. S. (2008). Multi-view facial expression recognition.
In 2008 8th IEEE International Conference on Auto-
matic Face & Gesture Recognition, pages 1–6. IEEE.
Iosifidis, A., Tefas, A., and Pitas, I. (2013). Multi-view hu-
man action recognition: A survey. In 2013 Ninth in-
ternational conference on intelligent information hid-
ing and multimedia signal processing, pages 522–525.
IEEE.
Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C.,
Vijayanarasimhan, S., Viola, F., Green, T., Back, T.,
Natsev, P., et al. (2017). The kinetics human action
video dataset. arXiv preprint arXiv:1705.06950.
Li, Y., Wu, C.-Y., Fan, H., Mangalam, K., Xiong, B., Ma-
lik, J., and Feichtenhofer, C. (2022). Mvitv2: Im-
proved multiscale vision transformers for classifica-
tion and detection. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recogni-
tion, pages 4804–4814.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll
´
ar, P.
(2017). Focal loss for dense object detection. In
Proceedings of the IEEE international conference on
computer vision, pages 2980–2988.
Loshchilov, I. and Hutter, F. (2017). Decoupled weight de-
cay regularization. arXiv preprint arXiv:1711.05101.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E., De-
Vito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
Pytorch: An imperative style, high-performance deep
learning library. Advances in Neural Information Pro-
cessing Systems, 32.
Putra, P. U., Shima, K., and Shimatani, K. (2022). A deep
neural network model for multi-view human activity
recognition. PloS one, 17(1):e0262181.
Riba, E., Mishkin, D., Ponsa, D., Rublee, E., and Brad-
ski, G. (2020). Kornia: an open source differentiable
computer vision library for pytorch. In Proceedings of
the IEEE/CVF Winter Conference on Applications of
Computer Vision, pages 3674–3683.
Shah, K., Shah, A., Lau, C. P., de Melo, C. M., and Chel-
lappa, R. (2023). Multi-view action recognition using
contrastive learning. In Proceedings of the ieee/cvf
winter conference on applications of computer vision,
pages 3381–3391.
Smith, L. N. and Topin, N. (2019). Super-convergence:
Very fast training of neural networks using large learn-
ing rates. In Artificial intelligence and machine learn-
ing for multi-domain operations applications, volume
11006, pages 369–386. SPIE.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wo-
jna, Z. (2016). Rethinking the inception architecture
for computer vision. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2818–2826.
Thomas, G., Gade, R., Moeslund, T. B., Carr, P., and Hilton,
A. (2017). Computer vision for sports: Current ap-
plications and research topics. Computer Vision and
Image Understanding, 159:3–18.
Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., and
Paluri, M. (2018). A closer look at spatiotemporal
convolutions for action recognition. In Proceedings of
the IEEE conference on Computer Vision and Pattern
Recognition, pages 6450–6459.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2023). Attention is all you need.
icSPORTS 2025 - 13th International Conference on Sport Sciences Research and Technology Support
68