F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning

Authors: Mohammad Al-Saad, Lakshmish Ramaswamy and Suchendra Bhandarkar

Affiliation: School of Computing, The University of Georgia, Athens, GA, U.S.A.

Keyword(s): Video-Level Action Recognition, Factorized Convolutional Neural Network, Temporal Attention, Spatio-Temporal Attention, Channel Attention, 3D CNN, 4D CNN.

Abstract: Recent studies have shown that video-level representation learning is crucial to capturing and understanding the long-range temporal structure needed for video action recognition. Most existing 3D convolutional neural network (CNN)-based methods for video-level representation learning are clip-based and focus only on short-term motion and appearance. These CNN-based methods lack the capacity to incorporate and model the long-range spatiotemporal representation of the underlying video and ignore long-range video-level context during training. In this study, we propose a factorized 4D CNN architecture with attention (F4D) that is capable of learning more effective, finer-grained, long-term spatiotemporal video representations. We demonstrate that the proposed F4D architecture yields significant performance improvements over conventional 2D and 3D CNN architectures proposed in the literature. Experimental evaluation on five action recognition benchmark datasets, i.e., Something-Something-v1, Something-Something-v2, Kinetics-400, UCF101, and HMDB51, demonstrates the effectiveness of the proposed F4D network architecture for video-level action recognition.
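The abstract describes factorizing a 4D convolution over the (clip, frame, height, width) dimensions so that both short-term motion within clips and long-range context across clips can be modeled. The paper's exact factorization and attention modules are not reproduced here; as a rough illustration only, the PyTorch sketch below shows one generic way such a factorization could be arranged: a per-clip 3D spatiotemporal convolution followed by a 1D convolution along the clip axis. The module name Factorized4DBlock, the channel sizes, and the specific decomposition are assumptions for illustration, not the authors' F4D implementation.

# Illustrative sketch only: one way to factorize a 4D convolution over
# (clip, frame, height, width) into a per-clip 3D convolution plus a
# 1D convolution across clips. Names and the exact decomposition are
# assumptions, not the F4D authors' implementation.
import torch
import torch.nn as nn


class Factorized4DBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Short-term: 3D conv over (frames, height, width) within each clip.
        self.spatiotemporal = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        # Long-term: 1D conv along the clip axis, mixing video-level context.
        self.across_clips = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, clips, frames, height, width)
        b, c, u, t, h, w = x.shape
        # Fold clips into the batch dimension and apply the 3D conv per clip.
        y = self.spatiotemporal(x.permute(0, 2, 1, 3, 4, 5).reshape(b * u, c, t, h, w))
        y = y.reshape(b, u, c, t, h, w)
        # Treat each (frame, height, width) position as a sequence over clips
        # and convolve along the clip axis for long-range context.
        z = y.permute(0, 3, 4, 5, 2, 1).reshape(b * t * h * w, c, u)
        z = self.across_clips(z)
        z = z.reshape(b, t, h, w, c, u).permute(0, 4, 5, 1, 2, 3)
        return z  # back to (batch, channels, clips, frames, height, width)


if __name__ == "__main__":
    block = Factorized4DBlock(channels=16)
    video = torch.randn(2, 16, 4, 8, 32, 32)  # 2 videos, 4 clips of 8 frames each
    print(block(video).shape)  # torch.Size([2, 16, 4, 8, 32, 32])

In a sketch like this, folding the clip axis into the batch keeps the 3D convolution purely clip-local, while the 1D convolution across clips is what injects video-level context. The actual F4D architecture also incorporates the temporal, spatio-temporal, and channel attention mechanisms listed in the keywords, which this sketch omits.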

License: CC BY-NC-ND 4.0

Paper citation in several formats:
Al-Saad, M.; Ramaswamy, L. and Bhandarkar, S. (2024). F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4; ISSN 2184-433X, SciTePress, pages 1002-1013. DOI: 10.5220/0012430200003636

@conference{icaart24,
author={Mohammad Al{-}Saad and Lakshmish Ramaswamy and Suchendra Bhandarkar},
title={F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={1002-1013},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012430200003636},
isbn={978-989-758-680-4},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning
SN - 978-989-758-680-4
IS - 2184-433X
AU - Al-Saad, M.
AU - Ramaswamy, L.
AU - Bhandarkar, S.
PY - 2024
SP - 1002
EP - 1013
DO - 10.5220/0012430200003636
PB - SciTePress
ER -