F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning

Authors: Mohammad Al-Saad, Lakshmish Ramaswamy and Suchendra Bhandarkar

Affiliation: School of Computing, The University of Georgia, Athens, GA, U.S.A.

Keyword(s): Video-Level Action Recognition, Factorized Convolutional Neural Network, Temporal Attention, Spatio-Temporal Attention, Channel Attention, 3D CNN, 4D CNN.

Abstract: Recent studies have shown that video-level representation learning is crucial to capturing and understanding the long-range temporal structure needed for video action recognition. Most existing 3D convolutional neural network (CNN)-based methods for video-level representation learning are clip-based and focus only on short-term motion and appearance. These CNN-based methods lack the capacity to incorporate and model the long-range spatiotemporal representation of the underlying video and ignore long-range video-level context during training. In this study, we propose a factorized 4D CNN architecture with attention (F4D) that is capable of learning more effective, finer-grained, long-term spatiotemporal video representations. We demonstrate that the proposed F4D architecture yields significant performance improvements over conventional 2D and 3D CNN architectures proposed in the literature. Experimental evaluation on five action recognition benchmark datasets, i.e., Something-Something-v1, Something-Something-v2, Kinetics-400, UCF101, and HMDB51, demonstrates the effectiveness of the proposed F4D network architecture for video-level action recognition.
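The abstract describes factorizing a 4D convolution over the (clip, frame, height, width) dimensions so that both short-term motion within clips and long-range context across clips can be modeled. The paper's exact factorization and attention modules are not reproduced here; as a rough illustration only, the PyTorch sketch below shows one generic way such a factorization could be arranged: a per-clip 3D spatiotemporal convolution followed by a 1D convolution along the clip axis. The module name Factorized4DBlock, the channel sizes, and the specific decomposition are assumptions for illustration, not the authors' F4D implementation.

# Illustrative sketch only: one way to factorize a 4D convolution over
# (clip, frame, height, width) into a per-clip 3D convolution plus a
# 1D convolution across clips. Names and the exact decomposition are
# assumptions, not the F4D authors' implementation.
import torch
import torch.nn as nn


class Factorized4DBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Short-term: 3D conv over (frames, height, width) within each clip.
        self.spatiotemporal = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        # Long-term: 1D conv along the clip axis, mixing video-level context.
        self.across_clips = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, clips, frames, height, width)
        b, c, u, t, h, w = x.shape
        # Fold clips into the batch dimension and apply the 3D conv per clip.
        y = self.spatiotemporal(x.permute(0, 2, 1, 3, 4, 5).reshape(b * u, c, t, h, w))
        y = y.reshape(b, u, c, t, h, w)
        # Treat each (frame, height, width) position as a sequence over clips
        # and convolve along the clip axis for long-range context.
        z = y.permute(0, 3, 4, 5, 2, 1).reshape(b * t * h * w, c, u)
        z = self.across_clips(z)
        z = z.reshape(b, t, h, w, c, u).permute(0, 4, 5, 1, 2, 3)
        return z  # back to (batch, channels, clips, frames, height, width)


if __name__ == "__main__":
    block = Factorized4DBlock(channels=16)
    video = torch.randn(2, 16, 4, 8, 32, 32)  # 2 videos, 4 clips of 8 frames each
    print(block(video).shape)  # torch.Size([2, 16, 4, 8, 32, 32])

In a sketch like this, folding the clip axis into the batch keeps the 3D convolution purely clip-local, while the 1D convolution across clips is what injects video-level context. The actual F4D architecture also incorporates the temporal, spatio-temporal, and channel attention mechanisms listed in the keywords, which this sketch omits.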

License: CC BY-NC-ND 4.0

Paper citation in several formats:
Al-Saad, M.; Ramaswamy, L. and Bhandarkar, S. (2024). F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4; ISSN 2184-433X, SciTePress, pages 1002-1013. DOI: 10.5220/0012430200003636

@conference{icaart24,
author={Mohammad Al{-}Saad and Lakshmish Ramaswamy and Suchendra Bhandarkar},
title={F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={1002-1013},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012430200003636},
isbn={978-989-758-680-4},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - F4D: Factorized 4D Convolutional Neural Network for Efficient Video-Level Representation Learning
SN - 978-989-758-680-4
IS - 2184-433X
AU - Al-Saad, M.
AU - Ramaswamy, L.
AU - Bhandarkar, S.
PY - 2024
SP - 1002
EP - 1013
DO - 10.5220/0012430200003636
PB - SciTePress
ER -