Action Tube Generation by Person Query Matching for Spatio-Temporal Action Detection

Kazuki Omi; Jion Oshima; Toru Tamaki

Topics: Deep Learning for Visual Understanding; Event and Human Activity Recognition

In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP, pages 261-268, 2025, Porto, Portugal

Authors: Kazuki Omi; Jion Oshima; Toru Tamaki

Affiliation: Nagoya Institute of Technology, Japan

Keyword(s): Spatio-Temporal Action Detection (STAD), Action Tubes, Query Matching, DETR, Query-Based Detection, IoU-Based Linking.

Abstract: This paper proposes a method for spatio-temporal action detection (STAD) that directly generates action tubes from the original video without relying on post-processing steps such as IoU-based linking and clip splitting. Our approach applies query-based detection (DETR) to each frame and matches DETR queries to link the same person across frames. We introduce the Query Matching Module (QMM), which uses metric learning to bring queries for the same person closer together across frames than queries for different people. Action classes are predicted from the sequence of queries obtained by QMM matching, which allows variable-length inputs from videos longer than a single clip. Experimental results on the JHMDB, UCF101-24, and AVA datasets demonstrate that our method performs well when people undergo large position changes, while offering superior computational efficiency and lower resource requirements.
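
The linking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' QMM: it assumes each frame already comes with one embedding vector per detected person (standing in for the learned query features), and it links them greedily by cosine similarity. The function names `cosine` and `link_queries` and the `threshold` value are hypothetical choices made for this illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def link_queries(frames, threshold=0.5):
    """Greedily link per-frame embeddings into tubes.

    frames: list of frames, each a list of embedding vectors (assumed given).
    Returns a list of tubes; each tube is a list of (frame_index, query_index).
    """
    tubes = []  # tube k -> list of (frame_index, query_index)
    last = []   # most recent embedding assigned to tube k
    for t, queries in enumerate(frames):
        # Score every (existing tube, current query) pair, best first.
        pairs = sorted(
            ((cosine(last[k], q), k, i)
             for k in range(len(tubes))
             for i, q in enumerate(queries)),
            reverse=True,
        )
        used_tubes, used_queries = set(), set()
        for sim, k, i in pairs:
            if sim < threshold:
                break  # remaining pairs are even less similar
            if k in used_tubes or i in used_queries:
                continue
            tubes[k].append((t, i))
            last[k] = queries[i]
            used_tubes.add(k)
            used_queries.add(i)
        # Any query left unmatched starts a new tube.
        for i, q in enumerate(queries):
            if i not in used_queries:
                tubes.append([(t, i)])
                last.append(q)
    return tubes


# Toy example: two people whose query order swaps in frame 1,
# and only one person detected in frame 2.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.9], [0.9, 0.1]],
    [[1.0, 0.2]],
]
tubes = link_queries(frames)
```

Because matching here depends only on embedding similarity and not on box overlap, a person can in principle be linked across a large position change, which is the property the abstract highlights. The resulting tubes have variable length, so a downstream classifier would need to accept sequences of different lengths.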

CC BY-NC-ND 4.0

Paper citation in several formats:

Omi, K., Oshima, J. and Tamaki, T. (2025). Action Tube Generation by Person Query Matching for Spatio-Temporal Action Detection. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3; ISSN 2184-4321, SciTePress, pages 261-268. DOI: 10.5220/0013089500003912

@conference{visapp25,
author={Kazuki Omi and Jion Oshima and Toru Tamaki},
title={Action Tube Generation by Person Query Matching for Spatio-Temporal Action Detection},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={261-268},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013089500003912},
isbn={978-989-758-728-3},
issn={2184-4321},
}

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - Action Tube Generation by Person Query Matching for Spatio-Temporal Action Detection
SN - 978-989-758-728-3
IS - 2184-4321
AU - Omi, K.
AU - Oshima, J.
AU - Tamaki, T.
PY - 2025
SP - 261
EP - 268
DO - 10.5220/0013089500003912
PB - SciTePress