Paper

Title: Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition

Authors: Dichao Liu 1; Yu Wang 2 and Jien Kato 3

Affiliations: 1 Graduate School of Informatics, Nagoya University, Nagoya City, Japan; 2 Graduate School of International Development, Nagoya University, Nagoya City, Japan; 3 College of Information Science and Engineering, Ritsumeikan University, Kusatsu City, Japan

ISBN: 978-989-758-354-4

Keyword(s): Action Recognition, Video Understanding, Attention, Fine-grained, Deep Learning.

Related Ontology Subjects/Areas/Topics: Computer Vision, Visualization and Computer Graphics ; Image and Video Analysis ; Visual Attention and Image Saliency

Abstract: We aim to propose more effective attentional regions that can help develop better fine-grained action recognition algorithms. Building on the spatial transformer networks' capability to perform spatial manipulation inside the networks, we propose an extension model, the Supervised Spatial Transformer Networks (SSTNs). This network model first supervises the spatial transformers to capture the same regions as hard-coded attentional regions of certain scale levels. Then such supervision can be turned off, and the network model adjusts the region learning in terms of location and scale. The adjustment is conditioned on the classification loss, so it is actually optimized for better recognition results. With this model, we are able to capture attentional regions of different levels within the networks. To evaluate SSTNs, we construct a six-stream SSTN model that exploits spatial and temporal information corresponding to three levels (general, middle and detail). The results show that the deep-learned attentional regions captured by SSTNs outperform hard-coded attentional regions. Also, the features learned by different streams of SSTNs are complementary to each other, and better results are obtained by fusing the features.
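The core operation the abstract relies on, a spatial transformer's differentiable affine sampling of an input map, can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the function name `affine_sample` and the pure-Python bilinear loop are assumptions for clarity; real implementations run the same math as batched tensor ops.

```python
import math

def affine_sample(image, theta, out_h, out_w):
    """Sample an out_h x out_w crop from `image` (list of lists of floats)
    using the 2x3 affine matrix `theta` in normalized [-1, 1] coordinates."""
    in_h, in_w = len(image), len(image[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Normalized target-grid coordinates in [-1, 1].
            y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Affine warp back into the source frame.
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            # Map normalized source coordinates to pixel indices.
            x = (x_s + 1.0) * (in_w - 1) / 2.0
            y = (y_s + 1.0) * (in_h - 1) / 2.0
            # Bilinear interpolation over the four neighbouring pixels.
            x0, y0 = math.floor(x), math.floor(y)
            for dy in (0, 1):
                for dx in (0, 1):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < in_h and 0 <= xx < in_w:
                        w = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                        out[i][j] += w * image[yy][xx]
    return out

# The identity transform reproduces the input; shrinking the diagonal
# scale terms theta[0][0] and theta[1][1] zooms into a central
# attentional region, which is the kind of crop SSTNs learn to place.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

Because every step is differentiable in `theta`, gradients from the classification loss can adjust the location and scale of the sampled region, which is what allows the supervision described above to be switched off after initialization.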

License: CC BY-NC-ND 4.0


Paper citation in several formats:
Liu, D.; Wang, Y. and Kato, J. (2019). Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, ISBN 978-989-758-354-4, pages 311-318. DOI: 10.5220/0007257803110318

@conference{visapp19,
author={Dichao Liu and Yu Wang and Jien Kato},
title={Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition},
booktitle={Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP},
year={2019},
pages={311-318},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007257803110318},
isbn={978-989-758-354-4},
}

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP
TI - Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition
SN - 978-989-758-354-4
AU - Liu, D.
AU - Wang, Y.
AU - Kato, J.
PY - 2019
SP - 311
EP - 318
DO - 10.5220/0007257803110318
