Body Part Information Additional in Multi-decoder Transformer-Based Network for Human Object Interaction Detection

Zihao Guo, Fei Li, Rujie Liu, Ryo Ishida, Genta Suzuki

2023

Abstract

Human-Object Interaction (HOI) detection is one of the essential branches of video understanding. However, many scenes are complex, such as those in which a human interacts with multiple objects. Treating the whole human body as the subject of the interaction in such scenes can cause the model to associate the interaction with the wrong object. In this paper, we propose a Transformer-based structure with a body part additional module to solve this problem. The Transformer structure provides powerful information mining capability. Moreover, a multi-decoder structure is adopted to solve different sub-problems, enabling the model to focus on different regions and improve performance. The most important contribution of our work is the proposed body part additional module. It introduces body part information into HOI detection, which refines the subject of the HOI triplet and assists interaction detection. The body part additional module also includes a Channel Attention module that balances the two sources of information, preventing the model from paying too much attention to either the body part or the human-object pair. Our method achieves better performance than the state-of-the-art model.
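The abstract does not specify how the Channel Attention module is built. As a rough illustration only, the following sketch assumes a squeeze-and-excitation style gate applied to concatenated body part and human-object pair channels. The function name `channel_attention`, the toy features, and the weight matrices `w1` and `w2` are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(features: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, Vector]:
    """Assumed squeeze-and-excitation style gate (not the paper's exact design).

    features: C channels, each a list of activations.
    w1: (hidden x C) weights, w2: (C x hidden) weights.
    Returns the re-weighted channels and the per-channel gates.
    """
    # Squeeze: average each channel to a single descriptor.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: small bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: multiply every activation in a channel by that channel's gate.
    reweighted = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return reweighted, gates


# Hypothetical toy features: two body part channels and two human-object pair channels.
body_part = [[0.2, 0.4], [1.0, 3.0]]
pair = [[0.5, 0.5], [2.0, 0.0]]
fused = body_part + pair  # concatenate along the channel axis (4 channels)

# Hypothetical fixed weights; in a real model these would be learned.
w1 = [[0.5, -0.5, 0.5, -0.5], [0.25, 0.25, 0.25, 0.25]]  # 4 -> 2
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # 2 -> 4

out, gates = channel_attention(fused, w1, w2)
```

Because each gate lies strictly between 0 and 1, no single channel can be amplified without bound, which is one plausible way to keep either information source from dominating.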

Paper Citation


in Harvard Style

Guo Z., Li F., Liu R., Ishida R. and Suzuki G. (2023). Body Part Information Additional in Multi-decoder Transformer-Based Network for Human Object Interaction Detection. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP; ISBN 978-989-758-634-7, SciTePress, pages 221-229. DOI: 10.5220/0011755300003417


in Bibtex Style

@conference{visapp23,
author={Zihao Guo and Fei Li and Rujie Liu and Ryo Ishida and Genta Suzuki},
title={Body Part Information Additional in Multi-decoder Transformer-Based Network for Human Object Interaction Detection},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP},
year={2023},
pages={221-229},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011755300003417},
isbn={978-989-758-634-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP
TI - Body Part Information Additional in Multi-decoder Transformer-Based Network for Human Object Interaction Detection
SN - 978-989-758-634-7
AU - Guo Z.
AU - Li F.
AU - Liu R.
AU - Ishida R.
AU - Suzuki G.
PY - 2023
SP - 221
EP - 229
DO - 10.5220/0011755300003417
PB - SciTePress