Fairer Evaluation of Zero Shot Action Recognition in Videos

Kaiqiang Huang; Sarah Jane Delany; Susan Mckeever

doi:10.5220/0010324402060215

2021

Abstract

Zero-shot learning (ZSL) for human action recognition (HAR) aims to recognise video action classes that were never seen during model training. This is achieved by building mappings between visual and semantic embeddings, where the visual embeddings are typically provided by a pre-trained deep neural network (DNN). The premise of ZSL is that the training and testing classes are disjoint. In the parallel domain of ZSL for image input, the widespread but flawed evaluation protocol of pre-training on ZSL test classes has already been highlighted. Such pre-training is akin to giving the model a sneak preview of the evaluation classes. In this work, we investigate the extent to which this evaluation protocol has been used in ZSL research for human action recognition. We show that, in ZSL for HAR, accuracies for overlapping classes are being boosted by between 5.75% and 51.94%, depending on the visual and semantic features used, as a result of this flawed evaluation protocol. To help other researchers avoid this problem in the future, we provide annotated versions of the relevant benchmark ZSL test datasets in the HAR field, UCF101 and HMDB51, highlighting their overlaps with the pre-training datasets used in the field.
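The disjointness premise described above can be illustrated with a minimal Python sketch that splits a set of ZSL test classes into those whose names also appear among the pre-training classes and those that do not. This is not the authors' annotation procedure: the function names `normalise` and `split_test_classes` are hypothetical, the label lists are toy placeholders rather than the real label sets of any dataset, and simple name matching is only a rough proxy, since semantically equivalent classes with different names would need manual review.

```python
import re


def normalise(label: str) -> str:
    """Normalise a class label so 'PlayingGuitar' and 'playing guitar' compare equal."""
    # Insert a space at each lower-case/upper-case boundary (CamelCase split).
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    # Collapse whitespace, underscores and hyphens into single spaces, then lower-case.
    return re.sub(r"[\s_\-]+", " ", spaced).strip().lower()


def split_test_classes(pretrain_labels, test_labels):
    """Return (overlapping, disjoint) test labels relative to the pre-training labels."""
    seen = {normalise(label) for label in pretrain_labels}
    overlapping = [label for label in test_labels if normalise(label) in seen]
    disjoint = [label for label in test_labels if normalise(label) not in seen]
    return overlapping, disjoint


# Toy placeholder labels (assumption: NOT the actual label lists of any dataset).
pretrain_labels = ["archery", "playing guitar", "surfing water"]
test_labels = ["Archery", "PlayingGuitar", "YoYo"]

overlapping, disjoint = split_test_classes(pretrain_labels, test_labels)
print("overlapping:", overlapping)
print("disjoint:", disjoint)
```

Under a fair protocol, only the classes in `disjoint` would count as genuinely unseen; reporting accuracy separately for the two groups makes any boost from pre-training overlap visible.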

Paper Citation

in Harvard Style

Huang K., Delany S. and Mckeever S. (2021). Fairer Evaluation of Zero Shot Action Recognition in Videos. In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 5: VISAPP; ISBN 978-989-758-488-6, SciTePress, pages 206-215. DOI: 10.5220/0010324402060215

in Bibtex Style

@conference{visapp21,
author={Kaiqiang Huang and Sarah Jane Delany and Susan Mckeever},
title={Fairer Evaluation of Zero Shot Action Recognition in Videos},
booktitle={Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 5: VISAPP},
year={2021},
pages={206-215},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010324402060215},
isbn={978-989-758-488-6},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 5: VISAPP
TI - Fairer Evaluation of Zero Shot Action Recognition in Videos
SN - 978-989-758-488-6
AU - Huang K.
AU - Delany S.
AU - Mckeever S.
PY - 2021
SP - 206
EP - 215
DO - 10.5220/0010324402060215
PB - SciTePress