Multimodal Dance Recognition

Monika Wysoczańska, Tomasz Trzciński

Abstract

Video content analysis is still an emerging technology, and the majority of work in this area extends methods from the still-image domain. Dance videos are especially difficult to analyse and recognise because the performed human actions are highly dynamic. In this work, we introduce a multimodal approach to dance video recognition. Our proposed method combines visual and audio information, by fusing their representations, to improve classification accuracy. For the visual part, we focus on motion representation, as it is the key factor in distinguishing dance styles. For the audio representation, we emphasise capturing long-term dependencies, such as tempo, which is a crucial dance discriminator. Finally, we fuse the two distinct modalities using a late fusion approach. We compare our model with the corresponding unimodal approaches through an exhaustive evaluation on the Let’s Dance dataset. Our method yields significantly better results than either single-modality approach. The results presented in this work not only demonstrate the strength of integrating complementary sources of information in the recognition task, but also indicate the potential of applying multimodal approaches within specific research areas.
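The abstract names late fusion but does not detail it here. The following is a minimal sketch that assumes each unimodal model (visual and audio) outputs per-class logits for the same set of dance classes. It shows one common form of late fusion: each modality is classified independently, and only the resulting class probabilities are combined by a weighted average. The `late_fusion` function, the `visual_weight` parameter and the toy scores are hypothetical illustrations, not the authors' implementation. The paper may combine the modalities differently.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by each row's maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def late_fusion(visual_logits: np.ndarray,
                audio_logits: np.ndarray,
                visual_weight: float = 0.5) -> np.ndarray:
    """Decision-level (late) fusion of two unimodal classifiers.

    Hypothetical sketch: each modality's logits are turned into class
    probabilities separately and then averaged with weight `visual_weight`
    (an assumed hyperparameter, e.g. tuned on a validation split).
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be in [0, 1]")
    if visual_logits.shape != audio_logits.shape:
        raise ValueError("both modalities must score the same clips and classes")
    p_visual = softmax(visual_logits)
    p_audio = softmax(audio_logits)
    return visual_weight * p_visual + (1.0 - visual_weight) * p_audio


# Toy scores: 2 clips x 3 hypothetical dance classes (made-up numbers).
visual = np.array([[2.0, 1.9, 0.1],   # motion cues: torn between class 0 and 1
                   [0.2, 0.1, 3.0]])
audio = np.array([[0.5, 2.5, 0.0],    # audio cues: clearly favour class 1
                  [0.0, 0.3, 2.0]])

fused = late_fusion(visual, audio)
print(fused.argmax(axis=1))  # → [1 2]
```

In the toy input, the visual model alone narrowly prefers class 0 for the first clip. The audio model's stronger preference for class 1 tips the fused decision. This illustrates how complementary evidence from a second modality can correct a single-modality classifier. Because fusion happens only at the decision level, each unimodal network can be trained and evaluated on its own, which also makes the unimodal baselines straightforward to compare against.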

Paper Citation


in Harvard Style

Wysoczańska M. and Trzciński T. (2020). Multimodal Dance Recognition. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, ISBN 978-989-758-402-2, pages 558-565. DOI: 10.5220/0009326005580565


in BibTeX Style

@conference{visapp20,
author={Monika Wysoczańska and Tomasz Trzciński},
title={Multimodal Dance Recognition},
booktitle={Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP},
year={2020},
pages={558-565},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009326005580565},
isbn={978-989-758-402-2},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP
TI  - Multimodal Dance Recognition
SN  - 978-989-758-402-2
AU  - Wysoczańska M.
AU  - Trzciński T.
PY  - 2020
SP  - 558
EP  - 565
DO  - 10.5220/0009326005580565
ER  - 