
Contextualise, Attend, Modulate and Tell: Visual Storytelling

Authors: Zainy M. Malakan 1,2; Nayyer Aafaq 1; Ghulam Mubashar Hassan 1 and Ajmal Mian 1

Affiliations: 1 Department of Computer Science and Software Engineering, The University of Western Australia, Australia; 2 Department of Information Science, Faculty of Computer Science and Information System, Umm Al-Qura University, Saudi Arabia

Keyword(s): Storytelling, Image Captioning, Visual Description.

Abstract: Automatic natural language description of visual content is a fast-growing topic that has recently attracted extensive research attention. However, unlike typical ‘image captioning’ or ‘video captioning’, coherent story generation from a sequence of images is a relatively understudied problem. Story generation poses the challenges of diverse language style, context modeling, coherence across sentences, and latent concepts that are not even visible in the visual content. Contemporary methods fall short of modeling the context and visual variance, and generate stories that lack linguistic coherence across multiple sentences. To this end, we propose a novel framework, Contextualise, Attend, Modulate and Tell (CAMT), that models the temporal relationships within the image sequence in both forward and backward directions. The contextual information and the regional image features are projected into a joint space and subjected to an attention mechanism that captures the spatio-temporal relationships among the images. Before the attentive representations of the input images are fed into a language model, gated modulation between the attentive representation and the input word embeddings captures the interaction between the inputs and their context. To the best of our knowledge, this is the first method to exploit such a modulation technique for story generation. We evaluate our model on the Visual Storytelling Dataset (VIST) using both automatic and human evaluation measures and demonstrate that CAMT outperforms existing baselines.
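The gated-modulation step described in the abstract can be made concrete with a short sketch. The following PyTorch snippet is a minimal illustration, assuming a sigmoid gate that blends projected visual features with the input word embeddings; the GatedModulation module, its layer layout and all dimensions are assumptions made for illustration and do not reproduce the authors' implementation.

import torch
import torch.nn as nn

class GatedModulation(nn.Module):
    # Hypothetical sketch of the gated modulation described in the abstract:
    # a learned sigmoid gate decides, per dimension, how much of the attentive
    # visual representation versus the input word embedding is passed on to
    # the language model. Names and sizes are illustrative assumptions.
    def __init__(self, visual_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(visual_dim, embed_dim)  # map visual features into the embedding space
        self.gate = nn.Linear(visual_dim + embed_dim, embed_dim)  # gate conditioned on both inputs

    def forward(self, visual_feat: torch.Tensor, word_embed: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([visual_feat, word_embed], dim=-1)))
        return g * self.proj(visual_feat) + (1.0 - g) * word_embed

# Example: modulate a batch of word embeddings with attended image features.
visual = torch.randn(4, 2048)  # e.g. pooled regional CNN features (assumed size)
words = torch.randn(4, 512)    # input word embeddings (assumed size)
modulated = GatedModulation(2048, 512)(visual, words)  # -> shape (4, 512)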

CC BY-NC-ND 4.0


Paper citation in several formats:
Malakan, Z.; Aafaq, N.; Hassan, G. and Mian, A. (2021). Contextualise, Attend, Modulate and Tell: Visual Storytelling. In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 5: VISAPP; ISBN 978-989-758-488-6; ISSN 2184-4321, SciTePress, pages 196-205. DOI: 10.5220/0010314301960205

@conference{visapp21,
author={Malakan, Zainy M. and Aafaq, Nayyer and Hassan, Ghulam Mubashar and Mian, Ajmal},
title={Contextualise, Attend, Modulate and Tell: Visual Storytelling},
booktitle={Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 5: VISAPP},
year={2021},
pages={196--205},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010314301960205},
isbn={978-989-758-488-6},
issn={2184-4321},
}

TY - CONF

JO - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 5: VISAPP
TI - Contextualise, Attend, Modulate and Tell: Visual Storytelling
SN - 978-989-758-488-6
IS - 2184-4321
AU - Malakan, Z.
AU - Aafaq, N.
AU - Hassan, G.
AU - Mian, A.
PY - 2021
SP - 196
EP - 205
DO - 10.5220/0010314301960205
PB - SciTePress
ER -