Learning Cross-modal Representations with Multi-relations for Image Captioning

Authors: Peng Cheng; Tung Le; Teeradaj Racharak; Cao Yiming; Kong Weikun and Minh Le Nguyen

Affiliation: School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan

Keyword(s): Cross-modality, Multi-relational Semantics, Object Semantics, Vision-and-Language, Pre-training.

Abstract: Image captioning is a cross-domain study that generates image description sentences from a given image. Recently, Li et al. (2020b) showed that concatenating sentences, object tags, and region features into a unified representation enables a model to outperform state-of-the-art works on different vision-and-language tasks. Such results inspired us to investigate and propose, in this paper, two new learning methods that exploit the relation representation in the model and improve its generation results. To the best of our knowledge, we are the first to exploit relations extracted from both text and images for image captioning. Our idea is motivated by the observation that humans can correct other people's descriptions of an image by knowing the relationships between the objects in it while observing the same image. We conduct experiments on the MS COCO dataset (Lin et al., 2014) and show that our method yields a higher SPICE score than the baseline.
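As a rough illustration of the unified representation mentioned in the abstract, the sketch below (ours, not the authors' code) concatenates caption-word embeddings, object-tag embeddings, and projected region features into a single sequence and encodes it with a standard Transformer encoder, in the spirit of Oscar (Li et al., 2020b). All class names, dimensions, and hyperparameters are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

# Hypothetical sketch: builds one unified sequence of caption words,
# object tags, and detector region features, then applies self-attention
# over all three modalities. Sizes are placeholders.
class UnifiedCaptionEncoder(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, region_dim=2048, layers=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)    # caption words and object tags
        self.region_proj = nn.Linear(region_dim, hidden)     # region features -> hidden size
        self.type_emb = nn.Embedding(3, hidden)               # 0: words, 1: tags, 2: regions
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, word_ids, tag_ids, region_feats):
        # Embed each modality, mark it with a segment type, and concatenate
        # so self-attention can relate words, tags, and image regions.
        words = self.token_emb(word_ids) + self.type_emb(torch.zeros_like(word_ids))
        tags = self.token_emb(tag_ids) + self.type_emb(torch.ones_like(tag_ids))
        regions = self.region_proj(region_feats) + self.type_emb(
            torch.full(region_feats.shape[:2], 2, dtype=torch.long, device=region_feats.device)
        )
        unified = torch.cat([words, tags, regions], dim=1)    # (batch, L_words + L_tags + L_regions, hidden)
        return self.encoder(unified)

# Toy usage: one caption of 12 tokens, 5 object tags, 10 detected regions.
model = UnifiedCaptionEncoder()
out = model(torch.randint(0, 30522, (1, 12)),
            torch.randint(0, 30522, (1, 5)),
            torch.randn(1, 10, 2048))
print(out.shape)  # torch.Size([1, 27, 768])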

CC BY-NC-ND 4.0

Paper citation in several formats:
Cheng, P.; Le, T.; Racharak, T.; Yiming, C.; Weikun, K. and Nguyen, M. (2022). Learning Cross-modal Representations with Multi-relations for Image Captioning. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-549-4; ISSN 2184-4313, SciTePress, pages 346-353. DOI: 10.5220/0010915100003122

@conference{icpram22,
author={Peng Cheng and Tung Le and Teeradaj Racharak and Cao Yiming and Kong Weikun and Minh Le Nguyen},
title={Learning Cross-modal Representations with Multi-relations for Image Captioning},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2022},
pages={346-353},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010915100003122},
isbn={978-989-758-549-4},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Learning Cross-modal Representations with Multi-relations for Image Captioning
SN - 978-989-758-549-4
IS - 2184-4313
AU - Cheng, P.
AU - Le, T.
AU - Racharak, T.
AU - Yiming, C.
AU - Weikun, K.
AU - Nguyen, M.
PY - 2022
SP - 346
EP - 353
DO - 10.5220/0010915100003122
PB - SciTePress
ER -