Learning Cross-modal Representations with Multi-relations for Image Captioning

Authors: Peng Cheng; Tung Le; Teeradaj Racharak; Cao Yiming; Kong Weikun and Minh Le Nguyen

Affiliation: School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan

Keyword(s): Cross-modality, Multi-relational Semantics, Object Semantics, Vision-and-Language, Pre-training.

Abstract: Image captioning is a cross-domain study that generates image description sentences from a given image. Recently, Li et al. (2020b) showed that concatenating sentences, object tags, and region features into a unified representation enables a model to outperform state-of-the-art works on different vision-and-language tasks. Such results inspired us to investigate and propose, in this paper, two new learning methods that exploit the relation representation in the model and improve its generation results. To the best of our knowledge, we are the first to exploit relations extracted from both text and images for image captioning. Our idea is motivated by the observation that humans can correct other people's descriptions of an image by knowing the relationships between the objects in it while observing the same image. We conduct experiments on the MS COCO dataset (Lin et al., 2014) and show that our method yields a higher SPICE score than the baseline.
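As a rough illustration of the unified representation mentioned in the abstract, the sketch below (ours, not the authors' code) concatenates caption-word embeddings, object-tag embeddings, and projected region features into a single sequence and encodes it with a standard Transformer encoder, in the spirit of Oscar (Li et al., 2020b). All class names, dimensions, and hyperparameters are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

# Hypothetical sketch: builds one unified sequence of caption words,
# object tags, and detector region features, then applies self-attention
# over all three modalities. Sizes are placeholders.
class UnifiedCaptionEncoder(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, region_dim=2048, layers=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)    # caption words and object tags
        self.region_proj = nn.Linear(region_dim, hidden)     # region features -> hidden size
        self.type_emb = nn.Embedding(3, hidden)               # 0: words, 1: tags, 2: regions
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, word_ids, tag_ids, region_feats):
        # Embed each modality, mark it with a segment type, and concatenate
        # so self-attention can relate words, tags, and image regions.
        words = self.token_emb(word_ids) + self.type_emb(torch.zeros_like(word_ids))
        tags = self.token_emb(tag_ids) + self.type_emb(torch.ones_like(tag_ids))
        regions = self.region_proj(region_feats) + self.type_emb(
            torch.full(region_feats.shape[:2], 2, dtype=torch.long, device=region_feats.device)
        )
        unified = torch.cat([words, tags, regions], dim=1)    # (batch, L_words + L_tags + L_regions, hidden)
        return self.encoder(unified)

# Toy usage: one caption of 12 tokens, 5 object tags, 10 detected regions.
model = UnifiedCaptionEncoder()
out = model(torch.randint(0, 30522, (1, 12)),
            torch.randint(0, 30522, (1, 5)),
            torch.randn(1, 10, 2048))
print(out.shape)  # torch.Size([1, 27, 768])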

CC BY-NC-ND 4.0

Paper citation in several formats:
Cheng, P.; Le, T.; Racharak, T.; Yiming, C.; Weikun, K. and Nguyen, M. (2022). Learning Cross-modal Representations with Multi-relations for Image Captioning. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-549-4; ISSN 2184-4313, SciTePress, pages 346-353. DOI: 10.5220/0010915100003122

@conference{icpram22,
author={Peng Cheng and Tung Le and Teeradaj Racharak and Cao Yiming and Kong Weikun and Minh Le Nguyen},
title={Learning Cross-modal Representations with Multi-relations for Image Captioning},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2022},
pages={346-353},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010915100003122},
isbn={978-989-758-549-4},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Learning Cross-modal Representations with Multi-relations for Image Captioning
SN - 978-989-758-549-4
IS - 2184-4313
AU - Cheng, P.
AU - Le, T.
AU - Racharak, T.
AU - Yiming, C.
AU - Weikun, K.
AU - Nguyen, M.
PY - 2022
SP - 346
EP - 353
DO - 10.5220/0010915100003122
PB - SciTePress
ER -