Title: Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers

Authors: Xuehao Liu, Sarah Jane Delany and Susan McKeever

Affiliation: School of Computer Science, Technological University Dublin, Ireland

Keyword(s): Image Captioning, Prompts, Parameter-Efficient Tuning, Vision-Language Transformer.

Abstract: Large-scale transformer models pose challenges because of the resource-intensive training, time, and data required to fine-tune them on new tasks, mainly due to their extensive parameter count. To address this, zero-shot and few-shot learning, aided by techniques such as prompts and parameter-efficient modules, have emerged. However, these techniques are often tailored to vision-only or language-only tasks, leaving a gap in their effectiveness for multi-modal tasks such as image captioning. This paper explores the effectiveness of prompts and parameter-efficient modules in reducing the training effort for image captioning. Rather than fine-tuning extensively, we trained only the prompt and parameter-efficient modules on the pretrained Oscar transformer model using the COCO dataset. We tested five prompt tuning approaches and two parameter-efficient methods. Notably, combining visual prompt tuning (VPT) with Adapter and LoRA led to a 2% CIDEr score improvement after just one epoch of training, with a minimal increase in trainable parameters (5.7%). Our work paves the way towards using single-stream transformer models for a variety of fine-tuned tasks, with a potentially large reduction in retraining time and processing resources.
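
The abstract describes attaching prompt and parameter-efficient modules (VPT, Adapter, LoRA) to a frozen pretrained transformer so that only a small fraction of parameters is trained. The PyTorch sketch below is purely illustrative and not the authors' implementation: LoRALinear, Adapter, and PromptedEncoder are hypothetical names, and a plain frozen linear layer stands in for the pretrained Oscar encoder. It only shows the general idea: base weights stay frozen while the LoRA factors, the adapter, and the prompt tokens remain trainable.

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank (LoRA) update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # keep pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class Adapter(nn.Module):
    """A bottleneck adapter with a residual connection."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))


class PromptedEncoder(nn.Module):
    """Wraps a frozen encoder and prepends trainable prompt tokens (VPT-style)."""

    def __init__(self, encoder: nn.Module, dim: int, num_prompts: int = 10):
        super().__init__()
        self.encoder = encoder
        # Freeze everything in the wrapped encoder except the LoRA parameters.
        for name, p in self.encoder.named_parameters():
            if "lora_" not in name:
                p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.adapter = Adapter(dim)

    def forward(self, tokens):                    # tokens: (batch, seq_len, dim)
        batch = tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompts, tokens], dim=1)   # prepend prompt tokens
        return self.adapter(self.encoder(x))


# Stand-in for one pretrained single-stream layer (not the real Oscar model).
dim = 768
frozen_layer = nn.Sequential(LoRALinear(nn.Linear(dim, dim)), nn.GELU())
model = PromptedEncoder(frozen_layer, dim=dim, num_prompts=10)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total} ({100 * trainable / total:.1f}%)")

Printing the trainable-parameter fraction at the end mirrors the kind of check behind the 5.7% figure in the abstract: only the LoRA factors, adapter, and prompt embeddings contribute gradients, while the wrapped base weights stay fixed.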

CC BY-NC-ND 4.0

Paper citation in several formats:
Liu, X.; Delany, S. and McKeever, S. (2024). Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-679-8; ISSN 2184-4321, SciTePress, pages 501-508. DOI: 10.5220/0012364800003660

@conference{visapp24,
author={Xuehao Liu and Sarah Jane Delany and Susan McKeever},
title={Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2024},
pages={501-508},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012364800003660},
isbn={978-989-758-679-8},
issn={2184-4321},
}

TY - CONF
JO - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers
SN - 978-989-758-679-8
IS - 2184-4321
AU - Liu, X.
AU - Delany, S.
AU - McKeever, S.
PY - 2024
SP - 501
EP - 508
DO - 10.5220/0012364800003660
PB - SciTePress
ER -