Title: Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers

Authors: Xuehao Liu, Sarah Jane Delany and Susan McKeever

Affiliation: School of Computer Science, Technological University Dublin, Ireland

Keyword(s): Image Captioning, Prompts, Parameter-Efficient Tuning, Vision-Language Transformer.

Abstract: Large-scale transformer models pose challenges because of the resource-intensive training, time, and data required to fine-tune them on new tasks, mainly due to their extensive parameter count. To address this, zero-shot and few-shot learning, aided by techniques such as prompts and parameter-efficient modules, have emerged. However, these techniques are often tailored to vision-only or language-only tasks, leaving a gap in their effectiveness for multi-modal tasks such as image captioning. This paper explores the effectiveness of prompts and parameter-efficient modules in reducing the training effort for image captioning. Rather than fine-tuning extensively, we trained only the prompt and parameter-efficient modules on the pretrained Oscar transformer model using the COCO dataset. We tested five prompt tuning approaches and two parameter-efficient methods. Notably, combining visual prompt tuning (VPT) with Adapter and LoRA led to a 2% CIDEr score improvement after just one epoch of training, with a minimal increase in trainable parameters (5.7%). Our work paves the way towards using single-stream transformer models for a variety of fine-tuned tasks, with a potentially large reduction in retraining time and processing resources.
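
The abstract describes attaching prompt and parameter-efficient modules (VPT, Adapter, LoRA) to a frozen pretrained transformer so that only a small fraction of parameters is trained. The PyTorch sketch below is purely illustrative and not the authors' implementation: LoRALinear, Adapter, and PromptedEncoder are hypothetical names, and a plain frozen linear layer stands in for the pretrained Oscar encoder. It only shows the general idea: base weights stay frozen while the LoRA factors, the adapter, and the prompt tokens remain trainable.

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank (LoRA) update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # keep pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class Adapter(nn.Module):
    """A bottleneck adapter with a residual connection."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))


class PromptedEncoder(nn.Module):
    """Wraps a frozen encoder and prepends trainable prompt tokens (VPT-style)."""

    def __init__(self, encoder: nn.Module, dim: int, num_prompts: int = 10):
        super().__init__()
        self.encoder = encoder
        # Freeze everything in the wrapped encoder except the LoRA parameters.
        for name, p in self.encoder.named_parameters():
            if "lora_" not in name:
                p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.adapter = Adapter(dim)

    def forward(self, tokens):                    # tokens: (batch, seq_len, dim)
        batch = tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompts, tokens], dim=1)   # prepend prompt tokens
        return self.adapter(self.encoder(x))


# Stand-in for one pretrained single-stream layer (not the real Oscar model).
dim = 768
frozen_layer = nn.Sequential(LoRALinear(nn.Linear(dim, dim)), nn.GELU())
model = PromptedEncoder(frozen_layer, dim=dim, num_prompts=10)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total} ({100 * trainable / total:.1f}%)")

Printing the trainable-parameter fraction at the end mirrors the kind of check behind the 5.7% figure in the abstract: only the LoRA factors, adapter, and prompt embeddings contribute gradients, while the wrapped base weights stay fixed.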

CC BY-NC-ND 4.0

Paper citation in several formats:
Liu, X.; Delany, S. and McKeever, S. (2024). Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-679-8; ISSN 2184-4321, SciTePress, pages 501-508. DOI: 10.5220/0012364800003660

@conference{visapp24,
author={Xuehao Liu and Sarah Jane Delany and Susan McKeever},
title={Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2024},
pages={501-508},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012364800003660},
isbn={978-989-758-679-8},
issn={2184-4321},
}

TY - CONF
JO - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - Applying Prompts and Parameter-Efficient Methods to Enhance Single-Stream Vision-Language Transformers
SN - 978-989-758-679-8
IS - 2184-4321
AU - Liu, X.
AU - Delany, S.
AU - McKeever, S.
PY - 2024
SP - 501
EP - 508
DO - 10.5220/0012364800003660
PB - SciTePress
ER -