Object-less Vision-language Model on Visual Question Classification for Blind People

Tung Le; Khoa Pho; Thong Bui; Huy Tien Nguyen; Minh Le Nguyen

Topics: Data Science; Deep Learning; Natural Language Processing; Neural Networks; Vision and Perception

In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, 180-187, 2022

Authors: Tung Le ¹ ; Khoa Pho ¹ ; Thong Bui ²,³ ; Huy Tien Nguyen ²,³ and Minh Le Nguyen ¹

Affiliations: ¹ School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan ; ² Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam ; ³ Vietnam National University, Ho Chi Minh City, Vietnam

Keyword(s): Visual Question Classification, Object-less Image, Vision-language Model, Vision Transformer, VizWiz-VQA.

Abstract: Although question types have long been part of Visual Question Answering datasets, Visual Question Classification has not received enough research attention. Unlike general text classification, a visual question requires understanding visual and textual features simultaneously. Beyond the novelty of Visual Question Classification itself, our most important and practical goal is to address the weakness of Object Detection on object-less images. We thus propose an Object-less Visual Question Classification model, OL–LXMERT, which generates virtual objects to remove the dependence on Object Detection found in previous Vision-Language systems. Our architecture is effective and powerful enough to digest both local and global image features when modeling the relationship between multiple modalities. In experiments on our modified VizWiz-VQC 2020 dataset of blind people, our Object-less LXMERT achieves promising results on this brand-new multi-modal task. Furthermore, detailed ablation studies show the strength and potential of our model in comparison to competitive approaches.

CC BY-NC-ND 4.0

Paper citation in several formats:

Le, T.; Pho, K.; Bui, T.; Nguyen, H. and Nguyen, M. (2022). Object-less Vision-language Model on Visual Question Classification for Blind People. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-547-0; ISSN 2184-433X, SciTePress, pages 180-187. DOI: 10.5220/0010797400003116

@conference{icaart22,
author={Tung Le and Khoa Pho and Thong Bui and Huy Tien Nguyen and Minh Le Nguyen},
title={Object-less Vision-language Model on Visual Question Classification for Blind People},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2022},
pages={180-187},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010797400003116},
isbn={978-989-758-547-0},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Object-less Vision-language Model on Visual Question Classification for Blind People
SN - 978-989-758-547-0
IS - 2184-433X
AU - Le, T.
AU - Pho, K.
AU - Bui, T.
AU - Nguyen, H.
AU - Nguyen, M.
PY - 2022
SP - 180
EP - 187
DO - 10.5220/0010797400003116
PB - SciTePress