Visual Question Answering on the Indian Heritage in Digital Space Dataset Using the BLIP Model

Aryan Phadnis, Vibhu V Revadi, Abhishek B R, Vinayak Neginhal, Uday Kulkarni, Shashank Hegde

2025

Abstract

Visual Question Answering (VQA) is a rapidly evolving area of artificial intelligence that combines computer vision and natural language processing to answer textual questions about image content. Our approach fine-tunes the Bootstrapping Language-Image Pre-training (BLIP) model, a multimodal framework, to address the gap between the vision and language modalities. Starting from a pre-trained architecture lets us optimize the model for real-world applications through task-specific adaptations. Our work highlights how such a model can address practical challenges in VQA tasks, thereby improving the alignment between the visual and textual modalities. Experimental results on a test set built from unseen images and questions from the Indian Heritage in Digital Space (IHDS) dataset show an accuracy of 86.42% and a weighted F1 score of 0.89, indicating the effectiveness of our approach for enhancing VQA systems on diverse and complex datasets. The integration of domain-specific datasets highlights the versatility of fine-tuned models in addressing distinct challenges while maintaining robust performance. The proposed methodology demonstrates adaptability to a specific domain and establishes a foundation for applying multimodal frameworks to culturally rich datasets.
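The abstract reports two metrics, accuracy and weighted F1. As a reading aid, the sketch below shows how these are commonly computed when VQA is scored as exact-match answer classification. This is an assumption: the abstract does not state the evaluation protocol, and the function names and toy answers here are hypothetical and not taken from the IHDS dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predicted answers that exactly match the reference answer."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support.

    Support is the number of reference answers belonging to the class, so
    frequent answers contribute more to the final score than rare ones.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        score += f1 * n / total
    return score


# Toy answers invented for illustration only (not IHDS data).
reference = ["temple", "temple", "fort", "stupa"]
predicted = ["temple", "fort", "fort", "stupa"]

acc = accuracy(reference, predicted)
wf1 = weighted_f1(reference, predicted)
print(f"accuracy={acc:.4f} weighted_f1={wf1:.4f}")
```

Because the weighted F1 weights each answer class by its frequency, it can differ from accuracy when answer classes are imbalanced, which is one plausible reason for reporting both.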


Paper Citation


in Harvard Style

Phadnis A., Revadi V., B R A., Neginhal V., Kulkarni U. and Hegde S. (2025). Visual Question Answering on the Indian Heritage in Digital Space Dataset Using the BLIP Model. In Proceedings of the 3rd International Conference on Futuristic Technology - Volume 2: INCOFT; ISBN 978-989-758-763-4, SciTePress, pages 5-11. DOI: 10.5220/0013585800004664


in Bibtex Style

@conference{incoft25,
author={Aryan Phadnis and Vibhu Revadi and Abhishek B R and Vinayak Neginhal and Uday Kulkarni and Shashank Hegde},
title={Visual Question Answering on the Indian Heritage in Digital Space Dataset Using the BLIP Model},
booktitle={Proceedings of the 3rd International Conference on Futuristic Technology - Volume 2: INCOFT},
year={2025},
pages={5-11},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013585800004664},
isbn={978-989-758-763-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 3rd International Conference on Futuristic Technology - Volume 2: INCOFT
TI - Visual Question Answering on the Indian Heritage in Digital Space Dataset Using the BLIP Model
SN - 978-989-758-763-4
AU - Phadnis A.
AU - Revadi V.
AU - B R A.
AU - Neginhal V.
AU - Kulkarni U.
AU - Hegde S.
PY - 2025
SP - 5
EP - 11
DO - 10.5220/0013585800004664
PB - SciTePress