Audio-to-Image Generation: A Pipeline for Sound Analysis and Visual Synthesis

Vijayalakshmi M., Aayush Anshul, Kishu Raj Tyagi

2025

Abstract

This paper presents a pipeline for transforming audio into high-resolution images using pretrained neural networks and modern generative models. The approach combines audio feature extraction, cross-modal mapping, and image synthesis to improve the fidelity and generalizability of audio-to-image generation. It uses YAMNet for sound classification, Whisper for speech recognition, and Stable Diffusion for image synthesis, with the aim of achieving both scalability and realism. Real-time processing and an interactive user feedback mechanism support iterative refinement, improving the relevance and precision of the generated visuals. Potential applications include multimedia content creation, environmental monitoring, and education, making the method a step toward seamless audio-visual synthesis.
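The abstract names the stages but does not say how sound labels and speech transcripts become an image description. The following minimal sketch shows one possible cross-modal mapping step. It assumes the classifier yields (label, score) pairs, as a YAMNet wrapper might after averaging per-frame scores, and that the recognizer yields a transcript string. The functions `build_prompt` and `audio_to_prompt`, the score threshold, and the prompt template are hypothetical illustrations, not the authors' method. The real models are replaced by stub callables so the code runs without them.

```python
from typing import Callable, List, Tuple

# Hypothetical stage interfaces. In a full system these would wrap YAMNet
# (sound classification) and Whisper (speech recognition); here they are
# plain callables so the mapping logic can run without the models.
Classifier = Callable[[bytes], List[Tuple[str, float]]]
Transcriber = Callable[[bytes], str]


def build_prompt(labels: List[Tuple[str, float]], transcript: str,
                 min_score: float = 0.3, top_k: int = 3) -> str:
    """Turn sound labels and a transcript into a text-to-image prompt.

    The threshold, top-k cut-off and template are illustrative assumptions.
    """
    # Keep the highest-scoring labels that clear the confidence threshold.
    ranked = sorted(labels, key=lambda pair: pair[1], reverse=True)
    kept = [name.lower() for name, score in ranked if score >= min_score][:top_k]

    parts = []
    if kept:
        parts.append("a scene with " + " and ".join(kept))
    if transcript.strip():
        parts.append(f'where someone says "{transcript.strip()}"')
    if not parts:
        # Nothing confident was detected: fall back to a generic prompt.
        return "an abstract visualization of ambient sound"
    return ", ".join(parts) + ", high detail, photorealistic"


def audio_to_prompt(audio: bytes, classify: Classifier,
                    transcribe: Transcriber, **kwargs) -> str:
    """Run both analysis stages on the same audio and map them to a prompt."""
    return build_prompt(classify(audio), transcribe(audio), **kwargs)


if __name__ == "__main__":
    # Stub models standing in for YAMNet and Whisper.
    def fake_classify(audio: bytes) -> List[Tuple[str, float]]:
        return [("Dog", 0.82), ("Rain", 0.55), ("Music", 0.12)]

    def fake_transcribe(audio: bytes) -> str:
        return "good boy"

    print(audio_to_prompt(b"", fake_classify, fake_transcribe))
    # prints: a scene with dog and rain, where someone says "good boy", high detail, photorealistic
```

The resulting string could then be passed as the `prompt` argument of a text-to-image pipeline such as Hugging Face diffusers' `StableDiffusionPipeline`. Keeping each model behind a plain callable makes the stages swappable and lets the mapping logic be tested in isolation.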



Paper Citation


in Harvard Style

M. V., Anshul A. and Tyagi K. (2025). Audio-to-Image Generation: A Pipeline for Sound Analysis and Visual Synthesis. In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25; ISBN 978-989-758-777-1, SciTePress, pages 24-32. DOI: 10.5220/0013907500004919


in Bibtex Style

@conference{icrdicct`2525,
author={Vijayalakshmi M. and Aayush Anshul and Kishu Tyagi},
title={Audio-to-Image Generation: A Pipeline for Sound Analysis and Visual Synthesis},
booktitle={Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25},
year={2025},
pages={24-32},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013907500004919},
isbn={978-989-758-777-1},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25
TI  - Audio-to-Image Generation: A Pipeline for Sound Analysis and Visual Synthesis
SN  - 978-989-758-777-1
AU  - M. V.
AU  - Anshul A.
AU  - Tyagi K.
PY  - 2025
SP  - 24
EP  - 32
DO  - 10.5220/0013907500004919
PB  - SciTePress
ER  - 