Multilingual and Robust Speech Recognition: Leveraging Advanced Machine Learning for Accurate and Real-Time Natural Language Processing

Nithya S., Alok Sengar, Jayalakshmi K., K. Kokulavani, Joel Philip J., M. Srinivasulu

2025

Abstract

Machine learning (ML) has significantly advanced speech recognition (SR), but current systems still suffer from noise sensitivity, limited multilingual support, system complexity, and poor adaptability to real-world applications. This work aims to close this gap by introducing a robust and scalable speech recognition framework built on recent architectures (Conformers, Transformers, and Whisper models). The framework incorporates domain adaptation, speaker-variability handling, and data augmentation to achieve higher accuracy and better natural language understanding. It also supports real-time and cross-lingual processing, which makes it practical to deploy in noisy and diverse environments. By using contemporary toolkits and reproducible pipelines, this work addresses the shortcomings of previous research, resulting in consistent gains in recognition performance under all settings.


Paper Citation


in Harvard Style

S. N., Sengar A., K. J., Kokulavani K., J. J. and Srinivasulu M. (2025). Multilingual and Robust Speech Recognition: Leveraging Advanced Machine Learning for Accurate and Real-Time Natural Language Processing. In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25; ISBN 978-989-758-777-1, SciTePress, pages 513-518. DOI: 10.5220/0013885700004919


in Bibtex Style

@conference{icrdicct`2525,
author={Nithya S. and Alok Sengar and Jayalakshmi K. and K. Kokulavani and Joel Philip J. and M. Srinivasulu},
title={Multilingual and Robust Speech Recognition: Leveraging Advanced Machine Learning for Accurate and Real-Time Natural Language Processing},
booktitle={Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25},
year={2025},
pages={513--518},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013885700004919},
isbn={978-989-758-777-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25
TI - Multilingual and Robust Speech Recognition: Leveraging Advanced Machine Learning for Accurate and Real-Time Natural Language Processing
SN - 978-989-758-777-1
AU - S. N.
AU - Sengar A.
AU - K. J.
AU - Kokulavani K.
AU - J. J.
AU - Srinivasulu M.
PY - 2025
SP - 513
EP - 518
DO - 10.5220/0013885700004919
PB - SciTePress