Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings
Marco Murgia, Marco Fontana, Alberto Pes, Diego Recupero, Giuseppe Scarpi, Leonardo Daniele Scintilla
2025
Abstract
In this paper, we propose a robust and computationally efficient pipeline for transcribing speech in noisy environments, such as workshops and industrial settings. The pipeline is designed to operate offline, making it suitable for resource-constrained scenarios. It begins with a noise filtering module that preprocesses audio recordings to suppress background noise and enhance speech clarity. The filtered audio is then passed to an Automatic Speech Recognition (ASR) model, which generates initial transcription outputs. Given the potential for transcription errors in challenging acoustic conditions, we incorporate a quantized Small Language Model (SLM) trained on an ontology of defects related to the industrial environment to post-process and correct these errors. The quantization of the SLM significantly reduces its computational footprint while maintaining correction accuracy, enabling the pipeline to function effectively on low-resource devices. Experimental evaluations demonstrate the effectiveness of the proposed approach in improving transcription quality in noisy conditions, highlighting its practicality for offline and resource-limited applications. Preliminary validation on a synthetic dataset of 200 sentences in Italian and English showed a consistent F1 score of 87.04% on Italian sentences and 91.25% on English sentences at SNRs as challenging as -5 dBW (decibel watts), using the least computationally expensive version of Whisper (Whisper Tiny) together with the SLM correction.
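The three-stage flow described in the abstract (noise filtering, then ASR, then language-model correction) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and is not taken from the paper: `TranscriptionPipeline`, `noise_gate`, `fake_asr`, and `make_lexicon_corrector` are illustrative stand-ins. In particular, the amplitude gate stands in for the noise filtering module, the fixed-string function stands in for Whisper, and a fuzzy lexicon lookup stands in for the quantized SLM; none of them reproduces the authors' actual components.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List, Sequence

Audio = List[float]  # assumed representation: mono samples in [-1.0, 1.0]


@dataclass
class TranscriptionPipeline:
    """Chains the three stages named in the abstract (hypothetical interface)."""
    denoise: Callable[[Audio], Audio]   # stage 1: noise filtering
    transcribe: Callable[[Audio], str]  # stage 2: ASR
    correct: Callable[[str], str]       # stage 3: post-correction

    def run(self, audio: Audio) -> str:
        cleaned = self.denoise(audio)
        draft = self.transcribe(cleaned)
        return self.correct(draft)


def noise_gate(audio: Audio, threshold: float = 0.05) -> Audio:
    """Stand-in filter: zero out samples whose magnitude is below a threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in audio]


def fake_asr(audio: Audio) -> str:
    """Stand-in ASR: returns a fixed draft containing a typical recognition error."""
    return "crak on the weld seam"


def make_lexicon_corrector(vocabulary: Sequence[str]) -> Callable[[str], str]:
    """Stand-in for the SLM: snap each word to its closest domain term, if any."""
    def correct(text: str) -> str:
        fixed = []
        for word in text.split():
            match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.75)
            fixed.append(match[0] if match else word)
        return " ".join(fixed)
    return correct


# Assumed toy defect vocabulary, standing in for the ontology of defects.
defect_terms = ["crack", "weld", "seam", "corrosion"]
pipeline = TranscriptionPipeline(noise_gate, fake_asr, make_lexicon_corrector(defect_terms))
print(pipeline.run([0.01, 0.5, -0.02]))
```

Keeping each stage behind a plain callable means a real denoiser, a Whisper wrapper, or a quantized model could be swapped in without changing the orchestration, which is one plausible way to keep such a pipeline usable offline on low-resource devices.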
Paper Citation
in Harvard Style
Murgia M., Fontana M., Pes A., Recupero D., Scarpi G. and Scintilla L. (2025). Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings. In Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO; ISBN 978-989-758-770-2, SciTePress, pages 479-486. DOI: 10.5220/0013704200003982
in Bibtex Style
@conference{icinco25,
author={Marco Murgia and Marco Fontana and Alberto Pes and Diego Recupero and Giuseppe Scarpi and Leonardo Scintilla},
title={Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings},
booktitle={Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2025},
pages={479-486},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013704200003982},
isbn={978-989-758-770-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings
SN - 978-989-758-770-2
AU - Murgia M.
AU - Fontana M.
AU - Pes A.
AU - Recupero D.
AU - Scarpi G.
AU - Scintilla L.
PY - 2025
SP - 479
EP - 486
DO - 10.5220/0013704200003982
PB - SciTePress