Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings

Marco Murgia; Marco Fontana; Alberto Pes; Diego Recupero; Giuseppe Scarpi; Leonardo Daniele Scintilla

doi:10.5220/0013704200003982

2025

Abstract

In this paper, we propose a robust and computationally efficient pipeline for transcribing speech in noisy environments, such as workshops and industrial settings. The pipeline is designed to operate offline, making it suitable for resource-constrained scenarios. It begins with a noise filtering module that preprocesses audio recordings to suppress background noise and enhance speech clarity. The filtered audio is then passed to an Automatic Speech Recognition (ASR) model, which generates the initial transcription. Because transcription errors are likely in challenging acoustic conditions, we incorporate a quantized Small Language Model (SLM), trained on an ontology of defects related to the industrial environment, to post-process and correct these errors. Quantizing the SLM significantly reduces its computational footprint while maintaining correction accuracy, enabling the pipeline to run effectively on low-resource devices. Experimental evaluations demonstrate the effectiveness of the proposed approach in improving transcription quality in noisy conditions, highlighting its practicality for offline and resource-limited applications. Preliminary validation on a synthetic dataset of 200 sentences in Italian and English yielded F1 scores of 87.04% on Italian sentences and 91.25% on English sentences at an SNR as challenging as -5 dBW (decibel-watts), using the least computationally expensive version of Whisper (Whisper Tiny) together with the SLM correction.
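To make the evaluation setting more concrete, the following is a minimal sketch of how a noisy test signal at a target SNR could be synthesised and how a transcript could be scored with a token-level F1. It is not the authors' code. The abstract does not say how the noise was mixed or how F1 was computed, so several things here are assumptions: the SNR is treated as a plain power ratio in decibels, F1 is computed over bags of words, and the function names (`mix_at_snr`, `token_f1`), the toy signals, and the example sentences are all hypothetical.

```python
import math
import random
from collections import Counter


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches `snr_db`, then add it."""
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


def token_f1(reference, hypothesis):
    """Bag-of-words F1 between two transcripts (an assumed metric, not the paper's)."""
    ref = Counter(reference.lower().split())
    hyp = Counter(hypothesis.lower().split())
    overlap = sum((ref & hyp).values())  # tokens shared by both transcripts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy signals: a 440 Hz tone stands in for speech, Gaussian noise for the workshop.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(speech, noise, snr_db=-5.0)

# Sanity check: recover the added noise and measure the resulting SNR.
added = [m - s for m, s in zip(noisy, speech)]
measured_snr_db = 10 * math.log10(power(speech) / power(added))
print(f"measured SNR: {measured_snr_db:.2f} dB")

# Made-up reference and ASR output, differing in one word.
score = token_f1("crack on the left weld", "crack on the left well")
print(f"token F1: {score:.2f}")
```

At -5 dB the noise carries roughly three times the power of the speech, which is why a correction stage after the ASR model is plausible in this regime. In a full pipeline, the noisy signal would pass through the noise filter and the ASR model, and `token_f1` would be applied to the transcript before and after SLM correction.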

Paper Citation

in Harvard Style

Murgia M., Fontana M., Pes A., Recupero D., Scarpi G. and Scintilla L. (2025). Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings. In Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO; ISBN 978-989-758-770-2, SciTePress, pages 479-486. DOI: 10.5220/0013704200003982

in Bibtex Style

@conference{icinco25,
author={Marco Murgia and Marco Fontana and Alberto Pes and Diego Recupero and Giuseppe Scarpi and Leonardo Scintilla},
title={Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings},
booktitle={Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2025},
pages={479-486},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013704200003982},
isbn={978-989-758-770-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - Noise-Robust Speech Transcription with Quantized Language Model Correction for Industrial Settings
SN - 978-989-758-770-2
AU - Murgia M.
AU - Fontana M.
AU - Pes A.
AU - Recupero D.
AU - Scarpi G.
AU - Scintilla L.
PY - 2025
SP - 479
EP - 486
DO - 10.5220/0013704200003982
PB - SciTePress