5 CONCLUSIONS
In this paper, we have introduced a powerful and
scalable speech recognition architecture for low-resource
languages (LRLs) based on adaptive deep transfer learning
and noise-aware multilingual modeling. The proposed
method addresses the pressing issues of limited data and
linguistic diversity, which are largely neglected by existing
ASR systems intended for real-world settings.
By combining self-supervised pretraining,
adapter-based transfer learning, and multilingual acoustic
modeling, the system achieved substantial gains in
recognition accuracy across a variety of low-resource
languages. Preserving tonal and prosodic features further
improved performance on tonal languages, while
noise-aware training techniques yielded consistent
performance in noisy environments.
The emphasis on efficiency and deployability is
another significant contribution of this work. Through
model compression and inference optimization, the
framework achieved real-time processing on edge devices,
making it well suited to communities with limited
technology infrastructure. This broadens access to
speech-driven technologies and promotes digital inclusion
for speakers of minority languages.
The study also identified limitations, such as
code-switching and speaker variation, which point to
directions for future work. Further improvements could be
obtained by exploiting language identification, adding
speaker adaptation modules, and generalising the system
to zero-resource settings.
Overall, this work offers a comprehensive, flexible,
and state-of-the-art solution to low-resource speech
recognition and provides a promising baseline for further
progress toward inclusive and ethical AI for voice
technologies.