
bridge robustness, multilingualism, and real-time performance, so as to encourage progress in natural language processing using spoken language technologies.
REFERENCES
Abdi, A., & Meziane, F. (Eds.). (2025). Speech recognition and natural language processing [Special issue]. Applied Sciences. https://www.mdpi.com/journal/applsci/special_issues/S0SDL9UCWO
Ahlawat, H., Aggarwal, N., & Gupta, D. (2025). Automatic speech recognition: A survey of deep learning techniques and approaches. International Journal of Cognitive Computing in Engineering, 6(7). https://doi.org/10.1016/j.ijcce.2024.12.007
Bhogale, K., Raman, A., Javed, T., & Khapra, M. (2023). Effectiveness of mining audio and text pairs from public data for improving ASR systems for low-resource languages. Conference Proceedings.
Chen, K. (2023). Speech recognition method based on deep learning of artificial intelligence: An example of BLSTM-CTC model. ACM. https://dl.acm.org/doi/abs/10.1145/3606193.3606201
Chen, Y., & Li, X. (2024). Combining automatic speech recognition with semantic natural language processing for schizophrenia classification. Psychiatry Research, 328, 115456. https://doi.org/10.1016/j.psychres.2023.115456
Davitaia, A. (2025). Application of machine learning in speech recognition. ResearchGate. https://www.researchgate.net/publication/390349000_Application_of_Machine_Learning_in_Speech_Recognition
Georgiou, G. P., Giannakou, A., & Alexander, K. (2024). EXPRESS: Perception of second language phonetic contrasts by monolinguals and bidialectals: A comparison of competencies. Quarterly Journal of Experimental Psychology.
Gong, Y., Chung, Y.-A., & Glass, J. (2023). AST: Audio spectrogram transformer. Interspeech.
Jin, Z., Xie, X., Wang, T., & Liu, X. (2024). Towards automatic data augmentation for disordered speech recognition. Conference Proceedings.
Kamath, U., Graham, K. L., & Emara, W. (2023). Transformers for machine learning: A deep dive. CRC Press.
Kheddar, H., Hemis, M., & Himeur, Y. (2024). Automatic speech recognition using advanced deep learning approaches: A survey. Information Fusion.
Kwon, Y., & Chung, S.-W. (2023). MoLE: Mixture of language experts for multi-lingual automatic speech recognition. Conference Proceedings.
Lam, T. K., Schamoni, S., & Riezler, S. (2023). Make more of your data: Minimal effort data augmentation for automatic speech recognition and translation. Conference Proceedings.
Li, J. (2021). Recent advances in end-to-end automatic speech recognition. arXiv. https://arxiv.org/abs/2111.01690
Mehrish, A., Majumder, N., Bharadwaj, R., & Poria, S. (2023). A review of deep learning techniques for speech processing. Information Fusion. https://doi.org/10.1016/j.inffus.2023.06.004
Nguyen, T. S., Stueker, S., & Waibel, A. (2021). Super-human performance in online low-latency recognition of conversational speech. Interspeech.
Paaß, G., & Giesselbach, S. (2023). Foundation models for natural language processing. Springer.
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. OpenAI. https://cdn.openai.com/papers/whisper.pdf
Rajwal, S., Zhang, Z., Chen, Y., Rogers, H., Sarker, A., & Xiao, Y. (2025). Applications of natural language processing and large language models for social determinants of health: Protocol for a systematic review. JMIR Research Protocols, 14(1), e66094. https://www.researchprotocols.org/2025/1/e66094
Ravanelli, M., Parcollet, T., & Bengio, Y. (2018). The PyTorch-Kaldi speech recognition toolkit. Interspeech.
Ristea, N.-C., Ionescu, R. T., & Khan, F. S. (2022). SepTr: Separable transformer for audio spectrogram processing. Interspeech.
Siddique, L., Zaidi, A., Cuayahuitl, H., Shamshad, F., & Shoukat, M. (2023). Transformers in speech processing: A survey. Journal of Artificial Intelligence Research.
Yadav, A. K., Kumar, M., Kumar, A., & Yadav, D. (2023). Hate speech recognition in multilingual text: Hinglish documents. Journal of Multilingual and Multicultural Development.
Yu, F.-H., & Chen, K.-Y. (2021). Non-autoregressive transformer-based end-to-end ASR using BERT. arXiv. https://arxiv.org/abs/2104.04805