BANGLA ISOLATED WORD SPEECH RECOGNITION

Adnan Firoze, M. Shamsul Arifin, Ryana Quadir, Rashedur M. Rahman

2011

Abstract

This paper presents isolated-word Bangla speech recognition using spectral analysis and fuzzy logic. Because human speech is imprecise and ambiguous, fuzzy logic, which is itself founded on linguistic ambiguity, could serve as a more precise tool for analysing and recognizing it. Although an uttered word originates as a voiced signal, our system is built around the visual representation of that signal, the spectrogram. Underlying a spectrogram are matrices that capture properties of the sound such as energy, frequency and time. In this research, spectral analysis was chosen over image processing for increased accuracy. The decision-making process of our system is based on fuzzy logic. Experimental results show that our system achieves 80% accuracy, whereas a commercial Hidden Markov Model (HMM) based speech recognizer achieves 73% on average.
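
The two ideas named in the abstract, a spectrogram held as a matrix of energy over frequency and time, and a fuzzy (graded rather than yes/no) decision, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length, the hop size, the synthetic 1 kHz test tone and the triangular membership function are all assumptions chosen only to make the concepts concrete.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a (frequency x time) matrix of short-time energy values."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Squared magnitude of each frame's real FFT: energy per frequency bin.
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def dominant_frequency(spec, sample_rate, frame_len=256):
    """Frequency in Hz of the bin carrying the most total energy."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return float(freqs[np.argmax(spec.sum(axis=1))])

def triangular_membership(x, low, peak, high):
    """Degree in [0, 1] to which x belongs to a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

# Hypothetical input: one second of a 1 kHz tone standing in for an utterance.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)                       # rows: frequency bins, columns: frames
peak_hz = dominant_frequency(spec, sample_rate)
# Graded match against an assumed fuzzy set "around 1 kHz" (800 to 1200 Hz).
score = triangular_membership(peak_hz, 800.0, 1000.0, 1200.0)
```

A real recognizer would compare many spectral features per word and combine their membership degrees with fuzzy rules; the single-feature score here only shows how a crisp measurement becomes a degree of match.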



Paper Citation


in Harvard Style

Firoze A., Arifin M., Quadir R. and Rahman R. (2011). BANGLA ISOLATED WORD SPEECH RECOGNITION. In Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8425-54-6, pages 73-82. DOI: 10.5220/0003492700730082


in Bibtex Style

@conference{iceis11,
author={Adnan Firoze and M. Shamsul Arifin and Ryana Quadir and Rashedur M. Rahman},
title={BANGLA ISOLATED WORD SPEECH RECOGNITION},
booktitle={Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2011},
pages={73-82},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003492700730082},
isbn={978-989-8425-54-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - BANGLA ISOLATED WORD SPEECH RECOGNITION
SN - 978-989-8425-54-6
AU - Firoze A.
AU - Arifin M.
AU - Quadir R.
AU - Rahman R.
PY - 2011
SP - 73
EP - 82
DO - 10.5220/0003492700730082

