VISUAL SPEECH RECOGNITION USING WAVELET TRANSFORM AND MOMENT BASED FEATURES

Wai C. Yau; Dinesh K. Kumar; Sridhar P. Arjunan; Sanjay Kumar

doi:10.5220/0001209903400345

2006

Abstract

This paper presents a novel vision-based approach to identifying utterances consisting of consonants. A view-based method is adopted to represent the 3-D image sequence of the mouth movement (two spatial dimensions plus time) in 2-D space as a single grayscale image, the motion history image (MHI). The MHI is produced by applying an accumulative image-differencing technique to the image sequence, which implicitly captures the temporal information of the mouth movement. The proposed technique combines the discrete stationary wavelet transform (SWT) with image moments to classify the MHI. A level-1 2-D SWT decomposes the MHI into one approximation sub-image and three detail sub-images. The paper compares the classification performance and image-representation ability of three moment-based features, namely Zernike moments, geometric moments and Hu moments, each computed from the approximation sub-image of the MHI. A supervised feed-forward multilayer perceptron (MLP) artificial neural network (ANN), trained with the back-propagation algorithm, classifies these features. Preliminary results show that all three moment types achieve a high recognition rate when classifying three consonants.
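
Accumulative image differencing can be sketched as follows, assuming one common formulation that is not spelled out here: each pair of consecutive frames is differenced, the absolute difference is binarised with a threshold, and every changed pixel is stamped with its frame index so that later motion appears brighter. The pixel-wise maximum over time is then scaled to 0-255. The function name `motion_history_image` and the `threshold` value are hypothetical, and frames are plain nested lists to keep the sketch dependency-free.

```python
def motion_history_image(frames, threshold=25):
    """Build a grayscale motion history image from equally sized grayscale frames.

    Assumed formulation (hypothetical parameters): binarise |I_t - I_{t-1}| with
    `threshold`, weight changed pixels by the frame index t, keep the maximum
    over time, then scale the result to the 0-255 range.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    rows, cols = len(frames[0]), len(frames[0][0])
    last = len(frames) - 1
    mhi = [[0] * cols for _ in range(rows)]
    for t in range(1, len(frames)):
        prev, curr = frames[t - 1], frames[t]
        for y in range(rows):
            for x in range(cols):
                # Binarised inter-frame difference: did this pixel change?
                if abs(curr[y][x] - prev[y][x]) >= threshold:
                    # Keep the latest frame index, so recent motion is brighter.
                    mhi[y][x] = max(mhi[y][x], t)
    # Map frame indices 0..last onto grayscale 0..255.
    return [[round(255 * v / last) for v in row] for row in mhi]


# Toy 3-frame sequence: a bright pixel appears at (0, 0), then at (0, 1).
frames = [
    [[0, 0], [0, 0]],
    [[100, 0], [0, 0]],
    [[100, 100], [0, 0]],
]
print(motion_history_image(frames))  # prints [[128, 255], [0, 0]]
```

The single image encodes both where and when the mouth moved: pixels that changed later in the utterance are brighter, and static pixels stay at zero.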
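
A level-1 stationary wavelet transform differs from the ordinary discrete wavelet transform in that the sub-images are not downsampled, so the approximation and the three detail sub-images all keep the size of the MHI. The mother wavelet is not named here; the sketch below assumes the Haar wavelet with periodic borders and is written by hand for clarity. The function name `haar_swt2_level1` is hypothetical, and libraries such as PyWavelets (`pywt.swt2`) may use different sign, alignment or sub-band naming conventions.

```python
def haar_swt2_level1(img):
    """Undecimated (stationary) level-1 2-D Haar transform with periodic borders.

    Returns (cA, cH, cV, cD), each the same size as `img`.
    """
    rows, cols = len(img), len(img[0])
    cA = [[0.0] * cols for _ in range(rows)]
    cH = [[0.0] * cols for _ in range(rows)]
    cV = [[0.0] * cols for _ in range(rows)]
    cD = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        y1 = (y + 1) % rows  # periodic extension at the bottom edge
        for x in range(cols):
            x1 = (x + 1) % cols  # periodic extension at the right edge
            a, b = img[y][x], img[y][x1]    # current row
            c, d = img[y1][x], img[y1][x1]  # next row
            # Orthonormal Haar filters [1, 1]/sqrt(2) and [1, -1]/sqrt(2),
            # applied along both axes, give an overall factor of 1/2.
            cA[y][x] = (a + b + c + d) / 2  # low-pass in both directions
            cH[y][x] = (a + b - c - d) / 2  # change between rows
            cV[y][x] = (a - b + c - d) / 2  # change between columns
            cD[y][x] = (a - b - c + d) / 2  # diagonal change
    return cA, cH, cV, cD


cA, cH, cV, cD = haar_swt2_level1([[1, 2], [3, 4]])
print(cA)
```

Only the approximation sub-image `cA` is passed on to the moment computation; the three detail sub-images are discarded.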
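
Geometric and Hu moment features can be computed directly from the approximation sub-image. The sketch below uses raw geometric moments m_pq = Σ x^p y^q I(x, y), the scale-normalised central moments η_pq = μ_pq / μ_00^(1 + (p + q)/2), and Hu's seven invariants built from the η_pq of orders 2 and 3. Which moment orders are actually fed to the classifier is not stated here, so this selection, and the helper names `raw_moment`, `eta` and `hu_moments`, are assumptions.

```python
def raw_moment(img, p, q):
    """Geometric moment m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum(x ** p * y ** q * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def eta(img, p, q):
    """Normalised central moment eta_pq = mu_pq / mu_00^(1 + (p + q) / 2)."""
    m00 = raw_moment(img, 0, 0)
    xc, yc = raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00
    mu = sum((x - xc) ** p * (y - yc) ** q * v
             for y, row in enumerate(img) for x, v in enumerate(row))
    return mu / m00 ** (1 + (p + q) / 2)


def hu_moments(img):
    """Hu's seven invariants (translation, scale and rotation invariant)."""
    n20, n02, n11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    n30, n03, n21, n12 = eta(img, 3, 0), eta(img, 0, 3), eta(img, 2, 1), eta(img, 1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]


img = [[0, 1, 2], [3, 9, 1], [0, 4, 0]]
rotated = [list(r) for r in zip(*img[::-1])]  # same image turned 90 degrees
print(hu_moments(img))
print(hu_moments(rotated))  # equal up to floating-point rounding
```

The rotated copy illustrates why Hu moments are attractive here: a small rotation of the speaker's head should not change the feature vector.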
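
Zernike moments project the image onto orthogonal complex polynomials V_nm(ρ, θ) = R_nm(ρ) e^{jmθ} defined on the unit disk, giving A_nm = (n + 1)/π Σ I(x, y) V*_nm(ρ, θ) Δ², where Δ is the pixel width in disk coordinates; the magnitude |A_nm| is rotation invariant. The sketch below maps a square N×N image onto the unit disk with pixel centres at u = (2x + 1 − N)/N, ignores pixels outside the disk, and replaces the integral by a sum over pixels. The mapping, the helper names `radial_poly` and `zernike_moment`, and the example orders are assumptions.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires n - |m| even and |m| <= n."""
    m = abs(m)
    return sum(
        (-1) ** s * math.factorial(n - s)
        / (math.factorial(s)
           * math.factorial((n + m) // 2 - s)
           * math.factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )


def zernike_moment(img, n, m):
    """Discrete Zernike moment A_nm of a square image mapped onto the unit disk."""
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    size = len(img)
    delta = 2 / size  # pixel width in unit-disk coordinates
    total = 0j
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            u = (2 * x + 1 - size) / size
            v = (2 * y + 1 - size) / size
            rho = math.hypot(u, v)
            if rho > 1:
                continue  # only pixels inside the unit disk contribute
            theta = math.atan2(v, u)
            # Multiply by the conjugate of V_nm = R_nm(rho) * exp(j*m*theta).
            total += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * total * delta ** 2


img = [[0, 1, 2, 0], [3, 9, 1, 0], [0, 4, 0, 5], [1, 0, 2, 0]]
rotated = [list(r) for r in zip(*img[::-1])]  # same image turned 90 degrees
for n, m in [(2, 0), (2, 2), (3, 1), (4, 2)]:
    print(n, m, abs(zernike_moment(img, n, m)), abs(zernike_moment(rotated, n, m)))
```

Rotating the image only shifts the phase of A_nm, which is why the magnitudes rather than the complex values are the natural features.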
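
The classification stage can be illustrated with a minimal back-propagation sketch, assuming a single sigmoid hidden layer, one sigmoid output per class, squared-error loss and plain stochastic gradient descent; none of these details, nor the network size, learning rate or epoch count, are given here. The class `MLP`, the helper `train` and the toy two-dimensional feature vectors are hypothetical stand-ins for the moment features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """One-hidden-layer perceptron trained by back-propagation (squared error, SGD)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries a trailing bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, out

    def train_step(self, x, target, lr):
        """One SGD update; returns the loss measured before the update."""
        xb, hb, out = self.forward(x)
        # Output deltas: derivative of 0.5 * (o - t)^2 through the sigmoid.
        d_out = [(o - t) * o * (1 - o) for o, t in zip(out, target)]
        # Hidden deltas, back-propagated through w2 before w2 is changed.
        d_hid = [hb[j] * (1 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(self.w1))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))

    def predict(self, x):
        out = self.forward(x)[2]
        return out.index(max(out))


def train(net, data, n_classes, epochs=2000, lr=0.5):
    """Run SGD over (features, label) pairs; return the mean loss of each epoch."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, label in data:
            target = [1.0 if k == label else 0.0 for k in range(n_classes)]
            total += net.train_step(x, target, lr)
        history.append(total / len(data))
    return history


# Hypothetical feature vectors for three classes (e.g. three consonants).
data = [([0.1, 0.1], 0), ([0.2, 0.15], 0),
        ([0.9, 0.1], 1), ([0.8, 0.2], 1),
        ([0.5, 0.9], 2), ([0.45, 0.8], 2)]
net = MLP(n_in=2, n_hidden=6, n_out=3, seed=1)
history = train(net, data, n_classes=3)
print(history[0], history[-1])
print([net.predict(x) for x, _ in data])
```

With real data, each input vector would hold the moment features of one MHI and the number of outputs would equal the number of consonants.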

Paper Citation

in Harvard Style

Yau W. C., Kumar D. K., Arjunan S. P. and Kumar S. (2006). VISUAL SPEECH RECOGNITION USING WAVELET TRANSFORM AND MOMENT BASED FEATURES. In Proceedings of the Third International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, ISBN 978-972-8865-60-3, pages 340-345. DOI: 10.5220/0001209903400345

in Bibtex Style

@conference{icinco06,
author={Wai C. Yau and Dinesh K. Kumar and Sridhar P. Arjunan and Sanjay Kumar},
title={VISUAL SPEECH RECOGNITION USING WAVELET TRANSFORM AND MOMENT BASED FEATURES},
booktitle={Proceedings of the Third International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2006},
pages={340-345},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001209903400345},
isbn={978-972-8865-60-3},
}

in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - VISUAL SPEECH RECOGNITION USING WAVELET TRANSFORM AND MOMENT BASED FEATURES
SN - 978-972-8865-60-3
AU - Yau, W. C.
AU - Kumar, D. K.
AU - Arjunan, S. P.
AU - Kumar, S.
PY - 2006
SP - 340
EP - 345
DO - 10.5220/0001209903400345