Salah Werda, Walid Mahdi, Abdelmajid Ben Hamadou



In recent years, lip-reading systems have received great attention, since they play an important role in human-computer communication, especially for hearing-impaired or elderly people. The need for an automatic lip-reading system is ever increasing. Today, the extraction and reliable analysis of facial movements is an important part of many multimedia systems, such as videoconferencing, low communication systems, and lip-reading systems. We can imagine, for example, a dependent person commanding a machine with a simple lip movement or by pronouncing a single syllable. In this paper, we present a new approach for lip localization and feature extraction in a speaker’s face. The extracted visual information is then classified in order to recognize the uttered viseme (visual phoneme). To evaluate the system's performance, we developed the Automatic Lip Feature Extraction prototype (ALiFE). Experiments show that the system recognizes 70.95% of French digits uttered under natural conditions.



Paper Citation

in Harvard Style

Werda S., Mahdi W. and Ben Hamadou A. (2007). AUTOMATIC LIP LOCALIZATION AND FEATURE EXTRACTION FOR LIP-READING. In Proceedings of the Second International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, ISBN 978-972-8865-74-0, pages 268-275. DOI: 10.5220/0002055702680275
