Salah Werda, Walid Mahdi, Abdelmajid Ben Hamadou



Today, human-machine interaction offers real potential for autonomy, especially for dependent people. An automatic lip-reading system is one of several assistive technologies for hearing-impaired or elderly people, and the need for such systems is ever increasing. The extraction and reliable analysis of facial movements are an important part of many multimedia systems, such as videoconferencing, low-bit-rate communication, and lip-reading systems. One can imagine, for example, a dependent person commanding a machine with an easy lip movement or by simply pronouncing a viseme (visual phoneme). In this paper we present a new approach for lip localization and feature extraction in a speaker's face. The extracted visual information is then classified in order to recognize the uttered viseme. We have developed our Automatic Lip Feature Extraction (ALiFE) prototype and evaluated it with multiple speakers under natural conditions. Experiments cover a group of French visemes uttered by different speakers. Results revealed that our system recognizes 92.50% of the French visemes.
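The pipeline the abstract describes (localize the lips, extract visual features, classify the result into a viseme) can be sketched at a high level. The sketch below is illustrative only: the geometric feature names (mouth width, height, aperture) and the nearest-centroid classifier are assumptions for demonstration, not the actual ALiFE feature set or classifier.

```python
import math

# Hypothetical normalized lip features per viseme: (width, height, aperture).
# These centroids are invented for illustration, not ALiFE's measured values.
VISEME_CENTROIDS = {
    "a": (0.80, 0.70, 0.60),  # wide-open mouth
    "u": (0.40, 0.50, 0.30),  # rounded, protruded lips
    "i": (0.90, 0.25, 0.20),  # spread lips, small aperture
}

def classify_viseme(features):
    """Assign a feature vector to the viseme with the nearest centroid."""
    def distance(label):
        # Euclidean distance between the observed features and a centroid.
        return math.dist(features, VISEME_CENTROIDS[label])
    return min(VISEME_CENTROIDS, key=distance)

print(classify_viseme((0.85, 0.65, 0.55)))  # nearest to the "a" centroid
```

In practice the feature vectors would come from lip-contour tracking on video frames, and a trained classifier would replace the fixed centroids; this toy version only shows the classification step in isolation.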



Paper Citation

in Harvard Style

Werda S., Mahdi W. and Ben Hamadou A. (2007). A NEW LIP-READING APPROACH FOR HUMAN COMPUTER INTERACTION. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 5: ICEIS, ISBN 978-972-8865-92-4, pages 27-36. DOI: 10.5220/0002382800270036
