INNER LIP SEGMENTATION BY COMBINING ACTIVE

CONTOURS AND PARAMETRIC MODELS

Sebastien Stillittano

1

and Alice Caplier

2

1

Vesalis, 10 Allée Evariste Galois, 63000 Clermont-Ferrand, France

2

Gipsa-lab, 46, Avenue Félix Viallet, 38031 Grenoble, France

Keywords: Inner lip contour, segmentation, active contour (“jumping snake”), parametric model.

Abstract: Lip reading applications require accurate information about lip movement and shape, and both outer and

inner contours are useful. In this paper, we introduce a new method for inner lip segmentation. From the

outer lip contour given by a preexisting algorithm, we use some key points to initialize an active contour

called “jumping snake”. According to some optimal information of luminance and chrominance gradient,

this active contour fits the position of two parametric models; a first one composed of two cubic curves and

a broken line in case of a closed mouth, and a second one composed of four cubic curves in case of an open

mouth. These parametric models give a flexible and accurate final inner lip contour. Finally, we present

several experimental results demonstrating the effectiveness of the proposed algorithm.

1 INTRODUCTION

Many studies have shown that visual information

can significantly increase speech comprehension in

noisy environment (Neely, 1956) (Sumby, 1954).

Both inner and outer lip movements and shape give

useful information for lip reading applications. With

this motivation, many researches have been carried

out to accurately obtain outer lip contour. However,

relatively few studies deal with the problem of inner

lip segmentation. The main reason is that inner

contour extraction is much more difficult than outer

contour extraction. Indeed, we can find different

mouth shapes and non-linear appearance variations

during a conversation. Especially, inside the mouth,

there are different areas which have similar color,

texture or luminance than lips (gums and tongue).

We can see very bright zones (teeth) as well as very

dark zones (oral cavity). Every area could

continuously appear and disappear when people are

talking.

Among the existing approaches for inner lip

contour extraction, lip shape is represented by a

parametric deformable model composed of a set of

curves. In (Zhang, 1997), Zhang uses deformable

templates for outer and inner lip segmentation. The

chosen templates are three or four parabolas,

depending on whether the mouth is closed or open.

The first step is the estimation of candidates for the

parabolas by analyzing luminance information.

Next, the right model is chosen according to the

number of candidates. Finally, luminance and color

information is used to match the template. This

method gives results, which are not accurate enough

for lip reading applications, due to the simplicity and

the assumed symmetry of the model.

In (Beaumesnil, 2006), Beaumesnil et al. use

internal and external active contours for lip

segmentation as a first step. The second step

recovers a 3D-face model in order to extract more

precise parameters to adjust the first step. A k-means

classification algorithm based on a non-linear hue

gives three classes: lip, face and background. From

this classification, a mouth boundary box is

extracted and the points of the external active

contour are initialized on two cubic curves computed

from the box. The forces used for external snake

convergence are, in particular, a combination of non-

linear hue and luminance information. Next, an inner

snake is initialized on the outer contour. Then the

contour is shrunk by a non isotropic scaling with

regard to the mouth center and taking into account

the actual thickness of lips. The main problem is that

the snake has to be initialized close to the real

contour because it will converge to the closest

gradient minimum. Particularly for the inner lip

contour, different gradient minima are generated by

the presence of teeth or tongue and can cause a bad

297

Stillittano S. and Caplier A. (2008).

INNER LIP SEGMENTATION BY COMBINING ACTIVE CONTOURS AND PARAMETRIC MODELS.

In Proceedings of the Third International Conference on Computer Vision Theory and Applications, pages 297-304

DOI: 10.5220/0001077402970304

Copyright

c

SciTePress

convergence. In (Beaumesnil, 2006), the 3D-face

model is used to correct this problem.

Statistical methods can be used for inner and

outer lip segmentation. In (Cootes, 1994a) and

(Cootes, 1994b), Cootes et al. develop statistical

active model for both shape (AMS) and appearance

(AAM). Shape and grey-level appearance of an

object are learned from a training set of annotated

images. Then, a Principal Component Analysis

(PCA) is performed to obtain the main modes of

variation. Models are iteratively matched to reduce

the difference between the model and the real

contour by using a cost function. In (Luettin, 1996),

Luettin et al. build an AMS and in (Abboud, 2005),

Abboud et al. build an AAM to position M-PEG

compatible feature points on the inner and outer lip

contours. The main interest of these models is that

the segmentation gives good results, but the training

data have to deal with many cases of possible mouth

shapes.

The aim of our work is to obtain an accurate

segmentation of the inner lip contour for lip reading

applications. We develop an algorithm based on

both active contours and parametric models. Models

represent the a priori shape of the mouth and the

“jumping snake” described in (Eveno, 2004) fits

their position.

For the outer lip segmentation, we use the

algorithm proposed in (Eveno, 2004). From the

resulting outer lip contour, we extract several key

points, and we define jumping snakes and two

different parametric models (depending on whether

the mouth is closed or open) to extract the inner lip

contour. As a consequence, our algorithm for inner

lip contour segmentation supposes that the outer

contour of the lips has already been extracted

successfully.

The paper is organized as follows. In section 2

we briefly describe the extraction of the outer lip

contour proposed in (Eveno, 2004). Section 3 and 4

show the way to find the inner lip contour depending

on whether the mouth is closed or open.

Experimental results are presented in section 5.

Finally, section 6 concludes the paper.

2 OUTER LIP CONTOUR

EXTRACTION

In (Eveno, 2004), Eveno et al. introduce a

parametric model composed of a broken line and

four cubic curves (see figure 1). The model is

initialized by 6 key points and is adjusted by using

some gradient information computed from the

pseudo-hue (Hulbert, 1998) and luminance. The

three points P

2

, P

3

and P

4

, linked by the broken

lines, give the Cupidon's bow contour, the point P

6

is

the lowest point of the contour and the points P

1

and

P

5

are the mouth corners. 4 cubic curves (γ

i

), linking

P

2

and P

6

(resp. P

4

and P

6

) to P

1

(resp. P

5

), complete

the outer contour.

Our algorithm for the inner contour detection is

inspired by the algorithm described in (Eveno,

2004). First, our method supposes that the outer lip

contour has been successfully segmented and that

we can use the different key points P

i

to initialize

our process. Moreover, we make the hypothesis that

the inner and outer lip contours are linked by the

mouth corners (P

1

and P

5

).

We develop two different strategies and 2 models

depending on whether the mouth is closed or open.

Figure 1: Key points and parametric model (Eveno, 2004).

3 CONTOUR EXTRACTION FOR

CLOSED MOUTH

3.1 Chosen Model

The parametric model for inner contour, when the

mouth is closed, is composed of two cubic curves (γ

5

and γ

6

) and one broken line (see figure 4). The

broken line linking the points P’

2

, P’

3

and P’

4

of the

model stands for the representation of the inner lip

distortion due to the Cupidon’s bow. Two cubic

curves, between the point P’

2

(resp. P’

4

) and the

mouth corner P

1

(resp. P

5

), complete the inner

contour. Experimental study has shown that a

parabola is not accurate enough to represent the

inner lip contour, as chosen in the majority of others

works. For lip reading applications, the inner

contour has to be very accurate and what we can call

the “inner Cupidon's bow” cannot be represented by

a single parabola between the mouth corners.

3.2 Model Initialization

For closed mouth, the inside of the mouth is only

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

298

composed of lips. The inner contour can be seen as a

dark line between the mouth corners.

We use the line L

min

to initialize the searched

contour. As proposed in (Delmas, 1999), L

min

is

composed of the darkest pixels and moreover, the

mouth corners have been chosen so as to be on this

line. L

min

is initialized on the darkest pixel of the

segment [P

3

P

6

] and grows by adding pixels in both

directions left and right. For each direction, only the

3 closest pixels are candidates and the pixel with the

minimum of luminance is chosen. As you can see on

figure 2, L

min

is already a good representation of the

inner lip contour.

Figure 2: Detection of L

min

.

L

min

is sampled and gives the initial contour

called C

1

(see figure 3). We find three key points

P’

2

, P’

3

and P’

4

, in order to fix the limits of the three

parts of our model. P’

3

is on the contour C

1

and is

the closest point to the vertical passing by P

3

. P’

2

is

the highest point of the contour C

1

limited by the

two verticals passing by P

2

and P

3

(interval I

1

on

figure 3). P’

4

is the highest point of the contour C

1

limited by the two verticals passing by P

3

and P

4

(interval I

2

on figure 3). The mouth corners are the

points P

1

and P

5

found with the detection of the

outer lip contour.

Figure 3: Initial contour C

1

by sampling L

min

and detection

of key points.

3.3. Model Optimization

From the key points detected in the previous section,

the final inner contour is given by a broken line

linking P’

2

, P’

3

and P’

4

, and two cubic curves,

between the mouth corner P

1

(resp. P

5

) and the key

point P’

2

(resp. P’

4

). The two curves are computed

with the least square minimization method.

Figure 4: Inner lip model for closed mouths.

4 CONTOUR EXTRACTION FOR

OPEN MOUTH

The detection of the inner lip contour is much more

challenging for open mouth, due to the non-linear

appearance variations of the inside of the mouth.

Indeed, during a conversation, the area between lips

could take different configurations. We can have

four configurations: 1) Teeth, 2) Oral cavity, 3) Gum

and 4) Tongue.

4.1 Chosen Model

The parametric model for inner contour, when the

mouth is open, is composed of four cubic curves

(see figure 9). For open mouth, the “inner Cupidon's

bow”, as introduced in section 3.1, is less

pronounced than for closed mouth, so using only

two cubic curves is sufficient to accurately segment

the upper inner lip contour. With four cubic curves,

the model is flexible and allows to challenge inner

segmentation with unsymmetrical mouth shape.

4.2 Model Initialization: Key Points

Extraction

Two jumping snakes, as introduced in (Eveno,

2004), are used to match the model; a first one for

the upper inner contour and a second one for the

lower inner contour.

The jumping snake convergence is a succession

of growth and jump phases. First, the snake is

initialized with a seed, then, the snake grows by

adding left and right endpoints. Each new point is

found by maximizing some gradient flow through

the segment between this current candidate point and

the previous one. Finally, the seed jumps to a new

position closer to the searched contour. The process

of growth and jump is repeated until the jump

amplitude is smaller than a threshold.

The initialization of the snakes starts with the

search for two points (P

7

and P

8

on figure 5) on the

upper and lower inner contours assumed to belong to

INNER LIP SEGMENTATION BY COMBINING ACTIVE CONTOURS AND PARAMETRIC MODELS

299

(

)

(

)

(

)

1

(, )Gxy= Crx,y+hx,y+Lx,y∇⎡ ⎤

⎣⎦

() ()() ()

2

(, ) 3G x y = L x, y Cr x, y S x, y h x, y∇−−−∗⎡⎤

⎣⎦

(1)

(2)

the vertical passing by P

3

. As said previously, the

difficulty of the task is that we can find between lips

different areas with similar or largely different color,

texture or luminance than lips, when a mouth is

open. The main goal is to find the adequate

information that can emphasize the inner contour for

every configuration. Experimental study on

thousands of face images has shown that no single

data can reach this goal and we have to consider a

combination between the information coming from

different spaces, each information emphasizing the

boundary for one specific configuration. For

example, lips are represented by a high pseudo-hue

and a high red component, teeth are bright and

saturated in color, the oral cavity is very dark, when

gums and tongue could have the same aspect than

lips. We build experimentally two gradients (G

1

and

G

2

) of mixed information coming from different

spaces to find P

7

and P

8

.

P

7

is found by searching

the maximum of the gradient G

1

(see equation 1)

between P

3

and P

6

. P

8

is found by searching the

maximum of the gradient G

2

(see equation 2)

between P

3

and P

7

. In order to avoid false detection

due to noise, we cumulate the different gradients on

10 columns around P

3

and we choose the point with

the highest cumulated gradient.

where Cr comes from the YCbCr space, h

is the

pseudo-hue, L is the luminance and S is the

saturation component of the HSV space. Each

component is normalized between 0 and 1. The

pseudo-hue, introduced by Hulbert et al. (Hulbert,

1998), is the ratio h = R/R+G, where R and G are the

red and green components of the RGB color space.

The pseudo-hue emphasizes contrast between lips

and skin (Eveno, 2004).

From P

8

and P

7

, we compute two seeds P’

8

and

P’

7

for the initialization of the jumping snakes. P’

8

is

¾ of the segment [P

3

P

8

] and P’

7

is ¾ of the segment

[P

6

P

7

] (see figure 5). With this configuration, the

seeds are closer to the inner contours than eventual

noise contours.

Figure 5: Detection of jumping snake seeds.

For the convergence of the snakes, we have also

to find gradients which emphasize the inner

boundary in every configuration. In the same way,

we experimentally build two kinds of space

combination. For the upper inner contour, the

convergence of the first jumping snake gives the

initial contour C

2.

P’

8

is taken as seed and the snake

parameters are chosen so that the two snake’s

branches tend to go down. G

3

(see equation 3) is the

gradient used for the snake’s growth phase. For the

lower inner contour, the convergence of the second

jumping snake gives the initial contour C

3.

P’

7

is

taken as seed and the snake parameters are chosen so

that the two snake’s branches tend to go up. G

4

(see

equation 4) is the gradient used for the snake’s

growth phase (see figure 6).

(3)

(4)

where R is the red component of the RGB space,

L is the luminance, u comes from the CIELuv space

(Wyszecki, 1982) and h is the pseudo-hue. Each

component is normalized between 0 and 1.

These 2 gradient definitions were chosen

because:

− the luminance L and the pseudo-hue h are

generally higher for the lips than inside the

mouth (in particular than the oral cavity, where

L and h are close to zero),

− the component u is higher for the lips than for

the teeth (indeed u is close to zero for the teeth)

− and the component R can be lower for the lips

than inside the mouth in others cases.

The sign is different between G

3

and G

4

because the

lips are above the inside of the mouth with G

3

,

whereas the lips are below the inside of the mouth

with G

4

.

We take the two closest points (P’’

8

and P’’

7

) to

the vertical passing by P

3

on each contour C

2

and C

3

as key points for our inner lip model.

Figure 6: Jumping snake convergences and detection of

key points.

(

)()()

3

(, )G x y = R x,y u x,y h x,y∇−−

⎡

⎤

⎣

⎦

(

)

(

)

(

)

4

(, )G xy= Lx,y+ux,y+hx,y∇

⎡

⎤

⎣

⎦

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

300

4.3. Snakes adjustment

4.3.1 Adjustment in Case of Teeth

In (Wang, 2004), Wang et al. find the teeth area by

computing the mean value

μ and the standard

deviation

σ of the components a and u of the

CIELab and CIELuv spaces (Wyszecki, 1982). Only

the pixels inside the mouth area are considered and

these parameters are represented by

μ

a

, μ

u

, σ

a

and

σ

u

. A pixel (x, y) is defined as a teeth pixel if:

or

We exploit this idea in our algorithm. After

having found the teeth area (see figure 7 (a)), we

adjust the points of the jumping snake found by the

first convergence (see figure 7 (b) and (c)), only if

there are teeth pixels just below the snake for the

lower inner contour or just above for the upper inner

contour.

(a) (b) (c)

Figure 7: (a) teeth region (the green pixels represent the

teeth), (b) snake convergence, (c) snake convergence after

the adjustment.

4.3.2 Adjustment in Case of Gum

Segmentation failures of the upper inner contour can

occur in presence of gum. Indeed, when the color

and texture information of the gum is too close to

the one of the lips, the contour is detected between

the gum and the teeth (see figure 8 (a)). To

overcome this difficulty, we use a second snake for

the upper contour. The seed of the 2

nd

snake is the

middle point of the 1

st

snake. The 2

nd

snake

parameters are chosen so that the two snake’s

branches tend to go up and G

5

(see equation 5) is the

gradient used for the snake’s growth phase.

(5)

where L is the luminance and Cr comes from the

YCbCr space. Each component is normalized

between 0 and 1.

G

5

is considered because the luminance L and the

component Cr are higher for the gum than for the

lips.

(a) 1

st

snake convergence

(b) 2

nd

snake convergence

Figure 8: Snake adjustment in presence of gum.

After the convergence, if the middle points of the

2

nd

snake are below the upper outer contour, we keep

the modification (see figure 8 (b)), else we go back

to the result of the 1

st

snake (that is the case when no

gums are visible).

4.4. Model Optimization

From the key points detected in the previous section,

the final inner contour is given by four cubic curves

between the mouth corners P

1

and P

5

and the key

points P”

7

and P”

8

. The two curves for the upper

contour are computed with the least square

minimization method by taking some points of the

contour C

2

close to P”

8

, the point P”

8

and the mouth

corners P

1

and P

5

. The two curves for the lower

contour are computed with the least square

minimization method by taking some points of the

contour C

3

close to P”

7

, the point P”

7

and the mouth

corners P

1

and P

5

.

Figure 9: Inner lip model for open mouth.

5 EXPERIMENTAL RESULTS

For testing the performances of our lip segmentation

method, we use images from the AR face database

(Martinez, 1998). It contains images of 126 people's

(, )

uu

uxy σμ≤−(, )

aa

axy σμ≤−

(

)

(

)

5

(, )G x y = L x,y +Cr x,y∇⎡ ⎤

⎣⎦

INNER LIP SEGMENTATION BY COMBINING ACTIVE CONTOURS AND PARAMETRIC MODELS

301

faces (70 men and 56 women) with different facial

expressions and illumination conditions. The mean

size of the mouths is 110 pixels in width. Figure 11

shows experimental inner lip segmentation results

for this database for both closed and open mouths.

The results are zoomed on the mouth to better see

the segmentation.

Moreover, we use image sequences from

different speakers acquired in our lab under natural

non uniform lighting conditions and without any

particular make-up. These images are RGB (8

bits/color/pixel) and contain the region of the face

spanning from chin to nostrils. The mean size of the

mouths is 85 pixels in width. Results for closed and

open mouths are shown on figure 12.

To evaluate quantitatively our algorithm in case

of open mouths, we use the method introduced by

Wu et al. (Wu, 2002). We hand-labelled the inner lip

contour of 507 images from the AR face database

(corresponding to the features “smile” and

“scream”) and 94 images from our own database. If

a pixel does not belong to both the hand-labelled

area and the area defined by our algorithm, the pixel

is evaluated as an error pixel. The error ratio is

computed by the ratio between the number of error

pixels (NEP) of the image divided by the number of

pixels in the hand-labelled area. The 252 first images

of the AR face database correspond to the feature

“smile” and the last 255 images correspond to the

feature “scream” (see figure 11 (b) and (c) for

examples).

The tables 1, 2 and 3 show the error ratio for the

3 images sets (database AR “smile” and “scream”

and our sequences). The value is 0.252 (standard

deviation = 0.093) for the AR images with the

feature “smile”, 0.112 (standard deviation = 0.095)

or the AR images with the feature “scream” and

0.188 (standard deviation = 0.068) for the images

from our sequences. The error ratio is lower for the

feature “scream” than for the feature “smile” and

this difference is due to the method for computing

the error ratio. Indeed, the number of error pixels

(NEP) is relatively constant for the whole database.

But to compute the error ratio, the NEP is divided by

the number of pixels in the hand-labeled area, and

that is obvious there are much more pixels in the

mouth area during a scream rather than during a

smile. The mean NEP is 360 for the “smile” images

and 535 for the “scream” images, whereas the mean

number of pixels in the inner lip hand-labelled area

is around 1505 for the first one and 4968 for the

second. For example, that's why the error ratio of the

last images of the figure 11 (b) is higher than the last

images of the figure 11 (c) in spite of a lower NEP.

Table 1: Error ratio for the images from the AR face

database with the feature “smile”.

AR database : feature “smile”

Error ratio (ER) in % (standard-deviation) 25.2 (9.3)

Number of images with ER < 15% 26

Number of images with 15% ≤ ER < 25% 118

Number of images with 25% ≤ ER < 50% 103

Number of images with 50% ≤ ER ≤ 75% 5

Number of images with ER > 75% 0

Mean number of error pixels (NEP)

(standard-deviation)

360 (179)

Mean number of pixels in the hand-labelled

area (standard-deviation)

1505 (598)

Table 2: Error ratio for the images from the AR face

database with the feature “scream”.

AR database : feature “scream”

Error ratio (ER) in % (standard-deviation) 11.2 (9.5)

Number of images with ER < 15% 216

Number of images with 15% ≤ ER < 25% 19

Number of images with 25% ≤ ER < 50% 16

Number of images with 50% ≤ ER ≤ 75% 4

Number of images with ER > 75% 0

Mean number of error pixels (NEP)

(standard-deviation)

535 (497)

Mean number of pixels in the hand-labelled

area (standard-deviation)

4968

(1556)

Table 3: Error ratio for the images from our sequences.

database from our sequences

Error ratio (ER) in % (standard-deviation) 18.8 (6.8)

Number of images with ER < 15% 28

Number of images with 15% ≤ ER < 25% 49

Number of images with 25% ≤ ER < 50% 17

Number of images with 50% ≤ ER ≤ 75% 0

Number of images with ER > 75% 0

Mean number of error pixels (NEP)

(standard-deviation)

108 (43)

Mean number of pixels in the hand-labelled

area (standard-deviation)

616 (238)

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

302

error = 0.268 error = 0.305 error = 0.669

NEP = 528 NEP = 1542 NEP = 3812

Figure 10: Failures due to the presence of gum or tongue.

The majority of the wrong detections for the

lower inner lip contour occur in presence of the

tongue, when the contour is not marked enough.

Also, in spite of the adjustment introduced in section

4.3.2., the upper inner lip contour can be found

between the gum and the teeth. Some examples are

shown on figure 10.

(a) closed mouth

error = 0.256 error = 0.106 error = 0.216

NEP = 255 NEP = 199 NEP = 196

(b) open mouth and feature “smile”

error = 0.038 error = 0.052 error = 0.071

NEP = 233 NEP = 381 NEP = 454

(c) open mouth and feature “scream”

Figure 11: Some results with the AR face database.

Also, by examining the localization of the error

pixels inside the mouth, we have seen that there are

sometimes a lot of error pixels near the mouth

corners, even if the inner lip contour seems to be

right. That is because in our model the outer lip

contour and the inner lip contour are linked by the

two mouth corners (P

1

and P

5

). So the cubic curves

of the inner contour have to pass by the mouth

corners and the contour could be not very accurate

near the mouth corners. For example, it is the case

for the images of the figure 12 (b).

(a) closed mouth

error = 0.194 error = 0.156 error = 0.287

NEP = 114 NEP = 96 NEP = 82

(b) open mouth

Figure 12: Some results for the images from our

sequences.

6 CONCLUSIONS

This paper presents an algorithm for inner lip

segmentation. The method consists of a combination

of active contours and parametric models. The active

contours give key points and fit the two models, a

first one for a closed mouth and a second one for an

open mouth. The parametric models, composed of

several cubic curves, allow to obtain accurate and

realistic results useful for applications which require

a high level of precision, such as lip reading.

For the moment, the decision between the closed

mouth model and the open mouth model is taken

manually and the inner lip segmentation is done for

static images. It could be useful to know

automatically if the mouth is closed or open for a

future work of segmentation in video sequences.

Indeed, during a conversation, the mouth

continuously alternates with closed and open

positions.

INNER LIP SEGMENTATION BY COMBINING ACTIVE CONTOURS AND PARAMETRIC MODELS

303

REFERENCES

Abboud, B., Chollet, G., 2005. Appearance Based Lip

Tracking and Cloning on Speaking Faces. In ISPA'05,

IEEE International Symposium on Image and Signal

Processing and Analysis. pp. 301-305.

Beaumesnil, B., Chaumont, M., Luthon, F., 2006.

Liptracking and MPEG4 Animation with Feedback

Control. In ICASSP'06, IEEE International

Conference on Acoustics, Speech, and Signal

Processing. Vol. 2, pp. 677-680.

Cootes, T. F., Hill, A., Taylor, C. J., Haslam, J., 1994a.

Use of Active Shape Models for Locating Structures

in Medical Images. In Image and Vision Computing.

Vol. 12, No. 6, pp. 355-365.

Cootes, T. F., Lanitis, A., Taylor, C. J., 1994b. Automatic

Tracking, Coding and Reconstruction of Human Faces

using Flexible Appearance Models. In IEE Electronic

Letters. Vol. 30, No 19, pp.1587-1588.

Delmas, P., Coulon, P-Y., Fristot V., 1999. Automatic

snakes for robust lip boundaries extraction. In

ICASSP'99, IEEE International Conference on

Acoustic, Speech and Signal Processing. Vol. 6, pp.

3069-3072.

Eveno, N., Caplier, A., Coulon P-Y, 2004. Automatic and

Accurate Lip Tracking. In IEEE Trans. on Circuits

and Systems for Video Technology. Vol. 14, No 5, pp.

706-715.

Hulbert, A., Poggio, T., 1998. Synthesizing a Color

Algorithm From Examples. In Science. Vol. 239, pp.

482-485.

Luettin, J., Thacker, N. A., Beet, S. W., 1996. Statistical

Lip Modelling for Visual Speech Recognition. In

Eusipco'96, Proceedings of the 8th European Signal

Processing Conference. Vol. 1, pp. 123-125.

Martinez, A. M., Benavente, R., 1998. The AR Face

Database. In CVC Tech. Report # 24.

Neely, K. K., 1956. Effect of Visual Factors on the

Intelligibility of Speech. J. Acoustical Society of

America. Vol. 28, pp. 1275-1277.

Sumby, W. H., Pollack, I., 1954. Visual Contribution to

Speech Intelligibility in Noise. J. Acoustical Society of

America. Vol. 26, pp. 212-215.

Wang, S. L., Lau, W. H., Leung, S. H., Yan, H., 2004. A

Real-time Automatic Lipreading System. In ISCAS,

IEEE International Symposium on Circuits and

Systems. Vol.2, pp. 101-104.

Wyszecki, G., Stiles, W. S., 1982. Color Science:

Concepts and Methods, Quantitative Data and

Formulae. John Wiley & Sons, Inc., New York, New

York, 2

nd

edition.

Wu, Z., Aleksic, P. S., Katsaggelos, A. K., 2002. Lip

tracking for MPEG-4 facial animation. In ICMI, IEEE

International Conference on Multimodal Interfaces.

pp. 293-298.

Zhang, L., 1997. Estimation of the mouth features using

deformable templates. In ICIP'97, IEEE International

Conference on Image Processing. Vol. 3.pp. 328–331.

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

304