USE OF CEPSTRUM-BASED PARAMETERS FOR AUTOMATIC PATHOLOGY DETECTION ON SPEECH

Analysis of Performance and Theoretical Justification

Rubén Fraile, Juan Ignacio Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruiz

Department of Circuits & Systems Engineering, Universidad Politécnica de Madrid

Carretera de Valencia Km 7, 28031 Madrid, Spain

Pedro Gómez-Vilda

Department of Computer Systems' Architecture and Technology, Universidad Politécnica de Madrid

Campus de Montegancedo s/n, Boadilla del Monte, 28660 Madrid, Spain

Keywords:

Speech analysis, Pattern classification.

Abstract:

Most speech signal analysis procedures for automatic pathology detection rely on parameters extracted from time-domain processing. Moreover, calculation of these parameters often requires prior pitch period estimation; therefore, their validity heavily depends on the robustness of pitch detection. In this paper, an alternative approach based on cepstral-domain processing is presented. It has the advantage of not requiring pitch estimation, thus providing a gain in both simplicity and robustness. While the proposed scheme is similar to solutions based on Mel-frequency cepstral parameters, already present in the literature, it has an easier physical interpretation while achieving similar performance.

1 INTRODUCTION

Analysis of recorded speech is an attractive method for pathology detection since it is a low-cost, non-invasive diagnostic procedure (Boyanov and Hadjitodorov, 1997). There is a wide range of causes for pathological voice (functional, neural, laryngeal, etc.), and a correspondingly wide range of acoustic parameters has been proposed for its detection (see (Jackson-Menaldi, 2002) for summarising tables and typical values). Nevertheless, these parameters are intended to detect speech signal features that may be roughly classified into only three classes (Godino-Llorente et al., 2006b):

• Short-term frequency perturbations, both in fundamental frequency and in formants.

• Short-term amplitude perturbations.

• Noise or, more specifically, speech-to-noise ratio.

Calculation of the above-mentioned acoustic parameters requires prior, reliable detection of the fundamental frequency (pitch) of speech (Deliyski, 1993) (Boyanov and Hadjitodorov, 1997). Nevertheless, pitch detection is not an easy task due to its sensitivity to noise, signal distortion, speech formants, etc. (Boyanov et al., 1993).

An alternative approach to speech signal analysis is to carry it out in the cepstral domain, more specifically in the Mel-frequency cepstral domain. Such an approach, which consists in classifying patterns of so-called Mel-frequency cepstral coefficients (MFCC), does not require prior pitch estimation. It has proven to be fairly robust against different kinds of speech distortion (Bou-Ghazale and Hansen, 2000), including that of the telephone channel (Fraile et al., 2007), and reasonably independent of the particular way in which computations are implemented (Ganchev et al., 2005). For these reasons, the application of MFCC to automatic voice pathology detection has been proposed in recent years (Godino-Llorente and Gómez-Vilda, 2004). Yet, to the authors' knowledge, no physical explanation has so far been given of the meaning of MFCC and their relevance to pathology detection.

In this paper, a new scheme for automatic voice pathology detection is proposed. It lies half-way between the usual cepstral domain and the Mel-frequency cepstral domain. Namely, it takes advantage of the conceptual interpretation of cepstral processing of speech signals (Deller et al., 1993), the pattern separation capability of cepstral distances (Rabiner and Juang, 1993) and the smoother spectrum estimation provided by the filter banks in MFCC calculation (Rabiner and Juang, 1993). The mathematical formulation of both cepstrum and MFCC parameters is reviewed in section 2, while the newly proposed set of parameters is introduced in section 3. The results of applying these features to the detection of pathologies in voices belonging to a commercial database are reported in section 4. Last, the conclusions are presented in section 5.

Fraile R., Ignacio Godino-Llorente J., Sáenz-Lechón N., Osma-Ruiz V. and Gómez-Vilda P. (2008). USE OF CEPSTRUM-BASED PARAMETERS FOR AUTOMATIC PATHOLOGY DETECTION ON SPEECH - Analysis of Performance and Theoretical Justification. In Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing, pages 85-91. DOI: 10.5220/0001059600850091. SciTePress.

2 MATHEMATICAL FORMULATION

2.1 Short-time Fourier Transform

As stated in the previous section, the variability of the speech signal is a key feature for pathology detection. Detecting such variability calls for short-time techniques for speech processing. For this reason, the mathematical framework for short-time processing of speech provided in (Deller et al., 1993) is reviewed in the following lines.

Let $x[n]$ be a speech signal composed of $N$ samples ($n = 0 \cdots N-1$) obtained at a sampling frequency equal to $f_s$; then it can be segmented in frames defined by:

$$f[n;m] = x[n] \cdot w[m-n] \tag{1}$$

where $w[n]$ is the framing window:

$$w[n] = 0 \quad \text{if } n < 0 \text{ or } n \geq L \tag{2}$$

and $L$ is the frame length. Consequently, $f[n;m]$ has non-zero values only for $n \in [m-L+1, m]$. If consecutive speech frames overlap by $l$ samples, then $m$ may take the following values:

$$m = L + p \cdot (L - l) - 1 \tag{3}$$

where $p$ is the frame index, an integer such that:

$$0 \leq p \leq \frac{N-L}{L-l} \tag{4}$$

Considering the relation between the frame shift $m$ and the frame index $p$, frames without time shift reference may be renamed as:

$$f_p[n] = f[n+m-L+1;m] = f[n+p \cdot (L-l);m] = x[n+p \cdot (L-l)] \cdot w[(L-1)-n] \tag{5}$$

where $n = 0 \cdots L-1$. From these speech frames, the short-term Discrete Fourier Transform (stDFT) is computed as:

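$$S_p(k) = \sum_{n=0}^{N_{DFT}-1} \tilde{f}_p[n] \cdot e^{-j \cdot \frac{2\pi}{N_{DFT}} \cdot kn} \tag{6}$$

where $N_{DFT}$ is the number of points of the stDFT, $k = 0 \cdots N_{DFT}-1$ and:

$$\tilde{f}_p[n] = \begin{cases} f_p[n] & \text{if } 0 \leq n < L \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

thus, if $N_{DFT} \geq L$ then (6) is equal to:

$$S_p(k) = \sum_{n=0}^{L-1} f_p[n] \cdot e^{-j \cdot \frac{2\pi}{N_{DFT}} \cdot kn} \tag{8}$$

The frequency values that correspond to each stDFT coefficient are:

$$f_k = \begin{cases} \dfrac{k}{N_{DFT}} \cdot f_s & \text{if } k \leq \dfrac{N_{DFT}}{2} \\[2mm] \dfrac{k-N_{DFT}}{N_{DFT}} \cdot f_s & \text{if } k > \dfrac{N_{DFT}}{2} \end{cases} \tag{9}$$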
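
As an illustration, the framing in (3)-(5), the zero-padded stDFT in (8) and the bin frequencies in (9) can be sketched in Python with NumPy. This is only a minimal sketch: the function names are hypothetical, and the Hamming window is an assumption, since the text only requires that w[n] vanishes outside [0, L).

```python
import numpy as np

def frame_signal(x, L, l, window=None):
    """Split x into frames of length L that overlap by l samples, as in (3)-(5)."""
    x = np.asarray(x, dtype=float)
    if window is None:
        window = np.hamming(L)            # assumed window shape, not given in the text
    hop = L - l
    n_frames = (len(x) - L) // hop + 1    # p = 0 ... floor((N - L) / (L - l)), eq. (4)
    # Eq. (5): f_p[n] = x[n + p (L - l)] * w[(L - 1) - n]
    return np.stack([x[p * hop:p * hop + L] * window[::-1] for p in range(n_frames)])

def st_dft(frames, n_dft):
    """Eq. (8): DFT of each frame, zero-padded to n_dft points (n_dft >= L)."""
    return np.fft.fft(frames, n=n_dft, axis=1)

def dft_frequencies(n_dft, fs):
    """Eq. (9): frequency of each bin; bins up to N_DFT/2 are non-negative."""
    k = np.arange(n_dft)
    return np.where(k <= n_dft // 2, k, k - n_dft) * fs / n_dft
```

Note that (9) assigns the bin k = N_DFT/2 to the positive frequency f_s/2, whereas `np.fft.fftfreq` assigns it to the negative one, which is why the sketch computes the frequencies explicitly.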

2.2 Short-time Cepstrum

In (Deller et al., 1993), an algorithm for computing the short-time cepstrum from the stDFT is given, under the assumption that $N_{DFT} \gg L$:

$$c_p[q] = \sum_{k=0}^{N_{DFT}-1} \log|S_p(k)| \cdot e^{j \cdot \frac{2\pi k}{N_{DFT}} \cdot q} \tag{10}$$
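
A possible NumPy rendering of (10) is sketched below. The function name and the small constant `eps` are assumptions introduced here to avoid taking the logarithm of zero; they are not part of the formulation above.

```python
import numpy as np

def short_time_cepstrum(S, eps=1e-12):
    """Eq. (10) applied to each row of S (one stDFT per frame)."""
    n_dft = S.shape[-1]
    log_mag = np.log(np.abs(S) + eps)     # eps is an added guard against log(0)
    # np.fft.ifft divides by N_DFT; multiply it back to match (10) as written.
    return np.real(np.fft.ifft(log_mag, axis=-1)) * n_dft
```

Since the log-modulus of the DFT of a real frame is real and even, the imaginary part of the inverse transform vanishes up to rounding error, so only the real part is kept.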

A physical interpretation of the cepstrum can be derived from the discrete-time model for speech production that can also be found in (Deller et al., 1993). This model may be written in the frequency domain as:

$$S\left(e^{j\Omega}\right) = E\left(e^{j\Omega}\right) \cdot G\left(e^{j\Omega}\right) \cdot H\left(e^{j\Omega}\right) \tag{11}$$

where $S\left(e^{j\Omega}\right)$ is the speech, $E\left(e^{j\Omega}\right)$ is the impulse train corresponding to the fundamental frequency and its harmonics, $G\left(e^{j\Omega}\right)$ is the glottal pulse waveform that modulates the impulse train and $H\left(e^{j\Omega}\right)$ is, herein, the combined effect of vocal tract and lip radiation. These components can be appreciated in figure 1, which corresponds to the average modulus of the short-term DFT calculated from one of the voice records belonging to the database referred to in section 4.1.

The quick impulse-like variations in figure 1 correspond to the pitch harmonics $E\left(e^{j\Omega}\right)$, and the evolution of the impulse amplitude envelope is related to the glottal waveform $G\left(e^{j\Omega}\right)$ and the formants induced by the vocal tract $H\left(e^{j\Omega}\right)$. These formants correspond to the three envelope peaks with a decreasing level of energy that are centered at 750 Hz, 1375 Hz and 3000 Hz. In fact, these center frequencies

BIOSIGNALS 2008 - International Conference on Bio-inspired Systems and Signal Processing

Figure 1: Average modulus of the short-term DFT for one voice record. (Plot of $E\left[\left|S\left(e^{j\Omega}\right)\right|\right]$ versus frequency (Hz), from 0 to 5000 Hz; curves: original speech signal, speech filtered in frequency domain.)

Figure 2: Short term cepstrum averaged for all frames of the same voice record as used for figure 1. (Plot versus quefrency (ms), from 0 to 10 ms.)

are coherent with the range of typical values given in

(Jackson-Menaldi, 2002).

The logarithm operation in (10) converts the products in (11) into sums. Consequently, it allows the cepstrum to separate fast from slow signal variations in the frequency domain. This widely known fact is illustrated in figure 2, where the peak around 5.7 ms clearly identifies the fundamental frequency (175 Hz) and the values below 2 ms correspond to the spectrum envelope.
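
The separation between pitch and envelope can be reproduced on a synthetic signal. The sketch below is not taken from the database used in this paper: it builds an impulse train with a period of 143 samples at 25 kHz (about 174.8 Hz), smooths it with a decaying pulse that stands in for the glottal and vocal tract terms of (11), and looks for the cepstral peak between 2 ms and 10 ms. Frame length, DFT length and pulse shape are hypothetical choices.

```python
import numpy as np

fs = 25000                       # sampling rate of the records described in section 4.1
period = 143                     # samples, i.e. about 5.72 ms
L, n_dft = 1024, 8192            # hypothetical frame and DFT lengths

# Synthetic voiced frame: impulse train (E) smoothed by a decaying pulse (stand-in for G and H).
excitation = np.zeros(L)
excitation[::period] = 1.0
pulse = 0.9 ** np.arange(40)
frame = np.convolve(excitation, pulse)[:L] * np.hamming(L)

log_mag = np.log(np.abs(np.fft.fft(frame, n_dft)) + 1e-12)
cepstrum = np.real(np.fft.ifft(log_mag))

# Search for the pitch peak between 2 ms and 10 ms, above the envelope region.
lo, hi = int(0.002 * fs), int(0.010 * fs)
peak = lo + int(np.argmax(cepstrum[lo:hi]))
print(f"peak at {1000 * peak / fs:.2f} ms, i.e. {fs / peak:.1f} Hz")
```

The quefrency of the peak, divided by the sampling rate, gives the pitch period, so a peak near 5.7 ms corresponds to a fundamental frequency near 1/0.0057 s ≈ 175 Hz, as stated above.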

2.3 Short-time MFCC

Once the stDFT of a speech signal is available, another option for further processing, as mentioned in section 1, is the calculation of short-time MFCC (stMFCC) parameters. For stMFCC computation, only the positive part of the frequency axis is considered (Rabiner and Juang, 1993), that is, $f_k \geq 0$ and, therefore, $k \leq N_{DFT}/2$. In order to calculate stMFCC coefficients, a transformation is applied to the frequencies so as to convert them to Mel-frequencies $f_k^m$ (Godino-Llorente and Gómez-Vilda, 2004):

$$f_k^m = 2595 \cdot \log_{10}\left(1 + \frac{f_k}{700}\right) \tag{12}$$

and the stDFT is further processed through band-pass integration along $M$ equally long Mel-frequency intervals, with $M = \lfloor 3 \cdot \log_{10}(f_s) \rfloor$ ($\lfloor \cdot \rfloor$ means rounding down to the previous integer). Namely, the $i$th interval ($i = 1 \cdots M$) in the Mel domain is defined by:

$$I_i = \left[\frac{i-1}{M+1} \cdot F_{max}^m, \; \frac{i+1}{M+1} \cdot F_{max}^m\right] \tag{13}$$

where $F_{max}^m$ is the maximum Mel-frequency:

$$F_{max}^m = \max_k f_k^m = 2595 \cdot \log_{10}\left(1 + \frac{f_s/2}{700}\right) \tag{14}$$

and the interval length in the Mel domain is given by:

$$L(I_i) = \frac{2}{M+1} \cdot F_{max}^m \tag{15}$$

According to the previous equations, the $N_{DFT}$ stDFT coefficients are transformed to $M$ frequency components as follows:

$$\tilde{S}_p(i) = \sum_{f_k^m \in I_i} \left(1 - \frac{\left|f_k^m - i \cdot \frac{F_{max}^m}{M+1}\right|}{L(I_i)/2}\right) \cdot |S_p(k)| \tag{16}$$

Last, the $q$th ($q = 1 \cdots Q$) stMFCC of the $p$th speech frame, where $Q$ is the desired length of the Mel-cepstrum, is given by the cosine transform of the logarithm of the smoothed "Mel-spectrum" (Rabiner and Juang, 1993):

$$\tilde{c}_p[q] = \sum_{i=1}^{M} \log\left(\tilde{S}_p(i)\right) \cdot \cos\left(q \cdot \left(i - \frac{1}{2}\right) \cdot \frac{\pi}{M}\right) \tag{17}$$
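
The chain (12)-(17) can be sketched for a single frame as follows. This is an illustrative reading of the equations rather than a reference MFCC implementation: function names are hypothetical, the input is assumed to hold the stDFT modulus for the bins 0 to N_DFT/2 only, and a small constant is added before the logarithm as a numerical guard.

```python
import numpy as np

def hz_to_mel(f):
    """Eq. (12): Hz to Mel conversion."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def st_mfcc(S_mag, fs, M, Q):
    """stMFCC of one frame from |S_p(k)|, k = 0 ... N_DFT/2."""
    n_bins = len(S_mag)                          # N_DFT/2 + 1 bins
    f_k = np.arange(n_bins) * (fs / 2.0) / (n_bins - 1)
    mel_k = hz_to_mel(f_k)
    F_max = hz_to_mel(fs / 2.0)                  # eq. (14)
    half_len = F_max / (M + 1)                   # L(I_i)/2, from eq. (15)
    S_mel = np.empty(M)
    for i in range(1, M + 1):
        # Eq. (16): triangular (Bartlett) weights centred at i * F_max / (M + 1);
        # clipping at zero restricts the sum to the interval I_i of eq. (13).
        weights = np.clip(1.0 - np.abs(mel_k - i * half_len) / half_len, 0.0, None)
        S_mel[i - 1] = weights @ S_mag
    i = np.arange(1, M + 1)
    q = np.arange(1, Q + 1)[:, None]
    # Eq. (17): cosine transform of the logarithm of the Mel-spectrum
    return (np.log(S_mel + 1e-12) * np.cos(q * (i - 0.5) * np.pi / M)).sum(axis=1)
```

For a sampling rate of 25 kHz, `int(np.floor(3 * np.log10(25000)))` evaluates to 13, which matches the number of MFCC coefficients reported in section 4.2.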

3 CEPSTRAL COEFFICIENTS BASED ON SMOOTHED SPECTRUM

3.1 Justiﬁcation

As stated in section 1, while MFCC parameters exhibit both good performance and robustness in feature extraction from speech, they lack a clear physical interpretation. In contrast, the cepstrum has a physical meaning (recall section 2.2), yet raw cepstrum coefficients are not as useful for speech parametrisation. The reasons for these facts are discussed in the next paragraphs.

Cepstrum calculation, as formulated in (10), is based on the spectrum estimate provided by the absolute value of the stDFT. Due to the logarithm, this gives a result that is proportional to the one obtained from periodogram-based spectrum estimation, since the logarithm of the squared modulus is twice the logarithm of the modulus. However, such an estimate is very dependent on the specific values of the original speech frame. A more robust spectrum estimate can be obtained by smoothing the periodogram (Blackman and Tukey method, (Proakis and Manolakis, 1996)). In fact, this is what (16) expresses in the calculation of MFCC. Therefore, filtering of the stDFT may be assumed to be one of the sources of MFCC robustness.

In contrast, an explanation for the lack of a clear interpretation of MFCC also lies in the meaning of (16). According to that equation, stDFT smoothing for MFCC computation is carried out with a variable-length filter, that is, a Bartlett window whose length decreases for lower frequency bands. Moreover, the smoothed stDFT is downsampled to obtain only $M$ samples in the interval $[0, f_s/2]$ that are not uniformly spaced (Rabiner and Juang, 1993). While the downsampling is positive in the sense that it reduces the dimensionality of the problem, its non-uniformity, together with the previous variable-length filtering, obscures the interpretation of the output of the cosine transform in (17).

From the previous reasoning, if the stDFT is smoothed with a fixed-length filter and its output is uniformly decimated prior to the logarithm computation, the cepstral coefficients in (10) can be transformed into a more robust parameter set. Moreover, this is achieved while keeping the physical meaning of the cepstrum, since the first operation gives an improved spectrum estimate and the second only limits the length of the cepstrum in the quefrency domain.

3.2 Formulation

Starting from (8), if the stDFT modulus is smoothed with a Bartlett window of constant length equal to $\Delta f$ then the following output is obtained:

$$S'_p(i) = \sum_{f_k \in I_i} \left(1 - \frac{\left|f_k - i \cdot \Delta f/2\right|}{\Delta f/2}\right) \cdot |S_p(k)| \tag{18}$$

where $I_i = [\Delta f \cdot (i-1)/2, \; \Delta f \cdot (i+1)/2]$ and the Bartlett window has been chosen for similarity with (16). Herein, only the positive part of the frequency axis has been considered, as in section 2.3.

If the filtered stDFT is decimated so as to keep only the outputs of consecutive windows with a 50% overlap, this is equivalent to decimation by a factor $D = \lfloor \Delta f \cdot N_{DFT}/(2 \cdot f_s) \rfloor$. The modified cepstrum then becomes:

$$c'_p[q] = \sum_{k=0}^{\frac{N_{DFT}}{2 \cdot D}} \log\left(S'_p(k \cdot D)\right) \cdot \cos\left((k-1) \cdot \frac{2\pi D}{N_{DFT}} \cdot q\right) \tag{19}$$

where only the positive frequencies have been considered, hence computing the inverse DFT as a cosine transform as in (17). $c'_p[q]$ has the twofold advantage over $c_p[q]$ of being based on a smoother spectrum estimate $S'_p(i)$ and having a period length that has been reduced by a factor $D$, thus providing some dimensionality reduction.
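
A sketch of (18) and (19) for one frame is given below. Several points are assumptions made for illustration: the function name is hypothetical, the bands are indexed from 1 so that band i is centred at i·∆f/2, the number of bands follows the cepstrum length quoted in section 4.2, and a small constant guards the logarithm.

```python
import numpy as np

def modified_cepstrum(S_mag, fs, delta_f, n_ceps):
    """Modified cepstrum of one frame from |S_p(k)|, k = 0 ... N_DFT/2."""
    n_bins = len(S_mag)
    n_dft = 2 * (n_bins - 1)
    f_k = np.arange(n_bins) * fs / n_dft
    half = delta_f / 2.0
    n_bands = int(round((fs / 2.0 - half) / half))   # 124 for fs = 25 kHz, delta_f = 200 Hz
    S_smooth = np.empty(n_bands)
    for i in range(1, n_bands + 1):
        # Eq. (18): fixed-length Bartlett weights centred at i * delta_f / 2
        weights = np.clip(1.0 - np.abs(f_k - i * half) / half, 0.0, None)
        S_smooth[i - 1] = weights @ S_mag
    D = int(np.floor(delta_f * n_dft / (2.0 * fs)))  # decimation factor
    k = np.arange(1, n_bands + 1)
    q = np.arange(n_ceps)[:, None]
    # Eq. (19): cosine transform of the log of the smoothed, decimated spectrum
    angle = (k - 1) * 2.0 * np.pi * D / n_dft * q
    return (np.log(S_smooth + 1e-12) * np.cos(angle)).sum(axis=1)
```

In contrast with the Mel filter bank of (16), every band here has the same width and the band centres are uniformly spaced, which is what preserves the usual reading of the quefrency axis.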

3.3 Cepstral Distances

Differences in cepstrum can be used for speech signal classification. An example of such usage is the definition of the cepstral distance in (Rabiner and Juang, 1993) as the norm of the vector resulting from subtraction of the two cepstra to be compared. This, if directly applied to pathology detection, would result in comparing the cepstra of consecutive speech frames so as to assess the variability of the signal. Mathematically:

$$d_p = \sum_{q=0}^{N_{DFT}-1} \left| c'_{p+1}[q] - c'_p[q] \right| \tag{20}$$

However, bearing in mind the physical interpretation of the cepstrum, this definition has the drawback of mixing pitch variations with formant and glottal pulse variations. To overcome this problem, an individual frame-to-frame variation analysis of each cepstral parameter is proposed:

$$d_p[q] = \left| c'_{p+1}[q] - c'_p[q] \right| \tag{21}$$

This way, analysis of the distribution of $d_p[q]$ related to speech formant and glottal pulse variability (low values of $q$) can be isolated from pitch changes, which are associated with values of $q$ around the pitch period.
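
Both quantities are straightforward to compute once the modified cepstra of all frames are stacked in an array with one row per frame. The sketch below uses hypothetical function names and takes the absolute-difference form for both the per-coefficient variation and the aggregated distance.

```python
import numpy as np

def cepstral_variation(C):
    """Eq. (21): d_p[q] = |c'_{p+1}[q] - c'_p[q]| for a (frames x coefficients) array C."""
    return np.abs(np.diff(C, axis=0))

def cepstral_distance(C):
    """Eq. (20): one value per pair of consecutive frames, summed over q."""
    return cepstral_variation(C).sum(axis=1)
```

Keeping the variation per quefrency index, instead of summing over q, is what allows low-quefrency changes to be examined apart from changes around the pitch period.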

4 APPLICATION AND RESULTS

For the purpose of performance analysis, the modified cepstral parameters presented in the previous section have been applied to the problem of automatic pathology detection on recorded voice. The results have been compared to those produced by MFCC. Within this section, first the voice database is presented, second the parameter set is specified, third the classifier is described and, last, the results are shown and discussed.

4.1 Database

The voice records used in this investigation are the same as in (Godino-Llorente et al., 2006a). They belong to a database distributed by the company Kay Elemetrics (Kay Elemetrics Corp., 1994). The recorded sounds correspond to sustained phonations (1-3 s long) of the vowel /ah/ from patients with either normal or disordered voice. Such voice disorders belong to a wide variety of organic, neurological, traumatic and psychogenic classes. The sampling rate of the speech records has been made uniform and equal to 25 kHz, while the coding has a resolution of 16 bits. The subset taken from the database contains 53 normal and 173 pathological speakers, who are uniformly distributed in age and gender (Godino-Llorente et al., 2006a).

4.2 Parameter Sets

For each speech record, cepstrum-based coefficients, as defined in (19), have been calculated. Namely, a filter length $\Delta f = 200$ Hz has been chosen for stDFT smoothing. As a consequence, a cepstrum length of $(f_s/2 - \Delta f/2)/(\Delta f/2) = (12500 - 100)/100 = 124$ samples results. The choice of $\Delta f$ is consistent with the approximate length of the low-band filters used for MFCC calculation (recall (16)). At first sight, however, it has the drawback of losing the pitch information of the signal spectrum. This is illustrated in figure 1, where the filtered DFT has been plotted with a dashed line. Nevertheless, such a filtered spectrum contains information on both harmonic-to-noise ratio (HNR) and glottal pulse waveform (Murphy and Akande, 2005), and HNR is a useful parameter for pathology detection that is closely related to both frequency and amplitude perturbations of pitch (Jackson-Menaldi, 2002).

Since the cepstrum contains information on total signal energy and its distribution among formants, the whole sequence is used as part of the parameter set. In addition to the cepstrum, information on its variability is used as an input for the pathology detector. More specifically, the mean and variance of $d_p[q]$ for each value of $q$ are used as descriptors of the cepstrum variability. Therefore, on the whole, a parameter vector of 124 × 3 elements is produced.
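
The assembly of the parameter vector for one record can be sketched as follows. The function name is hypothetical, and averaging the cepstrum over frames is an assumption based on the panel labels of figure 4; the input is an array with one row of 124 modified cepstral coefficients per frame.

```python
import numpy as np

def record_features(C):
    """Concatenate the frame-averaged cepstrum with the mean and variance of d_p[q]."""
    d = np.abs(np.diff(C, axis=0))        # frame-to-frame variation, eq. (21)
    return np.concatenate([C.mean(axis=0), d.mean(axis=0), d.var(axis=0)])
```

With 124 coefficients per frame, the resulting vector has 124 × 3 = 372 elements regardless of the number of frames in the record.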

For the sake of comparison, another classifier based on a parameter vector consisting of $M = \lfloor 3 \cdot \log_{10}(f_s) \rfloor = 13$ MFCC coefficients averaged for all signal frames has also been tested.

4.3 Classiﬁer Description

For both classification schemes, a Multilayer Perceptron (MLP) with two hidden layers, each consisting of 4 neurons, and a single-neuron output layer has been used as a classifier. All neurons have logistic activation functions. An MLP with a single hidden layer having 50 neurons was utilised in (Godino-Llorente and Gómez-Vilda, 2004). The structure proposed herein, in contrast, has fewer free parameters, thus allowing faster learning, and the reduced number of neurons is compensated by the introduction of an additional hidden layer that permits learning of more complex relations (Haykin, 1994).

Figure 3: DET plot for MFCC based and modified cepstrum based classifiers. (Axes: false alarm probability versus miss probability; curves: detector based on MFCC, detector based on modified cepstrum.)
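
The difference in the number of free parameters can be made concrete with a short count. The sketch below assumes fully connected layers with one bias per neuron and, purely for illustration, feeds both structures with the same 372-element input; the cited work used a different feature vector, so the figures are only indicative. Function names are hypothetical.

```python
import numpy as np

def mlp_parameter_count(n_inputs, hidden, n_outputs=1):
    """Number of weights and biases of a fully connected MLP."""
    sizes = [n_inputs, *hidden, n_outputs]
    return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))

def mlp_forward(x, weights, biases):
    """Forward pass with logistic activations in every layer."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a

print(mlp_parameter_count(372, [4, 4]))   # two hidden layers of 4 neurons
print(mlp_parameter_count(372, [50]))     # one hidden layer of 50 neurons
```

Under these assumptions the two-hidden-layer structure has 373·4 + 5·4 + 5·1 = 1517 parameters, against 373·50 + 51·1 = 18701 for the single hidden layer of 50 neurons.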

4.4 Results

The MLP classifier has been trained with 70% of the available speech records in such a way that its output is expected to be "1" for pathological voices and "0" for normal voices. The remaining 30% of records have been used for testing. The experiment has been repeated 20 times, each with a different, randomly chosen training set. The average results for both MFCC and the cepstrum-based parameters presented herein are drawn in the DET plot (Martin et al., 1997) of figure 3.

Plotted results indicate that the performance of the classifier based on the newly proposed set of parameters is comparable to that of MFCC parameters. To be specific, in terms of equal error rate (EER), that is, for false alarm rate equal to miss rate, the MFCC-based classification yields an experimental error probability of 15% while the cepstrum-based classification error probability for the same conditions is 14%. Considering that the classifier has not been fine-tuned in this experiment and that the MLP has been chosen as a standard for comparison, the difference in the results is not significant.
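
For reference, an EER can be estimated from classifier outputs by sweeping the decision threshold until the miss rate and the false alarm rate meet. The following sketch is a simple approximation and not the evaluation code used for figure 3; the function name is hypothetical, and labels follow the convention above ("1" for pathological, "0" for normal).

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER: error rate at the threshold where miss and false alarm rates are closest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)          # True = pathological
    best = None
    for t in np.unique(scores):
        miss = np.mean(scores[labels] < t)           # pathological voices below the threshold
        false_alarm = np.mean(scores[~labels] >= t)  # normal voices at or above the threshold
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2.0)
    return best[1]
```

With a finite test set the two rates rarely coincide exactly, so the sketch reports their average at the closest threshold.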

In order to acquire a deeper understanding of the reasons for these results, an analysis of the relevance of cepstrum-based parameters for speech classification as either pathological or not has been carried out. Such analysis is based on the evaluation of the Fisher criterion (Duda et al., 2001) for each individual parameter. The results, differentiated for the three subsets of parameters (modified cepstrum, variance of differences and average of absolute differences), are plotted in figure 4.

Figure 4: Value of Fisher criterion for each cepstral parameter. (Three panels, each plotting the Fisher criterion versus quefrency (ms), from 0 to 5 ms; from top to bottom: variance of $d_p[q]$, average of $d_p[q]$, average modified cepstrum $c'_p[q]$.)
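
A per-parameter Fisher criterion for two classes can be sketched as below. The exact expression evaluated for figure 4 is not given in the text; the sketch assumes the common one-dimensional form, namely the squared difference of the class means divided by the sum of the class variances. The function name is hypothetical.

```python
import numpy as np

def fisher_criterion(X, y):
    """Fisher criterion of each column of X for binary labels y (True = pathological)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    m1, m0 = X[y].mean(axis=0), X[~y].mean(axis=0)
    v1, v0 = X[y].var(axis=0), X[~y].var(axis=0)
    return (m1 - m0) ** 2 / (v1 + v0)
```

A large value indicates that the parameter alone separates the two classes well, while a value near zero indicates that the class distributions overlap.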

According to this plot, the most relevant cepstral parameters for pathology detection may be roughly classified into two groups:

• The modified cepstrum values with lowest indices (plot at the bottom of figure 4): these are related to the slowest components of the spectrum envelope in figure 1, which, in turn, are associated with spectral noise levels and HNR (Murphy and Akande, 2005).

• The frame-to-frame variations in cepstrum-based coefficients whose quefrencies are within the interval [0.5, 1.5] milliseconds approximately: coefficients within that interval correspond to the short frequency range components of the spectrum envelope. These components, as justified in section 2.2, are related to glottal waveform and speech formants. However, this information itself does not help to discriminate the presence of pathology, as indicated by the low values of the Fisher criterion in the bottom plot of figure 4. Instead, frame-to-frame variations of these factors are much more relevant, as depicted in the other two plots of the same figure.

To be more specific, since the voice records of the database used for this experiment correspond to sustained vowel phonations, it can be assumed that the vocal tract varies very little; hence formants do not change and the second group of parameters should be more closely related to changes in the glottal waveform. As for the limits of the quefrency interval in which parameters from the second group are relevant, the lower limit of 0.5 ms corresponds to the quefrency band that separates slow components of the spectrum envelope (first group of parameters) from faster components (associated with the second group); on the other hand, the upper limit of 1.5 ms corresponds to the highest quefrency range at which the modified cepstrum $c'_p[q]$ has significant values. This is shown in figure 5, where a plot of the frame-averaged modified cepstrum of one voice record is depicted.

Figure 5: 124 modified cepstral parameters from one of the database's voice records. (Plot versus quefrency (ms), from 0 to 5 ms.)

5 CONCLUSIONS

Speech parametrisation in the cepstral domain is a useful technique for automatic pathology detection. Specifically, MFCC have been successfully used for this purpose. While the computation of these parameters is intrinsically robust, due to its independence from pitch extraction and to the spectrum filtering, their physical interpretation is obscure because of the non-linear Mel-frequency transformation.

In this paper, an alternative set of cepstrum-based parameters has been proposed. Such parameters share the robustness of MFCC, since they do not require pitch estimation and filtering of the estimated speech spectrum is also performed. In contrast to MFCC, the calculation of these newly proposed parameters does not involve any non-linear frequency transformation and, consequently, their physical interpretation remains clear. Namely, their values have been shown to be related to the amount of noise energy present in speech and to the glottal waveform variability. Both factors are directly associated with laryngeal pathologies.

Finally, the performance of the proposed cepstral parameters for pathology detection has been tested using an MLP classifier and results have been compared to those of MFCC. The obtained misclassification rates indicate that the performances of both sets of parameters are similar. Moreover, a deeper analysis of the individual impact of each parameter on the classification task has revealed that the most relevant parameters are those more closely linked to the above-mentioned two factors: noise energy and glottal waveform variations.


ACKNOWLEDGEMENTS

This research was carried out within projects funded by the Ministry of Science and Technology of Spain (TEC2006-12887-C02) and the Universidad Politécnica de Madrid (AL06-EX-PID-033).

REFERENCES

Bou-Ghazale, S. E. and Hansen, J. H. L. (2000). A comparative study of traditional and newly proposed features for recognition of speech under stress. IEEE Transactions on Speech and Audio Processing, 8(4):429–442.

Boyanov, B. and Hadjitodorov, S. (1997). Acoustic analysis of pathological voices. A voice analysis system for the screening of laryngeal diseases. IEEE Engineering in Medicine and Biology, 16(4):74–82.

Boyanov, B., Ivanov, T., Hadjitodorov, S., and Chollet, G. (1993). Robust hybrid pitch detector. IEE Electronics Letters, 29(22):1924–1926.

Deliyski, D. D. (1993). Acoustic model and evaluation of pathological voice production. In Proceedings of the 3rd Conference on Speech Communication and Technology (EUROSPEECH'93), pages 1969–1972, Berlin (Germany).

Deller, J. R., Proakis, J. G., and Hansen, J. H. L. (1993). Discrete-time processing of speech signals. Macmillan Publishing Company, New York (USA).

Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern classification. John Wiley & Sons, New York (USA), 2nd edition.

Fraile, R., Godino-Llorente, J. I., Sáenz-Lechón, N., Osma-Ruiz, V., and Gómez-Vilda, P. (2007). Analysis of the impact of analogue telephone channel on MFCC parameters for voice pathology detection. In 8th INTERSPEECH Conference (INTERSPEECH 2007), pages 1218–1221, Antwerp (Belgium).

Ganchev, T., Fakotakis, N., and Kokkinakis, G. (2005). Comparative evaluation of various MFCC implementations on the speaker verification task. In Proceedings of the 10th International Conference on Speech and Computer (SPECOM 2005), pages 191–194, Patras (Greece).

Godino-Llorente, J. I. and Gómez-Vilda, P. (2004). Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors. IEEE Transactions on Biomedical Engineering, 51(2):380–384.

Godino-Llorente, J. I., Gómez-Vilda, P., and Blanco-Velasco, M. (2006a). Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters. IEEE Transactions on Biomedical Engineering, 53(10):1493–1953.

Godino-Llorente, J. I., Sáenz-Lechón, N., Osma-Ruiz, V., Aguilera-Navarro, S., and Gómez-Vilda, P. (2006b). An integrated tool for the diagnosis of voice disorders. Medical Engineering & Physics, 28(3):276–289.

Haykin, S. (1994). Neural Networks: a comprehensive foundation. Macmillan College Publishing Company, New York (USA), 1st edition.

Jackson-Menaldi, M. C. A. (2002). La voz patológica. Editorial Médica Panamericana, Buenos Aires (Argentina).

Kay Elemetrics Corp. (1994). Disordered voice database. Version 1.03.

Martin, A., Doddington, G., Kamm, T., Ordowski, M., and Przybocki, M. (1997). The DET curve in assessment of detection task performance. In Proceedings of the 5th Conference on Speech Communication and Technology (EUROSPEECH'97), pages 1895–1898, Rhodes (Greece).

Murphy, P. J. and Akande, O. O. (2005). Quantification of glottal and voiced speech harmonics-to-noise ratios using cepstral-based estimation. In Proceedings of the International Conference on Non-Linear Speech Processing (NOLISP'05), pages 224–232, Barcelona (Spain).

Proakis, J. G. and Manolakis, D. G. (1996). Digital Signal Processing. Principles, Algorithms and Applications. Prentice-Hall International, New Jersey (USA), 3rd edition.

Rabiner, L. and Juang, B. H. (1993). Fundamentals of speech recognition. Prentice-Hall, Englewood Cliffs (USA).
