
SEARCHING FOR A ROBUST MFCC-BASED 
PARAMETERIZATION FOR ASR APPLICATION 
J. V. Psutka, Luboš Šmídl and Aleš Pražák 
Department of Cybernetics, University of West Bohemia, Pilsen, Czech Republic 
Keywords: MFCC parameterization, critical band-pass filters, robust front-end. 
Abstract:  This paper searches for robust settings of an MFCC-based parameterization with respect to the
number of band-pass filters and the number of computed coefficients. Settings that are theoretically
recommended for telephone and microphone speech are compared with a large number of experimental
results, and a new technique for determining robust areas of {<# of band-pass filters>×<# of coefficients>} is designed.
1 INTRODUCTION 
The state-of-the-art parameterization techniques used
in ASR systems try to model the process of human
hearing. In speech processing terminology these
techniques are known as the MFCC (Zheng and Song,
2001) and PLP parameterizations. Both techniques
adapt the parameter estimation process to the way
humans hear and perceive sounds of different
frequencies. One question that has to be dealt with,
however, is the selection of an "optimal" number of
critical band-pass filters and an "optimal" number of
computed coefficients. Papers published at many
prestigious international conferences nearly always
use the same settings, without the necessary analysis
of the task conditions and without reference to, e.g.,
the sampling frequency of the speech signal (this is
perhaps influenced by the default settings of the HTK
software tool, which is frequently used in many
research labs). On the other hand, from our relatively
rich experience of building ASR systems we know
that there is no single universal setting that yields the
best recognition results for a given "quality" of
speech signal. Experimental results indicate,
however, that the best classification results form
certain areas in the space {<number of band-pass
filters> × <number of coefficients>} in which the
recognition accuracy is high and changes little (i.e. it
depends only weakly on the number of critical
band-pass filters and the number of coefficients). The
goal of the work described here is to find settings
(i.e. the number of filters and the number of derived
coefficients) that correspond to the best recognition
results, and then to specify "areas of robust setting"
for such solutions.
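The notion of an area in the {<number of band-pass filters> × <number of coefficients>} space where accuracy is high and nearly flat can be illustrated with a minimal sketch. It is not the technique designed in this paper: the criterion (all cells within a tolerance of the best cell), the function name robust_area, the parameter tol and the toy accuracy values are assumptions made only for illustration.

```python
import numpy as np

def robust_area(acc, tol=0.5):
    """Return a boolean mask of grid cells whose accuracy lies within
    `tol` percentage points of the best cell (illustrative criterion only)."""
    acc = np.asarray(acc, dtype=float)
    return acc >= acc.max() - tol

# Toy grid: rows = number of filters, columns = number of coefficients.
# The values are invented for illustration, NOT experimental results.
filters = [10, 15, 20]
coeffs = [8, 12, 16]
acc = np.array([[70.0, 72.0, 71.0],
                [73.0, 75.0, 74.8],
                [72.5, 74.9, 74.7]])

mask = robust_area(acc, tol=0.5)
for i, j in zip(*np.nonzero(mask)):
    print(f"filters={filters[i]:2d} coefficients={coeffs[j]:2d} accuracy={acc[i, j]:.1f}")
```

In this toy grid the mask selects a compact block of neighbouring settings rather than a single best point, which is the property the text refers to as robustness.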
All experiments use the MFCC parameterization and
speech data of telephone (Fv = 8 kHz) and
microphone (Fv = 44.1 kHz) quality.
2 MFCC BASED PROCESSING 
The MFCC parameterization is computed with a
bank of symmetric, overlapping triangular filters
spaced linearly on the mel-frequency axis, in
accordance with auditory perceptual considerations.
The spacing and the bandwidth of the individual
filters are determined by the critical-band concept.
The computation consists of the following steps:
• Computation of the short-term speech spectrum.
• Non-linear frequency transformation and critical-band spectral resolution – triangular band-pass filters on the mel-frequency axis.
• Computation of cepstral coefficients.
• Applying an inverse discrete Fourier transform.

Table 1: Recommended numbers of filters for different
values of sampling frequency.

Sampling frequency   Bandwidth   Bandwidth   Number of
Fv [kHz]             [kHz]       [mel]       filters M
8                    0-4         0-2146      15
16                   0-8         0-2840      20
44.1                 0-22        0-3921      27
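The steps listed above and the mel bandwidths in Table 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mel mapping 2595·log10(1 + f/700) is assumed because it reproduces the mel column of Table 1, and the Hamming window, the FFT length, the use of a DCT-II as the real-valued inverse transform of the log filter-bank energies, and all function names are hypothetical choices.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Assumed mel mapping; it matches the [mel] column of Table 1.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters, symmetric and equally spaced on the mel axis
    between 0 and fs/2. Returns an array (n_filters, n_fft // 2 + 1)."""
    edges = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    bin_mel = hz_to_mel(np.linspace(0.0, fs / 2.0, n_fft // 2 + 1))
    fb = np.zeros((n_filters, bin_mel.size))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_mel - lo) / (mid - lo)
        falling = (hi - bin_mel) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc_frame(frame, fs, n_filters, n_coeffs, n_fft=512):
    """MFCC vector of one speech frame (window and n_fft are assumptions)."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2      # short-term spectrum
    energies = mel_filterbank(n_filters, n_fft, fs) @ power  # critical bands
    log_e = np.log(energies + 1e-10)                       # avoid log(0)
    # DCT-II of the log energies, used here as the inverse transform.
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n_filters)
    return basis @ log_e

# Upper band edges from Table 1, converted to mel and rounded.
table_mel = [int(round(float(hz_to_mel(f)))) for f in (4000, 8000, 22000)]
print(table_mel)

# One 25 ms frame of synthetic noise at telephone quality, M = 15 filters.
frame = np.random.default_rng(0).standard_normal(200)
coefficients = mfcc_frame(frame, fs=8000, n_filters=15, n_coeffs=12)
print(coefficients.shape)
```

The two arguments n_filters and n_coeffs correspond to the two quantities whose settings are studied in this paper; the remaining parameters are held fixed in the sketch.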
V. Psutka J., Šmídl L. and Pražák A. (2007).
SEARCHING FOR A ROBUST MFCC-BASED PARAMETERIZATION FOR ASR APPLICATION.
In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications, pages 192-195
DOI: 10.5220/0002140401920195
Copyright © SciTePress