
 
spectrogram to acquire a 32 × 12 matrix and 
saving this matrix for further use.
 
The memory required by this algorithm is 32 × 12 
integers per sample. Since these integers lie in the 
range [0, 255], each fits in a single byte, so every 
sample occupies 384 bytes and the total space grows 
as 384 × n, i.e., linearly, Θ(n).
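
To make the space argument concrete, the following minimal sketch (in Python with NumPy, which the paper itself does not mention) stores each 32 × 12 template as unsigned bytes; the variable names are illustrative only.

```python
import numpy as np

# One word template: a 32 x 12 grid of integers in [0, 255], stored as uint8.
sample = np.zeros((32, 12), dtype=np.uint8)
print(sample.nbytes)        # 384 bytes per sample

# For n stored samples the space grows linearly (384 * n bytes).
n = 1000
database = np.zeros((n, 32, 12), dtype=np.uint8)
print(database.nbytes)      # 384000 bytes for n = 1000
```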
 
The time complexity of this algorithm can be 
estimated by counting the comparisons performed per 
sample. The number of layers is always constant (6 
layers in this case: 2^0 to 2^5), and in the presented 
problem there are 12 frequency bands, which is also 
constant. Therefore the number of comparisons per 
sample is constant and the same for all samples, so 
the total comparison cost over n samples is again 
linear, Θ(n). The sorting step, however, takes 
O(n log n), which makes the overall time complexity 
O(n log n); this is still fast enough in practice.
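
The constant per-sample cost and the dominating sort can be illustrated with the hypothetical sketch below. The paper does not specify in this excerpt exactly which quantities are compared per layer and band, so the 6 × 12 summary array, the function names, and the scoring rule are assumptions rather than the authors' actual method.

```python
import numpy as np

N_LAYERS, N_BANDS = 6, 12          # 2^0 .. 2^5 layers, 12 frequency bands

def per_sample_comparisons(query_summary, template_summary):
    """Constant-time comparison: exactly 6 x 12 = 72 element checks."""
    count = 0
    for layer in range(N_LAYERS):
        for band in range(N_BANDS):
            if query_summary[layer, band] == template_summary[layer, band]:
                count += 1
    return count

def recognize(query_summary, templates):
    # One constant-cost comparison per stored sample -> Theta(n) total ...
    scores = [per_sample_comparisons(query_summary, t) for t in templates]
    # ... followed by an O(n log n) sort to rank the candidate words.
    return np.argsort(scores)[::-1]    # indices of best matches first
```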
 
The small memory footprint per sample and the 
speed with which a word is recognized make this 
algorithm well suited for voice commands on mobile 
devices such as cell phones and PDAs. Since each 
command consists of only one or two words, and 
considering the results in the next section, this simple 
algorithm appears efficient and useful for voice-command 
applications.
5 RESULTS 
The presented algorithm has been implemented and 
tested for a single speaker with 100 words. The 
results have been compared with those of a widely 
used speech recognition method, HMM. To measure 
the robustness of this algorithm to noise, different 
kinds of noise were applied to the test data. Table 1 
shows these results.
 
For this purpose, a database of samples was 
generated containing about 8 different pronunciations 
of each word, for 100 words, adding up to 800 
samples. All samples except one per word were 
introduced to the system; the remaining held-out 
samples were then presented to the system for 
recognition. The entry labelled "Clean" in Table 1 
refers to these results.
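
A sketch of this split is shown below, under the assumption that the database is a mapping from each word to its recorded pronunciations; the data layout and names are hypothetical.

```python
import random

def split_train_test(database, seed=0):
    """Hold out one pronunciation per word; train on the rest."""
    rng = random.Random(seed)
    train, test = [], []
    for word, samples in database.items():
        held_out = rng.randrange(len(samples))     # one held-out sample per word
        for i, sample in enumerate(samples):
            (test if i == held_out else train).append((word, sample))
    return train, test    # roughly 700 training samples, 100 test samples
```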
 
Afterwards, different amounts of two kinds of 
noise, white noise and babble noise, were added to the 
test data, which was then recognized again. The 
remaining entries of Table 1 show these results. Here, 
"First Answer" means that the first recognized answer 
is the correct one, and "Third Answer" means that one 
of the first three answers is the correct one. The same 
data has been tested with the HMM approach, and its 
results are included for comparison.
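
The paper does not state how the noise was mixed in; the following is only one common way to add noise at a prescribed SNR (the 20, 10 and 0 dB levels of Table 1), shown here as an assumed procedure rather than the authors' own.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix `noise` into `clean` so that the result has the target SNR in dB."""
    noise = noise[:len(clean)]                        # align lengths
    p_signal = np.mean(clean.astype(float) ** 2)
    p_noise = np.mean(noise.astype(float) ** 2)
    # Scale the noise so that 10*log10(p_signal / p_noise_scaled) == snr_db.
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# e.g. white noise at one of the SNR levels used in Table 1:
# noisy = add_noise(clean, np.random.randn(len(clean)), snr_db=10)
```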
Table 1: Experimental results (recognition accuracy).

                         HMM    First Answer   Third Answer
Clean                   100 %       98 %           99 %
White Noise,  20 dB      99 %       91 %           96 %
White Noise,  10 dB      74 %       90 %           96 %
White Noise,   0 dB       4 %       84 %           91 %
Babble Noise, 20 dB      98 %       98 %           99 %
Babble Noise, 10 dB      92 %       92 %           95 %
Babble Noise,  0 dB      39 %       44 %           72 %
 
Table 1 shows that while the recognition rate of the 
HMM algorithm drops sharply on noisy data, the 
presented algorithm maintains its performance even 
under intense noise. In particular, because of the 
smoothing effect of averaging, the algorithm is highly 
resistant to white noise, as the results above confirm. 
Babble noise, however, destroys the information in 
the lower frequencies and can therefore degrade the 
algorithm's performance. For this reason, the first 
three (lowest) frequency bands of the spectrograms 
were ignored in order to achieve the better results 
reported in Table 1.
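
Assuming the 12 frequency bands are stored as the columns of the 32 × 12 matrix, ordered from low to high (an assumption not confirmed in this excerpt), ignoring the three lowest bands simply amounts to dropping the first three columns:

```python
import numpy as np

def drop_low_bands(sample, n_bands=3):
    """Remove the lowest `n_bands` frequency bands from a 32 x 12 template."""
    return sample[:, n_bands:]          # shape (32, 12) -> (32, 9)

# Example: template = np.zeros((32, 12), dtype=np.uint8)
#          reduced  = drop_low_bands(template)
```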