Authors:
E. Didiot; I. Illina; O. Mella; D. Fohr and J.-P. Haton
Affiliation:
LORIA-CNRS & INRIA Lorraine, France
Keyword(s):
Speech/music discrimination, wavelets, static and dynamic parameters, long-term parameters, classifiers fusion.
Related Ontology Subjects/Areas/Topics:
Applications; Audio and Speech Processing; Digital Signal Processing; Multimedia; Multimedia Signal Processing; Pattern Recognition; Software Engineering; Telecommunications
Abstract:
Speech/music discrimination is a challenging research problem that significantly impacts Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We propose a wavelet-based decomposition of the audio signal, which allows a good analysis of non-stationary signals such as speech or music. We compute different energy types in each frequency band obtained from the wavelet decomposition. Two class/non-class classifiers are used: one for speech/non-speech and one for music/non-music. On the broadcast test corpus, the proposed wavelet approach gives better results than the MFCC one. For instance, we obtain a significant relative improvement of 39% in the error rate for the speech/music discrimination task.
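
As an illustration of per-band energy features computed from a wavelet decomposition, here is a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet family, the decomposition depth, or the energy types, so the Haar wavelet, the three levels, and the mean-square energy used here are all assumptions. The function names (`haar_step`, `wavelet_band_energies`, `log_energies`) are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (assumed wavelet).

    Returns (approximation, detail). A trailing odd sample is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_band_energies(frame, levels=3):
    """Mean-square energy of each band for one analysis frame.

    Output order: detail band of level 1 (highest frequencies), ...,
    detail band of the last level, then the final approximation band.
    """
    if len(frame) < 2 ** levels:
        raise ValueError("frame too short for the requested number of levels")
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(c * c for c in detail) / len(detail))
    energies.append(sum(c * c for c in approx) / len(approx))
    return energies


def log_energies(energies, floor=1e-12):
    """Log-compress the band energies; the floor avoids log(0) on silent bands."""
    return [math.log(e + floor) for e in energies]
```

A constant frame puts all of its energy in the final approximation band, while a frame alternating between +1 and -1 puts all of it in the first detail band. One feature vector per frame, built this way, could then be passed to each of the two class/non-class classifiers.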