SPEECH/MUSIC DISCRIMINATION BASED ON WAVELETS FOR BROADCAST PROGRAMS

E. Didiot; I. Illina; O. Mella; D. Fohr; J.-P. Haton

doi:10.5220/0001572901510156

2006

Abstract

Speech/music discrimination is a challenging research problem that significantly affects Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We use a wavelet-based decomposition of the audio signal, which is well suited to analysing non-stationary signals such as speech or music. We compute several types of energy in each frequency band obtained from the wavelet decomposition. Two class/non-class classifiers are used: one for speech/non-speech and one for music/non-music. On the broadcast test corpus, the proposed wavelet approach gives better results than the MFCC-based one. For instance, we obtain a significant relative improvement of 39% in the error rate for the speech/music discrimination task.
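
The idea of per-band energy features from a wavelet decomposition can be illustrated with a minimal sketch. The abstract does not state the wavelet family, the number of decomposition levels, the frame length, or the exact energy definitions, so everything below is an assumption chosen for illustration: a Haar wavelet, three levels, a log mean-square energy, and the Teager-Kaiser operator as one plausible second "energy type". All function names are hypothetical and this is not the authors' implementation.

```python
import math


def haar_dwt_step(signal):
    """One level of an orthonormal Haar DWT (assumed wavelet, not from the paper).

    Returns (approximation, detail); the input length should be even.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_bands(frame, levels):
    """Decompose a frame into [d1, d2, ..., d_levels, a_levels].

    d1 holds the highest-frequency detail band; a_levels is the final
    low-frequency approximation.
    """
    bands = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def log_energy(band, floor=1e-12):
    """Log10 of the mean squared coefficient; floor avoids log(0)."""
    mean_sq = sum(x * x for x in band) / len(band)
    return math.log10(mean_sq + floor)


def teager_values(band):
    """Teager-Kaiser operator x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [band[n] ** 2 - band[n - 1] * band[n + 1]
            for n in range(1, len(band) - 1)]


def log_teager_energy(band, floor=1e-12):
    """Log10 of the mean absolute Teager-Kaiser value (band needs >= 3 samples)."""
    vals = [abs(v) for v in teager_values(band)]
    return math.log10(sum(vals) / len(vals) + floor)


def frame_features(frame, levels=3):
    """Concatenate both energy types over all bands of one frame."""
    bands = wavelet_bands(frame, levels)
    return ([log_energy(b) for b in bands]
            + [log_teager_energy(b) for b in bands])


if __name__ == "__main__":
    # Hypothetical 64-sample frame: a sinusoid standing in for real audio.
    frame = [math.sin(0.3 * n) for n in range(64)]
    print(frame_features(frame, levels=3))
```

In a setup like the one the abstract describes, a feature vector of this kind would be computed per frame and passed to the speech/non-speech and music/non-music classifiers; the frame length should be divisible by 2 raised to the number of levels so that every band keeps a whole number of coefficients.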

Paper Citation

in Harvard Style

Didiot E., Illina I., Mella O., Fohr D. and Haton J. (2006). SPEECH/MUSIC DISCRIMINATION BASED ON WAVELETS FOR BROADCAST PROGRAMS. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2006) ISBN 978-972-8865-64-1, pages 151-156. DOI: 10.5220/0001572901510156

in Bibtex Style

@conference{sigmap06,
author={E. Didiot and I. Illina and O. Mella and D. Fohr and J.-P. Haton},
title={SPEECH/MUSIC DISCRIMINATION BASED ON WAVELETS FOR BROADCAST PROGRAMS},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2006)},
year={2006},
pages={151-156},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001572901510156},
isbn={978-972-8865-64-1},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2006)
TI - SPEECH/MUSIC DISCRIMINATION BASED ON WAVELETS FOR BROADCAST PROGRAMS
SN - 978-972-8865-64-1
AU - Didiot E.
AU - Illina I.
AU - Mella O.
AU - Fohr D.
AU - Haton J.
PY - 2006
SP - 151
EP - 156
DO - 10.5220/0001572901510156