Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection

Patricia Besson; Murat Kunt

doi:10.5220/0001224701060115

2006

Abstract

This work addresses the problem of detecting the speaker in audio-visual sequences by evaluating the synchrony between the audio and video signals. Prior to classification, an information-theoretic framework is applied to extract optimized audio features using video information. The classification step is then defined through a hypothesis testing framework so as to obtain confidence levels associated with the classifier outputs. Such an approach makes it possible to evaluate the efficiency of the whole classification process and, in particular, to assess the benefit of performing the feature extraction. It is shown that introducing a feature extraction step prior to classification increases the ability of the classifier to produce good relative instance scores.
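To make the idea of attaching a confidence level to a synchrony-based decision more concrete, the following is a minimal sketch and not the authors' method. It assumes one scalar audio feature and one scalar video feature per frame, swaps in Pearson correlation as a simple stand-in for an information-theoretic synchrony measure, and uses a permutation test for the null hypothesis that audio and video are independent. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation, used here as a stand-in synchrony measure."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def permutation_p_value(audio, video, n_perm=500, seed=0):
    """Estimate the p-value under H0: audio and video are independent.

    Shuffling the video sequence destroys its temporal alignment with
    the audio, which simulates the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(pearson(audio, video))
    shuffled = list(video)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(audio, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def synthetic_example(seed=1, n=200):
    """Hypothetical data: one mouth region tracks the audio, one does not."""
    rng = random.Random(seed)
    audio = [rng.gauss(0, 1) for _ in range(n)]
    speaker_mouth = [a + rng.gauss(0, 0.5) for a in audio]
    silent_mouth = [rng.gauss(0, 1) for _ in range(n)]
    return audio, speaker_mouth, silent_mouth


if __name__ == "__main__":
    audio, speaker_mouth, silent_mouth = synthetic_example()
    print("speaker p-value:", permutation_p_value(audio, speaker_mouth))
    print("silent  p-value:", permutation_p_value(audio, silent_mouth))
```

A small p-value indicates that the observed synchrony is unlikely under independence, so the candidate with the smaller p-value is the more plausible speaker. On real recordings, consecutive frames are correlated in time, so a plain shuffle is likely too optimistic; this sketch ignores that.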

Paper Citation

in Harvard Style

Besson P. and Kunt M. (2006). Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection. In Proceedings of the 2nd International Workshop on Biosignal Processing and Classification - Volume 1: BPC, (ICINCO 2006), ISBN 978-972-8865-67-2, pages 106-115. DOI: 10.5220/0001224701060115

in Bibtex Style

@conference{bpc06,
author={Patricia Besson and Murat Kunt},
title={Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection},
booktitle={Proceedings of the 2nd International Workshop on Biosignal Processing and Classification - Volume 1: BPC, (ICINCO 2006)},
year={2006},
pages={106-115},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001224701060115},
isbn={978-972-8865-67-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Workshop on Biosignal Processing and Classification - Volume 1: BPC, (ICINCO 2006)
TI - Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection
SN - 978-972-8865-67-2
AU - Besson P.
AU - Kunt M.
PY - 2006
SP - 106
EP - 115
DO - 10.5220/0001224701060115