Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection

Patricia Besson, Murat Kunt

2006

Abstract

This work addresses the problem of detecting the speaker in audio-visual sequences by evaluating the synchrony between the audio and video signals. Prior to classification, an information-theoretic framework is applied to extract optimized audio features using video information. The classification step is then defined through a hypothesis testing framework so as to obtain confidence levels associated with the classifier outputs. This approach makes it possible to evaluate the efficiency of the whole classification process and, in particular, to evaluate the advantage of performing the feature extraction or not. It is shown that introducing a feature extraction step prior to classification increases the ability of the classifier to produce good relative instance scores.
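To illustrate the general idea of attaching a confidence level to a synchrony-based decision, here is a minimal sketch of a permutation test for dependence between one audio feature and one video feature. It is not the authors' method: the abstract does not specify the test statistic, so Pearson correlation is used here as a simple stand-in for an information-theoretic measure, and the names `pearson`, `synchrony_p_value`, `mouth_a` and `mouth_b` are hypothetical, as is the synthetic data.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def synchrony_p_value(audio, video, n_perm=500, seed=0):
    """Permutation test of H0: the audio and video features are independent.

    Returns the (add-one corrected) fraction of shuffled pairings whose
    absolute correlation is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson(audio, video))
    shuffled = list(video)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the temporal pairing between the signals
        if abs(pearson(audio, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic example (assumed data): one candidate region moves with the
# audio, the other is unrelated noise.
rng = random.Random(1)
audio = [rng.gauss(0, 1) for _ in range(200)]
mouth_a = [a + 0.5 * rng.gauss(0, 1) for a in audio]
mouth_b = [rng.gauss(0, 1) for _ in range(200)]

p_a = synchrony_p_value(audio, mouth_a)
p_b = synchrony_p_value(audio, mouth_b)
print("candidate A p-value:", p_a)
print("candidate B p-value:", p_b)
```

A small p-value for a candidate means the observed synchrony is unlikely under independence, so that candidate is more plausibly the speaker. Plain shuffling ignores the temporal correlation within each signal, so this is only illustrative of how a hypothesis test yields a confidence level alongside the raw score.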



Paper Citation


in Harvard Style

Besson P. and Kunt M. (2006). Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection. In Proceedings of the 2nd International Workshop on Biosignal Processing and Classification - Volume 1: BPC, (ICINCO 2006) ISBN 978-972-8865-67-2, pages 106-115. DOI: 10.5220/0001224701060115


in Bibtex Style

@conference{bpc06,
author={Patricia Besson and Murat Kunt},
title={Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection},
booktitle={Proceedings of the 2nd International Workshop on Biosignal Processing and Classification - Volume 1: BPC, (ICINCO 2006)},
year={2006},
pages={106-115},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001224701060115},
isbn={978-972-8865-67-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Workshop on Biosignal Processing and Classification - Volume 1: BPC, (ICINCO 2006)
TI - Hypothesis Testing as a Performance Evaluation Method for Multimodal Speaker Detection
SN - 978-972-8865-67-2
AU - Besson P.
AU - Kunt M.
PY - 2006
SP - 106
EP - 115
DO - 10.5220/0001224701060115