Authors:
Rong Fu and Ian D. Benest
Affiliation:
University of York, United Kingdom
Keyword(s):
Speaker Diarization, Model Complexity Selection, Universal Background Model.
Related Ontology Subjects/Areas/Topics:
Multimedia; Multimedia Databases, Indexing, Recognition and Retrieval; Multimedia Systems and Applications; Telecommunications
Abstract:
This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations recorded with a single central microphone. It is based on the ICSI-SRI Fall 2004 diarization system (Wooters et al., 2004), with a number of significant modifications. The new system is robust to different acoustic environments: it requires neither pre-trained models nor development sets to initialize its parameters. It determines the model complexity automatically, adapts each segment model from a Universal Background Model (UBM), and uses the cross-likelihood ratio (CLR) instead of the Bayesian Information Criterion (BIC) for merging. Finally, it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Together, these changes reduce the speaker diarization error rate from 25.36% for the baseline system (Wooters et al., 2004) to 21.37%.
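To illustrate the CLR merging test mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used form of the CLR, in which each cluster's frames are scored against the other cluster's model and normalized by the background model. As a simplification, each model here is a single diagonal-covariance Gaussian fitted directly to the data, whereas the paper adapts its segment models from the UBM. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a simplified stand-in for a GMM)."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small variance floor for numerical stability
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of `frames` under a diagonal Gaussian."""
    mean, var = model
    per_frame = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)
    return per_frame.mean()

def cross_likelihood_ratio(x_i, x_j, model_i, model_j, ubm):
    """Assumed CLR form: cross-scored log-likelihoods, each normalized by the UBM.

    Larger values suggest the two clusters belong to the same speaker.
    """
    return (avg_log_likelihood(x_i, model_j) - avg_log_likelihood(x_i, ubm)
            + avg_log_likelihood(x_j, model_i) - avg_log_likelihood(x_j, ubm))

# Synthetic 4-dimensional "features": two clusters from speaker A, one from speaker B.
rng = np.random.default_rng(0)
a1 = rng.normal(0.0, 1.0, size=(200, 4))
a2 = rng.normal(0.0, 1.0, size=(200, 4))
b = rng.normal(3.0, 1.0, size=(200, 4))

# Background model fitted on all pooled frames (an assumption of this sketch).
ubm = fit_diag_gaussian(np.vstack([a1, a2, b]))
m_a1, m_a2, m_b = (fit_diag_gaussian(x) for x in (a1, a2, b))

clr_same = cross_likelihood_ratio(a1, a2, m_a1, m_a2, ubm)
clr_diff = cross_likelihood_ratio(a1, b, m_a1, m_b, ubm)
print("same speaker:", clr_same, "different speakers:", clr_diff)
```

In an agglomerative scheme, the pair of clusters with the highest CLR would be the next candidate to merge. The abstract's intra-cluster/inter-cluster stopping criterion is not modeled here.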