
Ajmera, J., McCowan, I., and Bourlard, H. (2004). Ro-
bust speaker change detection. IEEE Signal Process-
ing Letters, 11(8):649–651.
Barras, C., Zhu, X., Meignier, S., and Gauvain, J.-L.
(2006). Multistage speaker diarization of broadcast
news. IEEE Transactions on Audio, Speech, and Lan-
guage Processing, 14(5):1505–1512.
Berg, T. L., Berg, A. C., Edwards, J., Maire, M., White,
R., Teh, Y.-W., Learned-Miller, E., and Forsyth, D.
(2004). Names and Faces in the News. In Proc. CVPR,
pages II–848–II–854 Vol.2.
Bertini, M., Del Bimbo, A., and Pala, P. (2001). Content-
based indexing and retrieval of TV news. Pattern
Recognition Letters, 22(5):503–516.
Bradski, G. R. (1998). Real Time Face and Object Tracking
as a Component of a Perceptual User Interface. In
Proc. WACV, pages 214–219.
Bradski, G. R. (2000). The OpenCV Library. Dr. Dobb’s
Journal of Software Tools, 25(11):120, 122–125.
Chen, S. S. and Gopalakrishnan, P. S. (1998). Speaker, En-
vironment And Channel Change Detection And Clus-
tering Via The Bayesian Information Criterion. In
Proc. DARPA Broadcast News Transcription and Un-
derstanding Workshop, pages 127–132.
Dugad, R., Ratakonda, K., and Ahuja, N. (1998). Robust
Video Shot Change Detection. In Proc. MMSP, pages
376–381.
El-Khoury, E., Senac, C., and Joly, P. (2010). Face-and-
clothing based people clustering in video content. In
Proc. MIR, pages 295–304.
Everingham, M. R., Sivic, J., and Zisserman, A. (2006).
“Hello! My name is... Buffy” – Automatic Naming
of Characters in TV Video. In Proc. BMVC, pages
92.1–92.10.
Gauvain, J.-L. and Lee, C.-H. (1994). Maximum a posteriori
estimation for multivariate Gaussian mixture observations
of Markov chains. IEEE Transactions on
Speech and Audio Processing, 2(2):291–298.
Khan, S., Rafibul Islam, M., Faizul, M., and Doll, D. (2008).
Speaker recognition using MFCC. International Journal
of Computer Science and Engineering System,
2(1).
Korshunov, P. and Ooi, W. T. (2011). Video quality for
face detection, recognition, and tracking. ACM Trans-
actions on Multimedia Computing, Communications,
and Applications, 7(3):14:1–14:21.
Levenshtein, V. I. (1966). Binary codes capable of correcting
deletions, insertions and reversals. Cybernetics
and Control Theory, 10(8):707–710.
Lienhart, R., Kuranov, A., and Pisarevsky, V. (2003). Em-
pirical Analysis of Detection Cascades of Boosted
Classifiers for Rapid Object Detection. In Proc.
DAGM, pages 297–304.
Maji, S. and Bajcsy, R. (2007). Fast Unsupervised Align-
ment of Video and Text for Indexing/Names and
Faces. In Proc. MM, pages 57–64.
Martin, A., Doddington, G., Kamm, T., Ordowski, M., and
Przybocki, M. (1997). The DET curve in assess-
ment of detection task performance. In Proc. EU-
ROSPEECH, pages 1895–1898.
Meignier, S. and Merlin, T. (2010). LIUM SpkDiarization:
An Open Source Toolkit For Diarization. In Proc.
CMU SPUD Workshop.
Otsu, N. (1979). A threshold selection method from gray-
level histograms. IEEE Transactions on Systems, Man
and Cybernetics, 9(1):62–66.
Reynolds, D. A., Quatieri, T. F., and Dunn, R. B. (2000).
Speaker verification using adapted Gaussian mixture
models. Digital Signal Processing, 10(1–3):19–41.
Rouvier, M., Dupuy, G., Gay, P., Khoury, E., Merlin, T.,
and Meignier, S. (2013). An Open-source State-of-
the-art Toolbox for Broadcast News Diarization. In
Proc. INTERSPEECH.
Sim, T., Baker, S., and Bsat, M. (2002). The CMU Pose,
Illumination, and Expression (PIE) database. In Proc.
FG, pages 46–51.
Sivic, J., Zitnick, C. L., and Szeliski, R. (2006). Finding
people in repeated shots of the same scene. In Proc.
BMVC, pages 93.1–93.10.
Smith, R. (2007). An overview of the Tesseract OCR En-
gine. In Proc. ICDAR, pages 629–633.
Viola, P. and Jones, M. J. (2001). Rapid Object Detection
using a Boosted Cascade of Simple Features. In Proc.
CVPR, pages I–511–I–518 Vol.1.
Viola, P. and Jones, M. J. (2004). Robust real-time face
detection. International Journal of Computer Vision,
57(2):137–154.
Zhang, Y.-F., Xu, C., Lu, H., and Huang, Y.-M. (2009).
Character identification in feature-length films using
global face-name matching. IEEE Transactions on
Multimedia, 11(7):1276–1288.
ACTIVE, an Extensible Cataloging Platform for Automatic Indexing of Audiovisual Content