Full Video Processing for Mobile Audio-Visual Identity Verification

Alexander Usoltsev, Dijana Petrovska-Delacrétaz, Khemiri Houssemeddine

Abstract

This paper describes a bi-modal biometric verification system based on the voice and face modalities that processes the full video instead of still images. The system is evaluated on the MOBIO corpus, and the results show a relative performance improvement of nearly 10% when the whole video is used. Fusing the face and speaker verification systems, using weights obtained by linear logistic regression, gives a relative performance improvement of 30% to 60% compared to the best uni-modal system. A proof-of-concept iPad application was developed based on the proposed bi-modal system.
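The abstract names linear logistic regression as the fusion method but gives no implementation details. The sketch below shows one common way such score-level fusion can be set up: a weight per modality plus a bias, fitted by gradient descent on the logistic loss. It is only an illustration under stated assumptions, not the authors' implementation. The function names (`train_fusion`, `fuse`, `make_toy_scores`, `error_rate`) are hypothetical, and the scores are synthetic Gaussian draws, not MOBIO data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(face, voice, labels, lr=0.1, epochs=1000):
    """Fit (w_face, w_voice, bias) by batch gradient descent on the logistic loss."""
    w_f = w_v = b = 0.0
    n = len(labels)
    for _ in range(epochs):
        g_f = g_v = g_b = 0.0
        for f, v, y in zip(face, voice, labels):
            err = sigmoid(w_f * f + w_v * v + b) - y
            g_f += err * f
            g_v += err * v
            g_b += err
        w_f -= lr * g_f / n
        w_v -= lr * g_v / n
        b -= lr * g_b / n
    return w_f, w_v, b


def fuse(f, v, params):
    # Fused score: weighted sum of the two uni-modal scores plus a bias.
    w_f, w_v, b = params
    return w_f * f + w_v * v + b


def make_toy_scores(n_per_class=200, seed=0):
    # ASSUMPTION: synthetic scores. Genuine trials are centred at +1 and
    # impostor trials at -1, independently for each modality.
    rng = random.Random(seed)
    face, voice, labels = [], [], []
    for y, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            face.append(rng.gauss(mean, 1.0))
            voice.append(rng.gauss(mean, 1.0))
            labels.append(y)
    return face, voice, labels


def error_rate(scores, labels, threshold=0.0):
    # Fraction of trials misclassified when accepting scores above the threshold.
    wrong = sum((s > threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


face, voice, labels = make_toy_scores()
params = train_fusion(face, voice, labels)
fused = [fuse(f, v, params) for f, v in zip(face, voice)]
```

Calling `error_rate(fused, labels)` and comparing it with `error_rate(face, labels)` and `error_rate(voice, labels)` gives a rough feel for how fusion can help on this toy data. In practice the weights would be fitted on a development set and evaluated on held-out trials.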

Paper Citation


in Harvard Style

Usoltsev A., Petrovska-Delacrétaz D. and Houssemeddine K. (2016). Full Video Processing for Mobile Audio-Visual Identity Verification. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 552-557. DOI: 10.5220/0005667305520557


in Bibtex Style

@conference{icpram16,
author={Alexander Usoltsev and Dijana Petrovska-Delacrétaz and Khemiri Houssemeddine},
title={Full Video Processing for Mobile Audio-Visual Identity Verification},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={552-557},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005667305520557},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Full Video Processing for Mobile Audio-Visual Identity Verification
SN - 978-989-758-173-1
AU - Usoltsev A.
AU - Petrovska-Delacrétaz D.
AU - Houssemeddine K.
PY - 2016
SP - 552
EP - 557
DO - 10.5220/0005667305520557