SEMANTIC CLASS DETECTORS IN VIDEO GENRE RECOGNITION

Michal Hradiš, Ivo Řezníček, Kamil Behúň

2012

Abstract

This paper presents our approach to video genre recognition, which we developed for the MediaEval 2011 evaluation. We treat genre recognition as a classification problem. Visual information is encoded in a standard way using local features and a Bag of Words (BoW) representation. The audio channel is parameterized similarly, starting from its spectrogram. Further, we exploit the available automatic speech transcripts and user-generated meta-data, for which we compute BoW representations as well. It is reasonable to expect that the semantic content of a video is strongly related to its genre; if this semantic information were available, it would make genre recognition simpler and more reliable. To this end, we used annotations for 345 semantic classes from the TRECVID 2011 semantic indexing task to train semantic class detectors. The responses of these detectors were then used as features for genre recognition. The paper explains the approach in detail, shows the relative performance of the individual features and their combinations measured on the MediaEval 2011 genre recognition dataset, and sketches possible future research. The results show that, although the meta-data is more informative than the content-based features, results improve when content-based information is added to the meta-data. Despite the fact that the semantic detectors were trained on a completely different dataset, using them as feature extractors on the target dataset provides better results than the original low-level audio and video features.
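The pipeline described above can be sketched as an early-fusion step: per-modality BoW histograms and the semantic-detector responses are concatenated into one feature vector per video, which is then fed to a classifier. This is only an illustrative sketch; all function names and dimensions are hypothetical and not taken from the paper.

```python
def l1_normalize(hist):
    """Normalize a bag-of-words histogram so modalities are comparable."""
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def fuse_features(visual_bow, audio_bow, text_bow, detector_scores):
    """Early fusion: concatenate normalized per-modality features.

    detector_scores stand in for the responses of pre-trained semantic
    class detectors (the paper uses 345 TRECVID concepts), used directly
    as additional features.
    """
    return (l1_normalize(visual_bow) + l1_normalize(audio_bow)
            + l1_normalize(text_bow) + list(detector_scores))

# Toy example: tiny vocabularies and three semantic detectors.
vec = fuse_features([2, 0, 2], [1, 3], [0, 1], [0.9, 0.1, 0.4])
print(len(vec))  # 3 + 2 + 2 + 3 = 10 dimensions
```

In practice each fused vector would be classified with, e.g., an SVM (Cortes and Vapnik, 1995), as the paper does.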

References

  1. Ayache, S. and Quénot, G. (2007). Evaluation of active learning strategies for video indexing. Signal Processing: Image Communication, 22(7-8):692-704.
  2. Ayache, S. and Quénot, G. (2008). Video Corpus Annotation Using Active Learning. In Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., and White, R., editors, Advances in Information Retrieval, volume 4956 of Lecture Notes in Computer Science, pages 187-198. Springer Berlin / Heidelberg.
  3. Ayache, S., Quénot, G., and Gensel, J. (2006). CLIPS-LSR Experiments at TRECVID 2006. In TREC Video Retrieval Evaluation Online Proceedings. TRECVID.
  4. Beran, V., Hradis, M., Otrusina, L., and Reznicek, I. (2011). Brno University of Technology at TRECVid 2011. In TRECVID 2011: Participant Notebook Papers and Slides, Gaithersburg, MD, US. National Institute of Standards and Technology.
  5. Brezeale, D. and Cook, D. J. (2008). Automatic Video Classification: A Survey of the Literature. IEEE Transactions on Systems Man and Cybernetics Part C Applications and Reviews, 38(3):416-430.
  6. Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273-297.
  7. Gauvain, J.-L., Lamel, L., and Adda, G. (2002). The LIMSI Broadcast News transcription system. Speech Communication, 37(1-2):89-108.
  8. Hradis, M., Reznicek, I., and Behun, K. (2011). Brno University of Technology at MediaEval 2011 Genre Tagging Task. In Working Notes Proceedings of the MediaEval 2011 Workshop, Pisa, Italy.
  9. Ionescu, B., Seyerlehner, K., Vertan, C., and Lambert, P. (2011). Audio-Visual Content Description for Video Genre Classification in the Context of Social Media. In MediaEval 2011 Workshop, Pisa, Italy.
  10. Larson, M., Eskevich, M., Ordelman, R., Kofler, C., Schmiedeke, S., and Jones, G. J. F. (2011). Overview of MediaEval 2011 Rich Speech Retrieval Task and Genre Tagging Task. In MediaEval 2011 Workshop, Pisa, Italy.
  11. Le, Q. V., Zou, W. Y., Yeung, S. Y., and Ng, A. Y. (2011). Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. Learning, pages 1-4.
  12. Lowe, D. G. (1999). Object Recognition from Local Scale-Invariant Features. In Proceedings of the International Conference on Computer Vision (ICCV '99) - Volume 2, page 1150, Washington, DC, USA. IEEE Computer Society.
  13. Mikolajczyk, K. (2004). Scale & Affine Invariant Interest Point Detectors. International Journal of Computer Vision, 60(1):63-86.
  14. Mikolajczyk, K. and Schmid, C. (2005). A Performance Evaluation of Local Descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 27(10):1615-1630.
  15. Perronnin, F., Sánchez, J., and Liu, Y. (2010). Large-scale image categorization with explicit data embedding. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2297-2304, San Francisco, CA.
  16. Rouvier, M. and Linares, G. (2011). LIA @ MediaEval 2011: Compact Representation of Heterogeneous Descriptors for Video Genre Classification. In MediaEval 2011 Workshop, Pisa, Italy.
  17. Smeaton, A. F., Over, P., and Kraaij, W. (2009). High-Level Feature Detection from Video in TRECVid: a 5-Year Retrospective of Achievements. In Divakaran, A., editor, Multimedia Content Analysis, Theory and Applications, pages 151-174. Springer Verlag, Berlin.
  18. Snoek, C. G. M., van de Sande, K. E. A., de Rooij, O., Huurnink, B., Gavves, E., Odijk, D., de Rijke, M., Gevers, T., Worring, M., Koelma, D. C., and Smeulders, A. W. M. (2010). The MediaMill TRECVID 2010 Semantic Video Search Engine. In TRECVID 2010: Participant Notebook Papers and Slides.
  19. Van de Sande, K. E. A., Gevers, T., and Snoek, C. G. M. (2010). Evaluating Color Descriptors for Object and Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1582-1596.
  20. Van Gemert, J. C., Veenman, C. J., Smeulders, A. W. M., and Geusebroek, J. M. (2010). Visual Word Ambiguity. PAMI, 32(7):1271-1283.
  21. You, J., Liu, G., and Perkis, A. (2010). A semantic framework for video genre classification and event analysis. Signal Processing: Image Communication, 25(4):287-302.


Paper Citation


in Harvard Style

Hradiš, M., Řezníček, I. and Behúň, K. (2012). SEMANTIC CLASS DETECTORS IN VIDEO GENRE RECOGNITION. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP (VISIGRAPP 2012), ISBN 978-989-8565-03-7, pages 640-646. DOI: 10.5220/0003868706400646


in Bibtex Style

@conference{visapp12,
author={Michal Hradiš and Ivo Řezníček and Kamil Behúň},
title={SEMANTIC CLASS DETECTORS IN VIDEO GENRE RECOGNITION},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012)},
year={2012},
pages={640-646},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003868706400646},
isbn={978-989-8565-03-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012)
TI - SEMANTIC CLASS DETECTORS IN VIDEO GENRE RECOGNITION
SN - 978-989-8565-03-7
AU - Hradiš M.
AU - Řezníček I.
AU - Behúň K.
PY - 2012
SP - 640
EP - 646
DO - 10.5220/0003868706400646