SVM-based Video Segmentation and Annotation of Lectures and Conferences

Stefano Masneri, Oliver Schreer


This paper presents a classification system for video lectures and conferences based on Support Vector Machines (SVM). The aim is to classify videos into four classes (talk, presentation, blackboard, mix). In addition, the system further analyses presentation segments to detect slide transitions, animations and dynamic content such as video embedded in the presentation. The approach uses various colour and facial features, extracted from two different datasets comprising several hundred hours of video, to train an SVM classifier. The system performs the classification on a frame-by-frame basis and does not require pre-computed shot-cut information. To avoid over-segmentation and to exploit the temporal correlation between successive frames, the per-frame results are merged every 50 frames into a single class. The presented results demonstrate the robustness and accuracy of the algorithm. Because the approach is general, the system can be easily adapted to other lecture datasets.
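
The merging step described above can be illustrated with a minimal sketch. The abstract does not state how the 50 per-frame results are combined, so the majority vote used here is an assumption, and the function name `merge_frame_labels` and its parameters are hypothetical names chosen for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence

# The four classes named in the abstract.
CLASSES = ("talk", "presentation", "blackboard", "mix")


def merge_frame_labels(frame_labels: Sequence[str], window: int = 50) -> List[str]:
    """Collapse per-frame labels into one label per block of `window` frames.

    Assumption: the block label is chosen by majority vote. The source only
    says that results are merged every 50 frames into a single class.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    merged = []
    for start in range(0, len(frame_labels), window):
        block = frame_labels[start:start + window]
        # most_common(1) returns the most frequent label in the block;
        # on a tie, the label encountered first in the block is returned.
        merged.append(Counter(block).most_common(1)[0][0])
    return merged


if __name__ == "__main__":
    # 110 hypothetical per-frame predictions: the last block is shorter than 50.
    per_frame = ["talk"] * 30 + ["presentation"] * 70 + ["mix"] * 10
    print(merge_frame_labels(per_frame))
```

With this rule, isolated misclassified frames inside a block do not create new segments, which is one way the over-segmentation mentioned in the abstract could be reduced.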



Paper Citation

in Harvard Style

Masneri S. and Schreer O. (2014). SVM-based Video Segmentation and Annotation of Lectures and Conferences. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014), ISBN 978-989-758-004-8, pages 425-432. DOI: 10.5220/0004686004250432
