Pixel-wise Ground Truth Annotation in Videos - An Semi-automatic Approach for Pixel-wise and Semantic Object Annotation

Julius Schöning, Patrick Faion, Gunther Heidemann

Abstract

In recent decades, a large variety of automatic, semi-automatic, and manual approaches to video segmentation and knowledge extraction from video data has been proposed. Due to the high complexity of both the spatial and temporal domains, this remains a challenging research area. Ground truth for video data is crucial for developing, training, and evaluating new algorithms. Pixel-wise ground truth annotation is usually time-consuming, does not capture semantic relations between objects, and uses only simple geometric primitives. We provide a brief review of related tools for video annotation and introduce our novel interactive, semi-automatic segmentation tool iSeg. Extending an earlier implementation, we improved iSeg with a semantic timeline, multithreading, and the use of ORB features. A performance evaluation of iSeg on four data sets is presented. Finally, we discuss possible opportunities and applications of semantic polygon-shaped video annotation, such as 3D reconstruction and video inpainting.
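To illustrate the core idea behind polygon-based keyframe annotation, the sketch below linearly interpolates a polygon outline between two annotated keyframes. This is a minimal illustration only, not the authors' implementation (iSeg additionally uses ORB features and interactive correction); the frame numbers and coordinates are hypothetical.

```python
# Sketch: propagate a polygon annotation between two keyframes by
# linear interpolation of corresponding vertices.

def interpolate_polygon(poly_a, poly_b, t):
    """Blend two polygons with equal vertex counts; t in [0, 1]."""
    if len(poly_a) != len(poly_b):
        raise ValueError("polygons must have the same number of vertices")
    return [((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
            for (xa, ya), (xb, yb) in zip(poly_a, poly_b)]

# Hypothetical keyframe annotations at frame 0 and frame 10
key0 = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
key10 = [(5.0, 5.0), (15.0, 5.0), (15.0, 15.0), (5.0, 15.0)]

# Estimated polygon for frame 5, halfway between the keyframes
frame5 = interpolate_polygon(key0, key10, 0.5)
print(frame5)  # [(2.5, 2.5), (12.5, 2.5), (12.5, 12.5), (2.5, 12.5)]
```

In a real tool, such interpolated outlines would serve only as an initial estimate for intermediate frames, to be refined by feature tracking or user interaction.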



Paper Citation


in Harvard Style

Schöning J., Faion P. and Heidemann G. (2016). Pixel-wise Ground Truth Annotation in Videos - An Semi-automatic Approach for Pixel-wise and Semantic Object Annotation. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 690-697. DOI: 10.5220/0005823306900697


in Bibtex Style

@conference{icpram16,
author={Julius Schöning and Patrick Faion and Gunther Heidemann},
title={Pixel-wise Ground Truth Annotation in Videos - An Semi-automatic Approach for Pixel-wise and Semantic Object Annotation},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={690-697},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005823306900697},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Pixel-wise Ground Truth Annotation in Videos - An Semi-automatic Approach for Pixel-wise and Semantic Object Annotation
SN - 978-989-758-173-1
AU - Schöning J.
AU - Faion P.
AU - Heidemann G.
PY - 2016
SP - 690
EP - 697
DO - 10.5220/0005823306900697