Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding

Joanna Isabelle Olszewska

Abstract

Reliable detection of objects of interest in complex visual scenes is of prime importance for video-surveillance applications. While most vision approaches deal with tracking visible or partially visible objects in single or multiple video streams, we propose a new approach that automatically detects all objects of interest in an analyzed scene, including those entirely hidden from a camera view while still being present in the scene. To this end, we have developed an innovative artificial-intelligence framework that embeds a computer-vision process fully integrated with symbolic knowledge-based reasoning. Our system has been evaluated on standard datasets consisting of video streams of real-world objects evolving in cluttered outdoor environments under difficult lighting conditions. The proposed approach shows excellent performance in both detection accuracy and robustness, and outperforms state-of-the-art methods.
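The abstract does not detail the paper's algorithm, but the core idea of combining per-camera detections with symbolic scene knowledge can be illustrated with a deliberately minimal toy sketch: if the knowledge base asserts that an object is part of the scene, yet no camera currently detects it, the object is inferred to be hidden. The object and camera identifiers below are hypothetical.

```python
# Toy sketch (not the paper's actual method): infer hidden objects by
# combining symbolic scene knowledge with per-camera detections.

def infer_hidden_objects(scene_objects, camera_detections):
    """Return the objects asserted to be in the scene but visible in no camera.

    scene_objects: set of object IDs the knowledge base asserts are present.
    camera_detections: dict mapping camera ID -> set of object IDs
    detected in that camera's current frame.
    """
    # Union of everything any camera currently sees.
    visible = set().union(*camera_detections.values()) if camera_detections else set()
    # Symbolic inference step: present in the scene but seen nowhere.
    return scene_objects - visible

# Hypothetical example: three objects asserted, two detected across cameras.
scene = {"person1", "person2", "bag1"}
detections = {"cam_A": {"person1"}, "cam_B": {"person1", "person2"}}
print(sorted(infer_hidden_objects(scene, detections)))  # ['bag1']
```

In the paper's setting the knowledge base would be maintained by the ontology-driven reasoning layer rather than given as a static set, but the set-difference inference above captures the flavor of detecting an object that is present yet invisible in every view.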

References

  1. Albanese, M., Molinaro, C., Persia, F., Picariello, A., and Subrahmanian, V. S. (2011). Finding unexplained activities in video. In Proceedings of the AAAI International Joint Conference on Artificial Intelligence, pages 1628-1634.
  2. Bai, L., Lao, S., Jones, G. J. F., and Smeaton, A. F. (2007). Video semantic content analysis based on ontology. In Proceedings of the IEEE International Machine Vision and Image Processing Conference, pages 117-124.
  3. Berclaz, J., Fleuret, F., Tueretken, E., and Fua, P. (2011). Multiple object tracking using K-shortest paths optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(9):1806-1819.
  4. Bernardin, K. and Stiefelhagen, R. (2008). Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008:1-10.
  5. Bhat, M. and Olszewska, J. I. (2014). DALES: Automated Tool for Detection, Annotation, Labelling and Segmentation of Multiple Objects in Multi-Camera Video Streams. In Proceedings of the ACL International Conference on Computational Linguistics Workshop, pages 87-94.
  6. Chen, L., Wei, H., and Ferryman, J. (2014). ReadingAct RGB-D action dataset and human action recognition from local features. Pattern Recognition Letters, 50:159-169.
  7. Dai, X. and Payandeh, S. (2013). Geometry-based object association and consistent labeling in multi-camera surveillance. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 3(2):175-184.
  8. Evans, M., Osborne, C. J., and Ferryman, J. (2013). Multicamera object detection and tracking with object size estimation. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 177-182.
  9. Ferrari, V., Tuytelaars, T., and Gool, L. V. (2006). Simultaneous object recognition and segmentation from single or multiple model views. International Journal of Computer Vision, 67(2):159-188.
  10. Ferryman, J., Hogg, D., Sochman, J., Behera, A., Rodriguez-Serrano, J. A., Worgan, S., Li, L., Leung, V., Evans, M., Cornic, P., Herbin, S., Schlenger, S., and Dose, M. (2013). Robust abandoned object detection integrating wide area visual surveillance and social context. Pattern Recognition Letters, 34(7):789-798.
  11. Fleuret, F., Berclaz, J., Lengagne, R., and Fua, P. (2008). Multicamera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):267-282.
  12. Gomez-Romero, J., Patricio, M. A., Garcia, J., and Molina, J. M. (2011). Ontology-based context representation and reasoning for object tracking and scene interpretation in video. Expert Systems with Applications, 38(6):7494-7510.
  13. Jeong, J.-W., Hong, H.-K., and Lee, D.-H. (2011). Ontology-based automatic video annotation technique in smart TV environment. IEEE Transactions on Consumer Electronics, 57(4):1830-1836.
  14. Kasturi, R., Goldgof, D., Soundararajan, P., Manohar, V., Garofolo, J., Boonstra, M., Korzhova, V., and Zhang, J. (2009). Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):319-336.
  15. Lehmann, J., Neumann, B., Bohlken, W., and Hotz, L. (2014). A robot waiter that predicts events by high-level scene interpretation. In Proceedings of the International Conference on Agents and Artificial Intelligence, pages I.469-I.476.
  16. Mavrinac, A. and Chen, X. (2013). Modeling coverage in camera networks: A survey. International Journal of Computer Vision, 101(1):205-226.
  17. Natarajan, P. and Nevatia, R. (2005). EDF: A framework for semantic annotation of video. In Proceedings of the IEEE International Conference on Computer Vision Workshops, page 1876.
  18. Olszewska, J. I. (2012). Multi-target parametric active contours to support ontological domain representation. In Proceedings of the RFIA Conference, pages 779-784.
  19. Olszewska, J. I. (2013). Multi-scale, multi-feature vector flow active contours for automatic multiple-face detection. In Proceedings of the International Conference on Bio-Inspired Systems and Signal Processing.
  20. Olszewska, J. I. (2015). Multi-camera video object recognition using active contours. In Proceedings of the International Conference on Bio-Inspired Systems and Signal Processing, pages 379-384.
  21. Olszewska, J. I. and McCluskey, T. L. (2011). Ontology-coupled active contours for dynamic video scene understanding. In Proceedings of the IEEE International Conference on Intelligent Engineering Systems, pages 369-374.
  22. Park, H.-S. and Cho, S.-B. (2008). A fuzzy rule-based system with ontology for summarization of multi-camera event sequences. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing, LNCS 5097, pages 850-860.
  23. Remagnino, P., Shihab, A. I., and Jones, G. A. (2004). Distributed intelligence for multi-camera visual surveillance. Pattern Recognition, 37(4):675-689.
  24. Riboni, D. and Bettini, C. (2011). COSAR: Hybrid reasoning for context-aware activity recognition. Personal and Ubiquitous Computing, 15(3):271-289.
  25. Sridhar, M., Cohn, A. G., and Hogg, D. C. (2010). Unsupervised learning of event classes from video. In Proceedings of the AAAI International Conference on Artificial Intelligence, pages 1631-1638.
  26. Vrusias, B., Makris, D., Renno, J.-P., Newbold, N., Ahmad, K., and Jones, G. (2007). A framework for ontology enriched semantic annotation of CCTV video. In Proceedings of the IEEE International Workshop on Image Analysis for Multimedia Interactive Services, page 5.
  27. Yilmaz, A., Javed, O., and Shah, M. (2006). Object Tracking: A Survey. ACM Computing Surveys, 38(4):13.


Paper Citation


in Harvard Style

Olszewska J. (2016). Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-172-4, pages 223-229. DOI: 10.5220/0005852302230229


in Bibtex Style

@conference{icaart16,
author={Joanna Isabelle Olszewska},
title={Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2016},
pages={223-229},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005852302230229},
isbn={978-989-758-172-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding
SN - 978-989-758-172-4
AU - Olszewska J.
PY - 2016
SP - 223
EP - 229
DO - 10.5220/0005852302230229