Toward Object Recognition with Proto-objects and Proto-scenes

Fabian Nasse, Rene Grzeszick, Gernot A. Fink

2014

Abstract

In this paper, a bottom-up approach for detecting and recognizing objects in complex scenes is presented. In contrast to top-down methods, no prior knowledge about the objects is required. Instead, two different views on the data are computed: First, a GIST descriptor is used for clustering scenes with a similar global appearance, which produces a set of Proto-Scenes. Second, a visual attention model based on hierarchical multi-scale segmentation and feature integration is proposed. Regions of interest that are likely to contain an arbitrary object, a Proto-Object, are determined. These Proto-Object regions are then represented by a Bag-of-Features using Spatial Visual Words. The bottom-up approach makes the detection and recognition tasks more challenging, but also more efficient and easier to apply to an arbitrary set of objects. This is an important step toward analyzing complex scenes in an unsupervised manner. The bottom-up knowledge is combined with an informed system that associates Proto-Scenes with the objects that may occur in them, and an object classifier is trained for recognizing the Proto-Objects. In experiments on the VOC2011 database, the proposed multi-scale visual attention model is compared with current state-of-the-art models for Proto-Object detection. Additionally, the Proto-Objects are classified with respect to the VOC object set.
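The first of the two bottom-up views, grouping scenes with similar global appearance into Proto-Scenes, amounts to clustering one global descriptor per image, for which Lloyd's k-means algorithm [17] is the standard tool. The sketch below illustrates the idea only: the descriptor dimensionality, the synthetic data, the fixed iteration count, and the cluster count are illustrative stand-ins, not the configuration used in the paper.

```python
import numpy as np

def lloyd_kmeans(X, k, iters=20, init=None):
    """Minimal Lloyd's algorithm: alternate between assigning each point
    to its nearest centroid and recomputing centroids as cluster means."""
    centroids = np.asarray(X[:k] if init is None else init, dtype=float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Distance of every point to every centroid, shape (n_points, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Stand-in "GIST descriptors": tight point clouds around three hypothetical
# scene prototypes (a real GIST vector would summarize the whole image).
rng = np.random.default_rng(1)
protos = rng.normal(size=(3, 16))
X = np.vstack([p + 0.05 * rng.normal(size=(30, 16)) for p in protos])

# One initial centroid per block keeps this toy example deterministic.
labels, centroids = lloyd_kmeans(X, k=3, init=X[[0, 30, 60]])
```

Each resulting cluster id plays the role of a Proto-Scene label; in the paper's informed system, such a label would then be associated with the set of objects likely to occur in scenes of that cluster.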

References

  1. Achanta, R., Hemami, S., Estrada, F., and Susstrunk, S. (2009). Frequency-tuned salient region detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1597-1604.
  2. Alexe, B., Deselaers, T., and Ferrari, V. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2189-2202.
  3. Borji, A. and Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):185-207.
  4. Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman, A. (2011). The devil is in the details: an evaluation of recent feature encoding methods. In BMVC.
  5. Cheng, M.-M., Zhang, G.-X., Mitra, N. J., Huang, X., and Hu, S.-M. (2011). Global contrast based salient region detection. In IEEE CVPR, pages 409-416.
  6. Divvala, S. K., Hoiem, D., Hays, J. H., Efros, A. A., and Hebert, M. (2009). An empirical study of context in object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1271-1278.
  7. Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., and Schmid, C. (2009). Evaluation of gist descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval, page 19. ACM.
  8. Elazary, L. and Itti, L. (2008). Interesting objects are visually salient. Journal of Vision, 8(3):1-15.
  9. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2011). The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Results. http://www.pascalnetwork.org/challenges/VOC/voc2011/workshop/index.html.
  10. Felzenszwalb, P., Girshick, R., McAllester, D., and Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627-1645.
  11. Grzeszick, R., Rothacker, L., and Fink, G. A. (2013). Bag-of-features representations using spatial visual vocabularies for object classification. In IEEE Intl. Conf. on Image Processing, Melbourne, Australia.
  12. Haxhimusa, Y., Ion, A., and Kropatsch, W. G. (2006). Irregular pyramid segmentations with stochastic graph decimation strategies. In CIARP, pages 277-286.
  13. Hou, X. and Zhang, L. (2007). Saliency detection: A spectral residual approach. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8.
  14. Itti, L., Koch, C., and Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259.
  15. Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2169-2178.
  16. Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., and Shum, H.-Y. (2011). Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):353-367.
  17. Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-137.
  18. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision, 60(2):91-110.
  19. Nasse, F. and Fink, G. A. (2012). A bottom-up approach for learning visual object detection models from unreliable sources. In Pattern Recognition: 34th DAGM Symposium, Graz.
  20. Oliva, A. (2005). Gist of the scene. Neurobiology of attention, 696:64.
  21. Oliva, A., Torralba, A., et al. (2006). Building the gist of a scene: The role of global image features in recognition. Progress in brain research, 155:23.
  22. Rutishauser, U., Walther, D., Koch, C., and Perona, P. (2004). Is bottom-up attention useful for object recognition? In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages II-37-II-44.
  23. Walther, D., Itti, L., Riesenhuber, M., Poggio, T., and Koch, C. (2002). Attentional selection for object recognition: A gentle way. In Proceedings of the Second International Workshop on Biologically Motivated Computer Vision, BMCV '02, pages 472-479, London, UK. Springer-Verlag.
  24. Zhai, Y. and Shah, M. (2006). Visual attention detection in video sequences using spatiotemporal cues. In Proceedings of the 14th Annual ACM International Conference on Multimedia, MULTIMEDIA '06, pages 815-824, New York, NY, USA. ACM.

Paper Citation


in Harvard Style

Nasse F., Grzeszick R. and Fink G. (2014). Toward Object Recognition with Proto-objects and Proto-scenes. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 284-291. DOI: 10.5220/0004657902840291


in Bibtex Style

@conference{visapp14,
author={Fabian Nasse and Rene Grzeszick and Gernot A. Fink},
title={Toward Object Recognition with Proto-objects and Proto-scenes},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={284-291},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004657902840291},
isbn={978-989-758-004-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Toward Object Recognition with Proto-objects and Proto-scenes
SN - 978-989-758-004-8
AU - Nasse F.
AU - Grzeszick R.
AU - Fink G.
PY - 2014
SP - 284
EP - 291
DO - 10.5220/0004657902840291