CASCADE OF MULTI-LEVEL MULTI-INSTANCE CLASSIFIERS FOR IMAGE ANNOTATION

Cam-Tu Nguyen, Ha Vu Le, Takeshi Tokuyama

2011

Abstract

This paper introduces a new scheme for automatic image annotation based on cascading multi-level multi-instance classifiers (CMLMI). The proposed scheme employs a hierarchy for visual feature extraction: the feature set includes features extracted from the whole image at the coarsest level and from overlapping sub-regions at finer levels. Multi-instance learning (MIL) is used to learn the “weak classifiers” for these levels in a cascade. The underlying idea is that the coarse levels are suitable for background labels such as “forest” and “city”, while finer levels bring useful information about foreground objects like “tiger” and “car”. The cascade allows the scheme to incorporate “important” negative samples during learning, thereby reducing the “weak labeling” problem by excluding ambiguous background labels associated with the negative samples. Experiments show that CMLMI achieves significant improvements over baseline methods as well as existing MIL-based methods.
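The feature hierarchy described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the sub-regions are laid out, so everything here is an assumption made for illustration: level 0 is the whole image, the window at level l covers 1/2^l of the image along each axis, and windows slide by half their size so that neighbours overlap by 50%. The names `level_regions` and `image_bags` are hypothetical and do not come from the paper. Each level's list of boxes plays the role of one MIL "bag" whose instances are the regions.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def level_regions(height: int, width: int, level: int) -> List[Box]:
    """Return the overlapping windows for one level of the hierarchy.

    Assumed layout (not taken from the paper): level 0 is the whole
    image; at level l the window spans 1/2**l of each axis and moves
    with a half-window stride, giving 50% overlap between neighbours.
    """
    if level == 0:
        return [(0, 0, height, width)]
    win_h, win_w = height // 2 ** level, width // 2 ** level
    if win_h < 2 or win_w < 2:
        raise ValueError("image too small for this level")
    step_h, step_w = win_h // 2, win_w // 2
    boxes: List[Box] = []
    for top in range(0, height - win_h + 1, step_h):
        for left in range(0, width - win_w + 1, step_w):
            boxes.append((top, left, top + win_h, left + win_w))
    return boxes


def image_bags(height: int, width: int, num_levels: int) -> List[List[Box]]:
    """Build one bag of regions per level, coarsest level first."""
    return [level_regions(height, width, lvl) for lvl in range(num_levels)]


bags = image_bags(64, 64, 3)
print([len(bag) for bag in bags])  # [1, 9, 49]
```

Under these assumptions a 64x64 image yields 1, 9, and 49 regions at levels 0, 1, and 2. A classifier for a background label such as “forest” would look at the single coarse region, while a classifier for a foreground label such as “tiger” would look at the finer bags. The cascaded MIL training itself is not shown.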



Paper Citation


in Harvard Style

Nguyen C., Le H. and Tokuyama T. (2011). CASCADE OF MULTI-LEVEL MULTI-INSTANCE CLASSIFIERS FOR IMAGE ANNOTATION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 14-23. DOI: 10.5220/0003634400140023


in Bibtex Style

@conference{kdir11,
author={Cam-Tu Nguyen and Ha Vu Le and Takeshi Tokuyama},
title={CASCADE OF MULTI-LEVEL MULTI-INSTANCE CLASSIFIERS FOR IMAGE ANNOTATION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={14-23},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003634400140023},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - CASCADE OF MULTI-LEVEL MULTI-INSTANCE CLASSIFIERS FOR IMAGE ANNOTATION
SN - 978-989-8425-79-9
AU - Nguyen C.
AU - Le H.
AU - Tokuyama T.
PY - 2011
SP - 14
EP - 23
DO - 10.5220/0003634400140023