A Multiple Instance Learning Approach to Image Annotation with Saliency Map

Tran Phuong Nhung, Cam-Tu Nguyen, Jinhee Chun, Ha Vu Le, Takeshi Tokuyama

2013

Abstract

This paper presents a novel approach to image annotation based on multi-instance learning (MIL) and saliency maps. Image annotation is the automatic process of assigning labels to images so as to enable semantic image retrieval. The problem is inherently ambiguous: a label is assigned to the whole image even though it may correspond only to a small region within it. MIL methods are therefore well suited to resolving these ambiguities during learning. Saliency detection, on the other hand, aims at separating foreground regions from background regions in images. Given this information, labels and image regions can be aligned more accurately, i.e., foreground (background) labels are more strongly associated with foreground (background) areas. Our proposed method, an ensemble of MIL classifiers built from two views (foreground/background), improves annotation performance over baseline methods that do not exploit saliency information.
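The two-view ensemble idea can be illustrated with a short sketch. The code below is not the authors' method: it assumes region descriptors and per-region saliency scores are given as inputs, approximates the MIL step by max-pooling region features into a single bag vector, and uses hypothetical helper names (split_regions_by_saliency, bag_embedding, ensemble_score) introduced here purely for illustration.

```python
# Minimal sketch of a foreground/background two-view ensemble on top of a
# pooling-based MIL approximation. Saliency computation, segmentation, and
# feature extraction are assumed to happen upstream.
import numpy as np
from sklearn.linear_model import LogisticRegression

def split_regions_by_saliency(region_feats, region_saliency, threshold=0.5):
    """Split an image's region features into foreground/background bags by saliency."""
    fg = region_feats[region_saliency >= threshold]
    bg = region_feats[region_saliency < threshold]
    return fg, bg

def bag_embedding(instances, dim):
    """Max-pool instance features into one bag vector (a common MIL simplification)."""
    if len(instances) == 0:
        return np.zeros(dim)
    return instances.max(axis=0)

def train_view_classifier(bags, labels, dim):
    """Train one binary classifier (per label) on pooled bag embeddings of one view."""
    X = np.stack([bag_embedding(b, dim) for b in bags])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf

def ensemble_score(fg_clf, bg_clf, fg_bag, bg_bag, dim, w_fg=0.5):
    """Combine foreground-view and background-view scores by a weighted average."""
    s_fg = fg_clf.predict_proba(bag_embedding(fg_bag, dim)[None])[0, 1]
    s_bg = bg_clf.predict_proba(bag_embedding(bg_bag, dim)[None])[0, 1]
    return w_fg * s_fg + (1.0 - w_fg) * s_bg

# Toy usage with synthetic data (real inputs would be region descriptors + saliency maps).
rng = np.random.default_rng(0)
dim, n_images = 16, 40
fg_bags, bg_bags, y = [], [], []
for i in range(n_images):
    feats = rng.normal(size=(rng.integers(3, 8), dim))
    sal = rng.random(len(feats))
    label = int(i % 2 == 0)
    if label:  # inject a weak foreground-only signal for positive images
        feats[sal >= 0.5, 0] += 2.0
    fg, bg = split_regions_by_saliency(feats, sal)
    fg_bags.append(fg); bg_bags.append(bg); y.append(label)

fg_clf = train_view_classifier(fg_bags, y, dim)
bg_clf = train_view_classifier(bg_bags, y, dim)
print(ensemble_score(fg_clf, bg_clf, fg_bags[0], bg_bags[0], dim))
```

The weighting in ensemble_score is one plausible way to encode the abstract's observation that foreground labels should rely more on foreground regions (and background labels on background regions); the actual combination scheme used in the paper may differ.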



Paper Citation


in Harvard Style

Phuong Nhung T., Nguyen C., Chun J., Le H. and Tokuyama T. (2013). A Multiple Instance Learning Approach to Image Annotation with Saliency Map. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013) ISBN 978-989-8565-75-4, pages 152-159. DOI: 10.5220/0004543901520159


in BibTeX Style

@conference{kdir13,
author={Tran Phuong Nhung and Cam-Tu Nguyen and Jinhee Chun and Ha Vu Le and Takeshi Tokuyama},
title={A Multiple Instance Learning Approach to Image Annotation with Saliency Map},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={152-159},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004543901520159},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - A Multiple Instance Learning Approach to Image Annotation with Saliency Map
SN - 978-989-8565-75-4
AU - Phuong Nhung T.
AU - Nguyen C.
AU - Chun J.
AU - Le H.
AU - Tokuyama T.
PY - 2013
SP - 152
EP - 159
DO - 10.5220/0004543901520159