A Survey of Extended Methods to the Bag of Visual Words for Image Categorization and Retrieval

Mouna Dammak, Mahmoud Mejdoub, Chokri Ben Amar

2014

Abstract

The semantic gap is a crucial issue in computer vision. Users want to retrieve images at a semantic level, but image descriptions can only provide low-level similarity. Bridging high-level semantic concepts and low-level visual features is therefore a challenging task. The Bag of visual Words (BoW) model has emerged to address this difficulty in a general way, through techniques that learn semantic vocabularies from local features. Despite its simplicity and effectiveness, constructing the codebook is a critical step, and image signatures are ordinarily obtained through coding and pooling steps; yet building a compact codebook at a reduced computational cost remains difficult. Several approaches attempt to overcome these difficulties and improve the image representation. In this paper, we present a survey covering the most important published approaches for image categorization and retrieval.
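
As a minimal illustration of the baseline BoW pipeline the survey builds on, the sketch below (an illustrative assumption, not code from any cited work) builds a codebook with k-means over pooled local descriptors, then encodes an image by hard-assignment coding followed by sum pooling into an L2-normalized histogram. The function names, descriptor dimensions, and parameter values are hypothetical.

# Minimal sketch of the standard BoW pipeline, assuming SIFT-like local
# descriptors have already been extracted per image.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, k=256, seed=0):
    """Quantize a pool of local descriptors into k visual words (the codebook)."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(descriptors)

def encode_image(descriptors, codebook):
    """Hard-assignment coding + sum pooling: one histogram (signature) per image."""
    words = codebook.predict(descriptors)            # coding: nearest visual word
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)     # pooling + L2 normalization

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    all_desc = rng.normal(size=(5000, 128))          # stand-in for SIFT descriptors
    codebook = build_codebook(all_desc, k=64)
    image_desc = rng.normal(size=(300, 128))         # descriptors of one image
    print(encode_image(image_desc, codebook).shape)  # (64,) BoW signature

The extensions reviewed in the survey replace the hard-assignment coding step (e.g., soft assignment, sparse coding, Fisher vectors) or enrich the pooling step with spatial information, while keeping this overall structure.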



Paper Citation


in Harvard Style

Dammak M., Mejdoub M. and Ben Amar C. (2014). A Survey of Extended Methods to the Bag of Visual Words for Image Categorization and Retrieval. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 676-683. DOI: 10.5220/0004750506760683


in Bibtex Style

@conference{visapp14,
author={Mouna Dammak and Mahmoud Mejdoub and Chokri Ben Amar},
title={A Survey of Extended Methods to the Bag of Visual Words for Image Categorization and Retrieval},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={676-683},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004750506760683},
isbn={978-989-758-004-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - A Survey of Extended Methods to the Bag of Visual Words for Image Categorization and Retrieval
SN - 978-989-758-004-8
AU - Dammak M.
AU - Mejdoub M.
AU - Ben Amar C.
PY - 2014
SP - 676
EP - 683
DO - 10.5220/0004750506760683