How Effective Are Aggregation Methods on Binary Features?

Giuseppe Amato, Fabrizio Falchi, Lucia Vadicamo

Abstract

During the last decade, various local features have been proposed and used to support Content-Based Image Retrieval and object recognition tasks. Local features allow local structures to be matched effectively between images, but the cost of extracting and pairwise comparing the local descriptors becomes a bottleneck when mobile devices and/or large databases are used. Two major directions have been followed to improve the efficiency of approaches based on local features. On the one hand, the cost of extracting, representing and matching local visual descriptors has been reduced by defining binary local features. On the other hand, methods for quantizing or aggregating local features have been proposed to scale image matching up to a very large scale. In this paper, we perform an extensive comparison of state-of-the-art aggregation methods applied to ORB binary descriptors. Our results show that the use of aggregation methods on binary local features is generally effective even if, as expected, there is a loss of performance compared to the same approaches applied to non-binary features. However, aggregations of binary features represent a worthwhile option when one needs to use devices with very low CPU and memory resources, such as mobile and wearable devices.
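To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary descriptors are stored as Python integers, compares them with the Hamming distance, and aggregates them with a simple bag-of-words histogram, which is only one possible aggregation method (the abstract does not say which methods are compared). The names `hamming`, `bag_of_words`, `codebook` and the toy values are hypothetical.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def bag_of_words(descriptors, codebook):
    """Aggregate local descriptors into one histogram with a bin per codeword.

    Each descriptor is assigned to its nearest codeword under Hamming distance.
    """
    counts = Counter(
        min(range(len(codebook)), key=lambda k: hamming(d, codebook[k]))
        for d in descriptors
    )
    return [counts.get(k, 0) for k in range(len(codebook))]


# Toy 8-bit values chosen for illustration (assumption);
# real ORB descriptors are 256 bits long.
codebook = [0b00000000, 0b11111111, 0b11110000]
image_descriptors = [0b00000001, 0b11111110, 0b11100000, 0b00000011]

histogram = bag_of_words(image_descriptors, codebook)
print(histogram)  # [2, 1, 1]
```

The point of the sketch is that an image is then represented by one fixed-length vector instead of a variable-size set of local descriptors, so two images can be compared with a single vector comparison rather than many pairwise descriptor matches.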

Paper Citation


in Harvard Style

Amato G., Falchi F. and Vadicamo L. (2016). How Effective Are Aggregation Methods on Binary Features? In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 566-573. DOI: 10.5220/0005719905660573


in Bibtex Style

@conference{visapp16,
author={Giuseppe Amato and Fabrizio Falchi and Lucia Vadicamo},
title={How Effective Are Aggregation Methods on Binary Features?},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={566-573},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005719905660573},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - How Effective Are Aggregation Methods on Binary Features?
SN - 978-989-758-175-5
AU - Amato G.
AU - Falchi F.
AU - Vadicamo L.
PY - 2016
SP - 566
EP - 573
DO - 10.5220/0005719905660573