Using Apache Lucene to Search Vector of Locally Aggregated Descriptors

Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, Lucia Vadicamo

Abstract

Surrogate Text Representation (STR) is an effective approach to efficient similarity search in metric spaces using conventional text search engines, such as Apache Lucene. The technique compares permutations of a set of reference objects in place of the original metric distance. However, the Achilles heel of the STR approach is the need to reorder the result set of the search according to the metric distance. This forces the use of a support database to store the original objects, which requires efficient random I/O on fast secondary memory (such as flash-based storage). In this paper, we propose to extend the Surrogate Text Representation to specifically address a class of visual metric objects known as Vector of Locally Aggregated Descriptors (VLAD). The approach represents the individual sub-vectors forming the VLAD vector with the STR, which provides a finer representation of the vector and allows us to eliminate the reordering phase. Experiments on a publicly available dataset show that the extended STR outperforms the baseline STR, achieving satisfactory performance close to that obtained with the original VLAD vectors.
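As an illustration of the idea described in the abstract, the following Python sketch builds a surrogate text from the permutation of reference objects and then applies it per sub-vector of a VLAD-like vector. The abstract does not specify the encoding, so several details here are assumptions made only for illustration: the function names `surrogate_text` and `vlad_surrogate_text`, the Euclidean distance, the rule that the reference at rank r is repeated k - r times, the term prefixes, and the use of one shared reference set for all sub-vectors.

```python
import math


def surrogate_text(obj, references, k, prefix="R"):
    """Return a space-separated surrogate text for obj (illustrative sketch).

    The references are ranked by Euclidean distance from obj. The reference
    at rank r (0-based) among the k closest is repeated k - r times, so that
    closer references get a higher term frequency in a text search engine.
    The repetition rule is an assumption, not taken from the abstract.
    """
    ranked = sorted(range(len(references)),
                    key=lambda i: math.dist(obj, references[i]))
    terms = []
    for rank, ref_id in enumerate(ranked[:k]):
        terms.extend([f"{prefix}{ref_id}"] * (k - rank))
    return " ".join(terms)


def vlad_surrogate_text(vlad, sub_dim, references, k):
    """Apply surrogate_text to each sub-vector of a VLAD-like vector.

    Terms carry the sub-vector index as a prefix (hypothetical naming) so
    that permutations computed for different sub-vectors do not mix.
    """
    parts = []
    for start in range(0, len(vlad), sub_dim):
        sub = vlad[start:start + sub_dim]
        parts.append(surrogate_text(sub, references, k,
                                    prefix=f"S{start // sub_dim}R"))
    return " ".join(parts)


# Toy data: four 2-D reference objects and a 4-D vector split in two halves.
references = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(surrogate_text((0.9, 0.2), references, k=2))
print(vlad_surrogate_text((0.9, 0.2, 0.1, 0.8), 2, references, k=2))
```

The resulting strings could be indexed as ordinary text fields, so that a text search engine's term-frequency scoring approximates a comparison of permutations. How the paper actually weights terms and chooses reference objects is not stated in the abstract.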



Paper Citation


in Harvard Style

Amato G., Bolettieri P., Falchi F., Gennaro C. and Vadicamo L. (2016). Using Apache Lucene to Search Vector of Locally Aggregated Descriptors. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016), ISBN 978-989-758-175-5, pages 383-392. DOI: 10.5220/0005722503830392


in Bibtex Style

@conference{visapp16,
author={Giuseppe Amato and Paolo Bolettieri and Fabrizio Falchi and Claudio Gennaro and Lucia Vadicamo},
title={Using Apache Lucene to Search Vector of Locally Aggregated Descriptors},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={383-392},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005722503830392},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Using Apache Lucene to Search Vector of Locally Aggregated Descriptors
SN - 978-989-758-175-5
AU - Amato G.
AU - Bolettieri P.
AU - Falchi F.
AU - Gennaro C.
AU - Vadicamo L.
PY - 2016
SP - 383
EP - 392
DO - 10.5220/0005722503830392