Keyword(s): High-Dimensional Vector Streams, Approximate Nearest Neighbor Search, Product Quantization, Hierarchical Navigable Small World Graphs, Classified Ad, Trading Platform.

Abstract: This paper addresses the vector stream similarity search problem, defined as: “Given a (high-dimensional) vector q and a time interval T, find a ranked list of vectors, retrieved from a vector stream, that are similar to q and that were received in the time interval T.” The paper first introduces a family of methods, called staged vector stream similarity search methods, or briefly SVS methods, to help solve this problem. SVS methods are continuous in the sense that they do not depend on having the full set of vectors available beforehand, but adapt to the vector stream. The paper then presents experiments to assess the performance of two SVS methods, one based on product quantization, called staged IVFADC, and another based on Hierarchical Navigable Small World graphs, called staged HNSW. The experiments with staged IVFADC use well-known image datasets, while those with staged HNSW use real data. The paper concludes with a brief description of a proof-of-concept implementation of a classified ad retrieval tool that uses staged HNSW.
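To make the problem statement concrete, the following is a minimal sketch of a query over a vector stream restricted to a time interval T. It is not the paper's staged IVFADC or staged HNSW: it assumes, purely for illustration, that the stream is split into fixed-length time buckets ("stages") and that each stage is scanned exhaustively with cosine similarity. The class name `StagedBruteForceIndex`, the `stage_length` parameter, and the choice of cosine similarity are all hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class StagedBruteForceIndex:
    """Toy index (an assumption, not the paper's method): one bucket per time stage."""

    def __init__(self, stage_length):
        self.stage_length = stage_length
        self.stages = {}  # stage number -> list of (timestamp, vec_id, vector)

    def add(self, timestamp, vec_id, vector):
        # Vectors are appended as they arrive; no full dataset is needed up front.
        stage = int(timestamp // self.stage_length)
        self.stages.setdefault(stage, []).append((timestamp, vec_id, vector))

    def search(self, q, t_start, t_end, k=3):
        # Visit only the stages that overlap the interval T = [t_start, t_end].
        first = int(t_start // self.stage_length)
        last = int(t_end // self.stage_length)
        candidates = []
        for stage in range(first, last + 1):
            for timestamp, vec_id, vector in self.stages.get(stage, []):
                # Boundary stages may hold vectors outside T, so filter by timestamp.
                if t_start <= timestamp <= t_end:
                    candidates.append((cosine_similarity(q, vector), vec_id))
        # Ranked list: the k most similar vectors, best first.
        return heapq.nlargest(k, candidates)


index = StagedBruteForceIndex(stage_length=10)
index.add(1, "a", [1.0, 0.0])
index.add(12, "b", [0.9, 0.1])
index.add(15, "c", [0.0, 1.0])
index.add(31, "d", [1.0, 0.0])
ranked = index.search([1.0, 0.0], t_start=10, t_end=20, k=2)
print([vec_id for _, vec_id in ranked])
```

In this sketch the exhaustive scan inside each stage is the part an approximate index such as IVFADC or HNSW would replace; how the paper actually builds and combines its stages is not described in the abstract.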

Pinheiro, J.; Borges, L.; Martins da Silva, B.; Leme, L. and Casanova, M. (2023). Indexing High-Dimensional Vector Streams. In Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-648-4; ISSN 2184-4992, SciTePress, pages 32-43. DOI: 10.5220/0011758900003467

@conference{iceis23,
  author={João Pinheiro and Lucas Borges and Bruno {Martins da Silva} and Luiz Leme and Marco Casanova},
  title={Indexing High-Dimensional Vector Streams},
  booktitle={Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
  year={2023},
  pages={32-43},
  publisher={SciTePress},
  organization={INSTICC},
  doi={10.5220/0011758900003467},
  isbn={978-989-758-648-4},
  issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Indexing High-Dimensional Vector Streams
SN - 978-989-758-648-4
IS - 2184-4992
AU - Pinheiro, J.
AU - Borges, L.
AU - Martins da Silva, B.
AU - Leme, L.
AU - Casanova, M.
PY - 2023
SP - 32
EP - 43
DO - 10.5220/0011758900003467
PB - SciTePress