Hierarchical Patch Compression for ColPali: Efficient Multi-Vector Document Retrieval with Dynamic Pruning and Quantization

Bach Duong, Pham Nhat Minh

2025

Abstract

Multi-vector document retrieval systems such as ColPali excel at fine-grained matching for complex queries, but they incur significant storage and computational costs because they rely on high-dimensional patch embeddings and late-interaction scoring. To address these challenges, we propose HPC-ColPali, a Hierarchical Patch Compression framework that improves the efficiency of ColPali while preserving its retrieval accuracy. Our approach integrates three techniques: (1) K-Means quantization, which compresses patch embeddings into 1-byte centroid indices, achieving 32× storage reduction; (2) attention-guided dynamic pruning, which uses Vision-Language Model (VLM) attention weights to retain only the top-p% most salient patches, reducing late-interaction computation by 60% with less than 2% nDCG@10 loss; and (3) optional binary encoding of centroid indices into b-bit strings (b = ⌈log₂ K⌉, where K is the number of centroids), enabling fast Hamming-distance similarity search in resource-constrained environments. In domains such as legal and financial analysis, where documents contain visual elements (e.g., charts in SEC filings), multi-vector models like ColPali enable precise retrieval but scale poorly. The novelty of this work lies in combining VLM attention pruning with quantization in a hierarchical compression scheme, which reduces costs by 30–50% while preserving accuracy, as validated on ViDoRe. Evaluated on the ViDoRe and SEC-Filings datasets, HPC-ColPali achieves 30–50% lower query latency under HNSW indexing while maintaining high retrieval precision. When integrated into a Retrieval-Augmented Generation pipeline for legal summarization, it reduces hallucination rates by 30% and halves end-to-end latency. These results establish HPC-ColPali as a scalable and efficient solution for multi-vector document retrieval across diverse applications. Code is available at https://github.com/DngBack/HPC-ColPali.
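As an illustration only, the three steps named in the abstract can be sketched in a few lines of plain Python. This is not the authors' implementation (see the repository linked above for that). It assumes the centroids have already been learned by K-Means, and every function name, the toy 2-D embeddings, and the attention values are hypothetical. Note also that Hamming distance between raw centroid indices tracks embedding distance only if the indices are assigned so that similar centroids receive similar bit patterns, which this sketch does not attempt.

```python
import math

def quantize(patches, centroids):
    """Map each patch embedding to the index of its nearest centroid (squared L2)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sqdist(p, centroids[k]))
            for p in patches]

def prune_top_p(codes, attention, p):
    """Keep the ceil(p * n) patches with the highest attention weight, in original order."""
    n_keep = max(1, math.ceil(p * len(codes)))
    ranked = sorted(range(len(codes)), key=lambda i: attention[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [codes[i] for i in keep]

def code_bits(num_centroids):
    """Number of bits b = ceil(log2 K) needed to encode one centroid index."""
    return max(1, math.ceil(math.log2(num_centroids)))

def to_bits(code, b):
    """Render a centroid index as a fixed-width b-bit string."""
    return format(code, f"0{b}b")

def hamming(a, b):
    """Hamming distance between two centroid indices viewed as bit strings."""
    return bin(a ^ b).count("1")

# Toy data (hypothetical): K = 4 centroids in 2-D, four patches, one attention weight each.
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patches = [[0.1, 0.1], [0.9, 0.2], [0.2, 0.8], [0.95, 0.9]]
attention = [0.1, 0.4, 0.2, 0.3]

codes = quantize(patches, centroids)        # one small integer per patch
kept = prune_top_p(codes, attention, 0.5)   # retain the top 50% by attention
b = code_bits(len(centroids))               # bits per index
bit_strings = [to_bits(c, b) for c in kept]
print(codes, kept, bit_strings, hamming(kept[0], kept[1]))
```

With K = 256 centroids each index fits in a single byte (b = 8), which corresponds to the 1-byte indices the abstract describes.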

Paper Citation


in Harvard Style

Duong B. and Minh P. (2025). Hierarchical Patch Compression for ColPali: Efficient Multi-Vector Document Retrieval with Dynamic Pruning and Quantization. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; ISBN , SciTePress, pages 98-109. DOI: 10.5220/0013732500004000


in Bibtex Style

@conference{kdir25,
author={Bach Duong and Pham Minh},
title={Hierarchical Patch Compression for ColPali: Efficient Multi-Vector Document Retrieval with Dynamic Pruning and Quantization},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2025},
pages={98-109},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013732500004000},
isbn={},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - Hierarchical Patch Compression for ColPali: Efficient Multi-Vector Document Retrieval with Dynamic Pruning and Quantization
SN -
AU - Duong B.
AU - Minh P.
PY - 2025
SP - 98
EP - 109
DO - 10.5220/0013732500004000
PB - SciTePress