Integrating Information Retrieval and Large Language Models for Vietnamese Legal Document Query Systems
Pham Hien, Duong Ngoc Thao Nhi, Pham Thi Ngoc Huyen
2025
Abstract
The complexity of Vietnamese legal documents makes legal information difficult to access for both professionals and the general public. Traditional legal information retrieval methods are time-consuming and require specialized expertise to navigate the intricate hierarchy of laws, decrees, and regulations. This paper introduces a Vietnamese legal document query system that integrates Information Retrieval (IR) techniques with Large Language Models (LLMs) to automate legal document access and provide accurate, context-aware responses to legal queries. The proposed system employs a Retrieval-Augmented Generation (RAG) architecture that combines vector-based document retrieval with LLMs to generate precise, context-informed answers. Key components include a document indexing module that processes 45,000 Vietnamese legal documents, a vector database for semantic search, and an LLM-powered response generation interface. The system draws on 350,000 legal Q&A pairs from authoritative sources to handle complex legal terminology and provide contextually relevant responses. The system was evaluated using both performance metrics (via the RAGAs framework, including context recall and ROUGE scores) and user studies involving legal professionals and law students. The results indicate that integrating IR with LLMs substantially improves the relevance and accuracy of legal responses and reduces response time by 58%. Users reported high satisfaction (average 4.23/5) with the system's ability to answer complex legal queries, and the system achieved 89% accuracy across 12 legal categories. Overall, our findings demonstrate that IR-augmented LLM systems can effectively automate legal information access, alleviating professional workloads and democratizing access to legal knowledge in the Vietnamese legal context.
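To make the retrieve-then-generate flow in the abstract concrete, the following is a minimal sketch of the retrieval and prompt-assembly steps of a RAG pipeline. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model and vector database, the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical, and the three sample passages are illustrative placeholders rather than entries from the paper's corpus. The LLM call itself is omitted; the sketch stops at the prompt that would be sent to it.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Illustrative placeholder passages (assumed, not taken from the paper's data).
docs = [
    "labor code regulates employment contracts and working hours",
    "land law regulates land use rights and land registration",
    "enterprise law regulates company registration and governance",
]

question = "what regulates employment contracts"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real deployment the term-count vectors would be replaced by dense embeddings stored in a vector database, and `prompt` would be passed to the LLM that generates the final answer.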
Paper Citation
in Harvard Style
Hien P., Nhi D. and Huyen P. (2025). Integrating Information Retrieval and Large Language Models for Vietnamese Legal Document Query Systems. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KMIS; ISBN 978-989-758-769-6, SciTePress, pages 290-300. DOI: 10.5220/0013751200004000
in Bibtex Style
@conference{kmis25,
author={Pham Hien and Duong Nhi and Pham Huyen},
title={Integrating Information Retrieval and Large Language Models for Vietnamese Legal Document Query Systems},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KMIS},
year={2025},
pages={290-300},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013751200004000},
isbn={978-989-758-769-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KMIS
TI - Integrating Information Retrieval and Large Language Models for Vietnamese Legal Document Query Systems
SN - 978-989-758-769-6
AU - Hien P.
AU - Nhi D.
AU - Huyen P.
PY - 2025
SP - 290
EP - 300
DO - 10.5220/0013751200004000
PB - SciTePress