JURISMIND: Context-Driven Retrieval for Accurate and Relevant Legal
Question-Answering in Patent Filings
Pandey Shourya Prasad¹, Vidhish Trivedi¹, Madhav Rao¹ and Srijoni Sen²
¹International Institute of Information Technology, Bangalore, India
²University of Tilburg, Netherlands
These authors contributed equally to this work.
Keywords: Large Language Model (LLM), Legal Domain, Retrieval Augmented Generation (RAG), Information Retrieval.
Abstract:
Large Language Models (LLMs) have demonstrated strong performance in domain-specific conversational
forums, but they often suffer from hallucinations—producing factually incorrect or contextually irrelevant
responses. This issue is particularly critical in the legal domain, where accuracy is paramount. Existing
solutions such as fine-tuning and static retrieval methods struggle to handle the complexities of legal language
and often fail to provide sufficient contextual grounding. To address this, we propose JURISMIND, a context-
driven retrieval-augmented generation (RAG) pipeline designed for the legal domain, with a focus on Patent
Filing. Our approach retrieves relevant legal texts, case law, and statutes based on the input query. This
retrieved context is combined with a base prompt and the user query, guiding the language model to respond
using the provided legal context. This method significantly reduces hallucinations and improves the contextual
accuracy of responses. Preliminary evaluation indicates that 56.32% of responses are in strong agreement and
27.59% in fair agreement with ground truth, totaling 83.91% alignment. Furthermore, JURISMIND achieves
a BERTScore of 0.91, outperforming the 0.848 BERTScore of a pretrained LLaMA-based model. The code
and dataset are publicly released to support adoption and further research in the developer community.
1 INTRODUCTION
In recent years, the application of AI to domain-specific tasks has seen significant progress in areas such as healthcare (Naz et al., 2024; Shokrollahi et al., 2023) and agriculture (Cravero and Sepúlveda, 2021; Aashu et al., 2024), and many methods (Pandey et al., 2023b; Guo et al., 2016; Pandey et al., 2023a) have been proposed across fields to improve the efficiency of computation and results. Along similar lines, AI can be particularly useful in legal question-answering (QA) systems, where precision and context are critical. The legal domain accumulates procedural law, amendments to it, and new statutes, in addition to cases where novel decisions were handed down. This makes retrieval extremely tedious for anyone seeking legal information for a specific scenario, so an AI tool is preferable to other automation choices. Retrieval-augmented generation (RAG)
has emerged as a promising approach (Bayarri-Planas et al., 2024; Krishna et al., 2024), integrating retrieval
mechanisms with generative large language models
(LLMs) to improve response accuracy. However, these systems continue to face issues like hallucinations (Huang et al., 2023; Xu et al., 2024), where the formulated responses contain content that is irrelevant, fabricated, or inconsistent with the input data. This challenge is especially acute in the legal domain, where misinterpretation or error has serious consequences. For example, a new applicant seeking legal services for filing a patent expects to find the most accurate way to proceed. If misdirected, the applicant not only bears extra cost but also squanders effort and time without achieving an effective outcome. Hence, in the patent filing process, it is essential to obtain appropriate guidance for submitting the application.

Existing mitigation strategies for hallucinations, such as Epistemic Neural Networks (Verma et al., 2023), fine-tuning (Gekhman et al., 2024; Ballout et al., 2024), or static retrieval, often fail to address the intricate nuances of legal language and
the complex relationships within legal texts. Consequently, they lack the granularity necessary for assessing retrieval quality in RAG systems within the legal domain, where accurate retrieval and contextual relevance are paramount to minimizing hallucination. Techniques like cosine similarity with early exiting (Chen et al., 2024; Pandey et al., 2025) have also been adopted to improve results.
To address these issues, this paper introduces a context-driven RAG pipeline explicitly tailored to patent filing within the legal domain, focusing on minimizing hallucinations in legal QA. We consider patent filing as one application within the legal domain to showcase the impact, so that the approach can be generalized in the future. The patent filing process is far from straightforward, given the number of checks an applicant must perform depending on the jurisdiction. Procedurally, the applicant needs to be aware of holding and selling rights, the duration of these rights, and their limitations across the jurisdictions covered (Gaff and Dombrowski, 2013). The filing deadline, public disclosure requirements, technological novelty, and filing cost are a few other factors that shape the filing process. A well-articulated process along these lines is valuable for the applicant (Raghupathi et al., 2018). Our proposed approach retrieves relevant legal documents, case law, and statutes based on the query and combines them with the language model prompt to guide the response. Preliminary evaluations of this approach demonstrate a high degree of alignment with ground truth answers, underscoring its potential to improve accuracy and reliability in legal applications. The proposed pipeline and dataset are made publicly available (Anonymous, ), facilitating further research and development in legal QA.
2 RELATED WORKS
2.1 Retrieval Augmented Generation (RAG)
Lewis et al. (Lewis et al., 2021) propose retrieval-augmented generation (RAG) models and highlight their effectiveness in knowledge-intensive NLP tasks by combining parametric and non-parametric memory. RAG frameworks integrating a pre-trained sequence-to-sequence model with dense vector retrieval mechanisms have shown improved factual accuracy and relevance in tasks like open-domain question answering. Their approach surpasses traditional models by generating diverse and factually grounded content, suggesting that RAG models offer a robust, general-purpose solution for applications requiring precise, knowledge-based language generation.
2.2 SMARThealth GPT
Al Ghadban et al. (Al Ghadban et al., 2023) present a case study on using retrieval-augmented generation (RAG) models to improve healthcare education in low- and middle-income countries (LMICs). Their SMARThealth GPT model aims to provide community health workers with accessible and relevant medical information to enhance maternal care. The authors use RAG techniques to demonstrate the model's effectiveness in retrieving and generating context-specific information, addressing essential knowledge gaps. This work illustrates the potential of RAG models to support health workers in resource-limited settings, improve healthcare outcomes, and serve other domain-specific tasks.
2.3 OwlMentor
Thüs et al. (Thüs et al., 2024) present a study on OwlMentor, an AI-powered learning platform developed using principles from User-Centered Design (UCD) to focus on usability and usefulness. OwlMentor includes document-based chats, automated question generation, and quiz creation to facilitate interactive learning. The authors used the Technology Acceptance Model (Marangunić and Granić, 2015; Misirlis and Munawar, 2023) to assess system acceptance and investigated how general self-efficacy influences the use and perceived effectiveness of OwlMentor. This RAG-based system enhances learning in the education domain.
2.4 Indian Legal Assistant
The Indian Legal Assistant (007UT, ) is a LLaMA 8.03B-based model fine-tuned on Indian legal texts. The model is designed to assist with various legal tasks and queries related to Indian law. It is a text generation model specifically trained to understand and generate text for Indian law tasks such as legal question answering, case summarization, legal document analysis, and statute interpretation.
3 METHODOLOGY
The proposed method comprises four stages, as shown in Figure 1: Client Interface; Semantic Encoding; Context Retrieval; and Prompt Generation and Natural Language Generation (NLG).

Figure 1: Flow Diagram representing an overview of the proposed method.

The initial stage offers a front-end interface that allows users to submit queries to the proposed system. Subsequent stages process these queries to formulate a response. The user's query is converted into a vector embedding, which is then evaluated against the vector embeddings stored in the database using cosine similarity, in order to retrieve relevant information for further usage.
The vector embeddings stored in our document database, specifically related to the domain of Indian Patent Law, are subjected to a similarity scoring process against the embedding generated in the preceding stage. The system then identifies the most similar match to the query through semantic search (Guha et al., 2003) and retrieves the corresponding context, enabling the formulation of a more robust and relevant response. The retrieved context is integrated with a base prompt and provided to a large language model (LLM) to generate the response in a specific format. The base prompt guides the LLM in producing an answer that is succinct and consistent with the retrieved context. A sample base prompt for the LLM, with retrieved context and user query, is presented in Figure 3; a sketch of this prompt-assembly step is shown below.
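To make this step concrete, the following minimal Python sketch shows how the base prompt, retrieved context, and user query could be combined. The prompt wording and function name are illustrative assumptions, not the authors' exact implementation; the actual base prompt appears in Figure 3.

# Minimal sketch of prompt assembly. The base-prompt wording below is an
# illustrative assumption; the paper's actual base prompt is shown in Figure 3.
BASE_PROMPT = (
    "You are a legal assistant for Indian patent law. Answer succinctly, "
    "using ONLY the legal context provided below. If the context is "
    "insufficient, say so rather than guessing."
)

def build_prompt(retrieved_chunks: list[str], user_query: str) -> str:
    """Combine the base prompt, retrieved legal context, and the user query."""
    context = "\n---\n".join(retrieved_chunks)  # separate retrieved passages
    return f"{BASE_PROMPT}\n\nContext:\n{context}\n\nQuestion: {user_query}\nAnswer:"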
4 IMPLEMENTATION DETAILS
4.1 Dataset Details
Four publicly available documents sourced from public websites were utilized, each containing critical legal and procedural information related to intellectual property in India. These documents cover design rules, Indian patent rules, trademark rules, and a set of frequently asked questions addressing common queries in these domains. As the documents were available in PDF format, Optical Character Recognition (OCR) techniques were employed to extract the textual content efficiently. This extraction process ensured that the embedded text, including legal provisions and procedural guidelines, was accurately retrieved for further processing. The subsequent steps involved in structuring and refining this extracted text for effective retrieval-augmented generation (RAG) are described in detail below, starting with a sketch of the extraction step.
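As a rough illustration, the sketch below renders each PDF page to an image and applies OCR using pdf2image and pytesseract; the library choice and file names are assumptions, since the paper does not name its OCR stack.

# Hedged sketch of OCR-based text extraction (library choice and file names
# are assumptions; the paper does not specify its OCR tooling).
from pdf2image import convert_from_path  # pip install pdf2image
import pytesseract                       # pip install pytesseract

def extract_pages(pdf_path: str) -> list[str]:
    """Render each page of a PDF to an image and OCR it, one string per page."""
    images = convert_from_path(pdf_path, dpi=300)
    return [pytesseract.image_to_string(img) for img in images]

# Hypothetical file names for the four source documents.
corpus = {name: extract_pages(name) for name in
          ["design_rules.pdf", "indian_patent_rules.pdf",
           "trademark_rules.pdf", "faqs.pdf"]}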
4.2 Database Construction
Milvus (Wang et al., 2021), an open-source vector
database, facilitates efficient semantic search across
domain-specific documents for data retrieval in a
high-dimensional vector space.
Figure 2: Flow Diagram representing an overview of the proposed method.

A collection of documents about Indian Patent Law is denoted as the set D = {d_1, d_2, ..., d_n}, where each document d_i is chunked into smaller units, represented as pages P_i = {p_i1, p_i2, ..., p_im_i}, where m_i is the number of pages in document d_i. Each page p_ij is then tokenized into a sequence of words or phrases, represented as T_ij = {t_1, t_2, ..., t_k}, where k denotes the number of tokens on page p_ij. The tokenized sequences are further mapped to high-dimensional vector embeddings v_ij ∈ R^384 using a suitable embedding function f : T_ij → v_ij. A sentence transformer model, specifically all-MiniLM-L6-v2, is employed. Formally, v_ij = f(T_ij), where f is an encoding function that projects the tokenized string T_ij into the 384-dimensional vector space R^384. The resulting vector embeddings v_ij are inserted into the Milvus vector database, with each embedding v_ij represented as a point in a 384-dimensional vector space. Milvus is designed to efficiently index and retrieve these embeddings using similarity search algorithms, such as cosine similarity (Rahutomo et al., 2012) or inner product, to compute similarity between query embeddings q ∈ R^384 and the stored embeddings. The similarity measure sim(q, v_ij) is given by the cosine similarity stated in Equation 1, where ‖v_ij‖ and ‖q‖ are the magnitudes of the respective vectors.

    sim(q, v_ij) = (q · v_ij) / (‖q‖ ‖v_ij‖)        (1)
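For reference, Equation 1 corresponds directly to the following NumPy helper, a self-contained illustration independent of how Milvus computes similarity internally:

import numpy as np

def cosine_sim(q: np.ndarray, v_ij: np.ndarray) -> float:
    """Cosine similarity of Equation 1: (q . v_ij) / (||q|| ||v_ij||)."""
    return float(np.dot(q, v_ij) / (np.linalg.norm(q) * np.linalg.norm(v_ij)))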
Figure 2 shows the overall flow of database construction and information retrieval using semantic search over stored vector embeddings. The retrieved documents or document segments are semantically aligned with the user's query vector q and provide a suitable context for generating an answer, thereby enabling a more robust and scalable semantic search capability over the corpus of domain-specific documents. A condensed sketch of the indexing flow follows.
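The sketch below uses sentence-transformers and the pymilvus MilvusClient; the collection name, field names, and local database file are illustrative assumptions rather than the authors' exact configuration.

# Sketch of database construction: embed page-level chunks with
# all-MiniLM-L6-v2 (384-d) and insert them into Milvus. Collection name,
# field names, and the local DB file are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from pymilvus import MilvusClient

model = SentenceTransformer("all-MiniLM-L6-v2")  # encoding function f
client = MilvusClient("jurismind.db")            # Milvus Lite, local file

client.create_collection(
    collection_name="patent_law",
    dimension=384,            # embedding dimensionality of all-MiniLM-L6-v2
    metric_type="COSINE",     # similarity measure of Equation 1
)

def index_pages(pages: list[str]) -> None:
    """Embed each page chunk and store it with its text in Milvus."""
    vectors = model.encode(pages)  # shape: (len(pages), 384)
    client.insert(
        collection_name="patent_law",
        data=[{"id": i, "vector": vec.tolist(), "text": page}
              for i, (vec, page) in enumerate(zip(vectors, pages))],
    )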
4.3 Information Retrieval and Answer
Generation
The user's query is transformed into a 384-dimensional vector embedding. This embedding is then compared against all stored embeddings in the Milvus vector database using a cosine similarity search. Based on cosine similarity, the top k matches are selected as contextual information for generating the final output. These context strings are integrated with a base prompt to produce a more structured and relevant response. The final structured output is generated using an LLM, Gemini-1.5-flash, and displayed to the user, as sketched below.
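This hedged sketch of the retrieve-then-generate step reuses the model, client, and build_prompt from the earlier sketches; the Gemini call follows the public google-generativeai API, and the value of k and the placeholder API key are assumptions.

# Sketch of query-time retrieval and answer generation; reuses `model`,
# `client`, and `build_prompt` from the sketches above. k=3 is an assumption.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
llm = genai.GenerativeModel("gemini-1.5-flash")

def answer(user_query: str, k: int = 3) -> str:
    """Embed the query, fetch top-k context from Milvus, and query the LLM."""
    query_vec = model.encode([user_query]).tolist()  # 384-d query embedding
    hits = client.search(
        collection_name="patent_law",
        data=query_vec,
        limit=k,                      # top-k cosine-similarity matches
        output_fields=["text"],
    )
    chunks = [hit["entity"]["text"] for hit in hits[0]]
    return llm.generate_content(build_prompt(chunks, user_query)).text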
5 EVALUATION
To assess the performance of our model in generating accurate responses, we compared the generated answers with the ground truth provided in the testing dataset, which contains 87 questions. The comparison was performed by measuring the similarity between the generated answers and the ground truth using GPT-4o (OpenAI et al., 2024), with scores ranging from 0 to 100; a sketch of this scoring step follows the list below. For evaluation, the results were categorized into three distinct performance bands:

Low Accuracy (0-39): Responses that exhibit low similarity to the ground truth, indicating significant deviation or incorrectness.

Moderate Accuracy (40-69): Moderately similar responses, indicating partial correctness or a reasonable attempt with some inconsistencies.

High Accuracy (70-100): Responses that are highly similar to the ground truth, demonstrating high accuracy and alignment with the expected answers.
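A minimal sketch of this LLM-as-judge scoring, using the OpenAI Python client, is shown below; the judge prompt wording is an assumption, since the paper does not reproduce its exact evaluation prompt.

# Hedged sketch of GPT-4o similarity scoring (0-100) and banding. The judge
# prompt is an assumption; the paper does not give its exact wording.
from openai import OpenAI

judge = OpenAI()  # reads OPENAI_API_KEY from the environment

def similarity_score(generated: str, ground_truth: str) -> int:
    """Ask GPT-4o to rate answer/reference similarity on a 0-100 scale."""
    prompt = (
        "Rate how similar the ANSWER is to the REFERENCE on a 0-100 scale, "
        "where 100 means semantically identical. Reply with the number only.\n"
        f"REFERENCE: {ground_truth}\nANSWER: {generated}"
    )
    resp = judge.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return int(resp.choices[0].message.content.strip())

def band(score: int) -> str:
    """Map a 0-100 score to the paper's three performance bands."""
    return ("Low Accuracy" if score < 40
            else "Moderate Accuracy" if score < 70
            else "High Accuracy")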
As shown in Table 1, the model produced 14 answers that fell within the Low Accuracy band, 24 answers within the Moderate Accuracy band, and 49 answers within the High Accuracy band. This distribution indicates that while a considerable portion of the generated answers were moderately or highly accurate, there remain instances where the model's performance needs improvement. The overall distribution demonstrates that the majority of the generated answers (56.32%) fell within the High Accuracy band, reflecting a solid alignment with the ground truth.
Figure 3: Base prompt for the LLM, with retrieved context and user query.
6 RESULTS
The evaluation results indicate that our proposed retrieval-augmented generation (RAG) pipeline enhances the accuracy of answers in the legal domain, reducing the frequency of hallucinations. Of the 87 tested questions, 49 answers (56.32%) achieved a high similarity score (70-100) with the ground truth, highlighting strong alignment with the expected legal responses. Additionally, 24 answers (27.59%) fell within the moderate accuracy range (40-69), indicating partial correctness and capturing relevant legal context, albeit with minor inconsistencies. However, 14 answers (16.09%) fell into the low accuracy band (0-39), reflecting a need for further refinement in complex or ambiguous cases. This distribution demonstrates that 83.91% of generated answers are either highly or moderately aligned with the ground truth.
7 COMPARISON WITH INDIAN
LEGAL ASSISTANT
To assess the effectiveness of JurisMind, our RAG-based model, we compare its performance against the Indian Legal Assistant model (007UT, ), a pre-trained LLaMA 8.03B model specifically trained on Indian legal data. The evaluation is conducted using standard NLP metrics, including ROUGE, BLEU, and BERTScore. Table 2 presents the average performance scores of both models across multiple legal question-answering tasks.
7.1 Evaluation Metrics
To quantitatively assess the performance of both models, the following NLP metrics were used (a combined computation sketch follows the metric definitions):
7.1.1 (i) ROUGE Score:
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measures the overlap of n-grams between the generated response and the reference answer. We report:

ROUGE-1: Unigram (single word) overlap.

ROUGE-2: Bigram (two-word sequence) overlap.

ROUGE-L: Longest Common Subsequence (LCS) similarity.

The general formula for ROUGE-N is:

    ROUGE-N = ( Σ_{s ∈ Ref} Σ_{w ∈ s} Match(w) ) / ( Σ_{s ∈ Ref} Σ_{w ∈ s} Total(w) )        (2)

where Match(w) represents the number of overlapping n-grams between the candidate and reference text.
7.1.2 (ii) BLEU Score:
Bilingual Evaluation Understudy (BLEU) measures how closely the generated response matches a set of reference responses by evaluating n-gram precision with a brevity penalty. The BLEU score is computed as:

    BLEU = BP · exp( Σ_{n=1}^{N} w_n log p_n )        (3)

where:

p_n is the precision for n-grams.

w_n is the weight assigned to each n-gram order.

BP (brevity penalty) is applied if the generated response is shorter than the reference.
Table 1: Distribution of Answer Similarity Scores.
Performance Band Number of Answers Percentage (%)
Low Accuracy (0-39) 14 16.09 %
Moderate Accuracy (40-69) 24 27.59 %
High Accuracy (70-100) 49 56.32 %
7.1.3 (iii) BERTScore:
BERTScore is a semantic similarity metric based on contextual word embeddings. Instead of token overlap, it computes cosine similarity between embeddings of words in the candidate and reference sentences:

    BERTScore = (1 / |S|) Σ_{w ∈ S} cos(Emb(w)_ref, Emb(w)_cand)        (4)

where:

Emb(w)_ref and Emb(w)_cand are the contextual embeddings of word w in the reference and candidate sentence, respectively.

cos(·, ·) denotes cosine similarity.

S is the set of all words in the reference sentence.
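For reproducibility, all three metrics can be computed with common open-source packages; the sketch below uses rouge-score, sacrebleu, and bert-score, a tooling choice of ours rather than one stated in the paper.

# Sketch of metric computation with rouge-score, sacrebleu, and bert-score
# (the paper does not name its implementations, so this tooling is assumed).
from rouge_score import rouge_scorer       # pip install rouge-score
import sacrebleu                           # pip install sacrebleu
from bert_score import score as bertscore  # pip install bert-score

def evaluate(candidates: list[str], references: list[str]) -> dict:
    """Average ROUGE-1/2/L F1, corpus BLEU (0-100 scale), and mean BERTScore F1."""
    rs = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
    rouge = [rs.score(ref, cand)
             for ref, cand in zip(references, candidates)]
    avg = lambda key: sum(r[key].fmeasure for r in rouge) / len(rouge)

    bleu = sacrebleu.corpus_bleu(candidates, [references]).score
    _, _, f1 = bertscore(candidates, references, lang="en")

    return {"ROUGE-1": avg("rouge1"), "ROUGE-2": avg("rouge2"),
            "ROUGE-L": avg("rougeL"), "BLEU": bleu,
            "BERTScore": f1.mean().item()}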
7.2 Analysis of Results
The performance comparison in Table 2 demonstrates that JurisMind outperforms the Indian Legal Assistant model across all evaluation metrics. The key observations are:

Higher Precision and Recall: JurisMind achieves a ROUGE-1 score of 0.5378, significantly surpassing the Indian Legal Assistant's 0.2259, indicating stronger word-level and phrase-level recall.

Better Coherence and Fluency: With a BLEU score of 19.8545, JurisMind exhibits better fluency in legal language generation.

Stronger Contextual Understanding: The BERTScore of 0.9100 demonstrates superior semantic alignment between JurisMind's responses and ground truth answers.
Table 2: Average Performance Comparison of JurisMind
and Indian Legal Assistant.
Metric JurisMind Indian Legal Assistant
ROUGE-1 0.5378 0.2259
ROUGE-2 0.2667 0.0501
ROUGE-L 0.4585 0.1819
BLEU 19.8545 10.8421
BERTScore 0.9100 0.8480
8 CONCLUSION
Our work proposes a novel RAG pipeline targeted at the legal domain, designed to address the persistent challenge of hallucinations in domain-specific question answering. By integrating dynamically retrieved, query-specific legal texts, our approach provides contextually accurate information that guides the language model, significantly reducing hallucinations. The high percentage of responses in agreement with the ground truth highlights the potential of this method for real-world applications, particularly in domains demanding high accuracy, such as legal systems. NLP metric results highlight the advantage of using a Retrieval-Augmented Generation (RAG) pipeline, which enables JurisMind to generate legally accurate and contextually rich responses. While the Indian Legal Assistant model benefits from its extensive pretraining on Indian legal documents, it struggles with precision and fails to structure responses effectively.
REFERENCES
007UT, V. Indian legal assistant. Hugging Face. Model: https://tinyurl.com/KDIRR1.

Aashu, Rajwar, K., Pant, M., and Deep, K. (2024). Application of machine learning in agriculture: Recent trends and future research avenues.

Al Ghadban, Y., Lu, H. Y., Adavi, U., Sharma, A., Gara, S., Das, N., Kumar, B., John, R., Devarsetty, P., and Hirst, J. E. (2023). Transforming healthcare education: Harnessing large language models for frontline health worker capacity building using retrieval-augmented generation. medRxiv.

Anonymous. Dataset and model files of JurisMind. GDrive. URL: https://tinyurl.com/pipelineKDIR.

Ballout, M., Krumnack, U., Heidemann, G., and Kuehnberger, K.-U. (2024). Show me how it's done: The role of explanations in fine-tuning language models.

Bayarri-Planas, J., Gururajan, A. K., and Garcia-Gasulla, D. (2024). Boosting healthcare LLMs through retrieved context.

Chen, W.-C., Wong, H.-S. P., and Achour, S. (2024). Bitwise adaptive early termination in hyperdimensional computing inference. In Proceedings of the 61st ACM/IEEE Design Automation Conference, DAC '24, New York, NY, USA. Association for Computing Machinery.

Cravero, A. and Sepúlveda, S. (2021). Use and adaptations of machine learning in big data—applications in real cases in agriculture. Electronics, 10:552.

Gaff, B. M. and Dombrowski, J. M. (2013). Ten things to know about applying for non-US patents. Computer, 46(8):9–11.

Gekhman, Z., Yona, G., Aharoni, R., Eyal, M., Feder, A., Reichart, R., and Herzig, J. (2024). Does fine-tuning LLMs on new knowledge encourage hallucinations?

Guha, R., McCool, R., and Miller, E. (2003). Semantic search. In Proceedings of the 12th International Conference on World Wide Web, WWW '03, pages 700–709, New York, NY, USA. Association for Computing Machinery.

Guo, R., Kumar, S., Choromanski, K., and Simcha, D. (2016). Quantization based fast inner product search. In Artificial Intelligence and Statistics, pages 482–490. PMLR.

Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., and Liu, T. (2023). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions.

Krishna, S., Krishna, K., Mohananey, A., Schwarcz, S., Stambler, A., Upadhyay, S., and Faruqui, M. (2024). Fact, fetch, and reason: A unified evaluation of retrieval-augmented generation.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2021). Retrieval-augmented generation for knowledge-intensive NLP tasks.

Marangunić, N. and Granić, A. (2015). Technology acceptance model: a literature review from 1986 to 2013. Univers. Access Inf. Soc., 14(1):81–95.

Misirlis, N. and Munawar, H. B. (2023). An analysis of the technology acceptance model in understanding university students' behavioral intention to use metaverse technologies.

Naz, A., Prasad, P., McCall, S., Chan, C. L., Ochi, I., Gong, L., and Yu, M. (2024). Privacy-preserving abnormal gait detection using computer vision and machine learning.

OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., Bello, I., et al. (2024). GPT-4 technical report.

Pandey, N. P., Fournarakis, M., Patel, C., and Nagel, M. (2023a). Softmax bias correction for quantized generative models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pages 1453–1458.

Pandey, N. P., Kulkarni, S., Wang, D., Gungor, O., Ponzina, F., and Rosing, T. (2025). DPQ-HD: Post-training compression for ultra-low power hyperdimensional computing.

Pandey, N. P., Nagel, M., van Baalen, M., Huang, Y., Patel, C., and Blankevoort, T. (2023b). A practical mixed precision algorithm for post-training quantization.

Raghupathi, V., Zhou, Y., and Raghupathi, W. (2018). Legal decision support: Exploring big data analytics approach to modeling pharma patent validity cases. IEEE Access, 6:41518–41528.

Rahutomo, F., Kitasuka, T., Aritsugi, M., et al. (2012). Semantic cosine similarity. In The 7th International Student Conference on Advanced Science and Technology (ICAST), volume 4, page 1. University of Seoul, South Korea.

Shokrollahi, Y., Yarmohammadtoosky, S., Nikahd, M. M., Dong, P., Li, X., and Gu, L. (2023). A comprehensive review of generative AI in healthcare.

Thüs, D., Malone, S., and Brünken, R. (2024). Exploring generative AI in higher education: a RAG system to enhance student engagement with scientific literature. Frontiers in Psychology, 15.

Verma, S., Tran, K., Ali, Y., and Min, G. (2023). Reducing LLM hallucinations using epistemic neural networks.

Wang, J., Yi, X., Guo, R., Jin, H., Xu, P., Li, S., Wang, X., Guo, X., Li, C., Xu, X., Yu, K., Yuan, Y., Zou, Y., Long, J., Cai, Y., Li, Z., Zhang, Z., Mo, Y., Gu, J., Jiang, R., Wei, Y., and Xie, C. (2021). Milvus: A purpose-built vector data management system. In Proceedings of the 2021 International Conference on Management of Data, SIGMOD '21, pages 2614–2627, New York, NY, USA. Association for Computing Machinery.

Xu, Z., Jain, S., and Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models.