Exploring Customer Service Agent Preferences for Conversational and Keyword-Based Information Retrieval

Nektarios Machner, Yaren Mändle and Florian Matthes

Department of Computer Science, Technical University of Munich, Munich, Germany

Keywords:

Conversational Search, Knowledge Discovery, Knowledge Management, Information Retrieval, Customer Service.

Abstract:

Effective knowledge discovery and information retrieval drive organizational innovation and competitive advantage. To support this, organizations have long used knowledge management systems that historically relied on keyword-based search. The rise of artificial intelligence (AI), most notably large language models (LLMs), has enabled conversational search (CS) interfaces that understand natural-language queries, synthesize information from multiple sources, and generate answers. This study investigates the factors that influence customer service agents’ preferences for conversational search versus traditional keyword-based search within an internal knowledge management system. In a case study at a large European insurance company, we employ a mixed-methods empirical approach that integrates semi-structured interviews (n = 13), a structured survey (n = 17), and log-file analysis of 508 real-world queries. We explore which factors drive agents’ choice between the two search approaches and examine the practical strengths and limitations of each. Our findings reveal that agents choose keyword search when they are confident of where to look and conversational search when they need natural-language guidance, with trust and time constraints further tipping the balance. This complementarity suggests that hybrid interfaces, blending ease of use, reliable results, and flexible query handling, best support agents’ workflows.

1 INTRODUCTION

Knowledge is a critical asset for modern organizations: “Successful companies are those that create new knowledge, disseminate it widely throughout the organization, and quickly embody it into new technologies and products. This process further fuels innovation and develops lasting competitive advantage” (Fowler, 2000). To facilitate effective knowledge sharing, many organizations deploy knowledge management systems that enable employees to retrieve information on demand.

Historically, search in these systems depended on keyword matching, requiring users to select precise terms or Boolean operators to locate documents. While effective for well-defined queries, this approach often struggles with synonyms, polysemy, and contextual nuances. In response to these limitations, the emergence of artificial intelligence (AI), particularly large language models (LLMs), has enabled conversational search interfaces that transform how employees interact with knowledge management systems. Because LLM-based conversational search understands queries in natural language, employees can simply describe their information needs in everyday terms rather than crafting precise keyword strings. The model then aggregates relevant passages across multiple retrieved documents and generates an answer, while highlighting implicit connections in the data and offering the option of clarifying follow-up queries.

However, successful deployment of AI-based systems depends not only on technology but also on employee acceptance. While there is extensive research on customer-facing conversational interfaces, far less attention has been paid to how employees integrate AI-based conversational search into their existing internal knowledge management workflows.

In this study, we investigate the adoption of LLM-based conversational search for knowledge discovery and information retrieval, compared with traditional keyword-based search, within the context of a large European insurance company. To this end, we defined the following research questions (RQs):

• RQ1: Which factors influence customer service agents’ choice between conversational search and keyword-based search?

• RQ2: What are the strengths and limitations of LLM-based conversational search versus traditional keyword-based search when integrated into existing knowledge management systems?

Author ORCID iDs:

https://orcid.org/0009-0001-8359-6668

https://orcid.org/0009-0007-1087-5707

https://orcid.org/0000-0002-6667-5452

Machner, N., Mändle, Y. and Matthes, F.

Exploring Customer Service Agent Preferences for Conversational and Keyword-Based Information Retrieval.

DOI: 10.5220/0013721400004000

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2025) - Volume 1: KDIR, pages 349-356

2 RELATED WORK

As we investigate the adoption of conversational

search versus traditional keyword-based search in

customer service knowledge management, it is impor-

tant to contextualize our work within existing studies

on these retrieval modalities across various domains.

• Preininger et al. (2021) compared information access via conversational agents and traditional keyword search within a widely used pharmacologic knowledge base. They found that for certain topics, users accessed information more frequently with the conversational agent, while other topics saw higher access rates under the keyword-based approach. However, their study did not explore why users chose one method over the other, nor did it assess the usability or user satisfaction associated with each search approach.

• Schneider and Matthes (2024) developed a conversational search system for exploring scholarly publications using a knowledge graph. They evaluated it through a user study with 40 participants, comparing it to a traditional graphical interface with keyword search. Their results indicate that the conversational interface enables more effective discovery of research publications. However, unlike our study, their work focused on individual private users in an academic search context rather than on enterprise users in a knowledge management setting.

• Sakirin and Ben Said (2023) surveyed college students to compare ChatGPT-powered conversational interfaces with traditional keyword search. Using descriptive and inferential statistics, they found that most participants favored the ChatGPT interface for its convenience and efficiency. In contrast to our study, their work examined individual private users instead of enterprise users operating within a knowledge management environment.

• Liu et al. (2021) conducted a user study comparing interaction behaviors and explicit feedback when searching legal cases via a traditional keyword system versus a conversational search interface. They tracked both search interactions and outcome metrics, finding that participants achieved higher retrieval performance with the conversational system. While their results underscore the potential of conversational agents for improving search effectiveness, their work is situated in a legal case retrieval context, whereas our study focuses on customer service agents navigating an insurance-domain knowledge management system.

• Wazzan et al. (2024) evaluated traditional keyword search versus LLM-based search for image geolocation tasks, asking users to pinpoint where an image was taken. They examined both task performance and how users adjusted their query strategies, finding that keyword search yielded more accurate location predictions than the LLM-based approach. While their work highlights differences in tool effectiveness and user behavior, it differs from ours in that we compare these search modalities within an organizational knowledge management setting and assess the LLM-based system using metrics such as perceived ease of use and answer relevance rather than geolocation accuracy.

• Spatharioti et al. (2023) ran two online experiments comparing traditional keyword search and an LLM-based tool for consumer-product research. They found that the LLM interface enabled faster task completion with fewer but more complex queries, although participants sometimes overrelied on incorrect model outputs. Unlike their lab-style experiments, our study examines customer service agents in an organizational insurance context using interviews, surveys, and log analysis.

3 METHODOLOGY

3.1 Case Study Context

This study was conducted at our case study company. The relevant context is summarized below:

• All data were collected at a large European insurance company, more specifically within the German branch of customer care.

• Its in-house developed knowledge management software serves as the central internal help system, supporting employees across all divisions in case processing. It has been in use since 2008 and comprises over 168,000 pages across approximately 5,600 documents, with roughly 40,000 active users.

• The software was extended in 2023 with conversational search to help customer service agents find information efficiently and save time and effort, especially when handling several customer service requests simultaneously.

• Conversational search was implemented in addition to, not as a substitute for, the traditional keyword-based search. Customer service agents are free to choose which system to use based on their needs.

KDIR 2025 - 17th International Conference on Knowledge Discovery and Information Retrieval

3.2 Search Workﬂow & IT Architecture

To clarify the differences between the two approaches, we briefly outline the typical workflow and the underlying architecture of each below.

3.2.1 Conversational Search

Workﬂow

• When a customer service agent submits a query, a retriever model first fetches documents accessible to the user’s authentication group. A ranking model then prioritizes the most relevant results. Finally, an OpenAI GPT-4-based LLM generates a response, including references to the top three retrieved documents (OpenAI, 2024).

• Customer service agents rate answers on a 1–5 scale, where 5 indicates high satisfaction, and mark each retrieved document as relevant or not using thumbs up/down. This feedback is used to improve the conversational search pipeline.

Architecture

The conversational search assistant consists of multiple components and follows a typical Retrieval-Augmented Generation (RAG) pipeline architecture.

• A centralized authentication service manages users and their access rights.

• All documents of the knowledge management system are stored in file storage and embedded in a vector database.

• Retriever models search the vector database and retrieve relevant documents.

• Ranker models rank the retrieved documents by relevance to the query.

• An LLM (GPT-4) generates answers from the retrieved results (OpenAI, 2024).

• Feedback from the customer service agents is stored in a database, where it is later reviewed by experts and used to build a fine-tuning dataset that continuously refines the three types of models used throughout the pipeline (retriever, ranker, and LLM).
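The workflow and components described above can be summarized in a minimal Python sketch. It is an illustration only, not the case study company's implementation: the `Document` type, the term-overlap score that stands in for both the embedding-based retriever and the ranker, the `generate_answer` stub that stands in for the GPT-4 call, and the toy corpus are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical access-control field


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of distinct query terms found in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, corpus, user_group):
    """Keep documents the user's group may access and that share a query term."""
    return [d for d in corpus
            if user_group in d.allowed_groups and overlap(query, d.text) > 0]


def rank(query, candidates, top_k=3):
    """Order candidates by the toy score and keep the top three."""
    ordered = sorted(candidates, key=lambda d: overlap(query, d.text), reverse=True)
    return ordered[:top_k]


def generate_answer(query, top_docs):
    """Stub for the LLM step: a placeholder answer plus document references."""
    return {"answer": f"[generated answer to: {query}]",
            "references": [d.doc_id for d in top_docs]}


def conversational_search(query, corpus, user_group):
    return generate_answer(query, rank(query, retrieve(query, corpus, user_group)))


# Toy corpus with invented content, for illustration only.
corpus = [
    Document("doc-1", "bikes are insurable under this toy policy", frozenset({"care"})),
    Document("doc-2", "liability insurance covers damages caused by pets", frozenset({"care"})),
    Document("doc-3", "internal audit guideline for bikes", frozenset({"audit"})),
]
result = conversational_search("are bikes insurable", corpus, "care")
```

In this sketch, access filtering happens before ranking, mirroring the order given in the workflow: documents outside the user's authentication group never reach the ranker or the answer generator.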

3.2.2 Keyword-Based Search

Workﬂow

• A customer service agent types keywords into an input field and can filter by metadata. The search engine checks all indexed documents for the presence of each keyword and evaluates how frequently it occurs, then returns the relevant documents to the agent.

Architecture

• Users are authenticated by their email address and

global identiﬁers to determine which documents

they are allowed to access.

• Documents are stored in JSON format with a

unique identiﬁer, mandatory ﬁelds such as title,

and metadata.

• All search queries are processed by a search engine implemented in Python.

• Search results are ranked by assigning weights

that consider matches in titles, headings, body

text, the document type, and term frequency.

• An auto-complete function suggests next words

during typing, and ﬁlters can narrow results based

on metadata.
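The ranking idea described above (weighted matches in titles, headings, and body text, combined with document type and term frequency) can be sketched as follows. This is a hedged illustration: the weight values, the `TYPE_BOOST` table, the document field names, and the toy documents are assumptions, since the paper does not disclose the actual scoring function.

```python
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}  # assumed weights
TYPE_BOOST = {"guideline": 1.2}  # assumed boost per document type


def score(doc: dict, keywords: list) -> float:
    """Sum weighted term frequencies over the title, headings and body fields."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = doc.get(field, "").lower().split()
        for kw in keywords:
            total += weight * tokens.count(kw.lower())
    return total * TYPE_BOOST.get(doc.get("type"), 1.0)


def keyword_search(docs, keywords, metadata_filter=None):
    """Apply metadata filters, drop non-matching documents, sort by score."""
    metadata_filter = metadata_filter or {}
    hits = []
    for doc in docs:
        meta = doc.get("metadata", {})
        if any(meta.get(k) != v for k, v in metadata_filter.items()):
            continue
        s = score(doc, keywords)
        if s > 0:
            hits.append((s, doc["id"]))
    return [doc_id for s, doc_id in sorted(hits, key=lambda h: -h[0])]


# Toy JSON-like documents with invented content.
docs = [
    {"id": "a", "title": "Immediate coverage", "headings": "Validity",
     "body": "immediate coverage is valid for a limited period",
     "type": "guideline", "metadata": {"division": "motor"}},
    {"id": "b", "title": "Change of ownership", "headings": "",
     "body": "coverage after a change of ownership",
     "type": "faq", "metadata": {"division": "motor"}},
    {"id": "c", "title": "Coverage overview", "headings": "", "body": "",
     "type": "faq", "metadata": {"division": "health"}},
]
ranked = keyword_search(docs, ["coverage"], {"division": "motor"})
```

A title match counts more than a body match here, so document "a" outranks "b" even though both mention the keyword in their body text.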

3.3 Literature Review

To design our interview and survey questions, we first conducted a literature review to identify relevant evaluation criteria for comparing conversational and keyword-based search. We consolidated our findings and adapted multiple metrics based on prior systematic literature reviews on conversational agent adoption (Ling et al., 2021; Lewandowski et al., 2021). These works identify a range of factors, including user-related, agent-related, and attitude-based dimensions (Ling et al., 2021), as well as organizational, technical, and environmental drivers (Lewandowski et al., 2021).

Unlike these reviews, our work empirically examines customer service agents’ preferences between conversational and keyword-based search within an insurance company’s knowledge management system.

3.4 Interview Design

We conducted semi-structured interviews with customer service agents who had used both the traditional keyword-based search and the more recently implemented conversational search. Thirteen agents participated, varying in age, gender, and professional experience. To understand when agents preferred one approach over the other, we defined seven scenarios, each corresponding to a search query type adapted from real log data. For each scenario, interviewees indicated their preferred search method and explained why. Since query types can overlap, a single query may belong to multiple categories. Each category is briefly defined (Def.) and illustrated with an example (Ex.) below.

Simple Query

• Def.: Requires manual lookup in a single docu-

ment of the knowledge base by the agent.

• Ex.: Are bikes insurable under private insurance?

Complex Query

• Def.: Requires more intensive research, such as

looking up multiple documents or entries in the

knowledge base.

• Ex.: How can I insure a minor policyholder?

Close-Ended Query

• Def.: Polar questions answered with ‘yes’ or ‘no’.

• Ex.: Is Parkinson’s disease a chronic illness?

Open-Ended Query

• Def.: Requires more detailed and extensive an-

swers.

• Ex.: What do I need to consider as a buyer or

seller during a change of ownership?

Short Query

• Def.: Search queries containing no more than

about ten words.

• Ex.: How long is the immediate coverage valid?

Long Query

• Def.: Search queries containing more than ten

words.

• Ex.: Are damages caused by my pet, such as bite

injuries or property damage, covered under liabil-

ity insurance?

Procedural Query

• Def.: Requires guidance or a description of how

to perform a speciﬁc task step-by-step.

• Ex.: How do I withdraw a balance?

After going through all scenarios, we asked the

interviewees more generally which factors inﬂu-

ence their choice between conversational search and

keyword-based search, and what they believe their re-

spective strengths and limitations are.
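Of the seven query types, only the short/long distinction is defined by a countable rule. A minimal sketch of that rule, applied to the two example queries given above, is shown here. The function name and the hard threshold are assumptions: the definition says "about ten words", so the boundary is soft in practice, and the remaining categories require human judgment.

```python
def length_category(query: str, threshold: int = 10) -> str:
    """Label a query 'short' (at most `threshold` words) or 'long' (more)."""
    return "short" if len(query.split()) <= threshold else "long"


short_example = "How long is the immediate coverage valid?"
long_example = ("Are damages caused by my pet, such as bite injuries or "
                "property damage, covered under liability insurance?")

labels = (length_category(short_example), length_category(long_example))
```

Words are counted by splitting on whitespace, so punctuation attached to a word does not change the count.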

3.5 Survey Design

After collecting qualitative feedback through the semi-structured interviews, we sent an online survey to all interviewees and to further customer service agents to collect quantitative feedback as well. Of the original 13 interviewees, 12 completed the survey, together with 5 additional customer service agents who could not participate in our interviews, for a total of 17 survey participants. The survey consisted of 20 statements mapped to 11 evaluation metrics, each rated on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree).
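One way to turn statement-level Likert ratings into a per-metric mean and standard deviation is sketched below. The paper does not state how ratings were pooled, so the procedure here (average each participant's statements per metric, then take the mean and sample standard deviation across participants) is an assumption, as are the statement identifiers, the partial mapping, and the response values.

```python
from statistics import mean, stdev

# Hypothetical mapping of statement ids to metrics
# (the survey maps 20 statements to 11 metrics; only three are shown).
STATEMENT_TO_METRIC = {
    "s1": "Perceived Ease of Use",
    "s2": "Perceived Ease of Use",
    "s3": "Satisfaction",
}

# Hypothetical responses: one dict per participant, Likert values 1..5.
responses = [
    {"s1": 4, "s2": 5, "s3": 3},
    {"s1": 4, "s2": 4, "s3": 4},
    {"s1": 5, "s2": 4, "s3": 2},
]


def aggregate(responses, mapping):
    """Return {metric: (mean, sample SD)} across participants."""
    per_metric = {}
    for answers in responses:
        by_metric = {}
        for statement, rating in answers.items():
            by_metric.setdefault(mapping[statement], []).append(rating)
        for metric, ratings in by_metric.items():
            # One score per participant and metric: the mean of its statements.
            per_metric.setdefault(metric, []).append(mean(ratings))
    return {m: (round(mean(v), 2), round(stdev(v), 2)) for m, v in per_metric.items()}


summary = aggregate(responses, STATEMENT_TO_METRIC)
```

Pooling all ratings for a metric without the per-participant step would give the same mean when every participant answers every statement, but generally a different standard deviation.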

3.6 Log Analysis

To gather feedback on the use of conversational search, the case study company integrated a feedback mechanism into the software that logs evaluation data in JSON format. These logs include the original search queries, the system-generated answers, the three retrieved documents, and customer service agents’ ratings of both the answers and the document suggestions. Answers are rated on a 1–5 scale, with higher scores indicating greater satisfaction. Retrieved documents are rated with a thumbs up (good) or thumbs down (bad). We analyzed logs collected between February and November 2024 and compared the findings with insights from the interviews and survey.
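To make the log structure concrete, a single record could look like the following sketch. The field names (`query`, `answer`, `answer_rating`, `documents`, `thumbs_up`) and all values are hypothetical, since the paper states only what information the logs contain, not their schema. Ratings are allowed to be missing, because agents do not rate every query.

```python
import json

# Hypothetical log record; field names are illustrative, not the company's schema.
raw = """
{"query": "How long is the immediate coverage valid?",
 "answer": "[generated answer]",
 "answer_rating": 5,
 "documents": [
   {"doc_id": "doc-17", "thumbs_up": true},
   {"doc_id": "doc-42", "thumbs_up": false},
   {"doc_id": "doc-08", "thumbs_up": null}
 ]}
"""

record = json.loads(raw)


def validate(record: dict) -> bool:
    """Check the stated constraints: rating in 1..5 (or absent), at most three documents."""
    rating = record.get("answer_rating")
    rating_ok = rating is None or (isinstance(rating, int) and 1 <= rating <= 5)
    return rating_ok and len(record.get("documents", [])) <= 3


is_valid = validate(record)
```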

4 RESULTS & DISCUSSION

4.1 Interview Results

We interviewed 13 customer service agents and asked

them about their preferences regarding keyword-

based and conversational search for seven different

scenarios. The ﬁndings regarding each scenario are

presented and discussed individually below.

4.1.1 Simple Query

For simple queries, 7 out of 13 agents preferred con-

versational search, noting its ability to distinguish be-

tween policy types (e.g., liability vs. property insur-

ance) and to pinpoint relevant coverage details. When

agents are uncertain of the precise keywords to use,

they consider conversational search more effective, as

it supports queries in natural language rather than re-

quiring the exact terms. The 6 agents who preferred

keyword search stated speed and familiarity with the

tool as their reasons, especially during ongoing calls

with customers, as typing a few keywords is more efﬁ-

cient than typing a full question. Agents familiar with

the tool also generally know where to ﬁnd the nec-

essary information quickly and directly as they have

consulted the documentation multiple times.

4.1.2 Complex Query

For complex queries, all 13 agents preferred conver-

sational search over keyword-based search. As com-

plex queries require looking up multiple documents,

agents value conversational search for its ability to

quickly aggregate relevant information from multiple

sources, enabling them to review related topics and

assess their relevance. By contrast, using keyword-

based search for such queries is regarded as tedious

and time-consuming, as it requires locating multiple

documents and manually evaluating each one.

4.1.3 Close-Ended Query

For close-ended queries, 10 agents preferred conversational search, while the remaining 3 opted for keyword-based search. The main reason for choosing CS is its ability to directly provide a ‘yes’ or ‘no’ answer without having to look up documents, thereby saving time and effort. The agents favoring keyword-based search were all sufficiently familiar with the knowledge base to already know which documents to look for, suggesting that their preference stemmed from familiarity with the system rather than from keyword-based search being better suited to the task.

4.1.4 Open-Ended Query

For open-ended queries, 7 out of 13 agents pre-

ferred keyword-based search, stating that conver-

sational search provides only summarized answers,

whereas they require more detailed information. They

either know where to ﬁnd it due to their familiarity

with the knowledge base or are willing to look it up

manually, considering the inquiries important enough

to justify the extra effort. The agents preferring con-

versational search explained that, lacking precise key-

words, they would not know what to look for and

therefore would use the conversational search and its

document suggestions as a starting point for deeper

exploration into the knowledge base.

4.1.5 Short Query

For short queries, 10 out of 13 agents favor conversa-

tional search, with the main reason being its ability to

quickly and directly produce an answer to the search

query. Agents found it particularly effective for broad

or general topics, trusting the system to provide ac-

curate responses. Conversely, the 3 agents who opted

for keyword search did so out of habit and conﬁdence

in their existing retrieval strategies for quick lookups.

4.1.6 Long Query

For long queries, 7 of the 13 agents favored con-

versational search while 6 preferred keyword search.

Those who chose CS believed that, when the query

is well constructed, it results in precise answers, es-

pecially for newer policies with predeﬁned responses,

making it easier and faster to locate needed informa-

tion. Nevertheless, some agents noted that they still

revert to keyword search for more complex issues to

verify the accuracy of the CS results.

4.1.7 Procedural Query

For procedural queries, 7 of the 13 agents preferred

conversational search, 4 chose keyword search, and

2 were undecided. Those favoring CS appreciated

that it delivers focused, relevant results without the

broad, unfocused listings typical of keyword search,

which require clicking through multiple links to as-

sess relevance. Many agents also reported encounter-

ing such procedural questions for the ﬁrst time and

not knowing where to begin with keyword search; in

these cases, they found CS faster and more intuitive

for matching information to their query before diving

into the full document. Conversely, agents unfamiliar

with using CS for procedural tasks expressed skep-

ticism about its accuracy and therefore preferred the

reliability of keyword search.

4.1.8 Inﬂuencing Factors

After the predeﬁned scenarios, we asked agents about

their general preferences between conversational and

keyword-based search. Their choice largely depends

on familiarity with the knowledge base and conﬁ-

dence in locating information. Agents who know

where to look tend to favor keyword search for its

speed and reliability, while those less certain about a

topic or its structure prefer conversational search for

its guided, natural-language interface.

Query complexity also shapes preferences. For

broad or complex questions, especially those involv-

ing multiple documents or recent policies, agents

value CS’s ability to surface relevant passages and

suggest follow-up prompts. In contrast, for routine or

well-deﬁned queries, particularly under time pressure

(e.g., live calls), keyword search remains the go-to op-

tion. Agents also report switching back to keyword

search to verify CS responses on critical or unfamiliar

issues.

Trust and usability further inﬂuence adoption.

Some agents hesitate to rely on CS until its accuracy

and document coverage, especially for older policies,

improve. Keyword search, by contrast, beneﬁts from

long-standing trust in its precision. Agents who ﬁnd

CS intuitive and are open to new tools are more likely

to adopt it, highlighting the importance of clear trust

indicators, comprehensive document inclusion, and

seamless workﬂows to encourage broader use.

4.2 Survey Results

The aggregated survey results from 17 participants,

including the mean and standard deviation for each

evaluation metric, are presented and discussed indi-

vidually below.

Perceived Ease of Use (Davis, 1989)

Overall, the metric “Perceived Ease of Use” has a mean of 4.24, indicating that customer service agents generally found the system easy to use. The standard deviation of 0.39 suggests low variability, meaning that most agents rated the system similarly.

Performance (Peras, 2018)

The performance metric has a mean score of 3.55, indicating a slightly above-average perception of performance among customer service agents. However, the high standard deviation of 1.13 suggests that agents’ opinions about the system’s performance differ markedly.

Answer Faithfulness (Saad-Falcon et al., 2024)

The answer faithfulness metric has a mean score

of 3.56, reﬂecting a slightly above-average level of

agreement among customer service agents regarding

the faithfulness of the answers provided by the CS.

A standard deviation of 0.70 indicates moderate

variability, suggesting that while some agents ﬁnd the

system’s answers faithful, others hold differing views.

Answer Relevance (Saad-Falcon et al., 2024)

The answer relevance metric has a mean score

of 3.65, suggesting that customer service agents

generally agree that CS provides relevant answers.

However, the moderate standard deviation of 0.71

indicates some variability, with some agents differing

in their perception of answer relevance.

Context Relevance (Es et al., 2024)

The context relevance metric has a mean score of

3.29, reﬂecting a mostly neutral perception. The

standard deviation of 0.88 shows notable variability

in perceptions, suggesting mixed views among

agents.

Satisfaction (Oliver, 1981)

The satisfaction metric has a mean score of 3.59, implying a slightly positive perception regarding satisfaction. The standard deviation of 0.71 indicates moderate variability, suggesting that while many agents view CS positively, opinions are not uniform.

Perceived Usefulness (Davis, 1989)

The mean score of the perceived usefulness metric is

3.47, indicating a moderate perception. The standard

deviation of 1.57 is very high, reﬂecting signiﬁcant

variability in responses and, thus, differences in how

useful agents perceive conversational search.

Quality (Oghuma et al., 2015)

The mean score for this metric is 3.69, which implies

an overall positive evaluation of the quality of the

system. The standard deviation is 0.8, meaning the

variability is moderate and opinions on this metric

are not entirely consistent.

Business Value (Peras, 2018)

The business value metric has a mean score of 3.71,

meaning that, generally, customer service agents

perceive the system as beneﬁcial for business pur-

poses. The standard deviation is 0.89, representing

moderate variability. This indicates that the agents

have diverse opinions regarding the system’s value to

the organization.

Openness to New Technologies (Mcknight et al.,

2011)

The openness to new technologies metric has a mean

score of 4.35, meaning that, generally, agents stated

that they are open to new technologies. The standard

deviation of 0.53 indicates a low variability, which

means most of the agents share similar views.

Replaceability and Necessity of CS

The metric “Replaceability and Necessity of CS” has a mean score of 3.27, suggesting moderate agreement on the system’s necessity and replaceability. The standard deviation of 0.95 indicates moderate variability, reflecting differing views on the matter.
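The eleven reported metrics can be compared side by side with a few lines of Python. The means and standard deviations below are copied from the survey results above; the cut-off of 1.0 used to flag high variability is an assumption chosen for illustration, not a threshold used by the authors.

```python
# (mean, standard deviation) per metric, as reported in Section 4.2.
survey = {
    "Perceived Ease of Use": (4.24, 0.39),
    "Performance": (3.55, 1.13),
    "Answer Faithfulness": (3.56, 0.70),
    "Answer Relevance": (3.65, 0.71),
    "Context Relevance": (3.29, 0.88),
    "Satisfaction": (3.59, 0.71),
    "Perceived Usefulness": (3.47, 1.57),
    "Quality": (3.69, 0.80),
    "Business Value": (3.71, 0.89),
    "Openness to New Technologies": (4.35, 0.53),
    "Replaceability and Necessity of CS": (3.27, 0.95),
}

# Metrics ordered from highest to lowest mean agreement.
by_mean = sorted(survey, key=lambda m: survey[m][0], reverse=True)

# Metric on which agents disagree most (largest standard deviation).
most_divisive = max(survey, key=lambda m: survey[m][1])

# Metrics whose standard deviation exceeds the assumed cut-off of 1.0.
high_variability = sorted(m for m, (_, sd) in survey.items() if sd > 1.0)
```

Read this way, the two metrics with a mean above 4 concern ease of use and general openness to technology, while the metrics tied to the system's practical benefit (performance and perceived usefulness) show the widest spread of opinion.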

4.3 Log Analysis Results

The log files we analyzed contained a total of 508 queries submitted between February 2024 and November 2024. We manually categorized a sample of 400 queries into the seven types previously defined for our interviews; each query could belong to multiple categories. Table 1 shows an overview of the categorization and the frequency of each query type.

Table 1: Categorization of Search Queries in the Logs.

Scenario             Number of Queries
Simple Query         243 (60.75%)
Complex Query        157 (39.25%)
Open-Ended Query     159 (39.75%)
Close-Ended Query    241 (60.25%)
Short Query          212 (53%)
Long Query           188 (47%)
Procedural Query     7 (1.75%)

As the table shows, a large share of the queries were classified as simple, close-ended, or short, whereas only seven queries were procedural. This aligns with the interview finding that conversational search is preferred for short and close-ended queries. Furthermore, of the 400 queries, 387 (96.75%) were full sentences, while 13 (3.25%) were keyword-like queries rather than complete sentences.

Next, we examined customer service agents’ eval-

uations of the answers generated by the LLM. Of the

508 total queries, agents rated 503 of them. Table 2

presents a summary of these ratings.

Table 2: User Ratings for Answers Generated by the LLM.

Rating    Number of Answers
1         251 (49.90%)
2         22 (4.37%)
3         39 (7.75%)
4         10 (1.99%)
5         181 (35.98%)

On a scale of 1 to 5, with 5 meaning the agent was

highly satisﬁed with the answer, slightly more than a

third of the answers received the highest possible rat-

ing. Interestingly, roughly half the answers received

the worst possible rating. Overall, the distribution

suggests that answers were either fully satisfying or

not satisfying at all, leaving little middle ground in

between.
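The polarization can be quantified directly from the counts in Table 2. The short computation below reproduces the reported percentages and derives two additional figures that follow arithmetically from the table: the share of answers rated at either extreme and the mean rating. The variable names are illustrative.

```python
ratings = {1: 251, 2: 22, 3: 39, 4: 10, 5: 181}  # counts from Table 2

total = sum(ratings.values())  # number of rated answers

# Percentage of rated answers per rating value, rounded to two decimals.
shares = {r: round(100 * n / total, 2) for r, n in ratings.items()}

# Share of answers rated either 1 or 5 (the two extremes of the scale).
extremes = round(100 * (ratings[1] + ratings[5]) / total, 2)

# Mean rating over all rated answers.
mean_rating = sum(r * n for r, n in ratings.items()) / total
```

Roughly 86% of the rated answers sit at one of the two ends of the scale, and the mean rating of about 2.70 hides this bimodal shape, which is why the distribution is more informative than the average here.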

Finally, we assessed the agents’ ratings of the re-

trieved documents that they rated with either a thumbs

up or a thumbs down. Note that if at least one doc-

ument was evaluated with a thumbs up, the feed-

back system indicated a success rate of 100% for that

query, as the agents could ﬁnd the answer to their

questions in one of the documents. Table 3 shows a

summary of the document ratings.

Table 3: User Ratings for Document Relevance.

Rating         Number of Answers
Thumbs Up      203 (41.01%)
Thumbs Down    292 (58.99%)
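The query-level success rule described above (a query counts as a success if at least one retrieved document received a thumbs up) can be expressed as a short sketch. The list-of-lists input format and the example ratings are hypothetical, and excluding queries without any document rating from the denominator is an assumption; the final line recomputes the thumbs-up share from the counts in Table 3.

```python
def query_success(thumbs: list) -> bool:
    """True if at least one retrieved document was rated thumbs up."""
    return any(t is True for t in thumbs)


def success_rate(queries: list) -> float:
    """Percentage of rated queries with at least one thumbs up."""
    # Assumption: queries whose documents were not rated at all are skipped.
    rated = [q for q in queries if any(t is not None for t in q)]
    return 100 * sum(query_success(q) for q in rated) / len(rated)


# Hypothetical ratings for four queries: True = up, False = down, None = unrated.
example = [
    [True, False, False],
    [False, False, False],
    [None, None, None],
    [False, True, None],
]
example_rate = round(success_rate(example), 2)

# Share of thumbs up recomputed from the counts in Table 3.
table3_up = round(100 * 203 / (203 + 292), 2)
```

Because a single thumbs up is enough for success, this rule is lenient: a query is only counted as a failure when every rated document was judged irrelevant.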

After analyzing the logs, we find that the ratings and thumbs-up/thumbs-down results do not fully align with the interview or survey results. For six of the seven scenarios we examined in the interviews (all except open-ended queries), the majority of the agents preferred conversational search over keyword-based search. Moreover, during the

interviews, although some agents noted that the responses were not always accurate, the majority stated that conversational search significantly eased the process of finding the information they needed and increased their efficiency. Also, in the survey, more than half of the agents stated that the conversational search system improves their task efficiency and work performance. While the mean survey scores showed an overall moderately positive perception of conversational search, the log data do not align with this perception: 58.99% of the document suggestions received only thumbs down, and 62.02% of the answers received a rating of 1, 2, or 3.

5 CONCLUSION

Our mixed-methods investigation shows that cus-

tomer service agents’ choice between conversational

and keyword-based search is driven primarily by their

familiarity with the knowledge base and conﬁdence

in locating information. Agents who know where

to look tend to default to keyword search, while

those less certain rely on conversational search’s

natural-language guidance. Trust concerns, partic-

ularly for open-ended or complex queries, prompt

some agents to cross-check conversational outputs

with keyword results, and time pressure further

inﬂuences preferences: conversational search excels

at handling short, complex, or close-ended queries

efﬁciently, whereas keyword search remains the go-to

under live-call conditions when precise document

retrieval is paramount. Adoption of conversational

search also aligns with perceived ease of use, answer

faithfulness, and time-saving beneﬁts, and is stronger

among agents open to new technologies. These

ﬁndings underscore the complementary strengths

of both modalities and suggest that enhancing trust

indicators, reﬁning usability, and integrating hybrid

search interfaces will better support agent workﬂows.

Limitations

Our study is subject to the following limitations:

• Scope & Applicability: This study is conﬁned

to our single case study company operating in the

insurance domain and therefore may not general-

ize to other industries or organizations. Moreover,

the integration of conversational search is still in

its test phase at this company. Agent attitudes and

preferences may change over time as they become

more familiar with the system.

• Sample Size: The limited number of intervie-

wees (n=13) and survey participants (n=17) may

restrict the generalizability of our ﬁndings.

ACKNOWLEDGEMENTS

We would like to thank our case study company and

its employees who participated in the interviews and

made this study possible. Generative AI (GPT-4) was

used in this study for the conversational search as de-

scribed above. ChatGPT (https://chatgpt.com/) was

used minimally for wording and phrasing of this pa-

per, with full responsibility for the content, interpre-

tation, and ﬁnal version remaining with the authors.

REFERENCES

Davis, F. D. (1989). Perceived usefulness, perceived ease of

use, and user acceptance of information technology.

MIS Quarterly, 13(3):319–340.

Es, S., James, J., Espinosa Anke, L., and Schockaert, S.

(2024). RAGAs: Automated evaluation of retrieval

augmented generation. In Proceedings of the 18th

Conference of the European Chapter of the Associa-

tion for Computational Linguistics: System Demon-

strations, pages 150–158, St. Julians, Malta. Associa-

tion for Computational Linguistics.

Fowler, A. (2000). The role of AI-based technology in support of the knowledge management value activity cycle. The Journal of Strategic Information Systems, 9(2):107–128.

Lewandowski, T., Delling, J., Grotherr, C., and Böhmann, T. (2021). State-of-the-art analysis of adopting AI-based conversational agents in organizations: A systematic literature review. In Proceedings of the 25th Pacific Asia Conference on Information Systems (PACIS 2021), page 167. Association for Information Systems.

Ling, E. C., Tussyadiah, I., Tuomi, A., Stienmetz, J., and

Ioannou, A. (2021). Factors inﬂuencing users’ adop-

tion and use of conversational agents: A systematic

review. Psychology & Marketing, 38(7):1031–1051.

Liu, B., Wu, Y., Liu, Y., Zhang, F., Shao, Y., Li, C., Zhang,

M., and Ma, S. (2021). Conversational vs traditional:

Comparing search behavior and outcome in legal case

retrieval. In Proceedings of the 44th International

ACM SIGIR Conference on Research and Develop-

ment in Information Retrieval. ACM.

McKnight, D. H., Carter, M., Thatcher, J. B., and Clay, P. F. (2011). Trust in a specific technology: An investigation of its components and measures. ACM Transactions on Management Information Systems, 2(2).

Oghuma, A., Libaque-Saenz, C., Wong, S. F., and Chang,

L. Y. (2015). An expectation-conﬁrmation model of

continuance intention to use mobile instant messag-

ing. Telematics and Informatics, 33:34–47.

Oliver, R. L. (1981). Measurement and evaluation of satis-

faction processes in retail settings. Journal of Retail-

ing, 57(3):25–48.

OpenAI (2024). GPT-4 technical report.

Peras, D. (2018). Chatbot evaluation metrics. Economic

and Social Development: Book of Proceedings, pages

89–97.

Preininger, A. M., Rosario, B. L., Buchold, A. M., Heiland,

J., Kutub, N., Bohanan, B. S., South, B., and Jack-

son, G. P. (2021). Differences in information accessed

in a pharmacologic knowledge base using a conversa-

tional agent vs traditional search methods. Interna-

tional Journal of Medical Informatics, 153:104530.

Saad-Falcon, J., Khattab, O., Potts, C., and Zaharia, M.

(2024). ARES: An automated evaluation framework

for retrieval-augmented generation systems. In Pro-

ceedings of the 2024 Conference of the North Amer-

ican Chapter of the Association for Computational

Linguistics: Human Language Technologies (Volume

1: Long Papers), pages 338–354, Mexico City, Mex-

ico. Association for Computational Linguistics.

Sakirin, T. and Ben Said, R. (2023). User preferences for ChatGPT-powered conversational interfaces versus traditional methods. Mesopotamian Journal of Computer Science, 2023:24–31.

Schneider, P. and Matthes, F. (2024). Conversational

exploratory search of scholarly publications using

knowledge graphs. In Abbas, M. and Freihat, A. A.,

editors, Proceedings of the 7th International Confer-

ence on Natural Language and Speech Processing

(ICNLSP 2024), pages 384–396, Trento. Association

for Computational Linguistics.

Spatharioti, S. E., Rothschild, D. M., Goldstein, D. G., and Hofman, J. M. (2023). Comparing traditional and LLM-based search for consumer choice: A randomized experiment.

Wazzan, A., MacNeil, S., and Souvenir, R. (2024). Com-

paring traditional and LLM-based search for image

geolocation. In Proceedings of the 2024 ACM SI-

GIR Conference on Human Information Interaction

and Retrieval. ACM.

KDIR 2025 - 17th International Conference on Knowledge Discovery and Information Retrieval
