Authors:
Matheus O. Silva 1; Eduardo Nascimento 1,2; Yenier Izquierdo 2; Melissa Lemos 1,2 and Marco Casanova 1,2
Affiliations:
1 Department of Informatics, PUC-Rio, Rio de Janeiro, RJ, Brazil
2 Tecgraf Institute, PUC-Rio, Rio de Janeiro, RJ, Brazil
Keyword(s):
Conversational Agents, Database Interfaces, ReAct, LLM.
Abstract:
Database conversational agents support dialogues that help users interact with databases in their own jargon. A common strategy for constructing such agents is to adopt an LLM-based architecture. However, evaluating agent-based systems is complex and lacks a definitive solution, since their responses are open-ended, with no direct mapping between an input and a single expected response. This paper therefore focuses on the problem of evaluating LLM-based database conversational agents. It first introduces a tool that constructs test datasets for such agents by exploring the schema and the data values of the underlying database. The paper then describes an evaluation agent that behaves like a human user to assess the responses of a database conversational agent on a test dataset. Finally, the paper reports a proof-of-concept experiment with an implementation of a database conversational agent over two databases: the Mondial database and an industrial database in production at an energy company.
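The Python sketch below illustrates, under stated assumptions, how the evaluation step described in the abstract could be wired up: test cases derived from the database are posed to the conversational agent, and an LLM-based judge decides whether each open-ended response is acceptable. The names TestCase, llm_judge, and db_agent.ask are hypothetical placeholders, not the paper's actual interfaces.

    # Minimal sketch of an evaluation loop for a database conversational agent.
    # All interfaces here are assumptions standing in for a real implementation.

    from dataclasses import dataclass

    @dataclass
    class TestCase:
        question: str   # user question phrased in the database's jargon
        reference: str  # expected answer, derived from the schema and data values

    def llm_judge(question: str, reference: str, response: str) -> bool:
        """Placeholder for an LLM call that decides whether `response`
        answers `question` consistently with `reference`."""
        raise NotImplementedError  # hypothetical: wire to an LLM provider

    def evaluate(db_agent, dataset: list[TestCase]) -> float:
        """Run every test case against the agent under test and return the pass rate."""
        passed = 0
        for case in dataset:
            response = db_agent.ask(case.question)  # hypothetical agent API
            if llm_judge(case.question, case.reference, response):
                passed += 1
        return passed / len(dataset) if dataset else 0.0

Because the agent's responses are open-ended, the judge compares them to a reference semantically rather than by exact string match, which is why an LLM-based check stands in for a simple equality test.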