Paper

Authors: Eduardo R. Nascimento (1); Yenier T. Izquierdo (1); Grettel García (1); Gustavo Coelho (1); Lucas Feijó (1); Melissa Lemos (1); Luiz Leme (2) and Marco Casanova (3, 1)

Affiliations: (1) Instituto Tecgraf, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil; (2) Instituto de Computação, UFF, Niterói, 24210-310, RJ, Brazil; (3) Departamento de Informática, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil

Keyword(s): Text-to-SQL, GPT, Large Language Models, Relational Databases.

Abstract: The leaderboards of familiar benchmarks indicate that the best text-to-SQL tools are based on Large Language Models (LLMs). However, when applied to real-world databases, the performance of LLM-based text-to-SQL tools is significantly lower than that reported for these benchmarks. A closer analysis reveals that one of the problems is that the relational schema is an inappropriate specification of the database from the point of view of the LLM. In other words, the target user of the database specification is the LLM rather than a database programmer. This paper then argues that the text-to-SQL task can be significantly facilitated by providing a database specification based on LLM-friendly views, which are close to the language of the users' questions and eliminate frequently used joins, and on LLM-friendly data descriptions of the database values. The paper first introduces a proof-of-concept implementation of three sets of LLM-friendly views over a relational database, whose design is inspired by a proprietary relational database, and a set of 100 Natural Language (NL) questions that mimic users' questions. The paper then tests a text-to-SQL prompt strategy implemented with LangChain, using GPT-3.5 and GPT-4, over the sets of LLM-friendly views, with data samples serving as the LLM-friendly data descriptions. The results suggest that the specification of LLM-friendly views and the use of data samples, albeit not too difficult to implement over a real-world relational database, are sufficient to improve the accuracy of the prompt strategy considerably. The paper concludes by discussing the results obtained and suggesting further approaches to simplify the text-to-SQL task.
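To make the idea of an LLM-friendly view concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical two-table schema (`eqp`, `loc`) with terse column names, defines a view `equipment` that hides the join and renames columns to plain words, and then builds a prompt from the view signature plus a few sample rows. All table, column, and function names are invented for illustration, SQLite stands in for the real database, and no LLM or LangChain call is made.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory database with cryptic base tables
    and one view whose names are close to the users' vocabulary."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE loc (loc_id INTEGER PRIMARY KEY, loc_nm TEXT);
        CREATE TABLE eqp (eqp_id INTEGER PRIMARY KEY, eqp_nm TEXT, loc_id INTEGER);
        INSERT INTO loc VALUES (1, 'Platform A'), (2, 'Refinery B');
        INSERT INTO eqp VALUES (10, 'Pump P-101', 1), (11, 'Valve V-7', 2);

        -- The view pre-computes a frequently used join and renames columns.
        CREATE VIEW equipment AS
            SELECT e.eqp_id AS equipment_id,
                   e.eqp_nm AS equipment_name,
                   l.loc_nm AS location_name
            FROM eqp AS e
            JOIN loc AS l ON l.loc_id = e.loc_id;
        """
    )
    return conn


def describe_view(conn: sqlite3.Connection, view: str, n_samples: int = 2) -> str:
    """Return the view's column list followed by a few sample rows."""
    cur = conn.execute(f'SELECT * FROM "{view}" ORDER BY 1 LIMIT ?', (n_samples,))
    columns = [d[0] for d in cur.description]
    lines = [f"View {view}({', '.join(columns)})", "Sample rows:"]
    lines += [" | ".join(str(value) for value in row) for row in cur.fetchall()]
    return "\n".join(lines)


def build_prompt(conn: sqlite3.Connection, views: list[str], question: str) -> str:
    """Assemble a text-to-SQL prompt that exposes only the views, not the base tables."""
    schema = "\n\n".join(describe_view(conn, v) for v in views)
    return (
        "Write one SQLite query that answers the question, using only these views.\n\n"
        f"{schema}\n\nQuestion: {question}\nSQL:"
    )
```

Calling `build_prompt(build_demo_db(), ["equipment"], "Which equipment is on Platform A?")` yields a prompt that mentions `equipment_name` and `location_name` but never the base-table names, so a model answering it would not need to reconstruct the join.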

CC BY-NC-ND 4.0

Paper citation in several formats:
Nascimento, E. R.; Izquierdo, Y. T.; García, G.; Coelho, G.; Feijó, L.; Lemos, M.; Leme, L. and Casanova, M. (2024). My Database User Is a Large Language Model. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7; ISSN 2184-4992, SciTePress, pages 800-806. DOI: 10.5220/0012697700003690

@conference{iceis24,
author={Nascimento, Eduardo R. and Izquierdo, Yenier T. and García, Grettel and Coelho, Gustavo and Feijó, Lucas and Lemos, Melissa and Leme, Luiz and Casanova, Marco},
title={My Database User Is a Large Language Model},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={800-806},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012697700003690},
isbn={978-989-758-692-7},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - My Database User Is a Large Language Model
SN - 978-989-758-692-7
IS - 2184-4992
AU - Nascimento, E. R.
AU - Izquierdo, Y. T.
AU - García, G.
AU - Coelho, G.
AU - Feijó, L.
AU - Lemos, M.
AU - Leme, L.
AU - Casanova, M.
PY - 2024
SP - 800
EP - 806
DO - 10.5220/0012697700003690
PB - SciTePress