Authors:
Eduardo Nascimento (1); Caio Avila (1, 2); Yenier Izquierdo (1); Grettel García (1); Lucas Feijó L. Andrade (1); Michelle S. P. Facina (3); Melissa Lemos (1) and Marco Casanova (4, 1)
Affiliations:
(1) Instituto Tecgraf, PUC-Rio, Rio de Janeiro, RJ, CEP 22451-900, Brazil
(2) Departamento de Computação, UFC, Fortaleza, CEP 60440-900, Brazil
(3) Petrobras, Rio de Janeiro, RJ, CEP 20231-030, Brazil
(4) Departamento de Informática, PUC-Rio, Rio de Janeiro, RJ, CEP 22451-900, Brazil
Keyword(s):
Text-to-SQL, Database Keyword Search, Large Language Models, Relational Databases.
Abstract:
Text-to-SQL prompt strategies based on Large Language Models (LLMs) achieve remarkable performance on well-known benchmarks. However, when applied to real-world databases, their performance is significantly lower than on these benchmarks, especially for Natural Language (NL) questions that require complex filters and joins. This paper therefore proposes a strategy to compile NL questions into SQL queries that incorporates a dynamic few-shot examples strategy and leverages the services provided by a database keyword search (KwS) platform. The paper details how the examples provided and the keyword-matching service offered by the KwS platform improve the precision and recall of the schema-linking process. It then shows how the KwS platform can be used to synthesize a view that captures the joins required to process an input NL question, thereby simplifying the SQL query compilation step. The paper includes experiments with a real-world relational database to assess the performance of the proposed strategy. The experiments suggest that the strategy achieves higher accuracy on the real-world relational database than state-of-the-art approaches. The paper concludes by discussing the results obtained.