Text-to-SQL Meets the Real-World

Eduardo Nascimento, Grettel García, Lucas Feijó, Wendy Victorio, Yenier Izquierdo, Aiko R. de Oliveira, Gustavo Coelho, Melissa Lemos, Robinson Garcia, Luiz Leme, Marco Casanova



Text-to-SQL refers to the task defined as “given a relational database D and a natural language sentence S that describes a question on D, generate an SQL query Q over D that expresses S”. Numerous tools have addressed this task with relative success over well-known benchmarks. Recently, several LLM-based text-to-SQL tools, that is, text-to-SQL tools that use Large Language Models (LLMs), have emerged and outperformed previous approaches. However, when these tools are applied to industrial-size databases, with a large number of tables, columns, and foreign keys, their performance is significantly lower than that reported for the benchmarks. This paper therefore investigates how a selected set of LLM-based text-to-SQL tools performs over two challenging databases: Mondial, an openly available database, and a proprietary industrial database. The paper also proposes a new LLM-based text-to-SQL tool that combines features from tools that performed well over the Spider and BIRD benchmarks. It then describes how the selected tools and the proposed tool, running with GPT-3.5 and GPT-4, perform over the Mondial and industrial databases on a suite of 100 carefully defined natural language questions that are closely related to those observed in practice. The paper concludes with a discussion of the results obtained.


