Authors:
Júlio Campos 1; Grettel García 1; Jefferson A. de Sousa 1; Eduardo Corseuil 1; Yenier Izquierdo 1; Melissa Lemos 2,1 and Marco Casanova 2,1
Affiliations:
1 Instituto Tecgraf, PUC-Rio, Rio de Janeiro, CEP 22451-900, RJ, Brazil
2 Departamento de Informática, PUC-Rio, Rio de Janeiro, CEP 22451-900, RJ, Brazil
Keyword(s):
Text-to-SQL, Large Language Models, Relational Databases, Llama, Gemma, GPT.
Abstract:
The development of Natural Language (NL) interfaces to relational databases has attracted renewed interest with the use of Large Language Models (LLMs) to translate NL questions into SQL queries. This translation task, often referred to as text-to-SQL, remains far from solved for real-world databases. This paper addresses the text-to-SQL task for a specific type of real-world relational database, one that stores data extracted from engineering CAD files. The paper introduces a prompt strategy tuned to the text-to-SQL task over such databases and presents a performance analysis of LLMs of different sizes. In the experiments, GPT-4o achieved the highest accuracy (96%), followed by Llama 3.1 70B Instruct (86%). Quantized versions of Gemma 2 27B and Llama 3.1 8B showed very limited performance. The main challenges in the text-to-SQL task were the complexity of the required SQL and the trade-off between speed and accuracy when using quantized open-source models.