On the Construction of Database Interfaces Based on Large Language Models

João Pinheiro 1, Wendy Victorio 1,2, Eduardo Nascimento 1,2, Antony Seabra 1, Yenier Izquierdo 2, Grettel García 2, Gustavo Coelho 2, Melissa Lemos 1,2, Luiz Paes Leme 3, Antonio Furtado 1 and Marco Casanova 1,2

1 Departamento de Informática, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil
2 Instituto Tecgraf, PUC-Rio, Rio de Janeiro, 22451-900, RJ, Brazil
3 Instituto de Computação, UFF, Niterói, 24210-310, RJ, Brazil
Keywords: Large Language Models, Database Interfaces, Conversational Interfaces.
Abstract:
This paper argues that Large Language Models (LLMs) can be profitably used to construct natural language
(NL) database interfaces, including conversational interfaces. Such interfaces will be simply called LLM-
based database (conversational) interfaces. It discusses three problems: how to use an LLM to create an NL
database interface; how to fine-tune an LLM to follow instructions over a particular database; and how to
simplify the construction of LLM-based database (conversational) interfaces. The paper covers the first two
problems with the help of examples based on two well-known LLM families, GPT and LLaMA, developed by
OpenAI and Meta, respectively. Likewise, it discusses the third problem, with the help of examples based on
two frameworks, LangChain and LlamaIndex.
1 INTRODUCTION
Natural language database interfaces allow users to
access databases using queries formulated in natural
language (NL) (Affolter et al., 2019). Going one
step further, NL database conversational interfaces
(software-based dialogue systems or chatbots) (Mot-
ger et al., 2022)(Caldarini et al., 2022) offer users
the possibility to interact with databases using tech-
niques similar to human conversation; they accept
queries typically formulated in NL, return answers
also commonly expressed in NL, and support persis-
tent conversational contexts, thereby generating user-
computer dialogues.
This paper argues in favor of using Large Lan-
guage Models (LLMs) (Manning, 2022) to construct
NL (conversational) database interfaces. Such in-
terfaces will be simply called LLM-based database
(conversational) interfaces, leaving it implicit that the
queries are formulated in NL. The adoption of LLMs
would simplify the interpretation of NL queries, the
generation of NL answers, and the control of user di-
alog, tasks that are addressed in LLM training.
A Large Language Model (LLM) is a language
model based on a deep neural network architecture
with a very large number of parameters, trained on
enormous quantities of unlabeled text, using self-
supervised or semi-supervised learning. One should
single out LLMs that have been trained to follow
instructions, which are most suited for constructing
conversational interfaces. Such LLMs are sometimes
called Chat Models. LLMs are a transformative tech-
nology, enabling developers to build novel applica-
tions. LLMs came to the foreground with the an-
nouncement of conversational interfaces that are able
to generate high-quality textual answers.
Specifically, the paper addresses three problems:
P1. How to use an LLM to create an NL database
interface.
P2. How to fine-tune an LLM to follow instructions
(i.e., to become a Chat Model) over a particular
database.
P3. How to simplify the construction of LLM-based
database (conversational) interfaces.
Sections 3 and 4 cover the first two problems with
the help of examples based on two well-known LLM
families, GPT and LLaMA, developed by OpenAI
and Meta, respectively. Likewise, Section 5 discusses
the third problem, with the help of examples based on
two frameworks, LangChain and LlamaIndex.
The paper is organized as follows. Section 2 re-
views LLMs and some popular LLM families. Sec-
tion 3 comments on GPT-based database interfaces
and Section 4 on LLaMA-based database interfaces.
Section 5 covers LangChain and LlamaIndex. Section
6 concludes the paper by summarizing the lessons
learned.
2 BACKGROUND
Briefly, Affolter et al. (Affolter et al., 2019) offer a comparative survey of 24 NL database inter-
faces, classified as keyword-, pattern-, parsing- and
grammar-based. As lessons learned, the authors in-
dicate that, for simple questions, keyword-based sys-
tems are enough, whereas for complex questions, an
NL database interface needs to apply some parsing to
identify structural dependencies, and that grammar-
based systems are the most powerful, but depend on
manually designed rules.
Diefenbach et al. (Diefenbach et al., 2020) assume that an NL query can be correctly interpreted
considering only the semantics of the words. For ex-
ample, the NL query “Give me actors born in Berlin”
is reformulated as a keyword question “Berlin, actors,
born in”, that is, the semantics of the words “Berlin”,
“actors”, and “born” suffice to deduce the intention
of the user. The authors proceed to map the NL query
into one of four templates, which limits the usefulness
of the process. Interestingly, however, they show that
the process has good performance on several RDF
datasets from the QALD benchmarks in different lan-
guages.
Recent surveys of conversational interfaces
(software-based dialogue systems or chatbots) can be
found in (Motger et al., 2022)(Caldarini et al., 2022).
Such systems offer users the possibility to interact
with databases using techniques similar to human
conversation; they accept queries typically formu-
lated in NL, return answers also typically expressed
in NL, and support persistent conversational contexts,
thereby generating user-computer dialogues.
In another direction, the approaches involved in
Natural Language Processing (NLP) tasks have un-
dergone profound changes over the past decades.
(Manning, 2022) classifies NLP approaches in
roughly four eras. From 1950 to 1969, the first
era was characterized by simple rule-based models,
achieving very limited results due to the lack of com-
putation, data, and knowledge of human language
structure. The second era, from 1970 to 1992, ben-
efited from the rapid development of linguistic the-
ories, allowing the creation of a new generation of
hand-built systems from knowledge-based artificial
intelligence models. The third era, from 1993 to
2012, saw a shift from knowledge-based to machine-
learning models, when digital text became widely
available, providing enough data to allow some level
of language understanding by using mainly super-
vised machine-learning methods. The last era, from
2013 to the present, extended the use of machine
learning, allowing a more generalized language un-
derstanding by using self-supervised learning and
deep learning-based models to represent words with
dense vectors. Within this context, the first LLM was
successfully trained in 2018 by exposing a model to
an extremely large quantity of text, representing an
enormous amount of language knowledge.
LLMs follow the transformer neural network
architecture (Vaswani et al., 2017)(Tunstall et al.,
2022)(Jurafsky and Martin, 2023). Manning (Man-
ning, 2022) contains a readable introduction to trans-
formers. The central intuition behind transformers is
the use of the attention mechanism, where the representation of a given position is computed as a weighted combination of the representations of the other positions. The result is a much more complex setup when compared to simple neural networks. Based on this architecture, the self-supervised strategy for LLMs usually involves masking words in the text and training the model to predict the missing words from the surrounding context. After this
process is extensively repeated, the model learns gen-
eralized syntactic structures of sentences.
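To make the attention intuition concrete, the following minimal sketch (plain NumPy, ignoring the learned projections and multiple heads of a real transformer) computes scaled dot-product self-attention over a toy sequence, so that each output position is a weighted combination of all positions:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # pairwise similarity between queries and keys, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # softmax turns the similarities into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # each output row is a weighted combination of the rows of V
    return weights @ V

# toy example: 4 positions, 8-dimensional representations
X = np.random.default_rng(0).normal(size=(4, 8))
print(scaled_dot_product_attention(X, X, X).shape)  # (4, 8)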
As a consequence of their generalization capacity,
LLMs perform well in a wide variety of NLP tasks, to
the point of shifting the focus of NLP research away
from the previous paradigm of training specialized su-
pervised models for each task. An LLM can be rede-
ployed to a particular NLP task with a small number
of further instructions.
However, an LLM reflects the data it was trained
with. In particular, an LLM suffers from the “temporal generalization problem” (capturing facts that change over time) and the “factual grounding problem” (capturing specific facts). To circumvent these
limitations, the user may fine-tune the LLM, that is,
retrain it with more examples, or adopt few-
shot learning, that is, add a few examples in a dialog
interaction so that the model can capture what the user
is trying to do and generate a plausible completion.
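To make the distinction concrete, the snippet below sketches a few-shot prompt: instead of retraining the model, a handful of input-output examples are placed directly in the dialog so that the model can infer the task. The task and examples are purely illustrative.

# A few-shot prompt: the examples show the model what to do,
# so no fine-tuning (retraining) is required.
few_shot_prompt = """Rewrite each question as a keyword list.

Question: Give me actors born in Berlin
Keywords: Berlin, actors, born in

Question: Where did D. Pedro I of Brazil die?
Keywords: D. Pedro I of Brazil, die, place

Question: Which rivers cross Paris?
Keywords:"""
# The model is expected to complete the prompt with something like:
#   Paris, rivers, cross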
Chat Models must in turn (Ouyang et al., 2022):
avoid unacceptable answers, including avoiding in-
correct or made-up facts, nonsensical answers, and
biased or toxic text; seek useful user dialogues, that
is, the language modeling objective must be aligned
with the objective “follow the user’s instructions help-
fully and safely”. Putting it in more explicit terms,
Chat Models must be (Askell et al., 2021): helpful
(they should help the user solve their task), honest
(they shouldn’t fabricate information or mislead the
user), and harmless (they should not cause physical,
psychological, or social harm to people or the envi-
ronment).
Currently, there are three well-known LLM families:

GPT (Generative Pre-Trained Transformer): GPT 3.5 (announced on November 30th, 2022; https://platform.openai.com/docs/model-index-for-researchers) and GPT 4 (announced on March 14th, 2023; https://openai.com/research/gpt-4) respectively support ChatGPT and ChatGPT Plus.

LLaMA (Large Language Model Meta AI): LLaMA (announced on February 23rd, 2023) and LLaMA 2 (announced on July 18th, 2023) were developed by Meta (https://ai.facebook.com/blog/large-language-model-llama-meta-ai/).

LaMDA (Language Model for Dialogue Applications) and PaLM (Pathways Language Model): LaMDA (announced on May 18th, 2021; https://en.wikipedia.org/wiki/LaMDA) and PaLM (announced in March 2023 and upgraded in May 2023) support Google's Bard.
The examples in Sections 3 and 4 respectively use
the GPT and the LLaMA families.
3 GPT-BASED INTERFACES
3.1 The GPT Family and OpenAI API
The OpenAI Research API offers several models
of the GPT family, with different capabilities and prices (the relationship between the models the OpenAI API offers and those mentioned in the papers is described at https://platform.openai.com/docs/model-index-for-researchers).
OpenAI models are non-deterministic, that is,
identical inputs can yield different outputs.
Section 3.3 illustrates the limitations imposed by the maximum number of tokens. Note that the models were trained on data up to September 2021 or June 2021, which imposes limitations, as illustrated in Section 3.2.
The OpenAI API covers many tasks that involve
understanding or generating natural language, code,
or images.
The API has three key concepts:
Prompts: a model is “programmed” by providing
some instructions or a few examples to the com-
pletions and chat completions endpoint; prompts
can be used for any task, including content or code
generation, summarization, expansion, conversa-
tion, creative writing, style transfer.
Tokens: text is processed by breaking it down into
tokens; the number of tokens processed in a given
API request depends on the length of both inputs
and outputs.
Models: the API works with several models, as listed
in the documentation.
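The short sketch below illustrates these three concepts with the openai Python package (its 2023-era interface) and the tiktoken tokenizer; the prompt, model name, and API key placeholder are assumptions used only for illustration.

import openai    # the 0.x interface, current when this paper was written
import tiktoken

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = "Summarize the purpose of a database schema in one sentence."

# Tokens: count how many tokens the prompt consumes for a given model
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(len(encoding.encode(prompt)), "prompt tokens")

# Prompts and Models: send the prompt to the chat completions endpoint
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])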
As for training a model, the user can adopt few-
shot learning, passing examples in the prompts, or
fine-tune the model, as mentioned in Section 2. How-
ever, fine-tuning is currently only available for the
GPT 3 base models.
3.2 ChatGPT
ChatGPT from OpenAI was released on November
30, 2022, uses GPT 3.5, and is optimized for user
dialogues. According to OpenAI (ChatGPT: Optimizing Language Models for Dialogue, https://openai.com/blog/chatgpt/), ChatGPT provides articulated answers across several knowledge domains, but uneven factual accuracy has been identified as a significant drawback, and sometimes it “writes plausible-sounding but incorrect or nonsensical answers”. Training data include “Internet phenomena”, software documentation, and code. ChatGPT has limited knowledge of events that occurred
after September 2021. The training process leveraged
both supervised learning and reinforcement learning
(Neelakantan et al., 2022). Both approaches use hu-
man trainers to improve the model’s performance.
OpenAI used outsourced Kenyan workers earning
less than USD 2.00 per hour to label “toxic content”.
The power and limitations of ChatGPT to gener-
ate dialogues are well-known. The reader is invited to
test ChatGPT by running the following three generic
knowledge queries, for which ChatGPT produces an-
swers of varying quality: (Correct answer) “How Liz
Taylor, Richard Burton, and John Hurt are related?”;
(Incorrect answer) “Where did D. Pedro I of Brazil
die?”; (Failed answer due to lack of data) “Tell me
about the coronation of Charles III of England”.
ChatGPT Plus, the paid version of ChatGPT, was released on March 14th, 2023, and uses GPT 4 (OpenAI, 2023), at a cost of USD 20.00 per month (as of May 2023). According to the OpenAI announcement (https://openai.com/product/gpt-4), “GPT-4 is 82% less likely to respond to requests for disallowed content, is 40% more likely to produce factual responses than GPT-3.5, and can take images as input”.
3.3 Examples of GPT-Based Database
Interfaces
Innumerable applications of ChatGPT were published
in the few months after its announcement (Abdullah
et al., 2022)(Fraiwan and Khasawneh, 2023), rang-
ing from a simple application developed to help write
English essays (Fitria, 2023) to more complex medi-
cal applications (Lecler et al., 2023), passing through
tourism (Carvalho and Ivanov, 2023), digital enter-
tainment [omitted], and analyzing social systems data
(Wang et al., 2023).
Such applications can benefit from the fact that
ChatGPT is context-aware and accepts feeding data
before questions. One can append text directly to the
prompt or provide a contextual set of documents us-
ing LlamaIndex (see Section 5.2). Passing data as
prompts has one major limitation, though: the number
of tokens may not exceed the LLM limits.
The rest of this section describes two simple examples that use ChatGPT (or the OpenAI API) to create NL database interfaces with literally no programming.
The simplest way to create a GPT-based database
interface would be as follows:
1. Create a description of the relational schema and
pass it as a context C.
2. Given an NL query N, ask to translate N into an
SQL query Q under the context C.
3. Execute Q.
The same approach also applies to RDF datasets with
RDF schemas to translate NL queries into SPARQL.
GPT 3.5 (and GPT-4) has been trained for these translation tasks and performs remarkably well, as reported in (Liu et al., 2023). The application could adopt LangChain (see Section 5.1) to execute the generated SQL (or SPARQL) query.
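Concretely, the three steps above can be sketched as follows, again with the 2023-era openai package; the schema description, the helper name nl_to_sql, the SQLite database, and the example question are assumptions used only for illustration.

import sqlite3
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Step 1: a description of the relational schema, passed as the context C
schema_context = """Tables:
  region(region_name, country)
  person(name, birth_region)"""

def nl_to_sql(nl_query):
    # Step 2: ask the LLM to translate the NL query N into an SQL query Q under C
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Translate the user's question into a single SQL query "
                        "for the following schema and return only the SQL.\n"
                        + schema_context},
            {"role": "user", "content": nl_query},
        ],
    )
    return response["choices"][0]["message"]["content"]

# Step 3: execute Q against the database
sql = nl_to_sql("Where is Khwarazm?")
connection = sqlite3.connect("example.db")  # assumed local SQLite database
print(connection.execute(sql).fetchall())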
However, this approach has three limitations:
1. The schema and the NL query must be written in
the same terms. Hence, the relational schema has
to be hidden with a layer of views that remaps the
internal table and column names to terms that the
user is familiar with.
2. The database must have a schema, which is always true for relational databases but not for RDF datasets.
3. The database schema must generate fewer tokens than the maximum allowed.
Furthermore, the NL query may refer to terms
that denote entity names, attribute values, or data in
general that occur in the database, and not just to
terms that occur in the relational schema. For exam-
ple, consider the NL query “Where is Khwarazm?”,
which asks where the oasis region “Khwarazm” is
located (the region where Muhammad ibn Musa al-Khwarizmi, the mathematician from whose name the term “algorithm” is derived, lived), but omits the detail that “Khwarazm” is the name of a region (the answer is “Uzbekistan” and “Turkmenistan”). Inter-
estingly, ChatGPT can infer the missing information
and synthesize a correct SQL query. Note that passing
the entire database as context would not be feasible
due to prompt size limits.
To circumvent these limitations, an approach
would be to use a database keyword search tool to me-
diate access to the database. ChatGPT (or the OpenAI
API) would be used to extract a list K of keywords
from an NL query N (a text block) and pass K to the
database keyword search tool, which would then cre-
ate an answer to N. This approach would be a simple
way to provide an NL interface to the database key-
word search tool.
The GPT-based natural language interface to the
database keyword search tool would be as follows:
1. Create a description T of the database keyword
search API and pass it as a context.
2. Given an NL query N (a text block), ask to extract
a list K of keywords from N.
3. Pass K to the tool, under the context T , to generate
an answer for N.
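A minimal sketch of this recipe, assuming the 2023-era openai package and a hypothetical keyword_search function standing in for the database keyword search tool, could look like this:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def extract_keywords(nl_query):
    # Step 2: ask the LLM to extract a list K of keywords from the NL query N
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Extract the database search keywords from the question "
                        "and return them as a comma-separated list."},
            {"role": "user", "content": nl_query},
        ],
    )
    return [k.strip()
            for k in response["choices"][0]["message"]["content"].split(",")]

def keyword_search(keywords):
    # Step 3: hypothetical wrapper around the database keyword search tool
    raise NotImplementedError

answer = keyword_search(extract_keywords("Give me actors born in Berlin"))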
Lastly, the text-to-SQL problem, that is, the trans-
lation of NL questions to SQL queries, has been in-
tensively studied. The Spider Web site (https://yale-lily.github.io/spider), a popular benchmark to assess text-to-SQL tools, maintains a
leaderboard of the best-performing tools. It should
be noted that the best four tools use GPT-4, whereas
the fifth best tool uses GPT-3.5-Turbo-0301, at signif-
icantly lower cost.
4 LLaMA-BASED INTERFACES
This section describes two examples that address the
problem of fine-tuning an LLM to follow instruc-
tions (i.e., to become a Chat Model) over a particular
database. The examples are Alpaca and GPT4All, which create chat models by fine-tuning LLaMA.
4.1 The LLaMA Family
LLaMA (Touvron et al., 2023), which stands for
Large Language Model Meta AI, is a collection
of transformer-based large language models ranging
from 7B to 65B parameters. Based on a recent work
(Hoffmann et al., 2022), the authors decided to in-
crease the number of tokens, while fixing the number
of parameters. It was observed that a 7B model con-
tinues to improve even after 1T tokens, when the rec-
ommendation of (Hoffmann et al., 2022) was to train
a 10B model on 200B tokens.
The results in (Touvron et al., 2023) indicate that
the series of LLaMA LLMs were competitive with
state-of-the-art models. In particular, LLaMA-13B
outperformed GPT 3 despite being more than ten
times smaller. The authors also pointed out that the
accuracy notably increased by increasing the number
of tokens.
The LLaMA 2 family (https://ai.meta.com/resources/models-and-libraries/llama/), announced on July 18th, 2023, changed the LLM open-source scenario. The
family includes pretrained and fine-tuned language
models (LLaMA Chat, Code LLaMA) ranging
from 7B to 70B parameters. Llama 2 was trained
between January 2023 and July 2023, and the pre-
training data has a cutoff of September 2022, but
some tuning data is more recent, up to July 2023.
Llama 2-Chat models are optimized for dialogue use
cases and outperform open-source chat models on
most benchmarks and human evaluations for helpful-
ness and safety. Code Llama comes in multiple fla-
vors to cover a wide range of applications: foundation
models (Code Llama), Python specializations (Code
Llama - Python), and instruction-following models
(Code Llama - Instruct) with 7B, 13B, and 34B pa-
rameters each.
4.2 Examples of LLaMA-Based
Database Conversational Interfaces
Alpaca (Taori et al., 2023), developed by the Stanford
Center for Research on Foundation Models (CRFM),
is an instruction-following model fine-tuned from the
LLaMA-7B model. Alpaca adopted the self-instruct
method (Wang et al., 2022) in an interesting way.
The authors applied GPT 3.5 (text-davinci-003) to the
175 human-written instruction-output pairs defined
in (Wang et al., 2022), generating 52K unique pairs
of instructions and the corresponding outputs. The
dataset generated was used to fine-tune the LLaMA-
7B model using Hugging Face’s training framework.
To evaluate Alpaca, the authors created a varied list
of user-oriented instructions covering email writing, social media, and productivity tools. The evaluation con-
firmed that Alpaca-7B achieved competitive perfor-
mance when compared with GPT 3.5.
GPT4All (Anand et al., 2023) is an assistant-
style chatbot from Nomic AI that features chat mod-
els fine-tuned from LLaMA. Nomic AI first gathered
sample questions/prompts from three publicly available data sources: the unified chip2 subset of LAION OIG, coding questions with a random sub-sample of Stackoverflow Questions, and instruction tuning with a sub-sample of Big-science/P3. These questions/prompts were handed over to GPT-3.5-Turbo
to generate roughly 800,000 prompt-generation pairs.
Then, the data was curated to remove low-diversity re-
sponses, ensuring that the data covered a wide range
of topics, as in the Alpaca project. The final dataset
contained approximately 440,000 prompt-generation
pairs. Nomic AI then trained several models fine-
tuned from LLaMA-7B, using LoRA (Hu et al., 2021)
for four epochs.
Alpaca and GPT4All illustrate a much more elaborate process to create a database conversational interface, which can be summarized as follows (a minimal sketch of the fine-tuning step appears after the list):
1. Let D1, ..., Dn be one or more databases and L be an LLM.
2. Create an initial set P of sample questions/prompts from the databases.
3. Create an expanded set P′ of questions/prompts, perhaps using ChatGPT.
4. Create a curated set P′′ from P′.
5. Create a chat model C by fine-tuning L using P′′.
6. The database conversational interface for D1, ..., Dn will be built using the chat model C.
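Step 5 of this recipe is typically implemented with parameter-efficient fine-tuning. The sketch below, built on the Hugging Face transformers, datasets, and peft libraries, shows one plausible way to LoRA-fine-tune a LLaMA-style checkpoint on the curated prompt set P′′; the checkpoint name, hyperparameters, and toy training pair are assumptions for illustration only.

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed LLaMA-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# wrap the base model L with LoRA adapters; only the adapters are trained
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"))

# P'': the curated prompt/response pairs (a single toy pair here)
pairs = [{"text": "### Instruction: List the customers from Rio.\n"
                  "### Response: SELECT name FROM customer WHERE city = 'Rio';"}]
dataset = Dataset.from_list(pairs).map(
    lambda example: tokenizer(example["text"], truncation=True, max_length=512))

# fine-tune L on P'' to obtain the chat model C
Trainer(
    model=model,
    args=TrainingArguments(output_dir="chat-model", num_train_epochs=4,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()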
Other examples that follow this strategy, but
which are not based on LLaMA, are GPT4All-J
(Anand et al., 2023) from Nomic AI, announced on
April 13, 2023, and Dolly 2.0 (Conover et al., 2023)
from Databricks, announced on April 12, 2023.
Finally, following the announcement of LLaMA 2, there was a flurry of models obtained by fine-tuning the LLaMA 2 models. The Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which evaluates open LLMs and chatbots, lists a large number of such models, mostly obtained using LoRA (Low-Rank Adaptation) (Hu et al., 2021) or QLoRA (Quantized Low-Rank Adaptation) (Dettmers et al., 2023).
5 GENERIC LLM FRAMEWORKS
This section introduces two frameworks, LangChain
and LlamaIndex. LangChain helps isolate the devel-
opment of an LLM-based database (conversational)
interface from the underlying LLM, whereas LlamaIndex is a simple, flexible interface between an LLM and external data.
5.1 LangChain
LangChain (https://docs.langchain.com) is a framework to help develop LLM-powered applications. It is designed to enable two
powered applications. It is designed to enable two
types of applications:
Data-aware: connects an LLM to a data source;
Agentic: allows an LLM to interact with its envi-
ronment.
It also provides two main value propositions:
Components consist of modular abstractions necessary to work with language models;
Use-Case Specific Chains consist of assembling components in particular ways to accomplish a particular use case. The central idea is to “chain” components to create more advanced use cases around LLMs.
At its core, LangChain has the following modules:
Schema covers the basic data types and schemas
used.
Models represent the different types of models
that are used in LangChain.
Prompt refers to the input to the model and is the
new way of programming models.
Index refers to structuring documents so that
LLMs can best interact with them. Indexes and
retrieval types are currently centered around vec-
tor databases.
Memory is the concept of storing and retrieving
data in a conversation. There are two main operations: based on the input, fetch any relevant pieces of data; and, based on the input and output, update the state accordingly.
Chain is a generic concept that refers to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
Agent accesses a suite of tools and, depending on
the user input, decides which, if any, of these tools
to call.
LangChain supports application development us-
ing Python and JavaScript/TypeScript.
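To give a feel for how these modules are combined, the sketch below chains a prompt template, a chat model, and conversation memory with the 2023-era LangChain API; the model name and the questions are illustrative.

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")   # Models
prompt = PromptTemplate(                                       # Prompt
    input_variables=["history", "question"],
    template="Conversation so far:\n{history}\nQuestion: {question}\nAnswer:")
memory = ConversationBufferMemory(memory_key="history")        # Memory

chain = LLMChain(llm=llm, prompt=prompt, memory=memory)        # Chain
print(chain.run(question="What is a database view?"))
print(chain.run(question="And how does it differ from a table?"))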
LangChain can help design applications such
as personal assistants, chat-bots, question-and-
answering (QA) over documents, summarizing long
documents, extracting structured information from
unstructured text, and querying tabular data. For
example, (Kemper, 2023) illustrated how to use
LangChain to extend GPT-based models for tasks
such as QA from tabular data and QA from doc-
uments, navigating through more complex custom
database schemas. For this task, a chain type called
SQLDatabaseChain, whose main task is to provide
GPT with information from custom databases, is
used, as shown in the code snippet below (full code is available at https://github.com/kemperd/langchain-sqlchain). Note
that gpt-3.5-turbo is used as a parameter of the
model class. A set of eight queries over the database
schema was designed for testing. The queries address
different challenging aspects of information retrieval.
The results showed that one query had an obvious
problem. The remaining seven were handled well, in-
cluding queries requiring joins over three tables.
from langchain.chat_models import ChatOpenAI
from langchain.chains import SQLDatabaseChain

# create the model object as a ChatOpenAI instance
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

# create a chain using SQLDatabaseChain
# (db is an SQLDatabase instance created elsewhere,
#  e.g., with SQLDatabase.from_uri(...))
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

# run a query example
db_chain.run("Sum up the total revenue")
5.2 LlamaIndex
LlamaIndex (https://gpt-index.readthedocs.io/en/latest/), also known as GPT Index, is a simple, flexible interface between LLMs and external data. It provides the following tools:
Data connectors to existing data sources and data
formats (APIs, PDFs, docs, SQL, etc.);
Indices over unstructured and structured data for
use with LLMs;
A user interface to query the index (feed in an in-
put prompt) and obtain a knowledge-augmented
output; and
A comprehensive toolset, trading off cost and per-
formance.
The LlamaIndex indices store context in an easy-
to-access format for prompt insertion. They help to
deal with prompt limitations when the context is too big, and with text splitting. Depending on the index structure adopted, LlamaIndex offers different methods of synthesizing a response (response synthesis modes) from the relevant context. The response synthesis modes are based on two concepts: node and response synthesis. Node is a generic data
container that contains a piece of data (e.g., a chunk
of text, an image, a table, etc.) and additional meta-
data about its relationship to other Node objects. Re-
sponse synthesis synthesizes a response given the re-
trieved Nodes. LlamaIndex also supports synthesiz-
ing a response across heterogeneous data sources by
composing a graph over the existing data.
The general workflow use of LlamaIndex is as fol-
lows:
1. Load in documents, either manually or through a
data loader, using available data connectors.
2. Parse the documents into nodes.
3. Construct an index, from nodes or documents.
4. Optionally, build indices on top of other indices.
5. Query the index.
Querying an index involves three main compo-
nents:
Retriever: retrieves a set of nodes from an index,
given a query;
Response Synthesizer: takes a set of nodes and syn-
thesizes an answer, given a query; and
Query Engine: takes a query and returns a response
object.
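For unstructured documents, this workflow and these components reduce to a few lines with the 2023-era LlamaIndex API; the directory name and the question are assumptions, and an OpenAI API key is expected in the environment.

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# 1. load documents through a data connector (assumes OPENAI_API_KEY is set)
documents = SimpleDirectoryReader("./reports").load_data()

# 2-4. parse the documents into nodes and build an index over them
index = GPTVectorStoreIndex.from_documents(documents)

# 5. query the index: the query engine bundles a retriever
#    and a response synthesizer
query_engine = index.as_query_engine()
print(query_engine.query("What are the main findings of the 2022 report?"))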
LlamaIndex allows querying data for any down-
stream LLM use case, such as QA tasks, seman-
tic search, summarization, or as a component in a
chatbot. Indeed, it provides features powered by
LLMs to analyze structured data through augmented
text-to-SQL capabilities. The code snippet below
shows a simple example of using the “text-to-SQL” conversion features. The toy table in this example, city_stats, contains city/population/country information.
from sqlalchemy import create_engine
from llama_index import SQLDatabase, GPTSQLStructStoreIndex

# use SQLAlchemy to set up a db connection
engine = create_engine("<database_information>")
...

# wrap the SQLAlchemy engine with the SQLDatabase wrapper
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# the db is already populated with data;
# then, create the index from the table with a blank document list
index = GPTSQLStructStoreIndex(
    [],
    sql_database=sql_database,
    table_name="city_stats",
)

query_engine = index.as_query_engine()
response = query_engine.query("Which city has the highest population?")
print(response)
The derived SQL query can be accessed by the code line

response.extra_info["sql_query"]

which shows something like

SELECT city_name, population
FROM city_stats
ORDER BY population DESC LIMIT 1
6 CONCLUSIONS
This paper argued that LLMs can be profitably used to
construct natural language database (conversational)
interfaces. The discussion focused on three problems:
(1) How to use an LLM to create an NL database in-
terface; (2) How to fine-tune an LLM to follow in-
structions (i.e., to become a Chat Model) over a par-
ticular database; and (3) How to simplify the con-
struction of LLM-based database (conversational) in-
terfaces. Sections 3, 4, and 5 briefly addressed these
problems with the help of examples.
The paper left out a discussion on Retrieval Aug-
mented Generation (RAG), for lack of space. RAG
does not require model fine-tuning; instead, it combines a prompt engineering approach with vector embeddings that encode the data, stored in a vector store.
The take-home lessons can be summarized as fol-
lows. The OpenAI API and ChatGPT offer interest-
ing facilities to create NL database interfaces, which greatly simplify the interpretation of NL queries, the
generation of NL answers, and the control of user dia-
log. The fine-tuning of an LLM to follow instructions
generated from a data source is far more challenging.
LangChain offers a useful framework to isolate the
development of the database interface from the under-
lying LLM, and LlamaIndex provides facilities to connect an LLM to a data source.
Lastly, the leaderboards mentioned before provide
references to the latest evolutions in LLM applica-
tions related to the topics of this paper.
REFERENCES
Abdullah, M., Madain, A., and Jararweh, Y. (2022).
Chatgpt: Fundamentals, applications and social
impacts. In 2022 Ninth International Confer-
ence on Social Networks Analysis, Management
and Security (SNAMS’22), pages 1–8. doi:
10.1109/SNAMS58071.2022.10062688.
Affolter, K., Stockinger, K., and Bernstein, A. (2019). A
comparative survey of recent natural language inter-
faces for databases. The VLDB Journal, 28.
Anand, Y. et al. (2023). Gpt4all-j: An apache-2 licensed assistant-style chatbot. https://static.nomic.ai/gpt4all/2023_GPT4All-J_Technical_Report_2.pdf.
Askell, A. et al. (2021). A general language assis-
tant as a laboratory for alignment. arXiv. doi:
10.48550/arXiv.2112.00861.
Caldarini, G., Jaf, S., and McGarry, K. (2022). A literature
survey of recent advances in chatbots. Information,
13(1).
Carvalho, I. and Ivanov, S. (2023). Chatgpt for tourism:
applications, benefits, and risks. Tourism Review. doi:
10.1108/TR-02-2023-0088.
Conover, M. et al. (2023). Free dolly: Introducing
the world’s first truly open instruction-tuned llm.
https://www.databricks.com/blog/2023/04/12/dolly-
first-open-commercially-viable-instruction-tuned-
llm. online: May 22, 2023.
Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer,
L. (2023). Qlora: Efficient finetuning of quantized
llms.
Diefenbach, D., Both, A., Singh, K., and Maret, P. (2020).
Towards a question answering system over the seman-
tic web. Semant. Web, 11(3):421–439.
Fitria, T. N. (2023). Artificial intelligence (ai) technol-
ogy in openai chatgpt application: A review of chat-
gpt in writing english essay. ELT Forum: Journal
of English Language Teaching, 12(1):44–58. doi:
10.15294/elt.v12i1.64069.
Fraiwan, M. and Khasawneh, N. (2023). A review of
chatgpt applications in education, marketing, soft-
ware engineering, and healthcare: Benefits, draw-
backs, and research directions. arXiv. doi:
10.48550/arXiv.2305.00237.
Hoffmann, J. et al. (2022). Training compute-
optimal large language models. arXiv. doi:
10.48550/arXiv.2203.15556.
Hu, E. J. et al. (2021). Lora: Low-rank adapta-
tion of large language models. arXiv. doi:
10.48550/arXiv.2106.09685.
Jurafsky, D. and Martin, J. H. (2023). Speech and Language
Processing: An Introduction to Natural Language
Processing, Computational Linguistics, and Speech
Recognition. Prentice Hall. Third Edition draft.
Kemper, D. (2023). Querying complex
database schemas with gpt and langchain.
https://analytix.nl/post/querying-complex-database-
schemas-with-gpt-and-langchain. online: May 22,
2023.
Lecler, A., Duron, L., and Soyer, P. (2023). Revolutioniz-
ing radiology with gptbased models: Current appli-
cations, future possibilities and limitations of chat-
gpt. Diagnostic and Interventional Imaging. doi:
10.1016/j.diii.2023.02.003.
Liu, A., Hu, X., Wen, L., and Yu, P. S. (2023). A com-
prehensive evaluation of chatgpt’s zero-shot text-to-
sql capability.
Manning, C. D. (2022). Human Language Understanding & Reasoning. Daedalus, 151(2):127–138. doi: 10.1162/daed_a_01905.
Motger, Q., Franch, X., and Marco, J. (2022). Software-
based dialogue systems: Survey, taxonomy, and chal-
lenges. ACM Comput. Surv., 55(5).
Neelakantan, A. et al. (2022). Text and code
embeddings by contrastive pre-training. doi:
10.48550/ARXIV.2201.10005.
OpenAI (2023). Gpt-4 technical report. arXiv. doi:
10.48550/arXiv.2303.08774.
Ouyang, L. et al. (2022). Training language models to fol-
low instructions with human feedback. In Proceed-
ings of the Advances in Neural Information Process-
ing Systems 35 (NeurIPS’22). online: May 25, 2023.
Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X.,
Guestrin, C., Liang, P., and Hashimoto, T. B. (2023).
Stanford alpaca: An instruction-following llama
model. https://github.com/tatsu-lab/stanford_alpaca.
Touvron, H. et al. (2023). Llama: Open and effi-
cient foundation language models. arXiv. doi:
10.48550/arXiv.2302.13971.
Tunstall, L., von Werra, L., and Wolf, T. (2022). Natu-
ral Language Processing with Transformers, Revised
Edition. O’Reilly Media, Inc.
Vaswani, A. et al. (2017). Attention is all you need. In
Proceedings of the Advances in Neural Information
Processing Systems 30 (NIPS’17), volume 30.
Wang, F.-Y. et al. (2023). Chatgpt for computational social
systems: From conversational applications to human-
oriented operating systems. IEEE Transactions on
Computational Social Systems, 10(2):414–425. doi:
10.1109/TCSS.2023.3252679.
Wang, Y. et al. (2022). Self-instruct: Aligning
language model with self generated instructions.
doi:10.48550/arXiv.2212.10560.