NL-Based Database Administration for Handling Heterogeneous Datasets Using Finetuned LLM

Pradnya Sawant, Kavita Sonawane

2025

Abstract

Translating natural language (NL) questions into structured query language (SQL) queries is becoming increasingly important for making databases easier to use and manage. Various large language models (LLMs) have been used for this translation in recent years. These models are mostly trained and evaluated on datasets covering a few types of data manipulation language (DML) queries, such as projection, selection, aggregate functions, and joins. However, these datasets do not contain the queries required for database administrator (DBA) operations, such as creating and modifying database schemas or managing user permissions. This paper presents an approach to help DBAs and end users interact with databases more intuitively by generating SQL queries from natural language inputs. As no such dataset is publicly available, we have created a specialized dataset called DBASQL. It covers common DBA operations through natural language questions related to data definition language (DDL), DML, and data control language (DCL), such as creating tables, views, or indexes; inserting values; updating data types or values; renaming tables or columns; and granting or revoking user permissions, each paired with its corresponding SQL query. For experimentation, we have finetuned Text-to-Text Transfer Transformer (T5) Large on our customized DBASQL dataset, aiming to improve the accuracy of these translations. Our evaluation shows that this approach effectively translates NL to SQL for DBA operations, making it easier to handle DDL, DML, and DCL database operations without requiring extensive SQL knowledge. This research highlights the potential of NLP models to improve the efficiency of natural language to SQL translation by enabling smarter database interfaces for DBAs as well.
Also, the proposed DBASQL dataset can be integrated with any heterogeneous datasets, such as single-domain and cross-domain datasets, for the translation of natural language to SQL queries, thereby covering a broader range of SQL queries that can be used by both end users and database administrators.
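
To make the NL/SQL pairing described in the abstract concrete, the following is a minimal sketch in Python. The example pairs, the `sql_category` helper, the `to_t5_example` helper, and the task prefix string are all hypothetical and chosen for illustration; they are not taken from the DBASQL dataset or from the authors' training setup. SELECT is grouped under DML here, following the abstract's description of projection and selection as DML queries.

```python
# Hypothetical NL/SQL pairs for illustration only; NOT taken from DBASQL.
EXAMPLE_PAIRS = [
    ("Create a table named students with an id and a name.",
     "CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(50));"),
    ("Rename the table students to learners.",
     "ALTER TABLE students RENAME TO learners;"),
    ("Add a student with id 1 named Asha.",
     "INSERT INTO learners (id, name) VALUES (1, 'Asha');"),
    ("Allow user alice to read the learners table.",
     "GRANT SELECT ON learners TO alice;"),
    ("Remove read access on learners from alice.",
     "REVOKE SELECT ON learners FROM alice;"),
]

# Leading SQL keyword -> statement category (assumed grouping).
CATEGORY_BY_KEYWORD = {
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL", "RENAME": "DDL",
    "SELECT": "DML", "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML",
    "GRANT": "DCL", "REVOKE": "DCL",
}


def sql_category(sql: str) -> str:
    """Return 'DDL', 'DML', or 'DCL' based on the statement's leading keyword."""
    tokens = sql.strip().split(None, 1)
    if not tokens:
        raise ValueError("empty SQL statement")
    keyword = tokens[0].upper()
    try:
        return CATEGORY_BY_KEYWORD[keyword]
    except KeyError:
        raise ValueError(f"unrecognized leading keyword: {keyword}") from None


def to_t5_example(question: str, sql: str) -> dict:
    """Format one pair as a text-to-text example; the prefix is an assumption."""
    return {"input": "translate English to SQL: " + question, "target": sql}


if __name__ == "__main__":
    for question, sql in EXAMPLE_PAIRS:
        print(sql_category(sql), to_t5_example(question, sql))
```

A keyword lookup like this is only a rough way to check how a set of pairs is distributed across DDL, DML, and DCL; it does not parse SQL and would need extending for dialect-specific statements.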

Paper Citation


in Harvard Style

Sawant P. and Sonawane K. (2025). NL-Based Database Administration for Handling Heterogeneous Datasets Using Finetuned LLM. In Proceedings of the 3rd International Conference on Futuristic Technology - Volume 3: INCOFT; ISBN 978-989-758-763-4, SciTePress, pages 56-64. DOI: 10.5220/0013608700004664


in Bibtex Style

@conference{incoft25,
author={Pradnya Sawant and Kavita Sonawane},
title={NL-Based Database Administration for Handling Heterogeneous Datasets Using Finetuned LLM},
booktitle={Proceedings of the 3rd International Conference on Futuristic Technology - Volume 3: INCOFT},
year={2025},
pages={56-64},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013608700004664},
isbn={978-989-758-763-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Futuristic Technology - Volume 3: INCOFT
TI - NL-Based Database Administration for Handling Heterogeneous Datasets Using Finetuned LLM
SN - 978-989-758-763-4
AU - Sawant P.
AU - Sonawane K.
PY - 2025
SP - 56
EP - 64
DO - 10.5220/0013608700004664
PB - SciTePress