Automated Test Generation Using LLM Based on BDD: A Comparative Study

Shexmo Richarlison Ribeiro dos Santos; Luiz Felipe Cirqueira dos Santos; Marcus Silva; Marcos Cesar Barbosa dos Santos; Mariano Florencio Mendonça; Marcos Venicius Santos; Marckson Fábio da Silva Santos; Alberto Luciano de Souza Bastos; Sabrina Marczak; Michel Soares; Fabio Gomes Rocha

doi:10.5220/0013683600003985

2025

Abstract

In Software Engineering, finding methods that save development time and improve delivery quality is essential. BDD (Behavior-Driven Development) aims to ensure quality by having teams write user stories and acceptance criteria in collaboration with stakeholders and then automating tests that validate those criteria for product acceptance. Without test automation, acceptance must be validated manually. To address test automation in BDD, we conducted an experiment in which standardized prompts, built from user stories and acceptance criteria written in Gherkin syntax, were used to generate tests automatically with four Large Language Models (ChatGPT, Gemini, Grok, and GitHub Copilot). The experiment compared response similarity, test coverage of the acceptance criteria, accuracy, efficiency (the time required to generate the tests), and clarity. The results showed significant differences among the LLMs' responses, even with similar prompts. Test coverage and accuracy also varied, with ChatGPT standing out on both. In terms of time efficiency, Grok was the fastest and Gemini the slowest. Regarding clarity, ChatGPT and GitHub Copilot were comparable and more effective than the other two models. Overall, the LLMs in the study can understand and generate automated tests accurately. They do not eliminate the need for human assessment, but they can support and speed up the automation process.
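The pipeline the abstract outlines (one standardized prompt built from a Gherkin user story and its acceptance criteria, then scoring each model's answer) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the prompt wording, the helper names (`build_prompt`, `scenario_titles`, `criteria_coverage`, `response_similarity`), the sample feature, the rule that a scenario counts as covered when its snake_case title appears in the generated code, and the use of `difflib.SequenceMatcher` as the similarity measure are all hypothetical.

```python
# Hypothetical sketch: none of these helper names, the prompt wording, or the
# scoring rules come from the study; they only illustrate the kind of pipeline
# the abstract describes.
import difflib
import re

# Assumed prompt template: the same fixed text is reused for every model and
# every user story, so only the Gherkin feature changes between runs.
PROMPT_TEMPLATE = (
    "Generate automated tests that cover every scenario in the Gherkin "
    "feature below. Name each test function after its scenario.\n\n{feature}"
)

# Matches 'Scenario:' and 'Scenario Outline:' lines (the 'Example:' synonym
# is ignored here for brevity).
SCENARIO_RE = re.compile(r"^[ \t]*Scenario(?: Outline)?:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def build_prompt(feature_text: str) -> str:
    """Embed a user story and its acceptance criteria in the fixed template."""
    return PROMPT_TEMPLATE.format(feature=feature_text.strip())


def scenario_titles(feature_text: str) -> list[str]:
    """Return scenario titles; each scenario is treated as one acceptance criterion."""
    return SCENARIO_RE.findall(feature_text)


def slug(title: str) -> str:
    """Normalize a title to snake_case, e.g. 'Valid login' -> 'valid_login'."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def criteria_coverage(feature_text: str, generated_code: str) -> float:
    """Share of scenarios whose snake_case title appears in the generated code.

    Naive substring heuristic (an assumption): it cannot tell whether the test
    actually checks the scenario's Then steps.
    """
    titles = scenario_titles(feature_text)
    if not titles:
        return 0.0
    code = generated_code.lower()
    return sum(slug(t) in code for t in titles) / len(titles)


def response_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two model responses."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Invented example feature: one user story with two acceptance criteria.
FEATURE = """
Feature: Login
  As a registered user
  I want to log in
  So that I can access my account

  Scenario: Valid login
    Given a registered user
    When they submit correct credentials
    Then they see the dashboard

  Scenario: Invalid password
    Given a registered user
    When they submit a wrong password
    Then they see an error message
"""

# Two made-up answers standing in for different models' output.
answer_a = "def test_valid_login():\n    assert login('ana', 'secret')\n"
answer_b = "def test_valid_login():\n    assert login('ana', 'pw') is True\n"

print(criteria_coverage(FEATURE, answer_a))  # 0.5: only one of two scenarios named
print(round(response_similarity(answer_a, answer_b), 2))
```

Keeping the template fixed isolates model behavior as the only variable. A name-based coverage check is cheap but shallow, so in practice its results would still need the kind of human review the abstract calls for.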

Paper Citation

in Harvard Style

Ribeiro dos Santos S., Cirqueira dos Santos L., Silva M., Barbosa dos Santos M., Mendonça M., Santos M., Santos M., Bastos A., Marczak S., Soares M. and Rocha F. (2025). Automated Test Generation Using LLM Based on BDD: A Comparative Study. In Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-772-6, SciTePress, pages 47-58. DOI: 10.5220/0013683600003985

in Bibtex Style

@conference{webist25,
author={Shexmo Ribeiro dos Santos and Luiz Cirqueira dos Santos and Marcus Silva and Marcos Barbosa dos Santos and Mariano Mendonça and Marcos Santos and Marckson Santos and Alberto Bastos and Sabrina Marczak and Michel Soares and Fabio Rocha},
title={Automated Test Generation Using LLM Based on BDD: A Comparative Study},
booktitle={Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2025},
pages={47-58},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013683600003985},
isbn={978-989-758-772-6},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Automated Test Generation Using LLM Based on BDD: A Comparative Study
SN - 978-989-758-772-6
AU - Ribeiro dos Santos S.
AU - Cirqueira dos Santos L.
AU - Silva M.
AU - Barbosa dos Santos M.
AU - Mendonça M.
AU - Santos M.
AU - Santos M.
AU - Bastos A.
AU - Marczak S.
AU - Soares M.
AU - Rocha F.
PY - 2025
SP - 47
EP - 58
DO - 10.5220/0013683600003985
PB - SciTePress