
these tests can diminish their effectiveness, particularly in industrial environments, while having a lesser impact on open-source code bases. Overall, these results highlight the value of test-driven development and context-rich prompts in regenerating accurate and maintainable code, especially within industrial software evolution workflows.