Table 2: Answer correctness score.
Functional Requirements  aya-expanse:8b  codellama:7b  gemma2:9b  llama3:8b  mistral-nemo:12b  qwen2.5-coder:7b
Application 1            0.524           0.314         0.864      0.003      0.577             0.722
Application 2            0.379           0.77          0.76       0.617      0.74              0.747
Application 3            0.656           0.647         0.448      0.655      0.732             0.624
Application 4            0.644           0.786         0.627      0.78       0.665             0.699
Application 5            0.387           0.308         0.47       0.517      0.469             0.646
Application 6            0.461           0.136         0.447      0.54       0.665             0.575
Application 7            0.465           0.271         0.406      0.397      0.366             0.625
Application 8            0.436           0.36          0.653      0.373      0.763             0.405
Application 9            0.479           0.382         0.865      0.484      0.532             0.665
Application 10           0.495           0.351         0.889      0.602      0.455             0.475
Language Models for code generation is a growing
area of research. In this research, we explored the
capability of LLMs to create these UML diagrams.
The objective was to compare a few base LLMs and
find out which one of them performed best at
generating PlantUML code for use case diagrams.
From this research, we have identified that, on
average, gemma2 outperformed the other models in
generating PlantUML code with the fewest errors.
However, from our observations, we see that even though
gemma2 has the best average score, it did not perform
the best for every input. The results produced still have
a few syntactic and semantic inaccuracies. This is
because base LLMs do not have any understanding of
the domain and will generate inconsistent results.
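The two observations above can be checked against Table 2 with a short calculation. The sketch below is illustrative only: it uses just the answer correctness scores from Table 2 (the averages discussed in the text may also draw on other metrics), and the helper names `column_means` and `row_winners` are hypothetical.

```python
from statistics import mean

# Model names as listed in the header of Table 2.
MODELS = ["aya-expanse:8b", "codellama:7b", "gemma2:9b",
          "llama3:8b", "mistral-nemo:12b", "qwen2.5-coder:7b"]

# Answer correctness scores copied from Table 2 (rows: Application 1 to 10).
SCORES = [
    [0.524, 0.314, 0.864, 0.003, 0.577, 0.722],
    [0.379, 0.77, 0.76, 0.617, 0.74, 0.747],
    [0.656, 0.647, 0.448, 0.655, 0.732, 0.624],
    [0.644, 0.786, 0.627, 0.78, 0.665, 0.699],
    [0.387, 0.308, 0.47, 0.517, 0.469, 0.646],
    [0.461, 0.136, 0.447, 0.54, 0.665, 0.575],
    [0.465, 0.271, 0.406, 0.397, 0.366, 0.625],
    [0.436, 0.36, 0.653, 0.373, 0.763, 0.405],
    [0.479, 0.382, 0.865, 0.484, 0.532, 0.665],
    [0.495, 0.351, 0.889, 0.602, 0.455, 0.475],
]


def column_means(scores):
    """Mean score of each model (one column per model)."""
    return [mean(column) for column in zip(*scores)]


def row_winners(scores, models):
    """Name of the top-scoring model for each application (row)."""
    return [models[row.index(max(row))] for row in scores]


means = dict(zip(MODELS, column_means(SCORES)))
best_on_average = max(means, key=means.get)
wins = row_winners(SCORES, MODELS).count(best_on_average)

for model, value in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:18s} {value:.4f}")
print(f"best on average: {best_on_average}, top score in {wins} of {len(SCORES)} rows")
```

On this metric alone, gemma2:9b has the highest mean (about 0.643) but the top score in only 3 of the 10 applications, which matches the remark that the best average does not imply the best result for every input.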
In order to improve the performance of these models
in this domain, extensive fine-tuning is required.
A set of diverse functional requirements can be used
for fine-tuning to create a robust model capable of
handling complex functional requirements. Retrieval-
Augmented Generation (RAG) can also be used to provide
context on UML syntax to the LLM. Future work
can use human feedback loops to iteratively refine the
results and improve the semantic quality of the generated
results. Using complex inputs for testing, including
ambiguous and incomplete inputs, can help assess
the model’s adaptability to different patterns of inputs.
This refined model can reduce the human effort
required in the generation of UML diagrams. The final
version of this improved model can be integrated into
software development pipelines to save the time and
effort of software analysts and architects.
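As a rough illustration of the retrieval-augmented idea mentioned above, the sketch below picks the PlantUML syntax notes most relevant to a requirement and prepends them to the prompt. It is not the method used in this study: the note list `SYNTAX_NOTES`, the functions `retrieve` and `build_prompt`, and the naive word-overlap ranking are all assumptions made for the example, and no LLM is called.

```python
import re

# Hypothetical mini knowledge base of PlantUML use case syntax notes.
SYNTAX_NOTES = [
    "Every PlantUML diagram starts with @startuml and ends with @enduml.",
    "Declare an actor with the keyword actor, for example: actor Customer",
    "Declare a use case in parentheses, for example: (Place order)",
    "Link an actor to a use case with an arrow, for example: Customer --> (Place order)",
    "Group use cases inside a system boundary with: rectangle Shop { ... }",
]


def tokens(text):
    """Lowercase alphabetic words of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, notes, k=2):
    """Return the k notes sharing the most words with the query.

    Naive word overlap stands in for a real retriever (assumption);
    common words such as 'the' also count toward the overlap.
    """
    query_tokens = tokens(query)
    ranked = sorted(notes, key=lambda n: len(query_tokens & tokens(n)), reverse=True)
    return ranked[:k]


def build_prompt(requirement, notes, k=2):
    """Assemble a prompt with the retrieved syntax notes as context."""
    context = "\n".join(f"- {note}" for note in retrieve(requirement, notes, k))
    return (
        "PlantUML syntax notes:\n"
        f"{context}\n\n"
        "Functional requirement:\n"
        f"{requirement}\n\n"
        "Write PlantUML code for the use case diagram."
    )


requirement = "The customer is an actor who can place an order"
print(build_prompt(requirement, SYNTAX_NOTES))
```

A production setup would likely replace the word-overlap ranking with embedding-based search over the full PlantUML documentation; the prompt assembly step would stay much the same.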