(modelling, prompting, coding) suiting the current sub-problem. Our approach allows developers to experiment with different prompts and LLMs to find the best solution for their specific needs. The resulting code is consistent with the static model structure and traceable to the original prompts.
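For instance, such provenance could be recorded directly in the generated code. The following Java sketch is purely illustrative: the @GeneratedFromPrompt annotation and its fields are our own hypothetical names, not part of the presented tool. It shows how a method body generated from a prompt could remain traceable both to that prompt and to the model that produced it.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Hypothetical annotation linking generated code to its originating prompt. */
@Retention(RetentionPolicy.RUNTIME)
@interface GeneratedFromPrompt {
    String prompt();  // natural-language prompt that produced the method body
    String model();   // LLM backend/model used for generation
}

class Account {
    private double balance;

    // The signature stems from the static class model; the body was generated
    // by an LLM, and the annotation records its provenance.
    @GeneratedFromPrompt(
            prompt = "Withdraw the given amount; reject negative or insufficient amounts",
            model = "qwen2.5-coder via Ollama")
    void withdraw(double amount) {
        if (amount < 0 || amount > balance) {
            throw new IllegalArgumentException("invalid amount");
        }
        balance -= amount;
    }
}
```

Because the annotation is retained at runtime, tooling could later query it, e.g. to re-run a prompt after a model change or to verify that a regenerated body still honours the modelled signature.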
Future work is motivated by the discussion. We plan to address the life-cycle management of the class model and its derived artifacts. As mentioned in the discussion, code added after the initial generation, either manually or via LLM prompting, should be propagated back into the model. This would eventually enable a round-trip engineering process in which the model is the central artifact and the code is derived from it in a preferably deterministic way, while its traceability (e.g., to its original prompt) is maintained. Furthermore, integrations with further AI backends in addition to llama.cpp and Ollama are desirable. Finally, global prompts attached to packages or classes may overcome the problem of domain non-specificity by guiding the LLM towards platform-specific target code, obviating the need for heavyweight MDD-style mechanisms such as domain-specific languages or code generation templates.
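One conceivable realization, sketched below in Java, is to treat such a global prompt as shared context that a generator prepends to every element-level prompt in the package before contacting the LLM backend. The PromptComposer class and its compose method are hypothetical illustrations under this assumption, not features of the presented approach.

```java
import java.util.Optional;

/** Illustrative sketch (not the tool's actual API): a package-level global
 *  prompt is prepended to each element-level prompt before the combined
 *  prompt is sent to the LLM backend. */
class PromptComposer {

    /** Builds the effective prompt for a single class member. */
    static String compose(Optional<String> packagePrompt, String memberPrompt) {
        return packagePrompt
                .map(p -> p + "\n\n" + memberPrompt)  // global context first
                .orElse(memberPrompt);
    }

    public static void main(String[] args) {
        // Hypothetical global prompt steering all generated code in a package
        // towards one target platform, without templates or a DSL.
        String effective = compose(
                Optional.of("Target platform: Jakarta EE; persist entities with JPA annotations."),
                "Implement Account.withdraw(double amount) with an overdraft check.");
        System.out.println(effective);
    }
}
```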