(modelling, prompting, coding) suiting the current sub-problem. Our approach allows developers to experiment with different prompts and LLMs to find the best solution for their specific needs. The resulting code is consistent with the static model structure and traceable to the original prompts.
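For instance, such provenance could be recorded directly in the generated code. The following Java sketch is purely illustrative: the @GeneratedFromPrompt annotation and its fields are our own hypothetical names, not part of the presented tool. It shows how a method body generated from a prompt could remain traceable both to that prompt and to the model that produced it.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Hypothetical annotation linking generated code to its originating prompt. */
@Retention(RetentionPolicy.RUNTIME)
@interface GeneratedFromPrompt {
    String prompt();  // natural-language prompt that produced the method body
    String model();   // LLM backend/model used for generation
}

class Account {
    private double balance;

    // The signature stems from the static class model; the body was generated
    // by an LLM, and the annotation records its provenance.
    @GeneratedFromPrompt(
            prompt = "Withdraw the given amount; reject negative or insufficient amounts",
            model = "qwen2.5-coder via Ollama")
    void withdraw(double amount) {
        if (amount < 0 || amount > balance) {
            throw new IllegalArgumentException("invalid amount");
        }
        balance -= amount;
    }
}
```

Because the annotation is retained at runtime, tooling could later query it, e.g. to re-run a prompt after a model change or to verify that a regenerated body still honours the modelled signature.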
Future work is motivated by the discussion. We plan to address the life-cycle management of the class model and its derived artifacts. As mentioned in the discussion, code added after the initial generation, either manually or via LLM prompting, should be propagated back into the model. This would eventually enable a round-trip engineering process in which the model is the central artifact and the code is derived from it in a preferably deterministic way, while its traceability (e.g., to its original prompt) is maintained. Furthermore, integrations with further AI backends in addition to llama.cpp and Ollama are desirable. Finally, global prompts attached to packages or classes may overcome the problem of domain non-specificity by guiding the LLM towards platform-specific target code, obviating the need for heavyweight MDD-style mechanisms such as domain-specific languages or code generation templates.
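One conceivable realization, sketched below in Java, is to treat such a global prompt as shared context that a generator prepends to every element-level prompt in the package before contacting the LLM backend. The PromptComposer class and its compose method are hypothetical illustrations under this assumption, not features of the presented approach.

```java
import java.util.Optional;

/** Illustrative sketch (not the tool's actual API): a package-level global
 *  prompt is prepended to each element-level prompt before the combined
 *  prompt is sent to the LLM backend. */
class PromptComposer {

    /** Builds the effective prompt for a single class member. */
    static String compose(Optional<String> packagePrompt, String memberPrompt) {
        return packagePrompt
                .map(p -> p + "\n\n" + memberPrompt)  // global context first
                .orElse(memberPrompt);
    }

    public static void main(String[] args) {
        // Hypothetical global prompt steering all generated code in a package
        // towards one target platform, without templates or a DSL.
        String effective = compose(
                Optional.of("Target platform: Jakarta EE; persist entities with JPA annotations."),
                "Implement Account.withdraw(double amount) with an overdraft check.");
        System.out.println(effective);
    }
}
```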