Authors:
Michal Töpfer
;
Tomáš Bureš
;
František Plášil
and
Petr Hnětynka
Affiliation:
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic
Keyword(s):
Software Architecture, Large Language Model, LLM, Workflow, Workflow Architecture.
Abstract:
In this paper, we focus on how reliably a Large Language Model (LLM) can interpret a software architecture, namely a workflow architecture (WA). Even though our initial experiments show that an LLM can answer specific questions about a WA, it is unclear how correct its answers are. To this end, we propose a methodology to assess whether an LLM can correctly interpret a WA specification. Based on the conjecture that an LLM needs to correctly answer low-abstraction-level questions before it can properly answer questions at a higher abstraction level, we define a set of test patterns, each providing a template for low-abstraction-level questions, together with a metric for evaluating the correctness of the LLM's answers. We posit that this metric will allow us not only to establish which LLM works best with WAs, but also to determine which concrete syntax and concepts of WAs are suitable for strengthening the correctness of an LLM's interpretation of WA specifications. We demonstrate the methodology on the workflow specification language developed for a currently running Horizon Europe project.