Formal Description and Verification of a Text-based Model Differencing
and Merging Method
Ferenc A. Somogyi and Mark Asztalos
Department of Automation and Applied Informatics,
Budapest University of Technology and Economics,
1111 Budapest, Hungary
Keywords:
Version Control, Model Differencing and Merging, Text-Based Modeling, Algorithm, Verification.
Abstract:
Version control is an integral part of teamwork in software development. Differencing and merging key artifacts (e.g. source code) is a core feature of version control systems. The concept of version control can also be applied to model-driven methodologies. The models are usually differenced and merged in their graph-based form. However, if supported, we can also use the textual representation of the models during this process. Text-based model differencing and merging methods have several useful applications, like supporting the persistence of the model, or providing a fallback plan should the differencing algorithm fail. Using the textual notation to display and edit models is relatively rare, as the visual (graph-based) representation of the model is more common. However, many believe that using both representations would be the ideal solution. In this paper, we present the formal description of a text-based model differencing and merging method from our previous work. We also verify our algorithm based on this formal description. The focus of the verification is the soundness and completeness of the method. The long-term goal of our research is to develop a modeling environment-independent algorithm. This could be used in version control systems that support textual representations.
1 INTRODUCTION
Version control is an integral part of teamwork in tra-
ditional software development. Version control sys-
tems (Spinellis, 2005) are crucial in keeping team-
work organized. The concept of version control can
also be applied to model-based development tech-
niques in order to achieve greater efficiency during
teamwork. This is a young research area. Using
version control greatly benefits model evolution and
management (Paige et al., 2016), as we can keep bet-
ter track of evolving (changing) models during devel-
opment.
Differencing and merging different versions of the
same code is a key feature in version control systems.
Model Differencing and Merging (MDM) is differ-
ent from source code differencing and merging. In
the former case, the artifacts are graph-based mod-
els, while source code is usually text-based. This
means that different approaches are needed for the
two cases. It is worth noting that our focus is on
Domain-Specific Modeling (DSM) (Kelly and Tolva-
nen, 2008), where the models are almost always in
a graph-based form, though there might exist other
areas where they are not. In DSM, models are usu-
ally processed by model transformations (Sendall and
Kozaczynski, 2003), with the next step usually be-
ing code generation. These practices already have a
well-established literature and practical applications
(Bergmann et al., 2015). Model differencing and
merging is a research field with academic results
(Lin et al., 2004) (Altmanninger et al., 2009) and
some industrial applications (Brun and Pierantonio,
2008), but it is mostly considered a young research
field. Text-based MDM methods are even rarer in ex-
isting research.
Table 1: Graphical and textual approaches in DSM.
Graphical approach | Textual approach
Broad view | Detailed view
Easier to read | Easier to write
Simulation / animation | Handling larger models
Domain experts prefer it | Developers prefer it
- | Serialization support
The most common approach used to display and edit models is the graphical (visual) approach. There are some approaches that connect graphical and textual languages (Eysholdt and Behrens, 2010), but these are in the minority. Displaying and editing models in a textual form can have many benefits, both inside modeling-related applications and outside them (Petre, 1995) (Moher et al., 1993). Table 1 summarizes the main advantages of these approaches. Many argue that using the two together is the ideal solution, because we can keep the advantages of both (Grönniger et al., 2007) (Pérez Andrés et al., 2008). This raises the problem of keeping the two notations synchronized (van Rest et al., 2013). Text-based MDM methods can help solve this problem, as we can use them while reloading a previously edited textual representation to identify the changes made since the last edit.
Somogyi, F. and Asztalos, M.
Formal Description and Verification of a Text-based Model Differencing and Merging Method.
DOI: 10.5220/0006728006570667
In Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2018), pages 657-667
ISBN: 978-989-758-283-7
Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
In previous work (Somogyi, 2016) (Somogyi and
Asztalos, 2016), we presented our own text-based
MDM method. The algorithm was developed for a
specific modeling environment (VMTS) and a spe-
cific language (VMDL) used to describe the textual
representations. Our goal was to develop a method
that can be used to merge arbitrary VMTS mod-
els based on their VMDL representations. VMTS
is a modeling tool used for educational and indus-
trial purposes. As an industrial example, VMTS was
used in the graphical programming of programmable
logic controllers (PLC) under the IEC standard. Our
method can be used for both industrial and educational purposes.
The main application of our approach is version
control. We can also use it to aid synchronization by
recognizing the changes that occurred between two
editing sessions of a textual representation. This is
useful to keep the textual representation synchronized
with the stored model when the model is changed by
other means (graphical edit, direct edit, etc.). It can
also be used alongside real-time synchronization as an
extra layer. Our motivations for using and researching
text-based MDM methods are further detailed in Section 2.2.
The goals of this paper are to formally present
our approach and to provide its verification. Dur-
ing the verification, we examine the soundness and
completeness of the algorithm. The meaning of these
concepts varies with the phases of our approach.
The paper is structured as follows. In Section 2,
we briefly introduce the modeling environment our
method was developed for, and present the language
that describes the textual representations. We also talk
about our motivations behind researching text-based
MDM methods. Section 3 is the main contribution of
the paper. It contains the formal presentation and the
verification of our approach. The analysis is divided
by the three phases of the algorithm. Finally, Section
4 concludes the paper and contains plans for future
work.
2 BACKGROUND
In this section, we introduce the modeling environ-
ment used by our algorithm. We also present the
language responsible for describing the textual rep-
resentation of the models. Next, we briefly (and in-
formally) introduce our text-based model differenc-
ing and merging (MDM) method that we published
in previous work. We also talk about our motivations
behind researching text-based MDM methods.
2.1 VMTS and VMDL
The Visual Modeling and Transformation System
(VMTS, 2003) (Levendovszky et al., 2005) is a graph-
based, domain-specific (meta)modeling and model
processing framework. The system provides a graph-
ical interface for defining, customizing, and utiliz-
ing languages. VMTS supports N-level meta instan-
tiation instead of the often-used meta levels of the
MOF specification (Meta Object Facility, 2003). The
entities (nodes) and relationships (edges) in VMTS
are identified by a globally unique identifier (GUID).
Both the nodes and the edges can have typed attributes. The nodes can contain other nodes. The edges belong to one of the following categories: association, composition, or inheritance. The framework supports many common modeling concepts, like multiplicity, cardinality, etc.
Figure 1: A VMTS model and its VMDL representation.
AMARETTO 2018 - Special Session on domAin specific Model-based AppRoaches to vErificaTion and validaTiOn
The Visual Model Definition Language (VMDL)
is a language that describes VMTS models in a tex-
tual form. The models can be edited in this textual
form. We achieve this by using a formal grammar to
describe the language. VMDL uses a grammar imple-
mented in ANTLR (Parr, 2013). When the model is
updated via the text, an abstract syntax tree (AST) is
parsed (Aho et al., 2005) before updating the model.
The AST contains more semantic information than
the raw text and we can update the model based on
this information. We can also keep the non-semantic
information (comments, white spaces, etc.) in the text
separate from the semantic information. Figure 1 il-
lustrates a sample VMTS model and its textual rep-
resentation in VMDL. The example is a simple meta-
model for a library that contains books and authors.
The books have a title, and the authors have a name
attribute. There is a many-to-many relationship be-
tween the books and authors.
2.2 Our Text-based MDM Approach
Our text-based MDM method is capable of differenc-
ing and merging two different versions of a VMTS
model described by the VMDL language. The algo-
rithm performs the differencing and merging based on
the trees parsed from the raw text. It also uses the raw
texts during the process. The algorithm consists of
three main phases:
AST matching. The algorithm matches every
subtree pair in the trees parsed from the two ver-
sions of the model.
Conflict detection. The algorithm detects ev-
ery difference (conflict) between the matched sub-
trees. Unmatched trees are also a source of con-
flict.
Merging. The algorithm merges the two versions
into a merged model based on the trees, the raw
texts, and the discovered conflicts. The end result
must always be a syntactically and semantically
correct model.
Figure 2 and Figure 3 illustrate the difference between a traditional text differencing and merging algorithm and our own approach. The figures contain two versions of the previously described book metamodel in VMTS. The models are represented in their textual form. There are two differences between the two versions: 1) the multiplicity of the Title attribute in the BookMeta node is different, and 2) the order of the model elements is different. Figure 2 shows the result of the differencing process in the case of a traditional text differencing and merging approach (KDiff3, 2003). The algorithm could not recognize the movement of the BookAuthorRelationshipInstance edge, since it only works on the level of the raw text. Instead, it recognized the other two model elements (BookMeta and AuthorMeta) as differences. Moreover, since the position of BookMeta changed in the text, it could not recognize the difference in the Title attribute either. In short, the algorithm could not handle the differences on the level of the model, so the result was not accurate. On the other hand, our method recognizes these differences correctly. The result is illustrated in Figure 3. The identifiers of the elements are omitted from the figure, as they are not relevant here. Both the movement of the edge and the changed multiplicity of the Title attribute are easily recognizable based on the trees and the raw text. They can also be traced back to the respective model elements in the model.
Figure 2: Traditional text-based differencing.
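The contrast can be reproduced in miniature with Python's standard difflib module. In the sketch below, the one-line-per-element text format, the identifiers e1 to e3, and the helper names are hypothetical stand-ins that only loosely mimic Figures 2 and 3; they are not VMDL syntax and not our implementation. A plain line diff reports removed and added lines, while comparing elements by identifier isolates the changed attribute and the moved edge.

```python
import difflib
from typing import Dict, List, Tuple

# Two hypothetical versions: one element per line, "<id> <rest of the element>".
V1 = ["e1 BookMeta Title[1]", "e2 AuthorMeta Name[1]", "e3 BookAuthorRelationshipInstance"]
V2 = ["e3 BookAuthorRelationshipInstance", "e1 BookMeta Title[0..1]", "e2 AuthorMeta Name[1]"]


def text_diff_summary(old: List[str], new: List[str]) -> Tuple[int, int]:
    """Count the lines a plain line-based diff reports as removed and added."""
    removed = added = 0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed += i2 - i1
        if tag in ("insert", "replace"):
            added += j2 - j1
    return removed, added


def by_id(lines: List[str]) -> Dict[str, str]:
    """Index lines by their leading identifier (dicts keep insertion order)."""
    return {line.split(maxsplit=1)[0]: line for line in lines}


def element_diff(old: List[str], new: List[str]) -> Tuple[List[str], List[str]]:
    """Compare elements by identifier: return (changed ids, moved ids)."""
    a, b = by_id(old), by_id(new)
    order_a = [k for k in a if k in b]
    order_b = [k for k in b if k in a]
    changed = sorted(k for k in order_a if a[k] != b[k])
    # Ids outside the in-order matching blocks of both id sequences count as moved.
    matcher = difflib.SequenceMatcher(a=order_a, b=order_b, autojunk=False)
    kept = {order_a[m.a + n] for m in matcher.get_matching_blocks() for n in range(m.size)}
    moved = sorted(k for k in order_a if k not in kept)
    return changed, moved


print(text_diff_summary(V1, V2))  # (2, 2): two lines "removed", two "added"
print(element_diff(V1, V2))       # (['e1'], ['e3'])
```

SequenceMatcher looks for long contiguous matching blocks rather than a guaranteed minimal edit script, which is adequate for this illustration but is a heuristic choice of the sketch.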
2.2.1 Motivations Behind Text-based MDM
The main use cases for text-based MDM methods (in-
cluding our approach) are as follows:
When using a traditional text differencing tool
(e.g. KDiff3), we cannot recognize the conflicts on
the semantic level of the model. We can recog-
nize the changes in the text, but not on the level of
the model elements. A common example of this
problem is the recognition of moved elements in
the text. We have seen an example of this in Fig-
ure 2.
Formal Description and Verification of a Text-based Model Differencing and Merging Method
Figure 3: Text-based AST differencing.
By describing our models in a textual form, we
can use this form to support serialization instead
of a standard XML-like format like XMI (XML
Metadata Interchange, 2015). The main advan-
tage of this is the easier readability of the text,
especially during version control. Using a text-
based MDM method further supports this process.
It is beneficial to preserve the non-semantic information (e.g. comments, white space) in the text after reloading the model. Text-based MDM methods support this.
Text-based MDM methods can help synchroniza-
tion by recognizing changes that occurred by
other means (direct edit, graphical edit, etc.) be-
tween two editing sessions of a textual represen-
tation.
If a text-based MDM method cannot find every
difference between the two versions based on the
trees, we can always rely on simple text-based
differencing. The differences might not be ac-
curately recognized, but the end user is always
informed of there being a difference. This is an
important advantage, as this makes text-based ap-
proaches less error-prone compared to structure-
based approaches. Structure-based approaches
usually do not have a fail-safe like this.
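The fail-safe described in the last point can be sketched in Python as follows. The structural callback and its toy "key: value" format are hypothetical; only difflib.unified_diff is a real standard-library call. If the structure-aware differencer fails (here, on a parse error), the user still receives a plain text diff, so the difference is not silently lost.

```python
import difflib
from typing import Callable, Dict, List

Differ = Callable[[str, str], List[str]]


def toy_structural(old: str, new: str) -> List[str]:
    """Hypothetical structure-aware differ; every line must look like 'key: value'."""
    def parse(text: str) -> Dict[str, str]:
        entries = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"cannot parse line {line!r}")
            entries[key.strip()] = value.strip()
        return entries

    a, b = parse(old), parse(new)
    changed = [f"changed {k}" for k in sorted(a.keys() & b.keys()) if a[k] != b[k]]
    only_one = [f"only in one version: {k}" for k in sorted(a.keys() ^ b.keys())]
    return changed + only_one


def differences(old: str, new: str, structural: Differ) -> List[str]:
    """Try the structural differ first; fall back to a plain line diff if it fails."""
    try:
        return structural(old, new)
    except ValueError:  # assumption: parse failures surface as ValueError
        return list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))


old = "Title: string\nName: string"
print(differences(old, "Name: string\nTitle: text", toy_structural))  # ['changed Title']
for line in differences(old, "Name: string\nTitle text", toy_structural):
    print(line)  # unified diff lines, since "Title text" cannot be parsed
```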
We also have our own personal motivations behind
researching text-based MDM methods:
There are few existing text-based MDM methods
(van Rozen and van der Storm, 2015). We are in-
terested in seeing how they compare to structure-
based MDM methods according to different as-
pects.
Our long-term goal is to develop a text-based
MDM method that is independent from VMTS
and VMDL, meaning that it can be used with
other modeling environments and languages.
Parts of the algorithm are already developed with
this goal in mind.
2.2.2 Comparison with Existing Approaches
Our method uses the abstract syntax trees parsed from
the textual representation during the differencing and
merging process. This approach is similar to existing
AST-based differencing tools, like GumTree (Falleri
et al., 2014) or ChangeDistiller (Gall et al., 2009).
However, our method is based on a different con-
cept than most AST differencing approaches. Most
of the time, these tools focus on source code differ-
encing. They usually focus on a specific program-
ming language, like Java in the case of GumTree and
ChangeDistiller. In contrast, our approach is tailored
for modeling, more specifically, the textual represen-
tations of graph-based models. Among other smaller
differences, this is most apparent in two main parts of
the algorithm:
The AST matching phase uses a matching oper-
ation that is based on the parser of the language
used to describe the textual representations. This
makes the approach more customizable, as it can
be tailored to multiple modeling languages in the
future.
At the end of the merging process, our approach
checks if the merged model is both syntactically
and semantically correct. This is also done with
the help of the parser, as we have to build the
model from the text to perform this check.
While it might be possible to apply these AST dif-
ferencing tools to our problem, it is usually not fea-
sible. For example, our accuracy (ratio of correctly
identified conflicts) would suffer, since we are not
using an approach tailored to modeling. Checking
the correctness of the model would also be difficult
without using the parser to build the model from the
text. Modeling-focused approaches like our method
are better suited for model differencing and merging
problems. In theory, our text-based MDM method can also support other modeling languages besides VMTS and VMDL. In addition, our approach supports the correctness check of the model at the end of the merge
phase. This is an important constraint, as most mod-
eling environments do not support saving incorrect
models (Steinberg et al., 2008) (Levendovszky et al.,
2005) during the editing process. Thus, in our opin-
ion, applying this constraint to model-based version
control is recommended.
3 VERIFICATION OF THE
ALGORITHM
This section is the main contribution of the paper.
It contains the formal description and verification of
our text-based MDM method. The concept of our
approach was presented in previous work (Somogyi,
2016) (Somogyi and Asztalos, 2016). In this section,
we formally present our method using pseudo code,
and verify it according to different aspects. The ver-
ification is divided by the three main phases of the
algorithm. We examine the soundness and complete-
ness of the different phases of the algorithm. In the
proofs, we use the notations of the pseudo code.
3.1 AST Matching Phase
The inputs of this phase are the two abstract syntax trees (AST) parsed from the textual representations. The output is a list of matched pairs and unmatched subtrees. A subtree is a node in an AST together with its descendants; a subtree is itself considered an AST, as it can also have children. During the AST matching phase, the algorithm tries to match every subtree in the two abstract syntax trees.
It first tries to pair every subtree on the same level with
each other. We use this heuristic, because in practice,
most matches will be found on the same level (nodes,
attributes, etc.) of the trees. Moreover, subtrees being
moved on the same level (i.e. change of order in the
text) are common when we edit the textual represen-
tations of models.
Input: AST_1, AST_2
Output: MP, U_1 ∪ U_2
if ¬IS_MATCH(AST_1, AST_2) then
    return ∅;
end
C_1 ← CHILDREN(AST_1);
C_2 ← CHILDREN(AST_2);
Tried ← ∅;
(MP, Tried) ← MATCH_ASTS(C_1, C_2, Tried);
U_1 ← UNMATCHED(MP, AST_1);
U_2 ← UNMATCHED(MP, AST_2);
for i ∈ U_1 do
    for j ∈ U_2 do
        if (i, j) ∉ Tried then
            (MP, Tried) ← MATCH_ASTS(i, j, Tried);
        end
    end
end
Algorithm 1: AST matching - main algorithm.
Input: C_1, C_2, Tried
Output: Pairs, Tried
for i ∈ C_1 do
    for j ∈ C_2 do
        if (i, j) ∉ Tried then
            Tried.ADD((i, j));
            if IS_MATCH(i, j) then
                Pairs.ADD((i, j));
                c_i ← CHILDREN(i);
                c_j ← CHILDREN(j);
                Pairs.ADD(MATCH_ASTS(c_i, c_j, Tried));
            end
        end
    end
end
Algorithm 2: The MATCH_ASTS operation.
Algorithm 1 illustrates the main algorithm of the AST matching phase. First, we check whether the roots of the two trees can be matched using the IS_MATCH operation. This is a configurable user function that determines whether two subtrees (in this case, the roots) represent the same element. This functionality is provided by the parser of the language (i.e. VMDL). The operation currently supports VMDL and VMTS, but in theory, it can be extended to support other modeling languages. After checking the roots, we call the MATCH_ASTS subroutine, which is responsible for matching the children of two subtrees, followed by their children, and so on. It always tries to match elements on the same level. The CHILDREN operation returns the children of a subtree; in this case, we use it to get the children of the root trees. We also store the subtree pairs that we have previously tried in the matching process (Tried), in order to avoid checking them twice. After the subroutine returns, the algorithm gets every unmatched tree and tries to match them with each other, using the subroutine again. This step is necessary to find matches that are not on the same level of the trees. The UNMATCHED operation returns every unmatched tree in an AST. In practice, a common way to do this is by labeling the unmatched trees.
Algorithm 2 depicts the subroutine used by the main algorithm. The inputs of the subroutine are two subtree lists (C_1, C_2) that represent trees on the same level, and the list of already tried pairs. The outputs are the matched pairs and the updated list of tried pairs. The subroutine tries to match a pair, following it up by doing the same with their children. It always checks subtrees on the same level to see whether they match. The matching is done using the aforementioned IS_MATCH operation. We also avoid matching a pair that we have tried before (Tried).
The basic algorithm for this phase would compare
every subtree with every other subtree in the other
AST. The difference between the basic algorithm and
our approach is the same-level heuristic mentioned
above. This greatly benefits the performance of the
algorithm when it comes to VMDL. We believe that
this is also true for most formal languages. As we
have mentioned before, the reason for this is that in
practice, nodes, attributes, etc. tend to be on the same
level in the abstract syntax trees. Thus, in practice, us-
ing the heuristic usually results in better performance.
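A minimal Python sketch of Algorithms 1 and 2 follows. It assumes a toy AST (Node) and a hypothetical is_match that compares a label and an identifier, standing in for the VMDL parser's IS_MATCH and the GUIDs; it is an illustration, not our implementation. Unlike the pseudocode, the sketch also records the matched root pair, so that UNMATCHED does not report the roots. In the example, the Title attribute moves to a different level, so only the fallback pass over unmatched subtrees can pair it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass(eq=False)  # identity-based equality and hashing: each node is a distinct subtree
class Node:
    kind: str                 # hypothetical label, e.g. "node" or "attribute"
    key: str                  # stand-in for the GUID that IS_MATCH can rely on
    children: List["Node"] = field(default_factory=list)


Pair = Tuple[Node, Node]
Matcher = Callable[[Node, Node], bool]


def is_match(t1: Node, t2: Node) -> bool:
    """Hypothetical IS_MATCH: same label and same identifier."""
    return t1.kind == t2.kind and t1.key == t2.key


def match_asts(c1: List[Node], c2: List[Node], tried: Set[Pair],
               pairs: List[Pair], match: Matcher) -> None:
    """MATCH_ASTS (Algorithm 2): try same-level pairs, then recurse into their children."""
    for i in c1:
        for j in c2:
            if (i, j) in tried:
                continue
            tried.add((i, j))
            if match(i, j):
                pairs.append((i, j))
                match_asts(i.children, j.children, tried, pairs, match)


def subtrees(root: Node) -> List[Node]:
    """Every subtree of an AST in pre-order, root included."""
    result = [root]
    for child in root.children:
        result.extend(subtrees(child))
    return result


def unmatched(pairs: List[Pair], ast: Node, side: int) -> List[Node]:
    """UNMATCHED: subtrees of `ast` that appear on no pair's `side` (0 or 1)."""
    matched = {pair[side] for pair in pairs}
    return [t for t in subtrees(ast) if t not in matched]


def match_trees(ast1: Node, ast2: Node, match: Matcher = is_match):
    """Sketch of Algorithm 1: a same-level pass, then a pass over the leftovers."""
    if not match(ast1, ast2):
        return [], subtrees(ast1), subtrees(ast2)
    pairs: List[Pair] = [(ast1, ast2)]  # assumption: the root pair is recorded as well
    tried: Set[Pair] = {(ast1, ast2)}
    match_asts(ast1.children, ast2.children, tried, pairs, match)
    u1, u2 = unmatched(pairs, ast1, 0), unmatched(pairs, ast2, 1)
    for i in u1:                      # cross-level fallback over unmatched subtrees
        for j in u2:
            if (i, j) not in tried:
                match_asts([i], [j], tried, pairs, match)
    return pairs, unmatched(pairs, ast1, 0), unmatched(pairs, ast2, 1)


# Title moves from under Book (level 2) to directly under the model (level 1);
# Publisher exists only in the second version.
v1 = Node("model", "m", [Node("node", "Book", [Node("attribute", "Title")]),
                         Node("node", "Author")])
v2 = Node("model", "m", [Node("node", "Author"), Node("node", "Book"),
                         Node("attribute", "Title"), Node("node", "Publisher")])
pairs, u1, u2 = match_trees(v1, v2)
print(len(pairs), [t.key for t in u1], [t.key for t in u2])  # 4 [] ['Publisher']
```

In this sketch the same-level pass handles most pairs cheaply, and the quadratic fallback only runs over the leftover subtrees, which mirrors the performance argument above.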
3.1.1 Soundness
We consider that the AST matching phase of a text-
based MDM algorithm is sound, if it does not contain
any incorrectly matched pairs on its output. In this
subsection, we prove that the AST matching phase of
our algorithm is sound, assuming some limitations.
Theorem 1. The AST matching phase of our MDM approach is sound, assuming the matching function IS_MATCH always returns the correct result.
Proof. The algorithm collects the matched pairs in variable MP. Pairs are only ever added to MP within the MATCH_ASTS subroutine: Pairs.ADD((i, j)). This addition is always preceded by a successful IS_MATCH(i, j) check. Since we assumed that the IS_MATCH operation always returns the correct value, the algorithm can never match incorrect pairs; thus, it is sound.
Remark. We assume that the IS_MATCH operation always returns the correct result, because it is a configurable part of our algorithm. In theory, it means
that it can be extended to work with an arbitrary
modeling language. Therefore, we cannot possibly
prove that it is correct in every case. What we could
prove here is that the IS_MATCH used for VMTS and
VMDL is correct. However, since this would require
going into too much technical detail, we choose to
omit it here. It is also relatively easy to prove this,
as most subtrees in VMDL have a unique identifier
(GUID) that we can use during the matching. This
remark covers some of the following proofs as well.
3.1.2 Completeness
We consider that the AST matching phase of a text-
based MDM algorithm is complete, if every correctly
matched pair appears on its output. In this subsection,
we prove that the AST matching phase of our algo-
rithm is complete, assuming some limitations.
Theorem 2. The AST matching phase of our MDM approach is complete, assuming the matching function IS_MATCH always returns the correct result.
Proof. Let $T_1 \in AST_1$ and $T_2 \in AST_2$ be two subtrees in the two trees. Let us assume that $T_1$ and $T_2$ form a correct pair $(T_1, T_2)$ that must be in MP at the end of the phase. $T_1$ and $T_2$ can either be on the same level or on different levels of the trees. In the first case, the MATCH_ASTS subroutine is going to find the match when it is first called from the main algorithm, since we assumed that the IS_MATCH operation always returns the correct value. Moreover, because IS_MATCH always returns the correct result, $T_1$ and $T_2$ can only ever be matched with each other, which means that $\nexists\, T_3 \in AST_2$ such that $(T_1, T_3)$ is a match, and $\nexists\, T_4 \in AST_1$ such that $(T_4, T_2)$ is a match. Thus, in the second case, when $T_1$ and $T_2$ are not on the same level, neither of them is matched during the first round of matching. However, at the end of the algorithm, we try to match the unmatched subtrees that we have not tried to match before with each other. We have never tried to match $T_1$ and $T_2$ before, as they were not on the same level during the first round; thus, the algorithm will eventually try to match $T_1$ with $T_2$. Therefore, assuming that the IS_MATCH operation is correct, we will always find $(T_1, T_2)$ as a pair.
3.2 Conflict Detection Phase
The conflict detection phase is the second phase of
our approach. It uses the result of the AST matching
phase to recognize conflicts between two versions of
a model. A conflict is an elementary difference be-
tween the two versions. It is always related to one or
more subtrees in order to accurately track the conflict
to model elements. The goal of this phase is to find
every conflict and assign solutions to them. A solution
is a piece of text that is used to replace the text related
to the AST of the conflict. An automatic solution is a
solution that is chosen automatically during the merg-
ing phase of our method. We differentiate between the
following types of conflicts in our approach:
Different Text Conflict (DTC). This type is as-
signed to a matched subtree pair. A DTC occurs
if the raw texts of the matched pair are different. It
can either be a semantic (changed attribute, name,
etc.) or a non-semantic (comments, etc.) differ-
ence. A DTC is recognized by performing a text-
based differencing on the pair. A DTC is always
assigned to the innermost tree. For example, if an
attribute of a node is changed, the conflict is as-
signed to the subtrees that represent the attributes,
instead of the nodes.
New Tree Conflict (NTC). This type is assigned
to an unmatched subtree. An NTC occurs if there
is a subtree that is present in one version of the
model, while it is not present in the other. An
NTC is easily recognized as an instance is created
for every unmatched tree found during the AST
matching phase.
Move Conflict (MC). This type is assigned to a
matched subtree pair. An MC occurs if the posi-
tions of the subtrees in a matched pair are not the
same. A subtype of this conflict is when the trees
in a matched pair are on different levels. This can
either have a semantic (e.g. a contained node) or a
non-semantic (e.g. reordering) meaning. An MC
is recognized by checking the order of the pair in
both trees. If the subtrees are on different levels, it
is best recognized by labeling the pair during the
matching process.
In previous work, we referred to move conflicts
(MC) as order conflicts (OC). We also presented a
special subtype for when the order of two or three subtrees on the same level was changed. This is a common case in practice, and we deemed it useful for
users to easily solve these conflicts with just one so-
lution. However, this is unnecessary on a theoretical
level, as the Move Conflict presented here also cov-
ers this special case, even though it creates more in-
stances of conflicts in practice. Therefore, in this pa-
per, we exclude this special case from the theoretical
presentation.
Algorithm 3 illustrates the conflict detection phase.
Input: AST_1, AST_2, Matched, Unmatched
Output: NTC, DTC, MC
for t ∈ Unmatched do
    AST_R ← GET_CONTAINING_TREE(t);
    NTC.ADD(t, AST_R);
end
for (t_1, t_2) ∈ Matched do
    TDiff ← TEXT_DIFF(t_1, t_2);
    if TDiff ≠ ∅ then
        (I_1, I_2) ← INNER_TREES(AST_1, AST_2, t_1, t_2);
        DTC.ADD(I_1, I_2, TDiff);
    end
    if t_1.LEVEL ≠ t_2.LEVEL then
        MC.ADD(t_1, t_2);
    else
        if DIFF_ORDER(t_1, t_2, AST_1, AST_2) then
            MC.ADD(t_1, t_2);
        end
    end
end
Algorithm 3: Conflict detection - main algorithm.
For every Unmatched subtree, the algorithm creates a New Tree Conflict (NTC) and adds it to the list. Before doing so, we determine the origin of the subtree (GET_CONTAINING_TREE) so we can identify the source of the conflict. For every pair in the Matched list found during the AST matching, the algorithm first performs a simple text differencing operation (TEXT_DIFF). If the texts of the subtrees differ, then we locate the innermost trees (INNER_TREES), so we can identify the source of the conflict. After that, the algorithm adds the created Different Text Conflict (DTC) to the list. The next task is checking whether the trees were moved. First, we check whether the subtrees are on different levels; if they are, a Move Conflict (MC) is created. If they are on the same level, we have to examine their positions. This is done by checking their relative positions to each other in both abstract syntax trees (DIFF_ORDER). Checking subtrees that are only present in one AST makes no difference regarding the order of elements, since new trees are already handled by another conflict type (NTC). If the order of elements is the same in both trees, then there is no conflict. Otherwise, an MC is added to the list.
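The following Python sketch mirrors Algorithm 3 on a deliberately simplified representation, so it is an illustration under assumptions rather than our implementation. Each version is a pre-order list of hypothetical Subtree records whose text holds only the subtree's own line (so the innermost tree receives the DTC without a separate INNER_TREES step), the containing tree is approximated by the parent key, and the AST matching phase is replaced by matching on key. TEXT_DIFF uses difflib.ndiff, and DIFF_ORDER compares relative order against siblings present in both versions.

```python
import difflib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Subtree:
    key: str               # hypothetical identifier (stands in for the GUID)
    text: str              # raw text of this subtree only, excluding its children
    level: int             # depth in the AST
    parent: Optional[str]  # key of the containing tree, None for the root


def text_diff(t1: Subtree, t2: Subtree) -> List[str]:
    """TEXT_DIFF: removed/added lines between the raw texts (empty if identical)."""
    lines = difflib.ndiff(t1.text.splitlines(), t2.text.splitlines())
    return [line for line in lines if line.startswith(("- ", "+ "))]


def siblings(version: List[Subtree], parent: Optional[str]) -> List[str]:
    """Keys of the subtrees directly contained by `parent`, in text order."""
    return [t.key for t in version if t.parent == parent]


def diff_order(key: str, order1: List[str], order2: List[str]) -> bool:
    """DIFF_ORDER: does `key` swap relative order with a sibling present in both versions?"""
    common = [k for k in order1 if k in order2]
    pos1 = {k: i for i, k in enumerate(common)}
    pos2 = {k: i for i, k in enumerate(order2)}
    return any((pos1[k] < pos1[key]) != (pos2[k] < pos2[key]) for k in common if k != key)


def detect_conflicts(v1: List[Subtree], v2: List[Subtree]):
    """Sketch of Algorithm 3: returns the (NTC, DTC, MC) lists."""
    by1 = {t.key: t for t in v1}
    by2 = {t.key: t for t in v2}
    matched = [(by1[k], by2[k]) for k in by1 if k in by2]  # stand-in for phase 1
    unmatched = [t for t in v1 if t.key not in by2] + [t for t in v2 if t.key not in by1]
    ntc = [(t.key, t.parent) for t in unmatched]            # parent ~ GET_CONTAINING_TREE
    dtc: List[Tuple[str, List[str]]] = []
    mc: List[str] = []
    for t1, t2 in matched:
        diff = text_diff(t1, t2)
        if diff:
            dtc.append((t1.key, diff))
        if t1.level != t2.level:
            mc.append(t1.key)
        elif diff_order(t1.key, siblings(v1, t1.parent), siblings(v2, t2.parent)):
            mc.append(t1.key)
    return ntc, dtc, mc


V1 = [Subtree("m", "model Library", 0, None), Subtree("Book", "node Book", 1, "m"),
      Subtree("Title", "attr Title[1]", 2, "Book"), Subtree("Author", "node Author", 1, "m"),
      Subtree("Rel", "edge BookAuthor", 1, "m")]
V2 = [Subtree("m", "model Library", 0, None), Subtree("Rel", "edge BookAuthor", 1, "m"),
      Subtree("Book", "node Book", 1, "m"), Subtree("Title", "attr Title[0..1]", 2, "Book"),
      Subtree("Author", "node Author", 1, "m"), Subtree("Name", "attr Name", 2, "Author")]
ntc, dtc, mc = detect_conflicts(V1, V2)
print(ntc)                  # [('Name', 'Author')]
print([k for k, _ in dtc])  # ['Title']
print(mc)                   # ['Book', 'Author', 'Rel']
```

Because DIFF_ORDER is pairwise in this sketch, every sibling whose relative order changed (Book, Author, and Rel) receives its own MC, which illustrates why the general Move Conflict creates more conflict instances in practice than the earlier special subtype.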
3.2.1 Soundness and Completeness
We consider that the conflict detection phase of a text-
based MDM algorithm is sound, if it does not recog-
nize non-existing conflicts during the recognition. It
is complete, if it recognizes every existing conflict be-
tween the two differenced versions of the model. In
this subsection, we prove that the conflict detection
phase of our algorithm is sound and complete. Since
the proofs are very similar, we choose to prove them
together.
Remark. Completeness is related to the concept of
accuracy. Accuracy defines the ratio of correctly
identified conflicts. The conflict detection phase is
complete, if it has 100% accuracy. In the case of the
more general MDM methods, reaching 100% accu-
racy is usually a difficult task. Since our method is
tailored for VMTS and VMDL, we can reach it more
easily. Our future plan is to extend our approach to be
more general, so it can be used with other modeling
languages as well. This would result in potentially
losing this 100% accuracy. However, as we mentioned before, text-based approaches have the advantage of being able to fall back to pure text-based differencing if, for some reason, the detection algorithm fails.
Therefore, while we might lose the accurate identifi-
cation of the conflicts, we will never lose complete-
ness, assuming that the textual representations are de-
scribed correctly.
Theorem 3. The conflict detection phase of our
MDM approach is sound and complete.
Proof. There are four places in the algorithm where we recognize a new conflict. Let us examine all these cases individually. The first case is the recognition of a New Tree Conflict (NTC) for $u \in \mathit{Unmatched}$. By definition, an NTC is a subtree $t$ such that $t \in AST_1 \cup AST_2$, but $t \notin AST_1 \cap AST_2$, i.e. $t$ is present in only one of the two versions. Every tree in the Unmatched list comes from $AST_1$ or $AST_2$, and it cannot be present in both versions, because then, by Theorem 2, the AST matching phase would have matched it. Conversely, by Theorem 1, a subtree present in only one version is never matched, so it appears in the Unmatched list. Since we create an NTC for every element of Unmatched, completeness is proven; since we never create an NTC outside of this loop, soundness is proven as well. The second case is the recognition of Different Text Conflicts (DTC). For $(t_1, t_2) \in \mathit{Matched}$, the algorithm checks whether $\mathrm{Text}(t_1)$ and $\mathrm{Text}(t_2)$ are different. Raw text differencing relies on the Longest Common Subsequence problem (Paterson and Dančík, 1994), which is solved for two sequences. Therefore, we can assume that the result of the text differencing is correct. Similarly to the first case, according to the definition, we add a DTC to the list exactly when $\mathrm{Text}(t_1) \neq \mathrm{Text}(t_2)$. Therefore, completeness and soundness are proven. The third case is the recognition of a Move Conflict (MC) for $(t_1, t_2) \in \mathit{Matched}$, where $t_1.\mathrm{LEVEL} \neq t_2.\mathrm{LEVEL}$. This is done according to the definition. The fourth case is creating an MC for $(t_1, t_2) \in \mathit{Matched}$ when $t_1.\mathrm{LEVEL} = t_2.\mathrm{LEVEL}$, but their order is different. We consider checking the relative order of two elements in a list a simple operation and a solved problem. Therefore, we assume that it always returns the correct result. Again, according to the definition, we create a new MC exactly when the order is different in the two trees. Since there are no other cases where we recognize a conflict, and we have examined all existing possibilities, the soundness and completeness of the conflict detection phase are proven.
3.3 Merge Phase
The merge phase of the algorithm is the most practice-oriented phase, because user input heavily influences the outcome of the merging process. It is split into two sub-phases: 1) the automatic phase, and 2) the iterative phase. During the automatic phase, the automatic solution of every automatically solvable conflict is applied to the merged text. In the iterative phase, the user can choose from the designated solutions (e.g. keep the tree or delete the tree), or they can manually resolve any conflict with an arbitrary solution. We see no reason to avoid user involvement, as it is an expected and conventional fail-safe in version control systems.
Algorithm 4 illustrates the formal description of the merging phase. The inputs of the algorithm are the parsed trees and every recognized conflict. The output is a syntactically and semantically correct merged model, given by its textual representation and its parsed AST. As we have mentioned before, this correctness constraint is important, since most modeling environments do not support saving incorrect models (Steinberg et al., 2008) (Levendovszky et al., 2005). Therefore, in our opinion, model-based version control systems are right to apply this constraint as well.

Input: AST_1, AST_2, NTC, DTC, MC
Output: Text_M, AST_M
Conflicts ← NTC ∪ DTC ∪ MC;
Auto ← AUTO_CONFLICTS(Conflicts);
Text_M ← GET_TEXT(AST_1);
AST_M ← AST_1;
ASTS ← (AST_1, AST_2);
for c ∈ Auto do
    ITERATIVE_BUILD(Text_M, AST_M, ASTS, c, Conflicts, ∅);
end
Done ← false;
while ¬Done do
    Command ← USER_INPUT();
    if Command == (c_u, Solution) then
        ITERATIVE_BUILD(Text_M, AST_M, ASTS, c_u, Conflicts, Solution);
    end
    if Command == Finalize then
        if CHECK(Text_M, AST_M) then
            Done ← true;
        end
    end
end
Algorithm 4: Merging - main algorithm.
First, the algorithm takes one of the texts, GET_TEXT($AST_1$), as the basis for merging. It then iteratively builds up the merged text $Text_M$ and the merged tree $AST_M$. In the automatic phase, the algorithm takes every automatically resolvable conflict (AUTO_CONFLICTS) and builds the merged tree iteratively (ITERATIVE_BUILD). It is important to note that the iteratively built merged tree is not necessarily correct at every step; it is the responsibility of the user to ensure that the end result is a correct model given in its textual form. Since an intermediate merged text may not even parse into a correct tree, we keep track of every conflict by its absolute position in the merged text, instead of its position in the merged AST. Algorithm 5 illustrates the iterative building process, which is used during both the automatic and the iterative phases. It first resolves the conflict in the text, either automatically (AUTO_RESOLVE) or by using a given solution (RESOLVE). Then, it builds the merged tree (BUILD_TREE) based on
AMARETTO 2018 - Special Session on domAin specific Model-based AppRoaches to vErificaTion and validaTiOn
the change that occurred. Afterwards, the GET_INTERACTIONS operation discovers how the resolved conflict interacts with the other conflicts, and UPDATE_POSITIONS updates the positions of the conflicts in the text accordingly. In the iterative phase, the user has complete control over the resolution of the remaining conflicts (USER_INPUT). For every conflict the user resolves, we call the ITERATIVE_BUILD operation with the Solution that the user chose; the iterative building itself is the same as in the automatic phase. Finally, when the user finalizes the merge, we have to CHECK whether the merged model is correct. Similarly to the IS_MATCH function presented in Section 3.1, the CHECK operation uses the parser of the language to build a model from the text, and then checks the correctness of the model.

Input: Text_M, AST_M, ASTS, c, Conflicts, Solution
Output: Text_M, AST_M
if Solution == ∅ then
    Text_M ← AUTO_RESOLVE(Text_M, c);
end
else
    Text_M ← RESOLVE(Text_M, c, Solution);
end
AST_M ← BUILD_TREE(AST_M, ASTS, Text_M);
I ← GET_INTERACTIONS(Conflicts, c);
UPDATE_POSITIONS(AST_M, Conflicts, I);
Algorithm 5: The ITERATIVE_BUILD operation.
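The position bookkeeping behind Algorithm 5 can be illustrated with a simplified, hypothetical version of ITERATIVE_BUILD. The sketch works on the text only: it omits BUILD_TREE and $AST_M$, folds GET_INTERACTIONS and UPDATE_POSITIONS into a single pass, and assumes that each conflict is identified by absolute [start, end) offsets in $Text_M$ and that two conflict spans are either nested or disjoint, as the spans of subtrees are. Treating containment as an "interaction" is our assumption for illustration, not part of the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    name: str
    start: int  # absolute offset of the conflicting text in Text_M
    end: int    # exclusive end offset


def iterative_build(text, c, conflicts, solution):
    """Resolve c in the text, then re-position the other conflicts.

    Returns (new_text, remaining_conflicts, names_of_interacting_conflicts).
    """
    s, e = c.start, c.end
    new_text = text[:s] + solution + text[e:]
    delta = len(solution) - (e - s)
    remaining, interactions = [], []
    for o in conflicts:
        if o == c:
            continue
        if o.end <= s:                       # entirely before c: unaffected
            remaining.append(o)
        elif o.start >= e:                   # entirely after c: shifted
            remaining.append(Conflict(o.name, o.start + delta, o.end + delta))
        elif s <= o.start and o.end <= e:    # nested in c: replaced along with it
            interactions.append(o.name)
        elif o.start <= s and e <= o.end:    # encloses c: only its end moves
            interactions.append(o.name)
            remaining.append(Conflict(o.name, o.start, o.end + delta))
        else:                                # impossible for nested subtree spans
            raise ValueError(f"{o.name} partially overlaps {c.name}")
    return new_text, remaining, interactions


text = "a=1;b=22;c=3;"
conflicts = [Conflict("c1", 2, 3), Conflict("b_decl", 4, 9),
             Conflict("c2", 6, 8), Conflict("c3", 11, 12)]
new_text, rest, touched = iterative_build(text, conflicts[2], conflicts, "4444")
```

Resolving the conflict on "22" shifts the later conflict c3 by two characters and stretches the enclosing b_decl, while the earlier c1 is untouched.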
3.3.1 Soundness
We consider the merge phase of a text-based MDM algorithm sound if it cannot produce an incorrect merged model as its output. In this subsection, we prove that the merge phase of our algorithm is sound, assuming the checking function CHECK always returns the correct result.
Theorem 4. The merge phase of our MDM approach
is sound, assuming the checking function CHECK al-
ways returns the correct result.
Proof. The iterative phase of the algorithm (the while ¬Done loop) ends only once we verify that the model is both syntactically and semantically correct: Done is set to true only when the CHECK operation succeeds, and the merged text and tree are not modified afterwards. CHECK is similar to the IS_MATCH operation in the AST matching phase: both are configurable operations that rely on the parser of the language (VMDL in our case). Since we assumed that the CHECK operation is correct, the merging phase can only end with a correct model. The assumption is reasonable for the same reasons we have seen before in Section 3.1.1.
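The proof relies only on the control flow of Algorithm 4, which the following sketch mirrors. The callables passed to merge are hypothetical stand-ins for AUTO_CONFLICTS, ITERATIVE_BUILD, CHECK and USER_INPUT, and $AST_M$ is omitted on the assumption that the check re-parses $Text_M$ itself. The variable done is assigned only from the result of check, so the loop cannot return an unchecked text.

```python
FINALIZE = "Finalize"


def merge(base_text, conflicts, auto_conflicts, iterative_build, check, user_input):
    """Main loop of Algorithm 4 (AST_M omitted); returns only checked texts."""
    text_m = base_text                      # Text_M <- GET_TEXT(AST_1)
    for c in auto_conflicts(conflicts):     # automatic phase
        text_m = iterative_build(text_m, c, conflicts, None)
    done = False
    while not done:                         # iterative phase
        command = next(user_input)          # USER_INPUT()
        if command == FINALIZE:
            done = check(text_m)            # the only assignment that ends the loop
        else:
            c_u, solution = command
            text_m = iterative_build(text_m, c_u, conflicts, solution)
    return text_m


# Toy stand-ins: "<name>" marks an unresolved conflict in the text.
AUTO = {"c1": "1"}                          # hypothetical automatic solutions
checked = []


def toy_build(text, c, conflicts, solution):
    return text.replace(f"<{c}>", AUTO[c] if solution is None else solution)


def toy_check(text):
    checked.append(text)
    return "<" not in text


result = merge("a=<c1> b=<c2>", ["c1", "c2"],
               lambda cs: [c for c in cs if c in AUTO], toy_build, toy_check,
               iter([FINALIZE, ("c2", "2"), FINALIZE]))
```

In the usage example, the first Finalize command is rejected because c2 is still unresolved; the loop returns only after the user supplies a solution and the second check succeeds.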
3.3.2 Completeness
We consider the merge phase of a text-based MDM algorithm complete if the merged model contains a solution for every discovered conflict. In this subsection, we prove that the merge phase of our algorithm is complete, assuming it is sound.
Theorem 5. The merge phase of our MDM approach
is complete, assuming it is sound.
Proof. The algorithm takes one version of the model ($AST_1$ and GET_TEXT($AST_1$)), and proceeds to build the merged model based on this version. In Section 3.3.1 (Theorem 4), we proved that the end result of the merge phase is always a syntactically and semantically correct model. Therefore, the merged model must contain a solution for every conflict that would otherwise make the model incorrect; during the iterative phase, the user must give a solution for these conflicts. All that remains to prove is that the conflicts that do not make the model incorrect also have a solution in the merged model. Conflicts with an automatic solution are solved during the automatic phase. For the rest of the conflicts, let us examine them by type. The subtree of a New Tree Conflict is present in $AST_M$ if and only if it appears in $AST_1$. In the case of Different Text Conflicts, the text used in the merged model (by default, its version in $AST_1$) appears in $Text_M$. For Move Conflicts, the solution is the order found in $AST_1$, which is also always present. These default solutions cannot leave the merged model incorrect, because otherwise the CHECK operation at the end of the algorithm would fail. We have seen that every conflict has a solution in the merged model; thus, the merge phase of our algorithm is complete.
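The case analysis in the proof amounts to a default solution for each conflict type, which a user choice may override. A minimal sketch, assuming a hypothetical representation in which a conflict is a (kind, key) pair and $AST_1$ is a map from keys to (text, sibling position) pairs:

```python
from enum import Enum


class Kind(Enum):
    NTC = "New Tree Conflict"
    DTC = "Different Text Conflict"
    MC = "Move Conflict"


def default_solution(kind, key, ast1):
    """Solution kept from AST_1; ast1 maps key -> (text, sibling position)."""
    if kind is Kind.NTC:
        return ("keep",) if key in ast1 else ("delete",)
    if kind is Kind.DTC:
        return ("text", ast1[key][0])
    if kind is Kind.MC:
        return ("position", ast1[key][1])
    raise ValueError(f"unknown conflict kind: {kind}")


def solutions(conflicts, ast1, user_choices):
    """User choices win; every other conflict falls back to AST_1."""
    result = {}
    for conflict in conflicts:
        if conflict in user_choices:
            result[conflict] = user_choices[conflict]
        else:
            result[conflict] = default_solution(*conflict, ast1)
    return result


ast1 = {"a": ("a=1", 0), "b": ("b=2", 1)}
conflicts = [(Kind.NTC, "a"), (Kind.NTC, "d"), (Kind.DTC, "a"), (Kind.MC, "b")]
merged = solutions(conflicts, ast1, {(Kind.MC, "b"): ("position", 0)})
```

The final raise reflects that the conflict detection phase produces only these three conflict types; any other kind would indicate a bug rather than an unsolved conflict.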
4 CONCLUSIONS
In this paper, we have verified our previously presented text-based model differencing and merging (MDM) method with respect to several aspects. We have also given a formal description of the algorithm, which had not been presented in previous work.
The algorithm is capable of differencing and merging
two versions of a model by their textual representa-
tions. We have briefly introduced the modeling frame-
work (VMTS) we are using and the textual language
(VMDL) that is used to describe the textual represen-
tations of the models. We have also discussed the
differences between traditional text differencing and
merging, and text-based MDM approaches. We have
concluded that text-based MDM methods are needed
for the differencing and merging of the textual repre-
sentations.
Our text-based MDM approach consists of three
phases: 1) the abstract syntax tree (AST) matching
phase, 2) the conflict detection phase, and 3) the
merging phase. We presented the formal description of our algorithm in pseudocode, divided according to the three phases of the algorithm. Afterwards, we verified the algorithm with respect to two aspects: soundness and completeness. We proved that during the AST matching phase, the algorithm does not find any incorrectly matched pairs, and that it always finds every correct pair. We also verified that during the conflict detection phase, the algorithm does not recognize any incorrect conflicts, and that it recognizes every existing conflict. Finally, we proved that during the merging phase, it is not possible to create an incorrect merged model, and that every conflict is resolved in some way in the resulting merged model.
We have also discussed the differences between
text-based and graph-based MDM approaches, and
outlined our motivation for researching text-based
MDM approaches. The most significant of these were
the following: better support for serialization, a fail-
safe during the differencing process, and the fact that
there are few such existing methods available. We
have also mentioned some areas where text-based
MDM methods can be applied, such as serialization
and version control, or during the synchronization of
the textual and graphical notations.
4.1 Future Work
Based on the research presented in this paper, we con-
sider the following to be promising directions for fu-
ture work:
Comparison with structure-based approaches.
Since there are few existing text-based MDM ap-
proaches, we are interested in seeing how our
method compares to structure-based approaches
regarding characteristics like accuracy, perfor-
mance, or generality. The next step would be
comparing the two types of approaches on a more
general and theoretical level.
Complexity analysis. In future work, we plan to
extend the verification of our approach by examin-
ing its complexity. We plan to analyze the worst-
case complexity of the different phases, and com-
pare these to the complexity of basic algorithms
that can be used to solve these problems. The basic algorithm for the AST matching phase would be matching every subtree in one AST with every subtree in the other AST. We aim to prove that the worst-case complexity is on par with these algorithms, while we expect our method to perform better in practice due to the heuristics we use.
Extend the method to be more general. Our
original goal was to create a text-based MDM
method that is able to handle arbitrary VMTS
models described by VMDL. Our goal for the fu-
ture is to extend this method by making it more
general, meaning that it should be usable with
other modeling languages as well. During the
development of the algorithm, we have kept this
goal in mind. Some parts of the algorithm (like the aforementioned IS_MATCH or CHECK operations) are already designed with this in mind.
Having a general text-based MDM method would
benefit the text-based modeling research field, as
there are very few such existing methods avail-
able. The long-term use-case for this would be
creating a text-based version control system for
models that is not dependent on any modeling en-
vironment.
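The baseline mentioned under Complexity analysis can be sketched as follows. It counts the calls to a hypothetical is_match predicate standing in for IS_MATCH, with subtrees represented as plain values. For $n$ and $m$ subtrees it always performs $n \cdot m$ comparisons, the reference point for the planned complexity comparison. It collects every candidate pair and does not enforce a one-to-one matching.

```python
def naive_candidates(subtrees1, subtrees2, is_match):
    """Baseline: test every subtree of AST_1 against every subtree of AST_2."""
    calls = 0

    def counted(a, b):
        nonlocal calls
        calls += 1
        return is_match(a, b)

    pairs = [(a, b) for a in subtrees1 for b in subtrees2 if counted(a, b)]
    return pairs, calls


pairs, calls = naive_candidates(["a", "b", "c"], ["b", "c", "d", "e"],
                                lambda x, y: x == y)
```

With three and four subtrees, the baseline makes twelve IS_MATCH calls regardless of how many pairs actually match.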
ACKNOWLEDGEMENTS
This work was performed in the frame of FIEK 16-
1-2016-0007 project, implemented with the support
provided from the National Research, Development
and Innovation Fund of Hungary, financed under the
FIEK 16 funding scheme.
REFERENCES
Aho, A. V., Sethi, R., and Ullman, J. D. (2005). Compilers: Principles, Techniques, and Tools (2nd Edition).
Addison-Wesley Longman Publishing Co., Inc.
Altmanninger, K., Brosch, P., Kappel, G., Langer, P., Seidl,
M., Wieland, K., and Wimmer, M. (2009). Why
Model Versioning Research is Needed!? An Experi-
ence Report. In Proc. of the MoDSE-MCCM 2009
Workshop @ MoDELS 2009.
Bergmann, G., Dávid, I., Hegedüs, Á., Horváth, Á., Ráth, I., Ujhelyi, Z., and Varró, D. (2015). Viatra 3: A reactive model transformation platform. In 8th International Conference on Model Transformations, L’Aquila, Italy. Springer.
Brun, C. and Pierantonio, A. (2008). Model differences
in the eclipse modeling framework. UPGRADE, The
European Journal for the Informatics Professional,
9(2):29–34.
Eysholdt, M. and Behrens, H. (2010). Xtext: Imple-
ment Your Language Faster Than the Quick and Dirty
Way. In Proc. of the ACM International Conference
Companion on Object Oriented Programming Sys-
tems Languages and Applications Companion, OOP-
SLA ’10, pages 307–309, New York, NY, USA. ACM.
Falleri, J.-R., Morandat, F., Blanc, X., Martinez, M., and Monperrus, M. (2014). Fine-grained and Accurate Source Code Differencing. In Proceedings of the International Conference on Automated Software Engineering, pages 313–324, Västerås, Sweden.
Gall, H. C., Fluri, B., and Pinzger, M. (2009). Change anal-
ysis with evolizer and changedistiller. IEEE Software,
26(1):26–33.
Grönniger, H., Krahn, H., Rumpe, B., Schindler, M., and Völkel, S. (2007). Text-based Modeling. In Proc. of the 4th International Workshop on Software Language Engineering.
KDiff3 (2003). A text-based differencing and merging tool.
http://kdiff3.sourceforge.net/.
Kelly, S. and Tolvanen, J.-P. (2008). Domain-Specific Mod-
eling: Enabling Full Code Generation. Wiley-IEEE
Computer Society Pr.
Levendovszky, T., Lengyel, L., Mezei, G., and Charaf,
H. (2005). A Systematic Approach to Metamodel-
ing Environments and Model Transformation Systems
in VMTS. Electronic Notes in Theoretical Computer
Science, 127(1):65–75.
Lin, Y., Zhang, J., and Gray, J. (2004). Model comparison:
A key challenge for transformation testing and ver-
sion control in model driven software development.
In Control in Model Driven Software Development.
OOPSLA/GPCE: Best Practices for Model-Driven
Software Development, pages 219–236. Springer.
Meta Object Facility (2003). Meta object facility (MOF)
2.0 core specification. Version 2.
Moher, T. G., Mak, D., Blumenthal, B., and Levanthal, L.
(1993). Comparing the comprehensibility of textual
and graphical programs. In Empirical Studies of Pro-
grammers: Fifth Workshop, pages 137–161. Ablex,
Norwood, NJ.
Paige, R. F., Matragkas, N., and Rose, L. M. (2016). Evolv-
ing models in model-driven engineering: State-of-the-
art and future challenges. Journal of Systems and Soft-
ware, 111:272 – 280.
Parr, T. (2013). The Definitive ANTLR 4 Reference. Prag-
matic Bookshelf, 2nd edition.
Paterson, M. and Dančík, V. (1994). Longest common subsequences, pages 127–142. Springer Berlin Heidelberg, Berlin, Heidelberg.
Pérez Andrés, F., de Lara, J., and Guerra, E. (2008). Domain Specific Languages with Graphical and Textual Views, pages 82–97. Springer Berlin Heidelberg, Berlin, Heidelberg.
Petre, M. (1995). Why looking isn’t always seeing: Readership skills and graphical programming. Communications of the ACM, 38(6):33–44.
Sendall, S. and Kozaczynski, W. (2003). Model transfor-
mation: the heart and soul of model-driven software
development. IEEE Software, 20(5):42–45.
Somogyi, F. and Asztalos, M. (2016). Merging textual rep-
resentations of software models a practical approach.
In Hnatkowska, B. et al., editors, Software Engineer-
ing: Improving Practice through Research. Polish In-
formation Processing Society.
Somogyi, F. A. (2016). Merging Textual Representations of
Software Models. In MultiScience - 2016. microCAD
International Multidisciplinary Scientific Conference,
Miskolc, Hungary.
Spinellis, D. (2005). Version control systems. IEEE Soft-
ware, 22(5):108–109.
Steinberg, D., Budinsky, F., Merks, E., and Paternostro, M.
(2008). EMF: eclipse modeling framework. Pearson
Education.
van Rest, O., Wachsmuth, G., Steel, J., Süß, J. G., and
Visser, E. (2013). Robust Real-Time Synchroniza-
tion between Textual and Graphical Editors. In The-
ory and Practice of Model Transformations, Sixth In-
ternational Conference, ICMT 2013, Budapest, Hun-
gary, June 18-19, 2013. Proceedings., Lecture Notes
in Computer Science. Springer Verlag.
van Rozen, R. and van der Storm, T. (2015). Origin Track-
ing + Text Differencing = Textual Model Differenc-
ing, pages 18–33. Springer International Publishing,
Cham.
VMTS (2003). Visual modeling and transformation system.
http://vmts.aut.bme.hu.
XML Metadata Interchange (2015). Xml metadata inter-
change (XMI) specification. Version 2.5.1.