Towards a Model Transformation based Code Renovation Tool
Norbert Somogyi
a
, G
´
abor K
¨
ovesd
´
an
b
and L
´
aszl
´
o Lengyel
c
Budapest University of Technology and Economics, M
˝
uegyetem rkp. 3, Budapest, Hungary
Keywords:
Code Modernization, Modeling, Model Transformation.
Abstract:
Maintaining legacy software has always required considerable effort in software engineering. To alleviate
these efforts, extensive research has been dedicated to automate the modernization of such systems. The pro-
cess includes several challenges, such as the syntactic translation of the old software to a modern programming
language, the mapping of the type systems of the source and target languages and the paradigm shift if the two
languages use different approaches, such as transforming procedural code to the object-oriented or functional
paradigm. In the case of procedural to object-oriented transformations, the state-of-the-art solutions are not
capable of automatically producing satisfactory results and some researchers suggest that complete automa-
tion will never be achieved. In our paper, we report on our work in progress on using recent advances in the
fields of modeling and model transformation to build a software modernization tool. Our solution capitalizes
on the advantages of the Ecore-based modeling ecosystem of Eclipse and focuses on not just the syntactic
translation of the system, but also on the paradigm shift of procedural to object-oriented transformations. Our
approach builds a semantic model from the original source code written in C language and produces Java code
by analysing and transforming this model.
1 INTRODUCTION
Software is a living artifact that is rarely left alone
once developed. As software evolves and is contin-
uously modified, the structure of the source code be-
comes less clear, which makes it much harder to un-
derstand its original intent, modify it or extend it with
additional features. This is called code rot (Izurieta
and Bieman, 2013).
It would be desirable for software engineers to pe-
riodically review and reconsider the technology stack
of a particular software and migrate to a more modern
and ideal option. However, it is rarely implemented
for two main reasons. First, when done manually,
it assumes the risk of introducing so-called regres-
sions, bugs that are accidentally introduced by mod-
ifying software. Secondly, corporations are driven
by business goals and are less likely to allocate re-
sources on projects that do not provide value for the
market but are solely for maintenance purposes. Code
modernization tools could help in this challenge by
at least partly automating this process, mitigating the
risk of regressions and cutting down on costs. In this
a
https://orcid.org/0000-0001-6908-7907
b
https://orcid.org/0000-0003-1527-2896
c
https://orcid.org/0000-0002-5129-4198
paper, we report on our work in progress towards a
code modernization tool that modernizes C code us-
ing modeling and model transformation techniques.
Our tool uses the Ecore-based modeling ecosystem of
Eclipse (McAffer et al., 2010) and focuses on translat-
ing procedural software into an object-oriented struc-
ture using model transformations. The tool produces
Java code, but the approach and the tool itself are
extensible to support different source and target lan-
guages as well.
Based on the conclusions drawn from the cited
papers and tools in Section 2, we believe that there
is a lack of procedural to object-oriented code mod-
ernization tools that actually use the various object
identification methods and algorithms proposed in re-
search. Therefore, we aim to build a semi-automatic
tool that performs the conversion from procedural C
code to object-oriented Java code without interaction
as much as possible. We also propose two novel ap-
proaches when tackling this problem: call chain anal-
ysis and using a DSL (Domain Specific Language) for
the transformation of standard library calls.
Our main focus is to combine and expand upon
the various object identification methods proposed
by existing literature with the Ecore-based modeling
ecosystem of Eclipse, which we believe to be a huge
advantage in the field of code modernization. The
Somogyi, N., Kövesdán, G. and Lengyel, L.
Towards a Model Transformation based Code Renovation Tool.
DOI: 10.5220/0010246401910198
In Proceedings of the 9th International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2021), pages 191-198
ISBN: 978-989-758-487-9
Copyright
c
2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
191
reason for this is because it is a well-developed and
well-known ecosystem that consists of a great deal of
various model-driven tools. Many of these are greatly
compatible with each other because all of them use
EMF (Eclipse Modeling Framework) models. Thus
this gives the opportunity to integrate these tools into
a code modernization framework, which can greatly
increase the capacities of the tool and reduce the costs
of developing it.
The paper is structured in the following way. Sec-
tion 2 reviews related work, state-of-the-art tools and
highlights the challenges in the field. In Section 3, we
present our approach and the model transformations
used during the translation process. In Section 4, we
evaluate the feasibility of our solution, exploring the
advantages and shortcomings of our work. Section 5
concludes the paper, highlighting some future work.
2 RELATED WORK
2.1 Challenges in the Field of Code
Modernization
Code modernization has a number of cornerstones
that must be considered when aiming to build such
solutions. In their paper, Terekhov and Verhoef
(Terekhov and Verhoef, 2000) have highlighted the
difficulties of automated source code modernization
that are beyond purely syntactic translation. Firstly,
the target programming language may lack some of
the source language’s features that must be simulated
in the target language. Another challenge is mapping
the data type of variables. Different programming
languages have a wide range of differences regard-
ing data types, e.g. C has a universal pointer type
(void *) that can refer to arbitrary data in the mem-
ory, whereas Java deliberately lacks such primitive
type to avoid bugs that originate in manual memory
management. Moreover, data types that are missing
in the target language must be emulated. Further-
more, some languages, such as Cobol, have several
dialects that differ syntactically and also semantically.
Semantic differences are another considerable chal-
lenge in general. For example, most languages have
a function that prints a text to the standard output,
but the way they are called is different. For exam-
ple, C’s printf works with its own format string and
expects NULL-terminated character arrays, whereas
Java’s System.out.println works with a different for-
mat description and takes instances of String objects.
Considering the difficulties of automated conver-
sion, partially automated or interactive tools seem
more viable. The tools that attempt to perform truly
parallel conversion in practice may have some defi-
ciencies. For example, the tool presented by Sneed
in (Sneed, 2011) has a systematic way of creating
object-oriented classes, but it can be argued whether
the resulting object-oriented model feels natural and
resembles what a programmer would have created
from scratch.
2.2 Cluster Analysis Methods
Discovering potential candidate classes in source
code is often referred to as object identification. A
commonly used approach in identifying objects from
procedural code is cluster analysis (Everitt et al.,
2009). This method creates partitions out of the
entities (data structures, functions, global variables)
present in the code based on the dependencies be-
tween them. The goal is to minimize the dependen-
cies between the created partitions and create classes
from them. Although most of the solutions that use
this approach are largely similar, there exist basic dif-
ferences as well.
The solution presented by Zou and Kontogiannis
(Ying Zou and Kontogiannis, 2001) builds an inter-
mediate XML representation from the AST-s (Ab-
stract Syntax Tree) and then uses an incremental al-
gorithm to divide the entities of the algorithm to dif-
ferent clusters. The approach used by Czibula and
Czibula (Czibula and Czibula, 2011) is similar to
this but uses a different, heuristic partitioning algo-
rithm created by the authors. The algorithm defines
the distance between different entities and, based on
this property, creates clusters that have very minimal
distances between each other. The algorithm is tai-
lored to provide better performance against particu-
larly large software.
In conclusion, these approaches are similar to our
object identification algorithm because they try to find
functionally related parts of the application to con-
sider them for class creation. Our method, however,
focuses not only on the dependency relation between
the entities, but also on using parameter and return
type analysis, as proposed by (Ying Zou and Kon-
togiannis, 2001). This makes our approach identify
more classes than regular cluster analysis. Our solu-
tion also proposes a novel method of easing the au-
tomatic translation of standard function library calls
(Section 3.3).
MODELSWARD 2021 - 9th International Conference on Model-Driven Engineering and Software Development
192
2.3 Code Modernization and
Reengineering Tools
Due to the importance of modernizing legacy soft-
ware, much effort has been made to create tools ca-
pable of alleviating this process. However, these tools
usually have vastly different goals and focuses.
For example, the tool developed by mtSystems
(mtSystems, 2020) is capable of precise, fast and fully
automatic syntactical translation from C to Java code,
but the created code has two considerable disadvan-
tages. Firstly, when translating pointers into Java,
this tool uses different generated container classes to
emulate pointer functionality, which makes the gen-
erated code feel unnatural. Secondly, the tool does
not deal with shifting paradigm from procedural to
object-oriented design. We believe that using model
transformations to create an object-oriented design is
key to making the generated code feel as natural as
possible, thus our tool focuses on this aspect.
MoDisco (Bruneliere et al., 2014) is an extend-
able, model transformation based reverse engineering
framework. The process of the transformation con-
sists of 3 important steps. First, the source code is
parsed and a higher level semantic model is created
that implements OMG’s standard of KDM (Knowl-
edge Discovery Metamodel) (P
´
erez-Castillo et al.,
2011). In the next step, various model transforma-
tions may be defined and ran on the model that refine
and improve it step by step. MoDisco reuses ATL
(ATL, 2008) to define these model transformations.
ATL is a model transformation language and frame-
work. The final step of the transformation process
is code generation based on the model. It should be
noted that MoDisco uses the Ecore-based modeling
ecosystem of Eclipse as well. It is what has inspired
us to use it in our paradigm shift-focused code reno-
vation tool.
As mentioned before, code modernization is a
broad term and tools may focus on different aspects
of the process. Our goal is to focus on the ac-
tual paradigm shift between procedural languages and
object-oriented design. Upon further inspection of the
MoDisco framework, we believe it is best used in two
different scenarios, none of which suit our needs well.
Firstly, it functions well as a refactoring tool. Sec-
ondly, the tool focuses on architecture modernization.
It supports modernization with regard to various tech-
nologies, for example JEE (Java Enterprise Edition)
and JUnit. Furthermore, another reason we believe
the tool is not optimal for our goals is because the
various object identification algorithms are way too
complex to be described using ATL.
3 TRANSFORMING
PROCEDURAL CODE
3.1 Transformation Methodology
Our method of transforming procedural code into an
object-oriented design shares ideas with the tradi-
tional approach employed in many earlier solutions.
This includes parsing the source code, building a
model and generating code from it. However, in cer-
tain areas, we take new directions.
Using the ASTs (Abstract Syntax Trees), a much
higher-level representation of the source code is built
in the form of a semantic model, similar to the
approach used by MoDisco. It preserves the se-
mantics and thus the intent of the original code
but eliminates all of the language-dependent details.
OOGen is then subjected to various model transfor-
mations. M2M (Model-to-Model) transformations
(Langer et al., 2010) are responsible for identifying
the potential objects of the system, and M2T (Model-
to-Text) (Oldevik et al., 2005) transformations are
used to generate code based on the model.
Figure 1 illustrates the architectural view of our
solution. As mentioned before, using the Ecore-based
modeling ecosystem of Eclipse enables the tool to be
compatible with most other model-driven tools that
use or expect models that adhere to Ecore. This
makes the transformer easy to integrate with such
tools, which allows us to reuse them in our solution.
This reduces development costs and may further in-
crease the capabilities of a code modernization tool.
One concrete tool is Xtext, which we used to develop
the DSL that describes transformations of standard li-
brary functions (Section 3.3). In the future, more and
more such tools may be reused with the transformer.
It should be noted that the concept of the transfor-
mation process itself is language-independent. Nev-
ertheless, there are some parts of the transformation
that are dependent on the source language or the tar-
get language, yet these components can be easily re-
placed to support a wider range of source and tar-
get languages. Parsing the software requires a parser.
Similarly, code generation is dependent on the target
language, since the code generator must be capable of
creating code in the implied language.
3.2 Object Identification: Model
Transformations
The quality of the created object-oriented design
largely depends on how well potential objects are
identified. Thus, it is an essential part of the trans-
Towards a Model Transformation based Code Renovation Tool
193
Figure 1: The overall architecture of our solution.
formation process. In our solution, object identifica-
tion steps are implemented as model transformations.
Based on existing literature, we propose the following
combination of object identification techniques.
3.2.1 Structure Collection
A basic first step in transforming procedural code into
an object-oriented design is to collect all data struc-
tures by traversing the AST-s and create a correspond-
ing class for each of them. Each class encapsulates
all the data members of the structure as private mem-
bers and generates appropriate public getter and set-
ter methods for accessing these fields. These classes
form the basis of the application.
3.2.2 Parameter and Return Type Analysis
In this step, global functions are analyzed with the in-
tent of assigning them to the classes created in the
previous step. To this end, the parameters and the
return type of every global function should be taken
into consideration. This approach was first proposed
in (Ying Zou and Kontogiannis, 2001).
A structure-type parameter implicates that the
function is dependent on the implied type. Since the
function uses this type for some reason or the other, it
could be beneficial to assign this method to the class
that corresponds with the type of the parameter. A
structure-type return value may imply that the func-
tion either updates the value of a variable of the im-
plied type or creates and returns a new instance of it.
Either way, it is expedient to assign the function to
the class that corresponds with the type of the return
value. It should also be noted that return type analy-
sis is a stronger sign of cohesion and thus should take
priority over the results of parameter analysis.
3.2.3 Call Chain Analysis
In the third step of object identification, we propose
the following method of finding candidate classes.
This idea expands on the concept of cluster analysis.
We analyze every global function that has not been as-
signed to a class in the previous step. The goal of this
model transformation step is to further refine the re-
maining global functions by encapsulating functions
that may semantically belong together.
To this end, a CG (call graph) is built that repre-
sents the inter-procedural control flow of the applica-
tion. The call graph is a directed graph where nodes
represent functions, and an edge from vertex f to ver-
tex g implies that function f calls function g at least
once. Our algorithm seeks call chains and creates a
class for each of these chains that meets two criteria.
Firstly, each and every function in the chain should
be called relatively rarely by the other functions in the
application that are not part of the analyzed chain. We
propose this value to be no more than 5% of the num-
ber of all the other functions outside of the current
call chain. This requirement (maximum allowed in-
bound calls) ensures that the created class is coupled
loosely with the rest of the application. Secondly, the
number of methods in the created class must exceed
a predefined safety minimum value. We propose this
to be minimum of 3 functions in a class. We believe
that 2 functions together may not be a strong sign of
cohesion, whereas at least 3 likely yield classes that
actually encapsulate functions that logically belong
together.
We define our algorithm that traverses a call graph
and creates classes encapsulating methods in call
chains that satisfy the requirements mentioned above
in Algorithm 1. Let max f
in
denote the maximum al-
lowed inbound calls and F the set of all remaining
functions to analyze.
3.3 A Novel Approach of Transforming
Standard Library Calls
After object identification, another important aspect
of modernizing between languages is the automatic
handling of standard library function calls. Trans-
forming them is an important part of modernizing
legacy software. However, doing so automatically is
a complex and very non-trivial problem. So much so
that even well-developed tools struggle with these is-
sues.
We propose a novel method of easing the auto-
matic conversion between standard library calls of
different languages. We have created a DSL (Do-
main Specific Language) (Fowler, 2010) designed
specifically for describing such transformations. We
have created the language using the Xtext (Bettini,
2016) framework of Eclipse. When designing the
language, we have paid attention to making it as
MODELSWARD 2021 - 9th International Conference on Model-Driven Engineering and Software Development
194
Algorithm 1: Call chain analysis.
Input: CG = (V(CG), E(CG)) call graph
Output: C = {c
1
, c
2
, ..., c
n
} set of
created classes
1 Function CreateClasses():
2 N = NumberOfAllFunctions()
3 max f
in
= CalcMaxfin(N)
4 F = V(CG)
5 C = {}
6 for f F do
7 c = Analyze(F, f, {})
8 if |c| 3 then
9 C = C c
10 else
11 for g c do
12 F = F g
13 return C
14 Function Analyze(F, f, c):
15 F = F \f
16 f
in
= NumberO f InboundCalls( f )
17 if f
in
> max f
in
then
18 return {}
19 c = c f
20 for g ChildrenO f ( f ) do
21 if g F then
22 c = Analyze(F, g, c)
23 return c
generic as possible. This makes the approach lan-
guage independent as long as the source or tar-
get language is either procedural or object-oriented
in nature. The grammar of the language
1
is pub-
licly available on GitHub. It is defined in the file
”hu.bme.aut.apitransform.ApiTransform.xtext” in the
project ”hu.bme.aut.apitransform”.
Let us now take a look at an example to illustrate
the feasibility of our approach. Listing 1 depicts the
handling of fopen and Listing 2 shows how to close
the stream when encountering an fclose call.
These scripts can be used to ease the refactoring
of basic C file handling. The first one describes that
for every fopen call encountered per scope basic Java
file handling instances should be created (Filereader
and Bufferedreader) with the appropriate constructor
parameters. The second script says that when the cor-
responding fclose call is found, the stream must be
closed in the generated code as well by calling the
close method of the FileReader instance.
1
https://github.com/gaborbsd/Moodernize
Listing 1: Instantiations for fopen.
1 t r an s f o rm a t i o n {
2 sou r c e f un c t io n f o p e n {
3 p ar a me t er s : p a t h mode
4 }
5 ta r g e ts {
6 i n s ta n t i a t i o n {
7 in s t an c e in {
8 cl a s s N a me : java . io . F i le R ea d er
9 pa r am e te r s : p a t h
10 }
11 in s t an c e br {
12 cl a s s N a me : java . io .
13 Bu f f e re d R e a d e r
14 pa r am e te r s : in
15 }
16 }
17 }
18 }
Listing 2: Closing the stream at fclose.
1 t r an s f o rm a t i o n {
2 sour c e f u nc t i on f c l o se {
3 pa r a m e te r s : f i l e
4 }
5 tar g e ts {
6 fun c ti o n c l o s e {
7 pa r a m e te r s :
8 owner : in
9 }
10 }
11 }
4 EXPERIMENTAL EVALUATION
In this section, we present the feasibility evalua-
tion of our approach and the tool on a number of
open-source software publicly available on GitHub.
Apart from ”CTestProject”, which is a hand-crafted C
project used to quickly test new features introduced
to the tool, all these applications are real, existing C
projects. Most of these programs are really complex
and consist of considerable amounts of source code.
Table 1 depicts the software used during the evaluat-
ing transformations.
Most of these programs are very low level and
have dependencies that were not available during
compilation. Some of them may be compiled on spe-
cific operating systems only. Thus, the transforma-
tions of these programs introduce considerably more
errors than they would if all dependencies were avail-
able. Regardless, the reason we have chosen to use
these applications as test subjects is that open-source
complex software written in C is hard to come by.
This way, we can be certain that the transformation
completes without any runtime exceptions even on
software that consist of hundreds of thousands of
source code. We also get a glimpse on how these
transformations work on real, complex software.
Towards a Model Transformation based Code Renovation Tool
195
Table 1: The transformed test applications and their total
sum of LoC (Lines of Code) across all header and source
files.
Availability LoC
https://github.com/dajobe/flickcurl 33k
https://github.com/pbatard/rufus 86k
https://github.com/hashcat/hashcat 176k
https://github.com/curl/curl 179k
https://github.com/git/git 229k
https://github.com/obsproject/obs-studio 240k
https://github.com/vim/vim 419k
Listing 3: Sample methods of the class dict.
1 publ i c void re d is _ a d d ( i nt times )
2 publ i c void r e d i s _ f e tc h ( int t i m e s )
3 publ i c void re d is _ d e l ( i nt times )
4 publ i c dict di c tC r ea t e
5 ( d ic t T yp e type , Objec t p r i v Da t a P t r )
6 publ i c int dic t R e s i z e ()
7 publ i c int di c t G e n e r i c D e l e t e
8 ( O b jec t key , int nofree )
https://github.com/NorbertSomogyi/
MoodernizeExamples
The components of the tool
2
, the OOGen
3
meta-
model and the transformed projects
4
that the tool pro-
duced are all publicly available on GitHub.
The core part of our solution was the use of var-
ious model transformation steps to create an object
identification algorithm that creates various classes
from the procedural source code. In the next few sub-
sections, we evaluate the results of our transforma-
tions.
4.1 Parameter and Return Type
Analysis
The goal of this step is to assign responsibility to
classes based on the signature of the global functions.
In many cases, this step successfully added methods
to created classes. Take for instance the class ”dict”,
created in the CBufferedTree, project which imple-
ments a dictionary in the form of a linked list in the
original C code. This class gained several methods,
some samples of which are depicted in listing 3.
As we can see, these methods truly do belong in
this class. They are responsible for adding, fetch-
ing, deleting entries, creating a certain type of dic-
tionary or resizing it. Similarly, many other classes
(e.g. dictentry, container, node) were assigned several
2
https://github.com/gaborbsd/Moodernize
3
https://github.com/gaborbsd/OOGen
4
https://github.com/NorbertSomogyi/
MoodernizeExamples
Listing 4: Sample class created by call chain analysis.
1 publ i c s t a tic void _exit_
2 l o c a l iz a t i o n ( O bje c t r e i nit ) {
3 . ..
4 f r ee _ d i a l og _ l i s t ();
5 mt a b _ d e s tr o y ( r ein i t ) ;
6 . ..
7 }
8 pri v a te st a t i c void f r e e_ d i a l o g _
9 l i s t ( ) {...}
10 pri v a te st a t i c void m t a b_ d e s t r o y
11 ( O b jec t r e i n it ) {...}
methods that would have been placed in them even
when creating the design manually by scratch. Based
on these results, we conclude that this step of the algo-
rithm truly does create useful object-oriented design.
4.2 Call Chain Analysis
This transformation step is responsible for further re-
fining those global functions that were not assigned
to a class in the previous analysis. It creates classes
encapsulating methods that logically belong together.
Across all the transformed programs, several call
chain classes were created. These classes are named
in the following format: < FirstFunctionName >
To < LastFunctionName > . Listing 4 shows an
example taken from the transformed Rufus project,
where a total of 16 call chain classes were success-
fully created.
We can see that one of the methods is public, the
others are private. This means that the private meth-
ods are never called by any other part of the applica-
tion. Moreover, the public method actually calls the
other 2 private methods at some point. This implies
these classes truly may belong together semantically.
In other classes, similar results can be observed. To
sum up, what we accomplish with this transforma-
tion step is that instead of putting every ”leftover”
global function in the global ModernizedCProgram
class, many of these functions are at least encapsu-
lated with other functions that logically belong to-
gether.
4.3 Standard Library Calls
To further elaborate on the examples presented in Sec-
tion 3.3, let us now take a look at how exactly the
the scripts depicted in listing 1 and listing 2 are used
to actually translate C file handling snippets. These
scripts describe how to handle the C fopen and fclose
calls. Coupled with two basic scripts that replace the
getC call with readLine (creating a String instance
before the calls) and putchar with System.out.print,
these can be used to actually translate a simple file
MODELSWARD 2021 - 9th International Conference on Model-Driven Engineering and Software Development
196
Listing 5: Java file handling snippet.
1 j a v a . io . Fi l eR e ad e r i n =
2 n ew Fi l eR e ad e r (" a s d . txt ");
3 j a v a . io . B u ff e r e d R e a d e r br =
4 n ew B u f f e r e dR e a d e r ( in );
5 Str i n g line ;
6 while (( line = br . r e ad L i ne ( ) )
7 != null ) {
8 S y s tem . out . prin t ( l i n e );
9 }
10 in . c lose ( );
handling code to its Java equivalent. Using the afore-
mentioned scripts, the Java code shown in listing 5 is
generated as a sample Java file handling snippet.
Based on these results, we believe this approach is
useful in easing the translation of not-too-complicated
code samples. However, to faithfully translate more
complex code, further improving the language might
be necessary. Particularly, detecting code patterns in-
stead of matching singular function calls could poten-
tially improve the expressive power of our DSL.
4.4 Critical Analysis
In this section, we summarize the strengths and weak-
nesses of our approach and tool and compare them to
the other tools mentioned in this paper.
One of the main strengths of our approach is the
use of Eclipse’s Ecore-based metamodeling ecosys-
tem. This is similar to what the Modisco tool uses and
employing it in a dedicated code modernization tool
is a novel approach of tackling this problem. Nat-
urally, our OOGen metamodel is not as detailed as
Modisco’s metamodel that adheres to OMG’s KDM
specification. For example, KDM has a separate layer
dedicated to the architecture of the modernized pro-
gram. This is not touched upon in our approach, be-
cause for now we have chosen to focus on the model
transformations that identify classes in the system.
The reason we have not reused Modisco’s KDM com-
patible metamodel is that we had OOGen available
from a different project and it was easy and fast to
use it. In the future, reusing the MoDisco metamodel
will be beneficial to our solution. To sum it up, al-
though Modisco’s metamodel is currently more ad-
vanced than ours, it is still a considerable advantage
against any other tool or approach formerly presented
in this paper.
The next notable advantage of our approach is the
use of model transformations. These are responsi-
ble for identifying potential objects as candidates for
class creation. This is an advantage when compared
to e.g. mtSystems, for this otherwise state-of-the-
art tool does not use any object-identification algo-
rithms proposed in former literature. When compared
to Modisco, we believe that these algorithms are too
complex to be described using ATL, and have deemed
it necessary to implement them ”manually”. On the
other hand, some earlier solutions use more steps in
their object identification algorithm, opposed to the
3 steps that we currently use. In the future, adding
more steps will be useful to create even better object-
oriented design.
We have also proposed the novel approach of call
chain analysis. It is similar to the previous cluster
analysis approaches seen in Section 2.2. It analyzes
the dependency between entities of the system and
creates classes that minimize dependencies between
the created classes. However, call chain analysis only
uses the functions of the system and not the global
variables or data structures. Thus, it is used only as
a refining step in the end instead of being the main
algorithm that creates the object-oriented design. The
advantage of this is that we can combine dependency
analysis with other forms of object identification, as
seen in this paper. The disadvantage is that it does
not consider data and classes should usually be used
to encapsulate behaviour with the data it operates on.
Thus, this step may further be improved in the future
by trying to combine the classes created by call chain
analysis with global variables that the functions may
heavily use.
Finally, our solution also uses an approach of eas-
ing the translation of standard library calls. Not only
is this a novel approach of tackling this problem,
but neither existing literature nor state-of-the-art tools
have made an effort to at least partially automate this
problem. Needless to say, we believe this is a consid-
erable advantage in favor of our solution.
On the other hand, there exist a number of C lan-
guage constructs that our reference implementation
tool does not handle properly as of now.The mtSys-
tems tool and other similar cutting-edge tools excel
at syntactic translation, leaving little to none compila-
tion errors in translated projects.
5 CONCLUSION
In this paper, we have presented our work in progress
code renovation tool that transforms procedural C
code to Java code in a semi-automatic way. Our ap-
proach relies on creating a high-level semantic model
out of the AST representation of the source code and
using a series of model transformations to create a
more object-oriented design. Our object identifica-
tion algorithm reuses some ideas proposed in existing
literature and proposes the new method of call chain
analysis. The quality of the identified classes is sub-
Towards a Model Transformation based Code Renovation Tool
197
ject to the parametrization of the algorithm. In gen-
eral, the classes are highly cohesive, and they encap-
sulate data that logically belong to each other.
One of the main benefits of our solution is using
the Eclipse EMF modeling ecosystem. This allows
us to easily integrate our solution with existing MDE
tools, which highly improve the capabilities of the
code modernizing tool. We have also created a DSL
that is used to effectively describe transformations of
standard library function calls. This is a novel ap-
proach of dealing with this problem.
We have also experimentally evaluated the fea-
sibility of our solution on various real, open-source
software. We have found that our object identifica-
tion methodology produces useful object-oriented de-
sign. We have also compared our solution to existing
state-of-the-art tools and solutions presented in exist-
ing literature. We have determined that the novelty
of our solution is the Ecore-based model-driven ap-
proach, the proposed call chain analysis method, and
the DSL used for translating standard library calls.
There exist many promising aspects that could be
explored upon in future work. Formal proof of the
correctness of the transformation should be given in
the near future. To this end, we plan to use for-
mal verification methods to ascertain that the trans-
formed program behaves equivalently to the original
software. Also, integrating our solution with more ex-
isting model-driven tools remains our main goal. For
example, instead of manually building a call graph,
Eclipse PTP (Watson and DeBardeleben, 2006) may
be used to easily and effectively generate it.
ACKNOWLEDGEMENTS
Project no. FIEK 16-1-2016-0007 has been imple-
mented with the support provided from the National
Research, Development and Innovation Fund of Hun-
gary, financed under the Centre for Higher Education
and Industrial Cooperation - Research infrastructure
development (FIEK 16) funding scheme.
REFERENCES
ATL (2008). ATL presentation. https:
//www.eclipsecon.org/2008/sub/attachments/
Modeltomodel Transformation with ATL.pdf.
Accessed: 2020-10-05.
Bettini, L. (2016). Implementing Domain Specific Lan-
guages with Xtext and Xtend - Second Edition. Packt
Publishing, 2nd edition.
Bruneliere, H., Cabot, J., Dup
´
e, G., and Madiot, F.
(2014). MoDisco: a Model Driven Reverse Engineer-
ing Framework. Information and Software Technol-
ogy, 56(8):1012–1032.
Czibula, I. G. and Czibula, G. (2011). Unsupervised trans-
formation of procedural programs to object-oriented
design. In Acta Universitatis Apulensis.
Everitt, B. S., Landau, S., and Leese, M. (2009). Cluster
Analysis. Wiley Publishing, 4th edition.
Fowler, M. (2010). Domain Specific Languages. Addison-
Wesley Professional, 1st edition.
Izurieta, C. and Bieman, J. (2013). A multiple case study of
design pattern decay, grime, and rot in evolving soft-
ware systems. Software Quality Journal, 21.
Langer, P., Wimmer, M., and Kappel, G. (2010). Model-
to-model transformations by demonstration. In Tratt,
L. and Gogolla, M., editors, Theory and Practice of
Model Transformations, pages 153–167, Berlin, Hei-
delberg. Springer Berlin Heidelberg.
McAffer, J., Lemieux, J.-M., and Aniszczyk, C. (2010).
Eclipse Rich Client Platform. Addison-Wesley Pro-
fessional, 2nd edition.
mtSystems (2020). mtSystems documentation. https://
www.mtsystems.com/. Accessed: 2020-10-05.
Oldevik, J., Neple, T., Grønmo, R., Aagedal, J., and Berre,
A.-J. (2005). Toward standardised model to text trans-
formations. In Hartman, A. and Kreische, D., editors,
Model Driven Architecture Foundations and Appli-
cations, pages 239–253, Berlin, Heidelberg. Springer
Berlin Heidelberg.
P
´
erez-Castillo, R., Guzm
´
an, I., and Piattini, M. (2011).
Knowledge discovery metamodel-iso/iec 19506: A
standard to modernize legacy systems. Computer
Standards & Interfaces, 33:519–532.
Sneed, H. M. (2011). Migrating pl/i code to java. In 2011
15th European Conference on Software Maintenance
and Reengineering, pages 287–296.
Terekhov, A. and Verhoef, C. (2000). The realities of lan-
guage conversions. IEEE Software, 17(6):111–124.
Watson, G. and DeBardeleben, N. (2006). Developing sci-
entific applications using eclipse. Computing in Sci-
ence & Engineering, 8:50 – 61.
Ying Zou and Kontogiannis, K. (2001). A framework
for migrating procedural code to object-oriented plat-
forms. In Proceedings Eighth Asia-Pacific Software
Engineering Conference, pages 390–399.
MODELSWARD 2021 - 9th International Conference on Model-Driven Engineering and Software Development
198