CloudTL: A New Transformation Language based on Big Data Tools and the Cloud

Jesús M. Perera Aracil and Diego Sevilla Ruiz

DITEC, University of Murcia, Campus Universitario de Espinardo, 30100, Espinardo, Murcia, Spain

Keywords:

MDE, Model-to-model Transformation, Cloud, Ecore.

Abstract:

Model Driven Engineering (MDE) faces new challenges as models increase in size. The size and
complexity of these so-called Very Large Models (VLMs) cause transformation languages to have long
execution times, or even to fail to handle them because of memory issues. New approaches are needed
to solve these challenges, such as automatic parallelization or the use of big data technologies,
all of which should be transparent to the transformation developer. In this paper we present
CloudTL, a new transformation language whose engine is based on big data tools to deal with VLMs in
an efficient and scalable way, and we benchmark it against the de facto standard, ATL.

1 INTRODUCTION

Models are a digital representation of reality and, as such, their size is constantly
increasing. Transformations are used for manipulating these models, but they are becoming
slow or even impossible to run because of the huge size of input and output models.

Big data tools (Manyika et al., 2011) have proven to be a solution for manipulating huge
amounts of data in reasonable time by distributing the work across different compute nodes
(physical machines, virtual machines (VMs), or both). Thus, using them to perform model
transformations appears to be a promising way of dealing with VLMs.

The challenges arising in the Model Driven Engineering (MDE) ecosystem as models become
huge and complex parallel what data analytics is experiencing. The parallel is a logical
one, since information is constantly being generated and is becoming more and more complex.
Thus, we believe that big data tools, which already solve the problem of manipulating
immense amounts of information, could help alleviate the problems that MDE, and model
transformations in particular, are facing.

The Mondo Project (Kolovos et al., 2015) is a Specific Targeted Research Project (STREP) of
the Seventh Framework Programme for research and technological development (FP7) aiming to
tackle the increasingly important challenge of scalability in MDE in a comprehensive manner.
Publications and tools developed through this project have brought us closer to a solution
for dealing with VLMs.

As these initiatives and projects indicate, solving these problems is a real concern for
MDE. It is essential that solutions, tools, frameworks and new languages are developed so
that MDE can continue to adapt to the needs of Software Engineering when faced with problems
in which huge amounts of data must be handled efficiently.

This paper is structured as follows: Section 2 introduces the concepts used in our proposal
for building our distributed transformation language, CloudTL. Section 3 presents our
implementation and benchmarks it against ATL, the de facto standard. Section 4 discusses
related work and Section 5 summarizes conclusions and future work.

2 BACKGROUND

2.1 Apache Storm

Apache Storm is a free and open source distributed

realtime computation system (Apache, 2016c). It is

commonly used as a big data tool to analyse streams

of information and obtain analytics or metadata from

them.

Perera Aracil J. and Sevilla Ruiz D.

CloudTL: A New Transformation Language based on Big Data Tools and the Cloud.

DOI: 10.5220/0006203101370146

In Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2017), pages 137-146

ISBN: 978-989-758-210-3


Storm is based on the idea of flows of information, called topologies, in which producers
of information, called Spouts, and nodes of computation, called Bolts, are interconnected.
These interconnections are called Streams, which are unbounded sequences of tuples that are
created and processed in parallel in a distributed fashion. Tuples are collections of values
of primitive types (strings, integers...), and serializers can be defined to support other
types.

Streams connect spouts to bolts, as well as bolts to other bolts, by defining the structure
of the tuples that will be emitted through them. This interconnection of spouts and bolts is
called a Storm topology, which defines the flow of data from the spouts to the bolts. Streams
are also aware of the replication of Storm bolts, so different strategies can be chosen for
emitting tuples (e.g., broadcasting a tuple to all the replicas, emitting the tuple to a
local target replica if it exists, or using consistent hashing to send related tuples to the
same replica). Figure 1 shows a simple Storm topology from the Storm webpage.

Figure 1: Simple Storm topology.
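The emission strategies mentioned above can be illustrated with a small simulation. The following Python sketch does not use Storm's API; the function names are hypothetical, and a plain modulo hash stands in for consistent hashing. It only shows how each strategy could select the replica or replicas that receive a tuple.

```python
import hashlib
import random


def broadcast(tuple_, replicas):
    """Send the tuple to every replica of the target bolt."""
    return list(replicas)


def hashed(tuple_, replicas, key):
    """Send tuples with the same key value to the same replica.

    A modulo hash is used here for brevity; it is a stand-in for the
    consistent hashing mentioned in the text.
    """
    digest = hashlib.sha256(str(tuple_[key]).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return [replicas[index]]


def local_first(tuple_, replicas, local):
    """Prefer the replica on the emitting machine, if there is one."""
    if local in replicas:
        return [local]
    return [random.choice(replicas)]


replicas = ["replica-0", "replica-1", "replica-2"]
element = {"url": "http://www.example.com/repo/0/eClassifiers/0"}

everyone = broadcast(element, replicas)
same_a = hashed(element, replicas, "url")
same_b = hashed(element, replicas, "url")
nearby = local_first(element, replicas, local="replica-1")
```

Hashing on a stable key such as the element URL is what allows state about one element to live on a single replica, which matters for the trace structure described in Section 3.1.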

The execution of a Storm topology requires a Storm cluster, which is responsible for starting
the spouts and bolts, and the streams between them. There are two types of Storm clusters:

• local: a local cluster runs on a single machine, in which the Java main method is invoked
as if it were a normal Java program. This is a useful option when developing and testing
topologies.

• distributed: a cluster can be distributed between different physical or virtual machines,
so that the workload is shared between them. This kind of cluster makes use of additional
services to keep the nodes synchronized and to keep track of living nodes, such as Nimbus
(Keahey and Freeman, 2016) and ZooKeeper (Apache, 2016d). This is the preferred option for
executing Storm topologies in production mode.

2.2 Cloud Ecore

Cloud Ecore (Perera Aracil and Sevilla Ruiz, 2016) is a distributed Ecore implementation in
JSON (JSON, 2016). The authors developed both a specification of Ecore in JSON and an
implementation which serves Ecore models and metamodels through a REST service. The main
characteristic of Cloud Ecore is that it uses URLs as the ids of the elements that make up a
model (i.e., any model element is identified by a valid URL).

We are using an optimized version of this implementation, in which the values of EAttributes
are part of the JSON which is returned to the client when it asks for a particular URL. Code
Example 1 shows a comparison to clarify this modification.

// EClass (Original)
// URL: http://www.example.com/repo/0/eClassifiers/0
{
  "name": "http://www.example.com/repo/0/eClassifiers/0/name",
  "eClass": "http://www.example.com/repo/0/eClassifiers/0/eClass",
  ...
}

// EClass (Optimized)
// URL: http://www.example.com/repo/0/eClassifiers/0
{
  "name": "EClass",
  "eClass": "http://www.example.com/repo/0/eClassifiers/0/eClass",
  ...
}

Code Example 1: Partial EClass as JSON in both the original and the optimized version.

This modification reduces the number of HTTP requests sent to the server, which makes the
implementation faster and improves both client and server performance.
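The saving can be made concrete by counting requests against an in-memory stand-in for the server. The Python sketch below is only an illustration: the dictionaries mimic the two JSON layouts of Code Example 1, each dictionary lookup stands for one HTTP GET, and the assumption that the original `.../name` URL returns the bare attribute value is ours.

```python
ELEMENT = "http://www.example.com/repo/0/eClassifiers/0"

# Original layout: the attribute is a URL that must be fetched separately.
ORIGINAL = {
    ELEMENT: {"name": ELEMENT + "/name"},
    ELEMENT + "/name": "EClass",  # assumed response shape
}

# Optimized layout: the attribute value is embedded in the element itself.
OPTIMIZED = {
    ELEMENT: {"name": "EClass"},
}


def read_attributes(repo, url, names):
    """Return the attribute values and the number of simulated GET requests."""
    requests = 1  # fetching the element itself
    element = repo[url]
    values = {}
    for name in names:
        value = element[name]
        if value in repo:  # the value is a URL: one more request is needed
            value = repo[value]
            requests += 1
        values[name] = value
    return values, requests


original_result = read_attributes(ORIGINAL, ELEMENT, ["name"])
optimized_result = read_attributes(OPTIMIZED, ELEMENT, ["name"])
```

In this model, reading k attributes of one element costs k + 1 requests in the original layout and a single request in the optimized one.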

Another optimization we have introduced is the pagination of lists of elements. Lists are
now paginated by default, each page containing a fixed number of elements. In this way, lists
can be processed in parallel page by page, instead of having one vast list that must be
partitioned manually.
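As a rough illustration of why pages are a convenient unit of parallelism, the following Python sketch splits a list into fixed-size pages and processes the pages concurrently. The page size, the function names and the per-page work are hypothetical; Cloud Ecore's actual page format is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 3  # hypothetical; the real server uses its own fixed size


def paginate(items, page_size=PAGE_SIZE):
    """Split a list into consecutive pages of at most page_size items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def transform_page(page):
    """Placeholder for the work done on one page (here: doubling values)."""
    return [value * 2 for value in page]


def transform_all(items):
    """Process every page in its own task and keep the original order."""
    pages = paginate(items)
    with ThreadPoolExecutor() as executor:
        results = list(executor.map(transform_page, pages))
    return [value for page in results for value in page]


pages = paginate(list(range(7)))
doubled = transform_all(list(range(7)))
```

Because every page is independent, no manual partitioning step is needed before distributing the work.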

3 CloudTL

CloudTL aims to be a model-to-model transformation

language ready to handle VLMs in an efﬁcient way,

inspired by the syntax of ATL (Jouault and Kurtev,


2006), and using big data tools and distributed com-

putation. Code Example 2 shows a simple transfor-

mation written in CloudTL, which implements a copy

transformation (i.e., it generates as output the same

model as it is given as input) for the metamodel shown

in Figure 2; it consists of a Super metaclass contain-

ing a multi-valued Sub EReference.

Figure 2: Simple metamodel (package Simple: the Super metaclass has a multi-valued reference
subs, 0..*, to Sub; Sub has an attribute count: Int).

CloudTL contains three main sections which de-

ﬁne a transformation:

• Server section: The server section of a CloudTL transformation defines the servers which
contain the Cloud Ecore models and metamodels. These servers are defined by their IP address
and port, as well as a unique ID by which they are identified throughout the other sections
wherever needed. In the example, we have defined a server named local which is at
http://<IP>:<Port>.

• Namespace section: The namespace section de-

ﬁnes the namespaces, both input and output, of

the transformation. These are deﬁned by a unique

ID, a reference to a server deﬁned in the server

section, as well as a path (URL) to the Cloud

Ecore metamodel. In the example, we deﬁne the

A namespace as the input, which is contained in

the local server at the repo/1 path (i.e., the meta-

model will be at http://<IP>:<Port>/repo/1).

Namespace B is the output namespace of the trans-

formation and, since this is a copy transformation,

points to the same path as the input.

• Rule section: The rule section contains the differ-

ent rules that compose a transformation, indicat-

ing both their input and output metaclass. There

is a unique special rule which must be identiﬁed

by the init keyword, marking it as the ﬁrst rule

to be executed whenever the transformation starts.

Rules have a collection of operations that set the EStructuralFeatures of the target element
from the source element. CloudTL currently supports three operations:

Initializers of Primitive Types. They set a target EStructuralFeature to a given value.

tgt.count = 5;

This operation would set the count feature of tgt to

5 for every execution of the rule.

Copying of Values. If the source and target EStructuralFeatures are EAttributes, CloudTL
will just copy the value from the source to the target without invoking another rule. An
example would be

tgt.name = src.name;

This operation would copy the name feature from the

src instance to the tgt name feature.

Transformation Invocations. If the source and target EStructuralFeatures are EReferences,
CloudTL will find a suitable rule (i.e., a rule which transforms an instance of the type of
the source feature into the type of the target one) and invoke it. The operation

tgt.subs = src.subs;

would imply that CloudTL must ﬁnd a rule which

transforms a Sub to Sub, and invoke it, transforming

all the instances of the source subs collection.
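The semantics of the three operations can be sketched with a tiny in-memory interpreter. This is not how CloudTL executes (it compiles to Storm bolts, as described in Section 3.1); the Python below is only an assumption-laden model in which elements are dictionaries, rules are looked up by their (source, target) metaclasses, and only multi-valued references are handled. The tuple encoding of operations is hypothetical.

```python
# Rules indexed by (source metaclass, target metaclass).
# Each operation is a tuple whose first item names its kind.
COPY_RULES = {
    ("Super", "Super"): [("invoke", "subs", "subs", "Sub", "Sub")],
    ("Sub", "Sub"): [("copy", "count", "count")],
}


def transform(rules, source, source_class, target_class):
    """Apply the rule registered for the given metaclass pair to one element."""
    target = {}
    for operation in rules[(source_class, target_class)]:
        kind = operation[0]
        if kind == "init":      # tgt.feature = literal;
            _, feature, literal = operation
            target[feature] = literal
        elif kind == "copy":    # tgt.feature = src.feature; (EAttribute)
            _, feature, source_feature = operation
            target[feature] = source[source_feature]
        elif kind == "invoke":  # tgt.feature = src.feature; (EReference)
            _, feature, source_feature, child_source, child_target = operation
            target[feature] = [
                transform(rules, child, child_source, child_target)
                for child in source[source_feature]
            ]
        else:
            raise ValueError("unknown operation kind: " + kind)
    return target


model = {"subs": [{"count": 1}, {"count": 2}]}
copied = transform(COPY_RULES, model, "Super", "Super")

INIT_RULES = {("Sub", "Sub"): [("init", "count", 5)]}
initialized = transform(INIT_RULES, {"count": 1}, "Sub", "Sub")
```

With the copy rules, the output equals the input model; with the initializer rule, every Sub receives the fixed value regardless of its source.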

transformation SimpleCopy {
  servers {
    local@<IP>:<Port>
  }
  namespaces {
    in A as local@repo/1
    out B as local@repo/1
  }
  init super2super {
    from src is A::Super
    to tgt is B::Super {
      tgt.subs = src.subs;
    }
  }
  sub2sub {
    from src is A::Sub
    to tgt is B::Sub {
      tgt.count = src.count;
    }
  }
}

Code Example 2: CloudTL SimpleCopy transformation.

CloudTL is built with the Xtext framework (Foundation, 2016), and thus it compiles to
standard Java code, which can be invoked from, and can make use of, any available Java
library. Xtext also generates an Eclipse editor with powerful aids for developing a
transformation, such as auto-completion, which works by querying the Cloud Ecore metamodel.

3.1 Compilation to Java

We have used Apache Storm as the backend for the engine of our language, so the Xtext
compiler creates a customized Spout and several Bolts, as well as some helper classes which
aid in the transformation process. The generator tries to extract as much information as
possible from the Cloud Ecore input and output metamodels and to make it static (i.e., not
needing to query it again when running the transformation), so that there is as little
network traffic as possible when the transformation executes. Cloud Ecore (Perera Aracil and
Sevilla Ruiz, 2016) demonstrated that network traffic is one of the most important factors
degrading the performance of its distributed mechanism, and thus we try to minimize it as
much as possible.

The Storm topology is created by analyzing the transformation rules and the data
dependencies between them. For example, if rule A sets a target EReference from a source
one, this is done by invoking a rule B. In our copy transformation, this can be seen in the
super2super rule: transforming the EReference subs for the target metamodel is done by
invoking a rule which can transform a source Sub element to a target one (i.e., invoking the
sub2sub rule over the subs collection of the source Super element).

Each rule of the transformation generates a bolt, which is responsible for generating its
target model element from the source element. The bolt contacts the Cloud Ecore server and
issues an HTTP GET on the URL representing the source element, which returns a JSON Object
containing all its EStructuralFeatures. Rules are composed of transformation operations,
which are compiled depending on the source and target EStructuralFeatures being transformed,
in most cases as emit statements to other bolts:

• Mono-valued EAttributes: will consist of

adding the value to the target JSON Object as a

new JSON pair.

• Multi-valued EAttributes: will consist of an

emit statement to a helper bolt which will iterate

over the collection and generate the target collec-

tion.

• Mono-valued EReferences: will consist of an

emit statement to the bolt which is responsible

for the transformation of the EClass of the source

EReference to the target one.

• Multi-valued EReferences: will consist of an emit statement to a helper bolt which will
iterate over the collection and emit each element to the corresponding transformation bolt,
as if it were a mono-valued EReference, and will generate the result as a target collection.

The Storm topology created by the transformation

SimpleCopy can be seen in Figure 3. As mentioned

before, SimpleCopySpout is created, responsible for

invoking the ﬁrst rule of the transformation with both

the input model URL and the output model URL.

Data dependencies are extracted and analyzed from the transformation text, so that a valid
topology can be created. If the language detects that it cannot resolve a data dependency
(e.g., we have forgotten to include the sub2sub rule or we have used incorrect input and
output metaclasses), it will generate an editor error (i.e., marking the data dependency in
red) so that it can be fixed.
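A minimal sketch of this dependency analysis, under our own assumptions about the data involved: each rule is described by its input and output metaclasses, each reference assignment by the metaclasses of the referenced elements, and an unresolved dependency yields an error message instead of an editor marker. The function and variable names are hypothetical and this is not the generator's actual code.

```python
def build_edges(rules, references):
    """Derive rule-to-rule dependencies and report the unresolved ones.

    rules: rule name -> (source metaclass, target metaclass)
    references: list of (rule name, feature, source class, target class)
    """
    by_signature = {signature: name for name, signature in rules.items()}
    edges, errors = [], []
    for rule, feature, source_class, target_class in references:
        target_rule = by_signature.get((source_class, target_class))
        if target_rule is None:
            errors.append(
                f"{rule}.{feature}: no rule transforms "
                f"{source_class} to {target_class}"
            )
        else:
            edges.append((rule, target_rule))
    return edges, errors


rules = {"super2super": ("Super", "Super"), "sub2sub": ("Sub", "Sub")}
references = [("super2super", "subs", "Sub", "Sub")]

edges, errors = build_edges(rules, references)

# Forgetting the sub2sub rule leaves the dependency unresolved.
incomplete = {"super2super": ("Super", "Super")}
missing_edges, missing_errors = build_edges(incomplete, references)
```

Each resulting edge corresponds to one stream between two bolts in the generated topology.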

These data dependencies are EReferences to other model elements, so transforming them is
done by invoking another transformation rule. This rule is found by matching the EClasses of
the source and target elements to the input and output metaclasses of a rule. Each
dependency results in two different pieces of generated Java code:

• A stream from a bolt to another (representing the

dependency of a rule to another). This is gener-

ated in the Launcher class, where the Storm topol-

ogy is conﬁgured.

• An emit statement in the execute method of a bolt to send information to another bolt
(representing the execution of a particular model element).

If the EReference is multi-valued, the generator takes into account that it is paginated,
so two additional helper bolts are generated to deal with the pagination and with the
invocation of each element of each page. In our example, since the subs EReference is
multi-valued, the generator creates two new bolts: Super2superSubsPages, which is in charge
of iterating over the pages, and Super2superSubsElements, which emits each Sub element to
the corresponding transformation bolt (i.e., Sub2sub in our transformation). On the other
hand, if the EReference is mono-valued, no helper bolts are generated, and the emit
statement directly invokes another bolt (i.e., transformation rule).
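The division of labour between the two helper bolts can be sketched as two small functions, one that walks the pages and one that fans the elements of a page out to the transformation bolt. This Python sketch is not the generated Java code; the page layout (an "elements" list plus a "next" link) and the URLs are assumptions made for illustration.

```python
def pages_bolt(first_page_url, fetch):
    """Follow the pagination and yield every page (role of ...SubsPages)."""
    url = first_page_url
    while url is not None:
        page = fetch(url)
        yield page
        url = page.get("next")


def elements_bolt(page, emit, target_bolt):
    """Emit each element of one page to the transformation bolt
    (role of ...SubsElements)."""
    for element_url in page["elements"]:
        emit(target_bolt, element_url)


# Hypothetical paginated collection served by an in-memory "server".
REPO = {
    "repo/1/subs?page=0": {
        "elements": ["repo/1/subs/0", "repo/1/subs/1"],
        "next": "repo/1/subs?page=1",
    },
    "repo/1/subs?page=1": {"elements": ["repo/1/subs/2"], "next": None},
}

emitted = []
for page in pages_bolt("repo/1/subs?page=0", REPO.__getitem__):
    elements_bolt(page, lambda bolt, url: emitted.append((bolt, url)), "Sub2sub")
```

Splitting the two roles means that pages can be fetched while the elements of earlier pages are already being transformed.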

A Launcher class is generated in which the bolts

and spout are conﬁgured to build the Storm topology.

This class contains the main method. A Spout will

be generated which is in charge of invoking the initial

rule of a transformation, sending it the root input ele-

ment and deﬁning where the root output element must

be created.

Other helper bolts are also generated to aid in the transformation process, such as the
TraceBolt, which is responsible for tracking target elements transformed from source
elements (i.e., similar to a TraceLink in the ATL engine). The PosterBolt is in charge of
making the HTTP POST calls that create the target model at a specified URL, sending the
transformed JSON element. Each of the bolts generated from transformation rules has a stream
connecting it to both the TraceBolt and the PosterBolt, so that it can query elements
already created or post the result of its execution (i.e., the target element created by
executing it).

Figure 3: SimpleCopy Storm Topology (nodes: SimpleCopySpout, Super2super,
Super2superSubsPages, Super2superSubsElements, Sub2sub, TraceBolt, PosterBolt).

The TraceBolt is special with respect to its generation: elements are sent to its replicas
by hashing the source element, so this bolt acts as a distributed data structure. If it is
queried for an element which has not yet been transformed, it behaves in a way inspired by
Koan (Sánchez Cuadrado and Perera Aracil, 2014) and its use of continuations: the TraceBolt
keeps track of “pending” queries and serves them as soon as the requested information
becomes available.
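The pending-query behaviour can be sketched as a small class that stores callbacks for sources that have not been traced yet. This is a single-process Python illustration under our own naming (Trace, record, query); the real TraceBolt is a replicated Storm bolt that receives and emits tuples.

```python
class Trace:
    """Maps source element URLs to target URLs, deferring early queries."""

    def __init__(self):
        self._targets = {}   # source URL -> target URL
        self._pending = {}   # source URL -> callbacks waiting for it

    def record(self, source_url, target_url):
        """Register a finished transformation and serve waiting queries."""
        self._targets[source_url] = target_url
        for callback in self._pending.pop(source_url, []):
            callback(target_url)

    def query(self, source_url, callback):
        """Answer immediately if known; otherwise keep the query pending."""
        if source_url in self._targets:
            callback(self._targets[source_url])
        else:
            self._pending.setdefault(source_url, []).append(callback)


trace = Trace()
answers = []

trace.query("in/sub/0", answers.append)   # asked too early: stays pending
early = list(answers)

trace.record("in/sub/0", "out/sub/0")     # the pending query is served now
trace.query("in/sub/0", answers.append)   # later queries answer at once
```

The callback plays the role of the continuation: the querying rule does not block, it is simply resumed once the target element exists.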

Finally, the generator will create an SBT (Lightbend, 2016) build file so that the
transformation can be compiled, imported into Eclipse or packaged into a single jar file to
be uploaded and executed in a Storm cluster.

3.2 Execution

A CloudTL transformation is executed by compiling and packaging the generated Java code
into a jar file and uploading it to a Storm cluster. Although executing it as a regular Java
program is also possible, we recommend using the distributed cluster, so that parallelism
and replication can be achieved.

The Storm cluster will then deploy this as a regular

Storm topology, distributing the transformation rules

(bolts) throughout the cluster. Then, when all the

bolts are ready, Storm will execute the spout, which

will trigger the start of the transformation process.

This is simply emitting a tuple with the source URL

from the root element of the input model and the tar-

get URL where the output element should be stored.

Each rule produces a JSON Object following our optimized Cloud Ecore specification, to be
uploaded to its target URL. The rule emits a message to the TraceBolt indicating that a
source element has been transformed into a target element by this rule, making this
information available to be queried later by other rules. Then, each element is filled with
the information needed for it to be a valid Ecore JSON representation (i.e., its EClass
reference, which is extracted from the transformation definition, and all the other
EReferences defined for EObject). Finally, the resulting JSON Object is sent to a PosterBolt
to be posted to the specified target URL.

The creation of the JSON Object depends on the type of EStructuralFeature of the target and
source element. If it is a mono-valued EAttribute, the source attribute value is copied
directly into the resulting JSON Object; the same happens with the value of an
initialization. A mono-valued EReference is created as a pointer to the URL of the target
element produced by the rule that transforms that feature. Code Example 3 illustrates, with
our running example, how our language transforms some EStructuralFeatures.

The TraceBolt is needed whenever a rule has to transform a mono-valued EReference whose
containment feature is false. In this case CloudTL must not execute any rule to transform
the referenced element, since it will be transformed through another rule. Thus, the
TraceBolt must be queried in order to obtain the URL of the element into which it has been
transformed. If the rule in charge of transforming it has not been executed yet, the query
is marked as pending in the bolt, so that it can be answered once the element enters the
trace.


init super2super {
  from src is IN::Super
  to tgt is OUT::Super {
    // This is a multi-valued EReference.
    // Thus it will be transformed by
    // invoking a rule which can transform
    // a Sub source instance into a Sub
    // target instance (which is what
    // the sub2sub rule does)
    tgt.subs = src.subs;
  }
}
sub2sub {
  from src is IN::Sub
  to tgt is OUT::Sub {
    // This is a mono-valued EAttribute.
    // Thus, it will be transformed
    // copying the value of src.count
    // into the resulting JSON Object
    // in a new JSON pair.
    tgt.count = src.count;
  }
}

Code Example 3: CloudTL Example.

3.3 Benchmarks

A series of benchmarks has been executed in order to compare the performance of our
implementation against the ATL EMFTVM engine (Wagelaar et al., 2011). We have implemented in
ATL the same transformation shown in Section 3, which can be seen in Code Example 4. As can
be seen, both transformations are essentially identical, apart from some syntax differences.
We have conducted the ATL benchmarks by generating a Java launcher using the “ATL Plugin”
wizard provided with the language in Eclipse and following the indications in (IBM, 2008a)
and (IBM, 2008b). The CloudTL benchmarks have been conducted by manually modifying the
generated Spout to report the seconds elapsed since the topology emitted the first tuple.
Since there is no Storm benchmark solution (Apache, 2015), we have executed the SimpleCopy
CloudTL transformation once, with one task and a parallelism hint of 1 as the configuration
for the Storm topology. We have used 3 virtual machines to create a Storm cluster:

• VM 1: responsible for the MongoDB (Mon-

goDB, 2016) database, Cloud Ecore server and

ZooKeeper, Nimbus and Storm UI servers.

• VM 2 and 3: Storm supervisors (working nodes).

-- @path MM=/SimpleCopyATL/models/Simple.ecore
-- @path MM1=/SimpleCopyATL/models/Simple.ecore
module SimpleCopy;
create OUT : MM1 from IN : MM;

rule super2super {
  from src : MM!Super
  to tgt : MM1!Super (
    subs <- src.subs
  )
}

rule sub2sub {
  from src : MM!Sub
  to tgt : MM1!Sub (
    count <- src.count
  )
}

Code Example 4: ATL SimpleCopy transformation.

Our client PC setup is the following:

• Intel i7 3770K 3.90GHz

• 16 GB RAM

• 10 Mbps downstream internet connection

• 600 Kbps upstream internet connection

• Windows 10 Pro 64 bit

• Eclipse 4.5.2 Mars

• JDK 1.8.0_111

This PC has been used for the execution of the

ATL benchmark as well as the host for the VMs used

for CloudTL.

Our VMs setup is the following:

• Single core virtualized Intel i7 3770K 3.50GHz

• 2 GB RAM

• 10 Mbps downstream internet connection

• 600 Kbps upstream internet connection

• Ubuntu Linux 16.04 LTS (Xenial Xerus)

• Kernel 4.4

• JDK 1.8.0_91

We use the metamodel shown in Figure 2 as both the input and the output metamodel in this
benchmark. We have created 9 models of different sizes, from 10000 to 90000 elements, in
steps of 10000.


The results can be seen in Table 1 and graphically

in Figure 5 for ATL execution times and Figure 4 for

CloudTL execution times.

Figure 4: CloudTL Benchmarks (x-axis: Elements; y-axis: Time (secs); series: CloudTL mean
time).

The regression equation for the CloudTL benchmark is
$f_{cloudtl}(x) = 4.076 \times 10^{-3}\,x + 16.34$, with $R^2 = 0.9983$. As shown, the
algorithm complexity for CloudTL is linear ($O(n)$).
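The linear fit can be checked with an ordinary least-squares computation over the CloudTL column of Table 1. The Python sketch below is our own verification aid, not the tool used by the authors; the function name is hypothetical and the data are copied from Table 1.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Model sizes and CloudTL execution times (seconds) from Table 1.
elements = list(range(10000, 100000, 10000))
cloudtl_seconds = [
    51.313, 102.795, 136.216, 188.648, 224.044,
    253.035, 290.471, 352.694, 382.055,
]

slope, intercept = linear_fit(elements, cloudtl_seconds)
```

To the printed precision, the fitted slope and intercept agree with the coefficients of the regression equation above (about 4.076e-03 seconds per element and a fixed cost of about 16.34 seconds).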

Figure 5: ATL Benchmarks (x-axis: Elements; y-axis: Time (secs); series: ATL mean time).

The regression equation for the ATL benchmark is
$f_{atl}(x) = 2.745 \times 10^{-9}\,x^2 - 3.738 \times 10^{-5}\,x + 0.5302$, with
$R^2 = 0.9954$. As shown, the algorithm complexity is quadratic ($O(n^2)$), as is expected
due to the two-pass implementation of ATL (see (Jouault and Kurtev, 2006), Section 3.5).

If we equate both regression functions, we can estimate that CloudTL is slower than ATL
until the model size reaches around 706042.5 elements. Thus, we have created a second set of
benchmarks in which the models have 10 times more elements, so that we can demonstrate that
CloudTL is faster once the input model exceeds a certain number of elements. The 9 new
models used in this second benchmark range from 100000 to 900000 elements, in steps of
100000. The results of this second set of tests can be seen graphically in Figure 6 for ATL
and Figure 7 for CloudTL, including the first set of benchmarks, demonstrating that all
tests follow the same algorithm complexity.

Table 1: ATL and CloudTL execution times (in seconds).

Model Elements    ATL       CloudTL
10000             0.272     51.313
20000             0.877     102.795
30000             2.028     136.216
40000             3.672     188.648
50000             5.779     224.044
60000             7.579     253.035
70000             11.42     290.471
80000             14.906    352.694
90000             19.641    382.055

Figure 6: ATL Benchmarks (set 2) (x-axis: Elements; y-axis: Time (secs); series: ATL mean
time).

The regression formula for ATL, taking into account the new benchmarks, is
$f_{atl}(x) = 7.687 \times 10^{-9}\,x^2 - 9.591 \times 10^{-4}\,x + 32.28$, with
$R^2 = 0.9973$.

The regression formula for CloudTL, taking into account the new benchmarks, is
$f_{cloudtl}(x) = 3.220 \times 10^{-3}\,x + 46.8$, with $R^2 = 0.9959$. Equating these new
functions, we can deduce that CloudTL is faster than ATL when the input model has more than
547110.6 elements. The data collected in this benchmark can be seen in Table 2.
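The crossover point follows from setting the quadratic ATL model equal to the linear CloudTL model and solving the resulting quadratic equation. The Python sketch below reproduces that calculation with the second-set coefficients given above; the function name is hypothetical.

```python
import math


def crossover(quadratic, linear):
    """Largest x at which a*x^2 + b*x + c equals m*x + k."""
    a, b, c = quadratic
    m, k = linear
    # Move everything to one side: a*x^2 + (b - m)*x + (c - k) = 0.
    b_diff = b - m
    c_diff = c - k
    discriminant = b_diff * b_diff - 4 * a * c_diff
    if discriminant < 0:
        raise ValueError("the two curves do not intersect")
    return (-b_diff + math.sqrt(discriminant)) / (2 * a)


# Regression coefficients for the second benchmark set.
atl = (7.687e-09, -9.591e-04, 32.28)   # a, b, c of the quadratic model
cloudtl = (3.220e-03, 46.8)            # slope and intercept of the linear model

break_even = crossover(atl, cloudtl)
```

With these coefficients the positive root lies at roughly 547110 elements, in agreement with the figure stated above; beyond it the quadratic term dominates and the gap between both engines keeps widening.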

We have demonstrated that CloudTL is faster than ATL when dealing with models with a high
number of elements, while it is slower when the number of elements is low, because the work
has to be distributed to the Storm cluster and communicated over the network. Figure 8 shows
all 4 benchmarks in a single graph, demonstrating how ATL outperforms CloudTL for input
models with fewer than 500000 elements and how it quickly degrades for bigger input models,
because its transformation algorithm is quadratic in complexity. It can be clearly seen that
CloudTL will keep achieving better execution times than ATL for even bigger models, since
the difference in execution time between the two algorithms ($O(n^2)$ and $O(n)$) keeps
growing.

Figure 7: CloudTL Benchmarks (set 2) (x-axis: Elements; y-axis: Time (secs); series: CloudTL
mean time).

Figure 8: ATL vs CloudTL Benchmarks (x-axis: Elements; y-axis: Time (secs)).

Table 2: ATL and CloudTL execution times (in seconds).

Model Elements    ATL         CloudTL
100000            28.478      407.981
200000            176.084     651.932
300000            446.32      1047.269
400000            908.668     1216.150
500000            1396.361    1600.266
600000            2056.816    1821.210
700000            3213.287    2344.460
800000            4423.49     2694.124
900000            5243.931    3019.100

4 RELATED WORK

The Mondo Project (Kolovos et al., 2015) aims to tackle the increasingly important
challenge of scalability in MDE in a comprehensive manner. It has supported different
projects and investigations which bring us closer to being able to handle VLMs in an
efficient way.

Parallel ATL (Tisi et al., 2013) presents an ATL engine implementation and compiler which
parallelizes transformations using new opcodes for the ATL virtual machine. The authors
demonstrated a speedup of up to 2.5 in execution time with respect to the standard ATL
engine using a CPU with 4 physical cores.

ATL-MR (Benelallam et al., 2015a) (Benelallam et al., 2015b), supported by the Mondo
Project, is an implementation of ATL using MapReduce and Hadoop on VMs. In some experiments,
using up to 8 different VMs, it has been shown to be up to 6 times faster than the default
EMFTVM implementation. It is not mentioned whether a better algorithm complexity has been
achieved, but we suppose that the base ATL algorithm is used, and thus that it is still
$O(n^2)$.

Koan (Sánchez Cuadrado and Perera Aracil, 2014) is a transformation language that uses
continuations in order to simulate the parallel execution of rules, and analyses data
dependencies between rules to automatically schedule their execution so that the
dependencies can be resolved in an optimal way. If it detects a cycle, continuations are
used in order to keep running the transformation and try to create the needed element.

A roadmap (Clasen et al., 2012) has been pro-

posed for the transformation of VLMs, in which they

discuss the importance of distributing models and

strategies for partitioning them.

CPU+GPU heterogeneous architectures (Fekete

and Mezei, 2016) are being studied in order to build a

transformation tool using OpenCL (Group, 2016).

5 CONCLUSION AND FUTURE WORK

In this paper we have presented a new model-to-model transformation language that uses Big
Data technologies as its core building element. Our language demonstrates that these new
technologies can be used in MDE in order to achieve faster execution times of
transformations when the input models are VLMs. We have shown that our transformation
language not only improves execution time when handling large models, but also improves the
algorithm complexity of the transformation engine.

Algorithm complexity is an important parameter to take into account when dealing with large
inputs of data, such as when transforming VLMs. A poor complexity can make it difficult or
even impossible to successfully execute a transformation or any other handling of the model.
Thus, improving it should be a priority for future transformation languages and
implementations, so that MDE can keep being used in the future. We have empirically
demonstrated that our implementation improves on the de facto standard whenever a model
above a certain size is used as input for a transformation.

CloudTL has demonstrated that, whenever a new transformation language or tool is created,
attention must be paid to the algorithm complexity of the engine that implements it, since
it becomes an important issue when dealing with VLMs.

As for future work, we believe that there is still much work to do, and that the use of big
data tools opens up a whole new way of studying and creating model transformations.

We plan to study the use of other big data tools such as Apache Spark (Apache, 2016b) or
Apache Flink (Apache, 2016a) as the back-end of our transformation language, as they are
tools for, or have subprojects for, big data streaming analysis which could be adapted to
our language.

New types of rules could make the development of transformations in CloudTL easier, since
they would allow for more ways of interacting with and scheduling the transformation. We
plan to add a type of rule based on the lazy rules of ATL, which are rules that get executed
only when explicitly invoked.

We are also studying the implementation of a CloudTL cluster using Amazon EC2. This way,
the number of virtual machines needed for a given transformation could be adjusted
elastically and efficiently, and the network overhead incurred when two bolts are in
different virtual machines could be reduced.

We plan to enrich the library with useful operations for the base types (String, Int, ...), so that statements in our transformation language can be more expressive thanks to better type support (e.g., concatenation of strings, addition of integers, ...).

We would like to expand the transformation language and add filters for the input elements, so that a rule is executed if and only if the input element passes the filter. We are also considering a mechanism to auto-detect the init rule based on the root EClasses of the input and output metamodels.
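
The intended semantics of the input filters mentioned above can be illustrated with a short sketch. It is written in Python rather than in CloudTL, and the `Rule` class, the `guard` field, and the dictionary-based elements are hypothetical names chosen for this example, not part of the language.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional

Element = Dict[str, Any]  # hypothetical stand-in for a model element


@dataclass
class Rule:
    """Hypothetical rule: an input type, a body, and an optional filter."""
    name: str
    input_type: str
    body: Callable[[Element], Element]
    guard: Optional[Callable[[Element], bool]] = None  # None means "no filter"


def apply_rules(rules: List[Rule], elements: Iterable[Element]) -> List[Element]:
    """Execute a rule on an element if and only if the type matches
    and the element passes the rule's filter (when one is defined)."""
    outputs = []
    for element in elements:
        for rule in rules:
            if element["type"] != rule.input_type:
                continue
            if rule.guard is not None and not rule.guard(element):
                continue
            outputs.append(rule.body(element))
    return outputs


# Only public classes are transformed into tables.
class_to_table = Rule(
    name="Class2Table",
    input_type="Class",
    body=lambda c: {"type": "Table", "name": c["name"]},
    guard=lambda c: c.get("visibility") == "public",
)

model = [
    {"type": "Class", "name": "Person", "visibility": "public"},
    {"type": "Class", "name": "Cache", "visibility": "private"},
    {"type": "Attribute", "name": "age"},
]

tables = apply_rules([class_to_table], model)
```

Here only `Person` yields a table: `Cache` has the right type but fails the filter, and the attribute does not match the rule's input type.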

Having our language accept as input not only Cloud Ecore models but also other types of structured data is interesting, since it would bring into MDE and modeling those databases and projects that have not considered it so far. This would require CloudTL to infer the structure (i.e., build an internal metamodel from the structured data) using tools such as JSONDiscoverer (Cánovas Izquierdo and Cabot, 2016) as a step prior to the generation of Storm and Java code.
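
To give an idea of what inferring a structure from structured data involves, the following sketch derives a very simple metamodel-like description from a JSON document. It is a deliberately naive illustration in Python and does not reproduce the algorithm of JSONDiscoverer; the function names, the naming of inferred classes after their keys, and the use of only the first element of a list are all simplifying assumptions.

```python
import json
from typing import Any, Dict

Metamodel = Dict[str, Dict[str, str]]  # class name -> {attribute name: type name}


def infer_type(key: str, value: Any, metamodel: Metamodel) -> str:
    """Return a type name for a JSON value, registering nested classes."""
    if isinstance(value, dict):
        class_name = key[:1].upper() + key[1:]  # assumption: class named after key
        infer_class(class_name, value, metamodel)
        return class_name
    if isinstance(value, list):
        if not value:
            return "list[unknown]"
        # Simplification: only the first element is inspected.
        return "list[" + infer_type(key, value[0], metamodel) + "]"
    return type(value).__name__


def infer_class(name: str, obj: Dict[str, Any], metamodel: Metamodel) -> Metamodel:
    """Record one inferred class per JSON object."""
    attributes = metamodel.setdefault(name, {})
    for key, value in obj.items():
        attributes[key] = infer_type(key, value, metamodel)
    return metamodel


document = json.loads(
    '{"name": "Ada", "age": 36, "address": {"city": "London"}, "tags": ["a", "b"]}'
)
metamodel = infer_class("Person", document, {})
```

For this document the sketch records two classes, `Person` and `Address`, with `Person.address` typed as a reference to `Address`. A real tool would also have to merge several documents, handle heterogeneous lists, and reconcile conflicting types.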

REFERENCES

Apache (2015). Jira for storm. https://issues.apache.org/

jira/browse/STORM-642.

Apache (2016a). Flink. http://flink.apache.org/.

Apache (2016b). Spark. http://spark.apache.org/.

Apache (2016c). Storm. http://storm.apache.org/.

Apache (2016d). Zookeeper. https://zookeeper.apache.org/.

Benelallam, A., Gómez, A., and Tisi, M. (2015a). ATL-MR: model transformation on MapReduce. In Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems - SEPS 2015. Association for Computing Machinery (ACM).

Benelallam, A., Gómez, A., Tisi, M., and Cabot, J. (2015b). Distributed Model-to-Model Transformation with ATL on MapReduce. In Proceedings of 2015 ACM SIGPLAN International Conference on Software Language Engineering (SLE 2015), Pittsburgh, United States.

Cánovas Izquierdo, J. L. and Cabot, J. (2016). JSONDiscoverer: Visualizing the schema lurking behind JSON documents. Knowledge-Based Systems, 103:52–55.

Clasen, C., Didonet Del Fabro, M., and Tisi, M. (2012).

Transforming Very Large Models in the Cloud: a Re-

search Roadmap. In First International Workshop

on Model-Driven Engineering on and for the Cloud,

Copenhagen, Denmark. Springer.

Fekete, T. and Mezei, G. (2016). Towards a model trans-

formation tool on the top of the OpenCL framework.

In Proceedings of the 4th International Conference

on Model-Driven Engineering and Software Develop-

ment, pages 355–360. Scitepress.

Foundation, E. (2016). Xtext. http://www.eclipse.org/

Xtext/.

Group, K. (2016). Opencl. https://www.khronos.org/

opencl/.

IBM (2008a). Robust java benchmarking, part 1: Issues.

http://www.ibm.com/developerworks/java/library/

j-benchmark1/index.html.

IBM (2008b). Robust java benchmarking, part 2: Statistics

and solutions. https://www.ibm.com/developerworks/

java/library/j-benchmark2/.

Jouault, F. and Kurtev, I. (2006). Transforming models with

ATL. In Satellite Events at the MoDELS 2005 Con-

ference, pages 128–138. Springer Science + Business

Media.

JSON (2016). Json. http://json.org/.

Keahey, K. and Freeman, T. (2016). Nimbus. http://

www.nimbusproject.org/.

Kolovos, D. S., Rose, L. M., Paige, R. F., Guerra, E., Cuadrado, J. S., de Lara, J., Ráth, I., Varró, D., Sunyé, G., and Tisi, M. (2015). MONDO: scalable modelling and model management on the cloud. In Proceedings of the Projects Showcase, part of the Software Technologies: Applications and Foundations 2015 federation of conferences (STAF 2015), L’Aquila, Italy, July 22, 2015, pages 44–53.

Lightbend (2016). Sbt. http://www.scala-sbt.org/.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R.,

Roxburgh, C., and Byers, A. H. (2011). Big data: The

next frontier for innovation, competition, and produc-

tivity.

MongoDB (2016). Mongodb website. https://

www.mongodb.org/.

Perera Aracil, J. M. and Sevilla Ruiz, D. (2016). To-

wards distributed ecore models. In Proceedings of the

4th International Conference on Model-Driven Engi-

neering and Software Development, pages 209–216.

Scitepress.

Sánchez Cuadrado, J. and Perera Aracil, J. M. (2014). Scheduling model-to-model transformations with continuations. Softw., Pract. Exper., 44(11):1351–1378.

Tisi, M., Martinez, S., and Choura, H. (2013). Parallel Ex-

ecution of ATL Transformation Rules. In MoDELS,

pages 656–672, Miami, United States.

Wagelaar, D., Tisi, M., Cabot, J., and Jouault, F. (2011). To-

wards a general composition semantics for rule-based

model transformation. In Model Driven Engineer-

ing Languages and Systems, pages 623–637. Springer

Science + Business Media.
