Integration of Hawk for Model Metrics in the MEASURE Platform
Orjuwan Al-Wadeai¹, Antonio Garcia-Dominguez¹, Alessandra Bagnato², Antonin Abherve² and Konstantinos Barmpis³
¹SARI, School of Engineering and Applied Science, Aston University, Birmingham, U.K.
²Softeam, Research and Development Department, Paris, France
³Department of Computer Science, University of York, York, U.K.
Keywords: Specialized Information Retrieval, Unified Modeling Language (UML), Model Repositories, Big Data, Scalable Model Querying.
Abstract:
The MEASURE project aims to integrate metrics across all phases of the software development lifecycle into
a single decision support platform. For the earlier phases, metrics can be derived from models. Industrial use
of model-driven engineering produces large model repositories, and high-performance querying is key to keep
their metrics up to date. This paper presents an integration between the MEASURE metrics platform and the
Hawk model indexing tool. Hawk was improved in several ways, such as adding support for the new Modelio
metamodelling framework, or allowing Hawk servers to be provisioned through configuration files rather than
through its web services. MEASURE and Hawk were then combined successfully to extract metrics from
Modelio models of various domains, and Hawk was able to index and efficiently answer queries about the
2GB collection of models used by Softeam to develop Modelio.
1 INTRODUCTION
Growing global competition and system complexity in the software industry mean that companies need to meet ever-increasing demands without compromising on quality and delivery times (Shields, 2014). Many companies have turned to automation and placed a strong emphasis on nimble iteration in their processes to address these challenges. However, this automation and iteration need to be guided by up-to-date and accurate metrics, and traditional approaches are not up to the task.
The MEASURE ITEA3 consortium (Softeam R&D, 2017) aims to close this gap by developing a comprehensive set of tools for automated and continuous measurement over all stages of the software development lifecycle. Its work includes the development of better metrics and of ways to analyse the big data produced by continuous measurements, the validation of those metrics by integrating the metrics and tools into the running processes of various industrial partners, and the creation of decision support tools for project managers through the visualisation of the collected data. As shown in Figure 1, MEASURE revolves around a central data collection and analysis platform (the “MEASURE platform” from now on), a web application that integrates all other efforts.
Unlike other metric platforms, MEASURE aims to collect metrics about more than just code, as it covers the entire software lifecycle and not just those stages related to coding or testing. In particular, one of the aims is to collect metrics about models. One of the MEASURE project partners (Softeam Cadextan) is the lead developer of the commercial open-source Modelio (Softeam Cadextan, 2017) modelling tool. Softeam wanted to integrate metrics about Modelio business, requirements and design models into the MEASURE platform. This would require a technology that provided high-performance querying over potentially very large models, such as those seen in the field by Modelio. For instance, it is common to have millions of elements in models reverse engineered from large code bases (e.g. for software modernisation). Distributed teams producing models concurrently can also create large collections of model elements as time goes on. Rather than developing their own solution from scratch, Softeam decided to reuse Hawk, a scalable model indexing and querying framework that had been successfully integrated with their Constellation product for collaborative modelling (Garcia-Dominguez et al., 2016).
Figure 1: General approach for the MEASURE ITEA3 project.
Hawk could not be used as-is, however. The MEASURE platform imposed new requirements for a seamless integration, and recent releases of Modelio had fundamentally changed the way the structure of the models could be described to Hawk. The overall practicality and scalability of Hawk for this domain had to be re-evaluated as well. This paper presents the work that was conducted to clarify these new requirements and re-engineer Hawk for this new application, and the evaluations that were conducted to validate Hawk as the model metrics component for the MEASURE platform.
The rest of this paper is organized as follows: Section 2 provides background on MEASURE, Modelio and Hawk. Section 3 summarizes the integration of Hawk into MEASURE. Section 4 presents our initial validation of the flexibility of this approach, followed by an evaluation of its scalability. Section 5 discusses related work. Section 6 provides some conclusions and future lines of work.
2 BACKGROUND
This section will introduce some basic concepts required to understand the rest of the work: the MEASURE platform, the Modelio modelling environment, and the Hawk model indexing framework.
2.1 MEASURE Platform
The MEASURE platform is an open-source web application (Abherve et al., 2017) which supports collecting, calculating and visualizing the data produced by executing measures defined according to the Structured Metrics Meta-Model (SMM) (Object Management Group, 2016). A measure is a method for assigning measurements (numerical or symbolic values) to entities (measurands). An observation applies a set of measures to a certain scope, obtaining specific measurements of the attributes of interest.
Within the MEASURE platform, this observation scope is defined through user-defined projects and the phases that they may go through. Observations and their measurements are collected into dashboards like the one in Figure 2. In particular, observations are known as measure instances in the MEASURE platform: they configure the measure itself in some way and define the scope of the measurand. Figure 3 shows the current user interface for managing measure instances.
There are two types of SMM measures: direct me-
asures that are taken from a measurand through some
process (e.g. lines of code), and derived measures
whose values are computed from others (e.g. a ratio
or a sum). Measures are contributed by users as Java
classes written against the interfaces of the SMM Me-
asure API library (Abherve, 2017), with some additi-
onal XML metadata.
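To make the contribution model concrete, the sketch below shows the general shape of a direct measure as a Java class. The IDirectMeasure interface and its calculate method are hypothetical stand-ins for the actual SMM Measure API interfaces, which are documented in the library itself (Abherve, 2017); only the overall structure (a configurable Java class that returns a measurement) is meant to carry over.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.stream.Stream;

// Hypothetical stand-in for an SMM Measure API interface: the real
// interface and method names are defined by the library, not here.
interface IDirectMeasure {
    Number calculate(Properties config) throws IOException;
}

// Example direct measure: counts the lines of all .java files under a folder.
public class LinesOfCodeMeasure implements IDirectMeasure {
    @Override
    public Number calculate(Properties config) throws IOException {
        Path root = Paths.get(config.getProperty("sourceFolder", "."));
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(p -> p.toString().endsWith(".java"))
                        .mapToLong(this::countLines)
                        .sum();
        }
    }

    private long countLines(Path p) {
        try {
            return Files.lines(p).count();
        } catch (IOException e) {
            return 0; // skip unreadable files in this sketch
        }
    }
}
```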
As an example, one of these direct measures implemented in the MEASURE platform is an integration with the SonarQube code-centric continuous evaluation platform (see https://github.com/ITEA3-Measure/Measures).
Figure 2: Screenshot of a sample dashboard in the MEASURE platform.
Figure 3: Screenshot of measure instances in the MEASURE platform.
The implementation of the SMM Measure API interfaces invokes the services of the SonarQube platform, feeding its information into MEASURE to be combined with other metrics not related to code.
2.2 Modelio
Modelio (Softeam Cadextan, 2017) is a commercial open-source modelling environment developed by Softeam. It supports multiple notations (UML, BPMN and SysML, among others). While Modelio is developed with the Eclipse Rich Client Platform, it is not based on the popular Eclipse Modelling Framework (EMF). Instead, it uses its own metamodelling infrastructure and file format (EXML). The standard OMG XMI format is only available as an export format. More details on the EXML format are available in our prior work on integrating Hawk and Modelio (Garcia-Dominguez et al., 2016).
One important detail is that, while it is open-source, the exact license changes depending on the component. The core runtime is under the Apache License, but most of the other components are under the GNU General Public License version 3 (GPLv3). The incompatibility of the GPLv3 license with other popular open licenses was already an issue in our prior work: we will need to refer back to this later on.
Other distinguishing features of Modelio are the availability of a full-fledged scripting environment for various modelling tasks (e.g. model transformation or code generation), and the strong support for reverse engineering models from existing codebases. Scripting has been useful to test its scalability, as it is quite convenient for generating large synthetic models.
2.3 Hawk
As mentioned before, Hawk was selected to provide model metrics for MEASURE as it had been successfully integrated with other Modelio-based products from Softeam in the past, delivering the desired functionality and performance for the Constellation collaboration tool. This section will provide a high-level description of some of the details behind Hawk.
Hawk (Barmpis and Kolovos, 2013) is a heterogeneous model indexing framework that keeps track of collections of file-based models and maintains a NoSQL model-element-level graph database with their latest versions, in order to provide efficient and scalable model querying. Hawk is distributed as open-source software under the Eclipse Public License (EPL) 1.0.
Figure 4: Component-based architecture of the Hawk model indexing framework.
2.3.1 Architecture
As shown in Figure 4, Hawk follows a component-
based architecture. These are the most important
component types:
Model Parsers: understand various file formats for storing models and metamodels (descriptions of the structure of a set of models). The Ecore XMI parser understands models created by tools based on the Eclipse Modelling Framework, for instance. There was also a model parser component for Modelio 3.4, which was developed before this paper (Garcia-Dominguez et al., 2016).
Backends: integrate Hawk with different database
technologies. At the moment, two backends are
supported: OrientDB and Neo4j. OrientDB is
preferred by Softeam, since its Apache licensing
is compatible with Hawk’s EPL license without
requiring separate arrangements.
Version Control System Connectors: allow Hawk to read models from various types of locations. At the moment, Hawk supports local folders, Subversion and Git repositories, arbitrary HTTP locations, and Eclipse workspaces.
Query Engines: answer questions written in certain query languages over any type of model indexed by Hawk. Three languages are available currently: the Epsilon Object Language and the Epsilon Pattern Language (Kolovos et al., 2017) can be used with the Neo4j or OrientDB backends, whereas the OrientDB SQL-like dialect is only available for that backend.
2.3.2 Using Hawk
Hawk can be used as a library, as a set of plugins for
the Eclipse IDE, or as a network service. Regardless
of the method, using Hawk generally consists of these
steps:
1. Creating an index with a particular backend in a certain directory.
2. Registering the relevant metamodels describing the structure of the models we will index.
3. Registering the locations where our models are stored (e.g. folders or VCS repositories).
4. Optionally, defining indexed attributes and derived attributes for faster searching. Indexed attributes make it possible to quickly find model elements by the value of their attributes. Derived attributes extend certain model element types with new properties derived from their regular attributes, and can be used for fast searching and for performing expensive computations in advance.
5. Waiting for Hawk to index all the files mentioned, and then querying through one of the available engines. Hawk will detect changes in the model files and trigger updates as needed.
As a standalone server, Hawk exposes its capabilities through a set of HTTP(S) web services on top of the Apache Thrift (Apache Software Foundation, 2017) messaging library. Thrift makes it possible to support multiple messaging formats (protocols in Thrift terminology) in the same API, which cover different tradeoffs between language compatibility (JSON is the most compatible) and performance (the “tuple” binary format produces the smallest messages).
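As an illustration of the client side of this API, the sketch below sets up a Thrift JSON protocol over HTTP using standard Apache Thrift classes (THttpClient and TJSONProtocol). The endpoint URL is a placeholder, and the Thrift-generated Hawk client stub that would wrap the returned protocol is left out, since its exact name and operations are defined by Hawk's own Thrift IDL.

```java
import org.apache.thrift.protocol.TJSONProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.THttpClient;
import org.apache.thrift.transport.TTransportException;

// Client-side plumbing for talking to a Hawk server over Thrift + HTTP.
public class HawkThriftConnection {
    public static TProtocol connect(String endpointUrl) throws TTransportException {
        THttpClient transport = new THttpClient(endpointUrl);
        transport.open();
        // JSON is the most portable of the protocols mentioned above; a binary
        // protocol could be swapped in here for smaller messages.
        return new TJSONProtocol(transport);
    }

    public static void main(String[] args) throws TTransportException {
        // Placeholder URL: the real path depends on the Hawk server deployment.
        TProtocol protocol = connect("http://localhost:8080/thrift/hawk/json");
        // The generated Hawk client stub would be constructed from `protocol`
        // here and used to invoke the query web services.
        System.out.println("Connected: " + protocol.getTransport().isOpen());
    }
}
```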
3 INTEGRATION OF HAWK INTO MEASURE
The previous section presented the MEASURE platform, the Modelio modelling tool and the Hawk model indexing framework separately. This section will discuss the new requirements that arose from the need to use Hawk for model metrics in MEASURE, and how these were met.
3.1 New Requirements
Softeam had prior experience with Hawk from the MONDO EU project on scalable modelling (Kolovos et al., 2016), and its high performance and flexibility made it the first choice for MEASURE. Since the MONDO project ended in 2016, however, Modelio had continued to evolve, and MEASURE presented new challenges. Further talks between Softeam and the Hawk developers elicited these requirements:
R1. Since Modelio 3.6, metamodels were no longer embedded in the source code of Modelio, but were instead provided by metamodel descriptor files. Hawk needed to be able to understand these.
R2. Hawk would need to run as a standalone service separate from the MEASURE platform, unlike in the Constellation integration, where it was used as a library. This was to keep the MEASURE platform simple and to isolate it from the high resource demands of Hawk.
R3. Softeam needed to be able to automatically provision new Hawk servers in a cloud environment with standard tools (e.g. Docker, Puppet or Chef). These tools usually operate by filling in configuration file templates, rather than by invoking web services.
R4. Model metrics would still be written as queries in the Epsilon Object Language, much like those in the Constellation work. Softeam had found this language simple enough to use during prior experiments in the MONDO project.
R5. Hawk would need a component that could be deployed as a measure in the MEASURE platform, reading the SMM-based configuration to invoke Hawk and relay the results back to the platform.
These requirements meant that while most of Hawk could be reused (R2 and R4 could leverage existing components), it was necessary to rewrite some components (R1), expand others with new features (R3) and create a new one (R5). The integrated architecture was envisioned as in Figure 5, where the MEASURE platform would live on one machine, the Hawk server on another machine, and the modeller's Modelio installation on a third machine (the workstation). The yellow components would be the new pieces in the puzzle: a version of Hawk with a revised Modelio model parser component, a file-based Hawk server configuration engine, and a measure implementation based on Hawk queries for the MEASURE platform. The following sections will expand on the work involved for each of these.
3.2 Metamodel Descriptor Support
Before Modelio 3.6.0, the Modelio metamodels were embedded in the source code of the tool and were not customisable by users. Thanks to this, it was possible to adopt a simple approach in which Hawk was compiled against an Apache-licensed library produced by Softeam for each release of Modelio. Unfortunately, this meant that a particular installation of Hawk could not index models developed with multiple versions of Modelio, and that supporting a different version required replacing .jar files manually.
Figure 5: UML deployment diagram of the integration between Hawk and the MEASURE platform.
Listing 1: Sample contents of a mmversion.dat file.
1 modelio.kernel
2 0.1.00
3 Standard
4 1.0.00
Modelio 3.6.0 made that approach infeasible: users could now define their own metamodels through metamodel descriptor files. We could no longer expect the metamodels to be fixed for a given release: instead, Hawk had to understand those descriptors and then use them to interpret models conforming to this structure.
Luckily, this is very similar to the approach taken for EMF-based models, whose structure is described through Ecore metamodels. Since it was not possible to bring any GPLv3 code from Modelio into Hawk, it was decided to implement a metamodel parser following a clean-room approach: instead of looking at the code, the format would be reverse engineered from a sample, with some answers from the original developers. This is the same approach that was taken when implementing the original EXML parser in Hawk (Garcia-Dominguez et al., 2016).
Now that multiple Modelio metamodels could be registered at a time in Hawk, it was necessary to create a centralised registry of metamodels. Metamodels can be looked up by name and version. Unlike EMF metamodels, whose identifiers tend to include the versions themselves, Modelio metamodels keep the version separate. This provides a small degree of flexibility: if a model conforms to a version of a metamodel for which we do not have an exact match, we use the latest version available.
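A minimal sketch of this lookup-with-fallback policy is shown below. The class and method names are illustrative rather than Hawk's actual API, and versions are compared as plain strings for simplicity.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Metamodels are registered under (name, version); a request for a missing
// version falls back to the latest registered version of that metamodel.
public class MetamodelRegistry<M> {
    private final Map<String, NavigableMap<String, M>> byName = new ConcurrentHashMap<>();

    public void register(String name, String version, M metamodel) {
        byName.computeIfAbsent(name, n -> new TreeMap<>()).put(version, metamodel);
    }

    public M lookup(String name, String version) {
        NavigableMap<String, M> versions = byName.get(name);
        if (versions == null || versions.isEmpty()) {
            return null; // unknown metamodel
        }
        M exact = versions.get(version);
        // No exact match: use the latest version available, as described above.
        return exact != null ? exact : versions.lastEntry().getValue();
    }
}
```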
Finding out which version of a metamodel to use for a specific model file requires some thought as well. This information is present in a mmversion.dat file in the project within the user's Modelio workspace, and not as part of the model file or the model elements themselves. Listing 1 shows the typical contents of these files, formed by a sequence of line pairs with the name of a metamodel (lines 1 and 3) and its version (lines 2 and 4).
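The format is simple enough that a parser fits in a few lines. The sketch below reads a mmversion.dat file into a name-to-version map following the line-pair layout of Listing 1; it illustrates the format only and is not Hawk's actual parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Reads a mmversion.dat file as a sequence of (name, version) line pairs.
public class MMVersionFile {
    public static Map<String, String> parse(Path mmVersionDat) throws IOException {
        List<String> lines = Files.readAllLines(mmVersionDat);
        Map<String, String> versions = new LinkedHashMap<>();
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            String name = lines.get(i).trim();        // e.g. "modelio.kernel"
            String version = lines.get(i + 1).trim(); // e.g. "1.0.00"
            if (!name.isEmpty()) {
                versions.put(name, version);
            }
        }
        return versions;
    }
}
```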
3.3 Automated Server Provision
In order to use Hawk effectively in cloud environments, it should be possible to quickly bring up an index for a certain location in an automated manner. Most cloud deployment tools expect to be able to clone an image, make some small changes to the files, and restart the relevant services. Being able to configure Hawk through files is also much more familiar to server administrators, and it is easier to teach with typical copy-and-paste instructions.
To accommodate these scenarios, the Hawk server was modified so that, upon startup, it looks for configuration files in a predefined folder and sets up new indices for those it does not have a match for. It will not make any changes to indices for which there is no configuration file (since they may have been created manually through the API), and for now it will not update the configuration of existing indices either.
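A minimal sketch of this startup reconciliation is shown below, assuming a placeholder HawkServer abstraction and the convention that each configuration file is named after its instance; neither detail is taken from the actual Hawk server code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

// Scans a predefined folder for *.xml configuration files and creates indices
// only for those that do not exist yet, leaving other indices untouched.
public class ConfigFolderScanner {
    interface HawkServer {                         // placeholder abstraction
        Set<String> existingIndexNames();
        void createIndexFromConfig(Path configFile);
    }

    public static void scan(Path configFolder, HawkServer server) throws IOException {
        Set<String> existing = server.existingIndexNames();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(configFolder, "*.xml")) {
            for (Path config : files) {
                // Convention in this sketch: file name (minus .xml) is the instance name.
                String name = config.getFileName().toString().replaceFirst("\\.xml$", "");
                if (!existing.contains(name)) {
                    server.createIndexFromConfig(config);
                }
                // Existing indices are neither reconfigured nor deleted.
            }
        }
    }
}
```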
An example of a configuration file is shown in Listing 2. Lines 1–2 state that the OrientDB backend should be used and provide a name for the instance. Line 3 sets the minimum and maximum delays in milliseconds for repository polling (0 means polling is disabled). Lines 4–8 specify which components should be enabled.
Listing 2: Sample contents of a server config file.
 1 <hawk backend="...OrientDatabase"
 2       name="instance 36">
 3   <delay max="0" min="0"/>
 4   <plugins>
 5     <plugin
 6       name="...ModelioGraphChangeListener"/>
 7     ...
 8   </plugins>
 9   <metamodels>
10     <metamodel uri=""
11                location=".../mm36.xml"/>
12   </metamodels>
13   <derivedAttributes>...</derivedAttributes>
14   <indexedAttributes>
15     <indexedAttribute attributeName="Name"
16                       metamodelUri="modelio://..."
17                       typeName="Class"/>
18   </indexedAttributes>
19   <repositories>
20     <repository frozen="false"
21                 location="file:///.../ArchiChocolate/"
22                 pass="" type="...LocalFolder" user=""/>
23   </repositories>
24 </hawk>
Lines 9–12 register the metamodels describing the structure of the models to be indexed. Line 13 would be used to register derived attributes, and lines 14–18 register an indexed attribute for quickly finding classes by name. Finally, lines 19–23 state where the models are stored.
3.4 Hawk-based Measures
In order to create direct measures on models using Hawk for the MEASURE platform, a new implementation of the SMM Measure API (§2.1) interfaces was developed: the Hawk Query measure library (HawkM from now on) (Al-wadeai, 2017).
HawkM can be used in two ways:
- As a regular component for MEASURE, which takes in a configuration file with the server details and the EOL query to be run, and relays the results back to the MEASURE platform.
- As a library for developing more advanced metrics, which require orchestrating multiple EOL queries or automatically generating the EOL source altogether.
HawkM has provisions for the fact that the Epsilon Object Language is dynamically typed, and therefore the return type of the query in the configuration file is not known in advance. The Hawk server provides a response with the native EOL type and the raw value, and HawkM maps it into the MEASURE platform API's type system. HawkM will also timestamp the moment when the measurement was taken.
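The sketch below illustrates this mapping step. The Measurement class, the EOL type names that are distinguished, and the treatment of non-numeric values are assumptions for illustration only; HawkM's actual mapping follows the MEASURE platform API's own types.

```java
import java.time.Instant;

// Maps the (EOL type, raw value) pair returned by the Hawk server into a
// timestamped measurement that the platform can store.
public class QueryResultMapper {

    public static final class Measurement {
        public final Number numericValue;   // null for non-numeric results
        public final String textValue;      // null for numeric results
        public final Instant takenAt;       // when the measurement was taken

        Measurement(Number numericValue, String textValue, Instant takenAt) {
            this.numericValue = numericValue;
            this.textValue = textValue;
            this.takenAt = takenAt;
        }
    }

    public static Measurement map(String eolType, Object rawValue) {
        Instant now = Instant.now();
        if ("Integer".equals(eolType) || "Real".equals(eolType)) {
            return new Measurement((Number) rawValue, null, now);
        } else if ("Boolean".equals(eolType)) {
            return new Measurement(((Boolean) rawValue) ? 1 : 0, null, now); // booleans as 0/1
        } else {
            // Strings and other values are kept as labels rather than numbers.
            return new Measurement(null, String.valueOf(rawValue), now);
        }
    }
}
```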
(a) Relations between interfaces and classes.
(b) Division of classes across components.
Figure 6: UML class diagrams for the example Modelio
project used to develop MEASURE queries.
4 EVALUATION
The work in the previous section allowed Hawk to become another piece of the MEASURE platform. This section will study whether this integration can implement the model measures required by the MEASURE ITEA3 project, and whether it can scale to models of the large sizes expected by its industrial case studies.
4.1 Flexibility
After completing the work in Section 3, several case studies were run to validate whether Hawk could accommodate the wider variety of models in Modelio 3.6 and run the desired queries.
4.1.1 MEASURE Queries
The first test was done by creating a sample Modelio 3.6 project with various containment and inheritance/implementation relationships (as shown in Figure 6). The Hawk server was configured to index this sample project, using the OrientDB backend and the Modelio-specific model and metamodel parsing components. The Modelio 3.6 metamodel descriptor (part of the open-source code release) was parsed successfully by Hawk: this sample project required the “Infrastructure” 2.0.00, “Standard” 2.0.00 and “modelio.kernel” 1.0.00 fragments in particular.
Listing 3: Excerpt of the EOL source code for the “Number of overridden methods” query.
var nOverridden = 0;
for (myClass in Class.all) {
  nOverridden += getNOverridden(myClass);
}
return nOverridden;

operation getNOverridden(myClass) {
  ...
  var ops = getAllOpsOfSubTypesOf(myClass);
  opsNames.addAll(ops.Name);
  for (myop in myClass.OwnedOp) {
    if (opsNames.includes(myop.Name)) {
      var sameName = ops.select(t|t.Name = myop.Name);
      if (sameName.size > 0) {
        var sameNameReturn = /* ... */;
        var sameNameReturnArgs = /* ... */;
        if (sameNameReturnArgs.size > 0) {
          nOverridden = nOverridden + 1;
        }
      }
    }
  }
  return nOverridden;
}
After indexing these models, a subset of the metrics requested by the MEASURE industrial partners was selected for implementation (as shown in Table 1). These metrics were defined and prioritised along with their required metadata and measurement tools. The goal was to have basic metrics that could be recombined into higher-level indicators. The table shows examples of the two ways of implementing queries:
- Deploying the generic measure implemented in Section 3.4 directly, specifying an EOL expression in the “query” part of its configuration. This does not require any Java coding, but it does require knowing the structure of the models to be queried quite well. This is good for advanced users or very specialised queries. The first two queries were implemented like this.
- Writing Java code on top of the generic measure, which provides the EOL query to be run. This can be useful to distribute “canned” queries about known metrics for known metamodels to users who may not know how to write EOL code. The other queries were written in this way. The last two queries in particular were written to generate EOL queries on the fly for a specific component, whose name was given during deployment.
As an example of what a query looks like in EOL, Listing 3 shows an excerpt of the “Number of overridden methods” query. EOL is a very flexible language, and it is possible to write complex queries that check if a method has been overridden within the proper subtypes of a class. In particular, here we check whether, across the subtypes, there is at least one operation with the same name, return type and argument types.
Figure 7: Query report from MEASURE platform.
Listing 4: Example query on Archimate models for retrieving the top 5 most related concepts.
return Concept.all.collect(c | Sequence{
  c, c.closure(c2|c2.relatedTo.to).size
}).sortBy(c | c.second).collect(c | Sequence{
  c.first.getTypeName() + ' ' + c.first.Name,
  c.second}).subList(0,5);
The measure containing this query as part of its Java code was then deployed within the MEASURE platform, producing a report upon execution as shown in Figure 7. The execution time in this case was slightly higher than one would expect, but this was mostly due to the fact that MEASURE was running on temporary infrastructure, while querying a server running off a laptop in a different country altogether. As we will see later, queries run faster given the right infrastructure.
4.1.2 Archimate Models
The second case study used to evaluate the flexibility of this integration was a set of sample models developed with the new support in Modelio 3.6 for creating enterprise architecture models in the Archimate notation and metamodel. Specifically, the model shown in Figure 8 was indexed. This type of model combines software components (ERP, SCADA) with buildings (factories) and descriptions of the processes within the enterprise and the various business actors.
This model was indexed by Hawk successfully, using the same Modelio 3.6 metamodel descriptor as before.
Table 1: Listing of queries implemented on Hawk from MEASURE industrial partners.
Implementation | Parameters | Metric name | Test result
Generic measure with custom EOL query in config. | serverUrl, instanceName, queryLanguage, query | Number of interfaces | 1
 | | Number of attributes in a component | 1
Custom measure with EOL query generated from Java code | serverUrl, instanceName | Average classes per component | 2
 | | Average subcomponents per component | 1
 | | Number of overridden methods | 2
 | | Number of overriding methods | 3
 | serverUrl, instanceName, componentName | Number of subcomponents in a component | 4 (cmp. A)
 | | Number of classes in a component | 6 (cmp. A)
Figure 8: Archimate model of the enterprise architecture of the fictional “ArchiChocolate” company.
The model contains 634 elements (as reported by return Model.allInstances.size; through Hawk) and was indexed in 8 seconds and 48 milliseconds on a recent laptop: specifically, a Thinkpad X1 with an i7-6600U CPU, 16GiB of RAM and a solid-state disk, running Ubuntu 16.04.3 with Linux 4.4.0-98, Oracle Java 8u102 and the latest version of Hawk at the time of writing (commit “66edee” on Github). The model is not very large, but architectural models are important for early high-level analysis. Listing 4 shows an example of a query which, for each Concept in the model:
- Annotates each concept with the number of concepts that it is related to, transitively.
- Sorts them from the ones with the most related concepts to the least.
- Collects the type name, name and number of “related to” concepts for each.
- Returns the top 5 of those elements.
This type of query could be used for impact analysis of a potential change in the model: the top 5 elements would be the ones that would have the largest effect if they were removed.
Listing 5: Response from Thrift for Archimate query.
1 QueryReport(result:[
2 [BusinessActor ArchiChocolateEnterprise, 13],
3 [BusinessProcess Produce, 10], [Facility Turin, 7],
4 [Facility Paris, 5], [Equipment ChocoProdLine, 4]
5 ], wallMillis: 43)
When executed through the Thrift API, this query produces an output like the one shown in Listing 5. The query itself took 54ms from the client to the server while running against a local instance (as this was a proof of concept), but it only took 43ms to run within the server itself: the other 11ms were network overhead.
4.2 Scalability
The next step was to index a large collection of Modelio 3.7 models with this new version of Hawk. This evaluation was conducted by Softeam, with the help of the Hawk developers.
Table 2: Indexing times per project and processing step for the scalability study, in seconds.
Modelio project analyst app archimate bpmn uml Total
Indexable files 1215 4543 2226 763 2305 11052
Fragment insertion 79 412 64 48 151 754
Fragment connection 105 301 167 140 248 960
Children derivation 103 388 105 66 235 897
Listing 6: EOL source code for the “average classes per component” query.
1 var components = Component.all;
2 return components.collect(c|c.countClasses()).sum() / components.size;
3
4 operation Component countClasses() {
5 return self.OwnedElement.select(c|c.isTypeOf(Class)).size
6 + self.OwnedElement.select(p|p.isTypeOf(Package)).collect(p|p.countClasses()).sum()
7 + self.OwnedElement.select(subc|subc.isTypeOf(Component)).collect(subc|subc.countClasses()).sum();
8 }
9 operation Package countClasses() {
10 return self.OwnedElement.select(c|c.isTypeOf(Class)).size
11 + self.OwnedElement.select(p|p.isTypeOf(Package)).collect(p|p.countClasses()).sum();
12 }
The models were those used for the internal development of Modelio itself: these change for every release of Modelio, and are used to generate over a million lines of code. The Modelio projects added up to 3.7GB on disk, of which 2.16GB came from the 11052 .exml files to be indexed. These contained 452084 model elements, according to Hawk.
The indexing process was run on a laptop with an Intel i7-6500U CPU, 8GiB of RAM and an SSD, running Oracle Java 8u60 on Windows 10, with the same version of Hawk as above. The OrientDB backend was used. Indexing took 2686 seconds in total (45 minutes), which is slightly higher than the breakdown shown in Table 2 (2611s). The table lists the times needed to index the .exml fragments, connect them, and (optionally) derive their parent-children relationships.
While it may seem expensive at first glance, it is important to note that this high upfront cost only needs to be paid once: later changes to the models will be processed incrementally by Hawk, with a cost roughly proportional to the size of the change. In our previous study, this cost was quickly amortised as queries were faster than with just Modelio (Garcia-Dominguez et al., 2016).
As for the queries, we first ran some simpler examples to see how quickly we could count all instances of a certain type. This would give us a rough estimate of how quickly we could find certain subsets of the entire collection of models. These were queries of the form return X.all.size;, except for the one counting all model elements, which was return Model.allInstances.size;.
Table 3: Times required to count all instances.
Type Time (s) Count
(All types) 15.50 452084
Attribute 0.73 7403
Class 1.93 8545
Component 0.20 118
Interface 0.20 370
Operation 1.45 52502
The results are shown in Table 3: it is possible to iterate over all instances and count them in 15 seconds, and we can find all 8545 classes across all projects in less than two seconds.
Next, we decided to run one of the MEASURE industrial queries, specifically the “average classes per component” query from Table 1, shown in Listing 6. This query ran across the indexed 2.16GB of models in only 1666ms, producing the end result of 59.49 classes on average per component. Results like these show that Hawk can produce answers in seconds for queries over very large models, by taking advantage of the efficient navigation of references in graph databases.
5 RELATED WORK
The increasing size of industrial models has given rise to several other high-performance model persistence and model querying technologies. If the model is stored in a database in the first place, it may be faster
to query than if it is stored on files: this is possible with NeoEMF (Gómez et al., 2015), an alternative model persistence layer for EMF models with support for Neo4j and MapDB, among other technologies. Another similar option is MongoEMF (https://github.com/BryanHun/mongo-emf), which uses the MongoDB document database.
Beyond database-backed single models, database storage of entire collections is also possible with model repository technologies such as Eclipse Connected Data Objects (CDO, http://wiki.eclipse.org/CDO) or Morsa (Pagán et al., 2013). CDO in particular is very mature and supports both relational and document-oriented databases.
Unfortunately, none of these technologies would have been of much use for MEASURE straight away, as Modelio models are not based on EMF. Even if a mapping to EMF were implemented, we would be left with two options: either replace the persistence technology in the original models (which is non-trivial and intrusive on the user experience), or implement an incremental synchronisation approach between the original models and our EMF-based copy for indexing. The second alternative would have been roughly equivalent to what was already in Hawk, without its other benefits of incremental/derived attributes and a web service API.
6 CONCLUSION AND FUTURE WORK
The MEASURE project is developing a platform for collecting metrics across the entire software development lifecycle. In model-driven processes, the specification and design phases operate on models rather than on code: it is necessary to extract measurements from models as well. It can be very expensive to collect metrics across large collections of models, or from very large models. This paper has shown the first version of an integration between the MEASURE platform and the Hawk model indexing framework, with positive results in terms of flexibility and performance. Hawk makes it possible to use “big data”-class NoSQL technologies for efficient querying of existing models with little technical risk.
Hawk can now index any model supported by current and future versions of Modelio without changes in its code. It has been used in this paper to index both Modelio 3.6 and 3.7 models, some of them from the enterprise architecture domain and some from the object-oriented software design domain. Hawk servers can be provisioned in an automated way
without involving the use of its API, and it is possible to deploy both custom and predefined EOL queries in the MEASURE platform. Hawk was also used to index 2GB of industrial Modelio models: while there is some upfront cost in the indexing, it is quickly amortised through the faster execution of the queries and the incremental updates for later changes in the models.
Regarding future work, the MEASURE platform will be extended with analysis capabilities that combine multiple metrics, possibly from different artefacts and phases in the software development lifecycle. Hawk will be further validated through the implementation of additional queries from the MEASURE industrial partners, and some of the queries will be optimised with the use of derived and indexed attributes.
ACKNOWLEDGEMENTS
The research leading to these results was partially funded by the ITEA3 project 14009, MEASURE.
REFERENCES
Abherve, A. (2017). Github project for the SMM Measure API library. https://github.com/ITEA3-Measure/SMMMeasureApi. Last accessed on 2017-11-01.
Abherve, A., Bagnato, A., Stefanescu, A., and Baars,
A. (2017). Github project for the MEASURE
platform. https://github.com/ITEA3-Measure/
MeasurePlatform/graphs/contributors. Last accessed
on 2017-11-01.
Al-wadeai, O. (2017). Github project for the Hawk
query SMM measure library. https://github.com/
Orjuwan-alwadeai/HawkQuerySMMMMeasureLib.
Last accessed on 2017-11-01.
Apache Software Foundation (2017). Apache Thrift project
website. http://thrift.apache.org/. Last accessed on
2017-11-01.
Barmpis, K. and Kolovos, D. S. (2013). Hawk: towards a scalable model indexing architecture. In Proceedings of the Workshop on Scalability in Model Driven Engineering, BigMDE ’13, pages 6:1–6:9, New York, NY, USA. ACM.
Garcia-Dominguez, A., Barmpis, K., Kolovos, D. S.,
da Silva, M. A. A., Abherve, A., and Bagnato, A.
(2016). Integration of a graph-based model indexer
in commercial modelling tools. In Proceedings of the
ACM/IEEE 19th International Conference on Model
Driven Engineering Languages and Systems, pages
340–350, Saint Malo, France. ACM Press.
Gómez, A., Tisi, M., Sunyé, G., and Cabot, J. (2015). Map-based transparent persistence for very large models. In Egyed, A. and Schaefer, I., editors, Fundamental Approaches to Software Engineering, volume 9033
of Lecture Notes in Computer Science, pages 19–34.
Springer Berlin Heidelberg.
Kolovos, D. S., Garcia-Dominguez, A., Paige, R. F., Guerra, E., de Lara, J., Ráth, I., Varró, D., Sunyé, G., and Tisi, M. (2016). MONDO: Scalable Modelling and Model Management on the Cloud. In Joint Proceedings of the Doctoral Symposium and Projects Showcase Held as Part of STAF 2016 co-located with Software Technologies: Applications and Foundations (STAF 2016), Vienna, Austria.
Kolovos, D. S., Rose, L., Garcia, A., and Paige, R. (2017).
The Epsilon book. http://www.eclipse.org/epsilon/
doc/book/. Last accessed on 2017-11-01.
Object Management Group (2016). The Software
Metrics Meta-Model Specification 1.1.1. http://
www.omg.org/spec/SMM/1.1.1/. Last accessed on
2017-11-01.
Pagán, J. E., Cuadrado, J. S., and Molina, J. G. (2013). A
repository for scalable model management. Software
& Systems Modeling, 14(1):219–239.
Shields, A. (2014). Must-know: an overview of the software
industry - Market Realist. http://marketrealist.com/
2014/07/must-know-overview-software-industry-2/.
Softeam Cadextan (2017). Modelio project website. https://
www.modelio.org/. Last accessed on 2017-11-01.
Softeam R&D (2017). MEASURE project website. http://
measure.softeam-rd.eu/. Last accessed on 2017-11-
01.