XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy

Information Systems

André Reis and Alberto Rodrigues da Silva

INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal

Keywords:

Model-driven Engineering, Model-driven Reverse Engineering, Model-driven Reengineering, Database,

Legacy System.

Abstract:

Due to the development of new technologies companies face high costs managing and maintaining their legacy

applications, thus upgrading those systems became a complex challenge. This paper describes a model-driven

reverse engineering approach that aims to support the mentioned challenge. This approach takes as input the

legacy relational database schema, but also user preferences to better guide the reverse engineering process.

From these artefacts it is possible to extract models of the legacy application through model-to-model transfor-

mations based on a set of well deﬁned rules and heuristics. The main contributions of this proposal (compared

with the state of the art) are the semi-automatic identiﬁcation of generalizations and aggregations and the

possibility to automatically extract default values to enrich the produced models. The paper also includes an

evaluation and a discussion of the proposal based on a simple case study and a real-world application.

1 INTRODUCTION

Enterprises have to manage legacy software applica-

tions which were built in the past and are still being

used since then. In the long term the maintenance of

such applications become rather complex, mainly due

to outdated or poor documentation and reduced ﬂexi-

bility of the original software technologies (De Lucia

et al., 2008). To overcome these problems software

reengineering approaches can be used (Arnold, 1993).

Reengineering supports evolutionary maintenance of

legacy applications by analysing and modifying such

applications rebuilding them in a new form (Chikof-

sky and Cross, 1990).

A model-driven engineering (MDE) approach can

be seen as a chain of model transformations that pro-

duces the target software artefacts from more abstract

models (Ruiz et al., 2016). Using this kind of ap-

proach, the most common model transformations are:

Model-to-Model (M2M) and Model-to-Text (M2T)

(da Silva, 2015). M2M transformations generate a tar-

get model from a source model. M2T transformations

produce a textual representation from a source model.

Model-driven reengineering is the application of

MDE principles and techniques to reengineer an ap-

plication. As illustrated in Figure 1, a complete

model-driven reengineering process is composed of

three main stages (Chikofsky and Cross, 1990): (i) re-

verse engineering, (ii) restructuring and (iii) forward

engineering. First, the reverse engineering stage aims

to extract knowledge deﬁned in low abstraction rep-

resentations through introspection and analysis of the

initial source artefacts (initial models); this process

collects valuable information that makes it easier to

deﬁne current legacy application requirements. Sec-

ond, the system restructuring stage involves establish-

ing a mapping between the source and the target sys-

tems. Third, the forward engineering stage produces

the artefacts to the new target system. Each of these

stages is a transformational process that can be im-

plemented by means of a model transformation chain

whose initial set of models is injected. This model

injection is performed with a T2M transformation ap-

plied to textual software artefacts, such as SQL data

deﬁnition language (DDL) scripts (Ruiz et al., 2016).

Model-Driven Reverse Engineering (MDRE) uses

MDE principles and techniques to reverse engineer-

ing, producing relevant (model-based) views from

legacy systems, thus facilitating their understand-

ing and subsequent manipulation (Bruneliere et al.,

2014). A MDRE process is a part of the model-driven

reengineering process merely focused in the reverse

engineering stage.

This paper proposes a MDRE approach to produce

high-level speciﬁcations of legacy information sys-

tems in a human-controlled way. These speciﬁcations

196

Reis A. and Rodrigues da Silva A.

XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy Information Systems.

DOI: 10.5220/0006271501960207

In Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2017), pages 196-207

ISBN: 978-989-758-210-3

Figure 1: Model-driven reengineering process.

are deﬁned according to domain speciﬁc languages,

and produced from the injection and the reverse engi-

neering stages.

The remainder of the paper is organized as fol-

lows. Section 2 gives a brief introduction of the con-

text of this research. Section 3 describes the proposed

MDRE approach and associated tool. Section 4 dis-

cuss the main results using a simple case study and

also a real-world application. Section 5 analyses and

compares this proposal with the related work. Finally

section 6 presents the conclusion and future work.

2 BACKGROUND

This research has been developed at the Instituto Su-

perior Técnico, Universidade de Lisboa, in the area of

MDE, namely within the MDDLingo

initiative.

MDDLingo is an umbrella researching initiative

that aggregates several projects around MDE topics,

namely involving the deﬁnition of a family of lan-

guages, also known as XIS*. This set of modelling

languages derives from XIS-UML proﬁle (da Silva,

2003), involving namely XIS-Mobile (Ribeiro and

da Silva, 2014b; Ribeiro and da Silva, 2014a), XIS-

CMS (Filipe et al., 2016) or XIS-Web (Seixas, 2016).

XIS-UML is a set of coherent constructs deﬁned as

a UML proﬁle that allows a high-level and visual

modeling way to design business information sys-

tems. In general these languages include the fol-

lowing views: Entities (which includes Domain and

Business Entities views), UseCases (containing Ac-

tors and Use Cases views), Architectural and User-

Interfaces (composed by Interaction Space and Nav-

igation Space views). Figure 2 illustrates the afore-

mentioned set of views

Figure 3 illustrates a very simple XIS* Domain

https://github.com/MDDLingo

Figure 2: The multi-view organization of XIS*.

view which aggregates domain classes (XisEntity),

their attributes (XisEntityAttribute) and relationships

(XisEntityAssociation/XisEntityInheritance).

Figure 3: XIS* Domain view.

XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy Information Systems

197

In addition, Figure 4 shows the BusinessEntities

view which allows to deﬁne higher-level entities (Xis-

BusinessEntity), that aggregate XisEntities and that in

the context of a given use case are easily manipulated.

Figure 4: XIS* BusinessEntities view.

Finally, Figure 5 shows the UseCases View. This

details the operations an actor can perform when in-

teracting with the application in the context of a busi-

ness entity (Ribeiro and da Silva, 2014b).

Figure 5: XIS* UseCases view.

XIS* languages are very alike in terms of the

Entities view, however there are some discrepancies

on their Use Cases, Architectural and User-Interfaces

views due to each platform target application. For

example, XIS-CMS does not have an Architectural

view.

It is to point out that all of them are able to gen-

erate their User-Interfaces, using a smart approach

based on M2M transformations given the set of Do-

main, Business Entities, (Architectural,) Actors and

Use Cases views.

3 XIS-REVERSE APPROACH

The MDRE approach proposed in this paper is named

“XIS-Reverse”. This approach focus only in the ﬁrst

two stages of the reengineering process (Figure 1), na-

mely Injection and Reverse Engineering stages. XIS-

Reverse is generically represented in Figure 6. It

starts by extracting the application database schema

from an existent data source (through injection).

Thereafter the Application data model is generated.

Secondly, the user is able to deﬁne his own prefer-

ences by tweaking some heuristic’s parameters, pro-

ducing the XIS-Reverse conﬁguration. The reverse

engineering stage takes as input these two artefacts,

and generates XIS* models. Then, the designer can

analyse the produced artefacts and introduce some re-

ﬁnements, such as changing automatically identiﬁed

relationships into different ones in the Entities view,

enhancing the Use-Cases views, etc.

3.1 Tools Support

Since Sparx Systems Enterprise Architect (EA)

the tool used to design business information systems

in the other XIS* projects, in order to reduce depen-

dencies for future work and learning curve for the

user, we decided to also implement XIS-Reverse on

top of this tool. Moreover, from this tool we used a

set of provided technologies, namely the Automation

Interface and the Model Driven Generation

Both reverse engineering conﬁguration and exe-

cution steps (Figure 6) are supported by the XIS-

Reverse tool. Moreover, in terms of the applica-

tion database access and the proﬁler log ﬁle injec-

tion, for now this tool only supports SQL Server

connections and proﬁler log ﬁles generated from

the Microsoft SQL Server Management Studio Pro-

ﬁler Tool. Regarding the produced speciﬁcations,

this tool is able to produce XIS* models but also

RSLingo’s RSL

(de Almeida Ferreira and da Silva,

2012; de Almeida Ferreira and da Silva, 2013) spec-

iﬁcations. However, due to space constraints, only

XIS*/XIS-Web models are considered. Furthermore,

this tool can be extended to support different repre-

sentations.

3.2 Running Example

For better understanding and simplicity of the expla-

nation, we introduce a simple but practical example:

the Social Security legacy application (SS-App).

The SS-App consists in an application that man-

ages beneﬁciaries, their dependents (and tutors), as-

sociated documents and user accounts.

This domain is modelled as 8 tables: Beneﬁciary

and Dependent which primary keys are foreign keys

http://www.sparxsystems.com/products/ea

http://www.sparxsystems.com/resources/mdg_tech

https://github.com/RSLingo/RSL

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

198

Figure 6: Overview of the XIS-Reverse approach.

to a Person table. Dependents table represents the

relationship between a Beneﬁciary and a Dependent

(through its foreign keys linked to each of those ta-

bles). A Dependent may have a Tutor, which are

linked by a foreign key. Each Beneﬁciary has a set

of documents (since a Document has a foreign key to

a Beneﬁciary table) and a unique UserAccount (due

to the unique constraint that the UserAccount table

has over its BenIdNumber foreign key). Furthermore,

several documents not linked to any Beneﬁciary may

exist as an UnknownDocument.

Each of those tables, columns, constraints and

number of rows in the database are represented in Fig-

ure 7.

3.3 Data Schema Extraction

Following the aforementioned process of Figure 6,

during the Data Schema Extraction step, in order to

extract the application data model our approach de-

pends on the EA native capability

to reverse engi-

neer a database model through an ODBC connection.

Moreover, EA supports several ODBC data sources

3.4 Reverse Engineering Conﬁguration

A snapshot of the tool is illustrated in Figure 8, which

is used to explain the main conﬁguration features.

http://www.sparxsystems.com/enterprise_architect

_user_guide/13.0/model_domains/

importdatabaseschemafromod.html

http://www.sparxsystems.com/enterprise

_architect_user_guide/13.0/model_domains/

supported_databases.html

Figure 7: Case Study SS-App data model.

The conﬁguration panel of the tool has four main ar-

eas: input, output, transformation rules guidance and

appearance.

The Input area allows to specify the root node to

where the application data model was extracted, to in-

troduce the database name and to choose how to en-

hance the output speciﬁcations. In the Output area

the user can select additional output representations,

XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy Information Systems

199

Figure 8: Conﬁguration panel of the tool.

namely XIS-Web or RSLingo’s RSL representations.

The Transformation Rules Guidance area allows

the user to contribute with his knowledge to enhance

the Reverse Engineering process. This conﬁguration

can be split into three areas: (i) Simple Principal Ta-

bles (ii) Default Values Extraction and (iii) General-

ization discovery. The user can select which tables

are Simple Principal Tables (which we consider as ta-

bles without many relationships, such as conﬁgura-

tion tables or “kind of” tables). Moreover, the user

can enhance the produced speciﬁcations by extracting

defaults values from a selected column of a certain ta-

ble. Furthermore, in both cases the user can also ﬁlter

the list of tables by specifying the maximum number

of rows per table, which allows the user to ﬁnd more

easily “kind of” tables, since usually they have few

number of instances (rows). Finally, the user can acti-

vate generalization discovery and add some knowl-

edge, such as a list of XisEntityAttribute names to

ignore, the minimum number of shared XisEntityAt-

tributes (e.g. 5 or 10), and to choose how to aggregate

those entities (e.g. by the higher number of shared

attributes or by the higher number of aggregated enti-

ties).

The Appearance conﬁguration area allows to im-

prove readability of the produced speciﬁcations, by

arranging models into smaller views and changing the

name of the tables to a clearer one.

3.5 Reverse Engineering Execution

Taking as input (i) the application data model, (ii) the

XIS-Reverse conﬁguration and (iii) the proﬁler log

ﬁle or the database connection, the reverse engineer-

ing process is composed by a set of transformation

tasks, that will be applied. Those include a chain of

M2M transformations which are used to specify the

application as a set of XIS* views.

3.5.1 Initialization

Before M2M transformations take place, the proﬁler

log ﬁle or the database connection, if provided, are

used to extract more detailed information about the

legacy application, that may enhance the produced

speciﬁcations.

From the proﬁler log ﬁle it is possible to extract

SQL statements (INSERT, SELECT, UPDATE and

DELETE), and from those statements the set of used

tables. Moreover, from that we generate 4 different

lists, one for each type of statement, where each one

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

200

of them holds for each table the number of occur-

rences for the corresponding kind of statement.

A database connection can be used to collect for

each table in the initial application model, the current

number of rows in the actual application database.

3.5.2 XIS Model-to-model Transformations

The following transformations are broken down into

the different set of produced elements.

XisEntities Transformation Rules.

For each table of the source data model, the following

XisEntities (E i) rules are applied:

(E 1) If a table has only 2 foreign keys, each one

to a distinct table, and does not contain more at-

tributes (excluding its primary keys), this table

will be translated into a XisEntityAssociation be-

tween the two referenced tables, using a many-to-

many cardinality, ie. *:* cardinality (e.g. Depen-

dents in the running example).

(E 2) Otherwise the table will map a XisEntity (e.g.

Beneﬁciary in the running example).

XisEntityAttribues Transformation Rules.

Regarding table columns, the following XisEntityAt-

tribues (EA j ) rules are applied:

Let A, B be two tables related by a foreign key

constraint from B (referencing table) to A (referenced

table):

(EA 1) For the complete Primary Key:

a) If all the foreign key columns in B are also its

complete primary key, in other words B and

A have a 1:1 relationship since B primary key

references A primary key. Then B foreign

keys will be translated into a XisEntityInheri-

tance relationship, where the class equivalent

to A will be the superclass and the equivalent

class to B will be the subclass (e.g. Person

(A) and Beneﬁciary (B) in the running exam-

ple).

b) Otherwise, the complete Primary Key will

not be represented (e.g. UnknownDocument

in the running example). Due to XIS* Do-

main View similarity to a Class Diagram,

each XisEntity can be seen as a Class, which

can have objects, and each object is different

by deﬁnition, so there is no need for a unique

identiﬁer for each XisEntity.

(EA 2) For the remaining columns with Foreign Key

constraints:

a) If the foreign key in B constitute a unique in-

dex the XisEntityAssociation will have a 1:1

cardinality, preserving the relationship direc-

tion (e.g. BenIdNumber of UserAccount in

the running example).

b) If A was not selected as Simple Principal Ta-

ble by the user (conﬁguration stage), and one

of the following options is also true: 1) the

proﬁler log ﬁle was provided, both A and

B tables were referenced in that ﬁle, and ta-

ble B had the same or higher amount of IN-

SERT operations as A did, or 2) the database

connection was provided and table B had the

same or higher amount of rows as A did (if

B foreign key column allows NULL values,

only rows with NOT NULL values on that

column will be taken into account). Then we

assume that there is an aggregation relation-

ship between entity A and B (A aggregates

B), that is translated into a XisEntityAsso-

ciation, preserving the foreign key cardinal-

ity and represented with a bidirectional arrow

(e.g. Beneﬁciary with 500 rows (A) and Doc-

ument with 1000 rows (B) in the running ex-

ample).

This reasoning made sense for us since that,

assuming that A and B are respectively strong

and weak entities, A is the one which aggre-

gates B, it is fair to assume that if that foreign

key was made for that purpose, then there

must be at least the same amount of B in-

stances created compared with A.

c) Otherwise, the XisEntityAssociation will

have a 1:* cardinality between table B and

A, preserving the relationship direction (e.g.

Dependent (A) and Tutor (B) in the running

example).

(EA 3) Otherwise the column will be translated into a

XisEntityAttribute with a corresponding XIS* at-

tribute type following a predeﬁned mapping (e.g.

Name from table Person in the running exam-

ple). Moreover, if previously conﬁgured, the user

can explicitly select if default values from a cer-

tain XisEntityAttribute from a particular XisEn-

tity should be extracted and written as an annota-

tion, which will be linked to the XisEntity. This

feature is a signiﬁcant contribution since the pro-

duced speciﬁcations are enhanced with those val-

ues, allowing a better understanding of the XisEn-

tities captured from the legacy domain.

Note: in both b) and c) cases, from (EA 2), when

the Not Null option is not set for that column (in other

words, that column can be NULL), the XisEntityAs-

sociation target’s cardinality is the concatenation of

0.. with the original target cardinality. (e.g. 1 was

XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy Information Systems

201

the original target cardinality and the column allowed

NULL values, then it becomes 0..1)

Further XisEntityInheritance Transformation

Rule.

For each XisEntity, after the previously mentioned

XisEntities and XisEntityAttributes rules are applied:

This XisEntityInheritance identiﬁcation is per-

formed comparing every XisEntity and their XisEn-

tityAttributes with each other. We assume that two

XisEntityAttributes are the same if they share the

same name and type. During this comparison, if

previously deﬁned, some XisEntityAttributes can be

excluded by their name. Moreover, we cluster 2

or more XisEntities if they share at least the same

or more predeﬁned number of shared XisEntityAt-

tributes. Furthermore during this comparison only

XisEntities without inheritance relationship will be

taken into account.

After this clustering procedure, there are two ways

of ﬁnding inheritance, depending on the conﬁgura-

tion, the list can be ordered by the descending num-

ber of entities or ordered by the descending number

of shared XisEntityAttributes in each cluster. While

this list has clusters of XisEntities a Superclass is cre-

ated for each cluster. This Superclass will have the

identiﬁed XisEntityAttributes, each of the XisEntities

in that cluster will be linked to the superclass using a

XisEntityInheritance and will not own the XisEntity-

Attributes that their Superclass has (e.g. Unknown-

Document and Document share 3 identical columns

(Path, Description and Notes) in the running exam-

ple).

In order to implement the ﬁrst phase of this

algorithm, given the complexity that is comparing

every XisEntity with each other taking into account

all their XisEntityAttributes, in domains that can

easily be compound by hundreds or thousands of

elements, we based our approach in the MapReduce

(Dean and Ghemawat, 2008) programming model

which allows to handle large data sets.

XisBusinessEntities Transformation Rules.

For each XisEntity, the following XisBusinessEntity

(BE k ) rules are applied:

(BE 1) If the current XisEntity aggregates other

XisEntities or is not aggregated by another XisEn-

tity (i.e. we do not consider weak entities as

Business Entities), then a corresponding XisBusi-

nessEntity will be created, followed by the cre-

ation of a XisBE-EntityMasterAssociation that

will link that XisBusinessEntity with its XisEntity

from the Domain view, and for each XisEntityAs-

sociation:

a) If the XisEntityAssociation is an aggre-

gation then it will be translated into a

XisBE-EntityDetailAssociation between that

XisBusinessEntity and the target XisEn-

tity of the XisEntityAssociation. More-

over, if the corresponding target XisBusi-

nessEntity exists, there will be a XisBE-

EntityReferenceAssociation between the tar-

get XisBusinessEntity and the source XisEn-

tity.

b) Otherwise it will be translated into a

XisBE-EntityReferenceAssociation between

that XisBusinessEntity and the target XisEn-

tity of the XisEntityAssociation.

(BE 2) Otherwise there is no mapping.

XisEntityUseCases Transformation Rules.

For each XisBusinessEntity there will be a corre-

sponding elementary XisEntityUseCase that is named

"Manage" followed by the XisBusinessEntity master

XisEntity name, associated to the XisBusinessEntity

by a XisModule-BEManageAssociation. And a de-

fault XisActor will be linked to the XisEntityUseCase

using a XisRole-ModuleAssociation.

4 EVALUATION

In this section a real-world Social Security legacy ap-

plication (SS-App) will be used as the case study.

Firstly, due to the extension of this paper a simpliﬁed

version of the SS-App (described in the Subsection

3.2) will be used to show the produced speciﬁcations

with our tool. Secondly, the results of using our tool

with the whole version of this application will be fur-

ther shown and analysed.

4.1 A Simple Example

Taking as input the speciﬁed running example SS-

App data model, selecting database access to en-

hance the output speciﬁcations, selecting Status col-

umn from UnknownDocument to extract its default

values and activating generalization discovery by the

higher number of shared attributes with the minimum

of 3 attributes, Figure 9 shows the produced XIS* Do-

main view. For this image only the main contributions

of this paper were taken into account.

From the aforementioned conﬁgurations it was

possible to detect an aggregation between Beneﬁciary

and Document through (EA 2) b). Moreover, a super-

class with the shared attributes was created for Doc-

ument and UnknownDocument entities, through the

further XisEntityInheritance transformation rule. And

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

202

ﬁnally, a note was created and linked to the Unknown-

Document entity displaying the default values that the

Status column had in the database, through (EA 3).

Figure 9: Case Study SS-App XIS* Domain view.

4.2 A Real-world Example

During our research we had the chance to apply this

approach to a real-world legacy application, in order

to evaluate how much knowledge we could extract

from it.

This legacy database was from a Social Security

application and had 187 tables.

To reverse engineer this database we tried differ-

ent conﬁgurations, from which we are going to stress

two. Both used database access to enhance output

speciﬁcations and generalization discovery activated

by the higher number of shared attributes, with a mini-

mum of 5 shared attributes. Moreover, after analysing

the database we identiﬁed 5 often used attributes, each

one of them with same name and type in almost every

table. Thus, those 5 attributes were included in the

list of attribute names to ignore, in order to decrease

the generalization discovery time and try to produce

more interesting results, instead of a superclass that

would generalize almost every entity. The differ-

ence between the two conﬁgurations were in terms of

Simple Principal Tables selection. Although, in the

Experiment-A we did not select Simple Principal Ta-

bles, in Experiment-B we did.

Table 1, shows the main results of those two con-

ﬁgurations (Experiment-A and Experiment-B) with

the following metrics: number of XisEntities (in-

cluding explicit and implicit superclasses); number

of XisAssociations (also including Aggregations and

Many-to-many associations); the number of Aggre-

gations; number of Many-to-many associations; num-

ber of explicit superclasses; number of implicit super-

classes (through Generalization discovery) and num-

ber of XisBusinessEntities.

Table 1: Main results from reverse engineering process with

a real-world Social Security application.

Experiment-A Experiment-B

Entities 174 174

Associations 205 205

Aggregations 126 43

Many-to-many 19 19

Exp superclasses 0 0

Imp superclasses 4 4

BusinessEntities 140 156

After the Experiment-A reverse engineering, we

were mainly surprised with the amount of Aggrega-

tions detected (more than 60% of all Associations).

Then we explored more the database and concluded

that almost every foreign key from a main entity (e.g.

Beneﬁciary) to a conﬁguration entity, was mistakenly

transformed into an aggregation from the conﬁgura-

tion entity to the main entity.

Thus, in order to discard such aggregations, in

the Experiment-B conﬁguration we selected every en-

tity with at most 20 rows per table in the database

as a Simple Principal Table. Then the number of

aggregations reduced to around 20% of all Associa-

tions, and after analysing such aggregations against

the database, most of them made sense to be classiﬁed

as aggregations (e.g. Beneﬁciary aggregates Activity

Logs).

We did not further explored different general-

ization discovery conﬁgurations, since the minimum

number of shared attributes was already too low. Thus

we assumed that 4 implicit superclasses was already

a good result.

5 RELATED WORK

Reverse engineering of software applications have

been extensively studied in the last decades, and their

main contributions have been crucial to produce bet-

ter results in this complex activity, and furthermore

XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy Information Systems

203

to stimulate the development of reverse engineering

tools. More recently, MDE started to be applied to

reverse engineering (MDRE), promoting a more sys-

tematic and ﬂexible process. Moreover, MDRE ap-

proaches have been extended in order to completely

reengineer a source application into a new target ap-

plication through model-driven reengineering tech-

niques.

This section overviews the most relevant research

studies, covering data schema extraction and reverse

engineering of databases. Moreover, those contribu-

tions are also compared with our approach.

5.1 Data Schema Extraction

The main properties of the research works analysed

in this subsection are shown in Table 2. This table

speciﬁes for each approach, its input, the existence of

data schema extractors, if it extracts all table proper-

ties and its output. The last row categorises our ap-

proach.

Gra2MoL (Izquierdo et al., 2008) Text-to-Model

(T2M) language and MoDisco (Bruneliere et al.,

2014) framework have been specially tailored for data

schema extraction (model injection). Gra2MoL is a

Domain Speciﬁc Language (DSL) to write transfor-

mations between any textual artefact which conforms

to a grammar (e.g. source code) and a model which

conforms to a target metamodel. On the other hand,

MoDisco is a Java framework intended to facilitate

the implementation of MDRE approaches. Regard-

ing to data schema extraction, MoDisco facilitates the

implementation of discoverers (model injectors), and

it currently provides discoverers for Java, JSP and

XML. A drawback for both approaches is that they

would require the deﬁnition of such transformations

and discoverers, respectively, to extract the database

schema.

Schemol (Díaz et al., 2013) is another tool for

injecting models. However, this tool allows inject-

ing data stored into database by specifying transfor-

mations that express the correspondence between the

source database schema and the target metamodel.

Furthermore, in terms of database model injec-

tion (to ease this T2M transformation) it is possi-

ble to transform a database schema into a graphi-

cal representation using a variety of commercial and

academic tools. DB-MAIN (Hainaut et al., 1994)

and SQL2XMI (Alalﬁ et al., 2008) are two exam-

ples of such academic tools. Firstly, DB-MAIN is

a toolbox that supports database reverse engineer-

ing, and includes legacy database schemas extractors,

through several sources such as ODBC drivers or SQL

ﬁles. Secondly, SQL2XMI is entitled as a lightweight

transformation of data models from SQL Schemas to

UML-ER expressed in XMI. To our knowledge, this

tool is still limited compared with DB-MAIN since

it does not infer entity types or cardinalities, and for

now it is only compatible with the MySQL implemen-

tation of the SQL DDL.

To sum up, given that a set of existing tools al-

ready support data schema extraction from several

sources, without additional speciﬁcation of transfor-

mations, we took advantage of such tools, more pre-

cisely we used EA.

5.2 Reverse Engineering

A reverse engineering approach can be classiﬁed in

several ways. Table 3 gives a properties summary of

the thereafter analysed research works. These proper-

ties include: kind of analysis, output, the existence of

tools for automating the approach, automation level of

those tools in regards to the reverse engineering stage,

if the tool allows extension and main contributions (A

- Aggregations Extraction, G - Implicit Generaliza-

tion Extraction and V - Default Values Extraction).

Regarding reverse engineering techniques, sev-

eral approaches have been proposed, which are usu-

ally distinguished by the particular application arte-

fact used as main information source. The most rel-

evant research works, following such distinction, are

described below.

Schema analysis (Premerlani and Blaha, 1993)

is mainly focused on spotting similarities in names,

value domains and representative patterns. This tech-

nique may help identify missing constructs (e.g. for-

eign keys). Moreover, (Premerlani and Blaha, 1993)

speciﬁcation of a manual process was adapted in

our approach to semi-automatically and automatically

identify generalizations and many-to-many associa-

tions, respectively.

Data analysis (Chiang et al., 1994; Pannurat et al.,

2010) uses content mined from a database. Firstly it

can be used to analyse the database normalisation and

secondly to verify hypothetical constructs suggested

by other techniques. Given the combinatorial explo-

sion that can affect the ﬁrst approach, data analysis

technique is mainly used with the purpose of the sec-

ond approach. In addition, our approach uses this

technique in a similar way, however it is applied to

classify associations.

Screen analysis (Ramdoyal et al., 2010) state that

user interfaces can also be sources of useful informa-

tion. These user-oriented views over a database may

display spatial structures, meaningful names and, at

run time data population and errors combined with

data-ﬂow analysis may provide information about

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

204

Table 2: Classiﬁcation of some related works on data schema extraction.

Approach Input Schema Extractors Properties Output

Gra2Mol (Izquierdo et al., 2008) any textual artefact no n.a. any model

MoDisco (Bruneliere et al., 2014) several sources no n.a. any model

Schemol (Díaz et al., 2013) data stored into DB no n.a. any model

DB-MAIN (Hainaut et al., 1994) several sources yes (e.g. ODBC) yes GER

SQL2XMI (Alalﬁ et al., 2008) SQL DDL schema yes (only MySQL) no UML in XMI

EA (our approach) several sources yes (e.g. ODBC) yes UML

data structures and schema properties. Moreover, our

approach did not consider this kind of analysis.

Static (Petit et al., 1994; Di Lucca et al., 2000)

and Dynamic (Cleve et al., 2011a; Cleve et al., 2011b)

program analysis can easily give valuable information

about ﬁeld structure and meaningful names, or iden-

tifying complex constraint checking and foreign keys

after a complex analysis. A main challenge of dy-

namic program analysis is the extraction of highly dy-

namic interactions between a program and a database.

Moreover, the analysis of SQL statements is one of

the most powerful variant of source code analysis.

Furthermore, our approach uses static program anal-

ysis in the proﬁler log ﬁle, aiming to classify associa-

tions.

Additionally, a set of approaches, concerning the

application of MDE, are also taken into account. Our

analysis focused on their injection and reverse engi-

neering stages.

As previously introduced, MoDisco MDRE

framework (Bruneliere et al., 2014) has a huge poten-

tial in order to support reverse engineering activities,

due to its generic and extensible properties. Besides

its legacy application discoverers (model injectors),

MoDisco also allows the deﬁnition of transformations

and generators, responsible for restructuring and for-

ward engineering tasks over the system models, re-

spectively. Our approach could be implemented ex-

tending this framework, however that would require

the deﬁnition from scratch of all the three stages (dis-

coverers, transformations and generators) needed to

produce the desired artefacts.

Polo et al. propose a method and a tool, called

Relational Web (Polo et al., 2007) specially designed

for database reengineering. The starting point is a re-

lational database, whose physical schema is reverse

engineered into a class diagram representing its con-

ceptual schema. In the restructuring stage, the class

diagram is manipulated by the user and then passed

as input to the forward engineering step. Moreover,

this tool supports the deﬁnition of new database man-

agers to be used as input and the implementation of

new code generators. Furthermore, on the one hand

this approach only uses as input the physical schema,

and user knowledge, and its tool does not take advan-

tage of the existing MDE techniques nor technologies.

However, this approach deﬁned foreign key’s seman-

tic extraction techniques, to identify inheritance rela-

tionships and associations, that were adapted and ex-

tended by our approach, in order to identify aggrega-

tions.

As previously stated, DB-MAIN (Hainaut et al.,

1994) is a toolbox that offers a complete function-

ality with which to apply database reverse engineer-

ing. Regarding to reverse engineering DB-MAIN in-

cludes features such as extractors of legacy database

schemas, transformations between schemas, data and

code analysis tools, among others. This tool is one

of the most mature ones, in regard to database re-

verse engineering, meaning that it includes several

features that have been the result of a great number of

research contributions from Namur University. DB-

MAIN supports a lot of common transformations and

extraction tools thus, a user with such tools can han-

dle almost any needed transformation to create a good

conceptual schema. However, this tool will require

the user to apply all the needed transformations, thus

the degree of automation achieved in our approach is

higher. Furthermore, DB-MAIN supports generaliza-

tion representation, but once again it must be identi-

ﬁed by the user.

Regarding to the main contributions of this paper,

we do not ﬁnd any other approach that specializes as-

sociations (e.g. distinguishing between associations

and aggregations), nor any approach that allows to ex-

tract default values and their representation into the

target conceptual schema.

6 CONCLUSION AND FUTURE

WORK

This paper presented a model-driven reverse engi-

neering approach, named XIS-Reverse, applied to

legacy information systems. This approach provides

the opportunity to maintain and extend a given appli-

cation, even in cases that the related documentation is

outdated or missing. Moreover, XIS-Reverse is able

XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy Information Systems

205

Table 3: Classiﬁcation of some works on reverse engineering.

Approach Analysis Output Tools Auto. Extension Contributions

(Premerlani and

Blaha, 1993)

schema OMT class

diagram

no n.a n.a. n.a.

(Chiang et al.,

1994)

data Extended ER no n.a n.a. n.a.

NoWARs (Pannurat

et al., 2010)

data Conceptual

schema

yes Semi n.a. n.a.

RAINBOW

(Ramdoyal et al.,

2010)

screen ER model yes n.a. n.a. n.a.

(Petit et al., 1994) program (static) Extended ER no n.a n.a. n.a.

(Di Lucca et al.,

2000)

program (static) Object-Oriented

class diagram

no n.a n.a. n.a.

(Cleve et al.,

2011a)

program

(dynamic)

n.a. yes n.a. n.a. n.a.

(Cleve et al.,

2011b)

program

(dynamic)

n.a. yes n.a. n.a. n.a.

Modisco

(Bruneliere et al.,

2014)

schema and

program (static)

KDM or UML yes Auto. yes n.a.

Relational Web

(Polo et al., 2007)

schema UML yes Auto. yes n.a.

DB-MAIN

(Hainaut et al.,

1994)

schema and

program (static)

GER yes Semi yes G

XIS-Reverse (our

approach)

schema, data and

program

XIS* and

RSLingo’s RSL

yes Semi yes A;G;V

to specify an application both as XIS* models and

as RSLingo’s RSL speciﬁcations. The main contri-

butions of this paper in terms of reverse engineering,

are focused in the implementation of semi-automatic

heuristics that can identify certain relationships be-

tween entities, namely aggregations and generaliza-

tions, and also the possibility to extract default values

from the source databases to enrich the target models

or speciﬁcations. This identiﬁcation enables require-

ments enhancement, with less effort for the user by

using a tool that supports Reverse Engineering Exe-

cutions.

Regarding future work, in terms of extensibility

of the process we would like to evaluate the interop-

erability of the produced models with the XIS-Web

framework, for example, and analysing the similarity

of a forward engineered application from those mod-

els, allied with the smart approach of XIS* technolo-

gies, with the original legacy application. Addition-

ally, a divide-and-conquer approach could be used to

manage the complexity of identifying implicit gener-

alizations, splitting sets of entities by their schema,

for example.

ACKNOWLEDGEMENTS

This work was partially supported by national funds

under FCT project UID/CEC/50021/2013 and the

project TT-MDD-Mindbury/2015.

REFERENCES

Alalﬁ, M. H., Cordy, J. R., and Dean, T. R. (2008).

SQL2XMI: Reverse engineering of uml-er diagrams

from relational database schemas. In 15th Working

Conference on Reverse Engineering, pages 187–191.

IEEE.

Arnold, R. S. (1993). Software reengineering. IEEE Com-

puter Society Press.

Bruneliere, H., Cabot, J., Dupé, G., and Madiot, F.

(2014). Modisco: A model driven reverse engineer-

ing framework. Information and Software Technology,

56(8):1012–1032.

Chiang, R. H., Barron, T. M., and Storey, V. C. (1994). Re-

verse engineering of relational databases: Extraction

of an eer model from a relational database. Data &

Knowledge Engineering, 12(2):107–142.

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

206

Chikofsky, E. J. and Cross, J. H. (1990). Reverse engineer-

ing and design recovery: A taxonomy. IEEE software,

7(1):13–17.

Cleve, A., Meurisse, J.-R., and Hainaut, J.-L. (2011a).

Database semantics recovery through analysis of dy-

namic sql statements. In Journal on data semantics

XV, pages 130–157. Springer.

Cleve, A., Noughi, N., and Hainaut, J.-L. (2011b). Dy-

namic program analysis for database reverse engineer-

ing. In International Summer School on Generative

and Transformational Techniques in Software Engi-

neering, pages 297–321. Springer.

da Silva, A. R. (2003). The XIS approach and principles. In

29th Euromicro Conference, pages 33–40. IEEE.

da Silva, A. R. (2015). Model-driven engineering: A survey

supported by the uniﬁed conceptual model. Computer

Languages, Systems & Structures, 43:139–155.

de Almeida Ferreira, D. and da Silva, A. R. (2012).

RSLingo: An information extraction approach

toward formal requirements speciﬁcations. In

Model-Driven Requirements Engineering Workshop

(MoDRE), pages 39–48. IEEE.

de Almeida Ferreira, D. and da Silva, A. R. (2013). RSL-

IL: An interlingua for formally documenting require-

ments. In Model-Driven Requirements Engineering

(MoDRE), pages 40–49. IEEE.

De Lucia, A., Francese, R., Scanniello, G., and Tortora, G.

(2008). Developing legacy system migration methods

and tools for technology transfer. Software: practice

& experience, 38(13):1333.

Dean, J. and Ghemawat, S. (2008). Mapreduce: simpliﬁed

data processing on large clusters. Communications of

the ACM, 51(1):107–113.

Di Lucca, G. A., Fasolino, A. R., and De Carlini, U. (2000).

Recovering class diagrams from data-intensive legacy

systems. In Software Maintenance, 2000. Pro-

ceedings. International Conference on, pages 52–63.

IEEE.

Díaz, O., Puente, G., Izquierdo, J. L. C., and Molina, J. G.

(2013). Harvesting models from web 2.0 databases.

Software & Systems Modeling, 12(1):15–34.

Filipe, P., Ribeiro, A., and da Silva, A. R. (2016). XIS-

CMS: Towards a model-driven approach for develop-

ing platform-independent cms-speciﬁc modules. In

MODELSWARD’2016. SCITEPRESS.

Hainaut, J.-L., Englebert, V., Henrard, J., Hick, J.-M., and

Roland, D. (1994). Database evolution: the DB-

MAIN approach. In International Conference on Con-

ceptual Modeling, pages 112–131. Springer.

Izquierdo, J. L. C., Cuadrado, J. S., and Molina, J. G.

(2008). Gra2mol: A domain speciﬁc transformation

language for bridging grammarware to modelware

in software modernization. In Workshop on Model-

Driven Software Evolution.

Pannurat, N., Kerdprasop, N., and Kerdprasop, K. (2010).

Database reverse engineering based on association

rule mining. arXiv preprint arXiv:1004.3272.

Petit, J.-M., Kouloumdjian, J., Boulicaut, J.-F., and

Toumani, F. (1994). Using queries to improve

database reverse engineering. In International Con-

ference on Conceptual Modeling, pages 369–386.

Springer.

Polo, M., García-Rodríguez, I., and Piattini, M. (2007).

An mda-based approach for database re-engineering.

Journal of Software Maintenance and Evolution: Re-

search and Practice, 19(6):383–417.

Premerlani, W. J. and Blaha, M. R. (1993). An approach for

reverse engineering of relational databases. In Reverse

Engineering, 1993., Proceedings of Working Confer-

ence on, pages 151–160. IEEE.

Ramdoyal, R., Cleve, A., and Hainaut, J.-L. (2010). Reverse

engineering user interfaces for interactive database

conceptual analysis. In International Conference on

Advanced Information Systems Engineering, pages

332–347. Springer.

Ribeiro, A. and da Silva, A. R. (2014a). Evaluation of XIS-

Mobile, a domain speciﬁc language for mobile appli-

cation development. Journal of Software Engineering

and Applications, 7(11):906.

Ribeiro, A. and da Silva, A. R. (2014b). XIS-Mobile: a dsl

for mobile applications. In 29th Annual ACM Sympo-

sium on Applied Computing, pages 1316–1323. ACM.

Ruiz, F. et al. (2016). An approach for model-driven data

reengineering. PhD dissertation, University of Mur-

cia.

Seixas, J. (2016). The XIS-Web Technology: A Model-

Driven Development Approach for Responsive Web

Applications. MSc dissertation, Instituto Superior

Técnico, University of Lisbon.

XIS-Reverse: A Model-driven Reverse Engineering Approach for Legacy Information Systems

207