An MDE Approach to Generate Schemas for Object-document Mappers

Diego Sevilla Ruiz, Severino Feliciano Morales and Jes

us Garc

ıa-Molina

Faculty of Computer Science, University of Murcia, Campus Espinardo, Murcia, Spain

Keywords:

NoSQL Databases, NOSQL Data Engineering, Code Generation, MDE Solution, Object-document Mappers.

Abstract:

Most NoSQL systems are schemaless. This lack of schema offers a greater ﬂexibility than relational systems.

However, this comes at the cost of losing beneﬁts such as the static checking that assure that stored data

conforms to the database schema. Instead, developers must be in charge of this task that is time-consuming

and error-prone. Object-NoSQL mappers are emerging to alleviate this task and facilitate the development of

NoSQL applications. These mappers allow the deﬁnition of schemas, which are used to assure that data are

correctly manipulated. In this article, we present an MDE approach that automatically generates schemas and

other artefacts for mappers. As proof of concept, we have considered Mongoose, which is a widely used map-

per for MongoDB, but the solution is mapper-independent. Firstly, the schemas are inferred from stored data

by using an approach deﬁned in a previous work. Then, NoSQL schema models are input to a two-step model

transformation chain that generates schemas, validators and updating procedures, among other Mongoose arte-

facts. An intermediate metamodel has been deﬁned to ease the implementation of the transformations. This

work shows how MDE techniques can be applied in the emerging “NoSQL Data Engineering” ﬁeld.

1 INTRODUCTION

Interest in NoSQL (Not only SQL) databases has

steadily grown over the last decade. Modern appli-

cations (e.g., social media, Internet of Things, mo-

bile apps, and Big Data) have evidenced the limi-

tations of traditional relational systems to meet the

scalability and performance demands of these ap-

plications. A large number of companies have al-

ready embraced NoSQL databases, and the adop-

tion will rise considerably in next years, as reported

in (Dataversity, 2015; NoSQL-Market, 2016). The

“nosql-database.org” website shows a list of 225 ex-

isting NoSQL systems. Actually, the NOSQL term

refers to a varied set of data modelling paradigms

aimed to manage semi-structured and unstructured

data. The major NoSQL categories are: document,

wide column and key-value stores, and graph-based

databases. Except for graph databases, the paradigms

aim to represent semi-structured data using reference

and aggregation (Sadalage and Fowler, 2012). Mon-

goDB (MongoDB, 2016) is the most widely used

NoSQL system (NoSQL Ranking, 2016).

Most NoSQL systems are schemaless. This is a

signiﬁcant difference with respect to relational sys-

tems which need the deﬁnition of a schema to store

data. Schemaless provides ﬂexibility to manage data,

for example, different versions of a data entity can be

stored, and migrations are easier (Fowler, 2013). This

characteristic is probably considered the most inter-

esting and attractive of NoSQL systems. However,

it should be noted that a data schema is often con-

venient, because database applications require know-

ing the data organization in order to manage data efﬁ-

ciently. In schemaless databases, the schema is in the

minds of developers, and implicitly represented in the

code and stored data. Therefore, two alternatives are

emerging for NoSQL database applications: (i) com-

bining the schemaless approach with mechanisms that

guarantee a correct access to data (e.g. data valida-

tors) (Fowler, 2013), and (ii) using mappers which

converts NoSQL data into objects of a programming

language. Most current mappers are for document

stores (Object-document mappers, ODMs), and Mon-

goose (Mongoose, 2016) is the most widely used

ODM, created for MongoDB and Javascript. How-

ever, ODMs for other languages, such as Java and

PHP, are also available.

Some recent works have drawn attention to the

need for NoSQL tools just as they exist for relational

databases. A Dataversity report (Dataversity, 2015)

has remarked that data modeling will be a crucial

activity for NoSQL databases. The authors identi-

ﬁed three main capabilities to be offered by NoSQL

modelling tools: model visualization, metadata man-

agement, and code generation. The emergence of a

220

Sevilla D., Feliciano S. and GarcÃ a-Molina J.

An MDE Approach to Generate Schemas for Object-document Mappers.

DOI: 10.5220/0006279102200228

In Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2017), pages 220-228

ISBN: 978-989-758-210-3

NoSQL data engineering, which would be a R&D

area similar to relational data engineering, was sug-

gested in (Sevilla Ruiz et al., 2015b).

Models and model transformations are key ele-

ments in data engineering. Data schemas are mod-

els, and operations on them can be implemented using

model transformations. Therefore MDE techniques

can be very useful to develop NoSQL tools which as-

sist developers. In this article, we will present a MDE-

based solution to automate the usage of ODMs when

the database already exists. We have deﬁned a chain

of model transformations that generate some of the

main artefacts involved in any ODM: schema deﬁni-

tion, and validators, among others. This chain has as

input schema models inferred by means of the strat-

egy presented in (Sevilla Ruiz et al., 2015a). These

schema models are transformed into models that con-

form to a EntityDifferentiation metamodel, deﬁned in

this work to facilitate the generation of code. Finally,

the EntityDifferentiation models are used to generate

mapper code. Our solution deals with the existence

of more than one version for data entities, which im-

poses an additional complexity; conversely, it cap-

tures all the variability in the data base.

Therefore the contribution of our work is twofold.

We present an application of the inference process de-

ﬁned in (Sevilla Ruiz et al., 2015a), and we show how

MDE techniques are useful in the “NoSQL data en-

gineering” area, more speciﬁcally in automating the

usage of ODM mappers.

This article has been organized as follows. The

following Section presents an overview of the pro-

posal and introduces some basic concepts. Section 3

explains how schema models are transformed into En-

tityDifferentiation models. Section 4 describes the

generation of Mongoose schemas in detail. Section 5

shows the DSL deﬁned to create parameter models

and describes the generation of more Mongoose arte-

facts. Finally, related work, conclusions, and further

work are discussed.

2 OVERVIEW OF THE

APPROACH

NoSQL Schemas for Aggregation-oriented Data

Models. As indicated above, most NoSQL database

systems do not require the deﬁnition of an explicit

schema, but it is implicit in the data. In the case of

NoSQL databases that are based on an aggregation-

oriented data model, the schema is basically formed

by a set of entities connected through two types of

relationships: aggregation (a.k.a. part of) and ref-

erence. Each entity has one or more attributes that

are speciﬁed by their names and data types. These

data types can be either a primitive data type (number,

boolean, char, string) or some kind of collection (e.g.

tuples in MongoDB) that stores primitive type values.

A remarkable feature of these schemas is the possible

existence of several versions of an entity, as the ab-

sence of an explicit schema allows the non-uniformity

of the stored data. These versions can arise due to the

need of having different variants of an entity or due

to changes made during the evolution of data. Since

this article focuses on MongoDB, we will use the term

document to refer to the instances of an entity, and

ﬁeld to refer to properties (attributes, aggregates, and

references).

Entities (and therefore entity versions) can be

of three kinds: (i) root of a aggregation hierarchy;

(ii) aggregated to a root or other aggregated entity,

and (iii) leaf that are does not aggregate any entity.

An entity version is characterized by a set of ﬁelds

that can be of three kinds: (i) Common to all entity

versions (i.e. they are part of the all documents of

the entity); (ii) Shared with other entity versions; and

(iii) speciﬁc to particular entity version.

The Proposed Solution. In (Sevilla Ruiz et al.,

2015a) we presented an MDE approach that imple-

ments a reverse engineering strategy to infer the im-

plicit schema in NoSQL databases, and we outlined

how the inferred model could be used to develop

database utilities. Figure 1 shows the metamodel de-

ﬁned to represent NoSQL schemas. Compared with

the related work (Wang et al., 2015; Klettke et al.,

2015), the main novelty of our approach is discover-

ing all the versions of the inferred entities and their

relationships (i.e. composition and reference). Here,

we shall show how the inferred schemas are useful

to automate the usage of object-document mappers.

We have considered Mongoose for MongoDB, but

the solution presented is applicable to other object-

document mappers.

We have deﬁned a two-step model transforma-

tion chain which has as input an inferred NoSQL

schema model, and generates Javascript code of Mon-

goose artefacts. The generated artefacts are mainly

the database schema and validators. Figure 2 shows

this generation process, which will be explained in

detail in the next Sections. The ﬁrst step of the

chain is a model-to-model (m2m) transformation that

represents the properties of entity versions in a way

that facilitates the code generation. The model ob-

tained conforms to the EntityDifferentiation meta-

model which will be explained in Section 3. The sec-

ond step is a model-to-text (m2t) transformation that

generates Mongoose code from the model obtained

An MDE Approach to Generate Schemas for Object-document Mappers

221

Figure 1: NoSQL Schema Metamodel.

in the previous step. The ﬁrst transformation has

been implemented in Java, while the model-to-text

has been written in Xtend (Xtend, 2016). They can

be downloaded from https://github.com/catedrasaes-

umu/NoSQLDataEngineering.

Database Example. JSON is the format commonly

used to store/read data into/from document databases.

Figure 3 shows the JSON documents of the Movie

database example used in this article to illustrate

our process. This database includes two collections:

Movie and Director. We have supposed that ini-

tially a Movie document had four mandatory ﬁelds:

title, year, director and genre, and an optional ﬁeld

prizes. The ﬁeld criticisms was added later. Each Di-

rector document has the mandatory ﬁelds name and

director-movies, and the actor-movies optional ﬁeld.

The Criticism documents have the mandatory ﬁelds

content, journalist, media, and the url optional ﬁeld.

The Prize documents have the ﬁelds year, event, and

name.

According to the terminology introduced above,

the schema is formed by two root entities (Movie and

Director), and two aggregated entities which are leaf

entities (Prize and Criticism). Three entity versions

are identiﬁed for movies: Movie1 (has prizes and crit-

icisms), Movie2 (has criticisms but not prizes) and

Movie4 (neither prizes nor criticisms). Note that ti-

tle, year, director, and genre are common ﬁelds for

Movie documents. We can observe two entity ver-

sions for Director and Criticism and a single version

for Prize.

Figure 4 shows the schema inferred for the Movie1

root version entity by using a notation similar to

UML class diagrams. It is worth noting that we

have implemented a m2t transformation to generate

these diagrams that show the properties (attributes,

aggregations, and references) of a root entity. This

transformation traverses an input NoSQL schema

model and generates the corresponding PlantUML

code for all the entities involved. PlantUML (Plan-

tUML, 2016) is a generator of UML diagrams built

on GraphViz (GraphViz, 2016); it provides a simple

and intuitive textual language to express UML dia-

grams. In the case of relationships, the possibility of

a recursive composition have been considered by our

m2t transformation.

Object-document Mappers: Mongoose. When

database systems (e.g. relational systems) require

the deﬁnition of a schema that speciﬁes the struc-

ture of the stored data, a static checking assures that

only data that ﬁts the schema can be manipulated

in application code, and mistakes made by develop-

ers in accessing data are statically spotted. However,

schemaless databases entail developers to guarantee

the correct access to data. This is an error-prone

task, more so when the existence of several versions

of each entity is possible. Therefore, some database

utilities are emerging in order to alleviate this task.

Object-NoSQL mappers are probably the more use-

ful of these new tools. Like object-relational map-

pers, these mappers provide transparent persistence

and perform a mapping between stored data and ap-

plication objects. This requires that developers de-

ﬁne a data schema, e.g. by using JSON (Mongoose,

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

222

Schema

inference

NoSQL

Schema

metamodel

NoSQL

Schema

model

Version

diﬀerentiation

Entity

Diﬀerentiation

model

Mongoose

artefacts

generation

Entity

Diﬀerentiation

metamodel

- Database schema

- Data validators

- Discriminators

- Update functions

<<conforms>>

ODM Parameter

model

ODM Parameter

metamodel

<<conforms>>

<<m2t>>

Database

<<m2m>>

Figure 2: Overview of the Proposed MDE Solution.

[{

" mo v ie " : [

{

" typ e " : " mo v ie " ,

" ti t le " : " C it i ze n Ka ne ",

" yea r " : 1 9 41 ,

" di re cto r_ id " : " 1 23 4 51 " ,

" ge n re " : " D ram a " ,

" _id " : "1" ,

" pr i ze s " : [

{

" yea r " : 1 9 41 ,

" ev e nt " : " O sca r " ,

" na m es " : [

" Bes t scr e en pl a y " ,

" Bes t Writ ing "

]

{

" yea r " : 1 9 41 ,

" ev e nt " : " NY Fi l m Cr iti cs " ,

" na m es " : [ " Bes t S cr ee n pl ay " ]

" cr it i ci sm s ": [

{ " j ou rna li s t " : " R . Br ody " ,

" me d ia " : " T h e Ne w Yo rk e r ",

" co l or " : " g ree n "

}

]

{

" typ e " : " mo v ie " ,

" ti t le " : " T h e Ma n W ho W oul d Be Ki ng ",

" yea r " : 1 9 75 ,

" di re cto r_ id " : " 9 28 6 72 " ,

" ge n re " : " A dv en t ur es " ,

" _id " : "2"

{

" _id " : "4" ,

" typ e " : " mo v ie " ,

" ti t le " : " T rut h " ,

" yea r " : 2 0 14 ,

" di re cto r_ id " : " 3 45 6 79 " ,

" ge n re " : " D ram a " ,

" cr it i ci sm s ": [

{

" jo ur n al is t ": " J ordi Co sta " ,

" me d ia " : " El pa i s ",

" url " : " h t tp : // el pai s . c om /" ,

" co l or " : " r e d "

}

]

}] ,

" di rec to r " : [

{

" nam e " : " Or s on W e ll e s ",

" d i re ct ed _ mo vi es " : [ " 1 " , "5" ] ,

" a c to r_ m ov ie s " : [ "1 " , " 5 " ] ,

" typ e " : " di rec to r " ,

" _id " : " 123 451 "

{

" typ e " : " di rec to r " ,

" d i re ct ed _ mo vi es " : [ " 4 " ] ,

" nam e " : " Ja m es V an d er bil t ",

" _id " : " 345 679 "

{

" typ e " : " di rec to r " ,

" d i re ct ed _ mo vi es " : [ " 2 " ] ,

" nam e " : " Joh n H u st o n ",

" _id " : " 928 672 "

}

]

}]

Figure 3: JSON Documents in the Database Example.

2016), annotations (Doctrine, 2016), or a domain spe-

ciﬁc language (Mandango, 2016). Most of these map-

pers are Object-document mappers (ODM) because

document-based databases (mainly MongoDB

) are

the most widespread NoSQL systems. It is worth not-

ing that developers have two alternatives in building

Actually MongoDB uses BSON (Binary JSON), a vari-

ation of JSON with optimized binary storage and some

added data types.

NoSQL database applications. They can work in a

schemaless way or use an ODM mapper, by decid-

ing on the trade-offs between ﬂexibility and safety:

they could prefer not having the restrictions posed by

schemas or either avoid the data validation.

Mongoose is the de facto standard for deﬁning

schemas for MongoDB when writting Javascript ap-

plications. With Mongoose, database schemas can be

deﬁned as Javascript JSON objects, and then applica-

tions “can interact with MongoDB data in a structured

An MDE Approach to Generate Schemas for Object-document Mappers

223

Figure 4: Version Schema for the Movie1 Entity Version.

and repeatable way” (Holmes, 2013). Since JSON is

a subset of the object literal notation of JavaScript,

the Mongoose schemas are really Javascript code. We

will show some examples of schema deﬁnitions in

Section 4.

3 GENERATING ENTITY

DIFFERENTIATION MODELS

Generating artefacts to manage the different entities

and entity versions often have to differentiate between

the properties common to all entity versions and other

properties speciﬁc of a given entity version, as de-

scribed in Section 2.

This may seem a trivial task, but some subtleties

that will be addressed in this Section made it easier to

take this process as separate of the artefact generation

process, and also made the generation process itself

easier. Thus, the Entity Differentiation Metamodel

(shown in Figure 5) was created. Instances of this

model are obtained via a model-to-model transforma-

tion from the NoSQLSchema metamodel. Note that

the Entity Differentiation metamodel has references

to the elements of the NoSQLSchema metamodel.

Figure 5: Entity Differentiation Metamodel.

The rationale behind this metamodel lies in two

facts of the inference process:

1. The inference process is complete, that is, all doc-

uments in the database are considered, and their

different entity versions recorded. Each document

of the database belongs exactly to one entity ver-

sion.

2. In order to differentiate between versions of a

given entity, only properties speciﬁc of the given

entity version need to be considered.

As seen in the NoSQLSchema metamodel, each

entity has a set of entity versions. Each entity ver-

sion has a set of properties, that in turn have a name

and a type.

The transformation generates several set of inter-

esting properties: For each Entity, an EntityDiffSpec

model element is generated. This speciﬁcation holds

a set of common properties across all the versions,

commonProps, and a set of differentiation set of prop-

erties for each version, entityVersionProps.

Common properties are those properties that are

present (with the same name and type) in all the entity

versions of a given entity. Conversely, the set of spe-

ciﬁc properties (entityVersionProps) for a given entity

version is composed of the properties that are present

in this entity version, but are not present in all other

entity versions.

Properties in the Entity Differentiation metamodel

are linked through the PropertySpec class. This class

includes an attribute, needsTypeCheck, to signal when

a discrimination cannot be made just using the name

of the property. For instance, if two entity versions

share a property with the same name but with different

type, the fact that a document has a property with that

name cannot be used to discriminate between these

two entity versions: a type check must be performed.

So, the needsTypeCheck attribute is set for the prop-

erties that appear in any other entity version with the

same name but with different type.

An excerpt of the generated model for the

database example can be seen in the Figure 6.

4 GENERATING MONGOOSE

SCHEMAS

A Mongoose schema deﬁnes the structure of stored

data into a MongoDB collection. In document

databases, such as MongoDB, there is a collection for

each root entity. In our database example there would

be two collections: Movie and Director. Therefore, a

Mongoose schema should be deﬁned for each of the

two collections. Such schemas are the key element of

Mongoose, and other mechanisms are deﬁned based

on them, such as validators, discriminators, or index

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

224

Figure 6: Excerpt of the EntityDifferentiation Model for the Example.

building. In this Section, we shall explain the process

of generating schemas by means of the m2t transfor-

mation indicated in Section 2. Figure 7 shows the

generated schemas for the example database.

Aggregations and references can be speciﬁed in

Mongoose schemas. An aggregation is expressed as

an nested document which deﬁnes the schema of the

aggregated entity. A reference is expressed by means

of the ref option in the deﬁnition of the type of an at-

tribute. In addition to the type (i.e. a primitive type

as ObjectID, Number or String), the ref option is

used to indicate the name of a model of the referenced

schema. In Figure 7, the Movie schema aggregates

schemas for Prize (prizes ﬁeld) and Criticism (criti-

cisms ﬁeld), and includes a reference to Director doc-

uments stored in the Director collection (director id

ﬁeld). The ref option is used by Mongoose to ease

the use of references in queries.

Note that we are assuming a scenario in which

a company wants to use Mongoose for an existing

database in order to take advantage of its facilities

to check that data are correctly managed. Then, our

tool would infer the database schema and generate

code facilitating the use of the mapper. In generating

schemas, we had to consider the existence of entity

versions. For this, we have used the require valida-

tor that Mongoose provides to specify that a value

for a particular ﬁeld must always be given to save

documents of a schema. Speciﬁcations of ﬁelds that

are common to all the versions of an entity includes

the validator requires:true to guarantee that any docu-

ment stored of the entity will include these attributes.

To work with a particular entity version, developers

should add to the schema require restrictions for each

of the speciﬁc ﬁelds of the version. In the schema of

Figure 7, the Movie schema includes the require for

the four common ﬁelds. Once the schema is deﬁned,

two require restrictions are added for the prizes and

criticisms ﬁelds, which are speciﬁc to the Movie1 en-

tity version.

The transformation works as follows to generate

database schemas: A schema is generated for each

EntityDiffSpec connected to a root entity. For each

of them, its common and entity version properties are

added to the generated schema, but the require option

is added only for common properties. For each aggre-

gate property, an external declaration of type is added

to improve the legibility of the schema. A model is

created for schemas referenced from other schemas,

which is needed to add the ref option in the declara-

tion of the reference property. Note that this strategy

will recursively operate because the existence of ag-

gregate and reference properties.

5 GENERATING OTHER

MONGOOSE ARTEFACTS

In addition to the schema deﬁnitions and the manage-

ment of references between documents, Mongoose

provides functionality to facilitate the development of

MongoDB applications, such as validators, discrimi-

nators, and index speciﬁcation. To automatically gen-

An MDE Approach to Generate Schemas for Object-document Mappers

225

// M ovi e S ch e ma

var c ri ti ci s ms Sc he m a = {

co l or : {t y pe : S tri ng ,

enu m : [ ’g r een ’ , ’ yellow ’, ’ red ’ ],

re qui red : tr u e},

jo ur n al ist :{typ e : S t ri n g , u n iq u e : tru e ,

re qui red : tr u e},

me d ia :{t y pe : S tri ng , r eq u ir ed : t rue},

url : S tr i ng

}

var p ri zes Sc he m a = {

ev e nt :{t y pe : S tri ng , r eq u ir ed : t rue},

na m es :{t y pe : [ St r in g ] , r eq u ir ed : t rue},

yea r : {ty pe : N u mb e r , r equ ire d : tr ue}

}

var m ov i eS ch e ma = n e w m on goo se . S c hem a ({

ti t le :{t y pe : St rin g , m ax l en gth : 40 ,

un i qu e : t rue , r eq uir ed : tr u e},

_id : {type : S tri ng , i ndex : t rue ,

re qui red : t rue},

yea r : {ty pe : Nu m ber , in dex : tr ue ,

re qui red : t rue},

typ e : {ty pe : St r ing , r e qu ire d : true},

di re c to r_ i d : {t y pe : S tri ng , r eq u ir e d : t r ue ,

ref : ’ D ire c tor ’},

ge n re : {t y pe : St rin g ,

enu m : [ ’d r ama ’ , ’ come d y ’ , ’ chi ldre n ’ ] ,

re qui red : t rue},

cr it i ci sms : {ty p e : cr it ic i sm sS ch e ma },

pr i ze s : {ty pe : pr iz e sS ch e ma }

},{co lle ct i on : ’ Movie ’}) ;

// a dd re qui red f or Mo v ie 1 en ti t y ve rsi on

mo vi e Sc he m a . pat h (’ cr iti c ism s ’) . r equ ire d () ;

mo vi e Sc he m a . pat h (’ pri z es ’) . r e qu ire d () ;

var M ovi e = mo ngo os e . m ode l (’ Movie ’ , m ovi eS che ma ) ;

// a dd Di rec to r 1 s c hem a re fe r en ce d b y M ov i e1

var d ir ec to r Sc he ma = new m ong oo s e . Sc h em a ({

_id : {typ e : S tri ng , i nde x : t rue ,

re qui red : t rue},

nam e : {typ e : S tri ng , u ni q ue : t rue ,

re qui red : t rue},

typ e : {typ e : S tri ng , r eq uir ed : tr u e},

ac to r _m ov i es : {typ e : S tri ng ,

ref : ’ M o vie ’},

di re ct e d_ mo vi e s : {t ype : St rin g ,

re qui red : t rue ,

ref : ’ M o vie ’}

},{co lle ct i on : ’ Dir ecto r ’});

// a dd f o r D ir ect or 1 e n ti t y Ve rsi on

di re cto rS ch e ma . p ath ( ’ act or_ m ov i es ’) . re q ui red () ;

var D ir e ct or = m ong oos e . mod el (’ D irec t or ’ ,

di re cto rS ch e ma ) ;

Figure 7: Generated Mongoose Schema.

erate Mongoose code involved in all these mecha-

nisms, we have created a domain-speciﬁc language

(DSL) aimed to specify the information needed for

such generation. This DSL is named ODM Parameter

Language and it is independent of a concrete mapper

technology. Figure 8 shows an example of speciﬁ-

cation for the entities of our database example. This

DSL has been created with the Xtext (Xtext, 2016),

and models are obtained by means of the parser gen-

erated by this tool. These DSL models are input to

the m2t transformation that generates Mongoose arte-

facts from EntityDifferentiation models, as shown in

Figure 2.

In Mongoose, the validation is deﬁned at the

schema level. Some frequently used validators are

already built-in. The require and unique validators

can be applied to any property. As explained in pre-

vious Section, we have used require to specify what

properties are common to all the versions. The unique

validator is used to express that all the documents of

a collection must have a different value for a ﬁeld

of primitive type. Other examples of validators are

min and max for Number ﬁelds, and enum, minlength

and maxlength for String ﬁelds. Indexes are also de-

ﬁned at schema level, for instance an index can be

speciﬁed with the index option or the unique validator

(which also implies the creation of an index). Ex-

amples of use of these validators are shown in the

schema in Figure 7. For instance an enumeration is

deﬁned for the color ﬁeld of Criticism and the title

ﬁeld of Movie is unique. These validators have been

generated from the information provided by the DSL

speciﬁcation shown in Figure 8.

Our schema inference mechanism cannot discover

the decisions behind a version entity. As indicated

above, version variation can be caused by different

reasons, such as requirement changes, non-uniform

data, or custom ﬁelds in entities. We have used the

required validator to specify which ﬁelds are part of

a particular entity version. However, Mongoose pro-

vides the discriminator mechanism to have collections

of non-uniform data types. This mechanism would

be more appropriate than the require option for non-

uniform data. For instance, a MovieTheaters could

database: single screen or multiplexed theaters. The

name, city, country ﬁelds would be common, but the

roomNumber ﬁeld would be only part of multiplexed

theaters. Figure 8 shows how to declare a discrimina-

tor for an entity, MovieTheater in the example. The

generated code is shown in Figure 9.

Mongoose provides update() helper methods, but

they do not apply validators, so the code to perform

updating must be written following three steps (ﬁnd-

update-save). We also automate the generation of this

code. For instance, in Figure 10 we show the code

generated for updating the genre ﬁeld of the Movie

schema.

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

226

Figure 8: Speciﬁcation Example with the ODM Parameter Language.

var o pt i on s = { d is cr i mi na to r Ke y : ’ kin d ’ } ;

var mo vi e Th ea te rS c he ma = new mo n go ose . S che ma (

{ nam e : Str i ng , c ity : S t ring , c o un t ry : St r in g } ,

op t io ns ) ;

var M ov ie Th e at er 1 = m o ng o os e . mod e l ( ’ M ov ie T he at er1 ’ ,

th ea t er Sc he m a ) ;

var M ov ie Th e at er 2 = M o vi eT he a te r1 . d is cr i mi na t or (

’ Mo vi e Th ea t er 2 ’ ,

new m on g oo se . Sc hem a ({ r oo m Nu mbe r : N umb e r }, o p ti o ns ) );

Figure 9: MovieTheater schema using discriminator.

function u pd at e _g en re ( query , a Gen r e ) {

Mo v ie . fi ndO ne (

query ,

function ( err , mo v ie ) {

if (! e r r ) {

mo v ie . ge n re = a G en r e ;

mo v ie . sav e (function ( err , us e r ) {

co nso le . log ( ’ M ovi e s ave d : ’, m ovi e ) ;

}) ;

}

);

}

Figure 10: Code generated for updating the genre ﬁeld of

the Movie schema.

6 CONCLUSIONS AND RELATED

WORK

The Dataversity report (Dataversity, 2015) evidenced

the necessity of building tools to support the devel-

opment of NoSQL applications. This demands a

great effort of industry and academia in the emerging

NoSQL Data engineering ﬁeld. Tools with a function-

ality similar to those offered for relational databases

are required, specially tools for (i) generating code,

(ii) model visualization, and (iii) metadata manage-

ment.

Research effort in this ﬁeld has been very limited,

and it has been mainly focused on the inference of

schemas from stored data (Sevilla Ruiz et al., 2015a;

Wang et al., 2015; Klettke et al., 2015). In fact,

some existing tools for relational design are being

extended to provide NoSQL inferred schema visual-

ization (ER-Studio, 2016; DbSchema, 2016; ERWin,

2016). Moreover, NoSQL systems are offering tools

for viewing, analyzing, and querying stored data, as

Compass for MongoDB (Compass, 2016). Some kind

of data analysis is also performed in (Klettke et al.,

2015) where outliers are identiﬁed.

An MDE approach to generate code to manipulate

GraphDB graph databases from conceptual schemas

expressed in UML/OCL (Daniel et al., 2016). The

authors have deﬁned a GraphDB metamodel for graph

databases and a mapping from UML class diagrams to

this metamodel.

It is worth noting that the existence of version

entities (and therefore versioned schemas) and rela-

tionships among entities has been only considered

in (Sevilla Ruiz et al., 2015a) and (Wang et al.,

2015). We can take advantage of this fact in build-

ing utilities from inferred schemas, as we have shown

in (Hern

andez et al., 2016) and in the proposal pre-

sented in this article. We have recently presented a

tool of visualization for document databases, which

offers several diagrams and perspectives to show

NoSQL versioned schemas (Hern

andez et al., 2016).

In our knowledge, the work presented here is

the ﬁrst approach for automating the use of ODM

An MDE Approach to Generate Schemas for Object-document Mappers

227

mappers for existing databases. It is a technology-

independent solution, and as a proof of concept it has

been applied to Mongoose. We have been able to gen-

erate schemas and artefacts for the different function-

ality provided by Mongoose, such as validators, dis-

criminators, and reference management. The solution

presented has shown the usefulness of deﬁning inter-

mediate metamodels.

With regard to future work, we plan to consider

more mappers for MongoDB and for other databases,

and to build a generative architecture to automate the

building of Mongoose-based MEAN applications for

existing databases. Moreover, we are working on the

deﬁnition of a DSL aimed to specify the necessary

steps to convert one version of a database object to

another version. This could be used in at least two

ways: (i) A new application may require that all the

recovered objects comply with a new version; (ii) In

the case of a batch database migration, Map-Reduce

jobs could be generated to transform old version ob-

jects into new versions.

ACKNOWLEDGEMENTS

This work has been partially supported by the

atedra SAES of the University of Murcia

(http://www.catedrasaes.org), a research lab spon-

sored by the SAES company (http://www.electronica-

submarina.com/).

REFERENCES

Compass (2016). Mongodb Compass Web Page. https://

www.mongodb.com/products/compass. Accessed:

November 2016.

Daniel, G., Suny

e, G., and Cabot, J. (2016). UML-

toGraphDB: Mapping Conceptual Schemas to Graph

Databases, pages 430–444. Springer International

Publishing, Cham.

Dataversity (2015). Insights into NoSQL Modeling Report.

DbSchema (2016). DbSchema Web Page. http://

www.dbschema.com. Accessed: November 2016.

Doctrine (2016). Doctrine Web Page. http://docs.

doctrine-project.org/projects/doctrine-mongodb-odm/

en/latest/. Accessed: November 2016.

ER-Studio (2016). ER-Studio Web Page. https://

www.idera.com/er-studio-enterprise-data-modeling-

and-architecture-tools. Accessed: November 2016.

ERWin (2016). CA ERwin Web Page. http://erwin.com/

products/data-modeler. Accessed: November 2016.

Fowler, M. (2013). Schemaless Data Structures. http://

martinfowler.com/articles/schemaless/.

GraphViz (2016). GraphViz Web Page. http://

www.graphviz.org/. Accessed: November 2016.

Hern

andez, A., Sevilla Ruiz, D., and Garc

ıa-Molina,

J. (2016). Visualization of Inferred Versioned

Schemas from NoSQL Databases. SiriusCon 2016.

http://www.slideshare.net/Obeo

corp/siriuscon2016-

visualization-of-inferred-versioned-schemas-from-

nosql-database

Holmes, S. (2013). Mongoose for Application Develop-

ment. PACKT Publishing.

Klettke, M., Scherzinger, S., and St

orl, U. (2015). Schema

Extraction and Structural Outlier Detection for JSON-

based NoSQL Data Stores. In BTW.

Mandango (2016). Mandango Web Page. https://

mandango.org/. Accessed: November 2016.

MongoDB (2016). MongoDB Web Page. https://

www.mongodb.com/. Accessed: November 2016.

Mongoose (2016). Mongoose Web Page. http://

mongoosejs.com. Accessed: November 2016.

NoSQL-Market (2016). NoSQL Market. https://

www.alliedmarketresearch.com/NoSQL-market. Ac-

cessed: November 2016.

NoSQL Ranking (2016). NoSQL Ranking. http://

db-engines.com/en/ranking. Accessed: November

2016.

PlantUML (2016). PlantUML Web Page. http://

plantuml.com. Accessed: November 2016.

Sadalage, P. and Fowler, M. (2012). NoSQL Distilled. A

Brief Guide to the Emerging World of Polyglot Persis-

tence. Addison-Wesley.

Sevilla Ruiz, D., Feliciano Morales, S., and Garc

ıa Molina,

J. (2015a). Inferring Versioned Schemas from NoSQL

Databases and its Applications. In ER, pages 467–

480.

Sevilla Ruiz, D., Feliciano Morales, S., and Garc

ıa Molina,

J. (2015b). Model Driven NoSQL Data Engineering.

In JISBD, Santander.

Wang, L., Hassanzadeh, O., Zhang, S., Shi, J., Jiao, L., Zou,

J., and Wang, C. (2015). Schema Management for

Document Stores. In VLDB Endowment, volume 8.

Xtend (2016). Xtend Main Web Page. http://

www.eclipse.org/xtend/. Accessed: May 2016.

Xtext (2016). Xtext Main Web Page. http://

www.eclipse.org/Xtext/. Accessed: November 2016.

MODELSWARD 2017 - 5th International Conference on Model-Driven Engineering and Software Development

228