Translation of Relational and Non-relational Databases into RDF with xR2RML

Franck Michel, Loïc Djimenou, Catherine Faron-Zucker and Johan Montagnat

Univ. Nice Sophia Antipolis, CNRS, I3S, UMR 7271, Nice, France

Keywords:

Linked Data, RDF, R2RML, NoSQL.

Abstract:

With the growing amount of data being continuously produced, it is crucial to come up with solutions to

expose data from ever more heterogeneous databases (e.g. NoSQL systems) as linked data. In this paper

we present xR2RML, a language designed to describe the mapping of various types of databases to RDF.

xR2RML ﬂexibly adapts to heterogeneous query languages and data models while remaining free from any

speciﬁc language or syntax. It extends R2RML, the W3C recommendation for the mapping of relational

databases to RDF, and relies on RML for the handling of various data representation formats. We analyse data

models of several modern databases as well as the format in which query results are returned, and we show

that xR2RML can translate any data element within such results into RDF, relying on existing languages such

as XPath and JSONPath if needed. We illustrate some features of xR2RML such as the generation of RDF

collections and containers, and the ability to deal with mixed content.

1 INTRODUCTION

The web of data is now emerging through the pub-

lication and interlinking of various open data sets

in RDF. Initiatives such as the W3C Data Activity

and the Linking Open Data (LOD) project

aim at

Web-scale data integration and processing, assuming

that making heterogeneous data available in a common

machine-readable format should create opportunities

for novel applications and services. Their success

largely depends on the ability to reach data from the

deep web (He et al., 2007), a part of the web content

consisting of documents and databases hardly linked

with other data sources and hardly indexed by stan-

dard search engines. Furthermore, the integration of

heterogeneous data sources is a major challenge in

several domains (Field et al., 2013). As data seman-

tics is often poorly captured in database schemas, or

encoded in application logics, data integration tech-

niques have to capture and expose database semantics

in an explicit and machine-readable manner.

Footnotes: W3C Data Activity, http://www.w3.org/2013/data/; Linking Open Data project, http://linkeddata.org/.

The deep web keeps on growing as data is continuously being accumulated in ever more heterogeneous databases. In particular, NoSQL systems have gained remarkable success in recent years. Driven by major web companies, they have been developed to meet requirements of web 2.0 services that relational databases (RDB) could not achieve (flexible schema,

high throughput, high availability, horizontal elasticity on commodity hardware). Thus, NoSQL systems should be considered as potentially major contributors to linked open data. Other types of databases have been developed over time, either for generic purposes or for specific domains, such as XML databases (notably used in publishing and digital humanities), object-oriented databases or directory-based databases.

Signiﬁcant efforts have been invested in the def-

inition of methods to translate various kinds of data

sources into RDF. R2RML (Das et al., 2012), for in-

stance, is the W3C recommendation to describe RDB-

to-RDF mappings. RML extends R2RML for the in-

tegration of heterogeneous data formats (Dimou et al.,

2014a), but it does not address the constraints that

arise when dealing with different types of databases

and query languages. In particular, to our knowledge,

no method has been proposed yet to tackle NoSQL-

to-RDF translation.

In this paper, we present xR2RML, a mapping lan-

guage designed as an extension of R2RML and RML.

Besides relational databases, xR2RML addresses the

mapping of a large and extensible scope of non-

relational databases to RDF. It is designed to ﬂex-

ibly adapt to various data models and query lan-

guages. xR2RML can translate data with mixed for-

mats and generate RDF collections and containers.

Our primary focus includes some NoSQL and XML native databases but the approach can equally apply to other types of database such as object-oriented and directory-based databases.

Michel F., Djimenou L., Faron-Zucker C. and Montagnat J.

Translation of Relational and Non-relational Databases into RDF with xR2RML.

DOI: 10.5220/0005448304430454

In Proceedings of the 11th International Conference on Web Information Systems and Technologies (WEBIST-2015), pages 443-454

ISBN: 978-989-758-106-9

© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

In the rest of this section we draw a picture

of other works related to the translation of various

data sources to RDF, and we scope the objectives of

xR2RML. Section 2 explores in more detail the capabilities required to reach these goals. In section 3 we summarize the main characteristics of R2RML and RML, and in section 4 we describe xR2RML specific extensions. Section 5 presents a working implementation of the language. Finally, sections 6 and 7 discuss the applicability of xR2RML in different contexts and conclude by outlining some perspectives.

1.1 Related Works

Wrapper-based data integration systems like Garlic

(Roth and Schwartz, 1997) and SQL/MED (Melton

et al., 2002) generally have similar architectures: a

global data model is described using speciﬁc mod-

elling languages (e.g. Garlic’s GDL), a query feder-

ation engine handles user queries expressed in terms

of a global data model and determines a query plan, a

per-data source wrapper implements a speciﬁc wrap-

per interface and performs the mapping with the data

source schema. No guideline is provided as to how a

wrapper should describe and implement the mapping.

The same global architecture holds in data inte-

gration systems based on semantic web technologies.

Existing works focus on efﬁcient query planning and

distribution, such as FedX (Schwarte et al., 2011),

Anapsid (Acosta et al., 2011) and KGRAM-DQP

(Gaignard, 2013). The global data model is expressed

by domain ontologies using common languages, e.g.

RDFS or OWL. User queries, expressed in terms

of the domain ontologies, are written in SPARQL.

SPARQL is also used as the wrapper interface. Each

data source wrapper is a SPARQL endpoint that per-

forms the schema mapping with the source schema.

Our work, as well as most related works listed be-

low, focuses on the mapping step: the rationale is to

standardize the schema mapping description, so that a

mapping description can be written once and applied with

different wrapper implementations.

RDB-to-RDF mapping has been an active ﬁeld of

research during the last ten years (Spanos et al., 2012;

Sequeda et al., 2011; Michel et al., 2014b). Sev-

eral mapping methods and languages have been pro-

posed over time, based either on the materialization

of RDF data sets or on the SPARQL-based access

to relational data. Published in 2012, R2RML, the W3C RDB-to-RDF mapping language recommendation, has reached a notable consensus (see the list of implementations at http://www.w3.org/2001/sw/rdb2rdf/wiki/Implementations).

Similarly, various solutions exist to map XML

data to RDF. The XSPARQL (Bischof et al., 2012)

query language combines XQuery and SPARQL

for bidirectional transformations between XML and

RDF. Several other solutions are based on the XSLT

technology such as XML Scissor-lift (Fennell, 2014)

that describes mapping rules in Schematron XML val-

idation language, and AstroGrid-D (Breitling, 2009).

SPARQL2XQuery (Bikakis et al., 2013) applies XML

Schema to RDF/OWL translation rules.

Much work has already been accomplished re-

garding the translation of CSV, TSV and spreadsheets

to RDF. Tools have been developed such as XLWrap (Langegger and Wöß, 2009) and RDF Refine. The Linked CSV format is a proposition to embed metadata in a CSV file, which makes it easy to link on the Web and eventually to translate to RDF or JSON. However, this approach assumes that CSV data be made compliant with the format in the first place, before it can be translated to RDF. The CSV on the Web W3C Working Group, created in 2014, intends to propose

a recommendation for the description of and access

to CSV data on the Web. In this context, RDF is one

of the formats envisaged either to represent metadata

about CSV data, or as a format to translate CSV data

into.

Several tools are designed as frameworks for the

integration of sources with heterogeneous data for-

mats. XSPARQL, cited above, provides an R2RML-

compliant extension. Thus it can simultaneously

translate relational, XML and RDF data to XML or

RDF. TARQL is a SPARQL-based mapping language that can convert from RDF, CSV/TSV and

JSON formats to RDF, but it does not focus on how

the data is retrieved from different types of databases.

Datalift (Scharffe et al., 2012) provides an integrated

set of tools for the publication in RDF of raw struc-

tured data (RDB, CSV, XML) and the interlinking of

resulting data sets.

RML (Dimou et al., 2014b; Dimou et al., 2014a)

is an extension of R2RML that tackles the mapping

of data sources with heterogeneous data formats such

as CSV/TSV, XML or JSON. Most approaches cre-

ate links between data sets after they were trans-

lated to RDF, e.g. using properties rdfs:seeAlso or

owl:sameAs. This is sometimes not adequate as log-

ical resources having different identiﬁers in different

data sets cannot easily be reconciled. RML creates

linked data sets at mapping time by enabling the simultaneous mapping of multiple data sources, thus allowing for cross-references between resources defined in various data sources. However, RML does not investigate the constraints that arise when dealing with different types of databases. It proposes a solution to reference data elements within query results using expressive languages such as XPath and JSONPath, but it does not clearly distinguish between such languages and the actual query language of a database. In some cases they might be the same, e.g. XPath can be used to query an XML native database, and later on to reference data elements from query results. But in the general case, the query language and the language used to reference elements within query results must be dissociated, e.g. NoSQL document stores have proprietary query languages, while their results are JSON documents that can be evaluated against JSONPath expressions. Furthermore, RML explicitly refers to known evaluation languages (ql:JSONPath, ql:XPath). In this context, supporting a new evaluation language requires changing the mapping language definition. To achieve more flexibility, we believe that such characteristics should be implementation-dependent, leaving the mapping language free from any explicit dependency.

Footnotes: RDF Refine, http://refine.deri.ie/; Linked CSV, http://jenit.github.io/linked-csv/; CSV on the Web Working Group, http://www.w3.org/2013/csvw/wiki; TARQL, https://github.com/cygri/tarql/wiki/TARQL-Mapping-Language.

1.2 Objectives of this Work

The works presented in section 1.1 address various

types of data sources. Some of them could be ex-

tended to new data sources by developing ad-hoc ex-

tensions, although they are generally not designed to

easily support new data models and query languages.

Only RML comes with this ﬂexibility as its design

aims at adapting to new data models. Our goal with

xR2RML is to deﬁne a generic mapping language

able to equally apply to most common relational and

non-relational databases. We focus specifically on NoSQL and XML native databases, and we argue

that our work can be generalized to some other types

of database, for instance object-oriented and directory

(LDAP) databases. In section 2 we explore the capa-

bilities required by xR2RML to reach these goals.

2 xR2RML LANGUAGE

REQUIREMENTS

Different kinds of databases typically differ in several

aspects: the query language used to retrieve data, the

data model that underlies the data structures retrieved

and the cross-data referencing scheme, if any. Below

we explore in further detail the capabilities that we

want xR2RML to provide.

Query Languages. The landscape of modern

database systems shows a vast diversity of query

languages. Relational databases generally support

ANSI SQL, and most native XML databases sup-

port XPath and XQuery. By contrast, NoSQL is

a catch-all term referring to very diverse systems

(Hecht and Jablonski, 2011; Gajendran, 2013). They

have heterogeneous access methods ranging from

low-level APIs to expressive query languages. Despite several propositions of a common query language (N1QL, UnQL, SQL++ (Ong et al., 2014), ArangoDB QL, CloudMdsQL (Kolev et al., 2014)), no consensus that would fit most NoSQL databases has emerged yet. Therefore, until a standard eventually arises, xR2RML must be agile enough to cope with various query languages and protocols in a transparent manner.

Data Models. Similarly to the case of query lan-

guages, we observe a large heterogeneity in data mod-

els of modern databases. To describe their translation

to RDF, a mapping language must be able to reference any data element from their data models. Below we list the most common data models, briefly analyse the formats in which data is retrieved, and work out how a mapping language can reference data elements within retrieved data.

Relational databases comply with a row-based

model in which column names uniquely reference

cells in a row. NoSQL extensible column stores

also comply with the row-based model, with the dif-

ference that all rows do not necessarily share the

same columns. For such systems, referencing data

elements is simply achieved using column names.

Other non-relational systems, such as XML native

databases, NoSQL key-value stores, document stores

or graph stores, have heterogeneous data models that

can hardly be reduced to a row-based model:

- In databases relying on a speciﬁc data representa-

tion format like JSON (notably in NoSQL document

stores) and XML, data is stored and retrieved as doc-

uments consisting of tree-like compound values. Ref-

erencing data elements within such documents can be

achieved thanks to languages such as JSONPath and

XPath.

- Object-oriented databases conventionally provide

methods to serialize objects, typically as key-value as-

sociations: keys are attribute names while values are

objects (composition or aggregation relationship), or

compound values (collection, map, etc). Serialization

is typically done in XML or JSON, thus here again we can apply XPath or JSONPath expressions.

(Footnotes: N1QL, http://www.couchbase.com/communities/n1ql; UnQL, http://unql.sqlite.org/index.html; ArangoDB QL, http://docs.arangodb.org/Aql/README.html; extensible column stores are also known as column family stores, column-oriented stores, etc.)

- A directory data model is organised as a tree: each

node has an identiﬁer and a set of attributes repre-

sented as name=value. Each entry retrieved from an

LDAP request is named using an LDAP path expres-

sion, e.g. cn=Franck Michel,ou=cnrs,o=fr. Refer-

encing data elements within such entries can be sim-

ply achieved using attribute names.

- In graph databases, the abstract data model basically

consists of nodes and edges. Query capabilities generally make it possible to retrieve either values matching certain

patterns (like the SPARQL SELECT clause), or a set

of nodes and edges representing a result graph (like

the SPARQL CONSTRUCT clause). Whatever the

type of result though, graph databases commonly pro-

vide APIs to manipulate query results. For instance a

SPARQL SELECT result set has a row-based format:

each row of a result set consists of columns typically

named after query variable names. The Neo4J graph

database provides a JDBC interface to process a query

result, and its REST interface returns result graphs as

JSON documents. Thus, although a graph may be a somewhat complex data structure, query results can be

fairly easy to manipulate using well-known formats:

a row-column model, a serialization in JSON or some

other representation syntax, etc.

Finally, the way a mapping language can refer-

ence data elements within query results depends more

on the API capabilities than the data model itself. To

be effective, xR2RML must transparently accept any

type of data element reference expression. This in-

cludes a column name (applicable not only to row-

based data models but also to any row-based query

result), JSONPath, XPath or LDAP path expressions,

etc. An xR2RML processing engine must be able to

evaluate such expressions against query results, but

the mapping language itself must remain free from

any reference to speciﬁc expression syntaxes.
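The separation argued for above can be sketched as a small registry of evaluators, chosen by the processing engine rather than named in the mapping. This is only an illustrative sketch: the registry keys ("Column", "JSONPath"), the function names and the tiny JSONPath subset are assumptions made for the example and are not part of xR2RML or of any existing implementation.

```python
from typing import Any, Callable, Dict, List


def eval_column(entry: Dict[str, Any], reference: str) -> List[Any]:
    """Column-name reference evaluated against one row of a row-based result."""
    return [entry[reference]] if reference in entry else []


def eval_simple_jsonpath(entry: Any, reference: str) -> List[Any]:
    """Evaluate a tiny JSONPath subset: '$', '.field' and '.*' steps only."""
    if not reference.startswith("$"):
        raise ValueError("expected an expression starting with '$'")
    nodes = [entry]
    for step in reference[1:].split("."):
        if not step:
            continue  # skip the empty string before the first dot
        selected = []
        for node in nodes:
            if step == "*":
                if isinstance(node, list):
                    selected.extend(node)
                elif isinstance(node, dict):
                    selected.extend(node.values())
            elif isinstance(node, dict) and step in node:
                selected.append(node[step])
        nodes = selected
    return nodes


# Hypothetical engine-side registry: supporting a new syntax means adding an
# entry here, with no change to the mapping language itself.
EVALUATORS: Dict[str, Callable[[Any, str], List[Any]]] = {
    "Column": eval_column,
    "JSONPath": eval_simple_jsonpath,
}


def evaluate(entry: Any, reference: str, formulation: str = "Column") -> List[Any]:
    """Dispatch a data element reference to the evaluator for the given syntax."""
    return EVALUATORS[formulation](entry, reference)
```

With this design, the mapping only carries the reference string, while the choice of evaluator is a configuration detail of the engine.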

Collections. Many data models support the represen-

tation of collections: these can be sets, arrays or maps

of all kinds (sorted or not sorted, with or without du-

plicates, etc.). Although the RDF data model supports

such data structures, to the best of our knowledge, ex-

isting mapping languages do not allow for the produc-

tion of RDF collections (rdf:List) nor RDF contain-

ers (rdf:Bag, rdf:Seq, rdf:Alt), except TARQL that

is able to convert a JSON array into an rdf:List. In

all other cases, structured values such as collections

or key-value associations are ﬂattened into multiple

RDF triples. Listing 1 is an example XML collection

consisting of two “movie” elements.

Its translation into two triples is illustrated in Listing

2. Assuming that the order of “movie” elements im-

plicitly represents the chronological order in which

movies were shot, triples in Listing 2 lose this infor-

mation. Using an RDF sequence may be more appro-

priate in this case, as illustrated in Listing 3.

<director name="Woody Allen">
    <movie>Annie Hall</movie>
    <movie>Manhattan</movie>
</director>

Listing 1: Example of XML collection.

<http://example.org/dir/Woody%20Allen>
    ex:directed "Annie Hall".
<http://example.org/dir/Woody%20Allen>
    ex:directed "Manhattan".

Listing 2: Translation to multiple RDF triples.

<http://example.org/dir/Woody%20Allen>
    ex:movieList [ a rdf:Seq;
        rdf:_1 "Annie Hall";
        rdf:_2 "Manhattan" ].

Listing 3: Translation to an RDF sequence.

Consequently, to map heterogeneous data to RDF

while preserving concepts such as collections, bags,

alternates or sequences, xR2RML must be able to

map data elements to RDF collections and containers.
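The difference between the flattened translation and the container-based translation can be made concrete with a short sketch that processes the XML of Listing 1 with the Python standard library. Triples are represented as plain tuples, and the blank node label "_:seq1" as well as the function names are assumptions chosen for illustration; this is not how an xR2RML processor is required to work.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

XML = (
    '<director name="Woody Allen">'
    "<movie>Annie Hall</movie><movie>Manhattan</movie>"
    "</director>"
)


def director_iri(name):
    """Build the director IRI, percent-encoding the name."""
    return "http://example.org/dir/" + quote(name)


def flatten(root):
    """One triple per movie: the document order of movies is not recorded."""
    subject = director_iri(root.get("name"))
    return [(subject, "ex:directed", m.text.strip()) for m in root.findall("movie")]


def as_sequence(root):
    """A single rdf:Seq whose rdf:_n members preserve document order."""
    subject = director_iri(root.get("name"))
    seq = "_:seq1"  # hypothetical blank node label
    triples = [(subject, "ex:movieList", seq), (seq, "rdf:type", "rdf:Seq")]
    for position, movie in enumerate(root.findall("movie"), start=1):
        triples.append((seq, f"rdf:_{position}", movie.text.strip()))
    return triples


root = ET.fromstring(XML)
flat_triples = flatten(root)
seq_triples = as_sequence(root)
```

In the flattened output the two triples form an unordered set, whereas the rdf:_1 and rdf:_2 membership properties keep the position of each movie explicit.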

Cross-references. Cross-references are commonly

implemented as foreign key constraints in relational

data models, or aggregation and composition relation-

ships in object-oriented models. Cross-referencing

is even the primary goal of graph-based databases.

More generally, it is possible to cross-reference log-

ical entities in any type of database. For instance, a

JSON document of a NoSQL document store may re-

fer to another document by its identiﬁer or any other

ﬁeld that identiﬁes it uniquely, even if this is generally

not recommended for performance reasons.

A cross-referenced logical resource may be

mapped alternatively as the subject or the object of

triples. This may entail joint queries between tables

or documents. Therefore, xR2RML must (i) allow a

modular description so that the mapping of a logical

resource can be written once and easily reused as a

subject or an object, and (ii) allow the description of

joint queries to retrieve cross-referenced logical re-

sources.

Summary. Finally, we draw up the list of key capa-

bilities expected from xR2RML as follows:

1. It makes it possible to describe the mapping of various relational and non-relational databases to RDF.

2. It is ﬂexible enough to allow for new databases,

query languages and data models in an agile manner:

supporting a new system, query language and/or data

model only requires changes in the implementation

(adaptor, plug-in, etc.), but no changes are required in

WEBIST2015-11thInternationalConferenceonWebInformationSystemsandTechnologies

446

the mapping language itself.

3. It enables the generation of RDF collections (rdf:List) or containers (rdf:Seq, rdf:Bag, rdf:Alt) from one-to-many relations modelled as compound values or as cross-references. RDF collections and containers can be nested.

4. It enables joint queries following cross-references between logical resources, and it allows the modular reuse of mapping definitions.

Taken the other way round, data sources to be

mapped to RDF using xR2RML need to fulﬁl some

requirements entailed by xR2RML’s capabilities:

1. The data source interface should provide a declar-

ative query language. If not, it must be possible to

fetch the whole data at once, like a CSV or XML ﬁle

returned by a Web service.

2. There must exist technical means to parse query

results and reference data elements: this ranges from

simple column names to expressive languages like

XPath.

3. In case of large data sets, the database interface

should provide ways to iterate on query results, simi-

larly to SQL cursors in RDBs.

Note that the last two requirements are standard features of most database systems.

To help in the design of xR2RML we chose to

leverage R2RML, a standard, well-adopted mapping

language for relational databases. R2RML already

provides some of the requirements listed above: mod-

ularity, management of cross-references, as well as

rich features such as the ability to deﬁne target named

graphs. To facilitate its understanding and adop-

tion, xR2RML is designed as a backward compatible

extension of R2RML. Besides, to address the map-

ping of heterogeneous data formats such as CSV/TSV,

XML and JSON, we leverage propositions of RML

that is itself an extension of R2RML.

3 R2RML AND RML

R2RML is a generic language meant to describe cus-

tomized mappings that translate data from a relational

database into an RDF data set. An R2RML mapping is expressed as an RDF graph written in Turtle syntax (http://www.w3.org/TR/turtle/). An R2RML mapping graph consists of triples maps, each one specifying how to map rows of a logical table to RDF triples. A triples map is composed of exactly one logical table (property rr:logicalTable), one subject map (property rr:subjectMap) and any number of predicate-object maps (property rr:predicateObjectMap). A logical table may be a table, an SQL view (property

rr:tableName), or the result of a valid SQL query

(property rr:sqlQuery). A predicate-object map con-

sists of predicate maps (property rr:predicateMap)

and object maps (property rr:objectMap). For each

row of the logical table, the subject map generates

a subject IRI, while each predicate-object map cre-

ates one or more predicate-object pairs. Triples are

produced by combining the subject IRI with each

predicate-object pair. Additionally, triples are gener-

ated either in the default graph or in a named graph

speciﬁed using graph maps (property rr:graphMap).

Subject, predicate, object and graph maps are all

R2RML term maps. A term map is a function that

generates RDF terms (either a literal, an IRI or a

blank node) from elements of a logical table row. A

term map must be exactly one of the following: a

constant-valued term map (property rr:constant) al-

ways generates the same value; a column-valued term

map (property rr:column) produces the value of a

given column in the current row; a template-valued

term map (property rr:template) builds a value from

a template string that references columns of the cur-

rent row.
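The three kinds of term maps can be pictured as functions from a row to a value. The sketch below is a simplified illustration under stated assumptions: function names and the sample row are made up, RDF term typing (IRI, literal, blank node) is ignored, and the percent-encoding of template values is approximated with urllib.parse.quote, which is stricter than R2RML's IRI-safe encoding for non-ASCII characters.

```python
import re
from urllib.parse import quote


def constant_term(value):
    """Constant-valued term map: always generates the same value."""
    return lambda row: value


def column_term(column):
    """Column-valued term map: the value of one column in the current row."""
    return lambda row: row[column]


def template_term(template):
    """Template-valued term map: replaces each {COLUMN} with the encoded value."""
    def generate(row):
        return re.sub(
            r"\{([^{}]+)\}",
            lambda match: quote(str(row[match.group(1)]), safe=""),
            template,
        )
    return generate


# Illustrative row; the values are sample data, not taken from the paper.
row = {"NAME": "Woody Allen", "BIRTH_DATE": "1935-12-01"}

predicate = constant_term("ex:birthdate")(row)
obj = column_term("BIRTH_DATE")(row)
subject = template_term("http://example.org/dir/{NAME}")(row)
```

Combining the three values yields one triple per row, which mirrors how a subject map and a predicate-object map cooperate in a triples map.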

When a logical resource is cross-referenced, typi-

cally by means of a foreign key relationship, it may

be used as the subject of some triples and the ob-

ject of some others. In such cases, a referencing ob-

ject map uses IRIs produced by the subject map of a

(parent) triples map as the objects of triples produced

by another (child) triples map. In case both triples

maps do not share the same logical table, a joint

query must be performed. A join condition (property

rr:joinCondition) names the columns from the par-

ent and child triples maps, that must be joined (prop-

erties rr:parent and rr:child).
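The effect of a referencing object map with a join condition can be sketched as a hash join between child and parent rows. The tables, column names (MOVIES.DIRECTOR_ID, DIRECTORS.ID) and helper names below are hypothetical, introduced only to show the mechanism; an actual R2RML processor would typically delegate the join to the database through an SQL query.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Triple = Tuple[str, str, str]


def referencing_object_map(
    child_rows: Iterable[Row],
    parent_rows: Iterable[Row],
    child_column: str,
    parent_column: str,
    child_subject: Callable[[Row], str],
    parent_subject: Callable[[Row], str],
    predicate: str,
) -> List[Triple]:
    """Use parent subjects as objects of child triples where join columns match."""
    parents_by_key: Dict[Any, List[Row]] = {}
    for parent in parent_rows:
        parents_by_key.setdefault(parent[parent_column], []).append(parent)
    triples: List[Triple] = []
    for child in child_rows:
        for parent in parents_by_key.get(child[child_column], []):
            triples.append((child_subject(child), predicate, parent_subject(parent)))
    return triples


# Hypothetical relational data: MOVIES is the child table, DIRECTORS the parent.
MOVIES = [
    {"CODE": "Manh", "DIRECTOR_ID": 1},
    {"CODE": "m2046", "DIRECTOR_ID": 2},
    {"CODE": "Orphan", "DIRECTOR_ID": 99},  # no matching parent row
]
DIRECTORS = [{"ID": 1, "KEY": "woody-allen"}, {"ID": 2, "KEY": "wong-kar-wai"}]

joined = referencing_object_map(
    MOVIES,
    DIRECTORS,
    child_column="DIRECTOR_ID",
    parent_column="ID",
    child_subject=lambda r: "http://example.org/movie/" + r["CODE"],
    parent_subject=lambda r: "http://example.org/dir/" + r["KEY"],
    predicate="ex:directedBy",
)
```

Child rows without a matching parent row produce no triple, which corresponds to inner-join semantics.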

Below we provide a short illustrative example. Triples map <#R2RML_Directors> uses table DIRECTORS to create triples linking movie directors (whose IRIs are built from column NAME) with their birth date (column BIRTH_DATE).

<#R2RML_Directors>
    rr:logicalTable [
        rr:tableName "DIRECTORS"; ];
    rr:subjectMap [ rr:template
        "http://example.org/dir/{NAME}"; ];
    rr:predicateObjectMap [
        rr:predicate ex:birthdate;
        rr:objectMap [
            rr:column "BIRTH_DATE";
            rr:datatype xsd:date; ]; ].

Collection "directors":
{"name": "Woody Allen", "directed": ["Manhattan", "Interiors"]},
{"name": "Wong Kar-wai", "directed": ["2046", "In the Mood for Love"]}

Collection "movies":
{ "decade": "2000s", "movies": [
    {"name": "2046", "code": "m2046", "actors": ["T. Leung", "G. Li"]},
    {"name": "In the Mood for Love", "code": "Mood", "actors": ["M. Cheung"]} ] }
{ "decade": "1970s", "movies": [
    {"name": "Manhattan", "code": "Manh", "actors": ["Woody Allen", "Diane Keaton"]},
    {"name": "Interiors", "code": "Int01", "actors": ["D. Keaton", "G. Page"]} ] }

Listing 4: Example Database.

RML is an extension of R2RML that targets the simultaneous mapping of heterogeneous data sources with various data formats, in particular hierarchical data formats. An RML logical source (property rml:logicalSource) extends R2RML logical table and points to the data source (property rml:source): this may be a file on the local file system, or data returned from a web service for instance. A reference formulation (property

rml:referenceFormulation) names the syntax used to

reference data elements within the logical source. As

of today, possible values are ql:JSONPath for JSON

data, ql:XPath for XML data, and rr:SQL2008 for re-

lational databases. Data elements are referenced with

property rml:reference that extends rr:column. Its

object is an expression whose syntax matches the ref-

erence formulation. Similarly, the deﬁnition of prop-

erty rr:template is extended to allow such reference

expressions to be enclosed within curly braces (’{’

and ’}’). Below we provide an RML example. It is

very similar to the R2RML example above, with the

difference that data now comes from a JSON ﬁle “di-

rectors.json”.

<#RML_Directors>
    rml:logicalSource [
        rml:source "directors.json";
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$.*"; ];
    rr:subjectMap [ rr:template
        "http://example.org/dir/{$.*.name}";
    ];
    rr:predicateObjectMap [
        rr:predicate ex:birthdate;
        rr:objectMap [
            rml:reference "$.*.birthdate";
            rr:datatype xsd:date; ]; ].

4 THE xR2RML MAPPING

LANGUAGE

In this section we briefly describe the elements of the

xR2RML language. A complete speciﬁcation is pro-

vided in (Michel et al., 2014a). We illustrate the de-

scriptions with a running example: Listing 4 shows

JSON documents stored in a MongoDB database, in

two collections: a “directors” collection with doc-

uments on movie directors, and a “movies” collec-

tion in which movies are grouped in per-decade doc-

uments. Listing 5 shows an xR2RML mapping graph

to translate those documents into RDF. Director IRIs

are built using director names, while IRIs of resources

representing movies use movie codes. We assume the

following namespace prefix definitions (the @prefix keyword is not displayed for readability):

xrr: <http://www.i3s.unice.fr/ns/xr2rml#>.

rr: <http://www.w3.org/ns/r2rml#>.

rml: <http://semweb.mmlab.be/ns/rml#>.

xsd: <http://www.w3.org/2001/XMLSchema#>.

ex: <http://example.com/ns#>.

4.1 Describing a Logical Source

To reach its genericity objective, xR2RML must avoid

explicitly referring to speciﬁc query languages or data

models. Keeping this in mind, we deﬁne logical

sources as a means to represent a data set from any

kind of database. In conformance with R2RML prin-

ciples, we keep database connection details out of the

scope of the mapping language. In RML on the other

hand, a logical source points to the data to be mapped

typically using a ﬁle URL (property rml:source).

This difference makes it difﬁcult for xR2RML to

extend RML’s logical source concept. Instead,

xR2RML extends the R2RML logical table while

commonalities are addressed by using or extending

some RML properties (rml:referenceFormulation,

rml:query, rml:iterator).

xR2RML triples maps extend R2RML triples

maps by referencing a logical source (property

xrr:logicalSource) which is the result of a request

applied to the input database. It is either an xR2RML

base table or an xR2RML view. The xR2RML base

table extends the concept of R2RML table or view to

tabular databases beyond relational databases (exten-

sible column store, CSV/TSV, etc.). It refers to a table

by its name (property rr:tableName). An xR2RML

view represents the result of executing a query against

the input database. It has exactly one xrr:query property that extends RML property rml:query (which itself extends rr:sqlQuery). Its value is a valid expression with regards to the query language supported by the input database. No assumption is made whatsoever as to the query language used.

<#Movies>
    xrr:logicalSource [
        xrr:query "db.movies.find({decade:{$exists:true}})";
        rml:iterator "$.movies.*";
    ];
    rr:subjectMap [ rr:template "http://example.org/movie/{$.code}"; ];
    rr:predicateObjectMap [
        rr:predicate ex:starring;
        rr:objectMap [
            rr:termType xrr:RdfBag;
            xrr:reference "$.actors.*";
            xrr:nestedTermMap [ rr:datatype xsd:string; ]; ]; ].

<#Directors>
    xrr:logicalSource [ xrr:query "db.directors.find()"; ];
    rr:subjectMap [ rr:template "http://example.org/dir/{$.name}"; ];
    rr:predicateObjectMap [
        rr:predicate ex:directed;
        rr:objectMap [
            rr:parentTriplesMap <#Movies>;
            rr:joinCondition [
                rr:child "$.directed.*";
                rr:parent "$.name";
            ]; ]; ].

Listing 5: xR2RML Example Mapping Graph.

Reference Formulation. Retrieving values from a

query result set requires evaluating data element references against the query result, as discussed in section 2. Relational database APIs (such as JDBC drivers)

natively support the evaluation of a column name

against the current row of a result set. Conversely,

some databases come with simple APIs that provide

low-level evaluation features. For instance, APIs of

most NoSQL document stores return JSON docu-

ments but hardly support JSONPath. Therefore, the

responsibility of evaluating data element references

may fall back on the xR2RML processing engine. To

do so, it needs to know which syntax is being used.

To this end, RML introduced the reference formula-

tion concept (property rml:referenceFormulation of

a logical source) to name the syntax of data element

references. As underlined above, xR2RML adheres to

R2RML’s principle that database-speciﬁc details are

out of the scope of the mapping language. Moreover,

we want the mapping language to remain free from

explicit reference to speciﬁc syntaxes. As a result, we

amend the R2RML processor definition as follows: an xR2RML processor must be provided with a database connection and the reference formulation applicable to results of queries run against the connection. If the reference formulation is not provided, it defaults to column name, in order to ensure backward compatibility with R2RML.

Footnote: rml:query also subsumes rml:xmlQuery and rml:queryLanguage, although none of those properties are described or exemplified in the RML language specification and articles at the time of writing.
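As an illustration only, the following Python sketch shows how a processing engine might pick an evaluator from the reference formulation supplied with the database connection, and fall back to column names when none is given. The names EVALUATORS, eval_column, eval_jsonpath and evaluate are hypothetical, and the JSONPath support is deliberately limited to dotted field names and the `*` wildcard.

```python
def eval_column(row, ref):
    """Column-name reference: zero or one value from the current row."""
    return [row[ref]] if row.get(ref) is not None else []

def eval_jsonpath(doc, ref):
    """Tiny JSONPath subset (assumption): '$', '.field' and '.*' steps only."""
    nodes = [doc]
    for step in ref.lstrip("$").strip(".").split("."):
        if not step:
            continue
        nxt = []
        for n in nodes:
            if step == "*":
                if isinstance(n, dict):
                    nxt.extend(n.values())
                elif isinstance(n, list):
                    nxt.extend(n)
            elif isinstance(n, dict) and step in n:
                nxt.append(n[step])
        nodes = nxt
    return nodes

# Hypothetical registry keyed by the name of the reference formulation.
EVALUATORS = {"Column": eval_column, "JSONPath": eval_jsonpath}

def evaluate(entry, ref, reference_formulation=None):
    # Default to column names for backward compatibility with R2RML.
    return EVALUATORS[reference_formulation or "Column"](entry, ref)
```

With this layout, supporting another syntax such as XPath would only require registering one more evaluator, without changing the mapping itself.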

Iteration Model. In R2RML, the row-based itera-

tion occurs on a set of rows read from a logical ta-

ble. xR2RML applies this principle to other systems

returning row-based result sets: CSV/TSV ﬁles, ex-

tensible column stores, but also some graph databases

as underlined in 1.2, e.g. a SPARQL SELECT result

set is a table in which columns are named after the

variables in the SELECT clause. In the context of

non row-based result sets, the model is implicitly ex-

tended to a document-based iteration model: a docu-

ment is basically one entry of a result set returned by

the database, e.g. a JSON document retrieved from

a NoSQL document store, or an XML document retrieved from an XML native database. In the case of data sources whose access interface does not provide built-in iterators, e.g. a web service returning an XML response at once, a single iteration occurs on the whole retrieved document.

Yet, some specific needs may not be fulfilled. For instance, it may be necessary to iterate on explicitly specified entries of a JSON document or elements of

an XML tree. To this end, we leverage the concept

of iterator introduced in RML. An iterator (property

rml:iterator) speciﬁes the iteration pattern to apply

to data read from the input database. Its value is a

valid expression written using the syntax speciﬁed in

TranslationofRelationalandNon-relationalDatabasesintoRDFwithxR2RML

449

the reference formulation. The iterator can be either

omitted or empty when the reference formulation is a

column name.

Listing 5 presents two logical source deﬁnition ex-

amples. Both consist of a MongoDB query (property

xrr:query). We assume that the JSONPath reference

formulation is provided along with the database con-

nection. In collection “directors” (Listing 4), each

document describes exactly one director. By contrast,

in collection “movies” each document refers to sev-

eral movies grouped by decade. To avoid mixing up

multiple movies of a single document, an iterator with

JSONPath expression $.movies.* is associated with

triples map <#Movies>: thus, the triples map applies

separately on each movie of each document.
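The effect of the iterator can be sketched as follows in Python. This is a simplified illustration and not the prototype's code: iterate and jsonpath are hypothetical helpers, the JSONPath evaluator only handles dotted field names and `*`, and the sample document (including the code "m0001") is invented to mimic the layout described for collection "movies".

```python
def jsonpath(doc, expr):
    """Tiny JSONPath subset (assumption): '$', '.field' and '.*' steps only."""
    nodes = [doc]
    for step in expr.lstrip("$").strip(".").split("."):
        if not step:
            continue
        nxt = []
        for n in nodes:
            if step == "*":
                if isinstance(n, dict):
                    nxt.extend(n.values())
                elif isinstance(n, list):
                    nxt.extend(n)
            elif isinstance(n, dict) and step in n:
                nxt.append(n[step])
        nodes = nxt
    return nodes

def iterate(documents, iterator=None):
    """Yield one entry per triples-map iteration."""
    for doc in documents:
        if iterator:
            # Explicit iteration pattern: one iteration per matched element.
            yield from jsonpath(doc, iterator)
        else:
            # Default document-based model: one iteration per result document.
            yield doc

# Invented sample: one document grouping several movies of a decade.
docs = [{"decade": "2000s", "movies": [
    {"code": "m2046", "name": "2046"},
    {"code": "m0001", "name": "In the Mood for Love"},
]}]

subjects = ["http://example.org/movie/" + m["code"]
            for m in iterate(docs, "$.movies.*")]
```

Without the iterator, the single document would yield a single iteration and the two movies could not be told apart when building subject IRIs.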

4.2 Referencing Data Elements

In section 3 we have seen that RML properties

rml:reference and rr:template both allow data el-

ement references expressed according to the refer-

ence formulation (column name, XPath, JSONPath).

xR2RML uses these RML deﬁnitions as a starting

point to a broader set of use cases.

In real world use cases, databases commonly store

values written in a data format that they cannot inter-

pret. For instance, in key-value stores and in most

extensible column stores, values are stored as binary

objects whose content is opaque to the system. A de-

veloper may choose to embed JSON, CSV or XML

values in the column of a relational table, for performance reasons or due to application design constraints.

We call such cases mixed content.

xR2RML proposes to apply the principle of data element references defined in RML, and to extend it to allow referencing data elements within mixed content. An xR2RML mixed-syntax path consists of the concatenation of several path expressions, each path being enclosed in a syntax path constructor that makes the path syntax explicit. Existing constructors are: Column(), CSV(), TSV(), JSONPath() and XPath().

For example, in a relational table, a text column NAME stores JSON-formatted values containing people’s first and last names, e.g.: {"First":"John", "Last":"Smith"}. Field First can be referenced with the following mixed-syntax path: Column(NAME)/JSONPath($.First). An xR2RML

processing engine evaluates a mixed-syntax path from

left to right, passing the result of each path construc-

tor on to the next one. In this example, the ﬁrst path

retrieves the value associated with column NAME. Then

the value is passed on to the next path constructor that

evaluates JSONPath expression “$.First” against the

value. The resulting value is ﬁnally translated into an

RDF term according to the current term map deﬁni-

tion.
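The left-to-right evaluation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: eval_mixed_path and jsonpath are hypothetical names, only the Column() and JSONPath() constructors are handled, and the JSONPath evaluator supports dotted field names and `*` only.

```python
import json
import re

def jsonpath(doc, expr):
    """Tiny JSONPath subset (assumption): '$', '.field' and '.*' steps only."""
    nodes = [doc]
    for step in expr.lstrip("$").strip(".").split("."):
        if not step:
            continue
        nxt = []
        for n in nodes:
            if step == "*":
                if isinstance(n, dict):
                    nxt.extend(n.values())
                elif isinstance(n, list):
                    nxt.extend(n)
            elif isinstance(n, dict) and step in n:
                nxt.append(n[step])
        nodes = nxt
    return nodes

def eval_mixed_path(row, path):
    """Evaluate e.g. 'Column(NAME)/JSONPath($.First)' from left to right,
    passing the values produced by each constructor on to the next one."""
    values = [row]
    for ctor, expr in re.findall(r"(Column|JSONPath)\(([^)]*)\)", path):
        nxt = []
        for v in values:
            if ctor == "Column":
                nxt.append(v[expr])                       # read the cell value
            else:
                nxt.extend(jsonpath(json.loads(v), expr)) # parse, then evaluate
        values = nxt
    return values

row = {"NAME": '{"First": "John", "Last": "Smith"}'}
result = eval_mixed_path(row, "Column(NAME)/JSONPath($.First)")
```

Here the first constructor returns the raw text of column NAME, and the second one parses that text as JSON before evaluating `$.First` against it.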

xR2RML deﬁnes property xrr:reference as an

extension of RML property rml:reference, and ex-

tends the deﬁnition of property rr:template. Both

properties accept either simple references (illustrated

in Listing 5) or mixed-syntax path expressions.

4.3 Producing RDF Terms and (Nested)

RDF Collections/Containers

In a row-based logical source, a valid column name

reference returns zero or one value during each triples

map iteration. In turn an R2RML term map gener-

ates zero or one RDF term per iteration. By con-

trast, JSONPath and XPath expressions used with

properties xrr:reference and rr:template allow ad-

dressing multiple values. For instance, XPath expres-

sion //movie/name returns all <name> elements of all

<movie> elements. Therefore, reference-valued and

template-valued term maps can return multiple RDF

terms at once. This change entails the deﬁnition of

two strategies with regards to how triples maps com-

bine RDF terms to build triples: the Cartesian product

strategy, and the collection/container strategy.

Cartesian Product Strategy. During each iteration

of an xR2RML triples map, triples are generated as

the Cartesian product between RDF terms produced

by the subject map and each predicate-object pair.

Predicate-object pairs result from the Cartesian product between RDF terms produced by the predicate maps and object maps of each predicate-object map.

Like any other term map, a graph map may also pro-

duce multiple terms. The Cartesian product strategy

equally applies in that case, therefore triples are pro-

duced simultaneously in all target graphs correspond-

ing to the multiple RDF terms produced by the graph

map.
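A minimal Python sketch of the Cartesian product strategy for a single iteration is given below. The function name triples_for_iteration and the tuple layout (subject, predicate, object, graph) are assumptions made for the illustration; None stands for the default graph.

```python
from itertools import product

def triples_for_iteration(subjects, predicate_object_maps, graphs=(None,)):
    """Combine the terms produced during one triples-map iteration.

    predicate_object_maps is a list of (predicates, objects) pairs, one per
    predicate-object map; graphs holds the terms produced by the graph map.
    """
    for predicates, objects in predicate_object_maps:
        for g, s, p, o in product(graphs, subjects, predicates, objects):
            yield (s, p, o, g)

quads = list(triples_for_iteration(
    ["ex:m2046"],
    [(["ex:starring"], ["Tony Leung", "Gong Li"])],
))
```

With one subject, one predicate and two objects, two triples are produced; supplying two graph terms would produce each of them once per target graph.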

Collection/Container Strategy. Multiple values re-

turned by properties xrr:reference and rr:template

are combined into an RDF collection or container.

This is achieved using new xR2RML values of

the rr:termType property: a term map with term

type xrr:RdfList generates an RDF term of type

rdf:List, term type xrr:RdfSeq corresponds to

rdf:Seq, xrr:RdfBag to rdf:Bag and xrr:RdfAlt to

rdf:Alt. Listing 5 illustrates this use case. Instead

of generating multiple triples relating each movie to

one actor, triples map <#Movies> relates each movie

to a bag of actors starring in that movie. For instance:

<http://example.org/movie/m2046> ex:starring [
  a rdf:Bag;
  rdf:_1 "Tony Leung"; rdf:_2 "Gong Li" ].

At this point, two important needs must still be

WEBIST2015-11thInternationalConferenceonWebInformationSystemsandTechnologies

450

addressed in the collection/container strategy: (i) like

in a regular term map, it must be possible to assign

a term type, language tag or data type to the mem-

bers of an RDF collection or container; and (ii) it

must be possible to nest any number of RDF collections and containers inside each other. Both needs

are fulﬁlled using xR2RML Nested Term Maps. A

nested term map (property xrr:nestedTermMap) very

much resembles a regular term map, with the excep-

tion that it can be deﬁned only in the context of a term

map that produces RDF collections or containers.

In a column-valued or reference-valued term map, a

nested term map describes how to translate values

read from the logical source into RDF terms, by spec-

ifying optional properties rr:termType, rr:language

and rr:datatype. Similarly, in a template-valued

term map, a nested term map applies to values pro-

duced by applying the template string to input values.

Listing 5 illustrates the usage of nested term maps by the production of bags of literals representing actor names: the nested term map assigns each actor name an xsd:string datatype. For instance:

<http://example.org/movie/m2046> ex:starring [
  a rdf:Bag;
  rdf:_1 "Tony Leung"^^xsd:string;
  rdf:_2 "Gong Li"^^xsd:string ].

Finally, properties xrr:reference and rr:template

can be used within a nested term map to recursively

parse structured values while producing nested RDF

collections and containers.
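The collection/container strategy with a nested term map can be sketched in Python as follows. The helper to_container is hypothetical and only covers literal members with an optional datatype; it does not escape special characters in literals and does not handle nesting, language tags or IRIs.

```python
def to_container(values, container="rdf:Bag", datatype=None):
    """Serialize values as a Turtle term describing an RDF container or list."""
    suffix = "^^" + datatype if datatype else ""
    if container == "rdf:List":
        # Turtle has a dedicated syntax for RDF collections.
        return "( " + " ".join('"%s"%s' % (v, suffix) for v in values) + " )"
    # Containers (rdf:Bag, rdf:Seq, rdf:Alt) use membership properties rdf:_1, rdf:_2, ...
    members = "; ".join(
        'rdf:_%d "%s"%s' % (i, v, suffix) for i, v in enumerate(values, start=1))
    return "[ a %s; %s ]" % (container, members)

bag = to_container(["Tony Leung", "Gong Li"], datatype="xsd:string")
triple = "<http://example.org/movie/m2046> ex:starring %s ." % bag
```

The datatype argument plays the role of the nested term map of Listing 5: it applies to each member rather than to the container itself.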

4.4 Reference Relationships Between

Logical Sources

A cross-referenced logical resource usually serves as

the subject of some triples and the object of other

triples. In R2RML, this is achieved using a refer-

encing object map. xR2RML extends R2RML ref-

erencing object maps in two ways. Firstly, when a

joint query is needed (i.e. the parent and child triples maps do not share the same logical source), properties

rr:child and rr:parent of the join condition con-

tain data element references (4.2), possibly includ-

ing mixed-syntax paths. As underlined in section

4.3, such data element references may produce multi-

ple terms. Consequently, the equivalent joint query

of a referencing object map must deal with multi-

valued child and parent references. More precisely,

a join condition between two multi-valued references

should be satisﬁed if at least one data element of the

child reference matches one data element of the par-

ent reference. This is described in Deﬁnition 1 using

an SQL-like syntax and ﬁrst order logic for the de-

scription of WHERE conditions.

Deﬁnition 1: If a referencing object map has at

least one join condition, then its equivalent joint query

is:

SELECT * FROM (child-query) AS child,

(parent-query) AS parent WHERE

∃c1 ∈ eval(child, {child-ref1}),

∃p1 ∈ eval(parent, {parent-ref1}), c1 = p1

AND

∃c2 ∈ eval(child, {child-ref2}),

∃p2 ∈ eval(parent, {parent-ref2}), c2 = p2

AND ...

where “{child-ref_i}” and “{parent-ref_i}” are the child and parent references of the i-th join condition, and

“eval(child, {ref})” and “eval(parent, {ref})” are the

result of evaluating data element reference “{ref}”

on the result of the child and parent queries.

Listing 5 depicts a simple example: in triples

map <#Directors>, the object map uses movie IRIs

generated by parent triples map <#Movies>. When

processing director “Wong Kar-wai”, the child

reference ($.directed.*) returns values “2046” and

“In the Mood for Love”, while the parent reference

($.name) returns a single movie name. The condition

is satisﬁed if the parent reference returns one of

“2046” or “In the Mood for Love”. Generated triples

use movie codes to build movie IRIs, such as:

<http://example.org/dir/Wong%20Kar-wai>

ex:directed <http://example.org/movie/m2046>.
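The join semantics of Definition 1 can be sketched in Python as follows, for a database that offers no join support. The names join and jsonpath are hypothetical, the JSONPath evaluator is limited to dotted field names and `*`, compared values are assumed to be hashable (e.g. strings), and the second movie document is invented for the illustration.

```python
def jsonpath(doc, expr):
    """Tiny JSONPath subset (assumption): '$', '.field' and '.*' steps only."""
    nodes = [doc]
    for step in expr.lstrip("$").strip(".").split("."):
        if not step:
            continue
        nxt = []
        for n in nodes:
            if step == "*":
                if isinstance(n, dict):
                    nxt.extend(n.values())
                elif isinstance(n, list):
                    nxt.extend(n)
            elif isinstance(n, dict) and step in n:
                nxt.append(n[step])
        nodes = nxt
    return nodes

def join(children, parents, conditions):
    """Yield (child, parent) pairs satisfying every join condition: for each
    (child_ref, parent_ref), at least one child value equals a parent value."""
    for c in children:
        for p in parents:
            if all(set(jsonpath(c, cr)) & set(jsonpath(p, pr))
                   for cr, pr in conditions):
                yield c, p

directors = [{"name": "Wong Kar-wai",
              "directed": ["2046", "In the Mood for Love"]}]
movies = [{"code": "m2046", "name": "2046"},
          {"code": "m9999", "name": "Some other movie"}]  # second entry invented

pairs = list(join(directors, movies, [("$.directed.*", "$.name")]))
iris = ["http://example.org/movie/" + p["code"] for _, p in pairs]
```

The nested loop is the simplest possible join; an actual engine would likely index one side first, which is outside the scope of this sketch.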

Secondly, the objects produced by a referencing

object map can be grouped in an RDF collection

or container, instead of being the objects of multi-

ple triples. To do so, an xR2RML referencing ob-

ject map may have a rr:termType property with value

xrr:RdfList, xrr:RdfSeq, xrr:RdfBag or xrr:RdfAlt.

Results of the joint query are grouped by child value,

i.e. objects generated by the parent triples map, refer-

ring to the same child value, are grouped as members

of an RDF collection or container. An interesting con-

sequence of this use case is the ability, in the case of a

regular relational database, to build an RDF collection

or container reﬂecting a one-to-many relation.
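As a hedged illustration of this grouping step, the Python sketch below groups joint query results by child subject and serializes each group as an RDF collection in Turtle. The helpers group_by_child and as_rdf_list, as well as all IRIs in the sample rows, are invented for the example.

```python
def group_by_child(join_results):
    """Group objects generated by the parent triples map by child subject."""
    groups = {}
    for child_subject, parent_object in join_results:
        groups.setdefault(child_subject, []).append(parent_object)
    return groups

def as_rdf_list(subject, predicate, objects):
    """Serialize one group as a Turtle triple whose object is an RDF collection."""
    members = " ".join("<%s>" % o for o in objects)
    return "<%s> %s ( %s ) ." % (subject, predicate, members)

# Invented join results: (child subject IRI, IRI generated by the parent triples map).
rows = [
    ("http://example.org/dir/A", "http://example.org/movie/m1"),
    ("http://example.org/dir/A", "http://example.org/movie/m2"),
    ("http://example.org/dir/B", "http://example.org/movie/m3"),
]
lines = [as_rdf_list(s, "ex:directed", objs)
         for s, objs in group_by_child(rows).items()]
```

Each child value thus yields exactly one triple, whose object gathers all matching parent objects, which is how a one-to-many relation ends up as a single collection.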

5 IMPLEMENTATION AND

EVALUATION

To evaluate the effectiveness of xR2RML, we have developed an open source prototype implementation available on GitHub (https://github.com/frmichel/morph-xr2rml/). It is developed in Scala and based on Morph-RDB (Priyatna et al., 2014), an R2RML implementation that we have extended to support xR2RML specificities.

In a ﬁrst step, we upgraded Morph-RDB to sup-

port xR2RML features in the context of relational

databases. This included the support of logical

sources, mixed contents (JSON, XML, CSV or TSV

data embedded in cells) and RDF collections/contain-

ers. In a second step, we developed a connector to

the MongoDB document store, to translate MongoDB

JSON documents into RDF. A MongoDB shell query

string is speciﬁed in each triples map logical source

(property xrr:query). The connector executes the

query and iterates over result documents returned by

the database. Subsequently, results are passed to the

xR2RML processor that applies the optional iterator

(rml:iterator) and evaluates JSONPath expressions

in each xrr:reference and rr:template property of

all term maps. The support of RDF collections and

containers was validated, in particular in the case of

cross-references (referencing object map) that entail

a joint query between two JSON documents.

Software Architecture. The prototype architecture

derives from the initial Morph-RDB architecture. To

deal with the heterogeneity of databases, Morph fol-

lows the object factory design pattern. An abstract

runner factory class provides abstract methods to

build a runner, the core object that performs the

translation of an input database with regards to an

xR2RML mapping graph. A concrete runner fac-

tory class copes with database speciﬁcities through

a set of objects: (i) a generic connection wraps a

database connection; (ii) a query unfolder builds a

concrete query object reﬂecting each deﬁned triples

map; (iii) a data translator runs the query against the

database connection and generates triples according

to the triples map deﬁnitions; (iv) ﬁnally, a data mate-

rializer writes created triples into a target ﬁle accord-

ing to the chosen RDF serialization.
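The factory-based architecture can be pictured with the following Python sketch. It is not the prototype's Scala code: the class and method names (RunnerFactory, InMemoryFactory, connect, unfold, translate, materialize, run), the dictionary-based triples map and the sample data are all assumptions made to show how database-specific steps plug into a generic translation loop.

```python
from abc import ABC, abstractmethod

class RunnerFactory(ABC):
    """Abstract factory: subclasses supply the database-specific parts."""

    @abstractmethod
    def connect(self): ...                                   # generic connection

    @abstractmethod
    def unfold(self, triples_map): ...                       # query unfolder

    @abstractmethod
    def translate(self, connection, query, triples_map): ... # data translator

    def materialize(self, triples):
        # Data materializer: here, N-Triples-like lines collected in a list.
        return ["%s %s %s ." % t for t in triples]

    def run(self, mapping):
        """Apply all triples maps sequentially (data materialization)."""
        connection = self.connect()
        out = []
        for tm in mapping:
            query = self.unfold(tm)
            out += self.materialize(self.translate(connection, query, tm))
        return out

class InMemoryFactory(RunnerFactory):
    """Toy stand-in for a concrete factory (neither the RDB nor the MongoDB one)."""

    def __init__(self, data):
        self.data = data

    def connect(self):
        return self.data

    def unfold(self, triples_map):
        return triples_map["collection"]   # the 'query' is just a collection name

    def translate(self, connection, query, triples_map):
        for doc in connection[query]:
            yield ("<%s>" % triples_map["subject"].format(**doc),
                   triples_map["predicate"],
                   '"%s"' % doc[triples_map["field"]])

factory = InMemoryFactory({"directors": [{"id": "wkw", "name": "Wong Kar-wai"}]})
mapping = [{"collection": "directors", "subject": "http://example.org/dir/{id}",
            "predicate": "ex:name", "field": "name"}]
output = factory.run(mapping)
```

Supporting a new kind of database then amounts to writing another concrete factory, while the run loop stays unchanged.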

We currently provide two factory implementations: the RDB implementation extends the

original Morph-RDB code, while the new MongoDB

implementation relies on the MongoDB API and the

Jongo API for the management of MongoDB shell

queries. In the RDB context, the unfolder builds

an SQL query from the table name (logical table

definition), named columns (properties rr:column and

rr:template) and the optional join conditions (ref-

erencing object maps). In the MongoDB case, the

query string is provided in the mapping. Further-

more, since MongoDB does not support joint queries,

the xR2RML processing engine has to perform two

queries and join results afterwards. As a result, the unfolder is fairly simple: it checks the query string correctness and returns an appropriate API object.
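For the RDB case, the unfolding step can be approximated by the Python sketch below. It is a simplification under stated assumptions: unfold_sql is a hypothetical helper, the table and column names are invented, join conditions are ignored, and no identifier quoting is performed.

```python
import re

def unfold_sql(table, columns=(), templates=()):
    """Build a SELECT statement from a table name and the columns referenced
    by rr:column values and by {placeholders} in rr:template strings."""
    refs = list(columns)
    for template in templates:
        refs += re.findall(r"\{([^}]+)\}", template)
    unique = list(dict.fromkeys(refs))  # drop duplicates, keep first-seen order
    return "SELECT %s FROM %s" % (", ".join(unique), table)

sql = unfold_sql("MOVIE", ["TITLE", "CODE"], ["http://example.org/movie/{CODE}"])
```

By contrast, in the MongoDB case there is nothing to assemble, since the query string is taken verbatim from the mapping.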

Evaluation. We evaluated the prototype using two

simple databases: a MySQL relational database and

a MongoDB database with two collections. In both

cases, the data and associated xR2RML mappings

were written to cover most mapping situations ad-

dressed by xR2RML: strategies for handling multi-

ple RDF terms, JSONPath and XPath expressions,

mixed-syntax paths with mixed contents (relational,

JSON, XML, CSV/TSV), cross-references, production of RDF collections/containers, management of UTF-8 characters. A dump of both databases, as well as the example mappings, is available in the same GitHub repository. The prototype currently applies the data materialization approach, i.e. RDF data is generated by sequentially applying all triples maps. The query rewriting approach (SPARQL to database-specific query rewriting) may be considered in future work, as suggested in section 7. At the time of writing, the prototype has two limitations: (i) only

one level of RDF collections and containers can be

generated (no nested collections/containers), and (ii)

the result of a joint query in a relational database can-

not be translated into an RDF collection or container.

6 DISCUSSION

xR2RML relies on the assumption that databases

to translate into RDF provide a declarative query

language, such that queries can be expressed di-

rectly in a mapping description. This complies with

the equivalent assumption of R2RML that all RDBs

support ANSI SQL. However, this is somewhat restrictive. Some NoSQL key-value stores, like DynamoDB and Riak, have no declarative query language; instead, they provide APIs for usual programming languages to describe queries in an imperative manner. For xR2RML to work with those systems, a query language would have to be defined, along with a compiler that transforms queries into imperative code. Interestingly, this is already the case for some systems supporting the MapReduce programming model. MapReduce is conventionally supported through APIs for programming languages; however, more and more systems now propose an SQL or SQL-like query language on top of a MapReduce framework (e.g. Apache Hive). Queries are compiled into MapReduce jobs. This approach is often referred to as SQL-on-Hadoop (Floratou et al., 2014).

To achieve the targeted flexibility, xR2RML comes with features that are applicable independently of the type of database used. Nevertheless, probably not all features should be applied with all kinds of database. For instance, join conditions entail joint queries. Whereas RDBs are optimized to support joins very efficiently, it is not recommended to make cross-references within NoSQL document or extensible column stores, as this may lead to poor performance. Similarly, translating a JSON element into an RDF collection is quite straightforward, but translating the result of an SQL joint query into an RDF collection is likely to be quite inefficient. In other words, the fact that the language makes a mapping possible does not mean that it should be applied regardless of the context (database type, data model, query capabilities). Consequently, mapping designers should be aware of how databases work in order to write efficient mappings of big databases to RDF.

Like R2RML, xR2RML assumes that well-defined domain ontologies exist beforehand, whose classes and properties will be used to translate a data source into RDF triples. In the context of RDBs, an alternative approach, the Direct Mapping, translates relational data into RDF in a straightforward manner, by converting tables to classes and columns to properties (Sequeda et al., 2011; Arenas et al., 2012). The direct mapping comes up with an ad-hoc ontology that reflects the relational schema. R2RML implementations often provide a tool to automatically generate an R2RML direct mapping from the relational schema (e.g. Morph-RDB (de Medeiros et al., 2015)). The same principles could be extended to automatically generate an xR2RML mapping for other types of data source, as long as they comply with a schema: column names in CSV/TSV files and extensible column stores, XSD or DTD for XML data, JSON schema (http://json-schema.org/) or a JSON-LD (http://www.w3.org/TR/json-ld/) description for JSON data. Nevertheless, such schemas do not necessarily exist, and some databases like the DynamoDB key-value store are schemaless. In such cases, automatically generating an xR2RML direct mapping should involve different methods aimed at learning the database schema from the data itself.

More generally, how to automate the generation of xR2RML mappings may become a concern when mapping large and/or complex schemas. There exists significant work related to schema mapping and matching (Shvaiko and Euzenat, 2005). For instance, Clio (Fagin et al., 2009) generates a schema mapping based on the discovery of queries over the source and target schemas and a specification of their relationships. Karma (Knoblock et al., 2012) semi-automatically maps structured data sources to existing domain ontologies. It produces a Global-and-Local-As-View mapping that can be used to translate the data into RDF. xR2RML does not directly address the question of how mappings are written, but can complement approaches like Clio and Karma. In particular, Karma authors suggest that their tool could easily export mapping rules as an R2RML mapping graph. A similar approach could be applied to discover mappings between a non-relational database and domain ontologies, and export the result as an xR2RML mapping graph.

7 CONCLUSION AND

PERSPECTIVES

In this paper we have presented xR2RML, a language

designed to describe the mapping of various types of

databases to RDF, by ﬂexibly adapting to heteroge-

neous query languages and data models. We have

analysed data models of several modern databases as

well as the format in which query results are returned,

and we have shown that xR2RML can translate any

data element within such results into RDF, relying

when necessary on existing languages such as XPath

and JSONPath. We have illustrated some features of

xR2RML such as the generation of RDF collections

and containers, and the ability to deal with mixed con-

tent, e.g. when a relational table stores data formatted

in another syntax like XML, JSON or CSV.

Principles of the xR2RML mapping language

have been validated in a prototype implementation

supporting several RDBs and the MongoDB NoSQL

document store. The development of connectors to

other types of database shall be considered based on

concrete use cases. Depending on the target system,

different optimizations shall be studied, notably re-

garding the computation of joint queries. Further-

more, the data materialization approach we imple-

mented is effective but it does not scale to big data

sets. Dealing with big data sets requires the data to re-

main in legacy databases, and that translation to RDF

be performed on demand through the xR2RML-based

rewriting of SPARQL queries into the source database

query language. In this regard, existing works related

to RDBs should be leveraged (Priyatna et al., 2014;

Sequeda and Miranker, 2013).

REFERENCES

Acosta, M., Vidal, M., Lampo, T., Castillo, J., and Ruck-

haus, E. (2011). ANAPSID: an adaptive query pro-

cessing engine for SPARQL endpoints. In Proc. of

ISWC’11, pages 18–34.

Arenas, M., Bertails, A., Prud’hommeaux, E., and Sequeda,

J. (2012). A direct mapping of relational data to RDF.

TranslationofRelationalandNon-relationalDatabasesintoRDFwithxR2RML

453

Bikakis, N., Tsinaraki, C., Stavrakantonakis, I., Gi-

oldasis, N., and Christodoulakis, S. (2013). The

SPARQL2XQuery interoperability framework. CoRR,

abs/1311.0536.

Bischof, S., Decker, S., Krennwallner, T., Lopes, N., and

Polleres, A. (2012). Mapping between RDF and

XML with XSPARQL. Journal on Data Semantics,

1(3):147–185.

Breitling, F. (2009). A standard transformation from XML

to RDF via XSLT. Astronomical Notes, 330:755.

Das, S., Sundara, S., and Cyganiak, R. (2012). R2RML:

RDB to RDF mapping language.

de Medeiros, L. F., Priyatna, F., and Corcho, O. (2015).

MIRROR: Automatic R2RML mapping generation

from relational databases. In Subm. to ICWE 2015.

Dimou, A., Sande, M. V., Slepicka, J., Szekely, P., Man-

nens, E., Knoblock, C., and Walle, R. V. d. (2014a).

Mapping hierarchical sources into RDF using the

RML mapping language. In Proc. of ICSC’2014,

pages 151–158. IEEE.

Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R.,

Mannens, E., and Van de Walle, R. (2014b). RML: A

generic language for integrated RDF mappings of het-

erogeneous data. In Proc. of the 7th LDOW workshop.

Fagin, R., Haas, L. M., Hernández, M., Miller, R. J., Popa,

L., and Velegrakis, Y. (2009). Clio: Schema mapping

creation and data exchange. In Conceptual Modeling:

Foundations and App., pages 198–236. Springer.

Fennell, P. (2014). Schematron - more useful than you’d

thought. In Proc. of the XML London 2014 Confer-

ence, pages 103–112.

Field, L., Suhr, S., Ison, J., Wittenburg, P., Los, W., Broeder,

D., Hardisty, A., Repo, S., and Jenkinson, A. (2013).

Realising the full potential of research data: common

challenges in data management, sharing and integra-

tion across scientiﬁc disciplines.

Floratou, A., Minhas, U. F., and Özcan, F. (2014). SQL-on-Hadoop: Full circle back to shared-nothing database

architectures. Proc. of the VLDB Endowment, 7(12).

Gaignard, A. (2013). Distributed knowledge sharing and

production through collaborative e-science platforms.

PhD thesis.

Gajendran, S. K. (2013). A survey on NoSQL databases

(technical report).

He, B., Patel, M., Zhang, Z., and Chang, K. C.-C. (2007).

Accessing the deep web. Communications of the

ACM, 50(5):94–101.

Hecht, R. and Jablonski, S. (2011). NoSQL evaluation: A

use case oriented survey. In Proc. of CSC’2011, pages

336–341. IEEE Computer Society.

Knoblock, C. A., Szekely, P., Ambite, J. L., Goel, A.,

Gupta, S., Lerman, K., Muslea, M., Taheriyan, M.,

and Mallick, P. (2012). Semi-automatically mapping

structured sources into the semantic web. In Proc. of

ESWC’2012, pages 375–390. Springer.

Kolev, B., Valduriez, P., Jimenez-Peris, R., Martínez-Bazan,

N., and Pereira, J. (2014). CloudMdsQL: Querying

heterogeneous cloud data stores with a common lan-

guage. In Proc. of the BDA’2014 Conference.

Langegger, A. and Wöß, W. (2009). XLWrap - querying

and integrating arbitrary spreadsheets with SPARQL.

In Proc. of ISWC’2009.

Melton, J., Michels, J. E., Josifovski, V., Kulkarni, K., and

Schwarz, P. (2002). SQL/MED: a status report. ACM

SIGMOD Record, 31(3):81–89.

Michel, F., Djimenou, L., Faron-Zucker, C., and Montagnat,

J. (2014a). xR2RML: Relational and non-relational

databases to RDF mapping language. Research report.

ISRN I3S/RR 2014-04-FR v3.

Michel, F., Montagnat, J., and Faron-Zucker, C. (2014b).

A survey of RDB to RDF translation approaches and

tools. Research report. ISRN I3S/RR 2013-04-FR.

Ong, K. W., Papakonstantinou, Y., and Vernoux, R. (2014).

The SQL++ unifying semi-structured query language,

and an expressiveness benchmark of SQL-on-Hadoop,

NoSQL and NewSQL databases (submitted). CoRR,

abs/1405.3631.

Priyatna, F., Corcho, O., and Sequeda, J. (2014). Formal-

isation and experiences of R2RML-based SPARQL

to SQL query translation using Morph. In Proc. of

WWW’2014.

Roth, M. T. and Schwarz, P. (1997). Don’t scrap it, wrap

it! A wrapper architecture for legacy data sources. In

Proc. of VLDB’1997, pages 266–275.

Scharffe, F., Atemezing, G., Troncy, R., Gandon, F., Villata,

S., Bucher, B., Hamdi, F., Bihanic, L., Képéklian, G.,

Cotton, F., and others (2012). Enabling linked data

publication with the Datalift platform. In Proc. of the

AAAI workshop on semantic cities.

Schwarte, A., Haase, P., Hose, K., Schenkel, R., and

Schmidt, M. (2011). FedX: Optimization techniques

for federated query processing on linked data. In Proc.

of ISWC’11, pages 601–616.

Sequeda, J., Tirmizi, S. H., Corcho, O., and Miranker,

D. P. (2011). Survey of directly mapping SQL

databases to the semantic web. Knowledge Eng. Re-

view, 26(4):445–486.

Sequeda, J. F. and Miranker, D. P. (2013). Ultrawrap:

SPARQL execution on relational data. Web Seman-

tics: Sc., Serv. and Agents on the WWW, 22:19–39.

Shvaiko, P. and Euzenat, J. (2005). A survey of schema-

based matching approaches. In Journal on Data Se-

mantics IV, pages 146–171. Springer.

Spanos, D.-E., Stavrou, P., and Mitrou, N. (2012). Bringing

relational databases into the semantic web: A survey.

Semantic Web Journal, 3(2):169–209.

WEBIST2015-11thInternationalConferenceonWebInformationSystemsandTechnologies

454