On the Fly SPARQL Execution for Structured Non-RDF Web APIs
Torsten Spieldenner
German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus D 3 2,
Saarbrücken Graduate School of Computer Science, 66123 Saarbrücken, Germany
Keywords:
Linked Data, Semantic Web, Resource Description Framework, RDF, SPARQL, Web API, Structured Data.
Abstract:
The concept of the Semantic Web, built around semantically described Linked Data and the data model of the Resource Description Framework (RDF), has become a prominent vision of seamless access to, and integration of, data. The number of tools that translate non-RDF data to an RDF representation has been increasing ever since. To this day, however, numerous Web APIs offer time-critical data, such as live traffic and public transport data, only in non-RDF formats. For such data, an offline data dump, as typically generated by RDF lifting tools, is not practical, because it quickly becomes inconsistent with the original data. In this paper, we present an approach that, published as a microservice, allows clients to send semantic queries directly against legacy Web interfaces and returns the result in RDF. The service API follows the SPARQL 1.1 Protocol specification and also supports federated queries over distributed endpoints, providing an easy and accessible way to perform semantically enriched data integration over legacy endpoints.
1 INTRODUCTION
In recent years, the concept of the Semantic Web (Berners-Lee et al., 2001) has developed into a prominent vision of seamless integration of, and access to, data across different providers and applications. Among others, Verborgh (Verborgh et al.,
2011) and Mayer (Mayer et al., 2016) emphasize
the importance of sufficiently semantically described
server APIs (along with their data). The Semantic
Web is based on the idea of Linked Data (Bizer et al.,
2011), a set of best practices for publishing structured data on the Web. Linked Data is usually published in terms of RDF (Resource Description Framework) graphs¹ to establish links between all kinds of addressable Web resources. However, to this day,
there is still a tremendous number of resources being
published as structured data that does not yet follow
Linked Data principles. For many text-based struc-
tured resource representations like XML, JSON or
CSV, generic approaches have been developed to map
them to RDF (Das et al., 2012; Scharffe et al., 2012;
Dimou et al., 2013; Michel et al., 2017). These approaches, however, are commonly used to create a copy of the original data as a Linked Data dump. This is not practical for fast-changing data, such as live feeds or streams: a dump soon becomes inconsistent with the original live data, precisely when freshness of the data is crucial.
¹ RDF 1.1 Primer document (Jul. 2020): https://www.w3.org/TR/rdf11-primer/
An example of this is live traffic and public transport connection data. For this kind of data, the CSV-based General Transit Feed Specification (GTFS),² along with extensions such as the Protocol Buffers-based³ GTFS-Realtime,⁴ has become a de facto standard. For the growing number of bicycle sharing stations, the comparable General Bikeshare Feed Specification (GBFS)⁵ is gaining acceptance worldwide. Some work already motivates the use of available open data for route planning (Nallur et al., 2015). Efforts have also been made to model traffic data in terms of Linked Data vocabularies (Colpaert et al., 2017; Colpaert et al., 2019), and to tackle the problem of integrating static datasets with dynamic Linked Data feeds (Harth et al., 2013).
² GTFS Static Specification (Jul. 2020): https://developers.google.com/transit/gtfs/
³ Google Protocol Buffer (Jul. 2020): https://developers.google.com/protocol-buffers
⁴ GTFS Realtime Specification (Jul. 2020): https://developers.google.com/transit/gtfs-realtime
⁵ GBFS Specification (Jul. 2020): https://developers.google.com/transit/gtfs-realtime
Spieldenner, T.
On the Fly SPARQL Execution for Structured Non-RDF Web APIs.
DOI: 10.5220/0010142102430252
In Proceedings of the 16th International Conference on Web Information Systems and Technologies (WEBIST 2020), pages 243-252
ISBN: 978-989-758-478-7
Copyright © 2020 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
These approaches, however, often consider
Linked Data and traditional data services as two separate worlds. Either they do not consider Linked Data a suitable representation at all, as in the work by Nallur et al. (Nallur et al., 2015), or, as in the case of Colpaert (Colpaert et al., 2017) and Harth (Harth et al., 2013), they assume readily available Linked Data representations of the data of interest and may suggest translation steps to lift existing data to the required Linked Data representation. The problem of integrating live data that is not yet provided in a semantic Linked Data representation into a Linked Data application remains largely unaddressed. In fact, a lack of usable tools for the transition between the traditional world of data and the Linked Data world has recently been identified as a main bottleneck that hinders the uptake of Semantic Web concepts in industry and public sector applications (Verborgh and Vander Sande, 2020).
As a remedy, we present in this paper a system, implemented as a microservice, that provides a SPARQL 1.1 Query⁶ service behind an API that is fully compliant with the SPARQL 1.1 protocol interface.⁷ It allows clients to specify remote sources, executes a provided query against them, and returns a SPARQL query result in RDF representation.
The remainder of this paper is structured as fol-
lows: In Section 2, we provide an overview of ex-
isting non-RDF to RDF lifting approaches, as well
as approaches that allow to query structured data as
Linked Data. We revisit the most relevant notions of
the RDF data model and the SPARQL query language
in Section 3. From these, we derive a formal defini-
tion of our SPARQL query service in Section 4, pro-
vide a thorough description of the resulting service
implementation in Section 5, and give detailed examples of its use in a selected use case from the transportation domain in Section 6. We finally conclude the paper and give an outlook on future work in Section 7.
2 RELATED WORK
Since the beginning of the Semantic Web, the mapping of existing structured data to RDF has been an active research area. The so-called lifting and lowering of data is required to access and modify any non-RDF data source from Linked Data applications (Sect. 2.1). In addition to lifting non-RDF data to an RDF representation, research has also been carried out on how to generate RDF data by evaluating semantic queries directly on non-semantic data sources (Sect. 2.2).
⁶ W3C SPARQL 1.1 Query Language Recommendation (Jul. 2020): https://www.w3.org/TR/2013/REC-sparql11-query-20130321/
⁷ W3C SPARQL 1.1 Protocol Recommendation (Jul. 2020): https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/
2.1 RDF-lifting Approaches
A prominent example for lifting relational databases
to RDF is R2RML (Das et al., 2012). R2RML speci-
fies mappings from database schemas to RDF graphs
in RDF turtle syntax. A respective mapping file de-
scribes table structure and content, and where to inject
the data from these tables into a target RDF graph.
Several extensions to R2RML have been pre-
sented: The tool Karma by Gupta et al. (Gupta et al.,
2012) is a visual tool to define R2RML mappings by
annotating relational data. Karma supports the user
by inference of suitable target mappings from pre-
viously annotated data. The resulting mappings can
be published as service with REST API for online
data conversion. Dimou et al. extended R2RML to
RML (Dimou et al., 2013; Dimou et al., 2014a; Di-
mou et al., 2014b), a superset of the R2RML mapping
language, that also allows for mapping of structural
datasources that are not stored in relational databases.
Source formats for RML include CSV, TSV, XML,
and JSON.
Slepicka et al. present KR2RML (Slepicka et al., 2015), a different approach to extending R2RML to support heterogeneous sources that is designed for extensibility and scalability with respect to changes in the source data, aspects in which Slepicka et al. see shortcomings in RML. Finally, the implementation CARML,⁸ an extension to RML, accepts dynamic input streams as input to an RML mapping, instead of requiring the source to be specified directly in the mapping file. This is a crucial feature for re-using mappings across a variety of structurally equivalent source files.
2.2 SPARQL Query Interfaces to
Non-RDF Datasources
A different approach to make non-semantic datasets
accessible for Linked Data applications is to provide
a SPARQL query endpoint to clients, and transpar-
ently lift the queried non-RDF source upon receiving
a query by a client.
In 2004, Bizer et al. presented D2RQ (Bizer and Seaborne, 2004). D2RQ provides a mapping language from relational database schemata to RDF, similar to R2RML. Clients may send requests to the platform to perform SPARQL queries against a database, or explore a database as Linked Data, while the D2RQ platform performs the lifting of the database to RDF transparently, based on a previously defined mapping.
⁸ CARML GitHub Repository: https://github.com/carml/carml
Similar approaches realize SPARQL-to-SQL mappings by employing R2RML-based liftings (Rodríguez-Muro and Rezk, 2015; Priyatna et al., 2014; Calvanese et al., 2017). For non-relational data sources, Michel et al. present an approach to query the document-based MongoDB by employing xR2RML (Michel et al., 2017; Michel et al., 2016), an extension of R2RML for non-relational sources.
SPARQL-Generate (Lefrançois et al., 2017) integrates RDF generation from non-RDF data sources directly into the SPARQL query itself. This removes the need for separately provided mapping files; however, it requires that the target SPARQL processor implements SPARQL-Generate on top of SPARQL 1.1.
Finally, SPARQL-Microservice (Michel et al.,
2018a; Michel et al., 2018b) provides a SPARQL
query interface to wrap existing, JSON-based Web
APIs.
The approaches presented in Sect. 2.1 cover the translation of legacy data to Linked Data dumps. Most of them lift data to RDF offline, which we found impractical for live data.
Many of the approaches in Sect. 2.2 target relational databases rather than Web APIs, with D2RQ as one chosen example. Our approach compares best to SPARQL-Generate and SPARQL-Microservice, which create the Linked Data representation from structured legacy data transparently as query response. However, SPARQL-Generate requires an extended SPARQL implementation beyond the generally used specification. SPARQL-Microservice requires cumbersome configuration at deployment time. It is moreover limited to JSON-LD as lifting result, which makes it difficult or impossible to use in scenarios where the source data are not JSON, or when another result representation is needed.
Our approach overcomes these shortcomings by complying fully with the SPARQL 1.1 protocol specification. It uses RML as the underlying mapping language and thus supports a wide range of source and target formats, making it more flexible and versatile than existing approaches. Lifting of target data is done transparently in an online step during query execution. A client can therefore use the service to query non-RDF legacy data directly, as if the queried source were a Linked Data source.
3 PRELIMINARIES
This section briefly covers the basics of the RDF graph model and the SPARQL query language. The RDF graph model is described according to the contents of the RDF Primer document (see footnote 1). The basics established in this section are used in the following to define our service.
3.1 RDF Graph Model
The Resource Description Framework (RDF) is the
standard data model for the Semantic Web.
In the following, let $I$, $L$ and $B$ be pairwise disjoint infinite sets of IRIs (Internationalized Resource Identifiers), literals and blank nodes, respectively. We refer to the elements of the union $I \cup B \cup L$ as RDF terms. The subset $T = I \cup L$ of RDF terms denotes resources in some universe of discourse.
Blank nodes in $B$ indicate the existence of a resource whose content is described directly in the blank node itself, without using an IRI to identify a particular resource.
The infinite set of all RDF triples is $\mathcal{T} = (I \cup B) \times I \times (T \cup B)$. Asserting an RDF triple $(s, p, o)$ says that some resource, denoted by $p$, establishes a binary relationship between the resources denoted by $s$ and $o$.
An RDF graph $G \subseteq \mathcal{T}$ is then a finite set of RDF triples of the form $(s, p, o)$.
3.2 SPARQL Query Language
The SPARQL Query Language (SPARQL) is the W3C-recommended query language for RDF datasets (see footnote 6). SPARQL queries are built around a graph pattern matching facility, i.e. triple patterns, which form the core of the language. In the following, we briefly present the syntax and semantics of SPARQL graph patterns.
Following the formal evaluation algebra proposed by Pérez et al. (Pérez et al., 2006), we use $V$ to denote the infinite set of variables, which is disjoint from $I \cup B \cup L$. The set of variables occurring in a syntax expression $E$ is given by $\mathrm{var}(E)$.
A tuple from $(T \cup V) \times (I \cup V) \times (T \cup V)$ is called a triple pattern. Triple pattern components may be bound, i.e. from the set $T$, or unbound, i.e. from the set $V$.
The semantics of SPARQL graph patterns is defined in terms of an evaluation function $[\![\cdot]\!]^{D}_{G}$ that evaluates a SPARQL graph pattern $P$ over a dataset $D$ with an active RDF graph $G$. The result of this evaluation is a set of mappings $\mathrm{var}(P) \to T$ in the case of a SPARQL SELECT query, or an RDF graph $g \subseteq \mathcal{T}$ in the case of a SPARQL CONSTRUCT query.
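To make the evaluation of a single triple pattern concrete, the following minimal Python sketch matches one pattern against a small graph and collects the resulting variable mappings. It is an illustration only: terms are plain strings, variables are marked with a leading `?`, and the names `evaluate_pattern`, `GRAPH` and the `ex:` identifiers are assumptions made for this example, not part of the presented service.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def is_variable(term: str) -> bool:
    """Variables (elements of V) are written with a leading '?'."""
    return term.startswith("?")


def evaluate_pattern(pattern: Triple, graph: Sequence[Triple]) -> List[Mapping]:
    """Return every mapping under which `pattern` matches a triple of `graph`."""
    solutions: List[Mapping] = []
    for triple in graph:
        mapping: Mapping = {}
        for p_term, g_term in zip(pattern, triple):
            if is_variable(p_term):
                # A variable that occurs twice must bind to the same term.
                if mapping.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                # A bound component must be identical to the graph term.
                break
        else:
            solutions.append(mapping)
    return solutions


# Hypothetical example graph; the ex: subjects are made up for illustration.
GRAPH = [
    ("ex:station1", "rdf:type", "gbfs:Station"),
    ("ex:station1", "gbfs:name", "Bahnhof Beuel"),
    ("ex:station2", "gbfs:name", "Haltepunkt Bonn-West"),
]
```

Calling `evaluate_pattern(("?s", "gbfs:name", "?n"), GRAPH)` yields one mapping per station name, which corresponds to the SELECT case described above.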
4 SERVICE DEFINITION
From the notions of the previous section, we will now
derive a formal definition of the service in terms of
RDF liftings and SPARQL query execution. We first
define the query execution as function on a (mapped)
data source. Second, we define a SPARQL 1.1 query
interface with the notions of the defined formal query
approach.
4.1 Formal Definition
With the notions from Sect. 3, we additionally define the following concepts.
We define
$$Q = (T \cup V) \times (I \cup V) \times (T \cup V) \tag{1}$$
as a shortcut for the set of all triple patterns.
Let $\Sigma$ denote a machine-readable alphabet, $\Sigma^*$ the set of all words over $\Sigma$, and $\Gamma \subseteq \Sigma^*$ a set of datagrams in a given structured data format encoded in $\Sigma$. Such structured data formats could, for example, be comma-separated values (CSV), JSON, or XML documents, as emitted by a Web resource, but also binary streams following a deterministic structure or protocol.
A datagram $\gamma \in \Gamma$ is then a valid structured piece of data that is encoded in the machine-readable alphabet $\Sigma$.
We then define a mapping function
$$m : (\Gamma \subseteq \Sigma^*) \to (G \subseteq \mathcal{T}), \quad \mathcal{T} = (I \cup B) \times I \times (T \cup B) \tag{2}$$
as a function that translates a given structured datagram $\gamma$ into an RDF graph $G \subseteq \mathcal{T}$.
Let $\Omega_Q$ denote the set of result mappings of a triple pattern $P \in Q$ according to (Buil-Aranda et al., 2013). A service call is then a function $s$ with the following properties:
$$s : (\Gamma \times Q) \to \Omega_Q \tag{3}$$
$$s(\gamma, P) = [\![P]\!]^{D}_{m(\gamma)}, \quad m : (\Gamma \subseteq \Sigma^*) \to (G \subseteq \mathcal{T}) \tag{4}$$
That is, a service accepts as call parameters a tuple that consists of a datagram $\gamma$ from a datagram syntax $\Gamma$ and a triple pattern $P \in Q$ in some SPARQL syntax.
The result of the service call is the evaluation of the triple pattern $P$ against the result of the lifting operation $m$ on datagram $\gamma$.
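The composition in Equation (4), first lifting the datagram and then evaluating the pattern on the lifted graph, can be sketched in a few lines of Python. This is a toy model under stated assumptions: `lift` is a hand-written stand-in for the mapping m on a GBFS-like JSON datagram, the `urn:station:` subject identifiers are invented for the example, and `service_call` only handles a single triple pattern rather than full SPARQL.

```python
import json
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def lift(datagram: str) -> List[Triple]:
    """Toy stand-in for the mapping m: JSON station list -> RDF-like triples."""
    doc = json.loads(datagram)
    graph: List[Triple] = []
    for station in doc["data"]["stations"]:
        subject = "urn:station:" + station["station_id"]  # hypothetical IRI scheme
        graph.append((subject, "rdf:type", "gbfs:Station"))
        graph.append((subject, "gbfs:name", station["name"]))
    return graph


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return the variable mapping if `pattern` matches `triple`, else None."""
    mapping: Dict[str, str] = {}
    for p_term, g_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if mapping.setdefault(p_term, g_term) != g_term:
                return None
        elif p_term != g_term:
            return None
    return mapping


def service_call(datagram: str, pattern: Triple) -> List[Dict[str, str]]:
    """s(gamma, P): evaluate P on the graph m(gamma) produced by lifting gamma."""
    results = []
    for triple in lift(datagram):
        mapping = match(pattern, triple)
        if mapping is not None:
            results.append(mapping)
    return results
```

The point of the sketch is the order of operations: the pattern never sees the JSON itself, only the graph that the lifting step produces from it.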
4.2 Service SPARQL Query API
In the following, we define the API of the SPARQL Wrapping Service as a superset of the W3C SPARQL 1.1 Protocol⁹ specification. We define parameters to specify the SPARQL query that is to be evaluated, as well as the structured legacy data on which to evaluate the query, or URIs that point to endpoints from which to retrieve the data. The subset of parameters that provides the information necessary for executing the SPARQL query is designed to comply fully with the SPARQL 1.1 Protocol specification.
4.2.1 Requests
The SPARQL 1.1 Protocol Recommendation specifies three modes of query requests: query via an HTTP GET request with query string parameters, query via an HTTP POST request with URL-encoded parameters in the message body, and query via a direct POST operation with the query contained in the message payload. Accordingly, a service call s is performed by an HTTP request with the following methods and parameters (see also Table 1):
Query via GET: The client sends an HTTP GET request to the service endpoint with no Content-Type header set, as the request body is empty. The endpoint accepts the parameters query and source, where query is the properly serialized and URL-encoded triple pattern P according to the SPARQL 1.1 protocol specification, and source is either a reference to a resource from which the datagram γ can be retrieved, or the datagram γ itself as a URL-encoded string.
Query via POST with URL-Encoded Parameters: The client sends an HTTP POST request to the service endpoint with the Content-Type header set to application/x-www-form-urlencoded. The service accepts the parameters query and source in the message body. Parameters are URL-encoded and ampersand-separated. query contains the SPARQL triple pattern P, and source the datagram γ (or a URI from which γ can be retrieved).
Query via Direct POST: The client sends an HTTP POST request with the Content-Type header set to application/sparql-query. The source datagram γ is provided as an encoded URL parameter, either inline or as a URI from which γ can be retrieved. The SPARQL query P is provided as an unescaped string in the message payload.
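The three request modes can be illustrated from the client side with Python's standard library. The sketch below only constructs the requests and does not send them; the endpoint host is a placeholder, and the helper names are assumptions made for this example rather than part of the service.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; replace with the address of an actual deployment.
ENDPOINT = "http://sparql-wrapper.service/"


def get_request(query: str, source: str) -> Request:
    """Query via GET: both parameters travel in the URL query string."""
    return Request(ENDPOINT + "?" + urlencode({"query": query, "source": source}),
                   method="GET")


def post_form_request(query: str, source: str) -> Request:
    """Query via POST with URL-encoded, ampersand-separated body parameters."""
    body = urlencode({"query": query, "source": source}).encode("utf-8")
    return Request(ENDPOINT, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def post_direct_request(query: str, source: str) -> Request:
    """Direct POST: source in the URL, unencoded SPARQL query as the payload."""
    return Request(ENDPOINT + "?" + urlencode({"source": source}),
                   data=query.encode("utf-8"), method="POST",
                   headers={"Content-Type": "application/sparql-query"})
```

A request built this way could be sent with `urllib.request.urlopen`; the response handling then follows the status codes of Section 4.2.2.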
The above definition satisfies the SPARQL 1.1 protocol specification with respect to necessary parameters. Our API does not yet consider the specification of an RDF dataset D against which the query should be executed in terms of default-graph-uri or named-graph-uri. However, according to the SPARQL 1.1 protocol recommendation, these parameters are optional, and the specification states that, "if an RDF Dataset is not specified in either the protocol request or the SPARQL query string, then implementations may execute the query against an implementation-defined default RDF dataset".¹⁰ In our case, this default dataset is the result of the mapping operation m(γ).

Table 1: Expected parameters for a SPARQL 1.1 Query service call, based on the original SPARQL 1.1 query protocol specification. The structured datagram γ takes the role of both Dataset D and (default) graph G.
Method | Query Parameters | Content-Type | Message Body
GET | query=P (exactly 1), source=γ (exactly 1) | None | None
POST (URL-encoded parameters) | None | application/x-www-form-urlencoded | URL-encoded, &-separated: query=P (exactly 1), source=γ (exactly 1)
POST (direct) | source=γ (exactly 1) | application/sparql-query | Unencoded SPARQL query string

⁹ https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/
4.2.2 Responses
Following the SPARQL 1.1 protocol specification, a query request to a service s returns the SPARQL query result with a success status code (2xx).
In accordance with the SPARQL 1.1 Protocol specification, the service returns status code 400 (Bad Request) for a malformed query, and 500 (Internal Server Error) if executing the query fails. The service also returns a 400 error code if the supplied datagram γ is syntactically incorrect. The service may moreover return:
422 (Unprocessable Entity), if γ, either provided directly via HTTP POST or as a URI reference for download, is syntactically correct but the mapping m returns an error, or if any parameter specifying γ is missing.
502 (Bad Gateway), if γ is provided by URI reference and the service under source-uri returns an error.
¹⁰ https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#dataset
5 IMPLEMENTATION
This section describes in detail the implementation of the previously defined service as a microservice. Section 5.1 presents the overall architecture and the components it is composed of. Section 5.2 details the service API beyond the SPARQL interface. Section 5.3 gives an overview of the frameworks and libraries employed in our prototype implementation.
5.1 Service Architecture
Figure 1: Architecture of the SPARQL API service.
We build the query service around the components shown in Figure 1. Client requests are received by an HTTP API endpoint. This API accepts HTTP GET and POST requests with the parameters query and source according to the specification in Section 4.2. Upon receiving a client request, an HTTP Client component sends an HTTP GET request to the endpoint specified by the source parameter. If this request returns an error, the error is forwarded to the requesting client according to the error handling described in Section 4.2.
If the remote source returns valid data, it is used as input for an RML Mapping component. The respective mapping is provided by the service itself (for example, as an RML mapping file) and can be inspected by clients via an HTTP GET request to the respective resource according to Section 5.2.
The result of the mapping is stored in an in-memory RDF Triple Store that provides a SPARQL query API to the service application. If the mapping was successful, the query provided by the client is executed against the triple store that contains the mapping result. Otherwise, an error according to Sect. 4.2 is returned.
Finally, the result of the query is returned to the client as the result of its request (or an error, if execution of the query was not successful).
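The request flow through these components, together with the status codes from Section 4.2.2, can be summarized in a short Python sketch. It is not the prototype's code: the three stages are passed in as plain functions, and the exception classes and the name `handle_request` are hypothetical names chosen for this illustration.

```python
from typing import Any, Callable, Tuple


class UpstreamError(Exception):
    """The remote source answered the retrieval request with an error."""


class MappingError(Exception):
    """The datagram was well-formed, but the mapping could not be applied."""


class MalformedQueryError(Exception):
    """The supplied SPARQL query could not be parsed."""


def handle_request(source: str, query: str,
                   fetch: Callable[[str], str],
                   lift: Callable[[str], Any],
                   evaluate: Callable[[str, Any], Any]) -> Tuple[int, Any]:
    """Run fetch -> lift -> evaluate and translate failures into status codes."""
    if not source:
        return 422, "parameter specifying the datagram is missing"
    try:
        datagram = fetch(source)          # HTTP Client component
    except UpstreamError as exc:
        return 502, str(exc)
    try:
        graph = lift(datagram)            # RML Mapping component
    except ValueError as exc:             # syntactically incorrect datagram
        return 400, str(exc)
    except MappingError as exc:
        return 422, str(exc)
    try:
        return 200, evaluate(query, graph)  # query against the triple store
    except MalformedQueryError as exc:
        return 400, str(exc)
    except Exception as exc:              # any other execution failure
        return 500, str(exc)
```

Passing the stages as parameters keeps the sketch independent of any particular HTTP client, mapping engine, or triple store.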
5.2 Service Self Information
    { "definitions": {
        "Bike": {
          "type": ["object"],
          "properties": {
            "bike_id": {"type": "string"},
            "lat": {"type": "number"},
            "lon": {"type": "number"},
            "is_reserved": {"type": "integer"},
            "is_disabled": {"type": "integer"}
          }},

        "BikeData": {
          "type": "object",
          "properties": {
            "bikes": {
              "type": "array",
              "items": {"$ref": "#/definitions/Bike"}
            }}}

      },

      "type": "object",
      "properties": {
        "data": {"$ref": "#/definitions/BikeData"}
      },
      "required": ["data"]
    }
Listing 1: Example of a JSON Schema description of the information conveyed by a free_bike_status.json datagram according to the NABSA/GBFS General Bikeshare Feed Specification.
In the current version, the service lists, as the result of an HTTP OPTIONS request, routes to the following resources:
Under the route /sourceformat/, clients may retrieve the expected structure of the source data. The source format is specified in JSON Schema or XML Schema format,¹¹ depending on the format that the service maps (see also Listing 1). Clients may use the provided description to check whether the service is capable of querying the intended legacy API.
Moreover, the employed RML mapping file is provided under the route /mapping/ for documentation purposes. From the mapping file, client developers may learn which ontologies or vocabularies are used in the RDF SPARQL response returned by the service.
For future versions, we plan to additionally provide a definition of the output RDF format, for example in SHACL,¹² to help clients validate their local RDF representation automatically against the output generated by the service.
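As an illustration of the client-side check that the /sourceformat/ description enables, the following Python sketch tests whether a datagram has the structure described by the schema in Listing 1. It is a hand-written, simplified stand-in and not a JSON Schema validator: the function name and the field table are assumptions for this example, and a real client would more likely feed the retrieved schema to a dedicated validation library.

```python
import json

# Field types taken from the "Bike" definition in Listing 1.
BIKE_FIELD_TYPES = {
    "bike_id": str,
    "lat": (int, float),
    "lon": (int, float),
    "is_reserved": int,
    "is_disabled": int,
}


def _has_type(value, expected) -> bool:
    # bool is a subclass of int in Python, but it is not a JSON number.
    return isinstance(value, expected) and not isinstance(value, bool)


def matches_bike_schema(datagram: str) -> bool:
    """Check the required 'data' object and the types of listed bike fields."""
    try:
        doc = json.loads(datagram)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict) or not isinstance(doc.get("data"), dict):
        return False
    bikes = doc["data"].get("bikes", [])
    if not isinstance(bikes, list):
        return False
    for bike in bikes:
        if not isinstance(bike, dict):
            return False
        for field, expected in BIKE_FIELD_TYPES.items():
            if field in bike and not _has_type(bike[field], expected):
                return False
    return True
```

A client could run such a check on a sample response of the legacy API before sending a query, and fall back to another service instance if the structure does not match.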
5.3 Prototype Implementation
Our prototype implementation is based on the Java Spring Framework¹³ for the Web service HTTP interface ("SERVICE API" in Fig. 1). RDF features are provided by the RDF4j¹⁴ RDF library, using the RDF4j Repository API¹⁵ for SPARQL queries. The RDF4j Sail API¹⁶ serves as a temporary in-memory triple store that holds the RDF mapping results against which the SPARQL queries are executed ("SPARQL API" and "RDF Triple Store" in Fig. 1, respectively). The mappings are performed by the CARML¹⁷ mapping framework. CARML extends the RML mapping routines with the capability of defining a dynamic input stream as input to the mapping, unlike RML, which expects a route to a fixed source.
Figure 2 shows the function call sequence between the components of the service as chosen for our implementation: The HTTP interface provided by the Java Spring application server API receives a client request that specifies the parameters source and query as either URL parameters or payload of an HTTP POST request. A DataBuilder class checks whether the supplied source argument is a URI or already the datagram γ itself. If it is a URI, a WebAPIProxy is used to retrieve γ from the provided API. γ then serves as Structured Data input for the mapping process.
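The decision made at this step, whether source is a reference to fetch or the datagram itself, can be sketched as follows. The Python functions below are hypothetical and only mirror the described behaviour; they are not the prototype's Java DataBuilder or WebAPIProxy classes, and treating only absolute http(s) URLs as references is an assumption of this sketch.

```python
from typing import Callable
from urllib.parse import urlsplit


def is_remote_reference(source: str) -> bool:
    """True if `source` looks like an absolute http(s) URL, False otherwise."""
    try:
        parts = urlsplit(source.strip())
    except ValueError:
        # Raised by urlsplit for some malformed inputs; treat as inline data.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_datagram(source: str, fetch: Callable[[str], str]) -> str:
    """Return the datagram: fetched if `source` is a URL, else `source` itself."""
    if is_remote_reference(source):
        return fetch(source)
    return source
```

Here `fetch` plays the role of the proxy component and is injected by the caller, so the check itself stays free of network access.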
¹¹ https://json-schema.org/, https://www.w3.org/XML/Schema
¹² https://www.w3.org/TR/shacl/
¹³ Spring Framework Website (Feb. 2020): https://spring.io/
¹⁴ RDF4j Website (Feb. 2020): https://rdf4j.org/
¹⁵ RDF4j Repo. API (Feb. 2020): https://rdf4j.org/documentation/programming/repository/
¹⁶ RDF4j Sail API (Jul. 2020): https://rdf4j.org/documentation/sail/
¹⁷ CARML GitHub Repository: https://github.com/carml/carml
Figure 2: Call sequence between the different service components upon a client request to the API as defined in Sect. 4.2.
The (CA)RML mapping file is provided by the service itself. It is fed to a CARML Mapper class that, after some preprocessing steps, uses the CARML mapping library to translate γ to a mappedModel in RDF graph form.
The mappedModel and the query provided by the client are used as input for a QueryExecutor. The QueryExecutor first opens a connection to a temporary RDF4j Repo, loads the mappedModel into it, and executes the query via the RDF4j Sail API.
The queryResult of this operation is finally returned to the client as the result of the client's initial query request.
6 IN-USE EXAMPLES
In the following, we first demonstrate the usage of the service with examples from the public transport and bike rental domain, the main application domain of the funding project SmartMaaS.
We then show how the presented service can be used to infer additional information from distributed datasets by employing federated SPARQL queries over several SPARQL Wrapper services with the SERVICE keyword.
6.1 SELECT Query on JSON Data
    SELECT ?name ?lat ?lon WHERE {
      ?station a gbfs:Station ;
        gbfs:name ?name ;
        wgs84_pos:lat ?lat ;
        wgs84_pos:long ?lon .
    }
Listing 2: A SPARQL SELECT query that reads location information for bike sharing stations from a GBFS service endpoint.
The following example demonstrates a simple SELECT query against a JSON data endpoint. The query shown in Listing 2 is sent as the query parameter to the service endpoint, using the JSON data shown in Listing 3 as input. The RDF result is shown in Listing 4.
The overall execution time of the query in the example was about 180 ms for a dataset of 63 bike sharing station items. Of these 180 ms, 30 ms were spent on the CARML lifting process and 10 ms on the SPARQL query execution (measured on an Intel Core i7-4770k, 3.5 GHz). The remaining time, roughly 140 ms, was spent retrieving the source data from the URI provided as the source parameter.
6.2 Federated Queries
The design of the service API over parameterized, SPARQL 1.1 compliant request URLs also allows for federated SPARQL queries using the SERVICE keyword, as described in the respective W3C recommendation document.¹⁸

    {"last_updated": 1595835393,
     "ttl": 60,
     "data": {
       "stations": [
         {
           "station_id": "10044279",
           "name": "Bahnhof Beuel",
           "short_name": "4741",
           "lat": 50.739211,
           "lon": 7.126598,
           "region_id": "547"
         },
         {
           "station_id": "10044287",
           "name": "Haltepunkt Bonn-West",
           "short_name": "4742",
           "lat": 50.7367675,
           "lon": 7.0809567,
           "region_id": "547"
         }, ... ]
     }}
Listing 3: Input provided as example for a simple SELECT query (excerpt; source: https://gbfs.nextbike.net/maps/gbfs/v1/nextbike_bf/de/station_information.json).
In the formal W3C SPARQL 1.1 Grammar Recommendation,¹⁹ a ServiceGraphPattern (entry 59 in the respective grammar document²⁰) is defined as
ServiceGraphPattern ::= 'SERVICE' 'SILENT'? VarOrIri GroupGraphPattern
Accordingly, a federated query against a SPARQL Wrapper Service endpoint can be performed by employing as VarOrIri a valid URI against the SPARQL Wrapper Service API according to Section 4.2, and as GroupGraphPattern the query that is to be executed against the dataset γ referred to by source (according to the API definition in Section 4.2). The query is evaluated on the mapping m(γ) that is provided by the service under the URI given as the VarOrIri element. The parameter query can be omitted in this case, as the triple pattern P that describes the query is already provided by the GroupGraphPattern that follows the VarOrIri element of the federated query.
Listing 5 shows an example of a federated query. For brevity, the complete route to the service endpoint, as well as the source datagram γ, are omitted and given as #srvpath and #srcpath, respectively.²¹
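How the two placeholders combine into the IRI of a SERVICE clause can be sketched in Python as follows. The helper names are hypothetical, and percent-encoding the nested source URL is an assumption of this sketch, chosen so that the nested URL cannot be confused with parameters of the outer one.

```python
from urllib.parse import urlencode


def service_iri(service_endpoint: str, source_uri: str) -> str:
    """Build the IRI used as VarOrIri: endpoint plus percent-encoded source."""
    return service_endpoint + "?" + urlencode({"source": source_uri})


def service_clause(service_endpoint: str, source_uri: str, group_graph_pattern: str) -> str:
    """Wrap a graph pattern into a SERVICE clause that targets the wrapper service."""
    iri = service_iri(service_endpoint, source_uri)
    return "SERVICE <%s> {\n%s\n}" % (iri, group_graph_pattern)
```

The resulting clause can be embedded in the WHERE part of a larger query; no query parameter is added to the IRI, since the graph pattern inside the clause already carries the query.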
18
W3C SPARQL 1.1 Federated Query recommendation:
https://www.w3.org/TR/sparql11-federated-query/.
19
SPARQL 1.1 Grammar: https://www.w3.org/TR/2013/
REC-sparql11-query-20130321/#grammar
20
As at current date, June 2020
21
The complete URI used for the given example was http:
//sparql-wrapper.service/?source=https://gbfs.nextbike.net/
1 < r esults >
2 < result >
3 < binding n ame = name >
4 < l i teral > B a h nh o f Beuel </ li t eral >
5 </ binding >
6 < bindi n g n a m e = lon >
7 < literal d at a ty pe = xsd : double >
8 7.1 2 6 59 8
9 </ literal >
10 </ bindi ng >
11 < bindi n g n a m e = lat >
12 < literal d at a ty pe = xsd : double >
13 5 0. 7 39 2 11
14 </ literal >
15 </ bindi ng >
16 </ res ult >
17 < result >
18 . . . .
19 </ results >
Listing 4: Query result of the Query in Listing 2 against the
data in Listing 3.
In our evaluation, the query given in Listing 5
was performed against an endpoint that emits in-
formation about the location of rentable bikes in
GBFS JSON format. The GraphGroupPattern in the
SERVICE clause merges that information with infor-
mation about the location of rental bike stations of
the same provider. The respective query response is
shown in Listing 6. Note that the displayed infor-
mation (Which bike is currently parked at which sta-
tion?), is originally not provided by how the GBFS
datamodel is defined. Deriving this information via
semantic queries over the originally not semantically
enriched GBFS data is a direct benefit from lifting
queries against the GBFS data to a semantic repre-
sentation.
The total execution time of the CONSTRUCT query was 560 ms for datasets of
63 bike station entries and 649 free bike entries, respectively. Of these
560 ms, approx. 10 ms were spent on lifting each of the two datasets, and
another 30 ms on performing the query against the station information data
in the SERVICE clause. The remaining time, roughly 510 ms, was spent
retrieving the source data from the provided URIs.
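The breakdown of the measured execution time can be retraced with a few lines of Python. This is only a worked calculation on the approximate figures reported above; the variable names are chosen for illustration.

```python
TOTAL_MS = 560               # total execution time of the CONSTRUCT query
LIFTING_MS_PER_DATASET = 10  # approximate, reported per dataset
DATASETS = 2                 # free bikes and station information
SERVICE_QUERY_MS = 30        # approximate, query inside the SERVICE clause

# Time spent inside the service itself (lifting plus federated sub-query).
processing_ms = LIFTING_MS_PER_DATASET * DATASETS + SERVICE_QUERY_MS
# Everything else is attributed to fetching the source data over HTTP.
retrieval_ms = TOTAL_MS - processing_ms
retrieval_share = retrieval_ms / TOTAL_MS

print(f"processing: ~{processing_ms} ms, retrieval: ~{retrieval_ms} ms "
      f"({retrieval_share:.0%} of total)")
```

Under these figures, the lifting and query steps account for only a small fraction of the total time, which is dominated by network retrieval.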
7 CONCLUSION AND FUTURE WORK
In this paper, we have presented a novel service that allows SPARQL queries
to be performed against non-RDF datasets. Unlike existing solutions, the
presented service is not limited to a certain source format, as long
WEBIST 2020 - 16th International Conference on Web Information Systems and Technologies
CONSTRUCT {
  # Stations, with their name and a combined "lat,lon" position
  ?station a gbfs:Station ,
             wgs84_pos:SpatialThing ;
           gbfs:name ?station_name ;
           wgs84_pos:lat_lon ?lat_lon_pos .

  # Bikes, linked to the station they are parked at
  ?bike a gbfs:Bike ;
        wgs84_pos:location ?station .
}
WHERE {
  # Free bike positions from the queried source
  ?bike wgs84_pos:lat ?lat ;
        wgs84_pos:long ?lon .

  # Station information from the federated endpoint
  SERVICE <#srvpath/?source=#srcpath> {
    ?station gbfs:name ?station_name ;
             wgs84_pos:lat ?station_lat ;
             wgs84_pos:long ?station_lon .
  }
  FILTER (
    ABS(?lat - ?station_lat) < 0.001 &&
    ABS(?lon - ?station_lon) < 0.001)
  BIND (CONCAT(str(?lat), ",", str(?lon)) AS ?lat_lon_pos)
}
Listing 5: Example of a federated SPARQL query.
as the source to be queried is structured. The service offers an HTTP query
API that follows the SPARQL 1.1 Protocol and is also suitable as an endpoint
for federated SPARQL queries.
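How a client might address such an endpoint can be sketched as follows. This is a minimal sketch under assumptions: the helper `build_query_url` is hypothetical, the `source` parameter name and the host are taken from the example URI in footnote 21, the `query` parameter follows the "query via GET" operation of the SPARQL 1.1 Protocol, and whether the published service accepts exactly this combination should be checked against its documentation. The code only constructs the request URL and does not send it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(service_base, source_url, sparql_query):
    """Build a 'query via GET' URL with the data source passed as an extra parameter."""
    # urlencode percent-encodes both values so they survive inside the query string.
    params = urlencode({"source": source_url, "query": sparql_query})
    return f"{service_base}?{params}"

# Host and source taken from the example URI in footnote 21.
SERVICE_BASE = "http://sparql-wrapper.service/"
SOURCE = ("https://gbfs.nextbike.net/maps/gbfs/v1/"
          "nextbike_bf/de/station_information.json")
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"

url = build_query_url(SERVICE_BASE, SOURCE, QUERY)
print(url)

# Decoding the URL again recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```

The same URL, without the `query` parameter, is what would be placed in the SERVICE clause of a federated query, since the federating engine appends the query itself.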
Based on the original SPARQL 1.1 Protocol, we have derived a formal query
API and provided a formal design of the presented solution. We have moreover
presented a proposal for an actual service architecture, based on the CARML
non-RDF-to-RDF mapping engine.
Finally, we outlined our protocol implementation and concluded with an
evaluation of the prototype implementation. We moreover demonstrated the
applicability of the service in the scope of federated SPARQL queries.
The presented implementation is published on GitHub at
https://github.com/SmartMaaS/sparql-api-wrapper.
The current design and implementation so far ne-
glect named graphs. How to include this concept in
the presented service design is subject to future work.
So far, the service supports SPARQL SELECT,
ASK and CONSTRUCT queries. As future work,
we also plan to investigate how SPARQL UPDATE
queries against a non-RDF endpoint may be used to
modify a remote non-RDF dataset, given that the re-
mote endpoint allows data modification.
We have discussed our solution with respect to use-cases from the domain of
traffic and public transport. However, we also see, and plan to evaluate, a clear
<http://foo.bar/stations/10044540>
    a gbfs:Station ,
      wgs84_pos:SpatialThing ;
    gbfs:name "Juridicum" ;
    wgs84_pos:lat_lon "50.73009232899485,7.108277678489685" .

<http://foo.bar/bikes/44608> a gbfs:Bike ;
    wgs84_pos:location <http://foo.bar/stations/10044540> .

<http://foo.bar/bikes/45448> a gbfs:Bike ;
    wgs84_pos:location <http://foo.bar/stations/10044540> .

<http://foo.bar/bikes/45337> a gbfs:Bike ;
    wgs84_pos:location <http://foo.bar/stations/10044540> .
Listing 6: Query result (excerpt) as returned by the
federated SPARQL query example.
applicability in use-cases from industrial domains, as
well as Smart City, Smart Grid, and Smart Living sce-
narios.
ACKNOWLEDGEMENTS
This work has been supported by the German Federal
Ministry for Economic Affairs and Energy (BMWi)
(FZK 01MD18014 C) as part of the project Smart
MaaS, and by the German Federal Ministry of Ed-
ucation and Research through the MOSAIK project
(grant no. 01IS18070-C). The project Smart MaaS is
part of the technology program “Smart Service Welt
II”, which is funded by the German Federal Ministry
for Economic Affairs and Energy (BMWi).
REFERENCES
Berners-Lee, T., Hendler, J., Lassila, O., et al. (2001). The
semantic web. Scientific american, 284(5):28–37.
Bizer, C., Heath, T., and Berners-Lee, T. (2011). Linked
data: The story so far. In Semantic services, inter-
operability and web applications: emerging concepts,
pages 205–227. IGI Global.
Bizer, C. and Seaborne, A. (2004). D2RQ-treating non-RDF
databases as virtual RDF graphs. In Proceedings of the
3rd international semantic web conference (ISWC2004),
volume 2004. Proceedings of ISWC2004.
Buil-Aranda, C., Arenas, M., Corcho, O., and Polleres, A.
(2013). Federating queries in SPARQL 1.1: Syntax,
semantics and evaluation. Web Semantics: Science,
Services and Agents on the World Wide Web, 18(1):1–17.
Special Section on the Semantic and Social Web.
Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R.,
Lanti, D., Rezk, M., Rodriguez-Muro, M., and Xiao,
G. (2017). Ontop: Answering sparql queries over re-
lational databases. Semantic Web, 8(3):471–487.
Colpaert, P., Abelshausen, B., Andrés, J., Meléndez, R., and
Delva, H. (2019). Republishing Open Street Map’s
roads as Linked Routable Tiles. In European Semantic
Web Conference, pages 13–17.
Colpaert, P., Verborgh, R., and Mannens, E. (2017). Pub-
lic transit route planning through lightweight linked
data interfaces. In Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intel-
ligence and Lecture Notes in Bioinformatics), volume
10360 LNCS, pages 403–411.
Das, S., Sundara, S., and Cyganiak, R. (2012). R2RML:
RDB to RDF Mapping Language. W3C Recommen-
dation, (September 2012):1–34.
Dimou, A., Sande, M. V., Colpaert, P., Verborgh, R., Man-
nens, E., and Van De Walle, R. (2014a). RML: A
generic language for integrated RDF mappings of het-
erogeneous data. In CEUR Workshop Proceedings,
volume 1184.
Dimou, A., Sande, M. V., Slepicka, J., Szekely, P., Man-
nens, E., Knoblock, C., and Van De Walle, R. (2014b).
Mapping Hierarchical Sources into RDF using the
RML Mapping Language.
Dimou, A., Vander Sande, M., Colpaert, P., Mannens, E.,
and Van De Walle, R. (2013). Extending R2RML
to a source-independent mapping language for RDF.
In CEUR Workshop Proceedings, volume 1035, pages
237–240.
Gupta, S., Szekely, P., Knoblock, C. A., Goel, A.,
Taheriyan, M., and Muslea, M. (2012). Karma: A sys-
tem for mapping structured sources into the semantic
web. In Extended Semantic Web Conference, volume
7540, pages 430–434. Springer, Berlin, Heidelberg.
Harth, A., Knoblock, C. A., Stadtmüller, S., Studer, R., and
Szekely, P. (2013). On-the-fly integration of static and
dynamic linked data. In CEUR Workshop Proceedings,
volume 1034.
Lefrançois, M., Zimmermann, A., and Bakerally, N. (2017).
A SPARQL extension for generating RDF from het-
erogeneous formats. In Lecture Notes in Computer
Science (including subseries Lecture Notes in Artifi-
cial Intelligence and Lecture Notes in Bioinformatics),
volume 10249 LNCS, pages 35–50.
Mayer, S., Verborgh, R., Kovatsch, M., and Mattern, F.
(2016). Smart configuration of smart environments.
IEEE Transactions on Automation Science and Engi-
neering, 13(3):1247–1255.
Michel, F., Djimenou, L., Zucker, C. F., and Montagnat,
J. (2017). xR2RML: Relational and non-relational
databases to RDF mapping language. PhD thesis,
CNRS.
Michel, F., Faron-Zucker, C., and Montagnat, J. (2016).
A mapping-based method to query mongodb docu-
ments with sparql. In International Conference on
Database and Expert Systems Applications, pages 52–
67. Springer.
Michel, F., Zucker, C. F., Gandon, F., Fabien, G., and Faron-
Zucker, C. (2018a). Bridging Web APIs and Linked
Data with SPARQL Micro-Services. pages 187–191.
Michel, F., Zucker, C. F., Gandon, F., Fabien, G., and
Faron-Zucker, C. (2018b). SPARQL Micro-Services:
Lightweight Integration of Web APIs and Linked
Data. Technical report.
Nallur, V., Elgammal, A., and Clarke, S. (2015). Smart
Route Planning Using Open Data and Participatory
Sensing. In IFIP Advances in Information and Com-
munication Technology, volume 451, pages 91–100.
Springer New York LLC.
Pérez, J., Arenas, M., and Gutierrez, C. (2006). Semantics
and complexity of sparql. In International semantic
web conference, pages 30–43. Springer.
Priyatna, F., Corcho, O., and Sequeda, J. (2014). Formali-
sation and experiences of R2RML-based SPARQL to
SQL query translation using morph. In WWW 2014 -
Proceedings of the 23rd International Conference on
World Wide Web, pages 479–489.
Rodríguez-Muro, M. and Rezk, M. (2015). Efficient
SPARQL-to-SQL with R2RML mappings. Journal of
Web Semantics, 33:141–169.
Scharffe, F., Bihanic, L., Képéklian, G., and Atemezing
(2012). Enabling Linked Data Publication with the
Datalift Platform. Workshops at the Twenty-Sixth
AAAI Conference on Artificial Intelligence.
Slepicka, J., Yin, C., Szekely, P., and Knoblock, C. A.
(2015). KR2RML: An alternative interpretation of
R2RML for heterogeneous sources. In CEUR Work-
shop Proceedings, volume 1426.
Verborgh, R., Steiner, T., Van Deursen, D., Van de Walle,
R., and Vallés, J. G. (2011). Efficient runtime service
discovery and consumption with hyperlinked restdesc.
In 2011 7th International Conference on Next Generation
Web Services Practices, pages 373–379. IEEE.
Verborgh, R. and Vander Sande, M. (2020). The seman-
tic web identity crisis: in search of the trivialities that
never were. Semantic Web, (Preprint):1–9.