Towards Early Prototyping of Services based on Open Transport Data: A Feasibility Study

Nicolas Ferry, Aida Omerovic and Marit Kjøsnes Natvig

SINTEF, Norway

Keywords:

Service Prototyping, Open Transport Data, DevOps.

Abstract:

Data under open licenses and in reusable formats, often referred to as "open data", is increasingly being made accessible by both public and private actors. Government institutions, municipalities, private companies and entrepreneurs are among the stakeholders who either have visions of new open data-based services or are looking for ideas on potential innovations based on open data. In both cases, however, it is often unclear to service developers how the open data can actually be utilized. A main reason is that the data needs to be retrieved from multiple sources, understood, quality checked and processed. While gaining insight into the services that can be created on top of open data, a service developer has to undergo an iterative "trying and failing" exercise of service prototyping. To be practically feasible, such a process needs to be agile and efficient. Open data from the transport sector is used as a case. Open transport data is characterized by many challenges common to open data in general, but also by a few specific ones. One of those challenges is the need to combine (often real-time) data from many sources in order to create a new service. In this paper we propose an agile approach to early service prototyping and try out the approach on an open transport data service. Finally, we propose priorities for future work towards a comprehensive approach for agile prototyping of open transport data-based services.

1 INTRODUCTION

During the past several years, a growing number of private and public actors all over the world have been actively releasing data under open licenses and often in reusable formats (Barometer, 2015). The goal is to foster the creation of new and innovative digital services. The innovation and economic potential is becoming more and more visible, as documented by a European study (Carrara et al., 2015), thus attracting governments, municipalities, companies and entrepreneurs to take part in the ecosystem of data provision and the creation of innovations on top of open data. Once the data are released and announced through a public catalogue, a developer needs to understand their format and content, evaluate their quality and then (at least partially) create a new service through several iterations. This process is necessary in order to try out the ideas and evaluate the feasibility of the envisioned service. Such a creative process of "trying and failing" to develop new services needs to be highly agile and efficient. The process is, however, slowed down because the data openly available online frequently consist of rather unstructured information (Kim et al., 2014), which makes service prototyping difficult and expensive (Rusu et al., 2013). It is also a challenge that the quality of the dataset descriptions and the announced metadata might not be good enough to give the developer the information needed (Martin et al., 2013; Beno et al., 2017).

Some tools and methods facilitating open data-based service prototyping do exist. The Linked Data Stack (Auer et al., 2012) is a software stack consisting of a number of loosely coupled tools, each capable of performing certain sets of operations on linked data, such as data extraction, storage, querying, linking, classification, and search. The LinDA project (Hasapis et al., 2014) developed a set of tools for linked data publishing, packaged into the LinDA Workbench. In the cases of both the Linked Data Stack and LinDA, the complexity of provisioning resources and managing the web application rests on the service developer, who must install the tools and maintain the infrastructure. The COMSODE project (P. Hanečák, 2015) provided a set of software tools and a methodology for open data processing and publishing. COMSODE is not available as an online service, but rather as a set of tools that need to be individually managed, which implies an additional burden on the developer. Datalift (Scharffe et al., 2012) is a software framework for linked data publishing. It is considered an "expert tool" (Scharffe et al., 2012). For example, it comes with no GUI to support data publishers in the data publication process. The Linked Data AppStore (Roman et al., 2014) is a Software-as-a-Service platform prototype for data integration on the web. Common to the mentioned tools and approaches is that they either only partially cover the prototyping process, or that they are too extensive and therefore unfit for a DevOps-driven agile approach.

Ferry, N., Omerovic, A. and Natvig, M.
Towards Early Prototyping of Services based on Open Transport Data: A Feasibility Study.
DOI: 10.5220/0007675402570262
In Proceedings of the 9th International Conference on Cloud Computing and Services Science (CLOSER 2019), pages 257-262
ISBN: 978-989-758-365-0

Through the research and innovation project Open Transport Data, which gathers some of the major public and private actors from the transport sector in Norway, we have addressed service prototyping in the context of open data from the transport domain. The following list includes the main challenges that a developer faces when prototyping services on top of open data:

• Discovery of relevant datasets through metadata search and visualisation of datasets to better understand the data content. Public catalogues and data portals are still not comprehensive, and metadata describing the contents are only to a limited degree standardized and available.

• Understanding and using varying application programming interfaces (APIs) for data retrieval. Even though API description standards exist (e.g., OpenAPI), they are not commonly used, and APIs are not documented in a standardised way.

• Combining multiple sources of open data in order to create value-added services. Travel planners will, for example, need information on addresses, stop points, route plans and position data from several transport service operators, maps, etc.

• Accessing real-time data from IoT and sensors. The amount of such data will increase, and new services will use real-time data streams on, for example, the conditions at locations and the movement of people, vehicles and goods.

• Handling large volumes of data, which are possibly unstructured.

• Handling proprietary data formats. For example, standards exist for data on public transport, but for other transport types (e.g., car sharing, city bikes, ride sharing) there are no standards, and proprietary data formats are used.

• Understanding the data. In many cases, domain knowledge is required in order to sufficiently understand the data contents. This is a challenge due to the lack of documentation and metadata, as described above.

Clearly, these characteristics impose requirements on the approach followed for prototyping services based on open transport data. Our goal is that a service developer (e.g., an entrepreneur with a limited programming background) can incrementally explore the possibilities and ideas while creating a service prototype. To that end, the approach has to be highly iterative, comprehensible to non-expert developers and cost-efficient. To the best of our knowledge, there is currently no approach which sufficiently meets the above-mentioned needs and challenges. In particular, the existing approaches fail to be sufficiently agile, scalable and comprehensible to suit gradual prototyping in which many data sources are consolidated over multiple iterations.

In this position paper we propose an initial agile approach to early service prototyping based on open transport data. The approach is novel in the sense that it is data-centric and focuses on how to develop an idea into a prototype rather than on how to implement a solution. The approach is motivated by the challenges listed above as well as by experiences gained from using the data which has been harvested into an open catalogue by the Open Transport Data project. We exemplify our approach on an open transport data service and discuss the lessons learned so far. We also outline a roadmap for the forthcoming research towards a comprehensive approach for agile prototyping of open transport data-based services.

Section 2 gives an overview of the approach. Section 3 exemplifies the approach by prototyping a service based on real-life open transport data, and Section 4 summarizes the lessons learned in this trial and discusses the threats to validity of the results. We also propose the priorities for future work, which aims to provide a comprehensive approach for agile prototyping of open transport data-based services.

2 OVERVIEW OF THE APPROACH

In this section we introduce our approach for the iterative prototyping of services based on open data. We propose the prototyping process for the development of services based on open transport data depicted in Figure 1. Firstly, service developers need to search for the relevant datasets. Open data is typically released in a domain-dependent way when it comes to the use of terminology and data structures. However, as stated in (Noy and Brickley, 2017): "it can be difficult to determine not only the source of the dataset that has the information that you are looking for, but also the veracity or provenance of that information". In particular, datasets typically lack proper descriptions and metadata. Due to its open nature, the data is not prepared for a specific application and can be used in many different contexts which were not necessarily anticipated at release time.

Secondly, when data is found, developers need to access and understand the data. In many cases, only looking at the documentation of the data (when available) is not enough, as documentation typically fails to represent aspects such as missing data, data accuracy, etc. As a result, in order to properly understand the data, developers need to manipulate and test it.

From this stage, the developer can identify the potential usage areas for the data that enable new value-added services. Once the capabilities of such a service are identified, and before its implementation, the developers need to prepare the data (e.g., pre-processing, cleaning).

If additional data is required to deliver the service with the desired capabilities, developers can enter a new prototyping iteration. If not, the prototype can be used in other stages of the product life cycle, such as the code and deployment stages (for instance, when part of its implementation needs to be redeveloped to meet the production requirements, e.g., a specific framework needs to be used) or the testing stage.

Figure 1: Data oriented early prototyping process.

The details of each of the steps of the iterative approach depicted in Figure 1 are described in the following:

1. Search Data: The aim of this task is to identify the data sources and the datasets on which the forthcoming prototyping iteration will be based. Catalogues, data repositories, and search engines can help in finding the relevant datasets.

2. Access and Understand Data: This process consists first in understanding how the identified datasets or data streams can be accessed, second in actually accessing the data, and finally in looking at different samples of the data in order to properly understand its contents, structure, etc. These activities are often done in an ad hoc manner, as the APIs for retrieving data typically do not follow API description standards.

3. Identify Added Value: This step requires looking into the details of the data in order to understand its contents and to identify which parts of it are relevant for the service. It is important at this stage to evaluate several samples of data in order to establish the overall quality of the data, e.g., data accuracy and missing data.

4. Specify Capabilities: At this stage, the developer can start specifying the features that will be offered by the prototype. This activity will be affected by the availability of data and its identified added value.

5. Prepare Data: This stage consists in managing and preparing the data for further analysis and processing as part of the service business logic. This includes the following activities: data characterization, data organization, data filtering, restructuring and compression. At the end of this stage, the data should be ready to be consumed by the business logic of the service and should fit its needs and requirements.

6. Prototype Service: This stage consists in the actual development, delivery and deployment of a prototype that implements the business logic of the service specified at step 4.

7. Identify Missing Data: At the end of a prototyping iteration, once a new set of features has been added, the developer identifies which features should be added to the prototype in the forthcoming iteration, as well as which data are required.

The cycle may be followed in several iterations, and terminates when a desired service prototype is in place, or when a stage fails in a manner that makes it impossible to proceed.
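The cycle and its two termination conditions can be read as a simple control loop. The following Python sketch is only an illustration under our own assumptions: the stage names would mirror the seven steps, while `run_prototyping`, the `Stage` callables, the shared `state` dictionary, the `missing_data` key and the iteration budget are hypothetical and not part of the approach itself.

```python
from typing import Callable, Dict, List, Tuple

# A stage takes the shared state and returns True on success, False on failure.
Stage = Callable[[Dict], bool]


def run_prototyping(stages: List[Tuple[str, Stage]],
                    state: Dict,
                    max_iterations: int = 5) -> str:
    """Run the stages in order, iterating until done or until a stage fails."""
    for iteration in range(1, max_iterations + 1):
        state["iteration"] = iteration
        for name, stage in stages:
            if not stage(state):
                # A stage failed in a way that makes it impossible to proceed.
                return f"stopped at '{name}' in iteration {iteration}"
        # Assumption: the last stage ("Identify Missing Data") fills this key.
        if not state.get("missing_data"):
            return f"prototype ready after {iteration} iteration(s)"
    return "iteration budget exhausted"
```

Each stage is passed in as a plain function, so a developer could start with manual or stubbed stages and replace them one by one as the prototype matures.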

3 TRIAL OF THE APPROACH

We tried out our approach in the context of the X project, where we developed a service aiming at (i) counting all the ongoing deviations within public transport (e.g., tram delays, problems with a bus) and (ii) computing the average number of deviations over a week. In the following we detail the activities we performed in each step of our approach. The scope of the trial was the open data available for public transportation within the city of Oslo, Norway.

1. Search Data: We first searched for data in the Open Transport Data CKAN catalogue (see Figure 2) using "transport" and "Oslo" as keywords, but we could not find relevant data. By contrast, when using the "Ruter" keyword (Ruter is the public transport authority for Oslo), we found the API of a "route planning" service.

Figure 2: Open Transport Data CKAN.

2. Access and Understand Data: We first selected the Ruter Sirisx API, which allowed us to retrieve, for one stop (i.e., bus, tram, and subway stops), the list of ongoing deviations on all the lines using this stop. It is exposed as a REST API and can be accessed using classical tools such as "curl" or a "REST console". However, the API is poorly documented, and we found that we could not use this service directly, as it requires as input a JSON object containing the identifier of the stop of interest. We thus searched the catalogue again for another API providing such information and selected the Ruter Reise API, which provides details about all the public transportation stops in Oslo, regardless of the transportation mode. We verified that the information in the two services matched semantically, i.e., we stored the identifiers of a few stops from the Ruter Reise API and thereafter called the Sirisx API using these identifiers.

3. Identify Added Value: We analyzed the data from both the Ruter Reise and the Sirisx APIs. We could easily find the relevant information, and in general the data was accurate, even though the textual description of a deviation was sometimes incomplete or missing.

4. Specify Capabilities: Using these APIs (Ruter Sirisx, https://sirisx.ruter.no, and Ruter Reise, http://reisapi.ruter.no) we could retrieve and provide users with live information about the deviations associated with one or several stops. We also decided to retrieve and store this information on a regular basis to compute the average number of deviations over a week in the whole city.

5. Prepare Data: We prepared the data in two ways. First, we filtered it to only manipulate the part relevant for our service. Second, we prepared the data for further analysis. The data from the Reise API describing the stops was obtained in the form of a stringified JSON object. Unfortunately, the JSON obtained was not properly formatted, as it used single quotes instead of double quotes. In addition, some Norwegian language characters were not properly encoded. We thus implemented a mechanism to fix these issues before transforming the string into a proper JSON object.

6. Prototype Service: We implemented our service using the Node-RED platform, an open source project by IBM that uses a visual dataflow programming model for building applications and services. Using Node-RED, an application takes the form of a set of nodes (i.e., software components) wired with links that are encapsulated in a flow. A flow can easily be exposed as a service using specific Node-RED nodes. Thanks to the large community behind Node-RED, a large set of nodes is available off-the-shelf and for free, making it easy to implement new applications and services. We had to implement specific nodes for accessing the two APIs and for computing the average number of deviations over a week. The final flow is depicted in Figure 3.

7. Identify Missing Data: We did not find it necessary to implement this step in the trial, as the prototype already covered the intended functionality.
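Steps 5 and 6 can be illustrated with a small sketch. It is not the Node-RED implementation used in the trial; it is a Python approximation under stated assumptions: the payload is assumed to be a single-quoted string that is otherwise close to JSON, the field names `ID` and `Name` in the usage example are invented, and the weekly figure is assumed to be the mean of periodically sampled deviation counts. The character encoding problem is not covered, since its exact nature is not described here.

```python
import ast
import json
from datetime import datetime, timedelta
from typing import Any, List, Tuple


def parse_lenient_json(text: str) -> Any:
    """Parse JSON, falling back to Python-literal syntax for single quotes."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # ast.literal_eval accepts single-quoted strings but not the JSON
        # literals true/false/null, so this fallback is only a partial fix.
        return ast.literal_eval(text)


def weekly_average(samples: List[Tuple[datetime, int]], now: datetime) -> float:
    """Mean of the deviation counts sampled during the 7 days before `now`."""
    start = now - timedelta(days=7)
    recent = [count for ts, count in samples if start <= ts <= now]
    return sum(recent) / len(recent) if recent else 0.0
```

For instance, `parse_lenient_json("[{'ID': 1, 'Name': 'Stop A'}]")` yields a regular list of dictionaries that can then be filtered down to the stop identifiers needed for the deviation requests.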

4 DISCUSSION

This section first summarizes the challenges we faced during the trial and thereafter discusses the threats to validity and reliability of the results.

4.1 Lessons Learned from the Trial

As already presented in Section 2, searching for the most relevant datasets or data sources for building a specific service is challenging due to the lack of metadata about (i) the datasets (or data sources) and (ii) the semantic overlaps between different datasets (or data sources). For example, it would be interesting to link datasets by means of automatic annotations with keywords that would form a domain-specific ontology (Jiang and Li, 2019).

Footnotes to step 6 of Section 3: https://nodered.org ; https://github.com/SINTEF-9012/OTD-components

Figure 3: Data preparation using Node-RED.

During our trial, in addition to these challenges, we first observed that many catalogues of datasets (and data sources) are available on the web, but it was difficult to make sure that we were using the best candidate. In particular, in terms of tooling, there may be a need for a cross-catalogue search engine (i.e., an engine enabling search across multiple catalogues). Similarly, once we had selected our datasets or data sources, it was impossible to assess whether these were the best candidates. However, it is worth noting that our agile approach, where we can start over again after trying to use a dataset, helps in assessing the quality and value of different data sources.
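A cross-catalogue search could start out very simply. The Python sketch below assumes that every catalogue is a CKAN instance exposing the standard CKAN Action API endpoint `package_search`; the function names, the injected `fetch` callable and any catalogue URLs passed in are our own illustrative choices, and no ranking or de-duplication across catalogues is attempted.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url: str) -> str:
    """Default fetcher: plain HTTP GET returning the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def search_catalogues(catalogues: List[str], query: str,
                      fetch: Callable[[str], str] = http_get,
                      rows: int = 10) -> List[Dict[str, str]]:
    """Query each CKAN catalogue and merge the hits into one list."""
    hits: List[Dict[str, str]] = []
    for base in catalogues:
        url = f"{base.rstrip('/')}/api/3/action/package_search?" + urlencode(
            {"q": query, "rows": rows})
        try:
            payload = json.loads(fetch(url))
        except (OSError, ValueError):
            continue  # Skip catalogues that are unreachable or return non-JSON.
        if not payload.get("success"):
            continue
        for dataset in payload["result"]["results"]:
            hits.append({"catalogue": base,
                         "name": dataset.get("name", ""),
                         "title": dataset.get("title", "")})
    return hits
```

Passing the fetcher as a parameter keeps the sketch testable offline and makes it easy to add caching or authentication later.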

Identifying the value of the datasets is also challenging, as it can be difficult to evaluate the quality of the data. For instance, when dealing with large datasets or data streams, it is difficult to identify whether some data is missing. As an example, in a large dataset with data recorded every second for a few months, it might be difficult to check whether a few days or hours of recordings are missing. More generally, information about the reliability of a data source is typically not provided.
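When the expected recording interval is known, missing stretches can be flagged automatically. The following Python sketch is one possible check, not something that was part of the trial: `find_gaps` and its `tolerance` factor are hypothetical, and the check assumes that each record carries a timestamp.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def find_gaps(timestamps: Iterable[datetime],
              expected_interval: timedelta,
              tolerance: float = 1.5) -> List[Tuple[datetime, datetime]]:
    """Return (earlier, later) pairs whose spacing exceeds the tolerance."""
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [(earlier, later)
            for earlier, later in zip(ordered, ordered[1:])
            if later - earlier > limit]
```

Each returned pair marks the last record before a gap and the first record after it, which makes it straightforward to report how many hours or days are missing.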

The preparation of the data does not necessarily involve complicated tasks. However, after a few manipulation steps, it can be difficult to understand the actual state of the data being manipulated (i.e., its structure, format, or even its actual content). In such cases, tools providing a means to visualize the data after each manipulation would be highly beneficial. This applies not only to datasets but also to data streams.
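A lightweight stand-in for such tooling is to record a short summary of the data after every manipulation. The Python sketch below is a hypothetical illustration: `summarize`, `run_with_snapshots` and the summary fields are our own choices, and a real tool would likely render the snapshots graphically rather than as dictionaries.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def summarize(data: Any, preview_chars: int = 60) -> Dict[str, Any]:
    """Describe the current type, size and a short preview of the data."""
    size = len(data) if hasattr(data, "__len__") else None
    preview = json.dumps(data, default=str, ensure_ascii=False)[:preview_chars]
    return {"type": type(data).__name__, "size": size, "preview": preview}


def run_with_snapshots(data: Any,
                       steps: List[Step]) -> Tuple[Any, List[Dict[str, Any]]]:
    """Apply each step in order and record a summary after every step."""
    snapshots = [{"step": "input", **summarize(data)}]
    for name, step in steps:
        data = step(data)
        snapshots.append({"step": name, **summarize(data)})
    return data, snapshots
```

Inspecting the list of snapshots shows at which step the structure or size of the data changed, which is the information that tends to get lost after several manipulations.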

Our approach is meant to be used during the prototyping phase of the overall life-cycle management of a service. However, it appears that this prototyping phase would itself benefit from using classical tools for the continuous and agile development and operation of services. For instance, once a prototype has been implemented, it typically has to be deployed and tested in a sandbox environment. Similarly, more advanced prototypes could undergo canary testing, i.e., routing a subset of users or requests to the prototype. A deeper analysis of how our approach fits within the main Agile and DevOps processes is required.
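Canary testing of a prototype can be approximated with a deterministic routing rule. The Python sketch below is an assumption-laden illustration rather than a recommendation of a specific tool: `route_to_canary` is a hypothetical helper that hashes a request or user identifier into one of 100 buckets and sends the lowest buckets to the prototype.

```python
import hashlib


def route_to_canary(request_id: str, canary_percent: int) -> bool:
    """Deterministically send about `canary_percent`% of ids to the prototype."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # Stable bucket in 0..99.
    return bucket < canary_percent
```

Because the decision depends only on the identifier, a given user is always routed the same way, which keeps their experience consistent while the prototype is being evaluated.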

4.2 Threats to Validity and Reliability

The validity of the results depends to a large extent on how well the threats to validity and reliability have been handled. This section discusses the essential aspects of such threats in our context.

In terms of validity, our example is only to a limited degree representative of the contexts intended to be within the scope of our approach. The trial has, however, given strong indications of the feasibility of the approach. No particular customizations of the approach were needed for the trial. Thus, we have reason to believe that it should be possible to reapply our approach to new services.

Reliability is concerned with demonstrating that the empirical research can be repeated with the same results. Of course, a trial like the one we have conducted cannot give solid repeatable evidence. There are several contextual factors influencing what happens, particularly the choices made by the researchers during the service development. As our main goal has been to propose an initial approach and test its feasibility through the example, performance evaluation of the approach was not addressed.

In terms of evaluation, it is also a weakness that the researchers who tried out the approach also participated in the design of the approach. This is a threat to the reliability of the evaluation results, as we cannot know to what degree another service developer would have obtained the same results.

We need to further evaluate the approach in more realistic settings. There is also a need for a baseline for comparing this approach with alternative ones, in order to assess characteristics such as usability, usefulness and cost-effectiveness. This should be a part of future work. Further empirical evaluation is also needed to assess the scalability of our approach with respect to the complexity and size of the services to be developed.

Overall, we have drawn useful experiences from developing and instantiating the approach in the example. Although the mentioned threats to validity and reliability are present in the study, we argue that the results indicate feasibility and suggest strengths and weaknesses of the approach.

5 CONCLUSIONS

In this paper we propose an approach to early and continuous service prototyping based on open data. We have also tried out the approach on an open transport data service. The results indicate feasibility and suggest strengths and weaknesses of the approach. In particular, we argue for an iterative "trying and failing" approach, as developers building services on top of open data typically need to play with and understand the data while implementing a service. For this, automation should also be provided, in particular to facilitate access to the data. Automation would also support the deployment of the mechanisms and tools for (i) the prototyping and (ii) the execution of the prototype itself.

ACKNOWLEDGEMENT

This work has been funded by the Open Transport Data Project under Norwegian Research Council grant no. 257153 and by the H2020 programme under grant agreement no. 780351 (ENACT).

REFERENCES

Auer, S., Bühmann, L., Dirschl, C., Erling, O., Hausenblas, M., Isele, R., Lehmann, J., Martin, M., Mendes, P. N., Van Nuffelen, B., et al. (2012). Managing the life-cycle of linked data with the lod2 stack. In International semantic Web conference, pages 1–16. Springer.

Barometer, O. D. (2015). Open data barometer global report. WWW Foundation.

Beno, M., Figl, K., Umbrich, J., and Polleres, A. (2017). Open data hopes and fears: determining the barriers of open data. In E-Democracy and Open Government (CeDEM), 2017 Conference for, pages 69–81. IEEE.

Carrara, W., Chan, W., Fischer, S., and Steenbergen, E. v. (2015). Creating value through open data: Study on the impact of re-use of public data resources. European Commission.

Hasapis, P., Fotopoulou, E., Zafeiropoulos, A., Mouzakitis, S., Koussouris, S., Petychakis, M., Kapourani, B., Zanetti, N., Molinari, F., Virtuoso, S., et al. (2014). Business value creation from linked data analytics: The linda approach. In eChallenges e-2014, 2014 Conference, pages 1–10. IEEE.

Jiang, S., H. T. F. N. M. and Li, J. (2019). Ontology-based semantic search for open government data. In Proceedings of the IEEE 13th International Conference on Semantic Computing (ICSC). IEEE.

Kim, G.-H., Trimi, S., and Chung, J.-H. (2014). Big-data applications in the government sector. Communications of the ACM, 57(3):78–85.

Martin, S., Foulonneau, M., Turki, S., and Ihadjadene, M. (2013). Open data: Barriers, risks and opportunities. In Proceedings of the 13th European Conference on eGovernment (ECEG 2013), Academic Conferences and Publishing International Limited, Reading, pages 301–309.

Noy, N. and Brickley, D. (2017). Facilitating the discovery of public datasets.

P. Hanečák, S. Krchnavý, I. H. (2015). Comsode publication platform – open data node – final. Technical report.

Roman, D., Pop, C. D., Roman, R. I., Mathisen, B. M., Wienhofen, L., Elvesæter, B., and Berre, A. J. (2014). The linked data appstore. In Mining Intelligence and Knowledge Exploration, pages 382–396. Springer.

Rusu, O., Halcu, I., Grigoriu, O., Neculoiu, G., Sandulescu, V., Marinescu, M., and Marinescu, V. (2013). Converting unstructured and semi-structured data into knowledge. In Roedunet International Conference (RoEduNet), 2013 11th, pages 1–4. IEEE.

Scharffe, F., Atemezing, G., Troncy, R., Gandon, F., Villata, S., Bucher, B., Hamdi, F., Bihanic, L., Képéklian, G., Cotton, F., et al. (2012). Enabling linked data publication with the datalift platform. In AAAI workshop on semantic cities.
