PLASMA: Platform for Auxiliary Semantic Modeling Approaches

Alexander Paulus, Andreas Burgdorf, Lars Puleikis, Tristan Langer, Andr

e Pomp

and Tobias Meisen

Chair of Technologies and Management of Digital Transformation, University of Wuppertal, Wuppertal, Germany

Keywords:

Semantic Modeling, Semantic Reﬁnement, Semantic Data Management, PLASMA.

Abstract:

In recent years, the impact and usability of semantic data management has increased continuously. Still, one

limiting factor is the need to create a semantic mapping in the form of a semantic model between data and

a used conceptualization. Creating this mapping manually is a time-consuming process, which especially

requires to know the used conceptualization in detail. Thus, the majority of recent approaches in this research

ﬁeld focus on a fully automated semantic modeling approach, but reliable results cannot be achieved in all use

cases, i.e., a human must step in. The needed subsequent manual adjustment of automatically created models

has already been mentioned in previous works, but has, in our opinion, received too little attention so far. In this

paper, we treat the involvement of a human as an explicit phase of the semantic model creation process, called

semantic reﬁnement. Semantic reﬁnement comprises the manual improvement of semantic models generated

by automated approaches. In order to additionally enable dedicated research in this direction in the future,

we also present PLASMA, a platform to utilize existing and future modeling approaches in a consistent and

extendable environment. PLASMA aims to support the development of new semantic reﬁnement approaches

by providing necessary supplementary functionalities.

1 INTRODUCTION

Due to the emergence of large amounts of data and

its usage in machine learning processes, the manage-

ment of such data has become increasingly impor-

tant in recent years. Although access to data is often

possible, ﬁnding and understanding the needed data

sources to extract information can be a challenging

task without meta information associated to the raw

data sets. Semantic data management in the form

of Ontology-based Data Management (OBDM) has

already laid the foundation for managing heteroge-

neous data sources more efﬁciently. Advances in this

ﬁeld of research can be observed in (Linked) Open

Data and the introduction of Enterprise Knowledge

Graphs(Galkin et al., 2016).

There are two important processes that ensure the

quality and acceptance of semantic data management:

the creation and maintenance of conceptualizations as

well as the description of data sources based on such

conceptualizations in form of semantic models. A

semantically annotated data source can be found us-

ing the conceptual representation of the data and pro-

cessed using the provided context information stored

in the model or the conceptualization. In recent years,

research has been focused on reducing the effort that

arises when it comes to deﬁning those conceptual-

izations and the semantic models (cf. Section 2).

However, there are no approaches that generally yield

ﬂawless results, often requiring a user to supervise

the creation and apply modiﬁcations. Those modi-

ﬁcations can be corrections as well as enhancements

of the semantic models. Although mentioned in pre-

vious publications such as (Taheriyan et al., 2016), in

our opinion, this phase of the modeling process has

not yet received sufﬁcient attention.

This phase, which is referred to simply as reﬁne-

ment by (Taheriyan et al., 2016), follows the auto-

mated semantic modeling. It allows the user to im-

prove an initial version proposed by an automatic

modeling approach by manually correcting or en-

hancing the state of the model. However, a clear deﬁ-

nition of this phase has not yet been proposed, which

we attempt in this contribution.

Furthermore, to support the development of al-

gorithms for the reﬁnement phase, we also pro-

pose PLASMA, a modular platform to integrate ap-

proaches for the entire process of creating a seman-

tic model, including the reﬁnement, but also labeling

and modeling (cf. Section 2). Using PLASMA, users

are enabled to test approaches in a pre-conﬁgured en-

vironment. For that, PLASMA allows users to ex-

Paulus, A., Burgdorf, A., Puleikis, L., Langer, T., Pomp, A. and Meisen, T.

PLASMA: Platform for Auxiliary Semantic Modeling Approaches.

DOI: 10.5220/0010499604030412

In Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) - Volume 2, pages 403-412

ISBN: 978-989-758-509-8

403

change single components of the semantic model cre-

ation process in order to test and evaluate new or even

already existing approaches in the same system.

Our contributions in this paper are as follows:

First, we introduce the semantic reﬁnement as a phase

of the semantic model creation process and state the

characteristics of this phase. Second, we present the

concept of auxiliary approaches and how those can be

utilized to build a comprehensive semantic modeling

environment. Third, we release PLASMA to serve as

a backbone for future semantic modeling that includes

a reﬁnement UI.

In the further course of the paper, we discuss re-

lated work in Section 2, then we present our concept

of semantic reﬁnement as its own phase in Section 3.

Afterwards, we introduce PLASMA in Section 4, fol-

lowed by current and future usage scenarios of the

platform in Section 5. We conclude and give an out-

look in Section 6.

2 RELATED WORK

In the past years, several approaches for performing

the creation of semantic models have emerged. The

process of creating a semantic model has been divided

into two phases so far. A basic annotation of a data

source can be achieved by performing a semantic la-

beling step, followed by the semantic modeling step,

which aims to relate initially mapped concepts and

equip the annotation with additional context (cf. (Vu

et al., 2019)).

Semantic labeling, which is the initial mapping

from labels (e.g., table headers or leafs in a struc-

tured data set) to a conceptualization (e.g., ontology),

is mainly performed by following either a label-driven

or a data-driven strategy. Examples for label-driven

mappings have been proposed by (Polﬂiet and Ichise,

2010), (Pinkel et al., 2017), (Paulus et al., 2018) or

(Papapanagiotou et al., 2006). Next to those, there

also exist data-driven mapping approaches that rely

mainly on using the data values of a data set. For

instance, (Syed et al., 2010) as well as (Wang et al.,

2012) proposed approaches that provide labels using

the data values within a data set in conjunction with

external (knowledge) databases to identify the most

reasonable concept. (Ramnandan et al., 2015),(Ab-

delmageed and Schindler, 2020) use algorithmic ap-

proaches whereas (Pham et al., 2016), (R

ummele

et al., 2018), (Chen et al., 2019), (Hulsebos et al.,

2019) as well as (Takeoka et al., 2019) and (Pomp

et al., 2020) use machine learning approaches to solve

the task.

All previously mentioned semantic labeling ap-

proaches rely on a predeﬁned conceptualization, al-

though approaches like (Pham et al., 2016), (Pomp

et al., 2020) or (Jim

enez-Ruiz et al., 2015) are able to

create concepts during the mapping process by inte-

grating the user into the labeling process. Using such

approaches in platforms, like Optique (Giese et al.,

2015), it becomes possible to redeﬁne the found map-

pings using an R2RML editor (Sengupta et al., 2013).

Following the semantic labeling phase, the initial

mappings can be extended by additional meaningful

concepts as well as selecting the most suitable re-

lationships that hold between the different concepts,

resulting in a semantic model for that data set. Ap-

proaches presented by (Knoblock et al., 2012) and

(Taheriyan et al., 2013; ?; ?) as well as (U

na et al.,

2018) have shown signiﬁcant improvements in this

area during the last years. Most recent approaches

like (Vu et al., 2019) and (Futia et al., 2020) show the

ongoing research interest in the topic. However, none

of the approaches does solve the task perfectly, due to

ambiguities in the data sets or the encounter of unseen

cases in the training data.

To ﬁnalize the model and correct potential errors,

human interaction is needed. However there are only

few approaches that feature tools to inspect and mod-

ify a semantic model after the automated generation.

A ﬁrst approach to support semantic model modiﬁca-

tion was Karma, presented by (Knoblock et al., 2012)

and (Gupta et al., 2012). Karma is described as a

data integration tool that enables users to feed in data

and create a semantic model for it. The used auto-

mated semantic modeling approach is based on the

method developed by Taheriyan et al. Karma has

been used in various projects (cf. (Szekely et al.,

2015)). A similar approach to Karma is followed by

ESKAPE(Pomp et al., 2017), a semantic data plat-

form for enterprises, handling the full data manage-

ment process from ingestion to extraction. However,

ESKAPE only performs basic semantic labeling au-

tomatically - the semantic modeling has to be done

manually by the users. Furthermore, there exist mul-

tiple commercial tools like PoolParty

or Grafo

that

focus on creating knowledge graphs but are not avail-

able for free use and furthermore are not extendable

to support own algorithms.

Both Karma and ESKAPE are capable of doing a

data schema analysis, analyzing live input data to ex-

tract labels and structure to perform the labeling step

on. Both also support multiple data formats during

this phase as well as provide a GUI (cf. (Szekely

et al., 2015) and (Pomp et al., 2018)), which allows

users to modify the semantic model. However, al-

http://poolparty.biz

http://gra.fo

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

404

automated

semantic

labeling

automated

semantic

modeling

semantic

refinement

storage

schema

analysis

Figure 1: Overview of the phases during the creation of a

semantic model with the semantic reﬁnement phase follow-

ing the (automated) semantic modeling.

though following the human in the loop approach,

both systems do not actively support the user during

the modiﬁcation phase.

From the observed state of the art, the main pro-

cess of creating a semantic model is depicted in Fig-

ure 1. We identify two main phases, automated se-

mantic labeling and automated semantic modeling

that have received signiﬁcant research during the past

years, and the subsequent reﬁnement phase featuring

the interactive modeling. Also, as those stages are

provided by modeling systems like Karma, we add a

schema analysis phase that builds the initial syntactic

model from a data set and a ﬁnal storage phase that

persists the complete semantic model, e.g., by inte-

grating it into a knowledge graph or semantic storage.

3 SEMANTIC REFINEMENT

While there have been major advances in the ﬁeld of

semantic labeling and modeling, the subsequent re-

ﬁnement has been mentioned but not actively sup-

ported. However, systems like Karma and ESKAPE

provide the ability to improve the quality of gener-

ated models as a fourth phase following the auto-

mated semantic modeling. We refer to this phase as

Semantic Reﬁnement, an interactive and iterative pro-

cess of improving an initial version proposed by an

automatic modeling approach. The terminology is

not new, (Paulheim, 2016) used the ”reﬁnement” term

to describe the automated improvement of knowledge

graphs, whereas (Futia et al., 2020) use it to describe

the ﬁnal automated validation of their semantic model

computation. However, in both cases this is an auto-

mated procedure and which, in the ﬁrst case is dis-

connected from the semantic model creation process

and in the latter case we associate it as a part of the

semantic modeling phase. Also, the reﬁnement of a

semantic model has been mentioned in previous pub-

lications (e.g., (Szekely et al., 2013), (Taheriyan et al.,

2015) and (Taheriyan et al., 2016)), but was often not

in the focus and being referred to as a necessity than

a dedicated phase of the semantic model creation pro-

cess. However, it is unclear if the authors referred to

an automated step like proposed by (Futia et al., 2020)

or a fully user-based adjustment as (Pomp et al., 2017)

propose.

We see the main characteristic of the dedicated se-

mantic reﬁnement phase as the manual use of a tool to

improve the quality of a semantic model. The user is

kept involved by inspecting the result after each mod-

iﬁcation. Possible user operations in this phase are

adding new concepts or relations from the conceptu-

alization, deleting or replacing existing elements or

modifying single elements if permitted by the system

(e.g., changing properties). Even extending the con-

ceptualization (c.f. (Pomp et al., 2019)) is a valid op-

eration if the system used follows the open world as-

sumption and lets the user add own facts to the models

which are not yet present in ontologies.

In our opinion, an essential part of the semantic re-

ﬁnement phase is a continuous support by the system,

aiding the user on how to improve the model by taking

the manual load of checking, validating, correcting

and extending the contents, e.g., by solving discrep-

ancies identiﬁed during the modeling phase. There-

fore, we propose that algorithms are used repeatedly

after each user-created modiﬁcation. Such algorithms

use the current state of the semantic model as input

to evaluate and recommend one or multiple possible

changes to the semantic model. For the output, we

propose that approaches yield suggestions on how to

incrementally improve the semantic model by provid-

ing single relations or concepts as well as subgraphs

to be added. For each iteration, the user can then also

decide to accept or reject those changes, possibly up-

dating the model and triggering a new recommenda-

tion iteration. Of course, the user can also ignore the

recommendations and apply own further changes.

Examples for reﬁnements span selection or ex-

change of ambiguous or mutually exclusive concepts,

e.g.. for specifying units of measure, as well as adding

new concepts that were either not included in the au-

tomatically generated model for various reasons, most

commonly due to being unseen in the past. Also,

reﬁnements could include adding new and more ap-

propriate concepts that were unavailable to previous

automated steps, e.g., by using external data sources.

The result should resemble the way the user interprets

the data set as close as possible.

4 PLATFORM FOR AUXILIARY

SEMANTIC MODELING

APPROACHES

With the new process model in mind, a platform is

needed which supports and focuses manual semantic

PLASMA: Platform for Auxiliary Semantic Modeling Approaches

405

reﬁnement alongside all other phases. We designed

the PLatform for Auxiliary Semantic Modeling Ap-

proaches (PLASMA) as a platform for scientiﬁc re-

search that supports the whole semantic model cre-

ation process and aim to make it publicly available

for scientiﬁc purposes

With the release of PLASMA as a new platform,

we aim to build a baseline system for semantic model

creation, while reducing the effort for the initial setup

and maintenance to a minimum. The main driving

factor is the ability to support all ﬁve phases of the se-

mantic model creation process (Figure 1) in one plat-

form. This especially includes the semantic reﬁne-

ment phase introduced in Section 3. We also chose a

modular approach to be able to exchange single com-

ponents and connect new ones using a deﬁned inter-

face, while changing as little of the surrounding sys-

tem as possible. This would also apply when connect-

ing existing systems like Serene (U

na et al., 2018) or

SeMi (Futia et al., 2020).

This approach is inspired by (U

na et al., 2018),

which connected Karma to their system to achieve

comparability, as well as the Serene Benchmark

ummele et al., 2018), which focuses on integrat-

ing different fully automated semantic labeling ap-

proaches. PLASMA also allows the utilization of

those approaches but encapsules them in a modu-

lar way making them exchangeable. Furthermore,

instead of solely focusing on the semantic labeling

phase, PLASMA provides a complete modeling envi-

ronment, including the semantic modeling and reﬁne-

ment phase. PLASMA also provides common func-

tionalities like schema analysis and a graphical mod-

eling UI to not require each developer to implement

those each time they design a new approach. Al-

though being provided by the system, those compo-

nents are also to be realized as modules, making them

exchangeable if required.

4.1 Components

In the following, we present the single components of

the architecture separately and outline dependencies

and requirements. Also, a detailed feature overview

is given for each service. Figure 2 gives a schematic

overview of the components and their interactions and

dependencies.

4.1.1 Data Model

The data central model of PLASMA is called a Com-

bined Model. It holds the syntactic model as well

https://github.com/tmdt-buw/plasma

as the current state of semantic model that is cre-

ated. We deﬁne the syntactic model as M

Syn

(nodes, edges, root) where nodes are general nodes,

representing an object or a collection, or leaf nodes

representing data labels and their values. The root

node is stored in root. The set edges contains all

the edges that connect the nodes to represent the

structure of the data source so that especially nested

structures are preserved. Given a semantic concep-

tualization, like an ontology O = (C, R), where C

denotes available concepts and R the relations be-

tween them, a semantic model is deﬁned as M

Sem

, R

) C

⊆ C, R

⊆ R. The combined model M

Syn

, M

Sem

, L) then denotes a fusion of the syntac-

tic and semantic model. It contains an additional

mapping L : C

→ nodes mapping single concepts

∈ M

Sem

from the semantic model to attributes in

the syntax model M

Syn

. As L is a injective mapping,

it must hold that @ c

, c

∈ C

where L(c

) = L(c

)

(only one concept is assigned to each syntax node).

Furthermore, the combined model deﬁnes a for-

mat for modiﬁcations to the model. A modiﬁca-

tion for a combined model M

= (M

Syn

, M

Sem

, L) is

deﬁned as ∆ = (C

, A, R

, ω), where C

⊆ C and

⊆ R denote the sets of modiﬁed (added or ad-

justed or deleted) concepts and and relations, respec-

tively. The set A denotes the anchor points, that is

A = {a | a ∈ C

∧ (a ∈ C

, where C

∈ M

Sem

∨ ∃ n ∈

nodes, L(a) = n)}. This requires the contained sub-

graph to be connected to the existing M

Sem

or to con-

tain at least one concept a ∈ C

being mapped to an

attribute. Those modiﬁcations can either be applied

directly to M

or stored for the user to chose which

ones to apply. In this case, we refer to them as Rec-

ommendations and require A 6=

0, not allowing dis-

connected subgraphs and indicate an (optional) conﬁ-

dence score with ω ∈ [0, 1], indicating on how certain

the issuer of the modiﬁcations is that this will indeed

improve the state of the model.

4.1.2 Data Source Service (DSS)

Before the user can start modeling, a data source has

to be created using the DSS. A data source is the basic

reference unit of PLASMA. We deﬁne a data source

as a tuple DS = (name, note, desc, data), where name

is the unique name of that source, note is a short de-

scription of the data source, desc is a textual descrip-

tion of the data and data is the later provided sam-

ple input data, being empty on creation. The desc is

an extended textual description of the contained data

that can be provided to supply meta information for

labeling and modeling approaches.

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

406

User Interface

Schema Analysis

Service

Data Modeling

Service

Knowledge Graph

Service

Data Source

Service

Auxiliary

Recommendation

Services

Semantic

Recommendation

Service

data exchange optional / partial data exchange

Figure 2: Architectural overview of the PLASMA’s components. All connections indicate communicating components.

4.1.3 Schema Analysis Service (SAS)

Once a data source has been created, we begin the

semantic model creation process with phase 1, the

schema analysis. Sample data points need to be in-

serted for a data source to analyze them and extract

the structure and labels. Thus, the SAS takes a data

source DS

and a set of sample data points SDP

, P

, ..., P

to analyze the data. After the analysis,

the SAS identiﬁes the data structure and provides a

syntactic model M

Syn

4.1.4 Auxiliary Recommendation Services

(ARS)

The Auxiliary Recommendation Services (ARS) are

services that provide modeling recommendations dur-

ing different stages of the semantic model creation

process. Each ARS encapsulates an algorithm to per-

form a speciﬁc task in either the labeling (phase 2),

modeling (phase 3) or reﬁnement phase (phase 4), be-

ing denoted ARS for Labeling (ARS-L), ARS for Mod-

eling (ARS-M) and ARS for Reﬁnement (ARS-R), re-

spectively, based on their type. An ARS receives the

current state of the combined model and optionally

some meta information and will return a combined

model which has been modiﬁed or supplemented with

some recommendations ∆

, .., ∆

for the user to chose

from. The services are using a standardized interface

to be quickly exchangeable and allow PLASMA to

support multiple approaches on how to solve the task

for each phase.

The ARS are modular services, thus posing as lit-

tle limitations as possible to developers on what tech-

nology stack to use when implementing their solu-

tions. As long as an ARS satisﬁes the given interface

provided by the platform, it can be connected. Using

a decoupled architecture, new approaches can be de-

veloped and tested on their own before being attached

to PLASMA. If the approach to be connected by the

ARS itself is an already implemented and running

system with a predeﬁned API, the ARS can also func-

tion as a proxy to it, e.g., by using an offered interface

and converting communication to match PLASMA’s

data format (see Section 4.1.1). A single service

can be registered to perform operations for different

phases, e.g., by providing multiple interfaces. The

three types of ARS are:

• ARS for Labeling (ARS-L): receive a data source

and the syntactic model M

Syn

and provide an

initial combined model by performing the seman-

tic labeling, returning a combined model M

with

labeled leaf nodes.

• ARS for Modeling (ARS-M): receive DS

and M

and perform the semantic labeling. Return M

in-

cluding the initial semantic model or multiple pos-

sible models represented as recommendations.

• ARS for Reﬁnement (ARS-R): receives DS

and

and computes modiﬁcations that might help

improve the state of the model (cf. Section 3).

How this is achieved and which modiﬁcations are

deemed ﬁtting is up to the contained algorithm.

Return M

and a number of optional modiﬁca-

tions ∆

, .., ∆

as recommendations. ARS-R do

not alter M

Syn

or M

Sem

directly. Also, ARS-R

are expected to be invoked multiple times and

thus may keep an internal state for the speciﬁc

data source, e.g., to improve recommendations in

follow-up executions.

PLASMA: Platform for Auxiliary Semantic Modeling Approaches

407

4.1.5 Semantic Recommendation Service (SRS)

This service coordinates the ARS that are connected

to PLASMA to perform their tasks during phases 2-

4 of the semantic model creation process. There-

fore, the SRS provides three different endpoints to

request modiﬁcations for phase 2-4 of the semantic

model creation process. All ARS of one type use the

same interface. Using standardized interfaces allow

the SRS to contact previously unknown services via

auto discovery, a feature that will be operational in the

near future. In the current state, the SRS is conﬁgured

which of the available ARS to use. Once development

of a new ARS has ﬁnished, the ARS can be integrated

into the architecture and the SRS adjusted to use the

new ARS for further operations.

4.1.6 Data Modeling Service (DMS)

This service keeps the current state of all combined

models M

in its database and updates them once

modiﬁcations are issued. Whenever an actor modi-

ﬁes the current state of the combined model, those

changes are processed and validated using the DMS.

These modiﬁcations, which are done by applying a

modiﬁcation ∆ to a combined model M

, can either be

made by the user using the UI or originate from (auto-

mated) recommendation systems, like ARS. The only

restriction is that there always has to be one deﬁned

state of combined model M

for each data source DS

in the DMS to be provided to the UI.

4.1.7 Knowledge Graph Service (KGS)

This service is the platform’s main source for seman-

tic information. It maintains the conceptualization

that is used for generating mappings as well as the

mappings itself in the form of semantic models ex-

tracted from the combined model. We refer to those

types as conceptualization layer and mapping layer.

To represent these two layers, we use an upper on-

tology as introduced by Pomp (Pomp, 2020). This

ontology allows to combine the conceptualization and

the mapping layer by representing them in a single

knowledge graph. By utilizing a knowledge graph,

we achieve direct linkage between the conceptualiza-

tion and the mapping layer, allowing algorithms to ef-

ﬁciently operate on the structure.

The conceptualization layer can be initialized by

importing already existing ontologies in RDF Turtle

syntax to form the target conceptualization for the se-

mantic models. During the import, RDF triples are

converted into Concepts and Relations as deﬁned by

the upper ontology while keeping provenance should

those later be exported again. Both the conceptual-

ization as well as semantic models can be exported as

RDF in Turtle syntax again. Semantic models can be

created, deleted and searched.

4.2 User Interface

Figure 3: Partial screenshot of the reﬁnement UI of

PLASMA during the modeling process with an unﬁnished

semantic model. Highlighted node is a recommendation.

To allow the user to inspect the processing that has

taken place, we provide a web-based user interface

(UI) that allows to operate the basic functionalities of

PLASMA. In the following, we give a brief overview

of the UI, including the available views and function-

alities. PLASMA’s UI consists of multiple views, one

for the creation of data sources, one for the submis-

sion of sample data for an existing data source and a

reﬁnement UI that enables the user to perform man-

ual actions during the semantic reﬁnement phase. The

realized modeling concept for the reﬁnement UI fol-

lows the approach presented by (Pomp et al., 2018),

but has received a major overhaul. Figure 3 shows the

reﬁnement UI displaying the syntactic model and an

unﬁnished semantic model. Here, the user can inspect

recommendations issued by the system and edit the

current state of the semantic model. The reﬁnement

UI aims to provide the user with a convenient model-

ing environment, hiding technical details for beginner

users. The syntactic model is shown as nodes with

dashed lines and can be modiﬁed by the user, i.e.,

nodes can be rearranged. For hierarchical data, the UI

also displays the structure. Based on the environment

the reﬁnement UI is used (cf. Section 5), even more

sophisticated operations, like syntactic node splitting

or assigning data types, can be realized. However, as

this requires modiﬁcations to the original data, fur-

ther data processing components are needed that are

not part of PLASMA itself.

The semantic model M

Sem

consists of nodes that

are comprised of two parts. A concept part (upper

half), realizing an assigned concept from the con-

ceptualization layer and a mapping part (lower half)

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

408

representing the mapping layer component (cf. Sec-

tion 4.1.7). Each mapping part serves as a projec-

tion of a concept (denoted in the concept part) in

a semantic model. While concept parts cannot be

edited or adjusted, the mapping part is customizable,

e.g., can be renamed in order to resolve ambiguities

within the model, while still being bound to a con-

cept which ensures universal understandability. Re-

lations are taken directly from the conceptualization

layer and can therefore not be modiﬁed. Available

elements can be found using the left side drawer of-

fering a search functionality for existing concepts and

relations. Concepts as well as relations are then added

using drag and drop mechanisms. When a concept is

attached to a leaf node of the syntactic model, a map-

ping is created between the semantic model and the

data set in L

Following each modiﬁcation, the DMS requests

an update on the recommendations from the SRS us-

ing the now modiﬁed model as input. This may result

in recommendations being removed as they no longer

apply, or new ones being generated. Recommenda-

tions are displayed when hovering one in the rec-

ommendations tab and applied by double-clicking on

them, providing a convenient way to extend or modify

the semantic model. However, as we are experiment-

ing with different visualization techniques, this might

change in the future. Figure 3 shows recommenda-

tions as highlighted nodes, in this instance a possi-

ble generalization of an airport as a transportation

hub to indicate it as an organizational unit instead of,

for example, the area. There might be more recom-

mendations available, but are hidden due to a general

threshold for certainty of ω ≥ 0.5, resulting in recom-

mendations with lower certainty to be hidden.

4.3 Semantic Model Creation Process in

PLASMA

In this section, we brieﬂy describe the main phases

of the semantic model creation process (Figure 1) and

how those are executed in PLASMA. Figure 4 gives

an overview of the main operations and their order.

For an existing data source DS, data can be added to

be analyzed. The system currently processes JSON

typed example data values as JSON can be used to de-

scribe both structured and ﬂat data sources. The data

is then transmitted to the SAS where it is analyzed

. Once the syntactic model M

Syn

has been created

(phase 1), the result is transmitted to the DMS where

an empty semantic model M

Sem

is attached to it

.

The model is then handed to the SRS for the initial

labeling and modeling steps

. To achieve this, the

SRS requests an automated semantic labeling (phase

2) from an ARS-L. The result is an initial labeling of

the leaf nodes of the syntactic model in the returned

combined model M

. This is followed by request-

ing an automated semantic modeling (phase 3) to ex-

tend M

past the labeled concepts. The ARS-M per-

forms this operation and returns the M

to the SRS

. This model is then returned to the DMS

 and

shown in the User Interface

. Once the user per-

forms a manual action, the updated M

is handed over

to the DMS

 which then requests a recommendation

for the current M

from the SRS

, using the reﬁne-

ment endpoint. The SRS uses instances of ARS-R to

acquire one or more recommendations

 and returns

those to the DMS

 where M

is adjusted and the

UI updated

, giving the user the ability to inspect

and optionally accept those recommendations. Steps



 are repeatable and form the Seman-

tic Reﬁnement Process (phase 4). Once the user has

ﬁnished editing, the ﬁnalized semantic model M

Sem

is sent to the KGS for storage (phase 5), ending the

reﬁnement process and bringing the semantic model

creation of the given data source to an end

.

5 ONGOING USE CASE

PLASMA was designed to be a baseline support tool

for upcoming projects that aim to improve the se-

mantic model creation process. Future approaches

can be developed and tested in a controlled environ-

ment, where the auxiliary services architecture sup-

ports a plug and play approach to setup and test new

approaches after PLASMA is set up. PLASMA is

currently in use in the Bergisch.Smart.Mobility

re-

search project, which features a data marketplace for

smart city open data, providing the semantic model

creation functionalities for the semantic annotation of

data sources. Within the ongoing project, PLASMA

is integrated alongside the semantic data ingestion en-

gine and provides an infrastructure that enables mu-

nicipal employees to create semantic models to de-

scribe their data sets. The created semantic models

are then used to integrate the data and provide it to

data consumers. Figure 5 shows how PLASMA is po-

sitioned in the general workﬂow of the marketplace.

During the project runtime, PLASMA’s components

are adjusted to improve the usability of semantic tech-

nologies for municipal workers. There are also multi-

ple ARS being tested to improve the overall automatic

modeling performance of the platform, for example

one focusing on frequently used geospatial concepts.

https://www.bergischsmartmobility.de/en/the-project/

PLASMA: Platform for Auxiliary Semantic Modeling Approaches

409

Auxiliary Recommendation Services

Schema Analysis

Service

Data Modeling

Service

User Interface

Semantic

Recommendation

Service

Knowledge Graph

Service

ARS-L ARS-M ARS-R

syntactic model combined (syntactic + semantic) model semantic recommendation(s)semantic model

input data

Figure 4: The semantic model creation process.

 Upload of input data to the SAS.

 Submission of the resulting syn-

tactical model to the DMS.

 Request for semantic recommendations based on the current state of the combined model.

 Requesting of semantic labeling using separated Auxiliary Recommendation Service (ARS).

 Requesting of semantic

modeling using separated ARS.

 Provision of initial model to the DMS.

 Transmission of current model to GUI.

 User

manually updates model.

 Requesting of semantic recommendation using separated ARS.

 Provision of one or more

suggestions to the DMS.

 Submission of ﬁnal semantic model to the KGS for storage or export.

5.1 Future Usage Scenarios

In the project, PLASMA is evolved to support the

users when creating semantic models. However, aside

from those advancements, estimating the quality of

different approaches is also in the focus of the plat-

form. For example, to compare two different seman-

tic modeling approaches, after providing the target

ontology and connecting any ARS-L for the labeling

phase, approach A, realized as an ARS-M, would be

connected to the SRS. The model is created and after-

wards the service exchanged to another ARS-M con-

taining approach B. After another run, two results are

available for export and evaluation. By exchanging

only one component during runs, comparable results

can be reached as the remaining environment remains

unchanged.

With the UI including the recommendation in-

spection and selection, PLASMA also aims to pro-

vide a platform to advance research towards user sup-

port during the semantic reﬁnement phase. A speciﬁc

aim of PLASMA is to enhance the visualization and

selection of recommendations to actively support the

reﬁnement process. Therefore, new approaches can

be developed utilizing PLASMA as a platform.

6 CONCLUSION AND OUTLOOK

In this paper, we introduced the interactive manual

reﬁnement of a semantic model as an explicit phase

of the semantic model creation process. It covers an

essential part of the creation of semantic models us-

ing a human in the loop approach to manually cor-

rect and improve results generated by automated ap-

proaches. We also presented PLASMA, an extend-

able platform aimed to support the development and

integration of semantic labeling/modeling/reﬁnement

approaches using the concept of auxiliary modeling

services. The platform is designed to function as a

tool to develop new approaches for the semantic re-

ﬁnement process. PLASMA is build in a modular pat-

tern to (i) allow modiﬁcations or replacements of sin-

gle components, to (ii) integrate new approaches and

evaluate them using a common framework and to (iii)

inspect results using a web based UI aimed to support

future semantic model reﬁnement approaches.

In the future, we plan to provide a set of exist-

ing modeling algorithms for PLASMA by implement-

ing current approaches in dedicated ARS. The SRS

is planned to support multiple similar approaches,

e.g., for semantic labeling, and let the user decide

which algorithm to use for the current run. This

would greatly reduce the manual conﬁguration cur-

rently needed and also allow users to evaluate and

compare different approaches without the need to

modify the technical architecture during runs. In ad-

dition, we plan to support batch operations, allowing

to analyze or model multiple data sources automat-

ically, optionally using different ARS instances for

each run. This way, we hope to eventually utilize

PLASMA as an evaluation framework for different

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

410

Marketplace UI

Data Storage

Semantic Data

Integration Engine

Schema Analysis

Service

Data Modeling

Service

Modeling Interface

Knowledge Graph

Service

syntactic model combined (syntactic + semantic) modelsemantic model

input data

data source

sample data

integrated data

user

processed data

integrated data

PLASMA component

Semantic Data

Access Engine

Figure 5: Ingestion and retrival process for a single data source with PLASMA integrated into a data marketplace. The SRS

is omitted for brevity.

 Upload of sample data to the SAS.

 Submission of the resulting syntactical model to the DMS.

 Creation / reﬁnement of the semantic model for the data source.

 Submission of ﬁnal semantic model to the KGS.



Integration of the raw input data using the created semantic model.

 Storage of semantically annotated integrated data in the

data storage.

 User(s) search data space based on semantic models.

 Raw data or integrated data is requested, processed

and transferred to the user.

automated modeling approaches. In this context, to

even save the need to export and evaluate models, a

integrated benchmarking functionality is envisioned,

but needs further speciﬁcation.

REFERENCES

Abdelmageed, N. and Schindler, S. (2020). JenTab: Match-

ing Tabular Data to Knowledge Graphs. The 19th In-

ternational Semantic Web Conference.

Chen, J., Jimenez-Ruiz, E., et al. (8/10/2019 - 8/16/2019).

Learning Semantic Annotations for Tabular Data. In

Proceedings of the Twenty-Eighth International Joint

Conference on Artiﬁcial Intelligence.

Futia, G., Vetr

o, A., and de Martin, J. C. (2020). SeMi:

A SEmantic Modeling machIne to build Knowledge

Graphs with graph neural networks. SoftwareX,

12:100516.

Galkin, M., Auer, S., and Scerri, S. (2016). Enterprise

Knowledge Graphs: A Backbone of Linked Enterprise

Data. In 2016 IEEE/WIC/ACM International Confer-

ence on Web Intelligence (WI).

Giese, M., Soylu, A., Vega-Gorgojo, G., Waaler, A., Haase,

P., Jimenez-Ruiz, E., Lanti, D., Rezk, M., Xiao, G.,

Ozcep, O., and Rosati, R. (2015). Optique: Zooming

in on Big Data.

Gupta, S., Szekely, P., et al. (2012). Karma: A System for

Mapping Structured Sources into the Semantic Web.

In Extended Semantic Web Conference.

Hulsebos, M., Hu, K., et al. (2019). Sherlock. In Proceed-

ings of the 25th ACM SIGKDD International Confer-

ence on Knowledge Discovery & Data Mining.

Jim

enez-Ruiz, E., Kharlamov, E., et al. (2015). BootOX:

Bootstrapping OWL 2 Ontologies and R2RML Map-

pings from Relational Databases. In International Se-

mantic Web Conference (Posters & Demos).

Knoblock, C. A., Szekely, P., et al. (2012). Semi-

Automatically Mapping Structured Sources Into the

Semantic Web. In Extended Semantic Web Confer-

ence, pages 375–390.

Papapanagiotou, P., Katsiouli, P., et al. (2006). RONTO:

Relational to Ontology Schema Matching. AIS

Sigsemis Bulletin, 3(3-4):32–36.

Paulheim, H. (2016). Knowledge graph reﬁnement: A sur-

vey of approaches and evaluation methods. Semantic

Web, 8(3):489–508.

Paulus, A., Pomp, A., et al. (2018). Gathering and Com-

bining Semantic Concepts from Multiple Knowledge

Bases. In ICEIS 2018, pages 69–80, Set

ubal, Portugal.

Pham, M., Alse, S., et al. (2016). Semantic Labeling: A

Domain-Independent Approach. In The Semantic Web

– ISWC 2016, pages 446–462, Cham. Springer Inter-

national Publishing.

Pinkel, C., Binnig, C., et al. (2017). IncMap: a Journey

Towards Ontology-based Data Integration. Daten-

banksysteme f

ur Business, Technologie und Web (BTW

2017).

Polﬂiet, S. and Ichise, R. (2010). Automated Mapping Gen-

eration for Converting Databases into Linked Data. In

Proceedings of the 2010 International Conference on

Posters & Demonstrations Track.

Pomp, A. (2020). Bottom-up Knowledge Graph-based

Data Management. Berichte aus dem Maschinenbau.

Shaker.

Pomp, A., Kraus, V., et al. (2020). Semantic Concept Rec-

ommendation for Continuously Evolving Knowledge

PLASMA: Platform for Auxiliary Semantic Modeling Approaches

411

Graphs. In Enterprise Information Systems, Lecture

Notes in Business Information Processing. Springer.

Pomp, A., Lipp, J., and Meisen, T. (2019). You are Miss-

ing a Concept! Enhancing Ontology-based Data Ac-

cess with Evolving Ontologies. In Proceedings, 13th

IEEE International Conference on Semantic Comput-

ing, pages 98–105.

Pomp, A., Paulus, A., et al. (2018). A Web-based UI to

Enable Semantic Modeling for Everyone. Procedia

Computer Science, 137:249–254.

Pomp, A., Paulus, A., Jeschke, S., and Meisen, T. (2017).

ESKAPE: Platform for Enabling Semantics in the

Continuously Evolving Internet of Things.

Ramnandan, S. K., Mittal, A., et al. (2015). Assigning Se-

mantic Labels to Data Sources. In The Semantic Web.

Latest Advances and New Domains, pages 403–417,

Cham. Springer International Publishing.

ummele, N., Tyshetskiy, Y., and Collins, A. (2018). Eval-

uating Approaches for Supervised Semantic Labeling.

CoRR, abs/1801.09788.

Sengupta, K., Haase, P., Schmidt, M., and Hitzler, P. (2013).

Editing r2rml mappings made easy. In Proceedings

of the 12th International Semantic Web Conference,

page 101–104.

Syed, Z., Finin, T., et al. (2010). Exploiting a Web of Se-

mantic Data for Interpreting Tables. In Proceedings of

the Second Web Science Conference.

Szekely, P., Knoblock, C. A., et al. (2013). Connecting

the Smithsonian American Art Museum to the Linked

Data Cloud. In The Semantic Web: Semantics and Big

Data, pages 593–607, Berlin, Heidelberg. Springer

Berlin Heidelberg.

Szekely, P., Knoblock, C. A., et al. (2015). Building and

Using a Knowledge Graph to Combat Human Traf-

ﬁcking. In The Semantic Web - ISWC 2015, volume

9367, pages 205–221. Springer, Cham.

Taheriyan, M., Knoblock, C. A., et al. (2013). A Graph-

Based Approach to Learn Semantic Descriptions of

Data Sources. In The Semantic Web – ISWC 2013,

pages 607–623, Berlin, Heidelberg. Springer Berlin

Heidelberg.

Taheriyan, M., Knoblock, C. A., et al. (2015). Leveraging

Linked Data to Infer Semantic Relations within Struc-

tured Sources. In Proceedings of the 6th International

Workshop on Consuming Linked Data (COLD 2015).

Taheriyan, M., Knoblock, C. A., et al. (2016). Learning the

Semantics of Structured Data Sources. Journal of Web

Semantics, 37-38:152–169.

Takeoka, K., Oyamada, M., et al. (2019). Meimei: An Efﬁ-

cient Probabilistic Approach for Semantically Anno-

tating Tables. Proceedings of the AAAI Conference on

Artiﬁcial Intelligence, 33(01):281–288.

na, D. D., R

ummele, N., et al. (2018). Machine Learn-

ing and Constraint Programming for Relational-To-

Ontology Schema Mapping. In Proceedings of the

Twenty-Seventh International Joint Conference on Ar-

tiﬁcial Intelligence.

Vu, B., Knoblock, C., and Pujara, J. (2019). Learning Se-

mantic Models of Data Sources Using Probabilistic

Graphical Models. In The World Wide Web Confer-

ence, WWW ’19, pages 1944–1953, New York, NY,

USA. ACM.

Wang, J., Wang, H., et al. (2012). Understanding Tables

on the Web. In Conceptual Modeling, pages 141–155,

Berlin, Heidelberg. Springer Berlin Heidelberg.

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

412