Why an Automated, Scalable and Resilient Service for Semantic

Interoperability is Needed

Maximilian St

abler

1 a

, Tobias M

uller

2 b

Frank K

oster

and Christoph Schlueter-Langdon

Institute for AI Safety and Security, DLR, Wilhelm-Runge-Straße 10, 89081 Ulm, Germany

SAP SE, Dietmar-Hopp-Allee 16, 69190 Walldorf, Germany

Institute for AI Safety and Security, DLR, Lilienthalplatz 7, 38108 Braunschweig, Germany

Drucker School of Business, Claremont Graduate University, 1021 N Dartmouth Ave, Claremont, CA 91711, U.S.A.

Keywords:

Knowledege Representation, Ontologie Alignment, Semantic Interoperability, Graph Analysis.

Abstract:

The increasing linkage of different data sources and data ecosystems underlines the need for high-quality and

well-structured data. Unambiguous descriptions of data (meta-data) promote a common understanding of the

data among different users. New ontologies and data schemas are constantly being developed for this purpose.

While there are new ways to align, merge or match these ontologies and data schemas, the context of the data,

which is important for a clear understanding, is often not taken into account. This work addresses this problem

by analyzing a graph consisting of 1, 615 data attributes from 13 domains and 828 different ontologies. The

results show how overlapping and partially synonymous ontologies, both from the same domain and from

different domains, are. The results show the complexity for users in creating unique descriptions of data and

why new approaches and methods are needed to achieve semantic interoperability.

1 INTRODUCTION

Nowadays, ontology users and ontology mapping

practitioners often face a challenging problem: there

is an enormous amount of different ontologies from

different communities, with different backgrounds,

intensions, qualities, and scopes that claim some rele-

vance for ontology usage (Boukhers et al., 2023; Jab-

bar et al., 2017). The question how to deal with this

multitude of possibilities, how to best describe data

with it, which ontologies to use, which process to fol-

low when modeling or how to decide for or against

an ontology is so far only little explored. There are

many efforts that either match ontologies and individ-

ual datasets (Ardjani et al., 2015), or attempt to iden-

tify the identical entities in large knowledge graphs

(Nejhadi et al., 2011). Approaches for a holistic,

cross-domain view of ontologies and datasets (Doan

et al., 2004), that are not limited to individual ontol-

ogy parts or datasets, are scarce (Liu et al., 2021).

In this paper we aim to adress this issue and fo-

cus on the opportunities of modeling data with on-

https://orcid.org/0000-0003-1311-3568

https://orcid.org/0000-0002-9088-5054

tologies of a practitioner. The various possibilities

and manifestations that can occur in the application

when modelling an existing data schema were exam-

ined. An example of different ontological possibil-

ities are shown in Figure 1. Here, the practitioner

wants to assign a suitable explicit ontology entity to

his database column ”id”. Since the property ”id”

Figure 1: Possibilities to model ”id” with different ontolo-

gies.

does not exist only in one ontology, but in many

different ones from different domains, for different

application and in different version, the user asks

himself which option is the best for his speciﬁc use

case. Although the different ontologies all describe

the property ”id”, one can see that the descriptions of

the individual ontoligies for the property ”id” differ

slightly. The respective property targets once an ”en-

Stäbler, M., Müller, T., Köster, F. and Schlueter-Langdon, C.

Why an Automated, Scalable and Resilient Service for Semantic Interoperability is Needed.

DOI: 10.5220/0012345000003636

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2024) - Volume 3, pages 299-307

ISBN: 978-989-758-680-4; ISSN: 2184-433X

299

tity”, once a ”thing” and once a ”user”. This can be

irrelevant for some applications, but for other appli-

cations a distinction and exact differentiation between

these properties can be very important. Searching for

”id” on vocabulary search engines such as Linked-

Open-Data

, returns 2948 search results (properties

& classes) from 68 different ontologies. This hetero-

geneous and often opaque ontology landscape is ex-

plored in this work, based on 21081 attributes from

applications (e.g., database column names) and 763

individual ontologies from different domains. Thus,

in addition to describing the problems faced by ontol-

ogy users in modeling existing datasets, our contribu-

tions include descriptions of the connections and de-

pendencies in the domain-independent ontology land-

scape using real-world data attributes. The different

ontologies are divided into commonly used ”general”

ontologies and domainapplication speciﬁc ontologies.

Likewise, the connections between the 13 individual

domains are shown. The generated graph also allows

for analysis of which ontologies describe the same at-

tribute and which ontology ﬁts best or least to which

domain.

The remainder of the paper is structured as fol-

lows: The following Chapter 2 presents related work

before describing the technical background in detail

in Section 3. Our approach is described in Chapter 4

before the results obtained are presented in Section 5,

and discussed with a conclusion in Section 6.

2 RELATED WORK

In a world where information is distributed over the

Internet, describing it clearly and understandably is

a serious challenge (Uschold and Gruninger, 2004).

There is a need to ensure that different people, from

different companies, communities or countries, un-

derstand and use the information in the same way

(Davies et al., 2020). Ontologies have provided a

solution to this problem by deﬁning a speciﬁc vo-

cabulary used by a speciﬁc application (Euzenat and

Shvaiko, 2013). However, the sparse reuse of existing

ontologies in the development of domain or applica-

tion speciﬁc ontologies results in multiple similar on-

tologies in any given domain (Euzenat et al., 2004).

According to Predoiu (2006) it is unlikely to ﬁnd two

ontologies that describe the same ”thing” (concept)

with perfect overlap which makes communication and

interoperability either difﬁcult or impossible.

This has direct implications for the semantic land-

scape: in previous work (St

abler et al., ) we have

https://lov.linkeddata.es/dataset/lov/terms

shown that the current semantic landscape is inter-

connected across different domains. Different do-

mains share few common attributes and use many do-

main/application speciﬁc attributes. For practitioners,

this means that a single ontology is not sufﬁcient to

describe all attributes in a adequate level of detail in

most use cases. Different ontologies have to be used

together and in combination. This, combined with a

large number of ontologies describing individual at-

tributes in different ways, leads to signiﬁcant man-

ual effort in merging and sharing different ontologies

(Boukhers et al., 2023).

Euzenat et al. (2004) also expect that ontologies will

not remain static and that different versions of ontolo-

gies will need to be tracked. It is expected that both

new interdisciplinary ontologies will need to be cre-

ated from existing domain-speciﬁc ontologies (Bento

et al., 2020) and various existing ontologies will need

to be merged (Liu et al., 2021). Examples include

merging domain-speciﬁc ontologies with more gen-

eral ontologies, consolidating different ontology ver-

sions, or enriching existing ontologies with new infor-

mation (Shenoy et al., 2013). In addition, new ontolo-

gies may be created by merging information from het-

erogeneous databases or other information sources.

Reducing manual effort in establishing seman-

tic interoperability is the goal of ontology matching

(Doan et al., 2004) and ontology alignment (Nejhadi

et al., 2011). Here, machine learning rules or methods

are used to automatically transfer concepts and terms

from one ontology or vocabulary to another. Tech-

niques such as semantic similarity measures (Sousa

et al., 2022), graph-based methods (Shenoy et al.,

2013) and deep learning models (Khoudja et al.,

2018; Iyer et al., 2020; Bento et al., 2020) are used to

identify correspondences between concepts in differ-

ent ontologies or vocabularies (Boukhers et al., 2023).

The goal is to create a mapping that allows the ex-

change of data between systems that use different on-

tologies or vocabularies, while preserving the mean-

ing of the data. In the literature, the problem of a

heterogeneous semantic landscape has been identiﬁed

several times and a variety of approaches to ontology

matching have been presented. Otero et al. (2015) de-

scribe that the different approaches are based on dif-

ferent techniques, but as a result they also work and

are applicable differently in different use cases. In

order to meet the increasing demand for high quality

data in the future, it is necessary to create automat-

able, scalable and resilient services that do not only

organise and connect the semantic landscape accord-

ing to a speciﬁc technique (context-based OR content-

based), but also connect different techniques (context-

based AND content-based AND element-level AND

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

300

structure-level). We contribute to this by describing

and analyzing the semantic/ontology landscape.

3 TECHNICAL BACKGROUND

In this Chapter, we describe both the data sources for

the data attributes and ontologies used, and the tech-

nical basis the creation and analysis of the graph.

3.1 Ontologies - Linked Open

Vocabulary

Figure 2: Overview of the Linked Open Vocabularies Ar-

chitecture. Adapted from (Vandenbussche et al., 2016)

Linked-Open-Vocabluary (LOV)

is an open and col-

laborative platform that provides a centralised repos-

itory for semantic vocabularies and ontologies. We

used LOV not only because it is a widely used tool

and has among the most comprehensive collection

of ontologies, but also because the interfaces ensure

ease and reliability of use. These vocabularies play a

critical role in deﬁning and describing concepts, re-

lationships and attributes in various domains, allow-

ing knowledge to be represented in a structured and

machine-understandable format. The primary goal

of LOV is to foster the creation, dissemination, and

adoption of semantic vocabularies across domains,

thereby promoting semantic interoperability and har-

monising data integration efforts. LOV exhibits a

well-structured architecture to facilitate the organiza-

tion and management of semantic vocabularies. The

key components of LOV include:

• Vocabularies: LOV hosts a vast collection of se-

mantic vocabularies, each designed to cover a spe-

ciﬁc domain or knowledge area. These vocabu-

laries are meticulously curated and enriched with

metadata, making it easier for users to discover

and evaluate them.

• Ontologies: Beyond simple vocabularies, LOV

https://lov.linkeddata.es/dataset/lov

also includes ontologies—more complex and for-

malized representations of knowledge. Ontolo-

gies capture the relationships, axioms, and con-

straints within a domain, enabling the develop-

ment of sophisticated knowledge graphs.

A detailed overview of the architecture of LOV is

given in Figure 2. Vandenbussche et al. (2016)

describes the architecture as follows: The goal of

this architecture is to promote and facilitate the reuse

of well-documented vocabularies in the Linked Data

ecosystem. To achieve this goal, the LOV performs

the following three main activities: 1) collecting new

vocabularies from the LOV community; 2) tracking

and analyzing the LOV vocabulary catalog; and 3) en-

abling access to the data using various indexes and

publishing methods to facilitate data use, including a

search engine, data dumps, SPARQL endpoints, and

APIs. We used the API for our approach. All at-

tributes were sent to the endpoint:

https://lov.linkeddata.es/dataset/lov/api/ \

v2/term/search?q=<SEARCH STRING>& \

page_size=3000

The ”q” parameter contains the search string (at-

tribute) and the ”page size” parameter returns the

maximum possible number of search results. The

maximum number of search results when using the

endpoint was 2554 therefore no loss of possible re-

sults due to ”page size”. The LOV search engine

employs a ranking algorithm that assesses term pop-

ularity not only within datasets but also within the

LOV ecosystem. The algorithm assigns scores based

on which label property a searched term matches.

These scores are used in Chapter 5 as a decision vari-

able whether to assign an ontology to an attribute or

not. For calculating the score of each property class

match, four label property categories are considered

(Vandenbussche et al., 2016):

• Local Name: When a searched term matches the

local name of a URI (Uniform Resource Identi-

ﬁer), it receives the highest score. The local name

is a compressed form of a term label used in con-

structing the URI. For instance, ”person” match-

ing the local name http://schema.org/Person re-

ceives a high score.

• Primary Labels: Matches on properties

like rdfs:label, dce:title, dcterms:title, and

skos:prefLabel also receive the highest score.

For example, matching ”person” with rdfs:label

”Person”@en gets a high score.

• Secondary Labels: Properties such

as rdfs:comment, dce:description, dc-

terms:description, and skos:altLabel are

classiﬁed as secondary labels. A medium

Why an Automated, Scalable and Resilient Service for Semantic Interoperability is Needed

301

score is assigned for matches on these prop-

erties. For example, matching ”person” with

dcterms:description ”Examples of a Creator

include a person, an organization, or a ser-

vice.”@en receives a medium score.

• Tertiary Labels: All properties not falling

into the previous categories are considered

tertiary labels and receive a low score. For

instance, matching ”person” with the URI

http://metadataregistry.org/uri/proﬁle/RegAp/name

”Person”@en gets a low score.

The algorithm prioritizes properties like rdfs:label

over dcterms:comment based on their nature. Dif-

ferent indexing tokenizers and scoring methods are

applied to these labels accordingly. Consequently,

a term matching rdfs:label will have a higher score

compared to a match with dcterms:comment, reﬂect-

ing the different signiﬁcance and roles of these labels

in the LOV ecosystem.

3.2 Attributes - SmartDataModels

In this Section, we introduce SmartDataModels

(SDM), an approach that aims to provide a compre-

hensive solution for managing and sharing data mod-

els. The SDM were used because we are not aware of

any approach that offers a comparable qualitatively

equivalent coverage of data attributes from different

domains. We outline the structure and capabilities of

SmartDataModels, emphasising their applicability to

a wide range of applications.

The SDM initiative addresses the critical need for

standardised data models in the context of data in-

teroperability. SDM are designed to serve as a com-

mon language for data, promoting interoperability

and seamless integration between disparate systems

and devices. The key objective of SmartDataMod-

els is to simplify the process of deﬁning, sharing and

using data models in applications. These models cap-

ture the essential attributes and relationships of enti-

ties and events in a given domain, enabling develop-

ers, researchers and organisations to efﬁciently model

their data without reinventing the wheel. SDM have a

well-deﬁned structure that facilitates their ease of use

and extensibility. The main components include:

• Entity Types: These represent real-world objects,

phenomena or concepts within a particular do-

main. For example, in the context of smart cities,

entity types might include ’streetlight’, ’car park’

or ’weather station’.

• Attributes: Each entity type is associated with

a set of attributes that describe its characteristics

https://smartdatamodels.org/

and properties. Attributes can be of different types

such as text, numeric, boolean or date/time. Ex-

amples of attributes are: features, id, address-

Country, cio, locatedAt, maxQ.

Figure 3 shows an overview of the SDM used. It can

be seen that only 1.615 of the 21.081 available at-

tributes are included in the graph generation. Various

ﬁlter criteria have been set for this purpose:

• The attribute must be unique in the graph. At-

tributes such as ’id’ or ’type’ are used in different

data models of the SDM, so these are already not

unique by the structure of the SDM. This has the

advantage that central attributes are uniquely de-

ﬁned.

• The mapping score must be smaller than 0.2. For

further information on the score and the determi-

nation of the threshold, also see Chapter 4.2.

Figure 3: SmartDataModels overview with domains and

amount of used attributes.

3.3 Graphs

Graphs and networks are deﬁned by a set of vertices

V and a set E of relations between the vertices. The

simplest relation is an edge deﬁned as a pair of ver-

tices (a, b) with a ∈ V and b ∈ V . A weighted graph

G = (V, E) is attributed by a function w that assigns a

weight w(e), typically w(e) > 0, to each edge e ∈ E.

In the graphical representation of networks, the values

of the weights w of the graph G are distinguished by

line weight or value, line sign or line type (De Nooy,

2009). Examples of value differentials in a social con-

text include intensity, frequency, valence, or type of

social relationship. The set of possible relationships

per node is potentially inﬁnite (Martino and Spoto,

2006). The degree of a node describes the number of

edges connected to the node.

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

302

4 APPROACH

To be able to describe the semantic landscape repre-

sentatively, we searched for matching properties and

classes from 828 high-quality ontologies available

from LOV for each of the 21.081 SDM data attributes.

The attributes and ontologies were transformed into a

graph structure in order to analyse and describe the

dependencies, overlaps and connections that are in-

visible to the user. In the following Chapter, the data

sources for the ontologies and attributes as well as the

methods used are presented.

4.1 Data Basis

The tables 1, 2, 3 contain the data basis for the cre-

ation of the network graphs, as well as the calcula-

tion of all result values values. The vocabularies and

ontologies were requested through the LOV API end-

point described in Chapter 3.1. For the atttributes of

the SDM, along with extended description and infor-

mation of each attribute, the Python library PySmart-

DataModels

was used. This raw data was processed

using the Python programming language and trans-

formed into the following tables. For presentation

reasons, the columns of the tables are listed.

Table 1: SDM Attributes ”property” column

holds all 21.081 attributes. ”dataModel” and ”re-

poName” are used to create a mapping between the

attribute and the domains shown in Table 2.

• property: id

• dataModel: Activity

• repoName: dataModel.User

Table 2: SDM DataModels The ”repoName” col-

umn contains the matching key to Table 1. The ”do-

main” column describes the domains in which the re-

spective attribute is used.

• repoName: dataModel.User

• repoLink: https://github.com/smart-data-

models/dataModel.User.git

• dataModels: [Activity, UserContext]

• domains: [CrossSector]

Table 3: LOV-Attribute Mapping This table con-

solidates the attribute with the results from the API

endpoints request from the LOV server. Column

https://pypi.org/project/pysmartdatamodels/

”property” contains the respective attribute, and col-

umn ”num results” shows the total number of re-

sults from the request. The columns ”dataModels”,

”preﬁx name” and ”preﬁx preﬁx” describe the re-

spective ontology-property. For all results of an at-

tribute (”property”-column), ”num results” remains

the same.

• property: id

• num results: 2948

• preﬁx name: mv:id

• score: 0.555

• vocab preﬁx: mv

4.2 Graph Construction and Analysis

The data presented is used to create various graphs

and to derive descriptive statistics to describe the se-

mantic landscape. A detailed list of the libraries

used, together with the Python code, can be found on

GitHub

. The visualisations of the graphs were cre-

ated using the program Gephi

In the present work, several columns of the pre-

sented tables were deﬁned as nodes. These are: SDM

attributes (Table 3: property), the ontology property /

class (Table 3: preﬁx name), ontology (Table 3: vo-

cab preﬁx) and the domains (Table 2: domains).

We also use weighted edges to better describe the re-

lationship between attributes, ontology properties /

classes and domains. As weight either the common

occurrence (number) of the two nodes (

∑

(a, b)), or

the score (Table 3: score) was used. The threshold for

creating an edge was set to 0.2. Therefore, only one

edge was drawn between two nodes if the score was

greater than 0.2. This cutoff value was determined by

independent manual examinations by three scientists.

The scientists each scored the query’s LOV (property,

description) results as ”matching” or ”mismatching”

for 200 random SDM attributes. For each SDM at-

tribute, a maximum of 50 results were scored. If the

number of LOV results was greater than 50, 50 re-

sults were sampled from the total results. Below an

overview of the classiﬁcation scheme of a researcher

for LOV results for the SDM attribute ”address”. For

presentation reasons, the table is presented as a num-

bered list.

http://bit.ly/3FanBYI

https://gephi.org/

Why an Automated, Scalable and Resilient Service for Semantic Interoperability is Needed

303

1. • property: swpo:hasAddress

• description: This property relates an agent to

its address.

• suitable: True

2. • property: gleif-base:hasCity

• description: name and address. It makes use

of the OMG Languages @en

• suitable: False

In this example, the LOV result ”swpo:hasAddress”

was assessed as suitable (1) and the LOV result ”gleif-

base:hasCity” was assessed as unsuitable (2). This

process was performed for each of the 200 ran-

domly selected SDM attributes, by each scientist. An

overview of the distribution of score values for the

three scientists and their ratings is provided in Fig-

ure 4. It can be seen that for each of the scientists,

the average score of the matching attributes is higher

than that of the mismatching attributes. The score

threshold is therefore deﬁned as the mean value of the

LOV results classiﬁed as unsuitable. This is shown in

Figure 4 with a horizontal dashed line. Experiments

showed that increasing the threshold to 0.3 resulted

in an average −50.38% decrease in the number of re-

sults per attribute, and decreasing the threshold to 0.1

resulted in an average 195.43% increase in the num-

ber of results per attribute. Since we want to com-

pare as diverse attributes and ontologies as possible

in our study, but also do not want to create inappro-

priate associations between attributes and ontologies,

the value 0.2 is considered a suitable average.

To better visualise the importance of individual

nodes in a graph and the weight of edges, the size

of nodes is determined by the degree and the thick-

ness of edges is determined by the weight of the edge.

For visualisation, the ForceAtlas2 algorithm (Jacomy

et al., 2014) was used in Gephi to determine the layout

for each graph with the following parameters:

Table 1: Gephi for Graph layout. The rest of the parameters

was not changed from the default value.

Parameter Value

Scaling 2

Gravity 1

LinLog mode True

Prevent Overlap True

Edge Weight Inﬂuence 3

5 RESULTS

The semantic and ontological landscape is evaluated

using two different methods. Firstly, classical meth-

Figure 4: Overview of the distribution of the value score

based on the assessments of the three scientists whether a

LOV result ﬁts the requested SDM attribute or not. For each

scientist there are two distributions describing the scores of

the ”Suitable” and ”Unsuitable” results. The dashed line

shows the mean score (0.2) of ”Unsuitable” for all three

scientists.

ods of descriptive statistics are used to organise and

summarise the available results. On the other hand,

methods of graph analysis and visualizations are used

to gain a better understanding of the relationships

within the landscape and to detect patterns.

Figure 5 shows an overview of the number of re-

sults returned by the LOV API for each attribute (I)

and the number of times that an ontology contains a

property or class that matches the requested attribute

with a score greater than 0.2 (II). In I it can be seen

that there is a power-law-like distribution in the num-

ber of results for the SDM attributes. Thus, there

are many attributes that ﬁnd only a few matching on-

tology properties or classes, and few attributes that

match very many properties and classes from differ-

ent ontologies. The attribute ”a” gets the most results

with 44.873. 417 attributes have only one result. Sub-

ﬁgure II shows the matching of the 828 ontologies to

the queried attributes. There are ontologies that are

not used at all, as well as ontologies that match many

different attributes from different domains. On aver-

age, an ontology is used 62 times - the median is 22.

Figure 6 shows that there are ontologies that are

strongly correlated with each other and therefore con-

tain properties or classes that belong to the same SDM

attribute. Similarly, there are ontologies that are com-

pletely disjoint and have no overlap at all. We as-

sume that each of the ontologies was developed for a

speciﬁc purpose or domain. Therefore, with respect

to correlation, it is expected that ontologies from the

same domain / for the same application are more cor-

related than ontologies from different domains / for

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

304

Figure 5: I: Distribution of the results per attribute (SDM)

retrieved from the LOV-API. II: Distribution of the ontology

usage. The higher the density, the more often the ontology

contains a matching property or class for the requested at-

tribute. Both ﬁgures have a logarithmic X-axis.

Figure 6: Correlation heat map of ontologies. The color

scale shows the strength of the correlation with logarythmic

scale. For representation reasons, not every one of the 828

ontologies can be mapped on the X and Y axes.

different application. The domain associated with the

SDM attribute is included in associated domain is in-

cluded in Figure 7. It can be seen that the attributes

from the 13 domains of the SDM are used with differ-

ent frequency. Domains such as SmartEnergy (644),

CrossSector (573) and SmartSensoring (553) have a

higher degree than SmartRobotics (256), SmartAero-

nautics (236) or SmartManufacturing (232). There

are 131 attributes associated with all domains and

therefore an appropriate property/class has been as-

signed in each domain. Regarding the position of

each node, it can be seen that the node of the Cross-

Sector domain is located near the center of the graph.

The SDM attributes that can be used in different do-

mains are grouped together in this domain. Ontolo-

gies with a low degree (0− 4) tend to be located at the

edge of the graph, while ontologies with a high degree

(8 − 13) are oriented towards the center of the graph.

Similarly, ontologies that have edges to multiple do-

mains settle between these domains depending on the

weights of the edges. The arrangement of the domains

with respect to each other cannot be evaluated here,

since there are no direct connections between the do-

mains, but only via the individual ontologies.

Figure 7: Visualization of the graph structure showing the

usage of attributes from the 13 domains of the SDM. The

red nodes show the domains and the blue nodes show the

ontologies. The size of the nodes is relative to their degree.

Edges between nodes with a high weight have a thicker edge

than edges with a low weight. The ﬁve nodes with the high-

est degree have a label.

Figure 8: Graph structure showing the dependencies of the

attributes (red) on the ontologies (blue). The size of the

nodes is relative to the degree. Edges between nodes with

a high weight have a thicker edge than edges with a low

weight. The four nodes with the highest degree are labelled

in green (three ontologies and one attribute).

Figure 8 shows the relationships between the SDM

Why an Automated, Scalable and Resilient Service for Semantic Interoperability is Needed

305

attributes and the LOV ontologies. It can be observed

that some ontologies have links to many attributes.

Similarly, we can observe that there are some at-

tributes that match properties and classes from many

different ontologies, although the clear majority of at-

tributes have a low degree. The ontology with the

highest degree is ”dicom” with 337. The edge with

the highest weight (825) is between the nodes ”p6”

and ”rdau”. Although the ontology ”dicom” has

the highest degree, the ontology ”dbpedia-owl” is in

the centre of the graph. This is because ”dbpedia-

owl” is a very general ontology and matches many at-

tributes from different domains. Interestingly, in sev-

eral cases, ontologies with a relatively high degree are

also positioned at the edge of the graph. One would

expect higher degree ontologies to be more centred

in the graph. The reason for this is that these ontolo-

gies have many connections to attributes that have few

connections to the high degree ontologies in the centre

of the graph (cf. ”dbpedia-owl”) and many connec-

tions to smaller ontologies at the edge of the graph.

Overall, it can be summarized that not only are

there different amounts of suitable classes and proper-

ties for different attributes in different ontologies from

different contexts, but also the ontologies themselves

often contain overlapping attributes and classes with

other ontologies. Practitioners therefore have prob-

lems in the current semantic landscape to pick the

most suitable properties and classes from the most

suitable ontology out of the multitude of possibilities.

Similarly, there is currently no broad consensus on

ontology standards in different domains.

6 DISCUSSION AND

CONCLUSION

In this work, a large dataset of high-value data

attributes (SDMs) from industry and 828 high-

value ontologies were used to describe the seman-

tic/ontological landscape. By analyzing the connec-

tions and dependencies between ontologies from dif-

ferent domains, it was shown that it is hardly possible

for a single practitioner to choose the most appropri-

ate option from all available properties and classes.

Ontologies not only overlap to a large extent within a

domain, but in many cases they also have many com-

ponents that are described and deﬁned in the same

or very similar way by other ontologies in other do-

mains. As long as there is no worldwide consensus on

which ontologies are used when, it can be assumed

that users will model identical or very similar data

differently due to the numerous possibilities. Exist-

ing approaches for matching ontologies try to support

users in this respect, but usually focus only on single

properties of the ontologies. To address this problem,

a holistic view of the data structure to be modeled and

the inclusion of the context in which the data was col-

lected or is used in the modeling is required. The in-

terrelationships shown, the different orientations and

the heterogeneous degree of domain/application spec-

iﬁcation of the ontologies clearly show the need for

new approaches, methods and services to achieve se-

mantic interoperability.

Although the database is very large, it cannot be

excluded that important ontologies for individual do-

mains have been excluded. It should also be noted

that both the SDM domains and the ontologies vary

in size. Therefore, large ontologies with many prop-

erties and classes are more likely to match a partic-

ular attribute than small, highly domain-speciﬁc on-

tologies. The presented approach to determine the

threshold of the score (see Section 4.2 has been de-

termined by three researchers, but of course may be

inappropriate for speciﬁc applications. In addition,

no analyses were performed on more complex graph

properties such as communities. Similarly, the graph

structure provides an ideal foundation for further sup-

port of other ontologies and attributes. We encourage

researchers to pursue these questions and further de-

velop the existing code base.

REFERENCES

Ardjani, F., Bouchiha, D., and Malki, M. (2015). Ontology-

Alignment Techniques: Survey and Analysis. Inter-

national Journal of Modern Education and Computer

Science, 7(11):67–78.

Bento, A., Zouaq, A., and Gagnon, M. (2020). Ontology

Matching Using Convolutional Neural Networks.

Boukhers, Z., Lange, C., and Beyan, O. (2023). Enhanc-

ing Data Space Semantic Interoperability through Ma-

chine Learning: A Visionary Perspective. In Compan-

ion Proceedings of the ACM Web Conference 2023,

pages 1462–1467, Austin TX USA. ACM.

Davies, J., Welch, J., Milward, D., and Harris, S. (2020). A

formal, scalable approach to semantic interoperability.

Science of Computer Programming, 192:102426.

De Nooy, W. (2009). Social Network Analysis, Graph The-

oretical Approaches to. In Meyers, R. A., editor, En-

cyclopedia of Complexity and Systems Science, pages

8231–8245. Springer New York, New York, NY.

Doan, A., Madhavan, J., Domingos, P., and Halevy, A.

(2004). Ontology Matching: A Machine Learning Ap-

proach. In Staab, S. and Studer, R., editors, Handbook

on Ontologies, pages 385–403. Springer Berlin Hei-

delberg, Berlin, Heidelberg.

Euzenat, J., Bach, T. L., Barrasa, J., Bo, J. D., Brussels,

VU., Dieng, R., Ehrig, M., Hauswirth, M., Lara, R.,

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

306

Innsbruck, U., Maynard, D., Napoli, A., Stamou, G.,

Shvaiko, P., Trento, U., Bolzano, FU., Acker, S. V.,

Brussels, VU., and Zaihrayeu, I. (2004). D2.2.3: State

of the art on ontology alignment.

Euzenat, J. and Shvaiko, P. (2013). Ontology Matching.

Springer Berlin Heidelberg, Berlin, Heidelberg.

Iyer, V., Agarwal, A., and Kumar, H. (2020). VeeAlign:

A Supervised Deep Learning Approach to Ontology

Alignment.

Jabbar, S., Ullah, F., Khalid, S., Khan, M., and Han, K.

(2017). Semantic Interoperability in Heterogeneous

IoT Infrastructure for Healthcare. Wireless Communi-

cations and Mobile Computing, 2017:1–10.

Jacomy, M., Venturini, T., Heymann, S., and Bastian, M.

(2014). ForceAtlas2, a Continuous Graph Layout Al-

gorithm for Handy Network Visualization Designed

for the Gephi Software. PLoS ONE, 9(6):e98679.

Khoudja, M. A., Fareh, M., and Bouarfa, H. (2018). On-

tology Matching using Neural Networks: Survey and

Analysis. In 2018 International Conference on Ap-

plied Smart Systems (ICASS), pages 1–6, Medea, Al-

geria. IEEE.

Liu, X., Tong, Q., Liu, X., and Qin, Z. (2021). Ontology

Matching: State of the Art, Future Challenges, and

Thinking Based on Utilized Information. IEEE Ac-

cess, 9:91235–91243.

Martino, F. and Spoto, A. (2006). Social Network Analysis:

A brief theoretical review and further perspectives in

the study of Information Technology. PsychNology

Journal, 4:53–86.

Nejhadi, A. H., Shadgar, B., and Osareh, A. (2011). Ontol-

ogy Alignment Using Machine Learning Techniques.

International Journal of Computer Science and Infor-

mation Technology, 3(2):139–150.

Otero-Cerdeira, L., Rodr

ıguez-Mart

ınez, F. J., and G

omez-

Rodr

ıguez, A. (2015). Ontology matching: A lit-

erature review. Expert Systems with Applications,

42(2):949–971.

Predoiu, L. (2006). D4.2.2 state-of-the-art survey on ontol-

ogy merging and aligning V2.

Shenoy, K. M., Shet, K. C., and Acharya, U. D. (2013). Se-

cured Ontology Matching Using Graph Matching. In

Meghanathan, N., Nagamalai, D., and Chaki, N., edi-

tors, Advances in Computing and Information Tech-

nology, volume 177, pages 11–18. Springer Berlin

Heidelberg, Berlin, Heidelberg.

Sousa, R. T., Silva, S., and Pesquita, C. (2022). The Super-

vised Semantic Similarity Toolkit. In Groth, P., Rula,

A., Schneider, J., Tiddi, I., Simperl, E., Alexopoulos,

P., Hoekstra, R., Alam, M., Dimou, A., and Tamper,

M., editors, The Semantic Web: ESWC 2022 Satellite

Events, volume 13384, pages 42–46. Springer Inter-

national Publishing, Cham.

abler, M., K

oster, F., and Schlueter-Langdon, C. Towards

solving ontological dissonance using network graphs.

AMCIS 2023 Proceedings, 1.

Uschold, M. and Gruninger, M. (2004). Ontologies and

semantics for seamless connectivity. ACM SIGMOD

Record, 33(4):58–64.

Vandenbussche, P.-Y., Atemezing, G. A., Poveda-Villal

on,

M., and Vatant, B. (2016). Linked Open Vocabularies

(LOV): A gateway to reusable semantic vocabularies

on the Web. Semantic Web, 8(3):437–452.

Why an Automated, Scalable and Resilient Service for Semantic Interoperability is Needed

307