Towards Semantic Summaries over Ontologies

Sebastian Wandelt and Ralf Möller

WBI, Humboldt-Universität zu Berlin, Berlin, Germany

STS, Hamburg University of Technology, Hamburg, Germany

Keywords:

Ontologies, Description Logics, Semantic Summaries.

Abstract:

Industry is increasingly dependent on the gathering and processing of data to support crucial product develop-

ment activities. However, support systems for engineers or computer scientists may need to consider terabytes

of data, making it very hard to automatically extract useful information. Querying data repositories in order to

extract just the right information for the situation at hand remains a challenging problem.

We propose a notion of semantic summaries on top of description logic knowledge bases that supports query-

ing and summarizing information in large (ontological) data repositories. The idea of a semantic summary is

to characterize the result set from a broader perspective, instead of describing each domain object. We show

that our summarization approach scales for benchmark ontologies up to several million assertional axioms.

1 INTRODUCTION

Industry is increasingly dependent on the gathering

and processing of data to support decision making

and other activities critical to their business. How-

ever, support systems for engineers, including soft-

ware engineers, need to gather information from data

stores that grow up to petabyte size, making efﬁciency

in information retrieval increasingly difﬁcult. Query-

ing data repositories in order to extract just the right

information for the situation at hand is a challenging

task.

When dealing with huge data sets, it can be help-

ful to compute any kind of synopses and summaries

over the data for two purposes. First, from a query

answering system point of view, it might be more ef-

ﬁcient to answer (transformed) queries over a summa-

rization, because of reduced complexity of the input.

Second, from a user’s point of view, it can be easier

to explain/comprehend particular relations (e.g. sub-

sumptions, individual relations, etc.) in the ontology.

The underlying idea for creating synopses and sum-

marizations is closely related to notions of similarity.

First, we discuss similarity in the case of synopses.

Technically, synopses can be created in several ways:

• Spatial synopses: Given a particular snapshot

(representation of a point of time), a similarity re-

lation/function is computed, which assigns a sim-

ilarity measure for any two entities (concepts, in-

dividuals, etc.) in an ontology. For example, in a

clinical setting, two patients can be treated as similar if they share a certain number of symptoms.

In a synopsis, these patients might be merged to-

gether and only unmerged/unfolded on further re-

quest. The scenario is depicted in Figure 1.

Figure 1: Semantic summary.

Joe and Sarah share symptoms Fever and

Headache in our example. For some queries it might be sufficient to merge Joe and Sarah into one

individual, which then has e.g. only Fever and

Headache, or, Fever, Headache, Shoulder injury

and Dry cough. The outcome after reasoning

over summarizations clearly depends on the cho-

sen strategy. Especially in a clinical setting, for

some queries, it is important to retain soundness

and completeness in a synopsis, because we do

not want to draw wrong conclusions about any of

our patients. On the other hand, there might be

queries, which do not need to distinguish details

about Joe and Sarah, e.g. assume we want to ﬁnd

out all patients with Fever only.


Wandelt S. and Möller R..

Towards Semantic Summaries over Ontologies.

DOI: 10.5220/0004134601960201

In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2012), pages 196-201

ISBN: 978-989-8565-30-3

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

• Temporal synopses: The idea is to use the similarity of an individual over time. For example, consider the scenario shown in Figure 2, which shows one possible progression of a disease for Joe. If we want to query for people with shoulder injuries only, we do not need to distinguish the instances of Joe over time. On the other hand, if we have a query to find patients with increasing flu symptoms, it is necessary to consider all changes of information on Joe.

Figure 2: Semantic summary.

• Diagonal synopses: These combine spatial and temporal synopses. This area is largely unexplored, but seems to have high potential for analyzing large amounts of fluctuating data.
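The two merge strategies mentioned for spatial synopses (keeping only the shared symptoms, or keeping all symptoms) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the assignment of Shoulder injury to Joe and Dry cough to Sarah is an assumption made for the example, and the function names are hypothetical.

```python
# Hypothetical told data; which patient has which non-shared symptom is assumed.
patients = {
    "Joe": {"Fever", "Headache", "ShoulderInjury"},
    "Sarah": {"Fever", "Headache", "DryCough"},
}

def merge_shared(group):
    """Merged individual keeps only the symptoms common to every patient."""
    return set.intersection(*(patients[p] for p in group))

def merge_all(group):
    """Merged individual keeps every symptom of any patient in the group."""
    return set.union(*(patients[p] for p in group))

shared = merge_shared(["Joe", "Sarah"])
combined = merge_all(["Joe", "Sarah"])
```

Under the first strategy the merged individual never claims a symptom that one of the patients lacks, but it loses information; under the second, all told symptoms are retained, but the merged individual may have symptoms that one of the patients does not have. Which of the two is acceptable depends on the query, as discussed above.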

The challenges of synopses are twofold. First, the choice of strategy (which entities to treat as similar) is crucial, since it determines the reasoning and explanation capabilities of a synopsis. Second, it is important to identify which synopses should be kept up-to-date for an ontology. Summaries can be seen as an extension of

synopses. The idea is to get away from told instance

information in ontologies to determine a kind of ab-

straction of the stored assertional data. Going back

to our clinical example, the idea can be visualized as

shown in Figure 3.

Figure 3: Semantic summary.

For example, in a summary, we only know that

there is a patient with Chickenpox. Furthermore, we

know that there are three people who have Headache, and possibly share Fever and Shoulder Injuries. This

summary can be used to talk about the general situa-

tion, e.g. allocation of beds, in the hospital. With ex-

isting technologies this is not yet possible. Although

one can create statistics over ontologies by predeﬁned

aggregate queries, these queries do not adapt to new

situations (e.g. new diseases).

With summaries, these statistics are created automatically, without the user having to define any statistics rules for the ontology. Differential summaries can then be used to determine recent changes in ontologies, e.g. "Does the hospital have a similar allocation of beds as it had 5 years ago?". We emphasize that summaries are not only intended to use obvious and directly told information on individuals, but also to use the locality information to detect all possibly relevant information.

This article discusses the creation of spatial syn-

opses. For each named individual in the ABox, an

abstraction is computed, given the told ABox infor-

mation. The intuition is that the abstraction corre-

sponds to a subset of the assertional knowledge, rep-

resenting what we know about a given individual and

what is sufﬁcient to perform reasoning with respect

to the given background knowledge. The situation is

depicted in Figure 4.

Figure 4: Semantic summary.

The abstractions for each named individual are combined in order to obtain a synopsis/summary of the whole ontology. In the following, we describe

the formal foundations for computing semantic sum-

maries.

2 DESCRIPTION LOGICS

Description logics are a family of languages for

knowledge representation. Historically, description

logics are descendants of semantic nets (Quillian,

1968) and frame systems (Minsky, 1974). In Artiﬁ-

cial Intelligence, description logics are used for for-

mal reasoning about application domains.

In the following, we recapitulate the syntax and semantics of the description logic $\mathcal{SHI}$ as far as relevant for this work. Please refer to (Baader, 1999) for further details. We assume a number of disjoint base sets as follows: $CN$ is a non-empty set of concept names, $RN$ is a non-empty set of role names, $NIN$ is a non-empty set of named individuals, and $AIN$ is a non-empty set of anonymous individuals. The set of individuals is $IN = NIN \cup AIN$. The set of $\mathcal{SHI}$-concept descriptions is given by the following grammar:

$$C_1, C_2 ::= \top \mid \bot \mid A \mid \neg C_1 \mid C_1 \sqcap C_2 \mid C_1 \sqcup C_2 \mid \forall R.C_1 \mid \exists R.C_1$$

where $A \in CN$ and $R \in Rol$. With $AtCon$ we denote all atomic concepts, i.e. concept descriptions which are concept names or negated concept names. For the semantics of concept descriptions please refer to (Baader et al., 2007).

A TBox $\mathcal{T}$ is a set of so-called generalized concept inclusion axioms $C_1 \sqsubseteq C_2$. An RBox $\mathcal{R}$ is a set of so-called role inclusion axioms $R_1 \sqsubseteq R_2$. An ABox $\mathcal{A}$ is a set of so-called concept and role assertion axioms $C(a)$ and $R(a_1, a_2)$. An ontology $\mathcal{O}$ is a 3-tuple $\langle \mathcal{T}, \mathcal{R}, \mathcal{A} \rangle$. We restrict the concept assertion axioms in $\mathcal{A}$ in such a way that each concept description is an atomic concept or a negated atomic concept. This is without loss of generality, since each non-atomic concept description can be given a name in the TBox. The set of TBoxes (RBoxes, ABoxes, ontologies) is denoted by $ST$ ($SR$, $SA$, $SO$).

We denote with $clos(C)$ the closure of a concept description $C$. The closure of a concept description is usually used for syntactical analysis. We assume that a concept description $C$ is usually in negation normal form, i.e. for all $\neg C' \in clos(C)$, $C'$ is a concept name. Using De Morgan's laws, every concept description can be transformed into a concept description in negation normal form. The negation normal form of a concept description $C$ is denoted $nnf(C)$. Given a TBox $\mathcal{T}$, the concept closure of $\mathcal{T}$, denoted $clos(\mathcal{T})$, is defined as

$$clos(\mathcal{T}) = \bigcup_{C_1 \sqsubseteq C_2 \in \mathcal{T}} \big( clos(\neg C_1) \cup clos(C_2) \big).$$

3 INDIVIDUAL ABSTRACTION

In (Wandelt and Möller, 2008), a method is proposed to identify the assertions relevant for reasoning about an individual. The main motivation is to enable in-memory reasoning over large ontologies, i.e. ontologies with a large ABox, for traditional tableau-based reasoning systems. More formally, given an input individual $a$, the proposal is to compute a set of ABox assertions $\mathcal{A}_{isl}$ (a subset of the source ABox $\mathcal{A}$), such that for all atomic (!) concept descriptions $C$, we have $\langle \mathcal{T}, \mathcal{R}, \mathcal{A} \rangle \models C(a)$ iff $\langle \mathcal{T}, \mathcal{R}, \mathcal{A}_{isl} \rangle \models C(a)$.

In order to deﬁne subsets of an ABox relevant for

reasoning over an individual a, we deﬁne an opera-

tion which splits up role assertions in such a way that

we can apply graph component-based modularization

techniques over the outcome of the split.

Figure 5: Intuition of an ABox split (diagram labels: ann and teaches before the split; ann, ann*, c1* and teaches after the split).

Deﬁnition 1 (ABox Split). Given

• a role description R,

• two distinct named individuals a and b,

• two distinct anonymous individuals c and d, and,

• an ABox A,

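an ABox split is a function $\downarrow^{R(a,b)}_{c,d} : SA \to SA$, defined as follows:

• If $R(a, b) \in \mathcal{A}$ and $\{c, d\} \nsubseteq Ind(\mathcal{A})$, then

$$\downarrow^{R(a,b)}_{c,d}(\mathcal{A}) = \mathcal{A} \setminus \{R(a, b)\} \cup \{R(a, d), R(c, b)\} \cup \{C(c) \mid C(a) \in \mathcal{A}\} \cup \{C(d) \mid C(b) \in \mathcal{A}\}$$

• Else $\downarrow^{R(a,b)}_{c,d}(\mathcal{A}) = \mathcal{A}$.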

The intuition of Deﬁnition 1 is depicted in Fig-

ure 5. The clouds in Figure 5 indicate a set of ABox

assertions. We split up a role assertion and keep the

concept assertions for each fresh individual copy. The

reason for keeping the asserted concept descriptions

is explained below. If the ABox does not contain the

role assertion in question, then the split returns an un-

changed ABox.
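The split operation of Definition 1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the ABox is encoded as a set of tuples (pairs for concept assertions, triples for role assertions), the function names are hypothetical, and the concept names Professor and Course are invented for the example (Figure 5 only shows ann, teaches and the copies ann* and c1*).

```python
def individuals(abox):
    """All individual names occurring in the ABox."""
    return {x for assertion in abox for x in assertion[1:]}

def abox_split(abox, role_assertion, c, d):
    """Split the role assertion R(a, b) using the individuals c and d.

    Concept assertions are pairs (C, a); role assertions are triples (R, a, b).
    """
    role, a, b = role_assertion
    # "Else" case: the assertion is missing, or c and d both already occur.
    if role_assertion not in abox or {c, d} <= individuals(abox):
        return set(abox)
    result = set(abox) - {role_assertion}
    result |= {(role, a, d), (role, c, b)}
    # Copy the told concept assertions of a to c and of b to d.
    result |= {(x[0], c) for x in abox if len(x) == 2 and x[1] == a}
    result |= {(x[0], d) for x in abox if len(x) == 2 and x[1] == b}
    return result

# Hypothetical example data in the spirit of Figure 5.
abox = {("Professor", "ann"), ("Course", "c1"), ("teaches", "ann", "c1")}
split = abox_split(abox, ("teaches", "ann", "c1"), "ann*", "c1*")
```

After the split, ann and c1 are no longer connected by a role assertion, so graph-component-based modularization can place them in different modules, while the copied concept assertions preserve what was told about the other end of the role.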

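Definition 2 (Extended ∀-info Structure). Given a TBox $\mathcal{T}$ in normal form and an RBox $\mathcal{R}$, an extended ∀-info structure for $\mathcal{T}$ and $\mathcal{R}$ is a function $extinfo^{\forall}_{\mathcal{T},\mathcal{R}} : Rol \to \wp(Con)$, such that we have $C \in extinfo^{\forall}_{\mathcal{T},\mathcal{R}}(R)$ if and only if there exists a role $R' \in Rol$, such that $\mathcal{R} \models R \sqsubseteq R'$ and $\forall R'.C \in clos(\mathcal{T})$.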

Example 1 (Example for an Extended ∀-info Structure). Let

$$\mathcal{T}_{Ex1} = \{\; Chair \sqsubseteq \forall headOf.Department,\;\; \exists memberOf.\top \sqsubseteq Person,\;\; GraduateStudent \sqsubseteq Student \;\}$$

KEOD2012-InternationalConferenceonKnowledgeEngineeringandOntologyDevelopment

198

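Input: Ontology $\mathcal{O} = \langle \mathcal{T}, \mathcal{R}, \mathcal{A} \rangle$, individual $a \in NInd(\mathcal{A})$

Output: Individual island $ISL = \langle \mathcal{T}, \mathcal{R}, \mathcal{A}_{isl}, a \rangle$

Algorithm:

  Let $agenda = \{a\}$
  Let $seen = \emptyset$
  Let $\mathcal{A}_{isl} = \emptyset$
  While $agenda \neq \emptyset$ do
    Remove $a_1$ from $agenda$
    Add $a_1$ to $seen$
    Let $\mathcal{A}_{isl} = \mathcal{A}_{isl} \cup \{C(a_1) \mid C(a_1) \in \mathcal{A}\}$
    For each $R(a_1, a_2) \in \mathcal{A}$
      $\mathcal{A}_{isl} = \mathcal{A}_{isl} \cup \{R(a_1, a_2)\}$
      If $R(a_1, a_2)$ is $\mathcal{SHI}$-splittable with respect to $\mathcal{O}$ then
        $\mathcal{A}_{isl} = \mathcal{A}_{isl} \cup \{C(a_2) \mid C(a_2) \in \mathcal{A}\}$
      else $agenda = agenda \cup (\{a_2\} \setminus seen)$
    For each $R(a_2, a_1) \in \mathcal{A}$
      $\mathcal{A}_{isl} = \mathcal{A}_{isl} \cup \{R(a_2, a_1)\}$
      If $R(a_2, a_1)$ is $\mathcal{SHI}$-splittable with respect to $\mathcal{O}$ then
        $\mathcal{A}_{isl} = \mathcal{A}_{isl} \cup \{C(a_2) \mid C(a_2) \in \mathcal{A}\}$
      else $agenda = agenda \cup (\{a_2\} \setminus seen)$

Figure 6: Naive algorithm for computation of an individual island.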
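The agenda loop of Figure 6 can be sketched as follows. This is a minimal sketch, not the authors' implementation: the ABox encoding (pairs for concept assertions, triples for role assertions), the function name, and the example data are assumptions, and the splittability test is passed in as a caller-supplied predicate rather than computed from the TBox and RBox.

```python
def individual_island(abox, a, splittable):
    """Collect the ABox assertions relevant for individual a.

    abox: set of concept assertions (C, x) and role assertions (R, x, y).
    splittable: predicate on a role assertion (supplied by the caller).
    """
    agenda, seen, island = {a}, set(), set()
    while agenda:
        a1 = agenda.pop()
        seen.add(a1)
        island |= {x for x in abox if len(x) == 2 and x[1] == a1}
        # Handles both directions, R(a1, a2) and R(a2, a1), in one loop.
        for assertion in [x for x in abox if len(x) == 3 and a1 in x[1:]]:
            island.add(assertion)
            a2 = assertion[2] if assertion[1] == a1 else assertion[1]
            if splittable(assertion):
                # Stop here, but keep the told concepts of the neighbour.
                island |= {x for x in abox if len(x) == 2 and x[1] == a2}
            elif a2 not in seen:
                agenda.add(a2)
    return island

# Hypothetical example data.
abox = {
    ("Professor", "ann"), ("teaches", "ann", "c1"), ("Course", "c1"),
    ("takes", "bob", "c1"), ("Student", "bob"),
}
small = individual_island(abox, "ann", lambda r: r[0] == "teaches")
full = individual_island(abox, "ann", lambda r: False)
```

If no role assertion is splittable, the island degenerates to the whole connected component of the individual; the more assertions are splittable, the smaller the island becomes.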

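and

$$\mathcal{R}_{Ex1} = \{\; headOf \sqsubseteq memberOf \;\},$$

then the TBox in normal form is

$$\mathcal{T}_{Ex1norm} = \{\; \top \sqsubseteq \neg Chair \sqcup \forall headOf.Department,\;\; \top \sqsubseteq \forall memberOf.\bot \sqcup Person,\;\; \top \sqsubseteq \neg GraduateStudent \sqcup Student \;\}$$

and the extended ∀-info structure for $\mathcal{T}_{Ex1norm}$ and $\mathcal{R}_{Ex1}$ is:

$$extinfo^{\forall}_{\mathcal{T},\mathcal{R}}(R) = \begin{cases} \{Department, \bot\} & \text{if } R = headOf, \\ \{\bot\} & \text{if } R = memberOf, \\ \emptyset & \text{otherwise.} \end{cases}$$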

The extended ∀-info structure allows us to check which concept descriptions are (worst-case) propagated over role assertions in $\mathcal{SHI}$-ontologies.
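The values in Example 1 can be reproduced with a small sketch. It assumes that the ∀-restrictions occurring in the TBox closure have already been extracted by hand into (role, filler) pairs and that the RBox is given as (sub-role, super-role) pairs; the function names and this encoding are hypothetical, and inverse roles are not handled.

```python
BOTTOM = "Bottom"  # stands for the bottom concept

def super_roles(role, inclusions):
    """All roles R2 such that role is a sub-role of R2 (reflexive-transitive)."""
    result, frontier = {role}, [role]
    while frontier:
        current = frontier.pop()
        for sub, sup in inclusions:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result

def extinfo(role, forall_restrictions, inclusions):
    """Fillers C of every universal restriction over a super-role of role."""
    supers = super_roles(role, inclusions)
    return {filler for r, filler in forall_restrictions if r in supers}

# Universal restrictions in the closure of the normalized TBox of Example 1,
# written down by hand, and the single role inclusion of the RBox.
forall_restrictions = {("headOf", "Department"), ("memberOf", BOTTOM)}
inclusions = {("headOf", "memberOf")}

head_of_info = extinfo("headOf", forall_restrictions, inclusions)
member_of_info = extinfo("memberOf", forall_restrictions, inclusions)
other_info = extinfo("teaches", forall_restrictions, inclusions)
```

The role headOf inherits the filler of the restriction over memberOf because headOf is a sub-role of memberOf, which is why its result contains both Department and the bottom concept.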

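Definition 3 ($\mathcal{SHI}$-splittability of Role Assertions). Given a $\mathcal{SHI}$-ontology $\mathcal{O} = \langle \mathcal{T}, \mathcal{R}, \mathcal{A} \rangle$ and a role assertion $R(a, b)$, we say that $R(a, b)$ is $\mathcal{SHI}$-splittable with respect to $\mathcal{O}$ if

1. there exists no transitive role $R'$ with respect to $\mathcal{R}$, such that $\mathcal{R} \models R \sqsubseteq R'$,

2. for each $C \in extinfo^{\forall}_{\mathcal{T},\mathcal{R}}(R)$

• $C = \bot$ or

• there exists a concept description $C'$, such that $C'(b) \in \mathcal{A}$ and $\mathcal{T} \models C' \sqsubseteq C$ or

• there exists a concept description $C'$, such that $C'(b) \in \mathcal{A}$ and $\mathcal{T} \models C \sqcap C' \sqsubseteq \bot$

and

3. for each $C \in extinfo^{\forall}_{\mathcal{T},\mathcal{R}}(R^{-})$

• $C = \bot$ or

• there exists a concept description $C'$, such that $C'(a) \in \mathcal{A}$ and $\mathcal{T} \models C' \sqsubseteq C$ or

• there exists a concept description $C'$, such that $C'(a) \in \mathcal{A}$ and $\mathcal{T} \models C \sqcap C' \sqsubseteq \bot$.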
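The three conditions of the splittability definition can be sketched as a predicate. This is only an illustration under strong assumptions: the entailment checks with respect to the TBox are replaced by caller-supplied oracles (`subsumed`, `disjoint`, `has_transitive_super`), which in practice a description logic reasoner would have to provide; the inverse of a role is encoded by appending "-" to its name; and all names and data are hypothetical.

```python
BOTTOM = "Bottom"  # stands for the bottom concept

def told_concepts(abox, individual):
    """Concept names asserted for an individual; pairs (C, x) are concept assertions."""
    return {x[0] for x in abox if len(x) == 2 and x[1] == individual}

def is_splittable(assertion, abox, extinfo, has_transitive_super, subsumed, disjoint):
    role, a, b = assertion
    # Condition 1: the role must not have a transitive super-role.
    if has_transitive_super(role):
        return False

    def harmless(concept, individual):
        # Bottom, an already implied concept, or a clashing concept.
        if concept == BOTTOM:
            return True
        return any(subsumed(c2, concept) or disjoint(concept, c2)
                   for c2 in told_concepts(abox, individual))

    # Condition 2 (towards b) and condition 3 (towards a via the inverse role).
    return (all(harmless(c, b) for c in extinfo(role))
            and all(harmless(c, a) for c in extinfo(role + "-")))

# Hypothetical oracles: told subsumption only, no disjointness, no transitivity.
ext_table = {"headOf": {"Department", BOTTOM}, "memberOf": {BOTTOM}}
extinfo = lambda role: ext_table.get(role, set())
no_transitive = lambda role: False
told_subsumed = lambda c1, c2: c1 == c2
never_disjoint = lambda c1, c2: False

with_dept = {("Chair", "ann"), ("Department", "cs"), ("headOf", "ann", "cs")}
without_dept = {("Chair", "ann"), ("headOf", "ann", "cs")}
assertion = ("headOf", "ann", "cs")
ok = is_splittable(assertion, with_dept, extinfo, no_transitive,
                   told_subsumed, never_disjoint)
not_ok = is_splittable(assertion, without_dept, extinfo, no_transitive,
                       told_subsumed, never_disjoint)
```

In the first ABox the concept Department that could be propagated over headOf is already told for cs, so the assertion can be split; in the second ABox it is not, so the propagation might add new information and the assertion must be kept intact.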

To sum up, for each named individual in the on-

tology, we use the algorithm from Figure 6, to obtain

an abstraction of the individual.

4 SEMANTIC SUMMARIES

Given an individual abstraction for each named individual in an input ontology, it is clear that some (or even many) abstractions are similar to each other. Due to lack of space we do not go into the technical details of computing the similarity of individual abstractions here. However, if one looks at an abstraction as a graph, graph homomorphisms can be used directly to determine similar individual islands.

The key insight is that similar abstractions entail the same set of concept descriptions for their root individual. Therefore these individuals (of similar abstractions) cannot be distinguished with respect to the

given background knowledge. This is exactly what

we expect from semantic summaries. Thus, for se-

mantic summaries, we propose to look at ontologies

as a set of similar individual abstractions.
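The grouping of individuals by similar abstractions can be sketched as follows. Note that this sketch swaps in a much simpler technique than the graph homomorphisms mentioned above: it only compares a depth-one signature (told concepts of the individual plus role, direction and told concepts of each direct neighbour), which is an assumption made for illustration. The ABox encoding, function names and example data are hypothetical.

```python
from collections import defaultdict

def signature(abox, a):
    """Depth-one description of the assertions directly about individual a."""
    def concepts(x):
        return frozenset(c[0] for c in abox if len(c) == 2 and c[1] == x)

    edges = set()
    for r in (x for x in abox if len(x) == 3):
        if r[1] == a:
            edges.add((r[0], "out", concepts(r[2])))
        if r[2] == a:
            edges.add((r[0], "in", concepts(r[1])))
    return (concepts(a), frozenset(edges))

def summarize(abox, named_individuals):
    """Group individuals whose signatures coincide."""
    groups = defaultdict(set)
    for a in named_individuals:
        groups[signature(abox, a)].add(a)
    return list(groups.values())

# Hypothetical example data from a university domain.
abox = {
    ("Student", "s1"), ("Student", "s2"), ("Course", "c1"), ("Professor", "p1"),
    ("takes", "s1", "c1"), ("takes", "s2", "c1"), ("teaches", "p1", "c1"),
}
groups = summarize(abox, ["s1", "s2", "c1", "p1"])
```

Here the two students fall into the same group, so the summary contains three entries for four individuals. On data with many structurally identical individuals, the number of groups can stay small even when the number of individuals grows.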

We performed a first evaluation of this idea with respect to a benchmark ontology. The Lehigh University Benchmark (LUBM) is a synthetic

ontology developed to benchmark knowledge base

systems with respect to large OWL applications. The

ontology is situated in the university domain. The

background knowledge, i.e. the terminology, is described in a schema called Univ-Bench; see (Guo et al., 2005) for an overview of the history, different versions and the predecessor Univ 1.0. The expressivity of the ontology is chosen to be in OWL Lite, which corresponds to the description logic $\mathcal{SHIF}$. However, the de facto expressivity is lower. For instance,

TowardsSemanticSummariesoverOntologies

199

the ontology does not introduce any cardinality/func-

tionality expressions on roles.

In Figure 7, we show the number of individu-

als in the dataset, for different numbers of universi-

ties. It can be seen that the number of individuals

increases almost linearly with the number of univer-

sities. Around 30 percent of the individuals in the

dataset are publications, another 30 percent are under-

graduate students, 10 percent are graduate students,

10 percent are courses and graduate courses. The re-

maining 20 percent of the individuals are for instance

professors, assistants and departments. For more de-

tails about the data distribution, see (Guo et al., 2005).

Figure 7: Number of individuals in LUBM (x-axis: universities; y-axis: individuals, with ticks up to 2,500,000).

Next, we evaluated the number of distinct individual abstractions for different numbers of universities. The result is shown in Figure 8. It can be seen that the number of distinct individual abstractions remains constant, whereas the number of individuals grows linearly.

Figure 8: Number of distinct individual abstractions for

LUBM.

As a second ontology, we had a look at an ontol-

ogy from the CASAM project. The CASAM project

is focused on computer-aided semantic annotation

of multimedia content. The novelty is the aggrega-

tion of human and machine knowledge. For a de-

tailed discussion of the research objectives, see (Gries

et al., 2010), (Papantoniou et al., 2010), and (Creed

et al., 2010). Within the CASAM project, there is

a need to deﬁne an expressive annotation language

which allows for typical-case reasoning systems. The

proposed annotation language is deﬁned by the so-

called Multimedia Content Ontology, short MCO, in-

troduced in (Gries et al., 2009). Inspired by the

MPEG-7 standard, see (ISO/IEC15938-5FCD, 2002),

strictly necessary elements describing the structure of

multimedia documents are extracted. The intention

is to exploit quantitative and qualitative time informa-

tion in order to relate co-occurring observations about

events in videos. Co-occurrences are detected either

within the same or between different modalities, i.e.

text, audio and speech, regarding the video shots.

For our evaluation with respect to MCO, we have

a number of multimedia documents from the CASAM

project. The set of test ontologies contains documents

with identiﬁers ranging from 1 to 14. Each document

is decomposed into several so-called delta ﬁles. Each

delta represents additional information about the doc-

ument of concern. We evaluated our summarization

techniques with respect to all documents. Here we

only show the results for Document 1, since for all

the other documents we obtained very similar statis-

tics.

In Figure 9, we show the number of individuals in the dataset as the number of deltas increases. It can be seen that most individuals are introduced in the first delta files. The remaining delta files only introduce additional ABox assertions about already known individuals. Please note that the number of individuals is not linear in the number of deltas.

Figure 9: Number of individuals in Document 1.

We have evaluated the number of individual abstractions for different deltas. The result is shown in Figure 10. It can be seen that the number of distinct individual abstractions is relatively constant once most individuals have been introduced in the third and fourth delta.

Figure 10: Number of distinct individual abstractions for

Document 1.

5 CONCLUSIONS AND FUTURE WORK

We have proposed ﬁrst ideas for a notion of seman-

tic summaries that supports industrial information

KEOD2012-InternationalConferenceonKnowledgeEngineeringandOntologyDevelopment

200

search scenarios by using (domain-specific) industry-standard vocabularies to query and summarize information. It has been shown already that summaries can be efficiently managed in a distributed computing setting (Wandelt and Möller, 2010) and can be used for reasoning over the ontology of concern (Wandelt et al., 2010).

In future work, we plan to evaluate our semantic summary techniques with respect to additional ontologies. Furthermore, we would like to implement and evaluate difference operators over ontology summaries, in order to formally capture ontology evolution with temporal synopses.

REFERENCES

Baader, F. (1999). Logic-Based Knowledge Representation. In Artificial Intelligence Today, pages 13–41. Springer-Verlag.

Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., and Patel-Schneider, P. F. (2007). The Description Logic Handbook. Cambridge University Press, New York, NY, USA.

Creed, C., Lonsdale, P., Hendley, R., and Beale, R. (2010). Synergistic annotation of multimedia content. In Proceedings of the 2010 Third International Conference on Advances in Computer-Human Interactions, ACHI '10, pages 205–208, Washington, DC, USA. IEEE Computer Society.

Gries, O., Möller, R., Nafissi, A., Rosenfeld, M., Sokolski, K., and Wessel, M. (2010). A Probabilistic Abduction Engine for Media Interpretation Based on Ontologies. In Hitzler, P. and Lukasiewicz, T., editors, RR, volume 6333 of Lecture Notes in Computer Science, pages 182–194. Springer.

Gries, O., Möller, R., Nafissi, A., Sokolski, K., and Rosenfeld, M. (2009). CASAM Domain Ontology. Technical report, Hamburg University of Technology.

Guo, Y., Pan, Z., and Heflin, J. (2005). LUBM: A benchmark for OWL knowledge base systems. J. Web Sem., 3(2-3):158–182.

ISO/IEC15938-5FCD (2002). Multimedia Content Description Interface (MPEG-7). http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm.

Minsky, M. (1974). A Framework for Representing Knowledge. Technical report, MIT-AI Laboratory, Cambridge, MA, USA.

Papantoniou, K., Tsatsaronis, G., and Paliouras, G. (2010). KDTA: Automated Knowledge-Driven Text Annotation. In Balcázar, J. L., Bonchi, F., Gionis, A., and Sebag, M., editors, ECML/PKDD (3), volume 6323 of Lecture Notes in Computer Science, pages 611–614. Springer.

Quillian, R. (1968). Semantic memory. In Semantic Information Processing, pages 216–270. MIT Press.

Wandelt, S. and Möller, R. (2008). Island reasoning for ALCHI ontologies. In Proceedings of the 2008 conference on Formal Ontology in Information Systems, pages 164–177, Amsterdam, The Netherlands. IOS Press.

Wandelt, S. and Möller, R. (2010). Distributed island-based query answering for expressive ontologies. In Bellavista, P., Chang, R.-S., Chao, H.-C., Lin, S.-F., and Sloot, P. M. A., editors, GPC, volume 6104 of Lecture Notes in Computer Science, pages 461–470. Springer.

Wandelt, S., Möller, R., and Wessel, M. (2010). Towards scalable instance retrieval over ontologies. Int. J. Software and Informatics, 4(3):201–218.

TowardsSemanticSummariesoverOntologies

201