USING LINGUISTIC PATTERNS TO ENHANCE ONTOLOGY

DEVELOPMENT

Guadalupe Aguado de Cea, Asunción Gómez-Pérez, Elena Montiel-Ponsoda

and Mari Carmen Suárez-Figueroa

Ontology Engineering Group, Dpto. de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid

Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain

Keywords: Ontology development, Ontology design patterns, Ontology statements, Lexico-syntactic patterns.

Abstract: In this paper we describe how linguistic patterns can contribute to ontology development by enabling an

easier reuse of some ontological resources. In particular, our research focuses on the reuse of ontology

design patterns and ontology statements by relying on linguistic constructs at different stages of the reuse

process. With this aim, we propose the employment of lexico-syntactic patterns with two objectives: 1) the

reuse of ontology design patterns, and 2) the validation of ontology statements for their subsequent reuse in

the ontology development. To illustrate the proposed approaches, we will present some examples of lexico-

syntactic patterns and their employment in the reuse of ontology design patterns and in the validation of

ontology statements.

1 INTRODUCTION

The 1990s and the first years of this new millennium

have witnessed the growing interest of many

practitioners in methodologies and methods that

support ontology development. A complete

overview of the methods and methodologies for

developing ontologies from scratch has been

reported in (Gómez-Pérez et al., 2003). All these

efforts have contributed to transform the art of

constructing single ontologies into an engineering

activity. However, in the last years, ontology

developers are starting to reuse and re-engineer as

much as possible knowledge-aware resources (e.g.,

ontologies, ontology modules, ontology statements,

ontology design patterns, thesauri, lexicons,

classification schemas, and databases) as opposed to

custom-building ontologies from scratch. This new

ontology development paradigm has the following

benefits: (a) allows speeding up the ontology

development process, (b) saves time and budget, and

In this new paradigm, design patterns have

appeared into scene. Design patterns are generally

defined as archetypal solutions to design problems

(Gangemi, 2005). According to (Menzies, 1997),

design patterns are generally perceived as having

three kinds of benefits: (1) reuse, (2) guidance, and

(3) fluent communication. Intuitively, these benefits

can also be applied to Ontology Engineering, and

recently experiments have empirically proven this

intuition (Dzbor et al., 2009). On the light of these

benefits, the Ontology Engineering community is

devoting many efforts to the development of

ontology design pattern repositories (e.g. Ontology

Design Patterns.org) to encourage users the reuse of

best practices while speeding up ontology

development. However, experiments in the reuse of

Ontology Design Patterns (ODPs hereafter) have

also revealed that users with different levels of

expertise in ontology modelling face difficulties

when trying to reuse ODPs. The set of ODPs used in

this study have been designed in the NeOn project

(www.neon-project.org), and are included in

(Suárez-Figueroa et al., 2007).

Other ontological resources worth reusing for

accelerating the ontology development process are

ontology statements. We consider an ontology

statement (or triple) as containing three components:

subject, predicate and object. Such ontology

statements are available in the Semantic Web (SW)

and can be used to build a preliminary model of the

ontology or to extend and improve an existing one.

Nevertheless, an important issue here is the quality

of the retrieved ontological knowledge represented

by the ontology statement.

With the aim of offering a solution for a

flawless reuse of both ODPs and ontology

statements, we are currently developing an

206

de Cea G., Gómez-Pérez A., Montiel-Ponsoda E. and Suárez-Figueroa M. (2009).

USING LINGUISTIC PATTERNS TO ENHANCE ONTOLOGY DEVELOPMENT.

In Proceedings of the International Conference on Knowledge Engineering and Ontology Development, pages 206-213

DOI: 10.5220/0002300002060213

 SciTePress

innovative approach based on a Natural Language

human interface. In the same way that the

knowledge of the world used to build knowledge

systems is normally captured in Natural Language

(NL), it seems reasonable that NL be one of the most

appropriate means to identify and validate encoded

knowledge.

Based on previous work on knowledge

extraction from texts summarized in (Cimiano,

2006), our intuition is that some linguistic constructs

reliably convey ontological relations. The linguistic

constructs we deal with in the present work combine

lexical items with syntactic dependencies and have

verbs as main elements. For the purposes of our

research, we have adopted the term Lexico-Syntactic

Patterns or LSPs (Hearst, 1992) to designate those

linguistic constructs that have been defined in

(Aguado de Cea et al., 2008) as “formalized

linguistic schemas or constructions derived from

regular expressions in NL that consist of certain

linguistic and paralinguistic elements, following a

specific syntactic order, and that permit to extract

some conclusions about the meaning they express”.

To guarantee an easy reuse of ODPs, a

repository of LSPs corresponding to ODPs has been

developed and published in (Aguado de Cea et al.,

2008). The purpose of this repository is to enable the

reuse of ODPs starting from sentences in NL

formulated by users. In this paper, we will contribute

with some enhancements to the LSPs repository.

Regarding the reuse of ontology statements, the goal

is to provide a way for predicting the truth value of

these statements also by means of LSPs, with the

aim of ensuring the correctness of the knowledge to

be reused. Thus, when developing ontologies, LSPs

will be highly useful because they facilitate 1) the

reuse of the ODPs that represent the phenomenon or

domain aspect the user wants to model in the

ontology, and 2) the reuse of validated ontology

statements drawn from online ontologies.

The reminder of this paper is structured as

follows. Section 2 is devoted to the motivation.

Section 3 presents some examples of LSPs and

improvements to the initial repository. Section 4

includes an enhanced version of the guidelines for

reusing of ODPs. Section 5 proposes some steps for

the validation of ontology statements. Finally, we

conclude the paper and outline some future lines of

research in section 6.

2 MOTIVATION

Ontology Design Patterns Reuse. Some

experiments reported in (Aguado de Cea et al.,

2008) showed common problems, errors and

misunderstandings faced by novice users when they

tried to reuse ODPs to solve modelling problems

expressed in NL. The ODPs used in the mentioned

experiments are included in (Suárez-Figueroa et al.,

2007), and have been classified into Logical and

Content ODPs. Examples of Logical ODPs are the

Logical Pattern for modelling SubclassOf relation,

the one for Universal Restrictions, or the one for

Disjoint Classes. Under the category of Content

Patterns, we find the Participation, the Role-Task, or

the Simple Part-Whole relation patterns. Subjects

participating in the mentioned experiments had

problems in correctly reusing the Part-Whole

relation Content pattern, for example, and mistook it

for the Logical ODPs for Object Property,

SubclassOf, or SubpropertyOf. Another typical error

was the incorrect use of the Exhaustive Classes

ODP, when the Disjoint Classes ODP should have

been employed.

Ontology Statements Validation. One crucial issue

in the current SW is related with the trust assigned to

the knowledge accessed through SW portals like

Watson (http://watson.kmi.open.ac.uk). Several

studies (Sabou et al., 2008; Suárez-Figueroa et al.,

2008) have revealed the need for evaluating

ontology statements before reusing them, because of

the large number of incorrect or misleading

statements retrieved from the Web. For a

classification of the identified problematic

statements see (Sabou et al., 2009). Problematic

statements that are of interest to this research are the

so-called modelling errors, i.e., the ones that express

an incorrect relation from a (formal) modelling

perspective. One of the most common errors in

online ontologies reported in (Sabou et al., 2008)

was the use of the SubclassOf relation (or

subsumption) “as a way to model the fact that there

exists some type of relation between two concepts”.

This resulted in examples like Biographies

subclass_of People. Another quite common error

was the use of the SubclassOf relation to model

Part-Whole relations. For example, Branch

subclass_of Tree.

The results of these experiments shed some light

about incorrect modelling of some of the most

common ontological relations when reusing ODPs

or ontology statements. This brought us to propose a

NL-based approach to facilitate the access to ODPs,

and to validate the knowledge expressed in ontology

statements, since NL is the way humans have to

communicate knowledge.

USING LINGUISTIC PATTERNS TO ENHANCE ONTOLOGY DEVELOPMENT

207

3 LEXICO-SYNTACTIC

PATTERNS

Lexico-Syntactic Patterns (LSPs) were first applied

by Hearst (1992) in the context of relation discovery

from machine readable dictionaries. Hearst’s

patterns had prepositional phrases, paralinguistic

signs or conjunctions as main elements. The object

of her research was to identify hyperonymy-

hyponymy relations between concepts. Since then,

many authors have followed Hearst’s approach for

the automatic discovery of semantically related

lexical items from unstructured texts with different

purposes. In Ontology Engineering, LSPs have been

mainly applied in two directions: 1) to learn classes,

attributes or instances for ontology population

(Aussenac-Gilles and Jacques, 2006; Berland and

Charniak, 1999; Cimiano and Wenderoth, 2007;

Etzioni et al., 2004; Pasca, 2005; Reinberger and

Spyns, 2004; among others), and 2) to learn

taxonomic and non-taxonomic relations for ontology

building (Kavalec and Svátek, 2005; Sánchez and

Moreno, 2008; among others).

For the purposes of this research, we will focus

on two specific sets of patterns that represent two of

the most important taxonomical relations in

ontologies: the SubclassOf relation and the Part-

Whole relation. As already mentioned in section 2,

in the two activities that concern us here, namely,

ODPs reuse and ontology statement validation, these

two relations are the source of many modelling

errors. For that reason, we will concentrate when

showing how LSPs can contribute to solve novice

users’ problems in reusing the ODPs to model those

relations. Besides that, LSPs will be applied to

discern if an ontology statement models correctly a

SubclassOf or a Part-Whole relation. We will

contribute to the LSPs repository in (Aguado de Cea

et al., 2008) by proposing some enhancements to the

LSPs present in the SubclassOf and Part-Whole

relations.

It is unusual to find a one-to-one correspondence

between LSPs and ODPs. ODPs are classified

according to the modelling problem they address,

but in NL different modelling aspects can be

linguistically realized with one and the same lexical

element. In this case we have a NL statement

corresponding to a combination of ODPs. For

example, the verb to classify in/into implies the

arrangement of objects in different categories that do

not normally share instances, i.e., it combines the

SubclassOf relation ODP with the Disjoint Classes

ODP. Apart from that, sometimes the same linguistic

construct can convey two different relations, or,

what is the same, correspond to two or more

pairwise disjoint ODPs. An example of this is the

verb to include, which can either list the types into

which a certain class is divided, or the parts into

which an object is divided (this can also be applied

to the verb to divide). In the latter case, a further

disambiguation process is required. Let us illustrate

these cases with the LSPs in the tables (The set of

restricted symbols and abbreviations used is

included in Appendix):

Table 1: LSPs corresponding to SubclassOf or Part-Whole

ODPs.

LSP Identifier : LSP-SC-PW-EN

NeOn ODPs Identifier : LP-SC-01 / CP-PW-01

Formalization

NP<class> include | comprise | consist of [(NP<class >,)*

and] NP<class>

NP<class> be divide | split | separate in|into [CD] [CN]

[(NP<class >,)* and] NP<class>

Examples

(a) Common mass storage devices include disk drives

and tape drives. (SC)

(b) Reproductive structures in female insects include

ovaries, bursa copulatrix and uterus. (PW)

(a) Marine mammals are divided into three orders:

Carnivora, Sirenia and Cetacea. (SC).

(b) The cerebrum is divided into two major parts: the

right cerebral hemisphere and left cerebral hemisphere.

(PW)

In Table 1 we present the LSPs for the pairwise

disjoint ODPs SubclassOf and Part-Whole,

signalized by the use of “/” in the NeOn ODPs

Identifier (taken from the ODPs repository in

(Suárez-Figueroa et al., 2007)). In this Identifier, LP

stands for Logical Pattern and CP for Content

Pattern. The LSP Identifier in the LSPs template

consists of an acronym of the relation captured by

the ODP (SC for SubclassOf and PW for Part-

Whole), and the ISO-639 code for the language to

which the LSP belongs.

As already mentioned, patterns in Table 1

contain polysemous verbs conveying both

ontological relations, SubclassOf and Part-Whole.

We have contributed to the LSPs repository by

including a new verbal form in LSP 1, namely, the

verb to consist of.

In Table 2, LSPs corresponding to the

SubclassOf relation are shown. In this paper, some

modifications to the LSP-SC-EN in Aguado de Cea

et al. (2008) have also been made, namely:

KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development

208

Table 2: LSPs corresponding to SubclassOf relation ODP.

LSP Identifier: LSP-SC-EN

NeOn ODPs Identifier : LP-SC-01

Formalization

[(NP<subclass>,)* and] NP<subclass> be [CN]

NP<superclass>

[(NP<subclass>,)* and] NP<subclass> (classify as) |

(group in|into|as) | (fall into) | (belong to) [CN]

NP<superclass>

There are CD | QUAN [CN] NP<superclass> PARA

[(NP<subclass>,)* and] NP<subclass>

[A(n) | QUAN] example | examples | [CN] of

NP<superclass> be | include [(NP<subclass>,)* and]

NP<subclass>

Examples

1 Odometry, speedometry and GPS are types of sensors.

15,5% of Spaniards are classified as obese.

Thyroid medicines belong to the general group of

hormone medicines.

There are several kinds of memory: fast, expensive, short

term memory, and long-term memory.

Some examples of peripherals are keyboards, mice,

monitors, printers, scanners, disk and tape drives,

microphones, speakers, joysticks, plotters and cameras.

a) LSPs 1 and 2 in the original repository have

been merged into LSP 1 here;

b) A new verbal form has been included in the

new LSP 2, namely, to classify as, and Class Name

(CN) has been turned into an optional element;

c) LSP 4 in the original repository has proven to

belong to the LSP for “SubclassOf relation, Disjoint

Classes, and Exhaustive Classes ODPs” (see pattern

number 3 in Table 3) and has been moved to that

LSP. This decision is supported by a preliminary

experiment reported in (Aguado de Cea and

Montiel-Ponsoda, 2009) that has demonstrated that

verbs such as to classify in/into, to divide in/into, to

split in/into, and to separate in/into predicate about a

set of subclasses that do not share any individuals

(disjointness). Although more evidence is required,

we would dare to say that for the rest of patterns

represented in Table 2, no assertions can be made

about disjointness or exhaustiveness without further

evidence;

d) A new LSP has been included in

LSP-SC-EN,

namely, LSP 4.

Regarding the LSPs corresponding to

“SubclassOf relation, Disjoint Classes, and

Exhaustive Classes ODPs” (see Table 3), a new

pattern has been included, namely, LSP 1. In this

case, one LSP corresponds to a combination of three

ODPs, and it is signalized by the use of “+” in the

NeOn ODPs Identifier. Moreover, we claim that

exhaustiveness is also a feature of the linguistic

constructs identified in this case, although further

evidence is needed.

Table 3: LSPs corresponding to SubclassOf relation,

Disjoint Classes, and Exhaustive Classes ODPs.

LSP Identifier : LSP-SC-Di-EC-EN

NeOn ODPs Identifier: LP-SC-01 + LP-Di-01 + LP-EC-

Formalization

NP<superclass> be | CATV [either] NP<subclass> or |

and NP<subclass>

NP<superclass> be divide | split | separate | group in|into

[either] [(NP<subclass >,)* and] NP<subclass>

NP<superclass> CATV [CD] [CN] [PARA]

(NP<subclass>,)

and NP <subclass>

Examples

1 Animals are either vertebrates or invertebrates.

Sensors are divided into two groups: contact and non-

contact sensors.

Membrane proteins are classified into two major

categories, integral proteins and peripheral proteins.

Table 4: LSPs corresponding to Part-Whole relation ODP.

LSP Identifier: LSP-PW-EN

NeOn ODPs Identifier: CP-PW-01

Formalization

1 (NP<part>,)* and NP<part> COMP [CN] NP<whole>

2 NP<whole> be COMP [CN] (NP<part>,)* and NP<part>

The part | parts of a NP<whole> be [PARA]

NP<part>,)* and NP<part>

Examples

1 Proteins form part of the cell membrane.

2 Water is made up of hydrogen and oxygen.

The parts of a tree are the root, trunk(s), branches, twigs

and leaves.

In Table 4 we include those lexico-syntactic

constructs that can identify a Part-Whole relation.

The following modifications have been carried out

to the LSP-PW-EN:

a) Verbal forms composing the set of the so-

called Verbs of Composition (COMP), to which the

verbs to contain and to hold have been added, and

the verb to consist of has been removed;

b) LSP 3 has been included.

Finally, in Table 5 we show the pairwise disjoint

ODP LSP-OP-DP-PW-EN from Aguado de Cea et

al. (2008) that contains a polysemous verb (to have),

which can also correspond to different modelling

patterns. In this case, disambiguation is also related

with modelling decisions, i.e., if we have already

discerned that we are not dealing with parts of an

USING LINGUISTIC PATTERNS TO ENHANCE ONTOLOGY DEVELOPMENT

209

object (PW), but with properties, we have to decide

if we want to model a certain property as an

ObjectProperty (OP), or as a DatatypeProperty (DP).

Table 5: LSP corresponding to Object Property, Datatype

Property and Part-Whole relation ODPs.

LSP Identifier : LSP-OP-DP -PW-EN

NeOn ODPs Identifier: LP-OP-01 / LP-DP-01 / CP-PW-01

Formalization

1 NP<class> have NP<class>

Examples

Birds have feathers.

Water areas have names in natural language.

4 METHOD FOR ODP REUSE

The repository introduced in section 3 is the core

element of the method we present here. The main

purpose of this repository is to bridge the gap

between NL and the modelling solutions represented

by ODPs. In this way, this method aims at enabling

an easier reuse of ODPs to users with different

expertise in Ontology Engineering. The whole

process needs to be supported by a system that

automatically analyses the sentence in NL

introduced by the user, and looks for the linguistic

structure that best matches the input in the LSPs

repository. However, some LSPs are ambiguous, i.e.

the same linguistic structure can equally express the

information encoded in pairwise disjoint ODPs (as

already exemplified in Tables 1 and 5). This means

that the user input may need to be revised or refined

during the process, so that only one option is valid.

The method is mainly divided in three tasks:

Task 1: Formulation. The goal of this task is to

formulate in NL the domain aspect to be modelled.

We assume that the user has a good command of the

knowledge parcel (s)he wants to model in the

ontology. Contrary to the proposal made in (Aguado

de Cea et al., 2008), in which the user was allowed

to introduce in full NL the domain aspect to be

modelled, we claim here that some

recommendations on the use of NL should be made

in order to guarantee the results. For example, to

express sentences in present tense; to express one

topic or idea per sentence; to avoid coordination of

phrases, that will be only used when necessary; to

avoid redundant information (adjectives, adverbs) or

modal verbs; or to use full stop to end up sentences,

among others.

Task 2: Refinement. The goal of this task is to

refine the input because an LSP matches pairwise

disjoint ODPs. This task will take place when LSPs

of the type represented in Table 1 and Table 5 are

matched. Different strategies for the performance of

this task are being investigated, such as interaction

with the user; search in available ontologies

modelling the same domain of knowledge; search in

lexical resources with some semantics (e.g.

WordNet); etc.

Task 3: Validation. The goal of this task is to

confirm that the ODP proposed as modelling

solution is correct. The validation is foreseen to be

manually carried out by the user.

4.1 Example of Use

In this section, our aim is to exemplify the proposed

method for the reuse of ODPs having as starting

point sentences in NL. Since the relations dealt with

here are SubclassOf and Part-Whole, we will focus

on the employment of the proposed method with one

example of a polysemous LSP conveying both

ontological relations (see Table 1). The method

could help in the following way:

Task 1: Formulation. Let us assume that the user

wants to model the Part-Whole relation holding

between the brain (cerebrum) and its parts (the right

hemisphere and the left hemisphere). The first task

consists in formulating that knowledge according to

the recommendations introduced in the method. If

we take Example 2 (b) in Table 1: The cerebrum is

divided into two major parts: the right cerebral

hemisphere and left cerebral hemisphere, the

adjective “major” appearing in this sentence should

be removed, since it is redundant here.

Task 2: Refinement. The sentence in NL matches

the LSP-SC-PW-EN corresponding to pairwise

disjoint ODPs: LP-SC-01 and CP-PW-01 (see Table

1). There would be two options for refining the

input: 1) interaction with the user by asking him or

her if the right cerebral hemisphere and left cerebral

hemisphere are “types of” cerebrum or “parts of”

cerebrum; 2) search for cerebral hemisphere in a

lexical or terminological resource such as WordNet

and check which relation it bears to cerebrum.

Task 3: Validation. In return, the user receives an

ODP represented by a UML diagram instantiated

with the information of the input sentence in Task 1.

Then, the user has to confirm the ODPs suitability.

KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development

210

5 VALIDATION OF ONTOLOGY

STATEMENTS

The repository of LSPs introduced in section 3 can

also be very useful when verifying the quality of the

ontology statements the user wants to reuse for

ontology building or enrichment. Our starting

hypothesis here was that if the knowledge expressed

in the ontology statement was formally correct from

a modelling perspective, it had to be found in

domain texts expressed by means of the linguistic

constructs identified as usually conveying the

relation represented in the statement.

This means that we could expect to find the concepts

in the statement (let us say, Branch part_of Tree),

forming part of NL structures identified in the LSPs

repository as conveying the Part-Whole relation (e.g.

Trees have branches). On the contrary, if the

relation said to be hold between the ontology

concepts in the statement were (formally) false

(Branch subclass_of Tree), we would not find

sentences expressing SubclassOf relation (Trees are

classified into branches or Branches are trees), or at

least a smaller quantity. The proposed approach to

validate ontology statements before reuse can be

summarized in three steps:

Step 1: LSPs Instantiation. The purpose is to

instantiate the set of LSPs in the repository with the

ontology concepts in the statement to be reused. By

instantiation we understand the replacement of Noun

Phrases (NP) contained in LSPs, with the ontology

labels in ontology statements.

The following aspects need to be taken into account

before proceeding with the instantiation:

1. Ontology labels have to be used in its plural form.

In ontologies, labelling style guides (see Fliedl et al.,

2007) recommend to express labels that designate

concepts in singular. However, when referring to

classes in general, the English language makes use

of the plural form of nouns. For instance, tress have

branches. Therefore, the first thing is to obtain the

plural form of each ontology label.

2. Optional elements have to be removed, thus

maintaining the basic form of the pattern. The set of

optional elements that appear in LSPs are included

in square brackets, and, if omitted, they do not

modify the basic meaning conveyed by the pattern.

The reason for using the pattern in its basic form is

motivated by the effort required by the manual

instantiation of LSPs with optional elements. Should

the approach be automated, this problem would

disappear.

3. The semantic role of the concepts in the ontology

statement is maintained in the pattern instantiation.

Semantic roles are signalized in LSPs by means of

angle brackets (<superclass>), as detailed in the

Appendix at the end of the paper. Semantic roles in

LSPs are closely related with the grammatical

function of a certain Noun Phrase in the pattern, and

are essential components of LSPs. As a first

approach to the modelling error presented here, we

create equivalences between “superclasses” and

“wholes”, and “subclasses” and “parts”, and keep

those roles in the instantiation. However, this

strategy is just one of several possible strategies. We

are now investigating the possibility of instantiating

LSPs allowing the two labels in the ontology

statements to play any semantic role in LSPs.

Step 2: Web Search. The purpose is to introduce

the resulting NL sentences in a Web search engine to

check their frequency in domain texts.

Step 3: Validation. The purpose of this step is to

evaluate the resulting matches for each LSP. The

instantiations of those LSPs conveying the relation

that really holds between the ontology concepts

should appear in Web documents more frequently.

In the near future, the idea would be to automate

these steps to achieve an automatic validation of

ontology statement before its reuse. However, up to

now, we have only evaluated the proposed approach

manually, as explained in section 5.1.

5.1 Example of Use

In this section, our aim is to describe how to proceed

with a manual validation of ontology statements

with LSPs. We will concentrate on those ontology

statements that incorrectly express the SubclassOf

and the Part-Whole relations, since these relations

account for some of the most common errors in

ontologies, as already outlined in section 2.

Let us take as example the erroneous ontology

statement chapter_subclass_of_book. This is a real

example of an ontology statement accessed through

Watson from a subset of the AKT Reference

Ontology (Portal Ontology) written in OWL LITE

(http://www.csd.abdn.ac.uk/~cmckenzi/playpen/rdf/

akt_ontology_LITE.owl).

Step 1: LSPs Instantiation. The terms or labels that

name the concepts “chapter” and “book” will now

be used to instantiate the different LSPs in which the

SubclassOf relation and the Part-Whole relation are

involved, i.e., the LSPs described in section 3. Let us

illustrate the instantiation process with LSP 1 in

Table 2 corresponding to the SubclassOf relation:

USING LINGUISTIC PATTERNS TO ENHANCE ONTOLOGY DEVELOPMENT

211

[(NP<subclass>,)* and] NP<subclass> be [CN]

NP<superclass>

1. The first action is to obtain the plural forms of the

terms in question: chapters and books.

2. Optional elements are removed. This is the case of

[CN] in the pattern taken as example. CN stands for

“Class Name: Generic names for semantic roles

usually accompanied by a preposition, such as

(sub)class, type, or category” (as detailed in the

Appendix). By keeping this optional element, the

LSP in our example could have the following

instantiations: chapters are books, chapters are

types of books, chapters are subclasses of books, etc.

3. Semantic roles are maintained. In the present case,

chapter_subclass_of_book, “chapter” plays the role

of subclass, and “book” the role of superclass. These

roles should be maintained in the instantiation of

LSPs for the SubclassOf relation. In the LSPs for the

Part-Whole relation, “chapter” would be considered

playing the role of “part”, and “book” the role of

“whole”.

Step 2: Web Search. Once we have instantiated all

LSPs in which the SubclassOf relation and Part-

Whole relation are involved (see Table 6), we

proceed to manually include the resulting NL

sentences into the Google search engine. The

Google configuration options used in this step were:

 It should only find those pages that contained

the exact wording or phrase.

 It should only look in pages in English.

 It could access files in any format.

The first configuration option is the most important,

because in this way Google only counts the number

of appearances in which web documents exactly

contain the resulting NL sentences.

This step would result in numerical data about how

many times sentences appear in Google. It is worth

mentioning that Google does not give information

about the total amount of indexed pages it accesses

during the search (e.g., for the instantiation

“chapters are books” Google returned 7 matches,

whereas for “books contain chapters”, it returned

547 matches).

Step 3: Validation. According to the results, the

relation to be hold between the ontology concepts in

the ontology statements is confirmed to be valid, or

proved to be false. We are currently working on the

establishment of statistical parameters to

automatically validate ontology statements.

Table 6: Ontology statement instantiation example.

LSP-SC-EN / LSP-SC-Di-EC-EN

1 “chapters are books”

2 “chapters are grouped into books”

3 “chapters fall into books”

4 “chapters belong to books”

5 “examples of books are chapters”

6 “books are classified into chapters”

LSP-SC-PW-EN

7 “books include chapters”

8 “books consist of chapters”

9 “books are divided into chapters”

LSP-PW-EN / LSP-OD-DP-PW-EN

10 “chapters are contained in books”

11 “chapters compose books”

12 “chapters make up books”

13 “chapters are part of books”

14 “books are made up of chapters”

15 “books contain chapters”

16 “books have chapters”

6 CONCLUSIONS

The main purpose of the work presented here was to

show the suitability of Lexico-Syntactic patterns

(LSPs) in the development of ontologies,

particularly, in the reuse of Ontology Design

Patterns (ODPs), and for the validation of ontology

statements. The initial hypothesis was that some

linguistic constructs reliably convey ontological

relations. By establishing correspondences between

LSPs and ODPs in the LSPs repository, we provide

a method that enables novice users the formulation

of modelling aspects in NL, which are then

transformed into ODPs. Additionally, the LSPs

repository can be employed in the quality

assessment of ontology statements, since the

information captured in the statement has its

correspondence to expressions in NL collected in the

repository. Our proposal has been detailed and

illustrated in two examples of use. The examples

already point out some of the future lines of

research, such as the establishment of statistical

parameters for analyzing “ontology statements

validation results” (Step 3). Additionally,

experiments are also needed to test the proposed

method for the reuse of ODPs, and to further

improve the LSPs repository.

KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development

212

REFERENCES

Aguado de Cea, G., and Montiel-Ponsoda, E. 2009. When

Phraseology and Ontologies Meet. In Proccedings of

the XI Simposio Internacional de Comunicación

Social. Santiago de Cuba, Cuba.

Aguado De Cea, G., Gómez-Pérez, A., Montiel-Ponsoda,

E., and Suárez-Figueroa, M.C. 2008. Natural

Language-Based Approach for Helping in the Reuse

of Ontology Design Patterns. In Proceedings of the

16th International Conference on Knowledge

Engineering (EKAW), pp. 32-47.

Aussenac-Gilles, N., and Jacques, M.P. 2006. Designing

and Evaluating Patterns for Ontology Enrichment

from Texts. In Staab, S. and Svatek, V. (ed.)

Managing Knowledge in a World of Networks, 15th

International EKAW Conference, Springer Verlag.

Berland, M., and Charniak, E. 1999. Finding parts in very

large corpora. In Proceedings of the 37th Annual

Meeting of the Association for Computational

Linguistics, College Park, MD.

Cimiano, P. 2006. Ontology Learning and Population from

Text. Algorithms, Evaluation and Applications.

Springer Verlag.

Cimaino, P. and Wenderoth, J. 2007. Automatic

Acquisition of Ranked Qualia Structures from the

Web. In Proceedings of the Annual Meeting of the

Association for Computational Linguistics (ACL), pp.

888-895.

Dzbor, M., Suárez-Figueroa, M.C., Blomqvist, E., Lewen,

H., Espinoza, M., Gómez-Pérez, A., and Palma, R.

2009. NeOn Deliverable D5.6.2 Experimentation and

Evaluation of the NeOn Methodology.

Etzioni, O., Cafarella, M., Downey, D., Popescu, A.M.,

Shaked, T., Soderland, S., Weld, D., and Yates, A.

2004. Methods for domain-independent information

extraction from the web: An experimental comparison.

In Proceedings of the 19th National Conference on

Artificial Intelligence (AAAI), pp. 391-398.

Fliedl, G., Kop Ch., and Vöhringer, J. 2007. From OWL

Class and Property Labels to Human Understandable

Natural Language. In Natural Language Processing

and Information Systems, Volume 4592/2007,

Springer Berlin / Heidelberg.

Gangemi, A. 2005. Ontology Design Patterns for Semantic

Web Content. In: Musen et al. (eds.): Proc. of the 4

ISWC, Springer, Heidelber.

Hearst, M. 1992. Automatic acquisition of hyponyms from

large text corpora. In Proceedings of the 14

International Conference on Computational

Linguistics (COLING), pp. 539-545.

Kavalec, M. and Svátek, V.2005. A study on automated

relation labelling in ontology learning. In Buitelaar,

P., Cimiano, P., Magnini, B.(eds.) Ontology Learning

from Text: Methods, Evaluation and Applications. IOS

Press, pp. 44-58.

Menzies, T. 1997. Object-oriented patterns: Lessons from

expert systems. Software-Practice and Experience,

1(1).

Pasca, M. 2005. Finding Instance Names and Alternative

Glosses on the Web: WordNet Reloaded. In

Computational Linguistics and Intelligent Text

Processing, Springer Berlin/Heidelberg, pp. 280-292.

Reinberger, M.L., and Spyns, P. 2004. Discovering

knowledge in texts for the learning of dogma-inspired

ontologies. In Proceedings of the ECAI Workshop on

Ontology Learning and Population, pp. 19-24.

Sabou, M., Aguado de Cea, G., d’Aquin, M., Daga, E.,

Lewen, H., Montiel-Ponsoda, E., Presutti, V. and

Suárez-Figueroa, M.C. 2009. NeOn D2.2.3.

Evaluation and Selection of Knowledge Components.

Sabou, M., d’Aquin, M., and Motta, E. 2008. Exploring

the Semantic Web as Background Knowledge for

Ontology Matching. Journal on Data Semantics, XI.

Sánchez, D., and Moreno, A. 2008. Learning non-

taxonomic relationships from web documents for

domain ontology construction. In Data Knowledge

Engineering, 64, pp. 600-623.

Suárez-Figueroa, M.C., Brockmans, S., Gangemi, A.,

Gómez-Pérez, A., Lehmann, J., Lewen, H., Presutti, V.

and Sabou, M. 2007. NeOn D5.1.1. NeOn Modelling

Components.

Suárez-Figueroa, M.C. et al. 2008. NeOn D5.4.1. NeOn

Methodology for Building Contextualized Ontology

Networks.

APPENDIX

SYMBOLS & ABBREVIATIONS in LSPs

CATV

Verbs of Classification. Set of verbs of classification plus

the preposition that normally follows them. Some of the

most representative verbs in this group are: classify

in/into, categorize in/into, subclassify in/into,

subcategorize in/into.

Cardinal Number.

Class Name. Generic names for semantic roles usually

accompanied by preposition, such as class, type, category

COMP

Verbs of Composition. Set of verbs meaning that

something is made up of different parts. Some of the

most representative ones are: contain, hold, consist of,

compose of, make up of, form of/by, constitute of/by.

NP<…>

Noun Phrase. It is defined as a phrase whose head is a

noun or a pronoun, optionally accompanied by a set of

modifiers, and that functions as the subject or object of a

verb. NP is followed by the semantic role played by the

concept it represents in the conceptual relation in

question in <…>, e.g., class, subclass.

PARA

Paralinguistic symbols like colon, or more complex

structures as as follows, etc.

QUAN

Quantifiers such as all, some, most, many, several, every.

( )

Parentheses group two or more elements.

Asterisk indicates repetition.

[ ]

Elements in brackets are meant to be optional, which

means that they can be present either at that stage of the

sentence or not, and by default of appearance, the pattern

remains unmodified.

USING LINGUISTIC PATTERNS TO ENHANCE ONTOLOGY DEVELOPMENT

213