Scalable Prediction of Atomic Candidate OWL Class Axioms Using a
Vector-Space Dimension Reduced Approach
Ali Ballout 1, Célia da Costa Pereira 1 and Andrea G. B. Tettamanzi 2
1 Université Côte d'Azur, I3S, Inria, Sophia Antipolis, France
2 Université Côte d'Azur, I3S, CNRS, Sophia Antipolis, France
Keywords:
Ontology Learning, OWL Axioms, Concept Similarity, Vector-Space Modeling.
Abstract:
Scoring candidate axioms or assessing their acceptability against known evidence is essential for automated
schema induction and can also be valuable for knowledge graph validation. However, traditional methods for
accurately scoring candidate axioms are often computationally and storage expensive, making them impracti-
cal for use with large knowledge graphs. In this work, we propose a scalable method to predict the scores of
atomic candidate OWL class axioms of different types. The method relies on a semantic similarity measure
derived from the ontological distance between concepts in a subsumption hierarchy, as well as feature ranking
and selection for vector-space dimension reduction. We train a machine learning model using our reduced
vector-space, encode new candidates as a vector, and predict their scores. Extensive tests that cover a range of
ontologies of various sizes and multiple parameters and settings are carried out to investigate the effectiveness
and scalability of the method.
1 INTRODUCTION AND
MOTIVATION
Ontologies play a critical role in artificial intelli-
gence (AI) systems by providing structured and for-
mal representations of knowledge in a specific do-
main (Chandrasekaran et al., 1999). In the seman-
tic Web, ontologies can be expressed using the Web
Ontology Language (OWL) (OWL Working Group,
2012). These ontologies consist of classes that repre-
sent concepts in the domain and relationships that de-
fine how they are related (Khadir et al., 2021). They
also include a set of axioms that provide a logical ba-
sis for reasoning about the domain and making infer-
ences based on the knowledge represented in the on-
tology. Class axioms, in particular, are important for
defining the properties and characteristics of classes
in the ontology. For example, a class axiom might
specify that a certain class is a subclass of another
class, or that it has certain attributes or relationships
with other classes. This structured knowledge rep-
resentation allows AI systems to reason about the
domain and make predictions or recommendations
based on that knowledge.
However, creating ontologies can be a time-
consuming and error-prone process, particularly for
large and complex domains. This challenge is known
as the knowledge acquisition bottleneck (Cullen and
Bryman, 1988). As a solution, the field of ontology learning (Maedche and Staab, 2004) has emerged; it is the process of automatically constructing an ontology from a given set of data (Lehmann and Völker, 2014). This process involves identifying the
classes and relationships that exist within the data and
encoding this knowledge in a structured and formal
representation. Linguistic and statistical approaches have traditionally been used for ontology learning, and machine learning techniques are often combined with them to complement and improve their results (Khadir et al., 2021).
Candidate axiom scoring involves evaluating the
suitability of a candidate axiom based on the available
evidence from known facts or data. This task is cru-
cial for automated schema or ontology induction and
can also aid in ontology and knowledge graph valida-
tion. Essentially, candidate axiom scoring is a tech-
nique used in ontology learning to assess the quality
of candidate axioms. Its significance lies in its abil-
ity to identify the most credible axioms that can be
incorporated into an ontology (Ballout et al., 2022b).
Nevertheless, machine learning techniques that tackle the task of candidate axiom scoring face a scalability problem when dealing with large ontolo-
gies and the number of facts and entities they in-
clude (Nickel et al., 2012). This is because the pro-
cess can be intensive in terms of storage and com-
putation, particularly for large and complex datasets.
For ontology learning, this would require techniques
to step away from instance data when possible and
rely more on what has already been established in an
ontology's structure. As a result, scalable techniques and models able to address ontologies of different sizes while maintaining satisfactory performance, without incurring excessive computational and storage costs, become a necessity.
In this paper, we present the issue of dealing with
large ontologies and its effect in terms of storage and
computation when attempting to score candidate class
axioms. We highlight the importance of the chal-
lenge at hand by experimenting with a state-of-the-
art (SOTA) approach using large-size ontologies. In
addition, we propose an approach that scores atomic
candidate OWL class axioms of different types for on-
tologies of different sizes. We do so by utilizing an ontological semantic similarity (Corby et al., 2006) between concepts and extending it to axioms, and by incorporating a feature selection technique on our dataset to pick the most impactful axioms to act as our dimensions in an axiom-based vector space. We encode candidate axioms into this vector space without the need for any instance data. We experiment using DBpedia, the Gene Ontology (GO), and the Cell Ontology (CL) to test
the scalability of the approach, as well as the effect
of feature selection on performance, storage cost and
computation time.
This work is structured as follows: in Sect. 2 we
give an overview of some related work and SOTA ap-
proaches; Sect. 3 provides some background about
ontological semantic similarity, the possibilistic ax-
iom scorer and feature selection. Sect. 4 lays out the methodology, explaining how the axioms are extracted and scored, how we build the seman-
tic measure, and also how we model an axiom based
vector-space leading to the prediction of a candidate
axiom’s score. We detail our experiments in Sect. 5
then present and analyze the results in Sect. 6. We
end the paper with some notes and conclusions.
2 RELATED WORK
Since the current study aims at developing a novel ap-
proach to scalable prediction of candidate class axiom
scores, it is relevant to provide an overview of previ-
ous research on the topic of predicting the score of
candidate OWL class axioms.
Some of the works addressing the challenge of
predicting the fitness or score of candidate OWL
class axioms build on an idea presented in (Ballout
et al., 2022a), where a number of truth-labeled for-
mulas (in this case axioms) are modeled into a vector-
space with a semantic similarity measure used as the
weights, then a model is trained using these formulas
to enable it to predict a label/score for new candidate
formulas. This approach proves to be accurate even
when the number of available formulas with known
scores/labels is sparse. This kind of approach has
been validated in works related to ontology learning,
such as (Hassanpour et al., 2014), where the authors
deal with semantic web rule language (SWRL) rules.
One such method that applies the above approach
to OWL class axioms is (Malchiodi and Tettamanzi,
2018). It uses a similarity measure based on instance
data counting, which is reminiscent of the Jaccard in-
dex. It shows the ability of such a technique to ap-
proximate a score for atomic candidate OWL sub-
sumption axioms, but much remains to be improved
in terms of performance and accuracy. The same
similarity is used in (Malchiodi et al., 2020), which
uses methods such as principal component analysis (PCA) to map axioms into a lower-dimension space. This form of dimensionality reduction, which is unrelated to ours in methodology or goal,¹ combined with instance-based similarity measures, resulted in less
than satisfactory performance. Indeed, the authors ex-
pected to see a clear separation between accepted and
rejected axioms, which would have made it possible
for unsupervised methods to perform the task of label-
ing candidate axioms, but their results did not support
this hypothesis.
However, an instance-based similarity measure
fully relies on an ontology’s instance data, and any
lack of such data results in ignorance, while an exces-
sive amount of data overwhelms the method. Some
follow a different path, such as (Chen et al., 2022),
which takes the embedding approach utilizing an on-
tology’s class subClassOf hierarchy, also known as
the is-a hierarchy, to predict subsumers. This ap-
proach, like the ones we mentioned before, only ad-
dresses subsumption. It also proves to be compu-
tationally complex as it follows a breadth-first algo-
rithm when embedding a subsumption relation. This
algorithm keeps extracting the subsumers of each of the classes until reaching a leaf, or the superclasses until reaching the root, in order to generate a sentence. The
authors mention limiting the length of their sentences
for the evaluation, highlighting a trade-off between
¹ PCA is a technique used to map data into lower-dimensional spaces, whereas feature selection ranks your dimensions in terms of how useful they are for predicting the target value and allows you to drop the low-ranking dimensions.
having complete sentence context and redundancy.
This is answered by (Ballout et al., 2022b), which
addresses the problem of instance data dependency
and provides an alternative axiom-based vector-space
modeling technique. This technique uses a seman-
tic similarity based on the subsumption hierarchy of
an ontology. This work, like ours, extends its scope to include disjointWith class axioms as well. It high-
lights a reduction in error rate and computation time
compared to other works. In terms of performance,
it outperforms (Malchiodi et al., 2020) without any
attempt to reduce dimensionality. The authors only
experiment using DBpedia, which includes 762 con-
cepts, positioning it as a smaller ontology when com-
pared with ontologies with tens of thousands of con-
cepts such as GO and CL.
In general, previous research on the prediction of
candidate class axiom scores has provided valuable
insights into this problem and has laid the founda-
tion for the current study. Our work aims to ad-
dress the shortcomings of the mentioned works, espe-
cially with respect to computational complexity, stor-
age cost, and scalability, by developing a novel ap-
proach for scalable prediction of atomic candidate
OWL class axiom scores. We do this by incorpo-
rating feature selection and utilizing the ontological
concept semantic similarity extended to axioms de-
tailed in (Ballout et al., 2022b). Our approach, like all the ones mentioned, addresses atomic OWL class ax-
ioms, which are axioms containing one concept on
each side and, like (Ballout et al., 2022b), it can deal
with subsumption, disjointness and equivalence. In
this paper, we will compare our work with (Ballout
et al., 2022b).
3 BACKGROUND
3.1 Ontological Axiom Semantic
Similarity
To achieve scalability, one of the challenges we want
to overcome is the reliance on instance data, which
can be overwhelming when dealing with larger on-
tologies. For this reason we seek a similarity measure
that is independent of instance data. In this case it is
a measure dependent on the subsumption hierarchy.
In our work, we use a semantic similarity measure
between axioms developed in (Ballout et al., 2022b).
Here we summarize how that measure is calculated
between the axioms and how it is extended from an
ontological concept similarity measure.
The concept similarity measure acting as a foundation for the axiom similarity is detailed in (Corby et al., 2006), under the subsection titled Ontological Approximation. It is a distance calculated between two concepts using the subsumption path length, with the general definition:
\[
\forall (t_1, t_2) \in H^2,\quad
D_H(t_1, t_2) = \min_{t} \big( l_H(t_1, t) + l_H(t_2, t) \big)
= \min_{t} \left( \sum_{\{x \in \langle t_1, t \rangle,\, x \neq t_1\}} \frac{1}{2^{d_H(x)}}
+ \sum_{\{x \in \langle t_2, t \rangle,\, x \neq t_2\}} \frac{1}{2^{d_H(x)}} \right)
\tag{1}
\]
Formula (1) translates to: for all type pairs t_1 and t_2 in an inheritance hierarchy H, the ontological distance between t_1 and t_2 in H is the minimum of the sum of the lengths of the subsumption paths between each of them and a common super type. The length of the subsumption path between a type t_1 and its direct supertype t is equal to 1/2^{d_H(t)}, with d_H(t) being the depth of t in H.
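To make the measure concrete, the following toy Python sketch evaluates Formula (1) for a small, tree-shaped hierarchy in which every type has a single parent. It is only an illustration of the definition under that simplifying assumption; in our pipeline the measure is obtained directly from Corese (see Sect. 4.2), and the hierarchy and function names here are hypothetical.

def depth(h, t):
    """Depth d_H(t) of type t; h maps each type to its parent (root maps to None)."""
    d = 0
    while h[t] is not None:
        t = h[t]
        d += 1
    return d

def path_length(h, t, ancestor):
    """l_H(t, ancestor): sum of 1/2^d_H(x) over the types above t, up to the ancestor."""
    length = 0.0
    while t != ancestor:
        t = h[t]
        length += 0.5 ** depth(h, t)
    return length

def ancestors(h, t):
    """t together with all of its supertypes, up to the root."""
    result = [t]
    while h[t] is not None:
        t = h[t]
        result.append(t)
    return result

def ontological_distance(h, t1, t2):
    """D_H(t1, t2): minimum combined path length to a common supertype (Formula 1)."""
    common = set(ancestors(h, t1)) & set(ancestors(h, t2))
    return min(path_length(h, t1, c) + path_length(h, t2, c) for c in common)

# Example hierarchy: Animal is the root, Mammal and Bird are its subclasses, and so on.
h = {"Animal": None, "Mammal": "Animal", "Bird": "Animal", "Dog": "Mammal", "Cat": "Mammal"}
print(ontological_distance(h, "Dog", "Cat"))   # 1/2 + 1/2 = 1.0, via Mammal
print(ontological_distance(h, "Dog", "Bird"))  # (1/2 + 1) + 1 = 2.5, via Animal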
The authors of (Ballout et al., 2022b) then extend
this measure to axioms by performing the following
steps:
1. Extract the distances between all concepts in the ontology and store them in a concept similarity matrix (Matrix 1a).
2. Compare each axiom with all other axioms in the
dataset.
3. When comparing two axioms, retrieve from the
concept similarity matrix the similarity/distance
between the concepts on the left side of the ax-
iom.
4. Repeat the previous step for the right side.
5. In case of symmetric axiom types (disjointness/e-
quivalence) repeat the comparison between the
left concept from the first axiom and the right con-
cept of the second axiom, and then between the
right concept from the first axiom and the left con-
cept from the second one. Keep the higher values
between both comparisons.
6. Take the average of the two values that you have
as a result of the previous step.
7. Store that value in an axiom similarity matrix (Matrix 1b).
This process is used for the construction of an axiom-
based vector-space, where each axiom or candidate
can be represented by a vector of its similarity to all
other axioms.
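A minimal Python sketch of steps 2-7, assuming the concept similarities from step 1 are stored in a dictionary keyed by concept pairs (with both orderings present); the function and variable names are illustrative, not the exact implementation of (Ballout et al., 2022b).

# Sketch: extending a concept similarity matrix to an axiom similarity matrix.
# Each atomic axiom is represented as a (left_concept, right_concept) pair and
# concept_sim[(c1, c2)] holds the similarity between two concepts (step 1).

def axiom_similarity(ax_a, ax_b, concept_sim, symmetric=False):
    """Similarity between two atomic axioms (steps 3-6)."""
    left = concept_sim[(ax_a[0], ax_b[0])]     # left side vs. left side
    right = concept_sim[(ax_a[1], ax_b[1])]    # right side vs. right side
    if symmetric:  # disjointness/equivalence: also compare the crossed pairs
        left = max(left, concept_sim[(ax_a[0], ax_b[1])])
        right = max(right, concept_sim[(ax_a[1], ax_b[0])])
    return (left + right) / 2.0                # step 6: average the two values

def axiom_similarity_matrix(axioms, concept_sim, symmetric=False):
    """Step 7: pairwise axiom similarity matrix, with 1s on the diagonal."""
    m = len(axioms)
    matrix = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            s = axiom_similarity(axioms[i], axioms[j], concept_sim, symmetric)
            matrix[i][j] = matrix[j][i] = s
    return matrix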
3.2 Axiom Scoring via Possibility
Theory
Many of the works mentioned in Sect. 2, like (Ballout et al., 2022b; Malchiodi et al., 2020; Malchiodi and Tettamanzi, 2018), use a dataset of subClassOf
axioms scored by a possibilistic heuristic (Tettamanzi
et al., 2017). This heuristic uses possibility the-
ory (Dubois and Prade, 1988), a mathematical theory
of epistemic uncertainty, whose central notion is that
of a possibility distribution that assigns to each ele-
mentary event a degree of possibility ranging from 0
(impossible, excluded) to 1 (completely possible, nor-
mal). A possibility distribution π induces a possibility measure Π, corresponding to the greatest of the possibilities associated with an event, and the dual necessity measure N, equivalent to the impossibility of the negation of an event.
Since we compare our work with (Ballout et al.,
2022b) we consider the same scorer for the DBpedia
subClassOf dataset. Here, we provide a brief expla-
nation of the theory behind the scoring.
If we denote by u_φ the support of φ, which is the cardinality of its content, by u⁺_φ the number of confirmations of φ, and by u⁻_φ the number of counterexamples of φ, the possibility and the necessity of candidate axiom φ may be defined as follows: if u_φ > 0,

\[
\Pi(\varphi) = 1 - \sqrt{1 - \left( \frac{u_\varphi - u^-_\varphi}{u_\varphi} \right)^2};\qquad
N(\varphi) =
\begin{cases}
\sqrt{1 - \left( \dfrac{u_\varphi - u^+_\varphi}{u_\varphi} \right)^2}, & \text{if } u^-_\varphi = 0,\\
0, & \text{if } u^-_\varphi > 0;
\end{cases}
\tag{2}
\]

if u_φ = 0, Π(φ) = 1 and N(φ) = 0: we are in a state of maximum ignorance, given that no evidence is available in the RDF dataset to assess the credibility of φ.
The possibility and necessity of an axiom can then
be combined into a single handy acceptance/rejection
index.
\[
\mathrm{ARI}(\varphi) = N(\varphi) + \Pi(\varphi) - 1 = N(\varphi) - N(\neg\varphi)
= \Pi(\varphi) - \Pi(\neg\varphi) \in [-1, 1],
\tag{3}
\]

because N(φ) = 1 - Π(¬φ) and Π(φ) = 1 - N(¬φ)
(duality of possibility and necessity). A negative
ARI(φ) suggests rejection of φ (Π(φ) < 1), whilst a
positive ARI(φ) suggests acceptance (N(φ) > 0), with
a strength proportional to its absolute value. A value
close to zero reflects ignorance.
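For illustration, Formulas (2) and (3) are straightforward to evaluate once the support and the numbers of confirmations and counterexamples are known; the sketch below only covers that arithmetic, not the expensive evidence-counting step of the heuristic of (Tettamanzi et al., 2017).

from math import sqrt

def possibility(u, u_minus):
    """Pi(phi) from Formula (2): u is the support, u_minus the counterexamples."""
    if u == 0:
        return 1.0                                  # no evidence: maximum ignorance
    return 1.0 - sqrt(1.0 - ((u - u_minus) / u) ** 2)

def necessity(u, u_plus, u_minus):
    """N(phi) from Formula (2): u_plus is the number of confirmations."""
    if u == 0 or u_minus > 0:
        return 0.0
    return sqrt(1.0 - ((u - u_plus) / u) ** 2)

def ari(u, u_plus, u_minus):
    """Acceptance/rejection index, Formula (3), in [-1, 1]."""
    return necessity(u, u_plus, u_minus) + possibility(u, u_minus) - 1.0

print(ari(100, 100, 0))   # fully confirmed axiom: ARI = 1
print(ari(100, 0, 100))   # fully contradicted axiom: ARI = -1
print(ari(0, 0, 0))       # no evidence at all: ARI = 0 (ignorance)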
Given a candidate axiom φ, expressing a hypothesis about the relations holding among some entities of a domain, we wish to evaluate its credibility, in terms of possibility and necessity, based on the evidence² available in the form of a set of facts contained in an RDF dataset.
² Instance data in the RDF dataset that either confirms or contradicts a candidate axiom.
Matrix 1: Structure of the concept similarity and axiom similarity matrices. (a) Concept similarity matrix: rows and columns indexed by the concepts C_0, ..., C_n, with 1s on the diagonal and entry S_{i,j} holding the similarity between C_i and C_j. (b) Axiom similarity matrix: the analogous structure over the axioms A_0, ..., A_m.
3.3 Feature Ranking and Selection
In machine learning, feature selection refers to the process of obtaining a subset of an original feature set according to a certain selection criterion, keeping the relevant features of the dataset. It plays a role in compressing the data processing scale, as redundant and irrelevant features are removed (Cai et al., 2018). It is particularly
useful in the case of high-dimensional datasets. It
does not involve dimension aggregation, nor attempts
to map higher-dimensional spaces to lower ones as
done in (Malchiodi et al., 2020).
According to their relationship with learning
methods, feature selection methods can be classified
into filter, wrapper, and embedded models (Cai et al.,
2018). In our work we use the filter model, which has
a lesser computational cost than the others (Cai et al.,
2018). A good feature selection method should have
high learning accuracy but less computational over-
head (time complexity and space complexity).
Filter feature selection methods typically utilize
evaluation criteria to increase the correlation between
the feature and the class label and decrease the cor-
relation among features. In addition, the correlation
among features is often replaced by either redundancy
or diversity (distance). These measures of relevance,
redundancy, and diversity may be identical or dis-
tinct. Filter methods involve selecting features based
on their individual statistical properties. This can be
done using techniques such as correlation analysis or
mutual information, which measure the strength of
the relationship between a given feature and the tar-
get variable. In the end, features are ranked based on their effect on the class label, and then a percentage or number of them can be kept while the rest are discarded.
For example, the mutual information gain (also
known as mutual information or MI) between two
variables x and y can be calculated as follows:
\[
\mathrm{MI}(x, y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}
\tag{4}
\]
where p(x) and p(y) are the marginal probability dis-
tributions of x and y, respectively, and p(x,y) is the
joint probability distribution of x and y. Mutual infor-
mation measures the amount of mutual dependence
between the two variables, with higher values indicat-
ing a stronger relationship. It is often used in feature
selection to identify the most relevant features for a
given task.
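To make Formula (4) concrete, the following toy sketch estimates it for two discrete variables, taking the probabilities as empirical frequencies and using the natural logarithm (so the result is in nats); it is only an illustration of the definition, not the estimator used in our experiments.

import numpy as np

def mutual_information(x, y):
    """MI(x, y) of Formula (4) for discrete samples, using empirical probabilities."""
    mi = 0.0
    for vx in np.unique(x):
        for vy in np.unique(y):
            p_xy = np.mean((x == vx) & (y == vy))           # joint p(x, y)
            p_x, p_y = np.mean(x == vx), np.mean(y == vy)   # marginals p(x), p(y)
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

# y is a noisy copy of x, so the estimated MI is well above zero;
# two independent variables would give a value close to zero.
x = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y = np.array([0, 1, 0, 1, 0, 1, 1, 1])
print(mutual_information(x, y))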
4 METHODOLOGY
Our objective is to develop a scalable method to pre-
dict a score for atomic candidate OWL class axioms
by learning from a set of previously scored axioms of
the same type. To this aim, we exploit the hierarchy of
concepts formed by the subsumption rdfs:subClassOf
axioms, combined with feature selection. A separate
model is required for each type of axiom addressed.
Following are the steps of our methodology:
1. Axiom Extraction and Scoring: This step de-
scribes the creation of the set of scored axioms
of a certain type to be learned. One approach
would be to use a scorer to label a set of generated candidate axioms to learn, as done in (Ballout et al., 2022b; Malchiodi et al., 2020; Malchiodi and Tettamanzi, 2018). Another approach is to query existing axioms and label them as accepted, then generate some random axioms, check that they are not explicitly available in or entailed by the ontology, and label them as rejected, as done in (Chen et al., 2022).
2. Axiom Similarity Calculation: This step de-
tails how we extract the concepts used in our set
of axioms and retrieve the ontological distances
pertaining to these concepts only. Unlike the
SOTA (Ballout et al., 2022b), we only query con-
cepts present in our set of axioms instead of all
concepts present in the ontology. We then calcu-
late the axiom similarity measure using the algo-
rithm detailed in (Ballout et al., 2022b) and briefly
explained in Sect. 3.1, but we enhance perfor-
mance by leveraging the power of multiprocess-
ing.
3. Axiom-Based Vector-Space Modeling: This
step focuses on using the axiom similarity mea-
sures as weights; each axiom can be represented
as a vector in an axiom-based vector-space.
4. Vector-Space Dimensionality Reduction: This
step consists of applying feature selection on the
axiom similarity matrix to reduce the number of
axioms being used as dimensions to a certain pre-
defined number. This results in reduced compu-
tation and processing when encoding axioms into
the vector-space as well as smaller concept and
axiom similarity matrices. Unlike (Ballout et al.,
2022b), which does not acknowledge the chal-
lenge of dealing with a large or rich ontology, we
added this step to ensure that the method does not
time out or run out of storage no matter what the
size of the ontology is.
5. Candidate Axiom Encoding: This step describes
how a new candidate axiom is introduced into
the vector-space, including the case where the
axiom consists of concepts not available in the
concept similarity matrix. The SOTA (Ballout
et al., 2022b) does not include such a step since
it naively queries all available concepts. On the
other hand, since we added a step that limits our
method to query only concepts found in the set of
axioms used, we had to add this new step to query
any new concept that might be introduced by a
candidate axiom.
6. Prediction: This step is dedicated to training a
machine learning model with the dataset (vector-
space model and scores) and predicting the scores
of new candidate axioms.
We begin by preparing the set of scored axioms
that we want to use to train our model. Then we ex-
tract the concepts which our axiom set consists of. Af-
ter that we query the ontology to retrieve the ontologi-
cal distances between only the concepts we extracted.
We then calculate the axiom similarity between our
axioms and use it to model our axiom-based vector-
space. We then utilize feature selection to reduce the
size of our vector-space by reducing the number of
axioms acting as dimensions to those that are most
impactful, which leads to a reduction in the size of the
concept and axiom similarity matrices. We train a ma-
chine learning model using our new reduced vector-
space, encode new candidate axioms as a vector, and
predict their scores.
4.1 Axiom Extraction and Scoring
We adopt two approaches to create our set of ax-
ioms. The first approach is discussed in (Ballout et al.,
2022b; Chen et al., 2022). It dictates that we first
query an ontology for existing axioms of a certain
type and by that we obtain a set of axioms which
would have positive scores. Following that, we gen-
erate rejected axioms. We do this by constructing an axiom with a pair of random concepts; the axiom is of the form subClassOf/disjointWith(C1, C2), with C1 ≠ C2. We then check if the axiom exists in or is
entailed by the ontology; if so, the generated axiom
is discarded, otherwise it is kept. The resulting gen-
erated set of axioms will have a negative score. The
rationale is that a randomly generated axiom can be
expected to be false with a very high probability. In
our work, we add a limit to the number of axioms be-
ing selected and generated, while in the SOTA (Ball-
out et al., 2022b) every possible combination is gen-
erated: this is a critical point when dealing with large
ontologies.
Query 1 is used to extract a given number of ex-
isting axioms and generate a given number of random
ones, followed by removing the existing from the gen-
erated. We ignore the blank nodes as well as instances
where both classes are the same. This produces a bal-
anced set of positive and negative axioms. The server
used is Corese (Corby et al., 2004), which applies reasoning to check for entailed axioms.
select DISTINCT ?class1 ?class2 ?label where {
  {select ?random ?class1 ?class2 ?label where {
    ?class1 a owl:Class .
    ?class2 a owl:Class .
    ?class1 rdfs:subClassOf ?class2 .
    filter (!isBlank(?class1) && !isBlank(?class2) && (?class1 != ?class2))
    bind(1.0 as ?label)
    BIND(RAND() AS ?random) .
  } ORDER BY ?random
  limit 500}
  UNION
  {select ?random ?class1 ?class2 ?label where {
    {?class1 a owl:Class .
     ?class2 a owl:Class .
     filter (!isBlank(?class1) && !isBlank(?class2) && (?class1 != ?class2))
     bind(0 as ?label)
     BIND(RAND() AS ?random) .}
    minus
    {?class1 a owl:Class .
     ?class2 a owl:Class .
     ?class1 rdfs:subClassOf ?class2 .
     filter (!isBlank(?class1) && !isBlank(?class2))
     bind(0 as ?label)
     BIND(RAND() AS ?random) .}
  } ORDER BY ?random
  limit 500}}
Query 1: Extraction of an rdfs:subClassOf axiom balanced
set with a size of 1000 axioms using random generation.
A second, more judicious approach adopted
in (Ballout et al., 2022b) is to only query existing
axioms. For example, if we want to train a model
to predict subClassOf axioms, we would query for
n subClassOf axioms and consider that as the set of
positive subClassOf axioms. We would then query n
disjointWith axioms and consider them as the set of
negative subClassOf axioms. If one query retrieves a
number of axioms lesser than the limit n, we can drop
the excess axioms from the other set to maintain bal-
ance. The SOTA (Ballout et al., 2022b) applies no limit and extracts all available axioms. We again apply a limit, as methods that implement the work of (Ballout et al., 2022a) perform well even with small datasets.
Query 2 shows our implementation.
SELECT ?class1 ?class2 ?label WHERE {
  ?class1 a owl:Class .
  ?class2 a owl:Class .
  ?class1 rdfs:subClassOf ?class2 .
  filter (!isBlank(?class1) && !isBlank(?class2) && (?class1 != ?class2))
  bind(1.0 as ?label)
  BIND(RAND() AS ?random) .} ORDER BY ?random
LIMIT 500

SELECT ?class1 ?class2 ?label WHERE {
  ?class1 a owl:Class .
  ?class2 a owl:Class .
  ?class1 owl:disjointWith ?class2 .
  filter (!isBlank(?class1) && !isBlank(?class2) && (?class1 != ?class2))
  bind(0 as ?label)
  BIND(RAND() AS ?random) .} ORDER BY ?random
LIMIT 500
Query 2: Extraction of an rdfs:subClassOf axiom balanced
set with a size of 1000 axioms using a negated axiom type.
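Both queries can be submitted over the standard SPARQL 1.1 protocol; the following minimal sketch uses Python's requests library, where the endpoint URL and the query file name are placeholders rather than part of our released code.

import requests

ENDPOINT = "http://localhost:8080/sparql"   # placeholder URL for a Corese SPARQL endpoint

def run_select(query):
    """POST a SELECT query and return its result bindings (SPARQL 1.1 protocol)."""
    response = requests.post(
        ENDPOINT,
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

# Example: collect (class1, class2, label) rows for one of the queries above,
# assuming it has been saved to a local file (the file name is hypothetical).
rows = run_select(open("extract_subclassof_axioms.rq").read())
axioms = [(r["class1"]["value"], r["class2"]["value"], float(r["label"]["value"]))
          for r in rows]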
The following step is to score the axioms. This is
done by inputting the extracted axioms into the pos-
sibilistic heuristic (Tettamanzi et al., 2017), and re-
ceiving an output file containing the scores (ARI). We
note that the process is very slow; (Malchiodi and Tet-
tamanzi, 2018) mentions that it took a little less than
a year to score 722 axioms, whence the need for a
method such as ours.
4.2 Axiom Similarity Matrix
Like in (Ballout et al., 2022b), the axiom similarity is
derived from the concept similarity. This means that
we first need to query Corese, where the ontological
distance metric is implemented as a function, to re-
trieve this similarity.
Query 3 provides our implementation of the re-
trieval of the concept similarity measure. In contrast
with (Ballout et al., 2022b) our approach takes advan-
tage of the ability to run multiple queries at a time
and considers only concepts found in our set of ax-
ioms instead of all concepts in the ontology. This re-
sults in an initial reduction of computational cost and
matrix size. We divide our set of concepts into k subsets, then query Corese with k queries, each asking for the distances between one of those subsets and the main set of concepts. This allows us to speed up the process of creating the concept similarity matrix (Matrix 1a) by a factor of k. It also drastically reduces the storage space needed for the matrices, by reducing the stored values to exactly what is used.
select * (kg:similarity(?class1, ?class2) as ?similarity) where {
  ?class1 a owl:Class .
  ?class2 a owl:Class .
  filter (!isBlank(?class1) && !isBlank(?class2) &&
          str(?class1) IN (subset) && str(?class2) IN (concepts))}
Query 3: Concept ontological distance retrieval.
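A sketch of the k-subset parallelization described above, using the Python multiprocessing package and reusing the hypothetical run_select helper from Sect. 4.1; the query template mirrors Query 3, and the way concept URIs are injected as quoted strings is an assumption of this sketch.

from multiprocessing import Pool

QUERY_TMPL = """select * (kg:similarity(?class1, ?class2) as ?similarity) where {{
  ?class1 a owl:Class .
  ?class2 a owl:Class .
  filter (!isBlank(?class1) && !isBlank(?class2) &&
          str(?class1) IN ({subset}) && str(?class2) IN ({concepts})) }}"""

def as_list(concepts):
    """Render a Python list of concept URIs as a SPARQL value list."""
    return ", ".join('"{}"'.format(c) for c in concepts)

def fetch_block(args):
    """Retrieve the similarities between one subset and the full concept set."""
    subset, concepts = args
    return run_select(QUERY_TMPL.format(subset=as_list(subset), concepts=as_list(concepts)))

def concept_similarities(concepts, k=8):
    """Split the concept set into k subsets and query them in parallel (Matrix 1a)."""
    subsets = [concepts[i::k] for i in range(k)]
    with Pool(processes=k) as pool:
        blocks = pool.map(fetch_block, [(s, concepts) for s in subsets])
    sim = {}
    for block in blocks:
        for r in block:
            sim[(r["class1"]["value"], r["class2"]["value"])] = float(r["similarity"]["value"])
    return sim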
Next, we use the algorithm explained in Sect. 3.1 and detailed in (Ballout et al., 2022b) (Algorithm 1 under the subsection 'Semantic Measure Construction and Assignment') to calculate the similarity between axioms. We end up with an m × m axiom similarity matrix (Matrix 1b). The diagonal of this matrix contains only 1s, as the similarity between an axiom A and itself is 1.
4.3 Axiom-Based Vector-Space Modeling
We model our vector space to encode axioms into
vectors. The initial number of dimensions d of this
vector space is equal to the number of axioms in our
scored axiom set. Each axiom can be represented as
a vector V in this d-dimensional space. Considering
the weights of our vectors are the similarities between
axioms, it would be intuitive to view our axiom sim-
ilarity matrix as a representation of our axiom-based
vector space.
4.4 Vector-Space Dimension Reduction
We now shift our focus to ranking and reducing the di-
mensions of our vector space. By doing so we achieve
the following:
A reduction in the error rate due to the reduction
in noise and redundancy.
A reduction in the size of our vector-space and
storage space for the axiom similarity matrix.
A reduction in the size of our concept similarity
matrix. The matrix will only include concepts
that constitute axioms acting as dimensions in our
vector-space.
A reduction in the computational complexity
when encoding new candidate axioms into the
vector-space. We will be comparing the new ax-
iom to a subset of the initial axioms acting as di-
mensions d.
A reduction in the look-up time when retrieving
the concept similarity value from the new smaller
concept similarity matrix.
A better dataset for our machine learning model
to train on with regards to redundancy.
To this aim, we consider our dimensions as fea-
tures and apply a supervised filter-type feature selec-
tion method such as mutual information. This works
by taking as input the axiom similarity matrix along
with the scores of the axioms and returning a ranking
of the dimensions from the most to the least impactful. We then keep a percentage of these dimensions
according to their ranks and discard the rest.
Our new axiom similarity matrix is of size m × z,
z being the number of dimensions selected from the
original dimensions d. This in turn affects the concept
similarity matrix, which will become of size n × f , f
being the number of concepts included in the selected
axioms acting as dimensions in our reduced vector
space. New candidate axioms will be encoded into the
vector space with the reduced number of dimensions.
This means lower processing complexity in terms of computation cost and storage cost, which are key to scalability.
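A minimal sketch of this reduction using scikit-learn's mutual-information estimator, assuming the axiom similarity matrix is available as a NumPy array asm of shape (m, m) together with the axiom scores; the synthetic data at the bottom only stands in for a real dataset.

import numpy as np
from sklearn.feature_selection import mutual_info_regression

def reduce_dimensions(asm, scores, keep_fraction=0.4, random_state=0):
    """Rank the axiom dimensions by MI with the scores and keep the top fraction."""
    mi = mutual_info_regression(asm, scores, random_state=random_state)
    z = max(1, int(keep_fraction * asm.shape[1]))
    kept = np.argsort(mi)[::-1][:z]     # indices of the z most impactful dimensions
    return asm[:, kept], kept           # reduced m x z matrix and the kept indices

# Toy usage with random data standing in for a real axiom similarity matrix.
rng = np.random.default_rng(0)
asm = rng.random((120, 120))
scores = rng.uniform(-1.0, 1.0, 120)
reduced_asm, kept_dims = reduce_dimensions(asm, scores)
print(reduced_asm.shape)   # (120, 48) with the default 40% kept

Only the concepts appearing in the kept dimension axioms then need to remain in the concept similarity matrix.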
4.5 Candidate Axiom Encoding
The candidate axiom encoding process includes two
cases. In the first case, the candidate axiom is made
up of two concepts already found in our concept sim-
ilarity matrix. If so, the candidate axiom goes straight
through the algorithm mentioned in Sect. 3.1, as we
did with the training set. The candidate then becomes
a new vector in our vector-space, ready for score pre-
diction.
The second case covers candidate axioms that
contain concepts not found in our concept similarity
matrix. Such candidates invoke a new similarity mea-
sure retrieval query 3. In this query the subset variable
includes the new concepts and the concepts variable
includes the concepts found in the new reduced set of
axioms acting as dimensions. This produces at most
two new rows to be added to the matrix, if none of
the candidate’s concepts are in the concept similarity
matrix. After that, the candidate proceeds normally
through the vector encoding algorithm.
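A minimal sketch covering both cases, reusing the axiom_similarity sketch from Sect. 3.1; fetch_missing_similarities is a hypothetical helper that would issue Query 3 for the unseen concepts only.

def encode_candidate(candidate, dimension_axioms, concept_sim,
                     fetch_missing_similarities, symmetric=False):
    """Vector of similarities between a candidate and the selected dimension axioms."""
    known = {c for pair in concept_sim for c in pair}
    missing = {c for c in candidate if c not in known}
    if missing:                                # second case: concepts not yet in the CSM
        concept_sim.update(fetch_missing_similarities(missing))
    return [axiom_similarity(candidate, dim, concept_sim, symmetric)
            for dim in dimension_axioms]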
4.6 Prediction
Now that we have our reduced vector space, we
can apply machine learning methods. A simplistic
method such as k-NN can be used to highlight the
strength of our similarity measure, so we use it along
with more sophisticated methods such as random forest regressors. We choose the latter to be able to compare with (Ballout et al., 2022b), since they achieve their best results using it.
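A minimal sketch of this final step, with random data standing in for the reduced vector-space and the possibilistic scores; the hyperparameters are illustrative, not the tuned values of our experiments.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Random stand-ins: 300 encoded axioms over 120 selected dimensions, ARI-like scores.
rng = np.random.default_rng(0)
vectors = rng.random((300, 120))
scores = rng.uniform(-1.0, 1.0, 300)

X_train, X_test, y_train, y_test = train_test_split(
    vectors, scores, test_size=0.2, random_state=0)

for model in (KNeighborsRegressor(n_neighbors=5),
              RandomForestRegressor(n_estimators=200, random_state=0)):
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(type(model).__name__, "RMSE:", round(rmse, 3))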
5 EXPERIMENTATION
PROTOCOL
We use the following hardware configuration for our
experiments:
CPU: Intel(R) Xeon(R) W-11955M @ 2.60 GHz base and 4.5 GHz all-core boost, with 8 cores and 16 threads.
A total of 128 GB of RAM at 3200 MHz.
1 TB of NVMe SSD storage with read and write speeds of up to 2000 MB per second.
In addition, the code³ uses the Python multiprocessing package to distribute the dataset-building and querying tasks over all available cores.
We searched for ontologies of different sizes and
domains to use in our experiments and selected the
following:
DBpedia,⁴ a project aiming to extract structured content from the information created in the Wikipedia project. This was designated as the real-life scenario, having a large number of concepts as well as being prone to errors and not being hand-crafted by engineers. Number of concepts: 762.
GO,⁵ the Gene Ontology, the world's largest source of information on the functions of genes. Number of concepts: 29,575.
CL,⁶ the Cell Ontology, designed as a structured controlled vocabulary for cell types; this ontology was constructed for use by the model organism and other bioinformatics databases. Number of concepts: 78,835.
³ https://github.com/ali-ballout/Scalable-Prediction-of-Atomic-Candidate-OWL-Class-Axioms-Using-a-Vector-Space-Dimension-Reduced-Appr
⁴ https://www.dbpedia.org/resources/ontology/
⁵ http://geneontology.org/docs/download-ontology/
⁶ https://www.ebi.ac.uk/ols/ontologies/cl
Figure 1: Performance of our proposed approach versus the SOTA (Ballout et al., 2022b), in RMSE (lower is better), on the subClassOf dataset containing 722 axioms, throughout 9 experiments (train/test splits). In each experiment the SOTA is trained using 100% of the dimensions, while our model is trained using a selected percentage of the dimensions equal to the experiment number × 10%, i.e., from 10% to 90%.
For DBpedia, and to be able to compare
with (Ballout et al., 2022b), we use the same scored
subClassOf dataset consisting of 722 axioms. The
dataset is scored using the possibilistic heuristic de-
tailed in (Tettamanzi et al., 2017) and briefly ex-
plained in Sect. 3.2. As for GO and CL, and
since (Ballout et al., 2022b) does not experiment us-
ing these ontologies, we create our own disjointWith
datasets using the process described in Sect. 4.1. For
each ontology, we create a balanced set of 600 ax-
ioms. For the sake of experimentation, and since scoring 722 axioms in our smallest tested ontology took a little under a year (Malchiodi and Tettamanzi, 2018), we gave a score of 1 to all positive axioms and 0 to all negative axioms. This turns the task in those cases into classification, which is not an issue as long as the approach proves scalable with good accuracy. After all, the work that is the foundation for all
these vector-space approaches (Ballout et al., 2022a)
is presented as a formula classifier.
Following the preparation of our dataset, we are
now able to train a regressor, in the case of DBpe-
dia, for performance experimentation in terms of error
rate. To compare with (Ballout et al., 2022b), we use a
random forest regressor, with which they achieve their
best results. In our experiments we consider Processing time as the time needed in seconds to construct the concept similarity matrix (CSM) and the axiom similarity matrix (ASM), and Encoding time as the time needed to encode one new candidate axiom. Regarding storage cost, we consider the size in megabytes (MB) for the ASM and the number of values stored for the CSM.
With the smallest ontology, DBpedia, we perform
experiments to analyse the effect of feature selection
on prediction accuracy. Since the DBpedia datasets
are correctly scored using (Tettamanzi et al., 2017),
they are the most suitable to use for this type of anal-
ysis. The results for these experiments are presented
in Table 1 and Figure 1.
Using GO and CL we perform experiments to
analyse the impact of our approach on storage usage
and computational cost. For these experiments we set
our feature selection percentage at 40% based on the
results from Figure 1. We also note that in these experiments we do not include the performance metric, for lack of space and since it is not our main concern, but we would like to highlight that an average F1 score of 0.86 was achieved in both experiments. We compare
the feasibility of both our approach and (Ballout et al.,
2022b) when applied to these ontologies. The results
are presented in Table 2 for GO and Table 3 for CL.
6 RESULTS AND ANALYSIS
Figure 1 depicts the performance of both our pro-
posed method and the SOTA (Ballout et al., 2022b).
It compares the error rate in terms of RMSE for both methods throughout a series of 9 experiments, where
the percentage of dimensions selected in our method
is incremented by 10% each run. This plot shows
the effect of feature selection on the prediction ac-
curacy of the model. We can see that when a very
small number of dimensions is selected (< 30%) the
method cannot make accurate predictions. The error
rate decreases as we increase the number of selected
dimensions until we reach 40%. Here we can see that our approach performs better with fewer dimensions than the SOTA. After the 40% mark, we get similar or slightly better performance. We can conclude from
this that 40% to 50% can be considered an optimal
percentage of dimensions to remove redundancy and
improve performance. This is exactly the reason why
we set the feature selection percentage parameter to
40% for the GO and CL experiments.
Table 1 highlights the effects of our approach on
computational cost and storage cost. Due to our
method querying only selected concepts instead of all
concepts, we can see that the initial size of the con-
cept similarity matrix is almost seven times smaller
than that in the SOTA while processing with the same
set of axioms. This smaller size results in faster look-
ups to calculate the axiom similarity measure, which
leads to a reduction in time cost from 13.7 seconds to
3.8 seconds. Also, fewer dimensions in our vector-
space lead to a faster encoding time for a candidate
axiom as we can see a reduction from 0.019 seconds
to 0.0053 seconds.
In Table 2, we see from the size of the CSM and the concepts queried that our method has scaled well from a small-size ontology such as DBpedia to a larger one such as GO. Even though the number of concepts in the ontology addressed changes from 762 to 78,835, our method is able to maintain almost the same size of the CSM by dealing with a relatively small number of concepts (309).
Table 1: Comparison of computational cost in seconds as well as storage cost in number of values for the CSM, using the DBpedia scored subClassOf dataset.
Approach          | Axioms processed | Initial CSM size | Concepts queried | Processing time | Encoding time
SOTA              | 722              | 580,644          | 762              | 13.72           | 0.019
Proposed approach | 722              | 85,264           | 292              | 3.86            | 0.005

Table 2: Comparison of computational cost in seconds as well as storage cost in MB for the ASM and in number of values for the CSM, using the GO disjointWith dataset. TO: time-out error.
Approach          | Axioms processed | ASM size | Initial CSM size | Concepts queried | Processing time | Encoding time
SOTA              | 600              | 62,000   | 6,214,957,225    | 78,835           | TO              | TO
Proposed approach | 600              | 5.1      | 95,481           | 309              | 312.54          | 0.034

Table 3: Comparison of computational cost in seconds as well as storage cost in MB for the ASM and in number of values for the CSM, using the CL disjointWith dataset. TO: time-out error.
Approach          | Axioms processed | ASM size | Initial CSM size | Concepts queried | Processing time | Encoding time
SOTA              | 600              | 8,000    | 874,680,625      | 29,575           | TO              | TO
Proposed approach | 600              | 2.05     | 216,225          | 465              | 103.7           | 0.015
When compared to (Ballout et al., 2022b), we notice that it is
unfeasible to apply the method to the ontology. Con-
cerning computational cost, the SOTA times out and
crashes without completing the task. As for storage
cost, the size of the CSM for the SOTA would be ap-
proximately 62 Gbytes compared to 5.1 Mbytes for
our method.
We notice that our approach consumes an in-
creased amount of processing time up to 312 seconds
but maintains a very short axiom encoding time of
0.034 seconds. This increase in processing time is
attributed to the querying of the semantic similarity measure in such a large ontology. It is dependent on the capability of the SPARQL endpoint and the size of the ontology. However, this is well within an acceptable time.
Similarly, when dealing with the medium-size CL,
having 29,575 concepts, our approach displays con-
sistency and stability in terms of storage and compu-
tation. Processing time is 103 s, which falls within expectation when compared to the processing time for GO; the same can be said for the encoding time. Again, the approach of (Ballout et al., 2022b) times out and crashes, proving neither feasible nor scalable.
7 CONCLUSION
We have proposed a scalable approach for the score
prediction of atomic candidate OWL class axioms of
different types. The method relies on a semantic sim-
ilarity measure derived from the ontological distance
between concepts in a subsumption hierarchy, as well
as feature selection for vector-space dimension reduc-
tion. Extensive tests that covered a range of ontolo-
gies of different sizes as well as multiple parameters
and settings were carried out to investigate the effec-
tiveness and scalability of the method.
The results obtained support the effectiveness of
the proposed method in predicting the scores of the
considered OWL axiom types with lower error rates
than the SOTA. More importantly, it does so while be-
ing scalable, consistent and stable when dealing with
ontologies of different sizes. This allows us to con-
fidently say that our proposed method is feasible and
able to address large real-world ontologies.
Based on our findings, it is clear that some re-
search paths emerge, including:
Developing the method to be able to predict the
scores of complex candidate axioms.
Incorporating active learning (Settles, 2009) for
scalability before reaching the stage of applying
feature selection.
REFERENCES
Ballout, A., da Costa Pereira, C., and Tettamanzi, A. G. B.
(2022a). Learning to classify logical formulas based
on their semantic similarity. In Aydogan, R., Criado, N., Lang, J., Sánchez-Anguix, V., and Serramia, M., editors, PRIMA 2022: Principles and Practice of
Multi-Agent Systems - 24th International Conference,
Valencia, Spain, November 16-18, 2022, Proceedings,
volume 13753 of Lecture Notes in Computer Science,
pages 364–380. Springer.
Ballout, A., Tettamanzi, A. G. B., and Da Costa Pereira, C.
(2022b). Predicting the score of atomic candidate OWL class axioms. In 2022 IEEE/WIC/ACM International
Joint Conference on Web Intelligence and Intelligent
Agent Technology (WI-IAT), pages 72–79.
Cai, J., Luo, J., Wang, S., and Yang, S. (2018). Feature se-
lection in machine learning: A new perspective. Neu-
rocomputing, 300:70–79.
Chandrasekaran, B., Josephson, J. R., and Benjamins, V. R.
(1999). What are ontologies, and why do we need them? IEEE Intelligent Systems and Their Applica-
tions, 14:20–26.
Chen, J., He, Y., Geng, Y., Jiménez-Ruiz, E., Dong, H., and
Horrocks, I. (2022). Contextual semantic embeddings
for ontology subsumption prediction.
Corby, O., Dieng-Kuntz, R., and Faron-Zucker, C. (2004).
Querying the semantic web with corese search engine.
In ECAI, pages 705–709. IOS Press.
Corby, O., Dieng-Kuntz, R., Faron-Zucker, C., and Gandon,
F. (2006). Searching the semantic web: Approximate
query processing based on ontologies. IEEE Intell.
Syst., 21(1):20–27.
Cullen, J. and Bryman, A. (1988). The knowledge acquisi-
tion bottleneck: Time for reassessment? Expert Sys-
tems, 5:216–225.
Dubois, D. and Prade, H. (1988). Possibility Theory—An
Approach to Computerized Processing of Uncertainty.
Plenum Press, New York.
Hassanpour, S., O’Connor, M. J., and Das, A. K. (2014).
Clustering rule bases using ontology-based similarity
measures. Journal of Web Semantics, 25:1–8.
Khadir, A. C., Aliane, H., and Guessoum, A. (2021). On-
tology learning: Grand tour and challenges. Computer
Science Review, 39:100339.
Lehmann, J. and Völker, J. (2014). An introduction to on-
tology learning.
Maedche, A. and Staab, S. (2004). Ontology Learning.
Springer, Berlin, Heidelberg.
Malchiodi, D., da Costa Pereira, C., and Tettamanzi, A. G. B.
(2020). Classifying candidate axioms via dimension-
ality reduction techniques. pages 179–191.
Malchiodi, D. and Tettamanzi, A. G. (2018). Predicting the
possibilistic score of OWL axioms through modified
support vector clustering. Proceedings of the ACM
Symposium on Applied Computing, pages 1984–1991.
Nickel, M., Tresp, V., and Kriegel, H. P. (2012). Factorizing YAGO: Scalable machine learning for linked data.
WWW’12 - Proceedings of the 21st Annual Confer-
ence on World Wide Web, pages 271–280.
OWL Working Group (2012). OWL—web ontology lan-
guage. https://www.w3.org/OWL/.
Settles, B. (2009). Active learning literature survey. Com-
puter Sciences Technical Report 1648, University of
Wisconsin–Madison.
Tettamanzi, A. G., Faron-Zucker, C., and Gandon, F.
(2017). Possibilistic testing of OWL axioms against
RDF data. International Journal of Approximate Rea-
soning.