
3 DATA PREPARATION USING ONTOLOGIES
To overcome the issues identified in Section 2, we
propose an approach that involves creating a domain
ontology and performing logic-based reasoning to
assist in the data preparation.
In  computer  science,  an  Ontology  is  a  formal, 
explicit specification of the concepts, relationships, 
and other distinctions that are relevant for modelling 
a  domain  (Gruber,  2009).  It  provides  a  common 
vocabulary, usually machine-interpretable, to share a 
common  understanding  of  the  structure  of 
information among people and software agents and 
helps  make  domain  assumptions  explicit  (Noy  & 
McGuinness,  n.d.).  It  thereby  allows  software 
agents,  often  called  reasoners,  to  identify  implicit 
information  in  the  data  based  on  first-order  logic. 
Such  reasoners  have  been  used  to  enable 
interoperability  between  software  tools,  determine 
inconsistencies  and  errors  in  data,  automate  data 
classification, etc.  (Ameri, et al., 2012)  (Yang, et 
al., 2013).  
We  shall  explain  our  proposed  approach  using 
the demonstrative example introduced in Section 2. 
For  this  example,  we  use  an  Ontology  Editor, 
Protégé (Musen, 2015), provided by the Stanford
Center for Biomedical Informatics Research, Stanford
University.  Protégé  supports  OWL-DL  (Web 
Ontology  Language  –  Description  Logic)  as  the 
language  for  defining  the  Ontology.  It  enables 
reasoning using Description Logic, which is a subset 
of first-order logic (Horridge, 2011) (Wood, 2013). 
We also use the Pellet reasoner (Clark, 2015) plugin 
for Protégé for drawing inferences. 
To  define  the  Ontology,  we  first  identify  the 
important concepts in the domain. In this example, 
the key concepts are that of a Machine, a diagnostic 
code  or  DTC,  and  a  diagnostic  Event,  which  are 
defined as classes in the  Ontology. Every  machine 
has a pin number and a type or category. To capture
that,  we  create  two  new  classes  nativepin  and 
MachineCategory,  and  two  new  relationships  or 
object properties, namely hasNativePin, which has 
Machine as the domain and nativepin as the range, 
and  hasCategory,  which  has  Machine  as  the 
domain  and  MachineCategory  as  the  range. 
Similarly,  for  a  DTC  we  create  a  new  class, 
DTCCategory,  an  object  property, 
hasFaultCategory,  and  a  data  property 
hasdescription.  Finally,  for  the  Event  class  we 
create  a  class,  machinepin,  an  object  property, 
hasMachinePin, an object property hasFaultCode, 
and a data property, date. The domains and ranges 
for these properties are shown in Table 1 below: 
Table 1: Table of relationships. 
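The class and property declarations described above can be illustrated with a minimal Python sketch. It is not the Protégé model itself: the `ObjectProperty` record and the `infer_types` helper are hypothetical names introduced here, and the domains and ranges marked "assumed" are our reading of the text rather than values stated explicitly in it. The sketch mirrors how OWL treats domain and range axioms, namely as a basis for inferring the classes of the individuals involved rather than as validation constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProperty:
    """Hypothetical record of an object property with its domain and range."""
    name: str
    domain: str
    range: str


# Classes named in the text.
CLASSES = {
    "Machine", "DTC", "Event",
    "nativepin", "machinepin",
    "MachineCategory", "DTCCategory",
}

OBJECT_PROPERTIES = [
    ObjectProperty("hasNativePin", "Machine", "nativepin"),        # stated in the text
    ObjectProperty("hasCategory", "Machine", "MachineCategory"),   # stated in the text
    ObjectProperty("hasFaultCategory", "DTC", "DTCCategory"),      # assumed
    ObjectProperty("hasMachinePin", "Event", "machinepin"),        # assumed
    ObjectProperty("hasFaultCode", "Event", "DTC"),                # assumed
]
# The data properties hasdescription and date are omitted for brevity.


def infer_types(assertions):
    """Infer class membership from (subject, property, object) assertions.

    The subject is inferred to belong to the property's domain and the
    object to the property's range.
    """
    props = {p.name: p for p in OBJECT_PROPERTIES}
    types = {}
    for subject, prop_name, obj in assertions:
        prop = props[prop_name]
        types.setdefault(subject, set()).add(prop.domain)
        types.setdefault(obj, set()).add(prop.range)
    return types


# Illustrative individuals only; these identifiers are not from the paper's data.
example = [("m1", "hasNativePin", "pin42"), ("e1", "hasFaultCode", "dtc7")]
inferred = infer_types(example)
```

In this sketch, asserting only that `m1` has the native pin `pin42` is enough to conclude that `m1` is a Machine and `pin42` is a nativepin, which is the kind of implicit information a reasoner makes explicit.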
As we analyse the domain, we realise that the
classes  nativepin  and  machinepin  describe  the 
same  concept,  and  hence  we  specify  that  these 
classes  are  equivalent.  Likewise,  we  explicitly 
specify all the other classes to be disjoint from each 
other. Since we know that diagnostic events occur
on  specific  machines,  we  would  like  to  have  a 
relationship  that  indicates  the  Machine  that  a 
particular Event occurred on. Therefore, we create a 
new object property, belongsTo, with Event as the 
domain  and  Machine  as  the  range.  However,  we 
also realise that this information would implicitly be 
present in the data through the hasMachinePin and 
hasNativePin  properties due  to  the  equivalence  of 
machinepin  and  nativepin.  Hence,  we  specify the 
property  belongsTo  as  SuperProperty  Of  Chain 
“hasMachinePin o inverse(hasNativePin)”.  
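To make the effect of this property chain concrete, the following Python sketch applies the same rule by hand: if an event has a machine pin and a machine has that same pin as its native pin, then the event belongs to that machine. This is only an illustration of the inference, not how the Pellet reasoner is implemented; the function name `infer_belongs_to` and the individuals in the example are hypothetical.

```python
def infer_belongs_to(has_machine_pin, has_native_pin):
    """Derive belongsTo pairs via hasMachinePin o inverse(hasNativePin).

    has_machine_pin: set of (event, pin) pairs.
    has_native_pin:  set of (machine, pin) pairs.
    Returns a set of (event, machine) pairs.
    """
    # Build inverse(hasNativePin): pin -> machines carrying that pin.
    machines_by_pin = {}
    for machine, pin in has_native_pin:
        machines_by_pin.setdefault(pin, set()).add(machine)

    # Compose: event --hasMachinePin--> pin --inverse(hasNativePin)--> machine.
    belongs_to = set()
    for event, pin in has_machine_pin:
        for machine in machines_by_pin.get(pin, ()):
            belongs_to.add((event, machine))
    return belongs_to


# Illustrative assertions only; these identifiers are not from the paper's data.
has_native_pin = {("machine_A", "pin_001"), ("machine_B", "pin_002")}
has_machine_pin = {
    ("event_1", "pin_001"),
    ("event_2", "pin_002"),
    ("event_3", "pin_001"),
}
belongs_to = infer_belongs_to(has_machine_pin, has_native_pin)
```

The composition only produces a result when both properties point at the same pin individual, which is why the equivalence of machinepin and nativepin matters for the chain to be meaningful.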
The  resultant  description  of  the domain can  be 
visualized as shown in Figure 5. 
 
Figure 5: Definition of the machine data ontology. 
The information specified so far is merely recording 
knowledge  about  the  domain.  This  information  is 
often referred to as the T-Box or Terminology Box. 
It  does  not  have  any  information  about  specific 
instances of machines or specific diagnostic events. 
KEOD 2018 - 10th International Conference on Knowledge Engineering and Ontology Development