Ontology for the Semantic Enhancement, Database Definition and Management and Revision Control
Edward S. Blurock (ORCID: https://orcid.org/0000-0001-9487-3141)
Blurock Consulting AB, Lund, Sweden
Keywords: Ontology Use Case, Database System, Chemical Kinetics, Experimental Data, Chemical Modelling.
Abstract: This paper describes the use of ontologies interacting with a NoSQL database (Google Cloud Firestore) in multiple capacities in the database system CHEMCONNECT. The motivation is to implement the ‘Data on the Web Best Practices’ as recommended by the W3C (https://www.w3.org/TR/2017/REC-dwbp-20170131/, 2017) in an application within the physical chemistry and instrumentation domain. First, the ontology provides semantic enhancement to each database object through meta-data, standard vocabularies and data object relationships. There is a one-to-one correspondence between the database objects and the ontology objects. Another use of the ontology is to provide a data-driven model for the creation, provenance and versioning of database objects. One aspect of this is the use of domain-specific templates to guide the construction of the database objects. Each database object is defined as a hierarchy of catalog objects, record objects and components (using the DCAT ontology model). Within each of these object definitions is a link describing how to create a set of automatically generated RDF objects within the CHEMCONNECT database. These RDFs facilitate searching the database. To facilitate versioning, data source tracking and data quality control, operations on the database are organized as transactions. In CHEMCONNECT a transaction has a one-to-one correspondence with the underlying JAVA operation in the implementation. Within the transaction definition, the set of prerequisites and the output of the operation are defined. The use of transactions helps organize and give semantic enhancement to the set of individual operations within the implementation. The work in this paper is ongoing, and the first use case concentrates on experimental and theoretical information in the chemical domain. The implementation is written in JAVA and uses Google Cloud Firestore as the database.
1 INTRODUCTION
CHEMCONNECT is a database application within the chemical and instrumentation domain. The motivation for using ontologies within CHEMCONNECT (E. S. Blurock 2019; E. Blurock 2021) stems from the W3C recommendations, ‘Data on the Web Best Practices’ (Caroline Burle Lóscio 2017). The ontologies within CHEMCONNECT serve multiple roles and goals:
Ontology Based Data Management: There is a one-to-one correspondence between ontology objects and (JAVA) data objects. The ontology defines objects (Maali and Erickson 2014), processes (Timothy Lebo, Satya Sahoo, Deborah McGuinness 2013) and transactions (Ciccarese et al. 2013). Each object has semantic enhancements to facilitate provenance, data quality, the use of standard vocabularies, formatting and versioning within the database.
Ontology Templates: The ontology provides
templates containing domain specific information
that can be inserted into standard database objects
within the database. In this way the domain
specific knowledge can be enhanced without
having to update the JAVA database
implementation.
Ontology Driven Data Manipulation: The JAVA implementation interprets and is driven by the ontology. All processes, including transactions, are defined within the ontology. There is a one-to-one correspondence between all the functions provided by the web API and the ontology. The ontology defines the prerequisites (input) needed and the output expected from the process. The hierarchy of processes and the human- and machine-readable vocabularies within the process definitions provide meta-data semantic enhancement. Transactions are a special case of processes used to promote versioning, data tracking and data quality within the database.
Static versus Dynamic Knowledge: The role of the ontology within CHEMCONNECT is to capture ‘static’ knowledge, particularly data structures. The database captures the expanding data and knowledge within the domain.

Blurock, E.
Ontology for the Semantic Enhancement, Database Definition and Management and Revision Control.
DOI: 10.5220/0010714300003064
In Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 2: KEOD, pages 226-233
ISBN: 978-989-758-533-3; ISSN: 2184-3228
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
The ontology gives a semantic context to the data
in the database following the principles of Ontology
Based Data Management (Lenzerini 2011;
Dehainsala, Pierra, and Bellatreche 2007). In this
respect, the ontology supplements the data within the
database by providing additional information about a
data object that is fixed for all data objects of the same
type. Each database object can be seen as an instance
of the corresponding ontology object.
The human-readable meta-data provided by all ontology objects supplies documentation, comments and labels that can be used by the user interface.
Another role of the ontology is as the basis of the data-driven paradigm of CHEMCONNECT. The design philosophy is to minimize catalog-object-specific (JAVA) programming by having the definitions within the ontology drive the data object manipulation. Within the implementation, the ontology object instances are represented as JSON objects, both within the database and within the JAVA implementation. This, together with the standard meta-data requirements of each data object, promotes domain-specific enhancements through ontology development rather than JAVA development. Domain data can be updated in the ontology without any additional JAVA programming.
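To make the data-driven idea concrete, the following Python sketch (not the JAVA implementation) builds an empty JSON instance and checks an existing instance using nothing but an ontology definition held as data. The dictionary layout of the ontology entries and the function names `skeleton` and `missing_parts` are assumptions made for illustration; the NameOfPerson entries mirror the definition discussed in Section 1.1.

```python
# Hypothetical in-memory form of a few ontology entries (not the CHEMCONNECT API).
ONTOLOGY = {
    "NameOfPerson": {
        "dcterms:identifier": "foaf:name",
        "dcterms:hasPart": ["givenName", "familyName", "UserTitle"],
    },
    "givenName": {"dcterms:identifier": "foaf:givenName"},
    "familyName": {"dcterms:identifier": "foaf:familyName"},
    "UserTitle": {"dcterms:identifier": "foaf:title"},
}


def skeleton(class_name, ontology):
    """Return an empty JSON-style instance of class_name, driven only by the ontology."""
    definition = ontology[class_name]
    parts = definition.get("dcterms:hasPart", [])
    if parts:
        value = {}
        for part in parts:  # compound object: merge the skeletons of its parts
            value.update(skeleton(part, ontology))
    else:
        value = ""  # component: a primitive string
    return {definition["dcterms:identifier"]: value}


def missing_parts(class_name, instance, ontology):
    """List the identifiers required by the ontology but absent from instance."""
    missing = []

    def walk(expected, actual):
        for key, sub in expected.items():
            if not isinstance(actual, dict) or key not in actual:
                missing.append(key)
            elif isinstance(sub, dict):
                walk(sub, actual[key])

    walk(skeleton(class_name, ontology), instance)
    return missing


print(skeleton("NameOfPerson", ONTOLOGY))
# {'foaf:name': {'foaf:givenName': '', 'foaf:familyName': '', 'foaf:title': ''}}
print(missing_parts("NameOfPerson",
                    {"foaf:name": {"foaf:givenName": "Edward", "foaf:familyName": "Blurock"}},
                    ONTOLOGY))
# ['foaf:title']
```

Under this design, supporting a new domain object only requires a new ontology entry; the two functions stay unchanged.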
The information within the ontology can provide:
Definition: The ontology provides the definition of database objects with their sub-parts. The database object is a specific instance of the object defined in the ontology. The basic ontology objects are divided into three types: components, records and catalog objects.
Templates: Templates are generalized information used to fill in domain information, such as data formats, chemical properties, instrument properties, procedure steps, etc., into the general catalog database object instances.
Concepts: This is the hierarchy of domain-specific concepts and classifications. The concepts are also used to fill in domain information in the templates.
Relationships: Within the ontology object, RDF relationships are defined to link data within a catalog object to facilitate searching for the data. The ontology information is used for the automatic creation of database RDFs. These mapping definitions are provided at every level of the object definition.
Transactions: Transactions are the key to data tracking and versioning. The transaction definition within the ontology defines the prerequisite (input) objects, essential defining information, output objects and relationships. The transaction information provides the roadmap to automate the creation, manipulation and ultimately versioning of database objects. Each database object originates from a transaction process. In this way, the entire history of the object, i.e. its versioning, is documented.
The ontology provides static information common to each data type within the database. The database itself consists of instances of these abstract objects. The ontology is used in several capacities.
1.1 Database-Ontology Interaction
The purpose of the ontology definitions is to give
semantic context to objects within the database. The
ontology represents static information giving
definitions, relationships and semantic enhancements
to the objects in the database.
For example, the ontological definition of a person’s name (NameOfPerson, a record in the ontology with the identifier foaf:name) specifies that the name has three components, as shown in Figure 1:
UserTitle: A classification representing the title of the person (Mr., Dr., Ms., etc.) with the identifier foaf:title.
givenName: A string representing the given name of the person with the identifier foaf:givenName.
familyName: A string representing the family name of the person with the identifier foaf:familyName.

NameOfPerson
  dcterms:hasPart givenName
  dcterms:hasPart familyName
  dcterms:hasPart UserTitle
Figure 1: The ontology definition of NameOfPerson.
A specific database instance of this information, for example the name “Dr. Edward Blurock”, would be represented (as a JSON object) as shown in Figure 2.

foaf:name: {
  foaf:title: “Dr.”,
  foaf:givenName: “Edward”,
  foaf:familyName: “Blurock”
}
Figure 2: A specific instance of NameOfPerson in JSON format.

Within the database instance, only the ‘essential’ information is given; the semantic enhancements are given in the ontology. Within the database instance, the (unique) identifiers point to the corresponding ontology object. For example, foaf:name points to the NameOfPerson ontology object. The ontology object gives additional (static) information about the name of the person:
rdfs:label: “Name of Person”
rdfs:comment: “The full name of a person (including title)”
dcterms:identifier: “foaf:name”
skos:altLabel: “pname”
dc:type: NameOfPerson
In addition, two database RDFs (see Section 3) are defined using skos:mappingRelation; they link the last name and the given name to the catalog object.

KEOD 2021 - 13th International Conference on Knowledge Engineering and Ontology Development

2 DATA OBJECT DEFINITION
The data object definitions contain semantic enhancements promoting the use of descriptive and structural meta-data and data object formatting, as recommended by the W3C (Caroline Burle Lóscio 2017).
Identifiers and vocabularies, standard ones when available, are used within every data object and component. The meta-data within the data object are both machine readable and human readable.
The ontology also provides the machine-interpretable format of each data object, process and transaction. This is also the key to the use of the ontology in a data-driven capacity: the JAVA implementation interprets the ontology, which, in turn, drives the processes.
Structural meta-data is provided through the placement of each object within the ontology hierarchies of classes and subclasses. Data is within the component, record and catalog structures (Maali and Erickson 2014), templates are within the Concept hierarchy (Miles and Bechhofer 2008), processes are within the prov:SoftwareAgent hierarchy (Timothy Lebo, Satya Sahoo, Deborah McGuinness 2013) and transactions are within the Event hierarchy (Dublin Core 2012).
2.1 Common Catalog Information
The ontology structures have a one-to-one correspondence with the data, interface and persistent database structures. There are three levels of data structures:
Catalog Structures: These are based on the DCAT Catalog structure (Maali and Erickson 2014). These structures represent the main data objects of the domain.
Record Structures: Based on dcat:record from the DCAT ontology, these are the records of the catalog. Each record structure contains several pieces of 'primitive' information.
Components: These are single string primitives that make up the record. Numerical values are stored as strings in the database, but can be interpreted as numerical objects.
Catalog objects are the top-level objects within the database. Both catalog objects and records are compound objects consisting of records and components.
The total catalog object definition is a hierarchy of records and components. In the JAVA implementation and the database, a catalog object is manipulated and stored as a JSON object.
All catalog instances are subclasses of SimpleCatalogObject, which has the following information:
CatalogObjectAccessModify: A reference to the users who can modify the catalog object.
CatalogObjectAccessRead: A reference to the users who can read the catalog object. If this is “Public”, then all users, including guests, can access the information.
CatalogObjectKey: A unique key for the catalog object instance.
TransactionID: A reference to the transaction (see Section 4) that created the object.
CatalogObjectOwner: The owner of the catalog object.
This information is key to determining who has access to the catalog object and how the catalog object was created.
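A minimal Python sketch of the common SimpleCatalogObject fields and of the catalog/record/component nesting described above. Representing records as dictionaries and components as strings follows the JSON storage described in the text; the class layout, the helper names `is_well_formed` and `as_number`, and the example values (username, key and transaction ID strings) are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class SimpleCatalogObject:
    """Fields shared by every catalog object (values below are illustrative)."""
    CatalogObjectKey: str
    CatalogObjectOwner: str
    CatalogObjectAccessRead: str
    CatalogObjectAccessModify: str
    TransactionID: str


def is_well_formed(node):
    """A component is a string; a record or catalog object is a dict of such nodes."""
    if isinstance(node, str):
        return True
    if isinstance(node, dict):
        return all(isinstance(key, str) and is_well_formed(value)
                   for key, value in node.items())
    return False


def as_number(component):
    """Numerical components are stored as strings and interpreted on demand."""
    return float(component)


header = SimpleCatalogObject(
    CatalogObjectKey="catalogkey",         # placeholder key
    CatalogObjectOwner="eblurock",         # hypothetical username
    CatalogObjectAccessRead="Public",
    CatalogObjectAccessModify="eblurock",
    TransactionID="transaction-0001",      # hypothetical transaction ID
)
catalog_object = {
    **asdict(header),
    "foaf:name": {"foaf:title": "Dr.", "foaf:givenName": "Edward",
                  "foaf:familyName": "Blurock"},
}
print(is_well_formed(catalog_object))   # True
print(is_well_formed({"length": 1.5}))  # False: numbers must be stored as strings
print(as_number("1.5"))                 # 1.5
```

Keeping every leaf a string mirrors the statement that numerical values are strings in the database; the conversion happens only where a numerical interpretation is needed.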
2.1.1 Access Rights
The access rights, meaning who can read, modify or even delete the catalog object, are determined by several keys, as listed in Section 2.1. The keyword in each of these fields (including the owner) has one of the following forms:
Username: The username of the account that has the access rights.
Consortium: A keyword naming a list of usernames that have the same access rights to a set of objects.
Public: Everybody.
If several user accounts should be able to access the object, then a consortium is built. The consortium keyword points to a list of usernames. If the access field contains a consortium keyword, then the specific user account must be in that list.
When searching the database, part of the search expression tests the CatalogObjectAccessRead field against all the access-right values that apply to the user. The list of valid consortiums is formed by those that include the user account. In effect, an OR over this list of consortiums and the username is appended to the rest of the search expression.
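The OR condition described above can be sketched in a few lines of Python. The consortium mapping, the function names and the example users are hypothetical; the sketch also adds “Public” to the allowed values, since public objects are readable by every user.

```python
def read_access_values(username, consortia):
    """Values of CatalogObjectAccessRead that grant read access to username.

    consortia maps a consortium keyword to the set of its member usernames.
    "Public" is included because public objects are readable by everyone.
    """
    member_of = {name for name, members in consortia.items() if username in members}
    return {"Public", username} | member_of


def readable_objects(objects, username, consortia):
    """Keep the objects whose read field matches any allowed value (an OR)."""
    allowed = read_access_values(username, consortia)
    return [obj for obj in objects if obj["CatalogObjectAccessRead"] in allowed]


consortia = {"KineticsGroup": {"alice", "bob"}}  # hypothetical consortium
objects = [
    {"CatalogObjectKey": "k1", "CatalogObjectAccessRead": "Public"},
    {"CatalogObjectKey": "k2", "CatalogObjectAccessRead": "alice"},
    {"CatalogObjectKey": "k3", "CatalogObjectAccessRead": "KineticsGroup"},
    {"CatalogObjectKey": "k4", "CatalogObjectAccessRead": "carol"},
]
print([o["CatalogObjectKey"] for o in readable_objects(objects, "bob", consortia)])
# ['k1', 'k3']
```

In Firestore, the same allowed-value set could plausibly be passed to an `in` filter on the CatalogObjectAccessRead field, which accepts only a limited number of values per query; this is an assumption about how such a query might be built, not a description of the CHEMCONNECT code.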
2.2 Semantic Enhancements
The standard basic information associated with every
ontological object representing data is:
Labels (rdfs:label) and Comments (rdfs:comment): Human-interpretable strings that give semantic enhancement to the data. These are also useful in the GUI or in human-readable printouts.
Identifiers (dcterms:identifier): A unique ontological identifier specifying that the data that follows is of the specific data type.
Alternative label (skos:altLabel): A short label uniquely identifying the data type.
Type (dc:type): The pointer to the data type of the object.
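As an illustration of how these annotations complement a bare database instance, the Python sketch below renders a JSON instance with the rdfs:label values looked up by identifier. The foaf:name meta-data follows the values listed in Section 1.1 (spelled skos:altLabel, as defined by SKOS); the labels for the three components, the `METADATA` table and the `annotate` function are hypothetical.

```python
# Ontology-side meta-data keyed by identifier. The foaf:name entry follows the
# values given in Section 1.1; the component labels are hypothetical.
METADATA = {
    "foaf:name": {
        "rdfs:label": "Name of Person",
        "rdfs:comment": "The full name of a person (including title)",
        "skos:altLabel": "pname",
        "dc:type": "NameOfPerson",
    },
    "foaf:title": {"rdfs:label": "Title"},
    "foaf:givenName": {"rdfs:label": "Given name"},
    "foaf:familyName": {"rdfs:label": "Family name"},
}


def annotate(instance, metadata, depth=0):
    """Render a JSON instance using the human-readable labels from the ontology."""
    lines = []
    indent = "  " * depth
    for identifier, value in instance.items():
        # Fall back to the raw identifier when the ontology has no label for it.
        label = metadata.get(identifier, {}).get("rdfs:label", identifier)
        if isinstance(value, dict):
            lines.append(f"{indent}{label}:")
            lines.extend(annotate(value, metadata, depth + 1))
        else:
            lines.append(f"{indent}{label}: {value}")
    return lines


name = {"foaf:name": {"foaf:title": "Dr.", "foaf:givenName": "Edward",
                      "foaf:familyName": "Blurock"}}
print("\n".join(annotate(name, METADATA)))
```

The database instance stays minimal; all of the printed labels come from the static ontology side.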
2.3 Templates
The ontology in CHEMCONNECT provides
templates to help build and fill in the information in
the catalog objects. There are several important
classes of templates:
Choices and Classifications: For a given
parameter there could be a list or tree hierarchy
of possible choices.
Domain Information: The database catalog
object is designed to be very general. The domain
specific information within the ontology is used
to fill in and structure these general catalog data
objects.
Transactions: A set of templates for operations on the database (see Section 4).
An example of domain-specific information is the specification needed for a scientific instrument. In CHEMCONNECT, a device is viewed as a system of subsystems. A system definition contains the set of subsystems (each with its own system definition), the concepts and keywords associated with the system’s purpose and domain, and finally a set of parameters describing specific attributes of the device. These attributes are chosen by domain experts as the characteristics that are important for distinguishing, for example, the ‘same’ instrument in one lab from that in another. For a particular system these attributes could be, for example, dimensions, configuration, operating ranges, etc. These attributes are a condensed, machine-readable form of the information found in the ‘Experimental Setup’ section of a scientific paper.
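The recursive system/subsystem structure can be sketched as follows in Python. The `SystemTemplate` class, the flat path-keyed value dictionary and the `fill` function are hypothetical, as is the example instrument (a shock tube with a driver section). The sketch only shows how a template can guide filling in an instance and report which expert-defined parameters are still missing.

```python
from dataclasses import dataclass, field


@dataclass
class SystemTemplate:
    """A (sub)system definition: concepts, parameter names and nested subsystems."""
    name: str
    concepts: list[str] = field(default_factory=list)
    parameters: list[str] = field(default_factory=list)
    subsystems: list["SystemTemplate"] = field(default_factory=list)


def fill(template, values, prefix=""):
    """Fill a template from a flat {path: value} dict.

    Returns the nested instance and the parameter paths that have no value.
    """
    path = f"{prefix}/{template.name}" if prefix else template.name
    instance = {"name": template.name, "concepts": list(template.concepts),
                "parameters": {}, "subsystems": []}
    missing = []
    for parameter in template.parameters:
        key = f"{path}/{parameter}"
        if key in values:
            instance["parameters"][parameter] = values[key]
        else:
            missing.append(key)
    for subsystem in template.subsystems:  # each subsystem has its own definition
        sub_instance, sub_missing = fill(subsystem, values, path)
        instance["subsystems"].append(sub_instance)
        missing.extend(sub_missing)
    return instance, missing


# Hypothetical instrument template; names and parameters are illustrative only.
shock_tube = SystemTemplate(
    name="ShockTube",
    concepts=["chemical kinetics", "high temperature"],
    parameters=["length", "inner diameter"],
    subsystems=[SystemTemplate(name="DriverSection", parameters=["length"])],
)
instance, missing = fill(shock_tube, {"ShockTube/length": "5 m",
                                      "ShockTube/DriverSection/length": "1 m"})
print(missing)  # ['ShockTube/inner diameter']
```

Values are kept as strings, in line with how components are stored in the database.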
3 RDF GENERATION
One type of database object is a Resource Description Framework (RDF) triple. The database RDFs are not static like those in the ontology definition, but grow with the addition of database objects. When a data object is added, a corresponding RDF database instance (a subclass of RDFTriple) is added to the database using the information within the data object. The RDF definition specifies how to create a link between two pieces of data within the catalog object. Part of the transaction process is, after the catalog object is created, to create the corresponding RDFs.
The purpose of the database RDFs is to facilitate searching and to link up database object instances.
We view the RDF as an object linked to a subject by a predicate:
Object -> Predicate -> Subject
Within the ontology RDF definition, a subclass of RDFMappingDefinition, the class name is the Predicate name. Within this definition the Object is identified with prov:entity and the Subject is identified with skos:member. The elements pointed to by the Subject and Object are searched for within the current catalog object, and their values are substituted into the RDFTriple object.
Within a source object, which can be a catalog object, a record object or even a component object, the RDF-defining class is identified with skos:mappingRelation. The ontology objects that the Object and Subject refer to are either directly in the source object definition or in the catalog object where the source object is found.
For example, in Figure 3 we see that one of the records (dcat:record) in the ontology catalog object DatabasePerson is PersonalDescription.

DatabasePerson
  dcat:record PersonalDescription
  dcat:record FirestoreCatalogID
  dcat:record CatalogObjectKey
  ...
Figure 3: Excerpt from DatabasePerson ontology object.

Furthermore, looking at the definition of PersonalDescription in Figure 4, we see that NameOfPerson is a record.

PersonalDescription
  dcat:record NameOfPerson
  ...
Figure 4: Excerpt from PersonalDescription ontology.

In addition to the components (dcterms:hasPart) of NameOfPerson (as seen in Figure 1), there is an additional mapping with the identifier skos:mappingRelation to the RDF object RDFPersonFamilyName, with the elements shown in Figure 5.

RDFPersonFamilyName
  skos:member CatalogObjectKey
  skos:member FirestoreCatalogID
  prov:entity familyName
Figure 5: Excerpt from RDFPersonFamilyName.

Since there are two skos:member links, this produces two RDFTriple objects. The first has the following form:
Object: familyName (a component)
Predicate: RDFPersonFamilyName
Subject: CatalogObjectKey (a component)
Since both the Object and Subject are simple component strings, the catalog object that is produced is RDFSubjectObjectPrimitive (see Figure 6), where the Object, Predicate and Subject are stored in the ShortStringKey (with identifier foaf:LabelProperty), RDFPredicate (with identifier RDFpredicate) and RDFSubjectClassName (with identifier rdfsubjectclassname) fields, respectively.

RDFSubjectObjectPrimitive
  dcterms:hasPart ShortStringKey
  dcterms:hasPart RDFPredicate
  dcterms:hasPart
Figure 6: Excerpt from RDFSubjectObjectPrimitive ontology.

Using the example that the name of the person is “Dr. Edward Blurock”, the RDFSubjectObjectPrimitive catalog object that would be formed (in JSON form) is shown in Figure 7.

RDFSubjectObjectPrimitive {
  foaf:LabelProperty: “Blurock”
  RDFPredicate: “RDFPersonFamilyName”
  rdfsubjectclassname: “catalogkey”
}
Figure 7: Excerpt from RDFSubjectObjectPrimitive database object. The entry “catalogkey” represents the actual unique key for the database object.

The other RDF that is formed differs in two ways. First, it involves a record object as the subject, namely FirestoreCatalogID. Second, this record is not a member of NameOfPerson, but is found elsewhere in the total catalog object DatabasePerson (see Figure 3). Since the identifiers are unique, however, the database object only needs to be systematically searched for the corresponding element.
Since the subject is a record, it produces an RDFSubjectPrimitiveObjectRecord object, shown schematically in Figure 8, where the record object is identified as dcat:CatalogRecord and the actual record for FirestoreCatalogID is identified with firestorecatalog.
Figure 8: Schematic of JSON object RDFSubjectPrimitiveObjectRecord.
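The generation step can be sketched in Python under the reading that Figure 7 shows: the prov:entity element (the family name) becomes the Object stored in foaf:LabelProperty, and each skos:member element becomes a Subject. Note that the Object -> Predicate -> Subject naming used here reverses the usual subject-predicate-object reading of RDF. The simplified DatabasePerson instance, the identifiers "catalogobjectkey" and "personaldescription", the content of the firestorecatalog record and the `dc:type` tag on each generated object are hypothetical; `find_value` relies on identifiers being unique, as stated above.

```python
def find_value(node, identifier):
    """Depth-first search of nested JSON for the entry keyed by identifier.

    Identifiers are unique within a catalog object, so the first match is the only one.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == identifier:
                return value
            found = find_value(value, identifier)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_value(item, identifier)
            if found is not None:
                return found
    return None


def generate_rdfs(catalog_object, mapping):
    """Create one RDF object per Subject (skos:member) in an RDF mapping definition."""
    object_value = find_value(catalog_object, mapping["prov:entity"])
    rdfs = []
    for subject_id in mapping["skos:member"]:
        subject_value = find_value(catalog_object, subject_id)
        if object_value is None or subject_value is None:
            continue  # nothing to link
        rdf = {"foaf:LabelProperty": object_value, "RDFPredicate": mapping["predicate"]}
        if isinstance(subject_value, dict):  # the subject is a record
            rdf["dc:type"] = "RDFSubjectPrimitiveObjectRecord"
            rdf["dcat:CatalogRecord"] = {subject_id: subject_value}
        else:  # the subject is a component string
            rdf["dc:type"] = "RDFSubjectObjectPrimitive"
            rdf["rdfsubjectclassname"] = subject_value
        rdfs.append(rdf)
    return rdfs


# Simplified DatabasePerson instance with hypothetical identifiers and values.
person = {
    "catalogobjectkey": "catalogkey",
    "firestorecatalog": {"collection": "persons", "document": "catalogkey"},
    "personaldescription": {
        "foaf:name": {"foaf:title": "Dr.", "foaf:givenName": "Edward",
                      "foaf:familyName": "Blurock"},
    },
}
mapping = {  # RDFPersonFamilyName, cf. Figure 5
    "predicate": "RDFPersonFamilyName",
    "prov:entity": "foaf:familyName",
    "skos:member": ["catalogobjectkey", "firestorecatalog"],
}
for rdf in generate_rdfs(person, mapping):
    print(rdf)
```

The two printed objects correspond to the primitive RDF of Figure 7 and the record-valued RDF of Figure 8.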
3.1 Searching RDFs
The objects in the database are in a complex hierarchical structure. Unless one knows where the desired object is within this hierarchy, finding it may be difficult. The purpose of the database RDFs is to provide an efficient search mechanism to find objects in the database, primarily through keywords.
The RDF definitions in the ontology, as outlined in the previous section, provide an automatic mechanism to facilitate simple searches through key objects in the database. The RDFs that are generated should reflect useful and efficient searches of the database. For each data object, especially catalog objects, the designer should decide how the object will be accessed and what the most efficient way to access this information is; for example, which keywords within the catalog object can be used to access the information.
In the previous example, an RDF was made linking the last name, “Blurock”, with the full information of the user (DatabasePerson). In this case the keyword “Blurock” would be searched for in the Object fields of all the stored RDFs, namely their foaf:LabelProperty fields. The question this RDF answers is ‘Find me all the users with the last name of Blurock’.
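A keyword search over stored RDFs can then be sketched as a filter on the Object field that returns the linked Subjects. The stored records, the predicate name "RDFHypotheticalOther" and the function name are hypothetical. Matching is made case-insensitive here as a design choice; an equality filter in Firestore would compare values exactly.

```python
def search_rdfs(rdfs, keyword, predicate=None):
    """Return the Subject of every stored RDF whose Object matches keyword.

    Restricting to one predicate narrows the question, e.g. to
    'all users whose family name is <keyword>'.
    """
    wanted = keyword.casefold()
    subjects = []
    for rdf in rdfs:
        if rdf.get("foaf:LabelProperty", "").casefold() != wanted:
            continue
        if predicate is not None and rdf.get("RDFPredicate") != predicate:
            continue
        if "rdfsubjectclassname" in rdf:  # primitive subject, e.g. a catalog key
            subjects.append(rdf["rdfsubjectclassname"])
        else:  # record subject
            subjects.append(rdf.get("dcat:CatalogRecord"))
    return subjects


stored = [  # illustrative stored RDFs
    {"foaf:LabelProperty": "Blurock", "RDFPredicate": "RDFPersonFamilyName",
     "rdfsubjectclassname": "catalogkey-1"},
    {"foaf:LabelProperty": "Smith", "RDFPredicate": "RDFPersonFamilyName",
     "rdfsubjectclassname": "catalogkey-2"},
    {"foaf:LabelProperty": "Blurock", "RDFPredicate": "RDFHypotheticalOther",
     "rdfsubjectclassname": "catalogkey-3"},
]
print(search_rdfs(stored, "blurock", "RDFPersonFamilyName"))  # ['catalogkey-1']
print(search_rdfs(stored, "Blurock"))  # ['catalogkey-1', 'catalogkey-3']
```

The returned catalog keys are what a caller would use to fetch the full DatabasePerson objects.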
4 TRANSACTIONS
Operations on the database, such as creation, modification or deletion, are defined within CHEMCONNECT as transactions. The main motivation for using transactions (a subclass of Event in the Dublin Core ontology (Dublin Core 2012)) is to satisfy the W3C requirements (Caroline Burle Lóscio 2017) for versioning. A CHEMCONNECT transaction has an associated process (function), the list of prerequisite transactions for the process and the output object of the process. The tree of transactions gives the exact history of the data. There are transactions for the creation and also for the manipulation, transformation and updating of data objects. Data quality can be assessed through the transaction history because the sources can be traced.
Within the implementation there is a one-to-one
correspondence between an operation on the database
and a transaction. The transactions keep track of what
is needed to perform the operation and then what
catalog objects are created by the operation.
Defining operations as transactions gives each
operation an organizational and semantic context.
Database modification through transactions also gives the history and dependencies of the catalog objects. If an object depends on another object that has been modified, the modification can be propagated to the objects that depend on it.
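Both directions of this dependency information can be sketched as breadth-first traversals over a transaction log. The log layout, the transaction IDs and the type name "ExampleDerivedEvent" are hypothetical (CreateDatabasePersonEvent and CatalogObjectUserAccountEvent are the event names used in this section). `history` follows prerequisites upstream; `affected_outputs` follows them downstream to find objects that may need updating.

```python
from collections import deque

# Hypothetical transaction log: transaction ID -> prerequisite IDs and output object.
TRANSACTIONS = {
    "T1": {"type": "CreateDatabasePersonEvent", "requires": [], "output": "person-1"},
    "T2": {"type": "CatalogObjectUserAccountEvent", "requires": ["T1"], "output": "account-1"},
    "T3": {"type": "ExampleDerivedEvent", "requires": ["T2"], "output": "derived-1"},
}


def _breadth_first(start, neighbours):
    """Visit every node reachable from start's neighbours once, nearest first."""
    seen, order = set(), []
    queue = deque(neighbours(start))
    while queue:
        tid = queue.popleft()
        if tid in seen:
            continue
        seen.add(tid)
        order.append(tid)
        queue.extend(neighbours(tid))
    return order


def history(tid, transactions):
    """Upstream: all transactions that tid (and hence its output) depends on."""
    return _breadth_first(tid, lambda node: transactions[node]["requires"])


def affected_outputs(tid, transactions):
    """Downstream: outputs that may need updating when tid's output is modified."""
    children = {node: [] for node in transactions}
    for node, record in transactions.items():
        for required in record["requires"]:
            children[required].append(node)
    dependents = _breadth_first(tid, lambda node: children[node])
    return [transactions[node]["output"] for node in dependents]


print(history("T3", TRANSACTIONS))           # ['T2', 'T1']
print(affected_outputs("T1", TRANSACTIONS))  # ['account-1', 'derived-1']
```

The `seen` set keeps each traversal finite even if the log were to contain shared prerequisites.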
A transaction is defined in the ontology hierarchy as a subclass of dc:Event. The transaction definition within the ontology has the following elements:
The set of prerequisite transactions that need to
be performed before the current transaction can
be executed (dcterms:requires).
An additional data object having the input
information needed to perform the current
transaction (dcterms:source).
The catalog object that the transaction produces
(hasOutput).
The prerequisite transaction information can be
used to find the output data (hasOutput) from
previous operations needed to process the current
operation.
The additional data (dcterms:source) links to a data object giving additional information, not found in the prerequisites, that is needed to perform the operation. This information can, for example, come from the user interface.
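The three elements of a transaction definition can be sketched as follows. The classes `TransactionDefinition` and `TransactionLog`, the `run` method, the `build_output` callback, the ID format and the source name "ActivityCatalogDatabasePerson" are hypothetical. The record field names are borrowed from the TransactionEventObject excerpt in Figure 11. The sketch checks that each required transaction has been executed, feeds the prerequisites' outputs and the additional activity information into the operation, and stamps the new object with its TransactionID.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionDefinition:
    """Ontology-level transaction definition (cf. Figure 9)."""
    name: str                  # the transaction (event) class
    requires: tuple[str, ...]  # dcterms:requires: prerequisite transaction types
    source: str                # dcterms:source: additional activity information
    output: str                # hasOutput: the catalog object produced


class TransactionLog:
    """In-memory stand-in for the database; a hypothetical sketch, not CHEMCONNECT code."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.events = {}   # transaction ID -> TransactionEventObject-like record
        self.objects = {}  # database ID -> catalog object

    def run(self, definition, prerequisites, activity_info, build_output):
        """prerequisites maps each required transaction type to a completed transaction ID."""
        for required in definition.requires:
            tid = prerequisites.get(required)
            if tid not in self.events or self.events[tid]["dc:type"] != required:
                raise ValueError(f"prerequisite {required} has not been executed")
        # The outputs of the prerequisite transactions are inputs to this one.
        inputs = {required: self.objects[self.events[prerequisites[required]]
                                         ["DatabaseObjectIDOutputTransaction"]]
                  for required in definition.requires}
        tid = f"T{next(self._ids)}"
        output = build_output(inputs, activity_info)
        output["TransactionID"] = tid  # every catalog object records its creator
        output_id = f"{definition.output}-{tid}"
        self.objects[output_id] = output
        self.events[tid] = {
            "dc:type": definition.name,
            "RequiredTransactionIDAndType": {r: prerequisites[r] for r in definition.requires},
            "ActivityInformationRecord": dict(activity_info),
            "DatabaseObjectIDOutputTransaction": output_id,
        }
        return tid


create_person = TransactionDefinition(
    "CreateDatabasePersonEvent", (), "ActivityCatalogDatabasePerson", "DatabasePerson")
create_account = TransactionDefinition(
    "CatalogObjectUserAccountEvent", ("CreateDatabasePersonEvent",),
    "ActivityCatalogUserAccount", "UserAccount")

log = TransactionLog()
t1 = log.run(create_person, {}, {"foaf:familyName": "Blurock"},
             lambda inputs, info: {"foaf:familyName": info["foaf:familyName"]})
t2 = log.run(create_account, {"CreateDatabasePersonEvent": t1},
             {"username": "eblurock", "AuthorizationType": "user"},
             lambda inputs, info: {
                 "username": info["username"],
                 "familyName": inputs["CreateDatabasePersonEvent"]["foaf:familyName"]})
print(log.events[t2]["RequiredTransactionIDAndType"])  # {'CreateDatabasePersonEvent': 'T1'}
```

Keeping only IDs in the event record mirrors the text: from a transaction instance, the outputs of its prerequisites can be retrieved when needed.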
RDFSubjectObjectPrimitive {
  foaf:LabelProperty: “Blurock”
  RDFPredicate: “RDFPersonFamilyName”
  dcat:CatalogRecord: {
    firestorecatalog: {
      ...
    }
  }
}
(Content of Figure 8, the schematic JSON object RDFSubjectPrimitiveObjectRecord described in Section 3.)

CatalogObjectUserAccountEvent
  dcterms:requires CreateDatabasePersonEvent
  dcterms:source ActivityCatalogUserAccount
  hasOutput: UserAccount
Figure 9: Excerpt from CatalogObjectUserAccountEvent.

For example, Figure 9 shows the fields of the CatalogObjectUserAccountEvent that is needed to create a new user account. This event requires that the user (DatabasePerson) already exists, and this is ensured by requiring that a specific CreateDatabasePersonEvent has already been executed. The corresponding database object has some of the information needed, such as the person’s name and other details, but some extra information is needed. This is provided by the link (dcterms:source) to an ActivityCatalogUserAccount record (see Figure 10). All of these records containing the extra information for transactions are found as subclasses of ActivityInformationRecord which, in turn, is a subclass of dcat:CatalogRecord. In this example, this is information needed for the new account that is not found in the DatabasePerson.
ActivityCatalogUserAccount
  dcterms:hasPart AuthorizationType
  dcterms:hasPart username
Figure 10: Excerpt from ActivityCatalogUserAccount.

Within the ontology, the general transaction is defined: the ontology definition points to the ontology object classes (by the identifiers shown in parentheses above). After each transaction, a catalog object instance, TransactionEventObject (see Figure 11), is created. For a specific transaction instance, it points to specific objects of those classes within the database (again by the same corresponding identifiers). The three records pointed to are links, through database IDs, to the required transaction instances, the additional information instance and the output instance. These IDs are enough to find the respective information. From the transaction instances, when needed, the respective output objects of these transactions can be retrieved.

TransactionEventObject
  dcat:record RequiredTransactionIDAndType
  dcat:record ActivityInformationRecord
  dcat:record DatabaseObjectIDOutputTransaction
Figure 11: Excerpt from TransactionEventObject.

4.1 TransactionID
Every catalog object created has the transaction ID as one of its fields. This provides information about the object’s origins and history through the creating transaction, and about its dependencies through the chain of prerequisites found in the transactions. It can also serve as a search tool for finding all the objects created by a transaction.
5 CONCLUSION
This paper has outlined several aspects of the ontology-based database CHEMCONNECT. The ontology provides information about the database object instances. It provides semantic enhancement of database objects through annotations and relationships defined within the ontology. The ontology also automates the creation of database objects, as in the case of RDF triple generation. Through transaction definitions, the ontology also provides a semantic context and organization to the operations of database management.
This work is ongoing and in a preliminary phase. The use-case domain is chemical kinetics and physical organic chemistry.
REFERENCES
Blurock, Edward. 2021. “Use of Ontologies in Chemical Kinetic Database CHEMCONNECT.” 240–47. https://www.scitepress.org/PublicationsDetail.aspx?ID=ai7xFiBEN8E=&t=1.
Blurock, Edward S. 2019. “CHEMCONNECT: An Ontology-Based Repository of Experimental Devices and Observations.” Copenhagen, Denmark. https://icaita2019.org/index.html#home.
Caroline Burle Lóscio, Bernadette Farias, Newton Calegari. 2017. “Data on the Web Best Practices.” January 2017. https://www.w3.org/TR/2017/REC-dwbp-20170131/.
Ciccarese, Paolo, Stian Soiland-Reyes, Khalid Belhajjame, Alasdair JG Gray, Carole Goble, and Tim Clark. 2013. “PAV Ontology: Provenance, Authoring and Versioning.” Journal of Biomedical Semantics 4 (1): 37. https://doi.org/10.1186/2041-1480-4-37.
Dehainsala, Hondjack, Guy Pierra, and Ladjel Bellatreche. 2007. “OntoDB: An Ontology-Based Database for Data Intensive Applications.” In Advances in Databases: Concepts, Systems and Applications, edited by Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, and Ekawit Nantajeewarawat, 497–508. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-540-71703-4_43.
Dublin Core. 2012. “Dublin Core Metadata Initiative.”
Dublin Core Metadata Initiative. June 12, 2012.
http://dublincore.org/.
Lenzerini, Maurizio. 2011. “Ontology-Based Data
Management.” In Proceedings of the 20th ACM
International Conference on Information and
Knowledge Management, 5–6. CIKM ’11. New York,
NY, USA: Association for Computing Machinery.
https://doi.org/10.1145/2063576.2063582.
Maali, Fadi, and John Erickson. 2014. “Data Catalog
Vocabulary (DCAT).” January 16, 2014.
https://www.w3.org/TR/vocab-dcat/.
Miles, Alistair, and Sean Bechhofer. 2008. “SKOS Simple Knowledge Organization System Namespace Document 30 July 2008 ‘Last Call’ Edition.” August 20, 2008. https://www.w3.org/TR/2008/WD-skos-reference-20080829/skos.html.
Timothy Lebo, Satya Sahoo, Deborah McGuinness. 2013.
“PROV-O: The PROV Ontology.” 2013.
https://www.w3.org/TR/prov-o/.