System Proposal for Integrating Quality Control Data of Components of

the Brazilian Oil and Gas Industry

Mario Ricardo Nascimento Marques Junior, Eder Mateus Nunes Gonc¸alves,

Silvia Silva da Costa Botelho, Emanuel da Silva Diaz Estrada, Danubia Bueno Espindola,

Eduardo Nunes Borges, Werner Luft Botelho, Bruno Machado Lobell and Lucas Silva Marca

Center of Computational Sciences, Federal University of Rio Grande, Rio Grande, Brazil

Keywords:

Quality Control, Schema Matching, Ontology, Inspection.

Abstract:

Databooks are the main documents for monitoring and validating a work in the oil industry. However, the

analysis of these Databooks requires many man-hours of professionals, and will still be subject to failures

during these analyzes. An approach to optimize this task would be to use an intelligent search and validation

system for this documentation. Once the documents are scanned, algorithms could assist the professional to

analyze these Databooks, based on automatic indexing and checks. A problem that arises from this approach

is the difﬁculty to develop a valid conceptual model for all Databooks found and all types of components

and structures subject to inspection procedures. As this modeling is done manually by professionals, this

task becomes difﬁcult, slow and often inefﬁcient. To mitigate this problem, this work proposes a system to

automatically generate a conceptual model from textual descriptions of the inspection elements, making this

task simpler. This is done from the development of an ontology that describes the standardized knowledge

on inspection of different types of components of the structure of works for oil platforms for the validation of

inspection reports and quality certiﬁcates regarding criteria of completeness and compliance.

1 INTRODUCTION

In the manufacture and supply of equipment for the

Oil Industry, Quality Management is a controlling and

standardizing instrument, providing management for

a Quality System, covering the organizational struc-

ture, procedures, processes and resources (Nasci-

mento, 2010).Inspection is a critical part of the oil in-

dustry onshore and offshore due to its inﬂuence on

maintaining equipment safety, reliability and avail-

ability (Rachman and Ratnayake, 2019).

All data related to the inspection of equipment is

contained in Databooks. The Databook gathers all

the information generated and registered, such as en-

gineering documents and their changes, purchase or-

ders, tests carried out, quality certiﬁcates, inspection

reports, among others (Duarte, 2010). According to

the degree of complexity of the component, a data-

book can contain thousands of pages.

The analysis of these Databooks is done by trained

professionals. These professionals must carefully an-

alyze each page of the Databook looking for abnor-

malities, such as the lack of an inspection report, an

unsigned document or even acts of bad faith. The

problem with this analysis is the high time spent,

since the professional must manually search the doc-

uments for these anomalies. In addition, a Databook

must be checked for completeness criteria (if it con-

tains all mandatory documents) and compliance (if all

documents are correct as to the procedures adopted).

Without an adequate search system, each of these ac-

tivities can occupy technicians for hours, days or even

weeks of work.

One approach to optimize this task would be to

use an intelligent search system. Once the documents

have been scanned, algorithms could assist the profes-

sional in analyzing the Databooks by performing in-

formation registration and retrieval functions, in addi-

tion to performing completeness and compliance ver-

iﬁcation functions. However, before these data are

stored in a database, they must be pre-processed, as

they usually originate from a scanned document or a

PDF document (Portable Document Format).

After pre-processing, the extracted data must be

stored in a database. However, different companies

provide different equipment and services, and con-

Marques Junior, M., Gonçalves, E., Botelho, S., Estrada, E., Espindola, D., Borges, E., Botelho, W., Lobell, B. and Marca, L.

System Proposal for Integrating Quality Control Data of Components of the Brazilian Oil and Gas Industry.

DOI: 10.5220/0010603802110215

In Proceedings of the 18th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2021), pages 211-215

ISBN: 978-989-758-522-7

211

sequently their reports are delivered in different for-

mats, both in content and in form. Therefore, concep-

tual modeling, one of the most important steps in a

database (Elmasri and Navathe, 2011), should be per-

formed for each item in the Databook, because in this

step requirements are analyzed to deﬁne the system

structure.

This modeling is done manually by a professional,

based on his experience, in the domain of the sys-

tem and certain rules. However, this task becomes

difﬁcult when large systems are developed. For the

present work, a system is proposed to automatically

generate a conceptual model from textual descriptions

of the inspection elements, making this task simpler.

This is done from the development of an ontology that

describes the standardized knowledge on inspection

of different types of components of the structure of

works for oil platforms for the validation of inspec-

tion reports and quality certiﬁcates regarding criteria

of completeness and compliance. Ontology makes it

possible to deal with the diversity of types of com-

ponents, documents and the correct allocation of data

even under different structures. In this way, it is pos-

sible to resolve the match between the different data

schemes associated with each of the inspected com-

ponents and structures. Through this approach, data

can be inserted into the same database.

This work is part of a larger project that aims to

use pre-processing techniques and data science for the

automatic veriﬁcation of completeness, compliance

and bad faith in records and documents of purchase,

construction and assembly in context Big Data. The

project involves the development of technologies for

registration, retrieval and inference of large databases

using technologies of datalakes and microservices.

2 FUNDAMENTALS

2.1 Component Inspection for Oil and

Gas Industry

Within the scope of the oil industry in Brazil, the

document that governs the guidelines related to the

inspection of the manufacture of equipment for oil

and gas is the ABC of the Inspection of Manufacture

(Petr

oleo-Brasileiro-S.A, 2017).

According to this document, manufacturing in-

spection ”is the activity carried out for planning and

execution purposes aiming to verify, at the premises

of the supplier and / or sub-suppliers involved, the

conformity of the equipment or materials manufac-

tured with the contractual documents”. An asset is

deﬁned as any system, equipment, product or mate-

rial that the Supplier must deliver to the customer in

accordance with the contract.

2.2 Ontology

In the area of Computer Science, ontologies are used

for modeling, both in database-based systems and for

knowledge representation (Almeida, 2014). An on-

tology allows the deﬁnition of concepts (entities, ob-

jects, events, processes), emphasizing their proper-

ties, relationships and restrictions, enabling the shar-

ing of knowledge about a given domain, which is rep-

resented by a vocabulary (de Freitas, 2017).

According to (Almeida, 2014), two main deﬁni-

tions for ontology in Computer Science are noted in

the literature. The ﬁrst refers to the use of ontolog-

ical principles to model reality, that is, to provide a

description of what exists and to characterize entities

(Wand et al., 1999). The other deﬁnition refers to

the representation of a domain in a logical language

for computational purposes. In this case, an ontol-

ogy consists of a set of statements expressed through

a representation language, processable by inference

mechanisms (Staab and Studer, 2010).

2.3 Schema Matching

Scheme matching is the problem of generating corre-

spondences between elements of two schemas (Bern-

stein et al., 2011). A match is a relationship between

one or more elements of a scheme and one or more

elements of the other.

Figure 1 shows the simpliﬁed representation of

two schemes and the matches (or correspondences)

found in dotted lines. The problem proves to be chal-

lenging because the matching task may contain ele-

ments described differently (for example an acronym)

and still be matched (PurchaseOrder and PO), ele-

ments whose semantics are partially matched (Name

representing the full name and FName only the ﬁrst

name), elements with similar names but representing

different concepts in the schemes (Address represent-

ing the billing address and ShipAddress representing

the delivery address) and elements without a corre-

spondent in the other scheme ( Product, BillTo and

Customer).

2.4 Generation Database Schema from

an OWL Files

In (Panawong et al., 2016) a method is presented to

automatically generate a database schema from an

OWL (Ontology Web Language) ﬁle. In addition,

ICINCO 2021 - 18th International Conference on Informatics in Control, Automation and Robotics

212

Figure 1: Two schemes and possible marriages (dotted ar-

rows) (Rodrigues, 2013).

a mapping conﬁguration can be created to associate

the ontology and the data schema generated by the

process. The method is divided into three main pro-

cesses. First, the inserted OWL ﬁle is read and a syn-

tactic analysis of its contents is performed. From this,

two types of ontological unit are captured: concept

and concept relation. The parsing starts from the root

class. Then, the properties of the class are read and

recorded. For properties, property types are all impor-

tant in later processes, so that object ownership (part

of) and data ownership (attribute of) are recorded sep-

arately along with their restrictions. After scrolling

through all properties, all classes and subclasses are

scrolled until all classes are read.

From the classes captured in the previous process,

the table is generated. For each class, a table is cre-

ated and its properties are generated as columns of the

table. A data property is considered to be a ﬁeld for

entering data, while an object property must be linked

to another table using a foreign key following a class

constraint. A table name and column tags are gener-

ated following a label from an ontology and relation-

ship class. The database table and columns generated

together with an ontology class label are written to a

ﬁle for use in a later process.

In (Afzal et al., 2016) is presented a tool called

textit OWLMap, fully automatic to provide a lossless

approach to transforming ontology into a relational

database format. In the proposed system, initially a

user will select an ontology ﬁle. Then, information

is extracted about ontology constructs using the Jena

API. The mapping rules are deﬁned to guarantee a

lossless transformation. Based on the mapping rules,

ontological constructions are transformed into a rela-

tional database.

Mapping rules are used to transform an ontol-

ogy into a relational database. According to these

mapping rules, ontology classes must be transformed

into tables, object type properties must be mapped

into columns or tables, data type properties must be

mapped into columns or tables according to the map-

ping rules and constraints must be stored in metadata

tables.

In (Abo Zidan et al., 2019) a tool is presented

that acts as a bridge between a preexisting relational

database and the semantic data, which provides a fully

automated approach that converts a populated ontol-

ogy into a relational database without losing the on-

tological structure, preserving the relations of knowl-

edge between the entities. In the Figure 2 is present

the system architecture.

Figure 2: System architecture (Abo Zidan et al., 2019).

As in (Afzal et al., 2016), this tool also uses map-

ping rules to transform an ontology into a relational

database. Mapping rules are necessary for convert-

ing an OWL ontology into a relational database. The

extraction of classes, properties and the ontological

construction to transform the ontology into a rela-

tional database is performed through the DotNetRDF

library.

After identifying the domain ontology and the on-

tological constructions are extracted using the tex-

tit DotNetRDF library extracting the classes, prop-

erties with their two object types and data type and

other constructions. The proposed conversion map-

ping rules are applied, then links will be identiﬁed

between the relational database and the converted on-

tology using the WordNet library. As a result, a rela-

tional database is obtained.

2.5 JSON Schema Discovery

In recent years, JavaScript Object Notation (JSON)

has been gaining popularity, as it provides a

System Proposal for Integrating Quality Control Data of Components of the Brazilian Oil and Gas Industry

213

OWL File

Database

OCR JSON File

Schema discovery

Schema generator

Schema matching

JSON File

Data schema

Database generator

Matching result

Data insert

Figure 3: Proposed system.

lightweight data exchange format with great perfor-

mance (Izquierdo and Cabot, 2013). JSON is a

human-readable format that consists of sets of objects

(that is, types or concepts) described by name/value

pairs (that is, ﬁelds or attributes). JSON has no

schema, that is, there is no schema that speciﬁes

the internal structure of JSON objects, instead, the

schema is implicit. Schemaless data is particularly in-

teresting in cases dealing with non-uniform data (for

example, non-uniform types or custom ﬁelds) or in

schema migration (Fowler, 2013), however, it can be-

come a difﬁculty in data integration scenarios , where

it becomes necessary to discover, at least partially, the

structure to correctly process the data.

In (Izquierdo and Cabot, 2013) an approach is pre-

sented to automatically discover an implicit schema

of a set of JSON documents. For this, model-oriented

techniques were used to design a process in which

initial schema excerpts are discovered from each indi-

vidual service and then combined to obtain a compos-

ite model that describes the underlying domain model

of the application, which facilitates understanding

of JSON-based services. The approach was imple-

mented in Java and made available on the hosting plat-

form Github (Izquierdo, 2014).

3 PROPOSED SYSTEM

As Databooks are usually scanned documents or PDF

documents, processing is necessary to extract only

the relevant information.Thus, a module is responsi-

ble for performing this processing. This module per-

forms this processing through OCR (Optical Charac-

ter Recognition) algorithms, transforming the Data-

book into a JSON ﬁle. Assuming that the informa-

tion extracted from a Databook will be contained in a

JSON ﬁle, the system of Figure 3 has been proposed.

The main objective of the proposed system is to

allow the integration of the diversity of types of com-

ponents, documents and correct data assignment even

under different structures. The proposed system has

two main functionalities. The ﬁrst functionality is

to generate a database from an ontology. For that,

from an OWL ﬁle, an algorithm should generate the

database schema corresponding to the ontology. From

this data schema, the database will be created. For

the development of this functionality, the works re-

ported will be maded (Afzal et al., 2016), (Panawong

et al., 2016), (Abo Zidan et al., 2019). In addition, the

schema generated from the ontology will be used in

the schema matching.

The second functionality, on the other hand, con-

sists of, starting from a JSON ﬁle of unknown

schema, inserting this data in their respective database

tables. To do this, from an unknown JSON schema,

it is necessary to infer which schema this ﬁle corre-

sponds to. To carry out this stage of such function-

ality, the work presented by (Izquierdo and Cabot,

2013) will be taken as a reference, and libraries that

perform this function will also be investigated.

Knowing the schema of the JSON ﬁle, the match-

ing schema will be performed, generated from the on-

tology. To perform the schema matching, the tool pro-

posed by (Aumueller et al., 2005), called Coma ++

(A system for ﬂexible combination of schema match-

ing approaches), will be used as the basis. This tool

was chosen because it allows the matching of differ-

ent ﬁles (for example, OWL and SQL), in addition to

being an open source project under the AGPL license.

Currently, the tool does not have the ability to work

with JSON ﬁles, so this capability should be devel-

oped. Then, a ﬁle containing the matching schema

matches is generated. Finally, based on the JSON

ﬁle and the result of the schema matching, an algo-

rithm inserts the data into the corresponding database

tables.

ICINCO 2021 - 18th International Conference on Informatics in Control, Automation and Robotics

214

4 CONCLUSION AND FUTURE

WORK

This article proposed a system for integrating quality

control data from components in the Brazilian oil and

gas industry.The system is proposed based on tools al-

ready developed, for example Coma ++ (Aumueller

et al., 2005) e Ontology2RDB(Abo Zidan et al., 2019).

Thus, the main objective of the system is to promote

a structure that allows access and management to this

data, promoting a better management of equipment

quality control. Therefore, in the case of manual data

veriﬁcation, inspectors will perform the task faster

due to the ease of access to information. Also, this

quality control data can be provided to a system that

will infer as to the completeness and compliance of

the databooks.

As future work, tests can be cited to validate the

functionalities of the tools mentioned in this work.

From these tests, it will be possible to evaluate which

tool best suits the proposed system.An important step

for the implementation of this work is the develop-

ment of an ontology applied to the inspection of com-

ponents of the oil industry. First, a search in the lit-

erature should be carried out for ontologies applied to

this domain and then an ontology developed with the

peculiarities of the analyzed domain.Finally, after the

development and integration of the proposed system,

the effective gains from its use must be evaluated.

REFERENCES

Abo Zidan, R., Almustafa, M., and Tahawi, M. (2019). On-

tology2rdb for storing ontology in relational database.

Journal of Theoretical and Applied Information Tech-

nology, 97:2229– 2240.

Afzal, H., Waqas, M., and Naz, T. (2016). Owlmap:

fully automatic mapping of ontology into relational

database schema. International journal of advanced

computer science and applications, 7(11):7–15.

Almeida, M. B. (2014). Uma abordagem integrada so-

bre ontologias: Ci

encia da informac¸

ao, ci

encia da

computac¸

ao e ﬁlosoﬁa. Perspectivas em Ci

encia da

Informac¸

ao, 19(3):242–258.

Aumueller, D., Do, H.-H., Massmann, S., and Rahm, E.

(2005). Schema and ontology matching with coma++.

In Proceedings of the 2005 ACM SIGMOD interna-

tional conference on Management of data, pages 906–

908.

Bernstein, P. A., Madhavan, J., and Rahm, E. (2011).

Generic schema matching, ten years later. Proceed-

ings of the VLDB Endowment, 4(11):695–701.

de Freitas, A. L. P. (2017). Proposta de uma Ontologia para

Tratamento de Diverg

encia de Dados na Aplicac¸

ao de

Fus

ao de Sensores em Plantas Industriais. Master’s

thesis, Universidade Federal do Rio Grande, Brasil.

Duarte, G. (2010). O Controle da Qualidade em Pro-

cessos de Produc¸

ao Mec

anica N

ao-Seriada. Escola

Polit

ecnica da Universidade de S

ao Paulo.

Elmasri, R. and Navathe, S. B. (2011). Database systems,

volume 9. Pearson Education Boston, MA.

Fowler, M. (2013). Schemaless data structures. URL

http://martinfowler. com/articles/schemaless.

Izquierdo, J. L. C. (2014). JSON discoverer tool. https:

//github.com/SOM-Research/jsonDiscoverer.

Izquierdo, J. L. C. and Cabot, J. (2013). Discovering im-

plicit schemas in json data. In International Confer-

ence on Web Engineering, pages 68–83. Springer.

Nascimento, R. S. (2010). Qualidade na soldagem em uma

empresa fabricante de estruturas met

alicas soldadas

do setor de

oleo e g

as. VI Congresso Nacional de Ex-

cel

encia em Gest

ao.

Panawong, J., Ruangrajitpakorn, T., and Buranarach, M.

(2016). An automatic database generation and ontol-

ogy mapping from owl ﬁle. In JIST (Workshops &

Posters), pages 20–27.

Petr

oleo-Brasileiro-S.A (2017). ABC da inspec¸

de fabricac¸

ao. https://www.petronect.com.br/

irj/go/km/docs/pccshrcontent/Site\%20Content\

%20\%28Legacy\%29/Portal2018/arquivos/cep/

abc inspecao.pdf.

Rachman, A. and Ratnayake, R. C. (2019). An ontology-

based approach for developing offshore and on-

shore process equipment inspection knowledge base.

In International Conference on Offshore Mechan-

ics and Arctic Engineering, volume 58783, page

V003T02A084. American Society of Mechanical En-

gineers.

Rodrigues, D. d. A. (2013). Casamento de esquemas de

banco de dados aplicando aprendizado ativo. Master’s

thesis, Universidade Federal do Amazonas, Brasil.

Staab, S. and Studer, R. (2010). Handbook on ontologies.

Springer Science & Business Media.

Wand, Y., Storey, V. C., and Weber, R. (1999). An ontologi-

cal analysis of the relationship construct in conceptual

modeling. ACM Transactions on Database Systems

(TODS), 24(4):494–528.

System Proposal for Integrating Quality Control Data of Components of the Brazilian Oil and Gas Industry

215