Gathering Formalized Information Requirements
of a Data Warehouse
Natalija Kozmina, Laila Niedrite and Janis Zemnickis
Faculty of Computing, University of Latvia, Raina blvd. 19, Riga, Latvia
Keywords: Data Warehouse, Information Requirements, Indicators, Formalization, Graphical User Interface.
Abstract: Information requirements of a data warehouse (DW) captured in natural language often have a common
issue of being ambiguous, inaccurate, or repeating. We offer an approach to formalize DW information
requirements based on our experience of using demand-driven methodology for DW conceptual design and
distinction between quantifying and qualifying data. In this paper we demonstrate a working prototype of
the iReq tool implemented for the purpose of collecting DW information requirements. Graphical user
interface (GUI) of the iReq tool conforms to the requirement formalization metamodel acquired as a result
of our previous research studies, is intuitive and user-friendly, and allows to define an unlimited number of
requirement counterpart elements. The functionality of the iReq tool is wide; it allows deriving a conceptual
model of a DW in a semi-automatic manner from gathered information requirements. Due to space
limitations, in this paper we cover only such components as GUI for input of the information requirements
illustrated with application examples, its underlying formal requirement repository, and a graph database
(DB) to represent a glossary of terms for requirement definition.
1 INTRODUCTION
Companies use DW systems to evaluate their
progress, they measure different aspects of
performance and analyze performance indicators. "A
data warehouse is a subject-oriented, integrated,
non-volatile, and time-variant collection of data in
support of management decisions" (Inmon, 2002).
Data are stored according to a multidimensional data
model of a DW. Data model must be implemented in
alignment with the information requirements
(Winter & Strauch, 2003) of a company.
Information requirements represent quantifying
data needed for performance measurement and a lot
of contextual attributes or so called qualifying data
that allow to analyze numerical performance
measurements in different perspectives and at
various levels of abstraction. Information
requirements must be appropriately transformed to
the elements of multidimensional paradigm, e.g.
dimensions, hierarchies, and cubes.
Methods for developing conceptual models of
DWs are either supply-driven or demand-driven.
Supply-driven methods use models of data sources
to determine the existing information requirements
and transform them into a DW model in a more or
less automated way. However, these methods do not
rely on actual analysis needs – instead they reflect
only operational data. Information requirements in
case of demand-driven methods (e.g. goal-oriented
or requirement-oriented) are collected during the
interviews. It is essential to elicit and analyze
typically unstructured information in a guided
manner during interviews (Prakash, 2016). Then, a
conceptual model is constructed based on the
knowledge about these requirements. To (partially)
automate this process, information requirements
must be represented formally. It is possible only
when some unified pattern in the formulated
information requirements can be discovered. A
similar problem of lacking a unified method for
eliciting and managing information requirements
also exists in the field of goal-oriented requirement
engineering for a DW system (Nasiri et al., 2015).
When the DW is used for performance
measurement purposes, information requirements
represent different performance indicators that are
expressed as more or less complex sentences with
similar structure. We could observe the similar
structure of these sentences while analyzing a set of
indicators from indicator database (Parmenter,
2010).
Kozmina, N., Niedrite, L. and Zemnickis, J.
Gathering Formalized Information Requirements of a Data Warehouse.
DOI: 10.5220/0006276802170224
In Proceedings of the 19th International Conference on Enterprise Information Systems (ICEIS 2017) - Volume 1, pages 217-224
ISBN: 978-989-758-247-9
Copyright © 2017 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
217
We have proposed a formal specification of
indicators in one of our previous works (Niedritis et
al., 2011), and a method for transforming formally
expressed information requirements or indicators
into a conceptual model of a DW (Kozmina et al.,
2013). The formal specification of requirements
(Niedritis et al., 2011) is represented with an
indicator metamodel that is based on analysis of the
structure of sentences that express requirements.
Results were used in a real DW project to extend the
metamodel with elements discovered in terms of a
case study (Kozmina & Niedrite, 2014).
These experiments were made without an
appropriate tool support and the need for such tool
was urgent, so a prototype tool was implemented.
The goal of this paper is to discuss pertinence of the
method for formal indicator definition and discover
further improvements.
The rest of the paper is organized as follows.
Background of the study is given in Section 2,
Section 3 describes the working principles and
examples of application of the iReq tool for input of
formalized DW information requirements, related
work is presented in Section 4, whereas Section 5
finalizes the paper with conclusions and future work.
2 BACKGROUND
In the course of our research, we found out that there
is a need of: (i) models for representing
requirements in a more formal way, and (ii) (semi-
automatic) methods to handle the process of
mapping requirements to design.
A schematically captured semi-automated
method for transforming information requirements
to the conceptual model of a DW could be seen in
Figure 1. The first step of the method is creating a
repository of formal requirements. The method uses
a set of requirements that complies with the
requirement formalization metamodel (Kozmina &
Niedrite, 2014) and is stored in the formal
requirement repository. The GUI of iReq tool
presented in this paper ensures requirement input
using a glossary of terms (and its synonyms) derived
from data elements of the DW source systems and
stored in a graph database (see Section 3). Then, (a)
simplified DW schema(s) is (are) acquired as an
output of the Pre-schema Generation Algorithm
(PGA) that analyses the structure of collected
requirements. Later, a developer processes pre-
schemas, for instance removes duplicate attributes or
builds hierarchies. One of the improved schemas
that meets the requirements best is chosen during an
interview with a client to instantiate a conceptual
model of the DW. Finally, requirement priorities
(Kozmina & Niedrite, 2014) are analyzed to learn,
for example, which of the planned reports should be
developed prior to others, which schema elements to
incorporate into dashboards, etc.
A Requirement mapping component can adjust
DW schema in accordance with evolving business
requirements in a semi-automatic fashion
(Solodovnikova et al., 2015). The algorithm
processes new and obsolete requirements and
deploys procedures that create new DW schema
versions. The algorithm takes advantage of the DW
logical and physical level metadata, requirement
formalization metamodel, and information about
data elements of the DW source systems to
propagate changes in requirements in the DW
schema.
This paper covers such components as GUI for
formal requirements input, its underlying repository,
and a graph DB to represent a glossary of terms. A
detailed explanation of other components is a subject
of a separate paper.
Figure 1: Method for transforming information requirements to the conceptual model of a DW.
Formal
requirement
repository
Pre-schema
Data warehouse
conceptual
model
Copy pre-schema
elements
Reports and
Dashboards
Analyze
priorities
and
hierarchies
GUI for formal
requirement
input
Enter
requirements
Requirement
mapping
component
Handle evolving
requirements
Graph DB
component
Provide terms
and synonyms
Pre-schema evaluation
and improvement
Developer
Pre-schema
acceptance
Client
Start End
Provide terms
and synonyms
PGA
Provide information
from DW source systems
ICEIS 2017 - 19th International Conference on Enterprise Information Systems
218
3 iREQ TOOL FOR FORMAL
REQUIREMENTS
MANAGEMENT
iReq is a web-based tool with a back-end written in
PHP using Laravel framework and a front-end (GUI)
developed using JavaScript, CSS, jQuery
framework, and Bootstrap. DW indicators are stored
in the database management system (DBMS)
MariaDB, and information about DW source
systems and a glossary – in a graph database Neo4j.
Main requirements for the iReq GUI are: (i)
conformance to the requirement formalization
metamodel, (ii) ability to define and save
requirements, (iii) intuitive and user-friendly GUI,
and (iv) ability to define an unlimited number of
child elements. Current implementation of the iReq
tool meets the requirements, however, as the number
of requirement counterpart elements grows, the
demonstrativeness of the requirement gets lower. It
is particularly noticeable when iReq tool is used on a
device with a small screen.
3.1 An Example of Requirement
Formalization
We refer to the version of our proposed requirement
formalization metamodel published in (Kozmina &
Niedrite, 2014). However, in the course of iReq tool
development, we updated the metamodel with two
more classes – i.e. Business Process and
Stakeholder. Both classes have an added value, e.g.
data on Business Process improves the traceability
of requirements, while data on Stakeholder makes it
possible to process conflicting requirements stated
by different stakeholders. Data on both Business
Process and Stakeholder derived from the
formalized requirements and associated with
corresponding DW schema elements would help
manage user rights on the stage of analytical
reporting, and keep track of DW schema changes in
terms of certain business processes.
In this section we describe the requirement
formalization metamodel with iReq running
examples of iReq application depicted on Figure 2
and Figure 3. Informal requirements are as follows:
Show information on student to academic staff ratio
in each faculty” (see formal version on Figure 2) and
Show information on students from Riga that attend
lectures held in Latvian” (see formal version on
Figure 3). The bottom row elements represent parts
of the formal requirement statement, whereas the
remaining elements contain the model class names.
The key element of the metamodel is a
Requirement. Typically, a Requirement emerges in
terms of one or another Business Process (e.g. Study
process) and is expressed by a Stakeholder (e.g.
University Senate). Each requirement is
characterized by one or multiple Groups (e.g.
Strategic Plan 2010-2020), while a Theme as a
coarser level of grouping (e.g. Studies) may form
one or more groups. Each requirement is assigned a
Priority value according to MoSCoW prioritization
technique (e.g. should). Each Requirement classified
either as Simple or Complex. A Complex
Requirement is composed of two or more
Requirements either joined with an Arithmetical
Operator (e.g., ‘/’) or a Comparison, or not joined
with anything.
As results of the case study in (Kozmina &
Niedrite, 2014) show, a Simple Requirement may
consist of an Expression only – in this case, it allows
to compare a part of the requirement with some
expression or a pre-defined constant value.
A Complex Expression contains two or more
Expressions with an Arithmetical Operator in
between, whereas a Simple Expression can be
Qualifying data (e.g. employee_group, language,
city) or a Constant (e.g. “academic staff”, “LV”,
“Riga”). A Simple Requirement may consist of an
Operation that denotes a command applied to an
Object, and an optional Typified Condition. A
Complex Operation consists of two or more Actions,
which are either Aggregation (“roll-up”; e.g. count)
or Refinement (“drill-down”; e.g. show). In its turn,
an Object is either an instance of Quantifying data
(measurements, e.g. student, employee) or
Qualifying data (attributes or properties of
measurements, e.g. faculty, employee_group).
Slicing of information about Objects is ensured by
adding a constraint, i.e. a Typified Condition as seen
on Figure 3: “where language = ‘LV’ and city =
‘Riga’ ”. A Complex Condition joins two or more
conditions (e.g. “language = LV”, “city = Riga”)
with a Logical Operator from the set of values:
{“and”, “or”, “not”}. A Simple Condition consists of
a Comparison (values: {>, <, <=, >=, =, “is”}) of
two Expressions.
Gathering Formalized Information Requirements of a Data Warehouse
219
Figure 2: An example of the information requirement “show information on student to academic staff ratio in each faculty”
represented as a complex formalized requirement show (faculty) count (student) / count (employee) where employee_group
= “academic staff” in iReq tool.
Principles of reformulation of information
requirements (indicators) applied in examples above:
A component to be measured is treated as an
aggregated number of all occurrences of this
component. For example, “students” is
reformulated to “count (student)” (also
acceptable: “count (student occurrence)”), where
count is the most suitable aggregate function;
If a requirement contains such keywords as “%”,
“percent”, “percentage”, or “ratio”, then % is
treated as division of partial quantity by its total
quantity, and ratio – a comparison of two
different quantities by division. For instance,
“student-to-staff ratio” is reformulated to “count
(student) / count (employee)”.
ICEIS 2017 - 19th International Conference on Enterprise Information Systems
220
Figure 3: An example of the information requirement “show information on students from Riga that attend lectures held in
Latvian” represented as a simple formalized requirement with complex condition count (student) where language = “LV”
and city = “Riga” in iReq tool.
The full list of principles of information
requirements (indicators) reformulation is available
in (Niedritis et al., 2011). These principles appeared
empirically and serve to translate requirements from
natural language to a state that is compatible with
the requirement formalization metamodel.
3.2 Configuration File
Main purpose of the iReq tool is to provide input of
formalized indicators in accordance with the
requirement formalization metamodel (Kozmina &
Niedrite, 2014). Classes and relationships defined in
the metamodel are translated into GUI component of
the iReq tool by means of the configuration file in
JSON format that describes the metamodel.
Figure 4(a) depicts a fragment of the metamodel
illustrating a Requirement element (which is either
Simple or Complex), and a Complex requirement
that contains one or many Requirement elements.
Figure 4(b) demonstrates a corresponding part of the
GUI with Requirement root element and two child
elements (i.e. Simple and Complex Requirements)
as a result of processing a configuration file.
The configuration file consists of a set of objects
and their attributes, where each object defines some
class of the metamodel. Attribute examples are id
(unique ID, e.g. “arithOper”), name (e.g.
“Arithmetical Operator”), parent, action, and values.
An example of the action attribute value is
“dropdown”, which enables a dropdown menu with
a set of values. Examples of values attribute are:
“+”, “-”, “*”, “/” of the Arithmetical Operator
object. Attribute parent contains a set of values that
refer to parent objects of a particular element.
A fragment of the configuration file is shown on
Figure 4(c), and it contains 3 objects (“req”,
“simpleReq”, and “complexReq”), and its parent-
child relations. Although the configuration file
ensures indicator input that conform to the
Gathering Formalized Information Requirements of a Data Warehouse
221
metamodel, currently it doesn’t prevent from all
error cases, e.g. a user can add more than one
“Arithmetical Operator” elements in a row.
(a)
(b)
(c)
Figure 4: Parent-child relations of the elements presented
as a fragment of the (a) metamodel, (b) GUI of the iReq
tool, (c) configuration file of the iReq tool.
3.3 Glossary of Terms & Term
Suggestions
In iReq tool, a user describes quantifying and
qualifying data with words and phrases, which are
later employed in pre-schema generation algorithm.
To decrease the rate of typos and semantic mistakes,
iReq uses both (i) previously entered words and
phrases stored in formal requirement repository, and
(ii) a glossary of terms.
A glossary of terms in iReq tool is derived from
data elements of the DW source systems (either
manually or exported from the DW source system as
a .CSV file) and implemented as a Neo4j graph DB.
Terms are connected by links of different kind. It
can be easily modified and supplemented with new
terms and its synonyms.
(a)
a link to indicate table attribute or term synonym
a term or its synonym
name of a table in the DW source system
(b)
Figure 5: A fragment of the (a) glossary of terms and (b)
data input with term suggestions.
An instance of the graph is seen on Figure 5,
where a DW source system contains a table
"Course" with columns "Title", "Language", and
"Credits". Column and table names are connected
with a "hasAttr" link. Terms "Course title" and
ICEIS 2017 - 19th International Conference on Enterprise Information Systems
222
"Course name" are added to the glossary on Figure
5, and are connected to an attribute "title" with a
"hasSyn" link that denoted that these two terms are
synonyms of "title". When a user would start typing
"title", he/she could see "Course title" and "Course
name" as possible options.
Although it is recommended to select terms from
the glossary, a user may decide not to choose a term,
if the glossary seems incomplete. It is particularly
useful, if there is a need to state a new information
requirement on the data that doesn't exist in a DW
source system and in the glossary respectively. Such
requirements may indicate a necessity for adding
new data to the DW source system.
3.4 Data Model of iReq Formal
Requirement Repository
Each requirement indeed has a hierarchical tree
structure where the complete formalized requirement
itself is a root node linked to its counterpart elements
by a parent-child relationship (e.g. a Complex
Requirement contains Simple Requirements).
Besides, one parent element may have an arbitrary
number of children. When creating a new child
node, iReq tool keeps track of its parents. All
elements are parsed starting from the children at the
lowest level, moving up the tree data structure from
children to parents, and stored to the database.
The data model describes how requirement data
entered via iReq tool are physically stored in the
database. Table classes stores data on all the
elements that each requirement consists of, including
their type that corresponds to the metamodel class
(Quantifying Data, Action, Simple Condition, etc.),
while class_rels stores data on its relations
according to parent-child structure. Data on
stakeholders, business processes, and its relations
(stakeholders_rels, business_processes_rels) are
stored in a similar fashion. Each requirement is
associated with a particular theme and group. Also,
each successfully saved or updated requirement is
interpreted as an event; its data is stored in table reqs
and its counterpart elements are tied together with
reqs_id.
We deliberately didn’t divide requirement
elements into simple and complex on physical level.
The element type is captured in classes.type, while
parent-child relations and cardinality restrictions
stated in the requirement formalization metamodel
are defined in the configuration file. Thus, if
relations between classes change with time or new
classes of requirement counterpart elements appear
in the metamodel, it doesn’t affect the physical data
model.
4 RELATED WORK
The iReq tool presented in this paper is a working
prototype based on demand-driven methodology for
deriving a conceptual model of a DW semi-
automatically. Below we list several papers that
address the same issue.
In (Cravero Leal et al., 2013) a business-oriented
approach for DW development is presented.
VMOST (vision, mission, goals, strategies,
objectives, and tactics) business strategy analysis is
followed by the alignment of the elements according
to BMM or Business Motivational Model (OMG,
2015). Then, i* strategic dependency (SR) and
strategic rationale (SR) models are developed that
later are manually transformed into a
multidimensional UML class diagram. Cravero Leal
et al. (2013) provide a set of guidelines for
producing a conceptual model of a DW, however, a
prototype tool is yet to come.
In its turn, a tool that generates a DW conceptual
model in an automatic manner is described in
(Thenmozhi & Vivekanandan, 2013). It employs a
hybrid design approach (both demand-driven and
supply-driven) and takes advantage of matching
information requirements formally represented using
ontology with the data source ontology. In terms of
this approach, a designer has to state explicitly
goals, context (to derive attributes), and measures,
whereas in iReq the distinction of attributes and
measures is done automatically.
Pardillo & Mazón (2011) discuss applications of
ontologies in the context of DW design such as
incompleteness of multidimensional model,
specification of additivity constraints, reconciling
requirements and data sources, etc. For instance, if
the aim is to automatize the process of matching
information requirements with data sources, it is
advised to employ semantic knowledge from
multidimensionally-annotated ontologies. Above-
mentioned issues may be a subject for future work
on improvement of the iReq tool.
5 CONCLUSIONS AND FUTURE
WORK
In this paper we demonstrated the iReq tool
implemented for the purpose of collecting DW
Gathering Formalized Information Requirements of a Data Warehouse
223
information requirements (i.e. indicators). GUI of
the iReq tool conforms to the requirement
formalization metamodel (Kozmina & Niedrite,
2014), is intuitive and user-friendly, and allows to
define an unlimited number of requirement
counterpart elements.
DW information requirements input by mean of
the iReq tool may take up more time, if a
requirement consists of a large set of counterpart
elements, because each element has to be added
separately using GUI. An experienced iReq user,
who has no difficulties with defining formal
requirements manually, might want to enter DW
information requirements as an input expression that
would be processed by iReq tool and saved into the
formal requirement repository. This feature is not
available in current version of the iReq tool, but is
planned to be added to iReq GUI in the future.
The aim of this paper was not to discuss further
use of the collected indicators with a purpose to
generate a DW candidate schema (i.e. pre-schema)
semi-automatically according to the process depicted
on Figure 1. Module of the iReq tool, which
generates pre-schemas, handles formal requirements
in compliance with particular algorithms (Kozmina
et al., 2013), optimization mechanisms, and
produces graphical representation of the DW pre-
schemas. It is planned to give user an opportunity to
manually accept, reject, or unite pre-schemas to
acquire an optimal DW conceptual model that is
aligned to user requirements. The analysis of such
functionality of the iReq tool, its implementation,
and practical evaluation of its adequacy in terms of
generation of the DW conceptual model are a
subject of a separate paper.
REFERENCES
Cravero Leal, A., Mazón, J.-N., Trujillo, J., 2013. A
business-oriented approach to data warehouse
development. Ingenieria e Investigación, 33(1):59-65.
Inmon, W.H., 2002. Building the Data Warehouse, Wiley
Computer Publishing, 3rd edition.
Kozmina, N., Niedrite, L., 2014. Extending a Metamodel
for Formalization of Data Warehouse Requirements.
In: Johansson, B. et al. (eds.) Perspectives in Business
Informatics Research (BIR'14), LNBIP 194, Springer,
Berlin, pp. 362-374.
Kozmina, N., Niedrite, L., Golubs, M., 2013. Deriving the
Conceptual Model of a Data Warehouse from
Information Requirements. In: Proc. of the 15th Int.
Conf. on Enterprise Information Systems (ICEIS'13),
vol. 1, pp. 136-144.
Nasiri, A., Zimányi, E., Wrembel, R., 2015. Requirements
Engineering for Data Warehouses. Revue des
Nouvelles Technologies de l'Information (EDA),
RNTI-B-11:49-64.
Niedritis, A., Niedrite, L., Kozmina, N., 2011.
Performance Measurement Framework with Formal
Indicator Definitions. In: J. Grabis, M. Kirikova (eds.)
Perspectives in Business Informatics Research
(BIR'11), LNBIP 90, Springer, Berlin, pp. 44-58.
OMG, The Business Motivational Model, 2015.
Available online: http://www.omg.org/spec/BMM/1.3.
Pardillo, J., Mazón, J.-N., 2011. Using Ontologies for the
Design of Data Warehouses. Int. Journal of Database
Management Systems (IJDMS), 3(2):73-87.
Parmenter, D., 2010. Key Performance Indicators:
Developing, Implementing, and Using Winning KPIs,
Jon Wiley & Sons, Inc., 2nd edition.
Prakash, D., 2016. Eliciting Information Requirements for
DW Systems. In: Proc. of 28th Int. Conf. on Advanced
Information Systems Engineering (CAiSE'16, Doctoral
Consortium). Available online:
http://ceur-ws.org/vol-1603/10000044.pdf.
Solodovnikova, D., Niedrite, L., Kozmina, N., 2015.
Handling Evolving Data Warehouse Requirements. In:
Communications in Computer and Information
Science, vol. 539, pp. 334-345.
Thenmozhi, M.; Vivekanandan, K. A Tool for Data
Warehouse Multidimensional Schema Design using
Ontology. International Journal of Computer Science
Issues (IJCSI), 10(2):161-168, March 2013.
Winter, R., Strauch, B., 2003. A Method for Demand-
driven Information Requirements Analysis in Data
Warehousing Projects. In: Proc. of 36th Annual
Hawaii Int. Conf. on System Sciences (HICSS'03),
USA, IEEE, pp. 1359-1365.
ICEIS 2017 - 19th International Conference on Enterprise Information Systems
224