DIMENSION HIERARCHIES UPDATES IN DATA WAREHOUSES
A User-driven Approach
Cécile Favre, Fadila Bentayeb and Omar Boussaid
ERIC Laboratory, University of Lyon, 5 av. Pierre Mendès-France, 69676 Bron Cedex, France

Keywords: Data warehouse model evolution, dimension hierarchies updates, users' knowledge, aggregation rules.
Abstract: We designed a data warehouse for the French bank LCL to meet users' needs regarding decision support for marketing operations. However, the nature of the users' work implies that their requirements change often. In this paper, we propose an original and global approach to achieve a user-driven model evolution that provides answers to personalized analysis needs. We developed a prototype called WEDriK (data Warehouse Evolution Driven by Knowledge) within the Oracle 10g DBMS and applied our approach to banking data of LCL.
1 INTRODUCTION
In the context of a collaboration with the French bank LCL-Le Crédit Lyonnais (LCL), the objective was to develop the decision support system for marketing operations. To achieve this objective, data of interest coming from heterogeneous sources have been extracted, filtered, merged, and stored in an integrating database, called a data warehouse. Due to the role of the data warehouse in the daily business work of an enterprise like LCL, the requirements for its design and implementation are dynamic and subjective. Therefore, data warehouse model design is a continuous process which has to reflect the changing environment, i.e. the data warehouse model must evolve over time in reaction to the enterprise's changes.
Indeed, data sources are often autonomous and generally have an existence and purpose beyond that of supporting the data warehouse itself. An important consequence of this autonomy is that sources may change without being controlled by the data warehouse administrator. Therefore, the data warehouse must be adapted to any change which occurs in the underlying data sources, e.g. changes of the schemas. Besides these changes at the source level, the users often change their requirements. Indeed, the nature of their work implies that their requirements are personal and usually evolve, so they never reach a final state. Moreover, these requirements can depend on the users' own knowledge.
Our key idea then consists in considering specific users' knowledge, which can provide new aggregated data. We represent this users' knowledge in the form of aggregation rules. In a previous work (Favre et al., 2006), we proposed a formal data warehouse model based on these aggregation rules to allow schema evolution. In this paper, we extend our previous work and propose an original and complete data warehouse model evolution approach driven by emergent users' analysis needs, to deal not only with the schema evolution but also with the required evolution of the data.
To perform this model evolution, we define algorithms to update dimension hierarchies by creating new granularity levels and to load the required data by exploiting users' knowledge. Hence, our approach provides a real-time evolution of the analysis possibilities of the data warehouse to cope with personalized analysis needs. To validate our approach, we developed a prototype named WEDriK (data Warehouse Evolution Driven by Knowledge) within the Oracle 10g Database Management System (DBMS), and applied it to data of the French bank LCL.
The remainder of this paper is organized as follows. First, we detail our global evolution approach in Section 2. We present implementation elements in Section 3, detailing the necessary algorithms. In Section 4, we show, through a running example extracted from the LCL bank case study, how to use our approach for analysis purposes. Then, we discuss the state of the art regarding model evolution in data warehouses in Section 5. We finally conclude and provide future research directions in Section 6.
2 A USER-DRIVEN APPROACH
In this section, we present our user-driven global approach for updating dimension hierarchies to integrate new analysis possibilities into an existing data warehouse (Figure 1). The first phase of our approach is the acquisition of users' knowledge in the form of rules. In the integration phase, these rules are transformed into a mapping table within the DBMS. The evolution phase then creates a new granularity level. Finally, the analysis phase can be carried out on the updated data warehouse model. This is an incremental evolution process, since new analysis needs may appear at any time and find an answer. We detail the first three phases in this section.
Figure 1: Data warehouse user-driven evolution process.
2.1 Acquisition Phase
In our approach, we consider specific users' knowledge, which determines the new aggregated data to integrate into the underlying data warehouse. More precisely, this knowledge defines how to aggregate data from a given level to another one to be created. It is represented in the form of "if-then" rules.
To achieve the knowledge acquisition phase, our key idea is to define an aggregation meta-rule that represents the structure of the aggregation link. The meta-rule allows the user to define: the granularity level to be created, the attributes characterizing this new level, the granularity level on which the new level is based, and finally the attributes of the existing level used to build the aggregation link.
Let EL be the existing level, {EA_i, i=1..m} be the set of m existing attributes of EL on which conditions are expressed, GL be the generated level and {GA_j, j=1..n} be the set of n generated attributes of GL. The meta-rule is as follows:

if ConditionOn(EL, {EA_i}) then GenerateValue(GL, {GA_j})
Thus, the if-clause contains conditions on the attributes of the existing level. The then-clause contains the definition of the values of the generated attributes, which characterize the new level to be created. The user instantiates the above meta-rule to define the instances of the new aggregated level and the aggregation link with the instances of the existing level.
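For instance, with a hypothetical geographic hierarchy (the LCL case in Section 4 gives a real instantiation), a user wishing to group cities into regions could write:

if ConditionOn(CITY, {CityName}) then GenerateValue(REGION, {RegionName})
(R) if CityName ∈ {'Lyon', 'Bron'} then RegionName = 'Rhône-Alpes'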
In the literature, different types of aggregation links can compose dimension hierarchies to model real situations (Pedersen et al., 2001). In this paper we deal with classical links: an instance in a lower level corresponds to exactly one instance in the higher level, and each instance in the lower level is represented in the higher level. Thus, aggregation rules define a partition of the instances of the lower level, and each class of this partition is associated with one instance of the created level. To check that the aggregation rules satisfy this definition, we deal with the relational representation of aggregation rules, which is a mapping table. The transformation is achieved during the integration phase.
2.2 Integration Phase
The integration phase consists in transforming "if-then" rules into mapping tables and storing them in the DBMS by means of relational structures.
For a given set of rules which defines a new granularity level, we associate one mapping table. To achieve this mapping process, we first exploit the aggregation meta-rule to build the structure of the mapping table, and second we use the aggregation rules to insert the required values into the created mapping table. Since we are in a relational context, each granularity level corresponds to one relational table.
Let ET (resp. GT) be the existing (resp. generated) table, corresponding to EL (resp. GL) in a logical context. Let {EA_i, i=1..m} be the set of m attributes of ET on which conditions are expressed and {GA_j, j=1..n} be the set of n generated attributes of GT. The mapping table MT_GT built with the meta-rule corresponds to the following relation:

MT_GT(EA_1, ..., EA_i, ..., EA_m, GA_1, ..., GA_j, ..., GA_n)

The aggregation rules are then used to insert into this mapping table the conditions on the attributes EA_i and the values of the generated attributes GA_j.
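To fix ideas, here is a minimal Oracle SQL sketch of such a mapping table for one conditioned and one generated attribute (the column layout and types are our assumptions; conditions are stored here as predicate strings), continuing the hypothetical geographic example above:

CREATE TABLE MT_GT (
  EA1_condition VARCHAR2(200), -- condition from the if-clause
  GA1           VARCHAR2(50)   -- value generated by the then-clause
);
-- one row per aggregation rule
INSERT INTO MT_GT VALUES ('in (''Lyon'', ''Bron'')', 'Rhône-Alpes');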
In parallel, we define a mapping meta-table to gather information on the different mapping tables. This meta-table includes the following attributes: Mapping_Table_ID and Mapping_Table_Name correspond respectively to the identifier and the name of the mapping table; Attribute_Name and Table_Name denote the attribute involved in the evolution process and its table; Attribute_Type provides the role of this attribute. More precisely, this last attribute has two modalities: 'conditioned' if the attribute appears in the if-clause and 'generated' if it appears in the then-clause. Note that even if an attribute is a "generated" attribute, it can be used as a "conditioned" one for the construction of another level.
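A possible relational declaration of this meta-table is sketched below (column types and sizes are assumptions; the paper fixes only the attribute names and their roles):

CREATE TABLE MAPPING_META_TABLE (
  Mapping_Table_ID   NUMBER,       -- identifier of the mapping table
  Mapping_Table_Name VARCHAR2(30), -- e.g. 'MT_GT'
  Attribute_Name     VARCHAR2(30), -- attribute involved in the evolution
  Table_Name         VARCHAR2(30), -- its table (existing or generated level)
  Attribute_Type     VARCHAR2(11)
    CHECK (Attribute_Type IN ('conditioned', 'generated'))
);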
2.3 Evolution Phase
The data warehouse model evolution consists in updating the dimension hierarchies of the current data warehouse model by creating new granularity levels. The created level can be added at the end of a hierarchy or inserted between two existing levels; we thus speak of adding and inserting a level, respectively. In both cases, the rules have to determine the aggregation link between the lower level and the created level. In the second case, it is also necessary to determine, in an automatic way, the aggregation link between the inserted level and the existing higher level. The user can insert a level between two existing ones only if it is possible to semantically aggregate data from the inserted level to the existing higher level. For each type of dimension hierarchy update, we propose, in the following section, an algorithm which creates a new relational table and its necessary link(s) with existing tables.
3 IMPLEMENTATION
To achieve the data warehouse model updating, we developed a prototype named WEDriK (data Warehouse Evolution Driven by Knowledge) within the Oracle 10g DBMS. It exploits different algorithms whose sequence is represented in Figure 2.
Figure 2: Algorithms sequence.
The starting point of the algorithm sequence is a set of rules expressed by a user to create a granularity level. These rules are transformed into a mapping table (Algorithm 1). Then we check the validity of the rules by testing the content of the mapping table (Algorithm 2). If the rules are not valid, the user has to modify them; this process is repeated until the rules become valid. Once they are valid, the model evolution is carried out (Algorithm 3).
Transformation algorithm. For one meta-rule MR and its associated set of aggregation rules R, we build one mapping table MT_GT. The structure of the mapping table is designed with the help of the meta-rule, and the content of the mapping table is defined with the set of aggregation rules. Moreover, the necessary information is inserted into the meta-table MAPPING_META_TABLE (Algorithm 1).
Algorithm 1 Pseudo-code for transforming aggregation rules into a mapping table.
Require: (1) the meta-rule MR: if ConditionOn(EL, {EA_i}) then GenerateValue(GL, {GA_j}), where EL is the existing level, {EA_i, i=1..m} is the set of m attributes of EL, GL is the level to be created, and {GA_j, j=1..n} is the set of n generated attributes of GL; (2) the set of aggregation rules R; (3) the mapping meta-table MAPPING_META_TABLE
Ensure: the mapping table MT_GT
{Creation of MT_GT's structure}
1: CREATE TABLE MT_GT({EA_i}, {GA_j})
{Values insertion in MT_GT}
2: for all rule of R do
3:   INSERT INTO MT_GT VALUES (ConditionOn(EL, {EA_i}), GenerateValue(GL, {GA_j}))
4: end for
{Values insertion in MAPPING_META_TABLE}
5: for all EA_i of {EA_i} do
6:   INSERT INTO MAPPING_META_TABLE VALUES (MT_GT, EL, EA_i, 'conditioned')
7: end for
8: for all GA_j of {GA_j} do
9:   INSERT INTO MAPPING_META_TABLE VALUES (MT_GT, GL, GA_j, 'generated')
10: end for
11: return MT_GT
Constraints checking algorithm. For each tuple of MT_GT, we write the corresponding query to build a view which contains a set of instances of the table ET. First, we check that the intersection of the views, considered pairwise, is empty. Second, we check that the union of the instances of all the views corresponds to the whole set of instances of the table ET (Algorithm 2).
Model evolution algorithm. The model evolution consists in the creation of the table GT and the definition of the required links, according to the evolution type (Algorithm 3). For both an addition and an insertion, it consists in creating the table GT, inserting values by exploiting the mapping table MT_GT, and then linking the new table GT with the existing one ET. For an insertion, we only have the mapping table MT_GT, which contains the link between the lower table ET and the generated table GT; we then have to automatically establish the aggregation link between GT and ET2, by inferring it from the one that exists between ET and ET2.
Algorithm 2 Pseudo-code for checking constraints.
Require: existing table ET, mapping table MT_GT
Ensure: two booleans Intersection_constraint_checked and Union_constraint_checked
{Views creation}
1: for all tuple t of MT_GT do
2:   CREATE VIEW v_t AS SELECT * FROM ET WHERE Conjunction(t({EA}))
3: end for
{Checking that the intersection of the views, considered pairwise, is empty}
4: for x = 1 to t_max - 1 do
5:   for y = x + 1 to t_max do
6:     I = SELECT * FROM v_x INTERSECT SELECT * FROM v_y
7:     if I ≠ ∅ then
8:       Intersection_constraint_checked = false
9:     end if
10:   end for
11: end for
{Checking that the union of the views corresponds to the table ET}
12: U = SELECT * FROM v_1
13: for x = 2 to t_max do
14:   U = U UNION SELECT * FROM v_x
15: end for
16: if U ≠ SELECT * FROM ET then
17:   Union_constraint_checked = false
18: end if
19: return Intersection_constraint_checked, Union_constraint_checked
Algorithm 3 Pseudo-code for dimension hierarchy updating.
Require: evol_type the type of evolution (inserting or adding a level), existing lower table ET, existing higher table ET2, mapping table MT_GT, {EA_i, i=1..m} the set of m attributes of MT_GT which contain the conditions on attributes, {GA_j, j=1..n} the set of n generated attributes of MT_GT which contain the values of the generated attributes, the key attribute GA_key of GT, the attribute B linking ET with ET2
Ensure: generated table GT
{Creating the new table GT and making the aggregation link between ET and GT}
1: CREATE TABLE GT (GA_key, {GA_j});
2: ALTER TABLE ET ADD (GA_key);
3: for all tuple t of MT_GT do
4:   INSERT INTO GT VALUES (GA_key, {GA_j})
5:   UPDATE ET SET GA_key WHERE {EA_i}
6: end for
{If it is a level insertion: linking GT with ET2}
7: if evol_type = "inserting" then
8:   ALTER TABLE GT ADD (B);
9:   UPDATE GT SET B = (SELECT DISTINCT B FROM ET WHERE ET.GA_key = GT.GA_key);
10: end if
11: return GT
4 A RUNNING EXAMPLE
To illustrate our approach, we use a case study defined by the French bank LCL. LCL is a large company, where the data warehouse users have different points of view. Thus, they need specific analyses, which depend on their own knowledge and their own objectives. We applied our approach to data about the annual Net Banking Income (NBI). The NBI is the profit obtained from the management of customers' accounts. It is a measure observed according to several dimensions: CUSTOMER, AGENCY and YEAR (Figure 3).
Figure 3: Data warehouse model for the NBI analysis.
To illustrate the usage of WEDriK, let us focus on a simplified running example. Let us take the case of the person in charge of student products. He knows that there are three types of agencies: "student" for agencies which gather only student accounts, "foreigner" for agencies whose customers do not live in France, and "classical" for agencies without any particularity. He needs to obtain an NBI analysis according to the agency type. The existing data warehouse cannot provide such an analysis. We show step by step how to achieve this analysis with our approach.
Let us consider the samples of the tables AGENCY
and TF-NBI (Figure 4).
Figure 4: Samples of tables.
1-Acquisition phase. The acquisition phase allows the user to define the following aggregation meta-rule in order to specify the structure of the aggregation link for the agency type:

if ConditionOn(AGENCY, {AgencyID})
then GenerateValue(AGENCY_TYPE, {AgencyTypeLabel})
He instantiates the meta-rule to define the different agency types:

(R1) if AgencyID ∈ {'01903', '01905', '02256'} then AgencyTypeLabel = 'student'
(R2) if AgencyID = '01929' then AgencyTypeLabel = 'foreigner'
(R3) if AgencyID ∉ {'01903', '01905', '02256', '01929'} then AgencyTypeLabel = 'classical'
2-Integration phase. The integration phase exploits the meta-rule and the different aggregation rules to generate the mapping table MT_AGENCY_TYPE (Figure 5). The information concerning this mapping table is inserted in MAPPING_META_TABLE (Figure 6).
Figure 5: Mapping table for the AGENCY_TYPE level.
Figure 6: Mapping meta-table.
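Under the same assumptions as the sketches of Section 2 (conditions stored as predicate strings, assumed column layout), the integration phase would produce rows such as:

-- mapping table (Figure 5), one row per rule
INSERT INTO MT_AGENCY_TYPE VALUES ('in (''01903'', ''01905'', ''02256'')', 'student');
INSERT INTO MT_AGENCY_TYPE VALUES ('= ''01929''', 'foreigner');
INSERT INTO MT_AGENCY_TYPE VALUES ('not in (''01903'', ''01905'', ''02256'', ''01929'')', 'classical');
-- meta-table entries (Figure 6)
INSERT INTO MAPPING_META_TABLE VALUES (1, 'MT_AGENCY_TYPE', 'AgencyID', 'AGENCY', 'conditioned');
INSERT INTO MAPPING_META_TABLE VALUES (1, 'MT_AGENCY_TYPE', 'AgencyTypeLabel', 'AGENCY_TYPE', 'generated');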
3-Evolution phase. The evolution phase then creates the AGENCY_TYPE table and updates the AGENCY table (Figure 7). More precisely, MAPPING_META_TABLE is used to create the structure of the AGENCY_TYPE table, and a primary key is automatically added. The MT_AGENCY_TYPE mapping table is used to insert the required values. Furthermore, the AGENCY table is updated to be linked with the AGENCY_TYPE table, with the addition of the AgencyTypeID attribute.
Figure 7: Created AGENCY_TYPE table and updated AGENCY table.
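Concretely, the evolution phase amounts to statements of the following shape (a sketch; key generation and the column types are assumptions, handled in practice by the prototype):

CREATE TABLE AGENCY_TYPE (AgencyTypeID    NUMBER PRIMARY KEY,
                          AgencyTypeLabel VARCHAR2(50));
INSERT INTO AGENCY_TYPE VALUES (1, 'student');
INSERT INTO AGENCY_TYPE VALUES (2, 'foreigner');
INSERT INTO AGENCY_TYPE VALUES (3, 'classical');
ALTER TABLE AGENCY ADD (AgencyTypeID NUMBER);
-- one UPDATE per rule, e.g. for (R1):
UPDATE AGENCY SET AgencyTypeID = 1
WHERE AgencyID IN ('01903', '01905', '02256');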
4-Analysis phase. Finally, the analysis phase exploits the model with its new analysis axes. Usually, in an OLAP environment, queries require the computation of aggregates over various dimension levels. Indeed, given a data warehouse model, the analysis process summarizes data by using (1) aggregation operators such as SUM and (2) GROUP BY clauses. In our case, the user wants to know the sum of NBI according to the agency types he has defined. The corresponding query and the results are provided in Figure 8.

Figure 8: Query and results for NBI analysis.
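The query has the following shape (a sketch; the fact table name TF_NBI and its measure column NBI are taken from Figure 4, while the join columns are assumptions):

SELECT t.AgencyTypeLabel, SUM(f.NBI) AS sum_NBI
FROM TF_NBI f
JOIN AGENCY a ON f.AgencyID = a.AgencyID
JOIN AGENCY_TYPE t ON a.AgencyTypeID = t.AgencyTypeID
GROUP BY t.AgencyTypeLabel;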
5 RELATED WORK

When designing a data warehouse model, involving users is crucial (Kimball, 1996). What about the data warehouse model evolution? According to the literature, model evolution in data warehouses can take the form of model updating or temporal modeling.
Concerning model updating, only one model is supported and the trace of the evolutions is not preserved. In this case, the data historization can be corrupted and analyses could be erroneous. We distinguish three types of approaches. The first consists in providing evolution operators that allow an evolution of the model (Hurtado et al., 1999; Blaschka et al., 1999). The second consists in updating the model, focusing on enriching dimension hierarchies. For instance, some authors propose to enrich dimension hierarchies with new granularity levels by exploiting semantic relations provided by WordNet (http://wordnet.princeton.edu/) (Mazón and Trujillo, 2006). The third is based on the hypothesis that a data warehouse is a set of materialized views. Then, when a change occurs in a data source, it is necessary to maintain the views by propagating this change (Bellahsene, 2002).
Contrary to model updating, temporal modeling makes it possible to keep track of the evolutions by using temporal validity labels. These labels are associated with either dimension instances (Bliujute et al., 1998), aggregation links (Mendelzon and Vaisman, 2000), or versions (Eder and Koncilia, 2001; Body et al., 2002). Versioning is the subject of much current work. Versions are also used to provide an answer to "what-if analysis" by creating versions to simulate a situation (Bébel et al., 2004). Several works then focus on analyzing data across versions, in order to achieve the first objective of data warehousing: analyzing data in the course of time (Morzy and Wrembel, 2004; Golfarelli et al., 2006). In these works, an extension of traditional SQL is required to take into account the particularities of the approaches for analysis or data loading.
Neither of these two approaches directly involves users in the data warehouse evolution process; they thus constitute solutions for data source evolution rather than for the evolution of users' analysis needs. These solutions make the data warehouse model evolve without taking into account new analysis needs driven by users' knowledge, and thus without providing an answer to personalized analysis needs. However, the personalization of the use of a data warehouse is becoming crucial. Work in this domain focuses particularly on the selection of the data to be visualized, based on users' preferences (Bellatreche et al., 2005). In contrast to this approach, we add supplementary data to extend and personalize the analysis possibilities of the data warehouse. Moreover, our approach updates the data warehouse model without inducing erroneous analyses, and it does not require language extensions for analysis.
6 CONCLUSION AND FUTURE WORK

In this paper, we provided an original user-driven approach for data warehouse model evolution. Our key idea is to involve users in the updating of dimension hierarchies, to provide an answer to their personalized analysis needs based on their own knowledge. To achieve this process, we propose a method to acquire users' knowledge and the required algorithms. We developed a prototype named WEDriK within the Oracle 10g DBMS and we applied our approach to the LCL case study.
To the best of our knowledge, the idea of a user-defined evolution of dimensions (based on analysis needs) is novel; there are still many aspects to be explored. First of all, we intend to study the performance of our approach in terms of storage space, response time and algorithm complexity. Moreover, we have to study how individual users' needs evolve over time, and thus to consider different strategies for updating rules and mapping tables. Finally, we are also interested in investigating the joint evolution of the data sources and the analysis needs.
REFERENCES
Bébel, B., Eder, J., Koncilia, C., Morzy, T., and Wrembel, R. (2004). Creation and Management of Versions in Multiversion Data Warehouse. In XIXth ACM Symposium on Applied Computing (SAC 04), Nicosia, Cyprus, pages 717–723.
Bellahsene, Z. (2002). Schema Evolution in Data
Warehouses. Knowledge and Information Systems,
4(3):283–304.
Bellatreche, L., Giacometti, A., Marcel, P., Mouloudi, H.,
and Laurent, D. (2005). A Personalization Frame-
work for OLAP Queries. In VIIIth ACM International
Workshop on Data Warehousing and OLAP (DOLAP
05), Bremen, Germany, pages 9–18.
Blaschka, M., Sapia, C., and Höfling, G. (1999). On Schema Evolution in Multidimensional Databases. In Ist International Conference on Data Warehousing and Knowledge Discovery (DaWaK 99), Florence, Italy, volume 1676 of LNCS, pages 153–164.
Bliujute, R., Saltenis, S., Slivinskas, G., and Jensen, C.
(1998). Systematic Change Management in Dimen-
sional Data Warehousing. In IIIrd International Baltic
Workshop on Databases and Information Systems,
Riga, Latvia, pages 27–41.
Body, M., Miquel, M., Bédard, Y., and Tchounikine, A. (2002). A Multidimensional and Multiversion Structure for OLAP Applications. In Vth ACM International Workshop on Data Warehousing and OLAP (DOLAP 02), McLean, Virginia, USA, pages 1–6.
Eder, J. and Koncilia, C. (2001). Changes of Dimension
Data in Temporal Data Warehouses. In IIIrd Interna-
tional Conference on Data Warehousing and Knowl-
edge Discovery (DaWaK 01), volume 2114 of LNCS,
pages 284–293.
Favre, C., Bentayeb, F., and Boussaid, O. (2006). A
Knowledge-driven Data Warehouse Model for Anal-
ysis Evolution. In XIIIth International Conference on
Concurrent Engineering: Research and Applications
(CE 06), Antibes, France, volume 143 of Frontiers
in Artificial Intelligence and Applications, pages 271–
278.
Golfarelli, M., Lechtenborger, J., Rizzi, S., and Vossen, G.
(2006). Schema Versioning in Data Warehouses: En-
abling Cross-Version Querying via Schema Augmen-
tation. Data and Knowledge Engineering, 59(2):435–
459.
Hurtado, C. A., Mendelzon, A. O., and Vaisman, A. A.
(1999). Maintaining Data Cubes under Dimension
Updates. In XVth International Conference on Data
Engineering (ICDE 99), Sydney, Australia, pages
346–355.
Kimball, R. (1996). The Data Warehouse Toolkit. John
Wiley & Sons.
Mazón, J.-N. and Trujillo, J. (2006). Enriching Data Warehouse Dimension Hierarchies by Using Semantic Relations. In XXIIIrd British National Conference on Databases (BNCOD 2006), Belfast, Northern Ireland, volume 4042 of LNCS, pages 278–281.
Mendelzon, A. O. and Vaisman, A. A. (2000). Temporal
Queries in OLAP. In XXVIth International Confer-
ence on Very Large Data Bases (VLDB 00), Cairo,
Egypt, pages 242–253.
Morzy, T. and Wrembel, R. (2004). On Querying Versions
of Multiversion Data Warehouse. In VIIth ACM Inter-
national Workshop on Data Warehousing and OLAP
(DOLAP 04), Washington, Columbia, USA, pages 92–
101.
Pedersen, T. B., Jensen, C. S., and Dyreson, C. E. (2001).
A Foundation for Capturing and Querying Com-
plex Multidimensional Data. Information Systems,
26(5):383–423.