OBSERVABILITY OF INFORMATION IN DATABASES
New Spins in Data Warehousing for Credit Risk Management
Stjepan Pavlek and Damir Kalpić
Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, Zagreb, Croatia
Keywords: Observability, data warehousing, data mining, free text fields, coding schemes, chart of accounts.
Abstract: In this paper the observability of information in modern databases is investigated. Observable information is information that is explicitly stored in a database. Unobservable information is information hidden in various coding schemes, transaction streams or free-text description fields. Nowadays credit risk management tends to employ cutting-edge technologies and approaches from the fields of statistics and machine learning to achieve its goals. It is often forgotten that machine learning schemes can only use completely observable information. The issue is eventually addressed when the data is used, but it should instead be addressed during the data warehouse design phase.
1 INTRODUCTION
The data warehousing branch of computer science has focused on easy information retrieval as one of its top priorities ever since its inception.
In the area of straightforward information retrieval it is possible to identify several layers at which the problem is being addressed. Furthermore, we can notice the focus advancing over time from the first to the last of these layers.
At first, the focus was on technical issues regarding the organization of the database and the algorithms used to access and manipulate the data in it. Some of the breakthrough ideas and concepts that belong to this layer are: Kimball's star schema (Kimball, 1996), the hash join algorithm (Bratbergsengen, 1984) and bitmap indices (O'Neil, 1987).
The knowledge involved in making progress at this layer is technical, computer-science knowledge, with minimal or no involvement of “business users”. We can moreover note that the preceding decade (the nineties) and the first part of the current decade were the golden age of this layer.
The second layer can be recognized as a focus on data integration. It has been gaining momentum since the beginning of the current decade and is among the top buzzwords today. The spiritus movens here is that we can see more of our business, learn more about our customers, or manage our risk better if we are able to query data across all of our platforms, across different transaction systems supporting different aspects of our business, etc.
The achievements in this layer can be depicted as advancing through different sublayers within it, as through stages. It is one thing to pull customer data from different databases into a single database (or to be able to query different databases transparently from a single user interface, with a single user query). It is another thing to have these “lists” of customers matched, so that the same customer from the two sources is recognized as the same. But this is still just the beginning: furthermore, we need to have the attributes (which mainly means the domains of the attributes) unified across the sources. Then there are hierarchies and classifications built on top of customers, and so on.
This is the story of data integration (the short version). Operating in this area requires considerable understanding of the data in business terms, whether embodied in one person or in a team. We need to understand the attributes describing a database entity and to know exactly what the meaning of its domain values is, in order to recognize when the same thing has been given two different names or, vice versa, when two different things are being called by the same name.
2 DEFINITIONS AND PROBLEM
SETUP
After data integration and the surrounding concepts (like data cleansing), the focus shifts to the next layer at which we can consider the ability to retrieve information from a data warehouse. This level does not yet carry a well-known brand name, like Star schema or Data integration, so for this article we have given it the name Observability of information. Making advancements at this level will not merely require a perfect understanding of the way the business sees the data and its structure. We will instead see our opportunity in initiating change in the way the business sees the data, in the way the business does business, at least as far as its information organization and information keeping are concerned. If this sounds somewhat vague or confusing, that is good; it means you need to read the rest of the paper in order to clear things up.
To wrap up the introduction and the first paragraph of this section, we can say that thinking at this layer of observability of the data is the topic of this paper.
2.1 The Term Observability in Other Disciplines
The term observability originates from control theory, where it is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs (Wikipedia, 2008).
In the software industry the term has been used to describe the lack of ability to assess which code exactly is being executed (what is going on) in complex software environments. In conjunction with software layering (a technique that enables us to, for example, program an application without also programming a database engine, an operating system, etc.), the term observability is used to describe the problem of not being able to observe the unintentional actions that are provoked in the lower layers of the software stack while programming at a higher abstraction layer. When mentioned in the context of databases, observability again refers to software observability, with the meaning of being able to observe what is going on in a system.
The term “data observability”, besides being used in the control industry, is used in machine learning to describe how much of the data on which the learning is based is observable to the machine learning algorithm. “Information observability” and “observability of information” have been used sporadically in economics, sociology, risk management, etc.
2.2 Observability of Information in
Databases
The term “observability of information in databases” (or similar) could not be found using internet search engines, so we assume it has not been in use yet.
With this term we want to follow the well-known “data” versus “information” versus “knowledge” distinction, as established in data mining terminology, where each succeeding term implies more context, more meaning and more structure.
For instance, a database could store the data that customer A has the value 3 for the attribute “customer code”. To a business user who understands the data, this value, 3, can give the information that customer A resides in a foreign country, because all domestic customers have codes 1 and 2, and all foreign customers have codes 3 and 4.
Let us define that in cases like this the information (regarding residence) is stored in the database, because it is possible to determine for each customer in the customers table whether it is a domestic or a foreign customer. Let us also define that the information is not explicitly written, because in order to determine the residence of each customer (or of any customer in this case), one needs to know “something more” – the meaning of the codes 1, 2, 3 and 4.
The definition of observability can now be given as: information that is not explicitly stored in a database is less observable than information that is explicitly stored.
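To illustrate the point, here is a minimal SQL sketch, assuming a hypothetical CUSTOMERS table with CUSTOMER_ID and CUSTOMER_CODE columns: the residence can be produced only because the query writer supplies the meaning of the codes, which the database itself does not state.

  -- A minimal sketch; CUSTOMERS and its columns are assumed for illustration.
  -- The mapping of codes 1-4 to residence lives only in the query writer's head.
  SELECT customer_id,
         CASE WHEN customer_code IN (1, 2) THEN 'Domestic'
              WHEN customer_code IN (3, 4) THEN 'Foreign'
         END AS residence
  FROM   customers;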
3 EXAMPLES OF
UNOBSERVABLE
INFORMATION
The definition of observability in the previous section was given through an obviously textbook example. Our goal in this section is to enumerate typical real-world situations in which information in databases is stored in an unobservable manner.
We give three such classes here, without any aspiration for the list to be exhaustive. Simply, the aim of our research is not to thoroughly cover all the flavours in which unobservable information could
appear, but rather to raise awareness of the issue and to offer an idea of how to handle it.
3.1 Free Text Fields
We start with the easiest and most obvious example, but there is a special characteristic we want to point out here.
Free text fields are, of course, necessary in every database and in every data warehouse as well. But their usage should be strictly limited to data which by its nature is “free text” and has no structure, like names.
3.1.1 When it Occurs
Free text used for storing structured information in an unstructured manner usually appears when the developers of the source (transaction) systems judge that some piece of information is applicable only to a small percentage of the entities in a class (rows in a table), that it is too complex to model properly, and – this one is decisive – that it will not be used programmatically (in code), so there is no point in bothering to develop support for storing it in a structured way (which could include adding fields to existing tables, creating new tables in the model, etc.).
When such a decision is made, it is usually wrong for at least two reasons. The first is that the information will be used programmatically in the data warehouse, if not in the transactional system; someone will want to group by this data point, filter by it, etc., otherwise it would not have been asked for in the information system in the first place. The second reason is that even if it is not being used programmatically today, a new feature request is coming tomorrow.
3.1.2 Amending the Definition of
Observability of Data
And the characteristic mentioned at the beginning: in the definition of observability it was said that information is unobservable if there is “something else” one needs to know in order to understand it. With free text fields this often is not the case.
For instance, if the information about a customer's income is not stored in the designated field titled “income”, but the value “Income = 5.000” is instead stored in the field “comment”, there is no extra knowledge one needs in order to retrieve the information from the data written in the database. Everything needed is already written in the comment field. The catch is that the “instruction” on what is actually contained in the text field is not readable by a computer; it is readable only by a human.
If we wanted to be very strict, we would need to amend our definition of observability to cover this case.
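As a hedged illustration (the COMMENT column and the wording “Income = 5.000” come from the example above; everything else is assumed), compare reading a designated column with scraping the same fact out of free text:

  -- Observable: the income sits in its own column.
  SELECT customer_id, income
  FROM   customers;

  -- Unobservable: the same fact buried in free text has to be scraped out
  -- with string functions and breaks as soon as the wording changes.
  SELECT customer_id,
         TRIM(SUBSTRING(comment FROM POSITION('Income =' IN comment) + 8)) AS income_text
  FROM   customers
  WHERE  comment LIKE '%Income =%';

The second query is fragile by construction, which is precisely why such information cannot be considered observable to a machine.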
3.2 Transaction Streams and Similar
The second example we will look at is transaction streams. Suppose there is a table with all the transactions that were made against one loan account in a bank. This data also implicitly contains all the behavioural information one could ask about this loan: which amount is due but not paid, what is the greatest number of days in the history of this loan that the due amount was outstanding, what is the average time-weighted past due amount, and so forth.
3.2.1 Comparing with Free Text Fields
This example is positioned at the opposite pole from the previous case of free text fields. While in the case of free text fields the information hidden in the data was obvious to the human consumer but unintelligible to the computer, in this case the information is impossible for a human to read out, and computer crunching is the only way to get to it. Still, there is a long way to go from computers “needing” to extract this information to them actually doing so.
This situation can appear in two ways. One is when it takes extraordinary programming effort to design an algorithm that accurately calculates what was asked. An example would be to calculate how many days in total the loan had an outstanding due amount. (Perhaps this does not sound that complicated, but in a real-world situation it could take months to crack.)
The second way is when the algorithm is trivial but computationally highly demanding, thus still representing a significant implementation problem. An example of such a problem would be to calculate the average balance in the previous year for each account in a bank, from the table of all of the bank's transactions from the last year.
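The following is a minimal sketch of this second kind of problem, assuming a hypothetical TRANSACTIONS table with the columns ACCOUNT_ID, TX_DATE and AMOUNT. It approximates the average end-of-day balance over the days on which an account had activity; a properly time-weighted average would additionally have to carry balances across the days without transactions, which is exactly what makes the full problem computationally demanding.

  -- A minimal sketch; the TRANSACTIONS table and its columns are assumed.
  WITH per_day AS (
    SELECT account_id, tx_date, SUM(amount) AS day_amount
    FROM   transactions
    WHERE  tx_date < DATE '2008-01-01'
    GROUP  BY account_id, tx_date
  ),
  daily AS (
    SELECT account_id, tx_date,
           SUM(day_amount) OVER (PARTITION BY account_id
                                 ORDER BY tx_date) AS eod_balance
    FROM   per_day
  )
  SELECT account_id, AVG(eod_balance) AS avg_balance_prev_year
  FROM   daily
  WHERE  tx_date >= DATE '2007-01-01'
  GROUP  BY account_id;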
3.2.2 Comparing with Data Mining
The approach of extracting information through a vast number of calculations against raw data is exactly what data mining does. There is, however, a difference between the problem class we talk about here and the class of problems usually addressed with data mining (or classical statistics) techniques.
In theory, an appropriate data mining scheme would find a way to calculate some complex measure,
given that this measure is relevant for the target attribute (e.g. the probability of default for a customer). In practice it is unlikely that any data mining algorithm would be successful in the case of very complex calculations. It would be unacceptably suboptimal to leave such calculations to “good fortune” instead of doing them explicitly.
Therefore, there is no case for skipping the generation of complex attributes (calculated measures) which are known to be relevant.
3.3 Coding Schemes
The third case concerns coding schemes in the context of observability versus unobservability of information in a database.
This example is central to this paper because, in contrast to the first two examples, coding schemes are in general rarely recognized as a structure that aggravates information retrieval rather than making it easier.
3.3.1 What are Coding Schemes
The vocabulary definition of the term “coding scheme” says: a coding scheme is a set of rules that maps the elements of one set, the coded set, onto the elements of another set, the code element set (Institute for Telecommunication Sciences, 2000). The term is mostly used in the telecommunications field.
In databases, a coding scheme would be an entity used to group, or hierarchise, instances of some other entity. The elements (table rows) of a coding scheme typically consist of the fields “code” and “name”. The code is usually a string of numerals and the name is a text field describing the meaning of the code.
For instance, let us take our previous example of the customer attribute “customer code”, with values 1, 2, 3 and 4, where 1 and 2 denoted resident customers and 3 and 4 foreign ones. If we defined a table with four rows, the numbers one to four in one column and descriptions such as “Resident natural persons”, “Resident legal persons”, “Foreign natural persons” and “Foreign legal persons” in the second column, we could say we have defined a coding scheme.
Often the hierarchy in a coding scheme is reflected already in the way the codes themselves are composed. In our example it would be typical to have the codes 00, 01, 10 and 11 instead of 1, 2, 3 and 4. It could then be defined that the first position denotes residence (domestic or foreign) and the second position denotes whether the customer is a legal or a natural person.
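A minimal sketch of such a composed coding scheme (all names are hypothetical) shows that the residence can be recovered only by dissecting the code, i.e. by knowing “something more” than what the tables state:

  -- A minimal sketch, all names hypothetical: the hierarchy is encoded in
  -- the positions of the code itself.
  CREATE TABLE customer_codes (
    code  CHAR(2)      PRIMARY KEY,   -- '00', '01', '10', '11'
    name  VARCHAR(100) NOT NULL       -- e.g. 'Resident natural persons'
  );

  -- Residence is recoverable only by picking the code apart.
  SELECT c.customer_id,
         CASE SUBSTRING(k.code FROM 1 FOR 1)
              WHEN '0' THEN 'Domestic' ELSE 'Foreign' END AS residence
  FROM   customers c
  JOIN   customer_codes k ON k.code = c.customer_code;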
3.3.2 Why Coding Schemes are Bad Modelling
In the definition of unobservability it was said that it is unobservable that the value 1 carries the information about the customer being a resident natural person, because the one querying the database needs to know “something else” – the meaning of the codes.
It seems that by creating the coding scheme table we have solved the problem – the information is now stored in the database. Here is how it really stands: if we define two attributes, one named “residence”, with the values “Foreign” and “Domestic”, and another with the values “Legal person” and “Natural person”, we have done a good job and have not introduced the coding schemes problem. That is not the kind of coding scheme we are trying to depict here as problematic.
The example with four values is a minimalistic one, constructed to convey the idea. In reality, coding schemes grow to hundreds or thousands of rows. Then the name of a category does not say “domestic” or “foreign”; it can say something like: “Foreign natural person customer that is under the supervision of the KS department and also has a relation with ABC bank in France.” So the problem with coding schemes is that the problem of missing information in the database gets “solved” in such a way that the information is modelled in a free text field. That is slightly better than no information – there is, for instance, a smaller probability of the needed piece of information getting lost – but it is still quite inadequate, for the reasons discussed in the subsection on free text fields.
3.3.3 The Grand Coding Scheme
And now we arrive at the greatest coding scheme one can encounter in a bank: the chart of accounts. The chart of accounts is a list of all accounts tracked by an accounting system. Wikipedia further explains that a chart of accounts “should be designed to capture financial information to make good financial decisions”. This definition gives a good idea of what the chart of accounts used to be.
The chart of accounts is a hierarchical structure of codes that group material value (either balances or transactions). At the top level the grouping is usually into assets, liabilities, equity, income and expenses. Each of these categories further breaks down by the company's products or services, by the types of counterparties to which the product or service is related, and by numerous other attributes, such as information about terms, currencies, adjustments, risk provisions, etc. The list is virtually endless. For any classical accountant this is a must, a cornerstone.
In order to understand their perspective we need to do some time travelling. Let us go back, for example, to the nineteen sixties and imagine how business reporting might have functioned at that time. Today we crunch millions of rows of analytical data about customers or transactions, using a large number of attributes for filtering and grouping the data. But what could we do without computers? The answer is simple: there is only a finite, rigorously limited and relatively small number of groupings we can do – we can group only on the attributes that are included in the chart of accounts coding scheme.
We can easily show the total amount of assets or liabilities the company holds. We can further see how much is in short-term instruments or deals and how much in long-term ones, given that short versus long term is the next category dividing the accounts. But if, for instance, there is no special account for short-term loans given to banks and a special account for short-term loans given to companies, but only one account for all short-term loans, there is no way to give out those two figures in a report. It could be done only by going back to the analytics and arriving at those numbers by summarizing transaction by transaction. And that is unrealistic to do without computers.
Of course, from today's perspective, when we can always go back to the analytics, issue a GROUP BY statement on any attribute and (relatively) quickly get a result, all this is nonsense. We could just ignore the chart of accounts and everything about it. But it is not at all simple to do that. We should bear in mind that this is one of the oldest surviving management tools. The way accounting is done has not changed much since double-entry bookkeeping was first introduced by Benedikt Kotruljević of the Dubrovnik Republic back in 1458 A.D. (Phillips, Brook, 2003). It is understandable that something that has survived for so long is not easy to kill.
The problem is that the entire system is built around these accounts. The way businesses and banks function is built on top of the chart of accounts. The neural networks in human brains are adapted to this perspective of the world. Practically, these are the issues:
1. The data included in the chart of accounts does not exist outside of it – there was no need to develop a model in the database to support handling data that is “already included” in the chart of accounts;
2. If there is data outside of the account codes, it is wrong or unreliable – since all the reports are based on the accounts, it was hard to keep discipline and maintain the accuracy of data that is not being used anyway;
3. It is not actually known what the business demands are outside of the accounts; business people “speak in accounts”. The report specification does not say: “summarize the balance of the due interest amount of all loans to type B customers”. Instead, it says: “summarize the balances of all accounts that start with the numbers 3, 4 or 5, that have the numbers 1 and 0 in fourth and fifth place, but do not end with 7, or have the number 2 in sixth place …” (a hypothetical SQL rendering of such a specification is sketched at the end of this section);
4. The entire system is built on accounts. It works (yes, but how?). It would be too big a change; it seems impossible.
These are the main complications that keep the accounts alive. Not just alive – a central reporting tool.
4 SHORTCOMINGS OF A DATABASE WITH UNOBSERVABLE INFORMATION
We have used the central part of the paper to explain which cases we call “cases of unobservable information”, where these cases appear, why they appear, and why it is hard to eliminate them. Let us now spend a few paragraphs describing what exactly the impact of unobservable data on our systems is. Again, as when listing the different cases of unobservable data, we want to depict a couple (three) of such cases, without an attempt to be exhaustive.
4.1 Relying on Accounts in Reporting
Supporting reporting is why the accounts were created in the first place – so why should this be a problem?
If we recall point three in the previous section, where the report specification was given in the “language of accounts”, we can imagine there will be countless such specifications in modern organizations. And it is becoming a huge burden to maintain them.
Some decades ago reporting was not as dynamic as it is today. There was a set of standard reports that needed to be produced every so often. Today there is an explosion of demands for ever new and ever more complex reports, both from regulatory authorities and from business users. The scope of what
is being reported on expands continuously. How does that change the role of the chart of accounts in reporting from a great tool into a nuisance? Let us construct a small case study.
Suppose there is a report in a bank that groups balances according to the classification of the customers that hold those balances. Since the chart of accounts the bank uses groups first by products and then by customer types, each customer type appears repeatedly in different branches of the hierarchical tree of accounts, once under each product. So the list of accounts that has to be taken into consideration for the report is “number of products” times “number of customer types” long.
Now let us suppose a new product is added to the bank's portfolio. In the chart of accounts, a new branch will be created for that product, and in it, accounts will be created for each customer type. The list of accounts in the specification of the mentioned report then needs to be amended. This is the case.
Projected to the real world: because of new reporting demands, business growth, mergers, etc., the chart of accounts – which perhaps once was a stable structure – has become live and changes daily. There are thousands of lists of accounts that need to be amended (in the worst case) every time the chart of accounts is changed.
Now let us take a look at an alternative design. The report is not implemented through a list of accounts; instead it relies on the “customer type” attribute of the CUSTOMER table. In order to prepare the report, an SQL join is performed between the table ANALYTIC_BALANCES and the table with customers, the result set is grouped by the customer type attribute, and the AMOUNT column is aggregated with the SUM operator.
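A minimal sketch of this attribute-based report, assuming that ANALYTIC_BALANCES carries a CUSTOMER_ID join key and that the attribute is named CUSTOMER_TYPE:

  -- A minimal sketch; the join key and exact column names are assumed.
  SELECT c.customer_type,
         SUM(b.amount) AS total_balance
  FROM   analytic_balances b
  JOIN   customers c ON c.customer_id = b.customer_id
  GROUP  BY c.customer_type;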
The bank starts a new product. A new row needs to be added to the PRODUCTS table. There is no impact on the aforementioned report whatsoever.
The practical consequences of the difference between these two approaches are immeasurable.
4.2 Code Stability
Often there is code which executes conditionally, depending on the value of one or more attributes of the data. For instance, interest rates need to be calculated and then debited or credited to the correct accounts; warnings need to be sent to the owners of outstanding past due debts, etc.
Again, as with reporting, the exact specifications of how this should be done rely either on account codes or on the attributes of the involved entities. All the considerations mentioned while discussing reporting hold here as well.
In both cases there is an inclination towards making errors if the “relying on accounts” principle is practiced. Not every time will every list of accounts that is impacted be correctly amended. With time the situation deteriorates: lists that were supposed to specify the same groups of accounts (the same amount of money in the end) no longer do so, because over time different mistakes were made in maintaining them.
Regarding the inclination towards errors, it is notable that mistakes made by code that manipulates data (like automatic booking of interest) are graver than the ones that only produce errors in reports.
4.3 Data Mining Algorithms
It is impossible to use information that is hidden in text fields or in coding schemes for data mining. Because of the nature of data mining algorithms, it is preferable to keep every independent piece of information in a separate, well-defined attribute – a table column (Witten, Frank, 2000).
What has just been said is well known to anyone who has ever dealt with data mining in practice.
However, we must understand that business users may see things somewhat differently. For instance, they may infer: through the chart of accounts the information A is clearly specified; the consultants, experts on data mining, told me that if this information is available, we can reach the demanded goal.
“Data mining algorithms” might be too narrow a title for this section. Besides data mining algorithms there is a wider field of modern trends of automating decision making, or supporting it. Balanced scorecards developed for monitoring the performance of the business, credit scoring scorecards for automating loan approval processes and other similar techniques do not necessarily rely on a machine learning scheme. But all these modern techniques have in common that they cannot make use of unobservable data in a data warehouse.
5 PATHS TO SOLUTIONS
Solving the problems described in this paper will surely take much more work from the wider pool of researchers and practitioners who encounter these problems in the industry.
However, as we were encountering them in our own work, we began to treat them, and we have tried out a couple of strategies and approaches to combat these
problems and annul their consequences. It is therefore probably worth sharing some of these approaches that might have potential.
5.1 “Attributising” Coding Schemes
It has been said before that it is impractical to simply abandon coding schemes, the strongest reason being that too much has been built on them.
A huge practical problem is the incompatibility of the two ways of constructing logic. If we simply replaced logic built on the chart of accounts with logic built on the attributes of customers, contracts, balances and other entities, it would be extremely hard to compare the old and the new way in order to find and clear out errors in the new methodology.
We have come up with a two-stage approach that chains the two logics and builds attribute-based logic on top of the chart of accounts logic. This might sound awkward, but it actually works well and already improves the model significantly.
So, what has been done: we identify a piece of logic – let us take the example from earlier, a report that groups balances according to the classification of the customers that hold those balances. There is an old logic in place: a list of accounts connected to each category in the customer classification is maintained.
The next thing we do is implement a new attribute in the chart of accounts table: the customer category for which the account is meant. We (the accounting department, actually) fill in the values of the new attribute for each account in the chart of accounts table.
Now we replace the old logic of a list of accounts with the new logic built on the new attribute in the accounts table. We can directly compare the new logic with the old one, because the new logic does not select only the list of customers: it selects the list of accounts as well, and the two lists of accounts can be compared directly.
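A hedged sketch of the two steps follows; the column and table names are assumed, and REPORT_X_ACCOUNT_LIST stands for one of the old, manually maintained lists of accounts.

  -- Step 1: expose the hidden information as an explicit attribute
  -- (names assumed for illustration).
  ALTER TABLE chart_of_accounts ADD customer_type VARCHAR(30);
  -- ... the accounting department fills in customer_type for every account ...

  -- Step 2: the new logic still yields a list of accounts, so it can be
  -- reconciled directly against the old, manually maintained list.
  SELECT account_code FROM chart_of_accounts WHERE customer_type = 'Group A'
  EXCEPT
  SELECT account_code FROM report_x_account_list;   -- accounts only in the new logic

  SELECT account_code FROM report_x_account_list
  EXCEPT
  SELECT account_code FROM chart_of_accounts WHERE customer_type = 'Group A';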
And the major benefits of switching the logic have already been achieved. First, if new accounts are opened for new products, the correct values of the new attribute “customer type” will be chosen in the process of opening them, and there will be no need to amend thousands of lists of accounts. Secondly, as the logic has been switched to an attribute that explicitly says that an account belongs to “group A”, there is no more danger that a new account will be added to the group A accounts in one list and not in another: the list is defined as “the list of accounts that are group A”, and the new account will, depending on whether the attribute is set or not, be added to group A in either all lists or none. This is much better than the case when an error can appear in only some of the lists, because such an error is much more likely to stay unnoticed.
The process ends when a separate attribute in the chart of accounts table has been created for each piece of information hidden in the account codes. This at the same time means that all the unobservable information from the chart of account codes has been exposed in explicit attributes.
We are not far now from switching from these newly created attributes to natural attributes (the natural place for the customer type attribute is still the customers table, not the chart of accounts table). But this still might not happen, for instance for legal reasons: in many countries, if not all, the regulations demand that booking and reporting be done based on the chart of accounts, so we cannot throw it away.
5.2 Decomposing Coding Schemes
The second technique we might want to employ is splitting one coding scheme into several. A coding scheme that groups, for example, customers both into a geographical hierarchy and into the industry they operate in could be split into two hierarchies: one for geography and another for industries.
In this way we can move, step by step, from an attribute with unobservable information towards attributes with observable information.
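A minimal sketch of such a split, with all names hypothetical: the combined customer classification is replaced by two independent ones.

  -- A minimal sketch, all names hypothetical.
  CREATE TABLE geography_codes (geo_code      CHAR(2)      PRIMARY KEY,
                                geo_name      VARCHAR(100) NOT NULL);
  CREATE TABLE industry_codes  (industry_code CHAR(3)      PRIMARY KEY,
                                industry_name VARCHAR(100) NOT NULL);

  ALTER TABLE customers ADD geo_code      CHAR(2) REFERENCES geography_codes (geo_code);
  ALTER TABLE customers ADD industry_code CHAR(3) REFERENCES industry_codes (industry_code);
  -- The old combined code can be kept during migration and dropped once
  -- all reports read the two new attributes.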
5.3 Pruning Coding Schemes
If an entity, e.g. “contracts”, has several attributes, one of which is a coding scheme while the others are nice, observable, nominal attributes, and if some information is contained both in the codes of the scheme and again in some of the independent attributes, then the coding scheme should be pruned in order to exclude the repeated information.
Suppose we have a coding scheme “products” attached to the entity contracts. It divides the contracts into loans, current accounts, guarantees, etc., but it also divides them into short and long term, domestic or foreign, and similar. So a single code from the coding scheme can have a meaning like: “short-term loan in local currency”.
The information that the loan is in the local currency then exists in two places: in the code and in the separate attribute “currency”. What we can do in this case is restructure the coding scheme in such a way that it is impossible to read out from it any information that is stored in other attributes.
Or, for practical reasons, we might want to create a new coding scheme which includes only the information not contained in other attributes.
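A hedged sketch of this last alternative, with all names hypothetical: the pruned scheme keeps only the product type, while currency and term remain solely as contract attributes.

  -- A hypothetical sketch of pruning: the old composite code mixed product
  -- type, term and currency, while contracts already carry CURRENCY and
  -- TERM as separate attributes, so the new scheme keeps only the product type.
  CREATE TABLE product_types (
    product_type_code CHAR(2)      PRIMARY KEY,
    product_type_name VARCHAR(100) NOT NULL    -- 'Loan', 'Current account', ...
  );

  ALTER TABLE contracts ADD product_type_code CHAR(2)
    REFERENCES product_types (product_type_code);

  -- Currency or term breakdowns now read the dedicated attributes,
  -- never the composite code.
  SELECT currency, SUM(balance) AS total_balance
  FROM   contracts
  GROUP  BY currency;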
6 CONCLUSIONS
We started the introduction with very wide considerations of the issues of easy information retrieval from data warehouses. After naming the technical and data integration aspects, we focused on a new topic of consideration and called it “observability of information in databases”.
After discussing, describing and defining this new topic as a whole in section two and the first two parts of section three, in the last part of section three we moved to a particular niche within the observability topic where we see that the biggest benefits could be achieved: the area of coding schemes, concentrating mostly on one specific scheme, the chart of accounts.
The reason for this is that the chart of accounts is still the central point for reporting and analyses in today's banks, although it is a huge obstacle in the process of moving to analytics, as presented in section four.
In section five we presented a viable approach that makes it possible to effectively cancel out the negative aspects of the unobservability of information in the chart of accounts. At the same time, this approach leaves the possibility of keeping the chart of accounts in place where it needs to stay due to current regulations.
REFERENCES
Bratbergsengen, K., 1984. Hashing Methods and Relational Algebra Operations. Proceedings of the 10th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc.
Institute for Telecommunication Sciences, 2000. Telecommunication Standard Terms Glossary. 2006 edition.
Kimball, R., 1996. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons.
O'Neil, P., 1987. Model 204 Architecture and Performance. Proceedings of the 2nd International Workshop on High Performance Transaction Systems. Springer-Verlag, London.
Phillips, T., Brook, S., 2003. The Romance of Double-Entry Bookkeeping, American Mathematical Society, https://ams.org/featurecolumn/archive/book1.html
Witten, I., Frank, E., 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, University of Waikato, Morgan Kaufmann Publishers.
Wikipedia, 2008. http://en.wikipedia.org/wiki/Observability