A CLUSTER FRAMEWORK FOR DATA MINING MODELS

An Application to Intensive Medicine

Manuel Santos

Departamento de Sistemas de Informação, Universidade do Minho, Guimarães, Portugal

João Pereira

Escola Superior de Educação de Viana do Castelo, Viana do Castelo, Portugal

Álvaro Silva

Serviço de Cuidados Intensivos, Hospital Geral de Santo António, Porto, Portugal

Keywords: Clinical Data Mining, Clustering, Knowledge Discovery from Databases, Artificial Neural Networks, Organ Failure, Mortality Predicting Models, Intermediate Outcomes, Intensive Medicine.

Abstract: Clustering is widely applied in Data Mining problems because of the granularity, accuracy and adjustment of the models it induces. Despite these results, the approach generates a considerably large set of models, which hinders their comprehension, visualization and application to new cases. This paper presents a framework, supported by a three-dimensional matrix structure, to deal with this problem. The usability and benefits of this instrument are demonstrated through a case study in the area of intensive medicine.

1 INTRODUCTION

Medical prognosis plays an increasing role in health care, particularly in critical care medicine. This has led the medical community to take a more active interest in developing models for mortality prediction and organ failure diagnosis based on Artificial Intelligence (AI) techniques (Hanson et al, 2001), which enable doctors to act pro-actively. This is a critical task, since the early detection of malfunctions in the organism may allow physicians to respond quickly with therapy. In this context, the existence of large Databases (DB) containing Intensive Care Unit (ICU) clinical information motivates and enables the application of Data Mining (DM) techniques (Cios et al, 2002), within a Knowledge Discovery from Databases (KDD) process, to induce prediction models of organ failure more efficiently than other approaches (e.g., Logistic Regression) (Gilles et al, 2001). The Sequential Organ Failure Assessment (SOFA) (Vincent et al, 1998; Moreno et al, 1999) scores the degree of dysfunction of an organ on a scale from 0 to 4. Moreover, multiple organ failure (Goris et al, 1985) greatly increases the probability of the patient's death. The score is evaluated by the doctors on a daily basis, at a considerable cost in time and resources. This process is fallible and depends on the doctor's expertise.

Previous work in this area produced predictive models characterized by their generality and, consequently, by limited accuracy, specificity and sensitivity. The main issue affecting the efficiency of such models is their adjustment to the individual patient. This work aims to resolve that bottleneck by proposing a framework for clustering the patients' prediction models, which arranges a set of predictive models (e.g., decision trees, artificial neural networks) in a three-dimensional matrix.

Considering the admission data and other variables collected on the admission day, as well as the Clinical Adverse Events (CAEs) that occurred during the patient's stay in the ICU, it is possible to predict the failure of each organ for the day following the last day of collected data (time series). A total of 72 models were created using a data set built from the EURICUS II study, carried out in 42 ICUs in 9 EU countries between 1997 and 1999 (http://www.frice.nl). The results showed the effectiveness of the proposed approach. Five of the clusters presented maximum values (100%) simultaneously for accuracy, specificity and sensitivity. For these kinds of patients, doctors will get very useful support for their decisions.

Santos M., Pereira J. and Silva Á. (2005).
A CLUSTER FRAMEWORK FOR DATA MINING MODELS - An Application to Intensive Medicine.
In Proceedings of the Seventh International Conference on Enterprise Information Systems, pages 163-168
DOI: 10.5220/0002523601630168
SciTePress

The paper is organized as follows: after these introductory considerations, the second and third sections present the clinical data and some definitions of events and critical events; the fourth and fifth sections introduce the process of data preparation, transformation and model generation; the sixth and seventh sections are dedicated to the results (presenting the achieved accuracies) and to the contributions (the framework to organize the models); the eighth section concludes the article.

2 CLINICAL DATA

In this study a database was created based on EURICUS II, a study carried out in 42 ICUs in 9 EU countries between 1997 and 1999. Every admission to the ICU over a period of 10 months was included. This database integrates the features related to the case-mix (Fetter et al, 1980), namely: Age; Type of Admission (unscheduled surgery, scheduled surgery and medical); Admission Source (Operating Bloc, Recovery Room, Emergency Room, Infirmary, other ICU, other Hospital, other sources); Diagnosis; Gravity Index defined by SAPSII (Le Gall et al, 1993); SOFA of each Organ System (Respiratory, Coagulation, Liver, Cardiovascular, Central Nervous and Renal); Mortality in the ICU and in the Hospital; Number of CAEs for each of the continuously monitored parameters; Length of Stay; and Admission Day.

By definition, an organ is considered to fail when its SOFA score is greater than or equal to 3 on a 0 to 4 scale.

In this study, of the 5355 patients admitted to the ICUs, 4425 (82.63%) stayed for two or more days, 3105 (57.98%) for three or more days and 2329 (43.49%) for four or more days. For the data concerning the fifth day of stay, only 1845 (34.45%) patients were considered.

3 CLINICAL ADVERSE EVENTS

Events (Ev) or Critical Events (CrEv) are occurrences of values outside the established limits for the four physiologic variables that are monitored continuously: the Heart Rate (HR), the Systolic Blood Pressure (BP), the Oxygen Saturation (SaO2) and the Urine Output (Diur). A group of clinical specialists determined the intervals considered normal for each of these parameters. Adverse events were defined as binary variables indicating whether or not the variable is within the established limits (and, if it is outside them, for how long). An Event is recorded when the value of the analyzed parameter stays outside the limits for a continuous period of at least 10 min. (1 h in the case of Diur) and less than 60 min. (2 h in the case of Diur).

An Event is also recorded when values fall outside the limits discontinuously, in episodes shorter than 10 min. each, within a window of at most 30 min., provided that the sum of those episodes is greater than or equal to 10 min.

The definition of a Critical Event is similar to that of an Event, but with different thresholds: the 10 min. in the definition of Events is replaced by 1 hour and the 30 min. by 2 hours; for Diur, 2 hours are considered instead of 1.

A Critical Event can also be defined in some special situations, i.e., when the value of the analyzed parameter falls within certain ranges.

A new event can only be considered after a recovery period, with values inside the intervals, of 30 min. or more for BP, SaO2 and HR, and of 2 hours or more for Diur. For Critical Events, the recovery period should be greater than 2 hours for Diur and greater than 60 minutes for the remaining variables.
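The continuous-duration part of these definitions can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and dictionary names are hypothetical, and the sketch covers a single continuous out-of-range episode, leaving out the discontinuous rule, the special-value rule for Critical Events and the recovery periods.

```python
# Minimum continuous out-of-range duration, in minutes, per category.
# The thresholds come from the definitions in this section; the names
# EVENT_MIN, CRITICAL_MIN and classify_episode are illustrative assumptions.
EVENT_MIN = {"HR": 10, "BP": 10, "SaO2": 10, "Diur": 60}
CRITICAL_MIN = {"HR": 60, "BP": 60, "SaO2": 60, "Diur": 120}


def classify_episode(variable: str, minutes_out_of_range: float) -> str:
    """Label one continuous out-of-range episode of a monitored variable.

    Returns "critical event", "event" or "none". Discontinuous episodes,
    special value ranges and recovery periods are not handled here.
    """
    if variable not in EVENT_MIN:
        raise ValueError(f"unknown variable: {variable}")
    if minutes_out_of_range >= CRITICAL_MIN[variable]:
        return "critical event"
    if minutes_out_of_range >= EVENT_MIN[variable]:
        return "event"
    return "none"
```

For instance, under these assumptions a 15-minute excursion of HR is labelled an event, whereas the same duration for Diur is not.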

4 DATA PREPARATION

A data preparation phase was necessary to treat wrong or missing data. In addition, not all variables were used to generate the prediction models; age, for example, was left out because it is already accounted for in the SAPSII score.

Table 1 shows the variables that were considered in this study and their description. For modelling purposes, six new binary variables were created, based on the six SOFA values, according to the expression:

$$
\begin{cases}
0, & \text{if } \mathrm{SOFA}_{org} < 3 \quad \text{(false, no organ failure)} \\
1, & \text{otherwise} \quad \text{(true, organ dysfunction)}
\end{cases}
$$

ICEIS 2005 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS


where org ∈ {Respiratory, Coagulation, Liver, Cardiovascular, Central Nervous, Renal} stands for the organ system.
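As a minimal sketch, the binarization above could be written in Python as shown below. The function names and the dictionary layout are assumptions made for illustration; only the threshold (a SOFA score of 3 or more means failure) and the list of organ systems come from the text.

```python
# Organ systems as listed in the text.
ORGANS = (
    "Respiratory", "Coagulation", "Liver",
    "Cardiovascular", "Central Nervous", "Renal",
)


def organ_failure(sofa_score: int) -> int:
    """Return 1 (organ failure) if the SOFA score is 3 or 4, else 0."""
    if not 0 <= sofa_score <= 4:
        raise ValueError("SOFA scores range from 0 to 4")
    return 1 if sofa_score >= 3 else 0


def failure_flags(sofa_by_organ: dict) -> dict:
    """Derive the six binary target variables from one day's SOFA scores."""
    return {organ: organ_failure(sofa_by_organ[organ]) for organ in ORGANS}
```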

Table 1: Variables Description

* Not considered for the prediction models; used only to build the clustering framework.

** Dependent variables.

*** Variables considered only on the first day.

Since we intend to predict organ failure on a given day based on the data of the previous days, it was necessary to transform the database structure in order to capture the temporal sequence of the variables (time series).

The variables AdmFrom, AdmType, SapsII and Diagn are obtained once (on the first day), but their values are used in all organ failure predictions.

For the construction of the various models, the SOFA values were not considered as input. Instead, the numbers of Events and Critical Events registered for these two days were considered for the prediction in question.

We considered a time horizon of only five days because, in medical terms, the fifth day of stay in an ICU is considered a critical point in the evolution of the patient's clinical state. The first day was not considered as a prediction target, since the goal is to predict organ failure based on the data collected on the previous days. However, it is used as input for all the other days.
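The transformation into a time-series layout might look like the following Python sketch. It rests on several assumptions that the text does not confirm: that every day before the target day contributes its event counts, that daily variables are suffixed with the day number (loosely modelled on names such as nrofhrcr4 that appear in the decision trees), and that admission variables are passed through unchanged. The function name build_inputs is hypothetical.

```python
def build_inputs(admission: dict, daily_counts: dict, target_day: int) -> dict:
    """Assemble the input record used to predict organ failure on target_day.

    admission    -- variables collected once, on the first day
    daily_counts -- {day: {variable_name: count}} of Events / Critical Events
    target_day   -- day to predict, from 2 to 5 (day 1 is never a target)
    """
    if not 2 <= target_day <= 5:
        raise ValueError("target_day must be between 2 and 5")
    features = dict(admission)  # admission data feed every prediction
    for day in range(1, target_day):  # only days before the target day
        for name, count in daily_counts[day].items():
            features[f"{name}{day}"] = count  # e.g. nrofhrev + 1 -> nrofhrev1
    return features
```

SOFA values are deliberately absent from the inputs, in line with the text: they only define the target variables.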

5 MODELS DEFINITIONS AND CONSTRUCTION

Using the SPSS Clementine tool (http://www.spss.com), we submitted the database to a Kohonen Network (Kohonen, 1995) to segment it into three distinct groups. Then, the C5 algorithm (http://www.rulequest.com) was used to generate decision tree models that explain each of those clusters. In this way, we obtained 3 models for each of the dependent variables and for each of the days of stay in the ICU (18 models for each day).

Variable   Description                            Domain
ID         Patient number
Respirat   Respiration System                     {0,1,2,3,4} **
Coagulat   Coagulation System                     {0,1,2,3,4} **
Liver      Liver System                           {0,1,2,3,4} **
Cardiova   Cardiovascular System                  {0,1,2,3,4} **
Cns        Central Nervous System                 {0,1,2,3,4} **
Renal      Renal System                           {0,1,2,3,4} **
Nrbpevnt   Number of BP Events/day                {0,1,…,24}
Nrbpcriv   Number of BP Critical Events           {0,1,…,10}
Nrofhrev   Number of HR Events/day                {0,1,…,24}
Nrofhrcr   Number of HR Critical Events           {0,1,…,10}
Nrofo2ev   Number of O2 Events/day                {0,1,…,24}
Nrofo2cr   Number of O2 Critical Events           {0,1,…,7}
Nrofurev   Number of Diur Events/day              {0,1,…,24}
Nrofurcr   Number of Diur Critical Events         {0,1,…,7}
Admfrom    Admission From                         {1,2,…,7} ***
Admtype    Admission Type                         {1,2,3} ***
SapsII     Simplified Acute Physiology Score II   {0,1,…,118} ***
Diagn      Diagnostic                             {0,1} ***

After selecting the most appropriate variables for model generation, a Kohonen Network was applied to the database in order to create two additional variables, which correspond to the coordinates assigned to each record (identifying the cluster to which it belongs). These coordinates make it possible to partition the patients into three clusters. Then, by applying the C5 algorithm to each cluster, it is possible to generate the respective decision tree.

The models were validated through a 10-fold cross-validation method (Dubitzky et al, 2001). Finally, the achieved results were analysed by means of a confusion matrix, a matrix of size L × L, where L denotes the number of possible classes (Kohavi et al, 1998).

6 ACHIEVED ACCURACIES

The confusion matrix is a common tool for classification analysis; it is created by matching the predicted and actual values. When L = 2, there are four possibilities (Table 2): the number of correct positive, True Positive (TP); correct negative, True Negative (TN); incorrect positive, False Positive (FP); and incorrect negative, False Negative (FN) classifications.

Table 2: The 2 × 2 confusion matrix

↓ actual / predicted →   Negative   Positive
Negative                 TN         FP
Positive                 FN         TP
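The four cells of Table 2 can be counted from paired labels as in the Python sketch below. The function name and the encoding of classes as 0 (negative, no failure) and 1 (positive, failure) are assumptions for illustration; the paper itself obtained these counts with its modelling tool.

```python
def confusion_counts(actual, predicted):
    """Count TN, FP, FN and TP for binary labels encoded as 0 and 1."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if a == 1:
            # Actual positive: predicted positive is TP, otherwise FN.
            counts["TP" if p == 1 else "FN"] += 1
        else:
            # Actual negative: predicted positive is FP, otherwise TN.
            counts["FP" if p == 1 else "TN"] += 1
    return counts
```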

From this table, three accuracy measures can be defined (Essex, 1995): the true Positive Rate (PR), also known as sensitivity or recall, which is the complement of the Type II error rate; the true Negative Rate (NR), also known as specificity, which is the complement of the Type I error rate; and the Predictive Accuracy (PA), which gives an overall evaluation.


These metrics can be computed using the

following equations:

$$
PR = \frac{TP}{TP + FN} \times 100\%
$$

$$
NR = \frac{TN}{TN + FP} \times 100\%
$$

$$
PA = \frac{TP + TN}{TP + FN + FP + TN} \times 100\%
$$
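A small Python sketch of these three measures is given below, taking the four confusion-matrix counts as arguments. The function name is a hypothetical choice, and the counts used in the usage note are invented for illustration rather than taken from the study.

```python
def accuracy_measures(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return PR (sensitivity), NR (specificity) and PA as percentages."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must occur at least once")
    return {
        "PR": 100.0 * tp / (tp + fn),                   # true positive rate
        "NR": 100.0 * tn / (tn + fp),                   # true negative rate
        "PA": 100.0 * (tp + tn) / (tp + fn + fp + tn),  # predictive accuracy
    }
```

With the made-up counts tp=9, tn=85, fp=5 and fn=1, the sensitivity is 90% and the predictive accuracy is 94%.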

Table 3 shows the results of the models predicting organ failure for the fifth day of stay.

Table 3: Results for the fifth day

As we can see, the accuracies achieved are quite good; in some situations, we achieved the maximum values (100%). However, these occur in the cluster with the fewest patients (cluster 1), between 62 and 65 in a universe of 1845. Clusters 0 and 2 contain between 649 and 1134 patients.

These results were possible due to the approach adopted, as well as the use of misclassification costs, which allow us to specify the relative importance of different kinds of prediction errors.

7 CLUSTERING FRAMEWORK

The Data Mining process created 72 prediction models plus a higher order classification model (based on a decision tree) that matches a patient to the respective prediction model.

To deal with this complexity and to make the relation between patient and organ failure prediction model more explicit, a visualization framework was devised. In this framework, the prediction models are denoted by:

m(d, o, c, pa, se, sp)

where d ∈ {2,3,4,5} stands for the day of the stay, o ∈ {Renal, Central Nervous, Cardiovascular, Liver, Coagulation, Respiratory} stands for the organ, and c ∈ {0,1,2} for the cluster. The last arguments are the Predictive Accuracy (pa), the Sensitivity (se), and the Specificity (sp). These models may be organized in a cube that makes it possible to present graphically the patient's course along the stay in the ICU, as shown in Figure 1.

Organ system             Measure   Cluster 0   Cluster 1   Cluster 2
Respiratory System       PR        92.45%      100.00%     97.37%
                         NR        95.04%      100.00%     93.76%
                         PA        94.43%      100.00%     94.61%
Coagulation System       PR        100.00%     100.00%     91.67%
                         NR        98.51%      100.00%     99.44%
                         PA        98.61%      100.00%     99.03%
Liver System             PR        100.00%     100.00%     88.24%
                         NR        99.91%      100.00%     99.84%
                         PA        99.91%      100.00%     99.54%
Cardiovascular System    PR        94.26%      100.00%     95.73%
                         NR        93.07%      100.00%     96.62%
                         PA        93.29%      100.00%     96.46%
Central Nervous System   PR        93.88%      100.00%     88.98%
                         NR        95.51%      98.31%      95.98%
                         PA        95.23%      98.44%      94.61%
Renal System             PR        92.31%      100.00%     98.28%
                         NR        98.06%      100.00%     98.14%
                         PA        97.53%      100.00%     98.15%

Figure 1: Clustering Framework

For a given patient we have a prediction model for each of the 6 organs (o), indexed by the day of stay (d) and the corresponding cluster (c). Note that, on the same day, the models corresponding to a particular patient may belong to different clusters.


In the example presented in Figure 1, the prediction models for day 2 are given by:

m(2, Renal, 1, 97.74, 90.91, 98.19);
m(2, Central Nervous, 2, 89.10, 68.56, 93.72);
m(2, Cardiovascular, 0, 84.95, 73.91, 87.68);
m(2, Liver, 2, 99.18, 74.19, 99.52);
m(2, Coagulation, 0, 97.35, 71.70, 98.59);
m(2, Respiratory, 1, 96.30, 90.00, 97.18).

As we can see, the prediction model for the Central Nervous system of this patient changed from cluster two to cluster one, and the Predictive Accuracy, Sensitivity and Specificity changed as well.
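One way to picture the three-dimensional matrix is as a mapping from (day, organ, cluster) to the model's quality measures, as in the Python sketch below. The class and function names are hypothetical, the dictionary is only a stand-in for the cube of Figure 1, and it is filled with just the six day-2 models quoted in the example; the remaining cells are not reproduced here.

```python
from itertools import product
from typing import NamedTuple

DAYS = (2, 3, 4, 5)
ORGANS = ("Renal", "Central Nervous", "Cardiovascular",
          "Liver", "Coagulation", "Respiratory")
CLUSTERS = (0, 1, 2)


class Model(NamedTuple):
    pa: float  # Predictive Accuracy (%)
    se: float  # Sensitivity (%)
    sp: float  # Specificity (%)


# Every cell of the cube: 4 days x 6 organs x 3 clusters = 72 models.
ALL_CELLS = list(product(DAYS, ORGANS, CLUSTERS))

# The day-2 models quoted in the example; other cells are omitted.
cube = {
    (2, "Renal", 1): Model(97.74, 90.91, 98.19),
    (2, "Central Nervous", 2): Model(89.10, 68.56, 93.72),
    (2, "Cardiovascular", 0): Model(84.95, 73.91, 87.68),
    (2, "Liver", 2): Model(99.18, 74.19, 99.52),
    (2, "Coagulation", 0): Model(97.35, 71.70, 98.59),
    (2, "Respiratory", 1): Model(96.30, 90.00, 97.18),
}


def models_for_patient(cube: dict, day: int, cluster_by_organ: dict) -> dict:
    """Pick, for each organ, the model of the cluster the patient falls in."""
    return {organ: cube[(day, organ, cluster)]
            for organ, cluster in cluster_by_organ.items()}
```

Keying the mapping by the three indexes keeps the lookup for a patient independent for each organ, which matches the remark that one patient's models may sit in different clusters on the same day.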

For ease of perception, this cube can be split into three layers, one for each cluster, where the validity of each model is represented on a grey scale (Table 4). The darkest tone indicates the highest accuracies (as in cluster 1).

Figure 2 shows the Predictive Accuracies mapped to a four-tone grey scale.

Figure 2: Predictive Accuracies

The cells corresponding to day 5 represent the values indicated in Table 3. The rest represent the Predictive Accuracies of the other days of stay in the ICU. In this way we can easily and visually distinguish the zones of interest for prediction.

Table 4: Predictive Accuracies scale

From 95.01% to 100%
From 90.01% to 95.00%
From 85.01% to 90.00%
Up to 85.00%
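The mapping from a Predictive Accuracy to one of the four bands of Table 4 can be sketched in Python as follows. The function name and the numbering of the bands (0 for the lightest tone up to 3 for the darkest) are assumptions; the band limits are those of the table, with accuracies taken to be reported to two decimal places.

```python
def grey_band(pa: float) -> int:
    """Return the Table 4 band of a Predictive Accuracy given in percent.

    3 = from 95.01% to 100% (darkest tone), 2 = from 90.01% to 95.00%,
    1 = from 85.01% to 90.00%, 0 = up to 85.00% (lightest tone).
    """
    if not 0.0 <= pa <= 100.0:
        raise ValueError("accuracy must be a percentage between 0 and 100")
    if pa > 95.0:
        return 3
    if pa > 90.0:
        return 2
    if pa > 85.0:
        return 1
    return 0
```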

The database segmentation criteria for the third day of stay, related to Renal system failure, can be visualized in the form of a decision tree as follows:

admtype =< 2 (1329)
    admfrom =< 1 (922, 1.0) -> 0
    admfrom > 1 (407)
        admtype =< 1 (309, 1.0) -> 0
        admtype > 1 (98, 1.0) -> 1
admtype > 2 (1776)
    admfrom =< 1 (12, 1.0) -> 1
    admfrom > 1 (1764, 1.0) -> 2

In this case, the variables that determined the classification into three clusters were the Admission Type and Admission From. The values presented between parentheses stand for the support level and the confidence level, respectively.

As we can see, there is only one rule for cluster 2, and two rules for each of the clusters 0 and 1. If the admission type is medical and the admission source is other than Operating Bloc, the patient will be in cluster 2. This rule applied to 1764 cases.
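The segmentation tree above translates directly into nested conditions. The Python sketch below is one such transcription; the function name is hypothetical, and the integer codes are assumed to follow the domains given in Table 1 (Admission Type from 1 to 3, Admission From from 1 to 7).

```python
def assign_cluster(admtype: int, admfrom: int) -> int:
    """Cluster for the day-3 Renal models, following the tree in the text."""
    if admtype <= 2:
        if admfrom <= 1:
            return 0                       # 922 cases
        return 0 if admtype <= 1 else 1    # 309 and 98 cases
    if admfrom <= 1:
        return 1                           # 12 cases
    return 2                               # 1764 cases
```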

As shown in the framework, the first day of stay was not considered, since it makes no sense to predict organ failure for this day when the only data available were collected on that same day.

Each of the 72 models referred to in the framework corresponds to a decision tree generated by the C5 algorithm. Consider, for example, the decision tree that predicts Central Nervous system failure for the fifth day, in cluster 0:

sapsii =< 54 (905)
    nrofhrcr4 =< 1 (870)
        nrofo2ev1 =< 0 (687)
            ……..
        nrofo2ev1 > 0 (183)
            diagn =< 0 (140)
                nrofhrev3 =< 0 (119)
                    admfrom1 =< 3 (43, 1.0) -> 0
                    admfrom1 > 3 (76)
                        nrofo2ev1 =< 4 (70)


From the tree we can say that if a patient has a SAPSII score of 54 or less, at most one critical heart rate event on the fourth day, at least one O2 event on the first day, a non-operative diagnostic, no heart rate events on the third day, and an Admission Source of Operating Bloc, Recovery Room or Emergency Room, the central nervous system will not be in failure.
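The rule just described is a single root-to-leaf path of the tree and can be written as one predicate, as in this Python sketch. The function name is hypothetical and the parameter names simply reuse the variable names shown in the tree; because the other branches are elided in the text, a False result means only that this particular rule does not apply, not that failure is predicted.

```python
def cns_no_failure_rule(sapsii: int, nrofhrcr4: int, nrofo2ev1: int,
                        diagn: int, nrofhrev3: int, admfrom1: int) -> bool:
    """True when the patient matches the quoted path ending in class 0."""
    return (
        sapsii <= 54          # SAPSII score of 54 or less
        and nrofhrcr4 <= 1    # at most one critical HR event on day 4
        and nrofo2ev1 > 0     # at least one O2 event on day 1
        and diagn <= 0        # non-operative diagnostic
        and nrofhrev3 <= 0    # no HR events on day 3
        and admfrom1 <= 3     # Operating Bloc, Recovery Room or Emergency Room
    )
```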

This is the kind of information that is really

important in an ICU environment in a decision

support context.

8 CONCLUSIONS AND FURTHER WORK

In this study, we presented a clustering framework whose purpose is to identify and apply the model generated for the cluster to which a patient belongs, according to his or her characteristics. The majority of the models revealed high accuracies, which is very useful in a decision support context.

The gains of this approach can be summarized as follows:

- A matrix to arrange and explore the models;

- A system to apply the models to a particular patient through a process based on three indexes: the day, the organ and the cluster;

- An explicit way to declare the best and the worst predictive zones (models) based on assessment metrics such as the accuracy, the specificity and the sensitivity. The doctors know exactly the value and usability of the models and their predictions;

- An alternative or complementary formalism of knowledge representation and visualization for decision support.

Further work will include the study of meta-learning techniques in order to maintain the matrix in dynamic environments (such as the ICU), as well as of graphic technologies to support the visualization of and interaction with the framework, enabling the construction of intelligent decision support systems.

REFERENCES

Cios K., Moore G., 2002. Uniqueness of Medical Data Mining. Artificial Intelligence in Medicine.

Dubitzky W., Granzow M., Berrar D., 2001. Data Mining and Machine Learning Methods for Microarray Analysis, Methods of Microarray Data Analysis. In CAMDA'00, 5-22, Kluwer Academic Publishers.

Elkan, Charles, 2001. The Foundations of Cost-Sensitive Learning. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence.

Essex-Sorlie D., 1995. Medical Biostatistics & Epidemiology: Examination & Board Review. McGraw-Hill.

Fetter RB, Shin Y, Freeman JL, et al, 1980. Case-mix definition by diagnosis-related groups. Medical Care 18, 1-53.

Gilles Clermont, Derek C. Angus, Stephen M. DiRusso, et al, 2001. Predicting hospital mortality for patients in the intensive care unit: A comparison of artificial neural networks with logistic regression models. Critical Care Medicine, Volume 29, Number 2, 291-296.

Goris RJ, te Boekhorst TP, Nuytinck JK, Gimbrere JS, 1985. Multiple organ failure. Generalized autodestructive inflammation? Arch Surg, 120:1109-1115.

Hanson William C., Marshall Bryan E., 2001. Artificial Intelligence applications in the intensive care unit. Critical Care Medicine, Volume 29, Number 2, 427-435.

Kohavi R. and F. Provost, 1998. Glossary of Terms. Machine Learning, 30(2/3):271-274.

Kohonen, T., 1995. Self-Organizing Maps. Springer-Verlag.

Le Gall JR, Lemeshow S, Saulnier F., 1993. A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study. JAMA 270:2957-2963.

Moreno R., Vincent J-L., Matos R. et al, 1999. The use of maximum SOFA score to quantify organ dysfunction/failure in intensive care. Results of a prospective, multicentre study. Intensive Care Med 25, 686-696.

Vincent J., De Mendonça A., Cantraine F. et al, 1998. Use of the SOFA score to assess the incidence of organ dysfunction/failure in intensive care units: results of a multicenter, prospective study. Critical Care Medicine, Volume 26, 1793-1800.
