CONSTRUCTING FUZZY PARTITIONS FROM IMPRECISE DATA
José M. Cadenas, M. Carmen Garrido and Raquel Martínez
Dpt. Engineering Information and Communications, Faculty of Informatics, University of Murcia
Campus Espinardo, Murcia, Spain
Keywords:
Fuzzy partition, Imperfect information, Fuzzy random forest ensemble, Imprecise data.
Abstract:
Classification is an important task in Data Mining. In order to carry out classification, many classifiers require a previous preparatory step for their data. In this paper we focus on the process of discretization of attributes, a very important part of Data Mining. In many situations the values of the attributes are imprecise, because imperfect information inevitably appears in real situations for a variety of reasons. Although many efforts have been made to incorporate imperfect data into classification techniques, there are still many limitations as to the type of data, uncertainty and imprecision that can be handled. Therefore, in this paper we propose an algorithm to construct fuzzy partitions from imprecise information and we evaluate these partitions in a Fuzzy Random Forest ensemble which is also able to work with imprecise information. We also compare our proposal with the results of other works.
1 INTRODUCTION
The construction of the fuzzy intervals into which a continuous domain is discretized is an important problem in the area of data mining and soft computing, since the determination of these intervals can deeply affect the performance of the different classification techniques (Au et al., 2006).
Although there are many discretization algorithms, most of them do not consider that the information available to construct the partitioning is sometimes not as precise and accurate as desirable. However, imperfect information inevitably appears in realistic domains and situations. Instrument errors or corruption from noise during experiments may give rise to incomplete data when measuring a specific attribute. In other cases, the extraction of exact information may be excessively costly or unfeasible. Moreover, it might be useful to complement the available data with additional information from an expert, which is usually elicited as imperfect data (interval data, fuzzy concepts, etc). In most real-world problems, data have a certain degree of imprecision. Sometimes, this imprecision is small enough for it to be safely ignored. On other occasions, the imprecision of the data can be modeled by a probability distribution. However, there is a third kind of problem, where the imprecision is significant and a probability distribution is not the most natural way to model it. This is the case of certain practical problems where the data are inherently fuzzy (Bonissone, 1997; Casillas and Sánchez, 2006; Garrido et al., 2010; Otero et al., 2006).
When we have imperfect data, we have two options: the first is to transform the original data into another kind of data that our algorithm can work with; the second is to work directly with the original data without carrying out any transformation. When we choose the first option, we can lose information and therefore accuracy. For this reason, it is necessary to incorporate into discretization algorithms the handling of attributes which may present missing and imprecise values.
In this paper we present an algorithm, which we call EOFP (Extended Optimized Fuzzy Partitions), that obtains fuzzy partitions from imperfect information. This algorithm extends the OFP_CLASS algorithm (Cadenas et al., 2010) to incorporate the management of imprecise values (intervals and fuzzy values) in continuous attributes and of set-valued classes (imprecise values for the class attribute).
The EOFP Algorithm follows the steps of a top-down discretization process with four iterative stages (Liu et al., 2002): 1.- All kinds of continuous values in the dataset to be discretized are ordered. 2.- The best cut point for partitioning the attribute domains is found. 3.- Once the best cut point is found, the domain of each
attribute is divided into two partitions. 4.- Finally, we
check whether the stopping criterion is fulfilled, and
if so the process is terminated.
To implement the above general discretization process, the EOFP Algorithm is divided into two stages. In the first stage, we carry out a search for the best cut points of each attribute. In the second stage, based on these cut points, we use a genetic algorithm which optimizes the fuzzy sets formed from the cut points.
The structure of this study is as follows. In Section 2 we present the EOFP Algorithm. In addition, in this section we extend a fuzzy decision tree, which is used as the base of the first stage of the EOFP Algorithm. This tree is able to work with imprecise information both in the values of the attributes and in the class values. Later, in Section 3, we show various experimental results which evaluate our proposal in comparison with previously existing proposals. For these experiments we use datasets with imprecision. In Section 4 we present the conclusions of this study. Finally, we include an Appendix with a brief description of the combination methods used in this work.
2 DESIGNING THE ALGORITHM
In this section we present the EOFP Algorithm, which is able to work with imprecise data. The EOFP Algorithm builds fuzzy partitions which guarantee, for each attribute:

Completeness (no point in the domain is outside the fuzzy partition), and

Strong fuzzy partition (it verifies that ∀x ∈ Ω_i, Σ_{f=1}^{F_i} μ_{B_f}(x) = 1, where B_1, ..., B_{F_i} are the F_i fuzzy sets of the partition of the continuous attribute i with domain Ω_i, and μ_{B_f}(x) are their membership functions).
The domain of each continuous attribute i is partitioned into trapezoidal fuzzy sets B_1, B_2, ..., B_{F_i}, so that:

μ_{B_1}(x) = 1                                       if b_{11} ≤ x ≤ b_{12}
           = (b_{13} − x) / (b_{13} − b_{12})        if b_{12} ≤ x ≤ b_{13}
           = 0                                       if b_{13} ≤ x

μ_{B_2}(x) = 0                                       if x ≤ b_{12}
           = (x − b_{12}) / (b_{13} − b_{12})        if b_{12} ≤ x ≤ b_{13}
           = 1                                       if b_{13} ≤ x ≤ b_{23}
           = (b_{24} − x) / (b_{24} − b_{23})        if b_{23} ≤ x ≤ b_{24}
           = 0                                       if b_{24} ≤ x

...

μ_{B_{F_i}}(x) = 0                                                             if x ≤ b_{(F_i−1)3}
               = (x − b_{(F_i−1)3}) / (b_{(F_i−1)4} − b_{(F_i−1)3})            if b_{(F_i−1)3} ≤ x ≤ b_{(F_i−1)4}
               = 1                                                             if b_{(F_i−1)4} ≤ x
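As a purely illustrative sketch (not the authors' implementation), the following fragment shows how trapezoidal membership functions of this kind can be built from a list of cut points, assuming the attribute domain has been normalized to [0, 1]; the helper names are our own.

```python
# A minimal sketch: trapezoidal membership functions forming a strong fuzzy
# partition of [0, 1] from fuzzified cut points. Each cut point p_r is widened
# by a quantity d_r into the overlap ramp [p_r - d_r, p_r + d_r].

def make_strong_partition(cuts, deltas):
    """cuts: sorted cut points; deltas: half-widths of the fuzzy boundaries.
    Returns the overlap ramps; the partition has len(cuts) + 1 fuzzy sets."""
    return [(p - d, p + d) for p, d in zip(cuts, deltas)]

def membership(x, f, ramps):
    """Membership of x in fuzzy set f (0-indexed)."""
    n_sets = len(ramps) + 1
    if f > 0:                            # ascending ramp, shared with set f-1
        a, b = ramps[f - 1]
        if x <= a:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
    if f < n_sets - 1:                   # descending ramp, shared with set f+1
        a, b = ramps[f]
        if x > b:
            return 0.0
        if x > a:
            return (b - x) / (b - a)
    return 1.0                           # plateau of set f

ramps = make_strong_partition([0.4, 0.7], [0.05, 0.1])
# The memberships of any point sum to 1, as required by a strong partition.
assert abs(sum(membership(0.42, f, ramps) for f in range(3)) - 1.0) < 1e-9
```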
The EOFP Algorithm is composed of two stages: in stage 1 we use a fuzzy decision tree to obtain possible cut points for the different attributes; in stage 2 we carry out the process by which we optimize the cut points and build the fuzzy partitions. The objective is to divide the continuous domains into fuzzy sets which are competitive and effective for obtaining good accuracy in the classification task. Before describing the EOFP Algorithm, we present a fuzzy decision tree which is able to work with imprecise data.
2.1 Fuzzy Decision Tree
In this section, we describe a fuzzy decision tree that we will use as base classifier in a Fuzzy Random Forest ensemble to evaluate the generated fuzzy partitions, and whose basic algorithm will be modified for the first stage of the EOFP Algorithm, as we will see later. This tree is an extension of the fuzzy decision tree that we presented in (Cadenas et al., 2010), to incorporate the management of imprecise values.
The tree is built from a set of examples E which are described by attributes that may be nominal or continuous, expressed with crisp, interval and fuzzy values, where there is at least one nominal attribute which acts as the class attribute. In addition, the class attribute can be expressed with a set of classes (set-valued class). Thus, the class may also be expressed in an imprecise way.
The fuzzy decision tree is based on the ID3 algo-
rithm, where all the continuous attributes have been
discretized by means of a series of fuzzy sets. An
initial value equal to 1 (χ_root(e_j) = 1, where χ_N(e_j) is the membership degree of example e_j to node N and e_j is the j-th example of the dataset) is assigned to each example e_j used in the tree learning, indicating that initially the example is only in the root node of the tree. This value will continue to be 1 as long as the example e_j does not belong to more than one node during the tree construction process. In a classical tree, an example can only belong to one node at each moment, so its initial value (if it exists) is not modified throughout the construction process. In the case of a fuzzy tree, this value is modified in three situations:
When the example e_j has a missing value in an attribute i which is used as a test in a node N. In this case, the example descends to each child node N_h, h = 1, ..., H_i, with a value modified proportionally to the weight of each child node. The modified value for each N_h is calculated as:

  χ_{N_h}(e_j) = χ_N(e_j) · (Tχ_{N_h} / Tχ_N)

where Tχ_N is the sum of the weights of the examples with known value in attribute i at node N, and Tχ_{N_h} is the sum of the weights of the examples with known value in attribute i that descend to the child node N_h.
According to the membership degree of e_j to the different fuzzy sets of the partition, when the test of a node N is based on an attribute i which is continuous. In this case, the example descends to those child nodes to which the example belongs with a degree greater than 0 (μ_{B_f}(e_j) > 0; f = 1, ..., F_i). Due to the characteristics of the partitions we use, the example may descend to two child nodes at most. In this case, χ_{N_h}(e_j) = χ_N(e_j) · μ_{B_f}(e_j), for every f such that μ_{B_f}(e_j) > 0, with h = f.
When the test of a node N is based on a continuous attribute i and the value of attribute i in e_j is a fuzzy value different from those of the partition of the attribute, or an interval value, we need to extend the function that measures the membership degree of this type of data. This new function (denoted μ_simil(·)) captures the change in the value χ_N(e_j) when e_j descends in the fuzzy tree. For this reason, the membership degree of e_j is calculated using a similarity measure (μ_simil(e_j)) between the value of attribute i in e_j and the different fuzzy sets of the partition of attribute i. Therefore, the example e_j can descend to different child nodes. In this case, χ_{N_h}(e_j) = χ_N(e_j) · μ_simil(e_j). The function μ_simil(e_j) is defined, for f = 1, ..., F_i, as:

  μ_simil(e_j) = ∫ min{μ_{e_j}(x), μ_f(x)} dx / ( Σ_{f=1}^{F_i} ∫ min{μ_{e_j}(x), μ_f(x)} dx )     (1)
where:
μ_{e_j}(x) represents the membership function of the fuzzy or interval value of example e_j in attribute i;
μ_f(x) represents the membership function of fuzzy set f of the partition of attribute i;
F_i is the cardinality of the partition of attribute i.
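For illustration only, the similarity of expression (1) can be approximated numerically; the sketch below does so for an interval value against the trapezoidal sets of a partition, using a simple grid integration. The function names are our own and this is not the paper's implementation.

```python
# Hypothetical sketch of the similarity measure (1): the overlap between the
# example's (interval or fuzzy) membership function and each fuzzy set of the
# partition, normalized so the similarities over all sets sum to 1.
import numpy as np

def trapezoid(x, a, b, c, d):
    """Membership of a trapezoidal set with corners a <= b <= c <= d."""
    left = np.ones_like(x) if b == a else np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.ones_like(x) if d == c else np.clip((d - x) / (d - c), 0.0, 1.0)
    return np.minimum(left, right) * ((x >= a) & (x <= d))

def interval_membership(x, lo, hi):
    """An interval value [lo, hi] seen as a crisp (0/1) membership function."""
    return ((x >= lo) & (x <= hi)).astype(float)

def similarity(value_mu, partition, grid):
    """value_mu: membership values of the example over the grid;
    partition: list of (a, b, c, d) trapezoids; returns one similarity per set."""
    overlaps = [np.trapz(np.minimum(value_mu, trapezoid(grid, *p)), grid)
                for p in partition]
    total = sum(overlaps)
    return [o / total if total > 0 else 0.0 for o in overlaps]

grid = np.linspace(0.0, 1.0, 1001)
partition = [(0.0, 0.0, 0.35, 0.45), (0.35, 0.45, 0.6, 0.8), (0.6, 0.8, 1.0, 1.0)]
print(similarity(interval_membership(grid, 0.3, 0.5), partition, grid))
```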
We can say that the value χ_N(e_j) indicates the degree with which the example fulfills the conditions that lead to node N of the tree. Another aspect of this extended fuzzy tree is the way the information gain is calculated when node N (the node being explored at any given moment) is divided using attribute i as test attribute. This information gain G_i^N is defined as:

  G_i^N = I^N − I_{SV_i}^N     (2)
where:

I^N: standard information associated with node N. This information is calculated as follows:

1. For each class k = 1,...,|C|, the value P_k^N, which is the number of examples in node N belonging to class k, is calculated:

  P_k^N = Σ_{j=1}^{|E|} χ_N(e_j) · μ_k(e_j)     (3)

where χ_N(e_j) is the membership degree of example e_j to node N and μ_k(e_j) is the membership degree of example e_j to class k.
2. P^N, the total number of examples in node N, is calculated:

  P^N = Σ_{k=1}^{|C|} P_k^N

3. The standard information is calculated as:

  I^N = − Σ_{k=1}^{|C|} (P_k^N / P^N) · log (P_k^N / P^N)
I_{SV_i}^N is the product of three factors and represents the standard information obtained by dividing node N using attribute i, adjusted for the existence of missing values in this attribute:

  I_{SV_i}^N = I_{SV_i1}^N · I_{SV_i2}^N · I_{SV_i3}^N

where:

I_{SV_i1}^N = 1 − P_{m_i}^N / P^N, where P_{m_i}^N is the weight of the examples in node N with missing value in attribute i.

I_{SV_i2}^N = 1 / Σ_{h=1}^{H_i} P^{N_h}, H_i being the number of descendants associated with node N when we divide this node by attribute i, and P^{N_h} the weight of the examples associated with each one of the descendants.

I_{SV_i3}^N = Σ_{h=1}^{H_i} P^{N_h} · I^{N_h}, I^{N_h} being the standard information of each descendant h of node N.
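The sketch below is our own simplified reading of how the gain (2) can be computed from formulas (2) and (3) once the node weights, class memberships and child memberships of the examples are available; it is not the authors' code and the data layout is an assumption.

```python
# Hypothetical sketch of the gain G_i^N = I^N - I_SVi^N of formula (2).
# chi: weight of each example in node N; class_mu[j][k]: membership of example j
# to class k; child_mu[j][h]: membership of example j to child h (empty row if
# the value of attribute i is missing for example j).
import math

def standard_info(chi, class_mu):
    P_k = [sum(c * mu[k] for c, mu in zip(chi, class_mu))
           for k in range(len(class_mu[0]))]
    P_N = sum(P_k)
    return -sum((p / P_N) * math.log(p / P_N) for p in P_k if p > 0)

def gain(chi, class_mu, child_mu):
    I_N = standard_info(chi, class_mu)
    P_N = sum(chi)
    missing = sum(c for c, row in zip(chi, child_mu) if not row)  # weight with missing value
    sv1 = 1.0 - missing / P_N
    # weight and standard information of each child (missing-value examples ignored here)
    n_children = max((len(r) for r in child_mu if r), default=0)
    P_h, I_h = [], []
    for h in range(n_children):
        w = [c * row[h] for c, row in zip(chi, child_mu) if row]
        mus = [mu for mu, row in zip(class_mu, child_mu) if row]
        P_h.append(sum(w))
        I_h.append(standard_info(w, mus) if sum(w) > 0 else 0.0)
    sv2 = 1.0 / sum(P_h)
    sv3 = sum(p * i for p, i in zip(P_h, I_h))
    return I_N - sv1 * sv2 * sv3
```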
On the other hand, the stopping criterion is the same as that described in (Cadenas et al., 2010), and it is defined by the first of the following conditions to be reached: (a) pure node, (b) there are no more
attributes to select, (c) reaching the minimum number
of examples allowed in a node. Besides, it must be pointed out that once an attribute has been selected as a node test, this attribute will not be selected again, due to the fact that all the attributes are nominal or partitioned.
Having constructed the fuzzy tree, we use it to infer the unknown class of a new example. The inference process is as follows: given the example e to be classified with the initial value, for instance, χ_root(e) = 1, go through the tree from the root node. Then obtain the set of leaves reached by e. For each leaf reached by e, calculate the support for each class; the support for a class at a given leaf N is obtained according to expression (3). Finally, obtain the tree's decision, c, from the information provided by the set of leaves reached and the value χ with which example e activates each one of the reached leaves.
With the fuzzy decision tree presented so far, we have incorporated continuous imprecise attributes described by means of interval and fuzzy values. In the next subsection we consider the modifications that are necessary in the learning and classification phases to incorporate the treatment of examples whose class attribute is set-valued.
2.1.1 Set-valued Classes in the Fuzzy Decision Tree
In the previous section we said that the initial weight of an example e may be equal to 1 (χ_root(e) = 1), but this value depends on whether the example has a simple class or a set-valued class. In the first case, if the example e has a unique class, the initial weight is 1; in the second case, the initial weight depends on the number of classes that the example has. Therefore, if the example e has a set-valued class with n_classes classes, the example is replicated n_classes times and each replicate of the example e has the associated weight 1/n_classes.
In this case, when we perform a test of the tree to classify a dataset with set-valued classes, we can follow this decision process:

  if class(e) == class_tree(e) and size(class(e)) == 1 then successes++
  else
    if class(e) ∩ class_tree(e) ≠ ∅ then success_or_error++
    else errors++

where class_tree(e) is the class that the fuzzy decision tree provides as output and class(e) is the class value of the example e.
As a result of this test, we obtain the interval [min_error, max_error], where min_error is calculated considering only the errors counted in the variable errors by the previous process, and max_error is calculated considering as errors the sum errors + success_or_error. With this way of classifying, the tree receives an imprecise input and its output is imprecise too, because it is not possible to determine exactly a unique error.
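As a purely illustrative sketch, with names of our own choosing, this interval error could be computed as follows.

```python
# Hypothetical sketch of the [min_error, max_error] scoring for set-valued classes.
# true_classes: list of sets of admissible class labels; predictions: list of labels.

def interval_error(true_classes, predictions):
    successes = errors = success_or_error = 0
    for labels, pred in zip(true_classes, predictions):
        if len(labels) == 1 and pred in labels:
            successes += 1                      # unambiguous hit
        elif pred in labels:
            success_or_error += 1               # hit on an imprecise (set-valued) label
        else:
            errors += 1                         # prediction outside the admissible set
    n = len(predictions)
    return errors / n, (errors + success_or_error) / n

# Example: the second label is imprecise, so the error is only known as an interval.
print(interval_error([{"dyslexic"}, {"dyslexic", "control"}, {"no dyslexic"}],
                     ["dyslexic", "control", "dyslexic"]))
# -> (0.333..., 0.666...)
```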
Once we have described the fuzzy decision tree that we will use to classify and that, with some modifications, we will use in stage 1 of the discretization algorithm, we can present the algorithm itself. As we said earlier, the EOFP discretization algorithm is composed of two stages, which we present next.
2.2 First Stage: Searching for Cut Points
In this stage, a fuzzy decision tree is constructed whose basic process is that described in Subsection 2.1, except that now a procedure based on a priority queue is added and there are attributes that have not been discretized. To discretize such an attribute, the first step is to look for the cut points which will be the borders between the different partitions. In the previous section we explained that, to discretize attributes, we must order the values. If not all data are crisp, we need a function to order crisp, fuzzy and interval values. To order the data, we use the same function as the one used to search for the possible cut points.
To deal with non-discretized attributes, the algorithm follows the basic process in C4.5. The thresholds selected in each node of the tree for these attributes will be the split points that delimit the intervals. Thus, the algorithm that constitutes this first stage is based on a fuzzy decision tree that allows nominal attributes, continuous attributes discretized by means of a fuzzy partition, non-discretized continuous attributes described with crisp, interval and fuzzy values, and furthermore it allows the existence of missing values in all of them. Algorithm 1 describes the whole process.
In step 1, all examples in the root node have an initial weight equal to 1, except the examples with a set-valued class, whose weight is initialized as indicated in Section 2.1.1. The queue is a priority queue, ordered from higher to lower according to the total weight of the examples of the nodes that form the queue. Thus the domain is guaranteed to be partitioned according to the most relevant attributes.
Algorithm 1: Search of cut points.

SearchCrispIntervals(in: E, Fuzzy Partition; out: Cut points)
begin
1. Start at the root node, which is placed in the initially empty priority queue. Initially, the root node holds the set of examples E with their initial weights.
2. Extract the first node from the priority queue.
3. Select the best attribute to split this node, using the information gain expressed in (2) as the criterion. We can find two cases: the first case is where the attribute with the highest information gain is already discretized, either because it is nominal, or else because it had already been discretized earlier by the Fuzzy Partition. The second case arises when the attribute is continuous and non-discretized. In this case it is necessary to obtain the corresponding cut points.
4. Having selected the attribute to expand the node, all the descendants generated are introduced into the queue.
5. Go back to step two to continue constructing the tree until there are no nodes in the priority queue or until another stopping condition occurs, such as reaching nodes with the minimum number of examples allowed by the algorithm.
end
In step 3, when we expand a node according to an attribute:

1. If the attribute is already discretized, the node is expanded into as many children as possible values the selected attribute has. In this case, the tree's behaviour is similar to that described in Subsection 2.1.
2. If the attribute has not been previously discretized, its possible descendants are obtained. To do this, as in C4.5, the examples are ordered according to the value of the attribute in question. To order data with crisp, fuzzy and interval values, we need an ordering index (Wang and Kerre, 2001). With it we obtain a representative value for each interval and fuzzy value, and we can order all the values of the non-discretized attribute. The index used is calculated as in (4). Let A_i be a fuzzy (or interval) value of attribute i in the example e:
  Y(A_i) = ∫_0^1 M(A_{iα}) dα     (4)

where Y(A_i) is the representative value of the fuzzy or interval value of attribute i in the example, and M(A_{iα}) is the mean value of the elements of the α-cut A_{iα}.
This index determines, for each fuzzy or interval value, a number with which we order all the values. Using the crisp values and the representative values, we find the possible cut points as in a C4.5 tree. The intermediate value between the value of the attribute for example e_j and for example e_{j+1} is obtained; this value provides two descendants for the node, to which the information gain criterion is applied. This is repeated for each pair of consecutive values of the attribute, searching for the value that yields the greatest information gain; that value is the one used to split the node and is considered as a split point for the discretization of this attribute. When an example e descends to the two descendants, the process carried out is the same as the one explained in Section 2.1, and if the value of the attribute is fuzzy or interval, we apply function (1) to determine the membership of this example e to the descendant nodes; the representative value of these kinds of values is only used to order them and to obtain cut points, but when we need to use these values in some calculation, we use the original value and not the representative value.
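A small sketch of this ranking index, under the assumption that the imprecise values are intervals or trapezoidal fuzzy numbers (for which the index reduces to the mean of the four corners), could look as follows; the closed form used below is our own derivation for that particular shape.

```python
# Hypothetical sketch of the ordering index (4) for trapezoidal fuzzy values.
# For a trapezoid (a, b, c, d) the alpha-cut is [a + alpha*(b - a), d - alpha*(d - c)],
# so Y(A) = integral over alpha of the cut midpoint = (a + b + c + d) / 4.
# A crisp value x is treated directly and an interval [lo, hi] as (lo, lo, hi, hi).

def ranking_index(value):
    if isinstance(value, (int, float)):          # crisp value
        return float(value)
    a, b, c, d = (value[0], value[0], value[1], value[1]) if len(value) == 2 else value
    return (a + b + c + d) / 4.0

values = [0.30, (0.25, 0.45), (0.1, 0.2, 0.3, 0.5)]   # crisp, interval, fuzzy
print(sorted(values, key=ranking_index))
```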
2.3 Second Stage: Optimizing Fuzzy Partitions with Imprecise Data
In this second stage of the EOFP Algorithm, we use a genetic algorithm to obtain the fuzzy sets that make up the partitioning of the non-discretized attributes. We have decided to use a genetic algorithm because these algorithms are very powerful and robust, as in most cases they can successfully deal with a great variety of problems from very diverse areas, and specifically in Data Mining (Cantu-Paz and Kamath, 2001). These algorithms are normally used in problems without specialized techniques, or even in those problems where a technique does exist but is combined with a genetic algorithm to obtain hybrid algorithms that improve the results (Cox, 2005).

The genetic algorithm takes as input the cut points obtained in the first stage, but it is important to mention that the genetic algorithm decides which cut points are more important to construct the fuzzy partitions, so it is possible that many cut points are not used to obtain the optimal fuzzy partitions. At most, if the first stage obtains F_i cut points for attribute i, the genetic algorithm can make up F_i + 1 fuzzy sets for attribute i. However, if the genetic algorithm considers that attribute i
does not have much relevance in the dataset, this attribute will not be partitioned. The different elements which compose this genetic algorithm are as follows:
Encoding. An individual consists of two arrays, v_1 and v_2. The array v_1 has real coding and its size is the sum of the numbers of split points that the fuzzy tree has provided for each attribute in the first stage. Each gene in array v_1 represents the quantity to be added to and subtracted from the corresponding attribute's split point to form the fuzzy partition. On the other hand, the array v_2 has binary coding and its size is the same as that of v_1. Each gene in array v_2 indicates whether the corresponding gene or split point of v_1 is active or not. The array v_2 conditions the domain of each gene in array v_1. The domain of each gene in array v_1 is an interval defined by [0, min((p_r − p_{r−1})/2, (p_{r+1} − p_r)/2)], where p_r is the r-th split point of attribute i represented by this gene, except for the first (p_1) and last (p_u) split points of each attribute, whose domains are, respectively, [0, min(p_1, (p_2 − p_1)/2)] and [0, min((p_u − p_{u−1})/2, 1 − p_u)]. When F_i = 2, the domain of the single split point is defined by [0, min(p_1, 1 − p_1)]. The population size is 100 individuals.
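The following sketch, with invented helper names, illustrates how such a two-array chromosome and the derived gene domains might be represented for one attribute whose split points lie in [0, 1]; it is an assumption-laden illustration, not the authors' implementation.

```python
# Hypothetical sketch of the chromosome encoding for one attribute:
# v2 activates/deactivates split points, v1 stores the amount added to and
# subtracted from each active split point, bounded by the domains described above.
import random

def gene_domains(points, active):
    """Upper bound of each v1 gene; inactive split points get a zero-width domain."""
    idx = [j for j, a in enumerate(active) if a]       # indices of active split points
    bounds = [0.0] * len(points)
    for r, j in enumerate(idx):
        p = points[j]
        left = p if r == 0 else (p - points[idx[r - 1]]) / 2
        right = (1.0 - p) if r == len(idx) - 1 else (points[idx[r + 1]] - p) / 2
        bounds[j] = min(left, right)
    return bounds

def random_individual(points):
    while True:
        v2 = [random.randint(0, 1) for _ in points]
        if any(v2):                                    # at least one active split point
            break
    v1 = [random.uniform(0.0, ub) for ub in gene_domains(points, v2)]
    return v1, v2

print(random_individual([0.25, 0.5, 0.8]))
```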
Initialization. First, the array v_2 of each individual is randomly initialized, provided that the genes of the array do not all take the value zero, since in that case all the split points would be deactivated and the attributes would not be discretized. Once the array v_2 has been initialized, the domain of each gene in array v_1 is calculated, considering which points are active and which are not. After calculating the domain of each gene of the array v_1, each gene is randomly initialized by generating a value within its domain.
Fitness Function. The fitness function of each individual is defined according to the information gain defined in (Au et al., 2006). Algorithm 2 implements the fitness function, where:

μ_{if} is the membership function corresponding to fuzzy set f of attribute i. Again, we must emphasize that this membership function depends on the kind of value: if the attribute value is crisp or belongs to a known fuzzy partition, the membership function is calculated as indicated in Section 2; on the contrary, if the attribute value is fuzzy or interval, the membership function is calculated as shown in function (1).

E_k is the subset of examples of E belonging to class k.

This fitness function, based on the information gain, indicates how dependent the attributes are with regard to the class, i.e., how discriminatory each attribute's partitions are. If the fitness we obtain for an individual is close to zero, it indicates that the attributes are totally independent of the classes, which means that the fuzzy sets obtained do not discriminate the classes. On the other hand, as the fitness value moves further away from zero, it indicates that the partitions obtained are more than acceptable and may discriminate the classes with good accuracy.
Algorithm 2: Fitness Function.

Fitness(in: E, out: ValueFitness)
begin
1. For each attribute i = 1,...,|A|:
  1.1 For each set f = 1,...,F_i of attribute i and for each class k = 1,...,|C|, calculate the probability
      P_{ifk} = Σ_{e∈E_k} μ_{if}(e) / Σ_{e∈E} μ_{if}(e)
  1.2 For each class k = 1,...,|C|, calculate the probability
      P_{ik} = Σ_{f=1}^{F_i} P_{ifk}
  1.3 For each f = 1,...,F_i, calculate the probability
      P_{if} = Σ_{k=1}^{|C|} P_{ifk}
  1.4 For each f = 1,...,F_i, calculate the information gain of attribute i and set f
      I_{if} = Σ_{k=1}^{|C|} P_{ifk} · log_2 ( P_{ifk} / (P_{ik} · P_{if}) )
  1.5 For each f = 1,...,F_i, calculate the entropy
      H_{if} = − Σ_{k=1}^{|C|} P_{ifk} · log_2 P_{ifk}
  1.6 Calculate the total I and H of attribute i:
      I_i = Σ_{f=1}^{F_i} I_{if}   and   H_i = Σ_{f=1}^{F_i} H_{if}
2. Calculate the fitness as:
      ValueFitness = ( Σ_{i=1}^{|A|} I_i ) / ( Σ_{i=1}^{|A|} H_i )
end
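A direct, simplified transcription of Algorithm 2 into code could look like the sketch below; it reflects our reading of the algorithm (including the assumed sign of the entropy term), and mu(i, f, e) stands for whatever membership computation applies to the value of attribute i in example e.

```python
# Hypothetical sketch of Algorithm 2. `mu(i, f, e)` must return the membership of
# example e to fuzzy set f of attribute i (via the partition or via function (1)).
import math

def fitness(examples, classes, n_sets, mu):
    """examples: list of (example, class); n_sets[i]: number of fuzzy sets of attribute i."""
    total_I, total_H = 0.0, 0.0
    for i, F_i in enumerate(n_sets):
        I_i, H_i = 0.0, 0.0
        # P_ifk, P_ik, P_if as in steps 1.1-1.3
        P = {(f, k): sum(mu(i, f, e) for e, c in examples if c == k) /
                     max(sum(mu(i, f, e) for e, _ in examples), 1e-12)
             for f in range(F_i) for k in classes}
        P_k = {k: sum(P[(f, k)] for f in range(F_i)) for k in classes}
        P_f = {f: sum(P[(f, k)] for k in classes) for f in range(F_i)}
        for f in range(F_i):
            for k in classes:
                p = P[(f, k)]
                if p > 0:
                    I_i += p * math.log2(p / (P_k[k] * P_f[f]))   # step 1.4
                    H_i -= p * math.log2(p)                       # step 1.5 (sign assumed)
        total_I += I_i
        total_H += H_i
    return total_I / total_H if total_H else 0.0
```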
Selection. Individual selection is by means of
tournament, taking subsets with size 2.
Crossing. The crossing operator is applied with a
probability of 0.3, crossing two individuals through a
single point, which may be any one of the positions
on the vector. Not all crossings are valid, since one of the restrictions imposed on an individual is that the array v_2 should not have all its genes at zero. When two individuals are crossed and this situation occurs, the crossing is invalid and the individuals remain in the population without interbreeding. If, instead, the crossing is valid, the domain of each gene of array v_1 is updated in the generated individuals.
Mutation. Mutation is carried out according to a probability in the interval [0.01, 0.1], changing the value of a gene to any other in its possible domain. First, the gene of the array v_2 is mutated, and it is then checked that there are still genes with value 1 in v_2. If so, the mutation of the gene in v_2 is kept and, in addition, the domains of this gene and its adjacent genes are updated in the vector v_1. Finally, the mutation of this same gene is carried out in the vector v_1. If, when a gene is mutated in v_2, all genes become zero, then the mutation is not carried out.
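A sketch of this mutation operator, under our reading of the constraint, is shown below; the function domains(points, v2) is assumed to return the upper bound of every v_1 gene, as in the encoding sketch above.

```python
# Hypothetical sketch of the mutation operator: flip a v2 gene, reject the
# mutation if it would deactivate every split point, otherwise recompute the
# affected v1 domains and re-sample the mutated gene inside its new domain.
import random

def mutate(v1, v2, points, domains, rate=0.05):
    for j in range(len(v2)):
        if random.random() >= rate:
            continue
        candidate = v2.copy()
        candidate[j] = 1 - candidate[j]              # flip the activation bit
        if not any(candidate):                       # would deactivate every split point
            continue                                 # the mutation is not carried out
        v2 = candidate
        bounds = domains(points, v2)                 # domains of neighbouring genes change
        v1 = [min(g, ub) for g, ub in zip(v1, bounds)]
        v1[j] = random.uniform(0.0, bounds[j])       # re-sample the mutated gene
    return v1, v2
```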
Stopping. The stopping condition is determined by the number of generations, which lies in the interval [100, 150]. The genetic algorithm should find the best possible solution in order to achieve a more efficient classification.
In the next section we show, with some computational experiments, that it is important to construct the fuzzy partitions from the original data rather than from transformed data, because with the transformation we lose information and accuracy.
3 EXPERIMENTS
In this section we show different experiments to evaluate whether the fuzzy partitions constructed without any transformation of the data (EOFP Algorithm) are better than the fuzzy partitions constructed after transforming the imprecise data into crisp data (OFP_CLASS Algorithm). All partitions are evaluated by classifying with a Fuzzy Random Forest ensemble (FRF) (Bonissone et al., 2010), which is able to handle imperfect data in both the learning and the classification phases.
The experiments are designed to measure the behavior of the fuzzy partitions used in the FRF ensemble, using the datasets and results proposed in (Palacios et al., 2009; Palacios et al., 2010), where the authors use a fuzzy rule-based classifier to classify datasets with imprecise data such as missing or interval values. They use uniform partitions to evaluate the datasets, and we show how the results improve when the partitions are fuzzy, even when they are constructed using the modified dataset instead of the original one. We also show how the classification results are still better if we do not modify the data to construct the fuzzy partitions. Since we compare with the results of (Palacios et al., 2009; Palacios et al., 2010), we define experimental settings quite similar to those proposed by them.
3.1 Datasets and Parameters for FRF Ensemble
To evaluate fuzzy partitions, we have used real-world datasets about medical diagnosis and high performance athletics (Palacios et al., 2009; Palacios et al., 2010), that we describe in Table 1.
Table 1: Datasets.
Dataset |E| |M| I
100ml-4-I 52 4 2
100ml-4-P 52 4 2
Long-4 25 4 2
Dyslexic-12 65 12 4
Dyslexic-12-01 65 12 3
Dyslexic-12-12 65 12 3
Table 1 shows, for each dataset, the number of examples (|E|), the number of attributes (|M|) and the number of classes (I).
All FRF ensembles use a forest size of 100 trees. The number of attributes chosen at random at a given node is log_2(|·| + 1), where |·| is the number of available attributes at that node, and each tree of the FRF ensemble is constructed to its maximum size (pure node or empty set of available attributes) and without pruning.
3.2 Results
These experiments were conducted to test the accuracy of the FRF ensemble when it uses fuzzy partitions constructed from real-world datasets with imperfect values using the EOFP Algorithm. These results are compared with the ones obtained by the GFS classifier proposed in (Palacios et al., 2009), which uses uniform partitions, and with the results obtained by the FRF ensemble when it uses fuzzy partitions constructed with the OFP_CLASS Algorithm.

It is important to clarify that the OFP_CLASS Algorithm does not work with imperfect data. For this reason, to get the fuzzy partitions of these datasets we
have modified the original data. The interval and fuzzy values have been replaced by their average value. In this way we have transformed the interval and fuzzy values into crisp values and the OFP_CLASS Algorithm can work with these datasets.
In these experiments we have used the datasets available at "http://sci2s.ugr.es/keel/" and the results available in (Palacios et al., 2009; Palacios et al., 2010). There are datasets from two different real-world problems. The first one is related to the composition of teams in high performance athletics and the second one is a medical diagnosis problem. A more detailed description of these problems may be found in (Palacios et al., 2009; Palacios et al., 2010).
3.2.1 High Performance Athletics
The score of an athletics team is the sum of the individual scores of the athletes in the different events. It is the coach's responsibility to balance the capabilities of the different athletes in order to maximize the score of a team according to the regulations. The variables that define each problem are as follows:

There are four indicators for the long jump that are used to predict whether an athlete will pass a given threshold: the ratio between the weight and the height, the maximum speed in the 40 meters race, and the tests of central (abdominal) muscles and lower extremities.

There are also four indicators for the 100 meters race: the ratio between weight and height, the reaction time, the starting or 20 m speed, and the maximum or 40 m speed.
The datasets used in this experiment are the following: "Long-4" (25 examples, 4 attributes, 2 classes, no missing values and all attributes interval-valued), and "100ml-4-I" and "100ml-4-P" (52 examples, 4 attributes, 2 classes, no missing values and all attributes interval-valued).
As in (Palacios et al., 2009), we have used a 10-fold cross-validation design for all datasets. Table 2 shows the results obtained in (Palacios et al., 2009) and the ones obtained by the FRF ensemble with the six combination methods, which are explained in detail in (Bonissone et al., 2010); the Appendix presents a brief intuitive description of each of them. Except for the crisp algorithm proposed in (Palacios et al., 2009), Table 2 shows, for each dataset, the interval [mean_min_error, mean_max_error] obtained according to the decision process described in Section 2.1.1. For each dataset, we highlight in bold the best results obtained with each algorithm.
The classification results obtained by the extended GFS proposed in (Palacios et al., 2009) and by the FRF ensemble are very promising, because we are representing the information in a more natural and appropriate way, and in this problem we are allowing the knowledge of the coach to be collected by means of ranges of values and linguistic terms. The results of the FRF ensemble are very competitive with all the fuzzy partitions, but the fuzzy partitions obtained with the EOFP Algorithm are the best.
3.2.2 Diagnosis of Dyslexia
Dyslexia is a learning disability in people with a normal intelligence quotient and without further physical or psychological problems that explain such disability. A more detailed description of this problem can be found in (Palacios et al., 2009; Palacios et al., 2010).
In these experiments we have used three different datasets, named "Dyslexic-12", "Dyslexic-12-01" and "Dyslexic-12-12". Each dataset has 65 examples and 12 attributes. The output variable of each of these datasets is a subset of the following labels: no dyslexic; control and revision; dyslexic; and inattention, hyperactivity or other problems.
These three datasets differ only in their outputs:

"Dyslexic-12" comprises the four mentioned classes.

"Dyslexic-12-01" does not make use of the class "control and revision", whose members are included in the class "no dyslexic".

"Dyslexic-12-12" does not make use of the class "control and revision", whose members are included in the class "dyslexic".
All experiments are repeated 100 times with bootstrap resamples (with replacement) of the training set. The test set comprises the "out of the bag" elements. In Table 3 we show the results obtained when we run the FRF ensemble with the fuzzy partitions obtained with OFP_CLASS and with the fuzzy partitions obtained with EOFP for the datasets "Dyslexic-12", "Dyslexic-12-01" and "Dyslexic-12-12". Also, in Table 3, we compare these results with the best ones obtained in (Palacios et al., 2010) ((*): partition with four labels; (**): partition with five labels). Again, in this table the interval [mean_min_error, mean_max_error] obtained for each dataset according to the decision process described in Section 2.1.1 is shown. For each dataset, we highlight in bold the best results obtained with each algorithm.
Table 2: Comparative results for datasets of high performance athletics.

                                 100ml-4-I                        100ml-4-P                        Long-4
Technique                        Train          Test              Train          Test              Train          Test
EOFP fuzzy partition
  FRF_SM1                        [0.107,0.305]  [0.130,0.323]     [0.043,0.235]  [0.093,0.290]     [0.191,0.484]  [0.083,0.349]
  FRF_SM2                        [0.110,0.306]  [0.150,0.343]     [0.045,0.237]  [0.110,0.307]     [0.165,0.449]  [0.083,0.349]
  FRF_MWL1                       [0.070,0.265]  [0.073,0.267]     [0.032,0.224]  [0.060,0.257]     [0.085,0.364]  [0.033,0.299]
  FRF_MWL2                       [0.060,0.254]  [0.113,0.306]     [0.043,0.235]  [0.060,0.257]     [0.111,0.391]  [0.083,0.349]
  FRF_MWLT1                      [0.070,0.267]  [0.073,0.267]     [0.032,0.224]  [0.060,0.257]     [0.085,0.364]  [0.033,0.299]
  FRF_MWLT2                      [0.060,0.252]  [0.093,0.286]     [0.038,0.231]  [0.060,0.257]     [0.107,0.386]  [0.083,0.349]
OFP_CLASS fuzzy partition
  FRF_SM1                        [0.139,0.331]  [0.150,0.343]     [0.098,0.291]  [0.133,0.310]     [0.120,0.404]  [0.200,0.467]
  FRF_SM2                        [0.141,0.333]  [0.150,0.343]     [0.096,0.288]  [0.093,0.290]     [0.115,0.391]  [0.200,0.467]
  FRF_MWL1                       [0.077,0.269]  [0.093,0.287]     [0.075,0.269]  [0.073,0.270]     [0.116,0.396]  [0.100,0.417]
  FRF_MWL2                       [0.060,0.252]  [0.093,0.287]     [0.077,0.269]  [0.073,0.270]     [0.102,0.382]  [0.100,0.367]
  FRF_MWLT1                      [0.077,0.269]  [0.093,0.287]     [0.075,0.267]  [0.073,0.270]     [0.107,0.387]  [0.150,0.417]
  FRF_MWLT2                      [0.062,0.254]  [0.093,0.287]     [0.077,0.269]  [0.073,0.270]     [0.094,0.373]  [0.067,0.333]
Crisp (Palacios et al., 2009)     0.259          0.384             0.288          0.419             0.327          0.544
GGFS (Palacios et al., 2009)     [0.089,0.346]  [0.189,0.476]     [0.076,0.320]  [0.170,0.406]     [0.000,0.279]  [0.349,0.616]
Table 3: Comparative results for datasets of dyslexia.

                                 Dyslexic-12                      Dyslexic-12-01                   Dyslexic-12-12
Technique                        Train          Test              Train          Test              Train          Test
EOFP fuzzy partition
  FRF_SM1                        [0.000,0.238]  [0.000,0.398]     [0.022,0.223]  [0.039,0.377]     [0.001,0.263]  [0.035,0.422]
  FRF_SM2                        [0.000,0.228]  [0.000,0.399]     [0.008,0.184]  [0.022,0.332]     [0.009,0.245]  [0.032,0.411]
  FRF_MWL1                       [0.000,0.270]  [0.000,0.406]     [0.017,0.231]  [0.045,0.383]     [0.001,0.273]  [0.019,0.430]
  FRF_MWL2                       [0.000,0.270]  [0.000,0.407]     [0.020,0.241]  [0.056,0.385]     [0.001,0.267]  [0.026,0.406]
  FRF_MWLT1                      [0.000,0.263]  [0.000,0.402]     [0.012,0.216]  [0.038,0.365]     [0.000,0.265]  [0.019,0.427]
  FRF_MWLT2                      [0.000,0.266]  [0.000,0.404]     [0.015,0.221]  [0.049,0.373]     [0.000,0.262]  [0.024,0.422]
OFP_CLASS fuzzy partition
  FRF_SM1                        [0.000,0.320]  [0.002,0.511]     [0.000,0.282]  [0.000,0.413]     [0.000,0.405]  [0.000,0.477]
  FRF_SM2                        [0.000,0.327]  [0.001,0.515]     [0.000,0.253]  [0.000,0.389]     [0.000,0.402]  [0.000,0.469]
  FRF_MWL1                       [0.000,0.261]  [0.003,0.419]     [0.000,0.264]  [0.000,0.400]     [0.000,0.335]  [0.000,0.422]
  FRF_MWL2                       [0.000,0.270]  [0.003,0.423]     [0.000,0.276]  [0.000,0.407]     [0.000,0.343]  [0.000,0.414]
  FRF_MWLT1                      [0.000,0.264]  [0.004,0.419]     [0.000,0.243]  [0.000,0.386]     [0.000,0.331]  [0.000,0.422]
  FRF_MWLT2                      [0.000,0.267]  [0.003,0.417]     [0.000,0.259]  [0.000,0.394]     [0.000,0.343]  [0.000,0.418]
Crisp CF_0 (*)                    0.444         [0.572,0.694]      0.336         [0.452,0.533]      0.390         [0.511,0.664]
GGFS (*)                             -          [0.421,0.558]         -          [0.219,0.759]         -          [0.199,0.757]
GGFS CF_0 (*)                    [0.003,0.237]  [0.405,0.548]     [0.005,0.193]  [0.330,0.440]     [0.003,0.243]  [0.325,0.509]
Crisp CF_0 (**)                   0.556         [0.614,0.731]      0.460         [0.508,0.605]      0.485         [0.539,0.692]
GGFS (**)                            -          [0.490,0.609]         -          [0.323,0.797]         -          [0.211,0.700]
GGFS CF_0 (**)                   [0.038,0.233]  [0.480,0.621]     [0.000,0.187]  [0.394,0.522]     [0.000,0.239]  [0.393,0.591]
As a general comment on all the experiments, we see that the FRF ensemble with EOFP fuzzy partitions obtains better results in test than FRF with OFP_CLASS fuzzy partitions. The FRF ensemble is a significant improvement over the crisp GFS. In these experiments we can see that when the partitions are obtained from the original data using the EOFP Algorithm, the accuracy is higher (the intervals of error are closer to 0 and they are less imprecise). As also discussed in (Palacios et al., 2010), it is preferable to use an algorithm which is capable of learning from low quality data than to remove the imperfect information and use a conventional algorithm.
4 CONCLUSIONS
In this paper we have presented the EOFP Algorithm
for fuzzy discretization of continuous attributes. This
algorithm is able to work with imperfect information.
We have performed several experiments using imprecise datasets, obtaining better results when working with the original data. Besides, we have presented a fuzzy decision tree which can work with imprecise information.

Our final conclusion, as many papers in the literature indicate, is that it is necessary to design classification techniques so that they can handle original data which may be imperfect in some cases. The transformation of these imperfect values into (imputed) crisp values may cause undesirable effects with respect to the accuracy of the technique.
ACKNOWLEDGEMENTS
Supported by the project TIN2008-06872-C04-03 of the MICINN of Spain and the European Regional Development Fund. Thanks to the Funding Program for Research Groups of Excellence (04552/GERM/06) granted by the "Agencia Regional de Ciencia y Tecnología - Fundación Séneca", Murcia, Spain. Raquel Martínez is supported by the FPI scholarship program of the "Fundación Séneca" of Spain.
REFERENCES
Au, W.-H., Chan, K. C., and Wong, A. (2006). A fuzzy approach to partitioning continuous attributes for classification. IEEE Trans. Knowledge and Data Engineering, 18(5):715–719.
Bonissone, P. (1997). Uncertainty Management in Information Systems: From Needs to Solutions, chapter Approximate reasoning systems: handling uncertainty and imprecision in information systems, pages 369–395. A. Motro and Ph. Smets, Eds. Kluwer Academic Publishers.
Bonissone, P., Cadenas, J. M., Garrido, M., and Díaz-Valladares, R. (2010). A fuzzy random forest. Int. J. Approx. Reasoning, 51(7):729–747.
Cadenas, J., Garrido, M., Martínez, R., and Muñoz, E. (2010). OFP_CLASS: An algorithm to generate optimized fuzzy partitions to classification. In 2nd International Conference on Fuzzy Computation, ICFC 2010, pages 5–13.
Cantu-Paz, E. and Kamath, C. (2001). Data Mining: A Heuristic Approach, chapter On the use of evolutionary algorithms in data mining, pages 48–71. Idea Group Publishing.
Casillas, J. and Sánchez, L. (2006). Knowledge extraction from fuzzy data for estimating consumer behavior models. In IEEE Conference on Fuzzy Systems, pages 164–170.
Cox, E. (2005). Fuzzy Modeling and Genetic Algorithms
for Data Mining and Exploration. Morgan Kaufmann
Publishers.
Garrido, M., Cadenas, J., and Bonissone, P. (2010). A classification and regression technique to handle heterogeneous and imperfect information. Soft Computing, 14(11):1165–1185.
Liu, H., Hussain, F., Tan, C., and Dash, M. (2002). Discretization: an enabling technique. Data Mining and Knowledge Discovery, 6(4):393–423.
Otero, A. J., Sánchez, L., and Villar, J. R. (2006). Longest path estimation from inherently fuzzy data acquired with GPS using genetic algorithms. In Int. Symposium on Evolving Fuzzy Systems, pages 300–305.
Palacios, A. M., Sánchez, L., and Couso, I. (2009). Extending a simple genetic cooperative-competitive learning fuzzy classifier to low quality datasets. Evolutionary Intelligence, 2:73–84.
Palacios, A. M., Sánchez, L., and Couso, I. (2010). Diagnosis of dyslexia with low quality data with genetic fuzzy systems. Int. J. Approx. Reasoning, 51:993–1009.
Wang, X. and Kerre, E. (2001). Reasonable properties for the ordering of fuzzy quantities (I-II). Journal of Fuzzy Sets and Systems, 118:375–405.
APPENDIX
Combination Methods
We present, with a brief intuitive description, the combination methods used in this paper. These methods are described in more detail in (Bonissone et al., 2010).
Method SM1: each tree of the ensemble assigns a simple vote to the most voted class among the leaves reached by the example. The FRF ensemble classifies the example with the most voted class among the trees.

Method SM2: the FRF ensemble classifies the example with the most voted class among the leaves reached by the example.

Method MWL1: this method is similar to the SM1 method, but the vote of each reached leaf is weighted by the weight of the leaf.

Method MWL2: each reached leaf assigns a weighted vote to its majority class. The ensemble decides the most voted class.

Method MWLT1: this method is similar to the MWL1 method, but the vote of each tree is additionally weighted by a weight assigned to each tree.

Method MWLT2: each reached leaf votes for its majority class with a vote weighted by the weight of the leaf and of the tree to which it belongs.
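As a purely illustrative contrast between tree-level and leaf-level voting (methods SM1 and SM2), with data structures of our own choosing:

```python
# Hypothetical sketch contrasting SM1 (one vote per tree) with SM2 (one vote per
# reached leaf). Each tree's output is modeled as the list of (class, chi) pairs
# of the leaves reached by the example.
from collections import Counter

def sm1(trees_leaves):
    votes = Counter()
    for leaves in trees_leaves:                       # one vote per tree
        votes[Counter(c for c, _ in leaves).most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0]

def sm2(trees_leaves):
    votes = Counter(c for leaves in trees_leaves for c, _ in leaves)   # one vote per leaf
    return votes.most_common(1)[0][0]

example_output = [[("A", 0.7), ("B", 0.3)], [("B", 0.6), ("B", 0.4)], [("A", 1.0)]]
print(sm1(example_output), sm2(example_output))       # the two methods may disagree
```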