CONSTRUCTING FUZZY PARTITIONS FROM IMPRECISE DATA
José M. Cadenas, M. Carmen Garrido and Raquel Martínez
Dpt. Engineering Information and Communications, Faculty of Informatics, University of Murcia
Campus Espinardo, Murcia, Spain
Keywords:
Fuzzy partition, Imperfect information, Fuzzy random forest ensemble, Imprecise data.
Abstract:
Classification is an important task in Data Mining. In order to carry out classification, many classifiers require a previous preparatory step for their data. In this paper we focus on the process of discretization of attributes, a very important part of Data Mining. In many situations the values of the attributes are imprecise, because imperfect information inevitably appears in real situations for a variety of reasons. Although many efforts have been made to incorporate imperfect data into classification techniques, there are still many limitations as to the type of data, uncertainty and imprecision that can be handled. Therefore, in this paper we propose an algorithm to construct fuzzy partitions from imprecise information and we evaluate these partitions in a Fuzzy Random Forest ensemble which is also able to work with imprecise information. We also compare our proposal with the results of other works.
1 INTRODUCTION
The construction of the fuzzy intervals into which a continuous domain is discretized is an important problem in the area of data mining and soft computing, since the determination of these intervals can deeply affect the performance of the different classification techniques (Au et al., 2006).
Although there are many discretization algorithms, most of them do not consider that the information available to construct the partitioning is sometimes not as precise and accurate as desirable. However, imperfect information inevitably appears in realistic domains and situations. Instrument errors or corruption from noise during experiments may give rise to incomplete data when measuring a specific attribute. In other cases, the extraction of exact information may be excessively costly or unfeasible. Moreover, it might be useful to complement the available data with additional information from an expert, which is usually elicited as imperfect data (interval data, fuzzy concepts, etc). In most real-world problems, data have a certain degree of imprecision. Sometimes, this imprecision is small enough for it to be safely ignored. On other occasions, the imprecision of the data can be modeled by a probability distribution. However, there is a third kind of problem, where the imprecision is significant and a probability distribution is not the most natural way to model it. This is the case of certain practical problems where the data are inherently fuzzy (Bonissone, 1997; Casillas and Sánchez, 2006; Garrido et al., 2010; Otero et al., 2006).
When we have imperfect data, we have two options: the first is to transform the original data into another kind of data that our algorithm can work with; the second is to work directly with the original data without carrying out any transformation. When we choose the first option, we can lose information and therefore accuracy. For this reason, it is necessary to incorporate into discretization algorithms the handling of attributes which may present missing and imprecise values.
In this paper we present an algorithm, which we call EOFP (Extended Optimized Fuzzy Partitions), that obtains fuzzy partitions from imperfect information. This algorithm extends the OFP_CLASS algorithm (Cadenas et al., 2010) to incorporate the management of imprecise values (intervals and fuzzy values) in continuous attributes and of set-valued classes (imprecise values for the class attribute).
The EOFP Algorithm follows the steps of a top-down discretization process with four iterative stages (Liu et al., 2002): 1.- All kinds of continuous values in the dataset to be discretized are ordered. 2.- The best cut point for partitioning the attribute domains is found. 3.- Once the best cut point is found, the domain of each
attribute is divided into two partitions. 4.- Finally, we
check whether the stopping criterion is fulfilled, and
if so the process is terminated.
To implement the above general discretization process, the EOFP Algorithm is divided into two stages. In the first stage, we carry out a search for the best cut points of each attribute. In the second stage, based on these cut points, we use a genetic algorithm which optimizes the fuzzy sets formed from the cut points.
The structure of this study is as follows. In Section 2 we present the EOFP Algorithm. In addition, in this section we extend a fuzzy decision tree, which is used as the base of the first stage of the EOFP Algorithm. This tree is able to work with imprecise information both in the values of the attributes and in the class values. Later, in Section 3, we show various experimental results which evaluate our proposal in comparison with previously existing proposals. For these experiments we use datasets with imprecision. In Section 4 we present the conclusions of this study. Finally, we include an Appendix with a brief description of the combination methods used in this work.
2 DESIGNING THE ALGORITHM
In this section we present the EOFP Algorithm, which is able to work with imprecise data. The EOFP Algorithm builds fuzzy partitions which guarantee, for each attribute:

Completeness (no point in the domain is outside the fuzzy partition), and

Strong fuzzy partition (it verifies that ∀x ∈ Ω_i, Σ_{f=1}^{F_i} μ_{B_f}(x) = 1, where B_1, ..., B_{F_i} are the F_i fuzzy sets of the partition of the continuous attribute i with domain Ω_i, and μ_{B_f}(x) are their membership functions).
The domain of each continuous attribute i is partitioned into trapezoidal fuzzy sets B_1, B_2, ..., B_{F_i}, so that:

μ_{B_1}(x) = 1                                       if b_{11} ≤ x ≤ b_{12}
           = (b_{13} − x) / (b_{13} − b_{12})        if b_{12} ≤ x ≤ b_{13}
           = 0                                       if b_{13} ≤ x

μ_{B_2}(x) = 0                                       if x ≤ b_{12}
           = (x − b_{12}) / (b_{13} − b_{12})        if b_{12} ≤ x ≤ b_{13}
           = 1                                       if b_{13} ≤ x ≤ b_{23}
           = (b_{24} − x) / (b_{24} − b_{23})        if b_{23} ≤ x ≤ b_{24}
           = 0                                       if b_{24} ≤ x

...

μ_{B_{F_i}}(x) = 0                                                             if x ≤ b_{(F_i−1)3}
               = (x − b_{(F_i−1)3}) / (b_{(F_i−1)4} − b_{(F_i−1)3})            if b_{(F_i−1)3} ≤ x ≤ b_{(F_i−1)4}
               = 1                                                             if b_{(F_i−1)4} ≤ x
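As a purely illustrative sketch (not the authors' implementation), the following fragment shows how trapezoidal membership functions of this kind can be built from a list of cut points, assuming the attribute domain has been normalized to [0, 1]; the helper names are our own.

```python
# A minimal sketch: trapezoidal membership functions forming a strong fuzzy
# partition of [0, 1] from fuzzified cut points. Each cut point p_r is widened
# by a quantity d_r into the overlap ramp [p_r - d_r, p_r + d_r].

def make_strong_partition(cuts, deltas):
    """cuts: sorted cut points; deltas: half-widths of the fuzzy boundaries.
    Returns the overlap ramps; the partition has len(cuts) + 1 fuzzy sets."""
    return [(p - d, p + d) for p, d in zip(cuts, deltas)]

def membership(x, f, ramps):
    """Membership of x in fuzzy set f (0-indexed)."""
    n_sets = len(ramps) + 1
    if f > 0:                            # ascending ramp, shared with set f-1
        a, b = ramps[f - 1]
        if x <= a:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
    if f < n_sets - 1:                   # descending ramp, shared with set f+1
        a, b = ramps[f]
        if x > b:
            return 0.0
        if x > a:
            return (b - x) / (b - a)
    return 1.0                           # plateau of set f

ramps = make_strong_partition([0.4, 0.7], [0.05, 0.1])
# The memberships of any point sum to 1, as required by a strong partition.
assert abs(sum(membership(0.42, f, ramps) for f in range(3)) - 1.0) < 1e-9
```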
The EOFP Algorithm is composed of two stages: in stage 1 we use a fuzzy decision tree to obtain possible cut points for the different attributes; in stage 2 we carry out the process by which we optimize the cut points and build the fuzzy partitions. The objective is to divide the continuous domains into fuzzy sets which are competitive and effective for obtaining good accuracy in the classification task. Before describing the EOFP Algorithm, we present a fuzzy decision tree which is able to work with imprecise data.
2.1 Fuzzy Decision Tree
In this section, we describe a fuzzy decision tree that we will use as base classifier in a Fuzzy Random Forest ensemble to evaluate the generated fuzzy partitions, and whose basic algorithm will be modified for the first stage of the EOFP Algorithm, as we will see later. This tree is an extension of the fuzzy decision tree that we presented in (Cadenas et al., 2010), to incorporate the management of imprecise values.
The tree is built from a set of examples E which are described by attributes that may be nominal or continuous, expressed with crisp, interval and fuzzy values, where there is at least one nominal attribute which acts as the class attribute. In addition, the class attribute can be expressed with a set of classes (set-valued class). Thus, the class may also be expressed in an imprecise way.
The fuzzy decision tree is based on the ID3 algo-
rithm, where all the continuous attributes have been
discretized by means of a series of fuzzy sets. An
initial value equal to 1 (χ_root(e_j) = 1, where χ_N(e_j) is the membership degree of example e_j to node N and e_j is the j-th example of the dataset) is assigned to each example e_j used in the tree learning, indicating that initially the example is only in the root node of the tree. This value will continue to be 1 as long as the example e_j does not belong to more than one node during the tree construction process. In a classical tree, an example can only belong to one node at each moment, so its initial value (if it exists) is not modified throughout the construction process. In the case of a fuzzy tree, this value is modified in three situations:
When the example e_j has a missing value in an attribute i which is used as a test in a node N. In this case, the example descends to each child node N_h, h = 1, ..., H_i, with a value modified proportionally to the weight of each child node. The modified value for each N_h is calculated as:

  χ_{N_h}(e_j) = χ_N(e_j) · (Tχ_{N_h} / Tχ_N)

where Tχ_N is the sum of the weights of the examples with known value in attribute i at node N, and Tχ_{N_h} is the sum of the weights of the examples with known value in attribute i that descend to the child node N_h.
According to the membership degree of e_j to the different fuzzy sets of the partition, when the test of a node N is based on an attribute i which is continuous. In this case, the example descends to those child nodes to which the example belongs with a degree greater than 0 (μ_{B_f}(e_j) > 0; f = 1, ..., F_i). Due to the characteristics of the partitions we use, the example may descend to two child nodes at most. In this case, χ_{N_h}(e_j) = χ_N(e_j) · μ_{B_f}(e_j), for every f such that μ_{B_f}(e_j) > 0, with h = f.
When the test of a node N is based on a continuous attribute i and the value of attribute i in e_j is a fuzzy value different from those of the partition of the attribute, or an interval value, we need to extend the function that measures the membership degree of this type of data. This new function (denoted μ_simil(·)) captures the change in the value χ_N(e_j) when e_j descends in the fuzzy tree. For this reason, the membership degree of e_j is calculated using a similarity measure (μ_simil(e_j)) between the value of attribute i in e_j and the different fuzzy sets of the partition of attribute i. Therefore, the example e_j can descend to different child nodes. In this case, χ_{N_h}(e_j) = χ_N(e_j) · μ_simil(e_j). The function μ_simil(e_j) is defined, for f = 1, ..., F_i, as:

  μ_simil(e_j) = ∫ min{μ_{e_j}(x), μ_f(x)} dx / ( Σ_{f=1}^{F_i} ∫ min{μ_{e_j}(x), μ_f(x)} dx )     (1)
where:
μ_{e_j}(x) represents the membership function of the fuzzy or interval value of example e_j in attribute i;
μ_f(x) represents the membership function of fuzzy set f of the partition of attribute i;
F_i is the cardinality of the partition of attribute i.
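For illustration only, the similarity of expression (1) can be approximated numerically; the sketch below does so for an interval value against the trapezoidal sets of a partition, using a simple grid integration. The function names are our own and this is not the paper's implementation.

```python
# Hypothetical sketch of the similarity measure (1): the overlap between the
# example's (interval or fuzzy) membership function and each fuzzy set of the
# partition, normalized so the similarities over all sets sum to 1.
import numpy as np

def trapezoid(x, a, b, c, d):
    """Membership of a trapezoidal set with corners a <= b <= c <= d."""
    left = np.ones_like(x) if b == a else np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.ones_like(x) if d == c else np.clip((d - x) / (d - c), 0.0, 1.0)
    return np.minimum(left, right) * ((x >= a) & (x <= d))

def interval_membership(x, lo, hi):
    """An interval value [lo, hi] seen as a crisp (0/1) membership function."""
    return ((x >= lo) & (x <= hi)).astype(float)

def similarity(value_mu, partition, grid):
    """value_mu: membership values of the example over the grid;
    partition: list of (a, b, c, d) trapezoids; returns one similarity per set."""
    overlaps = [np.trapz(np.minimum(value_mu, trapezoid(grid, *p)), grid)
                for p in partition]
    total = sum(overlaps)
    return [o / total if total > 0 else 0.0 for o in overlaps]

grid = np.linspace(0.0, 1.0, 1001)
partition = [(0.0, 0.0, 0.35, 0.45), (0.35, 0.45, 0.6, 0.8), (0.6, 0.8, 1.0, 1.0)]
print(similarity(interval_membership(grid, 0.3, 0.5), partition, grid))
```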
We can say that the value χ_N(e_j) indicates the degree with which the example fulfills the conditions that lead to node N of the tree. Another aspect of this extended fuzzy tree is the way the information gain is calculated when node N (the node being explored at any given moment) is divided using attribute i as test attribute. This information gain G_i^N is defined as:

  G_i^N = I^N − I_{SV_i}^N     (2)
where:

I^N: standard information associated with node N. This information is calculated as follows:

1. For each class k = 1,...,|C|, the value P_k^N, which is the number of examples in node N belonging to class k, is calculated:

  P_k^N = Σ_{j=1}^{|E|} χ_N(e_j) · μ_k(e_j)     (3)

where χ_N(e_j) is the membership degree of example e_j to node N and μ_k(e_j) is the membership degree of example e_j to class k.
2. P^N, the total number of examples in node N, is calculated:

  P^N = Σ_{k=1}^{|C|} P_k^N

3. The standard information is calculated as:

  I^N = − Σ_{k=1}^{|C|} (P_k^N / P^N) · log (P_k^N / P^N)
I_{SV_i}^N is the product of three factors and represents the standard information obtained by dividing node N using attribute i, adjusted for the existence of missing values in this attribute:

  I_{SV_i}^N = I_{SV_i1}^N · I_{SV_i2}^N · I_{SV_i3}^N

where:

I_{SV_i1}^N = 1 − P_{m_i}^N / P^N, where P_{m_i}^N is the weight of the examples in node N with missing value in attribute i.

I_{SV_i2}^N = 1 / Σ_{h=1}^{H_i} P^{N_h}, H_i being the number of descendants associated with node N when we divide this node by attribute i, and P^{N_h} the weight of the examples associated with each one of the descendants.

I_{SV_i3}^N = Σ_{h=1}^{H_i} P^{N_h} · I^{N_h}, I^{N_h} being the standard information of each descendant h of node N.
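The sketch below is our own simplified reading of how the gain (2) can be computed from formulas (2) and (3) once the node weights, class memberships and child memberships of the examples are available; it is not the authors' code and the data layout is an assumption.

```python
# Hypothetical sketch of the gain G_i^N = I^N - I_SVi^N of formula (2).
# chi: weight of each example in node N; class_mu[j][k]: membership of example j
# to class k; child_mu[j][h]: membership of example j to child h (empty row if
# the value of attribute i is missing for example j).
import math

def standard_info(chi, class_mu):
    P_k = [sum(c * mu[k] for c, mu in zip(chi, class_mu))
           for k in range(len(class_mu[0]))]
    P_N = sum(P_k)
    return -sum((p / P_N) * math.log(p / P_N) for p in P_k if p > 0)

def gain(chi, class_mu, child_mu):
    I_N = standard_info(chi, class_mu)
    P_N = sum(chi)
    missing = sum(c for c, row in zip(chi, child_mu) if not row)  # weight with missing value
    sv1 = 1.0 - missing / P_N
    # weight and standard information of each child (missing-value examples ignored here)
    n_children = max((len(r) for r in child_mu if r), default=0)
    P_h, I_h = [], []
    for h in range(n_children):
        w = [c * row[h] for c, row in zip(chi, child_mu) if row]
        mus = [mu for mu, row in zip(class_mu, child_mu) if row]
        P_h.append(sum(w))
        I_h.append(standard_info(w, mus) if sum(w) > 0 else 0.0)
    sv2 = 1.0 / sum(P_h)
    sv3 = sum(p * i for p, i in zip(P_h, I_h))
    return I_N - sv1 * sv2 * sv3
```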
On the other hand, the stopping criterion is the same as that described in (Cadenas et al., 2010), and it is defined by the first of the following conditions to be reached: (a) pure node, (b) there are no more
attributes to select, (c) reaching the minimum number
of examples allowed in a node. Besides, it must be pointed out that once an attribute has been selected as a node test, this attribute will not be selected again, due to the fact that all the attributes are nominal or partitioned.
Having constructed the fuzzy tree, we use it to infer the unknown class of a new example. The inference process is as follows: given the example e to be classified with the initial value, for instance, χ_root(e) = 1, go through the tree from the root node. Then obtain the set of leaves reached by e. For each leaf reached by e, calculate the support for each class; the support for a class at a given leaf N is obtained according to expression (3). Finally, obtain the tree's decision, c, from the information provided by the set of leaves reached and the value χ with which example e activates each one of the reached leaves.
With the fuzzy decision tree presented so far, we have incorporated continuous imprecise attributes described by means of interval and fuzzy values. In the next subsection we consider the modifications that are necessary in the learning and classification phases to incorporate the treatment of examples whose class attribute is set-valued.
2.1.1 Set-valued Classes in the Fuzzy Decision Tree
In the previous section we said that the initial weight of an example e may be equal to 1 (χ_root(e) = 1), but this value depends on whether the example has a simple class or a set-valued class. In the first case, if the example e has a unique class, the initial weight is 1; in the second case, the initial weight depends on the number of classes that the example has. Therefore, if the example e has a set-valued class with n_classes classes, the example is replicated n_classes times and each replicate of the example e has the associated weight 1/n_classes.
In this case, when we perform a test of the tree to classify a dataset with set-valued classes, we can follow this decision process:

  if class(e) == class_tree(e) and size(class(e)) == 1 then successes++
  else
    if class(e) ∩ class_tree(e) ≠ ∅ then success_or_error++
    else errors++

where class_tree(e) is the class that the fuzzy decision tree provides as output and class(e) is the class value of the example e.
As a result of this test, we obtain the interval [min_error, max_error], where min_error is calculated considering only the errors counted in the variable errors by the previous process, and max_error is calculated considering as errors the sum errors + success_or_error. With this way of classifying, the tree receives an imprecise input and its output is imprecise too, because it is not possible to determine exactly a unique error.
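As a purely illustrative sketch, with names of our own choosing, this interval error could be computed as follows.

```python
# Hypothetical sketch of the [min_error, max_error] scoring for set-valued classes.
# true_classes: list of sets of admissible class labels; predictions: list of labels.

def interval_error(true_classes, predictions):
    successes = errors = success_or_error = 0
    for labels, pred in zip(true_classes, predictions):
        if len(labels) == 1 and pred in labels:
            successes += 1                      # unambiguous hit
        elif pred in labels:
            success_or_error += 1               # hit on an imprecise (set-valued) label
        else:
            errors += 1                         # prediction outside the admissible set
    n = len(predictions)
    return errors / n, (errors + success_or_error) / n

# Example: the second label is imprecise, so the error is only known as an interval.
print(interval_error([{"dyslexic"}, {"dyslexic", "control"}, {"no dyslexic"}],
                     ["dyslexic", "control", "dyslexic"]))
# -> (0.333..., 0.666...)
```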
Once we have described the fuzzy decision tree that we will use to classify and that, with some modifications, we will use in stage 1 of the discretization algorithm, we can present the algorithm itself. As we said earlier, the EOFP discretization algorithm is composed of two stages, which we present next.
2.2 First Stage: Searching for Cut Points
In this stage, a fuzzy decision tree is constructed whose basic process is that described in Subsection 2.1, except that now a procedure based on a priority queue is added and there are attributes that have not been discretized. To discretize such an attribute, the first step is to look for the cut points which will be the borders between the different partitions. In the previous section we explained that, to discretize attributes, we must order the values. If not all data are crisp, we need a function to order crisp, fuzzy and interval values. To order the data, we use the same function as the one used to search for the possible cut points.
To deal with non-discretized attributes, the algorithm follows the basic process in C4.5. The thresholds selected in each node of the tree for these attributes will be the split points that delimit the intervals. Thus, the algorithm that constitutes this first stage is based on a fuzzy decision tree that allows nominal attributes, continuous attributes discretized by means of a fuzzy partition, non-discretized continuous attributes described with crisp, interval and fuzzy values, and furthermore it allows the existence of missing values in all of them. Algorithm 1 describes the whole process.
In step 1, all examples in the root node have an initial weight equal to 1, except the examples with a set-valued class, whose weight is initialized as indicated in Section 2.1.1. The queue is a priority queue, ordered from higher to lower according to the total weight of the examples of the nodes that form the queue. Thus the domain is guaranteed to be partitioned according to the most relevant attributes.
Algorithm 1: Search of cut points.

SearchCrispIntervals(in: E, Fuzzy Partition; out: Cut points)
begin
1. Start at the root node, which is placed in the initially empty priority queue. Initially, the root node holds the set of examples E with their initial weights.
2. Extract the first node from the priority queue.
3. Select the best attribute to split this node, using the information gain expressed in (2) as the criterion. We can find two cases: the first case is where the attribute with the highest information gain is already discretized, either because it is nominal, or else because it had already been discretized earlier by the Fuzzy Partition. The second case arises when the attribute is continuous and non-discretized. In this case it is necessary to obtain the corresponding cut points.
4. Having selected the attribute to expand the node, all the descendants generated are introduced into the queue.
5. Go back to step two to continue constructing the tree until there are no nodes in the priority queue or until another stopping condition occurs, such as reaching nodes with the minimum number of examples allowed by the algorithm.
end
In step 3, when we expand a node according to an attribute:

1. If the attribute is already discretized, the node is expanded into as many children as possible values the selected attribute has. In this case, the tree's behaviour is similar to that described in Subsection 2.1.
2. If the attribute has not been previously discretized, its possible descendants are obtained. To do this, as in C4.5, the examples are ordered according to the value of the attribute in question. To order data with crisp, fuzzy and interval values, we need an ordering index (Wang and Kerre, 2001). With it we obtain a representative value for each interval and fuzzy value, and we can order all the values of the non-discretized attribute. The index used is calculated as in (4). Let A_i be a fuzzy (or interval) value of attribute i in the example e:
  Y(A_i) = ∫_0^1 M(A_{iα}) dα     (4)

where Y(A_i) is the representative value of the fuzzy or interval value of attribute i in the example, and M(A_{iα}) is the mean value of the elements of the α-cut A_{iα}.
This index determines, for each fuzzy or interval value, a number with which we order all the values. Using the crisp values and the representative values, we find the possible cut points as in a C4.5 tree. The intermediate value between the value of the attribute for example e_j and for example e_{j+1} is obtained; this value provides two descendants for the node, to which the information gain criterion is applied. This is repeated for each pair of consecutive values of the attribute, searching for the value that yields the greatest information gain; that value is the one used to split the node and is considered as a split point for the discretization of this attribute. When an example e descends to the two descendants, the process carried out is the same as the one explained in Section 2.1, and if the value of the attribute is fuzzy or interval, we apply function (1) to determine the membership of this example e to the descendant nodes; the representative value of these kinds of values is only used to order them and to obtain cut points, but when we need to use these values in some calculation, we use the original value and not the representative value.
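A small sketch of this ranking index, under the assumption that the imprecise values are intervals or trapezoidal fuzzy numbers (for which the index reduces to the mean of the four corners), could look as follows; the closed form used below is our own derivation for that particular shape.

```python
# Hypothetical sketch of the ordering index (4) for trapezoidal fuzzy values.
# For a trapezoid (a, b, c, d) the alpha-cut is [a + alpha*(b - a), d - alpha*(d - c)],
# so Y(A) = integral over alpha of the cut midpoint = (a + b + c + d) / 4.
# A crisp value x is treated directly and an interval [lo, hi] as (lo, lo, hi, hi).

def ranking_index(value):
    if isinstance(value, (int, float)):          # crisp value
        return float(value)
    a, b, c, d = (value[0], value[0], value[1], value[1]) if len(value) == 2 else value
    return (a + b + c + d) / 4.0

values = [0.30, (0.25, 0.45), (0.1, 0.2, 0.3, 0.5)]   # crisp, interval, fuzzy
print(sorted(values, key=ranking_index))
```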
2.3 Second Stage: Optimizing Fuzzy Partitions with Imprecise Data
In this second stage of the EOFP Algorithm, we use a genetic algorithm to obtain the fuzzy sets that make up the partitioning of the non-discretized attributes. We have decided to use a genetic algorithm because these algorithms are very powerful and robust, as in most cases they can successfully deal with a great variety of problems from very diverse areas, and specifically in Data Mining (Cantu-Paz and Kamath, 2001). These algorithms are normally used in problems without specialized techniques, or even in those problems where a technique does exist but is combined with a genetic algorithm to obtain hybrid algorithms that improve the results (Cox, 2005).

The genetic algorithm takes as input the cut points obtained in the first stage, but it is important to mention that the genetic algorithm decides which cut points are more important to construct the fuzzy partitions, so it is possible that many cut points are not used to obtain the optimal fuzzy partitions. At most, if the first stage obtains F_i cut points for attribute i, the genetic algorithm can make up F_i + 1 fuzzy sets for attribute i. However, if the genetic algorithm considers that attribute i
does not have much relevance in the dataset, this attribute will not be partitioned. The different elements which compose this genetic algorithm are as follows:
Encoding. An individual consists of two arrays, v_1 and v_2. The array v_1 has real coding and its size is the sum of the numbers of split points that the fuzzy tree has provided for each attribute in the first stage. Each gene in array v_1 represents the quantity to be added to and subtracted from the corresponding attribute's split point to form the fuzzy partition. On the other hand, the array v_2 has binary coding and its size is the same as that of v_1. Each gene in array v_2 indicates whether the corresponding gene or split point of v_1 is active or not. The array v_2 conditions the domain of each gene in array v_1. The domain of each gene in array v_1 is an interval defined by [0, min((p_r − p_{r−1})/2, (p_{r+1} − p_r)/2)], where p_r is the r-th split point of attribute i represented by this gene, except for the first (p_1) and last (p_u) split points of each attribute, whose domains are, respectively, [0, min(p_1, (p_2 − p_1)/2)] and [0, min((p_u − p_{u−1})/2, 1 − p_u)]. When F_i = 2, the domain of the single split point is defined by [0, min(p_1, 1 − p_1)]. The population size is 100 individuals.
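The following sketch, with invented helper names, illustrates how such a two-array chromosome and the derived gene domains might be represented for one attribute whose split points lie in [0, 1]; it is an assumption-laden illustration, not the authors' implementation.

```python
# Hypothetical sketch of the chromosome encoding for one attribute:
# v2 activates/deactivates split points, v1 stores the amount added to and
# subtracted from each active split point, bounded by the domains described above.
import random

def gene_domains(points, active):
    """Upper bound of each v1 gene; inactive split points get a zero-width domain."""
    idx = [j for j, a in enumerate(active) if a]       # indices of active split points
    bounds = [0.0] * len(points)
    for r, j in enumerate(idx):
        p = points[j]
        left = p if r == 0 else (p - points[idx[r - 1]]) / 2
        right = (1.0 - p) if r == len(idx) - 1 else (points[idx[r + 1]] - p) / 2
        bounds[j] = min(left, right)
    return bounds

def random_individual(points):
    while True:
        v2 = [random.randint(0, 1) for _ in points]
        if any(v2):                                    # at least one active split point
            break
    v1 = [random.uniform(0.0, ub) for ub in gene_domains(points, v2)]
    return v1, v2

print(random_individual([0.25, 0.5, 0.8]))
```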
Initialization. First, the array v_2 of each individual is randomly initialized, provided that the genes of the array do not all take the value zero, since in that case all the split points would be deactivated and the attributes would not be discretized. Once the array v_2 has been initialized, the domain of each gene in array v_1 is calculated, considering which points are active and which are not. After calculating the domain of each gene of the array v_1, each gene is randomly initialized by generating a value within its domain.
Fitness Function. The fitness function of each individual is defined according to the information gain defined in (Au et al., 2006). Algorithm 2 implements the fitness function, where:

μ_{if} is the membership function corresponding to fuzzy set f of attribute i. Again, we must emphasize that this membership function depends on the kind of value: if the attribute value is crisp or belongs to a known fuzzy partition, the membership function is calculated as indicated in Section 2; on the contrary, if the attribute value is fuzzy or interval, the membership function is calculated as shown in function (1).

E_k is the subset of examples of E belonging to class k.

This fitness function, based on the information gain, indicates how dependent the attributes are with regard to the class, i.e., how discriminatory each attribute's partitions are. If the fitness we obtain for an individual is close to zero, it indicates that the attributes are totally independent of the classes, which means that the fuzzy sets obtained do not discriminate the classes. On the other hand, as the fitness value moves further away from zero, it indicates that the partitions obtained are more than acceptable and may discriminate the classes with good accuracy.
Algorithm 2: Fitness Function.

Fitness(in: E, out: ValueFitness)
begin
1. For each attribute i = 1,...,|A|:
  1.1 For each set f = 1,...,F_i of attribute i and for each class k = 1,...,|C|, calculate the probability
      P_{ifk} = Σ_{e∈E_k} μ_{if}(e) / Σ_{e∈E} μ_{if}(e)
  1.2 For each class k = 1,...,|C|, calculate the probability
      P_{ik} = Σ_{f=1}^{F_i} P_{ifk}
  1.3 For each f = 1,...,F_i, calculate the probability
      P_{if} = Σ_{k=1}^{|C|} P_{ifk}
  1.4 For each f = 1,...,F_i, calculate the information gain of attribute i and set f
      I_{if} = Σ_{k=1}^{|C|} P_{ifk} · log_2 ( P_{ifk} / (P_{ik} · P_{if}) )
  1.5 For each f = 1,...,F_i, calculate the entropy
      H_{if} = − Σ_{k=1}^{|C|} P_{ifk} · log_2 P_{ifk}
  1.6 Calculate the total I and H of attribute i:
      I_i = Σ_{f=1}^{F_i} I_{if}   and   H_i = Σ_{f=1}^{F_i} H_{if}
2. Calculate the fitness as:
      ValueFitness = ( Σ_{i=1}^{|A|} I_i ) / ( Σ_{i=1}^{|A|} H_i )
end
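A direct, simplified transcription of Algorithm 2 into code could look like the sketch below; it reflects our reading of the algorithm (including the assumed sign of the entropy term), and mu(i, f, e) stands for whatever membership computation applies to the value of attribute i in example e.

```python
# Hypothetical sketch of Algorithm 2. `mu(i, f, e)` must return the membership of
# example e to fuzzy set f of attribute i (via the partition or via function (1)).
import math

def fitness(examples, classes, n_sets, mu):
    """examples: list of (example, class); n_sets[i]: number of fuzzy sets of attribute i."""
    total_I, total_H = 0.0, 0.0
    for i, F_i in enumerate(n_sets):
        I_i, H_i = 0.0, 0.0
        # P_ifk, P_ik, P_if as in steps 1.1-1.3
        P = {(f, k): sum(mu(i, f, e) for e, c in examples if c == k) /
                     max(sum(mu(i, f, e) for e, _ in examples), 1e-12)
             for f in range(F_i) for k in classes}
        P_k = {k: sum(P[(f, k)] for f in range(F_i)) for k in classes}
        P_f = {f: sum(P[(f, k)] for k in classes) for f in range(F_i)}
        for f in range(F_i):
            for k in classes:
                p = P[(f, k)]
                if p > 0:
                    I_i += p * math.log2(p / (P_k[k] * P_f[f]))   # step 1.4
                    H_i -= p * math.log2(p)                       # step 1.5 (sign assumed)
        total_I += I_i
        total_H += H_i
    return total_I / total_H if total_H else 0.0
```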
Selection. Individual selection is by means of
tournament, taking subsets with size 2.
Crossing. The crossing operator is applied with a
probability of 0.3, crossing two individuals through a
single point, which may be any one of the positions
on the vector. Not all crossings are valid, since one of the restrictions imposed on an individual is that the array v_2 should not have all its genes at zero. When two individuals are crossed and this situation occurs, the crossing is invalid and the individuals remain in the population without interbreeding. If, instead, the crossing is valid, the domain of each gene of array v_1 is updated in the generated individuals.
Mutation. Mutation is carried out according to a probability in the interval [0.01, 0.1], changing the value of a gene to any other in its possible domain. First, the gene of the array v_2 is mutated, and it is then checked that there are still genes with value 1 in v_2. If so, the mutation of the gene in v_2 is kept and, in addition, the domains of this gene and its adjacent genes are updated in the vector v_1. Finally, the mutation of this same gene is carried out in the vector v_1. If, when a gene is mutated in v_2, all genes become zero, then the mutation is not carried out.
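A sketch of this mutation operator, under our reading of the constraint, is shown below; the function domains(points, v2) is assumed to return the upper bound of every v_1 gene, as in the encoding sketch above.

```python
# Hypothetical sketch of the mutation operator: flip a v2 gene, reject the
# mutation if it would deactivate every split point, otherwise recompute the
# affected v1 domains and re-sample the mutated gene inside its new domain.
import random

def mutate(v1, v2, points, domains, rate=0.05):
    for j in range(len(v2)):
        if random.random() >= rate:
            continue
        candidate = v2.copy()
        candidate[j] = 1 - candidate[j]              # flip the activation bit
        if not any(candidate):                       # would deactivate every split point
            continue                                 # the mutation is not carried out
        v2 = candidate
        bounds = domains(points, v2)                 # domains of neighbouring genes change
        v1 = [min(g, ub) for g, ub in zip(v1, bounds)]
        v1[j] = random.uniform(0.0, bounds[j])       # re-sample the mutated gene
    return v1, v2
```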
Stopping. The stopping condition is determined by the number of generations, which lies in the interval [100, 150]. The genetic algorithm should find the best possible solution in order to achieve a more efficient classification.
In the next section we show, with some computational experiments, that it is important to construct the fuzzy partitions from the original data rather than from transformed data, because with the transformation we lose information and accuracy.
3 EXPERIMENTS
In this section we show different experiments to evaluate whether the fuzzy partitions constructed without any transformation of the data (EOFP Algorithm) are better than the fuzzy partitions constructed after transforming the imprecise data into crisp data (OFP_CLASS Algorithm). All partitions are evaluated by classifying with a Fuzzy Random Forest ensemble (FRF) (Bonissone et al., 2010), which is able to handle imperfect data in both the learning and the classification phases.
The experiments are designed to measure the behavior of the fuzzy partitions used in the FRF ensemble, using the datasets and results proposed in (Palacios et al., 2009; Palacios et al., 2010), where the authors use a fuzzy rule-based classifier to classify datasets with imprecise data such as missing or interval values. They use uniform partitions to evaluate the datasets, and we show how the results improve when the partitions are fuzzy, even when they are constructed using the modified dataset instead of the original one. We also show how the classification results are still better if we do not modify the data to construct the fuzzy partitions. Since we compare with the results of (Palacios et al., 2009; Palacios et al., 2010), we define experimental settings quite similar to those proposed by them.
3.1 Datasets and Parameters for FRF Ensemble
To evaluate fuzzy partitions, we have used real-world datasets about medical diagnosis and high performance athletics (Palacios et al., 2009; Palacios et al., 2010), that we describe in Table 1.
Table 1: Datasets.
Dataset |E| |M| I
100ml-4-I 52 4 2
100ml-4-P 52 4 2
Long-4 25 4 2
Dyslexic-12 65 12 4
Dyslexic-12-01 65 12 3
Dyslexic-12-12 65 12 3
Table 1 shows, for each dataset, the number of examples (|E|), the number of attributes (|M|) and the number of classes (I).
All FRF ensembles use a forest size of 100 trees. The number of attributes chosen at random at a given node is log_2(|·| + 1), where |·| is the number of available attributes at that node, and each tree of the FRF ensemble is constructed to its maximum size (pure node or empty set of available attributes) and without pruning.
3.2 Results
These experiments were conducted to test the accuracy of the FRF ensemble when it uses fuzzy partitions constructed from real-world datasets with imperfect values using the EOFP Algorithm. These results are compared with the ones obtained by the GFS classifier proposed in (Palacios et al., 2009), which uses uniform partitions, and with the results obtained by the FRF ensemble when it uses fuzzy partitions constructed with the OFP_CLASS Algorithm.

It is important to clarify that the OFP_CLASS Algorithm does not work with imperfect data. For this reason, to get the fuzzy partitions of these datasets we
have modified the original data. The interval and fuzzy values have been replaced by their average value. In this way we have transformed the interval and fuzzy values into crisp values and the OFP_CLASS Algorithm can work with these datasets.
In these experiments we have used the datasets available at "http://sci2s.ugr.es/keel/" and the results available in (Palacios et al., 2009; Palacios et al., 2010). There are datasets from two different real-world problems. The first one is related to the composition of teams in high performance athletics and the second one is a medical diagnosis problem. A more detailed description of these problems may be found in (Palacios et al., 2009; Palacios et al., 2010).
3.2.1 High Performance Athletics
The score of an athletics team is the sum of the individual scores of the athletes in the different events. It is the coach's responsibility to balance the capabilities of the different athletes in order to maximize the score of a team according to the regulations. The variables that define each problem are as follows:

There are four indicators for the long jump that are used to predict whether an athlete will pass a given threshold: the ratio between the weight and the height, the maximum speed in the 40 meters race, and the tests of central (abdominal) muscles and lower extremities.

There are also four indicators for the 100 meters race: the ratio between weight and height, the reaction time, the starting or 20 m speed, and the maximum or 40 m speed.
The datasets used in this experiment are the following: "Long-4" (25 examples, 4 attributes, 2 classes, no missing values and all attributes interval-valued), and "100ml-4-I" and "100ml-4-P" (52 examples, 4 attributes, 2 classes, no missing values and all attributes interval-valued).
As in (Palacios et al., 2009), we have used a 10-fold cross-validation design for all datasets. Table 2 shows the results obtained in (Palacios et al., 2009) and the ones obtained by the FRF ensemble with the six combination methods, which are explained in detail in (Bonissone et al., 2010); the Appendix presents a brief intuitive description of each of them. Except for the crisp algorithm proposed in (Palacios et al., 2009), Table 2 shows, for each dataset, the interval [mean_min_error, mean_max_error] obtained according to the decision process described in Section 2.1.1. For each dataset, we highlight in bold the best results obtained with each algorithm.
The classification results obtained by the extended GFS proposed in (Palacios et al., 2009) and by the FRF ensemble are very promising, because we are representing the information in a more natural and appropriate way, and in this problem we are allowing the knowledge of the coach to be collected by means of ranges of values and linguistic terms. The results of the FRF ensemble are very competitive with all the fuzzy partitions, but the fuzzy partitions obtained with the EOFP Algorithm are the best.
3.2.2 Diagnosis of Dyslexia
Dyslexia is a learning disability in people with a normal intelligence quotient and without further physical or psychological problems that explain such disability. A more detailed description of this problem can be found in (Palacios et al., 2009; Palacios et al., 2010).
In these experiments we have used three different datasets, named "Dyslexic-12", "Dyslexic-12-01" and "Dyslexic-12-12". Each dataset has 65 examples and 12 attributes. The output variable of each of these datasets is a subset of the following labels: no dyslexic; control and revision; dyslexic; and inattention, hyperactivity or other problems.
These three datasets differ only in their outputs:

"Dyslexic-12" comprises the four mentioned classes.

"Dyslexic-12-01" does not make use of the class "control and revision", whose members are included in the class "no dyslexic".

"Dyslexic-12-12" does not make use of the class "control and revision", whose members are included in the class "dyslexic".
All experiments are repeated 100 times with bootstrap resamples (with replacement) of the training set. The test set comprises the "out of the bag" elements. In Table 3 we show the results obtained when we run the FRF ensemble with the fuzzy partitions obtained with OFP_CLASS and with the fuzzy partitions obtained with EOFP for the datasets "Dyslexic-12", "Dyslexic-12-01" and "Dyslexic-12-12". Also, in Table 3, we compare these results with the best ones obtained in (Palacios et al., 2010) ((*): partition with four labels; (**): partition with five labels). Again, in this table the interval [mean_min_error, mean_max_error] obtained for each dataset according to the decision process described in Section 2.1.1 is shown. For each dataset, we highlight in bold the best results obtained with each algorithm.
Table 2: Comparative results for datasets of high performance athletics.

                                 100ml-4-I                        100ml-4-P                        Long-4
Technique                        Train          Test              Train          Test              Train          Test
EOFP fuzzy partition
  FRF_SM1                        [0.107,0.305]  [0.130,0.323]     [0.043,0.235]  [0.093,0.290]     [0.191,0.484]  [0.083,0.349]
  FRF_SM2                        [0.110,0.306]  [0.150,0.343]     [0.045,0.237]  [0.110,0.307]     [0.165,0.449]  [0.083,0.349]
  FRF_MWL1                       [0.070,0.265]  [0.073,0.267]     [0.032,0.224]  [0.060,0.257]     [0.085,0.364]  [0.033,0.299]
  FRF_MWL2                       [0.060,0.254]  [0.113,0.306]     [0.043,0.235]  [0.060,0.257]     [0.111,0.391]  [0.083,0.349]
  FRF_MWLT1                      [0.070,0.267]  [0.073,0.267]     [0.032,0.224]  [0.060,0.257]     [0.085,0.364]  [0.033,0.299]
  FRF_MWLT2                      [0.060,0.252]  [0.093,0.286]     [0.038,0.231]  [0.060,0.257]     [0.107,0.386]  [0.083,0.349]
OFP_CLASS fuzzy partition
  FRF_SM1                        [0.139,0.331]  [0.150,0.343]     [0.098,0.291]  [0.133,0.310]     [0.120,0.404]  [0.200,0.467]
  FRF_SM2                        [0.141,0.333]  [0.150,0.343]     [0.096,0.288]  [0.093,0.290]     [0.115,0.391]  [0.200,0.467]
  FRF_MWL1                       [0.077,0.269]  [0.093,0.287]     [0.075,0.269]  [0.073,0.270]     [0.116,0.396]  [0.100,0.417]
  FRF_MWL2                       [0.060,0.252]  [0.093,0.287]     [0.077,0.269]  [0.073,0.270]     [0.102,0.382]  [0.100,0.367]
  FRF_MWLT1                      [0.077,0.269]  [0.093,0.287]     [0.075,0.267]  [0.073,0.270]     [0.107,0.387]  [0.150,0.417]
  FRF_MWLT2                      [0.062,0.254]  [0.093,0.287]     [0.077,0.269]  [0.073,0.270]     [0.094,0.373]  [0.067,0.333]
Crisp (Palacios et al., 2009)     0.259          0.384             0.288          0.419             0.327          0.544
GGFS (Palacios et al., 2009)     [0.089,0.346]  [0.189,0.476]     [0.076,0.320]  [0.170,0.406]     [0.000,0.279]  [0.349,0.616]
Table 3: Comparative results for datasets of dyslexia.

                                 Dyslexic-12                      Dyslexic-12-01                   Dyslexic-12-12
Technique                        Train          Test              Train          Test              Train          Test
EOFP fuzzy partition
  FRF_SM1                        [0.000,0.238]  [0.000,0.398]     [0.022,0.223]  [0.039,0.377]     [0.001,0.263]  [0.035,0.422]
  FRF_SM2                        [0.000,0.228]  [0.000,0.399]     [0.008,0.184]  [0.022,0.332]     [0.009,0.245]  [0.032,0.411]
  FRF_MWL1                       [0.000,0.270]  [0.000,0.406]     [0.017,0.231]  [0.045,0.383]     [0.001,0.273]  [0.019,0.430]
  FRF_MWL2                       [0.000,0.270]  [0.000,0.407]     [0.020,0.241]  [0.056,0.385]     [0.001,0.267]  [0.026,0.406]
  FRF_MWLT1                      [0.000,0.263]  [0.000,0.402]     [0.012,0.216]  [0.038,0.365]     [0.000,0.265]  [0.019,0.427]
  FRF_MWLT2                      [0.000,0.266]  [0.000,0.404]     [0.015,0.221]  [0.049,0.373]     [0.000,0.262]  [0.024,0.422]
OFP_CLASS fuzzy partition
  FRF_SM1                        [0.000,0.320]  [0.002,0.511]     [0.000,0.282]  [0.000,0.413]     [0.000,0.405]  [0.000,0.477]
  FRF_SM2                        [0.000,0.327]  [0.001,0.515]     [0.000,0.253]  [0.000,0.389]     [0.000,0.402]  [0.000,0.469]
  FRF_MWL1                       [0.000,0.261]  [0.003,0.419]     [0.000,0.264]  [0.000,0.400]     [0.000,0.335]  [0.000,0.422]
  FRF_MWL2                       [0.000,0.270]  [0.003,0.423]     [0.000,0.276]  [0.000,0.407]     [0.000,0.343]  [0.000,0.414]
  FRF_MWLT1                      [0.000,0.264]  [0.004,0.419]     [0.000,0.243]  [0.000,0.386]     [0.000,0.331]  [0.000,0.422]
  FRF_MWLT2                      [0.000,0.267]  [0.003,0.417]     [0.000,0.259]  [0.000,0.394]     [0.000,0.343]  [0.000,0.418]
Crisp CF_0 (*)                    0.444         [0.572,0.694]      0.336         [0.452,0.533]      0.390         [0.511,0.664]
GGFS (*)                             -          [0.421,0.558]         -          [0.219,0.759]         -          [0.199,0.757]
GGFS CF_0 (*)                    [0.003,0.237]  [0.405,0.548]     [0.005,0.193]  [0.330,0.440]     [0.003,0.243]  [0.325,0.509]
Crisp CF_0 (**)                   0.556         [0.614,0.731]      0.460         [0.508,0.605]      0.485         [0.539,0.692]
GGFS (**)                            -          [0.490,0.609]         -          [0.323,0.797]         -          [0.211,0.700]
GGFS CF_0 (**)                   [0.038,0.233]  [0.480,0.621]     [0.000,0.187]  [0.394,0.522]     [0.000,0.239]  [0.393,0.591]
As a general comment on all the experiments, we see that the FRF ensemble with EOFP fuzzy partitions obtains better results in test than FRF with OFP_CLASS fuzzy partitions. The FRF ensemble is a significant improvement over the crisp GFS. In these experiments we can see that when the partitions are obtained from the original data using the EOFP Algorithm, the accuracy is higher (the intervals of error are closer to 0 and they are less imprecise). As also discussed in (Palacios et al., 2010), it is preferable to use an algorithm which is capable of learning from low quality data than to remove the imperfect information and use a conventional algorithm.
4 CONCLUSIONS
In this paper we have presented the EOFP Algorithm
for fuzzy discretization of continuous attributes. This
algorithm is able to work with imperfect information.
We have performed several experiments using imprecise datasets, obtaining better results when working with the original data. Besides, we have presented a fuzzy decision tree which can work with imprecise information.

Our final conclusion, as many papers in the literature indicate, is that it is necessary to design classification techniques so that they can handle original data which may be imperfect in some cases. The transformation of these imperfect values into (imputed) crisp values may cause undesirable effects with respect to the accuracy of the technique.
ACKNOWLEDGEMENTS
Supported by the project TIN2008-06872-C04-03 of the MICINN of Spain and the European Regional Development Fund. Thanks to the Funding Program for Research Groups of Excellence (04552/GERM/06) granted by the "Agencia Regional de Ciencia y Tecnología - Fundación Séneca", Murcia, Spain. Raquel Martínez is supported by the FPI scholarship program of the "Fundación Séneca" of Spain.
REFERENCES
Au, W.-H., Chan, K. C., and Wong, A. (2006). A fuzzy approach to partitioning continuous attributes for classification. IEEE Trans. Knowledge and Data Engineering, 18(5):715–719.
Bonissone, P. (1997). Uncertainty Management in Information Systems: From Needs to Solutions, chapter Approximate reasoning systems: handling uncertainty and imprecision in information systems, pages 369–395. A. Motro and Ph. Smets, Eds. Kluwer Academic Publishers.
Bonissone, P., Cadenas, J. M., Garrido, M., and Díaz-Valladares, R. (2010). A fuzzy random forest. Int. J. Approx. Reasoning, 51(7):729–747.
Cadenas, J., Garrido, M., Martínez, R., and Muñoz, E. (2010). OFP_CLASS: An algorithm to generate optimized fuzzy partitions to classification. In 2nd International Conference on Fuzzy Computation, ICFC 2010, pages 5–13.
Cantu-Paz, E. and Kamath, C. (2001). Data Mining: A Heuristic Approach, chapter On the use of evolutionary algorithms in data mining, pages 48–71. Idea Group Publishing.
Casillas, J. and Sánchez, L. (2006). Knowledge extraction from fuzzy data for estimating consumer behavior models. In IEEE Conference on Fuzzy Systems, pages 164–170.
Cox, E. (2005). Fuzzy Modeling and Genetic Algorithms
for Data Mining and Exploration. Morgan Kaufmann
Publishers.
Garrido, M., Cadenas, J., and Bonissone, P. (2010). A classification and regression technique to handle heterogeneous and imperfect information. Soft Computing, 14(11):1165–1185.
Liu, H., Hussain, F., Tan, C., and Dash, M. (2002). Discretization: an enabling technique. Data Mining and Knowledge Discovery, 6(4):393–423.
Otero, A. J., Sánchez, L., and Villar, J. R. (2006). Longest path estimation from inherently fuzzy data acquired with GPS using genetic algorithms. In Int. Symposium on Evolving Fuzzy Systems, pages 300–305.
Palacios, A. M., Sánchez, L., and Couso, I. (2009). Extending a simple genetic cooperative-competitive learning fuzzy classifier to low quality datasets. Evolutionary Intelligence, 2:73–84.
Palacios, A. M., Sánchez, L., and Couso, I. (2010). Diagnosis of dyslexia with low quality data with genetic fuzzy systems. Int. J. Approx. Reasoning, 51:993–1009.
Wang, X. and Kerre, E. (2001). Reasonable properties for the ordering of fuzzy quantities (I-II). Journal of Fuzzy Sets and Systems, 118:375–405.
APPENDIX
Combination Methods
We present, with a brief intuitive description, the combination methods used in this paper. These methods are described in more detail in (Bonissone et al., 2010).
Method SM1: each tree of the ensemble assigns a simple vote to the most voted class among the leaves reached by the example. The FRF ensemble classifies the example with the most voted class among the trees.

Method SM2: the FRF ensemble classifies the example with the most voted class among the leaves reached by the example.

Method MWL1: this method is similar to the SM1 method, but the vote of each reached leaf is weighted by the weight of the leaf.

Method MWL2: each reached leaf assigns a weighted vote to its majority class. The ensemble decides the most voted class.

Method MWLT1: this method is similar to the MWL1 method, but the vote of each tree is additionally weighted by a weight assigned to each tree.

Method MWLT2: each reached leaf votes for its majority class with a vote weighted by the weight of the leaf and of the tree to which it belongs.
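As a purely illustrative contrast between tree-level and leaf-level voting (methods SM1 and SM2), with data structures of our own choosing:

```python
# Hypothetical sketch contrasting SM1 (one vote per tree) with SM2 (one vote per
# reached leaf). Each tree's output is modeled as the list of (class, chi) pairs
# of the leaves reached by the example.
from collections import Counter

def sm1(trees_leaves):
    votes = Counter()
    for leaves in trees_leaves:                       # one vote per tree
        votes[Counter(c for c, _ in leaves).most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0]

def sm2(trees_leaves):
    votes = Counter(c for leaves in trees_leaves for c, _ in leaves)   # one vote per leaf
    return votes.most_common(1)[0][0]

example_output = [[("A", 0.7), ("B", 0.3)], [("B", 0.6), ("B", 0.4)], [("A", 1.0)]]
print(sm1(example_output), sm2(example_output))       # the two methods may disagree
```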