A DISTRIBUTED ALGORITHM FOR MINING FUZZY ASSOCIATION RULES

George Stephanides

University of Macedonia, Department of Applied Informatics

156 Egnatia Street, 540 06 Thessaloniki GREECE

Mihai Gabroveanu, Mirel Cosulschi, Nicolae Constantinescu

University of Craiova, Computer Science Department

13 A.I. Cuza Street, 200585 Craiova ROMANIA

Keywords: data mining, fuzzy association rules, distributed mining.

Abstract:

Data mining, also known as knowledge discovery in databases, is the process of discovering potentially useful, hidden knowledge or relations among data in large databases. An important topic in data mining research is the discovery of association rules. Most databases today are distributed. This paper presents an algorithm for mining fuzzy association rules from such distributed databases. The algorithm is inspired by the DMA (Distributed Mining of Association rules) algorithm for mining boolean association rules.

1 INTRODUCTION

Data mining, also known as knowledge discovery in databases, is the process of discovering potentially useful, hidden knowledge or relations among data in large databases. An important task in the data mining process is the discovery of association rules. An association rule describes an interesting relationship among different attributes.

The task of discovering association rules was first introduced in (Agrawal R., 1993). Many of the algorithms proposed for mining association rules are sequential. The most popular are Apriori (Rakesh Agrawal, 1994), DHP, and DIC. The basic problem of finding fuzzy association rules was introduced in (Chan Man Kuok, 1998).

Mining association rules based on fuzzy sets can handle both quantitative and categorical data, which makes it possible to use uncertain data types with existing algorithms. Today most databases are distributed. One example is the set of transaction records, one per customer operation, registered by a store chain spread over many locations. The main problem is then to discover association rules from this distributed data. In this paper we introduce an algorithm for mining fuzzy association rules from such distributed databases. The algorithm is an adaptation of the DMA algorithm to the mining of fuzzy association rules.

2 PROBLEM DEFINITION

2.1 Sequential problem definition

The formal problem definition, as in (Chan Man Kuok, 1998), is the following.

Let $DB = \{t_1, \dots, t_n\}$ be a transactional database. We consider that this database is characterized by a set of categorical or quantitative attributes (items). Let $I = \{i_1, \dots, i_m\}$ be the set of these attributes. We denote by $dom(i_k)$ the domain of values of the attribute $i_k$. For each attribute $i_k$ $(k = 1, \dots, m)$ we consider $n(k)$ associated fuzzy sets. Let $F_{i_k} = \{f_{i_k}^{1}, \dots, f_{i_k}^{n(k)}\}$ be the set of these fuzzy sets. For an attribute $i_k$ and a fuzzy set $f_{i_k}^{j}$, the membership function is $\mu_{f_{i_k}^{j}} : dom(i_k) \to [0, 1]$.

Definition 2.1. We call fuzzy itemset the tuple $\langle X, F_X \rangle$, where $X \subseteq I$ and $F_X$ is a set of fuzzy sets associated with the items from $X$. A fuzzy itemset $\langle X, F_X \rangle$ is called a k-fuzzy itemset if the number of attributes in $X$ is $k$.

Definition 2.2. A fuzzy association rule is an implication of the form $X \in A \Rightarrow Y \in B$, where $X, Y \subseteq I$, $X \cap Y = \emptyset$, $X = \{x_1, \dots, x_p\}$ and $Y = \{y_1, \dots, y_q\}$. The sets $A = \{a_1, \dots, a_p\}$ and $B = \{b_1, \dots, b_q\}$ contain the fuzzy sets related to the attributes from $X$ and $Y$, respectively. More exactly, $a_i \in F_{x_i}$ $(i = 1, \dots, p)$ and $b_i \in F_{y_i}$ $(i = 1, \dots, q)$.

Stephanides G., Gabroveanu M., Cosulschi M. and Constantinescu N. (2005).

A DISTRIBUTED ALGORITHM FOR MINING FUZZY ASSOCIATION RULES.

In Proceedings of the First International Conference on Web Information Systems and Technologies, pages 206-209

DOI: 10.5220/0001228802060209

 SciTePress

We denote this rule by $\langle X, A \rangle \Rightarrow \langle Y, B \rangle$.

The intuitive meaning of the fuzzy association rule $\langle X, A \rangle \Rightarrow \langle Y, B \rangle$ is: "if a transaction (tuple) satisfies the property $X \in A$, then it will also satisfy the property $Y \in B$ with high probability".

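Definition 2.3. The fuzzy support value of the itemset $\langle X, F_X \rangle$ in $DB$ is:

$$FS_{\langle X, F_X \rangle} = \frac{\sum_{t_l \in DB} \prod_{x_j \in X} \alpha_{a_j}(t_l[x_j])}{|DB|}$$

where

$$\alpha_{a_j}(t_l[x_j]) = \begin{cases} \mu_{a_j}(t_l[x_j]), & \text{if } \mu_{a_j}(t_l[x_j]) \geq \omega \\ 0, & \text{otherwise} \end{cases}$$

Here $t_l[x_j]$ is the value of the attribute $x_j$ in the transaction $t_l$, $a_j \in F_X$ is the fuzzy set associated with $x_j$, and $\omega$ is a user-specified minimum threshold for the membership function. Thus, values of the membership functions below this threshold are ignored.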
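As an illustration of Definition 2.3, the following minimal Python sketch computes the fuzzy support of an itemset over a toy database. The data representation is an assumption made only for this example: a transaction is a dict from attribute name to value, an itemset is a dict from attribute to the name of one fuzzy set, and the membership functions for `young` and `low` are hypothetical.

```python
from math import prod

def thresholded(mu, omega):
    """Keep a membership degree only if it reaches the threshold omega."""
    return mu if mu >= omega else 0.0

def fuzzy_support(db, itemset, membership, omega):
    """Fuzzy support of `itemset` in `db` (Definition 2.3).

    db         -- list of transactions, each a dict attribute -> value
    itemset    -- dict attribute -> fuzzy set name (one fuzzy set per attribute)
    membership -- dict (attribute, fuzzy set name) -> membership function
    omega      -- minimum threshold for the membership degrees
    """
    total = 0.0
    for t in db:
        total += prod(
            thresholded(membership[(x, a)](t[x]), omega)
            for x, a in itemset.items()
        )
    return total / len(db)

# Hypothetical membership functions, chosen only for illustration.
membership = {
    ("age", "young"): lambda v: max(0.0, min(1.0, (40 - v) / 20)),
    ("income", "low"): lambda v: max(0.0, min(1.0, (3000 - v) / 2000)),
}

db = [
    {"age": 20, "income": 1000},  # young = 1.0, low = 1.0
    {"age": 30, "income": 2000},  # young = 0.5, low = 0.5
    {"age": 38, "income": 2800},  # young = 0.1, low = 0.1, both below omega
    {"age": 50, "income": 5000},  # young = 0.0, low = 0.0
]

fs = fuzzy_support(db, {"age": "young", "income": "low"}, membership, omega=0.2)
print(fs)  # (1.0 + 0.25 + 0 + 0) / 4 = 0.3125
```

With `omega=0.2` the third transaction contributes nothing, because both of its membership degrees fall below the threshold.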

Definition 2.4. A fuzzy itemset $\langle X, F_X \rangle$ is called a large (frequent) fuzzy itemset if its fuzzy support value is greater than or equal to the minimum support threshold (minsup), namely $FS_{\langle X, F_X \rangle} \geq minsup$.

An association rule is considered interesting if it has enough support and a high confidence value. Such a rule is also known as a strong rule.

Problem 1 (Sequential Mining of Fuzzy Association Rules). Given the database $DB$ characterized by a set of attributes $I$, the fuzzy sets associated with the attributes from $I$, the minimum threshold $\omega$ for the membership function, the minimum support threshold (minsup) and the minimum confidence threshold (minconf), extract all interesting fuzzy association rules.

Definition 2.5. Let $\langle X, A \rangle \Rightarrow \langle Y, B \rangle$ be a fuzzy association rule. The fuzzy support value of the rule is defined as the fuzzy support value of the itemset $\langle \{X, Y\}, \{A, B\} \rangle$:

$$FS_{\langle X, A \rangle \Rightarrow \langle Y, B \rangle} = FS_{\langle \{X, Y\}, \{A, B\} \rangle}$$

Definition 2.6. A fuzzy association rule is called a frequent rule if its fuzzy support value is greater than or equal to the minimum support threshold (minsup), namely $FS_{\langle X, A \rangle \Rightarrow \langle Y, B \rangle} \geq minsup$.

Based on the discovered large fuzzy itemsets we can generate all possible frequent rules, but in order to be interesting they must also have a high confidence value.

Definition 2.7. Let $\langle X, A \rangle \Rightarrow \langle Y, B \rangle$ be a fuzzy association rule. The fuzzy confidence value of the rule is defined as:

$$FC_{\langle X, A \rangle \Rightarrow \langle Y, B \rangle} = \frac{FS_{\langle Z, C \rangle}}{FS_{\langle X, A \rangle}}$$

where $Z = \{X, Y\}$ and $C = \{A, B\}$. In other words, the confidence of the rule is the ratio between the fuzzy support of the fuzzy itemset $\langle Z, C \rangle$ and the fuzzy support of the fuzzy itemset $\langle X, A \rangle$.
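The ratio in Definition 2.7 can be checked on a small example. The sketch below is illustrative only: it assumes a "fuzzified" database in which each transaction already stores its membership degree for every (attribute, fuzzy set) pair, thresholded with $\omega$, and the pairs `young` and `low` are hypothetical.

```python
from math import prod

def fuzzy_support(fuzzified_db, itemset):
    """Mean over all transactions of the product of the degrees of `itemset`.

    fuzzified_db -- list of dicts (attribute, fuzzy set) -> membership degree,
                    assumed to be already thresholded with omega
    itemset      -- iterable of (attribute, fuzzy set) pairs
    """
    items = list(itemset)
    return sum(prod(t[i] for i in items) for t in fuzzified_db) / len(fuzzified_db)

def fuzzy_confidence(fuzzified_db, antecedent, consequent):
    """Support of the joint itemset divided by the support of the antecedent.

    Raises ZeroDivisionError if the antecedent has zero support.
    """
    joint = list(antecedent) + list(consequent)
    return fuzzy_support(fuzzified_db, joint) / fuzzy_support(fuzzified_db, antecedent)

young, low = ("age", "young"), ("income", "low")
db = [
    {young: 1.0, low: 0.5},
    {young: 0.5, low: 1.0},
    {young: 0.5, low: 0.0},
    {young: 0.0, low: 1.0},
]

rule_conf = fuzzy_confidence(db, [young], [low])
print(rule_conf)  # joint support 0.25, antecedent support 0.5, confidence 0.5
```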

Lemma 1. If a fuzzy itemset $\langle X, F_X \rangle$ is a large fuzzy itemset in $DB$, $Y \subseteq X$ and $F_Y \subseteq F_X$, then the fuzzy itemset $\langle Y, F_Y \rangle$ is also large in $DB$.

From the above lemma we conclude that any fuzzy sub-itemset of a large fuzzy itemset is also large.
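A short justification of Lemma 1, given as a sketch: it relies only on the fact that every membership degree lies in $[0, 1]$, and it writes $\alpha_{a_j}$ for the thresholded membership degree of Definition 2.3 and $t_l$ for a transaction of $DB$.

```latex
% For Y \subseteq X with F_Y \subseteq F_X, the product over Y keeps only some of the
% factors of the product over X, and every dropped factor lies in [0, 1]. Hence
\prod_{x_j \in X} \alpha_{a_j}(t_l[x_j])
  \;\le\; \prod_{x_j \in Y} \alpha_{a_j}(t_l[x_j])
  \qquad \text{for every } t_l \in DB .
% Summing over all t_l \in DB and dividing by |DB| gives
FS_{\langle X, F_X \rangle} \;\le\; FS_{\langle Y, F_Y \rangle} ,
% so FS_{\langle X, F_X \rangle} \ge minsup implies FS_{\langle Y, F_Y \rangle} \ge minsup .
```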

The problem of sequential mining of fuzzy association rules can be decomposed into two subproblems:

1. find all large fuzzy itemsets;

2. generate the fuzzy association rules from the large fuzzy itemsets found.
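The second subproblem can be sketched as follows, assuming the first one has already produced a table of large fuzzy itemsets with their supports. The function name `generate_rules`, the dict layout, and the sample supports are assumptions of this example, not part of the original formulation.

```python
from itertools import combinations

def generate_rules(large, minconf):
    """Return (antecedent, consequent, support, confidence) for every strong rule.

    large   -- dict frozenset of (attribute, fuzzy set) pairs -> fuzzy support;
               by Lemma 1 it is assumed to contain every sub-itemset of each key
    minconf -- minimum confidence threshold
    """
    rules = []
    for itemset, support in large.items():
        # Split the itemset into every non-empty antecedent / consequent pair.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                confidence = support / large[antecedent]
                if confidence >= minconf:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules

young, low = ("age", "young"), ("income", "low")
large = {
    frozenset([young]): 0.5,
    frozenset([low]): 0.625,
    frozenset([young, low]): 0.25,
}

rules = generate_rules(large, minconf=0.45)
print(rules)
```

Here only the rule with antecedent `young` survives (confidence 0.25 / 0.5 = 0.5), while the reverse rule has confidence 0.25 / 0.625 = 0.4 and is discarded.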

Most algorithms for mining fuzzy association rules (see (Gyenesei, 2000), (Hong T.P., 2000)) are based on the Apriori algorithm (Rakesh Agrawal, 1994).

2.2 Distributed problem definition

Let $DB = \{DB_1, DB_2, \dots, DB_n\}$ be a database distributed over $n$ sites $S_1, S_2, \dots, S_n$. We denote by $D$ the number of transactions in $DB$, and by $D_i$ the number of transactions in $DB_i$, for all $i = 1, \dots, n$.

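Definition 2.8. For a given fuzzy itemset $\langle X, F_X \rangle$, we call global fuzzy support value the fuzzy support value of $\langle X, F_X \rangle$ in $DB$, defined as:

$$FS_{\langle X, F_X \rangle} = \frac{\sum_{t_l \in DB} \prod_{x_j \in X} \alpha_{a_j}(t_l[x_j])}{|DB|}$$

and the global fuzzy support count in $DB$ is defined as:

$$CFS_{\langle X, F_X \rangle} = \sum_{t_l \in DB} \prod_{x_j \in X} \alpha_{a_j}(t_l[x_j])$$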

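Definition 2.9. For a given fuzzy itemset $\langle X, F_X \rangle$ and a database $DB_i$, we call local fuzzy support value in $DB_i$ the fuzzy support value of $\langle X, F_X \rangle$ in $DB_i$, defined as:

$$FS^{i}_{\langle X, F_X \rangle} = \frac{\sum_{t_l \in DB_i} \prod_{x_j \in X} \alpha_{a_j}(t_l[x_j])}{|DB_i|}$$

and the local fuzzy support count in $DB_i$ is defined as:

$$CFS^{i}_{\langle X, F_X \rangle} = \sum_{t_l \in DB_i} \prod_{x_j \in X} \alpha_{a_j}(t_l[x_j])$$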
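Because the partitions are disjoint, the global fuzzy support count is the sum of the local counts, and the global support is that sum divided by $D = D_1 + \dots + D_n$. The following sketch illustrates this on two hypothetical partitions; the fuzzified transaction layout and the item names are assumptions of the example.

```python
from math import prod

def local_count(partition, itemset):
    """Local fuzzy support count: sum over the site's transactions of the
    product of the (already thresholded) membership degrees."""
    return sum(prod(t[i] for i in itemset) for t in partition)

young, low = ("age", "young"), ("income", "low")

# Hypothetical fuzzified partitions DB_1 (2 transactions) and DB_2 (4 transactions).
partitions = [
    [{young: 1.0, low: 0.5}, {young: 0.5, low: 0.5}],
    [{young: 0.0, low: 1.0}, {young: 0.5, low: 0.0},
     {young: 1.0, low: 1.0}, {young: 0.0, low: 0.0}],
]

itemset = [young, low]
sizes = [len(p) for p in partitions]                           # D_1, D_2
local_counts = [local_count(p, itemset) for p in partitions]   # one count per site
local_supports = [c / d for c, d in zip(local_counts, sizes)]  # one support per site

global_count = sum(local_counts)            # global fuzzy support count
global_support = global_count / sum(sizes)  # global fuzzy support value

print(local_counts, local_supports, global_support)
```

In this example the itemset has local support 0.375 at the first site and 0.25 at the second, while its global support is 1.75 / 6, the size-weighted average of the two.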

Let minsup be the minimum support threshold.

Definition 2.10. A fuzzy itemset $\langle X, F_X \rangle$ is called a globally large fuzzy itemset if $FS_{\langle X, F_X \rangle} \geq minsup$.

Definition 2.11. A fuzzy itemset $\langle X, F_X \rangle$ is called a locally large fuzzy itemset at site $S_i$ if $FS^{i}_{\langle X, F_X \rangle} \geq minsup$.

Definition 2.12. If a fuzzy itemset $\langle X, F_X \rangle$ is both globally large and locally large at a site $S_i$, it is called a gl-large fuzzy itemset at site $S_i$.

In the following, we denote by $L$ the set of all globally large fuzzy itemsets in $DB$, and by $L^{(k)}$ the set of all globally large k-fuzzy itemsets in $DB$.

Problem 2 (Distributed Mining of Fuzzy Association Rules). Given the set of items $I$, the distributed database $DB = \{DB_1, DB_2, \dots, DB_n\}$, the fuzzy sets associated with the attributes from $I$, the minimum support threshold (minsup) and the minimum confidence threshold (minconf), extract all global fuzzy association rules.

3 THE DISTRIBUTED ALGORITHM

In (Cheung D.W., 1996), the authors proposed the DMA algorithm for mining boolean association rules from distributed databases.

3.1 Generate set of candidate fuzzy itemsets

The reduction of the set of candidate fuzzy itemsets is based on the following properties of globally large and locally large fuzzy itemsets:

Lemma 2. If a fuzzy itemset $\langle X, F_X \rangle$ is locally large at a site $S_i$, then all its subsets are also locally large at site $S_i$.

Lemma 3. If a fuzzy itemset $\langle X, F_X \rangle$ is globally large, then there exists a site $S_i$ $(1 \leq i \leq n)$ such that $\langle X, F_X \rangle$ is locally large at site $S_i$.

Lemma 4. If a fuzzy itemset $\langle X, F_X \rangle$ is a gl-large fuzzy itemset at a site $S_i$ $(1 \leq i \leq n)$, then all its sub-fuzzy itemsets $\langle Y, F_Y \rangle$, $Y \subseteq X$, are also gl-large fuzzy itemsets at site $S_i$.

We use $GL_i$ to denote the set of all gl-large fuzzy itemsets at site $S_i$, and $GL_i^{(k)}$ to denote the set of all gl-large k-fuzzy itemsets at site $S_i$.

Lemma 5. If $\langle X, F_X \rangle \in L^{(k)}$ (i.e., it is a globally large k-fuzzy itemset), then there exists a site $S_i$ $(1 \leq i \leq n)$ such that $\langle X, F_X \rangle$ and all its (k-1)-sub-fuzzy itemsets are gl-large fuzzy itemsets at site $S_i$.
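Lemma 3 above can be justified by a short averaging argument, sketched here with $FS^{i}$ and $CFS^{i}$ denoting the local fuzzy support value and count at site $S_i$, and with $D$ and $D_i$ the numbers of transactions in $DB$ and $DB_i$.

```latex
% The global support is a size-weighted average of the local supports:
FS_{\langle X, F_X \rangle}
  = \frac{\sum_{i=1}^{n} CFS^{i}_{\langle X, F_X \rangle}}{D}
  = \sum_{i=1}^{n} \frac{D_i}{D} \, FS^{i}_{\langle X, F_X \rangle} ,
  \qquad D = \sum_{i=1}^{n} D_i .
% If FS^{i}_{\langle X, F_X \rangle} < minsup held at every site S_i, then
FS_{\langle X, F_X \rangle} < \sum_{i=1}^{n} \frac{D_i}{D} \, minsup = minsup ,
% which contradicts the assumption that \langle X, F_X \rangle is globally large.
```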

As in the DMA algorithm, which is an adaptation of the Apriori algorithm, at the k-th iteration the set of candidate sets is obtained by applying the Fuzzy_Apriori_Gen function to $L^{(k-1)}$. We denote this set by $CA^{(k)}$. More exactly,

$$CA^{(k)} = \text{Fuzzy\_Apriori\_Gen}(L^{(k-1)})$$

For each site $S_i$ $(1 \leq i \leq n)$, we denote by $CG_i^{(k)}$ the set of candidate fuzzy itemsets generated by applying Fuzzy_Apriori_Gen to $GL_i^{(k-1)}$, i.e.,

$$CG_i^{(k)} = \text{Fuzzy\_Apriori\_Gen}(GL_i^{(k-1)})$$

Because $GL_i^{(k-1)} \subseteq L^{(k-1)}$, the set $CG_i^{(k)}$ is a subset of $CA^{(k)}$. In the following, we denote $CG^{(k)} = \bigcup_{i=1}^{n} CG_i^{(k)}$.

Theorem 1. For every $k > 1$, the set of all globally large k-fuzzy itemsets $L^{(k)}$ is a subset of $CG^{(k)} = \bigcup_{i=1}^{n} CG_i^{(k)}$.

By Theorem 1, we can use the set $CG^{(k)}$, which is a superset of $L^{(k)}$, as the candidate set instead of $CA^{(k)}$; $CG^{(k)}$ can be much smaller than $CA^{(k)}$.

Thus the candidate set for $L^{(k)}$ is generated at the k-th iteration in the following manner: first, the set of candidate sets $CG_i^{(k)}$ is generated locally at each site $S_i$. After this step, the sites exchange fuzzy support counts and compute the set of gl-large fuzzy itemsets $GL_i^{(k)}$. Based on $GL_i^{(k)}$, the candidate fuzzy itemsets at $S_i$ for the (k + 1)-st iteration can then be generated.
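The text does not spell out Fuzzy_Apriori_Gen, so the sketch below assumes the usual Apriori join and prune steps, plus the restriction that an itemset holds at most one fuzzy set per attribute. The function names and the sample items are hypothetical. The example compares the candidate set built from all globally large itemsets with the union of the candidate sets built per site.

```python
from itertools import combinations

def fuzzy_apriori_gen(large_prev):
    """Apriori-style candidate generation for fuzzy itemsets (a sketch).

    large_prev -- set of frozensets of (attribute, fuzzy set) pairs, all of size k-1
    Returns the size-k candidates whose (k-1)-subsets all lie in large_prev and
    which use every attribute at most once.
    """
    candidates = set()
    for a, b in combinations(large_prev, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # a and b must differ in exactly one item
        attributes = [attr for attr, _ in union]
        if len(set(attributes)) != len(attributes):
            continue  # assumption: one fuzzy set per attribute in an itemset
        if all(union - {item} in large_prev for item in union):
            candidates.add(union)
    return candidates

def global_candidates(gl_prev_per_site):
    """Union over all sites of the candidates generated from each site's
    gl-large (k-1)-itemsets."""
    result = set()
    for gl_prev in gl_prev_per_site:
        result |= fuzzy_apriori_gen(gl_prev)
    return result

young, old = ("age", "young"), ("age", "old")
low, high = ("income", "low"), ("debt", "high")

# Hypothetical gl-large 1-itemsets at two sites.
gl_site_1 = {frozenset([young]), frozenset([low]), frozenset([old])}
gl_site_2 = {frozenset([low]), frozenset([high])}

ca = fuzzy_apriori_gen(gl_site_1 | gl_site_2)   # from all globally large itemsets
cg = global_candidates([gl_site_1, gl_site_2])  # union of per-site candidates
print(len(ca), len(cg))  # 5 candidates versus 3
```

The pair combining `young` and `old` is never generated because both fuzzy sets belong to the same attribute.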

3.2 Local pruning of candidate sets

Lemma 3 can be used to perform local pruning of the set of candidate fuzzy itemsets. At a site $S_i$, after the set of candidate fuzzy itemsets $CG_i^{(k)}$ is generated, finding out whether a candidate fuzzy itemset $\langle X, F_X \rangle \in CG_i^{(k)}$ is a gl-large fuzzy itemset requires requesting its fuzzy support count from all other sites. We can avoid this request for some candidates by using a local pruning technique. The basic idea is that, if a candidate fuzzy itemset $\langle X, F_X \rangle \in CG_i^{(k)}$ is not locally large at site $S_i$, there is no need for $S_i$ to compute its global support in order to find out whether it is globally large. This is possible because in this case either $\langle X, F_X \rangle$ is not globally large, or it is locally large at some other site; hence only the sites where $\langle X, F_X \rangle$ is locally large need to be responsible for finding its global support count. We use $LL_i^{(k)}$ to denote those candidate fuzzy itemsets in $CG_i^{(k)}$ which are locally large at site $S_i$.
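The local pruning step can be sketched as a simple filter on the local counts. All names and numbers below are hypothetical, and the request count is only a rough measure (one count value per pruned-in candidate and per other site).

```python
def locally_large(candidates, local_counts, minsup, site_size):
    """Return the candidates whose local fuzzy support count at this site
    reaches minsup times the number of local transactions. Only these
    candidates trigger a support count request to the other sites."""
    threshold = minsup * site_size
    return {c for c in candidates if local_counts[c] >= threshold}

# Hypothetical state of one site with 8 local transactions.
# Itemsets are abbreviated to strings here purely for readability.
candidates = {"young+low", "low+high", "young+high"}
local_counts = {"young+low": 3.5, "low+high": 0.5, "young+high": 1.5}
other_sites = 3

ll = locally_large(candidates, local_counts, minsup=0.25, site_size=8)

requests_without_pruning = len(candidates) * other_sites
requests_with_pruning = len(ll) * other_sites
print(sorted(ll), requests_without_pruning, requests_with_pruning)
```

With a threshold of 0.25 x 8 = 2.0, only one of the three candidates survives, so the site asks for 3 count values instead of 9.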

3.3 The algorithm outline

Algorithm 1 presents in detail the FUZZY-DMA algorithm for distributed mining of association rules. At every iteration (the k-th iteration), each site $S_i$ computes the set $GL_i^{(k)}$ of gl-large fuzzy itemsets at the site, and from these the set $L^{(k)}$ of all globally large k-fuzzy itemsets.

Algorithm 1 FUZZY-DMA

INPUT:
$DB_1, \dots, DB_n$ - the database partition at each site.
minsup - the minimum support threshold.
$F$ - the set of fuzzy sets associated with the attributes from $I$.

OUTPUT:
$L$ - the set of all globally large fuzzy itemsets in $DB$.

METHOD: For all $k \geq 1$, iterate the following algorithm distributively at each site $S_i$. At the end of each step a synchronization is required to develop the global counts. The algorithm terminates when either the returned $L^{(k)}$ is empty or the candidate set $CG^{(k)} = \emptyset$.

1: if $k = 1$ then
2:     $T_i^{(1)}$ = Get_Local_Fuzzy_Count($DB_i$, $\emptyset$, 1)
3: else
4:     $CG^{(k)} = \bigcup_{i=1}^{n} CG_i^{(k)} = \bigcup_{i=1}^{n}$ Fuzzy_Apriori_Gen($GL_i^{(k-1)}$)
5:     $T_i^{(k)}$ = Get_Local_Fuzzy_Count($DB_i$, $CG^{(k)}$, $i$)
6: for all $\langle X, A \rangle \in CG_i^{(k)}$
7:     if $CFS^{i}_{\langle X, A \rangle} \geq minsup \times D_i$ then
8:         insert $\langle X, A \rangle$ into $LL_i^{(k)}$
{Broadcast support count request to compute global fuzzy support count}
9: for $j = 1, \dots, n$; $j \neq i$ do
10:     Broadcast_Count_Request($LL_i^{(k)}$, $S_j$)
{Receive support count request}
11: for $j = 1, \dots, n$; $j \neq i$ do
12:     receive $LL_j^{(k)}$, extract $CFS^{i}_{\langle X, A \rangle}$ from $T_i^{(k)}$ and send it to $S_j$
{Compute global fuzzy support count}
13: for all $\langle X, A \rangle \in LL_i^{(k)}$
14:     receive $CFS^{j}_{\langle X, A \rangle}$ from sites $S_j$, where $j \neq i$
15:     $CFS_{\langle X, A \rangle} = \sum_{p=1}^{n} CFS^{p}_{\langle X, A \rangle}$
16:     if $CFS_{\langle X, A \rangle} \geq minsup \times D$ then
17:         insert $\langle X, A \rangle$ into $G_i^{(k)}$
18: broadcast $G_i^{(k)}$
{Compute global $L^{(k)}$}
19: receive $G_j^{(k)}$ from all other sites $S_j$ $(j \neq i)$
20: $L^{(k)} = \bigcup_{i=1}^{n} G_i^{(k)}$
21: return $L^{(k)}$

First, each site $S_i$ generates the complete set of global candidate fuzzy itemsets $CG^{(k)}$ from the globally large (k-1)-fuzzy itemsets $L^{(k-1)}$ obtained at the end of step $k - 1$, together with its own candidate fuzzy itemsets, obtained by applying the function Fuzzy_Apriori_Gen to the gl-large fuzzy itemsets $GL_i^{(k-1)}$ found at site $S_i$ in step $k - 1$ (candidate set generation).

For each $\langle X, A \rangle \in CG^{(k)}$, the site scans the database $DB_i$ to compute the local fuzzy support count $CFS^{i}_{\langle X, A \rangle}$ and stores it in the hash tree $T_i^{(k)}$ using the function Get_Local_Fuzzy_Count; it then generates the set of locally large fuzzy itemsets $LL_i^{(k)}$. After this, $S_i$ broadcasts the candidate fuzzy itemsets from $LL_i^{(k)}$ to the other sites in order to collect their fuzzy support counts. These counts are needed to compute the global fuzzy support counts and to generate the set of all gl-large k-fuzzy itemsets at site $S_i$. Finally, the computed gl-large fuzzy itemsets are broadcast to all other sites, so that every site can compute $L^{(k)}$.

The algorithm stops when either the returned $L^{(k)}$ is empty or the candidate set $CG^{(k)}$ is empty.
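To make the control flow concrete, here is a single-process simulation of the iteration described above. It is a sketch under several assumptions, not the authors' implementation: the sites are plain lists of fuzzified transactions (degrees already thresholded with $\omega$), message exchange is replaced by direct access to the other sites' count tables, a plain dict stands in for the hash tree, and Fuzzy_Apriori_Gen is assumed to be an Apriori-style join and prune with one fuzzy set per attribute.

```python
from itertools import combinations
from math import prod

def local_count(partition, itemset):
    """Local fuzzy support count of `itemset` in one partition."""
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in partition)

def fuzzy_apriori_gen(prev):
    """Assumed Apriori-style join and prune; at most one fuzzy set per attribute."""
    out = set()
    for a, b in combinations(prev, 2):
        c = a | b
        one_per_attribute = len({attr for attr, _ in c}) == len(c)
        if len(c) == len(a) + 1 and one_per_attribute and all(c - {x} in prev for x in c):
            out.add(c)
    return out

def fuzzy_dma(partitions, minsup):
    """Simulate the distributed iteration; returns {itemset: global fuzzy support}."""
    n = len(partitions)
    sizes = [len(p) for p in partitions]  # D_1, ..., D_n
    total = sum(sizes)                    # D
    # k = 1: every item seen at a site is a local candidate there.
    local_cands = [{frozenset([item]) for t in p for item in t} for p in partitions]
    large = {}
    while True:
        cg = set().union(*local_cands)    # all candidates of this iteration
        if not cg:
            break
        # Every site counts all candidates, since other sites may ask for any of them.
        tables = [{c: local_count(p, c) for c in cg} for p in partitions]
        gl = []                           # gl-large itemsets found at each site
        for i in range(n):
            # Local pruning: keep only the site's own candidates that are locally large.
            ll = {c for c in local_cands[i] if tables[i][c] >= minsup * sizes[i]}
            gl_i = set()
            for c in ll:
                # Stands in for the count request and reply exchange.
                count = sum(tables[j][c] for j in range(n))
                if count >= minsup * total:
                    gl_i.add(c)
                    large[c] = count / total
            gl.append(gl_i)
        if not any(gl):                   # no globally large itemset at this level
            break
        local_cands = [fuzzy_apriori_gen(g) for g in gl]
    return large

young, low, high = ("age", "young"), ("income", "low"), ("debt", "high")
site_1 = [{young: 1.0, low: 1.0, high: 0.0}, {young: 1.0, low: 1.0, high: 0.0},
          {young: 0.5, low: 1.0, high: 0.0}, {young: 0.0, low: 0.0, high: 1.0}]
site_2 = [{young: 0.0, low: 0.0, high: 1.0}, {young: 0.0, low: 0.5, high: 1.0},
          {young: 0.0, low: 0.0, high: 1.0}, {young: 1.0, low: 1.0, high: 0.0}]

result = fuzzy_dma([site_1, site_2], minsup=0.375)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

On this toy data the three single items and the pair combining `young` and `low` are reported as globally large; the pair combining `young` and `high` is never counted at all, because no site has both items among its gl-large 1-itemsets.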

4 CONCLUSION

In this article we proposed an algorithm for mining fuzzy association rules from distributed databases more efficiently than a sequential algorithm. In the future, we will study ways of automatically finding the fuzzy sets associated with database attributes. Another direction of improvement is the study of new relationships between locally and globally large itemsets, in order to reduce the number of messages exchanged among sites.

REFERENCES

Agrawal R., Imielinski T., Swami A. (1993). Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD Conference, Washington DC, USA.

Chan Man Kuok, Ada Fu, Wong M. H. (1998). Mining fuzzy association rules in databases. SIGMOD Rec., 27(1):41–46.

Cheung D.W., Jiawei Han, Ng V., Fu A., Fu Y. (1996). A fast distributed algorithm for mining association rules. In 4th International Conference on Parallel and Distributed Information Systems (PDIS '96), pages 31–43. IEEE Computer Society Technical Committee on Data Engineering, and ACM SIGMOD.

Gyenesei, A. (2000). Mining weighted association rules for fuzzy quantitative items. In Principles of Data Mining and Knowledge Discovery, pages 416–423.

Hong T.P., Kuo C.S., Chi S.C., Wang S.L. (2000). Mining fuzzy rules from quantitative data based on the AprioriTid algorithm. In Proceedings of the 2000 ACM symposium on Applied computing, pages 534–536.

Rakesh Agrawal, Srikant R. (1994). Fast algorithms for mining association rules. In Bocca, J. B., Jarke, M., and Zaniolo, C., editors, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487–499. Morgan Kaufmann.
