6.1.3 Generating k-tree-Expressions
(Pavón, 2006) presented an algorithm named Matrix
Apriori, which combines the strengths of Apriori and
FP-growth. Matrix Apriori is reported to offer a
simpler and more efficient way to mine association
rules than previous proposals; implementation
details are given in (Pavón, 2006). The generated
frequent 1-tree-expressions are passed to the
algorithm as input, and its output is the set of
frequent patterns that pass the minimum support
test. The resulting set of frequent paths is
displayed to the user, who can drag the paths of
interest to form the antecedent and consequent
templates and can also supply the support and
confidence of the rule. Frequent structures may
yield useful associations, whereas infrequent
structures never will, so the user selects a
template from the pool of frequent structures.
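The minimum support test on paths can be illustrated with a minimal sketch. It is not the Matrix Apriori algorithm of (Pavón, 2006); it is a plain level-wise (Apriori-style) count, used here only to show which path sets survive the test. The function name frequent_path_sets, the representation of each transaction object as a set of paths, and every path other than order/person/job are assumptions made for illustration.

```python
from itertools import combinations


def frequent_path_sets(transactions, min_support):
    """Return {frozenset_of_paths: support} for every path set whose
    support (fraction of transactions containing it) is >= min_support."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    # Level 1: support of each single path.
    counts = {}
    for t in transactions:
        for path in t:
            key = frozenset([path])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-sets into k-set candidates.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {c: cnt / n for c, cnt in counts.items()
                    if cnt / n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical orders, each reduced to the set of paths it contains.
orders = [
    {"order/person/name", "order/person/job", "order/item/title"},
    {"order/person/name", "order/person/job"},
    {"order/person/name", "order/item/title"},
    {"order/person/name"},
]
result = frequent_path_sets(orders, min_support=0.5)
for paths, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(paths), support)
```

In this toy data the pair of order/person/job and order/item/title occurs in only one of four orders, so it fails a 50% threshold and would not be offered to the user as part of a template.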
6.2 Generating Association Rules
In a relational database, each predicate name is an
attribute name and can be instantiated by a certain
value. In terms of XML, a predicate is the full path
describing the data. For example, "student" is a
value of the attribute order/person/job. The only
relation we have at this point is the structured
layer (the set of frequent paths). The attributes of
the frequent structure can guide the user in forming
an interesting template. The user supplies the
template, the rule support and the rule confidence.
The user thus provides two support values: one for
the structure and one for the association rules (AR
support). The support of a structure is the
frequency of its paths relative to the total number
of transaction objects, whereas the AR support is
the frequency of a certain attribute value relative
to the transaction objects that contain this
attribute. For example, suppose 50% of all orders
contain the job of the customer. If the database
contains 4 orders, only 2 of them contain the job.
Assume both of these orders have "student" as the
value. The AR support of "student" is then 100%.
The question now is how to derive associations that
comply with the chosen templates. We can apply the
algorithm for meta-rule guided mining of single-
variable rules introduced in (Fu, 1995a), which
borrows the concept of Apriori presented by
(Agrawal, 1994).
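The two support measures in the four-order example can be checked with a short sketch. Representing each order as a dictionary from path to value, and the helper names structure_support and ar_support, are assumptions made for illustration; the path order/item/title and its values are hypothetical filler for the orders that lack a job.

```python
def structure_support(orders, path):
    """Fraction of all transaction objects that contain `path`."""
    return sum(1 for o in orders if path in o) / len(orders)


def ar_support(orders, path, value):
    """Fraction of the objects containing `path` whose value is `value`."""
    holders = [o for o in orders if path in o]
    if not holders:
        return 0.0
    return sum(1 for o in holders if o[path] == value) / len(holders)


# Four orders; only two of them record the customer's job.
orders = [
    {"order/person/job": "student", "order/item/title": "A"},
    {"order/person/job": "student", "order/item/title": "B"},
    {"order/item/title": "C"},
    {"order/item/title": "D"},
]

print(structure_support(orders, "order/person/job"))      # 0.5
print(ar_support(orders, "order/person/job", "student"))  # 1.0
```

The denominators differ: structure support divides by all four orders, while AR support divides only by the two orders that contain the path.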
In a nutshell, the count of each distinct value of
each predicate is retrieved using (DISTINCT, COUNT)
XQuery statements. Frequent predicate values are
then joined to generate larger frequent candidates.
Finally, the association rules that satisfy the
supplied confidence are generated.
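The three steps just described (count distinct values, join frequent values, filter by confidence) can be sketched for a template with one antecedent path and one consequent path. This is a simplified illustration, not the algorithm of (Fu, 1995a) itself: in-memory dictionaries stand in for the XQuery statements, support is measured over the orders that contain both paths (an assumption), and the function name mine_template_rules, the path order/item/category and all values except "student" are hypothetical.

```python
from collections import Counter


def mine_template_rules(orders, antecedent, consequent,
                        min_support, min_confidence):
    """Return (a_value, c_value, support, confidence) tuples for rules
    antecedent=a_value -> consequent=c_value that meet both thresholds."""
    # Only objects holding both paths can instantiate the template.
    rows = [(o[antecedent], o[consequent]) for o in orders
            if antecedent in o and consequent in o]
    n = len(rows)
    if n == 0:
        return []

    # Step 1: count each distinct value of each predicate.
    a_counts = Counter(a for a, _ in rows)
    c_counts = Counter(c for _, c in rows)
    freq_a = {v for v, k in a_counts.items() if k / n >= min_support}
    freq_c = {v for v, k in c_counts.items() if k / n >= min_support}

    # Step 2: join frequent values into candidate pairs and count them.
    pair_counts = Counter((a, c) for a, c in rows
                          if a in freq_a and c in freq_c)

    # Step 3: keep the pairs that satisfy support and confidence.
    rules = []
    for (a, c), k in pair_counts.items():
        support = k / n
        confidence = k / a_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, c, support, confidence))
    return sorted(rules)


job = "order/person/job"
category = "order/item/category"  # hypothetical path
orders = [
    {job: "student", category: "book"},
    {job: "student", category: "book"},
    {job: "student", category: "music"},
    {job: "engineer", category: "music"},
    {job: "engineer"},  # lacks the consequent path, so it is skipped
]

for rule in mine_template_rules(orders, job, category, 0.5, 0.6):
    print(rule)
```

With these thresholds only the rule from "student" to "book" survives: it covers two of the four qualifying orders and holds for two of the three "student" orders.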
7 CONCLUSIONS AND FUTURE WORK
As XML, the emerging standard for semi-structured
data, has become widely used, the need to mine XML
data has become important. In this study, we proposed
a method for designing templates for mining
association rules from XML. (Fing, 2004) proposed
templates based on user's prior knowledge about
XML structure specifications. Our design differs in
that it does not assume such prior knowledge, since
the structure of XML is irregular. Discovering frequent
structure that satisfies the user support is the first
stage. To implement this stage, we applied the
Matrix Apriori algorithm (Pavón, 2006), which is
reported to outperform Apriori and FP-growth. While
mining frequent structures, we let the user explore
the data and define constraints on the mined data
even before the whole mining process starts. XQuery
can be used to constrain the XML data without
converting all of the XML into a relational
database. Frequent paths serve as a
structured layer over the semi-structured data. Each
one of its paths can be considered as an attribute.
The users can select the paths of interest to define
the rule template. From there, the discovery of
association rules can follow the same approach as
(Fu, 1995a) in the context of relational databases.
Many open issues remain in the implementation. A
highly user-friendly interface is required to
encourage user intervention. In addition, tightly
coupling the implementation with XQuery is a step
towards coupling the data mining process with
database servers. The idea of integrating data
mining with data warehousing comes from the
relational database world; (Chaudhuri, 1998)
discussed the advantages of such integration. Our
work can be used as an interface to the proposed
XML-DMQL (Fing, 2005).
REFERENCES
Agrawal, R. and Srikant, R., 1994. Fast algorithms for
mining association rules in large databases. In
Proceedings of 20th International Conference on Very
Large Data Bases, Santiago, Chile, September 12-15.
pages 487–499.
Baraga, D., Campi, A., Ceri, S., Klemettinen, M. and
Lanza, P., 2003. Discovering interesting information
WEBIST 2007 - International Conference on Web Information Systems and Technologies