
 
Naturally, categories, movies, roles, actors, 
awards and directors are concepts, with categories, 
roles and awards specialized by other concepts. 
Relations are represented by directed arrows. To 
allow the discovery of patterns involving real actors 
and directors, the ontology is enriched with a leaf 
concept for each known actor and director. From this 
ontology and a denormalized table containing one row 
for each participation in a movie, we can construct 
the dataset to mine. To find patterns of the form 
(director, category), (actor, role, category) and 
(category, award), we only need the axioms that 
define equality for each leaf concept. 
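As a rough illustration of the dataset construction described above, the following sketch turns each row of a denormalized table into a transaction of (concept, leaf) items and counts leaf combinations for a given pattern shape. The column names, the sample rows, and the helpers `to_transaction` and `count_pattern` are assumptions made for this example; they are not part of the Onto4AR framework, and plain value equality stands in for the equality axioms of the leaf concepts.

```python
from collections import Counter

# Hypothetical denormalized rows: one row per participation in a movie.
# Column names and values are illustrative assumptions, not the real schema.
ROWS = [
    {"movie": "M1", "category": "Drama", "director": "D1",
     "actor": "A1", "role": "Hero", "award": "Oscar"},
    {"movie": "M1", "category": "Drama", "director": "D1",
     "actor": "A2", "role": "Villain", "award": "Oscar"},
    {"movie": "M2", "category": "Drama", "director": "D1",
     "actor": "A1", "role": "Hero", "award": None},
    {"movie": "M3", "category": "Comedy", "director": "D2",
     "actor": "A2", "role": "Hero", "award": None},
]


def to_transaction(row):
    """Turn one row into a set of (concept, leaf) items, skipping missing values."""
    return frozenset(
        (concept, leaf)
        for concept, leaf in row.items()
        if concept != "movie" and leaf is not None
    )


def count_pattern(rows, concepts):
    """Count leaf combinations for the given concepts; leaves match by equality."""
    counts = Counter()
    for row in rows:
        items = dict(to_transaction(row))
        if all(concept in items for concept in concepts):
            counts[tuple(items[concept] for concept in concepts)] += 1
    return counts


director_category = count_pattern(ROWS, ("director", "category"))
actor_role_category = count_pattern(ROWS, ("actor", "role", "category"))
category_award = count_pattern(ROWS, ("category", "award"))
```

Counts here are per row (per participation), so a movie with several participations contributes several times; counting per movie instead would be an equally reasonable choice under different assumptions.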
The identification of frequent molecular 
fragments presents additional challenges to the 
framework, since those patterns are structured 
patterns, in the form of graphs. In addition to this 
structural nature, molecules may have multiple 
atoms of the same chemical element. To deal with 
these particularities, the framework only requires 
the definition of a new class of constraints: 
structural constraints. A structural constraint is a 
content constraint that defines a differentiated 
areJoinable axiom. It considers two itemsets 
joinable only if the maximal proper suffix of the 
first itemset is equal to the maximal proper prefix 
of the second one. 
\[
\forall s = s_0 \ldots s_n,\; t = t_0 \ldots t_n :\quad
\mathit{areJoinable}(s, t) \Leftrightarrow s_1 \ldots s_n = t_0 \ldots t_{n-1}
\]
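The join condition described above (the maximal proper suffix of the first itemset equals the maximal proper prefix of the second) can be sketched as follows. Representing itemsets as Python tuples and the names `are_joinable` and `join` are assumptions made for illustration; the candidate produced by `join` follows the usual convention of sequential pattern mining rather than anything stated explicitly in the text.

```python
def are_joinable(s, t):
    """Return True when s without its first item equals t without its last item."""
    if len(s) != len(t) or not s:
        return False
    return s[1:] == t[:-1]


def join(s, t):
    """Build the candidate of length len(s) + 1 from two joinable itemsets."""
    if not are_joinable(s, t):
        raise ValueError("itemsets are not joinable")
    return s + t[-1:]


first = ("a", "b", "c")
second = ("b", "c", "d")
candidate = join(first, second)
```

Under this definition, any two itemsets of length one are joinable, since both the suffix and the prefix are empty.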
 
Note that this predicate states the new conditions 
to generate a candidate, and these conditions are the 
same as those used by sequential pattern mining 
algorithms. To avoid the problem of multiple atoms 
of the same element, we can represent a molecule as 
a chain of bonds, each one involving two different 
atoms, as represented in Figure 2-bottom. This is 
achieved by indexing each atom, which allows multiple 
identical bonds. For example, the ring of carbons in 
Figure 2-bottom (right) would be represented as 
(C0–C1, C0–C3, C1–C2, C2–C3). With these simple 
tools, it is possible to identify exactly the same 
patterns found by graph-mining algorithms. 
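A minimal sketch of the bond-chain encoding follows. It assumes that an atom's index is simply its position in an input list and that bonds are listed in a canonical sorted order; the function name `bond_chain` and the input format are hypothetical, chosen only to reproduce the ring example above.

```python
def bond_chain(elements, bonds):
    """Encode a molecule as a sorted chain of bonds between indexed atoms.

    elements: element symbol of each atom; the list position is the atom index.
    bonds: pairs of atom indices, in any order.
    """
    chain = []
    for i, j in bonds:
        if i == j:
            raise ValueError("a bond must involve two different atoms")
        chain.append(tuple(sorted((i, j))))
    chain.sort()
    return tuple(f"{elements[a]}{a}-{elements[b]}{b}" for a, b in chain)


# Ring of four carbons: indexing keeps otherwise identical C-C bonds distinct.
ring = bond_chain(["C", "C", "C", "C"], [(0, 1), (1, 2), (2, 3), (3, 0)])
```

Because each bond names its two indexed atoms, the four C-C bonds of the ring remain distinct items in the resulting chain.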
5 CONCLUSIONS 
Recent advances in knowledge representation 
make it possible to represent background knowledge 
effectively using ontologies. Since one of the main 
drawbacks of data mining in general, and of pattern 
mining in particular, is that it ignores domain 
knowledge, these advances make it timely to 
overcome that limitation. 
This paper explains how the Onto4AR 
framework can solve some of the main difficulties 
faced by transactional pattern mining approaches, 
such as dealing with multiple concepts in the same 
transaction or dealing with structured data. 
We showed that by incorporating background 
knowledge into the core of the mining process, using 
domain ontologies and defining a set of constraints 
over them, it is possible to address those 
difficulties naturally. 
The case studies described make the potential 
of the Onto4AR framework apparent. Indeed, the 
framework provides the necessary tools to overcome 
several difficulties faced by pattern mining 
techniques. Its conception, based on a standard and 
widely recognized instrument for representing 
existing domain knowledge, is one of its strongest 
points, followed closely by its simplicity and its 
extensibility. 
However, experiments show that candidate-based 
algorithms are not the most adequate for performing 
the discovery. The explosion of candidates, resulting 
from the existence of multiple equivalent concepts 
(as defined by their equal predicate), strongly 
impairs the algorithms' performance. Nevertheless, 
since several algorithms following other approaches 
have been proposed with fair success, it is likely 
that they can be adapted to work in this new context. 
REFERENCES 
Agrawal, R., Imielinski, T., and Swami, A. Mining 
Association Rules between Sets of Items in Large 
Databases. In Proc. ACM SIGMOD Conf 
Management of Data. 1993. 207-216 
Antunes, C., and Oliveira, A.L., Constraint Relaxations for 
Discovering Unknown Sequential Patterns. In 
Knowledge Discovery in Inductive Databases: Third 
International Workshop, Springer, 2005, 11-32 
Antunes, C. Onto4AR: a framework for mining 
association rules. In Proc. Int’l Workshop on 
Constraint-Based Mining and Learning, 2007. 37-48 
Antunes, C. An ontology-based method for mining 
frequent patterns. Technical report, Instituto Superior 
Técnico. 2008. 
Bayardo, R.J., The Many Roles of Constraints in Data 
Mining. In SIGKDD Explorations, vol. 4, nr. 1 pp. i-ii, 
2002. 
Garofalakis, M.N., Rastogi, R., and Shim, K., SPIRIT: 
Sequential Pattern Mining with Regular Expression 
Constraints. In Proc. Very Large Databases Conf. 
1999, 223-234 
Maedche, A., Ontology Learning for the Semantic Web, 
Kluwer Academic Publishers, 2002. 
Wiederhold, G., Movies Database Documentation, 1989. 