A Post-Processing Strategy for Association Rules in Knowledge Discovery

Luiz Fernando da Cunha Cintra

1 a

, Rodigo da Silva Dias

and Rogerio Salvini

1 b

Instituto de Inform

atica, Universidade Federal de Goi

as, Goi

ania-GO, Brazil

Department of Psychiatry, University of Sao Paulo Medical School, S

ao Paulo-SP, Brazil

Keywords:

Association Rule Mining, Post-Processing, Knowledge Discovery.

Abstract:

Association Rule Mining (ARM) is a traditional data mining method that describes associations among el-

ements in transactional databases. A well-known problem of ARM is the large number of rules generated,

requiring approaches to post-process these rules so that a human expert can analyze the associations found. In

certain scenarios, experts focus on exploring a speciﬁc element within the data, and a search based on this item

can help reduce the problem. Few methods concentrate on post-processing generated rules targeting a speciﬁc

item of interest. This study aims to highlight relevant associations of a particular element in order to gain

knowledge about its role through its interactions and relationships with other factors. The paper introduces a

post-processing strategy for association rules, selecting and grouping rules pertinent to a speciﬁc item of in-

terest as provided by a domain expert. Additionally, a graphical representation facilitates the visualization and

interpretation of associations between rules and their groupings. A case study demonstrates the applicability

of the proposed method, effectively reducing the number of relevant rules to a manageable level for expert

analysis.

1 INTRODUCTION

Association Rule Mining (ARM) (Agrawal et al.,

1993) is a well-known method for extracting patterns

from a dataset. Initially proposed to discover as-

sociations in supermarket basket data, over the last

decades, it has been applied to various other domains,

such as construction (Cheng et al., 2016), product de-

velopment (Karimi-Majd and Mahootchi, 2015), edu-

cation (Matetic et al., 2015), sports (Weidner et al.,

2020), building maintenance (Zhang et al., 2021),

medicine (Castro et al., 2018) (Wei and Scott, 2015),

and urban planning (Balasubramani et al., 2016). One

of the main challenges related to using association

rules is the massive number of rules that ARM algo-

rithms can generate. This issue is well-known and has

been studied for over 20 years (Baesens et al., 2000).

ARM aims to identify frequent and meaningful

associations within a transactional database. In the

context of ARM, a transactional database stores data

resulting from interactions between two or more par-

ties, with each interaction referred to as a transaction.

Typically, each transaction includes an identity num-

ber and a list of the items making up the transaction,

https://orcid.org/0009-0002-6709-8789

https://orcid.org/0000-0001-8889-6654

such as a customer’s purchase, a ﬂight booking, or a

user’s clicks on a web page (Han et al., 2012).

An association rule is an “if-then” type of rule,

formalized by Agrawal et al. (Agrawal et al., 1993)

that reveals patterns or relationships within a set of

transactions. A more general formalization states

that an association rule has the form A → B, where

A and B are itemsets, i.e., A = {a

,...,a

} and

B = {b

,...,b

}, with a

and b

being items from

a database. A is referred to as the antecedent, and

B as the consequent of the rule. Given that I is the

set of all items in the database, A ⊂ I, B ⊂ I, and

A ∩ B =

0, meaning that antecedent and consequent

are items from the database and do not have repeated

items. The size of a rule is the number of items it

contains.

Let D be a set of transactions, where each trans-

action T is an itemset such that T ⊂ I. The support

of a rule A → B is the percentage of transactions in D

containing the items of A ∪ B, i.e., the percentage of

transactions where the items from the antecedent and

consequent of the rule occur together. The conﬁdence

of a rule A → B is the percentage of transactions in

D that, if containing A, also contain B (Agrawal and

Srikant, 1994).

Essentially, the ARM problem involves generat-

ing, from a set of transactions D, all rules with sup-

Cintra, L., Dias, R. and Salvini, R.

A Post-Processing Strategy for Association Rules in Knowledge Discovery.

DOI: 10.5220/0012465800003654

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2024), pages 119-130

ISBN: 978-989-758-684-2; ISSN: 2184-4313

119

port and conﬁdence greater than the minimum val-

ues speciﬁed by the user. Consequently, this prob-

lem breaks down into two subproblems: 1) identify-

ing all itemsets with support greater than or equal to

the minimum support (referred to as frequent item-

sets), and 2) generating rules from the frequent item-

sets that have conﬁdence greater than or equal to the

speciﬁed minimum conﬁdence. Numerous ARM al-

gorithms adopt the support/conﬁdence model to gen-

erate association rules. Examples of such algorithms

include Apriori (Agrawal and Srikant, 1994), Eclat

(Zaki, 2000), and FP-Growth (Han et al., 2004).

ARM, in general, is an exploratory activity. Un-

like a classiﬁcation task, where a target variable of

the dataset is used to guide the construction of classi-

ﬁcation rules, ARM algorithms seek any statistically

relevant pattern within the data (Freitas, 2000). This

allows the analysis of speciﬁc items both in the conse-

quent and the antecedent of a rule. An item of interest

positioned in the antecedent enables the extraction of

the consequences of its presence, not just what leads

to its occurrence, as is the case in classiﬁcation rules.

For example, in the work of Wei and Scott (Wei and

Scott, 2015), the item of interest is a vaccine, and the

consequent is adverse events. Another aspect that can

be explored in a generated set of association rules is

the choice of rules that do not include an item of in-

terest but share items with other rules that do, thus

highlighting these relationships. Therefore, a method

that considers these characteristics is particularly ben-

eﬁcial for experts looking for speciﬁc insights into

their datasets, contributing to discovering knowledge

in particular domains.

The main objective of this work is to propose a

post-processing method for association rules that se-

lect the rules that, directly or indirectly, are related

to a speciﬁc object of study for a domain expert

based on an item of interest in the database. This

method was motivated by a clinical study presented

in the work of Castro and colleagues (Castro et al.,

2018), in which ARM was used to uncover correla-

tions between events related to the menstrual cycle.

Their efforts aimed to construct a network of relation-

ships from groupings of rules that could reveal spe-

ciﬁc clinical conditions inﬂuenced by the menstrual

cycle. Essentially, the study addressed the question of

whether a particular event forms a continuum related

to women’s hormonal ﬂuctuations or whether they are

isolated epiphenomena. Thus, the present work aims

to extract rules that show how the occurrence of an

item of interest inﬂuences other items and how differ-

ent items are inﬂuenced by the item of interest, high-

lighting only those most relevant to the expert. We

also introduce a graphical representation in the form

of a graph to visualize the multiple relationships be-

tween the associations around the item of interest.

We extensively reviewed recent literature and did

not identify any method speciﬁcally designed to han-

dle the speciﬁed scenario. While we found four meth-

ods that focus on an item of interest, none involve

the appearance of this item on both sides of the rule.

Moreover, no existing method explicitly seeks to es-

tablish connections between rules, as proposed in this

work.

The remainder of this paper is structured as fol-

lows. Section 2 presents the proposed association rule

post-processing method. In Section 3, we present the

results obtained when applying the proposed method

to real data on deaths due to COVID-19 in Brazil.

Section 4 makes a comparison between our method

and related methods in the literature. The conclusions

of this work are in Section 5.

2 PROPOSED METHOD

The proposed method is tailored to address issues

where a speciﬁc feature is a focal point for investi-

gation. It strategically selects rule subsets from an

association rule set generated by an ARM algorithm,

guided by a particular item of the database (referred

to as item of interest) and a measure M reﬂecting the

importance of the rules. Moreover, the method exclu-

sively considers rules with a maximum size of three,

recognizing that more extensive rules pose challenges

in interpretation and practical application. The mini-

mum rule size is set at two, aligning with the necessity

for a rule antecedent in conﬁdence calculations. Ad-

ditionally, the method excludes rules containing items

with missing values, ensuring a robust and reliable

analysis.

The method organizes the selected rules into eight

distinct types based on their relational structures.

Type 1 addresses bidirectional rules, emphasizing the

mutual inﬂuence between the item of interest and an-

other element. Types 2, 3, 4, and 5 involve the in-

clusion of the item of interest in the rule’s antecedent,

spotlighting its role in strengthening the factors lead-

ing to another entity. Conversely, Types 6, 7, and 8

feature the item of interest in the consequent of the

rule, revealing how associations with other factors

may fortify or amplify the likelihood of the item of

interest. Organizing rules into these speciﬁc group-

ings facilitates a deeper understanding of associations

related to the research subject. Moreover, the pro-

posed method ﬁlters out rules not directly pertinent to

the research interest, reducing the number of rules for

analysis.

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

120

We introduce the following eight types of group-

ings, illustrating them with association rules related

to dengue fever

. These rules are not derived from a

database but are based on information provided by a

medical professional, serving as illustrative examples

of the grouping types. In the subsequent section, we

will present results obtained from actual databases.

The focal item of interest is analyzing the presence

of dengue fever (dengue = yes) in the associations.

The symbol M indicates a value of a speciﬁc mea-

sure quantifying the degree of dependence between

the antecedent and the consequent of a rule, guiding

the grouping process.

Type 1 Group

Type 1 encompasses sets of bidirectional rules, mean-

ing rules where some item implies the item of interest

and the item of interest implies that item. This type

of information reinforces associations, as it indicates

a close connection between the entities. The example

below shows that fever for more than seven days and

dengue are strongly associated.

f ever = [> 7days] → dengue = yes

dengue = yes

dengue = yes → f ever = [> 7days]

Type 2 Group

This type of grouping describes sets of rules suggest-

ing a strengthened association between the item of in-

terest and another speciﬁc item linked with the same

consequent. This strengthening is observed when

comparing the values of the particular metric to the

individual associations of these items with the same

consequent. In the example below, hemophilia and

dengue individually cause bleeding. However, when

both conditions (hemophilia and dengue) coexist in

an individual, the association with bleeding is inten-

siﬁed. This is quantiﬁed through the metric M, where

would surpass both M

and M

hemophilia = yes → bleeding = yes, (M

)

dengue = yes

dengue = yes → bleeding = yes, (M

)

hemophilia = yes,dengue = yes

dengue = yes

dengue = yes → bleeding =

yes, (M

)

Type 3 Group

Type 3 indicates when the item of interest strength-

ens an existing association. Unlike Type 2, there is no

prior rule where the item of interest is already related

Dengue fever (CID A90) is a mosquito-borne tropical

disease caused by the dengue virus.

to the consequent. The example below shows that se-

vere abdominal pain would lead an individual to be

admitted to the Intensive Care Unit (ICU). The occur-

rence of dengue alongside intense abdominal pain re-

inforces the need for treatment in the ICU (M

> M

It is important to note that there is no rule explicitly

stating the association between dengue and ICU ad-

mission.

abdominalPain = intense → ICU = yes, (M

)

abdominalPain = intense,dengue = yes

dengue = yes

dengue = yes → ICU =

yes, (M

)

Type 4 Group

Type 4 is similar to Type 3, but this time, it empha-

sizes when another item reinforces an existing associ-

ation between the item of interest and the item in the

consequent. In this scenario, there is no rule estab-

lishing a direct connection between the other element

and the consequent. In the example below, we observe

that dengue is associated with hospitalization. How-

ever, the presence of pregnancy along with dengue in-

tensiﬁes the likelihood of hospitalization (M

> M

even without a rule explicitly stating that pregnancy

alone leads to hospitalization.

dengue = yes

dengue = yes → hospitalization = yes, (M

)

pregnant = yes,dengue = yes

dengue = yes

dengue = yes → hospitalization =

yes, (M

)

Type 5 Group

Type 5 consists of only one rule of size 3, where the

item of interest is associated with an item in the an-

tecedent and another in the consequent, without indi-

vidual associations between the antecedent items and

the consequent item. In the example below, dengue

and intense abdominal pain are associated with respi-

ratory distress. However, there are no individual asso-

ciations between dengue and respiratory distress nor

between intense abdominal pain and respiratory dis-

tress.

dengue = yes

dengue = yes,abdominalPain = intense →

respiratoryDistress = yes

Type 6 Group

Type 6 deals with groups of rules where the item of in-

terest appears as the consequent of these rules. In this

grouping, three rules aim to demonstrate that the con-

junction of other items leading to the item of interest

strengthens the association. In the example, we ob-

serve that fever for more than seven days and throm-

bocytopenia can be symptoms of dengue individu-

A Post-Processing Strategy for Association Rules in Knowledge Discovery

121

ally. The co-occurrence of these two symptoms re-

inforces the association with the likelihood of dengue

would be greater than M

and M

f ever = [> 7days] → dengue = yes

dengue = yes

dengue = yes, (M

)

thrombocytopenia = yes → dengue = yes

dengue = yes

dengue = yes, (M

)

f ever = [> 7days],thrombocytopenia = yes →

dengue = yes

dengue = yes, (M

)

Type 7 Group

Similar to Type 6, Type 7 also deals with rules where

the item of interest is in the consequent; however, in

this case, one of the other items is not individually

associated with the item of interest. In the example

below, the ﬁrst rule presents the association between

fever for more than seven days and dengue, while the

second rule shows that mild bleeding enhances this

association (with M

> M

). Moreover, there is no

rule associating mild bleeding with dengue.

f ever = [> 7days] → dengue = yes

dengue = yes

dengue = yes, (M

)

mildBleeding = yes, f ever = [> 7days] →

dengue = yes

dengue = yes, (M

)

Type 8 Group

Type 8 highlights items that, when examined individ-

ually, are not associated with the item of interest in the

consequent but, when combined, demonstrate a sig-

niﬁcant relationship. In the example, petechiae and

intense abdominal pain are associated with dengue.

However, there are no other rules associating these

symptoms with dengue individually.

petechiae = yes,abdominalPain = intense →

dengue = yes

Algorithm

The algorithm for the proposed method selects and

groups rules according to the types described above.

To this end, the algorithm input is a set R of rules

of size 2 or 3 generated by an ARM algorithm based

on support and conﬁdence. The user must specify an

item of interest, ¯a, corresponding to a variable with

a speciﬁc value in the data set. The user must also

deﬁne an evaluation measure, referred to as M, which

is used to evaluate the correlation between the items

present in a rule. Measures such as Lift, Conviction,

and Odds Ratio are commonly used for this purpose.

The value 1 in these measures indicates that there is

no correlation between the items in the antecedent and

the consequent. This means that the antecedent and

consequent of the rule are independent. In this case,

the rule is ignored. To ensure that only rules with

a strong association between items are selected, the

user can specify a threshold around the no-correlation

value through the parameter δ. If δ = 0, only rules

with the exact no-correlation value will be removed.

A principle of the proposed method is that a larger

rule should only be retained if it has some gain (mea-

sured by M) over its sub-rules. To prevent this gain

from being insigniﬁcant, the user can set a parame-

ter α, indicating the minimum relative gain between a

size-3 rule and its size-2 sub-rules. The relative gain

is the difference in M values between the size-3 rule

and a size-2 sub-rule, divided by the M value of the

size-2 rule. In other words, the M value for the size-3

rule should be α% higher than that of the size-2 sub-

rule. Equation 1 shows how the relative gain of rule

is calculated concerning rule r

RG(r

) =

M(r

) −M(r

)

M(r

)

(1)

In our experiments, we observed that using rules with

lower conﬁdence values might be interesting to en-

able the formation of rule groupings that exhibit rele-

vant relationships. On the other hand, this can cause

a side effect, generating many groups around size-

3 rules with low conﬁdence. The parameter c

′

was

added to control this issue and specify minimum con-

ﬁdence for size-3 rules. Therefore, the c

′

parameter of

the proposed method is not directly related to the min-

imum conﬁdence of ARM algorithms. If c

′

= 0, all

rules generated by the ARM algorithm will be consid-

ered. Algorithm 1 performs the procedure described

above.

Visualizing the Interconnections Among

Rule Groups

We observe that speciﬁc rules may appear in mul-

tiple groupings generated by the proposed method.

Therefore, we have developed a graphical represen-

tation capable of condensing the information derived

from these groupings. This is accomplished through

a graph encompassing all generated groups connected

by some common rule. Figure 1 provides a general

example of this graph. In the graph, the red nodes

represent rules shared by the groupings, referred to

as pivot rules. The blue nodes depict groupings of

Types 2, 4, 6, and 7, while the yellow nodes represent

groupings of Type 1. The connections between group-

ings are established exclusively through pivot rules,

a strategy employed to reduce the number of edges

and enhance visualization. Establishing connections

between groups of Types 3, 5, and 8 is not feasible.

Types 5 and 8 consist of only one rule and lack a pivot

rule. Similarly, Type 3 also lacks a pivot rule, given

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

122

Data: R: rule set generated by ARM; ¯a: item

of interest; δ: correlation threshold; α:

minimum relative gain; c

′

: minimum

conﬁdence for size three rules

Result: rule groupings categorized into 8

different types

Select from R the rules whose M value is

greater than 1.0 +δ:

′

= {r

∈ R ∧ M(r

) > 1.0 + δ};

Select from R

′

only the rules that have the

item of interest ¯a: R

′′

= {r

∈ R ∧ ¯a ⊂ r

};

Remove 3 size rules with conﬁdence less than

′

: R

′′′

= {r

∈ R

′′

∧ (len(r

) =

2 ∨Con f (r

) ≥ c

′

)};

Organize the rules into the following types:

Type 1: rule pairs of size 2, such as:

: a → ¯a; r

: ¯a → a}, where r

∈ R

′′′

Type 2: trios of rules, such as:

: a

→ a

: ¯a → a

: a

, ¯a → a

where r

∈ R

′

, r

∈ R

′′′

, RG(r

) ≥ α

and RG(r

) ≥ α.

Type 3: pairs of rules, such as:

: a

→ a

: a

, ¯a → a

where r

∈ R

′

, r

∈ R

′′′

, ¯a → a

/∈ R

′′′

and

RG(r

) ≥ α.

Type 4: pairs of rules, such as:

: ¯a → a

: a

, ¯a → a

where r

∈ R

′′′

, a

→ a

/∈ R

′

and

RG(r

) ≥ α.

Type 5: unitary sets of rules, such as:

{r : a

, ¯a → a

where r ∈ R

′′′

, a

→ a

/∈ R

′

and

¯a → a

/∈ R

′′′

Type 6: trios of rules, such as:

: a

→ ¯a;r

: a

→ ¯a;r

: a

→ ¯a},

where r

∈ R

′′′

, RG(r

) ≥ α and

RG(r

) ≥ α.

Type 7: pairs of rules, such as:

: a

→ ¯a;r

: a

→ ¯a},

where r

∈ R

′′′

, a

→ ¯a /∈ R

′′′

and

RG(r

) ≥ α.

Type 8: unitary sets of rules, such as:

{r : a

→ ¯a},

where r ∈ R

′′′

, a

→ ¯a /∈ R

′′′

and a

→ ¯a /∈ R

′′′

Algorithm 1: Algorithm for rule selection and grouping.

Figure 1: Graph illustrating the connections between rule

groups. Red nodes denote pivot rules, yellow nodes corre-

spond to Type 1 groups (bidirectional rules), and blue nodes

represent other groups connected through pivot rules.

that the size-2 rule can only appear in a single group-

ing as it does not include the item of interest.

Figure 2 illustrates a subgraph example where

the central red node represents a pivot rule estab-

lishing an association where dengue leads to bleed-

ing. This pivot rule is also present in other groups,

thereby connecting various associations with bleed-

ing. For instance, medications such as acetylsali-

cylic acid (ASA), ibuprofen, and escitalopram en-

hance the association between dengue and bleeding

despite lacking direct individual associations with

bleeding (no speciﬁc association rules exist between

these medications and bleeding). Additionally, the

pivot rule connects associations related to severe liver

disease, hemophilia, and bleeding. In this scenario,

there are rules indicating that these diseases individu-

ally lead to bleeding, and other rules suggest that the

presence of dengue with these diseases ampliﬁes the

occurrence of the association with bleeding. Lastly,

the yellow node encompasses a bidirectional rule in-

dicating that dengue and bleeding share a mutual re-

lationship and can serve as a linkage point with other

nodes featuring associations leading to dengue in dif-

ferent group rules.

3 CASE STUDIES

We conducted experiments on four real case studies

to evaluate the effectiveness of the proposed method.

A Post-Processing Strategy for Association Rules in Knowledge Discovery

123

Figure 2: Illustration of a subgraph exemplifying multi-

ple associations interconnected through the relationship be-

tween dengue and bleeding.

The initial case study is derived from research by

(Slyepchenko et al., 2017), later utilized by (Castro

et al., 2018) as an Association Rule Mining (ARM)

task. The primary objective is to investigate Premen-

strual Dysphoric Disorder (PMDD) in women with

Bipolar Disorder (BD). The other three studies utilize

information from open sources provided by the

Brazilian government on Severe Acute Respiratory

Syndrome (SARS)

. Due to space constraints, we

will focus on one of the SARS case studies related

to COVID-19 and mortality. However, the results

of rule groups from all case studies are available at

https://github.com/Luiz-Cintra-Experiments/masters-

degree-experiments/tree/main/results.

The ARM algorithm used to generate the associa-

tion rules was the Apriori implemented in the R lan-

guage, version 4.3.1, available in the library arules

version 1.7-6 (Hahsler, 2023). The parameters used

were: minimum support of 1% , minimum conﬁdence

of 30%, and maximum rule size equal to 3. This sup-

port value was chosen because some attributes have

a low frequency at speciﬁc values. The conﬁdence

value was chosen to demonstrate that rules of size 2

with lower conﬁdence may have their associations en-

hanced if the item in its antecedent occurs concomi-

tantly with another item, forming a rule of size 3 that

is more reliable.

The proposed method was implemented

in Python language version 3.8, available

at https://github.com/Association-Rules-Post-

Processing/ARPPL. The parameters of the proposed

method were: interest measure M = Odds Ratio

(OR), dependence margin δ = 0.1, minimum im-

https://opendatasus.saude.gov.br/dataset?tags=SRAG

provement α = 10%, and minimum conﬁdence for

size 3 c

′

= 50%. The graphs were generated using

the library Networkx (Hagberg et al., 2008) in version

2.8.8 (NetworkX Developers, 2022), with the library

ForceAtlas2 version 1.0 (Shinn, 2016) being used to

improve the layout of nodes in the graph.

3.1 Case Study: COVID-19-Related

Mortality

The data used in this case study was compiled from

four distinct SARS databases spanning the years 2019

to 2022. The ﬁnal dataset comprises 2, 166, 443 en-

tries from patients with a conﬁrmed diagnosis of

COVID-19 and includes 34 variables

. In the ﬁnal

dataset, 95.75% of patients were hospitalized. There-

fore, the results presented concern hospitalized pa-

tients and not a general perspective of the effects of

COVID-19.

The Apriori algorithm generated 187,407 associ-

ation rules. These rules were then processed through

the proposed method, with evolution = covid death as

the item of interest for rule grouping. This resulted in

the formation of 116 groups, encompassing 215 dis-

tinct association rules. Only groupings of types 7 and

8 were not formed. Table 1 presents a subset of rules

from these groups.

The two initial groupings reveal a bidirectional

association (Type 1 group) between medical proce-

dures such as the use of invasive respiratory support

(vent sup = inv) and admission to the Intensive Care

Unit (ICU) (ICU = yes) with death due to COVID-

19 (evolution = covid death). In the ﬁrst grouping

), it is observed that patients using invasive respi-

ratory support (rule 1) had a high occurrence of mor-

tality (Conf. = 74.5%). On the other hand, among

patients who died from COVID-19 (rule 2), there is

an occurrence of invasive respiratory support, but to a

lesser extent (Conf. = 41.3%). In the second group-

ing (G

), the occurrence of items in the antecedent

and consequent of the rules appears balanced. Of

the patients admitted to the ICU (rule 3), around half

died from COVID-19 (Conf. = 53.1). Likewise, of

the patients who died from COVID-19 (rule 4), just

over half were admitted to the ICU (Conf. = 55.3%).

The following three groupings show bidirectional as-

sociations of respiratory symptoms such as respira-

tory discomfort (resp discom f ort = yes), low blood

oxygen saturation (blood oxygen = [< 95]), and dys-

pnea (dypnea = yes) with death. We can observe that

given these respiratory symptoms (rules 5, 7, and 9),

Available at: https://github.com/Luiz-

Cintra-Databases/SRAG-OpenDataSUS-2019-

2022/tree/main/srag/database

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

124

Table 1: Subset of rule groups formed for the analysis of COVID death (item of interest evolution = covid death). Gr.: group

number; Tp.: group type; Id: rule identiﬁcation; Sup.: rule support (in %); Conf.: rule conﬁdence (in %); OR: rule odds ratio.

Gr. (Tp.) Id: Rule Sup. Conf. OR

(1)

1: vent sup = inv → evolution = covid death 12.8 74.5 10.4

2: evolution = covid death → vent sup = inv 12.8 41.3 10.4

(1)

3: ICU = yes → evolution = covid death 17.1 53.1 4.4

4: evolution = covid death → ICU = yes 17.1 55.3 4.4

(1)

5: resp discom f ort = yes → evolution = covid death 19.3 35.3 1.6

6: evolution = covid death → resp discom f ort = yes 19.3 62.3 1.6

(1)

7: blood oxygen = [< 95] → evolution = covid death 21.8 35.3 1.7

8: evolution = covid death → blood oxygen = [< 95] 21.8 70.2 1.7

(1)

9: dyspnea = yes → evolution = covid death 23.4 33.9 1.6

10: evolution = covid death → dyspnea = yes 23.4 75.6 1.6

(1)

11: risk f ac = yes → evolution = covid death 22.5 37.3 2.2

12: evolution = covid death → risk f ac = yes 22.5 72.7 2.2

(2)

12: evolution = covid death → risk f ac = yes 22.5 72.7 2.2

14: vaccinated = yes → risk f ac = yes 14.1 70.5 1.7

15: vaccinated = yes, evolution = covid death → risk f ac = yes 4.8 81.1 3.0

(2)

16: ICU = yes → vent sup = inv 14.4 44.8 19.2

2: evolution = covid death → vent sup = inv 12.8 41.3 10.4

18: ICU = yes,evolution = covid death → vent sup = inv 11.1 64.7 23.1

(6)

1: vent sup = inv → evolution = covid death 12.8 74.5 10.4

20: age = (75y+) → evolution = covid death 10.0 51.5 3.0

21: age = (75y+),vent sup = inv → evolution = covid death 3.3 86.9 16.4

(6)

3: ICU = yes → evolution = covid death 17.1 53.1 4.4

20: age = (75y+) → evolution = covid death 10.0 51.5 3.0

24: ICU = yes,age = (75y+) → evolution = covid death 4.8 68.8 5.6

(6)

5: resp discom f ort = yes → evolution = covid death 19.3 35.3 1.6

20: age = (75y+) → evolution = covid death 10.0 51.5 3.0

27: resp discom f ort = yes,age = (75y+) → evolution = covid death 6.1 56.8 3.4

(6)

28: sx date = [5/2020-8/2020] → evolution = covid death 5.5 33.0 1.1

20: age = (75y+) → evolution = covid death 10.0 51.5 3.0

30: sx date = [5/2020-8/2020],age = (75y+) → evolution = covid death 2.1 58.7 3.3

the occurrence of death is close to one-third (conﬁ-

dence of the rules is 35.5% and 33.9%). However,

patients who died (rules 6, 8, and 10) had a higher oc-

currence of these symptoms (conﬁdence of the rules

is 62.3%, 70.2%, and 75.6%, respectively). The last

Type 1 grouping (G

) presented in the Table 1 shows a

bidirectional association between a patient’s risk fac-

tor (risk f ac = yes) and death from COVID-19. Like

the three previous groupings, the risk factor (rule 11)

implies death from COVID-19 in close to a third of

cases (Conf. = 37.3%). However, given a patient

who died from COVID-19 (rule 12), the occurrence of

them having some risk factor is high (Conf. = 72.7%).

Two Type 2 groupings are illustrated in Table 1.

The grouping G

shows the previous association rule

12 in which patients who died from Covid were more

likely to have some risk factor. Additionally, there

is an association between vaccinated COVID patients

(vaccinated = yes) who also had a higher occurrence

of having some risk factor (rule 14). Rule 15 then

establishes that the association of vaccinated patients

who died from COVID increases the possibility of

them having had a risk factor. This is concluded

from the increase in conﬁdence and odds ratio of rule

15 (Conf.=81.1% and OR=3.0) concerning rules 12

(Conf.=72.7% and OR=2.2) and 14 (Conf.= 70.5%

and OR=1.7). The grouping, G

shows an associa-

tion in which patients admitted to the ICU lead to the

occurrence of invasive respiratory support (rule 16).

Additionally, the previous association rule 2 indicates

that patients who died from COVID-19 also involved

in the use of invasive respiratory support. Rule 18

reinforces these two previous associations, showing

that patients admitted to the ICU who died increase

the occurrence of using invasive respiratory support.

This is also concluded by the increase in conﬁdence

and odds ratio of rule 18 (Conf.=64.7% and OR=23.1)

concerning rules 16 (Conf.=44.8% and OR=19.2) and

2 (Conf.=41.3% and OR=10.4).

Type 6 groupings deal with the item of interest,

A Post-Processing Strategy for Association Rules in Knowledge Discovery

125

death from COVID-19, in the consequent of the rules.

For these groupings, we highlight associations that re-

ﬂect how the need for speciﬁc procedures in individ-

uals over 75 years old (age = 75y+) presents a higher

risk of death. Grouping G

shows a previously seen

association where the use of invasive respiratory sup-

port leads to death (rule 1). Besides, it shows that

age over 75 also leads to death (rule 20). However,

age acts as an important reinforcement when it oc-

curs with ventilatory support (rule 21). We can verify

that both conﬁdence and odds ratio have a signiﬁcant

increase in rule 21 (Conf. = 86.9% and OR = 16.4)

compared to rules 1 (Conf. = 74.5% and OR = 10.4)

and 20 (Conf. = 51.5% and OR = 3.0). Similarly, rule

24 in grouping G

indicates that elderly patients ad-

mitted to the ICU have a higher occurrence of death

when looking at their conﬁdence (68.8%) and odds ra-

tio (5.6), which are higher than those of rules 3 and 20

seen earlier. In turn, rule 27 in grouping G

indicates

that the symptom of respiratory discomfort in the el-

derly also intensiﬁes the evolution of covid to death,

according to its higher conﬁdence value (56.8%) and

odds ratio (3.4) compared to the previous rules 25 and

20. G

is the last grouping presented in Table 1. It

shows that patients with COVID-19 symptoms in the

second third of 2020 (sx date = [5/2020 − 8/2020])

are associated with death from COVID-19 (rule 28).

Although this association is not as strong (Conf. =

33.0% and OR = 1.1), this type of association was not

generated for other dates. However, the association is

enhanced in patients over 75 years old (rule 30) (Conf.

= 58.7% and OR = 3.3).

Visualization of Rule Groupings

Figure 3 shows the graph of generated groups that

present relationship between groups. The image has

three subgraphs corresponding to groups of Type 6

and Type 7 (the subgraphs that look like a ﬁreworks

explosion), they are linked to the center of the graph

through bidirectional rules (yellow nodes). The center

of the graph shows the relationship between groups

of Type 2 and Type 4. A subgraph with only three

groups was generated and is not related to the rest of

the graph. The two subgraphs highlighted in the Fig-

ure 3 were adjusted manually, due to limitation on the

number of the pages, and will be presented below.

Figure 4 highlights all the forms of relationships

between the groups that the method is able to capture.

A bidirectional rule will show that the item of interest

also acts as a factor in the occurrence of the associated

item. Therefore, the yellow nodes provide a bridge to

verify both the factors that lead to the occurrence of

the item of interest, what the item of interest acts as a

factor and whether any item can reinforce such an as-

sociation. For example, age = (75y+) → evolution =

covid death and evolution = covid death → age =

(75y+), furthermore, the latter has a connection with

another group that shows that in patients with neuro-

logical diseases and who have had Covid there is a

greater chance of being over 75 years old.

The graph also shows that resp discom f ort =

yes, ven sup = inv e icu = yes are associated with

death due to COVID-19 through the blue nodes linked

to rule age = (75y+) → evolution = covid death.

The remaining bidirectional rules convey that items

are inherently associated to death due to COVID-

19 (resp discom f ort = yes is linked to a bidirec-

tional, which in turn is linked to center of graph,

see Figure 3). Thereby, we can see that not only

ven sup = inv → evolution = covid death but also

evolution = covid death → ven sup = inv, addition-

ally the association is reinforced when evolution =

covid death and icu = yes occurs together. A sim-

ilar relationship can also be seen in the rule icu =

yes → evolution = covid death, where evolution =

covid death → icu = yes also occurs, in addi-

tion, the association is reinforced when evolution =

covid death occurs concomitantly with tomo

res =

typical covid or obesity = yes.

Relationship between groups of Types 2 and 4

are showed in Figure 5. This type of relation-

ship shows when the item of interest acts as a fac-

tor for another item to occurs, moreover to show-

ing the other items that reinforced this association.

For example, in Figure 5, in addition to showing that

evolution = covid death → risk f ac = yes occurs, the

graph also shows that several items reinforced this as-

sociation, such as, icu = yes, blood oxygen = [< 95],

f ever = not e sore throat = not. The subgraph also

shows a bidirectional that conveys that evolution =

covid death and risk f ac = yes are intrinsically asso-

ciated.

4 DISCUSSION

Few studies have been found on post-processing asso-

ciation rules based on an item of interest in the data.

In a literature review since 2015 on this subject, only

three works were found that focus on an item of in-

terest in the consequent (Berka, 2018) (Cheng et al.,

2016) (Hahsler and Karpienko, 2017), and a single

work that focuses on an item of interest in the an-

tecedent (Wei and Scott, 2015). However, no works

address an item of interest on either side of the rule,

as in our proposed method.

Berka’s work (Berka, 2018) focuses on describ-

ing concepts by ﬁxing the chosen concept in the con-

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

126

Figure 3: Graph showing the relationships among all rule groupings. The detailed view of the highlighted square on the right

side is presented in Figure 4, and the detailed view of the highlighted square in the upper left corner is provided in Figure 5.

sequent of the rule and generating meta-rules. His

method involves, from rules obtained by the ARM

algorithm, ﬁltering those with a speciﬁc consequent

(the concept to be explored) and generating a new

database where each rule is mapped as a row, main-

taining the columns of the original database except

for the column that was ﬁxed in the consequent. The

ARM algorithm is then applied to this new database,

and the result of this application provides information

associated with the sought concept. Thus, a second

level of information is obtained compared to tradi-

tional rules. As a consequence of this approach, he

can get the association between items that lead to the

occurrence of a speciﬁc concept. However, associ-

ations in which the concept acts as a factor for the

presence of another item are lost. Our method is more

generic and retains these associations in which the

item of interest acts as a factor for the occurrence of

other items (item of interest in the antecedent of the

rule).

Cheng and colleagues (Cheng et al., 2016) pro-

pose a visualization based on a set of association rules

with the same consequent, where an expert can “as-

semble” a rule by adding items. This strategy allows

the expert to try to build rules they already have a

prior assumption. As this construction progresses,

the expert validates the relevance of the constructed

rules based on the support and conﬁdence provided

by the visualization. Hahsler and Karpienko (Hah-

sler and Karpienko, 2017) also propose a visualiza-

tion method where clusters of antecedents are gen-

erated for rules with the same consequent. First, a

matrix is created where the rows represent the con-

sequent, and the columns represent the antecedents

(both itemsets). Then, the columns are grouped us-

ing the k-means algorithm, checking the lift measure.

The authors argue that the lift measure allows match-

ing conditions of synonyms or similar items, such as

butter and margarine. The visualization consists of a

balloon plot where rows are the consequent, columns

are the clusters of antecedents, and a point on the plot

represents a rule. The points correspond to balloons

with properties such as color, size, and balloon po-

sition being used to highlight the clusters. Although

the method does not focus on a speciﬁc item of inter-

est, as all consequents are arranged in the rows of the

visualization, it facilitates an analysis of a particular

item in the consequent. Moreover, as their goal is to

group similar items, the clustering method does not

highlight the individual associations contained in less

general rules. In both mentioned works that use visu-

alization, only associations with an item of interest in

the consequent are highlighted, but not those where an

item of interest appears in the antecedent. In our ap-

proach, we consider that rules with an item of interest

in the antecedent are essential because they allow for

the analysis of the consequences of this occurrence,

and it is possible to visualize these relationships in

the graph in a grouped manner.

Wei and Scott (Wei and Scott, 2015) combine

pruning, summarization, and visualization to ﬁnd pat-

terns in adverse reactions to vaccines in the United

A Post-Processing Strategy for Association Rules in Knowledge Discovery

127

Figure 4: Subgraph displaying interconnections among various types of groups of rules.

States. In this work, the item of interest acts as the

sole antecedent of the rule, and the consequent is a

set of items representing the adverse effects of a vac-

cine. This work aligns with ours in studying the item

of interest in the antecedent, but it does not address

when the item is in the consequent, making our pro-

posal more general as we can analyze the causes and

consequences of an item of interest in other items in

the database.

We can categorize the post-processing of associ-

ation rules into four main types of tasks: pruning,

grouping, summarization, and visualization (Baesens

et al., 2000). Our method performs three of these

tasks: pruning (selecting rules based on an item of in-

terest and an evaluation measure), grouping (grouping

the selected rules into types), and visualization (gen-

erating a graph to visualize the interconnections of the

formed rule groups).

The pruning performed by our method is based,

in parts, on the concept presented by (Bayardo et al.,

2000), but using a minimum percentage improvement

in the pruning process. The grouping carried out by

our method differs signiﬁcantly from the groupings

found in recent literature, as it employs a predeﬁned

format of the relationship between rules (subsumed

rules). Works such as (de Padua et al., 2018) and

(Karimi-Majd and Mahootchi, 2015) perform group-

ings based on the similarity of items between rules,

enabling the display of other related rules. However,

if a specialist is interested in verifying, from a rule of

size 3, whether the antecedents are individually asso-

ciated, these additional rules will increase the effort

required for analysis. The same issue occurs in the

study by (Zhang et al., 2019), which groups rules us-

ing rule semantics, making it unclear to the specialist

the individual associations in relation to size 3 associ-

ations.

Some works use graphs to visualize rules, but the

approach of ﬁxing a pivot rule was not found. The

idea of using the pivot rule came from adapting the

approaches of (Wei and Scott, 2015) and (Kwon and

Kim, 2019), where the rules are nodes, and the items

are represented with different visual marks. Since the

work of (Wei and Scott, 2015) ﬁxes an antecedent, it

does not need to worry about showing the antecedent,

unlike (Kwon and Kim, 2019), which needs to show

the antecedent by connecting to the node and the node

connecting to the consequent.

Microsoft provides an alternative visualization

method through its proprietary software, MS Analysis

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

128

Figure 5: Subgraph showing multiple associations interconnected through the relationship between COVID-19-related deaths

and a risk factor.

Services

. The visualization in the MS software rep-

resents a dependency network, i.e., a directed graph

where items are nodes, linking antecedents to conse-

quents. A key distinction from our approach is that

our graph nodes can represent individual rules (if it is

a pivot rule) or groups of rules rather than individual

items. This distinction simpliﬁes the visualization by

avoiding complex cycles between items and reducing

the number of edges. Additionally, each group con-

nects to only one pivot rule and vice versa, further

minimizing edges. Our method is tailored to extract

and emphasize more intricate relationships within as-

sociations for the usefulness of specialists.

Thus, although the proposed method incorporates

adaptations of existing methods and concepts, our

work presents some notable differences compared to

the most similar approaches found in the literature on

association rule post-processing. Primarily, it is fo-

cused on the user’s interest in a speciﬁc item. Due to

this emphasis, we address the item of interest more

comprehensively, acknowledging its relevance both

in the antecedent and consequent of the rule. There-

fore, pruning, grouping, and visualization operations

are optimized to meet the user’s needs.

5 CONCLUSIONS

Developing an effective method in Association Rule

Mining (ARM) is crucial to harness the exploratory

nature inherent in this process, as it can result in many

rules, making it challenging for a domain-speciﬁc ex-

pert to manipulate and analyze them later. Unlike the

https://learn.microsoft.com/en-us/analysis-

services/data-mining/browse-a-model-using-the-

microsoft-association-rules-viewer?view=asallproducts-

allversions#BKMK Dependency

inductive bias of supervised approaches, ARM seeks

statistically relevant patterns without relying on a spe-

ciﬁc target variable. Instead of simply ﬁltering by

the item of interest and potentially concealing rele-

vant relationships, the proposed method allows the se-

lection of rules that, even without directly containing

the item of interest, share other items with those that

do. This approach promotes knowledge discovery by

highlighting complex and valuable connections in the

dataset and is beneﬁcial for domain experts seeking

speciﬁc insights into their datasets, contributing sig-

niﬁcantly to understanding and interpreting the rela-

tionships in the data.

A limitation of the work is that the proposed

method only considers rules with a maximum size

of three, which may lead to the exclusion of poten-

tially relevant rules and limit the scope of the anal-

ysis. Furthermore, the method depends on the rules

generated by the ARM algorithm and the parameters

deﬁned to create them. For example, rare items may

only appear if the minimum support used to generate

the rules is lower than the frequency of the item in

the database. For future work, we plan to assess dif-

ferences in the formed rule groupings by considering

various evaluation measures. In the graphical repre-

sentation, there is room for improvement in visualiza-

tion, particularly in areas with signiﬁcant edge over-

lap. Additionally, we aim to enhance the level of de-

tail in the graphs by incorporating additional informa-

tion about the rules, such as support, conﬁdence, and

the evaluation measure used for forming the group-

ings. These enhancements are expected to contribute

to a more comprehensive and complete analysis of the

associations within rule groupings.

A Post-Processing Strategy for Association Rules in Knowledge Discovery

129

REFERENCES

Agrawal, R., Imieli

nski, T., and Swami, A. (1993). Min-

ing association rules between sets of items in large

databases. ACM SIGMOD Record, 22:207–216.

Agrawal, R. and Srikant, R. (1994). Fast algorithms

for mining association rules in large databases. In

Proceedings of the 20th International Conference on

Very Large Data Bases, page 487–499. Morgan Kauf-

mann Publishers Inc.

Baesens, B., Viaene, S., and Vanthienen, J. (2000). Post-

processing of association rules. In KDD ’00: Proceed-

ings of the Sixth ACM SIGKDD International Confer-

ence on Knowledge Discovery and Data Mining. Spe-

cial workshop on post-processing in machine learning

and data mining: interpretation, visualization, integra-

tion, and related topics. Association for Computing

Machinery.

Balasubramani, B. S., Shivaprabhu, V. R., Krishnamurthy,

S., Cruz, I. F., and Malik, T. (2016). Ontology-based

urban data exploration. In Proceedings of the 2nd

ACM SIGSPATIAL Workshop on Smart Cities and

Urban Analytics - UrbanGIS ’16, pages 1–8. ACM

Press.

Bayardo, R. J., Agrawal, R., and Gunopulos, D.

(2000). Constraint-based rule mining in large, dense

databases. Data Mining and Knowledge Discovery,

4:217–240.

Berka, P. (2018). Comprehensive concept description

based on association rules: A meta-learning approach.

Intelligent Data Analysis, 22:325–344.

Castro, G., Salvini, R., Soares, F. A., Nierenberg, A. A.,

Sachs, G. S., Lafer, B., and Dias, R. S. (2018). Ap-

plying Association Rules to Study Bipolar Disorder

and Premenstrual Dysphoric Disorder Comorbidity.

In 2018 IEEE Canadian Conference on Electrical &

Computer Engineering (CCECE), pages 1–4. IEEE.

Cheng, C.-W., Sha, Y., and Wang, M. D. (2016). Inter-

visar: An interactive visualization for association rule

search. In Proceedings of the 7th ACM International

Conference on Bioinformatics, Computational Biol-

ogy, and Health Informatics, pages 175–184. ACM.

de Padua, R., do Carmo, L. P., Rezende, S. O., and de Car-

valho, V. O. (2018). An analysis on community detec-

tion and clustering algorithms on the post-processing

of association rules. In 2018 International Joint Con-

ference on Neural Networks (IJCNN), volume 2018-

July, pages 1–7. IEEE.

Freitas, A. A. (2000). Understanding the crucial differ-

ences between classiﬁcation and discovery of associ-

ation rules. ACM SIGKDD Explorations Newsletter,

2:65–69.

Hagberg, A. A., Schult, D. A., and Swart, P. J. (2008). Ex-

ploring network structure, dynamics, and function us-

ing networkx. In Varoquaux, G., Vaught, T., and Mill-

man, J., editors, Proceedings of the 7th Python in Sci-

ence Conference, pages 11 – 15, Pasadena, CA USA.

Hahsler, M. (2023). R package arules - min-

ing association rules and frequent itemsets.

https://www.rdocumentation.org/packages/arules/ver

sions/1.7-6. Accessed: 2023-08-05.

Hahsler, M. and Karpienko, R. (2017). Visualizing associa-

tion rules in hierarchical groups. Journal of Business

Economics, 87:317–335.

Han, J., Kamber, M., and Pei, J. (2012). Data Mining: Con-

cepts and Techniques. Elsevier Inc.

Han, J., Pei, J., Yin, Y., and Mao, R. (2004). Min-

ing frequent patterns without candidate generation:

A frequent-pattern tree approach. Data Mining and

Knowledge Discovery, 8:53–87.

Karimi-Majd, A.-M. and Mahootchi, M. (2015). A new

data mining methodology for generating new service

ideas. Information Systems and e-Business Manage-

ment, 13:421–443.

Kwon, J.-H. and Kim, E.-J. (2019). Accident prediction

model using environmental sensors for industrial in-

ternet of things. Sensors and Materials, 31:579.

Matetic, M., Bakaric, M. B., and Sisovic, S. (2015). Asso-

ciation rule mining and visualization of introductory

programming course activities. In Proceedings of the

16th International Conference on Computer Systems

and Technologies - CompSysTech ’15, volume 1008,

pages 374–381. ACM Press.

NetworkX Developers (2022). Net-

workx - network analysis in python.

https://networkx.org/documentation/networkx-2.8.8/.

Accessed: 2023-08-05.

Shinn, M. (2016). Forceatlas2 for python.

https://github.com/mwshinn/forceatlas2-python/.

Accessed: 2023-08-05.

Slyepchenko, A., Frey, B. N., Lafer, B., Nierenberg, A. A.,

Sachs, G. S., and Dias, R. S. (2017). Increased ill-

ness burden in women with comorbid bipolar and pre-

menstrual dysphoric disorder: data from 1 099 women

from STEP-BD study. Acta Psychiatrica Scandinav-

ica, 136(5):473–482.

Wei, L. and Scott, J. (2015). Association rule mining in

the us vaccine adverse event reporting system (vaers).

Pharmacoepidemiology and Drug Safety, 24:922–

933.

Weidner, D., Atzmueller, M., and Seipel, D. (2020).

Finding Maximal Non-redundant Association Rules

in Tennis Data, volume 12057 LNAI, pages 59–78.

Springer.

Zaki, M. (2000). Scalable algorithms for association min-

ing. IEEE Transactions on Knowledge and Data En-

gineering, 12:372–390.

Zhang, C., Xue, X., Zhao, Y., Zhang, X., and Li, T. (2019).

An improved association rule mining-based method

for revealing operational problems of building heat-

ing, ventilation and air conditioning (hvac) systems.

Applied Energy, 253:113492.

Zhang, C., Zhao, Y., Lu, J., Li, T., and Zhang, X. (2021).

Analytic hierarchy process-based fuzzy post mining

method for operation anomaly detection of building

energy systems. Energy and Buildings, 252:111426.

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

130