Scenario Generation With Transitive Rules for Counterfactual Event
Analysis
Aigerim Mussina$^{1,a}$, Paulo Trigo$^{2,b}$ and Sanzhar Aubakirov$^{1,c}$
$^1$ Department of Computer Science, Al-Farabi Kazakh National University, 71 al-Farabi Ave., Almaty, Kazakhstan
$^2$ GuIAA, ISEL - Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal
Keywords:
Event Detection, Association Rules, What-If Analysis.
Abstract:
Event detection on online social networks is one of the comprehensive approaches for analyzing people’s dis-
cussions. However, it is not enough to detect an event as people often look for ways to influence the course
of an event. Often, in the course of a discussion, the introduction of a new topic can shift the focus to another
subject and thus move from one event to another. The causal relationship between topics and events can be ex-
plored by extracting association rules among the topics covered in each event. The scenario generation based
on those causal relationships can support what-if (counterfactual) analysis and explain transitions between
events. In this paper our goal is to generate what-if scenarios among topics of detected events. The association
rule approach was chosen as a method for its human-readable output that can be transposed into a counter-
factual scenario. We propose methods for time-window constrained topic-based what-if scenario generation
founded on market-basket analysis.
1 INTRODUCTION
Nowadays people are often immersed in a continu-
ous stream of textual data generated from social net-
works. A large volume of data usually emerges (and
grows) from people’s discussions and around aggre-
gating concepts commonly designated as “events”.
Also, within an event, people’s discussions unfold
around certain “topics”. There is thus an increase in
research effort in the processing of textual data origi-
nating from social networks, with the goal of automat-
ically detecting both the events and their topics. Re-
searchers are interested in detecting events-and-topics
as it appears to become a powerful method for the
follow-up of people’s discussions. However, just de-
tecting events-and-topics is not enough.
The follow-up of people’s discussions also in-
volves the challenge of trying to predict the cause-
and-effect relationship that results from introducing a
new topic into a discussion. People usually (and in-
tuitively) seek to understand not only the source-and-
flow of a discussion but also the impact that their own
participation, via the introduction of a topic, may have
on that same flow.
a https://orcid.org/0000-0002-7043-0810
b https://orcid.org/0000-0001-5850-615X
c https://orcid.org/0000-0002-8416-527X
Therefore, there is a “topic space” where each per-
son searches for a subset of topics through the gene-
ration of different scenarios in order to decide how to
best intervene in the flow of a discussion. In this con-
text, a scenario is constructed by combining different
topics that may have originated from the same event
(intra-event scenario) or from different events (inter-
event scenario).
The overall process starts with the event detection
process being applied in the context of an online so-
cial network (OSN). Therefore, an event is described
by a set of topics-of-interest (ToI) that, in turn, were
addressed by people, in the course of their OSN inter-
actions (discussions) during a given time-window.
We formally define an event as $E(w, T) = \{t_1, t_2, \ldots, t_n\}$, where $w$ is a time-window, $T$ is a ToI dictionary and $\{t_1, t_2, \ldots, t_n\} \subseteq T$ is a topic-set; hence, each $t_x \in E(w, T)$ represents the topic, $x$, as taken from $T$ and addressed, over $w$, in the detected event $E$. We point out that the event detection algorithm resorts to a ToI dictionary in order to improve the process accuracy (Mussina et al., 2022).
The counterfactual analysis was chosen as a base
approach to the search for “topic space”. It aims
to explore cause-and-effect relations by searching for
statements such as “if A occurred, then B is also likely to occur” (Menzies and Beebee, 2001). Considering A and B as events and the definition of $E$ (above), we may rephrase such a statement as “if, within a time-window, $w$, topics $\{t^A_1, t^A_2, \ldots, t^A_r\}$ occurred, then topics $\{t^B_1, t^B_2, \ldots, t^B_s\}$ are also likely to occur in $w$”; where $t^E_x$ here simply means that topic $x$ was addressed in event $E$. Following the what-if perspective, we suppose that A’s topic-set, $\{t^A_1, t^A_2, \ldots, t^A_r\}$, includes the part of the “topic space” that has been intervened on, such that the discussion moves to B’s topic-set, $\{t^B_1, t^B_2, \ldots, t^B_s\}$.

Mussina, A., Trigo, P. and Aubakirov, S.
Scenario Generation With Transitive Rules for Counterfactual Event Analysis.
DOI: 10.5220/0011895000003393
In Proceedings of the 15th International Conference on Agents and Artificial Intelligence (ICAART 2023) - Volume 3, pages 1047-1051
ISBN: 978-989-758-623-1; ISSN: 2184-433X
Copyright © 2023 by SCITEPRESS - Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

In this paper, we follow the counterfactual per-
spective and propose methods for time-window con-
strained topic-based what-if scenario generation.
Generated scenarios are of two types: intra-event and
inter-event. The inter-event scenario generation in-
cludes two approaches: a one-rule-based approach
and a two-rules-transitivity-based approach. We fol-
lowed a market-basket analysis approach and the pro-
posed methods resort to (unsupervised) association
rule extraction techniques when applied to detected
events (baskets) and the sets of corresponding topics
discussed therein (item-sets within baskets).
2 RELATED WORK
An approach to the event analysis process (that follows from event-detection) falls into the broader research field of “event association extraction”. Researchers usually resort to graph-based formulations, where events are modelled as nodes (Shahaf et al., 2015) and the values of edges between nodes (events) are computed from the frequency of common words. The “Connecting the dots” concept with graph neural networks outperformed the results of the state-of-the-art methods (Wu et al., 2020). By analyzing event associations one could extract cause-and-effect relationships.
Another recent research approach is the “counterfactual event analysis”, which is being pursued in the healthcare field, where researchers focus on predicting the outcome of treatments (Zou et al., 2022). Such research aims not only to predict the risks of a treatment, but also to weigh the cause-effect relation of alternative interventions (Prosperi et al., 2020). Another approach is related to the “prescriptive analysis” that finds relevant applications in business management processes and decision making (Lepenioti et al., 2020).
The main limitations of modern solutions in prescriptive and counterfactual analysis are: a) the generation of models that are complex to explain, and b) the focus on narrow areas of application. In this work we apply a market-basket analysis, which provides, as a result, a human-readable model based on association rules. Such a model is used to generate what-if counterfactual scenarios. In this context, we are not yet able to compare our results with those of other researchers.
3 COUNTERFACTUAL EVENT
ANALYSIS
The market-basket analysis approach is formalized with the following main concepts: basket, item, itemset, association rule, left-hand-side (LHS), right-hand-side (RHS), support and confidence. In the context of event analysis, a detected event is considered as a basket and a topic is considered as an item. The itemset $I = \{t_1, t_2, \ldots, t_k\}$ is a set of $k$ items where each $t_x$ is a topic (item). The association rule is constructed from subitemsets and described as an implication of the form $LHS \Rightarrow RHS$, where $LHS, RHS \subseteq I$ and $LHS \cap RHS = \emptyset$. Support and confidence are the metrics that characterize each association rule. The support is the joint probability, $P(LHS, RHS)$, that items in LHS and RHS occur together. The confidence is a conditional probability of the form $P(RHS \mid LHS)$.
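The two metrics above can be estimated by counting over the detected events. The following is a minimal sketch, assuming each event (basket) is stored as a frozenset of stemmed topics; the function names and the toy events are hypothetical and are not taken from our implementation.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]  # one detected event = one set of topics


def support(itemset: Iterable[str], baskets: List[Basket]) -> float:
    """Fraction of baskets (events) that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not baskets:
        return 0.0
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               baskets: List[Basket]) -> float:
    """Estimate P(RHS | LHS) as support(LHS u RHS) / support(LHS)."""
    lhs_support = support(lhs, baskets)
    if lhs_support == 0.0:
        return 0.0  # the rule never fires, so no confidence can be estimated
    joint = support(frozenset(lhs) | frozenset(rhs), baskets)
    return joint / lhs_support


# Toy events, invented for illustration only.
events = [
    frozenset({"news", "kill", "car", "bomb"}),
    frozenset({"news", "kill", "car"}),
    frozenset({"news", "kill"}),
    frozenset({"pope", "egypt"}),
]

rule_support = support({"news", "kill", "car"}, events)
rule_confidence = confidence({"news", "kill"}, {"car"}, events)
```

In this toy corpus the rule {news, kill} ⇒ {car} holds in two of the four events, and in two of the three events that contain its left-hand-side.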
Definition 1. A what-if scenario is generated from an itemset $I_N$ as an association rule, $LHS \Rightarrow RHS$, that takes the form:
$$Base_L \cup WI_M \Rightarrow RHS_R \qquad (1)$$
where $Base_L \cup WI_M = LHS$ and $Base_L \cap WI_M = \emptyset$, also $LHS \subseteq I_N$ and $RHS \subseteq I_N$; each $L$, $M$ and $R$ subscript is the size of the respective subitemset.
In this way the counterfactual scenario perspective can be read as “if $WI_M$ occurs together with the $Base_L$ then $RHS_R$ is also likely to occur”.
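The reading of Definition 1 can be sketched as a small data structure that holds the three subitemsets and renders the scenario sentence. This is an illustrative sketch only; the class name `WhatIfScenario` and its methods are hypothetical, and topics are sorted alphabetically just to make the output deterministic.

```python
from dataclasses import dataclass
from typing import FrozenSet


def _fmt(topics: FrozenSet[str]) -> str:
    """Render a topic set as a quoted, alphabetically sorted list."""
    return ", ".join(f"'{topic}'" for topic in sorted(topics))


@dataclass(frozen=True)
class WhatIfScenario:
    base: FrozenSet[str]     # Base_L
    what_if: FrozenSet[str]  # WI_M
    rhs: FrozenSet[str]      # RHS_R

    def __post_init__(self) -> None:
        # Base and WI partition the LHS, and LHS and RHS share no topic.
        if self.base & self.what_if:
            raise ValueError("Base and WI must be disjoint")
        if (self.base | self.what_if) & self.rhs:
            raise ValueError("LHS and RHS must be disjoint")

    @property
    def lhs(self) -> FrozenSet[str]:
        return self.base | self.what_if

    def describe(self) -> str:
        return (f"if {_fmt(self.what_if)} occurs together with the "
                f"{_fmt(self.base)} then {_fmt(self.rhs)} "
                f"are also likely to occur")


scenario = WhatIfScenario(
    base=frozenset({"nearbi", "flood"}),
    what_if=frozenset({"shrine"}),
    rhs=frozenset({"sanctuari", "evacu"}),
)
sentence = scenario.describe()
```

The sizes $L$, $M$ and $R$ are simply `len(scenario.base)`, `len(scenario.what_if)` and `len(scenario.rhs)`.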
Definition 2. An intra-event scenario is taken from a what-if scenario where $support(WI_M) = support(Base_L) = support(RHS_R)$.
In the intra-event scenario the equal support
means that subitemsets are likely to appear together
in each event. This comes from a small time-window
event detection where distinct events describe the
same real-world discussions.
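The equal-support condition of Definition 2 can be checked directly. The sketch below is one possible check, under stated assumptions: the helper names are hypothetical, supports are compared with a floating-point tolerance, and a zero support is rejected (an addition that is not part of the definition). The toy corpus of 34 events is invented for illustration.

```python
import math
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]


def support(itemset: Iterable[str], baskets: List[Basket]) -> float:
    """Fraction of baskets that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not baskets:
        return 0.0
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def is_intra_event(base: Iterable[str], what_if: Iterable[str],
                   rhs: Iterable[str], baskets: List[Basket],
                   rel_tol: float = 1e-9) -> bool:
    """True when support(WI) = support(Base) = support(RHS), and non-zero."""
    s_wi = support(what_if, baskets)
    s_base = support(base, baskets)
    s_rhs = support(rhs, baskets)
    return (s_wi > 0.0
            and math.isclose(s_wi, s_base, rel_tol=rel_tol)
            and math.isclose(s_base, s_rhs, rel_tol=rel_tol))


# Toy corpus: one event with five related topics and 33 unrelated events.
events = [frozenset({"nearbi", "flood", "shrine", "sanctuari", "evacu"})]
events += [frozenset({f"other_{i}"}) for i in range(33)]

shared_support = support({"shrine"}, events)
intra = is_intra_event({"nearbi", "flood"}, {"shrine"},
                       {"sanctuari", "evacu"}, events)
```

Here every subitemset occurs in exactly one of the 34 toy events, so the three supports coincide.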
Definition 3. A one-rule-based inter-event scenario is taken from a what-if scenario where:
• $support(WI_M)$ is the minimum from all subitemsets of size $M$,
• $support(Base_L)$ is the maximum from all subitemsets of size $L$,
• $support(RHS_R)$ is the maximum from all subitemsets of size $R$.
Events may have common topic-subsets but they
actually represent different real-world discussions. In
that case, we use the inter-event scenario that can
show the path from one event to another. The asso-
ciation rules extraction for the inter-event scenario is
focused on topics that appear in several events. A
threshold value, h, is used to control the number of
events with common topics.
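One possible reading of Definition 3 is sketched below. It is an assumption-laden illustration, not our implementation: the subitemsets are picked greedily and kept disjoint (first the maximum-support $Base_L$, then the minimum-support $WI_M$ among the remaining topics, then the maximum-support $RHS_R$), an ordering that the definition itself does not prescribe. All function names and the toy events are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Basket = FrozenSet[str]


def support(itemset: Iterable[str], baskets: List[Basket]) -> float:
    """Fraction of baskets that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not baskets:
        return 0.0
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def pick(items: Set[str], baskets: List[Basket], size: int,
         chooser: Callable) -> FrozenSet[str]:
    """Choose the subitemset of `size` with extreme support (min or max)."""
    candidates = [frozenset(c) for c in combinations(sorted(items), size)]
    return chooser(candidates, key=lambda c: support(c, baskets))


def one_rule_scenario(items: Set[str], baskets: List[Basket],
                      base_size: int, wi_size: int, rhs_size: int
                      ) -> Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]:
    """Greedy, disjoint selection of (Base_L, WI_M, RHS_R)."""
    if len(items) < base_size + wi_size + rhs_size:
        raise ValueError("not enough topics for the requested sizes")
    base = pick(items, baskets, base_size, max)      # most common context
    remaining = set(items) - base
    what_if = pick(remaining, baskets, wi_size, min)  # rarest intervention
    remaining -= what_if
    rhs = pick(remaining, baskets, rhs_size, max)     # most common outcome
    return base, what_if, rhs


# Toy events, invented for illustration only.
events = [
    frozenset({"news", "kill", "car", "least", "bomb"}),
    frozenset({"news", "kill", "least", "bomb"}),
    frozenset({"news", "kill", "least", "bomb"}),
    frozenset({"news", "kill"}),
]
topics = {"news", "kill", "car", "least", "bomb"}
base, what_if, rhs = one_rule_scenario(topics, events, 2, 1, 2)
```

Ties are broken by alphabetical order of the candidate subitemsets, which is another arbitrary choice of this sketch.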
Definition 4. A two-rules-transitivity-based inter-event scenario is taken from two association rules of the form:
$$Base_L \Rightarrow WI_M, \qquad WI_M \Rightarrow RHS_R \qquad (2)$$
where $M > L$, $M > R$ and $Base_L, WI_M \subseteq I_{N1}$, $WI_M, RHS_R \subseteq I_{N2}$; $I_{N1}$ and $I_{N2}$ are different itemsets.
Analysis of two rules at once is another approach to inter-event scenario generation. Here we try to pass from $Base_L$ (left-hand-side) to $RHS_R$ (right-hand-side) through $WI_M$ (transitive subitemset). The key feature is the anti-monotone property of the association rule’s confidence. It states that confidence is anti-monotone with respect to the number of items on the right-hand-side of the rule, i.e., an increase in the right-hand-side dimension decreases or maintains the confidence. In our case, if $M > L$ in $Base_L \Rightarrow WI_M$ then the probability that $WI_M$ will happen when $Base_L$ happened tends to be low. However, if we also have $M > R$ in $WI_M \Rightarrow RHS_R$ then the probability that $RHS_R$ will happen when $WI_M$ happened tends to be high. In this inter-event scenario we pass from the low confidence rule to the high confidence rule.
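The chaining of Definition 4 can be sketched as a join between two rule lists on the transitive subitemset. This is a minimal sketch under assumptions: the `Rule` type and `chain_rules` function are hypothetical names, rules are given as already-extracted (LHS, RHS) pairs, and only the size conditions $M > L$ and $M > R$ are checked (confidence values are not used). The example rules reuse the topics of Table 6.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]


Chain = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def chain_rules(rules_n1: List[Rule], rules_n2: List[Rule]) -> List[Chain]:
    """Pair rules Base => WI (from I_N1) with WI => RHS (from I_N2).

    A pair is kept when the RHS of the first rule equals the LHS of the
    second one and this transitive subitemset is larger than both ends.
    """
    chains = []
    for first in rules_n1:
        for second in rules_n2:
            transitive = first.rhs
            if (transitive == second.lhs
                    and len(transitive) > len(first.lhs)
                    and len(transitive) > len(second.rhs)):
                chains.append((first.lhs, transitive, second.rhs))
    return chains


rules_a = [Rule(frozenset({"least", "syria"}),
                frozenset({"news", "car", "kill", "bomb"}))]
rules_b = [Rule(frozenset({"news", "car", "kill", "bomb"}),
                frozenset({"soldier", "sever"})),
           Rule(frozenset({"news", "kill"}), frozenset({"car"}))]

chains = chain_rules(rules_a, rules_b)
```

Each resulting triple (Base, WI, RHS) can then be read with the same what-if template used for single rules.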
4 RESULTS
An overall view of our work is depicted in figure 1. The left side of the figure represents our previous work on event detection, where we explored two input data sources, the “Events2012” Twitter corpus and Topics (ToI) (McMinn et al., 2013), and implemented the SEDTWik (Morabia et al., 2019) algorithm for event detection, replacing Wikipedia with ToI dictionaries (Mussina et al., 2022). Since the event detection method uses frequency counting to determine bursts in the use of certain words, it was necessary to remove noise and reduce each word to a common form. Tweets were cleaned of stop words and converted to their stemmed form. The NLTK library was used for stop-word removal and stemming (with PorterStemmer) (NLTK, 2022). The detected events were combined into one file for each ToI dictionary, where each line corresponds to an event.
Figure 1: Overall view of the work.
The right side of figure 1 represents the work of this paper. The association rules extraction resorts to the market-basket approach applied to the detected events. Given that the approach may identify a large number of rules, the extraction was constrained by a maximum number of rules and by minimum support and confidence values. Therefore, the support-and-confidence space was explored to extract groups with at most 10 rules. From those extracted rules we generate the scenarios. As mentioned above, the scenarios can be intra-event or inter-event. Below we describe the scenario generation details along with preliminary results.
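The exploration of the support-and-confidence space can be sketched as a grid search that keeps only the threshold pairs yielding a non-empty group of at most 10 rules. The sketch below is illustrative and rests on assumptions: it uses a brute-force rule miner suitable only for tiny inputs (not the extraction technique used in our experiments), and the function names, the grid values and the toy events are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Basket = FrozenSet[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # lhs, rhs, sup, conf


def support(itemset: Iterable[str], baskets: List[Basket]) -> float:
    """Fraction of baskets that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not baskets:
        return 0.0
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def mine_rules(baskets: List[Basket], min_support: float,
               min_confidence: float, max_size: int = 3) -> List[Rule]:
    """Brute-force enumeration of rules LHS => RHS over small itemsets."""
    items = sorted(set().union(*baskets))
    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup = support(itemset, baskets)
            if sup == 0.0 or sup < min_support:
                continue
            for k in range(1, size):
                for lhs_items in combinations(combo, k):
                    lhs = frozenset(lhs_items)
                    conf = sup / support(lhs, baskets)
                    if conf >= min_confidence:
                        rules.append((lhs, itemset - lhs, sup, conf))
    return rules


def explore(baskets: List[Basket], supports: List[float],
            confidences: List[float], max_rules: int = 10
            ) -> Dict[Tuple[float, float], List[Rule]]:
    """Keep the (support, confidence) pairs that give 1..max_rules rules."""
    groups = {}
    for min_sup in supports:
        for min_conf in confidences:
            rules = mine_rules(baskets, min_sup, min_conf)
            if 0 < len(rules) <= max_rules:
                groups[(min_sup, min_conf)] = rules
    return groups


# Toy events, invented for illustration only.
events = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
          frozenset("bc"), frozenset("abc")]
groups = explore(events, supports=[0.3, 0.5], confidences=[0.5, 0.6, 0.7])
```

In this toy run the loosest setting (0.3, 0.5) produces more than 10 rules and is therefore discarded, while stricter settings are kept.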
4.1 Intra-Event Associations
Since detected events have a small number of topics and the number of words in a “topic space” is enormous, many rules were extracted from a single event. One such rule, where the support of each subitemset is about 0.029, is presented in table 1. From that rule, the following counterfactual scenario was generated: “if ‘shrine’ occurs together with the ‘nearbi’, ‘flood’ then ‘sanctuari’, ‘evacu’ are also likely to occur”. This scenario can be interpreted as “if a flood occurs near the shrine, then people from the sanctuary should be evacuated”.
Table 1: Association rules for one real-world event.
|          | $Base_L$ | $WI_M$ | $RHS_R$ |
|----------|----------|--------|---------|
| itemsets | ‘nearbi’, ‘flood’ | ‘shrine’ | ‘sanctuari’, ‘evacu’ |
| support  | 0.0294 | 0.0294 | 0.0294 |

4.2 Inter-Event Associations
The first tests showed that the size of topic-sets should be increased to extract the inter-event association rules. The support of the subitemsets was lower than 0.1, which means that the probability of topics appearing together is tiny. The average topic-set size was 9, which is not enough to search for common topics in a certain number of events. For the inter-event scenario generation, each topic-set was expanded with the tweets containing its topics, which make up the event clusters in event detection. The added tweets were cleaned of topics that are not in the ToI dictionary. As a result, the average topic-set size increased to 453.
In the context of an inter-event scenario genera-
tion, we use topics that appear in several events. If the
number of events containing a topic is greater than the
threshold value, h, then this topic will be included in
the itemset, I. We intuitively set the threshold value
as h = 10.
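The threshold filter described above can be sketched as a simple count of events per topic. This is an illustrative sketch: the function names and the toy events are hypothetical, and the toy threshold is set to 2 only to keep the example small (the experiments use $h = 10$).

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set


def topics_above_threshold(events: Iterable[Iterable[str]], h: int) -> Set[str]:
    """Topics contained in strictly more than `h` events."""
    counts = Counter(topic for event in events for topic in set(event))
    return {topic for topic, n_events in counts.items() if n_events > h}


def restrict_events(events: Iterable[Iterable[str]],
                    itemset: Set[str]) -> List[FrozenSet[str]]:
    """Keep, in every event, only the topics that belong to the itemset I."""
    return [frozenset(event) & itemset for event in events]


# Toy events, invented for illustration only.
events = [
    {"car", "bomb", "kill"},
    {"car", "kill"},
    {"car", "pope"},
    {"kill", "egypt"},
]
itemset = topics_above_threshold(events, h=2)
baskets = restrict_events(events, itemset)
```

The restricted baskets are then the input of the association rule extraction for the inter-event scenarios.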
4.2.1 One-Rule-Based Approach
This subsection provides two examples. The first example is from the dictionary “Armed Conflicts and Attacks”, and the second is from “Arts, Culture and Entertainment”. For the first experiments, the following sizes of subitemsets of the what-if scenario were intuitively chosen: $L = 2$, $M = 1$, and $R = 2$. The extracted association rules corresponding to the inter-event scenario definition are presented in table 2.
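Table 2: Association rules support - example 1.
|          | $Base_L$ | $WI_M$ | $RHS_R$ |
|----------|----------|--------|---------|
| itemsets | ‘news’, ‘kill’ | ‘car’ | ‘least’, ‘bomb’ |
| support  | 0.6712 | 0.4521 | 0.4246 |

Table 3: Association rules support - example 2.
|          | $Base_L$ | $WI_M$ | $RHS_R$ |
|----------|----------|--------|---------|
| itemsets | ‘news’, ‘kill’ | ‘car’ | ‘least’, ‘bomb’ |
| support  | 0.6712 | 0.4521 | 0.4246 |
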
It is necessary to identify events from the extracted rules in order to better understand the results. We found which detected events contain the $WI_M$ subitemset. The detected events are listed in table 4. More detailed and readable descriptions of the real-world situations are shown in table 5. The descriptions of the events are taken from the open-source “Wikipedia Current Events Portal” (Wikipedia, 2022).
Table 4: Detected events topics.
|       | Detected Event A | Detected Event B |
|-------|------------------|------------------|
| Ex. 1 | ‘beirut’, ‘car’, ‘eight’, ‘central’, ‘explod’ | ‘kill’, ‘bomb’, ‘syria’, ‘turkey’, ‘car’, ‘attack’, ‘suicid’, ‘central’, ‘wound’ |
| Ex. 2 | ‘pope’, ‘coptic’, ‘egypt’, ‘christian’, ‘bishop’, ‘chosen’, ‘select’ | ‘egypt’, ‘christian’, ‘chosen’, ‘blindfold’, ‘crystal’, ‘copt’ |

Table 5: Real-world events description.
|       | Real-world Event A | Real-world Event B |
|-------|--------------------|--------------------|
| Ex. 1 | “A car bomb explodes at Sassine Square in the Lebanese capital of Beirut, killing at least eight people ...” | “A car bomb detonates in Semdinli, Turkey, killing 1 and injuring 12...”; “A suicide car bomber detonates a bomb in the Hama province of Syria ...” |
| Ex. 2 | “Bishop Richard Williamson is expelled from the Society of Saint Pius X (SSPX) ...” | “A shortlist of successors to the Coptic Pope is drawn up; a blindfolded child is then expected to pick from a list of three. (BBC)” |

From the obtained results, we could derive the following scenarios:
• “if ‘car’ occurs together with the ‘news’, ‘kill’ then ‘bomb’, ‘least’ are also likely to occur”. If there is news about a killing and a car is additionally mentioned, then we can suggest that a car bomb exploded.
• “if ‘choos’ occurs together with the ‘pope’, ‘egypt’ then ‘christian’, ‘coptic’ are also likely to occur”. Even though the real-world events are not strictly connected, it is interesting that Event A was about the expulsion of a bishop, while Event B was about the selection of a successor to the Coptic Pope. The events are connected by the topic ‘choos’, which is the stemmed form of the word “choose”.
4.2.2 Two-Rules-Transitivity-Based Approach
In this subsection we present an example of a two-rules-transitivity-based inter-event scenario generation. The example is presented in table 6.
Table 6: Two-rules-transitivity-based approach example.
|        | LHS | RHS |
|--------|-----|-----|
| Rule 1 | ‘least’, ‘syria’ | ‘news’, ‘car’, ‘kill’, ‘bomb’ |
| Rule 2 | ‘news’, ‘car’, ‘kill’, ‘bomb’ | ‘soldier’, ‘sever’ |

Rule 1, from table 6, describes an event with the description: “A car bomb explodes at Sassine Square in the Lebanese capital of Beirut, killing at least eight people and wounding up to 78 others. (BBC)”. Rule 2 describes an event with the description: “Syrian civil war: A Jordanian soldier dies during a gunfight between Jordanian troops and Islamic militants attempting to cross the border into Syria. (CTV News)”.
From these rules, the following counterfactual scenario was generated: “if ‘news’, ‘car’, ‘kill’, ‘bomb’ occurs together with the ‘least’, ‘syria’ then ‘soldier’, ‘sever’ are also likely to occur”.
5 CONCLUSION
In this paper, we proposed methods for time-window
constrained topic-based what-if scenario generation,
in the counterfactual perspective, founded on market-
basket analysis and association rules extraction. Def-
initions of counterfactual scenarios, both intra-event
and inter-event, are given. Preliminary results illustrate the extraction of coherent causal effects, but require more analysis and controlled experiments. Future work will apply the methods to events detected using other ToI dictionaries. We will also evaluate the proposed methods with respect to the usefulness of the extracted association rules and scenarios.
ACKNOWLEDGEMENTS
This work was supported by the grant of the Min-
istry of Science and Higher Education of the Republic
of Kazakhstan, project BR10965311 “Development
of intelligent information and telecommunication sys-
tems for urban infrastructure: transport, ecology, en-
ergy, and data-analytics in the Smart City concept”.
REFERENCES
Lepenioti, K., Bousdekis, A., Apostolou, D., and Mentzas,
G. (2020). Prescriptive analytics: Literature review
and research challenges. International Journal of In-
formation Management, 50:57–70.
McMinn, A. J., Moshfeghi, Y., and Jose, J. M. (2013).
Building a large-scale corpus for evaluating event de-
tection on twitter. In Proceedings of the 22nd ACM in-
ternational conference on Information & Knowledge
Management, pages 409–418.
Menzies, P. and Beebee, H. (2001). Counterfactual theories
of causation.
Morabia, K., Murthy, N. L. B., Malapati, A., and Samant,
S. (2019). Sedtwik: segmentation-based event detec-
tion from tweets using wikipedia. In Proceedings of
the 2019 Conference of the North American Chapter
of the Association for Computational Linguistics: Stu-
dent Research Workshop, pages 77–85.
Mussina, A., Aubakirov, S., and Trigo, P. (2022).
Parametrized event analysis from social networks.
Scientific Journal of Astana IT University, 10(10).
NLTK (2022). NLTK :: sample usage for stem. https://www.nltk.org/howto/stem.html. [Online; accessed
09-November-2022].
Prosperi, M., Guo, Y., Sperrin, M., Koopman, J. S., Min,
J. S., He, X., Rich, S., Wang, M., Buchan, I. E., and
Bian, J. (2020). Causal inference and counterfactual
prediction in machine learning for actionable health-
care. Nature Machine Intelligence, 2(7):369–375.
Shahaf, D., Guestrin, C., Horvitz, E., and Leskovec, J.
(2015). Information cartography. Communications of
the ACM, 58(11):62–73.
Wikipedia (2022). Wikipedia Current Events Portal. https://en.wikipedia.org/wiki/Portal:Current_events/. [Online;
accessed 15-December-2022].
Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., and Zhang,
C. (2020). Connecting the dots: Multivariate time se-
ries forecasting with graph neural networks. In Pro-
ceedings of the 26th ACM SIGKDD international con-
ference on knowledge discovery & data mining, pages
753–763.
Zou, H., Li, B., Han, J., Chen, S., Ding, X., and Cui,
P. (2022). Counterfactual prediction for outcome-
oriented treatments. In International Conference on
Machine Learning, pages 27693–27706. PMLR.