From What-If Scenarios to Event Associations: A Novel Approach to Social Media Event Analysis

Aigerim Mussina (1, a), Sanzhar Aubakirov (1, b), Paulo Trigo (2, c) and Madina Mansurova (1, d)

(1) Al-Farabi Kazakh National University, Almaty, Kazakhstan

(2) ISEL - Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal

Keywords: Events Association, Counterfactual Analysis, Association Rules, What-If Analysis.

Abstract: This paper introduces a novel approach to event prediction in social media by applying association rules to generate counterfactual what-if scenarios. Using the Events2012 dataset as a foundation, we developed the EventsAssociation2012 dataset to systematically identify patterns within event sequences and assess the predictive power of what-if scenarios. Employing a Large Language Model (LLM) to generate event embeddings, similarity scores, and conditional probabilities, we mapped real-world scenarios to intra-event and inter-event associations, thereby creating a robust framework for understanding the interconnected nature of social media discussions. Our methodology leverages association rule mining to model causal relationships between events, enabling predictions of plausible future outcomes based on hypothetical scenarios. The results demonstrate the potential for applying what-if scenarios to new event datasets, revealing challenges and opportunities for refining this approach. The study further discusses areas for improvement, such as expanding the identification of intra-event scenarios, exploring multi-event associations, and enhancing topic embedding techniques. Overall, this work advances counterfactual analysis in event prediction, providing a more accurate and comprehensive method for modeling event associations in the dynamic landscape of social media.

1 INTRODUCTION

In recent years, Online Social Networks (OSNs) have become central to information exchange, allowing millions of users worldwide to share their experiences, opinions, and perceptions in real time. Platforms such as Twitter, Facebook, and Telegram offer a vast repository of user-generated content, making them invaluable sources for capturing collective social dynamics. Studying user messages on OSNs has provided insights into societal trends, public opinions, and information propagation. One emerging research area is the exploration of associations between significant events discussed on these platforms. Identifying and analyzing the most discussed events can uncover underlying patterns and causal relationships, contributing to a deeper understanding of social phenomena (Tangcharoensathien et al., 2020; Valdez et al., 2020).

(a) https://orcid.org/0000-0002-7043-0810
(b) https://orcid.org/0000-0002-8416-527X
(c) https://orcid.org/0000-0001-5850-615X
(d) https://orcid.org/0000-0002-9680-2758

Events within OSNs, as derived from public discussions, represent the main points of collective attention. These events could range from political events, natural disasters, and cultural movements to technological innovations (Kaliyar et al., 2021; Daud et al., 2020). They are often interconnected, forming chains of discussions that could reflect sequences of causal effects in real-world scenarios. For example, a political debate may trigger widespread online discourse, which in turn can influence public opinion, shape policy decisions, and lead to further events in a domino effect (Islam et al., 2020). Understanding these associations provides a way to decode public sentiment and an opportunity to forecast future developments based on current discussions.

Event association detection within OSNs is a relatively new research area, with a limited number of studies exploring the nuances of how events discussed on social media platforms are interconnected. Most existing research focuses on detecting events based on the intensity and frequency of discussions, for example through keyword-based extraction, topic modeling, or sentiment analysis (Ali et al., 2021). Few studies have explored the more profound dimension of connecting events, in which one event may potentially trigger or affect another. Studies in this domain often rely on time-series analysis or basic correlation techniques, which might not fully capture the complex web of event associations as they unfold in real-world contexts (Daud et al., 2020; Bian et al., 2020). For example, link prediction (Daud et al., 2020) provides insights into associations but does not always reflect causal relationships. Methods like rumor detection using graph convolutional networks (Bian et al., 2020) explore information propagation but do not establish event-to-event causality. Our work addresses this gap by introducing a more sophisticated approach to identifying these associations through counterfactual analysis.

Mussina, A., Aubakirov, S., Trigo, P. and Mansurova, M.
From What-If Scenarios to Event Associations: A Novel Approach to Social Media Event Analysis.
DOI: 10.5220/0013645900003967
In Proceedings of the 14th International Conference on Data Science, Technology and Applications (DATA 2025), pages 203-212
ISBN: 978-989-758-758-0; ISSN: 2184-285X

Counterfactual analysis traditionally finds its applications in business and healthcare. For example, in business, counterfactuals help assess the impact of strategic decisions (Eabrasu, 2008), while in healthcare, they evaluate the potential outcomes of various treatment options (Shalit et al., 2017). One work suggested using counterfactual models to assess the impact of Twitter misinformation on future events (Zhang et al., 2022). That work focuses on capturing the temporal dynamics of information dissemination and its potential influence on public discourse. By employing a neural temporal point process model, the authors estimated the causal effects of misinformation propagation on social networks, demonstrating the value of counterfactual reasoning in understanding the broader consequences of false information. Counterfactual analysis has also been used to explore the impact of social media campaigns on user behavior (Yu et al., 2022). The researchers developed a causal impact model to assess how the diffusion of social media content influences user actions, such as participation in social movements or purchasing behavior. Their work highlights how counterfactual reasoning can offer a deeper understanding of the causal mechanisms behind information spread on OSNs. Another study applied counterfactual reasoning to detecting rumors on social networks, offering insights into how events could unfold differently with changes in key events (Zhang et al., 2023). It introduced diverse counterfactual evidence to model the spread of rumors, facilitating the exploration of alternative scenarios in which different events influence the rumor's propagation. This research underscores the potential of counterfactual analysis in understanding the dynamics of information spread between events on social media.

While these works have contributed significantly to understanding event associations and the effects of social media content, gaps remain in the application of counterfactual analysis specifically for event association detection. Previous studies have focused on individual aspects like misinformation, user behavior, narrative structure, and rumor detection. However, a comprehensive exploration of how social media-derived events can generate “what-if” scenarios using association rules to forecast future events is still in its infancy. Our research aims to address this gap by leveraging counterfactual analysis to investigate the causal interconnections between events within social networks.

In our previous work, we introduced the concept of counterfactual “what-if” scenarios for understanding event associations (Mussina et al., 2023). However, this earlier work lacked a formal evaluation of these scenarios. The present study seeks to build upon that foundation by thoroughly evaluating the “what-if” scenarios and demonstrating their practical application in detecting event associations on OSNs. Our central hypothesis is: “Social media-derived event detections can generate what-if scenarios using association rules from event topics, which can then be applied to assess their applicability for future events.” This novel application of counterfactual reasoning to event associations opens up new possibilities for understanding the flow of information and influence within digital ecosystems.

2 MATERIALS AND METHODS

In this section, we introduce our study's formal definitions and methodologies, detailing how we created a dataset for event association detection, generated and evaluated “what-if” scenarios, and applied the scenarios to a new dataset.

2.1 Definitions

We first describe the additional data used in event detection.

Definition 1. A topic of interest, ToI, is defined as a dictionary of topics, where each topic has a thematic coefficient value. A topic, t, is defined as an N-gram, a sequence of N words, related to the interest of study. The strength of this relation is expressed by the thematic coefficient of the topic, defined by the following formula:

$$M_t = \log \frac{N_{target}}{N_{common}} > 0 \quad (1)$$

where $N_{target}$ is the frequency of topic t in the target corpora, $N_{common}$ is the frequency of topic t in the common corpora, and $M_t$ is the thematic coefficient.

DATA 2025 - 14th International Conference on Data Science, Technology and Applications


In this work, we examined events of different categories. ToI generation is based on the idea that words specific to a particular event category might appear in texts of other categories, but they are most frequently found within their own category (Mussina et al., 2022). During ToI generation, the event tweets of each category were used as the target corpora, while the tweets of the other categories formed the common corpora. If the thematic coefficient of a topic was greater than 0, the topic was added to the ToI dictionary.
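The ToI construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `thematic_coefficients` is hypothetical, "frequency" is taken to mean a raw occurrence count, and a topic that never appears in the common corpora is given a count of 1 to avoid division by zero (the source does not say how this case is handled).

```python
import math
from collections import Counter


def thematic_coefficients(target_topics, common_topics):
    """Return {topic: coefficient} for topics with log(N_target / N_common) > 0."""
    target_freq = Counter(target_topics)
    common_freq = Counter(common_topics)
    toi = {}
    for topic, n_target in target_freq.items():
        # Assumption: a topic missing from the common corpora is given a count
        # of 1 to avoid division by zero; the source does not cover this case.
        n_common = common_freq.get(topic, 0) or 1
        coefficient = math.log(n_target / n_common)
        if coefficient > 0:  # keep only topics specific to the target category
            toi[topic] = coefficient
    return toi


# Hypothetical N-gram occurrences, for illustration only.
target = ["hurricane"] * 6 + ["news"] * 2
common = ["hurricane"] * 2 + ["news"] * 4
print(thematic_coefficients(target, common))
```

In this toy input, "hurricane" is three times more frequent in the target corpora than in the common corpora and is kept, while "news" is more frequent in the common corpora and is dropped.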

Definition 2. An event E(w, T) is a subset of topics from a predefined set of Topics of Interest (ToI), observed within a time window w. Formally,

$$E(w, T) = \{t_1, \ldots, t_n\} \subseteq T \quad (2)$$

where:
• T is a ToI dictionary related to the category
• w is the time window during which the event occurs
• $t_i$ represents individual topics from T, and $\{t_1, \ldots, t_n\}$ is the subset of topics discussed within w

An additional characteristic of an event is its newsworthiness, a value calculated during the event detection process. This value indicates the significance of the topics within the event, helping to prioritize or identify key events within the online discussions.

Definition 3. Topic space is a subset of an event's topic-set that causes one event to become another.

Counterfactual analysis aims to explore cause-and-effect relations by searching for statements such as “if A occurred, then B is also likely to occur”. Considering A and B as events and Definition 2, we may rephrase such a statement as “if, within a time window, w, topics $\{t^A_1, \ldots, t^A_n\}$ occurred, then topics $\{t^B_1, \ldots, t^B_m\}$ are also likely to occur in w”, where $t^E_x$ means that topic x was addressed in event E. Following the what-if perspective, we suppose that A's topic-set, $\{t^A_1, \ldots, t^A_n\}$, includes a “topic space” that has been intervened upon such that the discussion moves to B's topic-set, $\{t^B_1, \ldots, t^B_m\}$.

The following definitions are formulated based on the market basket analysis approach, utilizing its fundamental concepts, namely: basket, item, itemset, left-hand side (LHS), right-hand side (RHS), support, and confidence. The itemset $I = \{t_1, \ldots, t_k\}$ is a set of items of length k, where each $t_i$ is a topic (item). An association rule consists of subitemsets in the form $LHS \Rightarrow RHS$, where $LHS, RHS \subset I$ and $LHS \cap RHS = \emptyset$. Support and confidence are the metrics of an association rule. The support is the joint probability that the items in LHS and RHS occur together. The confidence is a conditional probability of the form $P(RHS \mid LHS)$.
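Support and confidence as defined above can be illustrated with a short Python sketch. It assumes that each basket is the topic set of one detected event; the function names and the toy baskets are hypothetical and chosen only for illustration.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Estimate P(RHS | LHS) as support(LHS and RHS together) / support(LHS)."""
    lhs_support = support(lhs, baskets)
    if lhs_support == 0:
        return 0.0  # the rule never fires, so no confidence can be estimated
    return support(set(lhs) | set(rhs), baskets) / lhs_support


# Hypothetical baskets: each one is the topic set of a detected event.
baskets = [
    {"hurricane", "damage", "news"},
    {"hurricane", "damage"},
    {"hurricane", "news"},
    {"election"},
]
print(support({"hurricane", "damage"}, baskets))       # 2 of 4 baskets
print(confidence({"hurricane"}, {"damage"}, baskets))  # 2 of the 3 hurricane baskets
```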

Definition 4. A what-if scenario is generated from an itemset $I_k$ as an association rule, $LHS \Rightarrow RHS$, that takes the form $Base_L \cup WI_M \Rightarrow RHS_R$, where $Base_L \cup WI_M = LHS$ and $Base_L \cap WI_M = \emptyset$, also $LHS \subset I_k$ and $RHS_R \subset I_k$; each L, M, and R subscript is the size of the respective subitemset.

This way, the counterfactual scenario perspective can be read as “if $WI_M$ occurs together with the $Base_L$, then $RHS_R$ is also likely to occur.”
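To make the structure of Definition 4 concrete, the following Python sketch enumerates every way of splitting an itemset into disjoint Base, WI, and RHS parts of sizes L, M, and R. It is a hypothetical illustration of the combinatorial form only: the function name `what_if_scenarios` is an assumption, and the sketch does not apply any support or confidence filtering.

```python
from itertools import combinations


def what_if_scenarios(itemset, base_size, wi_size, rhs_size):
    """Yield (Base, WI, RHS) triples of pairwise disjoint subitemsets."""
    items = sorted(itemset)
    for base in combinations(items, base_size):
        rest = [t for t in items if t not in base]
        for wi in combinations(rest, wi_size):
            remaining = [t for t in rest if t not in wi]
            for rhs in combinations(remaining, rhs_size):
                yield set(base), set(wi), set(rhs)


# Hypothetical itemset of five topics, split with L = 2, M = 1, R = 2.
topics = {"aftermath", "hurrican", "news", "damag", "superstorm"}
scenarios = list(what_if_scenarios(topics, base_size=2, wi_size=1, rhs_size=2))
print(len(scenarios))
```

Each yielded triple reads as "if WI occurs together with Base, then RHS is also likely to occur"; how candidates are ranked or filtered is left to the support-based conditions of the later definitions.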

Definition 5. An intra-event scenario is taken from a what-if scenario where support($WI_M$) = support($Base_L$) = support($RHS_R$).

Definition 6. A sub-event is defined within an intra-event scenario, where multiple events are clustered based on shared topics. Within such a cluster, there exists a center event, which is the event with the highest newsworthiness, indicating its relative importance within the cluster. The other events in the cluster, which are associated with but less significant than the center event, are referred to as sub-events. These sub-events represent smaller occurrences that contribute to the overall context of the center event, highlighting the hierarchical nature of event relationships within social media discussions.

Definition 7. A one-rule-based inter-event scenario is taken from a what-if scenario where:
• support($WI_M$) is the minimum over all subitemsets of size M,
• support($Base_L$) is the maximum over all subitemsets of size L,
• support($RHS_R$) is the minimum over all subitemsets of size R.

Definition 8. A two-rules-transitivity-based inter-event scenario is taken from two association rules of the form $Base_L \Rightarrow WI_M$ and $WI_M \Rightarrow RHS_R$, where $M > L$, $M > R$, $Base_L, WI_M \subset I_i$, and $WI_M, RHS_R \subset I_j$; $I_i$ and $I_j$ are different itemsets. This scenario generation is based on the antimonotone property of association rule confidence.


Figure 1: EventsAssociation2012 generation schema.

2.2 EventAssociation Dataset Generation

There is a gap in available datasets with event associations suitable for evaluating what-if scenarios, so we decided to create one, see Figure 1. As the foundation for our new dataset, we used the labeled events from the Events2012 (SEDTWik) dataset (McMinn et al., 2013), which has 504 events. Each event in the SEDTWik dataset is described by an id, a description, and a category. Each description is a one-sentence summary of the topic discussed, and the events are categorized into eight distinct types, including Sport, Politics, Business, and Disaster.

We used a Large Language Model (LLM) to identify associations between these events and detect similarities. LLMs can understand context, semantics, and the subtle nuances in natural language, making them well-suited for comparing event descriptions. We prompted the LLM to compare events within each category and provide the following:
• Similarity: a percentage indicating the events' similarity based on their textual descriptions.
• Conditional Probability: the likelihood (in percentage) that one event stems from another and vice versa.
• Explanation: a textual description elaborating on the relationship between the events.

The LLM performs these tasks by leveraging its vast training on diverse datasets containing patterns in language, relationships, and causality. When asked to compare events, the LLM analyzes the semantic content of the event descriptions to calculate similarity and infer potential causal connections. It estimates the conditional probabilities by considering linguistic cues and contextual information that suggest the likelihood of one event leading to another. The LLM generates explanations using its contextual understanding to provide a coherent rationale for the similarity and probability assessments.
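The within-category comparison can be pictured as enumerating event pairs before each pair is sent to the LLM. The sketch below is a hypothetical illustration: it assumes events are dictionaries with the `id`, `description`, and `category` fields mentioned above, and that each unordered pair is compared once with both conditional probabilities requested in the same prompt. The prompt itself and the LLM call are not shown because the source does not specify them.

```python
from collections import defaultdict
from itertools import combinations


def within_category_pairs(events):
    """Yield unordered pairs of events that share the same category."""
    by_category = defaultdict(list)
    for event in events:
        by_category[event["category"]].append(event)
    for group in by_category.values():
        # Assumption: each pair is compared once; P(A|B) and P(B|A) are
        # both requested from the LLM for that single pair.
        yield from combinations(group, 2)


# Hypothetical events, for illustration only.
events = [
    {"id": 1, "description": "Event one", "category": "Sports"},
    {"id": 2, "description": "Event two", "category": "Sports"},
    {"id": 3, "description": "Event three", "category": "Sports"},
    {"id": 4, "description": "Event four", "category": "Disasters & Accidents"},
    {"id": 5, "description": "Event five", "category": "Disasters & Accidents"},
]
for a, b in within_category_pairs(events):
    print(a["id"], b["id"], a["category"])
```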

After obtaining 24,719 event comparisons through this process, we developed definitions for intra-event and inter-event associations.

Definition 9. Intra-event Association: This describes a single event along with its sub-events. For an association to qualify as intra-event, it must have high similarity, with both conditional probabilities P(A|B) > 80% and P(B|A) > 80%, and the absolute difference between these probabilities |P(A|B) − P(B|A)| < 1%. The requirement for high conditional probability ensures that even if events are similar within a category, they genuinely describe the same event rather than two unrelated occurrences. When events exhibit high similarity but low conditional probabilities, they do not indicate causality. Therefore, intra-event associations represent one event and its sub-events.

Definition 10. Inter-event Association: This type of association occurs between two distinct events. It is characterized by high similarity between the events and a significant difference in conditional probability, formulated as P(A|B) > sim or P(B|A) > sim, where sim is the similarity between the events, and P(A|B) − P(B|A) > n, where n = 20%. The variation in conditional probability indicates the direction and possible causal link between two events.

Associations were also validated against the events' time windows: an event cannot cause another event that occurred before it. By using an LLM to extract these associations, we can efficiently process and evaluate the complex relationships between events, providing a rich dataset for what-if scenario analysis. Our dataset is called EventsAssociation2012.
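The thresholds of Definitions 9 and 10 and the time-window check can be sketched as a small classifier. This is an illustrative reading, not the authors' code. Three assumptions are made: all values are percentages between 0 and 100; the "high similarity" requirement is not checked because no threshold for it is given; and the inter-event gap is taken as an absolute difference so that either direction (A leading to B or B leading to A) qualifies.

```python
from datetime import date


def classify_association(sim, p_a_given_b, p_b_given_a, gap_threshold=20.0):
    """Return 'intra-event', 'inter-event', or None for one LLM comparison."""
    # Assumption: absolute gap, so the rule is symmetric in A and B.
    gap = abs(p_a_given_b - p_b_given_a)
    if p_a_given_b > 80 and p_b_given_a > 80 and gap < 1:
        return "intra-event"
    if (p_a_given_b > sim or p_b_given_a > sim) and gap > gap_threshold:
        return "inter-event"
    return None


def is_temporally_valid(cause_start, effect_start):
    """A candidate cause must not start after its candidate effect."""
    return cause_start <= effect_start


# Values taken from the inter-event example reported in Table 2.
print(classify_association(sim=65, p_a_given_b=30, p_b_given_a=70))
# Hypothetical time windows, for illustration only.
print(is_temporally_valid(date(2012, 10, 29), date(2012, 11, 2)))
```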

2.3 Counterfactual What-If Scenarios Evaluation

The purpose of this evaluation is to determine whether the generated what-if scenarios correspond to the event associations within the EventsAssociation2012 dataset. To achieve this, we employ the following multi-step process:

• Conversion of Text Data to Vectors
First, we convert the text data from real event descriptions and detected event topics into numerical vectors using text embeddings. For this purpose, we use the text-embedding-3-small model, accessed through the OpenAI client. This model processes each input text to produce a dense vector representation in a high-dimensional space. These vectors capture the semantic essence of the text, allowing for similarity comparisons. The text-embedding-3-small model uses a neural network trained on diverse textual data, creating embeddings that reflect both the syntactic and contextual features of the input text. We then use the resulting vectors to measure similarity through cosine similarity.

• Mapping Detected Events to Real Events
In this step, we map each detected event from the Events2012 dataset to its corresponding real event in the same dataset. Let $R = \{r_1, r_2, \ldots, r_m\}$ be the set of vectors representing the real event descriptions from Events2012. Let $D = \{d_1, d_2, \ldots, d_k\}$ be the set of vectors representing the topics of the detected events. We determine the mapping by comparing each detected event vector $d_i$ to all real event vectors $r_j$, see Equation 3. A detected event is considered a match to the real event with which it has the maximum similarity, measured using the cosine similarity between the vectors. Cosine similarity takes a value between -1 and 1, where 1 indicates that the vectors point in the same direction. This approach helps identify which real event most closely aligns with each detected event based on the discussion topics.

$$event_i = \max_{j=1}^{m} \frac{d_i \cdot r_j}{|d_i| \cdot |r_j|} \quad (3)$$

• Matching What-if Scenarios with Detected Events
A what-if scenario consists of three components: Base, WI (What-If), and RHS (Right-Hand Side). For this evaluation, we combine the Base and WI into a single component referred to as the LHS (Left-Hand Side). The process for matching each side of the what-if scenario to the detected events is as follows. Let $LHS = \{lhs_1, lhs_2, \ldots, lhs_p\}$ be the set of vectors representing the LHS (Base + WI) of the what-if scenarios. Let $RHS = \{rhs_1, rhs_2, \ldots, rhs_p\}$ be the set of vectors representing the RHS of the what-if scenarios. Let $D = \{d_1, d_2, \ldots, d_k\}$ be the set of vectors representing the detected event topics. For each what-if scenario, we compare the vectors of the LHS and RHS with the vectors of the detected events. Each side of a scenario is mapped to the detected event in D with which its vector has the highest cosine similarity. Let the event with the highest similarity between the LHS of the nth what-if scenario and D be named $Event_{LHS}$, see Equation 4, and the event with the highest similarity between the RHS of the nth what-if scenario and D be named $Event_{RHS}$, see Equation 5. This process enables us to identify which detected events best match the hypothetical conditions outlined in the what-if scenarios.

$$Event_{LHS}(lhs_n) = \max_{j=1}^{k} \frac{lhs_n \cdot d_j}{|lhs_n| \cdot |d_j|} \quad (4)$$

$$Event_{RHS}(rhs_n) = \max_{j=1}^{k} \frac{rhs_n \cdot d_j}{|rhs_n| \cdot |d_j|} \quad (5)$$

• Calculating Accuracy of Matched Event Associations
The final step is to evaluate how many of the mapped what-if scenarios correspond to the associations in the EventsAssociation2012 dataset. This involves checking if the matched events for both the LHS and RHS of each what-if scenario align with an entry in the EventsAssociation2012 dataset. Accuracy is calculated as the ratio of matched scenarios to the total number of evaluated scenarios. Let $N_{matched}$ represent the number of what-if scenarios that successfully match an entry in EventsAssociation2012, and $N_{total}$ represent the total number of evaluated scenarios. The accuracy is given by Equation 6.

$$Accuracy = \frac{N_{matched}}{N_{total}} \times 100\% \quad (6)$$

This accuracy metric provides an indication of how effectively the what-if scenario generation and detection process mirrors real-world event associations.
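The matching rule used in the mapping steps above (pick the candidate vector with the highest cosine similarity) can be sketched in plain Python as follows. The function names are hypothetical, the vectors are toy values rather than real text-embedding-3-small outputs, and all vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(query, candidates):
    """Return (index, similarity) of the candidate most similar to `query`."""
    scores = [cosine_similarity(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy 2-dimensional vectors standing in for embeddings.
detected_events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lhs_vector = [3.0, 3.0]
index, score = best_match(lhs_vector, detected_events)
print(index, round(score, 3))
```

The same helper can serve for mapping detected events to real events and for mapping the LHS and RHS of a scenario to detected events, since all three steps apply the same argmax over cosine similarity.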

We refer to scenarios that match an entry in EventsAssociation2012 as real-world what-if scenarios. In the next section, we apply these real-world scenarios to a dataset of newly detected events.
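The accuracy of Equation 6 can be sketched as a membership check of each scenario's matched event pair against the known associations. This is a minimal illustration under assumptions: each scenario is reduced to a hypothetical `(lhs_event_id, rhs_event_id)` tuple, associations are treated as ordered pairs, and the identifiers below are made up.

```python
def scenario_accuracy(matched_pairs, known_associations):
    """Percentage of scenarios whose (LHS event, RHS event) pair is a known association."""
    if not matched_pairs:
        return 0.0
    known = set(known_associations)
    n_matched = sum(pair in known for pair in matched_pairs)
    return 100.0 * n_matched / len(matched_pairs)


# Hypothetical event ids, for illustration only.
scenario_pairs = [(1, 2), (3, 4), (1, 2), (5, 6)]
associations = {(1, 2), (7, 8)}
print(scenario_accuracy(scenario_pairs, associations))
```

Scenarios whose pair appears in the association set are the ones the text calls real-world what-if scenarios.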

2.4 Event Association Detection via Real-World What-If Scenarios on a New Dataset

Our hypothesis is that “Social media-derived event detections can generate what-if scenarios using association rules from event topics, which can then be applied to assess their applicability for future events.” To test this hypothesis, we apply real-world what-if scenarios to a new dataset of detected events, following the algorithm outlined below.

• Matching Events with Scenario Parts
Each what-if scenario has two main components: the Left-Hand Side (LHS) and the Right-Hand Side (RHS). The LHS represents the initial event, while the RHS represents the subsequent event. To match these scenarios to the new set of detected events, we first compute embeddings of the detected event topics and the scenario topic sets. We then calculate the cosine similarity between the topic embeddings of the detected events and the topic sets of the scenario components. The detected events with the maximum similarity are selected as the matched events in the scenario. This process results in a mapping where a detected event, $eventA_{matched}$, corresponds to the LHS of the scenario, and another detected event, $eventB_{matched}$, corresponds to the RHS.

• Identifying the WI Part in the Matched Event
With the matched pair $eventA_{matched} \Rightarrow eventB_{matched}$ established according to the scenario's $LHS \Rightarrow RHS$ relationship, the next step is to identify the What-If (WI) part within $eventA_{matched}$. The WI part represents a hypothetical or counterfactual condition within the initial event that could lead to the subsequent event. To identify the WI part, we compare the embeddings of various combinations of detected event topics in $eventA_{matched}$ with the embedding of the WI component of the scenario. The number of topics in each combination is based on the length of the WI in the original what-if scenario. The combination of topics that exhibits the highest similarity to the WI embedding is defined as the topic space for the counterfactual WI part. This topic space represents the set of conditions within $eventA_{matched}$ that could potentially trigger $eventB_{matched}$, thereby validating the applicability of the what-if scenario to newly detected events.
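The search for the WI topic space described in the second step can be sketched as an exhaustive scan over topic combinations. The code below is a hypothetical illustration: `embed` is an assumed callable that maps a text to a vector (in the study this role is played by text-embedding-3-small), joining the topics of a combination with spaces before embedding is an assumption, and the toy bag-of-words embedding exists only to make the example self-contained.

```python
import math
from itertools import combinations


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_wi_topic_space(event_topics, wi_vector, wi_size, embed):
    """Return the wi_size-topic combination most similar to the scenario's WI vector."""
    best_combo, best_score = None, -math.inf
    for combo in combinations(event_topics, wi_size):
        # Assumption: a combination is embedded as one space-joined string.
        score = _cosine(embed(" ".join(combo)), wi_vector)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


# Toy bag-of-words embedding over a hypothetical vocabulary.
VOCAB = ["flood", "rescue", "river", "road"]


def toy_embed(text):
    words = text.split()
    return [float(words.count(w)) for w in VOCAB]


combo, score = find_wi_topic_space(
    ["flood", "rescue", "river"], toy_embed("rescue"), wi_size=1, embed=toy_embed
)
print(combo, score)
```

The number of combinations grows quickly with the size of the topic set, so an exhaustive scan like this one is only practical for the short topic sets and small WI sizes considered here.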

By following this algorithm, we can effectively apply real what-if scenarios to newly detected events, enabling us to explore the causal and associative dynamics of social media-derived events. This process allows for the practical evaluation of our hypothesis, demonstrating whether event-topic associations detected in the past can be used to predict and assess potential future events.

3 RESULTS

In this section, we present the findings of our study, encompassing three key aspects: the development of the EventsAssociation2012 dataset, the evaluation of what-if scenarios using this dataset, and the application of these scenarios to detect event associations on a new set of social media-derived events. The results from each stage contribute to a comprehensive understanding of how event-topic associations in social media can be modeled, evaluated, and used for future event prediction.

3.1 Dataset for Event Association Detection

First, we describe the characteristics of the EventsAssociation2012 dataset, which was constructed by applying an LLM to the Events2012 dataset to generate event-to-event similarity scores, conditional probabilities, and textual explanations. This dataset, which contains both intra-event and inter-event associations, serves as the foundation for evaluating our what-if scenarios.

A pairwise comparison of the 504 events was conducted within each respective category of the Events2012 dataset. This approach ensured that events in the “Sports” category, for example, were only compared to other events within the same category. The Events2012 dataset consists of eight categories: “Armed Conflicts & Attacks,” “Arts, Culture & Entertainment,” “Business & Economy,” “Disasters & Accidents,” “Law, Politics & Scandals,” “Miscellaneous,” “Science & Technology,” and “Sports.” The resulting EventsAssociation2012 dataset is summarized in Table 1.

Table 1: EventsAssociation2012 information.

                         Event pairings   Intra-event associations   Inter-event associations
Number of associations   24,719           14                         16

An example of an inter-event association is presented in Table 2.

3.2 What-if Scenarios Evaluation

Next, we detail the evaluation of what-if scenarios, which involved matching these scenarios with the entries in EventsAssociation2012. We outline the criteria used for successful matching and assess the accuracy of these scenarios, providing insight into their effectiveness in capturing real-world event associations.

Since what-if scenarios are constructed from detected events, we first needed to match real events to detected events. Following the steps described in Section 2.3, we mapped detected events to real events from the Events2012 dataset. For example, for the real event description “Lebron and the Heat getting their NBA championship rings”, the corresponding detected event's topic set is 'heat', 'ring', 'ring ceremoni', 'miami heat', 'miami', 'championship'. Then, scenario components were mapped to detected events, yielding the evaluated counterfactual what-if scenarios, see Table 3.

Table 2: Inter-event association example.

Event A: “Hurricane Sandy in the Bahamas.”
Event B: “Tweets for Praying for people affected by the hurricane sandy.”
Similarity: 65%; Similarity reason: “Both describe reactions to Hurricane Sandy.”
P(A|B): 30%; P(A|B) reason: “People in Bahamas might be among those prayed for.”
P(B|A): 70%; P(B|A) reason: “Prayers likely include people affected in multiple regions, including Bahamas.”
Category: Disasters & Accidents
Association type: inter-event association

Table 3: What-if scenario example.

What-If scenario: aftermath, hurrican, news → damag, superstorm
Event_LHS: “Hurricane Sandy makes landfall near Atlantic City, New Jersey, with widespread flooding and at least 29 deaths in the Northeastern United States” discussed during 29.10.2012 - 31.10.2012.
Event_RHS: “Superstorm Sandy hits the east coast of the USA” discussed during 02.11.2012 - 02.11.2012.

During this experiment, we concentrated on detecting associations between two different events. For that purpose, we created scenarios according to Definitions 7 and 8. The results of the evaluation are presented in Table 4.

Table 4: What-if scenarios evaluation results.

                                     One-rule-based          Two-rules-based
                                     inter-event scenarios   inter-event scenarios
Parameters                           L = 2, M = 1, R = 2     L = 3, M = 2, R = 2
Number of scenarios                  6672                    2284
Number of intra-event associations   3 out of 14             0 out of 14
Number of inter-event associations   5 out of 16             1 out of 16

The analysis revealed that one-rule-based inter-event scenarios, with an accuracy of 26%, yield relatively better results than two-rules-based inter-event scenarios, with an accuracy of 3%. However, the number of resulting real-world what-if scenarios is small: out of 6,672 one-rule-based scenarios, only 8 of the 30 possible associations were matched as real-world what-if scenarios. We can also see that inter-event scenarios could identify intra-event associations. This outcome indicates a limitation in the current approach for what-if scenario identification, suggesting that improvements are necessary. Strategies for enhancing this process will be discussed in the following section. Despite this limitation, these 8 identified what-if scenarios can still be applied to the new dataset to uncover potential event associations.
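As a worked check of the reported accuracies, and assuming they are computed over the 30 associations of Table 1 (14 intra-event plus 16 inter-event) rather than over the number of generated scenarios, the figures in Table 4 give:

$$Accuracy_{one\text{-}rule} = \frac{3 + 5}{14 + 16} \times 100\% = \frac{8}{30} \times 100\% \approx 26.7\%$$

$$Accuracy_{two\text{-}rules} = \frac{0 + 1}{14 + 16} \times 100\% = \frac{1}{30} \times 100\% \approx 3.3\%$$

These values agree with the reported 26% and 3% if the percentages are truncated to whole numbers.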

3.3 Event Associations on a New Dataset

Lastly, we apply the validated real-world what-if scenarios to a new dataset of detected events to explore the potential of using social media-derived event detections to forecast future associations. By matching the scenarios' LHS (initial event) and RHS (subsequent event) with the newly detected events, we assess how well these scenarios can identify event associations, thereby validating our hypothesis about the predictive capabilities of what-if scenarios in the context of social media discussions.

Events were detected from Telegram messages between January 1, 2024, and May 31, 2024. A total of 389 events were identified, all belonging to the “Disasters” category. This category was selected for analysis because all real-world what-if scenarios fall under this specific category. The event detection was performed using the same methodology evaluated in (Mussina et al., 2022).

It is important to note that there are 8 real-world what-if scenarios, but this number represents unique event pairs. Generating what-if scenarios can result in multiple variations of topic sets in both the LHS and RHS components, even when they correspond to the same pair of events. In total, 128 scenarios were generated, describing these 8 unique event pairs.

From these 128 scenarios, we obtained 47 associations between events, of which 28 were unique, see Figure 2. Here, the same text-embedding-3-small model is used to generate vectors for the detected events, D. When this model is used with non-English languages, it can still generate embeddings that capture some semantic information, but the quality and representational accuracy might be lower than for English.

Figure 2: EventAssociationsKazTel2024 generation

schema.
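The matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that scenario components and detected events are already embedded as vectors, that a match means cosine similarity at or above a single threshold (the value 0.8 is hypothetical), and it omits any temporal ordering check between the initial and subsequent event. The function names are illustrative only.

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def match_scenarios(lhs_vecs, rhs_vecs, event_vecs, threshold=0.8):
    """Return (scenario, lhs_event, rhs_event) index triples.

    A triple is emitted when the scenario LHS matches one detected event and
    the scenario RHS matches a different one. The threshold is an assumed value.
    """
    lhs_sim = cosine_matrix(lhs_vecs, event_vecs)
    rhs_sim = cosine_matrix(rhs_vecs, event_vecs)
    matches = []
    for s in range(lhs_vecs.shape[0]):
        lhs_events = np.flatnonzero(lhs_sim[s] >= threshold)
        rhs_events = np.flatnonzero(rhs_sim[s] >= threshold)
        for i in lhs_events:
            for j in rhs_events:
                if i != j:
                    matches.append((s, int(i), int(j)))
    return matches

def unique_associations(matches):
    """Collapse matches from different scenario variants into unique event pairs."""
    return sorted({(i, j) for _, i, j in matches})

# Toy data: three detected events and two scenario variants of the same event pair.
events = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
lhs = np.array([[0.9, 0.1, 0.0], [1.0, 0.0, 0.0]])
rhs = np.array([[0.0, 1.0, 0.1], [0.0, 0.95, 0.05]])

found = match_scenarios(lhs, rhs, events)
pairs = unique_associations(found)
```

In this toy run both scenario variants point at the same event pair, so two matches collapse into one unique association. This mirrors how several generated scenarios can yield fewer unique associations.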

Next, for each association found, we identified the WI part in Event_LHS, as described in subsection 2.4, to explain why this event could lead to Event_RHS. The resulting dataset was named EventAssociationsKazTel2024. Some of the event associations, with the WI part highlighted, are presented in Table 5. The data crawled from Telegram is mostly written in Russian, so the words were translated from Russian to English.

It can be seen that when the topic “Department of Emergency Situations” appears in the topic set of an event about the disaster itself, the associated event concentrates more on rescue operations.

4 DISCUSSION

This section discusses the results obtained and out-

lines the potential improvements for future research.

Firstly, further testing is necessary to identify intra-event scenarios as outlined in Definition 4. Currently, the focus has primarily been on detecting inter-event scenarios, based on the assumption that this approach would yield a larger number of scenarios. Expanding the focus to intra-event scenarios will provide a more comprehensive understanding of event associations.

Table 5: What-if scenarios evaluation results.

# | LHS | WI | RHS
1 | floor, epicenter, cut, Department of Emergency Situations, depth, register earthquake | Department of Emergency Situations | elimination, get off, rescuer, descent, Department of Emergency Situations, slope, residential building, search work
2 | fire, fireman, observe, eliminate, cylinder, be installed, victim to suffer, ignition, private residential building, salon | fire, victim to suffer | disaster, operational, emergency, emergency situation, training, response
3 | disaster, fire, occur, burn, fire, meter, district, operational, Ministry of Emergency Situations | disaster | need, today, situation, operational, monitoring, operational, medical assistance

Additionally, while associations between two events were successfully identified, future work may explore associations involving three events by treating the WI (What-If) component as a distinct event. This would require extending the size of the WI part in the scenarios. For this task, two-rules-based inter-event scenarios may offer a more suitable framework. However, this approach necessitates a larger set of real-world what-if scenarios derived from two-rules-based inter-event associations. During initial experiments, it was not feasible to conduct all tests with every possible variation in the sizes of L, WI, and RHS in the scenario itemsets due to RAM limitations. To address this, future tests can be split into batches or run on a more powerful computing environment.
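The batching idea mentioned above could look roughly like the sketch below. It assumes the tests are parameterized by three itemset sizes and that a single test can be called as a function. The names `size_combinations`, `in_batches`, `run_in_batches` and the `run_test` callback are hypothetical and do not come from the study.

```python
from itertools import islice, product

def size_combinations(max_lhs, max_wi, max_rhs):
    """Lazily enumerate (lhs_size, wi_size, rhs_size) triples, each starting at 1."""
    return product(range(1, max_lhs + 1),
                   range(1, max_wi + 1),
                   range(1, max_rhs + 1))

def in_batches(iterable, batch_size):
    """Yield lists of at most batch_size items without materializing the input."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def run_in_batches(combos, batch_size, run_test):
    """Run a (hypothetical) test per size combination, one batch at a time."""
    results = []
    for batch in in_batches(combos, batch_size):
        # Only one batch of combinations is held in memory here; in practice the
        # per-batch results could be written to disk instead of accumulated.
        results.extend(run_test(*sizes) for sizes in batch)
    return results

batches = list(in_batches(size_combinations(2, 2, 2), 3))
totals = run_in_batches(size_combinations(2, 1, 1), 2, lambda l, w, r: l + w + r)
```

Because `size_combinations` is lazy, the full grid of size variations is never built at once, which is the property that matters when memory is the limiting factor.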

In the current study, the similarity between the de-

tected event topic sets and scenario components was

calculated using the cosine similarity of sentence em-

beddings. Since embeddings are inﬂuenced by the or-

der of words, an alphabetical arrangement was used

for consistency. However, future work could explore

using all possible combinations of word order or im-

plement a method for embedding calculation that con-

siders sets of topics without regard to word sequence.
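The two options discussed above can be contrasted in a short sketch. It is illustrative only: `toy_embed` is a hypothetical stand-in for a real sentence-embedding model such as the one used in the study, and averaging per-topic embeddings is just one possible way to obtain an order-free representation, not a method taken from the paper.

```python
import zlib
import numpy as np

def toy_embed(text, dim=16):
    """Hypothetical stand-in for a sentence-embedding model.

    Each word adds a position-weighted value to a hashed dimension, so the
    result depends on word order, loosely imitating a real model.
    """
    vec = np.zeros(dim)
    for pos, word in enumerate(text.split()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0 / (1 + pos)
    return vec

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sorted_topic_embedding(topics, embed):
    """Approach from the text: fix an alphabetical order, then embed once."""
    return embed(" ".join(sorted(topics)))

def mean_topic_embedding(topics, embed):
    """Order-free alternative (assumption): embed each topic and average."""
    return np.mean([embed(t) for t in topics], axis=0)

topics_a = ["fire", "rescuer", "residential building"]
topics_b = ["residential building", "fire", "rescuer"]  # same set, other order

same_sorted = np.allclose(sorted_topic_embedding(topics_a, toy_embed),
                          sorted_topic_embedding(topics_b, toy_embed))
same_mean = np.allclose(mean_topic_embedding(topics_a, toy_embed),
                        mean_topic_embedding(topics_b, toy_embed))
similarity = cosine(mean_topic_embedding(topics_a, toy_embed),
                    mean_topic_embedding(topics_b, toy_embed))
```

Both variants give the same vector for any permutation of a topic set. The averaged variant never exposes the embedding model to an arbitrary cross-topic word order, at the cost of one embedding call per topic.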

Furthermore, the relationship between support and confidence in one-rule-based inter-event scenarios can be represented in a matrix format, as shown in Table 6. This matrix could assist in identifying scenarios of various types, such as rare, popular, or common scenarios. This study primarily focused on generating rare scenarios; however, exploring different scenario types in future research could provide valuable insights into the dynamics of event associations.

Table 6: One-rule-based inter-event scenario types matrix.

Scenario type | Support(WI) | Support(Base) | Support(RHS)
Popular | min | max | max
Rare | min | max | min
Common | max | max | max
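One way to read the matrix in Table 6 is as a lookup from support levels to a scenario type, sketched below. The interpretation of "min" and "max" as support below or above fixed cut-offs, and the cut-off values themselves (0.05 and 0.30), are assumptions made for illustration. The study does not state how these levels are determined.

```python
# Rows of Table 6, keyed by the (WI, Base, RHS) support levels.
SCENARIO_TYPES = {
    ("min", "max", "max"): "Popular",
    ("min", "max", "min"): "Rare",
    ("max", "max", "max"): "Common",
}

def support_level(support, low, high):
    """Map a support value to 'min', 'max', or None when it lies in between."""
    if support <= low:
        return "min"
    if support >= high:
        return "max"
    return None

def scenario_type(sup_wi, sup_base, sup_rhs, low=0.05, high=0.30):
    """Return the Table 6 scenario type, or None if no row applies.

    The low/high cut-offs are hypothetical and would need tuning per dataset.
    """
    key = tuple(support_level(s, low, high) for s in (sup_wi, sup_base, sup_rhs))
    return SCENARIO_TYPES.get(key)

examples = [
    scenario_type(0.01, 0.50, 0.40),
    scenario_type(0.01, 0.50, 0.02),
    scenario_type(0.40, 0.50, 0.60),
    scenario_type(0.10, 0.50, 0.40),
]
```

Combinations that do not appear in the matrix, or supports that fall between the two cut-offs, return `None` rather than being forced into one of the three types.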

5 CONCLUSIONS

This work presents a novel approach to event predic-

tion by applying association rules to generate counter-

factual what-if scenarios. Hypothetical scenarios are

leveraged through association rule mining, allowing

the methodology to systematically identify key pat-

terns within event sequences and thereby facilitate the

prediction of future events.

The study also introduces the EventsAssocia-

tion2012 dataset, which serves as the foundation for

evaluating the accuracy and applicability of what-if

scenarios. Through a detailed analysis using a Large

Language Model (LLM) to generate event-to-event

similarities and conditional probabilities, this work

establishes criteria for matching scenarios with real-

world events. The evaluation results demonstrate the

potential of this approach for identifying both two-

event and multi-event associations, providing a robust

framework for understanding the interconnected na-

ture of social media discussions.

Searching for causal relationships can be achieved

by integrating association rules into counterfactual

analysis. This study advances the modeling of causal

relationships within event associations, offering a

more precise and comprehensive method for pre-

dicting plausible alternative outcomes based on ob-

served data. Additionally, the work highlights sev-

eral areas for future improvement, including the iden-

tiﬁcation of intra-event scenarios, exploring associa-

tions among three events, reﬁning the What-If com-

ponent, and implementing more advanced embed-

ding techniques, which are key steps toward further

strengthening the predictive capabilities of the pro-

posed methodology.

ACKNOWLEDGEMENTS

This research has been supported by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. BR24993001) “Creation of a large language model (LLM) to maintain the implementation of Kazakh language and increase the technological progress”.

REFERENCES

Ali, F., Ali, A., Imran, M., et al. (2021). Traffic accident detection and condition analysis based on social networking data. Accident Analysis & Prevention, 151:105973.

Bian, T., Xiao, X., Xu, T., et al. (2020). Rumor detection on social media with bi-directional graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 549–556.

Daud, N., Ab Hamid, S., Saadoon, M., and Sahran, F. (2020). Applications of link prediction in social networks: A review. Journal of Network and Computer Applications, 166:102716.

Eabrasu, M. (2008). A what if? fine-tuning the expectations of business simulation technology through the lens of philosophical counterfactual analysis. Organization, 30(4):694–711.

Islam, M., Liu, S., Wang, X., and Xu, G. (2020). Deep learning for misinformation detection on online social networks: a survey and new perspectives. Social Network Analysis and Mining, 10(1):82.

Kaliyar, R., Goswami, A., and Narang, P. (2021). FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimedia Tools and Applications, 80(8):11765–11788.

McMinn, A. J., Moshfeghi, Y., and Jose, J. M. (2013). Building a large-scale corpus for evaluating event detection on Twitter. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 409–418.

Mussina, A., Aubakirov, S., and Trigo, P. (2022). Parametrized event analysis from social networks. Scientific Journal of Astana IT University, 10(10).

Mussina, A., Trigo, P., and Aubakirov, S. (2023). Scenario generation with transitive rules for counterfactual event analysis. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence, V.3, pages 1047–1051.

Shalit, U., Johansson, F., and Sontag, D. (2017). Estimating individual treatment effect: Generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, pages 3076–3085.

Tangcharoensathien, V., Calleja, N., Nguyen, T., et al. (2020). Framework for managing the COVID-19 infodemic: methods and results of an online, crowdsourced WHO technical consultation. Journal of Medical Internet Research, 22(6):e19659.

Valdez, D., Ten Thij, M., Bathina, K., et al. (2020). Social media insights into US mental health during the COVID-19 pandemic: Longitudinal analysis of Twitter data. Journal of Medical Internet Research, 22(12):e21418.

Yu, X., Mashhadi, A., Boy, J., Nielsen, R. C., and Hong, L. (2022). Causal impact model to evaluate the diffusion effect of social media campaigns. In EUSSET.

Zhang, K., Yu, J., Shi, H., Liang, J., and Zhang, X. Y. (2023). Rumor detection with diverse counterfactual evidence. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3321–3331.

Zhang, Y., Cao, D., and Liu, Y. (2022). Counterfactual neural temporal point process for estimating causal influence of misinformation on social media. Advances in Neural Information Processing Systems, 35:10643–10655.

DATA 2025 - 14th International Conference on Data Science, Technology and Applications
