Next-Event Prediction in Cybercrime Complaint Narratives Using Temporal Event Scene Graphs

Mohammad Saad Rafeeq¹ᵃ, Narendra Bijarniya and Chandramani Chaudhary¹ᵇ

National Institute of Technology, Calicut, India

Birla Institute of Technology and Science, Pilani, India

Keywords: Temporal Event Scene Graphs, Event Prediction, Cybercrime, Sequence Modeling, BART, T5, GPT-2.

Abstract:

Cybercrime complaint narratives encompass complex sequences of criminal activities that challenge conventional sequence modeling techniques. This work introduces a framework that employs dynamic temporal event scene graphs to represent each narrative as an evolving, structured network of entities and events. Our approach converts complaint texts into temporal event scene graphs in which nodes represent key entities and edges capture interactions annotated with their sequential order. This structured representation provides a richer and more intuitive view of how cybercrime incidents unfold over time. To forecast missing or forthcoming events, we fine-tune a pre-trained BART model using a masked sequence-to-sequence paradigm. Our experiments are performed on a dataset of approximately 11,500 real-world cybercrime reports containing roughly 76,000 distinct event descriptions, a scale that introduces significant sparsity and generalization challenges. Our results show that while GPT-2 and T5 struggle to capture robust patterns in this diverse domain, the BART-based approach achieves modest yet promising improvements.

1 INTRODUCTION

Predicting the next event in a narrative sequence is an important task for understanding and anticipating complex scenarios. In the domain of cybercrime complaints, accurately forecasting the subsequent step in an attack or crime sequence could help investigators and automated systems take proactive measures. However, this task is challenging due to the unstructured nature of free-text reports and the enormous variety of events (Al-Zaidy et al., 2012). Each report may contain various entities, such as victim details, crime details, and additional items (e.g., banks, accounts, cryptocurrencies), along with idiosyncratic event descriptions. Traditional sequence modeling approaches for narrative understanding, such as narrative event chains, capture temporal event sequences centered on a protagonist but may not fully exploit the rich connections among the multiple entities involved in a cybercrime incident. Graph-based textual representations, on the other hand, can encode complex relationships: for example, multiple entities such as (victim, bank, money) may be involved in a single event (“unauthorized transaction without OTP”), and an entity can reappear across events, linking the storyline together.

a https://orcid.org/0009-0008-3049-9531

b https://orcid.org/0000-0003-2072-4391

Our approach is related to previous work on event prediction, often framed as script learning, which laid the foundation for our method. Early studies introduced the concept of narrative schemas or event chains, in which structured representations of common event sequences were learned from text corpora to predict likely subsequent events (Schank and Abelson, 1977), (Chambers and Jurafsky, 2008). More recent research has explored the use of knowledge graphs to capture complex interconnections between events (Li et al., 2018). Inspired by these advances, our method explicitly represents each narrative as a dynamic graph of entities and actions, enabling the model to capture meaningful state changes and inter-event relationships.

We represent each complaint narrative as a temporal event scene graph (TESG), where entities (e.g., “Bank,” “victim,” “INR 4500”) are nodes and actions (e.g., “transaction without OTP”) are time-stamped edges. Converting text into these graphs exposes each entity’s participation in events, but here we focus on the extracted event sequence to leverage BART for next-event prediction rather than on graph visualization.

Rafeeq, M. S., Bijarniya, N. and Chaudhary, C.

Next-Event Prediction in Cybercrime Complaint Narratives Using Temporal Event Scene Graphs.

DOI: 10.5220/0013647000003967

In Proceedings of the 14th International Conference on Data Science, Technology and Applications (DATA 2025), pages 693-700

ISBN: 978-989-758-758-0; ISSN: 2184-285X

Our backbone is BART, a Transformer-based encoder-decoder model pretrained as a denoising autoencoder. Its bidirectional encoder and autoregressive decoder support both next-event prediction and missing-event interpolation. To our knowledge, this is one of the first applications of temporal event scene graphs to forecasting events in cybercrime narratives. We focus on two inference tasks, predicting the next event and inferring intermediate events, while addressing argument roles and coreference during preprocessing. In summary, our contributions are a novel representation of cybercrime narratives as temporal event scene graphs, a BART-based sequence generation model for event prediction, and an empirical comparison of next-event versus intermediate-event masking strategies for incident forecasting, evaluated with lexical (ROUGE), semantic-similarity, and rank-based (Hit@K) metrics on a large corpus of real-world complaints.

We describe the TESG construction and our modeling methodology in Section 3, present the dataset and preprocessing pipeline in Section 4, report experiments and results in Section 5, and discuss the findings in Section 6 before concluding in Section 7.

2 RELATED WORK

We review three related research directions: (i) event detection, (ii) temporal event prediction, and (iii) crime event detection.

2.1 Event Detection

Early event detection work drew on script theory, which models stereotypical event sequences as scripts (Schank and Abelson, 1977). Chambers and Jurafsky (Chambers and Jurafsky, 2008) introduced unsupervised bootstrapping to induce narrative event chains. Recent methods integrate neural networks with knowledge graphs for event extraction. Li et al. (Li et al., 2018) proposed a narrative event evolutionary graph that captures semantic relationships via graph-based attention. We employ dynamic temporal event scene graphs to enforce sequential ordering and isolate entity transitions in cybercrime narratives.

2.2 Temporal Event Prediction

Temporal event prediction models the timing and order of events. Du et al. (Du et al., 2016) introduced recurrent marked temporal point processes; Mei and Eisner (Mei and Eisner, 2017) proposed the Neural Hawkes process for irregularly spaced events. Kong et al.’s Language-TPP (Kong et al., 2025) adds continuous temporal tokens to Transformers. We instead employ dynamic temporal event scene graphs on noisy text.

2.3 Crime Event Detection

Crime event detection and prediction aid law enforcement. Yang (Yang, 2023) proposed TransCrimeNet, which fuses text with criminal-network graph embeddings to predict crimes. Zhu and Xie (Zhu and Xie, 2022) introduced a spatiotemporal-textual point process for crime linkage. Khairova et al. (Khairova et al., 2023) applied cross-lingual transfer on parallel corpora for low-resource crime event extraction. These works underscore the value of graph–text integration, motivating our combination of dynamic temporal event scene graphs with BART.

3 METHODOLOGY

3.1 Construction

Each cybercrime complaint (a free-text narrative) is converted into a temporal event scene graph, where nodes represent entities and edges denote actions. In this process, we extract key information by identifying entities such as people (e.g., the victim), organizations (e.g., a bank), and other relevant objects (e.g., amounts of money, devices), as well as the actions or incidents and the order in which they occur. For example, the first event is assigned timestamp $T_0$, the second $T_1$, and so on. Each distinct action is represented as an edge connecting the involved entities, annotated with its relative timestamp. The final result is a temporal event scene graph that captures how the narrative structure evolves with each successive event.

For example, consider the complaint: “The victim received a phishing email claiming to be from a [Private] Bank. Later, a transaction of Rs 4500 was made from his account without an OTP.” From this narrative, we extract the entities {Victim, Bank, Rs 4500}. The first action, “phishing email”, connects the Victim and the Bank (indicating that an email was received from a fraudster impersonating the bank) and is assigned timestamp $T_0$. The second action, “unauthorized transaction without OTP”, involves the victim and the amount Rs 4500 (indicating a funds transfer) and is assigned timestamp $T_1$. As Figure 1 shows, these events, arranged chronologically, produce an event graph that reflects the initial phishing email between the victim and the bank, followed by the fraudulent transaction involving Rs 4500.

Figure 1: Dynamic TESG illustrating a two-step event sequence in a phishing incident ($T_0$: Phishing Email, Victim ↔ Bank; $T_1$: Unauthorized Transaction, Victim ↔ Rs 4500).
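The construction step can be illustrated with a minimal Python sketch. It assumes the entity–action–entity triples have already been extracted in narrative order (Section 4.1) and simply assigns the relative timestamps T0, T1, ... by position. The class and function names (`TemporalEventSceneGraph`, `build_tesg`) are hypothetical and are not taken from the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Edge:
    """One action linking two entities, tagged with its relative timestamp."""
    source: str
    target: str
    relationship: str
    timestamp: str  # e.g. "T0", "T1", ... in narrative order


@dataclass
class TemporalEventSceneGraph:
    """Hypothetical container: entity nodes plus time-stamped action edges."""
    nodes: list[str] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

    def add_event(self, source: str, relationship: str, target: str) -> Edge:
        # Register entities in first-seen order, skipping duplicates, so an
        # entity that reappears (e.g. "Victim") links several events together.
        for entity in (source, target):
            if entity not in self.nodes:
                self.nodes.append(entity)
        # The k-th action in the narrative receives timestamp T{k}.
        edge = Edge(source, target, relationship, f"T{len(self.edges)}")
        self.edges.append(edge)
        return edge

    def events_involving(self, entity: str) -> list[Edge]:
        """All edges an entity participates in, already in temporal order."""
        return [e for e in self.edges if entity in (e.source, e.target)]


def build_tesg(triples: list[tuple[str, str, str]]) -> TemporalEventSceneGraph:
    """Build a graph from (source, relationship, target) triples in narrative order."""
    graph = TemporalEventSceneGraph()
    for source, relationship, target in triples:
        graph.add_event(source, relationship, target)
    return graph


# The phishing example from the text (Figure 1).
phishing = build_tesg([
    ("Victim", "phishing email", "Bank"),
    ("Victim", "unauthorized transaction without OTP", "Rs 4500"),
])
for edge in phishing.edges:
    print(edge.timestamp, edge.source, "<->", edge.target, ":", edge.relationship)
```

Keeping nodes as a deduplicated list is what lets `events_involving` recover an entity's trajectory across events, which is the cross-event linking the introduction motivates.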


3.2 Sequence Generation with BART

We linearize the events from each event graph into an ordered sequence based on their timestamps, e.g., $e_1, e_2, \ldots, e_{n+1}$, where each $e_i$ represents an event description at timestamp $T_{i-1}$. We then employ the BART base model (Lewis et al., 2019) as our sequence-to-sequence predictor for modeling masked events. BART is particularly suitable here because it was pretrained with a span-masking (text infilling) objective, among others, and thus learned to reconstruct text with missing spans. We fine-tune BART on our data by feeding it sequences of events in which certain events are replaced by a mask token and training it to generate the missing event text. Fine-tuning is configured in two modes corresponding to our experimental scenarios.

1. Next-Event Prediction:

In this setup, we mask only the final event in the sequence and ask BART to predict it. Formally, given a sequence of events $e_1, e_2, \ldots, e_{n+1}$ generated from a complaint description, where $e_i$ occurs at $T_{i-1}$ and $e_{n+1}$ is the final recorded event at $T_n$, we create an input in which $e_{n+1}$ is replaced by the [MASK] token, as shown in Figure 2. BART’s encoder processes the sequence of past events $e_1, e_2, \ldots, e_n$ along with the mask token, and the decoder is trained to output $e_{n+1}$ (the next event). This is analogous to a conditional next-step prediction task, except that BART, unlike a standard left-to-right language model, can encode the preceding events bidirectionally. The target output is the textual description of the held-out event $e_{n+1}$. By training on many such sequences, the model learns to infer what typically comes last given the earlier events in cybercrime scenarios (e.g., after “victim receives a phishing email”, the next event might be “unauthorized bank transaction”).

Figure 2: Dynamic event sequence illustrating a masked event at $T_n$ within a sequential narrative ($e_1$ at $T_0$, ..., [MASK] at $T_n$).

2. Intermediate-Event Prediction:

Here, we evaluate the model’s ability to predict any single event in the sequence when that event is hidden (a missing event), using the remaining events as context. For a sequence $e_1, e_2, \ldots, e_{n+1}$ occurring at $T_0, T_1, \ldots, T_n$, we create multiple training examples by masking one event at a time. For instance, $(e_1, \text{[MASK]}, e_3, \ldots, e_{n+1})$ with target $e_2$; $(e_1, e_2, \text{[MASK]}, e_4, \ldots, e_{n+1})$ with target $e_3$; and so on, including a case that masks $e_{n+1}$ as in the first experiment, as shown in Figure 3. BART is trained to fill in the blank in each case. This setup forces the model to use both preceding and following events to predict the missing one, tapping into its sequence infilling capability. It is effectively a form of data augmentation, since a single narrative yields multiple training samples (one for each masked event). During inference, we can similarly mask an event and have the model generate a prediction for the event at that position.

Figure 3: Dynamic event sequence demonstrating the masking of an intermediate event, where the context before and after the masked event is utilized for prediction ($e_1$ at $T_0$, [MASK], ..., $e_{n+1}$ at $T_n$).
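The two masking modes can be sketched as small functions that turn one linearized narrative into (input, target) pairs. This is an illustrative reconstruction, not the authors' released code: the helper names are hypothetical, the `" ; "` separator follows the semicolon delimiter described in this section, and the literal `[MASK]` string mirrors the paper's notation (the Hugging Face BART tokenizer's own mask token is `<mask>`, and the text does not say how the two were mapped). Here every position, including $e_1$, is masked once; the text's examples start from $e_2$, so masking the first event is an assumption.

```python
from __future__ import annotations

MASK = "[MASK]"  # paper's notation; BART's tokenizer uses "<mask>" (mapping is an assumption)
SEP = " ; "      # semicolon delimiter between serialized events


def next_event_example(events: list[str]) -> tuple[str, str] | None:
    """Mode 1: hide only the final event e_{n+1}; the context is e_1..e_n."""
    if len(events) < 2:
        return None  # a single-event narrative leaves no context to condition on
    return SEP.join(events[:-1] + [MASK]), events[-1]


def intermediate_examples(events: list[str]) -> list[tuple[str, str]]:
    """Mode 2: hide one event at a time, giving one training pair per event."""
    if len(events) < 2:
        return []
    pairs = []
    for i, target in enumerate(events):
        masked = events[:i] + [MASK] + events[i + 1:]
        pairs.append((SEP.join(masked), target))
    return pairs


narrative = ["victim received phishing email", "victim clicked link",
             "unauthorized bank transaction"]
print(next_event_example(narrative))
for source, target in intermediate_examples(narrative):
    print(source, "=>", target)
```

Because mode 2 returns one pair per event, a narrative with k events contributes k training examples versus a single one in mode 1, which is the data-augmentation effect noted for the intermediate setting.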

Training BART with these masked-event objectives requires a proper formulation of the input-output pair. We concatenate the sequence of event descriptions into a single textual input, separating events with a delimiter (semicolon ‘;’), and use a special [MASK] token in place of the hidden event’s text. The decoder’s target is the text of the hidden event. We initialize the model with facebook/bart-base weights and fine-tune for a fixed number of epochs using a token-level cross-entropy loss between the model’s predicted token distributions and the ground-truth event tokens. Because event descriptions in our data can be a short phrase or a full sentence, we treat each action as a segment of text to be generated. Notably, BART’s ability to attend to the entire input sequence (with knowledge of both prior and subsequent events for intermediate prediction) gives it an advantage over unidirectional models in the second scenario. The model effectively learns a conditional distribution over an event given its surrounding context events.
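Written out, the objective described above is the standard teacher-forced, token-level cross-entropy for sequence-to-sequence fine-tuning. The notation is ours rather than the authors': $x$ is the serialized input with one event replaced by [MASK], $y = (y_1, \ldots, y_{|y|})$ are the tokens of the hidden event, and $\mathcal{D}$ is the set of training pairs. Whether the loss is summed or averaged over tokens is not stated in the text.

```latex
\mathcal{L}(\theta) = -\sum_{(x, y) \in \mathcal{D}} \sum_{j=1}^{|y|} \log p_\theta\!\left(y_j \mid y_{<j}, x\right)
```

In the next-event mode $x = e_1 ; \ldots ; e_n ; \text{[MASK]}$ and $y = e_{n+1}$; in the intermediate mode $x$ has $e_i$ replaced by [MASK] and $y = e_i$.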

3.3 Baseline Models

We experimented with two pre-trained models as baselines: GPT-2 and T5. GPT-2 is a decoder-only language model that generates text left to right. We fine-tuned a GPT-2 model (117M parameters) to predict the next event given the preceding events as input. However, GPT-2 struggled with the highly variable event vocabulary, often generating generic or incoherent outputs when confronted with rare or complex events. Although it achieved a semantic similarity of 49.8%, GPT-2 did not perform robustly on Hit@K and the other metrics.


T5, an encoder-decoder Transformer pre-trained on a wide variety of text-to-text tasks, was fine-tuned in a similar manner, treating the task as text generation in which the input is the sequence of prior events and the output is the next event. T5-generated outputs reached a semantic similarity of only 9.25%, making the model less reliable. Consequently, both GPT-2 and T5 underperformed BART. BART’s superior performance can be partly attributed to its pretraining objective, which included an infilling task (i.e., predicting masked spans), making it inherently well suited to our formulation. Therefore, our methodology focuses on BART for the reported results.

4 DATASET

We compiled a dataset of cybercrime complaint narratives filed on the official online reporting portal (https://cybercrime.gov.in/), which serves as the primary source of our data. Each entry in the dataset is a textual description of an incident, written by the victim or transcribed by the police. These narratives typically range from a few sentences to a few paragraphs detailing how the crime unfolded. From an initial pool of reports, we filtered and segmented the text to isolate discrete events and identify entities, yielding a collection of structured event sequences.

Examples to Illustrate the Data:

Consider the following example narrative:

“On [Date], a fraudulent transaction of 4500 Rs. occurred approximately [Time] from a Private Bank account, without the victim receiving any OTP or message regarding the transaction.”

We extract entities, events, and a temporal event scene graph from this narrative.

• Entities: ['Private Bank', '4500 Rs.', 'Victim']
• Events: ['fraudulent transaction of 4500 Rs.', 'no OTP received', 'no message received']

Statistics: The processed dataset contains approximately N = 11,500 narratives (reports). The number of events per narrative varies: some reports record only a single event, while the most detailed ones include up to 23 events. Importantly, the vocabulary of unique events is extremely large, on the order of 76,000 unique event descriptions. This indicates that report authors seldom use identical phrasing and that many actions are very specific (e.g., “user’s credit card limit was increased without authorization” might appear only once).

Table 1: Extracted data from example narrative.

Entities: ['Private Bank', '4500 Rs.', 'Victim']

Events: ['fraudulent transaction of 4500 Rs.', 'no OTP received', 'no message received']

Scene Graph:
{ 'nodes': [{ 'id': 'Private Bank' }, { 'id': '4500 Rs.' }, { 'id': 'Victim' }],
  'edges': [
    { 'source': 'Private Bank', 'target': '4500 Rs.', 'relationship': 'fraudulent transaction of', 'timestamp': 'T0' },
    { 'source': '4500 Rs.', 'target': 'Victim', 'relationship': 'no OTP received', 'timestamp': 'T1' },
    { 'source': '4500 Rs.', 'target': 'Victim', 'relationship': 'no message received', 'timestamp': 'T2' }
  ] }
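To feed a scene graph like the one in Table 1 to the sequence model, its edges have to be serialized in timestamp order. The sketch below assumes the textual pattern "source - relationship → target at Tk" that appears in the Table 6 examples and the `" ; "` separator; the function names are hypothetical, and the exact serialization the authors used may differ.

```python
from __future__ import annotations

# Table 1's scene graph, written as a Python dict.
scene_graph = {
    "nodes": [{"id": "Private Bank"}, {"id": "4500 Rs."}, {"id": "Victim"}],
    "edges": [
        {"source": "Private Bank", "target": "4500 Rs.",
         "relationship": "fraudulent transaction of", "timestamp": "T0"},
        {"source": "4500 Rs.", "target": "Victim",
         "relationship": "no OTP received", "timestamp": "T1"},
        {"source": "4500 Rs.", "target": "Victim",
         "relationship": "no message received", "timestamp": "T2"},
    ],
}


def edge_to_text(edge: dict) -> str:
    """Render one edge in the 'source - relationship → target at Tk' pattern."""
    return f"{edge['source']} - {edge['relationship']} → {edge['target']} at {edge['timestamp']}"


def linearize(graph: dict) -> list[str]:
    """Return event strings ordered by numeric timestamp index (so T10 sorts after T9)."""
    ordered = sorted(graph["edges"], key=lambda e: int(e["timestamp"].lstrip("T")))
    return [edge_to_text(e) for e in ordered]


print(" ; ".join(linearize(scene_graph)))
```

Sorting on the integer part of the timestamp, rather than on the raw string, avoids placing "T10" before "T2" in longer narratives (the dataset contains reports with up to 23 events).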

4.1 Preprocessing

We preprocess raw multilingual cybercrime narratives by prompting a large language model to produce unified English summaries that preserve key events, entities, and relations, thereby standardizing the input for downstream analysis. Named Entity Recognition (NER) then extracts persons, organizations, and other salient entities, while dependency parsing isolates event predicates and arguments, yielding entity–action–entity triples with synthetic timestamps derived from narrative order. These actions serve as event labels. We do not yet merge semantically similar actions (e.g., “withdrew money from ATM” vs. “cash withdrawal at ATM”), leaving sparsity reduction to future work.

Sparsity and Pattern Learning: The sheer number of unique events means that most event sequences in the training set are nearly unique in their exact surface form. Therefore, the model cannot rely on memorizing frequent event bigrams or templates; it must learn higher-level patterns or analogies. For example, even if “phishing email → bank transaction” appears only once, the model might learn a broader pattern that an event involving a social engineering attack (like phishing) is often followed by a financial fraud event. We hypothesize that by representing events in context with their entities, the model can learn latent connections (e.g., if the same bank entity appears in two events, they might be related). We acknowledge that without explicit semantic clustering this is a difficult task: the model must generalize from very few examples per event type. This characteristic of the dataset underscores the need for a structured approach and informs our decision to compare different modeling strategies (standard language models vs. event-based BART).

Train/Validation/Test Split: We split the dataset into training, validation, and test sets at the narrative level. The validation set comprises 25% of the entire dataset, while the remaining 75% is divided into training and test sets in an 80:20 ratio (approximately 60% training and 15% test overall). This strategy ensures that each narrative appears in only one set, preventing narrative-level overlap between training and evaluation data.
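The split can be reproduced with a short narrative-level routine. Splitting by narrative ID before generating masked examples matters here because the intermediate mode produces several examples per narrative, and those must not straddle splits. The seed, shuffling, and rounding below are assumptions; the paper does not specify them, and N = 11,500 is only the approximate corpus size it reports.

```python
import random


def narrative_level_split(narrative_ids, val_frac=0.25, test_frac_of_rest=0.20, seed=0):
    """Hold out val_frac for validation, then split the rest train:test = 80:20."""
    ids = list(narrative_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_val = round(len(ids) * val_frac)
    val, rest = ids[:n_val], ids[n_val:]
    n_test = round(len(rest) * test_frac_of_rest)
    test, train = rest[:n_test], rest[n_test:]
    return train, val, test


train, val, test = narrative_level_split(range(11_500))
print(len(train), len(val), len(test))  # 6900 2875 1725, i.e. 60% / 25% / 15%
```

Masked (input, target) pairs would then be generated separately within each split, so no narrative contributes examples to more than one set.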

5 EXPERIMENTS AND RESULTS

5.1 Next-Event Prediction

In this experiment, we evaluate next-event prediction using a fine-tuned BART model. The model is provided with a dynamic temporal event scene graph constructed from a cybercrime narrative up to time $t$; that is, it observes all events $e_1, e_2, \ldots, e_{n-1}$ prior to the incident to be predicted. The objective is to generate the subsequent event $e_n$ at time $t+1$. Unlike approaches that benefit from both preceding and succeeding context (as in intermediate event inpainting), next-event prediction is constrained by the fact that only historical events are used, with the last observed event $e_{n-1}$ serving as the immediate cue. This mode also yields fewer training examples (one per narrative, rather than one per event as in inpainting) and often leads the model to produce degenerate outputs, such as simply echoing the final observed event, thereby failing to introduce the novelty needed for $e_n$.

To address these challenges, BART is fine-tuned to generate a descriptive textual output for the predicted event. This output is subsequently parsed into its structured components: source, target, relationship, and timestamp. The evaluation protocol is twofold. First, we use text similarity metrics: ROUGE scores quantify the lexical overlap between the generated and true event descriptions, while an embedding-based semantic similarity metric assesses how closely they align in meaning. Second, we employ rank-based metrics by reporting Hit@K (for $K = 1, \ldots, 5$), which indicates whether the ground-truth event appears among the top $K$ model predictions. Table 2 summarizes the overall performance for next-event prediction, including Semantic Similarity, ROUGE-1, ROUGE-2, ROUGE-L, and Hit@K values.
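The three families of metrics can be sketched in plain Python. This is a simplified stand-in rather than the authors' evaluation code: Hit@K is computed by exact string match against the top-K ranked outputs (how those K candidates were produced, e.g. by beam search, is not stated); ROUGE-N uses lowercased whitespace tokens without stemming, so scores may differ slightly from standard packages such as `rouge-score`; and semantic similarity is taken as the cosine between precomputed embedding vectors, since the embedding model is not specified.

```python
import math
from collections import Counter


def hit_at_k(ranked_predictions, gold, k):
    """1.0 if the gold event exactly matches one of the top-k predictions, else 0.0."""
    return float(gold in ranked_predictions[:k])


def mean_hit_at_k(all_ranked, golds, k):
    """Average Hit@K over a test set of (ranked predictions, gold) pairs."""
    return sum(hit_at_k(r, g, k) for r, g in zip(all_ranked, golds)) / len(golds)


def rouge_n_f1(prediction, reference, n=1):
    """ROUGE-N F1 from clipped n-gram overlap (lowercased whitespace tokens)."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    overlap = sum((pred & ref).values())  # Counter & keeps the minimum count per n-gram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v):
    """Cosine between two embedding vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


pred = "individual - made payment via → PAYMENT at T4"
gold = "individual - completed payment → PAYMENT at T4"
print(rouge_n_f1(pred, gold, n=1), hit_at_k([pred], gold, k=1))
```

The example pair (from Table 6) shows why lexical and exact-match metrics diverge: the prediction shares most unigrams with the gold event yet scores zero on Hit@1, mirroring the gap between the ROUGE and Hit@K rows in Table 2.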

Table 2: Performance on Next-Event Prediction using BART.

Metric               Score
Semantic Similarity  0.5459
ROUGE-1              0.4283
ROUGE-2              0.2199
ROUGE-L              0.41695
Hit@1                0.0094
Hit@2                0.0135
Hit@3                0.0165
Hit@4                0.0183
Hit@5                0.0193

Additionally, we compute the accuracy of each individual event component to better understand the model’s ability to identify the roles and the temporal ordering within the event. Table 3 provides a detailed analysis of the first predicted event (i.e., Hit@1), reporting component-wise accuracies for source, target, relationship, and timestamp.
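Component-wise accuracy requires parsing each generated string back into its parts. A minimal sketch, assuming outputs follow the "source - relationship → target at Tk" pattern seen in Table 6 and that components are compared by exact, case-sensitive string match (the paper does not spell out its parser or matching rule); unparseable outputs are counted as wrong on every component. The function names are hypothetical.

```python
from __future__ import annotations

COMPONENTS = ("source", "target", "relationship", "timestamp")


def parse_event(text: str) -> dict[str, str] | None:
    """Split 'source - relationship → target at Tk' into its four components."""
    try:
        left, right = text.split(" → ", 1)
        source, relationship = left.split(" - ", 1)
        target, timestamp = right.rsplit(" at ", 1)  # last ' at ' precedes the timestamp
    except ValueError:  # a delimiter is missing, so the output is malformed
        return None
    return {"source": source.strip(), "relationship": relationship.strip(),
            "target": target.strip(), "timestamp": timestamp.strip()}


def component_accuracy(predictions: list[str], golds: list[str]) -> dict[str, float]:
    """Fraction of top-1 predictions whose component equals the gold component."""
    correct = dict.fromkeys(COMPONENTS, 0)
    for pred_text, gold_text in zip(predictions, golds):
        pred, gold = parse_event(pred_text), parse_event(gold_text)
        if pred is None or gold is None:
            continue  # counted as incorrect for all components
        for c in COMPONENTS:
            correct[c] += int(pred[c] == gold[c])
    return {c: correct[c] / len(golds) for c in COMPONENTS}


# The three predicted/target pairs from Table 6.
preds = ["hacker - sent inappropriate messages → friend at T1",
         "individual - made payment via → PAYMENT at T4",
         "phone - priced at → Rs 38000 at T1"]
golds = ["hacker - sent inappropriate messages → friend at T1",
         "individual - completed payment → PAYMENT at T4",
         "suspect - claimed to be selling on → MARKETPLACE at T1"]
print(component_accuracy(preds, golds))
```

On these three rows the sketch gives perfect timestamp accuracy but only one correct relationship, the same ordering of component difficulty that Tables 3 and 5 report. Splitting on the arrow first keeps relationships that themselves contain " at " (such as "priced at") intact.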

This experimental setup is inherently challenging because predicting an unseen future event from the historical sequence alone, where only $e_{n-1}$ directly informs the prediction of $e_n$, provides less contextual information than settings that incorporate both past and future cues. Consequently, the limited number of training examples and the absence of forward-looking cues lead to lower overall performance, particularly in Hit@K scores and in the accuracy of the predicted target and relationship components.

As Table 2 shows, the average semantic similarity between the predicted and ground-truth events is 0.5459, indicating that the model output is somewhat related in meaning to the intended events. The ROUGE scores are moderate, with ROUGE-1 at 0.4283, ROUGE-2 at 0.2199, and ROUGE-L at 0.41695, suggesting partial lexical overlap between predicted and actual events. The Hit metrics are low: Hit@1 is only 0.0094 and Hit@5 is 0.0193, meaning that the exact correct event appears as the top prediction in less than 1% of cases and within the top five predictions in fewer than 2% of cases. The component-wise analysis of the first predicted event (Table 3) reveals that while the model correctly identifies the source 49.72% of the time, it struggles to predict the target (11.59%) and the relationship (4.02%), even though the timestamp is correctly predicted in 91.76% of instances.


Table 3: Prediction Components Analysis for Hit@1 in Next-Event Prediction.

Component              Accuracy
Source Accuracy        0.4972
Target Accuracy        0.1159
Relationship Accuracy  0.0402
Timestamp Accuracy     0.9176

Table 4: Performance on Intermediate Event Inpainting using BART.

Metric               Score
Semantic Similarity  0.6073
ROUGE-1              0.4968
ROUGE-2              0.2920
ROUGE-L              0.4844
Hit@1                0.0287
Hit@2                0.0417
Hit@3                0.0489
Hit@4                0.0535
Hit@5                0.0559

5.2 Intermediate Event Inpainting

Here, we evaluate intermediate event inpainting using the same BART architecture. In this task, an event in the middle of a narrative is hidden, and the model must infer the missing event from the surrounding context (all prior events up to time $t$ and subsequent events after time $t+1$). The model is given a narrative with a gap and asked to fill it with a plausible event that connects logically to both the preceding and the following events. We fine-tuned the model on this inpainting task, expecting the additional future context to guide generation of the missing event.

Table 4 shows the overall performance metrics for the inpainting task, ordered as Semantic Similarity, ROUGE-1, ROUGE-2, ROUGE-L, and Hit@1 to Hit@5. The average semantic similarity in intermediate event inpainting is 0.6073, higher than in next-event prediction, indicating that the inpainted events are semantically closer to the true events. The ROUGE scores also improve, with ROUGE-1 at 0.4968, ROUGE-2 at 0.2920, and ROUGE-L at 0.4844. Hit@1 is 0.0287 and Hit@5 is 0.0559, indicating that the correct event is more likely to appear among the top predictions when both past and future contexts are available.

Similarly, Table 5 details the prediction components for the first predicted event (Hit@1) in the inpainting experiment. The component analysis indicates better event component prediction than in next-event prediction, with source accuracy at 57.23%, target accuracy at 22.61%, relationship accuracy at 8.76%, and timestamp accuracy at 97.68%.

Table 5: Prediction Components Analysis for Hit@1 in Intermediate Event Inpainting.

Component              Accuracy
Source Accuracy        0.5723
Target Accuracy        0.2261
Relationship Accuracy  0.0876
Timestamp Accuracy     0.9768

5.3 Comparative Analysis of Results

Comparing the results of next-event prediction and intermediate event inpainting side by side, we observe a consistent improvement on all metrics with inpainting. The semantic similarity increases from 0.5459 to 0.6073, and ROUGE scores are higher (e.g., ROUGE-1 improves from 0.4283 to 0.4968 and ROUGE-L from 0.41695 to 0.4844). The Hit metrics approximately triple, with Hit@1 increasing from 0.0094 to 0.0287 and Hit@5 from 0.0193 to 0.0559. These improvements suggest that providing both preceding and subsequent context allows the model to generate missing events more accurately. The prediction component accuracies are also notably higher in inpainting, particularly for the target and relationship, which roughly double compared to next-event prediction, while the source and timestamp predictions show moderate improvements.
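The "approximately triple" and "roughly double" statements can be checked directly from Tables 2 to 5. The short script below only copies the reported scores; it gives ratios of about 3.05 (Hit@1) and 2.90 (Hit@5), and about 1.95 (target) and 2.18 (relationship) for the component accuracies.

```python
# Scores copied from Tables 2-5 (next-event prediction vs. intermediate inpainting).
next_event = {
    "Semantic Similarity": 0.5459, "ROUGE-1": 0.4283, "ROUGE-2": 0.2199,
    "ROUGE-L": 0.41695, "Hit@1": 0.0094, "Hit@5": 0.0193,
    "Source": 0.4972, "Target": 0.1159, "Relationship": 0.0402, "Timestamp": 0.9176,
}
inpainting = {
    "Semantic Similarity": 0.6073, "ROUGE-1": 0.4968, "ROUGE-2": 0.2920,
    "ROUGE-L": 0.4844, "Hit@1": 0.0287, "Hit@5": 0.0559,
    "Source": 0.5723, "Target": 0.2261, "Relationship": 0.0876, "Timestamp": 0.9768,
}

# Relative improvement of inpainting over next-event prediction, per metric.
ratios = {name: inpainting[name] / next_event[name] for name in next_event}
for name, ratio in ratios.items():
    print(f"{name:<20} x{ratio:.2f}")
```

As Section 5.1 notes, the two modes also differ in the number of training examples per narrative, so these ratios reflect both the extra context and the larger training set.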

5.4 Qualitative Analysis and Error Discussion

Table 6 shows both successes and failure modes. The most frequent issue is entity confusion: in the third row, the model predicts “phone – priced at → Rs 38000” at T1, whereas the gold event is “suspect – claimed to be selling on → MARKETPLACE”, misassigning subject and predicate. A second pattern is loss of fine-grained detail: in the second row, it outputs “individual – made payment via → PAYMENT” instead of the precise “individual – completed payment → PAYMENT”. Yet, with unambiguous context the model can be exact, as in the first row, where it perfectly recovers “hacker – sent inappropriate messages → friend” at T1. These observations align with the moderate ROUGE and semantic-similarity scores: predictions capture the event frame but often slip on roles or verbs. Prior experiments with GPT-2 and standard T5 were even less accurate, likely owing to sparse event inventories and long-range dependencies. BART, guided by dynamic temporal scene graphs, offers a stronger inductive bias but still requires richer role-aware encodings and semantic clustering to curb entity confusion and sparsity.

Table 6: Qualitative examples for Intermediate Event Inpainting (three illustrative cases). Each row shows the masked input sequence, the model’s prediction, and the ground-truth event. Entity tokens replace personal information and proper nouns (e.g., SOCIAL, ECOMMERCE, PAYMENT).

Row 1
Input (with [MASK]): Complete: account - account hack → hacker at T0 ; [MASK] ; hacker - sent nude photos → friend at T2 ; victim - seeking assistance regarding the incident → assistance at T3
Predicted Event: hacker - sent inappropriate messages → friend at T1
Target Event: hacker - sent inappropriate messages → friend at T1

Row 2
Input (with [MASK]): Complete: individual - encountered → SOCIAL at T0 ; SOCIAL - misleading advertisement for a sale → ECOMMERCE at T1 ; individual - clicked link to fraudulent website → ECOMMERCE at T2 ; individual - purchased → MOBILE at T3 ; [MASK] ; PAYMENT - payment amount → Rs 1999 at T5 ; individual - transaction identified as fraudulent → fraudsters at T6 ; individual - requested action against → fraudsters at T7 ; individual - requested to freeze associated accounts → fraudsters at T8
Predicted Event: individual - made payment via → PAYMENT at T4
Target Event: individual - completed payment → PAYMENT at T4

Row 3
Input (with [MASK]): Complete: suspect - sold on → phone at T0 ; [MASK] ; suspect - manipulated into paying → victim at T2 ; victim - paid total amount of → Rs 38000 at T3 ; suspect - requested additional payment of → Rs 11000 at T4 ; suspect - falsely claimed penalty to → victim at T5 ; suspect - claimed to be in → MILITARY at T6
Predicted Event: phone - priced at → Rs 38000 at T1
Target Event: suspect - claimed to be selling on → MARKETPLACE at T1

6 DISCUSSION

The experimental findings highlight key insights into modeling cybercrime narratives through event prediction. Quantitatively, the intermediate event inpainting task shows a clear advantage over next-event prediction. The availability of both preceding and subsequent context yields higher semantic similarity, improved ROUGE scores, and substantially higher Hit@K values. These improvements suggest that additional future context helps mitigate challenges inherent to unidirectional prediction, such as the limited cue provided by the immediately preceding event.

Qualitatively, the analysis reveals error patterns that affect overall performance. In the next-event prediction task, a notable issue observed with models such as T5 is the tendency to simply repeat the final observed event. Additionally, there is prevalent confusion between entity roles, specifically misassignment of source and target, which is particularly detrimental. In contrast, the inpainting approach benefits from richer context that reduces these errors, although precisely capturing complex relationships remains difficult. These challenges are compounded by the scarcity of recurring event patterns in cybercrime data, which highlights the importance of effective long-range dependency modeling. These observations underscore the value of structured inputs, such as dynamic temporal event scene graphs, and suggest that future work should focus on refining entity role differentiation and exploring hybrid architectures. Such advances may help capture the nuances of cybercrime narratives and further improve prediction accuracy.

7 CONCLUSION

We presented a dynamic temporal event scene graph approach for next-event prediction in cybercrime narratives. By converting free-text reports into event sequences, we used pretrained BART to predict missing events. We compared next-event prediction and intermediate-event inpainting. Quantitative evaluations (Hit@K, ROUGE, semantic similarity) show that next-event prediction remains challenging (Hit@1 < 1%), while inpainting, which leverages both prior and subsequent context, roughly triples Hit@K (Hit@1 rises from 0.94% to 2.87%). Qualitative analysis revealed issues such as event repetition and entity confusion. Future work will address these via event clustering, entity resolution, event standardization, and external knowledge integration. Our framework may also improve interpretability and situational awareness by exposing subtle narrative shifts, potentially offering actionable insights for law enforcement.

ACKNOWLEDGEMENT

The authors gratefully acknowledge the Indian Space Research Organisation (ISRO) for financially supporting this work.


DATA 2025 - 14th International Conference on Data Science, Technology and Applications
