DGDNN: Decoupled Graph Diffusion Neural Network for Stock Movement Prediction

Zinuo You¹, Zijian Shi¹, Hongbo Bo¹,³, John Cartlidge², Li Zhang⁴ and Yan Ge²

¹School of Computer Science, University of Bristol, Bristol, U.K.
²School of Engineering Mathematics and Technology, University of Bristol, Bristol, U.K.
³NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle, U.K.
⁴Department of Engineering Science, University of Oxford, U.K.
Keywords:
Stock Prediction, Graph Neural Network, Graph Structure Learning, Information Propagation.
Abstract:
Forecasting future stock trends remains challenging for academia and industry due to stochastic inter-stock dy-
namics and hierarchical intra-stock dynamics influencing stock prices. In recent years, graph neural networks
have achieved remarkable performance in this problem by formulating multiple stocks as graph-structured
data. However, most of these approaches rely on artificially defined factors to construct static stock graphs,
which fail to capture the rapidly evolving intrinsic interdependencies between stocks. In addition, these
methods often ignore the hierarchical features of stocks and lose the distinctive information within them. In this
work, we propose a novel graph learning approach implemented without expert knowledge to address these
issues. First, our approach automatically constructs dynamic stock graphs by entropy-driven edge generation
from a signal processing perspective. Then, we further learn task-optimal dependencies between stocks via
a generalized graph diffusion process on constructed stock graphs. Last, a decoupled representation learning
scheme is adopted to capture distinctive hierarchical intra-stock features. Experimental results demonstrate
substantial improvements over state-of-the-art baselines on real-world datasets. Moreover, the ablation study
and sensitivity study further illustrate the effectiveness of the proposed method in modeling the time-evolving
inter-stock and intra-stock dynamics.
1 INTRODUCTION
The stock market has long been an intensively dis-
cussed research topic by investors pursuing profitable
trading opportunities and policymakers attempting to
gain market insights. Recent research advancements
have primarily concentrated on exploring the poten-
tial of deep learning models, driven by their ability
to model complex non-linear relationships (Bo et al.,
2023) and automatically extract high-level features
from raw data (Akita et al., 2016; Shi and Cartlidge,
2022). These abilities further enable the capture of
intricate patterns in stock market data that traditional
statistical methods might omit. However, the efficient market hypothesis (Malkiel, 2003) and the random
walk nature of stock prices make it challenging to
predict exact future prices with high accuracy (Adam
et al., 2016). As a result, research efforts have shifted
towards the more robust task of anticipating stock
movements (Jiang, 2021).
Early works (Roondiwala et al., 2017; Bao et al.,
2017) commonly adopt deep learning techniques to
extract temporal features from historical stock data
and predict stock movements accordingly. However,
these methods assume independence between stocks,
neglecting their rich connections. In reality, stocks are often interrelated, and valuable information can be derived from these interrelations. These complicated relations be-
tween stocks are crucial for understanding the stock
markets (Deng et al., 2019; Feng et al., 2019b; Feng
et al., 2022).
To bridge this gap, some deep learning models at-
tempt to model the interconnections between stocks
by integrating textual data (Sawhney et al., 2020),
such as tweets (Xu and Cohen, 2018) and news (Li
et al., 2020b). Nevertheless, these models heavily rely
on the quality of embedded extra information, result-
ing in highly volatile performance. Meanwhile, the
transformer-based methods introduce different atten-
tion mechanisms to capture inter-stock relations based
on multiple time series (i.e., time series of stock indi-
cators, such as open price, close price, highest price,
lowest price, and trading volume) (Yoo et al., 2021;
Ding et al., 2021). Despite this advancement, these
methods often lack explicit modeling of temporal in-
formation of these time series, such as temporal order
and inter-series information (Zhu et al., 2021; Wen
et al., 2022).
Recently, Graph Neural Networks (GNNs) have
shown promising performance in analyzing various
real-world networks or systems by formulating them
as graph-structured data, such as transaction net-
works (Pareja et al., 2020), traffic networks (Wang
et al., 2020), and communication networks (Li et al.,
2020a). Typically, these networks possess multiple
entities interacting over time, and time series data
can characterize each entity. Analyzing stock markets
as complex networks is a natural choice, as previous
works indicate (Liu and Arunkumar, 2019; Shahzad
et al., 2018). Moreover, various interactive mech-
anisms (e.g., transmitters and receivers (Shahzad
et al., 2018)) that exist between stocks can be eas-
ily represented by edges (Cont and Bouchaud, 2000).
Therefore, these properties make GNNs powerful
candidates for explicitly grasping inter-stock rela-
tions and capturing intra-stock patterns with stock
graphs (Sawhney et al., 2021a; Xiang et al., 2022).
However, existing GNN-based models face two
fundamental challenges for stock movement predic-
tion: representing complicated time-evolving inter-
stock dependencies and capturing hierarchical fea-
tures of stocks. First, specific groups of related stocks
are affected by various factors, which change stochas-
tically over time (Huynh et al., 2023). Most graph-
based models (Kim et al., 2019; Ye et al., 2021;
Sawhney et al., 2021b) construct time-invariant stock
graphs, which are contrary to the stochastic and time-
evolving nature of the stock market (Adam et al.,
2016). For instance, inter-stock relations are com-
monly pre-determined by sector or firm-specific rela-
tionships (e.g., belonging to the same industry (Sawh-
ney et al., 2021b) or sharing the same CEO (Kim
et al., 2019)). Besides, artificially defined graphs for
specific tasks may not be versatile or applicable to
other tasks. Sticking to rigid graphs risks introduc-
ing noise and task-irrelevant patterns to models (Chen
et al., 2020). Therefore, generating appropriate stock
graphs and learning task-relevant topology remains a
preliminary yet critical part of GNN-based methods in
predicting stock movements. Second, stocks possess
distinctive hierarchical features (Mantegna, 1999;
Sawhney et al., 2021b) that remain under-exploited
(e.g., overall market trends, group-specific dynamics,
and individual trading patterns (Huynh et al., 2023)).
Previous works indicate that these hierarchical intra-
stock features could distinguish highly related stocks
from different levels and be utilized for learning bet-
ter and more robust representations (Huynh et al.,
2023; Mantegna, 1999). However, in the conventional
GNN-based methods, representation learning is com-
bined with the message-passing process between im-
mediate neighbors in the Euclidean space. As a re-
sult, node representations become overly similar as
the message passes, severely distorting the distinc-
tive individual node information (Huang et al., 2020;
Rusch et al., 2023; Liu et al., 2020). Hence, preserv-
ing these hierarchical intra-stock features is necessary
for GNN-based methods in predicting stock move-
ments.
In this paper, we propose the Decoupled Graph
Diffusion Neural Network (DGDNN) to address the
abovementioned challenges. Overall, we treat stock
movement prediction as a temporal node classifica-
tion task and optimize the model toward identifying
movements (classes) of stocks (nodes) on the next
trading day. The main contributions of this paper are
summarised as follows:
• We exploit the information entropy of nodes as their pair-wise connectivities, with ratios of node energy as weights, enabling the modeling of intrinsic time-varying relations between stocks from the view of information propagation.
• We extend the layer-wise update rule of conventional GNNs to a decoupled graph diffusion process. This allows for learning the task-optimal graph topology and capturing the hierarchical features of multiple stocks.
• We conduct extensive experiments on real-world stock datasets with 2,893 stocks from three markets (NASDAQ, NYSE, and SSE). The experimental results demonstrate that DGDNN significantly outperforms state-of-the-art baselines in predicting the next trading day movement, with improvements of 9.06% in classification accuracy, 0.09 in Matthews correlation coefficient, and 0.06 in F1-Score.
2 RELATED WORK
This section provides a brief overview of relevant
studies.
2.1 GNN-Based Methods for Modeling
Multiple Stocks
The major advantage of applying GNNs lies in
their graphical structure, which allows for explic-
itly modeling the relations between entities. For in-
stance, STHAN-SR (Sawhney et al., 2021a), which
is similar to the Graph Attention Neural Networks
(GATs) (Veličković et al., 2018), adopts a spatial-
temporal attention mechanism on a hypergraph with
industry and corporate edges to capture inter-stock re-
lations on the temporal domain and spatial domain.
HATS (Kim et al., 2019) predicts stock movements with a GAT-based method in which immediate neighbor nodes are selectively aggregated with learned weights on manually crafted multi-relational stock graphs.
Moreover, HyperStockGAT (Sawhney et al., 2021b)
leverages graph learning in hyperbolic space to cap-
ture the heterogeneity of node degree and hierarchical
nature of stocks on an industry-related stock graph.
This method illustrates that the node degree of stock
graphs is not evenly distributed. Nonetheless, these
methods directly correlate the stocks by empirical
assumptions or expert knowledge to construct static
stock graphs, contradicting the time-varying nature of
the stock market.
2.2 Graph Topology Learning
To address the constraint of GNNs relying on the
quality of raw graphs, researchers have proposed
graph structure learning to optimize raw graphs for
improved performance in downstream tasks. These
methods can be broadly categorized into direct pa-
rameterizing approaches and neural network ap-
proaches. In the former category, methods treat the
adjacency matrix of the target graph as free param-
eters to learn. Pro-GNN, for instance, demonstrates
that refined graphs can gain robustness by learning
perturbed raw graphs guided by critical properties of
raw graphs (Jin et al., 2020). GLNN (Gao et al.,
2020) integrates sparsity, feature smoothness, and ini-
tial connectivity into an objective function to obtain
target graphs. In contrast, neural network approaches
employ more complex neural networks to model edge
weights based on node features and representations.
For example, SLCNN utilizes two types of convolu-
tional neural networks to learn the graph structure at
both the global and local levels (Zhang et al., 2020).
GLCN integrates graph learning and convolutional
neural networks to discover the optimal graph struc-
ture that best serves downstream tasks (Jiang et al.,
2019). Despite these advancements, direct parameter-
izing approaches often necessitate complex and time-
consuming alternating optimizations or bi-level opti-
mizations, and neural network approaches may over-
look the unique characteristics of graph data or lose
the positional information of nodes.
2.3 Decoupled Representation Learning
Various networks or systems exhibit unique character-
istics that are challenging to capture within the con-
straints of Euclidean space, particularly when rely-
ing on manually assumed prior knowledge (Huynh
et al., 2023; Sawhney et al., 2021b). In addressing
this challenge, DAGNN (Liu et al., 2020) offers the-
oretical insights, emphasizing that the entanglement
between representation transformation and message
propagation can hinder the performance of message-
passing GNNs. SHADOW-GNN (Zeng et al., 2021),
on the other hand, concentrates on decoupling the
representation learning process both in depth and
scope. By learning on multiple subgraphs with arbi-
trary depth, SHADOW-GNN preserves the distinctive
information of localized subgraphs instead of globally
smoothing them into white noise. Another approach,
MMP (Chen et al., 2022), transforms updated node
messages into self-embedded representations. It then
selectively aggregates these representations to form
the final graph representation, deviating from the di-
rect use of representations from the message-passing
process.
3 PRELIMINARY
In this section, we present the fundamental notations
used throughout this paper and details of the problem
setting. Nodes represent stocks, node features repre-
sent their historical stock indicators, and edges repre-
sent interconnections between stocks.
3.1 Notation
Let $G_t(\mathcal{V}, \mathcal{E}_t)$ represent a weighted and directed graph on trading day $t$, where $\mathcal{V}$ is the set of nodes (stocks) $\{v_1, \dots, v_N\}$ with the number of nodes $|\mathcal{V}| = N$, and $\mathcal{E}_t$ is the set of edges (inter-stock relations). Let $A_t \in \mathbb{R}^{N \times N}$ represent the adjacency matrix, whose entry $(A_t)_{i,j}$ represents an edge from $v_i$ to $v_j$. The node feature matrix is denoted as $X_t \in \mathbb{R}^{N \times (\tau M)}$, where $M$ represents the number of stock indicators (i.e., open price, close price, highest price, lowest price, trading volume, etc.) and $\tau$ represents the length of the historical lookback window. The feature vector of $v_i$ on trading day $t$ is denoted as $x_{t,i}$. Let $c_{t,i}$ represent the label of $v_i$ on trading day $t$, where $C_t \in \mathbb{R}^{N \times 1}$ is the label matrix on trading day $t$.
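To make the notation concrete, the following NumPy sketch builds $X_t$ and $C_t$ for a toy market. The array sizes and random values are illustrative assumptions, not the datasets used in this paper.

```python
import numpy as np

N, M, tau = 4, 5, 20  # toy values: stocks, indicators, lookback window

# Indicator history: a (tau, M) window per stock, e.g. open, close,
# high, low, and volume over the last tau trading days.
history = np.random.rand(N, tau, M)

# Node feature matrix X_t in R^{N x (tau M)}: one flattened window per node.
X_t = history.reshape(N, tau * M)

# Label matrix C_t in R^{N x 1}: binary movement class per stock.
C_t = np.random.randint(0, 2, size=(N, 1))

print(X_t.shape, C_t.shape)  # (4, 100) (4, 1)
```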
Figure 1: The DGDNN framework consists of three steps: (1) constructing the raw stock graph $G_t$ (see Section 4.1); (2) learning the task-optimal graph topology by generalized graph diffusion (see Section 4.2); (3) applying a hierarchical decoupled representation learning scheme (see Section 4.3).
3.2 Problem Setting
Since we are predicting future trends of multiple
stocks by utilizing their corresponding stock indica-
tors, we transform the regression task of predicting
exact stock prices into a temporal node classifica-
tion task. Similar to previous works on stock move-
ment prediction (Kim et al., 2019; Xiang et al., 2022;
Sawhney et al., 2021a; Xu and Cohen, 2018; Li et al.,
2021), we refer to this common and important task
as next trading day stock trend classification. Given
a set of stocks on the trading day t, the model learns
from a historical lookback window of length $\tau$ (i.e., $[t-\tau+1, t]$) and predicts their labels at the next timestamp (i.e., trading day $t+1$). The mapping relationship of this work is expressed as follows:

$$f(G_t(\mathcal{V}, \mathcal{E}_t)) \rightarrow C_{t+1}. \quad (1)$$

Here, $f(\cdot)$ represents the proposed method, DGDNN.
4 METHODOLOGY
In this section, we detail the framework of the pro-
posed DGDNN in depth, as depicted in Fig. 1.
4.1 Entropy-Driven Edge Generation
Defining the graph structure is crucial for achieving
reasonable performance for GNN-based approaches.
In terms of stock graphs, traditional methods often es-
tablish static relations between stocks through human
labeling or natural language processing techniques.
However, recent practices have proven that generating
dynamic relations based on historical stock indicators
is more effective (Li et al., 2021; Xiang et al., 2022).
These indicators, as suggested by previous financial
studies (Dessaint et al., 2019; Cont and Bouchaud,
2000; Liu and Arunkumar, 2019), can be treated as
noisy temporal signals. Simultaneously, stocks can
be viewed as transmitters or receivers of informa-
tion signals, influencing other stocks (Shahzad et al.,
2018; Ferrer et al., 2018). Additionally, stock markets
exhibit significant node-degree heterogeneity, with
highly influential stocks having relatively large node
degrees (Sawhney et al., 2021b; Arora et al., 2006).
Consequently, we propose to model interdepen-
dencies between stocks by treating the stock market as
a communication network. Prior research (Yue et al.,
2020) generates the asymmetric inter-stock relations
based on transfer entropy. Nonetheless, the complex
estimation process of transfer entropy and the limited
consideration of edge weights hamper the approxima-
tion of the intrinsic inter-stock connections.
To this end, we quantify the links between nodes
by utilizing the information entropy as the directional
connectivity and signal energy as its intensity. On the
one hand, if the information can propagate between
entities within real-world systems, the uncertainty or
randomness is reduced, resulting in a decrease in en-
tropy and an increase in predictability at the receiving
entities (Jaynes, 1957; Csiszár et al., 2004). On the
other hand, the energy of the signals reflects their in-
tensity during propagation, which can influence the
received information at the receiver. The entry $(A_t)_{i,j}$ is defined by

$$(A_t)_{i,j} = \frac{E(x_{t,i})}{E(x_{t,j})}\left(e^{\,S(x_{t,i}) + S(x_{t,j}) - S(x_{t,i},\, x_{t,j})} - 1\right). \quad (2)$$
Here, $E(\cdot)$ denotes the signal energy and $S(\cdot)$ denotes the information entropy. The signal energy of $v_i$ is obtained by

$$E(x_{t,i}) = \sum_{n=0}^{\tau M - 1} \left|x_{t,i}[n]\right|^2. \quad (3)$$
The information entropy of $v_i$ is obtained by

$$S(x_{t,i}) = -\sum_{j} p(s_j) \ln p(s_j), \quad (4)$$

where $\{s_0, \dots, s_j\}$ denotes the non-repeating sequence of $x_{t,i}$ and $p(s_j)$ represents the probability of value $s_j$. By definition, we can obtain $p(s_j)$ by

$$p(s_j) = \frac{1}{\tau M}\sum_{n=0}^{\tau M - 1} \delta\!\left(s_j - x_{t,i}[n]\right). \quad (5)$$

Here, $\delta(\cdot)$ denotes the Dirac delta function.
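The following NumPy sketch illustrates Eqs. (2)-(5) for a single pair of nodes. It assumes the indicator windows have already been discretized into finitely many values (the non-repeating sequence $\{s_j\}$); the function names and toy inputs are ours, not the authors' code.

```python
import numpy as np

def signal_energy(x):
    # Eq. (3): sum of squared signal magnitudes.
    return np.sum(np.abs(x) ** 2)

def entropy(x):
    # Eqs. (4)-(5): empirical Shannon entropy over the distinct values of x.
    _, counts = np.unique(x, return_counts=True)
    p = counts / x.size
    return -np.sum(p * np.log(p))

def joint_entropy(x_i, x_j):
    # Empirical entropy of the paired sequence (x_i[n], x_j[n]).
    _, counts = np.unique(np.stack([x_i, x_j], axis=1), axis=0, return_counts=True)
    p = counts / x_i.size
    return -np.sum(p * np.log(p))

def adjacency_entry(x_i, x_j):
    # Eq. (2): energy ratio as intensity, exponentiated mutual information
    # S(x_i) + S(x_j) - S(x_i, x_j) as directional connectivity.
    mi = entropy(x_i) + entropy(x_j) - joint_entropy(x_i, x_j)
    return signal_energy(x_i) / signal_energy(x_j) * np.expm1(mi)

# Toy discretized indicator windows for two stocks (tau * M = 6 values each).
x_i = np.array([1, 2, 2, 3, 1, 2])
x_j = np.array([2, 2, 3, 3, 1, 1])
print(adjacency_entry(x_i, x_j))
```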
4.2 Generalized Graph Diffusion
However, simply assuming constructed graphs are
perfect for performing specific tasks can lead to dis-
cordance between given graphs and task objectives,
resulting in sub-optimal model performance (Chen
et al., 2020). Several methods have been proposed
to mitigate this issue, including AdaEdge (Chen et al.,
2020) and DropEdge (Rong et al., 2019). These meth-
ods demonstrate notable improvements in node clas-
sification tasks by adding or removing edges to per-
turb graph topologies, enabling models to capture and
leverage critical topological information.
With this in mind, we propose to utilize a general-
ized diffusion process on the constructed stock graph
to learn the task-optimal topology. It enables more ef-
fective capture of long-range dependencies and global
information on the graph by diffusing information
across larger neighborhoods (Klicpera et al., 2019).
The following equation defines the generalized graph diffusion at layer $l$:

$$Q_l = \sum_{k=0}^{K-1} \theta_{l,k} T_{l,k}, \qquad \sum_{k=0}^{K-1} \theta_{l,k} = 1. \quad (6)$$

Here, $Q_l$ denotes the diffusion matrix, $K$ denotes the maximum diffusion step, $\theta_{l,k}$ denotes the weight coefficients, and $T_{l,k}$ denotes the column-stochastic transition matrix. Specifically, generalized graph diffusion transforms the given graph structure into a new one while keeping node signals neither amplified nor reduced. Consequently, the generalized graph diffusion turns the information exchange solely between adjacent connected nodes into broader, initially unconnected areas of the graph.
Notably, $\theta_{l,k}$ and $T_{l,k}$ can be determined in advance (Klicpera et al., 2019). For instance, we can use the heat kernel or the personalized PageRank to define $\theta_{l,k}$, and the random walk transition matrix or symmetric transition matrix to define $T_{l,k}$. Although these pre-defined mappings perform well on some datasets (e.g., CORA, CiteSeer, and PubMed) with time-invariant relations (Zhao et al., 2021), they are not feasible for tasks that require considering changing relationships.
Therefore, we make $\theta_{l,k}$ trainable parameters, $T_{l,k}$ trainable matrices, and $K$ a hyperparameter. Furthermore, we introduce a neighborhood radius (Zhao et al., 2021) to control the effectiveness of the generalized graph diffusion. The neighborhood radius at layer $l$ is expressed as

$$r_l = \frac{\sum_{k=0}^{K-1} \theta_{l,k}\, k}{\sum_{k=0}^{K-1} \theta_{l,k}}, \qquad r_l > 0. \quad (7)$$
Figure 2: The component-wise layout of hierarchical decoupled representation learning with input $X_t$, $A_t$.
Here, a large $r_l$ indicates that the model explores more distant nodes, and vice versa.
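Below is a minimal PyTorch sketch of Eqs. (6)-(7) under our reading of the text: $\theta_{l,k}$ and $T_{l,k}$ are trainable, with each $T_{l,k}$ kept column-stochastic by a softmax over its columns. Note that the softmax over $\theta$ enforces $\sum_k \theta_{l,k} = 1$ exactly, whereas the paper treats it as a soft penalty in Eq. (9); class and parameter names are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneralizedGraphDiffusion(nn.Module):
    # Sketch of Eq. (6): Q_l = sum_k theta_{l,k} T_{l,k}, with trainable
    # theta_{l,k} and T_{l,k}; K is the maximum diffusion step.
    def __init__(self, num_nodes, max_step):
        super().__init__()
        self.theta_logits = nn.Parameter(torch.zeros(max_step))
        self.T_logits = nn.Parameter(torch.randn(max_step, num_nodes, num_nodes))

    def forward(self):
        theta = F.softmax(self.theta_logits, dim=0)  # enforces sum_k theta_k = 1
        T = F.softmax(self.T_logits, dim=1)          # each T_k column-stochastic
        Q = torch.einsum('k,kij->ij', theta, T)      # diffusion matrix Q_l
        k = torch.arange(theta.numel(), dtype=theta.dtype)
        r = (theta * k).sum() / theta.sum()          # neighborhood radius, Eq. (7)
        return Q, r

layer = GeneralizedGraphDiffusion(num_nodes=6, max_step=4)
Q, r = layer()
print(Q.shape, float(r))  # columns of Q sum to 1 (convex mix of the T_k)
```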
4.3 Hierarchical Decoupled
Representation Learning
Theoretically, GNNs update nodes by continuously
aggregating direct one-hop neighbors, producing the
final representation. However, this can lead to a high
distortion of the learned representation. This is probably because the message-passing and representation
transformation do not essentially share a fixed neigh-
borhood in the Euclidean space (Liu et al., 2020; Xu
et al., 2018; Chen et al., 2020). To address this is-
sue, decoupled GNNs have been proposed (Liu et al.,
2020; Xu et al., 2018), aiming to decouple these two
processes and prevent the loss of distinctive local in-
formation in learned representation. Similarly, meth-
ods such as HyperStockGAT (Sawhney et al., 2021b)
have explored learning graph representations in hy-
perbolic spaces with attention mechanisms to capture
temporal features of stocks at different levels.
Inspired by these methods, we adopt a hierarchi-
cal decoupled representation learning strategy to cap-
ture hierarchical intra-stock features. Each layer in
DGDNN comprises a Generalized Graph Diffusion
layer and a Cat Attention layer in parallel, as depicted
in Fig. 2. The layer-wise update rule is defined by,
$$H_l = \sigma\!\left((Q_l A_t)\, H_{l-1} W_l^0\right), \qquad H'_l = \sigma\!\left(\zeta\!\left(H_l \,\|\, H'_{l-1}\right) W_l^1 + b_l^1\right). \quad (8)$$

Here, $H_l$ denotes the node representation of the $l$-th layer, $\sigma(\cdot)$ is the activation function, $\zeta(\cdot)$ denotes the multi-head attention, $\|$ denotes concatenation, and $W_l^0$, $W_l^1$ denote the layer-wise trainable weight matrices.
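The following PyTorch sketch mirrors our reading of Eq. (8): a propagation branch applies the diffused adjacency, and a parallel transformation branch applies multi-head attention over the concatenated representations. The hidden size, activation choice, and attention placement are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DecoupledLayer(nn.Module):
    # Propagation:    H_l  = sigma((Q_l A_t) H_{l-1} W_l^0)
    # Transformation: H'_l = sigma(zeta(H_l || H'_{l-1}) W_l^1 + b_l^1)
    def __init__(self, dim, heads=3):
        super().__init__()
        self.W0 = nn.Linear(dim, dim, bias=False)
        self.attn = nn.MultiheadAttention(2 * dim, heads, batch_first=True)
        self.W1 = nn.Linear(2 * dim, dim)
        self.act = nn.ReLU()

    def forward(self, Q, A, H_prev, Hp_prev):
        H = self.act(self.W0((Q @ A) @ H_prev))           # message propagation
        Z = torch.cat([H, Hp_prev], dim=-1).unsqueeze(0)  # H_l || H'_{l-1}
        Z, _ = self.attn(Z, Z, Z)                         # zeta: multi-head attention
        Hp = self.act(self.W1(Z.squeeze(0)))              # representation transform
        return H, Hp

N, d = 6, 24                                  # 2*d = 48 is divisible by 3 heads
layer = DecoupledLayer(d)
Q = torch.softmax(torch.randn(N, N), dim=0)   # toy column-stochastic diffusion
A = torch.rand(N, N)                          # toy adjacency from Eq. (2)
H, Hp = layer(Q, A, torch.randn(N, d), torch.randn(N, d))
print(H.shape, Hp.shape)                      # both (6, 24)
```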
4.4 Objective Function
According to Eq. 1, Eq. 6, and Eq. 7, we formulate the objective function of DGDNN as follows:

$$J = \frac{1}{B}\sum_{t=0}^{B-1} L_{CE}\!\left(C_{t+1}, f(X_t, A_t)\right) - \alpha \sum_{l=0}^{L-1} r_l + \sum_{l=0}^{L-1}\left(\sum_{k=0}^{K-1} \theta_{l,k} - 1\right). \quad (9)$$

Here, $L_{CE}(\cdot)$ denotes the cross-entropy loss, $B$ denotes the batch size, $L$ denotes the number of information propagation layers, and $\alpha$ denotes the weight coefficient controlling the neighborhood radius.
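A sketch of the objective in Eq. (9) as reconstructed above; the tensor shapes and per-day batching are assumptions, and the simplex term is implemented literally as the soft penalty written in the equation.

```python
import torch
import torch.nn.functional as F

def dgdnn_objective(logits_batch, labels_batch, thetas, alpha):
    # logits_batch: (B, N, classes) predictions f(X_t, A_t) for each day t.
    # labels_batch: (B, N) next-day movement labels C_{t+1}.
    # thetas:       (L, K) diffusion weights theta_{l,k} for every layer.
    B = logits_batch.shape[0]
    ce = sum(F.cross_entropy(logits_batch[t], labels_batch[t])
             for t in range(B)) / B                     # mean L_CE over the batch
    k = torch.arange(thetas.shape[1], dtype=thetas.dtype)
    r = (thetas * k).sum(dim=1) / thetas.sum(dim=1)     # r_l per layer, Eq. (7)
    constraint = (thetas.sum(dim=1) - 1.0).sum()        # soft simplex penalty
    return ce - alpha * r.sum() + constraint

loss = dgdnn_objective(torch.randn(2, 6, 2),
                       torch.randint(0, 2, (2, 6)),
                       torch.rand(3, 4), alpha=2.9e-3)
print(float(loss))
```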
5 EXPERIMENT
The experiments are conducted on 3× Nvidia Tesla T4 GPUs with CUDA version 11.2. Datasets and source code are available at https://github.com/pixelhero98/DGDNN.
5.1 Dataset
Following previous works (Kim et al., 2019; Xiang
et al., 2022; Sawhney et al., 2021a; Li et al., 2021), we
evaluate DGDNN on three real-world datasets from
two US stock markets (NASDAQ and NYSE) and one Chinese stock market (SSE). We collect historical stock
indicators from Yahoo Finance and Google Finance
for all the selected stocks. We choose the stocks that
span the S&P 500 and NASDAQ composite indices
for the NASDAQ dataset. We select the stocks that
span the Dow Jones Industrial Average, S&P 500, and
NYSE composite indices for the NYSE dataset. We
choose the stocks that compose the SSE 180 for the
SSE dataset. The details of the three datasets are pre-
sented in Table 1.
5.2 Model Setting
Based on grid search, hyperparameters are selected using sensitivity analysis over the validation period (see Section 5.6). For NASDAQ, we set $\alpha = 2.9 \times 10^{-3}$, $\tau = 19$, $K = 9$, and $L = 8$. For NYSE, we set $\alpha = 2.7 \times 10^{-3}$, $\tau = 22$, $K = 10$, and $L = 9$. For SSE, we set $\alpha = 8.6 \times 10^{-3}$, $\tau = 14$, $K = 3$, and $L = 5$. The training epoch is set to 1200. Adam is the optimizer, with a learning rate of $2 \times 10^{-4}$ and a weight decay of $1.5 \times 10^{-5}$. The number of layers of the Multi-Layer Perceptron is set to 3, the number of heads of the Cat Attention layers is set to 3, the embedding dimension is set to 128, and full-batch training is selected.
5.3 Baseline
To evaluate the performance of the proposed model,
we compared DGDNN with the following baseline
approaches:
Table 1: Statistics of NASDAQ, NYSE, and SSE.

                          NASDAQ           NYSE             SSE
Train Period              05/2016-06/2017  05/2016-06/2017  05/2016-06/2017
Validation Period         07/2017-12/2017  07/2017-12/2017  07/2017-12/2017
Test Period               01/2018-12/2019  01/2018-12/2019  01/2018-12/2019
# Days Tr:Val:Test        252:64:489       252:64:489       299:128:503
# Stocks                  1026             1737             130
# Stock Indicators        5                5                4
# Labels per trading day  2                2                2
5.3.1 RNN-Based Baseline
• DA-RNN (Qin et al., 2017). A dual-stage
attention-based RNN model with an encoder-
decoder structure. The encoder utilizes an atten-
tion mechanism to extract the input time-series
feature, and the decoder utilizes a temporal atten-
tion mechanism to capture the long-range tempo-
ral relationships among the encoded series.
• Adv-ALSTM (Feng et al., 2019a). An LSTM-
based model that leverages adversarial training to
improve the generalization ability of the stochas-
ticity of price data and a temporal attention mech-
anism to capture the long-term dependencies in
the price data.
5.3.2 Transformer-Based Baseline
• HMG-TF (Ding et al., 2021). A transformer
method for modeling long-term dependencies of
financial time series. The model proposes multi-
scale Gaussian priors to enhance the locality, or-
thogonal regularization to avoid learning redun-
dant heads in multi-head attention, and trading
gap splitter to learn the hierarchical features of
high-frequency data.
• DTML (Yoo et al., 2021). A multi-level context-based transformer model that learns the correlations between stocks and temporal correlations in an end-to-end way.
Table 2: ACC, MCC, and F1-Score of the proposed DGDNN and other baselines on next trading day stock trend classification over the test period. Bold numbers denote the best results.

NASDAQ:
Method                                     ACC(%)       MCC                F1-Score
DA-RNN (Qin et al., 2017)                  57.59±0.36   0.05±1.47×10⁻³     0.56±0.01
Adv-ALSTM (Feng et al., 2019a)             51.16±0.42   0.04±3.88×10⁻³     0.53±0.02
HMG-TF (Ding et al., 2021)                 57.18±0.17   0.11±1.64×10⁻³     0.59±0.01
DTML (Yoo et al., 2021)                    58.27±0.79   0.07±2.75×10⁻³     0.58±0.01
HATS (Kim et al., 2019)                    51.43±0.49   0.01±5.66×10⁻³     0.48±0.01
STHAN-SR (Sawhney et al., 2021a)           55.18±0.34   0.03±4.11×10⁻³     0.56±0.01
GraphWaveNet (Wu et al., 2019)             59.57±0.27   0.07±2.12×10⁻³     0.60±0.02
HyperStockGAT (Sawhney et al., 2021b)      58.23±0.68   0.06±1.23×10⁻³     0.59±0.02
DGDNN                                      65.07±0.25   0.20±2.33×10⁻³     0.63±0.01

NYSE:
Method                                     ACC(%)       MCC                F1-Score
DA-RNN (Qin et al., 2017)                  56.97±0.13   0.06±1.12×10⁻³     0.57±0.02
Adv-ALSTM (Feng et al., 2019a)             53.42±0.30   0.05±2.30×10⁻³     0.53±0.02
HMG-TF (Ding et al., 2021)                 58.49±0.12   0.09±2.03×10⁻³     0.59±0.02
DTML (Yoo et al., 2021)                    59.17±0.25   0.07±3.07×10⁻³     0.60±0.01
HATS (Kim et al., 2019)                    52.05±0.82   0.02±7.42×10⁻³     0.50±0.03
STHAN-SR (Sawhney et al., 2021a)           54.24±0.50   0.01±5.73×10⁻³     0.58±0.02
GraphWaveNet (Wu et al., 2019)             58.11±0.66   0.05±2.21×10⁻³     0.59±0.02
HyperStockGAT (Sawhney et al., 2021b)      59.34±0.19   0.04±5.73×10⁻³     0.61±0.02
DGDNN                                      66.16±0.14   0.14±1.67×10⁻³     0.65±0.01

SSE:
Method                                     ACC(%)       MCC                F1-Score
DA-RNN (Qin et al., 2017)                  56.19±0.23   0.04±1.24×10⁻³     0.52±0.02
Adv-ALSTM (Feng et al., 2019a)             52.41±0.56   0.03±6.01×10⁻³     0.51±0.01
HMG-TF (Ding et al., 2021)                 58.88±0.20   0.12±1.71×10⁻³     0.59±0.01
DTML (Yoo et al., 2021)                    59.25±0.38   0.11±4.79×10⁻³     0.59±0.02
HATS (Kim et al., 2019)                    53.72±0.59   0.02±3.80×10⁻³     0.49±0.01
STHAN-SR (Sawhney et al., 2021a)           55.01±0.11   0.03±3.09×10⁻³     0.57±0.01
GraphWaveNet (Wu et al., 2019)             60.78±0.23   0.06±1.93×10⁻³     0.57±0.01
HyperStockGAT (Sawhney et al., 2021b)      57.36±0.10   0.09±1.21×10⁻³     0.58±0.02
DGDNN                                      64.30±0.32   0.19±4.33×10⁻³     0.64±0.02

5.3.3 GNN-Based Baseline

• HATS (Kim et al., 2019). A GNN-based model with a hierarchical graph attention mechanism. It utilizes LSTM and GRU layers to extract the temporal features as the node representation, and message-passing is achieved by selectively aggregating the representations of directly adjacent nodes according to their edge type at each level.
• STHAN-SR (Sawhney et al., 2021a). A GNN-
based model operated on a hypergraph with two
types of hyperedges: industrial hyperedges and
Wikidata corporate hyperedges. The node fea-
tures are generated by temporal Hawkes attention,
and weights of hyperedges are generated by hy-
pergraph attention. The spatial hypergraph con-
volution achieves representation and information-
spreading.
• GraphWaveNet (Wu et al., 2019). A spatial-
temporal graph modeling method that captures the
spatial-temporal dependencies between multiple
time series by combining graph convolution with
dilated causal convolution.
• HyperStockGAT (Sawhney et al., 2021b). A
graph attention network utilizing the hyperbolic
graph representation learning on Riemannian
manifolds to predict the rankings of stocks on the
next trading day based on profitability.
5.4 Evaluation Metric
Following approaches taken in previous works (Kim
et al., 2019; Xiang et al., 2022; Deng et al., 2019;
Sawhney et al., 2021a; Sawhney et al., 2021b), F1-
Score, Matthews Correlation Coefficient (MCC), and
Classification Accuracy (ACC) are utilized to evalu-
ate the performance of the models.
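For reference, all three metrics are available in scikit-learn; the following is a minimal usage sketch with dummy labels, not the paper's evaluation code.

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # next-day movement labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

print(f"ACC: {accuracy_score(y_true, y_pred):.2%}")
print(f"MCC: {matthews_corrcoef(y_true, y_pred):.2f}")
print(f"F1:  {f1_score(y_true, y_pred):.2f}")
```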
5.5 Evaluation Result
The experimental results are presented in Table 2. Our
model outperforms baseline methods regarding ACC,
MCC, and F1-score over three datasets. Specifically,
DGDNN exhibits average improvements of 10.78%
in ACC, 0.13 in MCC, and 0.10 in F1-Score compared
to RNN-based baseline methods. In comparison to
Transformer-based methods, DGDNN shows average
improvements of 7.78% in ACC, 0.07 in MCC, and
0.05 in F1-Score. Furthermore, when contrasted with
GNN-based models, DGDNN achieves average im-
provements of 7.16% in ACC, 0.12 in MCC, and 0.07
in F1-Score.
We can make the following observations based on the experimental results. First, models such as GraphWaveNet, DTML, HMG-TF, DA-RNN, and DGDNN, which learn the interdependencies between entities during training, perform better on most metrics than methods (HATS, STHAN-SR, HyperStockGAT, and Adv-ALSTM) that use pre-defined relationships (e.g., industry and corporate edges) or do not consider dependencies between entities.
Second, regarding the GNN-based models, Hyper-
StockGAT and DGDNN, which learn the graph rep-
resentations in different latent spaces, perform bet-
ter than those (STHAN-SR and HATS) in Euclidean
space.
Fig. 3 presents visualizations of diffusion matri-
ces across three consecutive trading days, with col-
ors representing normalized weights. We make the
following three observations. First, stocks from con-
secutive trading days do not necessarily exhibit sim-
ilar patterns in terms of information diffusion. The
distributions of edge weights change rapidly between
Fig. 3a and Fig. 3b, and between Fig. 3e and Fig. 3f.
Second, shallow layers tend to disseminate informa-
tion across a broader neighborhood. A larger number
of entries in the diffusion matrices are not zero and are
distributed across the matrices in Fig 3a to Fig. 3c. In
contrast, deeper layers tend to focus on specific local areas, with entries of larger absolute value more centralized (Fig. 3d to Fig. 3f). Third, even
though the initial patterns from consecutive test trad-
ing days are similar (as shown in Fig. 3b and Fig. 3c),
differences in local structures result in distinctive pat-
terns as the layers deepen (Fig. 3e and Fig. 3f), i.e.,
the weights of edges can show similar distributions
globally, but local areas exhibit different patterns. For
Figure 3: Example normalized color maps of diffusion matrices from different layers on the NYSE dataset ($t$ = 03/06/2016). Panels (a)-(c) show $Q_0$ and panels (d)-(f) show $Q_{L-1}$ on trading days $t-2$, $t-1$, and $t$.
instance, in Fig. 3f, some dark blue clusters are distin-
guished from light blue clusters in shape and weight,
which might be crucial local graph structures.
These results suggest that the complex relation-
ships between stocks are not static but evolve rapidly
over time, and the domain knowledge does not suf-
ficiently describe the intrinsic interdependencies be-
tween multiple entities. The manually crafted fixed
stock graph assumes that the stocks of the same
class are connected (Livingston, 1977), neglecting the
possibility that stocks move to different classes over time. Besides, some stocks are more criti-
cal than others in exhibiting the hierarchical nature
of intra-stock dynamics (Mantegna, 1999; Sawhney
et al., 2021b), which is hard to capture in Euclidean
space by directly aggregating representations as the
message-passing process does.
5.6 Hyperparameter Sensitivity
In this section, we explore the sensitivity of two im-
portant hyperparameters: the historical lookback win-
dow size τ and the maximum diffusion step K. These
hyperparameters directly affect the model’s ability to
model the relations between multiple stocks. The sen-
sitivity results of $\tau$ and $K$ are shown in Fig. 5. Based on the sensitivity results, DGDNN consistently performs better on the three datasets when the historical lookback window size $\tau \in [14, 24]$. This coincides with the 20-day (i.e., monthly) professional financial strategies (Adam et al., 2016). Moreover, the optimal $K$ of DGDNN varies considerably with different datasets. On the one hand, the model's performance generally improves as $K$ grows on the NASDAQ dataset and the NYSE dataset, reaching the optimum when $K \in \{9, 10\}$. On the other hand, the model's performance on the SSE dataset peaks when $K = 3$ and remains slightly worse as $K$ grows. Intuitively, the stock graph of the SSE dataset is smaller than those of the NASDAQ and NYSE datasets, resulting in a smaller optimal $K$.
5.7 Ablation Study
The proposed DGDNN consists of three critical com-
ponents: entropy-driven edge generation, generalized
graph diffusion, and hierarchical decoupled represen-
tation learning. We further verify the effectiveness of
each component by removing it from DGDNN. The
ablation study results are shown in Fig. 4.
Figure 4: Results of the ablation study. Blue: P1 denotes entropy-driven edge generation, P2 denotes generalized graph diffusion, and P3 denotes hierarchical decoupled representation learning. Gray dotted line: best baseline accuracy.
Figure 5: Sensitivity study of the historical lookback window length $\tau$ and the maximum diffusion step $K$ over the validation period.
5.7.1 Entropy-Driven Edge Generation
To demonstrate the effectiveness of constructing dy-
namic relations from the stock signals, we replace the
entropy-driven edge generation with the commonly
adopted industry-corporate stock graph built from Wikidata (https://www.wikidata.org/wiki/Wikidata:List_of_properties) (Feng et al., 2019b). We observe that applying the industry and corporate relationships leads to an average reduction in classification accuracy of 9.23%, reiterating the importance of considering temporally evolving dependencies between stocks. Moreover, when testing on the NYSE and SSE datasets, the degradation of model performance is slightly smaller than on the NASDAQ dataset. According to financial studies (Jiang et al., 2011; Schwert, 2002), the NASDAQ market tends to be more unstable than the other two. This might indicate that the injection of expert knowledge works better in less noisy and more stable markets.
5.7.2 Generalized Graph Diffusion
We explore the impact of utilizing the generalized
graph diffusion process. Results of the ablation study
show that DGDNN performs worse without general-
ized graph diffusion on all datasets, with classifica-
tion accuracy reduced by 10.43% on average. This
indicates that the generalized graph diffusion facil-
itates information exchange better than immediate
neighbors with invariant structures. However, the performance degradation on the SSE dataset is only about 38% of that on the NASDAQ and NYSE datasets. Since the stock graph of the SSE dataset (130 stocks) is much smaller than those of the other two (1,026 and 1,737 stocks), the graph diffusion process gains less from utilizing larger neighborhoods.
5.7.3 Hierarchical Decoupled Representation
Learning
The ablation experiments demonstrate that the model
coupling the two processes deteriorates with a reduc-
tion of classification accuracy by 9.40% on the NAS-
DAQ dataset, 8.55% on the NYSE dataset, and 5.23%
on the SSE dataset. This observation empirically val-
idates that a decoupled GNN can better capture the
hierarchical characteristic of stocks. Meanwhile, this
suggests that the representation transformation is not
necessarily aligned with information propagation in
Euclidean space. This is because different graphs exhibit various types of inter-entity patterns and intra-entity features, which do not always follow the assumption of smoothed node features (Liu et al., 2020; Xu et al., 2018; Li et al., 2018).
6 CONCLUSION
In this paper, we propose DGDNN, a novel graph
learning approach for predicting the future trends of
multiple stocks based on their historical indicators.
Traditionally, stock graphs are crafted based on do-
main knowledge (e.g., firm-specific and industrial re-
lations) or generated by alternative information (e.g.,
news and reports). To make stock graphs appropri-
ately represent complex time-variant inter-stock re-
lations, we dynamically generate raw stock graphs
from a signal processing view considering financial
theories of stock markets. Then, we propose lever-
aging the generalized graph diffusion process to opti-
mize the topologies of raw stock graphs. Eventually,
the decoupled representation learning scheme cap-
tures and preserves the hierarchical features of stocks,
which are often overlooked in prior works. The ex-
perimental results demonstrate performance improve-
ments of the proposed DGDNN over baseline meth-
ods. The ablation study results prove the effective-
ness of each module in DGDNN. Besides financial
applications, the proposed method can be easily trans-
ferred to tasks that involve multiple entities exhibit-
ing interdependent and time-evolving features. One
limitation of DGDNN is that it generates an overall
dynamic relationship from multiple stock indicators
without sufficiently considering the interplay between
them. Notwithstanding the promising results, we plan
to learn multi-relational dynamic stock graphs and al-
low information to be further diffused across different
relational stock graphs in future work.
REFERENCES
Adam, K., Marcet, A., and Nicolini, J. P. (2016). Stock mar-
ket volatility and learning. The Journal of Finance,
71(1):33–82.
Akita, R., Yoshihara, A., Matsubara, T., and Uehara, K.
(2016). Deep learning for stock prediction using nu-
merical and textual information. In IEEE/ACIS 15th
International Conference on Computer and Informa-
tion Science, pages 1–6.
Arora, N., Narayanan, B., and Paul, S. (2006). Financial
influences and scale-free networks. In Computational
Science–ICCS 2006: Lecture Notes in Computer Sci-
ence, volume 3991, pages 16–23. Springer, Berlin,
Heidelberg.
Bao, W., Yue, J., and Rao, Y. (2017). A deep learning
framework for financial time series using stacked au-
toencoders and long-short term memory. PloS one,
12(7):e0180944.
Bo, H., Wu, Y., You, Z., McConville, R., Hong, J., and Liu,
W. (2023). What will make misinformation spread: an
XAI perspective. In World Conference on Explainable
Artificial Intelligence, pages 321–337. Springer.
Chen, D., Lin, Y., Li, W., Li, P., Zhou, J., and Sun, X.
(2020). Measuring and relieving the over-smoothing
problem for graph neural networks from the topologi-
cal view. In 34th AAAI Conference on Artificial Intel-
ligence, pages 3438–3445.
Chen, J., Liu, W., and Pu, J. (2022). Memory-based mes-
sage passing: Decoupling the message for propaga-
tion from discrimination. In ICASSP 2022-2022 IEEE
International Conference on Acoustics, Speech and
Signal Processing, pages 4033–4037.
Cont, R. and Bouchaud, J.-P. (2000). Herd behavior and
aggregate fluctuations in financial markets. Macroe-
conomic Dynamics, 4(2):170–196.
Csiszár, I., Shields, P. C., et al. (2004). Information theory and statistics: A tutorial. Foundations and Trends® in Communications and Information Theory, 1(4):417–528.
Deng, S., Zhang, N., Zhang, W., Chen, J., Pan, J. Z., and
Chen, H. (2019). Knowledge-driven stock trend pre-
diction and explanation via temporal convolutional
network. In Companion Proceedings of The 2019
World Wide Web Conference, pages 678–685.
Dessaint, O., Foucault, T., Frésard, L., and Matray, A. (2019). Noisy stock prices and corporate investment. The Review of Financial Studies, 32(7):2625–2672.
Ding, Q., Wu, S., Sun, H., Guo, J., and Guo, J. (2021). Hi-
erarchical multi-scale gaussian transformer for stock
movement prediction. In 29th International Joint
Conference on Artificial Intelligence (IJCAI), pages
4640–4646.
Feng, F., Chen, H., He, X., Ding, J., Sun, M., and Chua,
T.-S. (2019a). Enhancing stock movement prediction
with adversarial training. In 28th International Joint
Conference on Artificial Intelligence (IJCAI), pages
5843–5849.
Feng, F., He, X., Wang, X., Luo, C., Liu, Y., and Chua, T.-
S. (2019b). Temporal relational ranking for stock pre-
diction. ACM Transactions on Information Systems,
37(2):1–30.
Feng, S., Xu, C., Zuo, Y., Chen, G., Lin, F., and XiaHou,
J. (2022). Relation-aware dynamic attributed graph
attention network for stocks recommendation. Pattern
Recognition, 121:108119.
Ferrer, R., Shahzad, S. J. H., López, R., and Jareño, F. (2018). Time and frequency dynamics of connectedness between renewable energy stocks and crude oil prices. Energy Economics, 76:1–20.
Gao, X., Hu, W., and Guo, Z. (2020). Exploring structure-
adaptive graph learning for robust semi-supervised
classification. In IEEE International Conference on
Multimedia and Expo (ICME), pages 1–6.
Huang, W., Rong, Y., Xu, T., Sun, F., and Huang, J. (2020).
Tackling over-smoothing for general graph convolu-
tional networks. arXiv preprint arXiv:2008.09864.
Huynh, T. T., Nguyen, M. H., Nguyen, T. T., Nguyen,
P. L., Weidlich, M., Nguyen, Q. V. H., and Aberer, K.
(2023). Efficient integration of multi-order dynamics
and internal dynamics in stock movement prediction.
In 16th ACM International Conference on Web Search
and Data Mining, pages 850–858.
Jaynes, E. T. (1957). Information theory and statistical me-
chanics. Physical review, 106(4):620.
Jiang, B., Zhang, Z., Lin, D., Tang, J., and Luo,
B. (2019). Semi-supervised learning with graph
learning-convolutional networks. In IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 11313–11320.
Jiang, C. X., Kim, J.-C., and Wood, R. A. (2011). A com-
parison of volatility and bid–ask spread for NASDAQ
and NYSE after decimalization. Applied Economics,
43(10):1227–1239.
Jiang, W. (2021). Applications of deep learning in stock
market prediction: recent progress. Expert Systems
with Applications, 184:115537.
Jin, W., Ma, Y., Liu, X., Tang, X., Wang, S., and Tang,
J. (2020). Graph structure learning for robust graph
neural networks. In 26th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining,
pages 66–74.
Kim, R., So, C. H., Jeong, M., Lee, S., Kim, J., and Kang,
J. (2019). HATS: A hierarchical graph attention net-
work for stock movement prediction. arXiv preprint
arXiv:1908.07999.
Klicpera, J., Weißenberger, S., and Günnemann, S. (2019).
Diffusion improves graph learning. In 33rd Interna-
tional Conference on Neural Information Processing
Systems (NeurIPS), pages 13366–13378.
Li, Q., Gama, F., Ribeiro, A., and Prorok, A. (2020a).
Graph neural networks for decentralized multi-robot
path planning. In IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), pages
11785–11792.
Li, Q., Han, Z., and Wu, X.-M. (2018). Deeper in-
sights into graph convolutional networks for semi-
supervised learning. In 32nd AAAI Conference on Ar-
tificial Intelligence, pages 3528–3545.
Li, Q., Tan, J., Wang, J., and Chen, H. (2020b). A multi-
modal event-driven LSTM model for stock prediction
using online news. IEEE Transactions on Knowledge
and Data Engineering, 33(10):3323–3337.
Li, W., Bao, R., Harimoto, K., Chen, D., Xu, J., and Su,
Q. (2021). Modeling the stock relation with graph
network for overnight stock movement prediction. In
29th International Joint Conference on Artificial In-
telligence (IJCAI), pages 4541–4547.
Liu, C. and Arunkumar, N. (2019). Risk prediction and
evaluation of transnational transmission of financial
crisis based on complex network. Cluster Computing,
22:4307–4313.
Liu, M., Gao, H., and Ji, S. (2020). Towards deeper graph
neural networks. In 26th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining,
pages 338–348.
Livingston, M. (1977). Industry movements of common
stocks. The Journal of Finance, 32(3):861–874.
Malkiel, B. G. (2003). The efficient market hypothesis
and its critics. Journal of economic perspectives,
17(1):59–82.
Mantegna, R. N. (1999). Hierarchical structure in finan-
cial markets. The European Physical Journal B-
Condensed Matter and Complex Systems, 11:193–
197.
Pareja, A., Domeniconi, G., Chen, J., Ma, T., Suzumura, T.,
Kanezashi, H., Kaler, T., Schardl, T., and Leiserson,
C. (2020). EvolveGCN: Evolving graph convolutional
networks for dynamic graphs. In 34th AAAI Confer-
ence on Artificial Intelligence, pages 5363–5370.
Qin, Y., Song, D., Cheng, H., Cheng, W., Jiang, G., and
Cottrell, G. W. (2017). A dual-stage attention-based
recurrent neural network for time series prediction. In
26th International Joint Conference on Artificial In-
telligence (IJCAI), pages 2627–2633.
Rong, Y., Huang, W., Xu, T., and Huang, J. (2019). DropE-
dge: Towards deep graph convolutional networks on
node classification. arXiv preprint arXiv:1907.10903.
Roondiwala, M., Patel, H., and Varma, S. (2017). Predicting
stock prices using LSTM. International Journal of
Science and Research, 6(4):1754–1756.
Rusch, T. K., Bronstein, M. M., and Mishra, S. (2023). A
survey on oversmoothing in graph neural networks.
arXiv preprint arXiv:2303.10993.
Sawhney, R., Agarwal, S., Wadhwa, A., Derr, T., and Shah,
R. R. (2021a). Stock selection via spatiotemporal hy-
pergraph attention network: A learning to rank ap-
proach. In 35th AAAI Conference on Artificial Intelli-
gence, pages 497–504.
Sawhney, R., Agarwal, S., Wadhwa, A., and Shah, R.
(2020). Deep attentive learning for stock movement
prediction from social media text and company cor-
relations. In Conference on Empirical Methods in
Natural Language Processing (EMNLP), pages 8415–
8426.
Sawhney, R., Agarwal, S., Wadhwa, A., and Shah, R.
(2021b). Exploring the scale-free nature of stock mar-
kets: Hyperbolic graph learning for algorithmic trad-
ing. In Proceedings of the Web Conference, pages 11–
22.
Schwert, G. W. (2002). Stock volatility in the new millen-
nium: how wacky is NASDAQ? Journal of Monetary
Economics, 49(1):3–26.
Shahzad, S. J. H., Hernandez, J. A., Rehman, M. U., Al-
Yahyaee, K. H., and Zakaria, M. (2018). A global
network topology of stock markets: Transmitters and
receivers of spillover effects. Physica A: Statistical
Mechanics and its Applications, 492:2136–2153.
Shi, Z. and Cartlidge, J. (2022). State dependent paral-
lel neural Hawkes process for limit order book event
stream prediction and simulation. In 28th ACM
SIGKDD Conference on Knowledge Discovery and
Data Mining, pages 1607–1615.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). Graph attention networks.
In International Conference on Learning Representa-
tions (ICLR).
Wang, X., Ma, Y., Wang, Y., Jin, W., Wang, X., Tang, J.,
Jia, C., and Yu, J. (2020). Traffic flow prediction via
spatial temporal graph neural network. In Proceedings
of the web conference, pages 1082–1092.
DGDNN: Decoupled Graph Diffusion Neural Network for Stock Movement Prediction
441
Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J.,
and Sun, L. (2022). Transformers in time series: A
survey. arXiv preprint arXiv:2202.07125.
Wu, Z., Pan, S., Long, G., Jiang, J., and Zhang, C. (2019).
Graph wavenet for deep spatial-temporal graph mod-
eling. In 28th International Joint Conference on Arti-
ficial Intelligence (IJCAI), pages 1907–1913.
Xiang, S., Cheng, D., Shang, C., Zhang, Y., and Liang,
Y. (2022). Temporal and heterogeneous graph neu-
ral network for financial time series prediction. In
31st ACM International Conference on Information &
Knowledge Management, pages 3584–3593.
Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-
i., and Jegelka, S. (2018). Representation learning
on graphs with jumping knowledge networks. In In-
ternational Conference on Machine Learning (ICML),
pages 5453–5462.
Xu, Y. and Cohen, S. B. (2018). Stock movement predic-
tion from tweets and historical prices. In 56th An-
nual Meeting of the Association for Computational
Linguistics, pages 1970–1979.
Ye, J., Zhao, J., Ye, K., and Xu, C. (2021). Multi-graph
convolutional network for relationship-driven stock
movement prediction. In 25th International Confer-
ence on Pattern Recognition, pages 6702–6709.
Yoo, J., Soun, Y., Park, Y.-c., and Kang, U. (2021). Accu-
rate multivariate stock movement prediction via data-
axis Transformer with multi-level contexts. In 27th
ACM SIGKDD Conference on Knowledge Discovery
& Data Mining, pages 2037–2045.
Yue, P., Fan, Y., Batten, J. A., and Zhou, W.-X. (2020).
Information transfer between stock market sectors:
A comparison between the USA and China. Entropy,
22(2):194.
Zeng, H., Zhang, M., Xia, Y., Srivastava, A., Malevich, A.,
Kannan, R., Prasanna, V., Jin, L., and Chen, R. (2021).
Decoupling the depth and scope of graph neural net-
works. Advances in Neural Information Processing
Systems (NeurIPS), 34:19665–19679.
Zhang, Q., Chang, J., Meng, G., Xiang, S., and Pan, C.
(2020). Spatio-temporal graph structure learning for
traffic forecasting. In 34th AAAI Conference on Arti-
ficial Intelligence, pages 1177–1185.
Zhao, J., Dong, Y., Ding, M., Kharlamov, E., and Tang, J.
(2021). Adaptive diffusion in graph neural networks.
Advances in Neural Information Processing Systems,
34:23321–23333.
Zhu, Y., Xu, W., Zhang, J., Du, Y., Zhang, J., Liu,
Q., Yang, C., and Wu, S. (2021). A survey on
graph structure learning: Progress and opportunities.
arXiv preprint arXiv:2103.03036.