Gradient-Based Clean Label Backdoor Attack to Graph Neural Networks

Ryo Meguro^1, Hiroya Kato^2, Shintaro Narisada^2, Seira Hidano^2, Kazuhide Fukushima^2, Takuo Suganuma^1 and Masahiro Hiji^1
^1 Tohoku University, Miyagi, Japan
^2 KDDI Research, Inc., Saitama, Japan
Keywords:
Graph Neural Networks, AI Security, Backdoor Attacks.
Abstract:
Graph neural networks (GNNs) can obtain useful information from graph structured data. Although their great
capability is promising, GNNs are vulnerable to backdoor attacks, which plant a marker called a trigger in
victims’ models to cause them to misclassify poisoned data with triggers into a target class. In particular, the
clean label backdoor attack (CLBA) on GNNs remains largely unexplored. Revealing characteristics of
the CLBA is vital from the perspective of defense. In this paper, we propose the first gradient based CLBA on
GNNs for graph classification tasks. Our attack consists of two important phases, the graph embedding based
pairing and the gradient based trigger injection. Our pairing makes pairs from graphs of the target class and
the others to successfully plant the backdoor in the target class area in the graph embedding space. Our trigger
injection embeds triggers in graphs with gradient-based scores, yielding effective poisoned graphs. We conduct
experiments on multiple datasets and GNN models. Our results demonstrate that our attack outperforms the
existing CLBA using fixed triggers. Our attack surpasses attack success rates of the existing CLBA by up to
50%. Furthermore, we show that our attack is difficult to detect with an existing defense.
1 INTRODUCTION
Graph structured data appear in various fields, such as
molecular structures, social networks, and web anal-
ysis. In recent years, GNNs have been researched
and utilized in fields such as security and commerce,
indicating their broad potential applications to graph
structured data. For example, research has been con-
ducted on the detection of fake news (Zhang et al.,
2020) and malware detection based on call graphs
(Feng et al., 2020). There are also many studies on
the application of GNNs in recommendation systems
(RSs), demonstrating the advantages of GNN-based
RSs over traditional methods (Qiu et al., 2020; Yang
et al., 2021; Guo et al., 2021).
However, as with other machine learning systems,
GNNs have been recognized to be vulnerable to ad-
versarial examples and poisoning attacks (Chen et al.,
2020; Zügner et al., 2018; Kwon et al., 2019; Jiang
et al., 2022). Furthermore, studies on backdoor at-
tacks on GNNs (Zhang et al., 2021; Xi et al., 2021;
Yang et al., 2022; Xu and Picek, 2022) have been pre-
sented recently. The backdoor attacks plant a marker
called trigger in victims’ models to cause them to mis-
classify poisoned data with triggers into a target class.
In the context of graph domains, the trigger denotes
modified information such as edges or node features
in the graph. Backdoored models behave normally
when clean data is given. This is why the backdoor
attack is highly favorable for attackers in terms of
stealthiness. Consequently, attackers may conduct the
backdoor attack on GNNs with the motive for disrupt-
ing security or gaining financial benefits. However,
effective defensive methods against backdoor attacks
have not been established in the GNN domain. From
the perspective of defense, revealing characteristics of
backdoor attacks is vital. Therefore, it is essential to
work on a study on backdoor attacks so as to ensure
robustness and security of GNNs.
There are two types of backdoor attacks, namely a
label flipping attack and a clean label backdoor attack
(CLBA). In the label flipping attack, labels of poi-
soned data injected into victim’s dataset are changed
to a target label. The CLBA does not alter labels of
poisoned data. In the GNN domain, the majority of
studies on backdoor attacks are based on label flip-
ping attacks (Zhang et al., 2021; Xi et al., 2021; Yang
et al., 2022). The potential risks of CLBAs remain
largely unexplored. To our knowledge, there is only
one study (Xu and Picek, 2022) regarding CLBAs on
GNNs for graph classification tasks that utilizes fixed
subgraphs as triggers. There are two problems with
that existing CLBA (Xu and Picek, 2022). The first
problem is that the CLBA ignores the topological at-
tributes in each graph. Since the ER model (Erdos, 1959)
is used to generate a trigger with a random structure,
the performance of the attack is solely dependent on
the trigger density. Thus, there are cases where the
attack does not succeed for certain datasets. The sec-
ond problem is that a fixed subgraph is utilized as the
trigger. The fixed trigger may be easily detected be-
cause the same structure always appears in all poi-
soned graphs. Therefore, assigning more adaptive
triggers to each graph is favorable for the effective
backdoor attack.
In this paper, we propose the first gradient based
CLBA on GNNs for graph classification tasks. Our
attack consists of two important phases, namely (1)
graph embedding based pairing, and (2) gradient
based trigger injection. In the graph embedding based
pairing, the attacker makes pairs from graphs of the
target class and the others on the basis of their dis-
tance in the graph embedding space. This pairing
is helpful in clarifying the direction of movement in
the graph embedding space during the trigger injec-
tion phase, enabling a more successful attack. In the
gradient based trigger injection, the attacker injects
tailored trigger edges on paired graphs chosen in the
pairing phase to create effective poisoned graphs. In
this part, gradient based edge scoring is introduced in
order to create triggers that enable the pairs to ap-
propriately approach each other in the graph embed-
ding space. This trigger injection is performed on a
graph-by-graph basis to enhance the attack effective-
ness. These phases realize topology-aware and graph-
adaptive triggers, which can address problems posed
by the existing CLBA.
Our Contributions. The main contributions of this
work are summarized as follows:
- We propose the first gradient-based CLBA on GNNs for graph classification tasks. We introduce the graph embedding based pairing and the gradient based trigger injection to our CLBA for creating effective poisoned graphs.
- We uniquely redefine the threat model of the CLBA to correct the overestimation of the existing attack on GNNs.
- We conduct experiments on multiple datasets and multiple experimental conditions, demonstrating the effectiveness of our attack. Our attack improves the attack success rate by up to over 50% compared with the existing CLBA.
- We show that our attack is difficult to detect by an existing defense method.
2 RELATED WORK
2.1 Backdoor Attacks
Backdoor attacks aim to change the model prediction
from a true label to a desired label only for poisoned
data that include triggers. These true and desired
labels are referred to as source and target labels, re-
spectively. In a backdoor attack, an attacker is in-
volved in both the training and inference phases of
a victim model. In the training phase, poisoned data
are generated by applying triggers to clean data. After
that, in order to poison victim’s model, the poisoned
data are injected into training datasets of victims in
some way. In the inference phase, the attacker injects
similar triggers into clean data with source labels, which
leads to misclassification of those data.
BadNets (Gu et al., 2019) is a backdoor attack on
the image domain. In this attack, road signs are mis-
classified into the labels intended by the attacker us-
ing small image instances as triggers. Another back-
door attack on CNNs for image classification tasks
exists (Liao et al., 2018). In that backdoor attack, the
attacker generates adaptive trigger pixels, making it
more difficult for humans to visually identify the poi-
soned data. A more advanced backdoor attack that
utilizes encoders is proposed in (Li et al., 2021). That
attack is inspired by DNN-based image steganogra-
phy and designed to avoid trigger detection. As men-
tioned above, in the field of image processing, various
techniques for backdoor attacks have been studied, re-
vealing the presence of certain threats.
2.2 Backdoor Attacks on GNNs
Recently, backdoor attacks intended for GNNs have
been studied. In the graph domain, a subgraph is uti-
lized as a trigger for backdoor attacks. The poisoned
graphs are injected into the training dataset to make
a victim model associate the trigger edges or nodes
with the target labels during the model training. In
the inference phase, poisoned graphs for which at-
tackers desire to change the prediction are created by
assigning trigger edges or nodes to clean graphs with
source labels. As a result, trained models yield incor-
rect predictions for input data with triggers while mak-
ing correct predictions for clean data. For the sake
of simplicity, we refer to clean graphs with source
labels as “sources” and clean graphs with target la-
bels as “targets”. In addition, we call sources with
triggers “poisoned sources” and targets with triggers
“poisoned targets”.
There are multiple scenarios of backdoor at-
tacks. In particular, existing backdoor attacks in the
graph domain are mainly divided into two directions,
namely a label flipping backdoor attack, and a CLBA
depending on whether attackers change labels of poi-
soned graphs. This label manipulation has a signifi-
cant impact on the difficulty of the attacks.
2.2.1 Label Flipping Backdoor Attacks
In label flipping backdoor attacks, an attacker changes
the labels of the poisoned graphs. In particular, the
attacker first generates poisoned data by embedding
trigger edges into sources. Then, their labels are
changed to target labels, and the poisoned data be-
come poisoned targets. As with general backdoor at-
tacks, such poisoned targets are injected into training
datasets to plant a hidden backdoor in victim mod-
els. In the inference phase, poisoned sources that
have similar triggers are input into victim models with
backdoors. A subgraph based backdoor attack (Zhang
et al., 2021) uses a fixed subgraph that is pregenerated
by connecting existing nodes randomly as the trig-
ger. In the above paper, experiments are conducted
on various trigger sizes, trigger densities, and other
parameters to show the effectiveness of the backdoor
attack using fixed triggers. GTA (Xi et al., 2021) is a
backdoor attack against pretrained GNN models that
optimizes both attack effectiveness and evasiveness by
bi-level optimization. That attack shows the high at-
tack performance in multiple datasets. Furthermore,
their results reveal that existing defense mechanisms
in other domains are ineffective in preventing that at-
tack. In a TRAP attack (Yang et al., 2022), graph
edges are perturbed on the basis of specific scores.
These scores are calculated using gradients of an at-
tacker’s model. The utilization of gradient informa-
tion to select modified edges is effective.
Since the original labels of poisoned targets are
the source labels in label flipping backdoor attacks,
it is relatively easy for the attacker to change the decision
boundary of the victim model and to induce the model
to misclassify the poisoned sources. However, since
the labels of poisoned targets are inconsistent with the
characteristics of the data themselves, unlike legitimate tar-
gets, it is possible that poisoned targets are detected
as outliers, which is a limitation of label flipping at-
tacks.
2.2.2 Clean Label Backdoor Attack (CLBA)
To realize a more sophisticated attack, a CLBA is pro-
posed in the graph domain. In CLBAs, attackers cre-
ate poisoned targets by injecting triggers into targets.
In other words, their labels are consistent throughout
the entire attack process from beginning to end. This
is why their features tend to be almost consistent with
those of clean targets, meaning that poisoned targets
are more difficult to detect as outliers. These poisoned
targets are injected into training dataset. Then, as with
label flipping backdoor attacks, triggers are attached
to sources so as to create poisoned sources which are
input during the inference phase. Although there are
some CLBAs targeting node classification tasks (Dai
et al., 2023; Yang et al., 2023), there is currently only
one CLBA (Xu and Picek, 2022) targeting graph clas-
sification tasks. In that CLBA, a small, fixed random
graph is generated by utilizing the Erd
˝
os-R
´
enyi (ER)
model in advance (Erdos, 1959). This subgraph is
used as a common trigger among all poisoned graphs
for CLBAs.
2.3 Defense Against Backdoor Attacks
A defense method has been proposed in (Liu et al.,
2018). In this approach, a defender checks the distri-
bution of erroneously predicted results. However, this
approach has been shown to be ineffective in subse-
quent research. Neural Cleanse (Wang et al., 2019)
is a defense mechanism that has been shown to be ef-
fective in preventing backdoor attacks against DNNs.
In that method, a defender mitigates backdoor attacks
by utilizing reverse-engineered triggers.
Currently, there are a few defense mechanisms
against backdoor attacks in GNNs. For example,
there is randomized subsampling (Zhang et al., 2021)
that changes edge information of a graph to mitigate
the impact of trigger edges. However, such a de-
fense is not effective because it often greatly degrades
accuracy on clean data or cannot prevent at-
tacks in almost all cases (Yang et al., 2022). There
is another existing defense mechanism (Jiang and Li,
2022) against label flipping backdoor attacks. That
defense utilizes the explanation score and achieves
high detection rates of poisoned data in their experi-
ment. However, we find that this defense does not work
well against our attack, as shown in Section 5. Thus, more
effective defense methods against backdoor attacks
are still needed. To devise such effective methods,
we argue that the study of the backdoor attack is quite
important because it can reveal characteristics of poi-
soned data. Such revealed characteristics regarding
the topological structures allow us to utilize them for
defensive strategies.
3 PRELIMINARIES
3.1 Graph Neural Networks (GNNs)
In this work, we consider graph classification as
the task. This type of task can be widely utilized for various purposes, such as predicting whether a protein molecule is an enzyme or whether a chemical compound is active against lung cancer cells.
Let $D_{train} = \{(G_1, y_1), (G_2, y_2), \ldots, (G_n, y_n)\}$ denote a training dataset. $D_{train}$ contains graphs $G_i = (A_i, X_i)$ and their labels $y_i$. $A_i \in \{0,1\}^{N \times N}$ is the adjacency matrix of the graph data that has $N$ nodes, where the existence of each edge is represented as 1. $X_i$ is the feature matrix of nodes in $G_i$. The goal of the graph classification is to learn a model $F$ that outputs a predicted label $y_{c_i}$ for a class out of the classes in $C = \{c_1, c_2, \ldots, c_K\}$, where $K$ is the number of classes. In fact, $F$ consists of two components, namely the graph representation module $f_{GNN}$ and the classifier module $f_{fc}$, which means $F = (f_{fc} \circ f_{GNN})$. $f_{GNN}$ is utilized so as to learn useful information from graph structured data. In general, $f_{GNN}$ generates node embeddings, which are numerical vectors regarding node features. On the other hand, $f_{fc}$ is equivalent to a fully connected neural network or a traditional classifier such as a support vector machine.
As representative graph representation modules,
there are two GNN models for graph classification
tasks. The first one is Graph Convolutional Network
(GCN) (Kipf and Welling, 2016). In GCN architec-
ture, one step of node embedding generation is formulated as follows:

$h^{(l+1)} = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} h^{(l)} W^{(l)}).$   (1)

Here, $\sigma$ is a nonlinear activation function like ReLU, $\hat{A}$ is the addition of an adjacency matrix $A$ and the identity matrix $I$, $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$, and $W^{(l)}$ is a weight matrix for the $l$-th GNN layer. The other one is Graph Isomorphism Network (GIN) (Xu et al., 2018). In GIN architecture, one step of node embedding generation is formulated as follows:

$h_i^{(l+1)} = \mathrm{MLP}\left((1 + \varepsilon^{(l)}) \cdot h_i^{(l)} + \sum_{j \in N(i)} h_j^{(l)}\right).$   (2)

Here, MLP is a multi-layer perceptron, $h_i^{(l)}$ is the $l$-th node embedding of node $i$, and $N(i)$ is the set of nodes adjacent to node $i$.
In the graph classification, $f_{GNN}$ outputs an entire graph representation $h_{G_i}$ obtained by aggregating features of nodes in each graph. To this end, a readout function is applied to $h_i^{(l)}$. A simple summation or a more sophisticated graph-level pooling function can be utilized as a readout function. The resulting $h_{G_i}$ is input to $f_{fc}$ for predicting a label of $G_i$.
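For illustration, the following minimal NumPy sketch implements one GCN propagation step (Eq. (1)), one GIN-style aggregation step (Eq. (2)) with the MLP simplified to a single linear map, and a sum readout. The toy graph, random weights, and layer sizes are illustrative assumptions and do not correspond to the configuration used in our experiments (Table 2).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN step: ReLU(D^-1/2 (A+I) D^-1/2 H W), cf. Eq. (1)."""
    A_hat = A + np.eye(A.shape[0])            # adjacency with self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

def gin_layer(A, H, W, eps=0.0):
    """One GIN step, cf. Eq. (2); the MLP is simplified to a single linear map."""
    agg = (1.0 + eps) * H + A @ H             # (1+eps)*h_i + sum over neighbors
    return np.maximum(0.0, agg @ W)

def sum_readout(H):
    """Sum readout producing a whole-graph representation h_G
    (the experiments in Table 2 use max pooling instead)."""
    return H.sum(axis=0)

# Toy 4-node graph with 3-dimensional node features (illustrative values only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.rand(4, 3)
W1, W2 = np.random.rand(3, 16), np.random.rand(16, 8)

H_gcn = gcn_layer(A, gcn_layer(A, X, W1), W2)   # two GCN layers
H_gin = gin_layer(A, gin_layer(A, X, W1), W2)   # two GIN layers
h_G = sum_readout(H_gcn)                        # graph embedding fed to f_fc
print(h_G.shape)                                # (8,)
```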
3.2 Threat Model
Attacker’s Goal. The attacker has two goals. The
first goal is to improve the attack effectiveness. This
goal is intended to ensure that the poisoned sources
input by the attacker during the inference phase are
misclassified as the target label $y_t$. The second goal is to improve the attack evasiveness. With this goal, the attack is accomplished without the victim being aware of it. To achieve these goals, the attacker plants triggers in the victim’s model by poisoning their training dataset with poisoned targets disseminated to the public.
Attacker’s Capability. The attacker has his own
GNN model. We assume two attack scenarios for this
model, namely white- and gray-box scenarios. The
white- and gray-box scenarios are defined as follows:
White-Box. Attackers can access all information about the parameters of victim models and about which graph data are used for the training dataset in this scenario. This scenario is the worst-case scenario for victims in the real world.
Gray-Box. Attackers cannot access any information about victim models; they know only which graph data are used for training. In other words, attackers need to prepare surrogate models with different architectures and parameters to conduct attacks. For example, if the victim model is a GCN model, then the attacker’s model is a GIN model, and vice versa.
Unlike existing work (Xu and Picek, 2022), we uniquely impose the following constraint on the attacker in both the existing and proposed CLBAs.
Maintenance of the Clean Label Scenario. In the
CLBA, poisoned targets are injected into the train-
ing dataset of the victim model. Thus, an attacker
should select only from the targets that are correctly
classified as the target labels by the attacker’s model.
Choosing targets classified as source labels leads to
the association of source-like graphs with triggers, de-
spite the target labels being assigned to them. If such
targets mutate into poisoned targets to plant triggers
in the victim models, then the genuine CLBA is not
realized. In such a case, the success of the attack is
purely dependent on misclassification and label flip-
ping ability. In other words, the attack performance
of the existing attack is overestimated. This is why, in our experiments, we evaluate both the existing attack and our attack under the above two scenarios to properly assess the CLBA, even though the attacker does not necessarily need to utilize model information in the existing attack.
3.3 Problem Formulation
We formulate CLBAs that achieve the goals described in Section 3.2 as follows:

$F_{\theta^*}(G^{g}_{y_s}) = y_t \quad \text{s.t.} \quad \theta^* = \arg\min_{\theta} L_{train}(F_{\theta}, D_{train} \cup D_{poison}),$   (3)

$F_{\theta^*}(G_{y_s}) = F_{\theta_c}(G_{y_s}),$   (4)

where $F_{\theta^*}$ and $F_{\theta_c}$ denote a backdoored model with model parameters $\theta^*$ and a clean one with model parameters $\theta_c$, respectively. Furthermore, $y_s$ is the source label. Let $D_{poison}$ denote the poisoned dataset containing poisoned targets with $y_t$. A poisoned target $G^{g}_{y_t}$ is created by injecting a trigger $g$ into a clean target $G_{y_t}$. Similarly, a poisoned source $G^{g}_{y_s}$ is created by injecting $g$ into a clean source $G_{y_s}$. $L_{train}$ is the loss for training $F_{\theta}$ on $D_{train} \cup D_{poison}$. Eq. (3) represents the situation where the poisoned sources are misclassified as the target labels by the backdoored model. This equation is related to the attack effectiveness of backdoor attacks. Moreover, Eq. (4) represents the situation where the backdoored model correctly outputs the prediction for clean sources. This equation guarantees attack evasiveness.
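To make the formulation concrete, the sketch below assembles the contaminated training set and empirically checks the two conditions; `train`, `predict`, and `attach_trigger` are hypothetical placeholders standing in for an actual GNN training pipeline and trigger-injection routine, not functions defined in this paper.

```python
def evaluate_clba(train, predict, attach_trigger, D_train, D_poison,
                  clean_sources, y_t):
    """Check the conditions of Eq. (3) (effectiveness) and Eq. (4) (evasiveness).

    `train` fits a model on a list of (graph, label) pairs; `predict(model, G)`
    returns a predicted label; `attach_trigger(G)` injects the trigger g into G.
    """
    backdoored = train(D_train + D_poison)   # theta*: trained on D_train U D_poison
    clean = train(D_train)                   # theta_c: trained without poisoned targets

    # Eq. (3): sources with the trigger should be classified as the target label.
    effectiveness = sum(
        predict(backdoored, attach_trigger(G)) == y_t for G in clean_sources
    ) / len(clean_sources)

    # Eq. (4): on clean sources, the backdoored model should agree with the clean one.
    evasiveness = sum(
        predict(backdoored, G) == predict(clean, G) for G in clean_sources
    ) / len(clean_sources)
    return effectiveness, evasiveness
```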
4 PROPOSED ATTACK
4.1 Attack Overview
We propose the first gradient based CLBA on GNNs
for graph classification tasks. Our attack consists of
two important phases, namely, (1) graph embedding
based pairing and (2) gradient based trigger injec-
tion. In the first phase, the attacker forms pairs of
sources and targets on the basis of their distance in the
graph embedding space. This phase is divided into
two parts. First, the attacker selects the same num-
ber of sources as the number of targets that have been
correctly predicted by the attacker’s model. Second,
a source is paired with a target in a one-to-one cor-
respondence considering their positions in graph em-
bedding space. This pairing is helpful in clarifying
the direction of movement in the graph embedding
space during the trigger injection phase, enabling a
more successful attack. In the second phase, the at-
tacker attaches trigger edges on the graphs chosen in
the pairing phase to create poisoned sources and poi-
soned targets. In this part, gradient-based triggers are
assigned to the selected graphs, and the pairs are ap-
propriately guided to approach each other in the graph
embedding space. This optimization is performed on
a graph-by-graph basis to enhance attack effective-
ness. We provide an overview diagram of our attack
in Fig.1.
4.2 Graph Embedding Based Pairing
As shown in Subsection 4.1, this phase has two parts. First, attackers select $p$ sources which have the smallest prediction probability with respect to the source label by the attacker’s model. The process of choosing a source is described as follows:

$\arg\min_{G_{y_s} \in S} F(G_{y_s})_{y_s},$   (5)

where $F(G)_l$ represents the prediction probability of the attacker’s model $F$ for graph $G$ regarding a label $l$, and $S$ is the set of sources. This data selection approach is repeated $p$ times. We refer to these selected $p$ sources as $\mathcal{S}$ (different from $S$).

Attackers also choose the same number of $p$ targets which are farthest from the decision boundary. The process of choosing such a target is described as follows:

$\arg\max_{G_{y_t} \in T} F(G_{y_t})_{y_t},$   (6)

where $T$ is the set of targets. We refer to these $p$ targets as $\mathcal{T}$ (different from $T$). The reason for selecting data on the basis of distance from the decision boundary is that moving sources in the direction of paired targets is expected to cause the sources to invade the target region when trigger edges are injected into them.
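A small sketch of this selection step, assuming a hypothetical `predict_proba(model, G)` helper that returns the attacker model's class-probability vector for a graph:

```python
def select_candidates(model, predict_proba, sources, targets, y_s, y_t, p):
    """Select p sources closest to the decision boundary (Eq. (5)) and
    p targets farthest from it (Eq. (6))."""
    # Sources with the smallest predicted probability for their own (source) label.
    selected_sources = sorted(sources, key=lambda G: predict_proba(model, G)[y_s])[:p]
    # Targets with the largest predicted probability for the target label.
    selected_targets = sorted(targets, key=lambda G: predict_proba(model, G)[y_t],
                              reverse=True)[:p]
    return selected_sources, selected_targets
```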
Second, attackers pair sources and targets in a one-to-one correspondence based on a greedy algorithm. The greedy algorithm assigns the $G_{y_t} \in \mathcal{T}$ that is the closest to $G_{y_s} \in \mathcal{S}$ in the graph embedding space one by one. Let $h_{G_{y_t}}$ and $h_{G_{y_s}}$ denote the graph embeddings of targets and sources, respectively. $h_{G_{y_t}}$ and $h_{G_{y_s}}$ are obtained through $F_{\theta}$, which is the attacker’s model trained on $D_{train}$. In other words, $h_{G_{y_t}} = f_{GNN}(G_{y_t})$ and $h_{G_{y_s}} = f_{GNN}(G_{y_s})$, where $f_{GNN}$ is the graph representation module of the attacker’s model.

In this approach, $G_{y_t}$ is paired with $G_{y_s}$ as follows:

$\arg\min_{G_{y_t}} \ell(h_{G_{y_s}}, h_{G_{y_t}}),$   (7)

where $\ell(\cdot,\cdot)$ is the loss function based on the cosine similarity and is defined as

$\ell(h_{G_{y_s}}, h_{G_{y_t}}) = 1 - \gamma(h_{G_{y_s}}, h_{G_{y_t}}).$   (8)

Note that $\gamma(\cdot,\cdot)$ is the cosine similarity.
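The pairing step can be sketched as follows, where `embed(G)` stands in for the graph representation module $f_{GNN}$ of the attacker's model and is assumed to return a 1-D NumPy embedding. Using each anchor at most once reflects the one-to-one correspondence; the helper is written generically so that it can also pair targets with poisoned sources in the later phase.

```python
import numpy as np

def cosine_loss(h_a, h_b):
    """Eq. (8): 1 - cosine similarity of two graph embeddings."""
    return 1.0 - float(h_a @ h_b / (np.linalg.norm(h_a) * np.linalg.norm(h_b) + 1e-12))

def greedy_pairing(embed, modified, anchors):
    """Greedily assign to each graph in `modified` the closest unused graph
    in `anchors` in the embedding space (Eq. (7))."""
    unused = list(range(len(anchors)))
    pairs = []
    for G_mod in modified:
        h_mod = embed(G_mod)
        best = min(unused, key=lambda j: cosine_loss(h_mod, embed(anchors[j])))
        unused.remove(best)
        pairs.append((G_mod, anchors[best]))
    return pairs
```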
Figure 1: The proposed gradient-based CLBA. In the training phase, the attacker’s GNN model is used for (1) pairing and (2) injecting triggers, the resulting poisoned targets (target label: protein) are (3) injected into the training dataset, and (4) the victim GNN model is trained on the contaminated dataset. In the inference phase, the victim model (5) predicts labels: a poisoned source with true label non-protein is predicted as protein, while a clean source with true label non-protein is predicted correctly.
4.3 Gradient Based Trigger Injection
Our trigger injection has two important points,
namely a gradient based edge scoring and the order
of trigger injection. In what follows, we elaborate on
them.
The gradient based edge scoring is useful in selecting appropriate edges that constitute a trigger. Let $Score_i$ denote the scores for edges in a graph $G_i$. $Score_i$ is calculated as follows:

$Score_i = (2 \cdot A_i - 1) \cdot \nabla_{A_i},$   (9)

where $\nabla_{A_i}$ means the gradient of Eq. (8) with respect to $A_i$, corresponding to the edges in $G_i$. The meaning of this score is explained from here. First, consider the existent edges. If the edge connecting nodes $a$ and $b$ has a positive gradient value, deleting this edge reduces the loss in Eq. (8), which brings the graph data closer to the paired data in the graph embedding space. If the gradient is negative, then we do not delete the edge because doing so only increases the loss. Next, we consider nonexistent edges. If the gradient of the nonexistent edge between nodes $a$ and $b$ is positive, then we do not connect them because doing so only increases the loss. On the other hand, if the gradient value is negative, then adding the edge brings the graph data closer to the paired data in the graph embedding space. In short, by inverting the information of “existent edges with positive gradients” and “nonexistent edges with negative gradients”, the pairs can be approached efficiently in the graph embedding space. The larger the absolute values of the gradients are, the closer the graph moves to the paired data when these edges are changed. Therefore, to assign higher scores to these edges, the sign is adjusted by the first term in Eq. (9). The above score is calculated for each value in $A_i$ to generate the score list. On the basis of that score list, the edge information with the top $m$ scores is changed to attach triggers to graphs.
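A sketch of the scoring and edge-flipping step around Eq. (9). Here `grad_A` is assumed to be the gradient of the loss in Eq. (8) with respect to the adjacency matrix (e.g., obtained by automatic differentiation through the attacker's model), m is the trigger size, and treating the graph as undirected by scoring each edge once is an assumption of this sketch.

```python
import numpy as np

def inject_trigger(A, grad_A, m):
    """Flip the m edge entries with the highest scores (Eq. (9)).

    Score = (2*A - 1) * grad: existent edges (A=1) with positive gradients and
    nonexistent edges (A=0) with negative gradients receive high scores; flipping
    them moves the graph toward its paired graph in the embedding space.
    """
    score = (2.0 * A - 1.0) * grad_A
    iu = np.triu_indices(A.shape[0], k=1)       # score each undirected edge once
    top_m = np.argsort(score[iu])[::-1][:m]     # indices of the top-m scores
    A_poisoned = A.copy()
    for k in top_m:
        i, j = iu[0][k], iu[1][k]
        A_poisoned[i, j] = A_poisoned[j, i] = 1.0 - A_poisoned[i, j]  # flip the edge
    return A_poisoned
```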
As for the order of trigger injection, the trigger edges are first assigned to $\mathcal{S}$ to create the poisoned sources $\hat{\mathcal{S}}$. This is because the above order allows the sources to move to the paired targets significantly, which results in invading the target area in the graph embedding space. After that, the poisoned targets $\hat{\mathcal{T}}$ are created by making $\mathcal{T}$ close to the paired poisoned sources in $\hat{\mathcal{S}}$. This process allows for the creation of effective poisoned targets. As a result, when the poisoned targets are used for training the victim model, the target label area can be further expanded, causing misclassification of the poisoned sources.
4.4 Algorithm
The complete attack process is presented in Algorithm 1. After the initialization procedure (Line 1) and obtaining $F_{\theta}$ trained on $D_{train}$ (Line 2), poisoned graphs are generated. In our attack, $\hat{\mathcal{S}}$ is generated prior to $\hat{\mathcal{T}}$. Let $G^{mod}$ and $G^{anch}$ denote the set of $p$ modified graphs and that of $p$ anchor graphs, respectively. Triggers are embedded in $G^{mod}_i \in G^{mod}$ so that $G^{mod}_i$ are close to $G^{anch}_i \in G^{anch}$. $G^{mod}$ and $G^{anch}$ are changed depending on the type of graph that triggers are embedded in. When $\hat{\mathcal{S}}$ is generated, $G^{mod}$ and $G^{anch}$ are $\mathcal{S}$ and $\mathcal{T}$, respectively (Line 5). On the other hand, when $\hat{\mathcal{T}}$ is generated, $G^{mod}$ is $\mathcal{T}$, and $G^{anch}$ is $\hat{\mathcal{S}}$ (Line 7). A set of paired graphs $P$ is created by pairing $G^{mod}_i$ with $G^{anch}_i$ on the basis of the greedy algorithm (Lines 9-11). For each pair $(G^{mod}_i, G^{anch}_i) \in P$, the gradient $\nabla_{A^{mod}_i}$ of the loss function with respect to the adjacency matrix $A^{mod}_i$ of $G^{mod}_i$ is calculated (Line 13). Then, $Score_i$ is calculated on the basis of $\nabla_{A^{mod}_i}$ to determine the edges to reverse (Line 14). After that, a poisoned graph $\hat{G}^{mod}_i$ is created on the basis of $Score_i$ (Line 15). $\hat{G}^{mod}_i$ is stored in a temporary set $G^{tmp}$ (Line 16). The above procedures are repeatedly conducted for creating each of $\hat{\mathcal{S}}$ and $\hat{\mathcal{T}}$. Finally, $\hat{\mathcal{S}}$ and $\hat{\mathcal{T}}$ are returned.

Algorithm 1: Proposed gradient-based CLBA.
Input: $\mathcal{S}$, $\mathcal{T}$, $F_{\theta}$, and $D_{train}$
Output: Poisoned sources and targets $(\hat{\mathcal{S}}, \hat{\mathcal{T}})$
1   $(\hat{\mathcal{S}}, \hat{\mathcal{T}}) \leftarrow (\emptyset, \emptyset)$
2   Obtain $F_{\theta}$ trained on $D_{train}$
3   while $\hat{\mathcal{S}} = \emptyset$ or $\hat{\mathcal{T}} = \emptyset$ do
4       if $\hat{\mathcal{S}} = \emptyset$ then
5           $G^{mod} \leftarrow \mathcal{S}$ and $G^{anch} \leftarrow \mathcal{T}$
6       else
7           $G^{mod} \leftarrow \mathcal{T}$ and $G^{anch} \leftarrow \hat{\mathcal{S}}$
8       $P, G^{tmp} \leftarrow \emptyset, \emptyset$
9       for $i = 1$ to $p$ do
10          Select $G^{mod}_i \in G^{mod}$ paired with $G^{anch}_i \in G^{anch}$ based on Eq. (7)
11          $P \leftarrow P \cup \{(G^{mod}_i, G^{anch}_i)\}$
12      for each pair $(G^{mod}_i, G^{anch}_i) \in P$ do
13          $\nabla_{A^{mod}_i} \leftarrow \nabla_{A^{mod}_i} \ell(h_{G^{mod}_i}, h_{G^{anch}_i})$
14          $Score_i \leftarrow (2 \cdot A^{mod}_i - 1) \cdot \nabla_{A^{mod}_i}$
15          Create $\hat{G}^{mod}_i$ by reversing the elements with the top $m$ values of $Score_i$ in $A^{mod}_i$
16          $G^{tmp} \leftarrow G^{tmp} \cup \{\hat{G}^{mod}_i\}$
17      if $\hat{\mathcal{S}} = \emptyset$ then
18          $\hat{\mathcal{S}} \leftarrow G^{tmp}$
19      else
20          $\hat{\mathcal{T}} \leftarrow G^{tmp}$
21  return $(\hat{\mathcal{S}}, \hat{\mathcal{T}})$
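Putting the phases together, the following outline mirrors the structure of Algorithm 1 using the `greedy_pairing` and `inject_trigger` helpers sketched above; graphs are represented by their adjacency matrices, and `embed` and `grad_of_loss` (returning the gradient of Eq. (8) with respect to the modified adjacency matrix) are hypothetical helpers backed by the attacker's trained model. It is an illustrative outline under these assumptions, not the authors' implementation.

```python
def gradient_clba(S_adj, T_adj, embed, grad_of_loss, m):
    """Outline of Algorithm 1: first create poisoned sources S_hat by moving the
    selected sources toward their paired targets, then create poisoned targets
    T_hat by moving the selected targets toward the poisoned sources."""
    def poison(modified, anchors):
        poisoned = []
        for A_mod, A_anch in greedy_pairing(embed, modified, anchors):  # Eq. (7)
            grad_A = grad_of_loss(A_mod, A_anch)   # gradient of Eq. (8) w.r.t. A_mod
            poisoned.append(inject_trigger(A_mod, grad_A, m))           # Eq. (9)
        return poisoned

    S_hat = poison(S_adj, T_adj)   # poisoned sources (anchors: selected targets)
    T_hat = poison(T_adj, S_hat)   # poisoned targets (anchors: poisoned sources)
    return S_hat, T_hat
```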
5 ATTACK EVALUATION
In this section, we evaluate our attack effectiveness
and evasiveness. Specifically, in this experiment, we
aim to address the following questions.
1. How much do our attack effectiveness and evasiveness improve compared to those of the existing method?
2. To what extent does the threat model impact at-
tack performance?
3. Do existing defense methods effectively work
against our attack?
5.1 Experimental Settings
5.1.1 Datasets
We utilize four real-world datasets to evaluate our
attack effectiveness and evasiveness. The datasets
are MUTAG (Debnath et al., 1991), DHFR (Wale
et al., 2008), NCI1 (Dobson and Doig, 2003) and
PROTEINS-full (Morris et al., 2020). We now refer
to PROTEINS-full as PROTEINS. Table 1 presents
the summary of the datasets. The label of graph with
fewer instances is selected as the target label.
We use 80% of graphs for the training dataset and
the remaining 20% for the test dataset. When we di-
vide the dataset, we ensure that the ratio of labels is
maintained during the split. Both the attacker and
the victim train their models according to the above
data split scenario. The datasets are binary classifica-
tion datasets with labels 0 and 1. The labels refer to
the classification results of the molecular data in each
graph dataset.
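For reference, a label-stratified 80/20 split as described above can be obtained, for instance, with scikit-learn; the graph and label lists below are illustrative placeholders.

```python
from sklearn.model_selection import train_test_split

graphs = [f"G{i}" for i in range(10)]        # placeholders for graph objects
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]      # binary class labels

# 80/20 split that preserves the label ratio in both parts.
train_graphs, test_graphs, train_labels, test_labels = train_test_split(
    graphs, labels, test_size=0.2, stratify=labels, random_state=0)
```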
5.1.2 Metrics
We use two metrics to evaluate our attack effective-
ness and evasiveness.
The first metric is the Attack Success Rate (ASR), which is the proportion of the poisoned sources that are misclassified by the victim model and is described as follows:

$ASR = \frac{\sum_{i=1}^{|\hat{\mathcal{S}}|} [F_{\theta^*}(\hat{\mathcal{S}}_i) = y_t]}{|\hat{\mathcal{S}}|}.$

$\hat{\mathcal{S}}$ is the set of poisoned sources, and $F_{\theta^*}$ is the contaminated victim model. $[A]$ represents 1 if proposition $A$ is true and 0 otherwise. Note here that the attacks are considered successful only when a clean model correctly classifies the clean data as the source label before the trigger edges are inserted, whereas the backdoored model misclassifies the poisoned data as the target label. Although this scenario is challenging for attackers, it is appropriate because it helps avoid overestimation of the ASR due to coincidental misclassifications.
The second metric is the Clean Accuracy Drop (CAD), which is the drop from clean accuracy to backdoored accuracy. Here, clean accuracy is the accuracy of the model trained without poisoned targets, while backdoored accuracy is that of the model trained with poisoned targets.

We report the ASR and CAD as the average over 10 repetitions for each experimental condition. The existing attack is executed under the condition that the edge density of the fixed trigger is 0.8.

Table 1: Datasets summary.
Dataset     Graphs in class (label)   Avg. nodes   Avg. edges   Node features   Target label
NCI1        2053 (0), 2057 (1)        29.86        64.60        37              0
PROTEINS    663 (0), 450 (1)          39.05        145.63       32              1
DHFR        295 (0), 461 (1)          42.42        89.08        3               0
MUTAG       63 (0), 125 (1)           17.93        39.58        7               0

Table 2: GNN model.
GNN        Parameter      Settings
GCN, GIN   Architecture   2 layers (16, 8)
           Classifier     Linear
           Aggregator     Max pooling
           Optimizer      Adam
           Scheduler      StepLR

Table 3: Clean accuracy (%).
Dataset     GCN     GIN
NCI1        65.43   70.91
PROTEINS    74.43   74.61
DHFR        70.72   72.43
MUTAG       74.73   79.47
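The two metrics can be computed as in the sketch below; `predict` and the model objects are placeholders for an actual evaluation pipeline, and the filtering implied by the stricter counting rule above is assumed to happen when the set of poisoned sources is built.

```python
def attack_success_rate(predict, backdoored_model, poisoned_sources, y_t):
    """ASR: fraction of poisoned sources that the backdoored model assigns to y_t.
    Per the counting rule above, poisoned_sources is assumed to contain only
    graphs whose clean counterparts a clean model classifies correctly."""
    hits = sum(predict(backdoored_model, G) == y_t for G in poisoned_sources)
    return hits / len(poisoned_sources)

def clean_accuracy_drop(clean_accuracy, backdoored_accuracy):
    """CAD: drop from clean accuracy to backdoored accuracy (negative values
    mean the backdoored model is slightly more accurate on clean data)."""
    return clean_accuracy - backdoored_accuracy
```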
5.1.3 Model Settings
The composition of the GNN models is shown in Table 2. We use the Adam optimizer and the StepLR
scheduler in the training process. The model parame-
ters are selected on the basis of the results of the grid
search, with those that yield the maximum accuracies
being chosen. Table 3 shows the clean accuracy by
the models with the selected parameters.
5.2 Experimental Results
How Much Does our Attack Effectiveness and
Evasiveness Improve Compared to Those in the
Existing Method? Table 4 and Table 5 summarize
the experimental values for the 10% poisoning rate (or
the maximum poisoning rate) as numerical data. Our
attack outperforms the existing attack under almost
all conditions in terms of the ASR. We achieve an in-
crease of over 50% compared to the existing attack in
the gray-box scenario where the dataset is MUTAG.
This improvement is achieved due to the following
reason. In our attack, it is possible to induce mis-
classification effectively because our attack assigns
appropriate trigger edges tailored to each graph. In
contrast, the existing attack ignores the characteris-
tics of each graph and assigns a fixed trigger, which is
why its attack performance is considered unstable.
Fig.2–5 show the comparison of the ASR and the
CAD in our gray-box and white-box attacks when the
poisoning rate is varied at 1%, 3%, 5%, 7%, 9%, and
10%. Due to the limited number of graphs for creating
poisoned data, there are cases where the maximum
poisoning rate is less than 10%, and in such cases, we
stop the evaluation of metrics at the maximum data
size. Regarding the increase in the ASR with respect
to poisoning rate, our attack method shows a strong
upward trend as shown in Fig.2 and Fig.4. On the
other hand, the existing attack appears to remain rela-
tively stable. This suggests that our attack is capable
of generating effective poisoned data, which the ex-
isting attack is unable to produce. For example, when
the poisoning rate increases from 1% to 10%, the in-
crease of the ASR in the existing attack reaches at
most about 20%. In contrast, with our attack, the in-
crease can be over 40% at its maximum. Regarding
CAD as shown in Fig.3 and Fig.5, for the MUTAG
dataset, our attack shows an upward trend, indicating
suboptimal evasion performance. However, for other
datasets, even when compared to the existing attack,
CAD remains at similarly low levels of less than 5%
in most datasets. Therefore, it becomes possible to
execute our attack without being detected by the vic-
tim.
To What Extent Does the Threat Model Impact At-
tack Performance? As shown in Fig.2, there is a
case where the ASR of our gray-box attack dropped
by more than 20% compared to that of our white-
box attack when the dataset is MUTAG. However, in
other datasets, our attack maintains a high attack suc-
cess rate even in the gray-box scenario, demonstrat-
ing that our attack is feasible for attackers who do not
possess detailed information about the model. Fur-
thermore, there are cases where the gray-box scenario
shows better attack performance, but this finding can
be attributed to $\mathcal{S}$ whose labels are originally misclassified by the victim model, leading to the inability to maintain the clean-label scenario. Therefore, it is not always straightforward that the gray-box scenario is superior to the white-box scenario.

Table 4: ASR (%) comparison of the existing attack and our attack in the gray-box and white-box scenarios (poisoning rate = 10%). The highest ASR cells are grayed out in the original table.
            Existing (gray-box)   Existing (white-box)   Proposal (gray-box)   Proposal (white-box)
Dataset     GCN      GIN          GCN      GIN           GCN      GIN          GCN      GIN
NCI1        33.48    37.45        36.97    48.29         68.56    73.44        61.01    68.51
PROTEINS    20.39    30.49        23.24    28.10         67.03    66.82        60.80    71.25
DHFR        24.63    35.62        26.84    50.65         54.43    72.99        60.65    68.94
MUTAG       13.11    21.05        18.94    33.68         18.77    73.24        48.94    58.42

Table 5: CAD (%) comparison of the existing attack and our attack in the gray-box and white-box scenarios (poisoning rate = 10%). The lowest CAD cells are grayed out in the original table.
            Existing (gray-box)   Existing (white-box)   Proposal (gray-box)   Proposal (white-box)
Dataset     GCN      GIN          GCN      GIN           GCN      GIN          GCN      GIN
NCI1        0.93     0.75         0.09     2.00          3.31     2.83         3.00     2.91
PROTEINS    -1.56    1.79         -1.74    0.98          -0.67    2.55         -0.71    3.45
DHFR        0.39     1.38         -1.51    1.57          2.36     -0.59        0.52     2.30
MUTAG       -1.31    0.00         -1.84    1.57          1.31     7.89         3.15     2.89

Figure 2: Comparison of the ASR in our attack and the existing attack (GCN). Panels: (a) NCI1, (b) PROTEINS, (c) DHFR, (d) MUTAG.
Figure 3: Comparison of the CAD in our attack and the existing attack (GCN). Panels: (a) NCI1, (b) PROTEINS, (c) DHFR, (d) MUTAG.
Figure 4: Comparison of the ASR in our attack and the existing attack (GIN). Panels: (a) NCI1, (b) PROTEINS, (c) DHFR, (d) MUTAG.
Next, we explain the difference between the re-
sults reported in the existing attack’s paper (Xu and
Picek, 2022) and the results of our verification exper-
iments. In that paper (Xu and Picek, 2022), the ASR
is over 80% for the NCI1 and MUTAG datasets under the condition that the poisoning rate is 10%. However, according to our experimental results, the ASR is less than 50% in all cases. This is because the change of the threat model enables proper evaluation of the ASR, which corrects the overestimation. As a result, it is revealed that the existing attack is not so successful in practice.

Figure 5: Comparison of the CAD in our attack and the existing attack (GIN). Panels: (a) NCI1, (b) PROTEINS, (c) DHFR, (d) MUTAG.
Do Existing Defense Methods Effectively Work
Against our Attack? To evaluate our attack eva-
siveness, we apply an existing defense mechanism
(Jiang and Li, 2022) against label flipping backdoor
attacks. That defense utilizes explanation scores and
shows a high detection rate of poisoned data. In
this defense approach, the defender first adopts edges
that serve as explanations for the model’s predictions.
The explanations mean important edges that dictate
the prediction. Subsequently, based on explanation-
based scores, the defender detects the presence of
trigger edges in poisoned sources and removes those
edges. To be more specific, the defender calculates
an explainability score (ES) for each graph, identify-
ing graphs with high ESs as poisoned sources. The
threshold is set as the maximum value of benign data
in the test dataset, and any data with ESs exceeding
this threshold are considered malicious. The detec-
tion rate is the number of detected poisoned sources
divided by the number of all poisoned sources. Then,
the defender proceeds to guide predictions in the cor-
rect direction by removing the edges on the basis of
explanations. In this experiment, the number of re-
moved edges is equal to the trigger size of our attack.
We utilize the same dataset to evaluate our attack and
take this defensive approach under the condition that
the poisoning rate is the maximum value.
The results of the defense by explanation are shown in Fig. 6 and Fig. 7. The detection rate is below 20% in most cases, indicating a high level of attack evasiveness. There are conditions where the detection rate is partially higher (dataset: MUTAG, model: GCN, our white-box attack). A possible reason for this is that the average graph size of the MUTAG dataset is smaller than that of the other datasets; as a result, the trigger size decreases, and the detection rate increases because the edge importance concentrates on specific edges. However, it is important to note that in this case the number of edges adopted for the explanation is equal to the trigger size, meaning that this defense is carried out under conditions favorable to the defender. Therefore, in practice, detection may be more challenging.

Figure 6: Comparison of defense results (GCN). Panels: (a) NCI1, (b) PROTEINS, (c) DHFR, (d) MUTAG.
Figure 7: Comparison of defense results (GIN). Panels: (a) NCI1, (b) PROTEINS, (c) DHFR, (d) MUTAG.
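A simplified sketch of the detection step of this defense, assuming a hypothetical `explainability_score(model, G)` helper (e.g., built on a GNN explainer) that returns the ES of a graph; the thresholding rule follows the description above.

```python
def detection_rate(explainability_score, model, benign_test_graphs, poisoned_sources):
    """Flag graphs whose ES exceeds the maximum ES observed on benign test data,
    then report the fraction of poisoned sources that are flagged."""
    threshold = max(explainability_score(model, G) for G in benign_test_graphs)
    detected = sum(explainability_score(model, G) > threshold for G in poisoned_sources)
    return detected / len(poisoned_sources)
```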
6 CONCLUSION
In this paper, we propose the first gradient based
CLBA on GNNs for graph classification tasks. Our
attack introduces the graph embedding based pairing
and the gradient based trigger injection. We also re-
consider the threat model of backdoor attacks, define
a more practical threat model, and eliminate the over-
estimation of attack performance in the existing at-
tack. Under this threat model, our results demon-
strate that our attack outperforms the existing CLBA
in terms of the ASR metric and that the existing CLBA is
not so successful in practice. We validate our attack
performance on multiple datasets and experimental
conditions, demonstrating the generalizability of our
attack approach. Furthermore, we confirm that the
existing defense cannot sufficiently detect our attack.
Therefore, the results clarify that more effective defense mechanisms against CLBAs are needed, which we hope will stimulate further research in this area.
REFERENCES
Chen, L., Li, J., Peng, J., Xie, T., Cao, Z., Xu, K., He, X.,
Zheng, Z., and Wu, B. (2020). A survey of adversarial
learning on graphs. arXiv preprint arXiv:2003.05730.
Dai, E., Lin, M., Zhang, X., and Wang, S. (2023). Unnotice-
able backdoor attacks on graph neural networks. In
Proceedings of the ACM Web Conference 2023, pages
2263–2273.
Debnath, A. K., Lopez de Compadre, R. L., Debnath, G.,
Shusterman, A. J., and Hansch, C. (1991). Structure-
activity relationship of mutagenic aromatic and het-
eroaromatic nitro compounds. correlation with molec-
ular orbital energies and hydrophobicity. Journal of
medicinal chemistry, 34(2):786–797.
Dobson, P. D. and Doig, A. J. (2003). Distinguishing
enzyme structures from non-enzymes without align-
ments. Journal of molecular biology, 330(4):771–
783.
Erdos, P. (1959). On random graphs. Publicationes Mathematicae, 6:290–297.
Feng, P., Ma, J., Li, T., Ma, X., Xi, N., and Lu, D.
(2020). Android malware detection based on call
graph via graph neural network. In 2020 International
Conference on Networking and Network Applications
(NaNA), pages 368–374. IEEE.
Gu, T., Dolan-Gavitt, B., and Garg, S. (2019). Bad-
nets: Identifying vulnerabilities in the machine learn-
ing model supply chain.
Guo, L., Yin, H., Chen, T., Zhang, X., and Zheng, K.
(2021). Hierarchical hyperedge embedding-based
representation learning for group recommendation.
ACM Transactions on Information Systems (TOIS),
40(1):1–27.
Jiang, B. and Li, Z. (2022). Defending against backdoor at-
tack on graph nerual network by explainability. arXiv
preprint arXiv:2209.02902.
Jiang, C., He, Y., Chapman, R., and Wu, H. (2022). Cam-
ouflaged poisoning attack on graph neural networks.
In Proceedings of the 2022 International Conference
on Multimedia Retrieval, pages 451–461.
Kipf, T. N. and Welling, M. (2016). Semi-supervised clas-
sification with graph convolutional networks. arXiv
preprint arXiv:1609.02907.
Kwon, H., Yoon, H., and Park, K.-W. (2019). Selec-
tive poisoning attack on deep neural network to in-
duce fine-grained recognition error. In 2019 IEEE
Second International Conference on Artificial Intel-
ligence and Knowledge Engineering (AIKE), pages
136–139. IEEE.
Li, Y., Li, Y., Wu, B., Li, L., He, R., and Lyu, S. (2021). In-
visible backdoor attack with sample-specific triggers.
In Proceedings of the IEEE/CVF International Con-
ference on Computer Vision (ICCV), pages 16463–
16472.
Liao, C., Zhong, H., Squicciarini, A., Zhu, S., and Miller, D.
(2018). Backdoor embedding in convolutional neural
network models via invisible perturbation.
Liu, Y., Ma, S., Aafer, Y., Lee, W.-C., Zhai, J., Wang, W.,
and Zhang, X. (2018). Trojaning attack on neural net-
works. In 25th Annual Network And Distributed Sys-
tem Security Symposium (NDSS 2018). Internet Soc.
Morris, C., Kriege, N. M., Bause, F., Kersting, K., Mutzel,
P., and Neumann, M. (2020). Tudataset: A collec-
tion of benchmark datasets for learning with graphs.
In ICML 2020 Workshop on Graph Representation
Learning and Beyond (GRL+ 2020).
Qiu, R., Huang, Z., Li, J., and Yin, H. (2020). Exploiting
cross-session information for session-based recom-
mendation with graph neural networks. ACM Trans-
actions on Information Systems (TOIS), 38(3):1–23.
Wale, N., Watson, I. A., and Karypis, G. (2008). Compar-
ison of descriptor spaces for chemical compound re-
trieval and classification. Knowledge and Information
Systems, 14:347–375.
Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng,
H., and Zhao, B. Y. (2019). Neural cleanse: Identi-
fying and mitigating backdoor attacks in neural net-
works. In 2019 IEEE Symposium on Security and Pri-
vacy (SP), pages 707–723.
Xi, Z., Pang, R., Ji, S., and Wang, T. (2021). Graph back-
door. In 30th USENIX Security Symposium (USENIX
Security 21), pages 1523–1540.
Xu, J. and Picek, S. (2022). Poster: Clean-label backdoor
attack on graph neural networks. In Proceedings of
the 2022 ACM SIGSAC Conference on Computer and
Communications Security, pages 3491–3493.
Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2018). How
powerful are graph neural networks? arXiv preprint
arXiv:1810.00826.
Yang, J., Ma, W., Zhang, M., Zhou, X., Liu, Y., and Ma, S.
(2021). Legalgnn: Legal information enhanced graph
neural network for recommendation. ACM Transac-
tions on Information Systems (TOIS), 40(2):1–29.
Yang, S., Doan, B. G., Montague, P., De Vel, O., Abra-
ham, T., Camtepe, S., Ranasinghe, D. C., and Kan-
here, S. S. (2022). Transferable graph backdoor at-
tack. In Proceedings of the 25th International Sympo-
sium on Research in Attacks, Intrusions and Defenses,
pages 321–332.
Yang, X., Li, G., Zhang, C., Han, M., and Yang, W. (2023).
Percba: Persistent clean-label backdoor attacks on
semi-supervised graph node classification.
Zhang, J., Dong, B., and Philip, S. Y. (2020). Fakedetector:
Effective fake news detection with deep diffusive neu-
ral network. In 2020 IEEE 36th international confer-
ence on data engineering (ICDE), pages 1826–1829.
IEEE.
Zhang, Z., Jia, J., Wang, B., and Gong, N. Z. (2021). Back-
door attacks to graph neural networks. In Proceedings
of the 26th ACM Symposium on Access Control Mod-
els and Technologies, pages 15–26.
Zügner, D., Akbarnejad, A., and Günnemann, S. (2018).
Adversarial attacks on neural networks for graph data.
In Proceedings of the 24th ACM SIGKDD interna-
tional conference on knowledge discovery & data
mining, pages 2847–2856.