Self-adaptive Norm Update for Faster Gradient-based L2 Adversarial Attacks and Defenses
Yanhong Liu and Fengming Cao
Pingan International Smart City, China
Keywords:
Adversarial Attacks and Defenses, Computer Vision, Neural Networks, Deep Learning.
Abstract:
Adversarial training has been shown to be one of the most effective defense techniques against adversarial attacks. However, it relies on generating strong adversarial examples by attacks in each iteration of its training process. Considerable research effort has therefore gone into reducing the time overhead of attacks without sacrificing their effectiveness. The recent work on Decoupled Direction and Norm (DDN) pushed forward the progress on gradient-based L2 attacks with low norm, by adjusting the norm of the noise in each iteration based on whether the last perturbed image is adversarial or not. In this paper, we propose a self-adaptive way of adjusting the L2 norm, by considering whether the perturbed images in the last two iterations are both adversarial or not. Experiments conducted on the MNIST, CIFAR-10 and ImageNet datasets show that our proposed attack achieves comparable or even better performance than DDN with up to 30% fewer iterations. Models trained with our attack achieve robustness comparable to those trained with the DDN attack on the MNIST and CIFAR-10 datasets, while taking around 20% less training time, when the attacks are limited to a maximum norm.
1 INTRODUCTION
Since the emergence of deep learning, it has received considerable attention and has been applied to a wide range of computer vision tasks, such as classification (Szegedy et al., 2015; Simonyan and Zisserman, 2015; He et al., 2016) and object detection (Girshick, 2015; Liu et al., 2016; Redmon et al., 2016), leading to state-of-the-art performance. However, it has been shown that deep neural network models are vulnerable to adversarial examples (Biggio et al., 2013; Szegedy et al., 2013). For example, an image with small perturbations added is perceptually very similar to the original one, but is misclassified with high confidence by the classifier.
Research in the literature usually measures the added noise by an Lp norm (p = 0, 1, 2, ∞, etc.), which gives a type of distance measure between the original image and the adversarial example. Each form of the Lp norm has its own preference for the distortion applied to the image under attack. Various attack techniques (Papernot et al., 2016; Modas et al., 2019; Chen et al., 2018; Goodfellow et al., 2015; Kurakin et al., 2016; Madry et al., 2018; Szegedy et al., 2013; Carlini and Wagner, 2017; Rony et al., 2019) have been devised to generate adversarial examples, with each one typically specific to a particular form of Lp norm. Among these, the projected gradient descent (PGD) (Madry et al., 2018) method is well known for its effectiveness as an L∞ attack. The state of the art for the L2 attack was proposed by Carlini and Wagner (Carlini and Wagner, 2017), known as the C&W attack.
As for the optimization strategy used, the attacks can be classified into two main branches. One is to find the perturbation within a specified norm ball around the original sample such that the perturbed example maximizes the loss function; PGD is the representative work along this line. The other is to search for the perturbation with the lowest norm among all the possible distortions that mislead the classifier; the typical work along this line is C&W (Carlini and Wagner, 2017).
Devising defense methods (Prakash et al., 2018; Gu et al., 2019; Machado et al., 2019) to obtain robust models that can correctly label adversarially perturbed images has long been an active research area. One of the most effective defenses (Athalye et al., 2018) to date is the adversarial training approach, which augments each minibatch of training data with adversarial examples. Usually, iterative attacks (Kurakin et al., 2017; Madry et al., 2018) are used to generate stronger adversarial examples for training, in order to obtain more robust models.
The more attack iterations are used, the stronger the generated examples are. However, adversarial training is prohibitively time-consuming, since it needs to compute the gradient with respect to the input in each attack iteration. Some research efforts (Shafahi et al., 2019; Wong et al., 2020; Zheng et al., 2020) have gone into speeding up the adversarial training process, mainly considering the widely adopted L∞ attack PGD (Madry et al., 2018).
There is far less work on adversarial training based on L2 attacks, especially those that search for the perturbation with the minimum norm. The main reason is the lack of an efficient attack method. C&W requires a line search for one of the optimization terms and thus needs thousands of iterations. Its extremely high computational cost makes it impossible to apply in adversarial training. To tackle the difficulty of the C&W attack and reduce the time cost, Rony et al. (Rony et al., 2019) proposed to decouple the direction and the norm (DDN) of the generated perturbation in each gradient-based iteration. Along the direction of the gradient, the norm of the perturbation is constrained by projecting it onto an ε-sphere around the original image. The value of ε is adjusted by a binary decision, based on whether the distorted image is adversarial or not. In this way, DDN generates adversarial examples close to the decision boundary, and thus opens the possibility of using an efficient L2 attack for adversarial training.
In this paper, we follow this line of work on speeding up the L2 attack. We observe that the DDN algorithm adjusts the L2 norm ε by a fixed update factor, which may slow down the progress of approaching the decision boundary. We propose to adjust the norm in a self-adaptive way in each gradient-based iteration, by tracking whether the perturbed images were adversarial or not in the previous two steps. We change the norm to a greater degree if the previous two perturbations led to the same prediction outcome. Otherwise, we change the norm less, since the two perturbed images have crossed the decision boundary. We design a self-adaptive norm update (ANU) scheme for the low-norm L2 attack, which can approach the minimum adversarial perturbation much faster than DDN. This reduction in time helps to further speed up the adversarial training process using L2 attacks.
Extensive experiments on the MNIST, CIFAR-10 and ImageNet datasets show that the ANU algorithm generally performs better than DDN when taking the same number of iterations, with comparable execution time. We also observed that the ANU attack can achieve comparable or even better performance than DDN with up to 30% fewer iterations, which is especially useful in the context of adversarial training. Experiments show that the defense models trained with our attack achieve robustness comparable to those trained with DDN on MNIST and CIFAR-10, with around 20% reduced training time.
2 RELATED WORK
In this section, we review the basic concept of adver-
sarial examples, related work on attack and defense
methods, and then give the background of our work.
2.1 Threat Model
In this paper, we consider the white-box attack scenario, where the model architecture and parameters are known to the adversary. White-box attacks can be used to evaluate the worst-case robustness of a model, or to generate adversarial examples for adversarial training.
2.2 Lp Norm
We study the problem of image classification. Given an image sample x from the input space R^n (i.e. x ∈ R^n), it has a true label y (the index of the label) from a predefined set of m possible labels. For example, a two-dimensional grey-scale image with h × w pixels defines a vector x ∈ R^{hw}, where x_i denotes the intensity of the i-th pixel. The value of each pixel is scaled to be in the range [0,1]. Similarly, a color RGB image defines a vector x ∈ R^{3hw}.
The image classifier, modeled by a neural network F parameterized by θ, outputs the probability of x belonging to y_i for each of the m possible labels y_i, i.e. F(x) is a list of probabilities P(y_i | x, θ). The sample x is then labelled as C(x) = argmax_i P(y_i | x, θ).
An adversary attacks x by adding small perturbations to the intensities of the image pixels, such that the resulting image is misclassified. Suppose that the distorted image is denoted by x′; a successful attack leads the classifier to output C(x′) not equal to the true label y. To quantify the difference of x′ from its original image x, the widely used distance metrics in the literature are the Lp norms (p = 0, 1, 2, ∞, etc.), which give a reasonable approximation of human perceptual similarity. Let δ denote the perturbation added to x, i.e. δ = x′ − x. The Lp distance between x and x′, denoted by ‖δ‖_p, is defined as follows:

$$\|\delta\|_p = \Big( \sum_{i=1}^{n} |\delta_i|^p \Big)^{\frac{1}{p}} \qquad (1)$$
Intuitively, the L0 distance measures the number of pixels that are altered in x′. L1 attacks lead to sparsity in the perturbation, with only a few pixels adjusted. The L2 distance measures the standard Euclidean distance between x and x′. L2 attacks lead to noise that is more localized in the image, since a large perturbation in some pixels can be traded off against smaller perturbations in the others. The L∞ distance measures the maximum change over all the pixel values, and thus L∞ attacks lead to small noise everywhere in the image.
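As a concrete illustration of Eqn. (1) and the intuitions above, the following sketch (our own, not part of the original formulation) computes the common Lp distances of a perturbation with NumPy; the array shapes and values are illustrative assumptions.

import numpy as np

def lp_norms(x, x_adv):
    # Compute common Lp distances between an image and its perturbed version.
    # Both inputs are float arrays with values in [0, 1]; the perturbation is
    # flattened before taking the norms, so any shape works.
    delta = (x_adv - x).ravel()
    return {
        "L0": np.count_nonzero(delta),      # number of altered pixels
        "L1": np.abs(delta).sum(),          # sum of absolute changes
        "L2": np.sqrt((delta ** 2).sum()),  # Euclidean distance, Eqn. (1) with p = 2
        "Linf": np.abs(delta).max(),        # largest single-pixel change
    }

# Toy example: a 28x28 grey-scale image with two pixels perturbed.
x = np.zeros((28, 28))
x_adv = x.copy()
x_adv[0, 0] += 0.3
x_adv[5, 7] += 0.1
print(lp_norms(x, x_adv))  # L0=2, L1=0.4, L2~0.316, Linf=0.3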
2.3 Optimization Objective
The attacks can be grouped into non-targeted and targeted ones. For the non-targeted ones, an adversarial example x′ leads to a misclassified label C(x′) ≠ y. Targeted attacks assume a predefined label y_target ≠ y, and lead the classifier to predict x′ as y_target.
Taking the non-targeted attacks as an example, we show the two main approaches to the optimization of generating adversarial examples. The first approach is to minimize the norm of the distortion δ under the constraint that x + δ misleads the classifier, which can be formulated as:

$$\min_{\delta} \|\delta\|_p \quad \text{such that } C(x+\delta) \neq y \text{ and } x+\delta \in [0,1]^n \qquad (2)$$

For targeted attacks, a similar formulation can be obtained by modifying the constraint to be C(x + δ) = y_target.
The other approach is to maximize the loss function J(x + δ, y, θ) (equivalently, to minimize the probability of being correctly classified), given that the norm of the perturbation is bounded by ε, which can be formulated as:

$$\max_{\delta} J(x+\delta, y, \theta) \quad \text{such that } \|\delta\|_p \le \epsilon \text{ and } x+\delta \in [0,1]^n \qquad (3)$$

A similar formulation can be derived for targeted attacks, by modifying the objective to min_δ J(x + δ, y_target, θ) (equivalent to max_δ P(y_target | x + δ, θ)).
2.4 Attacks
Since the discovery of the phenomenon of adversarial examples, various gradient-based attack techniques have been proposed to generate examples for evaluating the robustness of deep neural network models, based on different Lp norm specifications.
L0/L1: The JSMA (Papernot et al., 2016) L0 attack constructs adversarial saliency maps by computing forward derivatives, and thus identifies the input pixels with the highest potential to change the decision of the classifier. The SparseFool (Modas et al., 2019) method exploits the low mean curvature of the decision boundary, and proposes a geometry-inspired sparse L0/L1 attack that controls the sparsity of the perturbations. The EAD (Chen et al., 2018) work formulates the process of attacking neural networks as an elastic-net regularized optimization problem and features L1-oriented adversarial examples.
L∞: The fast gradient sign method (FGSM) (Goodfellow et al., 2015) generates adversarial examples with a single gradient step, which maximizes the loss function with the norm of the perturbation upper bounded, following Eqn. (3). The basic iterative method (Kurakin et al., 2016) applies multiple, smaller FGSM steps, and was further strengthened by adding multiple random restarts. These enhancements later became well known as the PGD attack (Madry et al., 2018), which is the state-of-the-art method for the L∞ attack. There have been improvements on PGD that aim at making it more effective and/or more query-efficient by changing its update rule to Adam (Uesato et al., 2018) or momentum (Dong et al., 2018).
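For reference, a minimal PyTorch-style sketch of one such iterative L∞ attack in the spirit of PGD is given below; it maximizes the loss of Eqn. (3) with a sign-gradient step, a projection onto the ε-ball and a clip to the [0,1] box. The function and parameter names are our own, and random restarts are omitted.

import torch

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Untargeted L-infinity PGD following the max-loss objective of Eqn. (3).
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascent step on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # keep a valid image
    return x_adv.detach()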
L2: FGSM and PGD can also be extended to generate L2 attacks, with the objective of optimizing Eqn. (3). Following the constrained minimization objective stated in Eqn. (2), Szegedy et al. (Szegedy et al., 2013) reformulated the problem in the following form, which is better suited for optimization:

$$\min_{\delta} \; c \cdot \|\delta\|_2^2 + \log P(y \mid x+\delta, \theta) \quad \text{such that } x+\delta \in [0,1]^n \qquad (4)$$

where c is a constant. A box-constrained optimizer (L-BFGS) is used to handle the constraint x + δ ∈ [0,1]^n, and a line search is performed to find an appropriate value of c.
DeepFool (Moosavi-Dezfooli et al., 2016) iteratively constructs untargeted adversarial examples by assuming a linear approximation of the neural network model. The C&W attack (Carlini and Wagner, 2017) reformulates the optimization problem in Eqn. (2) in a way similar to L-BFGS, as shown in Eqn. (5). Instead of using box-constrained optimization, it proposes to change the variables using the tanh function, which handles the constraint x + δ ∈ [0,1]^n naturally. It also optimizes the difference between logits (the model output before the softmax activation) instead of the loss function.
$$\min_{\delta} \; \|x' - x\|_2^2 + c \cdot f(x'), \quad \text{where } f(x') = \max\big( Z(x')_y - \max\{ Z(x')_i : i \neq y \},\, -\kappa \big) \qquad (5)$$

where x′ is equal to ½ (tanh(arctanh(2x − 1) + δ) + 1). Z(x′)_i denotes the logit corresponding to the i-th class. κ denotes a confidence parameter; a higher value of κ results in higher confidence of misclassifying the adversarial sample x′.
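To make Eqn. (5) concrete, the margin term f(x′) can be written as follows in PyTorch. This is a hedged sketch of the standard C&W logit-margin loss, not the authors' code; logits are assumed to be the pre-softmax outputs Z(x′) and the tanh change of variables is omitted for brevity.

import torch

def cw_margin(logits, y, kappa=0.0):
    # f(x') = max(Z(x')_y - max_{i != y} Z(x')_i, -kappa), batched over the first dimension.
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)   # Z(x')_y
    masked = logits.clone()
    masked.scatter_(1, y.unsqueeze(1), float("-inf"))          # hide the true class
    best_other = masked.max(dim=1).values                      # max_{i != y} Z(x')_i
    return torch.clamp(true_logit - best_other, min=-kappa)

# Full C&W objective for a candidate x_adv (here simply x + delta):
# loss = ((x_adv - x) ** 2).flatten(1).sum(1) + c * cw_margin(model(x_adv), y, kappa)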
C&W shows state-of-the-art performance for the L2 attack. However, it has to iteratively search for the appropriate value of the constant c. This prohibitively high computational cost makes it unsuitable for generating adversarial examples when training defense models.
The DDN attack (Rony et al., 2019) makes great progress towards time efficiency (as few as 100 iterations), with results comparable to C&W. We will detail this algorithm in the next section.
2.5 Defenses
Adversarial training has been shown to be one of the most effective defense methods for training robust models against adversarial examples (Athalye et al., 2018). It mixes the clean training data with adversarial examples, where the adversarial loss function becomes:

$$\tilde{J}(x, y, \theta) = \beta J(x, y, \theta) + (1 - \beta) J(x', y, \theta) \qquad (6)$$

where β specifies a constant mixing ratio, and J(x, y, θ) is the normal loss function of the model, such as the cross-entropy loss. The adversarial example x′ of the original image x may be generated by the attacks described in the previous subsection. When it is generated following the objective of Eqn. (3), the training process becomes the min-max optimization problem formulated in Madry's defense. Most defenses follow this line of work due to the availability of efficient attack mechanisms such as FGSM (Goodfellow et al., 2015) or PGD (Madry et al., 2018), where the attacks aim to maximize the worst-case loss given that the perturbations are bounded by a maximum norm ball.
However, research on attacks with the objective of Eqn. (2) lags behind in terms of time efficiency. The recently proposed DDN (Rony et al., 2019) attack moves a big step forward, making it possible to train defense models following this line.
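A minimal sketch of one adversarial-training step implementing the mixed loss of Eqn. (6) is given below; here attack stands for any attack of the kind discussed above (e.g. the PGD sketch earlier), and the names are illustrative, not the implementation used in any particular defense.

import torch

def adversarial_training_step(model, optimizer, x, y, attack, beta=0.5):
    # One minibatch update with the mixed clean/adversarial loss of Eqn. (6).
    loss_fn = torch.nn.CrossEntropyLoss()
    model.eval()
    x_adv = attack(model, x, y)        # generate adversarial examples for this batch
    model.train()
    optimizer.zero_grad()
    loss = beta * loss_fn(model(x), y) + (1.0 - beta) * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()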
3 THE ANU ATTACK
In this section, we first analyze the DDN algorithm (Rony et al., 2019) in detail, and then propose a self-adaptive norm update scheme to further improve the time efficiency of the L2 attack.
3.1 Decoupled Direction and Norm
As shown in Alg. 1 (steps 4 and 5), the DDN (Rony et al., 2019) attack iteratively refines the noise δ_k based on the gradient (denoted by g) of the loss function J(x′_{k−1}, y, θ) with respect to the noise δ_{k−1} computed in the previous iteration. The noise δ_k is updated along the direction of g, with the goal of increasing the loss for an untargeted attack and decreasing the loss for a targeted attack.
In each iteration, the DDN algorithm constrains the norm of the perturbation by projecting δ_k onto an ε-sphere around the original image x (step 11). The PGD attack also projects the noise onto a pre-specified ε-sphere, where ε is the maximal norm allowed (see Eqn. (3)). However, DDN adapts the value of ε in each iteration to bring the perturbed image closer to the decision boundary.
In each iteration k, as shown in steps 6 to 10, the ε-sphere is updated based on whether x′_{k−1} is adversarial or not. If x′_{k−1} is not adversarial, i.e. C(x′_{k−1}) = y, the norm ε_k is increased to (1 + γ)ε_{k−1}. Otherwise, if x′_{k−1} is adversarial, i.e. C(x′_{k−1}) ≠ y, the norm ε_k is decreased to (1 − γ)ε_{k−1}. It can be seen that ε_k is changed to improve the probability of making x′_k cross the decision boundary.
Note that DDN uses a fixed factor γ for updating the norm in each iteration. We argue that this fixed scaling ratio for ε slows down the convergence of the adversarial norm towards the decision boundary. As illustrated in Figure 1, the noise δ_k is updated along the direction of g, and then projected back onto the ε_k-sphere around the original image x. Figure 1 (a) shows that the norm is scaled up with a fixed ratio of 1 + γ when both x′_{k−2} and x′_{k−1} are not adversarial. In the case that x′_{k−2} is not adversarial and x′_{k−1} is adversarial, Figure 1 (b) shows that the norm is first scaled up to (1 + γ)ε_{k−2} and then reduced back to (1 − γ)ε_{k−1} = (1 − γ²)ε_{k−2}, which is even smaller than ε_{k−2}.
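For concreteness, the following PyTorch-style sketch follows the steps of Alg. 1 for a single untargeted image with a batch dimension of one. Batch handling, the cosine-annealed step size and the targeted case are simplified, and all names are ours rather than the authors' released code.

import torch

def ddn_attack(model, x, y, steps=100, alpha=1.0, gamma=0.05):
    # Simplified single-image DDN following Alg. 1 (untargeted).
    loss_fn = torch.nn.CrossEntropyLoss()
    delta = torch.zeros_like(x)    # accumulated noise direction, delta_k
    x_adv = x.clone()              # x'_{k-1}
    eps = 1.0
    best_adv, best_norm = None, float("inf")
    for _ in range(steps):
        x_in = x_adv.clone().detach().requires_grad_(True)
        logits = model(x_in)
        loss = loss_fn(logits, y)
        g, = torch.autograd.grad(loss, x_in)              # gradient of the loss at x'_{k-1}
        is_adv = bool((logits.argmax(1) != y).item())     # step 6: was x'_{k-1} adversarial?
        with torch.no_grad():
            if is_adv:
                norm = (x_adv - x).norm().item()
                if norm < best_norm:                      # keep the best adversarial image so far
                    best_adv, best_norm = x_adv.clone(), norm
            delta = delta + alpha * g / g.norm()          # step 5: move along the gradient direction
            eps *= (1 - gamma) if is_adv else (1 + gamma) # steps 7-9: shrink or grow the norm
            x_adv = (x + eps * delta / delta.norm()).clamp(0, 1)  # steps 11-12: project and clip
    return best_adv if best_adv is not None else x_adv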
3.2 Self-adaptive Norm Update
Instead of using a fixed factor γ, in this paper we propose a self-adaptive scheme for adjusting the norm update factor γ_k in each iteration, as shown in Alg. 2. With an initial input γ_0, we update the value of γ_k by observing whether the past two perturbed images are adversarial or not. As shown in steps 7 to 13, if x′_{k−2} and x′_{k−1} are either both adversarial or both non-adversarial, we can say that these two distorted images did not cross
Algorithm 1: Algorithm of the DDN Attack.
Input: x: original image to be attacked
Input: y: true label (untargeted) or targeted label (targeted)
Input: K: number of iterations
Input: α: step size in the direction of g
Input: γ: factor to update the norm
Output: x′: adversarial image
1:  Initialize δ_0 ← 0, x′_0 ← x, ε_0 ← 1
2:  If targeted attack: m ← −1 else m ← +1
3:  for k ← 1 to K do
4:      g ← m · ∇_{δ_{k−1}} J(x′_{k−1}, y, θ)
5:      δ_k ← δ_{k−1} + α · g / ‖g‖_2
6:      if x′_{k−1} is adversarial then
7:          ε_k ← (1 − γ) ε_{k−1}               // decrease norm
8:      else
9:          ε_k ← (1 + γ) ε_{k−1}               // increase norm
10:     end
11:     x′_k ← x + ε_k · δ_k / ‖δ_k‖_2          // projection
12:     x′_k ← clip(x′_k, 0, 1)                 // make x′_k ∈ [0,1]^n
13: end
14: Return the x′_k that has the lowest norm ‖x′_k − x‖_2 and is adversarial
Figure 1: Update of ε_k by DDN in the case that (a) both x′_{k−2} and x′_{k−1} are not adversarial, (b) x′_{k−2} is not adversarial and x′_{k−1} is adversarial.
the decision boundary, since the variation of the norm ε_{k−1} was smaller than expected. Hence, we propose to update the norm ε_k with a larger factor γ_k, in order to make x′_k cross the decision boundary with a higher probability. We do this by scaling γ_{k−1} with a ratio of 1 + α_γ (see step 10).
On the other hand, if x′_{k−2} is non-adversarial and x′_{k−1} is adversarial, or vice versa, we reduce the variation of the norm ε_k by adjusting the factor γ_k to a smaller value (see step 12). In this way, we let x′_k approach the decision boundary more closely with a higher probability.
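The only change ANU makes relative to the DDN loop is this per-iteration update of γ_k (steps 7 to 13 of Alg. 2). A hedged sketch follows, with variable names taken from the algorithm rather than from any released code.

def update_gamma(gamma_prev, adv_prev, adv_prev2, alpha_gamma):
    # Self-adaptive norm update factor (Alg. 2, steps 7-13).
    # adv_prev / adv_prev2 are the adversarial flags A1 / A2 of x'_{k-1} and x'_{k-2}.
    if adv_prev == adv_prev2:
        return (1 + alpha_gamma) * gamma_prev   # same side of the boundary: push harder
    return (1 - alpha_gamma) * gamma_prev       # boundary was crossed: refine more gently

# Inside a DDN-style loop, gamma then becomes per-iteration:
#   gamma_k = update_gamma(gamma_k, is_adv, was_adv, alpha_gamma)
#   eps *= (1 - gamma_k) if is_adv else (1 + gamma_k)
#   was_adv = is_adv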
In the rest of the algorithm, the norm ε_k is modified with the self-adaptive factor γ_k.
Algorithm 2: Algorithm of the Self-Adaptive Norm Update Attack.
Input: x: original image to be attacked
Input: y: true label (untargeted) or targeted label (targeted)
Input: K: number of iterations
Input: α: step size in the direction of g
Input: γ_0: initial norm update factor
Input: α_γ: step size for updating γ_k, with cosine annealing scheduling
Output: x′: adversarial image
1:  Initialize δ_0 ← 0, x′_0 ← x, ε_0 ← 1
2:  Initialize A_1 ← 0                           // [ANU]
3:  If targeted attack: m ← −1 else m ← +1
4:  for k ← 1 to K do
5:      g ← m · ∇_{δ_{k−1}} J(x′_{k−1}, y, θ)
6:      δ_k ← δ_{k−1} + α · g / ‖g‖_2
        /* start of the update of γ_k */
7:      A_2 ← A_1
8:      A_1 ← Bool(x′_{k−1} is adversarial)
9:      if A_1 = A_2 then
10:         γ_k ← (1 + α_γ) γ_{k−1}              // increase γ_k
11:     else
12:         γ_k ← (1 − α_γ) γ_{k−1}              // decrease γ_k
13:     end
        /* end of the update of γ_k */
14:     if x′_{k−1} is adversarial then
15:         ε_k ← (1 − γ_k) ε_{k−1}              // [ANU]
16:     else
17:         ε_k ← (1 + γ_k) ε_{k−1}              // [ANU]
18:     end
19:     x′_k ← x + ε_k · δ_k / ‖δ_k‖_2           // projection
20:     x′_k ← clip(x′_k, 0, 1)                  // make x′_k ∈ [0,1]^n
21: end
22: Return the x′_k that has the lowest norm ‖x′_k − x‖_2 and is adversarial
With the proposed algorithm, we expect that an adversarial image with a comparable minimum L2 norm can be obtained in fewer iterations than with DDN. Note that our algorithm degrades to DDN when α_γ is set to zero.
Suppose that γ_{k−1} is equal to γ. Figure 2 (a) shows that ANU scales the norm ε_k to a larger value (compared to Figure 1 (a)) with a factor γ_k greater than γ, based on the observation that both x′_{k−2} and x′_{k−1} are non-adversarial. In other words, x′_k has a higher probability of being adversarial. In the case that x′_{k−2} is not adversarial and x′_{k−1} is adversarial, Figure 2 (b) shows that the norm ε_k reduces back to (1 − γ_k)(1 + γ_{k−1})ε_{k−2}, which is expected to be greater than the value computed by DDN (as shown in Figure 1 (b)), since γ_k is smaller than γ_{k−1}. That is to say, the newly
Figure 2: Update of ε_k by ANU in the case that (a) both x′_{k−2} and x′_{k−1} are not adversarial, (b) x′_{k−2} is not adversarial and x′_{k−1} is adversarial.
computed x′_k has a higher probability of being adversarial, under the condition that ε_k is less than ε_{k−1}.
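To make the comparison concrete, take illustrative values γ_{k−1} = γ = 0.05 and α_γ = 0.1 (chosen only for the arithmetic; they are not the tuned settings used in Section 5). For the case of Figure 1 (b) / Figure 2 (b), a non-adversarial step followed by an adversarial one gives:

$$\text{DDN:}\quad \epsilon_k = (1-\gamma)(1+\gamma)\,\epsilon_{k-2} = (1-0.05^2)\,\epsilon_{k-2} = 0.9975\,\epsilon_{k-2}$$

$$\text{ANU:}\quad \gamma_k = (1-\alpha_\gamma)\,\gamma_{k-1} = 0.045, \qquad \epsilon_k = (1-\gamma_k)(1+\gamma_{k-1})\,\epsilon_{k-2} \approx 1.0028\,\epsilon_{k-2}$$

So the ANU norm stays above the DDN value while remaining below ε_{k−1} = 1.05 ε_{k−2}, which is exactly the behaviour argued for above.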
4 ADVERSARIAL TRAINING
WITH ANU
As described in Section 2.5, adversarial examples generated by the attacks can be used to augment the training set. The trained model is thus expected to be robust to adversarial attacks. In this paper, we mainly focus on the robustness of the models against the attacks; hence we simplify Eqn. (6) to consider only the adversarial examples:

$$\tilde{J}(x, y, \theta) = J(x', y, \theta) \qquad (7)$$

where x′ is an adversarial example of x produced by the ANU algorithm, projected to be within an ε-sphere around x. In other words, the adversarial examples used for training are constrained to have a maximum norm of ε.
Due to the computational efficiency of ANU, we expect that robust models based on the ANU attack can be trained in considerably less time, with robustness comparable to models based on the DDN attack.
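A hedged sketch of one training step under Eqn. (7) follows, assuming an anu_attack function in the spirit of Alg. 2 that returns adversarial examples; the extra projection enforces the maximum-norm constraint ε used in the defense experiments. The helper name and structure are ours, not the paper's implementation.

import torch

def project_l2(x, x_adv, eps_max):
    # Project x_adv back onto the L2 ball of radius eps_max around x, then clip to [0,1].
    delta = x_adv - x
    norm = delta.flatten(1).norm(dim=1).clamp(min=1e-12)
    factor = torch.clamp(eps_max / norm, max=1.0).view(-1, *([1] * (x.dim() - 1)))
    return (x + factor * delta).clamp(0, 1)

def anu_training_step(model, optimizer, x, y, anu_attack, eps_max):
    # One minibatch update using only adversarial examples, as in Eqn. (7).
    loss_fn = torch.nn.CrossEntropyLoss()
    model.eval()
    x_adv = project_l2(x, anu_attack(model, x, y), eps_max)
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()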
5 EXPERIMENTAL EVALUATION
We have conducted extensive experiments on the MNIST, CIFAR-10 and ImageNet datasets, showing the effectiveness of our proposed ANU algorithm compared to the state-of-the-art L2 attack DDN. We used the same model architectures for training as in (Rony et al., 2019). The base image classifiers, trained for 50 epochs, have 99.41% and 85.64% accuracy on the test sets of MNIST and CIFAR-10, respectively. A pre-trained Inception V3 model (Szegedy et al., 2016) was used for the ImageNet experiments, which takes cropped images of size 299 × 299. All images were normalized to the [0,1] range.
For the experiments on DDN, the hyperparameters were initialized in the same way as in (Rony et al., 2019), i.e. ε_0 = 1 and γ = 0.05. The initial step size α (in the direction of the gradient g) was set to 1, and was reduced with cosine annealing to 0.01 in the last iteration. The norm update factor γ was chosen to be 0.05, since the norm of the adversarial perturbation can, in the best case, be reduced to ε_0(1 − γ)^K after K iterations. For K ≥ 100 iterations, γ = 0.05 is enough for the algorithm to find an adversarial example with the smallest possible perturbation (changing one pixel by 1/255 for images encoded with 8-bit values), if it exists. The details of the derivation can be found in (Rony et al., 2019). For the experiments on ANU, the step size α was set in the same way as for DDN. The step size α_γ (for updating the norm update factor γ_k) was also set with cosine annealing.
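The cosine annealing schedule used for α (and for α_γ in ANU) can be written as below; the endpoint values follow the text (1 down to 0.01 for α over K iterations), while the helper name and the exact endpoint-inclusive form are our own assumptions.

import math

def cosine_anneal(start, end, step, total_steps):
    # Cosine decay from `start` at step 0 to `end` at the final step.
    cos = (1 + math.cos(math.pi * step / (total_steps - 1))) / 2
    return end + (start - end) * cos

# Example: the step size alpha over K = 100 iterations, from 1.0 down to 0.01.
alphas = [cosine_anneal(1.0, 0.01, k, 100) for k in range(100)]
# alphas[0] == 1.0 and alphas[-1] == 0.01 (up to floating point).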
In this section, we first verify the basic idea of ANU with a small example, illustrating its improvement to the norm update process. Then we compare the performance of ANU and DDN. Finally, we evaluate the robustness of the defense models trained with each of the two algorithms.
5.1 Illustration
We first conducted the two gradient-based attacks with 100 iterations on the first image from the MNIST test dataset, using the base classifier for MNIST. Figure 3 shows the minimum norm ε_k achieved so far over the iterations in which the perturbed image x′_k is adversarial, as k runs from 1 to 100. Let S denote the set of all iterations in which the perturbed images are adversarial, i.e. S = {i | x′_i is adversarial, 1 ≤ i ≤ K}. Given the index k of such an iteration, we have ε_k = min{ε_i | i ∈ S and i ≤ k}. It can be observed that the DDN attack gradually reduces the norm of the adversarial noise at a steady pace after finding the first adversarial image. The ANU attack first continues to increase the norm update factor γ_i and finds the first adversarial image much earlier than DDN, and then changes γ_i at a self-adjusted pace. We observe that ANU arrives at a steady solution of the lowest adversarial norm ε_k much faster than DDN.
We also conducted a similar experiment on the
first 1000 images from the MNIST test dataset, with
1000 iterations of attacks. Figure 4 also shows that in
the beginning the ANU attack aggressively increases
the norm update factors to find adversarial images,
Figure 3: Change of the minimum L2 norm achieved for the adversarial images during the attack of one MNIST sample over 100 iterations.
Figure 4: Change of the mean L2 norm of the perturbed images during the attack of 1000 MNIST samples over 1000 iterations.
and then adaptively adjusts the update factors, with a decreasing mean L2 norm. This further verifies that the ANU algorithm approaches the neighborhood of the minimum adversarial norm much faster than DDN.
5.2 Attack Evaluation
For a complete comparison of our proposed ANU algorithm with DDN, we conducted two sets of experiments: untargeted attacks and targeted attacks. We generated attacks for the first 1000 images from the test datasets of MNIST and CIFAR-10 respectively, using the base image classifiers described earlier. For ImageNet, we randomly chose one image for each of the 1000 classes from the validation set.
Table 1 shows the results for the untargeted attacks, including the number of iterations (budget) taken, the mean and median L2 norms over all attacks, as well as the execution times, measured on an NVIDIA GTX 2080 Ti with 11 GB of memory. Note that all the images were successfully attacked. For simplicity, we used a common set of hyperparameters for ANU. We set γ_0 to 0.4, 0.2 and 0.1, and initialized α_γ to 0.15, 0.1 and 0.08, for MNIST, CIFAR-10 and ImageNet respectively. Note that it is possible to obtain
Table 1: Performance comparison of ANU and DDN for the untargeted attacks on MNIST, CIFAR-10 and ImageNet.

Dataset    Attack  Budget  Mean L2  Median L2  Time (s)
MNIST      DDN       50    1.5095   1.4738        1.0
                     70    1.4610   1.4428        1.1
                    100    1.4470   1.4400        1.4
                    200    1.4411   1.4358        2.5
                    500    1.4362   1.4334        6.2
                   1000    1.4346   1.4328       12.2
           ANU       50    1.4572   1.4429        0.8
                     70    1.4457   1.4336        1.0
                    100    1.4425   1.4311        1.4
                    200    1.4363   1.4310        2.8
                    500    1.4362   1.4297        6.3
                   1000    1.4347   1.4297       12.4
CIFAR-10   DDN       50    0.1900   0.1657        1.9
                     70    0.1709   0.1536        2.6
                    100    0.1676   0.1521        3.5
                    200    0.1665   0.1511        6.9
                    500    0.1659   0.1510       17.6
                   1000    0.1655   0.1502       35.1
           ANU       50    0.1679   0.1519        1.8
                     70    0.1669   0.1511        2.5
                    100    0.1662   0.1509        3.5
                    200    0.1658   0.1505        6.8
                    500    0.1656   0.1500       17.1
                   1000    0.1655   0.1498       35.1
ImageNet   DDN       50    0.5116   0.4937      220.9
                     70    0.4862   0.4853      311.4
                    100    0.4769   0.4805      435.3
                    200    0.4674   0.4738      872.2
                    500    0.4576   0.4704     2187.9
                   1000    0.4497   0.4686     4365.1
           ANU       50    0.4809   0.4836      219.6
                     70    0.4717   0.4801      307.3
                    100    0.4667   0.4772      445.6
                    200    0.4597   0.4743      876.5
                    500    0.4529   0.4717     2197.6
                   1000    0.4469   0.4700     4384.5
better performance for ANU by tuning these hyperparameters.
It can be observed that the mean/median norm achieved by both the DDN and ANU attacks decreases as the budget increases, at the cost of more execution time. With the same budget, the ANU attack generally achieves a lower mean/median norm, with comparable run time. More interestingly, the ANU attack with fewer iterations can achieve comparable or better performance than DDN. For example, the ANU attack with 70 iterations achieves lower norms than DDN with 100 iterations, with around 30% less time.
We also conducted targeted attacks using both the ANU and DDN algorithms. We set both γ_0 and the initial value of α_γ to 0.05 for MNIST and ImageNet. For CIFAR-10, we set γ_0 to 0.15 and initialized α_γ to 0.06.
Using all 9 possible classes (excluding the true label) as the target, we generated 9 attacks for each test image of MNIST and CIFAR-10. We generated attacks against 20 randomly chosen classes for each test image of ImageNet. Results are reported in Table 2 for the 9000 attacks on MNIST and CIFAR-10 respectively, and the 20000 attacks on ImageNet. Similar conclusions can be drawn as for the untargeted attacks. Note that the ANU attack has no clear advantage over DDN in the case of 1000 iterations. This may be explained by the fact that a large number of iterations (e.g., 1000 or more) gives DDN enough time to cover the neighborhood of the decision boundary, as shown in Figure 4. However, in this paper we are more interested in attacks with fewer iterations. Notably, the ANU attack with fewer iterations (85, 80 and 87 on MNIST, CIFAR-10 and ImageNet respectively) achieves better results than the DDN attack with 100 iterations, with reduced execution time (by about 15%, 20% and 12% on MNIST, CIFAR-10 and ImageNet respectively).
5.3 Defense Evaluation
To conduct a fair comparison with DDN, we used the same architectures as (Rony et al., 2019) for training robust models. A small CNN architecture was used for MNIST (Rony et al., 2019; Carlini and Wagner, 2017). A Wide ResNet (Zagoruyko and Komodakis, 2016) with 28 layers and a widening factor of 10 (WRN-28-10) was used for CIFAR-10. For each step of adversarial training, we attacked the image with DDN using a budget of 100 iterations. When training with ANU, we attacked the image with fewer iterations. The norm of the perturbations is limited to a maximum of ε = 2.4 in the MNIST experiments, and ε = 1 in the CIFAR-10 experiments. The detailed settings can be found in (Rony et al., 2019).
We trained a robust MNIST model with ANU, setting the number of iterations to 70 for generating each adversarial example; it has a test accuracy of 98.88% on clean samples, with a total training time of 3857.8 seconds. We also trained a robust MNIST model with DDN as the internal attack for generating the adversarial examples; it has an accuracy of 98.96%, with a total training time of 4840.4 seconds.
¹ We report the times on ImageNet as the original times divided by 20.
Table 2: Performance comparison of ANU and DDN for the targeted attacks on MNIST, CIFAR-10 and ImageNet.

Dataset    Attack  Budget  Mean L2  Median L2  Time (s)
MNIST      DDN       50    2.1069   2.0952        5.6
                     85    2.0527   2.0317        9.5
                    100    2.0453   2.0311       11.2
                    300    2.0283   2.0191       34.8
                    500    2.0258   2.0174       60.2
                   1000    2.0229   2.0149      117.3
           ANU       50    2.0847   2.0705        5.6
                     85    2.0447   2.0264        9.6
                    100    2.0391   2.0255       11.2
                    300    2.0273   2.0179       36.4
                    500    2.0258   2.0172       60.8
                   1000    2.0239   2.0156      118.9
CIFAR-10   DDN       50    0.3510   0.3290       15.0
                     80    0.3409   0.3233       24.0
                    100    0.3394   0.3209       30.5
                    300    0.3359   0.3187       95.3
                    500    0.3350   0.3176      157.6
                   1000    0.3340   0.3163      324.5
           ANU       50    0.3433   0.3257       15.1
                     80    0.3392   0.3214       24.7
                    100    0.3381   0.3209       31.3
                    300    0.3357   0.3179       95.6
                    500    0.3349   0.3170      158.5
                   1000    0.3344   0.3172      329.5
ImageNet   DDN       80    0.8194   0.7756      245.5¹
                     87    0.7986   0.7556      266.8
                    100    0.7737   0.7337      305.1
                    300    0.6677   0.6305      921.9
                    500    0.6446   0.6066     1532.2
                   1000    0.6234   0.5897     3059.5
           ANU       80    0.7881   0.7424      245.8
                     87    0.7718   0.7257      268.3
                    100    0.7545   0.7107      306.0
                    300    0.6651   0.6264      924.3
                    500    0.6439   0.6056     1537.1
                   1000    0.6229   0.5888     3071.1
We trained robust CIFAR-10 models with these two attacks. The model trained with ANU, with the number of iterations set to 80, has a test accuracy of 87.08%, with a training time of 36.1 hours. The CIFAR-10 model trained with DDN has an accuracy of 86.76%, with a training time of 44.2 hours. It can be concluded that the ANU attack-based robust models, using a smaller budget, achieve clean test accuracy comparable to the models trained with the DDN attack, with around 20% less training time.
We also ran multiple attacks against the three types of models on MNIST and CIFAR-10 respectively: the baseline (the base image classifier described at the beginning of this section), the DDN-attack based and
Figure 5: Robustness of the models on (a) MNIST and (b) CIFAR-10, as we increase the maximum allowed L2 norm for the attack.
the ANU-attack based models, with different values of the maximum allowed L2 norm for the perturbations. As shown in Figure 5, the accuracy of the three models decreases as the maximum allowed norm ε increases. However, the accuracy of the robustly trained models decreases much more slowly. It is worth noting that the models trained with ANU achieve robustness comparable to those trained with DDN, despite requiring a smaller budget and around 20% less training time.
Figure 6 shows some adversarial examples generated by the ANU attack with 1000 iterations against the different models described above. We chose this attack since it is stronger for evaluating model robustness than attacks with fewer iterations. It can be observed that the adversarial examples for the baseline model have the smallest L2 norm, while the largest perturbations are needed to successfully attack the model trained with ANU.
6 CONCLUSIONS AND FUTURE
WORK
In this paper we presented the Self-Adaptive Norm Update gradient-based L2 attack, which learns to
Figure 6: Adversarial examples against the three models: the baseline, DDN-based, and ANU-based defenses. The text on top of each image indicates the L2 norm of the noise ‖δ‖_2. The text on the bottom indicates the predicted class (for CIFAR-10, 0: airplane, 1: automobile, 2: bird, 5: dog, 8: ship, 9: truck).
scale the norm update factor in a self-adaptive way in each iteration. In comparison to the state-of-the-art Decoupled Direction and Norm L2 attack, our algorithm achieves comparable or even better performance with fewer iterations. The proposed attack can then be used to speed up the process of adversarial training, where the attack is used to generate adversarial examples close to the decision boundary. The experiments on the MNIST and CIFAR-10 datasets show that the robust models trained with our attack have robustness comparable to those based on DDN, while taking about 20% less training time.
Several techniques have been proposed in the literature (Shafahi et al., 2019; Wong et al., 2020; Zheng et al., 2020) to speed up adversarial training by alleviating the time overhead of attacks during training. These techniques are complementary to our work. In the future, we may explore combining our attack algorithm with them to further accelerate adversarial training based on L2 attacks. Conversely, we may study the problem of improving the accuracy of robust models on both clean and L2 norm-based adversarial examples, by differentiating the distributions of these two classes of images (Xie et al., 2020).
REFERENCES
Athalye, A., Carlini, N., and Wagner, D. (2018). Obfuscated
gradients give a false sense of security: Circumvent-
ing defenses to adversarial examples. In International
Conference on Machine Learning (ICML).
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., and Roli, F. (2013). Evasion attacks against machine learning at test time. In ECML-PKDD.
Carlini, N. and Wagner, D. (2017). Towards evaluating the
robustness of neural networks. In IEEE Symposium
on Security and Privacy (SP).
Chen, P.-Y., Sharma, Y., Zhang, H., Yi, J., and Hsieh, C.-J.
(2018). Ead: Elastic-net attacks to deep neural net-
works via adversarial examples. In AAAI Conference
on Artificial Intelligence.
Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., and Li,
J. (2018). Boosting adversarial attacks with momen-
tum. In IEEE Conference on Computer Vision and
Pattern Recognition.
Girshick, R. (2015). Fast r-cnn. In IEEE International Con-
ference on Computer Vision.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Ex-
plaining and harnessing adversarial examples. In In-
ternational Conference on Learning Representations.
Gu, S., Yi, P., Zhu, T., Yao, Y., and Wang, W. (2019). De-
tecting adversarial examples in deep neural networks
using normalizing filters. In International Confer-
ence on Agents and Artificial Intelligence - Volume 2:
ICAART.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In IEEE Confer-
ence on Computer Vision and Pattern Recognition.
Kurakin, A., Goodfellow, I., and Bengio, S. (2016). Ad-
versarial examples in the physical world. In arXiv
preprint arXiv:1607.02533.
Kurakin, A., Goodfellow, I., and Bengio, S. (2017). Ad-
versarial machine learning at scale. In International
Conference on Learning Representations (ICLR).
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision.
Machado, G., Goldschmidt, R., and Silva, E. (2019). Mul-
timagnet: A non-deterministic approach based on the
formation of ensembles for defending against adver-
sarial images. In International Conference on Enter-
prise Information Systems - Volume 1: ICEIS.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and
Vladu, A. (2018). Towards deep learning models re-
sistant to adversarial attacks. In International Confer-
ence on Learning Representations.
Modas, A., Moosavi-Dezfooli, S.-M., and Frossard, P.
(2019). Sparsefool: a few pixels make a big differ-
ence. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR).
Moosavi-Dezfooli, S.-M., Fawzi, A., and Frossard, P.
(2016). Deepfool: a simple and accurate method to
fool deep neural networks. In IEEE Conference on
Computer Vision and Pattern Recognition.
Papernot, N., Mcdaniel, P., Jha, S., Fredrikson, M., Celik,
Z. B., and Swami, A. (2016). The limitations of deep
learning in adversarial settings. In IEEE Symposium
on Security and Privacy.
Prakash, A., Moran, N., Garber, S., DiLillo, A., and Storer,
J. (2018). Protecting jpeg images against adversarial
attacks. In Data Compression Conference.
Redmon, J., Divvala, S. K., Girshick, R. B., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition.
Rony, J., Hafemann, L. G., Oliveira, L. S., Ayed, I. B.,
Sabourin, R., and Granger, E. (2019). Decoupling di-
rection and norm for efficient gradient-based l2 adver-
sarial attacks and defenses. In IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 4317–4325.
Shafahi, A., Najibi, M., Ghiasi, A., Xu, Z., Dickerson, J.,
Studer, C., Davis, L. S., Taylor, G., and Goldstein,
T. (2019). Adversarial training for free! In Neural
Information Processing Systems (NeurIPS).
Simonyan, K. and Zisserman, A. (2015). Very deep con-
volutional networks for large-scale image recognition.
In International Conference on Learning Representa-
tions.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going deeper with convolutions.
In IEEE Conference on Computer Vision and Pattern
Recognition.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna,
Z. (2016). Rethinking the inception architecture for
computer vision. In IEEE Conference on Computer
Vision and Pattern Recognition.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2013). Intriguing properties of neural networks. In International Conference on Learning Representations.
Uesato, J., O’Donoghue, B., van den Oord, A., and
Kohli, P. (2018). Adversarial risk and the dangers
of evaluating against weak attacks. In arXiv preprint
arXiv:1802.05666.
Wong, E., Rice, L., and Kolter, J. (2020). Fast is better than
free: Revisiting adversarial training. In International
Conference on Learning Representations.
Xie, C., Tan, M., Gong, B., Wang, J., Yuille, A. L., and Le,
Q. V. (2020). Adversarial examples improve image
recognition. In IEEE Conference on Computer Vision
and Pattern Recognition.
Zagoruyko, S. and Komodakis, N. (2016). Wide residual
networks. In Proceedings of the British Machine Vi-
sion Conference.
Zheng, H., Zhang, Z., Gu, J., Lee, H., and Prakash, A.
(2020). Efficient adversarial training with transferable
adversarial examples. In IEEE Conference on Com-
puter Vision and Pattern Recognition(CVPR).