Uncertainty-Based Detection of Adversarial Attacks in Semantic Segmentation

Kira Maag¹ and Asja Fischer²
¹Technical University of Berlin, Germany
²Ruhr University Bochum, Germany

Keywords: Deep Learning, Semantic Segmentation, Adversarial Attacks, Detection.
Abstract: State-of-the-art deep neural networks have proven to be highly powerful in a broad range of tasks, including semantic image segmentation. However, these networks are vulnerable to adversarial attacks, i.e., imperceptible perturbations added to the input image that cause incorrect predictions, which is hazardous in safety-critical applications like automated driving. Adversarial examples and defense strategies are well studied for the image classification task, while there has been limited research in the context of semantic segmentation. However, first works show that the segmentation outcome can be severely distorted by adversarial attacks. In this work, we introduce an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation. We observe that uncertainty, as for example captured by the entropy of the output distribution, behaves differently on clean and perturbed images and leverage this property to distinguish between the two cases. Our method works in a light-weight and post-processing manner, i.e., we do not modify the model or need knowledge of the process used for generating adversarial examples. In a thorough empirical analysis, we demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks.
1 INTRODUCTION
In recent years, deep neural networks (DNNs)
have demonstrated outstanding performance and have
proven to be highly expressive in a broad range of
tasks, including semantic image segmentation (Chen
et al., 2018; Pan et al., 2022). Semantic segmentation
aims at segmenting objects in an image by assigning
each pixel to a fixed and predefined set of semantic
classes, providing comprehensive and precise infor-
mation about the given scene. However, DNNs are
vulnerable to adversarial attacks (Bar et al., 2021), which is hazardous in safety-related applications like automated driving. Adversarial attacks are small perturbations added to the input image that cause the DNN to make incorrect predictions at test time. The perturbations are imperceptible to humans, making the detection of such examples very challenging, see for example Figure 1. This undesirable property of DNNs is a major security concern in real-world applications. Hence, developing efficient
strategies against adversarial attacks is of high impor-
tance. Such strategies can either increase the robust-
ness of DNNs making it more difficult to generate ad-
versarial examples (defense) or build on approaches
Figure 1: (a) Input image (perturbed half on the right-hand side), (b) semantic segmentation prediction, and (c) entropy heatmap for a clean (left) and perturbed image (right) generated by a dynamic target attack for hiding pedestrians.
to detect adversarial attacks (detection).
Adversarial attacks have attracted much attention,
and numerous attacks as well as detection strategies
have been proposed (Khamaiseh et al., 2022). How-
ever, adversarial examples have not been analyzed ex-
tensively beyond standard image classification mod-
els, often using small datasets such as MNIST (LeCun
and Cortes, 2010) or CIFAR10 (Krizhevsky, 2009).
The vulnerability of modern DNNs to attacks in more
complex tasks like semantic segmentation in the con-
text of real datasets from different domains has been
rather poorly explored. The attacks which have been
tested so far on semantic segmentation networks can
be divided roughly into three categories. The first ap-
proaches transfer common attacks from image clas-
sification to semantic segmentation, i.e., pixel-wise
classification (Agnihotri and Keuper, 2023; Gu et al.,
2022; Rony et al., 2022). Other works (Cisse et al.,
2017; Xie et al., 2017) have presented attacks specifi-
cally designed for the task of semantic segmentation,
either attacking the input in a way that leads to the
prediction of some predefined and image-unrelated
segmentation mask or the complete deletion of one
segmentation class (Metzen et al., 2017). Such at-
tacks are more difficult to detect than attacks that per-
turb each pixel independently. The third group of at-
tacks creates rectangular patches smaller than the in-
put size, which leads to an erroneous prediction of the
entire image (Nakka and Salzmann, 2020; Nesti et al.,
2022). Defense methods aim to be robust against
such attacks, i.e., to achieve high prediction accuracy
even on perturbed images. For semantic segmenta-
tion tasks, defense approaches are often robust against
only one type of attack (Arnab et al., 2018; Klingner
et al., 2020; Yatsura et al., 2022). In contrast, de-
tection methods aim at classifying an input as clean
or perturbed based on the output of the segmentation
model (Xiao et al., 2018).
In this paper, we present an uncertainty-based ap-
proach for detecting several kinds of adversarial at-
tacks on semantic image segmentation models. Un-
certainty information has already been exploited for
the detection of adversarial attacks on classification
DNNs, but has not been investigated in the context
of segmentation so far. In (Feinman et al., 2017), an approximation to Bayesian inference (Monte-Carlo Dropout), which is widely employed to estimate model uncertainty, is proposed for the detection of adversarial attacks. The gradient-based approach intro-
duced in (Michel and Ewetz, 2022) generates salient
features which are used to train a detector. While both
methods require access to the inside of the model, our
approach can be applied as a post-processing step us-
ing only information of the network output. We con-
struct features per image based on uncertainty infor-
mation provided by the DNN such as the entropy of
the output distribution. In Figure 1 (c), the entropy
heatmaps for a clean (left) and a perturbed image
(right) are shown, indicating high uncertainties in the
attacked regions, motivating the use of uncertainty in-
formation to separate clean and perturbed images. On
the one hand, these features for clean and perturbed
inputs are fed into a one-class support vector machine that performs unsupervised novelty detection
(Weerasinghe et al., 2018). On the other hand, we
train a logistic regression model with clean and per-
turbed data for classification. The perturbed data used
during training is generated by only one kind of adver-
sarial attack method, while the detector is applied to
identify adversarial examples of other methods. Our
approach neither modifies the semantic segmentation
model nor requires knowledge of the process for gen-
erating adversarial examples. We only assume that
our post-processing model is kept private while the
attacker may have full access to the semantic segmen-
tation model.
In our tests, we employ state-of-the-art seman-
tic segmentation networks (Chen et al., 2018; Pan
et al., 2022) applied to the Cityscapes (Cordts et al.,
2016) as well as Pascal VOC2012 dataset (Evering-
ham et al., 2012) demonstrating our adversarial at-
tack detection performance. To this end, we con-
sider different types of attackers, from pixel-level
attacks designed for image classification (Goodfel-
low et al., 2015; Kurakin et al., 2017) to pixel-
wise attacks developed for semantic segmentation
(Metzen et al., 2017) and patch-based ones (Nesti
et al., 2022). The source code of our method
is publicly available at https://github.com/kmaag/
Adversarial-Attack-Detection-Uncertainty. Our con-
tributions are summarized as follows:
• We introduce a new uncertainty-based approach for the detection of adversarial attacks for the semantic image segmentation task. In a thorough empirical analysis, we demonstrate the capability of uncertainty measures to distinguish between clean and perturbed images. Our approach serves as a light-weight post-processing step, i.e., we do not modify the model or need knowledge of the process for generating adversarial examples.
• For the first time, we present a detection method that was not designed for a specific adversarial attack, but rather has a high detection capability across multiple types. We achieve averaged detection accuracy values of up to 100% for different network architectures and datasets.
2 RELATED WORK
In this section, we discuss the related works on de-
fense and detection methods for the semantic seg-
mentation task. Defense methods aim to achieve high
prediction accuracy even on perturbed images, while
detection methods classify the model input as clean
or attacked image. A dynamic divide-and-conquer
strategy (Xu et al., 2021) and multi-task training
(Klingner et al., 2020), which extends supervised se-
mantic segmentation by a self-supervised monocular
depth estimation using unlabeled videos, are consid-
ered as adversarial training approaches enhancing the
robustness of the networks. Another kind of defense
strategy is input denoising to remove the perturba-
tion from the input without the necessity to re-train
the model. In (Bar et al., 2021) image quilting and
the non-local means algorithm are presented as input
transformation techniques. To denoise the perturba-
tion and restore the original image, a denoise autoen-
coder is used in (Cho et al., 2020). The demasked
smoothing technique, introduced in (Yatsura et al.,
2022), reconstructs masked regions of each image
based on the available information with an inpaint-
ing model defending against patch attacks. Another
possibility to increase the robustness of the model is
during inference. In (Arnab et al., 2018) it is shown how mean-field inference and multi-scale processing naturally form an adversarial defense. The non-local context encoder proposed in (He et al., 2019) models spatial dependencies and encodes global contexts to strengthen feature activations. Multi-scale information from all pyramid features is fused to refine the prediction and produce the segmentation. The presented
works up to now are defense methods improving the
robustness of the model. To the best of our knowl-
edge so far only one work focuses on detecting adver-
sarial attacks on segmentation models, i.e., the patch-
wise spatial consistency check which is introduced in
(Xiao et al., 2018).
The described defense approaches are created for and tested only on a specific type of attack. The problem is that high model robustness is assumed, yet the defense method may perform poorly on new, unseen attacks without giving any indication of this insecurity. Therefore, we present an uncertainty-
based detection approach which shows strong results
over several types of adversarial attacks. The pre-
sented detection approach (Xiao et al., 2018) is only
tested on perturbed images attacked in such a way
that a selected image is predicted. The spatial consis-
tency check randomly selects overlapping patches to
obtain pixel-wise confidence vectors. In contrast, we
use only information of the network output from one
inference and not from (computationally expensive)
multiple runs of the network. For these reasons, the
detection approach introduced in (Xiao et al., 2018)
cannot be considered as a suitable baseline.
Post-processing classification models as well as
simple output-based methods are used for false
positive detection (Maag et al., 2020; Maag and
Rottmann, 2023) and out-of-distribution segmenta-
tion (Maag et al., 2022; Hendrycks and Gimpel,
2016), but have not been investigated for adversarial
examples.
3 ADVERSARIAL ATTACKS
For the generation of adversarial examples, we dis-
tinguish between white and black box attacks. White
box attacks are created based on information of the
victim model, i.e., the adversarial attacker has access
to the full model, including its parameters, and knows
the loss function used for training. In contrast, black
box attackers have zero knowledge about the victim
model. The idea behind this type of attack is trans-
ferability, i.e., an adversarial example generated from
another model works well with the victim one. The
attacks described in the following belong to the white
box setting and were proposed to attack semantic seg-
mentation models.
Attacks on Pixel-Wise Classification. The attacks
described in this paragraph were originally developed
for image classification and were (in a modified ver-
sion) applied to semantic segmentation. For semantic
segmentation, given an image x, a neural network with parameters w provides pixel-wise probability distributions f(x;w)_ij over a label space C = {y_1, ..., y_c}
per spatial dimension (i, j). The single-step untargeted fast gradient sign method (FGSM, (Goodfellow et al., 2015)) creates adversarial examples by adding perturbations to the pixels of an image x with (one-hot encoded) label y that leads to an increase of the loss L (here cross entropy), that is
$$x^{adv}_{ij} = x_{ij} + \varepsilon \cdot \mathrm{sign}\big(\nabla_x L_{ij}(f(x;w)_{ij}, y_{ij})\big), \qquad (1)$$
where ε is the magnitude of perturbation. The single-step targeted attack with target label y^ll instead decreases the loss for the target label and is given by
$$x^{adv}_{ij} = x_{ij} - \varepsilon \cdot \mathrm{sign}\big(\nabla_x L_{ij}(f(x;w)_{ij}, y^{ll}_{ij})\big). \qquad (2)$$
Following the convention, the least likely class pre-
dicted by the model is chosen as target class. This
attack is extended in (Kurakin et al., 2017) in an it-
erative manner (I-FGSM) to increase the perturbation
strength:
$$x^{adv}_{ij,t+1} = \mathrm{clip}_{x,\varepsilon}\Big( x^{adv}_{ij,t} + \alpha \cdot \mathrm{sign}\big(\nabla_{x^{adv}_t} L_{ij}(f(x^{adv}_t;w)_{ij}, y_{ij})\big) \Big) \qquad (3)$$
with x^adv_0 = x, step size α, and a clip function ensuring that x^adv_t ∈ [x − ε, x + ε]. The targeted case (see eq. (2))
can be formulated analogously. Based on these at-
tacks, further methods for pixel-wise perturbations in
the classification context have been proposed such as
projected gradient descent (Madry et al., 2018; Bry-
niarski et al., 2022) and DeepFool (Moosavi-Dezfooli
et al., 2016). Some of these approaches have been fur-
ther developed and adapted to semantic segmentation
(Agnihotri and Keuper, 2023; Gu et al., 2022; Rony
et al., 2022).
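To make eqs. (1)-(3) concrete, the following is a minimal PyTorch sketch of the untargeted and targeted (I-)FGSM updates for a segmentation model. It assumes a model returning logits of shape (N, C, H, W) and pixel-wise labels of shape (N, H, W); the function and variable names are illustrative and not taken from our released code.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps, targeted=False):
    """Single-step FGSM (eqs. (1)/(2)): one signed-gradient step on the
    pixel-wise cross entropy, averaged over all pixels as in eq. (4)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    step = eps * x_adv.grad.sign()
    # untargeted: increase the loss w.r.t. the label y;
    # targeted (y = least likely class): decrease the loss instead
    return (x_adv - step if targeted else x_adv + step).detach()

def i_fgsm_attack(model, x, y, eps, alpha, n_iter, targeted=False):
    """Iterative FGSM (eq. (3)) with clipping to the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        step = alpha * x_adv.grad.sign()
        x_adv = (x_adv - step if targeted else x_adv + step).detach()
        # clip_{x,eps}: keep the perturbed image within [x - eps, x + eps]
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
    return x_adv
```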
Stationary Segmentation Mask Attacks. Another type of attack are the so-called stationary segmentation mask methods (SSMM), where the pixels of a whole image are iteratively attacked until most of the pixels have been mis-classified into the target class (Cisse et al., 2017; Xie et al., 2017). For each spatial dimension (i, j) ∈ I, the loss function per image x is given by
$$L(f(x;w),y) = \frac{1}{|I|} \sum_{(i,j)\in I} L_{ij}(f(x;w)_{ij}, y_{ij}). \qquad (4)$$
In (Metzen et al., 2017), the universal perturbation is introduced to achieve real-time performance for the attack at test time. To this end, training inputs D_train = {x^(k), y^(k),target}_{k=1}^m are generated, where y^(k),target defines a fixed target segmentation. The universal noise in iteration t + 1 is computed by
$$\xi_{t+1} = \mathrm{clip}_{\varepsilon}\Big( \xi_t - \alpha \cdot \mathrm{sign}\Big( \frac{1}{m} \sum_{k=1}^{m} \nabla_x L\big(f(x^{(k)} + \xi_t;w), y^{(k),target}\big) \Big) \Big) \qquad (5)$$
with ξ_0 = 0. The losses of pixels which are predicted as belonging to the desired target class with a confidence above a threshold τ are set to 0. At test time, this noise is added to the input image and does not require multiple calculations of the backward pass.
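As an illustration of the universal-noise update in eq. (5), the sketch below accumulates the signed gradient over the training inputs and masks out pixels that already reach the target class with confidence above τ. It assumes a PyTorch segmentation model with logits of shape (1, C, H, W); the helper names are ours and not the reference implementation of (Metzen et al., 2017).

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, images, targets, eps, alpha, n_iter, tau=0.75):
    """Universal noise xi of eq. (5): signed-gradient descent on the target loss,
    averaged over the training inputs and clipped to the eps-ball."""
    xi = torch.zeros_like(images[0])
    for _ in range(n_iter):
        grad_sum = torch.zeros_like(xi)
        for x, y_target in zip(images, targets):
            x_pert = (x + xi).unsqueeze(0).detach().requires_grad_(True)
            logits = model(x_pert)
            probs = logits.softmax(dim=1)
            # drop the loss of pixels already predicted as the target class
            # with confidence above the threshold tau
            conf = probs.gather(1, y_target.view(1, 1, *y_target.shape)).squeeze(1)
            keep = ~((logits.argmax(dim=1) == y_target) & (conf > tau))
            loss_map = F.cross_entropy(logits, y_target.unsqueeze(0), reduction="none")
            (loss_map * keep).mean().backward()
            grad_sum += x_pert.grad.squeeze(0)
        xi = torch.clamp(xi - alpha * (grad_sum / len(images)).sign(), -eps, eps)
    return xi
```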
The dynamic nearest neighbor method (DNNM) presented in (Metzen et al., 2017) aims to keep the network's segmentation unchanged but to remove a desired target class. Let o be the object class being deleted and ŷ(x)_ij = argmax_{y∈C} f(x;w)^y_ij the predicted class, where f(x;w)^y_ij denotes the probability the model assigns for the pixel at position (i, j) to belong to class y. Then I_o = {(i, j) | ŷ(x)_ij = o} and I_ō = I \ I_o. The target label is chosen by y^target_ij = ŷ(x)_i′j′ with (i′, j′) = argmin_{(i′,j′)∈I_ō} (i′ − i)² + (j′ − j)² for all (i, j) ∈ I_o, and y^target_ij = ŷ(x)_ij for all (i, j) ∈ I_ō. Since the loss function described in eq. (4) weights all pixels equally, although both objectives, i.e., hiding an object class and being unobtrusive, are not necessarily equally important, a modified version of the loss function with weighting parameter ω is given by
$$L_\omega(f(x;w),y) = \frac{1}{|I|}\Big( \omega \sum_{(i,j)\in I_o} L_{ij}(f(x;w)_{ij}, y^{target}_{ij}) + (1-\omega) \sum_{(i,j)\in I_{\bar o}} L_{ij}(f(x;w)_{ij}, y^{target}_{ij}) \Big). \qquad (6)$$
Note, the universal perturbation can also be computed
for the DNNM.
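The nearest-neighbor target labels of the DNNM can be built efficiently with a Euclidean distance transform; the sketch below is our illustration of this construction (using scipy) and not the original implementation of (Metzen et al., 2017).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dnnm_target_labels(pred, o):
    """Build the DNNM target segmentation: every pixel predicted as class `o`
    receives the label of the spatially nearest pixel not predicted as `o`,
    all remaining pixels keep their prediction.
    pred: (H, W) array of predicted class indices."""
    is_o = pred == o
    # for each pixel, indices of the nearest pixel where is_o is False (zero)
    _, (ii, jj) = distance_transform_edt(is_o, return_indices=True)
    return pred[ii, jj]
```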
Patch-Based Attacks. The idea behind patch at-
tacks is that perturbing a small region of the im-
age causes prediction errors in a much larger re-
gion (Nakka and Salzmann, 2020). In (Nesti et al.,
2022), the expectation over transformation (EOT)-
based patch attack is introduced to create robust ad-
versarial examples, i.e., individual adversarial exam-
ples that are at the same time adversarial over a
range of transformations. Transformations occurring
in the real world are for instance angle and viewpoint
changes. These perturbations are modeled within the
optimization procedure and an extension of the pixel-
wise cross entropy loss is additionally presented in
(Nesti et al., 2022) to enable crafting strong patches
for the semantic segmentation setting.
4 DETECTION METHOD
Our method does not alter the semantic segmenta-
tion model, nor does it require knowledge of the ad-
versarial example generation process. While the at-
tacker may have full access to the semantic segmenta-
tion model, we only assume that our post-processing
model is kept secret or not attacked. Our approach
can be applied to any semantic segmentation network
serving as a post-processing step using only informa-
tion of the network output. In Figure 2 an overview
of our approach is given.
The degree of uncertainty in a semantic segmenta-
tion prediction is quantified by pixel-wise dispersion
measures like the entropy
$$E(x)_{ij} = -\sum_{y\in C} f(x;w)^y_{ij} \log f(x;w)^y_{ij}, \qquad (7)$$
the variation ratio
$$V(x)_{ij} = 1 - \max_{y\in C} f(x;w)^y_{ij}, \qquad (8)$$
Figure 2: Schematic illustration of our detection method. The adversarial attacker can have full access to the semantic segmentation model. Information from the network output is extracted to construct the features which serve as input to the detector model classifying between clean and perturbed images.
or the probability margin
$$M(x)_{ij} = V(x)_{ij} + \max_{y\in C\setminus\{\hat y(x)_{ij}\}} f(x;w)^y_{ij}. \qquad (9)$$
The entropy heatmaps for a clean (left) and a per-
turbed image (right) are shown in Figure 1 (c) in-
dicating that higher uncertainties occur in the at-
tacked regions which motivates the use of uncer-
tainty information to separate clean and perturbed
data. To obtain uncertainty features per image from these pixel-wise dispersion measures, we aggregate them over the whole image by calculating the averages D̄ = 1/|I| Σ_{(i,j)∈I} D(x)_ij, where D ∈ {E, V, M}. Moreover, we obtain mean class probabilities for each class y ∈ C:
$$\bar P(y|x) = \frac{1}{|I|} \sum_{(i,j)\in I} f(x;w)^y_{ij}. \qquad (10)$$
The concatenation of these |C| + 3 features forms the feature vector used in the following. We compute
these image-wise features for a set of benign (and
adversarially changed) images, which are then used
to train classification models providing per image a
probability p(x) of being clean (and not perturbed).
We classify x as perturbed if p(x) < κ and as clean if
p(x) κ, where κ is a predefined detection threshold.
We explore different ways to construct such a classi-
fier.
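The following sketch illustrates how the |C| + 3 image-wise features could be assembled from a softmax output (eqs. (7)-(10)); array shapes and names are our own assumptions, not a fixed API.

```python
import numpy as np

def image_features(probs):
    """Aggregate pixel-wise dispersion measures (eqs. (7)-(9)) and mean class
    probabilities (eq. (10)) into one feature vector per image.
    probs: softmax output of shape (C, H, W)."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=0)      # E(x)_ij
    sorted_p = np.sort(probs, axis=0)[::-1]                   # per-pixel probs, descending
    variation_ratio = 1.0 - sorted_p[0]                       # V(x)_ij
    prob_margin = variation_ratio + sorted_p[1]               # M(x)_ij
    mean_class_probs = probs.mean(axis=(1, 2))                # mean P(y|x) per class
    return np.concatenate(
        [[entropy.mean(), variation_ratio.mean(), prob_margin.mean()], mean_class_probs]
    )
```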
First, we consider two basic outlier detection techniques which only require benign data, i.e., a one-class support vector machine (OCSVM, (Schölkopf et al., 1999)) and an approach for detecting outliers in a Gaussian-distributed dataset by learning an ellipse (Rousseeuw and Driessen, 1999). Second, we
consider the supervised logistic regression (LASSO,
(Tibshirani, 1996)) as classification model trained on
the features extracted for clean and perturbed images.
Importantly, we do not require knowledge of the ad-
versarial example generation process used by the at-
tacker, instead we use attacked data generated by any
(other) adversarial attack (cross-attack). While the
OCSVM and the ellipse approach are unsupervised
and outlier detection is a difficult task, the super-
vised cross-attack method has the advantage of hav-
ing already seen other types of perturbed data. Third,
we threshold only on the mean entropy Ē (which requires only choosing the threshold value), yielding a very basic uncertainty-based detector. Note, applying
our detection method is light-weight, i.e., the feature
computation is inexpensive and classification models
are trained in advance so that only one inference run
is added after semantic segmentation inference.
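A sketch of the three detector variants on top of these features, using scikit-learn; the hyperparameters shown are illustrative placeholders, not the values used in our experiments.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.covariance import EllipticEnvelope
from sklearn.linear_model import LogisticRegression

def fit_detectors(clean_feats, perturbed_feats):
    """Fit OCSVM and Ellipse on clean features only, and the supervised
    cross-attack logistic regression (L1-penalized, i.e. LASSO-style) on clean
    features plus features of images perturbed by a single (other) attack."""
    ocsvm = OneClassSVM(nu=0.1).fit(clean_feats)
    ellipse = EllipticEnvelope(contamination=0.05).fit(clean_feats)
    X = np.concatenate([clean_feats, perturbed_feats])
    y = np.concatenate([np.ones(len(clean_feats)), np.zeros(len(perturbed_feats))])
    crossa = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
    return ocsvm, ellipse, crossa

def is_perturbed(crossa, feats, kappa=0.5):
    """p(x) is the predicted probability of being clean; perturbed if p(x) < kappa."""
    p_clean = crossa.predict_proba(feats)[:, 1]
    return p_clean < kappa
```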
5 EXPERIMENTS
First, we present the experimental setting and then
evaluate our adversarial detection approach.
5.1 Experimental Setting
Datasets. We perform our tests on the Cityscapes (Cordts et al., 2016) dataset for semantic segmentation of street scenes and on the Pascal VOC2012 (Ever-
ingham et al., 2012) (shorthand VOC) dataset of vi-
sual object classes in realistic scenes. The Cityscapes
dataset consists of 2,975 training and 500 validation
images of dense urban traffic in 18 and 3 different
German towns, respectively. The VOC dataset con-
tains 1,464 training and 1,449 validation images with
annotations for the various objects of categories per-
son, animal, vehicle and indoor.
Segmentation Networks. We consider the state-
of-the-art DeepLabv3+ network (Chen et al., 2018)
with Xception65 backbone (Chollet, 2017). Trained
on Cityscapes, we achieve a mean intersection over
union (mIoU) value of 78.93 on the validation set
and trained on VOC, a validation mIoU value of
88.39. Moreover, we use the BiSeNet (Yu et al., 2018)
trained on Cityscapes obtaining a validation mIoU of
70.32. We consider also two real-time models for the
Cityscapes dataset, the DDRNet (Pan et al., 2022)
achieving 77.8 mIoU and the HRNet (Wang et al.,
2021) with 70.5 mIoU.
Figure 3: Input image (a) with corresponding ground truth (e). Semantic segmentation prediction for the clean image (b) and for perturbed images generated by an untargeted I-FGSM_4 attack (c) and a targeted I-FGSM^ll_4 attack (d) as well as by SSMM (f), DNNM (g) and the patch attack (h).
Adversarial Attacks. In many defense methods in
semantic segmentation, the adapted FGSM and I-
FGSM attack are employed (Arnab et al., 2018; Bar
et al., 2021; He et al., 2019; Klingner et al., 2020;
Xu et al., 2021). Thus, we study both attacks in
our experiments with the parameter setting presented
in (Kurakin et al., 2017). The magnitude of pertur-
bation is given by ε ∈ {4, 8, 16}, the step size by α = 1, and the number of iterations is computed as n = min{ε + 4, ⌊1.25ε⌋}. We denote the attack by FGSM^#_ε and the iterative one by I-FGSM^#_ε, # ∈ { , ll}, where the superscript discriminates between untargeted and targeted (here ll refers to "least likely"). For
the re-implementation of SSMM and DNNM (Met-
zen et al., 2017), we use the parameters ε = 0.1 · 255,
α = 0.01 · 255, n = 60 and τ = 0.75. For SSMM, the
target image is chosen randomly for both datasets and
for DNNM, the class person is to be deleted for the
Cityscapes dataset. For the VOC dataset, the DNNM
attack is not meaningful, since the input images often contain only one object or several objects of the same class. For our experiments, we use a model zoo (https://github.com/LikeLy-Journey/SegmenTron) to which we add the implementations of the adversarial attacks FGSM, I-FGSM, SSMM and DNNM.
As we also use the pre-trained networks provided in
the repository, we run experiments for the Cityscapes
dataset on both models, DeepLabv3+ and HRNet, and
for VOC on the DeepLabv3+ network.
For the patch attack introduced in (Nesti et al.,
2022), we use the provided code with default param-
eters and consider two different segmentation mod-
els, BiSeNet and DDRNet, applied to the one tested
real world dataset (Cityscapes). Since we use the
cross-attack procedure (logistic regression) as detec-
tion model, i.e., we train the classifier on clean and
perturbed data attacked by an attack other than the
patch, we use the data obtained from the DeepLabv3+
to train the detector and test on the DDRNet. For
the HRNet and the BiSeNet we proceed analogously,
since in each case the prediction performance (in
terms of mIoU) of both networks is similar.
As the Cityscapes dataset provides high-
resolution images (1024 × 2048 pixels) which require
a great amount of memory to run a full backward
pass for the computation of adversarial samples, we
re-scale the images of this dataset to 512 × 1024 when evaluating. In Figure 3, a selection of these
attacks is shown for the Cityscapes dataset and the
DeepLabv3+ network (or DDRNet for the patch
attack).
Evaluation Metrics. Our detection models provide per image a probability p(x) of being clean (and not perturbed). The image is then classified as attacked if this probability falls below a threshold κ, for which we tested 40 different values equally spaced in [0, 1]. The first evaluation metric we use is the averaged detection accuracy (ADA), which is defined as the proportion of images that are classified correctly. As this metric depends on a threshold κ, we report the optimal ADA score obtained by ADA∗ = max_{κ∈[0,1]} ADA(κ). Secondly, we compute the area under the receiver operating characteristic curve (AUROC) to obtain a threshold-independent metric. Lastly, we consider the true positive rate while fixing the false positive rate on clean images to 5% (TPR_5%).
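For reference, a sketch of how these three metrics can be computed from the per-image clean probabilities p(x) and ground-truth clean/perturbed labels; it relies on scikit-learn for the ROC quantities and the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def detection_metrics(p_clean, is_clean):
    """p_clean: p(x) per image; is_clean: 1 for clean, 0 for perturbed images."""
    is_clean = np.asarray(is_clean, dtype=float)
    # ADA*: best detection accuracy over 40 equally spaced thresholds kappa in [0, 1]
    thresholds = np.linspace(0.0, 1.0, 40)
    ada_star = max(np.mean((p_clean >= k).astype(float) == is_clean) for k in thresholds)
    # AUROC for detecting perturbed images (score: 1 - p(x), positive class: perturbed)
    auroc = roc_auc_score(1 - is_clean, 1 - p_clean)
    # TPR_5%: true positive rate at a 5% false positive rate on clean images
    fpr, tpr, _ = roc_curve(1 - is_clean, 1 - p_clean)
    tpr_5 = np.interp(0.05, fpr, tpr)
    return ada_star, auroc, tpr_5
```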
5.2 Numerical Results
In the following, we study the attack success perfor-
mance and evaluate our adversarial attack detection
performance.
Table 1: APSR results for the semantic segmentation predictions on clean data.

             DeepLabv3+  DDRNet  HRNet  BiSeNet
  Cityscapes       6.84    4.00   5.48     5.26
  VOC              2.92       -      -        -

Figure 4: APSR results for the Cityscapes dataset and both networks perturbed by different attacks.
Attack Performance. In order to assess the performance, i.e., the strength, of the attack-generating methods, we consider the attack pixel success rate (APSR) (Rony et al., 2022) defined by
$$\mathrm{APSR} = \frac{1}{|I|} \sum_{(i,j)\in I} \mathbb{1}\Big\{ \arg\max_{y\in C} f(x^{adv};w)^y_{ij} \neq y^{GT}_{ij} \Big\}, \qquad (11)$$
where y^GT_ij denotes the true class of the pixel at location (i, j). Note, this metric is the opposite of the accuracy, as it focuses on falsely (and not correctly) predicted pixels. If we replace x^adv in eq. (11) by the input image x, we obtain a measure of how well the semantic segmentation model performs on clean data.
The values for this measure for the different networks
and both datasets are given in Table 1. As expected,
we observe small APSR scores: all values are below
6.84%.
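A sketch of eq. (11) in code, given the predicted and ground-truth label maps of a single image; the handling of an ignore label is our assumption for Cityscapes-style annotations.

```python
import numpy as np

def apsr(pred, gt, ignore_label=255):
    """Attack pixel success rate (eq. (11)): fraction of (valid) pixels whose
    predicted class differs from the ground truth. pred, gt: (H, W) arrays."""
    valid = gt != ignore_label
    return float(np.mean(pred[valid] != gt[valid]))
```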
The APSR results for various attacks on the
Cityscapes dataset are shown in Figure 4. For all vari-
ations of the FGSM attack (untargeted vs. targeted,
non-iterative vs. iterative) the APSR increases with
larger magnitude of perturbation strength. Moreover,
targeted attacks lead to larger APSR values than their
untargeted counterpart. The I-FGSM outperforms the
FGSM due to the iterative procedure of individual
perturbation steps. For the SSMM attack, the target
is a randomly chosen image from the dataset. The ex-
amples shown in Figure 3 (a) and (f) indicate that the correct and target classes of the clean and the perturbed image coincide in several areas, such as the street or the sky, reflecting the nature of street scenes. This may explain the relatively low APSR values of around 50%. For the DNNM attack, the APSR
Table 2: APSR results for the VOC dataset and the DeepLabv3+ network perturbed by different attacks.

           FGSM    FGSM^ll    I-FGSM    I-FGSM^ll    SSMM
  ε = 4   17.81      19.68     56.73        64.30   62.99
  ε = 8   17.93      19.66     74.94        82.51       -
  ε = 16  16.67      17.65     80.98        91.68       -

Figure 5: Detection performance results for the VOC dataset and the DeepLabv3+ network.
scores are comparatively small since most parts of
the images are not perturbed but only one class is
to be deleted. We observe that the performance of
the patch attack has more or less impact depending
on the model. Comparing the two models, we find
that DeepLabv3+ is more robust against adversarial
attacks, as the APSR values are mostly smaller than
those of the HRNet network.
The results for the VOC dataset given in Ta-
ble 2 are qualitatively similar to the findings for the
Cityscapes dataset. However, the outcome for the
(targeted as well as untargeted) non-iterative FGSM
attack differs, i.e., the APSR scores are not increas-
ing with higher noise but stay at similar values. In
summary, for both datasets and the different network
architectures, most attacks achieve high APSR values
and greatly alter the prediction. Thus, the detection of
such attacks is extremely important.
Evaluation of Our Detection Method. The de-
fense approaches described above are created for and
tested only on a specific type of attack. The sole pre-
sented detection approach (Xiao et al., 2018) is only
tested on stationary segmentation mask methods and
is computationally expensive due to the requirement
of multiple runs of the network. For this reason, nei-
ther this detection approach nor the defense methods
can be considered as suitable baselines.
Figure 6: Detection performance results for DeepLabv3+ (left) and the HRNet (right) trained on the Cityscapes dataset.
In the following, we denote the single-feature, mean-entropy-based classification by Entropy, the or-
dinary one-class support vector machine by OCSVM,
the outlier detection method proposed in (Rousseeuw
and Driessen, 1999) by Ellipse and the logistic re-
gression model by CrossA. For training the regression
model, we chose data resulting from the iterative tar-
geted FGSM attack with a magnitude of noise of 2
as perturbed data (assuming that it might be advan-
tageous to have malicious data stemming from an at-
tack method with small perturbation strength, which
makes the attack harder to detect). The resulting clas-
sifier is then evaluated against all other attacks. Note,
the training of the detection models is light-weight
and no knowledge of the process for generating ad-
versarial examples is needed. For evaluating our de-
tection models, we use cross-validation with 5 runs.
The detection results for the VOC dataset are
shown in Figure 5 and for the Cityscapes dataset in
Figure 6. We observe comparatively lower detection
performance results for smaller perturbation magni-
tudes for the untargeted FGSM attacks which may
be explained by the fact that weaker attacks lead to a
change of prediction for a lower number of pixels and
are thus more difficult to detect. Targeted attacks are
better detected than untargeted attacks (with ADA∗ values over 80% for all models). This could be due
to the procedure of picking the most unlikely class as
target which results in larger changes of the uncer-
tainty measures used as features. The detectors per-
form not that well on the adversarial examples result-
ing from the untargeted I-FGSM despite the strength
of the attack. An inspection of these examples shows
that during segmentation only a few classes are pre-
dicted (see Figure 3 (c)) with often low uncertainty for
large connected components which complicates the
distinction between clean and perturbed data. Inter-
esting are the high detection results of up to 96.89% ADA∗ values (obtained by CrossA for the Cityscapes dataset and the DeepLabv3+ network) for the DNNM attack, as the perturbation targets only a few pixels (APSR values around 15%) and is therefore difficult
to detect. For the patch attack, it is noticeable that
the detection performance for the DeepLabv3+ net-
work on the Cityscapes dataset is low compared to the
HRNet which is explained by the higher disturbance
power of this attack on the HRNet, reflected in 36.11
percentage points higher APSR values. In general,
the detection capability for attacks on the HRNet is
stronger than for the DeepLabv3+, since the HRNet is
more easily to attack (see higher APSR values in Fig-
ure 4). Generally, our experiments highlight the high
potential of investigating uncertainty information for
successfully detecting adversarial segmentation at-
tacks. Already the basic method Entropy leads to
high accuracies often outperforming OCSVM and El-
lipse. However, across different attacks and datasets,
CrossA achieves ADA∗ values of up to 100%. Thus,
our light-weight and uncertainty-based detection ap-
proach should be considered as baseline for future
methods.
6 CONCLUSION
In this work, we introduced a new uncertainty-based
approach for the detection of adversarial attacks on
semantic image segmentation tasks. We observed
that uncertainty information as given by the entropy
behaves differently on clean and perturbed images
and used this property to distinguish between the two
cases with very basic classification models. Our ap-
proach works in a light-weight and post-processing
manner, i.e., we do not modify the model nor need
knowledge of the process used by the attacker for gen-
erating adversarial examples. We achieve averaged
detection accuracy values of up to 100% for differ-
ent network architectures and datasets. Moreover, it has to be pointed out that our proposed detection approach is the first that was not designed for a specific adversarial attack, but has a high detection capability across multiple types. Given the high detection accuracy and the simplicity of the proposed approach, we are convinced that it should serve as a simple baseline for more elaborate but computationally more expensive approaches developed in the future.
ACKNOWLEDGEMENTS
This work is supported by the Ministry of Culture
and Science of the German state of North Rhine-
Westphalia as part of the KI-Starter research fund-
ing program and by the Deutsche Forschungsgemein-
schaft (DFG, German Research Foundation) under
Germany’s Excellence Strategy EXC-2092 CASA
– 390781972.
REFERENCES
Agnihotri, S. and Keuper, M. (2023). Cospgd: a unified
white-box adversarial attack for pixel-wise prediction
tasks. 2, 4
Arnab, A., Miksik, O., and Torr, P. (2018). On the robust-
ness of semantic segmentation models to adversarial
attacks. In IEEE/CVF Conference on Computer Vi-
sion and Pattern Recognition (CVPR). 2, 3, 6
Bar, A., Lohdefink, J., Kapoor, N., Varghese, S., Huger, F.,
Schlicht, P., and Fingscheidt, T. (2021). The vulner-
ability of semantic segmentation networks to adver-
sarial attacks in autonomous driving: Enhancing ex-
tensive environment sensing. IEEE Signal Processing
Magazine. 1, 3, 6
Bryniarski, O., Hingun, N., Pachuca, P., Wang, V., and Car-
lini, N. (2022). Evading adversarial example detection
defenses with orthogonal projected gradient descent.
In International Conference on Learning Representa-
tions (ICLR). 4
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and
Adam, H. (2018). Encoder-decoder with atrous sep-
arable convolution for semantic image segmentation.
In European Conference on Computer Vision (ECCV).
1, 2, 5
Cho, S., Jun, T. J., Oh, B., and Kim, D. (2020). Dapas
: Denoising autoencoder to prevent adversarial attack
in semantic segmentation. In International Joint Con-
ference on Neural Network (IJCNN). 3
Chollet, F. (2017). Xception: Deep learning with depthwise
separable convolutions. IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR). 5
Cisse, M., Adi, Y., Neverova, N., and Keshet, J. (2017).
Houdini: Fooling deep structured prediction models.
In Conference on Neural Information Processing Sys-
tems (NeurIPS). 2, 4
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., Benenson, R., Franke, U., Roth, S., and Schiele,
B. (2016). The cityscapes dataset for semantic urban
scene understanding. In IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR). 2, 5
Everingham, M., Van Gool, L., Williams, C. K. I.,
Winn, J., and Zisserman, A. (2012). The
PASCAL Visual Object Classes Challenge
2012 (VOC2012) Results. http://www.pascal-
network.org/challenges/VOC/voc2012/workshop/
index.html. 2, 5
Feinman, R., Curtin, R. R., Shintre, S., and Gardner, A. B.
(2017). Detecting adversarial samples from artifacts.
2
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Ex-
plaining and harnessing adversarial examples. In Ben-
gio, Y. and LeCun, Y., editors, International Confer-
ence on Learning Representations (ICLR). 2, 3
Gu, J., Zhao, H., Tresp, V., and Torr, P. (2022). Segpgd: An
effective and efficient adversarial attack for evaluating
and boosting segmentation robustness. In European
Conference on Computer Vision (ECCV). 2, 4
He, X., Yang, S., Li, G., Li, H., Chang, H., and Yu, Y.
(2019). Non-local context encoder: Robust biomedi-
cal image segmentation against adversarial attacks. In
AAAI Conference on Artificial Intelligence. 3, 6
Hendrycks, D. and Gimpel, K. (2016). A baseline for de-
tecting misclassified and out-of-distribution examples
in neural networks. 3
Khamaiseh, S. Y., Bagagem, D., Al-Alaj, A., Mancino,
M., and Alomari, H. W. (2022). Adversarial deep
learning: A survey on adversarial attacks and defense
mechanisms on image classification. IEEE Access. 2
Klingner, M., Bär, A., and Fingscheidt, T. (2020). Im-
proved noise and attack robustness for semantic seg-
mentation by using multi-task training with self-
supervised depth estimation. In IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition
Workshop (CVPRW). 2, 3, 6
Krizhevsky, A. (2009). Learning multiple layers of features
from tiny images. 2
Kurakin, A., Goodfellow, I. J., and Bengio, S. (2017). Ad-
versarial machine learning at scale. In International
Conference on Learning Representations (ICLR). 2,
3, 6
LeCun, Y. and Cortes, C. (2010). MNIST handwritten digit
database. 2
Maag, K., Chan, R., Uhlemeyer, S., Kowol, K., and
Gottschalk, H. (2022). Two video data sets for track-
ing and retrieval of out of distribution objects. In Asian
Conference on Computer Vision (ACCV), pages 3776–
3794. 3
Maag, K. and Rottmann, M. (2023). False negative re-
duction in semantic segmentation under domain shift
using depth estimation. In International Joint Con-
ference on Computer Vision, Imaging and Computer
Graphics Theory and Applications. SCITEPRESS -
Science and Technology Publications. 3
Maag, K., Rottmann, M., and Gottschalk, H. (2020). Time-
dynamic estimates of the reliability of deep semantic
segmentation networks. In IEEE International Con-
ference on Tools with Artificial Intelligence (ICTAI).
3
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and
Vladu, A. (2018). Towards deep learning models re-
sistant to adversarial attacks. In International Confer-
ence on Learning Representations, (ICLR). 4
Metzen, J. H., Kumar, M. C., Brox, T., and Fischer, V.
(2017). Universal adversarial perturbations against
semantic image segmentation. In IEEE International
Conference on Computer Vision (ICCV). 2, 4, 6
Michel, A. and Ewetz, R. (2022). Gradient-based adver-
sarial attack detection via deep feature extraction. In
SoutheastCon. 2
Moosavi-Dezfooli, S.-M., Fawzi, A., and Frossard, P.
(2016). Deepfool: A simple and accurate method to
fool deep neural networks. In IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR).
4
Nakka, K. K. and Salzmann, M. (2020). Indirect local
attacks for context-aware semantic segmentation net-
works. In European Conference on Computer Vision
(ECCV). 2, 4
Nesti, F., Rossolini, G., Nair, S., Biondi, A., and Buttazzo,
G. C. (2022). Evaluating the robustness of seman-
tic segmentation for autonomous driving against real-
world adversarial patch attacks. In IEEE/CVF Win-
ter Conference on Applications of Computer Vision
(WACV). 2, 4, 6
Pan, H., Hong, Y., Sun, W., and Jia, Y. (2022). Deep dual-
resolution networks for real-time and accurate seman-
tic segmentation of traffic scenes. IEEE Transactions
on Intelligent Transportation Systems. 1, 2, 5
Rony, J., Pesquet, J.-C., and Ayed, I. B. (2022). Proxi-
mal splitting adversarial attacks for semantic segmen-
tation. 2, 4, 7
Rousseeuw, P. J. and Driessen, K. V. (1999). A fast algo-
rithm for the minimum covariance determinant esti-
mator. Technometrics. 5, 8
Schölkopf, B., Williamson, R. C., Smola, A., Shawe-Taylor,
J., and Platt, J. (1999). Support vector method for
novelty detection. In Advances in Neural Information
Processing Systems. MIT Press. 5
Tibshirani, R. (1996). Regression shrinkage and selection
via the lasso. Journal of the Royal Statistical Society:
Series B. 5
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao,
Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and
Xiao, B. (2021). Deep high-resolution representation
learning for visual recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence. 5
Weerasinghe, P. S., Erfani, S. M., Alpcan, T., Leckie, C.,
and Kuijper, M. (2018). Unsupervised adversarial
anomaly detection using one-class support vector ma-
chines. International Symposium on Mathematical
Theory of Networks and Systems. 2
Xiao, C., Deng, R., Li, B., Yu, F., Liu, M., and Song, D.
(2018). Characterizing adversarial examples based on
spatial consistency information for semantic segmen-
tation. In European Conference on Computer Vision
(ECCV). 2, 3, 7
Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., and Yuille,
A. L. (2017). Adversarial examples for semantic seg-
mentation and object detection. IEEE International
Conference on Computer Vision (ICCV). 2, 4
Xu, X., Zhao, H., and Jia, J. (2021). Dynamic divide-
and-conquer adversarial training for robust semantic
segmentation. In IEEE International Conference on
Computer Vision (ICCV). 3, 6
Yatsura, M., Sakmann, K., Hua, N. G., Hein, M., and Met-
zen, J. H. (2022). Certified defences against adversar-
ial patch attacks on semantic segmentation. 2, 3
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N.
(2018). Bisenet: Bilateral segmentation network for
real-time semantic segmentation. In European Con-
ference on Computer Vision. 5