Combining Two Adversarial Attacks Against Person Re-Identification Systems

Eduardo de O. Andrade¹ (https://orcid.org/0000-0002-5978-9718), Igor Garcia Ballhausen Sampaio¹ (https://orcid.org/0000-0002-1890-1451), Joris Guérin² (https://orcid.org/0000-0002-8048-8960) and José Viterbo¹ (https://orcid.org/0000-0002-0339-6624)

¹ Computing Institute, Fluminense Federal University, Niterói, Brazil
² LAAS-CNRS, Toulouse University, Midi-Pyrénées, France
Keywords:
Person Re-Identification, Adversarial Attacks, Deep Learning.
Abstract:
The field of Person Re-Identification (Re-ID) has received much attention recently, driven by the progress of deep neural networks, especially for image classification. The problem of Re-ID consists in identifying individuals through images captured by surveillance cameras in different scenarios. Governments and companies are investing a lot of time and money in Re-ID systems for use in public safety and identifying missing persons. However, several challenges remain for successfully implementing Re-ID, such as occlusions and light reflections in people's images. In this work, we focus on adversarial attacks on Re-ID systems, which can be a critical threat to the performance of these systems. In particular, we explore the combination of adversarial attacks against Re-ID models, trying to amplify the decrease in the classification results. We conduct our experiments on three datasets: DukeMTMC-ReID, Market-1501, and CUHK03. We combine two types of adversarial attacks, P-FGSM and Deep Mis-Ranking, applied to two popular Re-ID models: IDE (ResNet-50) and AlignedReID. The best result brings the Rank-10 metric for AlignedReID on CUHK03 down to 3.36%. We also try to use Dropout during inference as a defense method.
1 INTRODUCTION
The number of surveillance cameras is rising fast, and the corresponding market could reach 19.5 billion euros in 2023 (Khan et al., 2020). This market is related to the concept of smart cities, which aim to address sustainability themes and to improve the management of risks in urban environments. As a result, the number of systems developed to re-identify people has increased rapidly in recent years, driven by the progress of deep neural networks (Luo et al., 2019; Kurnianggoro and Jo, 2017). These systems are in high demand by companies and governments to address problems such as public safety, tracking people in universities and streets, behavior analysis, and even surveillance (Islam, 2020). For example, this approach could help counter a terrorist offensive (Shah et al., 2016), such as the 9/11 attack (https://www.mprnews.org/story/2021/09/10/npr-911-travel-timeline-tsa/). However, all this technological insertion ends up creating a scenario prone to software errors, hacks, malware, and other criminal activities (Kitchin and Dodge, 2019).
Even with the many hours of video generated by
an immense number of cameras, we still need many
human operators responsible for verifying incidents
through observation on many screens. Automatic
analysis of this data can considerably help human
operators and improve the efficiency of these sys-
tems (Sumari et al., 2020). The research field study-
ing this problem is called Person Re-Identification
(Re-ID). It aims to distinguish specific individu-
als through images captured by surveillance cam-
eras in different scenarios in the same environ-
ment (Galanakis et al., 2019), such as an airport.
Thanks to the large amount of data generated for Re-
ID in recent years, there has been an exponential in-
crease in publications about Re-ID systems, mostly
considering deep learning solutions. For an overview
of popular approaches for Re-ID, we refer the reader
to the following survey (Yaghoubi et al., 2021).
Despite the increased performance of Re-ID mod-
els in the last decade, they are vulnerable to attacks
called adversarial examples (Bouniot et al., 2020).
This attack can confuse deep neural networks, mak-
ing the classification models return erroneous predic-
tions with high confidence (Goodfellow et al., 2014).
An adversarial example attack on a Re-ID model can be a severe risk, such as an attack against an object detection system (https://www.biometricupdate.com/201904/novel-techniques-that-can-trick-object-detection-systems-sounds-familiar-alarm). Finding efficient attacks and countermeasures to mitigate them are active fields of research (Chen et al., 2020). We present a literature review about adversarial attacks in Section 2.
The main objective of our work is to amplify the degradation of the classification accuracy of a Re-ID model by combining two different types of adversarial attacks. In addition, this paper also applies a defense method to harden Re-ID models. The attacks implemented and combined are 1. a modification of the Fast Gradient Sign Method (Goodfellow et al., 2014), known as the Private Fast Gradient Sign Method (P-FGSM) (Li et al., 2019), and 2. a state-of-the-art method for Re-ID, called Deep Mis-Ranking (Wang et al., 2020). For the defense, we try to apply the method from (Sheikholeslami et al., 2019) to Re-ID, which consists of using the Dropout layers during the inference phase. As far as we know, the defense method and one of the attacks have never been used for Re-ID before.
The experiments are run on three well-known datasets: Duke Multi-Tracking Multi-Camera Re-Identification (DukeMTMC-ReID) (Ristani et al., 2016), Market-1501 (Zheng et al., 2015), and the Chinese University of Hong Kong 03 (CUHK03) (Li et al., 2014). For this work, we implement two Re-ID models: AlignedReID (Zhang et al., 2017) and a system with ID-discriminative Embedding (IDE) (Zheng et al., 2016) based on the well-known deep Residual Neural Network, ResNet-50 (He et al., 2016).
This work is structured in five sections. Section 2 discusses the different adversarial attack approaches for Re-ID present in the literature. In Section 3, we present the details of the two attacks used in this work. Next, Section 4 presents the experiments performed on the implemented models and discusses the results obtained. Finally, Section 5 concludes this paper, describing some limitations and possible directions for future work.
2 RELATED WORK
In 2014, there was an extensive study about adversarial examples and their effects (Goodfellow et al., 2014). The authors observed that more linear models are prone to fail under attacks. The direction of the perturbations was the most crucial feature in drastically altering neural network predictions. The authors also showed that adversarial examples could generalize across different models: perturbations that are more aligned with the weight vectors of models learning similar functions and trained for the same tasks transfer more easily. Furthermore, the neural network models that are easy to optimize were also easy to confuse. In 2018, another paper reviewed attack and defense approaches for deep learning models (Yuan et al., 2019), applied to tasks such as image classification, image segmentation, and object detection.
The Fast Gradient Sign Method (FGSM) approach emerged in 2014 and demonstrated how effective a simple, low-computation attack could be. It consists in adding imperceptible perturbations whose direction is the same as the gradient of the cost function with respect to the input data. In 2019, a variation of FGSM called Private FGSM (P-FGSM) achieved an excellent trade-off between the drop in classification accuracy and the distortion of private classes (Li et al., 2019). The purpose of class privacy is to protect sensitive information in images from being exposed by the inference of a classifier. This information may include the presence of people, faces, and other content that should not be violated. Using a ResNet-50 model and the Places365-Standard dataset (Zhou et al., 2017), the P-FGSM authors were able to fool the classifier 94.40% of the time in the top-5 classes, with only a slight average reduction in three image quality measures. As far as we know, no other work in the literature has used P-FGSM for Re-ID.
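To make the mechanism concrete, a minimal sketch of the one-step FGSM perturbation is shown below; the `model` interface, the cross-entropy loss, and the 0.03 budget are illustrative assumptions rather than details from the cited papers.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """One-step FGSM: perturb the input along the sign of the gradient
    of the loss with respect to the input image."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Epsilon-sized step along the gradient sign, then keep pixels in [0, 1].
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0.0, 1.0).detach()
```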
The Opposite-Direction Feature Attack (ODFA) paper (Zheng et al., 2018), implemented in 2018, used a Dense Convolutional Network with a depth of 121 (DenseNet-121) as the victim model and a ResNet-50 model for the generation of adversarial queries. Three datasets were part of the experiments: Market-1501, Caltech-UCSD Birds-200-2011 (CUB-200-2011) (Wah et al., 2011), and CIFAR-10 (Krizhevsky et al., 2009). Market-1501 and CUB-200-2011 had better results than CIFAR-10, as ODFA handled the retrieval task better. For Market-1501, the mean Average Precision (mAP) metric without the attack on a specific victim model reached 77.14% (Sun et al., 2018), while the attack decreased it to 21.52% using the same model.
Another attack from 2019 has two different proposals for dealing with adversarial patterns (AdvPattern): EvdAttack and ImpAttack (Wang et al., 2019). The authors used Market-1501 and another proprietary dataset to craft transformable patterns for adversarial clothing. The name of this proprietary dataset is Person Re-Identification in Campus Streets (PRCS). Two models were part of the experiments: a Siamese network (A) (Zheng et al., 2017) and a ResNet-50 capable of learning the discriminative embeddings of identities (B) (Zheng et al., 2016). For Market-1501, the mAP values before the application of AdvPattern are 62.7% (model A) and 57.3% (model B). Considering the dataset generated with EvdAttack, the authors achieved 4.4% on model A and 4.5% on model B. Using ImpAttack, the accuracy decreased to 9.20% on model A and 10.9% on model B. The adoption of PRCS with the AdvPattern approach differs from the attacks addressed in our work.
In 2020, there was an opposite approach to ODFA
with the implementation of Self Metric Attack (SMA)
and Furthest-Negative Attack (FNA) (Bouniot et al.,
2020). The authors performed both attacks on
Market-1501 and DukeMTMC-ReID. They adopted
ResNet-50 architectures using two distinct types of
loss minimization: the cross-entropy (C) (Xiong
et al., 2019) and the triplet loss (T) (Hermans et al.,
2017; Schroff et al., 2015). The accuracy results
achieved with the mAP metric for Market-1501 with-
out the attacks were 67.22% for T and 77.53% for C.
Using the SMA attack, there was a decrease in ac-
curacy to 0.05% for T and 0.26% for C. The FNA
obtained 0.05% for T and 0.07% for C. For the
DukeMTMC-ReID dataset, the mAP results achieved
without the attacks were 60.33% for T and 67.64% for
C. Again, with the SMA attack, there was a decrease
to 0.05% for T and 0.32% for C. The FNA obtained
0.04% for T and 0.06% for C.
The most important paper regarding adversarial attack approaches for this work appeared in mid-2020 (Wang et al., 2020). The Deep Mis-Ranking attack provides most of the state-of-the-art results to which we compare our work. It is presented in detail in Section 3.1. However, some results obtained in our work are close to, but not the same as, those described in the paper. Some of the problems in implementing Deep Mis-Ranking included code errors that had to be corrected. The experiments were not perfectly reproducible, and results differ slightly from those initially presented in the paper, even after the corrections and using models with pre-trained weights.
3 COMBINED ATTACK
METHODS
Little attention has been paid to the security risks and to the impact of attacks on Re-ID systems. This section explains the approaches used in this work: Deep Mis-Ranking, P-FGSM, and their combination.
3.1 Deep Mis-Ranking
Deep Mis-Ranking is a formulation to disrupt the ranking prediction of Re-ID models. Its main characteristic is high transferability, i.e., an attack implemented for dataset A can generalize to another dataset B. Other characteristics of Deep Mis-Ranking include the controllability and imperceptibility of the attack (Wang et al., 2020).
Figure 1 shows the visual representation of the framework. The generator $G$ produces the preliminary perturbations $P'$ that, multiplied with the mask $M$, originate the perturbations $P$ for each input image $I$. The generator $G$ is a ResNet-50 architecture, and it is trained jointly with the discriminator $D$ to form the general Generative Adversarial Network (GAN) structure of the framework. This unsupervised neural network is commonly used for image generation (Konidaris et al., 2019). $\mathcal{L}_{GAN}$ represents the GAN loss, whereas $\mathcal{L}_{adv\_etri}$, $\mathcal{L}_{adv\_xent}$, and $\mathcal{L}_{VP}$ correspond to the mis-ranking, misclassification, and perception losses, respectively. $T$ represents the attacked Re-ID system and receives the adversarial image $\hat{I}$ as input.

Figure 1: The framework structure of the Deep Mis-Ranking attack. The main objective of the attack is to maximize the distance between the samples from the same category (pull) and minimize the distance between the samples from different categories (push). Source: (Wang et al., 2020).
Looking more closely at $T$, the inputs and outputs follow the scheme illustrated in Figure 2. We aim to minimize the distance of each pair of samples from different categories (e.g., $(\hat{I}^k_c, I)$, $I \in \{I_{cd}\}$) while maximizing the distance of each pair of samples from the same category (e.g., $(\hat{I}^k_c, I)$, $I \in \{I_{cs}\}$) to achieve a successful attack.

Figure 2: The scheme of how the Deep Mis-Ranking attack occurs in a Re-ID system $T$ concerning pairs of samples and their distances. Source: (Wang et al., 2020).
Equation 1 corresponds to $\mathcal{L}_{GAN}$. While the discriminator $D$ tries to differentiate the real images from the adversarial ones, the generator $G$ tries to produce the perturbations on the input images. The expected value $\mathbb{E}_{(I_{cd}, I_{cs})}$ represents the conditional expectation of $\log D_{1,2,3}(I_{cd}, I_{cs})$ given $I_{cd}$ and $I_{cs}$, in the form $\mathbb{E}_{X,Y}[Y] = \mathbb{E}_X[\mathbb{E}_Y[Y|X]]$.

$$\mathcal{L}_{GAN} = \mathbb{E}_{(I_{cd}, I_{cs})}\left[\log D_{1,2,3}(I_{cd}, I_{cs})\right] + \mathbb{E}_{I}\left[\log\left(1 - D_{1,2,3}(I, \hat{I})\right)\right] \quad (1)$$
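As an illustration, Equation 1 can be evaluated as in the minimal sketch below once the discriminator scores are available; it assumes a single discriminator output in $(0, 1)$ instead of the multi-stage $D_{1,2,3}$ of the original framework, and the variable names are hypothetical.

```python
import torch

def gan_loss(d_real, d_fake, eps=1e-8):
    """Equation 1 with a single discriminator output in (0, 1): reward high
    scores on pairs of clean images and low scores on (clean, adversarial) pairs.
    This is the quantity the discriminator tries to maximize."""
    return torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean()
```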
The first loss related to a Re-ID system $T$ is $\mathcal{L}_{adv\_etri}$, represented by Equation 2, where the expression $[x]_+$ is equal to $\max(0, x)$. This mis-ranking loss function follows the form of a triplet loss (Ding et al., 2015), aiming to minimize the distance of the mismatched pair while maximizing the distance of the matched pair. The letter $K$ represents the set of people's identities. Meanwhile, $C_k$ is the number of samples taken from the $k$-th identity, and $\hat{I}^k_c$ is the $c$-th image of the $k$-th identity in a mini-batch. The L2 norm used as a distance metric is represented by $\|\cdot\|_2$, and $\Delta$ is a margin threshold.

$$\mathcal{L}_{adv\_etri} = \sum_{k=1}^{K} \sum_{c=1}^{C_k} \Bigg[\max_{\substack{j \neq k,\; j = 1 \ldots K \\ c_d = 1 \ldots C_j}} \big\|T(\hat{I}^k_c) - T(\hat{I}^j_{c_d})\big\|_2^2 \;-\; \min_{c_s = 1 \ldots C_k} \big\|T(\hat{I}^k_c) - T(\hat{I}^k_{c_s})\big\|_2^2 \;+\; \Delta\Bigg]_+ \quad (2)$$
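A minimal PyTorch sketch of Equation 2 on a mini-batch is given below; the feature extraction by $T$, the batch layout, and the margin value are assumptions for illustration, not the authors' exact implementation.

```python
import torch

def mis_ranking_loss(feats, ids, margin=0.5):
    """Equation 2 on a mini-batch: for each adversarial feature, the largest
    squared L2 distance to other identities minus the smallest squared L2
    distance to its own identity, hinged at `margin` and summed.
    feats: (N, d) outputs of the attacked model T; ids: (N,) identity labels."""
    dist = torch.cdist(feats, feats).pow(2)                  # pairwise squared L2
    same = ids.unsqueeze(0) == ids.unsqueeze(1)              # same-identity mask
    eye = torch.eye(len(ids), dtype=torch.bool, device=feats.device)
    # Farthest sample of a *different* identity (same-identity entries masked out).
    far_neg = dist.masked_fill(same, float("-inf")).max(dim=1).values
    # Closest sample of the *same* identity (other identities and self masked out).
    near_pos = dist.masked_fill(~same | eye, float("inf")).min(dim=1).values
    return torch.clamp(far_neg - near_pos + margin, min=0.0).sum()
```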
Another loss present in the framework is $\mathcal{L}_{adv\_xent}$ for the non-targeted attack (Equation 3), where $S$ denotes the softmax function and $\delta$ is the Dirac delta. The term $\upsilon$ is the smoothing regularization, $\upsilon = [\frac{1}{K-1}, \ldots, 0, \ldots, \frac{1}{K-1}]$, i.e., $\upsilon_k$ is always equal to $\frac{1}{K-1}$ except when $k$ is the ground-truth ID, where it is 0 ($K$ is the set including each $k$-th person ID). The indicator term built from the $\arg\min$ selects the index of the minimum value of the output probability vector, indicating the least likely class (similar to the numpy.argmin function in Python's NumPy library).

$$\mathcal{L}_{adv\_xent} = \sum_{k=1}^{K} S(T(\hat{I}))_k \left((1 - \delta)\, \mathbb{1}_{\arg\min T(I)}(k) + \delta\, \upsilon_k\right) \quad (3)$$
In order to improve the visual quality for $T$ and prevent the attack from being detected by humans, we have Equation 4, corresponding to the perception loss $\mathcal{L}_{VP}$. The formulation of this loss function originates from the multiscale structural similarity image quality paradigm (Wang et al., 2003). The comparison measures of contrast ($c_j$) and structure ($s_j$) on the $j$-th scale are calculated by
$$c_j(I, \hat{I}) = \frac{2\sigma_I \sigma_{\hat{I}} + C_2}{\sigma_I^2 + \sigma_{\hat{I}}^2 + C_2} \quad \text{and} \quad s_j(I, \hat{I}) = \frac{\sigma_{I\hat{I}} + C_3}{\sigma_I \sigma_{\hat{I}} + C_3},$$
where $\sigma_x$ is the standard deviation, $\sigma_x^2$ is the variance, and $\sigma_{xy}$ is the covariance. The number of scales is represented by $L$, where $\alpha_L$, $\beta_j$, and $\gamma_j$ are the factors that re-weight the contribution of each component. Finally, we have the luminosity measure ($l$), calculated at the coarsest scale $L$ by
$$l_L(I, \hat{I}) = \frac{2\mu_I \mu_{\hat{I}} + C_1}{\mu_I^2 + \mu_{\hat{I}}^2 + C_1},$$
where $\mu_x$ is the mean.

$$\mathcal{L}_{VP} = \left[l_L(I, \hat{I})\right]^{\alpha_L} \cdot \prod_{j=1}^{L} \left[c_j(I, \hat{I})\right]^{\beta_j} \left[s_j(I, \hat{I})\right]^{\gamma_j} \quad (4)$$
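For reference, a small sketch of how Equation 4 can be evaluated from precomputed per-scale statistics follows; the constants $C_1$, $C_2$, and $C_3 = C_2/2$ follow common SSIM conventions and, like the function interface, are assumptions here rather than the authors' implementation.

```python
def perception_loss(mu_I, mu_A, sigma_I, sigma_A, sigma_IA,
                    alpha_L=1.0, betas=None, gammas=None,
                    C1=0.01 ** 2, C2=0.03 ** 2):
    """Equation 4 from precomputed statistics.
    mu_I, mu_A: means of I and the adversarial image at the coarsest scale L;
    sigma_I, sigma_A, sigma_IA: per-scale standard deviations and covariances
    (lists of length L)."""
    C3 = C2 / 2.0
    L = len(sigma_I)
    betas = betas or [1.0] * L
    gammas = gammas or [1.0] * L
    # Luminance term, evaluated only at the coarsest scale.
    out = ((2 * mu_I * mu_A + C1) / (mu_I ** 2 + mu_A ** 2 + C1)) ** alpha_L
    for j in range(L):
        c = (2 * sigma_I[j] * sigma_A[j] + C2) / (sigma_I[j] ** 2 + sigma_A[j] ** 2 + C2)
        s = (sigma_IA[j] + C3) / (sigma_I[j] * sigma_A[j] + C3)
        out *= (c ** betas[j]) * (s ** gammas[j])
    return out
```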
The mask $M$ determines the number of target pixels to attack. After multiplying the preliminary perturbation $P'$ with the mask $M$, we have the final perturbation $P$ with a controlled number of enabled pixels, keeping the attack inconspicuous. The Gumbel softmax function (Jang et al., 2016) is responsible for choosing the pixels among all possibilities, as sketched below. The generalization capacity of Deep Mis-Ranking is its main advantage: it can be used with different Re-ID systems and works efficiently in black-box scenarios.
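The pixel-selection step can be sketched with PyTorch's `gumbel_softmax`, as below; the `(H, W, 2)` logit layout and the straight-through (hard) sampling are illustrative assumptions, not the exact choices of the original framework.

```python
import torch.nn.functional as F

def pixel_mask(mask_logits, tau=1.0):
    """Straight-through Gumbel-softmax selection of attacked pixels.
    mask_logits: (H, W, 2) unnormalized scores for {attack, keep} per pixel;
    returns an (H, W) binary mask that remains differentiable."""
    y = F.gumbel_softmax(mask_logits, tau=tau, hard=True, dim=-1)
    return y[..., 0]
```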
3.2 Private Fast Gradient Sign Method
The design of P-FGSM aims to "protect" the content of an image through directed distortions that make it difficult for a classifier to infer its true class, while maintaining the image's usefulness for social media users. P-FGSM is based on the FGSM attack, already used in Re-ID, and adds a limit on the probability that automatic inference can expose the true class of a distorted image. Enforcing this limit may require even more disturbances that mislead the models (Li et al., 2019).
Figure 3: The two images on the left side represent the true class. On the right side, we have the distorted versions, whose perturbations are imperceptible to humans and induce a high misclassification rate in the models. Source: (Li et al., 2019).
Figure 3 shows an example of two images after executing P-FGSM. The most significant difference between P-FGSM and other variants of this attack is irreversibility, i.e., the random selection of the target class among the subset of classes that does not contain the protected class. In a Re-ID dataset, the target class and the other classes can denote people's different identities.

The class adopted as an adversarial example, $\tilde{y}$, works as a function of the classification probability vector $p$. With $p$ equal to the output probability vector of the classifier, we have $p'$, which contains the elements of $p$ in descending order, $p' = (p'_1, \ldots, p'_D)$, where $D$ represents the number of scene classes. Equation 5 corresponds to the random choice of $\tilde{y}$ from the subset of classes whose cumulative probability exceeds a threshold $\sigma$ in the interval $[0, 1]$, in which $R$ is the function that randomly picks one class label $y_j$ from the input set.

$$\tilde{y} = R\!\left(\left\{y_j : \sum_{i=1}^{j-1} p'_i > \sigma\right\}\right) \quad (5)$$
Lastly, in Equation 6, we have the generation of the protected image $\dot{x} = \dot{x}_N$ after $N$ iterations, starting from $\dot{x}_0 = x$ and running up to a maximum number of iterations, aiming to increase the probability of predicting $\tilde{y}$. The cost function is represented by $J_M$ and is used in training to estimate the $\theta$ parameters of the classifier $M$. The $\varepsilon$ represents the maximum magnitude of the adversarial perturbation at each step, and $\nabla_x$ denotes, in this case, the gradient with respect to the image $x$.

$$\dot{x}_N = \dot{x}_{N-1} - \varepsilon\,\mathrm{sign}\!\left(\nabla_x J_M(\theta, \dot{x}_{N-1}, \tilde{y})\right) \quad (6)$$
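Putting Equations 5 and 6 together, a simplified P-FGSM loop might look like the sketch below; the classifier interface, the $\sigma$ and $\varepsilon$ values, and the early-stopping test are assumptions, not the reference implementation from (Li et al., 2019).

```python
import torch
import torch.nn.functional as F

def p_fgsm(model, x, sigma=0.95, epsilon=1.0 / 255, max_iters=100):
    """Simplified P-FGSM loop. x: one image batch (1, 3, H, W) in [0, 1]."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1).squeeze(0)     # (K,) class probabilities
    order = probs.argsort(descending=True)                 # classes sorted as p'
    cum_before = probs[order].cumsum(dim=0) - probs[order]
    candidates = order[cum_before > sigma]                 # Eq. 5: tail of unlikely classes
    if len(candidates) == 0:                               # guard for tiny label sets
        candidates = order[-1:]
    target = candidates[torch.randint(len(candidates), (1,))]
    x_adv = x.clone()
    for _ in range(max_iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        loss.backward()
        # Eq. 6: targeted step toward the chosen class, keeping pixels valid.
        x_adv = (x_adv - epsilon * x_adv.grad.sign()).clamp(0.0, 1.0)
        with torch.no_grad():
            if model(x_adv).argmax(dim=1).item() == target.item():
                break
    return x_adv.detach()
```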
4 EXPERIMENTS
We used a computer with a 2.9 GHz Intel Xeon processor, 16 GB of 2400 MHz DDR4 RAM, and an Nvidia Quadro P5000 GPU for evaluation purposes.
The datasets used were DukeMTMC-ReID (Ristani
et al., 2016), Market-1501 (Zheng et al., 2015), and
CUHK03 (Li et al., 2014). DukeMTMC-ReID had
16,522 images (bounding boxes) with 702 identities
for training and 19,889 images with 702 other identi-
ties for testing. We used 2228 bounding boxes to cor-
rectly identify the test identities considering the query
set. For Market-1501, the composition was 12,936
images of 751 identities for training and 19,281 im-
ages of 750 identities for testing. We selected 3368
bounding boxes for the query set. Finally, CUHK03
comprised 7365 images of 767 identities for train-
ing and 6732 images of 700 identities for testing,
and the query set contained 1400 images. It is im-
portant to mention that we neglected some “junk im-
ages” from Market-1501 in our testing set. These im-
ages were neither good nor bad considering the De-
formable Part Model (DPM) bounding boxes; they
could hinder more than help, making no difference in
the re-identification process and accuracy. The DPM
is a pedestrian detector employed instead of the hand-
cropped boxes. We also did not use some images from
CUHK03 that we could not read from the MATLAB
file that composes the dataset.
The implemented models were IDE (ResNet-
50) (He et al., 2016) and AlignedReID (Zhang et al.,
2017). In addition to the Deep Mis-Ranking (Wang
et al., 2020) and P-FGSM (Li et al., 2019) that we use
as a combined attack against the models that characterize the Re-ID systems, we also implemented Dropout at inference as a defense method. As far as we know, this defense method had not yet been applied to Re-ID systems.
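A minimal sketch of this defense in PyTorch is shown below; it only re-activates the Dropout modules already present in the network (adding Dropout to layers that lack it would require wrapping them), and the rate value is an assumption.

```python
import torch.nn as nn

def enable_inference_dropout(model, p=0.25):
    """Keep Dropout active at test time while the rest of the network
    (e.g., BatchNorm) stays in eval mode."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p        # optionally override the dropout rate
            module.train()      # this module keeps sampling masks at inference
    return model
```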
Table 1 shows the results using the metrics mean
Average Precision (mAP), Rank-1 (R-1), Rank-5 (R-
5), and Rank-10 (R-10) for the experiments with and
without the combined attacks. Considering the combined attacks, we implemented one attack after the other, using P-FGSM first. There was no significant difference in changing the order of the attacks, and we used the same pre-trained weights from the Deep Mis-Ranking work (https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking).
Table 1: The results (in percent) with and without combined attacks for the chosen models and datasets.

                                       |            IDE             |         AlignedReID
Dataset          Method                |  mAP    R-1    R-5    R-10 |  mAP    R-1    R-5    R-10
---------------------------------------+----------------------------+----------------------------
DukeMTMC-ReID    No Attacks            | 58.14  76.53  86.76  89.99 | 69.75  82.14  91.65  94.43
                 Deep Mis-Ranking      |  4.68   5.16   8.71  11.00 |  3.12   3.23   6.01   7.99
                 P-FGSM                | 56.06  75.45  86.54  90.08 | 67.05  81.82  91.16  93.85
                 Combined Attacks      |  4.71   5.25   9.87  11.98 |  3.09   3.77   7.00   8.75
Market-1501      No Attacks            | 61.13  80.85  91.89  94.83 | 79.10  91.83  96.97  98.13
                 Deep Mis-Ranking      |  4.30   3.98   8.88  12.23 |  2.58   1.84   4.22   6.29
                 P-FGSM                | 58.08  79.33  91.27  94.12 | 76.84  91.12  96.82  98.25
                 Combined Attacks      |  4.24   3.95   9.38  12.83 |  2.44   1.96   4.54   6.71
CUHK03           No Attacks            | 24.54  24.93  43.29  51.79 | 59.65  61.50  79.43  85.79
                 Deep Mis-Ranking      |  0.77   0.29   1.00   1.71 |  2.19   1.36   2.50   4.36
                 P-FGSM                | 19.53  21.14  35.79  45.64 | 50.05  53.50  75.07  82.21
                 Combined Attacks      |  0.57   0.07   0.79   1.71 |  1.76   1.14   1.93   3.36
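For context, the ranking metrics reported in Table 1 can be computed as in the simplified sketch below; it ignores the camera-ID and junk-image filtering of the full Re-ID evaluation protocol, and the array names are hypothetical.

```python
import numpy as np

def cmc_map(dist, q_ids, g_ids, ranks=(1, 5, 10)):
    """Rank-k (CMC) and mAP from a query-gallery distance matrix.
    dist: (num_query, num_gallery) array; q_ids, g_ids: identity label arrays."""
    cmc_hits = np.zeros(max(ranks))
    aps = []
    for i in range(len(q_ids)):
        order = np.argsort(dist[i])                      # gallery sorted by distance
        matches = (g_ids[order] == q_ids[i]).astype(float)
        if matches.sum() == 0:
            continue                                     # identity absent from gallery
        cmc_hits[int(np.argmax(matches)):] += 1          # position of first correct match
        precision = matches.cumsum() / (np.arange(len(matches)) + 1)
        aps.append((precision * matches).sum() / matches.sum())
    cmc = cmc_hits / len(aps)
    return {f"R-{r}": cmc[r - 1] for r in ranks}, float(np.mean(aps))
```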
Looking again at Table 1, if we compare the results without attacks and with Deep Mis-Ranking only, there are differences with respect to the original paper. For IDE (ResNet-50), for instance, using the exact implementation, our results without attacks match the original ones, but they differ under Deep Mis-Ranking. We used the same split for training and test sets. So, this difference could stem from the dataset and its samples, which are no longer available on the official repository site, or from the available pre-trained weights.
We tried to amplify the decrease in classification results produced by the combined attacks. This additional decrease occurred most often with the CUHK03 dataset, as can be seen in the CUHK03 rows of Table 1. However, looking at all the datasets and models, the combined attack more often yields a slight increase in the considered metrics; still, this rise seems less important than the cases where the results drop even further compared to the Deep Mis-Ranking attack alone.
Furthermore, we used Dropout during inference as a defense method. We expected a good trade-off for the Re-ID system against adversarial examples: accepting a small loss in identification results without attacks in exchange for a considerable reduction of the loss caused by the attacks. Nonetheless, unlike in other settings, we did not obtain significant results using this method for the Re-ID systems. We can see the results of this trial in Figure 4 for the mAP and Rank-10 metrics with the IDE model and the CUHK03 dataset.
Figure 4: The Dropout rate and metric percentage, with and without attacks, for the mAP and Rank-10 (R-10) metrics with the IDE model and the CUHK03 dataset.

The Dropout behavior in Figure 4 illustrates the insignificant gain of this defense method. We used Dropout rates from 0.025 to 0.75, and the best increase was in the R-10 metric against the Deep Mis-Ranking attack with a rate of 0.25, improving from 1.71% to 2.73%. Meanwhile, the mAP metric decreased with the same rate from 0.77% to 0.60%, which does not pay off. For the other model and datasets, the results were not good enough either.
Finally, Dropout at inference was applied to all the hidden layers of the two models. The time
for running the experiments on the testing set for IDE
(ResNet-50) model and DukeMTMC-ReID dataset
was approximately 4 minutes. For the Market-1501
dataset, 4 minutes and 30 seconds. The CUHK03
dataset spent nearly 1 minute and 30 seconds of the
execution time. Considering the AlignedReID model
and DukeMTMC-ReID dataset, we finished in ap-
proximately 8 minutes. For the Market-1501 dataset,
it was 11 minutes. Lastly, the CUHK03 dataset spent
2 minutes and 30 seconds.
5 CONCLUSION
In this work, we proposed the combination of two adversarial attacks against Re-ID systems. As far as we know, one of the attacks, P-FGSM, had never been applied before to this kind of system. In addition, we used Dropout during inference as a countermeasure against the considered attacks.
We used three datasets and two models that are among the most used and best performing ones for the experiments. Our tests aimed to raise the obstacles for Re-ID even further with the combination of the attack methods, and they strengthened the decrease in the classification results in some cases. However, the proposed countermeasure did not perform well against the attacks.
There were limitations related to the accessible data, and unexpected results with the already available attack implementations. However, we intend to continue exploring the problem of adversarial attacks against Re-ID systems. We also hope that combining different attack and defense methods can be an approach for our future work and for other works.
ACKNOWLEDGEMENT
The Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and PrimeUp Soluções de TI LTDA financed part of this work.
REFERENCES
Bouniot, Q., Audigier, R., and Loesch, A. (2020). Vul-
nerability of person re-identification models to metric
adversarial attacks. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion Workshops, pages 794–795.
Chen, K., Zhu, H., Yan, L., and Wang, J. (2020). A survey
on adversarial examples in deep learning. Journal on
Big Data, 2(2):71.
Ding, S., Lin, L., Wang, G., and Chao, H. (2015). Deep
feature learning with relative distance comparison
for person re-identification. Pattern Recognition,
48(10):2993–3003.
Galanakis, G., Zabulis, X., and Argyros, A. A. (2019). Nov-
elty detection for person re-identification in an open
world. In VISIGRAPP (5: VISAPP), pages 401–411.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Ex-
plaining and harnessing adversarial examples. arXiv
preprint arXiv:1412.6572.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Hermans, A., Beyer, L., and Leibe, B. (2017). In defense
of the triplet loss for person re-identification. arXiv
preprint arXiv:1703.07737.
Islam, K. (2020). Person search: New paradigm of per-
son re-identification: A survey and outlook of recent
works. Image and Vision Computing, 101:103970.
Jang, E., Gu, S., and Poole, B. (2016). Categorical repa-
rameterization with gumbel-softmax. arXiv preprint
arXiv:1611.01144.
Khan, P. W., Byun, Y.-C., and Park, N. (2020). A data ver-
ification system for cctv surveillance cameras using
blockchain technology in smart cities. Electronics,
9(3):484.
Kitchin, R. and Dodge, M. (2019). The (in) security of
smart cities: Vulnerabilities, risks, mitigation, and
prevention. Journal of Urban Technology, 26(2):47–
65.
Konidaris, F., Tagaris, T., Sdraka, M., and Stafylopatis, A.
(2019). Generative adversarial networks as an ad-
vanced data augmentation technique for mri data. In
VISIGRAPP (5: VISAPP), pages 48–59.
Krizhevsky, A., Hinton, G., et al. (2009). Learning mul-
tiple layers of features from tiny images. Master’s
thesis, Department of Computer Science, University
of Toronto.
Kurnianggoro, L. and Jo, K.-H. (2017). Identification of
pedestrian attributes using deep network. In IECON
2017-43rd Annual Conference of the IEEE Industrial
Electronics Society, pages 8503–8507. IEEE.
Li, C. Y., Shamsabadi, A. S., Sanchez-Matilla, R., Mazzon,
R., and Cavallaro, A. (2019). Scene privacy protec-
tion. In ICASSP 2019-2019 IEEE International Con-
ference on Acoustics, Speech and Signal Processing
(ICASSP), pages 2502–2506. IEEE.
Li, W., Zhao, R., Xiao, T., and Wang, X. (2014). Deep-
reid: Deep filter pairing neural network for person
re-identification. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 152–159.
Luo, H., Gu, Y., Liao, X., Lai, S., and Jiang, W. (2019).
Bag of tricks and a strong baseline for deep person re-
identification. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
Workshops, pages 0–0.
Ristani, E., Solera, F., Zou, R., Cucchiara, R., and Tomasi,
C. (2016). Performance measures and a data set for
multi-target, multi-camera tracking. In European con-
ference on computer vision, pages 17–35. Springer.
Schroff, F., Kalenichenko, D., and Philbin, J. (2015).
Facenet: A unified embedding for face recognition
and clustering. In Proceedings of the IEEE conference
on computer vision and pattern recognition, pages
815–823.
Shah, J. H., Lin, M., and Chen, Z. (2016). Multi-camera
handoff for person re-identification. Neurocomputing,
191:238–248.
Sheikholeslami, F., Jain, S., and Giannakis, G. B.
(2019). Efficient randomized defense against ad-
versarial attacks in deep convolutional neural net-
works. In ICASSP 2019-2019 IEEE International
Conference on Acoustics, Speech and Signal Process-
ing (ICASSP), pages 3277–3281. IEEE.
Sumari, F. O., Machaca, L., Huaman, J., Clua, E. W., and
Guérin, J. (2020). Towards practical implementations
of person re-identification from full video frames. Pat-
tern Recognition Letters, 138:513–519.
Sun, Y., Zheng, L., Yang, Y., Tian, Q., and Wang, S. (2018).
Beyond part models: Person retrieval with refined part
pooling (and a strong convolutional baseline). In Pro-
ceedings of the European conference on computer vi-
sion (ECCV), pages 480–496.
Wah, C., Branson, S., Welinder, P., Perona, P., and Be-
longie, S. (2011). The Caltech-UCSD Birds-200-2011
Dataset. Technical Report CNS-TR-2011-001, Cali-
fornia Institute of Technology.
Wang, H., Wang, G., Li, Y., Zhang, D., and Lin, L. (2020).
Transferable, controllable, and inconspicuous adver-
sarial attacks on person re-identification with deep
mis-ranking. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 342–351.
Wang, Z., Simoncelli, E. P., and Bovik, A. C. (2003). Mul-
tiscale structural similarity for image quality assess-
ment. In The Thrity-Seventh Asilomar Conference on
Signals, Systems & Computers, 2003, volume 2, pages
1398–1402. IEEE.
Wang, Z., Zheng, S., Song, M., Wang, Q., Rahimpour, A.,
and Qi, H. (2019). advpattern: physical-world at-
tacks on deep person re-identification via adversar-
ially transformable patterns. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 8341–8350.
Xiong, F., Xiao, Y., Cao, Z., Gong, K., Fang, Z., and Zhou,
J. T. (2019). Good practices on building effective cnn
baseline model for person re-identification. In Tenth
International Conference on Graphics and Image Pro-
cessing (ICGIP 2018), volume 11069, page 110690I.
International Society for Optics and Photonics.
Yaghoubi, E., Kumar, A., and Proença, H. (2021). Sss-pr:
A short survey of surveys in person re-identification.
Pattern Recognition Letters, 143:50–57.
Yuan, X., He, P., Zhu, Q., and Li, X. (2019). Adversar-
ial examples: Attacks and defenses for deep learning.
IEEE transactions on neural networks and learning
systems, 30(9):2805–2824.
Zhang, X., Luo, H., Fan, X., Xiang, W., Sun, Y., Xiao,
Q., Jiang, W., Zhang, C., and Sun, J. (2017). Aligne-
dreid: Surpassing human-level performance in person
re-identification. arXiv preprint arXiv:1711.08184.
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., and Tian,
Q. (2015). Scalable person re-identification: A bench-
mark. In Proceedings of the IEEE international con-
ference on computer vision, pages 1116–1124.
Zheng, L., Yang, Y., and Hauptmann, A. G. (2016). Per-
son re-identification: Past, present and future. arXiv
preprint arXiv:1610.02984.
Zheng, Z., Zheng, L., Hu, Z., and Yang, Y. (2018).
Open set adversarial examples. arXiv preprint
arXiv:1809.02681, 3.
Zheng, Z., Zheng, L., and Yang, Y. (2017). A discrimi-
natively learned cnn embedding for person reidentifi-
cation. ACM transactions on multimedia computing,
communications, and applications (TOMM), 14(1):1–
20.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., and Tor-
ralba, A. (2017). Places: A 10 million image database
for scene recognition. IEEE transactions on pattern
analysis and machine intelligence, 40(6):1452–1464.