An Experimental Consideration on Gait Spoofing

Yuki Hirose, Kazuaki Nakamura, Naoko Nitta and Noboru Babaguchi

Graduate School of Engineering, Osaka University, Suita, Osaka, 565-0871, Japan

Faculty of Engineering, Tokyo University of Science, Tokyo, 125-8585, Japan

School of Human Environmental Sciences, Mukogawa Women’s University, Nishinomiya, Hyogo, 663-8558, Japan

Institute for Datability Science, Osaka University, Suita, Osaka, 565-0871, Japan

Keywords:

Gait Recognition, Spoofing Attacks, Master Gait, Masterization, Gait Spoofing, Fake Gait Silhouettes, Multimedia Generation.

Abstract:

Deep learning technologies have improved the performance of biometric systems, but they have also increased the risk of spoofing attacks against them. So far, many spoofing and anti-spoofing methods have been proposed for face and voice. However, for gait, only a limited number of studies have focused on the spoofing risk. To examine the feasibility of gait spoofing, in this paper, we attempt to generate a sequence of fake gait silhouettes that mimics a certain target person’s walking style from only his/her single photo. A feature vector extracted from such a single photo does not have full information about the target person’s gait characteristics. To complement the information, we update the extracted feature so that it simultaneously contains various people’s characteristics, like a wolf sample. Inspired by the wolf sample, also called a “master” sample, which can simultaneously pass two or more verification systems like a master key, we call the proposed process “masterization”. After the masterization, we decode the resultant feature vector into a gait silhouette sequence. In our experiment, the gait recognition accuracy with the generated fake silhouette sequences increased from 69% to 78% with the masterization, which indicates a non-negligible risk of gait spoofing.

1 INTRODUCTION

Recently, deep neural networks (DNNs) have been introduced in a wide range of research fields and have achieved great success. One of the fields that has benefited most from DNNs is biometrics, such as face identification, voice authentication, and gait recognition, whose performance has been drastically improved by DNNs. On the other hand, DNNs have also advanced multimedia generation techniques. DNNs, or more specifically generative adversarial networks (GANs), can generate highly realistic facial images, speech data, and so on that mimic an actual person’s biometric characteristics (Kammoun et al., 2022; Toshpulatov et al., 2021; Tu et al., 2019). These techniques are useful in some respects (e.g., content creation and movie production), but they also bring a risk of spoofing attacks against biometrics.

The risk of spoofing attacks against face identification and voice authentication has been widely discussed in the literature (Conotter et al., 2014; Nguyen et al., 2015; Chen et al., 2022; Shiota et al., 2015; Wang et al., 2019), and many existing studies propose anti-spoofing methods for face and voice. However, for gait, which is a relatively new biometric clue for human identification, the risk of spoofing attacks has not yet been well analyzed. Although gait recognition technology is still maturing, it has the advantage that it can be applied to low-resolution videos in which facial textures are not clearly observed. Hence, gait recognition is likely to be used more widely in the near future, as a complement to face identification. This means that exploring the (future) risk of gait spoofing is an important issue even if the performance and the spread of gait recognition are still limited at present.

https://orcid.org/0000-0003-3370-0372

https://orcid.org/0000-0002-4859-4624

So far, a few existing studies have focused on the risk of gait spoofing (Gafurov et al., 2007; Hadid et al., 2012; Hadid et al., 2013). However, they do not assume fake gait generated by multimedia generation techniques; they only assume cases where an attacker physically mimics the target person’s walking style or physically wears the same clothes as the target person. Unlike them, in this paper, we focus on the risk of gait spoofing caused by DNN-based multimedia generation techniques. Specifically, to analyze whether a spoofing attack against gait recognition is practically possible, we propose a method for generating a sequence of a target individual’s fake gait silhouettes using DNNs.

Hirose, Y., Nakamura, K., Nitta, N. and Babaguchi, N.

An Experimental Consideration on Gait Spoofing.

DOI: 10.5220/0011661200003417

In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP, pages 559-566

ISBN: 978-989-758-634-7; ISSN: 2184-4321

© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

Figure 1: Assumed scenario of gait spoofing.

Figure 1 depicts a possible scenario of spoofing attacks against gait analysis. Suppose that there are two political parties, $P_1$ and $P_2$, opposing each other. An attacker A belonging to party $P_1$ attempts to damage the reputation of a politician B in party $P_2$ by making a fake video of him. First, the attacker A captures a video of a scandalous place using his own device. In parallel, he generates a sequence of fake gait silhouettes that mimics, i.e., spoofs, B’s walking style. Then, he colorizes the generated silhouettes and pastes them into the video captured above, which results in a fake gait video of B. In this step, he also generates a fake face of B and inserts it into the fake gait video if needed. This increases the realism of the fake video but is not necessarily needed when the video resolution is low. Note that the colorization process itself is not so important, because human eyes cannot identify people by their body textures, and automated systems use only silhouette information. Finally, the attacker uploads the fake gait video onto the Web, particularly social networking services (SNS). Nowadays, there are many IT-savvy people who want to check the social behavior of politicians, and for them, a gait recognition system can be a useful tool. One such person C, who is not a police officer but an ordinary citizen, checks the SNS and inputs the fake video into her private gait recognition system Y. As a result, the politician B’s behavior is fabricated and distributed as fake news even though the checker C has no malicious intent, as shown in Figure 1. The fake gait video may pass modern deepfake detection systems because most of them focus only on faces. In other words, fake news fabricated with a fake face becomes more difficult to detect when it is combined with fake gait.

In the above scenario, we assume that the attacker A can use a single photo of the politician B as well as a large database of gait silhouettes that is related to neither B nor Y. Under this assumption, targets of gait spoofing are not limited to politicians; not only other famous people, such as celebrities and sports players, but even ordinary citizens whose photos are on the Web or SNS could be victims of this attack. The goal of the attacker is to generate a sequence of fake gait silhouettes that is recognized as the victim by an unknown gait recognition system Y.

The contributions of this paper are summarized as follows. First, to the best of our knowledge, this is the first work focusing on a method of DNN-based gait spoofing and analyzing its risk. Second, to achieve gait spoofing, we introduce a novel concept named “master gait”, which is master-key-like gait data that can be accepted by multiple gait verification systems. The concept of master gait is utilized to complement the limited information of a single photo. We explain its details in Section 3.

2 RELATED WORK

2.1 Spoofing Attacks Against Biometrics

Methods of spoofing attacks against face recognition and voice authentication have been actively studied for more than fifteen years. One of the simplest attacks is a presentation attack. For a face recognition system equipped with a camera, an attacker can fool it by presenting a photo or a video of some valid user (Patel et al., 2016; Anjos et al., 2014). Similarly, for a voice authentication system equipped with a microphone, the attacker can fool it by replaying pre-recorded voice of some valid user (Cheng and Roedig, 2022).

VISAPP 2023 - 18th International Conference on Computer Vision Theory and Applications

Conducting a presentation attack requires some “spoof data”, i.e., a photo or pre-recorded voice of a valid user. For the face, spoof data can in some cases be retrieved from social networking services such as Facebook or Instagram (Kumar et al., 2017). In contrast, for the voice, spoof data cannot always be easily obtained. Hence, speech synthesis techniques are often exploited to make spoof data. A voice spoofing attack using such synthetic data is called a voice synthesis attack. The speech synthesis techniques used for this attack are divided into two categories: voice conversion (VC) (Liu et al., 2018) and text-to-speech (TTS) (Tu et al., 2019). VC is a technique to convert a source speaker’s voice into a target speaker’s voice without changing its linguistic information. TTS is a technique to convert an arbitrary plaintext into spoken words in a certain target speaker’s voice. By applying these techniques to his own voice or to plaintext data to convert it into a valid user’s voice, the attacker can obtain spoof data (Kreuk et al., 2018; Zhang et al., 2021). Multimedia generation techniques are also exploited for face spoofing. Nowadays, not only 2D images/videos but also 3D volume data of faces can be generated by GANs (Toshpulatov et al., 2021), and such data are at risk of being exploited for face spoofing (Galbally and Satta, 2015).

To defeat the above spoofing methods, anti-spoofing methods for face and voice authentication have also been actively studied. For instance, some previous work tried to discriminate computer-generated or GAN-generated face images from real face images (Conotter et al., 2014; Nguyen et al., 2015). Recently, convolutional neural networks (CNNs) have often been used for face anti-spoofing. For instance, Chen et al. found that the luminance component of face images is helpful for detecting GAN-generated faces and proposed to use YCbCr images in addition to RGB images as input to a CNN (Chen et al., 2022). For voice, pop noise-based anti-spoofing methods are well studied (Shiota et al., 2015; Wang et al., 2019). When a human speaks into a microphone, his/her breath sometimes reaches the microphone, which yields a pop noise. Such noise is difficult to generate naturally, even with GANs. Therefore, pop noises are a good clue for voice anti-spoofing.

As discussed above, many spoofing and anti-spoofing methods have been studied for face recognition and voice authentication. In contrast, for gait recognition systems, spoofing attacks with synthetic data have not been studied yet. Thus, in this paper, we focus on gait spoofing utilizing CNN-based multimedia generation techniques.

2.2 Wolf Attacks by “Master” Samples

The purpose of spoofing attacks is to create a fake biometric sample (e.g., a face) that is similar to a target person and dissimilar to all other people. However, researchers in the field of biometrics found that it is possible to create a single fake sample that is similar to two or more people. This is called a “wolf sample”, and attacks against biometric verification systems based on a wolf sample are called “wolf attacks” (Une et al., 2007). Suppose that there are two or more people, each of whom has their own biometric verification system. Each verification system is a two-class classifier that predicts whether an input biometric sample is from its owner or not. An attacker can simultaneously fool many of these systems by using a single wolf sample, which plays the role of a master key. Since this is a serious problem, methods of wolf attacks and their countermeasures have been studied. For instance, Ohki et al. evaluated the feasibility of wolf attacks against voice verification systems (Ohki et al., 2012). Nguyen et al. proposed a GAN-based method for generating a wolf sample against face recognition systems (Nguyen et al., 2020). They refer to the wolf samples generated by their method as “master faces”.

Although there is no previous work focusing on wolf attacks against gait recognition systems, we believe that the characteristics of wolf samples are helpful for conducting gait spoofing. Thus, we introduce the concept of “master gait” in the proposed method, which is explained in the next section.

3 GAIT SPOOFING METHOD

3.1 Concept Definitions

In this paper, the term “gait verification” means a two-class classification task. A gait verification system focuses on only a single individual and predicts whether an input gait silhouette sequence $S = \{s_1, \cdots, s_n\}$ was genuinely captured from that individual or not, where $s_i$ is the $i$-th frame in $S$. Generally, a verification system first compresses $S$ into a single feature map $f = F(S)$ by a compressor $F$, typical examples of which are the Gait Energy Image (GEI) (Man and Bhanu, 2006) and the Frequency Domain Feature (FDF) (Makihara et al., 2006). GEI is the pixel-wise average of $\{s_i\}$, while FDF consists of the pixel-wise Fourier coefficients calculated for $\{s_i\}$. For the feature map $f$, the verification system outputs a single score $\omega(f) = \omega(F(S)) \in [0, 1]$, where the input $S$ is classified as “genuine” if and only if $\omega(f) \geq 0.5$. Based on the above, we define a “master gait” as a feature map that has a score higher than 0.5 for the gait verification systems of two or more different individuals.

Figure 2: Procedure for training gait silhouette decoder D and encoder E.
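As a rough illustration of the two compressors and the acceptance rules above, the following NumPy sketch assumes that a sequence is stored as an array of shape (n, H, W) holding silhouettes in [0, 1] over exactly one gait period. The function names are hypothetical, and keeping the first three temporal frequency components in `fdf` is an illustrative choice, not necessarily the exact configuration of Makihara et al. (2006).

```python
import numpy as np


def gei(frames: np.ndarray) -> np.ndarray:
    """Gait Energy Image: pixel-wise average of the silhouettes, (n, H, W) -> (H, W)."""
    return frames.mean(axis=0)


def fdf(frames: np.ndarray, num_freqs: int = 3) -> np.ndarray:
    """Frequency-domain feature sketch: pixel-wise DFT amplitudes along time.

    Assumes the n frames cover exactly one gait period, so DFT bin k is the
    k-th harmonic of the gait cycle. Returns shape (num_freqs, H, W).
    """
    n = frames.shape[0]
    spectrum = np.fft.fft(frames, axis=0)    # one DFT per pixel over the time axis
    return np.abs(spectrum[:num_freqs]) / n  # amplitudes of bins 0..num_freqs-1


def is_genuine(score: float) -> bool:
    """Verification rule: the sequence is accepted iff omega(f) >= 0.5."""
    return score >= 0.5


def is_master_gait(scores) -> bool:
    """Master gait: score above 0.5 for at least two individuals' verifiers."""
    return sum(1 for s in scores if s > 0.5) >= 2
```

Because silhouettes are non-negative, `fdf(frames)[0]` coincides with `gei(frames)`: the zero-frequency amplitude divided by n is simply the temporal mean.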

In contrast, the term “gait recognition” means a multi-class classification task. A gait recognition system focuses on $K$ different individuals ($K \geq 2$) and predicts from which of them an input sequence $S$ was captured. A recognition system generally outputs a $K$-dimensional score vector $\eta(f) = \eta(F(S)) \in [0, 1]^K$. If the $j$-th element of $\eta(f)$ is larger than all the other elements, the system judges that $S$ was captured from the $j$-th individual.

3.2 Overview of the Proposed Method

The shape of a single gait silhouette $s$ is determined by two factors: body shape (including the shape of clothes) and posture. Since walking is a periodic action, a human’s posture within one gait cycle can be represented by a phase value $\theta \in [0, 2\pi]$. A human’s body shape can be represented by a certain shape vector $z \in \mathbb{R}^d$, where $d$ is its dimensionality. Thus, a gait silhouette $s$ is determined by $z$ and $\theta$. Let $D$ be a decoder that generates a silhouette image $s = D(z, \theta)$ from $z$ and $\theta$.

The goal of gait spoofing is to obtain the optimal shape vector $z^*$ that maximizes $\eta_b(f(z))$, where $f(z) = F(S(z))$ is the feature map of a fake silhouette sequence $S(z) = \{D(z, \theta_i) \mid i = 1, \cdots, n\}$ generated by $D$. Here, $\eta$ is the score vector output by the checker C’s gait recognition system Y, and $b$ is the ID of the spoofing target B. The phase sequence $\Theta = \{\theta_1, \cdots, \theta_n\}$ can be given arbitrarily. Note that the attacker A does not know the network structure or the parameters of Y, but he can guess $F$ because only a few kinds of feature maps are widely used for gait recognition. In this paper, we assume FDF as the feature map extractor $F$.
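The objective can be sketched with $D$, $F$, and the recognizer passed in as plain Python callables. This is a hypothetical illustration only: in the attack these components are trained networks, $\eta$ belongs to the unknown system Y and therefore cannot actually be evaluated by the attacker, and the evenly spaced phase sequence is just one admissible choice of $\Theta$.

```python
import math
from typing import Callable, List, Sequence


def uniform_phases(n: int) -> List[float]:
    """One admissible choice of Theta: n phases evenly spaced over [0, 2*pi)."""
    return [2.0 * math.pi * i / n for i in range(n)]


def spoofing_objective(z, b: int, decoder: Callable, compressor: Callable,
                       recognizer: Callable, phases: Sequence[float]) -> float:
    """Return eta_b(f(z)), where f(z) = F(S(z)) and S(z) = {D(z, theta_i)}."""
    fake_sequence = [decoder(z, theta) for theta in phases]  # S(z)
    feature_map = compressor(fake_sequence)                  # f(z) = F(S(z))
    return recognizer(feature_map)[b]                        # b-th element of eta
```

For instance, with a toy scalar "decoder" `lambda z, th: z + math.cos(th)`, a mean compressor, and a two-class recognizer `lambda f: [1 - f, f]`, `spoofing_objective(0.8, 1, ...)` evaluated over `uniform_phases(8)` returns approximately 0.8, because the cosine terms average out over a full cycle.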

To obtain $z^*$, the attacker can use a single photo of the target B, as we assumed in Section 1. Let $p$ be the silhouette extracted from the photo. The simplest way to find $z^*$ is to train a phase-free shape encoder $E$ that can extract $z$ from $D(z, \theta)$ as $z = E(D(z, \theta))$, by which we can get $z^*$ as $z^* = E(p)$. However, this strategy can hardly provide a good $z^*$ in practice, because a single silhouette $p$ does not have full information about B’s gait characteristics. Hence, there is a certain extraction error $\Delta z$ between $\tilde{z} = E(p)$ and $z^*$, i.e., $\tilde{z} = z^* + \Delta z$. This $\Delta z$ needs to be estimated for gait spoofing.

In summary, the process of gait spoofing is as follows; we describe how to achieve steps (1) and (3) in Subsections 3.3 and 3.4, respectively.

(1) The attacker first trains $D$ and $E$ using his own gait silhouette database.

(2) With the trained $E$, he gets $\tilde{z} = E(p)$ using a photo of the target person B.

(3) Next, he optimizes $\tilde{z}$ to $z^* = \tilde{z} - \Delta z$ by estimating $\Delta z$.

(4) Finally, he generates a fake silhouette sequence $\{D(z^*, \theta_i) \mid i = 1, \cdots, n\}$ by the trained $D$ and an arbitrarily given $\Theta = \{\theta_1, \cdots, \theta_n\}$.
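Steps (2) to (4) can be chained as in the sketch below, which assumes that step (1) has already produced trained callables for $E$ and $D$ and that a masterization routine (Subsection 3.4) is available; the function and parameter names are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def spoof_from_photo(p, encoder: Callable, masterize: Callable,
                     decoder: Callable, phases: Sequence[float]) -> Tuple[list, object]:
    """Run steps (2)-(4); returns the fake sequence and the estimated delta_z."""
    z_tilde = encoder(p)                                  # (2) z~ = E(p)
    z_star = masterize(z_tilde)                           # (3) z* = z~ - delta_z
    delta_z = z_tilde - z_star                            # implied extraction error
    fake = [decoder(z_star, theta) for theta in phases]   # (4) {D(z*, theta_i)}
    return fake, delta_z
```

Returning `delta_z` makes explicit that step (3) amounts to estimating the extraction error rather than recovering $z^*$ from the photo alone.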


3.3 Training Process of Gait Silhouette Encoder and Decoder

Figure 2 depicts the proposed procedure for training $D$ and $E$, which consists of three steps.

It is difficult to train the phase-free silhouette encoder $E$ directly. Hence, as STEP1, we first train a feature map-level encoder $E_f$ and decoder $D_f$. To this end, we compress each sequence $\hat{S}_l = \{\hat{s}_{l,1}, \cdots, \hat{s}_{l,m_l}\}$ in the attacker’s database by $F$ and obtain a feature map $\hat{f}_l = F(\hat{S}_l)$. Let $\mathcal{F} = \{\hat{f}_l \mid l = 1, \cdots, M\}$ be the set of the obtained feature maps. Using $\mathcal{F}$, we train an autoencoder whose encoder and decoder parts are $E_f$ and $D_f$, respectively. The loss function for STEP1 is

$$\mathrm{Loss}_1 = \sum_{l=1}^{M} \left\| \hat{f}_l - D_f\big(E_f(\hat{f}_l)\big) \right\|^2. \tag{1}$$

Since $z_l = E_f(\hat{f}_l)$ extracted from $\hat{f}_l$ by $E_f$ is phase-independent, it can be used as a phase-free shape vector of the silhouette image $\hat{s}_{l,i}$ for all $i \in \{1, \cdots, m_l\}$. Using these vectors, we next train the silhouette-level decoder $D$ as STEP2. The loss function for STEP2 is

$$\mathrm{Loss}_2 = \sum_{l=1}^{M} \left( \left\| \hat{f}_l - F(S'_l) \right\|^2 + \sum_{i=1}^{m_l} \left\| \hat{s}_{l,i} - D(z_l, \theta_{l,i}) \right\|^2 \right), \tag{2}$$

where $S'_l = \{D(z_l, \theta_{l,i}) \mid i = 1, \cdots, m_l\}$. The phase value $\theta_{l,i}$ of each image $\hat{s}_{l,i}$ is calculated by our previous method (Hirose et al., 2019). Finally, as STEP3, we train the phase-free silhouette encoder $E$, whose loss function is

$$\mathrm{Loss}_3 = \sum_{l=1}^{M} \sum_{i=1}^{m_l} \left\| E(\hat{s}_{l,i}) - z_l \right\|^2. \tag{3}$$
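The three training objectives can be written down directly as functions. This NumPy sketch assumes squared L2 norms with no normalization constant and treats the feature map-level encoder/decoder, the silhouette-level encoder/decoder, and $F$ as arbitrary callables; it only evaluates the losses and does not include the network training itself. The argument names (`phases` holding the precomputed $\theta_{l,i}$, `shape_vectors` holding $z_l$) are hypothetical.

```python
import numpy as np


def sq_norm(x) -> float:
    """Squared L2 norm of an array of any shape (assumed loss metric)."""
    return float(np.sum(np.square(x)))


def loss1(feature_maps, enc_f, dec_f) -> float:
    """STEP1: autoencoder reconstruction error of the feature maps f_l."""
    return sum(sq_norm(f - dec_f(enc_f(f))) for f in feature_maps)


def loss2(sequences, phases, feature_maps, shape_vectors, dec, compress) -> float:
    """STEP2: feature-map term plus frame-wise term for the decoder D."""
    total = 0.0
    for frames, thetas, f, z in zip(sequences, phases, feature_maps, shape_vectors):
        s_prime = np.stack([dec(z, th) for th in thetas])    # S'_l
        total += sq_norm(f - compress(s_prime))               # ||f_l - F(S'_l)||^2
        total += sum(sq_norm(s - s_hat) for s, s_hat in zip(frames, s_prime))
    return total


def loss3(sequences, shape_vectors, enc) -> float:
    """STEP3: the phase-free encoder E should map every frame of sequence l to z_l."""
    return sum(sq_norm(enc(s) - z)
               for frames, z in zip(sequences, shape_vectors) for s in frames)
```

Computing $S'_l$ once and reusing it for both terms of the STEP2 loss avoids decoding each frame twice.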

3.4 Optimization of Gait Shape Vector Using Master Gait

The shape vector $\tilde{z} = E(p)$ obtained by the encoder $E$ includes an extraction error $\Delta z$. Due to this error, $\tilde{z}$ does not sufficiently retain the characteristics of the spoofing target B. Hence, the attacker has to emphasize these characteristics.

Here, suppose that the attacker trains a gait verification system for each individual in his database. Let $X_j$ ($j = 1, \cdots, K$) be the $j$-th individual’s verification system and let $\omega_j$ be the output score of $X_j$, where $K$ is the number of individuals in the attacker’s database. By inputting the feature map $f = D_f(\tilde{z})$ into $X_j$ for all $j$, the attacker obtains a set of scores $\{\omega_j(D_f(\tilde{z})) \mid j = 1, \cdots, K\}$. Since the target B is not among the individuals in the attacker’s database, all of these scores are expected to be less than 0.5. However, if the database is large enough, it is likely to contain some individuals who are somewhat similar to B, so some elements of the score set are larger than the others. This pattern represents the target B’s gait characteristics. In the proposed method, we emphasize these characteristics by perturbing $\tilde{z}$ so that the relatively large elements become even larger and the other elements become smaller. The perturbation result is used as $z^*$, which satisfies $\omega_j(D_f(z^*)) > 0.5$ for two or more values of $j$. This means that $D_f(z^*)$ behaves as a master gait; thus, we refer to the above process as the “masterization” of $\tilde{z}$.

The concrete process of the masterization is as follows (see also Figure 3). First, the attacker trains $X_1, \cdots, X_K$ using his own database. Next, he decodes the shape vector $\tilde{z} = E(p)$ into the feature map $D_f(\tilde{z})$, inputs it into each $X_j$, and obtains a score vector

$$\omega(\tilde{z}) = \big(\omega_1(D_f(\tilde{z})), \cdots, \omega_K(D_f(\tilde{z}))\big)^{\top} \in [0, 1]^K. \tag{4}$$

Then, he finds the top-$N$ elements of $\omega(\tilde{z})$ and makes an $N$-hot vector $h_N = (h_{N,1}, \cdots, h_{N,K})^{\top} \in \{0, 1\}^K$: each element of $h_N$ is set to 1 if and only if its corresponding element of $\omega(\tilde{z})$ is among the top-$N$ ones, and to 0 otherwise. After that, he calculates the binary cross entropy between $\omega(\tilde{z})$ and $h_N$, i.e.,

$$-\sum_{j=1}^{K} \Big\{ h_{N,j} \log \omega_j(D_f(\tilde{z})) + (1 - h_{N,j}) \log\big(1 - \omega_j(D_f(\tilde{z}))\big) \Big\}, \tag{5}$$

and minimizes it with respect to $\tilde{z}$ to find the optimal $z^*$. The minimization is performed by a gradient descent algorithm. This process is equivalent to estimating $\Delta z$ as $\Delta z = \tilde{z} - z^*$.
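The masterization loop can be sketched in NumPy. Because the real scores $\omega_j(D_f(z))$ come from trained networks, this sketch swaps in a hypothetical differentiable stand-in, `toy_scores` = sigmoid(W z + c), so that the gradient of the binary cross entropy in Eq. (5) with respect to $z$ has the closed form $\sum_j (\omega_j - h_{N,j}) W_j$. The N-hot target is fixed once from the initial scores, as described above; the learning rate and number of steps are arbitrary illustrative values, not settings from the experiments.

```python
import numpy as np


def toy_scores(z, W, c):
    """Hypothetical stand-in for omega(z~): sigmoid(W @ z + c), one entry per X_j."""
    return 1.0 / (1.0 + np.exp(-(W @ z + c)))


def n_hot(scores: np.ndarray, n: int) -> np.ndarray:
    """h_N: 1 at the positions of the top-N scores, 0 elsewhere."""
    h = np.zeros_like(scores, dtype=float)
    h[np.argsort(scores)[-n:]] = 1.0
    return h


def bce(scores: np.ndarray, h: np.ndarray, eps: float = 1e-12) -> float:
    """Binary cross entropy between the score vector and h_N (Eq. (5))."""
    s = np.clip(scores, eps, 1.0 - eps)
    return float(-np.sum(h * np.log(s) + (1.0 - h) * np.log(1.0 - s)))


def masterize(z_tilde, W, c, n, lr=0.1, steps=200):
    """Gradient-descent masterization of z~ under the toy score model."""
    h = n_hot(toy_scores(z_tilde, W, c), n)  # target fixed from omega(z~)
    z = np.array(z_tilde, dtype=float)       # copy, so z~ is left untouched
    for _ in range(steps):
        # For sigmoid outputs, dBCE/dz = sum_j (omega_j - h_j) * W[j].
        z -= lr * (W.T @ (toy_scores(z, W, c) - h))
    return z, h
```

With `W = np.eye(3)`, `c = np.full(3, -2.0)` and `z_tilde = np.array([0.5, 0.3, 0.0])`, all three toy scores start below 0.5; with `n=2` the loop pushes the first two above 0.5 and the third further down, so the toy result satisfies the master-gait condition, and the estimated error is `z_tilde - z_star`.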

4 EXPERIMENTS

4.1 Experimental Setup

To examine the performance of the proposed method, we conducted an experiment using the OU-ISIR Gait Database (Makihara et al., 2012). This database has several subsets, two of which, called “treadmill-(A)” and “treadmill-(B)”, were used. Treadmill-(A) consists of 612 gait silhouette sequences of 34 individuals (18 sequences per individual), while treadmill-(B) consists of 2176 sequences of 68 individuals (32 sequences per individual). In our experiment, we used treadmill-(A) to construct the checker C’s gait recognition system Y and treated treadmill-(B) as the attacker A’s database.

Figure 3: Updating procedure of $\tilde{z}$ by masterization.

Figure 4: Network structure of gait recognition system Y. “Conv” means a convolutional layer, where “KS” and “Ch” are its kernel size and number of channels. “FC” means a fully-connected layer, where “n” is the number of units in it. “⊗” is pixel-wise multiplication.

We trained Y as a DNN, whose network structure is shown in Figure 4. After training Y, we selected a single frame from each sequence in treadmill-(A) and used it as the photo of the spoofing target B. From the photo, we generated a fake gait silhouette sequence and fed it into Y to check whether it was correctly recognized. We repeated this process for every frame in treadmill-(A) and finally evaluated the recognition accuracy. Higher accuracy is desirable for the attacker, since it means a higher success rate of gait spoofing.
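The evaluation protocol reduces to counting how often Y assigns a fake sequence to the intended target. In the sketch below, `generate` (photo to fake sequence, i.e., the whole spoofing pipeline) and `recognize` (the score vector $\eta$ of Y) are hypothetical callables, and `trials` pairs each photo with the ID of the person it was taken from.

```python
from typing import Callable, Iterable, Sequence, Tuple


def spoofing_accuracy(trials: Iterable[Tuple[object, int]],
                      generate: Callable,
                      recognize: Callable[..., Sequence[float]]) -> float:
    """Fraction of fake sequences whose top-scoring class in eta equals the target ID."""
    total = correct = 0
    for photo, target_id in trials:
        scores = recognize(generate(photo))
        predicted = max(range(len(scores)), key=lambda j: scores[j])  # argmax of eta
        correct += int(predicted == target_id)
        total += 1
    return correct / total if total else 0.0
```

Ties in $\eta$ are resolved in favor of the lowest index here, which is a simplification of the "larger than all the other elements" rule.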

The silhouette-level encoder $E$ and decoder $D$, the feature map-level encoder $E_f$ and decoder $D_f$, and the gait verification systems $\{X_j\}$ were trained as DNNs with the attacker’s database, namely treadmill-(B). The network structures of these DNNs are shown in Figure 5, where we set the dimensionality of the shape vector $z \in \mathbb{R}^d$ to 16, i.e., $d = 16$.

4.2 Results and Discussions

Figure 6 shows the results of the experiment under various settings of $N$. The red solid line indicates the recognition accuracy of Y when it was fed the fake gait silhouette sequences generated by the proposed method. The blue dashed line indicates the recognition accuracy without the masterization. Compared to the dashed line, we obtained better accuracy when $N = 1$ and $N = 3$. This result demonstrates the effectiveness of the masterization as a technique for gait spoofing attacks. On the other hand, when $N \geq 5$, the gait recognition accuracy is seriously degraded by the masterization. The purpose of the masterization is to enlarge the relatively large elements in the score set $\{\omega_j(D_f(\tilde{z})) \mid j = 1, \cdots, K\}$. However, most of these scores are small, since the spoofing target person is not among the individuals in the attacker’s database. Therefore, even the fourth or fifth largest value in the score set is quite small, at least in this experiment, and enlarging such values is not effective for gait spoofing. Based on the above consideration, the best setting of $N$ presumably depends on the size of the attacker’s database. We will try to find the relationship between them in our future work.

Figure 7 shows some examples of fake gait silhouettes generated by the proposed method as well as those generated without masterization. We can see that the generated silhouettes lose their shape when $N = 20$. On the other hand, the silhouettes generated with relatively small $N$ (e.g., $N = 3$) keep a natural appearance. These results indicate that the proposed method does not introduce serious distortion into the generated fake gait silhouettes when $N$ is set appropriately. In the case without masterization, the arm regions in the silhouettes are not generated well, because the input photo does not contain information about the arm shape. Nevertheless, the proposed method with $N = 3$ generates the arm regions more naturally. This is one reason why the proposed method achieves higher accuracy than the variant without masterization in Figure 6.

Figure 5: Network structure of $E$, $D$, $E_f$, $D_f$, and $X_j$. “Deconv” means a transposed convolutional layer. “⊕” means the concatenation operator.

Figure 6: Recognition accuracy of gait recognition system Y under various settings of $N$.

Figure 7: Examples of fake gait silhouettes generated by the proposed method.

5 CONCLUSION

In this paper, we focused on the risk of gait spoofing and proposed a method for generating a fake gait silhouette sequence of a target person from only his/her single photo. In general, a single photo does not have full information about the gait characteristics of its owner. Hence, it is not enough for the attacker to just extract a feature vector from the photo. To solve this problem, we proposed to emphasize the gait characteristics of the target person by masterization of the feature vector before decoding it into a silhouette sequence. In our experiment, we found that the gait recognition accuracy with the generated fake sequences increased from 69% to 78% with the masterization. This indicates a non-negligible risk of gait spoofing. In future work, we will further investigate the possibility of gait spoofing and try to propose countermeasures against it.

This work was supported by JSPS KAKENHI under Grant JP21J11069 and JST CREST under Grant JPMJCR20D3.


REFERENCES

Anjos, A., Chakka, M. M., and Marcel, S. (2014). Motion-based counter-measures to photo attacks in face recognition. IET Biometrics, 3(3):147–158.

Chen, B., Liu, X., Zheng, Y., Zhao, G., and Shi, Y. (2022). A robust GAN-generated face detection method based on dual-color spaces and an improved Xception. IEEE Trans. on Circuits and Systems for Video Technology, 32(6):3527–3538.

Cheng, P. and Roedig, U. (2022). Personal voice assistant security and privacy: A survey. Proceedings of the IEEE, 110(4):476–507.

Conotter, V., Bodnari, E., Boato, G., and Farid, H. (2014). Physiologically-based detection of computer generated faces in video. In Proc. 21st IEEE Int’l Conf. on Image Processing, pages 248–252.

Gafurov, D., Snekkenes, E., and Bours, P. (2007). Spoof attacks on gait authentication system. IEEE Trans. on Information Forensics and Security, 2(3):491–502.

Galbally, J. and Satta, R. (2015). Three-dimensional and two-and-a-half-dimensional face recognition spoofing using three-dimensional printed models. IET Biometrics, 5(2):83–91.

Hadid, A., Ghahramani, M., Bustard, J., and Nixon, M. (2013). Improving gait biometrics under spoofing attacks. In Proc. 17th IAPR Int’l Conf. on Image Analysis and Processing, pages 1–10.

Hadid, A., Ghahramani, M., Kellokumpu, V., Pietikäinen, M., Bustard, J., and Nixon, M. (2012). Can gait biometrics be spoofed? In Proc. 21st IAPR Int’l Conf. on Pattern Recognition, pages 3280–3283.

He, Z., Wang, W., Dong, J., and Tan, T. (2020). Temporal sparse adversarial attack on sequence-based gait recognition. arXiv:2002.09674.

Hirose, Y., Nakamura, K., Nitta, N., and Babaguchi, N. (2019). Anonymization of gait silhouette video by perturbing its phase and shape components. In Proc. 11th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pages 1679–1685.

Kammoun, A., Slama, R., Tabia, H., Ouni, T., and Abid, M. (2022). Generative adversarial networks for face generation: A survey. ACM Computing Surveys, 37 pages.

Kreuk, F., Adi, Y., Cisse, M., and Keshet, J. (2018). Fooling end-to-end speaker verification with adversarial examples. In Proc. 2018 IEEE Int’l Conf. on Acoustics, Speech and Signal Processing, pages 1962–1966.

Kumar, S., Singh, S., and Kumar, J. (2017). A comparative study on face spoofing attacks. In Proc. 2017 Int’l Conf. on Computing, Communication and Automation, pages 1104–1108.

Liu, L., Ling, Z., Jiang, Y., Zhou, M., and Dai, L. (2018). WaveNet vocoder with limited training data for voice conversion. In Proc. INTERSPEECH 2018, pages 1983–1987.

Makihara, Y., Mannami, H., Tsuji, A., Hossain, M. A., Sugiura, K., Mori, A., and Yagi, Y. (2012). The OU-ISIR gait database comprising the treadmill dataset. IPSJ Trans. on Computer Vision and Applications, 4:53–62.

Makihara, Y., Sagawa, R., Mukaigawa, Y., Echigo, T., and Yagi, Y. (2006). Gait recognition using a view transformation model in the frequency domain. In Proc. European Conf. on Computer Vision, pages 151–163.

Man, J. and Bhanu, B. (2006). Individual recognition using gait energy image. IEEE Trans. on Pattern Analysis and Machine Intelligence, 28(2):316–322.

Maqsood, M., Ghazanfar, M. A., Mehmood, I., Hwang, E., and Rho, S. (2022). A meta-heuristic optimization based less imperceptible adversarial attack on gait based surveillance systems. Journal of Signal Processing Systems, 23 pages.

Nguyen, H., Nguyen-Son, H., Nguyen, T., and Echizen, I. (2015). Discriminating between computer-generated facial images and natural ones using smoothness property and local entropy. In Proc. 14th Int’l Workshop on Digital Forensics and Watermarking, pages 39–50.

Nguyen, H. H., Yamagishi, J., Echizen, I., and Marcel, S. (2020). Generating master faces for use in performing wolf attacks on face recognition systems. In Proc. 2020 IEEE Int’l Joint Conf. on Biometrics, 10 pages.

Ohki, T., Hidano, S., and Takehisa, T. (2012). Evaluation of wolf attack for classified target on speaker verification systems. In Proc. 12th Int’l Conf. on Control Automation Robotics and Vision, pages 182–187.

Patel, K., Han, H., and Jain, A. K. (2016). Secure face unlock: Spoof detection on smartphones. IEEE Trans. on Information Forensics and Security, 11(10):2268–2283.

Shiota, S., Villavicencio, F., Yamagishi, J., Ono, N., Echizen, I., and Matsui, T. (2015). Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification. In Proc. 16th Annual Conf. of the Int’l Speech Communication Association, pages 239–243.

Toshpulatov, M., Lee, W., and Lee, S. (2021). Generative adversarial networks and their application to 3D face generation: A survey. Image and Vision Computing, 108, 18 pages.

Tu, T., Chen, Y., Yeh, C., and Lee, H. (2019). End-to-end text-to-speech for low-resource languages by cross-lingual transfer learning. In Proc. INTERSPEECH 2019, 5 pages.

Une, M., Otsuka, A., and Imai, H. (2007). Wolf attack probability: A new security measure in biometric authentication systems. In Proc. Int’l Conf. on Biometrics, pages 396–406.

Wang, Q., Lin, X., Zhou, M., Chen, Y., Wang, C., Li, Q., and Luo, X. (2019). VoicePop: A pop noise based anti-spoofing system for voice authentication on smartphones. In Proc. IEEE Conf. on Computer Communications, pages 2062–2070.

Zhang, Y., Jiang, F., and Duan, Z. (2021). One-class learning towards synthetic voice spoofing detection. IEEE Signal Processing Letters, 28:937–941.
