Towards an Automatic System for Generating Synthetic and Representative Facial Data for Anonymization
Natália F. de C. Meira (https://orcid.org/0000-0002-7331-6263), Ricardo C. C. de M. Santos, Mateus C. Silva (https://orcid.org/0000-0003-3717-1906), Eduardo J. da S. Luz and Ricardo A. R. Oliveira (https://orcid.org/0000-0001-5167-1523)
Department of Computer Science, Federal University of Ouro Preto, Ouro Preto, Brazil
Keywords:
GANs, Privacy, Data Anonymization, Representative Data, Face Swap, Deep Learning.
Abstract:
Deep learning models based on autoencoders and generative adversarial networks (GANs) have enabled increasingly realistic face-swapping tasks. Surveillance cameras that detect people and faces to monitor human behavior are becoming more common. Training AI models for these detection and monitoring tasks requires large sets of facial data that represent ethnic, gender, and age diversity. In this work, we propose the use of generative facial manipulation techniques to build a new representative data augmentation set to be used in deep learning training for tasks involving the face. At the stage presented here, we implemented one of the most popular face-swapping architectures to demonstrate an application for anonymizing personal data and generating synthetic data with images of drivers' faces during their work activity. Our case study generated synthetic facial data from a driver at work. The results were convincing in face replacement and preservation of the driver's expression.
1 INTRODUCTION
Surveillance cameras that utilize pre-trained AI models are common in industrial, commercial, and residential environments (Morishita et al., 2021; Eldrandaly et al., 2019). These cameras are used for various applications such as fatigue detection (Sikander and Anwar, 2018), personal protective equipment detection (Nath et al., 2020), and human behavior classification to identify behavior that is inappropriate or dangerous for the individual (Ahmed and Echi, 2021; Liu et al., 2021; Alajrami et al., 2019).
Face-swapping technology has evolved with a variety of specialized approaches and techniques. Currently, the most common approaches are face swapping and synthesized aging or rejuvenation while maintaining the person's identity. The work of Yu et al.
(Yu et al., 2021) divides facial manipulation algo-
rithms into two types: face swapping and face reen-
actment.
The models behind these face-swap tasks are based on Autoencoders (AEs) and Generative Adversarial Networks (GANs). Generative models
can potentially learn any data distribution in an un-
supervised way (Pavan Kumar and Jayagopal, 2021).
The rapid advancement of these techniques has raised questions about privacy and the manipulation of personal data for fraud, scams, and other unethical and malicious applications. Mirsky and Lee (Mirsky and Lee, 2021) reinforce the need to advance the detection of false content in order to develop countermeasures against it.
In this work, we propose the use of generative facial manipulation techniques to construct a new data augmentation set. Facial manipulation techniques for entertainment and advertising attract attention and are becoming increasingly popular in online communities (Yu et al., 2021). With our automatic method, the user will be able to generate new representative deepfake faces by following the automatic steps shown in Figure 1.
Based on the discussion presented, we intend to
develop an automatic method for generating synthetic
data of representative faces for use with embedded
cameras in industrial scenarios. The use of open mod-
els is a requirement. Therefore, the main objective of
this text is:
To present a method for generating representative synthetic datasets for work activities based on face swapping with Generative Adversarial Networks.
Figure 1: Representation of the automatic system for generating representative synthetic datasets.
We propose to develop this work through the fol-
lowing steps:
Step 1 – Generate representative deepfake faces with ethnic, racial, gender, and age diversity to avoid biases in facial datasets;
Step 2 – Provide one or more videos of the studied situation to the model;
Step 3 – Train the model to obtain replicas of the provided videos with the faces swapped. These videos will provide a representative, anonymized synthetic facial dataset for training models for tasks involving facial datasets.
This work explores a case study of monitoring the work activity of drivers of a transport company, with cameras embedded onboard the vehicle, to obtain a proof of concept for Step 2 and Step 3. The case study consists of replacing the face of the individual during the performance of their work through the deepfake technique using a GAN.
The main contribution of this work is to present a proof of concept as an alternative for generating synthetic, anonymized data that is representative of employees during the execution of their work activity.
This work is structured as follows: in Section 1, we presented the problem and the main contributions of the work. In Section 2, we provide a theoretical review of deepfake algorithms, metrics for evaluating GANs, and challenges in detecting and creating fake content. In Section 3, we present some works related to people detection and ethics in AI. In Section 4, we present the implemented methodology, with the results achieved in Section 5. Finally, the conclusions are in Section 6.
2 THEORETICAL BACKGROUND
In this section, we present some concepts and research
that have advanced in the context of deepfakes: evolv-
ing approaches, face swap approaches, metrics for
evaluating deepfakes, and recent discussions on false
content detection and future challenges.
2.1 Popularization of Face Swap
Frameworks: DeepFaceLab
Most deepfakes are created using variations or com-
binations of generative adversarial networks and
encoder-decoder networks (Mirsky and Lee, 2021;
Yu et al., 2021). Among these, the GANs intro-
duced by Goodfellow et al. (Goodfellow et al., 2014)
have gained particular attention for high-quality image generation and data augmentation tasks (Pavan Kumar and Jayagopal, 2021).
Facial manipulation algorithms gained prominence in 2017, with the publication of a deepfake video posted by a Reddit user and, in the same year, videos of former President Barack Obama (Suwajanakorn et al., 2017; Kumar et al., 2017; Jalalifar et al., 2018).
Then, deepfake algorithms for facial swap and
synthesis became popular, such as Fast Face-
Swap (Korshunova et al., 2017), Faceswap-GAN
(Shaoanlu, 2018), and DeepFaceLab (Perov et al.,
2020).
DeepFaceLab (DFL) (Perov et al., 2020), available at https://github.com/iperov/DeepFaceLab, is an open-source deepfake project used to create facial manipulation videos. The three phases of the DFL pipeline are extraction, training, and conversion:
Extraction – extracts the faces from the source (src) and destination (dst) sets; it performs face detection, face alignment, and face segmentation;
Training – trains one of two model structures, DF or LIAE (described in Section 4.2.2);
Conversion – swaps the face from src to dst and vice versa.
At the time of its release, DFL had competitive re-
sults with other face swap frameworks in experiments
under the same conditions. Currently, DFL has a sup-
port community for users’ implementations and ques-
tions.
2.2 Metrics for GANs
The work of Kumar and Jayagopal (Pavan Kumar and Jayagopal, 2021) organized the main evaluation metrics for GANs. The authors found that no standardized metrics have been developed to evaluate GANs. In addition, most evaluation proposals rely on a distance function to measure the distance between the actual distribution and the generated distribution. Some of the metrics presented were:
Nearest Neighbor Classifier (1-NN) – a version of the classifier two-sample test (C2ST) rather than an evaluation metric in itself; it aims to verify the similarity between the distribution of real data and the distribution of generated data (a minimal sketch is given at the end of this subsection);
Inception Score (IS) – derived from the work of Salimans et al. (Salimans et al., 2016) and used to assess the quality and diversity of images synthesized by generative models;
Mode Score (MS) – a metric that overcomes a limitation of the IS metric by considering the prior distribution statistics to assess the quality and diversity of images (Cai et al., 2019).
Other metrics presented are maximum mean discrepancy (MMD) (Guo et al., 2020), multi-scale structural similarity for image quality (Wang et al., 2004), and the Wasserstein critic (Arjovsky et al., 2017).
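To make the 1-NN classifier test described above concrete, the following is a minimal sketch (ours, not from the cited works) using scikit-learn. It assumes real and generated images have already been mapped to feature vectors; the arrays below are synthetic stand-ins. Leave-one-out 1-NN accuracy near 0.5 suggests the generated distribution resembles the real one, while accuracy near 1.0 means the two sets are easily separable.

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def one_nn_c2st(real_feats, fake_feats):
    """Leave-one-out 1-NN accuracy between real and generated feature sets."""
    X = np.vstack([real_feats, fake_feats])
    y = np.concatenate([np.zeros(len(real_feats)), np.ones(len(fake_feats))])
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y,
                             cv=LeaveOneOut())
    return scores.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(100, 64))  # stand-in for real image features
    fake = rng.normal(0.1, 1.0, size=(100, 64))  # stand-in for generated features
    print(f"1-NN accuracy: {one_nn_c2st(real, fake):.3f}")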
2.3 False Content Detection
The work of Yu et al. (Yu et al., 2021) aimed to demonstrate the current status of deepfake video detection research. The authors cited the recent partnership of Amazon, Facebook, and Microsoft, which teamed up to host the Deepfake Detection Challenge (DFDC) to build innovative technologies for detecting deepfake videos. The authors also note that traditional methods are not suitable for detecting deepfakes.
The proposed methods for detecting deepfakes
can be divided into at least five categories:
General-network-Based Methods: detection is treated as a frame-level classification task completed by CNNs (a sketch follows after this list);
Temporal-consistency-Based Methods: several
works have implemented RNNs to detect incon-
sistencies between adjacent frames;
Visual Artifacts-Based Methods: discrepancies
that appear at the image blend boundaries;
Camera-fingerprints-Based Methods: different
devices leave different traces on captured images.
So it is possible to identify when face and back-
ground images are coming from different devices;
Biological-signals-Based Methods: approaches
based on blink rate and heart rate.
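As an illustration of the general-network-based category, here is a minimal sketch of a frame-level binary classifier built on a standard torchvision backbone. The backbone choice, input size, and preprocessing are illustrative assumptions, not a specific published detector.

import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms

def build_frame_detector():
    """Binary frame-level classifier: class 0 = real, class 1 = fake."""
    backbone = models.resnet18(weights=None)   # any CNN backbone would do here
    backbone.fc = nn.Linear(backbone.fc.in_features, 2)
    return backbone

# Typical preprocessing for a single RGB video frame (H x W x 3, uint8).
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

if __name__ == "__main__":
    model = build_frame_detector().eval()
    frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # dummy frame
    x = preprocess(frame).unsqueeze(0)         # shape: (1, 3, 224, 224)
    with torch.no_grad():
        prob_fake = torch.softmax(model(x), dim=1)[0, 1].item()
    print(f"probability that the frame is fake: {prob_fake:.3f}")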
Other works aimed to use multi-task learning (Nguyen et al., 2019), attention mechanisms (Dang et al., 2020), and methods based on visual artifacts (Li and Lyu, 2018), among others.
2.4 Challenges in Creating Deepfakes
The works of Mirsky and Lee (Mirsky and Lee, 2021), Yu et al. (Yu et al., 2021), and Kumar and Jayagopal (Pavan Kumar and Jayagopal, 2021) pointed out some of the challenges in creating and detecting realistic deepfakes:
Generalization – deepfakes are data-driven and reflect the training data in the output, especially for disparate data distributions;
Identity Leakage – sometimes the driver's identity is partially transferred to the final deepfake image. This problem is recurrent in facial image synthesis;
Occlusions – occur when part of the source or target image is obstructed by a hand, hair, glasses, or any other item;
Temporal Coherence – deepfake videos often produce more obvious artifacts, with oscillations and irregular variations;
Trade-Off – between network complexity and detection performance;
Robustness – detecting fake material is more difficult in compressed videos;
Trap for the Generator – if the generator is not as good as the discriminator, the discriminator always differentiates between real and fake data. This can be caused by gradient leakage from the generators;
Mode Collapse Problem of GANs – the generator tries to over-optimize the discriminator at each epoch of GAN training. If the discriminator is caught in a local-minimum trap and always rejects all instances of its inputs, the generator keeps generating the same set of instances;
Convergence Problem – sometimes GAN training does not converge due to irregularities in the model structure, hyperparameter tuning, or training strategies.
These challenges point out how research still
needs to advance in developing and detecting quality
deepfake videos.
3 RELATED WORKS
In this section, we present some works that reinforce the thesis about the importance of surveillance and inspection in production and everyday environments today. These systems are commonly trained with AI to detect people in such environments.
Luo et al. (Luo et al., 2020) addressed the lim-
itations of surveillance in hazardous areas and pro-
posed a real-time video surveillance system to detect
people and industrial plant status in a hazardous area.
The proposed system accurately recognized people
and provided instant feedback on unsafe behavior.
Nguyen et al. (Nguyen et al., 2019) proposed a
method to solve camera surveillance systems (CSS)
problems, such as storage capacity and bandwidth
consumption. This issue happens due to the high de-
mand from various industries for smart manufactur-
ing that needs to monitor for abnormal behavior or
perceivable objects.
Boudjit & Ramzan (Boudjit and Ramzan, 2022)
presented the research progress in developing applica-
tions for identifying and detecting people using drone
camera-based convolutional neural networks (CNN).
The authors used different people in the tests to vali-
date the application’s performance.
Gorospe et al. (Gorospe et al., 2021) compared the
performance of different DL methods in a human de-
tection case. The authors also developed an environ-
ment to be embedded in any general-purpose embed-
ded system, aiming at an edge computing system. The
authors used the Visual wake words (VWW) dataset
for binary classification to predict whether there is a
person in the image.
Zhao et al. (Zhao et al., 2019) designed EagleEye, an efficient face detector with high accuracy and speed, to be implemented in popular embedded devices with low computing power. The authors presented ablation studies, and EagleEye runs on an ARM Cortex-A53 (Raspberry Pi 3B+) embedded device at 21 FPS with VGA-resolution input, with better accuracy than methods of the same order of computational complexity.
Yang et al. (Yang et al., 2020) also focused on
the challenge of matching facial recognition perfor-
mance and speed in an embedded environment. The
authors designed and implemented a method based
on the MTCNN algorithms for feature extraction and
on the RKNN model to quantify the acceleration of
feature extraction. The result was facial recognition
processing on the RK3399Pro platform, meeting real-
time requirements.
Lv et al. (Lv et al., 2021) used the MTCNN
and LCNN algorithms to design an embedded fa-
cial recognition system. The authors significantly
improved the face detection and recognition perfor-
mance of the embedded system.
These are just a few examples of applications that
monitor and detect individuals. People detection sys-
tems must deal with factors such as ethnic diversity,
racial diversity, age range, and gender.
4 METHODOLOGY
This section presents the methodology implemented for the dataset's configuration. We also present the architecture of the GAN used in this work (see Figure 2).
4.1 Dataset
The DFL framework fits into the one-to-one
paradigm, i.e., there are two types of data: source
(src) and destination (dst). We obtained source (src)
images from videos publicly available on YouTube.
The videos aim to cover the broadest possible range
of expressions and angles.
We composed the destination (dst) dataset with images monitoring the work activity of drivers of a transport company. The driver images were recorded inside the moving vehicle while the driver was working. The videos have a resolution of 640×480, at 28 frames per second.
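Before face extraction, the recorded videos must be broken into individual frames. A minimal OpenCV sketch of this step is shown below; the file path and sampling interval are hypothetical and not the actual configuration used in this work.

import os
import cv2

def extract_frames(video_path, out_dir, every_n=5):
    """Save every n-th frame of a video as a JPEG and return how many were saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()            # BGR frame, e.g. 640x480 in our videos
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

if __name__ == "__main__":
    # "driver_video.mp4" is a hypothetical file name, not the actual recordings.
    print(extract_frames("driver_video.mp4", "dst_frames"))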
4.2 Pipeline
The DFL workflow consists of three sequential steps:
extraction, training, and conversion.
4.2.1 Extraction
The extraction consists of extracting the face from the src and dst data. We used the whole-face extraction mode in all training, with the standard S3FD face detector.
Then, DFL performs the alignment. In developing DFL, the authors (Perov et al., 2020) noted that facial landmarks were vital for maintaining stability. The authors therefore implemented two canonical types of facial landmark extraction algorithms, followed by a classical method for mapping and transforming point patterns (Perov et al., 2020).
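The idea of the alignment step can be illustrated with the short sketch below, which maps detected landmarks onto a canonical template using a similarity transform. The template coordinates and the hard-coded landmarks are illustrative only and do not reproduce DFL's actual alignment; in practice the landmarks would come from a face detector.

import cv2
import numpy as np

# Illustrative canonical positions (x, y) of five landmarks in a 112x112 crop:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
TEMPLATE = np.float32([[38, 51], [74, 51], [56, 71], [41, 92], [71, 92]])

def align_face(image, landmarks, size=112):
    """Warp a face so its landmarks match the canonical template."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), TEMPLATE)
    return cv2.warpAffine(image, M, (size, size))

if __name__ == "__main__":
    img = cv2.imread("frame_000000.jpg")   # hypothetical extracted frame
    # Landmarks hard-coded here for the sketch only.
    pts = np.float32([[250, 180], [310, 178], [282, 215], [255, 250], [305, 248]])
    face = align_face(img, pts)
    cv2.imwrite("aligned_face.jpg", face)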
4.2.2 Training
In the training phase, the authors (Perov et al., 2020) proposed two structures: the DF structure and the LIAE structure. The architecture is composed of an Encoder and an Inter that share weights, and two Decoders that belong to src and dst separately (Perov et al., 2020). The DF structure can complete the face-swap task and aims to solve the unpaired problem, but it cannot inherit enough information from dst, such as lighting.
Figure 2: Overview of the extraction phase of the DF structure in DeepFaceLab (DFL), adapted from Perov et al. (Perov et al., 2020).
To address this lighting-consistency problem, the LIAE structure is more complex. This structure consists of an Encoder followed by two independent Inters: InterAB generates latent codes from both src and dst, while InterB generates latent codes from dst only (Perov et al., 2020).
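The following PyTorch sketch conveys the general idea behind the DF structure (a shared Encoder/Inter with separate src and dst Decoders). The layer sizes and depths are illustrative and do not reproduce DFL's actual architecture.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(                 # 3x64x64 face crop -> latent vector
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.1),
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, latent_dim),   # plays the role of the "Inter"
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 8 * 8)
        self.net = nn.Sequential(                 # latent vector -> 3x64x64 face
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 8, 8))

# One shared encoder, two identity-specific decoders (src and dst).
encoder, dec_src, dec_dst = Encoder(), Decoder(), Decoder()

x_src = torch.rand(2, 3, 64, 64)           # dummy batch of aligned src face crops
recon_src = dec_src(encoder(x_src))        # reconstruction used during training
swapped = dec_dst(encoder(x_src))          # at conversion: src latent, dst decoder
print(recon_src.shape, swapped.shape)      # both torch.Size([2, 3, 64, 64])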
Another strategy adopted by the authors (Perov et al., 2020) is using a weighted sum mask in the general SSIM loss to improve the quality of the generated face, for example adding more weight to the eye area than to the cheek area to generate a livelier look.
The standard loss is a mixed loss (DSSIM (structural dissimilarity) + MSE) that takes advantage of both terms: DSSIM generalizes human faces faster, while MSE provides better clarity (Perov et al., 2020).
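To illustrate the kind of mixed loss described above, here is a minimal NumPy/scikit-image sketch combining DSSIM and MSE. The weighting factor is an assumption for illustration and is not DFL's actual value.

import numpy as np
from skimage.metrics import structural_similarity  # channel_axis needs skimage >= 0.19

def mixed_loss(pred, target, alpha=0.8):
    """alpha * DSSIM + (1 - alpha) * MSE for images in [0, 1], shape (H, W, 3)."""
    ssim = structural_similarity(pred, target, channel_axis=-1, data_range=1.0)
    dssim = (1.0 - ssim) / 2.0                 # structural dissimilarity
    mse = float(np.mean((pred - target) ** 2))
    return alpha * dssim + (1.0 - alpha) * mse

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((64, 64, 3))
    pred = np.clip(target + rng.normal(0, 0.05, target.shape), 0.0, 1.0)
    print(f"mixed loss: {mixed_loss(pred, target):.4f}")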
4.2.3 Conversion
The first step of converting from src to dst is trans-
forming the generated face (see Figure 3). The face
generated with the mask is reverted to the original po-
sition of the target image. Then the face is realigned
to fit the target image’s outer contour perfectly. Fi-
nally, a super-resolution neural network improves the
sharpness and smoothing of the generated face (Perov
et al., 2020).
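The paste-back-and-blend idea of the conversion step can be sketched with OpenCV's seamless cloning, as below. The generated face, its mask, and the inverse alignment transform are assumed to come from the previous stages, and the super-resolution step is omitted.

import cv2
import numpy as np

def paste_face(frame, gen_face, mask, inv_affine):
    """Warp a generated face crop back into the frame and blend it in."""
    h, w = frame.shape[:2]
    warped_face = cv2.warpAffine(gen_face, inv_affine, (w, h))
    warped_mask = cv2.warpAffine(mask, inv_affine, (w, h))
    x, y, bw, bh = cv2.boundingRect(warped_mask)     # region covered by the mask
    center = (x + bw // 2, y + bh // 2)
    return cv2.seamlessClone(warped_face, frame, warped_mask, center,
                             cv2.NORMAL_CLONE)

if __name__ == "__main__":
    frame = np.full((480, 640, 3), 127, dtype=np.uint8)     # dummy dst frame
    gen_face = np.full((112, 112, 3), 200, dtype=np.uint8)  # dummy generated face
    mask = np.zeros((112, 112), dtype=np.uint8)
    cv2.circle(mask, (56, 56), 45, 255, -1)                 # face-shaped mask
    inv_affine = np.float32([[1, 0, 260], [0, 1, 180]])     # identity + translation
    out = paste_face(frame, gen_face, mask, inv_affine)
    cv2.imwrite("blended_frame.jpg", out)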
5 RESULTS
In this Section, we present the results obtained with
the training. We discuss the results related to the train-
ing data and the qualitative results in this case.
5.1 Data Results
We performed the training with the DF structure, us-
ing the Quick96 model (Perov et al., 2020), as shown
in Figure 2. The training lasted about 3 hours. The
hardware available for training was a computer with
an AMD Ryzen 5 3600XT six-core processor, 32 GB
of RAM, and the NVIDIA GeForce RTX 3090 graph-
ics processing unit (GPU) with 21.78 GB of VRAM.
We considered different expressions and lighting conditions inside the vehicle to train the model. The "aligned debug" images allow checking whether the reference points are correct in the face extraction phase, as shown in Figure 3.
5.2 Qualitative Results
The model was able to align the frontal face effectively. For a lateral face, it was necessary to include several rotated-face samples in the dataset to improve the result, but the face swap in these cases presented a realism bottleneck. We changed the available settings to achieve greater realism and alignment of the generated deepfake face in the conversion phase. This result is seen in Figure 4.
Figure 5 shows some examples of the original image versus the obtained deepfake. As expected, the model had more difficulty performing face replacement when there was interference, for example a hand on the face. We also observed some artifacts when the driver was fatigued, such as when the driver yawned. We hope to correct this aspect by contributing to the
Figure 3: Images generated by "aligned debug" in the extraction phase for different poses and expressions, allowing the user to verify that the reference points are correct.
Figure 4: Overview of the conversion phase in DeepFaceLab (DFL), adapted from Perov et al. (Perov et al., 2020).
Figure 5: Original images versus deepfake synthetic images obtained with DFL training.
source dataset with more representations of the open
mouth and yawning. One can see an example of
the result achieved in this link: https://tinyurl.com/
2o6a2g8s.
Overall, the model achieved the goal of perform-
ing the driver’s anonymization while effectively pre-
serving expressions and poses.
6 CONCLUSION
In this work, we presented a proof of concept of one stage in the development of an image data augmentation tool. The ultimate goal of this research is to generate more diverse, representative, and anonymous datasets with a GAN-based tool. In this work, we presented the face-swapping phase for a work environment to generate synthetic and anonymized images.
We presented a case study of an anonymization application for a vehicle driver. We built a source dataset from YouTube videos and created a destination dataset with real vehicle-driver videos. We trained the DFL model to perform face swapping and driver anonymization, and showed the quality of the anonymization generated from low-resolution, low-quality images. The model provides significant anonymization results and preserves poses and expressions.
We used the DeepFaceLab framework for this development step. We found significant difficulties with images of driver fatigue, such as yawning. In future work, we intend to improve this aspect through the dataset and to test the LIAE model proposed by DFL. In the next step, we intend to develop the integration between Step 1 and Step 2 to automate the process of generating these datasets.
ACKNOWLEDGMENTS
The authors would like to thank CAPES, CNPq and the Federal University of Ouro Preto for supporting this work. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and the Universidade Federal de Ouro Preto (UFOP).
REFERENCES
Ahmed, A. A. and Echi, M. (2021). Hawk-eye: An ai-
powered threat detector for intelligent surveillance
cameras. IEEE Access, 9:63283–63293.
Alajrami, E., Tabash, H., Singer, Y., and El Astal, M.-T.
(2019). On using ai-based human identification in im-
proving surveillance system efficiency. In 2019 In-
ternational Conference on Promising Electronic Tech-
nologies (ICPET), pages 91–95. IEEE.
Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasser-
stein generative adversarial networks. In Interna-
tional conference on machine learning, pages 214–
223. PMLR.
Boudjit, K. and Ramzan, N. (2022). Human detection based
on deep learning yolo-v2 for real-time uav applica-
tions. Journal of Experimental & Theoretical Artifi-
cial Intelligence, 34(3):527–544.
Cai, Y., Wang, X., Yu, Z., Li, F., Xu, P., Li, Y., and Li, L.
(2019). Dualattn-gan: Text to image synthesis with
dual attentional generative adversarial network. IEEE
Access, 7:183706–183716.
Dang, H., Liu, F., Stehouwer, J., Liu, X., and Jain, A. K.
(2020). On the detection of digital face manipula-
tion. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern recognition, pages
5781–5790.
Eldrandaly, K. A., Abdel-Basset, M., and Abdel-Fatah, L.
(2019). Ptz-surveillance coverage based on artificial
intelligence for smart cities. International Journal of
Information Management, 49:520–532.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. Advances
in neural information processing systems, 27.
Gorospe, J., Mulero, R., Arbelaitz, O., Muguerza, J., and Antón, M. Á. (2021). A generalization performance study using deep learning networks in embedded systems. Sensors, 21(4):1031.
Guo, C., Huang, D., Zhang, J., Xu, J., Bai, G., and Dong, N.
(2020). Early prediction for mode anomaly in gener-
ative adversarial network training: an empirical study.
Information Sciences, 534:117–138.
Jalalifar, S. A., Hasani, H., and Aghajan, H. (2018).
Speech-driven facial reenactment using conditional
generative adversarial networks. arXiv preprint
arXiv:1803.07461.
Korshunova, I., Shi, W., Dambre, J., and Theis, L. (2017).
Fast face-swap using convolutional neural networks.
In Proceedings of the IEEE international conference
on computer vision, pages 3677–3685.
Kumar, R., Sotelo, J., Kumar, K., de Brébisson, A., and Bengio, Y. (2017). Obamanet: Photo-realistic lip-sync from text. arXiv preprint arXiv:1801.01442.
Li, Y. and Lyu, S. (2018). Exposing deepfake videos
by detecting face warping artifacts. arXiv preprint
arXiv:1811.00656.
Liu, Y., Kong, L., Chen, G., Xu, F., and Wang, Z. (2021).
Light-weight ai and iot collaboration for surveillance
video pre-processing. Journal of Systems Architec-
ture, 114:101934.
Luo, H., Liu, J., Fang, W., Love, P. E., Yu, Q., and Lu, Z.
(2020). Real-time smart video surveillance to man-
age safety: A case study of a transport mega-project.
Advanced Engineering Informatics, 45:101100.
Lv, X., Su, M., and Wang, Z. (2021). Application of face
recognition method under deep learning algorithm in
embedded systems. Microprocessors and Microsys-
tems, page 104034.
Mirsky, Y. and Lee, W. (2021). The creation and detec-
tion of deepfakes: A survey. ACM Computing Surveys
(CSUR), 54(1):1–41.
Morishita, F., Kato, N., Okubo, S., Toi, T., Hiraki, M.,
Otani, S., Abe, H., Shinohara, Y., and Kondo, H.
(2021). A cmos image sensor and an ai accelerator
for realizing edge-computing-based surveillance cam-
era systems. In 2021 Symposium on VLSI Circuits,
pages 1–2. IEEE.
Nath, N. D., Behzadan, A. H., and Paal, S. G. (2020). Deep
learning for site safety: Real-time detection of per-
sonal protective equipment. Automation in Construc-
tion, 112:103085.
Nguyen, H. H., Fang, F., Yamagishi, J., and Echizen, I.
(2019). Multi-task learning for detecting and seg-
menting manipulated facial images and videos. In
2019 IEEE 10th International Conference on Biomet-
rics Theory, Applications and Systems (BTAS), pages
1–8. IEEE.
Pavan Kumar, M. and Jayagopal, P. (2021). Generative ad-
versarial networks: a survey on applications and chal-
lenges. International Journal of Multimedia Informa-
tion Retrieval, 10(1):1–24.
Perov, I., Gao, D., Chervoniy, N., Liu, K., Marangonda, S., Umé, C., Dpfks, M., Facenheim, C. S., RP, L., Jiang, J., et al. (2020). Deepfacelab: Integrated, flexible and extensible face-swapping framework. arXiv preprint arXiv:2005.05535.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V.,
Radford, A., and Chen, X. (2016). Improved tech-
niques for training gans. Advances in neural informa-
tion processing systems, 29.
Shaoanlu (2018). Faceswap-gan: A denoising autoencoder
+ adversarial losses and attention mechanisms for face
swapping.
Sikander, G. and Anwar, S. (2018). Driver fatigue detection
systems: A review. IEEE Transactions on Intelligent
Transportation Systems, 20(6):2339–2352.
Suwajanakorn, S., Seitz, S. M., and Kemelmacher-
Shlizerman, I. (2017). Synthesizing obama: learning
lip sync from audio. ACM Transactions on Graphics
(ToG), 36(4):1–13.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.
(2004). Image quality assessment: from error visi-
bility to structural similarity. IEEE transactions on
image processing, 13(4):600–612.
Yang, S., Niu, Z., Cheng, J., Feng, S., and Li, P. (2020).
Face recognition speed optimization method for em-
bedded environment. In 2020 International Workshop
on Electronic Communication and Artificial Intelli-
gence (IWECAI), pages 147–153. IEEE.
Yu, P., Xia, Z., Fei, J., and Lu, Y. (2021). A survey on
deepfake video detection. Iet Biometrics, 10(6):607–
624.
Zhao, X., Liang, X., Zhao, C., Tang, M., and Wang, J.
(2019). Real-time multi-scale face detector on em-
bedded devices. Sensors, 19(9):2158.