GAN Inversion with Editable StyleMap
So Honda, Ryohei Orihara (https://orcid.org/0000-0002-9039-7704), Yuichi Sei (https://orcid.org/0000-0002-2552-6717), Yasuyuki Tahara (https://orcid.org/0000-0002-1939-4455) and Akihiko Ohsuga (https://orcid.org/0000-0001-6717-7028)
The University of Electro-Communications, Tokyo, Japan
Keywords:
GAN Inversion, StyleGAN, StyleMap, Editability, Local Editing.
Abstract:
Recently, the field of GAN Inversion, which estimates the latent code of a GAN to reproduce the desired
image, has attracted much attention. Once a latent variable that reproduces the input image is obtained, the
image can be edited by manipulating the latent code. However, it is known that there is a trade-off between
reconstruction quality, which is the difference between the input image and the reproduced image, and ed-
itability, which is the plausibility of the edited image. In our study, we attempted to improve reconstruction
quality by extending, in the spatial direction, the latent code that represents the properties of the entire image. Next,
since such an extension significantly impairs the editing quality, we performed a GAN Inversion that realizes
both reconstruction quality and editability by imposing an additional regularization. As a result, the proposed
method yielded a better trade-off between reconstruction quality and editability than the baseline from both quantitative and qualitative perspectives, and is comparable to state-of-the-art (SOTA) methods that adjust the weights of the generator.
1 INTRODUCTION
Generative Adversarial Networks (GANs) (Goodfel-
low et al., 2014) are generative models consisting of a
Generator, which generates data similar to the train-
ing data, and a Discriminator, which distinguishes
whether the input is real data or generated data. StyleGAN (Karras et al., 2019, 2020), which has achieved great success in unconditional image generation, can not only generate high-quality images but also control the semantic properties of the images, because those properties are disentangled in the latent space and become independently manipulable. In StyleGAN, a noise vector z, which follows a standard
normal distribution, is transformed into latent code w
by a mapping network and used for image genera-
tion. In pre-trained StyleGAN, the latent space W ,
which is the space where latent code w is distributed,
is known to have disentanglement properties. In
other words, various image editing can be realized by
controlling the W space. However, in order to edit
an arbitrary image, a latent code is needed to generate
such an image. The task of “inverting” the input im-
age to the latent code is called GAN Inversion. Usu-
ally, inversion into the W space is a last resort due to its poor reconstruction quality. Therefore, many studies use the W+ space, which has as many distinct latent codes as there are convolution layers. At a resolution of 1024 × 1024, an arbitrary input image is represented in the $W+ \subset \mathbb{R}^{18 \times 512}$ space using 18 latent codes. However, there are two problems: (1) the W+ space is not always sufficient for reconstruction, and (2) inversion in the W+ space impairs editability. To solve the
former problem, we propose to perform GAN Inver-
sion using StyleMap, which is a spatial extension of
the latent code. The extension frees the StyleGAN
Generator from the need to represent the features of
the entire image as vectors and allows detailed recon-
struction of each segmented image region. We found
that this method yields high reconstruction quality but
poor editability. Several studies (Tov et al., 2021; Zhu
et al., 2020b) have shown that editability decreases as
the estimated latent code deviates from the region of
the latent space used in unconditional image genera-
tion. Therefore, we incorporated simple regulariza-
tions to improve editability. The regularizations make
our GAN Inversion competitive with existing work in
the trade-off between reconstruction quality and ed-
itability. Finally, to confirm that the editing results
were qualitatively satisfactory, we actually edited im-
ages with StyleGAN using well-known methods as
shown in Figure 1.
Figure 1: GAN Inversion by the proposed method and image editing results using it. From left to right: input image,
reconstructed image, aged image, smiling image, and toonified image. The last two rows, from top to bottom: input image,
reconstructed image, and local editing with neighboring images. In the local edits, the top of each image comes from the original image and the bottom from the neighboring image.
This paper is organized as follows. Section 2 de-
scribes related works on GAN Inversion and its ed-
itability, and latent codes extended in the spatial di-
rection. Section 3 explains the architecture of the pro-
posed method and the regularization method. Section
4 presents the experimental setup and results. Section
5 provides a quantitative and qualitative evaluation of
the proposed method. Section 6 discusses the limita-
tions of our study and the validity of our assumptions.
Finally, Section 7 presents conclusions and future re-
search.
2 RELATED WORKS
2.1 GAN Inversion
GAN Inversion is the task of estimating the latent code of a GAN such that a desired image can be generated. The basic approach is to find the latent code
that minimizes the difference between the input im-
age and the generated image. The simplest way to
achieve this is to optimize the latent code directly
(Abdal et al., 2019) using, for example, gradient de-
scent. While this method yields high reconstruction
quality, it suffers from very long inference times. On
the other hand, encoder-based methods (Richardson
et al., 2021a; Tov et al., 2021), while inferior in re-
construction quality, have the significant advantage of
faster inference times. Intermediate between these two approaches are methods that (1) further optimize a latent code inferred by a pre-trained encoder, using it as an initial value, and (2) progressively refine the latent code by applying the encoder multiple times (Alaluf et al., 2021a).
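To illustrate the optimization-based approach concretely, the sketch below shows a minimal latent-optimization loop in PyTorch. The generator interface (G.mean_latent, G.synthesis) and the lpips module are illustrative assumptions, not the code of Abdal et al. (2019).

```python
import torch
import torch.nn.functional as F

def optimize_latent(G, lpips, target, num_steps=500, lr=0.01):
    """Minimal sketch of optimization-based GAN Inversion.

    G:      pre-trained generator mapping a (1, 18, 512) W+ latent to an image
    lpips:  perceptual-distance module returning a scalar per image pair
    target: input image tensor of shape (1, 3, H, W)
    (All three are assumed to exist; the names are illustrative only.)
    """
    # Start from the average latent, broadcast to W+, and optimize it directly.
    w_avg = G.mean_latent()                                   # assumed helper, (1, 512)
    w_plus = w_avg.detach().clone().repeat(1, 18, 1).requires_grad_(True)
    opt = torch.optim.Adam([w_plus], lr=lr)
    for _ in range(num_steps):
        recon = G.synthesis(w_plus)                           # generated image
        loss = lpips(recon, target) + F.mse_loss(recon, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w_plus.detach()
```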
2.2 Editability of GAN Inversion
Pre-trained StyleGANs can reconstruct arbitrary im-
ages due to their generative power. For example, a
StyleGAN trained on a human face image can recon-
struct an image of a cat’s face or even a bedroom (Zhu
et al., 2020a). On the other hand, there is no guaran-
tee that the estimated latent code has good editabil-
ity. Several studies have attributed this phenomenon
to the difference between the distribution of the latent codes used at generation time and that of the estimated latent codes.
Tov et al. (2021) proposed Encoder for Editing (e4e), which improves editability at the expense of reconstruction quality by inverting the input image to a latent code distributed close to the W space used at image generation. Here, the closeness to the W-space distribution is defined as (1) the proximity of the 18 latent codes to one another and (2) the proximity of the distribution of each latent code to the distribution of the W space. For the former, the encoder is trained to minimize the $L_2$ norm between the latent code controlling the coarsest scale and the other latent codes. For the latter, a Discriminator is used to determine whether the latent code is sampled from the W space.
Zhu et al. (2020b) qualitatively showed that $p = \mathrm{LeakyReLU}_{5.0}(w)$, which corresponds to the output of the mapping network before its final activation function, is approximately multivariate normally distributed. They then performed a highly editable GAN Inversion by adding the Mahalanobis distance between the estimated la-
tent code and the mean of p as a penalty term and op-
timizing it by gradient descent. Note that the approach of Zhu et al. (2020b) achieves both high reconstruction quality and editability, but the inference time is still long.
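A rough sketch of this penalty is given below; the P-space statistics, the function names, and their shapes are our assumptions, not the authors' implementation.

```python
import torch

def p_space_penalty(w, p_mean, p_cov_inv, slope=5.0):
    """Sketch of the P-space regularizer of Zhu et al. (2020b).

    w:         estimated latent code(s), shape (N, 512)
    p_mean:    empirical mean of p over many mapping-network samples, shape (512,)
    p_cov_inv: inverse covariance of p, shape (512, 512)
    (Statistics are assumed to be precomputed offline.)
    """
    # Map w into P space with a LeakyReLU of slope 5.0, which approximately
    # undoes the final LeakyReLU(0.2) of the mapping network.
    p = torch.nn.functional.leaky_relu(w, negative_slope=slope)
    diff = p - p_mean
    # Squared Mahalanobis distance to the mean of p, averaged over the batch.
    return torch.einsum('ni,ij,nj->n', diff, p_cov_inv, diff).mean()
```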
2.3 Spatially Extended Latent Code
In general, GAN uses a latent code to represent the
properties of the entire image. However, there is also
a demand to generate images by specifying different
properties for each region. Examples include methods
that generate images by providing semantic segmen-
tation masks as conditions (Park et al., 2019; Isola
et al., 2017), GAN architectures that simply extend
the latent code in the spatial direction (Kim et al.,
2021), and local editing methods for the generated
image of a trained StyleGAN (Hong et al., 2020).
In particular, StyleMapGAN (Kim et al., 2021) is an image generation architecture that allows local editing of images by simply extending the latent code in the spatial direction, and it also performs GAN Inversion by training an additional encoder. However, it is a modification of the StyleGAN1 architecture, making
it difficult to take advantage of the rich resource of
pre-trained StyleGAN2 weights.
2.4 Pivotal Tuning Inversion
Pivotal Tuning Inversion (PTI) (Roich et al., 2021) is
a novel method of GAN Inversion that has attracted
much attention in recent years. PTI is a two-stage
method where in the first stage images are inverted to
W space, which has high editability but low recon-
struction quality, and later the generator weights are
tuned to minimize the difference between its output
and the input image. It is reported that such a strat-
egy achieves strong results in both editability and reconstruction quality. The SOTA methods in encoder-based GAN Inversion, HyperStyle (Alaluf et al., 2021b) and HyperInverter (Dinh et al., 2022), both employ the PTI strategy.
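The generator-tuning stage of PTI can be sketched roughly as follows; the generator and LPIPS interfaces, the step count, and the learning rate are placeholders rather than the actual PTI settings.

```python
import torch
import torch.nn.functional as F

def pivotal_tuning(G, lpips, target, w_pivot, num_steps=350, lr=3e-4):
    """Rough sketch of the second (generator-tuning) stage of PTI.

    The pivot latent w_pivot (obtained by inversion into W) stays fixed while
    the generator weights are fine-tuned so that G(w_pivot) matches the target
    image. G, lpips, and their interfaces are illustrative assumptions.
    """
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(num_steps):
        recon = G.synthesis(w_pivot)
        loss = lpips(recon, target) + F.mse_loss(recon, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G
```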
3 METHOD
3.1 Architecture of Encoder
We modified the pSp encoder (Richardson et al., 2021a) so that it can output a StyleMap. The pSp encoder transforms each of the three levels of intermediate feature maps into a latent code using Map2Style blocks. We defined the Map2Map block by replacing some of the stride-2 convolution layers of the Map2Style block with stride-1 convolution layers.
The architecture of the proposed method is shown in
Figure 2.
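A minimal sketch of such a block is given below; the channel counts, the number of layers, and the 128-dimensional normalized output are illustrative assumptions based on the pSp design rather than an exact configuration.

```python
import torch
import torch.nn as nn

class Map2Map(nn.Module):
    """Sketch of a Map2Map block: like pSp's Map2Style, but with only a few
    stride-2 convolutions, so the output keeps a spatial extent (a StyleMap)
    instead of collapsing to a single 512-d vector."""

    def __init__(self, in_channels, out_dim=128, num_downsamples=3):
        super().__init__()
        layers = []
        ch = in_channels
        for _ in range(num_downsamples):
            layers += [nn.Conv2d(ch, 256, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2)]
            ch = 256
        # The remaining convolution keeps stride 1, preserving the spatial size.
        layers += [nn.Conv2d(ch, out_dim, kernel_size=3, stride=1, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, feat):
        # e.g. a 16x16 feature map -> a 2x2 StyleMap with 128 channels
        return self.net(feat)
```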
Each Map2Map block downsamples the feature
map by three stride-2 convolutions. Since the reso-
lution of the feature map is halved with each down-
sampling, a StyleMap of 2 × 2 is estimated for the
coarse scale and 4 × 4 for the medium scale. The estimated StyleMap is upsampled to the resolution of the StyleGAN feature map when that feature map is convolved. On the fine scale, a regular Map2Style network is used to save memory. Upon receiving a StyleMap, the pre-trained
StyleGAN transforms the feature map using an oper-
ation called Spatially Modulated Convolution. Spa-
tially Modulated Convolution is a generalized convo-
lution operation with Weight Demodulation:
$$\mathrm{SpModConv}_{w}(x, s) = \frac{w \ast (s \odot x)}{\sqrt{\dfrac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(w^{2} \ast s^{2}\right)_{i,j}}} \qquad (1)$$
where $\ast$ denotes convolution, $\odot$ denotes the element-wise product, s is the StyleMap transformed by pointwise convolution using the weights of the affine transformation layer of StyleGAN, x is the feature map, and w is the convolution weight.
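A direct reading of Eq. (1) can be sketched as follows; the padding scheme and the epsilon for numerical stability are assumptions.

```python
import torch
import torch.nn.functional as F

def sp_mod_conv(x, s, weight, eps=1e-8):
    """Sketch of Spatially Modulated Convolution (Eq. 1).

    x:      feature map, shape (N, C_in, H, W)
    s:      StyleMap after the affine layer, upsampled to (N, C_in, H, W)
    weight: convolution weight, shape (C_out, C_in, k, k)
    """
    pad = weight.shape[-1] // 2

    # Numerator: convolve the spatially modulated input, w * (s ⊙ x).
    num = F.conv2d(s * x, weight, padding=pad)

    # Denominator: per-output-channel demodulation factor obtained by
    # convolving s^2 with w^2 and averaging over all spatial positions.
    demod_sq = F.conv2d(s * s, weight * weight, padding=pad)
    demod = demod_sq.mean(dim=(2, 3), keepdim=True).clamp_min(eps).sqrt()

    return num / demod
```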
These extensions make few fundamental changes to the architecture of the StyleGAN generator, so abundant pre-trained models can be used without modification.
3.2 Reconstruction Loss
In our method, the reconstruction loss is the same as the one used in pSp. The loss is composed of the $L_2$ loss, the LPIPS loss (Zhang et al., 2018), and the ID loss. The LPIPS loss is a measure known to correlate well with human perception. The ID loss is defined as $\mathcal{L}_{\mathrm{ID}} = 1 - R(x) \cdot R\big((G \circ E)(x)\big)$ for the ArcFace (Deng et al., 2019) network R that outputs face similarity.
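A sketch of this combined loss is shown below; the loss weights and the lpips/arcface interfaces are placeholders, not the exact values or modules used here.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(x, x_hat, lpips, arcface,
                        w_l2=1.0, w_lpips=0.8, w_id=0.1):
    """Sketch of the pSp-style reconstruction loss.

    x, x_hat: input and reconstructed images, shape (N, 3, H, W)
    lpips:    perceptual-distance module (e.g. the lpips package)
    arcface:  face-recognition network returning identity embeddings
    The loss weights are placeholders, not the values used in the paper.
    """
    l2 = F.mse_loss(x_hat, x)
    perceptual = lpips(x_hat, x).mean()
    # ID loss: 1 minus the cosine similarity of the ArcFace embeddings.
    id_loss = (1.0 - F.cosine_similarity(arcface(x_hat), arcface(x), dim=1)).mean()
    return w_l2 * l2 + w_lpips * perceptual + w_id * id_loss
```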
3.3 Regularization of Latent Code
Since we found that Inversion to StyleMap improves
reconstruction quality but significantly impairs ed-
itability, we examined the latent space. We first in-
vestigated the singular values of the outputs of the
mapping network, inspired by the insights of Zhu et al. (2020b). The singular values are shown
in Figure 3.
Assuming that small singular values contribute lit-
tle to the properties of the image, we performed di-
mensionality reduction and had the Map2Map and
Map2Style blocks output a normalized latent code of
128 dimensions. The encoder then restores the latent code to 512 dimensions.
Figure 2: Our architecture. The left frame of the figure shows the encoder and the right frame shows the StyleGAN generator.
Similar to the pSp encoder, the feature pyramid in the ResNet backbone converts the input image into a 3-scale feature map.
However, the coarse and medium scale feature maps are converted to a StyleMap by the Map2Map block, and the fine scale
feature map is converted to a Style by the original Map2Style block.
Figure 3: Visualization of the singular values of the output of the mapping network. The 512 singular values are plotted side by side. Only a small percentage of them are large, and the rest are much smaller.
That is, for example, the output $v_i$ of Map2Style is transformed as $w_i = A v_i + \bar{w}$, where $A \in \mathbb{R}^{512 \times 128}$ and $\bar{w}$ is the average of the mapping network output.
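One possible way to obtain A and $\bar{w}$ and to restore a 512-dimensional code is sketched below in PyTorch; deriving A from the top singular vectors of centered mapping-network outputs, and the sampling details, are assumptions.

```python
import torch

@torch.no_grad()
def fit_basis(mapping, num_samples=100_000, dim=128):
    """Sketch: estimate w_mean and a 512x128 basis A from mapping-network outputs."""
    z = torch.randn(num_samples, 512)
    w = mapping(z)                      # (num_samples, 512); hypothetical interface
    w_mean = w.mean(dim=0)
    # Top singular vectors of the centered outputs span the 128-d subspace.
    _, _, vh = torch.linalg.svd(w - w_mean, full_matrices=False)
    A = vh[:dim].T                      # (512, 128)
    return A, w_mean

def restore_latent(v, A, w_mean):
    """Restore w_i = A v_i + w_mean from the 128-d encoder output v_i."""
    # v: (..., 128) -> (..., 512); for a channel-first StyleMap, move the
    # channel axis to the last position before calling this function.
    return v @ A.T + w_mean
```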
Two regularization terms were added to encour-
age the encoder output to be distributed closer to W
space.
The first is a term to bring each latent code
closer to the distribution of the output of the map-
ping network. The outputs of the Map2Map and
Map2Style blocks should be standardized and uncor-
related. Therefore, we used the KL divergence with
the standard normal distribution for the output $v_i$ of each block as the loss function:

$$\mathcal{L}_{\mathrm{KLD}}(v_i) = \frac{1}{D}\sum_{d=0}^{D-1}\left(\ln(\sigma_{i,d}) + \frac{1+\mu_{i,d}^{2}}{2\sigma_{i,d}^{2}} - \frac{1}{2}\right) \qquad (2)$$
where D is the number of dimensions of $v_i$ (here 128), and $\mu_{i,d}$ and $\sigma^{2}_{i,d}$ are the mean and variance of the d-th element of $v_i$, respectively.
The second is a term that minimizes the difference between the latent codes at the different scales. The output of Map2Map for the coarsest scale is downsampled to 1 × 1, and the $L_2$ norm between it and the other outputs is used as the loss function.
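A sketch of this term follows; pooling the non-coarsest outputs to 1 × 1 before the comparison is an assumption about how tensors of different spatial sizes are compared.

```python
import torch
import torch.nn.functional as F

def scale_consistency_loss(stylemaps):
    """Sketch of the second regularizer.

    stylemaps: list of block outputs of shape (N, C, H_i, W_i), coarsest first.
    The coarsest StyleMap is pooled to 1x1 and the L2 distance to every other
    (pooled) output is penalized.
    """
    anchor = F.adaptive_avg_pool2d(stylemaps[0], 1)           # (N, C, 1, 1)
    loss = 0.0
    for sm in stylemaps[1:]:
        other = F.adaptive_avg_pool2d(sm, 1)
        loss = loss + (anchor - other).flatten(1).norm(dim=1).mean()
    return loss
```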
The first constraint implicitly assumes that the W space is multivariate normal. However, as Zhu et al. (2020b) showed, the output of the StyleGAN mapping network is generally closer to a distribution transformed by the final activation function than to a multivariate normal distribution. Nevertheless, we experimentally confirmed that the constraint works well despite this fact.
4 EXPERIMENTS
4.1 Dataset
For GAN Inversion of human face images, the encoder is trained on FFHQ (Karras et al., 2019), a face image dataset. A StyleGAN pre-trained on the same dataset is used as the generator. CelebA-HQ (Karras et al., 2018), a face image dataset different from FFHQ, is used for evaluation.
4.2 Experimental Results
The inversion results of the existing methods and of the proposed StyleMap inversion, with and without regularization, are shown in Figure 4. In-
put images were randomly selected from the CelebA-
HQ test set. Although all methods are able to re-
construct the input image reasonably well, the pro-
posed method without regularization and HyperIn-
verter have higher reconstruction quality.
Figure 4: Inversion results of CelebA-HQ for existing and
proposed methods.
5 EVALUATION
5.1 Quantitative Evaluation of
Reconstruction Quality
The evaluations by LPIPS and MSE of our method, as well as pSp, e4e, and HyperInverter, are shown in Table 1. The proposed method is better than the baselines. In particular, the proposed method without regularization even outperforms the SOTA method, HyperInverter.
Table 1: Quantitative evaluation of reconstruction quality.
Method LPIPS MSE
pSp 0.16 0.03
e4e 0.20 0.05
HyperInverter 0.11 0.02
Ours w/o reg 0.10 0.02
Ours 0.15 0.03
5.2 Qualitative Evaluation of Editability
5.2.1 Latent Code Editing
The results of image editing by adding the Age vector obtained with InterfaceGAN (Shen et al., 2020), for the existing and proposed methods, are shown in Figure 5. HyperInverter, pSp, and the proposed method without regularization produce edited images with low plausibility. In the second column in particular, HyperInverter's editing even changed the gender of the subject.
5.2.2 Toonify
Toonify (Pinkney and Adler, 2020) is a method that performs impressive image transformations by switching layers midway between a pre-trained StyleGAN and a version of it fine-tuned on a different dataset. The results of Toonify editing for each method
are shown in Figure 6. Although pSp reports better
toonification than the default by learning with differ-
ent settings in an additional report (Richardson et al.,
2021b), we used the default encoder for all methods
for the sake of fair comparison. In other words, the
encoders used in Figure 6 are the same as in Figure
4, respectively. The proposed method with regular-
ization is as plausible as e4e, while pSp and the pro-
posed method without regularization are less visually
plausible. Although HyperInverter’s toonification is
plausible, its editing effect is insignificant.
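Conceptually, the layer swapping behind Toonify can be sketched as below; the state-dict key pattern, the swap resolution, and which generator supplies which layers are illustrative assumptions.

```python
import copy

def layer_swap(ffhq_generator, toon_generator, swap_from_resolution=32):
    """Sketch of Toonify-style layer swapping between two generators that
    share the same architecture.

    Low-resolution layers come from the fine-tuned (toon) generator, while
    layers at and above `swap_from_resolution` are taken from the original
    FFHQ generator. The key naming scheme is a hypothetical example.
    """
    blended = copy.deepcopy(toon_generator)
    state = blended.state_dict()
    for key, value in ffhq_generator.state_dict().items():
        # Hypothetical key format: "synthesis.b{resolution}.<...>"
        res = int(key.split('.')[1][1:]) if key.startswith('synthesis.b') else None
        if res is not None and res >= swap_from_resolution:
            state[key] = value.clone()
    blended.load_state_dict(state)
    return blended
```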
5.2.3 Local Editing
Local editing of images is possible by editing the StyleMap. Figure 7 shows how local editing was performed by smoothly interpolating the StyleMap.
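A sketch of such an edit follows; the use of a soft spatial mask to blend two StyleMaps is an assumption about how the interpolation is realized.

```python
import torch
import torch.nn.functional as F

def blend_stylemaps(stylemap_a, stylemap_b, mask):
    """Sketch of local editing by interpolating two StyleMaps.

    stylemap_a/b: StyleMaps of two images, shape (N, C, H, W)
    mask:         soft spatial mask in [0, 1], shape (N, 1, h, w); regions
                  where the mask is 1 take their style from image B.
    """
    # Resize the mask to the StyleMap resolution and blend linearly.
    mask = F.interpolate(mask, size=stylemap_a.shape[-2:], mode='bilinear',
                         align_corners=False)
    return (1.0 - mask) * stylemap_a + mask * stylemap_b
```

Feeding the blended StyleMap to the generator then mixes, for example, the top of one face with the bottom of a neighboring face, as in Figure 7.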
5.3 Quantitative Evaluation of
Editability
Tov et al. (2021) proposed Latent Editing Consistency
(LEC) as a measure of editability in GAN Inversion:
$$\mathrm{LEC}(f_{\theta}) = \mathbb{E}_{x}\left[\left\| E(x) - (f_{\theta}^{-1} \circ E \circ G \circ f_{\theta} \circ E)(x) \right\|_{2}\right] \qquad (3)$$
Figure 5: Editing results with the InterfaceGAN Age vector on CelebA-HQ for existing and proposed methods.
where E and G are the encoder and StyleGAN generator, respectively, and $f_{\theta}(w) = w + \alpha w_{\mathrm{dir}}$. LEC measures the distance between the latent code estimated from an image and the latent code estimated from an image that has been edited and then reverse-edited. The Age and Smile vectors produced by InterfaceGAN are used as the edit vector $w_{\mathrm{dir}}$. We defined avg, min, and max as extensions of LEC to StyleMap. Each is measured by taking the mean, minimum, and maximum over pixels of the pixel-wise sum of squared differences between the two StyleMaps. We evaluated LEC with α = 3 and −3 for the Age vector, referred to as Old and Young, respectively, and with α = 2 and −2 for the Smile vector, referred to as Smile and No Smile, respectively. The evaluation results are shown in Table 2.
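A sketch of this StyleMap-extended LEC is shown below; for simplicity the encoder is assumed to return a single StyleMap tensor, and the interfaces of E, G, and w_dir are placeholders.

```python
import torch

def lec_for_stylemap(E, G, images, w_dir, alpha=3.0):
    """Sketch of LEC (Eq. 3) extended to StyleMaps.

    E, G:   encoder and generator (interfaces are illustrative assumptions)
    images: batch of evaluation images
    w_dir:  edit direction, broadcastable to the StyleMap shape (N, C, H, W)
    Returns the avg, min, and max variants described in the text.
    """
    with torch.no_grad():
        w1 = E(images)                          # first inversion
        edited = G(w1 + alpha * w_dir)          # apply the edit f_theta
        w2 = E(edited) - alpha * w_dir          # re-invert and reverse the edit
        # Pixel-wise sum of squared differences over the channel axis.
        sq = (w1 - w2).pow(2).sum(dim=1)        # (N, H, W)
        per_pixel = sq.flatten(1)               # (N, H*W)
        return (per_pixel.mean(dim=1).mean(),           # avg
                per_pixel.min(dim=1).values.mean(),     # min
                per_pixel.max(dim=1).values.mean())     # max
```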
The proposed method without regularization is, as expected, hardly editable.
Figure 6: The results of toonification using existing and pro-
posed methods.
Table 2: Editability Evaluation by LEC.
Method Old Young Smile No Smile
pSp 63.03 59.43 48.09 48.15
e4e 24.33 24.59 19.92 20.45
HyperInverter 34.04 34.81 24.99 26.00
Ours w/o reg(avg) 295.07 285.80 264.90 264.88
Ours w/o reg(min) 201.18 193.42 178.17 178.38
Ours w/o reg(max) 425.15 413.76 386.07 384.96
Ours (avg) 32.42 30.57 22.94 23.09
Ours (min) 29.99 27.84 20.95 21.10
Ours (max) 36.12 34.65 26.21 26.34
On the other hand, the proposed method with regularization shows better editability than pSp, even if not as good as e4e. Note that e4e improves editability at the expense of pSp's reconstruction quality, whereas our method improves editability without sacrificing the reconstruction quality of pSp. See Figure 8 for the
trade-off between reconstruction quality and editabil-
ity.
Figure 7: Local editing results. Each edited image is a com-
posite of the top of the image in the first row and the bottom
of the image in the first column.
Figure 8: Trade-off between reconstruction quality (LPIPS) and editability (LEC). For editability, the average of the four LEC values is plotted. For the proposed method, the avg, min, and max variants of LEC are each plotted. The proposed method with regularization achieves an excellent trade-off.
6 DISCUSSION
6.1 Input Images Challenging to Reproduce
Although the proposed method has qualitatively and quantitatively better reconstruction quality than the existing methods, specific images are still challenging to reconstruct, for our method as well as for the baselines.
Figure 9: Examples of images that are difficult to recon-
struct and comparison of actual reconstruction results.
Disappointing results can be seen in Figure 9, where arms, clothes, or ornaments cover the face (rows 1-2) or where the face orientation is extreme (row 3). However, while pSp only vaguely reproduces the garment in the second row, the proposed method reconstructs it better. In the third row, the proposed method can reconstruct the extreme angle of the face in the input image, while the baselines cannot reproduce it. In addition, the proposed method also tries to reconstruct the background, which the baselines give up on. Overall, even with challenging inputs, the proposed method reconstructs them as well as HyperInverter does.
6.2 Regularization Assuming Normal
Distribution
In this study, we designed the architecture assuming that each element of the output of the Map2Map and Map2Style blocks (1) has zero mean, (2) has unit variance, and (3) is uncorrelated with the other elements. To encourage the outputs to satisfy these conditions, we used the KL divergence to the standard normal distribution as a penalty term. This leads to the following two problems:
- element-wise KL divergence does not guarantee uncorrelatedness;
- the W space is not normally distributed.
Despite these problems, the proposed method shows good results. Regarding the former, even if the elements of a block's output are correlated, the distribution of the estimated latent code still falls within the target distribution, although it does not match it exactly. Regarding the latter, although the shape of the distribution is not taken into account, the penalty is considered a sufficient constraint in that it encourages the standardization of each element.
GAN Inversion with Editable StyleMap
395
7 CONCLUSIONS
In this study, we proposed a GAN Inversion method using StyleMap, a spatial extension of the latent code
that controls image properties in StyleGAN. We
found that a simple extension of existing encoders to
StyleMap improves reconstruction quality, but signif-
icantly degrades editability. Therefore, we added reg-
ularization to improve editability. Even though the use of StyleMap is not anticipated in the design of StyleGAN, we confirmed that our method is comparable to existing methods in image editing. In addition, we showed that StyleMap allows local editing of arbitrary images. Notably, our method is comparable in performance to SOTA methods even though it employs a strategy independent of PTI. This suggests that performance could be further improved by incorporating the PTI strategy into our method. In the future, we would like to adopt the PTI strategy and experiment with a wider range of datasets.
ACKNOWLEDGEMENTS
This work was supported by JSPS KAKENHI Grant
Numbers JP21H03496, JP22K12157.
REFERENCES
Abdal, R., Qin, Y., and Wonka, P. (2019). Image2stylegan:
How to embed images into the stylegan latent space?
In Proceedings of the IEEE international conference
on computer vision.
Alaluf, Y., Patashnik, O., and Cohen-Or, D. (2021a).
Restyle: A residual-based stylegan encoder via iter-
ative refinement. In Proceedings of the IEEE/CVF In-
ternational Conference on Computer Vision (ICCV).
Alaluf, Y., Tov, O., Mokady, R., Gal, R., and Bermano,
A. H. (2021b). Hyperstyle: Stylegan inversion with
hypernetworks for real image editing.
Deng, J., Guo, J., Niannan, X., and Zafeiriou, S. (2019).
Arcface: Additive angular margin loss for deep face
recognition. In CVPR.
Dinh, T. M., Tran, A. T., Nguyen, R., and Hua, B.-S. (2022).
Hyperinverter: Improving stylegan inversion via hy-
pernetwork. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR).
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. In Ghahra-
mani, Z., Welling, M., Cortes, C., Lawrence, N., and
Weinberger, K. Q., editors, Advances in Neural Infor-
mation Processing Systems, volume 27, pages 2672–
2680. Curran Associates, Inc.
Hong, S., Arjovsky, M., Barnhart, D., and Thompson, I.
(2020). Low distortion block-resampling with spa-
tially stochastic networks. In Larochelle, H., Ranzato,
M., Hadsell, R., Balcan, M. F., and Lin, H., editors,
Advances in Neural Information Processing Systems,
volume 33, pages 4441–4452. Curran Associates, Inc.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017).
Image-to-image translation with conditional adversar-
ial networks. CVPR.
Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2018). Pro-
gressive growing of gans for improved quality, sta-
bility, and variation. In International Conference on
Learning Representations.
Karras, T., Laine, S., and Aila, T. (2019). A style-based
generator architecture for generative adversarial net-
works. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR).
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J.,
and Aila, T. (2020). Analyzing and improving the im-
age quality of StyleGAN. In Proc. CVPR.
Kim, H., Choi, Y., Kim, J., Yoo, S., and Uh, Y. (2021).
Exploiting spatial dimensions of latent in gan for real-
time image editing. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition.
Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. (2019).
Semantic image synthesis with spatially-adaptive nor-
malization. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition.
Pinkney, J. and Adler, D. (2020). Resolution dependant gan
interpolation for controllable image synthesis between
domains.
Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar,
Y., Shapiro, S., and Cohen-Or, D. (2021a). Encoding
in style: a stylegan encoder for image-to-image trans-
lation. In IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR).
Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar,
Y., Shapiro, S., and Cohen-Or, D. (2021b). Encoding
in style: a stylegan encoder for image-to-image trans-
lation. https://github.com/eladrich/pixel2style2pixel#additional-applications.
Roich, D., Mokady, R., Bermano, A. H., and Cohen-Or, D.
(2021). Pivotal tuning for latent-based editing of real
images. ACM Trans. Graph.
Shen, Y., Gu, J., Tang, X., and Zhou, B. (2020). Interpreting
the latent space of gans for semantic face editing. In
CVPR.
Tov, O., Alaluf, Y., Nitzan, Y., Patashnik, O., and Cohen-Or,
D. (2021). Designing an encoder for stylegan image
manipulation. arXiv preprint arXiv:2102.02766.
Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang,
O. (2018). The unreasonable effectiveness of deep
features as a perceptual metric. In CVPR.
Zhu, J., Shen, Y., Zhao, D., and Zhou, B. (2020a). In-
domain gan inversion for real image editing. In Pro-
ceedings of European Conference on Computer Vision
(ECCV).
Zhu, P., Abdal, R., Qin, Y., Femiani, J., and Wonka, P.
(2020b). Improved stylegan embedding: Where are
the good latents?