Anime Face Generating Using Deep Learning
Shaik Muskan Begum, Desam Yashoda, P. Raga Chandrika, Veena Madhuri and Sura Sravani
Department of Computer Science and Engineering, Ravindra College of Engineering for Women, Kurnool,
Andhra Pradesh, India
Keywords: Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Variational
Autoencoders (VAEs), Pre-Processing.
Abstract: In recent years, the rise of deep learning technologies has greatly enhanced the capabilities of generating
realistic and artistic images. One area that has gained significant attention is the automatic generation of anime
faces. These images, characterized by distinct features such as large eyes, exaggerated facial expressions, and
vibrant hair colors, have seen widespread use in both entertainment and art. The task of generating such faces
requires an effective combination of deep learning techniques, including Convolutional Neural Networks
(CNNs), Generative Adversarial Networks (GANs), and Variational Autoencoders (VAEs). This paper
proposes a novel approach for generating anime faces using deep learning methods. Our model leverages a
GAN architecture, where the generator creates high-quality anime faces, and the discriminator ensures that
these faces appear realistic and true to the style. To train this model, we use a large dataset of labeled anime
face images, ensuring diversity in terms of facial expressions, hair styles, and other unique anime traits. The
dataset is preprocessed to extract important facial features, which allows the model to focus on learning the
abstract characteristics of anime faces, such as the stylized proportions and exaggerated features.
1 INTRODUCTION
Anime face generation using deep learning is a
fascinating application of artificial intelligence,
where neural networks are trained to create realistic
and stylistic anime character faces. This technology
leverages generative models, particularly Generative
Adversarial Networks (GANs) or Variational
Autoencoders (VAEs), to learn the patterns and
features commonly found in anime artwork.
The process typically involves training a model on
a large dataset of anime faces, which allows the
network to learn and replicate the distinctive features
such as large eyes, unique expressions, and specific
styles of shading and coloring. Once trained, the
model can generate new anime faces based on
random inputs or specific conditions (like gender,
hairstyle, etc.). This technology has been used in
applications like generating character designs for
anime and video games, art creation, and even
personal avatars. It also highlights the power of deep
learning in creative fields, making it possible for
anyone to create unique anime-style artwork through
AI tools.
2 RELATED WORKS
Anime face generation using deep learning has
garnered significant attention in recent years, with
many advancements in both research and practical
applications. Below is a brief overview of key related
works in the area:
2.1 GANs for Anime Face Generation
A generative adversarial network (GAN) is a deep learning framework. It consists of two
neural networks that work against each other to create
realistic data based on a provided training dataset. For
example, you can produce new images from an
existing image collection or compose original music
using a library of songs. The term adversarial is used
because it involves two distinct networks competing
with one another. One network generates new data by
altering an input sample as much as possible, while
the other network attempts to identify if the generated
data belongs to the original dataset. In simpler terms,
the second network assesses whether the data
produced is genuine or fabricated. The system
continues to refine and enhance the fake data until the
guessing network is unable to tell the difference
between fake and original data.
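To make this two-network game concrete, the following minimal PyTorch sketch pits a toy fully connected generator against a toy discriminator; all sizes and the simple BCE objective are illustrative assumptions, not the architecture used later in this paper.

```python
import torch
import torch.nn as nn

# Toy generator (noise -> fake sample) and discriminator (sample -> real/fake logit).
latent_dim, data_dim = 16, 64                      # illustrative sizes
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, data_dim)                   # stand-in for real training samples
fake = G(torch.randn(32, latent_dim))              # generator output from random noise

# Discriminator objective: score real samples as 1 and generated samples as 0.
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
# Generator objective: fool the discriminator into scoring fakes as 1.
g_loss = bce(D(fake), torch.ones(32, 1))
```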
Creating an anime face using deep learning,
particularly Generative Adversarial Networks
(GANs), has been an interesting field of research in
the AI community. GANs are often used to generate
realistic images, and for anime faces, they are trained
on large datasets of anime images to generate new,
unique anime-style faces. Figure 1 shows the generative adversarial network.
Here are some relevant works and approaches
related to GANs that have been used for generating
anime faces:
Figure 1: Generative adversarial network.
2.2 StyleGAN and StyleGAN2
StyleGAN and its improved version StyleGAN2 are among the
most famous GAN architectures used for generating
high-quality images. StyleGAN allows for more
control over the generated images by manipulating
the "style" or latent space at various levels, making it
highly suitable for generating faces. These models
can be used to generate realistic or stylized faces,
including anime faces, when trained on a specific
dataset of anime artwork.
2.2.1 Anime Faces with StyleGAN
There has been significant work where StyleGAN is
fine-tuned on large datasets of anime faces, such as
the Danbooru dataset. These models can generate
high-quality anime faces with fine details like eyes,
hair, and facial expressions.
2.3 AnimeGAN
AnimeGAN is a specific GAN-based model that was
designed to generate anime-style faces from real
images. The model is trained to convert real human
faces into anime-like faces by learning the specific
stylizations used in anime artwork. The StyleGAN architecture is shown in Figure 2.
Figure 2: StyleGAN architecture.
It’s a specialized model designed to ensure that
the generated image maintains the unique
characteristics of anime, such as large eyes,
simplified facial features, and bright colors.
2.4 DeepAnime
This is a deep learning framework that leverages
GANs to create anime faces. It includes several
architectures, including convolutional neural
networks (CNNs) and GANs, for the task of
translating or generating anime characters. DeepAnime can convert real-world
faces into anime-style versions or generate unique
anime-style faces from scratch.
2.5 CycleGAN for Anime Faces
CycleGAN is another GAN variant that is used for
image-to-image translation. It can be trained to
convert real human faces into anime-style faces
without the need for paired data. By using
unsupervised learning, CycleGAN learns to map
images between two domains (real faces and anime
faces). This allows it to generate anime faces from
photographs or create variations of anime characters.
Like all adversarial networks, CycleGAN has two parts, a generator and a discriminator: the generator’s job is to produce samples from the desired distribution, and the discriminator’s job is to decide whether a sample comes from the actual distribution (real) or was produced by the generator (fake). Figure 3 shows the CycleGAN architecture for unpaired image-to-image translation between domains X and Y.
Figure 3: CycleGAN Architecture for Unpaired Image-to-
Image Translation Between Domains X and Y.
CycleGAN is a fascinating deep learning model
designed for unpaired image-to-image translation.
Unlike traditional GANs that generate images from
random noise, CycleGAN transforms images from
one domain (say, horses) to another (like zebras)
without needing paired examples. Let’s break down
its architecture in a clear, natural way.
2.5.1 Core Idea
CycleGAN tackles the challenge of translating
images between two domains—let’s call them
Domain X and Domain Y—without a direct one-to-
one mapping in the training data. It uses two
generators and two discriminators, tied together with
a unique "cycle consistency" concept. The idea is
simple yet clever: if you translate an image from X to
Y and back to X, you should end up close to where
you started. This enforces consistency without paired
data.
Components of the Architecture
The CycleGAN architecture has four main neural
networks working in harmony:
Generator G (X → Y)
Takes an image from Domain X
(e.g., a horse) and generates a
version in Domain Y (e.g., a zebra).
Think of it as an artist who learns to
paint zebras when given horse
pictures.
Generator F (Y → X)
Does the reverse: takes an image
from Domain Y and generates one
in Domain X.
This is the artist flipping the task,
turning zebras back into horses.
Discriminator D_Y
Looks at images from Domain Y
(real zebras) and generated images
from G (fake zebras) and tries to tell
them apart.
It’s like a critic who decides if a
zebra painting looks real.
Discriminator D_X
Does the same for Domain X,
distinguishing real horses from fake
ones produced by F.
Another critic, this time for horse
authenticity.
How It Works
The training process is where the magic happens.
Here’s the flow:
Forward Cycle: Start with an image from X.
Generator G transforms it into a fake Y image
(X → G(X)). Then, Generator F takes that fake
Y and tries to reconstruct the original X (G(X)
→ F(G(X))). The goal? F(G(X)) should look
very close to the original X.
Backward Cycle: Flip it. Start with a Y image,
use F to make a fake X (Y → F(Y)), then G to
get back to Y (F(Y) → G(F(Y))). Again,
G(F(Y)) should resemble the original Y.
This back-and-forth cycle is what makes CycleGAN
special: it’s self-regulating. The generators aren’t just
making random images; they’re constrained to preserve
the essence of the input through this loop.
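A hedged sketch of the two reconstruction cycles is shown below; the single-convolution "generators" are placeholders standing in for the full CycleGAN generators described later in this section, and the L1 terms correspond to the cycle-consistency idea above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder generators: G maps domain X -> Y, F_gen maps Y -> X.
# Real CycleGAN generators are much deeper (see the residual-block sketch below).
G = nn.Conv2d(3, 3, kernel_size=3, padding=1)
F_gen = nn.Conv2d(3, 3, kernel_size=3, padding=1)

x = torch.randn(1, 3, 256, 256)   # image from domain X
y = torch.randn(1, 3, 256, 256)   # image from domain Y

# Forward cycle: X -> G(X) -> F(G(X)) should land close to the original X.
forward_cycle = F.l1_loss(F_gen(G(x)), x)
# Backward cycle: Y -> F(Y) -> G(F(Y)) should land close to the original Y.
backward_cycle = F.l1_loss(G(F_gen(y)), y)
cycle_loss = forward_cycle + backward_cycle
```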
Figure 4: Cycle GAN Framework with Generators and
Discriminators Demonstrating Cycle-Consistency Loss.
Network Design
Let’s get into the nuts and bolts of the architecture:
2.5.2 Generators
The generators typically follow a structure inspired
by image-to-image translation models:
Encoder: Downsamples the input image (e.g.,
256x256x3 for RGB) using convolutional layers
with stride 2. This shrinks the spatial size while
increasing the number of feature channels (e.g.,
64, 128, 256 filters).
Transformer: A set of residual blocks (often 6
or 9) with skip connections. These keep the core
features intact while applying the stylistic
changes between domains. Each block has
convolutions, normalization, and ReLU
activation.
Decoder: Upsamples back to the original size
using fractionally-strided convolutions
(sometimes called transposed convolutions)
with stride ½. This rebuilds the image with the
new style.
Normalization is key here: CycleGAN often uses
Instance Normalization (normalizing each sample’s
feature maps independently) instead of Batch
Normalization, since instance-level normalization works
better for style transfer tasks.
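A condensed PyTorch sketch of this encoder/residual/decoder layout, with Instance Normalization throughout, is given below; the layer counts and channel widths follow the description above but remain illustrative assumptions rather than the exact published configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block: two 3x3 convolutions with instance norm and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class ResNetGenerator(nn.Module):
    """Encoder -> residual transformer -> decoder, as described above."""
    def __init__(self, n_blocks=9):
        super().__init__()
        layers = [nn.Conv2d(3, 64, 7, padding=3), nn.InstanceNorm2d(64), nn.ReLU(True)]
        # Encoder: two stride-2 convolutions (64 -> 128 -> 256 channels).
        for in_c, out_c in [(64, 128), (128, 256)]:
            layers += [nn.Conv2d(in_c, out_c, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(out_c), nn.ReLU(True)]
        # Transformer: a stack of residual blocks at 256 channels.
        layers += [ResidualBlock(256) for _ in range(n_blocks)]
        # Decoder: fractionally-strided (transposed) convolutions back to full resolution.
        for in_c, out_c in [(256, 128), (128, 64)]:
            layers += [nn.ConvTranspose2d(in_c, out_c, 3, stride=2, padding=1, output_padding=1),
                       nn.InstanceNorm2d(out_c), nn.ReLU(True)]
        layers += [nn.Conv2d(64, 3, 7, padding=3), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

# A 256x256 RGB image in, a 256x256 stylized image out.
out = ResNetGenerator()(torch.randn(1, 3, 256, 256))
```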
2.5.3 Discriminators
The discriminators use a PatchGAN design, which is
different from a typical GAN discriminator:
Instead of judging the whole image as real or fake,
it evaluates small patches (e.g., 70x70 regions).
The architecture is a series of convolutional layers
(e.g., 64, 128, 256, 512 filters) with 4x4 kernels,
stride 2, and LeakyReLU activation. No
normalization on the first layer, but Instance
Normalization elsewhere.
The output is a grid (e.g., 30x30 for a 256x256
input), where each value says if its patch is real or
fake. The final score averages these patch predictions.
This patch-based approach focuses on local
texture and style, which suits image translation well.
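The PatchGAN critic can be sketched in a few lines, following the layer description above; the exact output grid size depends on the input resolution, and the layer choices here are assumptions modelled on the common 70x70 PatchGAN.

```python
import torch
import torch.nn as nn

def patchgan_discriminator():
    """PatchGAN: 4x4 convolutions, stride 2, LeakyReLU, instance norm after the first layer."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, True),
        nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, True),
        nn.Conv2d(256, 512, 4, stride=1, padding=1), nn.InstanceNorm2d(512), nn.LeakyReLU(0.2, True),
        nn.Conv2d(512, 1, 4, stride=1, padding=1),   # one real/fake score per patch
    )

scores = patchgan_discriminator()(torch.randn(1, 3, 256, 256))
print(scores.shape)   # roughly a 30x30 grid of patch scores for a 256x256 input
```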
2.5.4 Loss Functions
CycleGAN’s training balances three types of losses:
Adversarial Loss
For G and D_Y: G tries to fool D_Y by making G(X)
look like real Y images. D_Y fights back by getting
better at spotting fakes.
For F and D_X: Same deal, but F(Y) must trick
D_X into thinking it’s a real X.
Often uses Least Squares Loss (L2) instead of
the classic cross-entropy, making training
more stable.
Cycle-Consistency Loss
Measures how close F(G(X)) is to X and
G(F(Y)) is to Y using an L1 (mean absolute
error) norm.
This is the heart of unpaired training, ensuring
the mappings are reversible.
Weighted by a factor λ (typically 10) to
balance it against the adversarial loss.
$\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log(1 - D_Y(G(x)))]$ (1)
Figure 5: Visualization of Cycle-Consistency Loss in
Bidirectional Translation Between Domains X and Y Using
Cycle GAN.
2.5.5 Identity Loss (Optional)
If you feed G an image already from Y, it should
output something close to that input (G(Y) ≈ Y).
Same for F(X) ≈ X.
Helps preserve color and structure, especially for
subtle translations. Weighted lower (e.g., 0.5λ).
The total loss combines these, optimizing G and F
to minimize it while D_X and D_Y maximize their
adversarial terms.
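Putting the three terms together, the generator-side objective can be sketched as below; `G`, `F_gen`, `D_X`, and `D_Y` are assumed to be networks like those sketched earlier, and the λ = 10 and 0.5λ weights follow the values quoted above.

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G, F_gen, D_X, D_Y, x, y, lam=10.0):
    """Total generator-side objective: adversarial + cycle-consistency + identity terms."""
    fake_y, fake_x = G(x), F_gen(y)
    score_y, score_x = D_Y(fake_y), D_X(fake_x)

    # Adversarial terms (least-squares form): try to fool both discriminators.
    adv = F.mse_loss(score_y, torch.ones_like(score_y)) + \
          F.mse_loss(score_x, torch.ones_like(score_x))

    # Cycle-consistency: F(G(x)) should match x and G(F(y)) should match y.
    cycle = F.l1_loss(F_gen(fake_y), x) + F.l1_loss(G(fake_x), y)

    # Optional identity terms: G(y) should stay close to y, F(x) close to x.
    identity = F.l1_loss(G(y), y) + F.l1_loss(F_gen(x), x)

    return adv + lam * cycle + 0.5 * lam * identity
```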
Putting It Together
Imagine training CycleGAN to turn horses into
zebras:
G learns to add stripes to horses.
F learns to remove stripes from zebras.
D_Y checks if zebra images (real or fake) look
legit.
D_X ensures horse images pass muster.
$\mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}[\log(1 - D_X(F(y)))]$ (2)
The cycle loss keeps the horse’s pose and
background intact through the transformation
loop.
3 METHODOLOGY
3.1 Data Collection and Preprocessing
Data Collection: For the training of a deep
learning model, a large dataset of high-quality
anime face images is required. Common
datasets used for anime generation include:
Anime Face Dataset: A large-scale dataset of
labeled anime images, containing thousands of
different character faces.
Other Annotated Datasets: A collection of
anime characters with tagged attributes like
gender, expression, etc. Figure 6 shows the
high-level procedure of anime face generation
using deep learning.
Figure 6: High-level procedure of anime face generation
using deep learning.
3.2 Data Preprocessing
Image Resizing: All images are resized to a
fixed resolution (e.g., 64x64 or 256x256
pixels) for input consistency.
Normalization: Pixel values are normalized
to the range [0, 1] or [-1, 1] to speed up
training.
Augmentation: Random horizontal flipping,
rotation, and other techniques are applied to increase
the diversity of the training data and improve
generalization; a minimal transform pipeline is sketched below.
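A minimal torchvision pipeline covering these three steps might look as follows; the 64x64 resolution, the [-1, 1] range, and the specific augmentations are illustrative choices.

```python
from torchvision import transforms

# Resize, augment, convert to tensor, then normalize to [-1, 1] (matching a Tanh generator).
preprocess = transforms.Compose([
    transforms.Resize((64, 64)),                 # fixed input resolution
    transforms.RandomHorizontalFlip(p=0.5),      # simple augmentation
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),                       # scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # maps to [-1, 1]
])
# Typically passed to torchvision.datasets.ImageFolder(root, transform=preprocess).
```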
3.2.1 Choose a Deep Learning Framework
Options: Use TensorFlow, PyTorch, or Keras
(PyTorch is popular for GANs due to its
flexibility).
Hardware: Ensure access to a GPU (e.g.,
NVIDIA) for faster training. Google Colab or
a local setup with CUDA support works well.
Libraries: Install dependencies like NumPy,
OpenCV, and Matplotlib for data handling and
visualization; a quick environment check is sketched below.
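A quick check along these lines confirms that PyTorch can see a CUDA GPU before training begins; the printed version and device name will vary by setup.

```python
import torch

# Select the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch {torch.__version__}, training on {device}")
if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
```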
3.2.2 Select a Model Architecture
Model Type: Use a GAN, which consists of
two networks:
Generator: Takes random noise as input and
generates fake anime faces.
Discriminator: Evaluates whether an image is
real (from the dataset) or fake (from the
generator).
Specific Architecture: Start with a simple
DCGAN (Deep Convolutional GAN) or
upgrade to StyleGAN for higher quality.
Generator: Convolutional layers with
upsampling (e.g., transposed convolutions).
Discriminator: Convolutional layers with
downsampling.
Hyperparameters: Set the learning rate (e.g.,
0.0002), batch size (e.g., 64), and latent space
size (e.g., a 100-dimensional noise vector); a DCGAN
sketch using these values follows this list.
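A compact DCGAN-style generator and discriminator for 64x64 faces, using the 100-dimensional noise vector quoted above, is sketched below; the channel widths are common DCGAN defaults rather than values from this paper.

```python
import torch
import torch.nn as nn

latent_dim = 100  # size of the input noise vector

# Generator: project 100-d noise to a 4x4 feature map, then upsample to 64x64x3.
generator = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),  # 4x4
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),         # 8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),         # 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),           # 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                                     # 64x64
)

# Discriminator: mirror image with strided convolutions down to a single logit.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),                                # 32x32
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),         # 16x16
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),        # 8x8
    nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, True),        # 4x4
    nn.Conv2d(512, 1, 4, 1, 0),                                                        # 1x1 logit
)

fake = generator(torch.randn(64, latent_dim, 1, 1))   # a batch of 64 fake 64x64 faces
print(fake.shape, discriminator(fake).shape)
```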
3.2.3 Implement the Training Loop
Epochs: The model is trained over several
epochs, with frequent checkpoints to save
progress.
Batch Size: Choose an appropriate batch size
(e.g., 16, 32) based on GPU memory
limitations.
Learning Rate: A learning rate scheduler or
decayed learning rate is used to stabilize the
training process.
Optimization Algorithm: The Adam
optimizer is commonly used for both the
generator and discriminator; a minimal training-loop
sketch follows this list.
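Under those choices, a hedged sketch of the training loop is given below; the toy dataset and single-layer stand-in networks exist only so the snippet runs end to end, and would be replaced by the preprocessed anime-face dataset and the DCGAN modules sketched earlier.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and models so the loop runs end to end.
images = torch.rand(256, 3, 64, 64) * 2 - 1                       # toy "dataset" in [-1, 1]
dataloader = DataLoader(TensorDataset(images), batch_size=32, shuffle=True)
generator = nn.Sequential(nn.ConvTranspose2d(100, 3, 64), nn.Tanh())   # toy stand-in
discriminator = nn.Sequential(nn.Conv2d(3, 1, 64))                     # toy stand-in

criterion = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

for epoch in range(5):                                # use many more epochs in practice
    for (real,) in dataloader:
        b = real.size(0)
        fake = generator(torch.randn(b, 100, 1, 1))

        # Discriminator step: real images toward 1, generated images toward 0.
        opt_d.zero_grad()
        d_loss = criterion(discriminator(real).view(b, -1).mean(1), torch.ones(b)) + \
                 criterion(discriminator(fake.detach()).view(b, -1).mean(1), torch.zeros(b))
        d_loss.backward()
        opt_d.step()

        # Generator step: try to make the discriminator output 1 on fakes.
        opt_g.zero_grad()
        g_loss = criterion(discriminator(fake).view(b, -1).mean(1), torch.ones(b))
        g_loss.backward()
        opt_g.step()

    torch.save(generator.state_dict(), f"generator_epoch_{epoch}.pt")   # checkpoint
```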
Loss Functions
Adversarial Loss: The GAN's training is
based on the adversarial loss, where the
generator tries to fool the discriminator, and
the discriminator tries to correctly classify
images as real or fake.
Feature Matching Loss: In addition to the
adversarial loss, a feature matching loss (to
ensure high-level consistency in generated
features) can be added for better performance.
L2 Loss / Content Loss: For enhanced image
quality, an L2 loss can be used to minimize
pixel-wise differences between the generated
images and the corresponding real images (see
the sketch after this list).
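When a paired reference image is available (as in image-to-image translation, rather than purely noise-driven generation), the extra pixel-wise term can be folded into the generator loss as in this minimal sketch; the tensors and the weight of 10 are placeholders.

```python
import torch
import torch.nn.functional as F

adv_loss = torch.tensor(0.7)                 # placeholder adversarial generator loss
generated = torch.rand(1, 3, 64, 64)         # generated image (placeholder)
target = torch.rand(1, 3, 64, 64)            # paired reference image (placeholder)

pixel_loss = F.mse_loss(generated, target)   # L2 / content term
total_g_loss = adv_loss + 10.0 * pixel_loss  # weighting is a tunable assumption
```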
3.2.4 Evaluation and Post-Processing
Quantitative Evaluation
Frechet Inception Distance (FID): Measures
the similarity between generated images and
real images, evaluating the quality and
diversity of generated anime faces (a minimal
computation is sketched after this list).
Inception Score (IS): Another metric for
evaluating the quality of generated images
based on the classifier’s predictions.
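FID is rarely implemented from scratch; the sketch below uses the torchmetrics package (assuming a reasonably recent version of its FrechetInceptionDistance metric), with random tensors standing in for batches of real and generated faces.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# normalize=True accepts float images in [0, 1]; the default expects uint8 in [0, 255].
fid = FrechetInceptionDistance(feature=2048, normalize=True)

real_images = torch.rand(64, 3, 299, 299)    # placeholder for real anime faces
fake_images = torch.rand(64, 3, 299, 299)    # placeholder for generated faces

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())          # lower means generated and real are closer
```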
Qualitative Evaluation
Human Evaluation: Conduct surveys with
human evaluators to assess the realism and
diversity of generated anime faces.
Attribute-based Evaluation: For conditional
models, check the correctness of the
conditional attributes (e.g., matching the
gender or facial expressions).
Outputs
Figure 7 shows visual results of image-to-image
translation using CycleGAN for face style
transformation.
Figure 7: Visual Results of Image-to-Image Translation
Using CycleGAN for Face Style Transformation.
4 CONCLUSIONS
In conclusion, generating anime faces using deep
learning has revolutionized the way we create and
personalize animated characters. By utilizing
advanced models such as GANs and VAEs,
researchers have been able to generate highly realistic
and expressive anime faces. These models learn
intricate patterns and details from large datasets,
enabling the creation of diverse and unique
characters.
The ability to customize facial features allows for
endless possibilities, making anime face generation a
powerful tool in both creative and commercial
industries. While challenges such as training stability
and dataset bias remain, advancements in AI continue
to push the boundaries of what’s possible. As deep
learning technologies evolve, we can expect even
more realistic, nuanced, and creative anime
characters in the future. This progress opens doors for
further exploration in animation, gaming, and virtual
reality applications. Ultimately, AI-driven anime face
generation represents a significant leap toward a more
personalized, immersive, and creative digital
landscape.