
chitecture and preprocessing steps. Section 4 pro-
vides details about the dataset and accuracy analy-
sis, demonstrating the effectiveness of MobileNetV4-
Small in detecting deepfake images.
2 BACKGROUND STUDY
Deepfake detection has become an increasingly im-
portant research area, driven by the rapid evolution
of generative technologies such as Variational Au-
toEncoders (VAEs) (Kingma et al., 2019) and GANs
(Goodfellow et al., 2014). These methods allow
for highly realistic manipulations, with applications
in face swapping, reenactment (Nirkin et al., 2019),
and other forms of image tampering (Zheng et al.,
2019). To counter these challenges, researchers have
developed various detection methods (Heidari et al.,
2024), broadly categorized into traditional forensic
techniques, CNN-based methods, and lightweight ar-
chitectures.
Early approaches to detecting manipulated media
relied on forensic techniques that analyzed inconsis-
tencies in lighting, pixel values, and compression arti-
facts. While these methods could identify simple
manipulations, they struggled with the highly sophis-
ticated deepfake techniques enabled by advanced
GANs. For
example, techniques like Face2Face (Thies et al.,
2016) and Neural Textures (Thies et al., 2019) lever-
age 3D modeling and photometric reconstruction to
produce highly realistic results, making detection
through traditional methods increasingly difficult.
The introduction of CNNs revolutionized deep-
fake detection by automating feature extraction and
analysis. Models like ResNet (Targ et al., 2016) and
VGG (Tammina, 2019) demonstrated high accuracy
in detecting artifacts in manipulated media. ResNet-
50 achieved up to 95% accuracy on datasets like
Celeb-DF (Li et al., 2020), while EfficientNet (Tan
and Le, 2019), with its balance of computational ef-
ficiency and accuracy, has been a preferred choice
for many applications. Transformers, such as Vision
Transformers (Khan et al., 2022), have also emerged
as a promising alternative, providing strong general-
ization across datasets, though at a higher computa-
tional cost.
Given the need for real-time and resource-
efficient solutions, lightweight models such as Mo-
bileNet, ShuffleNet (Zhang et al., 2018), and
SqueezeNet (Bhuvaneswari and Enaganti, 2023) have
gained popularity. These architectures balance accu-
racy and computational requirements, making them
suitable for deployment on devices with limited re-
sources, such as mobile phones. MobileNetV2
achieved 89% accuracy on datasets like FaceForen-
sics++ (Rossler et al., 2019), but its limitations in
capturing subtle manipulation details restrict its use
in more challenging scenarios. ShuffleNet and
SqueezeNet also show promising results, with Shuf-
fleNetV2 achieving 88.7% accuracy and SqueezeNet
achieving 87.5% accuracy on similar datasets. How-
ever, these models tend to compromise on the ability
to detect subtle manipulation artifacts and are limited
by the lack of advanced modules for feature extrac-
tion.
MobileNetV4 and its compact variant,
MobileNetV4-Small, further enhance computa-
tional efficiency while maintaining high detection
accuracy. MobileNetV4-Small leverages advanced
techniques such as depthwise separable convolutions
and squeeze-and-excitation modules, optimizing fea-
ture extraction and ensuring the model can efficiently
process complex data while using fewer resources.
Compared to MobileNetV2, which achieves 89%
accuracy, MobileNetV4-Small reaches 89.78% ac-
curacy with a lower test loss of 0.2648. This
makes MobileNetV4-Small more efficient, better
able to detect subtle manipulations, and suitable for
deployment on resource-constrained devices, a cru-
cial requirement for deepfake detection in real-world
applications.
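The two building blocks mentioned above can be illustrated with a short sketch. The following minimal NumPy illustration shows the parameter savings of a depthwise separable convolution over a standard convolution, and a bare-bones squeeze-and-excitation step; the kernel size, channel counts, and bottleneck width are hypothetical examples, not the actual MobileNetV4-Small configuration.

```python
import numpy as np

def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution mixes space and channels in one step.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step (one k x k filter per input channel) followed by a
    # 1 x 1 pointwise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels.
std = standard_conv_params(3, 64, 128)        # 73728 weights
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768 weights

def squeeze_excite(x, w1, w2):
    # x: feature map of shape (C, H, W); w1, w2: bottleneck weights.
    z = x.mean(axis=(1, 2))              # squeeze: global average pool per channel
    h = np.maximum(z @ w1, 0.0)          # excite: bottleneck ReLU
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))  # sigmoid gives per-channel weights
    return x * s[:, None, None]          # recalibrate: rescale each channel
```

For this example layer the separable form uses roughly 8x fewer weights, which is the kind of saving that makes the architecture viable on mobile hardware.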
Despite the progress made by lightweight archi-
tectures, several challenges remain unresolved. One
major issue is the generalization of detection mod-
els across different types of manipulations. Many
models tend to overfit specific datasets, limiting their
ability to detect unseen manipulations. Addition-
ally, resource-intensive models like ResNet and Vi-
sion Transformers are impractical for deployment
on devices with constrained computational resources.
These limitations highlight the need for more effi-
cient, generalizable, and lightweight models that can
perform well across diverse manipulation techniques
and be deployed on resource-constrained devices.
The proposed preprocessing pipeline improves
generalization by applying augmentations like ran-
dom rotation and color jittering, enhancing the
model’s robustness to real-world variations. Com-
bined with the compact MobileNetV4-Small archi-
tecture, which ensures computational efficiency, this
approach makes real-time deepfake detection feasible
on lightweight devices. By addressing the limitations
of existing methods, our solution offers an efficient
and accurate deepfake detection model optimized for
resource-constrained environments. The next section
elaborates on the methodology and key innovations.
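The augmentation step described above can be sketched as follows. This is a minimal NumPy illustration (a real pipeline would typically use a library such as torchvision); the flip probability, jitter ranges, and the 90-degree stand-in for arbitrary-angle rotation are hypothetical choices, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_jitter(img, brightness=0.2, contrast=0.2):
    # img: float array in [0, 1] with shape (H, W, 3).
    c = 1.0 + rng.uniform(-contrast, contrast)      # random contrast factor
    b = 1.0 + rng.uniform(-brightness, brightness)  # random brightness factor
    out = (img - img.mean()) * c + img.mean()       # stretch around the mean
    return np.clip(out * b, 0.0, 1.0)

def random_rotation(img):
    # Coarse stand-in for small random rotations: a random multiple of
    # 90 degrees (arbitrary angles would require interpolation).
    return np.rot90(img, k=int(rng.integers(0, 4)))

def augment(img):
    if rng.random() < 0.5:          # random horizontal flip
        img = img[:, ::-1, :]
    return color_jitter(random_rotation(img))
```

Randomizing geometry and color in this way exposes the model to pose and lighting variation at train time, which is what improves robustness to unseen real-world inputs.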
INCOFT 2025 - International Conference on Futuristic Technology