
chitecture and preprocessing steps. Section 4 pro-
vides details about the dataset and accuracy analy-
sis, demonstrating the effectiveness of MobileNetV4-
Small in detecting deepfake images.
2 BACKGROUND STUDY
Deepfake detection has become an increasingly im-
portant research area, driven by the rapid evolution
of generative technologies such as Variational Au-
toEncoders (VAEs) (Kingma et al., 2019) and GANs
(Goodfellow et al., 2014). These methods allow
for highly realistic manipulations, with applications
in face swapping, reenactment (Nirkin et al., 2019),
and other forms of image tampering (Zheng et al.,
2019). To counter these challenges, researchers have
developed various detection methods (Heidari et al.,
2024), broadly categorized into traditional forensic
techniques, CNN-based methods, and lightweight ar-
chitectures.
Early approaches to detecting manipulated media
relied on forensic techniques that analyzed inconsis-
tencies in lighting, pixel values, and compression arti-
facts. While these methods could identify simple
manipulations, they struggled with the highly sophis-
ticated deepfake techniques enabled by advanced
GANs. For
example, techniques like Face2Face (Thies et al.,
2016) and Neural Textures (Thies et al., 2019) lever-
age 3D modeling and photometric reconstruction to
produce highly realistic results, making detection
through traditional methods increasingly difficult.
The introduction of CNNs revolutionized deep-
fake detection by automating feature extraction and
analysis. Models like ResNet (Targ et al., 2016) and
VGG (Tammina, 2019) demonstrated high accuracy
in detecting artifacts in manipulated media. ResNet-
50 achieved up to 95% accuracy on datasets like
Celeb-DF (Li et al., 2020), while EfficientNet (Tan
and Le, 2019), with its balance of computational ef-
ficiency and accuracy, has been a preferred choice
for many applications. Transformers, such as Vision
Transformers (Khan et al., 2022), have also emerged
as a promising alternative, providing strong general-
ization across datasets, though at a higher computa-
tional cost.
Given the need for real-time and resource-
efficient solutions, lightweight models such as Mo-
bileNet, ShuffleNet (Zhang et al., 2018), and
SqueezeNet (Bhuvaneswari and Enaganti, 2023) have
gained popularity. These architectures balance accu-
racy and computational requirements, making them
suitable for deployment on devices with limited re-
sources, such as mobile phones. MobileNetV2
achieved 89% accuracy on datasets like FaceForen-
sics++ (Rossler et al., 2019), but its limitations in
capturing subtle manipulation details restrict its use
in more challenging scenarios. ShuffleNet and
SqueezeNet also show promising results, with Shuf-
fleNetV2 achieving 88.7% accuracy and SqueezeNet
achieving 87.5% accuracy on similar datasets. How-
ever, these models tend to compromise on the ability
to detect subtle manipulation artifacts and are limited
by the lack of advanced modules for feature extrac-
tion.
MobileNetV4 and its compact variant,
MobileNetV4-Small, further enhance computa-
tional efficiency while maintaining high detection
accuracy. MobileNetV4-Small leverages advanced
techniques such as depthwise separable convolutions
and squeeze-and-excitation modules, optimizing fea-
ture extraction and ensuring the model can efficiently
process complex data while using fewer resources.
Compared to MobileNetV2, which achieves 89%
accuracy, MobileNetV4-Small reaches 89.78% ac-
curacy with a lower test loss of 0.2648. This
makes MobileNetV4-Small more efficient, better
able to detect subtle manipulations, and suitable for
deployment on resource-constrained devices, a cru-
cial requirement for deepfake detection in real-world
applications.
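The two building blocks mentioned above can be illustrated with a short sketch. The following minimal NumPy illustration shows the parameter savings of a depthwise separable convolution over a standard convolution, and a bare-bones squeeze-and-excitation step; the kernel size, channel counts, and bottleneck width are hypothetical examples, not the actual MobileNetV4-Small configuration.

```python
import numpy as np

def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution mixes space and channels in one step.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step (one k x k filter per input channel) followed by a
    # 1 x 1 pointwise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels.
std = standard_conv_params(3, 64, 128)        # 73728 weights
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768 weights

def squeeze_excite(x, w1, w2):
    # x: feature map of shape (C, H, W); w1, w2: bottleneck weights.
    z = x.mean(axis=(1, 2))              # squeeze: global average pool per channel
    h = np.maximum(z @ w1, 0.0)          # excite: bottleneck ReLU
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))  # sigmoid gives per-channel weights
    return x * s[:, None, None]          # recalibrate: rescale each channel
```

For this example layer the separable form uses roughly 8x fewer weights, which is the kind of saving that makes the architecture viable on mobile hardware.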
Despite the progress made by lightweight archi-
tectures, several challenges remain unresolved. One
major issue is the generalization of detection mod-
els across different types of manipulations. Many
models tend to overfit specific datasets, limiting their
ability to detect unseen manipulations. Addition-
ally, resource-intensive models like ResNet and Vi-
sion Transformers are impractical for deployment
on devices with constrained computational resources.
These limitations highlight the need for more effi-
cient, generalizable, and lightweight models that can
perform well across diverse manipulation techniques
and be deployed on resource-constrained devices.
The proposed preprocessing pipeline improves
generalization by applying augmentations like ran-
dom rotation and color jittering, enhancing the
model’s robustness to real-world variations. Com-
bined with the compact MobileNetV4-Small archi-
tecture, which ensures computational efficiency, this
approach makes real-time deepfake detection feasible
on lightweight devices. By addressing the limitations
of existing methods, our solution offers an efficient
and accurate deepfake detection model optimized for
resource-constrained environments. The next section
elaborates on the methodology and key innovations.
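The augmentation step described above can be sketched as follows. This is a minimal NumPy illustration (a real pipeline would typically use a library such as torchvision); the flip probability, jitter ranges, and the 90-degree stand-in for arbitrary-angle rotation are hypothetical choices, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_jitter(img, brightness=0.2, contrast=0.2):
    # img: float array in [0, 1] with shape (H, W, 3).
    c = 1.0 + rng.uniform(-contrast, contrast)      # random contrast factor
    b = 1.0 + rng.uniform(-brightness, brightness)  # random brightness factor
    out = (img - img.mean()) * c + img.mean()       # stretch around the mean
    return np.clip(out * b, 0.0, 1.0)

def random_rotation(img):
    # Coarse stand-in for small random rotations: a random multiple of
    # 90 degrees (arbitrary angles would require interpolation).
    return np.rot90(img, k=int(rng.integers(0, 4)))

def augment(img):
    if rng.random() < 0.5:          # random horizontal flip
        img = img[:, ::-1, :]
    return color_jitter(random_rotation(img))
```

Randomizing geometry and color in this way exposes the model to pose and lighting variation at train time, which is what improves robustness to unseen real-world inputs.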
INCOFT 2025 - International Conference on Futuristic Technology