The performance metrics of this comparative
study show that the YuNet-based deepfake detection
pipeline outperforms the MTCNN-based approach in
accuracy, processing time, and robustness. These
advantages indicate that the YuNet-based pipeline is
a practical choice for real-world deepfake detection,
offering greater reliability and efficiency in tackling
the problem of synthetic media manipulation.
6 CONCLUSIONS
The comparative study of the MTCNN and YuNet
face detectors, each paired with InceptionResNetV1
as the core recognition network for deepfake
detection, indicates significant advantages for the
YuNet-based pipeline. The experimental results show
that YuNet outperforms MTCNN on all three key
performance factors: detection accuracy, processing
speed, and robustness to varying input conditions.
These benefits stem directly from YuNet's improved
face detection stage and its architectural
modifications for minimizing latency, which make it
especially suitable for low-latency applications such
as deepfake detection.
This study lays the groundwork for several
promising avenues of future work. One direction is to
use transfer learning to adapt pre-trained
InceptionResNetV1 models by fine-tuning them on
domain-specific datasets that more accurately reflect
the evolving landscape of synthetic media. While
some recent works have focused solely on retraining
detectors on new data through transfer learning,
others have incorporated architectural innovations
from contemporary state-of-the-art neural networks,
which may improve detection performance on
progressively more advanced deepfake content that
exhibits only subtle facial artifacts.
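A common minimal form of this fine-tuning idea is to freeze the pre-trained backbone and train only a lightweight classification head on its embeddings. The sketch below illustrates that pattern under simplifying assumptions: random 512-dimensional vectors stand in for frozen InceptionResNetV1 embeddings, the labels are synthetic, and the head is a logistic-regression classifier trained with plain gradient descent.

```python
import numpy as np

# Transfer-learning sketch: keep the pre-trained backbone frozen and train
# only a small logistic-regression head on its embeddings. The 512-d
# embeddings below are random stand-ins for frozen InceptionResNetV1 features.
rng = np.random.default_rng(0)

n, dim = 200, 512
embeddings = rng.normal(size=(n, dim))          # "frozen" backbone outputs
labels = (embeddings[:, 0] > 0).astype(float)   # toy real/fake labels

w = np.zeros(dim)                               # head parameters (the only
b = 0.0                                         # weights being trained)
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):                            # gradient descent on the head only
    p = sigmoid(embeddings @ w + b)
    grad_w = embeddings.T @ (p - labels) / n
    grad_b = np.mean(p - labels)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean((sigmoid(embeddings @ w + b) > 0.5) == labels)
print(f"head-only training accuracy: {acc:.2f}")
```

In practice the same structure carries over to a deep-learning framework: the backbone's parameters are excluded from the optimizer, and only the replacement head receives gradient updates.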
Another valuable direction for future work is
robust data augmentation. Advanced methods such as
geometric transformations, noise injection, and
adversarial training can broaden model
generalization to cover a wide range of deepfake
generation techniques, allowing detection systems to
remain effective even as deepfake technologies grow
in complexity and subtlety. Ensemble methodologies
also deserve thorough study. Ensembles that combine
models specialized in different manipulation types or
artifact domains could pave the way towards more
complete and generalized detection systems.
Ensemble techniques can include combinations of
convolutional neural networks with transformer
architectures, as well as spatial and temporal analysis
for video deepfakes, further enhancing the detection
of synthetic media that often requires multiple
complementary cues.
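As an illustration of the augmentation strategies discussed above, the following sketch applies random geometric transformations and additive Gaussian noise to a toy face crop. The image and parameters are placeholders; a production pipeline would typically rely on a library such as torchvision or albumentations.

```python
import numpy as np

rng = np.random.default_rng(1)

def augment(img, rng):
    """Apply simple geometric and noise augmentations to an HxWx3 image in [0, 1]."""
    out = img.copy()
    if rng.random() < 0.5:                        # random horizontal flip
        out = out[:, ::-1, :]
    if rng.random() < 0.5:                        # random 90-degree rotation
        out = np.rot90(out, k=1, axes=(0, 1))
    out = out + rng.normal(0.0, 0.05, out.shape)  # additive Gaussian noise
    return np.clip(out, 0.0, 1.0)                 # keep valid pixel range

image = rng.random((64, 64, 3))                   # toy face crop
batch = [augment(image, rng) for _ in range(8)]   # augmented training views
print(len(batch), batch[0].shape)
```

Adversarial training would extend this scheme by generating perturbations against the current detector rather than sampling them at random.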
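The ensemble idea can likewise be sketched as soft voting over the fake-probability outputs of several detectors. All scores and weights below are synthetic placeholders; in a real system they would come from, for example, a CNN branch, a transformer branch, and a temporal-analysis branch.

```python
import numpy as np

# Soft-voting ensemble sketch: combine the fake-probabilities produced by
# several independent detectors using per-model reliability weights.
cnn_scores = np.array([0.92, 0.15, 0.60, 0.81])          # hypothetical CNN outputs
transformer_scores = np.array([0.88, 0.22, 0.48, 0.93])  # hypothetical transformer outputs
temporal_scores = np.array([0.95, 0.10, 0.55, 0.70])     # hypothetical temporal outputs

weights = np.array([0.4, 0.4, 0.2])                      # assumed reliability weights
stacked = np.stack([cnn_scores, transformer_scores, temporal_scores])
ensemble = weights @ stacked                             # weighted soft vote

predictions = ensemble > 0.5                             # final real/fake decision
print(ensemble.round(3), predictions)
```

Disagreement between branches (as in the third sample here) is exactly where such weighting matters most, since the final decision hinges on which models the ensemble trusts.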
Beyond facial analysis alone, future research
should explore multimodal approaches that jointly
consider the visual and audio features of media in
order to detect the discrepancies typical of deepfakes.
In addition, work on effective detection pipelines can
focus on developing lightweight variants and
deploying them on edge devices, broadening the
practical reach of deepfake detection technologies.
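One common route to such lightweight edge variants is weight quantization. The sketch below shows symmetric per-tensor 8-bit quantization of a toy weight matrix, with NumPy standing in for a real toolchain such as PyTorch's quantization utilities; the layer shape and weight distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
weights = rng.normal(0.0, 0.1, size=(256, 128)).astype(np.float32)  # toy layer weights

# Symmetric per-tensor 8-bit quantization: map floats to int8 with one scale
# chosen so the largest-magnitude weight lands on +/-127.
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequant = q.astype(np.float32) * scale           # reconstruction used at inference
err = np.abs(weights - dequant).max()            # worst-case rounding error

print(f"storage: {weights.nbytes} -> {q.nbytes} bytes, max abs error {err:.5f}")
```

The 4x storage reduction (float32 to int8) comes at the cost of a bounded rounding error of at most half the quantization scale, which is typically tolerable for detection backbones.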
As state-of-the-art models for generating
synthetic media grow ever more sophisticated while
also becoming easier to acquire and use, the potential
effects of deepfakes on many aspects of society are
alarming. This arms race between generation and
detection technologies requires that detection
methodologies continue to evolve. This study
contributes to this vital domain by extending the
knowledge of effective strategies for sustaining
digital media authenticity in a world filled with ever
more persuasive synthetic media.
These results underscore the informativeness of
video as a medium for building more effective
deepfake detection systems and the need for
continued research investment in this area to
maintain the integrity of information across the
digital ecosystem. Future interdisciplinary
collaboration between computer vision specialists,
security researchers, and media forensics experts will
be critical to creating holistic solutions to these
growing threats.