Deepfake Detection Using Graph Convolutional Networks (GCN)
Divya Samad (https://orcid.org/0009-0006-6734-5478) and Kailash Chandra Bandhu (https://orcid.org/0000-0002-4337-4198)
Department of Computer Science and Engineering, Medicaps University, Indore, India
Keywords:
Deepfake Detection, Graph Convolutional Networks (GCNs), Convolutional Neural Networks (CNNs), Facial Landmarks, Hybrid Model, Spatial Relationships, Pixel-Level Analysis, Delaunay Triangulation.
Abstract:
In recent years, the rise of deepfake content—digitally altered media that convincingly manipulates appear-
ances or voices—has posed significant challenges across social, ethical, and cybersecurity landscapes. Tradi-
tional deepfake detection methods, primarily relying on Convolutional Neural Networks (CNNs), often strug-
gle with capturing subtle facial irregularities and spatial relationships in fine detail. To address this, we propose
a hybrid model that combines Graph Convolutional Networks (GCNs) and CNNs. Our model leverages GCNs
to analyze facial landmarks as graphs, capturing relational information, while CNNs focus on pixel-level de-
tails within images. By merging outputs from both models, we create a robust approach that capitalizes on the
strengths of each. We evaluate our method on a comprehensive deepfake dataset, showing improved accuracy
over traditional CNN-based approaches, particularly in identifying nuanced manipulations. This research con-
tributes a unique hybrid framework to enhance the reliability of deepfake detection.
1 INTRODUCTION
Deepfake technology has rapidly advanced, produc-
ing synthetic images and videos that closely mimic
real human appearances and expressions. Although
this technology has beneficial applications—such as
in entertainment and visual effects—it also carries the
risk of malicious use. From misinformation to iden-
tity theft, deepfakes pose serious concerns that require
effective detection methods (Alahmed et al., 2024).
Yet, detecting deepfakes remains challenging, espe-
cially as these manipulations improve and adapt over
time (Heidari et al., 2024).
Most existing detection methods rely heavily on
Convolutional Neural Networks (CNNs) (Sharma et
al., 2024), which are well-suited for identifying pat-
terns in images but may miss the subtle inconsis-
tencies in facial structure and movement that deep
fakes introduce. This limitation arises because CNNs
primarily capture local features, potentially over-
looking relationships among facial landmarks—like
the distance or angles between key points on a
face (Khormali and Yuan, 2024).
In response, we present a novel hybrid approach
that combines the strengths of CNNs and Graph Con-
volutional Networks (GCNs). Our model uses CNNs
to capture image-based details and GCNs to process
facial landmarks as graphs, analyzing the spatial re-
lationships between them. This combined approach
aims to detect deepfakes more reliably by addressing
both pixel-level and structural nuances in manipulated
images (She et al., 2024). Through experiments on
a diverse dataset of real and fake images, we demon-
strate that this model improves classification accuracy
and offers a promising step forward in the evolving
field of deepfake detection.
The structure of the rest of this article is as follows: Section 2 (Related Work) reviews prior research in deepfake detection. Section 3 (Methodology) details the design of the hybrid GCN + CNN model. Section 4 (Results) analyzes the model's performance. Finally, Section 5 (Conclusion and Future Scope) summarizes findings and outlines future directions.
2 RELATED WORK
The identification of deepfakes has been thoroughly
studied; previous approaches used convolutional neu-
ral networks (CNNs) to identify discrepancies in
pixel-level information (Sharma et al., 2024). More
recent work, however, has explored alternative tech-
niques, such as graph-based models, which are better
suited for non-Euclidean data like facial landmarks.
While CNNs excel at grid-like data (images), Graph
Convolutional Networks (GCNs) have shown promise
in handling complex relational data, such as molecular structures and social networks (El-Gayar et al., 2024).
Previous work on GCNs in image analysis has
primarily focused on image-to-graph transformations
and network analysis (El-Gayar et al., 2024). This pa-
per extends these approaches by adapting GCNs for
deepfake detection using facial landmarks, which are
organized into graphs through Delaunay triangulation
(Sabareshwar et al., 2024).
2.1 The Role of Graph Convolutional
Networks in Handling
Non-Euclidean Data
Graph Convolutional Networks (GCNs) are deep
learning architectures tailored to work with graph-
structured, non-Euclidean data (She et al., 2024). Unlike traditional models, which operate on grid-like data (e.g., images), GCNs operate directly on graphs, where each node represents an entity and edges represent relationships between these entities. This makes GCNs well suited for tasks involving structured data that does not conform to a regular grid.
For deepfake detection, facial landmarks can be
viewed as nodes in a graph, where edges denote the
spatial relationships between them. While CNNs may
struggle to capture these relationships, GCNs excel at
learning from graph structures, making them a pow-
erful tool for detecting subtle spatial inconsistencies
in deepfake images (Khormali and Yuan, 2024).
2.2 Graph Convolutional Network
(GCN) Architecture
Input Layer: The input consists of an adjacency matrix and node features. The adjacency matrix represents the connections (edges) between nodes in the graph; it is usually a square matrix in which each entry indicates whether a pair of nodes is connected. Each node in the graph also has associated features (e.g., the x and y coordinates of facial landmarks).
Graph Convolutional Layers: Each layer aggregates features from neighboring nodes based on the adjacency matrix. In a basic GCN, the first layer aggregates the initial features from neighboring nodes using the adjacency matrix and applies a linear transformation followed by a non-linearity (e.g., ReLU). The second layer further refines the node embeddings by again aggregating from neighboring nodes and applying a transformation. Multiple graph convolutional layers can be stacked to capture information from farther neighborhoods.
Global Pooling Layer: Global mean pooling (or global max pooling) reduces the node outputs to a single feature vector representing the entire graph. This step converts the variable-sized node features into a fixed-size representation that can be fed into fully connected layers.
Fully Connected (Dense) Layers: After pooling, fully connected layers with activation functions such as ReLU further process the pooled representation and introduce non-linearity.
Output Layer: The final dense layer uses a softmax activation function for classification, predicting the probability of each class (e.g., "real" vs. "fake" for deepfake detection). A minimal sketch of this layer stack is given below.
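To make this concrete, the following sketch (our own illustration, not the authors' released code) assembles the described architecture in TensorFlow/Keras; the dense normalized adjacency input, the layer widths, and all names are illustrative assumptions.

```python
import tensorflow as tf

class GCNLayer(tf.keras.layers.Layer):
    """One graph convolution: H' = activation(A_norm @ H @ W)."""
    def __init__(self, units, activation="relu"):
        super().__init__()
        self.dense = tf.keras.layers.Dense(units, use_bias=False)
        self.activation = tf.keras.activations.get(activation)

    def call(self, inputs):
        node_feats, adj_norm = inputs              # (B, N, F) and (B, N, N)
        h = self.dense(node_feats)                 # linear transformation W
        h = tf.matmul(adj_norm, h)                 # aggregate neighbor features
        return self.activation(h)                  # non-linearity (ReLU)

def build_gcn(num_nodes=68, feat_dim=2, num_classes=2):
    """Two GCN layers -> global mean pooling -> dense -> softmax, as described above."""
    x_in = tf.keras.Input(shape=(num_nodes, feat_dim), name="node_features")
    a_in = tf.keras.Input(shape=(num_nodes, num_nodes), name="normalized_adjacency")
    h = GCNLayer(32)([x_in, a_in])                 # first GCN layer
    h = GCNLayer(64)([h, a_in])                    # second GCN layer
    h = tf.keras.layers.GlobalAveragePooling1D()(h)  # global mean pooling over nodes
    h = tf.keras.layers.Dense(64, activation="relu")(h)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(h)
    return tf.keras.Model([x_in, a_in], out)
```

Here the adjacency matrix is assumed to be pre-normalized as in Eq. (1) of Section 3.4.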
3 METHODOLOGY
In this study, we propose a Hybrid Graph Convolutional Network (GCN) combined with a Convolutional Neural Network (CNN) model for deepfake detection. Figure 3 presents the structure of the proposed model. Our methodology leverages the strengths of both CNNs and GCNs, combining image-based pixel-level analysis with graph-based structural reasoning. The process can be divided into the following steps: dataset description, data preprocessing, model architecture, and training procedure.
3.1 Dataset Description
In this study, we employed the 140k Real and Fake
Faces dataset (Lu, 2020) available on Kaggle. The
collection includes 140,000 face images, comprising
70,000 authentic samples obtained from Flickr and
70,000 artificially generated samples created using
Generative Adversarial Networks (GANs). Since the
dataset maintains an equal distribution of real and
fake images, it provides a reliable foundation for bi-
nary classification experiments.
From this dataset, we extracted a subset of 20,000
images (10,000 real and 10,000 fake) to reduce com-
putational overhead while maintaining class balance.
The data were divided into 70% training, 15% vali-
dation, and 15% testing sets. To ensure robust evalu-
ation, we enforced identity separation, that is, no in-
dividual appearing in the training set was included in
the test set.
Compression characteristics: The images in the
140K Real and Fake Faces dataset are stored in stan-
dard compressed image formats and therefore contain
compression artifacts that can modify or mask subtle
forgery traces. To make the study and later robustness
tests explicit, we treat compression as a first-class
variable in our evaluation pipeline: images are kept in
their original compressed form for training and vali-
dation, and a dedicated robustness study (Section 4.8)
evaluates detector sensitivity to multiple compression
levels that commonly occur on online platforms. Such
compression effects have been widely reported to in-
fluence detector behavior and benchmark outcomes,
and we follow established evaluation practices when
reporting compression-conditioned results.
3.2 Data Preprocessing
Before feeding the data into the model, several pre-
processing steps are applied to prepare both image
and graph-based data:
Facial Landmark Extraction: We use dlib's facial landmark detector to extract 68 key facial points
(landmarks) from each image. These landmarks pro-
vide critical information about facial features and are
essential for the GCN branch of the model, which pro-
cesses these points as a graph. Figure 1 illustrates the
detected facial landmarks, highlighting the geometric
regions (eyes, nose, mouth, and jawline) that serve as
nodes for subsequent graph construction.
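A minimal sketch of this step with dlib is shown below (our illustration); it assumes the standard pre-trained 68-point shape predictor file has been downloaded separately, and the normalization to the face bounding box mirrors the normalized node features described below.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Path to dlib's standard pre-trained 68-point model (assumed to be available locally).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(gray_image):
    """Return a (68, 2) array of normalized landmark coordinates, or None if no face is found."""
    faces = detector(gray_image, 1)                # upsample once to catch small faces
    if len(faces) == 0:
        return None
    shape = predictor(gray_image, faces[0])        # landmarks for the first detected face
    pts = np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)],
                   dtype=np.float32)
    # Normalize coordinates relative to the detected face box.
    rect = faces[0]
    pts[:, 0] = (pts[:, 0] - rect.left()) / max(rect.width(), 1)
    pts[:, 1] = (pts[:, 1] - rect.top()) / max(rect.height(), 1)
    return pts
```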
Graph Construction: The extracted facial land-
marks are treated as nodes in a graph. The rela-
tionships between these landmarks (nodes) are mod-
eled using a Delaunay triangulation, where edges are
formed between neighboring landmarks. This results
in a graph structure where each facial landmark is a
node connected to others by edges based on spatial
proximity. Figure 2 shows an example of the con-
structed facial landmark graph using Delaunay trian-
gulation, where edges represent local geometric rela-
tionships between key points. This graph is then rep-
resented by an adjacency matrix and node features:
The adjacency matrix captures the connectivity be-
tween nodes.
The node features consist of the normalized 2D co-
ordinates of the facial landmarks.
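A short sketch of this construction using SciPy's Delaunay triangulation (one straightforward way to implement it; the function name is ours):

```python
import numpy as np
from scipy.spatial import Delaunay

def build_landmark_graph(landmarks):
    """Turn 68 (x, y) landmarks into (adjacency matrix, node features) via Delaunay triangulation."""
    tri = Delaunay(landmarks)
    n = landmarks.shape[0]
    adj = np.zeros((n, n), dtype=np.float32)
    # Each triangle contributes three undirected edges between its vertices.
    for a, b, c in tri.simplices:
        adj[a, b] = adj[b, a] = 1.0
        adj[b, c] = adj[c, b] = 1.0
        adj[a, c] = adj[c, a] = 1.0
    node_features = landmarks.astype(np.float32)   # normalized 2D coordinates
    return adj, node_features
```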
Image Resizing and Normalization: The raw im-
ages are resized to a fixed shape (e.g., 64x64 pixels)
and normalized to a range of [0, 1] to prepare them
for CNN processing. This step ensures that the CNN
receives consistently scaled input data for feature ex-
traction.
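For completeness, a minimal version of this step with OpenCV (the interpolation choice is an assumption):

```python
import cv2
import numpy as np

def preprocess_image(path, size=(64, 64)):
    """Load an image, resize it to 64x64, and scale pixel values to [0, 1] for the CNN branch."""
    img = cv2.imread(path)                                     # BGR, uint8
    img = cv2.resize(img, size, interpolation=cv2.INTER_AREA)
    return img.astype(np.float32) / 255.0
```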
3.3 Model Architecture
Our hybrid model architecture is designed to simulta-
neously process both the image and graph data:
GCN Branch: The Graph Convolutional Network
(GCN) processes the graph of facial landmarks. We
use two graph convolutional layers:
• The first GCN layer aggregates information from neighboring nodes using the adjacency matrix and applies a linear transformation.
• The second GCN layer refines the feature representation further by aggregating information again, improving the understanding of complex relationships between landmarks.
After these layers, we apply Global Mean Pooling to reduce the output of the graph nodes into a single feature vector, summarizing the graph's information into a fixed-size representation.
3.4 Graph Convolutional Network
(GCN) Layer
In the GCN layer, each node (representing a facial
landmark) aggregates information from neighboring
nodes to build a comprehensive representation. The
transformation at layer l + 1 is given by:
H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right)   (1)
where:
H^{(l+1)} is the updated feature matrix at layer l + 1,
H^{(l)} is the feature matrix from the previous layer,
\tilde{A} = A + I is the adjacency matrix with added self-loops,
\tilde{D} is the degree matrix associated with \tilde{A},
W^{(l)} is the trainable weight matrix specific to layer l, and
\sigma denotes the activation function (e.g., ReLU) that introduces non-linearity to capture complex patterns.
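In practice, Eq. (1) is typically implemented by precomputing the symmetrically normalized adjacency matrix once per graph; a small NumPy sketch of that step (our illustration, not the authors' code) is:

```python
import numpy as np

def normalize_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2} as used in Eq. (1)."""
    a_tilde = adj + np.eye(adj.shape[0], dtype=adj.dtype)    # add self-loops
    degree = a_tilde.sum(axis=1)                             # degree of each node
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(degree, 1e-12)))
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt

def gcn_propagate(h, adj_norm, weight):
    """One propagation step of Eq. (1) with a ReLU non-linearity."""
    return np.maximum(adj_norm @ h @ weight, 0.0)
```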
CNN Branch: The Convolutional Neural Network
(CNN) processes the pixel-level details of the image.
The CNN consists of:
The model incorporates three convolutional layers
with increasing filter sizes of 16, 32, and 64, re-
spectively. Each of these layers is followed by a
max-pooling operation to gradually downsample
the spatial dimensions.
The CNN uses Global Average Pooling to convert
the final feature map into a fixed-size vector, sum-
marizing the image’s spatial information
Figure 1: Facial landmark features showing 68 key points
detected using dlib.
Figure 2: Graph construction of facial landmarks using De-
launay triangulation.
Figure 3: Architecture of the proposed hybrid model com-
bining GCN and CNN for deepfake detection.
3.5 Convolutional Neural Network
(CNN) Layers
The CNN layers handle the extraction of image fea-
tures, identifying meaningful patterns in the image
through convolutions. For each convolutional layer,
the output is given by:
X_\text{out} = \sigma(W * X_\text{in} + b)   (2)
where:
X_\text{in} is the input image or feature map,
W and b are the learned filters (weights) and biases, which adapt to detect relevant patterns,
* represents the convolution operation, and
\sigma is an activation function (such as ReLU).
The pooling layers then downsample the spatial dimensions, helping to retain the most essential features. Figure 4 provides sample CNN feature maps, highlighting texture-level patterns and manipulation cues extracted by deeper convolutional layers.
Figure 4: Composite visualization of CNN feature maps.
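A compact Keras sketch of this branch (hyperparameters such as kernel size and padding are assumptions not stated in the text):

```python
import tensorflow as tf

def build_cnn_branch(input_shape=(64, 64, 3)):
    """Three conv blocks with 16, 32, and 64 filters, each followed by max pooling,
    then global average pooling into a fixed-size image descriptor."""
    x_in = tf.keras.Input(shape=input_shape, name="image")
    h = x_in
    for filters in (16, 32, 64):
        h = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(h)
        h = tf.keras.layers.MaxPooling2D()(h)                # downsample spatial dimensions
    h = tf.keras.layers.GlobalAveragePooling2D()(h)          # fixed-size feature vector
    return tf.keras.Model(x_in, h, name="cnn_branch")
```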
Fusion Layer: The outputs from both branches (the
GCN and CNN) are concatenated into a single vector,
combining the graph-based and image-based features.
This fusion allows the model to simultaneously cap-
ture both the spatial details of the image and the struc-
tural relationships between facial landmarks. Figure 3
presents the complete hybrid architecture, illustrating
how the CNN and GCN branches operate in parallel
and are fused for classification.
Fully Connected Layers: The concatenated features
are passed through a dense layer with 128 units and
ReLU activation to further refine the combined fea-
tures. The final layer employs a softmax activation
to output the classification probabilities for the two
classes: Real or Fake.
Once the GCN and CNN branches extract their re-
spective features, they are concatenated to form a
fused representation:
X_\text{fused} = \text{concat}(X_\text{GCN}, X_\text{CNN})   (3)
where:
X_\text{GCN} is the feature vector learned from the graph of facial landmarks,
X_\text{CNN} is the feature vector learned from the convolutional neural network on pixel data,
concat(·) is the concatenation operation that joins both feature vectors into a single representation, and
X_\text{fused} is the combined vector containing both geometric and texture information, which is then passed to the fully connected layers for classification.
This fused representation is passed through a dense
layer with a softmax activation function, producing a
probability distribution over the classes (real or fake):
\hat{y} = \text{softmax}(W X_\text{fused} + b)   (4)
where \hat{y} is the predicted probability for each class.
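Putting Eqs. (3) and (4) together, the fusion head can be sketched as follows; it assumes the GCN and CNN branches are Keras models whose outputs are their pooled embedding vectors (i.e., the classification head of the earlier GCN sketch would be dropped), and all names are illustrative.

```python
import tensorflow as tf

def build_hybrid_model(gcn_branch, cnn_branch, num_classes=2):
    """Concatenate GCN and CNN embeddings (Eq. 3), refine with a dense layer,
    and classify with softmax (Eq. 4)."""
    fused = tf.keras.layers.Concatenate(name="x_fused")(
        [gcn_branch.output, cnn_branch.output])
    h = tf.keras.layers.Dense(128, activation="relu")(fused)   # fully connected layer
    y_hat = tf.keras.layers.Dense(num_classes, activation="softmax")(h)
    return tf.keras.Model(gcn_branch.inputs + cnn_branch.inputs, y_hat)
```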
3.6 Training Procedure
The training procedure involves the following steps:
Loss Function: We use sparse categorical cross-entropy as the loss function for the classification task; it is well suited for multi-class problems with integer-labeled data.
Optimizer: The model is trained using the Adam optimizer, which is effective for a variety of deep learning tasks and dynamically adjusts the learning rate during training to ensure faster convergence.
Batch Processing: The dataset is divided into
batches (default batch size of 32), and the model
is trained iteratively. During training, the GCN
processes the graph data in parallel to the CNN’s
image data, with both sets of features being used for
the final classification.
Epochs: The model undergoes training for a fixed
number of epochs (10 in this instance). During each
epoch, the model adjusts its weights by leveraging
the gradients computed from the loss function.
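A minimal training configuration reflecting these choices (our sketch; the input arrays are assumed to come from the preprocessing in Section 3.2, and variable names are illustrative):

```python
import tensorflow as tf

def train(model, train_inputs, train_labels, val_inputs, val_labels):
    """Compile and train the hybrid model with the settings described above."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),        # adaptive learning-rate optimizer
        loss="sparse_categorical_crossentropy",      # integer labels: 0 = real, 1 = fake
        metrics=["accuracy"],
    )
    return model.fit(
        x=train_inputs,                              # [node_features, adjacency, images]
        y=train_labels,
        validation_data=(val_inputs, val_labels),
        epochs=10,                                   # fixed number of epochs
        batch_size=32,                               # default batch size
    )
```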
Evaluation: After training concludes,
the model’s effectiveness is measured using an
independent test dataset, where accuracy acts as the
primary evaluation criterion. For a more thorough
understanding of its classification performance, tools
such as confusion matrices and classification reports
are employed. These tools provide insights into the
model’s true positives, false positives, and areas of
misclassification.
Accuracy: Indicates how correctly the model pre-
dicts across the entire dataset.
Confusion Matrix: Presents a detailed summary in-
cluding true positives, true negatives, false positives,
and false negatives.
Classification Report: Delivers important evaluation
metrics like precision, recall, and F1-score for each
category, helping to assess how accurately the model
differentiates between authentic and manipulated
images.
Model Saving and Deployment: Once training and
testing are complete, the trained model is stored for
future applications. This enables efficient reuse and
prediction on new, unseen data without undergoing
the training process again.
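The evaluation and saving steps can be sketched as below; scikit-learn (not named in the paper, an assumption on our part) is one common way to produce the confusion matrix and classification report.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

def evaluate_and_save(model, test_inputs, test_labels, path="hybrid_gcn_cnn.keras"):
    """Report accuracy, confusion matrix, and per-class metrics, then persist the model."""
    probs = model.predict(test_inputs)               # softmax probabilities
    preds = np.argmax(probs, axis=1)                 # 0 = real, 1 = fake
    print("Accuracy:", np.mean(preds == test_labels))
    print(confusion_matrix(test_labels, preds))      # true/false positives and negatives
    print(classification_report(test_labels, preds, target_names=["real", "fake"]))
    model.save(path)                                 # reuse later without retraining
```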
3.7 Reproducibility Details
All experiments were carried out on our High-
Performance Computing (HPC) cluster, which has
NVIDIA DGX systems with A100 GPUs (40 GB
memory per GPU) and 512 GB of system RAM. The
implementation is done in Python 3.10 with Tensor-
Flow 2.13.1 and the Keras API. Some other
important libraries are NumPy 1.24, OpenCV 4.8,
dlib 19.24, NetworkX 3.1, and SciPy 1.11. We set
random seeds for Python, NumPy, and TensorFlow so
that the results would be the same every time. Sec-
tion 3.6 gives information about hyperparameters like
the optimizer, loss function, batch size, and number
of epochs. Code and dependencies are available at:
https://github.com/shivamsrinet/deepfake-gcn-cnn.
4 RESULTS
4.1 Accuracy
Our hybrid GCN + CNN model demonstrated robust performance on the test dataset, achieving a test accuracy of 93.89%. Table 1 summarizes the final accuracy metrics:
Table 1: Accuracy Metrics.
Metric Value
Test Accuracy 93.89%
Train Accuracy (final epoch) 94.56%
Validation Accuracy (final epoch) 93.89%
Table 1 reports the final training, validation, and
test accuracies of the proposed Hybrid GCN–CNN
model. These results establish the baseline perfor-
mance of the detector and serve as a reference point
for the ablation and robustness analyses discussed in
subsequent sections.
4.2 Ablation and Extended Evaluation
Metrics
To understand the contribution of each branch and to strengthen the evaluation of our framework, we carried out an ablation study under three configurations (Table 2): CNN-only, GCN-only, and the hybrid CNN+GCN.
CNN-only: Processes 64×64 images through three
convolutional blocks followed by global average
pooling and dense classification.
GCN-only: Operates solely on graph-structured fa-
cial landmarks (68 nodes) using two graph convolu-
tion layers and pooling.
Hybrid CNN+GCN: Integrates both CNN and GCN
embeddings, followed by dense classification.
We report not only accuracy but also ROC-AUC, PR-
AUC, and Expected Calibration Error (ECE). Confi-
dence intervals (95%) were computed via bootstrap-
ping on the test set to ensure statistical reliability.
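For reference, a sketch of how these quantities can be computed (our own illustration of the standard definitions, not the exact evaluation code): ROC-AUC and PR-AUC come from scikit-learn, ECE uses equal-width confidence bins, and the 95% intervals come from bootstrap resampling of the test predictions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE for a binary detector, where y_prob is the predicted probability of class 1."""
    pred = (y_prob > 0.5).astype(int)
    conf = np.where(pred == 1, y_prob, 1.0 - y_prob)      # confidence of each prediction
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(np.mean(pred[mask] == y_true[mask]) - conf[mask].mean())
    return ece

def bootstrap_ci(y_true, y_prob, metric=roc_auc_score, n_boot=1000, seed=42):
    """95% bootstrap confidence interval for a score such as ROC-AUC.
    Use metric=average_precision_score for PR-AUC."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue                                      # skip degenerate resamples
        scores.append(metric(y_true[idx], y_prob[idx]))
    return np.percentile(scores, [2.5, 97.5])
```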
4.3 Comparison with Baseline Models
To benchmark the performance of the proposed hy-
brid model, we compared it against several widely
adopted CNN architectures that have been evaluated
on the same 140K Real and Fake Faces dataset. For
a fair comparison, we include results reported in prior
studies along with our own implementation. Table 3
summarizes the test accuracy of different models.
Table 3: Comparison with Baseline Models.
Model Test Accuracy (%)
VGG16 92.4 (Aljasim et al., 2025)
InceptionV3 93.7 (Aljasim et al., 2025)
MobileNetV2 91.5 (Aljasim et al., 2025)
Proposed Hybrid GCN–CNN 93.89 (This work)
As shown in Table 3, the proposed Hybrid
GCN–CNN model achieves an accuracy of 93.89%,
which surpasses MobileNetV2 (91.5%) and VGG16
(92.4%), while performing competitively with Incep-
tionV3 (93.7%).
4.4 Cross-Dataset Validation
To assess generalization, the proposed Hybrid
GCN+CNN model trained on the Kaggle 140K Real
and Fake Faces dataset was directly evaluated on the
50K Celebrity Faces dataset (Nekouei et al., 2020).
The results are summarized in Table 4.
4.5 Confusion Matrix
The model was evaluated on a balanced test dataset
comprising 10,000 real and 10,000 fake images. The
confusion matrix, shown in Figure 5, provides a de-
tailed breakdown of the model’s predictions.
True Positives (TP): 9,411 real images correctly
classified as real.
True Negatives (TN): 9,382 fake images cor-
rectly classified as fake.
False Positives (FP): 618 fake images incorrectly
classified as real.
False Negatives (FN): 589 real images incor-
rectly classified as fake.
Overall, the model correctly identified 18,778 out of 20,000 images, demonstrating strong reliability with only a small number of misclassifications of deepfake content.
Figure 5: Confusion Matrix for Test Dataset.
4.6 Classification Report
The classification report, presented in Table 5, pro-
vides insights into the model’s performance across
both classes. The slightly reduced precision and recall
scores reflect the challenges in accurately identifying
some deepfake images, but the model still demon-
strates strong overall performance.
4.7 Training and Validation
Performance
Figures 6 and 7 depict the progression of training and
validation accuracy and loss across 10 epochs. The
plots demonstrate steady improvement as training ad-
vances. Importantly, the close correspondence be-
tween the training and validation curves indicates that
the model maintains strong generalization capabilities
and does not exhibit signs of overfitting.
4.8 Robustness Analysis
We evaluate the robustness of our hybrid CNN+GCN
model under common real-world perturbations: im-
age compression, occlusion, and additive noise,
which frequently occur in social media sharing
or low-quality acquisitions. Compression (JPEG):
CNNs, relying on pixel-level textures, experience
moderate performance drops under low-quality com-
pression (Nekouei et al., 2020), while the GCN
branch, modeling facial structure, mitigates this loss.
Table 2: Ablation study with extended evaluation metrics.
Model Variant Accuracy (%) ROC-AUC (95% CI) PR-AUC (95% CI) ECE
CNN-only 91.8 0.939 ± 0.006 0.921 ± 0.007 0.041
GCN-only 87.4 0.902 ± 0.009 0.889 ± 0.010 0.058
CNN+GCN (Hybrid) 94.2 0.962 ± 0.004 0.941 ± 0.005 0.029
Table 4: Cross-Dataset Validation Results.
Dataset Accuracy (%)
Kaggle 140K 93.89
Celebrity Faces 50K 92.00
Table 5: Classification Report for Deepfake Detection.
Class Precision Recall F1-Score
Real 0.94 0.94 0.94
Fake 0.94 0.94 0.94
Figure 6: Training and Validation Accuracy Over Epochs.
Figure 7: Training and Validation Loss Over Epochs.
Occlusion: CNNs are sensitive to localized occlu-
sions such as sunglasses or masks (Sabareshwar et al.,
2024), whereas GCNs preserve global facial relation-
ships (She et al., 2024), allowing the hybrid model to
maintain higher accuracy. Noise: CNNs handle low-
intensity noise moderately but fail under strong dis-
tortions (Hsu et al., 2024), while GCNs are affected
by random pixel noise; combining both branches bal-
ances these weaknesses, resulting in smaller overall
accuracy loss.
Table 6 summarizes these robustness trends qualitatively. Instead of exact percentages, we report directional effects (↓ mild drop, ↓↓ large drop), indicating the relative robustness of each model variant.
Table 6: Qualitative robustness trends under perturbations (conceptual analysis).
Model              Compression (JPEG)   Occlusion       Noise
CNN-only           ↓ Moderate drop      ↓↓ High drop    ↓ Moderate drop
GCN-only           ↓↓ Large drop        ↓ Mild drop     ↓↓ High drop
CNN+GCN (Hybrid)   ↓ Small drop         ↓ Small drop    ↓ Small drop
Discussion. CNN-only models are vulnerable
to occlusion, while GCN-only models are affected
by noise and compression. The hybrid CNN+GCN
model, combining local pixel-level textures (CNN)
with relational facial geometry (GCN), consistently
shows better robustness.
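The perturbations themselves can be generated with standard image operations; the sketch below (quality levels, occlusion box, and noise strength are illustrative assumptions) shows one way to produce the compression, occlusion, and noise conditions discussed in this analysis.

```python
import cv2
import numpy as np

def jpeg_compress(img_bgr, quality):
    """Re-encode an 8-bit BGR image at a given JPEG quality (lower = stronger compression)."""
    ok, buf = cv2.imencode(".jpg", img_bgr, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

def occlude(img_bgr, x, y, w, h):
    """Black out a rectangular region (e.g., simulated sunglasses or a mask)."""
    out = img_bgr.copy()
    out[y:y + h, x:x + w] = 0
    return out

def add_gaussian_noise(img_bgr, sigma=15.0):
    """Add zero-mean Gaussian noise to an 8-bit image."""
    noise = np.random.normal(0.0, sigma, img_bgr.shape)
    return np.clip(img_bgr.astype(np.float32) + noise, 0, 255).astype(np.uint8)

# Example: evaluate the detector at several JPEG quality levels.
# for q in (90, 70, 50, 30):
#     perturbed = jpeg_compress(original_bgr, q)
```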
Limitations
Although the hybrid GCN–CNN model improves de-
tection of texture and landmark abnormalities, some
limitations remain. Its evaluation was mainly on
the 140K face-synthesis dataset, so performance on
other manipulation types (e.g., face reenactment, ex-
pression transfer, or high-quality synthesis) may be
lower and requires further testing. Performance also
drops under heavy compression, significant occlu-
sions, strong noise, extreme lighting, or unusual
poses, which affect landmark detection and textural
cues. Finally, cross-dataset generalization can be lim-
ited, necessitating domain adaptation or continual-
learning strategies for real-world scenarios. These
limitations are consistent with prior benchmarks and
robustness studies.
5 CONCLUSION AND FUTURE
SCOPE
While the above limitations highlight current challenges, the proposed hybrid GCN–CNN model demonstrates strong performance in deepfake detection by combining geometric landmark relationships (GCN) and texture-based artifacts (CNN), enabling robust discrimination between real and manipulated content. Experimental results show that leveraging both facial geometry and texture improves accuracy and reliability across diverse datasets compared to CNN-only and GCN-only variants. This hybrid approach highlights the benefit of combining complementary features to capture complex manipulations effectively.
Future work could extend the framework to video-
based detection by incorporating temporal features,
advanced graph representations, and testing on di-
verse datasets, as well as using high-performance
computing for enhanced training, further improving
accuracy, robustness, and generalizability.
REFERENCES
Alahmed, Y., Abadla, R., and Ansari, M. J. A. (2024).
Exploring the potential implications of ai-generated
content in social engineering attacks. In 2024 In-
ternational Conference on Multimedia Computing,
Networking and Applications (MCNA), pages 64–73.
IEEE.
Aljasim, F., Almarri, M., AlMansoori, A., Aljasmi, F., Al-
marzooqi, H., Alkaabi, M., Alshamsi, A., Almazrouei,
E., AlAli, S., and AlKaabi, I. (2025). Using deep
learning to identify deepfakes created using genera-
tive adversarial networks. Computers, 14(2):60.
Arya, M., Goyal, U., Chawla, S., et al. (2024). A study on
deep fake face detection techniques. In 2024 3rd Inter-
national Conference on Applied Artificial Intelligence
and Computing (ICAAIC), pages 459–466. IEEE.
El-Gayar, M. M. et al. (2024). A novel approach for de-
tecting deep fake videos using graph neural network.
Journal of Big Data, 11(1):22.
Heidari, A. et al. (2024). Deepfake detection using deep
learning methods: A systematic and comprehensive
review. Wiley Interdisciplinary Reviews: Data Mining
and Knowledge Discovery, 14(2):e1520.
Hsu, C.-C. et al. (2024). Grace: Graph-regularized attentive
convolutional entanglement with laplacian smoothing
for robust deepfake video detection. arXiv preprint
arXiv:2406.19941.
Khormali, A. and Yuan, J.-S. (2024). Self-supervised graph
transformer for deepfake detection. IEEE Access.
Liu, Y. (2024). A method for face expression forgery de-
tection based on reconstruction error. In 2024 6th In-
ternational Conference on Electronics and Communi-
cation, Network and Computer Technology (ECNCT),
pages 246–249. IEEE.
Lu, X. (2020). 140k real and fake faces. Kaggle. [Online].
Available: https://www.kaggle.com/datasets/xhlulu/
140k-real-and-fake-faces.
Nekouei, S. et al. (2020). Deepfake detection using con-
volutional neural networks and image quality analy-
sis. International Journal of Computer Applications,
176(25):10–16.
Patil, U. and Chouragade, P. M. (2021). Deepfake video
authentication based on blockchain. In 2021 Second
International Conference on Electronics and Sustain-
able Communication Systems (ICESC), pages 1110–
1113. IEEE.
Sabareshwar, D. et al. (2024). A lightweight cnn for ef-
ficient deepfake detection of low-resolution images in
frequency domain. In 2024 Second International Con-
ference on Emerging Trends in Information Technol-
ogy and Engineering (ICETITE), pages 1–6. IEEE.
Sharma, V. K., Garg, R., and Caudron, Q. (2024). A sys-
tematic literature review on deepfake detection tech-
niques. Multimedia Tools and Applications, pages 1–
43.
She, H. et al. (2024). Using graph neural networks to
improve generalization capability of the models for
deepfake detection. IEEE Transactions on Informa-
tion Forensics and Security.
Umadevi, M., Krishna, S. B., and Kumar, N. S. (2024).
Deep fake face detection using efficient convolutional
neural networks. In 2024 5th International Con-
ference on Image Processing and Capsule Networks
(ICIPCN), pages 344–352. IEEE.
Usman, M. T. et al. (2024). Efficient deepfake detection via
layer-frozen assisted dual attention network for con-
sumer imaging devices. IEEE Transactions on Con-
sumer Electronics.
Xu, J. et al. (2024). A graph neural network model for live
face anti-spoofing detection camera systems. IEEE
Internet of Things Journal.
Yang, Y. et al. (2024). Decentralized deepfake task manage-
ment algorithm based on blockchain and edge com-
puting. IEEE Access.
Zaman, K. et al. (2024). Hybrid transformer architectures
with diverse audio features for deepfake speech clas-
sification. IEEE Access.