detection and developing tools to verify the
authenticity of digital content. Additionally, public
awareness and education on the dangers of deepfakes
are vital for empowering individuals to recognize and
avoid falling victim to misinformation. In (Ahmad et
al., 2022), the authors investigated methods to
enhance the performance of the K-Nearest
Neighbours (KNN) algorithm by utilizing a Genetic
Algorithm to optimize the parameters of nonlinear
functions associated with different features, resulting
in improved outcomes. Similarly, Preeti Nair and
Indu Kashyap in (Kumar et al., 2021) emphasized the
benefits of incorporating resampling techniques and
the Interquartile Range (IQR) during data
preprocessing, which helps normalize the input data
for classifiers and improves algorithm performance.
The authors of (Agrawal and Ramalingam, 2019)
developed a model for detecting fake news that
analyses headlines and user engagement data from
social media platforms. K. Nagashri and J.
Sangeetha, in (Parth and Iqbal, 2020), focused on
identifying fake news through count vectorization
techniques, evaluated various machine learning
algorithms on metrics such as accuracy,
precision, recall, and F1 score, and concluded that
TF-IDF is an effective text preprocessing method. In
(Singh, Yadav, and Verma, 2022), researchers
examined the relationship between word usage and
context to classify texts as genuine or fake. They
employed models such as Count Vectorizer to
transform text into numerical data, assessing which
models effectively distinguished real from fake
content. Shlok Gilda, in (Roy and Bhattacharya,
2020), utilized term frequency-inverse document
frequency (TF-IDF) with bi-grams and probabilistic
context-free grammar (PCFG) techniques on a
dataset of around 11,000 articles, achieving a
classification accuracy of 77.2% using various
machine learning algorithms like Random Forests
and Gradient Boosting.
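To make the feature-extraction step discussed in these studies concrete, the sketch below shows a TF-IDF pipeline over unigrams and bi-grams feeding Random Forest and Gradient Boosting classifiers. It assumes scikit-learn; the function name, parameter values, and data split are illustrative and are not taken from the cited works.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    def evaluate_tfidf_models(texts, labels):
        # Represent each article by TF-IDF weights over unigrams and bi-grams.
        vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50000)
        features = vectorizer.fit_transform(texts)
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.2, random_state=42)
        # Compare two ensemble classifiers on held-out accuracy.
        for model in (RandomForestClassifier(n_estimators=200, random_state=42),
                      GradientBoostingClassifier(random_state=42)):
            model.fit(X_train, y_train)
            print(type(model).__name__,
                  accuracy_score(y_test, model.predict(X_test)))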
2 METHODS
2.1 KNN Classifier
The KNN algorithm is a supervised machine learning
technique employed for both classification and
regression tasks. It operates by analysing a dataset of
labelled inputs to create a function that assigns labels
to new, unlabelled data based on the concept of
"nearest neighbours." The parameter "K" indicates
how many neighbours are considered when
determining the label of a new input. The algorithm
evaluates the proximity of the input data points in a
multidimensional space and assigns a label based on
the majority vote among the nearest neighbours. The
optimal K value is typically identified through
trial-and-error, with the elbow method being a
common approach.
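As a minimal illustration of this selection procedure (scikit-learn is assumed; the synthetic dataset and the range of candidate K values are purely illustrative, not the configuration used in this work), the neighbour count can be chosen by scanning candidate values and keeping the one with the lowest validation error:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Elbow-style search: record the validation error for each candidate K
    # and pick the value beyond which the error stops improving noticeably.
    errors = []
    for k in range(1, 21):
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        errors.append(1 - knn.score(X_test, y_test))
    best_k = int(np.argmin(errors)) + 1
    print("candidate errors:", errors)
    print("lowest-error K:", best_k)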
2.2 Generative Adversarial Network
(GAN)
Generative Adversarial Networks (GANs) are a class
of deep learning models consisting of two competing
neural networks: a generator and a discriminator. The
generator's role is to produce new data samples, while
the discriminator's function is to differentiate between
real and generated (fake) samples. This competitive
training process enables the generator to improve its
ability to create realistic data that resembles the
training dataset. The GAN architecture has gained
prominence in deepfake creation, utilizing the
adversarial relationship between the generator and
discriminator to refine the quality of generated
outputs. The training process involves optimizing
loss functions that guide both networks: the generator
seeks to minimize the chances of its outputs being
identified as fake, while the discriminator aims to
maximize its accuracy in distinguishing real from fake
data.
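Formally, this adversarial training is commonly written as a two-player minimax game over a value function; the canonical formulation is shown here for reference rather than as the exact loss used in any particular deepfake system:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big],

where the discriminator D maximizes V by correctly scoring real and generated samples, and the generator G minimizes it by producing samples the discriminator cannot tell apart from real data.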
2.3 Cycle GAN
In Cycle GAN, the workflow begins with an input
image being translated by one generator into an image
in the target domain; a second generator then maps that
output back towards the original image. The model
computes the mean squared error loss between the actual
and reconstructed images, and this reconstruction signal
drives its learning. The primary advantage of Cycle GAN
lies in its ability to learn features from one domain and
apply them to another, even when the two domains are
not directly related. Cycle GAN leverages the GAN
framework to facilitate image-to-image
translation by extracting and transferring features
between two unrelated image domains. This
unsupervised approach uses a cycle loss function to
maintain the integrity of image characteristics
throughout the transformation process, allowing the
model to learn how to convert images from one
domain to another without requiring paired examples.
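With generators G: X -> Y and F: Y -> X (notation introduced here only for illustration), the cycle-consistency term described above can be sketched, using the mean squared reconstruction error mentioned in the text, as

\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert^{2}\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert^{2}\big],

which is added to the adversarial losses of the two generator-discriminator pairs so that an image translated into the other domain can still be recovered.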