the internet (Arora & Sharma, 2022). The
continuous improvement and adaptation of machine
learning detection methods will allow social media
companies to keep up with the evolving strategies of
those behind fake profiles. By automating the
identification process, these advanced models can
greatly lessen the manual effort and response time
linked to combating fraudulent activity. This leads to
a stronger and more authentic online ecosystem,
boosting trust and interaction in digital environments.
2 LITERATURE SURVEY
Fake profile detection has drawn considerable
research attention in recent years, as both the
prevalence of automated accounts and the demand
for interpretable machine learning have been on the
rise. Cresci et al. (2017) offered an exhaustive
overview of social spambots, describing the traits and
actions that differentiate them from real users. Their
work emphasizes the need for up-to-date detection
methods that match the tactics used by such malicious
accounts.
Subsequently, Kudugunta and Ferrara (2018)
examined the use of deep neural networks for bot
detection, demonstrating how powerful these models
can be at capturing complex patterns in user
behavior, thus improving detection accuracy.
Building on these early works,
Wang et al. (2023) focused on Twitter, applying
deep learning techniques with great success in
identifying fake profiles. Their research highlighted
the specific questions raised by the data architecture
of the platform and showed the power of
convolutional neural networks (CNNs) in detecting
micro-indicators of account authenticity. In
a related study, Yang et al. (2021) examined
spammers’ social networks, showing the worth of
relational data for improving detection strategies.
Their findings suggest that understanding the
connections between users can greatly improve the
detection of spam bots. Alghamdi and Shadi (2023) proposed
a new hybrid approach that combines machine
learning algorithms with state-of-the-art techniques,
showing better performance in fake profile
detection. Their study indicates that an approach
which incorporates diverse methodologies can
have higher accuracy and recall rates, giving a
broader solution to the problem. Besel and Co (2022),
in turn, profiled and detected malicious accounts on
social media platforms, employing ensemble methods
so that detection capabilities can adapt across
different categories and environments. Moreover,
Singh et al. (2023) examined fake users on
Instagram, proposing a novel approach for their
identification and categorization using supervised
machine learning techniques. Al Zamal et al. (2022)
wrote on the application of ML approaches to a
variety of social media platforms, offering a broader
detection approach. This body of research
emphasizes the need for continued innovation in
detection techniques, because the methods used by
malicious actors are constantly changing.
3 DATA COLLECTION AND PRE-PROCESSING
Data collection and preprocessing are the most
crucial steps in developing machine learning models
to identify fake profiles. User profile data is typically
collected from social networking websites in CSV
format, with fields like screen_name, verification
status, statuses_count, followers_count and
friends_count. This data needs
to be prepared for machine learning. The categorical
fields verified and protected are encoded to numeric
values using LabelEncoder, and the textual data in
the screen_name field is then processed with TF-IDF
vectorization, which identifies the words and
transforms them into a numerical representation.
Only the top 5000 features are kept, to balance an
information-preserving representation against
computational cost. These text features
are merged with the numerical friends_count and
favourites_count attributes, forming a complete
dataset to train models. The target labels, taken from
the status column, show whether a profile is real or fake.
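The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the exact pipeline used in this work: the four toy records, the helper name build_features, the token pattern that splits screen names on non-letters, and the choice to append the encoded verified and protected fields to the feature matrix are all assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy records standing in for rows of the collected CSV file.
profiles = [
    {"screen_name": "alice_travels", "verified": "True", "protected": "False",
     "friends_count": 310, "favourites_count": 1200, "status": "real"},
    {"screen_name": "win_free_prizes", "verified": "False", "protected": "False",
     "friends_count": 4999, "favourites_count": 0, "status": "fake"},
    {"screen_name": "bob_cooks", "verified": "False", "protected": "True",
     "friends_count": 150, "favourites_count": 430, "status": "real"},
    {"screen_name": "free_prizes_now", "verified": "False", "protected": "False",
     "friends_count": 5000, "favourites_count": 2, "status": "fake"},
]


def build_features(rows, max_features=5000):
    """Hypothetical helper: turn profile rows into a feature matrix and labels."""
    # Encode each categorical field as integers (one encoder per field).
    verified = LabelEncoder().fit_transform([r["verified"] for r in rows])
    protected = LabelEncoder().fit_transform([r["protected"] for r in rows])

    # TF-IDF over screen names, keeping at most max_features terms.
    # Assumption: tokens are runs of letters, so "bob_cooks" gives "bob", "cooks".
    vectorizer = TfidfVectorizer(max_features=max_features,
                                 token_pattern=r"[A-Za-z]+")
    text = vectorizer.fit_transform([r["screen_name"] for r in rows])

    # Numerical attributes named in the text.
    numeric = np.array([[r["friends_count"], r["favourites_count"]]
                        for r in rows], dtype=float)
    # Assumption: the encoded categorical fields are appended as features too.
    encoded = np.column_stack([verified, protected]).astype(float)

    # Merge sparse text features with the dense columns.
    X = hstack([text, csr_matrix(numeric), csr_matrix(encoded)]).tocsr()
    # Labels come from the status column; classes are sorted, so fake=0, real=1.
    y = LabelEncoder().fit_transform([r["status"] for r in rows])
    return X, y, vectorizer


X, y, vectorizer = build_features(profiles)
print(X.shape, y)
```

On real data the count columns would probably need scaling before they are combined with TF-IDF weights, since raw counts are several orders of magnitude larger; that step is not described in the text and is left out here.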
The dataset is divided into three groups (the
training dataset, the validation dataset, and the test
dataset) using a stratified train_test_split to maintain
the class balance. The preprocessed features are then
passed to traditional machine learning models such as
SVM and Random Forest, to a neural network built
from dense and dropout layers, and to the CNN, in the
form of n-dimensional layers. These
models are evaluated with metrics such as accuracy,
precision, recall, F1 score, and RMSE, which allows
a comprehensive comparison to decide the most
suitable model for detecting fake profiles in a reliable
way.
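The split and comparison described in this section might look like the sketch below. It covers only the two traditional models; the dense neural network and the CNN are omitted to keep the example short. The synthetic feature matrix, the 70/15/15 split ratios, the hyperparameters, and the helper name evaluate are assumptions for illustration and are not taken from the experiments reported here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, mean_squared_error,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features (assumption: 200 x 10).
rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)          # 0 = fake, 1 = real
X = rng.normal(size=(200, 10)) + y[:, None]  # shift class 1 so classes differ

# Stratified three-way split: hold out 30%, then halve it into val and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=42)


def evaluate(model, X_eval, y_eval):
    """Hypothetical helper: compute the metrics listed in the text."""
    pred = model.predict(X_eval)
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision": precision_score(y_eval, pred),
        "recall": recall_score(y_eval, pred),
        "f1": f1_score(y_eval, pred),
        # RMSE on 0/1 labels is the square root of the error rate.
        "rmse": float(np.sqrt(mean_squared_error(y_eval, pred))),
    }


models = {
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit on the training set and compare the models on the validation set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate(model, X_val, y_val)
    print(name, results[name])

# Report the selected model once on the untouched test set.
best = max(results, key=lambda n: results[n]["f1"])
test_scores = evaluate(models[best], X_test, y_test)
print("best:", best, test_scores)
```

Selecting the model on the validation set and touching the test set only once keeps the final scores an honest estimate of performance on unseen profiles.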