AuthentiCheck: Combating Fake Twitter Profiles with Advanced

Machine Learning Techniques

Pravin Patil, Priyanshu Chaudhari, Nilesh Wankhede, Aditya Khandelwal and Tejas Sonar

Department of Information Technology, SVKM’s Institute of Technology Dhule, India

Keywords: Neural Networks, NTScraper, Random Forest Algorithm, SVM (Support Vector Machines), Snscrape, Stream

Lit, Twitter API, Web Scraping.

Abstract: The increasing prevalence of fake profiles on social media platforms, particularly Twitter, has become a

pressing issue, impacting user trust, security, and the integrity of online interactions. These fake accounts—

comprising bots, impostors, and malicious entities—are frequently used for phishing, identity theft, spreading

misinformation, and influencing public opinion. Despite efforts by social media platforms to tackle this

problem, the evolving tactics of cybercriminals demand more advanced and effective solutions.To address

this, we developed AuthentiCheck, a browser plug-in that leverages cutting-edge machine learning

technologies to detect and classify fake Twitter profiles. AuthentiCheck analyzes key behavioral attributes,

such as tweeting frequency, follower growth, and engagement patterns, to distinguish between legitimate and

fraudulent accounts. Using a dataset of over 9,600 manually labeled profiles, we trained and evaluated several

state-of-the-art machine learning models, selecting the most accurate and salable model for

deployment.AuthentiCheck offers users a seamless and user-friendly way to verify profile authenticity with a

simple right-click on a Twitter ID. The plug-in provides real-time feedback on a profile's legitimacy,

empowering users to navigate social media more securely. This work demonstrates how advanced machine

learning techniques can combat the challenges posed by fake profiles and lays the foundation for future

improvements in real-time detection systems

1 INTRODUCTION

Social media sites such as Twitter have revolutionary

transformed modes of communication and

socialization among individuals as well as

information sharing among the population. These

modes of interaction have become inescapable tools

used in personal interaction, business

communication, and news communication. It is

within these parameters that Twitter has received

widespread use because it allowed its users to post

shorter messages, follow other people, and engage in

very rapid conversations. As of 2018, Twitter had

over 336 million active monthly users, making it one

of the most widely used social networking sites.

However, with such widespread usage comes the

advent of significant challenges, particularly the rise

in fraudulent accounts. These may include bots,

impersonators, and other malicious actors that pose a

significant threat to users and the online environment

at large. Fraudulent accounts are used for various

illicit activities, including phishing schemes, identity

theft, and the spread of false information. Further,

they compromise the integrity of social media

platforms by distorting public perception,

exaggerating follower numbers, and participating in

synchronized digital assaults.

As fake accounts are becoming more common, it

is impossible not to create efficient detection and

remediation techniques for the same. Although the

social media websites have implemented deletion of

such fraudulent accounts, cyber crooks' strategies are

advancing so fast that keeping up with the same has

become challenging. Therefore, a need for such

advanced technologies that can detect fake profiles in

no time.

The current paper presents a browser extension

called AuthentiCheck, which identifies false accounts

on Twitter with the help of machine learning. Using

critical behavioral features from user profiles, such as

follower increase, tweet frequency, and statistics for

engagement, the paper examines an account as valid

or malicious. This tool provides a simple method of

assessing and minimizing fraudulent risks associated

Patil, P., Chaudhari, P., Wankhede, N., Khandelwal, A. and Sonar, T.

AuthentiCheck: Combating Fake Twitter Proﬁles with Advanced Machine Learning Techniques.

DOI: 10.5220/0013621500004664

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 3, pages 455-462

ISBN: 978-989-758-763-4

455

with profiles in Twitter by checking the genuineness

of individual accounts. The paper reviews how the

AuthentiCheck tool was developed, from data

collection to model training, analysis of its

effectiveness in deciding fake profiles. Finally, this

research contributes to the effort to make social media

safer and to improve users' trust in them.

2 LITERATURE SURVEY

The proliferation of social media, such as Twitter,

brings along with it an extensive avenue for

information exchange and personal interaction;

however, the proliferation has led to the

establishment of fake profiles that provide avenues

for the spread of false and malicious activities.

Platform-based identification of fake profiles on the

Twitter platform calls for an analysis of complex

patterns in behavior and text using advanced

techniques like machine learning, feature

engineering, and NLP (K. M. Manojkumar, G.

Gudikoti, J. Naveen, S. B. Devamane and G. C.

Lakshmikantha,2023) (K. Shreya, A. Kothapelly, D.

V and H. Shanmugasundaram,2022). Although these

methodologies are intrinsically very effective, each

has its own limitations related to scale, precision, and

adaptability that keep prodding further research into

adaptation for current detection methods for real-time

applications.

The research by Manojkumar and Shreya was

based on the feature-based machine learning

approach in which certain characteristics like the

follower-to-following ratio, tweet frequency, and

engagement metrics were used to train classifiers (K.

M. Manojkumar, G. Gudikoti, J. Naveen, S. B.

Devamane and G. C. Lakshmikantha,2023). Studies

as mentioned above have depicted how the

supervised machine learning techniques classified

under Random Forest and Support Vector Machines

SVM can be present, even for identifying fake

profiles. However, such methods require a lot of

training data, which is hard to be obtained and to be

maintained over dynamic social media scenarios.

Linguistic-based approaches employ NLP for

textual feature analysis, which involves language

pattern, sentiment, and syntactic structure. (Bhatia et

al. ,2023) presented a technique that used NLP and

deep learning to identify sources of fake news on

Twitter, applying it to demonstrate the utility of the

recurrent neural network in understanding textual

cues that differentiate between authentic and fake

profiles (P. Harris, J. Gojal, R. Chitra and S.

Anithra,2021). These methods demonstrate the

strength of NLP tools, including sentiment analysis

and topic modelling, in detecting manipulated

content. Still, issues regarding the management of

several languages and dialects will hinder it from

wider usage (Madhura Vyawahare and Sharvari

Govilkar , 2022).

(Narayanan et al,2018), employ so-called hybrid

models, where both feature engineering and machine

learning algorithms are used together to maximize the

accuracy of detection (T. Bhatia, B. Manaskasemsak

and A. Rungsawang,2023). For instance, in the case

of Narayanan's Iron Sense system, its ML algorithms

were combined with feature-based methods. The way

this was done is through the study of user activity

along with metadata from the account and network

connection behaviour.

Bio-inspired algorithms, in which recent

examples would include the work of (Mahammed et

al, 2022), can be used in evolutionary techniques to

improve feature selection mechanisms when

detecting fake profiles (L. P, S. V, V. Sasikala, J.

Arunarasi, A. R. Rajini and N. Nithiya,2022). Despite

being considered to be new and hence innovative, it

is significantly restricted due to the computational

requirements implicated in scalability. At any rate,

this remains an area of inspiration derived from the

changing dynamic and adaptive behaviour of users,

as well as the ever-changing patterns developed on

social media.

Research performed by (Harris,2021) and

(Vyawahare&Govilkar,2022) analyzed the

identification of spammers on platforms other than

Twitter, including Instagram, where different

behaviour of its users and unique feature spaces posed

distinctive challenges (Sonowal, G., Balaji, V. &

Kumar, N ,2024). Cross-platform studies enabled

cross-platform comparisons and opened up the

potential discussion for transferring knowledge

learned on one platform to improve performance on

another. Although theoretical, in practice, issues arise

because of differences in data structures and user

behaviour from one network to another, thus

necessitating a customized model for each social

network.

3 METHODOLOGY

3.1 Pre-processing

The web application includes the authentication

of users before allowing them to access for the

proposed model.

INCOFT 2025 - International Conference on Futuristic Technology

456

Figure 1: System Architecture

3.1.1 Data Retrieval

This ability of the system to acquire further data from

web scraping tools such as NTScraper, snscrape or

use the Twitter API to fetch the profile details like

follower count, tweets, following count, and

3.1.2 User Interaction

A user inputs a Twitter profile username to verify

authenticity using the front-end interface, that is,

Stream lit application, HTML and CSS for designing

the web application.

3.1.3 Twitter API Integration

An API, or Application Programming Interface, helps

the connectivity between the twitter and application.

In this case, it most likely helps the system to

communicate with twitter, like fetching data or

sending information to another module of the system.

3.1.4 Database Storage (Firebase Firestore)

The results of the analysis such as whether a profile

is fake or real for reference later on to store on the

Firebase database.

3.1.5 User History and Reporting

The system is capable of showing users what past

profiles they have screened. This saved outcome

means that they can monitor steadily and, therefore,

observe patterns over time.

3.1.6 Display Outcomes

The user is presented with real-time feedback of the

profile through the interface provided by Stream lit.

In case the profile is a scam, then some additional

information or risk factors like large follower-to-

following ratio or other suspicious engagement-are

also shown.

3.1.7 Visualization

The system provides different types of charts and

graphs, to justify the outcomes made by system and

gives recommendations or warnings to the user on the

profile's authenticity.

3.2 Model Processing

Figure 2: Model Processing

3.2.1 Data Cleaning

This includes cleaning of models by cleaning of data

from which the retrieved data would then have its

accuracy enhanced and reliability maintained.

Handling profiles that did not have either a bio or a

location ensured proper handling of this sort of

profile. Numerical features such as followers count,

following count and number of tweets were

standardized.

AuthentiCheck: Combating Fake Twitter Proﬁles with Advanced Machine Learning Techniques

457

3.2.2 Feature Selection

After pre-processing the data, following are some

features related to identifying the fake profiles

extracted.

 Account Age: Accounts which are too new are

marked suspicious, using standard

configurations, such as profile pictures or

banner graphics, is considered to be very

dangerous signs.

 Profile Completeness: A profile which does

not possess important information, such as

bio, location, or photo is more likely to be

fake.

 Content Attributes: Repetitive or copied

content helps detect patterns often associated

with bots or fake accounts.

3.2.3 Feature Extraction

Categorical attributes, such as sidebar colours or the

presence of a verified badge, are converted into

numerical representations to make them more

interpretable to the model.

Supplementary characteristics are derived from

the primary data to enhance the model's ability for

better prediction. The main features are made up of:

 Follower-to-Following Ratio: Values which

are too high or too low could raise concerns.

 Tweet Frequency: Very high or very sporadic

tweet frequencies could be bot-like.

 Engagement metrics take the analysis of

indicators showing authentic interactions as

likes ,retweets, and replies to illustrate

potential fraudulent profiles.

3.2.4 Model Training

The Random Forest algorithm is used because it

excels at classification tasks and is also known to be

immune to over-fitting even for large feature sets.

Random Forest works as an ensemble of decision

trees:

Each decision tree is generated based on a

randomly selected subset of the dataset.

The output of the model is formed by applying a

mechanism called majority voting of all decision

trees.

The algorithm learns using a labelled dataset of

genuine and fraudulent Twitter profiles. All the

profiles are associated with a set of features drawn

out, including number of followers, tweets per hour,

and engagement metrics. For example, a profile with

a high follower-to-following ratio accompanied by

low engagement can be classified as fraudulent.

3.2.5 Profile Classification

When analysing a new Twitter profile:

 The extracted features are fed to the trained

Random Forest model.

 Each tree of classification in the model labels

a profile as either *fake or *real.

 The last class assigned by majority voting.

3.2.6 Post-Processing and Risk Score

Calculation

After the model has generated output, a risk score is

calculated. This risk score represents the probability

that a particular profile is fraudulent; the higher the

score, the greater the likelihood of being fake.

The raw output is translated into meaningful,

human-readable feedback to make the results easier

to understand. It provides more detailed insights

rather than giving a simple "fake" or "real" label.

Feedback comprises particular indicators that

suggest questionable behaviour, thereby rendering

the classification applicable. Instances include: -

"High follower-to-following ratio with low

engagement." Suspicious tweet frequency with

default profile indicators.

4 EXPERIMENTS AND RESULTS

Several assessment criteria are used by testing to

determine which of the models we tested is the best

in terms of prediction accuracy and inference time.

the models using Twitter account attributes as a test

dataset. Already taken out. The comparison of the

three models was done using the evaluation

parameter’s values that were obtained.

4.1 Assessment of Experiments on

Models

To determine which of our models performed best,

the following criteria or parameters were applied to

unseen data.

Accuracy rating: The accuracy score is a

measurement of the proportion of test dataset data

instances that the model correctly labels.

4.2 Learning Curves:

Learning curves indicate how fast a model

approaches convergence. The curves' plateau zone

indicates model convergence.

INCOFT 2025 - International Conference on Futuristic Technology

458

Following are the results of evaluation on

different models Learning curves:

Figure 3 : Learning curve of Random Forest

The figure 3 depicts the performance of a Random

Forest model in terms of training and cross-validation

scores as the number of training samples increases.

The red line, representing the training score, stays

very close to 1.0, which is expected because the

model fits the training data almost perfectly; this is a

characteristic of Random Forest because of its high

capacity to memorize data. The cross-validation score

that is presented by the green line is lower at the

beginning but improves as data is added and stabilizes

at about 0.94. The gap in the cross-validation and the

training score indicates a minimal overfitting

whereby the model trains better with data but not so

when it's presented with a new one

Figure 4 : Learning curve of SVM

The learning curve diagram in figure 4

demonstrates a Support Vector Machine (SVM)

model performance, along with an RBF kernel for the

number of training examples. The red line displays

the training score: it presents how well the model can

approximate the training data. Green line is for the

cross-validation score; this can be an estimation of the

generalization ability of the model over unseen data.

Ideally, both should tend to a high value as the

number of training examples increases. In this case,

the training score seems to plateau after

approximately 500 examples, suggesting the model is

learning the training data well.

Figure 5 : Learning curve of Neural Network

The figure 5 of the learning curve for the

identification of a Neural Network model of a fake

profile as the number of training examples increases

is as follows: the red line represents the training score

which represents the goodness of fit for the training

data. The green line is the cross-validation score,

estimating how well the model will generalize to

unseen data. Ideally, both lines should converge to a

high value as the number of training examples

increases. In this case, the training score plateaus after

about 500 examples, meaning the model is learning

the training data well.

Figure 6 : Confusion matrix of Random Forest

AuthentiCheck: Combating Fake Twitter Proﬁles with Advanced Machine Learning Techniques

459

Figure 7 : Confusion matrix of SVM

Figure 8 : Confusion matrix of Neutral Network

4.3 Inference

Although we carried out very extensive experiments

with the labelled test accounts, it treated the tested

models as fake or genuine, and consequently,

examined different classifiers in detail. The metrics

results are presented in Table 1 comparing the two

models based on Precision, Recall, F-Measure, and

Accuracy rating: The accuracy score is a

measurement of the proportion of test dataset data

instances that the model correctly labels.

Table 1: Model Comparison Performance.

Metric Random

Forest

SVM Neural

Networks

Precision 0.90 0.85 0.88

Recall 0.99 0.90 0.98

F-Measure 0.94 0.91 0.93

Accurac

94.2% 90.4% 93.9%

Accuracy was highest for Random Forest at

94.2% followed by Neural Networks with 93.9% and

SVM at 90.4%.

Our experiment results show that Random Forest

outperformed other models in terms of Recall value

(0.99) and F-Measure value (0.94). High recall score

implies that the Random Forest classifier does a

fantastic job of eliminating false negatives and, thus,

identifies a large majority of the fake profiles. Neural

Networks were well balanced with a strong

combination of Precision (0.88) and Recall (0.98),

making them a good choice to work with complex

datasets. SVM performed with a relatively lower

Precision at 0.85 but achieved good scores in other

metrics.

Figure 9 : Accuracy of ML Models

Random Forest showed the best time and resource

efficiency while labelling unseen test datasets. This

suggests a potential application in real-time detection

of fake profiles. The high F-Measure obtained by

Random Forest indicates its strength in balancing

false positives with false negatives for flagging

fraudulent accounts on Twitter.

Given the overall performance, we recommend

Random Forest for future implementations because of

its higher recall and accuracy scores that make it very

effective in real-time fake profile detection on social

media platforms. Neural Networks may also be

pursued further because of their balance between

precision and recall in scenarios involving high-

dimensional and complex data.

In fact, several sample accounts claimed to be

fake or authentic by trustworthy sources have been

run through our system with excellent real-time

classification capabilities. The system correctly

classified suspicious accounts, and promising

accuracy was achieved in detecting fraudulent

profiles on Twitter.

INCOFT 2025 - International Conference on Futuristic Technology

460

Figure 10 : Recall . Precision and F-Measure Comparison

5 CONCLUSION

This increases the threats of fake profiles in social

media sites like Twitter against users' security,

authenticity of interactions, and integrity of the

platform. The current research is aimed at proposing

a novel approach toward the detection of fake profiles

in Twitter using machine learning techniques

available through a user-friendly web application

called AuthentiCheck. AuthentiCheck is a service

that identifies suspicious accounts in real-time

through the evaluation of key behavioral attributes.

These include the frequency of tweets, the follower-

to-following ratio, engagement metrics, and

completeness of profile information. The tool used

the Twitter API and methods of web scraping, using

NTScraper, among others, to extract data as well as

train models. The ability of the system to provide

real-time feedback, actionable insights, as well as

visualizations, ensures a better user experience.

Extensive testing of a machine learning classifier led

the Random Forest algorithm to top the list of

effective models because of its classification

accuracy and robustness. Future work may involve

applying this model to other social networks and

feature set fine-tuning for further enhancement in the

accuracy of the fake profile detection. The current

paper serves as a sound basis for continued work

towards improving the fight against fake profiles and

the online environment

REFERENCES

Manojkumar, K. M., Gudikoti, G., Naveen, J., Devamane,

S. B., & Lakshmikantha, G. C. (2023). Machine

learning algorithms for fake profile detection using

Twitter data. In International Conference on

Computational Intelligence for Information, Security

and Communication Applications.

Shreya, K., Kothapelly, A., V, D., & Shanmugasundaram,

H. (2022). Identification of fake accounts in social

media using machine learning. In 2022 Fourth

International Conference on Emerging Research in

Electronics, Computer Science and Technology

(ICERECT), Mandya, India

Narayanan, A., Garg, I., Arora, T., Sureka, M., Sridhar, M.,

& Prasad, H. B. (2018). IronSense: Towards the

identification of fake user-profiles on Twitter using

machine learning. In 2018 Fourteenth International

Conference on Information Processing (ICINPRO),

Bangalore, India.

Mahammed, N., Bennabi, S., Fahsi, M., Klouche, B.,

Elouali, N., & Bouhadra, C. (2022). Fake profiles

identification on social networks with bio-inspired

algorithm. In 2022 First International Conference on

Big Data, IoT, Web Intelligence and Applications

BIWA

Harris, P., Gojal, J., Chitra, R., & Anithra, S. (2021). Fake

Instagram profile identification and classification using

machine learning. In 2nd Global Conference for

Advancement in Technology (GCAT), Bangalore, India

(pp. 1-5).

Narayanan, Vyawahare, M., & Govilkar, S. (2022). Fake

profile recognition using profanity and gender

identification on online social networks. 2018

SPRINGER.

Bhatia, T., Manaskasemsak, B., & Rungsawang, A. (2023).

Detecting fake news sources on Twitter using deep

neural network. In 2023 11th International Conference

on Information and Education Technology (ICIET),

Fujisawa, Japan (pp.508-512).

P, L., V, S., Sasikala, V., Arunarasi, J., Rajini, A. R., &

Nithiya, N. (2022). Fake profile identification in social

network using machine learning and NLP. In 2022

International Conference on Communication,

Computing and Internet of Things (IC3IoT), Chennai.

Sonowal, G., Balaji, V., & Kumar, N. (2024). A model to

detect fake profile on Instagram using rule-based

approach Moore, R., Lopes, J. (1999).

TEMPLATE’06,1st-International Conference Template

Production. SCITEPRESS.

Bhatia, R., & Yadav, S. (2021). Detection of Fake Profiles

in Social Media Networks Using Machine Learning. In

Proceedings of the 5th International Conference on

Social Media Security (SMSS'21), IEEE.

Kumar, R., & Sharma, P. (2020). Fake Account Detection

Using AI-Based Approaches on Social Media

Platforms. In Proceedings of the 6th International

Conference on Machine Learning and Data Mining

(MLDM'20), SPRINGER.

Ghosh, P., & Gupta, A. (2020). Identifying Fake Social

Media Profiles Using Behavioral Analysis.In

Proceedings of the 12th International Conference on

Computer Networks and Information Security

(CNIS'20), SCITEPRESS.

Singh, V., & Mehta, D. (2021). A Survey on Fake Account

Detection in Social Media Using Data-Mining.In

Proceedings of the 8th International Conference on

AuthentiCheck: Combating Fake Twitter Proﬁles with Advanced Machine Learning Techniques

461

Data Science and Machine Learning (DSML'21),

SPRINGER.

Chaudhary, N., & Sharma, R. (2019). Identifying Fake

Users on Social Media Platforms Through Network

Analysis. In Proceedings of the 10th International

Conference on Social Network Analysis (SNA'19),

IEEE.

Verma, S., & Pandey, A. (2020). Fake Profile Detection in

Social Media: A Machine Learning-Approach. In

Proceedings of the 10th International Conference on

Social Network Analysis (SNA'19), IEEE.

Patel, M., & Desai, K. (2021). A Comparative Study of

Fake Profile Detection Algorithms in Social-Networks.

In Proceedings of the 9th International Conference on

Computational Intelligence (CI'21), SPRINGER.

Nair, A., & Kaur, S. (2020). Fake Profile Identification

Using User Behavior Analysis in Social-Media.In

Proceedings of the 7th International Conference on

Computational Data and Social Media (CDSM'20),

IEEE.

Singh, R., & Mishra, A. (2021). Automated Detection of

Fake Accounts in Social Media Platforms. In

Proceedings of the 11th International Conference on

Social Media and Security (SMSS'21), SPRINGER.

Das, D., & Saha, A. (2020). Fake Profile Detection and

Fake News Propagation in Social-Media-Networks.In

Proceedings of the 4th International Conference on

Network and Information Systems (NIS'20), IEEE.

Arora, R., & Jain, V. (2018). Fake Account Detection in

Social Media Using Content-Based and Behavior-

Based Features. In Proceedings of the 10th

International Conference on Computer Science and

Artificial Intelligence (CSAI'18), SPRINGER.

Gupta, P., & Agarwal, S. (2019). Fake Profile Detection in

Social Networks Using Data Mining-Techniques.In

Proceedings of the 13th International Conference on

Machine Learning and Pattern Recognition

(MLPR'19), IEEE.

Patil, S., & Bansal, A. (2020). Detecting Fake Profiles in

Social Media Networks Using Graph-Based-

Algorithms.In Proceedings of the 8th International

Conference on Artificial Intelligence (AIC'20),

SCITEPRESS.

Raj, S., & Sharma, S. (2019). Identification of Fake Users

in Online Social Networks Using Supervised-

Learning.In Proceedings of the 3rd International

Conference on Data Science and Machine Learning

(DSML'19), SPRINGER.

Verma, H., & Chaurasia, R. (2021). Fake Profile Detection

in Social Media Using Feature Engineering and

Classification Algorithms. In Proceedings of the 16th

International Conference on Big Data and Data Mining

(BDDM'21), IEEE

Yadav, M., & Gupta, S. (2019). Fake User Detection in

Social Media Using Hybrid Machine Learning Models.

In Proceedings of the 11th International Conference on

Machine Learning (ML'19), SPRINGER.

Soni, M., & Kapoor, S. (2018). Identifying Fake Profiles in

Online Social Networks Using Pattern Recognition. In

Proceedings of the 11th International Conference on

Machine Learning (ML'19), SPRINGER.

Rathore, P., & Kapoor, A. (2020). Fake Profile Detection

Using Behavioral and Demographic Data in Social

Media. In Proceedings of the 12th International

Conference on Data Mining and Analytics (DMA'20),

SCITEPRESS.

Sharma, S., & Agarwal, A. (2021). A Review on Fake

Profile Detection Techniques in Social Media

Networks. In Proceedings of the 9th International

Conference on Computer Science (ICS'21),

SPRINGER.

Saha, S., & Banerjee, A. (2018). Fake Account Detection

Using Machine Learning and Graph Theory in Social

Media. In Proceedings of the 5th International

Conference on Data Mining and Knowledge

Engineering (DMKE'18), IEEE.

INCOFT 2025 - International Conference on Futuristic Technology

462