Fake News Detection in Social Networks using Machine Learning:

A Review

Sonali Raturi, Amit Kumar Mishra and Srabanti Maji

School of Computing, DIT, Dehradun, Uttarakhand, India

Keywords: Fake News, Machine Learning (ML), Support Vector Machine (SVM), Naïve Bayes, Social Media

Abstract: Fake news is spreading rapidly these days. It is low-quality news generated to target someone, often for financial or political gain. Millions of tweets can be generated in very little time, and some of them may be false. People start believing fake news when there is not enough information available to check whether a piece of information or a tweet is true or false, and they also tend to believe information that they hear frequently, even if it is false. Fake news has existed since the era of traditional media, but social media makes it easier to share or comment on such false information. As the volume of false news grows, filtering it manually becomes impossible. Computational approaches therefore use Machine Learning algorithms such as SVM and Naïve Bayes to recognize fake news. This review describes the different techniques used to detect hoax news and discusses the methods used in existing models along with their accuracy.

1 INTRODUCTION

We live in a society where people generally depend on social media, and many are more likely to look for and get news from social media than from traditional sources such as newspapers. Fake news is poor-quality news that contains intentionally created false information. Its growing spread can have tremendous negative effects on society and on individuals. Fake news is written to mislead readers into believing intentionally generated false information, which makes it hard to detect from the content of a report alone. Hence, we need to involve additional information, such as social engagements on social media, to help reach a conclusion.

Social media is timely and less expensive for news consumers than traditional news media such as newspapers. It also makes it easy to share news further or comment on it, and the news being shared on social media could be fake.

News articles are produced online because releasing news through social media is low-cost and fast. They are produced for different purposes, such as political and financial gain. Fake data, including fake cures, is also spreading over social media. How, then, can real news be clearly distinguished from misinformation and opinion? False news can be identified by comparing various properties and theories in both social media and traditional media. This review defines the difficulties of fake news detection and summarizes the techniques used to detect fake news. It then describes the datasets used by these techniques and the evaluation of existing methods.

The definition of fake news has two main features: authenticity and intent. First, fake news contains false information that can be verified as false. Second, fake news is generated with the dishonest intention of misleading consumers.

There could be several reasons for spreading fake news. Fake news could be rumors that do not originate from any news event and are spread only for political or financial gain, or it could be misinformation that is generated unintentionally. Fake news could also be produced for fun or to deceive a specific person. Recently, fake news has been shifting from traditional media to social media and online news.

Raturi, S., Mishra, A. and Maji, S.

Fake News Detection in Social Networks using Machine Learning: A Review.

DOI: 10.5220/0010564800003161

In Proceedings of the 3rd International Conference on Advanced Computing and Software Engineering (ICACSE 2021), pages 177-181

ISBN: 978-989-758-544-9


2 LITERATURE REVIEW

Two factors make users naturally vulnerable to false news:

Naïve Realism: users believe that their own view of reality is the only accurate one, and that those whose viewpoints differ are prejudiced (Ward, 2013).

Confirmation Bias: users prefer to receive information that confirms their existing views (Nickerson, 1998).

Malicious accounts can be created online. A major reason is the low cost of creating an account on social media, and creating bots for social media is inexpensive as well. A social bot is a social media account managed by computer algorithms so that it can automatically produce content and interact with people or other bots on social media (Ferrara, 2016). Social bots become malicious entities when they are designed with the specific purpose of doing harm, such as spreading or manipulating false news on social media. People start believing false news on account of the following factors:

Social credibility: users judge a source of fake news as credible if others judge the same source as credible. They do so especially when there is not enough information available to assess the truthfulness of the source.

Frequency heuristic: users naturally start favoring information that they hear time and again, even if it is fake news.

2.1 Techniques Used in Fake News Detection

Naïve Bayes: In ML, Naïve Bayes (Yuslee, 2021) is a probabilistic classifier based on applying Bayes' theorem with naïve independence assumptions between the features:

P(S|T) = P(T|S) P(S) / P(T)

where P(S|T) is the probability of S given that T has occurred, P(T|S) is the probability of T given that S has occurred, P(S) is the probability of S occurring, and P(T) is the probability of T occurring.

The above equation can be written as:

Posterior = (prior x likelihood) / evidence
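
As a minimal sketch of how the posterior rule above can drive a text classifier, the following pure-Python example trains a multinomial Naïve Bayes model on a tiny made-up corpus. The function names, the toy headlines and the add-one (Laplace) smoothing are assumptions chosen for illustration and are not taken from any of the reviewed papers. The evidence term P(T) is omitted because it is identical for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(text, class_counts, word_counts, vocabulary):
    """Return the class S maximizing log P(S) + sum of log P(word | S)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: log P(S)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # skip words never seen during training
            # Log likelihood with add-one (Laplace) smoothing
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
docs = [
    "miracle cure shocks doctors",
    "shocking secret cure revealed",
    "government publishes budget report",
    "city council approves budget",
]
labels = ["fake", "fake", "real", "real"]
model = train_naive_bayes(docs, labels)
prediction = predict("secret miracle cure", *model)
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied together.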

Support Vector Machine (SVM) (Fung, 2002) is a supervised Machine Learning algorithm that can be used for classification or regression problems. It transforms the data and then finds an optimal boundary between the possible outputs based on that transformation. The technique used for the transformation is called the kernel trick.
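
To illustrate the kernel trick mentioned above, the sketch below checks numerically that a kernel function returns the same value as an inner product taken after an explicit transformation of the data. The degree-2 polynomial kernel and the 2-D sample points are assumptions picked for illustration; the reviewed papers do not state which kernels they used.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def feature_map(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def polynomial_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, 0.5)
explicit = dot(feature_map(x), feature_map(y))  # transform first, then inner product
implicit = polynomial_kernel(x, y)              # same value without transforming
```

Because an SVM only needs inner products between samples, it can use `polynomial_kernel` directly and never build the transformed vectors.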

Regression is a supervised Machine Learning technique (Mahir, 2019). It predicts output values based on input values from the data fed into the system. The algorithm builds a model on the features of the training data.
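
As a small illustration of predicting output values from input values, the following sketch fits a straight line by ordinary least squares. The choice of simple linear regression, the function names and the sample numbers are assumptions made for this example; the papers reviewed here mostly use logistic regression, which follows the same fit-then-predict pattern.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_value(x, slope, intercept):
    """Predict the output for a new input using the fitted line."""
    return slope * x + intercept


# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = predict_value(5.0, slope, intercept)
```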

Stochastic Gradient Descent (SGD) is a common and popular algorithm used in different Machine Learning tasks and forms the basis of training neural networks (NN) (Helmstetter, 2018). The gradient in SGD is the slope of a surface. Hence, gradient descent means moving down the slope to reach the lowest point on that surface (Zhang, 2020).

Random forest is a supervised algorithm. There is a direct relationship between the number of trees in the forest and the results it can achieve. In simple words, a larger number of trees generally gives a more accurate result (Stahl, 2018).
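
Returning to gradient descent, the following sketch shows the stochastic variant on a very small problem: at each step one randomly chosen sample is used to move the parameters a little way down the slope of the squared error. The learning rate, step count, seed and data are assumptions selected so that this toy example converges; they are not values reported in the reviewed papers.

```python
import random


def sgd_fit_line(xs, ys, learning_rate=0.05, steps=2000, seed=0):
    """Fit y = slope * x + intercept with stochastic gradient descent."""
    rng = random.Random(seed)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))  # pick one random training sample
        error = (slope * xs[i] + intercept) - ys[i]
        # Gradient of the squared error (error ** 2) for this single sample
        slope -= learning_rate * 2 * error * xs[i]
        intercept -= learning_rate * 2 * error
    return slope, intercept


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = sgd_fit_line(xs, ys)
```

Using one sample per step keeps each update cheap, which is why the same idea scales to the much larger parameter sets of neural networks.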

2.2 Types of Data Present in Social Media

As discussed in (Parikh, 2018), three types of data are available in social media posts. The first is text data (multilingual), which is analyzed by computational linguistics with a focus on the origin of the text, both systematically and semantically. Since many posts are produced in text format, much work has been done on this type. The second is multimedia, in which multiple forms of media, such as audio, images, graphics and video, are combined in a post. This data type is attractive and draws the attention of viewers. The third is hyperlinks.

ICACSE 2021 - International Conference on Advanced Computing and Software Engineering


Table 1: Comparative performance measurements of various Fake News Detection techniques

| Serial no. | Paper Studied | Approach | Result | Gap |
|---|---|---|---|---|
| 1 | “Fake News Detection System using Article Abstraction” (2019) | Natural Language Processing, Article abstraction, Sentence matching, Deep learning | They proposed BiMPM (Bidirectional MultiPerspective Matching) model using article abstraction and entity set matching with 0.663 AUC accuracy | They will propose a different technique which will use entity matching set and article abstraction along with BiMPM model |
| 2 | “Automatic Online Fake News Detection Combining Content and Social Signals” (2018) | Social-based and content-based methods | They proposed false news detection method and executed this method on Facebook Messenger chatbot with 81.7% accuracy | They will propose a new method to train the bot in different languages in order to extend it to various countries |
| 3 | “Detecting Fake News using Machine Learning and Deep Learning Algorithms” (2019) | RNN, SVM, Naive Bayes, Logistic Regression | They proposed a model to check the affirmation of news pulled out from Twitter which is helpful for fake news recognition with accuracy 0.94 | In future, they could pull out named entities from news body or news headline and then examine their relationships using knowledge base |
| 4 | “Weakly Supervised Learning for Fake News Detection on Twitter” (2018) | Weakly supervised Classification | They proposed a weakly supervised method which automatically collects large scale datasets with 0.9 F1 score | In future, they could resolve the main challenge this method faced and that is to congregate a training dataset of suitable size |
| 5 | “FAKEDETECTOR: Effective Fake News Detection with Deep Diffusive Neural Network” (2018) | Data Mining, Text Mining, Diffusive Network | They proposed an automatic false news credibility inference model which they have named as FAKEDETECTOR with 0.63 accuracy score | In future, experiments can be done on live false news dataset |
| 6 | “Fake News Detection in Social Media” (2018) | SVM, Semantic Analysis, Naïve Bayes Classifier | They proposed a three-part method | In future, this proposed method will be tested out. In this paper, this is yet to be done due to limited knowledge and time |
| 7 | “Fake Data Analysis and Detection Using Ensembled Hybrid Algorithm” (2019) | Classification, Decision tree, Natural Language Processing, Random forest, Naïve Bayes, SVM, KNN | They proposed a hybrid approach to false news detection with 94% accuracy | In future work, this algorithm will be compared with the deep NN and then test results will be drawn. This can be done to save time in training the deep NN |
| 8 | “Hoax Web Detection for News in Bahasa Using Support Vector Machine” (2019) | Text Mining, Support Vector Machine | They proposed a model that aims to find fake and real news. This system is done on Indonesian Language with an accuracy of 85% | In future, this work can be done on other languages |
| 9 | “An Integrated Approach for Malicious Tweets Detection using NLP” (2017) | Machine Learning, Statistical Natural Language Processing | They proposed a method which is based on two aspects: without knowing background of the consumer, the affirmation of spam-tweets, and the other based on analysis of language for detecting spam on Twitter, with 93% accuracy | In future, this method can focus on user accounts as now it only focuses mainly on analyzing of tweets |
| 10 | “Fake News Detection Using Sentiment Analysis” (2019) | Random Forest, Naïve Bayes | They proposed a method for fake news detection which integrates sentiment to improve the accuracy (0.88 AUC) | In future work, dataset can be images as well as videos in addition to this method |
| 11 | “A Sensitive Stylistic Approach to Identify Fake News on Social Networking” (2020) | One-Class SVM | They proposed a method to find false news in text format, pulled out from social media, with an accuracy of 86% | In future work, accuracy and precision could increase |
| 12 | “Not Everything You Read Is True! Fake News Detection using Machine learning Algorithms” (2020) | NLP, K-nearest neighbor | They built a fake news detector which classifies text or the news headlines as hoax or non-hoax with 71% accuracy | This detector can be built using different algorithms |
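
The Result column of Table 1 reports a mix of accuracy and F1 scores, which are not directly comparable. As a rough sketch of how the two differ, the following example computes accuracy, precision, recall and F1 from a list of predictions. The function names, the choice of "fake" as the positive class and the label lists are hypothetical and do not come from any reviewed paper.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive="fake"):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "real"]
y_pred = ["fake", "fake", "real", "real", "real", "real", "real", "fake"]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced datasets, where real news far outnumbers fake news, accuracy can look high even when few fake items are caught, which is one reason some of the reviewed papers report F1 instead.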

3 RESULT AND DISCUSSION

In the literature review section, different techniques were used to propose fake news detection models, such as Naïve Bayes (Yuslee, 2021) and SVM (Fung, 2002). The authors of (Bhutani, 2019) compared the accuracy of different Machine Learning algorithms. First, they tested a Naïve Bayes model on each vector; it gives 73% accuracy on the count vector and 75% on the N-gram vector, and the same on the character vector and word-level TF-IDF. Then a regression model was executed; it gives 76% and 74% on count and word-level features respectively. Third, SVM was run, and it gives an accuracy of 74% on all the features.
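
The feature types named above (count vectors, N-grams and word-level TF-IDF) can be sketched in a few lines. The example below is a simplified illustration: the function names and the two toy documents are hypothetical, and the TF-IDF variant used here (term frequency times log(N / document frequency)) is only one of several common definitions, since the source does not state which one was used.

```python
import math
from collections import Counter


def count_vector(tokens, vocabulary):
    """Number of occurrences of each vocabulary word in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def word_ngrams(tokens, n):
    """All runs of n consecutive words in one document."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(tokenized_docs, vocabulary):
    """Word-level TF-IDF: (count / doc length) * log(N / document frequency)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(word for doc in tokenized_docs for word in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        row = []
        for word in vocabulary:
            if doc_freq[word] == 0:
                row.append(0.0)  # word absent from every document
            else:
                tf = counts[word] / len(doc)
                idf = math.log(n_docs / doc_freq[word])
                row.append(tf * idf)
        vectors.append(row)
    return vectors


# Hypothetical toy documents, for illustration only.
docs = [text.split() for text in ["fake cure found", "cure found today"]]
vocabulary = sorted({word for doc in docs for word in doc})
counts_first = count_vector(docs[0], vocabulary)
bigrams_first = word_ngrams(docs[0], 2)
tfidf = tfidf_vectors(docs, vocabulary)
```

Words that occur in every document, such as "cure" here, receive a TF-IDF weight of zero, so the classifier focuses on words that distinguish one document from another.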

Figure 1 (De Oliveira, 2020) shows the overall accuracy comparison of Deep Learning and Machine Learning algorithms, including SVM, Logistic Regression, Naïve Bayes, RNN and LSTM.

Figure 1. Comparison of Deep Learning and Machine learning algorithms based on accuracy

Figure 2 shows the accuracy of the different methods covered in the literature review (Section 2). The Random Forest technique (Reddy, 2019) reaches 94% accuracy in a hybrid approach to false news detection. Article abstraction (Kim, 2019) gives an accuracy of 66.30% with the BiMPM (Bidirectional MultiPerspective Matching) model. The false news detection method of (Della Vedova, 2018), which combines content-based and social-based methods, gives an accuracy of 81.70%. SVM (Rahmat, 2019) reaches 85% when used in a hoax web detection system. Statistical NLP (Gharge, 2017) reaches 93% in an integrated approach for malicious tweet detection. KNN (Tiwari, 2020) gives an accuracy of 71% in a fake news detector.

Figure 2. Overall accuracy of different techniques. [Bar chart; vertical axis: Accuracy, 20% to 100%. Bar values: 94%, 66.30%, 81.70%, 85%, 93%, 71%. Surviving bar labels: Random Forest, Content Based & Social Based, Statistical NLP.]


4 CONCLUSION

In this manuscript, we summarized various Machine Learning techniques used in detecting false news and the types of data seen in social media posts, i.e., text, multimedia and hyperlinks. There has been conspicuous progress in the detection of false news and fake posts with various Machine Learning approaches. However, the dynamic nature of hoax news on social media makes the classification of false news difficult. These days false news takes many forms, from satirical articles to fabricated news. False news and lack of trust in the media are problems with great effect on our society.

The main strength of Machine Learning is its potential to automate repetitive tasks and consequently increase productivity. Much research work is under way on applying Machine Learning methods such as Naïve Bayes, SVM, Random Forest and KNN.

REFERENCES

Bhutani, B., Rastogi, N., Sehgal, P., & Purwar, A. (2019,

August). Fake news detection using sentiment

analysis. In twelfth international conference on

contemporary computing (IC3) (pp. 1-5). IEEE.

De Oliveira, N. R., Medeiros, D. S., & Mattos, D. M.

(2020). A sensitive stylistic approach to identify fake

news on social networking. IEEE Signal Processing

Letters, 27, 1250-1254.

Della Vedova, M. L., Tacchini, E., Moret, S., Ballarin, G.,

DiPierro, M., & de Alfaro, L. (2018). Automatic

online fake news detection combining content and

social signals. In 22nd Conference of Open

Innovations Association (FRUCT) (pp. 272-279).

IEEE.

Ferrara, E., Varol, O., Davis, C., Menczer, F., &

Flammini, A. (2016). The rise of social

bots. Communications of the ACM, 59(7), 96-104.

Fung, G., Mangasarian, O. L., & Shavlik, J. W. (2002).

Knowledge-based support vector machine classifiers.

In NIPS (pp. 521-528).

Gharge, S., & Chavan, M. (2017). An integrated approach

for malicious tweets detection using NLP.

In International Conference on Inventive

Communication and Computational Technologies

(ICICCT) (pp. 435-438). IEEE.

Helmstetter, S., & Paulheim, H. (2018). Weakly

supervised learning for fake news detection on

Twitter. In IEEE/ACM International Conference on

Advances in Social Networks Analysis and Mining

(ASONAM) (pp. 274-277). IEEE.

Kim, K. H., & Jeong, C. S. (2019). Fake news detection

system using article abstraction. In 16th International

Joint Conference on Computer Science and Software

Engineering (JCSSE) (pp. 209-212). IEEE.

Mahir, E. M., Akhter, S., & Huq, M. R. (2019). Detecting

fake news using machine learning and deep learning

algorithms. In 7th International Conference on Smart

Computing & Communications (ICSCC) (pp. 1-5).

IEEE.

Nickerson, R. S. (1998). Confirmation bias: A ubiquitous

phenomenon in many guises. Review of general

psychology, 2(2), 175-220.

Parikh, S. B., & Atrey, P. K. (2018). Media-rich fake news

detection: A survey. In IEEE conference on

multimedia information processing and retrieval

(MIPR) (pp. 436-441). IEEE.

Rahmat, M. A., & Areni, I. S. (2019). Hoax Web

Detection For News in Bahasa Using Support Vector

Machine. In International Conference on Information

and Communications Technology (ICOIACT) (pp.

332-336). IEEE.

Reddy, P. B. P., Reddy, M. P. K., Reddy, G. V. M., &

Mehata, K. M. (2019, March). Fake data analysis and

detection using ensembled hybrid algorithm. In 2019

3rd International Conference on Computing

Methodologies and Communication (ICCMC) (pp.

890-897). IEEE.

Stahl, K. (2018). Fake news detection in social

media. California State University Stanislaus, 6, 4-15.

Tiwari, V., Lennon, R. G., & Dowling, T. (2020). Not

Everything You Read Is True! Fake News Detection

using Machine learning Algorithms. In 31st Irish

Signals and Systems Conference (ISSC) (pp. 1-4).

IEEE.

Ward, A. (2013). Naive realism in everyday life:

Implications for social conflict and

misunderstanding. Values and Knowledge, 103.

Yuslee, N. S., & Abdullah, N. A. S. (2021). Fake News

Detection using Naive Bayes. In IEEE 11th

International Conference on System Engineering and

Technology (ICSET) (pp. 112-117). IEEE.

Zhang, J., Dong, B., & Yu, P. S. (2020). Fakedetector:

Effective fake news detection with deep diffusive

neural network. In IEEE 36th International

Conference on Data Engineering (ICDE) (pp. 1826-

1829). IEEE.
