“Fake News Detector”: An Automatic System for the Reliability Evaluation of Digital News

Claudio Cilli, Giulio Magnanini, Lorenzo Manduca and Fabrizio Venettoni

Department of Computer Science, La Sapienza, University of Rome, Rome, Italy

Keywords:

Fake News, Misinformation, Disinformation, AI, Scraper, Deep Learning, NLP, Cybersecurity, 5W.

Abstract:

Information plays an increasingly central role in people's lives. With the rise of the internet, the amount of available information has grown exponentially, because publishing content of any type has become easier. At the same time, the risks deriving from the lack of checks on the truthfulness of this content have also increased: “fake” information can cause serious reputational, economic or health damage. Several studies have addressed the verification problem, but none appears to have led to a significant commercial implementation. The projects proposed so far share a common thread: they act directly on the source of the news and use existing technologies to examine its possible truthfulness. Because they do not really involve the reader, these projects are not suited to increasing the awareness of the end user. This work implements a technological platform that provides a reliability value for a digital news item, measuring the level of impartiality of the author through the evaluation of a defined series of parameters.

1 INTRODUCTION: STATE OF THE ART

“Fake news” is a phenomenon that has always characterized human history and has a dramatic impact on society. Today, fake news is considered one of the biggest threats to democracy, justice, public trust, freedom of expression, journalism and the economy. There is a large body of scientific literature on the subject, most of which proposes the use of machine learning, neural networks, multimodal systems, blockchain and deep learning. The projects proposed so far share a common thread: they act directly on the source of the news and use existing technologies to examine its possible veracity. Because they lack a focus on the user, these projects are not suited to increasing the awareness of the end user, which is the primary objective of this work. The major works are listed below:

• MVAE: Multimodal Variational Autoencoder for

Fake News Detection that proposes, for the detec-

tion of fake news, a bimodal variational autoen-

coder coupled with a multimodal (textual + vi-

sual) binary classiﬁer; (Khattar et al., 2019)

• DeHiDe: Hybrid model combining blockchain

technology with an intelligent deep learning

model to strengthen robustness and accuracy in

combating fake news; (Agrawal et al., 2020)

• dEFEND: a fake news detection system that

leverages user comments to verify whether the

news is fake or real. (Shu et al., 2019a)

• Fake News Early Detection: A Theory-driven

Model: In this paper, a theory-driven model for

fake news detection is proposed. The proposed

method aims to investigate news content at vari-

ous levels: lexical, syntactic, semantic, and dis-

course. News is represented at each level, rely-

ing on established theories in social and foren-

sic psychology. Fake news detection is then

conducted within a supervised machine learning

framework. This work explores potential fake news models, improving interpretability in engineering fake news features by studying various aspects of them. (Zhou et al., 2019)

In addition to the scientific literature, some tools and utilities are worth highlighting:

Cilli, C., Magnanini, G., Manduca, L. and Venettoni, F.

“Fake News Detector”: An Automatic System for the Reliability Evaluation of Digital News.

DOI: 10.5220/0010769700003120

In Proceedings of the 8th International Conference on Information Systems Security and Privacy (ICISSP 2022), pages 15-24

ISBN: 978-989-758-553-1; ISSN: 2184-4356

• Fiskkit: a platform created by John Pettus that

aims to build a place to promote consistency and

neutrality of information through participant sub-

mission of feedback to help people identify what

is true, false, well-argued, or logically incorrect in

articles or opinions. (Pettus, 2018)

• TextThresher: web interface that reﬁnes the so-

cial science practice of content analysis, making

it more transparent and scalable to hundreds of

thousands of documents. (Adams, 2016)

• FakeNewsTracker: a tool for collecting, detect-

ing, and visualizing fake news, using datasets and

ML models by extracting useful features. (Shu

et al., 2019b)

• FakeNewsNet: a data repository with news con-

tent, social context, and spatiotemporal informa-

tion for studying Fake News on social media.

(Shu et al., 2019c)

• Detecting Fake News in Social Media Networks: the purpose of this work was to find a solution that users can adopt to identify and filter out sites that contain false and misleading information. The tool identifies false posts using simple, carefully selected features of the title and post. (Aldwairi and Alwahedi, 2018)

• Among the projects under development, SocialTruth (Demestichas, 2018) is worth mentioning. This European project aims to provide an innovative and distributed way, based on blockchain and machine learning technology, to achieve both content and author credibility verification and fake news detection, in order to increase trust in social media. However, the project, which started in 2019, is still in the implementation phase, with no relevant results published so far.

2 FAKE NEWS DETECTOR: A TOOL TO ASSIGN RELIABILITY TO NEWS

2.1 Why ”Fake News Detector”?

”Fighting fake news is like battling a many-headed

Hydra while swimming in a tsunami of slime.”

(Govindraj Ethiraj)

As described in the previous section, all the projects mentioned act directly on the source of the news to verify the reliability of the content. Because they lack a focus on the user, these projects are not suited to increasing the awareness of the end user, which is the primary objective of this work. Other limitations of the existing projects are:

Lack of an Implemented Tool: there is no standard tool that verifies fake news;

Lack of a Tool that Implements the 5Ws Theory: the 5Ws theory is one of the standards of fact-checking journalism, but no tool implements it correctly;

Lack of a “No-comment based” Tool: many tools verify news by asking for user comments.

Fake News Detector aims to overcome these limits with a user-friendly tool that uses web scraping and AI to implement the 5Ws theory, in a very simple way, in order to verify a news item. The project consists of a Python application (Lack of an implemented tool) that uses a series of functions to assign a score to each of the 5Ws parameters (Lack of a tool that implements the 5Ws theory). Fake News Detector also upholds the principle that the reliability of a news item is determined not by what other people think but by a series of objective parameters derived from theory (Lack of a “No-comment based” tool).

3 INFORMATION, MISINFORMATION AND THE 5WS THEORY

3.1 Information vs. Misinformation

The difference between fake news and other types of news lies in intent. As Table 1 shows, fake news has a malicious intention: it is created with the aim of destabilizing the environment and causing havoc. For false news, by contrast, we do not know whether the intention is malicious, and the same holds for misinformation and rumor.

Table 1: Classification of the different types of News.

News type        Authenticity   Intention
Fake News        False          Malicious
False News       False          Unknown
Satire           Unknown        Not Malicious
Disinformation   False          Malicious
Misinformation   False          Unknown
Rumor            Unknown        Unknown

ICISSP 2022 - 8th International Conference on Information Systems Security and Privacy

3.2 The 5Ws Theory

The 5W theory is a useful defense against fake news. It consists of questions whose answers are considered fundamental when gathering information or solving problems. Journalists often use these questions (Hart, 2012) (MediaSmarts, 2012) (Copy Editing, 2008) to write their articles correctly, and they can also be used to assess whether a news item can be considered reliable. A news story can be considered complete only if it answers these questions:

• WHO: Is there an author of the news item?

• WHAT: Type of news item. Is it a News Fact,

election propaganda, dogma, maxim, etc.?

• WHEN: When was the news story written, and when did the events it reports happen?

• WHERE: Where did the events in the news story take place?

• WHY: What is the motivation for publishing the

news? Chronicle, informational, economic, so-

cial/political, etc.

The 5W rule, together with the analysis of how the news is presented, constitutes the fundamental basis for assessing the reliability of a piece of information.

4 APPLIED TECHNOLOGIES

4.1 Web-Scraping

Web scraping refers to the extraction of data from a

website. This information is collected and then ex-

ported in a format useful for analysis.

The following web scrapers were used for the implementation of “Fake News Detector”: Goose3, Scrapy, BeautifulSoup.

4.2 NLP and Feature Elaboration

Thanks to NLP technologies, the prototype built for the detection of fake news can extract the locations mentioned in an input text. This makes it possible to calculate the distance between the place where the facts of the news occurred and the place of the person making the search. To do this, libraries such as spaCy and geopy were used, together with mathematical formulas.

4.3 RPA

Robotic process automation (RPA) is the technology

that allows anyone today to conﬁgure computer soft-

ware, or a ”robot” to emulate and integrate the actions

of a human interacting within digital systems to exe-

cute a business process. RPA robots utilize the user

interface to capture data and manipulate applications

just like humans do. They interpret, trigger responses

and communicate with other systems in order to per-

form on a vast variety of repetitive tasks. Only sub-

stantially better: an RPA software robot never sleeps

and makes zero mistakes. (UiPath, 2020)

This tool uses Selenium WebDriver for RPA.

5 THE PROJECT: FAKE NEWS DETECTOR

5.1 Requirement Speciﬁcation

The objective of Fake News Detector is to define an automatic tool for the detection of fake news. Unlike the projects mentioned above, which work only on the source of the news, we tried to focus on and involve the user as much as possible, in order to develop their awareness. The strong point of the tool is the automation of the 5W theory, to which three “How” parameters are added. This can be summarized in the following points:

• WHO: this point aims to verify the presence or

absence of the author of the news. It wants to an-

swer the question: ”Is there an author for the news

item to be analyzed?”. Based on the answer, the

tool assigns a 0/1 score as follows:

a. [0] The author of the news item is not present.

b. [1] The news item is attributable to an explicit

author.

Generally, fake news does not name an author.

• WHAT: this point aims to search for the topic of

the news. Based on the type of topic, the tool as-

signs a score from 0 to 1 as follows:

a. [0] It is not possible to classify the topic of the

news item.

b. [0.5] Dogma, maxim, speech, political propa-

ganda, commercial propaganda, satire, provo-

cation

c. [1] Fact, scientiﬁc publication, news story.

A news story considered reliable is most often attributable to a well-defined topic.

• WHEN: this point aims to identify the date of oc-

currence of the events that took place in the news

story. The tool assigns a score from 0 to 1 in the

following way:

a. [0] If there is no reference date of the news item

b. [1] If present in explicit form

A fake news story generally does not have precise

dates. Therefore, it is only possible to give a high

score to a news item if the date is present in an

explicit form.

In addition to the above points, the temporal proximity to the publication date is also a factor in determining the When. The following additional score is assigned:

a. [0] If the facts in the news item occurred within the 180 days (approximately 6 months) before submission to the tool

b. [0.15] If the facts in the news item occurred between 180 days and 2 years before submission to the tool

c. [0.35] If the facts in the news item occurred more than 2 years before submission to the tool

Another very important feature when evaluating a news item is the date of occurrence of the reported facts. As a tendency, the closer the date of the facts is to today's date, the higher the probability that the item is fake news constructed ad hoc to deceive the reader. Readers tend to attribute greater credibility to a news item if its facts happened relatively recently, since the subject is current; conversely, the further we move from today's date, the lower the probability that the news is fake (and even if it were, it would no longer have the same media effect as a recent news story).

• WHERE: this point aims to identify the place

where the events took place and attribute the prox-

imity to the place where the person who intends to

carry out the veriﬁcation resides. The tool assigns

a score from 0 to 1 in the following way:

a. [0] If it does not exist or is not possible to iden-

tify the place of occurrence of the facts

b. [1] If it is possible to attribute the place of oc-

currence of the facts

A news item considered reliable will have a well-defined place of occurrence of the facts.

In addition, a further score is assigned according to the following parameters:

a. [0] If the news is at short distance from the place where the verifier resides (distance less than 1650 km)

b. [0.15] If the news is at medium distance from the place where the verifier resides (distance between 1650 km and 6000 km)

c. [0.35] If the news is at long range from the place where the verifier resides (distance greater than 6000 km).

It is important to analyze the distance between the person searching for the news and the place where the news occurred. In general, the closer the place of occurrence of the facts is to the place of residence of the person doing the research, the greater the probability that it is fake news, because readers are more inclined to give importance to a news story that takes place relatively close to them.

• WHY: this point aims to research the purpose for

which the news story was published. It intends

to answer the question, ”Why did the person who

published or shared the news do so? What pur-

pose did it achieve? What emotions did it evoke

in me?” Based on the type of answer, the tool as-

signs a score from 0 to 1 as follows:

a. [0] If the purpose of the person who published

the news item is to arouse emotions in the

reader or to induce them to buy a good/do an

activity

b. [1] If the purpose of the publisher is to inform

in a disinterested manner

If a news item arouses strong emotions in the reader (fear, anger, dismay, etc.), or leads the reader to change an opinion or to perform an action they would not otherwise have performed, then it is most likely a fake news item created with a very specific purpose. Generally, a real news item whose sole purpose is to inform should arouse a neutral feeling in the reader.
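The two distance-based bonuses described above (temporal for the When, geographic for the Where) can be sketched as simple band lookups. This is only an illustrative sketch, not the tool's code: the function names are hypothetical, the When bands are read as "within 180 days", "between 180 days and 2 years" and "more than 2 years", two years are taken as 730 days, and the handling of the exact boundary values is an assumption.

```python
from datetime import date


def when_proximity_bonus(event_date: date, submitted: date) -> float:
    """Extra When score: the older the reported facts, the larger the bonus."""
    age_days = (submitted - event_date).days
    if age_days <= 180:          # recent facts (about 6 months): no bonus
        return 0.0
    if age_days <= 730:          # assumed: 2 years taken as 730 days
        return 0.15
    return 0.35                  # facts older than 2 years


def where_distance_bonus(distance_km: float) -> float:
    """Extra Where score: the farther the facts from the verifier, the larger the bonus."""
    if distance_km < 1650:
        return 0.0
    if distance_km <= 6000:
        return 0.15
    return 0.35


if __name__ == "__main__":
    print(when_proximity_bonus(date(2020, 1, 1), date(2021, 3, 1)))
    print(where_distance_bonus(3000))
```

Under these assumptions the full When and Where scores would each range from 0 to 1.35: the 0/1 presence score plus the bonus.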

The checks are automated as follows:

• WHO: A web scraper verifies the existence of the author by attempting to extract it from the XML code of the page.

• WHAT: To understand the topic of the news, the

tool asks the user to select one of the options from

a specially designed drop-down menu.

• WHEN: A “String Matching” algorithm searches for the date within the news; if no date is explicitly present, a semantic analyzer searches for names related to known periods of the year (e.g. Christmas, New Year, Winter, Summer, etc.). Once the date has been determined, the proximity is calculated and the scores are assigned as defined above.

• WHERE: A semantic analyzer trained on a dataset of geographic locations extracts the location from the text. Once it is extracted, the distance in km from the address declared by the searcher is calculated through the Nominatim API.

• WHY: In order to resolve this point, a specially

designed drop-down menu is proposed.

In addition to the parameters provided by the 5W, three other parameters, defined as follows, are studied:

• HOW 1 - Analysis of the relationship between

uppercase and lowercase letters. Through a syn-

tactic analyzer we analyze the relationship be-

tween capital and small letters in the title in order

to assign a value to be used as an additional pa-

rameter for the evaluation of the news. The tool

assigns a 0/1 score in the following way:

a. [0] If the title of the news item has a capital/lowercase ratio higher than 10 percent

b. [1] If the title of the news item has a capital/lowercase ratio lower than 10 percent

This is the case of the so-called “Screamed Headline”, whose sole purpose is to draw the reader's attention. A real news item does not need a high percentage of capital letters in its title.

• HOW 2 - Analysis of external references to the facts described in the news. Through the automation of Google search, Fake News Detector proposes to the user a series of links referring to the facts of the news. Depending on whether the user recognizes a correspondence with the facts in the news itself, the tool assigns a 0/1 score as follows:

a. [0] If the news has no reference other than the site itself

b. [1] If the news has a reference on a source different from the site itself

A fake news story is generally not reported by

other sources but only by the publishing site.

• HOW 3 - Presence of Misleading Images. Through the automation of Google's reverse image search, it is possible to analyze the images in the news in order to verify whether they are coherent with the news itself. The tool assigns a value in the interval [0,1] in the following way:

\[
\mathrm{scoreImages} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{scorePar}(i)
\]

where

– n = total number of images

– scorePar(i) = score assigned to image i

News items often include images that are irrelevant or have been modified; this would make no sense if the news were real.
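As a minimal sketch of the image score, assuming the formula is the mean of the per-image scores (each 0 or 1, as in the HOW 3 implementation) over the n images: the function name and the value returned when an article has no images are assumptions, not taken from the tool.

```python
def score_images(per_image_scores):
    """Mean of the per-image scores, so the result stays in [0, 1]."""
    if not per_image_scores:
        # Assumed convention: an article without images gets 0.0.
        return 0.0
    return sum(per_image_scores) / len(per_image_scores)


if __name__ == "__main__":
    # Three of four images judged coherent with the article.
    print(score_images([1, 0, 1, 1]))
```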

6 IMPLEMENTATION

To implement the “Fake News Detector” tool, the first step was to set up the development environment. The prototype was built with Python 3.7 because of its compatibility with the libraries described above. Spyder was used as the primary IDE for writing code, because it is provided by default with the Anaconda Python distribution.

6.1 Structure of the Tool

As Figure 1 shows, the tool is split into several Python files so that there are separate functions for each task. The main file of the program, which brings together the various functions of the tool, is called “main.py”.

6.2 Main.py

The main.py file is called to start the tool. Once it is called, the modules that implement the various use cases are executed in cascade. This file also computes the total score of the news item and the associated probabilistic value of truthfulness. To start the tool, invoke main.py from the terminal with the following command: python main.py.

6.3 WHO

The purpose of this function is to extract the author, title, text and images of a news item. After the user enters a URL, it is passed to a function named checkURL, which checks the format and notifies the verifying user of any problems. After this check, the URL is passed to a scraper named “Goose”, which analyzes the HTML and XML code and, through its “cleaned_text” and “title” fields, returns the text and title as strings.

After the extraction of text and title, all the images in the page at the given URL are downloaded. To do this, we integrated the BeautifulSoup web scraper to download images and the Python “os” module to manage directories. This part of the code checks whether a folder named “articleImage” exists in the program path and whether it is empty. If the folder does not exist, it is created; otherwise the folder is emptied of files, deleted, and then created again. Once the folder is created, the images found in the “img” tags of the HTML code of the input URL are downloaded into it.

The function ends by extracting the author and assigning the corresponding score.

Figure 1: Tree structure of main.py.

Figure 2: How to start the tool: main.py.

Figure 3: Who - Author detection.
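The preparatory steps of this function (URL format check, reset of the “articleImage” folder, collection of the “img” sources) can be sketched as follows. This is a hedged illustration rather than the tool's code: the names check_url, reset_image_folder and image_urls are hypothetical, the actual download step is omitted, and the standard-library HTMLParser is swapped in for BeautifulSoup so that the sketch has no external dependencies.

```python
import os
import shutil
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def check_url(url):
    """Return True when the URL has an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def reset_image_folder(path="articleImage"):
    """Delete the folder if it exists, then create it again, empty."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)


class _ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(html, page_url):
    """Return absolute URLs for all <img src=...> found in the page."""
    collector = _ImgCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


if __name__ == "__main__":
    page = '<p><img src="/a.png"><img src="https://cdn.example.org/b.jpg"/></p>'
    print(check_url("https://example.com/news/1"))
    print(image_urls(page, "https://example.com/news/1"))
```

Resolving relative sources against the page URL matters here because many sites reference images with paths such as /a.png, which cannot be downloaded on their own.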

6.4 WHAT and WHY

The goal of this function is to let the user select the most suitable topic for the article previously submitted to the tool. A dataframe containing the news topics is created with the Pandas module. Through the inquirer module, the user is asked to select the most appropriate topic for the facts contained in the article. Following the user's choice, the function ends by assigning the What score.

Figure 4: What - Topic Selection.

As for the Why, the function uses the “inquirer” module to let the verifying user select an answer to the question: “Does the article would like to convince you to buy some product or to change your opinion about some fact/person?” Based on the answer selected, the function ends by assigning the Why score.

Figure 5: Why - Question.
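Once the user has answered, both scores reduce to a lookup. The sketch below assumes the topic labels and values listed for the What in the requirement specification; the dictionary, the function names and the yes/no encoding of the Why answer are hypothetical, and the interactive inquirer prompt is deliberately left out.

```python
# Topic scores as listed in the requirement specification for the What.
WHAT_SCORES = {
    "fact": 1.0,
    "scientific publication": 1.0,
    "news story": 1.0,
    "dogma": 0.5,
    "maxim": 0.5,
    "speech": 0.5,
    "political propaganda": 0.5,
    "commercial propaganda": 0.5,
    "satire": 0.5,
    "provocation": 0.5,
}


def what_score(topic):
    """Return the What score; topics that cannot be classified get 0."""
    return WHAT_SCORES.get(topic.strip().lower(), 0.0)


def why_score(tries_to_persuade):
    """Return 0 if the article tries to sell or persuade, 1 otherwise."""
    return 0.0 if tries_to_persuade else 1.0


if __name__ == "__main__":
    print(what_score("Satire"), why_score(False))
```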

6.5 WHEN

The objective of this function is to extract dates and any time periods from the text extracted in “main.py”. First, the function removes the special characters from the text. It then extracts the temporal expressions through an NLP classifier based on the temporal expressions contained in the dataset of the JodaTime library and on a dataset, in the “data” folder of the library, that contains the main temporal expressions taken from English texts, divided into categories that can be selected through settings. This library is built on:

• joda-time: Library for Java date and time classes

• opencsv: Parser library

• JUnit: testing framework

• Log4j: Logging service

• Gson: Json Library for Serialization / Deserializa-

tion

The algorithm exploits a dataset of known temporal expressions (included in the library used) and ad-hoc regular expressions for date recognition. Since the extraction of time periods also includes expressions such as “By 20 minutes” and “From 12 to 20”, it was necessary to configure settings that exclude these time periods, so that only expressions containing dates are extracted. These settings are:

• .excludeRules(”durationRule”): excludes time

periods like ”last two days”, ”last 30 minutes”,

etc.

• .excludeRules(”repeatedRule”): excludes time

periods with repeated event expressions such as

”every Sunday at 5 pm”, ”weekly”, etc.

• .excludeRules(”timeIntervalRule”): excludes

time periods such as ”from 19 till 20” and ”2:00

pm and 4:00 pm”.

After the extraction, the extracted dates are inserted into a Pandas dataframe and presented to the user through the inquirer module. If more than one date is present in the text, the user is asked to select the date that best corresponds to the facts of the article; if no date is detected, the user is instead asked to select “No time Identified”. After the user's choice, the function ends by assigning the When score.

Figure 6: When - Detecting dates.
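The general idea of date extraction, explicit dates first and named periods as a fallback, can be illustrated with plain regular expressions. This sketch does not use the temporal-expression library described above: the three date formats, the list of period words and all function names are assumptions chosen only to keep the example self-contained.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
_MONTH_RE = "|".join(MONTHS)

# Each pattern is paired with a function that returns (year, month, day).
_PATTERNS = [
    # ISO style: 2021-03-12
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),
     lambda m: (int(m[1]), int(m[2]), int(m[3]))),
    # Day first: 12 March 2021
    (re.compile(rf"\b(\d{{1,2}}) ({_MONTH_RE}) (\d{{4}})\b", re.I),
     lambda m: (int(m[3]), MONTHS[m[2].lower()], int(m[1]))),
    # Month first: March 12, 2021
    (re.compile(rf"\b({_MONTH_RE}) (\d{{1,2}}), (\d{{4}})\b", re.I),
     lambda m: (int(m[3]), MONTHS[m[1].lower()], int(m[2]))),
]

# Assumed fallback vocabulary for known periods of the year.
PERIOD_WORDS = ("christmas", "new year", "winter", "summer")


def explicit_dates(text):
    """Return the sorted, de-duplicated explicit dates found in the text."""
    found = []
    for pattern, to_ymd in _PATTERNS:
        for match in pattern.finditer(text):
            try:
                found.append(date(*to_ymd(match)))
            except ValueError:
                pass  # e.g. month 13 or day 45: not a real date
    return sorted(set(found))


def named_periods(text):
    """Return the known period names mentioned in the text."""
    lowered = text.lower()
    return [word for word in PERIOD_WORDS if word in lowered]


if __name__ == "__main__":
    sample = "The fire broke out on 12 March 2021, shortly after Christmas."
    print(explicit_dates(sample), named_periods(sample))
```

In the tool, the candidate dates would then be offered to the user for selection rather than chosen automatically.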

6.6 WHERE

The goal of this function is to extract locations or places of interest from the text extracted in “main.py”. This is done with the spaCy module, which extracts location features from text documents using Named Entity Recognition (NER). The library applies a classifier based on a dictionary of known locations included in it. The extracted locations are inserted into a dataframe along with their coordinates, which are obtained with the “geopy” module by geocoding the extracted addresses, cities, places or locations into geographic coordinates. The user is then asked, through the “inquirer” module, to select among the extracted places the one where the facts of the article took place; if no place of occurrence can be assigned, the user selects “No Location Identified”. The user is also asked to enter the address or place from which they intend to verify the facts of the news, which the tool geocodes. The distance between the place where the facts happened and the place of the verifying user is then calculated with the haversine formula, using Python's “math” library. The function ends by assigning the Where score.

Figure 7: Where - Select extracted location and insert address.
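The distance step can be sketched with the standard haversine formula and Python's math module. The function name and the mean Earth radius of 6371 km are assumptions; the coordinates are expected in decimal degrees, as a geocoder would typically return them.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed constant: mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


if __name__ == "__main__":
    # Approximate coordinates of Rome and Paris.
    print(round(haversine_km(41.9028, 12.4964, 48.8566, 2.3522)))
```

The result can be compared directly with the 1650 km and 6000 km bands used for the additional Where score.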

6.7 HOW 1

The goal of this function is to calculate the ratio between the number of capital letters and the total number of letters in the title extracted and normalized in “main.py”. The function counts the total letters and the capital letters, calculates the ratio, and ends by assigning the corresponding “count upper” score.

Figure 8: Count Upper - Score.
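A minimal sketch of this check, using the ratio of capital letters to total letters described in this section: the function names are hypothetical, non-letter characters are ignored, and a ratio of exactly 10 percent is treated as a “screamed” headline, which is an assumption because the specification leaves that case open.

```python
def uppercase_ratio(title):
    """Share of capital letters among the alphabetic characters of the title."""
    letters = [ch for ch in title if ch.isalpha()]
    if not letters:
        return 0.0  # assumed: a title without letters has ratio 0
    return sum(ch.isupper() for ch in letters) / len(letters)


def how1_score(title, threshold=0.10):
    """Return 1 when the capital-letter ratio is below the threshold, else 0."""
    return 1 if uppercase_ratio(title) < threshold else 0


if __name__ == "__main__":
    print(how1_score("BREAKING news"))
    print(how1_score("Parliament approves the budget"))
```

Note that a headline in English title case can exceed 10 percent without being “screamed”, so the threshold is kept as a parameter.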

6.8 HOW 2

The objective of this function is to simulate a search on the Google search engine, using the title extracted at the beginning, to establish whether there are results similar to the news given as input. To do this, the title is passed to the “googlesearch” module, and the “BeautifulSoup” web scraper is then used to extract the results (title and link) from the search page. A dataset containing the extracted links and titles is created with the Pandas module. The user is then asked to select the title and link most similar to the news submitted in use case UC-01-01; if no news is found, or none is similar to the one submitted, the user can select the items “No Link identified” and “No Title Identified”. The function ends by assigning the score for searchGoogle.

Figure 9: SearchGoogle.
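The rule behind the HOW 2 score, a reference on a source other than the publishing site, can be sketched as a host comparison. This is a simplification, not the tool's behavior: in the tool the user picks the matching result, whereas here any result on a different host counts. The function names and the stripping of a leading "www." are assumptions.

```python
from urllib.parse import urlparse


def _host(url):
    """Lower-cased host of the URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def how2_score(article_url, result_urls):
    """Return 1 if at least one result comes from a different host, else 0."""
    origin = _host(article_url)
    for url in result_urls:
        host = _host(url)
        if host and host != origin:
            return 1
    return 0


if __name__ == "__main__":
    print(how2_score("https://www.example.com/a",
                     ["https://example.com/b", "https://other.org/x"]))
```

A plain host comparison cannot tell an independent report from a repost of the same content on another site, so a check of this kind can over-reward widely shared items.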

6.9 HOW 3

The goal of this function is to simulate an image search on the Google search engine to check whether the images in the article are relevant to the facts it contains. The input images are those downloaded into the “articleImage” folder by “main.py”. As a first step, the function uses Python's “os” module to check whether the “articleImage” folder contains any images. Next, the images are encoded and the search URL is generated. The “Selenium” library, which starts a Chrome web browser in the background, is given this URL and extracts the results of the search by image for each image in the article. The “Pandas” module is again used to create a dataset containing the links and titles returned by the image search. For each image, the related Google search text is extracted using the tag “fKDtNb”; Google's artificial intelligence algorithms try to recognize the content of images and associate keywords with them for content-based searches. If results are found, the user is asked whether the topic of the article relates to the related search value extracted from the image. If the answer is yes, the function adds 1 to the partial image score and 1 to the image count. Otherwise, the function extracts the links related to the image from the search and asks the user to select one if it relates to the facts of the news; if there is no related link, the user selects “No Link identified”. If a link is selected, the function adds 1 to the partial image score and 1 to the image count; if no link is selected, it adds 0 to the partial image score and 1 to the image count. This process is repeated for each image in the “articleImage” folder. The function ends by assigning the score for imageAnalize.

6.10 Total Score

As described at the beginning of this section, the total score of a news item and the associated probabilistic value of truthfulness are computed in “main.py”. After each function it calls returns, main.py adds the score obtained by that function to the variable score_tot, which holds the total score of the news item.
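The accumulation of the total score can be sketched as a sum over the partial scores, followed by a comparison with the threshold calibrated in the Test section. The dictionary keys, the function names and the direction of the comparison (a higher score read as more reliable) are assumptions; the partial values in the example are invented for illustration.

```python
def total_score(partial_scores):
    """Sum the partial scores returned by the individual checks."""
    return sum(partial_scores.values())


def is_above_threshold(score_tot, threshold=5.5):
    """Assumed reading: a total at or above the threshold counts as reliable."""
    return score_tot >= threshold


if __name__ == "__main__":
    # Invented partial scores for one news item.
    scores = {"who": 1, "what": 1, "when": 1.15, "where": 1.35,
              "why": 1, "how1": 1, "how2": 0, "how3": 0.75}
    score_tot = total_score(scores)
    print(score_tot, is_above_threshold(score_tot))
```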

7 TEST

In order to test the tool and calibrate a threshold of

accuracy for Fake News identiﬁcation, a dataset con-

taining 1120 news links of different topics was cre-

ated and labeled as described below:

• 1-80 News item Fake

• 81-160 Scientiﬁc Publication Fake

• 161-240 Dogma Fake

• 241-320 Maxim Fake

• 321-400 Political Propaganda Fake

• 401-480 Commercial propaganda Fake

• 481-560 Satire Fake

• 561-640 News item Real

• 641-720 Scientiﬁc Publication Real

• 721-800 Dogma Real

• 801-880 Maxim Real

• 881-960 Political Propaganda Real

• 961-1040 Commercial propaganda Real

• 1041-1120 Satire Real

Applying the tool to these links produced the following values for the news:

Figure 10: Initial Evaluation of the sample.

Analyzing the graph obtained, the maximum accuracy is reached by setting the threshold value to 5.5, as shown in Figure 11.
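A calibration of this kind can be sketched as a sweep over candidate thresholds that keeps the one with the highest accuracy on the labeled sample. The function names are hypothetical, the rule "score at or above the threshold means real" is an assumption, and the scores in the example are toy values, not the paper's data.

```python
def accuracy_at(threshold, scores, is_real):
    """Fraction of items whose prediction (score >= threshold) matches the label."""
    correct = sum((s >= threshold) == real for s, real in zip(scores, is_real))
    return correct / len(scores)


def best_threshold(scores, is_real, candidates):
    """Return the first candidate threshold that maximizes accuracy."""
    return max(candidates, key=lambda t: accuracy_at(t, scores, is_real))


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_scores = [1.0, 2.5, 3.0, 4.5, 6.0, 7.0]
    toy_labels = [False, False, False, True, True, True]
    grid = [i * 0.5 for i in range(17)]  # 0.0, 0.5, ..., 8.0
    t = best_threshold(toy_scores, toy_labels, grid)
    print(t, accuracy_at(t, toy_scores, toy_labels))
```

When several thresholds give the same accuracy, this sketch returns the lowest one; a different tie-breaking rule would be equally defensible.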

As the graph in Figure 11 shows, the proposed threshold remains the same even when the topic of the news changes, without impacting the accuracy of the system.

Figure 11: Results from the Test.

After the calibration, the tool was tested on a further sample of 50 random news items and reported an accuracy of 97.73%. It is worth remembering once again that the objective of our project is not to distinguish with certainty a fake news item from a real one, but to signal to the user the degree of “dangerousness” of the news itself in order to solicit a critical examination. From a careful analysis of the results

it is possible to notice that the score relative to the Who presents some zero values on news items that in reality do have an author. This happens because the author of the news is present in the article but is not correctly inserted in the “author” field of the HTML code of the URL. As far as the extraction of locations is concerned, the Where function sometimes has difficulty identifying the correct location because, when translating into English, the name of the location may be translated incorrectly. For example, locations with compound names such as “Testa di Lepre” and “Porte di Roma” may be translated as “Hare Head” and “Gates of Rome”, which prevents the tool from finding the correct location. The score related to the search for similar news on other online sources often takes the value 1, because the news is found on social networks or other sites that are a different source from the original one but publish the same content. The image score, in turn, is closely tied to Google's artificial intelligence algorithm, which is often a source of ambiguity. For example, a photo of a person dressed in a smart suit (which is consistent with the content of the news) is interpreted by Google as “Formal Wear”, creating ambiguity in the recognition of the image and therefore lowering the score of the news. In any case, for the purposes of the system these uncertainties are acceptable, and they could be the subject of further developments and improvements of the software in the future.
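The zero values of the Who score can be illustrated with a short sketch. It assumes that the author is read from a `<meta name="author">` tag and that the Who score is 1 when that tag has content and 0 otherwise; both the parsing approach (Python's standard `html.parser`) and the 0/1 scoring are assumptions made for illustration, not the tool's confirmed behavior.

```python
from html.parser import HTMLParser


class AuthorMetaParser(HTMLParser):
    """Collect the content of the first <meta name="author"> tag."""

    def __init__(self):
        super().__init__()
        self.author = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.author is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "author":
            # An empty content attribute is treated as a missing author.
            self.author = attributes.get("content") or None


def who_score(html):
    """Hypothetical Who score: 1 if the author field is filled, else 0."""
    parser = AuthorMetaParser()
    parser.feed(html)
    return 1 if parser.author else 0


with_meta = '<html><head><meta name="author" content="Mario Rossi"></head></html>'
byline_only = '<html><body><p class="byline">By Mario Rossi</p></body></html>'

print(who_score(with_meta))    # author declared in the "author" field
print(who_score(byline_only))  # author only visible in the article body
```

In the second page the author is visible to a human reader, but the field the sketch inspects is absent, which reproduces the kind of zero value discussed above.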

8 CONCLUSION AND FUTURE

DEVELOPMENTS

The next release of the tool will verify the credibility of the author by searching the main search engines for further works by the same author, in order to evaluate their reliability. It should be underlined that an author with a “pen name” is not necessarily a writer of fake news. In fact, many authors considered reliable use pen names (e.g. Joseph Conrad), and these should be credited by the tool as authors with a high reputation. Currently, reader interaction is almost completely automated, as readers only have to answer a few multiple-choice questions. In the next version of the tool, which will be released soon, a user-friendly GUI will be created, and the disambiguation, which is currently manual, will also be automated.

The following are additional criteria/developments

that would improve the accuracy of our tool:

Improving Instrument Accuracy:

• Studying the difference in points between the font sizes of the title and of the text, adding another parameter to identify the reliability of a news item. (A fake news item generally has a much larger font in the title than in the text.)

• A dictionary of emphatic lexical forms, to analyze their presence within a news story. A fake news item generally uses many of them in order to convince.

• A web scraper to automate Bing's reverse image search, which seems to have a higher degree of accuracy than Google in identifying subjects in images.

• Study of an algorithm that counts the special characters in a news headline. A fake news item generally presents a very high number of special characters in the title.
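Two of the criteria above, the count of special characters in a headline and the count of emphatic lexical forms, lend themselves to a short sketch. The definition of “special character” used here (anything that is neither alphanumeric nor whitespace) and the tiny `EMPHATIC_WORDS` dictionary are assumptions made for illustration; a real dictionary would have to be compiled and validated separately.

```python
import re

# Hypothetical sample dictionary of emphatic lexical forms.
EMPHATIC_WORDS = {"shocking", "unbelievable", "incredible", "scandal"}


def count_special_characters(headline):
    """Count characters that are neither alphanumeric nor whitespace."""
    return sum(
        1 for ch in headline if not ch.isalnum() and not ch.isspace()
    )


def count_emphatic_words(text, dictionary=EMPHATIC_WORDS):
    """Count how many words of the text appear in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for word in words if word in dictionary)


headline = "SHOCKING!!! You won't believe this..."
print(count_special_characters(headline), count_emphatic_words(headline))
```

How these counts would be weighted in the total score is left open, since it is not specified in the text.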

Browser Plug-In:

• An extension for web browsers that allows the user to calculate the degree of reliability of a news item directly within the visited page, on request.


ICISSP 2022 - 8th International Conference on Information Systems Security and Privacy