Email Spoofing Attack Detection through an End to End Authorship Attribution System

Giacomo Giorgi, Andrea Saracino and Fabio Martinelli

Informatics and Telematics Institute (IIT) of National Research Council, via G. Moruzzi 1, Pisa, Italy

Keywords: Machine Learning, Deep Learning, Privacy-preserving, Email Authorship, Email Spoofing, Spear Phishing.

Abstract:

This paper proposes a novel email author verification approach aimed at tackling email spoofing attacks. The proposed approach exploits an authorship technique based on the analysis of the author's writing style. The problem is studied from two viewpoints: the typical sender verification viewpoint, already exploited in previous works, and the sender-receiver interaction verification, which to the best of our knowledge is a novel approach. Hence, we introduce the concept of end-to-end email authorship verification, which focuses on the analysis of sender-receiver interactions. The proposed method implements a binary classification exploiting both standard machine learning classifiers based on well-known text stylometric features and deep learning classifiers based on an automatic feature extraction phase. We used a well-known email dataset, the Enron dataset, to benchmark our approach; the experiments show an authorship verification accuracy reaching 99% and 93% for the sender and the end to end verification scenarios, respectively. The proposed method has been implemented as an end-user support system in the Android environment for email spoofing attack detection.

1 INTRODUCTION

Email is ubiquitous in our society and is an essential part of daily communication, in particular in the workplace, where it is still the most common form of communication, but also in every online experience where an account is required. According to (Radicati Group, 2019), in 2019 the total number of business and consumer emails sent and received per day will exceed 293 billion, and it is forecast to grow to over 347 billion by the end of 2023. Despite the benefits provided by email communication, it has also generated new fraud opportunities which can expose the end-user's private information to serious security and privacy threats. In recent years, the percentage of unsolicited email sent with the intent to steal private information or harm the recipient's device has been increasing. Based on the Spam and phishing report published by Kaspersky Lab (https://securelist.com/spam-and-phishing-in-q1-2019/90795/), the average percentage of spam email in global mail traffic in 2018 and Q1 2019 was between 50% and 60%. The most widespread spam attacks are scam emails, where the malicious user tries through confidence tricks to deceive the victim in order to steal personal information.

One form of scam attack is spear phishing, in which the attacker aims to steal sensitive information from a specific victim, often forging the email header so that the message appears to have originated from someone or somewhere other than the actual source. This type of attack can achieve a high degree of success because people are more inclined to open an email when they think a legitimate source has sent it. The original Simple Mail Transfer Protocol (SMTP) used in electronic mail transmission (Hoffman, 2002) does not provide an authentication mechanism that can verify information about the origin of email messages. A large number of valid protocols have been proposed to solve the problem, such as ESMTP (Myers, 1999), SPF (Wong and Schlitt, 2006), DKIM (Allman et al., 2007), and DMARC (Kucherawy and Zwicky, 2015). Nevertheless, the original SMTP is still more widely used (201, ). Therefore, a system of email authorship verification based on writing style analysis can be a valid alternative to support the end-user in determining, with a certain degree of confidence, whether the email sender is who they declare to be. In this paper, we focus on a specific email scam (spear phishing) based on the email spoofing attack, and we implement a new end-user support system able to detect such an attack by analyzing the email content. The paper describes the email scam attack and, as countermeasures, presents two different scenarios based on email authorship verification: (i) a detection on the server side, which can exploit the characterization of the overall writing style of a sender, and (ii) a detection on the client side, which restricts the characterization of a sender to a specific receiver (End to End writing style). We consider solutions based on machine learning, experimenting with both standard machine learning classifiers based on well-known text stylometric features and deep learning classifiers characterized by automatic feature extraction. To reach the best accuracy, we experimented with different training approaches, which consider different subsets of the dataset. The best model has been employed in the realization of a secure email client application for Android as an instrument to support the end-user in the detection of suspicious emails.

Giorgi, G., Saracino, A. and Martinelli, F.
Email Spoofing Attack Detection through an End to End Authorship Attribution System.
DOI: 10.5220/0008954600640074
In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 64-74
ISBN: 978-989-758-399-5; ISSN: 2184-4356

The paper is organized as follows. Section 2 explains the background concepts related to the spear phishing attack and introduces the authorship problem. Section 3 discusses the proposed authorship approach and provides the details of the framework applied in two possible scenarios. Section 4 details the feature-based and deep learning classifiers used and implemented. Section 5 describes the dataset used and the experiments conducted. Section 6 presents and discusses the results obtained. Section 7 gives an overview of the authorship works analyzed in the literature. Section 8 discusses the concluding remarks and possible future work.

2 BACKGROUND

This section gives the background concepts related to the spear phishing attack and introduces the authorship problem. In particular, it describes the basic concept of the attack, showing how it is possible to forge an email sender through email spoofing. In addition, the concept of email authorship is introduced, defining two possible approaches to learning a writing style.

2.1 Spear Phishing

Spear phishing is a form of email scam intended to steal sensitive information from a specific victim. Unlike traditional spam attacks, spear-phishing attacks are personalized to their victims, and messages are modified to specifically address that victim. This type of attack can achieve a high degree of success because people are more inclined to open and reply to an email when they think a legitimate or trustworthy source has sent it. The majority of spear phishing emails use email spoofing as a hacking technique to forge the sender address by acting on the email header. Due to the structure of the Simple Mail Transfer Protocol (SMTP) used in electronic mail transmission, email services by default are not capable of identifying and blocking deceptive emails with a forged sender name or email address.

2.2 Authorship

The authorship attribution process is defined as the problem of determining the likely authorship of a given document. It can be divided into two sub-problems: (i) authorship identification and (ii) authorship verification. The goal of identification is to predict the author of an unknown text within a closed set of candidate authors; from the classification point of view, it can be viewed as a multi-class text classification task. The goal of authorship verification, instead, is to predict whether a text is written by the declared author; it can be modeled as a binary classification problem in which we attempt to distinguish a single author (target class) from all other authors (not target class). In the literature, the problem has been addressed through a study of the linguistic style of a person, under the assumption that each author has distinctive writing habits which can be represented by stylistic writing features. From our perspective, the writing style of a person can be divided into two different abstraction levels: (i) individual writing style, which is the generic writing style of a person discernible in every context, and (ii) end to end writing style, which is the writing style a user adopts only with a specific recipient. The concept of individual writing style relies on the fact that it is possible to detect distinctive stylistic features that do not change with respect to the context, situation, or recipient. Such independence leads us to consider the individual writing style as a measurable human trait, like a biometric characteristic. Therefore, by analyzing the texts/messages sent by an author to a subset of recipients, it is possible to learn the individual writing style of the sender and infer the author of texts/messages sent to new recipients. The concept of End to End writing style is based on the fact that a person can adopt a different writing style depending on the recipient (e.g., colleague, friend, family member); therefore, inferring the author of a text/message is possible only by analyzing the sender-receiver interaction in order to learn a custom linguistic fingerprint for each communication.


Figure 1: Sender email verification. [Diagram labels: Sender, Sender mail server, Authorship classifier, Receiver A mail server, Receiver B mail server, Receiver.]

3 SCENARIO

Based on the email system architecture, this section presents two possible scenarios in which the authorship system can be applied.

3.1 End to End Email Veriﬁcation

To test email authorship based on the end to end writing style, we consider a context in which the email authorship system is located on the client side. In this scenario, only the end to end communication between the parties is known; since the system is implemented on the receiver side, only the subset of emails related to the single sender-receiver interaction is available. The system, located on the receiver side, performs a writing style analysis of the received email and assigns it a probability of belonging to the legitimate sender. As shown in Figure 2, a malicious user intending to perform a spoofing attack sends an email to the victim claiming to be a legitimate identity. When the email arrives at the receiver side, it is analyzed by the authorship email classifier which, knowing the end to end writing style of the declared sender, assigns it a low probability of being an email sent by the legitimate identity, thus working as an Email Anti-Scam tool.

3.2 Sender Email Veriﬁcation

If the point of view is moved to the server side, the information available is not restricted to a single sender-receiver communication but covers all the communications that involve the sender. In that case, knowing how the sender writes to all their recipients, the writing style is better characterized; therefore, an individual writing style can be learned. As shown in Figure 1, a malicious user intending to perform a spoofing attack sends an email to the victim claiming to be a legitimate identity. When the email is delivered to the sender email server, the email authorship classifier, knowing the generic writing style of the declared sender, analyzes the email, assigns it a low probability of being an email sent by the legitimate identity, and sends it back.

Figure 2: End to end email verification. [Diagram labels: Sender, Sender mail server, Authorship classifier (two instances), Receiver A mail server, Receiver B mail server, Receiver A, Receiver B.]

3.3 Threat Model

This section details the threat model, explaining how the attack can be performed and the attacker's knowledge. The aim of the attacker, who performs a spear-phishing attack, is to steal sensitive information from a specific victim. We assume that the adversary knows the recipient's email address (victim email address) and the email address of a source trusted by the recipient. In such a case, the attacker can impersonate the trusted source and ask that specific victim for sensitive information. We also assume that the victim and the trusted email accounts are not compromised, so the attacker does not know the writing style of the trusted source.

Figure 3: Spear Phishing Attack. [Diagram labels: trusted.user@domain.com; attacker.user@domain.com (FROM: trusted.user@domain.com); victim.user@domain.com (FROM: trusted.user@domain.com).]

Figure 3 shows a practical example of spear phishing. The attacker, knowing only the victim's email address and the email address of a user trusted by the victim, forges the email sender field and sends an email to the victim, impersonating the trusted source.
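To make the forged sender field concrete, the following minimal sketch uses only the Python standard library to build (not send) a message whose From header names the trusted user while the envelope sender, which SMTP handles separately, is a different address. The addresses are the placeholders from Figure 3; the function and variable names are illustrative assumptions, not part of the proposed system.

```python
from email.message import EmailMessage

# Placeholder addresses taken from Figure 3.
TRUSTED = "trusted.user@domain.com"
ATTACKER = "attacker.user@domain.com"
VICTIM = "victim.user@domain.com"

def build_message(header_from: str, to: str, body: str) -> EmailMessage:
    """Build a message; the From header is free text chosen by the writer."""
    msg = EmailMessage()
    msg["From"] = header_from
    msg["To"] = to
    msg["Subject"] = "Hello"
    msg.set_content(body)
    return msg

# The envelope sender (used by SMTP for delivery) is a separate value
# from the From header that a mail client displays to the recipient.
envelope_sender = ATTACKER
message = build_message(TRUSTED, VICTIM, "Example body.")

# A client that only displays the header would show the trusted address.
displayed_sender = message["From"]
header_matches_envelope = displayed_sender == envelope_sender
```

Because nothing in the message itself ties the header to the envelope, a content-based check such as writing style analysis is one of the few signals left to the recipient.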

4 AUTHORSHIP CLASSIFIERS

Email authorship verification can be modeled as a binary text classification problem that distinguishes the target class (emails sent by the declared author) from the not target class (emails sent by an author different from the declared one). The classifiers used in the experiments can be divided into two classes based on the feature extraction method: (i) feature engineering based, which require domain knowledge of the data to extract features, and (ii) word embedding based, which perform an automatic feature extraction process to learn a word representation from the data.

4.1 Features based Classiﬁer

The feature-based classifiers used in the experiments consider a set of linguistic features validated in many authorship verification works (Brocardo et al., 2015), (Zheng et al., 2006). The three main elements that describe a language are lexis, syntax, and semantics. Lexical features are text items that can be a word, part of a word, or a chain of words. Lexical items are the basic building blocks of a language's vocabulary and can be used to measure the lexical richness of a writing style. By definition, syntax is the set of rules, principles, and processes that govern the structure of sentences in a given language. Finally, structural features measure the text organization in terms of the number of sentences or sentence length. The complete list of features used in our classifiers is reported in Table 1. As classifiers, seven different state-of-the-art machine learning algorithms are tested: Nearest Neighbors (Dasarathy, 1991), Radial Basis Function kernel SVM (RBF SVM) (Suykens and Vandewalle, 1999), Decision Tree (Quinlan, 1986), Random Forest (Ho, 1995), AdaBoost (Freund et al., 1999), SGD (Kiefer et al., 1952) and Logistic Regression (Peng et al., 2002).
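As a rough illustration of how features of this kind can be computed, the sketch below extracts a small subset of the features listed in Table 1 from one email body. The tokenization rules (regular expressions for words and sentences) and the feature names are assumptions of this sketch; the paper does not specify the exact extraction procedure.

```python
import re
import string

VOWELS = set("aeiouAEIOU")

def stylometric_features(text: str) -> dict:
    """Compute a small subset of the Table 1 features for one email body."""
    n_chars = len(text)
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Naive sentence split on '.', '!' and '?' (an assumption of this sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    safe_chars = max(n_chars, 1)   # avoid division by zero on empty input
    safe_words = max(n_words, 1)
    return {
        "num_chars": n_chars,
        "lower_ratio": sum(c.islower() for c in text) / safe_chars,
        "upper_ratio": sum(c.isupper() for c in text) / safe_chars,
        "space_ratio": sum(c.isspace() for c in text) / safe_chars,
        "vowel_ratio": sum(c in VOWELS for c in text) / safe_chars,
        "num_words": n_words,
        "avg_word_len": sum(len(w) for w in words) / safe_words,
        "num_unique_words": len({w.lower() for w in words}),
        "num_sentences": len(sentences),
        "avg_words_per_sentence": n_words / max(len(sentences), 1),
        "num_punctuation": sum(c in string.punctuation for c in text),
    }

features = stylometric_features("Hi Bob. Call me today!")
```

The resulting dictionary can be flattened into a numeric vector and passed to any of the classifiers listed above.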

4.2 Word Embedding Classiﬁer

Neural Networks (NN) require input data as sequences of encoded integers, so each word has to be represented by a unique integer. Therefore, an encoding scheme is needed that represents a sequence of text as an integer vector. Word embedding is a technique for representing words and documents using a dense vector representation (Mikolov et al., 2013). Its aim is a text description in which each word in the vocabulary corresponds to a real-valued vector in a high-dimensional space. The vectors are learned in such a way that words with similar meanings have similar representations in the vector space. Such a text representation is more expressive than more classical methods like bag-of-words, where relationships between words or tokens are ignored or forced into bigram and trigram approaches. In every network implemented, the embedding layer is initialized with random weights to learn, along with the model, an embedding space for all of the words in the training dataset (custom word embedding). In this way, the vocabulary created reflects the terms contained in the dataset, and it is independent of the language. Two different types of deep learning classifiers based on word embedding have been tested: (i) Convolutional Neural Network and (ii) Recurrent Convolutional Neural Network.

Table 1: Linguistic features.

Category      Feature
Lexical       Number of Characters (C)
              Number of lower Characters/C
              Number of Upper Characters/C
              Number of white space/C
              Number of special Char/C
              Number of Vowels/C
              Frequency of Vowels
              Frequency of non Vowels
              Frequency of special Char
              Number of Words (W)
              Average length per Word
              Number of unique words
              Word (W) - Char (C) ratio
              Most frequent words
              Word 2 and 3-grams
Structural    Number of short words/W
              Number of long words/W
              Number of Sentences (S)
              Average number of words in Sentences
              Number of sentences beginning with Uppercase/S
              Number of sentences beginning with Lowercase/S
Syntactical   Number of punctuation
              Punctuation frequency
              Number of symbols
              Symbols frequency
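The integer encoding that precedes the embedding layer can be sketched as follows. The vocabulary cap (10000) and the maximum sequence length (500) are the values reported in this paper; the reserved padding and unknown-word ids, the whitespace tokenization, and the function names are assumptions of this sketch.

```python
from collections import Counter

MAX_VOCAB = 10000   # vocabulary cap reported in the paper
MAX_LEN = 500       # maximum sequence length reported in the paper
PAD, UNK = 0, 1     # reserved ids (an assumption of this sketch)

def build_vocab(texts, max_vocab=MAX_VOCAB):
    """Map the most frequent words to integer ids, starting after PAD and UNK."""
    counts = Counter(word for t in texts for word in t.lower().split())
    most_common = counts.most_common(max_vocab - 2)
    return {word: i + 2 for i, (word, _) in enumerate(most_common)}

def encode(text, vocab, max_len=MAX_LEN):
    """Turn a text into a fixed-length list of integer ids (truncate, then pad)."""
    ids = [vocab.get(word, UNK) for word in text.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))

emails = ["see you at the meeting", "see you tomorrow"]
vocab = build_vocab(emails)
encoded = encode("see you at lunch", vocab, max_len=6)
```

Each integer id then indexes one row of the embedding matrix, which is learned together with the rest of the network.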

4.2.1 Convolutional Neural Network

In recent years, Convolutional Neural Networks (CNNs) have achieved great performance in the Computer Vision field. The extension of CNNs to other fields has proved their effectiveness also in Natural Language Processing (NLP), outperforming the state of the art (Zhang et al., 2015). The CNN architecture is composed of a combination of layers that, performing non-linear operations (convolution and subsampling), can extract essential features from the input data (text sentences in our case). Convolutional layers apply a set of learnable filters with small receptive fields to the input. Such filters are a sort of mask that is applied to the word representation of the input text through a sliding window to detect different text patterns. The set of features extracted through the filters is called a feature map. The convolutional operation is typically followed by a subsampling operation performed by a max-pooling layer. This layer aims to reduce the dimensionality of the feature map and extract the most significant features. The architecture implemented is composed of three essential parts: (i) Custom word embedding, (ii) Convolutional part, and (iii) Fully connected part. As the convolutional neural network, we tested a multi-channel Convolutional network (Ruder et al., 2016), composed of a custom word embedding of dimension 2000 with a maximum of 10000 words in the vocabulary, able to represent each text sequence of maximum length 500 through an integer vector of size 2000. The vector representations are routed to three different Convolutional channels with different learnable filter dimensions (3, 4, and 5), able to extract distinctive feature maps. At the bottom of the network, the extracted feature maps are concatenated, and a fully connected layer with 2 softmax units is applied in order to compute the probability that the input email belongs to the declared sender. The complete Convolutional architecture used is shown in Figure 4.

Figure 4: CNN architecture. [Diagram labels: Word embedding (500); Convolutional 6x100, Convolutional 7x100, Convolutional 8x100; 1D Max pooling layers; Concat; Fully connected/softmax; outputs Target / Not target.]
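The data flow of a multi-channel convolutional classifier of this kind (embedding, parallel convolutions, max pooling, concatenation, softmax) can be sketched as a NumPy forward pass. This is only an illustration with random, untrained weights and deliberately small sizes; the ReLU activation, the use of global max pooling, and all variable names are assumptions of this sketch rather than details confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small illustrative sizes; the paper reports a 10000-word vocabulary,
# sequences of length 500 and filter dimensions 3, 4 and 5.
VOCAB, SEQ_LEN, EMB_DIM, N_FILTERS = 50, 12, 8, 4
KERNEL_SIZES = (3, 4, 5)

embedding = rng.normal(size=(VOCAB, EMB_DIM))
filters = {k: rng.normal(size=(N_FILTERS, k, EMB_DIM)) for k in KERNEL_SIZES}
dense_w = rng.normal(size=(N_FILTERS * len(KERNEL_SIZES), 2))
dense_b = np.zeros(2)

def conv_channel(x, w):
    """Valid 1D convolution over time with ReLU, then global max pooling."""
    k = w.shape[1]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    # windows: (positions, k, emb); w: (filters, k, emb) -> (positions, filters)
    feature_map = np.maximum(np.einsum("pke,fke->pf", windows, w), 0.0)
    return feature_map.max(axis=0)

def forward(token_ids):
    """Return [P(target), P(not target)] for one encoded email."""
    x = embedding[token_ids]                      # (SEQ_LEN, EMB_DIM)
    pooled = np.concatenate([conv_channel(x, filters[k]) for k in KERNEL_SIZES])
    logits = pooled @ dense_w + dense_b
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

probs = forward(rng.integers(0, VOCAB, size=SEQ_LEN))
```

In a real system the weights would be learned by backpropagation in a deep learning framework; the sketch only shows how the three channels see the same embedded sequence through windows of different widths.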

4.2.2 Recurrent Convolutional Neural Network

Recurrent Neural Networks (RNNs) have been successfully applied to sequential information such as speech recognition (Graves et al., 2013), video analysis (Donahue et al., 2015), or time series (Connor et al., ). Unlike traditional neural networks, an RNN considers the dependency between the input values of a sequence. For this reason, it can be successfully applied to text analysis, where the elements of a text sequence are related to each other. The Bidirectional RNN (Schuster and Paliwal, 1997) is a variant of the RNN based on the idea that the output at a specific time depends not only on the previous elements but also on the future elements of the sequence. The network designed and implemented to solve the authorship problem is a combination of a Recurrent (RNN) and a Convolutional (CNN) Neural Network (RCNN). The RCNN is able to capture contextual information and text representation by applying recurrent and convolutional layers, respectively. The network is composed of four parts: (i) Custom word embedding, (ii) Recurrent, (iii) Convolutional and (iv) Fully connected part. The text representation through word embedding, as in the Convolutional network, has dimension 2000, a maximum vocabulary size of 10000, and a maximum text sequence length of 500. Figure 5 shows the entire network implemented.
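The bidirectional idea can be illustrated with a small NumPy sketch of a GRU (the recurrent unit named in the Figure 5 labels) run once forward and once backward over the same sequence. The gate equations follow one common GRU formulation; the weights are random and untrained, and the sizes and helper names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, HIDDEN, SEQ_LEN = 8, 5, 10   # illustrative sizes, not the paper's

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_gru(in_dim, hidden):
    """Random (untrained) parameters for the update, reset and candidate gates."""
    return {name: (rng.normal(scale=0.1, size=(in_dim, hidden)),
                   rng.normal(scale=0.1, size=(hidden, hidden)),
                   np.zeros(hidden))
            for name in ("z", "r", "h")}

def gru_run(xs, params):
    """Run a GRU over xs (time, in_dim) and return all hidden states."""
    h = np.zeros(params["z"][2].shape[0])
    states = []
    for x in xs:
        wz, uz, bz = params["z"]
        wr, ur, br = params["r"]
        wh, uh, bh = params["h"]
        z = sigmoid(x @ wz + h @ uz + bz)              # update gate
        r = sigmoid(x @ wr + h @ ur + br)              # reset gate
        cand = np.tanh(x @ wh + (r * h) @ uh + bh)     # candidate state
        h = (1.0 - z) * h + z * cand
        states.append(h)
    return np.stack(states)

def bidirectional_gru(xs, fwd, bwd):
    """Concatenate forward states with time-realigned backward states."""
    forward_states = gru_run(xs, fwd)
    backward_states = gru_run(xs[::-1], bwd)[::-1]
    return np.concatenate([forward_states, backward_states], axis=1)

inputs = rng.normal(size=(SEQ_LEN, EMB_DIM))
outputs = bidirectional_gru(inputs, init_gru(EMB_DIM, HIDDEN), init_gru(EMB_DIM, HIDDEN))
```

Each row of `outputs` describes one position using both its left and right context; in an RCNN these rows would then be fed to the convolutional and pooling layers.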

5 EXPERIMENTS AND IMPLEMENTATION

This section describes the dataset from the server-side and receiver-side viewpoints. It also details the approaches used to train the classifiers and the implementation of the tool.

5.1 Dataset Analysis

Since emails contain private user information, only a few public datasets contain personal emails labeled with the name of the sender. This section describes the only available dataset, which we used to test the authorship email architecture. The Enron Email Dataset (Klimt and Yang, 2004) is a collection of emails prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of the Enron company. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. For each of the 150 identities, the dataset contains the inbox folder and the sent folder. The dataset includes 517401 emails in total, sent by 20328 different email accounts to 58564 different receivers. Considering the two scenarios tested, we analyzed the dataset from two viewpoints, explained below.

Figure 5: RCNN architecture. [Diagram labels: Word embedding (500); Bidirectional GRU 80; Convolutional 7x64; 1D Max pooling layers; Fully connected/softmax; outputs Target / Not target.]

Server Side Dataset. Among the 20328 senders, 136 have sent more than 500 emails, and only 67 have sent more than 1000 emails. Analyzing the email lengths in the dataset, we can identify three different email sets: (i) Short emails: emails with fewer than 20 words, (ii) Medium emails: emails with 20 to 50 words, and (iii) Long emails: emails with more than 50 words.
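The three length sets can be expressed as a small helper. The thresholds are the ones stated above; counting words by splitting on whitespace and the function name are assumptions of this sketch.

```python
def length_bucket(text: str) -> str:
    """Assign an email body to the short, medium or long set by word count."""
    n_words = len(text.split())   # whitespace tokenization is an assumption
    if n_words < 20:
        return "short"
    if n_words <= 50:
        return "medium"
    return "long"

# Boundary cases: 19, 20, 50 and 51 words.
buckets = [length_bucket(" ".join(["word"] * n)) for n in (19, 20, 50, 51)]
```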

Table 2: Enron Senders and communications.

Email length        Senders   Sender-Receiver Communications
No constraint       67        256
words > 50          49        126
20 < words < 50     13        256
words < 20          5         256

This analysis shows that the majority of the identities sent long emails, followed by medium emails, and only a few identities sent short emails. The number of senders with more than 1000 emails for each email length is summarized in the second column of Table 2.

Receiver Side Dataset. In the receiver scenario, we are interested in users who have received a considerable number of emails from the same identity, in order to learn more accurately the end to end writing style of the sender toward the receiver. Taking 100 as the minimum number of emails that a single class must contain to train a classifier, the number of receivers with more than 100 emails received from a single user and more than 100 emails received from other users is 26, while the total number of sender-receiver interactions is 256. As shown in the third column of Table 2, considering only communications with more than 100 emails, it is possible to extract from the dataset 256 sender-receiver communications overall: 126 with long, 256 with medium, and 256 with short emails.

5.2 Training and Evaluation

In this section, the training approaches used in both scenarios are detailed.

Sender Email Verification Training. On the server side, we placed the authorship system on the sender email server; in this way, it is possible to test the learning of the individual writing style of the target sender. For every sender identity, a binary classifier has been trained by selecting its inbox emails as the positive class and a list of emails randomly selected from other senders as the negative class. In this scenario, the number of sender-receiver communications known on the server allows the individual writing style of the sender to be learned. During training, we considered identities with more than 1000 emails sent, and for each of them a binary classifier was trained on a balanced training set made of 1000 randomly selected emails sent by the target class (sender) and 1000 emails randomly selected from the sent emails of other identities in the dataset. For testing, a 10-fold cross-validation was performed using 100 testing emails for the positive class and 100 emails for the negative class.
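The construction of a balanced training set for one target sender can be sketched as follows. The default of 1000 emails per class is the value reported above; the data layout (a list of sender/body pairs), the function name and the toy sender names are hypothetical.

```python
import random

def balanced_training_set(emails, target_sender, per_class=1000, seed=0):
    """emails: list of (sender, body) pairs. Returns shuffled (body, label)
    pairs with label 1 for the target sender and 0 for any other sender."""
    rng = random.Random(seed)
    positives = [body for sender, body in emails if sender == target_sender]
    negatives = [body for sender, body in emails if sender != target_sender]
    if len(positives) < per_class or len(negatives) < per_class:
        raise ValueError("not enough emails to build a balanced set")
    sample = [(b, 1) for b in rng.sample(positives, per_class)]
    sample += [(b, 0) for b in rng.sample(negatives, per_class)]
    rng.shuffle(sample)
    return sample

# Toy corpus with hypothetical sender names.
corpus = ([("alice", f"a{i}") for i in range(30)]
          + [("bob", f"b{i}") for i in range(20)]
          + [("carol", f"c{i}") for i in range(20)])
train = balanced_training_set(corpus, "alice", per_class=10)
```

The same helper covers the end to end case if the input list is first restricted to the emails received by a single recipient.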

End to End Email Verification Training. In the end to end email verification context, as explained in Section 3, the authorship verification system is located on the receiver side, thus simulating end to end authorship verification. For each recipient identity, a set of sender identities was selected; in turn, each sender was chosen as the target sender (target communication), and a binary classifier was trained using the target emails as the positive class and the emails of the remaining senders as the negative class. During the training phase, 256 sender-receiver communications with more than 100 emails were considered. A random sub-sampling of the majority class was performed to balance the training set.


Training Approaches. Two different training approaches have been used in both experiments. As shown in Section 5.1, the dataset can be split according to email length. Therefore, in addition to the standard training approach, in which the training data selection is independent of the email length, we considered a training approach customized for the following subsets: (i) short emails (fewer than 20 words), (ii) medium emails (between 20 and 50 words), and (iii) long emails (more than 50 words). Each network was trained on balanced data (number of positive emails equal to the number of negative emails), performing a random subsampling of the majority class when required. A 10-fold cross-validation was applied during the training phase to obtain a better evaluation of the machine learning models. The classifiers are evaluated by computing the accuracy of the predictions.
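The evaluation protocol (k-fold cross-validation with accuracy as the metric) can be sketched in plain Python. The `fit` and `predict` callables stand in for any of the classifiers discussed in Section 4; the length-threshold classifier used here is only a toy placeholder, and all names are assumptions of this sketch.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(samples, labels, fit, predict, k=10):
    """Mean accuracy over k folds; fit and predict are caller-supplied."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

# Toy stand-in classifier: a learned threshold on the text length.
def fit_threshold(xs, ys):
    pos = [len(x) for x, y in zip(xs, ys) if y == 1]
    neg = [len(x) for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if len(x) > threshold else 0

texts = ["x" * 5] * 20 + ["x" * 50] * 20
labels = [0] * 20 + [1] * 20
accuracy = cross_val_accuracy(texts, labels, fit_threshold, predict_threshold)
```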

5.3 Implementation

The aim of the proposed work is not to build an email authentication system; rather, we focused on building an alternative instrument to support the end-user in the detection of a possible email spoofing attack. To this end, a secure email client application for Android has been developed (http://github.com/iitcybersecurity/EmailClientSpoofingDetection). It works as a standard email client, offering the possibility to connect to one's own mail server, download the emails, and analyze them with the end to end authorship attribution system. The end-user, by selecting the list of senders to monitor, launches a training phase on each end to end communication using, if available, the past emails exchanged by the parties. When a new email arrives, the system reads the "declared sender" and routes the email to the proper classifier, which analyzes the end to end writing style and assigns to the email a score that indicates whether the email sender is who they declare to be. The classifiers are continuously trained to allow the end to end writing style to be learned over time. The interface of the Android application is shown in Figure 6.

Figure 6: Android secure email client application.
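The routing step described in this section can be sketched as a small dispatcher that keeps one scorer per monitored sender. This is an illustration in Python rather than the Android implementation; the class name, the 0.5 threshold and the toy scorer are hypothetical, and a trained end to end classifier would take the scorer's place.

```python
from typing import Callable, Dict, Optional

# A scorer maps an email body to a probability that the declared sender wrote it.
Scorer = Callable[[str], float]

class EndToEndVerifier:
    """Keeps one scorer per monitored sender and routes emails by declared sender."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold          # hypothetical decision threshold
        self.scorers: Dict[str, Scorer] = {}

    def monitor(self, sender: str, scorer: Scorer) -> None:
        self.scorers[sender.lower()] = scorer

    def check(self, declared_sender: str, body: str) -> Optional[dict]:
        """Return None for unmonitored senders, else the score and a flag."""
        scorer = self.scorers.get(declared_sender.lower())
        if scorer is None:
            return None
        score = scorer(body)
        return {"score": score, "suspicious": score < self.threshold}

# Stand-in scorer: a trained classifier would go here.
def toy_scorer(body: str) -> float:
    return 0.9 if body.startswith("Hi team") else 0.1

verifier = EndToEndVerifier()
verifier.monitor("trusted.user@domain.com", toy_scorer)
genuine = verifier.check("trusted.user@domain.com", "Hi team, agenda attached.")
forged = verifier.check("Trusted.User@domain.com", "URGENT: wire the funds now")
unknown = verifier.check("other@domain.com", "hello")
```

Keeping one classifier per declared sender mirrors the end to end setting, where the writing style is learned separately for each sender-receiver communication.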

6 RESULTS

This section shows the results obtained from the experiments described in Section 5. In particular, it reports the results obtained in the two scenarios using the proposed training approaches. In addition, it discusses the results by comparing the end to end and the sender email verification results, the classification approaches adopted, and the impact of the email length on the accuracy.

6.1 Sender Veriﬁcation Results

For the server-side scenario, we report the evaluation of the classifiers in terms of accuracy, both for the training independent of the email length and for the training based on the email length. Table 3 compares the accuracy of the classification mechanisms adopted for the training independent of the email length. It shows the overall accuracy and the specific accuracy on each testing set (short, medium, long).

Table 3: Length independent sender results.

Classifier       Accuracy Short   Accuracy Medium   Accuracy Long
RCNN             89%              94%               94%
CNN              90%              95%               95%
Logistic Reg.    92%              95%               96%
Nearest Neigh.   73%              65%               66%
SVM              92%              95%               96%
Decision Tree    77%              87%               93%
Random For.      90%              94%               96%
AdaBoost         83%              92%               95%
SGD              88%              94%               94%

The reported results are the mean accuracy over the 67 target senders with more than 1000 emails sent. The results show the low accuracy of the classifiers in recognizing the sender identities from short emails. Conversely, higher accuracy has been obtained on the medium and long test sets. A possible explanation is that the email length influences accuracy up to a certain threshold.

By splitting the training set based on the email length and building a custom classifier for each subset, as described in Section 5, we obtained the results reported in Table 4. It shows, for every classifier, the average accuracy obtained in recognizing 5, 13 and 49 senders for the short, medium and long test sets, respectively. As in the previous experiment, the lowest accuracy is obtained on the short email set, which does not benefit from the custom training. Better results have been reached on the medium and long email training sets, where the accuracy increases by 1-2% with respect to the training independent of the email length. The results show that the email length is an important feature for recognizing the author of an email, and we can deduce that a short email containing fewer than 20 words does not include sufficient information for author verification. Excluding the short email set from the results, it is possible to compare the two training approaches tested.

Table 4: Length dependent sender results.

Classifier       Accuracy Short   Accuracy Medium   Accuracy Long
RCNN             89%              96%               95%
CNN              90%              97%               96%
Logistic Reg.    87%              96%               96%
Nearest Neigh.   60%              87%               88%
SVM              90%              96%               96%
Decision Tree    79%              90%               93%
Random For.      85%              95%               96%
AdaBoost         79%              94%               95%
SGD              86%              94%               95%

Table 5 shows the comparison between the two training approaches, both for the total testing set (short, medium and long) and for the medium and long testing sets only. In both cases, with the email length dependent training method, the word embedding classifiers show an accuracy increase: for example, the accuracy of the CNN classifier goes from 95% to 96.5% on the medium and long test sets, and from 93.3% to 94.3% on the total testing set.

Table 5: Sender verification results comparison.

                 Length Independent                  Length Dependent
Classifier       AVG Med/Long   AVG Short/Med/Long   AVG Med/Long   AVG Short/Med/Long
RCNN             94%            92.3%                95.5%          93.3%
CNN              95%            93.3%                96.5%          94.3%
Logistic Reg.    95.5%          94.3%                96%            93%
Nearest Neigh.   65.5%          68%                  87.5%          78.3%
SVM              95.5%          94.3%                96%            94%
Decision Tree    90%            85.6%                91.5%          87.3%
Random For.      95%            93.3%                95.5%          92%
AdaBoost         93.5%          90%                  94%            89.3%
SGD              94%            92%                  94.5%          91.6%
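As a worked check, the length dependent averages of Table 5 can be recomputed from the per-subset accuracies of Table 4. The sketch below assumes that each average is the unweighted mean of the subset accuracies; this is an assumption that is consistent with the rows used here, not a procedure stated in the text. The helper name `summarize` is hypothetical.

```python
# Per-subset accuracies (%) copied from Table 4 (length dependent training),
# in the order (short, medium, long).
TABLE_4 = {
    "RCNN": (89, 96, 95),
    "CNN": (90, 97, 96),
    "Logistic Reg.": (87, 96, 96),
    "SVM": (90, 96, 96),
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def summarize(short, medium, long_):
    """Return (AVG Med/Long, AVG Short/Med/Long) as unweighted means."""
    return mean((medium, long_)), mean((short, medium, long_))


for name, accuracies in TABLE_4.items():
    med_long, total = summarize(*accuracies)
    print(f"{name}: Med/Long {med_long:.1f}%, Short/Med/Long {total:.1f}%")
```

For the CNN classifier this gives 96.5% on the medium and long sets and 94.3% on the total testing set, the values reported in Table 5.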

6.2 End to End Veriﬁcation Results

As in the sender verification scenario, we report the results for both training approaches. Table 6 shows the mean accuracy of each machine learning model, computed by evaluating every single end to end classifier trained on a sender-receiver communication independently of the email length. In addition to the total average accuracy obtained over all sender-receiver communications, the table shows the average accuracy on each subset of the testing set (short, medium and long). The results indicate that the models based on word embeddings outperform the feature engineering based models. Considering the total accuracy, CNN and RCNN are more accurate than the feature engineering based models, with a best result of 95.3% against the 94.2% reached by the Logistic Regression classifier. Looking at the accuracy computed for each subset, the short email set shows the lowest accuracy for every model. As in the sender verification scenario, the accuracy increases with the email length up to a certain threshold, and the best accuracy is achieved on emails between 20 and 50 words long. A plausible explanation for this trend is that short emails do not contain the personal writing style features that the classifier needs to discriminate one communication from another.

Table 6: End to End verification results length independent.

Classifier       Total    Accuracy Short   Accuracy Medium   Accuracy Long
RCNN             95.3%    91.2%            96.3%             97.1%
CNN              94.8%    92.6%            97.2%             97.4%
Logistic Reg.    94.2%    84.3%            96.5%             96.3%
Nearest Neigh.   81.4%    79.1%            85.4%             83.1%
SVM              94.2%    74.8%            98.0%             95.6%
Decision Tree    92.1%    76.3%            93.1%             93.9%
Random For.      93.6%    77.1%            94.6%             95.6%
AdaBoost         92.7%    80.2%            96.7%             94.3%
SGD              94.5%    80.4%            95.4%             96.0%

Table 7 shows the results obtained by performing the email length dependent training on each subset. As in the sender verification test, the accuracy is higher than with the length independent approach, which confirms the validity of the proposed training method.

Table 7: End to End verification results length dependent.

Classifier       Accuracy Short   Accuracy Medium   Accuracy Long
RCNN             91.3%            99.2%             98.8%
CNN              92.5%            98.9%             98.6%
Logistic Reg.    85.3%            97.2%             97.7%
Nearest Neigh.   79.4%            86.5%             84.5%
SVM              75.5%            98.1%             97.6%
Decision Tree    77.4%            95.7%             94.6%
Random For.      78.5%            96.2%             97.4%
AdaBoost         80.9%            97.4%             96.8%
SGD              81.3%            98.0%             97.1%

The accuracy gain is best assessed by discarding the short test set. The best model (RCNN) achieves 99.2% and 98.8% accuracy on the medium and long sets respectively, compared with the 96.3% and 97.1% reached with the length independent training.
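The gain of the length dependent training can be made explicit with a short calculation. The sketch below copies the medium and long accuracies of three classifiers from Table 6 and Table 7 and reports the difference in percentage points; the function name `gains` is hypothetical and the choice of classifiers is only illustrative.

```python
# (medium, long) accuracies (%) copied from Table 6 (length independent)
# and Table 7 (length dependent).
INDEPENDENT = {"RCNN": (96.3, 97.1), "CNN": (97.2, 97.4), "SVM": (98.0, 95.6)}
DEPENDENT = {"RCNN": (99.2, 98.8), "CNN": (98.9, 98.6), "SVM": (98.1, 97.6)}


def gains(independent, dependent):
    """Return per-classifier (medium, long) accuracy gains in percentage points."""
    return {
        name: tuple(
            round(dep - ind, 1)
            for ind, dep in zip(independent[name], dependent[name])
        )
        for name in independent
    }


for name, (medium, long_) in gains(INDEPENDENT, DEPENDENT).items():
    print(f"{name}: medium {medium:+.1f} pp, long {long_:+.1f} pp")
```

For RCNN the gain is 2.9 percentage points on the medium set and 1.7 on the long set.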

6.3 Veriﬁcation Comparison

A comparison between the two authorship approaches is possible only by testing the classifiers on the same testing set. Given an end to end communication composed of one "declared sender" and one receiver, it is possible to apply both authorship verification systems: the one trained on the "declared sender" and the one trained on the specific end to end communication. Therefore, we performed a sender prediction on each end to end testing set using the appropriate trained sender authorship classifier. The average accuracy over all 256 end to end communications using both strategies is shown in Table 8. The accuracy of the sender classifiers applied to the end to end testing set is lower in every testing subset than that of the end to end email verification approach. This behavior arises because the sender classifier learns a high-level abstraction of the sender's writing style, which is useful to distinguish two different senders who interact with different receivers; as the accuracy differences highlight, however, this level of abstraction is not sufficient to distinguish different senders who interact with the same receiver.
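The difference between the two approaches lies in how the training data is grouped. The following sketch illustrates this under simplifying assumptions: emails are represented as hypothetical `(sender, receiver, text)` tuples, and the function names are illustrative only. Sender verification builds one training set per declared sender, while end to end verification builds one per sender-receiver pair.

```python
from collections import defaultdict


def group_for_sender_verification(emails):
    """One training set per declared sender, across all receivers."""
    groups = defaultdict(list)
    for sender, _receiver, text in emails:
        groups[sender].append(text)
    return dict(groups)


def group_for_end_to_end_verification(emails):
    """One training set per (sender, receiver) communication."""
    groups = defaultdict(list)
    for sender, receiver, text in emails:
        groups[(sender, receiver)].append(text)
    return dict(groups)


# Toy data with made-up addresses and texts.
emails = [
    ("alice", "bob", "Hi Bob, draft attached."),
    ("alice", "carol", "Dear Carol, please find the report enclosed."),
    ("dave", "bob", "Bob, call me."),
]
by_sender = group_for_sender_verification(emails)
by_pair = group_for_end_to_end_verification(emails)
print(len(by_sender), "sender models,", len(by_pair), "end to end models")
```

In this toy example the sender model for "alice" mixes the styles she uses with two receivers, whereas each end to end model sees only the style of one specific communication.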

Table 8: End to End - Sender verification results comparison.

                 End To End                  Sender
Classifier       Short    Medium   Long      Short    Medium   Long
RCNN             91.3%    99.2%    98.8%     85.4%    93.1%    92.2%
CNN              92.5%    98.9%    98.6%     83.6%    93.5%    92.4%
Logistic Reg.    85.3%    97.2%    97.7%     80.5%    87.9%    87.2%
Nearest Neigh.   75.5%    86.5%    84.5%     73.6%    70.3%    71.8%
SVM              77.4%    98.1%    97.6%     74.9%    85.7%    82.6%
Decision Tree    78.5%    95.7%    94.6%     72.6%    86.9%    85.1%
Random For.      78.5%    96.2%    97.4%     75.4%    85.8%    85.7%
AdaBoost         80.9%    97.4%    96.8%     77.9%    88.3%    87.7%
SGD              81.3%    98%      97%       72.9%    87.1%    86.2%

Table 9: Authorship works comparison.

Ref.                      Dataset   Text size    Identities   End 2 End Verification   Sender Verification
(Brocardo et al., 2013)   Enron     500 chars    87           -                        EER 14.35%
(Litvak, 2018)            Enron     <95 words    52           -                        Accuracy 97%
(Shrestha et al., 2017)   Twitter   1000 chars   50           -                        Accuracy 76%
Our                       Enron     >20 words    67           99%                      96.5%

7 RELATED WORK

In this section, authorship works are presented by distinguishing between feature engineering based and deep learning based authorship classifiers, as well as between authorship identification and verification. Authorship analysis is a topic widely treated in the literature, in particular in the field of forensic linguistics, where the aim is to identify linguistic features that can give information about the identity of the author of an anonymous text. Many works address authorship identification, verification, and writing style characterization. The first works on authorship attributed an author to a specific book or to a general, well-structured and long text document. More recent investigations focus on authorship analysis of online documents, such as social messages or emails, which are short and in general not well structured (Brocardo et al., 2013). The main approach to this problem uses manually extracted stylometric features to characterize the writing style of a person through traditional machine learning algorithms. The effectiveness of deep neural networks in Natural Language Processing (NLP) has brought advantages in feature extraction, and some of these techniques have also been applied in the authorship field.

Most authorship works focus on the identification problem (attributing an identity to a given text). In (Zheng et al., 2006) the authors present an online message authorship identification framework based on four types of writing style features (lexical, syntactic, structural and content-specific). They experimented with three feature-based classification techniques on English online texts with an average length of 169 words, achieving 97% accuracy in identifying 20 identities with 30 to 40 messages per author. Another work on authorship identification of short messages, based on a deep learning model, is presented in (Shrestha et al., 2017). The authors propose a Convolutional Neural Network for the author attribution of tweets, achieving 76% accuracy for 50 authors with 1000 tweets each.

Another authorship subfield studied in short message analysis is the verification problem (verifying whether a text was written by its declared author). Deep learning models have also been applied to authorship verification of short messages: (Litvak, 2018) presents a deep learning model for automatic feature extraction directly from the input text, implementing a Convolutional Neural Network able to analyze the raw input email text and extract the discriminative features needed to verify the genuineness of the author. Table 9 summarizes the comparison between our work and the studies in this field.

8 CONCLUSION

We addressed the problem of spear-phishing attacks based on the forgery of the sender field contained in the email. As a countermeasure, we proposed an end-user email support system based on the analysis of the writing style of a person. We presented two possible approaches to the problem: (i) sender email verification, which exploits the characterization of the overall writing style of a sender, and (ii) end to end email verification, which considers the writing style specific to a sender-receiver communication. As a verification system, we proposed an authorship email verification based on a binary text classifier. We compared two text classification approaches: (i) feature engineering based and (ii) word embedding based. In both scenarios we tested two training techniques based on different splits of the dataset: (i) independent of the email length and (ii) dependent on the email length. The analysis of the results shows: (i) the higher accuracy of the word embedding based classifiers with respect to the feature engineering based ones in both scenarios; (ii) the effectiveness of the training technique based on splitting the dataset by email length; and (iii) the better accuracy obtained by the end to end email verification with respect to the traditional sender verification. The high accuracy reached in email author verification indicates that the authorship mechanism is a promising support approach against spear-phishing scam emails.

ACKNOWLEDGEMENTS

This work has been partially supported by H2020 EU-

funded projects SPARTA, GA 830892, C3ISP, GA

700294 and EIT-Digital Project HII, PRIN Governing

Adaptive.

REFERENCES

Allman, E., Callas, J., Delany, M., Libbey, M., Fenton, J., and Thomas, M. (2007). DomainKeys Identified Mail (DKIM) signatures. Technical report, RFC 4871, May.

Brocardo, M. L., Traore, I., Saad, S., and Woungang, I. (2013). Authorship verification for short messages using stylometry. In 2013 International Conference on Computer, Information and Telecommunication Systems (CITS), pages 1–6. IEEE.

Brocardo, M. L., Traore, I., and Woungang, I. (2015). Authorship verification of e-mail and tweet messages applied for continuous authentication. Journal of Computer and System Sciences, 81(8):1429–1440.

Connor, J. T., Martin, R. D., and Atlas, L. E. Recurrent neural networks and robust time series prediction. IEEE transactions on neural networks, 5.

Dasarathy, B. V. (1991). Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Tutorial.

Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2625–2634.

Freund, Y., Schapire, R., and Abe, N. (1999). A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612.

Graves, A., Mohamed, A.-r., and Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 6645–6649. IEEE.

Ho, T. K. (1995). Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition, volume 1, pages 278–282. IEEE.

Hoffman, P. (2002). SMTP service extension for secure SMTP over Transport Layer Security.

Kiefer, J., Wolfowitz, J., et al. (1952). Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 23(3):462–466.

Klimt, B. and Yang, Y. (2004). The Enron corpus: A new dataset for email classification research. In European Conference on Machine Learning, pages 217–226. Springer.

Kucherawy, M. and Zwicky, E. (2015). Domain-based message authentication, reporting, and conformance (DMARC).

Litvak, M. (2018). Deep dive into authorship verification of email messages with convolutional neural network. In Annual International Symposium on Information Management and Big Data, pages 129–136. Springer.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Myers, J. G. (1999). SMTP service extension for authentication.

Peng, C.-Y. J., Lee, K. L., and Ingersoll, G. M. (2002). An introduction to logistic regression analysis and reporting. The journal of educational research, 96(1):3–14.

Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1(1):81–106.

Radicati Group, I. (2019). Email statistics report, 2019-2023.

Ruder, S., Ghaffari, P., and Breslin, J. G. (2016). Character-level and multi-channel convolutional neural networks for large-scale authorship attribution. arXiv preprint arXiv:1609.06686.

Schuster, M. and Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681.

Shrestha, P., Sierra, S., Gonzalez, F., Montes, M., Rosso, P., and Solorio, T. (2017). Convolutional neural networks for authorship attribution of short texts. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 669–674.

Suykens, J. A. and Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural processing letters, 9(3):293–300.

Wong, M. and Schlitt, W. (2006). Sender Policy Framework (SPF) for authorizing use of domains in e-mail, version 1. Technical report, RFC 4408, April.

Zhang, X., Zhao, J., and LeCun, Y. (2015). Character-level convolutional networks for text classification. In Advances in neural information processing systems, pages 649–657.

Zheng, R., Li, J., Chen, H., and Huang, Z. (2006). A framework for authorship identification of online messages: Writing-style features and classification techniques. Journal of the American society for information science and technology, 57(3):378–393.

ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy