Battling Against DDoS in SIP

Is Machine Learning-based Detection an Effective Weapon?

Z. Tsiatsikas, A. Fakis, D. Papamartzivanos, D. Geneiatakis, G. Kambourakis and C. Kolias

Dept. of Inform. and Comm. Systems Engineering, University of the Aegean, Karlovassi, Greece

Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, GR541 24 Thessaloniki, Greece

Computer Science Department, George Mason University, VA, U.S.A.

Keywords:

Session Initiation Protocol, Machine Learning, DDoS, Anomaly-detection, Intrusion Detection Systems.

Abstract:

This paper focuses on network anomaly-detection and especially the effectiveness of Machine Learning (ML) techniques in detecting Denial of Service (DoS) in SIP-based VoIP ecosystems. Several works in the literature have been devoted to this topic so far, but only a small fraction of them have done so in an elaborate way. Furthermore, none of them takes into account high and low-rate Distributed DoS (DDoS) when assessing the efficacy of such techniques in SIP intrusion detection. To provide a more complete estimation of this potential, we conduct extensive experiments involving 5 different classifiers and a plethora of realistically simulated attack scenarios representing a variety of (D)DoS incidents. Moreover, for DDoS ones, we compare our results with those produced by two other anomaly-based detection methods, namely Entropy and Hellinger Distance. Our results show that ML-powered detection scores a promising false alarm rate in the general case, and seems to outperform similar methods when it comes to DDoS.

1 INTRODUCTION

During the last years, Voice over IP (VoIP) technologies and services have penetrated the market and, for many of us, have become an integral part of our software and/or hardware portfolio. Recent reports indicate that this market will grow to reach about USD 136.76 billion by 2020 (Mohr, 2014). In both mobile and fixed networks, Session Initiation Protocol (SIP) seems to be the predominant means for establishing and managing a VoIP session. On the downside, the text-based and open nature of the protocol has given rise to a plethora of attacks against it.

By examining the rather rich literature on SIP se-

curity, one can distinguish several categories of as-

saults ranging from SQL injection to Denial of Ser-

vice (DoS) (Geneiatakis et al., 2005; Geneiatakis

et al., 2007; Geneiatakis et al., 2006; Kambourakis

et al., 2011). It can be safely argued that the latter category attracts the greatest attention, and seems to be the most perilous and difficult to confront since it is closely related to the signaling nature of the protocol per se. Focusing on this kind of attack, several protection and detection methods have been proposed so far. Roughly, we can categorize them into misuse-detection and anomaly-detection ones. Generally, the first family of methods monitors network activity against exact signatures of known malicious behavior (e.g., observing the network traffic for particular byte sequences), while the second possesses knowledge of normal activity and warns against any deviation from that profile. The latter category of methods,

which is the focus of this paper, is usually realized by

means of tools borrowed from the Machine Learning

(ML) community. This refers to algorithms that are first trained, in either a supervised or an unsupervised manner, with reference input to learn its particulars, and are then fed with unknown input to accomplish the actual detection. Specifically for SIP, although the DoS threat has been highlighted and addressed by a significant number of research works (Ehlert et al., 2010; Keromytis, 2011), the applicability and effectiveness of ML techniques in coping with such incidents is still being assessed and is certainly in need of further development.

Naturally, this is mainly due to the increased over-

head that these methods may bear - especially when

it comes to real-time detection and a training phase

is required - in comparison to misuse-based or purely

statistical ones. Nevertheless, in this work we argue

that ML techniques can be particularly fruitful for ex-

amining the high-volume log ﬁles of a given VoIP


Tsiatsikas Z., Fakis A., Papamartzivanos D., Geneiatakis D., Kambourakis G. and Kolias C..

Battling Against DDoS in SIP - Is Machine Learning-based Detection an Effective Weapon?.

DOI: 10.5220/0005549103010308

In Proceedings of the 12th International Conference on Security and Cryptography (SECRYPT-2015), pages 301-308

ISBN: 978-989-758-117-5

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

realm in an offline fashion to determine whether they contain DoS incidents. Also, this category of methods may show better results when used for the detection of low-rate DoS (also known as “low and slow”), which, although not meant to paralyze the target system at a fast pace, consumes valuable network, CPU and memory resources. Ultimately, this results in service delays, which in turn cause customer dissatisfaction with direct negative effects on the provider's market share.

Taking the above into consideration, the focus of

this work is on the applicability of ML techniques to

track down DoS incidents, paying special attention to

DDoS and low-rate ones. The main contributions of

this work can be summarized as follows:

• We assess the effectiveness of several well-known

classiﬁers to detect (D)DoS incidents in SIP in

terms of false alarms.

• We offer a method to calculate the occurrences of SIP message headers in a given log file in a privacy-preserving way, based on a predefined message window. The output of this process is fed to the respective ML algorithm.

• Our experiments consider both DoS and DDoS at-

tacks materialized in 15 different realistically sim-

ulated SIP trafﬁc scenarios, having different char-

acteristics in terms of number of users and calls

per second.

• For DDoS scenarios, we provide a comparison be-

tween two other anomaly-based detection meth-

ods proposed in the literature and ML-powered

detection in terms of effectiveness.

The rest of the paper is organized as follows. Sec-

tion 2 provides an overview of SIP architecture and

briefly describes the threat model. Section 3 details the creation of the classification features used by

ML classiﬁers, while Section 4 elaborates on the ex-

perimental results. The related work is discussed in

Section 5. Section 6 draws a conclusion and provides

some pointers to future work.

2 PRELIMINARIES

2.1 SIP Architecture & Threat Model

This Section succinctly describes the basic parts of

an SIP architecture. This is required to familiarize

the reader with the terminology and notations used in

the subsequent Sections. An SIP VoIP architecture

consists of the following basic elements.

S1 INVITE sip: zisis@83.212.120.153 SIP/2.0.

S2 Call-ID: a306a24825b11345a79eee1ed9450120@0:0:0.

CSeq: 1 INVITE.

S3 From: "alfa" <sip:alfa@83.212.120.153>;tag=61460cc9.

S4 To: <sip:zisis@83.212.120.153>.

S5 Via: SIP/2.0/UDP 85.74.157.139:5060;branch=z9hG4bK

Max-Forwards: 70.

S6 Contact: "dpapamartz" <sip:dpapamartz@85.74.157.139:5060

User-Agent: Jitsi2.2.4603.9615Windows 7.

Content-Type: application/sdp.

v=0.

o=scype2 0 0 IN IP4 85.74.157.139.

s=-.

c=IN IP4 192.168.1.52.

t=0 0.

Figure 1: A typical SIP message.

• User Agent (UA). It represents the end points of

the SIP protocol, that is, the caller and the callee

which are able to initiate or terminate a session

using an SIP software or hardware client.

• SIP Proxy Server. It is an intermediate entity which plays the role of the client and the server at the same time. Its task is to route all the packets being sent and received by the users participating in an SIP session. Note that two or more SIP proxies may exist between any two UAs.

• Registrar. It handles the authentication and register requests initiated by the UAs. To do so, this entity stores users' credentials and UA location information.

Figure 1 presents a typical SIP INVITE message. As observed, such a message consists of several header fields, designated as S1-S6 in the figure, and a message body. Initially, a user has to send a REGISTER request to an SIP Registrar. The latter will store the contact information of the user in the location server. After that, any other user can try to establish a VoIP session with that UA by sending it an INVITE request. At any time, either the caller or the callee can send a BYE message toward the other end to terminate the session. The interested reader who wishes to get a deeper understanding of the SIP architecture can refer to the corresponding RFC (Rosenberg et al., 2002).
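As a rough illustration of the message layout shown in Figure 1, the following Python sketch splits a SIP request into its start line, header fields and body. The function name `parse_sip_message` and the sample message are assumptions made for this example only; folded (multi-line) headers and repeated header names are deliberately not handled.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body).

    Simplified sketch: header names are lower-cased, a repeated header
    overwrites the earlier one, and folded headers are not supported.
    """
    # Normalise line endings, then cut at the first empty line, which
    # separates the header block from the message body.
    text = raw.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    start_line = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip lines that carry no "name: value" pair
            headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical sample, loosely modelled on the INVITE of Figure 1.
SAMPLE = (
    "INVITE sip:zisis@83.212.120.153 SIP/2.0\r\n"
    "Call-ID: a306a24825b11345a79eee1ed9450120@0:0:0\r\n"
    "CSeq: 1 INVITE\r\n"
    "From: \"alfa\" <sip:alfa@83.212.120.153>;tag=61460cc9\r\n"
    "To: <sip:zisis@83.212.120.153>\r\n"
    "Via: SIP/2.0/UDP 85.74.157.139:5060;branch=z9hG4bK\r\n"
    "Max-Forwards: 70\r\n"
    "\r\n"
    "v=0\r\n"
)

start_line, headers, body = parse_sip_message(SAMPLE)
```

Here `start_line` plays the role of S1, while entries such as `headers["from"]` and `headers["to"]` correspond to S3 and S4.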

The SIP signaling produced by the users is logged

by the VoIP provider. In fact, this is in most cases

a mandatory requirement for any service provider

mainly for billing, accounting and network planning

purposes. As a result, these logs may be a valuable

and rich source of information concerning the investi-

gation of security incidents and intrusion detection in

general.

Regarding the security aspects of SIP ecosystems,

various types of vulnerabilities and attacks have been

presented in the literature so far. More precisely, at-

SECRYPT2015-InternationalConferenceonSecurityandCryptography

302

tacks such as malformed messages, ﬂooding, SQL in-

jection and signaling ones (Geneiatakis et al., 2006;

Keromytis, 2012) are some of the most destructive.

Among them, (D)DoS is probably the most hazardous one, as it aims to drain the target's resources. For

example, an attacker is able to send a high volume of

requests to the victim with the aim to steer it to paral-

ysis. Moreover, the attacker could send a large num-

ber of different requests with spoofed IP addresses,

aiming to drain the target’s resources and confuse the

underlying security mechanisms. In a worst-case sce-

nario, a botnet could be used to launch such an attack,

producing high volume of trafﬁc. This may be also or-

chestrated under the protection of a covert communi-

cation channel, thus making the detection even more

cumbersome. For a more explanatory threat model on

this type of attacks in SIP the reader can refer to (Tsi-

atsikas et al., 2015).

3 CLASSIFICATION FEATURES

As already mentioned in Section 1, to avoid DoS

attacks in SIP several solutions have been pro-

posed (Ehlert et al., 2010; Geneiatakis et al., 2009;

Tang et al., 2014). Given that this type of attack is as

a rule of thumb executed in a distributed manner and

may be quite sophisticated regarding its implementa-

tion, simple anomaly-detection approaches that rely

on the sudden and fast-paced increment of SIP traf-

ﬁc may be not enough. In this regard, ML-powered

methods can be a potent ally towards the detection of

such perilous events. The key factor here is the log

ﬁles on the provider side, which can be used to feed a

ML classiﬁer in real-time or ofﬂine (in case, say, the

investigation of an attack aftermath is required). This

Section elaborates on the use of such techniques in an

SIP environment.

In our experiments, we utilize and evaluate the ef-

fectiveness of 5 well-known classiﬁers tested under

15 different attack scenarios. Speciﬁcally, we use the

SMO, Naive Bayes, Neural Networks, Decision Trees

(J48) and Random Forest classiﬁers. This selection

has been made based on the ability of these classiﬁers

to perform better in terms of decision accuracy and

speed when it comes to numerical data (Witten and

Frank, 2005).

In order to take advantage of the aforementioned performance characteristics, we utilize Algorithm 1. Its purpose is twofold. On the one hand, it aims to deal with the sensitive nature of the communication transactions residing in an audit trail by providing an anonymization scheme (Tsiatsikas et al., 2015), while on the other it allows for automatically extracting the classification features to be used by the classifiers in numerical form.

The anonymization goal is met using HMAC (Eastlake and Hansen, 2011). HMAC enables one to preserve the anonymity of the communication entities appearing in the underlying audit trail, while the entropy of messages is preserved, so that the subsequent calculations remain unaffected. In fact, revealing the hidden UA identities is as hard as reversing the HMAC procedure itself. The cryptographic key is kept secret and in the possession of the entity that legitimately owns the audit trail. According to the transformation procedure, a log file is examined line-by-line and every privacy-sensitive SIP message header (e.g., <FROM>, <TO>, <VIA>, etc.) becomes input for the HMAC function (lines 2-4). The algorithm considers only the SIP message headers S1 to S6 as given in Figure 1. More precisely, the hash function used in our case is HMAC-SHA256 combined with a cryptographic key of 256 bits (line 4).
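The anonymization step can be sketched with the Python standard library as follows. This is a minimal illustration, not the authors' implementation: the helper name `anonymize_header` and the way the key is generated are assumptions, while HMAC-SHA256 and the 256-bit key length come from the text.

```python
import hashlib
import hmac
import secrets


def anonymize_header(value: str, key: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of one header value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: a random 256-bit (32-byte) key held only by the log owner.
key = secrets.token_bytes(32)

a = anonymize_header('"alfa" <sip:alfa@83.212.120.153>;tag=61460cc9', key)
b = anonymize_header('"alfa" <sip:alfa@83.212.120.153>;tag=61460cc9', key)
c = anonymize_header("<sip:zisis@83.212.120.153>", key)
```

Equal header values map to equal digests under the same key, so occurrence counts computed on the digests match those computed on the original headers, while the identities themselves stay hidden.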

The next stage is to generate the classification features. The steps to achieve this are summarized in lines 5-14 of Algorithm 1. The anonymized unique headers are kept in a hash table data structure (line 5). This table is populated with the number of occurrences of every single header checksum. That is, if a checksum occurs for the first time, then a new entry is generated in the table (lines 8-9). If it is a repeating header, its number of occurrences is increased by 1 (line 6). This procedure is repeated until a certain message window M is met (line 11). In our case, M is set to 1,000, but this parameter can be adjusted by the service provider itself, say, according to the average call rates. To our knowledge, there is no foolproof approach to formally define this parameter, mainly because it is eminently contextual. That is, it is closely connected to the characteristics of the service and underlying network. As a result, similar to other anomaly-based approaches, one can follow a trial-and-error approach to balance the M parameter against the false alarm rate.

The result of applying Algorithm 1 to an audit trail is a number of specially formatted .arff files (one per M), which are afterwards used in the classification process. Each .arff file contains classification vectors, i.e., one vector per SIP message found in the log file being examined. Two instances of such a classification vector follow.

V_attack = {926, 4, 988, 4, 4, 3, attack}

V_normal = {12, 4, 6, 4, 3, 8, normal}

The ﬁrst 6 values of each vector represent the occur-

rences of S1 to S6 SIP headers respectively, and the

BattlingAgainstDDoSinSIP-IsMachineLearning-basedDetectionanEffectiveWeapon?

303

last one characterizes the class to which the vector belongs. One can easily observe that the first vector exhibits a higher number of occurrences in the S1 and S3 headers, while the rest remain low, close to those contained in V_normal.
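To make the file format more concrete, the two example vectors could be serialized to Weka's ARFF format roughly as shown below. The relation name and the attribute names S1 to S6 are assumptions chosen for this sketch; the exact header the authors' files use is not given in the text.

```python
def to_arff(vectors, relation="sip_headers"):
    """Render occurrence vectors as ARFF text (six numeric attributes plus class)."""
    lines = [f"@relation {relation}", ""]
    for i in range(1, 7):
        lines.append(f"@attribute S{i} numeric")
    # Nominal class attribute with the two labels used in the paper.
    lines.append("@attribute class {attack,normal}")
    lines += ["", "@data"]
    for *counts, label in vectors:
        lines.append(",".join(str(c) for c in counts) + "," + label)
    return "\n".join(lines) + "\n"


vectors = [
    (926, 4, 988, 4, 4, 3, "attack"),
    (12, 4, 6, 4, 3, 8, "normal"),
]
arff_text = to_arff(vectors)
```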

Algorithm 1: Obtain Input Data for ML Classifiers.
Input: Audit Trail
Output: Input File for Classifiers (.arff format)
 1  while (AuditTrail ≠ NULL) do
 2      Line ← ReadLine();
 3      SIPHeader ← ExtractSipHeader(Line);
 4      HashedHeader ← HMAC(SIPHeader);
 5      if (InsertToHashTable(HashedHeader) ≠ NULL) then
 6          GetValueofHashTable(HashedHeader)++;
 7      else
 8          InsertToHashTable(HashedHeader);
 9          SetValueInHashTable(HashedHeader) ← 1;
10      end
11      if (Message-Window = 1,000) then
12          TotalMessages ← TotalMessages + M;
13          Re-Initialize(HashTable);
14      end
15      for (i=1; i ≤ TotalMessages; i++) do
16          PrintOccurences(GetValueofHashTable(HashedHeader));
17      end
18  end
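One possible reading of Algorithm 1 is sketched below in Python. It is an interpretation under stated assumptions rather than the authors' code: the field keys in `HEADER_FIELDS`, the input format (one dict per SIP message), the use of one counter per header position, and the handling of a trailing, partially filled window are all choices made for this example.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical keys for the six headers S1 to S6 of each parsed message.
HEADER_FIELDS = ("S1", "S2", "S3", "S4", "S5", "S6")


def _emit(batch):
    """Yield one occurrence vector per message of a completed window."""
    # One counter per header position, so identical digests that appear
    # in different header fields are not mixed up.
    counters = [Counter(row[i] for row in batch) for i in range(len(HEADER_FIELDS))]
    for row in batch:
        yield [counters[i][row[i]] for i in range(len(HEADER_FIELDS))]


def extract_features(messages, key, window=1000):
    """Hash the headers of each message and count occurrences per window."""
    def digest(value):
        return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

    batch = []
    for msg in messages:
        batch.append(tuple(digest(msg[f]) for f in HEADER_FIELDS))
        if len(batch) == window:  # message window M reached
            yield from _emit(batch)
            batch = []            # re-initialize for the next window
    if batch:  # assumption: a trailing partial window is still emitted
        yield from _emit(batch)


# Toy input: four messages sharing most header values.
msgs = [
    {"S1": "INVITE sip:bob@example.org", "S2": f"id{i}", "S3": "alice",
     "S4": "bob", "S5": f"via{i % 2}", "S6": "alice"}
    for i in range(4)
]
vectors = list(extract_features(msgs, key=b"k" * 32, window=4))
small_windows = list(extract_features(msgs, key=b"k" * 32, window=2))
```

With a window of four messages, every vector reports four occurrences for the shared headers; halving the window halves those counts, which illustrates why the choice of M directly shapes the feature values.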

4 EVALUATION

4.1 Test-bed Setup

In order to evaluate the effectiveness of the aforemen-

tioned classiﬁers in detecting DoS incidents we cre-

ated a test-bed, depicted in Figure 2. Three differ-

ent Virtual Machines (VMs) have been used for the

SIP proxy, the legitimate users, and the generation

of the attack trafﬁc depending on the scenario. All

VMs run on an i7 processor 2.2 GHz machine hav-

ing 6 GB of RAM. For the SIP proxy we employed

the widely-known VoIP server Kamailio (Kamailio,

2014). We simulated distinct patterns for both legitimate and DoS attack traffic using the sipp v.3.2 (http://sipp.sourceforge.net/) and sipsak (http://sipsak.org/) tools respectively. Furthermore, for the simulation of DDoS attacks, the SIPp-DD tool has been used (Stanek and Kencl, 2011). The well-known Weka tool (Hall et al., 2009) has been employed for ML analysis.

As already pointed out in Section 3, we assessed 5 classifiers under 15 different scenarios, the results of which are provided in Table 2. It is stressed that both the training and testing scenarios include legitimate and attack traffic. For example, the training scenario is SN1 and its testing scenarios are SN1.1,

SN1.2, SN1.3, and so on. The legitimate trafﬁc for

DoS testing scenarios was created using the same call

rate as that of the corresponding training scenario. On

the other hand, for DDoS we used a range of different call rates, aiming to better simulate the possible variations that may appear in a real VoIP service. For example, as observed in Table 1, the call rate for SN6.1 is given as 20-120, where the first number indicates the call rate of the attack, and the second corresponds to the call rate of the legitimate traffic, both occurring in parallel. Keep in mind that for DDoS scenarios about half of the registered users were generating the normal traffic, while the other half were launching the actual attack. Moreover, for all the scenarios, we employed an exponential inter-arrival time distribution (λ = 100) for producing the legitimate traffic, similar to that used in evaluating SIP server performance (Krishnamurthy and Rouskas, 2013). The attack traffic for DoS training scenarios was created using randomly generated attacks with call rates varying between 20 and 10,000 calls/sec and time pauses between them spanning from 15 to 360 secs. The same method was used for creating the DDoS training scenarios; that is, seven variants were launched in total, having call rates spanning from 2,000 to 15,000 calls/sec and pauses between them ranging from 10 to 800 secs.
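For readers unfamiliar with exponential inter-arrival times, the following sketch generates call arrival instants with Python's `random.expovariate`. Treating λ = 100 as a rate in events per unit of time is an assumption of this example, since the text does not state the unit; the function name `arrival_times` and the fixed seed are likewise illustrative.

```python
import random


def arrival_times(rate, count, seed=None):
    """Return `count` cumulative arrival instants with exponential gaps."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(count):
        # expovariate(rate) draws a gap whose mean is 1 / rate.
        t += rng.expovariate(rate)
        times.append(t)
    return times


times = arrival_times(rate=100, count=10_000, seed=1)
mean_gap = times[-1] / len(times)  # should be close to 1 / 100
```

Individual gaps vary widely around the mean, which gives the bursty, irregular call pattern expected of legitimate users rather than a perfectly periodic one.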

[Figure 2 diagram. Components shown: Kamailio SIP Proxy; Caller; Callee; Background Traffic Generator (SIPp); Attack Traffic Generator for DoS (sipsak); Attack Traffic Generator for DDoS (SIPp).]

Figure 2: Deployed test-bed for (D)DoS simulations.

4.2 Results

The obtained results for all the scenarios are given

in Table 2. This Section ﬁrstly refers to the DoS at-

tack scenarios and then to DDoS ones. As shown in

Table 2, we use legacy intrusion detection metrics,

namely False Positive (FP) and False Negative (FN)

to assess the performance of each algorithm. One

can easily observe that in the case of DoS involving

scenarios SN1.1 to SN5.2, the maximum FP value is

equal to 3.7%, scored by both SMO and Neural Net-

SECRYPT2015-InternationalConferenceonSecurityandCryptography

304

Table 1: Description of scenarios.

Scen.   Num. of Users   Calls/Sec.   Train Scen.   Type of Attack
SN1     30              2            X             -
SN1.1   30              50           -             DoS
SN1.2   30              175          -             DoS
SN1.3   30              350          -             DoS
SN2     30              5            X             -
SN2.1   30              20           -             DoS
SN2.2   30              40           -             DoS
SN2.3   30              80           -             DoS
SN3     30              20           X             -
SN3.1   30              266          -             DoS
SN4     30              120          X             -
SN4.1   30              800          -             DoS
SN5     50              120          X             -
SN5.1   50              400          -             DoS
SN5.2   50              1200         -             DoS
SN6     60              20           X             DDoS
SN6.1   60              20-120       -             low-rate DDoS
SN6.2   60              120-20       -             high-rate DDoS
SN7     500             100          X             DDoS
SN7.1   500             10-200       -             low-rate DDoS
SN7.2   500             100-40       -             high-rate DDoS
SN7.3   500             30-50        -             low-rate DDoS

works detectors. For the same scenarios, the FN met-

ric remains low, presenting an average value of 0.02%

and a maximum one of 0.85%. Generally, the best results in the DoS case are obtained by the J48 and Random Forest classifiers. The results also indicate that as the attack traffic volume increases, the FP and FN rates decrease. For instance, in SN3.1 and SN4.1 the FP metric decreases significantly when compared to the first three subscenarios, namely SN1.1-SN1.3.
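The FP and FN percentages discussed above can be derived from per-message labels as sketched below. The paper does not spell out the denominators, so this example assumes the common definitions FP / (FP + TN) and FN / (FN + TP); the function name and the toy labels are illustrative.

```python
def false_alarm_rates(actual, predicted, positive="attack"):
    """Return (FP %, FN %) for two equally long sequences of class labels."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1  # attack message classified as normal
        elif p == positive:
            fp += 1  # normal message classified as attack
        else:
            tn += 1
    # Assumed definitions: FP over all normal messages, FN over all attacks.
    fp_rate = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    fn_rate = 100.0 * fn / (fn + tp) if (fn + tp) else 0.0
    return fp_rate, fn_rate


actual = ["normal"] * 8 + ["attack"] * 2
predicted = ["normal"] * 7 + ["attack"] + ["attack", "normal"]
fp_rate, fn_rate = false_alarm_rates(actual, predicted)
```

In this toy case one of eight normal messages is flagged (FP of 12.5%) and one of two attack messages is missed (FN of 50%).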

On the downside, the false alarms per classifier increase for scenarios SN6.1 to SN7.3, which represent the DDoS case. This is rather expected, as the occurrences per message header decrease significantly due to the multiple spoofed IPs, which in turn affect headers S3, S5 and S6 of virtually every transmitted SIP message, thus making the separation between attack and normal messages more difficult.

Among all the classiﬁers the worst results for

DDoS scenarios in terms of FP are obtained by SMO

and Naive Bayes. Note that FP percentage rates

scored in DDoS scenarios for all the algorithms are

generally considerably higher than those obtained by

the corresponding DoS ones. Taking SN6.1 for ex-

ample, FP ﬂuctuates between 0.04% and 17.7%, hav-

ing an average value of 6.86%. Similar results are

recorded for SN7.1, with FP varying between 5.2%

and 11.3%. When the attack traffic increases, i.e., when the high-rate DDoS scenarios are involved, all the results improve significantly. This is because the portion of the attack messages inside the same M increases proportionally to the rate of the

attack. For instance, for scenario SN6.2, the maxi-

mum FP value is rather negligible, equal to 0.55%,

while FN is zeroed. Similar results are obtained in

the case of the other high-rate DDoS scenario, namely

SN7.2, demonstrating a maximum FP value equal to

1%. Finally, SN7.3 corresponds to a moderate attack

rate scenario and presents similar results to the four

previously mentioned ones.
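The per-scenario FP figures quoted above can be re-derived from Table 2 with a few lines of Python. The dictionary below simply copies the FP columns for the two low-rate scenarios SN6.1 and SN7.1; the helper name `summarize` is illustrative.

```python
from statistics import mean

# FP percentages per classifier, copied from Table 2.
FP_RATES = {
    "SN6.1": {"SMO": 17.7, "Naive Bayes": 11.2, "Neural Networks": 0.04,
              "J48": 2.4, "Random Forest": 3.0},
    "SN7.1": {"SMO": 10.4, "Naive Bayes": 11.3, "Neural Networks": 5.2,
              "J48": 7.3, "Random Forest": 5.2},
}


def summarize(scenario):
    """Return (minimum, maximum, mean) of the FP rates of one scenario."""
    values = list(FP_RATES[scenario].values())
    return min(values), max(values), mean(values)


lo_61, hi_61, avg_61 = summarize("SN6.1")
lo_71, hi_71, avg_71 = summarize("SN7.1")
```

For SN6.1 the mean evaluates to about 6.87%, which agrees with the 6.86% reported in the text if that figure was truncated rather than rounded.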

Speciﬁcally for DDoS scenarios, we compare the

results scored by ML detectors against those obtained

for the same scenarios but with two other anomaly-

detection methods, namely Entropy (Shannon, 2001)

and Hellinger Distance (Nikulin, 2001; Tsiatsikas

et al., 2014). Table 3 summarizes the FP and FN

results obtained by the two aforementioned schemes.

To help the reader compare between the various algo-

rithms, the rightmost columns of the same Table con-

tain the corresponding false alarm values as scored

by the top ML-based performer. Bear in mind that

in contrast to ML techniques the training scenarios

(SN6 and SN7) used for Entropy and Hellinger Dis-

tance do not include attack trafﬁc. This is sensible

because non-machine learning approaches rely on de-

viations between the legitimate messages in order to

compute the corresponding thresholds. If an exam-

ined message exceeds the predeﬁned threshold, then

the message is classiﬁed as abnormal.
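For reference, the two statistical measures named above can be computed over header occurrence counts as in the following sketch. It only illustrates the textbook definitions of Shannon entropy and of the Hellinger distance between two discrete distributions; how the cited schemes derive their thresholds from legitimate traffic is not reproduced here, and the toy counters are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the distribution implied by `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c)


def hellinger_distance(p_counts, q_counts):
    """Hellinger distance between two count distributions, in [0, 1]."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    keys = set(p_counts) | set(q_counts)
    squared = sum(
        (math.sqrt(p_counts.get(k, 0) / p_total)
         - math.sqrt(q_counts.get(k, 0) / q_total)) ** 2
        for k in keys
    )
    return math.sqrt(squared / 2)


# Toy header distributions: balanced traffic versus a single repeated value.
normal = Counter({"a": 5, "b": 5})
flood = Counter({"a": 10})

h_normal = shannon_entropy(normal)
h_flood = shannon_entropy(flood)
d_same = hellinger_distance(normal, normal)
d_shift = hellinger_distance(normal, flood)
```

A flood dominated by one header value drives the entropy toward zero and pushes the Hellinger distance from the normal profile away from zero, which is the kind of deviation a threshold-based detector looks for.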

We can safely argue that the non-machine learn-

ing schemes score worse results in comparison with

ML-based ones. More precisely, in the case of the Entropy metric and for all the five DDoS scenarios, the FP rate reaches a maximum value of 18.1%, while FN varies between 5.41% and 43.5%, scoring exceedingly high values for all the scenarios but one. Further, the FP for Hellinger Distance fluctuates between 1.8% and 36%.

The maximum FN value for the two aforementioned

methods is the same, equal to 43.1% perceived in both

cases for scenario SN6.2.

To sum up, the results obtained from Table 3

imply that ML-based detectors outperform the non-

machine learning ones especially in terms of FN, for

all DDoS incidents. In fact, the same category of detectors is overall competitive, presenting high accuracy in DoS scenarios as well. This is because these

schemes learn from a mixed trafﬁc including both

normal and attack messages, and thus it is easier for

them to separate between the two classes, even with

slight differences in header occurrences.

In general, anomaly-detection schemes must cope

with a number of issues (Gates and Taylor, 2007):

(i) A considerable number of false alarms (especially

false positives) is normally expected by most classi-

ﬁers. In our case, this statement seems to be con-

ﬁrmed in its entirety for the Entropy and Hellinger

Distance metrics. For the ML ones, we can assert

BattlingAgainstDDoSinSIP-IsMachineLearning-basedDetectionanEffectiveWeapon?

305

that the same statement stands half-true for FP, and

false for FN. Speciﬁcally, ML-based detection largely

fails in the case of low-rate DDoS (except for Neu-

ral Networks, and partially for Decision Trees), but it

is effective across all algorithms for high-rate DDoS.

This however hardly comes as a surprise as low-rate

attacks are generally much harder to detect. (ii) Ac-

quiring attack-free data for training may be a prob-

lem. In our case, this point can be dealt with if a VoIP

billing system is in place. This will allow the correct

labeling of each message because these logs are sup-

posed to be accurate and valid. (iii) Smart aggressors

may try to elude detection by increasingly teaching

a system to identify intrusive activity as legitimate.

To tackle this third point, one can vary M based on mid- or long-term statistical observations regarding SIP traffic.

A last point to be emphasized is that, in terms of complexity, ML-based classifiers require a varying and usually significant amount of time to build a model from the given training set. Note that this time does not include that needed to generate an .arff data file from the given log file. For instance, for SN4 the training process takes between 0.01 and 154.95 secs across the classifiers when fed with a file containing 261k records of SIP messages.

5 RELATED WORK

In this Section, we discuss the related work and more specifically contributions that examine the applicability of ML-driven techniques in detecting security incidents in VoIP services employing SIP or other similar signaling protocols.

The work in (Akbar and Farooq, 2009) proposes

the use of ML techniques to detect ﬂooding attacks

against SIP-based services. The authors build their model to consider only the first line of an SIP message (S1 in our case), ignoring the rest of the headers. They analyse the role of several classifiers and their effectiveness in detecting SIP flooding attacks. They assert that the false alarms produced are negligible. However, they take into account only DoS events and they create their datasets by artificially injecting the simulated attack traffic into the normal one. In contrast, we use realistically simulated attacks with a rich variety of call rates and different configurations in the number of users.

The authors in (Akbar and Farooq, 2014), (Nas-

sar et al., 2008) present two rather similar methodolo-

gies for protecting VoIP services against ﬂooding and

Spam over Internet Telephony (SPIT) attacks. For the

ﬁrst one, the authors introduce a real-time mechanism

containing a feature computation module that extracts

a set of spatial (changes in IDs or IP address) and tem-

poral (call ratio) features from SIP packets. The gen-

erated vectors of features are fed to Naive Bayes and

J48 classiﬁers. As in (Akbar and Farooq, 2009), for

creating their training dataset, the authors inject the

attacks into the normal trafﬁc.

The work in (Nassar et al., 2008) proposes a real-

time monitoring system to detect abnormal SIP mes-

sages based on SVM classiﬁer. The authors make use

of 38 dissimilar features aiming to detect SPIT and

ﬂooding assaults. These features are extracted by di-

viding SIP signalling into a number of small portions.

A major difference from our work lies in the excessive number of features they use, which in turn may cause ambiguity in the classification process. Also, the authors concentrate solely on SVM, thus leaving aside several other ML detectors.

Lastly, the authors in (Bouzida and Mangin, 2008)

introduce another framework for detecting anoma-

lies in SIP. Their proposal capitalizes on the decision

tree classiﬁer, focusing on resource ﬂooding attacks

in general and password guessing ones in particular.

They construct a model based on SIP attributes in-

cluded in <To>, <From>, and <Username> head-

ers. A detection accuracy of over 99% is reported.

Nevertheless, this result refers to DoS incidents, while

DDoS are left unaddressed.

6 CONCLUSIONS

In network intrusion detection, a typical method for

exposing attacks is by tracking the network activity

for any anomaly. That is, any discrepancy from a previously learned normal profile is identified as suspicious. This procedure is usually done using methods borrowed from the machine learning realm. So far, this potential has been examined in the literature to a great extent. However, as discussed in Section 5, in the case of VoIP in general and SIP in particular, works on this topic are not only scarce but also incomplete. To fill this striking gap, in this paper we try to better assess the power of ML-based techniques to identify (D)DoS incidents that capitalize on the use of SIP signaling. We consider 5 different popular ML detectors and a plethora of realistically simulated SIP traffic scenarios representing different flavors of (D)DoS. The results indicate that specific classifiers present high accuracy even in cases of low-rate DoS attacks. The best results for DDoS are obtained by the classifier introducing the maximum overhead, and thus accuracy may come at a hefty price.

To grab a better understanding of the effectiveness of

SECRYPT2015-InternationalConferenceonSecurityandCryptography

306

Table 2: Summary of results for all the scenarios (The best performer per scenario in terms of FP is in bold).

All FP and FN values are percentages. Total Rec. and Attack Rec. give the traffic in calls. NB: Naive Bayes, NN: Neural Networks, J48: Decision Trees (J48), RF: Random Forest.

Scen.   Total Rec.  Attack Rec.  SMO FP  SMO FN  NB FP   NB FN   NN FP   NN FN   J48 FP  J48 FN  RF FP   RF FN
SN1.1   11.3k       9.7k         2.1     0       0.3     0       2       0       0       0       0       0
SN1.2   14k         12.3k        1.8     0       0.15    0       1.7     0       0       0       0       0
SN1.3   15.4k       11.3k        3.7     0       0.24    0       3.7     0       0       0       0       0
SN2.1   12k         7.9k         0.01    0       0.25    0       0       0       0       0       0       0
SN2.2   13k         9.2k         0.06    0       0.28    0       0       0       0       0       0       0
SN2.3   24.5k       22.8k        0       0       0.11    0       0       0       0       0       0       0
SN3.1   667k        568k         0.09    0.04    0.02    0.85    0.3     0.01    0       0.01    0.04    0.01
SN4.1   178k        168k         0       0.01    0.01    0.01    0       0       0       0       0.02    0
SN5.1   262k        200k         0.02    0.05    0.05    0.08    0.4     0.01    0.01    0.01    0.06    0.01
SN5.2   667k        611k         0.02    0.01    0.09    0.01    0.28    0.01    0.03    0.01    0.08    0.01
SN6.1   175k        23k          17.7    0.01    11.2    0       0.04    0       2.4     0       3       0
SN6.2   114k        50k          0.18    0       0.55    0       0.01    0       0       0       0       0
SN7.1   203k        11k          10.4    0       11.3    0       5.2     0       7.3     0       5.2     0
SN7.2   144k        50k          0.51    0       1       0       0.25    0       0.27    0       0.25    0
SN7.3   128k        33k          0.78    0       0.91    0       0.25    0       0.31    0       0.24    0

Table 3: Summary of evaluation metrics for Statistical Schemes in DDoS scenarios (M = 1,000).

                  Entropy         Hellinger Distance   ML Techniques (Top performer)
SN      Low-rate  FP %    FN %    FP %    FN %         FP %    FN %
SN6.1   X         0       13.3    36      0.01         0.04    0
SN6.2   -         0.97    43.5    1.8     0            0       0
SN7.1   X         4.4     5.41    8       5.41         5.2     0
SN7.2   -         18.1    34.5    3.38    0            0.25    0
SN7.3   X         0       25.7    2.49    5.45         0.24    0

this kind of detection, we compare the obtained results against those generated by two other anomaly-based methods, namely Entropy and Hellinger Distance. From this comparison one can safely argue that ML techniques appreciably surpass non-machine learning ones in terms of FN and, up to a certain extent, in terms of FP.

From the discussion given in the results subsec-

tion, one can mark down some directions for future

work. The ﬁrst one has to do with the extension of

this work to embrace real-time detection of (D)DoS

incidents using the same techniques. A second one involves extensive experimentation with the M parameter in an effort to better assess its overall effect on the detection process. The last one pertains to the evaluation of more advanced classifiers regarding their ability to cope with DDoS attacks in VoIP ecosystems.

ACKNOWLEDGEMENTS

This paper is part of the 5179 (SCYPE) research project, implemented within the context of the Greek Ministry of Development-General Secretariat of Research and Technology funded program Excellence II / Aristeia II, co-financed by the European Union/European Social Fund - Operational program Education and Life-long Learning and National funds.


SECRYPT2015-InternationalConferenceonSecurityandCryptography

308