Canopy: A Learning-based Approach for Automatic Low-and-Slow DDoS Mitigation

Lucas Cadalzo, Christopher H. Todd, Banjo Obayomi, W. Brad Moore and Anthony C. Wong

Two Six Labs, Arlington, VA, U.S.A.

Keywords:

Network Defense, Distributed Denial of Service, LSDDoS, Machine Learning.

Abstract:

In a low-and-slow distributed denial-of-service (LSDDoS) attack, an adversary attempts to degrade the server

with low-bandwidth requests specially crafted to slowly transmit data, consuming an inordinate amount of

the server’s resources. This paper proposes Canopy, a novel approach for detecting LSDDoS attacks by

applying machine learning techniques to extract meaning from observed patterns of TCP state transitions.

While existing works have presented techniques that successfully mitigate different examples of LSDDoS

attacks, Canopy has uniquely shown the ability to mitigate a diverse set of LSDDoS attacks, including never-

before-seen attacks, all while maintaining a low false positive rate. Canopy is able to detect and mitigate

low-and-slow attacks accurately and quickly: our tests ﬁnd that attacks are identiﬁed during 100% of test

runs within 650 milliseconds. Server performance is restored quickly: in our experimental testbed, we ﬁnd

that clients’ experience is restored to normal within 7.5 seconds. During active attack mitigation, which only

occurs during server performance degradation indicative of an attack, Canopy exhibits minimal erroneous

mitigative action applied to benign clients as it achieves a precision of 99%. Finally, we show that Canopy’s

capabilities generalize well to LSDDoS attacks not included in its training dataset, identifying never-before-

seen attacks within 750 milliseconds.

1 INTRODUCTION

Distributed denial-of-service (DDoS) attacks remain

a pervasive cybersecurity threat with impacts that vary

from minor nuisances, e.g. knocking rivals ofﬂine in

online competitive games (Plante, 2015), to national

security, as when the nation-state of Georgia’s Inter-

net was disrupted ahead of a Russian land invasion in

2008 (Markoff, 2008). DDoS attacks are a perennial

and widespread problem: in a seminal paper (Moore

et al., 2006) on the enumeration of DDoS attacks,

Moore et al. described the observation of 68,000 such attacks between 2001 and 2004. The NetScout Threat

Intelligence Report for the second half of 2018 (Modi,

2018) estimated monthly DDoS attacks numbering in

the hundreds of thousands.

Traditional DDoS attacks attempt to overwhelm a

service by transmitting such a large volume of trafﬁc

that the server is unable to handle all of the requests

it receives, including those of legitimate clients. If

the attacker is able to generate more trafﬁc than the

service can handle, the service will be forced to drop

requests.

Today, various commercial services such as Aka-

mai and CloudFlare are effective in mitigating these

kinds of attacks. In 2016, a world-record-setting, 620-

Gbps DDoS attack was mounted against KrebsOnSe-

curity.com; however, this attack was unsuccessful, ab-

sorbed by Akamai’s DDoS mitigation service (Krebs,

2016). Volumetric attacks are also quite expensive

to launch. A recent study found that a small, 1,000-

machine DDoS attack costs on the order of $25 per

hour (Makrushin, 2013), an attack that would be all-

but-unnoticed by a service with DDoS protection.

In addition to their high cost, volumetric attacks

are also relatively easy to detect due to the abnormal

increase in trafﬁc volume they entail. In comparison,

low-volume attacks send smaller amounts of data and

can more easily evade detection systems.

Low-volume or low-rate attacks stealthily degrade

server performance through cleverly crafted transmis-

sions of data. While these terms encompass a wide

body of DDoS approaches, our work focuses on a

subset of this category called low-and-slow attacks.

These attacks slowly send small streams of data that

keep connections alive for long periods of time, ty-

ing up server resources throughout the process. There


Cadalzo, L., Todd, C., Obayomi, B., Moore, W. and Wong, A.

Canopy: A Learning-based Approach for Automatic Low-and-Slow DDoS Mitigation.

DOI: 10.5220/0010192303560367

In Proceedings of the 7th International Conference on Information Systems Security and Privacy (ICISSP 2021), pages 356-367

ISBN: 978-989-758-491-6

are different ways in which attackers can utilize this

kind of trafﬁc to create a denial-of-service condi-

tion. At the application layer, they can exhaust web

servers through specially crafted HTTP requests. At

the transport layer, attackers can exploit vulnerabili-

ties in the TCP stack. We will detail the scope of the

low-and-slow attacks we employ in Section 3.1.2.

The relatively low resource requirement for

mounting a low-and-slow attack has two important

consequences. First, the potential for abuse is much

higher as the barrier to launching an attack is reduced.

Additionally, these attacks are more difﬁcult to de-

tect since they are not characterized by large bursts

of trafﬁc. LSDDoS attacks not only expose vulner-

able Internet services to lower-resourced adversaries,

but also present a distinct advantage for adversaries

against whom defenders were previously on equal

footing.

Canopy is designed to protect services from such

attacks, both detecting and, importantly, mitigating

LSDDoS adversaries in real-time. We present what

we believe is the ﬁrst application-agnostic defense for

mitigating a diverse set of low-and-slow attacks, in-

cluding attacks not represented in the defense’s train-

ing dataset, or signature database.

2 BACKGROUND

Denial-of-service attacks have been a persistent threat

in the computer security space for decades; as such,

a large volume of research toward mitigating their

threat has developed. Denial-of-service attacks whose

efﬁcacy is based on raw volume of data have estab-

lished effective methods of mitigation (Tripathi and

Mehtre, 2013; Specht and Lee, 2003). The cost of

these methods generally scales with the size of the at-

tack, though research (Fayaz et al., 2015) into making

these methods more efﬁcient continues.

The focus of this paper, however, is not on these

“brute-force” denial-of-service attacks. We also seek to distinguish our work from research on low-volume attacks that induce congestion through periodic bursts or “pulses” of traffic (Kuzmanovic and Knightly,

2003; Zhang et al., 2012; Zhou et al., 2017). Our

work pertains speciﬁcally to low-and-slow attacks,

whose efficacy stems from requests crafted to exhaust their target’s resources through low-bandwidth,

slow transmissions of data. Despite progress made

in the research community, defense against low-and-

slow attacks remains an open problem. This section

reviews LSDDoS research within the context of these

key characteristics of an effective solution:

1. Operates in real-time

2. Is not tied to a speciﬁc application

3. Can detect a variety of low-and-slow attacks

4. Can detect low-and-slow attacks not included in the signature database

Real-time Detection. Considering that the ultimate

goal in DDoS detection is the mitigation of attacks, it

is imperative that a viable detection system operates

in real-time. Siracusano et al. (Siracusano et al., 2018)

extract TCP statistics for benign and malicious ﬂows,

including multiple low-and-slow attacks: slowread,

slowheaders, and slowbody. The experiments reveal

that, given this set of features, decision trees and KNN

algorithms are able to achieve very high accuracy in

detecting malicious ﬂows. The fact that these attacks,

which exploit vulnerabilities at the application layer,

yield highly separable transport-layer features is a no-

table ﬁnding that is corroborated by our work. The

real-world applicability of this approach is limited,

however, by the fact that features are obtained through

post-processing steps following the capture of net-

work trafﬁc, rather than in real-time.

Sharafaldin et al. (Sharafaldin et al., 2018) introduced the CICIDS2017 dataset, a widely-used public

source of LSDDoS trafﬁc. The authors additionally

trained various classiﬁers on features extracted from

their experiments in a post-processing fashion. While

the high accuracy of these models provides evidence

as to possible features to include in a mitigative de-

fense, the ofﬂine manner in which features were ex-

tracted limits the direct utility of such a solution. The

dataset is also limited in the scope of LSDDoS attacks

it contains, as well as by the fact that attackers and be-

nign clients target different services.

Application Agnosticism. An additional considera-

tion when assessing the utility of LSDDoS solutions

is the degree of versatility with regard to the applica-

tions it protects. FINELAME (Demoulin et al., 2019)

detects low-and-slow attacks, such as Slowloris,

through the use of probes that identify anomalous re-

source utilization. This approach, however, requires

host-based instrumentation of executable code upon

installation, as well as componentization of the pro-

tected application. FINELAME’s performance is also

dependent on a K-means parameter that must be adjusted to the number of request types in the application. We seek to design a

system that is easily portable to new environments

and not tied to any one application.

Robustness to Attack Diversity. An important at-

tribute of any LSDDoS defense is the ability to detect

a variety of low-and-slow attacks that differ in nature.

Attacks can be conducted at both the transport layer


and, more commonly, the application layer. Within

the scope of application layer LSDDoS, attackers can

exploit the HTTP protocol through varying means in-

cluding slowly reading server responses, slowly send-

ing packets, opening many partial connections, and

requesting large numbers of overlapping byte ranges.

Smart Detection (Lima Filho et al., 2019) presents

a detection system that, from pcap ﬁles, extracts

statistics on IP lengths, source and destination ports,

as well as TCP ﬂags. These features are then input

into a random forest model. The model achieves high

accuracy on a dataset that consists primarily of ﬂood-

ing attacks, but also includes HTTP-based LSDDoS

attacks. It remains inconclusive, though, whether

Smart Detection can defend against LSDDoS attacks

that are not HTTP-based, such as Sockstress. Addi-

tionally, since the vast majority of instances in the

dataset are from ﬂooding attacks, its performance on

LSDDoS is difﬁcult to evaluate absent a breakdown

of results by attack type.

Detection of Unknown Attacks. Methods for DDoS

detection can broadly be grouped into two categories:

anomaly-based and signature-based. Anomaly-based

systems identify attacks by developing a proﬁle of be-

nign behavior and ﬂagging trafﬁc that deviates too

signiﬁcantly from this proﬁle (Demoulin et al., 2019;

Ranjan et al., 2008; Wang et al., 2017). While these

systems are thus naturally suited to detect new or un-

known attacks, they often suffer from high false posi-

tive rates.

Signature-based methods entail building a dataset

of benign and malicious trafﬁc. These systems can

then identify attacks assuming the trafﬁc exhibits sim-

ilar characteristics as the previously collected mali-

cious trafﬁc. The downside of this approach is that

this assumption may not hold true for attacks not in-

cluded in the signature dataset (i.e. new or unknown

attacks). Given the constantly-evolving nature of LS-

DDoS threats, it is important that a real-world defense

can detect new or unknown attacks. Absent this capa-

bility, defenses would need to undergo expensive cy-

cles of data collection and model retraining whenever

new attacks surface.

Researchers can assess the ability of signature-

based methods to identify new or unknown attacks by

omitting certain attacks from the signature database.

Recent work follows this methodology and suggests

that signature-based methods have the potential to

detect DDoS attacks not included in the collected

dataset (Demoulin et al., 2018; Saied et al., 2016). We

seek to determine to what extent this holds true for

LSDDoS attacks speciﬁcally, including cases where

the omitted attack differs substantially in nature from

those in the signature database.

3 CONTRIBUTION AND METHODOLOGY

This section describes our LSDDoS defense, Canopy,

in detail. First, we introduce our custom testbed, in-

cluding the myriad components and parameters that

allow us to experiment with different test scenarios.

We then detail our novel data featurization process

that enables us to extract rich information from TCP

state sensors in real-time. Next, we discuss how our

detection and mitigation engines work to protect the

system under test. In Section 3.4, we describe the

machine learning techniques we employ to make pre-

dictions based on the data we generate. Finally, in

Section 3.5, we discuss our experimental setup, de-

tailing how test runs are conﬁgured, as well as how

data is selected for training and validation.

3.1 Architecture

To evaluate our defenses, we test attacks on a custom testbed with a virtual network connecting containerized instances of our server, clients, attackers, and other necessary utility applications. We refer to instances of a test as test runs, which are configured in test plans that include the following parameters: number and type of clients, type and severity of attack (e.g. number of threads, size of packets, requests per second), the event timing of the run, a victim system-under-test, and whether mitigation is enabled. Containers are provisioned by Mesosphere DC/OS (an open-source, distributed operating system based on the Apache Mesos distributed systems kernel that provides networking, service discovery, and resource management) through a custom test orchestration suite that provides scheduling of the various containers during each test run, job queue management, and experiment creation via our custom management UI. We use three machines for testing, each with a 28-core Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz and 128GB of RAM.

The containers we use in experimental test runs are described below. Test runs consist of three main entities: the system subjected to attack, the attackers, and legitimate clients attempting to obtain service from that system.

3.1.1 System-Under-Test

For our work, we choose Apache HTTP Server with WordPress as our System-Under-Test (SUT). We choose this stack for two reasons: first, it is widely used on the web, ensuring our system has wide applicability even before considering the agnosticism of the techniques used. Second, there is a substantial set of known LSDDoS attack implementations for this stack. We specifically use Apache 2.2.11 for its vulnerability to the set of attacks we wish to test. SUTs are configured with 8 CPUs and 16GB of RAM. SUTs are based on stock Ubuntu 14.04 with minimal additions other than the aforementioned server stack and a lightweight TCP state sensor, which uses the conntrack user space daemon to capture TCP connection state changes.

Figure 1: Canopy system architecture.

Note that while our evaluation focuses on a par-

ticular web server, no aspect of Canopy is tailored

to this particular application. Furthermore, we em-

phasize that while the version of Apache we use has

been superseded, and the attacks we test on largely

mitigated, our techniques are not speciﬁc to this ap-

plication, version, or these attacks. Rather, we show

that Canopy is capable of detection and mitigation of

a given attack despite not having seen it before.

3.1.2 Attack and Client Containers

During simulation, attacks are carried out by one of six types of test containers, each with eight predetermined levels of intensity. The set of implemented attack images consists of Slowloris (Valialkin, 2014), R U Dead Yet (Shekyan, 2011), Slow Read (Shekyan, 2011), Apache Killer (Stampar, 2011), Sockstress (Hornby, 2012), and a second implementation of Slowloris (Shekyan, 2011). The selection of attacks warrants careful consideration, as we seek to employ attacks that differ in the means by which they induce congestion. Sockstress attempts to exhaust the SUT’s resources by means of TCP protocol exploitation, while the other attacks accomplish this end by exploiting different aspects of the HTTP protocol. As we will show, TCP state transition patterns are affected even in the case of an attack executed at the application layer, for example, by a TCP connection being held longer in the ESTABLISHED TCP state during a slow HTTP request. The differing targets of these attacks help us ensure that Canopy’s usefulness is not dependent on the protocol being exploited.

Note: conntrack is a set of user space tools allowing interaction with the kernel connection tracking system.

Figure 2: Transforming conntrack sensor data into TCP state vectors.

Figure 3: Imputing missing time windows using a “sample and hold” approach.

In our simulations, synthetic legitimate-user traf-

ﬁc is generated by HTTP request agents that mimic

human behavior. These customized agents, based on

Locust (Byström et al., 2019), continuously browse

the SUT, traversing randomly from link to link within

the hosted WordPress site. Locust allows us to easily

scale the number of clients per container and their rate

of activity. We also use Noisy (Hury, 2019), a website

crawler whose intended purpose is to add “noise” to

mask a user’s web browsing patterns. While these two

methods of trafﬁc generation are similar in intent, im-

plementation details can cause differences in the TCP

state transition patterns they exhibit, hence the inclu-

sion of both.

3.2 TCP State Data Featurization

This section describes the steps we take to convert raw

TCP sensor data into a form that can be input into a

detection model. Figure 1 shows these steps in re-

lation to the overall system. The key components of

this featurization pipeline are our TCP state sensor,

our data transformation process, and ﬁnally our im-

putation engine.

3.2.1 TCP State Transition Sensor

As mentioned in Section 3.1.1 and depicted in Fig-

ure 1, data is captured from the TCP state sensor co-

located with the SUT. This sensor utilizes conntrack

Canopy: A Learning-based Approach for Automatic Low-and-Slow DDoS Mitigation

359

on the host, identifying TCP state transitions in con-

nections to the SUT. These transition messages are

sent to an Apache Kafka cluster, leading to the next

part of our pipeline: the data transformation step.

3.2.2 Transformation

At this step of the process, the various TCP state transition messages sent by the sensor are aggregated into summary statistics over 100-millisecond time windows for each IP. This component of our system gathers all the messages for an IP in a time window, and then outputs a vector containing the percentage of the IP’s open connections in each TCP state. For each IP, we append 10 consecutive 100-millisecond time window vectors to form an array that is input into a classifier. Figure 2 provides an example of how this process works. While the window size and number of windows that form an array are tunable parameters, we selected these values based on the intuition that the malicious behavior of an LSDDoS attack will manifest itself in TCP states in less than a full second (ten 100-millisecond time windows). In the future we aim to empirically test the performance exhibited when varying these parameters, as discussed in Section 5.4 Limitations and Future Work.
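The aggregation step described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Canopy's actual implementation: the event tuple layout, the ordering in `TCP_STATES`, and the function names are all hypothetical, and the vector holds fractions rather than percentages. For simplicity, connections that have reached a closed state are still counted.

```python
from collections import defaultdict

# Assumed fixed ordering of conntrack TCP state names (hypothetical choice).
TCP_STATES = ["SYN_SENT", "SYN_RECV", "ESTABLISHED", "FIN_WAIT",
              "CLOSE_WAIT", "LAST_ACK", "TIME_WAIT", "CLOSE"]
WINDOW_MS = 100  # window length taken from the text


def state_vector(conn_states):
    """Fraction of one IP's tracked connections in each TCP state."""
    total = len(conn_states)
    if total == 0:
        return [0.0] * len(TCP_STATES)
    counts = defaultdict(int)
    for state in conn_states.values():
        counts[state] += 1
    return [counts[s] / total for s in TCP_STATES]


def window_vectors(events):
    """Aggregate time-ordered (timestamp_ms, ip, conn_id, new_state) events.

    Returns {ip: {window_index: vector}}. Each vector reflects the IP's
    connection states after the last event seen in that window. Windows
    with no events for an IP are simply absent from the result.
    """
    tracked = defaultdict(dict)   # ip -> {conn_id: current state}
    result = defaultdict(dict)    # ip -> {window index: state vector}
    for ts, ip, conn_id, state in events:
        tracked[ip][conn_id] = state
        result[ip][ts // WINDOW_MS] = state_vector(tracked[ip])
    return result


if __name__ == "__main__":
    demo = [
        (10, "10.0.0.1", "a", "SYN_RECV"),
        (40, "10.0.0.1", "a", "ESTABLISHED"),
        (130, "10.0.0.1", "b", "ESTABLISHED"),
    ]
    for window, vec in sorted(window_vectors(demo)["10.0.0.1"].items()):
        print(window, vec)
```

Windows that are absent from the result are the ones the imputation step has to fill before ten consecutive vectors can be stacked into a classifier input.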

3.2.3 Imputation

While we mentioned in the previous section that 10

consecutive time window vectors are appended for an

IP to form an input array, it is often the case that the

transformation engine does not receive data from the

TCP state sensor for a given IP during a time win-

dow. This occurs whenever an IP does not experience

any TCP state changes during a time window. In this

event, the missing window of data is imputed by ﬁll-

ing in the most recent TCP state vector outputted by

the transformation engine for the IP. This “sample and

hold” approach is depicted in Figure 3.
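A minimal sketch of sample-and-hold imputation follows. The function name, the argument layout, and the choice to start from an all-zero vector when no earlier observation exists are assumptions made for illustration; the paper does not specify how the very first window is handled.

```python
def sample_and_hold(vectors_by_window, first_window, last_window, n_states):
    """Fill windows that have no data with the most recent observed vector.

    vectors_by_window: {window_index: state vector} for a single IP.
    Returns one vector per window in [first_window, last_window].
    """
    # Assumption: before any observation, hold an all-zero vector.
    last_seen = [0.0] * n_states
    filled = []
    for window in range(first_window, last_window + 1):
        if window in vectors_by_window:
            last_seen = vectors_by_window[window]
        filled.append(list(last_seen))  # copy so rows stay independent
    return filled


if __name__ == "__main__":
    observed = {0: [1.0, 0.0], 3: [0.5, 0.5]}
    for row in sample_and_hold(observed, 0, 4, n_states=2):
        print(row)
```

With observations only in windows 0 and 3, windows 1 and 2 repeat the vector from window 0 and window 4 repeats the vector from window 3.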

During experimental runs used to generate training data, the imputed arrays are serialized and stored in MinIO (an open-source cloud data storage service released under Apache License v2). These arrays can later be converted from serialized format into different forms necessary for training models.

3.3 Inference and Mitigation Engines

Once the data exists in a form suitable for input into a model, the next step is to classify examples as benign or malicious. These predictions are fed into our mitigation engine, which takes the steps necessary to protect the SUT. Figure 1 again shows how these processes relate to Canopy as a whole.

With mitigation enabled, imputed data is passed

along in real-time to an inference engine that makes a

prediction on each array by running the input through

a loaded model. The ﬁrst time an IP is seen by the in-

ference engine, its predicted class (benign or attack)

is sent to the mitigation engine. From that point for-

ward, a prediction is sent to the mitigation engine only

if the predicted class differs from the most recent pre-

diction for that IP.
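The forwarding rule described above (send the first prediction for an IP, then only send changes) amounts to a small per-IP cache. The class and method names below are hypothetical, and the sketch leaves out the actual message transport.

```python
class PredictionForwarder:
    """Decides whether a prediction should be sent to the mitigation engine."""

    def __init__(self):
        self._last = {}  # ip -> most recent predicted class

    def should_forward(self, ip, predicted_class):
        """True for an IP's first prediction or when its class changes."""
        if self._last.get(ip) == predicted_class:
            return False
        self._last[ip] = predicted_class
        return True


if __name__ == "__main__":
    forwarder = PredictionForwarder()
    stream = [("10.0.0.7", "benign"), ("10.0.0.7", "benign"),
              ("10.0.0.7", "attack"), ("10.0.0.7", "attack")]
    for ip, label in stream:
        print(ip, label, forwarder.should_forward(ip, label))
```

Suppressing repeated predictions keeps the message volume to the mitigation engine proportional to the number of class changes rather than to the number of classified arrays.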

The mitigation engine is co-located with the SUT

and is responsible for consuming prediction mes-

sages and mitigating attack IPs. When a new predic-

tion is received, the mitigation engine adds or drops

iptables filter rules to deny or allow traffic for the

classiﬁed IP.
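One way to picture the rule handling is the sketch below, which only builds the iptables command lines and does not execute them. The paper does not state which chain or target Canopy uses; appending a `DROP` rule to the `INPUT` chain, the class name `RuleManager`, and the `"attack"`/`"benign"` labels are assumptions of this sketch.

```python
import ipaddress


class RuleManager:
    """Builds iptables commands that block or unblock a classified IP."""

    def __init__(self):
        self.blocked = set()  # addresses that currently have a DROP rule

    def commands_for(self, ip, predicted_class):
        """Return a list of argv lists to run for this prediction."""
        # Validate the address; raises ValueError on malformed input.
        addr = str(ipaddress.ip_address(ip))
        if predicted_class == "attack" and addr not in self.blocked:
            self.blocked.add(addr)
            # Assumed rule shape: append a DROP rule for the source address.
            return [["iptables", "-A", "INPUT", "-s", addr, "-j", "DROP"]]
        if predicted_class == "benign" and addr in self.blocked:
            self.blocked.discard(addr)
            # Delete the matching rule to allow traffic again.
            return [["iptables", "-D", "INPUT", "-s", addr, "-j", "DROP"]]
        return []  # nothing to change


if __name__ == "__main__":
    manager = RuleManager()
    for label in ("benign", "attack", "benign"):
        print(label, manager.commands_for("10.0.0.5", label))
```

Tracking the blocked set avoids issuing a delete for a rule that was never added, which iptables would report as an error.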

3.4 Learning Methods

Our work aims to differentiate between malicious and

benign trafﬁc using a supervised learning approach,

using features extracted from the temporal patterns of

TCP state transitions. Our experiments include three

different methods of varying complexity for our clas-

siﬁcation model: temporal convolutional networks

(TCNs), decision trees, and an ensemble approach.

TCNs are a special case of convolutional neural

networks designed to learn meaningful relationships

between temporally-related features, and we include

this algorithm due to its success on tasks featuring

temporal data (Bai et al., 2018). We evaluate decision

trees because they have proven effective in LSDDoS

detection, and have even been shown to outperform

neural networks (Siracusano et al., 2018; Lima Filho

et al., 2019). The success of decision trees in our

initial experiments prompted us to also build an en-

semble of simpler classiﬁers (decision trees, random

forests, and logistic regression) in an attempt to im-

prove model robustness. To combine predictions from

the components of the ensemble, we simply average

the output probabilities.
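Averaging the output probabilities of the ensemble members can be sketched as follows. The input layout (one list of attack probabilities per model, aligned by example) and the 0.5 decision threshold are assumptions; the paper states only that the probabilities are averaged.

```python
def average_probabilities(per_model_probs):
    """Average P(attack) across models for each example.

    per_model_probs: list with one entry per model, each a list of
    probabilities for the same examples in the same order.
    """
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


def classify(per_model_probs, threshold=0.5):
    """Label each example from the averaged probability (assumed threshold)."""
    return ["attack" if p >= threshold else "benign"
            for p in average_probabilities(per_model_probs)]


if __name__ == "__main__":
    tree = [0.9, 0.2]
    forest = [0.6, 0.1]
    logistic = [0.3, 0.0]
    print(average_probabilities([tree, forest, logistic]))
    print(classify([tree, forest, logistic]))
```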

3.5 Experimental Setup

In this section, we discuss the experimental condi-

tions under which we train and test Canopy.

Test Run Configuration. As mentioned in Section 3.1, our experimental testbed suite allows for the specification of many different parameters of each test run, including attack type, intensity of attack, client type(s), and whether mitigation is enabled. For each attack and attack level, we generate test runs with only Locust clients, with only Noisy clients, and with an even mix of both. Mitigation is disabled for the generation of training data and is activated only to evaluate how well models are protecting the SUT.

Note: iptables is a user space program allowing configuration of the kernel firewall.

Global vs. N-1 and N-2 Experiments. In a global

experiment, we include data from all test run conﬁg-

urations in the training set. This means, of course,

that for any data the model is evaluated on, the model

has been trained on data generated using the same test

run conﬁguration. This experiment tests how well a

model detects known attacks.

To understand how a model would respond to un-

known attacks, we conduct two other types of experi-

ments. In an N-1 experiment, we leave one attack type

out of the training set. For example, in a Sockstress

N-1 experiment, we train the model on data from all

test run conﬁgurations except those that contain Sock-

stress attackers. This kind of experiment simulates an

unknown attack by excluding an attack (in this case,

Sockstress) from the training set, but testing against

it in the validation set. The N-2 experiments follow

the same logic, except we exclude two attack types

from the train set, testing the models’ ability to gen-

eralize given an even further limited set of attacks to

train on. These experiments demonstrate the general-

izability of Canopy, which is critically important to its

practical utility, given that new LSDDoS attack vec-

tors will continue to be discovered and abused.
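The N-1 and N-2 splits can be enumerated as in the sketch below. The attack labels in `ATTACKS` are placeholder names chosen for this example. Note that the sketch enumerates every possible hold-out combination, whereas the N-2 experiments in this paper sample five subsets.

```python
from itertools import combinations

# Placeholder attack labels for illustration only.
ATTACKS = ["slowloris", "rudy", "slowread", "apache_killer", "sockstress"]


def holdout_splits(attacks, k):
    """Yield (train_attacks, held_out_attacks) for every choice of k attacks."""
    for held_out in combinations(attacks, k):
        train = [a for a in attacks if a not in held_out]
        yield train, list(held_out)


if __name__ == "__main__":
    for train, held_out in holdout_splits(ATTACKS, 1):
        print("train on", train, "| evaluate on", held_out)
```

Each model is trained only on runs containing the `train` attacks and evaluated on runs containing the held-out ones, so the held-out attacks play the role of never-before-seen attacks.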

4 EVALUATION

In this section, we discuss our performance evaluation

of Canopy. In Section 4.1, we cover the detection

metrics used to evaluate our models and provide re-

sults for different experiments. Next, in Section 4.2,

we deﬁne the mitigation metrics used to assess our

system in a practical test setting, providing results for

how well Canopy protects the SUT.

Before diving into the speciﬁcs of detection and

mitigation metrics, we highlight two key distinctions

when understanding our results:

Per-example vs. Per-client. First, the detection metrics we will discuss in Section 4.1 show AUC, precision, and recall computed on a per-example basis. If, for example, one of 50 examples from a benign client is classified as malicious, this constitutes a per-example false positive rate of 2%. In contrast, the per-client false positive rate describes what percent of benign clients are classified as malicious at least once during the run. Using the previous example, if that benign client is the only benign client, we would have a per-client false positive rate of 100%.

Table 1: Mean detection results for global models, averaged over each attack.

Method         AUC   Precision  Recall
Decision Tree  0.99  0.99       0.95
Ensemble       0.99  0.99       0.95
TCN            0.96  0.91       0.91
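The difference between the two false positive rates can be made concrete with a short sketch. The input layout (a mapping from each benign client to its list of per-example predictions) and the function name are assumptions for illustration.

```python
def false_positive_rates(benign_predictions):
    """Compute (per_example_fpr, per_client_fpr) for benign clients.

    benign_predictions: {client_ip: [bool, ...]}, where True means the
    example was classified as malicious.
    """
    total_examples = sum(len(p) for p in benign_predictions.values())
    flagged_examples = sum(sum(p) for p in benign_predictions.values())
    flagged_clients = sum(1 for p in benign_predictions.values() if any(p))

    per_example = flagged_examples / total_examples if total_examples else 0.0
    per_client = (flagged_clients / len(benign_predictions)
                  if benign_predictions else 0.0)
    return per_example, per_client


if __name__ == "__main__":
    # One benign client, 50 examples, exactly one misclassified.
    runs = {"10.0.0.9": [True] + [False] * 49}
    per_example, per_client = false_positive_rates(runs)
    print(f"per-example FPR: {per_example:.0%}")
    print(f"per-client FPR:  {per_client:.0%}")
```

For the single-client case from the text, this yields a per-example rate of 2% and a per-client rate of 100%.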

Disabling vs. Enabling the Mitigation Engine. Sec-

ond, detection metrics (unless explicitly stated other-

wise) are computed from test runs where mitigation

is disabled and the SUT is left unprotected. This pro-

cess of evaluation is necessary to assess models be-

fore they are inserted into the mitigation engine. Im-

portantly, this means that these metrics are a function

of ﬂagged-IP examples that reach the system before,

during, and after the occurrence of an attack. In contrast, mitigation metrics are computed with mitigation enabled. Thus, IPs flagged as malicious get blocked and no longer impact the metrics computed. To understand why this difference is important, consider a model that struggles to detect the onset of an attack, but correctly identifies attackers long after they’ve had their effect. Such a model

would yield poor mitigation metrics, but its detection

metrics would be inﬂated by parts of the run not con-

taining the onset of the attack.

4.1 Detection Results

Global Results. In Table 1, we observe the results of

the global experiments. Each global model is evalu-

ated on all attacks at varying levels, and the results

are averaged over the different attacks. The results

indicate that the ensemble and decision tree perform

very similarly, both outperforming the TCN in AUC,

precision, and recall.

N-1 Results. In these experiments (results shown in Table 2), we find that the ensemble classifier stands out as the best performing method with a precision of 0.99, compared to 0.92 and 0.89 for the TCN and decision tree classifiers, respectively. While the TCN’s recall of 0.91 exceeds those of the ensemble and decision tree methods, both 0.89, this metric is less critical than precision in our context. The reasoning for this is that false positives cause benign clients to be blocked, and while false negatives may delay the mitigation of an attack, Canopy can flag the attacker at subsequent data examples it receives. Notably, the absolute decrease in AUC, precision, and recall for the decision trees in comparison to the global experiments is 0.07, 0.10, and 0.06, respectively.

Table 2: Mean detection results on a single never-before-seen attack, averaged over all N-1 models.

Method         AUC   Precision  Recall
Decision Tree  0.92  0.89       0.89
Ensemble       0.95  0.99       0.89
TCN            0.94  0.92       0.91

N-2 Results. For the N-2 experiments, we sample ﬁve

different subsets of attacks, each time omitting two

attacks from the training set. We then evaluate each

trained model on the two attacks omitted from its re-

spective training set. The results are averaged over all

ﬁve models for each method and can be found in Ta-

ble 3. The ensemble method again ﬁnishes with the

highest precision at 0.99. Its AUC of 0.96 also exceeds the TCN’s and decision tree’s AUCs of 0.95 and 0.89, respectively.

Effect of Attack Type on Performance. We evalu-

ate each global model, averaging over the three learn-

ing methods, on all attacks and observe how perfor-

mance is dependent on the attack type. These results

are depicted in Table 4. Notably, Canopy’s perfor-

mance against Sockstress is on par with the HTTP-

based attacks, exemplifying Canopy’s application-

agnosticism.

The detection results indicate that Apache Killer

is the most challenging attack for Canopy to detect.

While the recall for all other attacks ranges between

0.95 and 0.99, Apache Killer is identified with a recall of 0.77. Though our detection metrics indicate

weak performance against Apache Killer with miti-

gation disabled, in operation with mitigation active,

recall rises from 0.77 to 0.98. This observation of-

fers a key insight: Canopy is able to identify an attack

like Apache Killer as it is ramping up and mitigates

it before later stages of the attack cause confusion for

the model. This also demonstrates that Canopy can

perform signiﬁcantly better during practical operation

than what may be reﬂected in detection metrics absent

mitigating action. This leads us to our mitigation met-

rics, which will be discussed in the following section.

4.2 Mitigation

To evaluate Canopy’s mitigation capability, we mount an Apache Killer attack campaign against a specified SUT and measure client experience throughout the test run. We focus on the Apache Killer attack as it represents a likely lower bound for Canopy’s mitigative performance, given the results in Table 4. We examine global and N-1 models in mitigation, for each learning method. In this section, we first define our mitigation policies. Next, we describe the metrics we use to evaluate mitigation, examine the results for our experiments, and visualize the impact Canopy has on a system.

Table 3: Mean detection results on multiple never-before-seen attacks, averaged over all N-2 models.

Method         AUC   Precision  Recall
Decision Tree  0.89  0.92       0.86
Ensemble       0.96  0.99       0.86
TCN            0.95  0.95       0.86

4.2.1 Mitigation Policies

A risk associated with Canopy and other defensive

systems is the possibility of incorrectly identifying

a client as an attacker (false positive). A fundamental design principle in developing Canopy is to favor

models and policies that would reduce false positives

and impact to benign users. One measure we take

to this end is the implementation of a classiﬁcation

strike policy; the inference engine will not take de-

fensive action unless the active model classiﬁes an IP

as an attacker for multiple consecutive examples. Fur-

thermore, Canopy forces increasing backoff on IP ad-

dresses classiﬁed as malicious. These IPs are blocked

for an increasing amount of time for every such clas-

siﬁcation by the model. This ensures that attackers

will continue to be mitigated for long periods of time,

while client IPs that are misclassiﬁed as an attack will

eventually return to normal response rates.
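The strike and backoff policies described above can be sketched as follows. This is a minimal illustration, not Canopy's implementation: the class name, the strike threshold, the base block duration, and the doubling factor are all assumed values, and the sketch lengthens the block once per blocking event, which is one possible reading of "increasing backoff".

```python
from dataclasses import dataclass


@dataclass
class IPState:
    strikes: int = 0            # consecutive "attacker" classifications
    offenses: int = 0           # number of times this IP has been blocked
    blocked_until: float = 0.0  # time (seconds) at which the block expires


class StrikeBackoffPolicy:
    """Hypothetical strike-and-backoff policy; all thresholds are assumptions."""

    def __init__(self, strikes_required=3, base_block_s=1.0, factor=2.0):
        self.strikes_required = strikes_required
        self.base_block_s = base_block_s
        self.factor = factor
        self.state = {}

    def observe(self, ip, is_attack, now):
        """Record one classification; return True if `ip` is blocked at `now`."""
        st = self.state.setdefault(ip, IPState())
        if now < st.blocked_until:
            return True
        if not is_attack:
            st.strikes = 0  # a benign example breaks the consecutive run
            return False
        st.strikes += 1
        if st.strikes < self.strikes_required:
            return False
        # Enough consecutive strikes: block, and block longer on each repeat.
        duration = self.base_block_s * self.factor ** st.offenses
        st.offenses += 1
        st.strikes = 0
        st.blocked_until = now + duration
        return True
```

With this design, a benign IP that is occasionally misclassified rarely accumulates enough consecutive strikes to be blocked, and if it is blocked, the first block is the shortest one; a persistent attacker is blocked for progressively longer periods.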

4.2.2 Description of Mitigation Metrics Used

We have presented Canopy as a practical lightweight

LSDDoS defense that is server application-agnostic

and easily deployable. As such, we evaluate not

only our models’ ability to classify IPs in mitigation-

disabled runs, but also the practical measurable ef-

fects of Canopy’s ability to mitigate attacks in defense

of a server. Speciﬁcally, we measure:

• the time to detect an attack campaign

• the time to mitigate the effects of an attack, with

benign clients’ experience returning to normal

• the per-client false positive rate

Time to Detect. We measure this value as the time

between the ﬁrst TCP state transition reported from

an attacker IP and the time the model identiﬁes an

attacker IP.

Table 4: Mean detection results for global models on each attack, averaged over each method.

Attack              AUC    Precision   Recall
Goloris/Slowloris   0.99   0.95        0.95
R.U.D.Y.            0.99   0.96        0.99
Slowread            0.99   0.97        0.99
Apache Killer       0.93   0.99        0.77
Sockstress          0.99   0.99        0.98

ICISSP 2021 - 7th International Conference on Information Systems Security and Privacy


Time to Mitigate. We measure the impact of an at-

tack by the time it takes for legitimate clients to inter-

act with the server. As our server is running a Word-

Press website, we measure this interaction by the time

it takes for the client to be served all content on a page

from the time that page was requested. We refer to

this measurement as Round-Trip Time, or RTT. When

we refer to a percentile RTT, we are referring to the

percentage of client requests in which the calculated

RTT is below a specified value. We consider an attack to be mitigated when the RTT for client requests returns to normal levels. For each test run, a baseline

RTT is measured to determine the expected RTT that

a client will encounter when using the SUT during

non-attack conditions. A client’s experienced RTT is

considered to have returned to normal levels when it is

within one standard deviation of the baseline value.
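As an illustration of this criterion, the sketch below computes a time to mitigate from a list of measured RTTs. It is a simplified reading of the definition above, not the authors' measurement code: the function name, the input layout, and the window length are assumptions, and a "client's experienced RTT" is approximated by the mean of the most recent `window` request RTTs.

```python
from statistics import mean, stdev


def time_to_mitigate(baseline_rtts, attack_rtts, attack_start, window=5):
    """Return seconds from attack start until the windowed mean RTT is normal.

    baseline_rtts: RTTs (seconds) measured before the attack.
    attack_rtts:   list of (completion_time, rtt) pairs in time order,
                   for requests completed after the attack started.
    window:        number of recent RTTs averaged (an assumed value).
    """
    base_mean = mean(baseline_rtts)
    base_std = stdev(baseline_rtts)  # computed on individual baseline RTTs
    degraded = False
    for i in range(window - 1, len(attack_rtts)):
        recent = [rtt for _, rtt in attack_rtts[i - window + 1 : i + 1]]
        normal = abs(mean(recent) - base_mean) <= base_std
        if not normal:
            degraded = True
        elif degraded:
            # First normal window after service was degraded by the attack.
            return attack_rtts[i][0] - attack_start
    return None  # service was never degraded, or never recovered
```

The function returns `None` when the windowed RTT never leaves the baseline band, so a run in which the attack had no measurable effect is not reported as an instant recovery.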

4.2.3 Mitigation Results

Time to Detect. In global experiments, Canopy’s decision tree and ensemble models detect the Apache Killer attack within 650 milliseconds. In N-1 experiments, the time to detect for these models rises slightly but remains within 750 milliseconds. The TCN models are slower, taking roughly two seconds in both settings. Table 5 shows the average time it takes for Canopy’s models to detect an attacker’s IP. These metrics show that the models are identifying patterns in how an attack communicates with a SUT early in the attack, which is critical for mitigating in a timely manner and reducing the impact on clients.

Table 5: Time to detect Apache Killer.

Experiment        Time Since Attack Campaign Start (ms)
Global DT         634
Global Ensemble   602
Global TCN        1925
N-1 DT            658
N-1 Ensemble      637
N-1 TCN           2009

Time to Mitigate. After Canopy takes mitigating action, the SUT quickly begins to recover, with client request RTTs returning to pre-attack levels. In our experiments, within 7.5 seconds, 90% of new requests are served without impact. Table 6 shows the time it takes for the average client request RTT to return to pre-attack levels for each learning method and experiment type under the Apache Killer attack.

The idea that we expect most or all clients to return to within one standard deviation of a mean RTT may raise a red flag for some readers. However, this assumption is reasonable due to the way we evaluate client RTT. The standard deviation cutoff that creates the bounds for a "normal" RTT is calculated based on individual RTTs measured before the attack. A client’s experienced RTT, used to determine whether that client is back to experiencing normal RTTs, is calculated as the mean of a window of RTTs.

Per-client False Positive Rate. Table 7 depicts false

positive rates computed on a per-client level and re-

veals that models generally have low false positive

rates. The most-performant model classiﬁes under

5% of benign clients as attacks for a short period of

time (on average 4 seconds). Considering the duration

of these clients in our experiments, they thus spend

less than 0.1% of the time erroneously mitigated.

Moreover, during normal SUT operation where no at-

tack is degrading server performance, model classiﬁ-

cations are ignored and yield no impact to the user.

Only when the SUT is being impacted by an attack

does the mitigation engine block clients.
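To make the per-client metric concrete, the following sketch computes the percentage of benign clients mitigated at least once, together with the mean time each affected client spent blocked. The function names and the event format are assumptions made for illustration, and the numbers in the usage example are invented rather than taken from our experiments.

```python
def per_client_false_positive_rate(benign_clients, mitigation_events):
    """Percentage of benign clients mitigated at least once during a run.

    benign_clients:    iterable of benign client IPs in the test run.
    mitigation_events: iterable of (ip, blocked_seconds) pairs, one per block.
    """
    benign = set(benign_clients)
    hit = {ip for ip, _ in mitigation_events if ip in benign}
    return 100.0 * len(hit) / len(benign)


def mean_blocked_seconds(benign_clients, mitigation_events):
    """Mean total blocked time per benign client that was blocked at all."""
    benign = set(benign_clients)
    totals = {}
    for ip, seconds in mitigation_events:
        if ip in benign:
            totals[ip] = totals.get(ip, 0.0) + seconds
    return sum(totals.values()) / len(totals) if totals else 0.0


# Hypothetical run: 200 benign clients, 8 of them blocked once for 4 seconds,
# plus one attacker IP whose block must not count as a false positive.
benign = [f"10.0.0.{i}" for i in range(200)]
events = [(f"10.0.0.{i}", 4.0) for i in range(8)] + [("203.0.113.9", 60.0)]
rate = per_client_false_positive_rate(benign, events)
blocked = mean_blocked_seconds(benign, events)
```

Counting clients rather than individual classifications matches the question an operator would ask: how many legitimate users were affected at all, and for how long.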

4.2.4 Visualizing the Mitigative Effect of Canopy

To show the impact of Canopy on the benign clients of

a SUT, we visualize the following two aspects of user

experience: the distribution of TCP states, as well as

client RTT. We will do so for the following three sce-

narios: no attack campaign present, attack campaign

present with no defense, and attack campaign present

with Canopy active.

No Attack Campaign Present. First, we observe a

run of our experimental system without an active at-

tack campaign to establish a baseline of performance

characteristics under normal operation. As seen in

Figure 4, our test run with only benign connections includes sessions that transition normally through TCP states and end in TIME_WAIT. This is expected, as old

client connections are aged out after completed ser-

vice requests (page-loads of the WordPress site). Fig-

ure 5 also shows that, after a short start-up, the RTT

is steady for the duration of the run.

Attack Campaign Present, No Defense. Let us next

observe the effect of Apache Killer on the SUT with-

out Canopy active. In this scenario, the attack begins

50 seconds into the test run with a duration of 70 sec-

onds.

Without mitigation enabled, the attack impacts service availability dramatically. The TCP state transition (Figure 6) and RTT graphs (Figure 7) show a taxed server with behavior that diverges significantly from our expectations under non-attack conditions. In Figure 6, we see the manifestation of Apache Killer clearly in the TCP state data; as expected, there is a saturation of TCP states like ESTABLISHED during the attack. Figure 7 shows that the average RTT increases from an approximate 250 millisecond baseline to about five seconds during the lifetime of the attack.

Table 6: Time to mitigate Apache Killer (seconds), by percentage of clients whose RTT is restored.

Experiment        25%    50%    75%    90%
Global DT         5.54   5.54   5.76   5.76
Global Ensemble   5.92   5.98   6.14   6.24
Global TCN        6.57   6.63   6.78   7.11
N-1 DT            6.28   6.39   6.39   6.81
N-1 Ensemble      6.02   6.02   6.09   6.42
N-1 TCN           6.88   6.96   7.23   7.23

Attack Campaign Present, Canopy Active. Now, we

demonstrate how Canopy affects benign clients when

mitigation is enabled. We insert the global ensemble

model into our mitigation engine since Table 5 indi-

cates that this model achieves the fastest time to de-

tect.

Figure 8 and Figure 9 illustrate the TCP states of open connections and RTTs during an attack campaign with mitigation enabled, respectively. The TCP and RTT trends for the run mirror those of the scenario with Canopy disabled, both before the attack campaign starts and until it is mitigated by Canopy (at approximately t=57 in our

example). At this time, attack connections are ﬁltered

out by the mitigation engine, allowing service avail-

ability to return to pre-attack levels. From this point

forward, the TCP state transitions and RTTs resemble

a scenario with no attack present.

It is critical to note that only a small percentage of

connections are affected by the attack when Canopy is

active. This can be observed in Figure 9: the 90th and

75th percentiles of average client RTTs spike while

the lower percentiles remain consistent near baseline

RTT for the duration of the experiment. The mod-

els and attacks tested all exhibit similar TCP and RTT

trends; the critical difference is that faster attack de-

tection and mitigation lead to quicker recoveries for

benign client performance.

Our mitigation results also show that Canopy

is effective at generalizing to new or unknown at-

tacks; Canopy detects attack campaigns even when

not trained on the particular type of attack. Though

the detection (and the subsequent mitigation) of an

attack when utilizing N-1 models is slightly delayed,

the SUT still recovers in a timely fashion. Detection time stays within 750 milliseconds when Apache Killer is held out from the training data, versus within 650 milliseconds for a model trained on Apache Killer (decision tree and ensemble models; see Table 5). Recovery times for N-1 models also increase by up to about one second in comparison to global models (Table 6).

Table 7: Per-client false positive rate during Apache Killer attack.

Experiment        % of Benign Clients Mitigated At Least Once During Test Run
Global DT         3.13
Global Ensemble   4.38
Global TCN        3.75
N-1 DT            2.50
N-1 Ensemble      2.35
N-1 TCN           6.88

5 DISCUSSION

In the previous section, we looked at the concrete re-

sults from our experimentation with Canopy. In this

section, we discuss these results in context, noting some perhaps unexpected results, and examining the potential impact and practicality of Canopy in the real world.

5.1 Comparing Performance of

Learning Methods

The detection and mitigation results presented in Sec-

tion 4 indicate that, overall, the ensemble method

achieves the best performance. In spite of their po-

tential to capture rich complexities and their speciﬁc

applicability to temporal data, TCNs are consistently

outperformed by ensemble and decision tree models

when inserted into our mitigation engine and eval-

uated for practical performance (i.e. our mitigation

metrics), as they can be susceptible to false posi-

tives and take signiﬁcantly longer to mitigate attacks.

The efﬁcacy of simpler machine learning methods is

a testament to the richness of the features extracted

through the TCP state data featurization process dis-

cussed in Section 3.2.

5.2 Generalizing to Never-Before-Seen

Attacks

The N-1 and N-2 results shown in Tables 2 and 3 re-

spectively indicate that Canopy is able to generalize

to never-before-seen attacks with minimal dropoff in

performance. This observation is further supported by

the minimal increase in the time to detect and mitigate

Apache Killer when using an N-1 model for mitiga-

tion, shown in Tables 5 and 6 respectively.

While the recall in the N-2 experiments decreases

to 0.86, the ensemble model maintains a precision of

0.99. This drop in recall does not necessarily have an

adverse effect on practical performance, given that

Canopy need not correctly identify all examples from

an attacker to trigger mitigative action.
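A rough calculation illustrates why imperfect per-example recall need not prevent mitigation. The sketch below computes the probability that an attacker produces at least one run of consecutive detections long enough to trigger the strike policy. It rests on two assumptions that are not established in this paper: that each example is detected independently with probability equal to the recall, and that the strike threshold takes the value passed in as `strikes_required`. In practice consecutive examples from one attacker are likely correlated, so the result is indicative only.

```python
def prob_strike_run(recall, strikes_required, n_examples):
    """P(at least one run of `strikes_required` consecutive detections among
    `n_examples` independent examples, each detected with prob. `recall`)."""
    k = strikes_required
    # state[j]: probability that the current detection run has length j
    # and that no run of length k has occurred so far.
    state = [1.0] + [0.0] * (k - 1)
    triggered = 0.0
    for _ in range(n_examples):
        nxt = [0.0] * k
        for j, p in enumerate(state):
            nxt[0] += p * (1.0 - recall)   # a missed example resets the run
            if j + 1 == k:
                triggered += p * recall    # run reaches k: mitigation fires
            else:
                nxt[j + 1] += p * recall
        state = nxt
    return triggered


# Example call with the N-2 recall of 0.86 and a hypothetical 3-strike policy.
p_trigger = prob_strike_run(0.86, 3, 10)
```

Under these assumptions the trigger probability grows with the number of examples an attacker generates, since every additional example offers another chance to complete a run.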

5.3 Comparison with Other Work

The comparison between Canopy and related work is largely complicated by differences in datasets and experimental context (e.g. offline vs real-time feature extraction, testing on never-before-seen attacks, etc.). We attempt to provide this comparison, with appropriate caveats, in Table 8. We are unable to compare the practical performance of Canopy with these solutions, however, since most do not operate in real-time or perform mitigation.

Figure 4: TCP states with no attack campaign (see legend in Figure 6).

Figure 6: TCP states with an Apache Killer campaign and no defense.

Figure 8: TCP states with an Apache Killer campaign and Canopy active (see legend in Figure 6).

As can be seen, all of the related works in this comparison report metrics (accuracy, precision, recall) between 95% and 100%. In multiple cases (Lima Filho et al., 2019; Sharafaldin et al., 2018), the results pertaining specifically to LSDDoS are inconclusive because the datasets used predominantly comprise high-volume traffic. The lone real-time solution, FINELAME, reports 100% accuracy in detecting Slowloris, although this is the only LSDDoS attack employed in the study. It is also worth noting that the supervised approaches (all except FINELAME, which performs anomaly detection) do not mention evaluating on never-before-seen attacks.

Figure 5: Request RTT with no attack campaign (see legend in Figure 7).

Figure 7: Request RTT with an Apache Killer campaign and no defense.

Figure 9: Request RTT with an Apache Killer campaign and Canopy active. Timeline shortened to the duration of the attack leading up to and shortly after mitigation. (See legend in Figure 7.)

In an attempt to further validate our approach, we investigated evaluating our model on the LSDDoS traffic in the publicly available CICIDS2017 dataset employed by Sharafaldin et al. However, the absence of distinct external client IPs prevents the use of our IP-based data featurization process.

The CICIDS2017 test network includes a NAT on the border router that obscures source IP addresses of traffic to the victim server. Since the packet capture appears to have been recorded from a network interface internal to the victim’s network, all external traffic appears to be coming from the router itself.

Table 8: Comparison of results to related work.

Work                              Acc    Prec   Rec    Dataset                                                    Real-Time?   Mitigation?   App-agnostic?   Never-Before-Seen Attacks?
(Lima Filho et al., 2019)         N/A    0.99   0.97   mostly flood attacks; less than 1% slow HTTP               N            N             Y               N
FINELAME (Demoulin et al., 2019)  1.00   1.00   1.00   Slowloris                                                  Y            N             N               Y
(Siracusano et al., 2018)         0.99   N/A    N/A    Slowloris, Slowread, Slowpost                              N            N             Y               N
(Sharafaldin et al., 2018)        N/A    0.98   0.97   mostly high-volume; Slowloris                              N            N             Y               N
Canopy (global experiments)       N/A    0.99   0.95   Slowloris, R.U.D.Y., Apache Killer, Slowread, Sockstress   Y            Y             Y               N
Canopy (N-1 experiments)          N/A    0.99   0.89   (same as above)                                            Y            Y             Y               Y

While related work in the ﬁeld of DDoS preven-

tion has sought to use aspects of TCP state data to de-

tect attacks, our work presents a new approach for en-

capsulating TCP state data in a form that lends itself

well towards accurate, rapid prediction of malicious

behavior.

5.4 Limitations and Future Work

We have shown Canopy to be highly effective in de-

tecting and mitigating LSDDoS attacks. However,

during attack mitigation, Canopy erroneously blocks

roughly 5% of benign clients for 4 seconds each, on

average. Here, we discuss potential avenues to im-

prove Canopy’s performance.

Windowing Parameters. As described in Section 3.2,

TCP state data is aggregated in 100 millisecond time

windows for each IP. Given that TCP state changes of-

ten occur far more quickly, it may be useful to shrink

the size of this window. This increased granularity

could improve detection performance.

Input Resolution. Also mentioned in Section 3.2 is

the fact that each data example is composed of 10

time windows. While this number needs to be kept

reasonably small in order to maintain adequately fast

mitigation metrics as described in Section 4.2, short-

ening the duration of each window would enable the

resolution (or number of windows that make up an ex-

ample) to be increased without increasing the amount

of real time that an example spans. For example, 40

time windows of 25 milliseconds span the same dura-

tion of time as 10 windows of 100 milliseconds.
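The trade-off between window duration and input resolution can be illustrated with a small featurization sketch. The featurization of Section 3.2 is not reproduced here: the per-window state counts, the subset of TCP state names, and the function names below are simplifying assumptions chosen only to show that 10 windows of 100 milliseconds and 40 windows of 25 milliseconds cover the same events.

```python
from collections import Counter, defaultdict

# Illustrative subset of TCP states; the real feature set may differ.
STATES = ["SYN_RECV", "ESTABLISHED", "FIN_WAIT", "CLOSE_WAIT", "TIME_WAIT"]


def window_counts(events, window_ms):
    """Group (timestamp_ms, ip, state) events into per-IP, per-window counts."""
    counts = defaultdict(Counter)
    for ts_ms, ip, state in events:
        counts[(ip, ts_ms // window_ms)][state] += 1
    return counts


def build_example(counts, ip, first_window, n_windows):
    """Stack `n_windows` consecutive windows into one example (list of rows)."""
    return [
        [counts[(ip, w)][s] for s in STATES]
        for w in range(first_window, first_window + n_windows)
    ]


events = [
    (5, "198.51.100.7", "ESTABLISHED"),
    (30, "198.51.100.7", "ESTABLISHED"),
    (130, "198.51.100.7", "TIME_WAIT"),
    (990, "198.51.100.7", "SYN_RECV"),
    (40, "198.51.100.8", "ESTABLISHED"),
]
# Both examples span the same 1000 ms of traffic for one IP.
coarse = build_example(window_counts(events, 100), "198.51.100.7", 0, 10)
fine = build_example(window_counts(events, 25), "198.51.100.7", 0, 40)
```

In the coarse example the two ESTABLISHED transitions at 5 ms and 30 ms fall into the same row, while the finer windows separate them, which is the additional temporal detail a smaller window would expose to the model.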

Increased Client Activity and Attack Variety. The

presented results are based on LSDDoS attacks and

clients listed in the paper. While we strove to choose

attackers and clients that were representative of what

Canopy would see in the wild, these sets can always

be improved. To increase the viability of the system,

more complex client user patterns, new LSDDoS at-

tacks, and a larger variety of SUTs could be added to

the training datasets.

5.5 Performance

While Canopy has a relatively small footprint, resource efficiency is not the primary goal for this paper. We find that the overhead of running our sensor on the SUT is minimal during periods without attack.

During such periods, CPU usage hovers around 5% of

a single core to track TCP states. During periods of

attack, resource usage increases: Canopy will utilize

an entire CPU core until the attack is mitigated. The

inference engine uses about half a core to make classi-

ﬁcation decisions, regardless of the presence or inten-

sity of an attack. We also reserve four CPU cores for

Kafka. While four cores is signiﬁcant with respect to

the resources of our small experimental testbed, this

aspect of the infrastructure scales well and would re-

main this size for a large real-world deployment.

6 CONCLUSION

With this paper, we proposed Canopy, a novel ap-

proach for identifying LSDDoS attacks by applying

machine learning techniques to extract meaning from

observed patterns of TCP state transitions. Our tests

showed that Canopy was able to detect and mitigate

these low-and-slow attacks accurately and quickly:

attacks were detected during 100% of test runs within

650 milliseconds, with clients’ experience restored to

normal within 7.5 seconds. Our tests also showed that Canopy exhibits minimal erroneous mitigation of benign clients, achieving a precision of 99%. Finally, we showed that Canopy’s capabilities generalize well to LSDDoS attacks not included in its training dataset, identifying never-before-seen attacks within 750 milliseconds.

Future work may include replacing the inefficient conntrack userspace application with a custom TCP monitoring kernel module of our own.

ACKNOWLEDGEMENTS

This research was developed with funding from

the Defense Advanced Research Projects Agency

(DARPA) under Contract No. HR0011-16-C-0060.

This document was cleared for release under Distribution Statement "A" (Approved for Public Release, Distribution Unlimited). The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. In alphabetical order, we would like to thank Patrick Dwyer, Robert Gove, Heather Hardway, Bryan Hoyle, Melissa Kilby, Alex Lim, Sean Morgan, David Slater, and Scott Wimer for their contributions to the project.

REFERENCES

Bai, S., Kolter, J. Z., and Koltun, V. (2018). An em-

pirical evaluation of generic convolutional and recur-

rent networks for sequence modeling. arXiv preprint

arXiv:1803.01271.

Byström, C., Heyman, J., Hamrén, J., and Heyman, H. (2019). Locust. https://github.com/locustio/locust.

Demoulin, H. M., Pedisich, I., Phan, L. T. X., and Loo,

B. T. (2018). Automated detection and mitigation of

application-level asymmetric dos attacks. In Proceed-

ings of the Afternoon Workshop on Self-Driving Net-

works, pages 36–42.

Demoulin, H. M., Pedisich, I., Vasilakis, N., Liu, V., Loo,

B. T., and Phan, L. T. X. (2019). Detecting asymmet-

ric application-layer denial-of-service attacks in-ﬂight

with finelame. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 693–708.

Fayaz, S. K., Tobioka, Y., Sekar, V., and Bailey, M. (2015).

Bohatei: Flexible and elastic ddos defense. In 24th

USENIX Security Symposium (USENIX Security 15),

pages 817–832, Washington, D.C. USENIX Associa-

tion.

Hornby, T. (2012). Sockstress. https://github.com/defuse/

sockstress.

Hury, I. (2019). Noisy. https://github.com/1tayH/noisy.

Krebs, B. (2016). Krebsonsecurity hit with record ddos.

Kuzmanovic, A. and Knightly, E. W. (2003). Low-rate tcp-

targeted denial of service attacks: the shrew vs. the

mice and elephants. In Proceedings of the 2003 con-

ference on Applications, technologies, architectures,

and protocols for computer communications, pages

75–86.

Lima Filho, F. S. d., Silveira, F. A., de Medeiros Brito Ju-

nior, A., Vargas-Solar, G., and Silveira, L. F. (2019).

Smart detection: an online approach for dos/ddos at-

tack detection using machine learning. Security and

Communication Networks, 2019.

Makrushin, D. (2013). The cost of launching a ddos attack.

Markoff, J. (2008). Before the gunﬁre, cyberattacks.

Modi, H. (2018). Introducing netscout’s threat intelligence

report.

Moore, D., Shannon, C., J. Brown, D., M. Voelker, G., and

Savage, S. (2006). Inferring internet denial-of-service

activity. ACM Trans. Comput. Syst., 24:115–139.

Plante, C. (2015). Valve’s $18 million dota 2 tournament

delayed by ddos attack.

Ranjan, S., Swaminathan, R., Uysal, M., Nucci, A.,

and Knightly, E. (2008). Ddos-shield: Ddos-

resilient scheduling to counter application layer at-

tacks. IEEE/ACM Transactions on networking,

17(1):26–39.

Saied, A., Overill, R. E., and Radzik, T. (2016). Detection

of known and unknown ddos attacks using artiﬁcial

neural networks. Neurocomputing, 172:385–393.

Sharafaldin, I., Lashkari, A. H., and Ghorbani, A. A.

(2018). Toward generating a new intrusion detec-

tion dataset and intrusion trafﬁc characterization. In

ICISSP, pages 108–116.

Shekyan, S. (2011). Slowhttptest. https://github.com/

shekyan/slowhttptest.

Siracusano, M., Shiaeles, S., and Ghita, B. (2018). Detec-

tion of lddos attacks based on tcp connection param-

eters. In 2018 Global Information Infrastructure and

Networking Symposium (GIIS), pages 1–6. IEEE.

Specht, S. and Lee, R. (2003). Taxonomies of Distributed Denial of Service Networks, Attacks, Tools,

and Countermeasures. Technical report, Princeton Ar-

chitecture Laboratory for Multimedia and Security.

Stampar, M. (2011). Killapachepy. https://github.com/

tkisason/KillApachePy/.

Tripathi, N. and Mehtre, B. (2013). Dos and ddos attacks:

Impact, analysis and countermeasures. pages 1–6.

Valialkin, A. (2014). Goloris. https://github.com/valyala/

goloris.

Wang, C., Miu, T. T., Luo, X., and Wang, J. (2017).

Skyshield: A sketch-based defense system against ap-

plication layer ddos attacks. IEEE Transactions on

Information Forensics and Security, 13(3):559–573.

Zhang, C., Cai, Z., Chen, W., Luo, X., and Yin, J. (2012).

Flow level detection and ﬁltering of low-rate ddos.

Computer Networks, 56(15):3417–3431.

Zhou, L., Liao, M., Yuan, C., and Zhang, H. (2017). Low-

rate ddos attack detection using expectation of packet

size. Security and Communication Networks, 2017.
