Decomposing Training Data to Improve Network Intrusion Detection
Performance
Roberto Saia, Alessandro Sebastian Podda, Gianni Fenu and Riccardo Balia
Department of Mathematics and Computer Science,
University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy
Keywords:
Intrusion Detection, Security, Networking, Data Decomposition, Algorithms.
Abstract:
Anyone working in the field of network intrusion detection has observed that it involves an ever-increasing number of techniques and strategies aimed at overcoming the issues that affect state-of-the-art solutions. Data imbalance and heterogeneity are only two representative examples, and each misclassification made in this context could have enormous repercussions in crucial areas such as finance, privacy, and public reputation, because the current scenario is characterized by a huge number of public and private network-based services. The idea behind the proposed work is to decompose the canonical classification process into several sub-processes, where the final classification depends on the results of all the sub-processes, plus the canonical one. The proposed Training Data Decomposition (TDD) strategy is applied to the training dataset, which it decomposes into regions according to a defined number of events and features. This process is motivated by the observation that the same network event could be evaluated differently when it is evaluated in different time periods and/or when different features are involved. According to this observation, the proposed approach adopts different classification models, each of them trained on a different data region characterized by different time periods and features, and classifies each event on the basis of both the results of all these models and the canonical strategy that involves all the data.
1 INTRODUCTION
The exponential growth of network-based services, made even more impressive by the COVID-19 emergency (Chang and Meyerhoefer, 2020; Dhawan, 2020; Rapanta et al., 2020), has made their security a central aspect of everyday life, both public and private. The high degree of heterogeneity (Zuech et al., 2015) that characterizes network events, in terms of both type and behavior, makes the detection of attacks a very difficult task. This troubling scenario is made even more difficult by the fact that many attacks are conducted in apparently legitimate ways (i.e., they are characterized by the same patterns as normal network activities (Carta et al., 2020b)), whereas other attacks are detected for the first time, so no knowledge of them is available (zero-day attacks (Radhakrishnan et al., 2019)). Everything is further worsened by the high dynamism of network scenarios.
For this reason, tools widely used in the past with good results, such as Intrusion Detection Systems (IDSs) (Tidjon et al., 2019), nowadays appear unable to face the new threats with the same effectiveness. This is a major problem, since a high level of security must be ensured for many crucial services such as, for example, those related to finance (Wazid et al., 2019), health (Yüksel et al., 2017), and education (Luker and Petersen, 2003).
To cope with the new threats, approaches dif-
ferent from the canonical ones have therefore been
sought, integrating increasingly sophisticated tech-
nologies and strategies that involve areas such as
machine learning (Gao et al., 2019), artificial intel-
ligence (Kanimozhi and Jacob, 2019), neural net-
works (Le et al., 2019), and so on, also by combin-
ing them to improve the intrusion detection perfor-
mance (Li and Lu, 2019; Meyer and Labit, 2020).
The idea behind the proposed strategy stems from several literature works in this field (Carta et al., 2020b; Saia et al., 2018b; Saia et al., 2019; Saia et al., 2020), where we observed that the data used to train an evaluation model refer to different periods of time, and that the features that characterize each event refer to different aspects of it.
Saia, R., Podda, A., Fenu, G. and Balia, R.
Decomposing Training Data to Improve Network Intrusion Detection Performance.
DOI: 10.5220/0010661400003064
In Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 1: KDIR, pages 241-248
ISBN: 978-989-758-533-3; ISSN: 2184-3228
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Given that the data we refer to are those collected by network security devices (e.g., an IDS), on the basis of the previous observation we divide the classification process into several sub-processes, and therefore into several evaluation models, each of them trained on a different part of the dataset, in terms of events and features. We believe that such a strategy can mitigate the event heterogeneity problem, since events are classified on the basis of considerations made on different parts of the training data, and therefore on the basis of different time periods and event characteristics, rather than by a model trained on the entire dataset.
In more detail, in this work we propose a Training Data Decomposition (TDD) strategy for intrusion detection, which is based on the subdivision of the training dataset into several regions, each of them bounded by a certain number of events (rows) and features (columns). Each region is used to train a different evaluation model, and the final classification of a new event is obtained by ensembling all the models through a majority voting criterion.
The main scientific contributions of the proposed work can be summarized as follows:
- formalization of a Training Data Decomposition
(TDD) strategy aimed to bound the training set on
the basis of a certain number of events (rows) and
features (columns);
- formalization of the comparison process between
an unevaluated network event and a series of eval-
uation models defined on the basis of the proposed
TDD strategy, along with the formalization of the
classification criterion;
- formalization of a classification algorithm able to
exploit the TDD strategy in order to classify each
new network event as normal or intrusion.
The remainder of this paper is structured as follows: Section 2 discusses the background and related work in the intrusion detection domain, together with the most suitable evaluation metrics; Section 3 provides details about the idea behind the proposed strategy, along with its practical implementation; Section 4 closes this work with some remarks on the proposed strategy and mentions future research directions.
2 BACKGROUND AND RELATED
WORK
The term intrusion detection was introduced for the first time in the eighties (Anderson, 1980), in a technical report where the author tried to define a kind of guideline to improve the computer security auditing and surveillance capability of computer systems. In the following years the literature proposed numerous works focused on the intrusion detection topic, both of a theoretical nature, where aspects such as their taxonomy have been discussed (Axelsson, 2000), and of a practical nature (Khraisat et al., 2019), up to the present day, where the discussion has extended to recent areas such as the Internet of Things (IoT) (Zarpelão et al., 2017), cloud computing (Modi et al., 2013), and smart cities (Aloqaily et al., 2019).
The concept of intrusion refers to a series of attacks based on techniques and/or strategies that evolve over time, whose effectiveness can depend on the skill of a human operator (Latha and Prakash, 2017), on specific software/malware (Rehman et al., 2011), or on both. To these must be added strategies not directly related to a direct attack, such as, for instance, social engineering (Salahdine and Kaabouch, 2019).
An IDS aims to detect and identify unauthorized activities in a network, and it can perform this task by following different approaches. On the basis of the literature, the most common of them are: anomaly-based, signature-based, specification-based, and hybrid-based.
In more detail: the anomaly-based approach analyzes and classifies the network events without comparing them to those in a dataset of known event patterns, since it adopts a heuristic/rule-based strategy, detecting the intrusions on the basis of their atypical network activity (Samrin and Vasumathi, 2017); the signature-based approach works by comparing each detected network event to those in a dataset of known event patterns (Bronte et al., 2016); the specification-based approach operates by inspecting the network protocols, with the aim of identifying non-canonical sequences of commands that can be part of an attack (Liao et al., 2013); the hybrid-based approach can be considered a combination of two or more of the aforementioned approaches (Li et al., 2005).
On the basis of the current literature, we can also summarize the open problems that affect the different intrusion detection approaches: the anomaly-based approach is able to identify novel forms of network attacks (zero-days), as well as the anomalous exploitation of privileges, but it cannot be considered an effective approach, due to the high dynamism of the network scenario and its high response time; the signature-based approach works well in the context of known attacks or their variations, but it is not able to inspect the involved protocols, and it also generates a high computational load; the specification-based approach is able to inspect the involved protocols, detecting their anomalous exploitation, but it is not able to distinguish those attacks characterized by the same behavior as a legitimate network activity and, in addition, it presents a high computational cost related to protocol inspection and tracing; the hybrid-based approach presents the same pros and cons as the combined approaches.
KDIR 2021 - 13th International Conference on Knowledge Discovery and Information Retrieval
According to its approach and placement in the network area, an IDS can also be classified into four (most common) categories: Host-based, Network-based, Network-node-based, and Distributed-based.
In more detail: Host-based Intrusion Detection Systems (HIDSs) (Jose et al., 2018) adopt several hosts to monitor the network activity, detecting the attacks by comparing the events to a series of known patterns (signature-based approach); Network-based Intrusion Detection Systems (NIDSs) (Mazini et al., 2019) adopt a single host to monitor the network activity and exploit a database of known patterns, analyzing only the events characterized by unknown patterns (hybrid signature-based and analysis-based approach); Network-Node-based Intrusion Detection Systems (NNIDSs) (Potluri and Diedrich, 2016) adopt a single host strategically placed in the network, operating by combining the HIDS and NIDS strategies; Distributed-based Intrusion Detection Systems (DIDSs) (Amrita, 2018) adopt a hybrid strategy based on the combination of all the aforementioned ones.
Moreover, some security features offered by techniques typically exploited in other areas have recently started to be taken into consideration for network security tasks, such as, for instance, those leveraging blockchain primitives (Vieira et al., 2020; Longo et al., 2020).
2.1 Performance Evaluation
A preliminary consideration regarding the evaluation
metrics used in this domain is related to the fact that,
similar to some other domains (Saia et al., 2017; Carta
et al., 2020a; Saia and Carta, 2017a; Saia, 2017; Saia
and Carta, 2016; Saia et al., 2018a; Saia and Carta,
2017b), the involved data are usually characterized by
a high degree of imbalance (in the intrusion detection
context the minority class is the intrusion one), re-
quiring assessment metrics that are not biased by this
characteristic.
Since, in the literature, an intrusion detection task is commonly expressed as a binary problem and then addressed as a classification task (Chuang et al., 2019), the best approach is to adopt multiple evaluation metrics, in order to get a reliable evaluation of the effectiveness of a classification model. For this reason, simple metrics based on the confusion matrix (e.g., accuracy, sensitivity, and specificity) and more sophisticated ones, such as those based on the Receiver Operating Characteristic (ROC) curve (e.g., the Area Under the ROC Curve, AUC), are often combined in the literature (Munaiah et al., 2016).
Moreover, considering that the main objective of
an intrusion detection system is the correct identifi-
cation and classification of the negative cases (intru-
sions), as their misclassification would have a higher
cost than that of the positive ones (normals), many of
the works in the literature use the specificity and the
AUC metrics.
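As a minimal sketch of how the metrics mentioned above could be computed, the following Python fragment follows the convention used in this section (normal events as positives, intrusions as negatives). The function names, the label strings, and the pairwise (rank-based) AUC estimate are illustrative assumptions, not part of the proposed strategy.

```python
def confusion_counts(y_true, y_pred, positive="normal"):
    """Count TP, FP, TN, FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def basic_metrics(y_true, y_pred, positive="normal"):
    """Accuracy, sensitivity (TP rate) and specificity (TN rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # With intrusions as negatives, specificity is the share of
        # intrusions that are correctly recognized.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def auc(y_true, scores, positive="normal"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `scores` are assumed to express how strongly each event is believed
    to be positive; ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute the AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Since the specificity is computed on the intrusion class only, it is not inflated by a large number of correctly classified normal events, which is the property that motivates its use with imbalanced data.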
3 PROPOSED STRATEGY
This section provides the formal notation adopted in
this work, along with the definition of the problem to
face, and the implementation of the proposed strategy.
3.1 Preliminary Notation
Premising that we adopted the notation $|set|$ to indicate the cardinality of a set, we denote as $E = \{e_1, e_2, \ldots, e_N\}$ a series of network events composed by:
- a subset $E^{+} = \{e^{+}_1, e^{+}_2, \ldots, e^{+}_X\}$ of normal events, then $E^{+} \subseteq E$;
- a subset $E^{-} = \{e^{-}_1, e^{-}_2, \ldots, e^{-}_Y\}$ of intrusion events, then $E^{-} \subseteq E$;
- a subset $\hat{E} = \{\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_M\}$ of unclassified events, then $\hat{E} \subseteq E$.
So we have that $E = (E^{+} \cup E^{-} \cup \hat{E})$, and each event $e \in E$ is characterized by the features in the set $F = \{f_1, f_2, \ldots, f_W\}$, and it can belong to one of the classes in the set $C = \{normal, intrusion\}$. We also formalize:
- the training set $T = \{e_1, e_2, \ldots, e_K\}$ given by $E^{+} \cup E^{-}$;
- the possibility to divide $T$ into $R = \{r_1, r_2, \ldots, r_Z\}$ regions, according to the $T$ events (set rows) and features (set columns);
- the regions definition operation as $R_{(ER,FC)}$, with $ER$ the number of Event Rows, and $FC$ the number of Feature Columns, then $|R| = Z = (ER \times FC)$.
From this it follows that:
- generalizing on set $E$, each region can be composed by $\frac{N}{ER}$ events and $\frac{W}{FC}$ features, since $|E| = N$ and $|F| = W$;
- the bounds of $ER$ and $FC$ are, respectively, $1 \le ER \le |T|$ and $1 \le FC \le |F|$, but it must be considered that $ER = FC = 1$ defines the canonical data configuration, where all the events and the features are involved, and that each region defined according to the $ER$ value must contain samples (events) of both classes in $C$, in order to allow us to perform the training process of an evaluation model.
3.2 Problem Statement
We face the intrusion detection in terms of binary classification, according to the classes in the set $C$ (i.e., normal and intrusion). Hence, the problem can be formalized as shown in Equation 1, where $\Psi$ denotes a generic intrusion detection approach, whereas $eval(\hat{e}, \Psi)$ is the evaluation function of the event $\hat{e}$, which returns 1 when a classification has been performed correctly, 0 otherwise. This allows us to express this problem in terms of maximization of the $\omega$ value, since it is given by the sum of the correct classifications (then the $\omega$ upper bound is $|\hat{E}|$).

$$
\max_{0 \le \omega \le |\hat{E}|} \; \omega = \sum_{m=1}^{|\hat{E}|} eval(\hat{e}_m, \Psi) \qquad (1)
$$
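The objective of Equation 1 can be read as a plain count of correct classifications. The following minimal Python sketch illustrates it; the function name `omega` and the use of class labels as strings are assumptions made only for this example.

```python
def omega(true_classes, predicted_classes):
    """Sum of eval(e_m, Psi): one unit for each correctly classified event."""
    if len(true_classes) != len(predicted_classes):
        raise ValueError("one prediction per event is required")
    return sum(1 for truth, pred in zip(true_classes, predicted_classes)
               if truth == pred)
```

For three events with true classes (normal, intrusion, normal) and predictions (normal, normal, normal), the value is 2, against an upper bound of 3.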
3.3 Strategy Overview
We now briefly introduce the TDD strategy, a domain-
specific case of the bootstrap aggregation (Breiman,
1996) strategy, here proposed as an alternative to a
canonical intrusion detection evaluation model that
exploits, during the training process, all the data that
have been collected and correctly classified in the
past, involving all the events and features.
Indeed, we propose to combine different evalua-
tion models in a fusion fashion, each of them trained
on a different region of the training set. The regions
are bounded in terms of events and features, with the
aim to selectively capture the properties of each sub-
region by focusing on specific periods of time (rows of
events) and behavior (columns of features). This has
been made according to the simple scheme reported
in the following, where four regions of the training
set T , with two events and two features each one (one
of them has been highlighted), have been selected by
way of example:
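$$
T \;=\;
\begin{array}{c}
\xrightarrow{\;\text{behavior}\;} \\
\text{time} \Big\downarrow
\begin{bmatrix}
e_1(f_1) & e_1(f_2) & e_1(f_{\ldots}) & e_1(f_W) \\
e_2(f_1) & e_2(f_2) & e_2(f_{\ldots}) & e_2(f_W) \\
\vdots & \vdots & \vdots & \vdots \\
e_N(f_1) & e_N(f_2) & e_N(f_{\ldots}) & e_N(f_W)
\end{bmatrix}
\end{array}
$$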
This leads to the division of the training process
into several sub-processes, each of them based on a
different part of the dataset, in terms of events and
features. In other words, we speculate that such a
strategy is able to mitigate the issues related to the
event heterogeneity, since the classification of the new
events is now based on decisions taken on the basis
of different time periods and characteristics (aggre-
gated according to a criterion), rather than on the en-
tire dataset.
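The subdivision described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the training set is represented as a list of rows (one per event), the name `get_regions` is borrowed from the algorithm presented later in the paper, regions are returned in row-major order, and the numbers of rows and columns are assumed to be exact multiples of the requested bands.

```python
def get_regions(table, er, fc):
    """Split a table (list of event rows) into er * fc rectangular regions.

    Regions are returned in row-major order: first all the column bands
    of the first band of events, then those of the second band, and so on.
    """
    n_rows, n_cols = len(table), len(table[0])
    if n_rows % er or n_cols % fc:
        raise ValueError("rows/columns must be multiples of er/fc")
    rows_per, cols_per = n_rows // er, n_cols // fc
    regions = []
    for rb in range(er):              # bands of events (time periods)
        band = table[rb * rows_per:(rb + 1) * rows_per]
        for cb in range(fc):          # bands of features (behavior)
            lo, hi = cb * cols_per, (cb + 1) * cols_per
            regions.append([row[lo:hi] for row in band])
    return regions
```

With a table of four events and four features, `get_regions(table, 2, 2)` yields four regions of two events and two features each.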
3.4 Strategy Formalization
The problem formalized in Equation 1 needs to be arranged according to the subdivision into regions we introduced, substantially by subdividing the evaluation process into $Z$ sub-processes (i.e., $|R| = Z$). More formally, a generic intrusion detection approach $\Psi$ is used $Z$ times, and the final event classification will be determined on the basis of all the classifications performed by these approaches, as shown in the example of Equation 2, where we hypothesized $K = 4$, $W = 4$, $ER = 2$, and $FC = 2$, thus subdividing the training set $T$ into $|R| = Z = (2 \times 2) = 4$ regions, each of them composed by $\frac{K}{ER} = \frac{4}{2} = 2$ events and $\frac{W}{FC} = \frac{4}{2} = 2$ features. This generates the four $m_1, m_2, m_3, m_4$ evaluation models.

$$
R_{(2,2)} =
\begin{bmatrix} r_1 & r_2 \\ r_3 & r_4 \end{bmatrix}
=
\left[\begin{array}{cc|cc}
f_{1,1} & f_{2,1} & f_{3,1} & f_{4,1} \\
f_{1,2} & f_{2,2} & f_{3,2} & f_{4,2} \\ \hline
f_{1,3} & f_{2,3} & f_{3,3} & f_{4,3} \\
f_{1,4} & f_{2,4} & f_{3,4} & f_{4,4}
\end{array}\right]
\Rightarrow
\begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix}
\qquad (2)
$$
In other words, the training process of the evaluation model $m$ of a classification algorithm is performed by using the events and features in each of the regions $r_1, r_2, r_3, r_4$ of Equation 2, generating the four evaluation models $m_1, m_2, m_3, m_4$.
Assuming the evaluation of a new event $\hat{e} \in \hat{E}$, which on the basis of the scenario we hypothesized will be composed by the $f_1, f_2, f_3, f_4$ features, its evaluation and classification will be performed by comparing (we denote this operation as $\odot$) it to all the evaluation models trained on the different regions, independently, and this process returns four classifications $c_1, c_2, c_3, c_4$, according to the criterion shown in Equation 3.

$$
\begin{array}{ll}
c_1 = \big[ m_1 \big] \odot \big[ f_1 \; f_2 \big] & \quad c_2 = \big[ m_2 \big] \odot \big[ f_3 \; f_4 \big] \\
c_3 = \big[ m_3 \big] \odot \big[ f_1 \; f_2 \big] & \quad c_4 = \big[ m_4 \big] \odot \big[ f_3 \; f_4 \big]
\end{array}
\qquad (3)
$$
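The pairing between models and feature subsets in Equation 3 can be illustrated with a short Python sketch. It assumes (this is not stated explicitly in the text) that the models are numbered in row-major order over the regions, so that the model with 0-based index z works on the column band z mod FC; the name `event_blocks` is hypothetical.

```python
def event_blocks(event, er, fc):
    """Return, for each of the er * fc models, the feature slice it receives.

    Models are assumed to be numbered in row-major order over the regions,
    so the model with 0-based index z uses the column band z % fc.
    """
    if len(event) % fc:
        raise ValueError("the number of features must be a multiple of fc")
    width = len(event) // fc
    bands = [list(event[cb * width:(cb + 1) * width]) for cb in range(fc)]
    return [bands[z % fc] for z in range(er * fc)]
```

For an event with the features f1, f2, f3, f4 and ER = FC = 2, the four models receive (f1, f2), (f3, f4), (f1, f2), and (f3, f4), respectively, as in Equation 3.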
3.4.1 Padding Rule
Considering that the number of regions given by the values of $FC$ and $ER$ may not exactly partition the sets $F$ and $T$ in the cases reported in Equation 4, we need to formalize a padding rule aimed to face this problem.

$$
\begin{array}{l}
(|F| \bmod FC) \neq 0 \\
(|T| \bmod ER) \neq 0
\end{array}
\qquad (4)
$$

Introducing the notation $\mu_1 = (|F| \bmod FC)$ and $\mu_2 = (|T| \bmod ER)$, according to the preliminary notation provided in Section 3.1 (i.e., $F = \{f_1, f_2, \ldots, f_W\}$ and $T = \{e_1, e_2, \ldots, e_K\}$), we formalize in Equation 5 the padding rule $pad$.

$$
\begin{array}{l}
pad(F) = \{f_1, f_2, \ldots, f_W, f_{W+1}, f_{W+2}, \ldots, f_{W+\mu_1}\} \\
\quad \text{with } f_{W+1} = f_{W+2} = \ldots = f_{W+\mu_1} = f_W \\
pad(T) = \{e_1, e_2, \ldots, e_K, e_{K+1}, e_{K+2}, \ldots, e_{K+\mu_2}\} \\
\quad \text{with } e_{K+1} = e_K,\; e_{K+2} = e_{K-1},\; \ldots,\; e_{K+\mu_2} = e_{K-\mu_2}
\end{array}
\qquad (5)
$$
In practice, with the aim of not significantly altering the involved information during the padding process, the problem is faced by following two different strategies: (i) in the case of $F$, we duplicate the last column (i.e., the last feature of the set $F$) $\mu_1$ times; (ii) in the case of $T$, we duplicate the last $\mu_2$ rows (i.e., the last events of the set $T$), reducing the risk that the added events belong to the same class in $C$. The proposed naive solution does not significantly affect the machine learning process, as it is applied to both training and test data. For simplicity, we will always consider this rule applied during the definition of the regions (as an internal preprocessing step), without referring to it each time.
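A possible reading of the padding rule is sketched below in Python. As an explicit assumption of this sketch, the number of added elements is computed as the distance to the next multiple of the divisor, so that the padded sets can always be split exactly; this coincides with the remainder used in Equation 5 when, for instance, the divisor is 2. The function names are hypothetical.

```python
def missing_to_multiple(size, parts):
    """Number of elements to add so that `size` becomes a multiple of `parts`."""
    return (parts - size % parts) % parts


def pad_features(row, fc):
    """Pad one event by repeating its last feature (strategy i)."""
    extra = missing_to_multiple(len(row), fc)
    return list(row) + [row[-1]] * extra


def pad_events(table, er):
    """Pad the training set by re-adding its last events (strategy ii).

    The added rows mirror the tail of the set: first the last event,
    then the second-to-last one, and so on.
    """
    if not 1 <= er <= len(table):
        raise ValueError("er must be between 1 and the number of events")
    extra = missing_to_multiple(len(table), er)
    mirrored = [list(table[-1 - i]) for i in range(extra)]
    return [list(row) for row in table] + mirrored
```

Mirroring the last rows, instead of repeating only the last one, follows the rationale given in the text: it lowers the chance that all the added events belong to the same class.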
3.4.2 Classification Rule
As previously formalized, except for the case $ER = FC = 1$ that bounds a single region (i.e., the canonical data configuration), the process of classification of an event $\hat{e} \in \hat{E}$ involves a series of evaluation models $m_1, m_2, \ldots, m_Z$, whose cardinality depends on the number of regions, since $|R| = Z$. Considering that such models generate $c_1, c_2, \ldots, c_Z$ classifications, this configures one of the two cases reported in Equation 6.

$$
\begin{array}{l}
\text{Case 1}: Z = 2n, \; n \in \mathbb{N} \\
\text{Case 2}: Z = 2n - 1, \; n \in \mathbb{N}
\end{array}
\qquad (6)
$$
Whereas in Case 2 a univocal classification of the event is possible on the basis of the majority classification, in Case 1 we need to introduce a discriminating element. For this purpose we use a further classification $c_{Z+1}$ obtained by using an evaluation model trained on the entire set $T$, so that we have the classifications $c_1, c_2, \ldots, c_Z, c_{Z+1}$.
In more detail, assuming an example scenario related to Case 1, for instance by using $ER = FC = 2$, which generates the classification models $m_1, m_2, m_3, m_4$, providing the classifications $c_1, c_2, c_3, c_4$ (each of them belonging to a class in the set $C$, then normal or intrusion) for an event $\hat{e} \in \hat{E}$, the final classification of the event $\hat{e}$ is given by adding the classification $c_5$ obtained by training an evaluation model on the entire set $T$.
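A majority criterion is then applied by following the classification rule $\rho$ formalized in Equation 7, where $c_1$ and $c_2$ denote, respectively, the elements normal and intrusion of the $C$ set.

$$
\rho(\hat{e}) =
\begin{cases}
c_1, & \text{if } \sum_{i=1}^{Z} \phi(c_i, c_1) > \sum_{i=1}^{Z} \phi(c_i, c_2) \\
c_2, & \text{if } \sum_{i=1}^{Z} \phi(c_i, c_1) < \sum_{i=1}^{Z} \phi(c_i, c_2) \\
c_1, & \text{if } \sum_{i=1}^{Z} \phi(c_i, c_1) = \sum_{i=1}^{Z} \phi(c_i, c_2) \;\wedge\; c_{Z+1} = c_1 \\
c_2, & \text{if } \sum_{i=1}^{Z} \phi(c_i, c_1) = \sum_{i=1}^{Z} \phi(c_i, c_2) \;\wedge\; c_{Z+1} = c_2
\end{cases}
\quad \text{with} \quad
\phi(a, b) =
\begin{cases}
0, & \text{if } a \neq b \\
1, & \text{if } a = b
\end{cases}
\qquad (7)
$$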
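The classification rule can be expressed directly in code. The following Python sketch mirrors the four cases of Equation 7; the function names, the label strings, and the choice of raising an error when a tie occurs without a tie-breaking classification are assumptions made for illustration.

```python
def phi(a, b):
    """Indicator function of Equation 7: 1 when a equals b, 0 otherwise."""
    return 1 if a == b else 0


def rho(votes, tie_breaker=None, classes=("normal", "intrusion")):
    """Majority rule over the Z region-based classifications.

    `tie_breaker` plays the role of the additional classification given
    by the model trained on the whole training set; it is consulted only
    when the votes are evenly split.
    """
    normal, intrusion = classes
    n_normal = sum(phi(v, normal) for v in votes)
    n_intrusion = sum(phi(v, intrusion) for v in votes)
    if n_normal > n_intrusion:
        return normal
    if n_normal < n_intrusion:
        return intrusion
    if tie_breaker not in classes:
        raise ValueError("a tie requires the whole-set classification")
    return tie_breaker
```

With an odd number of regions the tie branch is never reached, which is why the additional model is needed only when the number of regions is even.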
3.5 Algorithm Definition
On the basis of the proposed strategy, we formalized Algorithm 1, which is aimed at classifying each new network event. It takes as input a classification algorithm $\alpha$, the set of classified events $T$ (i.e., the training set), the set $\hat{E}$ of events to classify, and the number of rows ($ER$) and columns ($FC$) for the regions definition, returning the classification of all events in the set $\hat{E}$.
At Steps 2 to 4 an evaluation model is trained by using the entire set $T$, if the number of regions (i.e., $Z = |ER \times FC|$) is even. At Step 5 the training set $T$ is processed in order to define the regions, according to the $ER$ and $FC$ values. For each region, at Steps 6 to 9, an evaluation model of the algorithm $\alpha$ is trained. The classification of the events in $\hat{E}$ is performed from Step 10 to Step 21, where: at Step 11 the features of the event $\hat{e}$ to evaluate are divided into regions, according to the $ER$ and $FC$ values; at Steps 12 to 15 each region is classified according to the models in $M$ previously trained, and an additional classification based on the model trained at Step 3 is added to the set $C$ when the number of regions is even (Steps 16 to 19). At Step 20 the event $\hat{e}$ is classified, and the classification is stored in the set $\kappa$. The set $\kappa$ with the classification of all events in the set $\hat{E}$ is finally returned by the algorithm at Step 22.
Algorithm 1: Classifier algorithm.
Require: α = Classification algorithm, T = Evaluated events, Ê = Unevaluated events, ER = Number of event rows, FC = Number of feature columns
Ensure: κ = Classification of the Ê events
1: procedure CLASSIFIER(α, T, Ê, ER, FC)
2:     if Z is even then    ▷ Verifies if the number of regions is even
3:         m'' ← getTraining(α, T)    ▷ Trains evaluation model by using the whole set T
4:     end if
5:     R ← getRegions(T, ER, FC)    ▷ Divides training set into regions
6:     for each r ∈ R do    ▷ Trains an evaluation model for each region
7:         m ← getTraining(α, r)    ▷ Trains evaluation model
8:         M.add(m)    ▷ Stores evaluation model
9:     end for
10:    for each ê ∈ Ê do    ▷ Processes events in Ê
11:        R'' ← getRegions(ê, ER, FC)    ▷ Divides event into regions
12:        for each m ∈ M do    ▷ Gets all event classifications
13:            c ← getEventClass(m, R'')    ▷ Classifies event according to regions
14:            C.add(c)    ▷ Stores classification
15:        end for
16:        if Z is even then    ▷ Verifies if the number of regions is even
17:            c'' ← getEventClass(m'', ê)    ▷ Classifies event according to the whole set T
18:            C.add(c'')    ▷ Adds classification to the set C
19:        end if
20:        κ.add(getFinalClassification(ê, C))    ▷ Gets and stores final event classification
21:    end for
22:    return κ    ▷ Returns classification of Ê events
23: end procedure
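The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal, self-contained illustration and not the authors' implementation: the `CentroidModel` class is a hypothetical stand-in for the classification algorithm α (any classifier exposing a `predict` method could be plugged in), the data are assumed to be already padded, and the collection of votes is rebuilt for every event.

```python
from collections import Counter


class CentroidModel:
    """Hypothetical stand-in for alpha: nearest class centroid."""

    def __init__(self, rows, labels):
        sums, counts = {}, Counter(labels)
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {label: [s / counts[label] for s in acc]
                          for label, acc in sums.items()}

    def predict(self, row):
        def dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))
        return min(self.centroids, key=lambda lab: dist(self.centroids[lab]))


def tdd_classify(train_rows, train_labels, events, er, fc, train=CentroidModel):
    """Classify `events` with one model per region plus, if Z is even,
    one model trained on the whole training set."""
    n_rows, n_cols = len(train_rows), len(train_rows[0])
    if n_rows % er or n_cols % fc:
        raise ValueError("apply the padding rule before calling tdd_classify")
    rows_per, cols_per = n_rows // er, n_cols // fc
    z = er * fc

    # Steps 2-4: whole-set model, needed only when Z is even.
    whole_model = train(train_rows, train_labels) if z % 2 == 0 else None

    # Steps 5-9: one model per region (each region is assumed to contain
    # events of both classes, as required in Section 3.1).
    models = []
    for rb in range(er):
        r0, r1 = rb * rows_per, (rb + 1) * rows_per
        for cb in range(fc):
            c0, c1 = cb * cols_per, (cb + 1) * cols_per
            region = [row[c0:c1] for row in train_rows[r0:r1]]
            models.append(((c0, c1), train(region, train_labels[r0:r1])))

    # Steps 10-21: each model votes on its own slice of the event.
    results = []
    for event in events:
        votes = [model.predict(event[c0:c1]) for (c0, c1), model in models]
        if whole_model is not None:
            votes.append(whole_model.predict(event))
        # With two classes, a majority over Z votes (Z odd) or Z + 1 votes
        # (Z even) yields the same outcome as the rule of Equation 7.
        results.append(Counter(votes).most_common(1)[0][0])
    return results
```

Counting the whole-set classification as an extra vote is a design shortcut of this sketch: with an even Z and two classes, the extra vote changes the outcome only when the region-based votes are evenly split, which is exactly the tie case of Equation 7.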
4 CONCLUSIONS AND FUTURE
DIRECTIONS
This work proposes a Training Data Decomposition (TDD) strategy aimed at improving the performance of an intrusion detection system. It is based on the consideration that, by dividing the training dataset into several regions, it is possible to characterize different scenarios in terms of time (number of events) and characteristics (features), allowing the definition of a more effective analysis model. Each region is used to train an independent evaluation model, and the final classification of each new event is performed by taking into account the classifications of all the involved models, in a fusion fashion regulated by a majority voting criterion. In order to apply the TDD strategy regardless of the dataset used (i.e., different numbers of events and features), some rules have also been formalized.
As a next step, we will experiment with the proposed strategy on a real-world dataset such as, for instance, NSL-KDD¹, which is widely used in the literature and includes a large number of different network events in terms of protocols and intrusions, allowing us to verify the hypothesis behind this work.
REFERENCES
Aloqaily, M., Otoum, S., Al Ridhawi, I., and Jararweh, Y.
(2019). An intrusion detection system for connected
vehicles in smart cities. Ad Hoc Networks, 90:101842.
Amrita, K. K. R. (2018). A hybrid intrusion detection
system: Integrating hybrid feature selection approach
with heterogeneous ensemble of intelligent classifiers.
International Journal of Network Security (IJNS’18),
20(1):41–55.
Anderson, J. P. (1980). Computer security threat monitoring
and surveillance.
Axelsson, S. (2000). Intrusion detection systems: A survey
and taxonomy. Technical report.
Breiman, L. (1996). Bagging predictors. Machine learning,
24(2):123–140.
Bronte, R., Shahriar, H., and Haddad, H. M. (2016). A
signature-based intrusion detection system for web
applications based on genetic algorithm. In Proceed-
ings of the 9th International Conference on Security
of Information and Networks, pages 32–39.
Carta, S., Corriga, A., Ferreira, A., Podda, A. S., and Recu-
pero, D. R. (2020a). A multi-layer and multi-ensemble
stock trader using deep learning and deep reinforce-
ment learning. Applied Intelligence, pages 1–17.
Carta, S., Podda, A. S., Reforgiato Recupero, D. R., and
Saia, R. (2020b). A local feature engineering strategy
to improve network anomaly detection. Future Inter-
net, 12(10):177.
Chang, H.-H. and Meyerhoefer, C. D. (2020). Covid-19 and
the demand for online food shopping services: Empir-
ical evidence from taiwan. American Journal of Agri-
cultural Economics.
Chuang, H.-M., Huang, H.-Y., Liu, F., and Tsai, C.-H.
(2019). Classification of intrusion detection system
based on machine learning. In International Cogni-
tive Cities Conference, pages 492–498. Springer.
Dhawan, S. (2020). Online learning: A panacea in the time
of covid-19 crisis. Journal of Educational Technology
Systems, 49(1):5–22.
Gao, X., Shan, C., Hu, C., Niu, Z., and Liu, Z. (2019). An
adaptive ensemble machine learning model for intru-
sion detection. IEEE Access, 7:82512–82521.
¹ https://github.com/defcom17/NSL_KDD
Jose, S., Malathi, D., Reddy, B., and Jayaseeli, D. (2018). A
survey on anomaly based host intrusion detection sys-
tem. In Journal of Physics: Conference Series, vol-
ume 1000, page 012049. IOP Publishing.
Kanimozhi, V. and Jacob, T. P. (2019). Artificial intelli-
gence based network intrusion detection with hyper-
parameter optimization tuning on the realistic cy-
ber dataset cse-cic-ids2018 using cloud computing.
In 2019 International Conference on Communication
and Signal Processing (ICCSP), pages 0033–0036.
IEEE.
Khraisat, A., Gondal, I., Vamplew, P., and Kamruzzaman,
J. (2019). Survey of intrusion detection systems:
techniques, datasets and challenges. Cybersecurity,
2(1):20.
Latha, S. and Prakash, S. J. (2017). A survey on network at-
tacks and intrusion detection systems. In 2017 4th In-
ternational Conference on Advanced Computing and
Communication Systems (ICACCS), pages 1–7. IEEE.
Le, T.-T.-H., Kim, Y., Kim, H., et al. (2019). Network intru-
sion detection based on novel feature selection model
and various recurrent neural networks. Applied Sci-
ences, 9(7):1392.
Li, Y. and Lu, Y. (2019). Lstm-ba: Ddos detection approach
combining lstm and bayes. In 2019 Seventh Interna-
tional Conference on Advanced Cloud and Big Data
(CBD), pages 180–185. IEEE.
Li, Z., Das, A., and Zhou, J. (2005). Usaid: Unifying
signature-based and anomaly-based intrusion detec-
tion. In Pacific-Asia Conference on Knowledge Dis-
covery and Data Mining, pages 702–712. Springer.
Liao, H.-J., Lin, C.-H. R., Lin, Y.-C., and Tung, K.-Y.
(2013). Intrusion detection system: A comprehensive
review. Journal of Network and Computer Applica-
tions, 36(1):16–24.
Longo, R., Podda, A. S., and Saia, R. (2020). Analysis of a
consensus protocol for extending consistent subchains
on the bitcoin blockchain. Computation, 8(3):67.
Luker, M. A. and Petersen, R. J. (2003). Computer and
network security in higher education. Jossey-Bass San
Francisco, CA.
Mazini, M., Shirazi, B., and Mahdavi, I. (2019). Anomaly
network-based intrusion detection system using a re-
liable hybrid artificial bee colony and adaboost algo-
rithms. Journal of King Saud University-Computer
and Information Sciences, 31(4):541–553.
Meyer, M. L. B. and Labit, Y. (2020). Combining machine
learning and behavior analysis techniques for network
security. In 2020 International Conference on Infor-
mation Networking (ICOIN), pages 580–583. IEEE.
Modi, C., Patel, D., Borisaniya, B., Patel, H., Patel, A., and
Rajarajan, M. (2013). A survey of intrusion detection
techniques in cloud. Journal of network and computer
applications, 36(1):42–57.
Munaiah, N., Meneely, A., Wilson, R., and Short, B. (2016).
Are intrusion detection studies evaluated consistently?
a systematic literature review.
Potluri, S. and Diedrich, C. (2016). High performance intrusion detection and prevention systems: A survey. In ECCWS2016-Proceedings of the 15th European Conference on Cyber Warfare and Security, page 260. Academic Conferences and publishing limited.
Radhakrishnan, K., Menon, R. R., and Nath, H. V. (2019).
A survey of zero-day malware attacks and its detection
methodology. In TENCON 2019-2019 IEEE Region
10 Conference (TENCON), pages 533–539. IEEE.
Rapanta, C., Botturi, L., Goodyear, P., Guàrdia, L., and Koole, M. (2020). Online university teaching during and after the covid-19 crisis: Refocusing teacher presence and learning activity. Postdigital Science and Education, 2(3):923–945.
Rehman, R., Hazarika, G., and Chetia, G. (2011). Malware
threats and mitigation strategies: a survey. Journal
of Theoretical and Applied Information Technology,
29(2):69–73.
Saia, R. (2017). A discrete wavelet transform approach to
fraud detection. In International Conference on Net-
work and System Security, pages 464–474. Springer.
Saia, R. and Carta, S. (2016). A linear-dependence-based
approach to design proactive credit scoring models. In
KDIR, pages 111–120.
Saia, R. and Carta, S. (2017a). Evaluating credit card trans-
actions in the frequency domain for a proactive fraud
detection approach. In SECRYPT, pages 335–342.
Saia, R. and Carta, S. (2017b). A fourier spectral pattern
analysis to design credit scoring models. In Proceed-
ings of the 1st International Conference on Internet of
Things and Machine Learning, page 18. ACM.
Saia, R., Carta, S., et al. (2017). A frequency-domain-
based pattern mining for credit card fraud detection.
In IoTBDS, pages 386–391.
Saia, R., Carta, S., and Fenu, G. (2018a). A wavelet-based
data analysis to credit scoring. In Proceedings of the
2nd International Conference on Digital Signal Pro-
cessing, pages 176–180. ACM.
Saia, R., Carta, S., and Recupero, D. R. (2018b). A
probabilistic-driven ensemble approach to perform
event classification in intrusion detection system. In
KDIR, pages 139–146.
Saia, R., Carta, S., Recupero, D. R., and Fenu, G. (2020).
A feature space transformation to intrusion detection
systems. In KDIR. ScitePress.
Saia, R., Carta, S., Recupero, D. R., Fenu, G., and Stan-
ciu, M. (2019). A discretized extended feature space
(defs) model to improve the anomaly detection per-
formance in network intrusion detection systems. In
KDIR, pages 322–329.
Salahdine, F. and Kaabouch, N. (2019). Social engineering
attacks: A survey. Future Internet, 11(4):89.
Samrin, R. and Vasumathi, D. (2017). Review on anomaly
based network intrusion detection system. In 2017
International Conference on Electrical, Electronics,
Communication, Computer, and Optimization Tech-
niques (ICEECCOT), pages 141–147. IEEE.
Tidjon, L. N., Frappier, M., and Mammar, A. (2019).
Intrusion detection systems: A cross-domain
overview. IEEE Communications Surveys & Tutori-
als, 21(4):3639–3681.
Vieira, E., Ferreira, J., and Bartolomeu, P. C. (2020).
Blockchain technologies for iot applications: Use-
cases and limitations. In 2020 25th IEEE Interna-
tional Conference on Emerging Technologies and Fac-
tory Automation (ETFA), volume 1, pages 1560–1567.
IEEE.
Wazid, M., Zeadally, S., and Das, A. K. (2019). Mobile
banking: evolution and threats: malware threats and
security solutions. IEEE Consumer Electronics Mag-
azine, 8(2):56–60.
Yüksel, B., Küpçü, A., and Özkasap, Ö. (2017). Research issues for privacy and security of electronic health services. Future Generation Computer Systems, 68:1–13.
Zarpelão, B. B., Miani, R. S., Kawakani, C. T., and de Alvarenga, S. C. (2017). A survey of intrusion detection in internet of things. Journal of Network and Computer Applications, 84:25–37.
Zuech, R., Khoshgoftaar, T. M., and Wald, R. (2015). Intru-
sion detection and big heterogeneous data: a survey.
Journal of Big Data, 2(1):3.