A Distributed Intelligent Intrusion Detection System based on Parallel
Machine Learning and Big Data Analysis
Faten Louati
1,4 a
, Farah Barika Ktata
2,4
and Ikram Amous Ben Amor
3,4
1
Faculty of Economics and Management of Sfax, Tunisia
2
Higher Institute of Applied Sciences and Technology of Sousse, Tunisia
3
National School of Electronics and Telecommunications of Sfax, Tunisia
4
Multimedia, InfoRmation Systems and Advanced Computing Laboratory (MIRACL), Tunisia
Keywords:
Intrusion Detection, Big Data, Reinforcement Learning, Multi Agent System.
Abstract:
Networking security continue to be a serious challenge for all domains because of the increasing number of
attacks launched every day due to the advent of connected devices and the emergence of the Internet. Hence,
Intrusion detection system comes into focus, especially with the inception of big data challenges. In this
paper, we propose a distributed and parallel intrusion detection system suitable for big data environments
using machine learning-based multi agent system and big data analysis.
1 INTRODUCTION
Machine Learning (ML) techniques are investigated
for malicious purposes as well as for good purposes.
In fact, Hackers spares no effort to benefit from the
novel technologies and intelligent ML algorithms to
launch novel attacks. For this reason, researchers are
facing a very serious challenge, they should find so-
lutions to secure networks from zero day attacks and
win the war. The largest amount of proposed solu-
tions is based on Intrusion Detection Systems (IDS).
An IDS is a kind of software that monitors, ana-
lyzes networking traffics and send an alert automat-
ically once a malicious activity is detected (Louati
and Ktata, 2020). Two major techniques are used in
the intrusion detection field, namely signature-based
technique (misuse detection) and anomaly-based de-
tection (behavioral detection). Signature-based tech-
nique is based on a set of rules (signatures) stored in a
database and present the attacks known by the system.
The technique consists of comparing the activities of
the network with the stored patterns, if any similarity
is detected, then an alarm is sent automatically to the
administrator. The main advantage of this technique
is that it reduces considerably the false alarm rate.
However, it fails to detect new attacks, even a little
deviation in known attacks can deceive it. However,
hackers quickly learned to use a variety of techniques
a
https://orcid.org/0000-0002-8582-6092
to modify their attacks to avoid detection. Thus, sig-
nature database should be frequently updated which
is a hard task. Anomaly-based detection in turn con-
sists of comparing the activities in the network with
the normal behavior. Hence, the alarm is triggered
when an abnormal event is detected. Therefore, this
technique can efficiently detect known, as well as un-
known attacks. However, it generates a high false
alarm rate, which is the main drawback of this tech-
nique.
Most of the existing and well-used IDSs are
signature-based (Kukielka and Kotulski, 2008), so
they usually fail to detect unseen attacks, also they
are not able to handle big data requirements. To
tackle this problem, we propose an intelligent and dis-
tributed IDS using multi agent system based on paral-
lel ML algorithms.
The remaining part of the paper is organized in the
following way: In section 2 we discuss same previ-
ous works. The description of the proposed IDS and
our contributions are introduced in section 3. Finally,
section 4 concludes the entire work and describes our
future plans.
2 RELATED WORKS
Several previous works are investigated to build an in-
trusion detection mechanism for networks. Recently,
152
Louati, F., Ktata, F. and Ben Amor, I.
A Distributed Intelligent Intrusion Detection System based on Parallel Machine Learning and Big Data Analysis.
DOI: 10.5220/0010886300003118
In Proceedings of the 11th International Conference on Sensor Networks (SENSORNETS 2022), pages 152-157
ISBN: 978-989-758-551-7; ISSN: 2184-4380
Copyright
c
2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
with the emergence of ML, many researches benefit
from it. In instance, (Diro and Chilamkurti, 2017)
presented distributed and parallel IDS for IoT using
DL. Experimental results on the NSL KDD dataset
show that DL provides greater accuracy compared
to other ML algorithms, however, it consumes larger
time in training.
The proposed intrusion detection framework in
(Rathore and Park, 2018) is based on semi-supervised
learning. To this end, the authors combine two algo-
rithms, namely, Extreme Learning Machine (ELM),
in which the neural network is composed of a sin-
gle hidden layer, and semi supervised Fuzzy c-means
(SFCM) which is based on the concept of different
types of prior knowledge to enhance clustering per-
formance. In this paper labels are considered as the
prior knowledge.
(Pajouh et al., 2018) used deep recurrent neural
network (RNN) to detect malware by analyzing ap-
plication’s operation codes. The evaluation of the pro-
posed model achieved an accuracy of 98%.
The work in (Mahmudul et al., 2019) compared
different ML algorithms, namely, logistic Regression,
support vector machine (SVM), decision tree, random
forest and artificial neural network and the results
show that random forest is the most efficient tech-
nique in detect attacks for IoT networks with 99.4%
of accuracy rate.
(Zhang et al., 2019) proposed an intrusion detec-
tion scheme using genetic algorithm (GA) and deep
believe network (DBN). Evaluation on NSL KDD
dataset provided more than 99% of detection rate.
Multi agent system (MAS) approach has been also
exploited in detection task. In instance, (Achbarou
et al., 2018) proposed a distributed IDS (DIDS) for
cloud computing that combines both misuse detec-
tion and anomaly detection techniques. The pro-
posed model is based on a number of agents. The
authors performed a comparison between their model
and similar systems that not used MAS and they con-
cluded that IDS with MAS worked better in term of
efficiency and detection time.
Other works used MAS approach in intrusion de-
tection such as (Al-Yaseen et al., 2016) in which au-
thors used two classifiers with MAS namely SVM and
ELM, and (Mehmood et al., 2018) which used naive
Bayes algorithm.
There are some existing works that used reinforce-
ment learning (RL) algorithm with MAS technology
in intrusion detection task for IoT such as (Servin and
Kudenko, 2007), however, there is a lack of commu-
nication between agents which is an essential require-
ment to create MAS.
(Arel et al., 2010) provided an IDS based on DRL
for cloud computing.
(Nie et al., 2021) introduce an IDS for green IoT
(composed of a number of devices connected to ap-
plication) based on DRL, the model performs good
results (more than 99% of prediction rate).
(Sethi et al., 2020), (Gu et al., 2020) and (Arel
et al., 2010) propose an IDS for cloud computing and
IoT based on DRL.
Ather works investigated zero shot learning (ZSL)
such as, (Zhang et al., 2020) who used ZSL based on
autoencoder. This method gives 88.3% of accuracy
rate with NSL KDD dataset.
(Zerhoudi et al., 2020) proposed an IDS using zero
shot recognition via graph embedding. The results
show an accuracy rate = 88.3% in NSL KDD dataset.
Although those cited works performed very good
results, their proposed solutions are not suitable for
new environments based on big data. Hence, they are
not able to deal with big data challenges.
Few papers focus on Intrusion detection systems
in big data frameworks (Hassan et al., 2020), such as
the work of (Terzi et al., 2017) which detects network
anomaly from Netflow data using clustering algo-
rithm. The work was tested on the CTU-i3 dataset and
performs 96% of accuracy rate. However, it causes
hiqh false alarm rate (FAR).
In addition, (Hassan et al., 2020) used conven-
tional neural network (CNN) and weight dropped
long short-term memory (WDLSTM) network in big
data context. Experiments are performed on UNSW-
NB15 and show good results (e.g. accuracy=97.17%).
3 METHODOLOGY
3.1 Concepts
3.1.1 Big Data
The term big data dates back to 2005, it is typically
defined with three words: volume, velocity and vari-
ety (the famous 3Vs of big data):
Volume: Size of the data; Big data refers to a huge
amount of data.
Velocity: The speed of the data; Big data charac-
terized by its speed of movement.
Variety: Diversity of data; Varied Big data are
coming from diverse sources, it could be struc-
tured, non structured or semi structured.
A Distributed Intelligent Intrusion Detection System based on Parallel Machine Learning and Big Data Analysis
153
3.1.2 Multi Agent System
Multi agent system is a collection of entities called
agents that are communicating with each other. Ac-
cording to the definition of FIPA (Foundation for the
Intelligent Physical Agent), an agent is a kind of en-
tity with autonomy, activity, mobility, re-activity, so-
ciality, intelligence and other features (FIPA, 2002)
(Wooldridge, 2009). (Oprea, 2004) defines the multi-
agent system equation as it ”states that in a multi-
agent system a task is solved by agents that commu-
nicate among them.
3.2 Proposed Solution
Our solution consists of merging machine learning al-
gorithms with Big data analysis techniques to create
an intrusion detection system able to perform detec-
tion in new big data environments. Big data means
a huge amount of data and that what new machine
learning algorithms need to make their models.
Also benefiting from multi agent system is a
promising idea as multi agent systems help to im-
prove reliability and availability and ensure informa-
tion collecting, sharing and processing in distributed
networks.
Figure 1 depicts the structure of our solution: The
model is composed of a combination of multi agent
system, machine learning algorithm and big data ana-
lytic. Firstly, the network traffic is collected by Agent
Sniffer then sent to the Agent preprocessor to be pro-
cessed. After that, the data are sent to Hadoop frame-
work to be analyzed in parallel manner. This frame-
work in turn is composed of a name node which con-
sists of an Agent decision maker and a number of data
nodes (at least three nodes). The agent decision maker
runs reinforcement learning (RL) algorithm -which is
sub-filed of machine learning- in the name node and
detect intrusions.
RL is about an agent interacting with the envi-
ronment, learning an optimal policy, by trial and er-
ror (and not by simple recognition of data that may
have been provided to it which is the case of tradi-
tional ML techniques): the program repeats one or
more experiences a large number of times in order to
learn to take good choices and avoid bad choices in a
given situation. This learning method allows agents
to understand the environment in which they operate,
to build their own decision-making mechanism, and
to act with more rationality. Moreover, this learning
technique could ultimately make agents even smarter
by allowing them to react to unexpected situations.
See (Li, 2017) for deep details.
Due to the dynamic changes of networks, new
kinds of attacks are emerging every day. DRL could
be a good choice to deal with this challenge as it is
enable the agent to learn with a rational manner by
optimizing its choices from previous experiences sim-
ulated by himself.
All the agents in the MAS ensure communications
between them.
Merging Big data analysis, recent machine learn-
ing algorithms and multi agent system seems promis-
ing solution since it performs 1) parallels of the pro-
cessing of large volume of data by means Hadoop
framework, 2) Excellent classification of the data us-
ing efficient machine learning classifiers and 3) En-
hance time consumption and performance using multi
agent system.
3.3 Experiments
This paper presents a work in progress, the implemen-
tation of this framework is not yet achieved. So far,
we build an IDS based on Deep Reinforcement learn-
ing agent, This agent is expected to be executed in the
name node.
RL agent is trained on NSL KDD which is a very
large dataset so it well represents the context of big
data. NSL KDD (NSL, ) is an enhanced version of
KDD 99 benchmark dataset (Ring et al., 2019) and is
well known and wide used by researchers in intrusion
detection field. Figure2 describes the dataset which
is composed of three sub-datasets KDDTrain+20%
composed of 25192 samples, KDDTrain+ composed
of 125973 samples and KDDTest+ composed of
22544 samples. The dataset is about 42 feature; the
last one presents whether the record is normal or one
of the four known networking attacks namely, Denial
of Service (DoS) Probe, User to Root (U2R) and Re-
mote to Local (R2L).
Firstly, the dataset was preprocessed. The prepro-
cessing phase is composed of three steps 1)Numeri-
calization; as there are symbolic and categorical fea-
tures in the dataset 2)Removing attributes with miss-
ing data 3) Data scaling because the data have varying
ranges.
The RL agent is based on deep Q network(DQN)
algorithm which the neural network is composed of 1
input layer (122 neurons, i.e the number of features
of NSL KDD data set after being preprocessed), three
hidden layers composed of 80, 50 and 20 neurons re-
spectively and an output layer composed of 5 neurons
which refer to the five classes (normal, DoS, U2R,
Probe and R2L). All the layers are fully connected
and use Relu as activation function. Tables 2 and 1
describe the parameters of the model which performs
good results (see Table 3).
SENSORNETS 2022 - 11th International Conference on Sensor Networks
154
Figure 1: The sructure of the proposed solution.
Figure 2: Description of NSL KDD dataset (Saporit, 2019).
Table 1: Reinforcement learning components.
Algorithm Environment States Actions Rewards
DQN Network Features of the dataset attack types + normal 1 if good classification
0 if not
The objective of this paper is to share our solution
and explain our idea. The next step consists of :
1) Improving the results
2) Implementing the whole proposed scheme, i.e.
Hadoop and multi agent system
3) Performing experimentation with the dataset as
well as real network traffic
4) Performing near-real time detection.
A Distributed Intelligent Intrusion Detection System based on Parallel Machine Learning and Big Data Analysis
155
Table 2: Reinforcement learning parameters.
Policy Epsilon greedy
Optimizer Adam
batch size 1
minibatch size 500
decay rate 0.99
gamma 0.001
Table 3: Experimental results.
Accuracy 75.65%
Precision 79.51%
F1 score 72.58%
Recall 75.65%
4 CONCLUSION
In this paper, we presented a short description of our
proposed solution for security problem in networks.
We introduced firstly the general context of this re-
search, then we listed a number of existing solution.
in literature. Next we described our solution and our
contributions. This work is an initial proposal; The
next steps are implementation and evaluation of our
proposed model using conventional metrics to show
its efficiency in the detection of zero-day attacks in
real time.
REFERENCES
Nsl kdd. https://www.unb.ca/cic/datasets/nsl.html.
Achbarou, O., Kiram, M., Bourkoukou, O., and Elbouanani,
S. (2018). A Multi-agent System-Based Distributed
Intrusion Detection System for a Cloud Comput-
ing: MEDI 2018 International Workshops, DETECT,
MEDI4SG, IWCFS, REMEDY, Marrakesh, Morocco,
October 24–26, 2018, Proceedings, pages 98–107.
Al-Yaseen, A. P. D. W., Othman, Z., and Ahmad Nazri,
M. Z. (2016). Real-time multi-agent system for an
adaptive intrusion detection system. Pattern Recogni-
tion Letters, 85.
Arel, I., Liu, C., Urbanik, T., and Kohls, A. (2010). Rein-
forcement learning-based multi-agent system for net-
work traffic signal control. Intelligent Transport Sys-
tems, IET, 4:128 – 135.
Diro, A. and Chilamkurti, N. (2017). Distributed attack de-
tection scheme using deep learning approach for inter-
net of things. Future Generation Computer Systems.
FIPA (2002). Fipa abstract architecture specifica-
tion. Available from: http://www. fipa.org/
specs/fipa00001/SC00001L.html.
Gu, T., Abhishek, A., Fu, H., Zhang, H., Basu, D., and
Mohapatra, P. (2020). Towards learning-automation
iot attack detection through reinforcement learning.
In 2020 IEEE 21st International Symposium on ”A
World of Wireless, Mobile and Multimedia Networks”
(WoWMoM), pages 88–97.
Hassan, M. M., Gumaei, A. H., Alsanad, A., Alrubaian, M.,
and Fortino, G. (2020). A hybrid deep learning model
for efficient intrusion detection in big data environ-
ment. Inf. Sci., 513:386–396.
Kukielka, P. and Kotulski, Z. (2008). Analysis of differ-
ent architectures of neural networks for application in
intrusion detection systems. 2008 International Mul-
ticonference on Computer Science and Information
Technology, pages 807–811.
Li, Y. (2017). Deep reinforcement learning: An overview.
Louati, F. and Ktata, F. (2020). A deep learning-based
multi-agent system for intrusion detection. SN Applied
Sciences, 2.
Mahmudul, H., Islam, M, Z., and M.M.A., H. (2019). At-
tack and anomaly detection in iot sensors in iot sites
using machine learning approaches. 7.
Mehmood, A., Mukherjee, M., Ahmed, S. H., Song, H., and
Malik, K. (2018). Nbc-maids: Na
¨
ıve bayesian classi-
fication technique in multi-agent system-enriched ids
for securing iot against ddos attacks. The Journal of
Supercomputing, 74.
Nie, L., Sun, W., Wang, S., Ning, Z., Rodrigues, J. J.
P. C., Wu, Y., and Li, S. (2021). Intrusion detection
in green internet of things: A deep deterministic pol-
icy gradient-based algorithm. IEEE Transactions on
Green Communications and Networking, 5(2):778–
788.
Oprea, M. (2004). Applications of multi-agent systems. In
IFIP Congress Tutorials.
Pajouh, H. H., Dehghantanha, A., Khayami, R., and Choo,
K.-K. R. (2018). A deep recurrent neural network
based approach for internet of things malware threat
hunting. Future Gener. Comput. Syst., 85:88–96.
Rathore, S. and Park, J. (2018). Semi-supervised learning
based distributed attack detection framework for iot.
Appl. Soft Comput., 72:79–89.
Ring, M., Wunderlich, S., Scheuring, D., Landes, D., and
Hotho, A. (2019). A survey of network-based intru-
sion detection data sets. Comput. Secur., 86:147–167.
Saporit, G. (2019). A deeper dive into the nsl-kdd
data set. https://towardsdatascience.com/a-deeper-
dive-into-the-nsl-kdd-data-set-15c753364657.
Servin, A. and Kudenko, D. (2007). Multi-agent reinforce-
ment learning for intrusion detection. In Adaptive
Agents and Multi-Agents Systems.
Sethi, K., Kumar, R., Prajapati, N., and Bera, P. (2020).
Deep reinforcement learning based intrusion detection
system for cloud infrastructure. In 2020 International
Conference on COMmunication Systems NETworkS
(COMSNETS), pages 1–6.
Terzi, D. S., Terzi, R., and Sagiroglu, S. (2017). Big data
analytics for network anomaly detection from netflow
data. In 2017 International Conference on Computer
Science and Engineering (UBMK), pages 592–597.
Wooldridge, M. (2009). An Introduction to MultiAgent Sys-
tems. Wiley Publishing, 2nd edition.
Zerhoudi, S., Granitzer, M., and Garchery, M. (2020). Im-
proving intrusion detection systems using zero-shot
recognition via graph embeddings. In 2020 IEEE 44th
SENSORNETS 2022 - 11th International Conference on Sensor Networks
156
Annual Computers, Software, and Applications Con-
ference (COMPSAC), pages 790–797.
Zhang, Y., Li, P., and Wang, X. (2019). Intrusion detection
for iot based on improved genetic algorithm and deep
belief network. IEEE Access, 7:31711–31722.
Zhang, Z., Liu, Q., Qiu, S., Zhou, S., and Zhang, C. (2020).
Unknown attack detection based on zero-shot learn-
ing. IEEE Access, 8:193981–193991.
A Distributed Intelligent Intrusion Detection System based on Parallel Machine Learning and Big Data Analysis
157