ent proportions (Ahmad et al., 2022). For example,
when looking at datasets to study the effects of cy-
ber attacks on network traffic flow, most of the data is
normal (no attack behaviour), and only a tiny amount
is attack data (Churcher et al., 2021).
Thus, it is essential to combat Denial of Service
(DoS) attacks. Although research on detecting DoS
attacks has grown over the past few years, such
attacks remain a major problem for IoT apps (Adat
and Gupta, 2018). DoS attacks on IoT apps usually
have significant effects, mainly because these apps are
built on resource-constrained sensor devices (Adat and
Gupta, 2018). An IDS is regarded as one of the most
effective methods for detecting DoS attacks. IDSs
monitor system activity in order to identify and block
malicious traffic. Attacks can be detected by learning
the normal pattern and volume of network traffic and
flagging deviations from it (Almomani et al.,
2016).
This paper proposes an undersampling technique for
multiclass data balancing, called Multiclass
Similarity-Based Selection (MSBS), and uses it to build
a classification IDS for IoT apps. We used the WSN-DS
dataset, a multiclass imbalanced dataset with five
classes: four cyber-DoS attacks (blackhole, grayhole,
flooding, and scheduling) and normal (no attack).
The proposed technique balances the dataset by
reducing the sample size of the majority classes.
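The general idea of balancing by shrinking the majority classes can be sketched with plain random undersampling. This is only an illustration of the baseline idea (closer to RUS than to MSBS, whose similarity-based selection rule is not reproduced here). The function name `random_undersample`, the toy class counts, and the choice to shrink every class to the size of the smallest one are assumptions made for the example, not details taken from the paper or from WSN-DS.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class.

    Hypothetical helper for illustration; it is not the MSBS selection rule.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Assumption: the target size is the size of the smallest class.
    target = min(len(items) for items in by_class.values())

    # Draw `target` samples without replacement from each class.
    balanced = []
    for y, items in by_class.items():
        for x in rng.sample(items, target):
            balanced.append((x, y))

    rng.shuffle(balanced)
    xs = [x for x, _ in balanced]
    ys = [y for _, y in balanced]
    return xs, ys


# Toy imbalanced labels (made-up counts, not the WSN-DS distribution).
labels = ["normal"] * 90 + ["blackhole"] * 6 + ["flooding"] * 4
samples = list(range(len(labels)))  # sample i stands in for a feature vector

xs, ys = random_undersample(samples, labels)
print(Counter(labels))  # class counts before balancing
print(Counter(ys))      # class counts after balancing
```

Random selection discards majority samples blindly, which is the weakness that more selective undersampling schemes try to address.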
To test the proposed method, we compared it with
Random Undersampling (RUS) (Leevy et al., 2021) and
the multi-label approach for Tomek Link undersampling
(MLTL) (Pereira et al., 2020). To evaluate the three
undersampling techniques, we used three widely used
machine learning algorithms: k-Nearest Neighbours
(kNN), Logistic Regression, and Naive Bayes. In
addition, the evaluation metrics accuracy, precision,
sensitivity, specificity, F-measure, area under the
curve (AUC), and G-means were used to compare the
classification performance of the proposed technique
with that of the other two undersampling techniques.
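Most of the metrics listed above can be derived from confusion-matrix counts. The sketch below computes them for one class treated as positive against all others. It assumes the common binary definitions, in particular G-mean = sqrt(sensitivity × specificity); the paper may define or average these differently for the multiclass case. AUC is omitted because it requires classifier scores rather than hard predictions. The function name and the toy labels are hypothetical.

```python
import math


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Compute confusion-matrix metrics for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall / true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)  # assumed definition
    accuracy = ratio(tp + tn, len(pairs))

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Toy labels (made up): 8 normal and 2 flooding samples.
y_true = ["normal"] * 8 + ["flooding"] * 2
y_pred = ["normal"] * 7 + ["flooding"] + ["flooding", "normal"]

metrics = one_vs_rest_metrics(y_true, y_pred, positive="flooding")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy is 0.8 while sensitivity for the minority class is only 0.5, which illustrates why accuracy alone can be misleading on imbalanced data and why sensitivity, specificity, and G-mean are reported alongside it.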
The following is a summary of the main contribu-
tions of this study:
1. An undersampling technique was developed for IDS
to detect cyber-DoS attacks, and its effectiveness in
a big data environment was confirmed.
2. The MSBS undersampling technique was shown to
detect cyber-DoS attacks in IoT apps better than the
compared undersampling techniques from the
literature.
3. The proposed method was evaluated with three
distinct machine learning classification algorithms to
assess its effectiveness. The results showed that
the proposed method performed significantly better
than the methods described in the literature.
The rest of the paper is organized as follows: Section 2
reviews related work. Section 3 provides an
overview of the WSN-DS dataset used to classify
cyber-DoS attacks and the proposed MSBS under-
sampling technique. Section 4 presents the results and
discussion. Finally, conclusions and suggestions for
future research are presented in Section 5.
2 LITERATURE REVIEW
In recent years, numerous IDSs have been proposed
in the literature and are used to monitor IoT devices
against various cyber-DoS attacks.
(Almomani et al., 2016) created a specialized
dataset for wireless sensor networks (WSNs), which
they called WSN-DS. This dataset was based on the
network traffic in wireless sensor nodes and included
four types of cyber-DoS attacks: blackhole, grayhole,
flooding, and scheduling. Using this dataset, the
authors trained an artificial neural network (ANN) to
detect and classify DoS attacks without considering
the dataset's balancing. Their experiments showed that
DoS attacks were detected more accurately when one
hidden layer was used.
(Kumari and Mehta, 2020) developed an
ensemble-based intrusion detection model using vari-
ous ML classification algorithms, including Decision
Tree, J48, and Support Vector Machine (SVM). The
nine most relevant and significant intrusion detection
features from the KDD99 dataset were determined
using particle swarm optimization. The proposed
model produced results that were 90% more accurate.
(Pokharel et al., 2020) presented a hybrid IDS
model combining Naive Bayes and SVM. A real-time
historical log dataset was normalized and preprocessed
for this study. After enhancement, the proposed model
achieved 95% accuracy and precision. In addition, the
authors demonstrated that classifier performance
improved when session-based features were added.
(Kumari and Mehta, 2020) evaluated Bayesian
networks and RandomTree classifiers with ensemble
learning. On the KDDcup99 dataset, the ensemble
IDS model was compared to base classifiers in terms of
accuracy, precision, and recall. The study concludes
that the proposed model improves precision and recall
more than it improves accuracy, and claims that the
IDS performs well on the whole dataset regardless of
sample size. Furthermore, the Bayesian network
performs better on small datasets, while RandomTree
does better on large ones.
(Vinayakumar et al., 2019) proposed a scal-
able, hybrid DNN framework called Scale-Hybrid-
ICAART 2023 - 15th International Conference on Agents and Artificial Intelligence