investigated for their ability to identify unusual
patterns in network data, and intrusion detection has
long been an important focus area in cybersecurity
research. Despite the effectiveness of these models,
open questions about scalability, feature relevance, and
adaptability motivate continued research into robust
hybrid strategies (Mhawi, Aldallal, and Hassan 2022).
Despite their excellent classification accuracy, SVMs
often scale poorly because of their high
computational complexity and sensitivity to
hyperparameter settings, especially when applied to
large-scale or high-dimensional network traffic data
(Sebakor 2023). The proposed model attempts to address issues
that enterprise networks face, such as strong
authentication, high data security, multilevel
protection, network traffic encryption, protection
against insider intrusion and masquerading, IPsec,
port forwarding, internet traffic filtering, and web
access policies for users. A threat prevention policy is
required to ensure security protection (Arefin et al.
2021; Nife and Kotulski 2020). By utilising ensemble
learning, which combines several decision trees to
increase overall prediction stability, Random Forests
(RFs) provide a strong alternative that mitigates the
scalability and tuning drawbacks of SVMs. Compared
to single classifiers, RFs are less likely to overfit and
naturally reduce variance.
They are ideal for intrusion detection because of their
capacity to handle high-dimensional data and evaluate
the significance of features. Additionally, RFs work
well on unbalanced datasets, preserving good recall
for minority classes, which is crucial for security
applications such as NGFWs (Jaw and Wang 2021).
Because RFs have built-in procedures for evaluating
feature importance—a critical component of
interpretability and model optimization—they are
especially well-suited for multiclass classification
problems. More recently, deep learning techniques
based on convolutional and recurrent neural networks
have been proposed for intrusion detection systems
(IDS) to identify spatial and temporal patterns in
network traffic. Although powerful, these models are
often resource-intensive, making them unsuitable for
real-time security systems such as Next-Generation
Firewalls (NGFWs) (Neupane, Haddad, and Chen 2018;
Gold 2011). Simultaneously, NGFWs have advanced
beyond conventional packet inspection to use machine
learning (ML) for behavioural analysis and intelligent
traffic profiling (Wang et al. 2023). Nevertheless, many
of these integrations give insufficient attention to
optimised feature selection, which can raise
processing overhead and reduce detection accuracy.
This research builds on the strengths of current ML-
based IDS models while focusing specifically on their
integration within the operational context of NGFWs.
By combining hybrid feature selection (filter + wrapper
methods) with an efficient RF classifier, the proposed
framework is designed to enhance real-time detection
performance and threat mitigation capabilities
(Wang et al. 2023; Yin et al. 2023). It
directly addresses the challenge of balancing detection
accuracy and computational efficiency—a critical
factor in the deployment of ML-enhanced security
solutions in dynamic and large-scale networks
(Golrang et al. 2020).
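As an illustration of the hybrid feature selection idea described above, the following minimal sketch chains a filter step (mutual information ranking) with a wrapper step (recursive feature elimination around a small Random Forest) before the final RF classifier. It uses scikit-learn and synthetic data. The specific filter and wrapper methods, the feature counts, and all hyperparameters are assumptions chosen for illustration, not the configuration used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for connection records (41 features, 3 classes).
X, y = make_classification(
    n_samples=600, n_features=41, n_informative=10,
    n_redundant=5, n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Filter stage: keep the 20 features with the highest mutual information.
    ("filter", SelectKBest(score_func=mutual_info_classif, k=20)),
    # Wrapper stage: recursively drop the least important features,
    # as judged by a small Random Forest, until 10 remain.
    ("wrapper", RFE(
        estimator=RandomForestClassifier(n_estimators=25, random_state=0),
        n_features_to_select=10, step=2,
    )),
    # Final classifier trained on the reduced feature set.
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(X_train, y_train)

n_selected = int(pipeline.named_steps["wrapper"].support_.sum())
accuracy = pipeline.score(X_test, y_test)
importances = pipeline.named_steps["rf"].feature_importances_
print(f"features kept: {n_selected}, test accuracy: {accuracy:.3f}")
```

The cheap filter step runs first so that the more expensive wrapper only searches a reduced candidate set, which is one way to trade a small amount of selection quality for lower computational cost.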
3 METHODOLOGY
3.1 Dataset and NGFW Context
We use the CIC-IDS2018 and KDD Cup 1999 datasets to
replicate diverse traffic patterns encountered in
NGFW environments, including both benign and
malicious traffic. The KDD Cup 1999 dataset contains
41 features representing connection-level attributes
and labels for five categories: Normal, DoS, Probe,
R2L, and U2R. The features comprise basic TCP/IP
attributes (e.g., duration, protocol type, src_bytes),
content-specific indicators (e.g., num_failed_logins,
hot), and traffic behaviour-based statistics (e.g.,
count, srv_diff_host_rate).
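The feature groups and the five traffic categories can be made concrete with a small sketch. The grouping below lists only the example features named above, and the attack-to-category mapping covers a handful of well-known KDD Cup 1999 attack labels. Both are illustrative subsets, and the helper name `to_category` is hypothetical.

```python
# Example features from each group named in the text (not the full set of 41).
FEATURE_GROUPS = {
    "basic": ["duration", "protocol_type", "src_bytes"],
    "content": ["num_failed_logins", "hot"],
    "traffic": ["count", "srv_diff_host_rate"],
}

# Illustrative subset of KDD Cup 1999 attack labels and their categories.
ATTACK_CATEGORY = {
    "normal": "Normal",
    "neptune": "DoS", "smurf": "DoS", "back": "DoS",
    "portsweep": "Probe", "nmap": "Probe", "satan": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}


def to_category(raw_label: str) -> str:
    """Map a raw label such as 'smurf.' to one of the five categories."""
    key = raw_label.strip().rstrip(".").lower()
    # Labels outside the illustrative subset are flagged rather than guessed.
    return ATTACK_CATEGORY.get(key, "Unknown")


labels = ["normal.", "smurf.", "nmap.", "guess_passwd.", "rootkit."]
categories = [to_category(label) for label in labels]
print(categories)
```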
3.2 NGFW Emulation Environment
We simulate a Next-Generation Firewall (NGFW)
using a three-stage detection pipeline, in which each
stage plays a crucial role in identifying, classifying, and
acting on potentially malicious network traffic. Each
stage is described in detail below.
3.2.1 Traffic Classification
The goal of this stage is to accurately identify the type
of network traffic (e.g., benign, suspicious, malicious)
using machine learning models trained on engineered
features.
A. Data Collection
Collect real-time or batch network traffic data,
such as:
• Packet metadata (IP addresses, port numbers,
protocol types)
• Flow features (duration, byte count, packet
count)
• Application-layer metadata (HTTP headers,
TLS handshake info)
• Payload content (if available and privacy-
compliant)
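One way to hold the four kinds of collected data listed above is a single per-flow record. The following is a minimal sketch. The class name `FlowRecord`, its field names, and the helper `to_feature_dict` are hypothetical and chosen only for illustration. The payload field is optional to reflect that payload content may be unavailable or excluded for privacy reasons.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FlowRecord:
    # Packet metadata
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    # Flow features
    duration_s: float
    byte_count: int
    packet_count: int
    # Application-layer metadata (e.g., HTTP headers, TLS handshake info)
    app_metadata: Dict[str, str] = field(default_factory=dict)
    # Payload content, present only if available and privacy-compliant
    payload: Optional[bytes] = None

    def to_feature_dict(self) -> Dict[str, float]:
        """Return simple numeric flow features for a downstream classifier."""
        mean_packet_size = (
            self.byte_count / self.packet_count if self.packet_count else 0.0
        )
        return {
            "duration_s": self.duration_s,
            "byte_count": float(self.byte_count),
            "packet_count": float(self.packet_count),
            "mean_packet_size": mean_packet_size,
            "has_payload": float(self.payload is not None),
        }


record = FlowRecord(
    src_ip="10.0.0.5", dst_ip="192.0.2.10", src_port=51234, dst_port=443,
    protocol="tcp", duration_s=1.5, byte_count=3000, packet_count=6,
    app_metadata={"tls_version": "1.3"},
)
features = record.to_feature_dict()
print(features)
```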