Authors:
Fouad Al Tfaily (1, 2); Zakariya Ghalmane (1); Mortada Termos (1, 2); Mohamed-el-Amine Brahmia (1); Ali Jaber (2) and Mourad Zghal (1)
Affiliations:
(1) CESI LINEACT UR 7527, Strasbourg, France
(2) Computer Science Department, Faculty of Sciences, Lebanese University, Beirut, Lebanon
Keyword(s):
Complex Networks, Internet of Things, Artificial Intelligence, Cyber Security, Federated Learning, Network Properties, Intrusion Detection.
Abstract:
In the cybersecurity community, finding suitable datasets for evaluating Intrusion Detection Systems (IDS) is challenging, particularly because available datasets show limited diversity in their complex network properties. This paper proposes a dual-purpose approach that generates diverse datasets while also producing compact versions that maintain detection accuracy. The approach employs three techniques (community mixing modification, centrality-based modification, and time-based modification), each targeting specific network properties while reducing dataset size by up to 81.5%. It is validated on real-world datasets, including NF-UQ-NIDS, CCD-INID-V1, and TON-IoT, demonstrating its ability to generate realistic datasets while preserving network properties, attack patterns, and structural integrity. The generated datasets exhibit diverse complex network properties, making them particularly useful for evaluating IDS techniques that incorporate complex network measures. The reduced size and preserved accuracy (96.4%) make these datasets especially valuable for resource-constrained environments. Moreover, the approach facilitates the construction of homogeneous datasets for federated learning settings, where similar data distributions across clients are essential. This contribution helps address both dataset scarcity and computational efficiency while ensuring that the generated datasets retain the characteristics of real-world network traffic.
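To make the idea of a centrality-guided reduction concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes each flow record is a hypothetical `(src, dst, label)` tuple, treats hosts as nodes of an undirected graph, ranks them by normalized degree centrality, and keeps only flows that touch one of the `top_k` most central hosts. The function names and the selection rule are illustrative assumptions.

```python
def degree_centrality(flows):
    """Normalized degree centrality per host, from (src, dst, label) flows.

    Hosts are nodes; an undirected edge links two hosts that exchange
    at least one flow. Self-loops are ignored.
    """
    neighbors = {}
    for src, dst, _label in flows:
        neighbors.setdefault(src, set())
        neighbors.setdefault(dst, set())
        if src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    n = len(neighbors)
    if n <= 1:
        return {host: 0.0 for host in neighbors}
    return {host: len(nb) / (n - 1) for host, nb in neighbors.items()}


def reduce_by_centrality(flows, top_k):
    """Keep flows with at least one endpoint among the top_k central hosts.

    Assumption: this selection rule is illustrative only; ties are broken
    by host name so the result is deterministic.
    """
    centrality = degree_centrality(flows)
    ranked = sorted(centrality, key=lambda h: (-centrality[h], h))
    central = set(ranked[:top_k])
    return [f for f in flows if f[0] in central or f[1] in central]


def size_reduction(original, reduced):
    """Fraction of flow records removed by the reduction."""
    return 1 - len(reduced) / len(original)


if __name__ == "__main__":
    flows = [
        ("a", "b", "benign"),
        ("a", "c", "attack"),
        ("a", "d", "benign"),
        ("e", "f", "benign"),
    ]
    kept = reduce_by_centrality(flows, top_k=1)
    print(len(kept), size_reduction(flows, kept))
```

In practice one would also check that the reduced set still contains each attack class and that the network measures of interest stay within the intended range, since those are the properties the abstract says must be preserved.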