Authors:
Xiaojuan Cai (1) and Hiroshi Koide (2)
Affiliations:
(1) Department of Information Science and Technology, Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan; (2) Section of Cyber Security for Information Systems, Research Institute for Information Technology, Kyushu University, Fukuoka, Japan
Keyword(s):
Data Exfiltration, Command and Control Channel, Transfer Size Limitation, Advanced Persistent Threat, Deep Learning, Ensemble Tree, Extreme Gradient Boosting, Internet Traffic.
Abstract:
Data exfiltration by Advanced Persistent Threats (APTs) is a critical concern for high-value entities such as governments, large enterprises, and critical infrastructures, as attackers deploy increasingly sophisticated and stealthy tactics. Although extensive research has focused on detecting and halting APTs at the onset of an attack (e.g., examining data exfiltration over Domain Name System tunnels), little attention has been paid to detecting sensitive data exfiltration once an APT has gained a foothold in the victim system. To address this gap, this paper analyzes data exfiltration detection from two new perspectives: exfiltration over a command-and-control channel and limitations on exfiltration transfer size, assuming that APT attackers have established a presence in the victim system. We introduce two detection mechanisms (Transfer Lifetime Volatility and Transfer Speed Volatility) and propose an ensemble deep learning tree model, EDeepXGB, based on eXtreme Gradient Boosting, to analyze data exfiltration from these perspectives. Compared with eight deep learning models (four deep neural networks and four convolutional neural networks) and four traditional machine learning models (Naive Bayes, Quadratic Discriminant Analysis, Random Forest, and AdaBoost), our approach demonstrates competitive performance on the latest public real-world dataset (Unraveled-2023), with a Precision of 91.89%, a Recall of 93.19%, and an F1-Score of 92.49%.