PDF Malware Detection based on Stacking Learning

Maryam Issakhani, Princy Victor, Ali Tekeoglu, Arash Lashkari

2022

Abstract

Over the years, Portable Document Format (PDF) has become the most popular content-presentation format among users because of its flexibility and ease of use. However, advanced features such as JavaScript and file embedding make PDF files an attractive target for attackers. Because of the complex PDF structure and the sophistication of attacks, traditional detection approaches such as antivirus software rely on signature-based techniques and can detect only specific types of threats. Although state-of-the-art research uses AI to achieve a higher PDF malware detection rate, evasive malicious PDF files remain a security threat. This paper proposes a framework to address this gap by extracting 28 representative static features from PDF files, 12 of which are novel, and feeding them to stacking ML models to detect evasive malicious PDF files. We evaluated our solution on two datasets: Contagio and a newly generated evasive PDF dataset (Evasive-PDFMal2022). In the first evaluation, we achieved an accuracy of 99.89% and an F1-score of 99.86%, which outperforms existing models. We then re-evaluated the proposed model on the newly generated evasive PDF dataset (Evasive-PDFMal2022), an improved version of Contagio. On this dataset, we achieved an accuracy of 98.69% and an F1-score of 98.77%, demonstrating the effectiveness of the proposed model. A comparison with state-of-the-art methods shows that the proposed work is more resilient in detecting evasive malicious PDF files.
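To make the idea of static feature extraction concrete, the following is a minimal sketch that counts a few structural keywords in the raw bytes of a PDF file. The keyword list, the function name extract_static_features, and the feature names are assumptions chosen for illustration; they are not the 28 features used in the paper, which the abstract does not enumerate. A feature vector of this kind would then be passed to a classifier such as a stacking ensemble.

```python
import re

# Hypothetical keyword list for illustration only; NOT the paper's feature set.
KEYWORDS = (b"/JavaScript", b"/JS", b"/EmbeddedFile", b"/OpenAction")


def extract_static_features(pdf_bytes: bytes) -> dict:
    """Return simple static counts computed from raw PDF bytes (no parsing)."""
    features = {"size_bytes": len(pdf_bytes)}
    for kw in KEYWORDS:
        # Reject matches followed by a letter so that /JS does not match /JSfoo.
        pattern = re.escape(kw) + rb"(?![A-Za-z])"
        features[kw.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    # Count "obj" tokens that open indirect objects; "endobj" is not matched
    # because there is no word boundary between "end" and "obj".
    features["obj_count"] = len(re.findall(rb"\bobj\b", pdf_bytes))
    return features


if __name__ == "__main__":
    sample = (
        b"%PDF-1.7\n"
        b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    )
    print(extract_static_features(sample))
```

A plain byte scan like this is easy to evade, for example when names are written with hexadecimal escapes or objects sit inside compressed streams, which is one reason evasive samples are hard for simple detectors.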


Paper Citation


in Harvard Style

Issakhani M., Victor P., Tekeoglu A. and Lashkari A. (2022). PDF Malware Detection based on Stacking Learning. In Proceedings of the 8th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-553-1, pages 562-570. DOI: 10.5220/0010908400003120


in Bibtex Style

@conference{icissp22,
author={Maryam Issakhani and Princy Victor and Ali Tekeoglu and Arash Lashkari},
title={PDF Malware Detection based on Stacking Learning},
booktitle={Proceedings of the 8th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP},
year={2022},
pages={562-570},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010908400003120},
isbn={978-989-758-553-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 8th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP
TI - PDF Malware Detection based on Stacking Learning
SN - 978-989-758-553-1
AU - Issakhani M.
AU - Victor P.
AU - Tekeoglu A.
AU - Lashkari A.
PY - 2022
SP - 562
EP - 570
DO - 10.5220/0010908400003120