Malware Detection in PDF Files using Machine Learning

Bonan Cuan, Aliénor Damien, Claire Delaplace, Mathieu Valois

2018

Abstract

We present how we used machine learning techniques to detect malicious behaviours in PDF files. To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of malware. However, this classifier was easy to deceive with malicious PDF files that we forged to look like clean ones. For instance, we implemented a gradient-descent attack to evade this SVM. This attack was almost 100% successful. Next, we provided counter-measures to this attack: a more elaborate feature selection and the use of a threshold allowed us to stop up to 99.99% of these attacks. Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively feeding the SVM with forged malicious PDF files. We found that after 3 iterations, every PDF file forged by gradient descent was detected, completely preventing the attack.
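To make the attack and the threshold counter-measure concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear decision function with hand-picked weights standing in for the trained SVM, three hypothetical features (counts of JavaScript objects, open actions, and pages), an attacker who can only add objects to a file, and a threshold interpreted as a cap on each feature value. All names and numbers are illustrative assumptions.

```python
from typing import List, Optional

# Hypothetical linear decision function standing in for a trained linear SVM.
# Features (assumed for illustration): [js_count, open_action_count, page_count].
# A positive score means "malicious", a non-positive score means "clean".
WEIGHTS = [2.0, 1.5, -1.0]
BIAS = -1.0


def score(features: List[float], cap: Optional[float] = None) -> float:
    """Return the decision score, optionally capping each feature value first."""
    if cap is not None:
        features = [min(value, cap) for value in features]
    return sum(w * value for w, value in zip(WEIGHTS, features)) + BIAS


def gradient_descent_attack(
    features: List[float], step: float = 1.0, max_iter: int = 100
) -> List[float]:
    """Forge a feature vector by descending the score until it looks clean.

    For a linear model the gradient of the score with respect to the
    features is WEIGHTS. The attacker is assumed to be able only to
    increase feature values (add objects to the PDF), so only features
    with a negative weight are moved.
    """
    forged = list(features)
    for _ in range(max_iter):
        if score(forged) <= 0:
            break  # the uncapped classifier now labels the file as clean
        for i, w in enumerate(WEIGHTS):
            if w < 0:
                forged[i] += step * -w
    return forged


malicious = [2.0, 1.0, 1.0]
forged = gradient_descent_attack(malicious)

print(score(malicious) > 0)        # original file is flagged
print(score(forged) > 0)           # forged file evades the plain classifier
print(score(forged, cap=3.0) > 0)  # capping feature values flags it again
```

In this toy setting the attack succeeds only by inflating a feature far beyond its usual range, which is why capping feature values blunts it. The adversarial learning step described in the abstract would go further by adding such forged vectors, labelled as malicious, to the training set and retraining; that step is not shown here.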


Paper Citation


in Harvard Style

Delaplace C. and Valois M. (2018). Malware Detection in PDF Files using Machine Learning. In Proceedings of the 15th International Joint Conference on e-Business and Telecommunications - Volume 1: SECRYPT, ISBN 978-989-758-319-3, pages 412-419. DOI: 10.5220/0006884704120419


in Bibtex Style

@conference{secrypt18,
author={Claire Delaplace and Mathieu Valois},
title={Malware Detection in PDF Files using Machine Learning},
booktitle={Proceedings of the 15th International Joint Conference on e-Business and Telecommunications - Volume 1: SECRYPT},
year={2018},
pages={412-419},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006884704120419},
isbn={978-989-758-319-3},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 15th International Joint Conference on e-Business and Telecommunications - Volume 1: SECRYPT
TI - Malware Detection in PDF Files using Machine Learning
SN - 978-989-758-319-3
AU - Delaplace C.
AU - Valois M.
PY - 2018
SP - 412
EP - 419
DO - 10.5220/0006884704120419