Authors:
Jose Torres and Sergio De los Santos
Affiliation:
Telefónica Digital España, Spain
Keyword(s):
PDF, Malware, JavaScript, Machine Learning, Malware Detection.
Abstract:
PDF has historically been a popular vehicle for spreading malware. Users open the file because they trust
the format, and the malware executes by exploiting a vulnerability in the reader that
parses the file. Most of the time, JavaScript is involved in some way in this process,
either exploiting the vulnerability or tricking the user into getting infected. This work aims to verify whether
Machine Learning techniques for detecting malware in PDF documents with embedded JavaScript can be
an effective way to reinforce traditional solutions such as antivirus software and sandboxes. Additionally, we have developed
a base framework for malware detection in PDF files, specially designed for cloud computing services,
that allows documents to be analysed online without requiring the document content itself, thus preserving privacy.
In this paper we present the results of comparing different supervised machine learning algorithms
for malware detection, together with an overall description of our classification framework.