Malicious PDF Documents Detection using Machine Learning Techniques - A Practical Approach with Cloud Computing Applications

Jose Torres, Sergio De los Santos

2018

Abstract

PDF has historically been a popular vehicle for spreading malware. Users open the file because they trust the format, and the malware runs by exploiting a vulnerability in the reader that parses the file, which leads to code execution. Most of the time, JavaScript is involved in some way in this process, either exploiting the vulnerability or tricking the user into getting infected. This work aims to verify whether Machine Learning techniques for detecting malware in PDF documents with embedded JavaScript can effectively reinforce traditional solutions such as antivirus software and sandboxes. Additionally, we have developed a base framework for malware detection in PDF files, specially designed for cloud computing services, that allows documents to be analysed online without needing the document content itself, thus preserving privacy. In this paper we present the results of comparing different supervised machine learning algorithms for malware detection and an overall description of our classification framework.
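To make the privacy idea in the abstract concrete, the sketch below shows one way a client could reduce a PDF to a small set of numeric features locally, so that only those numbers (and not the document content) would need to reach a remote classifier. The abstract does not list the features the authors use: the keyword list, the `extract_features` function, and the `size_bytes` field are all assumptions made for illustration, not the paper's method.

```python
import re
from typing import Dict

# Hypothetical feature set: PDF name tokens commonly inspected when triaging
# documents. The features actually used in the paper are not given here.
KEYWORDS = ("/JavaScript", "/JS", "/OpenAction", "/AA", "/EmbeddedFile", "/Launch")


def extract_features(pdf_bytes: bytes) -> Dict[str, int]:
    """Count each keyword as a whole name token in the raw PDF bytes."""
    features: Dict[str, int] = {}
    for kw in KEYWORDS:
        # The lookahead rejects longer names that merely start with the
        # keyword (for example, "/JSX" is not counted as "/JS").
        pattern = re.escape(kw.encode("ascii")) + rb"(?![A-Za-z0-9])"
        features[kw] = len(re.findall(pattern, pdf_bytes))
    # File size is included as one more content-free numeric feature.
    features["size_bytes"] = len(pdf_bytes)
    return features
```

The resulting dictionary contains only counts and a size, so it could be serialised and sent to a service that applies a trained supervised model. This sketch scans raw bytes only; it does not decode compressed streams or hex-escaped names, which a real extractor would likely need to handle.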



Paper Citation


in Harvard Style

Torres J. and De los Santos S. (2018). Malicious PDF Documents Detection using Machine Learning Techniques - A Practical Approach with Cloud Computing Applications. In Proceedings of the 4th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-282-0, pages 337-344. DOI: 10.5220/0006609503370344


in Bibtex Style

@conference{icissp18,
author={Jose Torres and Sergio De los Santos},
title={Malicious PDF Documents Detection using Machine Learning Techniques - A Practical Approach with Cloud Computing Applications},
booktitle={Proceedings of the 4th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP},
year={2018},
pages={337-344},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006609503370344},
isbn={978-989-758-282-0},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 4th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP
TI - Malicious PDF Documents Detection using Machine Learning Techniques - A Practical Approach with Cloud Computing Applications
SN - 978-989-758-282-0
AU - Torres J.
AU - De los Santos S.
PY - 2018
SP - 337
EP - 344
DO - 10.5220/0006609503370344