Representation of PE Files using LSTM Networks

Martin Jureček, Matouš Kozák

Abstract

An ever-growing number of malicious attacks on IT infrastructures calls for new and efficient methods of protection. In this paper, we focus on malware detection, using Long Short-Term Memory (LSTM) networks as a preprocessing tool to increase the classification accuracy of machine learning algorithms. To represent malicious and benign programs, we used features extracted from files in the PE file format. We created a large dataset, to which we applied common feature preparation and feature selection techniques. Using various LSTM and Bidirectional LSTM (BLSTM) network architectures, we further transformed the collected features and trained other supervised ML algorithms on both the transformed and the vanilla datasets. Transformation by deep versions (4 hidden layers) of the LSTM and BLSTM networks performed well and significantly decreased the error rate of several state-of-the-art machine learning algorithms. For each machine learning algorithm considered in our experiments, the LSTM-based transformation of the feature space decreases the corresponding error rate by more than 58.60 % compared with the case where the feature space was not transformed by an LSTM network.
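To make the reported figure concrete, the following minimal sketch shows how a reduction in error rate could be computed, assuming the percentage is measured relative to the error rate on the untransformed features (the abstract does not state this explicitly). The function names and the accuracy values are hypothetical and serve only to illustrate the arithmetic; they are not results from the paper.

```python
def error_rate(accuracy: float) -> float:
    """Error rate as the complement of accuracy (both fractions in [0, 1])."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    return 1.0 - accuracy


def relative_error_reduction(baseline_error: float, transformed_error: float) -> float:
    """Percentage by which the error rate drops relative to the baseline.

    Assumption: the reduction is relative, i.e. (e_base - e_new) / e_base.
    """
    if baseline_error <= 0.0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - transformed_error) / baseline_error


# Hypothetical accuracies, chosen only to illustrate the calculation.
baseline = error_rate(0.95)      # classifier trained on untransformed features
transformed = error_rate(0.98)   # same classifier on LSTM-transformed features
reduction = relative_error_reduction(baseline, transformed)
print(f"relative error-rate reduction: {reduction:.2f} %")
```

Under this reading, a classifier whose accuracy rises from 95 % to 98 % has its error rate cut from 5 % to 2 %, which is a relative reduction of about 60 %.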


Paper Citation


in Harvard Style

Jureček M. and Kozák M. (2021). Representation of PE Files using LSTM Networks. In Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-491-6, pages 516-525. DOI: 10.5220/0010257105160525


in Bibtex Style

@conference{icissp21,
author={Martin Jureček and Matouš Kozák},
title={Representation of PE Files using LSTM Networks},
booktitle={Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP},
year={2021},
pages={516-525},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010257105160525},
isbn={978-989-758-491-6},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP
TI - Representation of PE Files using LSTM Networks
SN - 978-989-758-491-6
AU - Jureček M.
AU - Kozák M.
PY - 2021
SP - 516
EP - 525
DO - 10.5220/0010257105160525