Authors:
Anantha Rao Chukka (1) and V. Susheela Devi (2)
Affiliations:
(1) Defence Research and Development Organisation, India; (2) Indian Institute of Science, Bengaluru, Karnataka, 560012, India
Keyword(s):
Malware Detection, Machine Learning Models, Malware Analysis, API Sequences, Opcode Sequences, Import Function, File Meta Information, Malware Operational Patterns, Portable Executable, Artificial Neural Network, Support Vector Machine, Random Forest, Naive Bayes, K-nearest Neighbour.
Abstract:
In recent times, malware attacks on government and private organizations have been rising. These attacks are carried out to steal confidential information, leading to loss of privacy, intellectual property issues and loss of revenue. They are sophisticated and are described as Advanced Persistent Threats (APTs). The payloads used in this type of attack are polymorphic and metamorphic in nature and contain stealth and rootkit components. As a result, conventional defence mechanisms such as rule-based and signature-based methods fail to detect this malware. Modern approaches therefore rely on static and dynamic analysis to detect sophisticated malware. However, this process generates huge log files, which a domain expert must review to decide whether a binary is malicious or benign; this is tedious, time-consuming and expensive. Our work overcomes these problems by using machine learning models trained on datasets created from the analysis logs. In this paper, a number of supervised machine learning models are presented to classify a binary as malicious or benign. We use an automated malware analysis framework to collect run-time behavioural artefacts. Static analysis mainly focuses on collecting binary meta information, import functions and opcode sequences. The dataset is created by collecting malware from online sources and benign files from the Windows operating system and third-party software.
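To make the static-analysis idea concrete, the following is a minimal sketch of classifying a binary from its opcode sequence. It is not the authors' pipeline: the choice of bigram counts, cosine similarity, a k-nearest-neighbour vote, the function names (`opcode_ngrams`, `cosine_similarity`, `knn_predict`) and the toy opcode sequences are all assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def opcode_ngrams(opcodes, n=2):
    """Count overlapping n-grams in a list of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, sample, k=3):
    """Majority vote among the k training samples most similar to `sample`."""
    ranked = sorted(
        train,
        key=lambda item: cosine_similarity(item[0], sample),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy opcode sequences; NOT taken from the paper's dataset.
TRAIN = [
    (opcode_ngrams(["push", "mov", "call", "pop", "ret"]), "benign"),
    (opcode_ngrams(["push", "mov", "add", "pop", "ret"]), "benign"),
    (opcode_ngrams(["mov", "call", "pop", "ret"]), "benign"),
    (opcode_ngrams(["xor", "xor", "jmp", "xor", "jmp", "call"]), "malicious"),
    (opcode_ngrams(["xor", "jmp", "xor", "jmp", "xor"]), "malicious"),
    (opcode_ngrams(["jmp", "xor", "xor", "jmp", "call"]), "malicious"),
]

if __name__ == "__main__":
    unknown = opcode_ngrams(["xor", "jmp", "xor", "jmp"])
    print(knn_predict(TRAIN, unknown))
```

In practice the opcode sequences would come from disassembling Portable Executable files, and the feature vectors could be fed to any of the supervised models named in the keywords; the nearest-neighbour vote is used here only because it needs no external library.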