Ransomware Detection with Deep Neural Networks
Matan Davidian (https://orcid.org/0000-0002-3676-0066), Natalia Vanetik (https://orcid.org/0000-0002-4939-1415) and Michael Kiperberg (https://orcid.org/0000-0001-8906-5940)
Shamoon College of Engineering, Beer Sheva, Israel
Keywords: Dynamic Analysis, Malware Detection, Neural Networks, Ransomware.
Abstract:
The number of reported malware samples and their average identification time increase each year, thus
increasing the mitigation cost. Static analysis techniques cannot reliably detect polymorphic and
metamorphic malware, while dynamic analysis is more effective in detecting advanced malware, especially when the analysis
is performed using machine-learning techniques. This paper presents a novel approach for the detection of
ransomware, a particular type of malware. The approach uses word embeddings to represent system call fea-
tures and deep neural networks such as Convolutional Neural Networks (CNN) and Long Short-Term Memory
Networks (LSTM). The evaluation, performed on two datasets, shows that the described approach achieves a
detection rate of over 99% for ransomware samples.
1 INTRODUCTION
The AV-TEST Institute (https://www.av-test.org/en/statistics/malware/) registered 1139 million ma-
licious programs in 2020. The number of reported
malware increases each year. According to IBM (https://www.ibm.com/downloads/cas/ZBZLY7KL),
the average time to identify a breach was 206 days
in 2019—a 5% increase over the identification time
in 2018. The costs associated with a breach increase
with its identification time. Therefore, better identifi-
cation techniques are required to shorten the identifi-
cation time and lower the associated costs.
Malicious programs vary by their goal. Ran-
somware is a type of malicious software (malware)
that, when deployed on a computer, encrypts or
locks the computer or its files, requesting that a ransom
be paid to the author of the ransomware for the suc-
cessful decryption and release of the user’s data and
system. Ransomware aims to compromise the avail-
ability, confidentiality, and integrity of the victim’s
data (Sharma and Sahay, 2016; Egunjobi et al., 2019).
The majority of techniques for detecting and clas-
sifying malware involve the use of static or dynamic
features (Islam et al., 2013). The most basic ap-
proach for malware detection is checking program
samples against a predefined repository of patterns.
Pattern comparison is usually performed as an ex-
act match of executable file signatures generated, for
instance, by the SHA-256 cryptographic hash func-
tion (Gilbert and Handschuh, 2003). In other cases,
a pattern is described by a regular expression over
instructions (Idika and Mathur, 2007). For this ap-
proach, patterns should be sufficiently general to de-
scribe slight malware variations but not so broad as to
capture benign programs.
Unfortunately, polymorphic and metamorphic
malware mutate themselves upon each replication,
thus allowing them to achieve a high degree of vari-
ance. These types of malware evade static analysis
by obfuscating their semantics and altering their syn-
tactic structure on each replication. However, mal-
ware’s behavior remains intact between replications.
This is the main motivation for using dynamic analy-
sis for malware detection. Unlike static analysis, dy-
namic analysis is immune to evasions based on obfus-
cation, making it the preferred choice for analyzing
unknown, zero-day malware.
The goal of behavioral analysis is to classify a pro-
gram as malicious based on its behavior. Conceptu-
ally, the behavioral analysis system consists of two
components: a process or a system monitor and an
analyzer. The monitoring component can be imple-
mented as a user-mode agent, a kernel module, a hy-
pervisor, an emulator, or an external device. More
privileged and stealthy monitoring components de-
liver more reliable descriptions of the system’s behav-
ior.
While the analyzer can be based on deterministic
policies with certain guarantees regarding the clas-
sification outcomes, writing such policies requires a
deep understanding of the overall system operation
and the security risks. Therefore, machine learning
techniques become more favorable for securing sys-
tems from known and unknown malware.
In this paper, we propose a method for the clas-
sification of system call sequences that utilizes Deep
Neural Networks (DNNs). We divide all system calls
made by a process into relatively short sequences, and
then classify those sequences as belonging to either
malicious (belonging to ransomware) or benign pro-
cesses.
The main contribution of this paper is three-fold:
1. two new datasets for training machine learn-
ing models for malware detection, called REY
(RansomwarE sYstem calls) and CryptoRansom;
2. a novel representation of string Win32 API
attributes of a system call with word vec-
tors (Mikolov et al., 2013) (also called word em-
beddings) that express how a word is used in the
text and what it means; they are learned by con-
sidering the context in which the words appear;
3. a novel approach for Ransomware detection
that uses Convolutional Neural Networks
(CNN) (Kim, 2014), Long Short-Term Mem-
ory (LSTM) neural networks (Hochreiter and
Schmidhuber, 1997) and their combinations. We
compare our approach with baseline machine
learning methods and state-of-the-art solutions
and show that our method achieves both higher
accuracy (the ratio of correctly classified samples)
and higher sensitivity (the ratio of correctly clas-
sified samples for the positive, i.e., ransomware,
class of samples) than the baselines.
This paper is organized as follows. Section 2 de-
scribes related work; Section 3 describes dataset con-
struction, representation of data features, and neural
models we employ for data classification. Section 4
is dedicated to experimental evaluation, and Section 5
concludes our work.
2 BACKGROUND AND RELATED
WORK
Malware analysis systems can be classified as static
or dynamic. Static analysis systems (Iwamoto and
Wasaki, 2012; Fereidooni et al., 2016) extract infor-
mation from the malicious file itself without execut-
ing it. This type of analysis is fast and secure be-
cause the malicious behavior is never invoked. Un-
fortunately, modern malware employs evasion tech-
niques whose purpose is to conceal its malicious in-
tent. In particular, polymorphism and metamorphism
are widely used in modern malware (You and Yim,
2010), thus imposing limitations on the applicability
of static analysis (Moser et al., 2007).
The dynamic analysis method (Or-Meir et al.,
2019) attempts to solve the difficulties imposed by
polymorphic and metamorphic malware. This method
concentrates on the behavior of the malware rather
than on its structure. The dynamic analysis system
executes the potentially malicious program, observes
its actions, and classifies them as malicious or benign.
Dynamic analysis systems vary by multiple as-
pects: some systems perform the analysis online dur-
ing normal computer operation and notify the system
administrator about any potential detections (Leon
et al., 2021). Other dynamic analysis systems perform
the analysis in an isolated environment (Jamalpur
et al., 2018).
Another aspect by which dynamic analysis sys-
tems vary is the set of observed actions. Some
systems acquire fine-grained information about mal-
ware’s execution, like calls to system functions (sys-
tem calls) (Jamalpur et al., 2018) or even private func-
tions of the malware itself (Dash et al., 2016). Other
systems collect general information, like CPU utiliza-
tion or values of performance counters (Demme et al.,
2013; Zhou et al., 2018; Bahador et al., 2014; Alam
et al., 2020).
Finally, dynamic analysis systems differ in the
analysis method of the acquired information. Some
systems determine whether the acquired behavior is
malicious or benign according to a set of pre-defined
rules of malicious behavior (Amit et al., 2021). Other
systems, and this work, use machine-learning tech-
niques for the classification of malicious behavior.
The machine-learning techniques can operate on
dynamic features (observed actions) (Or-Meir et al.,
2019), static features (e.g., executable files) (Iwamoto
and Wasaki, 2012), or a combination of the
two (Zhou, 2018). Multiple Machine Learning (ML)
designs have been used for malware detection and
classification. Extensive surveys of these works
can be found in (Sneha et al., 2020). Traditional
techniques include Random Forests (RF), Decision
Trees (DT), Support Vector Machines (SVM), k-
nearest neighbors algorithm (kNN), Logistic Regres-
sion (LR), Multi-Layer Perceptron (MLP), and so
on. Additionally, multiple techniques based on deep
neural networks (NN) have been utilized for ran-
somware detection. We mention the most prominent
papers in Table 1.
Our method belongs to the dynamic analysis cat-
Table 1: ML methods and feature representations in works
on malware detection and classification.
paper methods
(Islam et al., 2013) SVM, RF, DT
(Han et al., 2019) rule-based, tf-idf
(Rhode et al., 2019) NN, RF, SVM
(Zhang et al., 2019) DT, RF, tf-idf
(Khan et al., 2020) DNA sequences, feature selection
(Cusack et al., 2018) RF
(Alhawi et al., 2018) Bayes Network, MLP, RF, KNN, LR
(Scalas et al., 2019) RF
(Chen et al., 2019) TF-IDF, Linear Discriminant Analysis, Extremely Randomized Trees
(Pal et al., 2016) RF
(Shaukat and Ribeiro, 2018) LR, SVM, NN, RF, Gradient Tree Boosting
(Egunjobi et al., 2019) Naive Bayes
(Vinayakumar et al., 2017) shallow and deep NN
(Rhode et al., 2018) RNN
(Homayoun et al., 2019) LSTM, CNN
(Agrawal et al., 2019) LSTM with attention
(Al-Hawawreh and Sitnikova, 2019) Convolutional Autoencoders (CAE), Variational Autoencoders (VAE)
(Arabo et al., 2020) traditional ML and NN
egory as it uses information about system calls and
neural ML classifiers for malware detection. We aim
to use Win32 API attributes of system calls issued by
processes to determine whether a subsequence of pro-
cess system calls belongs to malicious ransomware.
3 PROPOSED METHOD
3.1 Dataset Construction
3.1.1 Data Collection
Our datasets consist of execution logs of benign pro-
grams and ransomware. Execution logs were col-
lected using the Windows Sysinternals Process Monitor (Mi-
crosoft Corporation, ) under Windows 7 SP1 run-
ning on a virtual machine. We chose this OS be-
cause it has been shown to have more vulnerabil-
ities related to ransomware in comparison to Win-
dows 10 OS (Zavarsky et al., 2016). The execution
strategy was different for benign and malicious sam-
ples. Benign logs were collected by running the PC-
Mark 8 benchmark (UL Benchmarks, ) until comple-
tion. Specifically, we executed the “Storage” and the
“Work” tests provided by PCMark.
We extracted a different number of system calls
from each benign and ransomware sample; our goal
was to obtain a balanced dataset with a comparable
number of system calls in both categories. We note
that during “malicious runs” the system calls of benign
processes were also recorded; however, these system
calls were excluded from the dataset.
We have recorded all Win32 API attributes that
appeared in the runs, but only a subset of them was
included in the datasets, as explained below. We in-
cluded the following attributes: (1) process identi-
fier, denoted by PID, which is a unique positive in-
teger number, (2) system call name, which stands for
an operation performed by the call, (3) return value,
a string attribute representing either that the operation
completed successfully or, if not, the error that oc-
curred during its execution, (4) duration in seconds,
computed from the start and end time of the call,
(5) system call arguments. The remaining attributes
were excluded from the dataset because they provide
either irrelevant (e.g., parent process identifier, image
path) or overly specific information (e.g., thread iden-
tifier, completion time) that cannot be used to train a
generalizable ML model. A final dataset, therefore,
contains a separate sequence of system calls for each
process. Each sequence is identified by the run num-
ber and the process identifier.
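To make this preprocessing concrete, the following minimal Python sketch shows how such per-process sequences could be extracted from a Process Monitor log exported to CSV. The column names used here ("PID", "Operation", "Result", "Duration", "Detail") are assumptions, not the tool's guaranteed schema, and must be adapted to the actual export.

# Hypothetical sketch: extract one chronological system-call sequence per
# process from a Process Monitor CSV export. Column names are assumptions.
import pandas as pd

SELECTED = ["PID", "Operation", "Result", "Duration", "Detail"]

def load_run(csv_path):
    log = pd.read_csv(csv_path, usecols=SELECTED)
    # Each process contributes a separate sequence, identified by its PID.
    return {pid: calls.reset_index(drop=True)
            for pid, calls in log.groupby("PID")}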
3.1.2 REY (RansomwarE sYstem Calls) Dataset
The REY dataset includes a single ransomware sam-
ple — the Jigsaw ransomware. In addition, it contains
30 benign programs that are distributed with the Win-
dows OS.
Two runs were made to build REY, as specified
below.
1. The first run, with the ‘Jigsaw’ malware running in
the background, took 13 minutes. During this time,
994,816 system calls were performed, of which
613,501 were issued by the malware. We included
only the malware’s system calls, from the 150,000th
to the 236,505th, making it a total of 86,505 calls.
2. The second recording, with only benign processes
running in the background, lasted for 10 minutes,
during which 30 different processes were running.
For every process, we added its first 5,000 system
calls (or fewer, if a process performed less than
5,000 calls) to the dataset.
The final size of the REY dataset is 173,010 system
calls, with 86,505 malicious and 86,505 benign calls.
3.1.3 CryptoRansom Dataset
The CryptoRansom dataset contains recordings of
46 different benign processes and 12 different ran-
somware samples. The ransomware samples are a
subset of the VirusShare (Corvus Forensics, ) reposi-
tory, a publicly available repository of malware sam-
ples. Each sample of the VirusShare repository was
submitted to VirusTotal (VirusTotal, ) for classifi-
cation by multiple antivirus vendors. If the classi-
fication string of at least 10 vendors contained the
substring “ransom”, we included that sample in our
dataset. Additionally, we verified manually that these
processes indeed encrypt files.
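A minimal sketch of this labeling rule is given below; it assumes the per-vendor classification strings returned by VirusTotal are already available as a Python dictionary, and the helper name is hypothetical.

# Hypothetical sketch of the labeling rule: a sample is kept as ransomware if
# at least 10 vendors' classification strings contain the substring "ransom".
def is_ransomware(vendor_labels, threshold=10):
    hits = sum(1 for label in vendor_labels.values()
               if label and "ransom" in label.lower())
    return hits >= threshold

# Example with three vendors: only two mention ransomware, so the sample is rejected.
print(is_ransomware({"A": "Trojan.Ransom.X", "B": "Ransom:Win32/Y", "C": "Generic"}))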
Malicious samples (ransomware) were executed
for 15 minutes. After the execution period of each
sample, the virtual machines were reset to their orig-
inal state. The process that performed the highest
number of system calls during this 15-minute period
was considered malicious, and it was included in the
dataset. Moreover, because malicious processes man-
ifest their malicious behavior only after they complete
their initialization, we extracted from each recorded
system call sequence its last 10,000 entries. Every
ransomware sample run was recorded for 20 minutes,
and the last 10,000 system calls from these runs were
included in the dataset. To collect data for benign
processes, we performed the following three runs.
1. A 10-minute recording of processes running in the
background. There were 26 different processes,
and we collected the first 5,000 system calls for
every process. Overall, 36,664 system calls were
collected during this run.
2. A complete run of the PCMark 8 Work tests (UL
Benchmarks, ). The Work tests are designed for
testing typical office notebooks and desktop PCs
that lack media capabilities and they contain the
following workloads: Web Browsing, Writing,
Video Chat, Spreadsheet.
There were 7 different processes (one for every
test), and we collected the first 5,000 system calls
(at most) for every process. Overall, 35,000 sys-
tem calls were collected during this run.
3. A complete run of the PCMark 8 Storage tests, which
contain the following workload traces: Adobe
Photoshop light, Adobe Photoshop heavy, Adobe
Illustrator, Adobe InDesign, Adobe After Ef-
fects, Microsoft Word, Microsoft Excel, Mi-
crosoft PowerPoint, World of Warcraft, Battlefield
3.
There were 10 different tests run on a single pro-
cess, and we collected the first 5,000 system calls
(at most), making it a total of 50,000 system calls.
As a result, the CryptoRansom dataset is balanced and
includes the data for 46 different benign processes and
12 different ransomware samples, with 121,664 sys-
tem calls of benign processes (50.34%) and 120,000
system calls of ransomware processes (49.66%).
3.2 Data Representation
In this section, we describe the Win32 API attributes
of system calls that we have used in our neural models
and baselines, and representations we built for these
features.
3.2.1 Feature Selection
In general, attributes for Win32 API functions can be
divided into three categories by their domain - time at-
tributes, string attributes, and numeric attributes. We
collected all of the attributes produced by processes.
We did not use time attributes in our data model, but
used subsets of string and numeric attributes that do
not identify the process.
3.2.2 Feature Representation
All string attributes were first split into separate words
by using either delimiters (e.g., underscore, space, tab,
as in ‘END OF FILE’) or capital letters (as in ‘Query-
BasicInformationFile’). Additionally, some shortcuts
were expanded into full words using hand-crafted
rules; for example, ‘RegQueryKey’ was replaced
with ‘Registry Query Key’. Finally, all strings rep-
resenting words were transformed into lower-case.
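The following Python sketch illustrates this preprocessing; the regular expression and the single expansion rule shown are illustrative assumptions rather than the exact hand-crafted rules used by the authors.

# Illustrative sketch of the string-attribute preprocessing: split on delimiters
# and capital letters, expand shortcuts, and lower-case the result.
import re

EXPANSIONS = {"reg": "registry"}  # hypothetical excerpt of the hand-crafted rules

def tokenize_attribute(value):
    parts = re.split(r"[_\s]+", value)                      # split on delimiters
    words = []
    for part in parts:
        # split camel-case tokens on capital letters
        words.extend(re.findall(
            r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part))
    return [EXPANSIONS.get(w.lower(), w.lower()) for w in words if w]

print(tokenize_attribute("RegQueryKey"))   # -> ['registry', 'query', 'key']
print(tokenize_attribute("END_OF_FILE"))   # -> ['end', 'of', 'file']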
As a result, every string attribute a is associated
with a sequence of English words $(w^a_1, \ldots, w^a_n)$. For
every word $w^a_i$ of an attribute a we extracted its k-
dimensional word vector $\vec{w}^a_i$ (see (Mikolov et al.,
2013)). We have used fastText vectors pre-trained on
English webcrawl and Wikipedia of length k = 300
(see (Grave et al., 2018)). The final representation
of an attribute a is an average of the word vectors of
its words:
$$a_{vec} = \mathrm{avg}(\vec{w}^a_1, \ldots, \vec{w}^a_n) \quad (1)$$
All numeric attribute values were min-max nor-
malized and prefix zero-padded to vectors of length
300, namely
$$a_{norm} = \left(0, \ldots, 0, \frac{a - a_{min}}{a_{max} - a_{min}}\right), \quad (2)$$
where a is the original numeric attribute value, and
$a_{max}$ and $a_{min}$ are the maximal and the minimal values
of this attribute in the dataset.
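A minimal sketch of both representations follows. The embed argument stands for a lookup into the pre-trained 300-dimensional fastText vectors; how those vectors are loaded is left out, and the function names are hypothetical.

# Sketch of Eqs. (1) and (2): string attributes become averaged word vectors,
# numeric attributes are min-max normalized and prefix zero-padded to length 300.
import numpy as np

DIM = 300

def string_attribute_vector(words, embed):
    vectors = [embed(w) for w in words]              # embed: word -> 300-dim vector
    return np.mean(vectors, axis=0) if vectors else np.zeros(DIM)

def numeric_attribute_vector(a, a_min, a_max):
    scaled = (a - a_min) / (a_max - a_min) if a_max > a_min else 0.0
    vec = np.zeros(DIM)
    vec[-1] = scaled                                  # normalized value in the last slot
    return vec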
3.3 The Pipeline
The final pipeline of our approach includes the fol-
lowing steps: (1) data preprocessing, during which
the parameters are collected for a process, as de-
scribed in Section 3.2.1; (2) data representation where
all features are represented by numeric vectors as de-
scribed in Section 3.2.2; (3) randomly splitting the
Table 2: Data summary.
dataset       virus types  total syscalls  virus syscalls  benign syscalls  majority (%)
REY           1            173,010         86,505          86,505           50
CryptoRansom  12           241,664         120,000         121,664          50.34
process data into training and test sets; (4) generation
of system call sequences for train and test set sepa-
rately; (5) training and evaluation of a neural model.
This approach is depicted in Figure 1.
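As a rough illustration of steps (3) and (4), the sketch below splits the per-process data before cutting it into fixed-length windows, so that calls of one process never appear in both sets; the array shapes, function names, and the non-overlapping windowing are assumptions.

# Illustrative sketch: split processes into train/test, then cut each process's
# call matrix (n_calls x n_features x 300) into windows of length w.
import numpy as np
from sklearn.model_selection import train_test_split

def to_windows(calls, w):
    return [calls[i:i + w] for i in range(0, len(calls) - w + 1, w)]

def build_sets(processes, labels, w):
    tr_p, te_p, tr_y, te_y = train_test_split(
        processes, labels, test_size=0.2, shuffle=True, random_state=42)
    def expand(ps, ys):
        xs, lab = [], []
        for p, y in zip(ps, ys):
            wins = to_windows(p, w)
            xs.extend(wins)
            lab.extend([y] * len(wins))
        return np.array(xs), np.array(lab)
    return expand(tr_p, tr_y), expand(te_p, te_y)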
3.4 Neural Classification Models
To represent a sequence of system calls of length W,
we use the following neural models:
1. A 2-dimensional CNN, where every system call
attribute is represented by a 300-dimensional vector
encoding either a numeric or a string value, as
described in Section 3.2.2. A system call sequence
of length W is then represented by an
$N_{features} \times 300 \times W$ tensor $S$, where
$N_{features} = 57$ is the number of system call features.
This model uses two convolutional layers with
kernel size 3 and “relu” activation, followed by
three decreasing fully connected layers.
2. A many-to-one LSTM layer with the number of
neurons set to $N_{features}$, with sigmoid activation.
3. A CNN-LSTM that combines the CNN and the
LSTM; it is built from a two-dimensional CNN
layer followed by a unidirectional LSTM layer.
For all models, the loss function was defined to be
binary cross-entropy, and the Adam optimization
algorithm (Chang et al., 2018) was used.
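A minimal Keras sketch of the CNN variant is given below. The filter counts and the widths of the three decreasing dense layers are not specified in the text, so the values used here are assumptions; the LSTM and CNN-LSTM variants follow the same pattern.

# Hypothetical Keras sketch of the 2D-CNN model: two convolutional layers with
# kernel size 3 and relu activation, followed by three decreasing dense layers.
# Filter counts and dense widths are assumptions; the W calls act as channels.
import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATURES, EMB_DIM, W = 57, 300, 6

def build_cnn():
    model = models.Sequential([
        layers.Input(shape=(N_FEATURES, EMB_DIM, W)),
        layers.Conv2D(16, kernel_size=3, activation="relu"),
        layers.Conv2D(32, kernel_size=3, activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # benign (0) vs. ransomware (1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

build_cnn().summary()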
4 EXPERIMENTS
4.1 Datasets
We evaluate our approach on two datasets – REY and
CryptoRansom whose construction is described in
detail in Section 3.1. In all cases, the data were ran-
domly shuffled and split into an 80% training set and a 20%
test set. All the methods were trained on the training
set and evaluated on the test set; evaluation results are
reported below. A summary of dataset parameters is
given in Table 2, together with the majority vote.
4.2 Setup and Metrics
The tests were performed on a server with an
NVIDIA Tesla K80 GPU, an Intel Xeon 2.3 GHz CPU, 24
GB of RAM, and a 400 GB SSD. Neural models were
implemented using Keras (Chollet et al., 2015) with
the TensorFlow backend (Abadi et al., 2015), and we
used sklearn (Pedregosa et al., 2011) implementations
of the RF and SVM algorithms. We report binary classi-
fication accuracy (measured as the ratio of correctly
labeled samples to all samples), the runtime it took to
train and test each system, and sensitivity (the propor-
tion of true positives to all positives).
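For reference, both metrics can be computed with scikit-learn as in the short sketch below, where sensitivity is the recall of the positive (ransomware) class; the toy labels are purely illustrative.

# Illustrative computation of the reported metrics on toy labels.
from sklearn.metrics import accuracy_score, recall_score

y_true = [1, 1, 0, 0, 1, 0]   # 1 = ransomware sequence, 0 = benign sequence
y_pred = [1, 0, 0, 0, 1, 0]
print("accuracy:   ", accuracy_score(y_true, y_pred))             # 5/6
print("sensitivity:", recall_score(y_true, y_pred, pos_label=1))  # 2/3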
4.3 Evaluation Results and Analysis
We used RandomForest (Ho, 1995; Breiman, 2001)
(RF) and Support Vector Machine (SVM) (Cortes
and Vapnik, 1995), with linear and poly-kernel, as
baseline models. We have also used a dense Fully-
Connected Neural model (FCN) with 7 layers as a
neural baseline. The neural models we compared to
baselines are the CNN model, the LSTM model, and
the stacked CNN-LSTM model, all described in Sec-
tion 3.4. All neural models, including the baseline,
ran for 10 epochs with batch size 12.
Table 3 shows evaluation results for baselines on
both datasets. We report binary classification accu-
racy and sensitivity for the ’Yes’ class, i.e., the ratio
of correctly determined virus sequences to all virus
sequences in the data. SVM with a poly kernel on the
CryptoRansom dataset was discovered to be too slow,
and it was stopped after 10 hours of training; therefore,
its results are not reported. Table 4 shows evaluation
results for neural models that use CNN or LSTM or
both; W denotes the length of system call sequences
analyzed by a model. The aim of this experiment
was also to determine what sequence length is opti-
mal w.r.t. the accuracy and training and test times;
best scores are marked in bold.
As can be seen from Table 4, classification ac-
curacy is very high for CNN and CNN-LSTM mod-
els, and it is much lower for the LSTM model. Dif-
ferent values of W produce minor variation in accu-
racy scores of CNN and CNN-LSTM models, and we
conducted tests to check if the difference in these re-
sults is statistically significant by applying the Wilcoxon
paired non-parametric two-tailed test (Wilcoxon, 1992).
It is worth noting that the Wilcoxon test applied to
the CryptoRansom dataset showed no statistical sig-
nificance in any of the above tests, and therefore we
focused our efforts on the REY dataset.
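A minimal sketch of this test with SciPy is shown below, assuming the paired per-sequence prediction scores of the two compared models are available as arrays; the score values are illustrative.

# Illustrative two-tailed Wilcoxon signed-rank test on paired prediction scores.
import numpy as np
from scipy.stats import wilcoxon

scores_cnn      = np.array([0.97, 0.91, 0.04, 0.88, 0.02, 0.99])
scores_cnn_lstm = np.array([0.95, 0.93, 0.06, 0.80, 0.03, 0.98])

stat, p_value = wilcoxon(scores_cnn, scores_cnn_lstm, alternative="two-sided")
print(f"statistic={stat}, p-value={p_value:.4f}")   # significant if p < 0.05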
Table 5 presents the results of the prediction com-
parison for different values of W for both CNN and
CNN-LSTM models on the REY dataset, and the com-
parison of predictions of these models for the same W.
We can conclude that the difference between predic-
tions of CNN and CNN-LSTM for the same W is sta-
tistically significant in every case, while CNN model
Figure 1: Pipeline of the ransomware detection with neural models (processes → system calls → features; string features are represented with word embeddings and numeric features with 0-1 normalization; the resulting data representation is split into train and test sequences used to train and evaluate the neural model).
Table 3: Baselines evaluation scores and runtime.
REY dataset
baseline train time (m) test time (m) acc (%)
RF 36.7 0.02 87.85
SVM (linear kernel) 7.5 0.003 83.28
SVM (poly kernel) 59 11.2 52.05
FCN 4.61 0.47 85.7
CryptoRansom dataset
baseline train time (m) test time (m) acc (%)
RF 14.25 0.04 93.34
SVM (linear kernel) 174.57 0.73 92.97
SVM (poly kernel) - - -
FCN 14.25 0.83 93.34
Table 4: Evaluation scores and runtime for CNN, LSTM
and CNN-LSTM neural models; W is the length of a system
call sequence. All times are in minutes.
REY dataset
model W acc (%) sens (%) train time test time
CNN 3 96.67 93.71 4.15 0.028
CNN 6 99.51 99.02 6.88 0.567
CNN 9 99.41 99.23 10.75 0.789
CNN 12 99.74 99.51 18.83 1.191
CNN 15 99.82 99.65 36.13 1.516
LSTM 3 51.40 100 3.46 0.4
LSTM 6 95.23 97.02 7.01 0.539
LSTM 9 85.12 86.31 10.95 0.8
LSTM 12 79.76 82.55 18.83 1.842
LSTM 15 82.34 87.68 35.88 1.442
CNN-LSTM 3 95.79 93.81 4.54 0.031
CNN-LSTM 6 97.93 98.76 8.23 0.551
CNN-LSTM 9 99.62 99.23 12.36 0.812
CNN-LSTM 12 99.71 99.41 20.8 1.192
CNN-LSTM 15 99.74 99.48 36.73 1.054
CryptoRansom dataset
model W acc (%) sens (%) train time test time
CNN 3 95.31 97.59 2.27 0.13
CNN 6 99.27 99.29 5.72 0.7
CNN 9 99.59 99.77 9.92 2.02
CNN 12 99.53 99.95 20.1 2.51
CNN 15 99.70 99.70 39.1 2.28
LSTM 3 89.72 83.23 4.86 0.14
LSTM 6 89.73 85.47 10.96 0.71
LSTM 9 92.14 88.93 16.68 5.68
LSTM 12 48.77 48.83 24.39 2.47
LSTM 15 94.47 95.42 42.51 2.17
CNN-LSTM 3 97.32 96.74 6.14 0.38
CNN-LSTM 6 98.84 98.39 11.19 0.72
CNN-LSTM 9 99.60 99.65 17.02 1.32
CNN-LSTM 12 99.69 99.72 24.8 2.51
CNN-LSTM 15 99.74 99.82 72.15 2.35
predictions do not improve from W = 9 to W = 15.
Because the highest accuracy is achieved by CNN
with W = 6, we selected this model as the best one.
Table 5: Statistical significance tests on the REY dataset for
CNN and CNN-LSTM models.
model from W to W significant? p-value
CNN-LSTM 3 6 yes 0.0057
CNN-LSTM 6 9 yes <0.0001
CNN-LSTM 9 12 no 0.6694
CNN-LSTM 12 15 no 0.9115
CNN 3 6 yes 0.0124
CNN 6 9 yes 0.0056
CNN 9 12 no 0.3482
CNN 12 15 no 0.9557
model 1 model 2 W significant? p-value
CNN CNN-LSTM 3 yes <0.0001
CNN CNN-LSTM 6 yes <0.0001
CNN CNN-LSTM 9 yes <0.0001
CNN CNN-LSTM 12 no 0.125
CNN CNN-LSTM 15 yes 0.0146
5 CONCLUSIONS
In this paper, we present a method for ransomware
detection that classifies system call sequences of a
process as belonging to either malicious or benign
categories with very high accuracy. We rep-
resented processes as sequences of system calls and
used NLP-based representation of system call fea-
tures that are words or multi-word expressions. Then,
we defined neural models that use CNN and LSTM
neural layers, and process system call sequences of
predefined length W . We evaluated our approach on
two datasets: (1) REY dataset, and (2) CryptoRan-
som dataset, whose construction and annotation are
described in Section 3.1.
From the evaluation, we can conclude that models
using CNN as their first layers, such as pure CNN and
CNN-LSTM, achieve the highest classification accu-
racy of over 99% on both datasets, while the pure
LSTM model falls far behind. Moreover, statistical
significance tests we conducted show that there is no
need to enlarge W beyond 6 and that the results of
the CNN neural model are just as good as the results
of the CNN-LSTM that contains a recurrent layer and
is therefore slower. From a practical perspective, it
means that short system call sequences are enough
to determine whether or not the process is malicious,
and that this detection can be done using a fast pre-
trained CNN model.
REFERENCES
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z.,
Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin,
M., Ghemawat, S., Goodfellow, I., Harp, A., Irving,
G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kud-
lur, M., Levenberg, J., Mané, D., Monga, R., Moore,
S., Murray, D., Olah, C., Schuster, M., Shlens, J.,
Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Van-
houcke, V., Vasudevan, V., Viégas, F., Vinyals, O.,
Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and
Zheng, X. (2015). TensorFlow: Large-scale machine
learning on heterogeneous systems. Software avail-
able from tensorflow.org.
Agrawal, R., Stokes, J. W., Selvaraj, K., and Marinescu, M.
(2019). Attention in recurrent neural networks for ran-
somware detection. In ICASSP 2019-2019 IEEE In-
ternational Conference on Acoustics, Speech and Sig-
nal Processing (ICASSP), pages 3222–3226. IEEE.
Al-Hawawreh, M. and Sitnikova, E. (2019). Leveraging
deep learning models for ransomware detection in the
industrial internet of things environment. In 2019 Mil-
itary Communications and Information Systems Con-
ference (MilCIS), pages 1–6. IEEE.
Alam, M., Sinha, S., Bhattacharya, S., Dutta, S.,
Mukhopadhyay, D., and Chattopadhyay, A. (2020).
Rapper: Ransomware prevention via performance
counters. arXiv preprint arXiv:2004.01712.
Alhawi, O. M., Baldwin, J., and Dehghantanha, A. (2018).
Leveraging machine learning techniques for windows
ransomware network traffic detection. In Cyber threat
intelligence, pages 93–106. Springer.
Amit, G., Yeshooroon, A., Kiperberg, M., and Zaidenberg,
N. J. (2021). Dlp-visor: A hypervisor-based data leak-
age prevention system. In ICISSP, pages 416–423.
Arabo, A., Dijoux, R., Poulain, T., and Chevalier, G. (2020).
Detecting ransomware using process behavior analy-
sis. Procedia Computer Science, 168:289–296.
Bahador, M. B., Abadi, M., and Tajoddin, A. (2014). Hpc-
malhunter: Behavioral malware detection using hard-
ware performance counters and singular value de-
composition. In 2014 4th International Conference
on Computer and Knowledge Engineering (ICCKE),
pages 703–708. IEEE.
Breiman, L. (2001). Random forests. Machine learning,
45(1):5–32.
Chang, Z., Zhang, Y., and Chen, W. (2018). Effective adam-
optimized lstm neural network for electricity price
forecasting. In 2018 IEEE 9th international confer-
ence on software engineering and service science (IC-
SESS), pages 245–248. IEEE.
Chen, Q., Islam, S. R., Haswell, H., and Bridges, R. A.
(2019). Automated ransomware behavior analysis:
Pattern extraction and early detection. In International
Conference on Science of Cyber Security, pages 199–
214. Springer.
Chollet, F. et al. (2015). Keras. https://github.com/fchollet/
keras.
Cortes, C. and Vapnik, V. (1995). Support-vector networks.
Machine learning, 20(3):273–297.
Corvus Forensics. Virusshare.com. https://virusshare.com/.
Accessed: 2021-05-11.
Cusack, G., Michel, O., and Keller, E. (2018). Machine
learning-based detection of ransomware using sdn. In
Proceedings of the 2018 ACM International Workshop
on Security in Software Defined Networks & Network
Function Virtualization, pages 1–6.
Dash, S. K., Suarez-Tangil, G., Khan, S., Tam, K., Ahmadi,
M., Kinder, J., and Cavallaro, L. (2016). Droidscribe:
Classifying android malware based on runtime behav-
ior. In 2016 IEEE Security and Privacy Workshops
(SPW), pages 252–261. IEEE.
Demme, J., Maycock, M., Schmitz, J., Tang, A., Waksman,
A., Sethumadhavan, S., and Stolfo, S. (2013). On
the feasibility of online malware detection with per-
formance counters. ACM SIGARCH Computer Archi-
tecture News, 41(3):559–570.
Egunjobi, S., Parkinson, S., and Crampton, A. (2019).
Classifying ransomware using machine learning al-
gorithms. In International Conference on Intelligent
Data Engineering and Automated Learning, pages
45–52. Springer.
Fereidooni, H., Conti, M., Yao, D., and Sperduti, A. (2016).
Anastasia: Android malware detection using static
analysis of applications. In 2016 8th IFIP interna-
tional conference on new technologies, mobility and
security (NTMS), pages 1–5. IEEE.
Gilbert, H. and Handschuh, H. (2003). Security analysis
of sha-256 and sisters. In International workshop
on selected areas in cryptography, pages 175–193.
Springer.
Grave, E., Bojanowski, P., Gupta, P., Joulin, A., and
Mikolov, T. (2018). Fasttext word vectors. https:
//fasttext.cc/docs/en/crawl-vectors.html.
Han, W., Xue, J., Wang, Y., Huang, L., Kong, Z., and Mao,
L. (2019). Maldae: Detecting and explaining malware
based on correlation and fusion of static and dynamic
characteristics. Computers & Security, 83:208–233.
Ho, T. K. (1995). Random decision forests. In Proceedings
of 3rd international conference on document analysis
and recognition, volume 1, pages 278–282. IEEE.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8):1735–1780.
Homayoun, S., Dehghantanha, A., Ahmadzadeh, M.,
Hashemi, S., Khayami, R., Choo, K.-K. R., and New-
ton, D. E. (2019). Drthis: Deep ransomware threat
hunting and intelligence system at the fog layer. Fu-
ture Generation Computer Systems, 90:94–104.
Idika, N. and Mathur, A. P. (2007). A survey of malware
detection techniques. Purdue University, 48(2).
Islam, R., Tian, R., Batten, L. M., and Versteeg, S. (2013).
Classification of malware based on integrated static
and dynamic features. Journal of Network and Com-
puter Applications, 36(2):646–656.
Iwamoto, K. and Wasaki, K. (2012). Malware classification
based on extracted api sequences using static analy-
sis. In Proceedings of the Asian Internet Engineering
Conference, pages 31–38.
Jamalpur, S., Navya, Y. S., Raja, P., Tagore, G., and Rao,
G. R. K. (2018). Dynamic malware analysis using
cuckoo sandbox. In 2018 Second international confer-
ence on inventive communication and computational
technologies (ICICCT), pages 1056–1060. IEEE.
Khan, F., Ncube, C., Ramasamy, L. K., Kadry, S., and Nam,
Y. (2020). A digital dna sequencing engine for ran-
somware detection using machine learning. IEEE Ac-
cess, 8:119710–119719.
Kim, Y. (2014). Convolutional neural networks for sentence
classification. arXiv preprint arXiv:1408.5882.
Leon, R. S., Kiperberg, M., Zabag, A. A. L., and Zaiden-
berg, N. J. (2021). Hypervisor-assisted dynamic mal-
ware analysis. Cybersecurity, 4(1):1–14.
Microsoft Corporation. Process monitor. https:
//docs.microsoft.com/en-us/sysinternals/downloads/
procmon. Accessed: 2021-05-11.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and
Dean, J. (2013). Distributed representations of words
and phrases and their compositionality. In Proceed-
ings of the Advances in neural information processing
systems, pages 3111–3119.
Moser, A., Kruegel, C., and Kirda, E. (2007). Limits of
static analysis for malware detection. In Twenty-Third
Annual Computer Security Applications Conference
(ACSAC 2007), pages 421–430. IEEE.
Or-Meir, O., Nissim, N., Elovici, Y., and Rokach, L. (2019).
Dynamic malware analysis in the modern era—a state
of the art survey. ACM Computing Surveys (CSUR),
52(5):1–48.
Pal, A., Dasgupta, R., Saha, A., and Nandi, B. (2016).
Human-like sensing for robotic remote inspection
and analytics. Wireless Personal Communications,
88(1):23–38.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer,
P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,
A., Cournapeau, D., Brucher, M., Perrot, M., and
Duchesnay, E. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research,
12:2825–2830.
Rhode, M., Burnap, P., and Jones, K. (2018). Early-stage
malware prediction using recurrent neural networks.
computers & security, 77:578–594.
Rhode, M., Tuson, L., Burnap, P., and Jones, K. (2019).
Lab to soc: Robust features for dynamic malware de-
tection. In 2019 49th Annual IEEE/IFIP International
Conference on Dependable Systems and Networks–
Industry Track, pages 13–16. IEEE.
Scalas, M., Maiorca, D., Mercaldo, F., Visaggio, C. A.,
Martinelli, F., and Giacinto, G. (2019). On the ef-
fectiveness of system api-related information for an-
droid ransomware detection. Computers & Security,
86:168–182.
Sharma, A. and Sahay, S. K. (2016). An effective approach
for classification of advanced malware with high ac-
curacy. arXiv preprint arXiv:1606.06897.
Shaukat, S. K. and Ribeiro, V. J. (2018). Ransomwall:
A layered defense system against cryptographic ran-
somware attacks using machine learning. In 2018 10th
International Conference on Communication Systems
& Networks (COMSNETS), pages 356–363. IEEE.
Sneha, M., Arya, A., and Agarwal, P. (2020). Ransomware
detection techniques in the dawn of artificial intelli-
gence: A survey. In 2020 The 9th International Con-
ference on Networks, Communication and Computing,
pages 26–33.
UL Benchmarks. Benchmarks pcmark 10. https://
benchmarks.ul.com/pcmark10. Accessed: 2021-05-
11.
Vinayakumar, R., Soman, K., Velan, K. S., and Ganorkar, S.
(2017). Evaluating shallow and deep networks for ran-
somware detection and classification. In 2017 Interna-
tional Conference on Advances in Computing, Com-
munications and Informatics (ICACCI), pages 259–
265. IEEE.
VirusTotal. Virustotal. https://virustotal.com. Accessed:
2021-05-11.
Wilcoxon, F. (1992). Individual comparisons by ranking
methods. In Breakthroughs in statistics, pages 196–
202. Springer.
You, I. and Yim, K. (2010). Malware obfuscation tech-
niques: A brief survey. In 2010 International con-
ference on broadband, wireless computing, communi-
cation and applications, pages 297–300. IEEE.
Zavarsky, P., Lindskog, D., et al. (2016). Experimen-
tal analysis of ransomware on windows and android
platforms: Evolution and characterization. Procedia
Computer Science, 94:465–472.
Zhang, H., Xiao, X., Mercaldo, F., Ni, S., Martinelli, F.,
and Sangaiah, A. K. (2019). Classification of ran-
somware families with machine learning based on n-
gram of opcodes. Future Generation Computer Sys-
tems, 90:211–221.
Zhou, B., Gupta, A., Jahanshahi, R., Egele, M., and Joshi,
A. (2018). Hardware performance counters can detect
malware: Myth or fact? In Proceedings of the 2018 on
Asia Conference on Computer and Communications
Security, pages 457–468.
Zhou, H. (2018). Malware detection with neural network
using combined features. In China cyber security an-
nual conference, pages 96–106. Springer.