On Tracking Ransomware on the File System

Luigi Catuogno (1,a) and Clemente Galdi (2,b)

(1) Dipartimento di Informatica, Università degli Studi di Salerno, Fisciano, Salerno, Italy

(2) Dipartimento di Studi Politici e Sociali, Università degli Studi di Salerno, Fisciano, Salerno, Italy

Keywords:

Ransomware, Ransomware Detection, Ransomware Tracking, Malice Indicators, File System Hooking, Testbed.

Abstract:

Ransomware detection is gaining importance in the scientific literature because of the widespread diffusion and economic impact of this type of malware. A successful ransomware detection system must identify malicious behaviour as soon as possible while keeping false positives low. To this end, different strategies have been explored. Recently, a promising approach has emerged: looking for running ransomware by measuring the activities every process performs on the file system. Such measurements are represented by quantitative “indicators”. Selecting and interpreting indicators is a critical and challenging task. In this paper we survey some of the most representative file-system-centered ransomware detectors and describe their chosen behavioural indicators and the strategies used to measure them. Then we compare the different solutions and discuss pros, cons and open issues of every approach.

1 INTRODUCTION

Young and Yung (1996) first envisioned the threat of crypto-viruses, malware intended to block (using cryptographic algorithms) or steal user data for extortion. Since their seminal paper, “ransomware” has turned out to be a major plague for desktop computers.

Ransomware are a specific type of malware and require specific strategies to be contained. Indeed, nowadays ransomware increasingly challenge traditional malware prevention systems such as signature-based detection systems and anomaly detection systems (ADS).

On one hand, the number of ransomware that leverage polymorphic code or that introduce some randomization in the code execution (Ma et al., 2012, Milosevic et al., 2016, Oberheide et al., 2009), in order to circumvent signature scanning, is constantly growing. Moreover, new virus variants evolve so rapidly that sometimes the deployment of signature updates lags behind.

On the other hand, ransomware mimic ordinary processes. Indeed, ransomware samples very often feature an execution profile quite similar to that of legitimate processes, so that it is increasingly difficult for ADSes to identify deviating activities. In this case, raising the sensitivity of ADSes may result in an unacceptable growth of false positives (Le, 2015, Viswanathan et al., 2013).

(a) https://orcid.org/0000-0002-6315-4221

(b) https://orcid.org/0000-0002-2988-700X

Furthermore, generic malware pursue a wide range of tasks and objectives that might require a certain time to be achieved, while ransomware is immediately dangerous. When a sample is discovered, it is probably too late.

Ransomware pursue a sole plan: encrypting the user data stored on the victim’s computer in order to request the payment of a fee (ransom). To this end, ransomware always perform these actions: run an algorithm to select the target files; open such files and encrypt their content; replace/overwrite their original contents with the corresponding ciphertexts.

Ransomware are designed to encrypt as much data as possible in the shortest possible time (i.e., likely before being detected). Unfortunately, “traditional” Anomaly Detection Systems need to gather information for a while before making a decision about any suspect ransomware process. In the meantime, the files that have been accessed by the malicious process are gone.

A different approach to ransomware detection can be set up on the basis of some observations. First, ransomware processes always perform the aforementioned operations following essentially the same pattern, whereas benign processes may not. Second, unlike benign processes, ransomware usually perform massive file system activity.

Catuogno, L. and Galdi, C.
On Tracking Ransomware on the File System.
DOI: 10.5220/0010985000003120
In Proceedings of the 8th International Conference on Information Systems Security and Privacy (ICISSP 2022), pages 210-219
ISBN: 978-989-758-553-1; ISSN: 2184-4356

Taking advantage of these observations, Kharraz et al. (2016, 2015) proposed to seek evidence of ransomware activities by tracking the victim’s file system operations, so that, whenever any suspect activity is discovered, the detection system identifies the responsible process(es) and takes the appropriate decisions.

Notice that the difference between this new approach and traditional ADSes is twofold: (1) the system looks for those processes whose activity converges to a limited set of expected “malicious behaviors” instead of analyzing any deviation from the presumptive “normal” behavioral model; (2) the detection process focuses on the effects of the malicious code rather than its execution.

Moreover, placing the detector at the file system level has a valuable advantage: detectors are enabled to trace every suspect activity, so that, whenever a malicious action is discovered, it is possible to revert it, as long as a certain data retention capability has been given to the file system. This makes it possible to recover from any damage the malware has done to data.

In the last three years, this strategy has witnessed a remarkable development, and several solutions have been proposed. The differences amongst such proposals mainly concern: (a) the technique each system uses to track per-process file system activity; (b) the characteristics of this activity each system takes into account; (c) the way in which collected information is aggregated and represented as “malice indicators”; (d) how suspect processes are classified according to their indicators and how each system eventually takes a decision.

In this paper, we provide a survey of some of the most representative proposals of “file system centric” ransomware detection systems. We compare the different solutions according to the way each one deals with the aforementioned aspects.

Rather than a mere survey of ransomware detection systems, the main goal of our work is a qualitative analysis of the “toolbox” that has been made available over time to counter the threat of ransomware. We analyze the way each tool is used and how it may affect the overall system performance. Furthermore, we aim at briefly discussing pros and cons of the different solutions along with possible further developments and challenges.

The paper is organized as follows. In Section 2 we introduce the main techniques and frameworks ransomware detectors use to collect information about how suspect processes access the file system. In Section 3 we examine the different criteria to select the information to be gathered for each process, along with the techniques for computing and interpreting malice indicators. Section 4 provides a brief survey of the ransomware detection systems we considered in our study. Section 5 introduces some of the main datasets related to ransomware sample activity that have been made available to test and train new detection systems. Section 6 concludes the work.

2 TRACKING RANSOMWARE ON THE FILE SYSTEM

In this section we survey the main techniques and tools that are currently used to gather information about the file system activities of running processes. Then, we discuss the way they are employed for ransomware detection and some of the most challenging open issues.

File System API Hooking. Modern operating systems provide development tools and APIs which make it possible to extend the file system services by adding new features, as well as to implement diagnostic and monitoring functionalities. In particular, in such OSes the Virtual File System (VFS) layer features hooks in every system call implementation. This makes it possible, for example, to intercept every system call invoked by any user-level process, with read/write access to each parameter (including the read or to-be-written data buffer), and to intercept its results before the call returns. Ransomware detection may leverage this feature to track whatever a process does on the file system and in the files it opens.

The Windows operating system features different levels of file system hooking. In particular, the File System Minifilter Driver (Microsoft Inc., 2014) provides a high-level API aimed at monitoring applications, including profiling and antivirus tools.

With Android’s FileObserver (Google Inc., 201x), applications register their listener with the file system manager in order to be notified whenever an event such as OPEN, ATTRIB or CLOSE occurs on any file of interest.

Virtual Machine Introspection (VMI). Virtual Machine Introspection (VMI) is a technique first proposed by Garfinkel and Rosenblum (2003). In VMI, the monitored system (target) runs in a Virtual Machine (VM), whereas a monitoring application (monitor) is executed in a privileged VM or plugged into the hypervisor itself. The monitor has access to every aspect of the target operation through an interface oriented to hardware-level details such as the target’s memory page allocation, CPU register status, interrupts, memory accesses and so on. Nowadays, LibVMI (Payne et al., 2015) is probably one of the most successful VMI frameworks and is widely used for malware analysis.

However, inspecting the target operating system at a finer-grained level (e.g., tracing per-process file system activity) through such low-level interfaces is quite hard due to the so-called “semantic gap”, as argued by Chen and Noble (2001) and More and Tapaswi (2014). To cope with this problem, the monitor should be provided with adequate information concerning the VM/Guest OS internal organization and current status (e.g., the Guest’s kernel symbols map).

The DRAKVUF toolkit (Lengyel et al., 2014) is built upon LibVMI and features a higher-level API which helps to develop ad hoc plug-ins tailored to monitor specific Guest OS-level data structures and events.

Sandboxing. Sandboxing is a technique which makes it possible to safely run untrusted/untested software within a confined execution environment (the sandbox) which resembles the legacy execution environment. Nevertheless, the computational resources assigned to the “sandboxed” application are strictly controlled and configurable according to the needs of testing.

Running a malicious application in a sandbox makes it possible to gather useful information about its behaviour while ensuring that, during execution, it can neither harm the host machine nor spread through the network. It has been shown in Catuogno and Galdi (2016) that different security models, coming from academia and industry, essentially identify the same security perimeter, thus guaranteeing that the above property holds independently of the specific security model under evaluation in case of malware execution. To this end, plenty of sandboxing systems have been proposed. Amongst the most representative we mention the Cuckoo Sandbox (Claudio Guarnieri, 2011).

We highlight some differences between sandboxes and VMI-based tools. First, sandboxes mainly collect information about “how a given malware sample behaves” instead of “how an infected VM behaves”. Second, sandbox components observe the target process through the OS data structures, whereas VMI collects information at a lower level. Finally, sandboxes feature some components that run in the same environment as the malware sample, while VMI tools are designed to remain stealthy (and external) with respect to the guest environment.

2.1 Discussion

Ransomware detection systems (monitors) are used in two different scenarios, each raising different issues and requirements. We denote these scenarios, respectively, operational and testbed. Notice that every detection system can be used in either scenario.

In the operational scenario, monitors are used on computers which are actually deployed in their operational environment (e.g., the corporate network). In this case, the primary goals are the early detection of any in-progress threat and the mitigation of its effects/impact.

Performance is a crucial issue. Indeed, the monitor is required not to introduce significant overhead to the system performance, both in terms of throughput and in computational resource consumption.

Table 1 summarizes the file system performance overhead claimed by some of the most recent proposals addressing the operational scenario and leveraging either virtualization or hooking (such proposals are introduced in Section 4). These values depend on multiple factors and reflect different ways of measuring performance.

At first sight, users/applications experience the highest overhead in the presence of virtualization-based monitors. However, such overhead is mainly due to the virtualization infrastructure rather than the detection services. For example, Gutierrez et al. (2018) highlight that the overhead due to the file system operation analysis is only a slight percentage (between 0.13% and 1.54%) of the total (which is up to 20%). Moreover, as suggested by Subedi et al. (2017), monitor performance is significantly influenced by the underlying hypervisor.

Hooking-based monitors promise better overall performance as, in principle, the throughput slowdown mainly affects file-system-bound processes. However, beyond the computational cost of the callbacks, the overall performance degradation largely depends on the number of hooked calls and their invocation frequency. Systems like CryptoLock (Scaife et al., 2016) and ShieldFS (Continella et al., 2016) feature the highest overhead as they intercept and “inspect” numerous file system calls. Furthermore, preventing data losses through per-file shadowing and snapshotting mechanisms (as in ShieldFS and R2D2 (Gutierrez et al., 2018)) increases the latency of those file system operations which entail the creation and removal of different file versions/copies.


Table 1: Ransomware detectors fitting the operational application scenario: performance summary. Legend: waylaying techniques (WT): hooking (H), Virtual Machine (VM).

| System | WT | Benchmark | Performance overhead |
|---|---|---|---|
| CryptoLock | H | n/a | open, read: 1 ms; close: 1.58 ms; write, rename: 9-16 ms |
| ShieldFS | H | n/a | “user-perceived” overhead can reach 45-75% |
| DaD | H | WPT, CrystalDiskMark | write, info: 0.011 ms, 46-82% |
| R2D2 | VM | PCMark 8 | 1.4-9.29% of total latency |
| RDS3 | VM | fio | serial read: 41-58%; serial write: 30-65% (depending on the VMM) |
| Redemption | H | IOZone | read: 2.8%; write: 3.4%; rewrite and create: up to 9% |
| RansomCare | H | n/a | 8.7% overhead measured during the execution of a sample task |

How much these factors affect file system performance depends on: (a) the file system characteristics, including its morphology as well as the number, size and types of the files it stores; (b) statistical considerations concerning how monitored processes access stored files.

Furthermore, we point out that any quantitative comparison amongst existing anti-ransomware systems is quite hard due to the variety of benchmarking tools used by the authors (see Table 1). In fact, different benchmarks behave rather differently from each other as they stress different aspects of file system operation. For example, there are benchmarks which perform bulk r/w operations (e.g., IOZone (Don Capps et al., 2002)) as well as tools which aim at simulating more realistic file system operation, such as PCMark (Underwriters Labs LLC, 2013). Moreover, we notice that even launching the same test on differently populated file systems may lead to rather different results.

In this regard, we argue that common guidelines for evaluating the performance of anti-malware systems are still needed.

In the testbed scenario, the emphasis is on analyzing the behavior of potentially infected systems and suspected samples, in order to acquire knowledge suitable to improve the detection process in the operational scenario. Neither the performance of the victim system nor the integrity of the fictitious data it stores matters. Here, the main concern is making the testbed as realistic as possible.

Ransomware attack strategies are significantly driven by the victim’s file system layout. For example, running a sample on an almost empty (or flat) file system might not give useful information about how it would behave if launched on a computer whose hard disk is populated with plenty of working document files and folders. To this end, Kharraz and Kirda (2017) and Continella et al. (2016) modeled their testbed on the basis of the activity of a number of volunteers who worked on fully operational desktop computers for several weeks. In contrast, Palisse et al. (2017) and Scaife et al. (2016) based their model on several studies concerning file system population statistics (Agrawal et al., 2007) and sample file collections (Garfinkel et al., 2009).

Transparency (stealthiness) of detection systems is of capital importance in both scenarios. Indeed, ransomware are increasingly able to detect the presence of detection systems. In such a case, malicious processes may adopt two strategies. On one hand, they may attempt to attack the detector, by subverting its code or the system features it relies upon. On the other hand, they may change their behavior (evasion) in order to avoid being analyzed (Bordoni et al., 2017, Bulazel and Yener, 2017).

3 MALICE INDICATORS FOR RANSOMWARE DETECTION

Once an observation point has been chosen, detector engines record every operation performed on the file system by every single process, in order to recognize those patterns which may indicate malicious activities.

In the literature, statistical measurements of the occurrence of such operation patterns are defined as malice indicators. Each indicator provides a quantitative representation (score) of the phenomenon it is related to. Indicators roughly fall into two categories (Kharraz and Kirda, 2017): content-based and behavior-based. The former (e.g., entropy measurements, similarity, file content r/w and deletion statistics) are related to the way in which the content of each file changes as a result of the operations of the accessing processes. The latter measure the effects that any suspect process activity produces on the whole file system.

This section briefly introduces and discusses some of the main proposals about such indicators. Table 2 summarizes the malice indicators considered in some of the most recently proposed ransomware detectors.

Read/write Access Statistics. These indicators measure “how” a process reads and/or writes within the files it opens. For example, highly frequent overwrites of whole files, as well as a high frequency of writes to multiple files, might be evidence of ransomware activity.
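As an illustration of how such statistics could be collected, the following minimal sketch keeps, for each process, the write events observed in a sliding time window and reports how many writes and how many distinct files they involve. The class name `WriteRateTracker`, the window length and the event format are assumptions made for this example; they are not taken from any of the surveyed systems.

```python
from collections import defaultdict, deque


class WriteRateTracker:
    """Per-process write statistics over a sliding time window (hypothetical sketch)."""

    def __init__(self, window_seconds=10.0):
        self.window = window_seconds
        self.events = defaultdict(deque)  # pid -> deque of (timestamp, path)

    def record_write(self, pid, path, timestamp):
        queue = self.events[pid]
        queue.append((timestamp, path))
        # Drop the events that have fallen out of the observation window.
        while queue and timestamp - queue[0][0] > self.window:
            queue.popleft()

    def stats(self, pid):
        queue = self.events[pid]
        return {
            "writes": len(queue),
            "distinct_files": len({path for _, path in queue}),
        }
```

A detector built along these lines would compare the reported counts against thresholds tuned on benign workloads.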

Divergence Measures in Overwritten Buffers. Significant entropy variations within file read/write buffers might indicate that a process is overwriting clear/structured data with encrypted data (Lin, 1991). Amongst further measures commonly considered for this purpose, we mention the Kullback-Leibler divergence (Kullback and Leibler, 1951) and the χ² goodness-of-fit test (Cochran, 1952, Pont et al., 2020). The performance achieved by leveraging such metrics is rather variable. Content-based indicators are sensitive to the type of analyzed files and to their configured thresholds. We argue that a comprehensive comparison of the performance achieved by using these indicators is worth of on-the-field
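The three measures mentioned in this paragraph can be sketched on raw byte buffers as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the χ² statistic is computed against a uniform byte distribution, and the Kullback-Leibler divergence uses a small additive smoothing term (`eps`) to avoid division by zero. The surveyed systems may define and tune these metrics differently.

```python
import math
from collections import Counter


def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, at most 8.0)."""
    if not buf:
        return 0.0
    n = len(buf)
    return -sum((c / n) * math.log2(c / n) for c in Counter(buf).values())


def chi_square_uniform(buf: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram against a uniform one."""
    if not buf:
        return 0.0
    expected = len(buf) / 256
    counts = Counter(buf)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def _smoothed_distribution(buf: bytes, eps: float):
    counts = Counter(buf)
    total = len(buf) + 256 * eps
    return [(counts.get(b, 0) + eps) / total for b in range(256)]


def kl_divergence(p_buf: bytes, q_buf: bytes, eps: float = 1e-9) -> float:
    """Kullback-Leibler divergence D(P||Q), in bits, between two byte distributions."""
    p = _smoothed_distribution(p_buf, eps)
    q = _smoothed_distribution(q_buf, eps)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))
```

In a detector, these functions would typically be applied to the buffer read from a file and to the buffer later written back to it: a jump from low to near-maximal entropy, or a χ² value close to zero on the written data, is compatible with encryption.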

Similarity. Evaluating the similarity between r/w buffers may help to discover fraudulent encryption activity. Similarity-aware hash functions (Roussev, 2010) have been proposed for use in conjunction with entropy measurement with the aim of reducing false positives.

Moving/Removing Files. The frequency of file moves and removals a process performs on the file system is a behavior potentially related to ransomware activity. In fact, ransomware frequently do not directly overwrite their target file. Instead, this kind of adversary creates a new file in which it stores the encrypted version of its victim and, eventually, deletes the original file and replaces it with the encrypted one.

Files Type Modification Rate. The suitability of this indicator relies on the following fact. Data encryption obviously entails that files structured according to any type (images, videos, documents, etc.) are changed into files of unknown (or non-existing) type. This is a straightforward consequence of ransomware activity, whereas benign process operations with the same effect are rather infrequent.
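One simple way to observe a type change is to compare the leading “magic” bytes of a file before and after a write. The sketch below is only illustrative: the table covers four well-known signatures, and the helper names `sniff_type` and `type_changed` are hypothetical. A real detector would rely on a far richer type-identification database.

```python
# A few well-known file signatures; a deliberately tiny, illustrative table.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def sniff_type(header: bytes) -> str:
    """Return the type suggested by the leading bytes, or 'unknown'."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"


def type_changed(before: bytes, after: bytes) -> bool:
    """True when a file of a recognised type no longer has that type after a write."""
    old, new = sniff_type(before), sniff_type(after)
    return old != "unknown" and new != old
```

Counting how often `type_changed` fires per process over time would yield a modification rate in the spirit of the indicator described above.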

File Types Access Statistics and “funneling”. File type funneling is a quantitative measure concerning the relationship between the number of files a process opens and the number of their types. Applications handle (both as input and output) files belonging to a well-defined set of types. Therefore, a process which opens many files of an unexpectedly high number of different types may deserve closer attention.
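To make the two quantities involved concrete, the following sketch tracks, per process, the set of files opened and the set of their types, using the file name extension as a crude stand-in for the file type. Both the class name `TypeSpread` and the extension-based typing are assumptions of this example, and the sketch does not reproduce any specific funneling definition from the literature.

```python
import os
from collections import defaultdict


class TypeSpread:
    """Per-process count of opened files and of their distinct types (hypothetical sketch)."""

    def __init__(self):
        self.files = defaultdict(set)  # pid -> paths opened
        self.types = defaultdict(set)  # pid -> extensions seen

    def record_open(self, pid, path):
        self.files[pid].add(path)
        # The extension is used here as an approximation of the file type.
        extension = os.path.splitext(path)[1].lower() or "<none>"
        self.types[pid].add(extension)

    def summary(self, pid):
        """Return (number of distinct files, number of distinct types)."""
        return len(self.files[pid]), len(self.types[pid])
```

A word processor would typically show many files but very few types, whereas a process sweeping a whole home directory would show both numbers growing together.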

A simple (though not the only) definition of funneling is due to Scaife et al. (2016):

File System Traversal. Ransomware may thoroughly explore the victim’s file system looking for target files. To this end, the malicious process uses directory-related APIs and system calls with an unusual frequency.

Files Access Statistics. Usually, benign processes open a limited subset of the files stored in the whole file system. In malicious processes, the number of accessed files is generally larger and grows with the execution time.

Further Malice Indicators. Further malice indicators, not only related to file system operation, have been proposed to be “aggregated” with the former ones, in order to mitigate false positives and to improve the classification process.

The system proposed in Song et al. (2016) leverages processor and memory usage statistics in order to enrich the information obtained with file access monitoring. PayBreak (Kolodenker et al., 2017a) is a system which tells apart suspicious processes by searching the memory for evidence of in-progress execution of encryption algorithms.

Hardware performance counters in ARM-based devices (running the Android OS) are the basis of the proposal presented in Demme et al. (2013).

Power consumption promises to be a viable malice indicator. Indeed, peaks in the amount of energy used by a process can reveal that potentially harmful activities (e.g., compulsory data encryption and file system access) are in progress. Current operating systems provide the possibility of measuring the power used by a specific process/set of processes (Catuogno et al., 2017, 2018) that, in turn, might be used to identify the resources such processes use.

Richer (2017) investigates the possibility of measuring the entropy of obfuscated network messages, in order to tell apart botnet (and, arguably, ransomware) C&C communications.

3.1 Discussion

Typically, ransomware are assumed to follow a “greedy” strategy, i.e., once in execution, they attempt to seek and encrypt the largest number of user files in the shortest possible time. This is to increase the probability of capturing “precious” documents and to minimize the probability of being detected in the meantime. Therefore, ransomware behavior may deviate quite soon from that of benign processes, if measured with an adequate set of indicators. Nevertheless, benign processes may appear to behave as ransomware if only a single indicator (or a few) is considered. Typical examples of such processes are zip-archivers, which usually open plenty of files (of multiple types) and feature high differences between the entropy of read and written data. However, it must be pointed out that such “benign misbehaviors” occur only occasionally and temporarily, while in ransomware they happen on a regular basis.

Table 2: Summary of most widely used indicators of potential ransomware activity.

ShieldFS, UNVEIL, Redemption, Mbol and Sadighian, DaD, CryptoLock, RansomCare

Per file R/W stats !

Entropy variation in overwrites ! ! ! ! !

Kullback-Leibler !

Similarity !

File moving/removal stats ! ! !

Files type modification rate ! ! !

Accessed file types stats and funneling ! !

File system traversals ! !

Files access statistics ! !

To cope with this aspect, some indicators come with a threshold which can be chosen according to different policies. Whenever the score of an indicator exceeds the threshold, the event is notified to the component which makes decisions about suspected processes (classifier). Optionally, at every event occurrence, a counter (related to the event itself) is incremented, and the decision-maker component is notified only if the counter in turn exceeds its own threshold.
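The two-level thresholding described above can be sketched as follows. The class name `ThresholdedIndicator`, the per-process counter and the callback signature `notify(pid, name, count)` are assumptions made for illustration; the surveyed systems implement this logic in their own ways.

```python
from collections import defaultdict


class ThresholdedIndicator:
    """Score threshold plus per-process event counter (hypothetical sketch)."""

    def __init__(self, name, score_threshold, event_threshold, notify):
        self.name = name
        self.score_threshold = score_threshold
        self.event_threshold = event_threshold
        self.notify = notify  # callback standing in for the classifier
        self.event_count = defaultdict(int)  # pid -> number of events so far

    def update(self, pid, score):
        """Feed a new score; return True if the classifier was notified."""
        if score <= self.score_threshold:
            return False
        # The score exceeded its threshold: this counts as one event.
        self.event_count[pid] += 1
        if self.event_count[pid] > self.event_threshold:
            self.notify(pid, self.name, self.event_count[pid])
            return True
        return False
```

Setting `event_threshold` to 0 yields the simpler policy in which every threshold crossing is reported immediately.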

In order to improve the classification process (in particular, to reduce the rate of false positives), indicators can be weighted on the basis of experimental findings and incremental refinements, and hierarchies amongst indicators can be introduced (Scaife et al., 2016).
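A weighted combination of indicators can be sketched as a normalized weighted sum compared against a decision threshold. The function names, the assumption that each score lies in [0, 1] and the default threshold of 0.5 are illustrative choices, not values reported by the cited works.

```python
def malice_score(scores, weights):
    """Weighted average of indicator scores; missing indicators count as 0."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def is_suspicious(scores, weights, decision_threshold=0.5):
    """Flag a process whose aggregated score reaches the (assumed) threshold."""
    return malice_score(scores, weights) >= decision_threshold
```

In practice, the weights and the threshold would be refined incrementally on labeled traces of benign and malicious processes.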

The set of thresholds and weights implicitly defines the boundary between ransomware and benign software. The point is that several studies have pointed out that ransomware might implement strategies through which they can carry out their job without triggering any event, or while fatally delaying their discovery.

Indicator Evasion. Indicator evasion techniques have been widely investigated in the literature. Ransomware might circumvent frequency-based indicators (both content-based and behavioral) by slowing down the pace of their file accesses, by encrypting files in multiple rounds (Continella et al., 2016) or by encrypting only specific parts of each file instead of encrypting it entirely (Kharraz et al., 2016).

Content-based indicators pose several issues. Firstly, these metrics look rather prone to high rates of false positives when applied to processes performing file compression, (legitimate) file encryption and compressed image manipulation (Gaspari et al., 2021, Pont et al., 2020).

Secondly, such indicators may be eluded by encrypting/overwriting file contents following rather simple heuristics. For example, a malicious process could evade entropy-related indicators by interleaving ciphertext overwrites with low-entropy padding when encrypting files (Kharraz and Kirda, 2017, Mbol et al., 2016, Palisse et al., 2017).

A suitable solution to this problem is “aggregating” indicators. Indeed, it looks quite unlikely that a malicious process is able to mimic benign processes according to all considered indicators. To this end, the CryptoLock (Scaife et al., 2016) system features a union indicator which is triggered by the simultaneous occurrence of the events related to its primary indicators. Experiments show that this solution helps to lower false positives.
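The idea of a union indicator can be sketched as a check that every primary indicator has fired for the same process within the same observation window. The indicator identifiers below are hypothetical labels chosen for this example, and “simultaneous” is approximated here by membership in one window; this is not the actual CryptoLock implementation.

```python
# Hypothetical labels for the primary indicators of a detector.
PRIMARY_INDICATORS = frozenset({"type_change", "entropy", "similarity"})


def union_triggered(fired_in_window, primary=PRIMARY_INDICATORS):
    """True when all primary indicators fired in the same observation window."""
    return set(primary) <= set(fired_in_window)
```

For instance, a process for which only the entropy and type-change indicators fired would not trigger the union, while one for which all three fired would.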

Multi-process ransomware are still considered an insidious threat. In fact, such viruses distribute their different file re-encryption subtasks among multiple agents. Each agent is likely to avoid detection because its individual subtask may not be classified as malicious activity (Gaspari et al., 2020, Palisse et al., 2017).

4 FILE SYSTEM-CENTRIC RANSOMWARE DETECTION

This section briefly reports recent ransomware detection systems that use the strategies described above. Amongst seminal papers concerning file system-centric ransomware detection, we mention the study by Kharraz et al. (2015) and the UNVEIL system (Kharraz et al., 2016). In particular, UNVEIL leverages a malice indicator based on entropy. The Redemption (Kharraz and Kirda, 2017) system offers a comprehensive indicator-based approach and investigates possible solutions to aggregate multiple indicators into a single malice score.

ShieldFS (Continella et al., 2016) features multiple behavioral indicators in order to detect ransomware activity. Indicators include: entropy, fraction of accessed file types, fraction of accessed files (r/w). A versioned file system is used to reverse possible malicious file encryption. RDS3 (Subedi et al., 2017) is similar to ShieldFS, though it stores backup data in spare/unused storage space that ransomware is supposed not to be able to touch. The data retention mechanism is entrusted to a trusted virtual machine.

CryptoLock (Scaife et al., 2016) follows a similar approach. In order to reduce the occurrence of false positives and to speed up ransomware detection, the authors divide indicators into two categories: primary and secondary. Primary indicators include: per-process file-type changing rate, Shannon entropy and similarity measurement (Roussev, 2010). Secondary indicators are per-process file deletion and funneling. A union indicator, related to the simultaneous rise of the primary indicators, is also featured.

Mbol et al. (2016) leverage the Kullback-Leibler divergence (Kullback and Leibler, 1951) to determine whether a process is turning structured file data (e.g., a JPEG image) into an encrypted file.

R2D2 (Gutierrez et al., 2018) addresses the detection of so-called “wiper” ransomware. The system, which is built on top of a virtualization infrastructure, leverages VMI to detect the execution of secure deletion algorithms on the target file system.

FlashGuard (Huang et al., 2017) aims at preventing/delaying clear data deletion by leveraging the properties of SSDs. In fact, SSDs do not directly overwrite chunks of data in their final destination (due to a certain delete latency). FlashGuard operates at firmware level, making it possible to retain such out-of-place temporary copies as a “backup” of overwritten data.

The DaD (Data Aware Defense) (Palisse et al., 2017) system monitors processes’ file activity at run-time and measures variations in data distribution using a metric based on the χ² goodness-of-fit test (Cochran, 1952) instead of Shannon entropy. In order to validate the achieved results, the authors also propose a full-featured test environment, Malware-o-Matic, which is suitable for application in the testbed scenario.

RansomCare (Faghihi and Zulkernine, 2021) measures the extent of changes in the file structure and in the file content entropy. A malice score is computed for each application. Whenever the score reaches the configured Anomalous Data Limit (ADL), the system stops the application and warns the user.

Honeypots can be used to improve ransomware detection, as in DcyFS (Kohlbrenner et al., 2017) and R-Locker (Gómez-Hernández et al., 2018). The approach is the following. Decoy files (whose content is assumed never to change) are disseminated throughout the file system and an agent periodically verifies whether their content has changed since the previous check. Changes in decoy files reveal that unauthorized file system activity is in progress.
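The periodic verification of decoy files can be sketched with content fingerprints. The example below is a minimal illustration assuming SHA-256 digests and hypothetical helper names (`snapshot`, `tampered`); it does not describe how DcyFS or R-Locker are actually implemented.

```python
import hashlib
from pathlib import Path


def fingerprint(path) -> str:
    """SHA-256 digest of the file content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(decoys):
    """Record the baseline fingerprint of every decoy file."""
    return {str(path): fingerprint(path) for path in decoys}


def tampered(baseline):
    """Return the decoys that were modified or removed since the baseline."""
    changed = []
    for path, digest in baseline.items():
        target = Path(path)
        if not target.exists() or fingerprint(target) != digest:
            changed.append(path)
    return changed
```

An agent would call `snapshot` once after planting the decoys and then invoke `tampered` at regular intervals, raising an alert whenever the returned list is not empty.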

5 DATASETS

Rich and up-to-date ransomware collections (datasets) are a central need in the development of any ransomware detection system. Indeed, datasets are essential both for tuning and training detectors and for evaluating their performance and accuracy.

Several initiatives now gather malware samples and make them available through online repositories such as VirusTotal (Chronicle, 2021), VirusShare (Corvus Forensics, 2021) and Hybrid-Analysis (Hybrid Analysis GmbH, 2018). With such services, users can (1) contribute to the repository by submitting captured malware along with any available information about its behavior and origin; (2) submit any suspect executable to have it analyzed; and (3) query the repository for malware according to its name, classification, fingerprint and so on. In some cases, malware repositories provide applications and APIs (e.g., VxAPI (Hybrid Analysis GmbH, 2018)) that enable registered users to automate their interaction with the database and to handle large numbers of queries.

Several research labs have made available the

datasets built while developing their projects, for

the sake of the repeatability and reproducibility of

achieved results. The authors of ShieldFS (Continella et al., 2016) disclosed their collection of ransomware samples and a detailed description of the testbed used in their experiments, along with all the IRP log files produced (Continella et al., 2018). Datasets and further information about experimental results have been released in Andronio et al. (2015) for the HelDroid project and in Sgandurra et al. (2016) for the EldeRan project. Another example is the PayBreak (Kolodenker et al., 2017a) team, which has made available the RADDAR toolkit (Kolodenker et al., 2017b).

ICISSP 2022 - 8th International Conference on Information Systems Security and Privacy

Table 3: Experimental setup: testbed file system composition, ransomware samples, and measured performance.

| System | Testbed | Act. samples | TP | FP | FN |
|---|---|---|---|---|---|
| CryptoLock | Agrawal et al. (2007), Hicks et al. (2008) | 492 | 93% | n/a | n/a |
| ShieldFS | 11 workstations held by volunteers for “several weeks” | 305 | 97.70% | 0 | 0 |
| DaD | DigitalCorpora (Digital Corpora Initiative, 2009) | 798 | 99.37% | n/a | 0.62% |
| R2D2 | GovDoc (Garfinkel et al., 2009) | n/a | 99.80% | 0.20% | 0.20% |
| Redemption | 5 workstations held by volunteers for one week | 1174 | 100.0% | 0.8% | n/a |
| UNVEIL | n/a (“typical FS layout”) | 2121 | 96.3% | 0 | n/a |
| RansomCare | Agrawal et al. (2007), Hicks et al. (2008) | 2389 | 98.38% | 0.0049% | n/a |

6 CONCLUSION

We presented a survey of some existing ransomware detection systems based on the measurement of quantitative indicators representing the activity each process performs on the file system. This approach has proven to be quite effective and promising. In this paper we focused on the strategies for choosing and interpreting the indicators adopted in each system, investigating their impact on effectiveness and performance. For each indicator considered, we discussed how to address problems such as containing false positives and resisting evasion attempts.

In conclusion, we put forth some points that, in our opinion, are worthy of further reflection. First, a common specification of indicators is still lacking. In some cases, the same statistics are computed differently, and the same indicator is given rather different definitions (e.g., funneling). Second, the testbed file systems used for performance evaluation differ widely. This makes the systems hard to compare, as this factor remarkably affects performance measurements. Finally, common guidelines for performance measurement are probably needed: different projects leverage different benchmarking tools, each measuring different things.

In this work, we paid great attention to the definition of malice indicators, as we believe that, due to its characteristics, ransomware can be effectively contained by correctly interpreting the traces it leaves on the file system rather than by attempting to recognize its executable code.

Nevertheless, we left aside aspects related to the classification of suspect processes. To this end, the overwhelming majority of currently available ransomware detectors leverage machine learning techniques. Different techniques perform differently in different scenarios (operational vs. testbed), and this may depend on several factors. Moreover, the quality of datasets and samples is very important as well. The availability of many datasets and the growing diffusion of online cooperative repositories (such as VirusShare) have boosted the development of interesting new solutions. However, the risk that adversaries could submit misleading datasets in order to affect the training and testing of detectors is concrete and has to be addressed. We plan to extend our investigation in both these directions.

REFERENCES

Agrawal, N., Bolosky, W. J., Douceur, J. R., and Lorch,

J. R. (2007). A ﬁve-year study of ﬁle-system meta-

data. ACM Transactions on Storage (TOS), 3(3):9.

Andronio, N., Zanero, S., and Maggi, F. (2015). HelDroid:

Dissecting and detecting mobile ransomware. In In-

ternational Workshop on Recent Advances in Intru-

sion Detection, pages 382–404. Springer.

Bordoni, L., Conti, M., and Spolaor, R. (2017). Mirage: To-

ward a stealthier and modular malware analysis sand-

box for android. In European Symposium on Research

in Computer Security, pages 278–296. Springer.

Bulazel, A. and Yener, B. (2017). A survey on automated dynamic malware analysis evasion and counter-evasion: PC, mobile, and web. In Proceedings of the 1st Reversing and Offensive-oriented Trends Symposium, page 2. ACM.

Catuogno, L. and Galdi, C. (2016). On the evaluation of se-

curity properties of containerized systems. In 2016

15th International Conference on Ubiquitous Com-

puting and Communications and 2016 International

Symposium on Cyberspace and Security (IUCC-CSS),

pages 69–76.

Catuogno, L., Galdi, C., and Pasquino, N. (2017). Mea-

suring the effectiveness of containerization to prevent

power draining attacks. In 2017 IEEE International

Workshop on Measurement and Networking (M&N),

pages 1–6.

Catuogno, L., Galdi, C., and Pasquino, N. (2018). An effec-

tive methodology for measuring software resource us-

age. IEEE Transactions on Instrumentation and Mea-

surement, 67(10):2487–2494.

Chen, P. M. and Noble, B. D. (2001). When virtual is better

than real [operating system relocation to virtual ma-

chines]. In Proceedings Eighth Workshop on Hot Top-

ics in Operating Systems, pages 133–138.

Chronicle (2021). VirusTotal community. https://www.virustotal.com.

Guarnieri, C. et al. (2011). Cuckoo Sandbox web page. https://cuckoosandbox.org.

Cochran, W. G. (1952). The χ² test of goodness of fit. The Annals of Mathematical Statistics, pages 315–345.

Continella, A., Guagnelli, A., Zingaro, G., De Pasquale,

G., Barenghi, A., Zanero, S., and Maggi, F. (2016).

ShieldFS: a self-healing, ransomware-aware ﬁlesys-

tem. In Proceedings of the 32nd Annual Conference

on Computer Security Applications, pages 336–347.

ACM.

Continella, A., Guagnelli, A., Zingaro, G., De Pasquale,

G., Barenghi, A., Zanero, S., and Maggi, F. (2018).

ShieldFS website. http://shieldfs.necst.it/.

Corvus Forensics (2021). VirusShare repository. https://virusshare.com.

Demme, J., Maycock, M., Schmitz, J., Tang, A., Waksman,

A., Sethumadhavan, S., and Stolfo, S. (2013). On

the feasibility of online malware detection with per-

formance counters. In ACM SIGARCH Computer Ar-

chitecture News, volume 41, pages 559–570. ACM.

Digital Corpora Initiative (2009). Corpora. http://digitalcorpora.org/corpora/.

Don Capps et al. (2002). IOZone ﬁle system benchmark.

http://www.iozone.org/.

Faghihi, F. and Zulkernine, M. (2021). RansomCare: Data-centric detection and mitigation against smartphone crypto-ransomware. Computer Networks, 191:108011.

Garfinkel, S., Farrell, P., Roussev, V., and Dinolt, G. (2009). Bringing science to digital forensics with standardized forensic corpora. Digital Investigation, 6:S2–S11.

Garﬁnkel, T. and Rosenblum, M. (2003). A virtual machine

introspection based architecture for intrusion detec-

tion. In Proceedings of the Network and Distributed

System Security Symposium, NDSS 2003, San Diego,

California, USA. The Internet Society.

Gaspari, F. D., Hitaj, D., Pagnotta, G., Carli, L. D., and

Mancini, L. V. (2020). The naked sun: Malicious

cooperation between benign-looking processes. In

Conti, M., Zhou, J., Casalicchio, E., and Spognardi,

A., editors, Applied Cryptography and Network Se-

curity - 18th International Conference, ACNS 2020,

Rome, Italy, October 19-22, 2020, Proceedings, Part

II, volume 12147 of Lecture Notes in Computer Sci-

ence, pages 254–274. Springer.

Gaspari, F. D., Hitaj, D., Pagnotta, G., Carli, L. D., and

Mancini, L. V. (2021). Reliable detection of com-

pressed and encrypted data. CoRR, abs/2103.17059.

Hybrid Analysis GmbH (2018). A generic interface and CLI for all endpoints of the Falcon Sandbox API. https://github.com/PayloadSecurity/VxAPI.

Gómez-Hernández, J., Álvarez-González, L., and García-Teodoro, P. (2018). R-Locker: Thwarting ransomware action through a honeyfile-based approach. Computers & Security, 73:389–398.

Google Inc. (201x). Android FileObserver. https://developer.android.com/reference/android/os/FileObserver.

Gutierrez, C. N., Spafford, E. H., Bagchi, S., and Yurek,

T. (2018). Reactive redundancy for data destruction

protection (R2D2). Computers & Security.

Hicks, B. J., Dong, A., Palmer, R., and Mcalpine, H. C.

(2008). Organizing and managing personal elec-

tronic ﬁles: A mechanical engineer’s perspective.

ACM Transactions on Information Systems (TOIS),

26(4):23.

Huang, J., Xu, J., Xing, X., Liu, P., and Qureshi, M. K. (2017). FlashGuard: Leveraging intrinsic flash properties to defend against encryption ransomware. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 2231–2244. ACM.

Hybrid Analysis GmbH (2018). Hybrid Analysis. https://www.hybrid-analysis.com.

Kharraz, A., Arshad, S., Mulliner, C., Robertson, W., and

Kirda, E. (2016). UNVEIL: A Large-Scale, Auto-

mated Approach to Detecting Ransomware. In 25th

USENIX Security Symposium (USENIX Security 16),

pages 757–772. USENIX Association.

Kharraz, A. and Kirda, E. (2017). Redemption: Real-time

protection against ransomware at end-hosts. In In-

ternational Symposium on Research in Attacks, Intru-

sions, and Defenses, pages 98–119. Springer.

Kharraz, A., Robertson, W., Balzarotti, D., Bilge, L., and

Kirda, E. (2015). Cutting the gordian knot: A look un-

der the hood of ransomware attacks. In International

Conference on Detection of Intrusions and Malware,

and Vulnerability Assessment, pages 3–24. Springer.

Kohlbrenner, A., Araujo, F., Taylor, T., and Stoecklin,

M. P. (2017). POSTER: Hidden in plain sight: A

ﬁlesystem for data integrity and conﬁdentiality. In

Proceedings of the 2017 ACM SIGSAC Conference

on Computer and Communications Security, pages

2523–2525. ACM.

Kolodenker, E., Koch, W., Stringhini, G., and Egele, M. (2017a). PayBreak: Defense against cryptographic ransomware. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 599–611. ACM.

Kolodenker, E., Koch, W., Stringhini, G., and Egele, M. (2017b). Real-time Automation to Discover, Detect and Alert of Ransomware (RADDAR). https://github.com/BUseclab/raddar.

Kullback, S. and Leibler, R. A. (1951). On information and

sufﬁciency. Ann. Math. Statist., 22(1):79–86.

Le, T. (2015). A recommended framework for anomaly intrusion detection system (IDS). In GI-Jahrestagung, pages 1829–1840.

Lengyel, T. K., Maresca, S., Payne, B. D., Webster, G. D.,

Vogl, S., and Kiayias, A. (2014). Scalability, ﬁdelity

and stealth in the DRAKVUF dynamic malware anal-

ysis system. In Proceedings of the 30th Annual Com-

puter Security Applications Conference, pages 386–

395. ACM.

Lin, J. (1991). Divergence measures based on the Shannon

entropy. IEEE Transactions on Information Theory,

37(1):145–151.

Ma, W., Duan, P., Liu, S., Gu, G., and Liu, J.-C. (2012).

Shadow attacks: automatically evading system-call-

behavior based malware detection. Journal in Com-

puter Virology, 8(1):1–13.

Mbol, F., Robert, J.-M., and Sadighian, A. (2016). An efficient approach to detect TorrentLocker ransomware in computer systems. In International Conference on Cryptology and Network Security, pages 532–541. Springer.

Microsoft Inc. (2014). File system minifilter drivers. https://docs.microsoft.com/en-gb/windows-hardware/drivers/ifs/file-system-minifilter-drivers.

Milosevic, J., Sklavos, N., and Koutsikou, K. (2016). Malware in IoT software and hardware.

More, A. and Tapaswi, S. (2014). Virtual machine intro-

spection: towards bridging the semantic gap. Journal

of Cloud Computing, 3(1):16.

Oberheide, J., Bailey, M., and Jahanian, F. (2009). Poly-

Pack: an automated online packing service for op-

timal antivirus evasion. In Proceedings of the 3rd

USENIX conference on Offensive technologies, pages

9–9. USENIX Association.

Palisse, A., Durand, A., Le Bouder, H., Le Guernic, C., and Lanet, J.-L. (2017). Data Aware Defense (DaD): Towards a generic and practical ransomware countermeasure. In Nordic Conference on Secure IT Systems, pages 192–208. Springer.

Payne, B. D., Maresca, S., Lengyel, T. K., and Saba, A. (2015). LibVMI GitHub repository. http://libvmi.com/.

Pont, J., Arief, B., and Hernandez-Castro, J. (2020). Why

current statistical approaches to ransomware detection

fail. In Susilo, W., Deng, R. H., Guo, F., Li, Y., and

Intan, R., editors, Information Security - 23rd Inter-

national Conference, ISC 2020, Bali, Indonesia, De-

cember 16-18, 2020, Proceedings, volume 12472 of

Lecture Notes in Computer Science, pages 199–216.

Springer.

Richer, T. J. (2017). Entropy-based detection of botnet com-

mand and control. In Proceedings of the Australasian

Computer Science Week Multiconference, page 75.

ACM.

Roussev, V. (2010). Data ﬁngerprinting with similarity di-

gests. In IFIP International Conference on Digital

Forensics, pages 207–226. Springer.

Scaife, N., Carter, H., Traynor, P., and Butler, K. R. (2016). CryptoLock (and drop it): stopping ransomware attacks on user data. In Distributed Computing Systems (ICDCS), 2016 IEEE 36th International Conference on, pages 303–312. IEEE.

Sgandurra, D., Muñoz-González, L., Mohsen, R., and Lupu, E. C. (2016). Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv preprint arXiv:1609.03020.

Song, S., Kim, B., and Lee, S. (2016). The effective ran-

somware prevention technique using process monitor-

ing on android platform. Mobile Information Systems,

2016.

Subedi, K. P., Budhathoki, D. R., Chen, B., and Dasgupta, D. (2017). RDS3: Ransomware defense strategy by using stealthily spare space. In Computational Intelligence (SSCI), 2017 IEEE Symposium Series on, pages 1–8. IEEE.

Underwriters Labs LLC (2013). PCMark 8. https://benchmarks.ul.com/.

Viswanathan, A., Tan, K., and Neuman, C. (2013). De-

constructing the assessment of anomaly-based intru-

sion detectors. In International Workshop on Re-

cent Advances in Intrusion Detection, pages 286–306.

Springer.

Young, A. and Yung, M. (1996). Cryptovirology: extortion-

based security threats and countermeasures. In Pro-

ceedings 1996 IEEE Symposium on Security and Pri-

vacy, pages 129–140.
