A Categorical Data Approach for Anomaly Detection in WebAssembly

Applications

Tiago Heinrich

1 a

, Newton C. Will

2 b

, Rafael R. Obelheiro

3 c

and Carlos A. Maziero

1 d

Computer Science Department, Federal University of Paraná, Curitiba, 81530–015, Brazil

Computer Science Department, Federal University of Technology, Paraná, Dois Vizinhos, 85660–000, Brazil

Computer Science Department, State University of Santa Catarina, Joinville, 89219–710, Brazil

Keywords:

WebAssembly, WASI Interface, Intrusion Detection, Web Services, Security.

Abstract:

The security of Web Services for users and developers is essential; since WebAssembly is a new format that has

gained attention in this type of environment over the years, new measures for security are important. However,

intrusion detection solutions for WebAssembly applications are generally limited to static binary analysis. We

present a novel approach for dynamic WebAssembly intrusion detection, using data categorization and machine

learning. Our proposal analyses communication data extracted from the WebAssembly sandbox, with the goal

of better capturing the applications’ behavior. Our approach was validated using two strategies, online and

ofﬂine, to assess the effectiveness of categorical data for intrusion detection. The obtained results show that

both strategies are feasible for WebAssembly intrusion detection, with a high detection rate and low false

negative and false positive rates.

1 INTRODUCTION

WebAssembly, also known as Wasm, is a bytecode-

like format that aims to enable the portability of other

languages to the Web environment (Battagline, 2021).

It runs in an isolated environment provided by

a sandbox and is conceived to easily interact with

other languages, through a shared-memory interaction

model called exposed functions. WebAssembly helps

developers with the implementation of Web services,

providing a performance gain and code size reduction.

WasmEdge and Atmo are popular examples of Web

services applications developed using WebAssembly

(WasmEdge, 2023; Connor, 2023). WebAssembly had

a quick adoption and is supported by all the main-

stream Web browsers. Its current stable version is 1.0,

and it is being actively developed to add new features.

Wasm applications are executed inside a sandbox

environment that offers a layer of isolation between

the application and the underlying environment. How-

ever, security issues exist and can affect Web users

(Kim et al., 2022). Attackers are also exploring Wasm

https://orcid.org/0000-0002-8017-1293

https://orcid.org/0000-0003-2976-4533

https://orcid.org/0000-0002-4014-6691

https://orcid.org/0000-0003-2592-3664

features to make cryptojacking and to obfuscate mali-

cious applications (Musch et al., 2019; Romano et al.,

2022).

Proposals for security improvement developed in

the latest years consider the classiﬁcation of Wasm

binaries (Romano and Wang, 2020), hardening the

sandbox by the use of Trusted Execution Environ-

ments (

TEEs

) (Qiang et al., 2018; Ménétrey et al.,

2021), using fuzzing techniques to ﬁnd ﬂaws in soft-

ware implementation (Liang et al., 2018), and dynamic

analysis tools that extract information from binaries

to ﬁnd vulnerabilities (Lehmann and Pradel, 2019;

Brito et al., 2022). However, to the best of our knowl-

edge, solutions that explore the interactions between

the application and the sandbox environment to detect

anomalous application behavior were not proposed

yet.

The use of data extracted from Inter-Process Com-

munication (

IPC

) for anomaly detection is not new

(Forrest et al., 1996; Castanhel et al., 2021; Lemos

et al., 2022). This approach allows to observe the

operations being executed by the application, captur-

ing interaction patterns that can be used by Machine

Learning (

) models to identify malicious behavior.

Our proposal explores the interactions between the

WebAssembly application and its runtime for intrusion

detection.

Heinrich, T., Will, N., Obelheiro, R. and Maziero, C.

A Categorical Data Approach for Anomaly Detection in WebAssembly Applications.

DOI: 10.5220/0012252800003648

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 10th International Conference on Information Systems Security and Privacy (ICISSP 2024), pages 275-284

ISBN: 978-989-758-683-5; ISSN: 2184-4356

275

The approach proposed here consists in extracting

data from the interactions between a Wasm application

and its runtime and classifying such data using a cate-

gorical model. This pre-processed data is then used to

train

models to capture the application behavior.

Finally, the trained models are used for intrusion detec-

tion. We explore two detection strategies (online and

ofﬂine) with different models of

for the detection

of threats.

To the best of our knowledge, we present the ﬁrst

proposal to use dynamic interaction data for intrusion

detection in the WebAssembly environment. Our main

contributions are:

•

An intrusion detection solution for WebAssembly

based on interaction data;

•

A comparison between online and ofﬂine ap-

proaches for the proposed intrusion detection;

•

A generic categorical data model that explores in-

teraction data to detect anomalies; this model is

generic enough to be applied to other situations,

like OS system calls and Android binder calls; and

•

An overview on the use of categorical data in se-

curity approaches that explore machine learning

solutions.

The remainder of this paper is structured as fol-

lows: Section 2 presents the background; Section 3

presents related works; Section 4 discusses the pro-

posal; Section 5 presents the evaluation; and Section 6

concludes the paper.

2 BACKGROUND

This Section presents a brief overview about the con-

cepts used in this work. It contextualizes the We-

bAssembly environment, the use of categorical data

approaches, and intrusion detection using Machine

Learning (ML).

2.1 WebAssembly

WebAssembly (also known as Wasm) is a bytecode

binary format as a target. It presents a performance

gain and reduced size in comparison with Web lan-

guages. Programs in WebAssembly can be written

using WebAssembly Text (

WAT

), a textual program-

ming format that is compiled into the bytecode. Be-

sides that, developers can use languages such as Rust,

C, and C++, which are not natively supported in the

Web environment, but can also be compiled into Wasm

binaries. This allows to easily port code developed in

such languages to the Web environment. Such features

open new opportunities and easiness for developers

(Hoffman, 2019).

The WebAssembly binary contains a module that

encapsulates metadata (like the the WebAssembly ver-

sion), functions, variables, information about exported

functions that can be reached by other applications,

or even imported resources (Battagline, 2021). A We-

bAssembly application executes inside a sandbox en-

vironment, with external resources being accessible

through an Application Programming Interface (

API

)

(Kim et al., 2022). This design choice allows applica-

tions to run on a wide range of platforms, reduces the

size of messages exchanged with the server to retrieve

content, and offers more security for the environment

when running unknown binaries (Battagline, 2021).

Figure 1 presents the WebAssembly environment,

with three of the most popular ways that WebAssem-

bly applications can be used as a service. The three

options presented from the left to the right are, (i) a

runtime inside a host (ii) a container environment, and

(iii) inside the Web browser. The application interacts

with the underlying environment through the use of

the WebAssembly System Interface (

WASI

) calls that

passes through the WASI

API

. The interface deﬁnes

rules and controls the interactions that are performed

between the application and the environment. This

layer allows the execution of native operations with-

out breaking the security levels already deﬁned for

WebAssembly applications (Battagline, 2021).

Kernel Space

Hardware

User Space

WebAssembly

https://webassembly.org/

WASI libc

WASI API

GNU C Library (glibc)

Figure 1: WebAssembly architecture overview.

The interaction with the

API

is made through

WASI calls, which have behavior similar to system

calls, but in a more abstract layer, closer to the applica-

tion. They allow a WebAssembly application running

inside the sandbox to access resources provided by the

underlying environment. Currently, 45 distinct

WASI

calls are supported (more calls will be added in future

versions) (BytecodeAlliance, 2021a).

Despite the similarity between WASI calls and sys-

tem calls, WASI calls are made speciﬁcally for the

WebAssembly environment, not having a 1:1 binding

to system calls. This allows to offer WASI calls more

abstract and better suited to build Web applications, in

ICISSP 2024 - 10th International Conference on Information Systems Security and Privacy

276

opposition to system calls, which should be closer to

kernel resources.

2.2 Categorical Data

Categorical data or categorical variables are deﬁned

as a data type that can represent a known number of

variables by a limited number of categories (Powers

and Xie, 2008). Quantitative and qualitative variables

can usually be represented by categories (except con-

tinuous variables). The use of data categorization as a

measurement strategy is popular in the social sciences

ﬁeld, representing data types such as age, gender, and

location.

Categorical representation in speciﬁc situations can

be a better ﬁt to represent data that would be oversim-

pliﬁed by numerical representations. However, both

representations are not that different, the key differ-

ence is that the probability distribution of the ﬁrst one

is associated with categories instead of numerical data

(CHARU, 2017).

The use of categorical data representation also lim-

its the strategies that can be applied for the data evalu-

ation. Statistical algorithms, linear models, proximity-

based algorithms, and density-based models are exam-

ples of techniques that should be adapted to correctly

work with categorical data (CHARU, 2017).

Categorical data analysis is a widely explored ﬁeld.

(Bay and Pazzani, 1999) present a strategy for min-

ing categorical data, proposing a methodology for the

identiﬁcation of contrasting groups. Also, (Das et al.,

2008) proposes a range of solutions focusing on pat-

tern detection; anomaly detection is covered by (Liao

and Vemuri, 2002; Taha and Hadi, 2019), and outlier

detection for categorical data by (Wu and Wang, 2011;

Li et al., 2018).

In our proposal, the use of categorical representa-

tions for the data allows us to (i) evaluated the use of

this data representation for intrusion detection solu-

tions, and (ii) have a better understanding of categori-

cal data representations for machine learning strategies.

Data categorization is important because it allows us-

ing a representation that can highlight new patterns for

the samples by the use of categories or groups.

2.3 Intrusion Detection Using Machine

Learning

Intrusion Detection System (

IDS

) is a mechanism that

aims to warn about activities that may put the system

security at risk (Axelsson, 2000). The mechanism can

monitor the network, hosts, and applications, looking

for signs of attacks or intrusions (Stallings et al., 2012).

When an evidence of an attack is observed, an alert

is issued for further analysis, and in some cases, the

system itself can respond and take countermeasures.

IDS

tries to distinguish different behaviors and

patterns to identify unwanted actions. Two strategies

are popularly used to identify threats (Lam, 2005):

Signature-based, which uses a database of known at-

tack samples to identify unwanted behavior; these

samples are signatures of threats or known attack be-

havior. On the other hand, an anomaly-based

IDS

focuses on using statistical models to characterize the

normal/usual activity in the system/environment. Thus,

an anomaly portrays some activity that deviates from

the deﬁned normal model (Debar et al., 1999; Yassin

et al., 2013).

Over the years, the adoption of Machine Learn-

ing (

) solutions to improve security strategy in-

creased. Intrusion Detection Systems (

IDSs

) can be

improved by the use of models that allow an appli-

cation to learn the patterns of attacks by itself.

models are capable of creating a general representa-

tion of the environment, identifying key differences

between types of attacks, and, for some speciﬁc mod-

els, continuously learning during the time of evaluation

(Mishra et al., 2018; Ceschin et al., 2020).

3 RELATED WORK

There are some few research works targeting We-

bAssembly binary code analysis. Most of them focus

on feature extraction from the binary code to identify

vulnerabilities and other problems.

Wasabi is a framework for dynamic analysis of

WebAssembly binaries (Lehmann and Pradel, 2019).

It allows the analysis of instructions, call graphs, taint

and tracing. The information is collected during the

execution and is not limited to WebAssembly appli-

cations. An

API

allows new evaluation systems to be

implemented and enables the extraction of information

from the framework.

The paper (Romano and Wang, 2020) presents

WASim, a tool that performs the classiﬁcation of We-

bAssembly binaries. A dataset for model training and

evaluation was built using binaries extracted from the

top 1 million sites in the Alexa ranking. Their predic-

tion strategy achieved an accuracy of 91.6%.

Wasmati is a tool for static analysis of WebAssem-

bly binaries (Brito et al., 2022), aiming at identifying

code vulnerabilities. Property graphs are generated

for each application based on 10 queries, and a set of

datasets was scanned, looking for vulnerabilities.

Another strategy for static analysis of WebAssem-

bly binaries is presented by (Stiévenart and De Roover,

2020). Flows of information are extracted for the bi-

A Categorical Data Approach for Anomaly Detection in WebAssembly Applications

277

nary code, allowing the evaluation of function calls in

WebAssembly.

4 PROPOSAL

Our proposal consists of using data extracted from

WASI calls issued by the application during its ex-

ecution, to detect anomalies. The extracted data is

classiﬁed using a categorical data strategy and then

used to train a machine learning model for detecting

such threats.

Two strategies to evaluate the proposal were de-

ﬁned. The ﬁrst strategy consists of an ofﬂine evalua-

tion, that is aimed at the classiﬁcation of WebAssembly

binaries. The second strategy is an online evaluation,

that aims to simulate a real case of intrusion detection,

where the application would be running. Both solu-

tions are aimed at identifying intrusion detection in

WebAssembly applications.

The categorization strategy we propose is pre-

sented in Section 4.1, the dataset built for the eval-

uation is described in Section 4.2, and the proposal

evaluation itself is presented in Section 5.

4.1 Data Categorization

Instead of directly using WASI calls as features for

the machine learning modes, our proposal is based

on the data categorization. A variety of approaches

can be applied to the data, as the categorization allows

to highlight new characteristics from the dataset. For

intrusion detection, the security risk associated to each

call can be highlighted through the use of categoriza-

tion.

Categorical data are variables that are measured

by categorizing a limited set of “values” (Powers and

Xie, 2008). This is a basic data type used for anal-

ysis (CHARU, 2017). Discrete, ordinal, or nominal

variables can be represented as categories. For se-

curity and machine learning, this concept is applied

with limitations. The application of data categorization

strategies requires understanding the data to group the

variables in such a way that key points are represented

in the categories and information on their respective

members is not lost (Markman and Ross, 2003).

The system calls taxonomy proposed by

(Bernaschi et al., 2002) is a pioneer work considering

system call grouping. Through a categorization

strategy they deﬁne two groups for the classiﬁcation

of system calls. The ﬁrst group aims to classify system

calls according to their functionality, and the second

group deﬁnes a threat level for each system call. The

threat levels are 1 (enable full system compromise), 2

(allow a denial-of-service attack against the system),

3 (enable subversion of the calling process), and 4

(harmless calls). This classiﬁcation is hierarchical, as

it assumes that a system call classiﬁed at level

can

also perpetrate an attack at threat level

for

i < j

. The

ﬁrst issue is treating threat levels as an hierarchy, with

threats at lower levels subsuming threats at higher

levels, which may not always be true. A counter

example would be the

fork

system call, which can

cause a denial-of-service (

DoS

) on the system (and is

thus classiﬁed at threat level 2) but cannot subvert a

process (threat level 3).

In our proposal, we aim to remove hierarchical

problems from the classiﬁcation deﬁned by (Bernaschi

et al., 2002) and propose a categorical deﬁnition that

is ﬂexible for using with other data types. A range of

communication mechanisms are used in the Operating

System (

), such as system calls (Galvin et al., 2003),

binder calls (Lemos et al., 2023), or

IPC

, and

IDSs

ana-

lyzing them can beneﬁt from exploring categorization.

Each WASI call has speciﬁc information related

to its operation, functionality, and security. Each in-

teraction (or operation) that an application performs

with the system can be categorized. Also, each inter-

action also has a respective risk associated to it. Table

1 presents two possible categorizations based on the

operation that the application generates when interact-

ing with the system. Two categories are presented, a

level for the type of operation and a risk level for the

operations.

The ﬁve risk levels presented aim to group interac-

tions (that in our proposal are WASI calls) that alone

offer some kind of risk to the system if used by a mali-

cious application. When considering the deﬁned risk

group (high or low), we tend to directly relate it to the

type of operation that the interaction performs in the

system. The ﬁrst three risk levels (A, B, and C), in the

high risk group, are interactions that can be used by

an attacker. Such risk levels deﬁne different behavior

of operations that malicious applications have on the

environment.

We also propose a classiﬁcation based on the func-

tions performed by each interaction with the runtime

(also WASI calls). Such classes are shown in Table 2.

This classiﬁcation is an expansion of categorizations

already proposed, such as (Bernaschi et al., 2002) and

(Galvin et al., 2003). We added to it the device ma-

nipulation (10) and the removed/debug (9) interaction

groups, since several changes were observed in the

system APIs.

We assume that an application making calls clas-

siﬁed in class A more frequently tends to offer more

risk than an application that only uses calls presented

in class D. This classiﬁcation is not used alone in the

ICISSP 2024 - 10th International Conference on Information Systems Security and Privacy

278

Table 1: Operation Classiﬁcation.

Risk Level Risk Group Type of operation

High

These calls alone offer system risk/control

B Calls that may offer some risk to the system

C Calls that are dangerous when much repeated

Low

Harmless calls

E Unused/deprecated calls

Table 2: Functionalities Classiﬁcation.

Group Functionalities

1 File manipulation

2 Process control

3 Module management

4 Memory management

5 Time operation

6 Communication

7 System information

8 Reserved

9 Not implemented/removed/debug

10 Device manipulation

intrusion detection process, but together with the clas-

siﬁcation of functionalities. In this way, an application

that makes calls classiﬁed as A-2 (a process control

call that offers a high risk) offers more risk than if it

makes calls classiﬁed as C-5 (time-related calls that

offer some risk). This distinction contributes to the

training stage of the

models, as it is possible to

emphasize calls that pose more risk.

Table 3 presents the proposed data categorization

for the WASI calls. WebAssembly is a format in active

development, thus we are using the calls supported

for version 1.0. In future versions (

snapshots

) new

calls will be introduced, but this proposal already has

the required categories for the correct classiﬁcation of

such calls.

The advantages of the categorization proposed here

are (i) it is ﬂexible enough to be applied to other data

types; (ii) it reduces the volume of data used for train-

ing and identiﬁcation; (iii) it allows a better understand-

ing of the behaviors behind application interactions;

and (iv) it allows to relate different types of interac-

tions.

4.2 Data Approach

The dataset used for the training and testing of our

proposal was created by us from different data sources,

allowing us to build a dataset that represents a variety

of behavior found in the WebAssembly environment.

Overall, we have 18 types of attack samples present,

with a total of 377 samples to form the anomaly class,

and we have 263 samples for the normality class. The

data samples came from a range of data sources like

test suits, benchmarks, single WebAssembly modules,

and applications (Stiévenart, 2023; Denis, 2023; We-

bAssembly, 2023; Beyer, 2023).

Our goal is to have a varied set of samples that

represents the environment. The variety of the data

samples allows us to better assess how the data catego-

rization proposed is able to highlight the functionality

and operations behind each interaction of the applica-

tion with the underlying environment. The samples

are also helpful to demonstrate that the proposal cate-

gorization is generic and may be applied to other data

sources.

For the extraction of the WASI calls in each binary

sample, we used the Wasmtime runtime CLI (Byte-

codeAlliance, 2021b), which executes standalone. It

provides functionality to generate detailed traces of

the WebAssembly application being run, without in-

strumenting or modifying the application.

5 EVALUATION

To evaluate the proposal, we built the classiﬁcation

strategy, collected data to conduct the experiments, and

train/test the machine learning models. This allows

us to evaluate the efﬁciency of our intrusion detection

solution, based on the categorization of calls from the

WebAssembly environment.

Two types of strategies are possible for intrusion

detection: an online evaluation is made in real-time

and uses the data generated during the runtime of an

application to identify threats. On the other hand, an

ofﬂine evaluation is not made in real-time and is quite

useful in a controlled environment, enabling the study

of the entire application and its interactions.

We deﬁned two groups of tests with the goal of

evaluating both scenarios, in how effectively the use

of calls from the WebAssembly environment is for

intrusion detection solutions. The ﬁrst scenario con-

siders an ofﬂine approach and uses all the interactions

from the environment. The second scenario used a

ﬁxed size sliding window, deﬁning a partial view of

the environment, and simulating an online approach

A Categorical Data Approach for Anomaly Detection in WebAssembly Applications

279

Table 3: WASI Calls classiﬁcation.

WASI Calls

path_link

path_rename

path_symlink

path_unlink_file

path_remove_directory

path_filestat_set_times, args_get, environ_get

fd_fdstat_set_flags

fd_tell

fd_seek

path_create_directory

fd_pread

fd_pwrite

sock_recv

fd_read, sock_send , fd_write, fd_filestat_get, path_create_directory

fd_advise, sched_yield

fd_renumber , fd_allocate, path_open, random_get

sock_recv, sock_send, proc_raise

fd_close

path_readlink

path_filestat_get

args_sizes_get

environ_sizes_get

fd_prestat_get

fd_prestat_dir_name fd_readdir

clock_res_get, clock_time_get

sock_shutdown

fd_filestat_set_times

fd_datasync, fd_sync, fd_fdstat_get

for intrusion detection solutions.

In both scenarios, the interactions from the envi-

ronment were treated equally. Using the same clas-

siﬁcation process for the calls of the WebAssembly

environment. This way, we are able to evaluate the

effectiveness of the use of this type of interaction for

intrusion detection in WebAssembly and discuss how

this type of solution based on the use of categorical

data may be used in other solutions.

Overall, we selected six

models for the experi-

ment, to study the impact of the proposal in different

model approaches. These models are from a range

of supervised learning algorithms, encompassing de-

cision trees classiﬁer, ensemble methods, and neural

network models. The models were used with their

default parameters.

Section 5.1 presents the ofﬂine evaluation, dis-

cussing the detection of each model. Section 5.2

presents the online evaluation. Both strategies are

discussed in-depth, with details on how the machine

learning models behave. Section 5.3 presents a discus-

sion of how the use of calls (communication between

the application and the runtime) can be used for intru-

sion detection, and what we learnt from our evaluation

of such data.

5.1 Ofﬂine Evaluation

In the ofﬂine evaluation, all the information collected

during an application execution is used by the models

during the classiﬁcation phase. Table 4 presents the

results obtained for each model. The dataset was split

in half for the training and test phases. The overall

result is promising for the strategy, with all models

achieving an f1score above 80%.

The precision shows the false positive impact in

the models. Although we obtained high precision val-

ues, our models are still impacted by a small fraction

of the samples being missclassiﬁed and generating

false positives. SGB is the exception in all metrics,

having a low performance overall. The recall doesn’t

depend on precision and is inﬂuenced by false nega-

tive samples, which do not impact our models. The

f1score is based on precision and recall. This metric

enables the description of the positive classes, describ-

ing the adequacy of the models to classify the normal

behavior and the detection of the intrusion classes. In

these three metrics, we are not considering the impact

of true negative samples.

The accuracy and balanced accuracy (BAC) met-

rics show a better understanding of the true positive

and true negative classiﬁcations. Despite not consider-

ing the false negative/positive classes, we notice that

our results are similar to the previous results found for

the f1score, showing that our models are being able to

classify correctly most of the samples. The similarity

between accuracy and BAC results also shows that the

dataset used for the training and test is balanced.

The lower the Brier score is, the most calibrated

the models are to make the classiﬁcation. This evalu-

ation strategy enables the measurement between the

predicted probability of a sample and the achieved re-

sult. Only two models present a brier score higher than

2% and only one of such models present a poor overall

result.

The Stochastic Gradient Descent (SGD) is a linear

classiﬁer, which in our case is being impacted by a

variety of samples found in the dataset. With a high

number of classes that come from a variety of samples,

the linear model is not being able to correctly distin-

guish between the two groups, representing a model

that is not adequate for intrusion detection with this

type of data.

The similarity between Decision Tree and Ad-

ICISSP 2024 - 10th International Conference on Information Systems Security and Privacy

280

Table 4: Performance of the models for the ofﬂine strategy.

Classiﬁer Precision Recall F1Score Accuracy BAC Brier Score

XGBoost 98.36% 100% 99.18% 99.06% 98.91% 0.94%

Decision Tree 98.38% 100% 99.18% 99.06% 98.91% 0.94%

Nu-Support Vector 92.86% 100% 96.30% 95.61% 94.89% 4.39%

MLP 96.81% 100% 98.38% 98.12% 97.81% 1.88%

AdaBoost 98.38% 100% 99.18% 99.06% 98.91% 0.94%

SGD 73.09% 89.56% 80.49% 75.24% 72.88% 24.76%

aBoost results (over even XGBoost), is due to the prox-

imity of the strategies used in such models (Hastie

et al., 2009; Molnar, 2020). They are also popular for

intrusion detection and malware detection, enabling

users to study the decision made by the classiﬁer.

Overall, our results are promising for the proposal

for intrusion detection. Most of the evaluated models

were able to classify correctly both classes, with few

samples being missclassiﬁed. The strategy for ofﬂine

detection is quite adequate using the classiﬁcation and

calls between the WebAssembly application and its

runtime.

5.2 Real-Time Evaluation

For the online proposal, we have the objective in iden-

tify threats during execution time. A new model was

trained and tested. However, we simulated a limited

vision for the models using a ﬁxed size sliding window,

using two strategies: non-overlapping sliding windows

(abc, def, ghi, ...) and overlapping sliding windows

(abc, bcd, cde, ...).

With a sliding window we can understand the im-

pact of our proposal in a real-time intrusion detection

strategy where only small portions of the execution

interactions are available, since the application is still

running. In comparison with the previous solution, the

same categorical classiﬁcation is used, with the only

difference being the use of the sliding window.

The sliding window size was deﬁned as three (3),

based on previous research using this value, the size

of the traced applications, and characteristics observed

from WebAssembly applications (Liu et al., 2018; Cas-

tanhel et al., 2020). WebAssembly applications are

limited to a module, a limited number of types ex-

ist in the format, and its design limits the number of

operations available. In consequence, the size of an

application is quite smaller than what would be found

in other languages, and it generates a smaller amount

of calls to the runtime. Such reasons led to choose a

small window size.

Table 5 presents the results of the online experi-

ments with a non-overlapping sliding window. Overall,

the restriction on the information amount provided to

the models impacts the detection efﬁcacy. However,

the precision was reduced in four of the classiﬁers and

the recall reduced for all of them, directly affecting the

f1score. The f1score and accuracy for these models

indicates a reduction in the detection rate, leading to

an increase of false positives and false negatives. This

behavior was expected because we are trying to detect

threats with only partial information being provided to

the classiﬁers.

Two of our classiﬁers in Table 5 achieve a better

result for precision than the ofﬂine strategy, showing a

reduction in the number of false positives. However,

the same models have an increase in the number of

false negatives (as the recall shows a reduction in com-

parison with Table 4), and the accuracy describes an

impact in both classes (negative/positive detection rate

are lower than the previous solution).

With BAC and Brier score it is clear that the mod-

els were impacted and had an increase in the number

of missclassiﬁed samples. However, Table 5 shows

a small reduction in comparison with the ofﬂine ap-

proach; we consider such results acceptable. The clas-

siﬁers are still able to detect threats despite missclassi-

fying some samples.

Considering an online (i.e. real-time) intrusion

detection, the f1score metric above 93% obtained in

four of the classiﬁers is considered an adequate result.

The low score obtained by the SGD classiﬁer was

expected; as presented in Table 4, such linear model

seems not well-suited to this kind of data.

Table 6 presents the results for the overlapping

sliding window with overlap 1 (abc, cde, efg, ...). We

included these results as it is quite popular when using

sliding windows to consider overlapping to simulate

a real-world case and to increase the amount of data,

since more windows are created. We notice an increase

in the number of false positives, despite the reduction

in the number of false negatives. The models had an

inferior performance as demonstrated by the f1score

result. The similar values of accuracy and BAC be-

tween Tables 5 and 6 are not enough to suggest that

the use of overlapping is a good strategy when using

categorical data.

Giving a deeper look at the data, we can better

A Categorical Data Approach for Anomaly Detection in WebAssembly Applications

281

Table 5: Performance of the models for the real-time strategy (non-overlapping).

Classiﬁer Precision Recall F1Score Accuracy BAC Brier Score

XGBoost 94.69% 92.88% 93.78% 94.90% 94.60% 5.10%

Decision Tree 99.86% 92.88% 93.86% 94.97% 94.66% 5.03%

Nu-Support Vector 93.70% 92.88% 93.29% 94.46% 94.23% 5.54%

MLP 93.7% 92.88% 93.29% 94.46% 94.23% 5.54%

AdaBoost 88.20% 96.01% 91.94% 93.03% 93.46% 6.97%

SGD 32.81% 68.58% 44.38% 28.83% 34.66% 71.17%

Table 6: Performance of the models for the real-time strategy (overlapping).

Classiﬁer Precision Recall F1Score Accuracy BAC Brier Score

XGBoost 82.44% 100% 90.37% 92.25% 93.91% 7.75

Decision Tree 82.44% 100% 90.37% 92.25% 93.91% 7.75

Nu-Support Vector 72.17% 100% 83.83% 85.96% 88.97% 14.04

MLP 81.47% 99.84% 89.73% 91.68% 93.43% 8.32

AdaBoost 82.54% 100% 90.44% 92.30% 93.95% 7.7

SGD 71.97% 99.84% 83.65% 85.80% 88.80% 14.2

understand why the overlapping approach is not ideal

for this type of data. Since our interactions that are

WASI calls are being categorized, when the overlap-

ping strategy is adopted we duplicated some of the

information. In this case, we are adding more cate-

gories to the window generated from the trace of the

application, because we have an overlap of one (that

adds a category in each window).

In all the categories, most groups present a similar

number of calls, except for three classes. Two of them

are from the high-risk level and present growth in the

number of calls for C1 (87.4%) and B1 (23.1%). Class

D5 also presented a growth of 37.2%. The increase in

the false positive rate of our models is directly asso-

ciated with the increase in the number of calls in the

categories with high risk. As we are categorizing the

information used, in the case of overlapping we are

also adding information to the dataset that is incorrect.

Since we are duplicating small portions of the data,

we are also adding information related to operations,

functionality, and security of the application that in

the reality do not exist. For example, with a growth

of 87.4%, we are also telling the models that the risk

is higher in comparison with the previous experiment

(Table 5), which is not exact, because the risk is the

same.

The impact in machine learning models also ex-

ists because of the categorization, we are specifying

distinct features that with an overlapping strategy are

misleading the models. For this reason, we cannot

recommend the use of categorization and overlapping

in the same data.

5.3 Use of Calls for Intrusion Detection

The use of communication data for intrusion detection

is not new (Forrest et al., 1996; Lemos et al., 2023).

However, an approach that considers this type of in-

teraction in the WebAssembly sandbox is a novelty.

Considering such data as categorical is also a new strat-

egy for intrusion detection in this context. Our results

highlight how the use of this type of strategy is able

to represent properly the characteristics found in the

applications on the dataset.

The categorization of variables enables the user to

highlight key factors in the data, that otherwise may

be difﬁcult to identify by machine learning models. In

our case, we were able to highlight security charac-

teristics for the WASI calls, and through the grouping

of operations/functionality, we were able to introduce

new features that present more complete information

about the behavior of each application.

The data categorization also presented good results

for both online/ofﬂine solutions.

For the WebAssembly sandbox, the WASI calls

are representative and, with our experiments, we can

say that our proposal is a viable solution for intrusion

detection solutions for the Web. Considering that dif-

ferent sandbox solutions exist, this proposal can be

adapted to other environments. The categorical classi-

ﬁcation presented is ﬂexible and easily applicable to

other types of communication as system calls and

IPC

mechanisms.

We applied a range of machine learning classiﬁers

intending to evaluate the efﬁcacy obtained by exclu-

sively using categorical data for intrusion detection.

ICISSP 2024 - 10th International Conference on Information Systems Security and Privacy

282

As previously discussed, the models do not appear to

be harmed by this design choice and we can go further

by saying that the use of categorical data to emphasize

speciﬁc features of the data helps the efﬁcacy of the

models.

6 CONCLUSION

This paper presents an intrusion detection solution for

WebAssembly. We use data from the interactions be-

tween the WebAssembly application and the sandbox

runtime to identify malicious applications. Such inter-

actions (WASI calls) were categorized with the goal

of highlighting key features for the machine learning

classiﬁers.

We proposed a categorical classiﬁcation to be used

for intrusion detection, to be applied on the WASI

calls data. The operations and functionalities were the

main aspects used for the classiﬁcation. The result was

a generic model that is not limited to WebAssembly

applications, enabling the categorization of system

calls or IPC messages.

Our results show that for both online/ofﬂine solu-

tions, the use of WASI calls is adequate for intrusion

detection. Most of our models achieve a high detection

rate with low false negative and false positive values.

The models in both cases were able to learn from the

categorical features offered in the training phase. The

results showed adequacy in the use of data categoriza-

tion for intrusion detection, allowing the discussion of

the effectiveness of using categorical data for machine

learning classiﬁcation strategies in intrusion detection.

For future work, we will improve the detection

strategy for WebAssembly in a more complex environ-

ment, not being limited to Web services. We will also

explore further the beneﬁts of using data categorization

in machine learning models. We also aim to explore

different data types that can be used for intrusion de-

tection and may provide performance improvements

when using data categorization.

ACKNOWLEDGEMENTS

This study was ﬁnanced in part by the Coordenação de

Aperfeiçoamento de Pessoal de Nível Superior – Brasil

(CAPES) – Finance Code 001 and Fundação de Am-

paro à Pesquisa e Inovação do Estado de Santa Cata-

rina (FAPESC). The authors also thank the UDESC,

UFPR and UTFPR Computer Science departments.

REFERENCES

Axelsson, S. (2000). Intrusion detection systems: A survey

and taxonomy.

Battagline, R. (2021). The Art of WebAssembly: Build Secure,

Portable, High-Performance Applications. No Starch

Press, San Francisco, CA, USA.

Bay, S. D. and Pazzani, M. J. (1999). Detecting change in

categorical data: Mining contrast sets. In Proceedings

of the ﬁfth ACM SIGKDD international conference on

Knowledge discovery and data mining, pages 302–306.

Bernaschi, M., Gabrielli, E., and Mancini, L. V. (2002). Re-

mus: A security-enhanced operating system. ACM

Transactions on Information and System Security (TIS-

SEC), 5(1):36–61.

Beyer, C. (2023). Amalgamated webassembly system inter-

face test suite. https://github.com/caspervonb/wasi-tes

t-suite.

Brito, T., Lopes, P., Santos, N., and Santos, J. F. (2022).

Wasmati: An efﬁcient static vulnerability scanner for

WebAssembly. Computers & Security, 118:102745.

BytecodeAlliance (2021a). Wasmtime. https://github.com

/bytecodealliance/wasmtime/blob/main/docs/WASI-o

verview.md.

BytecodeAlliance (2021b). Wasmtime. https://docs.wasmt

ime.dev/introduction.html.

Castanhel, G. R., Heinrich, T., Ceschin, F., and Maziero,

C. (2021). Taking a peek: An evaluation of anomaly

detection using system calls for containers. In Pro-

ceedings of the IEEE Symposium on Computers and

Communications, pages 1–6. IEEE.

Castanhel, G. R., Heinrich, T., Ceschin, F., and Maziero,

C. A. (2020). Sliding window: The impact of trace size

in anomaly detection system for containers through

machine learning. In Anais da XVIII Escola Regional

de Redes de Computadores, pages 141–146. SBC.

Ceschin, F., Gomes, H. M., Botacin, M., Bifet, A.,

Pfahringer, B., Oliveira, L. S., and Grégio, A. (2020).

Machine learning (in) security: A stream of problems.

arXiv preprint arXiv:2010.16045.

CHARU, C. A. (2017). OUTLIER ANALYSIS. Springer.

Connor (2023). Atmo. https://www.civo.com/marketplace/a

tmo.

Das, K., Schneider, J., and Neill, D. B. (2008). Anomaly pat-

tern detection in categorical datasets. In Proceedings

of the 14th ACM SIGKDD international conference on

Knowledge discovery and data mining, pages 169–176.

Debar, H., Dacier, M., and Wespi, A. (1999). Towards a

taxonomy of intrusion-detection systems. Computer

Networks, 31(8):805–822.

Denis, F. (2023). webassembly-benchmarks. https://github

.com/jedisct1/webassembly-benchmarks.

Forrest, S., Hofmeyr, S. A., Somayaji, A., and Longstaff,

T. A. (1996). A sense of self for unix processes. In

Proceedings 1996 IEEE Symposium on Security and

Privacy, pages 120–128. IEEE.

Galvin, P. B., Gagne, G., Silberschatz, A., et al. (2003).

Operating system concepts, volume 10. John Wiley &

Sons.

A Categorical Data Approach for Anomaly Detection in WebAssembly Applications

283

Hastie, T., Tibshirani, R., Friedman, J. H., and Friedman,

J. H. (2009). The elements of statistical learning: data

mining, inference, and prediction, volume 2. Springer.

Hoffman, K. (2019). Programming webassembly with rust:

uniﬁed development for web, mobile, and embedded

applications. Programming WebAssembly with Rust,

pages 1–220.

Kim, M., Jang, H., and Shin, Y. (2022). Avengers, Assem-

ble! survey of WebAssembly security solutions. In

Proceedings of the 15th International Conference on

Cloud Computing, pages 543–553, Barcelona, Spain.

IEEE.

Lam, A. (2005). New ips to boost security, reliability and

performance of the campus network. Newsletter of

Computing Services Center.

Lehmann, D. and Pradel, M. (2019). Wasabi: A frame-

work for dynamically analyzing WebAssembly. In

Proceedings of the 24th International Conference on

Architectural Support for Programming Languages and

Operating Systems, pages 1045–1058, Providence, RI,

USA. ACM.

Lemos, R., Heinrich, T., Maziero, C. A., and Will, N. C.

(2022). Is it safe? identifying malicious apps through

the use of metadata and inter-process communication.

In Proceedings of the 16th IEEE International Systems

Conference, pages 1–8. IEEE.

Lemos, R., Heinrich, T., Will, N. C., Obelheiro, R. R., and

Maziero, C. A. (2023). Inspecting binder transactions

to detect anomalies in android. In Proceedings of the

17th Annual IEEE International Systems Conference,

Vancouver, BC, Canada. IEEE.

Li, J., Zhang, J., Pang, N., and Qin, X. (2018). Weighted

outlier detection of high-dimensional categorical data

using feature grouping. IEEE Transactions on Systems,

Man, and Cybernetics: Systems, 50(11):4295–4308.

Liang, H., Pei, X., Jia, X., Shen, W., and Zhang, J. (2018).

Fuzzing: State of the art. IEEE Transactions on Relia-

bility, 67(3):1199–1218.

Liao, Y. and Vemuri, V. R. (2002). Using text categorization

techniques for intrusion detection. In USENIX Security

Symposium, volume 12, pages 51–59.

Liu, M., Xue, Z., Xu, X., Zhong, C., and Chen, J. (2018).

Host-based intrusion detection system with system

calls: Review and future trends. ACM Computing

Surveys, 51(5):98.

Markman, A. and Ross, B. (2003). Category use and cate-

gory learning. Psychological bulletin, 129:592–613.

Ménétrey, J., Pasin, M., Felber, P., and Schiavoni, V. (2021).

Twine: An embedded trusted runtime for WebAssem-

bly. In Proceedings of the 37th International Confer-

ence on Data Engineering, pages 205–216, Chania,

Greece. IEEE.

Mishra, P., Varadharajan, V., Tupakula, U., and Pilli, E. S.

(2018). A detailed investigation and analysis of using

machine learning techniques for intrusion detection.

IEEE communications surveys & tutorials, 21(1):686–

728.

Molnar, C. (2020). Interpretable machine learning. Lulu.

com.

Musch, M., Wressnegger, C., Johns, M., and Rieck, K.

(2019). New kid on the Web: A study on the preva-

lence of WebAssembly in the wild. In Proceedings

of the 16th International Conference on Detection of

Intrusions and Malware, and Vulnerability Assessment,

pages 23–42, Gothenburg, Sweden. Springer.

Powers, D. and Xie, Y. (2008). Statistical methods for cate-

gorical data analysis. Emerald Group Publishing.

Qiang, W., Dong, Z., and Jin, H. (2018). Se-Lambda: Se-

curing privacy-sensitive serverless applications using

SGX enclave. In Proceedings of the 14th International

Conference on Security and Privacy in Communication

Systems, pages 451–470, Singapore. Springer.

Romano, A., Lehmann, D., Pradel, M., and Wang, W. (2022).

Wobfuscator: Obfuscating JavaScript malware via op-

portunistic translation to WebAssembly. In Proceed-

ings of the 43rd Symposium on Security and Privacy,

pages 1574–1589, San Francisco, CA, USA. IEEE.

Romano, A. and Wang, W. (2020). Wasim: Understanding

WebAssembly applications through classiﬁcation. In

Proceedings of the 35th International Conference on

Automated Software Engineering, pages 1321–1325,

Melbourne, Australia. IEEE.

Stallings, W., Brown, L., Bauer, M. D., and Bhattacharjee,

A. K. (2012). Computer security: principles and prac-

tice. Pearson Education Upper Saddle River, NJ, USA.

Stiévenart, Q. and De Roover, C. (2020). Compositional

information ﬂow analysis for WebAssembly programs.

In Proceedings of the 20th International Working Con-

ference on Source Code Analysis and Manipulation,

pages 13–24, Adelaide, Australia. IEEE.

Stiévenart, Q. (2023). Sac 2022 dataset. https://ﬁgshare.co

m/articles/dataset/SAC_2022_Dataset/17297477.

Taha, A. and Hadi, A. S. (2019). Anomaly detection meth-

ods for categorical data: A review. ACM Computing

Surveys (CSUR), 52(2):1–35.

WasmEdge (2023). WasmEdgeRuntime. https://github.com

/WasmEdge/WasmEdge.

WebAssembly (2023). Wasi tests. https://github.com/Web

Assembly/wasi-testsuite.

Wu, S. and Wang, S. (2011). Parameter-free anomaly detec-

tion for categorical data. In International Workshop on

Machine Learning and Data Mining in Pattern Recog-

nition, pages 112–126. Springer.

Yassin, W., Udzir, N. I., Muda, Z., Sulaiman, M. N., et al.

(2013). Anomaly-based intrusion detection through

k-means clustering and naives bayes classiﬁcation. In

Proc. 4th Int. Conf. Comput. Informatics, ICOCI.

ICISSP 2024 - 10th International Conference on Information Systems Security and Privacy

284