Towards a Systematic Process-aware Behavioral Analysis for Security

Laura Genga and Nicola Zannone

Eindhoven University of Technology, Eindhoven, The Netherlands

Keywords:

Behavior Analysis, Process Mining.

Abstract:

Nowadays, security is a key concern for organizations. An increasingly popular solution to enhance security

in organizational settings is the adoption of anomaly detection systems. These systems raise an alert when

an abnormal behavior is detected, upon which proper measures have to be taken. A well-known drawback

of these solutions is that the underlying detection engine is a black box, i.e., the behavioral proﬁles used for

detections are encoded in some mathematical model that is challenging to understand for human analysts or, in

some cases, is not even accessible. Therefore, anomaly detection systems often fail in supporting analysts in

understanding what is happening in the system and how to respond to detected security threats. In this work,

we investigate the use of process analysis techniques to build behavioral models understandable by human

analysts. We also delineate a systematic methodology for process-aware behaviors analysis and discuss the

ﬁndings obtained by applying such a methodology to a real-world event log.

1 INTRODUCTION

The rapidly growing of security incidents and frauds

during the last years, often due to insider threats and

the lack of effective internal controls (Association of

Certiﬁed Fraud Examiners, 2018; Richardson, 2008),

has made security a key concern for organizations.

Several research efforts have been devoted both by

academics and practitioners to address such issues.

An increasingly popular solution to enhance the secu-

rity of an organization is represented by anomaly de-

tection systems (Patcha and Park, 2007). These sys-

tems aim to analyze users’ interactions with the sys-

tem in order to detect abnormal behaviors and raise

alerts when security threats are detected.

One of the main issues with these techniques is

that they model system behaviors as a black-box. In

particular, they build models of normal behavior that

are not accessible or, anyway, not easily understand-

able by human analysts. This hampers the compre-

hension of what is happening in the system and, thus,

makes it challenging to determine how to response to

the raised alerts (Costante et al., 2016).

In this paper, we investigate the potentiality of

data science, in particular of process analysis, tech-

niques for deriving white-box models of normal sys-

tem behavior, i.e. models that are actually accessible

by a human analyst, to support the detection and anal-

ysis of anomalous behaviors.

The deﬁnition of accurate behavioral models for

security is an undeveloped topic. Only a few work

have investigated this issue and, to date, a system-

atic methodology is still missing. The main aim of

this work is illustrating the beneﬁts of applying pro-

cess analysis techniques to explore security-related

aspects, to stimulate future research in this direction.

We also propose a methodology for a systematic be-

havior analysis for security. As a proof of concept, we

exploit a real-world case study to show how existing

process analysis techniques can be employed to build

behavioral proﬁles tailored to the detection of secu-

rity ﬂaws within organizational settings. We discuss

the main ﬁndings obtained from the event log at hand.

The remainder of the paper is organized as fol-

lows. Section 2 provides an overview of related work

on behavioral analysis for security. Section 3 delin-

eates the proposed methodology and Section 4 dis-

cusses the main ﬁndings derived from a real-life case

study. Finally, Section 5 draws conclusions and delin-

eates directions for future work.

2 BEHAVIORAL ANALYSIS FOR

SECURITY

Within the scope of this work, a behavior refers to

a set of activities performed by one or more entities

(either humans or software agents) on a given com-

putational system, usually tracked by some logging

460

Genga, L. and Zannone, N.

Towards a Systematic Process-aware Behavioral Analysis for Security.

DOI: 10.5220/0006944604600469

In Proceedings of the 15th International Joint Conference on e-Business and Telecommunications (ICETE 2018) - Volume 1: DCNET, ICE-B, OPTICS, SIGMAP and WINSYS, pages 460-469

ISBN: 978-989-758-319-3

mechanism (Cao, 2010). Behavioral Analysis aims to

explore event logs to gain insights on the system be-

havior, e.g., typical behavioral patterns. In the secu-

rity domain, behavioral analysis is often used to sup-

port anomaly detection. The goal of an anomaly de-

tection system is to identify system behaviors that de-

viate from what it is considered to be the desired or

normal behavior (Patcha and Park, 2007; Chandola

et al., 2009; Costante et al., 2016).

Anomaly detection systems involve a training

phase, where a model representing the normal system

behavior is learned with respect to features of inter-

ests (e.g., CPU usage, combination of users’ com-

mands), and a detection phase, where the learned

model is used to detect possible anomalies.

To build proﬁles of normal behavior, either su-

pervised or unsupervised techniques are usually em-

ployed. The former techniques build a classiﬁer from

a labeled training dataset that explicitly distinguishes

normal and abnormal behavior (Shon et al., 2005; Lee

et al., 1998). On the other hand, unsupervised tech-

niques (semi-)automatically infer baseline behaviors,

which is assumed to represent the normal behavior. A

variety of techniques have been proposed to address

unsupervised anomaly detection, ranging from sim-

ple statistical observations (e.g., frequency) on the se-

lected features (Qu et al., 1998; Smaha, 1988) to data

mining techniques such as, e.g., clustering (Bolton

and Hand, 2001) or Markov chain (Ye et al., 2000).

A well-known drawback of classic anomaly de-

tection systems is that the system behavior is often

modeled as a black box; namely, the behavior is of-

ten encoded into a mathematical model that is either

not accessible or, anyway, too abstract and complex

to be easily understood by a human analyst (Etalle,

2017). Therefore, analysts are often not aware of what

is happening in the system. This poses some chal-

lenges in determining how to response upon an alert,

thus hampering the actionability of anomaly detection

systems, which is a crucial property to their applica-

tion in practice. In this work, we explore the poten-

tiality of process analysis techniques to infer white

box behavioral model, i.e., models able to represent

the end-to-end structure of the system behavior. To

this end, we adopt a process-oriented perspectives;

namely, we assume that activities supported by the

system are structured according to an underlying no-

tion of process, which poses constraints/guidelines on

the order in which activities should be executed, data

access and so on. In particular, we adopt a broad no-

tion of process, intended as a set of activities that have

to be performed in a given order to achieve a given

goal. Adopting a process-oriented perspective allows

us to map sets of activities to a speciﬁc process ex-

ecution. This introduces a level of abstraction that

provides a comprehensive overview of the system be-

havior and simpliﬁes the understanding and analysis

of behavioral patterns.

The application of process analysis techniques for

security analysis is an under-investigated topic. A ﬁrst

step in this direction has been taken in, e.g., (van der

Aalst and de Medeiros, 2005; Accorsi et al., 2013).

Those works propose to apply a process discovery

technique on an event log to infer a process model

representing the normal system behavior, i.e. the nor-

mal ordering relations existing among process activ-

ities. However, those works infer a single model of

normal behavior from the log. As shown in (Alizadeh

et al., 2018b), this can lead to misleading diagnos-

tics. Another stream of research (Adriansyah et al.,

2013; Alizadeh et al., 2018a; Genga et al., 2018) pro-

pose to detect security threats by identifying devia-

tions of the observed behavior from process speciﬁca-

tions. However, these works typically assume that a

process model representing the normative behavior is

available, which is often not the case in real settings.

Overall, our literature review revealed the lack of

systematic approaches able to infer accurate models

of normal behavior. In the next section, we present

guidelines for a systematic behavioral analysis and

discuss the main challenges.

3 METHODOLOGY

In this section, we present a methodology for a sys-

tematic process-aware behavior analysis. The main

aims consist in (i) inferring work practices character-

izing the organization processes from the recorded be-

havior, and (ii) analyzing these practices to determine

which ones should be deemed representative of the

normal system behavior. To this end, ﬁrst we need

to reconstruct behaviors according to the underlying

process; then, we have to select features that allows

identifying and grouping similar behaviors; ﬁnally,

we delve into the inferred practices to spot possible

security concerns. Next, we present each step in de-

tail.

Behavior Identiﬁcation. The ﬁrst step of the

methodology is to represent the logs recording the

observed system behavior in a format suitable for

the application of process analysis techniques. To

this end, the activities recorded in the log should

be grouped into process executions. Although sev-

eral organization IT systems support the generation

of this type of logs (e.g., Workﬂow Management Sys-

tems, Enterprise Resource Planning systems), many

Towards a Systematic Process-aware Behavioral Analysis for Security

461

logging systems are not process-aware; therefore, a

preprocessing phase might be needed to identify pro-

cess executions from the available logs. This pre-

processing phase can be further reﬁned in two steps.

The ﬁrst one is aimed to collect and properly inte-

grate process data possibly spread among several and

heterogeneous sources (e.g., Excel spreadsheets, or

database tables). To this end, Extract, Transform

and Load (ETL) methodologies and techniques de-

veloped within the data warehouse community can be

exploited (see, e.g., (Vassiliadis, 2009) for a survey on

the most commonly used ETL tools). The second one

is devoted to process identiﬁcation. Logging systems

that are not driven by a notion of process provide a

simple stream of recorded activities, with no indica-

tion of which activities belong to the same process ex-

ecution. Therefore, the event log has to be processed

in order to identify and group together activities be-

longing to a single process execution. It is worth not-

ing that this challenge is far from trivial and several

research efforts have been devoted to cope with this

issue, e.g. (Ferreira and Gillblad, 2009; Walicki and

Ferreira, 2011).

The output of this step is an event log that com-

prises a collection of traces, each corresponding to

a speciﬁc process execution (also called case). Each

trace consists of a sequence of events, each recording

data related to the execution of a process activity.

Behavior Proﬁling. The second step of the method-

ology aims to build behavioral proﬁles from event

logs. This involves (i) a features selection phase,

where relevant features are identiﬁed based of the

security properties of interest, and (ii) a clustering

phase, in which behaviors similar with respect to the

selected feature set are grouped together.

As regards point (i), one can exploit both prop-

erties related to process executions (e.g., their dura-

tion) and properties related to single events (e.g., the

user who performed the corresponding activities). Al-

though this choice is strongly affected by the data

available in the event log, we can list four main gen-

eral properties that is worth considering for a security

analysis: the control-ﬂow, which refers to the order

in which activities are performed during process exe-

cutions; the users, which refers to the distribution of

the workload among involved actors; the time, which

refers to temporal aspects of process executions (e.g.,

their durations); and, ﬁnally, the data, which refers

to any other information on the process and/or on the

process activities stored in the event log.

These dimensions are often taken into account by

process analysis techniques, and we argue that they

can provide meaningful insights that can be exploited

for the detection of security threats. As an example,

analyzing the control-ﬂow one might detect violations

of security constraints (e.g., the skipping of some crit-

ical checking activities when assessing a loan appli-

cation); the analysis of users’ behaviors might point

out undesirable or risky practices (e.g., violations of

separation of duty constraints); the analysis of tempo-

ral aspects might reveal suspicious differences in the

completion time of process executions; and so on.

It is worth noting that feature selection is often an

incremental step, especially when little or no knowl-

edge on the underlying processes is available. In these

cases, one can proceed by selecting one dimension per

time and checking the obtained behaviors, then using

the obtained ﬁndings to further reﬁne the elicited fea-

tures.

Once the relevant features have been identiﬁed,

behaviors clustering is performed to build behavioral

proﬁles, i.e. clusters of behaviors that are similar ac-

cording to the selected feature set. Behaviors clus-

tering requires determining suitable ranges for the set

of intervals of the selected features. Intervals can be

determined in several ways. A simple solution con-

sists in exploiting simple statistics, e.g. mean, me-

dian, quartiles, to manually determine the intervals;

while a more sophisticated ones consists in adopting

automatic trace clustering techniques (Hompes et al.,

2015; Song et al., 2008). The choice of the technique

depends on the set of features and on the process at

hand. For instance, if a single feature is used for clus-

tering, the use of simple statistics might be sufﬁcient

to detect meaningful intervals; on the other hand, if

the combination of several properties is used, trace

clustering techniques offer a more effective alterna-

tive.

The output of this step consists in a set of behav-

ioral proﬁles, each representing a cluster of behaviors

similar with respect to a given feature set.

Behavior Analysis. The ﬁnal step of the methodol-

ogy focuses on exploring the obtained behavioral pro-

ﬁles to determine which ones should be considered

representative of normal behavior and which ones

should not.

When no a-priori knowledge is available, a com-

mon practice to determine what behavior should be

considered as normal is to use statistical measures

(Alizadeh et al., 2018b). In fact, it is reasonable to

assume normal behavior to be the one occurring more

frequently in the event log. Therefore, at ﬁrst one

might label as normal behavior the one correspond-

ing to the interval covering the standard deviations,

when the proﬁling is performed by means of statis-

tics on single features, or the one corresponding to

BASS 2018 - International Workshop on Behavioral Analysis for System Security

462

the largest cluster, when trace clustering techniques

are applied.

However, we argue that this simple labeling is not

enough to obtain a proper comprehension of the pro-

cess and further investigation should be performed.

For example, it might be the case that a behavior, con-

sidered normal with respect to a given feature, can ac-

tually raise security concerns from another perspec-

tive; or, on the opposite, behaviors that are less fre-

quent or, anyway, diverge from the normal distribu-

tion, do not reveal particular security concerns, so that

they could be labeled as normal as well.

To address this issue, as a general guideline, we

propose to analyze behavioral proﬁles in order to de-

termine which ones correspond to desirable working

practices and which ones do not. To this end, be-

sides analyzing behavioral models individually, we

also propose to compare the obtained behavioral pro-

ﬁles with each other. As shown in the next section,

this analysis indeed provides more in-depth under-

standing and insights of working practices.

4 CASE STUDY: BPI2012 EVENT

LOG

In this section, we present an application of the pro-

posed methodology to a real-life event log. We ﬁrst

introduce the dataset; then, we present the ﬁndings

obtained by inferring and exploring behaviors from

the log.

4.1 Dataset

We used the event log recording the loan management

process of a Dutch Financial Institute, which was

made available for the 2012 BPI challenge (BPI Chal-

lenge 2012, 2012). The event log contains the events

recorded for three intertwined subprocesses: subpro-

cess A speciﬁes how loan applications should be han-

dled, subprocess O describes how loan offers should

be handled, and subprocess W speciﬁes how work

items are processed. For each activity, the log stores

its name, timestamp and the involved user. Moreover,

for each application the log provides the requested

amount. The ﬁrst row of Table 1 shows some statistics

of the event log, namely the number of traces, activ-

ities and events, together with the minimum, maxi-

mum and mean number of events in each trace.

4.2 Behavior Identiﬁcation

The BPI2012 log consists of traces grouping activities

belonging to the same process execution; thus, a pre-

Table 1: Statistics related to the entire set of traces and to

the traces corresponding to approved applications.

Behavior Cases Activities Events

Min

Events

Max

Events

Mean

Events

all 13087 36 262200 3 175 20

approved 2246 21 99925 22 163 44

processing step is not needed for this log. Neverthe-

less, analyzing this log, we observed a strong variabil-

ity among process traces. In particular, the log records

process executions related to approved applications,

declined applications, canceled applications, and run-

ning applications. It is reasonable to expect that traces

concerning the approval of applications differ signif-

icantly from traces concerning, e.g., the denial of ap-

plications. Such heterogeneity usually has a negative

effect on the results of process analysis techniques.

Among them, we decided to focus on a subset of

the traces, i.e. those related to approved applications.

This is because, from a security viewpoint, approved

applications are more critical than others, since se-

curity incidents in those process executions can re-

sult in a ﬁnancial loss for the ﬁnancial institute. Pro-

cess executions concerning the approval of applica-

tions are typically characterized by the occurrence of

activity A APPROVED and the absence of activities

A DECLINED, A CANCELED. We ﬁltered the log

accordingly and obtained a new event log. The sec-

ond row of Table 1 shows statistics of the resulting

event log. There are 2246 approved applications, cor-

responding to approximately 17% of the total number

of applications.

4.3 Behavior Proﬁling

Behavioral proﬁles can be built with respect to the

dimensions discussed in Section 3. For the sake of

space, in this work we present the results for only one

dimension, namely the temporal duration of process

executions. We selected this feature since large dif-

ferences in the duration of process executions might

indicate differences in the conscientiousness applied

in the approval of application, which is in general un-

desirable.

From the event log we observed that the mean

time of approval is around 17 days; to delve more into

the temporal dimension, we plot histogram related

to applications completion times in days, reported in

Figure 1. The distribution shows that applications are

mainly closed before 40 days; analyzing the event log,

we observed that most of the applications are closed

within 10-30 days. However, there are some notable

outliers, some of which required much more time. It

is also worth noting that we even have a set of appli-

cations closed in less than one day.

Towards a Systematic Process-aware Behavioral Analysis for Security

463

Figure 1: Histograms of temporal duration of approved ap-

plications in logarithmic scale.

Table 2: Statistics of the event logs related to approved ap-

plications.

Behavior Cases Activities Events

Min

Events

Max

Events

Mean

Events

long 292 21 20059 26 163 69

normal 1877 21 77284 22 109 41

short 50 20 1855 22 70 37

instant 27 19 727 22 49 27

These differences in the duration of process exe-

cutions conﬁrm that the temporal dimension is indeed

relevant for this dataset. If on the one hand these dif-

ferences might simply be due, for instance, to differ-

ent customers (e.g., the procedure is likely to be much

smoother and shorter for well-known customers than

for new ones); on the other hand, they might also be

due to shallow checking or even abuses. Since we are

considering a single feature, we exploited some ba-

sic statistics for behavior clustering. The mean dura-

tion time for approved applications is 16.8 days, with

a standard deviation of 9.75. Therefore, we marked

as normal behavior all applications approved in more

than six days and in less than 27 days. Applications

longer than 27 days are marked as long applications.

We partitioned shorter applications in two classes:

short applications, i.e. applications approved in more

than 1 day and in less than 6, and instant applications,

i.e. applications that were approved in few hours and

hence result of particular interest for the scope of our

analysis. Red lines in Figure 1 delimit the portion of

applications belonging to each class (e.g., the ﬁrst line

delimits the portion related to instant applications, the

second line short applications and so on). Table 2

shows some statistics for the logs corresponding to

the classes of behaviors considered in our analysis.

Note that other statistics, as well as trace clustering

techniques, could been used as well to discriminate

behaviors.

4.4 Behavior Analysis

The last step of the methodology in Section 3 involves

a comparative analysis of the obtained behavioral pro-

ﬁles to support analysts in characterizing normal be-

haviors from abnormal ones. Next, we show the

results of the comparison considering control-ﬂow,

time, users and loan amount as features.

Control Flow. The ﬁrst feature we consider for the

comparison is the control-ﬂow of the behavior clus-

ters. By comparing the control-ﬂow we expect to re-

veal differences in whether and how activities are per-

formed within each cluster. In particular, we can de-

termine, for example, whether some checking activi-

ties tend to be delayed or completely skipped in some

cluster.

To determine the control-ﬂow, we need to uncover

the process structure from the event log. To this end,

we employ process discovery techniques, which ana-

lyze the ordering relations among events stored in the

event log to infer a process model. Since our process

is rather complex and involves many different vari-

ants, we apply a heuristic technique that only high-

lights the most relevant process behaviors. More pre-

cisely, we use Disco

, a well-known tool for process

discovery.

Given an event log, Disco returns a process model

in which each node corresponds to a process activ-

ity and each edge represents an ordering relation be-

tween a pair of activities. Only activities and relations

whose frequency is above a user-deﬁned threshold are

reported. For our analysis we used the default set-

tings. Figure 2 shows the process model returned by

Disco for long applications. The numbers inside the

boxes and above the edges show the maximum num-

ber of repetitions within a trace of the corresponding

activity/relation. On the edges it is also displayed, in

lighter black, the average time between the pair of ac-

tivities.

The discovered model highlights three main

phases of the process. The top one involves some

initial activities concerning the submission of the

application and the starting of the process; the central

part involves activities related to the check of the

application and to the creation and the negotiation

of an offer with the applicant; ﬁnally, the bottom

part comprises additional controls on the application,

e.g. fraud checking, together with the ﬁnalization of

the application. For the other types of applications

(Table 2), we found almost the same structure, with

some minor variations (e.g., some missing activities

https://ﬂuxicon.com/disco/

BASS 2018 - International Workshop on Behavioral Analysis for System Security

464

Figure 2: Activities ﬂow of long applications.

in shorter applications). The main differences lie in

the number of times activities are repeated, especially

concerning activities related to application check,

and in the average times between pairs of activi-

ties. For example, activity W Completerenaanvrag

was repeated a maximum of 7 times within in-

stant applications, against a maximum of 28 for

normal applications. Moreover, the average time

between activities W Completerenaanvrag and

W Nabellenoffertes is less than two minutes, against

an average time of 3 days for the normal cases. This

implies that instant applications were characterized

by few rounds of checking and a very quick interac-

tion with the applicant.

Another interesting difference is that some

process activities have not been performed in

Towards a Systematic Process-aware Behavioral Analysis for Security

465

Figure 3: Workload distribution over time for approved applications.

shorter applications (i.e., short and instant).

In particular, in those applications activity

W Wi jzigencontractgegevens was not executed,

indicating that there were no changes made to the

contract after the approval. In addition, activity

W Beoordelenfraude was not performed in instant

applications, indicating that a check for possible

frauds was never performed for these applications.

Time. The second features we take into account for

our comparison is the temporal distribution of activi-

ties. In particular, we intend to analyze how activities

in different behavioral proﬁles are distributed among

working days and week-ends. Since the BPI2012 log

concerns a ﬁnancial institute, we expect activities to

be executed only from Monday to Saturday; activ-

ities performed on Sunday would represent an un-

usual, possibly suspicious, behavior that needs to be

checked.

Figure 3 displays a dotted chart showing the activ-

ities performed within each process execution against

the corresponding days, thus allowing us to explore

process workload over time. The x-axis represents

the events’ timestamps, whereas the-y axis the trace

identiﬁers. Colors represent classes of behaviors.

We can observe that process activities of long and

normal applications are mainly grouped in bunches,

separated by inactivity periods. Checking the corre-

sponding timestamps, we observed that these inter-

ruptions correspond to week-ends. In this respect, it

is worth noting that, although the workload is much

less intense, many activities were also performed dur-

ing the week-end. We could not ﬁnd a similar case

Figure 4: Distribution of the amount requested for each pro-

cess behavior.

for the short and instant applications. However, we

observed some regularities in instant applications, i.e.

they tended to occur in groups of three or four.

Amount. We also analyzed the distribution of the

requested amounts for each behavioral proﬁle to de-

termine whether there exist a correlation between

them. It is reasonable to expect to ﬁnd some corre-

lation between the requested amounts and the dura-

tion of the process since higher amount should require

more careful controls and negotiations with the appli-

cant. In general, the approval of applications request-

ing high amounts in short period of time could be un-

desirable and might deserve further investigations to

understand the reasons.

The results of the analysis are reported in Figure 4.

It is worth noting that the distribution of requested

amounts look roughly the same for the different be-

havioral proﬁles, with the exception of some high val-

BASS 2018 - International Workshop on Behavioral Analysis for System Security

466

ues that occur mostly in the normal and long appli-

cations (marked as outliers in the ﬁgure). Even more

surprisingly, it looks like instant applications involved

amounts overall higher than amounts requested in

other applications. Nonetheless, it is worth noting that

we have less cases for this type of applications, which

means that variations of even few cases have a larger

impact on the overall distribution.

To conﬁrm that this difference is statistically not

signiﬁcant, we also performed a hypothesis testing,

comparing the distribution of the amounts requested

in normal applications with the distribution of the

amount requested in the other classes of applications.

Since the distributions of amounts do not ﬁt the nor-

mality assumption, we adopted a non-parametric test.

In particular, we used the WilcoxonMannWhitney

two-sample rank-sum test (Mann and Whitney, 1947),

which is a test commonly used when the assumptions

of t-test are not met.

Note that we have a sufﬁcient number of samples

to perform this test for almost all pairs.

The only

exception is the comparison between the normal and

instant applications; nevertheless, we have 27 appli-

cations for the latter, so the results we obtain can be

still considered a good approximation.

For the testing between normal and long applica-

tions, we obtained a p-value of 0.28; between nor-

mal and short applications we obtained a p-value of

0.69; ﬁnally, between normal and instant applications

we obtained a p-value of 0.40. Since differences be-

tween distributions are typically considered of statis-

tically signiﬁcance when the p-value is smaller than

0.1, the obtained values conﬁrm that the differences

we observed for the amounts requested in the different

classes of applications are not statistically relevant.

Users. The process at hand involves several, and of-

ten iterated, activities related to the management and

checking of applications. Therefore, by analyzing the

workload distribution we can grasp some insights on

whether duties tend to be distributed among different

actors. Such information can help analysts in un-

derstanding whether the responsibilities are properly

distributed among users. As an example, one can

check whether critical activities (e.g., the approval

of an application) are performed by different users,

or, instead, few users performed most of the critical

activities. The latter is an undesirable situation since

it might indicate violations of separation of duty

constraints.

Common rule-of-thumb for the applicability of Wilcoxon-

Mann-Whitney test is to have at least 30 samples per

group.

Table 3: Workload distribution over users for approved ap-

plications.

Behavior Min Actors Max Actors Mean Actors

long 5 28 12.5

normal 3 20 7.3

short 3 10 5.3

instant 2 5 2.6

We analyzed the number of different users in-

volved in each process execution, computing the min-

imum, maximum and average number for each behav-

ioral proﬁle. Results are reported in Table 3.

It is worth noting that different behavioral proﬁles

are characterized by a different workload distribution.

In particular, while longer applications seem to nor-

mally involve many different actors in their execu-

tions, much less actors are normally involved in the

approval of shorter applications. This is especially

evident for instant applications, in which two to ﬁve

employees always performed of all process activities.

5 DISCUSSION AND

CONCLUSION

In this work, we performed a preliminary exploration

of the potentialities of process-aware behaviors anal-

ysis for security. We adopted a process-oriented per-

spective to structure raw data recorded by logging

systems. By exploiting this representation, we de-

lineated a methodology to build accurate and under-

standable behavioral proﬁles that can assist analysts

in the understanding of the current work practices.

We applied the proposed methodology to a real-world

event log, providing some concrete examples of how

off-the-shelf process analysis techniques and tools

can be used to explore several security-related aspects

of process executions along different dimensions.

Despite our analysis showed that process-aware

behavioral analysis has great potentialities when ap-

plied to security, it presents a number of challenges.

Next, we list some of the most relevant ones:

• Data Collection & Preprocessing: Our analysis

assumes that it is possible to obtain an event log

grouping all activities related to the same process

execution. Although the log used in our case study

meets this demand, this is not always the case. In

many IT systems, log data might be spread along

several and heterogeneous data sources (e.g., Ex-

cel spreadsheets, databases). This requires re-

trieving all scattered data and merging them in a

proper format for the analysis. Even when events

have been reconstructed along with the relevant

information, additional challenges have to be face

Towards a Systematic Process-aware Behavioral Analysis for Security

467

to identify which events belong to the same pro-

cess execution (Alizadeh et al., 2018b). Solving

such challenges is far from being trivial.

• Features Selection: The choice of the features to

be considered depends on the application domain

and scope of the analysis. In particular, it requires

background knowledge of the underlying process

and prior knowledge of what to look for, which

is not always the case, especially in the security

context. On top of this, the analysis is constrained

by the information available in the log.

• Technique Choice: There is not a one-ﬁt-all tech-

nique for all cases. The choice of the techniques

to be used for the analysis depends on the scope

of analysis and type of data. This requires expe-

rienced and highly skilled analysts, with a strong

background both in security and in data analysis

techniques.

More research efforts are necessary to explore and

systematize ﬁndings and results obtained so far and to

develop a more general framework. In future work,

we plan to investigate these issues. In particular, we

intend to further elaborate on the observations made

in this work to devise general guidelines to apply data

science to behavior analysis for security, taking into

account a larger range of techniques. At the same

time, we plan to perform an extensive experimental

evaluation on both synthetic and real-world logs.

ACKNOWLEDGMENTS

This work is partially supported by ITEA3 through

the APPSTACLE project (15017) and by ECSEL

through the SECREDAS project.

REFERENCES

Accorsi, R., Stocker, T., and M

uller, G. (2013). On the ex-

ploitation of process mining for security audits: the

process discovery case. In Proceedings of Annual

ACM Symposium on Applied Computing, pages 1462–

1468. ACM.

Adriansyah, A., van Dongen, B. F., and Zannone, N. (2013).

Controlling break-the-glass through alignment. In

Proceedings of International Conference on Social

Computing, pages 606–611. IEEE.

Alizadeh, M., Lu, X., Fahland, D., Zannone, N., and van

der Aalst, W. M. P. (2018a). Linking data and process

perspectives for conformance analysis. Computers &

Security, 73:172–193.

Alizadeh, M., Peters, S., Etalle, S., and Zannone, N.

(2018b). Behavior Analysis in the Medical Sector:

Theory and Practice. In Proceedings of ACM/SIGAPP

Symposium On Applied Computing. ACM.

Association of Certiﬁed Fraud Examiners (2018). Report

to the Nations: 2018 Global study on occupational

fraud and abuse. http://www.acfe.com/report-to-the-

nations/2018/. Accessed: 2018-06-01.

Bolton, R. J. and Hand, D. J. (2001). Unsupervised proﬁling

methods for fraud detection. In Proceedings of Credit

Scoring and Credit Control, pages 235–255.

BPI Challenge 2012 (2012). Event log of a loan application

process. http://dx.doi.org/10.4121/uuid:3926db30-

f712-4394-aebc-75976070e91f.

Cao, L. (2010). In-depth behavior understanding and use:

The behavior informatics approach. Information Sci-

ences, 180(17):3067 – 3085. Including Special Sec-

tion on Virtual Agent and Organization Modeling:

Theory and Applications.

Chandola, V., Banerjee, A., and Kumar, V. (2009).

Anomaly detection: A survey. ACM Computing Sur-

veys, 41(3):15.

Costante, E., Fauri, D., Etalle, S., den Hartog, J., and Zan-

none, N. (2016). A hybrid framework for data loss

prevention and detection. In Proceedings of IEEE Se-

curity and Privacy Workshops, pages 324–333. IEEE.

Etalle, S. (2017). From intrusion detection to software de-

sign. In Computer Security, pages 1–10. Springer.

Ferreira, D. R. and Gillblad, D. (2009). Discovering pro-

cess models from unlabelled event logs. In Business

Process Management, pages 143–158. Springer.

Genga, L., Alizadeh, M., Potena, D., Diamantini, C., and

Zannone, N. (2018). Discovering anomalous frequent

patterns from partially ordered event logs. Journal of

Intelligent Information Systems.

Hompes, B., Buijs, J., van der Aalst, W., Dixit, P., and Buur-

man, J. (2015). Discovering deviating cases and pro-

cess variants using trace clustering. In Proceedings of

Benelux Conference on Artiﬁcial Intelligence, pages

5–6.

Lee, W., Stolfo, S. J., et al. (1998). Data mining approaches

for intrusion detection. In Proceedings of USENIX

Security Symposium, pages 79–93.

Mann, H. B. and Whitney, D. R. (1947). On a Test of

Whether one of Two Random Variables is Stochas-

tically Larger than the Other. Ann. Math. Statist.,

18(1):50–60.

Patcha, A. and Park, J.-M. (2007). An overview of

anomaly detection techniques: Existing solutions

and latest technological trends. Computer networks,

51(12):3448–3470.

Qu, D., Vetter, B. M., Wang, F., Narayan, R., Wu, S. F.,

Hou, Y., Gong, F., and Sargor, C. (1998). Statistical

anomaly detection for link-state routing protocols. In

Proceedings of International Conference on Network

Protocols, pages 62–70. IEEE.

Richardson, R. (2008). CSI Computer Crime and Security

Survey.

Shon, T., Kim, Y., Lee, C., and Moon, J. (2005). A ma-

chine learning framework for network anomaly detec-

tion using SVM and GA. In Proceedings from Annual

BASS 2018 - International Workshop on Behavioral Analysis for System Security

468

IEEE SMC Information Assurance Workshop, pages

176–183.

Smaha, S. E. (1988). Haystack: An intrusion detection sys-

tem. In Proceedings of Aerospace Computer Security

Applications Conference, pages 37–44. IEEE.

Song, M., G

unther, C. W., and van der Aalst, W. M. (2008).

Trace clustering in process mining. In Business Pro-

cess Management, pages 109–120. Springer.

van der Aalst, W. M. and de Medeiros, A. K. A. (2005). Pro-

cess mining and security: Detecting anomalous pro-

cess executions and checking process conformance.

Electronic Notes in Theoretical Computer Science,

121:3–21.

Vassiliadis, P. (2009). A survey of extract–transform–load

technology. International Journal of Data Warehous-

ing and Mining, 5(3):1–27.

Walicki, M. and Ferreira, D. R. (2011). Sequence parti-

tioning for process mining with unlabeled event logs.

Data & Knowledge Engineering, 70(10):821 – 841.

Ye, N. et al. (2000). A markov chain model of tempo-

ral behavior for anomaly detection. In Proceedings

of IEEE Systems, Man, and Cybernetics Information

Assurance and Security Workshop, volume 166, page

169.

Towards a Systematic Process-aware Behavioral Analysis for Security

469