An Interval Distribution Analysis for RTI+

Fabian Witter, Timo Klerx and Artus Krohn-Grimberghe

Department of Computer Science, Paderborn University, Paderborn, Germany

Department of Business Information Systems, Paderborn University, Paderborn, Germany

{fwitter, timok}@mail.upb.de, artus@aisbi.de

Keywords: Statistical Distribution Analysis, Outlier Detection, Deterministic Real-Time Automata, Sequence-based Anomaly Detection, ATM Fraud Detection.

Abstract: The algorithm RTI+ learns a Probabilistic Deterministic Real-Time Automaton (PDRTA) from unlabeled timed sequences. RTI+ is an efficient algorithm that runs in polynomial time and can be applied to a variety of real-world behavior identification problems. Nevertheless, we uncover a lack of accuracy in identifying the intervals (or time guards) of the PDRTA. This inaccuracy can lead to wrong predictions of timed sequences in the learned model. We show by example that segments in intervals that are not covered by training data are responsible for this effect. We call those segments gaps and name three types of gaps that can appear. Two of the types cause wrong predictions of sequences and should thus be removed from the model. Therefore, we introduce our novel Interval Distribution Analysis (IDA), which utilizes statistical outlier detection to identify and remove gaps. In the context of ATM fraud detection, we show that IDA can improve the results of RTI+ in a real-world scenario.

1 INTRODUCTION

The algorithm Real-Time Identification from Positive Data (RTI+) was developed by Verwer et al. (Verwer et al., 2010) to identify real-time behavioral models from positive data. RTI+ runs in polynomial time and can identify sufficiently large models to be able to learn the behavior of real-world systems (Verwer et al., 2010). Hence, it is a practically valuable algorithm which can be applied to a wide range of behavioral scenarios of log-emitting systems, e.g. ATM fraud detection or elevator breakdown prediction.

In this paper, we reveal a weakness of RTI+. The models learned by RTI+ may include a wider variety of time values than the training data yields. Hence, RTI+ underfits the data regarding the time values. This effect is reflected in widened intervals which are part of the model. We call the segments of the intervals that are not covered by the training data gaps. To overcome this imprecision of RTI+, we introduce our novel Interval Distribution Analysis (IDA). IDA utilizes statistical outlier detection to detect and remove the gaps and, thus, improves the model quality.

The paper is structured as follows: In Section 2, we describe the algorithm RTI+, which our research is based on. In Section 3, we go into the details of RTI+ and analyze the origin of gaps in intervals. Furthermore, we describe the negative impact of gaps on assigning probabilities to sequences. In Section 4, we introduce our IDA procedure for detecting and removing those gaps; there we describe the metrics that IDA uses and how RTI+ and IDA collaborate. Subsequently, in Section 5, we apply RTI+ and IDA to the problem of ATM fraud detection to point out the impact of IDA in a practical use case. In Section 6, we give an outlook on the future development of IDA by describing alternative algorithms to solve the problem. This is followed by a review of related work in Section 7. Finally, we summarize the main points of the paper and conclude on IDA in Section 8.

2 BASICS: RTI+

In this section we describe the algorithm RTI+ (introduced by Verwer et al. in (Verwer et al., 2010)). This algorithm identifies real-time behavior models in polynomial time.

Witter, F., Klerx, T. and Krohn-Grimberghe, A.
An Interval Distribution Analysis for RTI+.
DOI: 10.5220/0006146603510358
In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2017), pages 351-358
ISBN: 978-989-758-222-6

RTI+ is based on the Evidence-Driven State Merging (EDSM) algorithm (Lang et al., 1998). It computes a model out of a finite multiset of unlabeled timed sequences. A timed sequence is a chain of symbols extended by the time delays between them. As a model, Verwer et al. use a Probabilistic Deterministic Real-Time Automaton (PDRTA), which is an extended Deterministic Real-Time Automaton (DRTA) (cf. (Dima, 2001)). A PDRTA is defined as follows:

Definition 1 (PDRTA). A Probabilistic Deterministic Real-Time Automaton (PDRTA) is a four-tuple $A = \langle A', H, S, T \rangle$, where

• $A' = \langle Q, \Sigma, \Delta, q_0 \rangle$ is a DRTA without accepting states, where
  – $Q$ is a finite set of states
  – $\Sigma$ is a finite set of symbols
  – $\Delta$ is a finite set of timed transitions
  – $q_0$ is the start state
• $H$ is a finite set of histogram bins $h = [v, v']$ with $v, v' \in \mathbb{N}$
• $S$ is a finite set of symbol probability distributions $S = \{\Pr(s \mid q) \mid s \in \Sigma, q \in Q\}$
• $T$ is a finite set of histogram bin probability distributions $T = \{\Pr(t \in h \mid q) \mid h \in H, q \in Q\}$

The DRTA is extended by probability distributions for symbols and time delays, which are modeled by global histogram bins. These probability distributions are used for learning the model and later also for predicting sequences.

Figure 1: Example PDRTA with three states and six transitions. Transition labels shown in the figure: a [1, 3], b [4, 7], a [8, 10], c [4, 10], d [1, 10], c [1, 3]. Histogram bins: H = { [1, 3], [4, 7], [8, 10] }.

An example PDRTA is shown in Fig. 1. This PDRTA consists of three states; the start state is indicated by the sourceless transition on the left. The start state has three outgoing transitions, which all have the same target state. For example, when in the start state, the symbol a with a time delay in [1, 3] will use the first of these transitions. The symbol a can also take another transition when it is delayed with a time in [8, 10]. The three histogram bins for time delays are shown below the automaton structure. For reasons of clarity we do not show the probability distributions S and T. However, for each state there exist probabilities of leaving it with a certain symbol and with a time delay in a certain histogram bin. Both probability distributions are independent of each other. In the following, we describe the routine and methods of RTI+ to construct a PDRTA from a multiset of training sequences.
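The transition structure described above can be made concrete with a small sketch. The following Python snippet is only an illustration under stated assumptions, not the RTI+ implementation: the class and method names (`Transition`, `PDRTA`, `step`) and the state names `q0` and `q1` are hypothetical, and only the three transitions leaving the start state are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Transition:
    source: str   # hypothetical state name
    symbol: str
    low: int      # lower bound of the time guard (inclusive)
    high: int     # upper bound of the time guard (inclusive)
    target: str   # hypothetical state name


@dataclass
class PDRTA:
    start: str
    transitions: List[Transition] = field(default_factory=list)

    def step(self, state: str, symbol: str, delay: int) -> Optional[Transition]:
        """Return the transition that fires for (symbol, delay), or None if no path exists."""
        for t in self.transitions:
            if t.source == state and t.symbol == symbol and t.low <= delay <= t.high:
                return t
        return None


# The three transitions between the start state and its successor (cf. Fig. 1).
model = PDRTA(start="q0", transitions=[
    Transition("q0", "a", 1, 3, "q1"),
    Transition("q0", "b", 4, 7, "q1"),
    Transition("q0", "a", 8, 10, "q1"),
])
```

Because the automaton is deterministic, at most one transition matches a given state, symbol and delay; a result of `None` means that the timed sequence has no path in the model.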

The goal of RTI+ is to find a small PDRTA that describes the training sequences sufficiently well. RTI+ starts by generating a Prefix Tree Acceptor from the training sequences. Initially, the intervals (time guards) of the transitions are chosen as the global minimum and maximum of the time delays in the training sequences. Hence, the structure of the tree is equal to the untimed version of the tree (when time delays are ignored). This initial model may be missing time-dependent substructures and may contain multiple similar substructures. The task of RTI+ is to identify these two types of substructures. In case of a missing time-dependent substructure, the PDRTA is specialized (SPLIT), while in case of multiple similar substructures, the PDRTA is generalized to compress the model (MERGE).

To improve the initial tree, RTI+ iterates over the tree within the red-blue coloring framework (inherited from EDSM) and manipulates it with a MERGE operation for compression (inherited from EDSM) and a newly introduced SPLIT operation for specialization. The pseudo code of this routine can be found in Algorithm 1 in Section 4 alongside the routine for our novel distribution analysis.

The MERGE operation is used to generalize the model by combining two similar substructures into a new one with a preferably small loss of model quality regarding the training sequences. The merging process of the substructures starts by merging two corresponding states in each substructure into one state. All transitions of the two states are preserved, which can cause non-determinism in the outgoing transitions of the merged state. To solve this problem, all target states of non-deterministic transitions are merged recursively, too, until the model is deterministic again. Note that a MERGE operation can also create cycles or combine two subtrees.

The SPLIT operation has a different goal than a MERGE. Instead of generalizing the model, SPLITS aim to specialize it by creating time-dependent substructures that model the training sequences more precisely. This is done by splitting an interval of a transition at a certain time value. For the two resulting transitions, both appended subtrees are recomputed on the basis of the training sequences. Since a wrong SPLIT can be undone by the more powerful MERGE operation, SPLITS will always be preferred to MERGES.
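As a rough illustration of the SPLIT step on a single transition, the sketch below divides an interval at a time value and partitions the observed delays accordingly. The function name `split_transition` and the convention that the split value stays in the left interval are assumptions made for this example; recomputing the subtrees, as RTI+ does, is not shown.

```python
def split_transition(interval, delays, split_value):
    """Split [low, high] into [low, split_value] and [split_value + 1, high].

    `delays` are the time delays of the training sequences that used the
    transition; they are partitioned along with the interval.
    """
    low, high = interval
    if not low <= split_value < high:
        raise ValueError("split value must lie inside the interval")
    left = (low, split_value)
    right = (split_value + 1, high)
    left_delays = [d for d in delays if d <= split_value]
    right_delays = [d for d in delays if d > split_value]
    return (left, left_delays), (right, right_delays)


# Splitting the initial interval [1, 10] at time value 3.
result = split_transition((1, 10), [1, 2, 3, 8, 9, 10], 3)
```

In the full algorithm, the two groups of delays correspond to two groups of training sequences, from which the subtrees below the new transitions are rebuilt.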

Usually, many possibilities to perform MERGES and SPLITS exist in every iteration of RTI+. For determining useful operations and for deciding which of these operations to perform, Verwer et al. introduce a statistical Likelihood Ratio Test (LRT) for PDRTAs.


In each iteration RTI+ tests all possible SPLITS and MERGES before greedily performing the operation that results in the PDRTA which should model the training sequences best. RTI+ may also perform no operation at all if the current PDRTA is the best option. Before presenting our approach to improving SPLITS of time values, we identify and describe the need for improvement in the next section.

3 GAPS IN INTERVALS

In this section we first describe the problem of RTI+ when creating time intervals of timed transitions, followed by two reasons why intervals are identified incorrectly.

The intervals of the transitions of PDRTAs that are learned by RTI+ may contain segments that are not covered by sequences from the training set. In the following we refer to these segments as gaps.

To demonstrate how RTI+ creates intervals with gaps, we use the PDRTA from Fig. 1 to sample a set of training sequences. Afterwards we let RTI+ recreate the automaton. For our example we only consider the transitions between the start state and its successor state, which are shown in Fig. 2(a). There exist three transitions that go from the start state to the successor state. One transition is used for symbol b and time delays in interval [4, 7]. In addition, the other two transitions are used for symbol a and the intervals [1, 3] and [8, 10], respectively.

Figure 2: Differences for an excerpt of the PDRTA in Fig. 1. (a) Transitions between the two states in the original PDRTA shown in Fig. 1: a [1, 3], b [4, 7], a [8, 10]. (b) Transitions between the same two states in the PDRTA generated by RTI+: a [1, 10], b [1, 10].

The transitions of the PDRTA trained with RTI+ are shown in Fig. 2(b). In contrast to the original automaton, the two transitions for symbol a are now modeled by a single transition with interval [1, 10]. Furthermore, the interval of the transition for symbol b is extended to [1, 10] compared to the original [4, 7]. This is caused by the inability of RTI+ to detect gaps in intervals. For the transition with symbol b these gaps are in [1, 3] and [8, 10], while for the transition with symbol a the gap is in [4, 7].
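The gaps named in this example can be reproduced with a few lines of code. The sketch below is an illustration only; the function name `find_gaps` is hypothetical, and integer time delays are assumed.

```python
def find_gaps(interval, covered):
    """Return the maximal uncovered segments of an integer interval [low, high]."""
    low, high = interval
    covered = set(covered)
    gaps = []
    start = None  # first uncovered value of the gap currently being scanned
    for t in range(low, high + 1):
        if t not in covered:
            if start is None:
                start = t
        elif start is not None:
            gaps.append((start, t - 1))
            start = None
    if start is not None:
        gaps.append((start, high))
    return gaps


# Learned transition for symbol a: interval [1, 10], training delays in [1, 3] and [8, 10].
gaps_a = find_gaps((1, 10), [1, 2, 3, 8, 9, 10])
# Learned transition for symbol b: interval [1, 10], training delays in [4, 7].
gaps_b = find_gaps((1, 10), [4, 5, 6, 7])
```

Under these assumptions, `gaps_a` is the single inner gap [4, 7] and `gaps_b` consists of the two border gaps [1, 3] and [8, 10].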

Having shown by example that gaps exist in intervals, we investigate the reasons for those gaps to appear in learned PDRTAs. Fig. 3 shows a visualization of the three different types of gaps in an interval. The blue bars mark segments of time delays that are covered by sequences of the training set. The gaps are caused by one of the following reasons:

Symbol Diversity. Multiple symbols have different domains of time delays. Since the initial tree is constructed with intervals defined by the global minimum and maximum of delays in the training set, large gaps may appear at the beginning or the end of intervals (cf. Fig. 3). The gaps in the interval of the transition for symbol b are caused by this effect.

Time Delay Partitioning. Time delays of a symbol are partitioned into several disconnected subdomains. Intervals then contain gaps between the subdomains, e.g. the subdomains $[x, x']$ and $[y, y']$ with $x \le x' \ll y \le y'$ surround a gap between $x'$ and $y$. This applies to the gap in the interval of the transition for symbol a. In this example we say $3 \ll 8$ relative to the global maximum delay of 10. These gaps can be a lot larger in real-world scenarios. We visualized this gap type in Fig. 3.

Training Set Incompleteness. The finite set of training sequences usually does not contain all possible sequences of the (infinite) model space. Therefore, some time delays within the domain(s) of a symbol may not be included in the training set for some states in the PDRTA. These missing time delays also form gaps. One of those gaps is shown in Fig. 3. The gaps of this type may not only appear in between covered segments but also next to gaps of the other two types.

Figure 3: Visualization of the different gap types Symbol Diversity, Time Delay Partitioning and Training Set Incompleteness in an interval (horizontal axis: time delays).

All of the gap types mentioned above are potentially included in PDRTAs that were learned by RTI+. But only Training Set Incompleteness is part of the normal behavior model. Hence, a PDRTA should return a probability of zero for test sequences that contain time delays lying in gaps of types Symbol Diversity and Time Delay Partitioning. This result cannot be guaranteed since Verwer et al. model the histogram bin probability distributions independently of the symbol probability distributions. In fact, such sequences can inherit a positive probability from other symbols' time delays. For example, the pair of symbol and time delay (a, 5) is not part of the original PDRTA in Fig. 2(a). Nevertheless, it receives a positive probability for its time delay from the histogram bin [4, 7] in the reconstructed PDRTA in Fig. 2(b). This histogram bin is only used by pairs with symbol b in the original PDRTA and, hence, in the training sequences. Thus, the false probability assignment can cause false sequence predictions and should be avoided.
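The effect for the pair (a, 5) can be checked numerically. The sketch below is an illustration with made-up numbers: the probabilities, the dictionary layout and the function names are assumptions for this example and are not taken from RTI+ or from the experiments.

```python
# Hypothetical probabilities for the state that the learned transitions leave.
symbol_prob = {"a": 0.5, "b": 0.5}
bin_prob = {(1, 3): 0.25, (4, 7): 0.5, (8, 10): 0.25}

# Hypothetical symbol-dependent alternative: one bin distribution per symbol.
bin_prob_by_symbol = {
    "a": {(1, 3): 0.5, (8, 10): 0.5},
    "b": {(4, 7): 1.0},
}


def lookup(bins, delay):
    """Probability of the histogram bin that contains `delay` (0.0 if none does)."""
    for (low, high), p in bins.items():
        if low <= delay <= high:
            return p
    return 0.0


def independent_probability(symbol, delay):
    # Symbol and bin probabilities are modeled independently, as in a PDRTA.
    return symbol_prob.get(symbol, 0.0) * lookup(bin_prob, delay)


def dependent_probability(symbol, delay):
    # Bin probabilities are conditioned on the symbol.
    return symbol_prob.get(symbol, 0.0) * lookup(bin_prob_by_symbol.get(symbol, {}), delay)
```

With these numbers, `independent_probability("a", 5)` is 0.25 although no training sequence contains symbol a with a delay in [4, 7], whereas the symbol-dependent variant returns 0.0 for the same pair.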

This problem could be easily overcome by using a symbol-dependent histogram bin probability distribution. But when designing RTI+, Verwer et al. decided to model the probabilities for symbols and histogram bins independently to avoid increasing the size of PDRTAs by a polynomial factor (Verwer et al., 2010). This feature should be left untouched as the structural identification ability of RTI+ is recognizably good.

Our approach is to extend RTI+ by a new feature that simulates symbol-dependency for time delays. In contrast to real symbol-dependency, our extension can be easily adapted and its impact on the resulting PDRTA can be scaled based on the needs of the current problem domain. In the following section we introduce the new feature called Interval Distribution Analysis (IDA) in detail.

4 INTERVAL DISTRIBUTION ANALYSIS (IDA)

In this section we describe the Interval Distribution Analysis (IDA) that removes segments of intervals that are not part of the normal behavior model.

Above we described how time delays that are not part of the model can inherit positive probabilities. In our approach, we try to avoid this wrong assignment of probabilities by limiting the paths in the resulting PDRTA. Thus, sequences which have time delays outside of the domains of the corresponding symbols do not have a path in the PDRTA after the paths have been limited. We limit the paths by removing gaps from the intervals of transitions during training. To realize this approach, we use the existing SPLIT operation with a new heuristic to determine and remove empty segments in intervals.

First, reconsider which gaps should be removed and which should be kept according to the three cases described in Section 3. The gaps of the first two types, Symbol Diversity and Time Delay Partitioning, are clearly not part of the normal behavior of the process to be modeled and should be removed. In Fig. 3, we underlined the segments that should be removed from the PDRTA in orange. In contrast, the gaps of the third type, Training Set Incompleteness, are part of the normal behavior that is not covered by the training set and, hence, should be kept.

To be able to distinguish between gaps to remove and gaps to keep, we make an initial assumption about the training set: we assume that the gaps from the third case, Training Set Incompleteness, are smaller than those from the other two cases because the training sequences represent the model to be learned adequately. This reduces the problem to removing only the larger gaps. To detect the larger gaps, we apply statistical methods for distribution analysis to the distances between covered elements in an interval. Therefore, we call our new feature Interval Distribution Analysis (IDA). IDA computes a maximally allowed gap for an interval by collecting and analyzing the distances between neighboring covered elements. For this analysis, we also consider the distances of size 0 between directly neighboring covered elements. Thus, we are able to take connected blocks of covered elements into account. To compute the maximally allowed gap from the collected distances, we perform a statistical outlier detection. We use outlier-robust measures since the distributions of the distances are unknown and our analysis should not be distorted by outliers. After the outlier detection, we remove the outlier distances that are larger than the expected distances and, thus, are gaps.
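The distance collection can be sketched as follows. The example assumes integer time delays and defines the distance between two neighboring covered elements as the number of uncovered values between them, so that directly neighboring elements have distance 0; the function name `collect_distances` is hypothetical.

```python
def collect_distances(covered):
    """Distances between neighboring covered elements of an interval.

    The distance is the number of uncovered integer values between two
    neighboring covered elements, so direct neighbors yield 0.
    """
    points = sorted(set(covered))
    return [b - a - 1 for a, b in zip(points, points[1:])]


# Covered delays of the learned transition for symbol a from Section 3.
distances = collect_distances([1, 2, 3, 8, 9, 10])
```

For this input the collected distances are `[0, 0, 4, 0, 0]`: four zero distances inside the two connected blocks and one distance of 4 for the segment [4, 7].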

A suitable measure is outlier detection with the Median Absolute Deviation (MAD) (Leys et al., 2013). Using the median and the MAD, we define the maximally allowed gap $g_{\mathrm{MAD}}$ as the upper border for outliers as follows:

$$g_{\mathrm{MAD}} = m + 2.5 \cdot \mathrm{MAD} = m + 2.5 \cdot \operatorname{median}_i \{ |d_i - m| \}$$

where $m$ is the median of the collected distances $d_i$. This formula is an outlier-robust and, thus, advisable replacement for outlier detection with mean and standard deviation, $\mu + 2.5 \cdot \sigma$ (Leys et al., 2013). We chose 2.5 as the coefficient of the MAD, which is moderately conservative according to Leys et al. To get $g_{\mathrm{MAD}} = 0$, at least 50% of the distances need to have a length of 0.
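A minimal sketch of the MAD-based border, following the formula as stated above (median plus 2.5 times the median of the absolute deviations, without any additional scaling constant). The function name `mad_border` and the example distances are hypothetical.

```python
from statistics import median


def mad_border(distances, coefficient=2.5):
    """Upper border for outliers: median + coefficient * MAD."""
    m = median(distances)
    mad = median([abs(d - m) for d in distances])
    return m + coefficient * mad


# Hypothetical distances with one clear outlier (20).
border = mad_border([1, 2, 1, 3, 2, 1, 2, 20, 2])

# Distances of the Section 3 example: most of them are 0.
border_zero = mad_border([0, 0, 4, 0, 0])
```

In the first example the median is 2 and the MAD is 1, so the border is 4.5 and only the distance 20 counts as a gap. In the second example the border is 0, so every non-empty segment between covered elements is treated as a gap.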

Another way to calculate the border for outliers is the Interquartile Range (IQR). This is an outlier-robust standard method in statistics for outlier detection, which was proposed by Tukey (Tukey, 1977). The maximally allowed gap $g_{\mathrm{IQR}}$ is then defined as:

$$g_{\mathrm{IQR}} = Q_3 + 1.5 \cdot \mathrm{IQR} = Q_3 + 1.5 \cdot (Q_3 - Q_1)$$

where $Q_1$ and $Q_3$ are the first and third quartiles. This method allows larger gaps than the MAD-based border because more than 75% of the distances need to have a length of 0 to get $g_{\mathrm{IQR}} = 0$.
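The IQR-based border can be sketched in the same way. Note that several quartile definitions exist; this example uses the inclusive method of Python's `statistics.quantiles`, which is an assumption and not necessarily the definition used in the original experiments. The function name `iqr_border` is hypothetical.

```python
from statistics import quantiles


def iqr_border(distances, coefficient=1.5):
    """Upper border for outliers: Q3 + coefficient * (Q3 - Q1)."""
    q1, _, q3 = quantiles(distances, n=4, method="inclusive")
    return q3 + coefficient * (q3 - q1)


# Hypothetical distances: 60% of them are 0, one is a large gap.
border = iqr_border([0, 0, 0, 0, 0, 0, 1, 1, 2, 15])
```

Here the first quartile is 0 and the third quartile is 1, so the border is 2.5: distances of 1 and 2 are tolerated and only the distance 15 is treated as a gap. Since more than half of these distances are 0, the MAD-based formula would yield a border of 0 for the same data.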

After the maximally allowed gap has been determined, the interval has to be segmented at the correct positions with SPLITS to remove the gaps. The borders of the resulting intervals should be chosen wisely, especially not directly after a covered element. This results from the gap type Training Set Incompleteness, which says that some time delays may be missing in the training set. Hence, by not splitting the interval directly before or after a covered element, we deliberately keep small gaps of type Training Set Incompleteness. For the margin between a SPLIT position and the covered elements, we chose half of the maximally allowed gap, rounded up. Thus, after the interval has been split according to IDA, a gap between two neighboring covered elements is at most as large as the maximally allowed gap, and the distance from the interval borders to the first or last covered element is half of the maximally allowed gap, rounded up.

Input: A training set with timed sequences S, a set of histogram bins H and a significance level σ
Output: A small PDRTA A according to the input
 1  Initially create PDRTA A as a tree from the input sequences in S
 2  Color the start state $q_0$ of A red
 3  while A contains non-red states do
 4      Color blue all non-red target states of transitions with red source states
 5      Let $\delta = \langle q_r, s, [n, n'] \rangle$ be the most visited transition from a red state $q_r$ to a blue state $q_b$
 6      if $[n, n']$ is an initial interval then
 7          Check interval for empty spaces that should be removed by performing IDA
 8          if there exist such empty spaces then
 9              Remove these empty spaces with SPLITS
10      if δ has not been modified by IDA then
11          Evaluate all possible MERGES of $q_b$ with red states
12          Evaluate all possible SPLITS of $[n, n']$
13          if the lowest p-value of a SPLIT is less than σ then
14              Perform this SPLIT
15          else if the highest p-value of a MERGE is greater than σ then
16              Perform this MERGE
17          else
18              Color $q_b$ red
Algorithm 1: Extended RTI+ (eRTI+) (adapted version of the pseudo code in (Verwer et al., 2010)) which contains the new IDA routine (blue, lines 6-10).
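The segmentation step can be sketched as follows. This is an illustration under several assumptions that the text does not settle: time delays are integers, the distance between neighboring covered elements is the number of uncovered values between them, border gaps are trimmed down to the margin, and resulting sub-intervals that would touch or overlap are merged again. The function name `split_by_ida` is hypothetical.

```python
import math


def split_by_ida(interval, covered, max_gap):
    """Return the sub-intervals that remain after removing gaps larger than max_gap.

    Assumes at least one covered element. Each kept block of covered elements
    is widened by a margin of ceil(max_gap / 2) on both sides, clipped to the
    original interval.
    """
    low, high = interval
    margin = math.ceil(max_gap / 2)
    points = sorted(set(covered))

    # Group covered elements into blocks separated by gaps larger than max_gap.
    blocks = []
    block_start = prev = points[0]
    for p in points[1:]:
        if p - prev - 1 > max_gap:
            blocks.append((block_start, prev))
            block_start = p
        prev = p
    blocks.append((block_start, prev))

    # Widen each block by the margin and merge blocks that touch or overlap.
    result = []
    for first, last in blocks:
        lo = max(low, first - margin)
        hi = min(high, last + margin)
        if result and lo <= result[-1][1] + 1:
            result[-1] = (result[-1][0], hi)
        else:
            result.append((lo, hi))
    return result


# Hypothetical interval [1, 100] with two blocks of covered delays.
parts = split_by_ida((1, 100), [10, 11, 12, 14, 50, 51, 53], max_gap=4)

# Section 3 example for symbol a with a maximally allowed gap of 0.
parts_a = split_by_ida((1, 10), [1, 2, 3, 8, 9, 10], max_gap=0)
```

In the first example the margin is 2, so the interval [1, 100] is reduced to [8, 16] and [48, 55]. In the second example the margin is 0 and the original intervals [1, 3] and [8, 10] are recovered.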

Our new IDA procedure now has to be integrated into RTI+. We call the resulting procedure Extended RTI+ (eRTI+). Gaps should only be removed once, from initial and untouched intervals. If IDA were applied again to a resulting interval, it might unintentionally remove even more gaps from this interval. Hence, we apply the following change to the iteration routine of RTI+ (cf. Algorithm 1, lines 6-10). After identifying the most visited transition, RTI+ checks whether the corresponding interval borders are the global minimum and maximum of time delays in the training set, which indicates whether IDA has already processed this interval. If the interval is initial, IDA is performed to determine whether gaps can be removed from the interval. If there are any gaps, they are removed with SPLITS and the iteration ends because the determined most visited transition no longer exists in its original form and has to be recomputed. If no gaps can be removed, the original iteration routine with testing SPLITS and MERGES is performed.

To show the usefulness of our new feature, we rerun our experiment from the beginning of Section 3. But this time we deploy eRTI+ for reconstructing the PDRTA. Within IDA we use the MAD-based border for determining the maximally allowed gap. With the help of IDA we are able to identify all transitions and intervals exactly. Hence, all gaps have been detected and removed correctly. Note that with the IQR-based border, an exact reconstruction was not possible since it allows larger gaps than the MAD-based border. Nevertheless, even with the IQR-based border we achieved a PDRTA that is closer to the original one than the PDRTA generated by the original RTI+ (cf. Fig. 2).

All in all, we are able to reduce or eliminate false probability assignments for sequences by using IDA in support of RTI+. With this result in mind, we evaluate IDA in experiments on ATM fraud detection with real-world data in the following section.

5 EXPERIMENTS WITH ATM LOG DATA

In this section we compare RTI+ and eRTI+ in the context of anomaly detection to detect ATM fraud. When withdrawing money from an Automated Teller Machine (ATM), the user usually inserts a card, enters a PIN, chooses the amount of money, takes out the card again and finally the money. A delay is present between each of these steps. Additionally, some users, for example, just insert the card and, without entering the PIN, push the cancel button and abort the withdrawal. Hence, a PDRTA is a suitable behavior model for the withdrawal sequences of ATMs.

We use data from a publicly accessible ATM that was gathered over a period of nine months. In this time no attacks were registered, so the data set comprises normal sequences with 15 million events, resulting in a size of 1.6 GB. To be able to measure the anomaly detection effectiveness of RTI+ and eRTI+ we need data containing fraud (attempts). As it is almost impossible to get real-world data of ATM fraud and since the simulation in a laboratory is very expensive, we chose to insert anomalies randomly into normal sequences.

Hence, we modify a normal sequence by either switching two events in a sequence (anomalous event) or multiplying a single time value in a sequence (anomalous event timing). For a given PDRTA A (learned from normal sequences), we decide whether a given sequence is normal by calculating the probability of the sequence given A. We do so by traversing the PDRTA A and computing the symbol and time probability in every state using the symbol and time probability distributions S and T. Then, we multiply all probabilities and normalize by dividing by the sequence length. Finally, we compare the result of these operations with a threshold and decide whether the given sequence is abnormal with respect to the PDRTA A.
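The scoring procedure can be sketched as below. The sketch takes the description literally (product of all symbol and time probabilities, divided by the sequence length) and assumes that a score below the threshold marks a sequence as abnormal; the function names, the input format and the example numbers are hypothetical.

```python
def sequence_score(step_probabilities):
    """Product of all symbol and time probabilities, divided by the sequence length.

    `step_probabilities` holds one (symbol probability, time probability)
    pair per event, as obtained while traversing the automaton.
    """
    product = 1.0
    for p_symbol, p_time in step_probabilities:
        product *= p_symbol * p_time
    return product / len(step_probabilities)


def is_anomalous(step_probabilities, threshold):
    """Assumption: a score below the threshold marks the sequence as abnormal."""
    return sequence_score(step_probabilities) < threshold


# Hypothetical two-event sequence.
score = sequence_score([(0.5, 0.5), (1.0, 0.5)])
```

A sequence without a path in the automaton can be represented by a probability of 0 for the failing step, which yields a score of 0 and therefore an anomaly for any positive threshold.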

For testing, we implemented four variants of IDA:
v1: Use the MAD-based maximally allowed gap and remove all gaps discovered by IDA
v2: Use the IQR-based maximally allowed gap and remove all gaps discovered by IDA
v3: Use the MAD-based maximally allowed gap and remove only border gaps according to IDA
v4: Use the IQR-based maximally allowed gap and remove only border gaps according to IDA

We only present the results of IQR while removing only border gaps (v4) or all gaps (v2). We do not show the results of IDA with MAD because it always performed worse than IQR (in contrast to the experiments with artificial data). We measure the effectiveness of IDA in terms of precision, recall and F-measure. The precision is the ratio between correctly detected anomalies and all detected anomalies, while the recall is the ratio between correctly detected anomalies and all anomalies. The F-measure (F1) is the harmonic mean of precision and recall. In real-world scenarios a high precision (low false positive rate) often improves existing business cases without any negative side effect. Nevertheless, we want to detect all anomalies and thus also present the recall. The F-measure can be regarded as the trade-off between precision and recall. Figures 4 and 5 show the result of applying RTI+ with and without IDA when optimizing for F1 and precision.
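For reference, the three measures can be computed from the counts of true positives, false positives and false negatives. The counts in this sketch are made up for illustration and are not taken from the experiments; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 anomalies detected correctly, 2 false alarms, 8 anomalies missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

With these counts the precision is 0.8, the recall is 0.5 and the F-measure is about 0.62.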

In general, IDA with IQR improves the recall and often the F-measure. Additionally, IDA leads to more false positives because more normal sequences have no path in the inferred automaton, as IDA removed gaps on purpose.

When optimizing for F1 (Fig. 4), IDA v4 improves the recall without a loss of precision while IDA v2 decreases recall and precision. The results in Fig. 5 (precision) are twofold. IDA variants 2 and 4 improve recall and F1 at the cost of lower precision. As in Fig. 4, IDA v4 achieves a precision similar to that of RTI+ but a higher recall, while v2 drastically decreases the precision. In the scenario of ATM fraud detection a high precision is more important than a high recall because false positives (low precision) are costly, causing technicians to examine the ATM, whereas a recall greater than zero already yields an improvement.

Note that we applied eRTI+ with the same hyperparameter setting that we used for RTI+ to show the change caused by IDA. We might achieve a better result when tuning the hyperparameters directly for eRTI+ because RTI+ is very sensitive with respect to the hyperparameter configuration. Additionally, the real-world data set may cause odd results because the data-generating process is not controllable.

Figure 4: Precision, recall, F-measure and relative frequency of normal sequences without path for original RTI+ and eRTI+ with IDA variants v4 and v2. The results had the best F-measure for their RTI+ variant.

                        orig. RTI+      eRTI+ with IDA v4   eRTI+ with IDA v2
precision               0.67            -                   0.55
recall                  0.49            0.55                0.48
F-measure               0.56            0.6                 0.51
normal seq. w/o path    2.44 · 10^-2    3.04 · 10^-2        4.33 · 10^-2

Figure 5: Precision, recall, F-measure and relative frequency of normal sequences without path for original RTI+ and eRTI+ with IDA variants v4 and v2. The results had the highest precision with recall greater than zero for their RTI+ variant.

                        orig. RTI+      eRTI+ with IDA v4   eRTI+ with IDA v2
precision               0.87            0.81                0.55
recall                  0.18            0.39                0.48
F-measure               0.29            0.52                0.51
normal seq. w/o path    2.96 · 10^-3    1.04 · 10^-2        4.33 · 10^-2

All in all, IDA usually improves the F-measure and recall compared to RTI+ in the ATM context. On the other hand, the precision most often drops, which is not desirable for ATM fraud detection. In other scenarios a high F-measure may be more important than a high precision, so that applying RTI+ with IDA is beneficial there.

6 FUTURE WORK

During the development of IDA, we already had some ideas to advance the IDA procedure. In this section, we describe how we can apply alternative algorithms to solve the IDA problem. Furthermore, we propose a new approach for IDA and RTI+ to work together.

First, we interpret IDA in a different way: From a more general perspective, the IDA procedure solves a (one-dimensional) density-based clustering problem without a predefined number of clusters. In the context of IDA, the unsplit segments of an interval form clusters and the gaps between the clusters are removed. Therefore, we can use established clustering algorithms to identify gaps in an interval. Possible algorithm candidates are DBSCAN (Ester et al., 1996), OPTICS (Ankerst et al., 1999), and X-means (Pelleg and Moore, 2000). By considering the frequency in addition to the raw time values, we can transform IDA into a two-dimensional clustering problem. This extension to two dimensions might lead to better results. In Fig. 6(a) we show what IDA could look like with a clustering algorithm. The histogram values are combined into three clusters. The interval segments that are not covered by a cluster are removed (orange underlining), while the covered segments are kept. Since the density-based distance might be less important than the time-based distance, we can think of an additional weight parameter for eRTI+.

Figure 6: Alternative approaches for solving the IDA problem (both panels plot events per time delay over time delays). (a) The IDA problem solved with a two-dimensional clustering algorithm. Four gaps are detected. (b) The IDA problem solved with a density function. Three gaps are detected.

Apart from clustering, a completely different approach would be to learn a density function of the interval where the parts without sequences will contain no data. The method we use for learning the density can be Kernel Density Estimation (Parzen, 1962), among others. The learned function will possibly approach zero with a high slope for large gaps. We can split the interval at the points where the density function approaches zero. Fig. 6(b) shows how the IDA problem is solved with a density function. The density function is constructed from the histogram values and approaches zero four times. The interval segments where the density function is zero are removed (orange).
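As a rough sketch of this idea, the following example estimates a density with a Gaussian kernel and reports the integer segments where the density falls below a small value. The bandwidth, the cut-off `epsilon` and the function names are assumptions chosen for illustration; the proposal above does not fix them.

```python
import math


def gaussian_kde(delays, bandwidth):
    """Return a density function estimated from `delays` with a Gaussian kernel."""
    norm = len(delays) * bandwidth * math.sqrt(2 * math.pi)

    def density(t):
        return sum(math.exp(-0.5 * ((t - d) / bandwidth) ** 2) for d in delays) / norm

    return density


def low_density_segments(interval, delays, bandwidth, epsilon):
    """Maximal integer segments of the interval where the density is below epsilon."""
    low, high = interval
    density = gaussian_kde(delays, bandwidth)
    segments = []
    start = None  # first low-density value of the segment currently being scanned
    for t in range(low, high + 1):
        if density(t) < epsilon:
            if start is None:
                start = t
        elif start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, high))
    return segments


# Section 3 example for symbol a with hypothetical parameters.
segments = low_density_segments((1, 10), [1, 2, 3, 8, 9, 10], bandwidth=0.5, epsilon=0.02)
```

With these parameters the only low-density segment is [4, 7], the gap between the two blocks of covered delays. The result depends strongly on the bandwidth and the cut-off, which would have to be tuned for real data.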

As an alternative to density functions, we can also fit general polynomials to the interval data. Accordingly, we can choose from a wider range of methods, e.g. Support Vector Regression (Drucker et al., 1997). In contrast to density functions, general polynomials can have values below zero where the slope is high. Hence, we can identify gaps where the polynomial is below zero and perform the necessary SPLITS at the zero-crossings. By considering a high slope of a learned function as an indicator for gaps, this approach might produce more natural results than the original IDA or the clustering approach.

Besides using alternative algorithms for IDA, we

can also alternate the strategy of applying IDA. Instead

of applying it actively during the training of a PDRTA,

we could also apply IDA passively on the ﬁnal PDRTA

after RTI+ has terminated. We then iterate over all

states of the PDRTA and apply IDA to all outgoing

transitions of each state. If a transition between two

states is split, the new transitions will be created in

parallel between the same two states. Hence, the number of states in the PDRTA is not increased by IDA and the basic structure is preserved. With this new strategy

we want to avoid inconsistencies between the LRT and

IDA. With our original strategy (cf. Algorithm 3) this

inconsistency might occur when eRTI+ merges two

states based on the LRT which have already been pro-

cessed by IDA. Since the LRT cannot consider gaps, it

is possible that sequences from one state are merged

into the removed gaps of the other state. This will

negatively affect the quality and size of the PDRTA.

We can avoid this risk if we apply IDA on the PDRTA after RTI+ has finished the identification.
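
The passive strategy can be sketched in Python as follows. The data model (`Transition`, a dictionary from states to outgoing transitions), the function names and the `min_gap` parameter are assumptions made for illustration. In particular, `covered_segments` is a simple stand-in that cuts a time guard wherever at least `min_gap` consecutive time values between two observed values are missing. It does not reproduce the statistical outlier detection of IDA and leaves the borders of the guard untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    symbol: str
    begin: int      # inclusive lower bound of the time guard
    end: int        # inclusive upper bound of the time guard
    target: str
    delays: list = field(default_factory=list)  # training time values on this transition


def covered_segments(t, min_gap):
    """Stand-in for IDA: split the guard at inner gaps of at least `min_gap` time values."""
    values = sorted(set(t.delays))
    segments, seg_begin = [], t.begin
    for prev, nxt in zip(values, values[1:]):
        if nxt - prev - 1 >= min_gap:
            segments.append((seg_begin, prev))  # close the segment before the gap
            seg_begin = nxt                     # reopen after the gap
    segments.append((seg_begin, t.end))
    return segments


def apply_ida_passively(automaton, min_gap=5):
    """Process all outgoing transitions of every state of a learned automaton.

    Split transitions are replaced by parallel transitions between the same
    two states, so the set of states stays unchanged.
    """
    result = {}
    for state, transitions in automaton.items():
        result[state] = []
        for t in transitions:
            for begin, end in covered_segments(t, min_gap):
                part = [d for d in t.delays if begin <= d <= end]
                result[state].append(Transition(t.symbol, begin, end, t.target, part))
    return result


# Hypothetical automaton with two states and one transition.
automaton = {
    "q0": [Transition("a", 0, 100, "q1", [3, 4, 5, 60, 61, 62])],
    "q1": [],
}
processed = apply_ida_passively(automaton)
for t in processed["q0"]:
    print(t.symbol, t.begin, t.end, t.target)
```

Because the sketch runs only once on the final model, no later merge can move sequences into a removed gap.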

Based on the various possibilities to create alternative IDA procedures and apply them in two different ways, we are confident that we can develop procedures that are applicable to many different types (e.g. crude or smooth) of training data. The validation of those procedures will be part of our future research.

7 RELATED WORK

Besides RTI+, other algorithms also infer automata from timed sequences. In this section, we review these approaches and point out the differences.

In (Verwer et al., 2012) Verwer et al. describe the

Real-Time Identiﬁcation (RTI) algorithm. It is based

on the same idea as RTI+ but infers a DRTA with

accepting and rejecting states. Therefore, it requires

positive and negative labeled sequences for training.

The Bottom-Up Timed Learning Algorithm

(BUTLA) (Maier, 2015) learns a Probabilistic Deter-

ministic Timed Automaton (PDTA; very similar to a

PDRTA) from positive timed sequences. In contrast to RTI+, BUTLA only performs a merge operation but no split. Instead, it performs a global preprocessing of

all time values by ﬁtting kernel density estimators and

computing their local minima. For every local minimum, BUTLA creates a subevent. This preprocessing is intended to remove the necessity of split operations.

In (Klerx et al., 2014) a Probabilistic Determin-

istic Timed-Transition Automaton (PDTTA) and an

algorithm for learning PDTTAs are presented. The

learning algorithm does not split events (like BUTLA)

or transitions based on time values (like RTI+). In-

stead, it learns the event structure using any state-

of-the-art algorithm (e.g. ALERGIA; (Carrasco and

Oncina, 1994)) and approximates the time values per

transition via kernel density estimators. Hence, it mod-

els the time behavior in more detail, is easier to learn,

but cannot detect temporal substructures.

8 CONCLUSION

RTI+ is an efﬁcient algorithm that learns PDRTAs

from timed sequences. We have revealed a deﬁcit of

RTI+ in learning broadened intervals for time values.

Combined with the independence of symbol and time

probability distributions, this deﬁcit leads to wrong

predictions of sequences. We have shown that two of the three types of gaps cause the broadened intervals and developed our novel IDA procedure to remove

those gaps. IDA has been integrated into the RTI+ al-

gorithm, which we now call Extended RTI+ (eRTI+).

We have shown that IDA is an effective way to elim-

inate the disadvantage of the independent time and

symbol probability distributions used by RTI+. For

our experiment with an artiﬁcial example PDRTA, IDA

was able to identify and remove all gaps in intervals.

IDA was also able to improve the results in the experiment with ATM fraud detection. Although IDA did not work optimally on this real-world data, we are confident that this result can be improved further. IDA is a very flexible and adaptable procedure. As mentioned in Section 6, in the future we want to apply IDA after RTI+ has terminated instead of integrating it into the procedure. Furthermore, we plan to replace our statistical outlier detection by established clustering algorithms and density estimation procedures.

REFERENCES

Ankerst, M., Breunig, M. M., Kriegel, H.-P., and Sander,

J. (1999). OPTICS: Ordering Points to Identify the

Clustering Structure. In SIGMOD’99, ACM Interna-

tional Conference on Management of Data, pages 49–

60. ACM.

Carrasco, R. C. and Oncina, J. (1994). Learning Stochas-

tic Regular Grammars by Means of a State Merging

Method. In ICGI’94, 2nd International Colloquium

on Grammatical Inference and Applications, pages

139–152. Springer.

Dima, C. (2001). Real-Time Automata. Journal of Automata,

Languages and Combinatorics, 6(1):3–23.

Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A. J.,

and Vapnik, V. (1997). Support Vector Regression Ma-

chines. In NIPS’96, 9th Neural Information Processing

Systems Conference, pages 155–161. MIT Press.

Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A

Density-Based Algorithm for Discovering Clusters in

Large Spatial Databases with Noise. In KDD’96, 2nd

International Conference on Knowledge Discovery and

Data Mining, pages 226–231. AAAI Press.

Klerx, T., Anderka, M., Kleine Büning, H., and Priesterjahn,

S. (2014). Model-Based Anomaly Detection for Dis-

crete Event Systems. In ICTAI’14, 26th IEEE Interna-

tional Conference on Tools with Artiﬁcial Intelligence,

pages 665–672. IEEE Computer Society.

Lang, K. J., Pearlmutter, B. A., and Price, R. A. (1998). Re-

sults of the Abbadingo One DFA Learning Competition

and a New Evidence-Driven State Merging Algorithm.

In ICGI’98, 4th International Colloquium Conference

on Grammatical Inference, pages 1–12. Springer.

Leys, C., Ley, C., Klein, O., Bernard, P., and Licata, L.

(2013). Detecting Outliers: Do Not Use Standard

Deviation around the Mean, Use Absolute Deviation

around the Median. Journal of Experimental Social

Psychology, 49(4):764–766.

Maier, A. (2015). Identiﬁcation of Timed Behavior Mod-

els for Diagnosis in Production Systems. PhD thesis,

University of Paderborn.

Parzen, E. (1962). On Estimation of a Probability Den-

sity Function and Mode. The Annals of Mathematical

Statistics, 33(3):1065–1076.

Pelleg, D. and Moore, A. W. (2000). X-means: Extending

K-means with Efﬁcient Estimation of the Number of

Clusters. In ICML’00, 7th International Conference on

Machine Learning, pages 727–734. Morgan Kaufmann

Publishers Inc.

Tukey, J. W. (1977). Exploratory Data Analysis. Pearson.

Verwer, S., de Weerdt, M., and Witteveen, C. (2010). A

Likelihood-Ratio Test for Identifying Probabilistic De-

terministic Real-Time Automata from Positive Data.

In ICGI’10, 10th International Colloquium Conference

on Grammatical Inference, pages 203–216. Springer.

Verwer, S., de Weerdt, M., and Witteveen, C. (2012). Efficiently

Identifying Deterministic Real-Time Automata from

Labeled Data. Machine Learning, 86(3):295–333.

ICPRAM 2017 - 6th International Conference on Pattern Recognition Applications and Methods
