Detecting Anomalies by using Self-Organizing Maps in Industrial
Environments
Ricardo Hormann and Eric Fischer
Shopfloor IT, Volkswagen AG, Wolfsburg, Germany
Keywords:
Anomaly Detection, Self-Organizing Maps, Profinet.
Abstract:
Detecting anomalies caused by intruders is a major challenge in industrial environments due to complex
environmental interdependencies and proprietary fieldbus protocols. In this paper, we propose a network-
based method for detecting anomalies using unsupervised artificial neural networks called Self-Organizing
Maps (SOMs). To this end, we present an algorithm that identifies clusters and cluster centroids in SOMs
to gain knowledge about the underlying data structure. In the training phase we created two neural networks,
one for clustering the network data and the other one for finding the cluster centroids. In the operating phase
our approach is able to detect anomalies by comparing new data samples with the first trained SOM model.
We used a confidence interval to decide if the sample is too far from its best matching unit. A novel additional
confidence interval for the second SOM is proposed to minimize false positives which have been a major
drawback of machine learning methods in anomaly detection. To evaluate our method, we implemented our approach in a robot cell
and infiltrated the network as an intruder would. As a result, we significantly
reduced the false positive rate to 0.07% using the second interval while providing an accuracy of 99% for the
detection of network attacks.
1 INTRODUCTION
Traditionally, industrial communication networks are
isolated systems that usually only allow internal com-
munication. Recent developments within the Industry
4.0 show the trend towards using more information
and communication technologies (ICT) in industrial
control systems (ICS) (Schuster et al., 2013). Many
field devices and protocols currently in use have been
originally designed for isolated and highly trusted au-
tomation networks with a physical air gap to other
networks. As cyber security has not been taken into
account during their development, these systems lack
essential IT security features. This design decision is
not compatible with the nowadays often implemented
connection to more open and thus less trusted net-
works like supervisory networks that use open pro-
tocols (e.g. TCP/IP) (Knapp, 2011).
With these new interfaces, more attack possibili-
ties emerge with an increased damage potential like
“Stuxnet” in 2010 (Langner, 2011) and “WannaCry”
in 2017 (Ehrenfeld, 2017). The steadily increasing
complexity of automation networks indicates the ne-
cessity of more advanced intrusion detection systems
(IDS) that do not exclusively rely on searching for
signatures that characterize previously known attacks.
There is a need for IDSs that also identify unknown
threats (zero-day attacks). Due to the general lack of
carefully labeled data sets in complex communication
networks, unsupervised machine learning algorithms
like Self-Organizing Maps (SOMs) are suitable meth-
ods for such IDS (Ippoliti and Zhou, 2012). They aim
at constructing a model characterizing normal behav-
ior of a system by extracting raw data during normal
operation (Di Pietro and Mancini, 2008).
These aspects motivate the analysis of industrial
networks using SOMs to detect anomalies that indi-
cate security relevant incidents. The general prob-
lem of too many false alarms of anomaly-based IDS
should be addressed to make them attractive for prac-
tical use. Taking the above into account, this paper
proposes a concept for the analysis of SOMs to facil-
itate the detection of outliers in ICS.
2 INDUSTRIAL CONTROL
SYSTEMS
Industrial control systems describe networks that op-
erate and manage industrial processes such as an
automated manufacturing step in an automotive as-
sembly line. ICSs utilize a real-time communica-
tion between their devices through fieldbuses and in-
dustrial Ethernet protocols (Knapp, 2011). ICSs are
located in production cells and contain control de-
vices such as programmable logic controllers (PLCs)
to operate machines and robots and human-machine
interfaces (HMI) for monitoring and manual adjust-
ments (Stouffer et al., 2011; Knapp, 2011). Further,
PLCs can be connected to fieldbuses that link field
devices like sensors, motors and valves at shop floor
level. Different network zones ensure that devices in
the corporate network cannot communicate directly
with the control network and vice versa which is im-
portant as the networks have different trust levels.
Profinet is a real-time capable Ethernet-based
communication protocol that was introduced in the
early 2000s to facilitate fast and reliable data exchange be-
tween controlling units and field devices. There are
mainly two different classes of Profinet communica-
tion: Profinet CBA for transmitting non-time-critical data
and Profinet IO for cyclic and non-cyclic data ex-
change in real time between IO controllers (PLCs)
and IO devices (field devices). Profinet IO (PNIO)
itself consists of several sub-protocols. To set up a
Profinet communication, the PNIO Context Manager
(PNIO-CM) is used. It is based on UDP and establishes
the initial PNIO connection. The actual cyclic sta-
tus data is exchanged as PNIO packets via Ethernet
frames on OSI layer 2 and is not acknowledged. Each
Profinet data unit is directly embedded in an Ether-
net frame and directed from one node to another via
the MAC addresses of the communicating devices. Its
payload contains status information and IO data val-
ues whose meanings are negotiated by PNIO-CM in
the communication setup. The data exchange is struc-
tured as a publish/subscribe model where the IO con-
troller can subscribe to information published by IO
devices. Further, Profinet IO also supports non-cyclic
communication for alarm messages in the form of PNIO-
AL (Frank, 2009).
3 SELF-ORGANIZING MAPS
A Self-Organizing Map is an artificial neural net-
work (ANN) that receives high dimensional data as
an input and learns its complex structures to repre-
sent it on a two dimensional map of neurons as the
output. It behaves like a dynamic and flexible lat-
tice that is spanned over the input data samples to
approximate it as shown in Fig. 1 on the left hand
side. In other words it fits two-dimensionally ordered
prototype vectors to the distribution of the high di-
mensional input data vectors (Kohonen et al., 2001).
The method is inspired by the functionality of the hu-
man brain where similar inputs activate neurons in the
same area of the brain. Thus, data instances that are
close in the input data space are mapped to neurons
that are nearby on the output map. This property is
called topological correctness and makes the SOM a
unique and useful tool for exploring data sets as it vi-
sually represents high dimensional data. Further, the
SOM does not require manually labeled data sets as
an input and can be classified as an unsupervised ma-
chine learning method. Originally, the algorithm was
introduced by the Finnish researcher Teuvo Kohonen
in 1982 (Kohonen, 1982) and has been successfully
used in well over 10000 publications until 2011 (Ko-
honen et al., 2001).
3.1 Algorithm
The original SOM algorithm essentially consists of
four steps. First, a usually two dimensional lattice of
neurons like in Fig. 1 is initialized by creating a vec-
tor for each neuron that has certain coordinates on the
map. The dimension of each of those prototype vec-
tors has to match with the later used input data vec-
tors. The values of each prototype vector can be ini-
tialized randomly or in a linear way according to the
minimum and maximum values of each vector com-
ponent in the training data.
The second phase can be called competition phase
as the neurons compete for each input that is pre-
sented to the neural network to be selected as the win-
ner neuron. In Fig. 1 a training sample s is chosen
randomly from the input data set represented as a blue
cloud. Next, the input vector is compared to each pro-
totype vector on the map computing the distance be-
tween them (e.g. the Euclidean distance) and the one with
the smallest distance is chosen. Hence the winner is
also called best matching unit of s (BMU(s)) and rep-
resents it most accurately out of all neurons.
Figure 1: SOM algorithm and learning phase based on (Ko-
honen et al., 2001).
Thirdly, the winning neuron determines the topo-
logical area of the map that is ’activated’ by the input
s in the cooperation phase. Further, the grid structure
of the map defines the neighborhood neurons that are
stimulated along with its center node (the winner neu-
ron). In our case the three closest neurons are set as
the neighborhood of BMU(s).
Last is the adaptation phase where the winning neu-
ron and its neighborhood adapt to the input. Given a
certain learning rate that decreases over time, the stim-
ulated prototype vectors are 'shifted' towards the in-
put vector so that their distances to the input decrease.
Hereby the degree of adaption is dependent on the
similarity of the current neuron to the input s so that
neighboring neurons adapt less than BMU(s). These
four steps are performed for each input training data
sample. After all samples have been applied to the
SOM one training iteration (called epoch) is finished.
The number of epochs that are performed until the al-
gorithm stops depends on the learning rate (Kohonen
et al., 2001).
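The following minimal NumPy sketch illustrates the four training steps described above. The map size, the exponentially decaying learning rate and the Gaussian neighborhood function are common illustrative choices and not the exact settings of our prototype, which uses the three closest neurons as the neighborhood of BMU(s).

```python
import numpy as np

def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0):
    """Plain SOM training loop (illustrative parameters)."""
    n_features = data.shape[1]
    rng = np.random.default_rng(0)
    # 1) Initialization: one prototype vector per neuron on a rows x cols lattice.
    weights = rng.random((rows, cols, n_features))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1).astype(float)
    for epoch in range(epochs):
        lr = lr0 * np.exp(-epoch / epochs)        # decreasing learning rate
        sigma = sigma0 * np.exp(-epoch / epochs)  # shrinking neighborhood radius
        for s in rng.permutation(data):
            # 2) Competition: the neuron with the smallest Euclidean distance wins.
            dists = np.linalg.norm(weights - s, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # 3) Cooperation: a Gaussian neighborhood around the BMU on the grid.
            grid_dist = np.linalg.norm(grid - np.array(bmu), axis=2)
            h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
            # 4) Adaptation: shift the stimulated prototypes towards the input.
            weights += lr * h[..., None] * (s - weights)
    return weights
```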
3.2 U-Matrix and Clusters
A popular possibility to represent the SOM and its
topological cluster structures is the so called Unified
Distance Matrix (U-Matrix) introduced by (Ultsch,
1995). For each prototype vector of the SOM the av-
erage distance to its surrounding vectors is computed
and saved in a matrix structure preserving the respec-
tive map coordinate. Thus, low values in the U-Matrix
describe dense areas of the SOM, i.e. neuron
clusters whose prototype vectors are close to
each other and characterize similar behavior. On the other
hand, high distance values represent sparse areas, in-
dicating outlying neurons. Moreover, when analyz-
ing cluster structures valuable information about the
underlying data can be extracted from the U-Matrix,
e.g. the number of clusters in the input data (Vesanto
and Alhoniemi, 2000) for further cluster algorithms
like k-means. In related research, (Brugger et al.,
2008) proposed an algorithm creating a clusot sur-
face to cluster data and to determine the number of
clusters. This method however is computationally
too expensive for our objective to identify the num-
ber of clusters as a first step of several analysis meth-
ods and anomaly detection. Other research introduced
a semi-automatic approach (Opolon and Moutarde,
2004) based on the U-Matrix to cluster the SOM neu-
rons by trial-and-error principle requiring manual in-
teraction. Thus, there is need for a fully automatic
algorithm that quickly computes a reliable reference
value approximating the number of clusters in a SOM.
This issue is addressed by a novel algorithm based on
the U-Matrix proposed in Section 4.1.
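As an illustration of the U-Matrix computation described by (Ultsch, 1995), the following sketch averages, for each neuron, the distances of its prototype vector to the prototypes of its direct grid neighbors; the 4-neighborhood is an assumption made here for simplicity.

```python
import numpy as np

def u_matrix(weights):
    """U-Matrix: average distance of each prototype to its grid neighbors."""
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            umat[i, j] = np.mean(dists)
    return umat
```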
4 APPROACH
In this chapter a concept for analyzing Self-
Organizing Maps is proposed to detect anomalies and
to reduce false positives in the traffic of industrial con-
trol systems.
4.1 Cluster Analysis of Self-Organizing
Maps
The goal of this section is to gain knowledge about
the trained SOM to improve the detection accuracy of
outliers in the test data. First, a novel algorithm is
proposed that identifies the number of neuron clusters
present in a trained SOM model in Section 4.1.1. The
quality of the resulting clusters can be validated with
methods described in Section 4.1.2. In Section 4.1.3
the centroid vector of each neuron cluster is approxi-
mated.
4.1.1 Identifying Clusters
In general there is no 'right' number of clusters, but
a clustering quality metric can be used to find the
number that delivers the best performance consider-
ing that metric. Usually this is done by the trial-
and-error method. However, there is a significant
number of possible cluster counts when analyzing
the large amounts of data occurring in communication
networks, e.g. 160 clusters are optimal in (Yüksel et al., 2016).
Thus, an algorithm that automatically computes the
approximated number of clusters can be useful and
saves time. In the following such an algorithm is pro-
posed that utilizes the structure of the U-Matrix of
a SOM to determine the number of existing neuron
clusters. It solely relies on the data provided in the
Unified Distance Matrix. Essentially, Alg. 1 tries to
find starting points for a cluster in the most dense ar-
eas of the map. When it finds an unused neuron it
scans the neighboring neurons i and adds them to the
current cluster j if they are close enough (see line 6 in
Alg. 1), meaning they fulfill Eq. (1)¹. The algorithm
searches the U-Matrix like a depth-first-search regard-
ing the minimum U-Matrix value of the current clus-
ter members and neighbors. Moreover, if there are no
more potential nodes to add to the current cluster, it
validates the cluster depending on its size and density
compared to the map size of the SOM and global av-
erage density of the U-Matrix. Thus, if a found cluster
is too sparse or does not have enough members, it is
not accepted. These nodes may be added to a cluster
¹ $avg\_umat_j$ is the average U-Matrix value for the current cluster and $\sigma_{global\_umat\_dist}$ is the global standard deviation of all U-Matrix values.
in a future iteration of the algorithm, but they are not
forced to fit a cluster and some neurons may be left
unassigned at the end of the algorithm.
$$umat_i \le avg\_umat_j + \sigma_{global\_umat\_dist} \quad (1)$$
Algorithm 1: Textual description of the 'cluster finder' algorithm.
Input: Unified Distance Matrix of the trained SOM model.
Output: Reference number of neuron clusters in the SOM.
1  while potential cluster starting points left do
2      from all neurons that are not part of any cluster, add the neuron with the smallest U-Matrix value to the potential new cluster and mark it as used
3      add its neighboring unassigned neurons to the potential member list
4      while potential members left for the current cluster do
5          current neuron ← the potential neuron with the current minimum U-Matrix value
6          if current neuron's U-Matrix value is small enough then
7              add the neuron to the cluster and add its neighbors to the potential member list
8          else
9              remove the neuron from the potential member list
10         end
11     end
12     if the current cluster is valid then save it
13 end
14 return the number of found valid clusters
After finishing the clustering process, the number
of clusters is returned, which can be interpreted as a
reference number for the existing clusters in the SOM.
Reference means in this case that the cluster number
is validated by checking the numbers around that ref-
erence value, as described in Section 4.1.2.
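A possible Python sketch of Alg. 1 is given below. The validity check (a minimum cluster size relative to the map size and a density test against the global average U-Matrix value) is an illustrative assumption, as the algorithm only states that too-small or too-sparse clusters are rejected.

```python
import numpy as np

def find_clusters(umat, min_size_frac=0.02):
    """Sketch of the 'cluster finder' (Alg. 1) on a U-Matrix of shape (rows, cols)."""
    rows, cols = umat.shape
    sigma_global = umat.std()                 # sigma_global_umat_dist in Eq. (1)
    assigned = np.zeros((rows, cols), dtype=bool)
    valid_clusters = 0

    def neighbors(i, j):
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                yield ni, nj

    while not assigned.all():                 # potential starting points left
        # start a new cluster at the densest unassigned neuron
        start = np.unravel_index(np.argmin(np.where(assigned, np.inf, umat)),
                                 umat.shape)
        cluster = [start]
        assigned[start] = True
        candidates = {n for n in neighbors(*start) if not assigned[n]}
        while candidates:                     # potential members left
            current = min(candidates, key=lambda n: umat[n])
            candidates.remove(current)
            avg_umat = np.mean([umat[m] for m in cluster])
            if umat[current] <= avg_umat + sigma_global:      # Eq. (1)
                cluster.append(current)
                assigned[current] = True
                candidates |= {n for n in neighbors(*current) if not assigned[n]}
        # accept the cluster only if it is large and dense enough (assumed check)
        if (len(cluster) >= min_size_frac * rows * cols
                and np.mean([umat[m] for m in cluster]) <= umat.mean()):
            valid_clusters += 1
    return valid_clusters
```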
4.1.2 Confirming Cluster Quality
The ’cluster finder’ algorithm from Section 4.1.1 pro-
posed a possible number of neuron clusters k in the
SOM after analyzing the U-Matrix representation.
However, the quality of the resulting clustering has
to be verified as a next step using some kind of qual-
ity metric. To determine the 'right' number of clus-
ters, silhouette analysis is a widely accepted method
as the resulting silhouette score is only dependent on
the partition of the objects and not on the used cluster-
ing algorithm (Rousseeuw, 1987). Moreover, (Petro-
vic, 2006) shows that silhouette analysis performs bet-
ter than an alternative metric like the Davies-Bouldin index in
the context of anomaly-based IDS, at the cost of a more complex
and time-consuming algorithm.
The silhouette score s of a single object i belong-
ing to some cluster is defined for $k \ge 2$ in Eq. (2).

$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \quad (2)$$
Variable b(i) is the mean of all distances between
object i and the objects in its nearest cluster to which
i does not belong. Variable a(i) is the average dis-
tance between object i and all other objects that be-
long to the same cluster as object i. The single silhou-
ette scores $s(i) \in [-1; 1]$² are averaged over all clusters,
resulting in an overall average silhouette score $\bar{s}(k)$.
A positive value near 1 means the clusters are (on av-
erage) well separated from each other and objects of the
same cluster are similar. Hence, the k with the highest
$\bar{s}(k)$ should be chosen (Rousseeuw, 1987).
In our concept, we evaluate $\bar{s}(k)$ for several values
of k around the reference value that has been com-
puted by the 'cluster finder' algorithm. This saves time
compared to a pure trial-and-error approach, where the range of
candidate values for k grows with the map size of the SOM.
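The following sketch shows this validation step. It assumes k-means over the SOM prototype vectors as the partitioning method (the paper mentions k-means only as a typical follow-up algorithm) and uses scikit-learn's silhouette_score to compute $\bar{s}(k)$.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_around(prototypes, k_ref, window=2, random_state=0):
    """Evaluate the average silhouette score for k values around the reference k."""
    scores = {}
    for k in range(max(2, k_ref - window), k_ref + window + 1):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(prototypes)
        scores[k] = silhouette_score(prototypes, labels)  # average s(i) from Eq. (2)
    best_k = max(scores, key=scores.get)
    return best_k, scores
```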
4.1.3 Finding Cluster Centroids
As previously stated, the goal is to gain more knowl-
edge about the computed SOM representing normal
network behavior. By identifying its cluster centers
using an additional SOM layer, the distance of a test-
ing data sample to said centers provides useful in-
formation for deciding whether the sample is normal
or anomalous. This is due to the clusters represent-
ing high values for the probability density function of
the input space (Kohonen, 2014). This is motivated
by following scenario illustrated in Fig. 2 where the
output space of a trained SOM is shown. There are
three clusters of neurons with their respective cluster
centroid vectors c1, c2 and c3. Let s1 be a normal
testing sample and s2 be an anomalous testing sam-
ple. The BMU distances (red lines) of both samples
are equal but s1 is quite similar to the normal behav-
iors of the three clusters compared to s2 which is a
significant outlier. In other words, the summed dis-
tances of s1 to the centroids are much smaller than
the summed distances of s2 to said centroids. If only
² A negative value indicates that object i has been assigned to the wrong cluster and a value of 0 means the object lies right between two clusters.
the BMU distance of a testing sample is considered,
valuable information might be lost. Therefore, con-
sidering the average cluster centroid distances in the
anomaly detection process could improve the detec-
tion performances.
Figure 2: SOM Clusters with two possible outliers.
To compute the centroid vectors of the neuron clusters
of a SOM, the number of clusters has to be determined
beforehand, which is done in Section 4.1.1. Subsequently,
a second layer SOM - called cluster center SOM (CC-
SOM) in the following - can be trained to find the cen-
troids of the main SOM like in Fig. 2. The CCSOM’s
number of neurons equals the desired cluster number
(e.g. 3) while the training data remains the same as
for the main SOM. Consequently, the CCSOM tries
to represent the whole input space with the three cho-
sen neurons. The prototype vector of each neuron in
the output layer adapts to a specific cluster represent-
ing its centroid vector (Kohonen et al., 2001). On the
other hand, the centroid vectors could also be com-
puted using the k-means algorithm. However, since
k-means is prone to an initialization bias, with its local op-
tima differing from the global optimum, the CCSOM
has been chosen to find fitting cluster centroids in the
initial try (Bação et al., 2005).
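A sketch of the CCSOM training is shown below. It assumes the open-source MiniSom library; the 1 x k map shape and the training parameters are illustrative, not the settings of our prototype.

```python
import numpy as np
from minisom import MiniSom

def train_ccsom(train_data, k, iterations=5000, seed=0):
    """Second-layer SOM with k neurons; each prototype settles on one centroid."""
    n_features = train_data.shape[1]
    ccsom = MiniSom(1, k, n_features, sigma=0.5,
                    learning_rate=0.5, random_seed=seed)
    ccsom.random_weights_init(train_data)
    ccsom.train_random(train_data, iterations)
    # one centroid vector per CCSOM neuron
    return ccsom.get_weights().reshape(k, n_features)
```

The returned prototype vectors correspond to the centroids c1 to ck in Fig. 2 and are later used to compute the summed centroid distance in Section 4.2.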
4.2 Detecting Anomalies
As the trained SOM represents the normal operating
state of the monitored network in our concept, we can
identify anomalous data samples by comparing them
to the SOM.
The first step is to compute a threshold $\delta_{max}$ that
describes the acceptable degree of deviation from a
SOM neuron (its BMU). This can be done e.g. by
setting the maximum observed BMU distance as
$\delta_{max}$. However, this value may be distorted due to the
training data being noisy in a real scenario. To filter
out this noise, one possible approach is to compute a
confidence interval $I_{bmu} = [0, \delta_{percentile}]$ during train-
ing that contains e.g. 99.99% of the BMU distance
values to the respective training samples (Bellas et al.,
2014). Further, for each testing sample $x_i$ the distance
to its BMU neuron $d_{bmu}(i) = d(x_i, bmu(x_i))$ is com-
puted. If $d_{bmu}(i) \in I_{bmu}$, the testing data sample i
is classified as a normal packet. Else it is labeled as
an anomaly.
(Bellas et al., 2014) also introduced the idea to
construct a confidence interval for every SOM node's
distance to its mapped training inputs. However, this
does not consider the nodes which were not cho-
sen as BMUs during the training phase but repre-
sent small deviations from other mapped neurons.
This, together with the evaluation of (Goldstein and
Uchida, 2016), motivated the authors' idea for the fol-
lowing. The summed distance to the cluster centroids
$d_{cent}$ may be useful as the centroids represent dense points
of the input space (normal behavior), as explained in
Section 4.1.3. Also, when training a SOM with an
anomaly in the training data, we observed that the
BMU of said anomaly is most of the time located at
the outside of the map as the anomaly differs signifi-
cantly from other inputs. Hence it has a high value for
$d_{cent}$. The idea is to build a second confidence interval
$I_{centroid\_dist}$ which describes the summed distances of
a SOM neuron to the computed centroid vectors. If
$d_{cent}$ of a testing data sample is outside of $I_{centroid\_dist}$,
the testing data sample is not located in between the
found clusters or around their neuron members and
thus lies rather outside, like the sample s2 in Fig. 2. This
information is useful when the main confidence inter-
val $I_{bmu}$ barely identifies a sample as an anomaly, e.g.
it only exceeds the threshold $\delta_{max}$ by a bit based on
the variance of the other distances. In other words,
it may be a false positive. Using $I_{centroid\_dist}$ as a sec-
ondary decision component can help to verify that the
anomalously classified testing sample is indeed intru-
sive.
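The two-stage decision can be sketched as follows. The interval bounds are assumed here to be percentiles of the training-phase distances (e.g. the 99.99% coverage mentioned above); the exact construction of $I_{centroid\_dist}$ in our prototype may differ.

```python
import numpy as np

def build_intervals(train_bmu_dists, train_cent_dists, q=99.99):
    """Upper bounds of I_bmu and I_centroid_dist from the training distances."""
    delta_percentile = np.percentile(train_bmu_dists, q)  # I_bmu = [0, delta]
    cent_upper = np.percentile(train_cent_dists, q)       # upper bound of I_centroid_dist
    return delta_percentile, cent_upper

def classify(d_bmu, d_cent, delta_percentile, cent_upper):
    """Flag a packet only if both intervals indicate an anomaly."""
    if d_bmu <= delta_percentile:      # inside I_bmu: normal
        return "normal"
    if d_cent <= cent_upper:           # primary alert, but the centroid distances
        return "normal"                # look normal: likely a false positive
    return "anomaly"
```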
5 IMPLEMENTATION
As a first step the collected raw network data is trans-
formed, because usually machine learning algorithms
require numerical values (Hormann et al., 2018). This
is critical if features are created per packet, as the raw
data types depend on the network protocol in use and
may not be numerical. In our scenario (see
Section 6) we observe apart from the main Profinet IO
traffic various other protocols (e.g. TCP/UDP, DHCP,
LLDP) with different structures which will be mod-
eled using one general SOM model. The open source
tool Wireshark is used to extract and create similar
features from the raw pcap files for the different pro-
tocol types. The features utilized by our SOM
model are listed in Tab. 1. They are extracted per
Table 1: Chosen features for the SOM model covering all protocols.

Feature  | Type    | Example                               | Processed Value
Source   | String  | 48:ba:4e:ea:41:e3                     | 1
Dest.    | String  | 00:0e:8c:ac:c3:db                     | 2
Protocol | String  | Profinet IO                           | 1
Length   | Integer | 60                                    | 60
Info     | String  | ...ID:0x8080, Cycle:30464 (Valid,...) | 1
dissected network packet which has been proven use-
ful for anomaly detection (Zanero and Savaresi, 2004)
and is also considered for industrial networks (Schus-
ter et al., 2013). It converts each observed packet into
one data sample containing features that the differ-
ent protocols share, i.e. a source address (string), a
destination address (string), a protocol type (string),
a packet length (integer) and information (string).
The last one contains the packet payload as interpreted
by Wireshark. The string-valued features have to be
converted to a numerical representation for the ma-
chine learning model. To reduce complexity a simple
method is chosen which replaces each unique string
value with its own integer index (compare ’Processed
value’ column in Tab. 1). The converted training
data is subsequently normalized and a SOM model is
trained. The prototype analyzes its cluster structures
and computes both confidence intervals for anomaly
detection and false positive prevention as discussed in
Chapter 4.
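The per-packet conversion of Tab. 1 can be sketched as follows. The field names and the min-max normalization are assumptions for illustration; only the string-to-index replacement itself is taken from the description above.

```python
import numpy as np

def encode_packets(packets):
    """packets: list of dicts with keys 'source', 'dest', 'protocol', 'length', 'info'."""
    string_cols = ("source", "dest", "protocol", "info")
    indices = {col: {} for col in string_cols}
    rows = []
    for p in packets:
        row = []
        for col in string_cols:
            mapping = indices[col]
            # each unique string value gets its own integer index (starting at 1)
            row.append(mapping.setdefault(p[col], len(mapping) + 1))
        row.append(float(p["length"]))  # the packet length is already numeric
        rows.append(row)
    data = np.array(rows, dtype=float)
    # min-max normalization per feature (constant columns are mapped to 0)
    lo, span = data.min(axis=0), data.max(axis=0) - data.min(axis=0)
    span[span == 0] = 1.0
    return (data - lo) / span
```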
6 EVALUATION
To evaluate the performance of our developed pro-
totype and detection concept we executed attack and
anomaly events in a realistic manufacturing scenario
described in Section 6.1. In Section 6.2 two training
data sets are discussed. Section 6.3 explains the attack
scenarios we performed on the network to generate
testing data sets and the results our anomaly detection
prototype achieved. The findings are interpreted in
Section 6.4.
6.1 Experimental Scenario
Our experimental setup is a human-robot collabo-
ration scenario where a robot is assisting a worker
to manufacture engine blocks in a production cell.
The Ethernet-based network structure is illustrated in
Fig. 3. It consists of the robot that is commanded by
its own robot controller. The latter is connected via an
Ethernet cable to an industrial network switch. The
PLC which is controlling the production process is
connected over an additional communication proces-
sor to the switch in the middle. On this line we in-
stalled a TAP device which is used to extract network
traffic flowing from or towards the PLC. For testing
purposes we collected the network packets from the
TAP on a separate laptop with two Ethernet interfaces.
Figure 3: The network structure of our experimental sce-
nario.
Other network participants are a server with sev-
eral virtual machines (VM) running on it which will
later represent an industrial PC that is infiltrated and
controlled by an attacker. Moreover, a router repre-
sents the network gate to our small production cell.
6.2 Training Data
In our evaluation, we gathered two training data sets
of different difficulty. For both sets we simulated
normal operation by running the production program
several times where the robot is performing certain
movements and actions. We extracted all network
packets involving the PLC network interfaces and all
kinds of multicast³ packets.
For the simple training data set we started the
packet extraction process during a normal operation run
where all devices were already running, which is the
usual case in a manufacturing scenario. Thus, we ob-
served mostly Profinet IO traffic between the PLC
interfaces and the robot controller which made up
98.4% of all packets. The other packets are Profinet
PTCP, ARP and ICMP by the PLC and CP devices
on the one hand. On the other hand, the network in-
terface card of our capturing PC sent some DHCP,
LLMNR, NBNS and IGMP packets. Overall the data
set consists of 71693 packets observed over 265 sec-
onds.
The main difference between the complex training data
set and the simple one is that the former addition-
ally contains the 'startup phase' of the used devices.
Thus, there is more noise in form of discovery and
resolution protocols and others, e.g. LLDP, LLMNR,
DHCP, ICMP and ARP. Also, Profinet IO-CM (con-
text manager) is observable as it is used to establish
the Profinet IO communication setup. Because of the
³ Multicast packets originate from one source but are distributed to multiple recipients.
heavy use of IP based protocols during the start up
procedure, it will be more difficult to distinguish be-
tween normal and anomalous IP stack traffic in the
testing phase. The complex data set contains about
47000 packets extracted over 187 seconds.
6.3 Testing and Results
To validate the performance of our prototype, we per-
formed several attacks and incidents to gather test
data as shown in Tab. 2. We define a single packet
as malicious if it is part of or caused by an attack or
incident that we initiate during the testing scenario
unless equal packets have also been observed in the
training phase. For each scenario and data set, we
evaluate the performance of our prototype using two
different settings. First, we ignore $I_{centroid\_dist}$, resulting in a false positive rate $FPR_0$. Subsequently, we do
consider $I_{centroid\_dist}$, resulting in $FPR_1$, as illustrated in
Tab. 2 and Tab. 3.
For the first exploit we force the PLC into 'stop
mode' which pauses any controlling activity of the
PLC and disables any outputs. In our test, the VM
(a known and trusted network participant) communi-
cates via TCP with the PLC to set up a communica-
tion over S7comm⁴. Subsequently, the PLC receives
an S7comm packet which tells it to turn into stop
mode. All those packets involving the described at-
tack are detected by our prototype since the PLC nei-
ther talked to the VM nor communicated via S7comm or TCP during training. This
network behavior is not described by our SOM model
which causes a significant deviation from it triggering
an anomaly alert for each involved packet. One mali-
cious packet where the PLC talks ARP with the VM
after the attack has not been detected however. Hence
the DR of 97.73%. As no packets have falsely been
classified as a positive the FPR is 0% for all 12450
packets in this test data set. In the second scenario
we let the VM execute the same exploit with the dif-
ference that we force the PLC from stop mode into
run mode resulting in a DR of 99.1% and FPR of 0%.
In the similar third attack the adversary manipulates
a part of the memory of the PLC via the infiltrated
VM. This modification causes the robot to change its
movement routine to a position about 20 centimeters
higher than before. 97.62% of the malicious packets
have been detected by our prototype with a FPR of
0%.
In the 4th scenario we simulate the disconnection
of the PLC by removing its Ethernet cable. During
real operation this disconnection can be caused by
⁴ S7comm is a proprietary protocol by Siemens for the communication between PLCs and other controllers and is based on the TCP/IP stack.
Table 2: Testing scenarios and their results using the simple training data set.

Scenario             | # Packets | # Malicious Packets | DR     | FPR_0 | FPR_1
1. Exploit PLC Stop  | 12450     | 111                 | 97.73% | 0%    | 0%
2. Exploit PLC Start | 2801      | 44                  | 99.1%  | 0%    | 0%
3. Exploit PLC Write | 5331      | 42                  | 97.62% | 0%    | 0%
4. PLC Disconnect    | 1830      | 45                  | 100%   | 0%    | 0%
5. Robot Disconnect  | 1501      | 37                  | 100%   | 0%    | 0%
6. IP Scan           | 1682      | 517                 | 99.42% | 0%    | 0%
accident or of course by an attacker that has physi-
cal access to the PLC. As a result, the robot controller
sends Profinet IO-AL (alarm) packets as its Profinet
controller (PLC) is not communicating anymore. All
45 intrusive packets have been identified without false
positives resulting in a DR of 100% and FPR of 0%.
The 5th test simulates the disconnection of the robot
controller by unplugging its Ethernet cable which is
reliably detected once again with perfect accuracy as
shown in Tab. 2. For the last scenario we perform an
IP scan in our testing network using the open source
scanning tool 'nmap'⁵ which is executed by our at-
tacking laptop. This attack scenario is particularly
important as a network intruder has to discover the
network devices accessible over this network to at-
tack specific critical devices (Hutchins et al., 2011).
Once again, our prototype detects well over 99% of
the malicious packets without any false positives.
In the second part we will use the complex train-
ing data set (see Section 6.2) to train our anomaly de-
tector and test the same attack types as for the simple
data set before. As Tab. 3 shows the DRs remain in
the same range. However, we experienced numerous
false positive alerts (see $FPR_0$) if we do not use our
secondary interval $I_{centroid\_dist}$ proposed in Section 4.2
to prevent false positives. If we do use $I_{centroid\_dist}$, how-
ever, a significant amount of FPs are correctly identi-
fied as negatives (see $FPR_1$). In scenarios 2 and 3, the
FPR is reduced from 0.32% and 0.032% to 0%. On
the downside, our approach slightly reduced the DR
in scenario 3 from 97.92% to 96.08% as it classified 2
TPs as FPs. In scenario 4 we experienced about 30%
fewer false positives. The robot disconnection incident
is the only scenario with at least one FP where our
approach did not improve the FPR.
Table 3: Testing scenarios and their results using the complex training data set.

Scenario             | # Packets | # Malicious Packets | DR             | FPR_0  | FPR_1
1. Exploit PLC Stop  | 1501      | 49                  | 97.96%         | 0%     | 0%
2. Exploit PLC Start | 361       | 52                  | 100%           | 0.32%  | 0%
3. Exploit PLC Write | 3201      | 50                  | 97.92 (96.08)% | 0.032% | 0%
4. PLC Disconnect    | 871       | 5                   | 100%           | 1.04%  | 0.69%
5. Robot Disconnect  | 503       | 4                   | 100%           | 0.20%  | 0.20%
6. IP Scan           | 328       | 47                  | 100%           | 0%     | 0%
⁵ https://nmap.org/
6.4 Interpretation
The results indicate an overall good detection per-
formance of our prototype with detection rates rang-
ing from 97.62% to 100% in both difficulties. The
absence of false positives (using the simple train-
ing data) is surprising at first as machine learning
based NIDS are generally prone to FPs (Landress,
2016). This especially holds true when normal
behavior is observed that did not occur during the
training phase and thus may not be covered by the
trained model (Buczak and Guven, 2016; Zanero and
Savaresi, 2004). However, industrial networks’ traf-
fic is more cyclic than in office networks reducing
the noise and facilitating a potentially more accurate
model. Still, even when our SOM model is trained us-
ing noisy data samples containing the startup commu-
nication of all devices (complex training set in Tab.3)
the prototype reliably detects the tested attacks once
again. Nevertheless, the importance of the feature
selection process cannot be emphasized enough.
Since the constructed SOM model creates its features
per packet it is intended to detect attacks that con-
tain packets significantly deviating from the learned
normal ones. A significant deviation in our model is
a packet containing e.g. an unknown device, an un-
known protocol, a known device speaking to an un-
usual partner via an unusual protocol, unknown or
unusual payload or any combination of these. Hence,
an attack consisting of many packets that appear nor-
mal when analyzed independently, e.g. a DoS attack,
is probably not detected by our current setup (Schus-
ter et al., 2013). This is due to the feature set not
containing relations between multiple packets. If a
second SOM model is deployed using connection-ori-
ented features per packet sequence or time window,
like (Sestito et al., 2018), these kinds of attacks become de-
tectable as well, as (Mitrokotsa and Douligeris, 2005)
show with DRs of well over 97%.
The current prototype is not intended to detect all
kinds of possible attacks as only one feature set and
model is used. It is rather intended to be a comple-
menting detection component of a complex NIDS.
Therefore, it has been shown that our concept is capa-
ble of correctly abstracting the normal behavior of an ICS
with respect to the chosen features and identifying secu-
rity relevant deviations from it. Further, analyzing the
cluster structure of the trained SOM model using the
proposed interval $I_{centroid\_dist}$ is shown to be useful to
derive additional knowledge, preventing almost 50%
of the observed false positives.
Related work proposed by (Sestito et al., 2018)
utilizes connection-oriented features using sliding
windows and observes DRs of 92% to 99% with
FPRs of 0% up to 7%. However, they focused on
Profinet-specific events whereas our methodology is
more generic, covering all kinds of protocols. (Yüksel
et al., 2016) focuses on analyzing payload informa-
tion of industrial protocols like Modbus and S7comm
and at best achieved an overall detection rate of 99.1%
paired with a FPR of 0.047% for scan attacks for
which we did not observe any FPs. Further, (Schus-
ter et al., 2015) used a one-class Support Vector Ma-
chine⁶ on Profinet IO traffic with a similar feature set
as ours resulting in a DR of 96% and a FPR of about
1%. In conclusion, our approach shows better perfor-
mance for the tested scenarios than comparable pro-
posals. Moreover, we use a generic packet-based fea-
ture set that is applicable for all Ethernet-based proto-
cols. However, (Sestito et al., 2018; Schuster et al.,
2015) can detect DoS attacks while (Yüksel et al.,
2016) interprets payloads.
7 CONCLUSION AND FUTURE
WORK
To sum up, our work shows that anomaly-based IDSs,
in particular those implementing SOMs, are an effective
method to detect novel attacks on a Profinet-based
ICN. The general drawback of too many false posi-
tives can be addressed using additional confidence in-
tervals that further analyze the normal network state.
Thus, anomaly-based IDS hopefully become more at-
tractive for the deployment in industrial sites which
have been increasingly exposed to sophisticated cy-
ber attacks. Consequently, novel attacks can be de-
tected before causing major material damage or per-
sonal injuries. Future research may evaluate the pro-
posed approach in more attack scenarios and consider
additional models built from e.g. connection-oriented
feature sets. The installation of multiple taps in the
same network (distributed IDS) or the usage of a mir-
ror port of a central switch will also provide new op-
portunities to model the network's behavior. The pro-
posed anomaly detection approach could further be
applied to general outlier detection problems besides
network traffic data.
REFERENCES
Bação, F., Lobo, V., and Painho, M. (2005). Self-organizing
maps as substitutes for k-means clustering. In Pro-
ceedings of the 5th International Conference on Com-
putational Science - Volume Part III, ICCS’05, pages
476–483, Berlin, Heidelberg. Springer-Verlag.
⁶ Support Vector Machine is a supervised learning algorithm for classification tasks.
Bellas, A., Bouveyron, C., Cottrell, M., and Lacaille, J.
(2014). Anomaly detection based on confidence in-
tervals using som with an application to health mon-
itoring. In Advances in Self-Organizing Maps and
Learning Vector Quantization, pages 145–155, Cham.
Springer International Publishing.
Brugger, D., Bogdan, M., and Rosenstiel, W. (2008). Auto-
matic cluster detection in Kohonen's SOM. Trans. Neur.
Netw., 19(3):442–459.
Buczak, A. and Guven, E. (2016). A survey of data min-
ing and machine learning methods for cyber security
intrusion detection. IEEE Communications Surveys &
Tutorials, 18:1153–1176.
Di Pietro, R. and Mancini, L. (2008). Intrusion Detection
Systems. Advances in Information Security. Springer
US.
Ehrenfeld, J. (2017). Wannacry, cybersecurity and health
information technology: A time to act. Journal of
Medical Systems, 41(7):104.
Frank, H. (2009). Industrielle Kommunikation mit Profinet.
https://www.hs-heilbronn.de/1749571/profinet,
accessed on 11.07.2018.
Goldstein, M. and Uchida, S. (2016). A comparative eval-
uation of unsupervised anomaly detection algorithms
for multivariate data. PloS one, 11 4.
Hormann, R., Nikelski, S., Dukanovic, S., and Fischer,
E. (2018). Parsing and extracting features from opc
unified architecture in industrial environments. In
Proceedings of the 2Nd International Symposium on
Computer Science and Intelligent Control, ISCSIC
’18, pages 52:1–52:7, New York, NY, USA. ACM.
Hutchins, E., Cloppert, M., and Amin, R. (2011).
Intelligence-driven computer network defense in-
formed by analysis of adversary campaigns and intru-
sion kill chains. Leading Issues in Information War-
fare & Security Research, 1:80.
Ippoliti, D. and Zhou, X. (2012). A-GHSOM: An adap-
tive growing hierarchical self organizing map for net-
work anomaly detection. J. Parallel Distrib. Comput.,
72(12):1576–1590.
Knapp, E. (2011). Industrial Network Security: Secur-
ing Critical Infrastructure Networks for Smart Grid,
SCADA, and Other Industrial Control Systems. Syn-
gress Publishing.
Kohonen, T. (1982). Self-organized formation of topolog-
ically correct feature maps. Biological Cybernetics,
43(1):59–69.
Kohonen, T. (2014). MATLAB Implementations and Appli-
cations of the Self-Organizing Map. Unigrafia Oy.
Kohonen, T., Schroeder, M. R., and Huang, T. S., editors
(2001). Self-Organizing Maps. Springer-Verlag New
York, Inc., Secaucus, NJ, USA, 3rd edition.
Landress, A. D. (2016). A hybrid approach to reducing the
false positive rate in unsupervised machine learning
intrusion detection. In SoutheastCon 2016, pages 1–
6.
Langner, R. (2011). Stuxnet: Dissecting a cyberwarfare
weapon. IEEE Security Privacy, 9(3):49–51.
Mitrokotsa, A. and Douligeris, C. (2005). Detecting de-
nial of service attacks using emergent self-organizing
maps. In Proceedings of the Fifth IEEE Interna-
tional Symposium on Signal Processing and Informa-
tion Technology, 2005., pages 375–380.
Opolon, D. and Moutarde, F. (2004). Fast semi-automatic
segmentation algorithm for Self-Organizing Maps. In
European Symposium on Artifical Neural Networks
(ESANN’2004), Bruges, Belgium.
Petrovic, S. (2006). A comparison between the silhouette
index and the Davies-Bouldin index in labelling IDS
clusters.
Rousseeuw, P. (1987). Silhouettes: A graphical aid to the in-
terpretation and validation of cluster analysis. J. Com-
put. Appl. Math., 20(1):53–65.
Schuster, F., Paul, A., and König, H. (2013). Towards learn-
ing normality for anomaly detection in industrial con-
trol networks. In Emerging Management Mechanisms
for the Future Internet, pages 61–72, Berlin, Heidel-
berg. Springer Berlin Heidelberg.
Schuster, F., Paul, A., Rietz, R., and Koenig, H. (2015). Po-
tentials of using one-class svm for detecting protocol-
specific anomalies in industrial networks. In 2015
IEEE Symposium Series on Computational Intelli-
gence, pages 83–90.
Sestito, G. S., Turcato, A. C., Dias, A. L., Rocha, M. S.,
da Silva, M. M., Ferrari, P., and Brandao, D. (2018). A
method for anomalies detection in real-time ethernet
data traffic applied to profinet. IEEE Transactions on
Industrial Informatics, 14(5):2171–2180.
Stouffer, K., Falco, J., and Scarfone, K. (2011). Guide to
industrial control systems (ICS) security. Technical
report, National Institute of Standards and Technology
USA, Gaithersburg, MD, United States.
Ultsch, A. (1995). Self-organizing-feature-maps versus sta-
tistical clustering methods: A benchmark.
Vesanto, J. and Alhoniemi, E. (2000). Clustering of the self-
organizing map. Trans. Neur. Netw., 11(3):586–600.
Yüksel, O., den Hartog, J., and Etalle, S. (2016). Reading
between the fields: Practical, effective intrusion de-
tection for industrial control systems. In Proceedings
of the 31st Annual ACM Symposium on Applied Com-
puting, SAC ’16, pages 2063–2070, New York, NY,
USA. ACM.
Zanero, S. and Savaresi, S. M. (2004). Unsupervised learn-
ing techniques for an intrusion detection system. In
Proceedings of the 2004 ACM Symposium on Applied
Computing, SAC ’04, pages 412–419, New York, NY,
USA. ACM.