Knowledge Base System for Risk Analysis of the Multi-step

Computer Attacks

Gulnara Yakhyaeva and Aleksey Ershov

Novosibirsk State University, Department of Information Technology, Pirogova st. 2, Novosibirsk, Russian Federation

Keywords: Information Security, Cyber Threats, Multi-step Attack, Knowledge Base, Description Logic, Interpretation,

Case-based Model, Fuzzy Model, Generalized Fuzzy Model.

Abstract: This work describes the module of the "RiskPanel" software system, performing risk analysis of multi-step

computer attacks. The module is based on statistical analysis of actual computer attack precedents. At the

user's request the system calculates objective probability of information security risks, taking into account

all possible multi-step attacks (i.e. possible combinations of known attacks). The estimation of probability is

presented as an interval because we don't always have a full description of real attacks.

The task of this work is described using the model-theoretic formalism. The first step is to build a

knowledge base of computer attacks. The formal description of the knowledge base structure is made with

the Description Logic. Formalization of estimated (fuzzy) judgments is made in the language of the Fuzzy

Model Theory. The article contains algorithms for calculation of probabilistic risk intervals and describes

program implementation of the developed methods.

1 INTRODUCTION

The information risk management is currently one of

the most important and dynamically developing

directions of strategic and operational management

in the field of information security. Computer

networks play an important role in many areas.

Successful cyber attacks lead to considerable

financial, image and other losses and provoke the

increase in the number of potential attackers. Rising

demands lead to complexity of structure and

increasing size of the networks, which in its turn

leads to increased complexity of security analysis

and delay in application of protective

countermeasures.

The main task of computer security specialists is

rapid response to changes of the current security

status of all the components of the corporate

information system. For this purpose it is useful to

have a software system that allows quickly

determining the type of an attack without any special

skills, learning the latest information about possible

consequences of the attack and how to prevent them.

Computer attacks can be divided into two

classes: single-step and multi-step attacks (Dawkins

and Hale, 2004). In a single-step attack, the attacker

exploits a vulnerability, commits an attack, and

reaches his ultimate goal. Multi-step attacks are

more complicated. The attacker can exploit the

vulnerability not to immediately achieve the goal,

but to open a new vulnerability, which can be used

for carrying out another attack. In this case, the

ultimate objective is achieved at the end of the chain

of multiple attacks.

There are two peculiar properties of multi-step

attacks: random relationship between multiple attack

steps and uncertainty in the description of the attack

steps of a multi-step attack (Zhang, et al., 2014; Zhai

and Zhou, 2011). Therefore, it is impossible to

manually analyze all possible combinations of the

attacks.

One of the possible approaches to the analysis of

network security includes attack modeling and

impact assessment based on attack graphs (Xinming

Ou, 2006). The concept of "attack graph" is quite

simple. An attack graph is a graph that represents all

possible sequences of attacks that allow the attacker

to reach his goals through multi-step exploitation of

vulnerabilities.

Attack graphs can be a useful tool in different

areas of computer security. A system administrator

can use attack graphs for:

− building a list of attacks to which the system is

vulnerable in the current configuration;

Yakhyaeva, G. and Ershov, A.

Knowledge Base System for Risk Analysis of the Multi-step Computer Attacks.

In Proceedings of the 18th International Conference on Enterprise Information Systems (ICEIS 2016) - Volume 2, pages 143-150

ISBN: 978-989-758-187-8

143

− search for ways how an attacker can achieve a

specific purpose;

− determination of the countermeasures which

should be taken to ensure the security of the

system.

This paper examines the precedent approach to

building attack graphs. This approach is based on the

model-theoretic formalization of domain ontologies

(Palchunov, 2006; Palchunov, 2008). The formal

description of the structure of the knowledge base is

made with the Description Logic (Baader, et al.,

2007). Formalization of estimated (fuzzy) judgments

is made in the language of the Fuzzy Model Theory

(Pal'chunov and Yakhyaeva, 2015). Software

implementation of the knowledge base is made with

the Neo4j

graph DBMS.

2 RELATED WORK

At present there are more than a hundred software

systems for information risk management. All of

them can be divided into two groups (Alhomidi and

Reed, 2014):

• basic software systems for qualitative risk

analysis;

• full analysis software systems for quantitative

risk analysis.

Basic software systems are commonly used by

the companies of the 3rd CMM maturity level

(Carcary, 2013).

Software systems for full risk analysis include

systems with more advanced tools of risk analysis

and management. Such tools are in demand among

the organizations of the 4th and 5th CMM maturity

levels. The fourth-level organizations address the

measurement of parameters, characterizing the

information security policy. Technology of

information security management remains the same,

but the risk analysis phase includes quantitative

methods of estimation of the residual risk parameters

and effectiveness of various countermeasures in risk

management. Different kinds of optimization

problems in the field of information security are

solved on the fifth level. The most famous software

systems of this class are CRAMM

, RiskWatch

Digital Security

and OCTAVE

neo4j.com

www.cramm.com

www.riskwatch.com

www.dsec.ru

www.cert.org/octave

There is currently no commonly used definition

of the "attack graph" concept, and various research

groups often mean different things by this word

combination.

One of the approaches (Sheyner, et al., 2002) is

to define the attack model, which is a finite state

machine with multiple states and possible transitions

between them, where each element of the set of

states describes the state of the entire computer

network at the current stage of the computer attack.

The attack graph of the specified model is the set of

finite sequences of transitions from the initial state

of the system in such states that violate specified

security property. This approach has one major

drawback. With the growth of the number of nodes

in a computer network, the number of possible states

of the network increases exponentially, making the

creation of such attack graph practically useful only

for small networks with a small number of

vulnerabilities.

Another approach is based on building a logical

attack graph (Wayne and Boyer, 2006). In this

approach the graph node is a logical expression. It

describes not the state of the network as a whole, but

only one aspect of this state. Graph edges represent

causal connections between the network

configuration and the potential privileges that an

attacker can get.

This logical approach has two certain advantages

over the "attack scenario" approach. Firstly, logical

attack graph clearly defines the causal relationship

between the system configuration and potential

attacker's privileges. In the case of "scenario"

approach it would be necessary to take into account

the full description of the system state on one or

several of the preceding steps to determine the

causes of dangerous situation. While logical graph

represents that causal relationship with the edges of

the graph. It is possible to restore all possible attack

scenarios by performing a simple depth-first

traversal of the logical graph. Secondly, the size of

the logical graph always depends polynomially on

the size of the network, and the size of the scenario

graph in the worst case has an exponential

dependence on the size of the network.

This paper introduces one more semantics of the

"attack graph" concept. In the proposed approach a

graph node is a precedent of a computer attack with

several properties. The edges of the graph are based

on the values of two properties of an attack

precedent: necessary preconditions to commit this

attack, and the consequences that the attack has for

the system. The oriented edge from graph vertex A

to vertex B is built if the consequences of

ICEIS 2016 - 18th International Conference on Enterprise Information Systems

144

committing attack A open the possibility of carrying

out the attack B. The multi-step attack is a path

through this oriented graph, each vertex of which is

reachable from the previous one by a single step

over a directed edge.

3 THE KNOWLEDGE BASE ON

COMPUTER SECURITY

3.1 Knowledge Base Structure

Let’s consider the domain Δ= "computer attacks".

All atomic concepts of the subject area are divided

into six classes:

1) ℙ



: “Symptoms”;

2) ℙ



: “Threats”;

3) ℙ



: “Vulnerabilities”;

4) ℙ



: “Consequences”;

5) ℙ



: “Losses”;

6) ℙ



: “Countermeasures”.

Each of these concept classes is represented in

the database as a treelike structure, and each attack

case is characterized by concepts from these classes

(Yakhyaeva and Yasinskaya, 2014). A set of

concepts for each class is composed by parsing the

database of computer security precedents, the

National Vulnerability Database (the NIST

agency).

The NVD is a USA state project, a regularly

updated database of computer vulnerabilities with

the descriptions (or references to the descriptions in

the other Internet resources) of vulnerabilities

discovered in various software systems and

components. The descriptions of these

vulnerabilities contain information on which version

of the product is affected by this vulnerability, how

it could be exploited, what measures can be taken to

remedy this vulnerability, and other information.

Table 1 shows some examples of atomic

concepts for each of the classes.

Hereafter let’s denote the atomic concepts by





()

, where the top index means that 



()

∈ℙ



. For

formalization of knowledge in the domain Δ the

binary relation ⊆ℙ



×ℙ



is introduced. The

semantics of this relation is the following:

〈





(



)

;



()

〉

∈ iff the consequence (of some attack)





(



)

opens vulnerability 



()

.

http://www.nist.gov/

Table 1: Examples of concepts for each of the above

classes.

Concept class Example of atomic concept

Symptoms Outgoing traffic increase

Threats Trojan Win32 Antavmu

Vulnerabilities Buffer overflow

Consequences Denial of service

Losses Loss of availability

Countermeasures Antivirus software

On the basis of this relation the atomic role  is

defined.

Therefore, in the domain Δ we will examine a

finite set ℙ=ℙ



∪ℙ



∪ℙ



∪ℙ



∪ℙ



∪ℙ



atomic concepts and one atomic role . The set of all

concepts



of the domain Δ is composed

according to the standard syntax of the Description

Logic. All the concepts from 



are divided into

role concepts (i.e. using an atomic role ) and non-

role concepts.

Terminology of the object domain  or 



(i.e. intensional knowledge about the domain)

consists of the set of all the specialization axioms,

representing the hierarchical structure of concept

classes.

The second component of a knowledge base is

the World Description or 



(i.e. extensional

knowledge about the domain). In the 



individuals are firstly introduced by giving them

names, and then the asserts properties of these

individuals (Baader, et al., 2007).

In the domain Δ lets define individuals as the

precedents of computer attacks. The main sources of

information about computer attacks for RiskPanel

system are corporate databases, NIST and MITRE

Each attack  is characterized by the

presence/absence of certain concepts from each class

of concepts ℙ



. Thus, for each attack we build a

finite set of concept assertions of the following two

kinds:





(



)

() or ¬



(



)

().

The concept assertion of the kind (



,



) is

defined in the following way. There is concept

assertion 



,



 iff there are such concept

assertions 



(



)

(



) and 



(



)

(



) that

〈





(



)

,



(



)

〉

∈

http://www.mitre.org/

Knowledge Base System for Risk Analysis of the Multi-step Computer Attacks

145





(where 



is the transitive closure of the relation

).

Here the composition of the World Description

for object domain  or 



is finished. The pair

〈





,



〉

is the Knowledge Base of the

domain Δ. As new concepts of this domain (i.e. New

threats, vulnerabilities, etc.) or new precedents of

computer attacks appear, the Knowledge base is

expanded. However, the structure of the knowledge

base remains unchanged.

3.2 Model-theoretic Formalization of

Estimated Judgements

To perform statistical analysis of the data we need

the precedent model and the fuzzy model of the

domain (Pulchunov and Yakhyaeva, 2005;

Yakhyaeva, 2007).. These models are composed,

based on the interpretation set of the Knowledge

Base 



Consider a finite set of the computer attacks

=







,…,





which was used for describing 



. Let  be a

domain of quantification. Then  is a binary relation

over this domain, and ℙ is a collection of unary

relations as well. Denote



=ℙ∪







. Then

Definition 1. Algebraic system 



〈

,



〉

called interpretation of 



if 



⊨



(i.e.

for each concept assertion ∈



we have





⊨).

Definition 2. An ordered triple (



)⇋

〈





,



,

〉

is called a precedent model of the

domain , generated by the interpretation 



〈

,



〉

, provided that for every concept () ∈





we have

(

()

)



∈|



⊨()



In the precedent model each concept is associated

with the set of computer attack precedents which

have this concept as a characteristic. Note that from

the model-theoretical point of view, the model

(



) is a Boolean-valued model. In this

Boolean-valued model each proposition of the

signature 



∪









is associated with an element of

Boolean algebra ().

Most methods of statistical processing of data

use objective and/or subjective probability. The

objective probability refers to the relative frequency

of occurrence of any event in the set of observations

or the ratio of the number of desired events to the

total number of observations. The subjective

probability refers to the degree of confidence of

some expert or group of experts that this event will

actually happen.

In the proposed approach the concept of fuzzy

model is used to describe objective probabilities.

Definition 3. An ordered triple (



)⇋

〈





,



,

〉

is called a fuzzy model of the domain ,

generated by the interpretation 



〈

,



〉

provided that for every concept () ∈ 



have

μ

(



)

=

‖

∈|



⊨()

‖

‖



‖

In the fuzzy model, the truth values of the

statements (concepts) are the numbers from the

interval [0,1], which represent the objective

probability of concept belonging to a random attack

precedent. A more detailed description of the

properties of the precedent and fuzzy models can be

found in the works of (Yakhyaeva, 2009; Pal'chunov

and Yakhyaeva, 2015).

Note that the information about precedents from

the Internet can be not complete. Then (according to

the “open-world semantics” paradigm) we have a set

of different interpretations of the 



. From the

perspective of the model theories, it is a class of

models generated by the set of sentences 



Denote the class of all the interpretations of 













|



⊨





Definition 4. An ordered triple 



⇋

〈





,



,

〉

is called the generalized fuzzy model of

the domain , generated by the class of

interpretations 



, provided that for every concept

() ∈ 



we have



(



)

=μ

(



)



(



)

〈





,



,

〉





∈



.

Consider the concept 

(



)

∈



. Let





‖

∈|∀



∈



:



⊨()

‖

and





=1−

‖

∈|∀



∈



:



⊭()

‖

In paper ( (Yakhyaeva & Yasinskaya, 2015)) it was

proven that

inf

(



)

=





‖



‖

and

sup

(



)

=





‖



‖

Note that if () ∈ 



is not a role concept than



(



)

=





‖



‖







‖



‖

,…,





‖



‖

.

ICEIS 2016 - 18th International Conference on Enterprise Information Systems

146

And if () is a role concept than



(



)

⊆





‖



‖







‖



‖

,…,





‖



‖

.

4 ALGORITHMS OF RISK

ANALYSIS

Risk analysis is the systematic use of available

information to determine how often specified events

may occur and the magnitude of their

consequences

. In our case, risk analysis is

determining the probability of threats,

vulnerabilities, consequences and losses which may

happen as a result of a computer attack on a

corporate information system. In other words, the

problem of risk analysis is finding the infimum and

the supremum of truth values of concepts from





on the generalized fuzzy model 



Note that if () ∈ 



is a non-role concept,

then it describes the state which the system will have

after a single-step computer attack. The article

(Yakhyaeva and Yasinskaya, 2015) describes the

algorithm

getPDNFVerityOnCase, computing

truth values for such concepts. The algorithm is

proven to be correct. The work (Yakhyaeva, et al.,

2014) describes the module of the RiskPanel system,

implementing risk analysis for single-step attacks in

the question-answer mode.

On the other hand, role concepts describe

properties of multi-step attacks. This work proposes

algorithms for answering the following questions:

(Q1) What is the probability that after some multi-

step attack the system will have the state,

matching the description of non-role concept

()?

(Q2) What is the probability that after any multi-step

attack the system, sooner or later, will

definitely have the state, matching the

description of non-role concept 

(



)

Questions (Q1) and (Q2) can be formalized with

concepts





(

∨∃.

)

и 



=∨∀.

(

∃.

)

,

where  is any non-role concept.

Our task is to find the infimum (denoted by

inf(



)) and the supremum (denoted by sup(



)) of

truth values of these concepts on the generalized

fuzzy model 



. Let us consider the graph =

〈

,

〉

defined on the set of attack precedents. We

http://www.palisade.com/risk/risk_analysis.asp

perform postfix traversal of . As a result, every

graph vertex is assigned with one of the three values:

TRUE, UNKNOWN or FALSE. Therefore

inf

(





)



(



)



sup(



1−()



where () is the number of vertices in the

graph  labeled , () is the number of

vertices in  labeled  and  – total amount of

vertices in .

Here we provide pseudocode of the two

algorithms, FuzGLEMP (Fuzzy graph labeling for

the extended modality of possibility) and

FuzGLEMN (Fuzzy graph labeling for the extended

modality of necessity), for labeling graph  with

concepts 



and 



respectively (see Table 2 and

Table 3).

Table 2: Algorithm FuzGLEMP.

PG; // precedent graph

E; // expression from CONΔ

// queue for labeling

Q = {leafs of PG};

while (Q ≠ Ø) {

// the first vertice in the Q

V = Q.take();

if (V is leaf of PG) {

// label V with the result of

// evaluation of E on V

label(V, eval(E, V));

Q.add({all immediate ancestors of V that are not in

Q yet});

} else {

if (all immediate successors of V are labeled) {

D = {labels of all immediate successors of V};

if (D contains TRUE) {

label(V, TRUE);

} else if (D contains only UNKNOWN and FALSE)

{

if (eval(E, V) == TRUE) {

label(V, TRUE);

} else {

label(V, UNKNOWN);

}

} else if (D contains only FALSE) {

label(V, eval(E, V));

}

} else {

Q.add(V);

}

Knowledge Base System for Risk Analysis of the Multi-step Computer Attacks

147

Figure 1: The main window.

Both algorithms are based on the bottom-up

graph labeling algorithm from ((Blackburn, et al.,

2007), which was modified to work with non-

complete knowledge about computer attack

precedents.

Table 3: Algorithm FuzGLEMN.

PG; // precedent graph

E; // expression from CONΔ

// queue for labeling

Q = {leafs of PG};

while (Q ≠ Ø) {

// the first vertice in the Q

V = Q.take();

if (V is leaf of PG) {

// label V with the result of

// evaluation of E on V

label(V, eval(E, V));

Q.add({all immediate ancestors of V that are not in

Q yet});

} else {

if (all immediate successors of V are labeled) {

D = {labels of all immediate successors of V};

if (D contains FALSE) {

label(V, FALSE);

} else if (D contains only UNKNOWN and

TRUE) {

if (eval(E, V) == FALSE) {

label(V, FALSE);

} else {

label(V, UNKNOWN);

}

} else if (D contains only TRUE) {

label(V, eval(E, V));

}

} else {

Q.add(V);

}

5 SOFTWARE INTERFACE

DESCRIPTION

We developed a software system called RiskPanel

asessentially a workplace for experts to ensure the

security of corporate information, based on the

methodology of generalized fuzzy models

(Pulchunov, et al., 2011). RiskPanel has a module

for analysis of the various multi-step computer

attacks. Consider the module interface.

ICEIS 2016 - 18th International Conference on Enterprise Information Systems

148

Figure 2: Input area.

The main window (Fig. 1) has four main areas:

− A — the input area

− B — the area for selection of atomic concepts

− C — the area with detailed information about

atomic concepts

− D — the output area

Input area A allows user to enter the specific

question (Fig. 2).

First of all, the user selects the type of his

question (drop-down list (1)). Currently there are

only two types: MODAL_BOX and

MODAL_DIAMOND (implementing the concepts





and 



respectively).

Secondly the user composes non-role concept

(), which is what the question is related to.

Atomic concepts (2), necessary for composition of

the concept () can be selected in the main

window area B. Clicking with the left mouse button

on the concept shows all the available information

related to this concept in the area C. Logical

operators, connecting atomic concepts, can be set

with drop-down selectors (3). The new operator or

an atomic concept can be added by pressing the

button (4). If the newly added item is an operator,

then its operands will be empty, and buttons like (4)

are displayed instead of them.

When the statement is composed, the user can

press the button (5) to start computation of the truth

value of the statement. If the statement is not

complete (e.g. there are still some empty operands

left), the user will be warned about this. The truth

value cannot be calculated until the user composes

the complete statement.

If the computation is successful, the area D will

contain the truth probability interval of the

composed statement.

6 CONCLUSIONS

This article proposes methods of quantitative risk

analysis of multi-step computer attacks. These

methods are based on the combination of the

precedent approach to knowledge representation and

description logic.

Domain knowledge is represented as a finite set

of real attack precedents, i.e. a set of semi-structured

natural language texts.

Based on these precedents, in TBox and ABox

terms of the Discription Logic the Knowledge Base

is built. For each interpretation of the Knowledge

Base the precedent and the fuzzy model of the

domain is generated. The class of fuzzy models,

generated by the set of all possible interpretations is

a generalized fuzzy model. This model is used for

evaluation of truth values of formula descriptions of

information risks.

Developed methods are implemented in one of

the modules of the “RiskPanel” software system.

ACKNOWLEDGMENT

The reported study was partially supported by

RFBR, research project No. 14-07-00903-a.

REFERENCES

Alhomidi, M. & Reed, M., 2014. Attack graph-based risk

assessment and optimization approach. International

Journal of Network Security & Its Applications

(IJNSA), 6(3), pp. 31-43.

Baader, F., McGuinness, D., Nardi, D. & Patel-Schneider,

P., 2007. The description logic handbook: Theory,

implementation, and applications. 2-d ред. Cambridge

: Cambridge University Press.

Blackburn, P., Van Benthem, J. & Wolter, F., 2007.

Handbook of Modal Logic. Amsterdam: Elsevier.

Carcary, M., 2013. IT Risk Management: A Capability

Maturity Model. The Electronic Journal Information

Systems Evaluation, 16(1), pp. 3-13.

Dawkins, J. & Hale, J., 2004. A Systematic Approach to

Multi-Stage Network Attack Analysis. Washington,

Proceedings ofthe Second IEEE International

Information Assurance Workshop (IWIA '04), IEEE

Computer Society.

Palchunov, D., 2006. Simulation of thinking and

formalization of reflection: I. Model-theoretic

formalization of the ontology and reflection.

Filosofiya nauki, 31(4), pp. 86-114.

Palchunov, D., 2008. Simulation of thinking and

formalization of reflection: II. Ontologies and

Knowledge Base System for Risk Analysis of the Multi-step Computer Attacks

149

formalization of concepts. Filosofiya nauki, 37(2), pp.

62-99.

Pal'chunov, D. & Yakhyaeva, G., 2015. Fuzzy logics and

fuzzy model theory. Algebra and Logic, 54(1), pp. 74-

80.

Pulchunov, D. & Yakhyaeva, G., 2005. Interval fuzzy

algebraic systems. Proceedings of the Asian Logic

Conference , pp. 23-37.

Pulchunov, D., Yakhyaeva, G. & Hamutskya, A., 2011.

Software system for information risk manadgement

"RiskPanel". Programmnaya ingeneriya, Том 7, pp.

29-36.

Sheyner, O. и др., 2002. Automated Generation and

Analysis of Attack Graphs. Oakland, California, IEEE

Symposium on Security and Privacy.

Wayne, F. & Boyer, M. A. M. X. O. A., 2006. A Scalable

Approach to Attack Graph Generation. New York,

CCS '06 Proceedings of the 13th ACM conference on

Computer and communications security.

Xinming Ou, W. F. B. M. A. M., 2006. A Scalable

Approach to Attack Graph Generation. New York,

б.н., pp. 336-345.

Yakhyaeva, G., 2007. Fuzzy model truth values.

Bratislava, Slovak Republic, Proceedings of the 6-th

International Conference Aplimat, pp. 423-431.

Yakhyaeva, G., 2009. Logic of Fuzzifications. Tumkur,

Proceedings of the 4th Indian International Conference

on Artificial Intelligence (IICAI-09), pp. 222-239.

Yakhyaeva, G. & Yasinskaya, O., 2014. Application of

Case-based Methodology for Early Diagnosis of

Computer Attacks. Journal of Computing and

Information Technology, 22(3), p. 145–150.

Yakhyaeva, G. & Yasinskaya, O., 2015. An Algorithm to

Compare Computer-Security Knowledge from

Different Sources. Barcelona, Spain, Proceedings of

the 17th International Conference on Enterprise

Information Systems, pp. 565-572.

Yakhyaeva, G., Yasinskaya, O. & Karmanova, A., 2014.

Probabilistic question-answering system in the field of

computer security. Vestn. Novosib. gos. un-ta. Seriya:

Informacionnye tehnologii, 12(3), pp. 132-145.

Zhai, G. & Zhou, S., 2011. Construction and

implementation of multistep attacks alert correlation

model. Journal of Computer Applications, 31(5), p.

1276–1279.

Zhang, Y., Z. D. & Liu, J., 2014. The Application of

Baum-Welch Algorithm in Multistep Attack Hindawi

Publishing Corporate Scientific World Journal

Volume 2014. Scientific World Journal, Том 2014, p.

ICEIS 2016 - 18th International Conference on Enterprise Information Systems

150