An Algorithm to Compare Computer-security Knowledge from Different Sources

Gulnara Yakhyaeva and Olga Yasinkaya

Department of Information Technology, Novosibirsk State University, Novosibirsk, Russian Federation

Keywords: Information Security, Cyber Threats, Case-based Model, Fuzzy Model, Generalized Fuzzy Model, Generalized Case.

Abstract: In this paper we describe the mathematical apparatus and software implementation of a module of the RiskPanel system that compares computer-security knowledge learned from various online sources. To describe this process, we use a model-theoretic formalism. The knowledge of a particular computer attack obtained from a single source is formalized as an underdetermined algebraic system, which we call a generalized case. The knowledge base is a set of generalized cases. To compare the knowledge, we construct a generalized fuzzy model: the product of all algebraic systems stored in the database.

We consider an algorithm for computing consistent truth values and describe a software implementation of the developed methods. The developed algorithm has polynomial complexity.

1 INTRODUCTION

The main problem in designing intelligent systems is how to represent and process knowledge. Computer programs should have knowledge of a given subject domain presented in a formalism that is useful for the program. Knowledge representation consists mainly of identifying the most appropriate formalisms for representing knowledge and the most effective methods for manipulating this knowledge (Thayse, 1989).

This problem is particularly acute for knowledge of information security and cyber threats. In these subject domains, the value of information depends much more on its novelty than in most other scientific and technological domains. To protect effectively against computer threats, they must be identified as early as possible. Natural-language text on the Internet is one of the most relevant sources of such information. This gives rise to the need to represent security knowledge as ontologies. There are many applications of knowledge-based systems to computer security (see, for example, (Ruhroth et al., 2014), (Gartner et al., 2014), (Burger et al., 2013)).

One method of processing knowledge learned from natural-language texts is model-theoretic knowledge representation, based on the model-theoretic approach developed to formalize domain ontologies (Palchunov, 2008) and on the case-based reasoning methodology (Kolodner, 1992), (Assali et al., 2013). Under this approach, the knowledge learned from natural-language texts is presented as algebraic systems (domain cases) (Yakhyaeva and Yasinskaya, 2014). Using these systems, a case-based model of the subject domain can be constructed. The truth value of a sentence in the case-based model is the set of cases in which the sentence is true in the strict sense. By fuzzifying the case-based model, we obtain a fuzzy model, in which the truth values of sentences are numbers in the interval [0, 1]. By fuzzifying a set of case-based models, we obtain a generalized fuzzy model. A formal (model-theoretic) description of these models can be found in (Pulchunov and Yakhyaeva, 2005) and (Yakhyaeva, 2007).

Knowledge-based systems are required to exploit knowledge from multiple sources to solve increasingly difficult problems. Therefore, a mechanism of knowledge integration needs to be established. Many researchers in different subject domains are interested in this problem and consider it from different points of view. Haddad and Bozdogan (2009) provide a definition of the knowledge integration phenomenon at both the conceptual and operational levels. Steier et al. (1993) implemented a knowledge-source integration mechanism in the Soar architecture. Console et al. (1991) analyzed the integration of different knowledge sources in a model-based diagnostic system. Mitra et al. (1999) provided semi-automatic integration of knowledge sources using the Semantic Knowledge Articulation Tool (SKAT).

Yakhyaeva G. and Yasinkaya O.

An Algorithm to Compare Computer-security Knowledge from Different Sources.

DOI: 10.5220/0005347205650572

In Proceedings of the 17th International Conference on Enterprise Information Systems (ICEIS-2015), pages 565-572

ISBN: 978-989-758-096-3

© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

One way to generate new knowledge from natural-language texts is to compare and integrate the knowledge extracted from different texts (Pulchunov, 2009). When knowledge is extracted from natural-language texts, different generalized fuzzy models are built. Accordingly, these different algebraic systems need to be compared.

This paper presents a model-theoretic description of comparing knowledge learned from different natural-language texts and an application of this theory in the subject domain of computer security.

2 MATHEMATICAL FOUNDATIONS OF THE DEVELOPED APPROACH

2.1 Case-based Models

First, we define a finite set of documents, each describing some case of a computer attack. We describe each case by an algebraic system $\mathfrak{A} = \langle A, \sigma \rangle$, where $A$ is the universe of the algebraic system and $\sigma$ is its signature. Signature $\sigma$ is a set of concepts that describe this subject domain: the set of vulnerabilities, threats, countermeasures, consequences, and so on. We assume that all these cases have the same signature. We denote the signature $\sigma$ enriched with constants for the elements of set $A$ as $\sigma_A \rightleftharpoons \sigma \cup \{c_a \mid a \in A\}$. The algebraic systems by which we describe instances of the domain belong to the following class:

$$\mathbb{K}_A \rightleftharpoons \{\langle \mathfrak{A}, a \rangle_{a \in A} \mid \mathfrak{A} \text{ is an algebraic system of signature } \sigma \text{ with universe } A\}.$$

Let $\wp(X)$ denote the set of all subsets of $X$, and let $S(\sigma_A)$ denote the set of all sentences of the signature $\sigma_A$.

An algebraic system $\mathfrak{A} \in \mathbb{K}_A$, a model of some computer attack, will be called a case of the considered subject domain. For each set of cases $E$ we define a case-based model $\mathfrak{M}_E$.



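Definition 1. Let $E \subseteq \mathbb{K}_A$ be a set of cases. A system $\mathfrak{M}_E \rightleftharpoons \langle A, \sigma_A, \mu_E \rangle$, where $\mu_E: S(\sigma_A) \to \wp(E)$, is called a case-based model (generated by the set $E$) if

$$\mu_E(\varphi) = \{\mathfrak{A} \in E \mid \mathfrak{A} \models \varphi\}$$

for any sentence $\varphi$ of the signature $\sigma_A$.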



The case-based model is a Boolean model: the truth values of sentences are elements of a Boolean algebra, namely the Boolean algebra $\wp(E)$ of all subsets of the set $E$.

Consider set $\mathbb{K}_A$ to be the set of all kinds of computer attacks: those that have already occurred, and those that can still occur. At any moment, our knowledge of cyber attacks that have already happened is finite. However, this knowledge is constantly growing as new cases are added. Thus, we can assume that set $\mathbb{K}_A$ is countable. To formalize our knowledge about the domain at different times, it is sufficient to consider only the finite subsets of $\mathbb{K}_A$. Thus, we consider only finite sets of cases. Denote the class of all finite case-based models by

$$\mathbb{M}_A \rightleftharpoons \{\mathfrak{M}_E \mid E \subseteq \mathbb{K}_A,\ \|E\| < \omega\}.$$





Suppose we have a case-based model $\mathfrak{M}_E \in \mathbb{M}_A$, a mathematical formalization of the knowledge base of computer-attack cases. To calculate the objective probabilities of different attacks occurring, we define the notion of the fuzzy model.

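Definition 2. Let $\mathfrak{M}_E \in \mathbb{M}_A$ be a case-based model. A system $\mathfrak{A}_\mu \rightleftharpoons \langle A, \sigma_A, \mu \rangle$, where $\mu: S(\sigma_A) \to [0,1]$, is called a fuzzy model generated by the model $\mathfrak{M}_E$ if

$$\mu(\varphi) = \frac{\|\mu_E(\varphi)\|}{\|E\|}$$

for any sentence $\varphi$ of the signature $\sigma_A$.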
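A minimal sketch of Definition 2, under the same illustrative assumptions (cases as dictionaries of hypothetical attributes, sentences as predicates): the fuzzy truth value is the share of cases in which the sentence holds. Exact fractions are used so that values such as 2/3 are not rounded.

```python
from fractions import Fraction
from typing import Callable, Dict

Case = Dict[str, bool]             # hypothetical attribute-valued case
Sentence = Callable[[Case], bool]


def fuzzy_value(cases: Dict[str, Case], phi: Sentence) -> Fraction:
    """mu(phi) = ||mu_E(phi)|| / ||E||, kept as an exact fraction."""
    if not cases:
        raise ValueError("the set of cases E must be non-empty")
    satisfied = sum(1 for case in cases.values() if phi(case))
    return Fraction(satisfied, len(cases))


E = {
    "e1": {"dos": True, "data_theft": False},
    "e2": {"dos": False, "data_theft": True},
    "e3": {"dos": True, "data_theft": True},
}
print(fuzzy_value(E, lambda c: c["dos"]))  # 2/3
```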



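We introduce the notation for the class of all fuzzy models generated by the models from class $\mathbb{M}_A$:

$$\mathbb{F}_A \rightleftharpoons \{\mathfrak{A}_\mu \mid \exists \mathfrak{M}_E \in \mathbb{M}_A: \mathfrak{A}_\mu \text{ is generated by } \mathfrak{M}_E\}.$$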

In practice, one cannot have full information about the subject domain under consideration. For example, we cannot have information about all the cyber attacks and information-security violations that have occurred. Also, the documentation of particular attacks may be incomplete or inaccurate. It is impossible to give a complete description of the case-based model describing this subject domain, so we must consider fuzzy models that describe the properties of the subject domain that are already known. To describe this situation formally, we introduce the concept of the generalized fuzzy model.

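Definition 3. Let $\mathbb{M} \subseteq \mathbb{M}_A$ and $\mathbb{M} \neq \emptyset$. A system $\mathfrak{A}_{\mathbb{M}} \rightleftharpoons \langle A, \sigma_A, \mu_{\mathbb{M}} \rangle$, where $\mu_{\mathbb{M}}: S(\sigma_A) \to \wp([0,1])$, is called a generalized fuzzy model (generated by the class $\mathbb{M}$) if

$$\mu_{\mathbb{M}}(\varphi) = \{\alpha \in [0,1] \mid \exists \mathfrak{M} \in \mathbb{M}: \alpha = \mu(\varphi), \text{ where } \mathfrak{A}_\mu \text{ is the fuzzy model generated by } \mathfrak{M}\}$$

for any sentence $\varphi$ of the signature $\sigma_A$.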
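Definition 3 collects the fuzzy values of a sentence over a whole class of admissible case-based models. The sketch below assumes, for illustration, that each case-based model in the class is represented by the list of its cases; the data are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict, FrozenSet, List

Case = Dict[str, bool]
Sentence = Callable[[Case], bool]
CaseSet = List[Case]  # the case set E of one case-based model


def fuzzy_value(cases: CaseSet, phi: Sentence) -> Fraction:
    return Fraction(sum(1 for case in cases if phi(case)), len(cases))


def generalized_value(models: List[CaseSet], phi: Sentence) -> FrozenSet[Fraction]:
    """mu_M(phi): every fuzzy value that phi takes in some model of the class M."""
    if not models:
        raise ValueError("the class M must be non-empty")
    return frozenset(fuzzy_value(m, phi) for m in models)


# Two admissible descriptions of the domain (hypothetical data).
M = [
    [{"dos": True}, {"dos": False}],
    [{"dos": True}, {"dos": True}, {"dos": False}],
]
print(sorted(generalized_value(M, lambda c: c["dos"])))
# [Fraction(1, 2), Fraction(2, 3)]
```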



2.2 Principle of Comparing the Generalized Fuzzy Models

One interpretation of the generalized fuzzy model is as follows. Suppose there is an expert in a subject domain described by the language $S(\sigma_A)$. For example, this expert may be the system administrator of an enterprise, and the subject domain may be computer security. The expert must deal with a set of situations (the cases of that subject domain), for example, a set of cyber attacks. This set of cases can be considered as a probability space whose elementary outcomes are the cases. Naturally, the expert does not know the full description of each case or the truth values of all sentences of signature $\sigma_A$ for each case. Nevertheless, the expert can estimate the probabilities of the truth values of sentences based on known information. For example, an expert can claim that 70% of computer attacks involve denial of service or that at least 60% of cyber attacks are carried out to steal information. Previous studies (Pulchunov and Yakhyaeva, 2010) have shown how such probabilistic expert knowledge can be formalized as a generalized fuzzy model.

Now, suppose we have several experts in the subject domain. Each expert has unique knowledge of the subject domain, as they may obtain their knowledge from different sources and may have different training. When making a decision, we would like to take the opinions of all the experts into account to find a compromise.

This problem can be described formally as follows. Let the subject domain be described by a signature $\sigma_A$, where $A$ is the set of individuals (basic set) taken from the total set of individuals in this subject domain. To describe the domain, we construct a finite number of generalized fuzzy models

$$\mathfrak{A}_{\mathbb{M}_i} = \langle A, \sigma_A, \mu_{\mathbb{M}_i} \rangle, \quad i = 1, \ldots, n,$$

where $\mathbb{M}_i$ is the set of case-based models that generate the model $\mathfrak{A}_{\mathbb{M}_i}$, and $\mu_{\mathbb{M}_i}$ is the evaluation of all sentences of the signature $\sigma_A$ in the model $\mathfrak{A}_{\mathbb{M}_i}$. Note that the truth values of sentences in a generalized fuzzy model are subsets of the interval [0, 1], and different models may assign different subsets to the same sentence.

Then, the problem of comparing a finite number of models $\mathfrak{A}_{\mathbb{M}_1}, \ldots, \mathfrak{A}_{\mathbb{M}_n}$ can be formulated as follows: describe a procedure (algorithm) that, for any $\varphi \in S(\sigma_A)$, builds a consistent truth value $\mu(\varphi) \subseteq [0,1]$ from the truth values $\mu_{\mathbb{M}_1}(\varphi), \ldots, \mu_{\mathbb{M}_n}(\varphi)$ of this sentence in the models $\mathfrak{A}_{\mathbb{M}_1}, \ldots, \mathfrak{A}_{\mathbb{M}_n}$. This problem can be formalized by constructing an n-ary function

$$f: \left(\wp([0,1])\right)^n \to \wp([0,1]).$$

When constructing this function, the consistent truth values of different sentences should not contradict each other. For example, it would be strange for our comparison principle to produce $\mu(\varphi) = \mu(\neg\varphi) = \{1\}$.

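Thus, it is more reasonable to formulate the principle of comparing $n$ generalized fuzzy models as an n-ary function $\Phi$ defined on the set of all generalized fuzzy models:

$$\Phi: \langle \mathfrak{A}_{\mathbb{M}_1}, \ldots, \mathfrak{A}_{\mathbb{M}_n} \rangle \mapsto \mathfrak{A}_{\mathbb{M}}.$$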

Moreover, ideally this comparison principle should work on any finite set of models and should not depend on the order in which the models are considered. These properties are achieved if the comparison is built from a binary operation that is associative and commutative.

2.3 Product of the Generalized Fuzzy Models

First we define the operation of the product on the class $\mathbb{M}_A$ of case-based models.

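Definition 4. Let $\mathfrak{M}_{E_1}, \mathfrak{M}_{E_2} \in \mathbb{M}_A$ be case-based models. We assume that $E_1 \cap E_2 = \emptyset$ (perhaps after renaming). A model $\mathfrak{M}_E$ is called the product of $\mathfrak{M}_{E_1}$ and $\mathfrak{M}_{E_2}$, denoted as $\mathfrak{M}_E = \mathfrak{M}_{E_1} * \mathfrak{M}_{E_2}$, if:

1) $E = E_1 \cup E_2$;

2) $\mu_E(\varphi) = \mu_{E_1}(\varphi) \cup \mu_{E_2}(\varphi)$ for any $\varphi \in S(\sigma_A)$.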









Pulchunov and Yakhyaeva (2010) proved that the operation * is associative and commutative and that the class of case-based models is closed under it.
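The product of Definition 4 amounts to a disjoint union of case sets. In the sketch below, a case-based model is represented only by its case set, and disjointness "after renaming" is modeled by tagging every case with a hypothetical source label; the check at the end is condition 2) of the definition.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Case = Dict[str, bool]
Sentence = Callable[[Case], bool]
Key = Tuple[str, str]      # (source tag, local case id)
Model = Dict[Key, Case]    # a case-based model, represented by its case set


def tag(cases: Dict[str, Case], source: str) -> Model:
    """Rename the cases of one source so that they cannot clash with another's."""
    return {(source, name): case for name, case in cases.items()}


def product(m1: Model, m2: Model) -> Model:
    """M1 * M2: the case-based model generated by the disjoint union of E1 and E2."""
    if m1.keys() & m2.keys():
        raise ValueError("case sets must be disjoint; rename them first")
    return {**m1, **m2}


def truth_set(model: Model, phi: Sentence) -> FrozenSet[Key]:
    """mu_E(phi) for the case-based model."""
    return frozenset(key for key, case in model.items() if phi(case))


m1 = tag({"a": {"dos": True}, "b": {"dos": False}}, "src1")
m2 = tag({"a": {"dos": True}}, "src2")
m = product(m1, m2)


def dos(case: Case) -> bool:
    return case["dos"]


# Condition 2) of Definition 4: the truth sets of the factors are united.
assert truth_set(m, dos) == truth_set(m1, dos) | truth_set(m2, dos)
```

Since the product is a union of dictionaries with disjoint keys, swapping the factors yields the same model, which mirrors the commutativity result cited above.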

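Statement 5. Let $\mathfrak{M}_E = \mathfrak{M}_{E_1} * \mathfrak{M}_{E_2}$, and let $\mathfrak{A}_\mu$, $\mathfrak{A}_{\mu_1}$, $\mathfrak{A}_{\mu_2}$ be the fuzzy models generated by $\mathfrak{M}_E$, $\mathfrak{M}_{E_1}$, $\mathfrak{M}_{E_2}$, respectively. Then,

$$\mu(\varphi) = \frac{\mu_1(\varphi) \cdot \|E_1\| + \mu_2(\varphi) \cdot \|E_2\|}{\|E_1\| + \|E_2\|}$$

for any $\varphi \in S(\sigma_A)$.

A proof of this statement can also be found in (Pulchunov and Yakhyaeva, 2010).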
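Statement 5 can be checked numerically on a small hypothetical example: the fuzzy value computed directly on the union of the case sets coincides with the weighted average of the factors' values. List concatenation stands in for the disjoint union $E_1 \cup E_2$ here.

```python
from fractions import Fraction
from typing import Callable, Dict, List

Case = Dict[str, bool]
Sentence = Callable[[Case], bool]


def fuzzy_value(cases: List[Case], phi: Sentence) -> Fraction:
    return Fraction(sum(1 for case in cases if phi(case)), len(cases))


def product_value(mu1: Fraction, n1: int, mu2: Fraction, n2: int) -> Fraction:
    """Statement 5: the value in M1 * M2 from the factors' values and sizes."""
    return (mu1 * n1 + mu2 * n2) / (n1 + n2)


E1 = [{"dos": True}, {"dos": False}]
E2 = [{"dos": True}, {"dos": True}, {"dos": True}, {"dos": False}]


def dos(case: Case) -> bool:
    return case["dos"]


direct = fuzzy_value(E1 + E2, dos)  # concatenation plays the role of E1 ∪ E2
via_formula = product_value(fuzzy_value(E1, dos), len(E1), fuzzy_value(E2, dos), len(E2))
print(direct, via_formula)  # 2/3 2/3
```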

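Consequence 6. Let $\mathfrak{M}_E = \mathfrak{M}_{E_1} * \mathfrak{M}_{E_2}$, and let $\mathfrak{A}_\mu$, $\mathfrak{A}_{\mu_1}$, $\mathfrak{A}_{\mu_2}$ be the fuzzy models generated by $\mathfrak{M}_E$, $\mathfrak{M}_{E_1}$, $\mathfrak{M}_{E_2}$, respectively. Then,

$$\min\{\mu_1(\varphi), \mu_2(\varphi)\} \le \mu(\varphi) \le \max\{\mu_1(\varphi), \mu_2(\varphi)\}$$

for any $\varphi \in S(\sigma_A)$.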









Now we can define the operation of the product on a set of generalized fuzzy models.

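Definition 7. Consider the generalized fuzzy models $\mathfrak{A}_{\mathbb{M}_1}$ and $\mathfrak{A}_{\mathbb{M}_2}$. Let

$$\mathbb{M}_1 * \mathbb{M}_2 \rightleftharpoons \{\mathfrak{M}_1 * \mathfrak{M}_2 \mid \mathfrak{M}_1 \in \mathbb{M}_1,\ \mathfrak{M}_2 \in \mathbb{M}_2\}.$$

Then, the generalized fuzzy model $\mathfrak{A}_{\mathbb{M}_1 * \mathbb{M}_2}$ is called the product of the models $\mathfrak{A}_{\mathbb{M}_1}$ and $\mathfrak{A}_{\mathbb{M}_2}$.

Because the product of case-based models is commutative and associative, the product of generalized fuzzy models is also commutative and associative.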
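The product of Definition 7 takes one product model for every choice of one model from each class. The sketch below assumes, as before, that a case-based model is represented by its list of cases (concatenation standing in for the disjoint union); the two hypothetical sources illustrate how an undecided source widens the set of values.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, FrozenSet, List

Case = Dict[str, bool]
Sentence = Callable[[Case], bool]
CaseSet = List[Case]


def fuzzy_value(cases: CaseSet, phi: Sentence) -> Fraction:
    return Fraction(sum(1 for case in cases if phi(case)), len(cases))


def product_class(*classes: List[CaseSet]) -> List[CaseSet]:
    """M1 * ... * Mn: one product model per choice of one model from each class.
    Concatenating case lists plays the role of the disjoint union of case sets."""
    return [sum(choice, []) for choice in product(*classes)]


def generalized_value(models: List[CaseSet], phi: Sentence) -> FrozenSet[Fraction]:
    return frozenset(fuzzy_value(m, phi) for m in models)


def dos(case: Case) -> bool:
    return case["dos"]


M1 = [[{"dos": True}], [{"dos": False}]]  # source 1: undecided about its only attack
M2 = [[{"dos": True}, {"dos": False}]]    # source 2: one DoS attack out of two
print(sorted(generalized_value(product_class(M1, M2), dos)))
# [Fraction(1, 3), Fraction(2, 3)]
```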

3 COMPUTER SECURITY SOFTWARE

3.1 Knowledge Base

We developed a software system called RiskPanel, essentially a workplace for experts responsible for the security of corporate information, based on the methodology of generalized fuzzy models (Pulchunov et al., 2011).

The core of this system is an information-security knowledge base. To organize and work with the knowledge base, we use OntoBox technology (Malykh and Mantsivoda, 2010). This system represents and stores data in an ontological format and has powerful, flexible processing tools. It provides a high degree of modularity and portability of knowledge bases, which is advantageous when developing complex information systems.

Seven categories of attributes (classes) were created to describe the cases in the OntoBox knowledge base: symptoms, threats, vulnerabilities, consequences, loss, countermeasures, and configurations. Each attribute category is represented as a tree structure. The cases in the database are characterized by certain attributes of each category. Each case is formed from a natural-language text found on the Internet (Yakhyaeva and Yasinskaya, 2012).

While analyzing the texts used to form the cases, we found that most of them contained clear but incomplete information. In other words, we could not always determine whether a particular case had specific knowledge-base attributes. To solve this problem, we proposed using open-world semantics, widely used in description logic systems (Baader, 2003). Basically, this approach considers all possible interpretations of unknown information. Thus, to describe a computer-attack case mathematically, we consider a generalized fuzzy model with certain attributes, called a partial case.

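Definition 8. Consider a set $X \subseteq S(\sigma_A)$ and an evaluation $\nu: X \to \{0,1\}$. We say that a case $\mathfrak{A}$ is consistent with the evaluation $\nu$ (denoted $\mathfrak{A} \uparrow \nu$) if $\mathfrak{A} \models \varphi \iff \nu(\varphi) = 1$ for any $\varphi \in X$.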

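Definition 9. Consider a set $X \subseteq S(\sigma_A)$ and an evaluation $\nu: X \to \{0,1\}$. A generalized fuzzy model $\mathfrak{A}_{\mathbb{M}_\nu}$ is called a generalized case (generated by the evaluation $\nu$) if

$$\mathbb{M}_\nu = \{\mathfrak{M}_{\{\mathfrak{A}\}} \mid \mathfrak{A} \in \mathbb{K}_A,\ \mathfrak{A} \uparrow \nu\},$$

that is, $\mathbb{M}_\nu$ consists of the case-based models generated by single cases consistent with $\nu$.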



In this formalism, the entire knowledge base of RiskPanel can be considered a finite set of generalized cases. When drawing conclusions from this knowledge base, we must compare these models.

For a knowledge base formalized as a set of generalized cases, it is most appropriate to use a comparison principle based on the product of generalized fuzzy models, because it is consistent with open-world semantics.

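Note that a generalized case $\mathfrak{A}_{\mathbb{M}_\nu}$ is, in general, not an interval model. Moreover, since every model in $\mathbb{M}_\nu$ is generated by a single case, the fuzzy value of a sentence in it is either 0 or 1; hence, for each sentence $\varphi \in S(\sigma_A)$, the truth value $\mu_{\mathbb{M}_\nu}(\varphi)$ belongs to $\{\{0\}, \{1\}, \{0,1\}\}$. Now, we must formulate an algorithm for calculating the truth values in a consistent model of generalized cases.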
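The following sketch illustrates a generalized case under open-world semantics. It assumes, as a simplification of the algebraic systems in $\mathbb{K}_A$, that a case is fully determined by the truth values of a finite list of atomic propositions, and that the evaluation $\nu$ fixes some of them; the attribute names are hypothetical. The truth value of a sentence is the set of values it takes over all cases consistent with $\nu$.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List

Case = Dict[str, bool]
Sentence = Callable[[Case], bool]


def consistent_cases(atoms: List[str], nu: Dict[str, bool]) -> List[Case]:
    """All complete cases over `atoms` that agree with the partial evaluation nu.
    Open-world reading: each atom outside nu may be either true or false."""
    unknown = [a for a in atoms if a not in nu]
    cases = []
    for bits in product([False, True], repeat=len(unknown)):
        case = dict(nu)
        case.update(zip(unknown, bits))
        cases.append(case)
    return cases


def generalized_case_value(atoms: List[str], nu: Dict[str, bool], phi: Sentence) -> FrozenSet[int]:
    """Truth value of phi in the generalized case: {0}, {1} or {0, 1}."""
    return frozenset(int(phi(case)) for case in consistent_cases(atoms, nu))


atoms = ["dos", "data_theft"]
nu = {"dos": True}  # the source says nothing about data theft
print(sorted(generalized_case_value(atoms, nu, lambda c: c["dos"])))         # [1]
print(sorted(generalized_case_value(atoms, nu, lambda c: c["data_theft"])))  # [0, 1]
```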

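Theorem 10. Let $\mathfrak{A}_{\mathbb{M}_{\nu_1}}, \ldots, \mathfrak{A}_{\mathbb{M}_{\nu_n}}$ be generalized cases. Then, for $\varphi \in S(\sigma_A)$ we have

$$\mu_{\mathbb{M}_{\nu_1} * \cdots * \mathbb{M}_{\nu_n}}(\varphi) = \left\{\frac{k}{n};\ \frac{k+1}{n};\ \ldots;\ \frac{k+l}{n}\right\},$$

where

$$k = \left\|\{i \mid \mu_{\mathbb{M}_{\nu_i}}(\varphi) = \{1\}\}\right\| \quad \text{and} \quad l = \left\|\{i \mid \mu_{\mathbb{M}_{\nu_i}}(\varphi) = \{0,1\}\}\right\|.$$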

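Proof. Let $\alpha \in \mu_{\mathbb{M}_{\nu_1} * \cdots * \mathbb{M}_{\nu_n}}(\varphi)$. Then there are cases $\mathfrak{A}_1, \ldots, \mathfrak{A}_n$ with $\mathfrak{M}_{\{\mathfrak{A}_1\}} \in \mathbb{M}_{\nu_1}, \ldots, \mathfrak{M}_{\{\mathfrak{A}_n\}} \in \mathbb{M}_{\nu_n}$ such that (see Statement 5)

$$\alpha = \frac{\mu_1(\varphi) + \cdots + \mu_n(\varphi)}{n},$$

where $\mu_i$ is the evaluation of the fuzzy model generated by $\mathfrak{M}_{\{\mathfrak{A}_i\}}$; thus $\mu_i(\varphi) = 1$ if $\mathfrak{A}_i \models \varphi$, and $\mu_i(\varphi) = 0$ if $\mathfrak{A}_i \not\models \varphi$. For any $i = 1, \ldots, n$, if $\mu_{\mathbb{M}_{\nu_i}}(\varphi) = \{1\}$ then $\mathfrak{A}_i \models \varphi$, and if $\mu_{\mathbb{M}_{\nu_i}}(\varphi) = \{0\}$ then $\mathfrak{A}_i \not\models \varphi$. Obviously,

$$k \le \mu_1(\varphi) + \cdots + \mu_n(\varphi) \le k + l.$$

Thus, we obtain $\alpha \in \left\{\frac{k}{n}; \frac{k+1}{n}; \ldots; \frac{k+l}{n}\right\}$.

Now consider $\alpha \in \left\{\frac{k}{n}; \frac{k+1}{n}; \ldots; \frac{k+l}{n}\right\}$. Let $\alpha = \frac{k+m}{n}$. Obviously, $0 \le m \le l$. Let

$$\mathcal{A} = \{\mathfrak{A}_{\mathbb{M}_{\nu_i}} \mid \mu_{\mathbb{M}_{\nu_i}}(\varphi) = \{1\}\} = \{\mathfrak{A}_{\mathbb{M}_{\nu_{i_1}}}, \ldots, \mathfrak{A}_{\mathbb{M}_{\nu_{i_k}}}\},$$

$$\mathcal{B} = \{\mathfrak{A}_{\mathbb{M}_{\nu_i}} \mid \mu_{\mathbb{M}_{\nu_i}}(\varphi) = \{0,1\}\} = \{\mathfrak{A}_{\mathbb{M}_{\nu_{j_1}}}, \ldots, \mathfrak{A}_{\mathbb{M}_{\nu_{j_l}}}\}.$$

First, we select one case from each generalized case of set $\mathcal{A}$, denoting them as $\mathfrak{A}_{i_s}$, $s = 1, \ldots, k$; $\varphi$ is true in all of them. Then we select cases $\mathfrak{A}_{j_t}$ with $\mathfrak{M}_{\{\mathfrak{A}_{j_t}\}} \in \mathbb{M}_{\nu_{j_t}}$, $t = 1, \ldots, m$, from the generalized cases of set $\mathcal{B}$ such that $\mathfrak{A}_{j_t} \models \varphi$. Next, we select cases $\mathfrak{A}_{j_t}$, $t = m+1, \ldots, l$, from the remaining generalized cases of set $\mathcal{B}$ such that $\mathfrak{A}_{j_t} \not\models \varphi$; both kinds of cases exist because the truth value of $\varphi$ in the generalized cases of $\mathcal{B}$ is $\{0,1\}$. Last, from each of the remaining $n - k - l$ generalized cases, in which the truth value of $\varphi$ is $\{0\}$, we select an arbitrary case; $\varphi$ is false in it. Let $\mathfrak{M}$ be the product of the $n$ case-based models generated by the selected cases. Exactly $k + m$ of the selected cases satisfy $\varphi$, so, by Statement 5, the fuzzy model generated by $\mathfrak{M}$ assigns $\varphi$ the value $\frac{k+m}{n} = \alpha$. Thus, $\alpha \in \mu_{\mathbb{M}_{\nu_1} * \cdots * \mathbb{M}_{\nu_n}}(\varphi)$.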









Note that comparing a finite set of generalized cases does not produce an interval model. However, as $n \to \infty$, the truth values of sentences in a consistent model approach intervals of the set $[0,1] \cap \mathbb{Q}$, since neighboring values differ by $1/n$. Thus, in practice, when dealing with a sufficiently large set of cases, we can view the truth values in a consistent model as intervals.
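The sketch below checks Theorem 10 on a small example by comparing its closed form with brute-force enumeration. Each generalized case is represented only by the truth value of $\varphi$ in it ($\{0\}$, $\{1\}$ or $\{0,1\}$); this is enough because, as in the proof, a model of the product is obtained by choosing one case from every generalized case independently. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from typing import FrozenSet, List


def theorem10_value(values: List[FrozenSet[int]]) -> FrozenSet[Fraction]:
    """Closed form of Theorem 10: {k/n, (k+1)/n, ..., (k+l)/n}."""
    n = len(values)
    k = sum(1 for v in values if v == {1})     # generalized cases where phi is true
    l = sum(1 for v in values if v == {0, 1})  # generalized cases where phi is unknown
    return frozenset(Fraction(k + j, n) for j in range(l + 1))


def brute_force_value(values: List[FrozenSet[int]]) -> FrozenSet[Fraction]:
    """Pick one case (one truth bit) from every generalized case and average."""
    n = len(values)
    return frozenset(Fraction(sum(choice), n) for choice in product(*values))


values = [frozenset({1}), frozenset({0, 1}), frozenset({0}), frozenset({0, 1})]
print(sorted(theorem10_value(values)))
# [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
```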

3.2 Theorem of Atomically Generalized Cases

Definition 11. A generalized case is called an atomically generalized case if it is generated by an evaluation of a subset of the set of all atomic propositions.

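Consider a quantifier-free sentence $\varphi(p_1, \ldots, p_s)$ built from $s$ atomic propositions $p_1, \ldots, p_s$. Let us reduce this sentence to PDNF:

$$\varphi(p_1, \ldots, p_s) \equiv K_1 \vee \cdots \vee K_d,$$

where $K_j$ ($j \in \{1, \ldots, d\}$) are the elementary conjunctions, each consisting of the atomic propositions $p_1, \ldots, p_s$, some of them negated. For an atomically generalized case $\mathfrak{A}_{\mathbb{M}_\nu}$, we introduce the following notation:

$$\mathcal{K}(\varphi) \rightleftharpoons \{K_1, \ldots, K_d\},$$

$$\mathcal{K}_0(\varphi) \rightleftharpoons \{K \in \mathcal{K}(\varphi) \mid \mu_{\mathbb{M}_\nu}(K) = \{0\}\},$$

$$\mathcal{K}_1(\varphi) \rightleftharpoons \{K \in \mathcal{K}(\varphi) \mid \mu_{\mathbb{M}_\nu}(K) = \{1\}\},$$

$$\mathcal{K}_{\{0,1\}}(\varphi) \rightleftharpoons \{K \in \mathcal{K}(\varphi) \mid \mu_{\mathbb{M}_\nu}(K) = \{0,1\}\}.$$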



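Theorem 12. Let $\mathfrak{A}_{\mathbb{M}_\nu}$ be an atomically generalized case, and let $\varphi$ be a quantifier-free sentence of signature $\sigma_A$. Then,

$$\mu_{\mathbb{M}_\nu}(\varphi) = \begin{cases} \{0\}, & \text{if } \mathcal{K}_1(\varphi) = \emptyset \text{ and } \mathcal{K}_{\{0,1\}}(\varphi) = \emptyset;\\ \{1\}, & \text{if } \mathcal{K}_1(\varphi) \neq \emptyset \text{ or } \|\mathcal{K}_{\{0,1\}}(\varphi)\| = 2^{r};\\ \{0,1\}, & \text{otherwise,} \end{cases}$$

where $r = \left\|\{p_i \mid 1 \le i \le s,\ \mu_{\mathbb{M}_\nu}(p_i) = \{0,1\}\}\right\|$ is the number of atomic propositions of $\varphi$ whose truth value in the generalized case is unknown.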



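Proof. Let $E_\nu \rightleftharpoons \{\mathfrak{A} \in \mathbb{K}_A \mid \mathfrak{A} \uparrow \nu\}$ be the set of cases consistent with $\nu$. Obviously,

$$\mu_{\mathbb{M}_\nu}(\varphi) = \{0\} \iff \forall \mathfrak{A} \in E_\nu\ (\mathfrak{A} \not\models \varphi) \iff \forall \mathfrak{A} \in E_\nu\ (\mathfrak{A} \not\models K_1, \ldots, \mathfrak{A} \not\models K_d) \iff$$

$$\iff \mu_{\mathbb{M}_\nu}(K_1) = \cdots = \mu_{\mathbb{M}_\nu}(K_d) = \{0\} \iff \mathcal{K}_1(\varphi) = \emptyset,\ \mathcal{K}_{\{0,1\}}(\varphi) = \emptyset.$$

On the other hand,

$$\mathcal{K}_1(\varphi) \neq \emptyset \iff \exists K \in \mathcal{K}(\varphi)\ \forall \mathfrak{A} \in E_\nu\ (\mathfrak{A} \models K) \Rightarrow \forall \mathfrak{A} \in E_\nu\ (\mathfrak{A} \models \varphi) \iff \mu_{\mathbb{M}_\nu}(\varphi) = \{1\}.$$

Let $p_{i_1}, \ldots, p_{i_r}$ be the atomic propositions included in $\varphi$ whose truth value in $\mathfrak{A}_{\mathbb{M}_\nu}$ is $\{0,1\}$. Consider the set of elementary conjunctions

$$D \rightleftharpoons \{p_{i_1}^{\varepsilon_1} \,\&\, \ldots \,\&\, p_{i_r}^{\varepsilon_r} \mid \exists \mathfrak{A} \in E_\nu: \mathfrak{A} \models p_{i_1}^{\varepsilon_1} \,\&\, \ldots \,\&\, p_{i_r}^{\varepsilon_r}\},$$

where $p^1 = p$ and $p^0 = \neg p$. Because $\nu$ constrains only the atomic propositions in its domain, the values of $p_{i_1}, \ldots, p_{i_r}$ can be chosen independently in cases consistent with $\nu$. Consequently, $\|D\| = 2^r$.

Assume now that $\mathcal{K}_1(\varphi) = \emptyset$ and $\mathcal{K}_{\{0,1\}}(\varphi) \neq \emptyset$. Then $r > 0$; otherwise every conjunction would have a determined truth value and $\mathcal{K}_{\{0,1\}}(\varphi) = \emptyset$. Each conjunction $K \in \mathcal{K}_{\{0,1\}}(\varphi)$ agrees with $\nu$ on the propositions with known values, so it is determined by its restriction to $p_{i_1}, \ldots, p_{i_r}$, and these restrictions form a subset of $D$. In particular, $\|\mathcal{K}_{\{0,1\}}(\varphi)\| \le 2^r$. Consider two cases: $\|\mathcal{K}_{\{0,1\}}(\varphi)\| = 2^r$ and $\|\mathcal{K}_{\{0,1\}}(\varphi)\| < 2^r$.

Let $\|\mathcal{K}_{\{0,1\}}(\varphi)\| = 2^r$. Then, for any case $\mathfrak{A} \in E_\nu$, there is a conjunct $K \in \mathcal{K}_{\{0,1\}}(\varphi)$ such that $\mathfrak{A} \models K$. Consequently, $\mathfrak{A} \models \varphi$ for any case $\mathfrak{A} \in E_\nu$, that is, $\mu_{\mathbb{M}_\nu}(\varphi) = \{1\}$.

Assume now that $\|\mathcal{K}_{\{0,1\}}(\varphi)\| < 2^r$. Then there is a case $\mathfrak{A}' \in E_\nu$ such that $\mathfrak{A}' \not\models K$ for any $K \in \mathcal{K}_{\{0,1\}}(\varphi)$. Because we have assumed that $\mathcal{K}_1(\varphi) = \emptyset$, and the conjunctions of $\mathcal{K}_0(\varphi)$ are false in every case of $E_\nu$, we get $\mathfrak{A}' \not\models \varphi$. On the other hand, because $\mathcal{K}_{\{0,1\}}(\varphi) \neq \emptyset$, there is a case $\mathfrak{A}'' \in E_\nu$ such that $\mathfrak{A}'' \models \varphi$. Thus, $\mu_{\mathbb{M}_\nu}(\varphi) = \{0,1\}$.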
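To check the case analysis of Theorem 12 mechanically, the sketch below classifies each conjunction of a PDNF into $\mathcal{K}_0$, $\mathcal{K}_1$ or $\mathcal{K}_{\{0,1\}}$, applies the theorem, and compares the result with brute-force evaluation over all completions of $\nu$. It assumes, for illustration, that $\nu$ is a dictionary over the atoms of $\varphi$ and that every conjunction mentions all atoms, as in a PDNF; the function names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List

Conj = Dict[str, bool]  # elementary conjunction: atom -> sign (True = p, False = not p)


def kind(conj: Conj, nu: Dict[str, bool]) -> str:
    """Truth value of one conjunction in the generalized case: '0', '1' or '01'."""
    if any(a in nu and nu[a] != sign for a, sign in conj.items()):
        return "0"  # contradicts a known atom: the conjunction is in K_0
    return "1" if all(a in nu for a in conj) else "01"


def theorem12_value(pdnf: List[Conj], atoms: List[str], nu: Dict[str, bool]) -> FrozenSet[int]:
    """Truth value of the PDNF phi, computed by the case analysis of Theorem 12."""
    kinds = [kind(c, nu) for c in pdnf]
    k01 = kinds.count("01")                   # ||K_{0,1}(phi)||
    r = sum(1 for a in atoms if a not in nu)  # atoms of phi with unknown value
    if "1" not in kinds and k01 == 0:
        return frozenset({0})
    if "1" in kinds or k01 == 2 ** r:
        return frozenset({1})
    return frozenset({0, 1})


def brute_value(pdnf: List[Conj], atoms: List[str], nu: Dict[str, bool]) -> FrozenSet[int]:
    """Reference: evaluate phi in every completion of nu (open-world semantics)."""
    unknown = [a for a in atoms if a not in nu]
    values = set()
    for bits in product([False, True], repeat=len(unknown)):
        case = {**nu, **dict(zip(unknown, bits))}
        values.add(int(any(all(case[a] == s for a, s in c.items()) for c in pdnf)))
    return frozenset(values)


# phi = (p & q & r) v (p & q & ~r) v (p & ~q & r), with only p known to be true.
atoms = ["p", "q", "r"]
pdnf = [{"p": True, "q": True, "r": True},
        {"p": True, "q": True, "r": False},
        {"p": True, "q": False, "r": True}]
nu = {"p": True}
print(sorted(theorem12_value(pdnf, atoms, nu)))  # [0, 1]
```

The brute-force reference is exponential in the number of unknown atoms; Theorem 12 replaces it by counting conjunctions.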



3.3 Module of Knowledge Comparison

RiskPanel has a module for comparing knowledge learned from various computer-attack cases. Currently, the module interface allows one to calculate, as an interval, the truth value of a formula presented in PDNF (perfect disjunctive normal form).

Consider the module interface (Fig. 1). To input data into the main algorithm, the user must enter the parameters of the formula using the controls provided. First, the user selects the attributes included in all conjunctions of the PDNF. Next, the user specifies the number of conjunctions in the formula. Then, drop-down lists of «+» and «–» values appear with the resulting PDNF, where «–» denotes negation of the argument. The PDNF from this window is passed to the main algorithm by clicking the button titled «Get the value of the formula».

The value of the formula is calculated as an interval (see Theorem 10). The lower bound of the interval is the ratio of the number of cases in which the formula is true to the number of all stored cases. The upper bound is the ratio of the number of cases in which the formula is true plus the number of cases in which its truth value is undefined, to the number of all stored cases.
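A minimal sketch of the interval computation described above, assuming that the per-case results are already available as the hypothetical strings `'TRUE'`, `'FALSE'` and `'UNKNOWN'`. The two bounds correspond to $k/n$ and $(k+l)/n$ in Theorem 10.

```python
from fractions import Fraction
from typing import List, Tuple


def truth_interval(statuses: List[str]) -> Tuple[Fraction, Fraction]:
    """statuses holds 'TRUE', 'FALSE' or 'UNKNOWN' for every stored case."""
    if not statuses:
        raise ValueError("the knowledge base contains no cases")
    m = len(statuses)
    true_count = statuses.count("TRUE")
    unknown_count = statuses.count("UNKNOWN")
    # Lower bound: only cases where the formula is known to hold.
    # Upper bound: additionally every case where its value is undefined.
    return Fraction(true_count, m), Fraction(true_count + unknown_count, m)


print(truth_interval(["TRUE", "UNKNOWN", "FALSE", "TRUE", "UNKNOWN"]))
# (Fraction(2, 5), Fraction(4, 5))
```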

The algorithm used to determine the truth value of a formula in a generalized case is based on Theorem 12 and is shown in Table 1. First, false conjunctions, which contradict the available information about the case, are eliminated from the formula. If no conjunctions remain, the formula is false for the case. If the remaining conjunctions contain no attributes with unknown values for the case, the formula is considered true for the case. If they do contain unknown attribute values, the algorithm operates as follows: if the number of remaining conjunctions is less than $2^r$, where $r$ is the number of attributes of the formula whose values for the case are unknown, then the truth value of the formula for the case is undefined; otherwise, the formula is true.

Determining whether a case has a given attribute value requires $O(N)$ operations, where $N$ is the number of attribute values in all categories stored in OntoBox. Eliminating the conjunctions made false by one known attribute value requires $O(L)$ operations, where $L$ is the number of conjunctions in the PDNF. The number of attribute values involved in the conjunctions cannot exceed $N$. Thus, the total algorithmic complexity of determining the truth value of a PDNF in one case is $O(N(N + L))$.

Further, if the OntoBox knowledge base contains $m$ computer-attack cases, calculating the truth value of the formula in interval form requires $O(mN(N + L))$ operations.

Table 1: The algorithm for determining the truth value of a formula in the generalized case.

    alg getPDNFVerityOnCase(arg Case case,
            arg list PDNFFormulaAttrs,
            arg matrix PDNFBoolMatrix)
    begin
    | bool rightValue
    | int unknownAttrsCount := 0
    | list removedIndexes := empty list
    | for each Attribute attr in PDNFFormulaAttrs
    | | int attrValueOnCase := checkIfCaseHasAttr(attr, case)
    | | if (attrValueOnCase = UNKNOWN_ATTR)
    | | | unknownAttrsCount := unknownAttrsCount + 1
    | | else
    | | | if (attrValueOnCase = HAS_ATTR)
    | | | | rightValue := true
    | | | else
    | | | | rightValue := false
    | | | list boolRow := PDNFBoolMatrix.get(PDNFFormulaAttrs.indexOf(attr))
    | | | for int i = 0 to boolRow.size() - 1
    | | | | if (removedIndexes does not contain i and boolRow.get(i) ≠ rightValue)
    | | | | | removedIndexes.add(i)
    | | | end of loop
    | end of loop
    | int remainingConjCount := PDNFBoolMatrix.get(0).size() - removedIndexes.size()
    | if (remainingConjCount = 0)
    | | return PDNF_FALSE
    | if (unknownAttrsCount = 0)
    | | return PDNF_TRUE
    | if (remainingConjCount < 2^unknownAttrsCount)
    | | return PDNF_UNKNOWN
    | else
    | | return PDNF_TRUE
    end
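For readers who want to run the procedure of Table 1, the following is a Python sketch of it, not the RiskPanel code. It assumes that a case is a dictionary from attribute names to `True`/`False`, with a missing key or `None` meaning "unknown"; the dictionary lookup stands in for `checkIfCaseHasAttr`, whose real signature is not given here. `bool_matrix[i][j]` is the sign of the i-th attribute in the j-th conjunction, as in `PDNFBoolMatrix`.

```python
from enum import Enum
from typing import Dict, List, Optional


class PDNFValue(Enum):
    FALSE = 0
    TRUE = 1
    UNKNOWN = 2


def get_pdnf_verity_on_case(case: Dict[str, Optional[bool]],
                            formula_attrs: List[str],
                            bool_matrix: List[List[bool]]) -> PDNFValue:
    """Sketch of Table 1.

    bool_matrix[i][j] is the sign of formula_attrs[i] in conjunction j
    (True for '+', False for '-'). `case` maps an attribute to True/False;
    a missing key or None means the value is unknown for this case.
    """
    removed = set()        # indexes of conjunctions that contradict the case
    unknown_count = 0
    for i, attr in enumerate(formula_attrs):
        value = case.get(attr)  # stands in for checkIfCaseHasAttr
        if value is None:
            unknown_count += 1
            continue
        for j, sign in enumerate(bool_matrix[i]):
            if j not in removed and sign != value:
                removed.add(j)
    conj_count = len(bool_matrix[0]) if bool_matrix else 0
    remaining = conj_count - len(removed)
    if remaining == 0:
        return PDNFValue.FALSE
    if unknown_count == 0:
        return PDNFValue.TRUE
    if remaining < 2 ** unknown_count:
        return PDNFValue.UNKNOWN
    return PDNFValue.TRUE


# phi = (p & q) v (p & ~q): rows are attributes p, q; columns are conjunctions.
attrs = ["p", "q"]
matrix = [[True, True],
          [True, False]]
print(get_pdnf_verity_on_case({"p": True}, attrs, matrix))  # PDNFValue.TRUE
```

Using a set for the removed indexes makes the membership test constant-time, while the loops follow the structure of the table.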

4 CONCLUSIONS

This work describes the mathematical apparatus and software implementation of one of the modules of the RiskPanel system, which compares computer-security knowledge learned from various online sources.

The algorithms implemented in this module are based on the methodology of generalized fuzzy models. The knowledge obtained from a single source is formalized as an algebraic system and stored in the knowledge base of the RiskPanel system. To compare the knowledge, we construct a generalized fuzzy model: the product of all algebraic systems stored in the database.

The system interface allows one to calculate the truth value of any quantifier-free sentence. The input sentence is presented in PDNF. The truth value is calculated as a probability interval.

The developed algorithm has polynomial complexity in the number of stored cases, the number of attribute values, and the number of conjunctions of the input PDNF.

Figure 1: Module of Knowledge Comparison.

ACKNOWLEDGEMENTS

The research for this paper was financially supported by the Ministry of Education of the Russian Federation (project no. 2014/139) and was partially supported by RFBR (project no. 14-07-00903-a).

REFERENCES

Assali, A., Lenne, D. & Debray, B., 2013. Adaptation Knowledge Acquisition in a CBR System. International Journal on Artificial Intelligence Tools, 22(1).

Baader, F., 2003. The Description Logic Handbook. New York: Cambridge University Press.

Burger, J. et al., 2013. Model-Based Security Engineering: Managed Co-evolution of Security Knowledge and Software Models. Foundations of Security Analysis and Design VII - FOSAD 2012/2013 Tutorial Lectures. Springer Lecture Notes in Computer Science, pp. 34-53.

Console, L., Theseider, D. & Torasso, P., 1991. Towards the integration of different knowledge sources in model-based diagnosis. Trends in Artificial Intelligence, Lecture Notes in Computer Science, Volume 549, pp. 177-186.

Gartner, S. et al., 2014. Maintaining requirements for long-living software systems by incorporating security knowledge. IEEE 22nd International Requirements Engineering Conference, pp. 103-112.

Haddad, M. & Bozdogan, K., 2009. Knowledge Integration in Large-Scale Organizations and Networks - Conceptual Overview and Operational Definition. [Online] Available at: http://dx.doi.org/10.2139/ssrn.1437029

Kolodner, J., 1992. An introduction to Case-based reasoning. Artificial Intelligence Review, Volume 6, pp. 3-34.

Malykh, A. & Mantsivoda, A., 2010. Query Language for Logic Architectures. Perspectives of System Informatics: Proceedings of 7th International Conference. Lecture Notes in Computer Science, Volume 5947, pp. 294-305.

Mitra, P., Wiederhold, G. & Jannink, J., 1999. Semi-automatic Integration of Knowledge Sources. 2nd International Conference on Information Fusion, Sunnyvale, CA, July 6-8.

Palchunov, D., 2008. The solution of the problem of information retrieval based on ontologies. Biznes-informatika, 1(1), pp. 3-13.

Pulchunov, D., 2009. Knowledge search and production: creation of new knowledge on the basis of natural language text analysis. Filosofiya nauki, 43(4), pp. 70-90.

Pulchunov, D. & Yakhyaeva, G., 2005. Interval fuzzy algebraic systems. Proceedings of the Asian Logic Conference, pp. 23-37.

Pulchunov, D. & Yakhyaeva, G., 2010. Fuzzy algebraic systems. Vestnik NGU. Seriya: Matematika, mekhanika, informatika, 10(3), pp. 75-92.

Pulchunov, D., Yakhyaeva, G. & Hamutskya, A., 2011. Software system for information risk management "RiskPanel". Programmnaya inzheneriya, Volume 7, pp. 29-36.

Ruhroth, T. et al., 2014. Towards Adaptation and Evolution of Domain-Specific Knowledge for Maintaining Secure Systems. 15th International Conference on Product-Focused Software Process Improvement, Springer Lecture Notes in Computer Science, pp. 239-253.

Steier, D., Lewis, R., Lehman, J. & Zacherl, A., 1993. Combining multiple knowledge sources in an integrated intelligent system. IEEE Expert, 8(3), pp. 35-44.

Thayse, F., 1989. From Modal Logic to Deductive Databases: Introducing a Logic Based Approach to Artificial Intelligence. Chichester: Wiley.

Yakhyaeva, G., 2007. Fuzzy model truth values. Proceedings of the 6th International Conference Aplimat, Bratislava, pp. 423-431.

Yakhyaeva, G. & Yasinskaya, O., 2012. The application of precedent model methodology in the risk-management system aimed at early detection of computer attacks. Vestnik NGU. Seriya: Informatsionnye tekhnologii, 10(2), pp. 106-115.

Yakhyaeva, G. & Yasinskaya, O., 2014. Application of Case-based Methodology for Early Diagnosis of Computer Attacks. Journal of Computing and Information Technology, 22(3), pp. 145-150.

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

572