An Algorithm to Compare Computer-security Knowledge from
Different Sources
Gulnara Yakhyaeva and Olga Yasinkaya
Department of Information Technology, Novosibirsk State University, Novosibirsk, Russian Federation
Keywords: Information Security, Cyber Threats, Case-based Model, Fuzzy Model, Generalized Fuzzy Model,
Generalized Case.
Abstract: In this paper we describe a mathematical apparatus and software implementation of a module of the
RiskPanel system, aimed at comparing computer-security knowledge learned from various online sources.
To describe this process, we use model-theoretic formalism. The knowledge of a particular computer attack
obtained from the same source is formalized as an underdetermined algebraic system, which we call a
generalized case. The knowledge base is a set of generalized cases. To implement the knowledge
comparison, we construct a generalized fuzzy model, the product of all algebraic systems stored in the
database.
We consider an algorithm for computing consistent truth values and describe a software implementation of
the developed methods. The developed algorithm has polynomial complexity.
1 INTRODUCTION
The main problem in designing intelligent systems is
how to represent and process knowledge. Computer
programs should have knowledge of a given subject
domain presented in a formalism that is useful for
the program. Knowledge representation consists
mainly of identifying the most appropriate
formalisms for representing knowledge and the most
effective methods for manipulating this knowledge
(Thayse, 1989).
This problem is particularly acute for knowledge
of information security and cyber threats. In these
subject domains, the value of information depends
much more on its novelty than in most other
scientific and technological domains. To effectively
protect against computer threats, they must be
identified as early as possible. Text in natural
language on the Internet is one of the most relevant
sources of such information. This gives rise to the
need to represent security knowledge as
ontologies. There are many applications of
knowledge-based systems to computer security (for
example, (Ruhroth et al., 2014), (Gartner et al.,
2014), (Burger et al., 2013)).
One method to process knowledge learned from
text in natural language is model-theoretic
knowledge representation, based on the model-
theoretic approach developed to formalize domain
ontologies (Palchunov, 2008) and Case-based
reasoning methodology (Kolodner, 1992), (Assali et
al., 2013). Under this approach, the knowledge
learned from texts written in natural language is
presented as algebraic systems (domain cases)
(Yakhyaeva and Yasinskaya, 2014). Using these
systems, a case-based model of the subject domain
can be constructed. The truth value of a sentence in
the case-based model is the set of cases for which
the sentence is true in a strict sense. By fuzzifying
the case-based model, we obtain a fuzzy model,
in which the truth values of the sentences are
numbers in the interval [0, 1]. By fuzzifying a set of
case-based models, we obtain a generalized fuzzy
model. A formal (model-theoretic) description of
these models can be found in (Pulchunov and
Yakhyaeva, 2005) and (Yakhyaeva, 2007).
Knowledge-based systems are required to exploit
knowledge from multiple sources to solve
increasingly difficult problems. Therefore there is a
need to establish a mechanism of knowledge
integration. Many researchers in different subject
domains are interested in that problem and consider
it from different points of view. Haddad and
Bozdogan (2009) provide a definition of the
knowledge integration phenomenon at both the
conceptual and operational levels. Steier et al.
(1993) implemented a knowledge-source integration
mechanism in the Soar architecture. Console et
al. (1991) analyzed the integration of different
knowledge sources in a model-based diagnostic
system. Semi-automatic integration of knowledge
sources using the semantic knowledge articulation tool
(SKAT) was provided by Mitra et al. (1999).

Yakhyaeva G. and Yasinkaya O.
An Algorithm to Compare Computer-security Knowledge from Different Sources.
DOI: 10.5220/0005347205650572
In Proceedings of the 17th International Conference on Enterprise Information Systems (ICEIS-2015), pages 565-572
ISBN: 978-989-758-096-3
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)

One way to generate new knowledge from
texts in natural language is to compare and
integrate knowledge from different texts
(Pulchunov, 2009). While extracting knowledge
from natural language texts, different generalized
fuzzy models are built. Accordingly, there is a need
to compare the different algebraic systems.
This paper presents a model-theoretic description
of comparing knowledge learned from different texts
in natural language and an application of this theory
in the subject domain of computer security.
2 MATHEMATICAL
FOUNDATIONS OF THE
DEVELOPED APPROACH
2.1 Case-based Models
First, we define a finite set of documents, each
describing some case of computer attacks. We
describe each case by an algebraic system
$\mathfrak{A} = \langle A, \sigma \rangle$,
where $A$ is the universe of the algebraic system and
$\sigma$ is its signature. The signature $\sigma$ is a set of concepts
that describe this subject domain: the set of
vulnerabilities, threats, countermeasures,
consequences, and so on. We assume that all these
cases have the same signature. For a set $A$ and a
signature $\sigma$, we denote
$\sigma_A \leftrightharpoons \sigma \cup \{c_a \mid a \in A\}$. The algebraic
systems by which we describe instances of the domain
belong to the following class:
$$\mathbb{K}(\sigma_A) \leftrightharpoons \{\, \mathfrak{A} = \langle \{c_a^{\mathfrak{A}} \mid a \in A\}, \sigma_A \rangle \mid c_a^{\mathfrak{A}} \neq c_b^{\mathfrak{A}} \text{ for } a \neq b \,\}.$$
Let $\wp(X)$ denote the set of all subsets of $X$, and let
$S(\sigma_A)$ denote the set of all sentences of the signature
$\sigma_A$.
An algebraic system $\mathfrak{A} \in \mathbb{K}(\sigma_A)$, a model of some
computer attack, will be called a case of the
considered subject domain. For each set of cases
$E \subseteq \mathbb{K}(\sigma_A)$ we define a case-based model $\mathfrak{A}_E$.
Definition 1. Let $E \subseteq \mathbb{K}(\sigma_A)$ be a set of cases. A
system $\mathfrak{A}_E = \langle A, \sigma_A, \tau_E \rangle$, where
$\tau_E : S(\sigma_A) \to \wp(E)$, is
called a case-based model (generated by the set $E$) if
$$\tau_E(\varphi) = \{\mathfrak{A} \in E \mid \mathfrak{A} \models \varphi\}$$
for any sentence $\varphi$ of the signature $\sigma_A$.
The case-based model is a Boolean model: the truth
values of the sentences are elements of a Boolean
algebra, namely the Boolean algebra of all subsets
of the set $E$.
Consider the set $\mathbb{E}$ of all kinds of
computer attacks: those that have already occurred,
and those that can still occur. At any one moment,
our knowledge of cyber attacks that have already
happened is finite. However, this knowledge is
constantly growing as new cases are added. Thus, we can
assume that the set $\mathbb{E}$ is countable. It is sufficient to
consider only the finite subsets of $\mathbb{E}$ to formalize our
knowledge about the domain at different times.
Thus, we consider only finite sets of cases. Denote the
class of all finite case-based models by
$$\mathbb{K}_{\mathbb{E}} \leftrightharpoons \{\mathfrak{A}_E \mid E \subseteq \mathbb{E},\ \|E\| < \omega\}.$$
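Suppose we have a case-based model $\mathfrak{A}_E$, a
mathematical formalization of the knowledge base
of computer-attack cases. To calculate the objective
probabilities of different attacks occurring, we
define the notion of the fuzzy model.
Definition 2. Let $\mathfrak{A}_E \in \mathbb{K}_{\mathbb{E}}$ be a case-based model. A
system $\mathfrak{A}_\mu = \langle A, \sigma_A, \mu \rangle$ is called a fuzzy model
generated by the model $\mathfrak{A}_E$
(denoted $\mathfrak{A}_\mu = Fuz(\mathfrak{A}_E)$) if
$$\mu(\varphi) = \frac{\|\tau_E(\varphi)\|}{\|E\|}$$
for any sentence $\varphi$ of the signature $\sigma_A$.
We introduce the following notation for the class of all fuzzy
models generated by the models from the class $\mathbb{K}_{\mathbb{E}}$:
$$\mathbb{F}_{\mathbb{E}} \leftrightharpoons \{\mathfrak{A}_\mu \mid \exists\, \mathfrak{A}_E \in \mathbb{K}_{\mathbb{E}} : \mathfrak{A}_\mu = Fuz(\mathfrak{A}_E)\}.$$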
In practice, one cannot have full information of a
considered subject domain. For example, we cannot
have information about all the cyber attacks and
information-security violations that have occurred.
Also, the documentation of particular attacks may be
incomplete or inaccurate. It is impossible to give a
complete description of the case-based model
describing this subject domain, so we must consider
fuzzy models that describe the properties of the
subject domain that are already known. To formally
describe this situation, we introduce the concept of
the generalized fuzzy model.
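Definition 3. Let $K \subseteq \mathbb{K}_{\mathbb{E}}$ and $K \neq \varnothing$. A system
$\mathfrak{A}_K = \langle A, \sigma_A, \xi_K \rangle$
is called a generalized fuzzy model
(generated by the class $K$) if
$$\xi_K(\varphi) = \{\alpha \in [0,1] \mid \exists\, \mathfrak{A}_E \in K : \mu_E(\varphi) = \alpha\}$$
for any sentence $\varphi$ of the signature $\sigma_A$,
where $\mu_E$ denotes the evaluation of the fuzzy model $Fuz(\mathfrak{A}_E)$.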
2.2 Principle of Comparing the
Generalized Fuzzy Models
One interpretation of the generalized fuzzy model is
as follows: Suppose there is some expert in a subject
domain described by the language of signature
$\sigma_A$. For example,
this expert may be the system administrator of an
enterprise, and the subject domain may be computer
security. The expert must deal with a set of
situations—the cases of that subject domain—for
example, a set of cyber attacks. This set of cases can
be considered as a probability space. The cases are
the elementary outcomes of this probability space.
Naturally, the expert does not know the full
description of each case or the truth value of all
sentences of signature
$\sigma_A$ for each case.
Nevertheless, the expert can estimate the
probabilities of the truth values of the sentences
based on known information. For example, an expert
can claim that 70% of computer attacks use denial-
of-service attacks or that not less than 60% of cyber
attacks are done to steal information. Previous
studies (Pulchunov and Yakhyaeva, 2010) have
shown how such probabilistic expert knowledge can
be formalized into a generalized fuzzy model.
Now, suppose we have several experts in the
subject domain. Each expert has unique knowledge
of the subject domain, as they may get their
knowledge from different sources and may have
different training. When making a decision, we
would like to account for the opinions of all the
experts to find a compromise.
This problem can be described in formal
language as follows: Let the subject domain be
described by a signature $\sigma_A$, where $A$ is the
basic set of individuals of this subject domain. To
describe the domain, we construct a finite number of
generalized fuzzy models
$$\{\mathfrak{A}_{K_i} = \langle A, \sigma_A, \xi_{K_i} \rangle \mid i = 1, \dots, n\},$$
where $K_i$ is the set of case-based models that
generate the model $\mathfrak{A}_{K_i}$, and $\xi_{K_i}$ is the evaluation of
all sentences of the signature $\sigma_A$ in the model $\mathfrak{A}_{K_i}$. Note
that the truth values of a sentence in a generalized
fuzzy model are subsets of the interval [0, 1].
Then, the problem of comparing a finite number
of models $\mathfrak{A}_{K_1}, \dots, \mathfrak{A}_{K_n}$ can be formulated as follows:
describe a procedure (algorithm) that, for any
$\varphi \in S(\sigma_A)$, uses the truth values
$\xi_{K_1}(\varphi), \dots, \xi_{K_n}(\varphi)$ of this sentence on the models
$\mathfrak{A}_{K_1}, \dots, \mathfrak{A}_{K_n}$ to build a consistent truth value
$\xi(\varphi) \subseteq [0,1]$.
This problem can be formalized by constructing an
n-ary function
$$f : (\wp([0,1]))^n \to \wp([0,1]).$$
While constructing this function, consistent truth
values for different sentences should not contradict
each other. For example, it would be strange for our
comparison principle to produce
$\xi(\varphi) = \xi(\neg\varphi) = \{1\}$.
Thus, it is more reasonable to formulate the principle
of comparing n generalized fuzzy models as an n-ary
function defined on the set of all generalized fuzzy
models:
$$F : (\mathfrak{A}_{K_1}, \dots, \mathfrak{A}_{K_n}) \mapsto \mathfrak{A}_K.$$
Moreover, it would be ideal for this comparison
principle to work on any finite set of models and
not to depend on the order in which the models are considered.
Both properties hold when the comparison is built from
a binary operation that is associative and commutative.
2.3 Product of the Generalized Fuzzy
Models
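First we define the operation of the product on the
class $\mathbb{K}_{\mathbb{E}}$ of case-based models.
Definition 4. Let $\mathfrak{A}_{E_1}, \mathfrak{A}_{E_2}$ be case-based models.
We assume that $E_1 \cap E_2 = \varnothing$ (perhaps, after
renaming). A model $\mathfrak{A}_E$ is called the product of $\mathfrak{A}_{E_1}$
and $\mathfrak{A}_{E_2}$, denoted as $\mathfrak{A}_E = \mathfrak{A}_{E_1} * \mathfrak{A}_{E_2}$, if:
1) $E = E_1 \cup E_2$;
2) $\tau_E(\varphi) = \tau_{E_1}(\varphi) \cup \tau_{E_2}(\varphi)$ for any $\varphi \in S(\sigma_A)$.
It was proved in (Pulchunov and Yakhyaeva, 2010) that
the operation * is associative, commutative, and
closed on the set of case-based models.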
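Statement 5. Let $\mathfrak{A}_E = \mathfrak{A}_{E_1} * \mathfrak{A}_{E_2}$ and
$\mathfrak{A}_\mu = Fuz(\mathfrak{A}_E)$,
$\mathfrak{A}_{\mu_1} = Fuz(\mathfrak{A}_{E_1})$,
$\mathfrak{A}_{\mu_2} = Fuz(\mathfrak{A}_{E_2})$. Then,
$$\mu(\varphi) = \frac{\|E_1\| \cdot \mu_1(\varphi) + \|E_2\| \cdot \mu_2(\varphi)}{\|E_1\| + \|E_2\|}$$
for any $\varphi \in S(\sigma_A)$.
A proof of this statement can also be found in
(Pulchunov and Yakhyaeva, 2010).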
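Consequence 6. Let $\mathfrak{A}_E = \mathfrak{A}_{E_1} * \mathfrak{A}_{E_2}$ and
$\mathfrak{A}_\mu = Fuz(\mathfrak{A}_E)$,
$\mathfrak{A}_{\mu_1} = Fuz(\mathfrak{A}_{E_1})$,
$\mathfrak{A}_{\mu_2} = Fuz(\mathfrak{A}_{E_2})$. Then,
$$\min\{\mu_1(\varphi), \mu_2(\varphi)\} \le \mu(\varphi) \le \max\{\mu_1(\varphi), \mu_2(\varphi)\}$$
for any $\varphi \in S(\sigma_A)$.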
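The truth value in a product model can be derived directly from the definitions, assuming the fuzzy truth value of a sentence is the fraction of cases in which it holds. With $E = E_1 \cup E_2$, $E_1 \cap E_2 = \varnothing$, and $\tau_E(\varphi) = \tau_{E_1}(\varphi) \cup \tau_{E_2}(\varphi)$:

$$\mu(\varphi) = \frac{\|\tau_E(\varphi)\|}{\|E\|} = \frac{\|\tau_{E_1}(\varphi)\| + \|\tau_{E_2}(\varphi)\|}{\|E_1\| + \|E_2\|} = \frac{\|E_1\| \cdot \mu_1(\varphi) + \|E_2\| \cdot \mu_2(\varphi)}{\|E_1\| + \|E_2\|}.$$

So $\mu(\varphi)$ is a weighted average of $\mu_1(\varphi)$ and $\mu_2(\varphi)$ and therefore lies between them. As an illustration with invented numbers, take $\|E_1\| = 4$, $\mu_1(\varphi) = 3/4$, $\|E_2\| = 6$, $\mu_2(\varphi) = 1/3$:

$$\mu(\varphi) = \frac{4 \cdot \tfrac{3}{4} + 6 \cdot \tfrac{1}{3}}{4 + 6} = \frac{3 + 2}{10} = \frac{1}{2}, \qquad \frac{1}{3} \le \frac{1}{2} \le \frac{3}{4}.$$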
Now we can define the operation of the product
on a set of generalized fuzzy models.
Definition 7. Consider the generalized fuzzy models
$\mathfrak{A}_{K_1}$ and $\mathfrak{A}_{K_2}$. Let
$$K_1 * K_2 \leftrightharpoons \{\mathfrak{A}_{E_1} * \mathfrak{A}_{E_2} \mid \mathfrak{A}_{E_1} \in K_1,\ \mathfrak{A}_{E_2} \in K_2\}.$$
Then, the generalized fuzzy model $\mathfrak{A}_{K_1 * K_2}$ is called
the product of the models $\mathfrak{A}_{K_1}$ and $\mathfrak{A}_{K_2}$.
Because the product of case-based models is
commutative and associative, the product of the
generalized fuzzy models will also be commutative
and associative.
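The product of generalized fuzzy models can be sketched in a few lines. The sketch assumes that a generalized fuzzy model is given by the class of case-based models generating it, that a case-based model is a tuple of cases, and that the fuzzy truth value is the share of cases satisfying a sentence. All function names and the attribute `dos` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def fuzzy_value(model, sentence):
    """Share of the cases of one case-based model that satisfy the sentence."""
    return Fraction(sum(1 for case in model if sentence(case)), len(model))

def multiply_models(model_1, model_2):
    """Product of two case-based models: the disjoint union of their cases.
    Tuples keep equal-looking cases from different sources apart."""
    return model_1 + model_2

def multiply_classes(class_1, class_2):
    """Product of two generating classes: all pairwise model products."""
    return [multiply_models(m1, m2) for m1, m2 in product(class_1, class_2)]

def generalized_value(model_class, sentence):
    """Truth value in the generalized fuzzy model: the set of fuzzy values
    taken over every model of the generating class."""
    return {fuzzy_value(model, sentence) for model in model_class}

def uses_dos(case):
    return case["dos"]

# Source 1 does not say whether the attack used DoS: both alternatives remain.
k1 = [({"dos": True},), ({"dos": False},)]
# Source 2 reports a DoS attack.
k2 = [({"dos": True},)]

print(generalized_value(k1, uses_dos))
print(generalized_value(multiply_classes(k1, k2), uses_dos))
```

Swapping the arguments of `multiply_classes` changes the order of cases inside each model but not the resulting set of truth values, which reflects the commutativity noted above.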
3 COMPUTER SECURITY
SOFTWARE
3.1 Knowledge Base
We developed a software system called RiskPanel,
essentially a workplace for experts to ensure the
security of corporate information, based on the
methodology of generalized fuzzy models
(Pulchunov et al., 2011).
The core of this system is an information-
security knowledge base. To organize and work with
the knowledge base, we use OntoBox technology
(Malykh and Mantsivoda, 2010). This system
represents and stores data in an ontological format
and has powerful, flexible processing tools. It allows
for great modularity and portability of knowledge
bases, advantageous when developing complex
information systems.
Seven categories of attributes (classes) were
created to describe the cases in the OntoBox
knowledge base: symptoms, threats, vulnerabilities,
consequences, loss, countermeasures, and
configurations. Each attribute category was
represented with a tree structure. The cases in the
database are characterized by certain attributes of
each category. Each case is formed based on natural-
language text found on the Internet (Yakhyaeva and
Yasinskaya, 2012).
While analyzing the texts provided to form the
cases, we found that most of them contained clear but
incomplete information. In other words, we could not always
determine whether a particular case had specific
knowledge-base attributes. To solve this problem,
we proposed using open-world semantics,
widely used in description logic
systems (Baader, 2003). Basically, this approach
considers all possible interpretations of unknown
information. Thus, to mathematically describe a
computer-attack case, we consider a generalized
fuzzy model with certain attributes, called a generalized
case.
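Definition 8. Consider a set $S \subseteq S(\sigma_A)$ and an
evaluation $\nu : S \to \{0,1\}$. We say that a case $\mathfrak{A}$ is
consistent with the evaluation $\nu$ (and denote $\mathfrak{A} \uparrow \nu$)
if $\mathfrak{A} \models \varphi \Leftrightarrow \nu(\varphi) = 1$ for any $\varphi \in S$.
Definition 9. Consider a set $S \subseteq S(\sigma_A)$ and an
evaluation $\nu : S \to \{0,1\}$. A generalized fuzzy model
$\mathfrak{A}_K$ is called a generalized case (generated by the
evaluation $\nu$) if
$$K = \{\mathfrak{A}_{\{\mathfrak{A}\}} \mid \mathfrak{A} \in \mathbb{K}(\sigma_A),\ \mathfrak{A} \uparrow \nu\}.$$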
In this formalism, the entire knowledge base of
RiskPanel can be considered a finite set of
generalized cases. When drawing conclusions from
this knowledge base, we must compare these
models.
For a knowledge base formalized as a set of
generalized cases, it is most appropriate to use a
comparison principle based on the product of the
generalized fuzzy models, because it is consistent
with open-world semantics.
Note that each generalized case $\mathfrak{A}_K$ is not an
interval model. Moreover, for each sentence $\varphi \in S(\sigma_A)$,
the truth value $\xi_K(\varphi)$ belongs to
$\{\{0\}, \{1\}, \{0,1\}\}$. Now, we must formulate an
algorithm for calculating the truth values in a
consistent model of generalized cases.
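Theorem 10. Let $\mathfrak{A}_{K_1}, \dots, \mathfrak{A}_{K_n}$ be generalized cases.
Then, for $\varphi \in S(\sigma_A)$ we have
$$\xi_{K_1 * \dots * K_n}(\varphi) = \left\{ \frac{\|A\|}{n};\ \frac{\|A\| + 1}{n};\ \dots;\ \frac{\|A\| + \|B\|}{n} \right\},$$
where
$A = \{\mathfrak{A}_{K_i} \mid \xi_{K_i}(\varphi) = \{1\}\}$
and
$B = \{\mathfrak{A}_{K_i} \mid \xi_{K_i}(\varphi) = \{0,1\}\}$.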
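Proof. Let $\alpha \in \xi_{K_1 * \dots * K_n}(\varphi)$. Then, there are cases
$\mathfrak{A}_1 \in K_1, \dots, \mathfrak{A}_n \in K_n$ (identifying each one-case model with its case)
such that (see Statement 5)
$$\alpha = \frac{\mu_1(\varphi) + \dots + \mu_n(\varphi)}{n},$$
where for any $i = 1, \dots, n$, if $\mathfrak{A}_i \models \varphi$ then we have
$\mu_i(\varphi) = 1$, and if $\mu_i(\varphi) = 0$ then we have $\mathfrak{A}_i \not\models \varphi$.
Obviously,
$$\|A\| \le \mu_1(\varphi) + \dots + \mu_n(\varphi) \le \|A\| + \|B\|.$$
Thus, we obtain
$\alpha \in \left\{ \frac{\|A\|}{n};\ \frac{\|A\| + 1}{n};\ \dots;\ \frac{\|A\| + \|B\|}{n} \right\}$.
Now consider
$\alpha \in \left\{ \frac{\|A\|}{n};\ \frac{\|A\| + 1}{n};\ \dots;\ \frac{\|A\| + \|B\|}{n} \right\}$.
Let $\alpha = k/n$.
Obviously, $\|A\| \le k \le \|A\| + \|B\|$. Let
$$A = \{\mathfrak{A}_{K_i} \mid \xi_{K_i}(\varphi) = \{1\}\} = \{\mathfrak{A}_{K^A_1}, \dots, \mathfrak{A}_{K^A_s}\},$$
$$B = \{\mathfrak{A}_{K_i} \mid \xi_{K_i}(\varphi) = \{0,1\}\} = \{\mathfrak{A}_{K^B_1}, \dots, \mathfrak{A}_{K^B_t}\}.$$
First, we select one case from each generalized case
of set A, denoting them as $\mathfrak{A}_i \in K^A_i$ $(i = 1, \dots, s)$.
Then we select cases $\mathfrak{B}_j \in K^B_j$ $(j = 1, \dots, k - s)$
from the generalized cases of set B such that
$\mathfrak{B}_j \models \varphi$. Last, we select cases
$\mathfrak{B}_j \in K^B_j$ $(j = k - s + 1, \dots, t)$ from the generalized cases of set B such
that $\mathfrak{B}_j \not\models \varphi$.
Obviously, $\alpha$ is the truth value of $\varphi$ in the fuzzy model generated by
$$\mathfrak{A}_1 * \dots * \mathfrak{A}_s * \mathfrak{B}_1 * \dots * \mathfrak{B}_t,$$
extended by one arbitrary case from each of the remaining generalized cases, in all of which $\varphi$ is false.
Thus, $\alpha \in \xi_{K_1 * \dots * K_n}(\varphi)$.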
Note that comparing a finite set of generalized
cases will not produce an interval model. But, when
$n \to \infty$, the truth values of sentences in a consistent
model will tend toward subintervals of
[0, 1]. Thus, in practice, when dealing with a
large enough set of cases, we can view the truth
values in a consistent model as intervals.
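A small worked instance of Theorem 10, with invented numbers, shows the shape of the resulting truth value. Suppose $n = 4$ generalized cases are compared and a sentence $\varphi$ has truth value $\{1\}$ in one of them, $\{0,1\}$ in two of them, and $\{0\}$ in the remaining one, so that $\|A\| = 1$ and $\|B\| = 2$. Then

$$\xi_{K_1 * K_2 * K_3 * K_4}(\varphi) = \left\{ \frac{1}{4};\ \frac{2}{4};\ \frac{3}{4} \right\}.$$

The value is a finite set of equally spaced points rather than an interval; its endpoints $1/4$ and $3/4$ are the bounds that would be reported when the value is read as the interval $[1/4, 3/4]$.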
3.2 Theorem of Atomically Generalized
Cases
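Definition 11. A generalized case is called an
atomically generalized case if it is generated by
an evaluation of a subset of the set of all atomic
propositions.
Consider a quantifier-free sentence $\varphi(P_1, \dots, P_n)$
built from atomic propositions $P_1, \dots, P_n$. Let us reduce this
sentence to PDNF (perfect disjunctive normal form):
$$\varphi(P_1, \dots, P_n) = C_1 \vee \dots \vee C_m,$$
where $C_j$ $(j = 1, \dots, m)$ are the elementary
conjunctions consisting of the atomic propositions
$P_1, \dots, P_n$.
We introduce the following notation:
$$D_\varphi = \{C_1, \dots, C_m\},$$
$$D_\varphi(K, 0) = \{C \in D_\varphi \mid \xi_K(C) = \{0\}\},$$
$$D_\varphi(K, 1) = \{C \in D_\varphi \mid \xi_K(C) = \{1\}\},$$
$$D_\varphi(K, \{0,1\}) = \{C \in D_\varphi \mid \xi_K(C) = \{0,1\}\}.$$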
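Theorem 12. Let $\mathfrak{A}_K$ be an atomically generalized
case, and let $\varphi$ be a quantifier-free sentence of
signature $\sigma_A$ with PDNF $C_1 \vee \dots \vee C_m$ over the atomic propositions $P_1, \dots, P_n$. Then,
$$\xi_K(\varphi) = \begin{cases}
\{0\}, & \text{if } D_\varphi(K, 0) = D_\varphi; \\
\{1\}, & \text{if } D_\varphi(K, 1) \neq \varnothing \text{ or } \|D_\varphi(K, \{0,1\})\| = 2^{\|\{P_i \mid \xi_K(P_i) = \{0,1\}\}\|}; \\
\{0,1\}, & \text{otherwise.}
\end{cases}$$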
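Proof. Obviously,
$$\xi_K(\varphi) = \{0\} \Leftrightarrow \forall \mathfrak{A} \in K : \mathfrak{A} \not\models \varphi \Leftrightarrow \forall \mathfrak{A} \in K : \mathfrak{A} \not\models C_1, \dots, \mathfrak{A} \not\models C_m \Leftrightarrow$$
$$\Leftrightarrow \xi_K(C_1) = \dots = \xi_K(C_m) = \{0\} \Leftrightarrow D_\varphi(K, 0) = D_\varphi.$$
On the other hand,
$$D_\varphi(K, 1) \neq \varnothing \Leftrightarrow \exists C_j\ \forall \mathfrak{A} \in K : \mathfrak{A} \models C_j \Rightarrow \forall \mathfrak{A} \in K : \mathfrak{A} \models \varphi \Rightarrow \xi_K(\varphi) = \{1\}.$$
Let $\{P_1, \dots, P_n\}$ be the set of atomic propositions
included in $\varphi$. Consider the set of elementary
conjunctions
$$\Delta = \{P_1^{\varepsilon_1} \,\&\, \dots \,\&\, P_n^{\varepsilon_n} \mid \exists\, \mathfrak{A} \in K : \mathfrak{A} \models P_1^{\varepsilon_1} \,\&\, \dots \,\&\, P_n^{\varepsilon_n}\},$$
where $P^1$ stands for $P$ and $P^0$ for $\neg P$.
Let $k = \|\{P_i \mid \xi_K(P_i) = \{0,1\}\}\|$. Obviously, if $k = 0$,
then $D_\varphi(K, \{0,1\}) = \varnothing$. Because $\mathfrak{A}_K$ is atomically
generalized, the cases of $K$ realize every combination of truth values of the
$k$ undetermined propositions. Consequently, $\|\Delta\| = 2^k$.
Assume now that $D_\varphi(K, 0) \neq D_\varphi$ and
$D_\varphi(K, 1) = \varnothing$. Thus, $D_\varphi(K, \{0,1\}) \neq \varnothing$.
Moreover, $D_\varphi(K, \{0,1\}) \subseteq \Delta$.
Consider two cases: $D_\varphi(K, \{0,1\}) = \Delta$ (equivalently, $\|D_\varphi(K, \{0,1\})\| = 2^k$) and
$D_\varphi(K, \{0,1\}) \subset \Delta$.
Let $D_\varphi(K, \{0,1\}) = \Delta$. Then, for any case $\mathfrak{A} \in K$,
there is a conjunct $C \in D_\varphi(K, \{0,1\})$ such that
$\mathfrak{A} \models C$. Consequently, $\mathfrak{A} \models \varphi$ for any case
$\mathfrak{A} \in K$, and hence $\xi_K(\varphi) = \{1\}$.
Assume now that $D_\varphi(K, \{0,1\}) \subset \Delta$. Then there
is a case $\mathfrak{A} \in K$ such that $\mathfrak{A} \not\models C$ for any
$C \in D_\varphi(K, \{0,1\})$. Because we have assumed that
$D_\varphi(K, 1) = \varnothing$, then $\mathfrak{A} \not\models \varphi$. On the other hand,
because $D_\varphi(K, \{0,1\}) \neq \varnothing$, there is a case $\mathfrak{A}' \in K$ such
that $\mathfrak{A}' \models \varphi$. Thus, $\xi_K(\varphi) = \{0,1\}$.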
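A short worked example may help to read Theorem 12. It uses invented data: two atomic propositions $P_1$ and $P_2$, and an atomically generalized case in which $P_1$ is known to be true while the value of $P_2$ is undetermined, so the number of undetermined atomic propositions is $k = 1$ and $2^k = 2$.

$$\varphi_1 = (P_1 \,\&\, P_2) \vee (P_1 \,\&\, \neg P_2):\quad \text{both conjunctions have value } \{0,1\},\ 2 = 2^k,\ \text{so } \varphi_1 \text{ has value } \{1\};$$

$$\varphi_2 = (P_1 \,\&\, P_2) \vee (\neg P_1 \,\&\, P_2):\quad \text{one conjunction has value } \{0,1\},\ \text{one has } \{0\},\ 1 < 2^k,\ \text{so } \varphi_2 \text{ has value } \{0,1\};$$

$$\varphi_3 = (\neg P_1 \,\&\, P_2) \vee (\neg P_1 \,\&\, \neg P_2):\quad \text{both conjunctions have value } \{0\},\ \text{so } \varphi_3 \text{ has value } \{0\}.$$

In the first formula the surviving conjunctions exhaust every possible value of $P_2$, which is why the formula is true even though no single conjunction is known to be true.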
3.3 Module of Knowledge Comparison
RiskPanel has a module for comparing knowledge
learned from various computer-attack cases.
Currently, the module interface allows one to
calculate the truth value as an interval for a formula
presented in PDNF (perfect disjunctive normal
form).
Consider the module interface (Fig. 1). To input
data into the main algorithm, the user must enter the
AnAlgorithmtoCompareComputer-securityKnowledgefromDifferentSources
569
parameters of the formula using the resources
provided. First, the user must select the attributes
included in all conjunctions of PDNF. Next, the user
must specify the number of conjunctions in the
formula. Then, drop-down lists of «+» and «–»
values appear with the resulting PDNF, where «–»
symbolizes negation of the argument. The data from
this window with PDNF can be inputted into the
main algorithm by clicking the button titled «Get the
value of the formula».
The value of the formula is calculated as an
interval (see Theorem 10). The start value of the
interval is the ratio of the number of cases for which
the formula is true to the number of all existing
cases. The end value of the interval is the ratio of the
number of cases for which the formula is true, added
to the number of cases for which the truth value of
the formula is not defined, to the number of all
existing cases.
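The interval described above can be sketched as follows, assuming each case has already been given one of three verdicts for the formula (true, false, or not defined). The verdict labels and the function name are hypothetical and are not taken from the RiskPanel code.

```python
from fractions import Fraction

# Hypothetical verdict labels for the value of a formula on a single case.
TRUE, FALSE, UNKNOWN = "true", "false", "unknown"

def truth_interval(verdicts):
    """Return (start, end) of the interval for a non-empty list of verdicts.

    start = (# cases where the formula is true) / (# all cases)
    end   = (# true cases + # cases with undefined value) / (# all cases)
    """
    if not verdicts:
        raise ValueError("at least one case is required")
    total = len(verdicts)
    true_count = verdicts.count(TRUE)
    unknown_count = verdicts.count(UNKNOWN)
    return (Fraction(true_count, total),
            Fraction(true_count + unknown_count, total))

# Five illustrative cases: two true, two undefined, one false.
start, end = truth_interval([TRUE, UNKNOWN, FALSE, UNKNOWN, TRUE])
print(start, end)
```

The width of the interval is the share of cases whose verdict is not defined, so it shrinks as the source texts become more complete.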
The algorithm used to determine the truth value
of a formula in the generalized case is based on
Theorem 12 and shown in Table 1. At first, false
conjunctions that contradict the available
information for the case are eliminated from the
formula. If no conjunctions in the formula remain,
then the formula for the case is false. If the
remaining conjunctions do not have unknown
attribute values for the case, then the formula for the
case is considered true. If the remaining
conjunctions have unknown attribute values, then
the algorithm operates as follows: If the number of
remaining conjunctions is less than $2^k$, where $k$ is
the number of unknown attribute values in the
conjunction, then the truth value of the formula for
the case is not defined; otherwise, the formula is true.
Determining whether a case has the attribute values
included in the conjunctions requires a number of
operations polynomial in $n$, where $n$ is the number of attribute values
in all categories stored in OntoBox. Eliminating the
false conjunctions for the case based on the
information about attribute values requires a number of
operations polynomial in $n$ and $c$, where $c$ is the number of conjunctions in
the PDNF, because the total number of attribute values involved
in the conjunctions cannot exceed $n$. Thus, the total
algorithmic complexity of the developed approach
for defining the truth value of a PDNF in a case is
polynomial in $n$ and $c$.
Further, if the OntoBox knowledge base has m
computer-attack cases, then calculating the truth
value of the formula in interval form repeats this
computation for each of the m cases and remains polynomial.
Table 1: The algorithm for determining the truth value of a
formula in the generalized case.
alg getPDNFVerityOnCase(arg Case case,
arg list PDNFFormulaAttrs,
arg matrix PDNFBoolMatrix)
begin
| bool rightValue,
| int unknownAttrsCount := 0,
| list removedIndexes := empty list
| for each Attribute attr in
| | PDNFFormulaAttrs
| | int attrValueOnCase :=
| | checkIfCaseHasAttr(attr, case)
| | if (attrValueOnCase = UNKNOWN_ATTR)
| | | unknownAttrsCount :=
| | | unknownAttrsCount + 1
| | else
| | | if (attrValueOnCase = HAS_ATTR)
| | | | rightValue := true
| | | else
| | | | rightValue := false
| | | list boolRow :=
| | | PDNFBoolMatrix.get(
| | | PDNFFormulaAttrs.indexOf(attr))
| | | for int i = 0 to boolRow.size()
| | | | if (removedIndexes does not
| | | | | contain i)
| | | | | if (boolRow.get(i) !=
| | | | | | rightValue)
| | | | | | removedIndexes.add(i)
| | | end of loop
| end of loop
| int remainingConjCount :=
| PDNFBoolMatrix.get(0).size() -
| removedIndexes.size()
| if (remainingConjCount = 0)
| | return PDNF_FALSE
| if (unknownAttrsCount = 0)
| | return PDNF_TRUE
| if (remainingConjCount <
| | 2^unknownAttrsCount)
| | return PDNF_UNKNOWN
| else
| | return PDNF_TRUE
end
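The pseudocode of Table 1 can be ported to a runnable form. The following is a sketch under stated assumptions, not the RiskPanel implementation: a case is a dictionary from attribute names to booleans, an attribute missing from the dictionary is treated as unknown, `bool_matrix[r][c]` is true when attribute `attrs[r]` occurs without negation in conjunction `c`, and the conjunctions of the PDNF are pairwise distinct. The names `Verdict` and `pdnf_verdict_on_case` are hypothetical.

```python
from enum import Enum

class Verdict(Enum):
    FALSE = 0
    TRUE = 1
    UNKNOWN = 2

def pdnf_verdict_on_case(case, attrs, bool_matrix):
    """Truth value of a PDNF formula on one generalized case.

    case        -- dict: attribute name -> True/False (absent = unknown)
    attrs       -- attribute names used in every conjunction
    bool_matrix -- one row per attribute, one column per conjunction;
                   True means the attribute is not negated there
    """
    conj_count = len(bool_matrix[0]) if bool_matrix else 0
    removed = set()       # indexes of conjunctions known to be false
    unknown_count = 0     # attributes with no information in the case

    for attr, row in zip(attrs, bool_matrix):
        value = case.get(attr)  # None stands for "unknown"
        if value is None:
            unknown_count += 1
            continue
        # A conjunction contradicting a known attribute value is false.
        for index, sign in enumerate(row):
            if sign != value:
                removed.add(index)

    remaining = conj_count - len(removed)
    if remaining == 0:
        return Verdict.FALSE
    if unknown_count == 0:
        return Verdict.TRUE
    if remaining < 2 ** unknown_count:
        return Verdict.UNKNOWN
    return Verdict.TRUE

attrs = ["dos", "data_theft"]
# (dos & data_theft) v (dos & not data_theft)
both_branches = [[True, True], [True, False]]
# dos & data_theft
single_branch = [[True], [True]]

print(pdnf_verdict_on_case({"dos": True}, attrs, both_branches))
print(pdnf_verdict_on_case({"dos": True}, attrs, single_branch))
print(pdnf_verdict_on_case({"dos": False}, attrs, both_branches))
```

A set is used for the removed indexes so that membership and insertion take constant expected time; the list-based check in the pseudocode scans the list on every test.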
4 CONCLUSIONS
This work describes the mathematical apparatus and
software implementation of one of the modules of
the RiskPanel system, aimed at comparing
computer-security knowledge learned from various online
sources.
Algorithms implemented in this module are
based on the methodology of generalized fuzzy
models. The knowledge obtained from a single
source is formalized as an algebraic system and is
stored in the knowledge base of the RiskPanel
system. To implement the knowledge comparison,
we construct a generalized fuzzy model, the product
of all algebraic systems stored in the database.

Figure 1: Module of Knowledge Comparison.

The system interface allows one to calculate the
truth value of any quantifier-free sentence. The input
sentence is presented in PDNF. The truth value is
calculated as a probability interval.
The developed algorithm has polynomial
complexity.
ACKNOWLEDGEMENTS
The research for this paper was financially supported
by the Ministry of Education of the Russian
Federation (project no. 2014/139) and was partially
supported by RFBR (project no. 14-07-00903-a).
REFERENCES
Assali, A., Lenne, D. & Debray, B., 2013. Adaptation
Knowledge Acquisition in a CBR System.
International Journal on Artificial Intelligence Tools,
22(1).
Baader, F., 2003. The Description Logic Handbook. New
York: Cambridge University Press.
Burger, J. et al., 2013. Model-Based Security Engineering:
Managed Co-evolution of Security Knowledge and
Software Models. Foundation of Security Analysis and
Design VII - FOSAD 2012/2013 Tutorial Lectures.
Springer Lecture Notes in Computer Science, pp. 34-
53.
Console, L., Theseider, D. & Torasso, P., 1991. Towards
the integration of different knowledge sources in
model-based diagnosis. Trends in Artificial
Intelligence, Lecture Notes in Computer Science,
Volume 549, pp. 177-186.
Gartner, S. et al., 2014. Maintaining requirements for
long-living software systems by incorporating security
knowledge. IEEE 22nd International Requirements
Engineering Conference, pp. 103-112.
Haddad, M. & Bozdogan, K., 2009. Knowledge
Integration in Large-Scale Organizations and
Networks - Conceptual Overview and Operational
Definition. [Online] Available at:
http://dx.doi.org/10.2139/ssrn.1437029
Kolodner, J., 1992. An introduction to Case-based
reasoning. Artificial Intelligence Review, Volume 6,
pp. 3-34.
Malykh, A. & Mantsivoda, A., 2010. Query Language for
Logic Architectures. Perspectives of System
Informatics: Proceedings of 7th International
Conference. Lecture Notes in Computer Science,
Volume 5947, pp. 294-305.
Mitra, P., Wiederhold, G. & Jannink, J., 1999. Semi-
automatic Integration of Knowledge Sources.
Sunnyvale, CA, July 6-8, 2-nd International
Conference on Information Fusion.
Palchunov, D., 2008. The solution of the problem of
information retrieval based on ontologies. Bisnes-
informatika, 1(1), pp. 3-13.
Pulchunov, D., 2009. Knowledge search and production:
creation of new knowledge on the basis of natural
language text analysis. Filosofiya nayki, 43(4), pp. 70-
90.
Pulchunov, D. & Yakhyaeva, G., 2005. Interval fuzzy
algebraic systems. Proceedings of the Asian Logic
Conference, pp. 23-37.
Pulchunov, D. & Yakhyaeva, G., 2010. Fuzzy algebraic
systems. Vestnik NGU. Seriya: Matematica, mexanika,
informatika, 10(3), pp. 75-92.
Pulchunov, D., Yakhyaeva, G. & Hamutskya, A., 2011.
Software system for information risk management
"RiskPanel". Programmnaya ingeneriya, Volume 7,
pp. 29-36.
Ruhroth, T. et al., 2014. Towards Adaptation and
Evolution of Domain-Specific Knowledge for
Maintaining Secure Systems. 15th International
Conference on Product-Focused Software Process
Improvement, Springer Lecture Notes in Computer
Science, pp. 239-253.
Steier, D., Lewis, R., Lehman, J. & Zacherl, A., 1993.
Combining multiple knowledge sources in an
integrated intelligent system. IEEE Expert, 8(3), pp.
35-44.
Thayse, F., 1989. From Modal Logic to Deductive
Databases: Introducing a Logic Based Approach to
Artificial Intelligence. Chichester: Wiley.
Yakhyaeva, G., 2007. Fuzzy model truth values.
Bratislava, Proceedings of the 6-th International
Conference Aplimat, pp. 423-431.
Yakhyaeva, G. & Yasinskaya, O., 2012. The application
of precedent model methodology in the risk-
management system aimed at early detection of
computer attacks. Vestnik NGU. Seriya:
Informationnie Texnologii, 10(2), pp. 106-115.
Yakhyaeva, G. & Yasinskaya, O., 2014. Application of
Case-based Methodology for Early Diagnosis of
Computer Attacks. Journal of Computing and
Information Technology, 22(3), pp. 145-150.