An Expert System for Facilitating an Institutional Risk Profile Definition for Cyber Situational Awareness

Roman Graf, Sergiu Gordea, Heather M. Ryan and Tibaut Houzanme

AIT Austrian Institute of Technology, Vienna, Austria

University of Denver, Library and Information Science Program, Denver, CO, U.S.A.

Electronic Records Archivist, Indiana Archives and Records Administration, Indianapolis, IN, U.S.A.

Keywords:

Situational Awareness, Data Mining.

Abstract:

Advances in cyber situational awareness technology lead to the creation of increasingly complex tools. Hu-

man analysts face challenges ﬁnding relevant information in large, complex data sets, when exploring data to

discover patterns and insights. To be effective in identifying and defeating future cyber-attacks, cyber analysts

require novel tools and models that can ﬁll the gap between cyber data and situation comprehension. The

research presented here is designed to develop a system that will warn a cyber analyst of ﬁle format endanger-

ment that could impact cyber situational awareness. The expert system statistically determines an institutional

risk proﬁle based on collected expert knowledge in the form of risk proﬁles calculated by means of risk fac-

tors. The institutional risk proﬁle indicates risks that could endanger digital content employed in analysis of

cyber situational awareness. Based on the institutional risk proﬁle, a cyber analyst can implement measures

for stabilising and securing situational awareness. Each institution may have multiple risk proﬁle deﬁnitions

dependent on network, critical infrastructure, and the role of the cyber analyst. Another contribution relates

to the provided support for visualisation and analysis of risk factors for individual dimensions. To facilitate

decision-making, the aggregated information about the risk factors is presented as a multidimensional vector.

1 INTRODUCTION

Cyber Situational Awareness (SA) (Barford et al., 2010) is the perception of the security and threat situation, together with an assessment of its current and future impact. In recent years, researchers in the SA field have created increasingly complex tools across many application domains. Often, tool outputs are of a complex nature, involving non-textual, high-dimensional, and

various multimedia data. Designers of automated

tools are aiming to address situational awareness

challenges like complex and ﬂuid system topology,

rapidly changing technologies, high noise to signal ra-

tio, rapidly evolving and multi-faceted threats, speed

of events, data overload, and meaning underload (Kott

and Wang, 2014). These factors make real-time sit-

uational awareness of cyber operations very difﬁcult

to evaluate. The human brain is well designed to derive situational awareness from the world, based on a complex set of cognitive processes and schemas learned through experience, but the artificial world of cyber operations seriously stresses that process. Addressing the lack of good, integrated tools

that help bridge this gap by assisting the operator with

a comprehensive set of needed information is critical

for developing the necessary cyber security awareness

required to secure operations.

In security planning, it is necessary to analyse data

that are often vague and imprecise. In cyber situa-

tional awareness, we have to rely on such imperfect

information to detect real attacks and to prevent an

attack from happening through appropriate risk man-

agement. Cyber SA aggregates raw data at the lower

level (Barford et al., 2010). Missing data or rendering

errors can cause serious SA failure in achieving the

overall goal of cyber defense.

While making a decision, a human cyber analyst

faces challenges like ﬁnding relevant information in

large, complex data sets and in searching data to dis-

cover patterns and insights. For humans to be effec-

tive in identifying and defeating future cyber-attacks,

novel tools and models that can ﬁll the gap between

cyber data and situation comprehension are highly de-

sired.

One of the core risk factors for situational awareness is the ability to render stored data such as log

Graf, R., Gordea, S., Ryan, H. and Houzanme, T.

An Expert System for Facilitating an Institutional Risk Proﬁle Deﬁnition for Cyber Situational Awareness.

DOI: 10.5220/0005738303470354

In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP 2016), pages 347-354

ISBN: 978-989-758-167-0


files, emails and documentation. Emails may carry incorrect MIME types and additional information in attachments, such as PDF and Word documents. System logs are mostly expected in text form, but they can also use proprietary formats. System documentation mostly employs either PDF or web formats. Web formats can be composite documents that may contain a wide variety of other formats. This risk factor deals with the evaluation of appropriate formats used for encoding digital content.

The preservation risks for a particular ﬁle format are

difﬁcult to estimate as described in (Graf and Gordea,

2013). Since each institution has its speciﬁc Situa-

tional Awareness, the proposed system can only sup-

port decision making with its risk proﬁles. However,

the automatically computed institutional risk proﬁles

have to be adjusted and validated by an institutional expert. The definition of risk factors and associated metrics is still an open research topic in the archiving

community. Involvement of digital preservation ex-

perts is required for collecting complete information

and evaluating risks as shown in (Ayris et al., 2008).

The SA expert is dependent on rendering soft-

ware and archived data that describe previous inci-

dents, vulnerabilities and attacks. The digital preser-

vation domain addresses the problem of long-term

data archiving and can be employed in the SA field to ensure data rendering. For example, an analyst facing rendering problems can use the File Format Metadata Aggregator (FFMA) (Ryan et al., 2015) to quickly find an alternative rendering solution, although FFMA cannot prevent rendering problems.

Currently, each institution selects its own ﬁle for-

mats for preservation depending on particular task

and critical infrastructure. Due to the scale of dig-

ital information that has to be managed, institutions

are facing challenges regarding preservation, mainte-

nance, and quality assurance of stored data. For that

reason, automated solutions for data management and

digital preservation are necessary. Trustworthy and continued access to data encoded in all presented formats is important for further analysis and for building a knowledge base to support decision making in a fast-changing environment, possibly under attack. Ensuring the continued accessibility of content encoded in various file formats before an attack arises helps ensure that a cyber situational operator has access to the necessary data when a crisis occurs.

Many ﬁle formats are properly documented, are

open-source and well supported by software vendors.

Other formats may be outdated or no longer func-

tional with modern software or hardware. There are

also custom/proprietary formats - which may be ob-

solete and not renderable with commodity hardware.

The novelty of this technical solution is the em-

ployment of data mining methods to facilitate com-

plex risk factor settings for cyber analysts. Our goal is

to make use of a domain expert knowledge base to de-

tect situational awareness risks for a particular insti-

tution. An automated ﬁle format endangerment warn-

ing can assist the cyber analyst to secure robust sit-

uational awareness. The research data for the expert

knowledge base and factors for the risk metrics cal-

culation were provided through two studies organised

by Heather Ryan (Ryan, 2014), (Ryan et al., 2015).

The data from the ﬁrst study (Ryan, 2014) reﬁned the

factors that are employed to create the risk proﬁles.

The (Ryan et al., 2015) short paper outlined the re-

search methods and goals of the study that produced

the data used in the test scenarios demonstrated here.

This paper is structured as follows: Section 2 gives

an overview of related work and concepts. Section

3 explains the risk factor visualisation workﬂow and

also covers data mining issues. Section 4 presents the

experimental setup, applied methods and results. Sec-

tion 5 concludes the paper and provides an outlook on

planned future work.

2 RELATED WORK

Research on risk management in SA is gaining in importance. The SA framework (Morita

et al., 2011) describes how a person perceives ele-

ments of the environment, comprehends and projects

its actions into the future. This framework employs

the situation awareness model that can be used in the

assessment of risk awareness focusing on the adverse

event notiﬁcation system. Our expert system takes a

similar approach, but focuses on renderability of es-

sential information, rather than events. The review of

existing situation awareness measurement techniques

for their suitability for use in the assessment of SA in

different environments (Salmon et al., 2006) demon-

strates that current SA measurement techniques are

inadequate by themselves for use in the assessment

of SA, and a multiple-measure approach utilising dif-

ferent approaches is recommended. To address this

gap, we employ speciﬁc risk factors obtained from

the archiving community for the digital content fo-

cus of SA. In security planning, it is necessary to

analyse data that are often vague and imprecise. In

(Barford et al., 2010) authors survey existing tech-

nologies in handling uncertainty and risk manage-

ment in cyber situational awareness, but the focus is

on looking for vulnerabilities in a system, whereas

our approach is focused on secure rendering of the

raw data at a lower level that creates a basis for fur-


ther SA aspects, such as situation recognition, situa-

tion comprehension and situation projection (Barford

et al., 2010). It is difficult to guarantee the longevity

of digital information. The investigation (Lawrence

et al., 2000) examines risk assessment of migrating

ﬁle formats. Accurate format identiﬁcation and ren-

dering is a challenging task due to malformed MIME

types, rendering expenses, dependence on content not

embedded in the ﬁle, changed fonts, etc. In (Jack-

son, 2012), the author examines how network effects

can stabilise formats against obsolescence. Jackson’s

evaluation demonstrates that most formats last much

longer than ﬁve years, that network effects stabilize

formats, and that new formats appear at a modest,

manageable rate. However, a number of formats are

fading from use and every corpus contains its own bi-

ases. Digital preservation tools like PANIC (Hunter

and Choudhury, 2006), AONS II (Pearson and Webb,

2008), SPOT (Vermaaten et al., 2012), and the P2 registry (Tarrant et al., 2011) were designed to notify repository managers of file format-related events that might

impact access to stored content. They also deﬁne

alerting mechanisms when ﬁle formats become obso-

lete. As distinct from our approach, they do not apply

expert knowledge and do not specify risk factors that

may inﬂuence ﬁle format endangerment. In the pro-

posed approach we intend to apply standard statistics

and data mining methods. The proposed system is

unique for the given domain.

3 VISUALISATION METHOD

Visualisation of risk proﬁles facilitates risk proﬁle

selection, assessment and replacement, if required.

Each risk proﬁle is represented by a multidimensional

vector. In the presented approach 31 dimensions were

evaluated and rated by the domain experts. The risk

profile visualisation is conducted according to the workflow shown in Fig. 1.

The risk proﬁle data provided by domain experts

is stored in a text ﬁle. As described in (Graf et al.,

2015) the workﬂow reads data (step 1) and employs a

data mining method that calculates distances between

risk proﬁles based on the values of their risk factors.

Domain experts rated the risk factors on a scale of

1-5, where 1 is “extremely important,” and 5 is “not

at all important.” To remove a possible mismatch of scale between the features and to obtain a well-balanced set of risk factors rf, the workflow applies normalisation in the second step.

\[
MSS_i = \frac{rf_i - M}{\frac{1}{\mathrm{card}(rf)} \sum_{j} |rf_j - M|} \qquad (1)
\]

Normalisation employs the modiﬁed standard score

(MSS) (Tanner, 2012; Zacharski, 2012) (see Formula

1), which prevents the inﬂuence of the outliers. Each

risk proﬁle column is normalised separately. The

modiﬁed standard score demonstrates how big the de-

viation from the median value M is. First the median

value for each column is calculated. The median is

a middle value from the list, arranged from lowest to

highest value. Then, based on the median, the abso-

lute standard deviation can be calculated. In the third

step, computed risk factors are visualised for a given

dimension. For example, one such dimension is the

relation between institutional and expert risk proﬁles.

Finally, a cyber analyst should analyse the resulting

plot in the context of a particular SA task.
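The normalisation step can be illustrated with a minimal Python sketch of Formula 1, applied to a single risk factor column. The function name `modified_standard_scores` and the convention of returning zeros when every rating equals the median are assumptions made for this illustration and are not taken from the described tool. The sample ratings are the "Forward Compatibility" row of Table 1.

```python
from statistics import median


def modified_standard_scores(column):
    """Return the modified standard score of each rating in one column."""
    m = median(column)
    # Absolute standard deviation: mean absolute distance from the median.
    asd = sum(abs(value - m) for value in column) / len(column)
    if asd == 0:
        # Assumed convention: identical ratings carry no deviation.
        return [0.0 for _ in column]
    return [(value - m) / asd for value in column]


# "Forward Compatibility" ratings of experts E1..E5 (Table 1).
forward_compatibility = [2, 5, 2, 4, 1]
scores = modified_standard_scores(forward_compatibility)
print(scores)
```

Because the median (here 2) and the mean absolute deviation (here 1.2) are used instead of the mean and the standard deviation, a single extreme rating shifts the scores of the other experts less than it would under the ordinary standard score.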

Figure 1: The risk factor visualisation workflow. (Flowchart: START; 1. Read risk factors; 2. Normalize each factor (Factor 1 ... Factor n) using the median and the absolute standard deviation; 3. Visualize risk factor dimensions as a plot of risk factors for individual expert profiles; 4. Analyze plot; END.)

Figure 2: The workflow for autocompletion of a user risk profile. (Flowchart: START; 1. Read expert profiles (Profile 1 ... Profile n) and read user profile; 2. Calculate the n expert risk vectors and the user risk vector; 3. Find nearest neighbour by cosine similarity; 4. Complete user profile by expert profile settings, giving the autocompleted user risk profile; 5. Trigger required SA actions; END.)

The calculation of the nearest risk profile is described in the workflow shown in Fig. 2. The risk profile data collected from domain experts is stored in a text file and is used in the classification task. The institutional risk profile that comprises the most important factors for the institution settings is stored in an additional file. The workflow execution starts with the reading of both input files. Input risk profiles are stored in the data model classifier and are converted into risk profile vectors in the second step. Applying the cosine similarity algorithm (Dehak et al., 2010), (Ye, 2011) we find the nearest risk profile from the expert knowledge base. In the next step we merge the detected nearest risk profile with institutional settings and produce the autocompleted institutional risk profile.
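The autocompletion workflow can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of the described tool: the helper names (`cosine_similarity`, `nearest_expert`, `autocomplete`) are hypothetical, the similarity is computed only over the factors the institution has rated, and raw ratings are compared for brevity, whereas the workflow normalises them first. The expert ratings are a subset of columns E1 and E2 of Table 1, and the institutional ratings are taken from column UC3 of Table 2.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_expert(institution, experts):
    """Return (expert id, similarity) of the most similar expert profile.

    Assumption: only the factors rated by the institution are compared.
    """
    rated = sorted(institution)
    inst_vector = [institution[factor] for factor in rated]
    best_id, best_similarity = None, -2.0
    for expert_id, profile in experts.items():
        expert_vector = [profile[factor] for factor in rated]
        similarity = cosine_similarity(inst_vector, expert_vector)
        if similarity > best_similarity:
            best_id, best_similarity = expert_id, similarity
    return best_id, best_similarity


def autocomplete(institution, expert_profile):
    """Fill unrated factors from the expert profile; keep institutional ratings."""
    merged = dict(expert_profile)
    merged.update(institution)
    return merged


experts = {
    "E1": {"Cost": 2, "Rendering Software Availability": 1,
           "Storage Space": 2, "Viruses": 2},
    "E2": {"Cost": 1, "Rendering Software Availability": 1,
           "Storage Space": 1, "Viruses": 3},
}
institution = {"Cost": 2, "Rendering Software Availability": 1,
               "Storage Space": 1}

best_id, best_similarity = nearest_expert(institution, experts)
merged = autocomplete(institution, experts[best_id])
print(best_id, round(best_similarity, 3), merged)
```

In this toy example the institutional ratings override the expert ratings wherever both exist, so only the unrated factor "Viruses" is taken from the nearest expert profile.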

4 EVALUATION

The goal of this evaluation was to leverage the do-

main expert knowledge base for detection of the near-

est risk proﬁle as described in the workﬂow for auto-

completion of a user risk proﬁle (see Fig. 2), pointing

out potential risks relevant for Situation Awareness

and exploitation of aggregated data for visualisation

of risk factor coherences. This process is described in

the risk factor visualization workﬂow (see Fig. 1).

4.1 Hypothesis and Evaluation Methods

of the Risk Factor Analysis

The hypothesis is that similar risk factor proﬁles auto-

matically aggregated from the domain expert knowl-

edge base are located close to each other in the plot

for a particular dimension. Therefore, a cyber analyst

can easily detect alternative risk factor proﬁles with

particular features for a speciﬁc task. Our approach

should give an organisation a base of information that

helps to determine an alternative risk proﬁle with the

required feature set. This decision should be the best

choice for the organisation’s Situation Awareness in-

frastructure. The employment of data mining tech-

niques facilitates this task for a cyber analyst by per-

forming complex calculations and comparisons.

In all scenarios (see section 4.3), we performed the

sample risk proﬁle calculation for different use cases

of Situational Awareness. The hypothesis is that a cy-

ber analyst will deﬁne some of the most important risk

factors and apply them as an input to the data mining

tool. The output of the tool should be the given input

completed with the risk factor settings for the remaining risk factors from the nearest expert risk profile.

The calculated profile then supports decision making (e.g. file format selection) for the critical infrastructure configuration and triggers the required Situational Awareness actions.

Each evaluation scenario addresses the visualisa-

tion of risk factor dimensions. The hypothesis is that

visualisation of particular risk factor dimensions will

facilitate and speed up endangerment analysis and

demonstrate a level of agreement between important

risk factors. Thus, a cyber analyst can adjust required

risk factor settings in order to reduce SA risks. We

evaluated different risk proﬁle dimensions for the se-

lected risk factors.

4.2 Evaluation Data Set

The basis for the risk metrics calculation was pro-

vided through two exploratory studies organised by

Heather Ryan (Ryan, 2014), (Ryan et al., 2015)

in which 170 digital preservation experts evaluated

thirty one ﬁle format endangerment factors (see Ta-

ble 1). Table 1 represents the expert proﬁles from

the dataset from the ﬁle format survey data. In the

survey, digital preservation experts rated 31 risk factors on a scale of 1 to 5, where 1 stands for an extremely important impact of the risk factor and 5 for a low impact. We interpreted the experts' ratings of the endangerment factors as levels of risk associated

with each factor. The risk estimation ratings from the

trusted digital preservation experts were evaluated for


each of these factors based on their knowledge, experience and expertise. The columns to the right of the “Risk Factor” column present the first five expert risk profiles out of the 170 participants, where the number marks an expert index, e.g. E1 for “Expert 1”.

For evaluation purposes well known risk factors

were selected and each risk factor was graded in the

range from 1 (high impact on preservation risk) to 5

(low impact on preservation risk).

For evaluation of the modiﬁed standard score in

the visualisation sample four selected expert proﬁles

are used (see Table 2). These manually aggregated

metrics were used as input data by the tool for visualisation and analysis of risk factor coherences.

4.3 Experimental Results and

Interpretation

One of the main information sources for SA is log files. In the first scenario we investigate risk factors

that are important to secure rendering and accessibil-

ity of these data (see Table 2). The columns to the

right of the “Risk Factor” column present four use

case risk proﬁles, whereas the number marks a use

case index. In the second scenario, we consider a

use case, where a format should be selected to cre-

ate a backup in case of attack, or to restore data af-

ter cleaning and new installation. The third use case

is intended for graphical data e.g. maps or images

and the selection of associated ﬁle formats e.g. “GIF”

or “PNG”. Important risk factors here are “Render-

ing Software Availability” and “Storage Space”. We

focus the use case description on this scenario. The fourth scenario concerns the choice of format for instructions, e.g. “PDF”, and for documentation, e.g. “DOC”.

The experimental results are presented in Table

3 that shows the institutional, nearest expert, and

merged risk proﬁles for the graphical use case, calcu-

lated by the modiﬁed standard score. The associated

original values for the risk profile presented in column “Inst” can be found in Table 2 in the column “UC3”, and the values presented in the “Expert” column belong to one of the 170 expert risk profiles. Additional risk profiles are presented in Table 1.

The experimental results are visualized in Figure

3 according to the graphics use case. The institutional

settings for risk factors are ﬂagged by the green cir-

cles.

In the calculated log ﬁle use case, the maximal

cosine similarity 0.584 was found for the anonymized

expert profile with index 79. This expert profile directs
the cyber analyst's attention to additional important risk

factors, such as “Developer/Corporate Support”, “Le-

Table 1: Risk Factor Ratings for Digital Content Preserva-

tion from the Survey.

Expert Proﬁles ExpertID

Risk Factor E1 E2 E3 E4 E5

Availability Online 2 2 3 2 2

Backward Compatibility 1 1 2 1 1

Community Support 2 1 2 2 2

Complexity 1 2 2 3 2

Compression 3 4 2 2 2

Cost 2 1 2 2 1

Developer/Corporate Support 2 2 2 1 2

Domain Speciﬁcity 3 2 2 2 3

Ease Of Identiﬁcation 1 2 2 3 2

Ease Of Validation 1 4 2 3 2

Error Tolerance 2 1 2 2 2

Expertise Available 2 2 2 1 2

Forward Compatibility 2 5 2 4 1

Geographical Spread 3 4 4 4 3

Institutional Policies 1 4 3 4 1

Legal Restrictions 1 1 2 1 1

Life Time 2 3 3 2 3

Metadata Support 2 3 1 2 1

Rendering Software Availability 1 1 1 1 1

Rendering Software Functionality 2 2 1 3 1

Revision Rate 3 3 2 2 3

Speciﬁcations Available 2 2 1 1 2

Speciﬁcation Quality 2 2 1 2 2

Standardization 2 2 1 3 1

Storage Space 2 1 2 3 2

Technical Dependencies 2 1 2 1 1

Technical Protection Mechanism 1 1 2 3 2

Third Party Support 2 1 2 2 2

Ubiquity 2 1 2 2 2

Value 1 1 2 3 1

Viruses 2 3 2 3 3

Table 2: Use Case Risk Proﬁles.

Institutional Proﬁles Use Case ID

Risk Factor UC1 UC2 UC3 UC4

Availability Online 1 - - 1

Backward Compatibility 1 - - -

Community Support 5 - - -

Complexity - - - -

Compression - 1 - -

Cost 2 - 2 1

Developer/Corporate Support - - - -

Domain Speciﬁcity - - - -

Ease Of Identiﬁcation - - 3 4

Ease Of Validation - 2 - 2

Error Tolerance - - - -

Expertise Available 1 - - 4

Forward Compatibility - - - 1

Geographical Spread - - - -

Institutional Policies 1 1 2 -

Legal Restrictions - - - -

Life Time 2 2 - -

Metadata Support - - - -

Rendering Software Availability 1 1 1 1

Rendering Software Functionality 1 2 2 2

Revision Rate - - - -

Speciﬁcations Available 1 - 4 5

Speciﬁcation Quality 2 - - -

Standardization - - - -

Storage Space 1 1 1 4

Technical Dependencies 1 1 - -

Technical Protection Mechanism 1 1 - -

Third Party Support - - - -

Ubiquity - - - -

Value - - - -

Viruses - - - -

gal Restrictions”, “Metadata Support”, “Value” and

“Viruses”. This should trigger the required SA ac-

tions to reduce these risks. For example, ensure that

the cyber analyst has software developer contact data,

verify that there is a sufficient level of access rights for the analyst’s role, review licence expiration dates, examine how to identify existing formats, etc.


For the backup use case the best match 0.587

was found for the anonymized expert proﬁle with in-

dex 36. This expert proﬁle directs the cyber ana-

lyst’s attention to additional important risk factors,

such as “Availability Online”, “Community Support”,

“Cost”, “Error Tolerance”, “Geographical Spread”,

“Legal Restrictions”, “Standardization”, “Ubiquity”

and “Value”. This indicates that the cyber analyst

should determine that there is enough free space for

the backup in the storage, consider what should be in-

cluded in the backup, conﬁrm that the backup data is

renderable and conﬁrm that possible errors will not

compromise the backup. Not all suggested risk factors are applicable to every case. For example, “Geographical Spread” might not be important in the SA domain. In this case the cyber analyst can adjust the related risk factor according to the requirements of the institution. Suggested risk factors provide an idea about what should be considered in the decision making process.

In the calculated graphics use case, the maxi-

mal cosine similarity 0.484 was computed for the

anonymized expert proﬁle with index 9. This ex-

pert proﬁle (see Figure 3) considers additional impor-

tant risk factors, such as “Backward Compatibility”,

“Compression”, “Error Tolerance”, “Forward Com-

patibility”, “Metadata Support” and “Third Party Sup-

port”. This indicates that the cyber analyst should re-

view selected format documentation to estimate the

complexity and forward/backward compatibility of

the format. The goal here is to ensure image renderability.

Figure 3: Plot of the relation between the graphics risk factor settings of the institutional expert and the nearest expert profile. (Plot “Graphics Risk Profile”: X axis “Risk Factors”, numbered 1 to 31; Y axis “Rating”; series “Expert 3” and “Graphics”.)

For the documentation use case the best match

0.586 was calculated for the anonymized expert pro-

ﬁle with index 54. This expert proﬁle suggests

additional important risk factors, such as “Back-

ward Compatibility”, “Community Support”, “Devel-

oper/Corporate Support”, “Error Tolerance”, “Meta-

Table 3: The Most Similar Risk Profile for the Graphical Data Scenario.

Risk Factor Inst Expert Generated

Availability Online - 2 2

Backward Compatibility - 1 1

Community Support - 2 2

Complexity - 2 2

Compression - 1 1

Cost 2 1 2

Developer/Corporate Support - 2 2

Domain Speciﬁcity - 2 2

Ease Of Identiﬁcation - 3 3

Ease Of Validation - 2 2

Error Tolerance - 1 1

Expertise Available - 3 3

Forward Compatibility - 1 1

Geographical Spread - 4 4

Institutional Policies 2 4 2

Legal Restrictions - 2 2

Life Time - 2 2

Metadata Support - 1 1

Rendering Software Availability 1 3 1

Rendering Software Functionality 2 3 2

Revision Rate - 4 4

Speciﬁcations Available 4 3 4

Speciﬁcation Quality - 5 5

Standardization - 2 2

Storage Space 1 1 1

Technical Dependencies 1 1 1

Technical Protection Mechanism - 2 2

Third Party Support - 1 1

Ubiquity - 2 2

Value - 3 3

Viruses - 2 2

data Support”, “Value” and “Viruses”. This indicates

that the cyber analyst should examine existing docu-

mentation and instructions for renderability.
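The merge rule implied by Table 3 can be checked with a short Python sketch: wherever the institution has rated a factor, its rating is kept, and every unrated factor (shown as "-" in the table) is taken from the nearest expert profile. The list names and the use of `None` for unrated factors are assumptions made for this illustration; the values are transcribed from the "Inst", "Expert" and "Generated" columns of Table 3 in row order.

```python
# Values from Table 3, in row order (Availability Online ... Viruses).
# None stands for a factor the institution did not rate ("-").
inst = [None, None, None, None, None, 2, None, None, None, None,
        None, None, None, None, 2, None, None, None, 1, 2,
        None, 4, None, None, 1, 1, None, None, None, None, None]
expert = [2, 1, 2, 2, 1, 1, 2, 2, 3, 2,
          1, 3, 1, 4, 4, 2, 2, 1, 3, 3,
          4, 3, 5, 2, 1, 1, 2, 1, 2, 3, 2]
generated = [2, 1, 2, 2, 1, 2, 2, 2, 3, 2,
             1, 3, 1, 4, 2, 2, 2, 1, 1, 2,
             4, 4, 5, 2, 1, 1, 2, 1, 2, 3, 2]

# Keep the institutional rating where present, otherwise use the expert's.
merged = [i if i is not None else e for i, e in zip(inst, expert)]

# Factors (1-based row numbers) where the institution overrides the expert.
overridden = [row for row, (i, e) in enumerate(zip(inst, expert), start=1)
              if i is not None and i != e]
print(merged == generated, overridden)
```

The rows listed in `overridden` are those where the institutional setting differs from the expert rating, which are the factors an analyst may want to review first.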

Figure 3 shows a visualisation of the values from the associated Table 3. This plot demonstrates the relation between two selected expert setting vectors for the 31 evaluated risk factors on the X axis. The associated risk factor labels are presented in Table 2. The Y axis is the range of the risk factor ratings. The figure shows that the selected expert profiles demonstrate good agreement with the institutional profile.

This approach should support the deﬁnition of

institutional policies for SA risk calculation. This

knowledge about risks can reduce the endangerment level of digital data by providing the cyber analyst with an opportunity to ameliorate potential issues before they pose a more substantial threat. Employing the provided algorithm, the cyber analyst can either select between predefined expert settings or estimate important risk factors themselves and find the most similar expert profile for the definition of the remaining values.

In order to acquire more speciﬁc information

about a particular ﬁle format the File Format Metadata

Aggregator (FFMA) tool (Graf and Gordea, 2012) is

employed.

These results demonstrate that a semi-automatic

approach for risk factor visualisation is effective and is a significant improvement compared with manual analysis for planning and validation of design for

critical infrastructure. Resulting actions of the pre-


sented analysis tool may be validation, checking or

updating of the software.

5 CONCLUSIONS

In this work we presented an approach for the easy

creation of an institutional risk proﬁle in Situational

Awareness. We apply models employed in the digital preservation domain to analysis in the cyber SA field.

The main contribution of this work is the employment of data mining techniques to support the set-up of risk factors from a few of the most important values for a particular institution. The resulting risk profile is used

to support cyber analysts with semi-automatic estima-

tion of endangerment level for ﬁle formats.

The presented method employs a domain expert

knowledge base collected through a survey to detect

Situational Awareness risks for particular institutions.

Another contribution is support for the visualisa-

tion and analysis of risk factors. To facilitate easier

decision-making, the collected information about the

risk factors is presented as a multidimensional vector.

The proposed methods improve the interpretability of

risk factor information and the quality of the Situa-

tional Awareness process.

We make use of the modiﬁed standard score data

mining method to analyse the collected data, and the

cosine similarity calculation to compare risk proﬁles.

In the evaluation section, different risk factor di-

mensions are exposed. The presented plots demon-

strate coherences in risk factors and help solve prac-

tical Situational Awareness issues. Using the devel-

oped approach and adjusting input data, cyber ana-

lysts have the ability to choose the appropriate risk

factor setting for digital preservation planning in their

institution.

The presented approach is designed to facilitate

decision making for the Situational Awareness using

domain expert knowledge. As future work we plan

to increase the amount and quality of collected expert

information and to extend the tool with additional vi-

sualisation scenarios.

REFERENCES

Ayris, P., Davies, R., McLeod, R., Miao, R., Shenton, H.,

and Wheatley, P. (2008). The life2 ﬁnal project report.

Final project report, LIFE Project, London, UK.

Barford, P., Dacier, M., Dietterich, T., Fredrikson, M., Gif-

ﬁn, J., Jajodia, S., Jha, S., Li, J., Liu, P., Ning, P., Ou,

X., Song, D., Strater, L., Swarup, V., Tadda, G., Wang,

C., and Yen, J. (2010). Cyber sa: Situational aware-

ness for cyber defense. In Jajodia, S., Liu, P., Swarup,

V., and Wang, C., editors, Cyber Situational Aware-

ness, volume 46 of Advances in Information Security,

pages 3–13. Springer US.

Tarrant, D., Hitchcock, S., and Carr, L. (2011). Where the semantic web and web 2.0 meet format risk management: P2 registry. International Journal of Digital Curation, 6(1):165–182.

Dehak, N., Dehak, R., Glass, J., Reynolds, D., and Kenny,

P. (2010). Cosine similarity scoring without score

normalization techniques. In Proceedings of Odyssey

2010 - The Speaker and Language Recognition Work-

shop (Odyssey 2010), pages 71–75.

Graf, R. and Gordea, S. (2012). Aggregating a knowledge

base of ﬁle formats from linked open data. Proceed-

ings of the 9th International Conference on Preserva-

tion of Digital Objects, poster:292–293.

Graf, R. and Gordea, S. (2013). A risk analysis of ﬁle

formats for preservation planning. In Proceedings

of the 10th International Conference on Preservation

of Digital Objects (iPres2013), pages 177–186, Lisbon, Portugal. Biblioteca Nacional de Portugal, Lisboa.

Graf, R., Gordea, S., and Ryan, H. M. (2015). A tool

for facilitating an institutional risk proﬁle deﬁnition.

17th International Conference on Digital Information

Management (ICDIM2015), pages 2998 – 3003.

Hunter, J. and Choudhury, S. (2006). Panic: an integrated

approach to the preservation of composite digital ob-

jects using semantic web services. International Jour-

nal on Digital Libraries, 6, (2):174–183.

Jackson, A. N. (2012). Formats over time: Exploring uk

web history. Proceedings of the 9th International

Conference on Preservation of Digital Objects, pages

155–158.

Kott, A. and Wang, C. (2014). Cyber Defense and Situa-

tional Awareness. Springer International Publishing

Switzerland.

Lawrence, G. W., Kehoe, W. R., Rieger, O. Y., Walters,

W. H., and Kenney, A. R. (2000). Risk management

of digital information: A ﬁle format investigation.

Morita, P., Burns C.M., H., and He, Q. (2011). Situa-

tion awareness and risk management understanding

the notiﬁcation issues. In Stud Health Technol Inform.

PubMed.

Pearson, D. and Webb, C. (2008). Deﬁning ﬁle format obso-

lescence: A risky journey. The International Journal

of Digital Curation, Vol 3, No 1:89–106.

Ryan, H. (2014). Occam’s razor and ﬁle format endan-

germent factors. In Proceedings of the 11th Interna-

tional Conference on Preservation of Digital Objects

(iPres2014) (accepted for publication), Melbourne,

Australia.

Ryan, H., Graf, R., and Gordea, S. (2015). Human and

machine-based ﬁle format endangerment notiﬁcation

and recommender systems development. In Proceed-

ings of the 12th International Conference on Digi-

tal Preservation (iPres2015), Chapel Hill, North Car-

olina, USA. UNC.

Salmon, P., Stanton, N., Walker, G., and Green, D. (2006).

Situation awareness measurement: A review of appli-


cability for {C4i} environments. Applied Ergonomics,

37(2):225 – 238.

Tanner, D. (2012). Using statistics to make educa-

tional decisions. Library of Congress Cataloging-in-

Publication Data, pages 77–104.

Vermaaten, S., Lavoie, B., and Caplan, P. (2012). Identify-

ing threats to successful digital preservation: the spot

model risk assessment. D-Lib Magazine, 18(9/10).

Ye, J. (2011). Cosine similarity measures for intuitionistic

fuzzy sets and their applications. Mathematical and

Computer Modelling, 53(12):91 – 97.

Zacharski, R. (2012). A Programmer’s Guide to Data Min-

ing: The Ancient Art of the Numerati.
