NOVEL AND ANOMALOUS BEHAVIOR DETECTION USING BAYESIAN NETWORK CLASSIFIERS

Salem Benferhat and Karim Tabia

CRIL - CNRS UMR8188, Université d’Artois, Rue Jean Souvraz SP 18, 62307 Lens Cedex, France

Keywords:

Bayesian classifiers, Intrusion detection, Anomaly approach, Novelty detection.

Abstract:

Bayesian networks have been widely used in intrusion detection. However, most works have shown that they are ineffective for anomaly detection, since novel attacks and new behaviors are not efficiently detected. In this paper, we first analyze this problem, which is due to the inadequate treatment of novel and unusual behaviors and to insufficient decision rules that do not meet the requirements of the anomaly approach. We accordingly propose to enhance the standard Bayesian classification rule in order to fit anomaly detection objectives and effectively detect novel attacks. We carried out experimental studies on recent, real http traffic and showed that Bayesian classifiers using the enhanced decision rules detect most novel attacks without triggering significantly higher false alarm rates.

1 INTRODUCTION

Intrusion detection aims at detecting any malicious action compromising the integrity, confidentiality or availability of computer and network resources or services (Axelsson, 2000). Intrusion detection systems (IDSs) are either misuse-based (Snort, 2002), anomaly-based (Neumann and Porras, 1999), or a combination of both approaches that exploits their mutual complementarity (Tombini et al., 2004). Anomaly-based approaches build profiles representing normal activities and detect intrusions by comparing current system activities with the learnt profiles. Every significant deviation may be interpreted as an intrusion since it represents an anomalous behavior. The main advantage of anomaly approaches lies in their potential capacity to detect both known and novel attacks. However, there is no anomaly approach ensuring an acceptable tradeoff between attack detection and the underlying false alarm rate.

Intrusion detection can be viewed as a classification problem: audit events (network packets, Web server logs, system logs, etc.) are classified as normal events or attacks. Several works used classifiers in intrusion detection (Kruegel et al., 2003)(Sebyala et al., 2002)(Valdes and Skinner, 2000) and achieved acceptable detection rates on well-known benchmarks such as KDD’99 (Lee, 1999) and Darpa’99 (Lippmann et al., 2000). Bayesian networks are an example of such classifiers and have been widely used in intrusion detection. In comparison with other classifiers, the main advantage of Bayesian classifiers for detecting anomalous behaviors is that they use all the features and feature dependencies. For instance, in (Valdes and Skinner, 2000), a naive Bayes classifier is used to detect malicious audit events, while in (Kruegel et al., 2003) the authors use Bayesian classification to improve the aggregation of the outputs of different anomaly detection models. The recurring problem with the majority of classifiers is their high false negative rate, mostly caused by their inability to correctly classify novel attacks (Elkan, 2000)(Lee, 1999). For instance, in (Benferhat and Tabia, 2005)(Barbará et al., 2001), decision trees and variants of Bayes classifiers are used to classify network connections, and the authors conclude that the main problem of these classifiers lies in their failure to detect novel attacks, which they classify as normal connections.

In this paper, we first analyze and explain the problem of high false negative rates and the inability of Bayesian classifiers to correctly classify novel behaviors, especially malicious ones. We first focus on how new behaviors affect, and manifest through, a given feature set. Then we explain why the standard Bayesian classification rule fails to detect these new events. We consider, on the one hand, problems related to the handling of unusual and new behaviors and, on the other hand, problems due to insufficient decision rules that do not meet anomaly detection requirements. After that, we propose to enhance standard Bayesian classifiers with four new decision rules in order to improve the detection of novel attacks involving abnormal behaviors. The main objective of enhancing standard Bayesian classifiers is to detect and identify both known and novel attacks. Experimental studies on recent real and simulated http traffic are carried out to evaluate the effectiveness of the new decision rules in detecting new intrusive behaviors. Two variants of Bayesian classifiers using the enhanced classification rule are trained on real and recent http traffic involving normal data and several Web attacks. We then evaluate these classifiers on known and novel attacks as well as on novel normal behaviors.

Salem S. and Tabia K. (2008). NOVEL AND ANOMALOUS BEHAVIOR DETECTION USING BAYESIAN NETWORK CLASSIFIERS. In Proceedings of the International Conference on Security and Cryptography, pages 13-20. DOI: 10.5220/0001923300130020. SciTePress

The rest of this paper is organized as follows: Section 2 briefly presents Bayesian networks and Bayesian classification. In Section 3, we introduce the anomaly detection approach. Section 4 focuses on the problems of Bayesian classification in detecting novel attacks. Section 5 proposes enhancements to the standard Bayesian classification rule in order to improve the detection of novel attacks. Experimental studies on http traffic are presented in Section 6. Section 7 concludes this paper.

2 BAYESIAN CLASSIFIERS

Anomaly detection can be viewed, to some extent, as a classification problem. Classifiers are mapping functions from a discrete or continuous feature space to a discrete set of class labels. Once a classifier is built on labeled training data, it can classify any new instance. Decision trees (Quinlan, 1986) and Bayesian classifiers (Friedman et al., 1997) are well-known classifiers.

Bayesian networks are powerful graphical models for representing and reasoning under uncertainty (Jensen, 1996). They consist of a graphical component, a DAG (Directed Acyclic Graph), and a quantitative probabilistic component. The graphical component allows an easy representation of domain knowledge in the form of an influence network (vertices represent events while edges represent "influence" relations between these events). The probabilistic component expresses the uncertainty in the relationships between domain variables using conditional probability tables.

Bayesian classification is a particular kind of Bayesian inference. Classification consists in finding the class with the greatest a posteriori probability given an attribute vector. Namely, given an attribute vector $A$ (observed variables $A_1 = a_1, \ldots, A_n = a_n$), it is required to find the most plausible class value $c_k$ ($c_k \in C = \{c_1, c_2, \ldots, c_m\}$) for this observation. The class $c_k$ associated with $A$ is the class with the greatest a posteriori probability $p(c_k/A)$. The Bayesian classification rule can then be written as follows:

$$Class = \arg\max_{c_k \in C} \; p(c_k/A) \qquad (1)$$

The term $p(c_k/A)$ denotes the posterior probability of class $c_k$ given the evidence $A$. This probability is computed using Bayes' rule as follows:

$$p(c_k/A) = \frac{p(A/c_k) \cdot p(c_k)}{p(A)} \qquad (2)$$

In practice, the denominator of Equation 2 is ignored because it is the same for all classes. Equation 2 means that the posterior probability is proportional to the product of the likelihood and the prior probability, while the evidence probability $p(A)$ is just a normalizing constant. For reasons of computational complexity, the naive Bayes classifier (which is the simplest form of Bayesian network) assumes that features are conditionally independent given the class variable. This assumption leads to the following formula:

$$p(c_k/A) = \frac{p(a_1/c_k) \cdot p(a_2/c_k) \cdots p(a_n/c_k) \cdot p(c_k)}{p(A)} \qquad (3)$$

In other Bayesian classifiers such as TAN (Tree Augmented Naive Bayes), BAN (Bayesian network Augmented Naive Bayes) and GBN (General Bayes Network), the likelihood in Equation 2 takes feature dependencies into account when computing conditional probabilities, as shown in Equation 4:

$$p(c_k/A) = \frac{p(a_1/Pa(a_1)) \cdots p(a_n/Pa(a_n)) \cdot p(c_k)}{p(A)} \qquad (4)$$

The terms $Pa(a_i)$ denote the parents of feature $a_i$. Note that learning a naive Bayes classifier only requires estimating the conditional probability tables from training data, since its structure is fixed. The other Bayesian classifiers require both structure and parameter learning.
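A minimal sketch of Equations 1 to 3 may make the computation concrete. It assumes nominal features and plain frequency (maximum-likelihood) estimates without any smoothing; the class `NaiveBayes` and its methods are illustrative names, not a library API or the authors' implementation. Unlike most implementations, it computes $p(A)$ explicitly so that the returned posteriors sum to one.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Frequency-based naive Bayes for nominal features (Equations 1-3)."""

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # counts[c][i][v]: number of training rows of class c whose feature i equals v
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[c][i][v] += 1
        return self

    def prior(self, c):
        return self.class_counts[c] / self.n

    def cond(self, i, v, c):
        # Maximum-likelihood estimate of p(a_i = v / c); 0.0 for unseen values.
        return self.counts[c][i][v] / self.class_counts[c]

    def likelihood(self, row, c):
        # Naive Bayes factorization: p(A/c) = p(a_1/c) * ... * p(a_n/c)
        p = 1.0
        for i, v in enumerate(row):
            p *= self.cond(i, v, c)
        return p

    def posteriors(self, row):
        joint = {c: self.likelihood(row, c) * self.prior(c) for c in self.class_counts}
        evidence = sum(joint.values())  # p(A), the denominator of Equation 2
        if evidence == 0.0:
            # Every likelihood is zero (e.g. a never-seen value): no posterior exists.
            return {c: 0.0 for c in joint}
        return {c: j / evidence for c, j in joint.items()}

    def predict(self, row):
        post = self.posteriors(row)
        return max(post, key=post.get)  # Equation 1: argmax over the classes


rows = [("GET", "200"), ("GET", "200"), ("GET", "404"), ("POST", "500")]
labels = ["Normal", "Normal", "Normal", "Attack"]
nb = NaiveBayes().fit(rows, labels)
print(nb.predict(("GET", "200")))   # Normal
print(nb.predict(("POST", "500")))  # Attack
```

The all-zero case is deliberately left unresolved in this sketch; how implementations patch it, and why that matters for novel attacks, is discussed in Sections 4 and 5.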

3 ANOMALY DETECTION

Anomaly approaches build models or profiles representing normal activities and detect intrusions by computing the deviations of current system activities from the normal activity profile. Every significant deviation may be interpreted as an intrusion since it represents an anomalous behavior. Anomaly-based IDSs are efficient in detecting new attacks but cause high false alarm rates, which may seriously hinder the deployment of anomaly-based IDSs in real environments. In fact, configuring anomaly-based systems for acceptable false alarm rates results in failure to detect most malicious activities. The main advantage of anomaly detection lies in its potential capacity to detect new and unknown (previously unseen) attacks as well as known ones. The capacity to detect unknown/new attacks is a key feature of IDS effectiveness. This is particularly critical since new attacks appear every day, and it often takes several days between the appearance of a new attack and the update of signature databases or the fixing of the exploited vulnerability.

SECRYPT 2008 - International Conference on Security and Cryptography

In (Kumar and Spafford, 1994), the authors maintain that the intrusive activities used to extract signatures or train detection systems are a subset of anomalous behaviors, and they point out four audit event possibilities with non-zero probabilities:

• Intrusive but not anomalous (False Negative): attacks whose input data do not capture any anomalous evidence. This is usually due to a feature extraction problem. Therefore, new attacks often require supplementary features and data in order to be detected.

• Not intrusive but anomalous (False Positive): commonly called false alarms, these events are legitimate but new. Consequently, they significantly deviate from the normal event profile. This problem requires updating normal profiles in order to integrate such new normal events.

• Not intrusive and not anomalous (True Negative): these correspond to known normal events.

• Intrusive and anomalous (True Positive): these events correspond to attacks whose intrusive evidence is captured by the input data.

In practice, when profiling normal activities for anomaly detection purposes, only a subset of normal activities is profiled. This fact partly explains the false alarm rates of anomaly-based IDSs. In (Kruegel et al., 2003), other problems causing high false alarm rates were identified, such as simplistic aggregation of individual anomaly scores. Similarly, building attack models or profiles involves only a subset of all intrusive activities and attack variants. This results in failure to detect several new attacks and attack variants. For instance, feature extraction focuses on known attacks and normal events in order to differentiate between normal and intrusive audit events. Consequently, there will always be new attacks whose evidence is not captured by old feature sets. In order to analyze why standard Bayesian classification fails to detect novel attacks, we particularly focus on how novel attacks involving new behaviors affect the feature sets that provide the input data to be analyzed.

3.1 Novel Attacks’ Impact on Feature Sets

The following are the different ways in which new anomalous events can affect and manifest through feature sets:

1. New Value(s) in a Feature(s). A never-seen value is anomalous and, in most cases, is due to a malicious event. For example, Web server response codes belong to a fixed set of predefined values (e.g., 200, 404, 500, ...). If a new response code, or any other unusual response, is observed, then this constitutes an anomalous event. For instance, successful shell code attacks can cause the server to respond without a common response code. Similarly, a network service using a new and uncommon port number is probably intrusive, since most back-door attacks communicate through uncommon ports while common services are associated with common port numbers.

2. New Combination of Known Values. In normal audit events, there are correlations and relationships between features. An anomalous event can therefore take the form of a never-seen combination of normal values. For example, in some http requests, numerical values are provided as parameters. The same values, correctly handled by a given program, can in other contexts cause value misinterpretations and result in anomalous behaviors.

3. New Sequence of Events. Several kinds of normal audit events show sequence patterns. For example, in on-line marketing applications, users are first authenticated using the https protocol for confidential data transfers. A user session beginning without https authentication is then probably intrusive, since the application control flow has not been followed. Such intrusive evidence can be captured by history features summarizing past events or by using appropriate algorithms for mining anomalous sequence patterns.

4. No Anomalous Evidence. In this case, new anomalous events do not result in any unseen evidence. The underlying problem here is related to feature extraction and selection, since not enough data is used to capture the anomalous evidence.
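The first two cases can be checked directly against the training data. The sketch below, assuming nominal features, flags never-seen values (case 1) and never-seen pairs of individually known values (case 2, restricted here to feature pairs). The helper `novelty_evidence` and the example features are hypothetical; case 3 would additionally need history features, and case 4 is, by definition, invisible to such a check.

```python
from itertools import combinations


def novelty_evidence(train_rows, event):
    """Return (new_values, new_pairs) for one audit event.

    new_values: (feature index, value) never observed in training (case 1).
    new_pairs:  (i, j) feature pairs whose values were each observed, but
                never together in the same training row (case 2).
    """
    n = len(event)
    seen_values = [set() for _ in range(n)]
    seen_pairs = set()
    for row in train_rows:
        for i, v in enumerate(row):
            seen_values[i].add(v)
        for i, j in combinations(range(n), 2):
            seen_pairs.add((i, j, row[i], row[j]))

    new_values = [(i, v) for i, v in enumerate(event) if v not in seen_values[i]]
    new_pairs = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if event[i] in seen_values[i]
        and event[j] in seen_values[j]
        and (i, j, event[i], event[j]) not in seen_pairs
    ]
    return new_values, new_pairs


# Hypothetical features: (method, response code, server port)
train = [("GET", "200", "80"), ("POST", "200", "80"), ("GET", "404", "8080")]
print(novelty_evidence(train, ("GET", "999", "80")))   # ([(1, '999')], [])
print(novelty_evidence(train, ("POST", "404", "80")))  # ([], [(0, 1), (1, 2)])
```

Checking all pairs is quadratic in the number of features; a Bayesian classifier only "sees" the combinations that its learnt structure links, which is why Rule 3 below depends on structure learning.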

From a theoretical point of view, the first three possibilities can be detected since evidence of the intrusive behavior appears in the feature set. For instance, a naive Bayes classifier can be used to detect new values appearing in features because this classifier uses all features. However, new value combinations require using attribute dependencies in order to be detected. TAN, BAN or GBN classifiers (Friedman et al., 1997) can be suitable for detecting such anomalous evidence. As for anomalous sequence patterns, they can be detected by Bayesian classifiers if the feature set includes derived features that properly summarize past event sequences. However, anomalous audit events of the fourth case cannot be detected, for lack of any anomalous evidence in the audit event. In practice, most novel attacks involving novel behaviors are flagged normal due to the inadequate handling of novel and unusual events and to insufficient decision rules.

Note: by a never-seen value we mean a new value in the case of nominal features, or a strongly deviating value in the case of numerical features.

4 WHY STANDARD BAYESIAN CLASSIFIERS FAIL IN DETECTING NOVEL ATTACKS

In intrusion detection, each instance to classify represents an audit event (network packet, connection, application log record, etc.).

Novel attacks often involve new behaviors. However, despite this evidence of anomalousness in the feature set, Bayesian classifiers flag novel attacks as normal events in most cases. This failure is mainly due to the following problems:

1. Inadequate Handling of Novel and Unusual Behaviors: New and unusual values or value combinations are often involved in novel attacks. However, Bayesian classifiers handle such evidence in a way that is inadequate for anomaly detection objectives. For instance, new values cause zero probabilities, which most implementations replace with extremely small values, relying on the remaining features to classify the instance at hand. Another problem with handling new and unusual events is floating point underflow, which happens when multiplying several small probabilities.

2. Insufficient Decision Rules: The objective of standard classification rules is to maximize the correct classification of previously unseen instances relying on known (training) behaviors. However, unseen behaviors, which should be flagged abnormal according to the anomaly approach, are associated with known behavior classes. For instance, Bayesian classifiers rely only on likelihood and prior probabilities to ensure classification. This strongly penalizes the detection of new and unusual behaviors in favor of frequent and common behaviors. As we will see in the experimental studies, standard Bayesian classifiers predict most new normal/intrusive audit events as normal events (Benferhat and Tabia, 2005). Attacks often have specific signatures and may have slight variations. Consequently, a new (or strongly deviating) value in feature $a_i$ will force the conditional probability $p(a_i/Attack_j)$ to zero (or to an extremely negligible value in the case of numeric features). The likelihood of the evidence will then be negligible over all classes. This problem is aggravated by the low prior frequencies of some attack classes. As a consequence, classification will in this case depend on the class prior probabilities. Given that normal events often represent most of the training data (Lippmann et al., 2000)(Elkan, 2000), new audit evidence will be classified normal. Furthermore, normal events are characterized by a very large variance (Benferhat and Tabia, 2008b) because normal activities involve several users, applications, etc. In most cases, this leads to a conditional probability $p(a_i/Normal)$ of an attribute value in the normal class that is greater than zero.

5 ENHANCING STANDARD BAYESIAN CLASSIFICATION RULE

In this section, we focus on enhancing Bayesian classifiers in order to effectively detect novel attacks. Bayesian classification relies on the posterior probabilities of the classes given the evidence to classify (according to Equations 1 and 2). The normality of an audit event $E$ (observed variables $E_1 = e_1, E_2 = e_2, \ldots, E_n = e_n$) can be measured by the posterior probability $p(Normal/E)$. This measure is proportional to the likelihood of $E$ in the Normal class and to the prior probability of the Normal class. In practice, normality cannot be directly inferred from $p(Normal/E)$ because this probability is biased. For instance, most Bayesian classifier implementations ignore the denominator of Equation 2, while zero probability and floating point underflow problems are handled heuristically. Assume, for instance, that a never-seen value has appeared in a nominal feature $e_i$. Then the conditional probability $p(e_i/c_k)$ estimated from training data equals zero for every class $c_k$, and so does the likelihood $p(E/c_k)$ in Equation 2. In most implementations, an extremely small value is assigned to this probability instead. The purpose of assigning non-zero probabilities to new values is to use the remaining features and the prior probabilities to classify the instance at hand. The other problem is floating point underflow, caused by multiplying several small probabilities, each between 0 and 1. This case is often handled by fixing a lower limit when multiplying probabilities. In the following, we propose enhancements in order to better handle novel behaviors and effectively detect novel attacks.
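The underflow problem, and why a fixed lower limit is a crude fix, can be seen in a few lines. The sketch uses made-up likelihoods for 400 features; `score_product` mimics the lower-limit heuristic mentioned above (the floor value is an assumption), while `score_log` sums log-probabilities, a standard alternative that is not part of the method described in this paper.

```python
import math


def score_product(probs, floor=0.0):
    """Multiply probabilities directly, optionally clamping at a lower limit."""
    s = 1.0
    for p in probs:
        s = max(s * p, floor)
    return s


def score_log(probs):
    """Sum of log-probabilities: same ranking as the product, without underflow."""
    return sum(math.log(p) for p in probs)


normal = [1e-3] * 400  # hypothetical per-feature likelihoods under Normal
attack = [2e-3] * 400  # hypothetical per-feature likelihoods under Attack

print(score_product(normal), score_product(attack))                   # 0.0 0.0
print(score_product(normal, 1e-300) == score_product(attack, 1e-300))  # True
print(score_log(attack) > score_log(normal))                          # True
```

Both raw products underflow to 0.0, and clamping at a floor makes the two classes tie at exactly that floor, so the ranking information is lost either way; only the log-space scores still show that the event is more likely under Attack.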

5.1 Enhancing Bayesian Classification Rule to Exploit Normality/Abnormality Duality

Anomaly-based systems flag audit events "Normal" or "Abnormal" according to a normality degree computed for each audit event. Given two scaled functions computing, respectively, the normality and the abnormality of an audit event $E$, these two functions are dual. This property can be formulated as follows:

$$Normality(E) + Abnormality(E) = constant \qquad (5)$$

The intuitive interpretation of this property is that the more normal an event is, the less abnormal it is, and conversely, the less normal an event is, the more abnormal it is. Translated into probability terms, Equation 5 gives the following property:

$$P(Normal/E) + P(Abnormal/E) = 1 \qquad (6)$$

The term $P(Normal/E)$ (resp. $P(Abnormal/E)$) denotes the probability that audit event $E$ is normal (resp. abnormal). Bayesian classifiers associate a probability distribution with the instance to classify (audit event) and return the class having the greatest posterior probability. Let us assume, for instance, that the training data involve normal data (with class label Normal) and several attack categories (labeled $Attack_1, Attack_2, \ldots, Attack_m$). Consider the case where $p(Normal/E)$ is greater than each of the posterior probabilities $p(Attack_1/E), \ldots, p(Attack_m/E)$. In this case, the standard Bayesian rule returns the Normal class according to Equation 1. However, if

$$p(Normal/E) < p(Attack_1/E) + \ldots + p(Attack_m/E),$$

then, according to Equation 6, the probability that audit event $E$ is abnormal, $1 - p(Normal/E)$, is greater than the probability that it is normal. Intuitively, this audit event should be flagged anomalous. We accordingly propose to enhance the standard Bayesian rule as follows:

Rule 1:
If $p(Normal/E) < \sum_{c_k \neq Normal} p(c_k/E)$
then $Class = \arg\max_{c_k \in C,\ c_k \neq Normal} p(c_k/E)$
else $Class = \arg\max_{c_k \in C} p(c_k/E)$

Rule 1 enhances the standard Bayesian classification rule in order to take into account the normality/abnormality duality of audit events. Unlike the standard Bayesian classification rule, Rule 1 first compares the normality and abnormality of audit event $E$ and returns Normal only when the posterior probability $p(Normal/E)$ is not smaller than the sum of the posterior probabilities $p(Attack_j/E)$. When abnormality is greater, this rule returns the attack having the greatest posterior probability.
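Rule 1 can be written as a small function over the posteriors $p(c_k/E)$, however they were obtained. In the sketch below, the function name `rule1` and the example posteriors are illustrative assumptions; the class labelled by `normal` plays the role of Normal and every other class is treated as an attack.

```python
def rule1(posteriors, normal="Normal"):
    """Rule 1: compare p(Normal/E) with the summed attack posteriors."""
    attacks = {c: p for c, p in posteriors.items() if c != normal}
    if posteriors[normal] < sum(attacks.values()):
        # Abnormality outweighs normality: return the most probable attack.
        return max(attacks, key=attacks.get)
    # Otherwise apply the standard rule (Equation 1).
    return max(posteriors, key=posteriors.get)


post = {"Normal": 0.40, "Vulnerability scan": 0.35, "Flooding": 0.25}
print(max(post, key=post.get))  # Normal (standard rule)
print(rule1(post))              # Vulnerability scan
```

Normal is the single most probable class here, yet the attack classes jointly hold 60% of the probability mass, so Rule 1 flags the event.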

5.2 Enhancing Bayesian Classification Rule to Exploit Zero Probabilities

As discussed in Section 3.1, anomalous audit events affect the feature set through new values, new value combinations or new audit event sequences. The classification of anomalous events therefore strongly depends on how the zero-probability and floating point underflow problems are dealt with. However, since a zero probability is due to a new (hence anomalous) value, it constitutes evidence of anomalousness: the instance to classify involves never-seen evidence, and the anomaly approach should flag this audit event anomalous. Similarly, an extremely small a posteriori probability can be interpreted as a very unusual, hence anomalous, event. The standard Bayesian classification rule can accordingly be enhanced in the following way:

• If there is a feature $e_i$ whose conditional probability $p(e_i/c_k)$ equals zero for all training classes $c_k$, then $e_i$ is a new value (never seen in training data). The enhanced Bayesian classification rule can be formulated as follows:

Rule 2:
If $\exists e_i, \forall k, p(e_i/c_k) = 0$ then $Class = New$
else $Class = \arg\max_{c_k \in C} p(c_k/E)$

• New intrusive behaviors can take the form of unseen combinations of seen values. In this case, feature dependencies must be used in order to reveal such anomalousness. Since new value combinations cause zero conditional probabilities, this anomalous evidence can be formulated as follows:

Rule 3:
If $\exists e_i, p(e_i/Pa(e_i)) = 0$ then $Class = New$
else $Class = \arg\max_{c_k \in C} p(c_k/E)$

Note that when building Bayesian classifiers, structure learning algorithms extract feature dependencies from training data. Unseen value combinations therefore cannot be detected if the corresponding dependencies are not extracted during the structure learning phase.
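Rules 2 and 3 only inspect whether some conditional probability is exactly zero, so both reduce to the same check applied to different tables. The sketch below assumes the tables are given as `table[c][i]`. For Rule 3 it assumes, as an interpretation not spelled out in the rule, that when the class is among the parents (as in TAN) the zero must occur for every class, mirroring Rule 2. All names and numbers are illustrative.

```python
def zero_in_every_class(table):
    """True if some feature i has table[c][i] == 0 for every class c."""
    n_features = len(next(iter(table.values())))
    return any(all(table[c][i] == 0.0 for c in table) for i in range(n_features))


def rule2(cond, posteriors):
    # cond[c][i] = p(e_i/c): zero in every class means a never-seen value.
    if zero_in_every_class(cond):
        return "New"
    return max(posteriors, key=posteriors.get)


def rule3(cond_pa, posteriors):
    # cond_pa[c][i] = p(e_i/Pa(e_i)) with class c among e_i's parents:
    # zero in every class means a never-seen combination with the parents.
    if zero_in_every_class(cond_pa):
        return "New"
    return max(posteriors, key=posteriors.get)


post = {"Normal": 0.9, "Attack": 0.1}
# Feature 1 holds a value seen in no training class.
print(rule2({"Normal": [0.6, 0.0], "Attack": [0.1, 0.0]}, post))  # New
# Feature 1's value was seen, but never together with its parent's value.
print(rule3({"Normal": [0.5, 0.0], "Attack": [0.4, 0.0]}, post))  # New
# A zero in the Attack table only is not a new combination.
print(rule3({"Normal": [0.5, 0.2], "Attack": [0.3, 0.0]}, post))  # Normal
```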


5.3 Enhancing Bayesian Classification Rule to Exploit Likelihood of Unusual Attacks

When training classifiers, some attacks often have very small frequencies in the training data sets. The problem with such small prior probabilities is that they strongly penalize the posterior probabilities of the corresponding attacks. This problem was pointed out in (Ben-Amor et al., 2003), where the authors proposed simply duplicating weak classes in order to increase their prior probabilities. An alternative solution is to exploit the likelihood of audit events as if the training classes (Normal, $Attack_1, \ldots, Attack_m$) were equiprobable. Assume, for instance, that intrusive audit event $E$ is likely to be an attack $Attack_j$ (for example, the likelihood $p(E/Attack_j)$ is the greatest). Because of the negligible prior probability of $Attack_j$, the posterior probability $p(Attack_j/E)$ will be extremely small, while $p(Normal/E)$ can be significant since the prior probability of the Normal class is large. We can then rely on the likelihood in order to detect attacks with small frequencies:

Rule 4:
If $\exists Attack_j, \forall k, p(E/Attack_j) \geq p(E/c_k)$ and $p(Normal/E) > p(Attack_j/E)$ and $p(Attack_j) < \varepsilon$
then $Class = Attack_j$
else $Class = \arg\max_{c_k \in C} p(c_k/E)$

This rule helps detect anomalous events whose likelihood is greatest in attacks having extremely small prior probabilities ($p(Attack_j) < \varepsilon$). It is applied only if the proportion of instances of $Attack_j$ in the training data is less than a threshold $\varepsilon$ fixed by the expert. For example, with $\varepsilon$ set to 1%, Rule 4 is applied only to attacks representing less than 1% of the training instances.
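A sketch of Rule 4, assuming the likelihoods $p(E/c_k)$, posteriors $p(c_k/E)$ and priors $p(c_k)$ are already available as dictionaries. The function name, the default $\varepsilon = 0.01$ (the 1% example above) and the numbers in the usage lines are illustrative only.

```python
def rule4(likelihoods, posteriors, priors, eps=0.01, normal="Normal"):
    """Rule 4: trust the likelihood for attacks rarer than eps in training data."""
    top = max(likelihoods.values())
    for c in likelihoods:
        if (
            c != normal
            and likelihoods[c] >= top               # p(E/Attack_j) >= p(E/c_k) for all k
            and posteriors[normal] > posteriors[c]  # p(Normal/E) > p(Attack_j/E)
            and priors[c] < eps                     # Attack_j is rare in training data
        ):
            return c
    return max(posteriors, key=posteriors.get)      # Equation 1


priors = {"Normal": 0.9, "Buffer overflow": 0.0001, "Vulnerability scan": 0.0999}
lik = {"Normal": 1e-6, "Buffer overflow": 1e-4, "Vulnerability scan": 1e-8}
joint = {c: lik[c] * priors[c] for c in priors}
post = {c: j / sum(joint.values()) for c, j in joint.items()}
print(max(post, key=post.get))   # Normal (standard rule)
print(rule4(lik, post, priors))  # Buffer overflow
```

The event is a hundred times more likely under Buffer overflow than under Normal, but the tiny prior of that class hands the standard rule to Normal; Rule 4 restores the attack label because the class is rarer than $\varepsilon$.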

Note that the standard Bayesian classification rule (see Equation 1) is applied only if none of Rules 1, 2, 3 and 4 applies. As for the priority among these rules, we begin with the zero-probability rules (Rules 2 and 3), then the normality/abnormality duality rule (Rule 1), and finally the likelihood rule (Rule 4).
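Putting the rules together, a decision procedure might apply the zero-probability checks (Rules 2 and 3) first, then normality/abnormality duality (Rule 1), then the likelihood rule (Rule 4), with Equation 1 as the fallback. This is a self-contained sketch under stated assumptions: inputs are dictionaries, Rules 2 and 3 require the zero in every class, and returning New when all likelihoods vanish without either zero-probability rule firing is our own choice, since the rules do not cover that case. Function and argument names are hypothetical.

```python
def enhanced_classify(cond, likelihoods, priors, cond_pa=None, eps=0.01, normal="Normal"):
    """Rules 2/3, then Rule 1, then Rule 4, else Equation 1.

    cond[c][i]     = p(e_i/c)                               (Rule 2)
    cond_pa[c][i]  = p(e_i/Pa(e_i)), class c among parents  (Rule 3, optional)
    likelihoods[c] = p(E/c); priors[c] = p(c)
    """

    def zero_in_every_class(table):
        n = len(next(iter(table.values())))
        return any(all(table[c][i] == 0.0 for c in table) for i in range(n))

    # Rules 2 and 3: a never-seen value or value combination is anomalous.
    if zero_in_every_class(cond) or (cond_pa is not None and zero_in_every_class(cond_pa)):
        return "New"

    joint = {c: likelihoods[c] * priors[c] for c in priors}
    z = sum(joint.values())
    if z == 0.0:
        return "New"  # our assumption: no class can explain the event at all
    post = {c: j / z for c, j in joint.items()}  # Equation 2
    attacks = {c: p for c, p in post.items() if c != normal}

    # Rule 1: abnormality outweighs normality.
    if post[normal] < sum(attacks.values()):
        return max(attacks, key=attacks.get)

    # Rule 4: a rare attack has the greatest likelihood.
    top = max(likelihoods.values())
    for c in attacks:
        if likelihoods[c] >= top and post[normal] > post[c] and priors[c] < eps:
            return c

    return max(post, key=post.get)  # Equation 1


priors = {"Normal": 0.9, "Buffer overflow": 0.0001, "Vulnerability scan": 0.0999}
lik = {"Normal": 1e-6, "Buffer overflow": 1e-4, "Vulnerability scan": 1e-8}
seen = {c: [0.1, 0.2] for c in priors}       # hypothetical, all non-zero
new_value = {c: [0.1, 0.0] for c in priors}  # feature 1 unseen in every class
print(enhanced_classify(seen, lik, priors))       # Buffer overflow (Rule 4)
print(enhanced_classify(new_value, lik, priors))  # New (Rule 2)
```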

6 EXPERIMENTAL STUDIES

In this section, we provide experimental studies of our enhanced Bayesian classification rule on http traffic including real normal data and several http attacks. We first present the training and testing data sets, then provide the experimental results.

6.1 Training and Testing Data Sets

We carried out experiments on real http traffic collected on a university campus during 2007. This traffic includes inbound http connections to the university Web server and outbound http connections from university users requesting outside Web servers. We extracted the http traffic and preprocessed it into connection records using only packet payloads. Each http connection is characterized by four feature categories (Benferhat and Tabia, 2008a):

• Request general features provide general information on http requests. Examples of such features are the request method, request length, etc.

• Request content features search for particularly suspicious patterns in http requests. The number of non-printable characters or metacharacters, the number of directory traversal patterns, etc. are a few examples of request content features.

• Response features are extracted by analyzing the http response to a given request. These features can reveal the success or failure of an attack, and can reveal suspicious http content in the response, in which case Web clients are targeted by a possible attack. Examples of these features are the response code, response time, etc.

• Request history features provide statistics about past connections, given that several Web attacks, such as flooding, brute-force and Web vulnerability scans, are performed through several repetitive connections. Examples of such features are the number/rate of connections issued by the same source host and requesting the same/different URIs.
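To make the request general and content features concrete, here is a toy extractor for a single request line. It is a hypothetical sketch, not the feature extractor of (Benferhat and Tabia, 2008a): the metacharacter set `META`, the feature names and the simple parsing of the request line are our assumptions, and only a handful of features are covered.

```python
from urllib.parse import unquote

META = set("<>;|&`$'\"")  # hypothetical set of shell/HTML metacharacters


def request_features(request_line):
    """Compute a few request general/content features from one request line."""
    method, _, rest = request_line.partition(" ")
    # Drop the trailing protocol version (e.g. "HTTP/1.1") if present.
    uri = rest.rsplit(" ", 1)[0] if " HTTP/" in rest else rest
    decoded = unquote(uri)  # undo %XX encoding before pattern matching
    return {
        "method": method,
        "request_length": len(request_line),
        "non_printable": sum(1 for ch in decoded if not ch.isprintable()),
        "metacharacters": sum(1 for ch in decoded if ch in META),
        "dir_traversal": decoded.count("../"),
    }


print(request_features("GET /../../etc/passwd HTTP/1.1"))
print(request_features("GET /search?q=%3Cscript%3Ealert(1)%3C/script%3E HTTP/1.1"))
```

Decoding before counting matters: attackers routinely percent-encode metacharacters, as in the second example, which would otherwise contain no `<` or `>` at all.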

In order to label the preprocessed http traffic (as normal or attack), we analyzed it with the Snort IDS (Snort, 2002) as well as manually. As for other attacks, we simulated most of the attacks involved in (Ingham and Inoue, 2007), which is, to our knowledge, the most extensive and up-to-date open Web-attack data set. In addition, we ran vulnerability scanning sessions using w3af (Riancho, 2007). The attacks of Table 1 are categorized according to the vulnerability category involved in each attack. Regarding their effects, the attacks of Table 1 include denial of service attacks, vulnerability scans, information leak, and unauthorized and remote access (Ingham and Inoue, 2007). In order to evaluate the generalization capacities and the ability to detect new attacks, we built a testing data set including normal real http connections as well as known attacks, known attack variations and novel ones (attacks marked with * in Table 1). Note that the new attacks included in the testing data involve either new feature values or anomalous value combinations.

Table 1: Training/testing data set distribution.

                             Training data         Testing data
Class                        Number     %          Number     %
Normal                       55342      55.87%     61378      88.88%
Vulnerability scan           31152      31.45%     4456       6.45%
Buffer overflow              9          0.009%     15         0.02%
Value misinterpretation      2          0.002%     1          0.00%
Poor management              3          0.003%     0          0.00%
URL decoding error           3          0.003%     0          0.00%
Other input validations      44         0.044%     4          0.01%
Flooding                     12488      12.61%     3159       4.57%
Cross Site Scripting*        0          0.00%      6          0.01%
SQL injection*               0          0.00%      14         0.02%
Command injection*           0          0.00%      9          0.01%
Total                        99043      100%       69061      100%

* Novel attacks (no training instances).

6.2 Brief Description of Naive Bayes and TAN Classifiers

The naive Bayes classifier is the simplest form of Bayesian network. Its graphical component includes only two node types: (1) a unique parent node, called the root, which is associated with the hidden (class) variable in classification problems, and (2) a child node for every observed variable (attribute). Note that naive Bayes assumes that the child nodes are independent given their parent. This simplifying assumption does not hold in many real-world problems but is useful for reducing computational complexity. In order to relax this problematic assumption, other Bayesian classifiers represent some of the feature dependencies. For instance, the TAN classifier is a naive Bayes classifier augmented by allowing dependencies between child nodes, which form a tree (Friedman et al., 1997). We use the naive Bayes classifier to evaluate the ability to detect anomalous events causing new feature values, while the TAN classifier is used to detect new value combinations, since TAN classifiers allow dependencies between child nodes.
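The tree of a TAN classifier is usually learnt following (Friedman et al., 1997): weight each pair of features by their conditional mutual information given the class, keep a maximum-weight spanning tree, and direct its edges away from a chosen root; the class is then added as a parent of every feature. The sketch below follows that recipe for nominal features with empirical counts. It is a minimal illustration (Prim's algorithm, root fixed at feature 0, our own function names), not the implementation used in these experiments.

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(xs, ys, cs):
    """Empirical I(X; Y | C) from three aligned sequences of nominal values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    mi = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x,y,c) * log[ p(x,y|c) / (p(x|c) p(y|c)) ]
        mi += (k / n) * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return mi


def tan_structure(rows, labels):
    """Return {child: parent} feature edges of a TAN tree rooted at feature 0.

    Every feature additionally has the class as a parent (not listed here).
    """
    n_feat = len(rows[0])
    cols = list(zip(*rows))
    weights = {
        (i, j): cond_mutual_info(cols[i], cols[j], labels)
        for i, j in combinations(range(n_feat), 2)
    }
    # Prim's algorithm for a maximum-weight spanning tree grown from feature 0.
    parent, in_tree = {}, {0}
    while len(in_tree) < n_feat:
        best = None
        for (i, j), w in weights.items():
            if (i in in_tree) != (j in in_tree) and (best is None or w > best[0]):
                best = (w, i, j)
        _, i, j = best
        child, par = (j, i) if i in in_tree else (i, j)
        parent[child] = par
        in_tree.add(child)
    return parent


# Feature 1 copies feature 0 within each class; feature 2 is fixed by the class.
rows = [("a", "a", "x"), ("b", "b", "x"), ("a", "a", "y"), ("b", "b", "y")]
labels = ["N", "N", "A", "A"]
print(tan_structure(rows, labels))  # {1: 0, 2: 0}
```

The strongly dependent pair (0, 1) receives weight ln 2 and is linked first; feature 2 carries no information beyond the class, so its edge has weight zero and is attached only to complete the tree.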

6.3 Standard vs Enhanced Bayesian Classification Rule on http Traffic

Table 2 compares the results of standard and enhanced naive Bayes and TAN classifiers built on the training data and evaluated on the testing data.

Note that the enhanced classification rule evaluated in Table 2 uses normality/abnormality duality and zero probabilities (see Rules 1, 2 and 3).

Table 2: Evaluation of naive Bayes (NB) and TAN classifiers using the standard and enhanced Bayesian classification rules on http traffic.

Class | Standard rule: NB | Standard rule: TAN | Enhanced rule: NB | Enhanced rule: TAN
Normal | 98.2% | 99.9% | 91.7% | 97.8%
Vulnerability scan | 15.8% | 44.1% | 100% | 100%
Buffer overflow | 6.7% | 20.2% | 80% | 100%
Value misinterpretation | 100% | 0.00% | 100% | 100%
Other input validation | 75.0% | 100% | 100% | 100%
Flooding | 100% | 100% | 100% | 100%
Cross Site Scripting | 0.00% | 0.00% | 100% | 100%
SQL injection | 0.00% | 0.00% | 100% | 100%
Command injection | 0.00% | 0.00% | 100% | 100%
Total PCC | 92.87% | 96.24% | 96.45% | 98.07%

Experiments on the standard Bayesian classification rule: At first sight, both classifiers achieve good detection rates in terms of PCC (Percent of Correct Classification), but they are ineffective in detecting novel attacks (attacks in bold in Table 2). The confusion matrices of this experiment show that the naive Bayes and TAN classifiers misclassified all the new attacks and predicted them as Normal. However, the results in Table 2 show that the TAN classifier performs better than naive Bayes since it represents some feature dependencies. Furthermore, testing attacks that produce new combinations of already-seen anomalous values (values involved separately in different training attacks) cause false negatives. For instance, the testing vulnerability scans are poorly detected since they involve new value combinations.
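The per-class rates and the Total PCC row of Table 2 can be derived from a confusion matrix. This Python sketch assumes that each per-class figure is the fraction of that class's test instances classified correctly; the function names and the counts in `EXAMPLE_CONFUSION` are hypothetical and are not taken from the experiments. It shows how a high overall PCC can coexist with a 0% rate on a novel attack class whose instances are all predicted Normal.

```python
# confusion[true_class][predicted_class] = number of test instances.
# Hypothetical counts: every instance of the novel attack is predicted Normal.
EXAMPLE_CONFUSION = {
    "Normal": {"Normal": 990, "SQL injection": 10},
    "SQL injection": {"Normal": 20},
}

def per_class_pcc(confusion):
    """Fraction of each true class's instances that are classified correctly."""
    rates = {}
    for true_cls, row in confusion.items():
        total = sum(row.values())
        rates[true_cls] = row.get(true_cls, 0) / total if total else 0.0
    return rates

def total_pcc(confusion):
    """Overall percent of correct classification: diagonal over all instances."""
    correct = sum(row.get(cls, 0) for cls, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total if total else 0.0

print(per_class_pcc(EXAMPLE_CONFUSION))
print(round(total_pcc(EXAMPLE_CONFUSION), 4))
```

In this toy matrix the overall PCC is about 97% even though the SQL injection class is never detected.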

Experiments on the enhanced Bayesian classification rule: Naive Bayes and TAN classifiers perform significantly better with the enhanced rule than with the standard one. In particular, both classifiers succeed in detecting novel as well as known attacks. Unlike naive Bayes, the enhanced TAN classifier improves detection rates without a large increase in the false alarm rate: its PCC on the Normal class drops only from 99.9% to 97.8%, against 98.2% to 91.7% for naive Bayes (see Table 2). Furthermore, the enhanced TAN classifier correctly detects and identifies all known and novel attacks.
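Rules 1 to 3 are defined in an earlier section of the paper, so the following Python sketch is only one hypothetical reading of how zero probabilities and the normality/abnormality duality can change a decision; `standard_rule` and `zero_aware_rule` are illustrative names, not the authors' definitions. The inputs are assumed to be class priors P(c) and unsmoothed likelihoods P(e | c) for an audit event e.

```python
def standard_rule(priors, likelihoods):
    """Standard Bayesian rule: argmax over c of P(c) * P(e | c)."""
    return max(priors, key=lambda c: priors[c] * likelihoods[c])

def zero_aware_rule(priors, likelihoods, normal="Normal"):
    """Hypothetical enhancement, not a transcription of the paper's rules.

    If the evidence has zero probability given the normal class, it was never
    observed in normal training traffic, so it is labelled abnormal instead of
    being left to an argmax that may fall back on Normal.
    """
    if likelihoods[normal] == 0.0:
        attacks = {c: priors[c] * likelihoods[c]
                   for c in priors if c != normal and likelihoods[c] > 0.0}
        if not attacks:
            # No known class explains the event: report it as a novel attack.
            return "Unknown attack"
        return max(attacks, key=attacks.get)
    # Evidence compatible with normal traffic: keep the standard decision.
    return standard_rule(priors, likelihoods)

PRIORS = {"Normal": 0.9, "SQL injection": 0.05, "XSS": 0.05}
unseen = {"Normal": 0.0, "SQL injection": 0.0, "XSS": 0.0}
print(standard_rule(PRIORS, unseen))    # Normal (all scores tie at zero)
print(zero_aware_rule(PRIORS, unseen))  # Unknown attack
```

Because Python's `max` returns the first of several tied maxima, the standard rule here silently falls back on Normal, mirroring the failure mode described for the standard rule, whereas the zero-aware variant reports the event as abnormal.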

Figure 1 reports the results of the enhanced naive Bayes classifier using the likelihood rule (see Rule 4) with the threshold set to different values.

Figure 1: Naive Bayes evaluation using different thresholds for Rule 4.


Figure 1 shows that the detection rates of novel attacks can be improved by exploiting the likelihood of attacks with small prior probabilities. For instance, setting the threshold of Rule 4 to 1% significantly improves the detection rates of several attacks whose detection was otherwise strongly penalized by their low frequencies in the training data.
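The prior penalty can be seen with just two classes. In the hypothetical Python sketch below, `rare_class_rule` and its threshold `tau` are assumptions meant only to illustrate the idea of exploiting the likelihood of rarely seen attacks; it is not claimed to match Rule 4, which is defined earlier in the paper. Classes whose prior falls below `tau` may override a Normal decision when their likelihood P(e | c) exceeds that of Normal.

```python
def map_decision(priors, likelihoods):
    """Standard Bayesian (MAP) rule: argmax over c of P(c) * P(e | c)."""
    return max(priors, key=lambda c: priors[c] * likelihoods[c])

def rare_class_rule(priors, likelihoods, tau=0.01, normal="Normal"):
    """Hypothetical likelihood-based override for classes with prior < tau."""
    decision = map_decision(priors, likelihoods)
    if decision != normal:
        return decision
    # Rare attack classes that explain the evidence better than Normal does.
    rare = [c for c in priors
            if c != normal and priors[c] < tau
            and likelihoods[c] > likelihoods[normal]]
    if rare:
        return max(rare, key=lambda c: likelihoods[c])
    return normal

# Hypothetical numbers: the evidence is 100 times more likely under the
# attack class, but the attack makes up only 0.5% of the training data.
PRIORS = {"Normal": 0.995, "Buffer overflow": 0.005}
LIKELIHOODS = {"Normal": 0.001, "Buffer overflow": 0.1}

print(map_decision(PRIORS, LIKELIHOODS))                # Normal
print(rare_class_rule(PRIORS, LIKELIHOODS, tau=0.01))   # Buffer overflow
print(rare_class_rule(PRIORS, LIKELIHOODS, tau=0.001))  # Normal
```

The MAP score of Normal (0.995 x 0.001 = 0.000995) beats that of the attack (0.005 x 0.1 = 0.0005) purely because of the prior; lowering `tau` below the attack's prior switches the override off, which illustrates why the choice of threshold matters.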

The results of Table 2 and Figure 1 show that enhancing the standard classification rule to meet anomaly detection requirements yields significant improvements in detecting novel attacks.

Note that we also carried out experiments on the Darpa'99 data set (Lippmann et al., 2000) and concluded that our enhancements significantly improve the detection of novel attacks. Because of the page limit, we cannot report these results here.

7 CONCLUSIONS

In this paper, we proposed enhancements to the standard Bayesian classification rule in order to effectively detect both known and novel attacks. We first analyzed why Bayesian classifiers fail to detect most novel attacks, which they flag as normal behavior. Our enhancements aim at better handling novel and unusual behaviors and at providing a Bayesian classification rule that better fits anomaly detection requirements. More precisely, they exploit the normality/abnormality duality of audit events, the zero probabilities caused by the occurrence of anomalous evidence, and the likelihood of attacks with extremely small prior probabilities. Experiments on recent http traffic involving real data and several Web attacks showed the significant improvements achieved by the enhanced classification rule in comparison with the standard one.

ACKNOWLEDGEMENTS

This work is supported by the MICRAC project (http://www.irit.fr/MICRAC/).

REFERENCES

Axelsson, S. (2000). Intrusion detection systems: A survey and taxonomy. Technical Report 99-15, Chalmers Univ.

Barbará, D., Wu, N., and Jajodia, S. (2001). Detecting novel network intrusions using Bayes estimators. In Proceedings of the First SIAM Conference on Data Mining.

Ben-Amor, N., Benferhat, S., and Elouedi, Z. (2003). Naive Bayesian networks in intrusion detection systems. In ACM, Cavtat-Dubrovnik, Croatia.

Benferhat, S. and Tabia, K. (2005). On the combination of naive Bayes and decision trees for intrusion detection. In CIMCA/IAWTIC, pages 211–216.

Benferhat, S. and Tabia, K. (2008a). Classification features for detecting server-side and client-side web attacks. In 23rd International Security Conference, Italy.

Benferhat, S. and Tabia, K. (2008b). Context-based profiling for anomaly intrusion detection with diagnosis. In ARES2008: Third International Conference on Availability, Reliability and Security, Barcelona, Spain.

Elkan, C. (2000). Results of the KDD'99 classifier learning. SIGKDD Explorations, 1(2):63–64.

Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2-3):131–163.

Ingham, K. L. and Inoue, H. (2007). Comparing anomaly detection techniques for HTTP. In RAID, pages 42–62.

Jensen, F. V. (1996). An Introduction to Bayesian Networks. UCL Press, London.

Kruegel, C., Mutz, D., Robertson, W., and Valeur, F. (2003). Bayesian event classification for intrusion detection.

Kumar, S. and Spafford, E. H. (1994). An application of pattern matching in intrusion detection. Tech. Rep. CSD-TR-94-013, Department of Computer Sciences, Purdue University, West Lafayette.

Lee, W. (1999). A data mining framework for constructing features and models for intrusion detection systems. PhD thesis, New York, NY, USA.

Lippmann, R., Haines, J. W., Fried, D. J., Korba, J., and Das, K. (2000). The 1999 DARPA off-line intrusion detection evaluation. Computer Networks, 34(4).

Neumann, P. G. and Porras, P. A. (1999). Experience with EMERALD to date. pages 73–80.

Quinlan, J. R. (1986). Induction of decision trees. Mach. Learn., 1(1).

Riancho, A. (2007). w3af - web application attack and audit framework.

Sebyala, A. A., Olukemi, T., and Sacks, L. (2002). Active platform security through intrusion detection using naive Bayesian network for anomaly detection. In Proceedings of the London Communications Symposium.

Snort (2002). Snort: The open source network intrusion detection system. http://www.snort.org.

Tombini, E., Debar, H., Me, L., and Ducasse, M. (2004). A serial combination of anomaly and misuse IDSes applied to HTTP traffic. In ACSAC '04: Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC'04), pages 428–437.

Valdes, A. and Skinner, K. (2000). Adaptive, model-based monitoring for cyber attack detection. In Recent Advances in Intrusion Detection, pages 80–92.

SECRYPT 2008 - International Conference on Security and Cryptography