EFFICIENT ALGORITHMS AND ABSTRACT DATA TYPES FOR

LOCAL INCONSISTENCY ISOLATION IN FIREWALL ACLS

S. Pozo, A. J. Varela-Vaca, R. M. Gasca and R. Ceballos

Department of Computer Languages and Systems. Computer Engineering College

University of Seville. Avda. Reina Mercedes S/N, 41012 Seville, Spain

Keywords: Isolation, Inconsistency, Conflict, Anomaly, Firewall, ACL, Ruleset.

Abstract: Writing and managing firewall ACLs are hard, tedious, time-consuming and error-prone tasks for a wide

range of reasons. During these tasks, inconsistent rules can be introduced. An inconsistent firewall ACL

implies in general a design fault, and indicates that the firewall is accepting traffic that should be denied or

vice versa. This can result in severe problems such as unwanted accesses to services, denial of service,

overflows, etc. However, the administrator is who ultimately decides if an inconsistent rule is a fault or not.

Although many algorithms to detect and manage inconsistencies in firewall ACLs have been proposed, they

have different drawbacks regarding different aspects of the consistency diagnosis problem, which can

prevent their use in a wide range of real-life situations. In this paper, we review these algorithms along with

their drawbacks, and propose a new divide and conquer based algorithm, which uses specialized abstract

data types. The proposed algorithm returns consistency results over the original ACL. Its computational

complexity is better than the current best algorithm for inconsistency isolation, as experimental results will

also show.

1 INTRODUCTION

A firewall is a network element that controls the

traversal of packets across different network

segments. It is a mechanism to enforce an Access

Control Policy, represented as an Access Control

List (ACL).

Inconsistencies are among the most important and frequent faults in firewall ACL design and management (Wool, 2004) (Pozo2, 2008). A

firewall ACL with inconsistent rules implies in

general design faults, and indicates that the firewall

is accepting traffic that should be denied or vice

versa. This can result in severe problems such as

unwanted accesses to services, denial of service,

overflows, etc. ACL consistency is of extreme

importance in several contexts, such as highly

sensitive applications (e.g. health care). Thus,

algorithms and tools to automatically isolate and

characterize inconsistencies must be provided in

order to give firewall administrators enough

information to correct them and reduce the number

of faults in firewall ACLs. In this paper we are only

interested in layer 3 firewall ACLs, and thus in the

five typical selectors (Taylor, 2005): protocol,

source and destination IPs, and source and

destination ports.

Many algorithms to isolate and characterize

inconsistencies in firewall ACLs have been

proposed, but it is the firewall administrator who

ultimately decides which rules have to be corrected.

However, these algorithms have many drawbacks

regarding different aspects of the consistency

diagnosis problem. One of the most important ones

is that they pre-process the firewall ACL using

different types of non-trivial decompositions in

order to use more efficient abstract data types and

techniques. However, these decomposition

techniques increase the number of rules in the ACL

and have worst-case exponential time and space

complexity. As a consequence, results of these

consistency management algorithms are given over

the modified ACL. Time and space complexity of

inconsistency isolation algorithms is very important,

since these algorithms are being used in a new range

of applications in resource-constrained devices in

ubiquitous networks, such as ad-hoc network node

real-time ACL updates, real-time IDS or IPS rule

updates, etc.


Pozo S., J. Varela-Vaca A., M. Gasca R. and Ceballos R. (2009).

EFFICIENT ALGORITHMS AND ABSTRACT DATA TYPES FOR LOCAL INCONSISTENCY ISOLATION IN FIREWALL ACLS.

In Proceedings of the International Conference on Security and Cryptography, pages 42-53

DOI: 10.5220/0002233100420053

Copyright © SciTePress

To the best of our knowledge, there are only two

algorithms that do not decompose the ACL: the

trivial one (which has worst-case $O(f^2)$ time complexity in the number of rules in the ACL, f), and an optimization over the trivial one (Pozo2, 2008), which only improves the best and average cases by an order of magnitude.

However, the best algorithm to date (which decomposes the ACL) represents an improvement

over 30 times (on average) over the trivial one

(Baboescu, 2003).

In this paper we propose a rule-order

independent inconsistency isolation algorithm. Our

approach is based on an analysis of which data type

each rule selector can store, on the design of

specialized abstract data types for each one, and on

a divide and conquer algorithm. The worst-case computational complexity of the algorithm proposed in this paper is better in all cases than that of Baboescu's algorithm, as will be shown in both the theoretical complexity analysis and the experimental results with real ACLs. Furthermore, ACL pre-processing is not

needed by our algorithm and thus results are

returned over the original, unmodified ACL.

This paper is structured as follows. In section 2,

we review related works comparing them to our

proposal. In section 3 we briefly analyze the

internals of the consistency diagnosis problem in

firewall ACLs. In section 4 we explain the

methodology followed to solve the problem, and

propose abstract data types (ADTs) and algorithms,

with their theoretical complexity analysis. In section

5, we give experimental results with real ACLs,

comparing these results with other proposals. In

section 6 we give some concluding remarks.

2 RELATED WORKS

The closest works to ours are related to consistency isolation in general network filters. In the most recent work, (Baboescu, 2003) provides algorithms to detect inconsistencies in router filters that are worst-case 30 times (an order of magnitude) faster than $O(f^2)$ ones for the general case of any number of selectors per rule, where f is the number of rules in the ACL. Although a theoretical complexity analysis is not provided, it improves on other previous isolation algorithms for k filters (Eppstein, 2001) (Hari, 2000). Baboescu's proposal

implies ACL decomposition as a pre-process,

converting selector ranges to prefixes (Srinivasan,

1998). Nevertheless, the range to prefix conversion technique may need to split a range into several prefixes, and thus the final number of rules could increase over the original ACL. In (Gupta, 1999) (Taylor, 2005), Taylor and Gupta pointed out that this kind of conversion could be inefficient, because transport layer specifications vary widely (for example, it is possible to specify open port ranges, such as “all ports greater than 1023”). Taylor also calculated that, in the worst case, a range covering w-bit port numbers may require $2(w-1)$ prefixes, and that a single ACL including only two port ranges could require $[2(w-1)]^2$ entries (900 entries for 16-bit port numbers), increasing the number of rules in the ACL. Thus, inconsistency isolation results are given over the modified ACL, which is bigger than and different from the original one. Baboescu also calculated the ACL size increase for his data set in his

paper.
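The worst-case figures quoted above can be checked with a short sketch. The Python function below is a generic greedy range-to-prefix split, not code taken from the cited works; the name `range_to_prefixes` and its `width` parameter are illustrative assumptions.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the closed integer range [lo, hi] into aligned prefix blocks.

    Returns a list of (base, prefix_length) pairs. Generic greedy split,
    shown only to illustrate the blow-up discussed in the text.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, width - size.bit_length() + 1))
        lo += size
    return prefixes


worst = range_to_prefixes(1, 65534)        # worst case for 16-bit ports
open_range = range_to_prefixes(1024, 65535)  # "all ports greater than 1023"
print(len(worst), len(worst) ** 2, len(open_range))
```

For 16-bit ports the range [1, 65534] splits into 2(16-1) = 30 prefixes, so two such ranges in one rule expand to 30 x 30 = 900 entries, which matches the figure quoted from Taylor.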

Other researchers have analyzed the minimal

inconsistency diagnosis problem. This problem is

different to the inconsistency isolation one, since

isolation is the action of finding the rules that are

inconsistent with other ones, and is a polynomial

problem. However, inconsistency diagnosis also

implies the identification of the minimal number of

rules which are the cause of the isolated

inconsistencies (Pozo2, 2008), and the minimal

characterization of the diagnoses among an

established taxonomy (Hamed, 2006). The

consistency diagnosis problem consists in the

resolution of these three problems (isolation,

identification, characterization) plus a correction

stage (if necessary).

These researchers apply ACL decompositions in

some cases, and combinatorial algorithms in others,

in order to optimally solve the three problems at a

time. Decompositions are used in (Al-Shaer, 2004)

and in (García-Alfaro, 2008), which use ACL

decorrelation (Luis, 2002). As with range to prefix

conversion, ACL decorrelation increases the number of rules in the ACL and has worst-case exponential

time and space complexity. As a consequence,

results of the consistency management algorithms

are given over the modified ACL.

Ordered Binary Decision Diagrams (OBDDs)

have been used in Fireman (Yuan, 2006). The Fireman authors do not decorrelate the ACL, and thus results are given over the original one. Note that the complexity of OBDD algorithms depends on the optimal ordering of their nodes, which is an NP-complete problem (Bollig, 1996). This results in

worst case exponential complexity, as with other

proposals for the consistency diagnosis problem.


3 CONSISTENCY IN FIREWALL ACLS

Firewall rule-matching engines match packets in a

linear way, checking rules from the first one to the

last. The matching process stops once a rule has

been matched, or once there are no more rules in the

ACL (in this case, the firewall platform executes a

predefined default action). The values of selectors

(or filtering fields) between different rules can overlap, and there can even be rules that are completely equal to others. An example of an ACL is presented in Fig. 1. In this example, $R_4 \subset R_3$, because all

selectors of R4 are at least subsets of the same

selectors of R3. However, their actions are the

opposite. In this case R4 is never going to be

matched in this ACL, because all packets that R4

could match are also matched by a rule with higher

priority, R3. In this case, the firewall administrator

must be notified, since R3 may be a faulty rule (the

consequence, or the error, is that there is traffic that is accepted by R3 which may have to be denied). As another

example, take rules R1 and R2, where $R_1 \subset R_2$. In this

case traffic that is denied by R1 is also accepted by

R2. This kind of relation is used by administrators to

express exceptions (the most specific rule, R1) to a

general rule (R2), and is not usually considered to be

a fault, because there is no error in the ACL

execution.

Note that in these two examples, actions are

always different (in firewalls there are only two possible actions: to allow or to deny a packet). If actions were equal, there would be no potential erroneous behaviour in the executed ACL, and thus there would be no

inconsistency. However, in this case, the relation

between the rules is a redundancy, which is another

kind of problem that can reduce the performance and

increase the memory consumption of the rule-

matching engine. In this paper, we are only

interested in rules that may be potential faults, or inconsistent rules [4] (there could be cases where a rule is inconsistent with many others). It must be clarified that inconsistencies are order-independent and mutual. We assume that $ACL_f$ does not have redundancies (redundancies can be efficiently detected and removed (Liu, 2008)).

3.1 Problem Formalization

A layer 3 Firewall ACL is in general a list of linearly

ordered (total order) condition/action rules. Each

firewall rule is formed by an antecedent and a

(binary) consequent representing the action that

must be taken once a packet matches the rule.

Let PORTSRC and PORTDST be sets of natural

numbers and intervals of naturals between

[0..65535] representing a port number. Let IPSRC

and IPDST be two sets of valid IPv4 addresses in the

octet and CIDR format (o1.o2.o3.o4/CIDR). Let

PROTOCOL be a set of natural numbers in [0..255]

representing a protocol number. Let ID≥1 be a

natural number representing the rule priority in the

ACL (1 is the rule with the highest priority). These five

sets plus the ID represent the typical selectors of a

firewall rule [3]. Let ACTION={Allow, Deny} be the

binary set of possible actions for a rule consequent.

Let W=PROTOCOL×IPSRC×IPDST×PORTSRC×

PORTDST be the cartesian product of the five

previous sets or selectors, which represents a 5-

dimensional hypercube. W is the space where an

antecedent of a firewall rule can be defined. Layer 7

firewalls use different selectors (e.g. a selector to

express the content of a packet) and thus need a

different problem analysis.

Definition 3.1. A layer 3 firewall ACL or rule set is defined as the cartesian product $ACL_f = W \times ACTION$, where $|ACL_f| = f$. A rule in $ACL_f$ is defined as $R_k \in ACL_f$, $1 \le k \le f$, $k \in ID$, where $R_k[PROTOCOL]$, $R_k[IPSRC]$, $R_k[IPDST]$, $R_k[PORTSRC]$, $R_k[PORTDST]$ represent the corresponding selectors of the rule.

Definition 3.2. $ACL_f$ can be trivially divided in two disjoint sets, one composed of rules with Allow action ($ACL_{allow}$, where $|ACL_{allow}| = m$), and the other composed of rules with Deny action ($ACL_{deny}$, where $|ACL_{deny}| = n$). Thus $ACL_f = ACL_{allow} \cup ACL_{deny}$ and $ACL_{allow} \cap ACL_{deny} = \emptyset$.

Definition 3.3. Let the antecedent of a rule $R_k \in ACL_f$ be defined as an element or subset of W, $a(R_k) \subseteq W$. Let the consequent of a rule $R_k \in ACL_f$ be defined as $c(R_k) = Allow \lor Deny$. The union of the antecedents of all rules in $ACL_{allow}$ is the set A, $A = \bigcup_{i=1}^{m} a(R_i \in ACL_{allow})$. The union of the antecedents of all rules in $ACL_{deny}$ is the set D, $D = \bigcup_{j=1}^{n} a(R_j \in ACL_{deny})$.

Definition 3.4. Inconsistency Detection. $a(R_i \in ACL_{allow}) \cap a(R_j \in ACL_{deny}) \neq \emptyset$ iff $R_i$ and $R_j$ are mutually inconsistent, $I(R_i, R_j) \vdash \bot$, since two elements in $ACL_f$ representing an action and the contrary over a subset of W are logically inconsistent. In the same way, $ACL_f$ is inconsistent

SECRYPT 2009 - International Conference on Security and Cryptography


Priority/ID  Protocol  Source IP        Src Port  Destination IP  Dst Port  Action
R1           tcp       192.168.1.5/32   any       *.*.*.*/0       80        deny
R2           tcp       192.168.1.*/24   any       *.*.*.*/0       80        allow
R3           tcp       *.*.*.*/0        any       172.0.1.10/32   80        allow
R4           tcp       192.168.1.*/24   any       172.0.1.10/32   80        deny
R5           tcp       192.168.1.60/32  any       *.*.*.*/0       21        deny
R6           tcp       192.168.1.*/24   any       *.*.*.*/0       21        allow
R7           tcp       192.168.1.*/24   any       172.0.1.10/32   21        allow
R8           tcp       *.*.*.*/0        any       *.*.*.*/0       any       deny
R9           udp       192.168.1.*/24   any       172.0.1.10/32   53        allow
R10          udp       *.*.*.*/0        any       172.0.1.10/32   53        allow
R11          udp       192.168.2.*/24   any       172.0.2.*/24    any       allow
R12          udp       *.*.*.*/0        any       *.*.*.*/0       any       deny

Figure 1: Example of a Firewall ACL.

iff $A \cap D \neq \emptyset$. Consistency is not affected by the relative priority between rules. An inconsistency is considered to be a fault if an administrator identifies the behaviour of the executed ACL as causing undesirable effects (or having errors).

Definition 3.5. Inconsistency Isolation. It is to find out all $R_i \in ACL_{allow}$, $R_j \in ACL_{deny}$ such that $I(R_i, R_j) \vdash \bot$.

The objective of the algorithm proposed in this paper is to isolate (Definition 3.5) the elements in $ACL_{allow}$ and $ACL_{deny}$ which are mutually inconsistent (Definition 3.4). The trivial algorithm consists in checking all pairs of rules in $ACL_f$ against Definition 3.4, which is in $O(f^2)$, or in checking rules in the set A against rules in the set D (again with Definition 3.4), which is in $O(n \cdot m)$.
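As an illustration of the trivial O(n·m) check, the following Python sketch applies Definition 3.4 to every allow/deny pair of the Figure 1 ACL. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`ip_block`, `port`, `overlap`, `trivial_isolation`) are hypothetical, wildcard IPs are written as `0.0.0.0/0`, `192.168.1.*/24` is written as `192.168.1.0/24`, and protocols are compared by equality only, since Figure 1 contains no wildcard protocol.

```python
from ipaddress import ip_network

ANY_PORT = (0, 65535)


def ip_block(cidr):
    """Closed integer interval covered by an IPv4 CIDR block."""
    net = ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def port(p):
    """Closed interval for a port selector ('any' or a single port)."""
    return ANY_PORT if p == "any" else (p, p)


def overlap(a, b):
    """True if two closed intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# (id, protocol, source IP, source port, destination IP, destination port, action)
FIG1 = [
    (1, "tcp", "192.168.1.5/32", "any", "0.0.0.0/0", 80, "deny"),
    (2, "tcp", "192.168.1.0/24", "any", "0.0.0.0/0", 80, "allow"),
    (3, "tcp", "0.0.0.0/0", "any", "172.0.1.10/32", 80, "allow"),
    (4, "tcp", "192.168.1.0/24", "any", "172.0.1.10/32", 80, "deny"),
    (5, "tcp", "192.168.1.60/32", "any", "0.0.0.0/0", 21, "deny"),
    (6, "tcp", "192.168.1.0/24", "any", "0.0.0.0/0", 21, "allow"),
    (7, "tcp", "192.168.1.0/24", "any", "172.0.1.10/32", 21, "allow"),
    (8, "tcp", "0.0.0.0/0", "any", "0.0.0.0/0", "any", "deny"),
    (9, "udp", "192.168.1.0/24", "any", "172.0.1.10/32", 53, "allow"),
    (10, "udp", "0.0.0.0/0", "any", "172.0.1.10/32", 53, "allow"),
    (11, "udp", "192.168.2.0/24", "any", "172.0.2.0/24", "any", "allow"),
    (12, "udp", "0.0.0.0/0", "any", "0.0.0.0/0", "any", "deny"),
]


def antecedent(rule):
    _, proto, src, sport, dst, dport, _ = rule
    return proto, ip_block(src), port(sport), ip_block(dst), port(dport)


def inconsistent(r1, r2):
    """Definition 3.4: the antecedents intersect in every selector."""
    a, b = antecedent(r1), antecedent(r2)
    return a[0] == b[0] and all(overlap(x, y) for x, y in zip(a[1:], b[1:]))


def trivial_isolation(acl):
    """Compare every deny rule with every allow rule: O(n*m) pair checks."""
    allow = [r for r in acl if r[-1] == "allow"]
    deny = [r for r in acl if r[-1] == "deny"]
    return {d[0]: [a[0] for a in allow if inconsistent(a, d)] for d in deny}


result = trivial_isolation(FIG1)
print(result)
```

With the 1-based rule IDs of Figure 1, the deny rules R1 and R4 are each inconsistent with R2 and R3, R5 with R6 and R7, R8 with all four tcp allow rules, and R12 with the three udp allow rules.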

Our algorithm departs from $ACL_{allow}$ and $ACL_{deny}$, with $|ACL_{deny}| < |ACL_{allow}|$ (and thus n<m) (if $|ACL_{deny}| > |ACL_{allow}|$, results and explanations are analogous), and is based on a divide and conquer algorithm:

$$\forall R_i \in ACL_{deny},\ 1 \le i \le n,\ \forall j \in \{PROTOCOL, IPSRC, IPDST, PORTSRC, PORTDST\}:\quad M_i^{j} = \Pi_{id}\left(\sigma_{A^{j} \cap R_i[j] \neq \emptyset}\right)$$

Each set $M_i$ contains the ID selector of rules in $ACL_{allow}$ that intersect with a given rule $R_i \in ACL_{deny}$:

$$M_i = \bigcap_{j} M_i^{j},\quad j \in \{PROTOCOL, IPSRC, IPDST, PORTSRC, PORTDST\}$$

Thus, the result of the isolation process consists of several $M_i$ sets, where each one contains the IDs of the inconsistent rules in $ACL_{allow}$ for a given rule $R_i$ of $ACL_{deny}$. A set $M_i$ is empty iff $R_i$ is consistent. This result can be trivially decomposed to obtain all pairs of inconsistent rules in $ACL_f$.

Figure 2: Result of the inconsistency isolation process.

This result can be used directly by the firewall administrator, or as input to an inconsistency identification process (Pozo2, 2008), resulting in a diagnosis that can be characterized. There is a complete taxonomy for firewall ACL inconsistencies available in (Hamed, 2006). We are interested in all kinds of inconsistencies, independently of their characterization, since all of them are equally important for the firewall administrator (who is responsible for deciding if they are considered faults or not).

4 INCONSISTENCY ISOLATION PROCESS

An extensive analysis of the market-leader firewall languages was presented in (Pozo1, 2009). In the analysis it was shown that IP addresses can be expressed by all of them in octets with a CIDR value (IP blocks), port numbers as naturals or intervals of naturals, and protocols as a natural number. Thus, each selector, although different by nature, can be expressed as natural numbers and, in some cases, as intervals of naturals.

Figure 3: Proposed inconsistency isolation process.

One of the main ideas of our approach is to use a specialized abstract data type (ADT) to store the set of all selectors of the same type of the m rules in $ACL_{allow}$ (i.e. one ADT to store protocols used in all rules, two ADTs to store the source and destination IPs used in all rules, and another two ADTs to store source and destination ports). With this division and using divide and conquer, in order to know which rules in $ACL_{allow}$ the rule $R_d \in ACL_{deny}$ is inconsistent with, a search is needed in each ADT for each selector in $R_d$. Each of these five searches returns all rules which have an intersecting selector. However, not all of these rules are inconsistent with $R_d$; a final combination step is needed. According to the presented inconsistency definitions, $R_d$ is inconsistent with one or more rules in $ACL_{allow}$ only if all of its selectors intersect with the corresponding selectors of one or more rules in $ACL_{allow}$. Thus, the results of the five searches must be intersected in order to get this information. Since the process must be repeated for the n rules in $ACL_{deny}$, this complexity must be multiplied by n. The whole process is graphically represented in Fig. 3.
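The combination step can be sketched as a bitwise AND over the five per-selector search results. The Python sketch below is illustrative only: the names `ALLOW_IDS`, `to_bitset`, `to_ids` and `combine` are hypothetical, Python integers stand in for fixed-size bit sets, and the rule IDs are the zero-based ones used in Figures 4 and 5. The protocol and destination port results are those shown in Figures 4 and 5 for the deny rule R0; the remaining three were worked out by hand from Figure 1 and are an assumption of this example.

```python
from functools import reduce
from operator import and_

# Zero-based IDs of the ACL_allow rules, as labelled in Figures 4 and 5.
ALLOW_IDS = [1, 2, 5, 6, 8, 9, 10]


def to_bitset(rule_ids):
    """Encode rule IDs as an integer with one bit per ACL_allow position."""
    bits = 0
    for rid in rule_ids:
        bits |= 1 << ALLOW_IDS.index(rid)
    return bits


def to_ids(bits):
    """Decode a bit set back into the list of rule IDs."""
    return [rid for pos, rid in enumerate(ALLOW_IDS) if (bits >> pos) & 1]


def combine(search_results):
    """Intersect the per-selector results with a bitwise AND."""
    return reduce(and_, search_results)


# Search results for deny rule R0 (tcp, 192.168.1.5/32, any -> any, port 80).
protocol = to_bitset([1, 2, 5, 6])        # Figure 4
dst_port = to_bitset([10, 1, 2])          # Figure 5
src_ip = to_bitset([1, 2, 5, 6, 8, 9])    # derived by hand from Figure 1
dst_ip = to_bitset(ALLOW_IDS)             # R0 destination is the wildcard
src_port = to_bitset(ALLOW_IDS)           # R0 source port is 'any'

m0 = combine([protocol, src_ip, dst_ip, src_port, dst_port])
print(to_ids(m0))
```

A bitwise AND handles many rule positions per machine operation, which is one plausible reading of why the combination cost is stated in terms of m divided by a constant; the paper derives that constant itself.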

In the following sections, a different data structure is going to be proposed for each selector, based on the analysis of the particular data set that each one can store (Pozo1, 2009). The objective is to find or design ADTs capable of doing searches in worst case time complexity better than or equal to $O(\log m)$. Finally, a combination step for these search results in worst case time complexity in $O(r)$ is also going to be proposed, where $r = m/k$, and where k is a very big constant ($k \ge 128$) which is going to be theoretically calculated. So, the final time complexity of the algorithm is in worst case $O(f^2/4k)$. Note that in the worst case, $n = m = f/2$ (i.e. $ACL_f$ has half of its rules with allow action, and the other half with deny). As stated earlier, we have assumed that m>n. However, if n>m, the algorithm can adapt itself and use $ACL_{deny}$ rules (instead of $ACL_{allow}$ ones) to instantiate ADTs. That is, the algorithm always dynamically takes the bigger of the two ACLs for ADT instantiation.

Since the ADTs are only populated once,

insertion time for each ADT is amortized during

search operations. Also take into account that if the

ACL is going to be updated (new rules are inserted,

modified, or removed), ADT operations for updates

are necessary. However update operations have not

been considered, since updates are not the focus of

this paper, but a topic for future research. The

presented isolation process is thus considered static.

4.1 ADT for Protocol Number Selector

According to the exhaustive analysis of real firewall languages presented in an earlier work (Pozo1, 2009), the protocol selector only admits 8-bit natural numbers and the wildcard, ‘*’. Although symbolic names are also possible, they can be converted to naturals using the IANA protocol number list

(RFC5237). An important fact is that no ranges are

allowed in the syntax of the selector, and thus search

is a trivial operation, since in order to find a non-

empty intersection with a protocol number (the one

of rule $R_d \in ACL_{deny}$) there are only two possible coincidences in the ADT: ‘*’, or exactly the same value. In the case that the protocol number of $R_d$ is ‘*’,


then $R_d$ intersects with all rules of the ADT, that is, all rules in $ACL_{allow}$, and no search is necessary.

To store the association <Protocol number, Rule

ID> we propose to use a hash table with protocol as

the key, and the rule IDs as value. Hash tables

(Cormen, 2001) have O(1) (constant) time

complexity for insertions, removals, updates, and

search operations if a perfect hash function is used.

A perfect and minimal hash function is possible, since the key space is limited and known in advance (from 0 to 255, plus the ‘*’). Hash table instantiation is thus worst case $O(m)$ (the number of rules in $ACL_{allow}$).

However, hash tables cannot store duplicate

keys. This is an important problem, since in most

real-life firewall ACLs only a few protocol numbers

are used, although there could be thousands of rules in

the ACL. This issue can be solved grouping all

protocol selectors of the rules that share the same

value (the same key). In this case, the associated

value to the key is a set containing the rule IDs of all

rules that have the key value as the value of their

protocol selector. However, as removal of values could be inefficient in this way (a hash lookup plus a search in the list of rule IDs), a fixed-size bit set of size m (the size of $ACL_{allow}$) is used instead of a list. Each position of the bit set represents one of the m rules in $ACL_{allow}$. Positions are set to ‘1’ for

the rules in the hash table that share the same

protocol number. As a side effect, with only one

lookup operation in the hash table, all rule IDs that

share the same protocol number are returned, as the

bit set is the return result of search operations.
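The protocol ADT described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the class name `ProtocolTable` is hypothetical, a Python `dict` (a general hash table, not a perfect minimal hash) stands in for the hash table, and Python integers stand in for fixed-size bit sets. The data reproduces the $ACL_{allow}$ rules of Figure 4 in order, so bit position p corresponds to the p-th allow rule.

```python
WILDCARD = "*"


class ProtocolTable:
    """Maps each protocol value to a bit set of ACL_allow rule positions."""

    def __init__(self, protocols):
        # protocols[p] is the protocol selector of the p-th ACL_allow rule.
        self.size = len(protocols)
        self.table = {}
        for position, proto in enumerate(protocols):
            # Rules sharing a protocol share one key; only a bit is added.
            self.table[proto] = self.table.get(proto, 0) | (1 << position)

    def search(self, proto):
        """Bit set of stored rules whose protocol intersects `proto`."""
        if proto == WILDCARD:
            # A wildcard query intersects every stored rule: no lookup needed.
            return (1 << self.size) - 1
        # Otherwise only the same value or a stored wildcard can match.
        return self.table.get(proto, 0) | self.table.get(WILDCARD, 0)


# ACL_allow of Figure 4: four tcp (6) rules followed by three udp (17) rules.
table = ProtocolTable([6, 6, 6, 6, 17, 17, 17])
print(bin(table.search(6)), bin(table.search(17)))
```

A single lookup returns the whole group of rules sharing a protocol, which is the side effect noted in the text.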

Fig. 4 presents the hash table associated with the Fig. 1 example, and the result of all the possible search operations using the protocol selector values of the $ACL_{deny}$ rules. In order to simplify the figure, only

the set positions of bit sets are represented (rule IDs

of the assigned rules have been directly used).

Furthermore, protocol names have been transformed

to IANA protocol numbers in the rightmost part of

the figure.

4.2 ADT for Port Number Selectors

Again, attending to the syntax analysis of market-

leader firewall languages (Pozo1, 2009), the port

selectors admit 16-bit natural numbers, double-

ended closed natural intervals, and ‘*’. Symbolic

names are converted to naturals or intervals. Source

and destination port selectors are treated the same

way from ADT and complexity viewpoint, and thus

the discussion is applicable for both.

As with the protocol selector, the result of the search operation for port numbers is all rule IDs of $ACL_{allow}$ which have a port number intersecting with that of $R_d \in ACL_{deny}$. In this case, a hash table is

useless, since searching a port or an interval in it will

only return equality result, but not intersections with

port intervals. For example, in a hash table with keys

{80, 79-81} where the port of $R_d$ is 80, the search operation would return only the rule IDs associated to port 80 key, but note that port 80 also intersects with the interval 79-81. In the same way, if the port of $R_d$ is the interval [81-82], then no value will be

returned, since the interval [81-82] is not stored in

the hash table. Searching the entire hash table would

return the needed result, but this operation has a

linear time complexity with the number of different

keys in the hash table.

$ACL_{allow}$ (stored rules): R1, R2, R5, R6 (proto=tcp); R8, R9, R10 (proto=udp)
Hash table: <K1, V1> = <6, {R1, R2, R5, R6}>; <K2, V2> = <17, {R8, R9, R10}>
$ACL_{deny}$ (search queries): R0, R3, R4, R7 (proto=tcp); R11 (proto=udp)
Results: Search(R0) = Search(R3) = Search(R4) = Search(R7) = {R1, R2, R5, R6}; Search(R11) = {R8, R9, R10}

Figure 4: Hash table (perfect and minimal hash function) of protocol selector of Figure 1 example ($ACL_{allow}$ rules) and search results for $ACL_{deny}$ rules.

There are two well-known 2D problems in

computational geometry that solve similar searches

(Chiang, 1991): first, given a set of data points (port

numbers) and a query rectangle (port interval), give

all the points that are inside the rectangle (this is the

orthogonal range search problem); second, given a

set of (possibly intersecting) data rectangles (port

intervals) and a query point (port number), give all

rectangles that intersect the query point (this is the

stabbing problem).

These two 2D problems can be reformulated into

1D space, where rectangles are intervals and points

are only represented by one coordinate. In 1D, these

problems are called 1D range search problem (de

Berg, 1997) and overlapping interval search

problem (Edelsbrunner, 1983) (EdelsBrunner2,

1983) respectively. Fortunately, specialized data structures that solve these two problems in 1D and 2D with optimal time and space bounds exist (Chiang, 1991). In the particular case

of 1D, the Interval Tree (Cormen, 2001)

(Edelsbrunner, 1983) (EdelsBrunner2, 1983), or

ITree, is the selected ADT because it has optimal bounds for the 1D problem (in time and space). Fortunately, our port number or port interval search problems can trivially be reformulated as range search and overlapping interval search problems respectively, as port numbers can be represented as points on a line, and port intervals can be represented as segments on the same line.

Let X be a set of M points in a line, and S a set of

m segments with endpoints in X. The primary

structure for the ITree, T, can be a balanced binary

search tree (Chiang, 1991) or a red-black tree

(Cormen, 2001), whose internal nodes store the

points of X, sorted from left to right, and whose

leaves represent intervals between consecutive

points of X. Each segment s of S is allocated at the

least common ancestor of the nodes associated with

the endpoints of s. The set of segments allocated at a

node b, denoted by S(b), is represented by two lists

that store the left endpoints sorted from left to right,

and the right endpoints sorted from right to left.

Hence, the space complexity is in $O(m+M)$. In our problem, this is linear with the number of rules in $ACL_{allow}$. Furthermore, in our implementation duplicate intervals or points are not allowed (as with duplicate protocols), and are only stored once (again, using a bit set for rule IDs). Thus, the space complexity is reduced by a constant factor.

ITrees are static ADTs, where only a fixed set of

segments and points, known in advance, can be

stored. However, in order to support insertions and

deletions of segments and points, the endpoint lists

can be replaced with inorder-threaded balanced

search trees. Hence, the update time is in amortized $O(\log m)$. Query time is in $O(\log m + L)$, where L is the number of returned results (a constant factor). Thus, instantiation is in worst case amortized $O(m \log m)$, one insertion for each rule in $ACL_{allow}$. Best case time complexity could be very small when the number of port repetitions between different rules is very high, since the resulting ITree would be very small. The ITree is a well-known ADT, which has been widely used in database searches. Due to space

constraints, no more information will be presented

here, but it is available in the given references,

including a time and space complexity analysis.
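To make the overlapping-interval search concrete, here is a minimal Python sketch of a static, centered interval tree. It is a simplified variant chosen for brevity, not the exact structure or the authors' implementation described above: the class name `IntervalNode` is hypothetical, the tree is built once from a list and does not rebalance, and each stored interval carries a bit set (a Python integer) of the $ACL_{allow}$ positions that use it. The data reproduces the destination ports of the Figure 1 allow rules, with one bit per allow rule in order.

```python
class IntervalNode:
    """Static centered interval tree over closed intervals (lo, hi, bitset)."""

    def __init__(self, intervals):
        points = sorted(p for lo, hi, _ in intervals for p in (lo, hi))
        self.center = points[len(points) // 2]
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        # Two views of the intervals stored at this node, as in the text:
        # left endpoints ascending, right endpoints descending.
        self.by_lo = sorted(here, key=lambda iv: iv[0])
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def search(self, qlo, qhi):
        """Union of the bit sets of all stored intervals meeting [qlo, qhi]."""
        bits = 0
        if qhi < self.center:
            for lo, _, b in self.by_lo:
                if lo > qhi:
                    break
                bits |= b
            if self.left:
                bits |= self.left.search(qlo, qhi)
        elif qlo > self.center:
            for _, hi, b in self.by_hi:
                if hi < qlo:
                    break
                bits |= b
            if self.right:
                bits |= self.right.search(qlo, qhi)
        else:
            # The query contains the center, so every interval here overlaps.
            for _, _, b in self.by_lo:
                bits |= b
            for child in (self.left, self.right):
                if child:
                    bits |= child.search(qlo, qhi)
        return bits


# Destination ports of the allow rules, one bit per allow rule position:
# positions 0-1 use port 80, 2-3 port 21, 4-5 port 53, 6 the wildcard.
tree = IntervalNode([
    (0, 65535, 0b1000000),
    (21, 21, 0b0001100),
    (80, 80, 0b0000011),
    (53, 53, 0b0110000),
])
print(bin(tree.search(80, 80)), bin(tree.search(81, 82)))
```

A point query such as port 80 is simply the degenerate interval [80, 80], so the same search covers both the stabbing and the range cases.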

The result of the search operation over the ITree with a port number or interval of the rule $R_d \in ACL_{deny}$ is the union of all bit sets associated to port values in the ITree which intersect the given port of $R_d$, or a bit set with all bits set to ‘1’ if the given port of $R_d$ is ‘*’. Fig. 5 presents the ITree associated with the Fig. 1 example (destination port selector of $ACL_{allow}$ only), and the result of all the possible search operations using the destination port selector of the Fig. 1 $ACL_{deny}$ rules.

$ACL_{allow}$ (stored rules): R1, R2 (dport=80); R5, R6 (dport=21); R8, R9 (dport=53); R10 (dport=*)
Interval tree: [0-65535] {R10}; [21,21] {R5, R6}; [53,53] {R8, R9}; [80,80] {R1, R2}
$ACL_{deny}$ (search queries): R0, R3 (dport=80); R4 (dport=21); R7, R11 (dport=*)
Results: Search(R0) = Search(R3) = {R10, R1, R2}; Search(R4) = {R10, R5, R6}; Search(R7) = Search(R11) = {R10, R5, R6, R8, R9, R1, R2}

Figure 5: Interval tree of destination port selector of Figure 1 example ($ACL_{allow}$ rules) and search results for $ACL_{deny}$ rules.

4.3 ADT for IP Address Selectors

Attending to the syntax analysis of firewall

languages (Pozo1, 2009), both IP address selectors

admit 32-bit host IP addresses in CIDR format, and

‘*’. Symbolic names are converted to octets. Source

and destination IP selectors are treated the same way

from ADT and complexity viewpoint, and thus the

discussion is applicable for both.

As with previous cases, duplicates are not allowed (bit sets are used again). Thus, the result of the search operation must be a bit set with positions set to ‘1’ for all rule IDs of $ACL_{allow}$ which have an IP intersecting with the one given in the rule $R_d \in ACL_{deny}$. As an IP block is a compact way of

expressing IP address intervals, a hash table is again

useless for IPs. An IP address is composed of four octets, each one being an 8-bit natural. A search operation over an ADT must use the CIDR of the IPs stored in it: let $IP_1/CIDR_1$ and $IP_2/CIDR_2$ be two IP addresses; if $CIDR_s$ is the shortest of the two netmasks, then the intersection of $IP_1$ and $IP_2$ is not empty if $IP_1 \,\&\, CIDR_s = IP_2 \,\&\, CIDR_s$.

Note that valid network IP addresses have CIDR

values between 1 and 30. Value 31 is useless, since

it only permits two hosts (.0 and .255, which are not

valid host IP addresses); CIDR 32 is reserved for

host IPs; and CIDR 0 is only used for the wildcard

IP (0.0.0.0/0).
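The intersection test for two IP blocks can be written directly from the rule above. The Python sketch below is illustrative; the function names `parse`, `netmask` and `intersects` are hypothetical, and "shortest netmask" is taken to mean the smaller of the two CIDR values.

```python
def parse(block):
    """Split 'o1.o2.o3.o4/cidr' into a 32-bit integer and the CIDR value."""
    address, cidr = block.split("/")
    o1, o2, o3, o4 = (int(octet) for octet in address.split("."))
    return (o1 << 24) | (o2 << 16) | (o3 << 8) | o4, int(cidr)


def netmask(cidr):
    """32-bit mask with the `cidr` most significant bits set."""
    return (0xFFFFFFFF << (32 - cidr)) & 0xFFFFFFFF


def intersects(block1, block2):
    """True if the two IP blocks share at least one address."""
    ip1, cidr1 = parse(block1)
    ip2, cidr2 = parse(block2)
    mask = netmask(min(cidr1, cidr2))  # the shorter of the two netmasks
    return (ip1 & mask) == (ip2 & mask)


print(intersects("192.168.1.5/32", "192.168.1.0/24"))
print(intersects("192.168.2.0/24", "192.168.1.0/24"))
```

Because two CIDR blocks are either nested or disjoint, comparing both addresses under the shorter mask is enough to decide whether they intersect.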

We propose a completely new and specialized ADT to store IP addresses, capable of performing searches that return multiple intersections (as with the previous selectors) in time better than O(m) (where m is the size of $ACL_{allow}$). As we are going to show, the space complexity of this ADT is better than O(m). This new ADT is called IP Tree. The general structure of an IP Tree, and several example IPs with their corresponding IP Tree, are presented in Figures 6 and 7 respectively.

The IP Tree has four levels (the root is not considered a valid level). Each node can have at most 255 children (0-254). The children values of each node (octets) are recursively stored in a hash table (with a perfect and minimal hash function). The association <Node octet, Children octets> is called an IP Tree node, where Children octets is another hash table of the same type (IP Tree node, Fig. 6).

Figure 6: IPTree general structure.

As in the other ADTs, no repetition of IPs is allowed. Leaf nodes maintain the information regarding the IDs of the rules that share a common value for an IP address selector. In fact, leaf nodes do not have a hash table for storing <Node octet, Children octets> (since they do not have any children), but a hash table with a perfect hash function (there are only 30 possible CIDRs) to store <CIDR, Rule ID Bit set>. The CIDRs are those of the inserted rules that ended in that leaf; if several rules share the same CIDR (i.e. a repeated value), their bits are all set to ‘1’ in the same bit set.

Insertions are done by traversing the tree from top to bottom. First, the IP/CIDR address to be inserted is decomposed into its four natural octets plus the CIDR value: o1.o2.o3.o4/cidr. Then, the root node hash table is queried to find out whether o1 is already in the first level of the IP Tree. If it is, the next step is to navigate to the second level through the found octet (using the children hash table). If not, a new IP Tree node with value o1 is inserted in the root node children hash table. The same is done for o2, o3, and o4. Once at the last level, if o4 has been found, the <CIDR, Rule ID Bit set> hash table of the leaf is checked using the cidr value of the IP. If the cidr value is found, the bit corresponding to the ID of the inserted IP is set to ‘1’. If not, a new CIDR entry is created with its corresponding bit set. Thus, the insertion of a new IP consists, in the worst case, of four O(1) searches in perfect hash tables (one per octet), plus an O(1) search in a leaf perfect hash table, resulting in O(1) worst case time complexity.
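The insertion procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: built-in dictionaries stand in for the perfect hash tables, Python integers stand in for the rule ID bit sets, and the name `insert` is hypothetical.

```python
def insert(root: dict, ip: str, cidr: int, rule_id: int) -> None:
    """Insert o1.o2.o3.o4/cidr for the given rule ID into a nested-dict IP Tree."""
    o1, o2, o3, o4 = (int(x) for x in ip.split("."))
    node = root
    # Levels 1 to 3: follow the octet if present, otherwise create a new node.
    for octet in (o1, o2, o3):
        node = node.setdefault(octet, {})
    # Level 4: the leaf maps each CIDR to a bit set of rule IDs.
    leaf = node.setdefault(o4, {})
    leaf[cidr] = leaf.get(cidr, 0) | (1 << rule_id)


root = {}
insert(root, "192.168.1.0", 24, 1)
insert(root, "192.168.1.0", 24, 2)   # repeated value: same leaf, same CIDR
insert(root, "192.168.1.0", 30, 5)   # same leaf, new CIDR entry
insert(root, "10.0.0.0", 8, 8)

print(root[192][168][1][0])
```

Each insertion performs four dictionary lookups (one per octet) plus one lookup in the leaf, which mirrors the constant number of steps counted in the text.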

The search operation follows the same scheme as the insertion one. Note that in order to know whether two IP addresses intersect, the shorter netmask of the two must be applied, as pointed out at the beginning of the subsection. However, the result we need is the intersection of one IP with all IPs in the IP Tree, which contains all the IPs of the m rules in $ACL_{allow}$. Thus, all netmasks of the IPs in the IP Tree which are smaller than or equal to the CIDR of the given $R_d$ IP address must be applied (at most 30 netmasks). The result of applying these netmasks is a set of (at most) 30 network IPs. Now, a search operation is launched for each of these IPs. The search operation follows the same algorithm used for insertions but, if a search ends successfully, it takes the rule IDs that the leaf associates with the CIDR used for that search. The result of the search operation is the union of all bit sets associated with IP addresses in the IP Tree which intersect the given IP address of $R_d$ (i.e. the result of the at most 30 searches), or a bit set with all bits set to ‘1’ if the given IP address of $R_d$ is ‘*’. Note that having 30 different netmasks in a real firewall ACL is not very usual, because this usually indicates that the firewall is controlling traffic between 30 different networks, each one attached to a different physical network interface. Thus, the worst case time complexity of a search operation for a network address in a network tree is in $O(30 \cdot (4 \cdot 1 + 1)) = O(1)$.

Figure 7: IPTree example for network addresses.

However, in the average case, the multiplicative factor 30 of one search operation can be reduced to 30-h. If a search operation successfully ended in a leaf l, and l contains k CIDR values not yet used for a search in the IP Tree list of CIDRs, then these values should not be used because, if used for searches, they would lead to the same leaf l, causing a duplicate search. This reduction of CIDRs can be made each time a new leaf is visited; the sum of these removed CIDRs is h.
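A minimal Python sketch of the network search follows. It reads the description above literally, so it only probes the stored netmasks that are smaller than or equal to the CIDR of the query, and it omits the 30-h optimization. The class `NetworkIPTree` and its helpers are hypothetical names; dictionaries and integers are assumed stand-ins for the perfect hash tables and bit sets.

```python
def netmask(cidr: int) -> int:
    """Return the 32-bit mask with the `cidr` most significant bits set."""
    return (0xFFFFFFFF << (32 - cidr)) & 0xFFFFFFFF


def to_int(ip: str) -> int:
    o1, o2, o3, o4 = (int(x) for x in ip.split("."))
    return (o1 << 24) | (o2 << 16) | (o3 << 8) | o4


def to_octets(value: int) -> tuple:
    return (value >> 24 & 255, value >> 16 & 255, value >> 8 & 255, value & 255)


def rule_ids(bits: int) -> list:
    """List the positions set to 1 in an integer used as a bit set."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


class NetworkIPTree:
    def __init__(self):
        self.root = {}        # octet -> child node; leaves map CIDR -> bit set
        self.cidrs = set()    # every CIDR value stored in the tree
        self.all_rules = 0    # bit set of every inserted rule ID

    def insert(self, ip: str, cidr: int, rule_id: int) -> None:
        node = self.root
        octets = to_octets(to_int(ip) & netmask(cidr))
        for octet in octets[:3]:
            node = node.setdefault(octet, {})
        leaf = node.setdefault(octets[3], {})
        leaf[cidr] = leaf.get(cidr, 0) | (1 << rule_id)
        self.cidrs.add(cidr)
        self.all_rules |= 1 << rule_id

    def search(self, ip: str, cidr: int = 32) -> int:
        if ip == "*":                      # wildcard query matches everything
            return self.all_rules
        result = 0
        for c in self.cidrs:
            if c > cidr:                   # only netmasks <= the query CIDR
                continue
            node = self.root
            for octet in to_octets(to_int(ip) & netmask(c)):
                node = node.get(octet)
                if node is None:           # no stored network on this path
                    break
            else:
                result |= node.get(c, 0)   # bit set stored for this CIDR
        return result


tree = NetworkIPTree()
tree.insert("192.168.0.0", 16, 1)
tree.insert("192.168.1.0", 24, 2)
tree.insert("10.0.0.0", 8, 5)
tree.insert("0.0.0.0", 0, 10)

print(rule_ids(tree.search("192.168.1.0", 24)))
print(rule_ids(tree.search("10.1.2.3", 32)))
```

The number of probes is bounded by the number of distinct CIDR values in the tree, not by the number of stored rules, which is the source of the constant bound derived above.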

Finally, host search is slightly different. Suppose that the search operation receives a host IP address from $R_d$. In this case, all CIDRs of the IP Tree must be applied to the host IP, creating at most 30 network IP addresses. If the IP Tree only contains network IP addresses, the procedure is the one described above with no modifications at all. However, if the IP Tree also contains host IP addresses, a network address created by applying a CIDR to the host IP address of $R_d$ could also intersect with another host IP address of the IP Tree. This is an important problem, because the IP address of $R_d$ cannot intersect with more than one host IP address of the tree (the one that is exactly equal to the IP of $R_d$), although it can intersect with many network IP addresses. This multiple host IP intersection problem can be solved by splitting the IP Tree into two disjoint IP Trees: one to store the network IP addresses and wildcards of $ACL_{allow}$, and another one to store its host IP addresses. The network IP Tree is exactly the one described (Figs. 6 and 7), but the host IP Tree is a simplified version, where no CIDR information is stored in leaves, and where all searches are exact (1..1). In addition, slightly simplified versions of the insert and search methods are necessary for the host IP Tree. These simplifications are not described here due to space constraints, but can easily be derived.

4.4 Combination of Search Results

Using the calculated worst case time complexities of the search operations for the five selectors and the rule of sum, the combined search time for the five selectors is in the worst case $O(1 + 2 \cdot 1 + 2 \log m) = O(\log m)$. The first term is the time associated with the hash table search (used by the protocol number selector), the second is the two searches in IP Trees (used by the source and destination IP address selectors), and the last one is the two searches in interval trees (used by the source and destination port selectors).

The obtained results are five bit sets with positions set to ‘1’ for intersecting rule IDs. However, from the inconsistency definitions, all selectors must overlap for a rule to be inconsistent with other(s). Thus, the composition of this result is rather trivial: the intersection of the five bit sets. This intersection gives all inconsistencies between $R_d \in ACL_{deny}$ and all rules in $ACL_{allow}$. This result can be used directly by the firewall administrator, since no decomposition of the firewall ACL has been made.

Fig. 8 is an example for $R_0 \in ACL_{deny}$, where the five bit sets resulting from the five searches for the $R_0$ selectors and their intersection are shown. Some of these results have been presented in previous figures. In this case the combination step is necessary because all searches have returned non-empty bit sets. The returned bit set indicates that $R_0$ is inconsistent with $R_1$ and $R_2$. It is now the firewall administrator who decides whether these two inconsistencies are faults or not.

              R1  R2  R5  R6  R8  R9  R10
SrcIP          1   1   1   1   1   1    0
DstIP          1   1   1   1   1   1    1
SrcPort        1   1   1   1   1   1    1
DstPort        1   1   0   0   0   0    1
Protocol       1   1   1   1   0   0    0
Combination    1   1   0   0   0   0    0

Figure 8: Combination step example.
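The combination step of Figure 8 can be reproduced with a short Python sketch. The five bit strings are taken from the figure; the use of Python integers as bit sets and the helper names (`to_bits`, `rules_of`) are assumptions made for the illustration.

```python
from functools import reduce
from operator import and_

# Column order of Figure 8 (rules of ACL_allow).
RULES = ["R1", "R2", "R5", "R6", "R8", "R9", "R10"]

# One bit string per selector search for R0, as shown in Figure 8.
SEARCH_RESULTS = {
    "SrcIP":    "1 1 1 1 1 1 0",
    "DstIP":    "1 1 1 1 1 1 1",
    "SrcPort":  "1 1 1 1 1 1 1",
    "DstPort":  "1 1 0 0 0 0 1",
    "Protocol": "1 1 1 1 0 0 0",
}


def to_bits(row: str) -> int:
    """Parse a row of the figure into an integer used as a bit set."""
    return int(row.replace(" ", ""), 2)


def rules_of(bits: int) -> list:
    """Translate a bit set back into rule names (leftmost bit is R1)."""
    width = len(RULES)
    return [RULES[i] for i in range(width) if bits >> (width - 1 - i) & 1]


# The combination is the bitwise AND of the five search results.
combined = reduce(and_, (to_bits(row) for row in SEARCH_RESULTS.values()))
print(rules_of(combined))
```

A rule survives the intersection only if it appears in all five search results, which is why only the rules overlapping R0 in every selector remain.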

As its name indicates, a bit set is an ADT whose main purpose is to store bit elements. The intersection of the five bit sets is a linear time operation in the size of the bit sets (the number of rules in the ADTs, m, reduced by a constant factor derived from the duplicate removals). However, in the worst case, no repetitions are considered. Note that although the operation is linear, logical operations over bit arrays are very efficient, as they are instructions that can be executed in one machine cycle over registers of (at least) 128 bits using special multi-register multimedia instructions. This reduces the running time by a big constant, $k \geq 128$ (with no space penalty).

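Thus, the worst case time complexity of the full process (for the n rules in $ACL_{deny}$), including the combination operation, is

$$
\begin{aligned}
& O\big(n(\log m + m/k)\big), \quad n = m = f/2,\ m/k > \log m \\
\Rightarrow\ & O(n \log m) + O(n \cdot m/k) \\
\Rightarrow\ & O\big(f/2 \cdot \log(f/2)\big) + O\big(f/2 \cdot (f/2)/k\big), \quad (f/2)/k > \log(f/2) \\
\Rightarrow\ & O\big(f/2 \cdot (f/2)/k\big) \\
\Rightarrow\ & O\big((f^2/2)/2k\big) \\
\Rightarrow\ & O\big(f^2/4k\big), \quad k \geq 128.
\end{aligned}
$$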

It follows from this analysis that the complexity is bounded principally by the number of allow and deny rules (if they are equal, n=m=f/2, the worst case is achieved). However, as the experimental results are going to show, worst case ACLs are really unusual in the real world, where firewalls usually control traffic between small network segments with very specific services, and where multiple firewall configurations are the norm. Thus, best and average cases are achieved when there are a lot of selector repetitions in $ACL_{allow}$ (and thus the ADTs are very small), when n<<m, and when $ACL_f$ is consistent (if a selector of a rule of $ACL_{deny}$ is consistent with the same selector of all the rules of $ACL_{allow}$, then that rule of $ACL_{deny}$ is consistent by definition, no more searches for the rest of the selectors are needed, and thus no combination of search results is needed). This results in $O(n \log m)$, where m is very small due to duplicates $\Rightarrow O(n)$, n<<m.

As has also been shown, the space needed in the process is linear in the number of rules in $ACL_{allow}$, plus some bit sets (the space needed to store the bit sets is negligible).

As the experimental results will show, this complexity represents, for the tested real ACLs (average cases), an algorithm that is up to three orders of magnitude faster than the trivial $O(f^2)$ one, and one to two orders of magnitude faster than (Baboescu, 2003), which is the best known algorithm to date. Unfortunately, a direct theoretical comparison with the Baboescu ASBV algorithm is not possible, since its time complexity is given in number of memory accesses. The complexity reduction of our algorithm in the worst case is mainly obtained from the big multiplicative constant, k, and in the best case is mainly obtained from the ADTs.

5 EXPERIMENTAL RESULTS

In the absence of standard ACLs or synthetic ACL generators, the algorithms have been tested with real firewall ACLs (Table 1).

The conducted performance analysis represents a wide spectrum of cases, with ACL sizes ranging from 50 to about 10600 rules, and percentages of deny rules ranging from about 2% to 66% (Table 1). Recall that the worst case for our proposal is achieved when half of the rules are allow and the other half are deny, and all rules are inconsistent. Also note that real ACLs have some important differences from synthetically generated ones. The most important one is the number of deny and allow rules: as real firewall ACLs are usually designed with a deny-all default policy, most rules are going to have allow actions, and thus $ACL_{allow}$ will be bigger than $ACL_{deny}$. The result is that the worst case would not normally be achieved in real firewall ACLs.

Experiments were performed on a single-threaded Java implementation with Sun JDK 1.6.0 64-Bit Server VM, on an isolated HP Proliant 145G2 (AMD Opteron 275 2.2GHz, 2GB RAM DDR400). Execution times are in milliseconds.

As shown in Table 1, execution of the isolation process (for all rules in $ACL_{deny}$) is really fast, even for large ACLs. If the difference between the trivial algorithm and the optimized version proposed in (Pozo2, 2008) is very big, the difference between the trivial one and the one proposed in this paper is dramatic, with improvements of up to x3000 over the trivial algorithm, and up to x60 over the optimized trivial one (with the test ACLs).

In the case of the Baboescu ASBV algorithm, results show that it is up to about 30 times faster than the trivial one, which also coincides with the conclusions


Table 1: Performance evaluation.

ACL     %Deny   No.         Trivial     Optimized trivial   Baboescu    Proposed    ADT
size    rules   inconsist.  isolation   isolation           isolation   isolation   build
                            (ms)        algorithm (ms)      alg. (ms)   alg. (ms)   (ms)
50      28.21   37          0.22        0.09                0.58        0.03        0.09
144     30.91   108         1.34        0.62                1.50        0.06        0.17
238     66.43   231         3.56        2.04                2.71        0.17        0.22
450     34.73   422         13.22       5.61                5.29        0.26        0.54
900     14.8    871         51.57       3.46                11.11       0.4         1.14
2500    6.97    3349        387.86      55.01               43.12       0.86        3.54
5000    1.98    4937        3160.09     64.33               106.99      1.06        9.02
10611   2.05    11866       12046.67    332.85              476.81      8.31        21.85

Table 2: Number of different elements per selector and per ACL.

ACL_n   Protocol     Source port   Dst port     SrcIP host     SrcIP NW       DstIP host     DstIP NW
size    hash table   ITree size    ITree size   IP Tree size   IP Tree size   IP Tree size   IP Tree size
        size
39      3            4             4            9              2              8              2
110     3            11            11           18             4              27             4
143     3            14            17           20             4              30             4
334     3            19            26           29             4              38             4
784     3            19            31           31             4              47             4
2337    3            47            49           76             5              70             5
4903    2            45            50           136            10             142            10
10398   3            86            87           177            25             217            25

Our proposal represents a 10 to 100 times faster alternative to the current best known one. This is a dramatic improvement over other proposals, especially taking into account that our algorithm returns the isolation results over the original, unmodified ACL, and not over a pre-processed one (as the Baboescu proposal does).
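The speedup figures quoted in this section can be checked directly against the execution times of Table 1. The following Python sketch simply divides the columns of the table; the variable and function names are hypothetical and the numbers are copied from Table 1.

```python
# Execution times in milliseconds, copied from Table 1:
# (ACL size, trivial, optimized trivial, Baboescu, proposed)
TABLE_1 = [
    (50, 0.22, 0.09, 0.58, 0.03),
    (144, 1.34, 0.62, 1.50, 0.06),
    (238, 3.56, 2.04, 2.71, 0.17),
    (450, 13.22, 5.61, 5.29, 0.26),
    (900, 51.57, 3.46, 11.11, 0.4),
    (2500, 387.86, 55.01, 43.12, 0.86),
    (5000, 3160.09, 64.33, 106.99, 1.06),
    (10611, 12046.67, 332.85, 476.81, 8.31),
]


def speedups(column: int) -> list:
    """Ratio between the given column and the proposed algorithm, per ACL."""
    return [row[column] / row[4] for row in TABLE_1]


over_trivial = speedups(1)
over_optimized = speedups(2)
over_baboescu = speedups(3)

for (size, *_), t, o, b in zip(TABLE_1, over_trivial, over_optimized, over_baboescu):
    print(f"ACL {size:>5}: x{t:8.1f} trivial, x{o:5.1f} optimized, x{b:6.1f} Baboescu")
```

With these inputs the largest ratio over the trivial algorithm is obtained for the 5000-rule ACL, and the ratios over the Baboescu algorithm range from roughly x16 to roughly x101.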

Figure 9 presents a graphic comparison between

the optimized trivial (Pozo2, 2008), Baboescu

ASBV (Baboescu, 2003), and our new algorithms.

The last column in Table 1 presents ADT build

time for all ADTs, showing that they are very

reasonable and amortizable in a few worst-case

searches. Note that once ADTs are built, they need

no modification (unless the ACL changes).

The number of different values per selector and per ACL is presented in Table 2. Note that if many selector values are repeated in different rules of the same ACL, then search times improve considerably. This is especially important for the two port selectors, since the Interval Tree is the ADT with the worst time complexity of all ADTs. As can be seen in Table 2, the number of repetitions in the values of the selectors is unsurprisingly high in real ACLs, even if they are very big. With sufficiently small ADTs, experimental search times are near constant (this can be seen in Table 1, in the proposed algorithm column). The algorithms scale very well. This confirms the assumptions about real ACLs made at the end of the previous section.

Figure 9: Execution times.

6 CONCLUSIONS

During firewall ACL design and management, inconsistencies can be introduced. An inconsistent firewall ACL implies in general a design error. However, it is the firewall administrator who ultimately decides whether an inconsistent rule is faulty.

In this paper, we have proposed a new inconsistency isolation algorithm for firewalls with five integer (or interval of integers) selectors. Our approach is based on an analysis of which data type each rule selector can store, on the design of specialized abstract data types for each one, and on a divide and conquer algorithm. A theoretical algorithmic complexity analysis as well as an experimental performance analysis have been carried out in order to validate our theoretical results.

Our proposal represents an algorithm that is 10 to 100 times faster than the current best known one. Furthermore, results are returned over the original, unmodified ACL in our case, rather than over a decomposed ACL which is different from the original one.

However, our approach has some limitations that give us opportunities for improvement in future work. A performance analysis of each ADT of the algorithm is necessary in order to know where the bottleneck is now, and thus to improve the algorithms even more. Checking the behaviour of the proposed ADTs in dynamic environments could be another interesting point, as would another comparison in complexity and memory requirements with the Baboescu algorithm.

ACKNOWLEDGEMENTS

This work has been partially funded by Spanish

Ministry of Science and Education project under

grant DPI2006-15476-C02-01, and by FEDER

(under ERDF Program).

REFERENCES

Al-Shaer, E., Hamed, H. Modeling and Management of Firewall Policies. IEEE eTransactions on Network and Service Management (eTNSM) Vol.1, No.1, 2004.

Baboescu, F., Varghese, G. Fast and Scalable Conflict Detection for Packet Classifiers. Computer Networks Vol.42, No.6, Elsevier 2003.

Bollig, B., Wegener, I. Improving the Variable Ordering of OBDDs is NP-Complete. IEEE Transactions on Computers, Vol.45 No.9, September 1996.

Cormen, T., Leiserson, C., Rivest, R., Stein, C. Introduction to Algorithms, 2nd Ed. McGraw-Hill, 2001.

Chiang, Y., Tamassia, R. Dynamic Algorithms in Computational Geometry. Technical Report CS-91-24. Brown University, Providence, RI, USA, 1991.

de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O. Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, 1997.

Edelsbrunner, H. A new approach to rectangle intersections, Part II. International Journal of Computer Mathematics. Vol.13, pp. 221-229, 1983.

Edelsbrunner2, H. A new approach to rectangle intersections, Part I. International Journal of Computer Mathematics. Vol.13, pp. 209-219, 1983.

Eppstein, D., Muthukrishnan, S. Internet Packet Filter Management and Rectangle Geometry. Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), January 2001.

García-Alfaro, J., Boulahia-Cuppens, N., Cuppens, F. Complete Analysis of Configuration Rules to Guarantee Reliable Network Security Policies. Springer-Verlag International Journal of Information Security. Vol.7, No.2, 2008.

Gupta, P., McKeown, N. Packet classification on multiple fields. Proceedings of the ACM SIGCOMM. Cambridge, MA, USA. September 1999.

Hamed, H., Al-Shaer, E. Taxonomy of Conflicts in Network Security Policies. IEEE Communications Magazine Vol.44, No.3, 2006.

Hari, B., Suri, S., Parulkar, G. Detecting and Resolving Packet Filter Conflicts. Proceedings of IEEE INFOCOM, March 2000.

Liu, Alex X., Gouda, Mohamed G. Complete Redundancy Removal for Packet Classifiers in TCAMs. IEEE Transactions on Parallel and Distributed Systems, 24 Sept. 2008. IEEE Computer Society Digital Library. IEEE Computer Society.

Luis, S., Condell, M. Security policy protocol. IETF Internet Draft IPSPSPP-01, 2002.

Pozo1, S., Ceballos, R., Gasca, R.M. Model Based Development of Firewall Rule Sets: Diagnosing Model Faults. Information and Software Technology Journal, No. 51, Issue 5, pp. 894-915. Elsevier, 2009.

Pozo2, S., Ceballos, R., Gasca, R.M. A Heuristic Polynomial Algorithm for Local Inconsistency Diagnosis in Firewall Rule Sets. 3rd International Conference on Security and Cryptography (SECRYPT), in International Conference on e-Business and Telecommunications (ICETE). Porto, Portugal. INSTICC Press, 2008.

Srinivasan, V., Varghese, G., Suri, S., Waldvogel, M. Fast and Scalable Layer Four Switching. Proceedings of the ACM SIGCOMM conference on Applications, Technologies, Architectures and Protocols for Computer Communication, Vancouver, British Columbia, Canada, ACM Press, 1998.

Taylor, David E. Survey and taxonomy of packet classification techniques. ACM Computing Surveys, Vol.37, No.3, 2005.

Wool, A. A quantitative study of firewall configuration errors. IEEE Computer, Vol.37, No.6, 2004.

Yuan, L., Mai, J., Su, Z., Chen, H., Chuah, C., Mohapatra, P. FIREMAN: A Toolkit for FIREwall Modelling and ANalysis. IEEE Symposium on Security and Privacy (S&P’06). Oakland, CA, USA. May 2006.
