K-Taint: An Executable Rewriting Logic Semantics for Taint Analysis in the K Framework

Md. Imran Alam, Raju Halder^{1,2}, Harshita Goswami and Jorge Sousa Pinto

Indian Institute of Technology Patna, India

HASLab/INESC TEC & Universidade do Minho, Braga, Portugal

Keywords:

Taint Analysis, K Framework, Information Flow, Security.

Abstract:

The K framework is a rewriting logic-based framework for defining programming language semantics suitable for formal reasoning about programs and programming languages. In this paper, we present K-Taint, a rewriting logic-based executable semantics in the K framework for taint analysis of an imperative programming language. Our K semantics can be seen as a sound approximation of program semantics in the corresponding security type domain. More specifically, as a foundation for this objective, we extend the semantically sound flow-sensitive security type system of Hunt and Sands to the case of taint analysis, and add support for interprocedural analysis as well. With respect to the existing methods, K-Taint supports context- and flow-sensitive analysis, reduces false alarms, and provides a scalable solution. Experimental evaluation on several benchmark codes demonstrates encouraging results in terms of improved precision of the analysis.

1 INTRODUCTION

Taint analysis is a widely used program analysis technique that aims at averting malicious inputs from corrupting data values in critical computations of programs (Huang et al., 2014; Jovanovic et al., 2006; Tripp et al., 2009). Examples where taint attacks severely compromise security are SQL injection, cross-site scripting, buffer overflow, etc. (Jovanovic et al., 2006). The code snippet in Figure 1 depicts one such taint attack, where input supplied by a malicious source through the formal parameter 'src' of the function 'foo()' may affect neighboring cells of the character array 'buf' in memory.

    void foo(char *src){
        char buf[20]; int i=0;
        while(i<= strlen(src)){
            buf[i] = src[i]; i++;}
        return ;}

Figure 1: An Example Taint Attack.

In this way, attackers may store malicious data in the neighboring cells of 'buf', which may be accessed by legitimate applications, causing unpredictable behavior.

Static taint analysis approaches, in principle, analyze the propagation of tainted values from untrusted sources to security-sensitive sinks along all possible program paths without actually executing the code (Cifuentes and Scholz, 2008; Huang et al., 2014; Jovanovic et al., 2006; Tripp et al., 2009). Because they are sound and conservative, they often over-approximate the analysis results. This may introduce false positives, but it always establishes a security guarantee: tainted data cannot be passed to security-sensitive operations.

In the context of software security, the integrity of software systems is treated as a dual of the confidentiality problem (Sabelfeld and Myers, 2006), both of which can be enforced by controlling information flows. Work in this direction started with the pioneering work of Denning and Denning (Denning and Denning, 1977), which enforces a restrictive information flow policy defined on a mathematical lattice model of security classes partially ordered by sensitivity levels. Inspired by this, a wide range of language-based approaches have been proposed in the literature, the majority of which focus on confidentiality (Amtoft and Banerjee, 2004; Hunt and Sands, 2006; Sabelfeld and Myers, 2006; Volpano et al., 1996). Nevertheless, in the line of taint information flow addressing software integrity, the existing data-flow and points-to analysis-based approaches (Jovanovic et al., 2006; Noundou, 2015; Sridharan et al., 2011; Tripp et al., 2009; Livshits and Lam, 2005) basically suffer from false alarms because they ignore the control flow and the semantics of constant functions. Security type systems (Foster et al., 2002; Huang et al., 2014) have emerged independently as a competing, and probably the most popular, approach to static taint analysis.

Alam, M., Halder, R., Goswami, H. and Sousa Pinto, J.

K-Taint: An Executable Rewriting Logic Semantics for Taint Analysis in the K Framework.

DOI: 10.5220/0006786603590366

In Proceedings of the 13th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2018), pages 359-366

ISBN: 978-989-758-300-1

In this paper, as a contribution to the same research line, we put forward a rewriting logic-based executable semantics for taint analysis in the K framework, taking an extension of Hunt and Sands's semantically sound flow-sensitive security type system as the basis. The K framework (Roşu and Şerbănuţă, 2010) is a rewriting logic-based formal framework for defining programming language semantics. Such semantic definitions are directly executable in a rewriting logic language, e.g. Maude (Clavel et al., 2007), and thus support the development of verification and analysis tools at no additional cost.

To summarize, our main contributions are:

• We explore the power of K framework to deﬁne

K-Taint.

• To this aim, we extend the flow-sensitive security type system proposed by Hunt and Sands (Hunt and Sands, 2006) as the basis.

• We specify K rewrite rules which capture taint information propagation along all possible program paths.

• We enhance our proposed approach in terms of

precision by handling pointer aliasing and con-

stant functions.

• We present experimental evaluation results to es-

tablish the effectiveness of our approach.

The paper is organized as follows: Section 2 discusses the related work in the literature on static taint analysis. Section 3 briefly introduces the K framework. In Section 4, we extend Hunt and Sands's security type system to the case of taint analysis. Sections 5 and 6 present the executable rewriting logic semantics in K designed for taint analysis. Section 7 defines the semantic rules to handle pointer aliases and constant functions. The experimental evaluation results are reported in Section 8. Finally, Section 9 concludes our work.

2 RELATED WORKS

Although many language-based information flow approaches addressing confidentiality exist in the literature (Sabelfeld and Myers, 2006; Hunt and Sands, 2006; Volpano et al., 1996), this section restricts the discussion to static taint analysis approaches. Work on taint analysis, as a dual of confidentiality, includes security type systems (Foster et al., 2002; Huang et al., 2014), flow analysis (Evans and Larochelle, 2002; Jovanovic et al., 2006; Noundou, 2015; Scholz et al., 2008; Tripp et al., 2009), points-to analysis (Livshits and Lam, 2005; Tripp et al., 2009), etc. The flow-sensitivity in CQual (Foster et al., 2002) is triggered by manually specifying a partial order configuration on security qualifiers. Unfortunately, CQual is unable to support implicit flow-sensitivity in the presence of branches. On the other hand, SFlow (Huang et al., 2014), a type-based taint analyzer for Java Web applications, performs type judgement based on calling-context viewpoint adaptation without actually flowing the context information through the called function body, which may often result in false alarms. Like CQual, SFlow also ignores implicit flow. As alternative solutions, taint analysis has attracted many proposals based on data-flow analysis (Jovanovic et al., 2006; Noundou, 2015; Sridharan et al., 2011; Tripp et al., 2009) and points-to analysis (Livshits and Lam, 2005; Tripp et al., 2009). Unfortunately, because they ignore control dependencies, these techniques are unable to capture the indirect influence of taint information on other variables due to implicit flow. Although the authors in (Cifuentes and Scholz, 2008; Corin and Manzano, 2012; Evans and Larochelle, 2002; Scholz et al., 2008) have considered both data- and control-dependencies, these approaches fail to address false positives in the presence of constant functions, such as x := 0 × x, x := y − y, etc. A summary of the state-of-the-art tools and techniques in static taint analysis, as compared with K-Taint, is given in Table 1.

3 THE K FRAMEWORK

The K framework provides a rewriting logic-based framework suitable for the design and analysis of programming languages. Inspired by the rewriting logic semantics project (Meseguer and Roşu, 2007), this framework unifies algebraic denotational semantics and operational semantics by considering them as two different views of the same object.

To define the semantics of programming language constructs, the K framework mainly relies on configurations and K rewrite rules. A configuration specifies the structure of the abstract machine on which programs written in that language will run, and it is represented as labeled nested cells (holding, e.g., a List, Map, Bag, or Set). For example, consider the following configuration with three cells:

$$\textit{configuration} \equiv \langle\, \langle K \rangle_{k}\ \ \langle \textit{Map}[\textit{Var} \mapsto \textit{Loc}] \rangle_{env}\ \ \langle \textit{Map}[\textit{Loc} \mapsto \textit{Val}] \rangle_{store} \,\rangle_{T}$$

The k cell holds a list of computational tasks, that is $k : \textit{List}\{K, \curvearrowright\}$, where K holds computational contents such as programs or fragments of programs and $\curvearrowright$ is the task sequentialization operator which sequentializes program statements. The env cell maps variables to their locations (i.e., $env : \textit{Var} \mapsto \textit{Loc}$) and the store cell maps locations to values (i.e., $store : \textit{Loc} \mapsto \textit{Val}$). These cells are enclosed by the top cell denoted by T.

Table 1: A Comparative Summary (✓ denotes supported, possibly only partially at this stage; ✗ denotes not supported).

Criterion | K-Taint | Pixy | Taintgrind | SAINT | TAJ | Splint | Parfait | SFlow | CQual | KLEE
Semantics/Security Type System | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓
Explicit Flow | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Implicit Flow | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✓
Constant Functions | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Flow-Sensitivity | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓
Context-Sensitivity | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✓
Language Supported | Imperative (including C-like syntax) | PHP | C | C | Java | C | C | Java | C | C

K rewrite rules are classified into two types: computational rules, which may be interpreted as transitions in a program execution, and structural rules, which rearrange a term to enable the application of computational rules. For better understanding, let us consider the following rule over the two cells k and env, for finding the address of a variable:

$$\langle\, \dfrac{\&Y}{L}\ \ldots \rangle_{k}\ \ \langle \ldots\ Y \mapsto L\ \ldots \rangle_{env}$$

Here the term above the line is rewritten to the term below it. The rule specifies that the next task to evaluate is a reference operator (&) applied to the variable Y, which results in the location L in memory, based on the match in the environment cell env.
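The address-of rule above can be mimicked outside K. The following Python sketch is only an illustration under stated assumptions: the `Config` class, the tuple encoding `("&", name)` for a reference term, and the function name `step_address_of` are all hypothetical and are not part of the K tool.

```python
from dataclasses import dataclass, field

@dataclass
class Config:
    """Toy stand-in for the three-cell configuration <k, env, store> (hypothetical encoding)."""
    k: list = field(default_factory=list)      # pending tasks, head of the list first
    env: dict = field(default_factory=dict)    # Var -> Loc
    store: dict = field(default_factory=dict)  # Loc -> Val

def step_address_of(cfg: Config) -> bool:
    """Apply the address-of rule once; return True if the rule fired."""
    if cfg.k and isinstance(cfg.k[0], tuple) and cfg.k[0][0] == "&":
        var = cfg.k[0][1]
        if var in cfg.env:            # the rule only matches if env contains Y |-> L
            cfg.k[0] = cfg.env[var]   # rewrite &Y to its location L
            return True
    return False

cfg = Config(k=[("&", "y"), "rest"], env={"y": 100}, store={100: 7})
fired = step_address_of(cfg)
print(fired, cfg.k)   # True [100, 'rest']
```

Only the head of the k cell is rewritten; the rest of the task list and the env cell are left untouched, which is what the "..." in the rule expresses.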

In the K framework, a language syntax is given using conventional BNF notation annotated with semantic attributes which enforce the evaluation strategy of the construct. For example, consider the following definition for arithmetic expressions:

syntax E ::= E1 "+" E2 [strict]

The attribute strict allows $E_1$ and $E_2$ to be evaluated in any order, and thus introduces non-determinism. The annotation above corresponds to the following four heating/cooling rules:

$$\langle\, \dfrac{E_1 + E_2}{E_1 \curvearrowright \square + E_2}\ \ldots \rangle_{k} \quad\Big|\quad \langle\, \dfrac{E_1 + E_2}{E_2 \curvearrowright E_1 + \square}\ \ldots \rangle_{k}$$

$$\langle\, \dfrac{V_1 \curvearrowright \square + E_2}{V_1 + E_2}\ \ldots \rangle_{k} \quad\Big|\quad \langle\, \dfrac{V_2 \curvearrowright E_1 + \square}{E_1 + V_2}\ \ldots \rangle_{k}$$

Here, $V_1$ and $V_2$ are the evaluated results of the expressions $E_1$ and $E_2$ respectively. The construct $\square$ (HOLE) is a placeholder that will be replaced by the result of the evaluated term or sub-term.
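To make the heating/cooling idea concrete, here is a small Python sketch. It is an assumption-laden toy, not K itself: terms are tuples `("+", left, right)`, values are Python integers, and, unlike the strict attribute, the sketch always heats the left operand first, so it fixes one of the evaluation orders that strict would allow.

```python
HOLE = "HOLE"

def is_value(t):
    return isinstance(t, int)

def step(k):
    """One rewrite step on the k cell (a list, head first); returns False when no rule applies."""
    head = k[0]
    if is_value(head) and len(k) > 1 and isinstance(k[1], tuple) and HOLE in k[1]:
        # cooling: plug the computed value back into the frozen context
        op, left, right = k[1]
        k[0:2] = [(op, head, right) if left == HOLE else (op, left, head)]
        return True
    if isinstance(head, tuple):
        op, left, right = head
        if is_value(left) and is_value(right):
            k[0] = left + right                   # computational rule for "+"
        elif not is_value(left):
            k[0:1] = [left, (op, HOLE, right)]    # heating: schedule the left operand
        else:
            k[0:1] = [right, (op, left, HOLE)]    # heating: schedule the right operand
        return True
    return False

def run(term):
    k = [term]
    while step(k):
        pass
    return k

print(run(("+", ("+", 1, 2), 3)))   # [6]
```

The intermediate k cell for the example is `[("+", 1, 2), ("+", HOLE, 3)]`, then `[3, ("+", HOLE, 3)]`, then `[("+", 3, 3)]`, which mirrors the heat, compute, cool cycle of the four rules.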

4 EXTENDING HUNT AND SANDS'S TYPE SYSTEM TO TAINT ANALYSIS

In this section, we define a type-based taint analysis for an imperative programming language supporting functions, arrays, pointers, etc. Table 2 depicts the abstract syntax of the basic language under consideration, where $\vec{D}$ and $\vec{E}$ denote a sequence of declarations $\langle \tau\ id_1, \tau\ id_2, \ldots \rangle$ and a sequence of arithmetic expressions $\langle E_1, E_2, \ldots \rangle$, respectively.

Table 2: Abstract Syntax of the Language.

E ::= n | id | &id | *E | id[E] | E op E | (E), where op ∈ {+, −, ×, /}

B ::= ... | E rel E | ..., where rel ∈ {>, ≤, <, ≥, ==}

D ::= τ id

A ::= id := E | *E := E | id[E] := E | id := read()

C ::= skip; | D; | A; | defun id($\vec{D}$){C} | call id($\vec{E}$); | return; | return E; | if B then {C} | if(B) then {C1} else {C2} | while(B) do {C}

Our work is mainly motivated by the security type system proposed in (Hunt and Sands, 2006), which was primarily designed to detect possible leakage of sensitive information from programs. Unlike other similar type systems, the one in (Hunt and Sands, 2006) is flow-sensitive. We extend this flow-sensitive type system for the purposes of our taint analysis with additional support for context-sensitivity in the case of inter-procedural code, leading to a significant improvement in precision. This is depicted in Figure 2. Although our proposal is scalable enough, we consider, for the sake of simplicity, a simple instance of the problem involving two security types: taint and untaint. We work with the join semi-lattice of the type domain $SD = \langle S, \sqsubseteq, \sqcup \rangle$, where $S = \{taint, untaint\}$ and the partial order relation is defined by $untaint \sqsubseteq taint$.
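The two-point lattice just described is small enough to write down directly. The sketch below is a minimal Python encoding; the names `RANK`, `leq`, `join` and `join_env`, and the convention that a variable missing from an environment counts as untaint, are illustrative assumptions rather than anything taken from the K-Taint implementation.

```python
UNTAINT, TAINT = "untaint", "taint"
RANK = {UNTAINT: 0, TAINT: 1}   # encodes the order: untaint is below taint

def leq(a, b):
    """Partial order of the lattice: a is below or equal to b."""
    return RANK[a] <= RANK[b]

def join(a, b):
    """Least upper bound of two security types."""
    return a if RANK[a] >= RANK[b] else b

def join_env(env1, env2):
    """Pointwise join of two type environments (missing variables count as untaint)."""
    return {x: join(env1.get(x, UNTAINT), env2.get(x, UNTAINT))
            for x in env1.keys() | env2.keys()}

print(join(UNTAINT, TAINT))   # taint
```

Adding more security levels would only require extending `RANK` as long as the levels stay totally ordered; a genuinely partial order would need an explicit join table instead.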

The typing judgements are of the form $pc \vdash \Gamma\ \{C\}\ \Gamma'$, where $pc \in S$ represents the security context used to address implicit flow, C is the program statement, and $\Gamma, \Gamma' : \textit{Variables} \to S$ are type environments. The security type T of an expression E (denoted $\Gamma \vdash E : T$) is defined simply as the least upper bound of the types of all free variables (FV) in E, where $\sqcup$ represents the join operation in the security lattice SD. The typing rules ensure that for any given C, $\Gamma$, and pc there is an environment $\Gamma'$ such that $pc \vdash \Gamma\ \{C\}\ \Gamma'$ is derivable. We use the notation $\Gamma \vdash \vec{E} : \vec{T}$ to denote the sequence of type judgements $\langle \Gamma \vdash E_1 : T_1,\ \Gamma \vdash E_2 : T_2, \ldots \rangle$. Similarly, $\Gamma[[\vec{id} \mapsto \vec{T}]]$ denotes a sequence of type substitutions $\langle \Gamma[[id_1 \mapsto T_1]],\ \Gamma[[id_2 \mapsto T_2]], \ldots \rangle$. Observe that reading inputs from unsanitized sources through read() always makes the corresponding variables tainted. The rule for function calls ensures context-sensitivity in the system, where getParam() extracts the formal parameters from the function definition. The analyzer associates security types with program constructs, treating source variables as tainted, and then propagates their types along the program code to determine the application's security. The flow-sensitive typing rules for branching statements leverage the lattice-based operations on the security domain, resulting in conservative analysis results.

[Expression] $\Gamma \vdash E : \bigsqcup_{x \in FV(E)} \Gamma(x)$

[skip] $pc \vdash \Gamma\ \{skip\}\ \Gamma$

[Declaration] $pc \vdash \Gamma\ \{\tau\ id\}\ \Gamma[[id \mapsto pc \sqcup untaint]]$

[Read] $pc \vdash \Gamma\ \{id = read()\}\ \Gamma[[id \mapsto pc \sqcup taint]]$

[Assignment] $\dfrac{\Gamma \vdash E : T}{pc \vdash \Gamma\ \{id = E\}\ \Gamma[[id \mapsto pc \sqcup T]]}$

[Function Call] $\dfrac{\Gamma \vdash \vec{E} : \vec{T} \quad defun\ id(\vec{D})\{C\} \quad \vec{X} = getParam(\vec{D}) \quad \Gamma[[\vec{X} \mapsto \vec{T}]] \equiv \Gamma' \quad \dfrac{pc \vdash \Gamma'\ \{C\}\ \Gamma''}{pc \vdash \Gamma'\ \{defun\ id(\vec{D})\{C\}\}\ \Gamma''}}{pc \vdash \Gamma\ \{call\ id(\vec{E})\}\ \Gamma''}$

[if] $\dfrac{\Gamma \vdash B : T \quad pc \sqcup T \vdash \Gamma\ \{C\}\ \Gamma'}{pc \vdash \Gamma\ \{if\ B\ then\ C\}\ \Gamma \sqcup \Gamma'}$

[if-else] $\dfrac{\Gamma \vdash B : T \quad pc \sqcup T \vdash \Gamma\ \{C_1\}\ \Gamma_1 \quad pc \sqcup T \vdash \Gamma\ \{C_2\}\ \Gamma_2}{pc \vdash \Gamma\ \{if\ B\ then\ \{C_1\}\ else\ \{C_2\}\}\ \Gamma_1 \sqcup \Gamma_2}$

[while] $\dfrac{\Gamma'_i \vdash B : T_i \quad pc \sqcup T_i \vdash \Gamma'_i\ \{C\}\ \Gamma''_i \quad 0 \le i \le k \quad \Gamma'_0 = \Gamma \quad \Gamma'_{i+1} = \Gamma''_i \sqcup \Gamma \quad \Gamma'_{k+1} = \Gamma'_k}{pc \vdash \Gamma\ \{while\ B\ do\ \{C\}\}\ \Gamma'_k}$

Figure 2: Flow- and Context-sensitive Security Type Rules for Taint Analysis.
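The typing rules above can be read as a recursive function from a command, an environment and a security context to a new environment. The following Python sketch illustrates that reading under several assumptions: commands are encoded as tuples, an expression is represented only by the list of its free variables (which is all the [Expression] rule needs), and function calls are left out. The encoding and all names are hypothetical, and the sketch is not the K-Taint implementation.

```python
UNTAINT, TAINT = "untaint", "taint"

def join(a, b):
    return TAINT if TAINT in (a, b) else UNTAINT

def join_env(e1, e2):
    return {x: join(e1.get(x, UNTAINT), e2.get(x, UNTAINT)) for x in e1.keys() | e2.keys()}

def type_of(free_vars, env):
    """[Expression]: join of the types of all free variables."""
    t = UNTAINT
    for x in free_vars:
        t = join(t, env.get(x, UNTAINT))
    return t

def check(cmd, env, pc=UNTAINT):
    """Return the environment derived for cmd from env under security context pc."""
    tag = cmd[0]
    if tag == "skip":
        return dict(env)
    if tag == "decl":
        return {**env, cmd[1]: join(pc, UNTAINT)}
    if tag == "read":
        return {**env, cmd[1]: join(pc, TAINT)}
    if tag == "assign":                       # ("assign", id, free_vars_of_E)
        return {**env, cmd[1]: join(pc, type_of(cmd[2], env))}
    if tag == "seq":
        for c in cmd[1]:
            env = check(c, env, pc)
        return env
    if tag == "if":                           # ("if", free_vars_of_B, C)
        inner = join(pc, type_of(cmd[1], env))
        return join_env(env, check(cmd[2], env, inner))
    if tag == "ifelse":                       # ("ifelse", free_vars_of_B, C1, C2)
        inner = join(pc, type_of(cmd[1], env))
        return join_env(check(cmd[2], env, inner), check(cmd[3], env, inner))
    if tag == "while":                        # iterate the [while] rule up to a fixpoint
        cur = dict(env)
        while True:
            inner = join(pc, type_of(cmd[1], cur))
            nxt = join_env(check(cmd[2], cur, inner), env)
            if nxt == cur:
                return cur
            cur = nxt
    raise ValueError(f"unknown command: {tag}")

prog = ("seq", [("read", "y"), ("assign", "x", []),
                ("if", ["y"], ("assign", "x", [])),      # implicit flow from y to x
                ("assign", "z", ["y"]), ("assign", "z", [])])
result = check(prog, {})
```

In this example `x` ends up tainted through the implicit flow from the guard `y`, while `z` ends up untainted because its last assignment overwrites the tainted value, which is the flow-sensitive behaviour the rules are meant to give.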

5 K SPECIFICATION OF SECURITY TYPE SYSTEM: A ROADMAP

This section provides a roadmap for specifying the K rewrite rules corresponding to the typing rules depicted in Figure 2. Let us consider the typing judgement $pc \vdash \Gamma\ \{C\}\ \Gamma'$, which specifies that the security environment $\Gamma'$ is derived by executing the statement C on the security environment $\Gamma$ under the program's security context pc. To capture this, let us give algebraic representations of $\Gamma$, $\Gamma'$, C and pc in K by defining a configuration consisting of three cells: the k cell to contain program statements as a sequence of computations, the env cell to hold the security levels of program variables, and the context cell to capture the current program context pc in the security type domain, as follows:

$$\langle\, \langle K \rangle_{k}\ \ \langle \textit{Map} \rangle_{env}\ \ \langle \textit{Map} \rangle_{context} \,\rangle_{T}$$

The corresponding K rewrite rule capturing the type judgement $pc \vdash \Gamma\ \{C\}\ \Gamma'$ is specified as:

$$\langle\, \dfrac{C}{\cdot}\ \ldots \rangle_{k}\ \ \langle\, \dfrac{\Gamma}{\Gamma'} \,\rangle_{env}\ \ \langle pc \mapsto \_ \rangle_{context}$$

The symbol "..." appearing in the k cell represents the remaining computations. As a result of the execution of C, which is eventually consumed (denoted by the dot), the previous environment $\Gamma$ in the env cell is updated to the modified environment $\Gamma'$, (implicitly) influenced by the current value (denoted by $\_$) of the security context pc in the context cell.

Similarly, the derivation rule $\Gamma \vdash E : T$, specified as $\langle \ldots\ E \mapsto T\ \ldots \rangle_{env}$, means that expression E has the security type T somewhere in the environment env. Each security type rule is written as a number of premise judgements $\Gamma_i \vdash \zeta_i$ above a horizontal line, with a single conclusion judgement $\Gamma \vdash \zeta$ below the line. For example, given the type rule

$$\dfrac{\Gamma \vdash M : Nat \quad \Gamma \vdash N : Nat}{M + N : Nat}$$

the corresponding K rule is defined as

$$\langle\, \dfrac{M + N}{M +_{Nat} N}\ \ldots \rangle_{k}\ \ \langle \ldots\ M \mapsto Nat,\ N \mapsto Nat\ \ldots \rangle_{env}$$

where M : Nat, N : Nat, and $+_{Nat} : Nat \times Nat \mapsto Nat$. With this setting as a foundation, in the next section we specify K rewrite rules for static taint analysis of an imperative language in the abstract security type domain.

6 K REWRITING LOGIC SEMANTICS FOR TAINT ANALYSIS

This section introduces an executable rewriting logic semantics in the K framework for taint analysis of our language under consideration. As mentioned earlier, our semantics can be seen as a sound approximation of the program semantics in the security type domain.

To this aim, we consider the following K modeling of the program configuration on which the semantics is defined:

$$\textit{configuration} \equiv \langle\, \langle K \rangle_{k}\ \langle \textit{Map} \rangle_{env}\ \langle \textit{Map} \rangle_{context}\ \langle\, \langle \textit{Map} \rangle_{\lambda\text{-}Def}\ \langle \textit{List} \rangle_{fstack} \,\rangle_{control}\ \langle \textit{List} \rangle_{in}\ \langle \textit{List} \rangle_{out}\ \langle\, \langle \textit{Map} \rangle_{alias}\ \langle \textit{Set} \rangle_{ptr} \,\rangle_{ptr\text{-}alias} \,\rangle_{T}$$

As mentioned earlier, the special cell $\langle\ \rangle_{k}$ contains the list of computation tasks of a special sort K separated by the associative sequentialization operator $\curvearrowright$. The environment cell env maps variables (including pointer variables) to their security types. The current program context pc over the security domain is captured in the context cell. The λ-Def cell supports the interprocedural feature by holding the bindings of function names (when defined) to their lambda abstractions. All function calls are controlled by the control cell, which maintains stack-based context switching using the fstack cell. The cells in and out are used to perform standard input-output operations. To avoid false negatives in the analysis results, we consider the ptr-alias cell, which maintains pointer aliasing information in the alias cell. The ptr cell separates pointer variables from other variables to assist the alias analysis.

Figure 3 depicts the semantic rewrite rules for taint analysis in the K definitional framework. We label the defined rules by R for future reference. These rules capture both explicit and implicit flow sensitivity, context-sensitivity in the presence of function calls, the semantics of constant functions, pointer aliases, etc. Let us explain these rules in detail.

Declaration, Input, Lookup and Assignment: The first rule $(R)_{decl}$ deals with variable declarations and initializes the variables with their initial security types (untaint in our case) in the environment cell env. Any unsanitized input gets the type taint in rule $(R)_{read}$. The lookup rule $(R)_{lookup}$ replaces the variable term appearing on top of the k cell with its security type by looking into the environment cell. Note that the lookup rule for constant terms, although not shown here, always returns untaint. As defined in rule $(R)_{ar\text{-}op}$, the security types of expressions are soundly approximated by the least upper bound (defined in rule $(R)_{join}$) of the security types of their component terms. Rule $(R)_{asg}$, which handles assignment computations, updates the security type of id somewhere in the env cell to the least upper bound of the security type of the right-hand side expression (i.e., T) and the program's current security context pc in the context cell. The assignment is then replaced by an empty computation.

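Conditional or Iteration: The presence of a condition B in a simple if- or while-statement gives rise to the following two issues: (i) implicit flow of taint information based on the security type of B, and (ii) multiple execution paths, depending on whether the if- or while-block is entered. The former is handled by updating the security context µ in the context cell based on the security type of B, and the latter is handled by the subsequent tasks restore(µ) and approx(ρ). These are depicted in rules $(R)_{if}$, $(R)_{while}$, $(R)_{restore}$ and $(R)_{approx}$. Specifically, restore(µ) restores the previous context on exiting a block guarded by B, and approx(ρ) provides a sound approximation of the semantics as the least upper bound of the environments obtained over all possible execution paths due to the presence of B. Observe that the least fixed point solution in the case of "while" is achieved by defining an auxiliary function fixpoint() as follows: either (1) $\langle\, \dfrac{fixpoint(B, C, \rho_1)}{\cdot}\ \ldots \rangle_{k}\ \langle \rho_2 \rangle_{env}$ when $\rho_1 = \rho_2$, or (2) $\langle\, \dfrac{fixpoint(B, C, \rho_1)}{while(B)\ do\ \{C\}}\ \ldots \rangle_{k}\ \langle \rho_2 \rangle_{env}$ when $\rho_1 \neq \rho_2$. Note that the first case indicates that the computation has reached the fixpoint and is therefore consumed. If not, then the iteration continues as shown in the second case.

$(R)_{decl}$: $\langle\, \dfrac{\tau\ id}{\cdot}\ \ldots \rangle_{k}\ \langle\, \dfrac{\rho}{\rho[id \leftarrow T : Type]} \,\rangle_{env}$

$(R)_{read}$: $\langle\, \dfrac{read()}{taint}\ \ldots \rangle_{k}$

$(R)_{lookup}$: $\langle\, \dfrac{id}{T : Type}\ \ldots \rangle_{k}\ \langle \ldots\ id \mapsto T : Type\ \ldots \rangle_{env}$

$(R)_{ar\text{-}op}$: $\langle\, \dfrac{T_1 : Type\ \ op\ \ T_2 : Type}{T_1 : Type \sqcup T_2 : Type}\ \ldots \rangle_{k}$

$(R)_{asg}$: $\langle\, \dfrac{id := T : Type}{\cdot}\ \ldots \rangle_{k}\ \langle \ldots\ \rho[id \mapsto \dfrac{\_}{\mu(pc) \sqcup T : Type}]\ \ldots \rangle_{env}\ \langle \mu \rangle_{context}$

$(R)_{if}$: $\langle\, \dfrac{if\ (B : T)\ then\ \{C\}}{C \curvearrowright restore(\mu) \curvearrowright approx(\rho)}\ \ldots \rangle_{k}\ \langle\, \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \,\rangle_{context}\ \langle \rho \rangle_{env}$

$(R)_{if\text{-}else}$: $\langle\, \dfrac{if\ (B : T)\ then\ \{C_1\}\ else\ \{C_2\}}{C_1 \curvearrowright exitIf() \curvearrowright restore_{env}(\rho) \curvearrowright C_2 \curvearrowright exitElse() \curvearrowright restore(\mu)}\ \ldots \rangle_{k}\ \langle \rho \rangle_{env}\ \langle\, \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \,\rangle_{context}$

$(R)_{exit\text{-}if}$: $\langle\, \dfrac{exitIf()}{save(\rho)}\ \ldots \rangle_{k}\ \langle \rho \rangle_{env}$

$(R)_{exit\text{-}else}$: $\langle\, \dfrac{exitElse()}{approx(save(\rho))}\ \ldots \rangle_{k}$

$(R)_{while}$: $\langle\, \dfrac{while(B : T)\ do\ \{C\}}{C \curvearrowright restore(\mu) \curvearrowright approx(\rho) \curvearrowright fixpoint(B, C, \rho)}\ \ldots \rangle_{k}\ \langle \rho \rangle_{env}\ \langle\, \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \,\rangle_{context}$

$(R)_{fun\text{-}decl}$: $\langle\, \dfrac{defun\ Func\_name(Params)\{C\}}{\cdot}\ \ldots \rangle_{k}\ \langle\, \langle\, \dfrac{\psi}{\psi[Func\_name \leftarrow lambda(Params, C)]} \,\rangle_{\lambda\text{-}Def} \,\rangle_{control}$

$(R)_{fun\text{-}lookup}$: $\langle\, \dfrac{call\ Func\_name(Es : Ts)}{lambda(Params, C)(Es : Ts)}\ \ldots \rangle_{k}\ \langle\, \langle \ldots\ Func\_name \mapsto lambda(Params, C)\ \ldots \rangle_{\lambda\text{-}Def} \,\rangle_{control}$

$(R)_{fun\text{-}call}$: $\langle\, \dfrac{lambda(Params, C)(Es : Ts) \curvearrowright K}{McDecls(Params, Ts) \curvearrowright C \curvearrowright return;}\ \ldots \rangle_{k}\ \langle\, \langle\, \dfrac{.List}{[ListItem(\rho, K, Ctr)]}\ \ldots \rangle_{fstack}\ Ctr \,\rangle_{control}\ \langle \rho \rangle_{env}$

$(R)_{fun\text{-}ret}$: $\langle\, \dfrac{return(T : Type); \curvearrowright \_}{T : Type \curvearrowright K}\ \ldots \rangle_{k}\ \langle\, \langle\, \dfrac{[ListItem(\rho, K, Ctr)]}{.List}\ \ldots \rangle_{fstack}\ \left(\dfrac{\_}{Ctr}\right) \rangle_{control}\ \langle\, \dfrac{\_}{\rho} \,\rangle_{env}$

$(R)_{join}$: $\langle\, T_1 : Type \sqcup T_2 : Type\ \ldots \rangle_{k} = \begin{cases} \langle\, \dfrac{T_1 : Type \sqcup T_2 : Type}{untaint}\ \ldots \rangle_{k}, & \text{if } T_1 = T_2 = untaint \\ \langle\, \dfrac{T_1 : Type \sqcup T_2 : Type}{taint}\ \ldots \rangle_{k}, & \text{otherwise} \end{cases}$

$(R)_{restore}$: $\langle\, \dfrac{restore(\mu)}{\cdot}\ \ldots \rangle_{k}\ \langle\, \dfrac{\_}{\mu} \,\rangle_{context}$

$(R)_{approx}$: $\langle\, \dfrac{approx(\rho)}{\cdot}\ \ldots \rangle_{k}\ \langle\, \dfrac{\rho'}{\rho \sqcup \rho'} \,\rangle_{env}$

Figure 3: K rewrite rules for executable semantics-based taint analysis.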
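The interplay of restore, approx and fixpoint described above can be sketched as a tiny task-list machine. This Python sketch is an illustration only: the k cell is a Python list, tasks are tuples, guards and right-hand sides are represented by their free variables, and the function name `run` is hypothetical. It is not the K-Taint rule set.

```python
UNTAINT, TAINT = "untaint", "taint"

def join(a, b):
    return TAINT if TAINT in (a, b) else UNTAINT

def join_env(e1, e2):
    return {x: join(e1.get(x, UNTAINT), e2.get(x, UNTAINT)) for x in e1.keys() | e2.keys()}

def type_of(free_vars, env):
    t = UNTAINT
    for x in free_vars:
        t = join(t, env.get(x, UNTAINT))
    return t

def run(tasks, env):
    """Consume the task list (k cell) and return the final type environment."""
    k, env, pc = list(tasks), dict(env), UNTAINT
    while k:
        task = k.pop(0)
        tag = task[0]
        if tag == "assign":                    # ("assign", id, free_vars_of_rhs)
            env[task[1]] = join(pc, type_of(task[2], env))
        elif tag == "if":                      # ("if", free_vars_of_B, body_tasks)
            k[0:0] = list(task[2]) + [("restore", pc), ("approx", dict(env))]
            pc = join(pc, type_of(task[1], env))
        elif tag == "while":                   # ("while", free_vars_of_B, body_tasks)
            k[0:0] = list(task[2]) + [("restore", pc), ("approx", dict(env)),
                                      ("fixpoint", task[1], task[2], dict(env))]
            pc = join(pc, type_of(task[1], env))
        elif tag == "restore":                 # leave the guarded block
            pc = task[1]
        elif tag == "approx":                  # join with the environment saved at entry
            env = join_env(task[1], env)
        elif tag == "fixpoint":
            if env != task[3]:                 # environment changed: analyse the loop again
                k.insert(0, ("while", task[1], task[2]))
        else:
            raise ValueError(f"unknown task: {tag}")
    return env

env0 = {"a": UNTAINT, "b": TAINT, "c": UNTAINT}
loop = [("while", ["a"], [("assign", "c", ["a"]), ("assign", "a", ["b"])])]
final = run(loop, env0)
```

In this example the loop body has to be analysed three times: the first pass taints `a`, the second pass taints `c` through the now tainted guard, and the third pass confirms that nothing changes any more.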

The soundness of the analysis in the presence of if-else is guaranteed by approximating the analysis results from both branches $C_1$ and $C_2$ (a may-analysis), as depicted in rules $(R)_{if\text{-}else}$, $(R)_{exit\text{-}if}$ and $(R)_{exit\text{-}else}$. Observe that both branches are executed over the same environment, namely the one that holds at the entry point of the if-else, using $restore_{env}(\rho)$, which restores the environment and is defined similarly to rule $(R)_{restore}$.

Dealing with Functions: We specify the rules $(R)_{fun\text{-}decl}$, $(R)_{fun\text{-}lookup}$, $(R)_{fun\text{-}call}$, and $(R)_{fun\text{-}ret}$ to handle the interprocedural feature in our analysis. For each function definition, rule $(R)_{fun\text{-}decl}$ creates a lambda abstraction and binds it to the function name in the $\langle\ \rangle_{\lambda\text{-}Def}$ cell. On encountering a function call, rule $(R)_{fun\text{-}lookup}$ replaces the call by its lambda abstraction. We use a helper function McDecls() which recursively extracts the formal parameters of the called function and assigns to them the security types of the actual parameters in the calling function, as shown below:

$$\langle\, \dfrac{McDecls((param, params), (Type, Types))}{param := Type; \curvearrowright McDecls(params, Types)}\ \ldots \rangle_{k}$$

Note that McDecls() enforces context sensitivity by treating calls to the same function with different parameters differently. As usual, McDecls() is followed by the sequence of computations C in the function body and then by a return statement. When a function returns a result explicitly through a "return E" statement, rule $(R)_{fun\text{-}ret}$ is applied, which returns the security type of the resultant expression and restores the previous context to start the execution of the remaining tasks, specified as $\langle\, \dfrac{[ListItem(\rho, K, Ctr)]}{.List}\ \ldots \rangle_{fstack}$.
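The context-sensitive treatment of calls described above can be illustrated with a short Python sketch. It makes several simplifying assumptions that are not in the source: a function is stored as a triple of parameter names, a list of assignments, and the free variables of its return expression; the security context pc is taken to be untaint throughout; and the names `mc_decls` and `call` are hypothetical stand-ins for McDecls() and the call/return rules.

```python
UNTAINT, TAINT = "untaint", "taint"

def join(a, b):
    return TAINT if TAINT in (a, b) else UNTAINT

def type_of(free_vars, env):
    t = UNTAINT
    for x in free_vars:
        t = join(t, env.get(x, UNTAINT))
    return t

def mc_decls(params, types):
    """Recursively bind each formal parameter to the type of its actual argument."""
    if not params:
        return {}
    return {params[0]: types[0], **mc_decls(params[1:], types[1:])}

def call(defs, fstack, name, arg_types, caller_env):
    """Analyse one call; returns (type of the returned value, restored caller environment)."""
    params, body, ret_vars = defs[name]
    fstack.append(caller_env)             # save the caller's environment on the stack
    env = mc_decls(params, arg_types)     # the callee starts from this call's argument types
    for target, free_vars in body:        # body: list of (id, free variables of the rhs)
        env[target] = type_of(free_vars, env)
    result = type_of(ret_vars, env)
    return result, fstack.pop()           # restore the caller's environment on return

# defun identity(p) { r := p; return r; }
defs = {"identity": (["p"], [("r", ["p"])], ["r"])}
stack = []
caller = {"secret": TAINT, "zero": UNTAINT}
t1, env1 = call(defs, stack, "identity", [caller["secret"]], caller)
t2, env2 = call(defs, stack, "identity", [caller["zero"]], caller)
print(t1, t2)   # taint untaint
```

Because the body is re-analysed with the argument types of each call, the two calls to the same function yield different result types, which is the context sensitivity that a single summary per function would lose.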

7 DEALING WITH POINTERS ALIASING AND CONSTANT FUNCTIONS

The rules defined for implicit flow in Figure 3 are unsound in the presence of pointers. More precisely, given an assignment computation id := E, the correctness of the analysis is established by ensuring that the security type is updated to the security type of E not only for id but also for all of its aliases. To handle this scenario, the nested cells alias and ptr are designed to store the alias information and the set of pointer variables. The semantic rules are depicted in Figure 4. In the case of a simple assignment id := E where id is not a pointer variable, rule $(R_{10a})_{alias}$ triggers the update of the security type of id and of its direct pointers, identified in the alias cell, to the security type of E. As a consequence, rule $(R_{10b})_{alias}$ then performs the same update for all of its indirect pointers as well. The reason behind this is to ensure that all pointers which point, directly or indirectly, to a tainted value are themselves tainted, leading to a sound analysis. Similarly, rules $(R_{10c})_{alias}$, $(R_{10d})_{alias}$, and $(R_{10e})_{alias}$ refer to the assignment of security types to pointer variables and the creation of new alias information in the alias cell. Note that the author in (Asăvoae, 2014) integrated alias analysis in K as an instantiation of the collecting semantics, where alias information can be extracted from the alias cell in a demand-driven way. Our approach follows the same line, but in a much simpler way, without considering an exhaustive execution in the worst-case scenario.

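$(R_{10a})_{alias}$: $\langle\, \dfrac{id := E : T}{id := T \curvearrowright P := T}\ \ldots \rangle_{k}\ \langle\, \langle \ldots\ P \mapsto PointsTo(id)\ \ldots \rangle_{alias}\ \langle \eta \rangle_{ptr} \,\rangle_{ptr\text{-}alias}\ \langle \rho \rangle_{env}$ when $P \in \eta$

$(R_{10b})_{alias}$: $\langle\, \dfrac{P := T}{R := T}\ \ldots \rangle_{k}\ \langle\, \langle \ldots\ R \mapsto PointsTo(P)\ \ldots \rangle_{alias}\ \langle \eta \rangle_{ptr} \,\rangle_{ptr\text{-}alias}\ \langle \ldots\ P \mapsto \dfrac{\_}{T}\ \ldots \rangle_{env}$ when $P \in \eta$

$(R_{10c})_{alias}$: $\langle\, \dfrac{P := \&Q : T}{P := T}\ \ldots \rangle_{k}\ \langle\, \langle\, \xi[P \mapsto \dfrac{\_}{PointsTo(Q)}] \,\rangle_{alias}\ \langle \eta \rangle_{ptr} \,\rangle_{ptr\text{-}alias}$ when $P \in \eta$

$(R_{10d})_{alias}$: $\langle\, \dfrac{P := Q : T}{P := T}\ \ldots \rangle_{k}\ \langle\, \langle \ldots\ Q \mapsto PointsTo(S)\ \ldots\ P \mapsto \dfrac{\_}{PointsTo(S)}\ \ldots \rangle_{alias}\ \langle \eta \rangle_{ptr} \,\rangle_{ptr\text{-}alias}$ when $P \in \eta$

$(R_{10e})_{alias}$: $\langle\, \dfrac{P := {*Q} : T}{P := T}\ \ldots \rangle_{k}\ \langle\, \langle \ldots\ Q \mapsto PointsTo(S)\ \ldots\ S \mapsto PointsTo(M)\ \ldots\ P \mapsto \dfrac{\_}{PointsTo(M)}\ \ldots \rangle_{alias}\ \langle \eta \rangle_{ptr} \,\rangle_{ptr\text{-}alias}$ when $P \in \eta$

$(R)_{con\text{-}func}$: $\langle\, id_1 * id_2\ \ldots \rangle_{k} = \begin{cases} \langle\, \dfrac{id_1 * id_2}{untaint}\ \ldots \rangle_{k}, & \text{when } id_1 = zero \text{ or } id_2 = zero \\ \langle\, \dfrac{id_1 * id_2}{\cdots * \cdots : Type}\ \ldots \rangle_{k}, & \text{otherwise} \end{cases}$

Figure 4: K rules for pointer aliasing and constant functions.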
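The effect of the alias rules in Figure 4 on a simple assignment can be sketched in Python as a worklist propagation. The sketch rests on assumptions that go beyond the source: the alias cell is a dictionary from each pointer to the single variable it points to, the security context pc is ignored, and the function name `assign_with_aliases` is hypothetical.

```python
UNTAINT, TAINT = "untaint", "taint"

def assign_with_aliases(env, alias, ptrs, target, t):
    """Give `target` the type t and propagate it to every direct or indirect pointer to it.

    env   : variable -> security type
    alias : pointer -> variable it points to (stand-in for the alias cell)
    ptrs  : set of pointer variables (stand-in for the ptr cell)
    """
    worklist, seen = [target], set()
    while worklist:
        v = worklist.pop()
        if v in seen:              # guard against cyclic points-to chains
            continue
        seen.add(v)
        env[v] = t
        # every pointer P with P -> PointsTo(v) receives the same update
        worklist.extend(p for p, q in alias.items() if q == v and p in ptrs)
    return env

env = {"x": UNTAINT, "p": UNTAINT, "q": UNTAINT, "z": UNTAINT}
alias = {"p": "x", "q": "p"}       # p points to x, q points to p
ptrs = {"p", "q"}
assign_with_aliases(env, alias, ptrs, "x", TAINT)
print(env["p"], env["q"], env["z"])   # taint taint untaint
```

The first pass of the loop corresponds to updating id and its direct pointers, and the later passes correspond to following indirect pointers; variables that are not reachable through the alias map, such as `z`, are left unchanged.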

Apart from this, capturing the semantics of constant functions has a significant impact on the precision of taint analysis. For example, consider the statement v := x × 0 + 4, where x is a tainted variable. It is worthwhile to observe that, although syntax-based taint flow marks the variable v as tainted, the semantics of the constant function "x × 0 + 4", which always results in 4 irrespective of the value of x, makes v untainted. The semantic approximation in the security domain, due to the abstraction, makes dealing with constant functions challenging. As a partial solution, we specify rules for some simple cases of constant functions such as x − x, x xor x, x × 0, etc. We show one such rule in $(R)_{con\text{-}func}$.
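The simple cases of constant functions listed above can be recognised syntactically before the usual join is taken. The Python sketch below illustrates this under assumptions of its own: expressions are tuples `("const", n)`, `("var", x)` or `("op", symbol, left, right)`, operands are compared structurally and are assumed to be side-effect free, and the function name `expr_type` is hypothetical.

```python
UNTAINT, TAINT = "untaint", "taint"

def join(a, b):
    return TAINT if TAINT in (a, b) else UNTAINT

def expr_type(e, env):
    """Security type of an expression, with special cases for simple constant functions."""
    tag = e[0]
    if tag == "const":
        return UNTAINT
    if tag == "var":
        return env.get(e[1], UNTAINT)
    _, op, left, right = e
    if op in ("-", "xor") and left == right:        # x - x and x xor x are always 0
        return UNTAINT
    if op == "*" and ("const", 0) in (left, right): # x * 0 and 0 * x are always 0
        return UNTAINT
    return join(expr_type(left, env), expr_type(right, env))

env = {"x": TAINT, "y": TAINT}
x, y = ("var", "x"), ("var", "y")
v1 = expr_type(("op", "+", ("op", "*", x, ("const", 0)), ("const", 4)), env)   # x * 0 + 4
v2 = expr_type(("op", "-", x, x), env)                                          # x - x
v3 = expr_type(("op", "xor", x, y), env)                                        # x xor y
print(v1, v2, v3)   # untaint untaint taint
```

The check is deliberately syntactic: `x xor y` stays tainted even if `y` was earlier assigned from `x`, because the two operands are different terms.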

In this context, as a notable observation, we consider the following scenario: given the code fragment y := read(); x := y; v := x xor y, the analysis successfully marks the variable v as tainted. Indeed, attackers may inject some malicious input containing a vulnerable control part for which the xor operation fails to nullify the effect, affecting the subsequent critical computation involving v.

Table 3: Taint Analysis on Benchmark Programs Set (SecuriBench, 2006; Cavallaro et al., 2008; Vogt et al., 2007; Evans et al., 2003; Russo and Sabelfeld, 2010) (✓: Passed, ✗: False Positives, ✗−: False Negatives).

Result columns, in order: K-Taint, Splint (Evans and Larochelle, 2002), Pixy (Jovanovic et al., 2006), SFlow (Huang et al., 2014), CQual (Foster et al., 2002).

Progs. | Descriptions | Results
Prog1 | Explicit Flow | ✓ ✓ ✓ ✓ ✓
Prog2 | Implicit Flow | ✓ ✗−
Prog3 | Malware Attack | ✓ ✗−
Prog4 | XSS Attack | ✓ ✗−
Prog5 | Buffer Overflow | ✓ ✓ ✗−
Prog6 | Constant Function "subtraction" | ✓ ✗
Prog7 | Program consists of multiple functions | ✓ ✗−, ✗− ✓ ✗−
Prog8 | Program with context-sensitivity | ✓ ✗−, ✗ ✓ ✓ ✗
Prog9 | Factorial Program | ✓ ✗−
Prog10 | Binary Search | −
Prog11 | Merge Sort | −
Prog12 | Program with flow-sensitivity | ✓ ✗− ✓ ✗−
Prog13 | Swapping of two numbers using pointers | ✓ ✓ ✓ ✓ ✗−

We end this section by stating the fundamental results on K-Taint. We skip the proofs for brevity.

Theorem 1 (Soundness). The semantics defined in K-Taint is a sound approximation of the concrete collecting semantics with respect to the security properties of variables.

Theorem 2 (Termination). Any execution in K-Taint is always finite.

Consider the security type domain S of n security levels with order relation $\sqsubseteq$. Given $s_1, s_2 \in S$, $s_1 \sqsubseteq s_2$ denotes that $s_1$ is more trusted than $s_2$. For example, $untaint \sqsubseteq taint$.

Definition 1 ($s_c$-indistinguishability). Let X be the set of program variables participating in critical computations of a program P. Let $s_c \in S$ be the permissible security level for critical computations in P, meaning that any variable in X with security level $s \sqsubseteq s_c$ can securely participate in the critical computations. Given two type environments $\rho_1$ and $\rho_2$, we say that they are $s_c$-indistinguishable (denoted $\langle \rho_1 \rangle_{env} \approx_{s_c} \langle \rho_2 \rangle_{env}$) iff $\forall x \in X.\ \rho_1(x) \sqsubseteq s_c \wedge \rho_2(x) \sqsubseteq s_c$, meaning that they agree on the sensitivity levels for security-sensitive variables.

Theorem 3 (Non-interference). Consider any two type environments $\rho_1$ and $\rho_2$ such that $\langle \rho_1 \rangle_{env} \approx_{s_c} \langle \rho_2 \rangle_{env}$. A program P is secure iff the K-executions of P on these two environments result in the environments $\langle \rho'_1 \rangle_{env}$ and $\langle \rho'_2 \rangle_{env}$, respectively, satisfying $\langle \rho'_1 \rangle_{env} \approx_{s_c} \langle \rho'_2 \rangle_{env}$.
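The indistinguishability condition in Definition 1 is a plain predicate over two type environments, which the following Python sketch spells out for the two-level domain. The function name `indistinguishable` and the parameter name `permitted` (standing for the permissible security level) are illustrative assumptions.

```python
UNTAINT, TAINT = "untaint", "taint"
RANK = {UNTAINT: 0, TAINT: 1}   # untaint is more trusted than taint

def leq(a, b):
    return RANK[a] <= RANK[b]

def indistinguishable(rho1, rho2, critical_vars, permitted):
    """True iff every critical variable is at or below `permitted` in both environments."""
    return all(leq(rho1[x], permitted) and leq(rho2[x], permitted)
               for x in critical_vars)

rho1 = {"query": UNTAINT, "log": TAINT}
rho2 = {"query": UNTAINT, "log": UNTAINT}
print(indistinguishable(rho1, rho2, {"query"}, UNTAINT))          # True
print(indistinguishable(rho1, rho2, {"query", "log"}, UNTAINT))   # False
```

In the first call only `query` is security-critical and it is untainted in both environments; in the second call `log` is also critical and is tainted in `rho1`, so the condition fails.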

8 EXPERIMENTAL ANALYSIS

We have implemented the full set of semantics rules (more than 200 rules) in the K tool (version 4.0) for the imperative language under consideration and performed experiments on a set of benchmark codes collected from (SecuriBench, 2006; Cavallaro et al., 2008; Evans et al., 2003; Russo and Sabelfeld, 2010; Vogt et al., 2007) and on some well-known programs¹,². A wide range of representative programs is considered, including explicit flow, implicit flow due to conditionals or iteration, XSS attacks, malware attacks, merge sort, binary search, factorial, constant functions, etc. Since K-Taint supports a C-like language, it accepts the benchmark C source files as input from the console using K Framework-specific commands. The evaluation results are shown in Table 3, where the results of K-Taint are compared with those obtained from some of the available static taint analysis tools, namely Splint (Evans and Larochelle, 2002), Pixy (Jovanovic et al., 2006), SFlow (Huang et al., 2014), and CQual (Foster et al., 2002), in columns 3-7. The notations ‘✗⁺’ and ‘✗⁻’ indicate failures due to false positives and false negatives respectively, whereas ‘✓’ indicates a successful detection of taint vulnerabilities. Observe that, due to its flow-sensitivity, context-sensitivity, and the enhancement to deal with constant functions, K-Taint significantly reduces the occurrences of false alarms. The authors of (Cavallaro et al., 2008), (Russo and Sabelfeld, 2010) and (Vogt et al., 2007) highlighted some special cases where their approaches fail. We considered those special cases (shown as Prog3, Prog4 and Prog12 in Table 3) and observed that K-Taint successfully captures those taint flows.

¹ The online version of the K tool is available at http://www.kframework.org/tool/run/

² The full set of semantics rules in K-Taint and the evaluation results on the test codes are available for download at www.iitp.ac.in/~halder/ktaint


9 CONCLUSION

This paper presented an executable rewriting logic semantics for static taint analysis of an imperative programming language in the K framework. The proposed approach improves precision with respect to existing techniques, as shown by our experimental evaluation on a set of well-known benchmark programs. We have made the full set of semantics rules and the experimental data available for download. We are currently investigating how to integrate into the proposed analyzer a preprocessing phase that addresses specific cases where the exact values of variables may improve the precision. As future work, we plan to add semantic rules covering more language features, as an extension to the current imperative language, and to address more semantics-based non-dependencies.

ACKNOWLEDGEMENT

This work is partially supported by the research grant (SB/FTP/ETA-315/2013) from the Science and Engineering Research Board (SERB), Department of Science and Technology, Government of India.

REFERENCES

Amtoft, T. and Banerjee, A. (2004). Information flow analysis in logical form. In SAS, volume 3148, pages 100–115. Springer.

Asăvoae, I. M. (2014). Abstract semantics for alias analysis in k. Electronic Notes in Theoretical Computer Science, 304:97–110.

Cavallaro, L., Saxena, P., and Sekar, R. (2008). On the limits of information flow techniques for malware analysis and containment. In Proc. of Int. Conf. on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 143–163. Springer.

Cifuentes, C. and Scholz, B. (2008). Parfait: designing a scalable bug checker. In Proc. of the 2008 workshop on Static analysis, pages 4–11. ACM.

Clavel, M. et al. (2007). All about maude - a high-performance logical framework: how to specify, program and verify systems in rewriting logic. Springer-Verlag.

Corin, R. and Manzano, F. A. (2012). Taint analysis of security code in the klee symbolic execution engine. In ICICS, pages 264–275. Springer.

Denning, D. E. and Denning, P. J. (1977). Certification of programs for secure information flow. Communications of the ACM, 20(7):504–513.

Evans, D. and Larochelle, D. (2002). Improving security using extensible lightweight static analysis. IEEE software, 19(1):42–51.

Evans, D., Larochelle, D., and Evans, D. (2003). Splint manual: Version 3.1.1-1. http://lclint.cs.virginia.edu/manual/manual.html.

Foster, J. S. et al. (2002). Cqual user’s guide. University of California, Berkeley, version 0.9 edition.

Huang, W., Dong, Y., and Milanova, A. (2014). Type-based taint analysis for java web applications. In Proc. of Int. Conf. on Fundamental Approaches to Software Engineering, pages 140–154. Springer.

Hunt, S. and Sands, D. (2006). On flow-sensitive security types. In Conf. Record of the 33rd ACM SIGPLAN-SIGACT Sym. on POPL, pages 79–90, S. California. ACM.

Jovanovic, N., Kruegel, C., and Kirda, E. (2006). Pixy: A static analysis tool for detecting web application vulnerabilities. In IEEE Symposium on Security and Privacy (S&P’06), pages 258–263. IEEE.

Livshits, V. B. and Lam, M. S. (2005). Finding security vulnerabilities in java applications with static analysis. In USENIX Security Symposium, volume 14, pages 18–18.

Meseguer, J. and Roşu, G. (2007). The rewriting logic semantics project. Theoretical Computer Science, 373(3):213–237.

Noundou, X. N. (2015). Saint: Simple static taint analysis tool users manual. https://archive.org/details/saint 201507.

Roşu, G. and Şerbănuţă, T. F. (2010). An overview of the k semantic framework. The Journal of Logic and Algebraic Programming, 79(6):397–434.

Russo, A. and Sabelfeld, A. (2010). Dynamic vs. static flow-sensitive security analysis. In 23rd IEEE Computer Security Foundations Symposium, pages 186–199. IEEE.

Sabelfeld, A. and Myers, A. C. (2006). Language-based information-flow security. IEEE Journal on selected areas in communications, 21(1):5–19.

Scholz, B., Zhang, C., and Cifuentes, C. (2008). User-input dependence analysis via graph reachability. Technical Report SMLI TR-2008-171, Mountain View, CA, USA.

SecuriBench (2006). Stanford securibench micro. http://suif.stanford.edu/~livshits/work/securibench-micro/.

Sridharan, M., Artzi, S., Pistoia, M., Guarnieri, S., Tripp, O., and Berg, R. (2011). F4f: taint analysis of framework-based web applications. ACM SIGPLAN Notices, 46(10):1053–1068.

Tripp, O., Pistoia, M., Fink, S. J., Sridharan, M., and Weisman, O. (2009). Taj: effective taint analysis of web applications. In ACM Sigplan Notices, volume 44, pages 87–97. ACM.

Vogt, P., Nentwich, F., Jovanovic, N., Kirda, E., Kruegel, C., and Vigna, G. (2007). Cross site scripting prevention with dynamic data tainting and static analysis. In NDSS, volume 2007, page 12.

Volpano, D., Irvine, C., and Smith, G. (1996). A sound type system for secure flow analysis. J. Comput. Secur., 4(2-3):167–187.

ENASE 2018 - 13th International Conference on Evaluation of Novel Approaches to Software Engineering
