K-Taint: An Executable Rewriting Logic Semantics for Taint Analysis in
the K Framework
Md. Imran Alam¹, Raju Halder¹,², Harshita Goswami¹ and Jorge Sousa Pinto²
¹ Indian Institute of Technology Patna, India
² HASLab/INESC TEC & Universidade do Minho, Braga, Portugal
Keywords:
Taint Analysis, K Framework, Information Flow, Security.
Abstract:
The K framework is a rewrite logic-based framework for defining programming language semantics suitable
for formal reasoning about programs and programming languages. In this paper, we present K-Taint, a rewrit-
ing logic-based executable semantics in the K framework for taint analysis of an imperative programming
language. Our K semantics can be seen as a sound approximation of program semantics in the corresponding
security type domain. More specifically, as a foundation for this objective, we extend the semantically sound
flow-sensitive security type system of Hunt and Sands to the case of taint analysis, with additional support
for interprocedural analysis. With respect to the existing methods, K-Taint supports context- and
flow-sensitive analysis, reduces false alarms, and provides a scalable solution. Experimental evaluation on
several benchmark codes demonstrates encouraging results as an improvement in the precision of the analysis.
1 INTRODUCTION
Taint analysis is a widely used program analysis tech-
nique that aims at averting malicious inputs from
corrupting data values in critical computations of
programs (Huang et al., 2014; Jovanovic et al.,
2006; Tripp et al., 2009). Examples where taint at-
tacks severely compromise security are SQL injec-
tion, cross-site scripting, buffer overflow, etc. (Jo-
vanovic et al., 2006). The following code snippet in
Figure 1 depicts one such taint attack, where input supplied
by a malicious source through the formal parameter
'src' of the function 'foo()' may affect the cells
neighboring the character array buf in memory.
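#include <string.h>

/* Copies src into a fixed-size buffer without checking its length. */
void foo(char *src) {
    char buf[20]; int i = 0;
    while (i <= strlen(src)) {   /* no bound on i: writes past buf when strlen(src) >= 20 */
        buf[i] = src[i]; i++;
    }
    return;
}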
Figure 1: An Example Taint Attack.
This way attackers may store some malicious data
into the neighboring cells of buf which may be
accessed by legitimate applications, causing unpre-
dictable behavior.
Static taint analysis approaches, in principle, ana-
lyze the propagation of tainted values from untrusted
sources to security-sensitive sinks along all possible
program paths without actually executing the code
(Cifuentes and Scholz, 2008; Huang et al., 2014; Jovanovic
et al., 2006; Tripp et al., 2009). Because they are
sound and conservative, they often over-approximate
the analysis results. Although this may introduce
false positives, it always establishes a security
guarantee: tainted data cannot be
passed to security-sensitive operations.
In the context of software security, the integrity
of software systems is treated as a dual of the confi-
dentiality problem (Sabelfeld and Myers, 2006), both
of which can be enforced by controlling information
flows. Work in this direction started with the
pioneering work of Denning and Denning (Denning
and Denning, 1977), which enforces a restrictive
information flow policy defined on a mathematical
lattice model of security classes partially ordered by
sensitivity levels. Inspired by this, a wide range of
language-based approaches have been proposed in the
literature, the majority of which focus on confidentiality
(Amtoft and Banerjee, 2004; Hunt and Sands, 2006;
Sabelfeld and Myers, 2006; Volpano et al., 1996).
Nevertheless, in the line of taint information flow ad-
dressing software integrity, the existing data-flow and
point-to analysis-based approaches (Jovanovic et al.,
2006; Noundou, 2015; Sridharan et al., 2011; Tripp
Alam, M., Halder, R., Goswami, H. and Sousa Pinto, J.
K-Taint: An Executable Rewriting Logic Semantics for Taint Analysis in the K Framework.
DOI: 10.5220/0006786603590366
In Proceedings of the 13th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2018), pages 359-366
ISBN: 978-989-758-300-1
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
et al., 2009; Livshits and Lam, 2005) suffer
from false alarms because they ignore control flow
and the semantics of constant functions. Security
type systems (Foster et al., 2002; Huang et al., 2014)
have emerged independently as a competing, and
probably the most popular, approach to static taint
analysis.
In this paper, as a contribution to the same re-
search line, we put forward a rewriting logic-based
executable semantics for taint analysis in the K frame-
work, considering an extension of Hunt and Sands’s
semantically sound flow-sensitive security type system
as the basis. The K framework (Roșu and
Șerbănuță, 2010) is a rewrite logic-based formal
framework for defining programming language
semantics. Such semantic definitions are directly
executable in a rewriting logic language, e.g. Maude
(Clavel et al., 2007), and thus support the development
of verification and analysis tools at no additional cost.
To summarize, our main contributions are:
• We explore the power of the K framework to define
  K-Taint.
• To this aim, we extend the flow-sensitive security
  type system proposed by Hunt and Sands (Hunt
  and Sands, 2006) as the basis.
• We specify K rewrite rules which capture taint
  information propagation along all possible
  program paths.
• We enhance our proposed approach in terms of
  precision by handling pointer aliasing and
  constant functions.
• We present experimental evaluation results to
  establish the effectiveness of our approach.
The paper is organized as follows: Section 2 discusses
the related works in the literature on static taint anal-
ysis. Section 3 briefly introduces the K framework.
In Section 4, we extend Hunt and Sands's security type
system to the case of taint analysis. Sections
5 and 6 present the executable rewriting logic seman-
tics in K designed for taint analysis. Section 7 defines
the semantics rules to handle pointer aliases and con-
stant functions. The experimental evaluation results
are reported in section 8. Finally, section 9 concludes
our work.
2 RELATED WORKS
Although many language-based information flow ap-
proaches addressing confidentiality exist in the liter-
ature (Sabelfeld and Myers, 2006; Hunt and Sands,
2006; Volpano et al., 1996), this section restricts the
discussions only to the static taint approaches in the
same line. Works on taint analysis, as a dual of confi-
dentiality, include security type systems (Foster et al.,
2002; Huang et al., 2014), flow-analysis (Evans and
Larochelle, 2002; Jovanovic et al., 2006; Noundou,
2015; Scholz et al., 2008; Tripp et al., 2009), point-
to analysis (Livshits and Lam, 2005; Tripp et al.,
2009), etc. The flow-sensitivity in CQual (Foster
et al., 2002) is triggered by specifying manually a
partial order configuration on security qualifiers. Un-
fortunately, CQual is unable to support implicit flow-
sensitivity in presence of branches. On the other
hand, SFlow (Huang et al., 2014), a type-based taint
analyzer for Java Web applications, performs type
judgement based on calling-context viewpoint
adaptation without actually flowing the context information
through the called function body, which may often
result in false alarms. Like CQual, SFlow also
ignores implicit flow. As alternative solutions, taint
analysis attracts many proposals on data-flow analy-
sis (Jovanovic et al., 2006; Noundou, 2015; Sridharan
et al., 2011; Tripp et al., 2009) and point-to analysis
(Livshits and Lam, 2005; Tripp et al., 2009). Unfor-
tunately, given the ignorance of control dependencies,
these techniques are unable to capture indirect influ-
ence of taint information on other variables due to
implicit-flow. Although the authors in (Cifuentes and
Scholz, 2008; Corin and Manzano, 2012; Evans and
Larochelle, 2002; Scholz et al., 2008) have consid-
ered both data- and control-dependencies, these ap-
proaches fail to address false positives in presence
of constant functions, such as x := 0 × x, x := y − y,
etc. A summary of the state-of-the-art tools and tech-
niques in the line of static taint analysis only, as com-
pared with K-Taint, is given in Table 1.
3 THE K FRAMEWORK
The K framework provides a rewrite logic-based
framework suitable for the design and analysis of
programming languages. Inspired by the rewriting logic
semantics project (Meseguer and Roșu, 2007), this
framework unifies algebraic denotational semantics
and operational semantics by considering them as two
different views of the same object.
To define semantics of programming language
constructs, the K framework mainly relies on config-
uration and K rewrite rules. Configuration specifies
the structure of the abstract machine on which pro-
grams written in that language will run and this is
represented as labeled nested cells (i.e., List, Map,
Bag, Set, etc.). For example, consider the following
configuration with three cells:
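Table 1: A Comparative Summary (✓ denotes partially successful at this stage).

| Feature                          | K-Taint                              | Pixy | Taintgrind | SAINT | TAJ  | Splint | Parfait | SFlow | CQual | KLEE |
|----------------------------------|--------------------------------------|------|------------|-------|------|--------|---------|-------|-------|------|
| Semantics/Security Type System   | ✓                                    | ✗    | ✗          | ✗     | ✓    | ✗      | ✓       | ✓     | ✓     | ✓    |
| Explicit Flow                    | ✓                                    | ✓    | ✓          | ✓     | ✓    | ✓      | ✓       | ✓     | ✓     | ✓    |
| Implicit Flow                    | ✓                                    | ✗    | ✗          | ✗     | ✗    | ✓      | ✓       | ✗     | ✗     | ✓    |
| Constant Functions               | ✓                                    | ✗    | ✗          | ✗     | ✗    | ✗      | ✗       | ✗     | ✗     | ✗    |
| Flow-Sensitivity                 | ✓                                    | ✓    | ✓          | ✓     | ✓    | ✓      | ✓       | ✗     | ✓     | ✓    |
| Context-Sensitivity              | ✓                                    | ✓    | ✗          | ✓     | ✓    | ✗      | ✓       | ✓     | ✗     | ✓    |
| Language Supported               | Imperative (including C-like syntax) | PHP  | C          | C     | Java | C      | C       | Java  | C     | C    |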
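$$\mathit{configuration}\ \ \big\langle\ \langle K \rangle_{k}\ \ \langle \mathit{Map}[\mathit{Var} \mapsto \mathit{Loc}] \rangle_{env}\ \ \langle \mathit{Map}[\mathit{Loc} \mapsto \mathit{Val}] \rangle_{store}\ \big\rangle_{T}$$
The k cell holds a list of computational tasks, that is, $k : \mathit{List}\{K, \curvearrowright\}$,
where K holds computational contents
such as programs or fragments of programs and $\curvearrowright$ is
the task sequentialization operator, which sequentializes
program statements. The env cell maps variables
to their locations (i.e., $env : \mathit{Var} \mapsto \mathit{Loc}$) and the store
cell maps locations to values (i.e., $store : \mathit{Loc} \mapsto \mathit{Val}$).
These cells are enclosed by the top cell, denoted by T.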
K rewrite rules are classified into two types:
computational rules, which may be interpreted as transitions
in a program execution, and structural rules, which
rearrange a term to enable the application of
computational rules. For better understanding, let us consider
the following rule, involving the two cells k and env,
for finding the address of a variable:
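$$\Big\langle \dfrac{\&Y}{L}\ \cdots \Big\rangle_{k}\ \ \langle \cdots\ Y \mapsto L\ \cdots \rangle_{env}$$
Here, the term above the horizontal line is rewritten to the
term below it. The rule specifies that the next task to evaluate
is a reference operator (&) on the variable Y, which results in the
location L in memory based on the match in the
environment cell env.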
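To make the address-of rule concrete, the following is a minimal Python sketch, not the K tool's actual data model. It assumes each cell can be modelled as a plain Python value (a list for k, dictionaries for env and store); the names `make_config` and `step_address_of` are hypothetical and introduced only for illustration.

```python
# Hypothetical model of a K configuration: each cell is a plain Python value.
def make_config(program):
    """Build a configuration with a k cell (task list), env and store cells."""
    return {"k": list(program), "env": {}, "store": {}}


def step_address_of(cfg):
    """Apply the address-of rule once if the head of the k cell is ('&', Y).

    The head task &Y is rewritten to the location L that env binds to Y;
    the remaining tasks in the k cell are left untouched.
    """
    if not cfg["k"]:
        return False
    head = cfg["k"][0]
    if isinstance(head, tuple) and head[0] == "&" and head[1] in cfg["env"]:
        cfg["k"][0] = cfg["env"][head[1]]
        return True
    return False


cfg = make_config([("&", "y"), "rest"])
cfg["env"]["y"] = 7          # assume variable y lives at location 7
step_address_of(cfg)
print(cfg["k"])              # [7, 'rest']
```

The rule only fires when the pattern matches: an empty k cell or an unbound variable leaves the configuration unchanged, mirroring the fact that a K rule applies only on a successful match.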
In the K framework, the syntax of a language is given using
conventional BNF notation annotated with semantic
attributes that enforce the evaluation strategy
of a construct. For example, consider the following
definition for arithmetic expressions:
syntax E ::= E_1 "+" E_2 [strict]
The attribute strict allows $E_1$ and $E_2$ to be evaluated in
any order, thus introducing non-determinism. The
annotation above corresponds to the following four
heating/cooling rules:
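$$\Big\langle \dfrac{E_1 + E_2}{E_1 \curvearrowright \square + E_2}\ \cdots \Big\rangle_{k} \quad\Big|\quad \Big\langle \dfrac{E_1 + E_2}{E_2 \curvearrowright E_1 + \square}\ \cdots \Big\rangle_{k}$$
$$\Big\langle \dfrac{V_1 \curvearrowright \square + E_2}{V_1 + E_2}\ \cdots \Big\rangle_{k} \quad\Big|\quad \Big\langle \dfrac{V_2 \curvearrowright E_1 + \square}{E_1 + V_2}\ \cdots \Big\rangle_{k}$$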
Here, $V_1$ and $V_2$ are the evaluated results of the
expressions $E_1$ and $E_2$, respectively. The construct
$\square$ (HOLE) is a placeholder that will be replaced by the
result of the evaluated term or sub-term.
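The heating and cooling steps can be illustrated with a small Python sketch. It is a simplification under stated assumptions: expressions are tuples `(op, e1, e2)`, evaluated results are Python integers, and the sketch always heats the left operand first, whereas the strict attribute leaves the order open. The names `heat`, `cool` and `HOLE` are hypothetical.

```python
HOLE = "HOLE"  # stands for the placeholder left behind by a heated operand


def heat(k):
    """Pull the first unevaluated operand of (op, e1, e2) to the front of k."""
    op, e1, e2 = k[0]
    if not isinstance(e1, int):
        return [e1, (op, HOLE, e2)] + k[1:]
    if not isinstance(e2, int):
        return [e2, (op, e1, HOLE)] + k[1:]
    return k  # both operands are already values: nothing to heat


def cool(k):
    """Plug the value at the front of k back into the HOLE of the next task."""
    v, (op, e1, e2) = k[0], k[1]
    plugged = (op, v, e2) if e1 == HOLE else (op, e1, v)
    return [plugged] + k[2:]


k = heat([("+", "x", 4)])   # ['x', ('+', 'HOLE', 4)]
k[0] = 3                    # assume the lookup of x produced the value 3
k = cool(k)
print(k)                    # [('+', 3, 4)]
```

The task list plays the role of the k cell: heating pushes a sub-term in front of a frozen context, and cooling removes that context again once the sub-term has become a value.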
4 EXTENDING HUNT AND
SANDS’S TYPE SYSTEM TO
TAINT ANALYSIS
In this section, we define a type-based taint analysis
for an imperative programming language supporting
functions, arrays, pointers, etc. Table 2 depicts the
abstract syntax of the basic language under consideration,
where $\vec{D}$ and $\vec{E}$ denote a sequence
of declarations $\langle \tau\ id_1, \tau\ id_2, \dots \rangle$ and a sequence of
arithmetic expressions $\langle E_1, E_2, \dots \rangle$, respectively.
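Table 2: Abstract Syntax of the Language.
$$
\begin{array}{rcl}
E & ::= & n \mid id \mid \&id \mid {*E} \mid id[E] \mid E\ op\ E \mid (E), \quad \text{where } op \in \{+, -, \times, /\} \\
B & ::= & true \mid false \mid E\ rel\ E \mid \neg B \mid B\ \mathrm{AND}\ B \mid B\ \mathrm{OR}\ B, \quad \text{where } rel \in \{\geq, \leq, <, >, ==\} \\
\tau & ::= & int \mid float \mid char \mid bool \mid \tau[n] \mid \tau{*} \\
D & ::= & \tau\ id \\
A & ::= & id := E \mid {*E} := E \mid id[E] := E \mid id := read() \\
C & ::= & skip; \mid D; \mid A; \mid defun\ id(\vec{D})\{C\} \mid call\ id(\vec{E}); \mid return; \mid return\ E; \mid C_1\ C_2 \\
  & \mid & if\ B\ then\ \{C\} \mid if(B)\ then\ \{C_1\}\ else\ \{C_2\} \mid while(B)\ do\ \{C\}
\end{array}
$$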
Our work is mainly motivated by the security type
system proposed in (Hunt and Sands, 2006), which was
primarily designed to detect possible leakage of
sensitive information from programs. Unlike other
similar type systems, the system of (Hunt and Sands, 2006) is
flow-sensitive. We extend this flow-sensitive
type system for the purposes of our taint analysis with
additional support for context-sensitivity in the case
of inter-procedural code, leading to a significant
improvement in precision. This is depicted in
Figure 2. Although our proposal is scalable enough, we
consider, for the sake of simplicity, a simple instance
of the problem involving two security types: taint
and untaint. We work with the join semi-lattice
of the type domain $SD = \langle S, \sqsubseteq, \sqcup \rangle$, where
$S = \{taint, untaint\}$ and the partial order relation is
defined as $untaint \sqsubseteq taint$.
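As an illustration of this two-point domain, the sketch below encodes the lattice in Python. The encoding (an `IntEnum` whose numeric order mirrors the partial order) and the names `Sec`, `leq`, `join` and `join_all` are assumptions made for this example only.

```python
from enum import IntEnum
from functools import reduce


class Sec(IntEnum):
    """Two-point security domain; the numeric order mirrors untaint ⊑ taint."""
    UNTAINT = 0
    TAINT = 1


def leq(a, b):
    """Partial order: a ⊑ b."""
    return a <= b


def join(a, b):
    """Least upper bound of two security types."""
    return max(a, b)


def join_all(types):
    """Least upper bound of a collection; untaint is the bottom element."""
    return reduce(join, types, Sec.UNTAINT)


print(join(Sec.UNTAINT, Sec.TAINT).name)   # TAINT
print(join_all([]).name)                   # UNTAINT
```

Using the bottom element as the starting value of `join_all` matches the convention that an expression without free variables, such as a constant, is untainted.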
The typing judgements are of the form $pc \vdash \Gamma\ \{C\}\ \Gamma'$,
where $pc \in S$ represents the security context
used to address implicit flow, C is the program
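$$[\text{Expression}]\quad \Gamma \vdash E : \bigsqcup_{x \in FV(E)} \Gamma(x) \qquad\qquad [\text{skip}]\quad pc \vdash \Gamma\ \{skip\}\ \Gamma$$
$$[\text{Declaration}]\quad pc \vdash \Gamma\ \{\tau\ id\}\ \Gamma[[id \mapsto pc \sqcup untaint]]$$
$$[\text{Read}]\quad pc \vdash \Gamma\ \{id = read()\}\ \Gamma[[id \mapsto pc \sqcup taint]]$$
$$[\text{Assignment}]\quad \dfrac{\Gamma \vdash E : T}{pc \vdash \Gamma\ \{id = E\}\ \Gamma[[id \mapsto pc \sqcup T]]}$$
$$[\text{Function Call}]\quad \dfrac{\Gamma \vdash \vec{E} : \vec{T} \quad defun\ id(\vec{D})\{C\} \quad \vec{X} = getParam(\vec{D}) \quad \Gamma[[\vec{X} \mapsto \vec{T}]] = \Gamma' \quad \dfrac{pc \vdash \Gamma'\ \{C\}\ \Gamma''}{pc \vdash \Gamma'\ \{defun\ id(\vec{D})\{C\}\}\ \Gamma''}}{pc \vdash \Gamma\ \{call\ id(\vec{E})\}\ \Gamma''}$$
$$[\text{if}]\quad \dfrac{\Gamma \vdash B : T \quad pc \sqcup T \vdash \Gamma\ \{C\}\ \Gamma'}{pc \vdash \Gamma\ \{if\ B\ then\ C\}\ \Gamma \sqcup \Gamma'}$$
$$[\text{if-else}]\quad \dfrac{\Gamma \vdash B : T \quad pc \sqcup T \vdash \Gamma\ \{C_1\}\ \Gamma' \quad pc \sqcup T \vdash \Gamma\ \{C_2\}\ \Gamma''}{pc \vdash \Gamma\ \{if\ B\ then\ \{C_1\}\ else\ \{C_2\}\}\ \Gamma' \sqcup \Gamma''}$$
$$[\text{while}]\quad \dfrac{\Gamma'_i \vdash B : T_i \quad pc \sqcup T_i \vdash \Gamma'_i\ \{C\}\ \Gamma''_i \quad 0 \leq i \leq k \quad \Gamma'_0 = \Gamma \quad \Gamma'_{i+1} = \Gamma''_i \sqcup \Gamma \quad \Gamma'_{k+1} = \Gamma'_k}{pc \vdash \Gamma\ \{while\ B\ do\ \{C\}\}\ \Gamma'_k}$$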
Figure 2: Flow- and Context-sensitive Security Type Rules
for Taint Analysis.
statements, and $\Gamma, \Gamma' : \mathit{Variables} \to S$ are environments.
The security type T of an expression E (denoted
$\Gamma \vdash E : T$) is defined simply by the least upper bound
of the types of all free variables (FV) in E, where $\sqcup$
represents the join operation in the security lattice SD.
The typing rules ensure that for any given C, $\Gamma$, and
pc there is an environment $\Gamma'$ such that $pc \vdash \Gamma\ \{C\}\ \Gamma'$
is derivable. We use the notation $\Gamma \vdash \vec{E} : \vec{T}$ to denote
the sequence of type judgements $\langle \Gamma \vdash E_1 : T_1, \Gamma \vdash E_2 : T_2, \dots \rangle$.
Similarly, $\Gamma[[\vec{id} \mapsto \vec{T}]]$ denotes a sequence
of type substitutions $\langle \Gamma[[id_1 \mapsto T_1]], \Gamma[[id_2 \mapsto T_2]], \dots \rangle$.
Observe that reading inputs from unsanitized sources
through read() always makes the corresponding
variables tainted. The rule for function calls ensures
context-sensitivity in the system, where getParam()
extracts the formal parameters from the function
definition. The analyzer associates security types with
program constructs, treating source variables as tainted,
and then propagates their types along the program
code to determine the application's security. The
flow-sensitive typing rules for branching statements
leverage the lattice-based operations on the security
domain, resulting in conservative analysis results.
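The typing rules above can be read as a recursive procedure that maps an input environment to an output environment. The Python sketch below illustrates that reading for the Read, Assignment, if and while rules. It is not the K-Taint implementation: statements are encoded as tuples, expressions are abstracted to the list of their free variables, and unbound variables are assumed to be untainted; all names are hypothetical.

```python
TAINT, UNTAINT = "taint", "untaint"


def join(a, b):
    return TAINT if TAINT in (a, b) else UNTAINT


def join_env(g1, g2):
    """Pointwise join of two environments (missing variables count as untaint)."""
    return {x: join(g1.get(x, UNTAINT), g2.get(x, UNTAINT))
            for x in g1.keys() | g2.keys()}


def type_of(free_vars, env):
    """[Expression]: join of the types of the free variables."""
    t = UNTAINT
    for x in free_vars:
        t = join(t, env.get(x, UNTAINT))
    return t


def analyze(stmt, env, pc=UNTAINT):
    """Return the environment after stmt, following the typing rules."""
    kind = stmt[0]
    if kind == "skip":
        return dict(env)
    if kind == "read":                      # ("read", id)
        return {**env, stmt[1]: join(pc, TAINT)}
    if kind == "assign":                    # ("assign", id, free_vars_of_rhs)
        return {**env, stmt[1]: join(pc, type_of(stmt[2], env))}
    if kind == "seq":                       # ("seq", s1, s2)
        return analyze(stmt[2], analyze(stmt[1], env, pc), pc)
    if kind == "if":                        # ("if", cond_vars, body)
        t = type_of(stmt[1], env)
        return join_env(env, analyze(stmt[2], env, join(pc, t)))
    if kind == "while":                     # ("while", cond_vars, body)
        cur = dict(env)
        while True:                         # iterate until the environment is stable
            t = type_of(stmt[1], cur)
            nxt = join_env(analyze(stmt[2], cur, join(pc, t)), env)
            if nxt == cur:
                return cur
            cur = nxt
    raise ValueError(f"unknown statement kind: {kind}")


# Implicit flow: x is assigned a constant, but under a tainted guard.
prog = ("seq", ("read", "y"), ("if", ["y"], ("assign", "x", [])))
print(analyze(prog, {}))
```

In the example, x becomes tainted although its right-hand side is a constant, because the assignment is typed under a program context raised by the tainted guard y. The assignment case overwrites the previous type of the target, which is what makes the sketch flow-sensitive.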
5 K SPECIFICATION OF
SECURITY TYPE SYSTEM: A
ROADMAP
This section provides a roadmap to specify K rewrite
rules corresponding to the typing rules depicted in
Figure 2. Let us consider the typing judgement
$pc \vdash \Gamma\{C\}\Gamma'$, which specifies that the security environment
$\Gamma'$ is derived by executing the statement C on the
security environment $\Gamma$ under the program's security
context pc. To capture this, let us give algebraic
representations of $\Gamma$, $\Gamma'$, C and pc in K by defining a
configuration consisting of three cells: the k cell to
contain program statements as a sequence of
computations, the env cell to hold the security levels of program
variables, and the context cell to capture the current program
context pc in the security type domain, as follows:
$$\big\langle\ \langle K \rangle_{k}\ \ \langle \mathit{Map} \rangle_{env}\ \ \langle \mathit{Map} \rangle_{context}\ \big\rangle_{T}$$
The corresponding K rewrite rule capturing the
type judgement $pc \vdash \Gamma\{C\}\Gamma'$ is specified as:
$$\Big\langle \dfrac{C}{\cdot}\ \cdots \Big\rangle_{k}\ \ \Big\langle \dfrac{\Gamma}{\Gamma'} \Big\rangle_{env}\ \ \langle pc \mapsto \_ \rangle_{context}$$
The symbol "..." appearing in the k cell represents the
remaining computations. As a result of the execution
of C, which is eventually consumed (denoted by the dot),
the previous environment $\Gamma$ in the env cell is
updated to the modified environment $\Gamma'$, (implicitly)
influenced by the current value (denoted by _) of the
security context pc in the context cell.
Similarly, the derivation rule $\Gamma \vdash E : T$, specified
as $\langle \cdots\ E \mapsto T\ \cdots \rangle_{env}$, means that expression E
has the security type T somewhere in the environment
env. Each security type rule is written based
on a number of premise judgements $\Gamma_i \vdash \zeta_i$ above a
horizontal line, with a single conclusion judgement
$\Gamma \vdash \zeta$ below the line. For example, given the type
rule
$$\dfrac{\Gamma \vdash M : Nat \quad \Gamma \vdash N : Nat}{M + N : Nat},$$
the corresponding K rule is defined as:
$$\Big\langle \dfrac{M + N}{M +_{Nat} N}\ \cdots \Big\rangle_{k}\ \ \langle \cdots\ M \mapsto Nat,\ N \mapsto Nat\ \cdots \rangle_{env}$$
where M: Nat, N: Nat, and $+_{Nat} : Nat \times Nat \mapsto Nat$.
With this setting as the foundation, in the next section we
specify K rewrite rules for static taint analysis of an
imperative language in the abstract security type domain
S.
6 K REWRITING LOGIC
SEMANTICS FOR TAINT
ANALYSIS
This section introduces an executable rewriting logic
semantics in the K framework for taint analysis of our
language under consideration. As mentioned earlier,
our semantics can be seen as a sound semantics ap-
proximation in the security type domain.
To this aim, we consider the following K model-
ing of the program configuration on which the seman-
tics is defined:
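$$\mathit{configuration}\ \ \Big\langle\ \langle K \rangle_{k}\ \ \langle \mathit{Map} \rangle_{env}\ \ \langle \mathit{Map} \rangle_{context}\ \ \big\langle\, \langle \mathit{Map} \rangle_{\lambda\text{-}Def}\ \langle \mathit{List} \rangle_{fstack} \,\big\rangle_{control}\ \ \langle \mathit{List} \rangle_{in}\ \ \langle \mathit{List} \rangle_{out}\ \ \big\langle\, \langle \mathit{Map} \rangle_{alias}\ \langle \mathit{Set} \rangle_{ptr} \,\big\rangle_{ptr\text{-}alias}\ \Big\rangle_{T}$$
As mentioned earlier, the special cell $\langle\ \rangle_{k}$ contains
the list of computation tasks of a special sort K, separated
by the associative sequentialization operator $\curvearrowright$.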
The environment cell env maps variables (including
pointer variables) to their security types. The current
program context pc over the security domain is
captured in the context cell. The λ-Def cell supports the
interprocedural feature by holding the bindings of function
names (when defined) to their lambda abstractions. All
function calls are controlled by the control cell, which
maintains stack-based context switching using the fstack
cell. The cells in and out are used to perform
standard input-output operations. To avoid false
negatives in the analysis results, we consider the ptr-alias cell,
which maintains pointer aliasing information in the alias
cell. The ptr cell separates pointer variables
from other variables to assist the alias analysis.
Figure 3 depicts the semantic rewrite rules for
taint analysis in the K definitional framework. We
label the defined rules by $(R_{-})$ for future reference. These
rules capture both explicit and implicit flow
sensitivity, context-sensitivity in the presence of
function calls, the semantics of constant functions, pointer
aliases, etc. Let us explain these rules in detail.
Declaration, Input, Lookup and Assignment:
The first rule $(R_{1a})_{decl}$ deals with variable declarations
and the initialization of variables with their initial
security types (untaint in our case) in the environment
cell env. Any unsanitized input gets the type taint
in rule $(R_{1b})_{read}$. The lookup rule $(R_{2})_{lookup}$
replaces the variable term appearing on top of the k cell
by its security type, found by looking into the environment
cell. Note that the lookup rule for constant terms,
although not shown here, always returns untaint.
As defined in rule $(R_{3a})_{ar\text{-}op}$, the security types
of expressions are soundly approximated by the least upper
bound (defined in rule $(R_{8})_{join}$) of the security types
of their component terms. Rule $(R_{3b})_{asg}$, which handles
assignment computations, updates the security type of
id somewhere in the env cell with the least upper bound
of the security type of the right-hand-side expression
(i.e. T) and the program's current security context pc in
the context cell. The assignment is then replaced by
an empty computation.
Conditional or Iteration: The presence of a condition
B in a simple if- or while-statement gives rise to
the following two effects: (i) implicit flow of taint
information based on the security type of B, and (ii) multiple
execution paths, depending on whether the if- or
while-block is entered. The former is handled by
updating the security context $\mu$ in the context cell based
on the security type of B, and the latter by the
subsequent tasks $restore_{c}(\mu)$ and $approx(\rho)$. These are
depicted in rules $(R_{4})_{if}$, $(R_{6})_{while}$, $(R_{9a})_{restore}$ and
$(R_{9b})_{approx}$.
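$$(R_{1a})_{decl}:\ \ \Big\langle \dfrac{\tau\ id}{\cdot}\ \cdots \Big\rangle_{k}\ \ \Big\langle \dfrac{\rho}{\rho[id \leftarrow T : Type]} \Big\rangle_{env}$$
$$(R_{1b})_{read}:\ \ \Big\langle \dfrac{read()}{taint}\ \cdots \Big\rangle_{k}$$
$$(R_{2})_{lookup}:\ \ \Big\langle \dfrac{id}{T : Type}\ \cdots \Big\rangle_{k}\ \ \langle \cdots\ id \mapsto T : Type\ \cdots \rangle_{env}$$
$$(R_{3a})_{ar\text{-}op}:\ \ \Big\langle \dfrac{T_1 : Type\ \ op\ \ T_2 : Type}{T_1 : Type\ \sqcup\ T_2 : Type}\ \cdots \Big\rangle_{k}$$
$$(R_{3b})_{asg}:\ \ \Big\langle \dfrac{id := T : Type}{\cdot}\ \cdots \Big\rangle_{k}\ \ \Big\langle \cdots\ \rho\Big[id \mapsto \dfrac{\_}{\mu(pc) \sqcup T : Type}\Big]\ \cdots \Big\rangle_{env}\ \ \langle \mu \rangle_{context}$$
$$(R_{4})_{if}:\ \ \Big\langle \dfrac{if\ (B : T)\ then\ \{C\}}{C \curvearrowright restore_{c}(\mu) \curvearrowright approx(\rho)}\ \cdots \Big\rangle_{k}\ \ \Big\langle \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \Big\rangle_{context}\ \ \langle \rho \rangle_{env}$$
$$(R_{5a})_{if\text{-}else}:\ \ \Big\langle \dfrac{if\ (B : T)\ then\ \{C_1\}\ else\ \{C_2\}}{C_1 \curvearrowright exitIf() \curvearrowright restore_{env}(\rho) \curvearrowright C_2 \curvearrowright exitElse() \curvearrowright restore_{c}(\mu)}\ \cdots \Big\rangle_{k}\ \ \langle \rho \rangle_{env}\ \ \Big\langle \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \Big\rangle_{context}$$
$$(R_{5b})_{exit\text{-}if}:\ \ \Big\langle \dfrac{exitIf()}{save(\rho)}\ \cdots \Big\rangle_{k}\ \ \langle \rho \rangle_{env}$$
$$(R_{5c})_{exit\text{-}else}:\ \ \Big\langle \dfrac{exitElse()}{approx(save(\rho))}\ \cdots \Big\rangle_{k}$$
$$(R_{6})_{while}:\ \ \Big\langle \dfrac{while(B : T)\ do\ \{C\}}{C \curvearrowright restore_{c}(\mu) \curvearrowright approx(\rho) \curvearrowright fixpoint(B, C, \rho)}\ \cdots \Big\rangle_{k}\ \ \langle \rho \rangle_{env}\ \ \Big\langle \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \Big\rangle_{context}$$
$$(R_{7a})_{fun\text{-}decl}:\ \ \Big\langle \dfrac{defun\ Func\_name(Params)\{C\}}{\cdot}\ \cdots \Big\rangle_{k}\ \ \Big\langle \Big\langle \dfrac{\psi}{\psi[Func\_name \leftarrow lambda(Params, C)]} \Big\rangle_{\lambda\text{-}Def} \Big\rangle_{control}$$
$$(R_{7b})_{fun\text{-}lookup}:\ \ \Big\langle \dfrac{call\ Func\_name(Es : Ts)}{lambda(Params, C)(Es : Ts)}\ \cdots \Big\rangle_{k}\ \ \big\langle \langle \cdots\ Func\_name \mapsto lambda(Params, C)\ \cdots \rangle_{\lambda\text{-}Def} \big\rangle_{control}$$
$$(R_{7c})_{fun\text{-}call}:\ \ \Big\langle \dfrac{lambda(Params, C)(Es : Ts) \curvearrowright K}{McDecls(Params, Ts) \curvearrowright C \curvearrowright return;}\ \cdots \Big\rangle_{k}\ \ \Big\langle \Big\langle \dfrac{.List}{[ListItem(\rho, K, Ctr)]}\ \cdots \Big\rangle_{fstack}\ Ctr \Big\rangle_{control}\ \ \langle \rho \rangle_{env}$$
$$(R_{7d})_{fun\text{-}ret}:\ \ \Big\langle \dfrac{return(T : Type); \curvearrowright \_}{T : Type \curvearrowright K}\ \cdots \Big\rangle_{k}\ \ \Big\langle \Big\langle \dfrac{[ListItem(\rho, K, Ctr)]}{.List}\ \cdots \Big\rangle_{fstack}\ \Big(\dfrac{\_}{Ctr}\Big) \Big\rangle_{control}\ \ \Big\langle \dfrac{\_}{\rho} \Big\rangle_{env}$$
$$(R_{8})_{join}:\ \ \langle T_1 : Type \sqcup T_2 : Type\ \cdots \rangle_{k} = \begin{cases} \Big\langle \dfrac{T_1 : Type \sqcup T_2 : Type}{untaint}\ \cdots \Big\rangle_{k}, & \text{if } T_1 = T_2 = untaint \\[2ex] \Big\langle \dfrac{T_1 : Type \sqcup T_2 : Type}{taint}\ \cdots \Big\rangle_{k}, & \text{otherwise} \end{cases}$$
$$(R_{9a})_{restore}:\ \ \Big\langle \dfrac{restore_{c}(\mu)}{\cdot}\ \cdots \Big\rangle_{k}\ \ \Big\langle \dfrac{\_}{\mu} \Big\rangle_{context}$$
$$(R_{9b})_{approx}:\ \ \Big\langle \dfrac{approx(\rho)}{\cdot}\ \cdots \Big\rangle_{k}\ \ \Big\langle \dfrac{\rho_c}{\rho \sqcup \rho_c} \Big\rangle_{env}$$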
Figure 3: K rewrite rules for executable semantics-based
taint analysis.
Specifically, $restore_{c}(\mu)$ restores the previous
context on exiting a block guarded by B, and $approx(\rho)$
provides a sound approximation of the semantics as the
least upper bound of the environments obtained over
all possible execution paths due to the presence of B.
Observe that the least fixed point solution in the case of
while is achieved by defining an auxiliary function
fixpoint() as follows: either
(1) $\Big\langle \dfrac{fixpoint(B, C, \rho_i)}{\cdot}\ \cdots \Big\rangle_{k}\ \langle \rho'_i \rangle_{env}$ when $\rho_i = \rho'_i$, or
(2) $\Big\langle \dfrac{fixpoint(B, C, \rho_i)}{while(B)\ do\ \{C\}}\ \cdots \Big\rangle_{k}\ \langle \rho'_i \rangle_{env}$ when $\rho_i \neq \rho'_i$.
Note that the first case indicates that the computation
has reached the fixpoint and therefore the computation
is consumed. If not, then the iteration continues
as shown in the second case.
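The interplay of the while rule with the restore, approx and fixpoint tasks can be sketched as a small task-list machine in Python. This is an illustrative approximation only: it models the k cell as a list, the env cell as a dictionary and the context cell as a single value pc, abstracts expressions to their free variables, and covers just assignments and while loops. The task encodings and the name `run` are assumptions.

```python
TAINT, UNTAINT = "taint", "untaint"


def join(a, b):
    return TAINT if TAINT in (a, b) else UNTAINT


def join_env(r1, r2):
    return {x: join(r1.get(x, UNTAINT), r2.get(x, UNTAINT))
            for x in r1.keys() | r2.keys()}


def type_of(free_vars, env):
    t = UNTAINT
    for v in free_vars:
        t = join(t, env.get(v, UNTAINT))
    return t


def run(k, env):
    """Rewrite the task list k until it is empty and return the final env."""
    k, env, pc = list(k), dict(env), UNTAINT
    while k:
        task, k = k[0], k[1:]
        kind = task[0]
        if kind == "asg":                    # ("asg", id, free_vars_of_rhs)
            env[task[1]] = join(pc, type_of(task[2], env))
        elif kind == "while":                # ("while", cond_vars, body_tasks)
            t = type_of(task[1], env)
            saved = dict(env)
            # body, then restore the context, join with the entry environment,
            # and finally check whether another iteration is needed
            k = (list(task[2])
                 + [("restore", pc), ("approx", saved),
                    ("fixpoint", task[1], task[2], saved)]
                 + k)
            pc = join(pc, t)
        elif kind == "restore":
            pc = task[1]
        elif kind == "approx":
            env = join_env(task[1], env)
        elif kind == "fixpoint":
            if env != task[3]:               # not stable yet: iterate again
                k = [("while", task[1], task[2])] + k
    return env


loop = [("while", ["b"], [("asg", "b", ["c"]), ("asg", "c", ["a"])])]
print(run(loop, {"a": TAINT}))
```

In the example the taint of a reaches c in the first pass and b only in the second, so a single pass over the loop body would miss it; the fixpoint task re-schedules the loop until the environment stops changing.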
The soundness of the analysis in presence of if-
else is guaranteed by approximating the analysis
results from both branches $C_1$ and $C_2$ (a
may-analysis), as depicted in rules $(R_{5a})_{if\text{-}else}$, $(R_{5b})_{exit\text{-}if}$
and $(R_{5c})_{exit\text{-}else}$. Observe that both branches
are executed over the same environment, namely the one
that occurs at the entry point of if-else (using
$restore_{env}(\rho)$, which restores the environment and is
defined similarly to rule $(R_{9a})_{restore}$).
Dealing with Functions: We specify the rules
$(R_{7a})_{fun\text{-}decl}$, $(R_{7b})_{fun\text{-}lookup}$, $(R_{7c})_{fun\text{-}call}$, and
$(R_{7d})_{fun\text{-}ret}$ to handle the interprocedural features in
our analysis. For each function definition, rule
$(R_{7a})_{fun\text{-}decl}$ creates a lambda abstraction and binds it
to the function name in the $\langle\ \rangle_{\lambda\text{-}Def}$ cell. On
encountering a function call, rule $(R_{7b})_{fun\text{-}lookup}$ replaces
the call by its lambda abstraction. We
use a helper function McDecls() which recursively
extracts the formal parameters of the called function
and assigns to them the security types of the actual
parameters in the calling function, as shown below:
$$\Big\langle \dfrac{McDecls((param, params), (Type, Types))}{param := Type;\ \curvearrowright\ McDecls(params, Types)}\ \cdots \Big\rangle_{k}$$
Note that the function McDecls() enforces context
sensitivity by treating calls to the same function with
different parameters differently. As usual, McDecls()
is followed by the sequence of computations C in the
function body and then by a return statement. When
a function returns a result explicitly through a
return E statement, rule $(R_{7d})_{fun\text{-}ret}$
is applied, which returns the security type of the
resultant expression and restores the previous context
to start the execution of the remaining tasks, specified as
$\Big\langle \dfrac{[ListItem(\rho, K, Ctr)]}{.List}\ \cdots \Big\rangle_{fstack}$.
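The following Python sketch illustrates the idea behind the function rules: formal parameters are bound to the security types of the actual arguments at each call, the body is analyzed under that binding, and the caller's environment is restored on return. It is a simplification under stated assumptions: function bodies are straight-line lists of assignments, a function is stored as a triple `(params, body, return_vars)`, and the names `call` and `join_all` are hypothetical.

```python
TAINT, UNTAINT = "taint", "untaint"


def join_all(types):
    return TAINT if TAINT in list(types) else UNTAINT


def call(defs, name, arg_types, env, fstack):
    """Analyze one call; return (type of the returned value, restored caller env)."""
    params, body, ret_vars = defs[name]           # lookup of the stored abstraction
    fstack.append(dict(env))                      # save the caller environment
    local = {**env, **dict(zip(params, arg_types))}   # bind formals to actual types
    for target, used in body:                     # body: list of (id, free_vars_of_rhs)
        local[target] = join_all(local.get(v, UNTAINT) for v in used)
    result = join_all(local.get(v, UNTAINT) for v in ret_vars)
    return result, fstack.pop()                   # restore the caller environment


# A function that copies its parameter into r and returns r.
defs = {"ident": (["p"], [("r", ["p"])], ["r"])}
stack = []
print(call(defs, "ident", [TAINT], {"a": UNTAINT}, stack)[0])     # taint
print(call(defs, "ident", [UNTAINT], {"a": UNTAINT}, stack)[0])   # untaint
```

The two calls to the same function yield different result types because each call is analyzed with its own argument types, which is the context-sensitive behaviour described above; a context-insensitive summary would have to report taint for both.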
7 DEALING WITH POINTER
ALIASING AND CONSTANT
FUNCTIONS
The rules defined for implicit flow in Figure 3 are
unsound in the presence of pointers. More precisely, given
an assignment computation id := E, the correctness
of the analysis is established by ensuring that the
security type is updated, by the security type of E, not
only for id but also for all of its aliases. To handle this
scenario, the nested cells alias and ptr are designed
to store the alias information and the set of pointer
variables, respectively. The semantics rules are depicted in
Figure 4. In the case of a simple assignment id := E where
id is not a pointer variable, rule $(R_{10a})_{alias}$
triggers the update of the security type of id and of its
direct pointers, identified in the alias cell, by the security
type of E. As a consequence, rule $(R_{10b})_{alias}$
then performs the same update on all of its
indirect pointers as well. This ensures that all pointers
which point, directly or indirectly, to a tainted value
are themselves tainted, leading to a sound analysis.
Similarly, rules $(R_{10c})_{alias}$, $(R_{10d})_{alias}$, and
$(R_{10e})_{alias}$ handle the assignment of
security types to pointer variables and the creation
of new alias information in the alias cell. Note that
the author of (Asăvoae, 2014) integrated
alias analysis in K as an instantiation of the collecting
semantics, where alias information can be extracted
from the alias cell in a demand-driven way. Our
approach follows the same line, but in a much simpler
way, without considering an exhaustive execution in the
worst-case scenario.
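The propagation of a security type to direct and indirect pointers can be pictured as a reachability computation over the alias information. The Python sketch below assumes the alias cell is a dictionary from each pointer to the single variable it points to; this encoding and the name `assign_with_aliases` are hypothetical and serve only to illustrate the idea.

```python
def assign_with_aliases(env, points_to, target, sec_type):
    """Give sec_type to target and to every pointer that reaches it.

    points_to maps a pointer to the variable it points to. Direct pointers
    of target are updated first; their own pointers (the indirect ones)
    follow through the worklist.
    """
    env = dict(env)                 # do not mutate the caller's environment
    worklist, seen = [target], set()
    while worklist:
        v = worklist.pop()
        if v in seen:               # guards against cyclic alias chains
            continue
        seen.add(v)
        env[v] = sec_type
        worklist.extend(p for p, q in points_to.items() if q == v)
    return env


env = {v: "untaint" for v in ["x", "y", "p", "q", "r"]}
alias = {"p": "x", "q": "p", "r": "y"}      # p -> x, q -> p, r -> y
print(assign_with_aliases(env, alias, "x", "taint"))
```

In the example, tainting x also taints its direct pointer p and the indirect pointer q, while y and r are unaffected because no alias chain connects them to x.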
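$$(R_{10a})_{alias}:\ \ \Big\langle \dfrac{id := E : T}{id := T \curvearrowright P := T}\ \cdots \Big\rangle_{k}\ \ \big\langle \langle \cdots\ P \mapsto PointsTo(id)\ \cdots \rangle_{alias}\ \langle \eta \rangle_{ptr} \big\rangle_{ptr\text{-}alias}\ \ \langle \rho \rangle_{env} \quad \text{when } P \in \eta$$
$$(R_{10b})_{alias}:\ \ \Big\langle \dfrac{P := T}{R := T}\ \cdots \Big\rangle_{k}\ \ \big\langle \langle \cdots\ R \mapsto PointsTo(P)\ \cdots \rangle_{alias}\ \langle \eta \rangle_{ptr} \big\rangle_{ptr\text{-}alias}\ \ \Big\langle \cdots\ P \mapsto \dfrac{\_}{T}\ \cdots \Big\rangle_{env} \quad \text{when } P \in \eta$$
$$(R_{10c})_{alias}:\ \ \Big\langle \dfrac{P := \&Q : T}{P := T}\ \cdots \Big\rangle_{k}\ \ \Big\langle \Big\langle \xi\Big[P \mapsto \dfrac{\_}{PointsTo(Q)}\Big] \Big\rangle_{alias}\ \langle \eta \rangle_{ptr} \Big\rangle_{ptr\text{-}alias} \quad \text{when } P \in \eta$$
$$(R_{10d})_{alias}:\ \ \Big\langle \dfrac{P := Q : T}{P := T}\ \cdots \Big\rangle_{k}\ \ \Big\langle \Big\langle \cdots\ Q \mapsto PointsTo(S)\ \cdots\ P \mapsto \dfrac{\_}{PointsTo(S)}\ \cdots \Big\rangle_{alias}\ \langle \eta \rangle_{ptr} \Big\rangle_{ptr\text{-}alias} \quad \text{when } P \in \eta$$
$$(R_{10e})_{alias}:\ \ \Big\langle \dfrac{P := {*Q} : T}{P := T}\ \cdots \Big\rangle_{k}\ \ \Big\langle \Big\langle \cdots\ Q \mapsto PointsTo(S)\ \cdots\ S \mapsto PointsTo(M)\ \cdots\ P \mapsto \dfrac{\_}{PointsTo(M)}\ \cdots \Big\rangle_{alias}\ \langle \eta \rangle_{ptr} \Big\rangle_{ptr\text{-}alias} \quad \text{when } P \in \eta$$
$$(R_{11})_{con\text{-}func}:\ \ \langle id_1 * id_2\ \cdots \rangle_{k} = \begin{cases} \Big\langle \dfrac{id_1 * id_2}{untaint}\ \cdots \Big\rangle_{k} & \text{when } id_1 = zero \text{ or } id_2 = zero \\[2ex] \Big\langle \dfrac{id_1 * id_2}{id_1 *_{Type} id_2}\ \cdots \Big\rangle_{k} & \text{otherwise} \end{cases}$$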
Figure 4: K rules for pointer aliasing and constant func-
tions.
Apart from this, capturing the semantics of constant
functions has a significant impact on the
precision of taint analysis. For example, consider the
statement v := x × 0 + 4, where x is a tainted variable.
Although the syntax-based taint flow makes the variable
v tainted, the semantics of the constant function
"x × 0 + 4", which always results in 4 irrespective of the
value of x, makes v untainted. The semantics
approximation in the security domain, due to the abstraction,
makes dealing with constant functions challenging. As a
partial solution, we specify rules for some simple cases
of constant functions such as x - x, x xor x, x × 0,
etc. We show one such rule in $(R_{11})_{con\text{-}func}$.
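A few such special cases can be sketched in Python as extra checks performed before falling back to the join of the operand types. The expression encoding (nested tuples `(op, left, right)`, integers for constants, strings for variables) and the function name `type_of` are assumptions for this illustration; only the syntactic patterns named above are covered.

```python
TAINT, UNTAINT = "taint", "untaint"


def join(a, b):
    return TAINT if TAINT in (a, b) else UNTAINT


def type_of(expr, env):
    """Security type of expr with special cases for simple constant functions."""
    if isinstance(expr, int):                 # constants are always untainted
        return UNTAINT
    if isinstance(expr, str):                 # variable lookup
        return env.get(expr, UNTAINT)
    op, left, right = expr
    if op == "*" and (left == 0 or right == 0):
        return UNTAINT                        # x * 0 is always 0
    if op in ("-", "xor") and isinstance(left, str) and left == right:
        return UNTAINT                        # x - x and x xor x are always 0
    return join(type_of(left, env), type_of(right, env))


env = {"x": TAINT, "y": TAINT}
print(type_of(("+", ("*", "x", 0), 4), env))   # untaint
print(type_of(("xor", "x", "y"), env))         # taint
```

The second call stays tainted because the check is purely syntactic: x and y are different identifiers, so the sketch cannot conclude that the result is constant even if both hold the same value at run time.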
In this context, as a notable observation, we con-
sider the following scenario: given the code fragment
Table 3: Taint Analysis on Benchmark Programs Set (SecuriBench, 2006; Cavallaro et al., 2008; Vogt et al., 2007; Evans
et al., 2003; Russo and Sabelfeld, 2010) (✓: Passed, ✗+: False Positives, ✗−: False Negatives).

| Progs. | Descriptions                            | K-Taint | Splint (Evans and Larochelle, 2002) | Pixy (Jovanovic et al., 2006) | SFlow (Huang et al., 2014) | CQual (Foster et al., 2002) |
|--------|-----------------------------------------|---------|-------------------------------------|-------------------------------|----------------------------|-----------------------------|
| Prog1  | Explicit Flow                           | ✓       | ✓                                   | ✓                             | ✓                          | ✓                           |
| Prog2  | Implicit Flow                           | ✓       | ✗−                                  | ✗−                            | ✗−                         | ✗−                          |
| Prog3  | Malware Attack                          | ✓       | ✗−                                  | ✗−                            | ✗−                         | ✗−                          |
| Prog4  | XSS Attack                              | ✓       | ✗−                                  | ✗−                            | ✗−                         | ✗−                          |
| Prog5  | Buffer Overflow                         | ✗+      | ✓                                   | ✓                             | ✗+, ✗−                     | ✗−                          |
| Prog6  | Constant Function "subtraction"         | ✓       | ✗+                                  | ✗+                            | ✗+                         | ✗+                          |
| Prog7  | Program consists of multiple functions  | ✓       | ✗−, ✗+                              | ✗−                            | ✓                          | ✗−                          |
| Prog8  | Program with context-sensitivity        | ✓       | ✗−, ✗+                              | ✓                             | ✓                          | ✗+                          |
| Prog9  | Factorial Program                       | ✓       | ✗−                                  | ✗−                            | ✗−                         | ✗−                          |
| Prog10 | Binary Search                           | ✗+      | ✗−                                  | ✗−                            | ✗−                         | ✗−                          |
| Prog11 | Merge Sort                              | ✗+      | ✗−                                  | ✗−                            | ✗−                         | ✗−                          |
| Prog12 | Program with flow-sensitivity           | ✓       | ✗−                                  | ✓                             | ✗−                         | ✗−                          |
| Prog13 | Swapping of two numbers using pointers  | ✓       | ✓                                   | ✓                             | ✓                          | ✗−                          |
y := read(); x := y; v := x xor y, the analysis success-
fully marks the variable v as tainted. Indeed, attack-
ers may inject some malicious input containing a vul-
nerable control part for which the xor operation fails
to nullify the effect, affecting the subsequent critical
computation involving v.
We end this section stating the fundamental results
on K-Taint. We skip the proofs for brevity.
Theorem 1 (Soundness). The semantics defined in
K-Taint is a sound approximation of the concrete
collecting semantics with respect to the security
properties of variables.
Theorem 2 (Termination). Any execution in
K-Taint is always finite.
Consider the security type domain S of n security
levels with order relation $\sqsubseteq$. Given $s_i, s_j \in S$, $s_i \sqsubseteq s_j$
denotes that $s_i$ is more trusted than $s_j$. For example,
$untaint \sqsubseteq taint$.
Definition 1 ($s_t$-indistinguishability). Let X be the
set of program variables participating in critical
computations of a program P. Let $s_t \in S$ be the
permissible security level for critical computations in P,
meaning that any variable in X with security level
$s \sqsubseteq s_t$ can securely participate in the critical
computations. Given two type environments $\rho_1$ and $\rho_2$,
we say that they are $s_t$-indistinguishable (denoted
$\langle \rho_1 \rangle_{env} \approx_{s_t} \langle \rho_2 \rangle_{env}$) iff
$\forall x \in X.\ \rho_1(x) \sqsubseteq s_t \iff \rho_2(x) \sqsubseteq s_t$,
meaning that they agree on the sensitivity levels for
security-sensitive variables.
Theorem 3 (Non-interference). Consider any two type
environments $\rho_1$ and $\rho_2$ such that $\langle \rho_1 \rangle_{env} \approx_{s_t} \langle \rho_2 \rangle_{env}$.
A program P is secure iff the K-executions of P on
these two environments result in the environments
$\langle \rho'_1 \rangle_{env}$ and $\langle \rho'_2 \rangle_{env}$, respectively, satisfying
$\langle \rho'_1 \rangle_{env} \approx_{s_t} \langle \rho'_2 \rangle_{env}$.
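The indistinguishability relation of Definition 1 can be checked mechanically. The Python sketch below assumes the reading that two environments are related when every critical variable is either within the permissible level in both or exceeds it in both; the numeric encoding of levels and the names `leq` and `indistinguishable` are hypothetical.

```python
LEVEL = {"untaint": 0, "taint": 1}   # numeric encoding of untaint ⊑ taint


def leq(a, b):
    return LEVEL[a] <= LEVEL[b]


def indistinguishable(rho1, rho2, critical, s_t):
    """True iff rho1 and rho2 agree, for every critical variable,
    on whether its type is within the permissible level s_t."""
    return all(leq(rho1[x], s_t) == leq(rho2[x], s_t) for x in critical)


rho1 = {"v": "untaint", "tmp": "taint"}
rho2 = {"v": "untaint", "tmp": "untaint"}
print(indistinguishable(rho1, rho2, {"v"}, "untaint"))          # True
print(indistinguishable(rho1, rho2, {"v", "tmp"}, "untaint"))   # False
```

Only the variables in the critical set matter: the two environments differ on tmp, which breaks the relation only once tmp is counted among the variables used in critical computations.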
8 EXPERIMENTAL ANALYSIS
We have implemented the full set of semantics rules
(more than 200 rules) in the K tool (version 4.0)
for our imperative language under consideration and
performed experiments on a set of benchmark codes
collected from (SecuriBench, 2006; Cavallaro et al.,
2008; Evans et al., 2003; Russo and Sabelfeld, 2010;
Vogt et al., 2007) and on some well-known pro-
grams
1,2
. A wide range of representative programs
are considered, including explicit flow, implicit flow
due to conditional or iteration, XSS attacks, malware
attacks, merge sort, binary search, factorial, constant
functions, etc. Since K-Taint supports a C-like language,
it accepts the benchmark C code files as input
from the console using K Framework-specific commands.
The evaluation results are shown in Table 3.
The results of K-Taint, together with those obtained from
some of the available static taint analysis tools, such as
Splint (Evans and Larochelle, 2002), Pixy (Jovanovic et al.,
2006), SFlow (Huang et al., 2014), and CQual (Foster et al.,
2002), are reported in columns 3-7. The notations ✗⁺ and ✗⁻
indicate failures due to false positives and false negatives
respectively, whereas ✓ indicates a successful
detection of taint vulnerabilities. Observe that,
due to the flow-sensitivity, context-sensitivity and the
enhancement to deal with constant functions, K-Taint
significantly reduces the occurrences of false alarms.
The authors in (Cavallaro et al., 2008), (Russo and
Sabelfeld, 2010) and (Vogt et al., 2007) highlighted
some special cases where their approaches fail. We
considered those special cases (shown as Prog3, Prog4
and Prog12 in Table 3) and observed that K-Taint
successfully captures those taint flows.
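The three outcomes used in the comparison can be sketched as a simple classification of a tool's verdict against the known behaviour of a benchmark. This is an illustrative Python sketch only; the function name `classify` and the sample entries are hypothetical, the verdicts are made up and are not taken from Table 3, and treating a correctly reported absence of flow as a success is an assumption.

```python
def classify(has_taint_flow, tool_reports_flow):
    """Compare a tool's verdict with the known behaviour of a benchmark program."""
    if has_taint_flow == tool_reports_flow:
        return "success"
    # The tool disagrees with the ground truth.
    return "false positive" if tool_reports_flow else "false negative"


# Made-up (ground truth, tool verdict) pairs for illustration only.
samples = {
    "demo_a": (True, True),
    "demo_b": (False, True),
    "demo_c": (True, False),
}

for name, (truth, reported) in samples.items():
    print(name, classify(truth, reported))
```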
¹ The online version of the K tool is available at
http://www.kframework.org/tool/run/
² The full set of semantics rules in K-Taint and the evaluation
results on the test codes are available for download at
www.iitp.ac.in/halder/ktaint
9 CONCLUSION
This paper presented an executable rewriting logic se-
mantics for static taint analysis of an imperative pro-
gramming language in the K framework. The pro-
posed approach has improved precision with respect
to the existing techniques, as shown by our experi-
mental evaluation on a set of well-known benchmark
programs. We have made the full set of semantics rules
and the experimental data available for download. We
are currently investigating how to integrate into the
proposed analyzer a preprocessing phase that addresses
specific cases where exact variable values
may improve the precision. As future work, we plan to
add more semantic rules covering further language
features, as an extension to the current imperative
language, and to address more semantics-based
non-dependencies.
ACKNOWLEDGEMENT
This work is partially supported by the research grant
(SB/FTP/ETA-315/2013) from the Science and Engi-
neering Research Board (SERB), Department of Sci-
ence and Technology, Government of India.
REFERENCES
Amtoft, T. and Banerjee, A. (2004). Information flow anal-
ysis in logical form. In SAS, volume 3148, pages 100–
115. Springer.
Asăvoae, I. M. (2014). Abstract semantics for alias analysis
in K. Electronic Notes in Theoretical Computer
Science, 304:97–110.
Cavallaro, L., Saxena, P., and Sekar, R. (2008). On the lim-
its of information flow techniques for malware anal-
ysis and containment. In Proc. of Int. Conf. on De-
tection of Intrusions and Malware, and Vulnerability
Assessment, pages 143–163. Springer.
Cifuentes, C. and Scholz, B. (2008). Parfait: designing a
scalable bug checker. In Proc. of the 2008 workshop
on Static analysis, pages 4–11. ACM.
Clavel, M. et al. (2007). All about Maude - a
high-performance logical framework: how to specify,
program and verify systems in rewriting logic.
Springer-Verlag.
Corin, R. and Manzano, F. A. (2012). Taint analysis of se-
curity code in the klee symbolic execution engine. In
ICICS, pages 264–275. Springer.
Denning, D. E. and Denning, P. J. (1977). Certification of
programs for secure information flow. Communica-
tions of the ACM, 20(7):504–513.
Evans, D. and Larochelle, D. (2002). Improving security
using extensible lightweight static analysis. IEEE soft-
ware, 19(1):42–51.
Evans, D., Larochelle, D., and Evans, D.
(2003). Splint manual: Version 3.1.1-1.
http://lclint.cs.virginia.edu/manual/manual.html.
Foster, J. S. et al. (2002). Cqual user’s guide. University of
California, Berkeley, version 0.9 edition.
Huang, W., Dong, Y., and Milanova, A. (2014). Type-based
taint analysis for java web applications. In Proc.
of Int. Conf. on Fundamental Approaches to Software
Engineering, pages 140–154. Springer.
Hunt, S. and Sands, D. (2006). On flow-sensitive security
types. In Conf. Record of the 33rd ACM SIGPLAN-
SIGACT Sym. on POPL, pages 79–90, S. California.
ACM.
Jovanovic, N., Kruegel, C., and Kirda, E. (2006). Pixy: A
static analysis tool for detecting web application
vulnerabilities. In IEEE Symposium on Security and
Privacy (S&P’06), pages 258–263. IEEE.
Livshits, V. B. and Lam, M. S. (2005). Finding security vul-
nerabilities in java applications with static analysis. In
USENIX Security Symposium, volume 14, pages 18–
18.
Meseguer, J. and Roșu, G. (2007). The rewriting logic
semantics project. Theoretical Computer Science,
373(3):213–237.
Noundou, X. N. (2015). Saint: Simple
static taint analysis tool users manual.
https://archive.org/details/saint 201507.
Roșu, G. and Șerbănuță, T. F. (2010). An overview of the K
semantic framework. The Journal of Logic and Algebraic
Programming, 79(6):397–434.
Russo, A. and Sabelfeld, A. (2010). Dynamic vs. static
flow-sensitive security analysis. In 23rd IEEE Com-
puter Security Foundations Symposium, pages 186–
199. IEEE.
Sabelfeld, A. and Myers, A. C. (2006). Language-based
information-flow security. IEEE Journal on selected
areas in communications, 21(1):5–19.
Scholz, B., Zhang, C., and Cifuentes, C. (2008). User-input
dependence analysis via graph reachability. Techni-
cal Report SMLI TR-2008-171, Mountain View, CA,
USA.
SecuriBench (2006). Stanford securibench micro.
http://suif.stanford.edu/livshits/work/securibench-micro/.
Sridharan, M., Artzi, S., Pistoia, M., Guarnieri, S., Tripp,
O., and Berg, R. (2011). F4f: taint analysis of
framework-based web applications. ACM SIGPLAN
Notices, 46(10):1053–1068.
Tripp, O., Pistoia, M., Fink, S. J., Sridharan, M., and Weis-
man, O. (2009). Taj: effective taint analysis of web ap-
plications. In ACM Sigplan Notices, volume 44, pages
87–97. ACM.
Vogt, P., Nentwich, F., Jovanovic, N., Kirda, E., Kruegel,
C., and Vigna, G. (2007). Cross site scripting preven-
tion with dynamic data tainting and static analysis. In
NDSS, volume 2007, page 12.
Volpano, D., Irvine, C., and Smith, G. (1996). A sound type
system for secure flow analysis. J. Comput. Secur.,
4(2-3):167–187.
ENASE 2018 - 13th International Conference on Evaluation of Novel Approaches to Software Engineering