Tailoring Taint Analysis for Database Applications in the K Framework
Md. Imran Alam^{1,2,a} and Raju Halder^{1,b}
^1 Indian Institute of Technology Patna, India
^2 Università Ca’ Foscari Venezia, Italy
Keywords:
Database Applications, Taint Analysis, K Framework, Security.
Abstract:
Maintaining the integrity of the databases underlying an information system is a major challenge. Integrity may be compromised either by coding flaws or by improper flow of information from source to sink in the associated database applications. Such a compromise may lead either to disclosure of sensitive information to attackers or to illegitimate modification of private data stored in the databases. Taint analysis is a widely used program analysis technique that aims at averting malicious inputs from corrupting data values in critical computations of programs. In this paper, we propose K-DBTaint, a rewriting logic-based executable semantics for taint analysis of database applications in the K framework. We specify the semantics for a subset of SQL statements along with host imperative program statements. Our K semantics can be seen as a sound approximation of program semantics in the corresponding security type domain. With respect to the existing methods, K-DBTaint supports context- and flow-sensitive analysis, reduces false alarms, and provides a scalable solution. Experimental evaluation on several PL/SQL benchmark codes demonstrates encouraging results in terms of improved precision of the analysis.
1 INTRODUCTION
Database applications are ubiquitous today, and many of these applications are developed in general-purpose languages embedded with SQL statements. Vulnerabilities in database applications pose serious security and privacy threats such as the exposure of confidential data, loss of customer trust, and denial of service. According to OWASP^1, the most serious vulnerabilities include SQL injection, cross-site scripting, and buffer overflow. These vulnerabilities are usually caused either by coding flaws or by improper flow of information from source to sink in the associated database applications.
Taint analysis is a well-established technique that aims at averting malicious inputs from corrupting data values in critical computations of programs (Huang et al., 2014). Tainted data refers to data which originates from potentially malicious users and can cause security problems at vulnerable points (known as sensitive sinks) in the program. Tainted data may enter a program through specific places and spread across the program via assignments and similar constructs. To exemplify this scenario, let us consider the code snippet shown in Figure 1, which allows users to search a product catalog. The user input is read through the auxiliary function read() and is integrated into the SQL query at line number 5. When the query is executed, it extracts information about those products whose description IDs (i.e., "Descpr") match the user input (i.e., "prod"). If a user supplies " ’tpid’ OR 1 = 1; DROP TABLE Products" as input, the query returns all the rows of the Products table and drops the Products table from the database. In this way, attackers may inject tainted input and attempt to alter the actual intent of the code. This is known as an SQL injection attack (Su et al., 2018).
^a https://orcid.org/0000-0003-2700-4127
^b https://orcid.org/0000-0002-8873-8258
^1 https://owasp.org/www-project-top-ten/
void prodsearch( ) {
1. prod VARCHAR(15);
2. z int;
3. prod = read();
4. if prod > 1 then
5. SELECT PName INTO z FROM Products WHERE Descpr = prod;
}
Figure 1: A motivating example.
In practice, many database applications that deal with sensitive data (e.g., financial or healthcare data) do not properly check input data, and thus allow attackers to steal or modify weakly protected data in the underlying database in order to commit, for example, credit card fraud, identity theft, or other crimes.
Alam, M. and Halder, R.
Tailoring Taint Analysis for Database Applications in the K Framework.
DOI: 10.5220/0010618603700377
In Proceedings of the 10th International Conference on Data Science, Technology and Applications (DATA 2021), pages 370-377
ISBN: 978-989-758-521-0
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Approaches based on taint analysis (Jovanovic et al., 2006; Wassermann and Su, 2008; Cao et al., 2017; Tripp et al., 2009) identify potential vulnerabilities in program code. Unfortunately, these approaches suffer from false alarms because they ignore the control flow, the semantics of SQL statements, and constant functions. Security type systems (Huang et al., 2014) have emerged independently as a competing, and probably the most popular, approach to static taint analysis.
In this paper, we extend Hunt and Sands’s security type system (Hunt and Sands, 2006) and propose K-DBTaint, a rewriting logic-based executable semantics in the K framework for taint analysis of database applications. The K framework is a rewrite logic-based framework for defining programming language semantics suitable for formal analysis and reasoning about programs and programming languages (Roşu and Şerbănuţă, 2010; Asăvoae, 2014). Inspired by the rewriting logic semantics project (Meseguer and Roşu, 2007), this framework unifies algebraic denotational semantics and operational semantics by considering them as two different views of the same object. Such semantic definitions are directly executable in a rewriting logic language, e.g., Maude (Clavel et al., 2007), and thus support the development of analysis tools at no additional cost. With respect to the literature, the prototype developed on our theoretical foundation is flow-sensitive and context-sensitive, and it captures integrity violations in database applications.
To summarize, our main contributions are:
• We apply the K framework to define taint analysis of database applications, using an extension of Hunt and Sands’s security type system (Hunt and Sands, 2006) as the basis.
• We specify K rewrite rules which capture the semantics of both the database statements and the imperative statements of a host imperative language.
• The proposed analysis is flow-sensitive and context-sensitive, and it improves precision by handling constant functions.
• We develop a prototype tool based on our theoretical foundation which allows users to analyse PL/SQL code.
• We present experimental evaluation results on a set of PL/SQL benchmark codes to establish the effectiveness of our approach.
The paper is organized as follows: Section 2 discusses the related work in the literature on static taint analysis. Brief descriptions of the abstract syntax of the database language and of the K framework are presented in Section 3. Section 4 presents the formulation of Hunt and Sands’s security type system in the K framework. In Section 5, we present the executable rewriting logic semantics in K for taint analysis. The experimental evaluation results are reported in Section 6. Finally, Section 7 concludes our work.
2 RELATED WORKS
Taint analysis, a form of information-flow analysis, detects integrity violations in database applications (Cao et al., 2017; Huang et al., 2014; Tripp et al., 2009; Medeiros et al., 2015; Wassermann and Su, 2008; Jovanovic et al., 2006; Maskur and Asnar, 2019; Halim and Asnar, 2019; Vijayalakshmi and Syed Mohamed, 2021; Jana et al., 2018). The authors in (Tripp et al., 2009) present TAJ, an analysis tool for industrial applications. WAP-TA (Medeiros et al., 2015) and PIXY (Jovanovic et al., 2006) apply taint analysis to detect vulnerabilities in server-side scripts written in PHP through an inter-procedural context-sensitive data flow analysis. However, these analyses are imprecise as they do not support constant functions. Static taint analysis for detecting cross-site scripting vulnerabilities in JavaScript code is proposed in (Wassermann and Su, 2008). Unfortunately, because it ignores control dependencies, the proposed technique is unable to capture the indirect influence of taint information on other variables through implicit flow. Control flow graph-based fine-grained taint analysis of PHP scripts is proposed in (Cao et al., 2017; Halim and Asnar, 2019). Pattern-based taint analysis of web applications is proposed in (Maskur and Asnar, 2019). These approaches are not context-sensitive and generate false alarms in the presence of function calls. Moreover, the above approaches fail to address false positives in the presence of constant functions, such as x := 0 × x, x := y − y, etc. Observe that these approaches are not directly applicable to database applications due to the presence of external database states along with the program’s internal states. Intuitively, precise taint analysis of database applications requires handling both database and program states on a semantic foundation. Ignoring database states and treating database statements as a black box opens a pathway for database-specific security threats.
Table 1 reports a summary of the state-of-the-art tools and techniques in the line of static taint analysis, as compared with K-DBTaint. A recent case study on detection of vulnerabilities in web applications using taint analysis is reported in (Vijayalakshmi and Syed Mohamed, 2021).

Table 1: A Comparative Summary (✓ denotes partially successful at this stage).

                                 K-DBTaint              Pixy  WAP-TA     Su et al.   TAJ   Cao et al.
Semantics/Security Type System   ✓                      ✗     ✗          ✗           ✓     ✓
Explicit Flow                    ✓                      ✓     ✓          ✓           ✓     ✓
Implicit Flow                    ✓                      ✗     ✓          ✗           ✗     ✓
Constant Functions               ✓                      ✗     ✗          ✗           ✗     ✗
Flow-Sensitivity                 ✓                      ✓     ✓          ✗           ✓     ✓
Context-Sensitivity              ✓                      ✓     ✓          ✓           ✓     ✗
SQL Semantics                    ✓                      ✗     ✗          ✗           ✗     ✗
Language Supported               Database Applications  PHP   PHP + SQL  JavaScript  Java  PHP
3 PRELIMINARIES
In this section, we first recall the abstract syntax of the database language (Alam et al., 2021; Alam and Halder, 2021) under consideration and then describe the K framework (Roşu and Şerbănuţă, 2010; Asăvoae, 2014) in brief.
Table 2: Abstract Syntax of the Database Language (Alam et al., 2021; Alam and Halder, 2021).
E ::= n | id | E ap E | (E), where ap ∈ {+, −, ×, /}
B ::= true | false | E rel E | ¬B | B AND B | B OR B, where rel ∈ {≥, ≤, <, >, ==}
τ ::= int | float | char | bool
D ::= τ id
A ::= id := E | id := read()
$Q_{sel}$ ::= SELECT $\vec{E}$ INTO $\vec{rs}$ FROM id WHERE B;
$Q_{ins}$ ::= INSERT INTO id ($\vec{id}$) VALUES ($\vec{E}$);
$Q_{upd}$ ::= UPDATE id SET $\vec{id}$ = $\vec{E}$ WHERE B;
$Q_{del}$ ::= DELETE FROM id WHERE B;
Q ::= $Q_{sel}$ | $Q_{ins}$ | $Q_{upd}$ | $Q_{del}$
C ::= skip; | Q | D; | A; | defun id($\vec{D}$){C} | call id($\vec{E}$); | return; | return E; | if B then {C} | if(B) then {$C_1$} else {$C_2$} | while(B) do {C}
P ::= C | C ; P
3.1 Abstract Syntax of Database Language
We consider a generic database language scenario where SQL statements are embedded in a high-level imperative language. Its abstract syntax is shown in Table 2. An identifier, denoted by id, represents a program variable, a database attribute, or a database table name. For simplicity, we assume that all attribute names in the database schema are distinct. Arithmetic and boolean expressions are denoted by E and B, respectively. By convention, $\vec{id}$ stands for a sequence of identifiers $\langle id_1, id_2, \ldots, id_n \rangle$ and $\vec{E}$ stands for a sequence of arithmetic expressions $\langle E_1, E_2, \ldots, E_m \rangle$. The declaration and assignment statements of the imperative language are denoted by D and A. We consider a subset of database manipulation statements, namely SELECT ($Q_{sel}$), INSERT ($Q_{ins}$), UPDATE ($Q_{upd}$), and DELETE ($Q_{del}$).
The $Q_{sel}$ statement filters a set of tuples from the target table id based on the satisfiability of B and stores them in the resultset program variables $\vec{rs}$. Similarly, $Q_{upd}$ updates the attributes $\vec{id}$ of table id with the new values $\vec{E}$, indicated by $\vec{id} = \vec{E}$, when the corresponding rows satisfy B. More specifically, the term $\vec{id} = \vec{E}$ denotes the sequence of assignments $id_1 = E_1, \ldots, id_n = E_n$, where $id_i$ denotes the $i$-th attribute name and $E_i$ represents the $i$-th expression. To exemplify this, let us consider the statement "UPDATE Product SET $Item_1$ := ‘apple’, $Item_2$ := ‘orange’ WHERE Pid = uspid;". Its abstract syntax $\vec{id} = \vec{E}$ is denoted by $\langle Item_1, Item_2 \rangle = \langle$ ‘apple’, ‘orange’ $\rangle$, and B is denoted by "Pid = uspid". The $Q_{ins}$ statement appends a new data record (denoted by VALUES($\vec{E}$)) to a table id. The $Q_{del}$ statement deletes from table id the records that satisfy the condition B. The other imperative statements supported by the language are assignment, function, conditional, looping, and return. A database program P consists of a sequence of statements C.
3.2 The K Framework
In this section, we briefly describe the K framework (Roşu and Şerbănuţă, 2010). The K framework is a rewrite logic-based framework for defining programming language semantics suitable for formal reasoning about programs and programming languages.
Any formal semantics of a language requires first
a formal syntax. In K framework, language syntax is
defined using a variant of the familiar BNF notation,
with terminals enclosed in quotes and non-terminals
starting with capital letters. For example, the syntax
declaration:
syntax E ::= id
| E E [strict]
defines a syntactic category E, containing the pro-
gram variables and a basic arithmetic operation on ex-
pressions of database language. Each production can
have a space-separated list of attributes which can be
specified in square brackets at the end of the produc-
tion.
Specifying language semantics in the K framework consists of three parts: providing evaluation strategies that conveniently (re)arrange computations (computations), giving the structure of the configuration that holds program states (configuration), and writing K rules that describe transitions between configurations (K rewrite rules).
Evaluation strategies serve as a link between syntax and semantics, by specifying how the arguments of a language construct should be evaluated. For example, consider the following syntax for arithmetic expressions:
syntax E ::= $E_1$ "+" $E_2$ [strict]
The attribute strict allows $E_1$ and $E_2$ to be evaluated in any order, and thus introduces non-determinism. The annotation above corresponds to the following four heating/cooling rules:
$\langle \dfrac{E_1 + E_2}{E_1 \curvearrowright \square + E_2} \;\ldots \rangle_k$ | $\langle \dfrac{E_1 + E_2}{E_2 \curvearrowright E_1 + \square} \;\ldots \rangle_k$ | $\langle \dfrac{V_1 \curvearrowright \square + E_2}{V_1 + E_2} \;\ldots \rangle_k$ | $\langle \dfrac{V_2 \curvearrowright E_1 + \square}{E_1 + V_2} \;\ldots \rangle_k$
Here, $V_1$ and $V_2$ are the evaluated results of the expressions $E_1$ and $E_2$, respectively. The construct $\square$ (HOLE) is a placeholder that will be replaced by the result of the evaluated term or sub-term.
Configurations represent the state of a running program/system and are structured as nested, labeled cells containing various computation-based data structures. Within K, configuration cells are represented using an XML-like notation, with the label of the cell as the tag name and the contents (i.e., List, Map, Bag, Set, etc.) between the opening and closing tags. For example, consider the following configuration with three cells:
configuration $\langle \langle K \rangle_k \; \langle Map[id \mapsto L] \rangle_{env} \; \langle Map[L \mapsto n] \rangle_{store} \rangle_T$
The k cell holds a list of computational tasks, that is, k : List{K, ↷}, where K holds computational contents such as programs or fragments of programs and ↷ is the task sequentialization operator, which sequentializes program statements. The env cell maps variables to their locations (i.e., env : id ↦ L) and the store cell maps locations to values (i.e., store : L ↦ n). These cells are enclosed in the top cell denoted by T.
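As a rough illustration only, the three-cell configuration can be mimicked in Python with a list for the k cell and two dictionaries for env and store. The names `Config`, `declare`, and `lookup` are hypothetical helpers introduced for this sketch; they are not part of K.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Toy stand-in for the top cell T with its k, env, and store cells."""
    k: list = field(default_factory=list)      # pending computation tasks
    env: dict = field(default_factory=dict)    # identifier -> location
    store: dict = field(default_factory=dict)  # location -> value


def declare(cfg, name, value=0):
    """Allocate a fresh location for `name` and initialise it in the store."""
    loc = len(cfg.store)
    cfg.env[name] = loc
    cfg.store[loc] = value
    return loc


def lookup(cfg, name):
    """Resolve a variable in two steps: env gives the location, store the value."""
    return cfg.store[cfg.env[name]]


cfg = Config(k=["x := 5;", "y := 7;"])
declare(cfg, "x", 5)
declare(cfg, "y", 7)
print(lookup(cfg, "x"), lookup(cfg, "y"))
```

The two-step lookup mirrors the split between env (names to locations) and store (locations to values) described above.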
K rules describe how a running configuration evolves by advancing the computation and potentially altering the state/environment. K rewrite rules are classified into two types: computational rules, which may be interpreted as transitions in a program execution, and structural rules, which rearrange a term to enable the application of computational rules. For better understanding, let us consider the following rule, involving the two cells k and env, for a variable declaration:
$\langle \dfrac{\tau\ id}{\cdot} \;\ldots \rangle_k \; \langle \dfrac{\rho}{\rho[id \leftarrow T : Type]} \rangle_{env}$
This specifies that the next task to evaluate is a variable declaration, which is replaced by an empty computation "." while a type T is assigned to the variable id in the environment cell env.
By keeping rules compact and less redundant, it is less likely that a rule will need to be changed as the configuration is changed or new constructs are added to the language.
4 FORMULATING SECURITY TYPE SYSTEM IN THE K FRAMEWORK
In this section, we first extend Hunt and Sands’s security type system (Hunt and Sands, 2006) to the case of database applications, and then we formulate its typing judgements and rules in the K framework.
Extending Hunt and Sands’s Security Type Systems: As Hunt and Sands’s security type system is flow-sensitive, we extend it for the purposes of our taint analysis with context-sensitivity in the case of inter-procedural database code. This is depicted in Figure 2. We consider two security types, taint and untaint, with the join semi-lattice of the type domain defined as $SD = \langle S, \sqsubseteq, \sqcup \rangle$, where S = {taint, untaint} and the partial order relation is defined by untaint ⊑ taint.
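The two-point domain can be made concrete with a minimal Python sketch. The class name `Sec` and the helpers `leq`, `join`, and `join_all` are illustrative assumptions; the encoding of untaint as 0 and taint as 1 is chosen only so that the order untaint ⊑ taint coincides with integer comparison.

```python
from enum import IntEnum
from functools import reduce


class Sec(IntEnum):
    """Security types; the integer order encodes untaint ⊑ taint."""
    UNTAINT = 0
    TAINT = 1


def leq(a, b):
    """Partial order of the lattice: a ⊑ b."""
    return a <= b


def join(a, b):
    """Least upper bound a ⊔ b: taint wins as soon as one side is tainted."""
    return max(a, b)


def join_all(types):
    """Join of a collection of types; the empty join is the bottom element."""
    return reduce(join, types, Sec.UNTAINT)


print(join(Sec.UNTAINT, Sec.TAINT).name)   # TAINT
print(join_all([]).name)                   # UNTAINT
```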
Formulating the Type System: Let us now formulate the typing judgements and rules in the K framework. Let us first consider the typing judgement $pc \vdash \Gamma\{C\}\Gamma'$, which specifies that the security environment $\Gamma'$ is derived by executing the statement C on the security environment Γ under the program’s security context pc. To formulate this, we first define a configuration with three cells, namely the k cell, the env cell, and the context cell, as follows:
$\langle \langle K \rangle_k \; \langle Map \rangle_{env} \; \langle Map \rangle_{context} \rangle_T$.
Then we define the following K rule to formulate the typing judgement $pc \vdash \Gamma\{C\}\Gamma'$:
$\langle \dfrac{C}{\cdot} \;\ldots \rangle_k \; \langle \dfrac{\Gamma}{\Gamma'} \rangle_{env} \; \langle pc \mapsto \_ \rangle_{context}$
The symbol "..." appearing in the k cell represents the remaining computations. As a result of the execution of C, which is eventually consumed (denoted by "."), the previous environment Γ in the env cell is updated to the modified environment $\Gamma'$, (implicitly) influenced by the current value (denoted by "_") of the security context pc in the context cell.
Each security type rule is written as $\dfrac{\Gamma_i \vdash \zeta_i}{\Gamma \vdash \zeta}$, where the judgements $\Gamma_i \vdash \zeta_i$ denote a number of premises and the judgement $\Gamma \vdash \zeta$ denotes a single conclusion. For example, the type rule for the assignment statement, $\dfrac{\Gamma \vdash E : T}{pc \vdash \Gamma\{id := E\}\,\Gamma[[id \mapsto pc \sqcup T]]}$, is defined by the corresponding K rule $\langle \dfrac{id := T : Type}{\cdot} \;\ldots \rangle_k \; \langle \ldots\ \rho[id \mapsto \dfrac{\_}{\mu(pc) \sqcup T : Type}]\ \ldots \rangle_{env} \; \langle \mu \rangle_{context}$. Considering this as the basis, in the following section we discuss K rewrite rules for static taint analysis of the database language in the abstract security type domain S.
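[Expression]  $\Gamma \vdash E : \bigsqcup_{x \in FV(E)} \Gamma(x)$

[skip]  $pc \vdash \Gamma\{skip\}\Gamma$

[Declaration]  $pc \vdash \Gamma\{\tau\ id\}\,\Gamma[[id \mapsto pc \sqcup untaint]]$

[Read]  $pc \vdash \Gamma\{id := read()\}\,\Gamma[[id \mapsto pc \sqcup taint]]$

[Assignment]  $\dfrac{\Gamma \vdash E : T}{pc \vdash \Gamma\{id := E\}\,\Gamma[[id \mapsto pc \sqcup T]]}$

[DELETE]  $\dfrac{\Gamma \vdash B : T}{pc \vdash \Gamma\{\text{DELETE FROM } id \text{ WHERE } B\}\Gamma}$

[SELECT]  $\dfrac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\{\vec{rs} := \vec{E}\}\Gamma'}{pc \vdash \Gamma\{\text{SELECT } \vec{E} \text{ INTO } \vec{rs} \text{ FROM } id \text{ WHERE } B\}\,\Gamma \sqcup \Gamma'}$

[INSERT]  $\dfrac{pc \vdash \Gamma\{\vec{id} := \vec{E}\}\Gamma'}{pc \vdash \Gamma\{\text{INSERT INTO } id(\vec{id}) \text{ VALUES}(\vec{E})\}\,\Gamma \sqcup \Gamma'}$

[UPDATE]  $\dfrac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\{\vec{id} := \vec{E}\}\Gamma'}{pc \vdash \Gamma\{\text{UPDATE } id \text{ SET } \vec{id} := \vec{E} \text{ WHERE } B\}\,\Gamma \sqcup \Gamma'}$

[Function Call]  $\dfrac{\Gamma \vdash \vec{E} : \vec{T} \qquad defun\ id(\vec{D})\{C\} \qquad \vec{X} = getParam(\vec{D}) \qquad \Gamma[[\vec{X} \mapsto \vec{T}]] = \Gamma' \qquad \dfrac{pc \vdash \Gamma'\{C\}\Gamma''}{pc \vdash \Gamma'\{defun\ id(\vec{D})\{C\}\}\Gamma''}}{pc \vdash \Gamma\{call\ id(\vec{E})\}\Gamma''}$

[if]  $\dfrac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\{C\}\Gamma'}{pc \vdash \Gamma\{if\ B\ then\ C\}\,\Gamma \sqcup \Gamma'}$

[if-else]  $\dfrac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\{C_1\}\Gamma' \qquad pc \sqcup T \vdash \Gamma\{C_2\}\Gamma''}{pc \vdash \Gamma\{if\ B\ then\ \{C_1\}\ else\ \{C_2\}\}\,\Gamma' \sqcup \Gamma''}$

[while]  $\dfrac{\Gamma'_i \vdash B : T_i \qquad pc \sqcup T_i \vdash \Gamma'_i\{C\}\Gamma''_i \qquad 0 \le i \le k \qquad \Gamma'_0 = \Gamma \qquad \Gamma'_{i+1} = \Gamma''_i \sqcup \Gamma \qquad \Gamma'_{k+1} = \Gamma'_k}{pc \vdash \Gamma\{while\ B\ do\ \{C\}\}\,\Gamma'_k}$

Figure 2: Flow- and Context-sensitive Security Type Rules for Taint Analysis.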
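To make the rules of Figure 2 more tangible, the following Python sketch implements the [Expression], [skip], [Declaration], [Read], [Assignment], and [if] rules over a tiny tuple-based statement encoding. The encoding, the function names (`type_of`, `check`, `check_seq`), and the assumption that database attributes start as untaint are illustrative choices made here, not the K-DBTaint implementation; the SELECT of Figure 1 is encoded as a conditional guarding an assignment, following the [SELECT] rule.

```python
UNTAINT, TAINT = "untaint", "taint"


def join(a, b):
    """Least upper bound in the two-point lattice untaint ⊑ taint."""
    return TAINT if TAINT in (a, b) else UNTAINT


def join_env(g1, g2):
    """Pointwise join of two security environments."""
    return {x: join(g1.get(x, UNTAINT), g2.get(x, UNTAINT))
            for x in g1.keys() | g2.keys()}


def type_of(env, free_vars):
    """[Expression]: join of the types of the free variables of E."""
    result = UNTAINT
    for x in free_vars:
        result = join(result, env[x])
    return result


def check(pc, env, stmt):
    """Return the environment derived by executing stmt under context pc."""
    kind = stmt[0]
    if kind == "skip":                       # ("skip",)
        return env
    if kind == "decl":                       # ("decl", id)
        return {**env, stmt[1]: join(pc, UNTAINT)}
    if kind == "read":                       # ("read", id)
        return {**env, stmt[1]: join(pc, TAINT)}
    if kind == "assign":                     # ("assign", id, free vars of E)
        return {**env, stmt[1]: join(pc, type_of(env, stmt[2]))}
    if kind == "if":                         # ("if", free vars of B, body)
        guard = type_of(env, stmt[1])
        after = check_seq(join(pc, guard), env, stmt[2])
        return join_env(env, after)
    raise ValueError(f"unsupported statement: {kind}")


def check_seq(pc, env, stmts):
    """Thread the environment through a sequence of statements."""
    for s in stmts:
        env = check(pc, env, s)
    return env


# Figure 1, with the SELECT encoded as: if (Descpr = prod) { z := PName }
program = [
    ("decl", "prod"),
    ("decl", "z"),
    ("read", "prod"),
    ("if", ["prod"], [
        ("if", ["Descpr", "prod"], [("assign", "z", ["PName"])]),
    ]),
]
result = check_seq(UNTAINT, {"PName": UNTAINT, "Descpr": UNTAINT}, program)
print(result["prod"], result["z"])
```

In this sketch z becomes tainted purely through the security context: the guard mentions the tainted variable prod, so the assignment under it is typed with pc = taint even though PName itself is assumed untainted.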
5 K SEMANTICS FOR K-DBTaint
In this section, we present K rewrite rules for taint analysis of our database language. The proposed analysis addresses the semantics of both the database statements and the imperative program statements together. To this aim, we consider the following K configuration on which the semantics is defined:
configuration $\langle \langle K \rangle_k \; \langle Map \rangle_{env} \; \langle Map \rangle_{context} \; \langle \langle Map \rangle_{\lambda\text{-}Def} \; \langle List \rangle_{fstack} \rangle_{control} \; \langle List \rangle_{in} \; \langle List \rangle_{out} \rangle_T$
Here, the special cell $\langle\rangle_k$ contains the list of computation tasks, the environment cell env maps variables or database attributes to their security types, the context cell denotes the current program context, the λ-Def cell supports the inter-procedural feature, the in and out cells denote input-output operations, and the control cell manages function calls using the fstack cell. We now describe the K rules for imperative program statements and database statements, as depicted in Figure 3. We label the defined rules by $R_{-}$ for future reference. Note that these rules capture explicit- and implicit-flow sensitivity, context-sensitivity, the semantics of constant functions, and the semantics of SQL statements.
Imperative Program Statements: Due to space constraints, we discuss here only a few rules for imperative program statements; the reader may refer to (Alam et al., 2018) for the complete set of rules for imperative programs. The first rule $R_{decl}$ deals with variable declarations and initializes variables with their initial security types (untaint in our case) in the environment cell env. Any unsanitized input gets the type taint by rule $R_{read}$. Rule $R_{asg}$, which handles assignment computations, updates the security type of id in the env cell to the least upper bound of the security type of the right-hand-side expression (i.e., T) and the program’s current security context pc in the context cell. The assignment is then replaced by an empty computation. In order to capture the implicit flow of taint information in the presence of conditional and loop statements, we define rules $R_{if}$ and $R_{while}$, which update the security context µ in the context cell based on the security type of B. The term $restore_c(\mu)$ restores the previous context on exiting a block guarded by B, and approx(ρ) provides a sound approximation of the semantics as the least upper bound of the environments obtained over all possible execution paths due to the presence of B. Note that the least fixed point solution in the case of while is achieved by defining the auxiliary function fixpoint() through one of the following two cases:
(1) $\langle \dfrac{fixpoint(B, C, \rho_i)}{\cdot} \;\ldots \rangle_k \; \langle \rho'_i \rangle_{env}$ when $\rho_i = \rho'_i$,
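$R_{decl}$ : $\langle \dfrac{\tau\ id}{\cdot} \;\ldots \rangle_k \; \langle \dfrac{\rho}{\rho[id \leftarrow T : Type]} \rangle_{env}$

$R_{read}$ : $\langle \dfrac{read()}{taint} \;\ldots \rangle_k$

$R_{con\text{-}func}$ : $\langle id_1 \times id_2 \;\ldots \rangle_k$ =
    $\langle \dfrac{id_1 \times id_2}{untaint} \;\ldots \rangle_k$ when $id_1$ = zero or $id_2$ = zero
    $\langle \dfrac{id_1 \times id_2}{id_1 \times_{Type} id_2} \;\ldots \rangle_k$ otherwise

$R_{asg}$ : $\langle \dfrac{id := T : Type}{\cdot} \;\ldots \rangle_k \; \langle \ldots\ \rho[id \mapsto \dfrac{\_}{\mu(pc) \sqcup T : Type}]\ \ldots \rangle_{env} \; \langle \mu \rangle_{context}$

$R_{if}$ : $\langle \dfrac{if\ (B : T)\ then\ \{C\}}{C \curvearrowright restore_c(\mu) \curvearrowright approx(\rho)} \;\ldots \rangle_k \; \langle \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \rangle_{context} \; \langle \rho \rangle_{env}$

$R_{while}$ : $\langle \dfrac{while(B : T)\ do\ \{C\}}{C \curvearrowright restore_c(\mu) \curvearrowright approx(\rho) \curvearrowright fixpoint(B, C, \rho)} \;\ldots \rangle_k \; \langle \rho \rangle_{env} \; \langle \dfrac{\mu}{\mu[pc \leftarrow \mu(pc) \sqcup T]} \rangle_{context}$

$R_{fun\text{-}decl}$ : $\langle \dfrac{defun\ Func\_name(Params)\{C\}}{\cdot} \;\ldots \rangle_k \; \langle \langle \dfrac{\psi}{\psi[Func\_name \leftarrow lambda(Params, C)]} \rangle_{\lambda\text{-}Def} \rangle_{control}$

$R_{fun\text{-}call}$ : $\langle \dfrac{lambda(Params, C)(Es : Ts) \curvearrowright K}{McDecls(Params, Ts) \curvearrowright C \curvearrowright return;} \;\ldots \rangle_k \; \langle \langle \dfrac{.List}{[ListItem(\rho, K, Ctr)]} \;\ldots \rangle_{fstack} \; Ctr \rangle_{control} \; \langle \rho \rangle_{env}$

$R_{aux\text{-}fun(a)}$ : $\langle \dfrac{makeAssign(id, E)}{id := E;} \;\ldots \rangle_k$

$R_{aux\text{-}fun(b)}$ : $\langle \dfrac{makeAssign((id, \vec{id}), (E, \vec{E}))}{id := E; \curvearrowright makeAssign(\vec{id}, \vec{E})} \;\ldots \rangle_k$

$R_{aux\text{-}fun(c)}$ : $\langle \dfrac{makeupdAssign(id_1 = E_1, \ldots, id_n = E_n)}{id_1 = E_1; \curvearrowright \ldots \curvearrowright id_n = E_n;} \;\ldots \rangle_k$

$R_{sel}$ : $\langle \dfrac{\text{SELECT } \vec{E} \text{ INTO } \vec{rs} \text{ FROM } id \text{ WHERE } B;}{if(B)\{makeAssign(\vec{rs}, \vec{E})\}} \;\ldots \rangle_k$

$R_{ins}$ : $\langle \dfrac{\text{INSERT INTO } id(\vec{id}) \text{ VALUES } (\vec{E});}{makeAssign(\vec{id}, \vec{E})} \;\ldots \rangle_k$

$R_{upd}$ : $\langle \dfrac{\text{UPDATE } id \text{ SET } \vec{id} = \vec{E} \text{ WHERE } B;}{if(B)\{makeupdAssign(id_1 = E_1, \ldots, id_n = E_n)\}} \;\ldots \rangle_k$

$R_{del}$ : $\langle \dfrac{\text{DELETE FROM } id \text{ WHERE } B;}{\cdot} \;\ldots \rangle_k$

Figure 3: K rewrite rules for Database Applications.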
(2) $\langle \dfrac{fixpoint(B, C, \rho_i)}{while(B)\ do\ \{C\}} \;\ldots \rangle_k \; \langle \rho'_i \rangle_{env}$ when $\rho_i \neq \rho'_i$.
Case (1) represents the situation in which the computation has reached the fixpoint, and therefore the fixpoint computation is consumed. Otherwise, the iteration continues as shown in case (2). The context-sensitivity in the presence of function calls is captured by the rule $R_{fun\text{-}call}$.
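The two fixpoint cases can be paraphrased as a plain iteration in Python: re-analyse the loop body until the environment stops changing. This is a minimal sketch in the spirit of the [while] rule, not the K rules themselves; `analyse_while` and the callable `body` (a monotone function from a context and an environment to an environment) are assumptions made for illustration.

```python
UNTAINT, TAINT = "untaint", "taint"


def join(a, b):
    """Least upper bound in the two-point lattice untaint ⊑ taint."""
    return TAINT if TAINT in (a, b) else UNTAINT


def join_env(g1, g2):
    """Pointwise join of two security environments."""
    return {x: join(g1.get(x, UNTAINT), g2.get(x, UNTAINT))
            for x in g1.keys() | g2.keys()}


def analyse_while(pc, env, guard_vars, body):
    """Iterate the body until two successive environments coincide."""
    current = dict(env)
    while True:
        guard_type = UNTAINT
        for x in guard_vars:
            guard_type = join(guard_type, current[x])
        after = body(join(pc, guard_type), current)
        nxt = join_env(after, env)      # next environment: body result joined with the initial one
        if nxt == current:              # case (1): fixpoint reached, stop
            return current
        current = nxt                   # case (2): analyse the loop once more


def body(pc, env):
    """Loop body `c := b; b := a;` expressed as a transfer function."""
    env = {**env, "c": join(pc, env["b"])}
    env = {**env, "b": join(pc, env["a"])}
    return env


start = {"a": TAINT, "b": UNTAINT, "c": UNTAINT, "i": UNTAINT}
result = analyse_while(UNTAINT, start, ["i"], body)
print(result["b"], result["c"])
```

Here the taint of a reaches b in the first pass and c only in the second, which is why a single pass over the body would not be enough. Termination relies on the lattice being finite and the body being monotone.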
SQL Statements: Before we define the semantics of SQL statements, let us first define the semantics of the auxiliary function makeAssign(), which takes two parameters and generates an assignment statement as follows:
makeAssign(id, E) => id := E;
where id represents an identifier and E denotes an expression. When lists are passed as parameters to the function makeAssign(), it generates a sequence of assignment statements as follows:
makeAssign((id, $\vec{id}$), (E, $\vec{E}$)) => id := E; ↷ makeAssign($\vec{id}$, $\vec{E}$)
where $\vec{id}$ denotes a list of identifiers and $\vec{E}$ denotes a list of expressions. Observe that, when empty parameters are passed to makeAssign(), it generates an empty computation "." (i.e., makeAssign(.id, .E) => . ). Similarly, the function makeupdAssign() accepts a term of the form $\langle id_1 = E_1, \ldots, id_n = E_n \rangle$ and returns a sequence of assignment statements as follows:
makeupdAssign($id_1 = E_1, \ldots, id_n = E_n$) => $id_1 = E_1$; ↷ ... ↷ $id_n = E_n$;
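The recursive unfolding performed by makeAssign() and makeupdAssign() can be mirrored in a few lines of Python. This is only a sketch: statements are represented as plain strings, a Python list stands in for the ↷-sequenced computation, and the names `make_assign` and `make_upd_assign` are hypothetical.

```python
def make_assign(ids, exprs):
    """Expand paired lists into a sequence of `id := E;` statements."""
    if len(ids) != len(exprs):
        raise ValueError("identifier and expression lists differ in length")
    if not ids:
        return []                       # empty parameters give the empty computation "."
    head = f"{ids[0]} := {exprs[0]};"   # peel off the first pair ...
    return [head] + make_assign(ids[1:], exprs[1:])   # ... and recurse on the rest


def make_upd_assign(pairs):
    """Expand (id, E) pairs from a SET clause into `id = E;` statements."""
    return [f"{name} = {expr};" for name, expr in pairs]


print(make_assign(["z", "w"], ["PName", "Price + 1"]))
print(make_upd_assign([("Item1", "'apple'"), ("Item2", "'orange'")]))
```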
Let us now define the semantics of the subset of SQL
statements supported by the language under consider-
ation.
The SELECT statement $Q_{sel}$ retrieves the values of those attributes that are present in the expression $\vec{E}$. It includes a WHERE clause that selects the rows to be retrieved. The presence of the condition B in the WHERE clause acts as an implicit flow of taint information. In order to capture such implicit flow, we translate $Q_{sel}$ into an equivalent if statement, as depicted in rule $R_{sel}$. Note that the auxiliary function makeAssign($\vec{rs}$, $\vec{E}$) in $R_{sel}$ captures the flow of tainted data retrieved from the database table using $Q_{sel}$.
The UPDATE statement $Q_{upd}$ updates the values of the attributes according to the condition in the WHERE clause. As with the SELECT statement, the condition in the WHERE clause of the UPDATE statement also acts as an implicit flow of taint information. Therefore, rule $R_{upd}$ translates $Q_{upd}$ into an equivalent if statement to capture the implicit taint flow. The translation of $\vec{id} = \vec{E}$ into the auxiliary function makeupdAssign($id_1 = E_1, \ldots, id_n = E_n$) captures the direct taint flow due to the assignment of tainted data through the expressions $\vec{E}$.
Table 3: Taint Analysis on Benchmark Programs Set (PL/SQL, 2021) (✓: Passed, ✗+: False Positives, ✗−: False Negatives).

Progs.  Descriptions                                                   K-DBTaint  WAP-TA  Pixy  Su et al.  Cao et al.
Prog1   Balance_Transfer.sql (explicit-flow)                           ✓          ✗−      ✗−    ✗−         ✗−
Prog2   Update_Inventory.sql (explicit-flow)                           ✓          ✗−      ✗−    ✗−         ✗−
Prog3   Budget.sql (SQL injection)                                     ✓          ✗−      ✗−    ✗−         ✗−
Prog4   Proc_Inventory.sql (malware attack)                            ✓          ✗+      ✗+    ✗−         ✗−
Prog5   Update_Quantity.sql (SQL injection)                            ✓          ✗−      ✗+    ✗−         ✗−
Prog6   Populate_Products.sql (SQL injection, constant function)       ✓          ✗−      ✗−    ✗−         ✗−
Prog7   delete_Client.sql (SQL injection)                              ✓          ✗+      ✗+    ✗+         ✗+
Prog8   Update_Account.sql (SQL injection, implicit flow)              ✓          ✗+      ✗−    ✗−         ✗+
Prog9   Record_NewSale.sql (explicit-flow)                             ✓          ✗−      ✗−    ✗−         ✗−
Prog10  Get_Country_Id.sql (explicit-flow, constant function)          ✓          ✗+      ✗+    ✗+         ✗+
Prog11  Add_Car.sql (SQL injection, explicit-flow, constant function)  ✓          ✗+      ✗+    ✗+         ✗+
Prog12  Award_Bonus.sql (SQL injection, malware attack)                ✓          ✗−      ✗−    ✗−         ✗−
Prog13  Resrvation_Proc.sql (SQL injection, XSS attacks)               ✓          ✗−      ✗−    ✗+         ✗+
Prog14  Credit_Account.sql (SQL injection, implicit flow)              ✓          ✗+      ✗−    ✗−         ✗+
Prog15  Debit_Account.sql (SQL injection, implicit flow)               ✓          ✗+      ✗−    ✗−         ✗+
The INSERT statement appends a tuple to the database table. We specify the attribute names $\vec{id}$ and the list of values $\vec{E}$ in the INSERT statement in the same order as specified in the table definition. Therefore, any tainted expression $E \in \vec{E}$ directly stores tainted data in the database table, which may be accessed and used in future computations. Hence, the rule $R_{ins}$ translates $Q_{ins}$ into a sequence of assignment statements to capture such direct flow of tainted data due to the assignment of E to id.
The DELETE statement deletes tuples from a table. We translate it into an empty computation, leaving the current execution environment unchanged. This is depicted in rule $R_{del}$.
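Taken together, the four translations can be summarised by a small desugaring function. The sketch below is an illustration under stated assumptions: the dictionary-based statement encoding, the key names, and the function `desugar` are invented here and are not the K-DBTaint representation; the result is a list of tuple-encoded `if` and `assign` statements.

```python
def desugar(stmt):
    """Rewrite one SQL statement into imperative statements, in the spirit of R_sel, R_ins, R_upd, R_del."""
    kind = stmt["kind"]
    if kind == "select":
        # WHERE condition guards the assignments into the result variables.
        body = [("assign", target, expr)
                for target, expr in zip(stmt["into"], stmt["exprs"])]
        return [("if", stmt["where"], body)]
    if kind == "update":
        # Same shape as SELECT: the guard carries the implicit flow.
        body = [("assign", attr, expr)
                for attr, expr in zip(stmt["attrs"], stmt["exprs"])]
        return [("if", stmt["where"], body)]
    if kind == "insert":
        # No guard: only direct flows from expressions to attributes.
        return [("assign", attr, expr)
                for attr, expr in zip(stmt["attrs"], stmt["exprs"])]
    if kind == "delete":
        # Empty computation: the environment is left unchanged.
        return []
    raise ValueError(f"unsupported statement kind: {kind}")


# The query of Figure 1.
query = {"kind": "select", "exprs": ["PName"], "into": ["z"],
         "table": "Products", "where": "Descpr = prod"}
print(desugar(query))
```

Running the sketch on the query of Figure 1 yields a single conditional on `Descpr = prod` guarding the assignment of PName to z.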
Constant Functions: Apart from the semantics of SQL statements, defining the semantics of constant functions greatly improves the precision of taint analysis. For example, consider the statement y := z × 0 + 4, where z is a tainted variable. Note that, although syntax-based taint flow makes the variable y tainted, the semantics of the constant function "z × 0 + 4", which always evaluates to 4 irrespective of the value of z, makes y untainted. The approximation of the semantics in the security domain, due to the abstraction, makes dealing with constant functions challenging. As a partial solution, we specify rules for some simple cases of constant functions such as x − x, x xor x, x × 0, etc. We show one such rule as $R_{con\text{-}func}$. In this context, as a notable observation, consider the following scenario: given the code fragment y := read(); x := y; v := x xor y, the analysis marks the variable v as tainted. Indeed, attackers may inject malicious input containing a vulnerable control part for which the xor operation fails to nullify the effect, affecting subsequent critical computations involving v.
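A minimal Python sketch of the simple constant-function cases follows. It handles only a single binary operation whose operands are variable names or integer literals, which is an assumption of this illustration; the function name `type_binop` is hypothetical, and nested expressions such as z × 0 + 4 would need a recursive treatment that is not shown.

```python
UNTAINT, TAINT = "untaint", "taint"


def join(a, b):
    """Least upper bound in the two-point lattice untaint ⊑ taint."""
    return TAINT if TAINT in (a, b) else UNTAINT


def type_binop(op, left, right, env):
    """Security type of `left op right`; operands are variable names or int literals."""

    def operand_type(operand):
        # Literals carry no taint; variables are looked up in the environment.
        return UNTAINT if isinstance(operand, int) else env[operand]

    # x * 0 and 0 * x always evaluate to 0, whatever x holds.
    if op == "*" and (left == 0 or right == 0):
        return UNTAINT
    # x - x and x xor x always evaluate to 0 when both operands are the same variable.
    if op in ("-", "xor") and isinstance(left, str) and left == right:
        return UNTAINT
    # Otherwise fall back to the join of the operand types.
    return join(operand_type(left), operand_type(right))


env = {"x": TAINT, "y": TAINT, "z": TAINT}
print(type_binop("*", "z", 0, env))      # untaint
print(type_binop("-", "x", "x", env))    # untaint
print(type_binop("xor", "x", "y", env))  # taint
```

The last call reflects the observation above: x and y are syntactically different variables, so the sketch keeps v := x xor y tainted even when x was copied from y.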
6 EXPERIMENTAL RESULTS
This section presents experimental results on a set of benchmark codes collected from (PL/SQL, 2021). These codes represent a wide range of PL/SQL codes, including explicit-flow (Prog1, Prog2, Prog9, Prog10, Prog11), implicit-flow (Prog8, Prog14, Prog15), XSS attacks (Prog13), malware attacks (Prog4, Prog12), SQL injection attacks (Prog3, Prog5, Prog6, Prog7, Prog8, Prog11, Prog12, Prog13, Prog14, Prog15), constant functions (Prog6, Prog10, Prog11), etc. All experiments were conducted on a computer system equipped with a Core i3 2.60 GHz CPU, 3 GB of memory, and the Ubuntu 20.04 operating system.
Table 4: Detailed Execution Steps in K-DBTaint of the Code Snippet Depicted in Figure 1 (U: untaint, T: taint).

Prog. Point  Security Types                    Rule
1            prod|->U                          $R_{decl}$
2            prod|->U, z|->U                   $R_{decl}$
3            prod|->T, z|->U                   $R_{read}$
4            prod|->T, z|->U                   $R_{if}$
5            prod|->T, z|->T, PName|->T        $R_{sel}$
We have implemented a prototype, K-DBTaint, in the K tool (version 5.0)^2. We have defined a full set of semantic rules (more than 430 rules) for the database language under consideration. The tool K-DBTaint accepts PL/SQL code as input from the console using K framework-specific commands. Table 3 depicts the evaluation results on the benchmark codes.
^2 https://github.com/kframework/k/releases
The results of K-DBTaint are compared with the results obtained from some of the available static taint analysis tools, namely WAP-TA (Evans and Larochelle, 2002), Pixy (Jovanovic et al., 2006), (Wassermann and Su, 2008), and (Cao et al., 2017), and are reported in columns 3-7. The notations ✗+ and ✗− indicate failures due to false positives and false negatives, respectively, whereas ✓ indicates a successful detection of taint vulnerabilities. Observe that, owing to its flow-sensitivity, its context-sensitivity, and its handling of constant functions and of the semantics of SQL statements, K-DBTaint significantly reduces the occurrence of false alarms. In Table 4, we show how K-DBTaint captures the taint flows of the motivating example in Figure 1 by listing the corresponding execution steps.
7 CONCLUSIONS
In this paper, we proposed an executable rewriting logic semantics for static taint analysis of a database language in the K framework. The proposed analysis addresses the semantics of both the database statements and the imperative program statements together. We developed a prototype, K-DBTaint, in K based on this theoretical foundation, which allows the user to analyse PL/SQL code for integrity issues. Compared to existing works, the proposed approach has improved precision, as shown by our experimental evaluation on a set of benchmark programs. In the future, we aim to add more semantic rules to cover more language features, such as aggregate functions, nested queries, and set operations, as an extension to the current database language, and to address more semantics-based non-dependencies.
REFERENCES
Alam, M., Halder, R., Goswami, H., Pinto, J. S., et al.
(2018). K-taint: an executable rewriting logic seman-
tics for taint analysis in the k framework. In Proc of
the 13th Int. Conf. on ENASE, pages 359–366.
Alam, M. I. and Halder, R. (2021). Formal verification of
database applications using predicate abstraction. SN
Computer Science, 2(3):1–24.
Alam, M. I., Halder, R., and Pinto, J. S. (2021). A de-
ductive reasoning approach for database applications
using verification conditions. Journal of Systems and
Software, 175:110903.
Asăvoae, I. M. (2014). Abstract semantics for alias
analysis in k. Electronic Notes in Theoretical Computer
Science, 304:97–110.
Cao, K., He, J., Fan, W., Huang, W., Chen, L., and Pan,
Y. (2017). Php vulnerability detection based on taint
analysis. In 2017 6th ICRITO, pages 436–439. IEEE.
Clavel, M. et al. (2007). All about maude - a
high-performance logical framework: how to specify,
program and verify systems in rewriting logic, volume
4350. Springer-Verlag.
Evans, D. and Larochelle, D. (2002). Improving security
using extensible lightweight static analysis. IEEE soft-
ware, 19(1):42–51.
Halim, V. H. and Asnar, Y. D. W. (2019). Static code ana-
lyzer for detecting web application vulnerability using
control flow graphs. In 2019 International Conference
on Data and Software Engineering (ICoDSE), pages
1–6. IEEE.
Huang, W., Dong, Y., and Milanova, A. (2014). Type-based
taint analysis for java web applications. In Proc.
of Int. Conf. on Fundamental Approaches to Software
Engineering, pages 140–154. Springer.
Hunt, S. and Sands, D. (2006). On flow-sensitive security
types. In Conf. Record of the 33rd ACM SIGPLAN-
SIGACT Sym. on POPL, pages 79–90, S. California.
ACM.
Jana, A., Alam, M. I., and Halder, R. (2018). A symbolic
model checker for database programs. In ICSOFT,
pages 381–388.
Jovanovic, N., Kruegel, C., and Kirda, E. (2006). Pixy: A
static analysis tool for detecting web application
vulnerabilities. In IEEE, S&P’06, pages 258–263.
Maskur, A. F. and Asnar, Y. D. W. (2019). Static code anal-
ysis tools with the taint analysis method for detect-
ing web application vulnerability. In 2019 Interna-
tional Conference on Data and Software Engineering
(ICoDSE), pages 1–6. IEEE.
Medeiros, I., Neves, N., and Correia, M. (2015). Detect-
ing and removing web application vulnerabilities with
static analysis and data mining. IEEE Transactions on
Reliability, 65(1):54–69.
Meseguer, J. and Roșu, G. (2007). The rewriting logic
semantics project. Theoretical Computer Science,
373(3):213–237.
PL/SQL (2021). Github pl/sql project. https://github.com/topics/plsql.
[Online; accessed March 2021].
Roșu, G. and Șerbănuță, T. F. (2010). An overview of the k
semantic framework. The Journal of Logic and Alge-
braic Programming, 79(6):397–434.
Su, G., Wang, F., and Li, Q. (2018). Research on sql injec-
tion vulnerability attack model. In 2018 5th IEEE Int.
Conf. on CCIS, pages 217–221. IEEE.
Tripp, O., Pistoia, M., Fink, S. J., Sridharan, M., and Weis-
man, O. (2009). Taj: effective taint analysis of web ap-
plications. In ACM Sigplan Notices, volume 44, pages
87–97. ACM.
Vijayalakshmi, K. and Syed Mohamed, E. (2021). Case
study: Extenuation of xss attacks through various de-
tecting and defending techniques. Journal of Applied
Security Research, 16(1):91–126.
Wassermann, G. and Su, Z. (2008). Static detec-
tion of cross-site scripting vulnerabilities. In 2008
ACM/IEEE 30th International Conference on Soft-
ware Engineering, pages 171–180. IEEE.