Tailoring Taint Analysis for Database Applications in the K Framework

Md. Imran Alam$^{1,2,a}$ and Raju Halder$^{1,b}$

$^{1}$Indian Institute of Technology Patna, India

$^{2}$Università Ca’ Foscari Venezia, Italy

Keywords:

Database Applications, Taint Analysis, K Framework, Security.

Abstract:

Maintaining the integrity of the databases underlying information systems is a persistent challenge. Integrity may be compromised either by coding flaws or by improper flow of information from source to sink in the associated database applications. Such a compromise may lead either to disclosure of sensitive information to attackers or to illegitimate modification of private data stored in the databases. Taint analysis is a widely used program analysis technique that aims at averting malicious inputs from corrupting data values in critical computations of programs. In this paper, we propose K-DBTaint, a rewriting logic-based executable semantics for taint analysis of database applications in the K framework. We specify the semantics for a subset of SQL statements along with host imperative program statements. Our K semantics can be seen as a sound approximation of program semantics in the corresponding security type domain. With respect to existing methods, K-DBTaint supports context- and flow-sensitive analysis, reduces false alarms, and provides a scalable solution. Experimental evaluation on several PL/SQL benchmark codes demonstrates encouraging results in the form of improved precision of the analysis.

1 INTRODUCTION

Database applications are ubiquitous today, and many of these applications are developed in general-purpose languages embedded with SQL statements. Vulnerabilities in database applications pose serious security and privacy threats such as the exposure of confidential data, loss of customer trust, and denial of service. According to OWASP, the most serious vulnerabilities are SQL injection, cross-site scripting, buffer overflow, etc. These vulnerabilities are usually caused either by coding flaws or by improper flow of information from source to sink in the associated database applications.

Taint analysis is a well-established technique that aims at averting malicious inputs from corrupting data values in critical computations of programs (Huang et al., 2014). Tainted data refers to data which originate from potentially malicious users and can cause security problems at vulnerable points (known as sensitive sinks) in the program. Tainted data may enter a program through specific places and spread across the program via assignments and similar constructs. To exemplify this scenario, let us consider the code snippet, shown in Figure 1, that allows users to search a product catalog. The user input is read through the auxiliary function read() and is integrated with the SQL query at line number 5. When the query is executed, it extracts information about those products whose description IDs (i.e., "Descpr") match the user input (i.e., "prod"). If a user supplies " ’tpid’ OR 1 = 1; DROP TABLE Products" as input, the query returns all the rows of the Products table and drops the Products table from the database. In this way, attackers may inject tainted input and attempt to alter the actual intent of the code. This is known as an SQL Injection attack (Su et al., 2018).

a https://orcid.org/0000-0003-2700-4127

b https://orcid.org/0000-0002-8873-8258

OWASP Top Ten: https://owasp.org/www-project-top-ten/

void prodsearch( ) {
  1. prod VARCHAR(15);
  2. z int;
  3. prod = read();
  4. if prod > 1 then
  5.   SELECT PName INTO z FROM Products WHERE Descpr = prod;
}

Figure 1: A motivating example.
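The injection described for Figure 1 can be reproduced outside PL/SQL. The following is a minimal Python sketch using the standard-library sqlite3 module and an in-memory table; the table contents and the helper names (make_db, search_unsafe, search_safe) are illustrative assumptions and not part of the original example. Only the "OR 1 = 1" part of the attack string is exercised, because sqlite3's execute() accepts a single statement at a time.

```python
import sqlite3


def make_db():
    """Create an in-memory Products table with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (PName TEXT, Descpr TEXT)")
    conn.executemany(
        "INSERT INTO Products VALUES (?, ?)",
        [("pen", "p1"), ("ink", "p2"), ("pad", "p3")],
    )
    return conn


def search_unsafe(conn, prod):
    """Source-to-sink flow: the tainted input is spliced into the query text."""
    query = "SELECT PName FROM Products WHERE Descpr = " + prod
    return [row[0] for row in conn.execute(query)]


def search_safe(conn, prod):
    """The input is bound as a parameter, so it is only ever treated as data."""
    query = "SELECT PName FROM Products WHERE Descpr = ?"
    return [row[0] for row in conn.execute(query, (prod,))]


if __name__ == "__main__":
    conn = make_db()
    attack = "'tpid' OR 1 = 1"
    print(search_unsafe(conn, attack))  # every product name is returned
    print(search_safe(conn, attack))    # no row matches the literal string
```

The unsafe variant is the kind of source-to-sink flow that a taint analysis is meant to flag: the value returned by read() reaches the query text unchanged.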

In practice, many database applications which deal with sensitive data, e.g., financial and healthcare data, do not perform proper checking of input data and thus allow attackers to steal or modify weakly protected data in the underlying database in order to conduct, for example, credit card fraud, identity theft, or other crimes.

Alam, M. and Halder, R. Tailoring Taint Analysis for Database Applications in the K Framework. DOI: 10.5220/0010618603700377. In Proceedings of the 10th International Conference on Data Science, Technology and Applications (DATA 2021), pages 370-377. ISBN: 978-989-758-521-0

Approaches based on taint analysis (Jovanovic et al., 2006; Wassermann and Su, 2008; Cao et al., 2017; Tripp et al., 2009) identify potential vulnerabilities in program code. Unfortunately, these approaches suffer from false alarms because they ignore control flow, the semantics of SQL statements, and constant functions. Security type systems (Huang et al., 2014) have emerged independently as a competing, and probably the most popular, approach to static taint analysis.

In this paper, we extend Hunt and Sands's security type system (Hunt and Sands, 2006) and propose K-DBTaint, a rewriting logic-based executable semantics in the K framework for taint analysis of database applications. The K framework is a rewrite logic-based framework for defining programming language semantics suitable for formal analysis and reasoning about programs and programming languages (Roşu and Şerbănuţă, 2010; Asăvoae, 2014). Inspired by the rewriting logic semantics project (Meseguer and Roşu, 2007), this framework unifies algebraic denotational semantics and operational semantics by considering them as two different views of the same object. Such semantic definitions are directly executable in a rewriting logic language, e.g., Maude (Clavel and et al., 2007), and thus support the development of analysis tools at no additional cost. With respect to the literature, the prototype developed on our theoretical foundation is flow-sensitive and context-sensitive, and it captures integrity violations in database applications.

To summarize, our main contributions are:

• We apply the K framework to define taint analysis of database applications, taking Hunt and Sands's security type system (Hunt and Sands, 2006) as the basis and extending it.

• We specify K rewrite rules which capture the semantics of both the database statements and the imperative statements of a host imperative language.

• The proposed analysis is flow-sensitive and context-sensitive, and it improves precision by handling constant functions.

• We develop a prototype tool based on our theoretical foundation which allows users to analyse PL/SQL code.

• We present experimental evaluation results on a set of PL/SQL benchmark codes to establish the effectiveness of our approach.

The paper is organized as follows: Section 2 discusses related work in the literature on static taint analysis. Brief descriptions of the abstract syntax of the database language and of the K framework are presented in Section 3. Section 4 presents the formulation of Hunt and Sands's security type system in the K framework. In Section 5, we present the executable rewriting logic semantics in K for taint analysis. The experimental evaluation results are reported in Section 6. Finally, Section 7 concludes our work.

2 RELATED WORKS

Taint analysis, a form of information-flow analysis, detects integrity violations in database applications (Cao et al., 2017; Huang et al., 2014; Tripp et al., 2009; Medeiros et al., 2015; Wassermann and Su, 2008; Jovanovic et al., 2006; Maskur and Asnar, 2019; Halim and Asnar, 2019; Vijayalakshmi and Syed Mohamed, 2021; Jana et al., 2018). The authors in (Tripp et al., 2009) present TAJ, an analysis tool for industrial applications. WAP-TA (Medeiros et al., 2015) and PIXY (Jovanovic et al., 2006) apply taint analysis to detect vulnerabilities in server-side scripts written in PHP through an inter-procedural, context-sensitive data-flow analysis. However, these analyses are imprecise as they do not support constant functions. Static taint analysis for detecting cross-site scripting vulnerabilities in JavaScript code is proposed in (Wassermann and Su, 2008). Unfortunately, because it ignores control dependencies, the proposed technique is unable to capture the indirect influence of taint information on other variables through implicit flow. Control flow graph-based fine-grained taint analysis of PHP scripts is proposed in (Cao et al., 2017; Halim and Asnar, 2019). Pattern-based taint analysis of web applications is proposed in (Maskur and Asnar, 2019). These approaches are not context-sensitive and generate false alarms in the presence of function calls. Moreover, the above approaches fail to address false positives in the presence of constant functions, such as x := 0 × x, x := y − y, etc. Observe that these approaches are not directly applicable to database applications due to the presence of external database states along with the program's internal states. Intuitively, precise taint analysis of database applications requires handling both database and program states on a semantic foundation. Ignoring database states and treating database statements as a black box leaves a pathway open for database-specific security threats.

Table 1 reports a summary of the state-of-the-art tools and techniques in the line of static taint analysis, as compared with K-DBTaint. A recent case study on detection of vulnerabilities in web applications using taint analysis is reported in (Vijayalakshmi and Syed Mohamed, 2021).

Table 1: A Comparative Summary (✓ denotes partially successful at this stage).

Feature                          K-DBTaint               Pixy   WAP-TA      Su et al.    TAJ    Cao et al.
Semantics/Security Type System   ✓                       ✗      ✗           ✗            ✓      ✓
Explicit Flow                    ✓                       ✓      ✓           ✓            ✓      ✓
Implicit Flow                    ✓                       ✗      ✓           ✗            ✗      ✓
Constant Functions               ✓                       ✗      ✗           ✗            ✗      ✗
Flow-Sensitivity                 ✓                       ✓      ✓           ✗            ✓      ✓
Context-Sensitivity              ✓                       ✓      ✓           ✓            ✓      ✗
SQL Semantics                    ✓                       ✗      ✗           ✗            ✗      ✗
Language Supported               Database Applications   PHP    PHP + SQL   JavaScript   Java   PHP

3 PRELIMINARIES

In this section, we first recall the abstract syntax of the database language (Alam et al., 2021; Alam and Halder, 2021) under consideration and then describe the K framework (Roşu and Şerbănuţă, 2010; Asăvoae, 2014) in brief.

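Table 2: Abstract Syntax of the Database Language (Alam et al., 2021; Alam and Halder, 2021).

$$
\begin{array}{rcl}
E & ::= & n \mid id \mid E\ ap\ E \mid (E), \quad \text{where } ap \in \{+, -, \times, /\} \\
B & ::= & \ldots \quad \text{where } rel \in \{\geq, \leq, <, >, ==\} \\
\tau & ::= & int \mid float \mid char \mid bool \\
D & ::= & \tau\ id \\
A & ::= & id := E \mid id := read() \\
Q_{sel} & ::= & \text{SELECT}\ \vec{E}\ \text{INTO}\ \vec{rs}\ \text{FROM}\ id\ \text{WHERE}\ B; \\
Q_{ins} & ::= & \text{INSERT INTO}\ id\ (\vec{id})\ \text{VALUES}\ (\vec{E}); \\
Q_{upd} & ::= & \text{UPDATE}\ id\ \text{SET}\ \vec{id} = \vec{E}\ \text{WHERE}\ B; \\
Q_{del} & ::= & \text{DELETE FROM}\ id\ \text{WHERE}\ B; \\
Q & ::= & Q_{sel} \mid Q_{ins} \mid Q_{upd} \mid Q_{del} \\
C & ::= & skip; \mid Q \mid D; \mid A; \mid defun\ id(\vec{D})\{C\} \mid call\ id(\vec{E}); \mid return; \mid return\ E; \mid if\ B\ then\ \{C\} \\
  & \mid & if(B)\ then\ \{C_1\}\ else\ \{C_2\} \mid while(B)\ do\ \{C\} \\
P & ::= & C \mid C\ ;\ P
\end{array}
$$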

3.1 Abstract Syntax of Database Language

We consider a generic database language scenario where SQL statements are embedded in a high-level imperative language. Its abstract syntax is shown in Table 2. An identifier, denoted by id, represents either a program variable, a database attribute, or a database table name. For simplicity, we assume that all attribute names in the database schema are distinct. The arithmetic and boolean expressions are denoted by E and B, respectively. By convention, $\vec{id}$ stands for a sequence of identifiers $\langle id_1, id_2, \ldots, id_n \rangle$ and $\vec{E}$ stands for a sequence of arithmetic expressions $\langle E_1, E_2, \ldots, E_n \rangle$. The declaration and assignment statements of the imperative language are denoted by D and A. We consider a subset of database manipulation statements, namely SELECT ($Q_{sel}$), INSERT ($Q_{ins}$), UPDATE ($Q_{upd}$), and DELETE ($Q_{del}$).

The $Q_{sel}$ statement filters a set of tuples from the target table id based on the satisfiability of B and stores them in the resultset program variables $\vec{rs}$. Similarly, $Q_{upd}$ updates the attributes $\vec{id}$ of table id with the new values $\vec{E}$, indicated by $\vec{id} = \vec{E}$, when the corresponding rows satisfy B. More specifically, the term $\vec{id} = \vec{E}$ denotes the sequence of assignments $id_1 = E_1, \ldots, id_n = E_n$, where $id_i$ denotes the $i$-th attribute name and $E_i$ represents the $i$-th expression. To exemplify this, let us consider the statement "UPDATE Product SET Item$_1$ := ‘apple’, Item$_2$ := ‘orange’ WHERE Pid = uspid;". In its abstract syntax, $\vec{id} = \vec{E}$ is denoted by $\langle Item_1, Item_2 \rangle = \langle$‘apple’, ‘orange’$\rangle$ and B is denoted by "Pid = uspid". The $Q_{ins}$ statement appends a new data record (denoted by VALUES($\vec{E}$)) to a table id. The $Q_{del}$ statement deletes from table id the records that satisfy the condition B. The other imperative statements supported by the language are assignment, function, conditional, looping, and return. A database program P consists of a sequence of statements C.

3.2 The K Framework

In this section, we briefly describe the K framework (Roşu and Şerbănuţă, 2010). The K framework is a rewrite logic-based framework for defining programming language semantics suitable for formal reasoning about programs and programming languages.

Any formal semantics of a language first requires a formal syntax. In the K framework, language syntax is defined using a variant of the familiar BNF notation, with terminals enclosed in quotes and non-terminals starting with capital letters. For example, the syntax declaration:

syntax E ::= id
           | E "*" E [strict]

defines a syntactic category E, containing the program variables and a basic arithmetic operation on expressions of the database language. Each production can have a space-separated list of attributes, specified in square brackets at the end of the production.

Specifying language semantics in the K framework consists of three parts: providing evaluation strategies that conveniently (re)arrange computations (computations), giving the structure of the configuration that holds program states (configuration), and writing K rules to describe transitions between configurations (K rewrite rules).

Evaluation strategies serve as a link between syntax and semantics by specifying how the arguments of a language construct should be evaluated. For example, consider the following syntax for arithmetic expressions:

syntax E ::= E_1 "+" E_2 [strict]

The attribute strict allows $E_1$ and $E_2$ to be evaluated in any order, thus introducing non-determinism. The annotation above corresponds to the following four heating/cooling rules:

$$
\begin{array}{l}
\langle E_1 + E_2 \Rightarrow E_1 \curvearrowright \square + E_2\ \ldots \rangle_{k} \\
\langle E_1 + E_2 \Rightarrow E_2 \curvearrowright E_1 + \square\ \ldots \rangle_{k} \\
\langle V_1 \curvearrowright \square + E_2 \Rightarrow V_1 + E_2\ \ldots \rangle_{k} \\
\langle V_2 \curvearrowright E_1 + \square \Rightarrow E_1 + V_2\ \ldots \rangle_{k}
\end{array}
$$

Here, $V_1$ and $V_2$ are the evaluated results of the expressions $E_1$ and $E_2$, respectively. The construct $\square$ (HOLE) is a placeholder that will be replaced by the result of the evaluated term or sub-term.
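To make the heating/cooling mechanism concrete, here is a minimal Python sketch of a computation list for nested "+" expressions. It is an illustration under simplifying assumptions rather than the K tool's implementation: expressions are tuples ("+", left, right) over integers, the names HOLE, step, and evaluate are hypothetical, and the sketch always heats the left argument first, whereas strict permits either order.

```python
HOLE = "□"  # placeholder for the sub-term that is currently being evaluated


def step(k):
    """Apply one heating, cooling, or computation step to the task list k."""
    head, rest = k[0], k[1:]
    if isinstance(head, tuple):
        op, e1, e2 = head  # only "+" is modelled in this sketch
        if isinstance(e1, tuple):
            # Heating: pull the left argument out, leaving a HOLE behind.
            return [e1, (op, HOLE, e2)] + rest
        if isinstance(e2, tuple):
            # Heating: pull the right argument out.
            return [e2, (op, e1, HOLE)] + rest
        # Both arguments are values: perform the addition.
        return [e1 + e2] + rest
    if rest:
        # Cooling: plug the computed value back into the waiting HOLE.
        op, e1, e2 = rest[0]
        plugged = (op, head, e2) if e1 == HOLE else (op, e1, head)
        return [plugged] + rest[1:]
    return k


def evaluate(expr):
    """Rewrite until a single value remains; also return every intermediate list."""
    k = [expr]
    trace = [k]
    while not (len(k) == 1 and isinstance(k[0], int)):
        k = step(k)
        trace.append(k)
    return k[0], trace


if __name__ == "__main__":
    value, trace = evaluate(("+", ("+", 1, 2), ("+", 3, 4)))
    for state in trace:
        print(state)
    print(value)
```

Each list in the trace plays the role of the k cell: the head is the task being evaluated and the remaining items are the contexts waiting for its result.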

Configurations represent the state of a running program/system and are structured as nested, labeled cells containing various computation-based data structures. Within K, configuration cells are represented using an XML-like notation, with the label of the cell as the tag name and the contents (e.g., List, Map, Bag, Set) between the opening and closing tags. For example, consider the following configuration with three cells:

$$
configuration \equiv \langle \langle K \rangle_{k}\ \langle Map[id \mapsto L] \rangle_{env}\ \langle Map[L \mapsto n] \rangle_{store} \rangle_{T}
$$

The k cell holds a list of computational tasks, that is, $k : List\{K, \curvearrowright\}$, where K holds computational contents such as programs or fragments of programs and $\curvearrowright$ is the task sequentialization operator, which sequentializes program statements. The env cell maps variables to their locations (i.e., $env : id \mapsto L$) and the store cell maps locations to values (i.e., $store : L \mapsto n$). These cells are enclosed by the top cell denoted by T.
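The two-level mapping through env and store can be sketched with two Python dictionaries. This is only an illustration of the indirection described above; the function names declare, assign, and lookup are hypothetical and not part of K.

```python
def declare(env, store, name, value=0):
    """Allocate a fresh location for `name` and record its initial value."""
    location = len(store)   # the next unused location
    env[name] = location    # env : id -> L
    store[location] = value  # store : L -> n
    return location


def assign(env, store, name, value):
    """Update the value held at the location bound to `name`."""
    store[env[name]] = value


def lookup(env, store, name):
    """Follow id -> L -> n to read the current value of `name`."""
    return store[env[name]]


if __name__ == "__main__":
    env, store = {}, {}
    declare(env, store, "x", 1)
    declare(env, store, "y", 2)
    assign(env, store, "x", 10)
    print(env, store, lookup(env, store, "x"))
```

Keeping locations separate from values lets several names share one location, which is the usual motivation for splitting the state into these two cells.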

K rules describe how a running configuration evolves by advancing the computation and potentially altering the state/environment. K rewrite rules are classified into two types: computational rules, which may be interpreted as transitions in a program execution, and structural rules, which rearrange a term to enable the application of a computational rule. For better understanding, let us consider the following rule for a variable declaration, involving the two cells k and env:

$$
\langle \tau\ id \Rightarrow {.}\ \ldots \rangle_{k}\ \langle \rho \Rightarrow \rho[id \leftarrow T : Type] \rangle_{env}
$$

This specifies that the next task to evaluate is a variable declaration, which is replaced by an empty computation "." while a type T is assigned to the variable id in the environment cell env.

By keeping rules compact and less redundant, it is

less likely that a rule will need to be changed as the

conﬁguration is changed or new constructs are added

to the language.

4 FORMULATING SECURITY TYPE SYSTEM IN THE K FRAMEWORK

In this section, we first extend Hunt and Sands's security type system (Hunt and Sands, 2006) to the case of database applications, and then we formulate its typing judgements and rules in the K framework.

Extending Hunt and Sands's Security Type System: As Hunt and Sands's security type system is flow-sensitive, we extend it for the purposes of our taint analysis with context-sensitivity in the case of inter-procedural database code. This is depicted in Figure 2. We consider two security types, taint and untaint, with the join semi-lattice of the type domain $SD = \langle S, \sqsubseteq, \sqcup \rangle$, where $S = \{taint, untaint\}$ and the partial order relation is defined as $untaint \sqsubseteq taint$.
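The two-point security domain is small enough to write down directly. The following Python sketch encodes it with an IntEnum so that the integer order coincides with the partial order; the names Sec, leq, and join are illustrative choices and do not come from the paper.

```python
from enum import IntEnum


class Sec(IntEnum):
    """Security types, ordered so that UNTAINT is below TAINT."""
    UNTAINT = 0
    TAINT = 1


def leq(a, b):
    """Partial order of the lattice: untaint is below taint."""
    return a <= b


def join(*types):
    """Least upper bound; the join of no types is the bottom element."""
    return max(types, default=Sec.UNTAINT)


if __name__ == "__main__":
    print(join(Sec.UNTAINT, Sec.TAINT))  # Sec.TAINT
    print(leq(Sec.TAINT, Sec.UNTAINT))   # False
```

Because the domain has only two elements, the join behaves like a logical "or" on taintedness: a result is tainted as soon as any contributing type is tainted.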

Formulating the Type System: Let us now formulate the typing judgements and rules in the K framework. Let us first consider the typing judgement $pc \vdash \Gamma\{C\}\Gamma'$, which specifies that the security environment $\Gamma'$ is derived by executing the statement C on the security environment $\Gamma$ under the program's security context pc. To formulate this, we first define a configuration with three cells, namely the k cell, the env cell, and the context cell, as follows:

$$
\langle \langle K \rangle_{k}\ \langle Map \rangle_{env}\ \langle Map \rangle_{context} \rangle_{T}
$$

Then we define the following K rule to formulate the typing judgement $pc \vdash \Gamma\{C\}\Gamma'$:

$$
\langle C \Rightarrow {.}\ \ldots \rangle_{k}\ \langle \Gamma \Rightarrow \Gamma' \rangle_{env}\ \langle pc \mapsto \_ \rangle_{context}
$$

The symbol "..." appearing in the k cell represents the remaining computations. As a result of the execution of C, which is eventually consumed (denoted by "."), the previous environment $\Gamma$ in the env cell is updated to the modified environment $\Gamma'$, (implicitly) influenced by the current value (denoted by "_") of the security context pc in the context cell.

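Each security type rule is written as

$$
\frac{\Gamma_1 \vdash \zeta_1 \quad \ldots \quad \Gamma_n \vdash \zeta_n}{\Gamma \vdash \zeta}
$$

where the judgements $\Gamma_i \vdash \zeta_i$ denote the premises and the judgement $\Gamma \vdash \zeta$ denotes a single conclusion. For example, the type rule for the assignment statement

$$
\frac{\Gamma \vdash E : T}{pc \vdash \Gamma\,\{id := E\}\,\Gamma[\![id \mapsto pc \sqcup T]\!]}
$$

is defined by the corresponding K rule

$$
\langle id := T : Type \Rightarrow {.}\ \ldots \rangle_{k}\ \langle \ldots\ \rho[id \mapsto (\_ \Rightarrow \mu(pc) \sqcup T) : Type]\ \ldots \rangle_{env}\ \langle \mu \rangle_{context}
$$

Taking this as the basis, in the following section we discuss K rewrite rules for static taint analysis of the database language in the abstract security type domain S.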

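$$[\text{Expression}]\quad \Gamma \vdash E : \bigsqcup_{x \in FV(E)} \Gamma(x)$$

$$[\text{skip}]\quad pc \vdash \Gamma\,\{skip\}\,\Gamma$$

$$[\text{Declaration}]\quad pc \vdash \Gamma\,\{\tau\ id\}\,\Gamma[\![id \mapsto pc \sqcup untaint]\!]$$

$$[\text{Read}]\quad pc \vdash \Gamma\,\{id := read()\}\,\Gamma[\![id \mapsto pc \sqcup taint]\!]$$

$$[\text{Assignment}]\quad \frac{\Gamma \vdash E : T}{pc \vdash \Gamma\,\{id := E\}\,\Gamma[\![id \mapsto pc \sqcup T]\!]}$$

$$[\text{DELETE}]\quad \frac{\Gamma \vdash B : T}{pc \vdash \Gamma\,\{\text{DELETE FROM}\ id\ \text{WHERE}\ B\}\,\Gamma}$$

$$[\text{SELECT}]\quad \frac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\,\{\vec{rs} := \vec{E}\}\,\Gamma'}{pc \vdash \Gamma\,\{\text{SELECT}\ \vec{E}\ \text{INTO}\ \vec{rs}\ \text{FROM}\ id\ \text{WHERE}\ B\}\,\Gamma \sqcup \Gamma'}$$

$$[\text{INSERT}]\quad \frac{pc \vdash \Gamma\,\{\vec{id} := \vec{E}\}\,\Gamma'}{pc \vdash \Gamma\,\{\text{INSERT INTO}\ id(\vec{id})\ \text{VALUES}(\vec{E})\}\,\Gamma \sqcup \Gamma'}$$

$$[\text{UPDATE}]\quad \frac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\,\{\vec{id} := \vec{E}\}\,\Gamma'}{pc \vdash \Gamma\,\{\text{UPDATE}\ id\ \text{SET}\ \vec{id} := \vec{E}\ \text{WHERE}\ B\}\,\Gamma \sqcup \Gamma'}$$

$$[\text{Function Call}]\quad \frac{\Gamma \vdash \vec{E} : \vec{T} \qquad defun\ id(\vec{D})\{C\} \qquad \vec{X} = getParam(\vec{D}) \qquad \Gamma[\![\vec{X} \mapsto \vec{T}]\!] \equiv \Gamma' \qquad \dfrac{pc \vdash \Gamma'\,\{C\}\,\Gamma''}{pc \vdash \Gamma'\,\{defun\ id(\vec{D})\{C\}\}\,\Gamma''}}{pc \vdash \Gamma\,\{call\ id(\vec{E})\}\,\Gamma''}$$

$$[\text{if}]\quad \frac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\,\{C\}\,\Gamma'}{pc \vdash \Gamma\,\{if\ B\ then\ C\}\,\Gamma \sqcup \Gamma'}$$

$$[\text{if-else}]\quad \frac{\Gamma \vdash B : T \qquad pc \sqcup T \vdash \Gamma\,\{C_1\}\,\Gamma_1 \qquad pc \sqcup T \vdash \Gamma\,\{C_2\}\,\Gamma_2}{pc \vdash \Gamma\,\{if\ B\ then\ \{C_1\}\ else\ \{C_2\}\}\,\Gamma_1 \sqcup \Gamma_2}$$

$$[\text{while}]\quad \frac{\Gamma'_i \vdash B : T_i \qquad pc \sqcup T_i \vdash \Gamma'_i\,\{C\}\,\Gamma''_i \qquad 0 \leq i \leq k \qquad \Gamma'_0 = \Gamma \qquad \Gamma'_{i+1} = \Gamma''_i \sqcup \Gamma \qquad \Gamma'_{k+1} = \Gamma'_k}{pc \vdash \Gamma\,\{while\ B\ do\ \{C\}\}\,\Gamma'_k}$$

Figure 2: Flow- and Context-sensitive Security Type Rules for Taint Analysis.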
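The loop-free rules of Figure 2 can be read as a small recursive checker. The following Python sketch covers Expression, skip, Declaration, Read, Assignment, if, and if-else; it is an illustration under assumed encodings, not the K-DBTaint implementation. Statements and expressions are represented as tuples, and all names (check, run_block, type_of, join_env) are hypothetical.

```python
UNTAINT, TAINT = 0, 1  # integer order mirrors untaint below taint


def join(a, b):
    return max(a, b)


def free_vars(expr):
    """Expressions are ints, variable names (str), or (op, left, right) tuples."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return free_vars(expr[1]) | free_vars(expr[2])
    return set()


def type_of(env, expr):
    """[Expression]: join of the types of all free variables."""
    result = UNTAINT
    for name in free_vars(expr):
        result = join(result, env[name])
    return result


def join_env(env1, env2):
    """Pointwise join of two security environments."""
    names = env1.keys() | env2.keys()
    return {n: join(env1.get(n, UNTAINT), env2.get(n, UNTAINT)) for n in names}


def run_block(pc, env, body):
    for stmt in body:
        env = check(pc, env, stmt)
    return env


def check(pc, env, stmt):
    """Return the environment obtained by analysing stmt under context pc."""
    kind = stmt[0]
    if kind == "skip":
        return dict(env)
    if kind == "decl":        # ("decl", x)
        return {**env, stmt[1]: join(pc, UNTAINT)}
    if kind == "read":        # ("read", x) stands for x := read()
        return {**env, stmt[1]: join(pc, TAINT)}
    if kind == "assign":      # ("assign", x, expr)
        return {**env, stmt[1]: join(pc, type_of(env, stmt[2]))}
    if kind == "if":          # ("if", cond, body)
        inner_pc = join(pc, type_of(env, stmt[1]))
        return join_env(env, run_block(inner_pc, env, stmt[2]))
    if kind == "if-else":     # ("if-else", cond, then_body, else_body)
        inner_pc = join(pc, type_of(env, stmt[1]))
        return join_env(run_block(inner_pc, env, stmt[2]),
                        run_block(inner_pc, env, stmt[3]))
    raise ValueError(f"unsupported statement: {kind}")


if __name__ == "__main__":
    program = [
        ("decl", "x"), ("decl", "y"), ("decl", "z"),
        ("read", "x"),
        ("assign", "z", 5),
        ("if", (">", "x", 0), [("assign", "y", 1)]),  # implicit flow from x to y
    ]
    print(run_block(UNTAINT, {}, program))
```

In the usage example, y is assigned a constant but ends up tainted because the assignment is guarded by a condition on the tainted variable x; this is the implicit flow that the raised context pc is meant to capture.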

5 K SEMANTICS FOR K-DBTaint

In this section, we present K rewrite rules for taint analysis of our database language. The proposed analysis addresses the semantics of both the database statements and the imperative program statements together. To this aim, we consider the following K configuration on which the semantics is defined:

$$
configuration \equiv \langle \langle K \rangle_{k}\ \langle Map \rangle_{env}\ \langle Map \rangle_{context}\ \langle \langle Map \rangle_{\lambda\text{-}Def}\ \langle List \rangle_{fstack} \rangle_{control}\ \langle List \rangle_{out} \rangle_{T}
$$

Here, the special cell $\langle\ \rangle_{k}$ contains the list of computation tasks, the environment cell env maps variables or database attributes to their security types, the context cell denotes the current program context, the λ-Def cell supports the inter-procedural feature, the in and out cells denote input-output operations, and the control cell manages function calls using the fstack cell. We now describe the K rules for imperative program statements and database statements, as depicted in Figure 3. We label each rule as $R$ with a subscript (e.g., $R_{decl}$) for future reference. Note that these rules capture explicit and implicit flow, flow- and context-sensitivity, the semantics of constant functions, and the semantics of SQL statements.

Imperative Program Statements: Due to space constraints, we discuss only a few rules for imperative program statements here; the reader may refer to (Alam et al., 2018) for the complete set of rules for imperative programs. The first rule $R_{decl}$ deals with variable declarations and initializes variables with their initial security types (untaint in our case) in the environment cell env. Any unsanitized input gets the type taint in the rule $R_{read}$. Rule $R_{asg}$, which handles assignment computations, updates the security type of id in the env cell to the least upper bound of the security type of the right-hand side expression (i.e., T) and the program's current security context pc in the context cell. The assignment is then replaced by an empty computation. In order to capture implicit flow of taint information in the presence of conditional and loop statements, we define rules $R_{if}$ and $R_{while}$, which update the security context µ in the context cell based on the security type of B. The term restore(µ) restores the previous context on exiting a block guarded by B, and approx(ρ) provides a sound approximation of the semantics as the least upper bound of the environments obtained over all possible execution paths due to the presence of B. Note that the least fixed point solution in the case of "while" is achieved by defining the following two cases of the auxiliary function fixpoint():

$$
(1)\quad \langle fixpoint(B, C, \rho') \Rightarrow {.}\ \ldots \rangle_{k}\ \langle \rho'' \rangle_{env} \quad \text{when } \rho' = \rho''
$$


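$$R_{decl}:\ \langle \tau\ id \Rightarrow {.}\ \ldots \rangle_{k}\ \langle \rho \Rightarrow \rho[id \leftarrow T : Type] \rangle_{env}$$

$$R_{read}:\ \langle read(\,) \Rightarrow taint\ \ldots \rangle_{k}$$

$$R_{con\text{-}func}:\ \langle id_1 * id_2\ \ldots \rangle_{k} \rightarrow \begin{cases} \langle id_1 * id_2 \Rightarrow untaint\ \ldots \rangle_{k} & \text{when } id_1 = zero \text{ or } id_2 = zero \\ \langle id_1 * id_2 \Rightarrow T_1 * T_2 : Type\ \ldots \rangle_{k} & \text{otherwise} \end{cases}$$

$$R_{asg}:\ \langle id := T : Type \Rightarrow {.}\ \ldots \rangle_{k}\ \langle \ldots\ \rho[id \mapsto (\_ \Rightarrow \mu(pc) \sqcup T) : Type]\ \ldots \rangle_{env}\ \langle \mu \rangle_{context}$$

$$R_{if}:\ \langle if\,(B : T)\ then\ \{C\} \Rightarrow C \curvearrowright restore(\mu) \curvearrowright approx(\rho)\ \ldots \rangle_{k}\ \langle \mu \Rightarrow \mu[pc \leftarrow \mu(pc) \sqcup T] \rangle_{context}\ \langle \rho \rangle_{env}$$

$$R_{while}:\ \langle while\,(B : T)\ do\ \{C\} \Rightarrow C \curvearrowright restore(\mu) \curvearrowright approx(\rho) \curvearrowright fixpoint(B, C, \rho)\ \ldots \rangle_{k}\ \langle \rho \rangle_{env}\ \langle \mu \Rightarrow \mu[pc \leftarrow \mu(pc) \sqcup T] \rangle_{context}$$

$$R_{fun\text{-}decl}:\ \langle defun\ Func\_name(Params)\{C\} \Rightarrow {.}\ \ldots \rangle_{k}\ \langle \langle \psi \Rightarrow \psi[Func\_name \leftarrow lambda(Params, C)] \rangle_{\lambda\text{-}Def} \rangle_{control}$$

$$R_{fun\text{-}call}:\ \langle lambda(Params, C)(Es : Ts) \curvearrowright K \Rightarrow McDecls(Params, Ts) \curvearrowright C \curvearrowright return;\ \ldots \rangle_{k}\ \langle \langle .List \Rightarrow [ListItem(\rho, K, Ctr)]\ \ldots \rangle_{fstack}\ Ctr \rangle_{control}\ \langle \rho \rangle_{env}$$

$$R_{aux\text{-}fun(a)}:\ \langle makeAssign(id, E) \Rightarrow id := E;\ \ldots \rangle_{k}$$

$$R_{aux\text{-}fun(b)}:\ \langle makeAssign((id, \vec{id}), (E, \vec{E})) \Rightarrow id := E; \curvearrowright makeAssign(\vec{id}, \vec{E})\ \ldots \rangle_{k}$$

$$R_{aux\text{-}fun(c)}:\ \langle makeupdAssign(id_1 = E_1, \ldots, id_n = E_n) \Rightarrow id_1 = E_1; \curvearrowright \ldots \curvearrowright id_n = E_n;\ \ldots \rangle_{k}$$

$$R_{sel}:\ \langle \text{SELECT}\ \vec{E}\ \text{INTO}\ \vec{rs}\ \text{FROM}\ id\ \text{WHERE}\ B; \Rightarrow if(B)\{makeAssign(\vec{rs}, \vec{E})\}\ \ldots \rangle_{k}$$

$$R_{ins}:\ \langle \text{INSERT INTO}\ id(\vec{id})\ \text{VALUES}\ (\vec{E}); \Rightarrow makeAssign(\vec{id}, \vec{E})\ \ldots \rangle_{k}$$

$$R_{upd}:\ \langle \text{UPDATE}\ id\ \text{SET}\ \vec{id} = \vec{E}\ \text{WHERE}\ B; \Rightarrow if(B)\{makeupdAssign(id_1 = E_1, \ldots, id_n = E_n)\}\ \ldots \rangle_{k}$$

$$R_{del}:\ \langle \text{DELETE FROM}\ id\ \text{WHERE}\ B; \Rightarrow {.}\ \ldots \rangle_{k}$$

Figure 3: K rewrite rules for Database Applications.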

$$
(2)\quad \langle fixpoint(B, C, \rho') \Rightarrow while(B)\ do\ \{C\}\ \ldots \rangle_{k}\ \langle \rho'' \rangle_{env} \quad \text{when } \rho' \neq \rho''
$$

Here, case (1) represents the situation in which the computation reaches the fixed point and is therefore consumed. Otherwise, the iteration continues as shown in case (2). The context-sensitivity in the presence of function calls is captured by the rule $R_{fun\text{-}call}$.
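The fixed-point iteration for loops can be sketched in Python as follows. The sketch follows the [while] rule of Figure 2: the loop body is re-analysed until the environment stops changing. Representing the body as a Python function from (pc, env) to a new environment, and the names analyse_while and join_env, are assumptions made for illustration only.

```python
UNTAINT, TAINT = 0, 1  # integer order mirrors untaint below taint


def join_env(env1, env2):
    """Pointwise least upper bound of two security environments."""
    names = env1.keys() | env2.keys()
    return {n: max(env1.get(n, UNTAINT), env2.get(n, UNTAINT)) for n in names}


def analyse_while(pc, env, cond_vars, body):
    """Iterate the loop body until the environment is stable.

    cond_vars lists the variables occurring in the loop condition B;
    body(pc, env) returns the environment after one pass over C.
    """
    current = dict(env)
    iterations = 0
    while True:
        cond_type = max((current[v] for v in cond_vars), default=UNTAINT)
        after_body = body(max(pc, cond_type), current)
        candidate = join_env(after_body, env)
        iterations += 1
        if candidate == current:   # case (1): fixed point reached
            return current, iterations
        current = candidate        # case (2): analyse the loop again


def loop_body(pc, env):
    """Hypothetical body: y := x; x := z."""
    env = dict(env)
    env["y"] = max(pc, env["x"])
    env["x"] = max(pc, env["z"])
    return env


if __name__ == "__main__":
    start = {"i": UNTAINT, "x": UNTAINT, "y": UNTAINT, "z": TAINT}
    print(analyse_while(UNTAINT, start, ["i"], loop_body))
```

In this example the taint of z reaches x in the first pass and y only in the second, so a single pass over the body would miss the flow into y. The finite height of the lattice guarantees that the iteration stops.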

SQL Statements: Before we define the semantics of SQL statements, let us first define the semantics of the auxiliary function makeAssign(), which takes two parameters and generates an assignment statement as follows:

$$makeAssign(id, E) \Rightarrow id := E;$$

where id represents an identifier and E denotes an expression. When lists are passed as parameters, makeAssign() generates a sequence of assignment statements as follows:

$$makeAssign((id, \vec{id}), (E, \vec{E})) \Rightarrow id := E; \curvearrowright makeAssign(\vec{id}, \vec{E})$$

where $\vec{id}$ denotes a list of identifiers and $\vec{E}$ denotes a list of expressions. Observe that, when empty parameters are passed to makeAssign(), it generates an empty computation "." (i.e., makeAssign(.id, .E) => .). Similarly, the function makeupdAssign() accepts a term of the form $\langle id_1 = E_1, \ldots, id_n = E_n \rangle$ and returns a sequence of assignment statements as follows:

$$makeupdAssign(id_1 = E_1, \ldots, id_n = E_n) \Rightarrow id_1 = E_1; \curvearrowright \ldots \curvearrowright id_n = E_n;$$
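The recursive unfolding of makeAssign() can be mirrored in a few lines of Python. In this sketch an assignment is represented by a hypothetical tuple ("assign", id, E), the empty computation by an empty list, and the function names make_assign and make_upd_assign are illustrative.

```python
def make_assign(ids, exprs):
    """Unfold paired lists into a sequence of assignment statements."""
    if not ids and not exprs:
        return []  # empty parameters yield the empty computation
    if not ids or not exprs:
        raise ValueError("identifier and expression lists differ in length")
    # Emit the head assignment, then recurse on the remaining lists.
    return [("assign", ids[0], exprs[0])] + make_assign(ids[1:], exprs[1:])


def make_upd_assign(pairs):
    """Turn a list of (id, E) pairs from a SET clause into assignments."""
    return [("assign", name, expr) for name, expr in pairs]


if __name__ == "__main__":
    print(make_assign(["a", "b"], ["x", ("+", "y", 1)]))
    print(make_upd_assign([("Item1", "'apple'"), ("Item2", "'orange'")]))
```

The length check is an addition of this sketch; the rules above only describe the case where both lists are consumed together.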

Let us now define the semantics of the subset of SQL statements supported by the language under consideration.

The SELECT statement $Q_{sel}$ retrieves the values of those attributes that are present in the expressions $\vec{E}$. It includes a WHERE clause to select the rows to be retrieved. The presence of condition B in the WHERE clause acts as an implicit flow of taint information. In order to capture such implicit flow, we translate $Q_{sel}$ to an equivalent if statement, as depicted in rule $R_{sel}$. Note that the auxiliary function makeAssign($\vec{rs}$, $\vec{E}$) in $R_{sel}$ captures the flow of tainted data retrieved from the database table using $Q_{sel}$.

The UPDATE statement $Q_{upd}$ updates the values of the attributes according to the condition in the WHERE clause. As with the SELECT statement, the condition in the WHERE clause of an UPDATE statement also acts as an implicit flow of taint information. Therefore, rule $R_{upd}$ translates $Q_{upd}$ into an equivalent if statement to capture implicit taint flow. The translation of $\vec{id} = \vec{E}$ into the auxiliary function makeupdAssign($id_1 = E_1, \ldots, id_n = E_n$) captures direct taint flow due to the assignment of tainted data through the expressions $\vec{E}$.


Table 3: Taint Analysis on Benchmark Programs Set (PL/SQL, 2021) (✓: Passed, ✗: False Positives, ✗⁻: False negatives).

Progs.   Descriptions   K-DBTaint   WAP-TA   Pixy   Su et al.   Cao et al.
Prog1    Balance_Transfer.sql (explicit-flow)   ✓ ✗⁻
Prog2    Update_Inventory.sql (explicit-flow)   ✓ ✗⁻
Prog3    Budget.sql (SQL injection)   ✓ ✗⁻
Prog4    Proc_Inventory.sql (malware attack)   ✓ ✗⁻
Prog5    Update_Quantity.sql (SQL injection)   ✓ ✗⁻
Prog6    Populate_Products.sql (SQL injection, constant function)   ✓ ✗⁻
Prog7    delete_Client.sql (SQL injection)   ✓ ✗
Prog8    Update_Account.sql (SQL injection, implicit flow)   ✓ ✗⁻
Prog9    Record_NewSale.sql (explicit-flow)   ✓ ✗⁻
Prog10   Get_Country_Id.sql (explicit-flow, constant function)   ✓ ✗
Prog11   Add_Car.sql (SQL injection, explicit-flow, constant function)   ✓ ✗
Prog12   Award_Bonus.sql (SQL injection, malware attack)   ✓ ✗⁻
Prog13   Resrvation_Proc.sql (SQL injection, XSS attacks)   ✓ ✗⁻
Prog14   Credit_Account.sql (SQL injection, implicit flow)   ✓ ✗⁻
Prog15   Debit_Account.sql (SQL injection, implicit flow)   ✓ ✗⁻

The INSERT statement appends a tuple to the database table. We specify the attribute names $\vec{id}$ and the list of values $\vec{E}$ in the INSERT statement in the same order as specified in the table definition. Therefore, any tainted expression $E \in \vec{E}$ directly stores tainted data in the database table that may be accessed and used in future computations. Hence, the rule $R_{ins}$ translates $Q_{ins}$ into a sequence of assignment statements to capture such direct flow of tainted data due to the assignment of E to id.

The DELETE statement deletes tuples from a table. We translate it into an empty computation, leaving the current execution environment unchanged. This is depicted in rule $R_{del}$.
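The four translations described by $R_{sel}$, $R_{ins}$, $R_{upd}$, and $R_{del}$ can be summarized as one desugaring step. The Python sketch below uses a hypothetical tuple encoding of statements and a hypothetical function name desugar; it only shows the shape of the translation and does not perform the type analysis itself.

```python
def make_assign(ids, exprs):
    """Pair identifiers with expressions (extra items on either side are ignored)."""
    return [("assign", name, expr) for name, expr in zip(ids, exprs)]


def desugar(stmt):
    """Translate one SQL statement into imperative core statements."""
    kind = stmt[0]
    if kind == "select":   # ("select", exprs, targets, table, cond)
        _, exprs, targets, _table, cond = stmt
        # The WHERE condition guards the assignments, exposing the implicit flow.
        return [("if", cond, make_assign(targets, exprs))]
    if kind == "insert":   # ("insert", table, attrs, exprs)
        _, _table, attrs, exprs = stmt
        return make_assign(attrs, exprs)
    if kind == "update":   # ("update", table, attrs, exprs, cond)
        _, _table, attrs, exprs, cond = stmt
        return [("if", cond, make_assign(attrs, exprs))]
    if kind == "delete":   # ("delete", table, cond)
        return []          # empty computation: the environment is unchanged
    return [stmt]          # non-SQL statements pass through untouched


if __name__ == "__main__":
    query = ("select", ["PName"], ["z"], "Products", ("==", "Descpr", "prod"))
    print(desugar(query))
```

Applied to the query of Figure 1, the result is an if statement whose condition mentions prod and whose body assigns PName to z, so a checker for the imperative core sees both the implicit flow through the WHERE clause and the direct flow into z.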

Constant Functions: Apart from the semantics of SQL statements, defining the semantics of constant functions greatly improves the precision of taint analysis. For example, consider the statement y := z × 0 + 4, where z is a tainted variable. Although syntax-based taint flow makes the variable y tainted, the semantics of the constant function "z × 0 + 4", which always evaluates to 4 irrespective of the value of z, makes y untainted. The semantic approximation in the security domain, due to the abstraction, makes dealing with constant functions challenging. As a partial solution, we specify rules for some simple cases of constant functions such as x - x, x xor x, x × 0, etc. We show one such rule in $R_{con\text{-}func}$. In this context, as a notable observation, consider the following scenario: given the code fragment y := read(); x := y; v := x xor y, the analysis successfully marks the variable v as tainted. Indeed, attackers may inject malicious input containing a vulnerable control part whose effect the xor operation fails to nullify, affecting subsequent critical computations involving v.
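A minimal Python sketch of how such constant-function cases can refine expression typing is given below. It is an illustration only: expressions use a hypothetical tuple encoding (op, left, right), the operator names are assumptions, and only the simple syntactic cases mentioned above (x - x, x xor x, and multiplication by a literal zero) are recognized.

```python
UNTAINT, TAINT = 0, 1  # integer order mirrors untaint below taint


def is_zero(expr):
    """True only for the literal constant 0."""
    return isinstance(expr, int) and not isinstance(expr, bool) and expr == 0


def type_of(env, expr):
    """Security type of an expression, with constant-function special cases."""
    if isinstance(expr, str):        # a variable: look up its type
        return env[expr]
    if isinstance(expr, tuple):
        op, left, right = expr
        if op == "*" and (is_zero(left) or is_zero(right)):
            return UNTAINT           # x * 0 is always 0
        if op in ("-", "xor") and isinstance(left, str) and left == right:
            return UNTAINT           # x - x and x xor x are always 0
        return max(type_of(env, left), type_of(env, right))
    return UNTAINT                   # literals carry no taint


if __name__ == "__main__":
    env = {"x": TAINT, "y": TAINT, "z": TAINT}
    print(type_of(env, ("+", ("*", "z", 0), 4)))  # untainted
    print(type_of(env, ("xor", "x", "y")))        # still tainted
```

The second call reflects the observation above: x and y are different identifiers, so x xor y is not recognized as a constant function and the result remains tainted even if both variables happen to hold the same value.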

6 EXPERIMENTAL RESULTS

This section presents experimental results on a set of benchmark codes collected from (PL/SQL, 2021). These codes represent a wide range of PL/SQL codes, including explicit flow (Prog1, Prog2, Prog9, Prog10, Prog11), implicit flow (Prog8, Prog14, Prog15), XSS attacks (Prog13), malware attacks (Prog4, Prog12), SQL injection attacks (Prog3, Prog5, Prog6, Prog7, Prog8, Prog11, Prog12, Prog13, Prog14, Prog15), constant functions (Prog6, Prog10, Prog11), etc. All experiments were conducted on a computer system equipped with a Core i3 2.60 GHz CPU, 3 GB of memory, and the Ubuntu 20.04 operating system.

Table 4: Detailed Execution Steps in K-DBTaint of the Code Snippet Depicted in Figure 1 (U: untaint, T: taint).

Prog. Points   Security Types                   Rules
1              prod|->U                         R_decl
2              prod|->U, z|->U                  R_decl
3              prod|->T, z|->U                  R_read
4              prod|->T, z|->U                  R_if
5              prod|->T, z|->T, PName|->T       R_sel

We have implemented a prototype of K-DBTaint in the K tool (version 5.0, https://github.com/kframework/k/releases). We have defined a full set of semantic rules (more than 430 rules) for the database language under consideration. The tool K-DBTaint accepts PL/SQL code as input from the console using K Framework-specific commands. Table 3 depicts the evaluation results on the benchmark codes. The results of K-DBTaint, together with those obtained from some of the available static taint analysis tools, namely WAP-TA (Evans and Larochelle, 2002), Pixy (Jovanovic et al., 2006), (Wassermann and Su, 2008), and (Cao et al., 2017), are reported in columns 3-7. The notations ‘✗’ and ‘✗⁻’ indicate failures due to false positives and false negatives, respectively, whereas ‘✓’ indicates a successful detection of taint vulnerabilities. Observe that, owing to its flow-sensitivity, context-sensitivity, handling of constant functions, and treatment of the semantics of SQL statements, K-DBTaint significantly reduces the occurrence of false alarms. In Table 4, we show how K-DBTaint successfully captures the taint flows of the motivating example in Figure 1 by listing its corresponding execution steps.

7 CONCLUSIONS

In this paper, we proposed an executable rewriting logic semantics for static taint analysis of a database language in the K framework. The proposed analysis addresses the semantics of both the database statements and the imperative program statements together. Based on this theoretical foundation, we developed a prototype, K-DBTaint, in K, which allows the user to analyse PL/SQL code for integrity issues. Compared to existing works, the proposed approach has improved precision, as shown by our experimental evaluation on a set of benchmark programs. In the future, we aim to add more semantic rules to cover more language features, such as aggregate functions, nested queries, and set operations, as an extension to the current database language, and to address more semantics-based non-dependencies.

REFERENCES

Alam, M., Halder, R., Goswami, H., Pinto, J. S., et al.

(2018). K-taint: an executable rewriting logic seman-

tics for taint analysis in the k framework. In Proc of

the 13th Int. Conf. on ENASE, pages 359–366.

Alam, M. I. and Halder, R. (2021). Formal veriﬁcation of

database applications using predicate abstraction. SN

Computer Science, 2(3):1–24.

Alam, M. I., Halder, R., and Pinto, J. S. (2021). A de-

ductive reasoning approach for database applications

using veriﬁcation conditions. Journal of Systems and

Software, 175:110903.

Asăvoae, I. M. (2014). Abstract semantics for alias analysis in k. Electronic Notes in Theoretical Computer Science, 304:97–110.

Cao, K., He, J., Fan, W., Huang, W., Chen, L., and Pan,

Y. (2017). Php vulnerability detection based on taint

analysis. In 2017 6th ICRITO, pages 436–439. IEEE.

Clavel, M. et al. (2007). All about maude - a high-performance logical framework: how to specify, program and verify systems in rewriting logic, volume 4350. Springer-Verlag.

Evans, D. and Larochelle, D. (2002). Improving security

using extensible lightweight static analysis. IEEE soft-

ware, 19(1):42–51.

Halim, V. H. and Asnar, Y. D. W. (2019). Static code ana-

lyzer for detecting web application vulnerability using

control ﬂow graphs. In 2019 International Conference

on Data and Software Engineering (ICoDSE), pages

1–6. IEEE.

Huang, W., Dong, Y., and Milanova, A. (2014). Type-based taint analysis for java web applications. In Proc. of Int. Conf. on Fundamental Approaches to Software Engineering, pages 140–154. Springer.

Hunt, S. and Sands, D. (2006). On ﬂow-sensitive security

types. In Conf. Record of the 33rd ACM SIGPLAN-

SIGACT Sym. on POPL, pages 79–90, S. California.

ACM.

Jana, A., Alam, M. I., and Halder, R. (2018). A symbolic

model checker for database programs. In ICSOFT,

pages 381–388.

Jovanovic, N., Kruegel, C., and Kirda, E. (2006). Pixy: A static analysis tool for detecting web application vulnerabilities. In IEEE, S&P’06, pages 258–263.

Maskur, A. F. and Asnar, Y. D. W. (2019). Static code anal-

ysis tools with the taint analysis method for detect-

ing web application vulnerability. In 2019 Interna-

tional Conference on Data and Software Engineering

(ICoDSE), pages 1–6. IEEE.

Medeiros, I., Neves, N., and Correia, M. (2015). Detect-

ing and removing web application vulnerabilities with

static analysis and data mining. IEEE Transactions on

Reliability, 65(1):54–69.

Meseguer, J. and Roșu, G. (2007). The rewriting logic semantics project. Theoretical Computer Science, 373(3):213–237.

PL/SQL (2021). Github pl/sql project. https://github.com/topics/plsql. [Online; accessed March 2021].

Roșu, G. and Șerbănuță, T. F. (2010). An overview of the k semantic framework. The Journal of Logic and Algebraic Programming, 79(6):397–434.

Su, G., Wang, F., and Li, Q. (2018). Research on sql injec-

tion vulnerability attack model. In 2018 5th IEEE Int.

Conf. on CCIS, pages 217–221. IEEE.

Tripp, O., Pistoia, M., Fink, S. J., Sridharan, M., and Weis-

man, O. (2009). Taj: effective taint analysis of web ap-

plications. In ACM Sigplan Notices, volume 44, pages

87–97. ACM.

Vijayalakshmi, K. and Syed Mohamed, E. (2021). Case

study: Extenuation of xss attacks through various de-

tecting and defending techniques. Journal of Applied

Security Research, 16(1):91–126.

Wassermann, G. and Su, Z. (2008). Static detec-

tion of cross-site scripting vulnerabilities. In 2008

ACM/IEEE 30th International Conference on Soft-

ware Engineering, pages 171–180. IEEE.
