FlowSlicer

A Hybrid Approach to Detect and Avoid Sensitive Information Leaks in Android

Applications using Program Slicing and Instrumentation

Luis Menezes and Roland Wismüller

Institute of Computer Science, Universität Siegen, Siegen, Germany

Keywords:

Static Analysis, Dynamic Analysis, Hybrid Analysis, Slicing, Android, Privacy, Information-ﬂow.

Abstract:

With the increasingly amount of private information stored in mobile devices, the need for more secure ways to

detect, control and avoid malicious behaviors has become higher. The too coarse-grained permission system

implemented in the Android platform does not cover problems regarding the ﬂow of the data acquired by

the apps. In order to enhance detection, awareness and avoidance of such unwanted information ﬂows, we

propose a hybrid information ﬂow analysis that mixes the beneﬁts of static and dynamic analysis, using slicing

and instrumentation. Our results indicate a precise detection and only a small overhead while running the

application. The validation of our method has been done by creating a tool called FLOWSLICER and using the

category AndroidSpeciﬁc from the DROIDBENCH repository of applications with known information leaks.

1 INTRODUCTION

Mobile devices, such as smartphones and tablets are

important devices we all use in our daily life. The ca-

pability of providing tools and storing a big amount of

data that we daily need made them become a truly in-

dispensable device. Much of this success is only pos-

sible because of the ability to conveniently download

applications directly from some repositories. The An-

droid platform has several hundred thousands of ap-

plications divided in different markets (Arp et al.,

2014), including an ofﬁcial one, the Google Play, and

is worldwide known as one of the most popular plat-

forms used in mobile devices. However, along with

all beneﬁts, there is still margin for malicious devel-

opers who try to use the coverage, power of storing

and processing of such platforms to pose a threat to

the user’s privacy, security and also safety.

Modern operating systems, such as Android, use

a permission system in which each application has a

different set of permissions based on its requirements.

It uses as a premise the fact that every application

will only ask for the set of permissions that they need

(Felt et al., 2011b). However, one of the problems of

this approach is the lack of awareness from the user

side (Felt et al., 2011a). The permissions are nor-

mally too coarse-grained, which means that a permis-

sion such as android.permission.INTERNET may

be used either for retrieving server content or send-

ing private data to a server (Jeon et al., 2012). An-

other problem in this approach is the fact that it only

takes care of the access control of the requested data,

i.e., granting or denying the access to a speciﬁc re-

source. Once the access is given, no further control

is used to make sure the information is used properly

(Hedin and Sabelfeld, 2011). However, in order to ac-

knowledge what happens to the information one just

gave access to, an information-ﬂow control approach

needs to be used. Information-ﬂow control tracks the

propagation of the information through the program

and aims to make sure that the program handles the

information securely (Hedin and Sabelfeld, 2011).

To address this issue, most of the existing so-

lutions are using either static or dynamic analysis.

Static analysis works by scanning the application’s

code to examine all possible execution paths and vari-

able values without executing the application, making

the process of analysis fast, repeatable and not de-

pendent on the application execution (Kulenovic and

Donko, 2014). In other hand, it counts with the down-

side of reporting false positives and also false nega-

tives (Walden and Doyle, 2012). False positives occur

because static analysis can only determine that an ex-

pression, e.g., a branch condition, depends on another

one, but usually not how exactly it depends on the

other. Thus, static analysis may consider execution

paths as feasible which actually can never occur in re-

ality. On the other hand, false negatives may happen

450

Menezes, L. and Wismüller, R.

FlowSlicer - A Hybrid Approach to Detect and Avoid Sensitive Information Leaks in Android Applications using Program Slicing and Instrumentation.

DOI: 10.5220/0006428604500455

In Proceedings of the 14th International Joint Conference on e-Business and Telecommunications (ICETE 2017) - Volume 4: SECRYPT, pages 450-455

ISBN: 978-989-758-259-2

because of inaccuracies in the analysis’ abstraction.

Dynamic analysis is performed while the application

is actually executed, thus it is not susceptible to false

positives. However, dynamic analysis only checks a

concrete program execution, thus, it can never provide

a proof that an application does not leak sensitive in-

formation. In addition, this approach may have a large

negative impact on the application’s performance.

In order to keep user’s data private, we propose

FLOWSLICER, a hybrid approach to detect sensitive

information leaks in Android applications by mixing

the beneﬁts of static and dynamic analysis.

In this paper, we examine the effects of our ap-

proach by testing well-known malicious applications

present in the ANDROIDSPECIFIC category from the

DROIDBENCH repository in order to analyze both the

precision of FLOWSLICER and its runtime overhead.

The contributions of this work are: a hybrid method

to detect information leaks on Android devices.

The rest of this paper is organized as follows: The

state-of-art and some related work that inspired our

method are presented in Section 2. FLOWSLICER is

presented in Section 3. The evaluation of the method

is presented in Section 4. Section 5 brings the conclu-

sion of the work.

2 RELATED WORK

2.1 Static Analysis Approaches

DROIDSAFE (Gordon et al., 2015) presents a system

for analyzing explicit ﬂows of sensitive information in

Android applications by performing an information-

ﬂow analysis based on a model obtained from the An-

droid Open Source Project (AOSP) implementation.

The information-ﬂow analysis is performed by using

an object sensitive points-to-analysis, to cover code

that heavily uses object-oriented language features

(such as inheritance and polymorphic code reuse),

and ﬂow insensitivity, considering all possible run-

time event orderings that asynchronous callbacks can

trigger. FLOWDROID (Arzt et al., 2014) was for a

long time the state-of-the-art in the ﬁeld of static anal-

ysis for Android applications. It implements a pre-

cise model of Android’s lifecycle which allows the

analysis to properly handle callbacks invoked by the

Android framework, while context-, ﬂow-, ﬁeld- and

object-sensitivity reduces the number of false alarms.

For evaluation, FLOWDROID uses SecuriBench Mi-

cro (Livshits, 2005), which is a benchmark set de-

signed for web applications, DROIDBENCH and a set

of well-known Android test applications. It achieves

93% recall and 86% precision. (Fan and Xuan, 2016)

propose a model of a dependence-based taint analy-

sis for web applications. It uses the concept of source

and sink methods in order to detect if there is a ﬂow

of information coming from the former to the latter.

Although the current static analysis frameworks

provide valuable categorization for different Android

applications, there are still several limitations in the

analysis, since the static analysis can only perform

an approximation of the real behavior of the appli-

cation under focus. It assumes that some of the ﬂow

paths may be used while the application is running.

However, it is not asured to happen, since some of the

paths may depend on runtime variables.

2.2 Dynamic Analysis Approaches

TAINTDROID (Enck et al., 2010) is an information-

ﬂow tracking system for monitoring privacy in real-

time on Android smartphones. It incurs 14% perfor-

mance overhead on a CPU-bound microbenchmark

with low overhead when running thirdparty applica-

tions. It works by categorizing privacy-sensitive data

sources and labels when applications obtain informa-

tion from these sources. It also performs system-wide

tracking of variables, ﬁles and interprocess messages

that propagate these data. DROIDBOX (Lantz, 2011)

modiﬁed the Android’s core libraries in order to em-

ploy an integrated system, containing TAINTDROID.

Although performing a more accurate analysis,

the presented approaches still have some problems

such as the incurrence of overhead that may impact

the performance of the analyzed application and also

the need of modifying the system, which may forbid

normal users to use this approach, since it needs more

knowledge around the Android modiﬁcation ﬁeld in

order to install the system.

3 METHODOLOGY

FLOWSLICER is an approach that mixes a conserva-

tive static analysis with a dynamic analysis using a

tagging architecture. The static analysis is able to

identify the important information ﬂow and, then, ﬁl-

ter important parts of the ﬂow to be analyzed with

a dynamic analysis, that makes use of a tagging ar-

chitecture to keep track of all sensitive objects that

appears on the ﬁltered ﬂow. The hybrid approach pre-

sented here results in a high detection precision, small

overhead and no false positives.

FlowSlicer - A Hybrid Approach to Detect and Avoid Sensitive Information Leaks in Android Applications using Program Slicing and

Instrumentation

451

3.1 Input

FLOWSLICER takes as input a list of source and sink

methods and the APK ﬁle to be analyzed. The list of

source and sink methods is provided by SUSI (Arzt

et al., 2013), a tool that classiﬁes the methods present

in the Android API as source, sink or neither, with a

recall and precision of more than 92%, as evaluated

using ten-fold cross validation. The APK ﬁle is an-

alyzed by extending the Soot Framework (Vallée-Rai

et al., 1999), which provides important resources for

a precise analysis.

3.2 Entry Points

Since Android applications, unlike other Java pro-

grams, do not have an unique main method (Arzt

et al., 2014), different components must be analyzed

in order to create a main method and run the analy-

sis. FLOWSLICER uses the AndroidManifest.xml ﬁle

of each application under analysis to, initially, detect

which components are used and, afterwards, to create

the main method taking the extracted information into

account.

The components present in the Android platform

are: Activity, Service, ContentProvider and Broadcas-

tReceiver. Application is also provided by the An-

droidManifest.xml ﬁle and works as a base class to

handle the data of the whole application. In order to

obtain the correct ﬂow of data within the application,

the lifecycle of each component and also the Appli-

cation class need to be taken into account, as well

as the order they may be called. FLOWSLICER calls

the methods that handle the lifecycle of these classes

based in the following order: the Application class is

called ﬁrst, since it is the base of the whole applica-

tion and is called before the other components. It is

followed by the lifecycle present in the Service and

BroadcastReceiver classes, as they may be called in-

dependently, with the application in the foreground or

not. The next calls are to the ContentProvider com-

ponent and ﬁnally to the Activity lifecycle methods.

3.3 Call Graph

FLOWSLICER uses the entry point just created to

build the application’s call graph in order to analyze

the relationships between methods and routines and

also to ﬁnd out the reachable methods. Since we must

not miss any possible information ﬂow, the call graph

must be built in a conservative way, i.e., at a call site

we must consider every method that might be called.

However, due to polymorphism and other program-

ming languages features present in Java-based sys-

tems, the exact class of the receiver object of a virtual

call site may not be known during the analysis’ time.

In order not to be overly conservative, FLOWSLICER

uses Class Hierarchy Analysis (CHA), which exploits

the fact that only subtypes of a receiver’s declared

type are possible types at runtime and, thus, creates

edges for each of these in the ﬁnal call graph. A prob-

lem that may arise from such a conservative call graph

generation is related to false positives. However, we

are able to discard all the found false positives after

performing the dynamic analysis.

3.4 Program Slicing

According to (Weiser, 1981), a program slice consists

only of the parts of the program that really affect the

result obtained at a speciﬁc point of interesent, re-

ferred to as a slicing criterion. FLOWSLICER makes

use of this technique in order to extract the possible

ﬂows that lead to an information leak.

The most used way of creating slices of a program

is by creating a system dependency graph (SDG)

(Horwitz et al., 1988). A SDG is a connected col-

lection of every program dependency graph (PDG)

(Larsen and Harrold, 1996; Malloy et al., 1994; Ot-

tenstein and Ottenstein, 1984) related to the reachable

methods of the program. The PDGs represent the de-

pendencies between each statement of a method.

To create the edges of a PDG it is necessary to de-

termine both the control and data dependencies within

that method. A statement B has a control dependency

on a preceding statement A if the output of A deter-

mines whether B will be executed. On the other hand,

a statement B has a data dependency on a preceding

statement A if the statement B uses the result of the

statement A.

For the example of ﬁgure 1, we have that the line

4 depends on line 3 (data dependency) because line 4

uses the value of st deﬁned in line 3, line 5 depends

on line 4 (data dependency) because the condition’s

value is determined by the value of t computed in line

4, and line 6 depends both on line 4 (data dependency)

and line 5 (control dependency) because it is using the

value of t from line 4 and its execution is controlled

by the if-statement in line 5. The PDG resulting from

this is shown in ﬁgure 2.

The SDG is created by connecting methods the

same way as they were connected in the call graph.

In the example of ﬁgure 2, the nodes representing

an invocation of a method should now have an edge

with the respective method’s PDG. The same way, the

nodes that represent a return should have an edge to

the statement that receives the result of the computa-

tion of the called method. In order to represent in-

SECRYPT 2017 - 14th International Conference on Security and Cryptography

452

1 protected void onCreate(Bundle savedState)

2 {

3 String st = getSecret();

4 String t = st.substring(0, 2);

5 if (t.startsWith("1"))

6 leak(t);

7 }

Figure 1: A simple method that may leak sensitive informa-

tion.

Figure 2: Program dependency graph for the onCre-

ate(Bundle) method shown as example.

stance or class variables, FLOWSLICER also create

new nodes to represent them. These nodes have the

same dependencies as the normal statements. How-

ever, they are directly connected to the SDG and not

generated inside of the PDG.

As stated before, a possible information leak takes

place once the ﬂow of an information obtained by a

call to a source method hits a sink method. The same

concept is used within the slice. Since the slice con-

tains only statements that may affect the execution of

a sink method, it is possible to assume that if a source

method is part of the slice, we have a possible ﬂow.

FLOWSLICER uses the SDG to determine whether

a statement depends on the other and, then, creates

slices according to the slicing criterion. The use of

slicing also allows the dynamic analysis to look only

into the important statements, reducing caused over-

head and proving the false positives not to be true.

3.5 Instrumentation

In order to help the dynamic analysis with the re-

sults obtained from the static analysis execution,

FLOWSLICER makes use of instrumentation to point

out the important statements that need to be tracked

in order to determine the ﬁnal sensitiveness of the ob-

jects that are used by a sink method. FLOWSLICER

uses the slices obtained after using the program slic-

ing technique to instrument the code in the correct

points.

FLOWSLICER starts adding a statement either be-

fore or after a call to the important statements. The

statements are added after a source method, since it

results in an object that must be ﬂagged as sensitive,

after the statements of the ﬂow that are in between the

source and the sink calls, since they need to be ana-

lyzed regarding the sensitiveness of their result, and

before the calls to the sink method, since they are go-

ing to use the possibly sensitive objects.

3.6 Tagging

Tagging is the process of deﬁning a label, that repre-

sents a behavior or a characteristic, to objects that we

want to keep track of. FLOWSLICER labels objects

regarding the sensitiveness during the dynamic analy-

sis. The instrumented calls added to the application

in the previous process helps the dynamic analysis

to add these labels to objects according to the need.

Once an object is used in the ﬂow, the tags are added

to it and, when a sink call uses one of the sensitive ob-

jects, we are sure a leak of information has happened.

In order to enhance awareness, an alert containing in-

formations about the leak is sent to the user so that he

can decide whether he should keep using the applica-

tion or not.

4 EVALUATION

The evaluation of the method has been executed us-

ing a Core i7-4970 with 16GB RAM running Win-

dows 10 and a Nexus 5X running Android 7.1.1. The

decision to run the dynamic analysis on a real device

instead of the emulator is motivated by problems we

found while running it on the emulator, such as API

calls that return null instead of a sensitive data. We

believe this decision made the process more realistic

and more trustworthy by avoiding discrepancies that

may happen regarding the Android emulator execu-

tion.

The evaluation is performed with respect to two

goals: First, we assess the accuracy of our approach

by analysing a set of benchmark applications with

known leaks. Second, we evaluate its runtime over-

head, by executing both the original and the instru-

mented application under the exactly same condi-

tions, i.e., using the same number and sequence of

input events.

FlowSlicer - A Hybrid Approach to Detect and Avoid Sensitive Information Leaks in Android Applications using Program Slicing and

Instrumentation

453

4.1 Testing Process

We start the process by running FLOWSLICER with

the original application as input, obtaining an instru-

mented application as result. Thus, we run the origi-

nal application on the device to obtain base results and

as soon as it ﬁnishes, we run the instrumented version

in the same conditions as previously. After that, we

obtain the number of leaks found and also the per-

formance results required to compare the overhead of

our approach.

4.2 Preparing the Execution

As both original and instrumented applications will

be tested equaly, we need to make sure the environ-

ment is clean to allow both to execute without any

external interference. Before every execution, we

uninstall any previously installed application with the

same package name, preventing that any previously

saved parameter or conﬁguration interferes with the

current execution. Afterwards, our test ensures that

all permissions declared in AndroidManifest.xml

are granted by the time the application is running.

In both cases, the same seeds for inputing events on

the device with the UI/Application Exerciser Mon-

key (Google, 2017) are used. We also run the

dumpsys command repeatedly every 2 seconds to

analyse memory and CPU overhead while the appli-

cation is executing.

4.3 Obtaining the Results

The number of leaks found in each application is de-

termined by running the instrumented application on

a real device and using the UI/Application Exerciser

Monkey (Google, 2017) to generate random events.

To determine the overhead, the amount of memory

and CPU consumed by the instrumented application

is compared to the same values found while running

the original application with the same random events

as the ones generated for the instrumented one.

4.4 Testing Results

The table lists the results obtained by running the

analysis for the category ANDROIDSPECIFIC from

DROIDBENCH and analyze the following variables:

number of real leaks (obtained from the DROID-

BENCH repository), number of possible leaks found

during the static analysis, number of leaks found by

executing the instrumented version of the application.

The last parameter considered for the result is the

overhead caused by our instrumentation and dynamic

analysis.

Table 1: Evaluation of FLOWSLICER for all applications

contained in AndroidSpeciﬁc category of DROIDBENCH.

Application Real Static Dynamic

ApplicationModeling 1 1 1

DirectLeak 1 1 1

InactiveActivity 0 0 0

LogNoLeak 0 0 0

Library 1 1 1

Obfuscation 1 1 1

Parcel 1 1 1

PrivateDataLeak1 1 2 1

PrivateDataLeak2 1 1 1

PrivateDataLeak3 2 2 2

PublicAPIField1 1 1 1

PublicAPIField2 1 1 1

4.5 Analysis of Results

As seen in table 1, FLOWSLICER is able to detect

100% of the leaks with only one false positive during

the static analysis. The false positive happens because

whenever a block of statements depend on some vari-

able, all of those statements are control dependents on

this variable, even though they do not use it for any-

thing. However, it was proven not to be a leak in our

dynamic analysis, that also detected all the leaks.

In order to prove that our technique not only has

a high rate of detection, but also does not cause an

overhead that precludes the use of the instrumented

applications, we analyzed the overhead for both CPU

and memory. The memory overhead caused by the

instrumentation reaches an average of just 4% com-

pared to the normal applications. It happens because

we make use of memory to store the tags of the ob-

jects we need during the analysis. Given the fact that

memory is typically not a scarce resource in today’s

mobile devices, this overhead is quite acceptable. The

overhead caused on the CPU use is on average about

1%, which allows the user to use the instrumented ap-

plications without any different feeling or perception.

5 CONCLUSION

We proposed FLOWSLICER, a hybrid approach to de-

tect, avoid and alert the user about information leak

on Android devices. FLOWSLICER was able to iden-

tify all privacy leaks contained in the ANDROIDSPE-

CIFIC category from DROIDBENCH benchmark with

a small rate of false positives during the static analy-

sis and very low overhead to the Android device while

testing the application dynamically.

Despite of the results presented in this paper, there

SECRYPT 2017 - 14th International Conference on Security and Cryptography

454

is still room for improvement. The call graph con-

struction, although conservative, creates many differ-

ent edges that end up increasing the number of false

positives and raises the size of our system depen-

dency graph drastically. The user awareness has al-

ways been the focus of this method. However, the

way FLOWSLICER shows the alerts and the results of

analysis is not very user-friendly. In a future work, we

also want to give the user more power, allowing him

to decide what is considered a sensitive information

by ﬁltering the list of sources and sinks.

ACKNOWLEDGEMENTS

The authors thank the National Council for Scientiﬁc

and Technological Development (CNPq) for the ﬁ-

nancial support.

REFERENCES

Arp, D., Spreitzenbarth, M., Gascon, H., and Rieck, K.

(2014). Drebin: Effective and explainable detection

of android malware in your pocket.

Arzt, S., Rasthofer, S., and Bodden, E. (2013). Susi: A tool

for the fully automated classiﬁcation and categoriza-

tion of android sources and sinks.

Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A.,

Klein, J., Le Traon, Y., Octeau, D., and McDaniel, P.

(2014). Flowdroid: Precise context, ﬂow, ﬁeld, object-

sensitive and lifecycle-aware taint analysis for android

apps. SIGPLAN Not., 49(6):259–269.

Enck, W., Cox, L. P., Jung, J., and et al. (2010). Taintdroid:

An information-ﬂow tracking system for realtime pri-

vacy monitoring on smartphones.

Fan, G. and Xuan, Z. (2016). Design and implementation

of a dependence-based taint analysis. In 2016 11th

International Conference on Computer Science & Ed-

ucation (ICCSE), pages 985–991. IEEE.

Felt, A. P., Chin, E., Hanna, S., Song, D., and Wagner, D.

(2011a). Android permissions demystiﬁed. In Pro-

ceedings of the 18th ACM Conference on Computer

and Communications Security, CCS ’11, pages 627–

638, New York, NY, USA. ACM.

Felt, A. P., Greenwood, K., and Wagner, D. (2011b). The

effectiveness of application permissions. In Proceed-

ings of the 2Nd USENIX Conference on Web Applica-

tion Development, WebApps’11, pages 7–7, Berkeley,

CA, USA. USENIX Association.

Google (2017). Ui/application exerciser monkey.

https://developer.android.com/studio/test/monkey.html.

[Online; accessed 09-March-2017].

Gordon, M. I., Kim, D., Perkins, J. H., Gilham, L., Nguyen,

N., and Rinard, M. C. (2015). Information ﬂow anal-

ysis of android applications in droidsafe. In 22nd An-

nual Network and Distributed System Security Sym-

posium, NDSS 2015, San Diego, California, USA,

February 8-11, 2015. The Internet Society.

Hedin, D. and Sabelfeld, A. (2011). A perspective on

information-ﬂow control.

Horwitz, S., Reps, T., and Binkley, D. (1988). Interpro-

cedural slicing using dependence graphs. In Pro-

ceedings of the ACM SIGPLAN 1988 Conference on

Programming Language Design and Implementation,

PLDI ’88, pages 35–46, New York, NY, USA. ACM.

Jeon, J., Micinski, K. K., Vaughan, J. A., Fogel, A., Reddy,

N., Foster, J. S., and Millstein, T. (2012). Dr. android

and mr. hide: Fine-grained permissions in android ap-

plications. In Proceedings of the Second ACM Work-

shop on Security and Privacy in Smartphones and Mo-

bile Devices, SPSM ’12, pages 3–14, New York, NY,

USA. ACM.

Kulenovic, M. and Donko, D. (2014). A survey of static

code analysis methods for security vulnerabilities de-

tection. In MIPRO, pages 1381–1386. IEEE.

Lantz, P. (2011). Droidbox: An android ap-

plication sandbox for dynamic analysis.

https://github.com/pjlantz/droidbox. [Online; ac-

cessed 27-February-2017].

Larsen, L. and Harrold, M. J. (1996). Slicing object-

oriented software. In Rombach, H. D., Maibaum, T.

S. E., and Zelkowitz, M. V., editors, ICSE, pages 495–

505. IEEE Computer Society.

Livshits, B. (2005). Stanford securibench. https://suif. stan-

ford.edu/livshits/securibench/. [Online; accessed 22-

February-2017].

Malloy, B. A., Mcgregor, J. D., Krishnaswamy, A., and

Medikonda, M. (1994). An extensible program rep-

resentation for object-oriented software. ACM SIG-

PLAN Notices, 29:38–47.

Ottenstein, K. J. and Ottenstein, L. M. (1984). The program

dependence graph in a software development environ-

ment. SIGPLAN Not., 19(5):177–184.

Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L., Lam,

P., and Sundaresan, V. (1999). Soot - a java byte-

code optimization framework. In Proceedings of the

1999 Conference of the Centre for Advanced Studies

on Collaborative Research, CASCON ’99, pages 13–.

IBM Press.

Walden, J. and Doyle, M. (2012). SAVI: static-analysis

vulnerability indicator. IEEE Security & Privacy,

10(3):32–39.

Weiser, M. (1981). Program slicing. In Proceedings of the

5th International Conference on Software Engineer-

ing, ICSE ’81, pages 439–449, Piscataway, NJ, USA.

IEEE Press.

FlowSlicer - A Hybrid Approach to Detect and Avoid Sensitive Information Leaks in Android Applications using Program Slicing and

Instrumentation

455