MADLIRA: A Tool for Android Malware Detection
Khanh Huu The Dam¹ and Tayssir Touili²
¹ Nha Trang University, Vietnam
² LIPN, CNRS, University Paris 13, France
Keywords:
Malware Detection, Android, Static Analysis.
Abstract:
Threats to Android users are growing as malware writers shift their focus to exploiting the weaknesses of Android devices in order to carry out malicious behaviors. Detecting Android malwares is therefore becoming crucial. We present in this paper a tool called MADLIRA (MAlware Detection using Learning and Information Retrieval for Android). This tool implements two static approaches: (1) applying Information Retrieval techniques to automatically extract malicious behaviors from a set of malicious and benign applications, and (2) applying learning techniques to automatically learn malicious applications. In both cases, MADLIRA can then classify a new Android application as malicious or benign.
1 INTRODUCTION
The number of new malwares increased by 36 percent in one year, from 2014 to 2015. It is estimated that more than one million new pieces of malware are released every day (Symantec, 2016). According to the Kaspersky Lab report for the third quarter of 2017, more than 1.5 million malicious packages were detected on mobile devices (Kaspersky, 2018). In recent years, the number of attacks on mobile devices has increased enormously, using various types of malware such as backdoors, cryptomining, fake apps, banking trojans, etc. (McAfee, 2019). Consequently, Android users face more threats, and the challenge is to detect malicious behaviors in Android applications.
However, most industrial anti-virus products are based on signature matching, where a scanner searches for specific elements (called signatures) such as permission requirements, package names, class names, bytecode segments, or certain strings in the Android application. If an application contains such a signature, it is marked as malicious; otherwise, it is marked as benign. This signature matching technique is not very robust. Indeed, malware writers may apply obfuscation techniques, e.g., renaming or inserting functions (Rastogi et al., 2013; Zheng et al., 2013; Maiorca et al., 2015; Preda and Maggi, 2016), to change the structure of a malware while keeping its behaviors unchanged, so that the known signatures can no longer be used to recognize it.
To avoid this issue, several works (Burguera et al., 2011; Dimjašević et al., 2015; Canfora et al., 2015; Jang et al., 2016; Malik and Khatter, 2016) use dynamic analysis to analyze the behaviors of Android applications instead of their syntactic signatures. In this approach, the behaviors are observed dynamically while the application runs in an emulated environment. However, with dynamic analysis it is hard to trigger the malicious behaviors in a short period, since they may require a delay or only show up after some user interaction.
To overcome these limitations, one needs to analyze the behaviors of a program without executing it. To this aim, we consider in this paper API calls in Android applications to model malicious behaviors in a static way. Indeed, API functions are mediators between programs and their running environment, and malware authors use them to access or modify the system. Therefore, API functions are crucial to specify malicious behaviors. Thus, in this work, we model a program using an API call graph, which is a directed graph whose nodes are API functions and whose edges specify the execution order of the API function calls: an edge (f, f') expresses that there is a call to the API function f followed by a call to the API function f'. For example, let us look at a typical behavior of an Android trojan SMS spy. The smali code of this behavior is given in Figure 1. This behavior consists in collecting the phone id and then sending this data via a text message. This task is done by a sequence of API calls. First, the function getDeviceId() is called at line 5 to collect the phone id. Then, the SmsManager object is obtained by calling getDefault() at line 19. Finally, the phone id is sent to an anonymous phone number via a text message by calling sendTextMessage() at line 25.
Dam, K. and Touili, T.
MADLIRA: A Tool for Android Malware Detection.
DOI: 10.5220/0010339506700675
In Proceedings of the 7th International Conference on Information Systems Security and Privacy (ICISSP 2021), pages 670-675
ISBN: 978-989-758-491-6
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
To represent this behavior, we use a malicious API graph. Figure 2 shows the malicious API graph of this behavior. The edges express that a call to the function Landroid/telephony/TelephonyManager; -> getDeviceId() is followed by calls to the functions Landroid/telephony/gsm/SmsManager; -> getDefault() and Landroid/telephony/gsm/SmsManager; -> sendTextMessage().
Figure 1: A piece of smali code of an Android trojan SMS
spy.
Figure 2: A malicious API graph of an Android trojan SMS
spy.
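As an illustration of this representation, the following Python sketch builds an API call graph from one execution-ordered sequence of API calls, using the three calls of the SMS spy example. The function name build_api_call_graph and the edge rule (an edge from each call to every later call in the sequence) are assumptions chosen to agree with the edges described for Figure 2; MADLIRA itself derives the graph from the control flow graphs of the application.

```python
from itertools import combinations


def build_api_call_graph(call_sequence):
    """Return (nodes, edges) for one execution-ordered sequence of API calls.

    An edge (f, g) means that a call to f is followed, possibly not
    immediately, by a call to g (assumed reading of "followed by").
    """
    nodes = set(call_sequence)
    # combinations() preserves input order, so f always occurs before g.
    edges = {(f, g) for f, g in combinations(call_sequence, 2) if f != g}
    return nodes, edges


GET_DEVICE_ID = "Landroid/telephony/TelephonyManager;->getDeviceId"
GET_DEFAULT = "Landroid/telephony/gsm/SmsManager;->getDefault"
SEND_TEXT = "Landroid/telephony/gsm/SmsManager;->sendTextMessage"

nodes, edges = build_api_call_graph([GET_DEVICE_ID, GET_DEFAULT, SEND_TEXT])
for source, target in sorted(edges):
    print(source, "=>", target)
```

A real application has many execution paths, so its graph would be the union of the edges obtained along all of them.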
Using this API call graph representation, we implement two static approaches in a tool, called MADLIRA, for Android malware detection: (1) apply the Information Retrieval techniques of (Dam and Touili, 2017a) to automatically extract malicious behaviors from a set of malicious and benign Android applications, and use the extracted malicious behaviors to detect malwares; and (2) apply the learning techniques of (Dam and Touili, 2017b) to automatically learn malicious Android applications. We applied our tool to 3518 Android malwares and 1118 Android benwares and, with the learning approach, obtained a detection rate of 98.76% with 0.24% false alarms. We present our tool MADLIRA in this paper. It is available at https://lipn.univ-paris13.fr/~dam/tool/androidTool/MADLIRA.html
2 RELATED WORK
As in our approach, (Burguera et al., 2011; Gascon et al., 2013; Jang et al., 2016; Song and Touili, 2014; Dam and Touili, 2019b) use API calls to represent the malicious behaviors of malware. (Gascon et al., 2013) represent applications by function call graphs, where nodes correspond to function calls and edges connect callers to callees. This model differs from our API call graph, where nodes correspond to API function calls and edges specify the execution order of API functions in the application. In particular, our graph can connect two functions that have the same caller, whereas the model of (Gascon et al., 2013) does not allow that connection. Moreover, our tool allows the extraction of Android malicious behaviors.
In (Burguera et al., 2011; Jang et al., 2016), dynamic analysis is applied to capture the behaviors of an application via system calls. Crowdroid (Burguera et al., 2011) generates a feature vector for each Android application by counting the system calls made during its execution. Andro-dumpsys (Jang et al., 2016) monitors API calls and their parameters and stores them in a profile. As mentioned before, dynamic analysis only observes the behaviors that occur within a limited time interval. In contrast, our analysis is done in a completely static way.
In (Song and Touili, 2014), the authors apply static
analysis to model Android applications and then ap-
ply model checking for malware detection. Their
approach requires the manual specification of mali-
cious behaviors (logic formulas), after a tedious man-
ual study of the code. In contrast, our tool can auto-
matically extract malicious behaviors of Android mal-
wares and then use these malicious behaviors to detect
malware.
Similar to the STAMAD tool (Dam and Touili, 2019a), our tool MADLIRA can learn and extract malicious behaviors. However, STAMAD is implemented for PC malwares, whereas MADLIRA tackles Android malwares.
3 BACKGROUND
Given a set of malicious and a set of benign Android applications, MADLIRA first extracts from every program its corresponding API call graph. To perform this step, we apply the techniques of (Dam and Touili, 2017a; Dam and Touili, 2017b), which compute this graph by performing a kind of reachability analysis on the Control Flow Graphs of the programs. Then, MADLIRA can either (1) automatically extract malicious behaviors using Information Retrieval techniques, or (2) apply machine learning techniques to automatically learn and detect malwares. In both cases, MADLIRA can then classify a new, previously unseen application as malicious or benign. In this section, we present the main ideas behind these two approaches implemented in MADLIRA.
3.1 Extraction of Malicious Behaviors
Given a set of malicious and a set of benign Android applications, MADLIRA implements the idea of (Dam and Touili, 2017a) to automatically extract the malicious behaviors of the malwares. It first extracts the API call graph corresponding to each application. MADLIRA's goal is then to compute a malicious API call graph that represents the behaviors that are present in the malicious programs but not in the benign ones. To this aim, it extracts the subgraphs that are relevant to the malicious graphs but not relevant to the benign ones. A relevant subgraph contains nodes and edges that are crucial to the malicious API call graphs. This problem can be seen as an Information Retrieval (IR) problem, where the goal is to retrieve relevant items and reject nonrelevant ones. One of the most efficient techniques in information retrieval is the TFIDF¹ scheme. It has been widely applied by the IR community for document extraction in web searching, text searching, image searching, etc. (Dam and Touili, 2017a) applied these TFIDF techniques to extract malicious API call graphs from a set of Android malwares and benwares. MADLIRA follows this approach: it associates a weight to each node and each edge of the API call graphs of the malicious and benign applications, using the formulas of (Dam and Touili, 2017a). These weights are computed from the occurrences of terms (nodes/edges) in benwares and malwares, and they measure the relevance of each term to malwares and benwares: the higher the weight, the more relevant the term is to malwares. The malicious API call graph is then constructed from the edges and nodes that have the highest weights. Finally, MADLIRA uses the automatically extracted malicious behavior specification to determine whether a new unknown program is malicious or not, by performing a kind of product between the extracted malicious API call graph and the new program's API call graph. More details about this approach can be found in (Dam and Touili, 2017a).
¹ TFIDF stands for Term Frequency and Inverse Document Frequency.
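To make the weighting idea concrete, here is a minimal Python sketch that scores each edge by how often it occurs in malware graphs and how rarely it occurs in benign graphs. The scoring formula is a simplified TFIDF-style stand-in and not the formulas of (Dam and Touili, 2017a); the function name term_weights, the shortened API names, and the toy graphs are all hypothetical.

```python
import math


def term_weights(malware_graphs, benign_graphs):
    """Weight each term (here: an edge) by how specific it is to malwares.

    Each graph is given as a set of edges. The weight is
    (share of malware graphs containing the term)
      * log((1 + #benign graphs) / (1 + #benign graphs containing the term)),
    an assumed stand-in for the weighting formulas of the original work.
    """
    n_mal, n_ben = len(malware_graphs), len(benign_graphs)
    terms = set().union(*malware_graphs)
    weights = {}
    for term in terms:
        mal_df = sum(term in graph for graph in malware_graphs)
        ben_df = sum(term in graph for graph in benign_graphs)
        frequency = mal_df / n_mal
        rarity = math.log((1 + n_ben) / (1 + ben_df))
        weights[term] = frequency * rarity
    return weights


# Toy data with shortened, hypothetical API names.
SPY_EDGE = ("getDeviceId", "sendTextMessage")
COMMON_EDGE = ("getString", "getDefault")
malware_graphs = [{SPY_EDGE, COMMON_EDGE}, {SPY_EDGE}]
benign_graphs = [{COMMON_EDGE}, {("get", "remove")}]

weights = term_weights(malware_graphs, benign_graphs)
top_edge = max(weights, key=weights.get)
print(top_edge, weights[top_edge])
```

In this toy setting the edge that appears in every malware graph and in no benign graph receives the highest weight, so it would be the first candidate for the malicious API call graph.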
3.2 Learning Malicious Behaviors
In our second approach, we implement the kernel-based support vector machine technique on API call graphs to compute a function h that separates Android malwares from benign Android applications. The choice of support vector machines is motivated by the fact that, through kernels, they are well suited to nonvectorial data (graphs in our setting), whereas other well-known learning techniques such as artificial neural networks, k-nearest neighbors, decision trees, etc. are typically applied to vectorial data. This method is highly dependent on the choice of kernel. A kernel is a function that returns a similarity measure between data items. In MADLIRA, we use a variant of the random walk graph kernel that measures graph similarity as the number of common paths of increasing lengths. To implement this graph kernel support vector machine approach to learn Android malwares, MADLIRA applies the ideas and details of (Dam and Touili, 2017b).
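The idea of counting common paths of increasing lengths can be sketched in Python as follows. This is a simplified illustration rather than the kernel variant of (Dam and Touili, 2017b): it assumes that each API function appears as at most one node per graph, so that the walks shared by two graphs are exactly the walks over their common edges, and the parameters max_length and decay are hypothetical.

```python
def random_walk_kernel(edges_a, edges_b, max_length=3, decay=0.5):
    """Weighted count of the walks (1..max_length edges) shared by two graphs.

    Graphs are sets of (source, target) edges over API names. A walk of
    k edges contributes decay**k to the similarity.
    """
    common = edges_a & edges_b  # edges present in both graphs
    successors = {}
    for source, target in common:
        successors.setdefault(source, []).append(target)
    nodes = {node for edge in common for node in edge}

    # walks_ending[v] = number of common walks of the current length ending at v
    walks_ending = {node: 1 for node in nodes}  # walks of length 0
    similarity = 0.0
    for length in range(1, max_length + 1):
        extended = {node: 0 for node in nodes}
        for source, count in walks_ending.items():
            for target in successors.get(source, []):
                extended[target] += count
        walks_ending = extended
        similarity += (decay ** length) * sum(walks_ending.values())
    return similarity


graph_1 = {("a", "b"), ("b", "c"), ("a", "c")}
graph_2 = {("a", "b"), ("b", "c"), ("c", "d")}
print(random_walk_kernel(graph_1, graph_2))
```

Here the two toy graphs share two walks of one edge (a to b, b to c) and one walk of two edges (a to b to c), which gives 0.5 * 2 + 0.25 * 1 = 1.25.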
4 MADLIRA DESCRIPTION
MADLIRA takes as input a set of Android malwares
and a set of Android benwares and can either (1) ex-
tract a malicious API graph representing the mali-
cious behaviors of the Android malwares in the set;
or (2) learn to classify Android malwares without ex-
tracting the malicious behaviors. These phases are
called the training phases. Then, given a new An-
droid application, MADLIRA checks whether it is
malicious or not.
MADLIRA has two main components: the TFIDF component, which extracts the malicious behaviors and uses them to check whether a new application is malicious or not (see Section 3.1 for more details), and the SVM² component, which applies random walk graph kernel based support vector machines to separate malwares from benign applications (see Section 3.2 for more details). MADLIRA consists of three modules:
Module 1: Android Application Modeling.
This module takes as input an Android application (an APK file). It first applies apktool (ApkAndroidTool, 2016) to decompile the APK file into smali code. The smali code is then transformed into the control flow graphs of the Android application. Finally, the Graph Computation component takes as input the Android API Library and the control flow graphs of the Android application, and computes the Android API call graph using the algorithm of (Dam and Touili, 2017a).
² SVM stands for Support Vector Machine.
Figure 3: Android Application Modeling - Module 1. [Diagram: Android Application (APK file) -> Apktool -> Smali Codes -> Control Flow Graphs; Control Flow Graphs and Android API Library -> Graph Computation -> Android API Call Graph.]
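As a rough illustration of the first steps of this module, the Python sketch below pulls the API calls out of a fragment of smali code, as produced for example by running apktool in decode mode (apktool d app.apk -o out_dir). It only lists calls in textual order and does not build control flow graphs. The smali fragment, the class Lcom/example/Spy; and the prefix list used in place of the Android API Library are all hypothetical.

```python
import re

# Matches smali call instructions such as
#   invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
INVOKE = re.compile(r"^\s*invoke-[\w/]+ \{[^}]*\}, (L[^;]+;)->([^(]+)\(")

# Assumed stand-in for the Android API Library: keep only framework classes.
API_PREFIXES = ("Landroid/", "Ljava/")


def api_calls(smali_text):
    """Return the API calls of a smali fragment, in textual order."""
    calls = []
    for line in smali_text.splitlines():
        match = INVOKE.match(line)
        if match and match.group(1).startswith(API_PREFIXES):
            calls.append(f"{match.group(1)}->{match.group(2)}")
    return calls


SAMPLE = """
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
    move-result-object v1
    invoke-static {}, Landroid/telephony/gsm/SmsManager;->getDefault()Landroid/telephony/gsm/SmsManager;
    invoke-direct {p0}, Lcom/example/Spy;->helper()V
    invoke-virtual/range {v2 .. v7}, Landroid/telephony/gsm/SmsManager;->sendTextMessage(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Landroid/app/PendingIntent;Landroid/app/PendingIntent;)V
"""

calls = api_calls(SAMPLE)
for call in calls:
    print(call)
```

The call to the application's own helper method is filtered out, which mirrors the role of the Android API Library in restricting the graph to API functions.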
Module 2: Extraction of Malicious Behaviors.
This module consists of two phases: the extraction phase, i.e., the extraction of malicious behaviors, and the detection phase, i.e., the malicious behavior detection. In the extraction phase, it takes as input a set of malwares and a set of benwares. After the first module has been applied to extract their corresponding API call graphs, these graphs are fed to the Malicious Graph Computation component to compute the malicious API graph. This component implements the TFIDF term weighting scheme introduced in Section 3.1 to compute the malicious behaviors. It outputs malicious API graphs representing the malicious behaviors. This phase is called the "training phase". In the detection phase, this module takes as input an Android application (an APK file) and applies the first module to compute its corresponding API call graph. Then, it checks whether the program's graph contains any malicious behavior from the malicious API graphs (the output of the "training phase"). If the program contains any malicious behavior, the output is "Malicious!". Otherwise, the output is "Benign!".
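The detection phase can be approximated by the following Python sketch, which reports a program as malicious when every edge of at least one extracted malicious API graph also occurs in the program's API call graph. This edge-containment test is a simplifying assumption; the tool itself performs a kind of product between the two graphs, as described in Section 3.1. The function name classify and the toy graphs are hypothetical.

```python
def classify(program_edges, malicious_graphs):
    """Return "Malicious!" if any extracted behavior is fully present."""
    for behavior_edges in malicious_graphs:
        # An empty behavior graph must not match every program.
        if behavior_edges and behavior_edges <= program_edges:
            return "Malicious!"
    return "Benign!"


GET_DEVICE_ID = "Landroid/telephony/TelephonyManager;->getDeviceId"
GET_DEFAULT = "Landroid/telephony/gsm/SmsManager;->getDefault"
SEND_TEXT = "Landroid/telephony/gsm/SmsManager;->sendTextMessage"
GET_STRING = "Landroid/content/SharedPreferences;->getString"

# Output of the training phase: one behavior graph, as a set of edges.
sms_spy = {(GET_DEVICE_ID, GET_DEFAULT), (GET_DEVICE_ID, SEND_TEXT)}

suspicious_app = sms_spy | {(GET_STRING, GET_DEVICE_ID)}
harmless_app = {(GET_STRING, GET_DEFAULT)}

print(classify(suspicious_app, [sms_spy]))
print(classify(harmless_app, [sms_spy]))
```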
Module 3: Learning Malicious Behaviors.
This module implements the learning technique described in Section 3.2. It consists of two phases: the learning phase and the detection phase. In the learning phase, it takes as input a set of malwares and a set of benwares. It first applies the first module to compute their corresponding API call graphs. Then, these API call graphs are fed to the SVM training component, i.e., LIBSVM (Chang and Lin, 2011), to compute an SVM training model.
In the detection phase, it takes as input an Android application (an APK file) and applies the Application Modeling module to compute its corresponding API call graph. Then, it uses the SVM classifier with the training model (the output of the first phase) to classify the program as either "Malicious!" or "Benign!".
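One way to hand graph similarities to LIBSVM is its documented precomputed-kernel input format (selected with the option -t 4), in which each line holds the label, the sample's serial number at index 0, and then the kernel values against every training sample. The Python sketch below writes such lines from a kernel matrix. Whether MADLIRA passes its graph kernel to LIBSVM in this way is an assumption, as are the function name, the label convention (+1 for malwares, -1 for benwares) and the toy kernel values.

```python
def to_libsvm_precomputed(labels, kernel_matrix):
    """Format a kernel matrix as LIBSVM precomputed-kernel training data.

    Row i of kernel_matrix holds K(x_i, x_1), ..., K(x_i, x_L); serial
    numbers and kernel indices both start at 1.
    """
    lines = []
    for serial, (label, row) in enumerate(zip(labels, kernel_matrix), start=1):
        values = " ".join(
            f"{index}:{value:g}" for index, value in enumerate(row, start=1)
        )
        lines.append(f"{label:+d} 0:{serial} {values}")
    return "\n".join(lines)


# Hypothetical similarities between one malware graph and one benware graph.
labels = [+1, -1]
kernel_matrix = [
    [1.75, 1.25],
    [1.25, 2.0],
]

training_data = to_libsvm_precomputed(labels, kernel_matrix)
print(training_data)
```

For classification, each test line would use the same layout, with the kernel values computed between the new program's graph and every training graph.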
5 EXPERIMENTS
To evaluate our tool, we use a dataset of 1118 Android benwares, collected from the website apkpure.com, and 3518 Android malwares obtained from the Drebin dataset (Arp et al., 2014). MADLIRA gives promising results:
Extraction of Malicious Behaviors. To evaluate the performance of Module 2, we first applied our tool to automatically extract a malicious API graph from a set of 1900 Android malwares and 704 Android benwares. The obtained malicious API graph is then used for malware detection on a test set of 1618 Android malwares and 414 Android benwares. We obtained a detection rate of 96.6% with 15.7% false alarms.
Learning Malware. To evaluate the performance of Module 3, we randomly split the dataset into two partitions, a training partition and a testing partition. The training partition is balanced, with 704 malwares and 704 benwares, and is used to compute the SVM classifier. The test set, consisting of 2814 Android malwares and 414 Android benwares, is used to evaluate the classifier. Using the training set, we compute the training model. Then, we apply this training model to classify the applications of the test set and obtain a detection rate of 98.76% with 0.24% false alarms.
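The arithmetic behind these figures can be checked with a short Python sketch. The detection rate is the share of test malwares flagged as malicious and the false alarm rate is the share of test benwares flagged as malicious. The raw counts used below (1563, 65, 2779 and 1) are not stated in the paper; they are back-computed from the reported percentages and test set sizes, and should be read as an assumption.

```python
def rate(count, total):
    """Percentage of `count` among `total`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Dataset sizes stated in the text.
assert 1900 + 1618 == 3518 and 704 + 2814 == 3518  # malwares: train + test
assert 704 + 414 == 1118                           # benwares: train + test

# Module 2 (TFIDF), test set of 1618 malwares and 414 benwares.
tfidf_detection = rate(1563, 1618)    # assumed 1563 malwares detected
tfidf_false_alarms = rate(65, 414)    # assumed 65 benwares flagged

# Module 3 (SVM), test set of 2814 malwares and 414 benwares.
svm_detection = rate(2779, 2814)      # assumed 2779 malwares detected
svm_false_alarms = rate(1, 414)       # assumed 1 benware flagged

print(tfidf_detection, tfidf_false_alarms)
print(svm_detection, svm_false_alarms)
```

Under these assumed counts, a false alarm rate of 0.24% on 414 benwares corresponds to a single misclassified benign application.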
6 EXAMPLES OF MALICIOUS
BEHAVIORS
In this section, we present some malicious behaviors
that were automatically extracted by MADLIRA.
Figure 4: Extraction of Malicious Behaviors - Module 2. [Diagram: in the extraction phase, the Set of Malwares and the Set of Benwares go through Application Modeling to give the Malware Graphs G_M and the Benware Graphs G_B, from which Malicious Graph Computation produces the Malicious Behaviors Graph. In the detection phase, A New Application goes through Application Modeling to give the Program's Graph G, which is checked for malicious subgraphs: Yes gives "Malicious!", No gives "Benign!".]
Figure 5: Learning Malicious Behaviors - Module 3. [Diagram: in the learning phase, the Set of Malwares and the Set of Benwares go through Application Modeling to give the Malware Graphs G_M and the Benware Graphs G_B, from which LIBSVM training produces the Training Model. In the detection phase, A New Application goes through Application Modeling to give the Program's Graph G, which the LIBSVM Classifier labels "Malicious!" or "Benign!".]
Installing Malicious Packages. This malicious behavior installs a list of malicious packages on the system. It is shown in the malicious API graph of Figure 6. This graph represents the following malicious behavior: the method get() of the Map object is first called to get a package from the list. Then, the method exec() is called to execute the package and getLaunchIntentForPackage() is called to install the package. Finally, the package is removed from the list by calling the method remove() of the Map object.
Figure 6: The malicious API graph of the Android malicious behavior of Installing malicious packages. [Graph nodes: Ljava/util/Map; ->get, Ljava/lang/Runtime; ->exec, Landroid/content/pm/PackageManager; ->getLaunchIntentForPackage, Ljava/util/Map; ->remove.]
Running a Malicious Process. This malicious behavior replaces a running process with a malicious process. It is shown in the graph of Figure 7. The malicious behavior is implemented as follows: the malicious application first calls valueOf() of the String class to get the identifier of the specific process. Then, it calls killProcess() to kill this process. Finally, it calls getLaunchIntentForPackage() to launch a replacement process in the system.
Figure 7: The malicious API graph of the Android malicious behavior of Running a malicious process. [Graph nodes: Ljava/lang/String; ->valueOf, Landroid/os/Process; ->killProcess, Landroid/content/pm/PackageManager; ->getLaunchIntentForPackage.]
Repeatedly Sending Messages. This malicious behavior repeatedly sends data via SMS messages. It is shown in the graph of Figure 8. The malicious behavior is implemented as follows: the malicious application first calls getString() of SharedPreferences to get the values of the variables. Then, it calls getDefault() of SmsManager to get the object that handles SMS messages in the system. Finally, it calls sendTextMessage() of SmsManager to send the messages. In addition, it calls next() of Iterator to go through the list of messages to send.
Figure 8: The malicious API graph of the Android malicious behavior of Repeatedly sending messages. [Graph nodes: Landroid/content/SharedPreferences; ->getString, Landroid/telephony/SmsManager; ->getDefault, Ljava/util/Iterator; ->next, Landroid/telephony/SmsManager; ->sendTextMessage.]
REFERENCES
ApkAndroidTool (2016). A tool for reverse engineering an-
droid apk files. https://ibotpeaches.github.io/Apktool.
Accessed: 2016-11-25.
Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., and Rieck, K. (2014). Drebin: Effective and explainable detection of android malware in your pocket. In NDSS.
Burguera, I., Zurutuza, U., and Nadjm-Tehrani, S. (2011).
Crowdroid: behavior-based malware detection system
for android. In Proceedings of the 1st ACM workshop
on Security and privacy in smartphones and mobile
devices.
Canfora, G., Medvet, E., Mercaldo, F., and Visaggio, C. A.
(2015). Detecting android malware using sequences
of system calls. In Proceedings of the 3rd Interna-
tional Workshop on Software Development Lifecycle
for Mobile.
Chang, C.-C. and Lin, C.-J. (2011). Libsvm: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Dam, K. and Touili, T. (2019a). STAMAD: a static mal-
ware detector. In Proceedings of the 14th Interna-
tional Conference on Availability, Reliability and Se-
curity, ARES 2019, Canterbury, UK, August 26-29,
2019, pages 25:1–25:6. ACM.
Dam, K.-H.-T. and Touili, T. (2017a). Extracting android
malicious behaviors. In International Workshop on
FORmal methods for Security Engineering.
Dam, K.-H.-T. and Touili, T. (2017b). Learning android
malware. In Proceedings of the 12th International
Conference on Availability, Reliability and Security,
ARES ’17.
Dam, K. H. T. and Touili, T. (2019b). Stamad: A static
malware detector. In Proceedings of the 14th Interna-
tional Conference on Availability, Reliability and Se-
curity, ARES ’19, New York, NY, USA. Association
for Computing Machinery.
Dimjašević, M., Atzeni, S., Ugrina, I., and Rakamarić, Z. (2015). Android malware detection based on system calls. University of Utah, Tech. Rep.
Gascon, H., Yamaguchi, F., Arp, D., and Rieck, K. (2013).
Structural detection of android malware using embed-
ded call graphs. In Proceedings of the 2013 ACM
workshop on Artificial intelligence and security, pages
45–54. ACM.
Jang, J.-w., Kang, H., Woo, J., Mohaisen, A., and Kim,
H. K. (2016). Andro-dumpsys: anti-malware system
based on the similarity of malware creator and mal-
ware centric information. Computers & security.
Kaspersky (2018). Internet security threat report.
https://securelist.com/it-threat-evolution-q3-2017-
statistics/83131/. Accessed: 2018-01-30.
Maiorca, D., Ariu, D., Corona, I., Aresu, M., and Giacinto,
G. (2015). Stealth attacks: An extended insight into
the obfuscation effects on android malware. Comput-
ers & Security.
Malik, S. and Khatter, K. (2016). System call analysis of
android malware families. Indian Journal of Science
and Technology.
McAfee (2019). Mcafee mobile threat re-
port. https://www.mcafee.com/enterprise/en-
us/assets/reports/rp-mobile-threat-report-2019.pdf.
Accessed: 2020-09-30.
Preda, M. D. and Maggi, F. (2016). Testing android mal-
ware detectors against code obfuscation: a systemati-
zation of knowledge and unified methodology. Jour-
nal of Computer Virology and Hacking Techniques.
Rastogi, V., Chen, Y., and Jiang, X. (2013). Droid-
chameleon: Evaluating android anti-malware against
transformation attacks. ASIA CCS ’13.
Song, F. and Touili, T. (2014). Model-checking for an-
droid malware detection. In Asian Symposium on Pro-
gramming Languages and Systems, pages 216–235.
Springer.
Symantec (2016). Internet security threat report. https://www.symantec.com/securitycenter/threat-report. Accessed: 2016-11-25.
Zheng, M., Lee, P. P. C., and Lui, J. C. S. (2013). ADAM:
An Automatic and Extensible Platform to Stress Test
Android Anti-virus Systems. DIMVA 2012.