Mobile Family Detection through Audio Signals Classification

Rosangela Casolare, Giacomo Iadarola, Fabio Martinelli, Francesco Mercaldo (2,3) and Antonella Santone

(1) Department of Biosciences and Territory, University of Molise, Pesche (IS), Italy

(2) Institute of Informatics and Telematics, National Research Council of Italy, Pisa, Italy

(3) Department of Medicine and Health Sciences “Vincenzo Tiberio”, University of Molise, Campobasso, Italy

{giacomo.iadarola, fabio.martinelli, francesco.mercaldo}@iit.cnr.it

Keywords:

Malware, Audio, Android, Machine Learning, Deep Learning, Security, Classification.

Abstract:

Nowadays smartphones, and mobile devices in general, allow users to perform a plethora of tasks on the move, from checking their bank account balance to distance learning. In this context, the detection of malicious behaviour is critical, considering the weaknesses of current antimalware mechanisms. In this paper we propose a method for malicious family detection that exploits audio signal processing: an application is converted into an audio file, which is then processed to generate a feature vector used as input to several classifiers. We perform a real-world experimental analysis on a set of malware targeting the Android platform, i.e., 4746 malware samples belonging to 10 families, showing the effectiveness of the proposed approach for Android malicious family detection.

1 INTRODUCTION

In recent years, there has been a huge spread of mobile devices such as smartphones and tablets, which have become the principal target of attacks because they contain a great deal of sensitive, financial and personal information. Among the operating systems available for mobile devices, Android is the most popular and the most widespread; since Android is an open-source system, it attracts more interest from malicious actors, as it allows customized systems to be created by rebuilding the source code (Enck et al., 2014). Furthermore, this operating system allows applications to be installed from third-party stores as well as from the official ones; thus, users who lack adequate knowledge of the dangers to which they are exposed are subject to attacks launched by applications downloaded from unofficial, less reliable stores. Nevertheless, the presence of malicious applications cannot be excluded even in the official stores (e.g., the Google Play Store) (Nguyen et al., 2020).

For this reason, Android is the environment most attacked by cybercriminals (Canfora et al., 2018).

During May 2020, around 430,000 malware attacks were detected on Android devices, an increase of 3.6% compared to the previous month, while in August 2020 a growth of 6.26% compared to the previous month was observed.

In this regard, we propose an approach for detecting malware families in the Android environment, which consists of converting an Android application into an audio file. We then extract from the audio file a series of numerical features that are used to determine which family the application belongs to.

Among the analysis techniques most applied in the literature, we decided to explore a group of supervised classification algorithms and to adopt four of them, belonging to the Machine Learning (Stochastic Gradient Descent, Random Forest) and Deep Learning (two different Multi-layer Perceptron architectures) fields.

The paper is organized as follows: Section 2 describes the proposed method to analyse the audio signals and detect the malware family, starting from the conversion of an Android application into an audio file whose features are used to classify the family the application belongs to; Section 3 describes the considered dataset and shows the effectiveness of the approach through the experimental analysis performed on it; Section 4 analyses and discusses the current state-of-the-art literature and, finally, Section 5 presents the conclusions and future research plans.

https://news.drweb.com/show/review/?i=13991&lng=en

Casolare, R., Iadarola, G., Martinelli, F., Mercaldo, F. and Santone, A.

Mobile Family Detection through Audio Signals Classification.

DOI: 10.5220/0010543504790486

In Proceedings of the 18th International Conference on Security and Cryptography (SECRYPT 2021), pages 479-486

ISBN: 978-989-758-524-1

2 AUDIO SIGNALS FOR MALWARE FAMILY DETECTION

In this section, we present the method we propose for mobile family detection. In a nutshell, we convert an Android application into an audio file and extract a series of numerical values (i.e., the features) from the audio file. The features then represent the input for a (previously trained) supervised classifier aimed at predicting the malware family the application belongs to. In detail, the proposed method consists of two distinct phases: Training (shown in Figure 1) and Testing (shown in Figure 2).

The Training phase, depicted in Figure 1, aims to build a model for malicious family prediction. We start from a malware dataset (composed of malicious Android applications) and the related family labels, i.e., the malicious family of each sample involved. Subsequently, from each Android application (stored in the apk file format) we extract the executable file (i.e., the dex file) containing the binary of the application; all the application resources, for instance images and sounds, are discarded from the analysis. To convert the binary (i.e., the dex file), we consider its bytes as a digitized raw signal, which we then convert into the wav format. In detail, to generate a wav file from a dex file, we developed a function that first generates a wav header; subsequently, the dex file is opened and each of its bytes is written into the wav file. For this task, we resort to the wave module available in Python; in particular, we invoked the open and writeframes methods, the first to open the file and the second to write the wav file. By means of the setparams method, we also set the following parameters for the wav generation: the number of audio channels equal to 1 (mono, i.e., a single channel reproduced equally by the left and right speakers), the sample width set to n bytes with n = 1, the frame rate set to 32768, the number of frames set to 0, and no compression.
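The byte-to-audio conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `extract_dex` and `dex_to_wav` are hypothetical, and the sketch assumes the bytecode sits in a single `classes.dex` entry at the root of the apk (multidex apps with `classes2.dex` and further files are not handled). It uses the standard `zipfile` module and the `wave` parameters listed above: one channel, 1-byte samples, 32768 frames per second, 0 initial frames and no compression.

```python
import wave
import zipfile


def extract_dex(apk_path: str) -> bytes:
    """Hypothetical helper: read the main bytecode file from an apk (a zip archive).

    Assumes the bytecode is stored as "classes.dex"; resources and any further
    dex files are ignored.
    """
    with zipfile.ZipFile(apk_path) as apk:
        return apk.read("classes.dex")


def dex_to_wav(dex_bytes: bytes, wav_path: str, frame_rate: int = 32768) -> None:
    """Hypothetical helper: write each dex byte as one 8-bit mono audio frame."""
    with wave.open(wav_path, "wb") as wav:
        # (nchannels, sampwidth, framerate, nframes, comptype, compname).
        # With nframes=0 the module derives the frame count from the data written.
        wav.setparams((1, 1, frame_rate, 0, "NONE", "not compressed"))
        wav.writeframes(dex_bytes)
```

For instance, `dex_to_wav(extract_dex("app.apk"), "app.wav")` turns every 32,768 bytes of bytecode into one second of 8-bit mono audio, so the length of the resulting signal grows linearly with the size of the dex file.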

Once the audio samples related to each Android application in the malicious dataset are obtained, a set of features is computed directly on each audio sample.

In detail, the following features are computed:

• Chromagram: a chromagram representation computed from the waveform, i.e., how the signal energy is distributed over the 12 pitch classes;

• Root Mean Square: this feature (i.e., RMS) is the square root of the mean of the squared sample values, computed for each audio frame of the sound sample under analysis;

• Spectral Centroid: this feature represents the “centre of mass” of the spectrum of a sound sample, obtained as the mean of the audio frequencies weighted by their magnitudes;

• Bandwidth: it measures the spread of the spectrum around the spectral centroid (spectral bandwidth);

• Spectral Rolloff: it is the frequency below which a given percentage of the total spectral energy lies;

• Zero Crossing Rate: it is the rate of sign changes along the audio time series, i.e., the fraction of consecutive sound samples whose sign differs;

• Mel-Frequency Cepstral Coefficients: this feature (i.e., MFCC), consisting of 10 to 20 numerical values, represents the shape of the spectral envelope;

• Poly: the coefficients obtained by fitting an nth-order polynomial to the spectrum;

• Tonnetz: it is computed from the tonal centroid.

https://docs.python.org/3/library/wave.html
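The sketch below computes five of the listed descriptors (RMS, zero crossing rate, spectral centroid, bandwidth and rolloff) with NumPy, as a hedged illustration of what they measure. The paper does not state which library or frame settings were used: the frame length of 2048 samples, hop of 512, periodic Hann window, 85% rolloff threshold and the averaging of each per-frame series into a single value are assumptions, and the function names are hypothetical. Chromagram, MFCC, polynomial and Tonnetz features need more machinery (pitch-class and mel filter banks); an audio library such as librosa provides functions for them.

```python
import wave

import numpy as np


def load_u8_wav(path):
    """Read an 8-bit mono wav and map the unsigned samples 0..255 to [-1, 1)."""
    with wave.open(path, "rb") as w:
        sr = w.getframerate()
        raw = w.readframes(w.getnframes())
    return (np.frombuffer(raw, dtype=np.uint8).astype(np.float64) - 128.0) / 128.0, sr


def frame_signal(y, frame_length=2048, hop_length=512):
    """Cut y into overlapping frames (one per row); zero-pads only signals shorter than a frame."""
    if len(y) < frame_length:
        y = np.pad(y, (0, frame_length - len(y)))
    n_frames = 1 + (len(y) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)[:, None]
    return y[starts + np.arange(frame_length)[None, :]]


def audio_features(y, sr, frame_length=2048, hop_length=512, rolloff_percent=0.85):
    frames = frame_signal(np.asarray(y, dtype=np.float64), frame_length, hop_length)
    # RMS: square root of the mean of the squared samples in each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # ZCR: fraction of adjacent sample pairs whose sign differs.
    neg = np.signbit(frames)
    zcr = np.mean(neg[:, 1:] != neg[:, :-1], axis=1)
    # Magnitude spectrum of each frame, using a periodic Hann window.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_length) / frame_length)
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sr)
    weight = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)  # guards silent frames
    # Spectral centroid: magnitude-weighted mean frequency ("centre of mass").
    centroid = weight @ freqs
    # Bandwidth: magnitude-weighted standard deviation around the centroid.
    bandwidth = np.sqrt(np.sum(weight * (freqs[None, :] - centroid[:, None]) ** 2, axis=1))
    # Rolloff: first frequency bin where cumulative energy reaches rolloff_percent of the total.
    energy = np.cumsum(mag ** 2, axis=1)
    rolloff = freqs[np.argmax(energy >= rolloff_percent * energy[:, -1:], axis=1)]
    per_frame = {"rms": rms, "zero_crossing_rate": zcr, "spectral_centroid": centroid,
                 "spectral_bandwidth": bandwidth, "spectral_rolloff": rolloff}
    # Summarise each per-frame series by its mean, giving one value per feature.
    return {name: float(np.mean(values)) for name, values in per_frame.items()}
```

Given a file produced by the conversion step, `audio_features(*load_u8_wav("app.wav"))` yields one dictionary of feature values per app, which can then become one row of the feature table.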

Once the feature vectors are obtained from the .wav files, we export them to .csv files, where each row contains the feature values of an app under analysis together with the label of the family it belongs to.
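Exporting the feature vectors can be done with Python's `csv` module. In this sketch the function name `write_feature_rows` and the column layout (feature columns in a fixed order, followed by a `family` column) are assumptions; the paper only states that each row holds an app's feature values and its family label.

```python
import csv


def write_feature_rows(out, rows, feature_names):
    """Write a header plus one CSV row per app: feature values, then the family label.

    out: text stream, e.g. open("features.csv", "w", newline="") (hypothetical file name).
    rows: iterable of (features, family) pairs, where features maps name -> value.
    """
    writer = csv.DictWriter(out, fieldnames=[*feature_names, "family"])
    writer.writeheader()
    for features, family in rows:
        # Keep a fixed column order so every row lines up with the header.
        row = {name: features[name] for name in feature_names}
        row["family"] = family
        writer.writerow(row)
```

Reading the file back with `csv.DictReader` gives one dictionary per app, from which the feature matrix and the label vector for training can be assembled.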

Subsequently, we set the parameters of the classification algorithms (i.e., model setting in Figure 1). We adopt four different supervised classification algorithms; in detail, we evaluate the effectiveness of the following models:

• Stochastic Gradient Descent (SGD): a linear model whose chosen loss function is minimized by stochastic gradient descent. The algorithm approximates the true gradient by considering one sample at a time and updates the model at each step based on the gradient of the loss function on that sample;

• Random Forest: an ensemble classifier obtained by bagging decision trees, typically consisting of hundreds or thousands of them. It belongs to the ensemble learning algorithms, i.e., algorithms that combine multiple learners to obtain more accurate predictions. The appropriate number of trees depends on the nature of the training set and on other parameters such as the number of classes, the number of bins and the maximum depth;


Figure 1: Training.

• MLP 1: a model of computation inspired by biological neural networks. A neural network is an interconnection of a group of nodes called neurons. An artificial neural network receives external signals on a layer of input nodes (processing units), each of which is connected with numerous internal nodes organized in several layers. Each node processes the received signals and transmits the result to subsequent nodes. In detail, we exploit a multi-layer perceptron (MLP), a class of feedforward artificial neural network trained with backpropagation. We consider two different networks exploiting the MLP architecture: the MLP 1 model consists of three layers of nodes, i.e., an input layer, a hidden layer and an output layer. This model uses 100 neurons in the hidden layer, with the ReLU activation function and the Adam solver;

• MLP 2: like MLP 1, this model is based on the MLP algorithm with backpropagation. Unlike MLP 1, which has only one hidden layer, MLP 2 is composed of an input layer, three hidden layers of 100 neurons each, and an output layer.
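Assuming a scikit-learn implementation (the paper does not name its library), the four models could be configured as below. Only the hidden-layer structure and, for MLP 1, the ReLU activation and Adam solver come from the descriptions above; reusing them for MLP 2, the 100 trees of the Random Forest, the feature standardisation in front of the gradient-based models and every other hyperparameter are assumptions (mostly library defaults). `build_models` is a hypothetical helper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models(random_state=0):
    """Hypothetical helper returning the four classifiers, keyed by the names used in Table 2."""
    return {
        # Linear classifier trained with stochastic gradient descent (default hinge loss).
        "SGD": make_pipeline(StandardScaler(), SGDClassifier(random_state=random_state)),
        # Bagged decision trees; 100 trees is an assumed value (the library default).
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
        # Input layer, one hidden layer of 100 ReLU neurons, output layer; Adam solver.
        "MLP 1": make_pipeline(StandardScaler(), MLPClassifier(
            hidden_layer_sizes=(100,), activation="relu", solver="adam",
            random_state=random_state)),
        # Three hidden layers of 100 neurons each; activation and solver assumed as in MLP 1.
        "MLP 2": make_pipeline(StandardScaler(), MLPClassifier(
            hidden_layer_sizes=(100, 100, 100), activation="relu", solver="adam",
            random_state=random_state)),
    }
```

Standardising the features matters for SGD and the MLPs, whose gradient steps are sensitive to feature scale, whereas tree ensembles such as the Random Forest are insensitive to it.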

Four different models are considered for conclusion validity, i.e., to demonstrate that the proposed feature set, obtained from audio samples, can be effective in discriminating among different malicious families.

Once the predictive model is built, its effectiveness is evaluated in the Testing phase, shown in Figure 2.

The Testing phase evaluates the effectiveness of the model built in the Training phase. For this reason, for an application not used in the model generation (i.e., app under analysis in Figure 2), the dex file is extracted from the apk and converted into an audio sample. The features are then extracted from the audio sample and used as input for the model, which generates a prediction (i.e., malicious family in Figure 2).
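The Testing phase can be illustrated by holding out part of the labelled feature table, training on the rest and predicting the family of each held-out app. The 80/20 stratified split and the choice of a Random Forest here are assumptions made for illustration (the paper does not describe its split or validation scheme), and `train_and_predict` is a hypothetical helper built on scikit-learn's `train_test_split`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_and_predict(X, y, test_size=0.2, random_state=0):
    """Train on a stratified split and predict the families of the held-out apps."""
    # Stratifying keeps the family proportions similar in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)
    model = RandomForestClassifier(random_state=random_state).fit(X_train, y_train)
    # The held-out apps were never seen during training.
    return model.predict(X_test), y_test
```

Comparing the returned predictions with the true labels is what the metrics of the classification analysis summarise.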

3 STUDY DESIGN AND EXPERIMENTAL ANALYSIS

We designed a study composed of two steps: the first one is the descriptive statistics, aimed at providing a graphical overview of the feature value distributions for all the involved families, and the second one is the classification analysis, devoted to confirming the effectiveness of the proposed model for the mobile family detection task.

With regard to the descriptive statistics, we exploit boxplots, a method for graphically depicting groups of numerical data through their quartiles, to display the variation in samples of a statistical population without making any assumptions about the underlying statistical distribution.

The classification analysis aims to compute a set of well-known metrics that provide a numerical measurement of the performance of the proposed models.

3.1 The Real-world Dataset

As stated in the introduction, we consider a real-world dataset composed of 4746 malicious Android applications belonging to 10 different families, as shown in Table 1.

The dataset considered in the experiment was gathered from three different repositories. The first one is the Drebin dataset (Arp et al., 2014; Michael et al., 2013), a very well-known collection of malware widely used by malware analysis researchers, including the most widespread Android families; it is freely available for research purposes. The second malware repository is Contagio Mobile, a website containing malicious samples with the related technical reports about their malicious behaviour. The third malicious repository we exploited is the Android Malware repository (AMD) (Iadarola et al., 2020).

https://www.sec.cs.tu-bs.de/~danarp/drebin/

http://contagiominidump.blogspot.com/


Figure 2: Testing.

Table 1: The real-world malicious dataset involved in the study (Inst.: installation method, S = standalone, R = repackaging, U = update attack; #: number of samples).

Family | Description | Inst. | #
accutrack | it tracks down the GPS location of the device on which it was installed | R | 500
airpush | it aggressively pushes advertising content to the device’s notification bar | R | 500
basebridge | it sends SMS and personal information | R, U | 600
droidkungfu | it uses exploits in its attempt to root a device to install other applications | R | 667
fakeinstaller | it sends SMS messages to premium-rate services | S | 606
hummingbad | it establishes a persistent rootkit and installs fraudulent applications | R | 500
judy | an auto-clicking adware relying on the communication with its C&C server | R | 84
opfake | it hides its presence by installing the Opera browser and can monitor SMS | S | 610
overlay | a fake bank application using the overlay technique to steal user credentials | U | 56
plankton | it installs a JAR file obtained from an external server | U | 623

The considered malware dataset consists of 10 Android malicious families characterized by different installation methods (column Inst. in Table 1): (i) standalone (i.e., S in Table 1), applications that intentionally include malicious functionalities; (ii) repackaging (i.e., R in Table 1), known and common (legitimate) applications that are first disassembled, then injected with the malicious payload, and finally reassembled and distributed as a new version of the original application; and (iii) update attack (i.e., U in Table 1), applications that initially do not show harmful behaviour and download an update containing the malicious payload at runtime.

In detail, the basebridge, droidkungfu, fakeinstaller, opfake and plankton families were obtained from the Drebin dataset, and the hummingbad, judy and overlay ones from the Contagio Mobile website, while the accutrack and airpush families were gathered from the AMD dataset.

The malware dataset is also partitioned according to malware family (Zhou and Jiang, 2012).

We analysed the dataset with the VirusTotal service, a web service able to run 61 commercial and free antimalware engines: this analysis confirmed that the malicious applications were actually recognized as malware.

Table 1 also reports the number of samples considered for each malware family (i.e., column # in Table 1).

https://www.virustotal.com/

For each application of the dataset, we obtained the audio sample and the feature set with the procedure explained in the previous section.

3.2 Descriptive Statistics

Figure 3 shows the box-plots related to the spectral centroid feature. For reasons of space we show only this plot, but similar considerations can be made for the remaining features.

In Figure 3 each boxplot relates to a single family. Above each boxplot we indicate the family name and the median value, while below each boxplot we report, from left to right, the value of the first quartile, the average and the value of the third quartile.

From the Spectral Centroid boxplot in Figure 3, it emerges that the values of this feature for the applications in the malicious dataset span different ranges. For instance, the values for the accutrack family lie in a narrower range than those of the airpush family. A similar trend is exhibited by the droidkungfu, plankton and hummingbad families. For this reason, this analysis suggests that the spectral centroid alone may not be useful for discriminating among these families. Conversely, the overlay family assumes values whose first quartile is greater than the third quartile of all the other families, making this feature highly discriminative for identifying this family. For the judy family, the values between the first quartile and the average overlap with those of other families, but from the average up to the third quartile there is no overlap with any other family; for this reason, the feature can be considered discriminating enough to distinguish this family from the others. The opfake family boxplot is also of interest: there is only a slight overlap between some values near its third quartile and the first quartile of the remaining families.

Figure 3: Box-plots for the Spectral Centroid feature.

Clearly, the less the boxplots of the different families overlap, the higher the probability that the models will correctly discriminate among them. This visual analysis shows that this feature can actually help to distinguish some families from others but, as noted above, for some families there is overlap. This is the reason why we consider a set of features, in order to increase the discrimination capability.
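The quartile reasoning used in this section can be expressed in a few lines of standard-library Python. `box_summary` returns the values annotated in Figure 3, and `separated_above` checks the condition noted for the overlay family (its first quartile above every other family's third quartile). Both helpers are hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which may interpolate quartiles slightly differently from the plotting tool used for Figure 3.

```python
import statistics


def box_summary(values):
    """Q1, median, mean and Q3 of one family's feature values."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {"q1": q1, "median": median, "mean": statistics.fmean(values), "q3": q3}


def separated_above(family, values_by_family):
    """True if the family's Q1 exceeds the Q3 of every other family (no box overlap)."""
    q1 = box_summary(values_by_family[family])["q1"]
    return all(q1 > box_summary(values)["q3"]
               for name, values in values_by_family.items() if name != family)
```

A check of this kind only captures the interquartile boxes; whiskers and outliers can still overlap, which is one reason a single feature rarely separates all families.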

3.3 Classification Analysis

With regard to the classification analysis, four different metrics are exploited to measure the effectiveness of the proposed method in Android family detection: Precision, Recall, F-Measure and Accuracy.
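As a worked illustration of the four metrics, the sketch below derives them from a confusion matrix (rows are true families, columns are predicted families). The paper does not state how per-family scores are averaged, so macro averaging is an assumption here; for accuracy, the sketch reports both the overall fraction of correctly labelled samples and the mean of the per-family one-vs-rest accuracies, since the text speaks of an average accuracy. The function name is hypothetical.

```python
def metrics_from_confusion(cm):
    """cm[i][j] counts samples of true family i predicted as family j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    precision, recall, f_measure, ovr_accuracy = [], [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum = TP + FP
        actual = sum(cm[c])                           # row sum = TP + FN
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precision.append(p)
        recall.append(r)
        f_measure.append(2 * p * r / (p + r) if p + r else 0.0)
        fp, fn = predicted - tp, actual - tp
        # One-vs-rest accuracy for family c: (TP + TN) / total.
        ovr_accuracy.append((total - fp - fn) / total)
    return {
        "precision": sum(precision) / k,  # macro average (assumed)
        "recall": sum(recall) / k,
        "f_measure": sum(f_measure) / k,
        "accuracy": sum(cm[c][c] for c in range(k)) / total,
        "mean_one_vs_rest_accuracy": sum(ovr_accuracy) / k,
    }
```

Because each misclassified sample counts as an error in exactly two of the k one-vs-rest problems, the mean one-vs-rest accuracy equals 1 − 2(1 − a)/k for overall accuracy a, so with many families it is noticeably higher than the overall accuracy.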

Table 2 shows the classification results.

As emerges from the results in Table 2, the model obtaining the best performance is MLP 2, with an average accuracy for family identification equal to 0.988.

The MLP 1 and Random Forest classification algorithms also obtain interesting performance, with an average accuracy equal to 0.974 for MLP 1 and to 0.986 for Random Forest.

Table 2: Classification results.

Model | Precision | Recall | F-Measure | Accuracy
SGD | 0.583 | 0.611 | 0.586 | 0.779
MLP 1 | 0.831 | 0.833 | 0.831 | 0.974
Random Forest | 0.905 | 0.905 | 0.905 | 0.986
MLP 2 | 0.907 | 0.907 | 0.907 | 0.988

In Figure 4 we show the ROC curve for the accutrack family. The ROC curve is created by plotting the True Positive Rate (TPR, the fraction of actual positives correctly classified as positive, TP/(TP+FN)) against the False Positive Rate (FPR, the fraction of actual negatives incorrectly classified as positive, FP/(FP+TN)) at various threshold settings.

In Figure 4, the green line refers to the Random Forest algorithm, the orange one to the MLP 1 model, the purple one to the SGD model and the pink one to the MLP 2 network.

As shown by the ROC areas in Figure 4, with the exception of the SGD model, the remaining models exhibit equally good performance.
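To make the TPR and FPR definitions concrete, the sketch below builds a one-vs-rest ROC curve for a single family from per-sample scores and measures its area with the trapezoidal rule. How the paper obtained per-family scores from each model is not stated, and the helper names are hypothetical; samples with tied scores are moved past the threshold together.

```python
def roc_points(scores, is_positive):
    """One-vs-rest ROC points (FPR, TPR) for a single family.

    scores: per-sample score for the family (higher = more likely to belong to it).
    is_positive: True where the sample really belongs to the family.
    Requires at least one positive and one negative sample.
    """
    pos = sum(1 for p in is_positive if p)
    neg = len(is_positive) - pos
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, positive) in enumerate(ranked):
        if positive:
            tp += 1  # becomes a true positive once the threshold drops to this score
        else:
            fp += 1  # becomes a false positive
        # Lower the threshold past all samples sharing this score before emitting a point.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area of 1.0 means every sample of the family is scored above every other sample, while 0.5 corresponds to random ranking.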

Starting from these results, we focus our analysis on the model obtaining the best results in the classification analysis, i.e., the deep learning one (MLP 2 in Table 2). To understand the performance of the MLP 2 model at the family level, Figure 5 shows its confusion matrix.

In general, all the families are correctly detected as belonging to the right malicious family. We highlight 67 droidkungfu samples (out of a total of 667 samples of this family) erroneously detected as belonging to the accutrack family, and 85 accutrack samples (out of the 500 samples of this family) predicted as belonging to the droidkungfu one. These two cases represent the main misclassifications. This aspect is also visible from the descriptive analysis, where in the boxplots shown in Figure 3 we highlighted the overlap between the droidkungfu and the hummingbad malware families.

4 RELATED WORK

The proposed technique relies on the active research branch that studies signal features to detect malware. In this paper we exploit the audio signal, but other approaches have been proposed in the literature, based on texture (Nataraj et al., 2011), network (Kim et al., 2018) or behavioural feature analysis (Popli and Girdhar, 2019).

The paper (Farrokhmanesh and Hamzeh, 2019) proposes a similar technique, which extracts the program's bytes and converts them into an audio signal. In detail, the bytes of executable files are converted into musical notes (MIDI notes) and audio files are then generated. Audio features such as MFCC and Chromagram are then extracted, as in music classification, and a machine learning classifier (KNN) is applied.

The main issue of such approaches concerns the size of the byte sequences to analyse: most of the time, the required static analyses are time-consuming and computationally expensive. The paper (Bakhshinejad and Hamzeh, 2017) presents an approach that tries to mitigate this problem by applying compression algorithms to the sequences under study. Similarly, the approach proposed by Jerome et al. (Jerome et al., 2014) works directly on opcode sequences, extracting k-grams and classifying the malware with an SVM.

The method presented in (Vasan et al., 2020) converts malware binaries into colour images and then uses a CNN model, pre-trained on the ImageNet dataset, to distinguish between malware and benign samples. The approach proposed in (Iadarola et al., 2021) applies a similar methodology but improves the robustness of the classification by also analysing the inference phase through Grad-CAM.

The approach proposed in (Azab and Khasawneh, 2020) exploits both audio signal processing and image classification techniques. The program's bytes are first cast to audio signals, and a Fourier transform is then applied to convert the signal from the time domain to the frequency domain and generate spectrograms. Finally, the spectrograms are analysed by a Convolutional Neural Network as in a standard image classification task.

5 CONCLUSION AND FUTURE WORK

Mobile malware is continuously plaguing users, who are unaware of malicious behaviour that silently performs harmful actions, for instance sending sensitive and private information (as the samples belonging to the accutrack family do) or installing undesired apps (a behaviour exhibited by the droidkungfu and hummingbad families). In this paper we propose a technique for classifying mobile malware into the malicious family it belongs to. In detail, we propose the analysis of an audio stream, obtained from the Android application under analysis, to extract a set of numerical features. These features are the input for several machine learning classifiers that we evaluate on more than 4500 malware samples targeting the Android environment. We obtain an accuracy of 0.988 using a deep learning model designed by the authors, showing that the proposed method can be effective for Android malware family detection. As future work, we plan to localise, inside the audio wave, the frames related to the malicious behaviour. Moreover, we will evaluate the proposed method on a dataset composed of iOS applications.

Figure 4: ROC curve for the accutrack family.

Figure 5: Confusion Matrix for the MLP 2 model.

ACKNOWLEDGEMENTS

This work has been partially supported by MIUR - SecureOpenNets, EU SPARTA, CyberSANE and E-CORRIDOR projects.

REFERENCES

Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., and Siemens, C. (2014). Drebin: Effective and explainable detection of Android malware in your pocket. In NDSS, volume 14, pages 23–26.

Azab, A. and Khasawneh, M. (2020). MSIC: malware spectrogram image classification. IEEE Access, 8:102007–102021.

Bakhshinejad, N. and Hamzeh, A. (2017). A new compression based method for Android malware detection using opcodes. In 2017 Artificial Intelligence and Signal Processing Conference (AISP), pages 256–261. IEEE.

Canfora, G., Martinelli, F., Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A. (2018). LEILA: formal tool for identifying mobile malicious behaviour. IEEE Transactions on Software Engineering, 45(12):1230–1252.

Enck, W., Gilbert, P., Han, S., Tendulkar, V., Chun, B.-G., Cox, L. P., Jung, J., McDaniel, P., and Sheth, A. N. (2014). TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS), 32(2):1–29.

Farrokhmanesh, M. and Hamzeh, A. (2019). Music classification as a new approach for malware detection. Journal of Computer Virology and Hacking Techniques, 15(2):77–96.

Iadarola, G., Martinelli, F., Mercaldo, F., and Santone, A. (2020). Evaluating deep learning classification reliability in Android malware family detection. In 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 255–260. IEEE.

Iadarola, G., Martinelli, F., Mercaldo, F., and Santone, A. (2021). Towards an interpretable deep learning model for mobile malware detection and family identification. Computers & Security, page 102198.

Jerome, Q., Allix, K., State, R., and Engel, T. (2014). Using opcode-sequences to detect malicious Android applications. In 2014 IEEE International Conference on Communications (ICC), pages 914–919.

Kim, H. M., Song, H. M., Seo, J. W., and Kim, H. K. (2018). Andro-SimNet: Android malware family classification using social network analysis. In 2018 16th Annual Conference on Privacy, Security and Trust (PST), pages 1–8. IEEE.

Michael, S., Florian, E., Thomas, S., Felix, C. F., and Hoffmann, J. (2013). Mobile-Sandbox: Looking deeper into Android applications. In Proceedings of the 28th International ACM Symposium on Applied Computing (SAC).

Nataraj, L., Karthikeyan, S., Jacob, G., and Manjunath, B. S. (2011). Malware images: visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, pages 1–7.

Nguyen, T., Mcdonald, J., Glisson, W., and Andel, T. (2020). Detecting repackaged Android applications using perceptual hashing. In Proceedings of the 53rd Hawaii International Conference on System Sciences.

Popli, N. K. and Girdhar, A. (2019). Behavioural analysis of recent ransomwares and prediction of future attacks by polymorphic and metamorphic ransomware. In Computational Intelligence: Theories, Applications and Future Directions-Volume II, pages 65–80. Springer.

Vasan, D., Alazab, M., Wassan, S., Naeem, H., Safaei, B., and Zheng, Q. (2020). IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Computer Networks, 171:107138.

Zhou, Y. and Jiang, X. (2012). Dissecting Android malware: Characterization and evolution. In Proceedings of the 33rd IEEE Symposium on Security and Privacy (Oakland 2012).
