Systematic Characterization of a Sequence Group

Paul Irolla

2019

Abstract

Finding similarities in a group of sequences often involves studying their common subsequences or their common substrings. In our case, Android malware detection/classification, we study the event sequences coming from the dynamic analysis of applications. For several reasons, these sequences are composed mostly of benign events. This specific setup makes classic sequence similarity criteria useless without machine learning. A sequence's membership in a group is characterized by subsequences of any length. Heuristic algorithms for extracting short subsequences already exist, but no attempt to solve the problem systematically has been proposed. We propose a new algorithm for building the Embedding Antichain from the set of common subsequences (noted AΓ). We show that this mathematical representation is very compact and embeds all common subsequences of a sequence set. It is a tool for characterizing a group of sequences. The construction of this representation reveals several complex subproblems. A few of them are solved in this article, along with practical implementations. Moreover, we solved different reduced problems and provided suboptimal solutions for the others. This article opens a new path that has cross-domain applications. Specifically, in the malware detection/classification domain, the Systematic Characterization of Sequence Groups is a tool that can be used for the automatic generation of malware family signatures and detection heuristics. We experimented with AΓ to build an Android malware family detector on the sequences of executed Android API calls, and it yields an accuracy of 97.74%.


Paper Citation


in Harvard Style

Irolla P. (2019). Systematic Characterization of a Sequence Group. In Proceedings of the 5th International Conference on Information Systems Security and Privacy - Volume 1: ForSE, ISBN 978-989-758-359-9, pages 645-656. DOI: 10.5220/0007349706450656


in Bibtex Style

@conference{forse19,
author={Paul Irolla},
title={Systematic Characterization of a Sequence Group},
booktitle={Proceedings of the 5th International Conference on Information Systems Security and Privacy - Volume 1: ForSE},
year={2019},
pages={645-656},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007349706450656},
isbn={978-989-758-359-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 5th International Conference on Information Systems Security and Privacy - Volume 1: ForSE
TI - Systematic Characterization of a Sequence Group
SN - 978-989-758-359-9
AU - Irolla P.
PY - 2019
SP - 645
EP - 656
DO - 10.5220/0007349706450656