Improving Classification of Malware Families using Learning a Distance Metric

Martin Jureček, Olha Jurečková, Róbert Lórencz

Abstract

The objective of malware family classification is to assign a tested sample to the correct malware family. This paper concerns the application of selected state-of-the-art distance metric learning techniques to malware family classification. The goal of distance metric learning algorithms is to find the most appropriate distance metric parameters with respect to some optimization criterion. The distance metric learning algorithms considered in our research learn from metadata, mostly contained in the headers of executable files in the PE file format. Several experiments have been conducted on a dataset of 14,000 samples consisting of six prevalent malware families and benign files. The experimental results showed that the average precision and recall of the k-Nearest Neighbors algorithm improved significantly when it used a distance learned on training data instead of a non-learned distance. The k-Nearest Neighbors classifier using the Mahalanobis distance metric learned by the Metric Learning for Kernel Regression method achieved an average precision and an average recall of 97.04% each. By comparison, Random Forest, which achieved the best classification results among the other state-of-the-art ML algorithms considered in our experiments, reached 96.44% average precision and 96.41% average recall.
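The core idea of the abstract, k-Nearest Neighbors classification under a Mahalanobis distance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-feature vectors, the family labels "A" and "B", and the matrix `L` are hypothetical toy values, and `L` is fixed by hand rather than learned. A Mahalanobis matrix M = LᵀL gives d_M(x, y) = sqrt((x - y)ᵀ M (x - y)), which equals the Euclidean distance after mapping every sample through L. A metric learning method such as Metric Learning for Kernel Regression would fit L on training data; setting L to the identity recovers the plain, non-learned Euclidean distance.

```python
from collections import Counter
from math import sqrt


def mahalanobis(x, y, L):
    """Return d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L.

    Computed as the Euclidean norm of L (x - y), i.e. the Euclidean
    distance between the transformed points L x and L y.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Each row of L produces one coordinate of the transformed difference.
    projected = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return sqrt(sum(p * p for p in projected))


def knn_predict(query, train_X, train_y, L, k=3):
    """Majority vote among the k training samples closest to `query` under d_M."""
    ranked = sorted(range(len(train_X)),
                    key=lambda i: mahalanobis(query, train_X[i], L))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the families,
# feature 1 has a large scale but carries no family information.
X_train = [[0.0, 0.0], [0.0, 10.0], [1.0, 5.0], [1.0, 20.0]]
y_train = ["A", "A", "B", "B"]
query = [0.9, 0.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # non-learned (Euclidean) distance
L_hand = [[1.0, 0.0], [0.0, 0.01]]    # hypothetical stand-in for a learned L

print(knn_predict(query, X_train, y_train, identity))  # Euclidean k-NN
print(knn_predict(query, X_train, y_train, L_hand))    # Mahalanobis k-NN
```

With the identity matrix, the large-scale but uninformative second feature dominates the distance and the query is assigned to family A; shrinking that feature's weight lets the informative first feature decide, and the query is assigned to family B. This is the kind of effect a learned metric is meant to capture automatically.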



Paper Citation


in Harvard Style

Jureček M., Jurečková O. and Lórencz R. (2021). Improving Classification of Malware Families using Learning a Distance Metric. In Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-491-6, pages 643-652. DOI: 10.5220/0010326306430652


in Bibtex Style

@conference{icissp21,
author={Martin Jureček and Olha Jurečková and Róbert Lórencz},
title={Improving Classification of Malware Families using Learning a Distance Metric},
booktitle={Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP},
year={2021},
pages={643-652},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010326306430652},
isbn={978-989-758-491-6},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP
TI - Improving Classification of Malware Families using Learning a Distance Metric
SN - 978-989-758-491-6
AU - Jureček M.
AU - Jurečková O.
AU - Lórencz R.
PY - 2021
SP - 643
EP - 652
DO - 10.5220/0010326306430652