The Comparison of Word Embedding Techniques in RNNs for Vulnerability Detection

Hai Nguyen, Songpon Teerakanok, Atsuo Inomata, Tetsutaro Uehara

2021

Abstract

Many studies have combined deep learning and natural language processing (NLP) techniques in security systems to perform tasks such as bug detection, vulnerability prediction, or classification. Most of these works rely on NLP embedding methods to generate input vectors for the deep learning models. However, many embedding methods exist for encoding software text files into vectors, and the space of possible neural network structures is vast and explored largely by heuristics. This makes it challenging for researchers to choose an appropriate combination of embedding technique and model structure when training vulnerability detection classifiers. To address this, we propose a system to investigate four popular word embedding techniques combined with four different recurrent neural networks (RNNs), including both bidirectional RNNs (BRNNs) and unidirectional RNNs. We trained and evaluated the models on two types of datasets of vulnerable functions written in C. Our results show that the FastText embedding technique combined with BRNNs produced the best detection rate among the combinations tested on a real-world dataset, but not on an artificially produced one. Further experiments on other datasets are necessary to confirm this result.


Paper Citation


in Harvard Style

Nguyen H., Teerakanok S., Inomata A. and Uehara T. (2021). The Comparison of Word Embedding Techniques in RNNs for Vulnerability Detection. In Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-491-6, pages 109-120. DOI: 10.5220/0010232301090120


in Bibtex Style

@conference{icissp21,
author={Hai Nguyen and Songpon Teerakanok and Atsuo Inomata and Tetsutaro Uehara},
title={The Comparison of Word Embedding Techniques in RNNs for Vulnerability Detection},
booktitle={Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP},
year={2021},
pages={109-120},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010232301090120},
isbn={978-989-758-491-6},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP
TI - The Comparison of Word Embedding Techniques in RNNs for Vulnerability Detection
SN - 978-989-758-491-6
AU - Nguyen H.
AU - Teerakanok S.
AU - Inomata A.
AU - Uehara T.
PY - 2021
SP - 109
EP - 120
DO - 10.5220/0010232301090120