Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly

Claudio Curto, Daniela Giordano, Daniel Indelicato

2025

Abstract

Recent years have witnessed growing interest in applying deep learning techniques to software security assessment, particularly for detecting vulnerability patterns in human-generated source code. Despite advances, the effectiveness of deep learning models is often hindered by limitations in the datasets used for training. This study conducts a comprehensive evaluation of one widely used and two recently released C/C++ real-world vulnerable code datasets to assess their impact on the performance of transformer-based models, focusing on generalization across unseen projects, unseen vulnerability types and diverse data distributions. In addition, we analyze the effects of aggregating datasets and compare the results with previous experiments. Experimental results demonstrate that combining datasets significantly improves model generalization across varied distributions, highlighting the importance of diverse, high-quality data for enhancing vulnerability detection in source code.

Paper Citation


in Harvard Style

Curto C., Giordano D. and Indelicato D. (2025). Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly. In Proceedings of the 22nd International Conference on Security and Cryptography - Volume 1: SECRYPT; ISBN 978-989-758-760-3, SciTePress, pages 355-362. DOI: 10.5220/0013495200003979


in Bibtex Style

@conference{secrypt25,
author={Claudio Curto and Daniela Giordano and Daniel Indelicato},
title={Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly},
booktitle={Proceedings of the 22nd International Conference on Security and Cryptography - Volume 1: SECRYPT},
year={2025},
pages={355-362},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013495200003979},
isbn={978-989-758-760-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 22nd International Conference on Security and Cryptography - Volume 1: SECRYPT
TI - Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly
SN - 978-989-758-760-3
AU - Curto C.
AU - Giordano D.
AU - Indelicato D.
PY - 2025
SP - 355
EP - 362
DO - 10.5220/0013495200003979
PB - SciTePress