Authors: Claudio Curto 1, Daniela Giordano 1 and Daniel Indelicato 2

Affiliations: 1 Department of Electrical, Electronic and Computer Engineering (DIEEI), University of Catania, Catania, Italy; 2 EtnaHitech S.c.p.A., Darwin Technologies S.r.l., Catania, Italy

Keyword(s): Vulnerable Code Datasets, Vulnerability Detection, Deep Learning, Data Analysis.

Abstract: Recent years have witnessed growing interest in applying deep learning techniques to software security assessment, particularly for detecting vulnerability patterns in human-generated source code. Despite advances, the effectiveness of deep learning models is often hindered by limitations in the datasets used for training. This study conducts a comprehensive evaluation of one widely used and two recently released C/C++ real-world vulnerable code datasets to assess their impact on the performance of transformer-based models, focusing on generalization across unseen projects, unseen vulnerability types and diverse data distributions. In addition, we analyze the effects of aggregating datasets and compare the results with previous experiments. Experimental results demonstrate that combining datasets significantly improves model generalization across varied distributions, highlighting the importance of diverse, high-quality data for enhancing vulnerability detection in source code.

CC BY-NC-ND 4.0


Paper citation in several formats:
Curto, C., Giordano, D. and Indelicato, D. (2025). Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly. In Proceedings of the 22nd International Conference on Security and Cryptography - SECRYPT; ISBN 978-989-758-760-3; ISSN 2184-7711, SciTePress, pages 355-362. DOI: 10.5220/0013495200003979

@conference{secrypt25,
author={Claudio Curto and Daniela Giordano and Daniel Indelicato},
title={Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly},
booktitle={Proceedings of the 22nd International Conference on Security and Cryptography - SECRYPT},
year={2025},
pages={355-362},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013495200003979},
isbn={978-989-758-760-3},
issn={2184-7711},
}

TY - CONF

JO - Proceedings of the 22nd International Conference on Security and Cryptography - SECRYPT
TI - Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly
SN - 978-989-758-760-3
IS - 2184-7711
AU - Curto, C.
AU - Giordano, D.
AU - Indelicato, D.
PY - 2025
SP - 355
EP - 362
DO - 10.5220/0013495200003979
PB - SciTePress