Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly

Claudio Curto; Daniela Giordano; Daniel Indelicato

Topics: Artificial Intelligence for Security and Privacy; Machine Learning Security and Privacy; Software Security

In Proceedings of the 22nd International Conference on Security and Cryptography - SECRYPT, Volume 1, pages 355-362, 2025, Bilbao, Spain

Authors: Claudio Curto ¹, Daniela Giordano ¹ and Daniel Indelicato ²

Affiliations: ¹ Department of Electrical, Electronic and Computer Engineering (DIEEI), University of Catania, Catania, Italy; ² EtnaHitech S.c.p.A., Darwin Technologies S.r.l., Catania, Italy

Keyword(s): Vulnerable Code Datasets, Vulnerability Detection, Deep Learning, Data Analysis.

Abstract: Recent years have witnessed growing interest in applying deep learning techniques to software security assessment, particularly for detecting vulnerability patterns in human-generated source code. Despite advances, the effectiveness of deep learning models is often hindered by limitations in the datasets used for training. This study conducts a comprehensive evaluation of one widely used and two recently released C/C++ real-world vulnerable code datasets to assess their impact on the performance of transformer-based models, focusing on generalization across unseen projects, unseen vulnerability types and diverse data distributions. In addition, we analyze the effects of aggregating datasets and compare the results with previous experiments. Experimental results demonstrate that combining datasets significantly improves model generalization across varied distributions, highlighting the importance of diverse, high-quality data for enhancing vulnerability detection in source code.

CC BY-NC-ND 4.0

Paper citation in several formats:

Curto, C., Giordano, D. and Indelicato, D. (2025). Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly. In Proceedings of the 22nd International Conference on Security and Cryptography - SECRYPT; ISBN 978-989-758-760-3; ISSN 2184-7711, SciTePress, pages 355-362. DOI: 10.5220/0013495200003979

@conference{secrypt25,
author={Claudio Curto and Daniela Giordano and Daniel Indelicato},
title={Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly},
booktitle={Proceedings of the 22nd International Conference on Security and Cryptography - SECRYPT},
year={2025},
pages={355-362},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013495200003979},
isbn={978-989-758-760-3},
issn={2184-7711},
}

TY - CONF
JO - Proceedings of the 22nd International Conference on Security and Cryptography - SECRYPT
TI - Addressing the C/C++ Vulnerability Datasets Limitation: The Good, the Bad and the Ugly
SN - 978-989-758-760-3
IS - 2184-7711
AU - Curto, C.
AU - Giordano, D.
AU - Indelicato, D.
PY - 2025
SP - 355
EP - 362
DO - 10.5220/0013495200003979
PB - SciTePress