
5 CONCLUSION
This paper presents an exploratory study of security
vulnerabilities in C++ source code from very large
projects. We analyzed twenty-six real-world C++
projects and empirically studied the prevalence of
security vulnerabilities in them. From this exploratory
study we answered four research questions (RQs).
RQ1 - How often do code vulnerabilities occur in real
C++ projects? Some vulnerabilities are considerably
more frequent than others, and some were not found
at all in the selected projects. RQ2 - Are there
significant correlations between pairs of vulnerabilities?
Our results showed that many pairs of vulnerabilities
have high correlation coefficients, above 0.6.
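To make the RQ2 step concrete, the following is a minimal sketch of a pairwise-correlation analysis, assuming per-project vulnerability counts held in a pandas DataFrame; the vulnerability names, the data, and the choice of Spearman's coefficient are illustrative assumptions, not the study's exact setup.

```python
# Minimal sketch (assumed setup): rows are projects, columns are
# counts of each vulnerability type per project.
import pandas as pd

# Hypothetical data: per-project counts for three vulnerability types.
counts = pd.DataFrame(
    {
        "buffer_overflow": [12, 3, 45, 7, 0, 22],
        "use_after_free": [10, 2, 40, 5, 1, 18],
        "integer_overflow": [1, 0, 9, 2, 0, 4],
    }
)

# Spearman rank correlation is a reasonable choice for skewed count
# data; the coefficient actually used in the study may differ.
corr = counts.corr(method="spearman")

# Report pairs whose coefficient exceeds the 0.6 threshold from RQ2.
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1 :]
    if corr.loc[a, b] > 0.6
]
print(pairs)
```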
RQ3 - Are there code vulnerabilities that occur together
in real C++ projects? Applying the Apriori algorithm,
we found several interesting association rules, which
indicate that certain code vulnerabilities do occur
together.
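The RQ3 mining step can be sketched with mlxtend, which the paper cites (Raschka, 2018); the one-hot encoding, the vulnerability names, and the support and confidence thresholds below are assumptions for illustration only.

```python
# Sketch of association-rule mining with mlxtend's Apriori.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot matrix: each row is a project, each column a vulnerability
# type, True if that vulnerability was reported in the project.
onehot = pd.DataFrame(
    {
        "buffer_overflow": [True, False, True, True, False],
        "use_after_free": [True, False, True, True, True],
        "null_dereference": [False, True, True, True, False],
    }
)

# Frequent itemsets with a hypothetical minimum support of 0.4.
itemsets = apriori(onehot, min_support=0.4, use_colnames=True)

# Rules such as {buffer_overflow} -> {use_after_free}, filtered by
# confidence; the study's actual thresholds are not restated here.
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```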
RQ4 - Is it possible to find groups of vulnerabilities
that occur together in real C++ projects? We found two
clusters of code vulnerabilities that tend to occur together.
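For RQ4, here is a hedged sketch of grouping vulnerability types by their co-occurrence across projects with scikit-learn, which the paper cites (Pedregosa et al., 2011); the choice of agglomerative clustering, the occurrence matrix, and n_clusters=2 are assumptions rather than the study's exact configuration.

```python
# Sketch: cluster vulnerability types by their co-occurrence profile.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Rows = vulnerability types, columns = projects; 1 if the type
# occurs in the project. Clustering the rows groups types that
# tend to appear in the same projects.
occurrence = np.array(
    [
        [1, 0, 1, 1, 0],  # buffer_overflow
        [1, 0, 1, 1, 1],  # use_after_free
        [0, 1, 0, 0, 1],  # format_string
        [0, 1, 0, 0, 1],  # integer_overflow
    ]
)

labels = AgglomerativeClustering(n_clusters=2).fit_predict(occurrence)
print(labels)  # e.g., [0, 0, 1, 1]: two groups of co-occurring types
```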
the potential to assist software developers and secu-
rity experts in mitigating future problems during the
development of C++ projects. In future work, we in-
tend to investigate the occurrence of false positives
and negatives in security reports. Moreover, we will
use machine learning techniques to predict code vul-
nerabilities.
ACKNOWLEDGMENTS
This work was partially funded by Lenovo as part of
its R&D investment under the Information Technol-
ogy Law. The authors would like to thank LSBD/UFC
for the partial funding of this research.
REFERENCES
Agrawal, R. and Srikant, R. (1994). Fast algorithms for
mining association rules. In Proceedings of the 20th
International Conference on Very Large Data Bases,
VLDB, pages 487–499.
Alqaradaghi, M. and Kozsik, T. (2024). Comprehensive
evaluation of static analysis tools for their perfor-
mance in finding vulnerabilities in Java code. IEEE
Access.
Bagheri, E. and Gasevic, D. (2011). Assessing the maintain-
ability of software product line feature models using
structural metrics. Software Quality Journal, 19:579–
612.
Berger, T. and Guo, J. (2014). Towards system analysis
with variability model metrics. In Proceedings of the
8th International Workshop on Variability Modelling
of Software-Intensive Systems, pages 1–8.
Do, L. N. Q., Wright, J. R., and Ali, K. (2022). Why do
software developers use static analysis tools? A user-
centered study of developer needs and motivations.
IEEE Transactions on Software Engineering, 48(3):835–847.
Fan, J., Li, Y., Wang, S., and Nguyen, T. N. (2020).
A C/C++ code vulnerability dataset with code changes
and CVE summaries. In Proceedings of the 17th In-
ternational Conference on Mining Software Reposito-
ries, pages 508–512.
Hussain, S., Anwaar, H., Sultan, K., Mahmud, U., Farooqui,
S., Karamat, T., and Toure, I. K. (2024). Mitigating
software vulnerabilities through secure software de-
velopment with a policy-driven waterfall model. Jour-
nal of Engineering, 2024(1):9962691.
Islam, N. T., Karkevandi, M. B., and Najafirad, P. (2024).
Code security vulnerability repair using reinforcement
learning with large language models. arXiv preprint
arXiv:2401.07031.
Li, Z., Zou, D., Xu, S., Jin, H., Zhu, Y., and Chen, Z. (2021).
SySeVR: A framework for using deep learning to detect
software vulnerabilities. IEEE Transactions on De-
pendable and Secure Computing, 19(4):2244–2258.
Ma, L., Yang, H., Xu, J., Yang, Z., Lao, Q., and Yuan, D.
(2022). Code analysis with static application security
testing for Python program. Journal of Signal Pro-
cessing Systems, 94(11):1169–1182.
McCrum-Gardner, E. (2008). Which is the correct statistical
test to use? British Journal of Oral and Maxillofacial
Surgery, 46(1):38–41.
Mehrpour, S. and LaToza, T. D. (2023). Can static analysis
tools find more defects? Empirical Software Engineering, 28(1):5.
Moyo, S. and Mnkandla, E. (2020). A novel lightweight
solo software development methodology with opti-
mum security practices. IEEE Access, 8:33735–
33747.
Nguyen-Duc, A., Do, M. V., Hong, Q. L., Khac, K. N., and
Quang, A. N. (2021). On the adoption of static anal-
ysis for software security assessment: a case study of
an open-source e-government project. Computers &
Security, 111:102470.
Novak, J., Krajnc, A., et al. (2010). Taxonomy of static code
analysis tools. In The 33rd International Convention
MIPRO, pages 418–422. IEEE.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer,
P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,
A., Cournapeau, D., Brucher, M., Perrot, M., and
Duchesnay, E. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research,
12:2825–2830.
Raschka, S. (2018). Mlxtend: Providing machine learning
and data science utilities and extensions to Python's
scientific computing stack. Journal of Open Source
Software, 3(24):638.