The Subtle Art of Digging for Defects: Analyzing Features for Defect Prediction in Java Projects

Geanderson Santos, Adriano Veloso, Eduardo Figueiredo

2022

Abstract

Predicting software defects remains an open research topic in the software engineering and machine learning communities. The literature has proposed numerous machine learning models and software features to anticipate defects in source code, and the emergence of new machine learning approaches has further expanded the possibilities for defect prediction. In this paper, we discuss the results of using a previously applied dataset to predict software defects. The dataset contains 47,618 classes from 53 Java software projects and covers 66 software features related to numerous aspects of the code. In our investigation, we compare eight machine learning models: Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (CART), Random Forest (RF), and Gradient Boosting Machine (GBM). To contrast the models’ performance, we used five evaluation metrics frequently applied in the defect prediction literature. We hope this approach can foster further discussion about benchmark machine learning models for defect prediction.
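The abstract does not name the five evaluation metrics. Purely as a hedged illustration, the sketch below computes several metrics that are common in the defect-prediction literature (precision, recall, F1, Matthews correlation coefficient, and ROC AUC) from a classifier's predicted labels and scores, treating "defective" (label 1) as the positive class. The function name `defect_metrics`, the choice of metrics, and the toy data are assumptions for illustration, not the paper's method or results.

```python
import math
from typing import Dict, Sequence


def defect_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   scores: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: common defect-prediction metrics (1 = defective)."""
    # Confusion-matrix counts with "defective" as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # ROC AUC as the probability that a randomly chosen defective class
    # receives a higher score than a randomly chosen clean class
    # (ties count as one half).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")

    return {"precision": precision, "recall": recall, "f1": f1,
            "mcc": mcc, "auc": auc}


if __name__ == "__main__":
    # Toy example: six classes, three of them defective (hypothetical data).
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(defect_metrics(y_true, y_pred, scores))
```

In a benchmark like the one described, each of the eight models would produce `y_pred` and `scores` on held-out classes, and the resulting metric dictionaries could then be compared side by side. Threshold-free metrics such as AUC and class-balance-aware metrics such as MCC are often reported alongside F1 because defective classes are usually a minority.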



Paper Citation


in Harvard Style

Santos G., Veloso A. and Figueiredo E. (2022). The Subtle Art of Digging for Defects: Analyzing Features for Defect Prediction in Java Projects. In Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE, ISBN 978-989-758-568-5, pages 371-378. DOI: 10.5220/0011045700003176


in Bibtex Style

@conference{enase22,
author={Geanderson Santos and Adriano Veloso and Eduardo Figueiredo},
title={The Subtle Art of Digging for Defects: Analyzing Features for Defect Prediction in Java Projects},
booktitle={Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2022},
pages={371-378},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011045700003176},
isbn={978-989-758-568-5},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI  - The Subtle Art of Digging for Defects: Analyzing Features for Defect Prediction in Java Projects
SN  - 978-989-758-568-5
AU  - Santos G.
AU  - Veloso A.
AU  - Figueiredo E.
PY  - 2022
SP  - 371
EP  - 378
DO  - 10.5220/0011045700003176
ER  - 