ON THE PREDICTABILITY OF SOFTWARE EFFORTS USING MACHINE LEARNING TECHNIQUES

Wen Zhang, Ye Yang, Qing Wang

Abstract

This paper investigates the predictability of software effort using machine learning techniques. We employ unsupervised learning, in the form of k-medoids clustering with different similarity measures, to extract natural clusters of projects from software effort data sets, and supervised learning, in the form of the J48 decision tree, back-propagation neural network (BPNN), and naïve Bayes, to classify the software projects. We also investigate the impact of imputing missing project values on the performance of both the unsupervised and supervised learning techniques. Experiments on the ISBSG and CSBSG data sets demonstrate that k-medoids clustering performs poorly in software effort prediction and that, among the similarity measures compared, the Kulczynski coefficient performs best. The supervised learning techniques produce superior performance in software effort prediction; among the three, BPNN performs best. Missing data imputation improves the performance of both the unsupervised and supervised learning techniques.
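The abstract names the Kulczynski coefficient as the similarity measure but does not define it. The sketch below is an illustration only, assuming that project attributes are encoded as binary vectors and that the second Kulczynski form is meant; neither assumption is confirmed by the abstract. The function names `kulczynski` and `medoid` are hypothetical, and returning 0.0 when two projects share no attribute is a convention chosen here.

```python
def kulczynski(x, y):
    """Second Kulczynski coefficient of two equal-length binary vectors.

    Assumed form: 0.5 * (a / (a + b) + a / (a + c)), where a counts
    attributes set in both vectors, b those set only in x, and c those
    set only in y.
    """
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if q and not p)
    if a == 0:
        # No shared attribute: treat similarity as zero (convention chosen
        # here; it also avoids dividing by zero for all-zero vectors).
        return 0.0
    return 0.5 * (a / (a + b) + a / (a + c))


def medoid(projects):
    """Index of the project most similar, in total, to all the others.

    This is the medoid-update step of k-medoids for a single cluster.
    """
    def total_similarity(i):
        return sum(
            kulczynski(projects[i], other)
            for j, other in enumerate(projects)
            if j != i
        )
    return max(range(len(projects)), key=total_similarity)


# Four hypothetical projects described by four binary attributes.
projects = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
print(kulczynski(projects[0], projects[1]))
print(medoid(projects))
```

A full k-medoids run would alternate this medoid update with reassigning each project to its most similar medoid until the assignment stops changing.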

Paper Citation


in Harvard Style

Zhang W., Yang Y. and Wang Q. (2011). ON THE PREDICTABILITY OF SOFTWARE EFFORTS USING MACHINE LEARNING TECHNIQUES. In Proceedings of the 6th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE, ISBN 978-989-8425-57-7, pages 5-14. DOI: 10.5220/0003408200050014


in Bibtex Style

@conference{enase11,
author={Wen Zhang and Ye Yang and Qing Wang},
title={ON THE PREDICTABILITY OF SOFTWARE EFFORTS USING MACHINE LEARNING TECHNIQUES},
booktitle={Proceedings of the 6th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2011},
pages={5-14},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003408200050014},
isbn={978-989-8425-57-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI - ON THE PREDICTABILITY OF SOFTWARE EFFORTS USING MACHINE LEARNING TECHNIQUES
SN - 978-989-8425-57-7
AU - Zhang W.
AU - Yang Y.
AU - Wang Q.
PY - 2011
SP - 5
EP - 14
DO - 10.5220/0003408200050014