Creating and Analyzing Source Code Repository Models - A Model-based Approach to Mining Software Repositories

Markus Scheidgen, Martin Smidt, Joachim Fischer

Abstract

Mining software repositories (MSR) analyzes the rich data created over the whole evolution of one or more software projects. One major obstacle in MSR is the heterogeneity and complexity of source code as a data source. Model-based technology in general, and reverse engineering in particular, lets us use abstraction to overcome this obstacle. This raises a new question, however: can existing reverse engineering frameworks, which were designed to create models from a single revision of a software system, be applied to analyze all revisions of such a system at once? This paper presents a framework that combines EMF, the reverse engineering framework MoDisco, a NoSQL-based model persistence framework, and OCL-like expressions to create and analyze fully resolved AST-level model representations of whole source code repositories. We evaluated the feasibility of this approach with a series of experiments on the Eclipse codebase.


Paper Citation


in Harvard Style

Scheidgen M., Smidt M. and Fischer J. (2017). Creating and Analyzing Source Code Repository Models - A Model-based Approach to Mining Software Repositories. In Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development - Volume 1: MODELSWARD, ISBN 978-989-758-210-3, pages 329-336. DOI: 10.5220/0006127303290336


in Bibtex Style

@conference{modelsward17,
author={Markus Scheidgen and Martin Smidt and Joachim Fischer},
title={Creating and Analyzing Source Code Repository Models - A Model-based Approach to Mining Software Repositories},
booktitle={Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development - Volume 1: MODELSWARD},
year={2017},
pages={329-336},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006127303290336},
isbn={978-989-758-210-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development - Volume 1: MODELSWARD
TI - Creating and Analyzing Source Code Repository Models - A Model-based Approach to Mining Software Repositories
SN - 978-989-758-210-3
AU - Scheidgen M.
AU - Smidt M.
AU - Fischer J.
PY - 2017
SP - 329
EP - 336
DO - 10.5220/0006127303290336