A Data-driven Methodology towards Interpreting Readability against Software Properties

Thomas Karanikiotis, Michail Papamichail, Ioannis Gonidelis, Dimitra Karatza, Andreas Symeonidis


In the context of collaborative, agile software development, where effective and efficient software maintenance is of utmost importance, the need to produce readable source code is evident. Towards this direction, several approaches aspire to assess the extent to which a software component is readable. Most of them rely on experts who are responsible for determining the ground truth and/or set custom evaluation criteria, leading to results that are context-dependent and subjective. In this work, we employ a large set of static analysis metrics along with various coding violations towards interpreting readability as perceived by developers. In an effort to provide a fully automated and extendible methodology, we refrain from using experts; rather we harness data residing in online code hosting facilities towards constructing a dataset that includes more than one million methods that cover diverse development scenarios. After performing clustering based on source code size, we employ Support Vector Regression in order to interpret the extent to which a software component is readable on three axes: complexity, coupling, and documentation. Preliminary evaluation on several axes indicates that our approach effectively interprets readability as perceived by developers against the aforementioned three primary source code properties.


Paper Citation