CQE: An Approach to Automatically Estimate the Code Quality using an Objective Metric from an Empirical Study
Saima Arif, Miao Wang, Philip Perry and John Murphy
School of Computer Science and Informatics, University College Dublin, Dublin, Ireland
Keywords: Static Analysis, Code Quality, Process Metrics.
Abstract:
Bugs in a project are costly and difficult to find and fix at any stage of the software development life cycle.
Moreover, the later a bug is found, the more expensive it is to fix. Static analysis tools ease the process of
finding bugs, but their results are hard to filter for critical errors and are time consuming to analyze. To address
this problem we take two steps: first, we enhance the severity ranking of the reported bugs, and second, we estimate
the code quality with a Weighted Error Code Density metric. Our experiment on 10 widely used open-source Java
applications shows that their code quality can be estimated automatically using our objective metric. We also enhance
the error ranking of FindBugs, providing a clear view of the critical errors to fix as well as the low priority ones
that can potentially be ignored.
1 INTRODUCTION
Generally, the software development process includes
a development phase followed by a testing phase. The
software development life cycle (SDLC) often iterates
between these two phases, and the interface between
them offers opportunities to reduce the time taken for
the entire SDLC and to enhance the quality of the
software. Software quality is important because it leads
to significant cost (testing) savings in the SDLC
(Boehm et al., 1976). Assessing the quality of software
is largely subjective. In this paper, we explore the
possibility of assessing software quality using an
objective metric. A metric that provides short-interval
feedback to improve a process is well known in the
field of project management, and is highlighted in the
software domain by T. DeMarco's expression "You can't
control what you can't measure" (Daniel, 2004). The
quality metric presented in this paper enables developers
and testers to introduce a feedback loop into the system.
The advantage of this feedback loop is that it reduces
the time taken to develop a product and the cost of
overall system development.
Code quality should be investigated carefully to
uncover potential errors/bugs before handing software
over to testing teams. There are different static
analysis approaches and tools, such as PMD, jlint and
FindBugs, to ensure the quality of code. The goal of
static analysis is to uncover and remove coding problems
that might produce run-time errors, for example
dereferencing a null pointer or overflowing an array
(Ayewah and Pugh, 2009). However, due to the many
variations in coding styles and logic flows, detecting
code errors with 100% accuracy is not always possible.
Here, we focus on the bug reports generated by FindBugs
(FB, http://findbugs.sourceforge.net/findbugs2.html).
In such cases, reports produced by static code analysis
tools such as FB might contain a large number of false
positives (FP), and the generated report needs to be
assessed further by experienced developers (Shen et al., 2011).
The scope of this paper is the use of an objective
metric to rank FB reports and tailor them according
to company requirements. Code quality can be calculated
automatically and is useful in providing feedback to
developers and testers. It is advantageous to know the
code quality prior to any performance evaluation. The
metric can also be applied to compare the code developed
by individuals or by particular teams, which enables
project managers to assemble teams that are known to
produce good quality code.
The research work presented in this paper primarily
focuses on code quality estimation based on software
bugs. Inefficient coding styles and gaps in developer
knowledge introduce bugs during the development process.
Companies try to find and fix bugs in the early phases of the
SDLC, as bugs that are found late are difficult to fix.
1.1 Novel Approach
In this paper, we present the Code Quality Estimator
(CQE), which is built on top of the FB technique.
It offers a more detailed error ranking strategy to
automatically estimate the code quality for an efficient
decision making process. The CQE approach is a three-step
process, which consists of: a) surveying experienced
developers with a large number of FB reports to create
a knowledge base with a list of multi-categorized bugs;
b) applying the knowledge base to enhance the FB report
of a given JAR; c) calculating the quality metric to
automatically estimate the code quality.
The final output of our approach is the measurement
of the "Error code density" (ECD) of a given JAR. To
estimate the code quality, project managers can then
compare the value of ECD against their pre-defined
thresholds; these thresholds can vary between different
organizations or even between different teams within
the same organization. The contributions of our work
are as follows:
- Providing a clear breakdown of the list of bugs with
more detailed categories than FB. This allows users to
effectively identify the most and the least critical
errors and reduce the bug fixing effort.
- Automatically calculating the quality metric of a
given JAR, using a static knowledge base built from an
initial survey process.
- Easing the decision making process to determine the
code quality in a timely manner. This helps to avoid
unnecessary time spent on selecting external JARs as
well as on analyzing the quality of internal JARs.
The rest of the paper is organized as follows: in
Section 2 we present the background knowledge on FB
and its general issues. In Section 3 the quality metrics
are explained. In Section 4 the CQE methodology is
detailed. Section 5 shows our experimental results. In
Section 6 a number of related works are discussed, and
conclusions and future work are drawn in Section 7.
2 BACKGROUND KNOWLEDGE
OF FINDBUGS
FB is a static analysis tool used to obtain information
about bug patterns. These bug patterns are possible
errors in Java code. The overwhelming number of
700,000 downloads (Shen et al., 2011) is an indicator
of its popularity in industrial and research projects.
It was also an essential analyzer for developing Java
programs at Google (Ayewah et al., 2007). It uses
different sets of bug detectors to detect bug patterns.
There are seven categories and 400 bug patterns
associated with these categories. The categories are:
bad practice, correctness, malicious code vulnerability,
multithreaded correctness, performance, security
and dodgy code.
FB generates a report in which the priority of each
error is either "High" or "Medium". These priorities
are hard-coded by the tool developers, so it is possible
that high priority errors have high FP rates, as the
priority is set by the developers according to their own
experience (Kim and Ernst, 2007). A further problem with
the FB report is that it does not provide information
about the frequency of particular error categories or
their patterns, nor does it provide any statistical
results. Such statistical measures would save developers'
time by letting them go through only those categories
and patterns which they want to fix with priority.
Currently, FB does not provide any real quality metric
or mechanism to quickly address these issues, and this
lack of quality estimation makes it difficult to use FB
to pre-justify code quality. It is quite hard to judge
code quality by just looking at reports without any
indication of the frequency of particular errors.
There is a need for an additional assistive system
to enhance the reports generated by FB and provide
quantitative measures. Some mechanism needs to be
established whereby programmers can assign weights to
different categories of errors in accordance with the
requirements of their applications. Critical errors
should be given higher ranks, as it is important to
work through the higher ranked errors first and to
track the improvement of the code with a quality metric.
3 QUALITY METRICS
Quality is an important aspect of the software
development process, especially for software maintenance
and management, where the availability of a quality
metric can provide an important measurement to support
high-level decision making. In the early stages of the
SDLC, quality metrics have rarely been used. Software
quality metrics can help to measure the deviation of the
actual functionality (quality), time frame and budget
from the plans of a prospective system development
process, and they have been used to compare the predicted
and actual outcome of a system's quality (Daniel, 2004).
Software process quality metrics are classified as
error density and severity. Different types of metrics
CQE-AnApproachtoAutomaticallyEstimatetheCodeQualityusinganObjectiveMetricFromanEmpiricalStudy
199
are used to evaluate error density. Weight assignment
on the basis of error severity helps to classify errors.
A weighted metric can provide a more accurate evaluation
of the error situation. It is obtained by multiplying
the number of errors found in each severity class by the
corresponding relative severity weight and then summing
all the weighted errors. The Weighted Code Error Density
(WCED) metric therefore works as a better indicator of
adverse error situations than a simple Code Error
Density (CED) metric (Daniel, 2004).
In this paper, we use error density metrics. CED is defined as:

CED = NCE / KLOC (1)

where NCE is the number of code errors and KLOC is
thousands of lines of code.
WCED = Σ_i WCE_i / KLOC (2)
Equation (2) is the standard way to calculate WCED:
the sum of the weights given to code errors divided by
KLOC. However, it does not say how to assign weights to
errors according to specific needs. To provide a flexible
method of calculating WCED, we propose the use of a
non-linear function.
WCE_i is the weight of the code errors of a particular
rank and is calculated as:

WCE_i = NCE_i * W_i^α (3)

where NCE_i is the number of code errors of a specific
rank, W_i = (1, 2, 3, 4, 5) is the weight given to rank
(R1-R5), assigned on the basis of error severity with R5
the highest rank, and α is an exponent value used to
emphasize the importance of severe errors.
This 1-5 ranking system has been used extensively as a
Mean Opinion Score to evaluate user-perceived quality in
both voice and video (Muntean et al., 2007).
From these metrics, the Code Quality Estimation (CQE)
of a JAR can be calculated as:

CQE = Σ_{i=1}^{5} WCE_i / KLOC (4)
Similarly, FB quality (FBQ) is calculated by summing the
weights of the high and medium priority bugs and dividing
by KLOC, with W_i = 3 for medium priority errors (WCE_M)
and W_i = 5 for high priority errors (WCE_H):

FBQ = Σ_{i=3,5} WCE_i / KLOC (5)
Finally, in Equation (4) the CQE of a JAR is calculated
by summing all error weights and dividing by its KLOC.
For example, if one JAR has a single error of rank R5
and another JAR has 5 errors of rank R1, a linear
weighting (α = 1) gives both JARs the same weighted
error count of 5, which undermines the importance of the
critical error in the first JAR. The use of a non-linear
exponent, however, emphasizes the importance of the
severe error; our metric will work for any exponent
value greater than 1. Equation (2) shows that if the
WCED is high, the quality of the JAR is not acceptable.
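As a concrete illustration, the following is a minimal Java sketch of how Equations (3)-(5) can be computed. The class and method names are illustrative only; they are not part of FB or of an existing implementation.

```java
// Minimal sketch of Equations (3)-(5); class and method names are illustrative only.
public final class QualityMetrics {

    /** Equation (3): WCE_i = NCE_i * W_i^alpha, where the weight W_i equals the rank i. */
    static double weightedErrors(int rank, int errorCount, double alpha) {
        return errorCount * Math.pow(rank, alpha);
    }

    /** Equation (4): CQE = sum over ranks R1-R5 of WCE_i, divided by KLOC. */
    static double cqe(int[] errorsPerRank, double kloc, double alpha) {
        double sum = 0.0;
        for (int rank = 1; rank <= 5; rank++) {
            sum += weightedErrors(rank, errorsPerRank[rank - 1], alpha);
        }
        return sum / kloc;
    }

    /** Equation (5): FBQ maps FB "Medium" priority to weight 3 and "High" to weight 5. */
    static double fbq(int mediumCount, int highCount, double kloc, double alpha) {
        return (weightedErrors(3, mediumCount, alpha)
                + weightedErrors(5, highCount, alpha)) / kloc;
    }

    public static void main(String[] args) {
        double alpha = 1.5; // exponent value used in our experiments (Section 5.2)
        // One R5 error versus five R1 errors: the non-linear weighting separates them.
        System.out.println(weightedErrors(5, 1, alpha)); // ~11.18
        System.out.println(weightedErrors(1, 5, alpha)); // 5.0
    }
}
```

With α = 1.5 the single R5 error weighs roughly 11.18, while the five R1 errors together weigh only 5, which is exactly the effect the non-linear exponent is intended to produce.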
4 MODEL OF CQE
This paper presents an approach to enhance the error
ranking of FB by a once-off survey and provides a
mechanism to automatically estimate the code quality
using CQE.
As part of our training process, a reasonable number of
JARs have been used to cover various application domains,
in addition to the survey template. In our experiments,
we have used open source JARs obtained from the Maven
repository (http://search.maven.org/browse—47) for the
training process. The approach requires a number of
experienced developers to participate in our survey. The
aim of this process is to make sure that the reports
generated from such a training dataset contain objective
results in the form of a list of ranked errors. This
list is used as the knowledge base for further statistical
calculations. Once the survey work has been completed,
the CQE model can be used to automatically estimate the
code quality.
Our quality assessment model is shown in Figure 1. A
detailed description is given in the subsequent subsections.
!"#$%&'()'*+,-.%'
/01*''
23)45"6*'#%-(#7'(8'
9%*7'/01*''
:);+)<%4'%##(#'
#+)=3)6'(8'7%*7'/01*'
>(4%'?"+.37&'
@+7+A+*%'(8'#+)=%4'
%##(#*'
B%)%#+7%*'
0--.3%*''
C#(4"<%'
:*D,+7%*'
E)<%'(F'<+-7"#%'
:G-%#7'=)(H.%46%''
0"7(,+D<'0--.&3)6'
:G-%#7'=)(H.%46%''
Figure 1: CQE Model.
4.1 Survey
Reports generated by FB are stored in an Excel sheet and
given to the surveyed developers.
Table 1: High rank errors.

Error | Error description
H C RV | new IllegalException not thrown
H C BIT | Bitwise add of signed byte value
H C FS | Illegal format string
M V MS | Y should be package protected in Y
H C HE | Y doesn't define a hashCode method
H C NP | Null passed for nonnull parameter
H C SA | Self assignment of field Y in Y
H C EC | Using pointer equality to compare
H C IL | There is an apparent infinite loop
H M Ru | Y explicitly invokes run on a thread
H M VO | Increment of volatile field Y in Y
H B ES | Comparison of String parameter
M B NP | Y may return null
H D ST | Write to static field Y from instance
Table 2: Test JARs.

JAR | LOC
Asm-4.0 | 13672
Aspectj-1.6.5 | 122321
Axis-10.3 | 133708
BCastle-1.4.6 | 161021
Cglib-2.2.2 | 19412
Derby-10.8.1.2 | 642704
JBoss-5.1 | 162431
Jline-2.7 | 8998
Jnpserver-5.0.5 | 7412
Tomcat-7.0.8 | 147179
The survey was conducted on 80 open source projects by
asking developers to rank the FB reports in an Excel
sheet. Eight experienced developers, all of whom had
experience of using FB, ranked the reported errors from
1 to 5 based on their severity. These ranks are described
as follows:

- R5 is given to "must fix" errors.
- R4 is assigned to "should fix" errors.
- R3 is given to "have a look" errors.
- R2 is assigned to "harmless" errors.
- R1 is given to "unreal bugs", i.e. false error messages.
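For a code-level view, the five ranks and their associated weights can be encoded as in the sketch below; this enum is illustrative only and is not part of FB or of a published CQE implementation.

```java
// Illustrative encoding of the five survey ranks; the weight W_i equals the rank
// number, as used in Equation (3).
enum SeverityRank {
    R1(1, "unreal bug / false error message"),
    R2(2, "harmless"),
    R3(3, "have a look"),
    R4(4, "should fix"),
    R5(5, "must fix");

    final int weight;
    final String meaning;

    SeverityRank(int weight, String meaning) {
        this.weight = weight;
        this.meaning = meaning;
    }
}
```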
Analysis of the error ranking gives a generalized view
of the critical errors in the JARs. Table 1 shows a list
of important categories of errors that were classified
as R5.

Table 1 shows that the correctness category and some of
its patterns are highly ranked, for example "H C RV",
where "H" is the high priority, "C" is the correctness
category, "RV" is the bug pattern code, and the remainder
is the description of the error. The next most important
category after correctness is multithreaded correctness,
together with a few patterns of bad practice. The survey
also shows that some medium priority errors, such as
"M V MS", belonging to categories like experimental and
malicious code vulnerability, were classified with
severity rank 5 by our sample of developers. This is
because different categories of errors are important for
different applications: a bug that is important to a web
frontend developer may not be important to a backend
developer. In the survey, more importance is given to
the category and pattern of the error message than to
the priority assigned by FB.
4.2 Database
After receiving the feedback, we created a simple
database of errors organized by the five ranks, by
combining all the error reports in an Excel spreadsheet.
The rank assigned to each error is calculated by
averaging the ranks of similar reports. From the survey,
the unique error patterns are recorded in the database,
which covers almost 40% of all possible FB error
categories and their patterns.

On average, 80% of the errors in the test JARs are
covered by the database. Errors that are not covered by
the database are ranked using the FB priority: we give
rank 3 to medium priority and rank 5 to high priority
errors. Creating the database from our survey is a
once-off process to capture the domain expert knowledge
of developers. To apply our CQE approach to a different
organization or development environment, such a survey
should be carried out again.
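The lookup described above can be sketched as follows. The knowledge base is assumed to be a simple map from an FB bug pattern to its averaged survey rank; the class and method names are illustrative rather than taken from an actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the rank lookup: known patterns get the averaged survey rank, and
// unknown patterns fall back to the FB priority (rank 3 for "Medium", rank 5
// for "High"). Names are illustrative.
class RankDatabase {
    private final Map<String, Integer> surveyedRanks = new HashMap<>();

    void addSurveyedPattern(String bugPattern, int averagedRank) {
        surveyedRanks.put(bugPattern, averagedRank);
    }

    int rankOf(String bugPattern, String fbPriority) {
        Integer rank = surveyedRanks.get(bugPattern);
        if (rank != null) {
            return rank;
        }
        return "High".equalsIgnoreCase(fbPriority) ? 5 : 3;
    }
}
```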
4.3 Enhanced Error Ranking
The purpose of our research work is to enhance the
performance of the FB tool in two ways: to classify the
most and the least severe errors based on the users'
designation, and to automatically assign new error ranks
to FB reports by matching the errors against the survey
database. The usage scenario of our approach is that a
JAR is used as the input file, its FB report is
automatically ranked by matching the errors against the
database, and after ranking the CQE metric is also
calculated. The user interface to assess the quality of
a JAR is the command line, as sketched below.
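A minimal command-line driver along these lines is sketched next. It reuses the RankDatabase and QualityMetrics sketches shown earlier and assumes the FB report for the JAR has already been exported to a simple CSV of "bugPattern,priority" lines (our survey reports are kept in spreadsheets); the file format, class name and arguments are illustrative assumptions, not the actual tool interface.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Illustrative driver: ranks each reported error via the survey database and
// prints the CQE of the JAR. The CSV report format is an assumption.
public class CqeRunner {
    public static void main(String[] args) throws IOException {
        String reportCsv = args[0];                 // exported FB report: "bugPattern,priority" per line
        double kloc = Double.parseDouble(args[1]);  // thousands of lines of code of the JAR
        double alpha = 1.5;                         // exponent value used in our experiments

        RankDatabase db = new RankDatabase();       // assumed to be pre-loaded from the survey
        int[] errorsPerRank = new int[5];           // counts NCE_1 .. NCE_5

        List<String> lines = Files.readAllLines(Paths.get(reportCsv));
        for (String line : lines) {
            String[] cols = line.split(",");
            int rank = db.rankOf(cols[0].trim(), cols[1].trim());
            errorsPerRank[rank - 1]++;
        }

        System.out.printf("CQE = %.2f%n", QualityMetrics.cqe(errorsPerRank, kloc, alpha));
    }
}
```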
4.4 Code Quality
Depending on the rank designation of the errors, the CQE
metric is calculated using Equation (4), and the weight
of each error is calculated using Equation (3). If a
project has fewer severe errors, the overall weight of
its code errors will also be lower and the code quality
is acceptable, and vice versa. In (Binkley, 2007),
different applications of source code analysis are
described. In
CQE-AnApproachtoAutomaticallyEstimatetheCodeQualityusinganObjectiveMetricFromanEmpiricalStudy
201
our work, two of them (Schulmeyer and McManus, 1999;
Yang and Ward, 2003) have been used, for the purposes of
software quality assessment and program evaluation
respectively.
From the outcome of the CQE approach, developers can easily:

- rank the errors in their code;
- decide which higher priority errors to fix first;
- extract information about the priorities, categories
and patterns of the bugs, along with the frequency of
each rank of errors.
5 EXPERIMENTS AND RESULTS
In our work, 80 recent JARs were randomly selected from
the Maven repository. After surveying the experienced
developers on those JARs and creating the database of
ranked errors, we compare each test JAR report with the
database to assign ranks. Ranks are assigned to the JAR
errors automatically by comparing them with the rank
classifications in the database. Programmers can then
check the severe errors with high ranks (R5, R4) and the
less important ones with low ranks (R3, R2). The test
JARs and their LOC are shown in Table 2.
5.1 Error Enhancing
When the test JARs are processed by FB, the reports only
give the high and medium priorities of the errors and do
not provide much more information. While checking the
quality of code with FB, the tester has to go through
all the error messages to find the critical errors of
the application, which is time consuming.

Figure 2 shows the detailed classification of errors
(H and M) into all ranks (R1-R5) for each JAR report. As
shown in Figure 2, there is a high number of medium
priority errors in our ranking and only a small number
of high priority ones. For example, the Axis JAR has
15 R2M errors and 1 R2H error, and similarly 63 R3M and
only 3 R3H, whereas for R5 there are 15 H and 7 M errors.
Similarly, for most of the JARs the number of RM (medium
rank) errors is higher than the number of RH (high rank)
errors. Hence CQE gives a clearer and more detailed
picture of the error ranking, and it becomes quite
convenient for programmers to fix only the highly ranked
errors and improve their code quality quickly.
From the survey, 152 unique error patterns were obtained
for the database, which covers around 40% of the FB
error categories and patterns. The number of errors
belonging to R5 is 24, to R4 52, to R3 46, to R2 23 and
to R1 9. The important features of the CQE are as follows:

- It can automatically rank the errors according to the
knowledge base.
- It highlights the critical errors to be fixed at first
priority.
- It saves developers time, as they only need to go
through the high rank errors.
5.2 CQE Calculation
The code quality of each JAR is estimated by the CQE
metric; the results are shown in Table 3.

Table 3 shows the contributions of the different weights
(W1-W5) of errors. For FB, the high priority weight is
denoted by WH and the medium priority weight by WM. To
compare the W_i for the FB and CQE metrics, we took a
worst case scenario and mapped FB's high priority to a
rank of 5 and its medium priority to a rank of 3. The
weights are calculated from Equation (3) on the basis of
the error ranking and an exponent value of 1.5; this
value was investigated empirically and is optimal for
our JARs. In looking for a suitable value for the
exponent, we observed in our survey that developers
sometimes overestimate or underestimate the importance
of a bug, and we saw both positive and negative
differences between the values calculated for FBQ and
CQE for different exponent values. From this empirical
investigation, the exponent value of 1.5 was chosen, as
it gives results that are comparable with FB. With this
exponent, an error of R5 is about 11 times more severe
than an error of R1 (5^1.5 ≈ 11.18), which seems
reasonable.
Table 3 shows that R3 and R4 errors contribute most to
the overall quality degradation. This is to be expected,
as there are many more errors ranked R3 and R4 than R5,
as shown in Figure 2. The CQE and FBQ values are
calculated from Equations (4) and (5) using these weights.
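As a check on the calculation, consider the Asm row of Table 3: summing its weights gives 0 + 2.83 + 20.78 + 40 + 11.18 = 74.79, and dividing by its 13.672 KLOC (Table 2) yields a CQE of about 5.47, matching the table; likewise its FBQ is (46.77 + 22.36) / 13.672 ≈ 5.06.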
The results in Table 3 show that Derby is the best
quality JAR, with a small number of errors and a high
KLOC. JBoss is the next best quality JAR, also with a
large KLOC. Jline and Jnpserver have high CQE values
because they have few LOC and hence a higher error
density. The WH of Aspectj is 88 whereas its W5 is
22.36, which shows that the CQE approach highlights a
smaller number of critical bugs. Similarly, the WH of
BC is 165 whereas its W5 is only 122.98, so in this case
we should give importance to W5 rather than to WH.
Therefore, Table 3 shows that the CQE based method gives
a better and clearer picture of the error ranking than
having only two priorities of errors. In Table 3 the
overall quality is presented using both our approach
(CQE) and the FB report categorization (FBQ).
In Figure 3, a JAR with a higher CQE value has lower
quality code. A project manager can define a threshold
for good quality code, in terms of having no severe
errors, for a particular project. If the code quality
value is above the threshold, the JAR is marked as a bad
quality JAR.
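A minimal sketch of such a threshold check is given below; the threshold value is a placeholder to be set per project and is not taken from our experiments.

```java
// Sketch of the per-project threshold decision; the threshold value is a placeholder.
public class QualityThreshold {
    static boolean isAcceptable(double cqe, double threshold) {
        return cqe <= threshold;
    }

    public static void main(String[] args) {
        double threshold = 6.0; // hypothetical project-specific threshold
        System.out.println(isAcceptable(0.28, threshold));  // Derby (CQE 0.28)  -> acceptable
        System.out.println(isAcceptable(18.49, threshold)); // Jline (CQE 18.49) -> bad quality
    }
}
```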
ICSOFT2013-8thInternationalJointConferenceonSoftwareTechnologies
202
Figure 2: Survey ranking with H and M.
Figure 3: CQE and FBQ (CQE and FB Quality values for each test JAR).
5.3 Discussion
Our approach could be applied to the output of any
static analysis tool, such as PMD or jlint, making the
output more precise and easier to understand, and saving
time for developers and managers while improving code
quality.

The FBQ metric could also be used to assess the quality
of code, but there are some limitations. Firstly, its
categorization is based on the opinions of the developers
surveyed by the people who developed FB; this means the
priorities are somewhat general and there is no mechanism
to tailor the FBQ metric to the requirements and opinions
of one's own company. Secondly, the CQE technique gives
a ranked list of the bugs with a higher granularity than
the raw FB list, which allows the development team to
prioritize the bug fixing process.
Our methodology could also be used for checking:

- Team quality: within a team of developers, we would
expect a higher correlation between the team members'
rankings of errors. This could be used to appraise the
output of individual developers or of different groups.
For example, when developer A works with B they tend to
produce high quality code, but when A works with C they
tend to produce poor quality code; when a manager is
organizing a team, he or she will know which combinations
work best.
- Education: an individual can rank the bugs and also
monitor the code that they write throughout their
studies, whose quality should improve over time. It also
allows them to compare other people's code to what they
personally consider to be good code.
5.4 Threats to Validity
The survey provides the knowledge base; it is subjective
and it is mandatory. It is also specific to a company and
a project. If we do not have a sufficient number of
survey responses from experienced developers, this is a
threat to validity. As the given rankings are central to
calculating CQE, rankings that are not assigned carefully
will make it difficult to estimate the code quality. Our
ranking may also not generalize to other projects, as
all projects have different functionalities and
requirements.
CQE-AnApproachtoAutomaticallyEstimatetheCodeQualityusinganObjectiveMetricFromanEmpiricalStudy
203
Table 3: Enhanced error ranking with CQE and FBQ (W1-W5 and CQE are from our approach; WM, WH and FBQ are from FB).

Test JARs | W1 | W2 | W3 | W4 | W5 | CQE | WM | WH | FBQ
Asm | 0 | 2.83 | 20.78 | 40 | 11.18 | 5.47 | 46.77 | 22.36 | 5.06
Aspectj | 1 | 16.97 | 155.88 | 144 | 22.36 | 2.78 | 254.31 | 88 | 2.80
Axis | 3 | 45.25 | 342.95 | 336 | 234.79 | 7.19 | 576.09 | 407 | 7.35
BC | 1 | 33.94 | 135.10 | 112 | 122.98 | 2.51 | 254.31 | 165 | 2.60
Cglib | 0 | 5.66 | 62.35 | 32 | 11.18 | 5.73 | 83.04 | 33 | 5.98
Derby | 2 | 8.49 | 31.18 | 104 | 33.54 | 0.28 | 114.18 | 55 | 0.26
Jboss | 1 | 36.77 | 124.71 | 80 | 100.62 | 2.11 | 243.93 | 110 | 2.40
Jline | 0 | 5.66 | 31.18 | 96 | 33.54 | 18.49 | 83.04 | 77 | 17.79
Jnpserver | 0 | 8.49 | 31.18 | 56 | 11.18 | 14.41 | 72.66 | 33 | 14.26
Tomcat | 0 | 28.28 | 124.71 | 280 | 167.71 | 4.08 | 337.35 | 209 | 3.71
6 RELATED WORK
In this section we discuss some related work in the
area of improving code quality by enhancing error
ranking of static analysis tools.
The error ranking of only the correctness category was
improved by Shen et al. (Shen et al., 2011), who ranked
error reports by user designation and then ranked bug
patterns by their defect likelihood of being a true error
or an FP. The comparison of results illustrated that
EFindBugs is an effective tool for error ranking in large
Java applications, but only a limited set of error
patterns was covered by their approach.
Ayewah et al. (Ayewah et al., 2008; Ayewah et al., 2007)
evaluated bug warnings on production software such as
Google's code base, IBM WebSphere, JBoss and Oracle
containers for Java. They analyzed the error reports of
FB and classified each issue as impossible, trivial or a
defect. Google deployed FB as a part of their projects
and addressed and improved real defects in the code by
analyzing static errors, but only the correctness
warnings were checked.
Ayewah and Pugh (Ayewah and Pugh, 2009) performed a user
study to check the list of important errors. They ran
their user study with expert participants, and found
strong responses for real bugs and weak responses for
fake warnings. Different user studies have been conducted
to review the static analysis warnings of FB, but
different users have a different understanding when
looking at the bug reports; they received more responses
for null pointer checking than for redundant checking
for null.
Kim and Ernst (Kim and Ernst, 2007) used the History
Warning Prioritization algorithm to identify severe bugs
as important ones. They classified as important those
bugs that were fixed in the next release, mining the
change history of the project to identify them as real
errors. They applied their warning ranking method to FB,
jlint and PMD. However, their algorithm will not work
when a new project is tested: a new version of the
software will have new features, and new errors will be
detected that are not present in their list of real
errors.
Coding standards also play an important role in
maintaining good code quality (Fang, 2001). However,
for that purpose programmers need to be experts in
applying coding standards, and furthermore it is
difficult to maintain coding standards under project
deadlines.
Software quality factors have been mapped to static
analysis tools such as FB, PMD and Metrics, with
reliability being the quality factor best covered by
these tools. Chirila et al. (Chirila et al., 2011)
assessed a quality model based on the mapping between a
project's quality factors and the tools' coverage.
However, static analyzers produce a high number of false
positives, so it is difficult to derive the actual
quality of a project from them.
Static analysis tools have also been used as early
indicators of pre-release defect density. In (Nagappan
and Ball, 2005), Nagappan and Ball used the PREfix and
PREfast tools to check the pre-release defect density of
Windows Server 2003. They found that this defect density
correlated with other pre-release defects extracted from
testing, integration, build results and so on. Again,
sorting out the important errors from static tools is
manual and time consuming, so static analysis is mainly
helpful in identifying the fault-prone and non-fault-prone
components of a system.
These studies have used different methods to identify
errors with static tools. Our approach is different and
novel in that the CQE approach identifies the important
errors of a system in an efficient and automatic way in
order to assess code quality.
ICSOFT2013-8thInternationalJointConferenceonSoftwareTechnologies
204
7 CONCLUSIONS AND FUTURE
WORK
In this paper, we presented our CQE approach to support
efficient code quality estimation by enhancing the
current FB tool. CQE is a three-step approach that makes
use of FB, a survey of developers, and the calculation
of quality metrics. During the experiment, we used 80
JARs to train our knowledge base after assessing their
FB reports with 8 experienced developers. The survey
template contains 3 more severity categories than the 2
priorities (M, H) that FB provides in its bug report. By
evaluating 10 test JARs, we have seen that these extra
severity categories can cause the code quality metric to
vary slightly compared to FB. The CQE approach provides
an automatic and efficient way to estimate and improve
code quality with the help of the statistics
(classification) of errors provided by our approach.
Furthermore, by maintaining a knowledge base obtained
from the initial surveys, the subsequent code quality
estimation processes can be fully automated. This
automatic process will support project managers in
efficient decision making.
In the future we will use other quality metrics, such as
Weighted Code Errors per Function point (WCEF), to
identify quality factors other than weighted error
density. We would also like to focus on severe errors
specific to particular kinds of application (such as web
based or database applications), and to integrate our
approach into FB to give a clear picture of error
severity and of the code quality estimated by the CQE
metric.
In summary, our approach is useful for checking quality
at different levels, such as:

- Global quality: expert knowledge could be taken from a
wide range of developers. This would create a huge sample
size, as a large variation in opinions would be expected.
- Corporate quality: use experts across a company and
use that knowledge to estimate the code quality according
to corporate norms. This could be useful for assessing
in-house teams or the quality of software that comes
from outsourced third parties; third parties could
likewise use the tool to assess a client's code quality.
ACKNOWLEDGEMENTS
This work was supported, in part, by Science
Foundation Ireland grant 10/CE/I1855 to Lero -
the Irish Software Engineering Research Centre
(www.lero.ie).
REFERENCES
Ayewah, N., Hovemeyer, D., Morgenthaler, J., Penix, J., and Pugh, W. (2008). Using static analysis to find bugs. IEEE Software, 25(5):22-29.
Ayewah, N. and Pugh, W. (2009). Using checklists to review static analysis warnings. In DEFECTS '09, pages 11-15, New York, NY, USA. ACM.
Ayewah, N., Pugh, W., Morgenthaler, J. D., Penix, J., and Zhou, Y. (2007). Evaluating static analysis defect warnings on production software. In PASTE '07, pages 1-8, New York, NY, USA. ACM.
Binkley, D. (2007). Source code analysis: A road map. In Future of Software Engineering, FOSE '07, pages 104-119.
Boehm, B. W., Brown, J. R., and Lipow, M. (1976). Quantitative evaluation of software quality. In ICSE '76, pages 592-605, Los Alamitos, CA, USA. IEEE Computer Society Press.
Chirila, C., Juratoni, D., Tudor, D., and Cretu, V. (2011). Towards a software quality assessment model based on open-source statical code analyzers. In SACI 2011, pages 341-346.
Daniel, G. (2004). Software Quality Assurance: From Theory to Implementation, chapter 21. Pearson Education.
Fang, X. (2001). Using a coding standard to improve program quality. In Quality Software, 2001. Proceedings. Second Asia-Pacific Conference on, pages 73-78.
Kim, S. and Ernst, M. D. (2007). Which warnings should I fix first? In ESEC-FSE '07, pages 45-54, New York, NY, USA. ACM.
Muntean, G.-M., Perry, P., and Murphy, L. (2007). A comparison-based study of quality-oriented video on demand. IEEE Transactions on Broadcasting, 53(1):92-102.
Nagappan, N. and Ball, T. (2005). Static analysis tools as early indicators of pre-release defect density. In ICSE 2005, pages 580-586.
Rutar, N., Almazan, C., and Foster, J. (2004). A comparison of bug finding tools for Java. pages 245-256.
Schulmeyer, G. and McManus, J. (1999). Handbook of Software Quality Assurance. Prentice Hall.
Shen, H., Fang, J., and Zhao, J. (2011). EFindBugs: Effective error ranking for FindBugs. In ICST 2011, pages 299-308.
Yang, H. and Ward, M. (2003). Successful Evolution of Software Systems. Artech House Computer Library. Artech House.
CQE-AnApproachtoAutomaticallyEstimatetheCodeQualityusinganObjectiveMetricFromanEmpiricalStudy
205