Predicting User Satisfaction in Software Projects using Machine
Learning Techniques
Łukasz Radliński (https://orcid.org/0000-0003-1007-6597)
Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, Poland
Keywords:
User Satisfaction, Software Projects, Prediction Models, Machine Learning, ISBSG.
Abstract:
User satisfaction is an important aspect of software quality. Factors of user satisfaction and its impact on
project success were analysed in various studies. However, very few studies have investigated the ability to predict
user satisfaction. This paper presents the results of such an attempt. The analysis was performed with the ISBSG
dataset of software projects. The target variable, satisfaction score, was defined as a sum of eight variables
reflecting different aspects of user satisfaction. Twelve machine learning algorithms were used to build 40
predictive models. Each model was evaluated on 20 passes with a test subset. On average, a random forest
model with missing data imputation by mode and mean achieved the best performance with the macro mean
absolute error of 1.88. Four variables with the highest importance on predictions for this model are: survey
respondent role, log(effort estimate), log(summary work effort), and proportion of major defects. On average
14 models performed worse than a simple baseline model. While the best performing models deliver predictions
with satisfactory accuracy, high variability of performance between different model variants was observed.
Thus, careful selection of model settings is required when attempting to use such a model in practice.
1 INTRODUCTION
Project success is typically evaluated in three main
dimensions: time and budget for process performance,
and requirements for product performance. However,
management approaches increasingly high-
light the criticality of stakeholder satisfaction (Dieg-
mann et al., 2017). The ISO/IEC 25010:2011 stan-
dard defined satisfaction as a "degree to which user
needs are satisfied when a product or system is used
in a specified context of use" (ISO/IEC, 2011).
Numerous studies investigated factors influencing
user satisfaction in a software project. The impor-
tance of user satisfaction was confirmed by an empir-
ical analysis in a study (Bano et al., 2017) which con-
cluded that user satisfaction significantly contributes
to the system success even when schedule and budget
goals are not met. Recently published results from a
systematic literature review show that one of the main
factors that affect customer satisfaction is related to
the application of agile development methodologies
due to their deep involvement of the customer in the
development process (Amirova et al., 2019).
Some authors claim that user satisfaction is measurable
but not predictable (Jones, 2008, p. 456). The low
number of empirical studies on this problem partially
confirms this claim. Furthermore, as discussed in Sec-
tion 2, existing literature reveals the difficulty of pre-
dicting user satisfaction with acceptable accuracy.
Still, it shows that there is potential for taking up
this challenge. This paper investigates the fol-
lowing five research questions (RQ):
1. What accuracy of predictions can be achieved by
different models?
2. What are the ranks of each prediction model?
3. What accuracy of predictions can be achieved by
model variants for each prediction technique?
4. How accurate are predictions from the best per-
forming model?
5. Which attributes (predictors) are the most impor-
tant for the best performing model?
This study used the extended edition of the ISBSG
R11 dataset of software projects (ISBSG, 2009). This
extended dataset contains additional attributes de-
scribing software development process and, most im-
portantly, eight attributes reflecting user satisfaction.
For the needs of this study these eight attributes were
aggregated into a single target variable as explained
in Section 3. Twelve machine learning techniques
were used to predict this aggregated user satisfaction.
For most techniques several variants were built that
involved various combinations of the following: the
type of dataset used for learning, the missing value
imputation technique, data normalization, and the
application of a feature selection technique.
This paper makes the following contributions to
applied science and practice: it evaluates a range
of predictive models and provides their ranking for
predicting the aggregated user satisfaction in software
projects. It also provides results from a deeper analy-
sis of performance of the most accurate model.
The paper is organized as follows: Section 2 dis-
cusses related work. Section 3 explains the data and
research method used in this study. Section 4 presents
obtained results by providing answers to each re-
search question. Section 5 discusses limitations and
threats to validity of results. Section 6 formulates
conclusions and ideas for future work.
2 RELATED WORK
There are two earlier studies whose scope is clos-
est to the current study, i.e., they also involved pre-
dicting user satisfaction with machine learning tech-
niques using the ISBSG dataset. The first of them
(Radliński, 2015) was focused on predicting one at-
tribute of user satisfaction, i.e., the ability of the system
to meet stated objectives. The values of that target
variable were transformed to binary values reflect-
ing whether satisfaction in this aspect was achieved
or not. As a result, the prediction task was a binary
classification. A total of 288 prediction schemes, i.e.
model variants, were evaluated on their ability to pre-
dict the target variable. These schemes were built as
combinations of their components, i.e. attribute pre-
selection, elimination of missing values, automated
attribute selection, and a classifier. Two best perform-
ing schemes based on LMT and SimpleLogistic clas-
sifiers achieved the accuracy measured as Matthews
correlation coefficient of 0.71 in the test subset.
The second study (Radliński, 2018) was at a sig-
nificantly larger scale as it involved building, evaluat-
ing and comparing 15,600 prediction schemes. Each
scheme was built as a combination of its components:
manual attribute pre-selection, handling missing val-
ues, outlier elimination, value normalization, auto-
mated attribute selection, and a classifier. That study
also involved a binary classification task. However,
the target variable was an aggregated user satisfac-
tion, i.e., a mean of eight satisfaction variables sub-
sequently dichotomized to a logical variable. The
research procedure involved training and evaluation
of each prediction scheme using a 10-fold cross-
validation and a separate testing, both repeated 10
times. For the best performing schemes, the achieved
accuracy, expressed by the Matthews correlation coef-
ficient, was about 0.5 in cross-validation and about
0.5–0.6 in the testing stage.
The scope of other studies involving user satisfac-
tion was different compared to the current one. For
example, a study (Fenton et al., 2004) was focused
mainly on predicting development resources. The de-
veloped model was also able to predict user satisfaction.
However, that study did not report the achieved predic-
tion accuracy. A study (Cerpa et al., 2016) compared
various schemes to predict project outcome, i.e., ‘suc-
cess’ or ‘failure’. The authors found that attribute se-
lection using information gain score improved accu-
racy, statistical and ensemble classifiers were robust
for predicting project outcome, and on average ran-
dom forest provided the most accurate predictions.
A range of studies (Bano et al., 2017; Buchan
et al., 2017; Cartaxo et al., 2013; Montesdioca and
Maçada, 2015; Raza et al., 2010; Subramanyam et al.,
2010; Tarafdar et al., 2010) involved empirical anal-
yses of gathered data to investigate the relationships
between user satisfaction and other factors describ-
ing software development projects and processes. Be-
cause the focus of the current study is on predicting
user satisfaction, the results from these analytical stud-
ies were not further investigated here.
3 DATA AND METHOD
This study used the extended version of the ISBSG
dataset (ISBSG, 2009) of software projects which are
described by the attributes reflecting their type, size,
duration, development activities involved, environ-
mental factors, objectives, and documents and tech-
niques used. This extended version contains 205 at-
tributes. Eight of them reflect user satisfaction with:
the ability of system to meet stated objectives,
the ability of system to meet business require-
ments,
the quality of the functionality provided,
the quality of the documentation provided,
the ease of use,
the training given,
the speed of defining solution,
the speed of providing solution.
They are defined on a 4-point ranked scale where ’1’
indicates that user needs were met to a limited extent
or not at all and ’4’ indicates that user expectations
were exceeded. The target variable for prediction, i.e.
satisfaction score, was defined as the sum of values of
these eight individual attributes. Figure 1 illustrates
the distribution of satisfaction score.
Figure 1: Distribution of Satisfaction Score.
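As an illustration, the target variable can be computed as follows (a minimal sketch in R with hypothetical column names; the actual preparation scripts are provided in the on-line materials):

```r
# Minimal sketch, assuming a data frame 'projects' with hypothetical names for
# the eight 4-point satisfaction attributes; their sum can range from 8 to 32.
sat_cols <- c("sat_objectives", "sat_business_req", "sat_functionality",
              "sat_documentation", "sat_ease_of_use", "sat_training",
              "sat_speed_defining", "sat_speed_providing")
projects$satisfaction_score <- rowSums(projects[, sat_cols])
```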
Preparation of the dataset involved cleaning the
data, correcting obvious mistakes in data
values, changing data types, creating new attributes
from multiple response nominal attributes or as log-
transformations of highly skewed numeric attributes,
removing attributes with fraction of missing values
exceeding 0.6, removing attributes with problems
limiting their usability (e.g. a single value, many val-
ues but with low counts, unclear interpretation, incon-
sistent values when compared to other attributes, not
applicable as predictors). Due to limited space it is
beyond the scope of this paper to discuss the details
of this data preprocessing. Additional on-line mate-
rials document all preparation actions, model learn-
ing and generating predictions
(https://doi.org/10.5281/zenodo.3685484). The dataset was fil-
tered by the data quality rating attribute and, more im-
portantly, only cases with non-missing values for all
eight satisfaction attributes were kept. After this filtering
the dataset contained 89 cases.
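The filtering and transformation steps described above can be sketched in R as follows (a minimal sketch with hypothetical column names; the documented preparation actions in the on-line materials are authoritative):

```r
# Minimal sketch, continuing the 'projects' example from above.
missing_frac <- colMeans(is.na(projects))                 # fraction of missing values per attribute
projects <- projects[, missing_frac <= 0.6]               # drop attributes missing in more than 60% of cases
projects$log_summary_work_effort <-                       # log-transform a highly skewed attribute
  log(projects$summary_work_effort)                       # (hypothetical column name)
sat_cols <- grep("^sat_", names(projects), value = TRUE)  # the eight satisfaction attributes (hypothetical naming)
projects <- projects[complete.cases(projects[, sat_cols]), ]  # keep only cases with all eight values present
```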
The experimental analysis was performed using the R
language (https://www.R-project.org/) and the caret
package (https://cran.r-project.org/package=caret).
Table 1 lists tech-
niques used to build predictive models. These tech-
niques were selected because they were widely used
in similar studies. The abbreviation for each tech-
nique indicates the name of the model implementa-
tion in the caret package. For comparison, a baseline
model was also used. Because the satisfaction score is
an integer number, the implementations of each tech-
nique were adjusted so that they provided predictions
rounded to the nearest integer.
Table 1: Summary of Prediction Techniques.
Abbr. / Library | Technique & Reference
baselineMean | null model predicting the mean value
enet | elastic net (Zou and Hastie, 2005)
gbm | generalized boosted regression (Friedman, 2001)
glmnet | generalized linear regression with convex penalties (Friedman et al., 2010)
glmStepAIC | generalized linear regression with stepwise feature selection (Venables and Ripley, 2002)
knn | k-nearest neighbour regression (Altman, 1992)
lm | linear regression (Wilkinson and Rogers, 1973)
lmStepAIC | linear regression model with stepwise feature selection (Venables and Ripley, 2002)
M5 | model trees and rule learner (Wang and Witten, 1997; Witten et al., 2011)
ranger | random forest (Breiman, 2001)
rpart2 | recursive partitioning and regression tree (Breiman et al., 1984)
svm | support vector machines (Chang and Lin, 2007)
xgbTree | extreme gradient boosting (Chen and Guestrin, 2016)
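As a point of reference, the baselineMean entry in Table 1 can be expressed as a short function (a minimal sketch; it assumes the integer rounding described above also applies to the baseline):

```r
# Minimal sketch of the baselineMean null model: it ignores all predictors and
# always returns the rounded mean satisfaction score of the training (CV) subset.
baseline_mean_predict <- function(train_scores, n_test) {
  rep(round(mean(train_scores)), n_test)
}
baseline_mean_predict(c(20, 24, 18, 27), 3)   # illustrative call: returns 22, 22, 22
```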
For each technique one or more models were created,
depending on the applicability of a particular variant to
a given technique. A total of 41 variants were used (see
Section 4.3). These models differed in:
a dataset version used: regular including logical
and nominal attributes, or numeric with all logical
and nominal attributes transformed to numeric, as
required by some techniques,
missing value imputation technique: none, by
mean (for numeric attributes), or mode (for non-
numeric attributes),
numeric values normalization (only for numeric
version of a dataset) or no such pre-processing,
attribute selection: none (then all available at-
tributes were used) or by principal component
analysis (PCA) with minimum fraction of cap-
tured variance of 0.85.
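The pre-processing options listed above can be sketched with the caret preProcess function (a minimal sketch, assuming a hypothetical all-numeric data frame of predictors; it illustrates median imputation, normalization and PCA, not the exact published configuration):

```r
# Minimal sketch: median imputation, centring/scaling and PCA keeping at least
# 85% of the variance, applied to the numeric version of the predictor set.
library(caret)
pp <- preProcess(numeric_predictors,                       # hypothetical all-numeric data frame
                 method = c("medianImpute", "center", "scale", "pca"),
                 thresh = 0.85)                            # minimum fraction of captured variance
numeric_predictors_pca <- predict(pp, numeric_predictors)  # transformed predictors
```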
The experimental part involving model training and
evaluation was performed in the following way. The
dataset was divided into a cross-validation (CV) and
test subsets with randomly selected 79 and 10 cases,
respectively. The CV subset was used to train and
tune the model by selecting the best hyperparameters.
We used M × N-way CV, i.e., with N = 5 folds and re-
peated M = 3 times. Then the final model was trained
using the whole CV subset and evaluated with the re-
maining test subset. This procedure ensured that the
evaluation of the final model was performed on an in-
dependent subset of data that was not used in CV for
model tuning. This process was repeated 20 times
with different random data splits in each pass.
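A single evaluation pass can be sketched as follows (a minimal sketch, assuming a prepared data frame 'projects' with a numeric satisfaction_score column; the scripts in the on-line materials describe the actual procedure):

```r
# Minimal sketch of one pass: 10 random test cases, the remaining 79 cases used
# for tuning with 3 x 5-fold repeated CV, and integer-rounded test predictions.
library(caret)
run_pass <- function(projects, seed) {
  set.seed(seed)
  test_idx <- sample(nrow(projects), 10)                  # test subset: 10 cases
  cv_set   <- projects[-test_idx, ]                       # CV subset: remaining 79 cases
  test_set <- projects[test_idx, ]
  ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
  fit  <- train(satisfaction_score ~ ., data = cv_set,    # hypothetical target name
                method = "ranger", trControl = ctrl)
  data.frame(actual    = test_set$satisfaction_score,
             predicted = round(predict(fit, newdata = test_set)))
}
passes <- lapply(1:20, function(s) run_pass(projects, s)) # repeated 20 times
```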
Random and exhaustive grid search (Bergstra and
Bengio, 2012) are popular strategies of hyperparame-
ter selection. In this study, a large number of hyperpa-
rameter combinations were defined for some models.
Thus, considering time efficiency, a stepwise grid
search strategy was applied. It starts with a random
search and iteratively adapts the best performing hy-
perparameter sets until no improvement is achieved.
Due to limited space, an overview of this algorithm and
the initial ranges/sets of hyperparameter values are provided
on-line at https://doi.org/10.5281/zenodo.3685484.
To evaluate the accuracy of predictions the mean ab-
solute error (MAE) was used. This measure is pre-
ferred over the mean relative error used in some studies
(Shepperd and MacDonell, 2012).
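For clarity, the evaluation measures used in Section 4 can be written as follows (a minimal sketch with illustrative values only):

```r
# MAE for a single pass, and the macro mean absolute error (MMAE) reported in
# Section 4, i.e. the mean of the per-pass MAE values.
mae <- function(actual, predicted) mean(abs(actual - predicted))
mae(c(20, 24, 18), c(22, 23, 16))        # per-pass MAE of an illustrative pass
mae_per_pass <- c(1.9, 2.1, 1.7)         # illustrative per-pass MAE values
mean(mae_per_pass)                       # MMAE across passes
```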
4 RESULTS
4.1 RQ1: What Accuracy of Predictions
Can Be Achieved by Different
Models?
To answer this RQ the distribution of MAE across
passes for each model was investigated (Figure 2).
The blue diamonds on this and the subsequent figure in-
dicate the macro mean across all passes, i.e., the mean
of MAE (MMAE). On average, models using the ranger,
xgbTree and svm techniques performed the best. The
ranger model involving a regular dataset and using
missing value imputation by mode and mean reached
the highest accuracy with MMAE = 1.88.
The baselineMean model reached MMAE =
2.36. On average, some variants of xgbTree per-
formed only slightly better and 14 models performed
worse than the baselineMean. However, for each of
the corresponding techniques at least one variant per-
formed better than this baselineMean model.
The two worst performing models were based on
lmStepAIC and reached MMAE = 5.53, signifi-
cantly worse than all other models.
4.2 RQ2: What Are the Ranks of Each
Prediction Model?
Apart from comparing models based on values of
MAE we also investigated model ranks. Each model’s
MAE was compared to MAE of all other models, sep-
arately in each pass, to calculate model ranks. The
distributions of these ranks are shown in Figure 3
where the models were sorted by the mean rank, whose
exact values are provided in Table 2.
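The ranking step can be sketched as follows (a minimal sketch with placeholder MAE values; a lower MAE gives a lower, i.e. better, rank):

```r
# Minimal sketch: one row per pass, one column per model variant; rank within
# each pass and average the ranks across passes, as reported in Table 2.
set.seed(1)
mae_matrix <- matrix(runif(20 * 41, 1, 6), nrow = 20)   # placeholder: 20 passes x 41 variants
pass_ranks <- t(apply(mae_matrix, 1, rank))             # per-pass ranks; ties share the mean rank
mean_ranks <- colMeans(pass_ranks)                      # mean rank per variant
```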
Three models based on ranger achieved the best,
i.e. the lowest, mean ranks of 6.95, 8.50, and 9.15,
respectively. They were followed by some variants
of svm, xgbTree and ranger and all three variants of
enet. The baselineMean reached a mean rank of 22.55
which was superior to 13 other models.
We can observe high variability of ranks across
passes. Except for the two lmStepAIC models that con-
sistently performed the worst, the three ranger models
with the top mean ranks also achieved the lowest range
of these ranks. However, even they performed quite
poorly in some passes, with worst ranks of 18, 19, and
26. A total of 19 models reached the top rank, or at
least tied for it, in at least one pass; these are the models
for which the left-side whisker starts at rank 1 in Figure 3.
4.3 RQ3: What Accuracy of Predictions
Can be Achieved by Model Variants
for Each Prediction Technique?
To answer this RQ a comparison of mean ranks of all
models grouped by variants of settings for each pre-
diction technique was performed. Table 2 illustrates
these mean ranks. Techniques were sorted by the best
mean rank for each technique. Cells with no value
provided indicate that a particular model variant was
not defined. This was caused by the following rea-
sons: some techniques require the numeric version of
the dataset, only xgbTree could work with missing
values, for standard lm the dataset must have more
cases than attributes (thus PCA was applied), and for
baselineMean there was no need to use other model
variants as they would provide the same predictions.
For each technique there was at least one model
variant which performed better than baselineMean.
However, there were no common variant settings which
would perform the best for each technique. For exam-
ple, for ranger, glmStepAIC and rpart2 the best ranks
were achieved using a regular dataset, with missing
values replaced with mode and median. However,
for most techniques, i.e., svm, enet, knn, lm, gbm,
and M5, the best ranks were achieved using the numeric
dataset, with missing values replaced by median, with
normalization of values and with attribute selection
using PCA. Most notably, xgbTree, which was the
only model that could be trained with missing values,
achieved the best performance in the variant without
missing value imputation applied.
Figure 2: Distribution of Mean Absolute Error across Passes for Each Model Variant.
Figure 3: Distribution of Ranks across Passes for Each Model Variant.
4.4 RQ4: How Accurate Are the
Predictions from the Best
Performing Model?
Previous RQs showed that a ranger model using a reg-
ular dataset and missing value imputation by mode
and median performed the best. This subsection in-
vestigates the predictions from this single model in more depth.
Figure 4 illustrates the actual vs predicted values of
satisfaction score in all passes for the test subsets.
An ideal predictor, achieving perfect accuracy
with MAE = 0, would give predictions plotted
on the diagonal dashed line. While this
ranger model provided the most accurate predictions,
we can observe some issues.
Table 2: Mean Ranks of Model Variants across Passes (Best Performing Variants for Each Technique Are Underlined).
Settings (columns, left to right): Regular | Regular + Mode&Median | Numeric | Numeric + Normalize | Numeric + Median | Numeric + Median + Normalize | Numeric + Median + Normalize + PCA
ranger: 6.95, 8.50, 9.15, 12.80
svm: 25.70, 12.50, 11.00
xgbTree: 11.45, 11.60, 21.80, 22.1, 16.15
enet: 14.05, 13.80, 13.55
glmnet: 14.75, 14.75, 16.40
knn: 22.45, 16.05, 14.95
lm: 15.95
glmStepAIC: 16.65, 16.65, 24.90
rpart2: 17.60, 24.85, 24.85, 26.95
lmStepAIC: 39.90, 39.90, 18.05
gbm: 23.45, 23.40, 22.65, 18.75
M5: 27.35, 27.45, 27.45, 19.85
baselineMean: 22.55
(Each technique lists mean ranks only for the variants that were defined for it.)
Some points on the figure deviate from this perfect prediction line, mostly
for projects with extreme values of satisfaction score,
i.e. 15 (7 cases) or 26 (2 cases).
The second issue is related to the range of pre-
dicted values. While the actual values are in the inter-
val [10..31], the predicted values are in the narrower
interval [13..24]. This shows that even though this
ranger model on average performed the best, it faced
problems with predicting particular cases.
Figure 4: Scatterplot of Actual Vs Predicted Satisfaction
Score by Most Accurate Model.
4.5 RQ5: Which Attributes Are the
Most Important for the Best
Performing Model?
To answer this RQ the importance of each attribute
was evaluated. In each pass a different ranger model
was built, i.e., using a different CV subset. For each at-
tribute its importance was calculated as impurity (the
variance of the responses) and scaled to the inter-
val [0, 100]. Figure 5 illustrates the distribution of this
attribute importance across all passes for the top 20
attributes with the highest mean importance.
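The importance values can be obtained as sketched below (a minimal sketch, assuming a hypothetical CV subset 'cv_set'; caret's varImp scales the impurity importance to [0, 100]):

```r
# Minimal sketch: impurity-based importance from a ranger model trained via caret.
library(caret)
fit <- train(satisfaction_score ~ ., data = cv_set,      # hypothetical data frame / target
             method = "ranger", importance = "impurity",
             trControl = trainControl(method = "repeatedcv", number = 5, repeats = 3))
imp <- varImp(fit, scale = TRUE)$importance              # importance scaled to 0-100
head(imp[order(-imp$Overall), , drop = FALSE], 20)       # top 20 attributes by importance
```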
On average, the survey respondent role was the
most important. It was followed by effort estimate,
summary work effort (both log-transformed) and pro-
portion of major defects. The top seven attributes
achieved the importance of 100 in at least one pass.
Among the attributes describing project environ-
ment the most important were project manager expe-
rience and selected development techniques, decision
making process, and intended market.
Surprisingly, only two attributes related to defects,
proportion of major defects and proportion of minor
defects, appeared as important when predicting sat-
isfaction score with this ranger model. The dataset
also contained other attributes related to defects, e.g.
defect rate, total # defects, proportion of extreme de-
fects, as well as counts for extreme, major and minor
defects, but these appeared with lower importance. Among
them the most important was log-transformed defect
rate with mean importance of 17.3 (at rank 39 of 100).
Figure 5: Distribution of Attribute Importance for the Most
Accurate Model.
5 LIMITATIONS AND THREATS
TO VALIDITY
Results from this study are subject to some limitations
and threats to validity. The first of them is related
to the fact that only one dataset was used. Accord-
ing to the author’s knowledge, among the publicly avail-
able datasets only the extended version of the ISBSG
dataset contains attributes on user satisfaction. Hence,
no comparison with other datasets was possible. Also,
this is the first study on user satisfaction prediction
where the target variable is numeric. Thus, the MAE
was used to evaluate the model performance, not mea-
sures applicable for classification problems as in other
studies. Hence, results obtained in this study are not
comparable with results in other studies even if they
used the same ISBSG dataset.
Second, the study used a subset of the data con-
taining 89 of the 5024 cases available before filtering.
Such a strong reduction was caused mostly by the low
number of projects with non-missing values for user
satisfaction. Because the dataset is not a random sample
from the population, and because of the above issues,
the obtained results cannot be generalized outside the
context of this dataset.
Furthermore, subjective decisions were made
when designing the experiment, e.g. on the selection
of prediction techniques, model settings, and data pre-
processing. To partially reduce this problem, these
choices followed similar studies investigating various
software quality prediction problems, with subjectiv-
ity limited as much as possible.
The target variable is an aggregation of eight in-
dividual attributes of user satisfaction. Each of them
was assigned the same weight when calculating satis-
faction score. In certain projects the real importance
might have been unequal but such information was
not available in the dataset.
6 CONCLUSIONS AND FUTURE
WORK
This study was focused on predicting aggregated user
satisfaction, i.e. the satisfaction score, using the extended
ISBSG dataset. It provided answers to five research
questions in this matter. Based on the obtained results it
can be concluded that it is possible to predict the satisfaction
score with satisfactory accuracy. The best perform-
ing ranger (random forest) model delivered predic-
tions with MMAE = 1.88 and achieved a mean rank
of 6.95 across all passes.
Four attributes with the highest importance on pre-
dictions for this model were: survey respondent role,
log(effort estimate), log(summary work effort), and
proportion of major defects. Only two attributes re-
ferring to defects were found among the top-20 most
important for that model.
Apart from this ranger model, two other best per-
forming techniques were xgbTree and svm. How-
ever, some models performed poorly, i.e., on aver-
age 14 models performed worse than a simple base-
line model when comparing MMAE and 13 models
when comparing models’ mean ranks. Despite this,
for each prediction technique there was at least one
model variant which achieved a better mean rank than
the baselineMean model.
The achieved results may be extended in future re-
search in various ways. First, other types of prediction
models may be used. This includes e.g. neural net-
works and more complex ensemble models that can
be built as combinations of base models and which
in various studies perform better than simpler
machine learning models such as those investigated in
this paper. Second, partial analyses started or reported
in this study may be enhanced and completed. This
includes the analysis of importance of various predic-
tors. In this paper this was performed only for the
most accurate model, but it can be extended to aggre-
gate importance across a range of models. This
also includes analysis of the performance of the hyper-
parameter tuning method. Such analysis would investi-
gate the influence of the method’s input parameters and
compare it with other methods of tuning hyperparameters.
REFERENCES
Altman, N. S. (1992). An Introduction to Kernel and
Nearest-Neighbor Nonparametric Regression. The
American Statistician, 46(3):175–185.
Amirova, R., Khomyakov, I., Mirgalimova, R., and Sil-
litti, A. (2019). Software Development and Cus-
tomer Satisfaction: A Systematic Literature Review.
In Mazzara, M., Bruel, J., Meyer, B., and Petrenko,
A., editors, Software Technology: Methods and Tools.
TOOLS 2019, pages 136–149. Springer, Cham.
Bano, M., Zowghi, D., and da Rimini, F. (2017). User satis-
faction and system success: an empirical exploration
of user involvement in software development. Empir-
ical Software Engineering, 22(5):2339–2372.
Bergstra, J. and Bengio, Y. (2012). Random search for
hyper-parameter optimization. Journal of Machine
Learning Research, 13(1):281–305.
Breiman, L. (2001). Random Forests. Machine Learning,
45(1):5–32.
Breiman, L., Friedman, J., Stone, C. J., and Olshen, R.
(1984). Classification and Regression Trees. Chap-
man & Hall, Boca Raton.
Buchan, J., Bano, M., Zowghi, D., MacDonell, S., and
Shinde, A. (2017). Alignment of Stakeholder Expec-
tations about User Involvement in Agile Software De-
velopment. In Proceedings of the 21st International
Conference on Evaluation and Assessment in Software
Engineering - EASE’17, pages 334–343, New York,
New York, USA. ACM Press.
Cartaxo, B., Araujo, A., Barreto, A. S., and Soares, S.
(2013). The Impact of Scrum on Customer Satisfac-
tion: An Empirical Study. In 2013 27th Brazilian
Symposium on Software Engineering, pages 129–136.
IEEE.
Cerpa, N., Bardeen, M., Astudillo, C. A., and Verner, J.
(2016). Evaluating different families of prediction
methods for estimating software project outcomes.
Journal of Systems and Software, 112:48–64.
Chang, C.-C. and Lin, C.-J. (2007). LIBSVM: A library
for support vector machines. ACM Transactions on
Intelligent Systems and Technology, 2(3).
Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable
Tree Boosting System. In Proceedings of the 22nd
ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining - KDD ’16, pages
785–794, New York. ACM Press.
Diegmann, P., Basten, D., and Pankratz, O. (2017). In-
fluence of Communication on Client Satisfaction in
Information System Projects: A Quantitative Field
Study. Project Management Journal, 48(1):81–99.
Fenton, N., Marsh, W., Neil, M., Cates, P., Forey, S., and
Tailor, M. (2004). Making Resource Decisions for
Software Projects. In Proceedings of the 26th Inter-
national Conference on Software Engineering, pages
397–406, Washington, DC. IEEE Computer Society.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Reg-
ularization Paths for Generalized Linear Models via
Coordinate Descent. Journal of Statistical Software,
33(1).
Friedman, J. H. (2001). Greedy function approximation: A
gradient boosting machine. The Annals of Statistics,
29(5):1189–1232.
ISBSG (2009). ISBSG Repository Data Release 11. Inter-
national Software Benchmarking Standards Group.
ISO/IEC (2011). Software engineering – Software prod-
uct Quality Requirements and Evaluation (SQuaRE) –
System and software quality models, volume ISO/IEC
25010:2011(E).
Jones, C. (2008). Applied Software Measurement: Global
Analysis of Productivity and Quality. McGraw-Hill
Education, 3rd edition.
Montesdioca, G. P. Z. and Maçada, A. C. G. (2015).
Measuring user satisfaction with information security
practices. Computers & Security, 48:267–280.
Radliński, Ł. (2015). Preliminary evaluation of schemes for
predicting user satisfaction with the ability of system
to meet stated objectives. Journal of Theoretical and
Applied Computer Science, 9(2):32–50.
Radliński, Ł. (2018). Predicting Aggregated User Satisfac-
tion in Software Projects. Foundations of Computing
and Decision Sciences, 43(4):335–357.
Raza, A., Capretz, L. F., and Ahmed, F. (2010). Improve-
ment of Open Source Software Usability: An Em-
pirical Evaluation from Developers’ Perspective. Ad-
vances in Software Engineering, 2010:1–12.
Shepperd, M. and MacDonell, S. (2012). Evaluating pre-
diction systems in software project estimation. Infor-
mation and Software Technology, 54(8):820–827.
Subramanyam, R., Weisstein, F. L., and Krishnan, M. S.
(2010). User participation in software development
projects. Communications of the ACM, 53(3):137–
141.
Tarafdar, M., Tu, Q., and Ragu-Nathan, T. S. (2010). Impact
of Technostress on End-User Satisfaction and Perfor-
mance. Journal of Management Information Systems,
27(3):303–334.
Venables, W. N. and Ripley, B. D. (2002). Modern Applied
Statistics with S. Statistics and Computing. Springer,
New York, NY, 4th edition.
Wang, Y. and Witten, I. H. (1997). Induction of model trees
for predicting continuous classes. In Proceedings
of the Poster Papers of the European Conference on
Machine Learning, Prague. University of Economics,
Faculty of Informatics and Statistics.
Wilkinson, G. N. and Rogers, C. E. (1973). Symbolic De-
scription of Factorial Models for Analysis of Variance.
Applied Statistics, 22(3):392.
Witten, I., Frank, E., and Hall, M. (2011). Data Mining:
Practical Machine Learning Tools and Techniques.
Elsevier, 3rd edition.
Zou, H. and Hastie, T. (2005). Regularization and variable
selection via the elastic net. Journal of the Royal Sta-
tistical Society: Series B (Statistical Methodology),
67(2):301–320.