NONPARAMETRIC ANALYSIS OF SOFTWARE RELIABILITY

Revealing the Nature of Software Failure Dataseries

Andreas S. Andreou, Constantinos Leonidou

Department of Computer Science, University of Cyprus,

75 Kallipoleos Str., P.O.Box 20537, CY1678, Nicosia, Cyprus

Keywords: software failure, reliability, nonparametric analysis, antipersistence

Abstract: Software reliability is directly related to the number and time of occurrence of software failures. Thus, if we were able to reveal and characterize the behavior of the evolution of actual software failures over time, then we could possibly build more accurate models for estimating and predicting software reliability. This paper focuses on the study of the nature of empirical software failure data via a nonparametric statistical framework. Six different time-series datasets expressing times between successive software failures were investigated, and random behavior was detected, with evidence favoring a pink noise explanation.

1 INTRODUCTION

System reliability is one of the major components identified by the ISO 9126 standard for assessing the overall quality of software. More specifically, software reliability can be evaluated on the following factors:

• Maturity: Attributes of software that bear on the frequency of failure by faults in the software (ISO/IEC 9126-1:2001)

• Fault tolerance: Attributes of software that bear on its ability to maintain a specified level of performance in cases of software faults or of infringement of its specified interface (ISO/IEC 9126-1:2001)

• Crash frequency: The number of system crashes per unit of time

• Recoverability: Attributes of software that bear on the capability to re-establish its level of performance and recover the data directly affected in case of a failure, and on the time and effort needed for it (ISO/IEC 9126-1:2001)

The standard definition of reliability for software is the probability of normal execution without failure for a specified interval of time (Musa, 1999; Musa, Iannino and Okumoto, 1987). Going a step further, we can state that reliability varies with execution time and grows as the faults underlying the observed failures are uncovered and corrected. Therefore, measuring and studying quantities related to software failure can lead to improved reliability and customer satisfaction. There are four general ways (quantities) to characterize (measure) failure occurrences (Musa, 1999):

• Time of failure
• Time interval between failures
• Cumulative failures experienced up to a given time t_i
• Failures experienced in a time interval t_i

The majority of recent research studies focus on software reliability assessment or prediction using the failure-related metrics mentioned above (e.g. Patra, 2003; Tamura, Yamada and Kimura, 2003), with very few exceptions that experiment with other alternatives, like Fault Injection Theory (e.g. Voas and Schneidewind, 2003), Genericity (Schobel-Theuer, 2003), etc.

As Musa points out (Musa, 1999), the quantities of failure occurrences are random variables in the sense that we do not know their values in a certain time interval with certainty, i.e. we cannot predict their exact values. This randomness, though, does not imply any specific probability distribution (e.g. uniform); it stems on one hand from the complex and unpredictable process of human errors introduced when designing and programming, and on the other from the unpredictable conditions of program execution. In addition, the behavior of software is affected by so many factors that capturing it with a deterministic model is impractical.


The aim of this paper is to study the nature and structure of software failure data using a set of empirical software failure dataseries and a robust nonparametric statistical method. ReScaled range (R/S) analysis, originated by Hurst (Hurst, 1956), looks deep into a time-series dataset and provides strong evidence of either an underlying deterministic structure or a random behavior. In the former case it can define the time pattern of the deterministic behavior, while in the latter it reveals the nature of the randomness (i.e. the colour of the noise).

The rest of the paper is organized as follows: Section 2 outlines the basic concepts of the nonparametric framework used to analyse the available datasets. Section 3 briefly describes the six software failure time-series datasets used and provides a short statistical profile of the data involved. Section 4 concentrates on the empirical evidence resulting from the application of R/S analysis to the failure data of the different software projects. In addition, this section presents our attempt to investigate, through R/S analysis, the nature of software failure data produced by two well-known and widely used Software Reliability Growth Models, namely the Musa Basic model and the logarithmic Poisson Musa-Okumoto execution time model, and compares the results with those derived using the empirically collected data. Finally, Section 5 sums up the empirical findings, draws the concluding remarks and suggests future research steps.

2 THE R/S ANALYSIS

Technically, the origins of R/S analysis are related to the "T to the one-half rule", that is, to the formula describing Brownian motion (B.M.):

R = T^0.5    (1)

where R is the distance covered by a random particle suspended in a fluid and T a time index. Equation (1) shows how R scales with time T in the case of a random system, and this scaling is given by the slope of the log(R) vs. log(T) plot, which is equal to 0.5. Yet, when a system or a time series is not independent (i.e. not a random B.M.), (1) cannot be applied, so Hurst gave the following generalisation of (1) which can be used in this case:

(R/S)_n = c·n^H    (2)

where (R/S)_n is the ReScaled range statistic measured over a time index n, c is a constant and H is the Hurst exponent, which shows how the R/S statistic scales with time.

The objective of the R/S method is to estimate the Hurst exponent, which, as we shall see, can characterise a time series. This can be done by transforming (2) to:

log(R/S)_n = log(c) + H·log(n)    (3)

so that H can be estimated as the slope of the log/log plot of (R/S)_n vs. n.

Given a time series {X_t : t = 1, ..., N}, the R/S statistic can be defined as the range of the cumulative deviations from the mean of the series, rescaled by the standard deviation. The analytical procedure to estimate the (R/S)_n values, as well as the Hurst exponent by applying (3), is described in the following steps:

Step 1: The time period spanned by the time series of length N is divided into m contiguous sub-periods of length n, such that mn = N. The elements X_{i,j} in each sub-period carry two subscripts, the first (i = 1, ..., n) denoting the element within the sub-period and the second (j = 1, ..., m) denoting the sub-period index. For each sub-period j the R/S statistic is calculated as:

(R/S)_j = (1/s_j) [ max_{1≤k≤n} Σ_{i=1}^{k} (X_{i,j} − X̄_j) − min_{1≤k≤n} Σ_{i=1}^{k} (X_{i,j} − X̄_j) ]    (4)

where s_j is the standard deviation and X̄_j the mean of sub-period j. In (4), the deviations from the sub-period mean sum to zero; hence the last value of the cumulative deviations for each sub-period will always be zero. Because of this, the maximum value of the cumulative deviations will always be greater than or equal to zero, while the minimum value will always be less than or equal to zero, so the range value (the bracketed term in (4)) will always be non-negative. Normalizing (rescaling) the range is important since it permits diverse phenomena and time periods to be compared, which means that R/S analysis can describe time series with no characteristic scale.

Step 2: The (R/S)_n, which is the R/S statistic for time length n, is given by the average of the (R/S)_j values over all m contiguous sub-periods of length n:

(R/S)_n = (1/m) Σ_{j=1}^{m} (R/S)_j    (5)

Step 3: Equation (5) gives the R/S value corresponding to a certain time interval of length n. In order to apply equation (3), steps 1 and 2 are repeated for increasing integer values of n, up to n = N/2, since at least two sub-periods are needed to avoid bias.

From the above procedure it becomes obvious that the time dimension is included in the R/S analysis by examining whether the range of the cumulative deviations depends on the length of time used for the measurement. Once (5) is evaluated for different periods n, the Hurst exponent can be estimated through an ordinary least squares regression from (3).
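To make the procedure concrete, the following minimal Python sketch (our own illustration, not part of the original study; it assumes only numpy, and the function names are ours) implements Steps 1 to 3 and estimates H as the slope of the log/log regression in (3):

import numpy as np

def rescaled_range(sub):
    """(R/S)_j of one sub-period, eq. (4): range of the cumulative
    deviations from the sub-period mean, rescaled by the standard deviation."""
    cumdev = np.cumsum(sub - sub.mean())   # cumulative deviations; last value is 0
    s = sub.std()                          # sub-period standard deviation s_j
    return (cumdev.max() - cumdev.min()) / s if s > 0 else 0.0

def hurst_exponent(x, n_min=8):
    """Estimate H as the OLS slope of log(R/S)_n versus log(n), eq. (3)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    log_n, log_rs = [], []
    for n in range(n_min, N // 2 + 1):     # Step 3: at least two sub-periods
        m = N // n                         # number of contiguous sub-periods
        rs = [rescaled_range(x[j * n:(j + 1) * n]) for j in range(m)]
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs)))  # (R/S)_n, eq. (5)
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

# Sanity check: an iid Gaussian series should give H near 0.5.
rng = np.random.default_rng(0)
print(hurst_exponent(rng.standard_normal(800)))

Note that finite samples bias the estimate slightly above 0.5 even for independent data (compare the surrogate means of about 0.56-0.60 in Table 2), which is one reason the significance tests of Section 4 compare against simulated nulls rather than against 0.5 itself.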

The Hurst exponent takes values from 0 to 1 (0 ≤ H ≤ 1). Gaussian random walks, or, more generally, independent processes, give H = 0.5. If 0.5 < H ≤ 1, positive dependence is indicated and the series is called persistent or trend-reinforcing; in terms of equation (1), the system covers more distance than a random one. In this case the series is characterised by a long-memory process with no characteristic time scale. The lack of a characteristic time scale (scale invariance) and the existence of a power law (the log/log plot) are the key characteristics of a fractal series. If 0 ≤ H < 0.5, negative dependence is indicated, yielding anti-persistent or mean-reverting behavior (the latter only if the system under study is assumed to have a stable mean). In terms of equation (1), the system covers less distance than a random series, which means that it reverses itself more frequently than a random process.

A Hurst exponent different from 0.5 may characterise a series as fractal. However, a fractal series might be the output of different kinds of systems. A "pure" Hurst process is a fractional Brownian motion (Mandelbrot and Wallis, 1969), also known as a biased random walk, fractal noise or coloured noise; that is, a random series whose bias can change abruptly but randomly in direction or magnitude.

A problem, though, that must be dealt with is the sensitivity of R/S analysis to short-term dependence, which can lead to unreliable results (Aydogan and Booth, 1988; Booth and Koveos, 1983; Lo, 1991). Peters (1994) shows that Autoregressive (AR), Moving Average (MA) and mixed ARMA processes exhibit Hurst effects, but once short-term memory is filtered out by an AR(1) specification these effects cease to exist. On the contrary, ARCH and GARCH models do not exhibit long-term memory and persistence effects at all. Hence, a series should be pre-filtered for short-term linear dependence before applying the R/S analysis. In our analysis, we use partial autocorrelograms and Schwarz's information criterion to indicate the best-fit linear time series model for our data.

3 SOFTWARE FAILURE DATA

Before applying R/S analysis to the available software failure datasets, let us examine some basic features of the data. Six different datasets were used in this paper, collected throughout the mid-1970s by John Musa and representing projects from a variety of applications including real-time command and control, word processing, commercial, and military applications (DFC, www.dacs.dtic.mil): Project 5 (831 total samples), SS1B (375 total samples), SS3 (278 total samples), SS1C (277 total samples), SS4 (196 total samples) and SS2 (192 total samples).

R/S analysis can be applied only to a transformation of the original data samples which produces the so-called return series, eliminating any trend present. This transformation is the following:

r_t = log(S_t / S_{t-1})    (6)

where S_t and S_{t-1} are the failure samples (more specifically, the times elapsed from the previous failure, measured in seconds) at discrete sample numbers (t) and (t-1) respectively, and r_t is the estimated return sample.
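As an illustration (ours, not part of the original study), the transformation in (6) followed by the AR(1) pre-filtering prescribed in Section 2 can be sketched as follows, assuming numpy, statsmodels' AutoReg and strictly positive inter-failure times; the names are ours:

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def detrended_returns(times):
    """Log-return transformation of eq. (6) followed by AR(1) filtering."""
    times = np.asarray(times, dtype=float)     # inter-failure times in seconds
    returns = np.log(times[1:] / times[:-1])   # r_t = log(S_t / S_{t-1})
    ar1 = AutoReg(returns, lags=1).fit()       # fit r_t = a + b*r_{t-1} + e_t
    return ar1.resid                           # residuals: short-term memory removed

Each of the two steps consumes one sample, which is consistent with Table 1 reporting 829 AR(1)-filtered samples for the 831 raw samples of Project 5.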

As mentioned in the previous section, in order to remove short-term memory all return series were filtered with an AR(1) model. A short statistical profile is given in Table 1, where one can easily discern that the series under study present slight deviations from the normal distribution: left skewness is evident for all data series, while, as regards kurtosis, the Project 5 samples present leptokurtosis, with the rest of the series being less concentrated around the mean than the normal distribution. Figure 1 plots the data samples belonging to Project 5, the left panel showing the evolution of failures according to the elapsed time and the right panel presenting the de-trended return series. The remaining datasets present a similar graphical behavior, so their figures are omitted due to space limitations.

The statistical profile of the dataseries given above suggests a general resemblance to the normal (Gaussian random) distribution, but slight differences were also observed. The main questions, though, remain: What is the actual nature of these data? Is there a random component in the behavior of these series? The answers will be provided by closely examining the results derived from the application of R/S analysis, described in the next section.


4 EMPIRICAL EVIDENCE

4.1 R/S Analysis Results on Software Reliability Data Series

The results of R/S analysis on the six failure return series are listed in Table 2. The estimated Hurst exponents have low values, ranging from 0.27 (Project 5 dataset) to 0.37 (SS1C dataset). These values indicate anti-persistency; that is, all return series present negative dependence, or antipersistence, yielding a mean-reverting behavior, since the data fluctuate around a reasonably stable mean (no trend or consistent pattern of growth). In terms of equation (1), each of the six systems covers less distance than a random series, which means that it reverses itself more frequently than a random process.

Figure 1: Project 5 sample series: (a) Time intervals (in seconds) between successive failures; (b) AR(1)-filtered returns.

Table 1: Statistical description of the six software failure returns series data (AR(1)-filtered samples)

                     Project 5   SS1B       SS3        SS1C       SS4        SS2
Sample size          829         373        276        275        194        191
Average              5.55E-18    2.97E-18   -1.20E-18  -5.85E-18  1.54E-17   3.95E-17
Median               0.024       0.073      0.075      0.084      0.121      -0.027
Variance             1.150       0.871      1.936      0.988      1.421      1.176
Standard deviation   1.072       0.933      1.391      0.993      1.192      1.084
Minimum              -4.963      -3.120     -5.056     -3.094     -4.127     -2.634
Maximum              5.165       2.879      3.470      4.786      2.832      3.268
Range                10.129      6.000      8.526      7.881      6.960      5.902
Lower quartile       -0.479      -0.536     -0.761     -0.473     -0.699     -0.671
Upper quartile       0.597       0.619      0.942      0.563      0.823      0.712
Skewness             -0.616      -0.263     -0.315     -0.034     -0.432     -0.131
Kurtosis             4.082       0.457      0.180      2.327      0.713      -0.019

Table 2: Hurst estimates and tests of significance against two random alternatives for the AR(1)-filtered returns series of the six Musa software failure datasets

                            IID-null hypothesis        Gaussian-null hypothesis
Dataset     Hurst exponent  Mean Hurst  Significance   Mean Hurst  Significance
Project 5   0.27            0.56        0.1%           0.56        0.1%
SS1B        0.31            0.58        0.1%           0.58        0.1%
SS3         0.28            0.59        0.1%           0.60        0.1%
SS1C        0.37            0.59        0.1%           0.59        0.1%
SS4         0.33            0.60        0.1%           0.60        0.1%
SS2         0.33            0.60        0.1%           0.60        0.1%


A significant problem of R/S analysis is the evaluation of the H exponent from a statistical point of view. Specifically, we should be able to assess whether an H value is statistically significant compared to a random null, i.e. to the H exponent exhibited by an independent random system. Peters (1994) shows that under the Gaussian null, a modification of a formula developed by Anis and Lloyd (1976) allows for hypothesis testing by computing E(R/S)_n and E(H), the expected variance of which will depend only on the total sample size N, as Var(H) = 1/N. However, if the null is still iid randomness but not Gaussianity, formal hypothesis testing is not possible. To overcome this problem we used bootstrapping (Efron, 1979) to assess the statistical significance of the H exponents of our series against both the Gaussian and the iid random null hypotheses.

The validity of the results obtained using the R/S analysis may be assessed as follows:

(A) To test against the Gaussian random null, the H exponent from 5000 random shuffles of a Gaussian random surrogate, having the same length, mean and variance as our return series, is calculated and compared to the test statistic, i.e. the actual H exponent of our series. Since the actual H statistic was found to be lower than 0.5 and anti-persistence is possible, the null can be formed as H_0: H = H_G and the alternative as H_1: H < H_G. In this case, the significance level of the test is constructed as the frequency with which the pseudostatistic H_G is smaller than or equal to the actual statistic, and the null is rejected if the significance level is smaller than the conventional rejection levels of 1%, 2.5% or 5%.

(B) To test against the iid null, the same procedure is followed, but this time we randomise the return series under test to produce 5000 iid random samples having the same length and distributional characteristics as the original series. In this case, rejection of the null means that the actual H exponent calculated from the original series is significantly smaller than the one calculated from an iid random series. Hence, this is also a test for non-iid-ness.
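A minimal sketch of test (B) is given below (our illustration; it assumes a hurst_exponent estimator such as the one sketched in Section 2, and the names are ours). Test (A) is identical except that each surrogate is a freshly drawn Gaussian series with the same length, mean and variance as the return series:

import numpy as np

def iid_significance(series, hurst_exponent, n_shuffles=5000, seed=0):
    """One-sided bootstrap test for antipersistence (H1: H < H_iid).
    Returns the actual H, the mean surrogate H, and the significance
    level, i.e. the frequency with which a surrogate H <= the actual H."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    h_actual = hurst_exponent(series)
    # Shuffling preserves the empirical distribution but destroys any
    # temporal dependence, giving an iid surrogate of the same series.
    h_surr = np.array([hurst_exponent(rng.permutation(series))
                       for _ in range(n_shuffles)])
    return h_actual, h_surr.mean(), np.mean(h_surr <= h_actual)

# Test (A) uses a Gaussian surrogate instead:
# surrogate = rng.normal(series.mean(), series.std(), len(series))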

Validation of the R/S analysis results was performed as described above, with both tests applied to the available dataseries. As Table 2 shows, both null hypotheses are rejected, since the significance level of the test in both cases was lower than 1%. This means that our Hurst estimates are statistically significant against both the Gaussian and the iid null hypotheses, suggesting an underlying structure that deviates from the normal distribution and resembles the one observed for random, non-white noise.

Fractional noise scales according to an inverse power law: its power spectrum is essentially a function of frequency f and follows the form f^{-b}. This is a typical characteristic of fractals, which have power spectra that follow the inverse power law as a result of the self-similar nature of the system (Peters, 1994). For white noise (a Gaussian, random process) b = 0, that is, the power spectrum is not related to frequency; there is no scaling law for white noise. When white noise is integrated, then b = 2, the power spectrum of brown noise. If 0 < b < 2 we have pink noise, while for b > 2 there is black noise.

Mandelbrot and Van Ness (Mandelbrot and Van Ness, 1968) postulated, and later Flandrin (Flandrin, 1989) rigorously defined, the following equation relating fractional noises to the Hurst exponent:

b = 2H + 1    (7)

where b is the spectral exponent and H the Hurst exponent. Therefore, we can relate pink noise to antipersistence: H < 0.50 implies 1 ≤ b < 2. For example, the Project 5 estimate H = 0.27 gives b = 2(0.27) + 1 = 1.54, well inside the pink-noise band 0 < b < 2.

According to the foregoing notions, and based on the results of the R/S analysis, we can conclude that the six software failure data series are random in nature, thus confirming the findings and supporting the arguments of various researchers (Musa, 1999; Musa and Iannino, 1990). Moving a step forward, we may state that the main conclusion drawn is that the empirical behavior of software failures resembles that of pink noise.

4.2 R/S Analysis and Software Reliability Growth Models System Variables

Having in mind the results thus far and the evidence suggesting a pink noise explanation for the behavior of the empirical software failure data, we will attempt to study the nature of the data produced by known Software Reliability Growth Models

(SRGM) and compare results. Several studies (Dale,

1982; Ramamoorthy and Bastani, 1982; Farr, 1996;

Jones 1991) report the superiority of the Musa Basic

(MB) and the logarithmic Poisson Musa-Okumoto

(MO) execution time models (Musa, 1975; Musa

and Okumoto, 1984) over a variety of SRGM widely

applied to different actual projects and computer

programs. Therefore, we decided to use those two

models to produce new software failure time

samples and apply R/S analysis to investigate the

nature of the produced dataseries, and hence the type


of software failure behavior each model is capable of capturing. Let M(t) represent the number of failures of the particular software package by time t (t ≥ 0). Clearly, M(0) = 0. M(t) cannot be easily calculated because it corresponds to a physical quantity and can only be measured as the software is tested. Software Reliability Growth Models try to model the behavior of M(t) with the statistical function µ(t). The mean value function µ(t) of a SRGM represents the number of failures expected to occur up to time moment t (t ≥ 0):

µ(t) = c(t)·N    (8)

where c(t) is the time-variant test coverage function and N is the number of faults expected to have been exposed at full coverage. This is distinguished from the expected number of faults to be detected after infinite testing time, perfect testing and fault detection coverage, which can be denoted as N̂. Equations (9) and (10) describe the test coverage function for the MB and MO models respectively:

MB: c(t) = 1 − e^{−fKBt}    (9)

MO: c(t) = ln(1 + φt)    (10)

where φ is the failure rate per fault, K is the so-called fault exposure ratio, B is a fault reduction factor, and f can be calculated as the average object instruction execution rate of the computer, r, divided by the number of source code instructions of the application under testing, I_S, times the average number of object instructions per source code instruction, Q_x (Musa, 1999):

f = r / (I_S·Q_x)    (11)

Both M(t) and µ(t) are sequences of the form 0, 1, 2, … with M(0) = 0 and µ(0) = 0. Assuming that as time t approaches infinity µ(t) becomes a good approximation of M(t) (the number of failures that has been realized up to time t), µ(t) takes by definition integer positive values, since it corresponds to the expected number of failures experienced by time t; hence µ(t) ∈ {0, 1, 2, …}. That both SRGMs (MB and MO) converge to the number of failures realized up to time t as t approaches infinity is generally acceptable; therefore we assume that µ(t) approaches M(t) as t approaches infinity. But µ(t), which characterizes a SRGM, is given as a model property, so we should try to reverse engineer the process of failure occurrence and artificially reproduce the "failure occurrences". The failure reproduction process is as follows:

First we replace µ(t) in equation (8) with an integer positive variable i (i = 0, 1, 2, ...). Thus, we have:

i = c(t_i)·N    (12)

Substituting c(t_i) from (9) and (10) and solving for t_i we have:

MB: i = N(1 − e^{−fKB·t_i})  ⇔  t_i = −(1/(fKB))·ln(1 − i/N)    (13)

MO: i = N·ln(1 + φt_i)  ⇔  t_i = (1/φ)·(e^{i/N} − 1)    (14)

The time variable t_i represents discrete failure time moments, i.e. the time elapsed from the start of this process (t_0 for i = 0) until the occurrence of the i-th failure (t_1 for i = 1, t_2 for i = 2, etc.).

Consequently, we can calculate the times between successive failures as:

Δt_i = t_i − t_{i−1}    (15)

Following the analysis described above, we reproduced artificial failure time data via equations 13 to 15, for which we used the following parameter values suggested by Musa (1975, 1999) and the Defense Software Collaboration (DFC, www.dacs.dtic.mil):

φ = 7.8E-8, f = 7.4E-8, K = 4.2E-7, B = 0.955
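For illustration, the reproduction process under these parameter values can be sketched as follows (our code, not the authors'; the value of N, which the text does not fix explicitly, is an assumption chosen only to yield a series length comparable to Project 5):

import numpy as np

phi, f, K, B = 7.8e-8, 7.4e-8, 4.2e-7, 0.955
n_failures = 830                 # roughly Project 5's size; our assumption
N = n_failures + 1               # keeps i/N < 1, so the MB logarithm is defined

i = np.arange(0, n_failures + 1)             # failure index i = 0, 1, 2, ...
t_mb = -np.log(1 - i / N) / (f * K * B)      # eq. (13), Musa Basic
t_mo = (np.exp(i / N) - 1) / phi             # eq. (14), Musa-Okumoto

dt_mb = np.diff(t_mb)            # eq. (15): times between successive failures
dt_mo = np.diff(t_mo)
# dt_mb and dt_mo then pass through the return transformation (6),
# AR(1) filtering and R/S analysis exactly like the empirical series.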

Two new software failure dataseries were thus generated, with their size being equal to that of

Project 5 (829 samples). R/S analysis was then

applied on the AR(1)-filtered returns of the

artificially reproduced series. The Hurst exponent

estimations indicate that the MB-based reproduced series exhibits long-memory or persistent effects

(H=0.9), while the corresponding MO-based sample

data is characterized as antipersistent (H=0.2). These

findings are very interesting considering the fact that

they suggest a strong diversification between two

models that are both used as software reliability

growth estimators, and hence as software failure

predictors. The similar antipersistent

characterization of the six software failure dataseries

under study and the MO-based generated data is also

consistent with the findings of other researchers who

report the superiority of the MO model in actual

software projects. It is therefore natural that this

model proved to be more successful compared to the

rest of the SRGMs simply because its underlying

modeling nature is similar to the behavior observed

with actual empirical data.

5 CONCLUSIONS – DISCUSSION

The nature of empirical software reliability time series data was investigated in this paper using a

robust nonparametric statistical framework called ReScaled range (R/S) analysis. This type of analysis is able to detect the presence (or absence) of long-term dependence, thus characterizing the system under study as persistent / deterministic (or mean-reverting / antipersistent).

The results of the R/S analysis on the six different software failure dataseries suggested a random explanation, revealing strong antipersistent behavior. The validity of these results was tested against the Gaussian and iid null hypotheses, and both nulls were rejected with highly significant statistical values. Our findings are consistent with other reports in the international literature that comment on the random nature of software failures in terms of number and time of occurrence. Going a step further and relating the values of the Hurst exponent estimated via the R/S analysis to colored noise, we concluded that the six series examined are better described by a pink noise structure.

Two well-known and successful software reliability growth models, namely the Musa Basic (MB) and the Musa-Okumoto (MO) logarithmic model, were used to reproduce sample series of times between failures. The new series were also tested with R/S analysis and found to be long-term dependent in the case of the MB data and antipersistent in the case of the MO series. The latter finding justifies the superiority of the MO model over a variety of other SRGMs tested on a number of actual projects: the MO model is ruled by the same structure as the empirical software failure data (i.e. mean-reverting), and thus it can capture the actual behavior of software failures better.

The framework for the statistical characterization of empirical software failure data proposed by the present work can assist in identifying possible weaknesses in the assumptions of SRGMs and in suggesting the most suitable and accurate model for controlling the reliability level of the software product under development.

One thing that remains to be investigated, and this will be the focus of our future work, is the high value of the Hurst exponent for the MB series. Chaotic systems also have Hurst exponents H > 0.5, and in chaotic terms long-memory effects correspond to sensitive dependence on initial conditions; indeed, the latter property combined with fractality characterises chaotic systems. Pure chaotic processes have Hurst exponents close to 1. Our future research will concentrate on employing R/S analysis to detect the existence of cycles (i.e. repeating patterns) in the MB series and will attempt to characterise it in terms of periodicity: cycles detected can be periodic or non-periodic, in the sense that the system has no absolute frequency. Non-periodic cycles can be further divided into statistical cycles and chaotic cycles. Fractal noises exhibit statistical cycles, i.e. cycles with no average cycle length.

Software reliability is cited by the majority of users as the most important feature of software products. Hence, the software reliability engineering process should play a central role in the planning and control of software development

activities (Musa, 1999) to ensure that product

reliability meets user needs, to reduce product costs

and to improve customer satisfaction. In this context,

it is quite important to record and study the time of

fault occurrences, their nature and the time spent on

correction, throughout the design and

implementation phases, as well as during testing

activities. This data can prove very useful when

trying to model the reliability behavior of the

software product being developed and it can also

play a central role in the empirical verification of

Software Reliability Growth Models (SRGMs).

With such data available it is easier to select suitable

reliability models for estimating the time at which

the software product will have reached a desirable

level of reliability, and/or to devise methods to

decrease that time. As we mentioned before, the

datasets used in this study were gathered by John

Musa around the mid-seventies. Although the datasets

are quite old, we believe that their analysis is a good

starting point for studying the behavior of software

failure occurrences.

Currently we are making efforts towards

gathering synchronous failure data, that is, data from

systems that have been developed and are in

operation during the last five years. We are trying to build a database with failure data from various

modern application domains and operating systems.

Nowadays, when most computer applications and operating systems are multitasking and/or distributed, it is very interesting to elaborate on how to gather accurate failure data. This problem, namely the definition of the right metrics for assessing software reliability, is of particular interest because software products now include new and unique characteristics, such as distributed computing, mobility and web accessibility / immediacy. Once this reliability data gathering is accomplished, we will be able to compare the characteristics of modern reliability data with those of the datasets collected by Musa in the 1970s, and investigate whether the random nature of software failure occurrences revealed by the R/S analysis


holds true for modern software systems as well.

Furthermore, we will attempt to investigate whether

there is a relationship between the datasets used and

the results obtained, i.e. whether certain application

domains are more error prone than others.

REFERENCES

Anis, A.A. and Lloyd, E.H., 1976. The Expected Value of the Adjusted Rescaled Hurst Range of Independent Normal Summands. Biometrika 63, pp. 155-164.

Aydogan, K. and Booth, G., 1988. Are there long cycles in common stock returns? Southern Economic Journal 55, pp. 141-149.

Booth, G.G. and Koveos, P.E., 1983. Employment fluctuations: an R/S analysis. Journal of Regional Science 23, pp. 19-31.

Dale, C.J., 1982. Software Reliability Evaluation Methods. Report ST26750, British Aerospace.

Defense Software Collaboration (DFC). http://www.dacs.dtic.mil/techs/baselines/reliability.html

Efron, B., 1979. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7, pp. 1-26.

Farr, W., 1996. Software Reliability Modeling Survey. In M. Lyu (ed.) Handbook of Software Reliability Engineering, McGraw-Hill, New York, pp. 71-117.

Flandrin, P., 1989. On the Spectrum of Fractional Brownian Motions. IEEE Transactions on Information Theory 35.

Hurst, H.E., 1956. Methods of using long-term storage in reservoirs. Proceedings of the Institution of Civil Engineers 1, pp. 519-543.

International Organization for Standardization, 1991. ISO/IEC 9126 - Information Technology, Software Product Evaluation, Quality Characteristics and Guidelines for their Use.

Jones, W.D., 1991. Reliability Models for Very Large Software Systems in Industry. 2nd International Symposium on Software Reliability Engineering, pp. 35-42.

Lo, A.W., 1991. Long term memory in stock market prices. Econometrica 59, pp. 1279-1313.

Mandelbrot, B. and van Ness, J.W., 1968. Fractional Brownian Motions, Fractional Noises and Applications. SIAM Review 10.

Mandelbrot, B. and Wallis, J., 1969. Computer experiments with fractional Gaussian noises. Parts 1, 2, 3. Water Resources Research 5.

Musa, J.D., 1975. A Theory of Software Reliability and its Application. IEEE Transactions on Software Engineering 1(3), pp. 312-327.

Musa, J.D. and Okumoto, K., 1984. A Logarithmic Poisson Execution Time Model for Software Reliability Measurement. 7th International Conference on Software Engineering, pp. 230-238.

Musa, J.D., Iannino, A. and Okumoto, K., 1987. Software Reliability - Measurement, Prediction, Application. McGraw-Hill, New York.

Musa, J.D. and Iannino, A., 1990. Software Reliability. In Marshall Yovits (ed.) Advances in Computers 30, pp. 85-170, Academic Press, San Diego.

Musa, J.D., 1999. Software Reliability Engineering. McGraw-Hill, New York.

Patra, S., 2003. A neural network approach for long-term software MTTF prediction. In Fast abstracts of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE2003), Chillarege Press.

Peters, E., 1994. Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons, New York.

Ramamoorthy, C.V. and Bastani, F.B., 1982. Software Reliability - Status and Perspectives. IEEE Transactions on Software Engineering 8(4), pp. 354-370.

Schobel-Theuer, T., 2003. Increasing software reliability through the use of Genericity. In Fast abstracts of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE2003), Chillarege Press.

Tamura, Y., Yamada, S. and Kimura, M., 2003. A software reliability assessment method based on neural networks for distributed development environment. Electronics and Communications in Japan, Part 3, Vol. 86, No. 11, pp. 1236-1243.

Voas, J. and Schneidewind, N., 2003. Marrying software Fault Injection Technology with Software Reliability Growth Models. In Fast abstracts of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE2003), Chillarege Press.
