Testing Practices of Software in Safety Critical Systems: Industrial
Survey
Mohamad Kassab
Engineering Division, Pennsylvania State University, Malvern, PA, U.S.A.
Keywords:
Safety-critical Systems, Software Testing, Software Professionals, Industrial Practices.
Abstract:
Software is increasingly becoming a core, integrated part of safety-critical systems. Unfortunately, little contemporary data exists to document the actual practices used by software professionals for software testing and quality assurance activities for software in safety-critical systems. To remedy this lack of data, we conducted a comprehensive survey of software professionals to attempt to discover these practices. In this paper we report on our findings from this survey on the state of practice of testing software for safety-critical systems with respect to three areas: 1) the integration of the testing activities within the software development life cycle; 2) testing methods and techniques; 3) testing metrics and defects management. We also provide some comparison with testing software for non-safety-critical systems.
1 INTRODUCTION
A safety-critical system is a system whose malfunc-
tion may result in death or serious injury to peo-
ple, loss or damage to property or environmental
harm. Engineers have developed safety-critical sys-
tems by relying on conservative best practices, stan-
dards (e.g. MIL-STD-882E: System Safety (DoD,
2012), ISO 26262, Road vehicles-Functional safety
(ISO, 2011), NASA-STD-8719.13C: Software Safety
Standard (NASA, 2013)) and a culture where safety
considerations are integral to all aspects of an organi-
zation (Feiler et al., 2013).
Software is increasingly becoming a core, integrated part of safety-critical systems. The industry cost for the software of current-generation aircraft has reached $8 billion (Redman et al., 2010). The avionics system in the F-22 Raptor consists of about 1.7 million lines of software code (Charette, 2009), as 80% of its functionality is achieved by software, which comprised 30% of engineering and manufacturing development costs. Software in cars is only going to grow in both amount and complexity. It is estimated that cars will require 200 million to 300 million lines of software code in the near future (Charette, 2009).
While software in isolation cannot do physical harm, software in the context of a system and an embedding environment could be vulnerable (Naik and Tripathy, 2011). For example, a software module in a database is not hazardous by itself, but if a radiation therapy machine delivers fatal doses to patients because of a software error, then it is not safe software (Leveson and Turner, 1993). Software is considered safety-critical if it controls or monitors hazardous or safety-critical hardware or software. Such software usually resides on remote, embedded, and/or real-time systems (NASA, 2013).
The organizations developing safety-critical soft-
ware systems should have a clear testing strategy that
defines the methods for identifying, tracking, evalu-
ating and eliminating hazards associated with a sys-
tem. Despite best build-then-test practices, system-
level faults due to software have increasingly domi-
nated the rework effort for faults discovered during
system integration and acceptance testing. Several
studies of safety-critical systems show that 80% of all
errors are not discovered until system integration or
later. The rework effort to correct a problem in later
phases can be as high as 300-1000 times the cost of
in-phase correction (Feiler et al., 2013).
In order to trigger any favorable change in this
state of practice, a serious effort is required in pre-
dicting the trends, learning the stakeholder mindsets,
and pinpointing the problem areas in software testing.
Unfortunately, little contemporary data exists to docu-
ment the actual practices used by software profession-
als for software testing and quality assurance (QA)
activities for safety-critical systems. This is partly be-
cause the data are commercially sensitive, and partly
because the data is not always collected systemati-
cally (McDermid and Kelly, 2006).
We conducted a comprehensive survey of software professionals to attempt to discover these practices. Surveys of software industry professionals are an effective way to determine current trends in software engineering processes. Survey responses can also help others to understand the relationships between areas such as software quality and testing (Ng et al., 2004). A carefully constructed survey has the potential to: 1) remedy the lack of data and 2) identify software testing best practices, which can then be disseminated. Based on these two objectives, we designed a survey study on the current software testing state of practice. While an initial view of the survey results was presented in (Kassab et al., 2017), in this paper we provide a different, pragmatic view to report on the state of practice for testing software in safety-critical systems.
The rest of the paper is organized as follows: Section 2 discusses related work, while Section 3 describes the survey's design and conduct. In Section 4 we provide general statistics regarding the participants, their organizations and the reported projects. Section 5 provides our findings on the state of practice of testing software for safety-critical systems and provides some comparison with testing software for non-safety-critical systems. Section 6 discusses the limitations to validity. Finally, the conclusions are presented in Section 7.
2 RELATED WORK
There are a few works available involving surveys of software professionals with respect to testing (e.g. (Haberl et al., 2011), (Ng et al., 2004), (Causevic et al., 2010), (Turkish-Testing-Board, 2014), (ISTQB, 2014), (Kasurinen et al., 2010), (Kanij et al., 2011), (Knauss et al., 2016)). For example, in (Knauss et al., 2016), the authors presented an investigation of the state of the art and future trends in testing critical cyber-physical systems, using the example of active safety systems for vehicles. The results from conducting four focus groups with Swedish industrial partners show that while the main testing processes and scenarios are supported, there is a clear need to enable testing of more complex scenarios in realistic settings, as well as to increase the degree of test automation to achieve better repeatability and more effective use of test resources. Indeed, many other survey results indicate that the percentage of automated testing is low in industry (Causevic et al., 2010), (Rafi et al., 2012), (Lee et al., 2012). Supporting this fact are other studies that indicate tool adoption is also low (Ng et al., 2004), (Garousi and Varma, 2010). While current survey studies show that organizations do not make effective use of testing tools (Grindal et al., 2006), Garousi et al. (Garousi and Varma, 2010) found that automated testing has increased since 2004.
Causevic's survey results (Causevic et al., 2010) indicated that the use of open source vs. proprietary testing tools depended on whether respondents were performing unit testing or higher-level system testing. Although Causevic (Causevic et al., 2010) found that writing test cases before writing code is mostly not considered a current practice, our survey results showed that these approaches are becoming established in practice.
Other surveys with a focus on the cost and productivity aspects of testing also exist. For example, Kasurinen et al. (Kasurinen et al., 2010) examined the cost factor of testing, finding that testing is often a much underestimated part of the project. These researchers found, however, that a more effective testing process may reduce testing time, which is often underestimated.
Another area of testing research is how to build an effective testing team. Kanij, Merkel and Grundy (Kanij et al., 2011) conducted a survey of software practitioners to determine the importance of factors in building testing teams. The results suggest that experience in software testing is more important than a team member's interpersonal skills. The results also suggest the desire for the testing team to be built with members having diverse work experience (Kanij et al., 2011).
On the other hand, the U.S. Army has recognized that qualifying the airworthiness of rotorcraft has increasingly become infeasible with current software test practices trying to achieve full code coverage, due to increased software size and interaction complexity (Boydston and Lewis, 2009). Compliance with standards and practices specific to the certification of safety-critical software systems, such as DO-178B and C, SAE ARP 4754, and SAE ARP 4761 (Johnson et al., 1998), (International, 1996), becomes an essential instrument besides testing to improve the quality and ensure the safety of the software. The Software Engineering Institute (SEI) published in 2013 a white paper presenting an improvement strategy comprising four pillars of an integrate-then-build practice that lead to improved quality through early defect discovery and incremental end-to-end validation and verification (Feiler et al., 2013). The four pillars are: 1. Capture of mission and safety-criticality requirements in analyzable form; 2. Virtual integration of the physical system, hardware platform, and software architectures through consistent analyzable
architecture models; 3. Static analysis techniques ap-
plied to the models and actual system implementation
to complement testing; and 4. Incremental assurance
of justified confidence through consistent end-to-end
evidence throughout the development life cycle.
There are also studies investigating the sources of errors in safety-critical systems. For example, studies of safety-critical software-reliant systems show that 70% of the errors are introduced in the requirements (35%) and architecture design (35%) phases (Planning, 2002), (Galin, 2004), (Dabney, 2003). A NASA study traced requirements errors and found that missing requirements account for 33% of all requirements errors, compared to 24% for incorrect requirements, 21% for incomplete requirements, 6% for ambiguous requirements, 5% for inconsistent requirements, and 6% for over-specified requirements (Hayes, 2003).
3 SURVEY DESIGN AND
CONDUCT
We designed a web-based survey using the QuestionPro tool (www.QuestionPro.com). The survey questions were designed after a careful review of similar, previously conducted survey studies (Haberl et al., 2011), (Ng et al., 2004), (Causevic et al., 2010), (Turkish-Testing-Board, 2014), (ISTQB, 2014). To allow a valid comparison with the other conducted surveys, we also included selected questions from these surveys in ours. In total, the survey consisted of 40 questions arranged into six sections related to: project information; integrating software testing activities within the software development life cycle; software testing methodologies and techniques; testing metrics and defects management; organization information; and participants' professional information. A summary of our survey questions is available at https://goo.gl/kGBLhq. In this summary we also highlight which questions are shared with the other surveys mentioned above.
We drew our survey participants from multiple
sources but primarily from a database of past gradu-
ate students in Software Engineering of the Penn State
School of Graduate Professional Studies. The school
caters primarily to working professionals. An email
invitation (and subsequent reminder) was sent to these
individuals. We also posted an invitation in related LinkedIn professional testing and quality groups to which the author belonged. Respondents were asked
to base their responses on only one software project
that they were either currently involved with or had
taken part in during the past five years.
We collected survey data in two phases. The first phase ran from January to June 2015. At the end of this phase, we conducted sessions to analyze the captured responses. We presented an initial view of the overall collected results from all projects by the end of this phase (Kassab et al., 2017). We also made a decision to go into a second phase to collect more responses. The second phase lasted from June to December 2016. Overall, of the 293 who viewed the survey, 195 started taking the survey. Of these survey takers, 72 completed the survey all the way to the end. The completion rate was 37% and the average time taken to complete the survey was 17 minutes. We also included the results of the partially completed responses, which have been analyzed following standard statistical practices (Phillips and Sweeting, 1996), particularly as reviewed in medical research (Rezvan et al., 2015), taking into account the specificity of online questionnaires (including the lack of responses or partial responses), as discussed in detail in (Tsikriktsis, 2005). All responses were treated anonymously and only aggregate data were used - not the individual responses.
4 GENERAL STATISTICS
REGARDING THE
PARTICIPANTS AND
PROJECTS
CHARACTERISTICS
In order to make well-informed statements about the practice of software testing, it was essential to attract as many experienced participants as possible. Both objectives were achieved for this study. Eighteen different industries were represented. The reported professional experience represented in the survey was impressive, with an average of 7.8 years of related IT/software experience. The reported academic qualifications indicated that 100% of survey participants had successfully completed a bachelor's degree or equivalent, and 32% even held a master's degree or doctorate. The survey responses captured a diverse mix of positions within the chosen projects. To view the complete survey results on participant and project characteristics in a graphical format, we refer the reader to the charts at: https://goo.gl/xWHEhO
Since we aimed to classify the projects based on their safety-criticality, the respondents were asked to specify the maximum loss or damage if the software being developed for the project failed (that is, if the delivered service no longer complied with the specifications). Sixty-two responses reported a highly critical system in which a serious failure could involve the loss of one or more lives. The majority of respondents (133 responses) reported that the loss would be limited to essential funds, discretionary funds or comfort. In this paper, we analyze in depth those 62 responses for the safety-critical software and compare their reported practices with the non-safety-critical software projects.
In general, the projects were distributed across different categories with a bias towards database projects (22% of the projects). In the case of the safety-critical software sample, there was an obvious bias towards embedded systems (42%). The overall project sample also showed a distribution across a broad range of application domains with a mild bias towards applications in the Information Technology sector (15% of the projects), while in the case of the safety-critical software sample, the Aerospace and Defense domains dominated the distribution at 53% and 37%, respectively. The majority of the projects were classified as new development (64% for the overall sample and 47% for the safety-critical software sample), while 14% of the overall projects were legacy system evolution (32% of the safety-critical software) and 14% of the overall sample were classified as enhancement projects (21% of the safety-critical software).
The overall sample population represents companies located in different geographic regions (9 countries were represented in this survey). As far as the size of the participating companies is concerned, a representative sample was obtained. Almost 44% of the participants work in small companies (with 1-100 full-time employees), but very large companies (with more than 1000 full-time employees) are also well represented, at 28%. Regarding the software projects for safety-critical systems, 40% of the corresponding respondents worked in very large companies. It was surprising to find that the question enquiring about an independent QA department in the organization was answered affirmatively by 56% of the safety-critical projects, compared to the higher figure of 67% for the non-safety-critical projects.
5 STATE OF PRACTICE OF
TESTING SOFTWARE IN
SAFETY CRITICAL SYSTEMS
In this section we report on our findings from this survey with respect to three areas related to software testing for safety-critical software: 1) the integration of the testing activities within the software development life cycle; 2) testing methods and techniques; 3) testing metrics and defects management.
5.1 Integrating Testing Activities
Within Software Development Life
Cycle
Since software development has increasingly turned into an engineering discipline, often involving widely distributed teams, the methods and frameworks used have evolved accordingly. The need for efficient and cost-effective software production has also reached software testing. Hence, we were motivated to investigate how software testing is methodically implemented in organizations. Several studies of safety-critical systems show that 80% of all errors are not discovered until system integration or later (Feiler et al., 2013). The results from our survey provided evidence that performing QA measures is concentrated in the late phases of software development. This was actually the pattern in both the safety-critical and non-safety-critical software samples. Only 29% of the participants from the safety-critical software sample agreed that they use QA measures in the Study & Concept phase, while the shares of those that use quality assurance in the Requirements Specifications and System Design phases are 59% each. From the Implementation phase onwards, quality assurance practices increase significantly (Figure 1). We observed that quality activities for the safety-critical software were performed at a higher rate than for non-safety-critical software in every phase of development except for Implementation.
Figure 1: In which phases are / were Quality Assurance measures applicable within this project?
When asked about "testing as a defined phase in the project development", 85% of participants from the safety-critical software sample reported a level of
agreement (strongly agree or agree) that this was the
actual practice in the project, and 92% reported an
agreement that they personally prefer to have test-
ing as a defined phase on its own. The correspond-
ing numbers for the non-safety-critical software were
70% for the actual practices and 82% for the pref-
erence. If we consider dissatisfaction with a question as the difference between the current and the preferred practice, then these numbers indicate a higher level of satisfaction in the safety-critical software sample with how the software testing activities are kept distinct from the rest of the activities within a project. Planning the duration and the budget of the quality assurance activities together with other software development activities in a total package was the most common approach in both the safety-critical software sample (44%) and the non-safety-critical software sample (37%) whenever the duration and budget were planned (see Figure 2). The safety-critical software sample showed a higher maturity level in budget and duration planning, as only 5% of the participants do not plan any explicit estimation for QA activities (compared to 18% of the non-safety-critical sample) (see Figure 2).
About 33% of the safety-critical software sample reported that the team is not doing a good job of estimating the size / effort of software testing activities (compared to 30% of the non-safety-critical software sample); nevertheless, it was surprising to see a significant difference between the two samples when reporting that they did not have enough time to test the software before its deployment (66% for safety-critical and only 39% for non-safety-critical).
Figure 2: The quality assurance effort (budget and time) in
your projects is planned ...
5.2 Testing Methods and Techniques
Respondents who indicated that they perform testing activities were additionally asked questions regarding the tools and techniques in use within their projects.
It is very common for organizations to have de-
fined levels of testing. Those levels include unit, inte-
gration, system, acceptance and regression testing.
While the results in Figure 3 clearly show that
system-level testing is the most common level of
testing for safety-critical software (93% of partici-
pants reported applying system test for their projects),
around 90% of participants reported that system test-
ing was applied in order to test more than just one
characteristic of the system with a clear focus on test-
ing the functionality (Figure 4). Performance was the
most tested quality attribute in the surveyed projects.
This was the case for both samples: safety and non-
safety-critical software. As one may expect, both
regulatory and reliability testing were executed at a
higher rate for safety-critical software in comparison
to non-safety-critical (Figure 4).
Regression testing is a level of software testing that seeks to uncover new software bugs, or regressions, in existing functional and non-functional areas of a system after changes, such as enhancements, patches or configuration changes, have been made to it. We observed that regression testing is executed more frequently for safety-critical software (75% of respondents) in comparison to non-safety-critical software (49%) (Figure 4). Only 14% of the safety-critical software sample reported that regression testing was outsourced for their projects; this number is close to the 20% of participants from safety-critical software who personally preferred to outsource regression testing. This indicates a level of satisfaction with the current outsourcing practices for regression testing activities.
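To make the notion concrete, the following minimal sketch (in Python with pytest; the dose_rate_limit function and its baseline values are hypothetical and not taken from the surveyed projects) shows how a regression test pins previously observed behavior so that a later change that alters it is detected:
```python
# Minimal regression-test sketch: the expected values were captured from the
# previously released version, so any behavioral change after a patch fails here.
import pytest

def dose_rate_limit(requested_rate: float, max_rate: float = 5.0) -> float:
    """Clamp a requested dose rate to the certified maximum (illustrative only)."""
    return min(requested_rate, max_rate)

# Baseline input/output pairs recorded before the change under test was introduced.
REGRESSION_BASELINE = [
    (0.0, 0.0),
    (3.5, 3.5),
    (5.0, 5.0),
    (9.9, 5.0),   # values above the limit must still be clamped
]

@pytest.mark.parametrize("requested, expected", REGRESSION_BASELINE)
def test_dose_rate_limit_regression(requested, expected):
    assert dose_rate_limit(requested) == expected
```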
Figure 3: Which of the following levels of testing are/ were
applied for this project? (Select all that apply).
Figure 4: Which of the following types of system tests are / were executed for your project? (Select all that apply).
The survey showed that in all of their dynamic testing activities, participants from both samples principally use black-box testing techniques (80% used these techniques in the safety-critical sample compared to 79% in the non-safety-critical sample). The difference between the two samples in the use of structure-based (white-box) techniques is noticeable: sixty percent of the safety-critical software sample reported using white-box techniques, in comparison to 39% of the non-safety-critical software sample.
The results show that systematic test case design and test data definition for the non-safety-critical software are widespread. Even though 60% of the participants reported that they would prefer to have the test cases written before writing the code, only 28% reported that this was the actual practice in their projects. It was surprising to see that, for the non-safety-critical software category of respondents, the current practice of writing test cases before writing code was more popular than for the safety-critical software category (reported at only 14%). However, while the non-safety-critical respondents seem quite willing to improve the situation even further, the safety-critical respondents show no interest as a group in changing towards more test-driven development (a preference reported at only 20%). This is noteworthy considering the fact that empirical studies seem to ascribe high external code quality to test-driven developed code. For fairness' sake, it is not trivial to see how such a practice would affect, and be affected by, other specific aspects of safety-critical system development, e.g., fulfillment of safety certification standards.
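A brief sketch of the test-first practice discussed above (the watchdog_expired helper and its 50 ms timeout are hypothetical): the test is written from the requirement before the code exists, and the simplest implementation that passes is added afterwards.
```python
# Test-first (test-driven) sketch: the test below is written against a helper
# that does not exist yet, directly from the requirement text.

def test_watchdog_expired():
    # Written first, from the requirement: "the watchdog trips when no
    # heartbeat has been received for more than 50 ms".
    assert watchdog_expired(last_heartbeat_ms=0, now_ms=51) is True
    assert watchdog_expired(last_heartbeat_ms=0, now_ms=50) is False

# Implementation added afterwards, only to make the failing test pass.
def watchdog_expired(last_heartbeat_ms: int, now_ms: int, timeout_ms: int = 50) -> bool:
    return (now_ms - last_heartbeat_ms) > timeout_ms
```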
On the positive side, formal languages are used more often to describe test cases for the safety-critical software (57%) than for the non-safety-critical software (43%). Most of the test cases for the non-safety-critical software were described freely in verbal or text-based forms (56%), while only 29% of the test cases from the safety-critical sample were described this way. The overall effectiveness of test cases for the safety-critical software is rated high, since almost 60% reported that most or all defects are found during test case execution. This number was higher than the one from the non-safety-critical software sample (43%). About 40% rate test case effectiveness in safety-critical software as medium (some defects are found) (Figure 5).
Figure 5: How many of the defects in this project are / were
discovered by executing the test cases?
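As one hypothetical illustration of moving a test case from a free-text description to a structured, machine-checkable form (the fields and the brake-light example are invented; they are not the notations respondents actually used), a test case can be captured as a record with an identifier, a precondition, an executable step and an expected result:
```python
# Sketch of a structured test-case description replacing free prose.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    case_id: str
    precondition: str
    action: Callable[[], bool]     # executable step returning the observed outcome
    expected: bool

def brake_light_on(pedal_pressed: bool) -> bool:
    return pedal_pressed  # stand-in for the system under test

CASES = [
    TestCase("TC-001", "ignition on, pedal pressed",
             lambda: brake_light_on(pedal_pressed=True), expected=True),
    TestCase("TC-002", "ignition on, pedal released",
             lambda: brake_light_on(pedal_pressed=False), expected=False),
]

def run_all(cases=CASES) -> None:
    for case in cases:
        observed = case.action()
        status = "PASS" if observed == case.expected else "FAIL"
        print(f"{case.case_id} [{case.precondition}]: {status}")

if __name__ == "__main__":
    run_all()
```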
The expected test results from the safety-critical software sample were available prior to test execution in 53% of all cases (in comparison to 70% for the non-safety-critical sample). This survey result was a surprise. On the other hand, both samples were almost similar in comparing the expected results with the actual results manually (80% for the safety-critical sample and 82% for the non-safety-critical sample).
Data protection is a matter that organizations must take very seriously these days. It is therefore quite surprising that 33% of the safety-critical sample use the original production data (in comparison to 20% of the non-safety-critical sample), and only 33% of the respondents comprehensively document the test data (Figure 6). In addition, the majority (60%) of the survey participants stated that they do not explicitly distinguish between test case generation and the generation of associated test data. Only 40% of the safety-critical sample reported using a separate test system. Respondents from this sample more often used the integration system for testing purposes (73%), while respondents from the non-safety-critical sample more often used the development system (82%) (Figure 7).
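One way to avoid copying production data, sketched below under assumed field names (the synthetic_patient_record generator is illustrative only), is to generate documented, reproducible synthetic test data instead:
```python
# Sketch of generating synthetic, reproducible test data rather than reusing
# production records; field names and value ranges are invented for illustration.
import random
import string

def synthetic_patient_record(seed: int) -> dict:
    rng = random.Random(seed)            # seeded so the test data is reproducible
    return {
        "patient_id": "TEST-" + "".join(rng.choices(string.digits, k=6)),
        "age": rng.randint(18, 90),
        "prescribed_dose_gy": round(rng.uniform(0.5, 2.0), 2),
    }

# A documented, versioned set of test records replaces a copy of live data.
TEST_DATA = [synthetic_patient_record(seed) for seed in range(10)]
```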
5.3 Testing Metrics and Defects
Management
Figure 6: In this project, test data are / were ...
Figure 7: Which system environments are used for testing in your projects?
The results on exit criteria to conclude testing activities indicated that the majority of both samples were in favor of concluding the test when "all planned test activities were performed" and when "each requirement has been tested at least once" (Figure 8). Although a good test process maturity can be concluded from this, 29% of the safety-critical participants nonetheless close their test activities when the delivery time has been reached (42% for the non-safety-critical participants). This finding indicates that, even though the safety-critical sample showed a more mature practice, testing continues to be planned as a "buffer" in the project that will be sacrificed if there are delays in preceding project phases.
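The exit criterion "each requirement has been tested at least once" can be checked mechanically from a traceability mapping; the short sketch below uses invented requirement IDs and test names purely for illustration:
```python
# Sketch of checking the requirements-coverage exit criterion from a trace matrix.
trace_matrix = {
    "REQ-001": ["test_dose_rate_limit", "test_dose_rate_alarm"],
    "REQ-002": ["test_watchdog_timeout"],
    "REQ-003": [],   # no test traced to this requirement yet
}

untested = [req for req, tests in trace_matrix.items() if not tests]
coverage = 1 - len(untested) / len(trace_matrix)

print(f"Requirements coverage: {coverage:.0%}")
if untested:
    print("Exit criterion not met; untested requirements:", ", ".join(untested))
```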
The most common cause in both samples for the
discovered defects was related to requirements prob-
lems (omissions, changes, and errors) - 72% reported
this cause. This is consistent with the findings from
other studies we referred to in Section 2 on the sources
of defects in the Software (Planning, 2002), (Galin,
2004), (Dabney, 2003), (Kassab, 2014), (Kassab,
2015). Design problems was the second most re-
ported cause at 66%. JIRA was the most used tool
to report the defects.
Figure 8: For test completion, the following exit criteria are
used ...
6 LIMITATIONS TO VALIDITY
We carefully examined our study for the possible types of threats to validity described in (Campbell and Stanley, 2015), (Hyman, 1982) and (Wohlin et al., 2012). One possible internal threat to validity that we identified relates to instrumentation. This is the effect caused by the artifacts (e.g. survey questions) if these are badly designed. Our survey questions were designed after a careful review of similar, previously conducted survey studies (Haberl et al., 2011), (Ng et al., 2004), (Causevic et al., 2010), (Turkish-Testing-Board, 2014), (ISTQB, 2014). In addition, we sent the link to the survey to a number of researchers to collect their feedback before the data collection phase started. We addressed the feedback to improve the quality of the questions. One comment received during the survey design assessment phase suggested providing an explanation next to the possible answers for particular questions to reduce possible ambiguity. We implemented this suggestion when applicable. For example, in the question on what levels of testing were executed for the project, a clear definition was provided next to each of the possible answers a participant could select from (e.g. unit testing, integration testing, acceptance testing, etc.).
Another possible internal threat relates to mortality. This is the effect due to the different kinds of persons who drop out of the survey. We carefully examined the sample who dropped out with regard to three participant characteristics: job role, education level and years of experience. We observed that the drop-out sample was representative of the total sample, and we could not relate the dropout to a particular participant characteristic. In addition, we examined the results from the perspectives of managerial job roles and non-managerial roles. No significant differences in the results came from these two samples.
A third aspect of internal validity we examined relates to history, i.e., the possibility that different results may be obtained from the same sample at different times. To address this, we ran our data collection phase in two rounds, as explained in Section 3, and followed the same recruiting strategy in each round. If we break the results into two samples corresponding to the two rounds, we observe no significant differences in the findings between the two samples.
Regarding external validity threats, we examined the possible threat related to the interaction of selection and treatment. This is the effect of having a subject population that is not representative of the population we want to generalize to. Judging from the participant characteristics we presented in Section 3, this was not the case. A second possible external validity threat may be related to the interaction of history and treatment. For example, if the survey is taken a few days after a big software-related crash, people tend to answer differently than a few days before. To address this threat, the two data collection phases spanned a relatively long period (12 months).
Lastly, it is worth noting that all studies like this require replications and confirmatory studies, especially in software engineering. To facilitate the replication of our study, we posted the questions from this survey at the link: https://goo.gl/kGBLhq. Researchers and practitioners are welcome to execute further analysis on the data. The author is available to offer the original text of the questionnaire to any scientist interested in replication.
7 CONCLUSION
In this study, we collected 195 partial or complete responses from software professionals from a wide range of industries and backgrounds to analyze the software testing state of practice. In this paper, we provided a pragmatic view of the results, focusing on software in safety-critical systems. The survey results included a wide variety of raw information, but interpreting some of this information, we offer the following key findings:
- Performing quality assurance measures for software in safety-critical systems is concentrated in the late phases of software development.
- The safety-critical software sample showed a higher level of maturity (compared to the non-safety-critical sample) in maintaining "testing" as a distinct phase.
- The overall effectiveness of test cases is perceived to be high for software in safety-critical systems (60%), compared to 43% for the non-safety-critical systems.
- Safety-critical software mostly utilizes formal languages to describe test cases.
- Regulatory and reliability testing are executed at a higher rate for safety-critical software (compared to the non-safety-critical sample). White-box testing techniques are also executed at a higher rate.
Particular areas that require attention for potential improvement include:
- Systematic test case design and test data definition are not uniformly practiced.
- Expected results need to be available sooner, prior to test execution.
- The usage of production data needs to be avoided.
- There is a need to improve the size and effort estimation practice for testing activities.
Requirements problems continue to be the most common cause of the discovered defects.
We hope the survey and corresponding results stimulate research into prevailing software practices, but moreover, we intend these results to highlight the areas of software testing that need the attention of both the research community and industry professionals. Our own subsequent work will offer a more detailed analysis of some of the survey results. We also plan to replicate the study in the near future to observe any potential changes in the landscape of testing practices for software in safety-critical systems.
REFERENCES
Boydston, A. and Lewis, W. (2009). Qualification and reli-
ability of complex electronic rotorcraft systems. In
Army Helicopter Society System Engineering Meet-
ing.
Campbell, D. T. and Stanley, J. C. (2015). Experimental
and quasi-experimental designs for research. Ravenio
Books.
Causevic, A., Sundmark, D., and Punnekkat, S. (2010). An
industrial survey on contemporary aspects of software
testing. In Software Testing, Verification and Valida-
tion (ICST), 2010 Third International Conference on,
pages 393–401. IEEE.
Charette, R. N. (2009). This car runs on code. IEEE spec-
trum, 46(3):3.
Dabney, J. (2003). Return on investment of independent
verification and validation study preliminary phase 2b
report. Fairmont, WV: NASA IV&V Facility.
DoD, U. (2012). Mil-std-882e, department of defense stan-
dard practice system safety. US Department of De-
fense.
Feiler, P., Goodenough, J., Gurfinkel, A., Weinstock, C.,
and Wrage, L. (2013). Four pillars for improving
the quality of safety-critical software-reliant systems.
Technical report, Software Engineering Institute.
Galin, D. (2004). Software quality assurance: from theory
to implementation. Pearson Education India.
Garousi, V. and Varma, T. (2010). A replicated survey of
software testing practices in the Canadian province of
Alberta: What has changed from 2004 to 2009? Jour-
nal of Systems and Software, 83(11):2251–2262.
Grindal, M., Offutt, J., and Mellin, J. (2006). On the testing
maturity of software producing organizations. In Test-
ing: Academic and Industrial Conference-Practice
And Research Techniques, pages 171–180. IEEE.
Haberl, P., Spillner, A., Vosseberg, K., and Winter, M.
(2011). Survey 2011: Software test in practice. Trans-
lation of Umfrage.
Hayes, J. H. (2003). Building a requirement fault taxon-
omy: Experiences from a NASA verification and vali-
dation research project. In 14th International Sympo-
sium on Software Reliability Engineering., pages 49–
59. IEEE.
Hyman, R. (1982). Quasi-experimentation: Design and
analysis issues for field settings (book). Journal of
Personality Assessment, 46(1):96–97.
International, S. (1996). Guidelines and methods for con-
ducting the safety assessment process on civil air-
borne systems and equipment. SAE International.
ISO, I. (2011). 26262: Road vehicles-functional safety. In-
ternational Standard ISO/FDIS, 26262.
ISTQB (2014). Istqb effectiveness survey 2013-14.
http://www.istqb.org/documents/ISTQB Effectiveness
Survey 2013 14.pdf.
Johnson, L. A. et al. (1998). Do-178b, software considera-
tions in airborne systems and equipment certification.
Crosstalk, October, 199.
Kanij, T., Merkel, R., and Grundy, J. (2011). A pre-
liminary study on factors affecting software testing
team performance. In 2011 International Symposium
on Empirical Software Engineering and Measurement
(ESEM), pages 359–362. IEEE.
Kassab, M. (2014). An empirical study on the requirements
engineering practices for agile software development.
In 40th EUROMICRO Conference on Software En-
gineering and Advanced Applications (SEAA), pages
254–261.
Kassab, M. (2015). The changing landscape of require-
ments engineering practices over the past decade. In
IEEE 5th International Workshop on Empirical Re-
quirements Engineering (EmpiRE), pages 1–8.
Kassab, M., DeFranco, J. F., and Laplante, P. A. (2017).
Software testing: The state of the practice. IEEE Soft-
ware, 34(5):46–52.
Kasurinen, J., Taipale, O., and Smolander, K. (2010). Soft-
ware test automation in practice: empirical observa-
tions. Advances in Software Engineering, 2010.
Knauss, A., Berger, C., and Eriksson, H. (2016). Towards
state-of-the-art and future trends in testing of active
safety systems. In Proceedings of the 2nd Interna-
tional Workshop on Software Engineering for Smart
Cyber-Physical Systems, pages 36–42. ACM.
Lee, J., Kang, S., and Lee, D. (2012). Survey on software
testing practices. IET software, 6(3):275–282.
Leveson, N. G. and Turner, C. S. (1993). An investigation
of the Therac-25 accidents. Computer, 26(7):18–41.
McDermid, J. and Kelly, T. (2006). Software in safety crit-
ical systems-achievement & prediction. Nuclear Fu-
ture, 2(3):140.
Naik, K. and Tripathy, P. (2011). Software testing and qual-
ity assurance: theory and practice. John Wiley &
Sons.
NASA (2013). Nasa-std 8719.13 software safety stan-
dard. https://standards.nasa.gov/standard/nasa/nasa-
gb-871913.
Ng, S., Murnane, T., Reed, K., Grant, D., and Chen, T.
(2004). A preliminary survey on software testing prac-
tices in Australia. In Software Engineering Confer-
ence, 2004. Proceedings. 2004 Australian, pages 116–
125. IEEE.
Phillips, M. and Sweeting, T. (1996). Estimation for cen-
sored exponential data when the censoring times are
subject to error. Journal of the Royal Statistical Soci-
ety. Series B (Methodological), pages 775–783.
Planning, S. (2002). The economic impacts of inadequate
infrastructure for software testing.
Rafi, D. M., Moses, K. R. K., Petersen, K., and Mäntylä, M. V. (2012). Benefits and limitations of automated software testing: Systematic literature review and practitioner survey. In Proceedings of the 7th International Workshop on Automation of Software Test, pages 36–42. IEEE Press.
Redman, D., Ward, D., Chilenski, J., and Pollari, G. (2010).
Virtual integration for improved system design. In An-
alytic Virtual Integration of Cyber-Physical Systems
Workshop (AVICPS), volume 52498, pages 57–64.
Rezvan, P. H., Lee, K. J., and Simpson, J. A. (2015).
The rise of multiple imputation: a review of the re-
porting and implementation of the method in med-
ical research. BMC medical research methodology,
15(1):30.
Tsikriktsis, N. (2005). A review of techniques for treating missing data in OM survey research. Journal of Operations Management, 24(1):53–62.
Turkish-Testing-Board (2014). Software quality
report 2014-2015 released by Turkish test-
ing board. http://www.istqb.org/documents/
TurkeySoftwareQualityReport 2014 2015.pdf.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., and Wesslén, A. (2012). Experimentation in software engineering. Springer Science & Business Media.