Cost-effective Functional Testing of Reactive Software
R. Venkatesh, Ulka Shrotri, Amey Zare and Supriya Agrawal
Tata Research Development and Design Centre, Pune, India
Keywords:
Formal Specification, Functional Test Generation.
Abstract:
Creating test cases to cover all functional requirements of real-world systems is hard, even for domain experts.
Any method to generate functional test cases must have three attributes: (a) an easy-to-use formal notation
to specify requirements, from a practitioner’s point of view, (b) a scalable test-generation algorithm, and (c)
coverage criteria that map to requirements.
In this paper we present a method that has all these attributes. First, it includes Expressive Decision Table
(EDT), a requirement specification notation designed to reduce translation efforts. Second, it implements
a novel scalable row-guided random algorithm with fuzzing (RGRaF, pronounced R-graph) to generate test
cases. Finally, it implements two new coverage criteria targeted at requirements and requirement interactions.
To evaluate our method, we conducted experiments on three real-world applications. In these experiments,
RGRaF achieved better coverage than pure random test case generation. When compared with the manual
approach, our test cases subsumed all manual test cases and achieved up to 60% effort savings. More
importantly, our test cases, when run on code, uncovered a bug in a post-production sub-system and captured
three missing requirements in another.
1 INTRODUCTION
Safety critical standards such as DO-178B (do1,
1994) mandate requirements coverage during func-
tional testing. However, functional test case gener-
ation is an intellectually demanding and critical task
that has a strong impact on the effectiveness and ef-
ficiency of the entire testing process (Anand et al.,
2013). For large and complex reactive software, it is
difficult even for domain experts to envision all inter-
actions between requirements. This sometimes makes
it impossible to write functional test cases that cover
all requirements and the interactions among them.
Hence, there is a need to automatically generate func-
tional test cases.
Random Test case Generation (RTG) (Arcuri
et al., 2010) and Model-Based Testing (MBT) (Dalal
et al., 1999) are two traditional techniques that are
used for functional test case generation for reactive
software. However, RTG generates input sequences
using only input signals and their types; it does not
need a specification of the system's requirements and
hence does not generate the expected output along
with the generated input sequence. Additional effort
is therefore required to determine the expected results.
Moreover, RTG is likely to generate a large number
of redundant test cases. MBT is implemented by several
tools (Reactis; Peranandam et al., 2012; Wang
et al., 2014; Harel et al., 1990), but it is not widely
adopted by practitioners as the requirements need
to be specified in a formal language supported by the
tool. Often, the language supported by these tools
demands a strong mathematical background from the
user or requires the user to design the state space of
the problem even if it is not part of the requirements
(Thyssen and Hummel, 2013). This activity is effort-
intensive and adversely affects the overall cost of the
approach. In fact, very little is known about the cost-
effectiveness of MBT (Briand, 2010). Moreover, the
syntactic structure of these languages is very different
from the original requirements description, so there
is no direct mapping from specifications to require-
ments. As a result, the coverage targeted by these
tools, such as state and transition coverage, does not
directly map to the requirements (Tahat et al., 2001).
MBT tools use a combination of random generation
and constraint solving to generate test cases; however,
neither of these techniques scales up to industry-size
applications (Cadar and Sen, 2013; Păsăreanu and
Rungta, 2010).
In short, existing methods have the following limitations:
(a) they are effort-intensive, as they either require a
specification in a formal language or need expected
results to be determined manually; (b) the algorithms
they implement do not scale up to industry-size
applications; and (c) the generated test cases do not map
directly to requirements.
We present a requirements-driven method, EDT-
Based Testing (EBT), that overcomes the aforemen-
tioned limitations of current MBT methods. To re-
duce specification effort, EBT uses Expressive De-
cision Tables (EDT) (Venkatesh et al., 2014) as a
formal language. The authors of (Venkatesh et al.,
2014) have shown that EDT is more efficient and ef-
fective for specifying functional requirements of re-
active systems than state-based formalisms such as
Statecharts (Harel et al., 1990) and Software Cost
Reduction (SCR) (Heitmeyer et al., 1998). In an EDT
specification, rows map directly to requirements that
are described in a natural language. EBT includes a
novel algorithm that combines row-guided random
input generation with fuzzing at time boundaries
(RGRaF) to scale up test case generation. We have
implemented the RGRaF algorithm in a tool called
EDT-Test. We introduce two new coverage criteria:
1) row coverage, to measure the coverage of
requirements, and 2) row-interaction coverage, to
measure the coverage of interactions between
requirements. We also demonstrate a novel use of
EDT that allows users to model the environment and
to specify properties/invariants of the system for
detecting errors, leading to better test cases. To
summarize, EBT involves: a) specifying requirements
in EDT; b) modelling the environment and system
properties in EDT; and c) generating test cases that
cover rows and row-interactions using RGRaF.
To evaluate the cost-effectiveness and scalability
of EBT, we conducted case studies in which we spec-
ified the requirements of ten software modules of three
automotive projects in EDT. We then generated test
cases using EDT-Test and evaluated the scalability of
our algorithm by: a) comparing it with a pure random
test case generation algorithm, and b) comparing it
with RGRaF without fuzzing. As manually created
test cases and effort data for all the case studies were
available to us, we also assessed the cost-effectiveness
of EBT. Our findings clearly show that RGRaF scales
better than both pure random test case generation and
RGRaF without fuzzing. They also show that EBT is
more cost-effective than manual testing. Additionally,
during the case studies we found a bug in production
code and uncovered gaps in requirements, showing the
usefulness of the generated test cases.
We compared our algorithm against a pure
random algorithm because constraint solving does not
scale up (Păsăreanu and Rungta, 2010). We validated
this through a small experiment in which we took
an example smaller than the ones in our case studies,
translated it to the C language and ran Autogen (Bokil
et al., 2009) on it. Autogen was not able to generate any
test cases, whereas EDT-Test covered all rows. Due to
this experience and the findings from the experiment, we
did not compare EDT-Test against any constraint-
solving-based tool.
The main contribution of this paper is a cost-
effective method, EBT, to generate functional test
cases. This is achieved by:
a) using EDT as a specification language to reduce
the effort required in specifying requirements, and
extending EDT with constructs that enable easy
modelling of the environment;
b) a new row-guided random algorithm that adds
fuzzing at time boundaries to scale up test
generation; and
c) targeting two new test coverage criteria, row and
row-interaction coverage, which ensure that all
requirements and interactions between require-
ments are tested adequately.
The organization of the paper is as follows. Sec-
tion 2 discusses the related work. We explain the EDT
Notation in brief in Section 3. Two new coverage cri-
teria, row and row-interaction, are introduced in Sec-
tion 4. Section 5 describes our algorithm, RGRaF, in
detail. We present extensions to EDT in Section 6
and describe the observations and findings of the ex-
periments conducted in Section 7. Finally, Section 8
concludes the paper.
2 RELATED WORK
In this section, we discuss a few of the several relevant
publications related to automated test case generation.
These are grouped based on the source specification
language, the technique employed and the coverage
criteria targeted.
Generating test cases from specifications has re-
ceived a lot of attention with tools that have a va-
riety of input languages. There are tools based on
languages such as Software Cost Reduction method
(SCR) (Heitmeyer et al., 1998), Statecharts (Offutt
et al., 2003), Z (Cristiá et al., 2014), Spec# (Veanes
et al., 2008), and Lustre (Raymond et al., 1998).
These languages require test engineers to specify the
requirements of the system under test (SUT) in the
form of mathematical expressions or state diagrams,
which takes a lot of effort. Tahat et al. (Tahat et al.,
2001) have proposed an approach to generate test
cases from requirements given in textual and SDL
format, but they too do not have any data to show
the benefits of their approach, nor have they evaluated
their approach for cost-effectiveness. To the best of
our knowledge, there has been no work that compares
the effort required to specify requirements in a formal
language and generate test cases using a tool for in-
dustry examples. EBT reduces the effort required for
specification by choosing a compact and easy-to-use
notation, EDT (Venkatesh et al., 2014).
Additionally, all the aforementioned tools target
a coverage criterion that is natural to the formal lan-
guage being used. Thus, if the language is Statecharts
based then the state and transition coverage criteria
are targeted, and in the case of Z, pre-/post-operation
relations are considered as coverage criteria. As a re-
sult, these coverage criteria do not always achieve re-
quirements coverage. In contrast, EDT-Test generates
test cases to achieve row and row-interaction cover-
age giving direct mapping to requirements coverage
as EDT rows map directly to requirements.
Several tools employ a constraint solver or a
model-checker to generate test cases. These include
Java Path Finder (Brat et al., 2000), Autogen (Bokil
et al., 2009), KLEE (Cadar et al., 2008) and Pex (Till-
mann and De Halleux, 2008). The scalability of
constraint solvers (Cadar and Sen, 2013; P˘as˘areanu
and Rungta, 2010) continues to limit the applicabil-
ity of these techniques, and some challenges still need
to be overcome for their wider adoption (Cadar et al.,
2011). Random testing (Hamlet,
2002) has also been studied extensively as an alterna-
tive to systematic testing as it is very easy to gener-
ate a large number of test cases randomly. Theoret-
ical studies indicate that random testing is as effec-
tive as systematic testing (Duran and Ntafos, 1984;
Chen et al., 2010). On the other hand, empirical stud-
ies (Ferguson and Korel, 1996; Marinov et al., 2003)
have shown that pure random testing achieves less
code coverage than systematic test generation. Un-
like existing methods, EBT implements a row-guided
random technique with fuzzing at time boundaries to
achieve scalability and coverage.
3 EDT NOTATION
EDT (Venkatesh et al., 2014) is a tabular notation
to formally specify requirements of reactive systems.
This notation is designed in a manner that makes it
easy to understand and use, and yet keeps it formal
enough to enable automated test case generation. It
provides a uniform notation to specify both state-
based and sequence-based requirements, leading to
compact specifications of reactive systems.
An EDT specification consists of one or more ta-
ble(s) where the column headers specify the input and
output signal names, and the rows specify relation-
ships between patterns of input and output signal val-
ues or events. We illustrate EDT through partial re-
quirements of the Alarm module of a real world auto-
motive application, which are described below:
1. If Ignition and Alarm are Off, and PanicSw is
pressed and released twice within 3 seconds, then
Alarm should be On for 30 seconds, and Flash
should blink 30 times with a gap of 500 millisec-
onds between each On and Off and should have
No
Req after that.
2. If Alarm is On and PanicSw is pressed for more
than 3 seconds and then released (called as long
press), then the Flash should be No
Req and
Alarm should be Off.
3. If Ignition becomes On, then Flash should be
No Req and Alarm should be Off.
Table 1: EDT for Alarm feature.

sno | in Ignition | in Alarm | in PanicSw                 | out Alarm    | out Flash
  1 | Off         | Off      | {{Press;Release}{=2}}{<3s} | On{=30s};Off | {On{=500ms};Off{=500ms}}{=30};No_Req
  2 |             | On       | Press{>3s};Release         | Off          | No_Req
  3 | On          |          |                            | Off          | No_Req

Table 1 specifies the above requirements using EDT,
in which each row maps directly to one of the require-
ments. The column headers specify three input sig-
nals: Ignition, PanicSw and Alarm, and two output
signals: Flash and Alarm. It is worth noting that
Alarm is an input and output (I/O) signal. The pattern
expressions in each input cell specify the sequence of
input value(s) that will match the requirements of that
cell. The pattern expressions in an output cell spec-
ify the sequence of signal value(s) that will be output
when the requirements of all the input cells in that row
are matched. The pattern language itself is regular, as
EDT supports a discrete timed model, and can be rec-
ognized by a discrete timed automaton (Bowman and
Gomez, 2006). The pattern Off given in the first row
for columns corresponding to the signals Ignition and
Alarm matches when the environment sends the value
Off to the system. The compactness of EDT is illus-
trated by the pattern {{Press;Release}{=2}}{<3s}
which is detected when the values Press followed by
Release are received twice within three seconds for
the signal PanicSw. The output pattern in the first
Cost-effectiveFunctionalTestingofReactiveSoftware
69
row corresponding to the signal Flash specifies that
the values On followed by Off should be output with
a gap of 500 milliseconds, and this pattern should be
repeated 30 times.
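To make the pattern semantics concrete, here is a minimal sketch, not the authors' implementation, of how the {{Press;Release}{=2}}{<3s} pattern could be checked against a timed event sequence; the event-list representation and function name are our own assumptions.

    # Minimal sketch: does a timed sequence of PanicSw events match the EDT
    # pattern {{Press;Release}{=2}}{<3s}, i.e. Press;Release twice within 3 s?
    def matches_double_press(events, window_ms=3000):
        """events: list of (time_ms, value) tuples for PanicSw, time-ordered."""
        expected = ["Press", "Release", "Press", "Release"]
        # Try every event as a potential start of the pattern.
        for start in range(len(events)):
            t0, idx = events[start][0], 0
            for t, value in events[start:]:
                if t - t0 >= window_ms:        # pattern must complete in < 3 s
                    break
                if value == expected[idx]:
                    idx += 1
                    if idx == len(expected):   # full Press;Release x2 seen in time
                        return True
        return False

    # Example: the input sequence of Table 2 matches at 1500 ms.
    print(matches_double_press([(0, "Press"), (500, "Release"),
                                (1000, "Press"), (1500, "Release")]))  # True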
4 COVERAGE CRITERIA
To effectively test the system specified using EDT,
we propose two coverage criteria – row coverage and
row-interaction coverage, which are described below:
4.1 Requirement/Row Coverage
An EDT row is covered when it is matched in at least
one generated test case. Complete row coverage is
said to be achieved when all rows in the EDT are cov-
ered. The intuition behind row coverage is that an
individual requirement can often be mapped to one
or more EDT row(s) and hence row coverage implies
requirements coverage.
Table 2 illustrates a test case corresponding to the
EDT specification shown in Table 1. The default val-
ues of input signals Ignition and Alarm are consid-
ered to be Off. When PanicSw values are generated
as Press followed by Release twice within three sec-
onds, that is at time 1500 milliseconds (ms) in Table
2, Row 1 is matched and hence the expected output of
Alarm is On and the flashing pattern is On followed
by Off.
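As a small illustration of the metric, the following sketch (ours, with invented matched-row sets) computes row coverage as the fraction of rows matched by at least one generated test case.

    def row_coverage(all_rows, matched_per_test):
        """Fraction of EDT rows matched by at least one generated test case."""
        covered = set().union(*matched_per_test) if matched_per_test else set()
        return len(covered & set(all_rows)) / len(all_rows)

    # Three rows (as in Table 1); two tests covering rows 1 and 2 give 2/3.
    print(row_coverage([1, 2, 3], [{1}, {1, 2}]))  # 0.666...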
4.2 Requirement-Interaction/
Row-Interaction Coverage
Requirements, as specified in EDT, can have the fol-
lowing two types of interactions between them:

I/O row-interaction: (r1, r2) is said to be an I/O
row-interaction if r1 outputs a value that is used
by r2.

O/O row-interaction: (r1, r2) is said to be an O/O
row-interaction if both r1 and r2 output values for
the same signal at the same time.

A row-interaction is covered when a test case cap-
tures either of the aforementioned interactions be-
tween rows.
In the example mentioned in Table 1, because of
the common I/O signal Alarm, there are three I/O row-
interactions: (1, 2), (2, 1) and (3, 1). This is because
the output On to Alarm in Row 1 is used by Row 2
and the output Off to Alarm in Rows 2 and 3 is used
by Row 1. The input sequence shown in the test case
in Table 2 covers the row-interaction (1, 2).
Table 2: Test Case for Row and I/O row-interaction coverage.

Time(ms) | Input Signals    | Remarks
0        | PanicSw=Press    |
500      | PanicSw=Release  |
1000     | PanicSw=Press    |
1500     | PanicSw=Release  | Row 1 output starts
2000     | PanicSw=Press    |
5500     | PanicSw=Release  | Row 2 output starts
Table 3: Test Case for O/O row-interaction coverage.

Time(ms) | Input Signals    | Remarks
0        | PanicSw=Press    |
500      | PanicSw=Release  |
1000     | PanicSw=Press    |
1500     | PanicSw=Release  | Row 1 output starts
2000     | Ignition=On      | Row 3 output starts
In Table 1, Rows 1 and 3 form an O/O row-
interaction (1, 3) as both these rows can potentially
affect the output value of the same signal, Flash, at the
same time. Consider the input sequence shown in Ta-
ble 3. At time 1500 ms, the output pattern for Flash
will start because Row 1 is matched. However, at time
2000 ms the output of Flash is changed to No_Req,
although the previous output pattern is still going on.
This happens because Row 3 is matched due to the
occurrence of Ignition = On. When such an input se-
quence is generated in a test case, it is said to have
covered the O/O row-interaction (1, 3).
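The following sketch, which is ours and works at the signal level rather than the paper's value level, enumerates candidate I/O row-interactions for the rows of Table 1; note that it overapproximates, also reporting (3, 2), although Row 2 needs Alarm to be On while Row 3 outputs Off.

    # Hedged sketch: candidate I/O row-interactions from an EDT-like summary
    # of each row's input and output signals. Row numbers and signal names
    # follow Table 1; the representation itself is invented.
    rows = {
        1: {"in": {"Ignition", "Alarm", "PanicSw"}, "out": {"Alarm", "Flash"}},
        2: {"in": {"Alarm", "PanicSw"},             "out": {"Alarm", "Flash"}},
        3: {"in": {"Ignition"},                     "out": {"Alarm", "Flash"}},
    }

    def io_interactions(rows):
        """(r1, r2) is a candidate I/O interaction if r1 outputs a signal r2 reads."""
        return {(r1, r2)
                for r1, d1 in rows.items()
                for r2, d2 in rows.items()
                if r1 != r2 and d1["out"] & d2["in"]}

    print(sorted(io_interactions(rows)))  # [(1, 2), (2, 1), (3, 1), (3, 2)]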
5 RGRaF: ROW-GUIDED
RANDOM ALGORITHM WITH
FUZZING
We now present RGRaF (Figure 3), an algorithm to
generate test cases with expected output from EDT
specifications. A test case consists of a timed se-
quence of input values and corresponding expected
output values. Each element of the sequence is a
tuple of the form (signalname, value, time, category),
where signalname is an input signal, value is a valid
value for that signal, time is the time when the value
arrives, and category indicates if the signal is an in-
put, output or I/O signal. The sequence is arranged
in increasing order of time. The test case generation
algorithm, which generates a set of these sequences,
consists of four main steps: automata construction,
input sequence generation (InpGen), expected output
sequence generation (ExpGen) and fuzzing at time
boundaries (Fuzz).
RGRaF begins by building a discrete timed au-
tomaton corresponding to the regular expression in
each cell, using known techniques.
ENASE2015-10thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering
70
InpGen()
    Create a random sequence of rows R_s
    I_s := ⟨⟩
    For each r in R_s
        I_r := Expand(r)
        I_s := I_s . I_r
    End For
    return I_s

Figure 1: InpGen.
ExpGen(I_s)
    For each row r in table T
        τ := ExecuteAutomata(r, first(I_s))
        If (r matches)
            I_s := I_s ⊎ O_p of r
            M_r := M_r ∪ {r}
            For all rows r_i that produce some input of r
                M_i := M_i ∪ {⟨r_i, r⟩}
            End For
        End If
        τ_min := min(τ_min, τ)
    End For
    Return (M_r, M_i, τ_min)

Figure 2: ExpGen.
It then invokes InpGen, which selects a random
sequence of rows and then systematically expands
each row in the sequence to produce a sequence of
inputs that may match that row. This input sequence
is passed on to ExpGen, which executes the timed
automaton of each cell for each input to determine the
rows that match. When a row matches, ExpGen
modifies the input sequence by adding the outputs
generated by the matched row, thus creating the final
test sequence. During its execution, ExpGen invokes
Fuzz, which randomly fuzzes the time of inputs to
increase the probability of time-related requirements
getting covered. This sequence of InpGen and ExpGen
is repeated till either all rows and row-interactions
are covered or the number of row sequences tried
exceeds a given threshold (sample size), S. These
steps are described in detail below:
RGRaF Algorithm:
    U_r := Set of all rows (R_t)
    U_p := Set of row-interactions (R_p)
    i := 0
    For each cell in each row r
        Build its timed automaton
    End For
    While (i ≤ S and (U_r ≠ ∅ or U_p ≠ ∅))
        I_s := InpGen()
        While (I_s ≠ ∅)
            (M_r, M_i, τ) := ExpGen(I_s)
            U_r := U_r - M_r
            U_p := U_p - M_i
            I_s := I_s.Next
            Fuzz: Randomly change time of first(I_s)
                  to before or after τ
        End While
        i := i + 1
    End While

Figure 3: RGRaF Algorithm.

5.1 InpGen: Input Sequence Generation

This function, Figure 1, first creates a row-sequence
R_s by randomly selecting some EDT rows, including
uncovered rows with a higher probability. It then in-
vokes the function Expand, which generates an input
sequence for each cell of each row r in R_s by select-
ing an element from the language specified by that
cell's regular expression. The sequences of all the
cells of a row are merged, maintaining time ordering,
to get an input sequence I_r for the row. Each I_r is
appended to I_s to get an input sequence for the whole
row sequence. Note that the expansion of each row
proceeds independently of the other rows in the
sequence and does not take into account any value of
an I/O variable that may be generated by a previous
row. As a result, the actual rows matching the
generated sequence of inputs could be different from
the rows in R_s. This systematic expansion of rows
ensures the generation of input patterns that need
repetition; the probability of such repeated patterns
being generated would be low if input generation were
purely random.
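The merge step can be pictured with a short sketch; the tuple layout follows Section 5, but the helper name and the data below are invented for illustration.

    # Illustrative sketch of the merge step in InpGen: per-cell input sequences,
    # each already time-ordered, are merged into one row input sequence I_r.
    import heapq

    def merge_cell_sequences(cell_seqs):
        """Merge time-ordered (signal, value, time, category) tuples by time."""
        return list(heapq.merge(*cell_seqs, key=lambda tup: tup[2]))

    ignition = [("Ignition", "Off", 0, "in")]
    panic_sw = [("PanicSw", "Press", 0, "in"), ("PanicSw", "Release", 500, "in"),
                ("PanicSw", "Press", 1000, "in"), ("PanicSw", "Release", 1500, "in")]

    for tup in merge_cell_sequences([ignition, panic_sw]):
        print(tup)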
5.2 ExpGen: Expected Output Sequence Generation

ExpGen, Figure 2, takes as input a sequence I_s, con-
sisting of inputs yet to be processed. Each input in
I_s is processed by taking a step of each row r of the
EDT table T. A step of a row consists of taking a
transition in the automaton of each cell in that row.
Once a step is taken, a row matches if all its automata
are in their final state, with at least one of them hav-
ing reached the final state due to the current signal.
When a row matches, tuples with category output or
I/O corresponding to the output O_p of that row are
merged (⊎) with the input sequence I_s, maintaining
its time ordering, and the matched row is added to the
set of matched rows M_r. If the current row matched
due to outputs generated by a previously matched row
r_i, then the pair ⟨r_i, r⟩ is added to the matched inter-
actions M_i. Any I/O signal produced by a matched
row is processed in the next step. If an automaton is
in a state that has an outgoing time-out transition, it
is said to be in a time-out state. Of all the automata
in a time-out state, ExpGen returns the smallest time
τ_min at which a time-out transition may occur.
5.3 Fuzz: Fuzzing at Time Boundaries

As in standard discrete timed automata, each transi-
tion of a cell's automaton is either labelled by a signal
value or is a time-out transition of the form ⟨c, op, n⟩,
where c is a clock variable, op is one of the oper-
ators {<, ≤, >, ≥} and n is a positive integer repre-
senting time. Time constraints modelled as time-out
transitions are one of the reasons why model-based
approaches to test generation do not scale up to in-
dustry-size code. Random algorithms too are unable
to cover time-based requirements. To address this is-
sue, at the end of each step, we randomly change the
time of inputs occurring around the nearest time τ at
which a time-out may occur. The generated scenario
is altered by randomly changing the time of some in-
puts that occur either (a) before τ, to a time after it, or
(b) after τ, to a time before it.
We call the above alteration fuzzing at time bound-
aries. Consider the scenario presented in Table 2. Af-
ter processing the input at 1500ms, the nearest time-
out will occur at 3000ms due to the PanicSw pattern in
Row 1 of the example given in Table 1. At this point,
the algorithm could randomly choose to fuzz the sce-
nario by changing the time of the input at 2000ms to
3500ms or it could change the time of the input oc-
curring at 5500ms to 2500ms. If fuzzing is not per-
formed, inputs will be generated only at the 3000ms
time-out itself. Hence, fuzzing helps in generating
scenarios with different times around τ and thus helps
in covering complex time-based scenarios.
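A minimal sketch of one fuzzing move, under our own simplifying assumptions (a single input is flipped across the boundary, with an arbitrary 50 ms offset):

    import random

    def fuzz_at_boundary(inputs, tau, offset_ms=50):
        """inputs: list of (signal, value, time_ms) tuples, time-ordered."""
        fuzzed = list(inputs)
        idx = random.randrange(len(fuzzed))   # pick one input to move
        sig, val, t = fuzzed[idx]
        # Flip the chosen input across the time-out boundary tau.
        new_t = tau + offset_ms if t < tau else tau - offset_ms
        fuzzed[idx] = (sig, val, new_t)
        return sorted(fuzzed, key=lambda tup: tup[2])

    # Table 2 scenario with tau = 3000 ms: the 2000 ms press may move to 3050 ms.
    print(fuzz_at_boundary([("PanicSw", "Press", 2000)], 3000))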
All these steps are repeated until full row and
row-interaction coverage is achieved (i.e., U_r =
U_p = ∅), or the number of row sequences tried
exceeds the sample size S.
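Putting the pieces together, the following condensed sketch mirrors the top-level loop of Figure 3; InpGen, ExpGen and Fuzz are reduced to stand-in callables, and the stub behaviour at the bottom is invented purely so the skeleton runs.

    import random

    def rgraf(rows, interactions, inp_gen, exp_gen, fuzz, sample_size=25000):
        u_r, u_p = set(rows), set(interactions)  # uncovered rows / interactions
        suite = []
        for _ in range(sample_size):
            if not u_r and not u_p:
                break                             # full coverage reached
            i_s = inp_gen(u_r)                    # row-guided random inputs
            while i_s:
                m_r, m_i, tau = exp_gen(i_s)      # run automata, collect matches
                if m_r & u_r or m_i & u_p:
                    suite.append(list(i_s))       # keep sequences adding coverage
                u_r -= m_r
                u_p -= m_i
                i_s = fuzz(i_s[1:], tau)          # advance; fuzz around time-out
        return suite

    # Toy stubs so the skeleton executes; a real ExpGen executes timed automata.
    inp_gen = lambda u_r: [("PanicSw", "Press", t * 500, "in") for t in range(4)]
    exp_gen = lambda i_s: ({1} if len(i_s) == 4 else set(), set(), 3000)
    fuzz = lambda i_s, tau: [(s, v, random.choice([tau - 1, tau + 1]), c)
                             if abs(t - tau) < 1000 else (s, v, t, c)
                             for (s, v, t, c) in i_s]

    print(len(rgraf({1, 2, 3}, {(1, 2)}, inp_gen, exp_gen, fuzz, sample_size=5)))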
6 EXTENSIONS TO EDT

For the generated test cases to be useful, they should
not contain any input combinations that will never be
generated by the environment. To eliminate such in-
valid combinations, the environment needs to be spec-
ified. We have extended the EDT notation with a
special output column, RejectFlag, to support easy
modelling of the environment as required for test-
ing. Similarly, we have also added a special col-
umn, ErrorFlag, to support the specification of proper-
ties. These two extensions are described in detail be-
low.
6.1 Modelling Environment Constraints
In reactive systems, there can be combinations of
inputs that never occur in an actual run of the system.
For instance, in a car, the left and right indicator
switches cannot be On simultaneously. We provide a
special output signal, RejectFlag, to model such
environment constraints. These constraints are
specified as an EDT row with a Reject output in the
RejectFlag column. A sample EDT row specifying an
environment constraint is illustrated in Table 4.

Table 4: Specification for Environment Constraints.

sno | in LeftSw | in RightSw | out RejectFlag
  1 | On        | On         | Reject

If a test case generated by the Input Sequence Gen-
erator matches the row in Table 4, then that test case is
rejected. So RejectFlag is actually used to eliminate
test cases for all the combinations that cannot happen
in the functioning of real-world reactive systems.
6.2 Property Checking
The requirements of real-world reactive systems gen-
erally contain certain safety-critical requirements that
should never be violated during any execution of the
system. These can be seen as properties of the sys-
tem. For example, ‘when a vehicle is moving at a
considerable speed (say, >20 kmph), all doors should
be locked’, is one such requirement. It is often eas-
ier to express such requirements as a system prop-
erty. This property should not be violated by other re-
quirements that alter either the vehicle speed or door
lock/unlock status. To specify such properties, we
provide a special output signal, ErrorFlag. An ex-
Table 5: Specification for Property Checking.
sno
in
VehicleSpeed
in DoorStatus out ErrorFlag
1 > 20 Unlocked Error
ample of specifying system properties is illustrated in
Table 5. As RGRaF generates test cases for row cov-
erage, to cover the row in Table 5, a test case will
be generated that matches this row. Once the row
is matched, the Expected Output Generator will
generate Error as the expected output of that test case.
test case is a counter-example to the given property.
So this special output signal is used to detect and
report errors for signal combinations that are possible
in real-world reactive systems but should not occur
due to pre-defined system properties.
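A hedged sketch of how the two special columns could act during test generation; the classification function and the match predicates below are ours, not EDT-Test's.

    # Sketch: a sequence matching a RejectFlag row is dropped; one matching an
    # ErrorFlag row becomes a counter-example test with expected output Error.
    def classify(sequence, matches_reject_row, matches_error_row):
        if matches_reject_row(sequence):
            return "rejected"          # environment can never produce this input
        if matches_error_row(sequence):
            return "counter-example"   # property violated
        return "test-case"

    # Toy predicates for the Table 4 / Table 5 rows.
    left_right_on = lambda seq: ("LeftSw", "On") in seq and ("RightSw", "On") in seq
    fast_unlocked = lambda seq: any(s == "VehicleSpeed" and v > 20 for s, v in seq) \
                                and ("DoorStatus", "Unlocked") in seq

    print(classify([("LeftSw", "On"), ("RightSw", "On")],
                   left_right_on, fast_unlocked))  # rejected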
7 EXPERIMENTS: RESULTS AND
OBSERVATIONS
To evaluate the cost-effectiveness and practical use-
fulness of EBT, we conducted case studies on differ-
ent projects. The case studies were conducted to an-
swer the following four questions:
1. Does RGRaF perform better than pure random
test case generation on real-world projects?
2. Does RGRaF perform better than RGRaF without
fuzzing?
3. Does it take less effort to generate test cases
manually than to translate requirements to EDT
and then generate test cases automatically?
4. Does RGRaF generate better test cases than the
manually written ones?
To investigate the aforementioned questions we
needed real-world projects that had documented re-
quirements in a natural language, manually written
test cases, and detailed data on the effort spent in writ-
ing those test cases. We could find only a few such
projects, of which we selected three from the auto-
motive domain. We carved out three case studies from
these projects such that each case study was fairly big
and representative of a real-world reactive sys-
tem.
A brief description of the three case studies, followed
by details of the comparisons, is given below.
Case Study 1 was from the Body Control component
of an automotive original equipment manufacturer
(OEM). It consisted of a single sub-system named
Integrated-FAT that had three modules: Flasher,
Alarm and Trunk Back Door. Each module was fur-
ther divided into sub-modules, and requirements of
each sub-module were available. We modelled these re-
quirements in EDT and generated test cases for each
module as well as at the sub-system level.
Case Study 2 was from another automotive OEM.
We conducted experiments on four modules: Power
Lift Gate (PLG), Power Closure Decision (PCD) and
Panic Alarm from the Body Control component, and
Blower Control from the Climate Control component.
For all these modules and a sub-system
(Integrated-PLG+PCD) that merged PLG and PCD,
we generated test cases in a MATLAB-compatible for-
mat.
Case Study 3 was from the Engine Control component
of an automotive tier-one supplier. We generated test
cases in a CoverageMaster winAMS (winAMS) com-
patible format for three modules: TF Switch Open,
TF Switch Low and RD Switch Operation.
7.1 Comparison of RGRaF and Pure
Random with Fuzzing
To compare RGRaF with pure random test case gen-
eration we executed EDT-Test for both RGRaF and
pure random test case generation on all the modules
and sub-systems of the selected case studies. The
pure random algorithm generated random input se-
quences with a random time assigned to each tuple
in the input sequence. Each input sequence was of a
random length. Once an input sequence was gener-
ated, the rest of the algorithm was similar to RGRaF:
it executed the automata and retained only
those sequences that covered a new row or a row-
interaction. In both cases a sample size S of 25000
was used.
Table 6 illustrates the results of these experi-
ments. RGRaF achieved higher row coverage for
seven modules and higher row-interaction coverage
for eight modules as compared to the pure random algo-
rithm. Moreover, RGRaF gave 100% row coverage
for seven modules, whereas the pure random variant
could not cover all of them. During these experi-
ments, we observed that pure random test generation
achieved lower row and row-interaction cov-
erage for larger modules and sub-systems. For
instance, in the Alarm module, which had 822 rows,
RGRaF covered 672 rows whereas pure random could
cover only 471 rows.
An analysis revealed that RGRaF performed bet-
ter in cases where the size of the input domain was
large and in cases where, to cover a row, an input
with a specific value had to be generated within a
specific time. This is illustrated by the example in
Table 1. To cover Row 1 of this example, the Panic
Switch has to be pressed and released twice within
three seconds. The probability of this happening when
the generation is purely random is very low. When we
generated test cases for this example using RGRaF
and pure random, RGRaF needed a sample size of
only 6 to cover all rows and row-interactions, whereas
the random algorithm needed a sample size of 663.
We also observed that the pure random with fuzzing
algorithm generated many invalid input combinations,
which were hence rejected.
Cost-effectiveFunctionalTestingofReactiveSoftware
73
Table 6: RGRaF and Pure Random.

Case Study   | Feature Name     | No. of   | Rows Covered                     | Row-Interactions Covered
             |                  | EDT Rows | RGRaF      Pure     RGRaF        | RGRaF      Pure     RGRaF
             |                  |          | With Fuzz  Random   Without Fuzz | With Fuzz  Random   Without Fuzz
Case Study 1 | Alarm            | 822      | 672        368      399          | 921        303      679
             | Trunk Back Door  | 86       | 86         86       86           | 63         63       63
             | Flasher          | 146      | 125        121      120          | 541        506      503
             | Integrated-FAT   | 1052     | 683        580      579          | 1339       1176     1176
Case Study 2 | Panic Alarm      | 262      | 262        257      261          | 772        597      735
             | Blower Control   | 101      | 101        101      101          | 280        279      271
             | PLG              | 52       | 51         51       51           | 301        301      301
             | PCD              | 16       | 16         16       16           | 5          5        5
             | PLG + PCD        | 68       | 67         65       64           | 296        290      285
Case Study 3 | TF Switch Open   | 14       | 14         14       14           | 22         22       22
             | TF Switch Low    | 14       | 14         14       14           | 23         23       23
             | RD Sw Operation  | 31       | 31         26       23           | 46         35       31
Table 7: Summary of EBT Experimental Data.

Case Study   | Feature Name     | No. of   | EDT Creation   | EDT-Test   | Total Efforts  | Manual Test Case | Effort Savings
             |                  | EDT Rows | [person hours] | Execution  | [person hours] | Generation       | by EBT
             |                  |          |                |            |                | [person hours]   | (per case study)
Case Study 1 | Alarm            | 822      | 13             | 95 mins    | 14.5           | 38.5             | 33%
             | Trunk Back Door  | 86       | 7              | 2 mins     | 7              | 10               |
             | Flasher          | 146      | 18.5           | 32 mins    | 19             | 12               |
             | Integrated-FAT   | 1052     | 0              | 6.5 hours  | 6.5            | Not Available    |
Case Study 2 | Panic Alarm      | 262      | 40             | 5 mins     | 40             | 80               | 44.8%
             | Blower Control   | 101      | 5              | 12 mins    | 5              | 18               |
             | PLG              | 52       | 12.5           | 1.5 mins   | 12.5           | 6                |
             | PCD              | 16       | 1              | 1 second   | 1              | 2                |
             | PLG + PCD        | 68       | 0              | 30 mins    | 0.5            | Not Available    |
Case Study 3 | TF Switch Open   | 14       | 0.75           | 1 min      | 0.75           | 9                | 62.5%
             | TF Switch Low    | 14       | 1.25           | 1.75 mins  | 1.25           | 9                |
             | RD Sw Operation  | 31       | 10             | 1.5 mins   | 10             | 14               |
7.2 Comparison of RGRaF with and without Fuzzing

To evaluate the contribution of fuzzing, we ran EDT-
Test with and without fuzzing on all the modules. To
fuzz, at the end of each step of all the automata, the next
input was optionally chosen. If the chosen input had a
time less than the nearest time-out τ, then its time
was modified to a value higher than τ; otherwise it
was changed to a time less than τ. For this comparison
too we used a sample size of 25000.
Table 6 presents the results of the comparison.
Fuzzing at time boundaries helped in seven mod-
ules because these had complex time-based require-
ments. For these modules, RGRaF achieved higher
row and row-interaction coverage as compared to
RGRaF without fuzzing. This demonstrates that
fuzzing the timing of inputs helps in increasing row
and row-interaction coverage, especially in the pres-
ence of time-based requirements. A thorough analy-
sis revealed that fuzzing helped in cases where time
constraints were associated with I/O signals because,
as explained in Section 5, the time constraints of I/O
signals are not taken into account while expanding
rows.
7.3 Comparison with Manual Testing

For all the case studies, manually created test cases
with the corresponding effort data were available to
us. These test cases were created by the respective ap-
plication development teams, consisting of test engi-
neers and domain experts, whereas the team that cre-
ated the EDT specifications and generated test cases us-
ing EDT-Test did not have automotive domain knowl-
edge.
Table 7 presents a summary of our findings of a
comparison between EBT and manual test case gen-
eration in terms of the effort required. In the case of EBT, the
effort is split into the person hours taken to specify
requirements in EDT and the time taken by EDT-Test
to generate test cases. We have not compared the two
methods for coverage because no coverage data was
available for the manual test cases. Instead we asked
the domain experts from the project teams to manu-
ally compare and analyse the two sets of test cases.
The findings reveal that on average EBT re-
quired 30%–60% less effort for test case creation. In
all the modules, EBT not only generated all the test
cases present in the manual sets, but also generated
many additional interesting scenarios. According to
the domain experts, these additional scenarios should
have been part of the manual test cases. In two
modules, Flasher and PLG, EBT test cases needed
more effort compared to manual ones, primarily because
these modules required an understanding of complex
domain functionality that the manual test case writ-
ers already had.
Analysis of some key findings is presented below:
In Case Study 1, for the Trunk Back Door mod-
ule, EBT-generated test cases covered 40 more row-
interactions, and in the case of Flasher they covered
346 more row-interactions than the manually writ-
ten test cases.
In Case Study 1, the Integrated-FAT module clearly
showed the scalability of our algorithm. It had ap-
proximately 1000 requirements and 98 signals.
Due to the complexity of the requirements, it was
hard for the testers to visualize all the combinations
of requirements. Hence, the manually cre-
ated test cases covered only module-level require-
ments, and interactions between modules were not
adequately covered. EBT test cases subsumed all
the manually created ones and generated many
more valid and necessary requirement combina-
tions, as confirmed by the domain experts and the
project team.
In Case Study 2, EBT test cases, when run on the
model, detected a bug in a post-production sub-
system, Integrated-PLG+PCD. We detected this
bug by specifying properties of the sub-system us-
ing the ErrorFlag in the EDT specifications, as ex-
plained in Section 6. In the case of the Panic Alarm
module, three missing requirements were uncov-
ered when tool-generated test cases were executed
on MATLAB models.
In Case Study 3, tool-generated test cases
achieved 100% statement and decision coverage
when executed on C code using CoverageMaster
winAMS. This is interesting because EBT does
not target code coverage.
The overall analysis of our experiments demon-
strates that, on real-world projects, RGRaF performs
better than both pure random generation and RGRaF
without fuzzing. It also shows that EBT is more cost-
effective than manual testing and generates better test
cases. However, there are some threats to the validity
of our experiments, which are described in the next
section.
7.4 Threats to Validity
Below we list some threats to the validity of our find-
ings.
All the systems we selected are from the auto-
motive domain; although the findings should
carry over to reactive systems from other domains,
explicit experiments will have to be conducted to
confirm this.
To judge the quality of the generated test cases
we relied on the judgement of domain experts.
A more scientific study that determines the num-
ber of defects detected by RGRaF will have to be
conducted to ascertain its effectiveness. However,
defect data is not easy to obtain, and we were not
able to get it for all the systems we considered,
making this a difficult experiment to conduct.
Although we have considered fairly big systems,
modern reactive systems are much bigger. Con-
ducting an experiment on such a large application
is not currently possible, as it would take several
person months to specify it in EDT. We will there-
fore have to see whether RGRaF is actually adopted
by testers to obtain findings for large applications.
8 CONCLUSIONS AND FUTURE
WORK
In this paper, we have presented an effective and ef-
ficient automated technique for functional test case
generation. We have also shared the results of various
experiments that were conducted on three industry-
size applications, which highlight the advantages of
using the proposed technique on real-world reactive
systems. From the experiments we can conclude that:
An appropriate choice of formal notation, com-
bined with automated test case generation, can
reduce testing effort by around 30%–60%, while
providing better coverage.
Row and row-interaction coverage criteria result
in several scenarios that test engineers and domain
experts find useful and interesting.
Cost-effectiveFunctionalTestingofReactiveSoftware
75
Combining systematic generation of input se-
quences with a degree of randomness and finally
fuzzing at time boundaries performs better than
pure random test case generation.
Although the experiments were performed on au-
tomotive-domain applications, we expect similar ben-
efits on reactive systems belonging to other domains
as well. The current version of the presented tech-
nique faces scalability issues in generating test cases
for applications having large and complex time-based
requirements, as observed in Case Study 1 with fea-
tures like Flasher and Alarm. Going forward, we aim
to overcome this issue by adding more intelligence
to the mechanism that generates input sequences. We
also aim to enhance the coverage criteria of the tech-
nique by evaluating the effectiveness of various cov-
erage criteria in finding bugs in the system under test,
and to enable coverage of long sequences of require-
ment interactions.
REFERENCES
(1994). DO-178B: Software Considerations in Airborne
Systems and Equipment Certification.
Anand, S., Burke, E. K., Chen, T. Y., Clark, J., Cohen,
M. B., Grieskamp, W., Harman, M., Harrold, M. J.,
and Mcminn, P. (2013). An orchestrated survey of
methodologies for automated software test case gen-
eration. J. Syst. Softw., 86(8):1978–2001.
Arcuri, A., Iqbal, M., and Briand, L. (2010). Black-box sys-
tem testing of real-time embedded systems using ran-
dom and search-based testing. In Petrenko, A., Simão,
A., and Maldonado, J., editors, Testing Software and
Systems, volume 6435 of Lecture Notes in Computer
Science, pages 95–110. Springer Berlin Heidelberg.
Bokil, P., Darke, P., Shrotri, U., and Venkatesh, R. (2009).
Automatic test data generation for c programs. In
Secure Software Integration and Reliability Improve-
ment, 2009. SSIRI 2009. Third IEEE International
Conference on, pages 359–368. IEEE.
Bowman, H. and Gomez, R. (2006). Discrete timed au-
tomata. In Concurrency Theory, pages 377–395.
Springer London.
Brat, G., Havelund, K., Park, S., and Visser, W. (2000).
Java pathfinder - second generation of a java model
checker. In In Proceedings of the Workshop on Ad-
vances in Verification.
Briand, L. (2010). Software verification - a scalable, model-
driven, empirically grounded approach. In Tveito, A.,
Bruaset, A. M., and Lysne, O., editors, Simula Re-
search Laboratory, pages 415–442. Springer Berlin
Heidelberg.
Cadar, C., Dunbar, D., and Engler, D. R. (2008). Klee:
Unassisted and automatic generation of high-coverage
tests for complex systems programs. In OSDI, vol-
ume 8, pages 209–224.
Cadar, C., Godefroid, P., Khurshid, S., Păsăreanu, C. S.,
Sen, K., Tillmann, N., and Visser, W. (2011). Sym-
bolic execution for software testing in practice: Pre-
liminary assessment. In Proceedings of the 33rd Inter-
national Conference on Software Engineering, ICSE
’11, pages 1066–1071, New York, NY, USA. ACM.
Cadar, C. and Sen, K. (2013). Symbolic execution for soft-
ware testing: Three decades later. Commun. ACM,
56(2):82–90.
Chen, T. Y., Kuo, F.-C., Merkel, R. G., and Tse, T. (2010).
Adaptive random testing: The ART of test case di-
versity. Journal of Systems and Software, 83(1):60–66.
Cristiá, M., Albertengo, P., Frydman, C., Plüss, B., and
Monetti, P. R. (2014). Tool support for the test tem-
plate framework. Software Testing, Verification and
Reliability, 24(1):3–37.
Dalal, S. R., Jain, A., Karunanithi, N., Leaton, J., Lott,
C. M., Patton, G. C., and Horowitz, B. M. (1999).
Model-based testing in practice. In Proceedings of the
21st international conference on Software engineer-
ing, pages 285–294. ACM.
Duran, J. W. and Ntafos, S. C. (1984). An evaluation of
random testing. IEEE Trans. Softw. Eng., 10(4):438–
444.
Ferguson, R. and Korel, B. (1996). The chaining approach
for software test data generation. ACM Trans. Softw.
Eng. Methodol., 5(1):63–86.
Hamlet, R. (2002). Random Testing. John Wiley & Sons,
Inc.
Harel, D., Lachover, H., Naamad, A., Pnueli, A., Politi, M.,
Sherman, R., Shtull-Trauring, A., and Trakhtenbrot,
M. (1990). Statemate: A working environment for the
development of complex reactive systems. Software
Engineering, IEEE Transactions on, 16(4):403–414.
Heitmeyer, C., Kirby, J., Labaw, B., and Bharadwaj, R.
(1998). Scr: A toolset for specifying and analyzing
software requirements. In Computer Aided Verifica-
tion, pages 526–531. Springer.
Marinov, D., Andoni, A., Daniliuc, D., Khurshid, S., and
Rinard, M. (2003). An evaluation of exhaustive testing
for data structures. Technical Report MIT-LCS-TR-921,
MIT Computer Science and Artificial Intelligence
Laboratory.
Offutt, J., Liu, S., Abdurazik, A., and Ammann, P.
(2003). Generating test data from state-based specifi-
cations. Software Testing, Verification and Reliability,
13(1):25–53.
Peranandam, P., Raviram, S., Satpathy, M., Yeolekar, A.,
Gadkari, A., and Ramesh, S. (2012). An inte-
grated test generation tool for enhanced coverage of
simulink/stateflow models. In Design, Automation
& Test in Europe Conference & Exhibition (DATE),
2012, pages 308–311. IEEE.
Păsăreanu, C. S. and Rungta, N. (2010). Symbolic
pathfinder: Symbolic execution of java bytecode. In
Proceedings of the IEEE/ACM International Confer-
ence on Automated Software Engineering, ASE ’10,
pages 179–180, New York, NY, USA. ACM.
ENASE2015-10thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering
76
Raymond, P., Nicollin, X., Halbwachs, N., and Weber, D.
(1998). Automatic testing of reactive systems. In
Real-Time Systems Symposium, 1998. Proceedings.,
The 19th IEEE, pages 200–209. IEEE.
Reactis. Reactis. http://www.reactive-systems.com/model-
based-testing-simulink.html. [Online; accessed 3-
Dec-2014].
Tahat, L. H., Vaysburg, B., Korel, B., and Bader, A. J.
(2001). Requirement-based automated black-box test
generation. In Computer Software and Applications
Conference, 2001. COMPSAC 2001. 25th Annual In-
ternational, pages 489–495. IEEE.
Thyssen, J. and Hummel, B. (2013). Behavioral specifica-
tion of reactive systems using stream-based i/o tables.
Software & Systems Modeling, 12(2):265–283.
Tillmann, N. and De Halleux, J. (2008). Pex – white box
test generation for .NET. In Tests and Proofs, pages
134–153. Springer.
Veanes, M., Campbell, C., Grieskamp, W., Schulte, W.,
Tillmann, N., and Nachmanson, L. (2008). Model-
based testing of object-oriented reactive systems with
spec explorer. In Formal methods and testing, pages
39–76. Springer.
Venkatesh, R., Shrotri, U., Krishna, G. M., and Agrawal,
S. (2014). Edt: a specification notation for reactive
systems. In Proceedings of the conference on Design,
Automation & Test in Europe, page 215. European De-
sign and Automation Association.
Wang, J., Li, H., Lv, T., Wang, T., and Li, X. (2014). Func-
tional test generation guided by steady-state probabil-
ities of abstract design. In Proceedings of the confer-
ence on Design, Automation & Test in Europe, page
321. European Design and Automation Association.
winAMS. CoverageMaster winAMS.
http://www.gaio.com/product/dev_tools/pdt07_winams.html.
[Online; accessed 3-Dec-2014].
Cost-effectiveFunctionalTestingofReactiveSoftware
77