Application of Property-based Testing Tools for Metamorphic Testing

Nasser Alzahrani, Maria Spichkova and James Harland

School of Computing Technologies, RMIT University, Melbourne, Australia

Keywords:

Software Testing, Metamorphic Testing, Property-based Testing, Formal Speciﬁcation.

Abstract:

Metamorphic testing (MT) is a general approach for testing a specific kind of software system: so-called

“non-testable” systems, where the “classical” testing approaches are difficult to apply. MT is an effective

approach for addressing the test oracle problem and test case generation problem. The test oracle problem is

when it is difﬁcult to determine the correct expected output of a particular test case or to determine whether

the actual outputs agree with the expected outcomes. The core concept in MT is metamorphic relations (MRs)

which provide formal speciﬁcation of the system under test. One of the challenges in MT is effective test

generation. Property-based testing (PBT) is a testing methodology in which test cases are generated according

to desired properties of the software. In some sense, MT can be seen as a very speciﬁc kind of PBT.

In this paper, we show how to use PBT tools to automate test generation and verification of MT. In addition to

the benefit of automation, the proposed method shows how to combine general PBT with MT under the same testing

framework.

1 INTRODUCTION

Formal speciﬁcation is an essential tool for managing

the complexity of specifying and verifying the design

and the development of critical software systems. The

formal approach removes ambiguity, improves precision,

and is used to verify that the requirements are fulfilled.

Appel et al. summarised a number of desired

qualities that the speciﬁcation should have in order

to be effective, see (Appel et al., 2017). Firstly, the

speciﬁcation has to be formal, where the speciﬁca-

tion should be mathematically precise. It should be

rich, i.e. precisely expressing the intended behaviour

of the system (we could reformulate this quality as

completeness). The speciﬁcation has to be two-sided

where the speciﬁcation is exercised by both imple-

mentations and clients. Finally, the speciﬁcation has

to be live where it is automatically checked against

actual code rather than some abstract model.

Formal languages like TLA+ (Lamport, 2002), or

Alloy (Jackson, 2012) are generally concerned with

specifying systems against some models rather than

the actual code under development. On the other

hand, property-based testing (PBT) facilitates the use

of formal specifications on actual code, which helps in

ﬁnding subtle faults on live running systems (Hughes

et al., 2016). One of the main attributes of PBT is that

it can automatically generate tests to cover edge cases

that are not so obvious to identify manually. Two

main elements of the PBT approach make this possible:

(1) a random test generator, responsible for generat-

ing random values in a controlled way, and (2) a so-

called shrinker, minimizing the number of the generated

test cases to allow for easier debugging.

Metamorphic Testing (MT) is a special PBT tech-

nique elaborated for the cases where it’s complicated

to specify “classical” test cases having input and out-

put data ﬂows - in some cases, it’s difﬁcult to identify

what could be the correct output for each particular

input. For “classical” testing we need to have a so-called

oracle that can determine whether or not the

output is correct wrt. the provided input and this deci-

sion should be taken in a reasonable amount of time,

see (Weyuker, 1982). If an oracle cannot be

created, such systems are typically called non-testable

(or untestable), but MT can provide an effective so-

lution to the oracle problem using Metamorphic Re-

lations (MR), see e.g., (Segura et al., 2020). MT was

initially introduced in (Chan et al., 1998) in the do-

main of numerical analysis. Since then, it has devel-

oped and been applied in many application areas, such

as compilers, medical systems, embedded applica-

tions, search engines, service computing, simulation

software, image processing systems, machine learn-

ing software and optimizing software (Chen, 2015).

PBT tools for testing functional programs were

ﬁrst introduced in the Haskell programming language

Alzahrani, N., Spichkova, M. and Harland, J.

Application of Property-based Testing Tools for Metamorphic Testing.

DOI: 10.5220/0011101700003176

In Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2022), pages 553-560

ISBN: 978-989-758-568-5; ISSN: 2184-4895


by (Claessen and Hughes, 2000), where QuickCheck,

a library for random testing of program properties,

was implemented. Since then, many libraries have

been developed following this approach for different

programming languages. The main components of

QuickCheck are the generator of random values, the

shrinker, and the checker which runs these random

values with pre-selected functions. In our work, we

extend the generator and the shrinker in order to auto-

mate MT test generation and veriﬁcation.

The idea of generating random tests is not new, see

for example (Chen et al., 2010). However, using PBT

tools such as QuickCheck has many advantages. First of

all, PBT has many tools for automating and control-

ling the generation of random test cases. Secondly,

these tools allow controlled strategies for generating

random data of complicated data types, i.e., it is pos-

sible to conﬁgure how the random values distribute

over the input domain. Although (Chen et al., 2018)

argues that one of MT’s main advantages is the ease

of test case automation, MT automation remains a complex

task when PBT tools such as QuickCheck are used

to test MRs.

Contributions: The main contribution of this pa-

per is a systematic approach in which we utilize the

random generation of test cases and automatic test-

ing capabilities of PBT tools to automate some steps

of metamorphic testing. More precisely, we automate

test case generation and test case verification by

extending QuickCheck’s shrinker and generator with

our customized shrinker and generator. These variants

are simpler and more amenable to customization in

the context of MT.

2 BACKGROUND:

PROPERTY-BASED TESTING

Property-based testing (PBT) is an approach to test-

ing software by deﬁning general speciﬁcations and

properties that must hold for all the executions of ran-

domly generated test cases. The inputs to these test

cases are random. If these properties do not hold,

minimized failing test cases are reported. In PBT, test

cases are generated randomly according to universally

quantified properties. Examples of quantified properties

include validity checks, postconditions, model-based

properties, inductive properties, and metamorphic

properties.

In validity checks, one writes functions to check

some invariants of the system under test or the

datatypes used in the system. This process also includes

writing properties to check that the test-case generator

and the test-case shrinker both produce valid results. The

last step is to write a property for each function under

test, which performs a single random call and checks

that the return value is valid. For instance, when test-

ing the functionality of inserting a value into a tree

datatype, we demand that all the keys in a left subtree

must be less than the key in the node, and all the keys

in the right subtree must be greater.

A postcondition is a property that should be true

after a call. One can come up with such property by

asking what would be the expected state of the system

after calling some function. A postcondition usually

tests one function after calling this function with some

random argument and then checking an expected re-

lationship between the result and its argument. For

instance, after calling insert (which inserts a value in

some tree datatype), we should be able to ﬁnd the key

inserted without changing previously inserted keys.

A model-based property is used to test some func-

tion by making a single call and comparing its result

with the result of some abstract operation. The model

refers to the abstraction functions which map the real

arguments and results to abstract values.

Inductive properties are the properties that one can

use induction to assert that the only function that can

pass the tests is the correct one. This is usually done

by relating a call of the function under test to calls

with smaller arguments. The set of inductive prop-

erties covering all possible cases allows the testing

of the base case and induction steps of an inductive

proof-of-correctness.

The canonical example in the literature to explain

property-based testing is reversing a list. One

property that should hold for all lists is that reversing

a list x twice returns the original list:

reverse (reverse x) = x

In the case of functional oracle-based testing, we

would need to identify the equivalence partitions (the

sets of inputs that have to be handled equivalently as

they have to provide the same type of output but with

possibly different values), and then specify at least

one test case for each partition, or use boundary values

in partitions that are ranges. Thus, for the reverse

function this could be an empty list [] and some non-empty

list, e.g., [1, 2, 3], i.e., our test cases would be

reverse (reverse []) = []

reverse (reverse [1, 2, 3]) = [1, 2, 3]

In contrast to this approach, in PBT the checks take

place on the return value of the function under test,

instead of checking values “hard-coded” in the test

cases such as [] and [1, 2, 3]. That is, the input to

the function under test would be automatically gener-

ated and the library chooses random values for testing

rather than the tester specifying particular values. For


example, for the reverse function, one could call the

reverse function twice and expect the original list to

be returned. The library generates many random test

cases and reports a failure if it ﬁnds a counterexample.

In the case of QuickCheck, the property to be

passed to the library is reverse (reverse x) = x. The

library generates the random values for x and

reports a failure if it finds a counterexample.
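As a minimal sketch, the property above could be written for QuickCheck as follows. The names prop_reverseTwice and prop_reverseOnce, and the choice of Int as the element type, are illustrative assumptions and do not come from the paper.

```haskell
import Test.QuickCheck

-- Reversing a list twice should give back the original list.
-- The element type is fixed to Int so that QuickCheck knows what to generate.
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

-- A deliberately wrong property, for contrast: reversing once is not
-- the identity, so QuickCheck is expected to find a counterexample.
prop_reverseOnce :: [Int] -> Bool
prop_reverseOnce xs = reverse xs == xs
```

Calling quickCheck prop_reverseTwice should report only passing tests, whereas quickCheck prop_reverseOnce should report a (shrunk) counterexample.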

It is worth noting that there are some anti-patterns

that could emerge while writing property-based tests.

Because it is sometimes difﬁcult to think about a

property, practitioners usually fall into the trap of du-

plicating the implementation of the code in tests. The

literature on PBT has many other examples of such

anti-patterns and how to avoid them. One solution to

avoid this problem is MT.

MT is a successful method in solving the oracle

problem in software testing. The core idea is this: in

the cases when it’s hard to specify in advance what

exactly should be the output of a function, we may be

able to observe the change to the output when changing

the input. That is, valuable information about the

function would be whether and how its output changes

when we change its input. For instance, even if the

expected result of a function such as inserting a key

into a tree is difﬁcult to predict, we may still be able

to express an expected relationship between this re-

sult and the result of some other call. In this case, if

we insert an additional key into t before calling insert

k v, then we expect the additional key to appear in the

result also.

Figure 1: Testing compilers using Metamorphic testing.

Metamorphic Relations (MRs) are a central element

of MT. The MRs are properties of the function

under test. An important (and usually missed)

attribute of MRs is that they relate multiple inputs to

their expected outputs. When implementing MT, we

first generate source test cases. We then use the MR to

derive follow-up inputs from them. Finally, the outputs

of the source test cases are compared with the outputs

of the follow-up test cases.
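The three steps (source input, follow-up input, output comparison) can be sketched generically as below. The type MR and the helpers checkMR and permutationMR are hypothetical names introduced only for this illustration; they are not part of QuickCheck or of the paper's implementation.

```haskell
-- A metamorphic relation for a function from a to b (hypothetical helper):
-- how to derive a follow-up input from a source input, and how the
-- two outputs must relate to each other.
data MR a b = MR
  { followUp :: a -> a
  , related  :: b -> b -> Bool
  }

-- Run the function under test on the source input and on the follow-up
-- input, then compare the two outputs using the relation.
checkMR :: MR a b -> (a -> b) -> a -> Bool
checkMR mr f x = related mr (f x) (f (followUp mr x))

-- Example relation: permuting the input (here: reversing it) must not
-- change the output.
permutationMR :: Eq b => MR [a] b
permutationMR = MR { followUp = reverse, related = (==) }
```

For instance, sum satisfies permutationMR on every list, while a function such as take 1 does not, and neither check needs to know the expected output in advance.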

In our proposed method, we show how to auto-

mate the test case generation and veriﬁcation of MT

using QuickCheck. Our method can be generalized to

work with PBT tools other than QuickCheck, as

our method extends features of QuickCheck that are

common in other PBT tools.

Figure 1 illustrates MT in the context of testing

a compiler. There are two paths that are expected to

lead to the same output:

(1) We start with Program 1, which is compiled to obtain

the corresponding executable code Executable 1.

Then, we run Executable 1 with some input data

x to get an output.

(2) We modify Program 1 to a semantically equiv-

alent but syntactically different Program 2 (e.g.,

by unrolling a loop or removing a comment etc.),

then apply the same compilation method to get the

corresponding executable code Executable 2. Af-

ter that, we run Executable 2 with the same input

data x to get an output.

When both outputs are obtained, we compare them: if

they are not exactly the same, the compiler is faulty.

One of the beneﬁts of using PBT for MT is the

rich sets of tools available. For instance, PBT tools

allow the creation of strategies for generating random

data for complicated data types with minimal setup.

In MT, the operations and data types are usually more

complicated than simple data types such as integers

or lists. Existing approaches for creating these data

types are ad-hoc (Chen et al., 2015). In these other

approaches, one has to do almost the same setup work

for every kind of data type in order to generate the ran-

dom values. PBT tools, on the other hand, are more

general and cover more data types without the need

to duplicate the code for every kind of data type or

model. In addition, PBT requires less code than these

other approaches and gives more control over the distribution

of test cases over the input space, whereas these other

approaches distribute the test cases evenly over the input space.

Creating a random BST using any PBT library requires

less setup work. One is only required to define

the data type and pass it to the library. However, since

we are using these PBT tools for MT, some more

setup and customization are required.

3 RELATED WORK

The effectiveness of MT in alleviating the oracle

problem has allowed it to appear in many different

application domains. However, many of these appli-

cations do not provide a systematic way to automate

some parts of MT.

The automation of MT was ﬁrst introduced in


(Gotlieb and Botella, 2003) where they proposed a

framework that utilizes Constraint Logic Program-

ming techniques to ﬁnd test data that violate a

given metamorphic relation. However, they require

the usage of special metamorphic-relations, such as

permutation-based relations, to speed up the search

among the possible test data.

There are a few other efforts to automate MT steps.

For instance, Zhu created a tool for automating meta-

morphic testing for Java unit tests, see (Zhu, 2015).

This method is speciﬁc to Java unit tests. Our method

is more general and can be applied in any program-

ming language which has library support for PBT.

An automatic MT framework for compilers is pro-

posed in (Tao et al., 2010). Their approach in generat-

ing the test cases is similar to the approach presented

in this paper. However, their approach is tailored to

the domain of testing compilers, where we propose a

generally applicable solution.

In (Liu et al., 2012), the authors propose a method

that allows the composition of new metamorphic relations

based on previously defined ones. Their case

study showed that new metamorphic relations can be

constructed by composing some existing metamorphic

relations. They assert that the newly derived metamorphic

relation delivers better metamorphic testing

than the original metamorphic relations and also reduces

the number of test cases.

Work related to verifying authenticated data struc-

tures (ADS) is presented in (Miller et al., 2014).

The approach of Miller provides a semantics for a

programming language LambdaAuth, which supports

ADS. This approach provides many benefits; however,

it might be hard to convince practitioners to adopt a

special-purpose language, so it is unlikely to spread widely among

engineers or to have a practical impact. In (Brun

and Traytel, 2019), the authors used the Isabelle proof

assistant to formally verify LambdaAuth. They also

assert that they found several mistakes in the semantics

of LambdaAuth. In our work, we use a mainstream

programming language (Haskell) to design such ADS

and verify our implementation of these ADS using

PBT and MT.

4 PROPOSED APPROACH

In this section, we present a systematic method to

use PBT tools to test MRs. We speciﬁcally choose

QuickCheck to illustrate the proposed approach. Our

method is general and can be implemented using

other PBT libraries as well. Our proposed method

consists of three aspects. First, we develop a new gen-

erator for generating test cases for MT. Second, we

develop a new test case shrinker. Finally, we use the

newly designed generator and the shrinker instead of

QuickCheck’s default generator and shrinker. In the

rest of the section, we present the core features and

steps of our approach, and then discuss the advantages

of this approach.

4.1 Core Features and Steps

QuickCheck is a library for random testing of pro-

gram properties. The programmer provides a spec-

iﬁcation of the program, in the form of properties

that functions should satisfy. The library then gener-

ates a large number of random test cases and checks

that the property holds. Speciﬁcations are expressed

in Haskell. The QuickCheck library also

provides functions to define properties, observe the

distribution of test data, and define test data generators,

which is an important advantage for system specification.

When using PBT tools such as QuickCheck there

is some expected setup that needs to be done before

deﬁning the properties. One such setup is shrinking.

The main objective of shrinking is to produce minimal

failing test cases, which facilitate the debugging

of the program. Another required setup is the random

values generator which can be conﬁgured depending

on the scenario. More importantly, the generator and

shrinker need to be designed to work together when

testing MRs. Otherwise, if we use the default test

case generator and shrinker, the checkers might miss

some test cases or generate invalid tests. The proposed

method ensures that this does not happen.

The steps needed to systematically test MRs

using PBT tools are as follows:

(1) Specify an MR property

(2) Customize the test case generator

(3) Customize the test case shrinker

(4) Run the checker

Our main contributions are in Steps 2 and 3. Let us

now discuss these steps and our solution in more de-

tail.

Step 1: Specify an MR Property. Specifying

suitable MRs is key in MT. Although identifying MRs

is not a difﬁcult task, this is typically a manual proce-

dure, see (Mayer and Guderlei, 2006), and we don’t

intend to automate it within our approach. However,

there have been some approaches intending to auto-

mate this step (Chen et al., 2016), and in our future

work, we consider combining our method with these

approaches.

Step 2: Customize the Test Case Generator.

The ﬁrst thing that all PBT libraries do is to gener-


ate random inputs for the functions under test. In the

PBT literature, this is known as generation. For every

type, there is an associated random test generator.

For example, to generate a list of values, one has

to use the generator together with two parameters.

The ﬁrst parameter is the number of elements in the

list. The second parameter is the size which depends

on the type of values being generated and the con-

text. For example, the size can be the maximum value

of Integer type, the maximum length of a list, or the

depth of a binary search tree.
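A minimal sketch of how these two parameters appear in QuickCheck: the number of elements is passed explicitly to vectorOf, while the size is read through sized and can be overridden with resize. The generator names genKeys and genKeyList are assumptions made for this illustration.

```haskell
import Test.QuickCheck

-- Generate exactly n keys, each between 0 and the current size parameter.
genKeys :: Int -> Gen [Int]
genKeys n = sized $ \size -> vectorOf n (choose (0, size))

-- Generate a list whose length is itself bounded by the size parameter.
genKeyList :: Gen [Int]
genKeyList = sized $ \size -> do
  n <- choose (0, size)
  genKeys n
```

For example, generate (resize 10 (genKeys 5)) produces a list of five keys, each in the range 0 to 10.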

For MT, the generator has to be customized to pro-

duce valid test cases. In Section 5 we will present an

example of BST, where the values should not be gen-

erated uniformly.

Step 3: Customize the Test Case Shrinker. Almost

all PBT libraries and frameworks have a mechanism

to reduce the set of generated test cases that

fail a property to the minimum number of failing test

cases necessary for the debugging process, as

an unnecessarily large number of failed test cases

(where many cases might refer to the same error) will

make the debugging process more complicated and time-consuming.

This mechanism is known in the literature

as shrinking.

Step 4: Run the Checker. In this step, we pass

the MR property, which was speciﬁed in Step 1 to the

checker. If the function under test does not satisfy the

MR, the PBT library will report the failing test case

that violated the MR property.

4.2 Advantages of the Approach

As mentioned in Section 4.1, we cannot use QuickCheck’s

default shrinker for testing MR properties.

Thus, we replace the QuickCheck generator with our

own generator. QuickCheck’s default generator

is based on (Claessen and Pałka, 2013), which we

found to require some effort to use in MT. On this

basis, we design a modified generator and instruct

QuickCheck to use it instead of its default one. The

advantage of the proposed approach is that MRs

can be tested together with other properties under the

same testing framework. Thus, the same shrinker and

verifier can be used for both MT and general

properties.

Our version of shrinker has the following features:

• The values are enumerated by depth instead of

size and for this reason, the number of values

tends to grow quickly as our shrinker explores fur-

ther test cases.

• The modiﬁed shrinker exploits laziness (Hudak

et al., 2007). That is, it uses partially defined test

values. If a property returns a Boolean result for a

partially deﬁned value, the shrinker does not enu-

merate more versions of this value. The beneﬁt

is that the checker will stop as soon as it encounters

the ﬁrst failing test which improves the speed of

the checker.

One of the differences between QuickCheck’s

shrinker and our shrinker is that our method of shrink-

ing is integrated into generation. It is worth noting

that almost all PBT tools in many different program-

ming languages use a similar shrinking methodology

as the one used in QuickCheck. The main problem

with this approach is that shrinking is deﬁned based

on datatypes. This constrains the ways in which values

are shrunk. That is, there is only one way to define

shrinking for the same data type, without taking

into consideration the way it was generated. On the

other hand, our shrinker is composed with the generator,

and the generator controls how the values it produces

shrink.

Our approach to shrinking has many beneﬁts. For

example, shrinking happens even if there is no deﬁned

shrinker on the datatype. This allows the shrinker

to share the same invariants as the generator and, at

the same time, reduces the effort needed to write a

separate shrinker for each datatype involved in the

test. Another benefit of our shrinker is that the reported

failure is more revealing than with a shrinker defined

per datatype. For instance, in QuickCheck, errors are

sometimes shrunk to different errors, which is undesirable

since the error we expect is being reduced

to another error we do not care about. To mitigate

this problem, one has to duplicate the constraint logic

both in the generator and in the shrinker. In our implemented

shrinker, the main idea is that we shrink the

outputs by shrinking the inputs. This helps in finding

more possible shrinks based on that representation.

Our shrinker searches the range between the smallest

value of some type and the current failing value for a

smaller value that still fails the test. It repeats this

process until no smaller failing value can be found. In

this case, it reports the last failing value as the smallest

test case that fails the property, i.e., the boundary value.

For example, suppose that we are testing whether the value of

a variable x of type Integer is less than 77 (x < 77).

Suppose that the first random value generated (by

the generator) is 90, which causes the test to fail.

Then, the shrinker generates new random values,

in random steps, ranging from zero to 89. Suppose

the new failing value is 89. The shrinker then

repeats the same process for the values between

zero and 88. The shrinking repeats until the failing

value is 78, after which the smallest failing test value,

77, is found and the shrinker stops.
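The idea of tying shrinking to generation can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the names Shrinkable, towardsZero, shrinkableInt and minimize are hypothetical, and the candidate values are chosen deterministically by repeated halving (assumed here for reproducibility) rather than in random steps.

```haskell
-- A generated value together with a lazily built tree of smaller candidates.
data Shrinkable a = Shrinkable a [Shrinkable a]

-- Mapping a function over the value also maps it over all candidates, so
-- anything built from a shrinkable input can be shrunk without a shrinker
-- of its own.
instance Functor Shrinkable where
  fmap f (Shrinkable x ss) = Shrinkable (f x) (map (fmap f) ss)

-- Candidates closer to zero than n, from the largest jump to the smallest.
towardsZero :: Int -> [Int]
towardsZero n = [n - i | i <- takeWhile (/= 0) (iterate (`quot` 2) n)]

shrinkableInt :: Int -> Shrinkable Int
shrinkableInt n = Shrinkable n (map shrinkableInt (towardsZero n))

-- Greedy search: starting from a failing value, move to the first candidate
-- that still fails the property; stop when every candidate passes.
minimize :: (a -> Bool) -> Shrinkable a -> a
minimize prop (Shrinkable x ss) =
  case [s | s@(Shrinkable y _) <- ss, not (prop y)] of
    (s : _) -> minimize prop s
    []      -> x
```

With the property x < 77 and the failing value 90, minimize (< 77) (shrinkableInt 90) visits 79 and then 77, and returns 77 because every candidate below 77 passes. Because laziness is used, only the candidates that are actually inspected are constructed.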

The way we ensure the validity of the generated


(then shrunk) test cases is by adding a precondition.

The main objective of a precondition is to filter out

invalid test cases using the

valid function that we have to implement. The valid

function checks each generated test case before the

property is evaluated. The generator will still generate random

test cases, but invalid ones will not be executed. The valid

function depends on the context. For instance, in the

context of Binary Search Trees, the valid function

checks that all the keys in the left subtree are less than

the key at the root node and all the keys in the right

subtree are greater than the key at the root node.
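A sketch of such a precondition in QuickCheck, using the ==> combinator, is shown below. The BST type, the insert function, the deliberately unconstrained generator genAnyTree and the property name are assumptions made for this illustration; the paper's own code may differ.

```haskell
import Test.QuickCheck

data BST k v = Leaf | Branch (BST k v) k v (BST k v)
  deriving Show

insert :: Ord k => k -> v -> BST k v -> BST k v
insert k v Leaf = Branch Leaf k v Leaf
insert k v (Branch l k' v' r)
  | k < k'    = Branch (insert k v l) k' v' r
  | k > k'    = Branch l k' v' (insert k v r)
  | otherwise = Branch l k v r

keys :: BST k v -> [k]
keys Leaf = []
keys (Branch l k _ r) = keys l ++ [k] ++ keys r

-- The BST invariant: keys on the left are smaller, keys on the right larger.
valid :: Ord k => BST k v -> Bool
valid Leaf = True
valid (Branch l k _ r) =
  valid l && valid r && all (< k) (keys l) && all (> k) (keys r)

-- Structurally random trees; many of them violate the invariant.
genAnyTree :: Gen (BST Int Int)
genAnyTree = sized go
  where
    go :: Int -> Gen (BST Int Int)
    go 0 = pure Leaf
    go n = oneof [pure Leaf, Branch <$> sub <*> arbitrary <*> arbitrary <*> sub]
      where sub = go (n `div` 2)

-- Invalid trees are still generated, but the precondition discards them
-- before the actual check is executed.
prop_insertKeepsValid :: Int -> Int -> Property
prop_insertKeepsValid k v =
  forAll genAnyTree $ \t -> valid t ==> valid (insert k v t)
```

Filtering in this way wastes the discarded test cases, which is one reason to prefer a generator that only produces valid trees, as discussed in Section 5.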

5 EXAMPLE: BINARY SEARCH

TREE

As a running example to explain the proposed method

of applying the PBT tool QuickCheck for MT, we

consider the operations of inserting into and deleting

from a binary search tree (BST). This example is not

trivial but is simple enough to explain the proposed

method. Another reason for choosing BST is that the

same approach can be used for testing more elaborate

kinds of trees such as Merkle trees (Merkle, 1987), see

also our discussion of future work in Section 6. To evaluate

the proposed approach, we also introduce faulty

variants of the operations under test, insert and delete.

A BST is a type of data structure for storing values

such as integers in an organized way. Each internal

node of a BST stores a key greater than all the keys in

the node’s left subtree and less than those in its right

subtree. BSTs are usually used for fast lookup, insertion

and deletion of items. Testing the insert function,

which inserts a key and value into a binary search tree,

is difficult. Using the MT approach, we can change the

input using a new key and value and then observe the

relationship to the original call to the insert function. MT

allows a larger number of properties to be tested. Using

the example of trees, we can use insert with delete

and test the output. Conversely, we can use delete with

insert and test the output. This is true for any combination

of the operations under test.

One possible mistake when testing properties of

the insertion and deletion of BST, is that the test code

is the same as the implementation. Therefore, if there

is a bug in the implementation, it will also be in the

tests which renders the tests useless. One solution to

this problem is to get an appropriate metamorphic re-

lation to test the intended behaviour. This way we can

verify the correctness of the implementation without

expecting a concrete output.

Figure 2 shows the MT of inserting keys and val-

ues into a BST. Starting with the Tree at the top, we

insert some key k1 and some value v1 to get some

modified tree. Then, another key k2 and value v2 are

inserted into the modified tree to get the output tree

(whatever it is). We repeat the same operation on the

original tree but we change the inputs to insert. That

is, we insert k2 and v2 followed by inserting k1 and v1

to get the output tree. The metamorphic relation asserts

that the two output trees should be the same; otherwise,

insert is faulty. The notion of equality between

two trees depends on the operations under test. For

insert, we can just assert that if the keys and values

in both trees are the same, the trees are semantically

equivalent.

Figure 2: MR property: Tree 1 and Tree 2 are semantically

equivalent.
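The relation of Figure 2 can be sketched as an executable check. The function names toList, equiv and mrInsertInsert are illustrative. The sketch also assumes that the relation is only required for distinct keys: when k1 equals k2, the later insertion overwrites the earlier one and the order matters.

```haskell
data BST k v = Leaf | Branch (BST k v) k v (BST k v)
  deriving Show

insert :: Ord k => k -> v -> BST k v -> BST k v
insert k v Leaf = Branch Leaf k v Leaf
insert k v (Branch l k' v' r)
  | k < k'    = Branch (insert k v l) k' v' r
  | k > k'    = Branch l k' v' (insert k v r)
  | otherwise = Branch l k v r

-- In-order traversal: the key/value pairs in ascending key order.
toList :: BST k v -> [(k, v)]
toList Leaf = []
toList (Branch l k v r) = toList l ++ [(k, v)] ++ toList r

-- Semantic equivalence: same keys and values, regardless of tree shape.
equiv :: (Eq k, Eq v) => BST k v -> BST k v -> Bool
equiv t1 t2 = toList t1 == toList t2

-- Inserting (k1, v1) then (k2, v2) must be equivalent to inserting
-- (k2, v2) then (k1, v1), provided the two keys differ.
mrInsertInsert :: (Ord k, Eq v) => k -> v -> k -> v -> BST k v -> Bool
mrInsertInsert k1 v1 k2 v2 t =
  k1 == k2 || insert k2 v2 (insert k1 v1 t) `equiv` insert k1 v1 (insert k2 v2 t)
```

Note that the two output trees may have different shapes (the first inserted key ends up closer to the root), which is why the comparison goes through toList rather than structural equality.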

To test the effectiveness of the proposed method,

we intentionally introduce faulty variants of insert

and delete and test them in a similar way. The faulty

variants are:

Fault 1. insert removes the original tree and re-

turns just the newly inserted value in a single

node.

Fault 2. delete does not build the tree above the

key being deleted. That is, it only returns the rest

of the tree instead.

Starting with the declaration of the data type, a BST

for some key k and value v, is either a Leaf or a

Branch containing a left subtree, key k, value v and the

right subtree, respectively.
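The declaration described above, together with one possible reference implementation of insert and delete and the two seeded faults, might look as follows. This is a sketch under assumptions: the helper merge and the names insertFault1 and deleteFault2 are hypothetical, and the paper's actual code may be organised differently.

```haskell
data BST k v = Leaf | Branch (BST k v) k v (BST k v)
  deriving Show

insert :: Ord k => k -> v -> BST k v -> BST k v
insert k v Leaf = Branch Leaf k v Leaf
insert k v (Branch l k' v' r)
  | k < k'    = Branch (insert k v l) k' v' r
  | k > k'    = Branch l k' v' (insert k v r)
  | otherwise = Branch l k v r

delete :: Ord k => k -> BST k v -> BST k v
delete _ Leaf = Leaf
delete k (Branch l k' v' r)
  | k < k'    = Branch (delete k l) k' v' r
  | k > k'    = Branch l k' v' (delete k r)
  | otherwise = merge l r

-- Join two trees, assuming every key in the first is smaller than
-- every key in the second.
merge :: BST k v -> BST k v -> BST k v
merge Leaf r = r
merge l Leaf = l
merge (Branch l k v m) r = Branch l k v (merge m r)

toList :: BST k v -> [(k, v)]
toList Leaf = []
toList (Branch l k v r) = toList l ++ [(k, v)] ++ toList r

-- Fault 1: the original tree is discarded; only the new node is returned.
insertFault1 :: k -> v -> BST k v -> BST k v
insertFault1 k v _ = Branch Leaf k v Leaf

-- Fault 2: the tree above the deleted key is not rebuilt; only the
-- remainder below it is returned.
deleteFault2 :: Ord k => k -> BST k v -> BST k v
deleteFault2 _ Leaf = Leaf
deleteFault2 k (Branch l k' _ r)
  | k < k'    = deleteFault2 k l
  | k > k'    = deleteFault2 k r
  | otherwise = merge l r
```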

Step 1: Specify an MR Property. This is the

property that we wish the PBT tool to check. Before

we can choose the MR, we need to pick the functions

that we want to test. For this example we choose in-

sert and delete. insert takes key k, value v and the tree

and returns the modiﬁed tree after the insertion. The

delete function takes key k and the tree and returns the

modiﬁed tree after the deletion.

Since we want to test two distinct functions (in-

sert and delete), there are, at least, two MRs that we

identify. The MRs that we want to check are the fol-

lowing:

• MR 1: Inserting into the tree after modifying it


with a delete operation should be the same as inserting

first and then deleting

• MR 2: Deleting from the tree after modifying it

with an insert operation should be the same as deleting

first and then inserting

Table 1 shows the precise Metamorphic Relations

(properties) for inserting and deleting in the context

of a binary search tree. The first set of operations shows the

insertion of key k and value v into the tree modified by the

deletion of key k′ from the original tree. The MR asserts

that this should be equivalent to deleting k′ from the

tree modified by inserting key k and value v into the

same tree. The second set of operations shows the

deletion of key k from the tree t modified by the insertion

of key k′ and value v′. Again, the MR asserts that this

should be equivalent to doing the deletion of k first and

then inserting k′ and v′.
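The two relations can be sketched as Boolean functions, writing k' for the second key. Two assumptions are made that are not stated explicitly in the text: the relations are only required when the two keys differ, and equivalence of trees is decided by comparing their key/value lists. The names mr1, mr2, merge and toList are illustrative.

```haskell
data BST k v = Leaf | Branch (BST k v) k v (BST k v)
  deriving Show

insert :: Ord k => k -> v -> BST k v -> BST k v
insert k v Leaf = Branch Leaf k v Leaf
insert k v (Branch l k' v' r)
  | k < k'    = Branch (insert k v l) k' v' r
  | k > k'    = Branch l k' v' (insert k v r)
  | otherwise = Branch l k v r

delete :: Ord k => k -> BST k v -> BST k v
delete _ Leaf = Leaf
delete k (Branch l k' v' r)
  | k < k'    = Branch (delete k l) k' v' r
  | k > k'    = Branch l k' v' (delete k r)
  | otherwise = merge l r

-- Join two trees whose key ranges do not overlap (first below second).
merge :: BST k v -> BST k v -> BST k v
merge Leaf r = r
merge l Leaf = l
merge (Branch l k v m) r = Branch l k v (merge m r)

toList :: BST k v -> [(k, v)]
toList Leaf = []
toList (Branch l k v r) = toList l ++ [(k, v)] ++ toList r

-- MR 1: insert k v (delete k' t) is equivalent to delete k' (insert k v t).
mr1 :: (Ord k, Eq v) => k -> v -> k -> BST k v -> Bool
mr1 k v k' t =
  k == k' || toList (insert k v (delete k' t)) == toList (delete k' (insert k v t))

-- MR 2: delete k (insert k' v' t) is equivalent to insert k' v' (delete k t).
mr2 :: (Ord k, Eq v) => k -> k -> v -> BST k v -> Bool
mr2 k k' v' t =
  k == k' || toList (delete k (insert k' v' t)) == toList (insert k' v' (delete k t))
```

Both functions can be passed to a PBT checker once a generator for valid trees is available; neither needs to know what the resulting tree should look like.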

This demonstrates how effective MT can be for

generating properties. That is, if the number of operations

is n, the number of derived properties is O(n^2);

see also (Liu et al., 2013).

Step 2: Customize the Test Case Generator. We

use the generator defined in Section 4, which generates

random trees by creating a random list of keys and

a random list of values and inserting them into the

empty tree using the insert function. We also have to define

a valid function, which ensures the following:

• All the keys in the left subtree are less than the key

at the root node

• All the keys in the right subtree are greater than the

key at the root node
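A generator of this kind could be sketched in QuickCheck as follows. The names genTree and prop_genValid and the choice of Int keys and values are assumptions for the illustration; because every tree is built through insert, the generated trees satisfy valid by construction as long as insert itself preserves the invariant.

```haskell
import Test.QuickCheck

data BST k v = Leaf | Branch (BST k v) k v (BST k v)
  deriving Show

insert :: Ord k => k -> v -> BST k v -> BST k v
insert k v Leaf = Branch Leaf k v Leaf
insert k v (Branch l k' v' r)
  | k < k'    = Branch (insert k v l) k' v' r
  | k > k'    = Branch l k' v' (insert k v r)
  | otherwise = Branch l k v r

keys :: BST k v -> [k]
keys Leaf = []
keys (Branch l k _ r) = keys l ++ [k] ++ keys r

valid :: Ord k => BST k v -> Bool
valid Leaf = True
valid (Branch l k _ r) =
  valid l && valid r && all (< k) (keys l) && all (> k) (keys r)

-- Build a random tree from a random list of keys and a random list of
-- values of the same length, inserted one by one into the empty tree.
genTree :: Gen (BST Int Int)
genTree = do
  ks <- listOf arbitrary
  vs <- vectorOf (length ks) arbitrary
  pure (foldr (uncurry insert) Leaf (zip ks vs))

-- Validity check for the generator itself: every generated tree is valid.
prop_genValid :: Property
prop_genValid = forAll genTree valid
```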

Step 3: Customize the Test Case Shrinker. Using

the default shrink function, the shrinks might include

invalid trees. The library may shrink the test case

before reporting it, or it may produce a valid tree

with an invalid shrinking. Therefore, we must add the

precondition discussed in Section 4 to ensure that only valid trees

participate in the shrinking process. This precondition

holds for any randomly generated test. The precondition

is just the valid function defined in Step 2.
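The problem and the fix can be sketched as below. shrinkNaive is a hypothetical structural shrinker written for this illustration (it stands in for a default, datatype-driven shrinker), and shrinkValid applies the valid precondition to its results; only shrink for Int comes from QuickCheck.

```haskell
import Test.QuickCheck (shrink)

data BST k v = Leaf | Branch (BST k v) k v (BST k v)
  deriving Show

keys :: BST k v -> [k]
keys Leaf = []
keys (Branch l k _ r) = keys l ++ [k] ++ keys r

valid :: Ord k => BST k v -> Bool
valid Leaf = True
valid (Branch l k _ r) =
  valid l && valid r && all (< k) (keys l) && all (> k) (keys r)

-- Structural shrinking: replace the tree by a subtree, shrink a subtree,
-- or shrink the key in place. Shrinking a key in place can break the
-- ordering invariant.
shrinkNaive :: BST Int Int -> [BST Int Int]
shrinkNaive Leaf = []
shrinkNaive (Branch l k v r) =
  [l, r]
    ++ [Branch l' k v r | l' <- shrinkNaive l]
    ++ [Branch l k v r' | r' <- shrinkNaive r]
    ++ [Branch l k' v r | k' <- shrink k]

-- Only valid trees take part in the shrinking process.
shrinkValid :: BST Int Int -> [BST Int Int]
shrinkValid = filter valid . shrinkNaive
```

Given a tree generator gen, such a shrinker can be supplied to QuickCheck explicitly with forAllShrink gen shrinkValid, so that every reported counterexample is itself a valid tree.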

Step 4: Run the Checker. The checker is just a function that takes any property and returns a Boolean value. We pass the MRs to the checker function, and the library then generates many test cases. The number of generated test cases can be set when configuring the checker and can be increased to raise the assurance of the test. For the correct variants of insert and delete, the PBT library reports 100 passing tests. For the faulty variants, running 100 test cases reports failures, and the library produces minimal failing examples for both of the introduced faults. However, one interesting observation is that fault 1 is missed by the checker when we do not check both MRs at the same time. Therefore, it is recommended to include as many MRs as necessary in a single test to specify the properties of the function under test.
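A checker of this shape, with a configurable number of test cases and both MRs combined in a single test, might be sketched as below. The names runChecker and prop_allMRs, the BST definitions, and the distinct-key precondition are assumptions for illustration; `quickCheckWithResult`, `stdArgs`, `maxSuccess`, `isSuccess`, `counterexample` and `.&&.` are QuickCheck functions. The faulty variants discussed in this paper are not reproduced here.

```haskell
import Test.QuickCheck

-- Hypothetical BST representation (an assumption, not the paper's code).
data BST k v = Leaf | Branch (BST k v) k v (BST k v)
  deriving (Show)

insert :: Ord k => k -> v -> BST k v -> BST k v
insert k v Leaf = Branch Leaf k v Leaf
insert k v (Branch l k' v' r)
  | k < k'    = Branch (insert k v l) k' v' r
  | k > k'    = Branch l k' v' (insert k v r)
  | otherwise = Branch l k v r

delete :: Ord k => k -> BST k v -> BST k v
delete _ Leaf = Leaf
delete k (Branch l k' v' r)
  | k < k'    = Branch (delete k l) k' v' r
  | k > k'    = Branch l k' v' (delete k r)
  | otherwise = glue l r

-- Glue two trees where every key on the left is smaller than every key on the right.
glue :: BST k v -> BST k v -> BST k v
glue Leaf r = r
glue (Branch l k v m) r = Branch l k v (glue m r)

-- In-order contents, used to compare trees up to shape.
toList :: BST k v -> [(k, v)]
toList Leaf = []
toList (Branch l k v r) = toList l ++ [(k, v)] ++ toList r

genTree :: Gen (BST Int Int)
genTree = do
  ks <- arbitrary
  vs <- arbitrary
  pure (foldr (uncurry insert) Leaf (zip ks vs))

-- Both relations of Table 1 in one property, so a single run checks them together.
-- The labels identify which relation failed in a reported counterexample.
prop_allMRs :: Int -> Int -> Int -> Property
prop_allMRs k v k' =
  k /= k' ==> forAll genTree $ \t ->
    counterexample "MR 1"
      (toList (insert k v (delete k' t)) === toList (delete k' (insert k v t)))
      .&&. counterexample "MR 2"
        (toList (delete k (insert k' v t)) === toList (insert k' v (delete k t)))

-- The checker: run a property on n random test cases and report success as a Boolean.
runChecker :: Testable prop => Int -> prop -> IO Bool
runChecker n prop =
  isSuccess <$> quickCheckWithResult stdArgs { maxSuccess = n } prop
```

For example, `runChecker 1000 prop_allMRs` raises the number of test cases from the default of 100 to 1000. Combining the relations in one property means that a fault which only one of the MRs can reveal still causes the single test to fail.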

One misconception about MT is that any property can be considered an MR, see (Chen et al., 2018). It is true that every MR is a property, but the converse does not hold. Therefore, when using PBT tools to test MRs, we almost always use at least two operations in a single metamorphic test. More precisely, when using PBT tools to test metamorphic relations, we should either change the input to the same function, as shown in Figure 2, or use two distinct operations, as shown in Table 1.

6 CONCLUSIONS

In this paper, we presented a systematic method for using PBT tools to automate the test generation and verification of metamorphic relations. Many existing efforts for automating MT are domain-specific, i.e., the automation of the MT steps is tailored to work only for specific application domains, such as web services, and specific programming languages. The work presented in this paper is more general and can be used in many different scenarios where MT is needed.

Its advantage is in using authenticated data structures

(ADS) to solve the issue.

PBT tools are generally used for testing universal properties other than MRs, such as postconditions, inductive properties, and model-based properties. In this paper, we have shown a method to create a specialized test-case generator and test-case shrinker to automate some of the MT steps. We showed that the default shrinker is not ideal for testing some kinds of MRs, as it is difficult to compose previously defined MRs to create new MRs. In addition, the default shrinker may report confusing failure cases, since it is based on defining shrinking on datatypes, which forces the user to add duplicated code. However, this workaround is not needed with our shrinker, since it does not have to be defined on the datatypes and there is no need to encode the invariants into the shrinkers, which requires more effort and could be difficult in more complicated scenarios. We have implemented our method using one particular PBT tool, QuickCheck. However, our method is general and can be implemented using any other PBT tool.

We presented our method using the binary search tree example. The two operations we selected were insert and delete, and we introduced faulty versions of these two operations. We showed that it is recommended to use as many MRs as necessary to specify the operations under test; otherwise, the test might miss some subtle faults.

Table 1: Some MR properties for a BST insert and delete.

op 1     op 2     Metamorphic property
insert   delete   insert k v (delete k' t) = delete k' (insert k v t)
delete   insert   delete k (insert k' v' t) = insert k' v' (delete k t)

For future work, we plan to combine the proposed approach with our earlier work presented in (Alzahrani et al., 2017), where we used PBT tools to test models generated by formal methods tools such as TLA+ (Lamport, 1994).

REFERENCES

Alzahrani, N., Spichkova, M., and Blech, J. O. (2017). From temporal models to property-based testing. In Evaluation of Novel Approaches to Software Engineering, pages 241–246. SciTePress.

Appel, A. W., Beringer, L., Chlipala, A., Pierce, B. C., Shao, Z., Weirich, S., and Zdancewic, S. (2017). Position paper: the science of deep specification. Philos. Trans. R. Soc. A., 375(2104).

Brun, M. and Traytel, D. (2019). Generic authenticated data structures, formally. In Interactive Theorem Proving.

Chan, F., Chen, T., Cheung, S. C., Lau, M., and Yiu, S. (1998). Application of metamorphic testing in numerical analysis. In Int. Conf. on Software Engineering.

Chen, T. Y. (2015). Metamorphic testing: A simple method for alleviating the test oracle problem. In Automation of Software Test, pages 53–54. IEEE.

Chen, T. Y., Kuo, F.-C., Liu, H., Poon, P.-L., Towey, D., Tse, T. H., and Zhou, Z. Q. (2018). Metamorphic Testing: A Review of Challenges and Opportunities. ACM Computing Surveys, 51(1):1–27.

Chen, T. Y., Kuo, F.-C., Merkel, R. G., and Tse, T. (2010). Adaptive random testing: The art of test case diversity. Journal of Systems and Software, 83(1):60–66.

Chen, T. Y., Kuo, F.-C., Towey, D., and Zhou, Z. Q. (2015). A revisit of three studies related to random testing. Science China Information Sciences, 58(5):1–9.

Chen, T. Y., Poon, P.-L., and Xie, X. (2016). Metric: Metamorphic relation identification based on the category-choice framework. Journal of Systems and Software, 116:177–190.

Claessen, K. and Hughes, J. (2000). QuickCheck: A lightweight tool for random testing of Haskell programs. In Functional Programming, pages 268–279.

Claessen, K. and Pałka, M. H. (2013). Splittable pseudorandom number generators using cryptographic hashing. ACM SIGPLAN Notices, 48(12):47–58.

Gotlieb, A. and Botella, B. (2003). Automated metamorphic testing. In Computer Software and Applications Conference, pages 34–40. IEEE.

Hudak, P., Hughes, J., Peyton Jones, S., and Wadler, P. (2007). A history of Haskell: being lazy with class. In History of programming languages, pages 12–1.

Hughes, J., Pierce, B. C., Arts, T., and Norell, U. (2016). Mysteries of Dropbox: property-based testing of a distributed synchronization service. In Software Testing, Verification and Validation, pages 135–145. IEEE.

Jackson, D. (2012). Software Abstractions: logic, language, and analysis. MIT Press.

Lamport, L. (1994). The temporal logic of actions. ACM Tran. on Programming Languages and Systems, 16(3):872–923.

Lamport, L. (2002). Specifying systems, volume 388. Addison-Wesley Boston.

Liu, H., Kuo, F.-C., Towey, D., and Chen, T. Y. (2013). How effectively does metamorphic testing alleviate the oracle problem? IEEE Transactions on Software Engineering, 40(1):4–22.

Liu, H., Liu, X., and Chen, T. Y. (2012). A new method for constructing metamorphic relations. In Quality Software, pages 59–68. IEEE.

Mayer, J. and Guderlei, R. (2006). An empirical study on the selection of good metamorphic relations. In Computer Software and Applications Conference, volume 1, pages 475–484. IEEE.

Merkle, R. C. (1987). A digital signature based on a conventional encryption function. In Theory and application of cryptographic techniques, pages 369–378. Springer.

Miller, A., Hicks, M., Katz, J., and Shi, E. (2014). Authenticated data structures, generically. ACM SIGPLAN Notices, 49(1):411–423.

Segura, S., Towey, D., Zhou, Z. Q., and Chen, T. Y. (2020). Metamorphic testing: Testing the untestable. IEEE Software, 37(3):46–53.

Tao, Q., Wu, W., Zhao, C., and Shen, W. (2010). An automatic testing approach for compiler based on metamorphic testing technique. In Asia Pacific Software Engineering Conference, pages 270–279. IEEE.

Weyuker, E. J. (1982). On testing non-testable programs. The Computer Journal, 25(4):465–470.

Zhu, H. (2015). Jfuzz: A tool for automated java unit testing based on data mutation and metamorphic testing methods. In Trustworthy Systems and Their Applications, pages 8–15. IEEE.

ENASE 2022 - 17th International Conference on Evaluation of Novel Approaches to Software Engineering
