Model-based Performance Testing of Web Services using Probabilistic
Timed Automata
Fredrik Abbors, Tanwir Ahmad, Dragos Truscan and Ivan Porres
Department of Information Technologies,
Åbo Akademi University, Joukahaisenkatu 3-5 A, Turku, Finland
Keywords:
Performance Testing, Performance Monitoring, Load Generation, Probabilistic Timed Automata.
Abstract:
In this paper, we present an approach for performance testing of web services in which we use abstract models,
specified using Probabilistic Timed Automata, to describe how users interact with the system. The models are
used in the load generation process to generate load against the system. The abstract actions from the model
are sent in real-time to the system via an adapter. Different performance indicators are monitored during the
test session and reported at the end of the process. We exemplify the approach with an auction web service case study on
which we have run several experiments.
1 INTRODUCTION
Today, we see advancements in cloud computing and
more and more software applications being adapted
to a cloud environment. Applications deployed in the
cloud are delivered to users as a service, without the
need for the users to install anything. This means that most of the processing is done on the server side, which places a significant amount of stress on the back-end of the system. Performance characteristics such as
throughput, response times, and resource utilization
are crucial quality attributes of such applications and
systems.
The purpose of performance testing is to deter-
mine how well the system performs in terms of re-
sponsiveness, stability, and resource utilization under
a particular synthetic workload in a controlled en-
vironment. The synthetic workload (Ferrari, 1984) should mimic, as closely as possible, the workload expected (Shaw, 2000) once the system is in operational use; otherwise, it is not possible to draw any reliable conclusions from the test results.
Performance tests are typically implemented as
usage scenarios that are either manually scripted (e.g.,
using httperf or JMeter) or pre-recorded (e.g., using
Selenium (SeleniumHQ, 2012) in the case of web ap-
plications). The usage scenarios are then executed
concurrently against the system under test. A ma-
jor drawback with this approach is that the manually
coded scripts and pre-recorded scenarios seldom rep-
resent real-life traffic and that certain combinations of
user inputs may remain untested. Repeating the same
script over and over may lead to unrealistic results
because of caching and other operating system opti-
mization mechanisms. Performance testing is done
efficiently when it is executed in an iterative man-
ner and uses techniques that simulate real-life workloads as closely as possible (Menasce, 2002). This
means that load is incrementally increased until a cer-
tain threshold (saturation) is reached, beyond which
the performance of the system begins to degrade.
In this paper, we propose a model-based approach
to evaluate the performance of a system by incremen-
tally exercising different kinds of loads on the system.
The main contributions of this work are:
1. we use abstract models, specified as Probabilistic Timed Automata (PTA), to model the user profiles, including the actions or sequences of actions a user can send, the probabilistic distribution of the actions, and an individual think time for each action;
2. the load is generated in real-time from these models and sent to the system under test (SUT) via an adapter, which converts abstract actions into concrete interactions with the SUT and manages data dependencies between different actions.
The rest of the paper is structured as follows: In
Section 2 we give an overview of the work related
to our approach. Section 3 presents our model-based
testing process, while Section 4 presents an auction
web service case study and an experiment using our
approach. Finally, in Section 5, we present our con-
clusions and we discuss future work.
2 RELATED WORK
There is already a large body of work on workload
characterization and a more limited one on load gen-
eration from performance models. In the following,
we briefly enumerate several works that are closer to
our approach.
Barna et al. (Barna et al., 2011) present a model-
based testing approach to test the performance of a
transactional system. The authors make use of an it-
erative approach to find the workload stress vectors of
a system. An adaptive framework will then drive the
system along these stress vectors until a performance
stress goal is reached. They use a system model, rep-
resented as a two-layered queuing network, and they
use analytical techniques to find a workload mix that
will saturate a specific system resource. Their ap-
proach differs from ours in the sense that they use a
model of the system instead of testing against a real
implementation of a system.
Other related approaches can be found in (Shams
et al., 2006) and (Ruffo et al., 2004). In the former,
the authors have focused on generating valid traces
or a synthetic workload for inter-dependent requests
typically found in sessions when using web applica-
tions. They describe an application model that cap-
tures the dependencies for such systems by using Ex-
tended Finite State Machines (EFSMs). Combined
with a workload model that describes session inter-
arrival rates and parameter distributions, their tool
SWAT outputs valid session traces that are executed
using a modified version of httperf (Mosberger and
Jin, 1998). The main use of the tool is to perform a
sensitivity analysis on the system when different pa-
rameters in the workload are changed, e.g., session
length, distribution, think time, etc. In the latter, the
authors suggest a tool that generates representative
user behavior traces from a set of Customer Behav-
ior Model Graphs (CBMG). The CBMG are obtained
from execution logs of the system and they use a mod-
ified version of the httperf utility to generate the traf-
fic from their traces. The methods differ from our
approach in the sense that they both focus on trace generation and let other tools take care of generating
the load/traffic for the system, while we do on-the-fly
load generation from our models.
(Denaro et al., 2004) propose an approach for
early performance testing of distributed software
when the software is built using middleware compo-
nents technologies, such as J2EE or CORBA. Most
of the overall performance of such a system is deter-
mined by the use and configuration of the middleware
(e.g. databases). They also note that the coupling be-
tween the middleware and the application architecture
determines the actual performance. Based on the architectural design of an application, the authors derive application-specific performance tests that can be executed on the early-available middleware platform with which the application will be built. This ap-
proach differs from ours in that the authors mainly
target distributed systems and testing of the perfor-
mance of middleware components.
3 PERFORMANCE TESTING
PROCESS
Our performance testing process is depicted in Figure
1. In brief, we build a workload model of the system
by analyzing different sources of information, and
subsequently we generate load on-the-fly against
the system. During the process, different performance
indicators are measured and a test report is created at
the end.
Figure 1: Our performance testing process. Requirements and execution logs are used to build the workload models, which drive the load generation against the system under test; monitoring the system during load generation produces the test report.
In our work, we have used various Key Perfor-
mance Indicators (KPIs) to provide quantifiable mea-
surements for our performance goals. For instance,
we specify the target KPIs before the testing proce-
dure is started and later on we compare them against
the actual measured KPIs.
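To make this concrete, the sketch below shows one way target KPIs could be specified and checked against measured values at the end of a run; the KPI names and threshold values are purely illustrative and not taken from our tool.

```python
# Illustrative sketch only: target KPIs and a simple pass/fail comparison.
TARGET_KPIS = {
    "avg_response_time_s": 3.0,   # mean response time limit, in seconds
    "max_response_time_s": 6.0,   # peak response time limit, in seconds
    "error_rate": 0.01,           # at most 1% of requests may fail
}

def check_kpis(measured: dict) -> dict:
    """Return a pass/fail verdict per KPI: the measured value must not exceed its target."""
    return {name: measured.get(name, float("inf")) <= limit
            for name, limit in TARGET_KPIS.items()}

# Example usage with made-up measurements:
# check_kpis({"avg_response_time_s": 2.4, "max_response_time_s": 7.1, "error_rate": 0.002})
# -> {"avg_response_time_s": True, "max_response_time_s": False, "error_rate": True}
```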
3.1 Workload Characterization
The first step in our process is characterizing the
workload of the system. According to (Menasce and
Almeida, 2001), the workload of a system can be de-
fined as the set of all inputs the system receives from
the environment during any given period of time.
Traditionally, performance analysis starts first
with identifying key performance scenarios, based on
the idea that certain scenarios are more frequent than
others, or that certain scenarios impact the performance of the system more than others. A perfor-
mance scenario is a sequence of actions performed by
an identified group of users (Petriu and Shen, 2002).
WEBIST2013-9thInternationalConferenceonWebInformationSystemsandTechnologies
100
In order to build the workload model, we start by
reviewing and analyzing the requirements and the sys-
tem specifications, respectively. During this phase we
try to form an understanding of how the system is
used, what are the different types of users, and what
are the key performance scenarios that will impact
most on the performance of the system. A user type
is characterized by the distribution and the types of
actions it performs.
The main sources of information for work-
load characterization are: Service Level Agreements
(SLAs), system specifications, and standards.
By using these sources we identify the inputs
of the system with respect to types of transactions
(actions), transferred files, file sizes, arrival rates,
etc. following the generic guidelines discussed in
(Calzarossa et al., 2000). In addition, we extract in-
formation regarding the KPIs, such as the number of
concurrent users the system should support, expected
throughput, response times, expected resource utiliza-
tion demands etc. for different actions under a given
load. We would like to point out that this is a man-
ual step in the process. However, automating this step
could be achieved by analyzing log files of the system
and using various clustering algorithms for determin-
ing e.g., different user types, which is subject for fu-
ture work.
The following steps are used for analyzing the
workload:
1. Identify the actions that can be executed against
the system.
(a) Determine the required input data for each ac-
tion. For instance, the request type and the pa-
rameters.
(b) Identify dependencies between actions. For ex-
ample, a user cannot execute a logout action
before a login action.
2. Identify the most relevant user types, based for in-
stance on the amount of interactions with the sys-
tem.
3. Define the distribution of actions that are per-
formed by each user type.
4. Estimate an average think time per action.
With think time we refer to the time between two
consecutive actions. In our approach, the think time
for the same action can vary from one user to another,
or from one test scenario to another.
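As an illustration of what the outcome of these steps may look like, the sketch below records user types, action distributions, and average think times in a simple Python structure; the user types mirror the case study in Section 4, but all numbers are hypothetical.

```python
# Hypothetical workload characterization for an auction-like service.
# For each user type, the action probabilities sum to 1.0; think times are averages in seconds.
WORKLOAD = {
    "user_type_mix": {"aggressive_bidder": 0.45, "passive_bidder": 0.33, "non_bidder": 0.22},
    "user_types": {
        "aggressive_bidder": {
            "actions":      {"browse": 0.20, "search": 0.20, "get_auction": 0.20, "bid": 0.30, "exit": 0.10},
            "think_time_s": {"browse": 4,    "search": 3,    "get_auction": 2,    "bid": 5,    "exit": 0},
        },
        # ... similar entries for "passive_bidder" and "non_bidder"
    },
}
```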
3.2 Workload Models
The results of the workload characterization are ag-
gregated in a workload model based on Probabilistic
Timed Automata.
Figure 2: Example of a probabilistic timed automaton. Edges are labeled with a clock guard, an action, an optional probability, and a clock reset, e.g., X = t1 / action1() / p0 / X := 0.
We take the definition of a probabilistic timed automaton (PTA) as defined by (Kwiatkowska et al., 2006). A PTA P = (L, l, X, Σ, inv, prob) is a tuple consisting of a finite set L of locations with the initial location l ∈ L; a finite set X of clocks; a finite set Σ of actions; a function inv : L → CC(X) associating an invariant condition with each location, where CC(X) is the set of clock constraints over X; a finite set prob ⊆ L × CC(X) × Σ × Dist(2^X × L) of probabilistic transitions, such that, for each l ∈ L, there exists at least one (l, ·, ·, ·) ∈ prob; and a labeling function δ : L → 2^AP, where AP denotes a set of atomic propositions.
A probabilistic transition (l, g, p, a) ∈ prob is a quadruple containing (1) a source location l, (2) a
clock constraint g, called guard or invariant condi-
tion, (3) a probability p, and (4) an action. The prob-
ability indicates the chance of that transition being
taken. The action describes what action to take when
the transition is used, and the clock guard indicates how long
to wait before firing the transition. The behavior of a
PTA is similar to that of a timed automaton (Alur and
Dill, 1994): in any location, time can advance as long
as the invariant holds, and a probabilistic transition
can be taken if its guard is satisfied by the current val-
ues of the clocks. Every automaton has an end loca-
tion, depicted with a double circle, which will even-
tually be reached. It is possible to specify loops in
the automaton. We note that not all transitions have
both a guard and a probability. For simplicity, we do
not explicitly specify location invariants, but they im-
plicitly evaluate to true. One such workload model is
created for each identified user type.
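As a minimal sketch of how such a model can be represented in code (our own illustration in Python; the class and field names are not those of the MBPeT tool), a PTA can be stored as a set of locations connected by probabilistic transitions:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    source: int          # source location
    target: int          # target location
    think_time_s: float  # clock guard: fire once the clock X has reached this value
    action: str          # abstract action sent to the SUT, e.g. "search"
    probability: float   # probability of this transition among those leaving `source`

@dataclass
class PTA:
    initial: int
    final: int                     # end location, depicted with a double circle
    transitions: list[Transition]

    def outgoing(self, location: int) -> list[Transition]:
        return [t for t in self.transitions if t.source == location]

# A tiny example in the spirit of Figure 2 (all values are illustrative):
aggressive_bidder = PTA(initial=1, final=8, transitions=[
    Transition(1, 2, think_time_s=2.0, action="browse", probability=1.0),
    Transition(2, 3, think_time_s=3.0, action="search", probability=0.6),
    Transition(2, 8, think_time_s=1.0, action="exit",   probability=0.4),
    Transition(3, 8, think_time_s=5.0, action="bid",    probability=1.0),
])
```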
Model-basedPerformanceTestingofWebServicesusingProbabilisticTimedAutomata
101
Figure 3: PTA model for an aggressive-bidder user type.
3.3 Load Generation
The resulting workload models are used for generat-
ing load in real-time against the system under test, by
creating traces from the corresponding PTA. The user
types are selected based on their reciprocal distribu-
tion. The PTA of each user type will be executed con-
currently by selecting the corresponding actions and
sending them to the system. When executing the PTA of a given user, at each step an action is chosen based on the probability values in the automaton.
The load generation is based on a deterministic
choice with a probabilistic policy. This introduces a certain randomness into the test process, which can be useful for uncovering sequences of actions that may have a negative impact on the perfor-
mance. Such sequences would be difficult or maybe
impossible to discover if static test scripts are used,
where a fixed order of the actions is specified, and re-
peated over and over again. Every PTA has an exit
location which will eventually be reached. By modi-
fying the probability for the exit action, it is also pos-
sible to adjust the average length of the generated se-
quences.
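A sketch of how a single virtual user could execute such a PTA on the fly is given below, reusing the hypothetical PTA and Transition classes sketched in Section 3.2; this illustrates the principle and is not the MBPeT implementation.

```python
import random
import time

def run_virtual_user(model, send_action):
    """Walk the PTA from its initial to its exit location: wait the think time (clock guard),
    pick one outgoing transition according to the probabilities, and send its action to the SUT."""
    location = model.initial
    while location != model.final:
        candidates = model.outgoing(location)
        chosen = random.choices(candidates,
                                weights=[t.probability for t in candidates], k=1)[0]
        time.sleep(chosen.think_time_s)  # let the clock X reach the guard value
        send_action(chosen.action)       # the adapter turns this into a concrete request
        location = chosen.target         # conceptually, the clock X is reset to 0 here
```

Each concurrent virtual user would run its own instance of such a loop on one of the slave nodes.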
3.4 Performance Monitoring
During the load generation, we constantly monitor
target KPIs for the entire test duration. At the end,
we collect all the gathered data and compute descrip-
tive statistics, like the mean and peak response times
for different actions, number of concurrent users, the
amount of transferred data, the error rate, etc. All the
gathered information is presented in a test report. The
resource utilization of the system under test is also
monitored and reported. Besides computing different
kinds of statistical values from the raw data we have,
the test report also contains graphs such as how the
response time varied over time with the number of
concurrent users. The test report also shows the CPU,
disk, network and memory usage on the target system.
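For illustration, the descriptive statistics in the report could be computed from the raw samples roughly as sketched below; the sample format and field names are our own assumptions, not the MBPeT report code.

```python
from statistics import mean

def summarize(samples):
    """samples: list of (action, response_time_s, ok) tuples collected during the test run."""
    per_action = {}
    for action, rt, ok in samples:
        entry = per_action.setdefault(action, {"times": [], "errors": 0})
        entry["times"].append(rt)
        entry["errors"] += 0 if ok else 1
    return {
        action: {
            "mean_s": mean(entry["times"]),                       # mean response time
            "peak_s": max(entry["times"]),                        # peak response time
            "error_rate": entry["errors"] / len(entry["times"]),  # fraction of failed requests
        }
        for action, entry in per_action.items()
    }
```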
Tool support for load generation is provided via
the MBPeT tool (Abbors et al., 2012). Due to space
limitations we defer more details about the approach
and its tool support to (Ahmad et al., 2013).
4 CASE STUDY AND
EXPERIMENTS
In this section, we demonstrate our approach by us-
ing it to evaluate the performance of an auction web
service, generically called YAAS. The YAAS applica-
tion was developed as a stand-alone application and is
used for the evaluation of our approach. YAAS
has a RESTful (Richardson and Ruby, 2007) interface
based on the HTTP protocol and allows registered
users to create, change, search, browse, and bid on
auctions that other users have created. The applica-
tion maintains a database of the created auctions and
the bids that other users have placed on the auctioned
objects. The YAAS application is implemented us-
ing Python (Python, 2012) and the Django (Django,
2012) framework.
Test Architecture. The test architecture is shown in
Figure 4. The MBPeT tool has a scalable architec-
ture where a master node controls several slave nodes.
The SUT runs an instance of the YAAS application
on top of the Apache web server. All nodes (master,
slaves, and the server) feature an 8-core CPU, 16 GB
of memory, 1Gb Ethernet, 7200 rpm hard drive, and
Fedora 16 operating system. The nodes were con-
nected via a 1Gb Ethernet.
A populator script is used to generate input data
(i.e., populate the test databases) on both the client
and server side, before each test session. This ensures
WEBIST2013-9thInternationalConferenceonWebInformationSystemsandTechnologies
102
Figure 4: The test architecture. A master node controls several slave nodes over 1 Gb Ethernet; the server runs Apache with the YAAS application and its database, the populator, and monitoring tools (Dstat).
that the test data on either side is consistent and easy
to rebuild after each test session.
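For illustration, a populator along these lines could seed the server over HTTP and keep an identical local copy for the slave nodes; the /populate endpoint, field names, and address below are hypothetical and not the actual YAAS interface.

```python
# Hypothetical populator sketch: seed the server-side database and store the same
# data locally so that the load generator can refer to existing users and auctions.
import json
import urllib.request

BASE_URL = "http://sut.example.org:8000"  # assumed address of the YAAS instance

def populate(num_users=100, num_auctions=50):
    seed = {
        "users": [{"username": f"user{i}", "password": "secret"} for i in range(num_users)],
        "auctions": [{"title": f"item{i}", "min_price": 1.0} for i in range(num_auctions)],
    }
    request = urllib.request.Request(
        f"{BASE_URL}/populate", data=json.dumps(seed).encode(),
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(request)  # rebuild the server-side test database
    with open("local_testdata.json", "w") as f:
        json.dump(seed, f)           # identical data for the client side
```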
Workload Modeling. We analyzed the workload fol-
lowing the steps described in Section 3. Based on
this analysis, three user types were identified: aggres-
sive bidders, passive bidders, and non-bidders. From
this information, we constructed a PTA model for
each user type. Figure 3 shows the PTA for an aggres-
sive bidder.
Figure 3 shows that each action has a think time
parameter, modeled as a clock variable associated
with it, that specifies how much time should elapse
before firing a transition. This variable is denoted
with the symbol X and it is reset to 0 after the tran-
sition is fired. Upon firing the transition, the action
associated with that transition is sent to the SUT.
Adapter. An adapter is used to translate the abstract actions generated from the model into concrete HTTP requests to the SUT, adding the necessary HTTP parameters and encapsulation. All slaves run identi-
cal adapters. The models as such are system inde-
pendent, but an adapter module needs to be written for
every system that one chooses to interface with. Since
YAAS is based on the HTTP protocol, it will under-
stand the basic HTTP commands like GET, POST,
PUT, etc. Whenever a new action is selected from
the model, the corresponding HTTP request is created
and, when needed, the associated data is automati-
cally attached to the request from the local database.
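A minimal adapter in this spirit is sketched below; the URL paths follow a typical RESTful layout and are assumptions rather than the actual YAAS routes, and the requests library is assumed to be available on the slave nodes.

```python
# Hypothetical adapter sketch: map abstract actions from the model to HTTP requests.
import random
import requests  # third-party HTTP client, assumed installed on the slaves

BASE_URL = "http://sut.example.org:8000"

def execute(action, data):
    """Translate an abstract action into a concrete HTTP request and return the response.
    `data` holds the locally populated test data (users, auction ids, search keywords)."""
    if action == "browse":
        return requests.get(f"{BASE_URL}/auctions/")
    if action == "search":
        return requests.get(f"{BASE_URL}/auctions/",
                            params={"q": random.choice(data["keywords"])})
    if action == "get_auction":
        return requests.get(f"{BASE_URL}/auctions/{random.choice(data['auction_ids'])}/")
    if action == "bid":
        user = random.choice(data["users"])
        auction_id = random.choice(data["auction_ids"])
        return requests.post(f"{BASE_URL}/auctions/{auction_id}/bids/",
                             data={"price": 10.0},  # in practice derived from the local database
                             auth=(user["username"], user["password"]))
    raise ValueError(f"unknown abstract action: {action}")
```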
Experiments. In the case study, an experiment was
conducted to find out how the YAAS application per-
forms under load. As a rule of thumb for ensuring
accurate results, the experiment was run three times.
In the experiment, we set out to test how many
concurrent users the host node can support without
exceeding the specified target response time values.
Table 1 shows the average and max response time
limits (see column Target Response Time) that were
selected for each type of action. For instance, the av-
erage response time limit for action browse() was set
to 4.0 seconds, while the max response time was set to
8.0 seconds. If any of the set limits (average and max)
are breached during the test run, the tool will mark the
time of the breach and the number of concurrent users at that time (see Table 1, Time of breach). The length of the test run was 20 minutes (1200 seconds).
Figure 5: Average response times for get_auction and get_bids (bottom), search and browse (middle), and bid (top) when ramping up from 0 to 300 users.
Fig-
ure 5 shows how the response times of different ac-
tions increase over time for the aggressive user type
as the number of concurrent users is ramped up from 0 to 300. In this experiment the tool generated a total of 1504 unique test sequences from the models. Several of the unique test sequences were executed more than 100 times, and the test sequence length varied from 1 up to 50 actions.
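The breach detection described above can be sketched as follows; the bookkeeping is our own illustration and the actual MBPeT implementation may differ.

```python
# Hypothetical sketch: record, per action, the first time its average or maximum target
# response time is exceeded, together with the number of concurrent users at that moment.
def detect_breaches(samples, targets):
    """samples: iterable of (t_s, action, response_time_s, concurrent_users), ordered by time.
    targets: {action: (avg_limit_s, max_limit_s)}."""
    breaches, observed = {}, {}
    for t_s, action, rt, users in samples:
        avg_limit, max_limit = targets[action]
        times = observed.setdefault(action, [])
        times.append(rt)
        running_avg = sum(times) / len(times)
        if action not in breaches and (running_avg > avg_limit or rt > max_limit):
            breaches[action] = (t_s, users)  # time of breach and concurrent users at that time
    return breaches
```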
Table 1 also shows the time when a target response
time (average and/or max) value was exceeded and
the number of concurrent users at that time. For ex-
ample, the average response time for the search() ac-
tion was exceeded at 229 seconds into the test run by
the aggressive user type. The tool was then running 64 concurrent users. From this table we concluded
that the current configuration of the server can sup-
port a maximum of 64 concurrent users before ex-
ceeding the threshold value of 3 seconds set for action
search(). A closer inspection of the monitored values
of the server showed that the database was the bottle-
neck, due to the fact that an SQLite database was used and
the application locked the whole database for write
operations.
Additional experiments, including a comparison
of our approach against JMeter can be found in (Ah-
mad et al., 2013). The experiment showed that our tool has capabilities similar to JMeter, for instance when comparing the throughput (actions/sec) against the SUT.
5 CONCLUSIONS AND FUTURE
WORK
In this paper, we have presented a model-based
performance testing approach that uses probabilistic
Model-basedPerformanceTestingofWebServicesusingProbabilisticTimedAutomata
103
Table 1: Response time measurements for user actions when ramping up from 0 to 300 users. For each user type, the two values are the times at which the average and the maximum target response times were breached; the number of concurrent users at that moment is given in parentheses.

Actions                            | Target avg/max (s) | Non-Bidders (22 %)    | Passive Users (33 %)  | Aggressive Users (45 %) | Verdict
browse()                           | 4.0 / 8.0          | 279 (78) / 394 (110)  | 323 (90) / 394 (110)  | 279 (78) / 394 (110)    | Failed
search(string)                     | 3.0 / 6.0          | 279 (78) / 394 (110)  | 279 (78) / 394 (110)  | 229 (64) / 327 (92)     | Failed
get_auction(id)                    | 2.0 / 4.0          | 280 (79) / 325 (91)   | 279 (78) / 279 (78)   | 276 (77) / 325 (91)     | Failed
get_bids(id)                       | 3.0 / 6.0          | 279 (78) / 446 (130)  | 325 (91) / 394 (110)  | 327 (92) / 394 (110)    | Failed
bid(id, price, username, password) | 5.0 / 10.0         | n/a / n/a             | 327 (92) / 474 (132)  | 328 (92) / 468 (131)    | Failed
models to generate synthetic load in real-time. The
models are based on the Probabilistic Timed Au-
tomata, and include statistical information that de-
scribes the distribution between different actions and
corresponding think times. With the help of probabil-
ity values, we can make it so that a certain action is
more likely to be chosen over another action, when-
ever the virtual user encounters a choice in the PTA.
We believe that the PTA models are well suited for
performance testing and that their probabilistic aspect is well suited for describing dynamic user behavior, allowing us to include a certain level
of randomness in the load generation process. This is
important because we wanted the virtual users to be
able to mimic real user behavior as closely as possi-
ble, and minimize the effect of caches on the perfor-
mance evaluation.
The approach is supported by a set of tools, in-
cluding the MBPeT load generator. MBPeT has a
scalable distributed architecture which can be easily
deployed to cloud environments. The tool has a ramp-
ing feature which describes at what rate new users are
added to the system, and it also supports specifying a think time. When the test duration has ended
the MBPeT tool will gather measured data, process it
and create a test report.
In the future we will investigate whether parts of the model creation, which is currently done manually, can be automated, e.g., by automatically analyzing log data and using different clustering algorithms.
REFERENCES
Abbors, F., Ahmad, T., Truscan, D., and Porres, I. (2012).
MBPeT: A Model-Based Performance Testing Tool.
In 2012 Fourth International Conference on Advances in
System Testing and Validation Lifecycle.
Ahmad, T., Abbors, F., Truscan, D., and Porres, I. (2013).
Model-Based Performance Testing Using the MBPeT
Tool. Technical Report 1066, Turku Centre for Com-
puter Science (TUCS).
Alur, R. and Dill, D. L. (1994). A theory of timed automata.
Theor. Comput. Sci., 126(2):183–235.
Barna, C., Litoiu, M., and Ghanbari, H. (2011). Model-
based performance testing (NIER track). In Proceed-
ings of the 33rd International Conference on Software
Engineering, ICSE ’11, pages 872–875, New York,
NY, USA. ACM.
Calzarossa, M., Massari, L., and Tessera, D. (2000). Work-
load Characterization Issues and Methodologies. In
Performance Evaluation: Origins and Directions,
pages 459–481, London, UK, UK. Springer-Verlag.
Denaro, G., Polini, A., and Emmerich, W. (2004). Early
performance testing of distributed software applica-
tions. In Proceedings of the 4th international work-
shop on Software and performance, WOSP ’04, pages
94–103, New York, NY, USA. ACM.
Django (2012). Online at https://www.djangoproject.com/.
Ferrari, D. (1984). On the foundations of artificial work-
load design. In Proceedings of the 1984 ACM SIG-
METRICS conference on Measurement and modeling
of computer systems, SIGMETRICS ’84, pages 8–14,
New York, NY, USA. ACM.
Kwiatkowska, M., Norman, G., Parker, D., and Sproston,
J. (2006). Performance analysis of probabilistic timed
automata using digital clocks. Formal Methods in Sys-
tem Design, 29:33–78.
Menasce, D. A. (2002). Load Testing of Web Sites. IEEE
Internet Computing, 6:70–74.
Menasce, D. A. and Almeida, V. (2001). Capacity Plan-
ning for Web Services: metrics, models, and methods.
Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st
edition.
Mosberger, D. and Jin, T. (1998). httperf: A tool for measur-
ing web server performance. SIGMETRICS Perform.
Eval. Rev., 26(3):31–37.
Petriu, D. C. and Shen, H. (2002). Applying the UML
Performance Profile: Graph Grammar-based Deriva-
tion of LQN Models from UML Specifications. pages
159–177. Springer-Verlag.
Python (2012). Python programming language. Online at
http://www.python.org/.
Richardson, L. and Ruby, S. (2007). RESTful Web Services.
O’Reilly, first edition.
Ruffo, G., Schifanella, R., Sereno, M., and Politi, R. (2004).
WALTy: A User Behavior Tailored Tool for Evaluat-
ing Web Application Performance. In IEEE International Symposium on Network Computing and Applications, pages 77–86.
SeleniumHQ (2012). Online at http://seleniumhq.org/.
Shams, M., Krishnamurthy, D., and Far, B. (2006). A
model-based approach for testing the performance of
web applications. In SOQUA ’06: Proceedings of the
3rd international workshop on Software quality assur-
ance, pages 54–61, New York, NY, USA. ACM.
Shaw, J. (2000). Web Application Performance Testing: a Case Study of an On-line Learning Application. BT
Technology Journal, 18(2):79–86.
WEBIST2013-9thInternationalConferenceonWebInformationSystemsandTechnologies
104