the best tool, and it promotes the use of software testing tools.
Nowadays, the content of a website is as important as the speed at which it responds. Companies focus on improving a website's responsiveness to avoid losing users. To conduct a realistic evaluation of the tools, four search engines are tested in terms of performance: Google, Bing, Ask, and AOL Search.
This paper is structured as follows. Section 2 presents a literature review and Section 3 describes the various types of performance testing. Section 4 describes the four testing tools, and Section 5 presents a qualitative and quantitative analysis of these tools. Section 6 presents the performance tests performed on each search engine. Lastly, Section 7 states the conclusions of this work and proposes some future work.
2 RELATED WORK 
Web applications are ubiquitous and need to deal with a large number of users. Due to their exposure to end users, especially customers, web applications have to be fast, reliable, and up-to-date. However, delays during Internet usage are common and have been the focus of several studies (Barford and Crovella, 1999), (Curran and Duffy, 2005).
Load testing is thus an important practice for making sure a website meets those demands and for optimizing its different components (Banga and Druschel, 1999).
The goal of a load test is to uncover functional and performance problems under load. Functional problems are often bugs that do not surface during functional testing; deadlocks and memory management bugs are examples of functional problems under load. Performance problems are issues such as high response time or low throughput under load.
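To make these symptoms concrete, the following minimal Python sketch (not taken from any of the cited works; the target URL and load parameters are illustrative assumptions) drives concurrent requests against a server and reports the two metrics mentioned above, response time and throughput.

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "https://example.com/"  # hypothetical system under test
CONCURRENT_USERS = 10                # simulated simultaneous users
REQUESTS_PER_USER = 5

def timed_request(_):
    """Issue one GET request and return its response time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(TARGET_URL, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
    times = list(pool.map(timed_request,
                          range(CONCURRENT_USERS * REQUESTS_PER_USER)))
elapsed = time.perf_counter() - start

print(f"mean response time: {sum(times) / len(times):.3f} s")
print(f"throughput: {len(times) / elapsed:.1f} requests/s")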
The first conference on software testing was organized in 1972 at Chapel Hill, where the presented works argued that testing is not the same as programming (Sharma and Angmo, 2014).
Existing load testing research focuses on the automatic generation of load test suites (Avritzer and Larson, 1993), (Avritzer and Weyuker, 1994), (Avritzer and Weyuker, 1995), (Bayan and Cangussu, 2006), (Garousi et al., 2006), (Zhang and Cheung, 2002).
There is limited work proposing the systematic analysis of load test results to uncover potential problems. Unfortunately, looking for problems in a load test is a time-consuming and difficult task. Jiang et al. (2008) flag possible functional problems by mining the execution logs of a load test to uncover dominant execution patterns and to automatically flag functional deviations from these patterns within a test.
Jiang (2010) introduces an approach that automatically flags possible performance problems in a load test. The dominant performance behavior cannot be derived from just one load test, since the load is not constant: a typical workload consists of periods simulating peak usage and periods simulating off-hours usage. However, the same workload is usually applied across load tests, so the results of prior load tests can be used as an informal baseline and compared against the current run. If the current run has scenarios whose response time distribution differs from the baseline, the run is probably troublesome and worth investigating.
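As a rough illustration of this baseline comparison (a sketch of the general idea only, not Jiang's actual technique; the scenario names and response times are invented), one could test each scenario's response-time distribution against the baseline with a two-sample Kolmogorov-Smirnov test:

from scipy.stats import ks_2samp

# Illustrative response times (seconds) per scenario; in practice these
# would be extracted from the load-test logs.
baseline = {"login": [0.21, 0.25, 0.22, 0.24],
            "search": [0.48, 0.51, 0.47, 0.50]}
current = {"login": [0.22, 0.23, 0.26, 0.24],
           "search": [0.95, 1.10, 0.88, 1.02]}

for scenario, base_times in baseline.items():
    # Two-sample KS test: do the two samples follow the same distribution?
    stat, p_value = ks_2samp(base_times, current[scenario])
    if p_value < 0.05:
        print(f"{scenario}: distribution differs (p={p_value:.3f}) - investigate")
    else:
        print(f"{scenario}: consistent with baseline (p={p_value:.3f})")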
Wang and Du (2012) introduced a new integrated automation framework based on Selenium and JMeter. This framework shares test data and test steps, which is useful for switching between several kinds of tests for web applications. Using this software structure, one can improve the extensibility and reuse of tests, as well as product quality. The paper describes how to design test automation based on the details of the web application.
Wang et al. (2010) proposed a usage model and a load model to simulate user behavior and to help generate a realistic load for web application load testing, respectively. Based on these two models, they implemented a tool known as the "Load Testing Automation Framework" for web application load testing.
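As a rough illustration of what a usage model can look like (a hypothetical sketch, not Wang et al.'s actual model; the pages and transition probabilities are invented), user navigation can be modeled as a Markov chain over pages and sampled to produce realistic request sequences:

import random

# Illustrative transition probabilities between pages.
TRANSITIONS = {
    "home":    [("search", 0.7), ("exit", 0.3)],
    "search":  [("results", 0.9), ("exit", 0.1)],
    "results": [("search", 0.4), ("exit", 0.6)],
}

def simulate_session(start="home"):
    """Generate one user's page sequence by walking the chain until exit."""
    page, session = start, []
    while page != "exit":
        session.append(page)
        pages, weights = zip(*TRANSITIONS[page])
        page = random.choices(pages, weights=weights)[0]
    return session

for _ in range(3):
    print(" -> ".join(simulate_session()))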
There are not many scientific articles dedicated to the comparison of evaluation tools for web platforms. However, Sharma et al. (2007) used four testing tools, Apache JMeter, HP LoadRunner, WebLOAD, and The Grinder, with the objective of comparing them and identifying the most efficient one. The comparison used parameters such as cost, unlimited load generation, and ease of use. After comparing the tools, JMeter was selected, since it is free, has a great ability to simulate load, and its interface is easy to use.
Hussain et al. (2013) describe three open source tools (JMeter, SoapUI, and Storm) and compare them in terms of functionality, usability, performance, and software requirements. They conclude