The Effect of Network Performance on High Energy Physics Computing
Jukka Kommeri 1,2, Aleksi Vartiainen 1,2, Seppo Heikkilä 3 and Tapio Niemi 3
1 Helsinki Institute of Physics, Helsinki, Finland
2 Aalto University, Espoo, Finland
3 Helsinki Institute of Physics, CERN, Geneva, Switzerland
{jukka.kommeri, aleksi.vartiainen}@aalto.fi, {seppo.heikkila, tapio.niemi}@cern.ch
Keywords:
OpenStack, Network, Latency, Energy efficiency, Scientific Computing.
Abstract:
High Energy Physics (HEP) data analysis consists of simulating and analysing events in particle physics. In order to understand physics phenomena, one must collect and go through a very large quantity of data generated by particle accelerators and software simulations. This data analysis can be done using the cloud computing paradigm in a distributed computing environment where data and computation can be located in different, geographically distant data centres. This adds complexity and overhead to networking. In this paper, we study how the networking solution and its performance affect the efficiency and energy consumption of HEP computing. Our results indicate that higher latency both prolongs the processing time and increases the energy consumption.
1 INTRODUCTION
High Energy Physics (HEP) studies elementary particles by using large particle accelerators, such as the Large Hadron Collider (LHC) at CERN, to produce millions of high-energy particle collision events. In order to understand physics phenomena, one must go through a very large quantity of measurement samples. A single high-energy physics analysis can process millions of events (Ponce and Hersch, 2004). This work can be easily parallelized because there are no dependencies among these events. Particle physics events are stored in database-like containers, ROOT files (Antcheva et al., 2009). The computing resources required for CERN LHC data analysis are divided among 11 tier-1 sites and 155 tier-2 sites at computing centers world-wide using the grid/cloud computing paradigms (Bird et al., 2014). The distributed nature of HEP computing poses some extra overhead when the data needs to be accessed from a site that is geographically very distant. This often happens since the grid infrastructure used at CERN, the Worldwide LHC Computing Grid (WLCG), spans from Japan to the USA. Although WLCG was designed before the era of cloud computing, different cloud solutions have also been studied, and the OpenStack cloud suite has been found suitable for HEP computing (Andrade et al., 2012; O'Luanaigh, 2014).
At a high level, cloud computing is a collection of servers, or hypervisors, that run mixed sets of virtual machines processing various workloads. The hypervisors share their processing and networking resources among a set of virtual machines. In the case of HEP data analysis jobs, which constantly fetch data from remote locations, the network can become a bottleneck and cause delays for the analysis. These delays can have a big impact on overall performance.
ROOT files are most commonly accessed with the XRootD protocol, which runs on top of TCP (Behrmann et al., 2010). The performance of XRootD is a well-studied topic. These studies mainly focus on storage performance (Gardner et al., 2014; Matsunaga et al., 2010), data federation (Bauerdick et al., 2014), and scalability (Dorigo et al., 2005; de Witt and Lahiff, 2014). Energy efficiency has not been considered, nor has the effect of network delay on the performance of HEP computing. Therefore, in this paper we study the effect of networking in a cloud environment on the performance and, especially, the energy efficiency of HEP computing. In HEP, the data and computing are geographically distributed all over the globe. For this reason, the key problem examined in this paper is the performance of HEP software accessing locally and remotely located data sets. In particular, the goal is to understand the effect of latency and throughput on HEP job execution time and energy usage.
The remainder of this paper is structured as fol-
lows. First, in Section 2, we cover the related work.
In Section 3, we describe our test software and the test environment used in this study, which is followed by the results in Section 4. Finally, we conclude in Section 5.
2 RELATED WORK
The importance of networking energy efficiency keeps growing as the world is getting more and more connected. Bolla et al. (Bolla et al., 2011) have surveyed the research done in the network domain to improve energy efficiency. The ideas are very similar to those in the energy efficiency of computing: hardware energy efficiency needs to be improved and the hardware needs better power scaling abilities, i.e., there needs to be a way to turn off hardware resources when they are not needed and in this way improve the utilization level of the hardware. Kliazovich et al. (Kliazovich et al., 2010) have developed a cloud simulator, Greencloud, for measuring the energy consumption of cloud data centers. It can, e.g., evaluate different internal network layouts. The authors also showed how a cloud network can benefit from load-based management of virtual machines.
Load-based management of virtualized clusters has been studied a lot and many different algorithms with various heuristics have been proposed. Piao et al. (Piao and Yan, 2010) have studied this topic from the networking aspect. They have developed an algorithm that moves virtual machines in order to avoid congestion in the network. In their simulations, the execution time of a data-intensive application was improved by up to 25% when using their network-traffic-aware algorithm. Kuo et al. (Kuo et al., 2014) have studied how internal cloud latencies affect MapReduce (Dean and Ghemawat, 2008) performance. They have developed a virtual machine placement algorithm that attempts to minimize the network latency between cloud instances and in this way improve the performance of Hadoop tasks.
In a shared environment like a cloud, the internal networking can also become a bottleneck. Mauch et al. (Mauch et al., 2013) introduce the High Performance Cloud Computing (HPC2) model. They have studied how suitable the Amazon cloud would be for HPC computation and found the network to be a limiting factor for performance. The 10 Gb network of Amazon was found to cause more than ten times more latency than the InfiniBand normally used in HPC clusters. Exposito et al. (Expsito et al., 2013) have also studied how well different HPC loads perform in the Amazon cloud. Reano et al. (Reano et al., 2013) have found similar limitations for remote GPU computing: Gigabit Ethernet can add a 100% overhead on rCUDA (http://www.rcuda.net/).
As the physics analysis jobs are well parallelizable and do not need inter-process communication, the performance of internal communication is not so important. More important is the access time to data, which depends partly on cloud-internal networks, but also on how far away the data is and how the cloud connects to it. Haeussler et al. (Haeussler et al., 2015) have studied how latency affects the performance of genome annotation data retrieval. In their case, the data can be very far away and this distance can slow down the retrieval process significantly. Shea et al. (Shea et al., 2014) have studied the performance of TCP in a cloud environment. They have shown that the network performance of a virtual machine depends on the CPU load of its hosting hypervisor, i.e., if there are other virtual machines on the same physical host with high CPU load, there is less CPU time for networking.
The same situation occurs when the virtual machine itself has a high CPU load. This study was made in the Amazon environment and with a separate Xen setup. Bullot et al. (Bullot et al., 2003) have studied the performance of different TCP variants. They have compared different TCP versions over connections that link continents. The distance affects the performance of different TCP variants differently. Scientific computing runs mainly on Linux machines, and CERN has its own variant of Linux, Scientific Linux at CERN (SLC). SLC uses CUBIC TCP, which is the default TCP variant in Linux and a fairer version of BIC TCP (Ha et al., 2008).
As the review above shows, many aspects of the networking of cloud clusters, data centers, and the Internet have received a lot of research attention, yet the effect of latency on the energy efficiency of different workloads has not been studied much.
3 TEST ENVIRONMENT
There are different kinds of HEP workloads: simulation, reconstruction, analysis, etc. In this study, we used as our workload a process that transforms real physics event data into a more compact form that can eventually be used by a physicist on standard PC hardware. The transformation process of a single event has two phases. In the first phase, the events are selected based on their suitability for the current analysis. Then, in the second phase, the event data is transformed and stored in a more space-saving structure.
HEP computation uses special software packages. In the case of the CERN CMS experiment, the CMS software framework (CMSSW) (Fabozzi et al., 2008) is used. CMSSW is distributed to computing nodes with the CERN Virtual Machine File System, CVMFS (Meusel et al., 2015). CVMFS is a centrally managed software repository that contains several versions of various HEP software frameworks. It can be mounted directly on computing nodes. The software is cached locally when it is being used. In the case of CMS analysis, a job can cache about 1 GB of data or program code.
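As an illustration, the following sketch shows one way a worker node could verify that the CVMFS-hosted CMS software area is mounted and see how much data is cached locally. The repository name cms.cern.ch is the standard CMS repository, but the checks themselves and the default cache path are assumptions about how such a node might be probed, not part of the measurement setup described in this paper.

```python
import os
import subprocess

REPO = "cms.cern.ch"  # standard CMS CVMFS repository

def cvmfs_available(repo: str) -> bool:
    """Trigger an automount of the repository and check that it answers."""
    # 'cvmfs_config probe' mounts the repository if needed and reports OK/FAILED.
    result = subprocess.run(["cvmfs_config", "probe", repo],
                            capture_output=True, text=True)
    return result.returncode == 0 and "OK" in result.stdout

def cached_megabytes() -> float:
    """Report the local CVMFS cache size (assumed default cache path)."""
    out = subprocess.run(["du", "-sm", "/var/lib/cvmfs"],
                         capture_output=True, text=True).stdout
    return float(out.split()[0]) if out else 0.0

if __name__ == "__main__":
    if cvmfs_available(REPO):
        releases = os.listdir(f"/cvmfs/{REPO}")
        print(f"{REPO} mounted, {len(releases)} top-level entries")
        print(f"local CVMFS cache size: {cached_megabytes():.0f} MB")
    else:
        print(f"{REPO} is not reachable")
```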
In this study, we tested a HEP workload in a cloud environment. Since the location of the data and the networking conditions have a big impact on the performance of the computation, our goal is to measure the effect of distance on both computation time and energy consumption. In our tests, the OpenStack cloud platform (https://www.openstack.org/software) was used. OpenStack is an open-source platform for cloud computing consisting of individual projects, i.e. services, that are responsible for computing, networking and storage, among other services. OpenStack services are designed to be deployed on multiple nodes, with a scalable number of compute nodes. Virtual machines running on OpenStack are called instances.
Measurements were done using three different cloud setups. In all the setups, we had a HEP client that reads data from a storage server and does the transformation. In every test, the client was run in a cloud instance and the data server on a separate instance or on a separate server outside the cloud. The same workload was used in all the tests, and its run times and energy consumption in different network conditions were measured. Tests were repeated several times to get reliable results.
3.1 Local Data
The first tests were done using a single physical server with hardware suitable for energy measurement. This single-server OpenStack cloud installation was set up with DevStack (http://docs.openstack.org/developer/devstack/). In our DevStack installation, all of the OpenStack components run on the same hardware. The setup used the following hardware: a Fujitsu Esprimo Q910 computer with a quad-core Intel(R) Core(TM) i5-3470T CPU @ 2.90GHz, 8 GB of RAM and 8 GB of swap. As an operating system, it had Ubuntu 12.04.
The physics workload was run in an OpenStack instance. OpenStack instances can have different amounts of resources: the number of virtual processors (VCPU), the amount of memory (RAM), and the sizes of the root disk and swap disk. These different configurations are called flavors (http://docs.openstack.org/openstack-ops/content/flavors.html). Two types of flavors were used in the test environment: when running only one job, the instance was assigned two virtual CPUs, 4 GB of memory and 2 GB of swap, and when running two jobs, it was assigned one virtual CPU, 3 GB of memory and 1 GB of swap.
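A minimal sketch of how two such flavors could be registered with the OpenStack command-line client is shown below. The flavor names, the root disk size and the use of the plain `openstack` CLI are illustrative assumptions; the paper does not describe how its flavors were created.

```python
import subprocess

# Hypothetical flavor definitions matching the resource splits used in the tests:
# one flavor for a single job per instance, one for two single-VCPU instances.
FLAVORS = [
    # name          vcpus  ram_mb  swap_mb
    ("hep.one-job",     2,   4096,    2048),
    ("hep.two-jobs",    1,   3072,    1024),
]

for name, vcpus, ram_mb, swap_mb in FLAVORS:
    # 'openstack flavor create' registers the flavor with the compute service.
    # The 20 GB root disk is an illustrative value, not taken from the paper.
    subprocess.run(
        ["openstack", "flavor", "create",
         "--vcpus", str(vcpus),
         "--ram", str(ram_mb),
         "--swap", str(swap_mb),
         "--disk", "20",
         name],
        check=True,
    )
```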
As a storage server for the ROOT files, we used a ProLiant BL280c G6 blade computer with a 16-core Intel Xeon CPU E5640 @ 2.67GHz and 68 GB of RAM. The server was in the same local area network of the Aalto University computer science department as the cloud setup and was initially connected with 100 Mb/s Ethernet, which was upgraded to 1 Gb/s Ethernet for comparison. The server was installed with Ubuntu 14.04 and XRootD server version 4.1.3.
We measured aspects such as processor (CPU), memory (RAM), power, cached data and network traffic statistics. Power measurements were done using Running Average Power Limit (RAPL) (Hähnel et al., 2012). RAPL is an Intel technology that measures the power consumption in Sandy Bridge CPUs and later. Network traffic was recorded with Tshark, the command-line version of the Wireshark packet analyzer (https://www.wireshark.org/).
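The paper does not state how the RAPL counters were read. The sketch below uses the Linux powercap sysfs interface as one plausible way to sample package energy, so the file paths and the one-second sampling interval are assumptions rather than a description of the actual measurement code.

```python
import time

# Package-0 RAPL domain exposed by the Linux powercap driver (assumed path).
ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"
RANGE_FILE = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"

def read_uj(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

def average_power(interval_s: float = 1.0) -> float:
    """Sample the cumulative energy counter twice and return the average power in watts."""
    start = read_uj(ENERGY_FILE)
    time.sleep(interval_s)
    end = read_uj(ENERGY_FILE)
    if end < start:                           # counter wrapped around
        end += read_uj(RANGE_FILE)
    return (end - start) / 1e6 / interval_s   # microjoules -> joules -> watts

if __name__ == "__main__":
    print(f"package power: {average_power():.1f} W")
```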
The workload was run in varying conditions, including network delay, packet loss, packet duplication, packet corruption, limited network throughput, parallel jobs, and different operating system cache and CVMFS cache configurations. In this paper, we use the term throughput to describe the actual network transport capacity, i.e., bits per second. The network limitations were simulated using the classless queuing disciplines (qdisc) tool, except for limited throughput, which was simulated with Wondershaper (http://www.lartc.org/wondershaper/). The OS cache was cleared by freeing the page cache, dentries and inodes, and the CVMFS cache with the CVMFS tool cvmfs_config wipecache.
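For illustration, the sketch below shows how such conditions could be applied from a test script with the standard netem qdisc, Wondershaper and the cache-clearing commands mentioned above. The interface name eth0, the example delay value and the way the commands are wrapped in Python are assumptions, not the authors' actual scripts.

```python
import subprocess

IFACE = "eth0"  # hypothetical interface facing the storage server

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def add_netem(delay_ms=75, loss_pct=0.0):
    """Add artificial delay (and optionally packet loss) with the netem qdisc."""
    args = ["tc", "qdisc", "add", "dev", IFACE, "root", "netem",
            "delay", f"{delay_ms}ms"]
    if loss_pct:
        args += ["loss", f"{loss_pct}%"]
    run(args)

def clear_netem():
    run(["tc", "qdisc", "del", "dev", IFACE, "root"])

def limit_throughput(down_kbps, up_kbps):
    """Cap the link rate with Wondershaper (rates in kilobits per second)."""
    run(["wondershaper", IFACE, str(down_kbps), str(up_kbps)])

def drop_caches():
    """Free the page cache, dentries and inodes, then wipe the CVMFS cache."""
    run(["sync"])
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")
    run(["cvmfs_config", "wipecache"])

if __name__ == "__main__":
    drop_caches()
    add_netem(delay_ms=75)   # add 75 ms of delay, as in the Figure 2 measurements
    # ... run the workload here ...
    clear_netem()
```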
In addition to the previously described single-node cloud system, we installed a separate cloud, which was able to run more virtual machines but lacked the ability to measure energy consumption. We used the same blade hardware and operating system as previously for the storage server. In these tests, we used three blades, which were installed using the Puppetlabs OpenStack module (https://github.com/puppetlabs/puppetlabs-openstack). The OpenStack controller node, networking node and compute node were installed on their own hardware. All the nodes were connected
to the same gigabit network switch. Data was served from the same node where the controller was installed, but not from within OpenStack. The physics analysis was run in an OpenStack instance. Tests with this cloud setup were done using varying numbers of virtual machines. In a similar way as in the single-node tests, latency was simulated using qdisc.
3.2 Remote Data
Two OpenStack (release 2015.1.2) installations were used in this study. One was located at CERN (Meyrin, Switzerland) and the other at Aalto University (Espoo, Finland). Both of the deployments used three physical machines. The physical machines used at CERN were Dell PowerEdge R210 rack servers and at Aalto HP blade servers in an HP BladeSystem c7000 enclosure. The roles of these three machines were computing, networking and other services. Power consumption was measured only on the physical machine running the computing service. Both of the OpenStack installations were configured with routable IP addresses in order to be accessible from outside.
The OpenStack installation at CERN was used to run three identical computing jobs in parallel. Each job was run in a separate virtual machine (VM). The jobs accessed a 1.4 GB physics data file hosted on XRootD servers. Two of these XRootD servers were hosted inside the Aalto and CERN OpenStack installations. Two other XRootD servers were deployed on existing OpenStack VMs: one in the CERN IT department and one in Kajaani, Finland.
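To make the data access path concrete, the following minimal PyROOT sketch opens a ROOT file over the XRootD protocol. The host name, file path and tree name are placeholders, since the actual jobs read their input through the CMSSW framework rather than a standalone script.

```python
import ROOT  # PyROOT bindings shipped with ROOT

# Hypothetical XRootD URL; the real servers and file paths are not reproduced here.
url = "root://xrootd.example.org//store/user/test/events.root"

f = ROOT.TFile.Open(url)          # negotiates the transfer over XRootD/TCP
if not f or f.IsZombie():
    raise RuntimeError(f"could not open {url}")

tree = f.Get("Events")            # assumed name of the event tree
print(f"file contains {tree.GetEntries()} events")
f.Close()
```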
In this study, the duration of HEP job processing and the energy usage were collected. The duration was measured from the start of job processing until the first VM finished processing. Energy usage was collected during this same period of time.
4 RESULTS
We used three different testbeds to get diverse measurements. The workload was the same in every testbed. Depending on the setup, energy was measured either with an energy meter or by calculating it from processor energy counters.
4.1 Local Data
Running the workload is both CPU and network intensive. Figure 1 shows the relation between power consumption and network traffic, both appearing in synchronous cycles. The base power consumption of the hypervisor is typically less than five watts. The results have a 30-second period at both ends when the virtual machine is running idle, i.e., with no workload. This demonstrates the base power consumption and other base statistics of the hypervisor and the virtual machines. Remote resources from the XRootD server are downloaded in distinct parts; most of the time there is no traffic between the server and the hypervisor. A closer look at the network throughput peaks shows short but constant peaks that use all the available bandwidth.
Network delays are typical of wide area networks (WAN) and have a clear impact on the workload run time. As Figure 2 shows, an increase of 75 milliseconds in network delay causes the run time to increase by over ten seconds. In addition, the power consumption increases by 34 percent. The effect is less evident when the same test is repeated in a network with more bandwidth. In Figure 3, we have the results of the same measurements in gigabit Ethernet. Run times are shorter with 1 Gb/s Ethernet, but energy consumption still depends on latency. The effect of bandwidth is summarized in Table 1. In a gigabit LAN, the run times decrease by roughly ten percent compared to those in a 100 Mb/s network. Latency does not seem to have a big impact on the energy consumption in the 1 Gb/s network, but it does in the 100 Mb/s network.
Table 1: Comparison of execution times in different networks with no added delay.

Parallelism      100 Mb/s   1 Gb/s    Change
1 VM             281 s      255 s     -9 %
2 jobs, 1 VM     363 s      327 s     -10 %
2 jobs, 2 VMs    315 s      278 s     -12 %
The workload downloads and stores roughly 400 MB of data in the CERN VM File System (CVMFS) local cache. If the cache is empty, this data is downloaded from CERN servers at the beginning of the workload execution. Otherwise, no more than about ten kB in total is downloaded from CERN. As the total size of the required tools is 400 MB, setting the CVMFS cache limit lower than that affects the processing time and network traffic. As Table 2 shows, no data needs to be downloaded if the cache limit is high enough and the data has been previously cached. In contrast, the workload cannot be executed at all if the cache limit is too low.
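The cache limit used in these tests corresponds to the CVMFS client quota. A hedged sketch of how such a limit could be set and applied on a worker node is shown below; the configuration file path and the helper function are illustrative assumptions, although CVMFS_QUOTA_LIMIT, cvmfs_config reload and cvmfs_config wipecache are documented CVMFS client commands and settings.

```python
import subprocess

CONFIG_FILE = "/etc/cvmfs/default.local"  # assumed local CVMFS client configuration

def set_cache_limit(limit_mb: int) -> None:
    """Set the CVMFS cache quota (in MB), reload the client and start from a cold cache."""
    # Appends a new setting for brevity; a real script would replace any existing value.
    with open(CONFIG_FILE, "a") as f:
        f.write(f"CVMFS_QUOTA_LIMIT={limit_mb}\n")
    # Apply the new configuration to the running client.
    subprocess.run(["cvmfs_config", "reload"], check=True)
    # Empty the cache so the next run starts cold, as in the first-run column of Table 2.
    subprocess.run(["cvmfs_config", "wipecache"], check=True)

if __name__ == "__main__":
    # Example: reproduce the 500 MB cache limit case from Table 2.
    set_cache_limit(500)
```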
Figure 1: Relation between power consumption and network traffic on the hypervisor.

Figure 2: Comparison of workload run times and total energy consumptions with different network delays in 100 Mb/s Ethernet.

Table 2: Comparison of execution times and network traffic from CERN. When the cache limit is sufficiently less than 300 MB, the workload cannot be executed.

Cache limit   Data from CERN (MB)        Execution time (s)
              first run*  later runs**   later runs**
200 MB        -           -              -
300 MB        422         300            249
500 MB        315         0              166
5000 MB       320         0              165

* First run after a CVMFS cache clear
** The following runs (average)

In Table 3, we have a summary of the parallel workload tests. It shows that the latency does not increase significantly even though ten virtual machines share a single physical interface. Flows have different latencies depending on the direction, but this is similar with all workloads. There is some variation in the maximum values, but the means and medians are about the same.
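The round-trip statistics of Table 3 can in principle be reproduced from the recorded packet captures. The sketch below extracts TCP acknowledgement round-trip times with tshark and summarizes them with Python's statistics module; the capture file name and the choice of the tcp.analysis.ack_rtt field are assumptions about the analysis, not a description of the authors' exact procedure.

```python
import statistics
import subprocess

CAPTURE = "xrootd_traffic.pcap"  # hypothetical capture recorded with Tshark

# Extract the RTT that Tshark computes for each acknowledged TCP segment (in seconds).
out = subprocess.run(
    ["tshark", "-r", CAPTURE, "-T", "fields", "-e", "tcp.analysis.ack_rtt"],
    capture_output=True, text=True, check=True,
).stdout

rtts_ms = [float(v) * 1000 for v in out.split() if v]

print(f"min    {min(rtts_ms):8.2f} ms")
print(f"max    {max(rtts_ms):8.2f} ms")
print(f"median {statistics.median(rtts_ms):8.2f} ms")
print(f"mean   {statistics.mean(rtts_ms):8.2f} ms")
print(f"stdev  {statistics.stdev(rtts_ms):8.2f} ms")
```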
We tried to stress the shared interface even more by adding five additional virtual machines generating HTTP traffic by downloading large images from a university server. The results of this addition were similar to the results of the 10 VM workload.
Table 3: Round-trip times between the XRootD server and an OpenStack instance in milliseconds.

         1 VM     10 VM
min      0.11     0.03
max      238.97   294.71
median   4.38     4.46
mean     7.67     10.14
stdev    25.32    32.08

Similarly to the previous single-server measurements, we tested how added latency affects the execution time when running multiple virtual machines in parallel. Figure 4 shows how execution times increase when we add more latency. This measurement was done with 1, 5 and 10 virtual machines. The effect of added latency was greater with 1 VM, where 100 ms caused a 6.9% increase in execution time, compared with 2.3% for 10 VMs.
From the same 10 VM tests we obtained the throughput values shown in Figure 5. These results show a relation between latency and throughput, as the maximum throughput decreased by 41% when 100 ms of latency was added.
4.2 Remote Data
Figure 6 and Figure 7 show job run times and energy usage, respectively, when the data is hosted on an XRootD server in different physical locations. The energy usage is directly related to the processing time. Only the OpenStack energy usage is slightly higher, because that measurement also includes hosting the XRootD server. Thus, it would be possible to estimate the energy usage by measuring only the processing time.
There are at least two possible causes for the differences between the sites: network latency and throughput. The network latencies to Aalto and Kajaani are 46.1±0.3 ms and 49.6±0.2 ms, respectively, while inside CERN the latencies are less than one millisecond. The network throughput to Aalto and Kajaani is 23.2±2.7 MB/s and 20.0±1.6 MB/s, respectively, while inside CERN it is 58.3±4.9 MB/s.
Figure 3: Comparison of workload run times and total energy consumptions with different network delays in 1Gb/s Ethernet.
Figure 4: Job execution times with ten OpenStack instances
in parallel and with added latency.
The effect of network throughput and latency can be examined by comparing local and remote sites with a fixed 20 MB/s throughput that is achievable at both sites. Figure 8 shows that there is around a 12.5 percent difference between CERN and the throughput-limited Kajaani site. The throughput limitation removes 70 percent of this difference, so the effect of latency seems to explain around 30 percent of the difference. A detailed analysis of the network traffic showed that only 0.01% of TCP packets were retransmitted and that the TCP window size increased quickly to around three MB. Thus, network problems do not explain the observed differences. With the current HEP software stack, the best option is to prefer nearby data sources with low latency.
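As a rough sanity check of why plain TCP behaviour does not explain the gap, one can compare the observed window size and round-trip time with the bandwidth-delay product. The calculation below uses the reported 3 MB window and the roughly 46 ms latency to Aalto and is only an order-of-magnitude illustration, not an analysis performed in the paper.

```latex
% Maximum single-stream TCP throughput bounded by window size W and round-trip time RTT:
\[
  T_{\max} \;=\; \frac{W}{\mathrm{RTT}}
  \;\approx\; \frac{3\ \mathrm{MB}}{0.046\ \mathrm{s}}
  \;\approx\; 65\ \mathrm{MB/s},
\]
% which is well above the measured 23 MB/s to Aalto, so the TCP window itself
% does not appear to be the bottleneck.
```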
Figure 5: Maximum throughput with ten OpenStack in-
stances in parallel and with added latency.
5 CONCLUSIONS AND FUTURE WORK
High energy physics computing at CERN uses a large computing grid/cloud distributed around the world. This naturally imposes long distances between the sites and slows down the network connections among them. Motivated by this, we studied how networking performance affects computing performance and energy efficiency in high energy physics computing in an OpenStack cloud testbed. We used both simulated network latencies in a laboratory network and several geographically distant sites connected by the Internet to measure how different latencies change computing performance when processing a HEP workload.
Figure 6: Job execution times of OpenStack VMs.

Figure 7: Energy usage of OpenStack VMs.

Our results indicate that the network latency, whether caused by a simulator or by physical distances between the sites, has a negative impact on computing performance. High latency increases both the run times and the total energy consumption. Additionally, we noticed that the contribution of latency to the execution time and energy consumption of a computation job increases when bandwidth is small. Parallelism, i.e., multiple cloud instances sharing a limited network resource, also adds latency and increases job run times.
The obtained results reflect the current software environment used for HEP job processing. New data transfer protocols or advanced caching mechanisms could diminish the observed differences. In contrast, the network infrastructure and computing hardware in use are unlikely to change significantly in the near future.
Our future work includes studying how the effect of latency can be minimized using, e.g., smarter workload scheduling, data preloading, or optimized network protocols.
Figure 8: Job execution times with throughput limits.
ACKNOWLEDGMENTS
This paper has received funding from the European
Union’s Horizon 2020 research and innovation pro-
gram 2014-2018 under grant agreement No. 644866.
DISCLAIMER
This paper reflects only the authors’ views and the
European Commission is not responsible for any use
that may be made of the information it contains.
REFERENCES
Andrade, P., Bell, T., van Eldik, J., McCance, G., Panzer-Steindel, B., dos Santos, M. C., Traylen, S., and Schwickerath, U. (2012). Review of cern data centre infrastructure. Journal of Physics: Conference Series, 396(4).
Antcheva, I., Ballintijn, M., Bellenot, B., and Biskup, M.
(2009). ROOT - A C++ framework for petabyte data
storage, statistical analysis and visualization. Com-
puter Physics Communications, 180(12):2499–2512.
Bauerdick, L. A. T., Bloom, K., Bockelman, B., Bradley,
D. C., Dasu, S., Dost, J. M., Sfiligoi, I., Tadel, A.,
Tadel, M., Wuerthwein, F., Yagil, A., and the Cms col-
laboration (2014). Xrootd, disk-based, caching proxy
for optimization of data access, data placement and
data replication. Journal of Physics: Conference Se-
ries, 513(4).
Behrmann, G., Ozerov, D., and Zanger, T. (2010). Xrootd
in dcache - design and experiences. In International
Conference on Computing in High Energy and Nu-
clear Physics (CHEP).
Bird, I., Buncic, P., Carminati, F., Cattaneo, M., Clarke, P.,
Fisk, I., Girone, M., Harvey, J., Kersevan, B., Mato,
P., Mount, R., and Panzer-Steindel, B. (2014). Up-
date of the computing models of the wlcg and the lhc
experiments. Technical report, CERN.
Bolla, R., Bruschi, R., Davoli, F., and Cucchietti, F. (2011).
Energy efficiency in the future internet: A survey of
existing approaches and trends in energy-aware fixed
network infrastructures. Communications Surveys Tu-
torials, IEEE, 13(2):223–244.
Bullot, H., Les Cottrell, R., and Hughes-Jones, R. (2003).
Evaluation of advanced tcp stacks on fast long-
distance production networks. Journal of Grid Com-
puting, 1(4):345–359.
de Witt, S. and Lahiff, A. (2014). Quantifying xrootd scala-
bility and overheads. Journal of Physics: Conference
Series, 513(3).
Dean, J. and Ghemawat, S. (2008). Mapreduce: Simpli-
fied data processing on large clusters. Commun. ACM,
51(1):107–113.
Dorigo, A., Elmer, P., Furano, F., and Hanushevsky, A.
(2005). Xrootd-a highly scalable architecture for data
access. WSEAS Transactions on Computers, 1(4.3).
Expsito, R. R., Taboada, G. L., Ramos, S., Tourio, J., and
Doallo, R. (2013). Performance analysis of HPC ap-
plications in the cloud. Future Generation Computer
Systems, 29(1):218–229. Including Special section:
AIRCC-NetCoM 2009 and Special section: Clouds
and Service-Oriented Architectures.
Fabozzi, F., Jones, C., Hegner, B., and Lista, L. (2008).
Physics analysis tools for the cms experiment at lhc.
Nuclear Science, IEEE Transactions on, 55:3539–
3543.
Gardner, R., Campana, S., Duckeck, G., Elmsheuser, J.,
Hanushevsky, A., Hönig, F. G., Iven, J., Legger,
F., Vukotic, I., Yang, W., and the Atlas Collabora-
tion (2014). Data federation strategies for atlas us-
ing xrootd. Journal of Physics: Conference Series,
513(4):042049.
Ha, S., Rhee, I., and Xu, L. (2008). Cubic: A new tcp-
friendly high-speed tcp variant. SIGOPS Oper. Syst.
Rev., 42(5):64–74.
Haeussler, M., Raney, B. J., Hinrichs, A. S., Clawson, H.,
Zweig, A. S., Karolchik, D., Casper, J., Speir, M. L.,
Haussler, D., and Kent, W. J. (2015). Navigating pro-
tected genomics data with ucsc genome browser in a
box. Bioinformatics, 31(5):764–766.
Hähnel, M., Döbel, B., Völp, M., and Härtig, H. (2012).
Measuring energy consumption for short code paths
using rapl. ACM SIGMETRICS Performance Evalua-
tion Review, 40(3):13–17.
Kliazovich, D., Bouvry, P., and Khan, S. U. (2010). Green-
cloud: a packet-level simulator of energy-aware cloud
computing data centers. The Journal of Supercomput-
ing, 62(3):1263–1283.
Kuo, J.-J., Yang, H.-H., and Tsai, M.-J. (2014). Optimal ap-
proximation algorithm of virtual machine placement
for data latency minimization in cloud systems. In IN-
FOCOM, 2014 Proceedings IEEE, pages 1303–1311.
Matsunaga, H., Isobe, T., Mashimo, T., Sakamoto, H., and
Ueda, I. (2010). Managed Grids and Cloud Systems in
the Asia-Pacific Research Community, chapter Perfor-
mance of a disk storage system at a Tier-2 site, pages
85–97. Springer US, Boston, MA.
Mauch, V., Kunze, M., and Hillenbrand, M. (2013). High
performance cloud computing. Future Generation
Computer Systems, 29(6):1408 – 1416.
Meusel, R., Blomer, J., Buncic, P., Ganis, G., and Heikkilä,
S. S. (2015). Recent developments in the cernvm-file
system server backend. Journal of Physics: Confer-
ence Series, 608(1):012031.
O’Luanaigh, C. (2014). Openstack boosts tier 0 for lhc run
2. Technical report, CERN.
Piao, J. T. and Yan, J. (2010). A network-aware vir-
tual machine placement and migration approach in
cloud computing. In Grid and Cooperative Com-
puting (GCC), 2010 9th International Conference on,
pages 87–92.
Ponce, S. and Hersch, R. D. (2004). Parallelization and
scheduling of data intensive particle physics analysis
jobs on clusters of pcs. In 18th International Par-
allel and Distributed Processing Symposium (IPDPS
2004), Santa Fe, New Mexico, USA.
Reano, C., Mayo, R., Quintana-Orti, E., Silla, F., Duato,
J., and Pena, A. (2013). Influence of infiniband fdr
on the performance of remote gpu virtualization. In
IEEE International Conference on Cluster Computing
(CLUSTER), pages 1–8.
Shea, R., Wang, F., Wang, H., and Liu, J. (2014). A deep
investigation into network performance in virtual ma-
chine based cloud environments. In Proceedings of
IEEE INFOCOM, pages 1285–1293.