RESOURCE MANAGEMENT OF VIRTUAL INFRASTRUCTURE
FOR ON-DEMAND SAAS SERVICES

Rodrigue Chakode¹, Blaise Omer Yenke¹,² and Jean-François Méhaut¹

¹INRIA, LIG Laboratory, University of Grenoble, Grenoble, France
²Dept. of Computer Science, University of Ngaoundere, Ngaoundere, Cameroon
Keywords: On-demand Software-as-a-Service, Cloud computing, Virtualization, Resource sharing, Scheduling.
Abstract: With the emergence of cloud computing, offering Software-as-a-Service appears to be an opportunity for software vendors. Indeed, an on-demand model of service provisioning can improve their competitiveness through invoicing tailored to customer needs. Virtualization has greatly assisted the emergence of on-demand cloud platforms. Up until now, despite the huge number of projects around cloud platforms such as Infrastructure-as-a-Service, few open research activities around SaaS platforms have been carried out. This is the reason why our contribution in this work is to design an open framework that enables the implementation of on-demand SaaS clouds over a high-performance computing cluster. We have first focused on the framework design and from that have proposed an architecture that relies on a virtual infrastructure manager named OpenNebula. OpenNebula handles virtual machine life-cycle management and is especially useful on large-scale infrastructures such as clusters and grids. The work being part of an industrial project¹, we have then considered a case where the cluster is shared among several applications owned by distinct software providers. After studying in a previous work how to implement the sharing of an infrastructure in such a context, we now propose policies and algorithms for scheduling jobs. In order to evaluate the framework, we have experimentally evaluated a prototype, simulating various workload scenarios. Results have shown its ability to achieve the expected goals, while being reliable, robust and efficient.
1 INTRODUCTION
Cloud computing has highlighted the suitability of the on-demand model of invoicing over the subscription model that had been applied until recently by application service providers, also known as ASPs. With cloud computing, Software-as-a-Service (SaaS) appears to be a real opportunity for ASPs. Indeed, they can improve their competitiveness by enabling on-demand remote access to software (as a service or utility) in addition to traditional license and/or subscription models. Virtualization has greatly assisted the emergence of cloud computing, which enabled the effective use of remote computing resources as a utility. Up until now, open research activities around cloud computing have mostly focused on Infrastructure-as-a-Service clouds, leading to a huge ecosystem of high-quality resources, while there are no real open research projects around SaaS clouds.
Our contribution in this work is to design an open framework that enables the implementation of on-demand SaaS clouds over a high-performance computing (HPC) cluster. We have first proposed an architecture that relies on virtual machines to execute the jobs associated with service requests. We have built a resource manager that relies on OpenNebula, a virtual infrastructure manager that deals with virtual machine life-cycle management on large-scale infrastructures such as clusters and grids. The resource manager provides core functionalities to handle requests and schedule the associated jobs. Then, guided by the requirements of the Ciloe project¹, we have considered that the resources of the underlying cluster are shared among several applications owned by distinct providers, which contribute to purchasing the infrastructure and keeping it up. We have focused on the scheduling of service requests within such a context and proposed policies and algorithms. We have implemented a prototype, which is evaluated by simulating various workloads derived from a real production system workload.
Experimental results have shown its ability to achieve the expected goals, while being reliable, robust and efficient.

¹This work is funded by Minalogic through the Ciloe project (http://ciloe.minalogic.net). Located in France, Minalogic is a global business competitiveness cluster that fosters research-led innovation in intelligent miniaturized products and solutions for industry.
We will first describe the SaaS cloud within the context of the Ciloe project, and then present the different obstacles that we have faced in carrying out the project (Section 2). We will then present related work around SaaS clouds and the different approaches concerning those obstacles (Section 3). After that, we will describe the design and the architecture of the proposed framework (Section 4), along with the scheduling policies and algorithms used by the resource manager (Section 5). After presenting the implemented prototype, we will show the results of its experimental evaluation (Section 6). Finally, we will present our conclusion and future work (Section 7).
2 CONTEXT: THE CILOE PROJECT

The Ciloe project aims at developing an open framework to enable the implementation of Software-as-a-Service (SaaS) (Landy and Mastrobattista, 2008)(Turner et al., 2003) cloud computing (Vaquero et al., 2009) platforms. A SaaS cloud is illustrated in Figure 1. The cloud relies on an HPC cluster, while the model of accessing the service is on-demand. A customer submits a request along with data through the Internet. The execution of the job that will process the request is scheduled. After processing, the output is returned to the customer. Both the software and the computing resources (computing nodes, storage, etc.) are owned by the software vendor, which is also the service provider.
In our previous work (Chakode et al., 2010), we have mentioned the case of clusters that are shared among several businesses in order to minimize the impact of the initial deployment on their budgets. This is the reason why this particular project involves three small and medium software vendors. We have also shown that this model of service provisioning raises different obstacles concerning job and resource management, such as:
Scheduling of On-demand Requests. Allocating resources for executing requests should be transparent: it should be achieved internally, without explicit constraints provided by customers. Additionally, the allocation should ensure some prioritization among customers: the providers can differentiate the end-to-end quality of service offered to their customers, distinguishing for instance partners or regular customers from occasional customers. However, the delivered service has to be well defined (for instance in terms of a reasonable delivery time) to ensure customer satisfaction.
Fair Sharing of Resources. The underlying HPC cluster is shared among several businesses collaborating to purchase the cluster and keep it in operational condition. Therefore, the sharing has to be fair: each business's application has to be guaranteed a share or ratio of resource usage according to its investment in the infrastructure.
Efficient Use of Resources. When resources are idle, the software providers can use them to run internal jobs, such as regression testing. However, the jobs associated with such tasks have a lower priority than the customers' jobs.
Figure 1: The model of the expected infrastructure for SaaS
on-demand.
3 RELATED WORK
Up until now, the implementation of on-demand SaaS platforms has mostly been studied by industry, leading to many platforms such as SalesForce (Force.com), Google Apps, Adobe PDF Online, and Intacct. Since these platforms are closed, there is no open documentation about how they deal with resource management. However, in the area of high-performance computing (HPC), the problems of sharing computing resources and using them efficiently have been studied.
3.1 Sharing of Computing Resources
Most batch job schedulers implement fair-share disciplines, from native operating system schedulers (e.g., Linux) to high-level batch job schedulers such as Maui (Jackson et al., 2001). In fair-sharing systems, resources are equitably allocated to run jobs or groups of jobs according to the shares assigned to them (Kay and Lauder, 1988)(Li and Franks, 2009). Fair-share scheduling prevents starvation, while allowing jobs to
benefit from the aggregated power of processing resources. Fair sharing can be observed over long periods of time (Jackson et al., 2001; Kay and Lauder, 1988) or over a few clock cycles (Linux systems). In most scheduling systems, it is implemented through dynamic priority policies granting higher priorities to jobs or groups of jobs that have used fewer resources.
This approach does not seem suitable in the Ciloe case. Indeed, the execution of some tasks could be significantly delayed if all resources are used by long-term jobs. Yet, even if the partners want to benefit from the aggregated power of the computing nodes, they expect a reasonable waiting time for their tasks. For a given partner, this expectation could be particularly high when the amount of resources it has used is less than the share of resources for which it has invested.
3.2 Efficient Use of Computing Resources

The need to use computing resources efficiently has led to the emergence of advanced scheduling policies such as backfilling (Lawson and Smirni, 2002), as opposed to the classical first-come first-served (FCFS) policy adopted in earlier job schedulers. A scheduler that applies backfilling allows newer jobs requiring fewer resources to be executed before earlier ones. Despite the fact that it may lead to starvation issues for big jobs, the backfilling approach allows a more effective utilization of resources by avoiding wasted idle time.
Our approach introduces several novelties. The design of the proposed SaaS resource manager is open, from its architecture to its internal job scheduling policies and algorithms. It relies on a generic model of resource management that considers that the underlying computing infrastructure is shared among several applications owned by distinct SaaS providers. The policies and algorithms used to enforce this sharing guarantee each application a share of resource use, even in periods of high load and/or high competition for resources. Since the effective utilization of the whole set of resources is a major point in our project, we also allow backfilling.
4 THE PROPOSED FRAMEWORK

As previously mentioned (Chakode et al., 2010), it has been shown that scheduling on-demand SaaS requests on such a shared cluster should enable flexibility and easy reconfigurability in the management of resources. A dynamic approach to allocating resources has been proposed. This approach aims at guaranteeing fair sharing statistically, while improving the utilization of the whole system's resources.
We think that using virtual machines is a suitable solution to implement this approach. The suitability of virtual machines for sharing computing resources has been studied (Sotomayor et al., 2007). The authors have claimed and shown that using virtual machines overcomes some scheduling problems, such as scheduling interactive applications, real-time applications, or applications requiring co-scheduling. Furthermore, virtual machines enable safe partitioning of resources. Being easily allocable and re-allocable, they also enable easy reconfigurability of computing environments. While the main drawback of virtual machines has been performance overhead, it has been shown that this overhead can be significantly reduced with specific tuning; see for example the works introduced in (Intel Corporation, 2006), (AMD, 2005), (Yu and Vetter, 2008), (Jones), and (Mergen et al., 2006). Tuning being typically implementation-dependent, we do not consider this aspect in this work.
4.1 Global Architecture
In the model we have proposed, jobs are run within virtual machines to ensure safe node partitioning. The cluster is viewed as a reconfigurable virtualized infrastructure (VI), upon which we build a resource manager component to deal with handling requests and scheduling the associated jobs on the underlying resource pool. This system architecture is shown in Figure 2. At the infrastructure level, the scheduler relies on a VI Manager (VIM) to deal with the usual virtual machine life-cycle management capabilities (creation, deployment, etc.) over large-scale infrastructure resources. Among the leading VIMs, which include OpenNebula (Sotomayor et al., 2009b), Nimbus (Keahey et al., 2005), Eucalyptus (Nurmi et al., 2009), Enomaly ECP, and VMware vSphere, we have chosen OpenNebula considering the following key points, which are further detailed in (Sotomayor et al., 2009a). OpenNebula is scalable (tested with up to 16,000 virtual machines), open source, and uses open standards. It enables the programmability of its core functionalities through an application programming interface (API), and supports all popular Virtual Machine Monitors (VMMs), including Xen, KVM, VMware, and VirtualBox.
The proposed resource manager replaces the default OpenNebula scheduler (mm_sched) and acts as the entry point for all requests submitted to the infrastructure. As mentioned above, it provides core functionalities for handling requests, scheduling the execution of the associated jobs, monitoring their execution, etc. More precisely, it is in charge of various tasks: selecting the jobs to run, preparing virtual machine templates along with suitable software stacks (virtual appliances), selecting the nodes onto which the virtual machines will run, and then requesting OpenNebula to perform the deployment. The scheduler relies on the XML-RPC interface of OpenNebula to communicate with it.
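For illustration, the snippet below shows how a scheduler process can drive OpenNebula over XML-RPC from Python. It is a minimal sketch, not SVMSched itself: the endpoint and the one.vm.allocate / one.vm.deploy calls follow the publicly documented OpenNebula XML-RPC API, but the exact method names and signatures in the version used here (1.4) may differ, and the credentials are placeholders.

import xmlrpc.client

ONE_ENDPOINT = "http://localhost:2633/RPC2"   # OpenNebula's default RPC endpoint
SESSION = "oneadmin:password"                 # placeholder credential string

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

def instantiate_and_deploy(vm_template, host_id):
    # Register a virtual machine from a template, then pin it onto the host
    # chosen by the scheduler instead of leaving placement to OpenNebula.
    ok, result, *rest = server.one.vm.allocate(SESSION, vm_template)
    if not ok:
        raise RuntimeError("allocation failed: %s" % result)
    server.one.vm.deploy(SESSION, result, host_id)   # result holds the new VM id
    return result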
Figure 2: Model of the Software-as-a-Service platform.
4.2 Resource Manager for SaaS Platform

Like classical batch schedulers such as PBS (Feng et al., 2007) and OAR (Capit et al., 2005), the scheduling system is designed following a client/server architecture. In this section, we present the design of the server part, which consists of four main modules: an admission module, a scheduling module, an executive module, and a monitoring module. Schematically, the modules work together following the flow described in Figure 3.
4.2.1 Admission Module
In charge of handling user requests, it implements the following algorithm.
1: Listen for new requests.
2: Once a request arrives, the module parses the request to check its validity (the syntax of valid requests is well defined and known by the system). If the request is not valid, it is rejected and the user is notified, and the processing returns to step 1.
3: When the request is valid, the associated job is created, marked as pending, and added to the suitable service queue.
4: A wake-up signal is emitted towards the scheduling module, and the processing returns to step 1.

Figure 3: Overview of the resource manager workflow.
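The following minimal Python sketch mirrors this admission loop. It is not the SVMSched code: the parse_request and notify_user helpers, the Job object, and the queue layout are hypothetical placeholders.

import threading
from collections import deque

queues = {}                      # service name -> deque of pending jobs
wakeup = threading.Condition()   # used to signal the scheduling module

def admit(request):
    job = parse_request(request)                 # hypothetical: returns None if invalid
    if job is None:                              # step 2: reject malformed requests
        notify_user(request, "rejected: malformed request")
        return
    job.state = "pending"                        # step 3: mark as pending and enqueue
    queues.setdefault(job.service, deque()).append(job)
    with wakeup:                                 # step 4: wake the scheduling module
        wakeup.notify()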
4.2.2 Scheduling Module
In charge of selecting the jobs to execute, it implements the selection algorithms described in subsection 5.4.1. Its processing workflow is as follows.
1: Wait while there is no queued job.
2: Upon waking up, check whether there are idle resources. If there are, go to step 4; otherwise, continue to step 3.
3: If there are queued production jobs while best-effort jobs are running, try to preempt some of the latter to free sufficient resources to run the former. Otherwise, wait for resources to become idle (a wake-up signal is emitted when a job completes). The processing returns to step 2 after waking up.
4: Check whether there are queued production jobs. If there are, select a job to execute using the algorithm of subsection 5.4.1. Otherwise, select a queued best-effort job using the algorithm of subsection 5.4.2.
5: After selecting a job, forward it to the executive module (which is responsible for launching it), and return to step 2.
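The following Python sketch mirrors this loop under the same hypothetical helpers as the admission sketch above; the predicate and selection functions stand in for the algorithms of subsections 5.4.1 and 5.4.2.

import threading

wakeup = threading.Condition()   # shared with the admission module sketched above

def scheduling_loop():
    while True:
        with wakeup:
            while not queued_jobs():                     # step 1: sleep until work arrives
                wakeup.wait()
        if not idle_resources():                         # step 2
            if queued_production_jobs() and running_best_effort_jobs():
                preempt_newest_best_effort()             # step 3: free resources
            else:
                with wakeup:
                    wakeup.wait()                        # a job completion signals us
            continue                                     # back to step 2 via the loop top
        if queued_production_jobs():                     # step 4
            job = select_production_job()                # algorithm of subsection 5.4.1
        else:
            job = select_best_effort_job()               # algorithm of subsection 5.4.2
        if job is not None:
            launch(job)                                  # step 5: hand over to the executive module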
4.2.3 Executive Module
It is in charge of allocating resources and launching the execution of jobs. Aware that in the general case some applications can be distributed, we have decided to focus mainly on applications that require only one virtual machine, while the virtual machine can have several CPUs/cores. Otherwise, we would have had to introduce synchronization so that a job starts only once all of its virtual machines have started. The executive module proceeds as follows:
1: Wait while there is no task to carry out.
2: Once a job has to be launched, generate the configuration of the associated virtual machine. The configuration data consist of a virtual machine template (indicating, among other things, memory, CPU, swap, and the location of the image file). The template and the image file contain contextualization data that permit: (i) automatically setting up the execution environment and starting the job at virtual machine startup; (ii) sending back a notification when the job is completed.
3: Select a host whose free resources match the virtual machine's requirements. The host selection is based on a greedy algorithm that iterates through the host pool until it reaches a suitable host.
4: Request the underlying virtual infrastructure manager to instantiate the virtual machine.
5: Request the underlying virtual infrastructure manager to deploy the virtual machine onto the selected host.
6: Wait for the job to start, mark it as "running", update the pools, and return to step 1.
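As an illustration of step 2, the sketch below generates an OpenNebula-style template for a job's virtual machine. The attribute names (NAME, MEMORY, CPU, DISK, CONTEXT) follow the documented OpenNebula template format, but the context variables JOB_CMD and NOTIFY_URL are our own hypothetical contract with the virtual appliance, not OpenNebula built-ins.

def build_vm_template(job):
    # JOB_CMD is executed at VM boot by contextualization scripts baked into
    # the image; NOTIFY_URL is where the appliance reports job completion.
    return (
        f'NAME    = "job-{job.id}"\n'
        f'MEMORY  = {job.memory_mb}\n'
        f'CPU     = {job.cores}\n'
        f'VCPU    = {job.cores}\n'
        f'DISK    = [ SOURCE = "{job.image_path}", TARGET = "sda" ]\n'
        f'CONTEXT = [ JOB_CMD = "{job.command}", '
        f'NOTIFY_URL = "http://scheduler.example/done/{job.id}" ]\n'
    )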
4.2.4 Monitoring Module
This is a utility module. Among other things, it periodically gathers statistics (concerning, for instance, resource usage and queue states) and logs them. Such information can be used, for example, for analyzing trends and making forecasts.
5 FAIR AND EFFICIENT
SCHEDULING
It can be deduced from the workflow introduced in subsection 4.2 that the guarantee of fairness and efficiency highly depends on the job scheduling algorithms.
5.1 Scheduling Assumptions
Two classes of jobs can be distinguished: production jobs and best-effort jobs. Their scheduling obeys assumptions that aim at making tradeoffs, so as to be fair regarding the sharing while being as effective as possible regarding resource utilization.
5.1.1 Production Jobs
They consist of jobs associated with the execution of customer requests. For this reason, it is considered that they cannot be stopped or preempted, so as to guarantee end-to-end performance. Such jobs have a higher priority for accessing resources than best-effort jobs (introduced below). We also distinguish two subclasses of production jobs, long-term jobs and short-term jobs, according to a given threshold duration.
Concerning production jobs, the tradeoff consists in applying the share guarantee only to long-term jobs. This permits short-term jobs to fill in the fragmentation or waste of resources caused by static partitioning (Chakode et al., 2010), since they can be executed regardless of the sharing constraint. Therefore, resources can be assigned for executing long-term jobs only if the share guarantee is respected. Furthermore, with this assumption we expect a certain flexibility that improves the utilization of resources, while guaranteeing that a possible overuse (which can occur after allocating resources to short-term jobs) does not persist too long.
5.1.2 Best-effort Jobs
They consist of jobs associated with requests submitted by the software providers. These jobs have a lower priority than production jobs, and should be preempted when there are insufficient resources to run queued production jobs. The main idea behind allowing best-effort jobs is to improve resource utilization. Since they can be preempted at any time, best-effort jobs are scheduled regardless of the fair-sharing constraints. Thus, as long as there are sufficient free resources, those resources can be used to execute these jobs. In short, the policy used to select the jobs to be preempted is based on a newest-job-first order, which means that newer jobs are preempted before older ones. This policy aims at reducing the time required to preempt jobs (a job that has been running for a short time takes less time to preempt).
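A minimal sketch of this newest-job-first preemption policy follows; the job fields and the stop_and_requeue helper are hypothetical.

def free_resources_for(production_job, running_best_effort, idle_cores):
    # Newest-first: a job that has been running for a short time is cheaper
    # to preempt. stop_and_requeue is a hypothetical helper that shuts the
    # job's virtual machine down and puts the job back in its queue.
    for job in sorted(running_best_effort, key=lambda j: j.start_time, reverse=True):
        if idle_cores >= production_job.cores:
            break                      # enough resources have been freed
        stop_and_requeue(job)
        idle_cores += job.cores
    return idle_cores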
5.2 Scheduling Policies
Job scheduling is based on a multi-queue system. Each application has two queues: one for production jobs and one for best-effort jobs.
5.2.1 Queue Selection Policy
In order to avoid starvation of some queues, the selection of the service queue from which a job will be selected for execution relies on a round-robin policy. All available queues are grouped into two lists (a list of production queues and a list of best-effort queues). Each list is reordered after a job has been selected and planned for execution: the reordering moves the queue from which the job has been chosen to the end of the related list.
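A deque makes this reordering straightforward, as the following sketch (with hypothetical per-application queues) illustrates.

from collections import deque

# One production queue per application; empty deques stand in for real queues.
q_app1, q_app2, q_app3 = deque(), deque(), deque()
production_queues = deque([q_app1, q_app2, q_app3])

def on_job_planned(chosen_queue):
    # Move the queue we just served to the tail so the others get inspected first.
    production_queues.remove(chosen_queue)
    production_queues.append(chosen_queue)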
5.2.2 Production Job Selection Policy
The selection of the queued production job to be executed is based on a priority policy. Each request is assigned a priority according to a weight assigned to the associated customer. The weights reflect the customers' importance to the software providers. Backfilling is allowed when selecting a job from each queue.
5.2.3 Best-effort Job Selection Policy
The selection of the best-effort job to be executed is based on a first-come first-served (FCFS) policy. The queued jobs are planned onto resources according to their arrival dates, as long as there are sufficient idle resources. The policy also allows backfilling.
5.3 Share Guarantee and System Utilization

Fair sharing is enforced by assigning a ratio of resource usage to each application. Before executing a job, the scheduler checks that the associated service's resource usage would remain less than or equal to the granted ratio after allocating the resources. Resource usage is evaluated through a weighting relation among computational and I/O resources, similar to the weighting of Maui job priority factors:
usage = w_1 * mem_usage + w_2 * cpu_usage + w_3 * network_usage + ...

where each w_i is a weight assigned to a kind of resource. However, we have only considered memory and CPU resources, and used a simple weighting rule expressed as follows:

usage = max(mem_usage, cpu_usage)
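The check can be sketched as follows, using illustrative names for the service, job and cluster attributes; a job is admitted under the share guarantee only when the expected usage after allocation stays within the granted ratio. The capacities are illustrative values matching the testbed described in Section 6.

class Cluster:
    # Illustrative capacities: 6 worker nodes, 8 cores and 24 GB of RAM each.
    total_cores = 48
    total_mem_mb = 6 * 24 * 1024

def usage(service, cluster=Cluster):
    # An application's usage is the max of its memory and CPU ratios.
    return max(service.mem_in_use / cluster.total_mem_mb,
               service.cores_in_use / cluster.total_cores)

def respects_share(service, job, cluster=Cluster):
    # Expected usage if the job were started now; must stay within the grant.
    expected = max((service.mem_in_use + job.memory_mb) / cluster.total_mem_mb,
                   (service.cores_in_use + job.cores) / cluster.total_cores)
    return expected <= service.granted_ratio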
5.4 Job Selection Algorithms
We will present the essentials of the algorithms instead of specific implementation details. We assume that the CPU and memory requirements and the job duration are known (or evaluated in a prior stage).
5.4.1 Production Jobs Selection
The selection of the production job to execute takes into account the available idle resources, the priorities among jobs, the assigned ratios of resource usage, the job durations, and the submission dates. It allows backfilling and, as noted in subsection 5.1, the share-guarantee constraint is only applied to long-term jobs. An overview of the algorithm is shown in Figure 4, and it is further described in Algorithm 1.
5.4.2 Best-effort Jobs Selection
The algorithm used for selecting the best-effort job to be executed is similar to Algorithm 1, with one main difference: the selection is not subject to compliance with the share guarantee, and thus the associated resource usage is not accounted for in that check (steps 10 and onward).
Figure 4: Overview of the job selection principle. Produc-
tion queues are sorted according to job priorities.
Algorithm 1: Select the production job to execute.

Require: Sorted job queues. Let j and j′ be two jobs, q_p a queue of production jobs, and Q_p the set of all production queues, with j ≠ j′ and q_p ∈ Q_p. priority(j) is a function that returns the priority of a given job j; subTime(j) returns the submission time of j; rank(j, q) returns the rank of j in the queue q. Then, if priority(j) < priority(j′), or if priority(j) = priority(j′) while subTime(j) < subTime(j′), then rank(j, q_p) < rank(j′, q_p).

Ensure: If the variable found is true, then the variable selectedJob contains the job to be executed. Otherwise, no job can be executed while respecting the scheduling assumptions. When found is true, the value of the variable totalCompliance (true or false) indicates whether the fair-sharing constraint will still be fully respected after execution.

1: found ← false
2: totalCompliance ← false
3: q ← getNextCandidateQueue(Q_p)
4: while (totalCompliance = false) and (q ≠ endOfQueueSet(Q_p)) do
5:   service ← getService(q)
6:   grantedUsage ← getGrantedUsage(service)
7:   job ← getFirstJob(q)
8:   while (job ≠ endOfQueue(q)) and (totalCompliance = false) do
9:     if sufficientResourcesToRun(job) then
10:      nextUsage ← expectedUsage(job, service)
11:      if (nextUsage ≤ grantedUsage) or ((nextUsage > grantedUsage) and hasShortDuration(job)) then
12:        selectedQueue ← q
13:        selectedJob ← job
14:        found ← true
15:        if (nextUsage ≤ grantedUsage) then
16:          totalCompliance ← true
17:        end if
18:      end if
19:    end if
20:    job ← getNextJob(q)
21:  end while
22:  q ← getNextCandidateQueue(Q_p)
23: end while
24: if (found = true) then
25:   updateQueuesOrder(selectedQueue, Q_p)
26: end if
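For concreteness, the following Python transcription of Algorithm 1 makes the control flow explicit. It is a sketch rather than the SVMSched implementation: round_robin, sufficient_resources_to_run, expected_usage and update_queues_order are hypothetical helpers, and queues are assumed already sorted by priority and submission time as the Require clause demands.

def select_production_job(production_queues):
    # Returns (job, total_compliance); job is None when nothing can run.
    found = False
    total_compliance = False
    selected_job = selected_queue = None
    for q in round_robin(production_queues):      # getNextCandidateQueue
        if total_compliance:
            break
        granted = q.service.granted_ratio         # getGrantedUsage
        for job in q:                             # sorted by priority, then subTime
            if total_compliance:
                break
            if not sufficient_resources_to_run(job):
                continue                          # skip and try the next job: backfilling
            next_usage = expected_usage(job, q.service)
            if next_usage <= granted or job.is_short_term:
                selected_queue, selected_job, found = q, job, True
                if next_usage <= granted:
                    total_compliance = True       # fully compliant choice found
    if found:
        update_queues_order(selected_queue, production_queues)
    return selected_job, total_compliance

Note that iterating past jobs that do not fit, instead of blocking on the queue head, is what provides backfilling.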
6 EVALUATION

In order to validate the proposed resource manager, we have built a prototype named Smart Virtual Machines Scheduler (SVMSched). It is a multi-process application whose processes implement the components introduced in subsection 4.2.

6.1 Experimental Environment

The experimental platform consisted of 8 multicore nodes of the Grid5000 (https://www.grid5000.fr) Genepi cluster located in Grenoble.
Each node consists of two quad-core Xeon CPUs (2.27 GHz), 24 GB of RAM, and two NICs (1 Gbit/s Ethernet, 40 Gbit/s InfiniBand). We have used Xen version 3.4.2 as the backend VMM for OpenNebula, which has been deployed on six of the nodes acting as cluster nodes. OpenNebula version 1.4 and SVMSched have been installed on the two other nodes. We have used an NFS-based OpenNebula deployment where the images of virtual machines
are stored in an NFS-attached repository. The NFS server and the OpenNebula server are hosted on the same node. Each virtual machine must have a whole, non-zero number of dedicated CPU cores, allowing up to 48 virtual machines in the virtual infrastructure. The NFS system relies on the InfiniBand network through IP-over-InfiniBand, while the Ethernet network supports the infrastructure's virtual network.
6.2 Workload
We have simulated the system workload using a filtered workload from the SHARCNET log (http://www.cs.huji.ac.il/labs/parallel/workload/l_sharcnet/index.html). We have extracted the last 96 jobs related to partitions 8, 4, and 2 whose number of allocated CPUs is less than or equal to 8. The resulting workload consists of 96 jobs, among which 65 are related to partition 8, 12 to partition 4, and 19 to partition 2. In order to transpose this workload into our context, we have assumed that the requests related to partitions 8, 4, and 2 are associated with three applications or services denoted App1, App2, and App3, respectively.
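For reference, such a trace can be extracted with a few lines of Python. The sketch below assumes the log is in the Standard Workload Format (SWF) used by the Parallel Workloads Archive, where field 5 (1-indexed) is the number of allocated processors and field 16 is the partition number.

APP_OF_PARTITION = {8: "App1", 4: "App2", 2: "App3"}

def load_filtered_jobs(path, limit=96):
    jobs = []
    with open(path) as log:
        for line in log:
            if line.startswith(";") or not line.strip():
                continue                           # SWF comment/header lines
            fields = line.split()
            procs = int(fields[4])                 # allocated processors (field 5)
            partition = int(fields[15])            # partition number (field 16)
            if partition in APP_OF_PARTITION and procs <= 8:
                jobs.append((APP_OF_PARTITION[partition], procs))
    return jobs[-limit:]                           # keep the last 96 matching jobs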
6.3 Ability to Enforce the Share Guarantee

In a first experiment, we have evaluated the behavior of the resource manager when the share-guarantee constraint is applied to all jobs, regardless of their duration. Precisely, we have evaluated the ability to enforce this policy, denoted the Rigid Policy, and the level of resource utilization (amount of CPU cores used). Intuitively, and regardless of each application's load demand, we have assigned the same ratio of resource usage to each of them. Thus, each application can use up to one third of the resources, equivalent to 16 cores.
In Figure 5, we have plotted the amount of CPU cores used by each of the applications over time. The graph shows that the system manages to guarantee that none of the applications uses more resources than the amount allowed. For instance, even though App1 has a high demand, it never uses more than 16 cores. Conversely, we can observe that some applications, namely App2 and App3, use fewer resources than allowed; such a result is expected in this situation (Chakode et al., 2010). During the experiment, the maximum amount of resources used by all the applications together is 27 cores, against an average of 15 cores used over time, for a total duration of more than 35,000 seconds.
Figure 5: Plot of time versus the amount of CPU cores used when the Rigid Policy is enforced. The jobs related to a given application cannot use more than 16 cores in total.
6.4 Benefit of Flexibility: Staying Fair while Improving System Utilization

In a second experiment, we have evaluated the benefit of the Flexible Policy, which consists in applying the share-guarantee constraint only to long-term jobs. In this experiment, jobs with a duration of more than 30 minutes are considered long-term jobs. For information, the total duration of such jobs represents more than 80% of the total duration of all jobs.

Figure 6 shows that this policy improves the utilization of the whole system's resources. With a maximum of 40 cores and an average of 17.5 cores, against respectively 27 and 15 cores in the first experiment, the gain is equivalent to about 2.5 cores over the experiment, and thus reduces the total duration to around 32,700 seconds, against 35,000 seconds for the first experiment. Furthermore, executing short-term jobs related to a given application does not significantly delay the start of jobs related to other applications. For instance, in the first 10,000 seconds, the delay that could have been caused by App1's overuse of resources is not perceptible in the start times of jobs related to App2 and App3.
Figure 6: Plot of time versus the amount of CPU cores used when the Flexible Policy is enforced. The long-term jobs related to a given application cannot use more than 16 cores.

6.5 Utilization, Reliability, Performance, Robustness

In the previous results, it has appeared that the equal repartition does not reflect each application's demand for resources. This has led to a low utilization of the available resources. Ideally, in a sharing context such as the one we have considered (see Section 2), the software vendors sharing the infrastructure would invest to obtain ratios of resource usage according to their needs, which would be evaluated and dimensioned suitably. Therefore, in the next two experiments we have considered that each application has a usage ratio proportional to the number of jobs related to it. This corresponds to assigning the equivalent of 32.5, 9.5 and 6 cores to App1, App2 and App3, respectively. Additionally, in order to evaluate the behavior of the resource manager under high load, we have also considered that all 96 jobs are submitted to the system consecutively, without any delay.
In Figure 7, the graphs show that the system is still able to enforce the selected policy. We can also observe that, with a suitable dimensioning of the ratios, the utilization of resources is significantly improved, with averages of 27 and 34.5 cores used during the two experiments, against 15 and 17.5 cores in the first two experiments. This has therefore reduced the time required to complete all jobs.

We have also observed that the resource manager remains efficient under high load. Indeed, in both experiments, all submitted requests have been successfully handled and the associated jobs added to the suitable queues. The system required less than three seconds to handle the 96 requests and add them to the queues.

Figure 7: Plot of time versus the amount of CPUs used when the ratio of resource usage granted to each application is related to its demand.
7 CONCLUSIONS
In this paper, we have designed a resource manager to deal with the implementation of a Software-as-a-Service platform over an HPC cluster. We have first studied the design and proposed an architecture that relies on executing the jobs related to service requests in virtual machines. The proposed system relies on OpenNebula to deal with virtual machine life-cycle management. We have then considered a case in which the cluster is shared among a limited number of applications owned by distinct software providers. We have proposed policies and algorithms for scheduling jobs so as to allocate resources fairly among the applications, while improving the utilization of the cluster's resources. We have implemented a prototype that has been evaluated experimentally. Experimental results have shown the ability of the system to achieve the expected goals, while being reliable, robust and efficient. We think that the proposed framework would be useful for both industry and academics interested in SaaS platforms. It can also easily be extended to support the implementation of Platform-as-a-Service (PaaS) clouds.
After the prototype stage, some features should be either improved or added to the proposed resource manager before releasing a production-ready version. For instance, since it currently relies only on OpenNebula's XML-RPC authentication mechanism, the resource manager would need a stronger credential mechanism in order to be more secure. Furthermore, since it only supports applications based on a single virtual machine, it should be extended to support distributed applications. We intend to deal with these issues in our future work.
REFERENCES
Adobe PDF Online. http://createpdf.adobe.com/.
Enomaly Home. http://www.enomaly.com.
Force.com. http://www.salesforce.com/platform/.
Google Apps. http://www.google.com/apps.
Intacct Home. http://www.intacct.com/.
VMware Home. http://www.vmware.com/.
AMD (2005). AMD64 Virtualization Codenamed "Pacifica" Technology: Secure Virtual Machine Architecture Reference Manual. Publication No. 33047, Revision 3.01.
Sotomayor, B., Keahey, K., Foster, I., and Freeman, T. (2007). Enabling cost-effective resource leases with virtual machines. In Hot Topics session, ACM/IEEE International Symposium on High Performance Distributed Computing.
Capit, N., Da Costa, G., Georgiou, Y., Huard, G., Martin, C., Mounié, G., Neyron, P., and Richard, O. (2005). A batch scheduler with high level components. In Cluster Computing and the Grid.
Chakode, R., Méhaut, J.-F., and Charlet, F. (2010). High Performance Computing on Demand: Sharing and Mutualization of Clusters. In Proceedings of the 24th IEEE International Conference on Advanced Information Networking and Applications, pages 126–133.
Landy, G. K. and Mastrobattista, A. J. (2008). The IT / Digital Legal Companion: A Comprehensive Business Guide to Software, IT, Internet, Media and IP Law, pages 351–374. Burlington: Elsevier.
Feng, H., Misra, V., and Rubenstein, D. (2007). PBS: a unified priority-based scheduler. In SIGMETRICS, pages 203–214.
Intel Corporation (2006). Intel Virtualization Technology.
Intel Technology Journal, 10(3).
Jackson, D. B., Snell, Q., and Clement, M. J. (2001). Core
algorithms of the maui scheduler. In Revised Papers
from the 7th International Workshop on Job Schedul-
ing Strategies for Parallel Processing, pages 87–102.
Springer-Verlag.
Jones, T. Linux virtualization and PCI passthrough. http://www.ibm.com/developerworks/linux/library/l-pci-passthrough/.
Kay, J. and Lauder, P. (1988). A fair share scheduler. Com-
mun. ACM, 31(1):44–55.
Keahey, K., Foster, I., Freeman, T., and Zhang, X. (2005).
Virtual workspaces: Achieving quality of service and
quality of life in the grid. Sci. Program., 13:265–275.
Lawson, B. G. and Smirni, E. (2002). Multiple-queue backfilling scheduling with priorities and reservations for parallel systems. In Job Scheduling Strategies for Parallel Processing, pages 72–87. Springer-Verlag.
Li, L. and Franks, G. (2009). Performance modeling of systems using fair share scheduling with layered queueing networks. In Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS '09), IEEE International Symposium on, pages 1–10.
Mergen, M. F., Uhlig, V., Krieger, O., and Xenidis, J.
(2006). Virtualization for high-performance comput-
ing. SIGOPS Oper. Syst. Rev., 40(2):8–11.
Nurmi, D., Wolski, R., Grzegorczyk, C., Obertelli, G., So-
man, S., Youseff, L., and Zagorodnov, D. (2009). The
Eucalyptus open-source cloud-computing system. In
9th IEEE/ACM International Symposium on Cluster
Computing and the Grid, volume 0, pages 124–131.
IEEE.
Sotomayor, B., Montero, R. S., and Foster, I. (2009a).
An Open Source Solution for Virtual Infrastructure
Management in Private and Hybrid Clouds. Preprint
ANL/MCS-P1649-0709, 13.
Sotomayor, B., Montero, R. S., Llorente, I. M., and Foster,
I. (2009b). Virtual Infrastructure Management in Pri-
vate and Hybrid Clouds. IEEE Internet Computing,
13:14–22.
Turner, M., Budgen, D., and Brereton, P. (2003). Turning
Software into a Service. Computer, 36(10):38–44.
Vaquero, L. M., Rodero-M., L., Caceres, J., and Lindner, M.
(2009). A break in the clouds: towards a cloud defini-
tion. SIGCOMM Comput. Commun. Rev., 39(1):50–
55.
Yu, W. and Vetter, J. S. (2008). Xen-Based HPC: A Paral-
lel I/O Perspective. Cluster Computing and the Grid,
IEEE International Symposium on, 0:154–161.