Secure Scheduling of Scientific Workflows in Cloud

Shubhro Roy, Arun Ramamurthy, Anand Pawar^{2,∗}, Mangesh Gharote and Sachin Lodha

TCS Research, Tata Consultancy Services, India

Indian Institute of Information Technology, Pune, India

sachin.lodha@tcs.com

Keywords: Scientific Workflow Scheduling, Cloud Computing, Security in Cloud, Task Scheduling, Evolutionary Optimization Algorithms.

Abstract: Scheduling tasks in a scientific workflow is challenging because the tasks are heterogeneous and interdependent. Scheduling involves selecting different types of virtual machines (VMs) belonging to different instance series (computing, memory, storage) to minimize the overall execution cost and time (makespan). Apart from VM selection, the selection of security services (such as authentication, integrity, confidentiality) is critical. In this paper, we propose OptReUse, a workflow schedule generation algorithm for efficient reuse of VMs. OptReUse, combined with a combinatorial optimization approach, gives a lower scheduling cost than the prior art without incurring any delay in the makespan. Further, we enhance the security model with a more accurate estimation of risks. Our experiments using standard scientific workflows demonstrate that the proposed method gives lower costs compared to prior VM resource utilization methods.

1 INTRODUCTION

Scientiﬁc workﬂow consists of several heterogeneous

and interdependent computational tasks. Such work-

ﬂows could occur in various research areas, such as in

physics, bioinformatics, seismology, industrial con-

trol systems, etc. Large-scale computational infras-

tructure is needed to schedule them. Scheduling in-

volves processing the tasks on selected computational

resources considering the dependency order. Since

Cloud Service Providers (CSPs) offer highly scalable

computational resources, such as Virtual Machines

(VMs) at pay-per-use model, cloud has emerged as a

cost- and time-effective platform for solving scientiﬁc

workﬂow problems.

CSPs offer different series of VMs such as com-

putation intensive, memory intensive and data inten-

sive, for processing heterogeneous tasks. Each series

of VMs contains different VM types which vary in

renting cost and processing capacity. Higher process-

ing capacity VMs help in lowering the workﬂow exe-

cution time (makespan) but may lead to higher rental

cost. While lower processing capacity VMs could be cheaper, processing all tasks on them might lead to violation of deadlines. So, selecting the right VM combination for heterogeneous tasks becomes a challenging combinatorial optimization problem.

∗ Intern, TCS Research

2 RELATED WORK

Liu et al. (2020) present a detailed survey on scientific workflow scheduling algorithms.

Finding the right combination of VMs is a challeng-

ing combinatorial optimization problem (Zhou et al.,

2019; Mboula et al., 2020). As this problem is NP-

Hard (Hilman et al., 2018), researchers have pro-

posed different evolutionary optimization algorithms

such as Genetic Algorithm (GA) (Shishido et al.,

2018), Particle Swarm Optimization (PSO) (Li et al.,

2016; Shishido et al., 2018), Improved PSO (Peng and

Wolter, 2019), Frog Leaping Algorithm (Kaur and

Mehta, 2017), Fireﬂy based approach (Adhikari et al.,

2020), and deadline constrained co-evolutionary GA-

based algorithm (Liu et al., 2017).

Evolutionary optimization algorithms are used to

determine the optimal VM combinations for process-

ing the tasks in the workﬂow. A separate Workﬂow

Schedule Generation (WSG) algorithm is required to

efﬁciently schedule each of the tasks on the selected

VM combination. The WSG algorithm computes the workflow execution cost and makespan based on the selected VM combinations. Further, researchers have proposed different VM resource utilization strategies to reduce the workflow execution cost. Lee et al. (2015) proposed a resource utilization strategy that delays a combination of a few tasks. This strategy achieved cost benefits but increased the makespan. Malawski et al. (2015) used a similar strategy of delaying tasks to improve resource utilization. They proposed a decision algorithm for scheduling workflows, considering budget and deadline constraints. However, if the increase in makespan resulted in a deadline violation, new VMs with higher processing capacity were selected, which resulted in higher rental cost.

Roy, S., Ramamurthy, A., Pawar, A., Gharote, M. and Lodha, S.
Secure Scheduling of Scientific Workflows in Cloud.
DOI: 10.5220/0011038600003200
In Proceedings of the 12th International Conference on Cloud Computing and Services Science (CLOSER 2022), pages 170-177
ISBN: 978-989-758-570-8; ISSN: 2184-5042

Further, (Li et al., 2016) and (Shishido et al.,

2018) proposed a WSG algorithm where VMs were

reused among different tasks and cost beneﬁts were

achieved without delay in makespan. Thus in sum-

mary, prior art has addressed different heuristics and

evolutionary algorithms (Challita et al., 2017) for bet-

ter resource utilization but most have not included

the concept of VM reuse or do not explore all possible VM reuse options. In this paper, we propose the OptReUse algorithm, which overcomes the above limitations and obtains higher cost savings than WSG. Specifically, we reduce the cost by exploring all possible VM reuse options and by not imposing any compulsory delay on tasks. Weng et al. (2016) propose other VM utilization strategies based on monitoring overheads. Further, Ramamurthy et al. (2021) address the problem of VM allocation for scientific workflows considering multiple objectives and uncertainties.

Another critical factor in scheduling of scientiﬁc

workﬂows in cloud is security. The security in cloud

is a shared responsibility (Kumar et al., 2018). CSPs

provide security services to mitigate different attacks.

We focus on three crucial security attacks: alteration, snooping, and spoofing. Alteration affects

the data integrity, snooping affects conﬁdentiality and

spooﬁng provides illicit access to sensitive data. To

mitigate these attacks, different cryptographic algo-

rithms are used for task security (Li et al., 2016;

Shishido et al., 2018). However, cryptographic algo-

rithms that provide stronger security have higher over-

heads in terms of computation time, which increase

the makespan and operational cost. Further, not all tasks in a workflow require the same level of security services. Thus, the problem of scheduling a scientific workflow involves selecting the right combination of VMs and security model (i.e. different security services and levels) to keep the risk rate below a permissible limit.

The security model proposed in the prior art (Li

et al., 2016) considers that each task would require

cryptographic security services for a limited period,

more speciﬁcally, for at most one hour. But tasks can

continue to use a VM for several hours till the pro-

cessing is complete. Hence, risk is underestimated in

the security model in prior art (Li et al., 2016). We

propose an enhanced security model that keeps the

risk rate of the workﬂow below a permissible limit

while accurately estimating the risk. Thus, the main

contributions of the paper are:

• OptReUse algorithm for efﬁcient reuse of VMs to

obtain cost beneﬁts, and

• An enhanced security model, and more accurate

estimation of risks.

The rest of the paper is organized as follows: Section 3 describes the problem and the system model. Section 4 compares the WSG and OptReUse algorithms. Section 5 presents coding strategies for GA and PSO and contains the experimental results. Conclusions and future work are laid out in Section 6.

3 WORKFLOW MODEL

A scientific workflow is modeled as a Directed Acyclic Graph (DAG) and represented as $DAG(T, E)$, where $T$ represents the set of tasks $\{t_0, \ldots, t_{n-1}\}$ modeled as vertices and $E$ represents the set of edges $\{e_1, \ldots, e_m\}$ modeled as dependencies between the tasks. The predecessors and successors of a task $t_i$ are represented as $pre(t_i)$ and $suc(t_i)$. A task $t_i$ cannot start until all its predecessor tasks have completed their execution. A sample workflow is shown in Figure 2.

CSPs provide different series of VMs for processing heterogeneous tasks. We consider three series of VMs, namely Computing Optimized, Memory Optimized and Storage Optimized (refer to Tables 2, 3 and 4 of (Li et al., 2016)). Each VM is rented based on an hourly pricing model. For example, if a task has a processing time of 1 hour and 10 minutes on a VM $vm_{s,k}$, the user must pay for 2 hours. We assume there is no bound on the number of VMs that can be rented from a CSP.

The scientiﬁc workﬂows running on cloud are vul-

nerable to different types of security attacks. In this

work, we have considered three types of attacks: 1)

Snooping attack (theft of information), 2) Alteration

attack (modiﬁcation of information), 3) Spooﬁng at-

tack (deceitful access to information). To protect the

VMs against these attacks, security services such as

authentication, integrity and conﬁdentiality are used.


Different security algorithms under these security ser-

vices are listed in (Xie and Qin, 2006) (Tables 1, 2 and

3), (Li et al., 2016) (Table 5). As discussed in these

works, the algorithms differ in the offered security

level and the associated overhead. In general, an al-

gorithm with a higher security level has a higher over-

head than the one with a lower security level. Hence,

using lower levels of security services can reduce cost and makespan but increases the attack probability, and vice versa. The security overheads for the three services ($a$, $g$, $c$) are computed as given in (Li et al., 2016) and (Xie and Qin, 2006). The overall security overhead for task $t_i$ is the sum of the individual security overheads:

$$TSC(t_i) = SC^{a}(t_i) + SC^{g}(t_i) + SC^{c}(t_i) \tag{1}$$

The security overhead signiﬁcantly increases the task

processing time and leads to higher task execution

cost. Hence, optimal selection of security levels

for each security service is essential for minimizing

workflow execution cost and makespan.

The pictorial representation of task processing on VMs is given in (Li et al., 2016) (Figure 2); the equations for the task execution process analysis are modified in our paper. For processing a task (say, $t_i$), the output data (GB) from the VM of its predecessor task (say, $t_j$) needs to be transferred to the VM of $t_i$. The total input data transfer time for a task is given as

$$TT(t_i) = \sum_{t_j \in pre(t_i)} d_j / B \tag{2}$$

Note that $d_j = 0$ if $t_i$ reuses the same VM instance type allocated to $t_j$, since data transfer is not required when a VM is reused. The transfer time for start tasks is assumed to be zero.

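The execution time and total processing time (including all security overheads) for a task are given by

$$ExT(t_i, vm_{s,k}) = \frac{W_i}{p_{s,k}} \tag{3}$$

$$PT(t_i, vm_{s,k}) = TT(t_i) + ExT(t_i, vm_{s,k}) + TSC(t_i) \tag{4}$$

where $W_i$ is the task workload and $p_{s,k}$ is the processing capacity of $vm_{s,k}$. The start time and end time of a task are related as

$$ET(t_i) = ST(t_i) + PT(t_i, vm_{s,k}) \tag{5}$$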
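The timing relations in Equations 2 to 5 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Task` class, the function names, and the example numbers are assumptions made here, and all quantities are assumed to be expressed in mutually consistent units.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    workload: float           # task workload W
    security_overhead: float  # total security overhead TSC, already a time
    # One (data_size, on_same_vm) pair per predecessor output (hypothetical layout).
    inputs: list = field(default_factory=list)


def transfer_time(task, bandwidth):
    """Eq. 2: predecessor outputs divided by bandwidth; data on a reused VM counts as 0."""
    return sum(size for size, on_same_vm in task.inputs if not on_same_vm) / bandwidth


def execution_time(task, capacity):
    """Eq. 3: workload divided by the processing capacity of the VM."""
    return task.workload / capacity


def processing_time(task, capacity, bandwidth):
    """Eq. 4: transfer time + execution time + security overhead."""
    return (transfer_time(task, bandwidth)
            + execution_time(task, capacity)
            + task.security_overhead)


def end_time(start, task, capacity, bandwidth):
    """Eq. 5: end time = start time + processing time."""
    return start + processing_time(task, capacity, bandwidth)


# Made-up example: one input arrives from another VM, one from a reused VM.
example = Task(workload=6000.0, security_overhead=100.0,
               inputs=[(10.0, False), (20.0, True)])
print(end_time(50.0, example, capacity=10.0, bandwidth=0.5))  # 770.0
```

Only the 10-unit input contributes to the transfer time in this example, because the 20-unit input is produced on the VM that the task reuses.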

The Total Execution Cost for the entire workflow can then be computed as

$$TEC = \sum_{i=0}^{n-1} \left\lceil \frac{ET(t_i) - ST(t_i) - IT(t_i, vm_{s,k})}{60} \right\rceil c_{s,k} \tag{6}$$

where $IT(t_i, vm_{s,k})$ is the idle time available on the VM and $c_{s,k}$ is its renting cost. Since VMs are borrowed on an hourly basis, idle time exists between the completion of a task and the end of that hour slot. The Total Execution Time ($TET$), that is, the makespan of a workflow, is given by

$$TET = \max\{ET(t_i) \mid t_i \in T\} \tag{7}$$
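As a small illustration of Equations 6 and 7, the sketch below bills each task in whole hours after subtracting any idle time it reuses, and takes the makespan as the latest end time. The tuple layout and the prices are assumptions for illustration; times are in minutes.

```python
import math


def billed_hours(start_min, end_min, idle_min=0.0):
    """Whole hours charged for a task after subtracting reused idle time."""
    return math.ceil((end_min - start_min - idle_min) / 60)


def total_execution_cost(tasks):
    """Eq. 6 sketch: tasks is a list of (start, end, idle_used, hourly_cost)."""
    return sum(billed_hours(s, e, idle) * cost for s, e, idle, cost in tasks)


def makespan(tasks):
    """Eq. 7 sketch: latest end time over all tasks."""
    return max(end for _, end, _, _ in tasks)


# A 70-minute task is billed for 2 hours (the 1 h 10 min example in the text);
# a 125-minute task that reuses 20 idle minutes is billed for 2 hours, not 3.
tasks = [(0, 70, 0, 1.0), (160, 285, 20, 0.5)]
print(total_execution_cost(tasks), makespan(tasks))  # 3.0 285
```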

Table 1: Notation.

Symbol                   Definition and units
$t_i$                    An individual task of a workflow
$vm_{s,k}$               VM of instance series s and type k
$p_{s,k}$                Processing Capacity (MFLOPS)
$c_{s,k}$                Renting cost ($/hr)
q = {a, g, c}            Set of security services
a, g                     Authentication, Integrity
c                        Confidentiality
$sr_i^l$                 Required security levels
$sl_i^l$                 Provided security level
$SC^l(t_i)$              Individual security overhead
$TSC(t_i)$               Total security overhead
$d_j$                    Output data from predecessor task
B                        Bandwidth of data transfer (GB/s)
$TT(t_i)$                Total input data transfer time (min)
$W_i$                    Task workload (MFLOP)
$ExT(t_i, vm_{s,k})$     Task execution time on $vm_{s,k}$
$PT(t_i, vm_{s,k})$      Total task processing time on $vm_{s,k}$
$ST(t_i)$, $ET(t_i)$     Start and End time of task (hr)
$TEC$                    Total Execution Cost ($)
$IT(t_i, vm_{s,k})$      Available idle time on $vm_{s,k}$ (min)
$TET$                    Total Execution Time (hr)

3.1 Risk Analysis

Different security services are provided to mitigate the risk of attacks. There can be a difference between the security services required and the services provided for a particular workflow. Hence, the risk of attack exists.

The security model considered in the prior art (Li et al., 2016) assumed the risk rate over a unit time interval, whereas workflow execution can span several time intervals (hours). Hence, the security model needs to be enhanced. The risk probability of attack for a given task, $P(t_i, sl_i^l)$, is given as

$$P(t_i, sl_i^l) = 1 - \exp\left(-\lambda^l (sr_i^l - sl_i^l) N(t_i)\right) \tag{8}$$

where $N(t_i)$ is the number of time intervals (each time interval could be an hour) for which $t_i$ is executed on the VM. For a given task $t_i$, the arrival rate ($\lambda^l$) of snooping, alteration and spoofing attacks is assumed to follow a Poisson distribution. Consequently, the risk probability $P(t_i)$ for task $t_i$, involving all three security services, is computed as

$$P(t_i) = 1 - \prod_{l \in \{a,g,c\}} \left(1 - P(t_i, sl_i^l)\right) \tag{9}$$

For a given workflow consisting of a set of tasks $T$, the risk probability $P(T)$ can be computed as

$$P(T) = 1 - \prod_{t_i \in T} \left(1 - P(t_i)\right) \tag{10}$$
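The risk model of Equations 8 to 10 can be sketched as follows. The function names and the example rates and levels are hypothetical. Clamping the gap between required and provided level at zero is an assumption added here so that over-provisioned security yields zero risk rather than a negative probability; the equations above do not state this explicitly.

```python
import math


def service_risk(rate, required, provided, intervals):
    """Eq. 8 sketch: risk for one security service over `intervals` time intervals."""
    gap = max(required - provided, 0.0)  # assumption: no risk when provided >= required
    return 1.0 - math.exp(-rate * gap * intervals)


def task_risk(services, intervals):
    """Eq. 9 sketch: services maps 'a', 'g', 'c' to (rate, required, provided)."""
    survive = math.prod(1.0 - service_risk(rate, req, prov, intervals)
                        for rate, req, prov in services.values())
    return 1.0 - survive


def workflow_risk(task_risks):
    """Eq. 10 sketch: combine the per-task risks of a workflow."""
    return 1.0 - math.prod(1.0 - p for p in task_risks)


# Made-up levels: the same task run for 1 interval and for 3 intervals.
levels = {"a": (1.0, 0.8, 0.6), "g": (0.5, 0.9, 0.9), "c": (2.0, 0.7, 0.5)}
short_run = task_risk(levels, intervals=1)
long_run = task_risk(levels, intervals=3)
print(short_run < long_run)  # True
```

The comparison at the end reflects the point of the enhanced model: a task that occupies a VM for more time intervals accumulates more risk than the unit-interval model would report.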

This value of $P(T)$ must not exceed the risk rate threshold $P_{th}$ ($P_{th} \in [0, 1]$), which is the permissible risk rate of the workflow. The constraint $P(T) \le P_{th}$ can also be written as $1 - P(T) \ge 1 - P_{th}$, that is,

$$\prod_{t_i \in T} \left(1 - P(t_i)\right) \ge 1 - P_{th} \tag{11}$$

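On further expanding the LHS,

$$\prod_{t_i \in T} \prod_{l \in \{a,g,c\}} \left(1 - P(t_i, sl_i^l)\right) \ge 1 - P_{th} \tag{12}$$

$$\prod_{t_i \in T} \prod_{l \in \{a,g,c\}} \exp\left(-\lambda^l (sr_i^l - sl_i^l) N(t_i)\right) \ge 1 - P_{th} \tag{13}$$

Taking the log on both sides, the inequality becomes:

$$\sum_{t_i \in T} \sum_{l \in \{a,g,c\}} -\lambda^l (sr_i^l - sl_i^l) N(t_i) \ge \log(1 - P_{th}) \tag{14}$$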

By introducing the correction factor $N(t_i)$, the security model is enhanced and it provides a better estimation of risk. However, providing security levels

to tasks lower than required could result in violation

of risk rate constraint whereas higher security levels

could result in high cost and makespan. This makes

the workﬂow scheduling problem challenging.
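The log form of the risk constraint is convenient to evaluate because it avoids multiplying many probabilities. The sketch below checks that form against the direct product form; the function names, the flat list of terms, and the numbers are hypothetical, and it assumes the required level is at least the provided level and that the permissible risk limit is strictly below 1.

```python
import math


def log_survival(terms):
    """Sum of -rate * (required - provided) * intervals over all tasks and services."""
    return sum(-rate * (req - prov) * n for rate, req, prov, n in terms)


def satisfies_risk_limit(terms, p_limit):
    """Log-form check: sum of exponents >= log(1 - permissible risk)."""
    return log_survival(terms) >= math.log(1.0 - p_limit)


def workflow_risk_direct(terms):
    """Product form, for comparison: 1 - prod(exp(-rate * gap * intervals))."""
    return 1.0 - math.prod(math.exp(-rate * (req - prov) * n)
                           for rate, req, prov, n in terms)


# One entry per (task, service): (rate, required, provided, intervals).
terms = [(1.0, 0.8, 0.6, 2), (0.5, 0.9, 0.5, 1)]
print(satisfies_risk_limit(terms, 0.5), satisfies_risk_limit(terms, 0.4))  # True False
```

For these made-up values the workflow risk is 1 - exp(-0.6), roughly 0.45, so a limit of 0.5 is satisfied and a limit of 0.4 is violated, in both forms of the constraint.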

4 OUR APPROACH

The flow chart in Figure 1 shows our methodology for scientific workflow scheduling. As stated earlier, we use an evolutionary optimization algorithm (GA or PSO) to obtain the best combination of VM types and security levels for the tasks in a workflow. Further, efficient workflow schedule generation is required to allocate the selected VMs so as to lower the cost and the makespan. The efficient workflow schedule is generated using the OptReUse algorithm, which performs VM reuse along with schedule generation. The process stops when the convergence criterion is met (i.e. the best solution is obtained).

Problem Statement: Given a scientific workflow with $n$ tasks, the problem is to determine the optimal combination of VM types and security levels for each task such that the overall execution cost is minimal, and the workflow is processed within a given deadline ($T_D$) and permissible risk limit.

Objective:

$$\text{Minimize } TEC \tag{15}$$

Subject to:

$$TET \le T_D \tag{16}$$

$$P(T) \le P_{th} \tag{17}$$

Since the scientific workflow scheduling problem is a combinatorial optimization problem and NP-Hard (Li et al., 2016), we use an evolutionary optimization algorithm for solving the problem. We demonstrate the results using both GA and PSO. In the section below, we first illustrate the benefits of VM re-utilization and, in the subsequent section, discuss the intuition behind our workflow schedule generation algorithm, OptReUse.

Figure 1: Approach for Scientific Workflow Scheduling. (Flow chart: input workflow task parameters; generate a set of combinations using an evolutionary optimization algorithm; workflow schedule generation using OptReUse; convergence check; on YES, output the best combination of VM, security services and task order selection.)

4.1 Intuition: WSG and OptReUse

If each task is allocated to a separate VM, the resources are underutilized. For tasks with a similar instance type, a VM can be reused, resulting in reduced rental costs and lower data transfer delays. WSG, proposed by (Li et al., 2016) and (Shishido et al., 2018), helps to reduce the cost and transfer time delays by reusing VMs among adjacent tasks. For example, consider the workflow instance shown in Figure 2. The tasks are traversed in a traversal order, such as a topological sort, and simultaneously allocated to VMs based on the VM types selected by the evolutionary optimization algorithm. In Figure 2, this enables the task with PT = 80 mins to reuse the VM allocated to the first task (PT = 70 mins) and avoid extra data transfer delays.

Figure 2: Sample workflow with task processing time. (Five tasks, values in minutes: PT = 70, ST = 0, ET = 70, IT = 50; PT = 80, ST = 70, ET = 150, IT = 30; PT = 90, ST = 70, ET = 160, IT = 30; PT = 125, ST = 160, ET = 285, IT = 55 with WSG and IT = 15 with OptReUse; PT = 200, ST = 285, ET = 485, IT = 40. Series labels in the figure: data intensive, compute intensive, data intensive, memory intensive.)

For the first two tasks (PT = 70 mins and PT = 80 mins), which share a VM, the total cost to be paid is for 150 minutes (i.e. 3 hours), and for the remaining tasks new VMs need to be borrowed. For the task with PT = 125 mins, which belongs to the same VM instance series (data intensive), a new VM is borrowed for 125 mins (i.e. 3 hours). The WSG approach proposed in prior art (Li et al., 2016), (Shishido et al., 2018) limits VM reuse to adjacent tasks only.

The VM on which the 80-minute task is processed has an idle time of 30 minutes (180 − (70 + 80)). The 125-minute task, which is not adjacent to the 80-minute task, can also avail this idle time; note that the 70-minute, 80-minute and 125-minute tasks belong to the same instance series. However, the 125-minute task starts only after the completion of the 90-minute task (i.e. ST = 160 mins). At that point, the idle time still available on the VM of the 80-minute task is 20 minutes (180 − 160), which can be utilized by the 125-minute task. Hence, the VM rental cost for the 125-minute task has to be paid only for 105 minutes (125 − 20) (i.e. 2 hours), which is lower than the cost paid for it using the WSG approach. Our proposed OptReUse algorithm is based on this VM reuse strategy. Along with this strategy, task ordering has a significant impact on VM reuse. In the subsequent section, we elaborate on how the ordering of tasks belonging to the same VM instance series can further reduce the VM rental cost.
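The arithmetic of this example can be checked with a short sketch. The variable and function names are chosen here for illustration; the minute values are the ones quoted in the text and in Figure 2.

```python
import math


def hours(minutes):
    """Whole hours billed for a given number of minutes."""
    return math.ceil(minutes / 60)


def reuse_example():
    shared_vm_hours = hours(70 + 80)             # 70- and 80-minute tasks share a VM
    paid_until = shared_vm_hours * 60            # that VM is paid for up to minute 180
    idle_after_second = paid_until - (70 + 80)   # idle time once the 80-minute task ends
    third_start = 160                            # start time of the 125-minute task
    idle_at_start = paid_until - third_start     # idle time still left at minute 160
    return {
        "shared_vm_hours": shared_vm_hours,
        "idle_after_second": idle_after_second,
        "idle_at_start": idle_at_start,
        "wsg_hours": hours(125),                       # new VM under WSG
        "optreuse_hours": hours(125 - idle_at_start),  # idle time reused
    }


print(reuse_example())
```

With these numbers the 125-minute task is billed for 3 hours on a new VM and for 2 hours when it first consumes the 20 idle minutes.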

4.2 Task Order Selection and VM Reuse

Suppose three tasks are part of a workflow, as displayed in Figure 3: a first task (PT = 40 mins) followed by two tasks with PT = 110 mins and PT = 130 mins, both starting at ST = 40 mins. Assume these tasks belong to the same VM instance series (data intensive), and that the least expensive VM types are selected for them. In the first workflow in Figure 3, the 110-minute task reuses the VM of the first task; in the second workflow, the 130-minute task reuses it.

• If the 110-minute task reuses the VM of the first task, then the total cost to be paid for these two tasks is for 150 minutes (i.e. 3 units) and the cost for the 130-minute task has to be paid for 130 minutes (i.e. 3 units). Thus, the total cost to be paid for all three tasks is 6 units.

• If the 130-minute task reuses the VM of the first task, then the total cost to be paid for these two tasks is for 170 minutes (i.e. 3 units) and the cost for the 110-minute task is for 110 minutes (i.e. 2 units). Thus, the total cost for the second workflow is 5 units.

Figure 3: Impact of order selection on VM reuse. (Values in the figure, in minutes. Workflow (1): first task PT = 40, ST = 0, ET = 40, IT = 20, cost = 1 unit; reusing task PT = 110, ST = 40, ET = 150, IT = 30, cost = 2 units; separate task PT = 130, ST = 40, ET = 170, IT = 50, cost = 3 units. Workflow (2): first task as in (1); reusing task PT = 130, ST = 40, ET = 170, IT = 10, cost = 2 units; separate task PT = 110, ST = 40, ET = 150, IT = 10, cost = 2 units.)

Thus, task order impacts VM reuse and cost. In our approach, along with VM re-utilization across adjacent and non-adjacent tasks, the benefit due to task ordering is considered.
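The effect of task order on cost in this example can be reproduced with a few lines. This is a sketch with hypothetical function names, using the processing times of 40, 110 and 130 minutes from Figure 3 and one cost unit per billed hour.

```python
import math


def billed_units(minutes):
    """Cost units (whole hours) billed for a given number of minutes."""
    return math.ceil(minutes / 60)


def total_units(first_pt, reusing_pt, separate_pt):
    """First task and the reusing task share one VM; the other task rents its own."""
    return billed_units(first_pt + reusing_pt) + billed_units(separate_pt)


options = {
    "110-minute task reuses": total_units(40, 110, 130),
    "130-minute task reuses": total_units(40, 130, 110),
}
best = min(options, key=options.get)
print(options, best)
```

Letting the longer task reuse the VM costs 5 units instead of 6, which is why the order among tasks with the same start time is left to the optimizer.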

4.3 OptReUse Algorithm

The best combination of VM types and security levels

for workﬂow tasks are determined by an evolutionary

optimization algorithm (like GA or PSO). Different

combinations are explored to obtain the best combina-

tion using evolutionary optimization algorithm. Each

combination is represented as a chromosome (GA) or

particle (PSO). The OptReUse algorithm computes the execution cost ($TEC$), time ($TET$) and risk rate ($P(T)$) for each such combination to determine the fitness value. The OptReUse algorithm comprises two parts: Algorithms 1 and 2.

Using Algorithm 1, we compute the start time and

the end time of each task, for the given combination

of VM types and security levels. The other inputs to

Algorithm 1 are processing capacity, workload, out-

put data size of tasks, which assist in computing the

ST and ET. The ST of a task is the same as the ET of its predecessor task that completed last. First, we sort the ST array in ascending order. Then, using the task order selection vector provided by the evolutionary algorithm, we rearrange the tasks that have the same start time; the task order vector covers only such tasks.

Algorithm 1: Start Time Computation and Task Ordering.

INPUT: VM and security services for n tasks
INPUT: Task order vector based on start time
INPUT: Processing capacity, workload, output data size, VM rent cost
for i = 1, 2, ..., n tasks do
    Compute PT[i] for each task.
end for
for i = 1, 2, ..., n tasks do
    ST[i] = max{ET[j] : j ∈ pre(i)}
    Calculate ET[i]
end for
Sort ST in ascending order, ordering tasks with equal ST by the task order selection
OUTPUT: ST, ET
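A minimal Python sketch of Algorithm 1 is given below. It assumes tasks are indexed in a topological order, that processing times are already known and fixed, and that the order selection vector is a list of tie-break keys; the function name and the predecessor lists of the example are hypothetical, chosen only to reproduce the start and end times shown in Figure 2.

```python
def compute_times(pt, preds, tie_break):
    """Return start times, end times and the task order sorted by start time.

    pt        : processing time of each task (topological index order assumed)
    preds     : dict mapping a task index to the list of its predecessor indices
    tie_break : one key per task, used only to order tasks with equal start time
    """
    n = len(pt)
    st = [0.0] * n
    et = [0.0] * n
    for i in range(n):
        # A task starts when its last predecessor ends; start tasks begin at 0.
        st[i] = max((et[j] for j in preds.get(i, [])), default=0.0)
        et[i] = st[i] + pt[i]
    order = sorted(range(n), key=lambda i: (st[i], tie_break[i]))
    return st, et, order


# Processing times from Figure 2; the edges below are an assumption.
pt = [70, 80, 90, 125, 200]
preds = {1: [0], 2: [0], 3: [2], 4: [3]}
st, et, order = compute_times(pt, preds, tie_break=[0, 1, 0, 0, 0])
print(st, et, order)
```

In this run tasks 1 and 2 both start at minute 70, and the tie-break vector places task 2 ahead of task 1 in the resulting order.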

The output of Algorithm 1 is used as input to Al-

gorithm 2. Before allocating a task i to a new VM,

all possible underutilized VMs of the same instance

series and type are explored for reuse.


• If the task reuses a VM used by an adjacent task, then the reduction in cost is due to (i) re-utilization of the available idle time on the VM and (ii) the data transfer costs between the tasks being zero.

• If the task reuses a VM used by a non-adjacent

task then reduction in cost is only due to re-

utilization of available idle time on the VM.

If such a reusable VM is found, the task i is allo-

cated to the VM where it gets maximum cost reduc-

tion or else a new VM is rented for that task.

Algorithm 2: Task-VM Allocation Algorithm.

INPUT: ST, ET
Initialize TEC = 0
Initialize IT[n] = {0, 0, ..., 0}
for i = 1, 2, ..., n tasks do
    Search for a VM where idle time is available.
    if VM is available then
        Allocate task i to the VM where the maximum cost reduction is available.
        Update the idle time on the VM, if reused.
    else
        Allocate a new VM.
    end if
    Compute new idle time and update IT[i].
    Increment TEC as per Equation 6.
    Update ST and ET.
    Sort ST.
end for
Calculate TET = max(ET) as per Equation 7.
Calculate P(T) as per Equation 10.
OUTPUT: TEC, TET, P(T).
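The allocation loop of Algorithm 2 might look roughly like the following greedy sketch. It is a simplification under stated assumptions, not the authors' implementation: start and end times are taken as fixed (the update and re-sort steps are omitted), data-transfer savings are ignored, a VM is matched only by a type label, and the largest remaining idle time is used as a stand-in for the maximum cost reduction. All names and prices are hypothetical.

```python
import math


def allocate(tasks, price):
    """tasks: (start, end, vm_type) tuples in minutes, sorted by start time.
    price: cost per hour for each vm_type. Returns (total cost, VMs rented)."""
    vms = []      # each VM: {"type", "busy_until", "paid_until"}
    total = 0.0
    for start, end, vm_type in tasks:
        pt = end - start
        # VMs of the same type that are free and still inside a paid hour slot.
        free = [v for v in vms if v["type"] == vm_type
                and v["busy_until"] <= start < v["paid_until"]]
        if free:
            vm = max(free, key=lambda v: v["paid_until"] - start)
            idle = vm["paid_until"] - start
            extra_hours = math.ceil(max(pt - idle, 0) / 60)
            vm["paid_until"] += 60 * extra_hours
        else:
            extra_hours = math.ceil(pt / 60)
            vm = {"type": vm_type, "paid_until": start + 60 * extra_hours}
            vms.append(vm)
        vm["busy_until"] = end
        total += extra_hours * price[vm_type]
    return total, len(vms)


# Figure 2 timings with an assumed price of 1 unit per hour for every type.
tasks = [(0, 70, "data"), (70, 150, "data"), (70, 160, "compute"),
         (160, 285, "data"), (285, 485, "memory")]
price = {"data": 1.0, "compute": 1.0, "memory": 1.0}
print(allocate(tasks, price))  # (11.0, 3)
```

With these inputs the three data-intensive tasks end up on one VM billed for 5 hours in total, against 6 hours if the 125-minute task rented a new VM.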

Since OptReUse does not compute $TEC$ while traversing the workflow DAG in a traversal order, but instead sorts tasks based on their start time and then allocates resources, OptReUse can find VM reuse between tasks of different workflows running simultaneously. The implementation details and experimental results for various scenarios are explained in the subsequent section.

5 IMPLEMENTATION DETAILS

The individuals in the population-based evolutionary algorithms represent a combination of VM types, security levels and task order. Workflow schedule generation approaches like WSG or OptReUse help in efficient resource utilization for a given combination. Each combination is used as an input to the WSG and OptReUse algorithms, and the total cost, makespan and risk rate for that combination are computed. The fitness of a chromosome or a particle is measured in terms of total cost, makespan and risk rate. In the subsequent section, we discuss the individual coding strategy and the implementation details for both evolutionary algorithms.

Parameters and Coding Strategy: In GA we used random sampling with Tournament Selection. A simulated binary crossover operation (prob = 0.9) and a polynomial mutation operation were adapted. The chromosome coding strategy is shown in Figure 4. The first chromosome consists of a set of n values each for the VM type and for the authentication, integrity and confidentiality security service levels of the tasks. The second chromosome is for task order selection: it represents the ordering between tasks having the same start time. The PSO particle coding strategy is similar to the GA chromosome coding strategy. Both GA and PSO are run for 1000 generations, and the number of individuals (chromosomes or particles) in each generation is 150. For PSO, the initial velocity is kept at 0. Initially, the parameters have the values $w = 0.64$, $c_1 = 2.0$ and $c_2 = 2.0$. However, the parameters are adaptive, which means they keep changing at each iteration.

Figure 4: Chromosome structure. (First chromosome: VM types VM1 ... VMn; authentication levels a1 ... an; integrity levels g1 ... gn; confidentiality levels c1 ... cn. Second chromosome: order selection, e.g. 0 1 ... 1.)
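One way to read the chromosome layout is as a flat vector of 4n values that is sliced into per-task settings. The sketch below assumes exactly that layout (VM types first, then authentication, integrity and confidentiality levels); the function name, the dictionary keys and the integer encoding are illustrative assumptions.

```python
def decode(chromosome, n):
    """Split a flat chromosome of length 4n into one settings dict per task."""
    if len(chromosome) != 4 * n:
        raise ValueError("expected 4 * n genes")
    vm = chromosome[0:n]          # VM type index per task
    a = chromosome[n:2 * n]       # authentication level per task
    g = chromosome[2 * n:3 * n]   # integrity level per task
    c = chromosome[3 * n:4 * n]   # confidentiality level per task
    return [{"vm_type": vm[i], "a": a[i], "g": g[i], "c": c[i]} for i in range(n)]


# Three tasks with made-up gene values.
genes = [2, 0, 1,   1, 1, 2,   0, 2, 1,   2, 2, 0]
settings = decode(genes, n=3)
print(settings[0])  # {'vm_type': 2, 'a': 1, 'g': 0, 'c': 2}
```

The order selection chromosome is kept separate in this reading, since it only matters for tasks that share a start time.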

5.1 Results and Discussion

Our approach is tested on three standard scientiﬁc

workﬂows: LIGO, SIPHT, CyberShake (refer Figure

7 from (Li et al., 2016)). The experimentation set-

tings are considered from the paper (Li et al., 2016).

The workload and output data for each task is ran-

domly generated from a uniform distribution in the

range [5000, 50000] GFLOP and [10, 100] GB, re-

spectively. The bandwidth is considered as 0.1 GB/s.

The two evolutionary optimization algorithms GA

and PSO are used for selecting the optimal combi-

nation of VM types, security levels and task order

for tasks. The cost performance of our proposed

OptReUse algorithm is compared with the workﬂow

scheduling generation (WSG) proposed in (Li et al.,

2016).


Figure 5: LIGO Workﬂow Cost Comparison.

Figure 6: CyberShake Workﬂow Cost Comparison.

5.2 Comparison: WSG and OptReUse

We compare the performance between WSG and

OptReUse for different risk rates. The results are

shown in Figures 5 and 6. We consider an instance

of each standard workflow. For a given evolutionary algorithm (GA or PSO) and for a given value of the permissible risk rate ($P_{th}$), OptReUse always results in a lower cost compared to WSG.

1. For LIGO, the average percentage reduction in

cost obtained by OptReUse over WSG is 8.07%.

2. CyberShake is a data intensive workﬂow requiring

mostly storage optimized VMs and they are the

costliest VMs. Therefore, in CyberShake work-

ﬂow more VMs are reused compared to LIGO.

Hence, the average percentage reduction in cost

is 11.2%.

3. SIPHT workﬂow is a computation intensive work-

ﬂow requiring mostly the cheapest compute opti-

mized VMs. But it has the highest number of tasks

which makes it possible for OptReUse to search

for more reuse cases. We observed for SIPHT the

average percentage reduction in cost as 13.15%.

With an increase in the value of $P_{th}$, $TEC$ decreases because of the selection of lower security levels. The makespan in all three benchmark test cases does not change significantly for $P_{th}$ values in the range [0.1, ..., 0.9] (LIGO = 45.47, SIPHT = 53.81, CyberShake = 102.2 minutes). Note that time is not an objective function in our problem statement; it is constrained by a deadline.

Figure 7: Average performance of OptReUse: Mean cost and Margin of Error.

5.3 OptReUse: Average Performance

To demonstrate the average performance of our approach, we experimented over a number of instances. We generated 75 instances of the CyberShake workflow, each having a different task workload and output data size. We chose three (high, medium, low) permissible risk rates ($P_{th}$) to observe the impact of security levels on workflow execution cost and makespan. The results are shown in Figure 7. Each bar in the figure gives the mean cost and the margin of error for a 95% confidence interval. It is observed that OptReUse gives a lower mean cost and margin of error compared to WSG. When $P_{th}$ is increased, the permitted difference between the required and the provided security levels increases. Hence, with the selection of lower security levels (with lower overheads), the overall cost is reduced (refer to Figure 7). Note that the average makespan for both the WSG and OptReUse algorithms is 46.66 minutes. Therefore, we can conclude that the OptReUse approach reduces cost without any delay.

5.4 OptReUse: Multiple Workﬂows

Consider the case of scheduling more than one work-

ﬂow instance simultaneously. We demonstrate using

two instances of CyberShake workﬂow. Suppose, all

the tasks are assigned with the least expensive VM

types. Scheduling multiple workﬂows simultaneously

by OptReUse results in lower cost compared to the

combined cost of scheduling workﬂows separately

(both by WSG or OptReUse). This is due to the reuse of VMs between tasks of different workflow instances, as shown in Table 2.


Table 2: Workflow scheduling results.

Instance   Approach   Cost ($)   Makespan (hrs)
1          WSG        73.56      46.72
2          WSG        67.66      51.125
1          OptReUse   72.876     46.72
2          OptReUse   67.35      51.125
1&2        OptReUse   137.45     51.125
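The saving from joint scheduling follows directly from the costs in Table 2. The worked arithmetic below uses only those table values; the percentages are rounded.

```latex
\begin{align*}
\text{WSG, separately:}      \quad & 73.56 + 67.66 = 141.22 \\
\text{OptReUse, separately:} \quad & 72.876 + 67.35 = 140.226 \\
\text{OptReUse, jointly:}    \quad & 137.45 \\
\text{Saving over separate WSG:}      \quad & 141.22 - 137.45 = 3.77 \;(\approx 2.7\%) \\
\text{Saving over separate OptReUse:} \quad & 140.226 - 137.45 = 2.776 \;(\approx 2.0\%)
\end{align*}
```

The makespan of the joint schedule equals the larger of the two individual makespans (51.125), so the saving comes without extending the schedule.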

6 CONCLUSION

Scientific workflow scheduling is about selecting the right VMs, security levels and schedule generation such that the overall cost and makespan are minimal. We demonstrate how VM reuse can reduce the overall cost without any delay. In particular, we demonstrate the benefits of VM reuse across adjacent and non-adjacent tasks, and the further benefit due to the ordering of tasks with the same start time. We design an enhanced security model for accurate estimation of risk. The results are shown using two evolutionary algorithms (GA and PSO), and the approach is tested on three benchmark datasets. Our approach provides average cost reductions of 8.07% to 13.15% over WSG via VM reutilization.

REFERENCES

Adhikari, M., Amgoth, T., and Srirama, S. N. (2020). Multi-

objective scheduling strategy for scientiﬁc workﬂows

in cloud environment: A ﬁreﬂy-based approach. Ap-

plied Soft Computing, 93:106411.

Challita, S., Paraiso, F., and Merle, P. (2017). A study of vir-

tual machine placement optimization in data centers.

In 7th International Conference on Cloud Computing

and Services Science (CLOSER), pages 343–350.

Hilman, M. H., Rodriguez, M. A., and Buyya, R. (2018).

Task runtime prediction in scientiﬁc workﬂows us-

ing an online incremental learning approach. In 2018

IEEE/ACM 11th International Conference on Utility

and Cloud Computing (UCC), pages 93–102. IEEE.

Kaur, P. and Mehta, S. (2017). Resource provisioning and

work ﬂow scheduling in clouds using augmented shuf-

ﬂed frog leaping algorithm. Journal of Parallel and

Distributed Computing, 101:41–50.

Kumar, P. R., Raj, P. H., and Jelciana, P. (2018). Exploring

data security issues and solutions in cloud computing.

Procedia Computer Science, 125:691–697.

Lee, Y. C., Han, H., Zomaya, A. Y., and Yousif, M.

(2015). Resource-efﬁcient workﬂow scheduling in

clouds. Knowledge-Based Systems, 80:153–162.

Li, Z., Ge, J., Yang, H., Huang, L., Hu, H., Hu, H., and

Luo, B. (2016). A security and cost aware scheduling

algorithm for heterogeneous tasks of scientiﬁc work-

ﬂow in clouds. Future Generation Computer Systems,

65:140–152.

Liu, J., Lu, S., and Che, D. (2020). A survey of modern sci-

entiﬁc workﬂow scheduling algorithms and systems in

the era of big data. In 2020 IEEE International Con-

ference on Services Computing (SCC), pages 132–

141. IEEE.

Liu, L., Zhang, M., Buyya, R., and Fan, Q. (2017).

Deadline-constrained coevolutionary genetic algo-

rithm for scientiﬁc workﬂow scheduling in cloud com-

puting. Concurrency and Computation: Practice and

Experience, 29(5):e3942.

Malawski, M., Juve, G., Deelman, E., and Nabrzyski, J.

(2015). Algorithms for cost-and deadline-constrained

provisioning for scientiﬁc workﬂow ensembles in iaas

clouds. Future Generation Computer Systems, 48:1–

18.

Mboula, J. E. N., Kamla, V. C., and Djamegni, C. T. (2020).

Cost-time trade-off efﬁcient workﬂow scheduling in

cloud. Simulation Modelling Practice and Theory,

103:102107.

Peng, G. and Wolter, K. (2019). Efﬁcient task scheduling

in cloud computing using an improved particle swarm

optimization algorithm. In CLOSER, pages 58–67.

Ramamurthy, A., Pantula, P. D., Gharote, M. S., Mitra, K.,

and Lodha, S. (2021). Multi-objective optimization

for virtual machine allocation in computational scien-

tiﬁc workﬂow under uncertainty. In CLOSER, pages

240–247.

Shishido, H. Y., Estrella, J. C., Toledo, C. F. M., and

Arantes, M. S. (2018). Genetic-based algorithms ap-

plied to a workﬂow scheduling algorithm with secu-

rity and deadline constraints in clouds. Computers &

Electrical Engineering, 69:378–394.

Weng, C., Liu, Q., Li, K., and Zou, D. (2016). Cloudmon:

monitoring virtual machines in clouds. IEEE Trans-

actions on Computers, 65(12):3787–3793.

Xie, T. and Qin, X. (2006). Scheduling security-critical

real-time applications on clusters. IEEE transactions

on computers, 55(7):864–879.

Zhou, J., Wang, T., Cong, P., Lu, P., Wei, T., and Chen, M.

(2019). Cost and makespan-aware workﬂow schedul-

ing in hybrid clouds. Journal of Systems Architecture,

100:101631.
