Bi-Objective CSO for Big Data Scientific Workflows Scheduling in the Cloud: Case of LIGO Workflow

K. Bousselmi, S. Ben Hamida, M. Rukoz

Abstract

Scientific workflows are used to model scalable, portable, and reproducible big data analyses and scientific experiments with low development costs. To optimize their performance and use data resources efficiently, scientific workflows that handle large volumes of data need to be executed on scalable distributed environments such as Cloud infrastructure services. The problem of scheduling such workflows is known to be NP-complete. It consists of finding an optimal mapping of tasks to computing resources and of data to storage resources that meets the end user’s quality of service objectives, especially minimizing the overall makespan or the financial cost of the workflow. In this paper, we formulate the problem of scheduling big data scientific workflows as a bi-objective optimization problem that aims to minimize both the makespan and the cost of the workflow. The formulated problem is then solved using our proposed Bi-Objective Cat Swarm Optimization algorithm (BiO-CSO), an extension of the bio-inspired Cat Swarm Optimization (CSO) algorithm. The extension adapts the algorithm to solve multi-objective discrete optimization problems. Our application case is the LIGO Inspiral workflow, a CPU- and data-intensive workflow used to generate and analyze gravitational waveforms from data collected during the coalescing of compact binary systems. The performance of the proposed method is compared to that of the multi-objective Particle Swarm Optimization (PSO) algorithm, which has proven effective for scheduling scientific workflows. The experimental results show that BiO-CSO outperforms the multi-objective PSO, since it provides more final scheduling solutions and solutions of better quality.
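
As an illustration of what minimizing both makespan and cost means, a bi-objective scheduler typically returns a set of non-dominated (Pareto-optimal) schedules rather than a single one. The sketch below is not the BiO-CSO algorithm itself. It only shows a generic Pareto-dominance filter over hypothetical (makespan, cost) pairs; the names `dominates`, `pareto_front`, and the sample values are assumptions made for this example.

```python
from typing import List, Tuple

# A candidate schedule summarized by its two objectives, both to be minimized.
# (makespan, cost) values below are illustrative, not results from the paper.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (11.0, 6.0), (8.0, 9.0)]
front = pareto_front(candidates)
print(front)
```

Here (11.0, 6.0) is dropped because (10.0, 5.0) is better on both objectives, and (8.0, 9.0) is dropped because (8.0, 7.0) has the same makespan at lower cost. The remaining schedules represent different trade-offs between makespan and cost, which is why the number and quality of such solutions are natural criteria for comparing two multi-objective algorithms.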
