Architecting a Large-scale Elastic Environment - Recontextualization and Adaptive Cloud Services for Scientific Computing

Paul Marshall, Henry Tufo, Kate Keahey, David La Bissoniere, Matthew Woitaszek

2012

Abstract

Infrastructure-as-a-service (IaaS) clouds, such as Amazon EC2, offer pay-for-use virtual resources on demand. This allows users to outsource computation and storage when needed and to create elastic computing environments that adapt to changing demand. However, existing services, such as cluster resource managers (e.g., Torque), do not include support for elastic environments. Furthermore, no recontextualization services exist to reconfigure these environments as they continually adapt to changes in demand. In this paper we present an architecture for a large-scale elastic cluster environment. We extend an open-source elastic IaaS manager, the Elastic Processing Unit (EPU), to support the Torque batch-queue scheduler. We also develop a lightweight REST-based recontextualization broker that periodically reconfigures the cluster as nodes join or leave the environment. Our solution adds nodes dynamically at runtime and supports MPI jobs across distributed resources. For experimental evaluation, we deploy our solution using both NSF FutureGrid and Amazon EC2. We demonstrate the ability of our solution to create multi-cloud deployments and run batch-queued jobs, to recontextualize 256-node clusters within one second of the recontextualization period, and to scale to over 475 nodes in less than 15 minutes.



Paper Citation


in Harvard Style

Marshall P., Tufo H., Keahey K., La Bissoniere D. and Woitaszek M. (2012). Architecting a Large-scale Elastic Environment - Recontextualization and Adaptive Cloud Services for Scientific Computing. In Proceedings of the 7th International Conference on Software Paradigm Trends - Volume 1: ICSOFT, ISBN 978-989-8565-19-8, pages 409-418. DOI: 10.5220/0004081704090418


in Bibtex Style

@conference{icsoft12,
author={Paul Marshall and Henry Tufo and Kate Keahey and David La Bissoniere and Matthew Woitaszek},
title={Architecting a Large-scale Elastic Environment - Recontextualization and Adaptive Cloud Services for Scientific Computing},
booktitle={Proceedings of the 7th International Conference on Software Paradigm Trends - Volume 1: ICSOFT},
year={2012},
pages={409-418},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004081704090418},
isbn={978-989-8565-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Software Paradigm Trends - Volume 1: ICSOFT
TI - Architecting a Large-scale Elastic Environment - Recontextualization and Adaptive Cloud Services for Scientific Computing
SN - 978-989-8565-19-8
AU - Marshall P.
AU - Tufo H.
AU - Keahey K.
AU - La Bissoniere D.
AU - Woitaszek M.
PY - 2012
SP - 409
EP - 418
DO - 10.5220/0004081704090418