Authors:
Akihiro Misawa 1; Susumu Date 1; Keichi Takahashi 1; Takashi Yoshikawa 2; Masahiko Takahashi 3; Masaki Kan 3; Yasuhiro Watashiba 4; Yoshiyuki Kido 1; Chonho Lee 1 and Shinji Shimojo 1
Affiliations:
1 Cybermedia Center, Osaka University, 5-1 Mihogaoka, Japan
2 Cybermedia Center, Osaka University, 5-1 Mihogaoka, Japan and System Platform Research Laboratories, NEC, 1753 Shimonumabe, Nakahara, Japan
3 System Platform Research Laboratories, NEC, 1753 Shimonumabe, Nakahara, Japan
4 Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Japan and Cybermedia Center, Osaka University, 5-1 Mihogaoka, Japan
Keyword(s):
Cloud Computing, Disaggregation, Resource Pool, GPU/FPGA Accelerator, Heterogeneous Computer, Distributed Storage, Job Scheduling, Resource Management, PCI Express, OpenStack, Software Defined System.
Related Ontology Subjects/Areas/Topics:
Cloud Computing; Cloud Computing Enabling Technology; XaaS
Abstract:
It has become increasingly difficult for high performance computing (HPC) users to own an HPC platform for themselves. As user needs and requirements for HPC have diversified, HPC systems must have the capacity and flexibility to execute diverse applications. In this paper, we present a computer architecture for dynamically and promptly delivering high performance computing infrastructure as a cloud computing service in response to users' requests for the underlying computational resources of the cloud. To gain the flexibility to accommodate a variety of HPC jobs, each of which may require a unique computing platform, the proposed system reconfigures its software and hardware platforms by taking advantage of the synergy of Open Grid Scheduler/Grid Engine and OpenStack. An experimental system developed in this research shows a high degree of hardware reconfigurability as well as high performance on a Spark benchmark application. Moreover, our evaluation shows that the experimental system can execute twice as many jobs that require a graphics processing unit (GPU), while also eliminating the worst case of resource congestion observed in the real-world operational records of our university's computer center over the previous half year.
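The core idea of disaggregation described above can be illustrated with a minimal sketch: instead of each compute node permanently owning a GPU, nodes borrow GPUs from a shared pool (attached over a PCI Express fabric) while a job runs and return them afterwards, so the same physical GPUs serve more GPU jobs over time. This is a toy model for illustration only, not the paper's implementation; the class and method names (`GpuPool`, `attach`, `detach`) are hypothetical.

```python
# Toy model of a disaggregated GPU pool (hypothetical; not the paper's code).
# Nodes borrow GPUs from a shared pool when a job needs one and return them
# when the job finishes, rather than each node owning a fixed GPU.
from collections import deque

class GpuPool:
    def __init__(self, num_gpus):
        self.free = deque(range(num_gpus))  # pooled GPU ids
        self.attached = {}                  # node name -> gpu id

    def attach(self, node):
        """Attach a pooled GPU to a node (the PCIe switch is modelled as a dict)."""
        if not self.free:
            return None                     # pool exhausted
        gpu = self.free.popleft()
        self.attached[node] = gpu
        return gpu

    def detach(self, node):
        """Return the node's GPU to the pool when its job completes."""
        gpu = self.attached.pop(node)
        self.free.append(gpu)

pool = GpuPool(num_gpus=2)

# Two GPU jobs run and finish, freeing their GPUs; two more jobs can then
# run, so four GPU jobs complete although only two physical GPUs exist.
completed = 0
for batch in [["n1", "n2"], ["n3", "n4"]]:
    for node in batch:
        assert pool.attach(node) is not None
    for node in batch:
        pool.detach(node)
        completed += 1

print(completed)  # -> 4
```

With static node-local GPUs, only the two GPU-equipped nodes could ever run GPU jobs; pooling lets any node become GPU-capable on demand, which is the mechanism behind the doubled GPU-job throughput reported in the abstract.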