Authors:
Denis B. Citadin (1); Fábio Rossi (2); Marcelo Luizelli (3); Philippe Navaux (1) and Arthur Lorenzon (1)
Affiliations:
(1) Institute of Informatics, Federal University of Rio Grande do Sul, Brazil
(2) Campus Alegrete, Federal Institute Farroupilha, Brazil
(3) Campus Alegrete, Federal University of Pampa, Brazil
Keyword(s):
Cloud Computing, Energy Efficiency, Infrastructure as Code, Artificial Intelligence.
Abstract:
Cloud computing has become essential for executing high-performance computing (HPC) workloads because it offers on-demand resource provisioning and customization. However, energy efficiency remains a challenge, as performance gains from thread-level parallelism (TLP) often come with increased energy consumption. To balance performance against energy consumption, we propose SmartNodeTuner, a framework that leverages artificial intelligence and Infrastructure as Code (IaC) to optimize performance-energy trade-offs in cloud environments and provide seamless infrastructure management. SmartNodeTuner is split into two main modules: a BuiltModel Engine, which uses an artificial neural network (ANN) model trained to predict optimal TLP and node configurations; and an AutoDeploy Engine, which uses IaC with Terraform to automate deployment and resource allocation, reducing manual effort and ensuring efficient infrastructure management. Using ten well-known parallel workloads, we validate SmartNodeTuner on a private cloud cluster with diverse architectures. It achieves a 38.2% improvement in the Energy-Delay Product (EDP) compared to Kubernetes' default scheduler and consistently predicts near-optimal configurations. Our results also demonstrate significant energy savings with negligible performance degradation, highlighting SmartNodeTuner's effectiveness in optimizing resource use in heterogeneous cloud environments.
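
For readers unfamiliar with the metric, the Energy-Delay Product is conventionally defined as energy consumed multiplied by execution time, so lower values are better. The sketch below illustrates how candidate node/TLP configurations could be ranked by EDP and how a relative improvement over a baseline could be computed. It is not the authors' implementation: the `Run` type, the helper names, the node labels, and all measurement values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One measured execution (hypothetical record, not from the paper)."""
    node: str        # illustrative node label
    threads: int     # degree of thread-level parallelism
    time_s: float    # execution time in seconds
    energy_j: float  # energy consumed in joules

    @property
    def edp(self) -> float:
        # Energy-Delay Product: energy x delay (lower is better).
        return self.energy_j * self.time_s


def best_by_edp(runs):
    """Return the run with the lowest EDP."""
    return min(runs, key=lambda r: r.edp)


def edp_improvement(baseline: Run, candidate: Run) -> float:
    """Percentage reduction in EDP of candidate relative to baseline."""
    return 100.0 * (baseline.edp - candidate.edp) / baseline.edp


# Made-up measurements, purely illustrative.
runs = [
    Run("node-a", 8, 10.0, 500.0),
    Run("node-a", 16, 8.0, 700.0),
    Run("node-b", 4, 12.0, 300.0),
]

best = best_by_edp(runs)
baseline = runs[1]  # assume this is what a default scheduler would pick
print(best.node, best.threads, edp_improvement(baseline, best))
```

Note that in this toy data the fastest run (16 threads) is not the one with the lowest EDP, which mirrors the abstract's point that more parallelism can cost more energy than it saves in time.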