Leveraging Distributional Reinforcement Learning for Performance Optimization of Spark Job Scheduling in Cloud Environment

Sumit Kumar; Vishnu Prasad Verma; Santosh Kumar

doi:10.5220/0013582600004664

2025

Abstract

Apache Spark is extensively used for processing massive datasets in fields such as big data analytics and machine learning. However, its performance is closely tied to how jobs are scheduled and resources are allocated, especially in dynamic cloud settings. The default Spark scheduler can struggle to manage resources efficiently in heterogeneous clusters, leading to delays, higher costs, and slower job completion. This research introduces a new approach to optimizing Spark job scheduling using Distributional Deep Reinforcement Learning (DDRL). Unlike methods that optimize only average performance, DDRL employs a Rainbow Deep Q-Network to model the full distribution of possible outcomes, which allows the system to better account for the risks and uncertainties associated with scheduling decisions. Key features of our approach include multi-step learning for long-term planning, techniques that balance exploration and exploitation, and strategies for adapting to rapidly changing workloads. Our experiments show that the proposed framework significantly improves Spark’s performance: it achieves faster job scheduling, better resource utilization, and lower overall costs than existing methods. These results demonstrate the potential of DDRL as a robust and scalable solution for enhancing Spark scheduling in dynamic cloud environments.
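The abstract combines two Rainbow ingredients: a categorical return distribution and multi-step learning. The sketch below illustrates, under stated assumptions, how an n-step Bellman target distribution can be projected back onto a fixed set of return atoms (the C51-style projection that Rainbow uses), and how a tail measure can separate two scheduling actions that share the same mean. It is not the authors' implementation. The support bounds, the reward definition (hypothetically, the negative job completion time of a scheduling step), and the use of CVaR as a risk measure are illustrative assumptions; standard Rainbow acts greedily on the mean of the distribution, and CVaR is a swapped-in technique shown only to make the "risk" argument concrete.

```python
import math
from typing import List

# Illustrative sketch of C51-style distributional targets as used in Rainbow DQN.
# Assumption (hypothetical): the per-step reward is the negative job completion
# time of a Spark scheduling decision, so returns lie in a non-positive range.


def make_support(v_min: float, v_max: float, num_atoms: int) -> List[float]:
    """Evenly spaced return atoms z_0 < z_1 < ... < z_{N-1}."""
    delta = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta for i in range(num_atoms)]


def n_step_return(rewards: List[float], gamma: float) -> float:
    """Discounted sum r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1}."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def project_target(next_probs: List[float], support: List[float],
                   rewards: List[float], gamma: float, done: bool) -> List[float]:
    """Project the n-step Bellman target distribution onto the fixed support.

    next_probs: atom probabilities for the greedy action at state s_{t+n}
    rewards:    the n observed rewards r_t .. r_{t+n-1}
    done:       True if the episode ended within those n steps
    """
    v_min, v_max = support[0], support[-1]
    num_atoms = len(support)
    delta = (v_max - v_min) / (num_atoms - 1)
    g = n_step_return(rewards, gamma)
    discount = 0.0 if done else gamma ** len(rewards)
    target = [0.0] * num_atoms
    for p, z in zip(next_probs, support):
        # Shifted and shrunk atom T z = G + gamma^n * z, clipped to the support.
        tz = min(max(g + discount * z, v_min), v_max)
        # Fractional atom index, clamped to guard against rounding error.
        b = min(max((tz - v_min) / delta, 0.0), num_atoms - 1)
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            target[lo] += p  # lands exactly on an atom
        else:
            target[lo] += p * (hi - b)  # split mass between the two neighbours
            target[hi] += p * (b - lo)
    return target


def expected_value(probs: List[float], support: List[float]) -> float:
    """Mean return; standard Rainbow acts greedily on this quantity."""
    return sum(p * z for p, z in zip(probs, support))


def cvar(probs: List[float], support: List[float], alpha: float) -> float:
    """Mean return over the worst alpha fraction of probability mass.

    Swapped-in risk measure for illustration; not part of standard Rainbow.
    """
    remaining, acc = alpha, 0.0
    for p, z in zip(probs, support):  # ascending support: worst outcomes first
        take = min(p, remaining)
        acc += take * z
        remaining -= take
        if remaining <= 0.0:
            break
    return acc / alpha
```

For example, with atoms spaced one unit apart on [-10, 0], a placement that always returns -5 and one that returns -10 or 0 with equal probability share a mean of -5, yet their 20% CVaR values are -5 and -10 respectively. A mean-only Q-value cannot tell these two actions apart, whereas the full distribution can.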

Paper Citation

in Harvard Style

Kumar S., Verma V. and Kumar S. (2025). Leveraging Distributional Reinforcement Learning for Performance Optimization of Spark Job Scheduling in Cloud Environment. In Proceedings of the 3rd International Conference on Futuristic Technology - Volume 1: INCOFT; ISBN 978-989-758-763-4, SciTePress, pages 615-623. DOI: 10.5220/0013582600004664

in Bibtex Style

@conference{incoft25,
author={Sumit Kumar and Vishnu Prasad Verma and Santosh Kumar},
title={Leveraging Distributional Reinforcement Learning for Performance Optimization of Spark Job Scheduling in Cloud Environment},
booktitle={Proceedings of the 3rd International Conference on Futuristic Technology - Volume 1: INCOFT},
year={2025},
pages={615-623},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013582600004664},
isbn={978-989-758-763-4},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Futuristic Technology - Volume 1: INCOFT
TI - Leveraging Distributional Reinforcement Learning for Performance Optimization of Spark Job Scheduling in Cloud Environment
SN - 978-989-758-763-4
AU - Kumar S.
AU - Verma V.
AU - Kumar S.
PY - 2025
SP - 615
EP - 623
DO - 10.5220/0013582600004664
PB - SciTePress
ER -