Modeling Batch Tasks Using Recurrent Neural Networks in Co-Located Alibaba Workloads

Hifza Khalid, Arunselvan Ramaswamy, Simone Ferlin, Alva Couch

2024

Abstract

Accurate predictive models for cloud workloads can help improve task scheduling, capacity planning, and preemptive resource conflict resolution, especially when jobs are co-located. Alibaba, one of the leading cloud providers, co-locates transient batch tasks and high-priority, latency-sensitive online jobs on the same cluster. In this paper, we use a dataset publicly released by Alibaba to model the batch tasks, which are often overlooked compared to online services. The dataset contains the arrivals and resource requirements (CPU, memory, etc.) for both batch and online tasks. Our trained model predicts, with high accuracy, the number of batch tasks that arrive in any 30-minute window, their associated CPU and memory requirements, and their lifetimes. It captures over 94% of arrivals in each 30-minute window within a 95% prediction interval. The F1 scores for the most frequent CPU classes exceed 75%, and our memory and lifetime predictions incur less than 1% test data loss. The prediction accuracy of the lifetime of a batch task drops when the model uses both CPU and memory information, as opposed to only memory information.
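One possible reading of the prediction-interval result is as empirical coverage: the fraction of observed arrival counts that fall inside the predicted interval for their window. The sketch below illustrates that reading only; it is not taken from the paper, and the function name `interval_coverage` and all numbers are hypothetical.

```python
from typing import Sequence


def interval_coverage(
    actual: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Return the fraction of observations inside [lower, upper] (inclusive)."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("actual, lower and upper must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # Count the windows whose observed value lies within the predicted bounds.
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return hits / len(actual)


# Made-up arrival counts per 30-minute window, with made-up interval bounds.
actual = [120, 95, 143, 110, 87]
lower = [100, 90, 100, 105, 90]
upper = [130, 110, 140, 125, 115]

coverage = interval_coverage(actual, lower, upper)
print(f"empirical coverage: {coverage:.2f}")
```

With these illustrative numbers, three of the five windows fall inside their intervals, so the coverage is 0.60.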


Paper Citation


in Harvard Style

Khalid H., Ramaswamy A., Ferlin S. and Couch A. (2024). Modeling Batch Tasks Using Recurrent Neural Networks in Co-Located Alibaba Workloads. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 558-569. DOI: 10.5220/0012392700003654


in Bibtex Style

@conference{icpram24,
author={Hifza Khalid and Arunselvan Ramaswamy and Simone Ferlin and Alva Couch},
title={Modeling Batch Tasks Using Recurrent Neural Networks in Co-Located Alibaba Workloads},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={558-569},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012392700003654},
isbn={978-989-758-684-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Modeling Batch Tasks Using Recurrent Neural Networks in Co-Located Alibaba Workloads
SN - 978-989-758-684-2
AU - Khalid H.
AU - Ramaswamy A.
AU - Ferlin S.
AU - Couch A.
PY - 2024
SP - 558
EP - 569
DO - 10.5220/0012392700003654
PB - SciTePress