Authors: Qian Chen ; Kebing Wang ; Zhaojuan Bian ; Illia Cremer ; Gen Xu and Yejun Guo

Affiliation: Intel Corporation, China

Keyword(s): Spark Simulation, Cluster Simulation, Performance Modelling, Memory Modelling, In-memory Computing, Big Data, Capacity Planning.

Related Ontology Subjects/Areas/Topics: Computer Simulation Techniques ; Performance Analysis ; Simulation and Modeling ; Simulation Tools and Platforms

Abstract: As the most active project in the Hadoop ecosystem these days (Zaharia, 2014), Spark is a fast and general-purpose engine for large-scale data processing. Thanks to its advanced Directed Acyclic Graph (DAG) execution engine and in-memory computing mechanism, Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk (Apache, 2016). However, Spark performance is affected by many system software, hardware, and dataset factors, especially those related to memory and the JVM, which makes capacity planning and tuning for Spark clusters extremely difficult. Current planning methods are mostly estimation based and depend heavily on experience and trial and error. These approaches are neither efficient nor accurate, especially as software stack complexity and hardware diversity increase. Here, we propose a novel Spark simulator based on CSMethod (Bian et al., 2014), extended with a fine-grained, multi-layered memory subsystem and well suited to Spark cluster deployment planning, performance evaluation, and optimization before system provisioning. The proposed simulator covers the whole execution life cycle of a Spark application, including DAG generation, Resilient Distributed Dataset (RDD) processing, and block management. Hardware activities derived from these software operations are dynamically mapped onto architecture models for processors, storage, and network devices. The performance behaviour of the cluster memory system at multiple layers (Spark, JVM, OS, hardware) is modeled as an enhanced, fine-grained, individual global library. Experimental results with several popular Spark micro-benchmarks and a real-case IoT workload demonstrate that our Spark simulator achieves high accuracy, with an average error rate below 7%. With lightweight computing resource requirements (a laptop is enough), our simulator runs at the same speed level as native execution on a multi-node high-end cluster.

CC BY-NC-ND 4.0

Paper citation in several formats:
Chen, Q.; Wang, K.; Bian, Z.; Cremer, I.; Xu, G. and Guo, Y. (2016). Simulating Spark Cluster for Deployment Planning, Evaluation and Optimization. In Proceedings of the 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications - SIMULTECH; ISBN 978-989-758-199-1; ISSN 2184-2841, SciTePress, pages 33-43. DOI: 10.5220/0005952300330043

@conference{simultech16,
author={Qian Chen and Kebing Wang and Zhaojuan Bian and Illia Cremer and Gen Xu and Yejun Guo},
title={Simulating Spark Cluster for Deployment Planning, Evaluation and Optimization},
booktitle={Proceedings of the 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications - SIMULTECH},
year={2016},
pages={33-43},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005952300330043},
isbn={978-989-758-199-1},
issn={2184-2841},
}

TY - CONF
JO - Proceedings of the 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications - SIMULTECH
TI - Simulating Spark Cluster for Deployment Planning, Evaluation and Optimization
SN - 978-989-758-199-1
IS - 2184-2841
AU - Chen, Q.
AU - Wang, K.
AU - Bian, Z.
AU - Cremer, I.
AU - Xu, G.
AU - Guo, Y.
PY - 2016
SP - 33
EP - 43
DO - 10.5220/0005952300330043
PB - SciTePress