In Spark performance tuning, memory-related 
tuning should be a high priority. As an in-memory 
computing engine, Spark holds most of its data sets 
in memory rather than on hard disks, which greatly 
reduces file access time. When free memory 
becomes insufficient, data sets are spilled to disk, and 
this operation causes long latency. Garbage 
Collection (GC) can also occur to release more Java 
Virtual Machine (JVM) heap space, thus adding 
significant GC latency. Besides memory hardware 
configuration parameters like capacity and 
bandwidth, Spark also provides a wide range of 
parameters to control memory behaviour. All 
these parameters and memory-related operations have 
a significant performance impact. These parameters 
exist in software at four different layers: the Spark 
execution engine, cluster resource management 
(YARN, Mesos, Standalone, etc.), the JVM and the 
Operating System (OS). Since complex interactions 
exist between these parameters, it is very difficult to 
find an optimized parameter configuration that 
would maximize Spark cluster performance.  
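To make the size of this tuning space concrete, the following minimal sketch groups a few memory-related knobs by the four layers named above and counts the combinations an exhaustive search would have to cover. The setting names exist in their respective systems, but the selection of knobs and all candidate values are illustrative assumptions, not the set modeled in this paper. Which Spark memory settings apply also depends on the Spark version.

```python
from itertools import product
from math import prod

# Candidate values per layer. The setting names are real; the choice of
# knobs and every candidate value are assumptions made for illustration.
CANDIDATES = {
    "spark": {
        "spark.executor.memory": ["4g", "8g", "16g"],
        "spark.memory.fraction": [0.4, 0.6, 0.8],
        "spark.memory.storageFraction": [0.3, 0.5],
    },
    "yarn": {
        "yarn.nodemanager.resource.memory-mb": [32768, 65536],
    },
    "jvm": {
        "-XX:NewRatio": [1, 2, 3],
        "gc": ["-XX:+UseParallelGC", "-XX:+UseG1GC"],
    },
    "os": {
        "vm.swappiness": [0, 10, 60],
    },
}


def flatten(candidates):
    """Map (layer, setting) -> list of candidate values."""
    return {
        (layer, name): values
        for layer, knobs in candidates.items()
        for name, values in knobs.items()
    }


def search_space_size(candidates):
    """Number of distinct configurations in the full cross product."""
    return prod(len(values) for values in flatten(candidates).values())


def configurations(candidates):
    """Yield every configuration as a {(layer, setting): value} dict."""
    flat = flatten(candidates)
    keys = list(flat)
    for combo in product(*flat.values()):
        yield dict(zip(keys, combo))


print(search_space_size(CANDIDATES))
```

Even with only seven knobs and two or three candidate values each, the cross product already contains several hundred configurations, and the count grows multiplicatively with every knob added.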
Traditional cluster design and deployment 
decisions are experience- or measurement-based, which 
does not meet Spark cluster deployment criteria very 
well. Because Spark is still very new, very few 
users can make sound and accurate decisions based on 
experience. On the other hand, even once a cluster is 
available, measurement-based optimization is 
extremely time-consuming and can easily be 
interrupted by random environmental factors such as disk 
or network interface card (NIC) failures. 
Simulation-based cluster analysis is in general a 
much more reliable approach to obtain systematic 
optimization solutions. Among the various simulation 
methods proposed (Kolberga et al., 2013), (Wang et 
al., 2011), (Kennedy and Gopal, 2013), (Verma et al., 
2011), CSMethod (Bian et al., 2014) is a fast and 
accurate cluster simulation method which employs a 
layered and configurable architecture to simulate Big 
Data clusters on standard client computers (desktop 
or laptop).  
The Spark workflow, especially the directed acyclic 
graph (DAG) abstraction, is very different from the 
Hadoop MapReduce workflow. In addition, the memory 
subsystem of the current CSMethod-based MapReduce 
model is too coarse to meet the accuracy 
requirements of Spark simulation. To fill these gaps, 
this paper proposes a new simulation framework 
which is based on and extends CSMethod. All 
performance-intensive Spark parameters and 
workflows are modeled for fast and accurate 
performance prediction with a fine-grained 
multi-layer memory subsystem.  
The whole Spark cluster software stack is 
abstracted and simulated at the functional level, including 
computing, communications and dataset access. 
Software functions are dynamically mapped onto 
hardware components. The timing of hardware 
components (storage, network, memory and CPU) is 
modeled according to payload and activities as 
perceived by software. A low overhead discrete-event 
simulation engine enables fast simulation speed and 
good scalability. The Spark simulator accepts Spark 
applications with input dataset information and 
cluster configurations, then simulates the performance 
behaviour of the Spark application. The cluster 
configuration includes the software stack 
configuration and the hardware component 
configuration. 
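The mapping of software functions onto timed hardware components can be illustrated with a minimal discrete-event sketch. It is not the CSMethod engine: the `Resource` and `simulate` names, the one-request-at-a-time service discipline, and the bandwidth figures are all assumptions made for illustration. Each software step names a hardware component and a payload, service time is the payload divided by the component's bandwidth, and a priority queue orders events by simulated time.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical hardware component serving one request at a time."""
    name: str
    bandwidth: float       # payload units per second, e.g. MB/s
    free_at: float = 0.0   # simulated time at which the component is idle

    def serve(self, now, payload):
        """Queue a request arriving at `now`; return its completion time."""
        start = max(now, self.free_at)
        self.free_at = start + payload / self.bandwidth
        return self.free_at


def simulate(tasks, resources):
    """Run all tasks and return {task_name: finish_time}.

    tasks: {task_name: [(resource_name, payload), ...]}, steps run in order.
    """
    # Event tuple: (time, tie-breaking sequence number, task, next step index).
    events = []
    for seq, name in enumerate(tasks):
        heapq.heappush(events, (0.0, seq, name, 0))
    seq = len(tasks)
    finished = {}
    while events:
        now, _, name, step = heapq.heappop(events)
        steps = tasks[name]
        if step == len(steps):
            finished[name] = now          # no steps left: task is complete
            continue
        res_name, payload = steps[step]
        done = resources[res_name].serve(now, payload)
        heapq.heappush(events, (done, seq, name, step + 1))
        seq += 1
    return finished


# Illustrative numbers only: two tasks contend for one disk and one CPU.
resources = {
    "disk": Resource("disk", bandwidth=100.0),
    "cpu": Resource("cpu", bandwidth=50.0),
}
tasks = {
    "t1": [("disk", 200.0), ("cpu", 100.0)],
    "t2": [("disk", 100.0), ("cpu", 100.0)],
}
finish_times = simulate(tasks, resources)
print(finish_times)
```

Because only event times are advanced and no real data is moved, the cost of such a simulation depends on the number of events rather than on the payload sizes, which is what makes a discrete-event engine attractive for cluster-scale workloads.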
The following key contributions are presented in 
this paper: 
• We propose a new framework to simulate the whole 
performance-intensive Spark workflow, including 
DAG generation; RDD input fetch, transfer, shuffle 
and block management; and spill and HDFS access. 
• We describe a fine-grained multi-layer memory 
performance model which simulates the memory 
behaviour of Spark, JVM, OS and H/W layers with 
high accuracy. 
• We implement and validate the Spark simulation 
framework using a range of micro benchmarks and a 
real-world IoT (Internet of Things) workload. The 
average error rate is within 7% and simulation speeds 
are very high: running on a commercial desktop, the 
simulation time is close to the native execution time 
of a 5-node Intel Xeon E5 high-end server cluster. 
• We demonstrate a simulation-based Spark parameter 
tuning approach which helps Big Data cluster 
deployment planning, evaluation and optimization.  
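The simulation-based tuning approach named in the last contribution can be sketched as a sweep over candidate configurations, where each candidate is scored by a simulator call instead of a cluster run. This is a minimal sketch under assumptions: `simulate_runtime` stands in for whatever interface the simulator exposes, the parameter names `executor_heap_gb` and `storage_fraction` are hypothetical, and `toy_model` is a made-up cost function used only so that the example runs. It is not a model of Spark and produces no real results.

```python
from itertools import product


def sweep(simulate_runtime, grid):
    """Score every configuration in `grid` with the given simulator call.

    Returns (best_config, best_time, results), where results is a list of
    (predicted_time, config) pairs in evaluation order.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        config = dict(zip(names, values))
        results.append((simulate_runtime(config), config))
    best_time, best_config = min(results, key=lambda r: r[0])
    return best_config, best_time, results


def toy_model(config):
    """Made-up stand-in for a simulator call; NOT a model of Spark.

    It penalizes configurations with too little cache space (standing in
    for spill cost) and large heaps (standing in for GC cost).
    """
    heap_gb = config["executor_heap_gb"]
    storage = config["storage_fraction"]
    spill_penalty = max(0.0, 10 - heap_gb * storage) * 10.0
    gc_penalty = heap_gb * 0.5
    return 100.0 + spill_penalty + gc_penalty


# Hypothetical parameter grid; names and values are placeholders.
grid = {
    "executor_heap_gb": [4, 8, 16, 32],
    "storage_fraction": [0.25, 0.5, 0.75],
}
best_config, best_time, results = sweep(toy_model, grid)
print(best_config, best_time)
```

The point of the design is that `sweep` never touches a cluster: replacing `toy_model` with a call into a validated simulator turns the same loop into a planning tool, and the exhaustive grid could be swapped for any other search strategy.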
The rest of this paper is organized as follows. 
Section 2 presents the proposed Spark simulator in 
detail. The experimental environment setup and the 
workload are then introduced in Section 3. Section 4 
presents the evaluation results and their analysis. A 
memory-related Spark performance tuning case study 
is then presented in detail in Section 5. Section 6 
reviews related work. The final section summarizes 
the paper and outlines future work. 
2 SPARK SIMULATION FRAMEWORK ARCHITECTURE 
In this section, we introduce the proposed Spark 
simulation framework in detail.