Process-oriented Discrete-event Simulation in Java with Continuations
Quantitative Performance Evaluation
Antonio Cuomo¹, Massimiliano Rak² and Umberto Villano¹
¹ Università degli Studi del Sannio, Benevento, Italy
² Seconda Università di Napoli, Aversa, Italy
Keywords:
Discrete-event Simulation, Java, Continuations, Benchmark.
Abstract:
In discrete-event simulation the process interaction view is appreciated in many different contexts, as it often
provides the cleanest and simplest way to express models. However, this view is harder to implement than
the more common event-oriented view. This is mostly due to the need for the simulation engine to support
in an efficient way the coroutine-like semantics needed to implement the simulation processes. A common
solution adopted in many Java-based simulators is the use of system threads to provide coroutines. This paper
shows that this choice leads to unnecessary overheads and limitations, and presents an alternative implemen-
tation based on continuations. For many common models the continuation-based simulator shows significant
performance gains compared to the most popular open source Java engines.
1 INTRODUCTION
Discrete-event simulation makes it possible to model
a wide class of systems ranging from factory pro-
duction lines to computer systems, from military op-
erations to air-traffic control, just to mention a few.
Support for the computer execution of discrete-event
models dates back to the sixties, when simulation-oriented languages such as Simula, GPSS and SIMSCRIPT were devised (Nance, 1996). A large research effort has been devoted to enriching mainstream languages such as C, C++, Java and Python with simulation
capabilities. The most common choice is to provide
the additional simulation functionality through a soft-
ware library. Independently of the architectural level
at which they are provided (application, library, lan-
guage), the simulation capabilities embody a world
view (Derrick et al., 1989) for their users. The world
view is essentially the set of concepts that constitute
the basic elements available to the modeler to com-
pose and to specify the simulation. The diverse world
views are functionally equivalent, but differ in expressive power and in computational efficiency. The most commonly used world views are the event-oriented, the activity-scanning and the process-oriented views. In the event-oriented formalism, the modeler describes the system in terms of events, each associated with an event routine in charge of handling event records, scheduling future events, and evaluating conditional events. The resulting model logic is quite fragmented, as the scheduling and the evaluation of conditions are scattered throughout the event routines, but the implementation can be made very efficient. In the activity-scanning view, the modeler identifies the various objects in the system, the activities that these objects perform, and the conditions under which these activities take place. The simulation is composed of a time scan (which determines the time increment for the system clock) and an activity scan (which determines which activities can be executed). The process-oriented view hinges on the concept of process, a sequence of events and activities through which a specific object moves. It enables the modeler to clearly grasp the model structure, since each object can be represented through a single, coherent process rather than multiple event routines.

This research is partially supported by MIUR-PRIN 2008 project “Cloud@Home: a New Enhanced Computing Paradigm”.

Very often the process-oriented view is internally implemented on top of an event-oriented kernel, due to the efficiency of the latter approach. But the simulator design is not trivial: to implement through a sequential program the concurrent execution of simulation processes evolving in discrete time, these should take turns in their execution on a sequential machine (the use of parallel machines introduces further problems). One can argue that the notion of simulation processes as interacting “flows of control” which must be able to suspend themselves, to yield control to other processes and, later on, to restart from where they were suspended, is in close correspondence with the programming concept of coroutines. These are a generalization of subroutines, introduced in the sixties by Conway. Marlin (Marlin, 1980) best describes two key features of coroutines:
- the values of data local to a coroutine persist between successive calls;
- the execution of a coroutine is suspended as control leaves it, only to carry on where it left off when control re-enters the coroutine later.

Cuomo A., Rak M. and Villano U. Process-oriented Discrete-event Simulation in Java with Continuations - Quantitative Performance Evaluation. DOI: 10.5220/0004014500870096. In Proceedings of the 2nd International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH-2012), pages 87-96. ISBN: 978-989-8565-20-4. Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)

In the Java language, as there is no direct support for them, coroutines can be implemented through the built-in thread system: it is sufficient to associate a thread to each coroutine, and to perform yields through the basic synchronization primitives generally available with threads (e.g., wait() and notify()). However, many researchers have recognized that the use of threads just to hold some computation state is overkill, and can possibly lead to large overheads².
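As an illustration of the thread-based approach just described, the following minimal sketch emulates a coroutine with one dedicated thread and a wait()/notifyAll() handshake. The class ThreadCoroutine and its methods resume() and yieldControl() are hypothetical names introduced here for illustration; they are not taken from any of the simulators discussed in this paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: a coroutine emulated by a dedicated thread plus wait()/notifyAll(). */
final class ThreadCoroutine {
    private final Object lock = new Object();
    private boolean coroutineTurn = false; // true while the coroutine body is allowed to run
    private boolean finished = false;

    ThreadCoroutine(Consumer<ThreadCoroutine> body) {
        Thread t = new Thread(() -> {
            awaitTurn(true);               // stay parked until the first resume()
            body.accept(this);
            synchronized (lock) {          // body ended: hand control back for good
                finished = true;
                coroutineTurn = false;
                lock.notifyAll();
            }
        });
        t.setDaemon(true);
        t.start();
    }

    /** Blocks the calling thread until the turn flag has the wanted value. */
    private void awaitTurn(boolean wanted) {
        synchronized (lock) {
            while (coroutineTurn != wanted) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    /** Caller side: run the body until it yields or ends; returns false once it has ended. */
    boolean resume() {
        synchronized (lock) {
            if (finished) {
                return false;
            }
            coroutineTurn = true;
            lock.notifyAll();
        }
        awaitTurn(false);
        synchronized (lock) {
            return !finished;
        }
    }

    /** Coroutine side: give control back to the caller and block until resumed. */
    void yieldControl() {
        synchronized (lock) {
            coroutineTurn = false;
            lock.notifyAll();
        }
        awaitTurn(true);
    }

    /** Interleaves a two-step coroutine with its caller and returns the observed order. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        ThreadCoroutine co = new ThreadCoroutine(self -> {
            log.add("step 1");
            self.yieldControl();
            log.add("step 2");
        });
        co.resume();
        log.add("caller");
        co.resume();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Even in this small sketch, every switch of control involves lock acquisition and a thread wake-up, and every coroutine keeps a whole thread (with its stack) alive only to remember where its body stopped.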
This paper illustrates an alternative design of a
Java-based discrete event simulator, which imple-
ments the coroutine semantics through the less com-
mon concept of continuations (Reynolds, 1993). A
continuation is a data structure that stores the compu-
tational process at a given point in the program exe-
cution (program counter, stack, ...). The data stored
can be accessed later on programmatically: upon in-
vocation of the instance of continuation, the process
will resume execution from the control point that it
previously saved.
Our Java-based discrete event simulator, JADES
(which stands for JAva Discrete Event Simulator)
uses continuations as the basic tool for providing the
coroutine semantics. Our objective was to implement a process-oriented simulator that does not resort to threading, and that can thus avoid the overheads and run-time limitations typical of thread-based Java simulators. This paper first presents the
continuation-based design of JADES, and then tackles
the problem of the quantitative evaluation of its per-
formance in comparison to other currently-available
Java simulators. This will require the design of suit-
able benchmarks and extensive experimentation, both
widely documented here.
The rest of this paper is organized as follows. Section 2 provides a description of the most relevant issues in designing process-oriented simulators and an overview of related work. Sections 3 and 4 describe the design and implementation of the JADES simulator, respectively. Section 5 presents benchmarks for the evaluation of process-oriented discrete event simulators. In Section 6 a first application of the benchmarks is shown, using them to compare the performance of JADES to other state-of-the-art open source Java simulators. The comments on the test results and our conclusions are the object of Section 7.

² Overheads have been amplified by the progressive abandonment of 'green' threads and the use of native operating system threads by the Java Virtual Machine.

2 BACKGROUND AND RELATED WORK
The implementation of a general-purpose simulation
system entails the provision of six fundamental fea-
tures (Kiviat, 1969): 1) representation of simulated
time; 2) management of simulated entities, includ-
ing their creation, state and collections; 3) generation
of uniform pseudorandom numbers; 4) generation of
non-uniform random variates; 5) statistical data col-
lection; 6) reporting facilities, for summary and/or de-
tailed performance data.
If the simulator is to support the process-oriented
view, additional features are needed, as pointed out in
(Perumalla and Fujimoto, 1998):
F1 procedures can declare and use local variables.
F2 procedure calls can be nested.
F3 procedures can be recursive and re-entrant.
F4 primitives to advance simulation time can be in-
voked in any procedure.
F5 primitives to advance simulation time can be in-
voked wherever a conditional, looping or other
statements can appear.
As noted in (Kunert, 2008), features F1 to F3 are directly provided by most general-purpose programming languages, while features F4 and F5 are difficult to implement, as they imply the ability to suspend the execution of a process and resume it at a later moment.
In the context of Java simulators, we can distin-
guish between those that implement simulation pro-
cesses as threads, and others that do not. Based
on this distinction, the next subsections highlight the
most relevant research contributions in each area.
It is worth pointing out the existence of a plethora
of discrete-event simulation libraries that do not of-
fer direct support for the process-oriented world
view, but provide only the activity scanning or, more
commonly, the event-oriented view. On the other
hand, there exist simulators that support the process-oriented view but are not implemented in Java, such as
SIMULTECH2012-2ndInternationalConferenceonSimulationandModelingMethodologies,Technologiesand
Applications
88
CSIM for C/C++ and SimPy for Python. Both
the classes of simulators mentioned above will not
be dealt with in the discussion that follows, which
will consider only Java-based simulators supporting
a process-oriented world view.
2.1 Thread-based Simulators
SimJava (Howell and McNab, 1998) and JSim (Miller
et al., 1997) are among the first implementations of
the thread-based class of simulators. These early ef-
forts pay particular attention to web-based simulation
and to the Java Applet deployment model. Many sim-
ulators aim at replicating the functionality and design
of Simula in Java. For example, Javasimulation (Helsgaun, 2000) follows the Simula design so closely that coroutines are presented as the main mechanism for implementing simulation processes. However, coroutines are in turn implemented by exploiting Java threading.
Desmo-J (Lechler and Page, 1999) supports advanced process-oriented modeling features. These include capacity-constrained resources, conditional waiting and special process relationships such as producer/consumer and asymmetric master/slave. SSJ
(L’Ecuyer and Buist, 2005) is designed for perfor-
mance, flexibility and extensibility. It offers its users
the possibility to choose between many alternatives
for most of the internal algorithms and data structures
of the simulator.
As mentioned before, Java threads are a powerful resource, but using them just to save an execution context for later resumption is probably overkill, and introduces unnecessary overheads and limitations on the maximum number of simulated processes. Let us consider the two issues separately. As regards the overheads, it should be pointed out that Java platforms most typically implement threads as system threads. Hence threads are individually scheduled by the OS and managed through system calls. This makes it possible to exploit multiple processors, but also introduces a non-negligible overhead associated with thread management activities. As for the limitation on the maximum number of simulated processes, it can be observed that Java threads have a minimum stack size³ that cannot be reduced due to the presence of guard pages on the stack. When the address space is not so large (e.g., in 32-bit systems), the maximum number of simulated processes is severely limited by the maximum number of threads that can be allocated in a memory where the heap is usually dominant due to the Java execution model. In practice, in a 32-bit Linux system, where the addressable space is 4 GB, of which 1 GB is typically reserved for the kernel, less than 3 GB are available for the rest. Under the reasonable assumption that 1.5 GB are used for the JVM text and data sections and the heap, only 1.5 GB remain for the thread stacks. Even with the lowest setting of stack size for a thread (48 KB), the maximum number of allowed threads will be about 30,000. Large as it may seem, this number of threads turns out to be insufficient for models of complex systems in which a considerable number of simultaneously active entities must be simulated (e.g., parallel computers, wide area networks, multitasking operating systems, sensor networks, ...).

³ Platform dependent, 48 KB on a typical 32-bit Linux box.

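The estimate above is a plain division of the address space left for thread stacks by the per-thread stack size. The short sketch below reproduces the arithmetic under the same assumptions made in the text (3 GB of user space, 1.5 GB taken by JVM text, data and heap); the class and method names are hypothetical.

```java
/** Hypothetical helper reproducing the back-of-the-envelope thread limit of Section 2.1. */
final class ThreadLimitEstimate {
    static final long KB = 1024L;
    static final long GB = 1024L * 1024L * 1024L;

    /** Upper bound on the number of threads: stack region divided by per-thread stack size. */
    static long maxThreads(long stackRegionBytes, long stackBytesPerThread) {
        return stackRegionBytes / stackBytesPerThread;
    }

    public static void main(String[] args) {
        long userSpace = 4 * GB - 1 * GB;          // 32-bit Linux: 4 GB minus 1 GB for the kernel
        long jvmAndHeap = 3 * GB / 2;              // assumed 1.5 GB for JVM text, data and heap
        long stackRegion = userSpace - jvmAndHeap; // 1.5 GB left for thread stacks
        // 1.5 GB / 48 KB = 32,768, i.e. "about 30,000" threads.
        System.out.println(maxThreads(stackRegion, 48 * KB));
    }
}
```

The bound is optimistic, since it ignores any other use of the remaining address space.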
2.2 Beyond Threads
In light of all the above, the use of alternatives to
threads for saving and resuming a context seems at
least reasonable. In recent years, different approaches
have been followed to provide a better implementa-
tion of the process interaction view, all of which try
to deal with the lack of support for coroutines in the
standard Java language and virtual machine. D-SOL
(Jacobs et al., 2002) is based on the use of a process
interpreter. This can be thought of as a virtual ma-
chine implemented in Java that executes the code of
the process class. The interpreter takes care of pausable methods and is able to save their execution context. To avoid unnecessary overhead, the methods invoked by the process that do not lead to process suspension are directly executed through reflection. A D-SOL based interpretation engine is also found in the current version of the above-mentioned SSJ simulator. Tortuga (Weatherly and Page, 2004) provides an implementation of coroutines in Java which is not based on threads. It relies on a modified version of a non-standard Java virtual machine, the Jikes RVM, which provides the state-saving mechanism.
Works such as (Stadler, 2011) are paving the way for the integration of coroutines or of a continuation-like mechanism in the standard Java platform. In the meantime, the use of continuations is beginning to spread in discrete-event simulation. The MyTimeWarp simulator by Kunert (Kunert, 2008) hinges on the JavaFlow continuation library (which is also adopted for the JADES simulator presented in this paper) and focuses on process-oriented time warp optimistic parallel simulation. However, neither performance tests nor source code are available to allow a comparison with our work.
Process-orientedDiscrete-eventSimulationinJavawithContinuations-QuantitativePerformanceEvaluation
89
Figure 1: JADES class diagram.
3 JADES: SIMULATOR DESIGN
JADES (JAva Discrete Event Simulator) is a Java
library that exploits continuations to support the
construction of simulation models according to the
process-oriented world view. The aim of JADES is
to provide modelers with an implementation of the
process-oriented paradigm which is both effective (as
it offers a wide range of building blocks to compose
models) and efficient (allowing for the creation of
large, complex models and their fast evaluation).
While the implementation of the simulator is quite
innovative, the design of the JADES interface is in-
spired by the popular simulator CSIM (Schwetman,
2001), which provides one of the most complete
process-oriented simulation APIs for C and C++.
There exists a Java version of CSIM, which, unlike
the coroutine-based C/C++ version, uses Java threads
to implement the simulation processes. This design
choice introduces all the drawbacks discussed in the
previous section.
The design of JADES is illustrated in Figure 1. In our system, a simulation is composed of processes, which are active components able to act upon (and interact through) passive objects or resources. A model is a logical organization of simulation processes, together with a description of the simulation details, in terms of number of simulation runs, report generation and tracing. Modelers create their own simulation model as a subclass of Model: they only have to implement the run method, providing all the model-specific logic to create the initial simulation processes and the resources. A model is associated with a Scheduler, which manages the queue of ready processes and selects the next process to run in order of simulation time and, to break ties, process priority. Besides defining the overall model, developers also have to specify the behavior of their processes: as for the model, this is done by subclassing a predefined Process class, which provides all the common basic functionalities a process needs. These include: a) holding, that is, waiting for the passage of a given amount of simulation time; b) adding other processes to the simulation (at the current simulation time, or later); c) making use of resources. The resources are passive objects that essentially provide higher-level, more useful abstractions than simple process queues. The predefined classes of resources provided with the simulator are:
- Facility, which models all-or-none resources, such as servers;
- Storage, which models partially allocatable resources, such as memories, disks, ...;
- Buffer, a resource through which processes can communicate and synchronize by producing and consuming items;
- Mailbox, through which processes can interact by exchanging messages;
- Event, for conditional process synchronization. Processes can wait for events to occur and declare events as occurred.
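The ordering rule applied by the Scheduler (simulation time first, process priority to break ties) can be sketched with a standard priority queue. The Entry class below is a hypothetical stand-in for a ready-queue record, not the actual JADES data structure, and the convention that a larger number means a higher priority is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a ready queue ordered by wake-up time, then by priority. */
final class ReadyQueueSketch {
    /** Stand-in for a ready-queue record; not the actual JADES data structure. */
    static final class Entry {
        final String name;
        final double wakeupTime;
        final int priority;

        Entry(String name, double wakeupTime, int priority) {
            this.name = name;
            this.wakeupTime = wakeupTime;
            this.priority = priority;
        }
    }

    // Earlier wake-up time first; on ties, larger priority value first (assumed convention).
    static final Comparator<Entry> ORDER =
        Comparator.comparingDouble((Entry e) -> e.wakeupTime)
                  .thenComparing(Comparator.comparingInt((Entry e) -> e.priority).reversed());

    /** Enqueues three processes and returns their names in scheduling order. */
    static List<String> demo() {
        PriorityQueue<Entry> ready = new PriorityQueue<>(ORDER);
        ready.add(new Entry("B", 5.0, 1));
        ready.add(new Entry("A", 2.0, 0));
        ready.add(new Entry("C", 5.0, 3));
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            order.add(ready.poll().name);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // A runs first; C precedes B because of its priority
    }
}
```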
Every resource is instrumented to gather statistics about its usage during the simulation (average queue length, waiting time, number of processes served). Additional statistics can be gathered programmatically using Tables, which allow the modeler to record user-defined observations of values at given points in the program. Almost all the events that happen during execution can be individually traced, together with the
SIMULTECH2012-2ndInternationalConferenceonSimulationandModelingMethodologies,Technologiesand
Applications
90
process to which they refer and the simulation time at which the event happened. Finally, the Random class allows the generation of multiple, independent streams of pseudo-random numbers.
4 JADES IMPLEMENTATION
This section will discuss some of the internals of
JADES, with particular regard to its distinguishing
feature, the use of continuations to implement the
simulation processes. The description of the more
“conventional” parts of the simulation library is omit-
ted here for brevity’s sake, and will be presented in a
companion paper. Since continuations are not directly
provided by the standard Java platform, JADES must
resort to an external system, which we call a continuation provider. Several existing continuation providers have been evaluated for use in JADES. As one of our objectives was the use of a standard JVM (to allow easy integration of the simulator in other applications), we did not consider approaches based on modified VMs. Our analysis showed that currently the most mature project is JavaFlow (Ortega-Ruiz et al., 2004), a component of the Apache Jakarta Commons Sandbox. JavaFlow provides asymmetric continuations, in that it forces the programmer to specify the continuation to which control is to be passed. The core JavaFlow API is found in the static methods of the Continuation class:
- Continuation startWith(Runnable r) makes it possible to construct a continuation from a Runnable object and execute its run method. Control passes to the continuation and goes back to the caller if run ends or if the suspend() method is invoked (in which case a valid continuation is returned).
- Continuation continueWith(Continuation c) resumes the execution of the continuation passed as parameter from where it left off.
- void suspend() stops the running continuation, creating a resumption point and giving control back to the method that called startWith or continueWith.
JavaFlow implements the continuation functionality through bytecode rewriting. The bytecode of all the classes of the system is scanned for invocations of the suspend method. When a method is found that contains such an invocation, that method and all its potential invokers (recursively, up to the start of the continuation) are instrumented to add the continuation-management code. This includes: a) code which must be executed when a suspension occurs, which includes switches that represent the intermediate points of the method being executed and calls to a library-managed stack that maintains the content of the stack frame; b) code for resuming the execution at the point of the suspension (following from the start the chain of switches) with the associated stack contents (popping the stack managed by the library). Examples of this process are shown in (Ortega-Ruiz et al., 2004; Kunert, 2008).
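To give an intuition of the effect of such instrumentation, the sketch below is a hand-written source-level analogue of a rewritten method: a switch selects the resumption point, and an explicit stack stands in for the library-managed stack that holds the frame contents. This is only an illustration under our own assumptions; it is not the bytecode that JavaFlow actually generates, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hand-written analogue of an instrumented method; not actual JavaFlow output. */
final class RewrittenLoopSketch {
    // Stand-in for the library-managed stack that keeps frame contents across suspensions.
    private final Deque<Integer> savedFrame = new ArrayDeque<>();
    private boolean restoring = false;
    final List<String> log = new ArrayList<>();

    /**
     * Equivalent of:  for (int i = 0; i < 3; i++) { log(i); suspend(); }
     * Returns true if the method suspended, false if it ran to completion.
     */
    boolean run() {
        int i = 0;
        int resumePoint = 0;                 // 0 = method entry, 1 = just after suspend()
        if (restoring) {                     // resume path: reload position and locals
            resumePoint = savedFrame.pop();
            i = savedFrame.pop();
            restoring = false;
        }
        while (true) {
            switch (resumePoint) {
                case 0:                      // loop test and body up to the suspension
                    if (i >= 3) {
                        return false;
                    }
                    log.add("iteration " + i);
                    savedFrame.push(i);      // save the local variable
                    savedFrame.push(1);      // save the resumption point
                    restoring = true;
                    return true;             // "suspend": unwind to the caller
                case 1:                      // continue right after the suspension
                    i++;
                    resumePoint = 0;
                    break;
                default:
                    throw new IllegalStateException("unknown resume point");
            }
        }
    }

    /** Drives the method to completion and returns the number of suspensions. */
    static int demo(RewrittenLoopSketch sketch) {
        int suspensions = 0;
        while (sketch.run()) {
            suspensions++;
        }
        return suspensions;
    }

    public static void main(String[] args) {
        RewrittenLoopSketch sketch = new RewrittenLoopSketch();
        System.out.println(demo(sketch) + " suspensions, log = " + sketch.log);
    }
}
```

In this picture, every method on the call chain between the start of the continuation and the suspension point has to save and later restore its own frame in the same way, so the work per switch can be expected to grow with the depth of the stack.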
Let us now describe how the JavaFlow library is used in different parts of JADES. When the first process is added to the simulation, it is added to the ready queue and a continuation is created for the scheduler code:
public void start(Process p) {
    // Enqueue the first process at the current simulation time.
    scheduler.addProcess(p, scheduler.getCurrentTime());
    // Run the scheduler loop inside its own continuation.
    Continuation.startWith(scheduler);
}
When a process wants to add a new process to the simulation, it tells the scheduler to enqueue the new process and gives control to the scheduler continuation by suspending itself.
public void add(Process p, double delay) {
    double currentTime = scheduler.getCurrentTime();
    // Enqueue the new process at its activation time.
    scheduler.addProcess(p, currentTime + delay);
    // Re-enqueue the calling process at the current time, then yield to the scheduler.
    scheduler.addToReadyProcesses(
        scheduler.getCurrentProcess(), currentTime);
    Continuation.suspend();
}
The code for a process hold is shown below. If the current process is going to sleep past the wake-up time of the process at the head of the queue, it is added to the ready queue and control is given to the scheduler continuation. Otherwise, the current process would be made active again immediately after holding: we can just advance the simulation time to its wake-up time, avoiding unnecessary rescheduling.
public void hold(double time) {
    double wakeupTime = scheduler.getCurrentTime() + time;
    double nextWakeupTime = scheduler.peekNextTime();
    if (nextWakeupTime > wakeupTime) {
        // No other process wakes up earlier: advance the clock and keep running.
        scheduler.setCurrentTime(wakeupTime);
    } else {
        // Another process is due first: re-enqueue this process and yield to the scheduler.
        scheduler.addToReadyProcesses(this, wakeupTime);
        Continuation.suspend();
    }
}
The scheduler continuation executes the scheduling loop in which:
1. The process with the smallest wake-up time is extracted from the ready queue. If there is no process available, the simulation is finished. Otherwise:
2(a). if the extracted process is at its first schedule, a continuation is created for it and started through the Continuation.startWith() method;
2(b). else, if the extracted process is not at its first schedule, its continuation is resumed through the Continuation.continueWith() method.
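The structure of this loop can be sketched without any continuation library by letting a plain Iterator stand in for a suspended process: each call to next() runs the process up to its next hold and returns the hold duration. This is a simplification introduced here for illustration (with an iterator, starting and resuming collapse into the same operation, so steps 2(a) and 2(b) are not distinguished), and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of the scheduling loop; iterators stand in for continuations. */
final class SchedulerLoopSketch {
    /** A resumable process together with its wake-up time. */
    static final class Ready {
        final String name;
        final Iterator<Double> holds; // each element: duration of the next hold
        final double wakeupTime;

        Ready(String name, Iterator<Double> holds, double wakeupTime) {
            this.name = name;
            this.holds = holds;
            this.wakeupTime = wakeupTime;
        }
    }

    // Smallest wake-up time first; the name is used only to make ties deterministic.
    static final Comparator<Ready> BY_TIME =
        Comparator.comparingDouble((Ready r) -> r.wakeupTime).thenComparing((Ready r) -> r.name);

    /** Runs the loop until the ready queue is empty and returns the activation trace. */
    static List<String> run(PriorityQueue<Ready> readyQueue) {
        List<String> trace = new ArrayList<>();
        while (!readyQueue.isEmpty()) {
            Ready r = readyQueue.poll();       // step 1: extract the earliest process
            double now = r.wakeupTime;         // the clock jumps to its wake-up time
            trace.add(r.name + "@" + now);
            if (r.holds.hasNext()) {           // step 2: resume it until its next hold
                readyQueue.add(new Ready(r.name, r.holds, now + r.holds.next()));
            }                                  // otherwise the process has terminated
        }
        return trace;
    }

    static List<String> demo() {
        PriorityQueue<Ready> ready = new PriorityQueue<>(BY_TIME);
        ready.add(new Ready("P", Arrays.asList(2.0, 2.0).iterator(), 0.0));
        ready.add(new Ready("Q", Arrays.asList(3.0).iterator(), 0.0));
        return run(ready);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```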
Process-orientedDiscrete-eventSimulationinJavawithContinuations-QuantitativePerformanceEvaluation
91
Table 1: A benchmark suite for process-oriented discrete event simulators.
Name            Type    Parameters                                                 Output
ProcessCreator  Micro   simTime: total simulation time                             # of created processes
                        thinkTime: delay between successive process creations      Execution time (ms)
PingPong        Micro   simTime: total simulation time                             Execution time (ms)
                        thinkTime: predetermined process delay
                        stackDepth: depth of the stack (additional # of frames
                          put on the stack when delay is invoked)
MM1Queue        Kernel  simTime: total simulation time                             Execution time (ms)
                        iarTime: mean job inter-arrival time
                        srvTime: mean resource service time
Our impression of JavaFlow, after its adoption for JADES development, is that it is not the ultimate solution for providing continuations in Java, and that there is still room for improvement. In fact, other approaches are emerging in the literature that should allow a more efficient implementation of these constructs (Stadler, 2011), for example with the direct support of the virtual machine. Meanwhile, JavaFlow offers the best support available for the continuations we wanted to exploit in JADES. The issue here is not to provide a better implementation of continuations, but simply to check whether their use in place of threads can lead to performance benefits in Java discrete-event simulations. This assessment is the goal of the next sections.
5 BENCHMARK DESIGN
To evaluate the efficiency of the proposed approach,
the performance of JADES has been analyzed and
compared to other open source process-oriented Java
simulators. Since both JADES and its competitors are
available for testing, we found direct measurement to
be a suitable technique to obtain highly accurate per-
formance comparisons. However, to the best of our
knowledge, no standard benchmark exists for com-
paring process-oriented discrete-event simulators. In
event-oriented simulators it is common to use the
event processing rate as a measure of performance
of the system under test, but this metric does not apply to process-oriented systems. The PHOLD model (Fujimoto, 1990) is suitable for comparing the performance of parallel discrete-event simulators which interact through message passing, but it is unfit to evaluate sequential simulator behavior.
Our rationale in designing our own suite of benchmarks for the comparison of process-oriented discrete event simulators is to provide models with a small amount of computation that strongly exercise the process-oriented simulator mechanisms. The benchmarks are specified in Table 1 and described hereafter.
The ProcessCreator benchmark gives a measure of how many processes can be handled by the simulator. It runs until simTime is reached, spawning a new process every thinkTime. The created processes perform a hold operation until simTime, in order to remain alive during the whole simulation and to avoid having their resources collected and reused. At the end of the run, there will be simTime/thinkTime processes alive. As ProcessCreator exercises only a single basic function, it can be classified as a microbenchmark.
PingPong is another microbenchmark, which measures the cost associated with a process switch (what is called, in OS terms, a context switch). It consists of two processes that repeatedly hold for thinkTime units of time. The first one is scheduled at time 0, while the second at time thinkTime/2. This leads to a strict alternation of the two processes, which generate a total of (2 * simTime/thinkTime) context switches. In order to measure switch time in different working conditions, a further parameter, stackDepth, is used to supply the benchmark with the depth of the stack at which the hold must occur. This has a direct impact on the quantity of “context” that must be saved and restored.
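The switch count quoted for PingPong can be checked with a few lines that only track the wake-up times of the two processes. The sketch counts the activations that fall in the interval [0, simTime); the treatment of the end point is our assumption, and the class name is hypothetical.

```java
/** Hypothetical check of the PingPong switch count, 2 * simTime / thinkTime. */
final class PingPongCount {
    /** Counts process activations in [0, simTime) for two processes offset by thinkTime/2. */
    static long countSwitches(double simTime, double thinkTime) {
        double nextPing = 0.0;               // first process starts at time 0
        double nextPong = thinkTime / 2.0;   // second process starts half a period later
        long switches = 0;
        while (Math.min(nextPing, nextPong) < simTime) {
            if (nextPing <= nextPong) {
                nextPing += thinkTime;       // ping runs, then holds for thinkTime
            } else {
                nextPong += thinkTime;       // pong runs, then holds for thinkTime
            }
            switches++;
        }
        return switches;
    }

    public static void main(String[] args) {
        // With simTime = 1000 and thinkTime = 1 the formula gives 2 * 1000 / 1 = 2000.
        System.out.println(countSwitches(1000.0, 1.0));
    }
}
```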
The MM1Queue benchmark is an implementation of an M/M/1 queuing model in which customers arrive according to a Poisson process with rate λ, service time is exponentially distributed with mean 1/µ and there is 1 server. When customers find the server busy, they are added to a queue of infinite capacity. In the implementation, a Generator process loops, alternately spawning a Job and holding for a time exponentially distributed with mean iarTime (that is, iarTime = 1/λ). The Job processes compete for exclusive access to a resource, which is used for a time exponentially distributed with mean srvTime (that is, srvTime = 1/µ). When λ > µ the queue is unstable and the expected number of users grows steadily as simulation proceeds. In process-oriented simulation, an increasing number of users in the system corresponds to an increasing number of simultaneously active processes. Hence this benchmark can easily activate a huge number of processes if the queue is unstable. The MM1Queue benchmark can be classified as a kernel benchmark, as it is representative of the core behavior of more complex queuing network models.
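The growth of the population in the unstable case can be observed with a very small stand-alone simulation. The sketch below is event-oriented (it only tracks the next arrival and the next departure) and is therefore not the process-oriented MM1Queue benchmark itself; the class name, the method name and the seed parameter are assumptions made for this example.

```java
import java.util.Random;

/** Hypothetical event-oriented M/M/1 sketch; not the process-oriented benchmark code. */
final class MM1Sketch {
    /** Draws an exponentially distributed value with the given mean (inverse transform). */
    static double exponential(Random rng, double mean) {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    /** Returns the number of jobs in the system (queued or in service) at time simTime. */
    static int jobsInSystemAt(double simTime, double iarTime, double srvTime, long seed) {
        Random rng = new Random(seed);
        double nextArrival = exponential(rng, iarTime);
        double nextDeparture = Double.POSITIVE_INFINITY; // the server starts idle
        int inSystem = 0;
        while (Math.min(nextArrival, nextDeparture) <= simTime) {
            if (nextArrival <= nextDeparture) {
                double now = nextArrival;
                inSystem++;
                if (inSystem == 1) {                     // server was idle: start service
                    nextDeparture = now + exponential(rng, srvTime);
                }
                nextArrival = now + exponential(rng, iarTime);
            } else {
                double now = nextDeparture;
                inSystem--;
                nextDeparture = inSystem > 0
                    ? now + exponential(rng, srvTime)    // serve the next queued job
                    : Double.POSITIVE_INFINITY;          // queue empty: server idles
            }
        }
        return inSystem;
    }

    public static void main(String[] args) {
        // Stable queue: iarTime = 2, srvTime = 1 (lambda = 0.5 < mu = 1).
        System.out.println("stable:   " + jobsInSystemAt(10000.0, 2.0, 1.0, 42L));
        // Unstable queue: iarTime = 0.5, srvTime = 1 (lambda = 2 > mu = 1).
        System.out.println("unstable: " + jobsInSystemAt(10000.0, 0.5, 1.0, 42L));
    }
}
```

In the unstable configuration the population grows on average by λ - µ jobs per unit of time, which in a process-oriented simulator translates into the same number of additional live processes.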
6 EXPERIMENTS
For our tests, we have selected several state-of-the-art open source Java simulation engines, purposely neglecting non-Java frameworks, which typically have non-comparable performance. For every simulation engine, an implementation of the benchmark suite described in the previous section has been devised. The set of simulators to be evaluated includes, in addition to JADES, the following:
- JADESThreads, a previous implementation of JADES based on threads;
- Javasimulation (Helsgaun, 2000), version 2.1, as a representative of the “barebone” process-oriented simulators, which provide a simple implementation of basic processes in the form of wrappers around the underlying thread implementation;
- Desmo-J (Lechler and Page, 1999), version 2.3.2, a process-oriented simulator whose thread-based implementation spans 10 years of maintenance;
- the D-SOL simulator, version 2.1 (Jacobs et al., 2002). This is representative of the (few) available simulation engines that are based neither on threads nor on continuations;
- the SSJ simulator (L’Ecuyer and Buist, 2005), version 2.4, a complete solution for discrete-event simulation. It is provided with both a thread-based implementation and an alternative interpretative mechanism which hinges on D-SOL. Only the former implementation has been chosen for our tests, as D-SOL is considered separately.
The next step is the design of a set of experiments to be performed on the simulators. A simple design was chosen, in which every benchmark-specific parameter has been varied to explore its effects. Apart from benchmark-specific parameters, the tests depend on the configuration of the Java Virtual Machine. The standard HotSpot virtual machine in server configuration has been used. Two parameters of the JVM are particularly relevant for the tests: maximum heap size and thread stack size. These were chosen individually for every benchmark. In the ProcessCreator and MM1Queue benchmarks, which involve the creation of many processes, the heap was limited to 1 GB to ensure that thread-based simulators are not penalized, since a smaller heap leaves more address space for thread stacks. In the PingPong benchmark, with only 2 processes involved, the heap limit was raised to 1.5 GB. All the data points plotted in the figures are the average of 5 repetitions. We observed that the coefficient of variation (CV) was under 10%. Since many simulators are involved, error bars are not plotted in the following figures to avoid cluttering the graphs.
To allow for repeatability and results contextual-
ization, Table 2 summarizes the hardware and soft-
ware configuration of the test environment.
Table 2: Testbed configuration.
CPU           Intel Core 2 Duo T9500, 2.5 GHz
RAM           3 GB DDR2 PC5300 @ 667 MHz
OS            GNU/Linux 2.6.38-2, 32-bit
Java version  Java™ SE 1.6.0_24-b07
6.1 ProcessCreator Results
Table 3 summarizes the results obtained for the process creation benchmark. For each simulator, two runs of the benchmark try to create 2^12 (4,096) and 2^20 (1,048,576) processes, respectively, measuring the total execution time. In the second run, as was to be expected, thread-based simulators fail after the creation of a few thousand processes (the partial figures are reported in the table). Reducing the thread stack size to 64 KB makes it possible to raise this limit, but on the machines used for our tests 2^20 processes remain out of the reach of thread-based simulators. As discussed in Section 2, this limitation is linked to the maximum number of native threads that the platform allows to execute.
The only simulators that can manage the 2^20 processes required by the ProcessCreator benchmark are JADES and D-SOL. However, the performance of the latter is very low (around 9 minutes of running time for the simulation, about 70 times the execution time of JADES). For 4,096 processes, it can also be observed that JADES is significantly faster than the thread-based simulators (192 ms against 1,275 ms to 3,354 ms).
6.2 PingPong Results
Figure 2(a) shows the results of the base PingPong benchmark with no additional stack frames (stackDepth = 0). As D-SOL is much slower than the other simulators in this test, its results are not plotted in the figure for the sake of clarity, and are reported separately in Table 4. As the x axis scale in the figure is logarithmic, a linear function of x has been plotted to provide a reference. It can be observed that Desmo-J strictly follows this linear pattern. JADESThreads, Javasimulation and SSJ exhibit a very similar less-than-linear growth of execution time, but in the long run their execution time grows faster than that of JADES, which outperforms all the other simulators (by a factor of 2.5 when simTime = 1,000,000, and growing).

Table 3: ProcessCreator benchmark results.
Simulator       Thread stack size  # proc. requested  # proc. created (a)  Execution time (ms) (a)
JADES           N.A.               4,096              4,096                192
                N.A.               1,048,576          1,048,576            7,436
JADESThreads    N.A.               4,096              4,096                2,103
                256 KB             1,048,576          7,151                4,239
                64 KB              1,048,576          23,292               29,390
Javasimulation  N.A.               4,096              4,096                2,291
                256 KB             1,048,576          6,709                2,929
                64 KB              1,048,576          23,279               28,880
Desmo-J         N.A.               4,096              4,096                3,354
                256 KB             1,048,576          7,288                5,558
                64 KB              1,048,576          24,195               33,729
D-SOL           N.A.               4,096              4,096                2,451
                N.A.               1,048,576          1,048,576            534,505
SSJ             N.A.               4,096              4,096                1,275
                256 KB             1,048,576          6,708                2,896
                64 KB              1,048,576          23,342               28,052
(a) When the simulator fails (fewer processes created than requested), the values are the maximum number of processes created and the (partial) execution time.

Table 4: PingPong execution times for D-SOL.
simTime     Exec. time (ms)
1,000       1,347
10,000      3,917
100,000     17,066
1,000,000   147,372

Figure 2(b) shows how the simulators behave when the stacks are deepened by using values of the benchmark parameter stackDepth > 0. It can be observed that while thread-based simulators are unaffected by the increase of stack depth, JADES execution times grow linearly with stackDepth. This phenomenon is due to the way JavaFlow implements continuations.
As continuations involve saving and restoring of
the stack, context switches introduce an overhead
proportional to the stack size. On the other hand,
thread-based simulators are not affected by the same
problem, since every thread has its own private stack
and so context switches are executed in constant
time. Once again, the D-SOL results are not plotted
to avoid breaking the graph scale: its execution
time showed to be independent of the stack depth
and very close to the last row of Table 4. Figure
2(b) makes it possible to observe that JADES shows
better performance than the other simulators only for
stackDepth < 50.
In conclusion, we can deduce that only simulation models in which the
level of recursion is low (or, equivalently, call nesting is shallow)
will benefit from the use of continuations.
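The argument about stack depth can be made concrete with a toy cost model. The sketch below is an assumption-laden illustration, not JavaFlow code: it represents a call stack as a deque of dummy frames and counts how many frame copies one context switch requires when the stack is saved and restored, against a thread-style switch that leaves the private stack in place. The class and method names are hypothetical, and the model counts operations rather than measuring time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy cost model: frame copies per context switch (not JavaFlow's actual mechanism). */
final class ContextSwitchCostModel {

    /** Builds a stack with one base frame plus stackDepth additional frames. */
    static Deque<Integer> buildStack(int stackDepth) {
        Deque<Integer> stack = new ArrayDeque<Integer>();
        for (int i = 0; i <= stackDepth; i++) {
            stack.push(i);
        }
        return stack;
    }

    /** Continuation-style switch: every frame is copied out on suspend and back on resume. */
    static long continuationSwitch(Deque<Integer> stack) {
        long copies = 0;
        Deque<Integer> saved = new ArrayDeque<Integer>();
        while (!stack.isEmpty()) {      // suspend: unwind and save each frame
            saved.push(stack.pop());
            copies++;
        }
        while (!saved.isEmpty()) {      // resume: rebuild the stack in its original order
            stack.push(saved.pop());
            copies++;
        }
        return copies;
    }

    /** Thread-style switch: the private stack is untouched, so no frame is copied. */
    static long threadSwitch(Deque<Integer> stack) {
        return 0;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {0, 50, 100, 800}) {
            Deque<Integer> stack = buildStack(depth);
            System.out.println("stackDepth=" + depth
                    + " continuation copies=" + continuationSwitch(stack)
                    + " thread copies=" + threadSwitch(stack));
        }
    }
}
```

In this model the continuation cost is 2 * (stackDepth + 1) copies per switch, which grows linearly with stackDepth, while the thread cost stays constant. The model says nothing about the fixed cost of a thread switch, which is why the crossover point observed in the measurements cannot be derived from it.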
6.3 MM1Queue Results
Figures 2(c) and 2(d) plot the execution times for the M/M/1
benchmark. The arrival rate was set to 1.0. The service rate was set to
2.0 to obtain the stable queue behavior, and to 0.1 to obtain the
unstable one. The stable queue produces a model in which the number of
active processes in the system is small, whereas the unstable one
implies a number of processes that grows as simulation time advances.
Figure 2(c) shows the behavior of the four fastest simulators with a
stable queue. D-SOL and Desmo-j are not included in the plot, as their
execution times are considerably higher than the others. SSJ and JADES
are the fastest simulators for this benchmark, and the latter gains
advantage as simulation time grows.
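The difference between the two configurations can be checked with standard M/M/1 arithmetic. The following sketch uses the textbook formulas (utilization rho = arrival rate / service rate, and rho / (1 - rho) for the mean number in a stable system); the class MM1Load is a hypothetical helper, and treating each queued customer as one live simulation process is an assumption about how the benchmark model is written.

```java
/** Back-of-the-envelope M/M/1 figures for the two benchmark configurations. */
final class MM1Load {

    /** Utilization rho = arrivalRate / serviceRate; the queue is stable only if rho < 1. */
    static double utilization(double arrivalRate, double serviceRate) {
        return arrivalRate / serviceRate;
    }

    /** Steady-state mean number of customers in a stable M/M/1 system: rho / (1 - rho). */
    static double meanInSystem(double arrivalRate, double serviceRate) {
        double rho = utilization(arrivalRate, serviceRate);
        if (rho >= 1.0) {
            throw new IllegalArgumentException("unstable queue: rho = " + rho);
        }
        return rho / (1.0 - rho);
    }

    /** Approximate backlog of an overloaded queue after simTime time units. */
    static double expectedBacklog(double arrivalRate, double serviceRate, double simTime) {
        return Math.max(0.0, arrivalRate - serviceRate) * simTime;
    }

    public static void main(String[] args) {
        // Stable configuration from the text: arrival rate 1.0, service rate 2.0.
        System.out.println("stable rho = " + utilization(1.0, 2.0));
        System.out.println("stable mean in system = " + meanInSystem(1.0, 2.0));
        // Unstable configuration from the text: arrival rate 1.0, service rate 0.1.
        System.out.println("unstable rho = " + utilization(1.0, 0.1));
        System.out.println("backlog at simTime 20000 = " + expectedBacklog(1.0, 0.1, 20000.0));
    }
}
```

With the rates given in the text, the stable queue has rho = 0.5 and about one customer in the system on average, while the unstable queue accumulates roughly 0.9 customers per time unit, that is about 18,000 after 20,000 time units. Under the one-process-per-customer assumption, this is of the same order as the thread limits reported in Table 3.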
Figure 2(d) shows execution times in unstable conditions. Once again,
Desmo-j had to be omitted from the graph, as it is slower than the
others by a factor of 4. The best performance is exhibited by D-SOL
(typically very slow in the other benchmarks) and by JADES, which is
the clear winner here thanks to the very slow growth of its execution
times as simTime increases. It should also be pointed out that for
simTime > 20,000 the thread-based simulators crash rapidly, as they
cannot create the required number of threads. In the same 20 seconds
needed by the other simulators to simulate 20,000 time units, JADES is
able to simulate 2,000,000 time units, an improvement of two orders of
magnitude.
Figure 2: PingPong and MM1Queue benchmark results. In all panels the y
axis is the execution time (ms); the x axis is the simulation length in
(a), (c) and (d), and stackDepth in (b).
(a) PingPong benchmark: execution time vs. simulated time.
(b) PingPong benchmark: effect of stack depth on execution time.
(c) MM1Queue benchmark: stable queue (iarTime = 1.0, srvTime = 0.5).
(d) MM1Queue benchmark: unstable queue (iarTime = 1.0, srvTime = 10.0).
Process-orientedDiscrete-eventSimulationinJavawithContinuations-QuantitativePerformanceEvaluation
95
7 CONCLUSIONS
Threads are not the only option when implementing process-oriented
discrete-event simulations in Java. Our tests have shown that the
continuation-based JADES simulator is a viable alternative, which can
lead to significant performance gains in many cases. In particular, the
use of continuations turns out to be advantageous for applications with
stacks of moderate depth. On the other hand, large stacks to be saved
and restored are managed more efficiently by thread-based simulators.
This behavior is partly due to the limited efficiency of JavaFlow, the
continuation library used for JADES development. Things are likely to
change if optimized continuations are introduced into the standard Java
platform, which becomes increasingly likely as the language and its
ecosystem evolve (Stadler, 2011).
To the best of our knowledge, this paper is the first contribution in
the literature that proposes a benchmark suite for process-oriented
discrete-event simulators and makes a comparative evaluation of threads
and continuations as means for implementing such simulators in Java.
Our future work will focus on devising parallelization strategies for
the simulator, and on implementing the continuation-based simulation
library in the C language. Application-level benchmarks with complex
models will also be added to the base benchmark suite presented here.
JADES will soon be made publicly available under an open source license
at http://deal.ing.unisannio.it/perflab/projects/jades/.
REFERENCES
Derrick, E., Balci, O., and Nance, R. (1989). A comparison of selected conceptual frameworks for simulation modeling. In Proc. of the 21st Winter Simulation Conference, pages 711–718.
Fujimoto, R. M. (1990). Performance of Time Warp under synthetic workloads. In Proc. of 22nd SCS Multiconference on Distributed Simulation.
Helsgaun, K. (2000). Discrete Event Simulation in Java. http://akira.ruc.dk/keld/research/JAVASIMULATION/JAVASIMULATION-1.0/docs/Report.pdf.
Howell, F. and McNab, R. (1998). SimJava: a discrete event simulation package for Java with applications in computer systems modelling. In Proc. of the First International Conference on Web-based Modelling and Simulation.
Jacobs, P., Lang, N., and Verbraeck, A. (2002). D-SOL; a distributed Java based discrete event simulation architecture. In Proc. of the 34th Winter Simulation Conference: exploring new frontiers, pages 793–800.
Kiviat, P. (1969). Digital computer simulation: computer programming languages. Rand Corp.
Kunert, A. (2008). Optimistic parallel Process-Oriented DES in Java using Bytecode Rewriting. In Proc. of MESM 2008, pages 15–21.
Lechler, T. and Page, B. (1999). DESMO-J: An object oriented discrete simulation framework in Java. In Proc. Simulation in Industry ’99 - 11th European Simulation Symposium ’99, pages 119–124. SCS publ.
L’Ecuyer, P. and Buist, E. (2005). Simulation in Java with SSJ. In Proc. of the 37th Winter Simulation Conference, pages 611–620.
Marlin, C. (1980). Coroutines: A Programming Methodology, a Language Design and an Implementation, volume 95 of Lecture notes in computer science. Springer.
Miller, J., Nair, R., Zhang, Z., and Zhao, H. (1997). JSIM: A Java-based simulation and animation environment. In Proc. of the 30th Annual Simulation Symposium, pages 31–42. IEEE.
Nance, R. (1996). A history of discrete event simulation programming languages. In History of programming languages II, pages 369–427. ACM.
Ortega-Ruiz, J., Curdt, T., and Ametller-Esquerra, J. (2004). Continuation-based mobile agent migration. http://hacks-galore.org/jao/spasm.pdf.
Perumalla, K. and Fujimoto, R. (1998). Efficient large-scale process-oriented parallel simulations. In Proc. of the 30th Winter Simulation Conference, pages 459–466.
Reynolds, J. (1993). The discoveries of continuations. Lisp and symbolic computation, 6(3):233–247.
Schwetman, H. (2001). CSIM19: a powerful tool for building system models. In Proc. of the 33rd Winter Simulation Conference, pages 250–255. IEEE.
Stadler, L. (2011). Serializable coroutines for the HotSpot™ Java virtual machine. Master’s thesis, Johannes Kepler University Linz, Austria.
Weatherly, R. and Page, E. (2004). Efficient process interaction simulation in Java: Implementing co-routines within a single Java thread. In Proc. of the 36th Winter Simulation Conference, pages 1437–1443.
SIMULTECH2012-2ndInternationalConferenceonSimulationandModelingMethodologies,Technologiesand
Applications
96