Deconstructing yield Operator to Enhance Streams Processing
Diogo Poeira and Fernando Miguel Carvalho ᵃ
CCISEL, cc.isel.pt, Polytechnic Institute of Lisbon, Portugal
Keywords:
Yield, Generators, Streams, Lazy Sequences, Iterators, Extensions.
Abstract:
Customizing streams pipelines with new user-defined operations is a well-known pattern regarding streams
processing. However, programming languages face two challenges when considering streams extensibility: 1)
provide a compact and readable way to express new operations, and 2) keep streams’ laziness behavior. From
here, we may find a consensus around the adoption of the generator operator, i.e. yield, as a means to fulfil
both requirements, since most state-of-the-art programming languages provide this feature. Yet, what is the
performance overhead of interleaving a yield-based operation in streams processing? In this work we present
a benchmark based on realistic use cases of two different web APIs, namely Last.fm and World Weather Online,
where custom yield-based operations may degrade streams performance by a factor of two. We also propose
a purely functional and minimalistic design, named tinyield, that can be easily adopted in any programming
language and provides a concise way of chaining extension operations fluently, with low overhead in the
evaluated benchmarks. The tinyield proposal was deployed in three different libraries, namely for Java (jayield),
JavaScript (tinyield4ts) and .NET (tinyield4net).
1 INTRODUCTION
Lazy evaluation is a well-known technique that was introduced
with lazy lists in Lisp in 1976 (Friedman and Wise, 1976). Yet,
its straightforward application to object-oriented languages gave
rise to ad hoc iterator classes, which substantially increase the
complexity and verbosity of their implementations (Baker, 1993).
The use of the generator operator (i.e. yield) to implement
streams removes the aforementioned problem and was widely adopted
by mainstream programming languages (with the exception of Java).
The yield operator allows programmers to develop user-defined
operations on streams in a compact manner, while still preserving
their laziness property.
Simply put, a generator is like a function that generates a
sequence of values. However, instead of building the whole
sequence at once (e.g. as an array or vector), a generator yields
the values one at a time, i.e. it returns a "new" value every
time it is called. This idea was first introduced in the CLU
programming language (Liskov, 1983), but its recent popularity may
be attributed to its use first in C# 2.0 (Borins et al., 2006)
and later in Ruby 1.9 (Thomas and Hunt, 2007). In CLU and C#,
generators are known as iterators, and in Ruby, enumerators.
Python, PHP, JavaScript, Scala, Dart and Kotlin also provide
variants of the yield operator.
ᵃ https://orcid.org/0000-0002-4281-3195
Despite all the advantages of using the yield operator, we
observe a growing offer of alternative streams libraries on top of
the standard libraries of every programming environment. With
those libraries also come distinct extensibility approaches,
particularly in Java, which lacks the yield operator.
This panoply of libraries includes:
Java: Guava, Protonpack, Vavr, Eclipse Collections, jOOλ and StreamEx.
JavaScript: IxJs, LazyJs, Lodash, Sequency and Underscore.
Dotnet: Cister.ValueLinq, LinqFaster, LinqAF, StructLinq and Hyperlinq.
Given that, how should we choose an auxiliary library for our
project?
Not only did we find a lack of benchmarks that assess the
effectiveness of each alternative, but the workloads evaluated by
the existing benchmarks also have little in common with real use
cases.
This work aims to answer the questions and prob-
lems stated in this Introduction, and more specifically,
the main contributions of this paper are:
A novel benchmark that merges state-of-the-art toolkits such as
kotlin-benchmarks (Ryzhenkov, 2014) and JMH (Shipilev, 2013), with
the idea of processing streams from realistic data sources,
interleaved with user-defined operations.
A minimalist and functional design of generators, named tinyield¹,
which is the first proposal to unify a generalized yield model
focused only on traversal, rather than relying on co-routines as
proposed in previous works (James and Sabry, 2011; Prokopec and
Liu, 2018).
tinyield is not slower than state-of-the-art libraries and in most
cases it is even faster (Poeira and Carvalho, 2020).
tinyield also allows concise and fluent extensibility. It provides
concise extension of streams operations in an idiom equivalent to
yield-based generators, without requiring compiler instrumentation
support. Those user-defined extensions can be fluently chained in
streams pipelines (Fowler, 2015).
Poeira, D. and Miguel Carvalho, F.
Deconstructing yield Operator to Enhance Streams Processing.
DOI: 10.5220/0010541001430150
In Proceedings of the 16th International Conference on Software Technologies (ICSOFT 2021), pages 143-150
ISBN: 978-989-758-523-4
Copyright © 2021 by SCITEPRESS, Science and Technology Publications, Lda. All rights reserved
The remainder of this paper is organized as follows. In the next
section we establish the terminology and propose a generalized
API-level design of yield, named tinyield. After that, Section 3
explains which tests were devised to analyze the sequence
alternatives and discusses the results of the benchmarks.
Section 4 describes the related work and existing alternative
libraries. Finally, we conclude in Section 5 and discuss some
future work.
2 yield GENERALIZED DESIGN
The variants of yield operator are beyond the scope
of this paper and we are only establishing a common
terminology according to its formal model (James and
Sabry, 2011).
After that, we will present our proposal of a gen-
eralized design of yield that can be implemented in
any programming language with higher-order func-
tions support.
2.1 yield Generator Operator
We will describe the yield operator using JavaScript (ecm, 2020)
as the lingua franca, to focus on the relevant properties that are
shared among different programming languages. JavaScript is
largely based on the well-established C language syntax and
influenced by Scheme features. In our examples, we will avoid
JavaScript-specific particularities and mostly use generalized
keywords and operators that are common to other languages, such
as: for, var, [], !=, ++, <<, and others.
¹ https://github.com/tinyield
The generator operator yield is inspired by the
coroutine primitive yield. In coroutines, the yield pro-
vides a means of suspending a computation, so that
execution can be resumed later (Conway, 1963). In
the same way, the term generator (or iterator) refers to
a computation that: 1) yields values to the caller and,
2) is resumed after the yielded value has been con-
sumed by the caller (Liskov, 1996). Like a coroutine,
the caller must interact with the generator by reading
the yielded values and resuming.
To exemplify the yield semantics in the context of generators, we
will start with a generator of the sequence of Cullen numbers,
defined by $C_n = n \cdot 2^n + 1$ and implemented in JavaScript
according to Listing 1.
function* cullen() {
  // (1 << i) is 2^i; the 32-bit shift limits this sketch to i < 31
  for (var i = 0; true; i++)
    yield (1 << i) * i + 1
}
Listing 1: JavaScript generator of Cullen numbers.
Terminology. We use the term generator to refer to
computations that yield values. Only generator func-
tions can use the yield keyword. A free yield results
in a compiler error. Finally, the argument to the yield
operator becomes an output of the generator. We refer
to these outputs as yielded values.
In this sense, the function cullen of Listing 1 is a generator.
Notice that in JavaScript a generator is distinguished from a
regular function by the character * that follows the function
keyword, which indicates that it yields a sequence of values
(potentially infinite). On the other hand, in statically typed
languages, a generator may be identified by the function's return
type (e.g. IEnumerable in C#).
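To make the laziness of Listing 1 concrete, the following sketch pulls only the first five values from the infinite cullen generator. The helper take is hypothetical and introduced here only for illustration; it is not part of the paper or of any library discussed in it.

```javascript
// Generator from Listing 1: yields Cullen numbers C_n = n * 2^n + 1.
function* cullen() {
  for (let i = 0; true; i++)
    yield (1 << i) * i + 1
}

// Hypothetical helper (an assumption, not from the paper):
// yields at most n values pulled from the source sequence.
function* take(n, source) {
  if (n <= 0) return
  let count = 0
  for (const value of source) {
    yield value
    if (++count >= n) return
  }
}

// Only five elements of the infinite sequence are ever computed.
const firstFive = [...take(5, cullen())]
console.log(firstFive) // [ 1, 3, 9, 25, 65 ]
```

Since the consumer drives the generator, the infinite loop inside cullen never runs past the values that are actually requested.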
Traversals allow composing separately written
generators. It must be possible for one generator to
call into another generator and retain the same yield-
ing context.
Consider for example a map with closed address-
ing, which consists of a hash table whose entries
are arrays of elements. We would like a generator
flatten that traverses the elements of the map.
Given the separately implemented list generator
of Listing 2, which yields from an array, it is handy
that the flatten generator of Listing 3 can reuse this
existing functionality by passing arrays from which
to yield. The yield* expression is used to delegate
to another generator. It iterates over the operand and
yields each value returned by it.
function* list(items) {
  for (const value of items)
    yield value
}
Listing 2: yield-based implementation of list generator.
function* flatten(map) {
  const es = Object.values(map)
  for (const e of es)
    yield* list(e)
}
Listing 3: yield-based implementation of flatten generator
that combines the use of list.
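As a usage sketch of Listings 2 and 3, the generators can be applied to a small hash table whose buckets are arrays. The buckets object and its contents are hypothetical data chosen for illustration.

```javascript
// Listing 2: yields every element of an array.
function* list(items) {
  for (const value of items)
    yield value
}

// Listing 3: delegates to list for every bucket of the map.
function* flatten(map) {
  for (const e of Object.values(map))
    yield* list(e)
}

// Hypothetical hash table with closed addressing: each bucket is an array.
const buckets = { 0: ["ana", "bob"], 1: [], 2: ["eve"] }

const elements = [...flatten(buckets)]
console.log(elements) // [ 'ana', 'bob', 'eve' ]
```

The caller of flatten sees a single sequence, although the values are produced by nested calls to list that share the same yielding context.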
This requirement is equivalent to that stated by a monad
combinator, where given a type constructor $M$ that builds up a
monadic type $M\,T$ and a monadic function such as $T \to M\,U$,
we have:
$(M\,T,\; T \to M\,U) \to M\,U$
This is the same behavior as yield* list(e). Here $T$ is Array and
$U$ is Element: the generator list maps an array to M Element, the
entries of the map form an M Array, and each of them is unwrapped
into M Element.
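The combinator can be sketched directly as a generator, assuming that M stands for "generator of". The name bind is an assumption introduced for this illustration and does not appear in the paper.

```javascript
// bind :: (M T, T -> M U) -> M U, where M T is a generator of T.
// Hypothetical name; the paper only states the type of the combinator.
function* bind(source, fn) {
  for (const item of source)
    yield* fn(item) // unwrap the inner M U into the outer sequence
}

// Listing 2, playing the role of the monadic function Array -> M Element.
function* list(items) {
  for (const value of items)
    yield value
}

const arrays = list([[1, 2], [], [3]]) // M Array
const flat = [...bind(arrays, list)]   // M Element, collected into an array
console.log(flat) // [ 1, 2, 3 ]
```

Under this reading, flatten of Listing 3 is bind applied to the sequence of map entries and the function list.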
2.2 Tinyield Design
We choose the .NET type system (CTS, 2012) to specify the design
of the tinyield types, because it supports first-class function
types. In Java, for example, function types are defined by
interfaces, which may obscure their real purpose. According to
the .NET type system, every function type has an Invoke method
that conforms to its descriptor, i.e. the types of the arguments
and the return type.
The tinyield generator is based on the Traverser
function type that specifies how the elements of a se-
quence are traversed. A Traverser corresponds to
a delimited subroutine that marks the boundary of a
generator and delimits the action of yield. The ar-
gument of Traverser function is an opaque com-
putation that can yield. This immediately suggests
a monadic encapsulation for the effectful generator
computations with yield as the only effect operator of
the monad. Since Traverser marks the boundary of
this effect, it can be used as the operation that escapes
the monad.
Notice that in C#, Ruby and JavaScript, the equiv-
alent to Traverser is hidden in the implementation
of the loop construct, that is the for( of ) statement
of Listings 2 and 3.
Figure 1 depicts the design of the types Traverser and Yield,
which are implemented in the three distributions of tinyield: in
Java, C# and TypeScript (a strict syntactical superset of
JavaScript). The generator parameters are not identified in
Figure 1; they are captured by the traverser's lexical scope from
the generator function (closure). In Listings 4 and 5 we present
the corresponding implementations of the generators list and
flatten according to the tinyield types Traverser and Yield
defined in C#.
Figure 1: Class diagram of Traverser and Yield types.
Traverser<T> list<T>(params T[] items) {
    return yield => {
        foreach (T value in items)
            yield(value);
    };
}
Listing 4: tinyield based implementation of list generator.
// map is assumed to be a dictionary whose values are arrays of elements
Traverser<T> flatten<K, T>(IDictionary<K, T[]> map) {
    return yield => {
        foreach (var entry in map.Values)
            list(entry)(yield);
    };
}
Listing 5: tinyield based implementation of flatten
generator.
Each lambda (i.e. =>) returned by each function
encloses the generator boundary that captures the gen-
erator parameters (i.e. items and map).
These implementations do not require any compiler instrumentation
support, since we do not use any kind of special primitive such as
the yield operator. Notice that in Listings 4 and 5, yield is just
a parameter of type Yield<T>: it is the argument of the Traverser.
This Yield<T> instance encloses the context that can be preserved
across calls to different generators, e.g. list(entry)(yield),
complying with the composition property stated in Section 2.1. The
call to Invoke is implicit in list(entry)(yield), which is
shorthand for list(entry).Invoke(yield). We only take advantage of
higher-order functions and the ability to define local functions
(i.e. lambdas), which are closed over their free lexical variables
(i.e. closures) (Landin, 1964).
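The same design can be sketched in JavaScript, the lingua franca of Section 2.1, using nothing but closures. This is an illustrative transcription of Listings 4 and 5, not the tinyield4ts source; the callback is named yld because yield is a reserved word in strict mode JavaScript.

```javascript
// A traverser is a function that receives a callback (yld) and
// pushes every element of the sequence into it.
const list = (...items) => yld => {
  for (const value of items)
    yld(value)
}

// flatten reuses list and forwards the same yld to it, so the
// yielding context is preserved across both generators.
const flatten = map => yld => {
  for (const entry of Object.values(map))
    list(...entry)(yld)
}

// Traversal happens only when the traverser is invoked with a callback.
const out = []
flatten({ a: [1, 2], b: [3] })(item => out.push(item))
console.log(out) // [ 1, 2, 3 ]
```

No generator function and no yield keyword are involved: list and flatten are ordinary functions that return lambdas.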
Another difference between the tinyield proposal and the
JavaScript generator is that the sequence resulting from a
JavaScript generator may be traversed with a for( of ) loop,
whereas the resulting tinyield Traverser cannot be traversed with
the equivalent C# foreach( in ). A Traverser can be traversed only
through its invocation, as presented for example in Listing 6. The
difference between the two forms of traversal is usually denoted
as pull versus push access, where pull denotes getting items (ask)
and push denotes expressing what to do with those items (tell)
(Hunt and Thomas, 2003).
flatten(map)(Console.WriteLine);
Listing 6: Traversing elements from a Traverser in a push
style idiom.
Yet, the tinyield Traverser has a limitation with regard to the
yield primitive: a suspended Traverser is not a first-class value.
A Traverser performs a single bulk computation. Hence, the caller
relinquishes control until the traversal completes, which rules
out many algorithms, such as any algorithm that needs to
manipulate two sequences simultaneously. For example, zip (also
known as convolution) is an operation that takes a tuple of
sequences and transforms them into a sequence of tuples. This
problem is easily solved if we convert at least one of the
sequences to an explicit data structure. Yet, this incurs useless
overhead when that sequence is very large and the lengths of the
streams do not match: we will have performed a great deal of work
for nothing.
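The cost of that workaround can be observed with a small sketch. The function zipBuffered below is a hypothetical zip over two traversers that first drains one of them into an array; the counter visited records how many elements of that sequence were produced.

```javascript
// Traverser over a fixed list of items (JavaScript transcription of Listing 4).
const list = (...items) => yld => {
  for (const value of items)
    yld(value)
}

// Hypothetical zip for traversers: other is converted to an explicit
// data structure (an array) before upstream is traversed.
const zipBuffered = (upstream, other, zipper) => yld => {
  const buffer = []
  other(item => buffer.push(item)) // eager: traverses ALL of other
  let index = 0
  upstream(item => {
    if (index < buffer.length)
      yld(zipper(item, buffer[index++]))
  })
}

// A traverser of 1000 numbers that counts the elements it produces.
let visited = 0
const numbers = yld => {
  for (let i = 0; i < 1000; i++) {
    visited++
    yld(i)
  }
}

const pairs = []
zipBuffered(list("a", "b"), numbers, (l, r) => l + r)(p => pairs.push(p))
console.log(pairs)   // [ 'a0', 'b1' ]
console.log(visited) // 1000
```

Only two pairs are produced, yet all 1000 elements of the second sequence were generated and buffered.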
Thus, we need a suspendable traversal that is able
to iterate element by element, rather than all elements
in bulk. To that end we have designed an alterna-
tive way of traversing elements individually, which is
specified by the tinyield Advancer function type de-
picted in Figure 2.
Figure 2: Class diagram of Advancer and Yield types.
The Advancer is similar to the Traverser de-
scriptor but returns a Boolean instead (i.e. bool). An
Advancer function is expected to yield the next el-
ement of the sequence, if there are any, and returns
whether an element was processed, or not. Simply
put, it essentially merges the behavior of hasNext()
and next() of Java Iterator interface in a single
subroutine.
To traverse all elements of an Advancer we need a while loop, as
presented in the next statement, which traverses a hypothetical
Advancer<T> adv and prints all its elements:
while (adv(Console.WriteLine)) { }
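A minimal JavaScript sketch of the Advancer idiom follows. The factory advancerOf is an assumption made for this example; it only illustrates the contract described above (yield one element, then report whether an element was processed).

```javascript
// Hypothetical factory: builds an advancer over an array.
// Each call yields at most one element and returns whether it did so.
const advancerOf = items => {
  let index = 0
  return yld => {
    if (index >= items.length) return false
    yld(items[index++])
    return true
  }
}

// Traversing all elements requires a loop around the advancer.
const adv = advancerOf([7, 8, 9])
const seen = []
while (adv(item => seen.push(item))) { }
console.log(seen) // [ 7, 8, 9 ]

// Unlike a traverser, an advancer can also be consumed one element at a time.
const adv2 = advancerOf(["x", "y"])
let first
const hadFirst = adv2(item => { first = item })
console.log(hadFirst, first) // true x
```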
In Listing 7, we present an Advancer based implementation of
zip(upstream, other, zipper), which applies the specified zipper
function to the corresponding elements of upstream and other,
producing a new sequence of results. Both upstream and other are
of type Advancer. Notice that the call to other() produces a
Boolean value according to the Advancer idiom. We use this value
to let the resulting Advancer report whether it has yielded a
value or not. Also, when upstream is empty the inner lambda is
not executed and the variable yielded remains false. So, only
when we successfully advance on both streams does zip produce a
new value and change the variable yielded to true.
Advancer<R> Zip<T, U, R>(
    Advancer<T> upstream,
    Advancer<U> other,
    Func<T, U, R> zipper)
{
    return yield => {
        bool yielded = false;
        upstream(e1 =>
            yielded = other(e2 =>
                yield(zipper(e1, e2))
            )
        );
        return yielded;
    };
}
Listing 7: Zip operation for Advancer based sequences.
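For readers following the JavaScript examples, Listing 7 can be transcribed as the sketch below. The factory advancerOf is hypothetical and only provides array-backed advancers for the demonstration.

```javascript
// Hypothetical array-backed advancer, used only to feed the example.
const advancerOf = items => {
  let index = 0
  return yld => {
    if (index >= items.length) return false
    yld(items[index++])
    return true
  }
}

// JavaScript transcription of Listing 7.
const zip = (upstream, other, zipper) => yld => {
  let yielded = false
  upstream(e1 => {
    // other returns true only if it had an element to pair with e1
    yielded = other(e2 => yld(zipper(e1, e2)))
  })
  return yielded
}

const zipped = zip(advancerOf([1, 2, 3]), advancerOf(["a", "b"]), (n, s) => n + s)
const result = []
while (zipped(item => result.push(item))) { }
console.log(result) // [ '1a', '2b' ]
```

The traversal stops as soon as either sequence is exhausted, so neither sequence has to be buffered.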
Our Advancer based implementation is much more compact than its
equivalent counterpart in Java. For example, the accepted answer
to the question Zipping streams using JDK8 with lambda² gives an
implementation with more than 30 lines of code. Moreover, our
proposal outperforms Java streams in a realistic benchmark zipping
sequences from Last.fm (Section 3).
In conclusion, like many other streams libraries, the tinyield
library provides implementations of core streams processing
operations, such as map, filter, reduce, limit, takeWhile, zip,
and others (Fowler, 2015). These operations may require one or
both ways of traversal: Traverser and Advancer. To that end, the
tinyield type Query<T> aggregates the two traversal methods in a
single instance. Operations are then built on top of the Query<T>
type, which allows chaining invocations fluently. In this case,
the terminal operation decides which traversal method to use.
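The aggregation of both traversal methods can be sketched as follows. The class below is a simplified, hypothetical model written for this illustration; its method names (of, map, toArray, first) are assumptions and do not reproduce the API of any tinyield distribution.

```javascript
// Simplified model of a query holding both ways of traversal.
class Query {
  constructor(traverser, advancer) {
    this.traverser = traverser // bulk traversal
    this.advancer = advancer   // element-by-element traversal
  }
  static of(...items) {
    let index = 0
    return new Query(
      yld => { for (const item of items) yld(item) },
      yld => index < items.length ? (yld(items[index++]), true) : false
    )
  }
  // Intermediate operation: defined for both ways of traversal.
  map(mapper) {
    return new Query(
      yld => this.traverser(item => yld(mapper(item))),
      yld => this.advancer(item => yld(mapper(item)))
    )
  }
  // Terminal operation that needs every element: uses the traverser.
  toArray() {
    const out = []
    this.traverser(item => out.push(item))
    return out
  }
  // Terminal operation that needs a single element: uses the advancer.
  first() {
    let result
    this.advancer(item => { result = item })
    return result
  }
}

const all = Query.of(1, 2, 3).map(n => n * n).toArray()
const head = Query.of(1, 2, 3).map(n => n * n).first()
console.log(all, head) // [ 1, 4, 9 ] 1
```

In this sketch toArray takes the bulk path while first advances a single step, which is the kind of decision the text attributes to the terminal operation.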
Finally, we should not be restricted to the suite of operations
provided by a streams library. To that end, we included in
tinyield a fluent way of chaining user-defined operations. Since
we give priority to Traverser traversal, we provide in Query a
method Then, which receives a function that maps an upstream
Query into a new Traverser, such that:
Then(Func<Query<T>, Traverser<R>> next)
Consider, for example, that we would like to use an absent
distinctBy operation to get a sequence of random numbers with
distinct numbers of digits. In Listing 8 we show how to implement
and chain this new operation fluently in such a pipeline.
This is the most concise way of interleaving a user-defined
operation. Yet, it fails if the terminal operation requires an
Advancer, as is the case for zip.
² stackoverflow.com/a/23529010/1140754
ISet<int> lengths = new HashSet<int>();
Random rand = new Random();
Query
    .Generate(() => rand.NextDouble() * MAX)
    .Limit(1024)
    .Map(Convert.ToInt32)
    .Then(upstream => yield => upstream.Traverse(
        item => {
            // number of decimal digits of the generated integer
            int nrOfDigits = item.ToString().Length;
            if (lengths.Add(nrOfDigits))
                yield(item);
        }))
    .Traverse(Console.WriteLine);
Listing 8: User-defined distinctBy fluently chained in a
tinyield pipeline.
For those cases, we provide an alternative overloaded Then that
receives two mapping functions to produce both ways of traversal.
However, that alternative is more verbose.
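The trade-off between the two forms of Then can be illustrated with the sketch below. The Query class is a simplified, hypothetical model (the method names then, traverse and tryAdvance are assumptions for this example), and distinctBy is expressed twice: once as a Traverser and once as an Advancer.

```javascript
class Query {
  constructor(traverser, advancer) {
    this.traverser = traverser
    this.advancer = advancer
  }
  static of(...items) {
    let index = 0
    return new Query(
      yld => { for (const item of items) yld(item) },
      yld => index < items.length ? (yld(items[index++]), true) : false
    )
  }
  // nextAdvancer is optional; without it only bulk traversal is supported.
  then(nextTraverser, nextAdvancer) {
    const unsupported = () => { throw new Error("no Advancer was provided") }
    return new Query(
      nextTraverser(this),
      nextAdvancer ? nextAdvancer(this) : unsupported
    )
  }
  traverse(yld) { this.traverser(yld) }
  tryAdvance(yld) { return this.advancer(yld) }
}

// Concise form: distinctBy as a Traverser.
const distinctByTraverser = keyOf => upstream => yld => {
  const keys = new Set()
  upstream.traverse(item => {
    const key = keyOf(item)
    if (!keys.has(key)) { keys.add(key); yld(item) }
  })
}

// Verbose form: the same operation as an Advancer, which must skip
// duplicates until it yields one element or upstream is exhausted.
const distinctByAdvancer = keyOf => upstream => {
  const keys = new Set()
  return yld => {
    let yielded = false
    while (!yielded && upstream.tryAdvance(item => {
      const key = keyOf(item)
      if (!keys.has(key)) { keys.add(key); yld(item); yielded = true }
    })) { }
    return yielded
  }
}

const digits = n => String(n).length
const source = () => Query.of(7, 42, 9, 100, 13, 2048)

const bulk = []
source().then(distinctByTraverser(digits)).traverse(n => bulk.push(n))

const both = source().then(distinctByTraverser(digits), distinctByAdvancer(digits))
const single = []
while (both.tryAdvance(n => single.push(n))) { }

console.log(bulk)   // [ 7, 42, 100, 2048 ]
console.log(single) // [ 7, 42, 100, 2048 ]
```

The Advancer variant needs explicit bookkeeping (the yielded flag and the skipping loop), which is the verbosity the text refers to.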
3 PERFORMANCE EVALUATION
To avoid I/O operations during benchmark execu-
tion, we have previously collected all data into re-
source files, loading all that data into in-memory data
structures on benchmark bootstrap. Thus, we avoid
any I/O by providing the sequences sources from
memory. You may find further environment details
on sequences-benchmark repository (Poeira and Car-
valho, 2020).
To achieve the most unbiased and precise results, we based our
benchmarks on state-of-the-art platforms for performance analysis
in both environments: JMH (Shipilev, 2013) in Java and
benchmark.js (Bynens and Dalton, 2014) in JavaScript.
We ran our tests on a local machine with the following specs:
Microsoft Windows 10 Home, Intel(R) Core(TM) i7-7700HQ CPU @
2.80GHz. We used the following runtimes: OpenJDK 15.0.1 (build
15.0.1+9-18), Dotnet core 5.0.102 and Node.js v12.13.1.
Custom Operation Every. The custom operation Every is based on
the Stack Overflow question "Zipping streams using JDK8 with
lambda". The question also discussed how significant the lack of
a zip operation in Java Stream was. Our benchmark leveraged some
ideas from kotlin-benchmarks (Ryzhenkov, 2014), such as testing
with three different data types: Integer, String and a Value
class holding an int and a String field that are combined to
implement equals and hashCode.
Every is an operation that, based on a user-defined predicate,
tests whether all the elements of two sequences match at
corresponding positions. To implement the every() operation we
simply combine the zip() and allMatch() operations in sequence,
such as:
seq1
    .zip(seq2, pred::test)
    .allMatch(Boolean.TRUE::equals);
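The composition of zip and allMatch can be sketched with advancers in JavaScript. The helpers advancerOf, allMatch and every are hypothetical names introduced for this example; the counter comparisons shows that the traversal stops at the first mismatch.

```javascript
// Hypothetical array-backed advancer.
const advancerOf = items => {
  let index = 0
  return yld => {
    if (index >= items.length) return false
    yld(items[index++])
    return true
  }
}

// zip as in Listing 7, transcribed to JavaScript.
const zip = (upstream, other, zipper) => yld => {
  let yielded = false
  upstream(e1 => { yielded = other(e2 => yld(zipper(e1, e2))) })
  return yielded
}

// allMatch short-circuits: it stops advancing after the first failure.
const allMatch = (adv, predicate) => {
  let ok = true
  while (ok && adv(item => { ok = predicate(item) })) { }
  return ok
}

// every = zip followed by allMatch.
const every = (seq1, seq2, pred) =>
  allMatch(zip(seq1, seq2, pred), matched => matched === true)

let comparisons = 0
const same = (a, b) => { comparisons++; return a === b }

const equal = every(advancerOf([1, 2, 3]), advancerOf([1, 2, 3]), same)
comparisons = 0
const different = every(advancerOf([1, 9, 3]), advancerOf([1, 2, 3]), same)
const comparisonsUntilMismatch = comparisons

console.log(equal, different, comparisonsUntilMismatch) // true false 2
```

Note that this sketch compares only the positions that both sequences have in common, since zip stops at the shorter sequence.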
Tinyield is the most performant in both the Java and JavaScript
environments, as depicted in the charts of Figure 3. JavaScript
arrays come closest to the performance of tinyield, but they
become unaffordable for large data sets due to their eager nature.
Last.fm. To benchmark use cases with real-world
data, we resorted to publicly available Web APIs,
namely REST Countries and Last.fm. We retrieved
from REST Countries a list of 250 countries and then
used them to query Last.fm, retrieving both the top
Artists and the top Tracks by country, resulting in a
total of 7500 records each.
The domain model for these benchmarks can be
summarized by the entities: Country, Language,
Track, and Artist.
We devised two benchmarks using data from Last.fm: "Distinct Top
Artist and Top Track by Country", identified in Figure 4 as
"Distinct", and "Artists Who Are in A Country's Top Ten Who Also
Have Tracks in The Same Country's Top Ten", identified as "Filter"
(Poeira and Carvalho, 2020). Both benchmarks start off the same
way. We first query all the countries, filter the non-English
speaking countries and, from these, we retrieve two sequences: one
pairing Country with its top Tracks and another pairing Country
with its top Artists (Poeira and Carvalho, 2020).
Tinyield is only overtaken on the Last.fm benchmark, most
significantly in Java, by Kotlin Sequence. Nevertheless, tinyield
is the second most performant library in Java for Last.fm, with
many advantages in extensibility over Java streams (i.e.
conciseness and fluency).
In JavaScript, the yield-based counterpart used by IxJs is the
worst performer in all benchmarks. On the other hand, Lazy.js and
Sequency, which avoid the yield primitive as tinyield does, are
also well-performing alternatives for Last.fm. Arrays also behave
well but can be unviable for larger data sets.
[Figure 3: two bar charts, "Every Node.js" (Underscore, Tinyield, Sequency, Lodash, ES6 Arrays, Lazy.js, IxJs; series Class, Number, String) and "Every Java" (Eclipse-…, Tinyield, Jool, Kotlin Sequence, Stream, StreamEx, Vavr; series Class, Integer, String); axis in 10^3 ops/s.]
Figure 3: Performance in throughput on Every benchmark for 1000 elements.
[Figure 4: two bar charts, "LastFM Node.js" (Undersco…, Tinyield, Sequency, Lodash, ES6 Arrays, Lazy.js, IxJs) and "LastFM Java" (Eclipse…, Tinyield, Jool, Kotlin…, Stream, Stream…, Vavr); series Filter, Distinct; axis in 10^3 ops/s.]
Figure 4: Performance in throughput on Last.fm benchmark.
Weather and User-defined Operations. We used another realistic
data source from WorldWeatherOnline to benchmark interleaved
user-defined operations. For these benchmarks, we created two
custom operations: oddLines and collapse. We then queried
WorldWeatherOnline for the weather in Lisbon, Portugal between
the dates of 2020-05-08 and 2020-11-08, providing us with a CSV
file that we manipulated with the operations above in a benchmark
to perform the following queries: 1) maximum temperature; 2) count
distinct temperature values, and 3) count temperature transitions.
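The shape of such a pipeline can be sketched with traversers. The semantics assumed here are not spelled out in the text: oddLines is taken to keep only the odd-indexed lines and collapse to drop consecutive repeated values, and a transition is counted as each value that survives collapse. The input lines are hypothetical and do not come from the benchmark data.

```javascript
const list = (...items) => yld => { for (const v of items) yld(v) }
const map = mapper => upstream => yld => upstream(item => yld(mapper(item)))

// Assumed semantics: lets through only the lines at odd positions (1, 3, 5, ...).
const oddLines = upstream => yld => {
  let isOdd = false
  upstream(item => {
    if (isOdd) yld(item)
    isOdd = !isOdd
  })
}

// Assumed semantics: merges runs of adjacent equal values into one value.
const collapse = upstream => yld => {
  let hasPrev = false
  let prev
  upstream(item => {
    if (!hasPrev || item !== prev) {
      hasPrev = true
      prev = item
      yld(item)
    }
  })
}

// Hypothetical input: even lines are labels, odd lines are temperatures.
const lines = ["2020-05-08", "14", "2020-05-09", "14",
               "2020-05-10", "17", "2020-05-11", "14"]
const temperatures = () => map(Number)(oddLines(list(...lines)))

// 1) maximum temperature
let max = -Infinity
temperatures()(t => { if (t > max) max = t })

// 2) count of distinct temperature values
const distinct = new Set()
temperatures()(t => distinct.add(t))

// 3) count of temperature transitions (values remaining after collapse)
let transitions = 0
collapse(temperatures())(() => transitions++)

console.log(max, distinct.size, transitions) // 17 2 3
```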
Sequency is a JavaScript library developed in TypeScript, like
tinyield4ts. However, tinyield is between 2 and 3 times faster
than Sequency on the weather benchmark, as depicted in Figure 5.
For comparison, Prokopec and Liu (2018) also observed that lazy
functional lists are 12-17x slower. We have observed the same
behavior in Java, most significantly on the weather benchmark,
for Vavr, which is a purely functional library based on immutable
data structures, and also for StreamEx. The approach of Vavr and
StreamEx to user-defined operations consists of using a cons
(Friedman and Wise, 1976) in conjunction with the head method and
a supplier for the new tail of the sequence, recursively.
Alternative Approaches Comparison. Tinyield performance gains are
due to the fast-path iteration protocol, which has less overhead
when bulk traversing a sequence than a common iterator does. This
approach reduces the overhead of per-element access, and
increases the effectiveness of other optimizations such as
inlining, code motion, bounds check elimination, and others.
Moreover, tinyield is the only Java library with a concise way of
defining new stream operations while also maintaining the fluency
of the pipeline.
The Java programming language does not provide a yield primitive,
and extending the streams API inevitably leads to verbose
implementations.
We identified a few advantages of Kotlin's Sequence. Operations
that would return an Optional in Java return a nullable value in
Kotlin, meaning that no wrapper is created, which results in less
overhead. Moreover, Kotlin's terminal operations are inline, so
there is no indirection when calling them.
Eclipse Collections has a lot of optimizations
in place regarding the data-source of the pipeline,
namely if an array was at the source then iteration will
be as fast as using a for loop.
The main gain of StreamEx, jOOλ and Vavr is the
fact that these libraries bring extra functionality to the
user out of the box with almost no need of creating
new user-defined operations.
[Figure 5: two bar charts, "Weather Node.js" (Undersc…, Tinyield, Sequency, Lodash, ES6 Arrays, Lazy.js, IxJs) and "Weather Java" (Eclipse-…, Tinyield, Jool, Kotlin…, Java…, StreamEx, Vavr); series Transitions, Max, Distinct; axis in 10^3 ops/s.]
Figure 5: Performance in throughput on weather benchmark.
JavaScript supports, since ECMAScript 5 in 2009 (ecm, 2020),
operation chaining over sequences, in other words, sequence
pipelines. JavaScript's sequence type is the Array type, which
distinguishes itself from other sequence type implementations by
having an eager approach. Generators and the yield keyword were
later introduced with ES6 in 2015; yet, no lazy sequence type
implementations were provided by this new standard either,
forcing developers to look for this feature in third-party
libraries.
Although Lodash and Underscore are quite popular in the
JavaScript world, they suffer from the same problems as ES6
Arrays. When processing a sequence pipeline, these libraries
calculate all intermediate results before proceeding to the next
operation, which incurs the same unnecessary processing observed
in ES6 Arrays.
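The difference between eager and lazy pipelines can be made visible by counting how many times the mapping function runs when only the first matching element is needed. The generator helpers map and filter below are hypothetical and written only for this comparison.

```javascript
const data = [1, 2, 3, 4, 5, 6]

// Eager pipeline with ES6 Arrays: map processes every element
// before filter even starts.
let eagerCalls = 0
const eagerFirst = data
  .map(n => { eagerCalls++; return n * n })
  .filter(n => n % 2 === 0)[0]

// Lazy pipeline with generators: elements flow one at a time.
function* map(source, fn) {
  for (const item of source) yield fn(item)
}
function* filter(source, predicate) {
  for (const item of source)
    if (predicate(item)) yield item
}

let lazyCalls = 0
const lazyPipeline = filter(
  map(data, n => { lazyCalls++; return n * n }),
  n => n % 2 === 0
)
const lazyFirst = lazyPipeline.next().value

console.log(eagerFirst, eagerCalls) // 4 6
console.log(lazyFirst, lazyCalls)   // 4 2
```

Both pipelines return the same element, but the eager one squares all six numbers while the lazy one stops after two.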
Lazy.js and Sequency propose alternative traversal designs, as
tinyield does, but their proposals have no foundation on a unified
yield model. Moreover, Sequency is one of the worst-performing
libraries on the weather benchmark.
4 RELATED WORK
Lazy traversal is inspired by the concept of lazy lists, also
known as streams, first described in 1965 by Landin (Landin,
1965). It was Landin who proposed the use of delayed evaluation
to avoid the "item-by-item" representation of collections.
Friedman and Wise (Friedman and Wise, 1976) introduced lazy lists
in Lisp in 1976 and the idea was then adopted in other languages
too, in some cases as a fundamental data structure, as in Haskell
(Jones, 2003).
Alphard, developed at CMU in the late 1970s, was the first
programming language to introduce the generator operator (Shaw
et al., 1977). That construct inspired iterators in CLU (Liskov,
1983): an iterator is a procedure that returns a sequence of
elements and allows the caller to get at the elements one at a
time.
The idea of a single iteration method was intro-
duced in Python 2.2, where iterators provide a single
method next that returns the next element in a se-
quence, or raises an exception when no more elements
are available (Yee and van Rossum, 2001). This fea-
ture is described in the proposal PEP 234 (Python
Enhancement Proposal 234) Iterators (Yee and van
Rossum, 2001). The advantages of a single traversal
subroutine were highlighted in (Baker, 1993), where
H. Baker shows how higher-order functions, taking
as argument functions which are closed over their
free lexical variables (closures) can be used to pro-
vide iteration capabilities. Similar simplification with
higher-order functions has been followed in several
domain-specific approaches namely on template pro-
cessors (Carvalho et al., 2020).
Several works have evaluated sequence traversal performance
through benchmarks, namely kotlin-benchmarks (Ryzhenkov, 2014),
which provides benchmarks over Kotlin's features such as the use
of Sequence. Another example is the set of benchmarks devised by
Angelika Langer and Klaus Kreft (Langer and Kreft, 2015) with the
aim of better understanding how Java Streams perform and when
parallel() outperforms sequential processing of the same Stream.
Nicolai Parlog also tackled this point in his benchmarks on
Parallel Stream Vectorization (Parlog, 2019), evaluating the
performance gained by using Stream parallel() when computing
factorials.
5 CONCLUSIONS
Generators are heavily inspired by co-routines, which
generally follow two approaches to implement control
flow: call stack manipulation and program transfor-
mation, i.e. instrumentation. The main problem with
both approaches regards their overheads due to con-
text switch manipulation.
In the first approach, the runtime is augmented
with call stack introspection or the ability to swap call
stacks during the execution of the program. Several
attempts in the context of the JVM runtime (Dragos
et al., 2007; Stadler et al., 2009) did not become of-
ficial and most recently OpenJDK is still working on
Project Loom (OpenJDK, 2020) that aims to provide
fibers and continuations for the Java Virtual Machine.
In the second approach, the compiler transforms
the program to translate coroutines into an equivalent
program without coroutines. This is the approach fol-
lowed by most generators such as in C#, JavaScript,
and others.
We claim that tying generators to co-routines leads to
heavyweight approaches that incur performance overheads. On the
other hand, object-oriented iterators incur useless complexity and
verbosity that affect the readability and expressiveness of
streams operations (Baker, 1993). The tinyield design proposal
overcomes those limitations, with advantages in both performance
and conciseness of extensibility.
Our model has already been extended for asynchronous processing,
and there is evidence that it may overtake alternatives such as
reactive streams (Kaazing et al., 2017), achieving better
throughput under some non-blocking IO scenarios. Previous work
has already been done in this field (Prokopec and Liu, 2018) but,
again, it is tied to co-routines, which we have shown in this work
to be harmful for streams traversal, namely in the JavaScript
environment.
ACKNOWLEDGEMENTS
This work was supported by Instituto Politécnico de Lisboa,
Lisbon, Portugal, which funded the project Reactive Web streams
for Big Data (IPL/2020/WebFluid ISEL).
REFERENCES
(2020). ECMAScript 2020 language specification, 11th edi-
tion. ECMA, 11 edition.
Baker, H. G. (1993). Iterators: Signs of weakness in object-
oriented languages. SIGPLAN OOPS Mess., 4(3):18–
25.
Borins, M., Braun, A. R., Palmer, R., and Terlson,
B. (2006). ECMA-334 C# language specification.
ECMA, 5 edition.
Bynens, M. and Dalton, J.-D. (2014). benchmarkjs: A
benchmarking library that supports high-resolution
timer.
Carvalho, F. M., Duarte, L., and Gouesse, J. (2020). Text
web templates considered harmful. In Web Informa-
tion Systems and Technologies, pages 69–95, Cham.
Springer International Publishing.
Conway, M. E. (1963). Design of a separable transition-
diagram compiler. Commun. ACM, 6(7):396–408.
CTS (2012). ECMA-335 Common Language Infrastructure
(CLI), 6th edition, June 2012. ECMA, 6 edition.
Dragos, I., Cunei, A., and Vitek, J. (2007). Theory and
practice of coroutines with snapshots. In
ICOOOLPS'2007, Technische Universität Berlin.
Fowler, M. (2015). Collection pipeline.
Friedman, D. P. and Wise, D. S. (1976). CONS should not
evaluate its arguments. In Michaelson, S. and Milner,
R., editors, Automata, Languages and Programming,
pages 257–284, Edinburgh, Scotland. Edinburgh Uni-
versity Press.
Hunt, A. and Thomas, D. (2003). The art of enbugging.
IEEE SOFTWARE.
James, R. and Sabry, A. (2011). Yield: Mainstream de-
limited continuations. Workshop on the Theory and
Practice of Delimited Continuations.
Jones, S. (2003). Haskell 98 Language and Libraries: The
Revised Report. Journal of functional programming:
Special issue. Cambridge University Press.
Kaazing, Lightbend, Netflix, Pivotal, and Hat, R. (2017).
Reactive streams specification for the jvm.
Landin, P. J. (1964). The Mechanical Evaluation of Expres-
sions. The Computer Journal, 6(4):308–320.
Landin, P. J. (1965). Correspondence between algol 60
and church’s lambda-notation: Part i. Commun. ACM,
8(2):89–101.
Langer, A. and Kreft, K. (2015). Stream performance. JAX
London Online Conference.
Liskov, B. (1983). CLU Reference Manual. Springer-Verlag
New York, Inc., Secaucus, NJ, USA.
Liskov, B. (1996). A History of CLU, page 471–510. Asso-
ciation for Computing Machinery, NY, USA.
OpenJDK (2020). Loom - fibers, continuations and tail-
calls for the jvm.
Parlog, N. (2019). Github.
Poeira, D. and Carvalho, F. M. (2020). Benchmark for dif-
ferent sequence operations in java and kotlin. Tech-
nical report, https://github.com/tinyield/sequences-
benchmarks.
Prokopec, A. and Liu, F. (2018). Theory and practice of
coroutines with snapshots. In European Conference
on Object-Oriented Programming.
Ryzhenkov, I. (2014). JetBrains.
Shaw, M., Wulf, W. A., and London, R. L. (1977). Ab-
straction and verification in alphard: Defining and
specifying iteration and generators. Commun. ACM,
20(8):553–564.
Shipilev, A. (2013). Java microbenchmark harness (the
lesser of two evils).
Stadler, L., Wimmer, C., Würthinger, T., Mössenböck, H.,
and Rose, J. (2009). Lazy continuations for java virtual
machines. In Proceedings of the 7th International
Conference on Principles and Practice of Programming
in Java, PPPJ '09, page 143–152, New York,
NY, USA. Association for Computing Machinery.
Thomas, D. and Hunt, A. (2007). Programming Ruby: The
Pragmatic Programmer’s Guide. Addison-Wesley.
Yee, K.-P. and van Rossum, G. (2001). Pep 234 – iterators.
Technical report, Python.