Deconstructing yield Operator to Enhance Streams Processing

Diogo Poeira and Fernando Miguel Carvalho

CCISEL, cc.isel.pt, Polytechnic Institute of Lisbon, Portugal

Keywords:

Yield, Generators, Streams, Lazy Sequences, Iterators, Extensions.

Abstract:

Customizing streams pipelines with new user-defined operations is a well-known pattern in streams
processing. However, programming languages face two challenges when considering streams extensibility:
1) providing a compact and readable way to express new operations, and 2) keeping the laziness of
streams. There is a broad consensus around the adoption of the generator operator, i.e. yield, as a
means to fulfil both requirements, since most state-of-the-art programming languages provide this
feature. Yet, what is the performance overhead of interleaving a yield-based operation in streams
processing? In this work we present a benchmark based on realistic use cases of two different web APIs,
namely Last.fm and World Weather Online, where custom yield-based operations may degrade streams
performance twofold. We also propose a purely functional and minimalistic design, named tinyield, that
can be easily adopted in any programming language and provides a concise way of chaining extension
operations fluently, with low overhead in the evaluated benchmarks. The tinyield proposal was deployed
in three different libraries, namely for Java (jayield), JavaScript (tinyield4ts) and .NET (tinyield4net).

1 INTRODUCTION

Lazy evaluation is a well-known technique, introduced with lazy lists in Lisp in 1976 (Friedman and
Wise, 1976). Yet, its straightforward application to object-oriented languages gave rise to ad hoc
iterator classes, which substantially increase the complexity and verbosity of their implementations
(Baker, 1993).

The use of the generator operator (i.e. yield) to implement streams avoids the aforementioned problem
and was widely adopted by mainstream programming languages (with the exception of Java). The yield
operator allows programmers to develop user-defined operations on streams in a compact manner, while
still preserving their laziness property.

Simply put, a generator is like a function that generates a sequence of values. However, instead of
building the whole sequence at once (e.g. as an array or vector), a generator yields the values one at
a time, i.e. it returns a "new" value every time it is called. This idea was first introduced in the
CLU programming language (Liskov, 1983), but its recent popularity may be attributed to its use first
in C# 2.0 (Borins et al., 2006) and later in Ruby 1.9 (Thomas and Hunt, 2007). In CLU and C#,
generators are known as iterators, and in Ruby, enumerators. Also, Python, PHP, JavaScript, Scala,
Dart and Kotlin provide variants of the yield operator.

Despite all the advantages of using the yield operator, we observe a growing offer of alternative
streams libraries on top of the standard libraries of every programming environment. With those
libraries also come distinct extensibility approaches, namely in Java, which lacks the yield operator.

This panoply of libraries includes:

• Java: Guava, Protonpack, Vavr, Eclipse Collec-

tions, jOOλ and StreamEx.

• JavaScript: IxJs, LazyJs, Lodash, Sequency and

Underscore.

• .NET: Cistern.ValueLinq, LinqFaster, LinqAF,

StructLinq and Hyperlinq.

Given that, how should we choose an auxiliary library for our project?

We found not only a lack of benchmarks that assess the effectiveness of each alternative, but also
that the workloads evaluated by existing benchmarks have little in common with real use cases.

Poeira, D. and Miguel Carvalho, F.
Deconstructing yield Operator to Enhance Streams Processing.
DOI: 10.5220/0010541001430150
In Proceedings of the 16th International Conference on Software Technologies (ICSOFT 2021), pages 143-150
ISBN: 978-989-758-523-4

This work aims to answer the questions and problems stated in this Introduction, and more
specifically, the main contributions of this paper are:

• A novel benchmark that merges state-of-the-art toolkits such as kotlin-benchmarks (Ryzhenkov,
2014) and JMH (Shipilev, 2013), with the idea of processing streams from realistic data sources,
interleaved with user-defined operations.

• A minimalist and functional design of generators, named tinyield (https://github.com/tinyield),
that is the first proposal to unify a generalized yield model focused only on traversal and not
supported by co-routines, unlike the proposals of previous works (James and Sabry, 2011; Prokopec
and Liu, 2018).

• tinyield is not slower than state-of-the-art libraries

and in most cases it is even faster (Poeira and Car-

valho, 2020).

• Also, tinyield allows concise and fluent extensibility. It provides concise extension of streams
operations in an idiom equivalent to yield-based generators, without requiring compiler
instrumentation support. Those user-defined extensions can be fluently chained in streams
pipelines (Fowler, 2015).

The remainder of this paper is organized as follows. In the next section we establish the terminology
and propose a generalized API-level design of yield, named tinyield. After that, Section 3 explains
which tests were devised to analyze the sequence alternatives and discusses the results of the
benchmarks. Section 4 describes the related work and existing alternative libraries. Finally, we
conclude in Section 5 and discuss some future work.

2 yield GENERALIZED DESIGN

The variants of the yield operator are beyond the scope of this paper; here we only establish a
common terminology according to its formal model (James and Sabry, 2011).

After that, we present our proposal of a generalized design of yield that can be implemented in any
programming language with support for higher-order functions.

2.1 yield Generator Operator

We will describe the yield operator using JavaScript (ECMA, 2020) as the lingua franca, to focus on
the relevant properties that are shared among different programming languages. JavaScript is largely
based on the well-established C language syntax and influenced by Scheme features. In our examples,
we avoid JavaScript-specific particularities and mostly use keywords and operators common to other
languages, such as: for, var, [], !=, ++, <<, and others.

The generator operator yield is inspired by the

coroutine primitive yield. In coroutines, the yield pro-

vides a means of suspending a computation, so that

execution can be resumed later (Conway, 1963). In

the same way, the term generator (or iterator) refers to

a computation that: 1) yields values to the caller and,

2) is resumed after the yielded value has been con-

sumed by the caller (Liskov, 1996). Like a coroutine,

the caller must interact with the generator by reading

the yielded values and resuming.

To exemplify the yield semantics in the context of generators, we will start with a generator of a
sequence of Cullen numbers defined by $C_n = n \cdot 2^n + 1$ and implemented in JavaScript
according to Listing 1.

function* cullen() {
  for (var i = 0; true; i++)
    yield (1 << i) * i + 1
}

Listing 1: JavaScript generator of Cullen numbers.

Terminology. We use the term generator to refer to

computations that yield values. Only generator func-

tions can use the yield keyword. A free yield results

in a compiler error. Finally, the argument to the yield

operator becomes an output of the generator. We refer

to these outputs as yielded values.

In this sense, the function cullen of Listing 1 is a generator. Notice that in JavaScript a generator
is distinguished from a regular function by the character * that follows the function keyword, which
indicates that it yields a sequence of values (potentially infinite). On the other hand, in strongly
typed languages, a generator may be identified by the function's return type (e.g.
IEnumerable in C#).
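As a minimal sketch of how such a generator is consumed lazily, the following snippet pulls only the first five Cullen numbers from the infinite sequence of Listing 1. The helper take is a hypothetical name introduced here for illustration and is not part of any library discussed in this paper.

```javascript
// Same generator as Listing 1. The shift operator works on 32-bit integers,
// so the formula is only meaningful for small values of i.
function* cullen() {
  for (var i = 0; true; i++)
    yield (1 << i) * i + 1
}

// Hypothetical helper: pulls at most n values from an iterator, one at a time.
function take(iterator, n) {
  const out = []
  for (let k = 0; k < n; k++) {
    const { value, done } = iterator.next()
    if (done) break
    out.push(value)
  }
  return out
}

// The generator body only runs as far as the consumer asks it to.
const firstFive = take(cullen(), 5)
console.log(firstFive) // [ 1, 3, 9, 25, 65 ]
```

Each call to next() resumes the generator until the following yield, which is why the infinite loop in cullen never runs ahead of the consumer.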

Traversals allow composing separately written

generators. It must be possible for one generator to

call into another generator and retain the same yield-

ing context.

Consider for example a map with closed address-

ing, which consists of a hash table whose entries

are arrays of elements. We would like a generator

flatten that traverses the elements of the map.

Given the separately implemented list generator

of Listing 2, which yields from an array, it is handy

that the flatten generator of Listing 3 can reuse this

existing functionality by passing arrays from which

to yield. The yield* expression is used to delegate

to another generator. It iterates over the operand and

yields each value returned by it.

function* list(items) {
  for (const value of items)
    yield value
}

Listing 2: yield-based implementation of list generator.

function* flatten(map) {
  const es = Object.values(map)
  for (const e of es)
    yield* list(e)
}

Listing 3: yield-based implementation of flatten generator

that combines the use of list.

This requirement is equivalent to that stated by a monad combinator (bind), where given a type
constructor $M$ that builds up a monadic type $M\,T$ and a monadic function such as $T \to M\,U$,
we have:

$(M\,T,\ T \to M\,U) \to M\,U$

This is the same behavior of yield* list(e). Given that the generator list produces the type
M Element, each entry of the map, taken from a sequence of type M Array, is unwrapped into
M Element.
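The composition property can be checked with a small sketch: delegating with yield* produces the same sequence as re-yielding each element of the inner generator by hand. The sample map buckets and the function flattenByHand are assumptions made only for this illustration.

```javascript
// Generators of Listings 2 and 3.
function* list(items) {
  for (const value of items)
    yield value
}
function* flatten(map) {
  for (const e of Object.values(map))
    yield* list(e)
}

// Hypothetical equivalent without yield*: the inner values are re-yielded
// one by one in the same yielding context.
function* flattenByHand(map) {
  for (const e of Object.values(map))
    for (const value of list(e))
      yield value
}

// A toy hash table with closed addressing: each entry is an array of elements.
const buckets = { 0: ['a', 'b'], 1: [], 2: ['c'] }
const delegated = [...flatten(buckets)]
const byHand = [...flattenByHand(buckets)]
console.log(delegated) // [ 'a', 'b', 'c' ]
console.log(byHand)    // [ 'a', 'b', 'c' ]
```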

2.2 Tinyield Design

We choose the .NET type system (CTS, 2012) to specify the design of the tinyield types, because it
supports first-class function types. Notice that in Java, for example, function types are defined
by interfaces, which may obscure their real purpose. According to the .NET type system, every
function type has an Invoke method that conforms to its descriptor, i.e. the types of the arguments
and the return type.

The tinyield generator is based on the Traverser

function type that speciﬁes how the elements of a se-

quence are traversed. A Traverser corresponds to

a delimited subroutine that marks the boundary of a

generator and delimits the action of yield. The ar-

gument of Traverser function is an opaque com-

putation that can yield. This immediately suggests

a monadic encapsulation for the effectful generator

computations with yield as the only effect operator of

the monad. Since Traverser marks the boundary of

this effect, it can be used as the operation that escapes

the monad.

Notice that in C#, Ruby and JavaScript, the equiv-

alent to Traverser is hidden in the implementation

of the loop construct, that is the for( of ) statement

of Listings 2 and 3.

In Figure 1 we depicted the types design of Traverser and Yield, which are implemented in the three
distributions of the tinyield, in Java, C# and TypeScript (a strict syntactical superset of
JavaScript).

Figure 1: Class diagram of Traverser and Yield types.

The generator parameters are not identified in Figure 1 and are captured by the traverser lexical
scope from the generator function (closure). In Listings 4 and 5 we present the corresponding
implementations of generators list and flatten according to tinyield types Traverser and Yield
defined in C#.

Traverser<T> list<T>(params T[] items) {
    return yield => {
        foreach (T value in items)
            yield(value);
    };
}

Listing 4: tinyield based implementation of list generator.

Traverser<T> flatten<K, T>(IDictionary<K, T[]> map) {
    return yield => {
        foreach (var entry in map.Values)
            list(entry)(yield);
    };
}

Listing 5: tinyield based implementation of flatten

generator.

Each lambda (i.e. =>) returned by each function

encloses the generator boundary that captures the gen-

erator parameters (i.e. items and map).

These implementations do not require any compiler instrumentation support since we do not use any
kind of special primitive, like yield. Notice that in Listings 4 and 5, yield is an ordinary
parameter of type Yield<T> and it is the argument of the Traverser. This Yield<T> instance encloses
the context that can be preserved across calls to different generators, e.g. list(entry)(yield),
complying with the composition property stated in Section 2.1. The call to Invoke is implicit in
list(entry)(yield), which is shorthand for list(entry).Invoke(yield).

We only take advantage of higher-order functions

and the ability to deﬁne local functions (i.e. lambdas),

which are closed over their free lexical variables (i.e.

closures) (Landin, 1964).
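For readers following the JavaScript examples of Section 2.1, the same design can be sketched with plain closures. This is only an illustration of the idea and not the tinyield4ts API; the callback is named yld because yield is a reserved word in strict-mode JavaScript.

```javascript
// A traverser is a plain function that receives a callback and pushes
// every element of the sequence into it. No generator syntax is involved.
const list = (...items) => yld => {
  for (const value of items) yld(value)
}

// flatten forwards the same callback to each inner traverser, which is how
// the yielding context is preserved across generators.
const flatten = map => yld => {
  for (const entry of Object.values(map)) list(...entry)(yld)
}

const collected = []
flatten({ 0: ['a', 'b'], 1: [], 2: ['c'] })(value => collected.push(value))
console.log(collected) // [ 'a', 'b', 'c' ]
```

The generator parameters (items and map) are captured by the returned arrow functions, so nothing beyond higher-order functions and closures is required.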

Another difference between the tinyield proposal and the JavaScript generator is that the sequence
resulting from the JavaScript generator may be traversed with a for( of ) loop, whereas the
resulting tinyield Traverser cannot be traversed with the equivalent C# foreach( in ). The
Traverser can be traversed only through its invocation, for example as presented in Listing 6. The
difference between the two forms of traversing is usually denoted as pull versus push access, where
pull denotes getting items (ask) and push regards expressing what to do with those items (tell)
(Hunt and Thomas, 2003).

flatten(map)(Console.WriteLine);

Listing 6: Traversing elements from a Traverser in a push

style idiom.

Yet, the tinyield Traverser has a limitation compared with the yield primitive: a suspended
Traverser is not a first-class value. A Traverser performs a single bulk computation. Hence, the
caller relinquishes control until the traversal completes, and many algorithms cannot work this
way, such as any algorithm that needs to manipulate two sequences simultaneously. For example, zip
(also known as convolution) is an operation that takes a tuple of sequences and transforms them
into a sequence of tuples. This problem is easily solved if we convert at least one of the
sequences to an explicit data structure. Yet, this incurs a useless overhead when that sequence is
very large and the streams do not match in length: we will have performed a great deal of work
for nothing.
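The cost of that workaround can be made concrete with a short JavaScript sketch. The names fromArray, zipBuffered and the counter visited are hypothetical and exist only to expose how many elements are traversed.

```javascript
// Traverser over an array (bulk push of every element).
const fromArray = items => yld => { for (const item of items) yld(item) }

// Zip for bulk traversers: the second sequence must first be copied into an
// explicit data structure, because a traverser cannot be suspended.
function zipBuffered(upstream, other, zipper) {
  return yld => {
    const buffer = []
    other(item => buffer.push(item)) // traverses ALL elements of other
    let index = 0
    upstream(item => {
      if (index < buffer.length) yld(zipper(item, buffer[index++]))
    })
  }
}

// A large sequence that counts how many elements were actually produced.
let visited = 0
const large = yld => { for (let i = 0; i < 1000; i++) { visited++; yld(i) } }

const pairs = []
zipBuffered(fromArray(['a', 'b']), large, (s, n) => s + n)(p => pairs.push(p))
console.log(pairs)   // [ 'a0', 'b1' ]
console.log(visited) // 1000
```

Only two pairs are produced, yet all 1000 elements of the larger sequence are visited and buffered, which is the wasted work described above.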

Thus, we need a suspendable traversal that is able

to iterate element by element, rather than all elements

in bulk. To that end we have designed an alterna-

tive way of traversing elements individually, which is

speciﬁed by the tinyield Advancer function type de-

picted in Figure 2.

Figure 2: Class diagram of Advancer and Yield types.

The Advancer is similar to the Traverser descriptor but returns a Boolean instead (i.e. bool). An
Advancer function is expected to yield the next element of the sequence, if there is one, and to
return whether an element was processed or not. Simply put, it essentially merges the behavior of
hasNext() and next() of the Java Iterator interface in a single subroutine.

To traverse all elements of an Advancer we need to invoke it repeatedly in a while loop until it
returns false, as presented in the next statement, which traverses a hypothetical Advancer<T> adv
and prints all its elements:

while (adv(Console.WriteLine)) { }
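A possible JavaScript rendering of the Advancer idiom is sketched below. The factory fromArray is a hypothetical helper, not part of tinyield4ts, that builds an advancer over an in-memory array.

```javascript
// Hypothetical factory: returns an advancer that yields one element per call
// and reports, through its Boolean result, whether an element was yielded.
function fromArray(items) {
  let index = 0
  return yld => {
    if (index >= items.length) return false
    yld(items[index++])
    return true
  }
}

const adv = fromArray([10, 20, 30])
const seen = []
// Element-by-element traversal: the loop body is empty because all the work
// happens inside the callback.
while (adv(item => seen.push(item))) { }
console.log(seen) // [ 10, 20, 30 ]

// Once exhausted, the advancer keeps returning false without yielding.
const exhausted = adv(item => seen.push(item))
console.log(exhausted) // false
```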

In Listing 7, we present an Advancer-based implementation of zip(upstream, other, zipper) that
applies the specified zipper function to the corresponding elements of upstream and other,
producing a new sequence of results. Both upstream and other are of type Advancer. Notice that the
call to other() produces a Boolean value according to the Advancer idiom. We use this value to let
the resulting Advancer inform whether it has yielded a value or not. Also, when upstream is empty
the inner lambda is not performed and the variable yielded remains false. So, only when we
successfully advance on both streams does the zip produce a new value and set the variable yielded
to true.

Advancer<R> Zip<T, U, R>(
    Advancer<T> upstream,
    Advancer<U> other,
    Func<T, U, R> zipper)
{
    return yield => {
        bool yielded = false;
        upstream(e1 =>
            yielded = other(e2 =>
                yield(zipper(e1, e2))
            )
        );
        return yielded;
    };
}

Listing 7: Zip operation for Advancer based sequences.

Our Advancer-based implementation is much more compact than its equivalent counterpart in Java.
For example, the accepted answer to the question "Zipping streams using JDK8 with lambda"
(stackoverflow.com/a/23529010/1140754) gives an implementation with more than 30 lines of code.
Moreover, our proposal outperforms Java streams in a realistic benchmark zipping sequences from
Last.fm (Section 3).
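The zip of Listing 7 translates almost literally to JavaScript. The sketch below assumes a hypothetical array-backed advancer factory (fromArray) and shows that the resulting sequence ends as soon as either input is exhausted.

```javascript
// Hypothetical array-backed advancer, used only to feed the example.
function fromArray(items) {
  let index = 0
  return yld => {
    if (index >= items.length) return false
    yld(items[index++])
    return true
  }
}

// Same structure as Listing 7: advance upstream, and inside its callback
// advance other; yielded is true only if both advanced.
function zip(upstream, other, zipper) {
  return yld => {
    let yielded = false
    upstream(e1 => {
      yielded = other(e2 => yld(zipper(e1, e2)))
    })
    return yielded
  }
}

const zipped = zip(fromArray([1, 2, 3]), fromArray(['a', 'b']), (n, s) => `${n}${s}`)
const pairs = []
while (zipped(p => pairs.push(p))) { }
console.log(pairs) // [ '1a', '2b' ]
```

No intermediate buffer is needed: each call to the zipped advancer consumes at most one element from each input.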

In conclusion, like many other streams libraries, the tinyield library provides implementations of
core streams processing operations, such as map, filter, reduce, limit, takeWhile, zip, and others
(Fowler, 2015). These operations may require one or both ways of traversal: Traverser and Advancer.
To that end, the tinyield type Query<T> aggregates the two traversal methods in a single instance.
Operations are then built on top of the Query<T> type, which allows chaining invocations fluently.
In this case, the terminal operation decides which traversal method to use.

Finally, we should not be restricted to the suite of operations provided by a streams library. To
that end, we included in tinyield a fluent way of chaining user-defined operations. Since we give
priority to Traverser-based traversal, we provide in Query a method Then, which receives a function
that maps an upstream Query into a new Traverser, such that:

Then(Func<Query<T>, Traverser<R>> next)

Consider, for example, that we would like to use an absent distinctBy operation to get a sequence
of random numbers with distinct numbers of digits. In Listing 8 we show how to implement and chain
this new operation fluently in such a pipeline.

This is the most concise way of interleaving a user-defined operation. Yet, it fails if the
terminal operation requires an Advancer, as is the case for zip.

var lengths = new HashSet<int>();
Random rand = new Random();
Query
    .Generate(() => rand.NextDouble() * MAX)
    .Limit(1024)
    .Map(Convert.ToInt32)
    .Then(upstream => yield => upstream.Traverse(
        item => {
            // number of decimal digits of the generated integer
            int nrOfDigits = item.ToString().Length;
            if (lengths.Add(nrOfDigits))
                yield(item);
        }))
    .Traverse(Console.WriteLine);

Listing 8: User-deﬁned distinctBy ﬂuently chained in a

tinyield pipeline.

For those cases, we provide an alternative overloaded Then that receives two mapping functions to
produce both ways of traversal. However, that alternative is more verbose.
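To make the role of Then concrete, the following JavaScript sketch builds a deliberately tiny Query with only the Traverser path. The class and its methods (of, map, then, traverse) are hypothetical simplifications written for this example; they are not the tinyield4ts API, and the Advancer path is omitted.

```javascript
// Hypothetical minimal Query that wraps a single traverser.
class Query {
  constructor(traverser) { this.traverser = traverser }
  static of(...items) {
    return new Query(yld => { for (const item of items) yld(item) })
  }
  map(mapper) {
    return new Query(yld => this.traverser(item => yld(mapper(item))))
  }
  // Chains a user-defined operation: next maps this Query into a new traverser.
  then(next) { return new Query(next(this)) }
  traverse(yld) { this.traverser(yld) }
}

// User-defined operation in the shape expected by then: it keeps only the
// first element seen for each key computed by selector.
const distinctBy = selector => upstream => yld => {
  const keys = new Set()
  upstream.traverse(item => {
    const key = selector(item)
    if (!keys.has(key)) {
      keys.add(key)
      yld(item)
    }
  })
}

const result = []
Query.of(7, 42, 13, 5, 1024, 99)
  .map(n => String(n))
  .then(distinctBy(s => s.length))
  .traverse(s => result.push(s))
console.log(result) // [ '7', '42', '1024' ]
```

The user-defined operation is an ordinary function, so it can be named, reused and chained like any built-in operation.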

3 PERFORMANCE EVALUATION

To avoid I/O operations during benchmark execution, we previously collected all data into resource
files, and we load all that data into in-memory data structures on benchmark bootstrap. Thus, we
avoid any I/O by providing the sequence sources from memory. Further environment details are
available in the sequences-benchmarks repository (Poeira and Carvalho, 2020).

To achieve unbiased and precise results, we based our benchmarks on state-of-the-art platforms for
performance analysis in both environments: JMH (Shipilev, 2013) in Java and benchmark.js (Bynens
and Dalton, 2014) in JavaScript.

We ran our tests on a local machine which has the

following specs: Microsoft Windows 10 Home, In-

tel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz. We

have used the following runtimes: Openjdk 15.0.1

(build 15.0.1+9-18), Dotnet core 5.0.102 and Node.js

v12.13.1.

Custom Operation Every. The custom operation Every is based on the Stack Overflow question
"Zipping streams using JDK8 with lambda". The question also discussed how significant the lack of a
zip operation in Java Stream was. Our benchmark leveraged some ideas from kotlin-benchmarks
(Ryzhenkov, 2014), such as testing with three different data types: Integer, String and a Value
class holding an int and a String field that are combined to implement equals and hashCode.

Every is an operation that, based on a user-defined predicate, tests whether the elements at
corresponding positions of two sequences all match. To implement the every() operation we simply
combine the zip() and allMatch() operations in sequence, such as:

seq1
    .zip(seq2, pred::test)
    .allMatch(Boolean.TRUE::equals);
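The same composition can be sketched in JavaScript on top of advancers. All names below (fromArray, zip, allMatch, every) are illustrative helpers assumed for this example and do not reproduce the benchmarked implementations.

```javascript
// Hypothetical array-backed advancer.
function fromArray(items) {
  let index = 0
  return yld => {
    if (index >= items.length) return false
    yld(items[index++])
    return true
  }
}

// Advancer-based zip, structured as in Listing 7.
function zip(upstream, other, zipper) {
  return yld => {
    let yielded = false
    upstream(e1 => { yielded = other(e2 => yld(zipper(e1, e2))) })
    return yielded
  }
}

// Terminal operation: stops advancing at the first element that fails.
function allMatch(advancer, predicate) {
  let ok = true
  while (ok && advancer(item => { ok = predicate(item) })) { }
  return ok
}

// every = zip with the predicate, then check that every result is true.
const every = (seq1, seq2, pred) =>
  allMatch(zip(seq1, seq2, pred), matched => matched === true)

const same = every(fromArray([1, 2, 3]), fromArray([1, 2, 3]), (a, b) => a === b)
const differ = every(fromArray([1, 2, 3]), fromArray([1, 5, 3]), (a, b) => a === b)
console.log(same, differ) // true false
```

Because allMatch short-circuits, the second comparison stops at the mismatching pair and never inspects the third elements.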

Tinyield is the most performant in both Java and JavaScript environments, as depicted in the charts
of Figure 3. JavaScript arrays present the closest performance to tinyield, but they become
unaffordable for large data sets due to their eager nature.

Last.fm. To benchmark use cases with real-world

data, we resorted to publicly available Web APIs,

namely REST Countries and Last.fm. We retrieved

from REST Countries a list of 250 countries and then

used them to query Last.fm, retrieving both the top

Artists and the top Tracks by country, resulting in a

total of 7500 records each.

The domain model for these benchmarks can be

summarized by the entities: Country, Language,

Track, and Artist.

We devised two benchmarks using data from Last.fm: "Distinct Top Artist and Top Track by Country",
identified in Figure 4 as "Distinct", and "Artists Who Are in A Country's Top Ten Who Also Have
Tracks in The Same Country's Top Ten", identified as "Filter" (Poeira and Carvalho, 2020). Both
benchmarks start off the same way. We first query all the countries, filter the non-English
speaking countries and, from these, we retrieve two sequences: one pairing Country with its top
Tracks and another pairing Country with its top Artists (Poeira and Carvalho, 2020).

Tinyield is only overtaken on the Last.fm benchmark, most significantly on Java, by Kotlin
Sequence. Nevertheless, tinyield is the second most performant library in Java for Last.fm, with
many advantages in extensibility over Java streams (i.e. conciseness and fluency).

On JavaScript, the yield-based counterpart used by IxJs is the worst performer in all benchmarks.
On the other hand, Lazy.js and Sequency, which avoid the yield primitive as tinyield does, are also
well-performing alternatives for Last.fm. Arrays also present good behavior but can be unviable for
larger data sets.

Figure 3: Performance in throughput on Every benchmark for 1000 elements. (Two bar charts, Every
Node.js and Every Java, plotting ops/s per library for the Class, Number/Integer and String data
types.)

Figure 4: Performance in throughput on Last.fm benchmark. (Two bar charts, LastFM Node.js in ops/s
and LastFM Java in 10^3 ops/s, plotting the Filter and Distinct benchmarks per library.)

Weather and User-defined Operations. We used another realistic data source from
WorldWeatherOnline to benchmark interleaved user-defined operations. For these benchmarks, we
created two custom operations: oddLines and collapse. We then queried WorldWeatherOnline for the
weather in Lisbon, Portugal between the dates of 2020-05-08 and 2020-11-08, providing us with a CSV
file that we manipulated with the operations above in a benchmark to perform the following queries:
1) maximum temperature; 2) count distinct temperature values, and 3) count temperature transitions.
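The text does not spell out the semantics of oddLines and collapse, so the following JavaScript sketch rests on two loudly stated assumptions: oddLines keeps the lines at odd (zero-based) positions, and collapse drops consecutive repeated values. The sample lines are invented and do not come from the WorldWeatherOnline data set.

```javascript
// Push-style helpers written for this example only.
const fromArray = items => yld => { for (const item of items) yld(item) }
const map = (upstream, mapper) => yld => upstream(item => yld(mapper(item)))

// ASSUMPTION: oddLines keeps elements at odd zero-based positions.
const oddLines = upstream => yld => {
  let index = 0
  upstream(line => { if (index++ % 2 === 1) yld(line) })
}

// ASSUMPTION: collapse merges runs of equal consecutive values into one.
const collapse = upstream => yld => {
  let hasPrevious = false
  let previous
  upstream(item => {
    if (!hasPrevious || item !== previous) yld(item)
    previous = item
    hasPrevious = true
  })
}

// Invented input: a descriptive line followed by a temperature line.
const lines = ['# 08:00', '18', '# 09:00', '18', '# 10:00', '21', '# 11:00', '18']
const temperatures = () => map(oddLines(fromArray(lines)), line => Number(line))

let max = -Infinity
temperatures()(t => { if (t > max) max = t })      // query 1: maximum

const distinct = new Set()
temperatures()(t => distinct.add(t))               // query 2: distinct values

const runs = []
collapse(temperatures())(t => runs.push(t))        // query 3: basis for transitions

console.log(max, distinct.size, runs) // 21 2 [ 18, 21, 18 ]
```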

Sequency is a JavaScript library developed in TypeScript, like tinyield4ts. However, tinyield is
between 2 and 3 times faster than Sequency on the weather benchmark, as depicted in Figure 5.

For comparison, Prokopec and Liu (2018) also observed that lazy functional lists are 12-17x slower.
We have observed that same behavior in Java, most significantly on the weather benchmark, for Vavr,
which is a purely functional and immutable data structure, and also for StreamEx. The Vavr and
StreamEx approach to user-defined operations consists of using a cons (Friedman and Wise, 1976) in
conjunction with the head method and a supplier for the new tail of the sequence, recursively.

Alternative Approaches Comparison. Tinyield performance gains are due to the fast-path iteration
protocol, which has less overhead when bulk traversing a sequence than a common iterator does. This
approach reduces the overhead of per-element access, and increases the effectiveness of other
optimizations such as inlining, code motion, bounds check elimination, and others.
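The difference between the two protocols can be illustrated by counting the calls that the consumer has to make. The sketch below is not a performance measurement; iteratorOf, traverserOf and the stats counters are hypothetical names introduced only to expose the shape of each protocol.

```javascript
// Pull protocol in the style of Java's Iterator: the consumer makes two
// calls per element (hasNext and next), plus a final hasNext.
function iteratorOf(items, stats) {
  let index = 0
  return {
    hasNext() { stats.calls++; return index < items.length },
    next() { stats.calls++; return items[index++] }
  }
}

// Push protocol: the consumer makes a single call and the source runs a
// tight internal loop over its own data.
function traverserOf(items, stats) {
  return yld => {
    stats.calls++
    for (const item of items) yld(item)
  }
}

const data = [1, 2, 3, 4]

const pullStats = { calls: 0 }
let pullSum = 0
const it = iteratorOf(data, pullStats)
while (it.hasNext()) pullSum += it.next()

const pushStats = { calls: 0 }
let pushSum = 0
traverserOf(data, pushStats)(item => { pushSum += item })

console.log(pullSum, pullStats.calls) // 10 9
console.log(pushSum, pushStats.calls) // 10 1
```

In the push case the loop lives next to the data source, which is the kind of code shape where the optimizations mentioned above tend to apply.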

Moreover, tinyield is the only Java library with a concise way of defining new stream operations
while also maintaining the fluency of the pipeline.

The Java programming language does not provide a yield primitive, and extending the streams API
inevitably incurs verbose implementations.

We identified a few advantages of Kotlin's Sequence: operations that in Java would return Optional
return a nullable in Kotlin, meaning no wrapper is created, resulting in less overhead. Moreover,
Kotlin's terminal operations are inline, so there is no indirection when calling them.

Eclipse Collections has a lot of optimizations in place regarding the data source of the pipeline,
namely if an array is at the source then iteration will be as fast as using a for loop.

The main gain of StreamEx, jOOλ and Vavr is the fact that these libraries bring extra functionality
to the user out of the box, with almost no need for creating new user-defined operations.

Figure 5: Performance in throughput on weather benchmark. (Two bar charts, Weather Node.js and
Weather Java, plotting ops/s per library for the Transitions, Max and Distinct queries.)

JavaScript has supported, since ECMAScript 5 in 2009 (ECMA, 2020), operation chaining over
sequences, in other words, sequence pipelines. JavaScript's sequence type is the Array type, which
distinguishes itself from other sequence type implementations by having an eager approach.
Generators and the yield keyword were later introduced with ES6 in 2015, yet no lazy sequence type
implementations were provided by this new standard either, forcing developers to look for this
feature in third-party libraries.

Lodash and Underscore, although quite popular in the JavaScript world, suffer from the same
problems as ES6 Arrays. When processing a sequence pipeline these libraries calculate all
intermediate results before proceeding to the next operation, which incurs the same unnecessary
processing observed in ES6 Arrays.
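The unnecessary processing of an eager pipeline is easy to observe by counting how often the mapping function runs. In the sketch below, map and first are hypothetical lazy helpers written with generators purely for the comparison; the eager side uses only standard Array methods.

```javascript
const source = [1, 2, 3, 4, 5]

// Eager: Array.prototype.map builds the whole intermediate array before
// find inspects its first element.
let eagerCalls = 0
const eagerFirst = source
  .map(n => { eagerCalls++; return n * n })
  .find(n => n > 3)

// Lazy: elements flow one at a time, so the mapper stops being called as
// soon as the terminal operation has its answer.
function* map(items, mapper) {
  for (const item of items) yield mapper(item)
}
function first(items, predicate) {
  for (const item of items)
    if (predicate(item)) return item
  return undefined
}
let lazyCalls = 0
const lazyFirst = first(map(source, n => { lazyCalls++; return n * n }), n => n > 3)

console.log(eagerFirst, eagerCalls) // 4 5
console.log(lazyFirst, lazyCalls)   // 4 2
```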

Lazy.js and Sequency propose alternative traversal designs, as does tinyield, but their proposals
have no foundation on a unified yield model. Moreover, Sequency is one of the worst performing
libraries on the weather benchmark.

4 RELATED WORK

Lazy traversal is inspired by the concept of lazy lists, also known as streams, first described in
1965 by Landin (Landin, 1965). It was Landin who proposed the use of delayed evaluation to avoid
"item-by-item" representation of collections. Friedman and Wise (Friedman and Wise, 1976)
introduced lazy lists in Lisp in 1976, and the idea was then adopted in other languages too, in
some cases as a fundamental data structure, as in Haskell (Jones, 2003).

Alphard, developed at CMU in the late 1970s, was the first programming language to introduce the
generator operator (Shaw et al., 1977). That construct inspired iterators in CLU (Liskov, 1983),
where an iterator is a procedure that returns a sequence of elements and allows the caller to get
at the elements one at a time.

The idea of a single iteration method was intro-

duced in Python 2.2, where iterators provide a single

method next that returns the next element in a se-

quence, or raises an exception when no more elements

are available (Yee and van Rossum, 2001). This fea-

ture is described in the proposal PEP 234 (Python

Enhancement Proposal 234) Iterators (Yee and van

Rossum, 2001). The advantages of a single traversal

subroutine were highlighted in (Baker, 1993), where

H. Baker shows how higher-order functions, taking

as argument functions which are closed over their

free lexical variables (closures) can be used to pro-

vide iteration capabilities. Similar simpliﬁcation with

higher-order functions has been followed in several

domain-speciﬁc approaches namely on template pro-

cessors (Carvalho et al., 2020).

Several works have evaluated sequence traversal performance through benchmarks, namely
kotlin-benchmarks (Ryzhenkov, 2014), which provides benchmarks over Kotlin's features such as the
use of Sequence. Another example is the set of benchmarks devised by Angelika Langer and Klaus
Kreft (Langer and Kreft, 2015) with the aim of better understanding how Java Streams perform and
when parallel() outperforms sequential processing of the same Stream. Nicolai Parlog also tackled
this point in his benchmarks on Parallel Stream Vectorization (Parlog, 2019), evaluating the
performance gained using Stream parallel() when computing factorials.

5 CONCLUSIONS

Generators are heavily inspired by co-routines, which

generally follow two approaches to implement control

ﬂow: call stack manipulation and program transfor-

mation, i.e. instrumentation. The main problem with

both approaches regards their overheads due to con-

text switch manipulation.

In the ﬁrst approach, the runtime is augmented

with call stack introspection or the ability to swap call

stacks during the execution of the program. Several

attempts in the context of the JVM runtime (Dragos

et al., 2007; Stadler et al., 2009) did not become of-

ﬁcial and most recently OpenJDK is still working on

Project Loom (OpenJDK, 2020) that aims to provide

ﬁbers and continuations for the Java Virtual Machine.

In the second approach, the compiler transforms

the program to translate coroutines into an equivalent

program without coroutines. This is the approach fol-

lowed by most generators such as in C#, JavaScript,

and others.

We claim that the tie to co-routines induces heavyweight approaches that incur performance
overheads on generators. On the other hand, object-oriented iterators incur useless complexity and
verbosity that affect the readability and expressiveness of streams operations (Baker, 1993). The
tinyield design proposal overcomes those limitations, with advantages in both performance and
conciseness of extensibility.

Our model has already been extended for asynchronous processing, and there is evidence that it may
overtake alternatives such as reactive streams (Kaazing et al., 2017), achieving better throughput
under some non-blocking IO scenarios. Previous work has already been done in this field (Prokopec
and Liu, 2018) but, again, it is tied to co-routines, which we have shown in this work to be
harmful for streams traversal, namely in the JavaScript environment.

ACKNOWLEDGEMENTS

This work was supported by Instituto Politécnico de Lisboa, Lisbon, Portugal, which funded the
project "Reactive Web streams for Big Data" (IPL/2020/WebFluid ISEL).

REFERENCES

ECMA (2020). ECMAScript 2020 language specification, 11th edition. ECMA.

Baker, H. G. (1993). Iterators: Signs of weakness in object-

oriented languages. SIGPLAN OOPS Mess., 4(3):18–

25.

Borins, M., Braun, A. R., Palmer, R., and Terlson,

B. (2006). ECMA-334 C# language speciﬁcation.

ECMA, 5 edition.

Bynens, M. and Dalton, J.-D. (2014). benchmarkjs: A

benchmarking library that supports high-resolution

timer.

Carvalho, F. M., Duarte, L., and Gouesse, J. (2020). Text

web templates considered harmful. In Web Informa-

tion Systems and Technologies, pages 69–95, Cham.

Springer International Publishing.

Conway, M. E. (1963). Design of a separable transition-

diagram compiler. Commun. ACM, 6(7):396–408.

CTS (2012). ECMA-335 Common Language Infrastructure

(CLI), 6th edition, June 2012. ECMA, 6 edition.

Dragos, I., Cunei, A., and Vitek, J. (2007). Theory and practice of coroutines with snapshots. In
ICOOOLPS'2007, Technische Universität Berlin.

Fowler, M. (2015). Collection pipeline.

Friedman, D. P. and Wise, D. S. (1976). CONS should not

evaluate its arguments. In Michaelson, S. and Milner,

R., editors, Automata, Languages and Programming,

pages 257–284, Edinburgh, Scotland. Edinburgh Uni-

versity Press.

Hunt, A. and Thomas, D. (2003). The art of enbugging.

IEEE SOFTWARE.

James, R. and Sabry, A. (2011). Yield: Mainstream de-

limited continuations. Workshop on the Theory and

Practice of Delimited Continuations.

Jones, S. (2003). Haskell 98 Language and Libraries: The

Revised Report. Journal of functional programming:

Special issue. Cambridge University Press.

Kaazing, Lightbend, Netﬂix, Pivotal, and Hat, R. (2017).

Reactive streams speciﬁcation for the jvm.

Landin, P. J. (1964). The Mechanical Evaluation of Expres-

sions. The Computer Journal, 6(4):308–320.

Landin, P. J. (1965). Correspondence between algol 60

and church’s lambda-notation: Part i. Commun. ACM,

8(2):89–101.

Langer, A. and Kreft, K. (2015). Stream performance. JAX

London Online Conference.

Liskov, B. (1983). CLU Reference Manual. Springer-Verlag

New York, Inc., Secaucus, NJ, USA.

Liskov, B. (1996). A History of CLU, page 471–510. Asso-

ciation for Computing Machinery, NY, USA.

OpenJDK (2020). Loom - ﬁbers, continuations and tail-

calls for the jvm.

Parlog, N. (2019). Github.

Poeira, D. and Carvalho, F. M. (2020). Benchmark for dif-

ferent sequence operations in java and kotlin. Tech-

nical report, https://github.com/tinyield/sequences-

benchmarks.

Prokopec, A. and Liu, F. (2018). Theory and practice of

coroutines with snapshots. In European Conference

on Object-Oriented Programming.

Ryzhenkov, I. (2014). JetBrains.

Shaw, M., Wulf, W. A., and London, R. L. (1977). Ab-

straction and veriﬁcation in alphard: Deﬁning and

specifying iteration and generators. Commun. ACM,

20(8):553–564.

Shipilev, A. (2013). Java microbenchmark harness (the

lesser of two evils).

Stadler, L., Wimmer, C., Würthinger, T., Mössenböck, H., and Rose, J. (2009). Lazy continuations
for java virtual machines. In Proceedings of the 7th International Conference on Principles and
Practice of Programming in Java, PPPJ '09, page 143–152, New York, NY, USA. Association for
Computing Machinery.

Thomas, D. and Hunt, A. (2007). Programming Ruby: The

Pragmatic Programmer’s Guide. Addison-Wesley.

Yee, K.-P. and van Rossum, G. (2001). Pep 234 – iterators.

Technical report, Python.
