Creo: Reduced Complexity Service Development

Per-Olov Östberg and Niclas Lockner

Dept. of Computing Science, Umeå University, Umeå, Sweden

Keywords:

Service-oriented Architecture, Service Development Tools.

Abstract:

In this work we address service-oriented software development in distributed computing environments, and

investigate an approach to software development and integration based on code generation. The approach is

illustrated in a toolkit for multi-language software generation built on three building blocks: a service description language, a serialization and transport protocol, and a set of code generation techniques. The approach

is intended for use in the eScience domain and aims to reduce the complexity of development and integration

of distributed software systems through a low-knowledge-requirements model for construction of network-

accessible services. The toolkit is presented along with a discussion of use cases and a performance evaluation

quantifying the performance of the toolkit against selected alternative techniques for code generation and ser-

vice communication. In tests of communication overhead and response time, toolkit performance is found to

be comparable to or improve upon the evaluated techniques.

1 INTRODUCTION

Cloud computing has in recent years evolved into an

established paradigm for provisioning of IT capacity.

While this approach can offer several beneﬁts com-

pared to traditional static provisioning, e.g., facilita-

tion of more ﬂexible service types (Armbrust et al.,

2010) and improvements in cost and energy efﬁciency

of large-scale computing (Walker, 2009; Berl et al.,

2010), it also places focus on a current problem in dis-

tributed computing: the increasing complexity of de-

velopment and management of systems in distributed

computing environments (Kephart and Chess, 2003).

Service-Oriented Computing (SOC) is a popular

approach to software development and integration in

large-scale distributed systems. SOC is argued to be

well suited for cloud environments as it places focus

on representation of logic components as network-

accessible services, and aims to facilitate develop-

ment and integration of systems through coordination

of service interactions. At architecture level, Service-

Oriented Architectures (SOAs) deﬁne service inter-

faces as integration points and address system compo-

sition at interface or protocol level. While a number of

SOA techniques have emerged, service development

and integration are still complex issues and there ex-

ists a need for development tools that provide non-

complex and low-learning-requirement environments

for efﬁcient development of service-based systems.

To illustrate these issues, we here take the perspective of eScience application development. In eScience, distributed computing techniques are used

to create collaborative environments for large-scale

scientiﬁc computing. In comparison to commer-

cial software stacks, scientiﬁc computing tools are

typically prototype-oriented, developed in projects

with limited software development budgets, and of-

ten composed of heterogeneous components devel-

oped in multiple languages and environments. In ad-

dition, eScience applications often use distributed or

parallel programming techniques to exploit the inher-

ent parallelism of computational problems. As many

current eScience efforts are approaching construction

of virtual infrastructures using cloud technology, they

here serve as illustrative examples of the difﬁculties

of developing multi-language software stacks in het-

erogeneous distributed computing environments.

In this work we address reduction of complexity

in service-based software development, and present

an easy-to-use toolkit for efﬁcient cross-language in-

tegration of software services. The toolkit is based

on three core components: a simpliﬁed syntax service

description language, a transparent data serialization

and transmission protocol, and a set of code genera-

tion tools designed to abstract complexity in service

and service client development.

Footnote: eScience refers to computationally intensive science carried out in highly distributed network environments.


Östberg P. and Lockner N.

Creo: Reduced Complexity Service Development.

DOI: 10.5220/0004854902300241

In Proceedings of the 4th International Conference on Cloud Computing and Services Science (CLOSER-2014), pages 230-241

ISBN: 978-989-758-019-2

© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

The remainder of the paper is structured as fol-

lows: Section 2 presents project background and a

brief survey of related work, Section 3 outlines the

proposed approach and toolkit, and Section 4 dis-

cusses use cases for the approach. In the second half

of the paper, Section 5 contains a performance evalu-

ation quantifying toolkit performance against selected

alternative techniques for code generation and service communication, followed by conclusions and acknowledgements in Sections 6 and 7.

2 RELATED WORK

A number of tools for service development and mid-

dleware construction exist, ranging in complexity and

abstraction levels from very simple fine-grained interprocess communication tools to advanced middleware construction tools featuring advanced data marshalling, call translation, and remote reference counting techniques. In general there exist trade-offs between complexity and efficiency that make service

technologies more or less suitable for certain situa-

tions, and many technologies have been developed for

speciﬁc application scenarios.

For example, direct interprocess communication

technologies such as traditional remote procedure

calls (RPC) (Birrell and Nelson, 1984) and Java Ob-

ject Serialization (JOS) (Oracle, 2005) (over sock-

ets) provide transparent development models but of-

fer little in ways of complexity abstraction. Other

approaches such as Java Remote Method Invocation

(RMI) (Wollrath et al., 1996) and the Microsoft Windows Communication Foundation (WCF) (Mackey, 2010) offer development models tightly integrated into mature commercial software development environments, but lose some applicability in multi-platform application scenarios. There also exist

standardized approaches to multi-language and multi-

platform service development, e.g., the Common Ob-

ject Request Broker Architecture (CORBA) (Vinoski,

1993), but while such standardized approaches typi-

cally are very expressive and capable of application in

multiple programming styles, e.g., object-orientation

and component-oriented development, this general

applicability often comes at the price of very steep

learning curves and high development complexity.

In service-oriented computing and architectures,

programming models such as SOAP and REST-style

web services are widely used due to features such as

platform independence, high abstraction levels, and

interoperability. The SOAP approach to web ser-

vices favors use of standardization of XML-based ser-

vice description and message formats to facilitate au-

tomated generation of service interconnection code

stubs, dynamic service discovery and invocation tech-

niques, and service coordination and orchestration

models. SOAP-style web services are however of-

ten criticized for having overly complex development

models, inefﬁciencies in service communication, and

low load tolerances in servers (although developments

in pull-based parser models have alleviated some of

the performance issues (Govindaraju et al., 2004)).

The REpresentational State Transfer (REST)

(Fielding, 2000) web service model is often seen

as a light-weight alternative to the complexity of

SOAP-style web service development. The REST ap-

proach discourages standardization (of message for-

mats), promotes (re)use of existing wide-spread tech-

nology, and aims to give service developers more

freedom in, e.g., choice of data representation formats

and API structures. While this approach facilitates a

development model well suited for smaller projects,

it is sometimes argued to lead to more tightly coupled

service models (that require service client developers

to have knowledge of service-side data structures) and

introduce technology heterogeneity in large systems.

Although service models are considered suitable

for large-scale system integration, and some under-

standing of the applicability of web services has

been gained (Pautasso et al., 2008), neither ap-

proach fully addresses the requirements of service-

oriented software development and a number of

technologies for hybrid service-RPC mechanisms

have emerged. These include, e.g., interface definition language (IDL) based technologies such as Apache Thrift (Slee et al., 2007), an RPC framework for scalable cross-language service development, Apache Avro (Apache, 2009), a data serialization system featuring dynamic typing, and Google Protocol Buffers (Google, 2008), a method for serializing structured data for interprocess communication. For high performance serialization and transmission, there also exist a number of non-IDL based serialization formats and tools such as Jackson JSON (Jackson, 2009), BSON (MongoDB Inc., 2007), Kryo (Kryo, 2009), and MessagePack (Furuhashi, 2011).

In addition to trade-offs for technical performance

and applicability, tools and development models of-

ten impose high learning requirements in dimensions

orthogonal to the task of building distributed sys-

tems. For example, the Distributed Component Ob-

ject Model (DCOM) requires developers to under-

stand data marshalling and memory models, Java

RMI distributed garbage collection, CORBA portable

object adapters (type wrappers), and SOAP web ser-

vices XML Schema (for type deﬁnition and valida-

Creo:ReducedComplexityServiceDevelopment

231

tion). As distributed systems are by themselves com-

plex to develop, debug, and efﬁciently analyze, there

exists a need for software development tools that pro-

vide transparent and intuitive development models,

and impose low learning requirements.

In this work we build on the service develop-

ment model of the Service Development Abstraction

Toolkit (Östberg and Elmroth, 2011), and investigate

an approach to construction of development tools fo-

cused on reducing complexity of service-based soft-

ware development. The aim of this approach is to

combine the high abstraction levels of SOAP-style

web services (using a simpliﬁed service description

syntax) with the communication efﬁciency of more

direct RPC-style communication techniques, and pro-

duce tools with low learning requirements that efﬁ-

ciently facilitate service development. As the work

is based on code generation, the approach can be

seen akin to development of a domain-speciﬁc lan-

guage (Van Deursen et al., 2000) for service descrip-

tion, but the main focus of the work is to reduce

overhead for exposing component logic as network-

accessible services. The work is done in eScience set-

tings, and presented results are primarily intended to

be applied in scientiﬁc environments, e.g., in produc-

tion of tools, applications, and middlewares for scien-

tiﬁc simulation, experimentation, and analysis.

3 CREO

Service-oriented architectures typically expose com-

ponents and systems as platform independent,

network-accessible services. While this approach

gracefully abstracts low-level integration issues and

provides for high-level architecture design models, it

can often lead to practical integration issues stem-

ming from, e.g., complexity in service development

models, steep learning curves of service development

tools, and lack of distributed systems development ex-

perience in service client developers.

In this paper we build on earlier efforts presented in (Östberg and Elmroth, 2011) and (Östberg et al., 2012), and propose an approach to service development that places the responsibility of service client

development on service developers. As this shift in

responsibility introduces noticeable additional com-

plexity in service development, e.g., in requirements

for multi-language service client development, we

note a need for tools to support the approach and

present Creo, a service development toolkit based on

automated code generation.

The Creo toolkit aims to reduce complexity

in construction of network-accessible services by pro-

viding a development model that lowers learning re-

quirements and increases automation in service de-

velopment. While the toolkit is designed to be sim-

ple to use and targeted towards developers with lim-

ited distributed systems development experience, it

also strives to provide service communication perfor-

mance high enough to motivate use of the toolkit in

mature service development scenarios.

To limit the scope of the work, we have initially

designed the toolkit to support development of ser-

vices in a single language (Java), and service client

development in four languages common in eScience

environments: C, C#, Java, and Python. The toolkit

implementation patterns are however transparent and

modularized, and all modules are designed to be ex-

tensible to code generation in additional languages.

The intent of the toolkit is to provide robust service

communication stubs in general purpose program-

ming languages that can later be used to build in-

tegration bridges into special purpose environments

such as R and Matlab. The choice of Java as service language is motivated by the language's rich development APIs, robustness in performance, platform independence, and widespread adoption in operating systems and server platforms. The design philosophy of the toolkit can be summarized as supporting advanced implementation of services while keeping generated code for clients as transparent, lightweight, and free of external dependencies as possible.

To combine the ease-of-use of high abstraction

level tools with the communication performance of

more ﬁne-grained approaches, the toolkit develop-

ment model is based on the service description ap-

proach of SOAP-style web services combined with a

customized version of the RASP protocol presented

in (Östberg et al., 2012). The toolkit service development process can be summarized in three steps:

1. Service description. Service type sets and interfaces are defined in a custom service description (interface definition) language.

2. Communication code generation. Service and

service client communication stubs are generated

from service descriptions.

3. Service integration. Logic components are ex-

posed as services through implementation of gen-

erated service interfaces, and service clients are

implemented based on the generated communica-

tion stubs for service interconnection.

In all steps of this process, the toolkit aims to re-

duce the complexity of service development by pro-

viding intuitive tools and formats for service descrip-

tion, data representation, and code generation.

CLOSER2014-4thInternationalConferenceonCloudComputingandServicesScience

232

3.1 Service Description

Program 1: A sample Creo service description.

    // annotations
    @PACKAGE("packagename")

    // type definitions
    struct MetaData
    {
        String description;
        long timestamp;
    }

    struct Data
    {
        MetaData metadata;
        double[] samples;
    }

    // interface definitions
    interface DataService
    {
        void storeData (Data[] data);
        Data retrieveData (String description);
    }

For data type and service interface deﬁnition, the

toolkit employs a service description language com-

prised of three parts:

• Annotations. Deﬁne code generation parameters,

e.g., service package names.

• Types. Specify a basic set of primitive types and

a struct mechanism for type aggregation.

• Interfaces. Deﬁne service interfaces in terms of

methods and method parameters.

The service description language format is based

on the block syntax of the C/C++ family of lan-

guages. In the interest of simplicity, the primitive type

set is restricted to a basic type set commonly occur-

ring in most programming languages: byte, char,

int, long, float, double, and String. The lan-

guage supports direct aggregation of primitive types

in structs and arrays as well as construction of com-

pound types via aggregation of structs. This allows

construction of hierarchical data types such as trees,

but not cyclic data types such as graphs. Program 1

contains a sample service description demonstrating

the aggregation mechanisms of the Creo service de-

scription language.
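As a rough illustration of the type rules above, the following Python sketch models the Program 1 type set as plain data and checks that every field type resolves to either a primitive or a declared struct. The representation (`STRUCTS`, `unresolved_types`) is hypothetical and is not the toolkit's internal API; only the primitive type names and the Program 1 structs are taken from the text.

```python
# Hypothetical model of a Creo-style type set; not the toolkit's internal API.
PRIMITIVES = {"byte", "char", "int", "long", "float", "double", "String"}

# Program 1 types: struct name -> list of (field type, field name).
# A trailing "[]" marks an array of the given element type.
STRUCTS = {
    "MetaData": [("String", "description"), ("long", "timestamp")],
    "Data": [("MetaData", "metadata"), ("double[]", "samples")],
}


def element_type(type_name):
    """Strip an array suffix, e.g. 'double[]' -> 'double'."""
    return type_name[:-2] if type_name.endswith("[]") else type_name


def unresolved_types(structs):
    """Return field types that are neither primitives nor declared structs."""
    missing = []
    for fields in structs.values():
        for field_type, _ in fields:
            base = element_type(field_type)
            if base not in PRIMITIVES and base not in structs:
                missing.append(base)
    return missing
```

For the Program 1 type set the check returns an empty list, while a struct referring to an undeclared type is reported by name.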

While alternative representation formats with

more advanced features exist, e.g., schema-based type

set and data validation in XML and WSDL, the design philosophy of this work is to reduce complexity rather than offer advanced features. The goal of the description language is to provide a convenient format

that has great expressive power, is as unambiguous

as possible, and introduces as few learning require-

ments as possible. The primitive type set deﬁned, as

well as the concept of aggregation of ﬁelds in records

and arrays, are prevalent in programming languages

and should prove intuitive to developers regardless of

background. To minimize the learning requirements

of the tool, the type interpretations and syntax of the

description language are based on a subset of the well-

known Java programming language.

3.2 Data Representation

To promote transparency, the representation format

speciﬁed in service description also directly outlines

the data structures used in data serialization and trans-

mission. For language and platform independence,

all values are transformed to and from network byte

order in transmission and support code is generated

for programming languages not supporting descrip-

tion language features (e.g., byte order transforma-

tion, string classes, or array types). For aggregated

types, types are serialized in the order declared (and

stored in memory), with size counters preﬁxing data

for array types and strings. As data are declared and

stored in hierarchical structures (trees), data serializa-

tion is essentially a left-wise depth-ﬁrst traversal of

data trees, where individual node values are stored se-

quentially. In terms of invocation semantics, Creo de-

ﬁnes call-by-value semantics for invocation of remote

service methods. As data are serialized by value, the

use of reference and pointer types inside data blocks

passed to services is not supported. In particular,

use of circular references (e.g., cyclic graphs) may

lead to inefﬁcient transmission performance or non-

terminating loops.
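The serialization rules described above can be sketched in Python for the `Data` type of Program 1. This is a minimal sketch under stated assumptions rather than the Creo wire format: the text does not give field widths or string encoding, so the 4-byte unsigned size counters, 8-byte `long` and `double` values, and UTF-8 strings used here are assumptions. The `"!"` prefix of the standard `struct` module selects network byte order.

```python
import struct


def encode_string(text):
    """Size counter (assumed 4 bytes) followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack("!I", len(raw)) + raw


def encode_data(description, timestamp, samples):
    """Depth-first, declaration-order encoding of one Data value.

    Order: metadata.description, metadata.timestamp, samples.
    """
    out = encode_string(description)
    out += struct.pack("!q", timestamp)                  # long (assumed 8 bytes)
    out += struct.pack("!I", len(samples))               # array size counter
    out += struct.pack(f"!{len(samples)}d", *samples)    # doubles
    return out


def decode_data(buf):
    """Inverse of encode_data; reads fields in the same order."""
    offset = 0
    (length,) = struct.unpack_from("!I", buf, offset)
    offset += 4
    description = buf[offset:offset + length].decode("utf-8")
    offset += length
    (timestamp,) = struct.unpack_from("!q", buf, offset)
    offset += 8
    (count,) = struct.unpack_from("!I", buf, offset)
    offset += 4
    samples = list(struct.unpack_from(f"!{count}d", buf, offset))
    return description, timestamp, samples
```

Because values are written in declaration order with counters in front of variable-length fields, the decoder needs no type tags; it simply walks the same structure the encoder did.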

For efﬁciency in transmission (i.e. minimization

of system calls and alignment of network packet sizes to maximum transmission units), all data are serialized and deserialized via transmission buffers located

in the generated code stubs. The protocol used for

transmission of data between clients and services (il-

lustrated in Figure 1) is a customized version of the

Resource Access and Serialization Protocol (RASP)

of the StratUm framework (Östberg et al., 2012). The

description language does not support encoding of ex-

plicit exception messages for propagating error infor-

mation across process boundaries.
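To make the buffering idea concrete, the Python sketch below frames a request as a preamble (protocol and method ids, as named in the caption of Figure 1) followed by the serialized payload, and hands the whole message to the underlying sink in a single write. The one-byte id fields, the id values, and the `TransmissionBuffer` class are assumptions made for illustration; the customized RASP format itself is not specified in this text.

```python
import io
import struct

PROTOCOL_ID = 1      # hypothetical protocol id
STORE_DATA_ID = 0    # hypothetical method id for storeData()


class TransmissionBuffer:
    """Collects serialized pieces and writes them to the sink in one call."""

    def __init__(self, sink):
        self._sink = sink      # any object with a write(bytes) method
        self._chunks = []

    def put(self, data):
        self._chunks.append(data)

    def flush(self):
        message = b"".join(self._chunks)
        self._chunks.clear()
        self._sink.write(message)
        return len(message)


def send_request(buffer, method_id, payload):
    """Frame a request: preamble (protocol id, method id), then payload."""
    buffer.put(struct.pack("!BB", PROTOCOL_ID, method_id))
    buffer.put(payload)
    return buffer.flush()


# Usage with an in-memory sink standing in for a socket.
sink = io.BytesIO()
sent = send_request(TransmissionBuffer(sink), STORE_DATA_ID, b"\x00\x00\x00\x00")
```

With a real connection, the sink could be a file object obtained from `socket.makefile("wb")`, so that each request results in few system calls regardless of how many fields it contains.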

3.3 Code Generation

Service integration code is typically provided in one of two forms: APIs or service communication stubs. To reduce complexity in service client development, and increase the transparency of the service communication mechanisms, the Creo toolkit uses a code generation approach centered around immutable wrapper types and call-by-value interfaces. The rationale of this design is to make use of generated client code as intuitive as possible, and to facilitate a service client development model that doesn't require prior distributed systems development experience.

Figure 1: Byte layout of the Creo protocol request message for the storeData() method of Program 1. Data are encoded in the order defined in service descriptions, with arrays and strings prefixed with item counts. Byte block sizes and primitive types in black, protocol preamble (protocol and method ids) and aggregated (struct and array) types in red.

Use of code generation techniques rather than

APIs fundamentally assumes that service descriptions

rarely change (as service-oriented architectures tend

to be designed in terms of service interfaces), and

has the added benefits of allowing typed languages

to catch type errors earlier while keeping service

client implementations loosely coupled to services.

3.3.1 Code Generator

From a high level, the Creo toolkit can be seen to be

composed of three components: a service description

parser, a framework generator, and a custom package

generator. To promote ﬂexibility and facilitate adap-

tation to new requirements, e.g., support for new pro-

gramming languages or representation formats, the

architecture of the toolkit is designed to be modular

and extensible. The separation of code generation for

frameworks and custom packages (i.e. code speciﬁc

to data types and services deﬁned in service descrip-

tions) serves to facilitate third party implementation

of code generator plug-ins. With this separation it is

possible to contribute plug-in modules to support al-

ternative implementations of, e.g., data serialization

routines and client implementations, without having

to deal with generation of framework code.

The service description parser is constructed using

a combination of in-memory compilation of the ser-

vice description types (after replacing selected key-

words to make service descriptions Java compliant),

use of the Java reﬂection API (to validate descrip-

tion structures), and a custom language parser (that

extracts parameter information). To isolate code gen-

erators from document parsing, the parser provides a

full internal API that completely describes the type

sets and document structures of service descriptions.
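The kind of parameter information such a parser extracts can be illustrated with a much simpler stand-in. The Python sketch below uses a regular expression to pull method signatures out of the interface block of Program 1; it is not how the Creo parser works (which, as described above, relies on Java compilation and reflection), and the function name `parse_methods` is hypothetical.

```python
import re

DESCRIPTION = """
interface DataService
{
    void storeData (Data[] data);
    Data retrieveData (String description);
}
"""

# Groups: return type, method name, raw parameter list.
METHOD = re.compile(r"([\w\[\]]+)\s+(\w+)\s*\(([^)]*)\)\s*;")


def parse_methods(text):
    """Return a list of (return_type, name, [(param_type, param_name), ...])."""
    methods = []
    for return_type, name, params in METHOD.findall(text):
        parsed = []
        for param in filter(None, (p.strip() for p in params.split(","))):
            param_type, param_name = param.split()
            parsed.append((param_type, param_name))
        methods.append((return_type, name, parsed))
    return methods
```

A code generator can then iterate over the returned tuples without touching the description text again, which is the isolation between parsing and generation that the internal API is meant to provide.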

3.3.2 Generated Code - Framework

To establish a uniform model for client-service com-

munication, all service client code implements a

framework model for connection establishment, data

serialization, and transmission capabilities. This

framework is structured around an identiﬁed core fea-

ture set that includes, e.g., primitive type representa-

tion and serialization (including network byte order

transformations), array and string type wrapper types

(for languages not providing such types), and socket-

level read and write transmission buffers.

The purpose of the framework is to separate out
service and client logic that is independent of the types

and services deﬁned in service descriptions, and re-

duce the complexity of generating code for service-

dependent logic. Implementation of this frame-

work pattern keeps all service client implementations

lightweight and compatible with the service imple-

mentation, which facilitates development of client im-

plementations in additional languages. On the service

side, the framework code is connected to the service-

dependent code through a provider-pattern implemen-

tation for service data type serializer factories.
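One way to picture the provider pattern mentioned above is a registry that the framework code queries for serializers, and that service-dependent code populates. The Python sketch below is an assumption about the general shape of such a design, not the generated Creo code; `SerializerProvider` and the `Sample` type are hypothetical names.

```python
import struct


class SerializerProvider:
    """Maps type names to serializer functions supplied by generated code."""

    def __init__(self):
        self._serializers = {}

    def register(self, type_name, serializer):
        self._serializers[type_name] = serializer

    def serialize(self, type_name, value):
        try:
            serializer = self._serializers[type_name]
        except KeyError:
            raise LookupError(f"no serializer registered for {type_name}") from None
        return serializer(value)


provider = SerializerProvider()

# Framework side: primitive types, independent of any service description.
provider.register("long", lambda v: struct.pack("!q", v))
provider.register("double", lambda v: struct.pack("!d", v))

# Service-dependent side: a hypothetical struct built from primitives.
provider.register(
    "Sample",
    lambda v: provider.serialize("long", v[0]) + provider.serialize("double", v[1]),
)
```

The framework never needs to know which struct types exist; it only asks the provider, which keeps the service-independent code identical across services.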

3.3.3 Generated Code - Service Side

On the service side, the generated framework is ex-

tended with a lightweight service hosting environ-

ment containing basic server functionality such as

thread and service management. The architecture of

the service framework is based on the principle of ab-

stracting as much as possible of the service boilerplate

code required to expose components as services. It is

the intent of the toolkit that service implementation

should consist of only two steps: generation of the

service framework from a service description ﬁle and

implementation of a service (Java) interface.

The basic structure of the generated services is de-

signed around the information ﬂow in the system; a

server hosts services, parses incoming requests, and

passes request messages onto an incoming message

queue for the requested service. The service imple-

mentation processes requests, generates and pushes

response messages onto the outgoing message queue

for the service. The server continuously monitors

CLOSER2014-4thInternationalConferenceonCloudComputingandServicesScience

234

all service message queues and sends response mes-

sages when available. The core of the generated

service framework is message-oriented and deﬁned

around the concept of asynchronous message queues,

and does not restrict service implementations to use

of only synchronous request-response communica-

tion patterns. However, while service implementa-

tions are free to deﬁne their own communication pat-

terns in terms of the messages exchanged between

clients and services, use of asynchronous communi-

cation patterns requires modiﬁcations of the gener-

ated service clients to fully support such exchanges.

For reference, an asynchronous client (in Java) is pro-

vided with the generated service framework.
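The queue-based information flow described above can be sketched with the Python standard library. This is a simplified illustration, assuming one worker thread per service and an in-process caller in place of the network-facing server; the class and function names are hypothetical and do not come from the generated Java framework.

```python
import queue
import threading


class HostedService:
    """A service with an incoming and an outgoing message queue."""

    def __init__(self, handler):
        self.incoming = queue.Queue()
        self.outgoing = queue.Queue()
        self._handler = handler
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def stop(self):
        self.incoming.put(None)    # sentinel that ends the worker loop
        self._worker.join()

    def _run(self):
        while True:
            request = self.incoming.get()
            if request is None:
                break
            client_id, payload = request
            # Process the request and queue the response for the server.
            self.outgoing.put((client_id, self._handler(payload)))


def echo_upper(payload):
    """Stand-in for real service logic."""
    return payload.upper()


# The "server" side: enqueue a parsed request, then pick up the response.
service = HostedService(echo_upper)
service.start()
service.incoming.put((7, b"ping"))
reply = service.outgoing.get(timeout=5)
service.stop()
```

Since requests and responses are decoupled by the queues, a handler could also emit several responses per request or none at all, which is what leaves room for asynchronous communication patterns.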

3.3.4 Generated Code - Client Side

The architecture of the generated service clients fol-

lows the same pattern in all implementing service

client languages (C, C#, Java, and Python), and is de-

signed to abstract ﬁne-grained service communication

tasks. A service API is generated exposing the meth-

ods deﬁned in service descriptions, and all data are

managed in immutable wrapper types based on the

types deﬁned in service descriptions. Service com-

munication details, such as connection establishment

and data marshalling, are abstracted by client stubs.
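A Python client for the `DataService` interface of Program 1 might look roughly as follows. This is a sketch under assumptions: the snake_case method names, the `transport` callable, and the in-memory stand-in for a service are all hypothetical, and the actual generated client code is not shown in this text. Frozen dataclasses are used here as one way to obtain immutable wrapper types.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MetaData:
    description: str
    timestamp: int


@dataclass(frozen=True)
class Data:
    metadata: MetaData
    samples: Tuple[float, ...]


class DataServiceClient:
    """Hypothetical stub; transport is any callable(method_name, args)."""

    def __init__(self, transport):
        self._transport = transport

    def store_data(self, data):
        self._transport("storeData", tuple(data))

    def retrieve_data(self, description):
        return self._transport("retrieveData", (description,))


class InMemoryTransport:
    """Stand-in for connection handling and marshalling, for local testing."""

    def __init__(self):
        self._store = {}

    def __call__(self, method_name, args):
        if method_name == "storeData":
            for item in args:
                self._store[item.metadata.description] = item
            return None
        if method_name == "retrieveData":
            return self._store[args[0]]
        raise ValueError(f"unknown method: {method_name}")


client = DataServiceClient(InMemoryTransport())
item = Data(MetaData("run-1", 1700000000), (0.5, 1.5))
client.store_data([item])
fetched = client.retrieve_data("run-1")
```

Client code only sees methods and value types; swapping the in-memory transport for one that serializes over a socket would not change the calling code.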

The underlying philosophy of the toolkit is that

it should be the responsibility of the service devel-

oper to provide integration code (service clients) and

APIs for services, and the toolkit aims to abstract as

much as possible of that process. To promote trans-

parency, all client code generated is designed to fol-

low the same design pattern and all generated service

client code is designed to be as homogeneous as pos-

sible in architecture, code structure, and API func-

tionality support. When applicable, all code is gen-

erated along with sample build environment data ﬁles

(e.g., makeﬁles for C and ant build ﬁles for Java). In-

memory compilation and generation of Java Archive

(JAR) ﬁles are supported for Java.

4 USE CASES

To illustrate toolkit use, we here brieﬂy discuss ex-

ample application scenarios in the eScience domain.

Envisioned use cases for the Creo toolkit include:

• Coordinated multi-language logging and conﬁg-

uration. Scientiﬁc applications in the eScience

domain often consist of multiple components and

systems developed in multiple programming lan-

guages. Coordinated logging of application state

information can be very useful for visualization

and management of application processes, which

can be achieved by, e.g., developing a database ac-

cessor component in Java and exposing it as a ser-

vice using the Creo toolkit. Client stubs generated

by the toolkit can then be used to coordinate sys-

tem logs from multiple sources without introduc-

ing external dependencies in systems. Similarly,

multi-component systems can also use this tech-

nique to coordinate system conﬁguration, allow-

ing dynamic reconﬁguration of systems (use cases

from the StratUm (Östberg et al., 2012) project).

• Multi-component system integration. The Aequus system (Östberg et al., 2013) is designed

for use in high performance and grid computing

infrastructures. While the core of the system is

developed in Java, the system also contains spe-

cialized components and tools developed in other

languages, e.g., scheduler integration plug-ins in

C and visualization and statistics tools in Python

and Matlab. Use of the Creo toolkit allows smooth

integration of different parts of the system without

extensive distributed systems development effort.

• System evaluation experiments. Distributed

computing infrastructure systems constructed as

service-oriented architectures often require sim-

ulation experiments for testing and validation of

functionality. The previously mentioned Aequus

system is developed and evaluated using emulated

system environments for system tests and scal-

ability simulations. In these settings the Creo

toolkit allows easy integration of multiple simu-

lation components for surrounding systems (e.g.,

batch schedulers and accounting systems), and

construction of large-scale emulation systems for

system evaluation.

• Application cloud migration. Many eScience ap-

plications are initially developed for use on a sin-

gle machine and later (for performance and scala-

bility reasons) transformed into multi-component

systems using parallel and distributed computing

techniques. As part of this process, staging of ap-

plications into cloud environments often requires

some form of reformulation of computational al-

gorithms to better adapt to horizontal cloud elas-

ticity models. The Creo toolkit can here be used

to, e.g., build staging and monitoring tools or

to facilitate remote communication with applica-

tions running in cloud data centers.

Use cases such as these illustrate not only the expressive power of tools for service development and component integration, but also the importance of keeping such tools simple and reducing the complexity of building distributed systems. Use of development tools with steep learning curves or advanced knowledge requirements for, e.g., serialization formats, marshalling techniques, and transmission formats, can greatly add to the complexity of building distributed systems. For many purposes, and prototype development in particular, reduction of complexity and ease-of-use often outweigh the additional features of more advanced approaches.

Table 1: A brief overview of the feature sets of the evaluated service technologies.

                             Creo     Thrift          SOAP          REST             RMI
Interface type               IDL      IDL             IDL           protocol         API / stubs
Integration style            stubs    stubs           API / stubs   API / protocol   stubs
Data representation format   binary   text / binary   text          text / binary    binary

5 EVALUATION

Service-based software design is an area with many

competing approaches to service development and in-

tegration, making objective evaluation of new tools

non-trivial.

In this work we identify three abstraction levels

for development toolkits; low (ﬁne-grained message

level integration), intermediary (remote procedure

call communication abstraction), and high (service-

oriented component integration); and evaluate the

proposed toolkit against selected tools from each ab-

straction level in the dimensions of serialization over-

head, transmission overhead, and service response

time. To facilitate future comparison against third

party tools, we select well-established and easily ac-

cessible tools for the evaluation.

For the low abstraction level we compare the performance of the toolkit against that of Apache Thrift (Apache, 2010), a software framework for scalable cross-language service development. As the toolkit primarily targets service development in Java, we have selected Java-based tools for the intermediary and high levels. For the intermediary level we evaluate two related technologies: Java Remote Method Invocation (RMI) (Wollrath et al., 1996), an object-oriented remote procedure call mechanism that supports transfer of serialized Java objects and distributed garbage collection, and Java Object Serialization (JOS) (Oracle, 2005), the object serialization technology used by Java RMI. For the high level, we evaluate the toolkit against two popular web service technologies: REST web services (using the Restlet framework version 2.0.15 (Restlet, 2013)) and SOAP web services (using the Apache Axis2 SOAP framework version 1.6.2 (Apache, 2005)). Table 1 provides a brief comparison of the feature sets of the evaluated service technologies.

5.1 Testbed and Experimental Setup

To evaluate the technical performance of the toolkit we measure three facets of service communication performance (serialization overhead, transmission overhead, and response time) and quantify these against corresponding measurements of selected alternative tools. Serialization overhead is here defined

in terms of the computational capacity used for gener-

ation and parsing of service messages, and is included

in tests as it can heavily impact the execution footprint

of service-based tools. Transmission overhead is here

deﬁned to be the additional bandwidth requirements

introduced by service data representation formats, and

is measured by quantitative comparison of total mes-

sage sizes and message payload (raw data) sizes. To

isolate the communication overhead components in-

troduced by service tools in response time measure-

ments, thin service implementations (minimal request

processing times) are used.

Tests are performed using three types of request data: coarse-grained data (byte chunks), fine-grained number-resolved data (integer and float values), and fine-grained string-resolved data (text segments). For each test and request type, tests are performed with request sizes grown by orders of magnitude (blocks of 100, 1k, 10k, 100k, 1M, 10M and 100M bytes).

Coarse-grained requests consist of large chunks of bytes without structured format. For clients based on Creo, Thrift, RMI, and JOS, coarse-grained data are sent as raw byte arrays. For REST-based clients, requests are sent in HTTP POST requests as raw bytes with the MIME type "application/octet-stream". In SOAP-based clients, request data are encoded as Base64-encoded strings.
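As a rough illustration of what Base64 encoding of coarse-grained data costs in message size, the following Python sketch compares the size of a raw payload with the size of its Base64 encoding. The function name `base64_overhead` is chosen here for illustration only and is not part of any evaluated tool; the surrounding SOAP envelope is not modeled.

```python
import base64
import os


def base64_overhead(payload: bytes) -> float:
    """Return the Base64-encoded size divided by the raw payload size."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)


if __name__ == "__main__":
    # Random bytes stand in for an unstructured (coarse-grained) request.
    for size in (100, 1024, 10 * 1024):
        payload = os.urandom(size)
        print(size, round(base64_overhead(payload), 3))
```

Since Base64 maps every 3 input bytes to 4 output characters, the encoded data alone are roughly one third larger than the payload, before any XML markup is added.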

Data for fine-grained requests are created by grouping data in blocks of 10 bytes, grown by aggregating data blocks in groups of 10, and padded using smaller data blocks to align sizes with powers of 2. For example, a 1k (1024 bytes) data block consists of 10 groups of 10 blocks of 10 bytes plus padding in the form of two 10-byte blocks and a 4-byte pad value (a 32-bit integer or a 4-byte string depending on type). Larger data blocks are grown using the same scheme, e.g., by aggregating ten 1k data blocks to form a 10k data block. Number-based data blocks contain pairs of 64-bit double-precision floating point and 16-bit integer values. String-based data blocks contain 10-character strings.

CLOSER 2014 - 4th International Conference on Cloud Computing and Services Science
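The construction of fine-grained test data can be sketched as follows. This is a minimal Python sketch of the sizes involved, not the test generator used in the evaluation: the function names, the byte order, and the concrete values placed in the blocks are assumptions, and the structured values are flattened to bytes only to make the arithmetic visible.

```python
import struct


def number_block() -> bytes:
    """One 10-byte block: a 64-bit double followed by a 16-bit integer."""
    return struct.pack(">dh", 1.0, 1)


def string_block() -> bytes:
    """One 10-byte block: a 10-character ASCII string."""
    return b"abcdefghij"


def build_1k(block: bytes, pad: bytes) -> bytes:
    """10 groups of 10 blocks, plus two extra blocks and a 4-byte pad value."""
    assert len(block) == 10 and len(pad) == 4
    groups = [block * 10 for _ in range(10)]   # 10 x 100 bytes = 1000 bytes
    return b"".join(groups) + block * 2 + pad  # 1000 + 20 + 4 = 1024 bytes


def build_10k(block: bytes, pad: bytes) -> bytes:
    """Ten 1k blocks aggregated into one 10k block (10240 bytes)."""
    return build_1k(block, pad) * 10


if __name__ == "__main__":
    int_pad = struct.pack(">i", 0)  # 32-bit integer pad for number data
    print(len(build_1k(number_block(), int_pad)))
    print(len(build_10k(string_block(), b"pad!")))
```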

For serialization overhead and service response

time tests, all tests are done by measuring the client-

side makespans of full operations, starting at the point

of client invocation and ending when the client receives a uniform-size 4-byte server response message. To isolate overhead components, all measurements are performed in closed-loop system settings

using sequential invocation patterns on dedicated ma-

chines with no competing load and isolated network

segments. Experiments are repeated multiple (at least

ten) times to minimize the impact of external factors

on measurements. Parallel invocation tests are used to

evaluate the load sensitivity and scalability of service

tools. All services used in measurements are imple-

mented in Java and service clients are implemented

in C, C#, Java and Python. For tests of the service

response time of REST and SOAP tools, request seri-

alization is done in JSON (using the reference library

of json.org) and XML (using JAXB).
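The measurement procedure described above can be sketched in Python as follows. The helper `measure_makespan` and the stand-in `fake_service` are hypothetical names introduced for illustration; in the actual tests the invocation would be a call through a generated client stub to a remote service, which is not modeled here.

```python
import time
from statistics import mean


def measure_makespan(invoke, request, repetitions=10):
    """Invoke sequentially (closed loop) and return per-call makespans in ms."""
    makespans = []
    for _ in range(repetitions):
        start = time.perf_counter()
        invoke(request)  # blocks until the response has been received
        makespans.append((time.perf_counter() - start) * 1000.0)
    return makespans


def fake_service(request: bytes) -> bytes:
    """Stand-in for a thin service: returns a fixed-size 4-byte response."""
    return len(request).to_bytes(4, "big")


if __name__ == "__main__":
    samples = measure_makespan(fake_service, b"x" * 1024)
    print(f"mean makespan: {mean(samples):.4f} ms over {len(samples)} runs")
```

Each invocation completes before the next one starts, so the measured makespans are not affected by queuing of concurrent requests.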

All tests are run on a dedicated symmetric cluster where nodes are equipped with dual 16-core 2.1 GHz AMD Opteron 6272 processors and 54 GB RAM. Nodes are interconnected with 1 Gbps links and networks are configured using MTU sizes of 1500 bytes. All nodes run Ubuntu Linux 12.04 (kernel version 3.2), OpenJDK 1.6, Python 2.7, Mono 2.10, and GLib 2.32. All software is available from the Ubuntu repositories.

5.2 Serialization Overhead

To isolate measurements of data serialization over-

head, it is necessary to exclude all artefacts from

transmission of data between clients and the servers

in tests. Additionally, as tools employ transmission

(read and write) buffers that consume computational

power and are orthogonal to data serialization, trans-

mission buffers need to be bypassed in tests. To quan-

tify the serialization overhead of Creo and Thrift ser-

vice clients, both generated code and runtime libraries

are modiﬁed so that no data are placed in transmission

buffers or sent to servers after serialization. Furthermore, both tools' service clients are modified so that

they do not read data from servers after invocations.

To avoid modiﬁcations of Java RMI stacks, we

here include measurements of the underlying seri-

alization technology used (JOS) and assume mea-

surements are representative of the serialization over-

head of RMI. To quantify the serialization overhead

of JOS, ObjectOutputStream instances are wrapped

around non-buffered dummy output streams (no data

transferred to underlying sockets). After modiﬁca-

tions, serialization overhead tests are performed in the

same way as service response time tests.
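The idea of serializing into a sink that discards all data can be illustrated with a Python analogue. Note that this sketch uses Python's `pickle` module rather than Java Object Serialization, so it only mirrors the measurement technique and says nothing about JOS performance; the names `NullSink` and `time_serialization` are introduced here for illustration.

```python
import pickle
import time


class NullSink:
    """File-like object that discards everything written to it."""

    def __init__(self):
        self.bytes_seen = 0

    def write(self, data) -> int:
        self.bytes_seen += len(data)  # count, but never store or send
        return len(data)


def time_serialization(obj, repetitions=10):
    """Return (mean seconds per serialization, serialized size in bytes)."""
    sink = NullSink()
    start = time.perf_counter()
    for _ in range(repetitions):
        pickle.dump(obj, sink)  # serialization cost only, no I/O
    elapsed = time.perf_counter() - start
    return elapsed / repetitions, sink.bytes_seen // repetitions


if __name__ == "__main__":
    data = [(0.5 * n, n) for n in range(1000)]
    seconds, size = time_serialization(data)
    print(f"{seconds * 1000:.3f} ms per run, {size} bytes serialized")
```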

Results from data serialization overhead tests are visualized in Figures 2 and 3. For ease of comparison, results for the multi-language tests (comparing Creo to Thrift) using fine-grained data are presented individually, separating tests using number-resolved and string-resolved data. As can be seen in Figure 2, Creo improves upon the performance of Thrift for fine-grained data by average factors of 1.16 to 5.23 for C#, Java, and Python clients. Compared to the less mature Thrift C clients, Creo shows improvements of factors 36.84 to 115.69. When comparing the performance of Creo against that of other Java-based tools (illustrated in Figure 3), Creo exhibits average performance improvements of factors 5.66 to 388.56, which is attributed to the use of more complex serialization techniques and text-resolved data representation formats in the other tools.

These tests illustrate the magnitude of serializa-

tion overhead for complex serialization techniques, as

well as the impact serialization overhead can have on

service execution footprint and performance. For example, the JAXB serialization engine used in SOAP tests is unable to process 100 MB messages in standalone settings, indicating a potential source of load issues when used inside service engines.

5.3 Transmission Overhead

To evaluate transmission overhead for service communication, a simple server component that counts and returns the number of bytes in requests is used. Service invocation makespan is measured on the client side and used to quantify transmission overhead for service invocations with known request payload sizes. Apache Thrift supports transmission of data using three protocols: a text-resolved JSON protocol and two binary protocols, TBinaryProtocol and TCompactProtocol, where the former sends data as-is and the latter uses variable-length encoding of integers. The purpose of this encoding scheme, which for example encodes 16-bit integers as 1-3 bytes, 32-bit integers as 1-5 bytes, and 64-bit integers as 1-10 bytes, is to reduce the size of payload and commonly occurring metadata such as the length of strings, arrays, and collections. In tests we primarily use TBinaryProtocol as it is supported in all languages, and evaluate the efficiency of TCompactProtocol in the languages that support it (and quantify it against that of Creo and the binary protocol) in separate tests.
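To make the byte ranges quoted above concrete, the following Python sketch implements a zigzag mapping followed by a 7-bit variable-length encoding, which is the general technique behind such compact integer encodings. It is an illustrative sketch, not Thrift's implementation, and the function names are chosen here for the example.

```python
def zigzag(value: int, bits: int) -> int:
    """Map a signed integer to an unsigned one so small magnitudes stay small."""
    return ((value << 1) ^ (value >> (bits - 1))) & ((1 << bits) - 1)


def varint(value: int) -> bytes:
    """Encode an unsigned integer 7 bits at a time, low-order group first."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)
            return bytes(out)


def encode_signed(value: int, bits: int) -> bytes:
    """Encode a signed integer of the given width as a zigzag varint."""
    return varint(zigzag(value, bits))


if __name__ == "__main__":
    for bits in (16, 32, 64):
        smallest = len(encode_signed(0, bits))
        largest = len(encode_signed(-(2 ** (bits - 1)), bits))
        print(f"{bits}-bit integers: {smallest} to {largest} bytes")
```

Under this scheme a small length value, such as the length of a 10-character string, occupies a single byte.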

(a) Number-resolved data. On average Creo shows improvements of factors 36.84 (C), 1.23 (C#), 3.51 (Java), and 5.23 (Python) in serialization time.

(b) String-resolved data. On average Creo shows improvements of factors 115.69 (C), 1.16 (C#), 2.03 (Java), and 3.24 (Python) in serialization time.

Figure 2: Creo and Thrift serialization time (in milliseconds) for fine-grained messages. Axes logarithmic.

(a) Number-resolved data. On average Creo shows improvements of factors 388.56 (SOAP), 177.18 (REST), and 34.84 (RMI) in serialization time.

(b) String-resolved data. On average Creo shows improvements of factors 30.37 (SOAP), 20.25 (REST), and 5.66 (RMI) in serialization time.

Figure 3: Serialization time (in milliseconds) of Java-based tools for fine-grained messages. Axes logarithmic.

(a) Creo, Thrift, and JOS. (b) REST and SOAP.

Figure 4: Transmission (message size) overhead for service invocation requests. Horizontal axis logarithmic.

For ease of comparison, test results for compact binary representation formats (Creo, Thrift, and JOS) and text-resolved formats (JSON REST and XML SOAP) are presented separately. As can be seen in Figure 4a, compact coarse-grained (binary) data are represented with little overhead and fine-grained data are represented with overhead within a factor of 2 in size for Creo, Thrift, and JOS. As can be seen in Figure 4b, the use of text-resolved representation formats can introduce significant overhead for fine-grained data, ranging in tests up to a factor of 20 for JSON REST and XML SOAP (both of which are unable to process messages larger than 10 MB in tests).
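The source of this overhead can be illustrated by comparing the size of the same fine-grained values in a packed binary layout and in a text-resolved format. The Python sketch below is an assumption-laden illustration, not a reproduction of the tested message formats: the JSON field names `value` and `count` and the function names are hypothetical, and the resulting factor depends heavily on field names and value formatting.

```python
import json
import struct


def binary_size(pairs) -> int:
    """Size when each (double, int16) pair is packed as 8 + 2 bytes."""
    return sum(len(struct.pack(">dh", d, i)) for d, i in pairs)


def json_size(pairs) -> int:
    """Size of the same pairs as a UTF-8 encoded JSON array of objects."""
    doc = [{"value": d, "count": i} for d, i in pairs]
    return len(json.dumps(doc).encode("utf-8"))


if __name__ == "__main__":
    pairs = [(0.1 * n, n) for n in range(100)]
    raw = binary_size(pairs)
    print("binary bytes:", raw)
    print("JSON bytes:  ", json_size(pairs))
    print("JSON overhead factor:", round(json_size(pairs) / raw, 2))
```

In the text format every value carries its field name and is written as decimal digits, so the per-value metadata can easily exceed the 10 bytes of raw data it describes.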

5.4 Service Response Time

Having roughly quantified the impact of potential overhead sources for data serialization and transmission, we analyze the communication performance of the evaluated tools in terms of service request response times. Using closed-loop system settings (sequential invocations of services deployed in isolated systems), we measure invocation makespan from the client perspective and use it as a measurement of service response time. To verify that results from sequential tests transfer to (more realistic) parallel invocation scenarios, we also validate results using parallel invocation tests.

(a) Number-resolved data. On average Creo shows improvements of factors 19.76 (C), 2.77 (C#), 4.21 (Java), and 2.65 (Python) in service response time.

(b) String-resolved data. On average Creo shows improvements of factors 13.42 (C), 2.15 (C#), 4.77 (Java), and 1.87 (Python) in service response time.

Figure 5: Creo and Thrift service response time (in milliseconds) for fine-grained messages. Axes logarithmic.

(a) Fine-grained number-resolved data. On average Creo shows improvements of factors 135.69 (SOAP), 140.93 (REST), 6.97 (RMI), and 5.04 (JOS) in service response time.

(b) Coarse-grained data. On average Creo shows improvements of factors 7.66 (SOAP), 11.99 (REST), 0.83 (RMI), and 0.83 (JOS) in service response time.

Figure 6: Service response time (in milliseconds) for Java-based tools. Axes logarithmic.

(a) Reduction of service invocation request size for TCompactProtocol compared to TBinaryProtocol.

(b) Response time of Thrift's C and C# clients when using 16kB write buffers compared to using the default write buffers.

Figure 7: Transmission overhead in Thrift protocols and buffer alignment issues. Axes logarithmic.

Results from response time tests are visualized in Figures 5 and 6. Figure 5 compares the response times of Creo and Thrift services. For fine-grained data, Creo improves on the response time of Thrift by average factors of 1.87 to 4.77 for C#, Java, and Python clients. Compared to Thrift C clients, Creo shows improvements of factors 13.42 to 19.76. However, for coarse-grained data (unstructured binary data, not illustrated in graphs), Thrift service response times are on average 16% (C), 26% (C#), 27% (Java), and 32% (Python) lower than those of Creo (performance averages calculated for request sizes of 1 MB, 10 MB, and 100 MB). The higher response times of Creo for coarse-grained data are attributed to the use of asynchronous message queues and immutable data structures on the service side, which cause redundant data replications in message transmission.

When comparing the response time of Creo to that of other Java-based tools (illustrated in Figure 6), we note performance improvements of at least a factor of 4.91 for fine-grained data, and comparable performance for coarse-grained data. As expected from the analysis of serialization and transmission overhead, REST and SOAP web services exhibit response time performance degradations from the use of text-based representation formats and associated serializations.

(a) Number-resolved data. On average TCompactProtocol shows improvements of 1.2% (C#), 26% (Java), and -51% (Python) in service response time.

(b) String-resolved data. On average TCompactProtocol shows improvements of 7.4% (C#), 22% (Java), and -40% (Python) in service response time.

Figure 8: Response time performance of Thrift protocols. Axes logarithmic.

5.5 Thrift Protocols

As mentioned, we use Thrift's TBinaryProtocol in tests as it is supported in all client languages. However, for selected languages, Thrift also supports the TCompactProtocol, which in theory provides a more efficient representation of data. To ensure fair comparison in tests, we here evaluate the use of this variable-length encoding protocol. As can be seen in Figure 7a, Thrift's TCompactProtocol reduces Thrift transmission overhead by approximately 27% (number-resolved data) and 34% (string-resolved data) in tests using fine-grained data. The greater reduction for string-resolved data stems from all test data blocks containing short strings (4 or 10 characters), causing string lengths to be serializable in a single byte. The variable-length encoding scheme has little effect on unstructured (coarse-grained) binary data, but shows an improvement for small messages as the protocol contains less metadata.

In tests, we note oscillations in the performance

of Thrift’s C and C# clients for data sizes of 1kB and

10kB (see Figure 5). After analysis we speculate that

these effects arise due to buffer (size) alignment is-

sues in tests. To investigate this, we evaluate the per-

formance of the same clients with altered buffer sizes,

and note (as illustrated in Figure 7b) that the effects

can be alleviated using larger (16kB) message trans-

mission buffers.

Finally, we evaluate the service response time of Thrift's two binary protocols to investigate the potential impact of Thrift's variable-length encoding scheme on our tests. As illustrated in Figure 8, the TCompactProtocol results in response time improvements of 1.2% to 26% for C# and Java clients, and performance degradations of 40% to 51% for Python clients. From these measurements we conclude that use of the TCompactProtocol would not significantly impact the findings of the evaluation.

6 CONCLUSIONS

In this work we investigate an approach to service-

based software development and present a toolkit for

reduction of complexity in service development and

distributed component integration. The architecture

of the toolkit is designed to be modular and extensi-

ble, and places focus on transparency and reduction

of complexity. To reduce learning requirements, the

toolkit employs a service description language based

on the syntax and type interpretations of the well-

known Java language. The service description lan-

guage deﬁnes a set of primitive types and mechanisms

for aggregation of types in arrays and structs.

The toolkit supports generation of code for con-

struction of Java-based services as well as service

clients in Java, C, C#, and Python. The toolkit uses

the same code generation pattern for all languages,

which deﬁnes immutable types that directly wrap the

aggregation patterns deﬁned in service descriptions.

For transparency, the service communication proto-

col serializes data in the order and types deﬁned in

the service description language. A performance eval-

uation quantifying toolkit performance (in terms of

overhead and response time) against Java Object Se-

rialization, Java RMI, SOAP web services, REST web

services, and Apache Thrift is presented. Toolkit per-

formance is found to be comparable to or improve

upon the performance of the alternative techniques.

CLOSER2014-4thInternationalConferenceonCloudComputingandServicesScience

240

ACKNOWLEDGEMENTS

The authors acknowledge Mikael Öhman, Sebastian ohn, and Anders Häggström for work related to

the project. This work is done in collaboration

with the High Performance Computing Center North

(HPC2N) and is funded by the Swedish Govern-

ment’s strategic research project eSSENCE and the

Swedish Research Council (VR) under contract num-

ber C0590801 for the project Cloud Control.

REFERENCES

Apache (2005). Apache Web Services Project - Axis2, http://ws.apache.org/axis2, February 2014.

Apache (2009). Apache Avro, http://avro.apache.org/, February 2014.

Apache (2010). Apache Thrift, http://thrift.apache.org/, February 2014.

Armbrust, M., Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., et al. (2010). A view of cloud computing. Communications of the ACM, 53(4):50–58.

Berl, A., Gelenbe, E., Di Girolamo, M., Giuliani, G., De Meer, H., Dang, M. Q., and Pentikousis, K. (2010). Energy-efficient cloud computing. The Computer Journal, 53(7):1045–1051.

Birrell, A. D. and Nelson, B. J. (1984). Implementing remote procedure calls. ACM Transactions on Computer Systems (TOCS), 2(1):39–59.

Fielding, R. T. (2000). Architectural styles and the design of network-based software architectures. PhD thesis, University of California.

Furuhashi, S. (2011). MessagePack, https://github.com/msgpack/msgpack/blob/master/spec.md, February 2014.

Google (2008). https://developers.google.com/protocol-buffers/, February 2014.

Govindaraju, M., Slominski, A., Chiu, K., Liu, P., Van Engelen, R., and Lewis, M. J. (2004). Toward characterizing the performance of SOAP toolkits. In Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on, pages 365–372. IEEE.

Jackson (2009). https://github.com/FasterXML/jackson, February 2014.

Kephart, J. O. and Chess, D. M. (2003). The Vision of Autonomic Computing. Computer, 36:41–50.

Kryo (2009). https://github.com/EsotericSoftware/kryo, February 2014.

Mackey, A. (2010). Windows communication foundation. In Introducing .NET 4.0, pages 159–173. Springer.

MongoDB Inc. (2007). BSON, http://bsonspec.org, February 2014.

Oracle (2005). Java Object Serialization, http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serialTOC.html, February 2014.

Östberg, P.-O. and Elmroth, E. (2011). Increasing Flexibility and Abstracting Complexity in Service-Based Grid and Cloud Software. In F. Leymann, I. I., van Sinderen, M., and Shishkov, B., editors, Proceedings of CLOSER 2011 - International Conference on Cloud Computing and Services Science, pages 240–249. SciTePress.

Östberg, P.-O., Espling, D., and Elmroth, E. (2013). Decentralized scalable fairshare scheduling. Future Generation Computer Systems - The International Journal of Grid Computing and eScience, 29:130–143.

Östberg, P.-O., Hellander, A., Drawert, B., Elmroth, E., Holmgren, S., and Petzold, L. (2012). Reducing complexity in management of eScience computations. In Proceedings of CCGrid 2012 - The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 845–852.

Pautasso, C., Zimmermann, O., and Leymann, F. (2008). RESTful web services vs. big web services: making the right architectural decision. In Proceedings of the 17th international conference on World Wide Web, pages 805–814. ACM.

Restlet (2013). Restlet Framework, http://restlet.org, February 2014.

Slee, M., Agarwal, A., and Kwiatkowski, M. (2007). Thrift: Scalable cross-language services implementation. Facebook White Paper, 5.

Van Deursen, A., Klint, P., and Visser, J. (2000). Domain-specific languages: An annotated bibliography. Sigplan Notices, 35(6):26–36.

Vinoski, S. (1993). Distributed object computing with CORBA. C++ Report, 5(6):32–38.

Walker, E. (2009). The real cost of a CPU hour. Computer, 42(4):35–41.

Wollrath, A., Riggs, R., and Waldo, J. (1996). A distributed object model for the Java system. Computing Systems, 9:265–290.

Creo:ReducedComplexityServiceDevelopment

241