Classifying the Reliability of the Microservices Architecture

Adrian Ramsingh

, Jeremy Singer

and Phil Trinder

School of Computing Science, University of Glasgow, Glasgow, U.K.

Keywords:

Microservices, Reliability, Web Architectures, Patterns, Bad Smells, Web Applications.

Abstract:

Microservices are popular for web applications as they offer better scalability and reliability than monolithic

architectures. Reliability is improved by loose coupling between individual microservices. However in pro-

duction systems some microservices are tightly coupled, or chained together. We classify the reliability of

microservices: if a minor microservice fails then the application continues to operate; if a critical microservice

fails, the entire application fails. Combining reliability (minor/critical) with the established classiﬁcations

of dependence (individual/chained) and state (stateful/stateless) deﬁnes a new three dimensional space: the

Microservices Dependency State Reliability (MDSR) classiﬁcation. Using three web application case studies

(Hipster-Shop, Jupyter and WordPress) we identify microservice instances that exemplify the six points in

MDSR. We present a prototype static analyser that can identify all six classes in Flask web applications, and

apply it to seven applications. We explore case study examples that exhibit either a known reliability pattern

or a bad smell. We show that our prototype static analyser can identify three of six patterns/bad smells in

Flask web applications. Hence MDSR provides a structured classiﬁcation of microservice software with the

potential to improve reliability. Finally, we evaluate the reliability implications of the different MDSR classes

by running the case study applications against a fault injector.

1 INTRODUCTION

Microservices are popular for web applications, since

they may offer better scalability and reliability than

monolithic components. Some microservices are

stateful, recording data, e.g. participants in a web

chat. Others are stateless, i.e. they simply accept re-

quests and purely process them.

Architectures with monolithic components are

prone to catastrophic failure, where user-visible

functionality is suddenly and permanently unavail-

able (Nikolaidis et al., 2004). It is common for the

failure of a single monolithic component to cause the

entire system to fail (a cascade failure).

In contrast microservice architectures potentially

provide improved reliability due to loose coupling of

services. If one microservice fails, others will remain

available. This may cause a reduction in throughput

but will most likely avoid catastrophic failure. In the

worst case scenario, the loose coupling of services en-

ables graceful failure (Soldani et al., 2018).

This is achieved based on the design principle that

https://orcid.org/0000-0003-3501-902X

https://orcid.org/0000-0001-9462-6802

https://orcid.org/0000-0003-0190-7010

microservices are implemented as standalone, inde-

pendent services (Jamshidi et al., 2018). However,

many large scale web applications include chains of

microservices where a set of services are closely de-

pendent, e.g. Netﬂix Titus (Ma et al., 2018).

Chained microservices are tightly coupled, e.g.

by high-frequency API-based interaction sequences.

Chained microservices make an application less reli-

able because if any of the services fail the entire chain

fails, and may induce catastrophic failure (Heorhiadi

et al., 2016). For example, in 2014 BBC experienced

a critical database overload that caused many criti-

cal microservices to fail one after another (Cooper,

2014). In 2015, Parse.ly experienced several cascad-

ing outages in its analytics data processing due to

a microservices message bus overload (Montalenti,

2015).

This paper makes the following research contribu-

tions.

(1) We combine a reliability (minor/critical) clas-

siﬁcation with the established classiﬁcations of de-

pendence (individual/chained) and state (stateful/s-

tateless). If a minor microservice fails the applica-

tion continues to function, although performance or

functionality may be reduced. If a critical microser-

Ramsingh, A., Singer, J. and Trinder, P.

Classifying the Reliability of the Microservices Architecture.

DOI: 10.5220/0011381700003318

In Proceedings of the 18th International Conference on Web Information Systems and Technologies (WEBIST 2022), pages 21-32

ISBN: 978-989-758-613-2; ISSN: 2184-3252

vice fails, the application fails catastrophically. Com-

bining reliability with state and dependence deﬁnes

a new three dimensional space: the Microservices

Dependency State Reliability (MDSR) classiﬁcation.

As microservice chains are necessarily critical (Heo-

rhiadi et al., 2016), only six of the possible eight

points in the space are valid. We outline a proto-

type static analyser that can identify all six MDSR

classes. Applying the tool to 30 microservices from

seven small Flask web applications reveals interest-

ing statistics, e.g. the majority of services are chained

(70%), and critical (77%) (Section 4).

(2) Using three web applications we highlight

microservices that exemplify each point in MDSR.

The web applications are: (1) Hipster-Shop, a

Google demo application; (2) JPyL, a Jupyter Note-

book/Flask web stack; and (3) WordPress, a content

management system (Section 2).

(3) We show that each of the MDSR critical case

study microservices exhibits a known bad smell (Taibi

and Lenarduzzi, 2018). Likewise in each minor

MDSR class the case study microservices follow a de-

sign pattern (Taibi and Lenarduzzi, 2018). We show

that the prototype static analyser can identify three of

six patterns/bad smells in Flask web applications. The

analysis offers the opportunity to focus reliability en-

gineering efforts early in the development cycle. That

is, we propose static MDSR analysis as a complement

to dynamic Service Dependency Graph (SDG) analy-

sis (Ma et al., 2018) (Section 5).

(4) We explore reliability implications of differ-

ent MDSR microservice classes by running the three

web applications against a simple process level fault

injector. Speciﬁcally we show: (1) All applications

fail catastrophically if a critical microservice fails. (2)

Applications survive the failure of a minor microser-

vice, and successive failures of minor microservices.

(3) The failure of any chain of microservices in JPyL

& Hipster is catastrophic. (4) Individual microser-

vices do not necessarily have minor reliability impli-

cations (Section 6).

2 CASE STUDIES

We illustrate our new classiﬁcation and analysis us-

ing three realistic microservice web applications.

Hipster-Shop is a popular Google microservices demo

web application; JPyL is a Jupyter/Flask microser-

vices web application; WordPress is a widely-used

Content Management System. The applications illus-

trate different aspects of real world microservice web

applications, e.g. Hipster-Shop implements microser-

vices in different languages, and both JPyL and Word-

Press combine monolithic and microservice compo-

nents.

2.1 Hipster-Shop

Key attractions of microservices are decentralization

and polyglotism. Here each service can be sepa-

rately developed with appropriate programming lan-

guages and tools, promoting agile development (Zim-

mermann, 2017). Many developers claim this is why

they prefer microservices (IBM, 2021).

The Hipster-Shop case study illustrates polyglot

development with services developed in Python, Go

and Java and communication via gRPC remote pro-

cedure calls. Hipster-Shop is an e-commerce applica-

tion with 10 microservices (Figure 1) used by Google

to demonstrate tools like Kubernetes Engine (Google,

2021). Users can perform activities like viewing prod-

ucts, adding items to cart and making purchases

Figure 1: Hipster-Shop Architecture.

2.2 JPyL: Jupyter/Python/Linux

Migrating from a monolithic architecture towards a

full microservices architecture is a gradual process.

It is common for developers to initially integrate one

or more microservice tiers that function alongside

monolithic components, in a hybrid approach known

as microlith (Soldani et al., 2018).

JPyL is a microlith web stack that combines the

popular Flask microservices web tier with monolithic

Jupyter components (Figure 2). The Flask tier has

seven microservices and is fairly conventional as out-

lined in Figure 2.

Some are supported by a data store, e.g. current

geolocation and IP address rely on the userdata mi-

croservice that interfaces with a MySQL database.

Data is displayed on the webpage via reverse proxy

and port conﬁguration microservices on port 10125.

Crucially for reliability a backup URL port can be ini-

tiated via redirect if the original port service is inter-

rupted.

https://github.com/GoogleCloudPlatform/

microservices-demo

WEBIST 2022 - 18th International Conference on Web Information Systems and Technologies

Each service is handled by a speciﬁc set of Flask

microservices

. For example, security headers are

processed by a Python Talisman microservice.

Figure 2: JPyL Application Architecture.

2.3 WordPress

WordPress is an open source Content Management

System (CMS) used for many websites (Patel et al.,

2011). As a standalone application WordPress has a

monolithic architecture with core CMS components

that communicate with a MySQL database. Mi-

croservices can be integrated with WordPress to pro-

vide plugins for additional features, e.g. to post com-

ments, allow subscription memberships or search in-

dexes (Cabot, 2018).

The application we study integrates microservice

endpoints that allow users to post comments

. The

service uses WordPress HTTP REST. It facilitates

communication between the microservice and the

monolithic components as shown in Figure 3.

Figure 3: WordPress Application Architecture.

https://bitbucket.org/latent12/microproject/src/master/

jpyl/

https://bitbucket.org/latent12/microproject/src/master/

wordpress/

3 RELATED WORK

3.1 Existing Classiﬁcations

Lewis and Fowler identify three microservices design

principles (Lewis and Fowler, 2015). (1) Independent

services — each service should run in its own process

and be deployed in its own container like one service

per Docker container. (2) Single functionality — one

business function per service. This is referred to as

the Single Responsibility Principle (SRP). (3) Com-

munication — often using a REST API or message

brokers.

Others have added other principles like reliabil-

ity (Heorhiadi et al., 2016), and advocate using de-

sign patterns like timeouts, bounded retries, circuit

breakers and bulkheads to mitigate failures (Heo-

rhiadi et al., 2016). However most assume that all

microservices are individual, but in reality there are

many types of microservices.

Microservices are commonly classiﬁed by their

properties and we outline some key properties below,

and summarise in Table 1.

3.1.1 Dependence

This property classiﬁes a microservice by how tightly

coupled it is with other microservices (Heorhiadi

et al., 2016; Ma et al., 2018; Ghirotti et al., 2018).

Coupling is the degree of dependence between soft-

ware components like microservices, and there are

different types like content, data and control cou-

pling (Offutt et al., 1993). Software architects seek

loose coupling, and this is often achieved for mi-

croservices through data coupling.

Individual microservices are loosely coupled to

other microservices and communication with other

microservices is typically via infrequent remote API

calls. Many individual microservices express com-

putations at a high level of abstraction, and provide

their own built-in runtimes, functionalities and data

stores (Salah et al., 2016). Examples for JPyL in-

clude the SSL and Service Logging services (Fig-

ure 2) while Hipster-Shop includes Frontend and Ad-

service microservices (Figure 1).

In contrast chained microservices are tightly cou-

pled with one or more other microservices. A chained

microservice is reliant on some form of constant com-

munication or a chain of calls with another service to

function (Heorhiadi et al., 2016; Rossi, 2018). This is

often due to control coupling where the chained ser-

vices must request and marshal data between them-

selves (Offutt et al., 1993).

In JPyL, the Reverse Proxy service is dependent

on the Port Conﬁguration service to display data on

Classifying the Reliability of the Microservices Architecture

the webpage through port 10125. Speciﬁcally the Re-

verse Proxy must access the Port Conﬁguration to de-

termine which backup URL port to use.

3.1.2 State

This property classiﬁes a microservice by whether

it preserves state between service requests. State-

ful microservices require data storage, for example to

record transactions or current actors (Wu, 2017). In

JPyL the Service Logging microservice is stateful: it

logs the status of all microservices in the application

in a MySQL database (Figure 2). Other microservices

are Stateless, i.e. they maintain no session state. Such

services typically accept requests, process them in a

pure fashion, and respond accordingly. In JPyL the

SSL microservice is stateless: it processes https re-

quests but maintains no session data.

3.1.3 Combining & Inheriting Properties

Microservices may have any combination of prop-

erties, e.g. individual/stateful or chained/stateless.

Properties may be inherited from other chained mi-

croservices, e.g. if any microservice is stateful then

the entire chain is stateful. In Hipster-Shop although

both Checkout and Payment microservices are state-

less, their chain with Cart Services is stateful as Cart

Services is stateful (Table 6).

3.2 Microservices Reliability

There are substantial studies of the reliability of mi-

croservice software in both the academic (Heorhiadi

et al., 2016; Zhou et al., 2018; Toffetti et al., 2015)

and grey literature (Wolff, 2018; Gupta and Pal-

vankar, 2020). These reveal that reliability in the mi-

croservices architecture is not always attainable be-

cause the reliability design principle is not always

followed. This principle states that a microservice

should be fault tolerant so that in the case of failure,

its impact on other services will be negligible (Power

and Kotonya, 2018).

However, developers do not always implement the

necessary design patterns or follow the Fowler and

Lewis design principles (Section 3.1) to prevent mi-

croservices failure. Even if they do, they remain un-

aware whether their microservice can actually toler-

ate failures until it actually occurs (Heorhiadi et al.,

2016). Thus, the impact of a microservices failure

on an application is not always readily known before-

hand.

Table 1: Microservices Classiﬁcation Criteria.

Classiﬁcation Properties Description

Dependence Individual Loosely Coupled.

Limited communi-

cation with other

microservices.

Chained Tightly Coupled. Con-

stant communication

with other microser-

vices required.

State Stateless No data store. Does not

maintain state.

Stateful Utilises data store.

Maintains state.

Reliability Critical Supports core function-

ality. Service fail-

ure means that the ap-

plication becomes sud-

denly and permanently

unavailable.

Minor Supports non-essential

functionality. Applica-

tion continues to func-

tion despite service fail-

ure. Degradation in

performance or grace-

ful failure over a period

of time.

3.3 Patterns and Bad Smells

Some design patterns capture reusable solutions to

common microservice design challenges (Taibi and

Lenarduzzi, 2018). For example the Database-Per-

Service pattern prevents tight coupling by ensuring

that multiple microservices are not dependent on a

single data store. Instead, each service accesses its

own private store (Taibi et al., 2018), eliminating the

single data store as a single point of failure (SPOF).

Some of the microservices patterns utilised in our

evaluation are summarised in Table 2

Table 2: Microservices Patterns Description.

Pattern Description

Database-Per-Design Microservice ac-

cesses its own

private data store.

API-Gateway Microservice com-

munication occurs

through an API.

Single Responsibility Principle Microservice per-

forms a single

functionality.

While design patterns like Database-Per-Service

help, they are not universal solutions. For ex-

ample a single atomic operation often spans mul-

WEBIST 2022 - 18th International Conference on Web Information Systems and Technologies

tiple microservices, and here additional techniques

are required to ensure consistency across the data

stores (Rudrabhatla, 2018).

Likewise microservice bad smells identify com-

mon designs that may cause issues (Taibi and Lenar-

duzzi, 2018). Indeed (Heorhiadi et al., 2016) and

the Fowler and Lewis design principles consider all

chained microservices as bad smells and prone to

reliability issues. Some of the microservices bad

smells found in our evaluation are summarised in Ta-

ble 3. Microservices Greedy, Shared Persistency and

Cyclic Dependency are listed in (Taibi and Lenar-

duzzi, 2018), Chained Services is mentioned by (Heo-

rhiadi et al., 2016) and SRP Violation is a well-known

microservice bad smell.

Table 3: Microservices Bad Smells.

Bad Smell Description

SRP Viola-

tion

Microservice performs more than one

functionality.

Reason: Microservice becomes more

critical. Increases the probability of

catastrophic failure in the application.

Microservices

Greedy

Microservices created for every feature

in an application.

Reason: More microservices could lead

to more points of failure.

Shared Per-

sistency

Different microservices access the same

data storage.

Reason: Single Point of Failure

(SPOF).

Chained

Services

Microservices that depend on commu-

nication or data marshalling from other

microservices.

Reason: Tight Coupling.

Cyclic

Dependency

Where there are cycles in the call graph,

e.g. A to B, B to C and C to A. A subset

of Chained Services.

Reason: Too much dependency.

3.4 Failures in Microservices

Even if the microservices design principles are fol-

lowed, failure is to be expected. There are two major

reasons for these failures (Zhou et al., 2018). Func-

tional Failures due to poor implementation e.g. a

SQL column missing error is returned upon some data

request. Environmental Failures due to misconﬁgu-

ration of the infrastructure necessary to run the mi-

croservices effciently e.g. microservices processing

of requests is slow due to insufﬁcient memory being

made available in the Docker environment.

Partial failures are typically temporary and re-

covery is automatic, e.g. a microservice without a

load balancer may be brieﬂy overloaded (Zhou et al.,

2018). Downtime can often be minimised if replace-

ment microservice instance(s) are activated automat-

ically (Toffetti et al., 2015). Partial failure is consid-

ered acceptable in the design of microservices appli-

cations.

In contrast, catastrophic failure is considered un-

acceptable as it can lead to long downtimes with-

out manual intervention (Nikolaidis et al., 2004). In

microservices, catastrophic failures are often termed

Interaction Failures (Zhou et al., 2018). Common

causes are incorrect coordination or communication

failure between microservices, e.g. asynchronous

message delivery lacking sequence control or a mi-

croservice receiving an unexpected output in its call

chain. The errors may be replicated in several mi-

croservice instances (Zhou et al., 2018), so even

switching workload from a failed instance doesn’t

help as the new instance fails in the same way.

Chained microservices are especially prone to in-

teraction faults because they violate the Single Re-

sponsibility Principle (SRP) and lead to brittle archi-

tectures (Heorhiadi et al., 2016). Moreover adding

more microservices to the chain increases coupling

and the likelihood of catastrophic failure (Rossi,

2018). If one service in the chain fails, there will be

a cascade of failures of all services in the chain (Heo-

rhiadi et al., 2016).

A limitation of (Heorhiadi et al., 2016) and the

Lewis and Fowler design principles is that they con-

sider only dependence. We extend their work by si-

multaneously classifying dependence, state and relia-

bility.

3.5 Detecting Failures

Failures in microservice-based applications may arise

from the microservices, or from the infrastructure ser-

vices like libraries, containers like Docker, develop-

ment frameworks like Flask, etc. Here we focus on

failures in the microservices, and detecting these fail-

ures is often challenging.

A common approach is to conﬁgure a collection of

microservices indicators (KPIs) to continuously mon-

itor for the causes of failure. The KPIs are typically

time series, e.g. the response time of a microservice

to requests from other services. The microservice is

identiﬁed as failing if it fails to meet the expected

KPI (Meng et al., 2020).

Service Dependency Graphs (SDGs) can be used

to dynamically detect microservices bad smells by

mapping their node relationships (Ma et al., 2018).

However, diagnosing the severity and reason for a

failure in a large system is challenging. The diagno-

sis usually requires domain and site-reliability knowl-

edge as well as automated observability support (Has-

Classifying the Reliability of the Microservices Architecture

selbring and Steinacker, 2017). Not all companies

have such resources.

4 CLASSIFYING RELIABILITY

4.1 Critical vs Minor Reliability

To analyse the reliability of a microservice architec-

ture we consider a microservice reliability property

alongside the established properties of state and de-

pendency.

Critical microservices provide core functionality

to the application, and if such a service fails, the en-

tire application fails catastrophically even if there are

several instances of the microservice. In JPyL, the

chained PortConﬁg to ReverseProxy services are crit-

ical because if the PortConﬁg service fails, the Re-

verseProxy service will not be able to determine the

port to display data or access the URL backup port.

As with other properties criticality is inherited within

chains, so if any microservice is critical then the entire

chain is critical.

Minor microservices provide non-essential func-

tionality. The application continues to operate if they

fail, although performance and/or functionality may

be reduced. In Hipster-Shop, the Adservice microser-

vice is minor because if it fails, the server returns a

404 status code indicating that the service is temporar-

ily unavailable. The rest of the application continues

to function normally.

It would also be possible to consider partial fail-

ures that eventually affect the operation of the sys-

tem (Cristian, 1991). However, given the challenges

of distinguishing between partial failures with differ-

ent severities we adopt a binary minor/critical clas-

siﬁcation. As partial failures are considered as gray

failures, severe partial failures are classiﬁed as criti-

cal, and low severity failures are classiﬁed as minor.

4.2 MDSR Classiﬁcation

Combining reliability with the state and dependency

classiﬁcations deﬁnes a three-dimensional space:

our new Microservices Dependency State Reliability

(MDSR) Classiﬁcation Tree as shown in Figure 4. It

can also be represented in tabular form as in Tables 5

and 6.

In MDSR chained microservices are necessarily

critical as argued in (Heorhiadi et al., 2016), and con-

ﬁrmed in our evaluation (Section 6) even for chains

that attempt to recover reliability using microservice

patterns. As examples we implement a Database-Per-

Service pattern for the chained/stateful UserData &

White/Black Listing service and an API Gateway pat-

tern for the chained/stateless/ Product Catalog & Rec-

ommended service. In both cases the application fails

catastrophically despite reporting only a ”404 Service

Not Found” error.

The fourth rows of Tables 5 and 6 show exam-

ple microservices from the case study applications for

each of the six MDSR classes. For example the indi-

vidual/stateful/minor exemplar is JPyL’s Service Log-

ging microservice. The ﬁfth rows of the tables show

the error reported if the service fails.

4.3 Semi-automatic Classiﬁcation

Static analysis of a set of microservices can automat-

ically propose MDSR classiﬁcations for many mi-

croservices in an application. We demonstrate the

principle with a prototype analyser that classiﬁes all

Python/Flask microservices in a source project

. The

analyser tokenises the Python/Flask code, and identi-

ﬁes properties using keyword matches. As examples,

the presence of keywords like ”SQL” or ”JSON” clas-

siﬁes a service as stateful; the presence of ”request”,

”requests”, ”requests.get”, ”get” Flask keywords, or

use of the ”POST” or ”GET” methods classiﬁes a ser-

vice as chained as they indicate a service is pushing

data or requesting information from other microser-

vices. Reliability is determined by the type of pattern

or bad smell detected as discussed in Sections 5.2 &

5.3. Figure 5 shows a screenshot of the tool’s output

for JPyL.

The prototype analyser identiﬁes all six MDSR

classes. The analyser may, however, propose an in-

correct classiﬁcation, for example a stateless service

may be incorrectly classiﬁed as stateful if a keyword

like ”SQL” appears in a comment. Similarly, a state-

ful microservice could be classiﬁed as stateless if it

uses a persistent store that is not included in the cur-

rent set of keywords.

Despite these limitations the analyser is effective

in classifying microservices. For example Tables 5

and 6 show how it correctly classiﬁes all of the JPyL

microservices. We have also applied the analyser to a

total of 30 microservices in a further six small Flask

web application projects

. Manual inspection of 3 of

the projects validates the properties identiﬁed by the

https://bitbucket.org/latent12/microproject/src/master/

analyser/

GitHub Links: https://github.com/IBM/worklog/tree/

master/app, https://github.com/IBM/Flask-microservice,

https://github.com/bakrianoo/Flask-elastic-microservice,

https://github.com/airavata/Blitzkrieg, https://github.

com/michaellitherland/Flask-microservice-demo,

https://github.com/umermansoor/microservices

WEBIST 2022 - 18th International Conference on Web Information Systems and Technologies

Microservice

Individual Chained

Stateful Stateless Stateful/

Critical

Stateless/

Critical

Critical MinorMinor Critical

Figure 4: MDSR Classiﬁcation Tree.

Figure 5: MDSR Analyser JPyL Output.

analyser. As a further example the analyser output for

the IBM worklog application is shown in Figure 6,

and corresponds to the use case diagram provided by

IBM

Figure 6: MDSR Analyser IBM Worklogs Output.

Table 4 shows the number and percentage of each

MDSR class of microservice that the analyser detects

in the seven web applications, and key observations

are as follows. The majority of services are chained

(70%), and most are stateful (73%). 50% of services

are from a single MDSR classiﬁcation, i.e. chained/s-

tateful/critical. Perhaps most startling is that 77% of

https://github.com/IBM/worklog/blob/master/designs/

services are critical. We speculate that this reﬂects

that the designers of these small web applications

have not designed them to be reliable.

Table 4: MDSR Classiﬁcations of 7 small Flask Applica-

tions containing 30 Microservices.

Classiﬁcation No. Services %

Individual 9 30

Chained 21 70

Stateful 22 73

Stateless 8 27

Critical 23 77

Minor 7 23

Individual/Stateful 7 23

Individual/Stateless 2 7

Chained/Stateful 15 50

Chained/Stateless 6 20

Individual/Stateful/Critical 2 7

Individual/Stateless/Critical 1 3

Chained/Stateful/Critical 15 50

Chained/Stateless/Critical 6 20

Individual/Stateful/Minor 5 17

Individual/Stateless/Minor 1 3

5 IDENTIFYING RELIABILITY

MDSR analysis provides information about the ex-

pected reliability of microservices and chains of mi-

croservices in an architecture. In general reliability

engineering should focus on the 77% of critical mi-

croservices identiﬁed by the analysis in Table 4. More

speciﬁcally the analysis can help identify design pat-

terns and bad smells in the architecture. To illustrate,

the sixth row of the Patterns and Bad Smell tables

(Tables 5 & 6) identify the microservice pattern or

bad smell associated with each point in the classiﬁca-

Classifying the Reliability of the Microservices Architecture

tion space. Of the patterns and bad smells enumerated

in (Taibi and Lenarduzzi, 2018) (and summarised in

Tables 2 & 3) the case studies exhibit four out of eight

patterns and three out of eleven bad smells.

Static analysis enables the early identiﬁcation of

patterns and bad smells, allowing developers to an-

ticipate the types of failures, and their likely impact.

Potentially this information allows developers to trou-

bleshoot problems faster and prevent long application

downtimes.

5.1 MDSR Patterns & Implications

The sixth row of the MDSR Patterns table (Table 5)

identiﬁes the microservice pattern exhibited by the

case study example microservice, or microservice

chain. In our case study applications, individual/state-

ful microservices and individual/stateless microser-

vices have only minor reliability implications if they

implement a pattern as shown in Table 5. For exam-

ple JPyL Service Logging is individual/stateful/minor

and implements the Database-Per-Service pattern.

5.2 MDSR Bad Smells & Implications

The sixth row of the MDSR Bad Smells table (Ta-

ble 6) identiﬁes the microservice bad smell exhibited

by the case study example microservice, or microser-

vice chain. Considering the Patterns and Bad Smells

tables together (Tables 5 and 6) we see that the ex-

ample case study microservices at each point in the

MDSR classiﬁcation exhibit either a design pattern

or a bad smell.This is expected as microservice best

practice applies patterns, while bad smells indicates

places where design principles have not, or cannot be

applied (Heorhiadi et al., 2016).

Bad smells identiﬁed by MDSR can be considered

for refactoring to improve reliability. That is, most

critical microservices are associated with known bad

smells as shown in Table 6. For example the indi-

vidual/stateless/critical SSL microservice in JPyL is

an instance of Microservices Greedy, where there is

a proliferation of microservices. Of course SSL need

not be implemented as a microservice.

A key element of MDSR is that chained microser-

vices remain critical, even if they implement good de-

sign patterns. For example, the chained/stateful/crit-

ical UserData & White/Black Listing microservices

implement the Database-Per-Service pattern but still

fail catastrophically as we show in Section 6.

5.3 Semi-automatic Pattern/Bad Smell

Detection

The prototype MDSR analyser can detect the

Database-Per-Service pattern, and both chained mi-

croservices and Shared Persistency bad smells.

Shared Persistency is detected by determining

whether any microservices share a data store. Cur-

rently the user must provide the analyser with the

names of the data stores used in the application, e.g.

jpyl micro in JPyL. An enhanced analyser could parse

the Flask code and extract the data store names from

connection statements. The analyser counts the num-

ber of times each data store name appears in the mi-

croservices in the given directory. If the count is

greater than 1, the microservices have a shared per-

sistency bad smell. Microservices with a unique per-

sistent store name implement a Database-Per-Service

pattern.

There are some bad smells that the analyser isn’t

able to detect. Some of these, like Cyclic Dependency

could be detected dynamically, perhaps using SDGs

to examine the connection between services and the

rate of communication (Ma et al., 2018; Omer and

Schill, 2011). Other bad smells likely require human

analysis, like Microservices Greedy and SRP viola-

tion.

The analyser also inspects Flask error handling

codes to classify the reliability of a microservice. For

example if a service returns a 404 error indicating that

the service is not found or temporarily unavailable the

failure is considered minor. In contrast a code like 415

indicates that there is a SSL Protocol Violation, and

the service is critical because even if other services

are available, the application cannot accept https re-

quests, and has failed catastrophically.

Table 7 shows the number and percentage of bad

smells and patterns detected by the analyser in the

seven web applications. Key observations are as fol-

lows. (1) 17% of services implement Database-Per-

Service. (2) As 70% of the services are chained (Ta-

ble 4), they are the most common bad smells. This

accords with, and provides evidence for, the claim in

(Heorhiadi et al., 2016) that developers do not always

implement reliability patterns.

6 EVALUATING RELIABILITY

We execute the Hipster, JPyL & WordPress web ap-

plications against a simple fault injector to investi-

gate the reliability implications of different MDSR

classes. All applications are executed on a typical

server, i.e. a 16 core Intel server with 2TB of RAM

WEBIST 2022 - 18th International Conference on Web Information Systems and Technologies

Table 5: MDSR Pattern Classiﬁcation.

Individual Chained

Stateful Stateless Stateful Stateless

Critical Minor

Critical

Minor Critical Critical

Microservices

JPyL

Service

Logging

Hipster

Adservice

JPyL

Security

Headers

JPyL

UserData,

White/Black

Listing

Hipster

Recommend,

ProductCatalog

Failure

Impact

404 Service

Not Found

404 Service

Not Found

404 Service

Not Found

404 Service

Not Found

Pattern

Database-Per

Service

Single

Responsibility

Principle (SRP)

Database-Per

Service

API

Gateway

Table 6: MDSR Bad Smell Classiﬁcation.

Individual Chained

Stateful Stateless Stateful Stateless

Critical Minor Critical Minor Critical Critical

Microservices

Hipster

Frontend

JPyL

SSL

Hipster

Payment,

Checkout,

Cart

WordPress

Comments

HTTP REST

Hipster

Shipping,

Checkout

JPyL

PortConﬁg,

ReverseProxy

Failure

Impact

500

Internal Server

Error

ERR SSL

Protocol Failure

Error

Database

Connection

Error

500

Internal Server

Error

Bad

Smells

SRP

Violation

Microservices

Greedy

Chained, Shared

Persistency

Chained, Cyclic

Dependency

Table 7: MDSR Patterns/Bad Smells found in 7 small Flask

Applications containing 30 Microservices.

Patterns/Bad Smells No. Services %

Database-Per-Service 5 17

Shared Persistency 2 7

Chained Services 21 70

Other 2 6

running Ubuntu 18.04. Hipster uses Minikube 1.19.0

and multiple languages including Python 3.6, Go 1.10

and C# 8.0. JPyL uses Jupyter Server 6.1, Python 3.6,

MySQL 5.7 and Flask 1.1.2. WordPress v5.7.2 uses

PHP 7.2 and MySQL 5.7. The code for all applica-

tions, tools and experiments are available

6.1 Critical Microservice Failures

Catastrophic failure is a major challenge for web

applications and our case study applications are no

https://bitbucket.org/latent12/microproject/src/master/

exception. To investigate the failure of critical

microservices we target the chained/stateful/critical

HTTP REST microservice in WordPress, the individ-

ual/stateless/critical SSL microservice in JPyL & the

chained/stateful/critical Cart Service in Hipster.

Figures 7, 8 & 9 plot throughput (Request KB/s)

against time. The red line in each box plot is the me-

dian throughput from three executions. Once estab-

lished, all services have throughputs of approximately

600KB/s. When the fault injector kills the critical mi-

croservice at 43s the applications fail almost instanta-

neously: by 50s throughput is 0KB/s.

6.2 Minor/Individual Failures

Our ﬁrst investigation of the failure of minor mi-

croservices uses an individual microservice. Specif-

ically we target the individual/stateful/minor Service

Logging microservice in JPyL. Recall that, although

stateful, this microservice has a private store follow-

Classifying the Reliability of the Microservices Architecture

Figure 7: A JPyL Critical Failure (SSL) at 43s.

Figure 8: A WordPress Critical Failure (HTTP REST &

Comment) at 43s.

Figure 9: Hipster Critical Failure (Cart & Payment Ser-

vices) at 43s.

ing the Database-per-service design pattern.

As before, Figure 10 plots JPyL throughput (Re-

quest KB/s) against time, and the service is running

at around 600KB/s. Once the fault injector kills the

critical microservice at 43s the application continues

to serve pages, but throughput falls dramatically but

brieﬂy to around 2KB/s. By 50s the application is

able to recover to a throughput of around 520KB/s.

Figure 10: JPyL Minor/Individual Failure (Service Log-

ging) at 43s.

6.3 Critical/Chained Failures

We next investigate the failure of critical/chained

chained microservices. Speciﬁcally we target the

chained/stateful/critical User Data & White/Black

Listing microservices in JPyL and the chained/state-

less/critical Product Catalog & Recommended mi-

croservices in Hipster. Both microservice chains im-

plement patterns that aim to recover reliability (Sec-

tion 4).

As before Figure 11 plots JPyL throughput (Re-

quest KB/s) against time, and the service is running at

around 600KB/s. Once the fault injector kills the pair

of microservices at 43s the application reports a 404

Service Not Found error and continues to serve pages.

However the throughput has fallen to around 2KB/s.

That is the application is barely able to accept client

requests or even load in a browser quickly. A sim-

ilar failure is reported for Hipster when the Product

Catalog service fails (Figure 12).

For realistic workloads the failure of chained/crit-

ical microservices even with pattern implementations

has caused the applications to fail catastrophically!

6.4 Multiple Microservices Failure

Even if an application survives the failure of a sin-

gle minor microservice, how will it cope when mul-

tiple microservices fail successively? To investigate

the failure of multiple microservices in JPyL we tar-

get three microservices i.e. Service Logging, Security

Headers & User Data – White/Black Listing. Speciﬁ-

cally the fault injector kills these microservices in or-

der at approximately 16s, 32s and 48s into the execu-

tion.

Figure 13 plots JPyL throughput (Request KB/s)

against time, and the service is running at around

600KB/s. When the fault injector kills the in-

WEBIST 2022 - 18th International Conference on Web Information Systems and Technologies

Figure 11: JPyL Chained (User Data & White-Blacklisting)

at 43s.

Figure 12: Hipster Chained (Product & Recommended) at

43s.

Figure 13: JPyL Multiple Minor Failures at 16s, 32s, 48s.

dividual/minor microservices the throughput drops

brieﬂy to around 2KB/s, but then recovers to around

600KB/s. As before, when the chained/critical mi-

croservice fails at 48s the application fails catastroph-

ically.

6.5 Evaluation Summary

The key ﬁndings from our evaluation are as follows.

(1) All case study applications fail catastrophically

if a critical microservice fails ( 7, 8 & 9). (2) JPyL

survives the failure of an individual/minor microser-

vice (Figure 10), and even the successive failure of

two individual/minor microservices (up to 40s in Fig-

ure 13) (3) The failure of any chain of microservices

in JPyL & Hipster is catastrophic: throughput being

dramatically reduced (by 98%) (Figures 11, 12 and

after 48s in Figure 13). (4) Individual microservices

do not necessarily have minor reliability implications,

e.g. the Hipster Frontend is individual/stateful/critical

and the JPyl SSL is individual/stateless/critical (Fig-

ure 7).

7 CONCLUSION

Microservices are commonly classiﬁed based on their

dependence (chained/individual) or state (stateful/s-

tateless). We add a binary reliability classiﬁcation,

and combine it with the other classiﬁcations to deﬁne

a three dimensional space: the MDSR Classiﬁcation

in Figure 4 (Section 4). Using three established web

applications we exhibit microservices that exemplify

the six MDSR classes. We outline a prototype static

analyser that can statically identify all six classes in

Flask web applications, and apply it to seven small

web applications. Analysing the applications reveals

that the majority of services are chained (70%), state-

ful (73%) and critical (77%) (Table 4). We speculate

that the high percentage of critical services indicates

that the applications are not designed for reliability.

We demonstrate that each of the MDSR critical

case study microservices exhibits a known bad smell;

and that each minor MDSR class in the case study mi-

croservices follows a design pattern (Taibi and Lenar-

duzzi, 2018). Across, the seven applications we ﬁnd

that 70% consist of the chained services bad smell

by default while only 17% were implemented for re-

siliency as they consist of the Database-Per-Service

pattern. Hence MDSR provides a framework to anal-

yse the properties of microservices and chains of mi-

croservices in a system, identifying components to be

considered for refactoring to improve reliability (Sec-

tion 5).

In future work we plan to apply the MDSR classi-

ﬁcation to larger microservice systems, e.g. to Death

Star

. It would be interesting to see whether MDSR

could be extended to also classify failures in infras-

tructure services.

https://github.com/djmgit/DeathStar

Classifying the Reliability of the Microservices Architecture

We also seek to enhance the analyser to make it

more automatic, e.g. to automatically detect more

persistent stores. Perhaps the analyser could sug-

gest when some microservice resilience patterns, like

bulkhead or load balancer, are implemented to reduce

criticality? Finally we would like to investigate the

potential of combining static MDSR analysis with dy-

namic SDG analysis.

REFERENCES

Cabot, J. (2018). Wordpress: A content management

system to democratize publishing. IEEE Software,

35(3):89–92.

Cooper, R. (2014). BBC online outage on saturday 19th july

2014. https://www.bbc.co.uk/blogs/internet/entries/

a37b0470-47d4-3991-82bb-a7d5b8803771.

Cristian, F. (1991). Understanding fault-tolerant distributed

systems. Comm. ACM, 34(2):56–78.

Ghirotti, S. E., Reilly, T., and Rentz, A. (2018). Tracking

and controlling microservice dependencies. Comm.

ACM, 61(11):98–104.

Google (2021). Hipster-shop microservices demo.

https://github.com/GoogleCloudPlatform/

microservices-demo.

Gupta, D. and Palvankar, M. (2020). Pitfalls and challenges

faced during a microservices architecture implemen-

tation. Technical report.

Hasselbring, W. and Steinacker, G. (2017). Microservice

architectures for scalability, agility and reliability in

e-commerce. In IEEE Intl Conf on Software Architec-

ture Workshops, pages 243–246.

Heorhiadi, V., Rajagopalan, S., Jamjoom, H., Reiter, M. K.,

and Sekar, V. (2016). Gremlin: Systematic resilience

testing of microservices. In IEEE 36th Intl Conf on

Distributed Computing Systems, pages 57–66.

IBM (2021). Microservices in the enterprise, 2021: Real

beneﬁts, worth the challenges. Technical report, Tech-

nical report, IBM.

Jamshidi, P., Pahl, C., Mendonc¸a, N. C., Lewis, J., and

Tilkov, S. (2018). Microservices: The journey so far

and challenges ahead. IEEE Software, 35(3):24–35.

Lewis, J. and Fowler, M. (2015). Microservices: A deﬁni-

tion of a new architectural term. https://martinfowler.

com/articles/microservices.html.

Ma, S.-P., Fan, C.-Y., Chuang, Y., Lee, W.-T., Lee, S.-J., and

Hsueh, N.-L. (2018). Using service dependency graph

to analyze and test microservices. In IEEE 42nd An-

nual Computer Software and Applications Conf, vol-

ume 2, pages 81–86.

Meng, Y., Zhang, S., Sun, Y., Zhang, R., Hu, Z., Zhang,

Y., Jia, C., Wang, Z., and Pei, D. (2020). Localizing

failure root causes in a microservice through causality

inference. In IEEE/ACM 28th Intl Symp on Quality of

Service (IWQoS), pages 1–10.

Montalenti, A. (2015). Kafkapocalypse: a post-

mortem on our service outage. https://blog.parse.ly/

kafkapocalypse/.

Nikolaidis, E., Chen, S., Cudney, H., Haftka, R. T., and

Rosca, R. (2004). Comparison of probability and pos-

sibility for design against catastrophic failure under

uncertainty. J. Mech. Des., 126(3):386–394.

Offutt, A. J., Harrold, M. J., and Kolte, P. (1993). A soft-

ware metric system for module coupling. J. Systems

and Software, 20(3):295–308.

Omer, A. M. and Schill, A. (2011). Automatic management

of cyclic dependency among web services. In 2011

14th IEEE Intl Conf on Computational Science and

Engineering, pages 44–51.

Patel, S. K., Rathod, V., and Prajapati, J. B. (2011). Per-

formance analysis of content management systems-

joomla, drupal and wordpress. Intl J. Computer Ap-

plications, 21(4):39–43.

Power, A. and Kotonya, G. (2018). A microservices archi-

tecture for reactive and proactive fault tolerance in IoT

systems. In IEEE 19th Intl Symposium on” A World

of Wireless, Mobile and Multimedia Networks”, pages

588–599.

Rossi, D. (2018). Consistency and availability in microser-

vice architectures. In Intl Conf on Web Information

Systems and Technologies, pages 39–55.

Rudrabhatla, C. K. (2018). Comparison of event choreogra-

phy and orchestration techniques in microservice ar-

chitecture. Intl J. Advanced Computer Science and

Applications, 9(8):18–22.

Salah, T., Zemerly, M. J., Yeun, C. Y., Al-Qutayri, M., and

Al-Hammadi, Y. (2016). The evolution of distributed

systems towards microservices architecture. In Proc.

11th Intl Conf for Internet Technology and Secured

Transactions, pages 318–325.

Soldani, J., Tamburri, D. A., and Van Den Heuvel, W.-J.

(2018). The pains and gains of microservices: A sys-

tematic grey literature review. J. Systems and Soft-

ware, 146:215–232.

Taibi, D. and Lenarduzzi, V. (2018). On the deﬁnition of

microservice bad smells. IEEE Software, 35(3):56–

62.

Taibi, D., Lenarduzzi, V., and Pahl, C. (2018). Architec-

tural patterns for microservices: A systematic map-

ping study. In CLOSER, pages 221–232.

Toffetti, G., Brunner, S., Bl

ochlinger, M., Dudouet, F.,

and Edmonds, A. (2015). An architecture for self-

managing microservices. In Proceedings of the 1st

Intl Workshop on Automated Incident Management in

Cloud, pages 19–24.

Wolff, E. (2018). Why microservices fail: An experience

report. Technical report.

Wu, A. (2017). Taking the cloud-native approach with mi-

croservices. Ma-genic/Google Cloud Platform.

Zhou, X., Peng, X., Xie, T., Sun, J., Ji, C., Li, W., and

Ding, D. (2018). Fault analysis and debugging of mi-

croservice systems: Industrial survey, benchmark sys-

tem, and empirical study. IEEE Trans. on Software

Engineering.

Zimmermann, O. (2017). Microservices tenets. Computer

Science-Research and Development, 32(3):301–310.

WEBIST 2022 - 18th International Conference on Web Information Systems and Technologies