Ofﬂine Mining of Microservice-based Architectures

Jacopo Soldani

, Javad Khalili, and Antonio Brogi

Department of Computer Science, University of Pisa, Pisa, Italy

Keywords:

Microservices, Microservices Architecture, Software Architecture Mining.

Abstract:

Designing, implementing, and operating microservices is known to be complex and costly, mainly due to the

multitude of heterogeneous software services forming a microservice-based application. Such tasks can be

simpler if a speciﬁcation of the microservice-based architecture (MSA) of an application is available. At the

same time, due to the number of services and service interactions in a MSA, manually generating a speciﬁ-

cation of such MSA is complex and costly. For this reason, in this paper we present a novel technique for

automatically mining the speciﬁcation of a MSA from its Kubernetes deployment. The obtained MSA speci-

ﬁcation is in µTOSCA, a microservice-oriented proﬁle of the human- and machine-readable OASIS standard

TOSCA. We also present a prototype implementation of our technique, which we use to assess it by means of

case studies based on third-party applications.

1 INTRODUCTION

Microservice-based architectures (MSAs) are known

to enable realising so-called cloud-native applica-

tions, viz., applications architected to fully exploit the

potentials of cloud computing platforms (Kratzke and

Quint, 2017). This resulted in MSAs becoming com-

monplace. For instance, Amazon, Netﬂix, or Twit-

ter are already exploiting MSAs to deliver their busi-

nesses (Soldani et al., 2018).

MSAs are essentially service-oriented architec-

tures satisfying some additional key design principles,

e.g., ensuring services’ independent deployability and

horizontal scalability, or isolating failures (Zimmer-

mann, 2017). It is hence crucial to determine whether

a service-based application adheres to the key design

principles of MSAs, and understanding how to refac-

tor an application to resolve possible violations of

such key design principles (Soldani et al., 2021).

µTOSCA and µFRESHENER (Brogi et al., 2020)

enable modelling, analysing, and refactoring the ar-

chitecture of a service-based application, to enhance

its adherence to the key design principles of MSAs.

µTOSCA is a model enabling to specify MSAs

with the human- and machine-readable OASIS stan-

dard TOSCA (OASIS, 2020). MSAs are represented

by typed directed graphs, called topology graphs,

where nodes model the services, integration compo-

https://orcid.org/0000-0002-2435-3543

https://orcid.org/0000-0003-2048-2468

nents (e.g., load balancers or message queues), and

databases forming a MSA. Oriented arcs represent the

interactions among such components.

µFRESHENER (Brogi et al., 2020) enables visu-

alising the µTOSCA speciﬁcation of an MSA, and

automatically analysing the speciﬁed MSA to check

whether the application includes some known archi-

tectural smells, viz., possible symptoms of violations

of the key design principles of MSAs. µFRESHENER

also enables reasoning on how to refactor an appli-

cation to resolve identiﬁed architectural smells, based

on applying practitioner-shared refactorings known to

resolve their occurrence (Neri et al., 2020).

On the other hand, MSAs can often include hun-

dreds of interacting services (Forti et al., 2022). This

makes manually specifying MSAs in µTOSCA a

complex, time-consuming, and error-prone process

(Soldani et al., 2021). To this end, µMINER (Muntoni

et al., 2021) was proposed to automatically mine the

µTOSCA speciﬁcation of the MSA of an application,

given the latter’s Kubernetes deployment. µMINER

ﬁrst elicits the services, integration components, and

databases in an MSA from the deployment speciﬁca-

tion in Kubernetes. It then runs the application de-

ployment on a devoted cluster, and it loads the de-

ployed application. µMINER also sniffs the packets

exchanged among the deployed application compo-

nents to mine the occurring interactions. This requires

µMINER to run with root privileges on the cluster, and

the application to not encrypt any of the messages ex-

Soldani, J., Khalili, J. and Brogi, A.

Ofﬂine Mining of Microservice-based Architectures.

DOI: 10.5220/0011061000003200

In Proceedings of the 12th International Conference on Cloud Computing and Services Science (CLOSER 2022), pages 63-73

ISBN: 978-989-758-570-8; ISSN: 2184-5042

changed among deployed components, which can of

course happen only in a testing environment. The

deployed application must also be loaded to stress

all possible service interactions, to allow µMINER

to monitor them. In short, µMINER requires to run

the target microservice-based application in a suitably

conﬁgured testing environment.

In this paper, we propose a different technique

for mining the µTOSCA speciﬁcation of the MSA

of an application, which can work with any existing

application deployment, rather than requiring a suit-

ably conﬁgured testing environment. Our technique

starts from the Kubernetes deployment of an appli-

cation, conﬁgured to also exploit Istio

and Kiali

two Kubernetes-native tools for proxying deployed

services and monitor their interactions. It then pro-

cesses, ofﬂine, the Kubernetes manifest ﬁles speci-

fying the application deployment and the Istio-based

proxying of its services, as well as a graph generated

by Kiali in any former run of the application, e.g., its

production run. The Kiali graph models the deployed

software components (as nodes) and their monitored

interactions (as oriented arcs). Given such inputs,

our technique can automatically mine the MSA of

an application in two steps. It ﬁrst elicits the soft-

ware components and their interactions, producing a

ﬁrst draft of the MSA of an application. The draft is

then reﬁned by distinguishing services from integra-

tion components and databases, and by characteris-

ing the mined interactions, e.g., determining whether

circuit breakers or timeouts are used therein. The re-

ﬁned architecture is ﬁnally marshalled to µTOSCA,

obtaining a speciﬁcation which can be processed by

µFRESHENER.

To illustrate the feasibility of the proposed min-

ing technique, we present an open source proto-

type implementation, called µTOM (µTOSCA Of-

ﬂine Miner). We also show how we used µTOM to

run case studies based on two existing, third-party ap-

plications, viz., Robot Shop (Instana, 2021) and On-

line Boutique (Google Cloud, 2021). The case studies

show that µTOM effectively mines the MSAs of the

considered applications, and that it also captures more

details if compared with µMINER.

The paper is organised as follows. Section 2 provides

the necessary background on µTOSCA, Kubernetes,

Istio, and Kiali. Section 3 presents our technique for

mining MSAs ofﬂine, while Section 4 introduces its

open source prototype implementation. Section 5 il-

lustrates the case studies assessing our technique. Fi-

nally, Sections 6 and 7 discuss related work and draw

some concluding remarks, respectively.

https://istio.io.

https://kiali.io.

2 BACKGROUND

We hereafter provide the necessary background on

µTOSCA (Section 2.1), Kubernetes (Section 2.2),

and Istio and Kiali (Section 2.3).

2.1 µTOSCA

The µTOSCA type system (Figure 1) allows specify-

ing MSAs as typed topology graphs in TOSCA, the

Topology and Orchestration Speciﬁcation for Cloud

Applications (OASIS, 2020). Topology nodes model

the services, communication patterns, or databases in

an MSA. A Service runs some business logic, e.g.,

a service managing users’ orders in an e-commerce

application. A CommunicationPattern implements

message-based integration pattern (Hohpe and Woolf,

2003), viz., MessageRouter and MessageBroker,

which decouples the communication among two or

more components. MessageBrokers are also distin-

guished based on whether they implement message

brokering asynchronously (AsynchronousMessage-

Broker) or synchronously (SynchronousMessage-

Broker). Finally, a Database is a component stor-

ing the data pertaining to a certain domain, e.g., a

database of orders in an e-commerce application.

Topology arcs instead model the interactions

among the components in an MSA, throughout Inte-

ractsWith relationships. Such relationships can be fur-

ther characterized by setting three boolean properties,

viz., circuit breaker, timeout, and dynamic discovery.

circuit breaker, timeout allows specifying whether the

source node is interacting with the target node via a

circuit breaker or by setting proper timeouts. dyna-

mic discovery instead allows to specify whether the

endpoint of the target of the interaction is dynamically

discovered (e.g., by exploiting a discovery service).

Service Database

MessageRouter

AsynchronousMessageBroker SynchronousMessageBroker

InteractsWith Edge

micro.nodes.Root

CommunicationPattern

MessageBroker

micro.relationships.Root micro.groups.Root

Figure 1: The node types, relationship types, and group

types deﬁning µTOSCA. The corresponding deﬁnitions

in TOSCA are publicly available on GitHub at https://

di-unipi-socc.github.io/microTOSCA/microTOSCA.yml.

CLOSER 2022 - 12th International Conference on Cloud Computing and Services Science

Finally, nodes can be added to an Edge group.

The latter speciﬁes the application components that

are publicly accessible from outside of the applica-

tion, namely those components that can be directly

accessed by external clients.

Example. Figure 2 displays an example of µTOSCA

topology modelling the MSA of a toy e-commerce ap-

plication. The application includes four services, i.e.,

frontend (accessible by external clients), orders, pay-

ment, and shipping. It is then completed by two in-

tegration components, i.e., router and queue, and two

databases, i.e., catalogDb and ordersDb. The frontend

allows browsing the catalogue of available products,

by interacting with catalog. The actual instance of

catalog used to access the catalogDb is dynamically

discovered by a message router implementing server-

side service discovery. The frontend also allows to

place orders, by interacting with orders. The latter al-

lows to upload new product orders, which are stored

in ordersDb, and which are also enqueued in the asyn-

chronous message broker implementing the queue of

orders to be shipped. The latter is consumed by the

service shipping, which pulls orders from the queue

and proceeds with their shipping.

edge

frontend

ordersDb

router

queue shipping

Service

Message

Router

Asynchronous

MessageBroker

Database

InteractsWith

(circuit_breaker set)

Edge

orders

catalogDb

catalog

Legend

Figure 2: An example of µTOSCA topology modelling the

architecture of an application.

2.2 Kubernetes

Kubernetes allows deploying and managing multi-

service applications in distributed clusters. Such a de-

ployment and management is realised by orchestrat-

ing pods, which constitute Kubernetes’ deployment

units. A pod is a deployable instance of an application

service, which is shipped within a single container or

in several tightly coupled containers. A pod can actu-

ally encapsulate multiple Docker containers that need

to share the same resources, e.g., when a container-

ised service is accompanied by “sidecar” containers

monitoring it or proxying its communications.

Pod instances are deployed and managed with Ku-

bernetes controllers. The latter allow to spawn and

manage pod instances from pod templates, which are

included in workload resource speciﬁcations, e.g.,

Deployments, StatefulSets, and ReplicaSets. The lat-

ter specify the Docker containers running in a pod,

their target state, as well as the number of replicas of

the pod that must be deployed. Kubernetes controllers

then ensure that the speciﬁed number of replicas of a

pod continue to run on a cluster, with each pod in-

stance reaching and maintaining its target state.

Replicated pods can be accessed through Kuber-

netes services, which deﬁne their load balancing poli-

cies. A Kubernetes service indeed implements a mes-

sage routing component, which receives requests and

balances them among the pods it manages accord-

ing to the speciﬁed balancing policy. Kubernetes ser-

vices can be of multiple types, depending on whether

they should be accessible only within the Kubernetes

cluster (viz., ClusterIP services), or whether they

should be exposed to external clients (viz., NodePort

or LoadBalancer services).

2.3 Istio and Kiali

Istio and Kiali are two Kubernetes-native tools for

controlling and monitoring service interactions. Is-

tio includes so-called envoy proxies in a Kubernetes

deployment. Envoy proxies are deployed as sidecar

proxies alongside application services to control how

they interact with each other. This is done by spec-

ifying VirtualServices or DestinationRules, which al-

low deﬁning how to route a message to its destination.

This includes, e.g., indicating whether timeouts or cir-

cuit breakers are used to avoid the sender to continue

waiting for an answer when the receiver has failed.

Kiali is an observability console, which comes na-

tively integrated with Istio. It exploits Istio envoy

proxies to store the interactions they proxy, so as to

trace the interactions among deployed services. Each

interaction is stored together with its metadata, in-

cluding the source and target Kubernetes workloads

or services, and whether the interaction successfully

completed. Kiali then exploits such interactions to

build different types of graphs, which enable visual-

ising them at different abstraction levels in the Kiali

dashboard, and which can be exported to JSON graph

data ﬁles. In the rest of this paper, we consider Kiali

graph modelling monitored service interactions, viz.,

service graphs.

https://kiali.io/docs/features/topology

Ofﬂine Mining of Microservice-based Architectures

topology

graph

Kubernetes

manifest files

Mining Topology Fragments Node Refinement

Kiali graph

(existing) app deployment

YAML

JSON

edge

μTOSCA

refined

topology

graph

mR mB

μTOSCA

edge

Connecting Topology Fragments

edge

μTOSCA

Interaction Refinement

mR mB

d c

μTOSCA

edge

Step 1: Mining

Step 2: Refinement

Figure 3: Our two-steps technique for mining MSAs from their Kubernetes deployment.

3 MINING MSAs

Our technique for mining MSAs consists of a pipeline

of two steps (Figure 3), which processes the Kuber-

netes manifest ﬁles specifying the application deploy-

ment and a JSON graph data ﬁle specifying the graph

generated by Kiali while monitoring an existing ap-

plication deployment. Such inputs are ﬁrst processed

by the mining step, which elicits the nodes and inter-

actions forming the target MSAs, and which produces

a ﬁrst corresponding µTOSCA topology graph. The

graph is then passed to the reﬁnement step, which dis-

tinguishes the types of the nodes therein, and which

characterises the mined relationships by indicating

whether dynamic discovery, circuit breakers, or time-

outs are used in the corresponding interactions.

3.1 Step 1: Mining

The mining step processes the available inputs to de-

termine the nodes and interactions forming the tar-

get MSA. Firstly, µTOSCA topology fragments are

mined from the Kubernetes manifest ﬁles, by es-

sentially mapping Kubernetes entities to µTOSCA

nodes. The topology graph is then completed by con-

necting mined topology fragments based on the run-

time interactions contained in the Kiali graph.

Mining Topology Fragments. Topology fragments

are extracted from the Kubernetes manifest ﬁles by

ﬁrst mapping the workloads and services speciﬁed

therein to µTOSCA nodes. Each Kubernetes work-

load speciﬁes the pod conﬁguration for a component

of the target MSA, by indicating the Docker container

edge

frontend

orders.svc

orders

orders.svc

(ClusterIP)

orders

(Deployment)

frontend.svc

(NodePort)

frontend

(Deployment)

(b)

(c)

Kubernetes μTOSCA

(a)

queue

(Deployment)

queue

frontend.svc

Figure 4: Three examples of µTOSCA topology fragments

mined from Kubernetes services and workloads.

from which it runs and its target conﬁguration (Sec-

tion 2.2). Therefore, each Kubernetes workload is

mapped to a µTOSCA node of type Service. The type

may change in the reﬁnement step, if the workload is

used to deploy an integration component or database.

A Kubernetes service instead implements a mes-

sage routing component balancing the trafﬁc sent to

the replicas of the pod they manage, speciﬁed by a

Kubernetes workload (Section 2.2). They are hence

mapped to µTOSCA nodes of types MessageRouter,

which are directly speciﬁed to InteractsWith the Servi-

ce node corresponding to the workload they manage.

In addition, if a Kubernetes service is speciﬁed to be a

NodePort or LoadBalancer, its corresponding Messa-

geRouter node is placed within the Edge group. This

reﬂects the fact that NodePort or LoadBalancer ser-

vices that can be invoked by external clients.

Figure 4 illustrates three examples of application

of our topology fragment mining. In case (a), a Ku-

bernetes ClusterIP service orders.svc manages the

CLOSER 2022 - 12th International Conference on Cloud Computing and Services Science

Deployment workload running the service orders. By

applying our node mining technique, we obtain a

MessageRouter node modelling the Kubernetes ser-

vice, which InteractsWith the Service node modelling

the workload. Case (b) is similar, with the only dif-

ference that the MessageRouter node is placed in the

Edge group, since Kubernetes NodePort services are

exposed to external clients, while the same does not

hold for ClusterIP services. Finally, case (c) con-

siders a Deployment workload used to deploy a mes-

sage queue, without any Kubernetes service balacing

its load. In this case, we obtain a singleton Service

node. Case (c) also provides an example of µTOSCA

node that may be typed as Service only temporarily:

the type of queue might be changed to Asynchrono-

usMessageBroker, if it actually implements an asyn-

chronous message broker (Section 3.2).

Connecting Topology Fragments. The topology

fragments obtained by parsing the Kubernetes man-

ifest ﬁles are interconnected to model the runtime

interactions occurring among the components they

model. This is done by parsing the Kiali graph,

which explicitly models the component interactions

that were monitored in a former deployment of the

application, e.g., in its production deployment.

A monitored interaction is represented as an edge

in the Kiali graph, which connects the node corre-

sponding to the the Kubernetes workload that started

the interaction to the Kubernetes service that was in-

voked.

Each edge is hence mapped to InteractsWith

relationships connecting the Service node modelling

the starting workload to the MessageRouter node

modelling the target Kubernetes service. In addition,

the mined InteractsWith relationship are directly set to

enact dynamic

discovery, given that Kubernetes pre-

scribes to invoke services based on their name and to

rely on Kubernetes’ native DNS to resolve the address

of the actual host to contact.

Figure 5 illustrates an examples of mined Inte-

ractsWith relationship, which connects two of the

topology fragments in Figure 4. The Kiali graph spec-

iﬁes that the Kubernetes workload running the fron-

tend service interacted with the Kubernetes service

managing the replicated orders service. This is mod-

elled by including an InteractsWith relationship con-

necting the corresponding µTOSCA nodes, viz., the

Service frontend and the MessageRouter orders.svc.

Kiali uniﬁes a Kubernetes service with the Kubernetes

worklaod it manages, assuming that Kubernetes services

are used to enact server-side service discovery, as recom-

mended by Kubernetes documentation (https://kubernetes.

io/docs/concepts/services-networking/service).

edge

frontend

orders.svc

orders

frontend.svc

Kiali μTOSCA

frontend

orders

Figure 5: Examples of µTOSCA InteractsWith relationship

mined from the Kiali graph.

3.2 Step 2: Reﬁnement

The mining step produces a “draft” of the µTOSCA

topology modelling the target MSA, in which all

nodes and interactions are recognised, but associated

with default types and properties.

Node Reﬁnement. After the mining step, nodes

are associated with either one of two types: Messa-

geRouter or Service. Whilst nodes types as Messa-

geRouter are truly routing messages in the Kuber-

netes deployment of an MSA (being them obtained

from Kubernetes services), Service is used as the de-

fault type for all other nodes. Nodes initially typed

as Services may however implement other compo-

nents than those running some business logic, namely

databases or asynchronous message brokers. The ob-

jective of the node reﬁnement substep is hence to

identify such nodes and assign them with the corre-

sponding µTOSCA type, viz., Database and Asynch-

ronousMessageBroker.

Database and message brokers can be seen as

“passive” components: despite they reply when being

invoked by other components, they are not proactively

invoking other components (Soldani et al., 2021).

They hence appear as “sink nodes” in the mined

µTOSCA topology graph, meaning that they are tar-

geted by InteractsWith relationships, whilst no such

relationship outgoes from them. The node reﬁnement

substep hence focuses on the Service nodes being sink

nodes, and determines whether they should be rather

typed as Database or AsynchronousMessageBroker.

This is essentially done by looking at the Docker im-

age running in the corresponding Kubernetes work-

load: if such image is one of the ofﬁcial Docker im-

ages for databases or message brokers (Table 1), then

the node’s type is changed to Database or Asynch-

ronousMessageBroker, respectively. Otherwise, the

node continues to be a Service.

As a result, there might be nodes typed as Servi-

ces, despite they are implementing some databases or

message brokers by means of unofﬁcial Docker im-

ages. If this is the case, the application developer can

Ofﬂine Mining of Microservice-based Architectures

reﬁne the generated µTOSCA representation of the

target MSA by suitably changing their types. To sup-

port this, the implementation of our technique (which

we describe in Section 4) not only features the fully

automated mode described above, but also an inter-

active mode prompting developers when a sink node

may implement something different from a Servi-

ce. This enables them to explicitly indicate whether

such node should be typed as Service, Database, or

AsynchronousMessageBroker.

Interaction Reﬁnement. The interaction reﬁnement

substep is intended to characterise mined interactions

by associating them with other properties than the

default dynamic discovery property included during

the mining step. More precisely, it associates each

mined InteractsWith relationship with properties cir-

cuit breaker and timeout, if a circuit breaker or a

timeout is used during the corresponding interactions.

This is done by inspecting the Istio trafﬁc manage-

ment rules deﬁned for the service targeted by each

mined InteractsWith relationship.

Istio trafﬁc management rules are deﬁned in

VirtualServices and DestinationRules (Section 2.3).

VirtualServices allow explicitly setting a timeout

ﬁeld to indicate the maximum amount of time after

which the interaction with the target service is con-

sidered to have failed (Figure 6a). DestinationRules

instead feature a ﬁeld outlierDetection that allows

deﬁning circuit breaking policies, by setting the max-

imum number of tolerated consecutive errors before

the circuit breaker trips, as well as the amount of time

it remains tripped (Figure 6b).

The interaction reﬁnement substep hence checks

whether VirtualServices or DestinationRules are de-

ﬁned for the target of each interaction. To avoid un-

necessarily browsing the input Kubernetes manifest

ﬁles, it relies on the metadata included in the Kiali

graph. Kiali indeed already determines whether the

target of an interaction is reached through a Virtual-

Service or through a DestinationRule deﬁning some

circuit breaking policy. If this is the case, Kiali asso-

ciates the service targeted by a monitored interaction

with properties hasVS or hasCB, respectively. There-

fore, if the property hasVS is set for a service in the

Kiali graph corresponding to the target of a mined

InteractsWith relationship, the interaction reﬁnement

Table 1: Ofﬁcial Docker images of software implementing

a database or message broker.

Databases Message Brokers

cassandra, db2, iris,

mariadb, mongo, mysql,

neo4j, oracle, postgres,

redis, sqlite

activemq, kafka,

mosquito, nats,

rabbitmq

kind : V ir tu alS er vi ce

spec :

hosts :

- o rder s

http :

- ro ute :

- d es tinat io n :

host : orde rs

tim eou t : 0.5 s

(a)

kind : De st in at io nRule

spec :

host : cat alo g

tr affic Po li cy :

co necti on Po ol :

tcp : { max Co nn ect io ns : 1}

http :

ht tp 1M ax Pe nd ing Re qu es ts : 1

ma xR eq ue st sPe rC on ne ct io n : 1

ou tl ie rD ete ct io n :

co ns ec ut iv e5 xx Er ro rs : 1

int erv al : 1

ba se Ej ec tio nT im e : 3m

ma xE je ct io nP er ce nt : 100

(b)

Figure 6: Examples of how (a) timeouts and (b) circuit

breakers can be deﬁned with Istio trafﬁc management rules.

substep looks for the corresponding VirtualService in

the Kubernetes manifest ﬁles. It then checks whether

such VirtualService sets some timeout (similarly to

Figure 6a). If this is the case, timeout is set on the cor-

responding interactsWith relationship, to model that a

timeout is used therein.

Similarly, if the service in the Kiali graph corre-

sponding to the target of a mined InteractsWith re-

lationships has the property hasCB set, the interac-

tion reﬁnement substep looks for the corresponding

DestinationRule in the Kubernetes manifest ﬁles. The

interaction reﬁnement substep then checks whether

such DestinationRule sets a circuit breaking policy

through the outlierDetection ﬁeld (similarly to

Figure 6b). If this is the case, circuit breaker is set

on the corresponding InteractsWith relationship, to

model that a circuit breaker is used therein.

Instead, if none of the above applies, e.g., since no

Istio trafﬁc management rule is applied to the compo-

nent targeted by a relationship, the latter remains with

the only dynamic discovery property set.

CLOSER 2022 - 12th International Conference on Cloud Computing and Services Science

μTOM

Kubernetes

manifest files

JSON

Kiali graph

YAML

μTOSCA

specification

Parsers

Refiners GraphMain

Writer

YAML

Figure 7: Architecture of µTOM.

4 PROTOTYPE

To assess the feasibility of our mining technique, we

developed the µTOM (µTOSCA Ofﬂine Miner), an

open source prototype tool implemented in Java.

µTOM provides a command-line interface that auto-

matically generates the µTOSCA speciﬁcation of an

MSA, given the Kubernetes manifest ﬁles specifying

the corresponding application deployment and a Kiali

graph obtained from an existing deployment. It does

so by featuring both the automated and interactive

modes described in Section 3.2.

µTOM consists of the ﬁve components shown in

Figure 7. Main implements the command-line in-

terface offered by µTOM and coordinates the other

components for enacting our two-steps mining tech-

nique. It ﬁrst invokes Parsers, which resembles the

Java classes implementing the logic for running the

mining step (Section 3.1) by parsing the input Kuber-

netes manifest ﬁles and Kiali graph. The mined MSA

is represented by istantiating Graph, and returned to

Main. The latter then invokes Reﬁners, which resem-

bles the Java classes implementing the logic of the

reﬁnement step (Section 3.2). This results in updating

the Graph instance, which is reﬁned by updating the

types associated with mined nodes and by character-

izing mined relationships. The reﬁned Graph instance

is returned to Main, which passes it to Writer. The lat-

ter implements the logic for marshalling the received

Graph instance to a µTOSCA speciﬁcation in YAML,

which constitutes the output of µTOM.

µTOM can be run by issuing:

java - jar microT OM -1.0. jar W ORK DIR [ - i]

“microTOM-1.0.jar” is the executable ﬁle JAR ﬁle

obtained from µTOM’s sources. “WORKDIR” is instead

the path to a directory containing the Kubernetes man-

ifest ﬁles and the Kiali graph to be passed as input to

µTOM, and where µTOM will also store the gener-

ated µTOSCA ﬁle. Finally, the option “-i” enables

https://github.com/di-unipi-socc/microTOM.

activating the interactive reﬁnement mode, prompt-

ing the user with the nodes that remain assigned with

type Service, even if their interactions are such that

they may implement some different component. By

default, µTOM however runs the fully automated re-

ﬁnement mode.

5 CASE STUDIES

To assess our approach, we exploited µTOM to mine

the MSA of two open source, third-party applications,

namely Robot Shop (Instana, 2021) and Book Info (Is-

tio, 2021). We actually compared the MSA mined by

µTOM (in its fully automated mode) with that de-

clared in the online available documentation of the

considered applications, as well as with that mined

by µMINER, the state-of-the-art tool for mining the

µTOSCA speciﬁcation of an MSA. As a result, we

observed that µTOM effectively mined the MSA of

the considered applications (as per what declared in

their documentation), and that it generated more in-

formative µTOSCA speciﬁcations if compared with

µMINER. For instance, µTOM identiﬁed the use of

timeouts and circuit breakers in mined interactions,

which were not instead detected by µMINER.

We hereafter report on the mining of the MSAs of

Robot Shop (Section 5.1) and Book Info (Section 5.2).

To enable repeating all our experiments, we anyhow

published all the necessary inputs on GitHub.

5.1 Robot Shop

Robot Shop (Instana, 2021) is a microservice-based

application simulating an e-commerce website selling

robots. It does not include failure handling mecha-

nisms like timeouts or circuit breakers, and its MSA

is documented to be as shown in Figure 8a.

We run both µMINER and µTOM on the pub-

licly available Kubernetes deployment of Robot Shop

to automatically generate a µTOSCA representation

of the MSA of Robot Shop. In both cases we ex-

ploited the Robot Shop’s load component to gener-

ate workload for the application and monitor the run-

time interactions among its components. In the case

of µMINER, we used the load directly in its dynamic

mining step (Muntoni et al., 2021). In the case of

µTOM, we instead deployed the application, used

the load component to generate workload, and then

downloaded the Kiali graph from the running deploy-

ment, which we then provided as input to µTOM

https://github.com/di-unipi-socc/microTOM/tree/

main/data/examples.

Ofﬂine Mining of Microservice-based Architectures

redis

payment

mongodb

mysql

rabbitmq

dispatch

users

catalogue

cart

web

ratings

shipping

(a)

edge

mysql

mysql.svc

ratings

ratings.svc

shipping

shipping.svc

catalogue

catalogue.svc

web

web.svc

mongodb

mongodb.svc

users

users.svc

redis

redis.svc

cart

carts.svc

payment

payment.svc

rabbitmq.svc

rabbitmq

dispatch

dispatch.svc

load

d d

(b)

edge

mysql

mysql.svc

ratings

ratings.svc

shipping

shipping.svc

catalogue

catalogue.svc

web

web.svc

mongodb

mongodb.svc

users

users.svc

redis

redis.svc

cart

carts.svc

payment

payment.svc

rabbitmq.svc

rabbitmq

dispatch

dispatch.svc

d d

(c)

Figure 8: MSAs of Robot Shop (a) taken from its documen-

tation and mined with (b) µMINER and (c) µTOM. Issues

in the MSA mined by µMINER are highlighted in yellow.

itself. The µTOSCA representations generated by

µMINER and µTOM are shown in Figure 8b and Fig-

ure 8c, respectively.

By looking at Figure 8, we can observe that both

µMINER and µTOM successfully identiﬁed all com-

ponents forming the MSA of Robot Shop. Given that

they are deployed as Kubernetes workloads managed

by Kubernetes services, each mined node is proxied

by a MessageRouter node implementing the corre-

sponding Kubernetes service. At the same time, we

can observe that there are some issues in the nodes

mined by µMINER, viz., (i) mongodb and mysql are

not recognised to be Databases, but rather typed as

Services, and (ii) the load component used to gener-

ated workload is included in the mined MSA, even

if it is not truly part of the MSA of Robot Shop.

The same does not hold for µTOM, which success-

fully identiﬁes mongodb and mysql as Databases, and

which does not include the load component in the

mined MSA.

In addition, whilst both µMINER and µTOM ef-

fectively characterise the mined InteractsWith rela-

tionships, two relationships are missing in the MSA

mined by µMINER. The latter does not include the

interactions from payment and catalog to the Ku-

bernetes services managing rabbitmq and mongodb,

respectively. As a result, the portions including

rabbitmq and mongodb results to be disconnected

from the rest of the MSA in the µTOSCA topol-

ogy mined by µMINER. The same does not hold for

the µTOSCA topology mined by µTOM, which suc-

cessfully identiﬁes all the interactions in the MSA of

Robot Shop.

5.2 Book Info

Book Info (Istio, 2021) is a microservice-based ap-

plication developed to experiment Istio. It consists

of the four services in Figure 9a. We instrumented

its Kubernetes deployment by exploiting Istio to set

a timeout in the interactions between productpage

and details, and a circuit breaker in that between pro-

ductpage and reviews. This enabled us to show that

µTOM outperforms µMINER in determining whether

timeouts or circuit breakers are used in interactions.

The above can be readily observed by looking at

the µTOSCA representations of the MSA of Book

Info generated by µMINER and µTOM, which are

shown in Figure 9b and Figure 9c, respectively.

Whilst both µMINER and µTOM effectively mined all

components and interactions forming Book Info, only

µTOM successfully detected the timeout and circuit

breaker used in the InteractsWith relationships outgo-

ing from productpage.

CLOSER 2022 - 12th International Conference on Cloud Computing and Services Science

productpage

reviews

details

ratings

Istio timeout set

Istio circuit breaker set

(a)

edge

details

productpage

reviewsreviews.svc

ratingsratings.svc

productpage.svc

details.svc

(b)

edge

details

productpage

reviewsreviews.svc

ratingsratings.svc

productpage.svc

details.svc

d t

d c

(c)

Figure 9: MSAs of Book Info (a) taken from its documenta-

tion and mined with (b) µMINER and (c) µTOM. Issues in

the MSA mined by µMINER are highlighted in yellow.

5.3 Summary

Table 2 shows the success percentages of µMINER

and µTOM in identifying and typing nodes in our case

studies. The table also shows their success percent-

ages in identifying the interactions occurring among

such nodes, and in characterising such interactions

by eliciting whether dynamic discovery, timeouts, or

circuit breakers were used therein. In both cases,

the success percentages are counted as the ratio of

successfully identiﬁed/characterised nodes and inter-

actions over all those appearing in the applications

in our case studies. The numbers in Table 2 again

show how µTOM outperformed µMINER in mining

the MSAs of the two considered applications.

Table 2: Success percentages of µMINER and µTOM in

identifying and characterising the nodes and interactions in

the applications considered in our case studies.

nodes interactions

identiﬁed typed identiﬁed characterised

µMINER 100% 87.5% 97.3% 91.9%

µTOM 100% 100% 100% 100%

6 RELATED WORK

Several solutions have been proposed for mining the

MSAs of existing applications. The closest to ours

is µMINER (Muntoni et al., 2021), which is the only

existing solution mining a µTOSCA representation

of an application’s MSA from its Kubernetes deploy-

ment. As we already discussed in Section 1, µMINER

requires to run in a Kubernetes cluster with root priv-

ileges. This limits the applicability of µMINER, e.g.,

not allowing it to consider an existing deployment of

the application, like its production deployment. The

same does not hold for our technique, which can

work with existing Kubernetes application deploy-

ments, therein included their production deployment.

We also compared µMINER and our technique in their

mining capabilities, showing that our technique out-

performs µMINER in the quality of the µTOSCA rep-

resentation of mined MSAs (Section 5).

Other approaches worth mentioning are (Ma et al.,

2018), (Rademacher et al., 2020), (Alshuqayran et al.,

2018), (Granchelli et al., 2017b), and (Granchelli

et al., 2017a). They introduce different techniques,

which differ from ours in the mining approach and in

the generated representation of mined MSAs. As for

the latter, we generate a representation of the mined

MSA where components are distinguished among ser-

vices, integration components, and databases. This

is intended to enable checking whether the mined

MSA is affected by some architectural smells, by

giving the mined MSA to smell detection tools like

µFRESHENER (Brogi et al., 2020). The same is not

supported by (Ma et al., 2018), (Rademacher et al.,

2020), (Alshuqayran et al., 2018), (Granchelli et al.,

2017b), and (Granchelli et al., 2017a), which do not

distinguish the type of mined components.

As for the enacted mining approach, (Ma et al.,

2018), (Rademacher et al., 2020), and (Alshuqayran

et al., 2018) reconstruct the MSA of an application

by statically analysing the source code of its compo-

nents. They hence follow a “white-box” approach, as-

suming the availability of the source code of the com-

ponents forming a MSA. Our mining technique in-

stead works also in “black-box” scenarios, viz., when

the source code of application components is not

available. We indeed only require the manifest ﬁles

specifying its deployment in Kubernetes and the run-

time interactions monitored among its components.

In addition, while our mining solution can be fully

automated, both (Rademacher et al., 2020) and (Al-

shuqayran et al., 2018) require developers to manu-

ally intervene while mining a MSA.

Similar considerations apply to (Granchelli et al.,

2017a) and (Granchelli et al., 2017b), which also em-

Ofﬂine Mining of Microservice-based Architectures

ploy a white-box, semi-automated technique to mine

a MSA from the source code of its components. Such

technique is semi-automated since it relies on devel-

opers to manually reﬁne the obtained MSA by remov-

ing the infrastructure facilities (e.g., service discov-

ery components) used to let application components

interoperate. At the same time, (Granchelli et al.,

2017a) and (Granchelli et al., 2017b) are a step closer

to ours, given that they enrich the mined MSA by rely-

ing on runtime monitored interactions. Our technique

hence differs from what proposed in (Granchelli et al.,

2017a) and (Granchelli et al., 2017b), since it can

fully automate the mining of a MSA, and since it

can work in black-box scenarios, i.e., when the source

code of some application components is not available.

Finally, it is worth relating our mining technique

with existing system for monitoring Kubernetes-

based application deployments. For instance, Kiali

(Section 2.3), KubeView (Coleman, 2021), and

WeaveScope (Weaveworks, 2021) are three open

source tools for monitoring and visualising the struc-

ture of applications deployed with Kubernetes. They

differ from our technique mainly because of their ulti-

mate goal, which is to enable visualising the deployed

Kubernetes objects (e.g., workloads and services) and

their interactions. Our solution instead generates a

machine-readable representation of a MSA, whose

components are distinguished among services, inte-

gration components, and databases forming a MSA,

and where component interactions are characterised

by indicating whether client-side service discovery,

timeouts, or circuit breakers are used therein.

Similar considerations apply to Instana (Instana,

2021), another tool for visualising applications de-

ployed with Kubernetes. Instana (Instana, 2021) is

however closer to our mining technique in the gen-

erated representation, given that it distinguishes the

deployed component among services and databases.

Our mining technique goes beyond this, by recog-

nising whether deployed components are imple-

menting message routing/brokering patterns, and

whether service discovery, timeouts, or circuit break-

ers are used in component interactions. Additionally,

whilst Instana (Instana, 2021) is a commercial and

subscription-based tool, an open source implementa-

tion of our technique is publicly available on GitHub.

7 CONCLUSIONS

We presented a technique for mining MSAs from

their Kubernetes deployment. Our technique also in-

puts the component interactions monitored in a for-

mer deployment with Kiali, and it process all such

μMiner

μFreshener

μTOSCA

Kiali graph

JSON

μTOM

Kubernetes

manifest files

YAML

"smell-free"

μTOSCA

YAML

Figure 10: Updated µTOSCA toolchain. Existing tools are

in light blue, whilst the newly introduced tool is darker.

inputs ofﬂine. As a result, it automatically generates

a representation of the mined MSA in µTOSCA, a

microservice-oriented proﬁle of the TOSCA standard.

We have also presented µTOM, a prototype im-

plementation of our mining technique. µTOM plugs

into the µTOSCA toolchain (Soldani et al., 2021),

as shown in Figure 10. It actually provides an of-

ﬂine alternative to µMINER (Muntoni et al., 2021) to

generate µTOSCA representations of MSAs, which

can still be processed by µFRESHENER (Neri et al.,

2020) to identify and resolve the architectural smells

therein. µTOM showed to outperform µMINER in

generating more informative representations of mined

MSAs, without requiring to run the target applica-

tion in a suitably conﬁgured testing environment, but

rather by processing the information monitored with

Kubernetes-native monitoring in former application

deployments, e.g., production deployments. If such

information is not available, e.g., since Kubernetes-

native monitoring is not enabled, one could anyhow

still use µMINER to mine the MSA of an application.

We anyhow plan to further enhance the mining ca-

pabilities of µTOM and, more generally, of our min-

ing technique. For instance, we plan to enhance the

detection of the type of mined components, which

currently detects message brokers or databases when

they run from ofﬁcial Docker images of software dis-

tributions known to implement such components. The

type of component run by a Docker image may be

detected by exploiting machine learning techniques,

e.g., similary to what done in (Guidotti et al., 2019)

to predict the popularity of Docker images, or by in-

specting them with approaches like that proposed in

DockerFinder (Brogi et al., 2017).

We also plan to enable our technique to work with

other technologies than Kubernetes, Istio, and Kiali.

For instance, we plan to include support for mani-

fest ﬁles specifying the deployment of a microservice-

based application with Docker Compose/Swarm. We

also plan to support processing the interactions mon-

itored with other tracing tools, e.g., Jaeger (Jaeger,

2021) or Zipkin (OpenZipkin, 2021).

CLOSER 2022 - 12th International Conference on Cloud Computing and Services Science

REFERENCES

Alshuqayran, N., Ali, N., and Evans, R. (2018). To-

wards micro service architecture recovery: An em-

pirical study. In Gorton, I., Buhnova, B., Ernst, N.,

and Szyperski, C., editors, 2018 IEEE International

Conference on Software Architecture, pages 47–4709.

IEEE. https://doi.org/10.1109/ICSA.2018.00014.

Brogi, A., Neri, D., and Soldani, J. (2017). Dockerﬁnder:

Multi-attribute search of docker images. In 2017

IEEE International Conference on Cloud Engineering

(IC2E), pages 273–278. https://doi.org/10.1109/IC2E.

2017.41.

Brogi, A., Neri, D., and Soldani, J. (2020). Freshen-

ing the air in microservices: Resolving architectural

smells via refactoring. In Yangui, S., Bouguettaya,

A., Xue, X., Faci, N., Gaaloul, W., Yu, Q., Zhou,

Z., Hernandez, N., and Nakagawa, E. Y., editors,

Service-Oriented Computing – ICSOC 2019 Work-

shops, pages 17–29, Cham. Springer. https://doi.org/

10.1007/978-3-030-45989-5 2.

Coleman, B. (2021). KubeView. https://github.com/

benc-uk/kubeview.

Forti, S., Bisicchia, G., and Brogi, A. (2022). Declar-

ative continuous reasoning in the cloud-IoT contin-

uum. Journal of Logic and Computation. https:

//doi.org/10.1093/logcom/exab083.

Google Cloud (2021). Online Boutique. https://github.com/

GoogleCloudPlatform/microservices-demo.

Granchelli, G., Cardarelli, M., Di Francesco, P., Malavolta,

I., Iovino, L., and Di Salle, A. (2017a). MicroART:

A software architecture recovery tool for maintain-

ing microservice-based systems. In Malavolta, I. and

Capilla, R., editors, 2017 IEEE International Confer-

ence on Software Architecture Workshops, pages 298–

302. IEEE. https://doi.org/10.1109/ICSAW.2017.9.

Granchelli, G., Cardarelli, M., Di Francesco, P., Malavolta,

I., Iovino, L., and Di Salle, A. (2017b). Towards

recovering the software architecture of microservice-

based systems. In Malavolta, I. and Capilla, R., edi-

tors, 2017 IEEE International Conference on Software

Architecture Workshops, pages 46–53. IEEE. https:

//doi.org/10.1109/ICSAW.2017.48.

Guidotti, R., Soldani, J., Neri, D., Brogi, A., and Pe-

dreschi, D. (2019). Helping your Docker images

to spread based on explainable models. In Brefeld,

U., Curry, E., Daly, E., MacNamee, B., Marascu,

A., Pinelli, F., Berlingerio, M., and Hurley, N., edi-

tors, Machine Learning and Knowledge Discovery in

Databases, pages 205–221, Cham. Springer. https:

//doi.org/10.1007/978-3-030-10997-4

13.

Hohpe, G. and Woolf, B. (2003). Enterprise Integration

Patterns: Designing, Building, and Deploying Mes-

saging Solutions. Addison-Wesley, 1st edition.

Instana (2021). Instana. https://www.instana.com.

Instana (2021). Robot Shop. https://github.com/instana/

robot-shop.

Istio (2021). Book Info. https://github.com/istio/istio/tree/

master/samples/bookinfo.

Jaeger (2021). Jaeger. https://www.jaegertracing.io.

Kratzke, N. and Quint, P.-C. (2017). Understanding cloud-

native applications after 10 years of cloud comput-

ing - a systematic mapping study. Journal of Systems

and Software, 126:1–16. https://doi.org/10.1016/j.jss.

2017.01.001.

Ma, S., Fan, C., Chuang, Y., Lee, W., Lee, S., and Hsueh,

N. (2018). Using service dependency graph to ana-

lyze and test microservices. In Reisman, S., Ahamed,

S. I., Demartini, C., Conte, T. M., Liu, L., Clay-

comb, W. R., Nakamura, M., Tovar, E., Cimato, S.,

Lung, C.-H., Takakura, H., Yang, J.-J., Akiyama, T.,

Zhang, Z., and Hasan, K., editors, 2018 IEEE 42nd

Annual Computer Software and Applications Confer-

ence, pages 81–86. IEEE. https://doi.org/10.1109/

COMPSAC.2018.10207.

Muntoni, G., Soldani, J., and Brogi, A. (2021). Min-

ing the architecture of microservice-based applica-

tions from their kubernetes deployment. In Zirpins,

C., Paraskakis, I., Andrikopoulos, V., Kratzke, N.,

Pahl, C., El Ioini, N., Andreou, A. S., Feuerlicht,

G., Lamersdorf, W., Ortiz, G., Van den Heuvel, W.-

J., Soldani, J., Villari, M., Casale, G., and Plebani,

P., editors, Advances in Service-Oriented and Cloud

Computing, pages 103–115, Cham. Springer. https:

//doi.org/10.1007/978-3-030-71906-7 9.

Neri, D., Soldani, J., Zimmermann, O., and Brogi,

A. (2020). Design principles, architec-

tural smells and refactorings for microser-

vices: a multivocal review. SICS Software-

Intensive Cyber-Physical Systems, 35(1):3–15.

https://doi.org/10.1007/s00450-019-00407-8.

OASIS (2020). TOSCA simple proﬁle in

YAML. v1.3, https://docs.oasis-open.org/

tosca/TOSCA-Simple-Proﬁle-YAML/v1.3/

TOSCA-Simple-Proﬁle-YAML-v1.3.pdf.

OpenZipkin (2021). Zipkin. https://zipkin.io.

Rademacher, F., Sachweh, S., and Z

undorf, A. (2020).

A modeling method for systematic architecture re-

construction of microservice-based software systems.

In Nurcan, S., Reinhartz-Berger, I., Soffer, P.,

and Zdravkovic, J., editors, Enterprise, Business-

Process and Information Systems Modeling, pages

311–326, Cham. Springer. https://doi.org/10.1007/

978-3-030-49418-6

21.

Soldani, J., Muntoni, G., Neri, D., and Brogi, A. (2021).

The µTOSCA toolchain: Mining, analyzing, and

refactoring microservice-based architectures. Soft-

ware: Practice and Experience, 51(7):1591–1621.

https://doi.org/10.1002/spe.2974.

Soldani, J., Tamburri, D. A., and Van Den Heuvel, W.-J.

(2018). The pains and gains of microservices: A sys-

tematic grey literature review. Journal of Systems and

Software, 146:215–232. https://doi.org/10.1016/j.jss.

2018.09.082.

Weaveworks (2021). WeaveScope. https://www.weave.

works/oss/scope.

Zimmermann, O. (2017). Microservices tenets.

Computer Science: Research and Develop-

ment, 32(3–4):301–310. https://doi.org/10.1007/

s00450-016-0337-0.

Ofﬂine Mining of Microservice-based Architectures