Model-based Recovery Connectors for Self-adaptation

and Self-healing

Emad Albassam, Hassan Gomaa and Daniel Menascé

Department of Computer Science, George Mason University, Fairfax, Virginia, U.S.A.

Keywords: Self-adaptation, Self-configuration, Self-healing, Dynamic Software Adaptation, Autonomic Computing,

Component Recovery, Recovery Patterns, MAPE-K Loop Model, Recovery Connectors, State Machines.

Abstract: Self-healing and self-configuration are highly desirable properties in software systems so that components

can dynamically adapt to changing environments and recover from failure with minimal human intervention.

This paper discusses a model-based approach for self-healing and self-configuration using recovery

connectors. A recovery connector extends connectors in component-based software architectures and service-

oriented architectures with self-healing and self-configuration capabilities so that a component or service can

be dynamically adapted and recovered from failures. The design of the recovery connector is based on the

MAPE-K loop model and can handle both recovery and adaptation.

1 INTRODUCTION

Connectors in component-based software

architectures (CBSA) are objects that interconnect

components and encapsulate a communication

protocol (Gomaa, 2011). Connectors encapsulate

frequently used communication patterns such as

asynchronous communication and synchronous

communication with reply. Previous papers

investigated adaptation connectors which are used to

adapt service-oriented software systems after original

deployment (Gomaa et al., 2010).

This paper investigates how a model-based

recovery connector integrates self-healing and self-

configuration capabilities. Recovery connectors are

used to separate adaptation and recovery concerns

from service concerns so that a service can be

transparently adapted and recovered from failures.

Recovery connectors are described for

architectural communication patterns that are

frequently used in service-oriented architectures

(SOA). The main architectural pattern in a SOA is the

client/coordinator/service pattern in which a

coordinator is an intermediary between clients and

service, with the goal of allowing services to be

autonomous and relatively independent of each other.

Within this overarching pattern, several other

communication patterns are used including

synchronous communication with reply,

asynchronous communication with callback, and

various brokering patterns including service

registration, and brokered communication.

Software adaptation involves dynamically

replacing, adding, or removing service, coordinator,

or client components at run-time in service-oriented

applications. Software recovery involves

dynamically replacing service, coordinator, or client

components after a run-time failure.

The contributions of this paper are the design and

validation of recovery connectors that dynamically

adapt and recover stateless and stateful services,

when client requests are idempotent, for different

architectural communication patterns in service-

oriented architectures.

The paper is organized as follows: Section 2

highlights key concepts and assumptions. Section 3

discusses the design of recovery connectors. Section

4 describes how recovery connectors can be used in

different SOA patterns. Section 5 contains validation

results. Section 6 discusses related work. Section 7

concludes the paper and discusses future work.

2 KEY CONCEPTS

This section describes the key concepts for providing a

systematic and reusable approach for self-healing and

self-configuration of CBSAs (Taylor et al., 2009).

Albassam, E., Gomaa, H. and Menascé, D.

Model-based Recovery Connectors for Self-adaptation and Self-healing.

DOI: 10.5220/0006005900790090

In Proceedings of the 11th International Joint Conference on Software Technologies (ICSOFT 2016) - Volume 1: ICSOFT-EA, pages 79-90

ISBN: 978-989-758-194-6

Autonomic Control. Manual management of large

and complex software systems is difficult and costly.

Consequently, such systems should have the

following autonomic properties: self-healing, self-

configuration, self-optimization, and self-protection

(Kephart and Chess, 2003). The MAPE-K loop model

is widely used to implement autonomic controllers

and consists of four activities (monitoring, analysis,

planning, and execution) that operate on a

knowledge-base of the system. We use the general

MAPE-K loop model to support self-healing and self-

configuration of autonomic services.

Recovery Connectors. Recovery connectors are

used to separate adaptation and recovery concerns

from service concerns so that a service can be

transparently adapted and recovered from failures.

Recovery Patterns. A recovery pattern defines how

components in an architectural pattern can be

dynamically relocated and recovered to a consistent

state after a component has failed.

Message-Based Transactions. A transaction in

CBSAs is defined by Kramer and Magee as an

information exchange between multiple components

through messages (Kramer and Magee, 1990) while a

transaction in transactional processing systems is

defined as an atomic unit of work (Bernstein and

Newcomer, 2009). We combine these two definitions

as: a transaction is an information exchange between

two or more components through messages such that

either all messages in a transaction are eventually

exchanged or none of them are.

We make the following assumptions here:

• Only one component can fail permanently at a

time based on the fail-stop failure model

(Avizienis et al., 2004) in which components do

not send any erroneous messages but simply

cease functioning when they fail. Furthermore,

we assume that failures are not caused by

malicious attacks.

• Message delivery uses a reliable network

transport protocol.

• Recovery connectors do not fail.

• Clocks are synchronized between all nodes.

• Services can be either stateless of stateful with

idempotent operations.

3 RECOVERY CONNECTORS

This section describes the design of the basic

structure of a recovery connector for service-oriented

architectures. We assume that there are multiple

clients and a single service that processes multiple

client requests concurrently. The service responds to

each request from the client. The next section shows

how the same recovery connector design can handle

adaptation and recovery in other, more complex

architectural patterns.

The recovery connector manages transactions

between a client and a service that comprise either

single request/response messages or a dialog.

3.1 Design of the Recovery Connector

The service recovery connector (fig. 1) behaves as a

proxy for the service by receiving requests from

clients and then forwarding these requests to the

service. The recovery connector also receives

responses from the service, which are then forwarded

to requesting clients.

To ensure safe adaptation at run-time and

recoverability of service failures, the service recovery

connector must keep track of the transactions that the

service is currently engaged in and must maintain

messages (i.e. requests and responses) that pass

through it, so that these messages can be held during

adaptation and can be recovered in case the service

fails.

The service recovery connector has a control

object (Connector Control in fig. 1) that handles

sending messages to and receiving responses from

application components, and also handles adaptation

and recovery concerns of the service. To facilitate

maintenance of application messages, requests and

responses are stored by the connector in queues

located at the Service Request Manager and the

Service Response Manager (fig. 1), respectively.

Each manager is provided with a coordinator

component for controlling the queues it manages. The

goal of these coordinators is to separate the concerns

of queue management from adaptation and recovery

concerns handled by Connector Control.

3.2 Service Request Manager

Every request sent by a client to the service passes

through the Service Request Coordinator (fig 1). The

Service Request Coordinator maintains three queues

for storing client requests based on the status of these

requests, as follows:

Service Pending Queue (SPQ). The SPQ stores

requests received by the recovery connector from

clients but that have not yet been forwarded to the

service. The purpose of this queue is to buffer

requests for the service so that any requests received

by the connector while the service is being

ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications

<<connector>>

:Service Recovery Connector

<<control>>

:Service Response Manager

<<control>>

:Service Request Manager

:Response Recovery Queue

(RRQ)

:Response Forwarding Queue

(RFQ)

:Service Pending Queue

(SPQ)

Request

ACK

Response

Forward Response

Response

<<service>>

:Service

<<coordinator>>

:Coordinator

<<client>>

:Client

Queue Request

Dequeue Request

Queue Request

Dequeue Request

Queue Response

Dequeue Response

Forward Request

Forward Response

Transaction Completed

Forward Response

Transaction Completed

Queue Response

Remove Transaction Responses

<<coordinator>>

:Service Request

Coordinator

<<coordinator>>

:Service Response

Coordinator

:Service Recovery Queue

(SRQ)

Queue Request

Remove Transaction Requests

:Service Active Queue

(SAQ)

:Active Transactions

Count

Increment

Decrement

<<state-dependent-

control>>

:Connector Control

Figure 1: Design of service recovery connector showing messages during normal execution.

dynamically adapted or is in the failed state are held

in this queue until the service becomes active again.

Thus, the SPQ ensures that no requests to the service

are lost due to dynamic adaptation or recovery.

Service Active Queue (SAQ). This queue stores

client requests that have been forwarded to the service

but do not have corresponding service responses at

the recovery connector, either because the service is

still processing the request and has not generated the

corresponding response yet or because the service

response was lost due to service failure.

The service recovery connector uses this queue to

determine pending requests that must be processed by

the service first before the service can be dynamically

adapted. Furthermore, the recovery connector uses

this queue to recover requests that were lost by the

service (due to service failure) before the

corresponding responses of these requests are

received by the service recovery connector.

Service Recovery Queue (SRQ). This queue

stores client requests that have corresponding service

responses at the service recovery connector. This

queue ensures that previous requests of each dialog

that the service is currently engaged in are stored in

SRQ so that these dialogs can also be recovered in

case they were interrupted due to service failure.

3.3 Service Response Manager

Responses sent by the service are received by the

Service Response Coordinator (fig. 1). The Service

Response Coordinator maintains two queues for

storing service responses:

Response Forwarding Queue (RFQ): stores

responses from the service that have been received by

the recovery connector but have not been yet

forwarded to the requesting client.

Response Recovery Queue (RRQ): stores service

responses after they have been forwarded to

requesting clients. This queue ensures that a service

response that has been forwarded by the service

recovery connector to the requesting client cannot be

lost due to client failure. In this case, when the service

recovery connector receives a duplicate request from

a recovered client, the corresponding response is

obtained from the RRQ and then forwarded to the

recovered client, without requiring the service to

process the request again.

3.4 Connector Control State Machine

Connector Control (fig. 1) is a state-dependent

control component that handles recovery and

adaptation of the service by tracking its current state.

While the service is active, Connector Control keeps

Model-based Recovery Connectors for Self-adaptation and Self-healing

track of whether the service is currently engaged in

any transactions with its clients so that it can base its

adaptation and recovery decisions accordingly.

The Connector Control state machine (fig. 2)

consists of two orthogonal state machines (STMs).

Integrated Adaptation and Recovery is the orthogonal

STM that handles service adaption and recovery. The

Message Queue Management state machine is

responsible for notifying the Service Request

Coordinator and the Service Response Coordinator

when a client acknowledges the completion of a

transaction to enable these coordinators to remove the

requests and responses of this transaction from their

queues.

The orthogonal integrated adaptation and

recovery state machine (fig. 3) consists of three

composite states: (1) Active, which defines behaviour

during normal service execution, (2) Adapting, which

defines behaviour during dynamic service adaptation,

and (3) Recovering, which defines behaviour during

recovery.

Connector Control STM

Message Queue

Management STM

Integrated

Adaptation and

Recovery STM

ACK/

Transaction

Completed {to coordinators}

Figure 2: State machine executed by Connector Control.

3.4.1 Normal Service Execution

Initially, Connector Control is in the Waiting for

Request state (fig. 3) indicating that the service is

currently not engaged in any transactions with its

clients. When Connector Control receives a client

request, it forwards the request to the service,

increments the number of active transactions that the

service is currently engaged in, and then transitions to

the Processing state. While in the Processing state,

Connector Control forwards requests to the service

and forwards responses to requesting clients.

Connector Control remains in the Processing state as

long as the service is engaged in one or more

transactions. Furthermore, Connector Control

increments the number of active transactions when it

forwards a request that initiates a new transaction

with the service and decrements this number when it

receives the final response of a transaction from the

service. When Connector Control receives the final

response of the last transaction that the service is

currently engaged in, then Connector Control

forwards that response to the requesting client and

transitions back to the Waiting for Request state.

3.4.2 Dynamic Service Adaptation

In order to safely adapt the service at run-time, the

service must be in a quiescent state (Kramer and

Magee, 1990) in which it is not involved in any

transactions and will not receive any new transactions

from its clients. That is, the service can be removed

or replaced at run-time after the service has sent the

final response of every transaction that it is currently

engaged in. In passivating state, Connector Control

must not forward any requests that initiate new

transactions with the service, so that the service can

eventually transition to a quiescent state where it can

be safely adapted.

If Connector Control receives the Passivate

command from Change Management (Kramer and

Magee, 1990) while it is in the Waiting for Request

state (fig. 3), then the service is not engaged in any

transactions with its clients. It thus transitions

immediately to the Quiescent state, and notifies

Service Requests Coordinator that the service is

quiescent so that it holds all requests it receives from

clients in SPQ. On the other hand, if Connector

Control receives the Passivate command while it is in

the Processing state, then the service is engaged in

one or more transactions with its clients. In this case,

Connector Control transition to the Passivating state,

where the service completes existing transactions.

While in the Passivating state, Connector Control

forwards intermediate requests it receives to the

service and forwards service responses it receives to

requesting clients. Eventually, when all active

transactions are completed, Connector Control

notifies Service Requests Coordinator that the service

is transitioning to the Quiescent State where the

service can be safely adapted.

3.4.3 Service Recovery

While the service is in the recovering state, Connector

Control must not forward any requests and must

ensure that all failed transactions are restarted when

the service is recovered.

Recovering a service from failure is handled by

the connector using the MAPE-K loop model for self-

healing and self-configuration.

The monitoring activity of MAPE-K notifies the

recovery connector of service failure. When

Connector Control receives a failure notification, it

notifies Service Requests Coordinator of the failure

and then transitions to Analyzing Failure Events state

(fig. 3).

ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications

Figure 3: Integrated adaptation and recovery state machine executed by Connector Control.

The Analyzing Failure Events state corresponds

to the analysis activity of MAPE-K where the

recovery connector identifies all transactions that

were interrupted due to service failure. The service

recovery connector determines that a transaction has

failed if either SAQ or SRQ contains a request that

initiates a transaction with the service but neither

RFQ nor RRQ contains a response that completes that

transaction. When failure analysis is completed,

connector control transitions to Planning for

Recovery state.

The Planning for Recovery state corresponds to

the planning activity of MAPE-K where the recovery

connector determines the recovery plan for the failed

transactions. The plan identifies which requests must

be resent to the recovered service so that failed

transactions are restarted at the recovered service.

The recovery plan is determined by executing the

following recovery policy:

• First, the service recovery connector forwards

previous requests of every failed dialog that the

service was engaged in before it failed. These

requests are recovered from SRQ and are

forwarded sequentially in the same order they

were processed before service failure to ensure

that the recovered service also processes these

requests in that order.

• Second, the recovery connector forwards the

requests of failed transactions queued in SAQ,

which contains pending requests that were lost

by the failed service before the service recovery

connector received the responses to these

requests. Note that at this step, if a request that is

being forwarded is of a dialog, then (from the

previous step) the service must have already

received all previous requests of this dialog.

• Third, the recovery connector forwards all

requests in the SPQ, which are new requests that

Integrated Adaptation and Recovery STM

Active

Recovering

Adapting

First Request/

Forward Request

Increment {Active Transaction Count}

First Request/

Forward Request,

Increment {Active Transaction Count}

Intermediate Request, Final Request/

Forward Request

Passivate/

Notify Passivating

Analyzing Failure

Events

Planning For

Recovery

Component

Recovering

Failed/

Notify Failed

Failure Analysis Results

Recovery Plan

[Active Transactions Count > 0]

Reactivate [Active Transactions Count == 0]]/Notify Active

Executing

Recovery

Plan

Restored Lost Messages

Recovery Plan [Active Transactions Count = 0]

Reactivate/

Notify Active

Final Response

[Active Transactions Count = 1]/

Forward Response,

Decrement {Active Transaction Count}

First Response, Intermediate Response/

Forward Response

Final Response [Active Transactions Count > 1]/

Forward Response,

Decrement {Active Transaction Count}

First Response, Intermediate Response/

Forward Response

Final Response [Active Transactions Count > 1]/

Forward Response,

Decrement {Active Transaction Count}

Waiting For

Request

Processing

Passivate/

Notify Quiescent

Final Response

[Active Transactions Count = 1]/

Forward Response,

Decrement {Active Transaction Count},

Notify Quiescent

Passivating

Request/

Forward Request

Quiescent

Failed/Notify Failed

Reactivate [Active Transactions Count > 0]]/Notify Active

Model-based Recovery Connectors for Self-adaptation and Self-healing

have been received while the service is in the

recovering state, to the recovered service.

The Executing Recovery Plan state corresponds to the

execution activity of MAPE-K where the recovery

connector restores all requests that must be resent to

the recovered service by moving these requests from

SRQ and SAQ to SPQ, as specified in the recovery

plan. When all requests are restored, Connector

Control transitions to the Component Recovering

state in which the connector waits until the service is

relocated and instantiated by Change Management,

and then has its connection with the recovered service

established. Eventually, when Connector Control

receives the Reactive command, Connector Control

transitions to Active State and notifies Service

Requests Coordinator that the service is active so that

Service Requests Coordinator resumes sending

requests queued in SPQ to Connector Control.

3.5 Service Request Coordinator STM

Based on the discussion in the previous section, the

Service Request Coordinator must forward to

Connector Control certain types of client requests

based on the current state of the service, as shown in

fig. 4. While the service is active (fig. 4), the Service

Request Coordinator forwards all client requests it

receives to Connector Control and also queues these

requests in the SPQ.

When the Service Request Coordinator is notified

that the service is passivating, it transitions to the

Passivating state. The behavior of the Service

Requests Coordinator while in this state is similar to

its behavior in the Active state with one exception: in

the Passivating state, the Service Request Coordinator

does not forward to Connector Control any requests

that initiate a new transaction with the service, and

instead, queues such requests in the SPQ. Eventually,

the Service Request Coordinator is notified that the

service has become quiescent, causing the Service

Request Coordinator to transition to the Quiescent

state. While in the Quiescent state, the Service

Request Coordinator does not forward any requests to

Connector Control and instead queues them in the

SPQ. Finally, when service adaptation is completed,

the Service Request Coordinator receives a

notification from Connector Control that the service

is active, causing the Service Requests Coordinator to

transition to the Active state and to forward all

requests queued in the SPQ to Connector Control.

When service failures occur, the Service Request

Coordinator transitions to the Failed state. While in

the Failed state, the Service Request Coordinator

holds all client requests it receives in the SPQ. The

Service Request Coordinator may also receive

messages from the execution activity of MAPE-K to

restore any client requests that were lost due to

service failure. As a result, the Service Request

Coordinator moves these requests from the SRQ and

the SAQ to the head of the SPQ so that these requests

are resent to the recovered service. Finally, when the

service is recovered, the Service Request Coordinator

forwards all requests stored in the SPQ to Connector

Control and then transitions back to Active state.

4 RECOVERY AND

ADAPTATION PATTERNS

This section describes how the recovery connector

design discussed in the previous section can be used

to handle adaptation and recovery of components in

other architectural patterns (Gomaa, 2011).

Typical client/service communication uses the

Synchronous Message Communication with Reply

pattern, in which the client sends a message to the

service and waits for a response. In the

Asynchronous Message Communication with

Callback pattern, a client sends an asynchronous

request to the service but can continue executing and

receive the service response later. The asynchronous

request sent by the client to the service contains a

callback handle that the service uses when it finishes

processing the client request so that it can send the

response back to the client. A client in this pattern

does not send another request to the service until it

receives a response to the previous request.

Since in this pattern, a client sends one request at

a time to the service, the service recovery connector

(shown in fig. 1) handles requests and responses for

this pattern in the same way as for synchronous

communication with reply. Thus, although the client

behaviour is different, the service behaviour is not.

For this reason, the adaptation and recovery for the

Asynchronous Message Communication with

Callback pattern is handled in the same way as that

described in section 3.4.

In service-oriented architectures, a service

registers its name, location and service description

with a broker, which acts as an intermediary between

the clients and the service. In the Service

Registration pattern, the service initiates a

transaction with the broker by sending it a registration

request containing the service information. The

broker then registers the service and sends an

acknowledgement to the service. The service can also

re-register with the broker if it moves its location,

ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications

Figure 4: State Machine executed by Service Request Coordinator.

which requires another transaction between the

service and the broker.

From the adaptation and recovery point of view,

this pattern can be treated as a client that

communicates with a service using the Synchronous

Message Communication with Reply pattern. Thus,

the adaptation and recovery patterns for this

architectural pattern are exactly the same as those

described in section 3.4.

After the service has registered with the broker,

clients use the broker to locate the service. In the

Broker Handle pattern, a client sends a request to the

broker to obtain the service’s handle. The broker then

sends a response to the client containing the service’s

handle as a parameter. The client then uses the

service’s handle to interact with the service.

In this pattern, a client initiates two sequential

transactions by first initiating a transaction with the

broker to obtain the service’s handle and then by

initiating a transaction with the service using the

service’s callback handle. As a result, these

transactions can fail and be recovered independently

of each other.

A broker is adapted after it has completed all the

requests it has received, including brokering requests

from clients requesting a handle and service requests

for registration. New requests are held up until the

broker has been relocated. In the case of a broker

failure, all requests it is dealing with are aborted and

only restarted when the broker has been relocated and

instantiated. Both adaptation and recovery are carried

out as described in Section 3.

In service-oriented architectures, the goal is to

increase loose coupling between services so that

instead of services depending on each other,

coordinators are provided for situations where

multiple services need to be accessed, and access to

them needs to be coordinated and/or sequenced.

The coordinator may interact with the services

sequentially and/or concurrently. We consider a

coordinator interacting with multiple services as a

compound transaction that can be broken down into

an atomic transaction between the coordinator and

each service. In this case, when any of the services

fail, the service’s recovery connector restarts a failed

transaction with the service without affecting other

transactions that the coordinator is currently engaged

in with other services. Thus, the recovery and

adaptation patterns for services in this pattern are

exactly the same as discussed in section 3.4.

In the case of a client interacting with a

coordinator, if the coordinator needs to be adapted,

then the entire compound transaction must complete

before adaptation. In the coordinator failure, then the

entire compound transaction is aborted and is only

restarted after the coordinator has been recovered.

5 VALIDATION

This approach of self-healing and self-configuration

was validated by means of detailed simulation of self-

healing and self-configuration scenarios by 1)

executing each scenario, 2) simulating and

monitoring the behavior of the recovery connector

during adaptation or recovery, and 3) resuming the

application from a consistent state after recovery or

dynamic adaptation is completed.

Service Requests Coordinator STM

Quiescent Failed

Notify Passivating

Notify Quiescent

Notify Failed

Notify Active/

Forward Request {for every request in SPQ}

Active Passivating

Intermediate Request, Final Request/

Queue Request {into SPQ},

Forward Request {to Connector Control}

Request/

Queue Request {in SPQ}

First Request/

Queue Request {into SPQ}

First Request,

Intermediate Request, Final Request/

Queue Request {into SPQ},

Forward Request {to Connector Control}

Notify Failed

Request/

Queue Request {in SPQ}

Restore Request/

Dequeue Request {from SAQ or SRQ}

Queue Request {into SPQ}

Notify Quiescent

Notify Active/

Forward Request {for every request in SPQ}

Model-based Recovery Connectors for Self-adaptation and Self-healing

Components and connectors in the simulation are

implemented in Java and have a thread of control. In

addition, Java RMI is used as the middleware for

message delivery. The simulation runs on a single

machine. Thus, components are concurrent but

distribution is simulated.

The adaptation and recovery scenarios consist of

simulating adaptation and service failure,

respectively, while three transactions are being

processed. During simulation, every application

message contains in its header (1) a transaction

identifier that uniquely identifies the transaction of

this message, (2) the identifier of the message

producer component, (3) the identifier of the message

consumer component, (4) the timestamp at which the

message producer sent the message, (5) a message

type identifying whether the message initiates a

transaction, completes a transaction, or is an

intermediate message of a transaction, and (6) a

sequence number for detecting duplicate messages.

In the remaining of this section, we use the

notation msg(tid, s, r, ts, p) to represent messages,

where msg can be either request or response, tid is the

transaction identifier of the message, s is the identifier

of message sender, r is the identifier of message

receiver, ts is the timestamp of the message, and p

identifies the message type.

5.1 Recovery Scenario

In the failure scenario, the connector analyzes the

failure and determines which transactions need to be

recovered and sends them to the new service, after the

service has been instantiated on a different node. At

the time of service failure, the execution trace (fig. 5)

revealed that the service was engaged in three

transactions with three clients: two transactions

involving dialogs (transactions c1_1 and c2_1) and

one transaction involving a single request/response

messages (transaction c3_1). At the time of failure,

the execution trace shows that the messages queued

at the connector are as follows:

• SPQ contains no requests that have been received

by the connector but not forwarded to the service.

• SAQ contains three requests (received by the

connector and forwarded to the service):

o request(c2_1, client2, service, 1, begin)

o request(c3_1, client3, service, 1, none)

o request(c1_1, client1, service, 6, end)

• SRQ contains one request (for which a service

response is received at the connector):

o request(c1_1, client1, service, 1, begin)

• RFQ contains one response (received by the

connector but not forwarded yet to the client):

o response(c2_1, service, client2, 6,

intermediate)

• RRQ contains one response (received by the

connector and forwarded to the client)

o response(c1_1, service, client1, 3,

intermediate)

During failure analysis, the execution trace indicates

that the recovery connector determined transactions

c1_1, c2_1, and c3_1 as having failed because none

of them have a response that completes the

transaction in either RFQ or RRQ.

The recovery plan created while the connector is

in the Planning for Recovery state consists of a list

that identifies the messages that must be restored

from the SRQ and the SAQ to recover the failed

transactions. The list obtained from the execution

trace indicates that the first request to be recovered is

request(c1_1, client1, service, 1, begin), which is

queued in the SRQ, since this request was the first

request processed by the service before it failed. The

second request in the list was request(c2_1, client2,

service, 1, begin) queued in the SAQ since this

request was also processed by the service and its

response is queued in the RFQ. The list also contains

actions to recover request(c3_1, client3, service, 1,

none) and request(c1_1, client1, service, 6, end), in

that order, which are queued in the SAQ.

While in the Executing Recovery Plan state, the

connector executed the recovery plan by restoring

messages from the SRQ and the SAQ to the SPQ.

After all messages are recovered, the execution trace

shows that the messages queued in the SPQ (starting

from the head of the SPQ) are as follows:

• request(c1_1,client1,service,1, begin)

• request(c2_1,client2,service,1, begin)

• request(c3_1,client3,service,1, none)

• request(c1_1,client1,service,6, end)

The execution trace also indicates that while the

connector is in the Component Recovering state, it

received a new

request(c4_1, client4, service,1,

none). This request is queued at the tail of the SPQ,

so that it is sent last when the service is recovered.

After the service is recovered, the connector

resumed forwarding requests to the recovered service.

As shown in fig. 5, requests recovered from the SRQ

and SAQ are first resent sequentially, in the same

order specified in the recovery plan. Note that

response(c1_1, service, client1, 3, intermediate) has

already been forwarded to the client before the

service failure, so this response is discarded because

it is a duplicate. Then, new requests queued at the tail

of the SPQ are forwarded to the recovered service.

These requests need not be forwarded sequentially.

At this point, the connector resumes forwarding

requests and responses normally.

ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications

Figure 5: Fragment of execution trace of simulation for recovery scenario.

5.2 Adaptation Scenario

The goal of the adaptation scenario is to ensure that

the connector behavior handles dynamic service

adaptation without losing requests. In this scenario,

the connector transitions to the passivating state first

before adaptation takes place, until the three

transactions are completed. The execution trace (fig.

6) indicates that Connector Control received the

Passivate command while the service is engaged in

the three transactions. The requests that were

forwarded to the service when the connector

transitioned to the Passivating state are as follows:

• request(c1_1,client1,service,1, begin)

• request(c2_1,client2,service,1, begin)

• request(c3_1,client1,service,1, none)

• request(c1_1,client1,service,6, end)

The execution trace also indicates that the

recovery connector has received and forwarded the

intermediate responses of the first two requests as

follows:

• response(c1_1, service, client1, 3, intermediate)

• response(c2_1, service, client2, 6, intermediate)

Since the service is still engaged in three transactions,

both Connector Control and Service Request

Coordinator transition to the Passivating state, where

the service continues servicing transactions c1_1,

c2_1, and c3_1. The execution trace indicates that

while the Service Request Coordinator is in the

Passivating state, it received request(c4_1, client4,

service,1, none). The action was to hold that request

in the SPQ. However, when the Service Requests

Coordinator received request(c2_1, client2, service,

9, end), it forwarded that request to Connector

Control, since this request must be serviced in order

for the service to become quiescent. Eventually,

Connector Control received all responses to

transactions c1_1, c2_1, and c3_1. At this point, all

active transactions are completed and both Connector

Control and Service Request Coordinator transitioned

to the Quiescent state. When service adaptation is

completed and the connector is reactivated, the

execution trace reveals that the connector forwarded

request(c4_1, client4, service,1, none) queued in the

SPQ to the service and that the connector resumed

execution normally.

5.3 Random Failure and Adaptation

In addition to planned scenarios, our validation

consists of simulating failure and adaptation

occurring at random points during service execution.

The simulation consists of several runs in which the

recovery connector may randomly receive up to 50

dialogs from 50 clients. The service might fail or

receive the Passivate command from Change

Management at a random point during its execution.

As an example, in one run, the service received

the Passivate command while it was processing 9

transactions. Execution trace showed that Connector

Control and Service Requests Coordinator

transitioned to the Passivating state so that the service

continued processing these transactions. In this state,

execution trace shows that the recovery connector

continued forwarding requests of existing

transactions but held requests of new transactions in

Requests forwarded

to and responses

received from

service before

service failure.

Request recovered

from SRQ

Requests recovered

from SAQ

New request held in

SPQ

Model-based Recovery Connectors for Self-adaptation and Self-healing

Figure 6: Fragment of execution trace of simulation for adaptation scenario.

the SPQ. When the 9 transactions were processed by

the service, both Connector Control and Service

Requests Coordinator transitioned to the Quiescent

state. After the service was adapted and the recovery

connector reactivated, Connector Control forwarded

41 new requests, which had been previously held in

the SPQ while the service was being adapted. While

processing these transactions, the execution trace

indicates that the service failed. As a result, the

recovery connector recovered the requests of these

transactions, as explained previously in section 3.4.3.

When the service recovered, the recovery connector

restarted the failed transactions with the recovered

service, and then the service continued processing

these transactions normally. At the end of the run, the

execution trace shows that all 50 transactions were

processed and that every client received a response

for every request it had sent.

6 RELATED WORK

Research into self-adaptive, self-configuration, and

self-healing (Garlan et al., 2004; Kramer and Magee,

2007; Menasce et al., 2011; Stojnic et al., 2012)

investigated various automated approaches for

monitoring software systems at run-time and adapting

the software behavior dynamically by changing the

configuration of the software system from one

configuration to another in order to meet certain

system-level constraints and maximize the overall

system utility.

In the area of dynamic software adaptation,

Kramer and Magee investigated how a component

must transition to a quiescent state before safe

adaptation (Kramer and Magee, 1990). Ramirez et al.

discussed various design patterns, including

reconfiguration patterns, for self-adaptive systems

(Ramirez and Cheng, 2010). Gomaa et al discussed

dynamic software adaptation patterns for SOAs

including patterns for different types of service

coordination and distributed transactions (Gomaa et

al., 2010; Gomaa and Hashimoto, 2011, 2012). Li et

al. proposed an adaptable connector that can be used

to reconfigure service connections without affecting

application execution (Li et al., 2006). Irmert et al.

suggest a framework in which service

implementation can be replaced at run-time

transparently and atomically (Irmert et al., 2008).

In the area of self-healing for service-oriented

computing and SOAs, Danilecki et al. suggest a

rollback recovery protocol tailored to the distinctive

characteristics of SOAs (Danilecki et al., 2013).

Candea et al. investigated a platform-dependent

recovery server for J2EE applications using a

modified version of JBoss (Candea et al., 2003). Silva

et al. proposed an automated self-healing software

rejuvenation approach using virtualization where the

focus is to ensure that no messages can be lost due to

software aging and transient faults (Silva et al., 2009).

Salatge et al. suggest the use of fault-tolerance

connectors to increase service dependability in SOAs

(Salatge and Fabre, 2007).

Compared to these approaches, this paper

investigates the problem of integrating adaptation and

recovery patterns for SOAs, which is an area that has

received little attention in the literature. The goal is to

achieve a recovery connector that can be used to

handle both adaptation and recovery of services

Transactions initiated

before connector

received Passivate

command.

Service continued

processing existing

transactions while

passivating.

New transaction is

held in SPQ while

passivating.

New transaction held in SPQ is

forwarded after reactivation.

ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications

safely and transparently without losing any

application messages. The approach is platform-

independent to increase reuse of these connectors.

7 CONCLUSIONS

This paper has described an approach for self-

configuration and self-healing in which services are

safely adapted at run-time and recovered

transparently from failure to a consistent state using

recovery connectors. Furthermore, the same recovery

connector design is used to handle stateless and

stateful services, in which client requests are

idempotent, in different architectural patterns.

In this research, we consider the atomicity and

consistency properties of transactions (Bernstein and

Newcomer, 2009). Transaction atomicity is achieved

by ensuring that transaction requests and responses

are maintained at the connector for the duration of the

transaction and that no requests can be lost due to

service adaptation or failure. Thus, if a partially

executed transaction is interrupted due to service

failure, it can be restarted. Transaction consistency is

achieved by ensuring that the service always recovers

to a state where lost transaction requests are resent

and redundant messages are detected and removed.

The connector also ensures that previous requests of

a failed dialog are resent to recovered service in the

same order they were processed before failure to

ensure that the recovered service also process these

requests in that order.

Long-living transactions, which contain a human

in the loop, are also supported by our approach since

these transactions can be split into multiple,

independent stateless transactions. In addition, our

approach supports services in which requests are self-

contained. For instance, many web services use

cookies as a state maintenance mechanism. In this

case, the service can be treated as stateless. We are

currently investigating extending our approach to

handle stateful services that handle non-idempotent

client requests, as well as distributed transactions that

involve multiple stateful transactions (e.g. two-phase

commit protocol).

We assume that only a single component can fail

at a time. However in certain types of applications,

such as safety-critical systems, this assumption may

not be acceptable. We are investigating relaxing our

failure assumptions by extending our approach to

handle concurrent node failures. Furthermore, we are

considering tolerating failures occurring at the

recovery connectors by using replication techniques.

Future work includes (1) extending recovery

connectors to handle recovery and adaptation of

other, fully asynchronous architectural patterns such

as the master/slave and control patterns, (2)

incorporating software product line technology to

support multiple recovery strategies for architectural

patterns, (3) investigating recovery in software

systems by incorporating combinations of

architectural patterns, (4) extending the approach to

stateful services that receive non-idempotent

requests, and (5) considering different

communication patterns, including dialogs between

components and distributed transactions.

ACKNOWLEDGEMENTS

This work is partially supported by the AFOSR award

FA9550-16-1-0030.

REFERENCES

Avizienis, A. et al, 2004. Basic concepts and taxonomy of

dependable and secure computing. IEEE Trans.

Dependable Secure Comput. 1, 11–33.

Bernstein, P. A., Newcomer, E., 2009. Principles of

Transaction Processing, Second Edition, 2 edition. ed.

Morgan Kaufmann, Burlington, MA.

Candea, G. et al, O., 2003. JAGR: An Autonomous Self-

Recovering Application Server.

Danilecki, A. et al, P., 2013. Applying Message Logging to

Support Fault-Tolerance of SOA Systems. Found.

Comput. Decis. Sci. 38, 145–158.

Garlan, D. et al, 2004. Rainbow: architecture-based self-

adaptation with reusable infrastructure. Computer 37,

46–54.

Gomaa, H., 2011. Software Modeling and Design: UML,

Use Cases, Patterns, and Software Architectures,

Cambridge University Press, Cambridge ; New York.

Gomaa, H., Hashimoto, K., 2011. Dynamic Software

Adaptation for Service-oriented Product Lines, in:

Proc. of the 15th Int. Softw. Product Line Conf.,

Volume 2, SPLC ’11. ACM, New York, NY, USA, p.

35:1–35:8.

Gomaa, H., Hashimoto, K., 2012. Dynamic Self-adaptation

for Distributed Service-oriented Transactions, in: Proc.

of the 7th Int. Symp. on Softw. Eng. for Adaptive and

Self-Managing Systems, SEAMS ’12. IEEE Press,

Piscataway, NJ, USA, pp. 11–20.

Gomaa, H., Hashimoto, K., Kim, M., Malek, S., Menascé,

D.A., 2010. Software Adaptation Patterns for Service-

oriented Architectures, in: Proc. of the 2010 ACM

Symposium on Applied Computing, New York, NY,

USA, pp. 462–469. doi:10.1145/1774088.1774185

Irmert, F et al., 2008. Runtime Adaptation in a Service-

oriented Component Model, in: Proc. of the 2008 Int.

Model-based Recovery Connectors for Self-adaptation and Self-healing

Wkshp. on Softw. Engineering for Adaptive and Self-

Managing Systems, SEAMS ’08. ACM, New York,

NY, USA, pp. 97–104.

Kephart, J. O., Chess, D. M., 2003. The vision of autonomic

computing. Computer 36, 41–50.

Kramer, J., Magee, J., 2007. Self-Managed Systems: an

Architectural Challenge, in: Future of Softw.

Engineering, 2007. FOSE ’07. pp. 259–268.

Kramer, J., Magee, J., 1990. The evolving philosophers

problem: dynamic change management. IEEE Trans.

Softw. Eng. 16, 1293–1306. doi:10.1109/32.60317

Li, G. et al, 2006. Facilitating Dynamic Service

Compositions by Adaptable Service Connectors: Int. J.

Web Serv. Res. 3, 68–84.

Menasce, D., Gomaa, H., Malek, S., Sousa, J. P., 2011.

SASSY: A Framework for Self-Architecting Service-

Oriented Systems. IEEE Softw. 28, 78–85.

doi:10.1109/MS.2011.22

Ramirez, A. J., Cheng, B. H. C., 2010. Design Patterns for

Developing Dynamically Adaptive Systems, in: Proc.

of the 2010 ICSE Wkshp. on Softw. Engineering for

Adaptive and Self-Managing Systems, SEAMS ’10.

ACM, New York, NY, USA, pp. 49–58.

Salatge, N., Fabre, J.-C., 2007. Fault Tolerance Connectors

for Unreliable Web Services, in: 37th Annu. IEEE/IFIP

Int. Conf. on Dependable Systems and Networks, 2007.

DSN ’07. pp. 51–60.

Silva, L. M., Alonso, J., Torres, J., 2009. Using

Virtualization to Improve Software Rejuvenation. IEEE

Trans. Comput. 58, 1525–1538.

Stojnic, N., Schuldt, H., 2012. OSIRIS-SR: A Safety Ring

for self-healing distributed composite service

execution, in: 2012 ICSE Conf on Softw. Engineering

for Adaptive and Self-Managing Systems pp. 21–26.

Taylor, R. N. et al, 2009. Software Architecture:

Foundations, Theory, & Practice, Wiley, Hoboken, NJ.

ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications