FUNCTIONALITY RECOMPOSITION FOR SELF-HEALING

Josu Martinez and Simon Dobson

Systems Research Group, School of Computer Science and Informatics, UCD Dublin IE, Ireland

Keywords:

Autonomic Computing, Formal Methods, Distributed Architectures, Software Composition.

Abstract:

Autonomic computing aims to provide self-management and adaptation in the implementation of complex (large, heterogeneous, distributed) systems over time. Such adaptations must be stable, in the sense of maintaining the system's high-level goals across environmental changes, which may lead to functionality loss. In this paper we present FReSH, a decentralised component-based framework whose main objective is to self-heal the operation of complex systems in the face of behavioural disruptions. FReSH deals with formally specified components, each of which provides a single piece of functionality. The reusable and shareable nature of these building blocks makes them eligible for dynamically recomposing, without human intervention, the functionality provided by any failing component of the system. FReSH supports the construction of more flexible, adaptive and robust software structures that are able to cope with the environmental changes of complex systems.

1 INTRODUCTION

The technological advances of recent years have enabled new forms of dynamic interaction and global cooperation among social organizations and business communities. Driven by the constantly increasing societal expectations of this new era of computing, a myriad of heterogeneous and distributed components are emerging to constitute corporate-wide computing systems that extend beyond company boundaries (Nachira, 2007). However, the construction of these types of systems introduces new levels of complexity, not only because they may grow in size over time, but also because new and old components must coexist, sharing their resources while ensuring the correct operation of the system.

One consequence is that some functionality of the system may become unavailable at run-time. As can be inferred from case studies that analyse complex systems (Patterson et al., 2002), in these environments the probability of experiencing a service interruption is proportional to the number and degree of interdependencies between components. Thus, whenever components become unavailable, e.g., because of high network traffic or operation latency issues (components may depend on other components' services to perform their operations), because they crash or are inconsistently updated, or simply because the machines they reside on are suddenly switched off, system users may perceive that the system can no longer provide some specific functionality. In the worst case, the activity of the system is unexpectedly interrupted, which is unacceptable.

Because these types of systems evolve very rapidly, human administrators cannot cope with the healing tasks needed to ensure their correct operation, and therefore a new approach is required. This paper presents a different self-healing approach, specifically designed to handle component unavailability in such complex environments without interrupting their operation. We introduce FReSH, a decentralised component-based Java framework able to detect operation disruptions at run-time and to recompose any missing functionality by dynamically identifying, reusing and self-assembling software pieces (i.e., local and remote services) distributed over the various nodes of the system environment. Ideally, these software components can be functions, methods, procedures, web services and even compositions of the previous, each providing some basic functionality. Hence, FReSH can be used to build tractable and dependable applications out of components of different nature that flexibly adapt to environmental changes.


Martinez J. and Dobson S. (2009).

FUNCTIONALITY RECOMPOSITION FOR SELF-HEALING.

In Proceedings of the 4th International Conference on Software and Data Technologies, pages 159-164

DOI: 10.5220/0002281701590164

 SciTePress

2 RELATED WORK

Since IBM coined the term autonomic computing in 2001, many self-healing initiatives have been researched. None of them is able on its own to cope, in a completely autonomous manner, with the difficulties that arise in the complex environments described above. However, far from rejecting them, our proposed self-healing strategy incorporates and reuses some of their conceptual properties, as explained in Section 4.

We have analysed approaches that fall into four different self-healing categories: component redundancy, architecture models, component micro-rebooting and SOA-based process reorganization.

One component redundancy technique proposes a self-assembly mechanism based on an agent entity that replicates components to replace dead neighbours and enables the recomposition of entire structures (Nagpal et al., 2003). Another strategy inspired by biology provides the system with the ability to replicate cells in excess to combat external intrusions (George et al., 2003). On the other hand, Recovery-Oriented Computing (ROC) (Berkeley/Stanford, 2008) proposes isolating faulty components and replacing them with redundant ones.

However, a full replication of all the components that populate the system may introduce performance issues, due to the complex redundancy management tasks it requires. Another problem of this approach is that many of the nodes containing the components may be portable devices, which are constrained by tight memory limits.

One subset of self-healing techniques based on architecture models focuses on dynamically reconfiguring the connector links among components to correct performance deviations (Georgiadis et al., 2002). Other approaches (Appavoo et al., 2003; de Lemos and Fiadeiro, 2002) replace failing services or components with functionally equivalent ones, using decision policies to determine which alternative component replaces the original one.

The main problem with this strategy is that complex environments change very often, so many of those components may not survive in the system for long, whereas others may appear or evolve (Nachira, 2007; Kephart and Chess, 2003). As a result, repair plans may become obsolete even before they are applied to the running system, which forces the administrator to update them constantly. Furthermore, most of the researched solutions rely on a centralized approach.

Alternatively, faulty modules can be micro-rebooted independently and automatically to avoid fault propagation whenever they are suspected of not functioning properly (Patterson et al., 2002). The efficiency of this technique lies in the fact that restarting single components takes less time than rebooting the whole system.

In highly distributed environments, where components may have a large number of interdependencies, restarting failing software artefacts may not be an efficient and reliable option (Tanenbaum and Steen, 2001). In certain situations where components may have to be restarted, e.g., due to multiple machine power cuts or heavy and persistent network traffic overloads, other components that request services from the failing ones may remain blocked too long, which may degrade the performance of the entire system to unacceptable levels.

Finally, Service-Oriented Architecture (SOA) is a flexible coordination paradigm that enables components to export services over the network (Papazoglou and Georgakopoulos, 2003). These services can be discovered and dynamically bound at run-time to provide higher-level services to other components (Baresi et al., 2004).

Despite their apparent success in efficiently building reliable and robust service structures, web-based mechanisms do not properly address self-healing in complex environments. There is a trade-off between constructing loosely coupled service-based structures to improve maintenance and flexibility, and fulfilling the non-functional requirements of the system during its execution (Baker and Dobson, 2005). Certain properties of the environment, such as network bandwidth, may negatively affect the operational latency of some components. Hence, complex systems require a more suitable architectural approach that reduces the interdependencies among remote services.

3 ATOMIC AND COMPOSITE COMPONENTS

A software component is a unit of composition that provides some functionality, with contractually specified interfaces and explicit context dependencies only (Szyperski et al., 2002). However, once compiled, these components become self-contained pieces of software that remain unalterable black boxes during their execution. Hence, their reuse to build new applications depends exclusively on the knowledge programmers have about the functionality they provide. It follows that, once these components become run-time software entities, they cannot be automatically composed at run-time.

ICSOFT 2009 - 4th International Conference on Software and Data Technologies


Similarly to some of the ontology languages created by the SOA community, such as OWL-S (Martin et al., 2007), our approach consists of associating components with a formal specification of the functionality they provide, so that they can be identified and reused to autonomously compose higher-level software structures. At this preliminary development stage we are using the JML state-based specification language (Burdy et al., 2005) to specify the functional behaviour of atomic components developed in Java. JML is a rich specification language used to formally describe the functionality of Java objects. Other atomic components, such as web services, must also have associated state-based specifications in order to be identified and reused.
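To make this concrete, the following is a minimal sketch of what a JML-annotated atomic component might look like. The class MaxComponent and its contract are illustrative assumptions, not taken from FReSH; the point is that the requires and ensures clauses state the provided functionality precisely enough to be compared with other specifications. Because JML annotations live in //@ and /*@ ... @*/ comments, the class also compiles with an ordinary Java compiler.

```java
// Hypothetical atomic component with a JML contract. The JML clauses live in
// special comments, so a standard Java compiler ignores them, while JML tools
// can check them or compare them with other specifications.
class MaxComponent {

    /*@ public normal_behavior
      @   requires values != null && values.length > 0;
      @   ensures (\forall int i; 0 <= i && i < values.length; \result >= values[i]);
      @   ensures (\exists int i; 0 <= i && i < values.length; \result == values[i]);
      @*/
    public /*@ pure @*/ int max(int[] values) {
        int best = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] > best) {
                best = values[i];   // keep the largest element seen so far
            }
        }
        return best;
    }
}
```

Together, the two ensures clauses say that the result is the largest element of the array, which is the kind of post-condition a matching step would compare.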

These unambiguous specifications can be matched with other specifications to find equivalences among the components they describe (Zaremski and Wing, 1995). Furthermore, the specification of a complex component may be automatically decomposed into simpler, lower-level specifications (van Lamsweerde, 2000), which in turn could be matched with the specifications of smaller components. One of the key points of our proposal is to provide rich post-condition expressions for every reusable component of the system, so that whenever a component becomes unavailable it can be autonomously replaced by another fully equivalent component, or by a specially combined set of components whose specifications match the sub-specifications of the former.
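The matching and decomposition steps can be illustrated with a deliberately simplified model, which is an assumption of this sketch rather than the authors' technique. A specification is reduced to the set of conjuncts in its post-condition, decomposition splits on the top-level && operator, full equivalence means identical clause sets, and partial equivalence is their overlap. Real specification matching in the sense of Zaremski and Wing reasons about logical implication between pre- and post-conditions, which plain string comparison cannot capture.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified specification: the set of conjuncts of its
// post-condition, stored as trimmed strings.
final class Spec {
    final String name;
    final Set<String> clauses;

    Spec(String name, String postcondition) {
        this.name = name;
        this.clauses = decompose(postcondition);
    }

    // Decomposition: split a post-condition on "&&" into simpler clauses.
    // Assumes "&&" never appears inside parentheses (a real parser would check).
    static Set<String> decompose(String postcondition) {
        Set<String> result = new TreeSet<>();
        for (String part : postcondition.split("&&")) {
            String clause = part.trim();
            if (!clause.isEmpty()) {
                result.add(clause);
            }
        }
        return result;
    }

    // Full equivalence: both specifications promise exactly the same clauses.
    boolean equivalentTo(Spec other) {
        return clauses.equals(other.clauses);
    }

    // Partial equivalence: the clauses of this specification that the other also provides.
    Set<String> sharedWith(Spec other) {
        Set<String> shared = new TreeSet<>(clauses);
        shared.retainAll(other.clauses);
        return shared;
    }
}
```

Under this model, a failing component specified as "sorted(out) && noDuplicates(out)" is fully matched by a component with the same clauses in any order, and only partially matched by one that guarantees "sorted(out)".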

To create component compositions we are using ORC (Misra and Cook, 2007), a concurrent and distributed orchestration language with all the operators required to implement sequential and parallel structures. Moreover, ORC provides other features to reproduce a whole set of classic programming idioms, such as conditional and iterative statements. ORC components constitute computational recipes in which single components are glued together to compose higher-level components, and therefore these relationships can be autonomously altered at run-time. Furthermore, like atomic components, these recipes perform computations, and thus they are considered composite software components.
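The following Java fragment is only an analogy for the two ORC structures mentioned above. It is not ORC and does not reproduce its semantics exactly. A hypothetical Component interface wraps an asynchronous function. Its then() method chains two components sequentially, in the spirit of ORC's sequential combinator f >x> g, and its par() method runs two components concurrently on the same input. ORC's parallel combinator f | g publishes both results independently, whereas this sketch merges them with a caller-supplied function to keep the types simple.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Function;

// Rough analogy of composite components: a component is an asynchronous function.
interface Component<A, B> {
    CompletableFuture<B> call(A input);

    // Lift a plain function into a component that runs asynchronously.
    static <A, B> Component<A, B> of(Function<A, B> f) {
        return input -> CompletableFuture.supplyAsync(() -> f.apply(input));
    }

    // Sequential composition: the output of this component feeds the next one.
    default <C> Component<A, C> then(Component<B, C> next) {
        return input -> call(input).thenCompose(next::call);
    }

    // Parallel composition: both components run concurrently on the same input,
    // and their results are merged (a simplification of ORC's f | g).
    default <C, D> Component<A, D> par(Component<A, C> other, BiFunction<B, C, D> merge) {
        return input -> call(input).thenCombine(other.call(input), merge);
    }
}
```

For example, with inc adding 1 and dbl doubling, inc.then(dbl) applied to 3 yields 8, and inc.par(dbl, Integer::sum) applied to 3 yields 4 + 6 = 10. Because compositions are ordinary values, a new recipe can be assembled at run-time from whichever components are available.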

Hence, ORC components must also be provided with associated formal specifications. This is the basis of our self-healing strategy and our main contribution. In the case of component unavailability (e.g., because a node that contains certain components crashes, or because network traffic is so high that some nodes become disconnected), an ORC component is automatically created to replace the failing one. An ORC component may comprise other ORC components. If the unavailable component is itself an ORC component, then, depending on the causes of failure, the newly created ORC component may reuse some of the components used by the failing structure, or it may use completely different ones.

Components are selectively replicated on other nodes of the system to reduce the interdependencies among remote services, and thus overcome the SOA drawbacks discussed in Section 2. However, some components may be too heavy to be sent through the network, or legal issues may prevent them from being shared among nodes; such components can only be invoked remotely.
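A node therefore needs some policy to decide whether a requested component is copied or only invoked remotely. The sketch below reduces the two criteria named in the text, size and legal restrictions, to a byte threshold and a licence flag. Both the threshold and the TransferPolicy class are hypothetical.

```java
// Hypothetical transfer policy: replicate a component only if it is light
// enough to send and its licence allows sharing; otherwise invoke it remotely.
final class TransferPolicy {
    enum Mode { REPLICATE, INVOKE_REMOTELY }

    private final long maxTransferBytes;   // assumed, deployment-specific limit

    TransferPolicy(long maxTransferBytes) {
        this.maxTransferBytes = maxTransferBytes;
    }

    Mode decide(long componentSizeBytes, boolean licenceAllowsSharing) {
        boolean lightEnough = componentSizeBytes <= maxTransferBytes;
        return (lightEnough && licenceAllowsSharing) ? Mode.REPLICATE : Mode.INVOKE_REMOTELY;
    }
}
```

A real policy would likely weigh further factors, such as the number of interdependencies a component would drag along with it, but the outcome is the same binary choice between a local copy and a remote call.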

4 FUNCTIONALITY RECOMPOSITION

FReSH is an autonomic component-based framework with reflective capabilities. As in other autonomic frameworks such as Unity (Tesauro et al., 2004), in FReSH every node of the system contains an autonomic manager (AM) that supervises all the components in that node and performs functionality recomposition whenever any of these components becomes unavailable.

In more detail, an autonomic manager must be able to perform introspection, i.e., it must constantly monitor the operation of its managed components and detect behavioural inconsistencies. At this stage of the framework's development we assume the existence of a component unavailability detection mechanism based on probe signals (Balasubramaniam et al., 2005), which regularly sends probes to the components to check the validity of their status. If a component does not return any feedback to the autonomic manager within a reasonable amount of time, it is considered unavailable. In this case, the autonomic manager in charge of that component must execute intercession actions, i.e., it must carry out a procedure to fix the problem without interrupting the operation of the system. This procedure consists, first, of determining the functionality that the unavailable component was providing and, second, of finding a functionally equivalent component to replace the failing one (direct substitution). If no equivalent is found, the autonomic manager must perform a local or global search for a combination of other components that can be used to recompose the missing functionality (composite replacement).
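A minimal sketch of probe-based detection follows, assuming each managed component exposes a hypothetical ping() operation; the text does not specify how the probe signals of Balasubramaniam et al. are implemented. The monitor submits the probe on a separate thread and treats a missing reply within the timeout, an exception, or a negative answer as unavailability.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical probe interface: a managed component answers true if healthy.
interface Probeable {
    boolean ping();
}

final class ProbeMonitor {
    // Daemon threads, so a probe stuck in a hung component never keeps the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "probe");
        thread.setDaemon(true);
        return thread;
    });
    private final long timeoutMillis;   // the "reasonable amount of time"

    ProbeMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Sends one probe and waits at most timeoutMillis for the reply.
    boolean isAvailable(Probeable component) {
        Callable<Boolean> probe = component::ping;
        Future<Boolean> reply = pool.submit(probe);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            reply.cancel(true);          // no feedback in time: unavailable
            return false;
        } catch (ExecutionException e) {
            return false;                // the component threw instead of answering
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

An autonomic manager would call isAvailable periodically for every component in its catalogue and start the intercession procedure on the first false result.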

This strategy differs from direct replication and replacement of components, as it supports the re-creation of unavailable functionality from already existing components. Moreover, it increases the probability of obtaining suitable software structures to replace unavailable components while decreasing the operational costs related to replication. For these reasons, we argue that FReSH is a more efficient strategy for performing self-healing in complex environments than other existing alternatives. The example in Figure 1 shows how the system obtains the functionality of a failing component by identifying, selecting, obtaining and recomposing other components. Although the example shows how composite replacement is performed, it is illustrative enough to understand how direct substitution works.
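The two-stage procedure, with direct substitution first and composite replacement only as a fallback, can be outlined as follows. The search strategies are passed in as functions. The class names, the string identifiers and the UNRECOVERABLE outcome are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical outline of the intercession procedure.
final class Healer {
    enum Kind { DIRECT_SUBSTITUTION, COMPOSITE_REPLACEMENT, UNRECOVERABLE }

    static final class Repair {
        final Kind kind;
        final List<String> components;   // ids of the substituting components

        Repair(Kind kind, List<String> components) {
            this.kind = kind;
            this.components = components;
        }
    }

    private final Function<String, Optional<String>> findEquivalent;
    private final Function<String, Optional<List<String>>> findCombination;

    Healer(Function<String, Optional<String>> findEquivalent,
           Function<String, Optional<List<String>>> findCombination) {
        this.findEquivalent = findEquivalent;
        this.findCombination = findCombination;
    }

    Repair heal(String failedSpec) {
        // 1. Direct substitution: a single fully equivalent component.
        Optional<String> direct = findEquivalent.apply(failedSpec);
        if (direct.isPresent()) {
            return new Repair(Kind.DIRECT_SUBSTITUTION, List.of(direct.get()));
        }
        // 2. Composite replacement: a combination covering the missing functionality.
        Optional<List<String>> parts = findCombination.apply(failedSpec);
        if (parts.isPresent() && !parts.get().isEmpty()) {
            return new Repair(Kind.COMPOSITE_REPLACEMENT, parts.get());
        }
        return new Repair(Kind.UNRECOVERABLE, Collections.emptyList());
    }
}
```

Keeping the ordering explicit in one place makes it easy to see that the cheaper direct substitution is always attempted before the more expensive combination search.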

Each node contains a catalogue with the specifications of all the components it includes, so that the corresponding autonomic manager explicitly knows the software entities it must supervise. Some formal specifications contain more complex predicates than others. As mentioned above, complex specifications can be interpreted as a combination of simpler specifications associated with other component implementations that may exist in the system, which is the basis for achieving functionality recomposition.

Whenever any component of a particular node becomes unavailable at run-time, the autonomic manager responsible for its health must first suspend the execution of any other component that consumes services from it. Furthermore, the autonomic manager must extend this requirement to the rest of the autonomic managers to avoid cascading failures (this step is not shown in Figure 1 for simplicity).
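Determining which components to suspend amounts to a reachability question over the consumer relation. Every component that consumes the failed service, directly or through a chain of other consumers, is at risk of a cascading failure. The sketch assumes that the dependencies are visible as a simple graph (the SuspensionPlanner class is hypothetical) and walks it breadth-first. In the distributed setting described above, the walk would have to be split across autonomic managers, which this sketch does not model.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical planner: collects every direct or transitive consumer of a failed component.
final class SuspensionPlanner {
    // provider id -> ids of the components that consume its services
    private final Map<String, Set<String>> consumersOf = new HashMap<>();

    void addDependency(String consumer, String provider) {
        consumersOf.computeIfAbsent(provider, k -> new HashSet<>()).add(consumer);
    }

    Set<String> componentsToSuspend(String failed) {
        Set<String> toSuspend = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(failed);
        while (!frontier.isEmpty()) {
            String provider = frontier.poll();
            for (String consumer : consumersOf.getOrDefault(provider, Set.of())) {
                // add() returns false for already visited components, which also stops cycles
                if (!consumer.equals(failed) && toSuspend.add(consumer)) {
                    frontier.add(consumer);
                }
            }
        }
        return toSuspend;
    }
}
```

If "billing" and "report" consume "pricing", and "ui" consumes "billing", a failure of "pricing" suspends all three, while unrelated components keep running.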

The autonomic manager must then search internally for an equivalent component. To do so, it compares the functional specifications of the unavailable component with the specifications of other components described in the local component catalogue (step 2). If no equivalent component is found, the search is extended to other autonomic managers, which look up their catalogues in pursuit of an equivalent component (steps 3 and 4). Should any existing remote component match the specifications of the unavailable component, the remote autonomic manager sends that component to the local autonomic manager over the network (steps 5, 6 and 7). However, due to constraints such as size, number of interdependencies or commercial restrictions, the transfer of some of these components may not be permitted, and thus they must be invoked remotely. Because they comprise components of different nature and characteristics, many ORC components may not be suitable to be shared among environments.
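Steps 2 to 4 can be sketched as a local lookup followed by a sequential query of the remote catalogues. Here each catalogue maps a component identifier to a specification simplified to a set of post-condition clauses, and the remote managers are modelled as in-memory objects. A real implementation would query them over the network, possibly in parallel. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical per-node catalogue: component id -> simplified specification.
final class Catalogue {
    final String node;
    final Map<String, Set<String>> specs = new LinkedHashMap<>();

    Catalogue(String node) { this.node = node; }

    void register(String componentId, Set<String> spec) { specs.put(componentId, spec); }

    // First component (other than the failed one) whose specification matches exactly.
    Optional<String> findEquivalent(Set<String> wanted, String excludedId) {
        return specs.entrySet().stream()
                .filter(e -> !e.getKey().equals(excludedId) && e.getValue().equals(wanted))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}

final class EquivalenceSearch {
    // Returns "node/componentId" of the first match, searching the local catalogue first.
    static Optional<String> search(Catalogue local, List<Catalogue> remotes,
                                   String failedId, Set<String> failedSpec) {
        Optional<String> hit = local.findEquivalent(failedSpec, failedId);      // step 2
        if (hit.isPresent()) return hit.map(id -> local.node + "/" + id);
        for (Catalogue remote : remotes) {                                      // steps 3 and 4
            Optional<String> remoteHit = remote.findEquivalent(failedSpec, failedId);
            if (remoteHit.isPresent()) return remoteHit.map(id -> remote.node + "/" + id);
        }
        return Optional.empty();   // no equivalent anywhere: fall back to composite replacement
    }
}
```

An empty result is the signal to move on to composite replacement, as the next paragraph describes.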

If no equivalent component exists in the entire system, its functionality must be recomposed through the combination of other components. Depending on the complexity of the specification of an atomic component, the corresponding autonomic manager may have to split it into simpler specifications to facilitate the search for equivalent components. In the case of a composite component, the autonomic manager must check the availability of the local and remote components it comprises, and only find equivalent components for the unavailable pieces of functionality. Although this is the simplest and quickest alternative, in some cases the combination of other, distinct components may result in a more suitable composite entity. Hence, autonomic managers must implement a specification combination algorithm to select the most appropriate components from all the partially equivalent specifications they receive. Properties that must be taken into account to decide the best selection of components include, e.g., the transferability of the components and the degree of equivalence among them.
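The paper does not fix a specification combination algorithm. One plausible choice, assumed here purely for illustration, is greedy set cover. Each candidate advertises the sub-specification clauses it satisfies, and the manager repeatedly selects the candidate that covers the most still-missing clauses, breaking ties in favour of transferable components, one of the properties mentioned above. Greedy cover is fast but is not guaranteed to find the smallest combination.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical partially equivalent component reported by some autonomic manager.
final class Candidate {
    final String id;
    final Set<String> provides;     // sub-specification clauses it satisfies
    final boolean transferable;

    Candidate(String id, Set<String> provides, boolean transferable) {
        this.id = id;
        this.provides = provides;
        this.transferable = transferable;
    }
}

final class Combiner {
    // Greedy set cover over the required clauses; empty if some clause cannot be covered.
    static Optional<List<Candidate>> cover(Set<String> required, List<Candidate> candidates) {
        Set<String> missing = new HashSet<>(required);
        List<Candidate> chosen = new ArrayList<>();
        while (!missing.isEmpty()) {
            Candidate best = null;
            int bestGain = 0;
            for (Candidate c : candidates) {
                int gain = 0;
                for (String clause : c.provides) {
                    if (missing.contains(clause)) gain++;
                }
                // bestGain > 0 implies best != null, so the tie-break is null-safe
                boolean better = gain > bestGain
                        || (gain == bestGain && gain > 0 && c.transferable && !best.transferable);
                if (better) {
                    best = c;
                    bestGain = gain;
                }
            }
            if (best == null) return Optional.empty();
            chosen.add(best);
            missing.removeAll(best.provides);
        }
        return Optional.of(chosen);
    }
}
```

Other criteria, such as a numeric degree of equivalence, could be folded into the comparison in the same place as the transferability tie-break.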

For every specification resulting from the previous action, an alternative equivalent component is searched for in the local repository (step 2). If none exists, the search is extended to other remote autonomic managers (step 3), which must check whether any of the services they comprise matches the specification (step 4). Notice that in the example in Figure 1 the autonomic manager of the second environment detects that three of its components have some partial equivalence with the unavailable component of the first environment, while the autonomic manager of the third environment discovers two partially equivalent components. These specifications are sent over the network to the autonomic manager of the first environment (step 5), which must then decide which combination of components is the most suitable one. In this case, the autonomic manager of the first environment is interested in one component of the second environment and the two components of the third environment. After selecting the most suitable components, the autonomic manager of the first environment requests them from the autonomic managers of the corresponding environments (step 6). The components that satisfy the transfer constraints are sent over the network so that they can be invoked locally, while the rest must be invoked remotely (step 7).

Whether the substituting entity is atomic or composite, once the autonomic manager has received all the required resources and resolved how they need to be composed to re-establish the missing functionality, it must dynamically generate a new composite component and replace the failing one with it (step 8). ORC provides orchestration operators and primitive functions to glue all the required components together.


Figure 1: Functionality Recomposition for Self-Healing.

Finally, the autonomic manager must modify the existing relationships that the local components had with the failing component, so that they can consume the service provided by the newly obtained structure. Similarly, the other autonomic managers are then requested to change any relationships that their components may have had with the unavailable component. Then, the execution of all the affected components is reactivated (these last two steps are not shown in Figure 1 for simplicity).
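These final rebinding steps are easiest when consumers never hold direct references to providers. The following is a minimal sketch that assumes a hypothetical Binding indirection for each consumed service. The manager suspends the binding, swaps in the newly generated structure and resumes it, so consumers pick up the substitute on their next call. This sketch rejects calls while the binding is suspended; a production design might block them instead.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical indirection between a consumer and the service it uses.
final class Binding {
    private final AtomicReference<UnaryOperator<String>> target;
    private volatile boolean suspended;

    Binding(UnaryOperator<String> initial) {
        this.target = new AtomicReference<>(initial);
    }

    // Called by the autonomic manager when the provider becomes unavailable.
    void suspend() {
        suspended = true;
    }

    // Point the consumer at the substituting structure, then re-activate it.
    void rebindAndResume(UnaryOperator<String> substitute) {
        target.set(substitute);
        suspended = false;
    }

    String invoke(String request) {
        if (suspended) {
            throw new IllegalStateException("consumer suspended while provider is repaired");
        }
        return target.get().apply(request);
    }
}
```

Because only the binding changes, the consumer's own code is unaffected by whether the substitute is an atomic component, a local composite or a remote proxy.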

5 CONCLUSIONS AND FUTURE WORK

In this paper we have introduced FReSH, a decentralised component-based framework, currently under development, that is able to dynamically detect unavailable components and recompose the functionality they were delivering. Functionality recomposition is made possible by making components shareable and reusable among the different nodes of the system. To be identified, these components must have associated behavioural specifications that describe the functionality they deliver.

However, several tasks remain as future work before this approach can be realised. We are currently extending ORC to support autonomous and formally specified compositions built from the JML specifications associated with the atomic components implemented in Java, so that the resulting software structures satisfy the requirements for creating open systems (Nierstrasz and Meijler, 1994). A next step is to develop sophisticated specification discovery, matching and combination algorithms to obtain components located anywhere in the system and to construct the most suitable structures equivalent to the failing software entities.

In addition, we have to deal with the heterogeneous nature of the components we want FReSH to handle. In large-scale complex systems there may be components implemented in different technologies, which imposes certain restrictions on the programming languages being used. Furthermore, we also need to determine the most appropriate level of granularity for these components, as too fine-grained components may make FReSH ineffective due to the unaffordable amount of time that composing new structures could take.

Finally, we also need to consider the case where a whole node crashes and therefore even its autonomic manager becomes unavailable. The underlying concepts of ultra-stable systems (Hariri et al., 2006) may support a solution for both component and node unavailability, and thus need to be investigated further.

ACKNOWLEDGEMENTS

Special thanks to Joseph Kiniry, Emil Vassev and Dalibor Jaklin for all the discussions that helped shape the underlying concepts of FReSH. This work is partially supported by Science Foundation Ireland under grant number 03/CE2/I303-1, Lero, the Irish Software Engineering Research Centre.

REFERENCES

Appavoo, J., Hui, K., Soules, C. A. N., Wisniewski, R. W., Silva, D. M. D., Krieger, O., Edelsohn, D. J., Auslander, M. A., Gamsa, B., Ganger, G. R., McKenney, P., Ostrowski, M., Rosenburg, B., Stumm, M., and Xenidis, J. (2003). Enabling autonomic behavior in systems software with hot-swapping. IBM Systems Journal, 42(1).

Baker, S. and Dobson, S. (2005). Comparing service-oriented and distributed object architectures. In OTM Conferences (1), pages 631–645.

Balasubramaniam, D., Morrison, R., Kirby, G., Mickan, K., Warboys, B., Robertson, I., Snowdon, B., Greenwood, R. M., and Seet, W. (2005). A software architecture approach for structuring autonomic systems. In DEAS'05: Proceedings of the 2005 workshop on Design and evolution of autonomic application software, pages 1–7, New York, NY, USA. ACM.

Baresi, L., Ghezzi, C., and Guinea, S. (2004). Towards self-healing service compositions. In PriSE'04: First Conference on the Principles of Software Engineering, volume 42, pages 27–46.

Berkeley/Stanford (2008). Recovery-Oriented Computing (ROC). http://roc.cs.berkeley.edu.

Burdy, L., Cheon, Y., Cok, D. R., Ernst, M. D., Kiniry, J. R., Leavens, G. T., Leino, K. R. M., and Poll, E. (2005). An overview of JML tools and applications. Int. J. Softw. Tools Technol. Transf., 7(3):212–232.

de Lemos, R. and Fiadeiro, J. L. (2002). An architectural support for self-adaptive software for treating faults. In WOSS'02: Proceedings of the first workshop on Self-healing systems, pages 39–42, New York, NY, USA. ACM.

George, S., Evans, D., and Marchette, S. (2003). A biological programming model for self-healing. In SSRS'03: Proceedings of the 2003 ACM workshop on Survivable and self-regenerative systems, pages 72–81, New York, NY, USA. ACM.

Georgiadis, I., Magee, J., and Kramer, J. (2002). Self-organising software architectures for distributed systems. In WOSS'02: Proceedings of the first workshop on Self-healing systems, pages 33–38, New York, NY, USA. ACM.

Hariri, S., Khargharia, B., Chen, H., Yang, J., Zhang, Y., Parashar, M., and Liu, H. (2006). The autonomic computing paradigm. Cluster Computing, 9(1):5–17.

Kephart, J. and Chess, D. (2003). The vision of autonomic computing. IEEE Computer, 36:41–50.

Martin, D., Burstein, M., McDermott, D., McIlraith, S., Paolucci, M., Sycara, K., McGuinness, D. L., Sirin, E., and Srinivasan, N. (2007). Bringing semantics to web services with OWL-S. World Wide Web, 10(3):243–277.

Misra, J. and Cook, W. (2007). Computation orchestration: A basis for wide-area computing. Software and Systems Modeling, 6:83–110.

Nachira, F. (2007). Digital business ecosystems. http://www.digital-ecosystems.org/book/de-book2007.html.

Nagpal, R., Kondacs, A., and Chang, C. (2003). Programming methodology for biologically-inspired self-assembling systems. In AAAI Spring Symposium on Computational Synthesis: From Basic Building Blocks to High Level Functionality.

Nierstrasz, O. and Meijler, T. D. (1994). Requirements for a composition language. In ECOOP'94: Workshop on Models and Languages for Coordination of Parallelism and Distribution, Object-Based Models and Languages for Concurrent Systems, pages 147–161, London, UK. Springer-Verlag.

Papazoglou, M. P. and Georgakopoulos, D. (2003). Service-oriented computing. Communications of the ACM, 46(10):46–54.

Patterson, D., Brown, A., Broadwell, P., Candea, G., Chen, M., Cutler, J., Enriquez, P., Fox, A., Kiciman, E., Merzbacher, M., Oppenheimer, D., Sastry, N., Tetzlaff, W., Traupman, J., and Treuhaft, N. (2002). Recovery oriented computing (ROC): Motivation, definition, techniques and case studies. Technical report, Berkeley, CA, USA.

Szyperski, C., Gruntz, D., and Murer, S. (2002). Component Software: Beyond Object-Oriented Programming. Addison-Wesley Longman Publishing Co., Inc.

Tanenbaum, A. S. and Steen, M. V. (2001). Distributed Systems: Principles and Paradigms. Prentice Hall PTR, Upper Saddle River, NJ, USA.

Tesauro, G., Chess, D. M., Walsh, W. E., Das, R., Segal, A., Whalley, I., Kephart, J. O., and White, S. R. (2004). A multi-agent systems approach to autonomic computing. In AAMAS'04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, pages 464–471, Washington, DC, USA. IEEE Computer Society.

van Lamsweerde, A. (2000). Formal specification: a roadmap. In ICSE'00: Proceedings of the Conference on The Future of Software Engineering, pages 147–159, New York, NY, USA. ACM.

Zaremski, A. M. and Wing, J. M. (1995). Specification matching of software components. In SIGSOFT'95: Proceedings of the 3rd ACM SIGSOFT symposium on Foundations of software engineering, pages 6–17, New York, NY, USA. ACM.
