AGENT-BASED FAULT MANAGEMENT OF EMBEDDED CONTROL SYSTEMS

Atef Gharbi, Mohamed Khalgui, Jiafeng Zhang and Samir Ben Ahmed

INSAT, Tunis, Tunisia

Xidian University, Xi’an, China

Keywords:

Software Control Component, Intelligent Agent, Functional Safety, Queueing System.

Abstract:

This paper deals with reconfigurable component-based embedded control systems that must remain safe when hardware or software faults occur at run-time. We define an agent-based architecture that handles automatic reconfigurations under well-defined conditions when run-time faults occur. We propose an implementation of the agent that maintains several queues to store run-time faults. This implementation aims to minimize the global waiting time of faults in the queues. Several simulations are carried out to find the policy that gives the system the best reactivity. We develop the tool "SimulatorAgent" to implement this approach, which we apply to a Benchmark Production System.

1 INTRODUCTION

The new generation of industrial control systems must now meet new criteria such as flexibility and agility (G. Pratl and Penzhorn, 2007). We distinguish two reconfiguration policies, static and dynamic: static reconfigurations are applied off-line, before any cold start of the system (Angelov et al., 2005), whereas dynamic reconfigurations are applied at run-time (Al-Safi and Vyatkin, 2007). We are interested in automatic reconfigurations of an agent-based embedded control system when hardware or software faults occur at run-time. The system is implemented by different complex networks of Control Components (event-triggered software units) such that only one network is executed at a given time, when the corresponding reconfiguration scenario is automatically applied by the agent under well-defined conditions. We propose an agent-based architecture that handles automatic reconfigurations by creating, deleting or updating components in order to bring the whole system into safe and optimal behaviors when faults occur.

This work was supported in part by the Natural Science Foundation of China under Grant 60773001, the Fundamental Research Funds for the Central Universities under Grant No. 72103326, the National Research Foundation for the Doctoral Program of Higher Education, the Ministry of Education, P. R. China, under Grant No. 20090203110009, the "863" High-tech Research and Development Program of China under Grant No 2008AA04Z109, the Research Fellowship for International Young Scientists, National Natural Science Foundation of China, and the Alexander von Humboldt Foundation.

In this paper, we aim to find the best solution for managing run-time faults so as to guarantee an optimal reactivity of the whole system. We assume three types of faults: the first affects sensors of the plant, the second affects actuators and the third affects control components. The agent maintains several queues to store run-time faults. To decide which fault queue the agent should serve first, we evaluate the performance of four approaches (Priority/FIFO, Priority/Round Robin, Priority/Priority and Priority/Random). The performance measure is the waiting time of a fault in a queue. A comparative study shows that Priority/Round Robin is the best approach, whereas Priority/Priority is the worst one. The simulation is carried out with the tool "SimulatorAgent", which makes it possible to check the agent-based embedded control system when hardware or software faults occur at run-time.

The next section describes the agent's algorithm, which ensures optimal management of run-time faults. We present the experimentation in Section 3 and conclude the paper in Section 4.

2 AGENT’S ALGORITHM

Gharbi A., Khalgui M., Zhang J. and Ben Ahmed S.
AGENT-BASED FAULT MANAGEMENT OF EMBEDDED CONTROL SYSTEMS.
DOI: 10.5220/0003490902770280
In Proceedings of the 6th International Conference on Software and Data Technologies (ICSOFT-2011), pages 277-280
ISBN: 978-989-8425-77-5
© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

Since a fault can affect a sensor, an actuator or a Control Component, we define a list of faults for each of them. With each kind of fault we associate a queue that stores every occurrence of that fault, in particular the fault type, the occurrence time and the treatment time. We therefore have three kinds of queues: faults affecting sensors are stored in the sensor fault queues $Queue^{S}_{j}$ ($1 \le j \le N_S$, where $N_S$ is the number of fault queues associated with sensors); faults affecting actuators are stored in the actuator fault queues $Queue^{A}_{j}$ ($1 \le j \le N_A$, where $N_A$ is the number of fault queues associated with actuators); and faults affecting Control Components are stored in the component fault queues $Queue^{C}_{j}$ ($1 \le j \le N_C$, where $N_C$ is the number of fault queues associated with control components).
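The fault record and the per-kind queues described above can be sketched in Python as follows. This is only an illustration: the class name `Fault`, its field names and the helper `make_queues` are hypothetical and do not come from SimulatorAgent.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fault:
    """One fault occurrence as stored in a queue (field names are illustrative)."""
    category: str                            # "sensor", "actuator" or "component"
    kind: int                                # index j of the queue in its category (1-based)
    time: float                              # occurrence time
    treatment_time: Optional[float] = None   # set once the agent has treated the fault


def make_queues(n_sensor: int, n_actuator: int, n_component: int) -> dict:
    """Create one empty queue per (category, kind) pair."""
    sizes = {"sensor": n_sensor, "actuator": n_actuator, "component": n_component}
    return {
        (category, j): deque()
        for category, n in sizes.items()
        for j in range(1, n + 1)
    }


# Hypothetical configuration: 2 sensor queues, 1 actuator queue, 3 component queues.
queues = make_queues(n_sensor=2, n_actuator=1, n_component=3)
fault = Fault(category="sensor", kind=2, time=4.0)
queues[(fault.category, fault.kind)].append(fault)
```

Keying the dictionary by (category, kind) keeps the three families of queues in one structure while still allowing each family to be iterated separately.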

The agent manages the system's reactivity by deciding when the faults stored in the queues are treated. The general algorithm is based on well-known scheduling policies. Our goal is an optimal behavior of the agent for a safe system.

Formalization

We introduce the notations used in the algorithm:

• $N_S$ (resp. $N_A$, $N_C$) is the total number of fault queues related to a sensor (resp. an actuator, a control component).

• $Queue^{S}_{j}$ (resp. $Queue^{A}_{j}$, $Queue^{C}_{j}$) is a queue associated with a given kind of fault affecting a sensor (resp. an actuator, a control component). This queue stores the occurrences of faults together with their characteristics, in particular the occurrence time and the treatment time.

• $GWT^{S}_{j}$ (resp. $GWT^{A}_{j}$, $GWT^{C}_{j}$) is the global waiting time of the faults in a queue related to a sensor (resp. an actuator, a control component).

• $MGWT_S$ (resp. $MGWT_A$, $MGWT_C$) is the mean global waiting time of the faults in the queues related to a sensor (resp. an actuator, a control component).

For the sake of simplicity, we present here only the main steps of the algorithm applied to the fault queues related to a sensor; the steps are the same for the others (i.e. the fault queues related to an actuator or a control component). Let $N_S$ be the number of fault queues that handle run-time faults related to a sensor. Let $Fault_{i,j}$ be an occurrence of a fault in the queue $Queue^{S}_{j}$ such that $1 \le i \le Queue^{S}_{j}.length()$. We assume that the agent computes the waiting time of each fault $Fault_{i,j}$, denoted by $WT_{i,j}$. The waiting time measures the total time that a fault waits in a queue. It corresponds to the duration between the occurrence time of the fault (denoted by $time_{i,j}$) and the end of its treatment by the agent (denoted by $treatmentT_{i,j}$):

$$WT_{i,j} = treatmentT_{i,j} - time_{i,j}$$

We denote in addition by $GWT^{S}_{j}$ the global waiting time of all the faults belonging to the same queue $Queue^{S}_{j}$. It is equal to the sum of the waiting times of the faults $Fault_{i,j}$, where $1 \le i \le Queue^{S}_{j}.length()$, divided by their number:

$$GWT^{S}_{j} = \frac{\sum_{i=1}^{Queue^{S}_{j}.length()} WT_{i,j}}{Queue^{S}_{j}.length()}$$

We also define the Mean Global Waiting Time (denoted here by $MGWT_S$) as the sum of the global waiting times of all the queues related to a sensor divided by their number $N_S$:

$$MGWT_S = \frac{\sum_{j=1}^{N_S} GWT^{S}_{j}}{N_S}$$
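As a small worked illustration of the three measures, the following Python sketch computes the waiting time, the global waiting time and the mean global waiting time for two hypothetical sensor fault queues. The function names and the sample timestamps are assumptions made for this example only.

```python
def waiting_time(occurrence_time, treatment_end):
    """WT of one fault: end of treatment minus occurrence time."""
    return treatment_end - occurrence_time


def global_waiting_time(waiting_times):
    """GWT of one queue: sum of the waiting times divided by their number."""
    return sum(waiting_times) / len(waiting_times)


def mean_global_waiting_time(gwts):
    """MGWT of one category: sum of the per-queue GWT values divided by the number of queues."""
    return sum(gwts) / len(gwts)


# Hypothetical data: each fault is (occurrence time, end of treatment).
sensor_queues = [
    [(0, 4), (2, 10)],          # queue 1: waiting times 4 and 8
    [(1, 3), (5, 9), (6, 12)],  # queue 2: waiting times 2, 4 and 6
]

gwt = [
    global_waiting_time([waiting_time(t, end) for t, end in queue])
    for queue in sensor_queues
]
mgwt = mean_global_waiting_time(gwt)
print(gwt, mgwt)  # [6.0, 4.0] 5.0
```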

We distinguish two periodic actions in the agent's algorithm: fault occurrence and fault management. When a fault occurs, the agent identifies its kind and puts it in the associated queue. Fault management is a periodic task in which the agent treats a fault, computes the waiting time associated with this fault and then deletes it from the corresponding queue. Finally, the agent computes the mean global waiting time for the faults related to sensors, actuators or control components. We present below the detailed algorithm of the agent handling the different faults. In particular, this algorithm tries to minimize the global waiting time of faults in a queue.

Detailed Agent’s Algorithm

(0) Initialization
    ∀ j ∈ [1..N_S]  Queue^S_j.clear();
    ∀ j ∈ [1..N_A]  Queue^A_j.clear();
    ∀ j ∈ [1..N_C]  Queue^C_j.clear();

(1) Fault occurrence
    For each period ∆
      If occurrence(fault) then
        fault.time ← currentTime();
        Switch type(fault)
          case fault_S:
            ∃ j ∈ [1..N_S] / fault.kind = j
            Queue^S_j.push(fault);
          case fault_A:
            ∃ j ∈ [1..N_A] / fault.kind = j
            Queue^A_j.push(fault);
          case fault_C:
            ∃ j ∈ [1..N_C] / fault.kind = j
            Queue^C_j.push(fault);

(2) Fault management
    For each period ∆′
      If treat(fault) then
        Switch type(fault)
          case fault_S:
            ∃ j ∈ [1..N_S] / fault.kind = j
            & i = the highest-priority fault in Queue^S_j
            WT_{i,j} ← currentTime() − Queue^S_j.get(i).time
            Queue^S_j.pop(i);
            GWT^S_j ← GWT^S_j + WT_{i,j};
          case fault_A:
            ∃ j ∈ [1..N_A] / fault.kind = j
            & i = the highest-priority fault in Queue^A_j
            WT_{i,j} ← currentTime() − Queue^A_j.get(i).time
            Queue^A_j.pop(i);
            GWT^A_j ← GWT^A_j + WT_{i,j};
          case fault_C:
            ∃ j ∈ [1..N_C] / fault.kind = j
            & i = the highest-priority fault in Queue^C_j
            WT_{i,j} ← currentTime() − Queue^C_j.get(i).time
            Queue^C_j.pop(i);
            GWT^C_j ← GWT^C_j + WT_{i,j};

(3) Measure of Mean Global Waiting Time
    MGWT_S ← 0;
    MGWT_A ← 0;
    MGWT_C ← 0;
    For j := 1 to N_S
      MGWT_S ← MGWT_S + GWT^S_j / Queue^S_j.length()
    MGWT_S ← MGWT_S / N_S
    Print("MGWT for sensor faults: ", MGWT_S)
    For j := 1 to N_A
      MGWT_A ← MGWT_A + GWT^A_j / Queue^A_j.length()
    MGWT_A ← MGWT_A / N_A
    Print("MGWT for actuator faults: ", MGWT_A)
    For j := 1 to N_C
      MGWT_C ← MGWT_C + GWT^C_j / Queue^C_j.length()
    MGWT_C ← MGWT_C / N_C
    Print("MGWT for Control Component faults: ", MGWT_C)

Finally, we note that the complexity of the approach is O(n), where n is the greatest number among $N_S$, $N_A$ and $N_C$.
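The steps of the algorithm can be sketched for a single fault category as follows. This is a minimal Python illustration under stated assumptions, not the SimulatorAgent implementation: the class `FaultAgent` and its method names are hypothetical, a lower number is assumed to mean a higher priority, queue indices are 0-based, and the per-queue mean is taken over the faults already treated.

```python
import heapq
import itertools


class FaultAgent:
    """Keeps one priority queue per fault kind and records waiting times."""

    def __init__(self, n_queues):
        self.queues = [[] for _ in range(n_queues)]  # step (0): all queues start empty
        self.waits = [[] for _ in range(n_queues)]   # waiting times of treated faults
        self._order = itertools.count()              # tie-breaker for equal priorities

    def on_fault(self, kind, priority, now):
        """Step (1): store the fault in the queue that matches its kind."""
        heapq.heappush(self.queues[kind], (priority, next(self._order), now))

    def treat(self, kind, now):
        """Step (2): remove the highest-priority fault and log its waiting time."""
        _, _, occurred = heapq.heappop(self.queues[kind])
        self.waits[kind].append(now - occurred)

    def mgwt(self):
        """Step (3): mean over the queues of the mean waiting time per queue."""
        gwts = [sum(w) / len(w) for w in self.waits if w]
        return sum(gwts) / len(gwts) if gwts else 0.0


agent = FaultAgent(n_queues=2)
agent.on_fault(kind=0, priority=1, now=0)
agent.on_fault(kind=0, priority=0, now=1)
agent.on_fault(kind=1, priority=0, now=2)
agent.treat(kind=0, now=5)  # treats the fault raised at t=1, waiting time 4
agent.treat(kind=1, now=6)  # waiting time 4
agent.treat(kind=0, now=9)  # treats the fault raised at t=0, waiting time 9
print(agent.mgwt())         # 5.25
```

A binary heap is used here so that selecting the highest-priority fault inside a queue stays cheap; any other priority structure would serve the same purpose.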

3 EXPERIMENTATION

The goal of this paper is to define an optimal policy for the agent to manage software and hardware faults at run-time. We present a comparative study based on the global waiting time of faults in queues under well-known scheduling policies. We evaluate the performance of the following four approaches in order to determine which one the agent should adopt:

Priority/FIFO Approach: for faults from the same queue, we use the priority criterion; for faults related to different queues, we use the First In/First Out criterion;

Priority/Round Robin Approach: for faults from the same queue, we use the priority criterion; for faults related to different queues, we use the Round Robin criterion, which means that the first time we take a fault from the first queue, the second time we take a fault from the second queue, and so on;

Priority/Priority Approach: for faults from the same queue, we use the priority criterion; for faults related to different queues, we use a priority criterion between the queues;

Priority/Random Approach: for faults from the same queue, we use the priority criterion; for faults related to different queues, we use a random choice.
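The four inter-queue criteria can be sketched as selection functions that return the index of the queue to serve next. This is an illustrative Python sketch with several assumptions that the text does not fix: each queue holds fault occurrence times, FIFO picks the queue containing the oldest fault, inter-queue priority follows the queue index (smallest index first), and at least one queue is non-empty. All function names are hypothetical.

```python
import random


def pick_fifo(queues, state):
    """Choose the non-empty queue that holds the oldest fault."""
    candidates = [j for j, q in enumerate(queues) if q]
    return min(candidates, key=lambda j: min(queues[j]))


def pick_round_robin(queues, state):
    """Visit the queues cyclically, skipping the empty ones."""
    n = len(queues)
    for step in range(1, n + 1):
        j = (state["last"] + step) % n
        if queues[j]:
            state["last"] = j
            return j
    raise ValueError("all queues are empty")


def pick_priority(queues, state):
    """Always serve the non-empty queue with the smallest index."""
    return next(j for j, q in enumerate(queues) if q)


def pick_random(queues, state):
    """Serve a non-empty queue chosen uniformly at random."""
    return state["rng"].choice([j for j, q in enumerate(queues) if q])


queues = [[5, 7], [2], [3, 9]]  # hypothetical occurrence times per queue
state = {"last": -1, "rng": random.Random(0)}
print(pick_fifo(queues, state))                             # 1
print(pick_priority(queues, state))                         # 0
print([pick_round_robin(queues, state) for _ in range(4)])  # [0, 1, 2, 0]
```

Under the fixed inter-queue priority, queue 0 is served as long as it is non-empty, which is the starvation effect one would expect to increase waiting times in the other queues.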

To obtain comparable results, all the tests are based on the same fault characteristics, which yields the results reported in Tables 1 to 4.

Table 1: Waiting time according to the Priority/FIFO approach.

Time unit   Sensor   Actuator   Component
1           38       22         6
2           40       34         29
3           54       64         51
4           61       53         41
5           67       64         55
6           57       67         79
MGWT        52.83    50.67      43.50

Table 2: Waiting time according to the Priority/Round Robin approach.

Time unit   Sensor   Actuator   Component
1           38       19         25
2           17       25         33
3           51       21         38
4           27       47         51
5           55       67         49
6           50       38         58
MGWT        39.67    36.17      42.33

Table 3: Waiting time according to the Priority/Priority approach.

Time unit   Sensor   Actuator   Component
1           100      120        11
2           130      98         16
3           120      110        21
4           150      100        21
5           130      125        19
6           135      130        35
MGWT        127.5    113.83     20.5

Interpretation:

Figure 1 presents the Mean Global Waiting Time (MGWT) for each approach. From the curves in Figure 1, we conclude that the best solution for the agent is the Priority/Round Robin approach. This result is expected because the Priority/Round Robin approach treats all the fault queues of the different categories equally, so that the faults of every queue are treated without waiting a long time. The Priority/Random approach also provides interesting results, since its MGWT values are not large. The Priority/FIFO approach yields medium MGWT values, so it is neither the best nor the worst approach. The Priority/Priority approach is the worst one: the agent gives priority to a single queue while the other queues are neglected, which leads to high MGWT values for the neglected queues (127.5 for sensor faults and 113.83 for actuator faults in Table 3). Considering all these interpretations, we recommend the Round Robin policy for the optimal implementation of the agent.
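The MGWT rows of the four tables and the resulting ranking can be recomputed with a few lines of Python. The per-approach overall average used for the ranking is an aggregate introduced here for illustration only; the waiting times are those reported in Tables 1 to 4.

```python
from statistics import mean

# Waiting times from Tables 1 to 4 (six time units per fault category).
tables = {
    "Priority/FIFO": {
        "sensor": [38, 40, 54, 61, 67, 57],
        "actuator": [22, 34, 64, 53, 64, 67],
        "component": [6, 29, 51, 41, 55, 79],
    },
    "Priority/Round Robin": {
        "sensor": [38, 17, 51, 27, 55, 50],
        "actuator": [19, 25, 21, 47, 67, 38],
        "component": [25, 33, 38, 51, 49, 58],
    },
    "Priority/Priority": {
        "sensor": [100, 130, 120, 150, 130, 135],
        "actuator": [120, 98, 110, 100, 125, 130],
        "component": [11, 16, 21, 21, 19, 35],
    },
    "Priority/Random": {
        "sensor": [18, 18, 52, 62, 36, 26],
        "actuator": [0, 29, 14, 50, 75, 52],
        "component": [18, 84, 79, 81, 48, 37],
    },
}

# MGWT per approach and category, rounded as in the tables.
mgwt = {
    name: {cat: round(mean(values), 2) for cat, values in cols.items()}
    for name, cols in tables.items()
}

# Overall average per approach (illustrative aggregate), lowest first.
overall = {name: mean(mean(v) for v in cols.values()) for name, cols in tables.items()}
ranking = sorted(overall, key=overall.get)

for name in ranking:
    print(f"{name}: {overall[name]:.2f}")
```

With these data the ordering is Priority/Round Robin, Priority/Random, Priority/FIFO and Priority/Priority, which agrees with the interpretation above.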

Table 4: Waiting time according to the Priority/Random approach.

Time unit   Sensor   Actuator   Component
1           18       0          18
2           18       29         84
3           52       14         79
4           62       50         81
5           36       75         48
6           26       52         37
MGWT        35.33    36.67      57.83

Figure 1: Comparative study.

4 CONCLUSIONS

To guarantee a safe behavior of the whole system, we define an agent-based architecture in which the agent controls the plant and treats each fault whenever it occurs. To do so, we classify the faults (faults related to sensors, actuators or control components); for each category, we define several kinds of faults. To decide which fault queue the agent should serve first, we evaluate the performance of four approaches (Priority/FIFO, Priority/Round Robin, Priority/Priority and Priority/Random). The results obtained allow us to compute the Mean Global Waiting Time (MGWT), which leads us to consider the Priority/Round Robin approach as the best solution and the Priority/Priority approach as the worst one.

REFERENCES

Al-Safi, Y. and Vyatkin, V. (2007). An ontology-based reconfiguration agent for intelligent mechatronic systems. In Third International Conference on Industrial Applications of Holonic and Multi-Agent Systems. Springer-Verlag.

Angelov, C., Sierszecki, K., and Marian, N. (2005). Design models for reusable and reconfigurable state machines. In L.T. Yang et al. (Eds.): EUC 2005, LNCS 3824, pp. 152-163. International Federation for Information Processing.

G. Pratl, D. Dietrich, G. H. and Penzhorn, W. (2007). A new model for autonomous, networked control systems. IEEE Transactions on Industrial Informatics, 3(1).
