ENFORCING DEPENDABILITY AND TIMELINESS IN CANELy

Application to Spaceborne Data Communication Systems∗

José Rufino, Paulo Verissimo

Universidade de Lisboa, Faculdade de Ciências, LaSIGE, Campo Grande, 1749-016 Lisboa, Portugal

Ricardo Pinto, Carlos Almeida, Guilherme Arroz

Universidade Técnica de Lisboa, Instituto Superior Técnico, Avenida Rovisco Pais, 1049-001 Lisboa, Portugal

Keywords:

Dependability and real-time, Controller area network, Spacecraft data communication.

Abstract:

The Controller Area Network (CAN) has played a crucial role over the last decade in the design and implementation of distributed embedded systems. However, the native CAN protocol exhibits a set of availability, reliability and timeliness limitations. Given the large practical base of off-the-shelf microcontrollers integrating standard CAN interfaces and the emergence of CAN protocol open cores, a fundamental question is whether (and how) those components can be used for highly dependable applications of CAN.

This paper identifies a fundamental set of shortcomings of the native CAN protocol and discusses how existing CAN controllers can be combined with additional hardware/software components to secure the provisioning of strict dependability and timeliness guarantees. Furthermore, the paper discusses the main issues in the design and implementation of CANELy, a CAN-based infrastructure capable of extremely reliable hard real-time communication, and shows how CANELy components can be integrated in the onboard data communication and processing infrastructure currently being designed for future space vehicle avionics.

1 INTRODUCTION

The Controller Area Network (CAN) has played a crucial role over the last decade in the design and implementation of distributed embedded systems in areas as diverse as industrial automation, automotive, train transportation, medical, oil drilling, aeronautics and space. Standard CAN-based profiles have been defined for a diversified set of specific devices and application domains. Recently, the domains of aeronautics (AEEC, 2010) and space (ECSS, 2005) have been approached.

In the course of our current research aiming at building a time- and space-partitioned architecture for the next generation of space vehicle avionics we are tackling the difficult problem of integrating input/output (I/O) functions, such as sensors, actuators and networks, while maintaining overall system responsiveness (Rufino et al., 2010). The architectural principle of time- and space-partitioning (TSP) enables the safe integration of applications with different degrees of criticality in a single computing platform. Applications are segregated into logical containers, the partitions, for the benefit of fault containment and to ease verification, validation and certification. Each partition uses a predefined dedicated memory addressing space; access to a given I/O device is granted to the system partition hosting the corresponding agent for I/O operations. Partitions and therefore I/O operations are scheduled under a predetermined, cyclically repeated, sequence of time windows.

∗ This work was partially developed within the scope of the European Space Agency Innovation Triangle Initiative program, through ESTEC Contract 21217/07/NL/CB, Project AIR-II (ARINC 653 in Space RTOS – Industrial Initiative, http://air.di.fc.ul.pt). This work was partially supported by Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), through the Multiannual Funding and CMU-Portugal Programs.

Given the reasonable body of research in CAN

dependable communications and the on-going stan-

dardisation activities in the space domain, we are ap-

proaching the use of the CAN data bus with the pur-

pose of integrating responsive remote terminal units

(RTU) and simple sensors/actuators in a TSP system

aboard a spacecraft (Figure 1).

Figure 1: Utilization of CAN in a spacecraft TSP system.

Rufino J., Verissimo P., Pinto R., Almeida C. and Arroz G. ENFORCING DEPENDABILITY AND TIMELINESS IN CANELy - Application to Spaceborne Data Communication Systems. DOI: 10.5220/0003376004560463. In Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems (PECCS-2011), pages 456-463. ISBN: 978-989-8425-48-5. © 2011 SCITEPRESS (Science and Technology Publications, Lda.)

CAN is traditionally viewed as a robust data bus. However, the native CAN protocol exhibits a set of severe limitations with regard to the provisioning of strict availability, reliability and timeliness guarantees, which are a must for spaceborne applications. Given the large practical base of off-the-shelf standard CAN interfaces, a fundamental question is whether (and how) these components can be used for highly dependable applications of the CAN data bus.

In fact, what is missing in the standard CAN data

bus to attain high levels of dependability is a set of

fault tolerance and timeliness-related services. These

can be provided off-the-shelf (i.e. without modiﬁca-

tions to the CAN standard or to existing CAN con-

trollers), through the use of properly encapsulated ad-

ditional hardware/software components. The mate-

rialization of this concept is called CAN Enhanced

Layer (CANELy), which is made from several hard-

ware and software building blocks (Ruﬁno, 2002).

This paper discusses the main issues in the design

and implementation of CANELy and how such func-

tionality can be effectively integrated with a TSP ar-

chitecture. The paper is organized as follows: Sec-

tion 2 provides a short description of CAN and anal-

yses its dependability; Section 3 discusses the system

model; Section 4 analyses how to improve the avail-

ability of the network infrastructure; Section 5 dis-

cusses how to secure CAN timely behaviour in the

presence of faults; Section 6 addresses the integration

of a semantically rich CANELy service interface in

a TSP architecture and the separation of implemen-

tation issues between hardware and software compo-

nents. Finally, Section 7 concludes the paper.

2 CAN DATA BUS

CAN is a multi-master data bus that uses a twisted

pair cable as transmission medium (CAN, 1993; CiA,

1994). The network maximum length depends on

the data rate. Typical values are: 40m @ 1 Mbps;

1000m @ 50 kbps. Data bus signalling takes one

out of two values: recessive (r), also the state of an

idle bus; dominant (d), which always overwrites a re-

cessive value. This behaviour, together with the use

of unique frame identiﬁers, is exploited for bus arbi-

tration. A carrier sense multi-access with determin-

istic collision resolution policy is used. When sev-

eral nodes compete for bus access, the node transmit-

ting the frame with the lowest identiﬁer always goes

through and gets the bus. Frames that have lost arbi-

tration or have been destroyed by errors are automat-

ically scheduled for retransmission. A data frame is

a piece of encapsulated information, which may con-

tain a message, a user-level piece of information. A

remote frame has no ﬁeld for message encapsulation.
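The bitwise arbitration described above can be illustrated with a small simulation. The following Python sketch is a simplified model and not a CAN controller implementation: it assumes unique 11-bit identifiers sent most significant bit first, encodes dominant as 0 and recessive as 1, and the function name `arbitrate` is purely illustrative.

```python
def arbitrate(identifiers, width=11):
    """Return the identifier that wins bitwise bus arbitration.

    Simplified model (assumption): dominant = 0, recessive = 1,
    identifiers are unique, non-empty and fit in `width` bits.
    """
    contenders = set(identifiers)
    for bit in range(width - 1, -1, -1):
        # Wired-AND bus: the bit is dominant (0) if any contender sends 0.
        bus = min((ident >> bit) & 1 for ident in contenders)
        # A node that sent recessive but reads dominant loses arbitration
        # and withdraws; its frame is retried later.
        contenders = {ident for ident in contenders
                      if (ident >> bit) & 1 == bus}
    (winner,) = contenders  # exactly one node remains
    return winner
```

For instance, with contenders `0x123`, `0x65` and `0x64`, the model selects `0x64`, the lowest identifier, in agreement with the deterministic collision resolution policy.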

In the signalling of abnormal network operation

incidents, the CAN protocol uses: error frames, for

(global) error signalling; overload frames, to react to

violations of the standard interframe spacing, which

has a nominal duration of three bit-times and is known

in CAN terminology as intermission (CAN, 1993).

Although the standard CAN physical layer allows

a few cabling faults (one wire open/short failures) to

be tolerated (CAN, 1993; CAN, 1997), no standard-

ized mechanism exists to provide resilience against

network partitioning if both wires of the network ca-

ble get simultaneously interrupted. A solution to

the problem of (physical) network partitioning has

to be built as an extension to the standard speciﬁca-

tion (NOB, 1998; Ruﬁno et al., 1999).

Furthermore, the occurrence of certain incidents

in CAN operation (such as: bit errors; transmitter/re-

ceiver glitches) produces a subtle form of (virtual)

network partitioning, called inaccessibility. Though

the standard CAN protocol has means of recovering

from these situations, the recovery process takes time,

which increases the network access delay as seen by one or more nodes. This may induce a violation of the expected network timeliness properties and therefore provisions to tolerate this kind of fault are required (Verissimo et al., 1997; Rufino et al., 2006).

3 SYSTEM MODEL

The deﬁnition of a systemic model for CAN proved

extremely useful, showing the weaknesses of CAN

with regard to dependability and providing the

grounds to handle those problems effectively. The

fault assumptions for the system and a relevant set

of CAN protocol properties are drawn from previous

works on CAN (Ruﬁno et al., 1998; Ruﬁno et al.,

1999; Ruﬁno, 2002).

3.1 Fault Model

The CAN infrastructure is composed of N nodes interconnected by a Channel. The Channel is the physical path, i.e. the cable medium and transceivers, used by Medium Access Control (MAC) entities to communicate.

A component is weak-fail-silent if it either behaves correctly or crashes once it exhibits more than a given number of omission failures (the component's omission degree) in a time interval of reference, $T_{rd}$. In CAN, an omission is an error that destroys a data or remote frame. The following failure semantics are defined for CAN network components:

• individual components are weak-fail-silent with omission degree $f_o$;

• failure bursts never affect more than $f_o$ transmissions in a time interval of reference;

• omission failures may be inconsistent (i.e., not observed by all recipients);

• there is no permanent failure of the Channel (e.g. the simultaneous partitioning of all redundant media).

3.2 CAN Protocol Properties

For the sake of completeness, a discussion of a rele-

vant set of CAN properties is summarized next. The

foundation of CAN operation is described by the

physical layer properties formalized in Figure 2.

Property PCAN1 formalizes the quasi-stationary propagation of signals in the CAN Channel (Stuart, 1999; Rufino et al., 1999). A Bit is the physical layer information unit and has a constant nominal duration. A single Bit is broadcast in the Channel at a time, as described by PCAN3. In absence of faults, a Bit p at s assumes one and only one logical value $v_s(p)$. The symbol $\prod$ is used in PCAN2 to specify a logical AND function combining the signals from multiple simultaneous transmitters into a single Bit value.

A key set of CAN MAC sub-layer properties is also enumerated in Figure 2. Property MCAN1 derives from CAN built-in error handling mechanisms, implying that frame errors are transformed into omissions. The residual probability of undetected frame errors is negligible (Charzinski, 1994). Property MCAN2 maps the system model failure semantics onto CAN operational assumptions, with $k \geq f_o$.

The behaviour of CAN in the time domain is described by property MCAN4. In absence of faults, $T_{td}$ includes the normal queuing, access and frame transmission delays. It depends on message latency classes and offered load bounds (Davis et al., 2007; Zuberi and Shin, 1997; Livani et al., 1998). In general, $T_{td}$ also needs to include the extra delays resulting from the additional queuing effects caused by the periods of inaccessibility (Pinho et al., 2000; Punnekkat et al., 2000). The maximum frame transmission delay includes a corrective term, $T_{ina}$, which accounts for the worst case duration of inaccessibility events (MCAN3). The inaccessibility characteristics of CAN are obtained by analysis of the CAN protocol (Verissimo et al., 1997; Rufino, 2002).

Physical-level properties

PCAN1 - Bit Simultaneity: for any Bit p of any transmitter s starting at $t_s(p)$, if $t_r(p)$ is the start of Bit p as seen by receiver r, for any r, then in absence of faults, $t_r(p) = t_s(p)$.

PCAN2 - Wired-AND Multiple Access: for all transmitters s in N, the value of any Bit p seen by the channel c is, in absence of faults, $v_c(p) = \prod_{s \in N} v_s(p)$.

PCAN3 - Bit Broadcast: in absence of faults, for any Bit p on the channel c, and for any receiver r, $v_r(p) = v_c(p)$.

MAC-level properties

MCAN1 - Error Detection: correct nodes detect any corruption done by the network in a locally received frame.

MCAN2 - Bounded Omission Degree: in a known time interval $T_{rd}$, omission failures may occur in at most k transmissions.

MCAN3 - Bounded Inaccessibility: in a known time interval $T_{rd}$, the network may be inaccessible at most i times, with a total duration of at most $T_{ina}$.

MCAN4 - Bounded Transmission Delay: any frame queued for transmission is transmitted on the network within a bounded delay of $T_{td} + T_{ina}$.

Figure 2: Relevant CAN protocol properties.

4 NETWORK AVAILABILITY

The ﬁrst problem to be addressed concerns the avail-

ability of the CAN infrastructure. A commercial so-

lution (NOB, 1998) uses a self-healing ring/bus but it

does not solve the problem efﬁciently: ring reconﬁg-

uration may last as long as 100 ms, an extremely high

inaccessibility ﬁgure (Ruﬁno, 2002).

In CANELy, resilience to network physical parti-

tioning is achieved through replication of the physical

path (bus medium and transceivers) used by MAC en-

tities to communicate. Replication of channel media

assumes that: each cable replica is routed differently,

being reasonable to consider failures in different me-

dia as independent; any bit issued from a MAC sub-

layer is simultaneously transmitted on all the redun-

dant media interfaces.

Basic Media Redundancy Mechanisms

An innovative strategy to handle replicated media is based on a Columbus' egg idea and extends the wired-AND nature of CAN (property PCAN2, in Figure 2) to the media interface level (Rufino et al., 1999): the signals from the different M redundant media receivers, $M_{Rx}(m)$, are combined in a conventional AND function, before interfacing the standard MAC sub-layer, $Ch_{Rx}$.

The speciﬁcation of such strategy in VHDL

is drawn in Figure 3. This simple solution, fea-

sible given property PCAN1, ensures resilience to

medium physical partitions and stuck-at-recessive

failures (Ruﬁno et al., 1999).

    -- MediaRX : Vector aggregating the several media.
    -- ChRx    : Channel incoming (Rx) bit stream, is
    --           logical '1' if all media are '1', else '0'.
    --           In CAN, logical '1' <=> recessive (r)
    --                   logical '0' <=> dominant (d)
    ChRx <= '1' when MediaRX = (MediaRX'range => '1')
            else '0';

Figure 3: The AND-based media redundancy management

strategy in VHDL.

Resilience to bus stuck-at-dominant failures is achieved by exploiting the identity value of the logical AND function. The contribution of each medium interface to that function can be selectively disabled, through the assertion of the $M_{dis}(m)$ signal, as specified in Figure 4.

    -- MediaRX  : Vector aggregating the several media.
    -- M_Rx(m)  : Medium m Rx signal (transceiver).
    -- M_dis(m) : Medium m disable signal.
    -- The sensitivity list re-evaluates the process whenever
    -- a receive or disable signal changes.
    procMediumRXOR: process (M_Rx, M_dis) is
    begin
      -- Parallelizing selective actions on each medium
      for m in 1 to NumberMedia loop
        MediaRX(m) <= M_Rx(m) or M_dis(m);
      end loop; -- m
    end process procMediumRXOR;

Figure 4: Media selection functions in VHDL.

The $M_{dis}(m)$ signal is asserted as specified in Figure 5, upon the detection of a stuck-at-dominant failure, signalled through the assertion of $M_{stkd}(m)$, or after a Medium has exceeded its omission degree bound. The $M_{dis}(m)$ signal is locked in the asserted state until the negation of the $M_{lock}(m)$ signal, by CANELy media quarantine entities or upon a request issued from high-level management entities. The function specified in Figure 5 contributes to ensuring safe operation, preventing a faulty Medium from being prematurely re-enabled.

VHDL: Very High-Speed Integrated Circuits (VHSIC) Hardware Description Language.

    -- M_dis(m)  : Medium m disable signal.
    -- M_stkd(m) : Medium m is stuck-at-dominant.
    -- M_Od(m)   : Medium m omission degree.
    -- k_m       : Medium m omission degree bound.
    -- M_lock(m) : Medium m lock.
    MediaDisable: for m in 1 to NumberMedia generate
      -- Parallelizing disable actions on each medium
      -- Generate the disable signal for Medium m
      M_dis(m) <=
        M_stkd(m) or (M_Od(m) > k_m) or M_lock(m);
    end generate MediaDisable;

Figure 5: Generation of the media disable functions in

VHDL.
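The combined behaviour of Figures 3 to 5 can be checked with a small software model. The Python sketch below is only a bit-level illustration under simplifying assumptions (one sample per medium, recessive encoded as 1 and dominant as 0); the function names `media_disable`, `media_rx` and `channel_rx` are hypothetical and merely mirror the VHDL signal names.

```python
def media_disable(m_stkd, m_od, k_m, m_lock):
    """Figure 5: disable a medium if it is stuck-at-dominant, has exceeded
    its omission degree bound k_m, or is locked."""
    return [int(bool(stkd) or od > k_m or bool(lock))
            for stkd, od, lock in zip(m_stkd, m_od, m_lock)]


def media_rx(m_rx, m_dis):
    """Figure 4: OR each medium with its disable signal, so a disabled
    medium contributes the AND identity value (recessive, 1)."""
    return [rx | dis for rx, dis in zip(m_rx, m_dis)]


def channel_rx(media):
    """Figure 3: the Channel is recessive (1) only if all media are."""
    return int(all(bit == 1 for bit in media))
```

With two media, a partitioned or stuck-at-recessive medium (`m_rx = [0, 1]`) does not hide the dominant bit seen on the healthy one, since `channel_rx` yields 0. A stuck-at-dominant medium (`m_rx = [1, 0]`) blocks the Channel until its disable signal is asserted, after which the Channel follows the healthy medium again.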

The AND-based Media Redundancy Management

is a central component in the CANELy architecture of

Figure 6. The remaining modules identiﬁed in the di-

agram of Figure 6 provide additional monitoring and

fault treatment functions.

Figure 6: CANELy Media Redundancy Mechanisms (block diagram labels: CAN Controller; Channel Interface; Media Redundant CAN Communication Channel).

Media and Channel Monitoring Functions

The set of Media and Channel monitoring functions identified in Figures 7 and 8 is needed to complement the bare functionality provided by the AND-based Media Redundancy Management strategy.

A combination of Media and Channel monitoring signals is used to provide the following functionality:

• disable operation of Medium m, if a stuck-at-dominant failure is detected, as reported through $M_{stkd}(m)$;

• perform receiver-based frame monitoring, comparing Channel and Medium incoming frame data on a bit-by-bit basis. This mechanism is fundamental to detect Medium omissions;

• detect and account for omissions at each Medium interface and evaluate the corresponding Medium omission degree, $M_{Od}(m)$;

• disable operation of Medium m, if it exceeds the allowed omission degree bound, $k_m$, i.e. if $M_{Od}(m) > k_m$.

Basic Media Monitoring

$M_{stkd}(m)$   Stuck-at dominant medium
    asserted if dominant for more than a given threshold;
    negated upon detection of a recessive bit.

$M_{Od}(m)$   Medium omission degree
    incremented upon detection of an omission failure;
    unchanged if common-mode or unknown-source errors;
    reset upon correct frame transfer on the Medium.

$M_{lock}(m)$   Lock Medium m disable status
    asserted upon disable of Medium m, by $M_{dis}(m)$ (Figure 5);
    negated by media quarantine or by high-level entities.

Extended Media Monitoring

$M_{idle}(m)$   Medium idle
    asserted if recessive for more than a given threshold;
    negated upon assertion of $Ch_{EOT}$ (see Figure 8).

$M(m)$   Medium dominant signaling
    asserted upon detection of a dominant bit;
    negated upon assertion of $Ch_{EOT}$ (see Figure 8).

Figure 7: Media monitoring functions in CANELy.
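The omission degree accounting and lock behaviour listed in Figure 7 can be sketched in software as follows. This is a hedged illustration and not the FPGA logic: the class name `MediumMonitor`, the outcome labels and the choice to clear the counter when the lock is released are assumptions made for the example.

```python
class MediumMonitor:
    """Per-medium omission degree counter with lock (illustrative model)."""

    def __init__(self, k_m):
        self.k_m = k_m      # omission degree bound
        self.od = 0         # current omission degree
        self.lock = False   # lock status, keeps the medium disabled

    def on_frame(self, outcome):
        """Account for one frame; outcome is 'ok', 'omission' or 'common'
        (common-mode or unknown-source error, which leaves od unchanged)."""
        if outcome == "omission":
            self.od += 1
        elif outcome == "ok":
            self.od = 0
        elif outcome != "common":
            raise ValueError(f"unknown outcome: {outcome!r}")
        if self.od > self.k_m:
            self.lock = True  # disabling the medium also locks it
        return self.disabled

    @property
    def disabled(self):
        return self.od > self.k_m or self.lock

    def release(self):
        """Negate the lock, as media quarantine or a high-level entity would.
        Clearing od here is an assumption of this sketch."""
        self.lock = False
        self.od = 0
```

Note that a correct frame observed after the bound was exceeded resets the counter but does not re-enable the medium: only an explicit `release()` does, which is the safety property the lock is meant to provide.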

Basic Channel Monitoring

$Ch_{SOF}$   Start Of Frame
    asserted at beginning of frame transmission;
    one bit-time duration.

$Ch_{Fok}$   Frame Correct
    data or remote frame received without errors;
    negated upon assertion of $Ch_{EOT}$.

$Ch_{Err}$   Frame Error
    asserted upon violation of CAN bit-stuffing rule;
    negated upon assertion of $Ch_{EOT}$.

$Ch_{EOT}$   End Of Transmission
    asserted after detection of minimum bus idle period;
    negated upon assertion of $Ch_{SOF}$.

Extended Channel Monitoring

$Ch_{stk-Tx}$   Stuck-at dominant Channel
    asserted if dominant for more than a given threshold;
    negated upon management request.

Figure 8: Channel monitoring functions in CANELy.

These mechanisms provide effective resilience

against all the cabling failures discussed in Section 2.

They are not hard to implement in VHDL as FPGA-based components.

The VHDL/FPGA machinery of a functionally ef-

fective CANELy unit should also integrate speciﬁc

means for the preservation of dependability coverage,

as follows:

• detection of medium partition and medium stuck-at-recessive failures and their signalling to high-level management entities;

• early detection of stuck-at-dominant Channel failures, allowing a prompt shutdown of the incorrect node;

• operation of a CAN-oriented media quarantine scheme, allowing an optimal k = 1 Channel omission degree bound, if at least one channel media replica is unaffected by errors (permanent or transient).

FPGA: Field Programmable Gate Array.

Management Interface

The layer management entities identified in Figure 6 are elements of the CANELy machinery implemented as FPGA-based components. They provide an interface between the hardware infrastructure and the high-level network management protocol entities. Both invocation and notification primitives are included, as specified in Figure 9, given the parameters: (i) baud, the bus bit signalling rate; (ii) $k_m$, the media omission degree bound; (iii) m, the failed Medium; (iv) mid, the message identifier.

Invocation Primitives (canely-msu.req)

Description
Initialize (baud, $k_m$)

Notification Primitives (canely-msu.nty)

Description                            Issuing Condition
Omission degree exceeded (m)           $M_{Od}(m) > k_m$
Stuck-at-dominant Medium (m)           $M_{stkd}(m)$
Stuck-at-recessive Medium (m, mid)     $M_{idle}(m) \land \lnot M(m)$
Medium partition (m, mid)              $M_{idle}(m) \land M(m)$
Stuck-at-dominant Channel              $Ch_{stk-Tx}$

Figure 9: CANELy redundancy management primitives.

The parameters signalled upon stuck-at-recessive and Medium partition failures permit a high-level diagnostic application to establish a node connectivity matrix, useful to pinpoint the location of the failure in the network cabling.

Although such mechanisms may be useless in

unmanned spacecraft, they may be important for

manned space ﬂights or in permanent planetary bases

where the crew may perform some repair actions.

5 INACCESSIBILITY CONTROL

Normal CAN operation can be hindered by periods

of inaccessibility, which derive from incidents in net-

work operation (e.g. bit errors) that temporarily pre-

vent communication. Service is not provided to some

or all of the nodes and this may have the effect of

increasing the corresponding queueing and network

access delays. Analyses of message transmission latencies performed under the assumption that the network always operates normally (Davis et al., 2007; Zuberi and Shin, 1997; Livani et al., 1998) are relevant and, undoubtedly, useful for optimal system configuration. However, the bounds they establish may be violated upon the (even if rare) occurrence of inaccessibility events.

To avoid timing failures due to network inacces-

sibility incidents it is required to control inaccessibil-

ity (Ruﬁno, 2002; Ruﬁno et al., 2006).

5.1 CAN Inaccessibility Boundedness

The ﬁrst step to control inaccessibility implies the

study of network accessibility constraints, ensuring

that the number of inaccessibility periods and their

duration have a bound. The analysis in (Verissimo

et al., 1997; Ruﬁno, 2002) provides a comprehensive

set of easy-to-use formulas to evaluate the worst-case

bounds of the periods of inaccessibility.

The results of such analysis are summarized in Figure 10. Three points are worth noting: the inaccessibility durations for single bit errors (on the leftmost part of Figure 10) are not reduced, because these errors affect only the transmission of one frame; the worst-case inaccessibility bound for bus multi-burst errors is reduced, due to the effects of CANELy media quarantine mechanisms; the worst-case figure of bus reconfiguration delay is extremely low (209 µs @ 1 Mbps), compared with other failure scenarios and, in particular, with the 100 ms of existing commercial systems (NOB, 1998).

Figure 10: Normalized durations of inaccessibility periods (bar chart; vertical axis: CAN 2.0B Inaccessibility (bit-times), ticks from 500 to 3000; horizontal axis: Error Scenarios, namely Bit, Stuff, CRC, ACK, Form, React. Ovload, Single Burst, M. Burst (k=3), Deaf, Rx Stuck, Tx Stuck, Rx Fail, Tx Fail, Bus Reconf.; series: Standard CAN and CANELy).
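Since Figure 10 is normalized to bit-times, comparing its values with figures quoted in time units requires a conversion by the bus bit rate. The helper functions below are a minimal sketch of that arithmetic; their names are illustrative and not part of CANELy.

```python
def bit_times_to_seconds(bit_times, baud):
    """Duration in seconds of a number of bit-times at `baud` bit/s."""
    return bit_times / baud


def seconds_to_bit_times(seconds, baud):
    """Number of bit-times spanned by `seconds` at `baud` bit/s."""
    return seconds * baud
```

At 1 Mbps one bit-time lasts 1 µs, so the 209 µs bus reconfiguration delay corresponds to 209 bit-times, whereas the 100 ms reconfiguration time of the commercial ring/bus solution corresponds to roughly 100 000 bit-times at the same rate.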

On the other hand, the actions taken in (Rufino, 2002) to enforce the weak-fail-silent assumption for the network components are based on CAN's own error confinement mechanisms (Rufino et al., 1998) and induce only a moderate, though interesting, reduction of inaccessibility durations for receiver and transmitter failure scenarios, as shown in Figure 10.

The avoidance of “babbling idiot” failures has fur-

ther been studied: the inaccessibility constraint de-

rived in (Broster and Burns, 2003) for CAN settings

has a normalized duration of 41 bit-times, much lower

than the values inscribed in Figure 10. Babbling id-

iot failures are not detectable by the native CAN error

handling and fault conﬁnement mechanisms. Protec-

tion has to be provided by special-purpose machinery

(bus guardian) (Broster and Burns, 2003).

5.2 Message Schedulability Analysis

Next, it is required to show that inaccessibility bounds

are suitably low for service requirements. This re-

quires a comprehensive analysis of message schedu-

lability guarantees given known trafﬁc patterns and

offered load bounds. Both error-free and worst-case error analyses are relevant. The former is intended to provide the parameters required for optimal system configuration (Davis et al., 2007; Zuberi and Shin, 1997; Livani et al., 1998). The latter, given a worst-case pattern of inaccessibility incidents, provides hard real-time guarantees of message schedulability and defines worst-case message delivery delays (Pinho et al., 2000; Punnekkat et al., 2000).

Extended versions of existing message schedula-

bility analysis tools and methodologies (Pinho et al.,

2000) should be able to provide relevant parameters

for system conﬁguration, including a bound for the

time the effects of inaccessibility last in the system.

5.3 CAN Inaccessibility Control

To enable low-level control of inaccessibility, the Channel monitoring functions of Figure 8 should be extended with the additional functionality summarized in Figure 11. The $Ch_{Ina}$ signal is used for:

• the evaluation of the real durations of inaccessibility incidents, $t_{ina}$, and of the extra message queuing and network access delays, $t_{p\,ina}$;

• the evaluation of inaccessibility upper bounds with respect to the total number of incidents, i, and their total duration, $T_{ina}$, in a period of reference (property MCAN3);

• the evaluation of the worst-case duration of the entire period where the effects of inaccessibility last in the system, which we have defined as inaccessibility epoch, $T_{ina}$.

Extended Channel Monitoring

$Ch_{Bidle}$   Bus Idleness
    asserted if bus is idle for more than the nominal intermission;
    negated upon detection of a dominant bit.

$Ch_{Ina}$   Channel Inaccessibility status
    asserted upon assertion of $Ch_{Err}$ (Figure 8);
    negated upon assertion of $Ch_{Bidle}$.

Figure 11: Extension of Channel monitoring functions for the control of inaccessibility.
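The set/reset behaviour of the Channel inaccessibility status in Figure 11 lends itself to a simple event-driven model. The Python sketch below assumes timestamps expressed in bit-times and omits the restart of the counters at each period of reference; the class and method names are hypothetical.

```python
class InaccessibilityMonitor:
    """Tracks the Channel inaccessibility status, the number of incidents
    and their accumulated duration (illustrative model only)."""

    def __init__(self):
        self.ch_ina = False   # inaccessibility status
        self.count = 0        # number of incidents observed
        self.total = 0        # accumulated incident duration
        self._start = None

    def on_err(self, now):
        """Frame error asserted: an inaccessibility incident begins."""
        if not self.ch_ina:
            self.ch_ina = True
            self.count += 1
            self._start = now

    def on_bidle(self, now):
        """Bus idleness asserted: the incident ends; return its duration,
        or None if the Channel was not inaccessible."""
        if not self.ch_ina:
            return None
        self.ch_ina = False
        t_ina = now - self._start
        self.total += t_ina
        return t_ina

    def within_bounds(self, i_max, t_ina_max):
        """Check the observed values against the MCAN3 bounds."""
        return self.count <= i_max and self.total <= t_ina_max
```

Comparing the measured values with the assumed bounds in this way is one possible means of monitoring the coverage of the timeliness model at run time.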

The layer management interface of Figure 9 is

extended with the primitives described in Figure 12,

which are used to manage the effects of inaccessibil-

ity in protocol execution, at all the relevant levels of

the system (Ruﬁno, 2002; Ruﬁno et al., 2006).


Invocation Primitives (canely-icu.req)

Description                                                      Bounds
Get Channel status ($Ch_{Ina}$)
Get Channel inaccessibility events ($Ch$)                        $i$
Get Channel inaccessibility times ($t_{ina}$, $t_{p\,ina}$)      $T_{ina}$, $T_{p\,ina}$

Notification Primitives (canely-icu.nty)

Description
Channel status change ($Ch_{Ina}$)

Figure 12: CANELy inaccessibility control primitives.

At application-level, a corrective term accounting for the worst-case duration of an inaccessibility epoch is simply added to optimal timeout values. In low-level protocols, advanced inaccessibility control mechanisms make it possible to account for the real duration of an inaccessibility epoch and to selectively add a corrective term to (optimal) timeout values, only when inaccessibility affects protocol timeliness.
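The two timeout correction policies can be contrasted with a short sketch. The functions below are an illustration under assumed names and units (all durations in the same time unit); they are not taken from the CANELy protocol suite.

```python
def application_timeout(t_optimal, t_epoch_worst):
    """Application level: always add the worst-case duration of an
    inaccessibility epoch to the optimal timeout."""
    return t_optimal + t_epoch_worst


def protocol_timeout(t_optimal, t_epoch_measured, affected):
    """Low-level protocols: add the measured epoch duration only when
    inaccessibility actually affected the protocol execution."""
    return t_optimal + (t_epoch_measured if affected else 0)
```

The application-level policy is simple but pessimistic, while the selective policy keeps timeouts at their optimal values in the common case where no inaccessibility incident occurs.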

6 TSP SYSTEM INTEGRATION

In TSP systems the exchange of data between partitions (either local or remote) is restricted to authorized interpartition communication channels. Two paradigms are used: sampling ports, holding only one fixed-size message; queueing ports, holding room for a given number of variable-size atomic and totally ordered messages. Each message has a predetermined maximum size. This approach is in conformity with the ARINC-653 specification (AEEC, 2006).
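The difference between the two port paradigms can be made concrete with a minimal model. The Python sketch below is not the ARINC 653 interface: the class and method names are illustrative, and rejecting a message when a queueing port is full is an assumption of this example.

```python
from collections import deque


class SamplingPort:
    """Holds only the most recent fixed-size message."""

    def __init__(self, size):
        self.size = size
        self._message = None

    def write(self, message):
        if len(message) != self.size:
            raise ValueError("sampling port messages have a fixed size")
        self._message = bytes(message)  # overwrites the previous sample

    def read(self):
        return self._message            # reading does not consume the sample


class QueuingPort:
    """Holds up to `depth` variable-size messages in FIFO order."""

    def __init__(self, depth, max_size):
        self.depth = depth
        self.max_size = max_size
        self._queue = deque()

    def send(self, message):
        if len(message) > self.max_size:
            raise ValueError("message exceeds the predetermined maximum size")
        if len(self._queue) >= self.depth:
            return False                # assumption: reject when full
        self._queue.append(bytes(message))
        return True

    def receive(self):
        return self._queue.popleft() if self._queue else None
```

A sampling port suits periodically refreshed state such as sensor readings, where only the latest value matters, whereas a queueing port preserves every message and its order.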

The CANELy architecture and its companion real-time protocol suite, supporting group communication (Rufino et al., 1998), clock synchronization (Rodrigues et al., 1998), node failure detection and site membership (Rufino et al., 2003), constitute an excellent candidate for supporting interpartition communication in distributed TSP systems.

The CANELy machinery supporting the execu-

tion of the real-time communication protocol suite,

dubbed CANELy Dependability Engine, can be ex-

tended to support interpartition communication in dis-

tributed TSP systems, as depicted in Figure 13. Such

extension is achieved through provision and manage-

ment of the buffers necessary for implementing the

sampling and queueing communication ports. The ar-

chitecture proposed in Figure 13 decouples network

operations from message data processing by the cor-

responding partition in the distributed TSP system.

Figure 13: CANELy Dependability Engine extension with support for distributed TSP systems.

6.1 Proof-of-Concept Prototype

In the prototype of the CANELy architecture currently being implemented (Figure 14), the CANELy processing infrastructure required for the execution of CANELy low-level protocols is materialized using the state-of-the-art Dallas/Maxim DS80C390 High-Speed Microprocessor (Dallas, 2005).

Figure 14: CANELy prototype board (block diagram labels: FPGA; Microcontroller; Dual-MAC; Reliable Comm. Protocol Suite; Layer Management; Dual-MAC (optional); Management Interface; CANELy Functions; Control of Inaccessibility; CAN Monitoring; AND-based Media Redundancy; Channel Interface; Media Interfaces; Cable Connectors).

The support for the low-level special-purpose

functions is implemented by a single, medium capac-

ity, programmable logic device (Field Programmable

Gate Array - FPGA) (Xilinx, 2009).

7 CONCLUDING REMARKS

Given the increasing demand for embedded distributed fault-tolerant systems based on low-cost network technologies, we have investigated the shortcomings of CAN with regard to dependability and timeliness, and defined a systemic model of CAN that not only exposed those weaknesses but also provided the grounds to handle those problems effectively.

This paper discussed the implementation of the main components in the CANELy architecture, the CAN Enhanced Layer (Rufino, 2002), a combination of the CAN standard layer with some simple machinery resources and low-level protocols achieving highly dependable real-time communications. The CANELy mechanisms enhance the dependability and timeliness of CAN-based systems and allow the assessment of real system parameters (w.r.t. timing, omission), thus making it possible to monitor the coverage of both dependability and timeliness models.

In the context of spaceborne applications, the

CANELy architecture can be used to support inter-

partition communication in distributed TSP systems.

Finally, this paper identified the set of functions to be implemented as FPGA-based components and the functionality that has to be integrated into the CANELy (software) protocols.

REFERENCES

AEEC (2006). Avionics application software standard interface. ARINC Specification 653, Airlines Electronic Engineering Committee (AEEC).

AEEC (2010). General standardization of CAN (Controller Area Network) for airborne use. ARINC Spec. 825-1, Airlines Electronic Engineering Committee (AEEC).

Broster, I. and Burns, A. (2003). An analysable bus-guardian for event-triggered communication. In Proc. of 24th Real-time Systems Symposium, pages 410–419, Cancun, Mexico. IEEE.

CAN (1993). International Standard 11898 - Road vehicles - Interchange of digital information - Controller Area Network for high-speed communication. ISO.

CAN (1997). TJA1053 - Fault-tolerant CAN transceiver. Philips Semiconductors.

Charzinski, J. (1994). Performance of the error detection mechanisms in CAN. In Proc. of the 1st Int. CAN Conference, pages 1.20–1.29, Mainz, Germany. CiA.

CiA (1994). CAN Physical Layer for Industrial Applications - CiA Draft Standard 102 Version 2.0. CiA - CAN in Automation.

Dallas (2005). DS80C390 Dual-CAN High-Speed Microprocessor. Maxim/Dallas Semiconductors.

Davis, R. I., Burns, A., Bril, R. J., and Lukkien, J. J. (2007). Controller Area Network (CAN) schedulability analysis: Refuted, revisited and revised. Real-Time Systems, 35:239–272.

ECSS (2005). ECSS Draft Standard ECSS-E-ST-50-15C. Recommendations for CAN Bus in Spacecraft Onboard Applications. European Cooperation for Space Standardization (ECSS).

Livani, M., Kaiser, J., and Jia, W. (1998). Scheduling hard and soft real-time communication in the controller area network (CAN). In Proc. of the 23rd IFAC/IFIP Workshop on Real-Time Programming, Shantou, China. IFAC/IFIP.

NOB (1998). RED-CAN a fully redundant CAN-system. NOB Elektronik AB Product Note, Sweden.

Pinho, L., Vasques, F., and Tovar, E. (2000). Integrating inaccessibility in response time analysis of CAN networks. In Proc. of the 3rd Int. Workshop on Factory Communication Systems, Porto, Portugal. IEEE.

Punnekkat, S., Hansson, H., and Norstrom, C. (2000). Response time analysis under errors for CAN. In Proc. of the Real-Time Technology and Applications Symposium, pages 258–265, Washington, USA. IEEE.

Rodrigues, L., Guimarães, M., and Rufino, J. (1998). Fault-tolerant clock synchronization in CAN. In Proc. of 19th Real-Time Systems Symposium, pages 420–429, Madrid, Spain. IEEE.

Rufino, J. (2002). Computational System for Real-Time Distributed Control. PhD thesis, Technical University of Lisbon - Instituto Superior Técnico, Lisboa, Portugal.

Rufino, J., Craveiro, J., and Verissimo, P. (2010). Building a time- and space-partitioned architecture for the next generation of space vehicle avionics. In Proc. of the 8th IFIP Int. Workshop on Software Technologies for Embedded and Ubiquitous Systems, pages 179–190. IFIP, Springer.

Rufino, J., Verissimo, P., and Arroz, G. (1999). A Columbus' egg idea for CAN media redundancy. In Digest of Papers, The 29th Int. Symposium on Fault-Tolerant Computing Systems, pages 286–293, Madison, Wisconsin, USA. IEEE.

Rufino, J., Verissimo, P., and Arroz, G. (2003). Node failure detection and membership in CANELy. In Proc. of the 2003 International Conference on Dependable Systems and Networks, pages 331–340, San Francisco, California, USA. IEEE.

Rufino, J., Verissimo, P., Arroz, G., and Almeida, C. (2006). Control of inaccessibility in CANELy. In Proc. of the 6th Int. Workshop on Factory Communication Systems, pages 35–44, Torino, Italy. IEEE.

Rufino, J., Verissimo, P., Arroz, G., Almeida, C., and Rodrigues, L. (1998). Fault-tolerant broadcasts in CAN. In Digest of Papers, The 28th Int. Symposium on Fault-Tolerant Computing Systems, pages 150–159, Munich, Germany. IEEE.

Stuart, R. (1999). CAN bit timing requirements. Application Note AN1798, Motorola, Inc.

Verissimo, P., Rufino, J., and Ming, L. (1997). How hard is hard real-time communication on field-buses? In Digest of Papers, The 27th Int. Symp. on Fault-Tolerant Computing Systems, Washington, USA. IEEE.

Xilinx (2009). Spartan-3E FPGA family data sheet.

Zuberi, K. and Shin, K. (1997). Scheduling messages on Controller Area Network for real-time CIM applications. IEEE Transactions on Robotics and Automation, 13(2):310–314.
