Fault-tolerant Distributed Continuous Double Auctioning on
Computationally Constrained Microgrids
Anesu M. C. Marufu, Anne V. D. M. Kayem and Stephen Wolthusen
Department of Computer Science, University of Cape Town, Rondebosch 7701, Cape Town, South Africa
School of Mathematics and Information Security, Royal Holloway, University of London, London, U.K.
Norwegian Information Security Laboratory, Department of Computer Science, Gjøvik University College, Gjøvik, Norway
Keywords: Computationally Constrained Microgrid, Power Network Stability, Fault Tolerance, Auction Protocol.
Abstract: In this article we show that a mutual exclusion protocol supporting continuous double auctioning for power trading on a computationally constrained microgrid can be made fault tolerant. Fault tolerance allows the CDA algorithm to operate reliably and contributes to overall grid stability and robustness. Contrary to fault tolerance approaches proposed in the literature, which bypass faulty nodes through a network reconfiguration process, our approach masks crash failures of cluster head nodes through redundancy. Masking failure of the main node ensures the dependent cluster nodes hosting trading agents are not isolated from auctioning. A redundant component acts as a backup which takes over if the primary component fails, allowing for some fault tolerance and a graceful degradation of the network. Our proposed fault-tolerant CDA algorithm has a time complexity of O(N) and a check-pointing message complexity of O(W), where N is the number of messages exchanged per critical section and W is the number of check-pointing messages.
1 Introduction

Auction mechanisms for resource allocation support power trading schemes on microgrids (Cui et al., 2014), (Borenstein et al., 2002), (Izakian et al., 2010), (Pałka et al., 2012), (Stańczak et al., 2015), (Marufu et al., 2015). Most microgrid power trading schemes assume network reliability. When computationally constrained devices form the infrastructure of such networks, problems ranging from signal distortion to component failures emerge. (Marufu et al., 2015)
proposed a distributed Continuous Double Auction (CDA) algorithm for allocating power in a computationally resource-constrained microgrid. The CDA algorithm describes a market scenario in which trading agents sell goods (sellers) and buy goods (buyers). The distributed CDA algorithm ensures efficient message passing (Marufu et al., 2015) and minimal trade execution time. However, the CDA algorithm of (Marufu et al., 2015) is not fault tolerant. Fault tolerance reinforces grid resilience.
Fault tolerant systems are able to provide services in the presence of failure (Jalote, 1994), (Médard and Lumetta, 2003). Node failure is detrimental to the performance of the CDA mechanism. Furthermore, unhandled node failures open avenues for attacks, such as denial of service (DoS) attacks that take advantage of these scenarios. Another notion supporting fault tolerance within CDA specifications for microgrids stems from Murphy's Law (Bloch, 2003). If we consider the recent precedent of adapting CDA mechanisms to new problems within similar cyber-physical critical systems (Marufu et al., 2015), the consequences of failure can be serious if fault prevention and tolerance measures are not implemented.
Specifically, we consider a scenario where a cluster head node within a hierarchically structured network architecture fails. Such a failure implies that the child cluster nodes will be isolated from the rest of the network. In addition, failure of the cluster head node may also result in partitioning of the network. If a mutually exclusive distributed CDA mechanism is employed, the isolated nodes which act as hosts for the Trading Agents (TAs) will inevitably be excluded from trading. For instance, if pivotal seller TAs fail to participate in the auction market, a significant shift of demand over supply occurs, which will inflate energy prices for buyers. This disrupts grid stability (since failure of TAs to participate in the auction results in an uneven balance of demand and supply) and in turn causes distrust (due to unreliable energy allocation) among members within the microgrid.
Marufu, A., Kayem, A. and Wolthusen, S.
Fault-tolerant Distributed Continuous Double Auctioning on Computationally Constrained Microgrids.
DOI: 10.5220/0005744304480456
In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP 2016), pages 448-456
ISBN: 978-989-758-167-0
© 2016 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
In this paper we propose a fault tolerant distributed CDA algorithm that is efficient in message exchange and computation time. In contrast to the standard approach of bypassing faulty nodes, our approach masks failure through redundancy of cluster head nodes.
The remainder of this paper is organised as follows: Section 2 discusses the state of the art. Section 3 presents an overview of the distributed CDA framework for a resource-constrained microgrid, while Section 4 describes the fault model, the failure handling strategy and the fault tolerant algorithm, and analyses the fault tolerant CDA algorithm for correctness and efficiency. Section 5 concludes this article and identifies on-going and future work.
2 Related Work

In (Marufu et al., 2015) a token-based mutual exclusion protocol is used to support a CDA auction mechanism, which means the suggested CDA formulation inherits the properties of the underlying distributed primitives. Requests to enter the critical section (CS) are directed to whichever node is the current token holder. The token-based mutual exclusion approach adopted in (Marufu et al., 2015)
was inspired by (Raymond, 1989). Raymond's algorithm is resilient to non-adjacent node crashes and recoveries, but not to node/link failures. As an extension to Raymond's algorithm, (Chang et al., 1990) imposed a logical direction on a number of edges to induce a token-oriented directed acyclic graph (DAG), in which there exists a directed path originating from each node and terminating at the token-holding node. Resilience to link and site failures is achieved by allowing request messages to be sent over all edges of the DAG. Besides high message complexity costs, the solution in (Chang et al., 1990) does not consider link recovery. In related work, (Dhamdhere and Kulkarni, 1994) reveal that the algorithm in (Chang et al., 1990) can suffer from deadlock; they therefore propose assigning a dynamically changing sequence number to each node to form a total ordering of the nodes. Since the token-holding node possesses the highest sequence number, a DAG is maintained if links are defined to point to a node with a higher sequence number. In cases where a node has no outgoing links towards the token holder (due to link failure), it floods the network with messages to build a spanning tree. Once the token holder is part of this spanning tree, the token is granted to this node, bypassing other requests. If link failures are persistent (as will be the case in our given context), starvation is inevitable, since priority is given to nodes that lose a path to the token-holding node. In addition, flooding of messages incurs high costs in a typical resource-constrained setup. (Walter et al., 2001) present a reversal of the technique, in which a DAG oriented towards a dynamic destination is maintained. Ordering of nodes is similar to (Dhamdhere and Kulkarni, 1994), but the lowest node is assigned the token. The described works mainly address the issue of link failure, with little effort on addressing node failure, explicitly assuming that nodes do not fail and that network partitioning will not occur.
Another body of literature indicates that some work towards handling node failure has been done. (Revannaswamy and Bhatt, 1997) propose a solution to handle failures of nodes and links in a network, with definitions carried over from (Raymond, 1989). Their attempt at fault tolerance involves eliminating the failed nodes and obtaining a different tree structure utilising chords (edges of the tree as yet unused for message exchanges) from the network. A reconfiguration process attempts to connect the parts of the tree separated due to failures. However, reconfiguration eliminates failed components: when a cluster head node is affected, the nodes that rely on it will not be able to participate in the bidding process. This is undesirable, and so a robust solution that is able to guarantee reliability is needed.
3 Distributed CDA Framework Overview

We consider a small community microgrid supported by a double auctioning layer. An energy sharing agreement amongst participating community members ensures fair and trustworthy access to energy. The community is comprised of a number of clustered households within a given area. Distribution of energy is facilitated by a communication network that overlays a power distribution network. This communication network is hierarchically structured, with a large number of mobile phones M_mp forming cluster nodes and relatively fewer, fixed smart meters M_sm forming cluster heads. We consider that the M_mp nodes host the trading agents (TAs) which conduct trade in the CDA market on behalf of the community members.
To ensure that only one trading agent submits an offer in the auction market at a time, (Marufu et al., 2015) propose a protocol for the serialization of market access and fair chances to submit an offer. The protocol satisfies the following properties:
ME1: [Mutual Exclusion] At most one trading agent can remain in the auction market at any time. This is a safety property.
ME2: [Freedom from Deadlock] When the auction market is free, one of the requesting trading agents should enter the critical section. This is a liveness property.
ME3: [Freedom from Starvation] A trading agent should not be forced to wait indefinitely to enter the auction market while other trading agents are repeatedly executing their requests. This is a liveness property.
ME4: [Fairness] Requests must be executed in the order in which they are made.
The algorithm in (Marufu et al., 2015) uses a token-based approach to address the mutual exclusion problem because of the low message traffic generally associated with such algorithms (Ghosh, 2014), (Raymond, 1989), (Raynal, 1986), (Kshemkalyani and Singhal, 2008), (Garg, 2011). A token is a 'permit' passed around among requesting trading agents, granting the privilege to enter the auction market. Thus, a trading agent requests the token in order to participate in the auction. At any moment, only one trading agent exclusively holds the token. The token contains a copy of the order-book to which the TA submits its offer (Marufu et al., 2015).
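The token-as-permit idea can be illustrated with a minimal sketch (not the authors' implementation): the token object carries the order-book copy, and because it is handed to one requesting agent at a time, offers are serialized by construction. The names `Token`, `Agent` and `submit_offer` are ours.

```python
from collections import deque

class Token:
    def __init__(self):
        self.order_book = []   # copy of the CDA order-book carried by the token

class Agent:
    def __init__(self, name):
        self.name = name

    def submit_offer(self, token, offer):
        # only the current token holder can reach the order-book
        token.order_book.append((self.name, offer))

def run_auction_round(agents, token):
    """Grant the token to each requesting agent in FIFO order."""
    requests = deque(agents)           # FIFO request queue
    while requests:
        holder = requests.popleft()    # exactly one privileged agent at a time
        holder.submit_offer(token, offer=100)
    return token

token = run_auction_round([Agent("a1"), Agent("a2")], Token())
print(len(token.order_book))  # 2: one offer per agent, serialized
```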
Studies identify resiliency to failure as one weakness of the adopted token-based approach (Chang et al., 1990), (Dhamdhere and Kulkarni, 1994), (Walter et al., 2001). Although the algorithm of (Marufu et al., 2015) includes some fault prevention and, to a lesser extent, fault tolerance as part of the token-handling specification, the solution assumes that M_sm nodes do not fail. Token-handling in (Marufu et al., 2015) includes:
1. Temporarily relaying the token through a close-by, same-cluster M_mp node with better connectivity to M_sm.
2. Prompting an M_mp node requesting the token to disable its sleep mode functionality until it has entered the critical section.
3. Employing timing checks to ensure M_mp nodes do not hold on to the token indefinitely as a result of faults or disconnections.
The (Marufu et al., 2015) CDA fails to guarantee trading agents' participation in the auction market in the event of M_sm node failure. The following section discusses the approach we use to design a distributed CDA algorithm with fault tolerance.
4 Fault Tolerant Distributed CDA

We propose handling crash failures of M_sm nodes in the network specification of (Marufu et al., 2015). As an initial step in building a fault-tolerant system, the literature (Médard and Lumetta, 2003), (Jalote, 1994), (van Steen and Tanenbaum, 2001), (Tanenbaum and Van Steen, 2007) suggests defining the fault model: the number and classes of faults that need to be tolerated. A fault model includes a set of failure scenarios along with the frequency, duration and impact of each scenario (Médard and Lumetta, 2003). Inclusion of fault scenarios in the fault model is based on their frequency, their impact on the system, and the feasibility or cost of providing protection. The next section specifies the fault classes and scenarios we use to formulate the fault tolerant CDA.
4.1 Fault Model
Among the several well-known failure classification schemes, this article considers crash failures, omission failures, timing failures, response failures and arbitrary failures (van Steen and Tanenbaum, 2001). For simplicity, and as an initial step towards tolerating failure of M_sm nodes in (Marufu et al., 2015), this article only considers crash failure. If a system cannot tolerate crash failures, then there is no way it can tolerate the other classes of failure (Ghosh, 2014), (van Steen and Tanenbaum, 2001). A crash failure occurs when a process prematurely halts, having worked correctly until it stopped. We consider a typical crash failure of the M_sm nodes, where once the node has halted, nothing is heard from it anymore. Communication through M_sm is blocked; hence, the algorithm does not progress and is held up until the node is recovered. The two failure situations to consider are the crash of an M_sm node when in possession of the token, and when not.
1. A Token-possessing M_sm Node Fails: the algorithm in (Marufu et al., 2015) will not recover from failure of the site holding the token. There are different scenarios to be considered: when the token is being utilized by an M_mp node, when the token is in transit between an M_mp node and its M_sm node, when the token is at the M_sm ready to be sent to another child M_mp node, or when the token is at the M_sm ready to be passed to a requesting neighbouring M_sm node.
2. A Token-requesting M_sm Node Fails: in this case, if the M_sm node responsible for forwarding the token request crashes, the path would have to be re-established. Failure of such a node results in partitioning of the network, leading to impacts similar to those
stated in the previous case.
4.2 Fault Tolerance Approach
As mentioned before, each M_sm node acts as a proxy to a cluster of M_mp nodes, and its failure needs to be masked to allow continued service to the child nodes. Thus, we introduce a redundant backup node M2_sm for each M_sm node. The reason for opting for this approach is its simplicity and its preservation of the M_sm nodes' functionality, thereby enabling trading agents hosted on M_mp child nodes to transact in the auction market. A downside to primary-backup fault tolerance is that it handles Byzantine failures poorly, because there is usually no check routine to make sure the primary is functioning correctly. In addition, the backup must always be in agreement with the primary so that the backup node can take over the functions of the primary node. Recovery from a primary node failure is usually time consuming and complex.
4.3 Fault Tolerant CDA Algorithm
We propose a fault tolerant distributed CDA algorithm. At the M_sm and M2_sm nodes the algorithm executes the following routines: GlobalTokenRequest, LocalTokenDistribution, and GlobalTokenTransfer. The M2_sm nodes also execute an additional routine called CrashFailureHandling. In contrast, M_mp nodes execute the following routines: LocalTokenRequest, LocalMarketExecution and LocalTokenTransfer. The fault tolerant CDA algorithm includes two types of request messages: ReqM_sm (global token request) and ReqM_mp (local token request). Incoming ReqM_sm from neighbouring M_sm nodes (including 'self') are enqueued in a FIFO queue RQ1, while ReqM_mp from child M_mp nodes are enqueued in a FIFO queue RQ2. M_sm nodes have a POINTER variable (pointing to the M_sm in possession of the token, or to the next intermediate M_sm on the path to a token-holding node; see (Raymond, 1989)) to which ReqM_sm is sent.
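The per-M_sm state described above can be sketched as a small data structure; this is an illustrative sketch only, and the field names (`pointer`, `rq1`, `rq2`, `has_token`) are ours, not the paper's.

```python
from collections import deque

class ClusterHead:
    """Illustrative state held by each M_sm cluster head node."""
    def __init__(self, name, pointer=None):
        self.name = name
        self.pointer = pointer   # next M_sm on the path to the token holder
        self.rq1 = deque()       # FIFO: global ReqM_sm from neighbouring M_sm
        self.rq2 = deque()       # FIFO: local ReqM_mp from child M_mp nodes
        self.has_token = False   # FlagM_sm

    def enqueue_request(self, req, local):
        # local requests go to RQ2, global requests to RQ1
        (self.rq2 if local else self.rq1).append(req)

head = ClusterHead("sm1", pointer="sm2")
head.enqueue_request("mp7", local=True)
head.enqueue_request("sm3", local=False)
print(list(head.rq1), list(head.rq2))  # ['sm3'] ['mp7']
```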
4.3.1 Local Token Request
When a TA needs to trade in the auction market, the TA triggers the hosting M_mp node to send a ReqM_mp to the M_sm, provided that its battery level is above 10 percent and it does not already possess the token. When a ReqM_mp is sent to M_sm, sleep mode is deactivated to ensure the M_mp node remains online to receive the token. On receipt of the token, FlagM_mp is set to TRUE.
IF (FlagM_{mp} == FALSE)
 {IF (BatteryLife > 10%)
   {Send ReqM_{mp} to M_{sm}
    Disable doze mode
    Wait until (FlagM_{mp} == TRUE)}
  ELSE
    Do not send ReqM_{mp}}
4.3.2 Global Token Request
The GlobalTokenRequest routine is the initial procedure that allows an M_sm node to send a request to participate in the mutual exclusion token exchange process. Once a ReqM_sm is sent to the neighbour node holding the token, or on the path to the token, the boolean variable TokenAsked is set to TRUE. This avoids forwarding similar request messages to the same token holder. While the token has not been received, FlagM_sm remains FALSE. Two subroutines, EnqueueRequest and NodeBackup, are executed while M_sm is waiting for the token. EnqueueRequest enqueues the ReqM_sm and ReqM_mp requests into RQ1 and RQ2 respectively, while NodeBackup creates a checkpoint by sending updates to the M2_sm node.
IF (FlagM_{sm} == FALSE)
 {IF (TokenAsked == FALSE)
   {Send ReqM_{sm} to node in POINTER
    Set TokenAsked to TRUE}
  WHILE (FlagM_{sm} == FALSE)
   {IF (ReqM_{mp} == Received ||
        ReqM_{sm} == Received)
     {EnqueueRequest()
      NodeBackup()}
    ELSE
      Do nothing}
  Set FlagM_{sm} to TRUE}
4.3.3 Local Token Distribution
Once the token is received, the algorithm moves to the LocalTokenDistribution routine. Receipt of the token by the requesting M_sm node sets the boolean variable FlagM_sm to TRUE, indicating possession of the token. GQ is a FIFO queue that stores a copy of the requests submitted and "locked in" at the arrival of the token at the cluster head. Once a token is received, all requests in RQ2 are transferred to GQ. While there are still requests in GQ, the algorithm runs the EnqueueRequest and NodeBackup subroutines before sending the token to the M_mp node with a request at the head of the GQ. After the token is allocated to an M_mp node, the node's corresponding ReqM_mp entry is removed from GQ. The M_sm node then waits a predefined time for the return of the token. To ensure that M_sm does not wait for the token for an undefined time, the routine imposes a time bound on the wait period. Failure of an M_mp node to return the token within the predefined time results in the token being destroyed (at the M_mp) and regenerated (at the M2_sm) from the last known backup.
IF (FlagM_{sm} == TRUE)
 {IF (self ReqM_{sm} is at head of RQ1)
   {GQ <- RQ2
    n <- number of GQ entries
    WHILE (n >= 1)
     {EnqueueRequest()
      NodeBackup()
      IF (M_{mp} at head != disconnected)
       {Assign token to M_{mp} at head
        Remove its entry from GQ
        Wait for token return
        IF (TokenReturn == timed-out)
          TokenRegenerate()}
      n--}}
  Send Token(TRUE) to next}
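The core of the LocalTokenDistribution routine can be sketched in Python: serve the locked-in local requests one by one, and regenerate the token from the last backup when a child node fails to return it within the time bound. This is an illustrative sketch under our own names; the callback signatures (`send_token`, `token_returned`, `regenerate`) are assumptions, not the paper's interface.

```python
from collections import deque

def distribute_token(gq, send_token, token_returned, regenerate):
    """gq: FIFO deque of child-node ids ("GQ");
    send_token(node): hand the token to a child node;
    token_returned(node) -> bool: True if the token came back before timeout;
    regenerate(): rebuild the token from the last checkpoint."""
    served, regenerated = [], 0
    while gq:
        node = gq.popleft()          # request at the head of GQ
        send_token(node)
        if token_returned(node):     # token back within the time bound
            served.append(node)
        else:                        # timed out: token considered lost
            regenerate()
            regenerated += 1
    return served, regenerated

served, regen = distribute_token(
    deque(["mp1", "mp2", "mp3"]),
    send_token=lambda n: None,
    token_returned=lambda n: n != "mp2",   # mp2 simulates a crashed child
    regenerate=lambda: None)
print(served, regen)   # ['mp1', 'mp3'] 1
```

Note that a crashed child node does not block the remaining requests: the loop continues with a regenerated token, matching the masking behaviour described above.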
4.3.4 Global Token Transfer
A token-possessing M_sm node will send the token onward if it has a non-empty RQ1 (where the ReqM_sm at the head of RQ1 is not its own request). The boolean FlagM_sm is then set to FALSE and the token is sent to the respective M_sm node with a ReqM_sm at the head of RQ1.
IF (FlagM_{sm} == TRUE && RQ1 != empty &&
    ReqM_{sm} at head != 'self')
 {Set FlagM_{sm} to FALSE
  Send token to ReqM_{sm} node at RQ1 head}
4.3.5 Local Market Execution
Once the token is received at the M_mp node, the TokenCounter variable is incremented. The TA is allowed to create an offer (bid/ask), which is submitted into the auction market order-book. TokenOB is an online copy of the CDA order-book in the token. LocalOB is a local copy of the CDA order-book, updated each time a trading agent participates in the auction market. Each M_mp has a ClusterDir that contains a directory of neighbouring M_mp nodes. TokenCounter keeps a record of the number of auction market rounds. When a predefined number of rounds is reached, trading is terminated. A message including trading-day statistical data may be communicated to the rest of the participating nodes.
IF (FlagM_{mp} == TRUE)
 {TokenCounter++
  IF (tradeID == buyer)
   {FormOffer(ob, P_t, P_ll, P_ul)
    bid = offer
    IF (bid <= ob || bid out of [P_ll, P_ul] range)
      bid is invalid
    ELSE
     {ob = bid
      IF (ob >= oa)
       {P_t = oa
        Trade! and Obook update}}}
  ELSEIF (tradeID == seller)
   {FormOffer(ob, P_t, P_ll, P_ul)
    ask = offer
    IF (ask >= oa || ask out of [P_ll, P_ul] range)
      ask is invalid
    ELSE
     {oa = ask
      IF (ob >= oa)
       {P_t = ob
        Trade! and Obook update}}}
  IF (no new oa/ob in pre-specified period)
    Round ends with no transaction
  Update LocalOB from TokenOB}
4.3.6 Local Token Transfer
Once a trading agent has completed its execution in the auction market, the token is returned to M_sm. If the connection to M_sm is ALIVE, the token is sent back; if not, the token-possessing M_mp tries to send the token via a neighbouring M_mp with the best connectivity to M_sm.
IF (Connection to M_{sm} == ALIVE)
  Return Token to M_{sm}
ELSEIF (Connection to M_{sm} via neighbouring M_{mp} == ALIVE)
  Return Token to M_{sm} via that M_{mp}
ELSE
 {Destroy the token
  Revert changes in the LocalOB}
Set FlagM_{mp} to FALSE
4.3.7 Crash Failure Handling
This routine is executed at the M2_sm node. An in-depth discussion of the crash failure handling strategy executed in this routine is presented in Subsection 4.4.
IF (Timeout == TRUE || N_Query == TRUE)
 {Ping M_{sm} with Enquiry() message
  IF (M_{sm} responds == TRUE)
   {M_{sm} is alive
    Wait for NodeBackup()}
  ELSE
    Resume M_{sm} operations at M2_{sm}}
4.4 Crash Failure Handling Strategy
There are four general activities defined by (Jalote,
1994) that systems employing fault tolerance have to
perform: error detection, failure containment, error
recovery, and fault resolution and continued system
service. These considerations have been factored into
the new fault tolerant protocol.
4.4.1 Error Detection
This phase deduces the presence of a fault by detecting an error in the state of a subsystem. It is from the presence of errors that failures and faults can be deduced. A fault model is of little use unless errors can be detected reliably. We consider that detection of M_sm node failure is done by means of timing checks. In our context, if the standby node M2_sm does not receive a backup message from the M_sm node within the expected time on its timer clock, it sends a "query" message to the M_sm. If no response is received, it is assumed the M_sm has failed, which triggers M2_sm to enter the recovery phase (Section 4.3.7). We consider that each M2_sm carries a failure detector to detect a crashed M_sm. A failure detector is called strongly accurate if only crashed processes are ever suspected (Fokkink, 2013). In bounded-delay networks, a strongly accurate (and complete) failure detector can be implemented as follows: suppose l_max is a known upper bound on the network latency from M_sm to M2_sm. Each M_sm broadcasts an "alive" message every ν time units. Each M_sm from which no message is received for ν + l_max time units is suspected to have crashed. An inquiry message is then sent to confirm this suspicion in the algorithm.
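The bounded-delay detector described above can be sketched as follows. This is a minimal illustration, assuming simple numeric timestamps (in practice a monotonic clock would be used); the constants and class name are ours.

```python
NU = 2.0      # heartbeat period (the paper's nu)
L_MAX = 0.5   # assumed upper bound on M_sm -> M2_sm network latency

class FailureDetector:
    """Timing-check detector run at M2_sm for its primary M_sm."""
    def __init__(self):
        self.last_alive = 0.0

    def heard_alive(self, now):
        # called whenever an "alive" (or backup) message arrives
        self.last_alive = now

    def suspects_crash(self, now):
        # strongly accurate under the bounded-delay assumption:
        # a live M_sm always gets a heartbeat through within NU + L_MAX
        return now - self.last_alive > NU + L_MAX

fd = FailureDetector()
fd.heard_alive(10.0)
print(fd.suspects_crash(11.0))   # False: 1.0 < nu + l_max = 2.5
print(fd.suspects_crash(13.0))   # True: no heartbeat for 3.0 > 2.5 units
```

A suspicion raised here corresponds to the point where the CrashFailureHandling routine sends its Enquiry() message to confirm before taking over.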
4.4.2 Failure Containment
The system design incorporates mechanisms to limit the spreading of errors in the system, thereby confining the failure to predetermined boundaries (the CrashFailureHandling routine in Section 4.3.7). In order to prevent the spread of error, the faulty node is suspended from any participation and its tasks are taken over by the standby component. We assume this standby component does not share the properties susceptible to the error detected in the primary node.
4.4.3 Error Recovery
The two general techniques for error recovery are backward and forward recovery (Jalote, 1994). We chose backward recovery, in which check-pointing is done frequently on the standby M2_sm node and, in the event of a failure, a system roll-back occurs. One main drawback of the backward technique is the overhead required. Assuming the M2_sm nodes have fairly stable storage, frequent check-pointing is required, which affects the normal execution of the system even in the absence of failures. Despite the high overhead, we opt for backward recovery due to its simplicity and independence from the nature of the fault or failure (Jalote, 1994). We assume check-pointing invokes the NodeBackup procedure every time a token is returned by an M_mp to M_sm, or when a ReqM_mp or a ReqM_sm message is received. To reduce the overhead, one message (including the M_sm current data structures) can be sent to the M2_sm if M_sm is in possession of the token and is check-pointing. We assume a roll-back procedure is executed at the M2_sm node with regard to the last checkpoint.
4.4.4 Fault Resolution and Continued Service
A fault tolerant system has to function such that faulty components are not used, through a fault containment method. By assuming the main component's role, the backup M2_sm node masks primary M_sm node failure, isolating the faulty component while maintaining system availability and reliability. The M2_sm node establishes connections with the M_mp nodes and then resumes communication with the neighbouring M_sm nodes.
4.5 Proposed CDA Algorithm Analysis
We use the fault scenarios in analysing whether the modified fault-tolerant algorithm is able to address the faults noted, as well as maintain the crucial mutual exclusion properties. Further, we discuss the message and time complexity of the fault tolerant distributed CDA algorithm.
4.5.1 CDA Algorithm Correctness
We analyse how the fault tolerant CDA algorithm guarantees the following properties in the presence of potential faults, and thus failures:
ME1: If token-handling is diligently implemented, only one token will be in the network at any particular time, even in the presence of faults. The proposed algorithm ensures that at any instant of time no more than one node holds the token. Whenever a node receives the token, it becomes exclusively privileged. Similarly, when a node sends the token, it becomes unprivileged. Between these instants there is no privileged node. Thus, there is at most one privileged node at any point of time in the network. If the M_sm granting the token fails, then a privileged cluster node M_mp will not be able to
return the token within the predefined time. The token session is cancelled and the token is considered lost. The same M_mp node reverts its changes to the state before the token was received, and the timed-out token is destroyed before a token regeneration procedure at the M2_sm node. If a privileged M_sm node finds itself without outgoing links to neighbouring M_sm nodes and the timer has elapsed, or if it has failed, it destroys its copy of the token on reboot or resumption of services. The respective M2_sm backup node regenerates the token with all details with respect to the last checkpoint. Thus, only one token can be in the system, guaranteeing mutual exclusion.
ME2: When the token is free and one or more TAs want to enter the auction market but are not able to do so, a deadlock can occur. This happens due to:
- the token not being able to be transferred to a node because no node holds the privilege,
- the node in possession of the token being unaware that there are other nodes requiring the privilege, or
- the token failing to reach a requesting unprivileged node.
The logical pattern established using POINTER variables ensures a node that needs the token sends ReqM_sm either to the M_sm holding the token or to a neighbouring node M_sm,u that has a path to the token holder. M_mp nodes send ReqM_mp to their cluster head M_sm node. Consider the directed graph L[t] formed by the orientation of the tree links among the M_sm nodes at time t. In all cases there are no directed cycles, making L[t] acyclic, and from any non-terminal node there is a directed path to exactly one terminal entity. Within finite time, every message will stop travelling at some M_sm node. If a privileged node fails, our algorithm ensures that the copy of the token in the failed node is deleted while a copy is regenerated from the last checkpoint. By establishing connections with the neighbouring M_sm nodes and the M_mp nodes previously connected to the primary, the token path is complete and the M2_sm will be aware which node to pass the token to.
ME3: If M_sm,u holds the token and another node M_sm,v requests the token, the identity of M_sm,v, or of proxy nodes for M_sm,v, will be present in the RQ1s of the various M_sm nodes on the path connecting the requesting node to the currently token-holding node. Thus, depending on the position of node M_sm,v's requests in those RQ1s, M_sm,v will sooner or later receive the privilege. In addition, enqueuing ReqM_mp requests in RQ2 ensures that the token gets released to the next M_sm at the head of RQ1 once the GQ is empty, so no single cluster of M_mp nodes continues to trade while other nodes are starved. In the presence of failure the static spanning tree may be partitioned, which means a sub-tree that has the token benefits while the other sub-tree(s) is starved of the token. Our algorithm ensures that partitioning does not occur, with a backup node taking the place of the failed node.
ME4: Every trading agent has the same opportunity to enter the auction market. The FIFO queues RQ1 and RQ2 are serviced in the order of arrival of the requests from nodes. A new request from an agent that acquired the token in the recent past is enqueued behind the remaining pending requests. In a situation where a privileged M_sm node fails while granting the local cluster M_mp nodes a chance to participate in the auction, our algorithm ensures the remaining M_mp nodes get a chance to obtain the token and transact in the order in which they sent their requests for the token. By allowing the backup M2_sm to resume the role of the failed M_sm instead of eliminating the failed M_sm, fair token distribution is maintained.
4.5.2 CDA Algorithm Efficiency
Runtime: Usually it suffices to identify a dominant operation and to estimate the number of times it is executed. In the proposed algorithm we consider LocalTokenDistribution (executed on M_sm) to be the most costly routine. As the dominant operation we regard its WHILE loop, whose running time depends linearly on the input size.
Time Complexity: Analysis of the running time of our fault tolerant CDA algorithm alone is not sufficient, as we are more interested in how this running time increases as the input size increases. Thus we consider the order-of-growth measure, which can be estimated by taking into account the dominant term of the running time expression. The dominant operation is influenced by N, the number of cluster nodes. As N grows, the time complexity increases linearly, at an order of O(N).
Message Complexity: Assume that V = (v_1 + v_2 + ... + v_n) is the number of ReqM_sm messages received by a non-token-possessing node M_sm,i from its neighbours, because it lies on a path to a token-holding node, and that U = (u_1 + u_2 + ... + u_k) is the number of ReqM_mp messages received by M_sm,i. Each time NodeBackup is executed, W = (U + V) check-pointing messages are saved in the stable storage. In a worst case scenario, all the N_sm neighbouring cluster head M_sm nodes send a ReqM_sm at random times to M_sm,i, while all child M_mp nodes (in a K-ary tree) send ReqM_mp. This results in a message complexity of O(W).
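The worst-case count W = U + V can be made concrete with a small worked example; the numbers below are illustrative, not taken from the paper.

```python
def checkpoint_messages(n_sm_neighbours, n_mp_children):
    """Worst-case check-pointing message count at one cluster head M_sm,i."""
    V = n_sm_neighbours   # global ReqM_sm received (v_1 + ... + v_n)
    U = n_mp_children     # local ReqM_mp received (u_1 + ... + u_k)
    return U + V          # one checkpoint update per request => W

# e.g. 4 neighbouring cluster heads and 8 children in a K-ary cluster
print(checkpoint_messages(4, 8))  # W = 12 check-pointing messages
```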
5 Conclusions

We proposed a fault tolerant distributed CDA algorithm for energy allocation within computationally constrained microgrids. The fault tolerance proposed specifically addresses crash faults. Instead of bypassing the failed node and reconnecting the remaining spanning tree segments, our algorithm masks failure by incorporating redundancy at each cluster head node. The fault tolerant CDA allows trading agents to have mutually exclusive permission to participate in the auction market despite the presence of crash failures. We presented the correctness and efficiency of the fault tolerant algorithm. The runtime of the algorithm is linearly dependent on the input size. The time complexity for a single agent (hosted on a cluster node) to trade in the auction market is O(N), where N is the number of cluster nodes and, correspondingly, the number of messages exchanged per critical section execution; the message complexity is O(W), where W is the number of check-pointing messages. These are reasonable upper bounds for a fault tolerant CDA algorithm employing redundancy and check-pointing. While minimal redundancy is likely to be appropriate for the microgrid case, as future work we seek to make a comparative evaluation against other schemes that may be expensive for small amounts of extra fault tolerance, but whose costs are lower when higher degrees of fault tolerance are desired. In more distant future work we seek to investigate how malware may affect fairness, and thus the stability, of a fault tolerant CDA augmented microgrid.
ACKNOWLEDGEMENTS

This work was supported by the joint SANCOOP programme of the Norwegian Research Council and the South African National Research Foundation under NRF grant 237817; the Hasso-Plattner-Institute at UCT; and UCT postgraduate funding.
REFERENCES

Bloch, A. (2003). Murphy's law. Penguin.
Borenstein, S., Jaske, M., and Rosenfeld, A. (2002). Dy-
namic pricing, advanced metering, and demand re-
sponse in electricity markets. Center for the Study of
Energy Markets.
Chang, Y.-I., Singhal, M., and Liu, M. T. (1990). A fault tolerant algorithm for distributed mutual exclusion. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, pages 146–154. IEEE.
Cui, T., Wang, Y., Nazarian, S., and Pedram, M. (2014). An
Electricity Trade Model for Microgrid Communities
in Smart Grid. In Innovative Smart Grid Technolo-
gies Conference (ISGT), 2014 IEEE PES, pages 1–5.
Dhamdhere, D. M. and Kulkarni, S. S. (1994). A to-
ken based k-resilient mutual exclusion algorithm for
distributed systems. Information Processing Letters.
Fokkink, W. (2013). Distributed Algorithms: An Intuitive
Approach. MIT Press.
Garg, V. K. (2011). Principles of distributed systems.
Springer Publishing Company, Incorporated.
Ghosh, S. (2014). Distributed systems: an algorithmic ap-
proach. CRC press.
Izakian, H., Abraham, A., and Ladani, B. T. (2010). An
auction method for Resource Allocation in Computa-
tional Grids. Future Generation Computer Systems.
Jalote, P. (1994). Fault Tolerance in Distributed Systems.
Prentice-Hall, Inc.
Kshemkalyani, A. D. and Singhal, M. (2008). Distributed
computing: Principles, Algorithms, and Systems.
Cambridge University Press.
Marufu, A. M., Kayem, A. V., and Wolthusen, S. (2015). A distributed continuous double auction framework for resource constrained microgrids. In Critical Information Infrastructures Security, 10th International Conference on. IEEE.
Médard, M. and Lumetta, S. S. (2003). Network reliability and fault tolerance. Encyclopedia of Telecommunications.
Pałka, P., Radziszewska, W., and Nahorski, Z. (2012).
Balancing electric power in a microgrid via pro-
grammable agents auctions. Control and Cybernetics.
Raymond, K. (1989). A tree-based algorithm for distributed
mutual exclusion. ACM Transactions on Computer
Systems (TOCS), 7(1):61–77.
Raynal, M. (1986). Algorithms for mutual exclusion. MIT Press.
Revannaswamy, V. and Bhatt, P. (1997). A fault tolerant protocol as an extension to a distributed mutual exclusion algorithm. In Proceedings of the 1997 International Conference on Parallel and Distributed Systems, pages 730–735. IEEE.
Stańczak, J., Radziszewska, W., and Nahorski, Z. (2015).
Dynamic Pricing and Balancing Mechanism for a Mi-
crogrid Electricity Market. In Intelligent Systems’
2014, pages 793–806. Springer.
Tanenbaum, A. S. and Van Steen, M. (2007). Distributed
Systems. Prentice-Hall.
van Steen, M. and Tanenbaum, A. (2001). Distributed sys-
tems, principles and paradigms. Vrije Universiteit
Amsterdam, Holland, pages 1–2.
Walter, J. E., Welch, J. L., and Vaidya, N. H. (2001). A mu-
tual exclusion algorithm for ad hoc mobile networks.
Wireless Networks, 7(6):585–600.
ICISSP 2016 - 2nd International Conference on Information Systems Security and Privacy