COALITION FORMATION WITH UNCERTAIN TASK EXECUTION

Hosam Hanna

GREYC - University of Caen
Bd Maréchal Juin
14032 Caen - France

Keywords: Coalition formation, Uncertainty, Group decision, Markov decision process.

Abstract: We address the problem of coalition formation in environments where task execution is uncertain. Although previous works provide good solutions to the coalition formation problem, they do not take the uncertain task execution problem into account. In environments where task execution is uncertain, an agent cannot be sure whether he will be able to execute all the subtasks that are allocated to him or whether he will have to abandon some of them. Forming coalitions so as to maximize the real reward is therefore an unrealizable operation. In this paper, we propose a theoretical approach to coalition formation with uncertain task execution. We view the formation of a coalition to execute a task as (1) a decision to make and (2) an uncertain source of gain. We then associate the allocation of a task to a coalition with an expected reward that represents what agents expect to gain by forming this coalition to execute this task. Accordingly, the agents' aim is to form coalitions that maximize the expected reward instead of the real reward. To reach this objective, we formalize the coalition formation problem as a Markov Decision Process (MDP). We consider the situation where decisions are made by one agent that develops and solves the corresponding MDP. An optimal coalition formation which maximizes the agents' expected reward is then obtained.

1 INTRODUCTION

Coalition formation is an important cooperation method for applications where an agent cannot efficiently execute a task by himself. The coalition formation problem has been widely studied and many approaches have been proposed. In game theory, some works treated this problem without taking limited computation time into account (Aumann, 1959), (Bernheim et al., 1987), (Kahan and Rapoport, 1984). In cooperative environments, many algorithms were suggested to answer the question of group formation (Shehory and Kraus, 1998). In multiagent systems, there are several coalition formation mechanisms that include a protocol as well as strategies to be implemented by agents given the protocol (Klusch and Shehory, 1996), (Shehory and Kraus, 1998), (Zlotkin and Rosenschein, 1994), (Lerman and Shehory, 2000). All these works share common assumptions: resources consumption is perfectly controlled by agents, and the formation of a coalition to execute a task is a certain source of reward. In other words, an agent can exactly determine the quantity of resources he will consume to execute any subtask, and the formation of a coalition to execute a task is sufficient to obtain the corresponding reward. In this study, we relax this assumption in order to adapt coalition formation to more realistic cases, and we investigate the problem of coalition formation in environments where agents have uncertain behaviors.

Several works have investigated the coalition formation problem where the coalition value is uncertain or known only to a limited degree of certainty. In (Ketchpel, 1994), the author considered the case where agents do not have access to the coalition value function, and he proposed a two-agent auction mechanism that determines which coalitions of agents will work together and decides how to reward the agents. In (Blankenburg et al., 2003), the authors studied situations where the coalition value is known only to a limited degree of certainty. They proposed to use fuzzy quantities instead of real numbers in order to express the coalition values. A fuzzy kernel concept was introduced in order to yield stable solutions. Although the complexity of the fuzzy kernel is exponential, it has been shown that this complexity can be reduced to polynomial complexity by placing a cap on the size of coalitions. The uncertainty on the coalition value can be due to unknown execution costs. In fact, when agents reason in terms of utility, the net benefit of a coalition is defined as the coalition value minus the execution costs of all the coalition's members. When an agent of the coalition does not know with certainty the execution costs of the other members, it is uncertain regarding both the coalition's net benefit and its own net benefit. A protocol allowing agents to negotiate and form coalitions in such a case has been proposed in (Kraus et al., 2003) and (Kraus et al., 2004). Another source of uncertainty on the coalition value can be imperfect or deceiving information. A study of this case has been proposed in (Blankenburg and Klusch, 2004). In (Chalkiadakis and Boutilier, 2004), the authors proposed a reinforcement learning model that allows agents to refine their beliefs about the capabilities of others.

Although these previous works deal with an important uncertainty issue (uncertain coalition value), they make several restrictive assumptions regarding other possible sources of uncertainty, such as uncertain resources consumption (uncertain task execution), which can be due to the agents' uncertain behavior and to the environment's dynamism. In addition, they do not take into account the effects of forming a coalition on the future possible formations, so a long-term coalition formation planning cannot be provided. In applications such as planetary rovers, for example, an agent is confronted with an ambiguous environment where he cannot control his resources consumption when executing tasks as well as he does in the laboratory. A coalition formation planning is important so that agents adapt coalition formation to their uncertain behaviors. The problem is more complex when resources consumption is uncertain for all the agents. Unfortunately, in such a system, an agent cannot be sure whether he (or another agent) will be able to execute all the subtasks that are allocated to him or whether he will have to abandon some of them. So, forming coalitions to maximize the agents' real reward is a complex (even unrealizable) operation. In fact, a task is considered non-executed if at least one of its subtasks is not executed. That is why forming a coalition to execute a task is a necessary but not sufficient condition to obtain a reward, and the agents' reward must be subjected to the task execution and not only to the coalition formation and task allocation. In this paper, we take these issues into account and present a probabilistic model, based on a Markov Decision Process (MDP), that provides a coalition formation planning for environments where resources consumption is uncertain. We will show that, according to each possible resources consumption, agents can decide in an optimal way which coalition they must form.

We begin in Section 2 with a presentation of our framework. In Section 3, we sketch our solution approach. We explain how to form coalitions via an MDP in Section 4.

2 FRAMEWORK

We consider a situation where a set of $m$ fully-cooperative agents, $A = \{a_1, \ldots, a_m\}$, have to cooperate to execute a finite set of tasks $T = \{T_1, \ldots, T_n\}$ in an uncertain environment. The tasks will be allocated in a commonly known order: without loss of generality, we assume that this ordering is $T_1, T_2, \cdots, T_n$. Each agent $a_k$ has a bounded quantity of resources $R_k$ that he uses to execute tasks. Each task consists of subtasks: for simplicity, we assume that every task $T_i \in T$ is composed of $q$ subtasks such that $T_i = \{t_i^1, \ldots, t_i^q\}$. Agent $a_k$ is able to perform only a subset $E_i^k \subset T_i$ of the subtasks of a given task $T_i$. We assume that each task $T_i \in T$ satisfies the condition $T_i \subseteq \cup_{a_k \in A} E_i^k$; otherwise it is an unrealizable task. For each subtask $t_i^l$, $T_i \in T$, $l = 1, \ldots, q$, we can define the set of agents $AE(t_i^l)$ that are able to perform $t_i^l$ as follows: $AE(t_i^l) = \{a_k \in A \mid t_i^l \in E_i^k\}$. Since an agent cannot execute a task $T_i$ by himself, a coalition of agents must be formed in order to execute this task. Such a coalition can be defined as a $q$-tuple $\langle a^1, \ldots, a^q \rangle$ where agent $a^l \in A$ executes subtask $t_i^l \in E_i^{a^l}$. We let $C(T_i)$ denote the set of all possible coalitions that can perform task $T_i$; it can be defined as follows: $C(T_i) = \{\langle a^1, \ldots, a^q \rangle \mid a^l \in A,\ t_i^l \in T_i,\ t_i^l \in E_i^{a^l},\ l = 1, \ldots, q\}$. A task is considered realized if and only if all its subtasks have been performed. For each realized task $T_i$, agents obtain a reward. We consider a general situation where tasks can be executed with different qualities. For example, two agents can take photos of the same object, but the resolution can be different. The reward corresponding to the execution of a task thus depends on the coalition that executes the task. We assume that agents have a function $w(T_i, c)$ that expresses the reward that can be obtained if coalition $c$ executes task $T_i$.
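To illustrate these definitions, the following minimal sketch (ours, not from the paper; the function name `coalitions_for` and the data layout are assumptions) enumerates $C(T_i)$ by picking, for each subtask $t_i^l$, one agent from $AE(t_i^l)$:

```python
# A minimal sketch (not from the paper) of enumerating C(T_i).
from itertools import product

def coalitions_for(task, capabilities):
    """Enumerate C(T_i) as q-tuples of agent names.

    task         -- ordered list of subtasks [t_i^1, ..., t_i^q]
    capabilities -- dict: agent -> set of subtasks it can perform (E_i^k)
    """
    # AE(t_i^l): the agents able to perform each subtask t_i^l
    able = [[a for a, subs in capabilities.items() if t in subs] for t in task]
    if any(not agents for agents in able):
        return []  # some subtask is covered by no agent: T_i is unrealizable
    return list(product(*able))  # one agent chosen per subtask

# Tiny example: T_1 has q = 2 subtasks; no single agent can perform both.
T1 = ["t1", "t2"]
E = {"a1": {"t1"}, "a2": {"t2"}, "a3": {"t1"}}
print(coalitions_for(T1, E))  # [('a1', 'a2'), ('a3', 'a2')]
```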

3 SOLUTION APPROACH

The key idea in our approach is to view the formation of a coalition to execute a task as a decision to make that provides an expected reward instead of a real gain. What does one expect to gain by forming coalition $c$ to execute task $T_i$? In fact, when $T_i$ is allocated to $c$, the agents expect to obtain two values. The first one is the value $w(T_i, c)$, which is subjected to the execution of task $T_i$. The second expected value expresses the gain that can be obtained from future formations and allocations, taking into consideration the quantity of resources consumed to execute $T_i$. Indeed, when a coalition executes a task, the agents' available resources are reduced. The chances to execute other tasks can then be reduced. As the resources collection consumed to execute task $T_i$ depends on the coalition $c$ executing $T_i$, the gain the agents can obtain from future formations and allocations also depends on coalition $c$. Finally, the expected reward associated with the formation of a coalition to execute a task is the sum of these two expected values. It is necessary to recall here that our expected reward definition is different from the expected coalition value defined in (Chalkiadakis and Boutilier, 2004) for a Bayesian reinforcement learning model. In fact, the expected coalition value notion expresses what an agent, basing himself on his expectations regarding the capabilities of other agents, believes the value of any coalition to be. In addition, this notion does not allow agents to take into account the impact of the formation of a coalition on the gain that can be obtained from the formation of other coalitions (our second expected value).

Differently from known coalition formation methods that maximize the agents' real gain¹, the goal of our agents is defined as follows: for each task $T_i$, form a coalition $c$ in such a way that it maximizes the agents' long-term expected reward. To realize this objective, we have to treat the uncertain resources consumption and formalize the expected reward associated with coalition formation. We will use a discrete representation of resources consumption and then define an execution probability distribution. Finally, we formalize the coalition formation problem as a Markov decision process (MDP). It is well known that solving an MDP allows one to determine an optimal policy maximizing the long-term expected reward (Bellman, 1957; Puterman, 1994).

3.1 Uncertain Resource Consumption

In order to deal with the uncertain resources consumption, we assume that the execution of subtask $t_i^l \in T_i$ by agent $a_k$ can consume one quantity of resources from a finite set $R_k^{t_i^l}$ of possible quantities of resources. For simplicity, we assume that there are $p$ resources quantities in the set $R_k^{t_i^l}$. Agent $a_k$ does not know which quantity of resources will be consumed, but he can anticipate it using some probability distribution:

Definition 3.1 With each agent $a_k \in A$ is associated an execution probability distribution $PE_k$ where, $\forall t_i^l \in T_i$, $\forall r \in R_k^{t_i^l}$, $PE_k(r, t_i^l)$ represents the probability of consuming the resources quantity $r$ at the time of the execution of subtask $t_i^l$ by agent $a_k$.

¹Or another type of gain that does not include the impact of the formation of a coalition on the future formations.

If a coalition $c = \langle a^1, \ldots, a^q \rangle \in C(T_i)$ executes task $T_i$, a resources collection $\langle r^1, \ldots, r^q \rangle$ can be consumed, where agent $a^k$ consumes quantity $r^k$ to perform subtask $t_i^k$. Since one of $p$ resources quantities can be consumed by each agent $a^k$ to execute subtask $t_i^k$, the execution of $T_i$ by $c$ consumes one collection out of $p^q$ resources collections. We let $H_i^c$ denote the set of all these resources collections. The probability $Pr(\langle r^1, \ldots, r^q \rangle, T_i)$ of consuming collection $\langle r^1, \ldots, r^q \rangle \in H_i^c$ at the time of the execution of $T_i$ by $c$ is then the probability that each agent $a^k$ consumes the quantity $r^k$. Using Definition 3.1, this probability can be defined as follows²:

$$Pr(\langle r^1, \ldots, r^q \rangle, T_i) = \prod_{k=1}^{q} PE_{a^k}(r^k, t_i^k) \quad (1)$$

²Since $PE_k$ is a probability distribution on $R_k^{t_i^l}$, we have $\sum_{r \in R_k^{t_i^l}} PE_k(r, t_i^l) = 1$. It is easy to verify that $Pr$ represents a probability distribution on $H_i^c$: $\sum_{h \in H_i^c} Pr(h, T_i) = 1$.
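As an illustration, the sketch below (ours; the nested-dictionary layout for $PE_k$ is an assumption) stores the execution probability distributions of Definition 3.1 per agent and subtask, builds $H_i^c$, and evaluates Equation (1) as a product over the coalition's members:

```python
# A sketch (ours, not the paper's) of Definition 3.1 and Equation (1).
from itertools import product
from math import prod

# PE[k][t][r] = PE_k(r, t): probability that agent k consumes quantity r
# when executing subtask t. Here p = 2 quantities per agent and subtask.
PE = {
    "a1": {"t1": {1: 0.7, 3: 0.3}},
    "a2": {"t2": {2: 0.5, 4: 0.5}},
}

def collection_probability(coalition, task, h):
    """Pr(<r^1,...,r^q>, T_i) = prod_k PE_{a^k}(r^k, t_i^k) -- Equation (1)."""
    return prod(PE[a][t].get(r, 0.0) for a, t, r in zip(coalition, task, h))

coalition, task = ("a1", "a2"), ("t1", "t2")
# H_i^c: the p^q possible resources collections.
H = list(product(*[list(PE[a][t]) for a, t in zip(coalition, task)]))
print(H)  # [(1, 2), (1, 4), (3, 2), (3, 4)]
print(collection_probability(coalition, task, (1, 4)))  # 0.7 * 0.5 = 0.35
# Sanity check (footnote 2): Pr is a probability distribution on H_i^c.
assert abs(sum(collection_probability(coalition, task, h) for h in H) - 1.0) < 1e-9
```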

3.2 Coalition Expected Reward

In our context, a specific agent, the "controller", is in charge of forming coalitions and allocating tasks. The controller views the formation of a coalition to execute a task as a decision to make. When such a decision is made, a coalition is formed, a task is allocated to this coalition, and a resources collection will be consumed to execute the allocated task. As we have shown in Section 3, the decision to form a coalition to execute a task is associated with an expected reward. In the following, we show how the controller can calculate this expected reward.

The controller observes the state of the system as a couple: the available resources of all the agents, and the set of formed coalitions and allocated tasks. Being in a state $S$, the decision that consists in forming a coalition $c$ to execute a task $T_i$ drives the system into a new state $S_h$ in which task $T_i$ has been allocated to coalition $c$ and a resources collection $h \in H_i^c$ is anticipated to be consumed when $c$ executes $T_i$. In order to take into account the uncertain task execution, the controller must anticipate all the possible resources collections that can be consumed when $c$ executes $T_i$; each possible consumption drives the system into a different state. If the agents of coalition $c$ have enough resources to execute $T_i$ (collection $h$ is less than $c$'s agents' available resources), then the system receives in state $S_h$ an immediate gain $w(T_i, c)$ (first expected value); otherwise it receives zero. From state $S_h$, another decision can be made and another reward can thus be obtained (second expected value). We let $V[S_h]$ denote the gain in state $S_h$ and we define it as the sum of both of these rewards (see Section 4.3 for the mathematical definition). Being in state $S$, the probability of gaining $V[S_h]$, if coalition $c$ is formed to execute $T_i$, is expressed by the probability of consuming resources collection $h$, because the system reaches state $S_h$ if collection $h$ has been consumed. This probability is defined by Equation 1. We can say now that, being in state $S$, the decision to form coalition $c$ to execute $T_i$ drives the system to state $S_h$ and allows it to gain $V[S_h]$ with probability $Pr(h, T_i)$, where $h \in H_i^c$. The expected reward of this decision can be defined as follows:

$$E(\text{Forming } c \text{ to execute } T_i) = \sum_{h \in H_i^c} Pr(h, T_i) \times V[S_h] \quad (2)$$
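Computationally, Equation (2) is a probability-weighted sum over the successor states; a small sketch (ours, with purely illustrative `pr` and `V` tables) follows:

```python
# A sketch (ours) of Equation (2): expected reward of forming c to execute T_i.
def expected_reward(collections, pr, V):
    """collections -- the resources collections h in H_i^c
    pr          -- pr[h] = Pr(h, T_i), from Equation (1)
    V           -- V[h] = gain V[S_h] of the state reached when h is consumed
    """
    return sum(pr[h] * V[h] for h in collections)

H = [(1, 2), (1, 4), (3, 2), (3, 4)]
pr = {(1, 2): 0.35, (1, 4): 0.35, (3, 2): 0.15, (3, 4): 0.15}
V = {(1, 2): 10.0, (1, 4): 10.0, (3, 2): 10.0, (3, 4): 0.0}  # (3, 4) exhausts resources
print(expected_reward(H, pr, V))  # 0.35*10 + 0.35*10 + 0.15*10 + 0.15*0 = 8.5
```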

We note that the expected reward associated with a decision made in state $S$ depends on the gain that can be obtained in each state $S_h$, and so on. The question is then: being in a state $S$ and knowing that there are $|C(T_i)|$ coalitions capable of executing $T_i$, which decision does the controller have to make in order to maximize the long-term expected reward? To answer this question, we formalize our coalition formation problem using a probabilistic model called a Markov Decision Process (MDP). We will show that the MDP allows us to determine an optimal coalition formation policy that defines, for each system state, the coalition to form in order to maximize the system's long-term expected reward.

4 COALITION FORMATION

Coalition formation can be viewed as a sequential decision process. At each step of this process, the decision to form a coalition to execute a task has to be made. In the next step, another decision concerning the next task is made, and so on. The formation of a coalition changes the system's current state into a new one. As has been shown in the previous section, the probability of transiting between the system's current state and a new state depends only on the system's current state and on the decision made. So, this process is a Markovian one (Papoulis, 1984; Bellman, 1957).

A Markov decision process consists of a set of all the system's states $\mathcal{S}$, a set of actions $AC$, and a transition model (Bellman, 1957). With each state is associated a reward function, and with each action is associated an expected reward. In the following, we describe our MDP via its states, actions, transition model, and expected reward.

4.1 States Representation

A state $S$ of the set $\mathcal{S}$ represents a situation of coalition formation and resources consumption for all the agents. We let $S_i = \langle B_i, R_i^1, \ldots, R_i^m \rangle$ denote the system state at time $i$ where:

• $B_i$ is the set of task-coalition couples representing the coalition formation up to time $i$: $B_i = \{(T_f, c_f) \mid f = 1, \ldots, i$, coalition $c_f$ is formed to execute task $T_f\}$;

• $R_i^k$, $k = 1, \ldots, m$, is the available resources of agent $a_k$ at time $i$.

At time 0, the system is in the initial state $S_0 = (\emptyset, R_1, \ldots, R_m)$, where $R_k$ is the initial resources of agent $a_k$. At time $n$ (the number of tasks), the system reaches a final state $S_n$ where there are no more tasks to allocate or no more resources to execute tasks.
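One possible encoding of such states (our sketch; the paper does not prescribe a data structure) keeps $B_i$ and the resource vector in an immutable tuple, so that states can later serve as keys of a value table:

```python
# A sketch (ours) of a state S_i = <B_i, R_i^1, ..., R_i^m> as a hashable value.
from typing import NamedTuple, Tuple

class State(NamedTuple):
    formed: Tuple[Tuple[str, Tuple[str, ...]], ...]  # B_i: (task, coalition) pairs
    resources: Tuple[int, ...]                       # (R_i^1, ..., R_i^m)

# Initial state S_0 = (empty set, initial resources R_1, ..., R_m), here m = 3.
S0 = State(formed=(), resources=(5, 4, 6))
```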

4.2 Actions and Transition Model

With each state $S_{i-1} \in \mathcal{S}$ is associated a set of actions $AC(S_{i-1}) \subset AC$. An action of $AC(S_{i-1})$ consists in forming a coalition $c \in C(T_i)$ to execute task $T_i$ and in anticipating the resources collection which can be consumed to execute $T_i$. We denote such an action by $Form(c, T_i)$. So, the set $AC(S_{i-1})$ contains $|C(T_i)|$ actions. Being in state $S_{i-1} = \langle B_{i-1}, R_{i-1}^1, \ldots, R_{i-1}^m \rangle$, the application of action $Form(c, T_i)$ drives the system into a new state $S_i^h$, which can be any of the following states:

$$S_i^h = \langle B_i^h, R_i^1, \ldots, R_i^m \rangle \quad (3)$$

where:

• $c = \langle a^1, \ldots, a^q \rangle$;

• $h = \langle r^1, \ldots, r^q \rangle \in H_i^c$;

• $B_i^h = B_{i-1} \cup \{(c, T_i)\}$;

• $\forall a_k \in A$ with $a_k \notin c$: $R_i^k = R_{i-1}^k$;

• $\forall a^l = a_k \in c$:

$$R_i^k = \begin{cases} R_{i-1}^k - r^l, & \text{if } R_{i-1}^k \geq r^l \\ 0, & \text{if } r^l > R_{i-1}^k \end{cases}$$

In fact, there are $|H_i^c|$ possible future states because the execution of $T_i$ by coalition $c$ can consume one resources collection of the set $H_i^c$. The case where $r^l > R_{i-1}^k$ corresponds to the situation where agent $a^l = a_k$ tries to execute subtask $t_i^l$ and consumes all his resources $R_{i-1}^k$, but $t_i^l$ is not completely performed because it necessitates more resources ($r^l$). $a^l$'s available resources are then 0 and task $T_i$ cannot be considered a realized task. If $c$'s agents have enough resources to execute $T_i$, an immediate gain equal to $w(T_i, c)$ will be received in state $S_i^h$. In the other case ($c$'s agents' available resources are not sufficient to completely execute $T_i$), the immediate gain is equal to 0. We let $\alpha(S_i^h)$ denote the immediate gain in state $S_i^h$, thus:

$$\alpha(S_i^h) = \begin{cases} w(T_i, c), & \text{if } \forall a^l = a_k \in c,\ r^l \leq R_{i-1}^k \\ 0, & \text{otherwise: } \exists a^l = a_k \in c,\ r^l > R_{i-1}^k \end{cases} \quad (4)$$

Furthermore, the probability of the transition from state $S_{i-1}$ to a state $S_i^h$, knowing that the action $Form(c, T_i)$ is applied, can be expressed by the probability that coalition $c$ consumes resources collection $h$; thus $Pr(S_i^h \mid S_{i-1}, Form(c, T_i)) = Pr(h, T_i)$. It is important to note that state $S_i^h$ is inevitably different from the state $S_{i-1}$. In fact, the task to allocate in $S_{i-1}$ was $T_i$, while in any state $S_i^h$, $h \in H_i^c$, we form a coalition to execute task $T_{i+1}$. In other words, being in a state $S$ at time $i$, there is no action that can drive the system to a state $S'$ which was the system's state at a time $i' \leq i$. Consequently, the developed MDP does not contain loops; it is a finite-horizon MDP (Sutton and Barto, 1998). This is a very important property, as we will show in the following.
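The transition of Equation (3) and the immediate gain of Equation (4) can be sketched as follows (our illustration; the `successor` function and its argument layout are assumptions, with resources held in a plain tuple):

```python
# A sketch (ours) of Equations (3) and (4): applying Form(c, T_i) for one
# anticipated collection h debits each member's resources (floored at 0) and
# pays w(T_i, c) only if every member had enough resources.
def successor(resources, coalition, h, agent_index, w):
    """resources   -- tuple (R_{i-1}^1, ..., R_{i-1}^m)
    coalition   -- q-tuple of agent names <a^1, ..., a^q>
    h           -- q-tuple of quantities <r^1, ..., r^q> in H_i^c
    agent_index -- dict: agent name -> position k in the resources tuple
    w           -- the reward w(T_i, c)
    Returns (new resources tuple, alpha)."""
    res = list(resources)
    executed = True
    for agent, r in zip(coalition, h):
        k = agent_index[agent]
        if res[k] >= r:
            res[k] -= r          # R_i^k = R_{i-1}^k - r^l
        else:
            res[k] = 0           # exhausted: t_i^l not completely performed
            executed = False
    return tuple(res), (w if executed else 0.0)  # Equation (4)

print(successor((5, 4), ("a1", "a2"), (3, 6), {"a1": 0, "a2": 1}, 10.0))
# ((2, 0), 0.0): a2 needed 6 but had only 4, so T_i is not realized.
```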

4.3 Expected Reward

The decision to apply an action depends on the reward that the system expects to obtain by applying this action. We denote by $E(Form(c, T_i), S_{i-1})$ the expected reward associated with the action $Form(c, T_i)$ applied in state $S_{i-1}$. We recall that this expected reward represents what the system, being in state $S_{i-1}$, expects to gain if coalition $c$ is formed to execute task $T_i$. A policy $\pi$ to follow is a mapping from states to actions. For a state $S_{i-1} \in \mathcal{S}$, $\pi(S_{i-1})$ is an action from $AC(S_{i-1})$ to apply. The expected reward of a policy $\pi(S_{i-1}) = Form(c, T_i)$ is $E(Form(c, T_i), S_{i-1})$. An optimal policy is the policy that maximizes the expected reward at each state. In state $S_{i-1}$, an optimal policy $\pi^*(S_{i-1})$ is then the action whose expected reward is maximal. Formally,

$$\pi^*(S_{i-1}) = \arg\max_{c \in C(T_i)} \{E(Form(c, T_i), S_{i-1})\} \quad (5)$$

Solving Equation 5 allows us to determine an optimal coalition formation policy at each state $S_{i-1}$. To do this, the expected reward associated with action $Form(c, T_i)$ has to be defined. Defining this expected reward requires, based on Equation 2, the definition of the reward associated with each state. We define the reward $V[S_{i-1}]$ associated with a state $S_{i-1} = \langle B_{i-1}, R_{i-1}^1, \ldots, R_{i-1}^m \rangle$ as an immediate gain $\alpha(S_{i-1})$ accumulated with the expected reward of the followed policy (the reward-to-go). We can formulate $V[S_{i-1}]$ and $E(Form(c, T_i), S_{i-1})$ using Bellman's equations (Bellman, 1957), thus:

➤ At time $i-1$:

$$V[S_{i-1}] = \underbrace{\alpha(S_{i-1})}_{\text{immediate gain}} + \underbrace{E(\pi^*(S_{i-1}))}_{\text{reward-to-go according to } \pi^*} \quad (6)$$

$$E(\pi^*(S_{i-1})) = \max_{c \in C(T_i)} \{E(Form(c, T_i), S_{i-1})\} \quad (7)$$

$$E(Form(c, T_i), S_{i-1}) = \sum_{h \in H_i^c} Pr(h, T_i) \times V[S_i^h] \quad (8)$$

where state $S_i^h$ corresponds to the consumption of resources collection $h$.

➤ At time $n$:

$$V[S_n] = \alpha(S_n) \quad (9)$$

Since the obtained MDP has a finite horizon and no loops, several known algorithms, such as Value Iteration and Policy Iteration, solve Bellman's equations in finite time (Puterman, 1994), and an optimal policy is obtained.
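Because the MDP is finite-horizon and loop-free, Bellman's equations can also be solved by backward recursion over the task index. The sketch below (ours; helper names such as `coalitions_of`, `collections_of`, `prob_of`, and `step` are stand-ins for $C(T_i)$, $H_i^c$, Equation (1), and the transition model of Section 4.2) memoizes values on (time, state), so states must be hashable:

```python
# A backward-induction sketch (ours) of Equations (5)-(9).
from functools import lru_cache

def make_solver(tasks, coalitions_of, collections_of, prob_of, step):
    """step(state, c, T, h) must return (successor state, alpha(successor))."""
    @lru_cache(maxsize=None)
    def solve(i, state):
        """Return (E(pi*(S)), optimal coalition) at time i."""
        if i == len(tasks):      # horizon reached: reward-to-go is 0; alpha(S_n)
            return 0.0, None     # (Equation 9) was credited on the transition in
        best_value, best_coalition = float("-inf"), None
        for c in coalitions_of(tasks[i]):           # candidate action Form(c, T_i)
            expected = 0.0
            for h in collections_of(c, tasks[i]):   # anticipate every h in H_i^c
                nxt, alpha = step(state, c, tasks[i], h)
                # Equation (8) with V[S_i^h] = alpha(S_i^h) + E(pi*(S_i^h)) (Eq. 6)
                expected += prob_of(h, tasks[i]) * (alpha + solve(i + 1, nxt)[0])
            if expected > best_value:               # Equations (5) and (7)
                best_value, best_coalition = expected, c
        return best_value, best_coalition
    return solve
```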

4.4 Optimal Coalition Formation

An optimal coalition formation can be obtained by solving Bellman's equations and then applying the optimal policy at each state, starting from the initial state $S_0$. Here, we distinguish two cases according to the execution model. The first case corresponds to the execution model where tasks must be executed sequentially in the allocation order ($T_1, T_2, \ldots, T_n$). In this case, a coalition to execute task $T_{i+1}$ is formed at the end of $T_i$'s execution. Let $\pi^*(S_{i-1}) = Form(c, T_i)$ be the optimal policy to apply in state $S_{i-1}$. The application of this policy means that coalition $c$ must be formed to execute task $T_i$. Assuming that resources collection $h$ has been consumed by $c$ to execute $T_i$, the system then reaches the state $S_i = S_i^h$ defined by Equation 3. From this new state $S_i$, the controller applies the calculated optimal policy $\pi^*(S_i)$, and so on.

The second case corresponds to the execution model where the controller forms all the possible coalitions before the agents start the execution. In this case, after each coalition formation, the controller has to anticipate the state the system will reach when executing the allocated task. Let $\pi^*(S_{i-1}) = Form(c, T_i)$ be the optimal policy to apply in state $S_{i-1}$. By applying this optimal policy, coalition $c$ is formed to execute $T_i$. As the execution is not immediate, the controller anticipates the state $S_i$ the system will reach when $c$ executes $T_i$. This state $S_i$ can be any state $S_i^h$, $h \in H_i^c$. The state the system is most likely to reach is the one corresponding to the resources collection that is consumed with maximal probability. Formally, the state $S_i^{h^*}$ the system has the highest probability of reaching when $c$ executes $T_i$ corresponds to the consumption of the resources collection $h^*$ that satisfies $Pr(h^*, T_i) = \max_{h \in H_i^c} \{Pr(h, T_i)\}$. From this new state $S_i = S_i^{h^*}$, the controller applies the calculated optimal policy $\pi^*(S_i)$, and so on until reaching a terminal state $S_n = (B_n, R_n^1, \ldots, R_n^m)$. Finally, the set $B_n$ contains the formed coalitions and their allocated tasks.
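For the second execution model, the controller chains optimal decisions by anticipating, after each formation, the most probable resources collection. A sketch (ours, reusing the hypothetical helpers of the previous sketch) could look like:

```python
# A sketch (ours) of the second execution model of Section 4.4: form all
# coalitions up front, moving after each decision to the most probable
# successor state (the collection h maximizing Equation 1).
def plan_all(tasks, state, solve, collections_of, prob_of, step):
    """Return B_n: the (task, coalition) pairs chosen by the optimal policy."""
    formed = []
    for i, task in enumerate(tasks):
        _, c = solve(i, state)                 # optimal action pi*(S_{i-1})
        if c is None:
            break                              # no coalition can execute the task
        formed.append((task, c))
        # anticipate the most probable collection h in H_i^c
        h = max(collections_of(c, task), key=lambda hh: prob_of(hh, task))
        state, _ = step(state, c, task, h)     # assume S_i = S_i^h
    return formed
```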

5 CONCLUSION

Approaches that have proposed solutions to the coalition formation problem with uncertain coalition value take into account neither the uncertain task execution nor the impact of the formation of a coalition on the gain that can be obtained from the formation of other coalitions. In this paper, we addressed the problem of coalition formation in environments where resources consumption is uncertain. We showed that in such an environment, forming a coalition to execute a task has an impact on the possibility of forming other coalitions. Thus, this issue must be taken into account each time agents decide to form a coalition. We introduced the notion of expected reward, which represents what agents expect to gain by forming a coalition. The expected reward is defined as the sum of (1) what agents immediately gain if the coalition executes the task and (2) what they expect to gain from future formations. Our key idea is to view the formation of coalitions as a decision to make that provides, due to the uncertain task execution, an expected reward. The agents' aim is then to form coalitions in a way that maximizes their long-term expected reward instead of their real reward. The coalition formation problem has been formalized as a Markov decision process. Since the obtained MDP has a finite horizon, it can be solved in finite time using known algorithms such as Value Iteration and Policy Iteration. After solving the MDP, the controller agent can optimally decide, for each task, which coalition must be formed. In other words, it can make optimal decisions about coalition formation.

REFERENCES

Aumann, R. (1959). Acceptable points in general cooperative n-person games. In Contributions to the Theory of Games, volume IV. Princeton University Press.

Bellman, R. E. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6:679-684.

Bernheim, B., Peleg, B., and Whinston, M. (1987). Coalition-proof Nash equilibria: I. Concepts. Journal of Economic Theory, 42(1):1-12.

Blankenburg, B. and Klusch, M. (2004). On safe kernel stable coalition formation among agents. In Proceedings of the International Joint Conference on Autonomous Agents & Multi-Agent Systems, AAMAS04.

Blankenburg, B., Klusch, M., and Shehory, O. (2003). Fuzzy kernel-stable coalition formation between rational agents. In Proceedings of the International Joint Conference on Autonomous Agents & Multi-Agent Systems, AAMAS03.

Chalkiadakis, G. and Boutilier, C. (2004). Bayesian reinforcement learning for coalition formation under uncertainty. In Proceedings of the International Joint Conference on Autonomous Agents & Multi-Agent Systems, AAMAS04.

Kahan, J. and Rapoport, A. (1984). Theories of Coalition Formation. Lawrence Erlbaum Associates Publishers.

Ketchpel, S. (1994). Forming coalitions in the face of uncertain rewards. In Proceedings of AAAI, pages 414-419.

Klusch, M. and Shehory, O. (1996). A polynomial kernel-oriented coalition formation algorithm for rational information agents. In Proceedings of ICMAS, pages 157-164.

Kraus, S., Shehory, O., and Taase, G. (2003). Coalition formation with uncertain heterogeneous information. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS03, Australia.

Kraus, S., Shehory, O., and Taase, G. (2004). The advantages of compromising in coalition formation with incomplete information. In Proceedings of the International Joint Conference on Autonomous Agents & Multi-Agent Systems, AAMAS04.

Lerman, K. and Shehory, O. (2000). Coalition formation for large-scale electronic markets. In Proceedings of the Fourth International Conference on Multiagent Systems.

Papoulis, A. (1984). Signal Analysis. International student edition, McGraw-Hill Book Company.

Puterman, M. L. (1994). Markov Decision Processes. John Wiley & Sons, New York.

Shehory, O. and Kraus, S. (1998). Methods for task allocation via agent coalition formation. Artificial Intelligence, 101:165-200.

Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA. ISBN 0-262-19398-1.

Zlotkin, G. and Rosenschein, J. (1994). Coalition, cryptography, and stability: mechanisms for coalition formation in task oriented domains. In Proceedings of AAAI, pages 432-437.
