A Fine-Tuning Aggregation Convolutional Neural Network Surrogate
Model of Strategy Selecting Mechanism for Repeated-Encounter
Bilateral Automated Negotiation
Shengbo Chang (https://orcid.org/0000-0001-6992-7566) and Katsuhide Fujita (https://orcid.org/0000-0001-7867-4281)
Department of Electrical Engineering and Computer Science, Faculty of Engineering,
Tokyo University of Agriculture and Technology, Koganei, Tokyo, Japan
Keywords:
Automated Negotiation, Strategy Selecting Mechanism, Surrogate Model, Fine-Tuning.
Abstract:
Negotiating with the same opponent multiple times, each time in a different domain, commonly occurs in real
life. We consider this automated negotiation problem as repeated-encounter bilateral automated negotiation
(RBAN), in which it is essential to learn from the history of coping with the opponent. This study presents a
surrogate-model-based strategy selecting mechanism that learns from experience in RBAN by fine-tuning the
proposed aggregation convolutional neural network (CNN) surrogate model (ACSM). ACSM is expected to
assess strategies more precisely by applying a CNN to extract features from a matrix representing the utility
distribution of the outcomes. It ensures the abundance of extracted features by aggregating multiple CNNs
trained with diverse opponents. The fine-tuning approach adapts ACSM to the opponent in RBAN by feeding
the results of the present negotiations back to ACSM. We evaluate ACSM and the fine-tuning approach
experimentally by selecting a strategy for a time-dependent agent. Experiments negotiating with four Automated
Negotiating Agents Competition (ANAC) champions and six basic agents are performed. ACSM is tested on
600 negotiation scenarios originating from ANAC domains. The fine-tuning approach is tested on 60 RBAN
sessions. The experimental results indicate that ACSM outperforms an existing feature-based surrogate model,
and that the fine-tuning approach is able to adapt ACSM to the opponent in RBAN.
1 INTRODUCTION
Negotiations with the same opponent multiple times, each time in a new domain, happen in real life. For
example, a retailer may need to negotiate with a product supplier about prices, varieties, and quantities
each season as customer preferences change (Chkoniya and Mateus, 2019). Moreover, the identity of the
opponent being negotiated with is known in the setting of the Automated Negotiation League of the Automated
Negotiating Agents Competition (ANAC) 2022 (Aydogan et al., 2022), indicating that candidates can change
their strategies by learning from their past experience with the given opponent. In this study, we consider
the strategy selection problem of repeated-encounter bilateral automated negotiation (RBAN, i.e., a sequence
of bilateral automated negotiations with the same opponent, each in a different scenario) (Renting
et al., 2022).
There is no single strategy that could dominate
all possible settings (Ilany and Gal, 2016). Previous
studies (Baarslag et al., 2012; Baarslag et al., 2013;
Ya’akov Gal and Ilany, 2015) demonstrated that the
best negotiation strategy varies with the negotiation
scenario, even for the same opponent. Therefore, se-
lecting the optimal strategy for each scenario is essen-
tial in RBAN. The similar behavior pattern of the op-
ponent in different scenarios is a distinguishable fea-
ture of RBAN that asks negotiators to use their expe-
rience and learn from the negotiation history to cope
with the opponent. Though several studies focused on
strategy selection (Ilany and Gal, 2016; Kawata and
Fujita, 2020; Wu et al., 2021; Baarslag et al., 2013;
Sengupta et al., 2021; Renting et al., 2020; Güneş
et al., 2017; Fujita, 2014; Fujita, 2018), few consid-
ered the problem of RBAN to the best of our knowl-
edge.
Surrogate models, generally used in algorithm selection, predict the output for an unseen algorithm parameter
input by regression on known input-output pairs. A surrogate model for strategy selection
in automated negotiation usually takes the negotiation scenario features and a strategy configuration as input,
and its output is the predicted evaluation value (Ilany
and Gal, 2016; Renting et al., 2020). For the given ne-
gotiation scenario, existing surrogate models use ex-
pert scenario features to predict the performance of a
strategy configuration (Ilany and Gal, 2016; Renting
et al., 2020; Renting et al., 2022). Because these features rely on human intuition, some prediction accuracy
is usually lost. In contrast, the two-dimensional outcome space of a negotiation scenario (our agent's utility
versus the opponent's utility) can represent the scenario exhaustively, and convolutional neural networks
(CNNs) can be trained to extract useful features from this 2D outcome space automatically. Therefore,
training a CNN to extract scenario features is a feasible way of overcoming the disadvantages of human
intuition.
This study extends existing strategy selecting mechanisms with a CNN-based surrogate model and
an online learning method for RBAN. The contributions of this study are as follows:
- An approach for extracting scenario features with a CNN from a discrete, size-fixed outcome distribution
  map that indicates the number of outcomes falling within bins of a predefined utility range.
- A surrogate model (ACSM) that aggregates multiple CNN-components implementing this feature-extracting
  approach, each trained with a different opponent, ensuring robustness and feature abundance against an
  unknown opponent.
- A fine-tuning approach that efficiently adapts the proposed surrogate model to the facing opponent in RBAN.
The feature-extracting approach is validated experimentally by comparing ACSM with an expert-feature-based
neural network surrogate model (NNSM) in various single-negotiation scenarios. The fine-tuning approach is
validated experimentally by comparing ACSM with fine-tuning against ACSM without fine-tuning in RBAN.
The remainder of this paper is structured as follows: Section 2 presents related work; Section 3
defines RBAN; Section 4 introduces ACSM with the proposed feature-extracting method; Section 5 introduces
the strategy selecting mechanism of fine-tuning ACSM for RBAN; Section 6 demonstrates the experimental
results; Section 7 summarizes this paper and discusses future possibilities.
2 RELATED WORK
This work focuses on strategy selection for each negotiation scenario in RBAN. The most closely related
area is strategy selection in automated negotiation.
(Ilany and Gal, 2016) proposed a meta-agent that includes a strategy portfolio used in the ANAC. They
proposed several expert features to help build surrogate models for evaluating negotiation strategies
in given scenarios. Additionally, they extended the approach to an online reinforcement learning version
for cases in which the learned model is flawed. They trained the surrogate model to predict the average
performance against a set of opponents. Extending their work, (Renting et al., 2020) introduced a sequential
model-based optimization mechanism for general algorithm configuration to select strategy parameters
for a dynamic agent over a set of opponents and domains. The mechanism searches the configuration
space accelerated by an expert-feature-based surrogate model. They also applied AutoFolio to construct a
strategy selector based on domain and opponent features (Renting et al., 2022). These studies rely on
feature-based surrogate models. In contrast, in this study, we consider a new way of extracting negotiation
setting features with a CNN.
(Fujita, 2018) proposed an approach that estimates the opponent's strategy and preferences in multi-time
negotiation and can achieve better Pareto efficiency. (Kawata and Fujita, 2020) employed a reinforcement
learning method to select the strategy for multi-time negotiation, inspired by (Ilany and Gal, 2016).
(Taiji and Ikegami, 1999) proposed a strategy for the repeated prisoner's dilemma game that uses a
recurrent neural network to predict future interactions and optimizes the next moves in the game. These
works are applicable to repeated negotiation in which the opponent and negotiation domain are fixed.
(Güneş et al., 2017) applied boosting to bidding and acceptance strategies, proposing two versions of
boosting learning: learning to select a strategy and learning to combine the outputs of different strategies.
(Sengupta et al., 2021) proposed an adaptive strategy switching mechanism for their autonomous negotiating
agent framework. This mechanism classifies the opponent in a negotiation scenario and uses an expert
recommendation to select a coping strategy. Their results show that it outperforms most existing Genius
negotiators. Similarly, (Wu et al., 2021) proposed a negotiating agent framework that leverages Bayesian
policy reuse in a negotiation. This framework can recognize the opponent and provide a coping policy, or
build a new policy when facing an unseen opponent. These works focus on the strategy for coping with an
opponent in a single negotiation scenario and do not consider the RBAN case.
3 REPEATED-ENCOUNTER
BILATERAL AUTOMATED
NEGOTIATION
This work considers selecting a strategy from a strat-
egy portfolio for an agent in RBAN. Each negotia-
tion in RBAN is a bilateral automated negotiation that
consists of a negotiation protocol, a negotiation sce-
nario, and two negotiators. The negotiation protocol
and scenario settings in this paper adopt the bilateral
negotiation settings commonly used to evaluate ne-
gotiation strategies in the literature (Ilany and Gal, 2016;
Baarslag et al., 2013; Renting et al., 2020; Renting
et al., 2022; Sengupta et al., 2021; Wu et al., 2021).
The negotiation protocol is the alternating offers
protocol (AOP) (Rosenschein and Zlotkin, 1994), in
which negotiators take turns to make an offer, ac-
cept an offer, or walk away. This continues until the
deadline is reached or one negotiator agrees or walks
away. The deadline can be measured in the number of
rounds or real wall-time. The negotiation scenario in-
cludes a negotiation domain and two preference pro-
files. The domain is public information. A preference
profile is unique and private information only known
to its corresponding negotiator.
A domain D defines a set of issues I = {I_1, ..., I_i, ..., I_{n_issues}} with possible values
V_{I_i} = {v^{I_i}_1, ..., v^{I_i}_j, ..., v^{I_i}_{k_i}}, where n_issues is the number of issues and k_i is
the number of values for issue I_i. A set of values, one for each issue, is referred to as an outcome ω;
Ω denotes the set of all possible outcomes.
A preference profile maps each outcome to a real value in [0, 1], usually in the form of a utility function.
This paper adopts the linear additive utility function

    U(ω) = Σ_{i=1}^{n_issues} w_{I_i} e_{I_i}(ω[I_i])

with a reservation value r, where w_{I_i} is the weight of issue I_i (Σ_{i=1}^{n_issues} w_{I_i} = 1) and
e_{I_i}(·) is a function that maps the values of issue I_i to real numbers in [0, 1]. A negotiator obtains
its reservation value if no agreement is reached.
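As an illustration, a linear additive utility of this form can be evaluated as in the following minimal sketch (an illustrative assumption, not code from the paper), representing a preference profile by per-issue weights and value evaluations:

```python
# Minimal sketch (illustrative) of a linear additive utility function
# U(omega) = sum_i w_i * e_i(omega[i]) with a reservation value r.
def linear_additive_utility(outcome, weights, evaluations):
    """outcome: dict issue -> chosen value; weights: dict issue -> w_i (summing to 1);
    evaluations: dict issue -> {value: e_i(value) in [0, 1]}."""
    return sum(weights[i] * evaluations[i][outcome[i]] for i in weights)

# Example: a hypothetical two-issue profile with reservation value 0.3.
weights = {"price": 0.7, "quantity": 0.3}
evaluations = {"price": {"low": 1.0, "high": 0.2}, "quantity": {"10": 0.5, "20": 1.0}}
reservation_value = 0.3
u = linear_additive_utility({"price": "high", "quantity": "20"}, weights, evaluations)
# u = 0.7 * 0.2 + 0.3 * 1.0 = 0.44
```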
RBAN is a sequence of negotiations with the same opponent under AOP. A negotiation in RBAN can be denoted
as a function π(θ, S) of a strategy θ and a scenario S = (Ω, U_{n1}, U_{n2}, r_{n1}, r_{n2}). The strategy
selecting problem for a negotiation scenario S_i in an RBAN negotiation sequence Π_q = <π_1, ..., π_i, ..., π_q> is:

    argmax_{θ_j ∈ Θ} { U^i_our(ω^i_j) | ω^i_j ← π_i(θ_j, S_i), Π_{i−1} }

where Π_{i−1} = <π_1, π_2, ..., π_{i−1}> is the subsequence of Π_q consisting of the negotiations before π_i;
Θ = {θ_1, θ_2, ..., θ_{n_s}} is the strategy space of the agent; and θ is a strategy configuration that
contains a set of numerical or categorical parameters.
4 AGGREGATION CNN
SURROGATE MODEL
Figure 1 shows the structure of the proposed ACSM. An ACSM connects several pre-trained CNN-components
to an input layer in parallel and combines their outputs with an aggregation layer. The output of the
aggregation layer is the estimated agreement utility.
The input layer consists of a discrete outcome-utilities matrix U, the reservation values [r_{n_1}, r_{n_2}],
and a strategy configuration θ. The strategy configuration θ can be real numbers representing real-valued
strategies or one-hot encoded vectors representing categorical strategies. The discrete outcome-utilities
matrix U (m × m) is calculated from the outcome utilities. Each element U_{j,k} (1 ≤ j ≤ m, 1 ≤ k ≤ m) of U
indicates the number of outcomes in the corresponding utility bin and can be calculated using Equation 1.
Figure 2 shows an example of transforming the outcome space to U (5 × 5). Mapping the outcome utility
distribution to a size-fixed matrix that counts the outcomes falling within predefined utility bins makes
the CNN-components applicable to domains of different sizes and reduces the computing cost of convolution.
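A minimal sketch (not the authors' code) of building such an m × m matrix from the outcome utilities, assuming the two utility functions are given as Python callables over the outcome set, could look as follows; the default m = 100 matches the matrix shape used later in the experiments:

```python
# Minimal sketch of the discrete outcome-utilities matrix U of Equation 1.
import numpy as np

def outcome_utility_matrix(outcomes, u_own, u_opp, m=100):
    """Count outcomes falling into an m x m grid of utility bins."""
    own = np.array([u_own(w) for w in outcomes])   # utilities for negotiator 1
    opp = np.array([u_opp(w) for w in outcomes])   # utilities for negotiator 2
    edges = np.linspace(0.0, 1.0, m + 1)           # bin edges 0, 1/m, ..., 1
    U, _, _ = np.histogram2d(own, opp, bins=[edges, edges])
    return U  # U[j, k] = number of outcomes in utility bin (j, k)
```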
A CNN-component, denoted C_{A_i}(·) in Equation 2, is trained on negotiation history data against an opponent
A_i before being integrated into ACSM. Each CNN-component is trained with a unique opponent agent. The
output of a CNN-component C_{A_i}(·) is the estimated agreement utility of applying the input strategy to
the input negotiation scenario against the opponent agent A_i. In the training phase, the labels are the
actual agreement utilities obtained by applying negotiation strategies to scenarios against the opponent
agent A_i.

    û^{A_i}_{θ_j} = C_{A_i}(U, r_own, r_{A_i}, θ_j),  j = 1, ..., n_s    (2)
The aggregation layer is a sigmoid-activated neuron, ensuring that the output is scaled to [0, 1]. Its output
is positively correlated with the weighted sum of the outputs of the aggregated CNN-components (Equation 3).
Its weights are the online-trainable parameters, which make a trade-off between the aggregated
CNN-components. ACSM is expected to fit different opponent agents by adjusting these weights.

    û_θ = ACSM(T) = sigmoid( Σ_{i=1}^{n} w_{A_i} C_{A_i}(T) ) ∝ Σ_{i=1}^{n} w_{A_i} C_{A_i}(T)    (3)
where T = (U, r_{n_1}, r_{n_2}, θ_l); C_{A_i}(·) denotes a CNN-component trained with an opponent agent A_i;
and n is the number of aggregated CNN-components.

    U_{j,k} = | { ω_i ∈ Ω : b^{lower}_{n_1,j} ≤ U_{n_1}(ω_i) ≤ b^{upper}_{n_1,j} and b^{lower}_{n_2,k} ≤ U_{n_2}(ω_i) ≤ b^{upper}_{n_2,k} } |
    b^{lower}_{n_1,j} = (j − 1) × 1/m,  b^{upper}_{n_1,j} = j × 1/m,  b^{lower}_{n_2,k} = (k − 1) × 1/m,  b^{upper}_{n_2,k} = k × 1/m
    (j = 1, 2, ..., m; k = 1, 2, ..., m)    (1)

where U_{j,k} (1 ≤ j ≤ m, 1 ≤ k ≤ m) is the element of U giving the number of outcomes in the corresponding
utility bin.

Figure 1: Structure of the proposed ACSM. A CNN-component with A_i means it is trained with the negotiation
data against the opponent A_i. [Figure: the inputs (a strategy θ, the discrete outcome-utilities matrix U,
and the reservation values [r_{n_1}, r_{n_2}]) feed n parallel CNN-components; an aggregation neuron with
weights w_1, ..., w_n (initialized to 1/n) combines their outputs into the estimated agreement utility û.]
5 STRATEGY SELECTING
MECHANISM OF
FINE-TUNING ACSM
Figure 3 shows the strategy selecting mechanism of ACSM with fine-tuning (F-ACSM) for RBAN. There are three
primary procedures in the mechanism: initializing ACSM, selecting a strategy using ACSM with the Monte Carlo
method, and fine-tuning the surrogate model after each negotiation.
Initialize ACSM. Selecting the strategy that performs best on average is rational before any information
about the opponent is available; hence, the weights of the aggregation layer are initialized to 1/n (n is
the number of CNN-components). The output of the initial ACSM is positively correlated with the mean value
of all component outputs (see Equation 4).

    û_θ = ACSM_0(C_{A_1}(T), C_{A_2}(T), ..., C_{A_n}(T)) = sigmoid( (1/n) Σ_{i=1}^{n} C_{A_i}(T) ) ∝ (1/n) Σ_{i=1}^{n} C_{A_i}(T)    (4)
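To make Equations 3 and 4 concrete, the following is a minimal sketch (an illustrative assumption, not the authors' implementation) of the ACSM forward pass, treating the pre-trained CNN-components as frozen black-box callables and the aggregation weights as the only online-trainable parameters:

```python
# Hypothetical sketch of the ACSM aggregation (Equations 3 and 4).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def acsm_predict(components, weights, U, r_own, r_opp, theta):
    """Estimated agreement utility of strategy theta on scenario (U, r_own, r_opp)."""
    outputs = np.array([c(U, r_own, r_opp, theta) for c in components])
    return sigmoid(np.dot(weights, outputs))

def initial_weights(n_components):
    """Equation 4: equal weights 1/n before any information about the opponent."""
    return np.full(n_components, 1.0 / n_components)
```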
Select a strategy with ACSM. The mechanism predicts the performance of all possible strategies with ACSM and
selects the one with the best prediction. To predict the performance of strategies for a given negotiation
scenario, ACSM needs the reservation values and U as inputs, whereas the opponent's reservation value and
utility function are private information. A feasible way of handling this missing information is to sample
the unknown part and select the strategy that performs best on the samples, i.e., a Monte Carlo method
(Figure 4). In this multi-issue linear additive utility case, a sample assigns a weight w_{I_i} and generates
a mapping function e_{I_i}(v^{I_i}_j), v^{I_i}_j ∈ V_{I_i}, for each issue I_i ∈ I under the restrictions in
Table 1. The e_{I_i}(·) maps a random number to each possible issue value v^{I_i}_j ∈ V_{I_i}. The
first-proposal restriction assumes that an opponent proposes the bid that maximizes its own utility in the
first step (Baarslag et al., 2012; Baarslag et al., 2013; Ya'akov Gal and Ilany, 2015). After sampling, each
strategy θ_k ∈ Θ is evaluated using ACSM on all samples. The average output over the samples is taken as the
predicted performance of the strategy.
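A hypothetical sketch of this Monte Carlo selection, with sampling restricted as in Table 1, is given below; the helper names are illustrative, and outcome_utility_matrix refers to the earlier sketch of Equation 1:

```python
# Hypothetical sketch: sample opponent profiles under the Table 1 restrictions
# and pick the strategy with the best average ACSM prediction.
import random
import numpy as np

def sample_opponent_profile(issues, first_bid):
    """issues: dict issue -> list of values; first_bid: dict issue -> value of the first bid."""
    weights = np.random.dirichlet(np.ones(len(issues)))        # weights in (0, 1) summing to 1
    w = dict(zip(issues, weights))
    e = {}
    for issue, values in issues.items():
        e[issue] = {v: random.random() for v in values}        # value mapping in [0, 1]
        e[issue][first_bid[issue]] = 1.0                        # first-proposal restriction
    r_opp = random.random()                                     # reservation value in [0, 1]
    u_opp = lambda outcome, w=w, e=e: sum(w[i] * e[i][outcome[i]] for i in w)
    return u_opp, r_opp

def select_strategy(strategies, acsm, issues, outcomes, u_own, r_own, first_bid, n_samples=20):
    samples = [sample_opponent_profile(issues, first_bid) for _ in range(n_samples)]
    best, best_score = None, -np.inf
    for theta in strategies:
        scores = []
        for u_opp, r_opp in samples:
            U = outcome_utility_matrix(outcomes, u_own, u_opp)  # from the earlier sketch
            scores.append(acsm(U, r_own, r_opp, theta))
        score = float(np.mean(scores))
        if score > best_score:
            best, best_score = theta, score
    return best
```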
Fine-tune ACSM. The proposed fine-tuning approach adjusts the weights of the aggregation layer of ACSM after
each negotiation (Algorithm 1).
Figure 2: Example of transforming the original outcome utilities to U (5 × 5), where each index on the axes
corresponds to a utility range of a negotiator. The integers in the matrix indicate the number of outcomes
in the corresponding utility bin.
Figure 3: Procedures of the strategy selecting mechanism of the F-ACSM model in an RBAN session with n
negotiation scenarios. [Figure: ACSM_0 is initialized; before each negotiation a strategy is selected by
evaluating the candidates with the Monte Carlo method; after each negotiation the result is used to
fine-tune the model, yielding ACSM_1, ..., ACSM_{n−1}.]
First, the Hardheaded opponent model (HOM) (Van Krimpen et al., 2013) estimates the opponent utility function
Û^k_opp from the opponent's bidding history BH_k. Then, Û^k is calculated from Û^k_opp and U^k_own as in
Equation 1. The estimated opponent reservation value r̂^k_opp is the minimum estimated utility in the
opponent's bidding history BH_k according to Û^k_opp. Finally, a back-propagation optimizer fine-tunes the
aggregation layer of ACSM using Û^k and r̂^k_opp as inputs and the actual utility obtained in π_k as the
expected output. Suppose the diversity of the CNN-components of an ACSM is sufficient; in that case, the
behavior pattern of the facing opponent agent should be similar to that of one of the training agents or a
combination of them. Therefore, adjusting the aggregation-layer weights can adapt ACSM even to an unknown
opponent agent.
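As a concrete illustration, the following is a hypothetical numpy sketch of one fine-tuning step (the update summarized in Algorithm 1), assuming a squared-error loss between the ACSM prediction and the utility actually obtained; the opponent model and the CNN-components are taken as given callables, and outcome_utility_matrix refers to the earlier sketch of Equation 1:

```python
# Hypothetical sketch of one fine-tuning step: only the aggregation weights are
# updated, by a gradient step on (prediction - obtained utility)^2.
import numpy as np

def fine_tune_step(weights, components, bid_history, outcomes, u_own, r_own,
                   theta_k, obtained_utility, hom_estimate, alpha=0.01):
    u_opp_hat = hom_estimate(bid_history)                       # estimated opponent utility function
    r_opp_hat = min(u_opp_hat(b) for b in bid_history)          # estimated opponent reservation value
    U_hat = outcome_utility_matrix(outcomes, u_own, u_opp_hat)  # Equation 1 with the estimate
    c = np.array([comp(U_hat, r_own, r_opp_hat, theta_k) for comp in components])
    u_pred = 1.0 / (1.0 + np.exp(-np.dot(weights, c)))          # Equation 3
    # Gradient of (u_pred - obtained_utility)^2 w.r.t. the aggregation weights.
    grad = 2.0 * (u_pred - obtained_utility) * u_pred * (1.0 - u_pred) * c
    return weights - alpha * grad
```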
6 EXPERIMENTS
Two experiments of selecting a strategy for a time-dependent agent were performed to evaluate ACSM and
F-ACSM, respectively. The first, on single negotiations, compared ACSM with an expert-feature-based neural
network surrogate model (NNSM) that we implemented using the same features as in (Ilany and Gal, 2016),
showing the capability of the CNN-feature-based surrogate model. The second, on RBAN, compared ACSM alone
with F-ACSM, showing the effect of the fine-tuning approach. Both used diverse negotiation scenarios and
opponent agents. All experiments were performed on the Python negotiation platform NegMAS (Mohammad et al.,
2020).
6.1 Experimental Setup
We evaluated the proposed methods by selecting the concession parameter of a time-dependent agent. This
time-dependent strategy is adopted by many advanced agents (Ya'akov Gal and Ilany, 2015) and can noticeably
affect negotiation results. The time-dependent strategy (Faratin et al., 1998) follows the function
U_t = 1 − (t/T)^e, where T is the maximum negotiation time and e controls the concession pattern. A lower
value of e means that the concession is faster at the start and slower at the end, and vice versa. Usually,
e is set in [0.1, 5.0]. We limit the strategy space to e ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
1.0, 1.1, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 4.0, 5.0}.
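A small sketch of this concession target and of the candidate parameters used in the experiments:

```python
# Time-dependent concession target: U_t = 1 - (t / T) ** e.
# A lower e concedes faster at the start; a higher e stays stubborn until the end.
def target_utility(t, T, e):
    """Target utility of the time-dependent strategy at time t (0 <= t <= T)."""
    return 1.0 - (t / T) ** e

# The concession parameters forming the strategy space in the experiments.
E_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
            1.1, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 4.0, 5.0]
```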
The scenarios for evaluation come from 12 domains of ANAC 2013 (Table 2). These scenarios are generated
uniformly at random, covering multiple situations.
Table 1: Restrictions when sampling the opponent preference profile.
  Each weight range:       0 < w_{I_i} < 1
  Total weights:           Σ_{I_i ∈ I} w_{I_i} = 1
  Value mapping function:  0 ≤ e_{I_i}(v^{I_i}_j) ≤ 1
  First proposal:          e_{I_i}(v^{I_i}_j) = 1 if v^{I_i}_j ∈ ω_first, for v^{I_i}_j ∈ V_{I_i}, I_i ∈ I
  Reservation value:       r_opp ∈ [0.0, 1.0]
Figure 4: Evaluation of θ_k ∈ Θ in a negotiation scenario; ω_first denotes the first bid from the opponent.
[Figure: the known information (own reservation value r_own, own utility function U_own, and the outcome set
Ω) is combined with sampled opponent profiles (Û^i_opp, r̂^i_opp), i = 1, ..., n, restricted by ω_first; θ_k
is evaluated on each sample and the outputs are averaged.]
The average conflict level is 0.501 with a standard deviation of 0.132. The conflict levels of 94.4% of the
scenarios lie within [0.237, 0.765].
The opponent agents for evaluation are ten different agents (Table 3), including four ANAC champions
(Baarslag et al., 2012; Fujita et al., 2013; Mori and Ito, 2017; Aydoğan et al., 2020) and six basic agents
(Faratin et al., 1998).
Comparing the strategy-selecting mechanism applying the initial ACSM with one applying the expert-feature-based
NNSM illustrates the effect of the proposed CNN-based feature-extracting approach. The performance metrics
are the agreement utility, social welfare, and agreement ratio; each value is the average over ten
repetitions of a negotiation setting. The expert features, taken from (Ilany and Gal, 2016), are presented in
Table 4. We tested ACSM with three different numbers of Monte Carlo samples, 10, 20, and 30; the differences
between 20 and 30 were minimal, so we ultimately used 20 Monte Carlo samples. We set the shape of U to
100 × 100, as this size gave a good balance between training time and performance compared to 10 × 10 and
1000 × 1000, which were all tested.
Performing the strategy selecting mechanisms with the fine-tuned ACSM and with the initial ACSM on RBAN
sessions shows the performance of ACSM with and without fine-tuning. One RBAN session includes 50 scenarios
randomly sampled from the 600 scenarios. 60 RBAN sessions were sampled for testing to avoid the randomness
of a single RBAN session. The experiments were performed with the ten different opponent agents to
demonstrate the efficiency of fine-tuning against different opponents. The average agreement utility over
20 repetitions of a session was used as the performance metric. We set the learning rate for fine-tuning to
0.01, as this value performed best among 0.001, 0.01, and 0.1, which were all tested.
6.2 Training
Six CNN-components are integrated into the ACSM, each trained with a unique opponent agent to predict the
agreement utility of a scenario against that agent. The output of the initial ACSM equals the average value
of the CNN-components. The baseline method (NNSM) is trained with the scenario's average agreement utility
over the same opponent agents used for training the CNN-components.
The architectures of the CNN-component (Figure 5a) and NNSM (Figure 5b) were selected from ten different
intuitively designed architectures. Interestingly, we found that a down-sampling layer of Conv2D (stride = 2)
outperforms a pooling layer in learning scenario features.
The 3000 training scenarios, sampled uniformly at random from the 12 domains, are guaranteed to differ from
those used for evaluation. The evaluation used both the basic agents and the ANAC champions, while the
training used only the six basic agents (Table 3).
Algorithm 1: Fine-tuning after a negotiation π_k.
Ω_k is the outcome space, θ_k is the used strategy, ω_k is the agreement outcome (if there is no agreement,
U^k_own(ω_k) = 0), BH_k is the opponent's bidding history, α is the learning rate, U^k_own is the own utility
function, r^k_own is the own reservation value, and HOM is the Hardheaded opponent model.

Require: Ω_k, θ_k, ω_k, BH_k, α, U^k_own, r^k_own, ACSM_{k−1}, HOM
Ensure: ACSM_k
  Û^k_opp ← HOM(BH_k)
  Û^k ← Equation 1 (Û^k_opp, U^k_own)
  r̂^k_opp ← min{ Û^k_opp(ω_i) | ω_i ∈ BH_k }
  ACSM_k ← ACSM_{k−1} − α ∇( U^k_own(ω_k), ACSM_{k−1}(Û^k, r^k_own, r̂^k_opp, θ_k) )
Table 2: Domain information of the experiments.
  Origin  Domain            Size     Origin  Domain         Size
  D1      Lunch             3840     D2      Kitchen        15625
  D3      House Keeping      384     D4      Fifty Fifty       11
  D5      Defensive Charm     36     D6      Planes            27
  D7      Outfit             128     D8      Wholesaler     56700
  D9      Dog Choosing       270     D10     Animal          1152
  D11     Nice or Die          3     D12     Smart Phone    12000
Thus, the methods were evaluated on negotiations distributed both similarly to and differently from the
training set.
Training the CNN-components used the opponent's private information as inputs to aid convergence. The
training target of NNSM was the average agreement utility of the six training opponents on a scenario. The
batch size was set to 200. All training processes stopped within 100 steps, although the maximum number was
set to 3000. Each training run was repeated five times with early stopping, and the model with the best
validation loss was used for testing. The validation losses of the CNN-components for the time-dependent
agents are around 0.06, and those for the tit-for-tat agents are around 0.13. The validation loss of NNSM is
around 0.11.
6.3 Results
The experimental results of the strategy selecting mechanisms applying the initial ACSM and NNSM are
discussed in Section 6.3.1, comparing CNN-extracted features with expert features. For brevity, a
strategy-selecting mechanism is referred to by the surrogate model it applies, i.e., ACSM or NNSM.
Section 6.3.2 presents the differences between applying ACSM with and without fine-tuning against different
opponent agents.
6.3.1 ACSM and NNSM
The results are presented from two perspectives. One is the performance against each opponent over all
scenarios, showing the influence of the opponent agent; the other is the performance in each domain against
all opponents, showing the influence of scenario size.
Table 5 shows the results against each opponent. Out of the ten opponent agents, ACSM performed no worse
than NNSM against seven in terms of agreement utility, against eight in terms of social welfare, and against
nine in terms of agreement rate. The differences between ACSM and NNSM in agreement utility were marginal.
One possible reason is that the experiments selected the strategy parameter of a time-dependent agent, for
which stubborn strategies obtain higher agreement utilities in most cases; hence, both surrogate models
learned to select the most stubborn strategy (i.e., e = 5) for most scenarios, making the differences
marginal. Notably, ACSM outperformed NNSM noticeably in social welfare and agreement rate, although it was
not trained for them. We found that ACSM flexes to less stubborn but more reasonable strategies when the
scenario is not promising, probably by considering more scenario details; this contributes to higher
agreement ratios under similar agreement utilities and thus promotes higher social welfare. Meanwhile, NNSM
loses more details when selecting the strategy, resulting in a lower agreement ratio and lower social
welfare.
According to all three performance metrics, the only one of the ten opponent agents for which NNSM dominated
ACSM was the time-dependent agent (e=1). Its agreement utility in a scenario was close to the average
agreement utility of all the training agents.
Table 3: Opponent agents used in this experiment.
  Testing only:          AgentK (ANAC 2010), Hardheaded (ANAC 2011), Atlas3 (ANAC 2015), AgentGG (ANAC 2019)
  Training and testing:  Time-dependent (e = 0.1), Time-dependent (e = 1.0), Time-dependent (e = 5.0),
                         Tit-For-Tat (δ = 1), Tit-For-Tat (δ = 2), Tit-For-Tat (δ = 3)
Table 4: Features used in the feature-based neural network surrogate model. U denotes own utility function
and Ω the outcome space.
  Domain features:
    Number of issues: |I|
    Average number of values: (1/|I|) Σ_{I ∈ I} |V_I|
    Number of outcomes: |Ω|
  Preference features:
    Standard deviation of weights: sqrt( (1/|I|) Σ_{I ∈ I} (w_I − 1/|I|)^2 )
    Average utility of Ω: Ū = (1/|Ω|) Σ_{ω ∈ Ω} U(ω)
    Standard deviation of utility of Ω: sqrt( (1/|Ω|) Σ_{ω ∈ Ω} (U(ω) − Ū)^2 )
  Reservation value features:
    Reservation value: r
    Percentage of outcomes above r: |Ω_r| / |Ω|, where Ω_r = {ω ∈ Ω : U(ω) > r}
    Average utility of Ω_r: Ū_r = (1/|Ω_r|) Σ_{ω ∈ Ω_r} U(ω)
    Standard deviation of utility of Ω_r: sqrt( (1/|Ω_r|) Σ_{ω ∈ Ω_r} (U(ω) − Ū_r)^2 )
  Opponent feature:
    Utility of the first bid from the opponent: U(ω_first)
As a result, NNSM effectively behaves like a predictor of the agreement utility of the time-dependent agent
(e=1), which could be one reason that NNSM dominated for this agent. In contrast, when selecting a strategy
with ACSM, a strategy is evaluated by all CNN-components; thus, the CNN-component that evaluates strategies
more radically, i.e., leaves larger gaps between the evaluation values it assigns to strategies, has a larger
impact on the final selection. Consequently, a strategy is not preferred if even one CNN-component assigns it
an evaluation value noticeably lower than the other strategies.
Table 6 classifies the results by domain, showing the performance for different numbers of outcomes. Out of
the 12 domains, ACSM performed no worse than NNSM in nine in terms of agreement utility, in ten in terms of
social welfare, and in ten in terms of agreement rate. Notably, ACSM performed better in the cases where the
number of outcomes was greater than 100, i.e., D1-D3, D7-D10, and D12, demonstrating that ACSM can understand
information-rich negotiation settings better than NNSM. In contrast, in domains D4-D6 and D11 with fewer than
50 outcomes, ACSM performed the same as or slightly worse than NNSM. One reason could be that the
outcome-utilities matrices of those domains are too sparse for the CNN to extract useful information.
To summarize, the hypotheses that ACSM outperforms NNSM by 0.027% in agreement utility, by 0.253% in social
welfare, and by 0.335% in agreement ratio are supported at α = 0.1 according to the Mann-Whitney U test
(p = 0.095). These results indicate that a strategy selection mechanism using ACSM is able to select more
advantageous strategies than one using NNSM by making more accurate evaluations.
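For reference, this kind of one-sided Mann-Whitney U test can be run with SciPy; the following is a minimal sketch (not the authors' analysis script), assuming the per-setting metric values of the two mechanisms are collected in two sequences:

```python
# Minimal sketch of the one-sided Mann-Whitney U test comparing ACSM and NNSM.
from scipy.stats import mannwhitneyu

def acsm_better(acsm_scores, nnsm_scores, alpha=0.1):
    """Return (significant, p_value): True if ACSM's scores are greater at level alpha."""
    stat, p_value = mannwhitneyu(acsm_scores, nnsm_scores, alternative="greater")
    return p_value < alpha, p_value
```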
6.3.2 F-ACSM and ACSM
This section presents the experimental results of performing the strategy selecting mechanisms using ACSM
with fine-tuning (F-ACSM) and without fine-tuning (ACSM) on RBAN sessions.
Figure 5: Schematic diagram of the CNN-component (a) and NNSM (b). [Figure: (a) the CNN-component applies a
stack of Conv2D layers (5×5 and 2×2 kernels, several with stride 2 for down-sampling) to the
outcome-utilities matrix, flattens the result, concatenates it with a 1×3 input (the reservation values and
the strategy), and passes it through dense layers of sizes 50, 25, and 10 to a single output; (b) NNSM is a
fully connected network with an 11-dimensional feature input, hidden layers of sizes 22, 44, 88, 176, 352,
176, 88, 44, 22, and 11, and a single output.]
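For readers who want a starting point, the following is a rough PyTorch sketch consistent with Figure 5(a); the kernel sizes, strides, and dense-layer widths follow the figure, while the channel counts, activations, padding, and output squashing are assumptions not stated in the paper:

```python
# Rough sketch of a CNN-component in the spirit of Figure 5(a); not the authors' model.
import torch
import torch.nn as nn

class CNNComponent(nn.Module):
    def __init__(self, m=100, channels=8, strategy_dim=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, 5), nn.ReLU(),
            nn.Conv2d(channels, channels, 2, stride=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 2, stride=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 2, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened feature size for an m x m input
            conv_out = self.conv(torch.zeros(1, 1, m, m)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(conv_out + 2 + strategy_dim, 50), nn.ReLU(),
            nn.Linear(50, 25), nn.ReLU(),
            nn.Linear(25, 10), nn.ReLU(),
            nn.Linear(10, 1), nn.Sigmoid(),  # agreement utility in [0, 1] (assumption)
        )

    def forward(self, U, reservations, theta):
        # U: (batch, 1, m, m); reservations: (batch, 2); theta: (batch, strategy_dim)
        features = self.conv(U)
        x = torch.cat([features, reservations, theta], dim=1)
        return self.head(x)
```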
Table 5: Results of ACSM and NNSM regarding each opponent. Hard denotes the HardHeaded agent, T4T the
tit-for-tat agent, and Time the time-dependent agent; the number after each denotes the strategy parameter.
AU: average agreement utility; SW: social welfare; AR: agreement rate; AC: ACSM; NN: NNSM.

         Hard   AgentK AgentGG Atlas3  T4T1   T4T2   T4T3   Time0.1 Time1  Time5
AU  AC   0.684  0.754  0.746   0.844   0.799  0.808  0.810  0.896   0.831  0.799
    NN   0.684  0.754  0.737   0.845   0.798  0.809  0.810  0.894   0.834  0.794
SW  AC   1.558  1.607  1.546   1.646   1.527  1.518  1.508  1.473   1.492  1.610
    NN   1.554  1.602  1.526   1.646   1.525  1.516  1.503  1.468   1.494  1.606
AR  AC   0.709  0.796  0.753   0.915   0.773  0.778  0.767  0.828   0.750  0.863
    NN   0.706  0.792  0.738   0.910   0.772  0.775  0.760  0.827   0.757  0.852
The results regarding each opponent agent are presented, showing the performance of fine-tuning against
different opponents. Only agreement utility is used as the performance metric, since fine-tuning targets
achieving a higher agreement utility.
Table 7 presents the results of the ACSM and F-ACSM methods against the various opponent agents. According
to the Wilcoxon signed-ranks test with a confidence level of α = 0.05, the results bolded in each column are
statistically significantly greater than the other. Our analysis shows that F-ACSM consistently outperforms
ACSM across a range of opponent agents, both in the training and the testing cases. This indicates that the
fine-tuning method effectively allows ACSM to adapt to the opponent agent faced in the current session, even
if that agent was not used in training the CNN-components. In most cases, the differences between F-ACSM and
ACSM are notable, with only two exceptions: the Hardheaded and Tit-for-Tat (δ = 3) opponent agents. One
potential reason is that the Hardheaded agent is particularly stubborn, making it difficult to improve the
agreement utility through adjustments to the time-dependent strategy. Another possible reason is that the
Tit-for-Tat (δ = 3) agent is complex and highly dependent on the scenario, which may make it difficult to
learn through fine-tuning; indeed, its CNN-component has a relatively high validation loss (0.17 compared to
the average of 0.10).
Table 6: Results of ACSM and NNSM in different domains. D1-D12 refer to the domains in Table 2, respectively.

         D1     D2     D3     D4     D5     D6     D7     D8     D9     D10    D11    D12
AU  AC   0.765  0.847  0.806  0.805  0.784  0.746  0.773  0.832  0.817  0.798  0.781  0.812
    NN   0.763  0.844  0.800  0.805  0.792  0.748  0.769  0.830  0.816  0.797  0.782  0.808
SW  AC   1.480  1.621  1.570  1.545  1.564  1.478  1.556  1.582  1.564  1.529  1.534  1.560
    NN   1.478  1.612  1.561  1.543  1.562  1.482  1.539  1.576  1.558  1.525  1.534  1.561
AR  AC   0.739  0.822  0.808  0.842  0.815  0.706  0.732  0.830  0.817  0.792  0.836  0.782
    NN   0.743  0.812  0.800  0.838  0.809  0.712  0.722  0.824  0.813  0.788  0.836  0.770
Table 7: Average agreement utility of F-ACSM and ACSM with each opponent in the 60 RBAN sessions, each
including 50 scenarios. F-AC and AC denote F-ACSM and ACSM, respectively.

       Hard    AgentK  AgentGG Atlas3  T4T1    T4T2    T4T3    Time0.1 Time1   Time5
F-AC   0.6887  0.7564  0.7464  0.8472  0.8058  0.8114  0.8108  0.8953  0.8366  0.8018
AC     0.6885  0.7558  0.7455  0.8441  0.8048  0.8108  0.8107  0.8941  0.8353  0.8007
Figure 6 shows how the agreement utility changes as the number of scenarios increases; F-ACSM performed
better in most cases and on average. In particular, F-ACSM learned the testing-only ANAC agents no slower
than the basic agents. We noticed that some curves decline, especially after 20 scenarios. One reason could
be that, at first, the surrogate model is tuned successfully more easily when it is far from the optimum,
whereas after tuning, misleading updates cause more deterioration when it is near the optimum. Another could
be that using the estimated opponent preference profile as input for fine-tuning can sometimes mislead the
tuning, which also explains why F-ACSM deteriorated at first and recovered rapidly against the Hardheaded
agent.
Figure 6: Relative performance between F-ACSM and ACSM with different opponents as the number of scenarios
increases. Each value is the average over every ten negotiation scenarios across the 60 sessions. The bold
line presents the average value over all opponents. T4T denotes the tit-for-tat agent; Time denotes the
time-dependent agent; the number after each denotes the strategy parameter.
To summarize, F-ACSM achieved an average improvement of 0.13% over ACSM when tested against the ten opponent
agents. This result is statistically significant at the α = 0.05 level according to a Mann-Whitney U test.
F-ACSM is expected to show only slight improvements over ACSM when it merely selects a concession-speed
parameter for a basic time-dependent agent. Additionally, the change in relative performance as the number
of scenarios increases shows that fine-tuning can adapt ACSM to the opponent gradually in RBAN, although the
process may be a tortuous ascent.
7 CONCLUSION AND FUTURE
WORK
This paper presented ACSM and a fine-tuning approach for a strategy selecting mechanism applied to RBAN.
ACSM is characterized by using a CNN to extract negotiation scenario features automatically and by
aggregating different CNNs to ensure the diversity of the extracted features. The fine-tuning approach
adjusts the weights over the CNNs of ACSM after each negotiation to adapt ACSM to the facing opponent. In
the experimental results of single negotiations, where a parameter was selected for a time-dependent agent,
ACSM achieved higher agreement utility, social welfare, and agreement ratio than NNSM, indicating that the
CNN-feature-based surrogate model is more promising than the existing expert-feature-based surrogate model.
In the experimental results of RBAN, F-ACSM achieved higher agreement utility than ACSM, showing that the
fine-tuning method is beneficial for adapting ACSM to the opponent.
In future studies, we will consider overcoming the negative effect of over-fitting (i.e., the deterioration
caused by fine-tuning, especially late in an RBAN session).
Designing early-stopping or dynamic learning-rate rules for fine-tuning would be beneficial; however, the
scenario uncertainties make it difficult to estimate the opponent model's current accuracy and to weigh its
performance after tuning.
ACKNOWLEDGEMENT
This study was supported by JSPS KAKENHI Grant
Numbers 22H03641, 19H04216 and JST FOREST
(Fusion Oriented REsearch for disruptive Science and
Technology) Grant Number JPMJFR216S.
REFERENCES
Aydogan, R., Baarslag, T., Fujita, K., and Jonker, C.
(2022). 13th Automated Negotiating Agents Com-
petition (ANAC2022). https://web.tuat.ac.jp/~katfuji/ANAC2022/genius.html.
Aydoğan, R., Baarslag, T., Fujita, K., Mell, J., Gratch, J.,
de Jonge, D., Mohammad, Y., Nakadai, S., Morinaga,
de Jonge, D., Mohammad, Y., Nakadai, S., Morinaga,
S., Osawa, H., Aranha, C., and Jonker, C. M. (2020).
Challenges and main results of the automated nego-
tiating agents competition (ANAC) 2019. In Bassil-
iades, N., Chalkiadakis, G., and de Jonge, D., edi-
tors, Multi-Agent Systems and Agreement Technolo-
gies, pages 366–381, Cham. Springer International
Publishing.
Baarslag, T., Fujita, K., Gerding, E. H., Hindriks, K., Ito,
T., Jennings, N. R., Jonker, C., Kraus, S., Lin, R.,
Robu, V., and Williams, C. R. (2013). Evaluating
practical negotiating agents: Results and analysis of
the 2011 international competition. Artificial Intelli-
gence, 198:73–103.
Baarslag, T., Hindriks, K., Jonker, C., Kraus, S., and Lin, R.
(2012). The first automated negotiating agents com-
petition (ANAC2010). In Ito, T., Zhang, M., Robu,
V., Fatima, S., and Matsuo, T., editors, New Trends in
Agent-Based Complex Automated Negotiations, pages
113–135. Springer Berlin Heidelberg, Berlin, Heidel-
berg.
Chkoniya, V. and Mateus, A. (2019). Digital category man-
agement: How technology can enable the supplier-
retailer relationship. In Smart Marketing With the In-
ternet of Things, pages 139–163. IGI Global.
Faratin, P., Sierra, C., and Jennings, N. R. (1998). Ne-
gotiation decision functions for autonomous agents.
Robotics and Autonomous Systems, 24(3):159–182.
Fujita, K. (2014). Automated strategy adaptation for multi-
times bilateral closed negotiations. In Proceedings
of the 2014 International Conference on Autonomous
Agents and Multi-Agent Systems, AAMAS’14, page
1509–1510, Richland, SC. International Foundation
for Autonomous Agents and Multiagent Systems.
Fujita, K. (2018). Compromising adjustment strategy based
on tki conflict mode for multi-times bilateral closed
negotiations. Computational Intelligence, 34(1):85–
103.
Fujita, K., Ito, T., Baarslag, T., Hindriks, K., Jonker, C.,
Kraus, S., and Lin, R. (2013). The second automated
negotiating agents competition (ANAC2011). In Ito,
T., Zhang, M., Robu, V., and Matsuo, T., editors,
Complex Automated Negotiations: Theories, Models,
and Software Competitions, pages 183–197. Springer
Berlin Heidelberg, Berlin, Heidelberg.
Güneş, T. D., Arditi, E., and Aydoğan, R. (2017). Collective
voice of experts in multilateral negotiation. In An, B.,
Bazzan, A., Leite, J., Villata, S., and van der Torre,
L., editors, PRIMA 2017: Principles and Practice of
Multi-Agent Systems, pages 450–458, Cham. Springer
International Publishing.
Ilany, L. and Gal, Y. (2016). Algorithm selection in bilat-
eral negotiation. Autonomous Agents and Multi-Agent
Systems, 30(4):697–723.
Kawata, R. and Fujita, K. (2020). Meta-strategy based on
multi-armed bandit approach for multi-time negotia-
tion. IEICE Transactions on Information and Systems,
E103.D(12):2540–2548.
Mohammad, Y., Nakadai, S., and Greenwald, A. (2020).
Negmas: A platform for automated negotiations. In
PRIMA 2020: Principles and Practice of Multi-Agent
Systems: 23rd International Conference, Nagoya,
Japan, November 18–20, 2020, Proceedings, page
343–351, Berlin, Heidelberg. Springer-Verlag.
Mori, A. and Ito, T. (2017). Atlas3: A negotiating agent
based on expecting lower limit of concession func-
tion. In Fujita, K., Bai, Q., Ito, T., Zhang, M., Ren,
F., Aydoğan, R., and Hadfi, R., editors, Modern Ap-
proaches to Agent-based Complex Automated Negoti-
ation, pages 169–173. Springer International Publish-
ing, Cham.
Renting, B. M., Hoos, H. H., and Jonker, C. M. (2020).
Automated configuration of negotiation strategies. In
Proceedings of the 19th International Conference on
Autonomous Agents and MultiAgent Systems, AA-
MAS’20, page 1116–1124, Richland, SC. Interna-
tional Foundation for Autonomous Agents and Mul-
tiagent Systems.
Renting, B. M., Hoos, H. H., and Jonker, C. M. (2022).
Automated configuration and usage of strategy port-
folios for mixed-motive bargaining. In Proceedings
of the 21st International Conference on Autonomous
Agents and Multiagent Systems, AAMAS’22, page
1101–1109, Richland, SC. International Foundation
for Autonomous Agents and Multiagent Systems.
Rosenschein, J. S. and Zlotkin, G. (1994). Rules of En-
counter: Designing Conventions for Automated Ne-
gotiation among Computers. MIT Press, Cambridge,
MA, USA.
Sengupta, A., Mohammad, Y., and Nakadai, S. (2021).
An autonomous negotiating agent framework with re-
inforcement learning based strategies and adaptive
strategy switching mechanism. In Proceedings of
the 20th International Conference on Autonomous
Agents and MultiAgent Systems, AAMAS’21, page
1163–1172, Richland, SC. International Foundation
for Autonomous Agents and Multiagent Systems.
Taiji, M. and Ikegami, T. (1999). Dynamics of internal mod-
els in game players. Physica D: Nonlinear Phenom-
ena, 134(2):253–266.
Van Krimpen, T., Looije, D., and Hajizadeh, S. (2013).
Hardheaded. In Ito, T., Zhang, M., Robu, V., and Mat-
suo, T., editors, Complex Automated Negotiations:
Theories, Models, and Software Competitions, pages
223–227. Springer Berlin Heidelberg, Berlin, Heidel-
berg.
Wu, L., Chen, S., Gao, X., Zheng, Y., and Hao, J. (2021).
Detecting and learning against unknown opponents
for automated negotiations. In Pham, D. N., Theer-
amunkong, T., Governatori, G., and Liu, F., editors,
PRICAI 2021: Trends in Artificial Intelligence, pages
17–31, Cham. Springer International Publishing.
Ya’akov Gal, K. and Ilany, L. (2015). The fourth auto-
mated negotiation competition. In Fujita, K., Ito, T.,
Zhang, M., and Robu, V., editors, Next Frontier in
Agent-Based Complex Automated Negotiation, pages
129–136. Springer Japan, Tokyo.