A GAME THEORETIC BIDDING AGENT
FOR THE AD AUCTION GAME
Yevgeniy Vorobeychik
Sandia National Laboratories, Livermore, CA, U.S.A.
Keywords:
Bidding agents, Keyword auctions, Game theory.
Abstract:
TAC/AA (the ad auction game) provides a forum for researchers studying strategic bidding in keyword auctions to try
out their ideas in an independently simulated setting. We describe an agent that successfully competed in the
TAC/AA game, showing in the process how to operationalize game theoretic analysis to develop a very simple,
yet highly competent agent. Specifically, we use simulation-based game theory to approximate equilibria in a
restricted bidding strategy space, assess their robustness in a normative sense, and argue for relative plausibility
of equilibria based on an analogy to a common agent design methodology. Finally, we offer some evidence
for the efficacy of equilibrium predictions based on TAC/AA tournament data.
1 INTRODUCTION
Trading Agent Competition (TAC) is a successful fo-
rum for research into competitive agent design in
an independent, highly complex, simulation environ-
ment. The ad auction game was recently introduced
with a specific focus on several key strategic aspects
of the keyword auction environment, carefully styl-
ized into a TAC/AA simulation. We developed our
agent to compete in TAC/AA, focusing primarily on
a simulation-based game theoretic approach to en-
lighten bidding strategy.
There has been much discussion about the norma-
tive and descriptive value of Nash equilibria in actual
strategic settings such as the one faced by a TAC/AA
agent. Historically, the use of game theory has been
relatively rare in agent design, even in the TAC tour-
naments (see (Wellman et al., 2006) for an exception).
One reason that agent designers often eschew game
theoretic techniques is that in general there may be
many equilibria, and the problem of equilibrium se-
lection requires coordination among the agents. Ad-
ditionally, any asymmetric equilibrium requires coor-
dination on roles. Finally, other agents may be im-
perfectly rational in a variety of ways (for example,
buggy). These are valid issues which reveal consid-
erable methodological uncertainty in operationalizing
game theoretic techniques even if we believed them
to be reasonable in a particular setting (i.e., when op-
ponent agents are rational and attempt to maximize
their payoffs). Our main contribution is to offer some
general guidance to agent designers in operationaliz-
ing game theory, which we illustrate in the context of
TAC/AA bidding strategy.
Our bidding strategy analysis restricts the consid-
eration set to discretized linear strategies that bid a fraction of the myopic value per click. We
perform simulation-based game theoretic analysis in
this restricted strategy space to (a) identify equilibria,
(b) suggest equilibrium selection techniques, and (c)
evaluate robustness of various possible strategies. We
find, for example, that a particularly appealing equi-
librium, one reached by iterative best response seeded
with truthful bidding, is also very robust and is actu-
ally a best response to a range of reasonable opponent
strategies.
Finally, we assess the predictive value of equilibrium bidding policies derived using simulations against actual tournament data, finding that predictions pro-
gressively improve over the span of the tournament,
becoming relatively accurate on some measures.
2 THE TAC/AA GAME
The TAC/AA game features eight autonomous soft-
ware agents representing advertisers in a simulated
keyword (ad) auction. The advertisers interact with
the environment by submitting bids and ads to be
shown for a set of keywords over a sequence of 60
simulated days, each lasting 10 seconds. The envi-
ronment itself is comprised of the publisher (search
engine) agent, who collects the bids and ads from the
advertisers and displays the ads on the search page
ordered by a ranking rule, as well as 90000 users
who search for keywords, click on ads, and make pur-
chases (conversions) from the advertisers. This inter-
action scheme between the agents is depicted visually
in Figure 1. Next we describe in some detail the agent
tasks and TAC/AA simulator implementation; for more details, see (Jordan and Wellman, 2009).
Figure 1: Schematic of the TAC/AA game.
2.1 Advertiser Agents
A TAC/AA advertiser agent plays the role of a retailer
of home entertainment products. Each product is
a combination of a manufacturer and a component
(e.g., Lioneer TV). The game features three manufac-
turers and three components, for a total of nine prod-
ucts. While all advertisers are able to sell all products,
every advertiser specializes in a manufacturer and a
component. The manufacturer specialization yields a
1.5 factor increase in profits from sales, while compo-
nent specialization results in a boost (roughly a factor
of 1.5) in conversion rates.
An advertiser may submit a bid and an ad for any
keyword and on any simulation day, to take effect on
the following day. In addition, he may specify a bud-
get constraint that limits spending for each keyword
individually, as well as for an entire day. Only two
ad types are allowed: generic (default) and targeted,
which specifies a product. An advertiser's total payoff
is the sum of his revenues from product sales less all
per-click costs over the span of a simulation.
2.2 Publisher
The publisher has two tasks: ranking advertisers for
each keyword and computing advertiser payments per
click. An advertiser a is endowed at the beginning of a game (simulation) with a baseline click-through-rate (CTR) e_q^a for each keyword q, which is only revealed to the publisher. Given a collection of bids b_q^a, advertisers are ranked by a score b_q^a (e_q^a)^χ, where χ ∈ [0,1] is chosen and revealed to advertisers at the beginning of each game (see (Lahaie and Pennock, 2008) for a discussion of this class of ranking rules). The payments per click are determined according to the generalized second-price (GSP) scheme (Lahaie and Pennock, 2008). Specifically, suppose that advertisers are indexed by their rank (i.e., the advertiser with a = 1 is ranked in the first slot). Then the payment of advertiser a is

p_a = b_q^{a+1} (e_q^{a+1})^χ / (e_q^a)^χ,

that is, the score of the advertiser ranked immediately below, divided by his own click-through-rate raised to the power χ. An exception to this payment rule arises when the reserve price r_a of slot a (the slot in which a is placed) is higher than p_a, in which case the advertiser simply pays r_a. When an advertiser drops out due to saturating a budget constraint, rank and payments per click are recomputed for the remaining ads.
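To make the ranking and pricing rule concrete, the following minimal sketch (our own function and variable names, not the TAC/AA simulator's code) ranks advertisers by the squashed score b_q^a (e_q^a)^χ and computes GSP payments per click, including the reserve-price exception; budget-induced dropouts and re-ranking are ignored here.

```python
def rank_and_price(bids, ctrs, chi, reserve):
    """Rank advertisers by the squashed score b * e**chi and compute GSP payments.

    bids, ctrs: dicts mapping advertiser id -> bid b_q^a and baseline CTR e_q^a
    chi: squashing exponent, chosen in [0, 1] at the start of the game
    reserve: per-slot reserve prices, reserve[0] for the top slot
    Returns (advertiser, payment-per-click) pairs in ranked order.
    """
    score = lambda a: bids[a] * ctrs[a] ** chi
    ranking = sorted(bids, key=score, reverse=True)

    priced = []
    for slot, a in enumerate(ranking):
        if slot + 1 < len(ranking):
            # Score of the advertiser ranked immediately below, divided by
            # this advertiser's click-through-rate raised to the power chi.
            payment = score(ranking[slot + 1]) / ctrs[a] ** chi
        else:
            payment = 0.0
        # Reserve-price exception: pay the slot's reserve price if it is higher.
        slot_reserve = reserve[slot] if slot < len(reserve) else 0.0
        priced.append((a, max(payment, slot_reserve)))
    return priced
```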
2.3 Search Users
Each of 90000 users has a specific product preference
and will only purchase his preferred product. User
preferences are distributed evenly across all products.
A user may submit three kinds of queries (key-
words): F0, F1, and F2. The single F0 query spec-
ifies neither the manufacturer nor the component of
the user’s preferred product. Six F1 queries partially
reveal a user’s preference: three specify only the man-
ufacturer and three only the component of the desired
product. Finally, nine F2 queries completely reveal
the user’s preferred product (specify both the manu-
facturer and the component).
A user’s behavior is determined by his “state”. In-
deed, a user may not even submit search queries, or
may submit queries and click on ads with no intent
to purchase. In the latter case, such “informational”
users select uniformly among the three queries (F0,
F1, or F2) to submit to the publisher. Finally, a “fo-
cused shopper” submits a query depending on his “fo-
cus level” (0, 1, or 2) corresponding to the three key-
word classes above (thus, for example, a user in fo-
cus level 1 submits an F1 query). A user in a focused
state makes a purchase (given a click) with conversion
probability described below. Transitions between user
states are Markovian, but non-stationary, as any user
who actually makes a purchase is effectively “reset”.
Consequently, user state distribution is affected by ad-
vertiser decisions.
After a keyword search, a user proceeds down the
list of ads in a Markovian fashion, clicking on an ad
he currently "views" with probability determined by the baseline CTR e_q^a of that ad, as well as the targeting effect, which is negative if the advertised product does not match the user's preference, positive if it does, and neutral if the ad is generic.
Upon clicking an ad, the probability that a user
subsequently makes a purchase depends on three
factors: user’s state, advertiser’s specialty, and ad-
vertiser’s capacity. Users in an “informational”
state may click on ads, but never make a purchase.
A focused shopper will purchase with probability η(I_d π_q, f_c), where

η(p, x) = p·x / (p·x + (1 − p)),

π_q is a baseline conversion rate that depends on the keyword q, and f_c is a factor that is 1.5 if the advertiser specializes in the component preferred by the user and 1 otherwise. Finally,

I_d = 0.995^{(Σ_{i=d−4}^{d} c_i − C_a)^+},

with C_a the capacity constraint of the advertiser, d the current day, and c_i the advertiser's sales on day i. Note that the value of I_d on day d changes dynamically with each sale on that day (i.e., as c_d changes).
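The purchase model above can be transcribed directly into a short sketch (our own helper names; not simulator code):

```python
def eta(p, x):
    """eta(p, x) = p*x / (p*x + (1 - p))."""
    return p * x / (p * x + (1.0 - p))

def capacity_penalty(window_sales, capacity):
    """I_d = 0.995 ** max(sum of sales over days d-4..d minus C_a, 0)."""
    return 0.995 ** max(sum(window_sales) - capacity, 0)

def purchase_probability(pi_q, component_match, window_sales, capacity):
    """Conversion probability of a focused shopper who has clicked.

    pi_q: baseline conversion rate of keyword q
    component_match: True if the advertiser's component specialty matches the
    user's preferred product (factor f_c = 1.5, otherwise 1.0)
    """
    f_c = 1.5 if component_match else 1.0
    return eta(capacity_penalty(window_sales, capacity) * pi_q, f_c)
```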
3 TOURNAMENT
Fifteen participants registered for the TAC/AA tourna-
ment, which proceeded in three rounds: qualifying
rounds, semifinals, and finals. No agents were elim-
inated in the qualifying rounds, as all were deemed
competent enough to proceed. The eight top scoring
agents from the semifinal round competed in the fi-
nals. The final ranking of the top agents is shown in
Table 1. Our agent, QuakTAC, finished with the fourth highest score, a mere 1.25% below the third-place finisher and 2.38% below the second-place finisher.
4 AGENT DESIGN
The decision environment in which a TAC/AA agent
acts is very complex, with much uncertainty and de-
cision interdependence between keywords and days.
Table 1: Final ranking for the TAC/AA tournament.
Rank Agent Average Score
1 TacTex 79,886
2 AstonTAC 76,281
3 Schlemazl 75,408
4 QuakTAC 74,462
5 munsey 71,777
6 epflagent 71,693
7 MetroClick 70,632
8 UMTac09 66,933
Thus, the process of designing and building an agent
must of necessity involve two aspects: an analysis
based on high-level strategic abstraction, as well as
low-level implementation details. Our design of agent
strategy (high level) had simulation-based game theo-
retic analysis at its core. To understand this analysis,
however, we must first weave together some low-level
details, as well as abstraction steps that were under-
gone before the corresponding game theoretic prob-
lem was appropriately defined.
First, we made a grand simplification in agent de-
sign by focusing almost exclusively on bidding strat-
egy. As such, we always left our budget entirely unconstrained. Furthermore, we fixed the ad selection
policy before any strategic analysis of bidding, hope-
ful that the specific ad choice has relatively low payoff
impact (we revisit this assumption below).
4.1 Ad Selection
We choose a generic ad for the F0 keyword and a targeted ad for all others. For an F1 keyword, we
choose the product in the ad to match the manufac-
turer/component in the keyword, while the missing
product element of the keyword is filled with the
advertiser’s specialty. The ad for the F2 keyword
matches the product in the keyword.
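A minimal sketch of this selection rule (our own data representation: a keyword is modeled as a (manufacturer, component) pair, with None marking an unspecified part):

```python
def choose_ad(keyword, specialty_manufacturer, specialty_component):
    """Ad selection: generic for the F0 query, targeted otherwise, filling any
    part missing from an F1 keyword with the advertiser's own specialty."""
    manufacturer, component = keyword
    if manufacturer is None and component is None:      # F0 query
        return ("generic",)
    return ("targeted",
            manufacturer if manufacturer is not None else specialty_manufacturer,
            component if component is not None else specialty_component)
```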
4.2 Bidding Policy
The problem of developing an effective bidding strat-
egy in keyword auctions has received much attention
in the literature, but there is relatively little practi-
cal evidence of efficacy of any of the proposed tech-
niques. TAC/AA gives us an arguably objective,
highly complex, yet still stylized forum to test bid-
ding strategy development.
Perhaps the most natural approach to bidding in
a complex multiagent setting like TAC/AA is via a
combination of optimization and machine learning.
Indeed, machine learning has enjoyed considerable
success in TAC games historically (see, for example,
(Pardoe and Stone, 2006)). Additionally, (Kitts and
Leblanc, 2004) suggested computing a myopic (one-
shot) profit maximizing bid given learned regression
models of expected position and payment per click.
One problem with learning-based approaches is that
they do not prescribe what should be done in the ab-
sence of any information about the adversaries. Ad-
ditionally, they assume that adversary behavior is sta-
tionary and, thus, past behavior is a good predictor
of future behavior. In fact, learning may take some
time before its prescriptions are effective, and the
opponents will often be learning themselves, creat-
ing complex interactions between the learning algo-
rithms, with policies that are unlikely to be stationary.
We steer away from learning-based approaches
entirely, with our bidding policy determined by a
simulation-based equilibrium estimate. We do so not
to suggest that learning is a lost cause; rather, we fol-
low a precise research agenda: developing an agent
that plays an equilibrium strategy alone allows us to
directly measure the efficacy of a pure game theo-
retic approach. Success of our approach will, thus,
make a good case for equilibrium as initial prediction
and strategic prescription, while further online explo-
ration may or may not lead an agent to play other,
more promising strategies.
In order to apply simulation-based game theoretic
techniques to bidding, we need to first abstract the
complex environment of TAC/AA into a computation-
ally tractable restricted bidding strategy class. To this
end, we make a dramatic simplification in considering
bidding strategies which are linear in an estimate of
an advertiser’s value per click v, i.e., b(v) = αv. The
motivation for such a restriction comes from the lit-
erature on the theory of one-item auctions (Krishna,
2002), which often exhibits equilibria that are lin-
ear in bidder valuations, as well as other game the-
oretic treatments of far simpler models of keyword
auctions (Vorobeychik, 2009). Note that this bidding
function is entirely myopic, as it contains no tempo-
ral dependence (or any other state information about
the game that may be available). On the other hand,
it is very simple to implement and highly intuitive: an
agent is asked to determine what fraction of his value
he wishes to bid. Indeed, particularly due to the simi-
larity of the GSP price mechanism to Vickrey auction,
a very natural strategy would be to bid one’s value,
setting α = 1. As we demonstrate below, this “truth-
ful bidding” turns out to be a very poor strategy in our
context.
While we have now a concrete class of bidding
strategies to focus on, we have yet another question
to answer before we can proceed to the actual analy-
sis stage: as value per click is not directly given, how
do we derive it from the TAC/AA specification and/or
game experience? We devote the next section to this
question.
4.3 Estimating Value per Click
The value per click of an advertiser a for a keyword q is the expected revenue from a click,

v^a = Pr{conversion | click} · E[R_q^a | conversion].

Revenue from a conversion depends entirely on whether the manufacturer in the keyword (user preference) matches the advertiser's specialty. If the manufacturer is specified in the keyword, the revenue is $15 if it matches the specialty and $10 otherwise. If not, the expected revenue is 15 × (1/3) + 10 × (2/3) = 35/3, as there is a 1/3 chance of a specialty match.
To compute the conversion probability, we need to estimate two things: the proportion of focused shoppers and the (expected) value of I_d. We begin with the former, assuming that an estimate of I_d is available. Since the proportion of focused shoppers actually depends on agent policies, we obtain an initial estimate using an arbitrary fixed policy, use the result to estimate bidding equilibria, and then refine the estimate using equilibrium bidding policies (in practice, it turned out that our estimates of focused shopper proportions were not very sensitive to the specifics of a bidding policy in our linear strategy space).
If
we fix agent policies, the proportion of focused shop-
pers on a given day for a keyword q can be com-
puted as the ratio of the empirical fraction of clicks
that result in purchases and the estimate of conver-
sion probability of a focused shopper. We average
such empirical proportions for every simulation day
over 100-130 simulations to obtain a daily estimate
of expected proportion of focused shoppers for each
keyword. We further average the resulting empirical
proportions of focused shoppers over keyword classes
(that is, over 6 F1 keywords in one case and over
9 F2 keywords in another). Thus, we have in the
end empirical proportions of focused shoppers for the
three classes of keywords, shown in Figure 2. Two
features of this plot are worthy of note. First, the
proportions are essentially the same for all keyword
classes. This is not very surprising: there isn’t a
very strong a priori reason to believe that they would
of necessity be different. Second, proportions fol-
low a damped harmonic oscillation pattern. These
oscillations are caused by the nonstationarity in the
state transition process: a higher proportion of fo-
cused shoppers yield a higher conversion probability
and, therefore, more sales, which result in the drop
of conversion probability due to exhausted capacity
3
In practice, it turned out that our estimates of focused
shopper proportions were not very sensitive to the specifics
of a bidding policy in our linear strategy space.
ICAART 2011 - 3rd International Conference on Agents and Artificial Intelligence
38
!"
!#$"
!" %!" &!" '!" (!" )!" *!"
!"#$#"%#&'(#)*+,-'./#$$,"+'
012'
+!"
+%"
+&"
Figure 2: User proportions in the focused shopping state.
and also a drop in the fraction of focused shoppers.
When the conversion probabilities are low, however,
few transactions occur, increasing the proportion of
focused shoppers. Interestingly, this process reaches
a near steady-state at around the midway point of a
game.
Suppose that we are estimating value per click on day d for tomorrow (day d + 1). In order to compute the value of I_{d+1}, we need sales information for the three days that precede day d, as well as total sales for day d. Additionally, the value of I_{d+1} (and, hence, value per click) is not actually fixed but will change with every additional sale on day d + 1. On day d we have exact information about sales on days d − 1, d − 2, and d − 3 based on advertiser sales reports that are provided at the beginning of each simulated day. Furthermore, we can estimate the expected sales on day d as the product of CTR, today's conversion rate, and the total number of impressions. While we know none of these exactly, we can estimate each with reasonable accuracy. First, we crudely estimate CTR as the average observed CTR throughout the game. Today's conversion rate requires estimating I_d, for which we have data from all days except the current one. We obtain a conservative (high) estimate for today's conversion rate by using only those "known" sales in computing I_d (which in the end underestimates the value per click for day d + 1). The total number of impressions for each keyword on each day is estimated by running 100-130 simulations offline and averaging the number of observed impressions, using a fixed agent policy vector, just as in estimating focused shopper proportions. Next, we project total sales on day d + 1 using again a conservative estimate of the conversion rate that would be effective at the beginning of that day. Finally, since value is roughly linear in I_{d+1}, we compute the average I_{d+1} over each incremental sale made on day d + 1.
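Putting the pieces of this section together, the myopic value-per-click estimate might be sketched as follows (the helper names and input estimates are our own; in the agent, the focused-shopper fraction and the conservative conversion probability come from the offline and in-game estimates described above):

```python
def expected_revenue_per_conversion(query_manufacturer, specialty_manufacturer):
    """E[R | conversion]: $15 on a manufacturer-specialty match, $10 otherwise;
    for a query that omits the manufacturer, a match occurs with probability 1/3."""
    if query_manufacturer is None:
        return 15.0 * (1.0 / 3) + 10.0 * (2.0 / 3)      # = 35/3
    return 15.0 if query_manufacturer == specialty_manufacturer else 10.0

def value_and_bid(focused_fraction, conv_prob_focused,
                  query_manufacturer, specialty_manufacturer, alpha):
    """Myopic value per click v = Pr{conversion | click} * E[R | conversion],
    and the corresponding linear bid b = alpha * v.

    focused_fraction: estimated share of clicking users who are focused shoppers
        (informational users click but never convert)
    conv_prob_focused: conservative conversion probability based on the
        projected I_{d+1}
    """
    v = (focused_fraction * conv_prob_focused
         * expected_revenue_per_conversion(query_manufacturer, specialty_manufacturer))
    return v, alpha * v
```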
5 SIMULATION-BASED GAME
THEORETIC ANALYSIS
5.1 Equilibrium and Best Response
Analysis
Having restricted our bidding strategies to be of the
form b(v) = αv, we use simulation-based game theo-
retic analysis to estimate an equilibrium in this strat-
egy space. We note that an equilibrium estimated in
the analysis actually plays a dual role, one predictive,
describing what other agents will do, and one pre-
scriptive, telling us how to optimally respond to that
prediction.
In order to operationalize an equilibrium solution
in the prescriptive context, we make a substantial fur-
ther restriction and focus only on symmetric strategy
profiles, that is, restrict all agents to follow the same
bidding strategy b(v). Hence, we use α to refer both
to a specific bidding strategy and to a symmetric pro-
file of these. There are two key reasons for restrict-
ing attention to symmetric profiles. First, an asym-
metric equilibrium is difficult to operationalize, since
it is not clear (when agents are ex-ante symmetric)
which role our agent should play. Second, even if we
pick a role for our agent, we still must assume that
others coordinate on their respective roles just as we
predict (at the minimum, no other agent may choose
our agent’s role). Furthermore, we do not necessar-
ily lose much by the restriction from the descriptive
standpoint, since the agent ultimately cares about other
players’ choices only in the aggregate, insofar as they
impact CTR and payments, and it seems reasonable
that this is sufficiently captured by a symmetric equi-
librium profile.
Since bids should be strictly positive to ensure
any profit and, myopically, there is no reason to bid
above value per click, we restrict α to the (0,1] in-
terval. Furthermore, to enable a more detailed analy-
sis, we limit our equilibrium search to a discrete grid
{0.1,0.2,...,1} (we also performed analysis “be-
tween” some of these grid points, but found that we
do not lose very much due to our particular choice of
discretization).
One major hurdle in equilibrium-based agent de-
sign is the issue of equilibrium selection. Since in
our case equilibrium would offer both a prediction of
opponent play and a best response to it, the goal, if
we are to choose an equilibrium, is to choose one that
yields the most plausible such prediction.
Figure 3: Best response function (left) and game theoretic regret (right) for all symmetric strategy profiles on the grid.

A common and highly effective technique employed in designing computational agents to compete against others is self-play (for example, Tesauro's
TD-Gammon agent was developed in such a
way (Tesauro, 1995)). While this approach is usually
applied at the level of individual game decisions when
opponents move sequentially, we can detect a rough
correspondence between self-play and a well-known
iterative best response dynamic, where a player com-
putes a maximizing action in each iteration assum-
ing stationary opponents. In our case, iterative best
response would proceed by first selecting a starting
(seed) symmetric profile α
0
, approximating a single-
agent best response strategy
ˆ
α
0
to it, then setting the
symmetric profile in the next iteration to be α
1
=
ˆ
α
0
.
If this process converges and best responses are truly
optimal, it necessarily converges to a Nash equilib-
rium α
. The fact that the process can be viewed as
roughly analogous to self-play suggests that equilib-
ria found in such a manner may have good predictive
properties, at least regarding the most competent of
opponents. However, the dynamic itself is not suffi-
cient: even if we believe other agents to follow a sim-
ilar process, all need to agree on a starting point α
0
.
The choice of a starting point would, in general, be in-
formed by whatever conventions govern typical algo-
rithmic design in specific domains. In the context of
auctions with one-dimensional valuations (such as our
case), a rather focal starting point is truthful bidding,
particularly so since GSP is reminiscent of Vickrey
auctions which are, in fact, truthful. Hence, setting
α
0
= 1 seems a very reasonable way to seed a best
response dynamic in a way that would lead to good
predictions.
Following this approach, we obtained the equilib-
rium strategy for the purposes of the tournament via
several iterations of best response dynamics starting
at α = 1. A look at Figure 3 (left) shows that a best
response to a symmetric strategy profile with α = 1
is α = 0.4, and a best response to a symmetric pro-
file with α = 0.4 is α = 0.2, which happens to be a
symmetric equilibrium in our restricted policy space.
Consequently, we were able to obtain a symmetric
equilibrium for the restricted discrete bidding strategy
space after only two best response iterations.
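The best response dynamic just described amounts to the following loop over the discrete grid; simulate_payoff stands in for our simulation-based payoff estimator (which plays full TAC/AA games) and is not an existing API.

```python
GRID = [round(0.1 * k, 1) for k in range(1, 11)]   # {0.1, 0.2, ..., 1.0}

def best_response(alpha_others, payoff):
    """Single-agent best response when the other seven agents all play alpha_others.

    payoff(alpha_dev, alpha_others) estimates the deviator's average score by
    running simulations.
    """
    return max(GRID, key=lambda a: payoff(a, alpha_others))

def iterate_best_response(payoff, seed=1.0, max_iters=10):
    """Iterative best response over symmetric profiles, seeded with truthful bidding."""
    alpha = seed
    for _ in range(max_iters):
        br = best_response(alpha, payoff)
        if br == alpha:          # fixed point: a symmetric equilibrium on the grid
            break
        alpha = br
    # Seeded with alpha = 1 (truthful bidding), this reproduces the path
    # reported above: 1 -> 0.4 -> 0.2.
    return alpha

# Usage: iterate_best_response(simulate_payoff)
```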
Based on the rapid convergence of iterative best
response in our setting, we can make another con-
jecture: the equilibrium that we thus locate is rela-
tively robust in the sense that the equilibrium strategy
is a best response (or nearly so) for a number of other
opponent strategies besides equilibrium. We suggest
that this is another positive side-effect of considering
best response dynamics in some settings. Another ex-
ample of this phenomenon is a first-price sealed-bid
auction with private valuations uniformly distributed
on a unit interval, where the best response to truthful
bidding in the linear strategy space is also a symmet-
ric equilibrium strategy (of course, we make no general claims here, just offer some empirically motivated intuition and conjecture). Figure 3 (left) demonstrates
this robustness in our case: α = 0.2 is a best response
to 0.2, 0.3, and 0.4. Indeed, this figure additionally
reveals another equilibrium at α = 0.1, but it is only a
best response to itself.
After the tournament we ran additional simula-
tions to paint a more complete picture of the best re-
sponse function in our discrete strategy space, which
is depicted in Figure 3 (left), with payoffs for any
configuration of agent strategies computed based on
15-30 simulation runs (the total number of runs we could execute was limited by our experimental environment and the non-trivial running time of each simulation).
Several items are noteworthy
from Figure 3. First, we may note that no strategy with α > 0.4 is ever a best response. This does not necessarily
imply that these are poor strategies to play: it may
be that an agent gains little by deviating from such
a strategy, if all others jointly also play it.

Figure 4: Max regret of each strategy on the grid against all strategies (left) and against a limited subset of "reasonable" opponents (right).

The corresponding measure of strategic stability, game theoretic regret, evaluates, for any strategy profile α, the amount of gain an agent can obtain by unilaterally deviating when all others play the prescribed symmetric strategy:

ε(α) = max_{α′ ∈ [0,1]} u(α′, α_{−1}) − u(α),

where u(·) is the symmetric utility function of each bidder (in our case, we estimate it by running simulations), α_{−1} indicates that all players besides bidder 1 play the prescribed α, and α′ denotes a deviation by bidder 1 (we pick bidder 1 arbitrarily here since every bidder is equivalent in this context by symme-
try). Figure 3 (right) plots game theoretic regret of
all symmetric strategy profiles in our discrete strategy
space. This figure further, and rather soundly, con-
firms that any α > 0.4 makes a very poor strategic
choice, one unlikely to be adopted by even somewhat
competent opponents. This observation alone already
dramatically restricts our consideration set, and one
may well use such knowledge derived from a game
theoretic analysis to proceed with a more traditional
machine learning approach. This is a rather important
and general point: game theoretic techniques may of-
ten be quite useful in restricting the number of options
one needs to consider in learning, resulting, perhaps,
in significant improvement in learning performance.
Another interesting observation is that the equilibrium
α = 0.2 is actually a best response to nearly every rea-
sonable strategy (i.e., α < 0.4, with α = 0.1 being the
lone exception) in our restricted space.
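For concreteness, the game theoretic regret of a symmetric profile can be estimated from simulated payoffs as in the following sketch (payoff is again a stand-in for our simulation-based utility estimate, and the maximization is taken over the discrete grid rather than the full interval):

```python
def game_theoretic_regret(alpha, payoff, grid):
    """epsilon(alpha): the most a single bidder can gain by unilaterally deviating
    (within the grid) when all others play the symmetric profile alpha."""
    baseline = payoff(alpha, alpha)                        # u(alpha)
    best_deviation = max(payoff(a, alpha) for a in grid)   # best unilateral deviation
    return max(best_deviation - baseline, 0.0)
```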
5.2 Robustness Analysis
We now turn to offer a methodology for a largely
prescriptive game theoretic analysis, which is partic-
ularly salient in practical agent design settings like
TAC/AA. This analysis is complementary to more tra-
ditional equilibrium analysis above, as it allows us
(among other things) to assess alternative equilibrium
strategies.
When deciding on a strategy for an agent in a mul-
tiagent system, an important consideration is robust-
ness to uncertainty about opponent decisions. A com-
mon way to measure robustness of a particular strat-
egy is via maximum regret, or the most that an agent
would have gained by switching to another strategy,
maximized over all opponent policies in a specific
consideration set. Figure 4 (left) shows max regret of
every strategy in response to our entire restricted con-
sideration set, while Figure 4 (right) plots max regret
when we restrict opponents to play only “reasonable”
strategies. We can observe that α = 0.2 fares reasonably well in both cases; although α = 0.3 and α = 0.4 are more robust to very aggressive opponents (left), if we assume that all opponents are reasonable, α = 0.2 has the smallest regret. In fact, the good robustness properties of α = 0.3 in the unrestricted opponent setting actually prompted us to use that strategy, rather than 0.2, in the semifinal rounds, due to the risk that some of the agents competing at that stage would still be rather unpolished (a prediction that proved correct). In con-
trast, the α = 0.1 equilibrium has relatively poor re-
gret properties in both settings. The upshot of this dis-
cussion is that we can augment standard simulation-
based game theoretic analysis with an analysis of max
regret, as well as game theoretic regret, to allow us to
best balance the risks from poor opponent strategy as-
sessment with benefits of optimally responding to our
predictions in a given setting.
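The max-regret criterion can be computed analogously; the sketch below assumes the same simulation-based payoff estimate and takes the opponents' consideration set as an explicit argument, e.g., the full grid or only the "reasonable" strategies:

```python
def max_regret(alpha, payoff, grid, opponent_set):
    """Worst-case regret of committing to alpha: the most we could have gained
    by switching to the best grid strategy, maximized over the symmetric
    opponent profiles in opponent_set."""
    return max(
        max(payoff(a, others) for a in grid) - payoff(alpha, others)
        for others in opponent_set
    )

# Example: max_regret(0.2, payoff, GRID, opponent_set=[0.1, 0.2, 0.3, 0.4])
```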
6 SOME ENHANCEMENTS
Our discussion above centered around an assumption
that the same bidding policy (parametrized by α) is
used for any keyword. If we use a state abstraction
that captures all relevant strategic aspects of the envi-
ronment, then there is no loss in utilizing a single bid-
ding policy for all keywords. However, as we focus on
policies that only use a myopic value per click, a rela-
tively simple way to compensate for our restriction is
to use different policies for different keyword classes.
Thus, we may wish to use a bidding strategy that is
a vector <α_F0, α_F1, α_F2>, where each component prescribes the bidding strategy for the corresponding keyword class. Another natural generalization is to contemplate quadratic bid functions. We implement the extension to quadratic bidding policies by specifying a value of α_low to use when I_d = 0 (and, hence, v = 0), and take the specified α (as above) to be applicable when I_d = 1 (value is maximal), with the restriction that α_low ≥ α; the actual strategy is then a linear inter-
polation between these two extremes. This allows us
to add only a single parameter, even while allowing
different α for different keywords. The intuition for
our special restricted class of quadratic bidding func-
tions is that a higher fraction of value is submitted
as a bid when value is low. This is motivated by the
equilibrium structure of multiunit auctions (Krishna,
2002).
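A minimal sketch of this interpolated bid function (our own function name; α and α_low are the per-keyword-class parameters described above):

```python
def interpolated_bid(value, i_d, alpha, alpha_low):
    """Bid a capacity-dependent fraction of the value per click.

    The fraction is alpha_low when I_d = 0 and alpha when I_d = 1
    (with alpha_low >= alpha), linearly interpolated in between.  Since the
    value itself scales roughly with I_d, the resulting bid is roughly
    quadratic in I_d.
    """
    fraction = alpha_low + (alpha - alpha_low) * i_d
    return fraction * value
```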
The equilibrium analysis that we had performed
above had used a one-dimensional strategy space, and
so estimating a best response did not require very
much computation. By considering, instead, a four-
dimensional strategy space, we make the problem of
exhaustive sampling of the entire strategy space in-
tractable due to the considerable simulation time re-
quired by each ad auction game. As a result, we
can no longer implement iterative best response pre-
cisely as would be prescribed in an ideal setting.
Rather, we simplify the search process by iterating
one-dimensional best response dynamics sequentially
over strategic parameters. Specifically, we proceed as
follows. First, we ignore the strategy space extension
and estimate an equilibrium in the one-dimensional
strategy space as described above. This gives us α*. We thereby set α_F0 = α_F1 = α_F2 = α_low = α* and proceed to sequentially explore individual strategic parameters, starting with α_low. More formally and generally, let s = {s_1, ..., s_L} be a strategy factored into L parameters and suppose that s is initialized to s_0. We suggest the following algorithm for approximating an equilibrium in this factored strategy space:

1. Fix s_l = s_0 for every parameter other than s_1, and perform best response dynamics only allowing s_1 to vary. Assume that best response dynamics converges (if not, we can terminate it after a finite number of iterations and select an equilibrium approximation based on some criterion from all the explored candidates) to s_1 = s_1*. This gives us a new s = {s_1*, s_0, ..., s_0}.

2. Fix all strategic features at these values except s_2, and vary s_2 in another sequence of best response iterations.

3. Repeat step 2 sequentially for all remaining strategic parameters.

When we have completed the procedure above for all strategic parameters, we thereby have obtained s* = {s_1*, ..., s_L*}. Note that s* is not guaranteed to be an equilibrium, since we only vary a single strategic parameter at a time. Validating that such a procedure actually yields a good equilibrium approximation is a subject of future work; for now, suffice it to say that its performance was quite satisfactory in the actual tournament, where we used <α_F0, α_F1, α_F2> = <0.1, 0.2, 0.2> and α_low = 0.3, all obtained in this fashion.
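The sequential procedure can be sketched as follows, where best_response_1d is a stand-in for one-dimensional best response dynamics estimated by simulation (not an existing routine):

```python
def sequential_equilibrium_search(initial, best_response_1d):
    """Approximate an equilibrium in a factored strategy s = [s_1, ..., s_L]
    by iterating one-dimensional best response dynamics over one parameter
    at a time.

    best_response_1d(s, l): runs best response dynamics in coordinate l,
    holding the remaining coordinates of the symmetric profile fixed, and
    returns the value it converges to (or the best candidate found after a
    fixed iteration budget).
    """
    s = list(initial)
    for l in range(len(s)):          # explore parameters sequentially
        s[l] = best_response_1d(s, l)
    return s                         # not guaranteed to be an exact equilibrium
```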
7 ALTERNATIVE AD SELECTION
POLICIES
Having assumed until now that our choice of ad pol-
icy is reasonable (and, moreover, that a specific ad
policy has relatively little profit impact), we consider
two simple alternative ad selection policies. The first,
Generic Ad Selection, always chooses a generic ad.
The second, Specialty Ad Selection, always chooses
the ad to match the product to the advertiser’s manu-
facturer and component specialty. Figure 5 shows that
we were incorrect on one account: ad selection does
make a significant impact on profits. Fortunately, the
policy we actually used proved sensible, as it is signif-
icantly better than generic at the 0.9 confidence level
and empirically (though not statistically significantly) better than the specialty ad selection policy. Indeed, since we evaluated the selection policies for an agent by fixing the policies of others to be our "baseline" described above, the baseline policy is shown to be an approximate equilibrium if we restrict the ad policy space to only these three options.

Figure 5: Payoffs of three ad policies (baseline is the one actually used in tournament). Error bars are confidence intervals at the 0.9 level.
8 PREDICTIVE VALUE OF EQUILIBRIUM: EVIDENCE FROM TOURNAMENT
We close our discussion with some evidence about
the descriptive quality of our approximate equilib-
rium policies from the TAC/AA tournament. In an
ad auction, key determinants of an agent’s profits are
the distributions of CTRs and payments per click as
functions of submitted bids. We use the data from
tournament qualifying rounds, semifinals, and finals
to see whether these distributions appear to converge
to equilibrium predictions. We evaluate the error of
an equilibrium prediction with respect to the tourna-
ment evidence about the distribution of some measure
(say, payments per click) as follows. First, we bin all
bids from simulated equilibrium and tournament ex-
perience of our agent into 50 intervals. For each bid
interval, we compute the maximum error between the
tournament and equilibrium distributions of the mea-
sure of interest (essentially, we use the Kolmogorov-
Smirnov test statistic), and then compute the weighted
average error over all bid intervals, with weights cor-
responding to the number of bids that fall into each
interval.
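A sketch of this error measure (our own implementation outline, assuming the binning of bids into the 50 intervals has already been done by the caller):

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two
    empirical CDFs."""
    a, b = np.sort(np.asarray(sample_a)), np.sort(np.asarray(sample_b))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def weighted_prediction_error(binned_equilibrium, binned_tournament, bin_counts):
    """Per-bin KS distances between the equilibrium and tournament distributions
    of a measure (e.g., payment per click), averaged with weights equal to the
    number of bids falling into each bid interval."""
    errors = [ks_statistic(e, t)
              for e, t in zip(binned_equilibrium, binned_tournament)]
    return float(np.average(errors, weights=bin_counts))
```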
Figure 6: Average distance between equilibrium and realized (tournament) click-through-rate distributions.
The results, shown in Figures 6 and 7, show a
clear downward trend in error as the tournament pro-
gresses: as agents become more competent on aver-
age, equilibrium prediction becomes increasingly ac-
curate. At the minimum, this suggests that using equilibrium predictions as initial policies in the absence of other information can be quite effective (on the other hand, final errors are still non-trivial, so augmenting this approach with learning seems quite desirable).

Figure 7: Average distance between equilibrium and realized (tournament) payment per click distributions.

A similar, and much rosier, picture for equilibrium prediction is shown in Figure 8, where we look at
average realized values of α observed in the tourna-
ment: there is a clear downward trend as the tournament progresses, and the strategies in the final rounds are extremely close to equilibrium predictions (of course, these aren't the actual policies used, just our abstraction of them into the linear strategy space). We also
note that average profits exhibit a similar trend, start-
ing rather low (high α) and growing to near the levels
predicted by the symmetric equilibrium at α = 0.2 as
the tournament progresses. This reveals, indirectly,
that as the agent pool becomes more competent, submitted bids are lower (on average), allowing bidders to
realize higher profits.
Figure 8: Average α values used by bidders throughout the tournament.
9 CONCLUSIONS
We demonstrate in concrete terms how to operational-
ize a pure game theoretic bidding strategy in a com-
plex simulated keyword auction game, combining
equilibrium analysis (which offers a combination of
descriptive and prescriptive insights) with a purely
prescriptive analysis based on robustness. All the
analysis is done using simulations, as compared to
more traditional game-theoretic analyses which usu-
ally involve mathematical treatments. Furthermore, in
spite of the approximate nature of the resulting equi-
libria, we find that they offer very valuable predic-
tions about the actual ad auction tournament bidding
(as captured by submitted bids, as well as observed
distribution of CTRs and prices). Finally, we offer an
algorithm for equilibrium approximation when strate-
gies are multi-dimensional, based on a sequence of
single-dimensional analyses. In the process, we of-
fer numerous general insights about operationalizing
game theoretic approaches in practical agent design
in multiagent systems.
REFERENCES
Jordan, P. R. and Wellman, M. P. (2009). Designing the
ad auctions game for the trading agent competition.
In IJCAI-09 Workshop on Trading Agent Design and
Analysis.
Kitts, B. and Leblanc, B. (2004). Optimal bidding on key-
word auctions. Electronic Markets, 14(3):186–201.
Krishna, V. (2002). Auction Theory. Academic Press.
Lahaie, S. and Pennock, D. M. (2008). Revenue analy-
sis of a family of ranking rules for keyword auctions.
In Eighth ACM Conference on Electronic Commerce,
pages 50–56.
Pardoe, D. and Stone, P. (2006). TacTex-05: A cham-
pion supply chain management agent. In Twenty-First
National Conference on Artificial Intelligence, pages
1489–1494.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68.
Vorobeychik, Y. (2009). Simulation-based game theoretic
analysis of keyword auctions with low-dimensional
bidding strategies. In Twenty-Fifth Conference on Un-
certainty in Artificial Intelligence.
Wellman, M. P., Reeves, D. M., Lochner, K. M., and Suri,
R. (2006). Searching for walverine-05. In Agent-
Mediated Electronic Commerce: Designing Trading
Agents and Mechanisms, LNAI 3937, pages 157–170.
Springer-Verlag.