DISTANCE FEATURES FOR GENERAL GAME PLAYING AGENTS

Daniel Michulke

and Stephan Schiffel

Department of Computer Science, Dresden University of Technology, Dresden, Germany

School of Computer Science, Reykjav

ık University, Reykjav

ık, Iceland

Keywords:

General game playing, Feature construction, Heuristic search.

Abstract:

General Game Playing (GGP) is concerned with the development of programs that are able to play previously

unknown games well. The main problem such a player is faced with is to come up with a good heuristic

evaluation function automatically. Part of these heuristics are distance measures used to estimate, e.g., the

distance of a pawn towards the promotion rank. However, current distance heuristics in GGP are based on too

speciﬁc detection patterns as well as expensive internal simulations, they are limited to the scope of totally

ordered domains and/or they apply a uniform Manhattan distance heuristics regardless of the move pattern of

the object involved.

In this paper we describe a method to automatically construct distance measures by analyzing the game rules.

The presented method is an improvement to all previously presented distance estimation methods, because it

is not limited to speciﬁc structures, such as, Cartesian game boards. Furthermore, the constructed distance

measures are admissible.

We demonstrate how to use the distance measures in an evaluation function of a general game player and show

the effectiveness of our approach by comparing with a state-of-the-art player.

1 INTRODUCTION

While in classical game playing, human experts en-

code their knowledge into features and parameters of

evaluation functions (e.g., weights), the goal of Gen-

eral Game Playing is to develop programs that are

able to autonomously derive a good evaluation func-

tion for a game given only the rules of the game. Be-

cause the games are unknown beforehand, the main

problem lies in the detection and construction of use-

ful features and heuristics for guiding search in the

match.

One class of such features are distance features

used in a variety of GGP agents (e.g., (Kuhlmann

et al., 2006; Schiffel and Thielscher, 2007; Clune,

2007; Kaiser, 2007)). The way of detecting and con-

structing features in current game playing systems,

however, suffers from a variety of disadvantages:

• Distance features require a prior recognition of

board-like game elements. Current approaches

formulate hypotheses about which element of the

game rules describes a board and then either

check these hypotheses in internal simulations of

the game (e.g., (Kuhlmann et al., 2006; Schif-

fel and Thielscher, 2007; Kaiser, 2007)) or try

to prove them (Schiffel and Thielscher, 2009a).

Both approaches are expensive and can only de-

tect boards if their description follows a certain

syntactic pattern.

• Distance features are limited to Cartesian board-

like structures, that is, n-dimensional structures

with totally ordered coordinates. Distances over

general graphs are not considered.

• Distances are calculated using a predeﬁned met-

ric on the boards. Consequently, distance values

obtained do not depend on the type of piece in-

volved. For example, using a predeﬁned metric

the distance of a rook, king and pawn from a2

to c2 would appear equal while a human would

identify the distance as 1, 2 and ∞ (unreachable),

respectively.

In this paper we will present a more general ap-

proach for the construction of distance features for

general games. The underlying idea is to analyze the

rules of game in order to ﬁnd dependencies between

the ﬂuents of the game, i.e., between the atomic prop-

erties of the game states. Based on these dependen-

cies, we deﬁne a distance function that computes an

admissible estimate for the number of steps required

to make a certain ﬂuent true. This distance function

127

Michulke D. and Schiffel S..

DISTANCE FEATURES FOR GENERAL GAME PLAYING AGENTS.

DOI: 10.5220/0003744001270136

In Proceedings of the 4th International Conference on Agents and Artiﬁcial Intelligence (ICAART-2012), pages 127-136

ISBN: 978-989-8425-95-9

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

can be used as a feature in search heuristics of GGP

agents. In contrast to previous approaches, our ap-

proach does not depend on syntactic patterns and in-

volves no internal simulation or detection of any pre-

deﬁned game elements. Moreover, it is not limited to

board-like structures but can be used for every ﬂuent

of a game.

The remainder of this paper is structured as fol-

lows: In the next section we give an introduction

to the Game Description Language (GDL), which is

used to describe general games. Furthermore, we

brieﬂy present the methods currently applied for dis-

tance feature detection and distance estimation in the

ﬁeld of General Game Playing. In Section 3 we in-

troduce the theoretical basis for this work, so called

ﬂuent graphs, and show how to use them to derive dis-

tances from states to ﬂuents. We proceed in Section 4

by showing how ﬂuent graphs can be constructed

from a game description and demonstrate their appli-

cation in Section 5. Finally, we conduct experiments

in Section 6 to show the beneﬁt and generality of our

approach and discuss and summarize the results in

Section 9.

2 PRELIMINARIES

The language used for describing the rules of gen-

eral games is the Game Description Language (Love

et al., 2008) (GDL). GDL is an extension of Datalog

with functions, equality, some syntactical restrictions

to preserve ﬁniteness, and some predeﬁned keywords.

The following is a partial encoding of a Tic-Tac-

Toe game in GDL. In this paper we use Prolog syntax

where words starting with upper-case letters stand for

variables and the remaining words are constants.

1 role( xp l a yer ). role( o play e r ).

3 init( cell (1 ,1, b ) ) . init( cell (1 ,2 , b )).

4 init( cell (1 ,3, b ) ) . ...

5 init( cell (3 ,3, b ) ) .

6 init( co n t rol ( xp l a yer ) ) .

8 legal(P , mark (X , Y ) ) : -

9 true( c ontr o l ( P )) , true( c e l l ( X , Y , b )).

10 legal(P , noop ) : -

11 role(P ) , not true( c o n trol ( P ) ) .

13 next( cell (X ,Y , x )) : -

14 does( x p l a y e r , m a r k ( X , Y ) ) .

15 next( cell (X ,Y , o )) : -

16 does( o p l a y e r , m a r k ( X , Y ) ) .

17 next( cell (X ,Y , C )) : -

18 true( c e l l ( X , Y , C )) , distinct(C , b ) .

19 next( cell (X ,Y , b )) : - true( c e l l ( X , Y , b )) ,

20 does(P , m a r k ( M , N )) ,

21 (distinct( X , M ) ; distinct(Y, N ) ) .

23 goal( xplayer , 100 ) : - line (x ).

24 ...

25 terminal : -

26 l i n e ( x) ; line (o ) ; not open .

28 line ( P ) :-

29 true( c e l l ( X ,1 , P )) ,

30 true( c e l l ( X ,2 , P )) ,

31 true( c e l l ( X ,3 , P )).

32 ...

33 open : - true( c e l l ( X ,Y ,b )).

The ﬁrst line declares the roles of the game. The

unary predicate init deﬁnes the properties that are

true in the initial state. Lines 8-11 deﬁne the legal

moves of the game with the help of the keyword legal.

For example, mark(X,Y) is a legal move for role P if

control(P) is true in the current state (i.e., it’s P’s turn)

and the cell X,Y is blank (cell(X,Y,b)). The rules for

predicate next deﬁne the properties that hold in the

successor state, e.g., cell(M,N,x) holds if xplayer marked

the cell M,N and cell(M,N,b) does not change if some cell

different from M,N was marked

. Lines 23 to 26 deﬁne

the rewards of the players and the condition for termi-

nal states. The rules for both contain auxiliary pred-

icates line(P) and open which encode the concept of a

line-of-three and the existence of a blank cell, respec-

tively.

We will refer to the arguments of the GDL key-

words init, true and next as ﬂuents. In the above

example, there are two different types of ﬂuents,

control(X) with X ∈ {xplayer, oplayer} and cell(X, Y, Z)

with X, Y ∈ {1, 2, 3} and Z ∈ {b, x, o}.

In (Schiffel and Thielscher, 2009b), we deﬁned a

formal semantics of a game described in GDL as a

state transition system:

Deﬁnition 1. (Game). Let Σ be a set of ground terms

and 2

denote the set of ﬁnite subsets of Σ. A game

over this set of ground terms Σ is a state transition

system Γ = (R,s

,T,l, u,g) over sets of states S ⊆ 2

and actions A ⊆ Σ with

• R ⊆ Σ, a ﬁnite set of roles;

• s

∈ S , the initial state of the game;

• T ⊆ S , the set of terminal states;

• l : R × A × S , the legality relation;

• u : (R 7→ A ) × S → S , the transition or update

function; and

• g : R × S 7→ N, the reward or goal function.

This formal semantics is based on a set of ground

terms Σ. This set is the set of all ground terms over

The special predicate distinct(X,Y) holds if the terms

X and Y are syntactically different.

ICAART 2012 - International Conference on Agents and Artificial Intelligence

128

the signature of the game description. Hence, ﬂuents,

actions and roles of the game are ground terms in Σ.

States are ﬁnite sets of ﬂuents, i.e., ﬁnite subsets of

Σ. The connection between a game description D and

the game Γ it describes is established using the stan-

dard model of the logic program D. For example, the

update function u is deﬁned as

u(A,s) = { f ∈ Σ : D ∪ s

true

∪ A

does

|= next( f )}

where s

true

and A

does

are suitable encodings of the

state s and the joint action A of all players as a logic

program. Thus, the successor state u(A,s) is the set

of all ground terms (ﬂuents) f such that next( f ) is

entailed by the game description D together with the

state s and the joint move A. For a complete deﬁnition

for all components of the game Γ we refer to (Schiffel

and Thielscher, 2009b).

3 FLUENT GRAPHS

Our goal is to obtain knowledge on how ﬂuents evolve

over time. We start by building a ﬂuent graph that

contains all the ﬂuents of a game as nodes. Then we

add directed edges ( f

, f ) if at least one of the prede-

cessor ﬂuents f

must hold in the current state for the

ﬂuent f to hold in the successor state. Figure 1 shows

a partial ﬂuent graph for Tic-Tac-Toe that relates the

ﬂuents cell(3,1,Z) for Z ∈ {b, x, o}.

Figure 1: Partial ﬂuent graph for Tic-Tac-Toe.

For cell (3,1) to be blank it had to be blank before.

For a cell to contain an x (or an o) in the successor

state there are two possible preconditions. Either, it

contained an x (or o) before or it was blank.

Using this graph, we can conclude that, e.g., a

transition from cell(3,1,b) to cell(3,1,x) is possible

within one step while a transition from cell(3,1,o) to

cell(3,1,x) is impossible.

To build on this information, we formally deﬁne a

ﬂuent graph as follows:

Deﬁnition 2. (Fluent Graph). Let Γ be a game over

ground terms Σ. A graph G = (V, E) is called a ﬂuent

graph for Γ iff

• V = Σ ∪ {

0} and

• for all ﬂuents f ∈ Σ, two valid states s and s

is a successor of s) ∧ f

∈ s

(1)

⇒ (∃ f )( f , f

) ∈ E ∧ ( f ∈ s ∪ {

0})

In this deﬁnition we add an additional node

to the graph and allow

0 to occur as the source of

edges. The reason is that there can be ﬂuents in

the game that do not have any preconditions, for

example the ﬂuent g with the following next rule:

next(g) :- distinct(a,b). On the other hand, there

might be ﬂuents that cannot occur in any state, be-

cause the body of the corresponding next rule is un-

satisﬁable, for example: next(h) :- distinct(a,a). We

distinguish between ﬂuents that have no precondition

(such as g) and ﬂuents that are unreachable (such as

h) by connecting the former to the node

0 while un-

reachable ﬂuents have no edge in the ﬂuent graph.

Note that the deﬁnition covers only some of the

necessary preconditions for ﬂuents, therefore ﬂuent

graphs are not unique as Figure 2 shows. We will

address this problem later.

Figure 2: Alternative partial ﬂuent graph for Tic-Tac-Toe.

We can now deﬁne a distance function ∆(s, f

) be-

tween the current state s and a state in which ﬂuent f

holds as follows:

Deﬁnition 3. (Distance Function). Let ∆

( f , f

) be

the length of the shortest path from node f to node

in the ﬂuent graph G or ∞ if there is no such path.

Then

∆(s, f

) = min

f ∈s∪{

∆

( f , f

)

That means, we compute the distance ∆(s, f

) as

the shortest path in the ﬂuent graph from any ﬂuent in

s to f

Intuitively, each edge ( f , f

) in the ﬂuent graph

corresponds to a state transition of the game from a

state in which f holds to a state in which f

holds.

Thus, the length of a path from f to f

in the ﬂuent

graph corresponds to the number of steps in the game

between a state containing f to a state containing f

Of course, the ﬂuent graph is an abstraction of the

actual game: many preconditions for the state tran-

sitions are ignored. As a consequence, the distance

∆(s, f

) that we compute in this way is a lower bound

on the actual number of steps it takes to go from s

to a state in which f

holds. Therefore the distance

DISTANCE FEATURES FOR GENERAL GAME PLAYING AGENTS

129

∆(s, f

) is an admissible heuristic for reaching f

from

a state s.

Theorem 1. (Admissible Distance). Let

• Γ = (R,s

,T,l, u,g) be a game with ground terms

Σ and states S ,

• s

∈ S be a state of Γ,

• f ∈ Σ be a ﬂuent of Γ, and

• G = (V,E) be a ﬂuent graph for Γ.

Furthermore, let s

7→ s

7→ . .. 7→ s

m+1

denote a legal

sequence of states of Γ, that is, for all i with 0 < i ≤ m

there is a joint action A

, such that:

i+1

= u(A

) ∧ (∀r ∈ R)l(r, A

(r), s)

If ∆(s

, f ) = n, then there is no legal sequence of

states s

7→ .. . 7→ s

m+1

with f ∈ s

m+1

and m < n.

Proof. We prove the theorem by contradiction. As-

sume that ∆(s

, f ) = n and there is a a legal sequence

of states s

7→ ... 7→ s

m+1

with f ∈ s

m+1

and m < n.

By Deﬁnition 2, for every two consecutive states s

i+1

of the sequence s

7→ ... 7→ s

m+1

and for ev-

ery f

i+1

∈ s

i+1

there is an edge ( f

, f

i+1

) ∈ E such

that f

∈ s

or f

0. Therefore, there is a path

,. .., f

, f

m+1

in G with 1 ≤ j ≤ m and the following

properties:

• f

∈ s

for all i = j,...,m + 1,

• f

m+1

= f , and

• either f

∈ s

(e.g., if j = 1) or f

Thus, the path f

,. .., f

, f

m+1

has a length of at most

m. Consequently, ∆(s

, f ) ≤ m because f

∈ s

∪ {

and f

m+1

= f . However, ∆(s

, f ) ≤ m together with

m < n contradicts ∆(s

, f ) = n.

4 CONSTRUCTING FLUENT

GRAPHS FROM RULES

We propose an algorithm to construct a ﬂuent graph

based on the rules of the game. The transitions of

a state s to its successor state s

are encoded ﬂuent-

wise via the next rules. Consequently, for each f

∈ s

there must be at least one rule with the head next(f’).

All ﬂuents occurring in the body of these rules are

possible sources for an edge to f

in the ﬂuent graph.

For each ground ﬂuent f

of the game:

1. Construct a ground disjunctive normal form φ of

next( f

), i.e., a formula φ such that next( f

) ⊃ φ.

2. For every disjunct ψ in φ:

• Pick one literal true( f ) from ψ or set f =

0 if

there is none.

• Add the edge ( f , f

) to the ﬂuent graph.

Note, that we only select one literal from each dis-

junct in φ. Since, the distance function ∆(s, f

) ob-

tained from the ﬂuent graph is admissible, the goal is

to construct a ﬂuent graph that increases the lengths of

the shortest paths between the ﬂuents as much as pos-

sible. Therefore, the ﬂuent graph should contain as

few edges as possible. In general the complete ﬂuent

graph (i.e., the graph where every ﬂuent is connected

to every other ﬂuent) is the least informative because

the maximal distance obtained from this graph is 1.

The algorithm outline still leaves some open is-

sues:

1. How do we construct a ground formula φ that is

the disjunctive normal form of next( f

2. Which literal true( f ) do we select if there is

more than one? Or, in other words, which pre-

condition f

of f do we select?

We will discuss both issues in the following sec-

tions.

4.1 Constructing a DNF of next(f

)

A formula φ in DNF is a set of formulas {ψ

,. .., ψ

}

connected by disjunctions such that each formula ψ

is a set of literals connected by conjunctions. We pro-

pose the algorithm in Figure 1 to construct φ such that

next( f

) ⊃ φ.

The algorithm starts with φ = next( f

). Then,

it selects a positive literal l in φ and unrolls this lit-

eral, that is, it replaces l with the bodies of all rules

h:-b ∈ D whose head h is uniﬁable with l with a most

general uniﬁer σ (lines 9, 10). The replacement is re-

peated until all predicates that are left are either of the

form true(t), distinct(t

) or recursively deﬁned.

Recursively deﬁned predicates are not unrolled to en-

sure termination of the algorithm. Finally, we trans-

form φ into disjunctive normal form and replace each

disjunct ψ

of φ by a disjunction of all of its ground

instances in order to get a ground formula φ.

Note that in line 4, we replace every occurrence

of does with legal to also include the preconditions

of the actions that are executed in φ. As a conse-

quence the resulting formula φ is not equivalent to

next( f

). However, next( f

) ⊃ φ, under the as-

sumption that only legal moves can be executed, i.e.,

does(r,a) ⊃ legal(r, a). This is sufﬁcient for con-

structing a ﬂuent graph from φ.

Note, that we do not select negative literals for un-

rolling. The algorithm could be easily adapted to also

unroll negative literals. However, in the games we

encountered so far, doing so does not improve the ob-

tained ﬂuent graphs but complicates the algorithm and

increases the size of the created φ. Unrolling negative

ICAART 2012 - International Conference on Agents and Artificial Intelligence

130

Algorithm 1: Constructing a formula φ in DNF with

next( f

) ⊃ φ.

Input: game description D, ground ﬂuent f

Output: φ, such that next( f

) ⊃ φ

1: φ := next( f

)

2: f inished := f alse

3: while ¬ finished do

4: Replace every positive occurrence of

does(r,a) in φ with legal(r, a).

5: Select a positive literal l from φ such that l 6=

true(t), l 6= distinct(t

) and l is not a re-

cursively deﬁned predicate.

6: if there is no such literal then

7: f inished := true

8: else

l :=

h:-b∈D,lσ=hσ

bσ

10: φ := φ{l/

11: end if

12: end while

13: Transform φ into disjunctive normal form, i.e.,

φ = ψ

∨ .. . ∨ ψ

and each formula ψ

is a con-

junction of literals.

14: for all ψ

in φ do

15: Replace ψ

in φ by a disjunction of all ground

instances of ψ

16: end for

literals will mainly add negative preconditions to φ.

However, negative preconditions are not used for the

ﬂuent graph because a ﬂuent graph only contains pos-

itive preconditions of ﬂuents as edges, according to

Deﬁnition 2.

4.2 Selecting Preconditions for the

Fluent Graph

If there are several literals of the form true( f ) in a

disjunct ψ of the formula φ constructed above, we

have to select one of them as source of the edge in

the ﬂuent graph. As already mentioned, the distance

∆(s, f ) computed with the help of the ﬂuent graph is

a lower bound on the actual number of steps needed.

To obtain a good lower bound, that is one that is as

large as possible, the paths between nodes in the ﬂuent

graph should be as long as possible. Selecting the best

ﬂuent graph, i.e., the one which maximizes the dis-

tances, is impossible. Which ﬂuent graph is the best

one depends on the states we encounter when playing

the game, but we do not know these states beforehand.

In order to generate a ﬂuent graph that provides good

distance estimates, we use several heuristics when we

select literals from disjuncts in the DNF of next( f

First, we only add new edges if necessary. That

means, whenever there is a literal true( f ) in a dis-

junct ψ such that the edge ( f , f

) already exists in the

ﬂuent graph, we select this literal true( f ). The ratio-

nale of this heuristic is that paths in the ﬂuent graph

are longer on average if there are fewer connections

between the nodes.

Second, we prefer a literal true( f ) over true(g)

if f is more similar to f

than g is to f

, that is

sim( f , f

) > sim(g, f

We deﬁne the similarity sim(t,t

) recursively over

ground terms t,t

sim(t,t

) =











1 t,t

have arity 0 and t = t

∑

sim(t

) t = f (t

,. ..,t

) and

= f (t

,. ..,t

)

0 else

In human made game descriptions, similar ﬂuents

typically have strong connections. For example, in

Tic-Tac-Toe cell(3,1,x) is more related to cell(3,1,b)

than to cell(b,3,x). By using similar ﬂuents when

adding new edges to the ﬂuent graph, we have a bet-

ter chance of ﬁnding the same ﬂuent again in a dif-

ferent disjunct of φ. Thus we maximize the chance of

reusing edges.

5 APPLYING DISTANCE

FEATURES

For using the distance function in our evaluation func-

tion, we deﬁne the normalized distance δ(s, f ).

δ(s, f ) =

∆(s, f )

∆

max

( f )

The value ∆

max

( f ) is the longest distance ∆

(g, f )

from any ﬂuent g to f , i.e.,

∆

max

( f )

def

= max

∆

(g, f )

where ∆

(g, f ) denotes the length of the shortest path

from g to f in the ﬂuent graph G.

Thus, ∆

max

( f ) is the longest possible distance

∆(s, f ) that is not inﬁnite. The normalized distance

δ(s, f ) will be inﬁnite if ∆(s, f ) = ∞, i.e., there is no

path from any ﬂuent in s to f in the ﬂuent graph. In

all other cases it holds that 0 ≤ δ(s, f ) ≤ 1.

Note, that the construction of the ﬂuent graph and

computing the shortest paths between all ﬂuents, i.e.,

the distance function ∆

, need only be done once for a

game. Thus, while construction of the ﬂuent graph is

more expensive for complex games, the cost of com-

puting the distance feature δ(s, f ) (or ∆(s, f )) only de-

pends (linearly) on the size of the state s.

DISTANCE FEATURES FOR GENERAL GAME PLAYING AGENTS

131

5.1 Using Distance Features in an

Evaluation Function

To demonstrate the application of the distance mea-

sure presented, we use a simpliﬁed version of

the evaluation function of Fluxplayer (Schiffel and

Thielscher, 2007) implemented in Prolog. It takes the

ground DNF of the goal rules as ﬁrst argument, the

current state as second argument and returns the fuzzy

evaluation of the DNF on that state as a result.

1 ev a l ( ( D1 ; ...; Dn ), S , R ) : - ! ,

2 e v a l ( D1 , S , R1 ) , ..., e v al ( Dn , S , Rn ) ,

3 R is sum ( R1 , ..., Rn ) -

4 p rod u c t (R1 , ..., Rn ).

5 ev a l ( ( C1 , ..., Cn ), S , R ) : - ! ,

6 e v a l ( C1 , S , R1 ) , ..., e v al ( Cn , S , Rn ) ,

7 R is pr o d uct ( R1 , ... , Rn ).

8 ev a l (not( P ) , S , R) : - ! ,

9 e v a l ( P , S , Rp ) , R is 1 - Rp .

10 ev a l (true( F ) , S , 0 . 9 ) : - oc c u r s (F , S ) , !.

11 ev a l (true( F ) , S , 0 . 1 ).

Disjunctions are transformed to probabilistic

sums, conjunctions to products, and true statements

are evaluated to values in the interval [0, 1], basically

resembling a recursive fuzzy logic evaluation using

the product t-norm and the corresponding probabilis-

tic sum t-conorm. The state value increases with each

conjunct and disjunct fulﬁlled.

We compare this basic evaluation to a second

function that employs our relative distance measure,

encoded as predicate delta. We obtain this distance-

based evaluation function by substituting line 11 of

the previous program by the following four lines:

1 ev a l (true( F ) , S , R) : -

2 d e lta (S , F , D i stan c e ),

3 D ist a n ce = < 1 , ! ,

4 R is 0.8*(1 - Dis t ance ) + 0.1.

5 ev a l (true( F ) , S , 0).

Here, we evaluate a ﬂuent that does not occur in the

current state to a value in [0.1,0.9] and, in case the

relative distance is inﬁnite, to 0 since this means that

the ﬂuent cannot hold anymore.

5.2 Tic-Tac-Toe

state s

Figure 3: Two states of the Tic-Tac-Toe. The ﬁrst row is

still open in state s

but blocked in state s

Although on ﬁrst sight Tic-Tac-Toe contains no

relevant distance information, we can still take advan-

tage of our distance function. Consider the two states

as shown in Figure 3. In state s

the ﬁrst row con-

sists of two cells marked with an x and a blank cell.

In state s

the ﬁrst row contains two xs and one cell

marked with an o. State s

has a higher state value

than s

for xplayer since in s

xplayer has a threat of

completing a line in contrast to s

. The corresponding

goal condition for xplayer completing the ﬁrst row is:

1 li n e ( x ) : - true( c e l l (1 ,1 , x )) ,

2 true( c e l l (2 ,1 , x )) , true( c ell (3 ,1 , x ) ) .

When evaluating the body of this condition using

our standard fuzzy evaluation, we see that it cannot

distinguish between s

and s

because both have two

markers in place and one missing for completing the

line for xplayer. Therefore it evaluates both states to

1 ∗ 1 ∗ 0.1 = 0.1.

However, the distance-based function evaluates

true(cell(3,1,b)) of s

to 0.1 and true(cell(3,1,o)) of s

to 0. Therefore, it can distinguish between both states,

returning R = 0.1 for S = s

and R = 0 for S = s

5.3 Breakthrough

The second game is Breakthrough, again a two-player

game played on a chessboard. Like in chess, the

ﬁrst two ranks contain only white pieces and the last

two only black pieces. The pieces of the game are

only pawns that move and capture in the same way

as pawns in chess, but without the initial double ad-

vance. Whoever reaches the opposite side of the

board ﬁrst wins.

Figure 4 shows the initial position

for Breakthrough. The arrows indicate the possible

moves, a pawn can make.

The goal condition for the player black states that

black wins if there is a cell with the coordinates X,1

and the content black, such that X is an index (a number

from 1 to 8 according to the rules of index):

1 goal( black , 100) : -

2 i n dex (X ),true( c ell h o lds ( X , 1 , black )).

Grounding this rule yields

1 goal( black , 100) : -

2 true( c ell h old s (1 , 1 , bla c k ) ; ...;

3 true( c ell h old s (8 , 1 , bla c k )).

We omitted the index predicate since it is true for all 8

ground instances.

The complete rules for Breakthrough as well as Tic-

Tac-Toe can be found under http://ggpserver.general-game-

playing.de/.

ICAART 2012 - International Conference on Agents and Artificial Intelligence

132

The standard evaluation function cannot distin-

guish any of the states in which the goal is not reached

because true(cellholds(X, 1, black)) is false in all of

these states for any instance of X.

The distance-based evaluation function is based

on the ﬂuent graph depicted in Figure 5.

Figure 4: Initial position in Breakthrough and the move op-

tions of a pawn.

Figure 5: A partial ﬂuent graph for Breakthrough, role

black.

Obviously it captures the way pawns move in

chess. Therefore evaluations of atoms of the form

true(cellholds(X, Y, black)) have now 9 possible values

(for distances 0 to 7 and ∞) instead of 2 (true and

false). Hence, states where black pawns are nearer

to one of the cells (1,8), .. ., (8,8) are preferred.

Moreover, the ﬂuent graph, and thus the distance

function, contains the information that some locations

are only reachable from certain other locations. To-

gether with our evaluation function this leads to what

could be called “strategic positioning”: states with

pawns on the side of the board are worth less than

those with pawns in the center. This is due to the fact

that a pawn in the center may reach more of the 8 pos-

sible destinations than a pawn on the side.

6 EVALUATION

For evaluation, we implemented our distance function

and equipped the agent system Fluxplayer (Schiffel

and Thielscher, 2007) with it. We then set up this

version of Fluxplayer (“ﬂux distance”) against its ver-

sion without the new distance function (“ﬂux basic”).

We used the version of Fluxplayer that came in 4th

in the 2010 championship. Since ﬂux basic is already

endowed with a distance heuristic, the evaluation is

comparable to a competition setting of two compet-

ing heuristics using distance features.

We chose 19 games for comparison in which we

conducted 100 matches on average. Figure 6 shows

the results.

pawn_whopping

battle

breakthrough

3pffa

four_way_battle

chinesecheckers3

point_grab

lightsout

catch_a_mouse

doubletictactoe

tictactoe

racetrackcorridor

smallest_4player

chickentictactoe

capture_the_king

chinesecheckers2

nim4

breakthroughsuicide_v2

knightthrough

-60 -40 -20 0 20 40 60

pawn_whopping

battle

breakthrough

3pffa

four_way_battle

chinesecheckers3

point_grab

lightsout

catch_a_mouse

doubletictactoe

tictactoe

racetrackcorridor

smallest_4player

chickentictactoe

capture_the_king

chinesecheckers2

nim4

breakthroughsuicide_v2

knightthrough

-60 -40 -20 0 20 40 60

Figure 6: Advantage in Win Rate of ﬂux distance.

The values indicate the difference in win rate, e.g.,

a value of +10 indicates that ﬂux distance won 55%

of the games against ﬂux basic winning 45%. Obvi-

ously the proposed heuristics produces results com-

parable to the ﬂux basic heuristics, with both having

advantages in some games. This has several reasons:

Most importantly, our proposed heuristic, in the way

it is implemented now, is more expensive than the

distance estimation used in ﬂux basic. Therefore the

evaluation of a state takes longer and the search tree

can not be explored as deeply as with cheaper heuris-

tics. This accounts for three of the four underper-

forming games. For example in nim4, the ﬂux basic

DISTANCE FEATURES FOR GENERAL GAME PLAYING AGENTS

133

distance estimation provides essentially the same re-

sults as our new approach, just much cheaper. In chi-

nesecheckers2 and knightthrough, the new distance

function slows down the search more than its better

accuracy can compensate.

On the other hand, ﬂux distance performs better

in complicated games. There the higher accuracy of

the heuristics typically outweighs the disadvantage of

the heuristics being slower.

Interestingly the higher accuracy of the new dis-

tance heuristics is the reason for ﬂux distance los-

ing in breakthrough suicide. The game is exactly the

same as breakthrough, however, the player to reach

the other side of the board ﬁrst does not win but loses.

The heuristics of both ﬂux basic and ﬂux distance

are not good for this game since both are based the

minimum number of moves necessary to reach the

goal while the optimal heuristic would depend on

the maximum number of moves available to avoid

losing. However, since ﬂux distance is more ac-

curate, ﬂux distance selects even worse moves that

ﬂux basic. Speciﬁcally, ﬂux distance tries to maxi-

mize (a much more accurate) minimal distance to the

other side of the board, thereby allowing the opponent

to capture its advanced pawns. This behavior, how-

ever, results in a smaller maximal number of moves

until the game ends and forces to advance the few

remaining pawns quickly. Thus, the problem in this

game is not the distance estimate but the fact that the

heuristic is not suitable for the game.

Finally, in some of the games no changes

were found since both distance estimates performed

equally well. However, rather speciﬁc heuristics and

analysis methods of ﬂux basic could be replaced by

our new general approach. For example, the original

Fluxplayer contains a special method to detect when

a ﬂuent is unreachable, while this information is au-

tomatically included in our distance estimate.

Apart from the above results, we intended to use

more games for evaluation, however, we found that

the ﬂuent graph construction takes too much time in

some games where the next rules are complex. We

discuss these issues in Section 8.

7 RELATED WORK

Distance features are part of classical agent program-

ming for games like chess and checkers in order to

measure, e.g., the distance of a pawn to the promotion

rank. A more general detection mechanism was ﬁrst

employed in Metagamer (Pell, 1993) where the fea-

tures “promote-distance” and “arrival-distance” rep-

resented a value indirectly proportional to the distance

of a piece to its arrival or promotion square. However,

due to the restriction on symmetric chess-like games,

boards and their representation were predeﬁned and

thus predeﬁned features could be applied as soon as

some promotion or arrival condition was found in the

game description.

Currently, a number of GGP agent sys-

tems apply distance features in different forms.

UTexas (Kuhlmann et al., 2006) identiﬁes order

relations syntactically and tries to ﬁnd 2d-boards with

coordinates ordered by these relations. Properties

of the content of these cells, such as minimal/max-

imal x- and y-coordinates or pair-wise Manhattan

distances are then assumed as candidate features

and may be used in the evaluation function. Flux-

player (Schiffel and Thielscher, 2007) generalizes

the detection mechanism using semantic properties

of order relations and extends board recognition to

arbitrarily deﬁned n-dimensional boards.

Another approach is pursued by Clune-

player (Clune, 2007) who tries to impose a symbol

distance interpretation on expressions found in

the game description. Symbol distances, however,

are again calculated using Manhattan distances on

ordered arguments of board-like ﬂuents, eventually

resulting in a similar distance estimate as UTexas and

Fluxplayer.

Although not explained in detail, Ogre (Kaiser,

2007) also employs two features that measure the dis-

tance from the initial position and the distance to a

target position. Again, Ogre relies on syntactic detec-

tion of order relations and seems to employ a board

centered metrics, ignoring the piece type.

All of these approaches rely on the identiﬁcation

of certain ﬁxed structures in the game (such as game

boards) but can not be used for ﬂuents that do not

belong to such a structure. Furthermore, they make

assumptions about the distances on these structures

(usually Manhattan distance) that are not necessarily

connected to the game dynamics, e.g., how different

pieces move on a board.

In domain independent planning, distance heuris-

tics are used successfully, e.g., in HSP (Bonet and

Geffner, 2001) and FF (Hoffmann and Nebel, 2001).

The heuristics h(s) used in these systems is an ap-

proximation of the plan length of a solution in a re-

laxed problem, where negative effects of actions are

ignored. This heuristics is known as delete list re-

laxation. While on ﬁrst glance this may seems very

similar to our approach, several differences exist:

• The underlying languages, GDL for general game

playing and PDDL for planning, are different. A

translation of GDL to PDDL is expensive in many

games (Kissmann and Edelkamp, 2010). Thus,

ICAART 2012 - International Conference on Agents and Artificial Intelligence

134

directly applying planning systems is not often not

feasible.

• The delete list relaxation considers all (positive)

preconditions of a ﬂuent, while we only use one

precondition. This enables us to precompute the

distance between the ﬂuents of a game.

• While goal conditions of most planning prob-

lems are simple conjunctions, goals in the general

games can be very complex (e.g., checkmate in

chess). Additionally, the plan length is usually not

a good heuristics, given that only the own actions

and not those of the opponents can be controlled.

Thus, distance estimates in GGP are usually not

used as the only heuristics but only as a feature in

a more complex evaluation function. As a conse-

quence, computing distance features must be rel-

atively cheap.

• Computing the plan length of the relaxed planning

problem is NP-hard, and even the approximations

used in HSP or FF that are not NP-hard require to

search the state space of the relaxed problem. On

the other hand, computing distance estimates with

our solution is relatively cheap. The distances

∆

( f ,g) between all ﬂuents f and g in the ﬂuent

graph can be precomputed once for a game. Then,

computing the distance ∆(s, f

) (see Deﬁnition 3)

is linear in the size of the state s, i.e., linear in the

number of ﬂuents in the state.

8 FUTURE WORK

The main problem of the approach is its computa-

tional cost for constructing the ﬂuent graph. The most

expensive steps of the ﬂuent graph construction are

grounding of the DNF formulas φ and processing the

resulting large formulas to select edges for the ﬂuent

graph. For many complex games, these steps cause

either out-of-memory or time-out errors. Thus, an im-

portant line of future work is to reduce the size of for-

mulas before the grounding step without losing rele-

vant information.

One way to reduce the size of φ is a more selec-

tive expansion of predicates (line 5) in Algorithm 1.

Developing heuristics for this selection of predicates

is one of the goals for future research.

In addition, we are working on a way to construct

ﬂuent graphs from non-ground representations of the

preconditions of a ﬂuent to skip the grounding step

completely. For example, the partial ﬂuent graph in

Figure 1 is identical to the ﬂuent graphs for the other

8 cells of the Tic-Tac-Toe board. The ﬂuent graphs

for all 9 cells are obtained from the same rules for

next(cell(X,Y,_), just with different instances of the

variables X and Y. By not instantiating X and Y, the gen-

erated DNF is exponentially smaller while still con-

taining the same information.

The quality of the distance estimates depends

mainly on the selection of preconditions. At the mo-

ment, the heuristics we use for this selection are intu-

itive but have no thorough theoretic or empiric foun-

dation. In future, we want to investigate how these

heuristics can be improved.

Furthermore, we intend to enhance the approach

to use ﬂuent graphs for generalizations of other types

of features, such as, piece mobility and strategic posi-

tions.

9 SUMMARY

We have presented a general method of deriving dis-

tance estimates in General Game Playing. To obtain

such a distance estimate, we introduced ﬂuent graphs,

proposed an algorithm to construct them from the

game rules and demonstrated the transformation from

ﬂuent graph distance to a distance feature.

Unlike previous distance estimations, our ap-

proach does not rely on syntactic patterns or internal

simulations. Moreover, it preserves piece-dependent

move patterns and produces an admissible distance

heuristic.

We showed on an example how these distance fea-

tures can be used in a state evaluation function. We

gave two examples on how distance estimates can im-

prove the state evaluation and evaluated our distance

against Fluxplayer in its most recent version.

Certain shortcomings should be addressed to im-

prove the efﬁciency of ﬂuent graph construction and

the quality of the obtained distance function. Despite

these shortcomings, we found that a state evaluation

function using the new distance estimates can com-

pete with a state-of-the-art system.

REFERENCES

Bonet, B. and Geffner, H. (2001). Planning as heuristic

search. Artiﬁcial Intelligence, 129(1–2):5–33.

Clune, J. (2007). Heuristic evaluation functions for general

game playing. In AAAI, Vancouver. AAAI Press.

Hoffmann, J. and Nebel, B. (2001). The FF planning sys-

tem: Fast plan generation through heuristic search.

JAIR, 14:253–302.

Kaiser, D. M. (2007). Automatic feature extraction for

autonomous general game playing agents. In Pro-

ceedings of the Sixth Intl. Joint Conf. on Autonomous

Agents and Multiagent Systems.

DISTANCE FEATURES FOR GENERAL GAME PLAYING AGENTS

135

Kissmann, P. and Edelkamp, S. (2010). Instantiating gen-

eral games using prolog or dependency graphs. In

Dillmann, R., Beyerer, J., Hanebeck, U. D., and

Schultz, T., editors, KI, volume 6359 of Lecture Notes

in Computer Science, pages 255–262. Springer.

Kuhlmann, G., Dresner, K., and Stone, P. (2006). Automatic

Heuristic Construction in a Complete General Game

Player. In Proceedings of the Twenty-First National

Conference on Artiﬁcial Intelligence, pages 1457–62,

Boston, Massachusetts, USA. AAAI Press.

Love, N., Hinrichs, T., Haley, D., Schkufza, E., and Gene-

sereth, M. (2008). General game playing: Game de-

scription language speciﬁcation. Technical Report

March 4, Stanford University. The most recent ver-

sion should be available at http://games.stanford.edu/.

Pell, B. (1993). Strategy generation and evaluation for

meta-game playing. PhD thesis, University of Cam-

bridge.

Schiffel, S. and Thielscher, M. (2007). Fluxplayer: A suc-

cessful general game player. In Proceedings of the

National Conference on Artiﬁcial Intelligence, pages

1191–1196, Vancouver. AAAI Press.

Schiffel, S. and Thielscher, M. (2009a). Automated theorem

proving for general game playing. In Proceedings of

the International Joint Conference on Artiﬁcial Intel-

ligence (IJCAI).

Schiffel, S. and Thielscher, M. (2009b). A multiagent se-

mantics for the game description language. In Fil-

ipe, J., Fred, A., and Sharp, B., editors, Interna-

tional Conference on Agents and Artiﬁcial Intelli-

gence (ICAART), Porto. Springer.

ICAART 2012 - International Conference on Agents and Artificial Intelligence

136