Exotic Bets: Evolutionary Computing Coupled with Bet Mechanisms for
Model Selection
Simon Reichhuber (https://orcid.org/0000-0001-8951-8962) and Sven Tomforde (https://orcid.org/0000-0002-5825-8915)
Intelligent Systems, Kiel University, Hermann-Rodewald-Str. 3, 24118 Kiel, Germany
Keywords:
Evolutionary Algorithms, Bet-Based Learning, Model Selection, Gradient-Free Optimisation.
Abstract:
This paper presents a framework for the application of an external bet-based evolutionary algorithm to the
problem of model selection. In particular, we have defined two new risk functions, called sample space exotic-
ness and configuration space exoticness. The latter is used to manage the risk of bet placement. Further, we
explain how to implement the bet-based approach for model selection in the domain of multi-class classifica-
tion and experimentally compare the performance of the algorithm to reference derivative-free hyperparameter
optimisers (GA and Bayesian Optimisation) on MNIST. Finally, we experimentally show that for the classifiers
SVM, MLP, and Nearest Neighbors the balanced accuracy can be increased by up to three percentage points.
1 INTRODUCTION
The field of Evolutionary Algorithms (EAs) has been expanded intensively throughout the last decades.
Similar to other nature-inspired algorithms such as Neural Networks, the design of these algorithms
followed a bottom-up approach in which only the main ideas from nature were simulated: in Neural
Networks the neural structure, in EAs the encoding scheme for bits of information as chromosomes.
With increasing computing capabilities, researchers have extended the basic concept of EAs to
solve more complex problems, turning these methods into one of the standard approaches for
gradient-free optimisation.
At the same time, the underlying methodology of EAs has remained the same: the solution of the
optimisation problem is evolved from a randomly generated population with the help of the operators
selection, recombination, and mutation, applied successively over several generations to form a
"fitter" population. The term "fitness" indicates how close a chromosome of the population has
come to the optimum of the optimisation problem.
Since the idea of ”finding the fittest” of a large
population is also well-known in the domain of bet-
ting, we proposed to augment the concept of the basic
EA by introducing another population, the so-called
"bet population" (Reichhuber and Tomforde, 2021; Reichhuber and Tomforde, 2022). Both populations
are evolved like normal EAs, but the bet population consists of individuals that have insight into
the solution space. With this, they can place bets (here represented as Gaussians) on all individuals
that are located in the vicinity of the bet position. These individuals benefit from the bets by an
increase of their fitness. Conversely, a bet individual wins or loses depending on whether the fitness
of the individuals in the vicinity of the bet location has increased or decreased since the last
generation.
In this paper, we extend and refine this idea and adapt the concept to the model selection problem.
The underlying idea is that this adaptation tackles two major problems of EAs: First, the diversity
of the fitness of the population stays healthy, i.e. the trade-off between exploring weak solutions
and exploiting and refining the best solutions found so far is maintained. Second, the diversity of
the locations of the solutions is also increased, which is especially interesting for optimisation
problems with multiple local optima, as is the case in model selection.
For the evaluation of the novel approach, we have
chosen the field of model selection for multi-class
classification and compared our approach to other
model selection algorithms, such as standard Genetic
Algorithms (GAs) or Bayesian Optimisation (BO).
We propose two novel exoticness metrics that are used
for risk management of the betting process and show
experimentally the resulting behaviour.
The remainder of the paper is organised as fol-
lows. Section 2 discusses related work in the area
of Model Selection based on evolutionary algorithms,
also covering preliminary work on bet-based evolu-
tionary algorithms. Section 3 explains our approach
and Section 4 evaluates the methodology in compari-
son to other approaches with the goal of evolving suitable hyperparameters¹ for classifiers. Finally, Sec-
tion 5 summarises the paper and gives an outlook to
future work.
2 RELATED WORK
In 1988, Peter Asch and Richard Quandt analysed empirical data on horse racetrack bets and reported
some interesting results (Asch and Quandt, 1988). They found evidence that the totals of all bets,
which result from individual win probability estimations, are good approximations of the objective
win probabilities. In this sense, the winning probability of a single horse has no Markov property:
even if its last run was unsuccessful, the horse may simply not have met its winning conditions and
may normally be a hidden champion. The trust in such hidden champions may result in so-called
long-shot bets that require the perseverance to lose money over multiple unsuccessful bet placements.
The authors found tendencies towards a betting bias in which favourites are underbet and long-shots
are overbet. These findings presumably inspired the term 'exotic' bets in the title of their paper.
The emergent effect of approximating the win probabilities by the sum of the subjective bets of
different betting agents inspired us to simulate the process of betting. For the sake of generality,
we replaced the horse racing scenario with an evolutionary environment with an arbitrary fitness
function that shall be optimised (Reichhuber and Tomforde, 2021). In this paper, we focus on Model
Selection in the classification scenario, based on the architecture of a Genetic Algorithm (GA).
John Holland published his idea of implementing GAs and refined it to a large extent (Holland, 1975;
Booker et al., 1989). Afterwards, several extensions have been proposed. Inspired by evolution, the
basic idea was to mimic a natural selection process through selection, recombination, and mutation.
After a certain number of generations, the procedure allows the creation of individuals that are
better adapted to their environment.
¹ The terms hyperparameter and model are used synonymously in this paper. The same holds for
hyperparameter optimisation and model selection.
The adaptation of GAs to real numbers and modifications of the mutation and recombination strategies
that reduce the number of generations, and hence the number of fitness function calls, turned the
algorithm into a general-purpose tool for derivative-free multi-variate optimisation. Consequently,
many scientists in the domain of Machine Learning (ML) have made suggestions on how to adapt GAs to
the task of model selection, and have investigated to which model selection environments in Machine
Learning GAs are applicable.
In the literature, one can find multiple applications of GAs to Model Selection. For example,
(Guerbai et al., 2022) applied one-class support vector machines in the scenario of Novelty Detection
and multi-class classification, and (Buchtala et al., 2005) evolved radial basis function classifiers
for data mining applications. Furthermore, the authors of (Young et al., 2015) applied EAs to deep
neural networks with multiple convolutional and fully-connected layers. Similar to our approach, the
authors of (de Lacerda et al., 2002) used bootstrapping for modelling the true prediction error, but
apart from this paper, most of the architectures rely on the basic GA as presented by Holland
(Holland, 1975). However, there are multiple suggested architectures, such as the Island/Coarse-grained
Model described in (Gong et al., 2015) or cellular distributed EAs, which are analysed in (Giacobini
et al., 2003; Giacobini et al., 2004), for example regarding their takeover times, i.e. the time a
dominant individual needs to occupy the whole population.
There are also other types of EAs, which differ, among other things, in the encoding of individuals.
For example, Thomas Bäck (Bäck and Schwefel, 1993) categorised three types of EAs: Evolution
Strategies, Evolutionary Programming, and Genetic Algorithms (Forrest, 1996; Grefenstette, 1993;
Bies et al., 2006; Paterlini and Minerva, 2010; Lessmann et al., 2006; Devos et al., 2014; de Lacerda
et al., 2002). Bet mechanics were first applied to EAs in the form of a bet mechanism in which an
external bet population coexists with the main population and is able to place bets on it (Reichhuber
and Tomforde, 2021; Reichhuber and Tomforde, 2022). Each individual in the bet population represents
a certain bet strategy and places bets on the main population. These bets directly influence the
fitness of the main population for the sake of high diversity.
Betting is not the only methodology that steers the evolution of GAs from an external source. In
(Guerbai et al., 2022), for instance, the evolutionary process is steered with the help of
temperature-based control; here, EAs are used to optimise the parameters of radial basis networks
for novelty detection and multi-class classification. Beyond evolutionary methods, Bayesian
Optimisation is another widely used derivative-free approach to model selection (Frazier, 2018;
Snoek et al., 2012; Shahriari et al., 2015), and evolutionary search has also been coupled with
reinforcement learning (Bodnar et al., 2020).
3 METHODOLOGY
In the following, we describe the methodology of model selection for classification by means of
bet-based evolutionary algorithms. Mathematically, we solve an optimisation problem (see Equation 1),
in which we want to find the optimal hyperparameter θ* ∈ Θ among a predefined set of hyperparameters
Θ used to train a multi-class classifier f_{θ, X_train, Y_train}: X → Y, which has been trained on
the training data (X_train, Y_train).
Furthermore, a test error function test-error: X × Y × Y → R is given, measuring the classification
error, i.e. a quantification of the difference between the true labels Y_test and the predicted
labels f_θ(X_test) ∈ Y. With these ingredients, we can define the optimisation problem:

    θ* = argmin_{θ ∈ Θ} test-error(X_test, Y_test, f_{θ, X_train, Y_train}(X_test)).   (1)
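To make the objective concrete, the following minimal Python sketch evaluates Equation 1 for a single
candidate configuration θ, assuming a scikit-learn classifier and using 1 − balanced accuracy as the
test-error; the function name and the dictionary keys of θ are illustrative only.

# Minimal sketch: evaluating the objective of Equation 1 for one candidate
# hyperparameter configuration theta (illustrative names, scikit-learn assumed).
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

def evaluate_test_error(theta, X_train, y_train, X_test, y_test):
    """Train f_theta on the training data and return the test error
    (here: 1 - balanced accuracy, so that smaller is better)."""
    clf = SVC(kernel="poly", C=theta["C"], degree=theta["d"],
              coef0=theta["r"], gamma=theta["gamma"])
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    return 1.0 - balanced_accuracy_score(y_test, y_pred)

# Model selection then amounts to minimising this function over theta in Theta.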
Due to the size of the parameter space and the computational complexity of the test error, we
iteratively search for a near-optimal solution within a limited computation time. After a rough
description of the basic procedure, the novel genetic operators are explained in detail:
initialisation, encoding, and fitness calculation. At the end of the section, we describe the bet
placement mechanism and explain the two exoticness metrics used for risk management of betting.
3.1 Procedure of External Bet-Based
Evolutionary Algorithms for Model
Selection
To find the D optimal hyperparameters, i.e. the optimal hyperparameter configuration θ* ∈ R^D of
Equation 1, we establish a bet-based Evolutionary Algorithm (BEA). By that, we refer to the
minimisation of the test error of a machine learning model through an evolutionary algorithm steered
by an external population. Figure 1 visualises the basic idea of the two coexisting populations and
their interaction during two iterations. The separation of both populations is taken from (Reichhuber
and Tomforde, 2022) with a few adaptations (highlighted in green in Figure 1).
[Figure 1: External bet-based Evolutionary Algorithm procedure applied to the Model Selection task.
Components shown over two iterations: main population, bet population, training data with
bootstrapped subsets, test-error-based fitness, bets, winnings/losses, GA operations, and the
computation of the Configuration Space Exoticness and the Sample Space Exoticness.]
In the first iteration, a population is initialised, which we denote the main population P_main.
Additionally, we create the coexisting population denoted the bet population P_bet. Both populations
are drawn uniformly at random from the hyperparameter ranges, where θ_min, θ_max ∈ R^D and the sigma
step ranges σ_min, σ_max ∈ R^D are given a priori. Afterwards, a new, small subset (X_b, Y_b) is
drawn from the training data (X_train, Y_train), which we call the bootstrapped set. Then, each
individual of the main population is evaluated according to the test error measuring the difference
between the test data (X_test, Y_test) and the predictions of a trained classifier. The latter is
equipped with the parameters of the individual and was trained on the bootstrapped set.
Taking into account the risk of the current bet placement, which we explain later, each individual
j = 1, ..., N_bet of the bet population places bets on the parameter space, represented by a Gaussian
with diagonal covariance, i.e. N(x | µ_j, σ_j) with µ_j, σ_j ∈ R^D. Individuals of the main population
benefit from a fitness increase when they are located in the vicinity of the bets placed by the bet
population. The exact calculation can be seen in Algorithm 1.
Algorithm 1: Procedure of exotic bet-based model selection.

function EXOTIC-BET-BASED-MODEL-SELECTION(f, θ_min, θ_max, σ_min, σ_max, X_train, Y_train, X_test, Y_test)
    ▷ Normalise the data
    X_train ← norm(X_train, µ_train, σ_train)
    X_test  ← norm(X_test, µ_train, σ_train)
    ▷ Initialise main and bet population
    P_main^(0) ← {x_i = (x_{i,d})_{d=1..D}}_{i=1..N_main},  with x_{i,d} ~ U([θ_{min,d}, θ_{max,d}])
    P_bet^(0)  ← {b_j = (µ_{j,d}, σ_{j,d})_{d=1..D}}_{j=1..N_bet},
                 with µ_{j,d} ~ U([θ_{min,d}, θ_{max,d}]) and σ_{j,d} ~ U([σ_{min,d}, σ_{max,d}])
    for each g = 1, ..., #Generations do
        ▷ Draw new bootstrap subset
        X_b, Y_b ← draw(X_train, Y_train)
        ▷ Calculate Sample Space Exoticness
        SSE ← 1 − LDOVL(X_train, X_b)
        for each i = 1, ..., N_main do
            ▷ Calculate fitness of the main population
            F_main,i ← −test-error(X_test, Y_test, f_{x_i, X_b, Y_b}(X_test))
            ▷ Calculate Configuration Space Exoticness
            CSE_i ← 1 − N(x_i | P_main)
            ▷ Calculate the individual risk
            risk_i ← (1 − SSE) · CSE_i
            ▷ Calculate bets
            bet_i ← risk_i · (1/N_bet) · Σ_{j=1..N_bet} N(x_i | µ_j, σ_j)
            ▷ Add individual bet to the fitness
            F_main,i ← F_main,i + bet_influence · bet_i
        end for
        for each j = 1, ..., N_bet do
            ▷ Calculate fitness of the bet population
            F_bet,j ← (1/N_main) · Σ_{i=1..N_main} N(x_i | µ_j, σ_j)
        end for
        ▷ Apply GA operations: selection, recombination, mutation
        P_main^(g) ← GA-Operation(P_main^(g−1), F_main)
        P_bet^(g)  ← GA-Operation(P_bet^(g−1), F_bet)
    end for
    θ* ← argmax_{x ∈ P_main^(g)} −test-error(X_test, Y_test, f_{x, X_train, Y_train}(X_test))
    return θ*
end function
To calculate the risk of betting, the bootstrapped subset is analysed by each individual of the bet
population to estimate an individual value of exoticness called the Sample Space Exoticness.
Additionally, given the deviation of the bet individual's own guess from all other guesses (the
Configuration Space Exoticness), each bet individual is able to calculate its own risk of the bet
placement as follows:
    bet_i = risk_i · (1/N_bet) · Σ_{j=1}^{N_bet} N(x_i | µ_j, σ_j),   (2)

where the risk is a combination of SSE and CSE:

    risk_i = (1 − SSE) · CSE_i.   (3)
The calculated bets are then added to the main population's fitness. Afterwards, correctly predicted
gains or losses of fitness in the main population are used as the fitness of the bet population.
After the evaluation of the fitness of both populations, the genetic operations are applied; they
are listed below, explained in the following, and sketched in code after the list.
1) Selection of the parents, for example by stochastic universal sampling
2) Recombination of the parents to obtain the children population, called offspring, e.g. with the
   help of uniform crossover
3) Mutation of all individuals besides a small fraction representing the elite individuals, e.g.
   with one-step mutations
4) Declaring the parental population and the offspring as the next generation
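A minimal, generic sketch of these three operations on real-valued individuals could look as follows;
this is an illustrative implementation with assumed parameter choices, not necessarily the exact
operators used in our experiments.

# Illustrative sketch of the listed GA operations on a real-valued population.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_universal_sampling(fitness, n_parents):
    """Select parent indices with probability proportional to (shifted) fitness."""
    f = fitness - fitness.min() + 1e-12          # shift to non-negative values
    cum = np.cumsum(f / f.sum())
    start = rng.uniform(0, 1.0 / n_parents)
    pointers = start + np.arange(n_parents) / n_parents
    return np.minimum(np.searchsorted(cum, pointers), len(f) - 1)

def uniform_crossover(parent_a, parent_b):
    """Each gene is taken from either parent with equal probability."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)

def mutate(individual, low, high, p_mut=0.01, step=0.1):
    """One-step mutation: perturb each gene with probability p_mut, then clip."""
    mask = rng.random(individual.shape) < p_mut
    perturbed = individual + rng.normal(0, step, size=individual.shape)
    return np.clip(np.where(mask, perturbed, individual), low, high)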
3.2 Genetic Operators
In the following, we point out the differences between our approach and the EA presented in
(Reichhuber and Tomforde, 2021). Following the general procedure of an EA, we describe each step
in detail:
3.2.1 Initialisation
In the first step, the individuals of both populations are drawn uniformly from the configuration
space, x_i ~ U([x_min, x_max]) ⊂ R^D. Here, the limits of the hyperparameters have to be defined in
advance. Extensions to a dynamically increasing feature space are possible but have not been
considered further in this paper.
3.2.2 Encoding
In the main population, each individual represents a specific hyperparameter configuration.
Depending on the chosen classifier, which requires D hyperparameters, we encode the individual
i ∈ {1, ..., N_main} of the main population of size N_main as:

    x_i = (x_i[1], ..., x_i[D]).
On the other hand, we encode the individuals of the bet population j ∈ {1, ..., N_bet} such that each
bet individual consists of two components: the estimated guess of the optimum µ ∈ R^D and the
certainty about this guess σ ∈ R^D. Equipped with these parameters, the bet individual is able to
calculate a so-called Sample Space Exoticness (SSE) and Configuration Space Exoticness (CSE), which
are explained in more detail in Section 3.3.
[Figure 2: Encoding and interactions of the main population and the bet population. Both populations
consist of individuals, parents, and children; the main population lives in the configuration space,
while the bet population additionally relates to the sample space.]

In Figure 2, the encoding and interactions of both populations are depicted. The bet-based parameters
are abbreviated as:

    b_j = (b_j[1], ..., b_j[2D]).
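As an illustration, the two encodings can be represented as flat numpy arrays; the dimension D, the
ranges, and the variable names below are placeholders.

# Sketch of the two encodings: a main individual is a point in the D-dimensional
# configuration space, a bet individual additionally carries a per-dimension
# certainty (illustrative representation).
import numpy as np

D = 4  # e.g. C, r, d, gamma for a polynomial SVM (placeholder)

rng = np.random.default_rng(0)
theta_min, theta_max = np.zeros(D), np.ones(D)
sigma_min, sigma_max = 0.01 * np.ones(D), 0.5 * np.ones(D)

# Main individual: x_i = (x_i[1], ..., x_i[D])
x_i = rng.uniform(theta_min, theta_max)

# Bet individual: b_j = (mu_j, sigma_j), i.e. 2D values in total
mu_j = rng.uniform(theta_min, theta_max)
sigma_j = rng.uniform(sigma_min, sigma_max)
b_j = np.concatenate([mu_j, sigma_j])   # b_j[1], ..., b_j[2D]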
3.2.3 Fitness Calculation: Main Population
The fitness function aims at an estimation of the balanced test accuracy of the classification
problem. For most models, it is computationally infeasible to train and test each individual on the
whole given sample data; e.g. the matrix inversion involved in finding the optimal hyperplane of a
support vector machine has time complexity O(n³) and cannot deal with magnitudes of millions.
Therefore, we test each individual on a small subset. The problem of approximating the test error
from a small subset was pointed out by (Lessmann et al., 2006), who argue that finding the correct
hyperparameters for support vector machines is quite hard when no validation set is available. To
make the most of the subset, we use k-fold cross-validation as in (de Lacerda et al., 2002). Each
individual's hyperparameter choice is evaluated on a small subset of the training data, which is
drawn uniformly at random with replacement. The training data represent a standard classification
problem consisting of N measurements x_i ∈ X and their target values y_i ∈ Y. In the case of
classification, we suggest the balanced accuracy as the validation metric.
Since the computing time for the fitness calculation crucially influences the total computing time
(the fitness has to be calculated in each iteration for each individual), it should take as little
time as possible; we therefore consider the balanced accuracy on a bootstrapped set as a
computationally low-cost estimate of the test error. To avoid overfitting to a specific bootstrap
set, we repeatedly draw a novel set in each iteration. Using bootstrapping, we draw N_b << N_train
samples from the training data and evaluate the balanced accuracy score for each configuration on
them. In addition to the negative test error, the fitness is refined by the bets:

    F_main,i ← F_main,i + bet_influence · bet_i,   (4)

where bet_influence ∈ R⁺₀ is a parameter that controls the intensity of betting.
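A minimal sketch of this bootstrapped fitness estimate, assuming scikit-learn and illustrative
parameter names, is given below; the chosen classifier, bootstrap size, and fold count are
placeholders.

# Sketch: draw a small bootstrap subset with replacement, score it with
# k-fold cross-validated balanced accuracy, and use the result as fitness.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def bootstrapped_fitness(theta, X_train, y_train, n_b=1000, k=5):
    idx = rng.choice(len(X_train), size=n_b, replace=True)
    X_b, y_b = X_train[idx], y_train[idx]
    clf = SVC(kernel="rbf", C=theta["C"], gamma=theta["gamma"])
    scores = cross_val_score(clf, X_b, y_b, cv=k, scoring="balanced_accuracy")
    return scores.mean()   # low-cost estimate of the balanced accuracy

# Eq. 4: F_main_i = bootstrapped_fitness(theta_i, ...) + bet_influence * bet_i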
3.2.4 Fitness Calculation: Bet Population
The fitness of the bet population is then defined as:

    F_bet(b_j) = (1/N_main) · Σ_{i=1}^{N_main} N(x_i | µ_j, σ_j).   (5)
3.3 Sample Space Exoticness and
Configuration Space Exoticness
Each bet individual places its bets according to a risk management based on the Configuration Space
Exoticness and the Sample Space Exoticness. The Configuration Space Exoticness expresses the
deviation of a hyperparameter configuration from the main population, whereas the Sample Space
Exoticness refers to the deviation of the bootstrapped set from the training set.
3.3.1 Configuration Space Exoticness
Based on the hyperparameter configuration x_i represented by an individual of the main population,
the bet individual calculates the Configuration Space Exoticness (CSE) value for each individual.
This value represents the deviation of the hyperparameter configuration from all other hyperparameter
configurations of the main population and is measured via a normal distribution, as shown in
Equation 6:

    CSE = 1 − N(x | P_main) = 1 − N(x | µ_{P_main}, σ_{P_main}),   (6)

where the means and variances of the main population are defined as follows:

    µ_{P_main} = (1/N_main) · Σ_{x ∈ P_main} x,

    σ_{P_main} = (1/N_main) · Σ_{x ∈ P_main} (x − µ_{P_main})².
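A possible numpy/scipy sketch of Equation 6, assuming that the population Gaussian uses a diagonal
covariance built from the per-dimension variances, is:

# Sketch of the Configuration Space Exoticness (Eq. 6): deviation of one
# configuration from the main population, modelled by a diagonal Gaussian
# fitted to the population (illustrative implementation).
import numpy as np
from scipy.stats import multivariate_normal

def configuration_space_exoticness(x, P_main):
    """CSE(x) = 1 - N(x | mean(P_main), var(P_main)) with diagonal covariance."""
    mu = P_main.mean(axis=0)
    var = P_main.var(axis=0) + 1e-12      # avoid a singular covariance
    density = multivariate_normal.pdf(x, mean=mu, cov=np.diag(var))
    return 1.0 - density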
3.3.2 Sample Space Exoticness
On the other hand, for each bootstrapped subset X_b drawn at the beginning of a generation, a bet
individual can calculate its Sample Space Exoticness value SSE(X_b) as shown in Equation 7:

    SSE(X_b) = 1 − LDOVL(X_train, X_b).   (7)
The function LDOVL is a special metric for the comparison of two distributions, which we designed for
the requirements of exotic BEAs. When calculating the similarity of distributions in the configuration
space, a major problem arises: on the one hand, hyperparameters can be sensitive to slight changes;
on the other hand, local maxima can be far from each other. This means that the similarity measure
has to be sensitive to overlapping Gaussians in the vicinity of each other (as holds for the joint
probability or the overlapping coefficient (OVL) (Inman and Bradley, 1989)) and simultaneously has to
be capable of measuring far-distant Gaussians (like the Euclidean distance of the means or the RMSE
of the means and standard deviations) without the risk of vanishing similarity due to numerical
instabilities.
Therefore, we have developed a so-called long-distant overlapping coefficient (LDOVL).
LDOVL. To measure slight differences between two distributions, which we assume in the case of
bootstrapping, we base our approach on the overlapping coefficient (OVL) (Inman and Bradley, 1989).
Given two normal distributions N(·|µ_1, σ_1) and N(·|µ_2, σ_2), the overlapping coefficient is
calculated as follows:

    OVL(µ_1, σ_1, µ_2, σ_2) = ∫ min(N(x|µ_1, σ_1), N(x|µ_2, σ_2)) dx.   (8)
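For illustration, the OVL of Equation 8 can be approximated by numerical integration (a closed form
via the intersection points of the two densities exists as well); the following sketch is
illustrative only.

# Sketch: overlapping coefficient (Eq. 8) of two normal densities,
# approximated by numerical integration over a generous finite interval.
from scipy.integrate import quad
from scipy.stats import norm

def ovl(mu1, s1, mu2, s2):
    f = lambda x: min(norm.pdf(x, mu1, s1), norm.pdf(x, mu2, s2))
    lo = min(mu1 - 10 * s1, mu2 - 10 * s2)
    hi = max(mu1 + 10 * s1, mu2 + 10 * s2)
    value, _ = quad(f, lo, hi)
    return value

print(ovl(0, 1, 0, 1))   # identical densities -> approx. 1.0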
One drawback of the OVL is that it vanishes for long-distant distributions, where
|µ_1 − µ_2| >> σ_1 + σ_2. To get around this problem, we introduce two areas: the touching area and
the long-distant area. The touching area is defined for distributions with
|µ_1 − µ_2| ≤ k·(σ_1 + σ_2), where k is a multiplier for the standard deviations. Within this range,
we refer to touching distributions and derive a normalised version of the overlapping coefficient
that is not vanishingly small. The normalised version of the OVL is derived as follows:

    OVL_0(µ_1, σ_1, µ_2, σ_2) =
        (OVL(µ_1, σ_1, µ_2, σ_2) − OVL_min(σ_1, σ_2)) / (OVL_max(σ_1, σ_2) − OVL_min(σ_1, σ_2)),   (9)

where OVL_min(σ_1, σ_2) = OVL(0, σ_1, 0 + k·(σ_1 + σ_2), σ_2)
and OVL_max(σ_1, σ_2) = OVL(0, σ_1, 0, σ_2).
In the formula above, we take the min-max nor-
malisation based on the minimum OVL value (cf. Fig-
ure 3a) and the maximum OVL value (cf. Figure 3b)
that is possible when shifting the second distribution
over the other without changing the standard devia-
tions.
[Figure 3: (a) Minimum overlapping. (b) Maximum overlapping.]
In contrast to the touching area, the long-distant area is only used for long-distant distributions,
where |µ_1 − µ_2| > k·(σ_1 + σ_2). Here, we use the proportion of the distance of both means
|µ_1 − µ_2| to the maximum feature range f_max − f_min and normalise this value with the min/max
normalisation.
Figure 4: Long-distant overlapping coefficient (LDOVL) of two normal distributions
N(µ_1 = 0, σ_1 = 1) and N(µ_2, σ_2 = 4), µ_2 ∈ [−40, 40]. The maximum LDOVL value is observed where
the means of the two distributions are equal, i.e. µ_1 = µ_2 = 0.
    LD_0(µ_1, σ_1, µ_2, σ_2) = 1 − (|µ_1 − µ_2| − k·(σ_1 + σ_2)) / (f_max − f_min − k·(σ_1 + σ_2)).   (10)
Finally, to account for a given feature range defined by a minimum/maximum feature value, i.e.
[f_min, f_max], we cap the coefficient to zero whenever one distribution lies outside the feature
range.
Combining all cases into one formula and introducing a weight λ to shift between the proportion of
touch distance and long distance, we obtain the long-distant overlapping coefficient (LDOVL):

    LDOVL(µ_1, σ_1, µ_2, σ_2) =
        0,                                           if I holds,
        (1 − λ) + λ · OVL_0(µ_1, σ_1, µ_2, σ_2),     if II holds,
        (1 − λ) · LD_0(µ_1, σ_1, µ_2, σ_2),          if III holds,
                                                                        (11)
where
    I:   µ_1 ∉ [f_min, f_max] or µ_2 ∉ [f_min, f_max],
    II:  |µ_1 − µ_2| ≤ k·(σ_1 + σ_2),
    III: k·(σ_1 + σ_2) < |µ_1 − µ_2| ≤ f_max − f_min,
and 0 ≤ λ ≤ 1.
For a better understanding, we plotted the LDOVL values of two distributions (see Figure 4). Here,
the first distribution remains constant (N(µ_1 = 0, σ_1 = 1)) and the mean value of the second
distribution (N(µ_2, σ_2 = 4)) is shifted over the range −40 to 40. We set the standard deviation
multiplier to k = 2, which means that the touching area lies between −k·(σ_1 + σ_2) = −2·(1 + 4) = −10
and +k·(σ_1 + σ_2) = 2·(1 + 4) = 10.
One-Dimensional Box Embedding. As the metric LDOVL is only applicable to one-dimensional point
distributions, we apply a one-dimensional box embedding, as visualised in Figure 5.

[Figure 5: One-dimensional box embedding.]
Table 1: Hyperparameter ranges for polynomial support vector machines (Poly SVM), radial basis
function support vector machines (RBF SVM), multi-layer perceptrons (MLP), and nearest neighbours
(NN). The Poly SVM is equipped with the kernel k(x, x') = (γ⟨x, x'⟩ + r)^d and the RBF SVM with the
kernel k(x, x') = exp(−γ||x − x'||²).

Classifier | Hyperparameter        | Parameter grid
Poly SVM   | C                     | Log([e^-3, e^3])
           | r                     | U([0, 100])
           | d                     | [1, ..., 6]
           | γ                     | Log([e^-3, e^3])
RBF SVM    | C                     | Log([e^-3, e^3])
           | d                     | [1, ..., 4]
           | γ                     | Log([e^-3, e^3])
           | pca_comp              | [5, ..., 200]
MLP        | layers                | {(300), (150, 150), (100, 200), (200, 100), (100, 100, 100),
           |                       |  (75, 75, 75, 75), (50, 100, 100, 50), (100, 50, 50, 100)}
           | α                     | Log([e^-6, e^-1])
           | batch_size            | {2^1, 2^2, ..., 2^6}
           | learning_rate         | {'constant', 'adaptive'}
           | learning_rate_init    | Log([e^-4, e^-1])
           | activation_function   | {'identity', 'logistic', 'tanh', 'relu'}
NN         | # neighbours          | [1, 2, 3]
           | weights               | ['uniform', 'distance']
           | p-norm                | U[1, 3]
4 EXPERIMENTAL EVALUATION
We conducted a showcase of using exotic bet-based Evolutionary Algorithms (exotic BEA) for
hyperparameter optimisation on the dataset mnist (see Table 2), using the classifiers from Table 1
with their most important parameters and a hyperparameter range with a specific scale for each. The
classifiers are taken from the Python library sklearn (version 1.1.1). All classifiers besides NN
are preceded by a normalisation consisting of a zero-mean, unit-standard-deviation transformation
and a min-max scaling to the range [−1, 1].
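In scikit-learn, this preprocessing can be expressed as a pipeline; the sketch below is illustrative
and the downstream classifier is a placeholder.

# Sketch of the described preprocessing: zero-mean/unit-variance scaling
# followed by min-max scaling to [-1, 1], in front of a classifier.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.svm import SVC

clf = make_pipeline(
    StandardScaler(),                        # zero mean, unit standard deviation
    MinMaxScaler(feature_range=(-1, 1)),     # scale to [-1, 1]
    SVC(kernel="rbf"),
)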
Table 2: Properties of the used dataset mnist.

Dataset                    | # Features             | (train size, test size)
mnist (LeCun et al., 2010) | (28, 28) gray images   | (60k, 10k)
Table 3: Hyperparameter Optimisation (HPO) comparison of the algorithms Genetic Algorithm (GA),
Exotic bet-based Evolutionary Algorithm (Exotic BEA), and Bayesian Optimisation (BO) on the mnist
dataset.

Classifier | HPO                            | Balanced accuracy
Poly SVM   | GA                             | 0.864
           | Exotic BEA (bet infl. = 5e4)   | 0.898
           | Exotic BEA (bet infl. = 8e4)   | 0.889
           | Bayesian Optimisation          | 0.844
RBF SVM    | GA                             | 0.925
           | Exotic BEA (bet infl. = 5e4)   | 0.948
           | Exotic BEA (bet infl. = 8e4)   | 0.925
           | Bayesian Optimisation          | 0.902
NN         | GA                             | 0.910
           | Exotic BEA (bet infl. = 5e4)   | 0.904
           | Exotic BEA (bet infl. = 8e4)   | 0.931
           | Bayesian Optimisation          | 0.901
MLP        | GA                             | 0.903
           | Exotic BEA (bet infl. = 5e4)   | 0.922
           | Exotic BEA (bet infl. = 8e4)   | 0.945
           | Bayesian Optimisation          | 0.892
Table 4: Parameters used for the evolutionary algorithm.

Parameter | Description                          | Value
N_P       | Population size                      | 100
N_G       | Number of generations                | 100
r_p       | Parents ratio                        | 50 %
r_e       | Elites ratio                         | 1 %
p_µ       | Mutation probability for means       | 1 %
p_σ       | Mutation probability for covariances | 1 %
For comparison, we investigated the following hyperparameter optimisers:
- Genetic Algorithm
- Bayesian Optimisation (Nogueira, 2014)
- Exotic BEA with bet_influence = 50000
- Exotic BEA with bet_influence = 80000
For the sake of comparability, all hyperparameter optimisers called the test-error function 1000
times; the EAs, for example, evaluated 100 main individuals over a period of 100 generations. The
bootstrap size was also set to 1000. The EA parameters from Table 4 were used for both the main and
the bet population.
For mnist, the history of the maximum test-error values over the evaluation of all EA algorithms can
be seen in Figure 7. It also indicates that the test-error on the bootstrap set is only a rough
estimate of the test error on the whole training set. In Figure 6, the effect of the betting process
on the main population's fitness over two generations is visualised.
Figure 6: Influence of betting on the fitness distribution of
the main population (bet influence= 50000).
Figure 7: History of maximum fitness values of 100 gener-
ations over 10 runs of Genetic Algorithms and exotic BEA
algorithms with various bet influence parameters. The
classifier Poly SVM was trained on mnist. The maximum
fitness values are sorted by fitness over all generations.
5 CONCLUSION
The paper has presented a framework for exploiting bet-based Evolutionary Algorithms to solve the
Model Selection task. In detail, we have defined two new risk functions, called sample space
exoticness and configuration space exoticness, the latter being used to manage the risk of bet
placement. We also evaluated the new model selector in different experimental scenarios and compared
it to standard Genetic Algorithms and Bayesian Optimisation; the comparison showed a slight advantage
of our approach in terms of balanced accuracy: for RBF SVMs evaluated on mnist, the Exotic BEA
achieved a balanced accuracy of 0.948 in comparison to Bayesian Optimisation (0.902) and standard
Genetic Algorithms (0.925).
REFERENCES
Asch, P. and Quandt, R. E. (1988). Betting bias in ‘ex-
otic’bets. Economics Letters, 28(3):215–219.
Bäck, T. and Schwefel, H.-P. (1993). An overview of evolutionary algorithms for parameter
optimization. Evolutionary Computation, 1(1):1–23.
Bies, R. R., Muldoon, M. F., Pollock, B. G., Manuck,
S., Smith, G., and Sale, M. E. (2006). A genetic
algorithm-based, hybrid machine learning approach
to model selection. Journal of pharmacokinetics and
pharmacodynamics, 33(2):195.
Bodnar, C., Day, B., and Liò, P. (2020). Proximal distilled evolutionary reinforcement learning. In
Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 3283–3290.
Booker, L. B., Goldberg, D. E., and Holland, J. H. (1989).
Classifier systems and genetic algorithms. Artificial
intelligence, 40(1-3):235–282.
Buchtala, O., Klimek, M., and Sick, B. (2005). Evolu-
tionary optimization of radial basis function classifiers
for data mining applications. IEEE Transactions on
Systems, Man, and Cybernetics, Part B (Cybernetics),
35(5):928–947.
de Lacerda, E., de Carvalho, A., and Ludermir, T. (2002).
A study of cross-validation and bootstrap as objec-
tive functions for genetic algorithms. In VII Brazilian
Symposium on Neural Networks, 2002. SBRN 2002.
Proceedings., pages 118–123.
Devos, O., Downey, G., and Duponchel, L. (2014). Si-
multaneous data pre-processing and svm classification
model selection based on a parallel genetic algorithm
applied to spectroscopic data of olive oils. Food chem-
istry, 148:124–130.
Forrest, S. (1996). Genetic algorithms. ACM Computing
Surveys (CSUR), 28(1):77–80.
Frazier, P. I. (2018). Bayesian optimization. In Recent ad-
vances in optimization and modeling of contemporary
problems, pages 255–278. Informs.
Giacobini, M., Alba, E., Tettamanzi, A., and Tomassini, M.
(2004). Modeling selection intensity for toroidal cel-
lular evolutionary algorithms. In Genetic and Evolu-
tionary Computation Conference, pages 1138–1149.
Springer.
Giacobini, M., Tomassini, M., and Tettamanzi, A. (2003).
Modeling selection intensity for linear cellular evo-
lutionary algorithms. In International Conference
on Artificial Evolution (Evolution Artificielle), pages
345–356. Springer.
Gong, Y.-J., Chen, W.-N., Zhan, Z.-H., Zhang, J., Li, Y.,
Zhang, Q., and Li, J.-J. (2015). Distributed evolution-
ary algorithms and their models: A survey of the state-
of-the-art. Applied Soft Computing, 34:286–300.
Grefenstette, J. J. (1993). Genetic algorithms and machine
learning. In Proceedings of the sixth annual confer-
ence on Computational learning theory, pages 3–4.
Guerbai, Y., Chibani, Y., and Meraihi, Y. (2022). Tech-
niques for selecting the optimal parameters of one-
class support vector machine classifier for reduced
samples. International Journal of Applied Meta-
heuristic Computing (IJAMC), 13(1):1–15.
Holland, J. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press,
Ann Arbor.
Inman, H. F. and Bradley Jr., E. L. (1989). The overlapping coefficient as a measure of agreement
between probability distributions and point estimation of the overlap of two normal densities.
Communications in Statistics - Theory and Methods, 18(10):3851–3874.
LeCun, Y., Cortes, C., and Burges, C. (2010). Mnist hand-
written digit database. ATT Labs [Online]. Available:
http://yann.lecun.com/exdb/mnist, 2.
Lessmann, S., Stahlbock, R., and Crone, S. F. (2006). Ge-
netic algorithms for support vector machine model se-
lection. In The 2006 IEEE International Joint Con-
ference on Neural Network Proceedings, pages 3063–
3069. IEEE.
Nogueira, F. (2014–). Bayesian Optimization: Open source
constrained global optimization tool for Python.
Paterlini, S. and Minerva, T. (2010). Regression model
selection using genetic algorithms. In Proceedings
of the 11th WSEAS international conference on neural networks and 11th WSEAS international confer-
ence on evolutionary computing and 11th WSEAS in-
ternational conference on Fuzzy systems, pages 19–
27. World Scientific and Engineering Academy and
Society (WSEAS).
Reichhuber, S. and Tomforde, S. (2021). Bet-based evolu-
tionary algorithms: Self-improving dynamics in off-
spring generation. In ICAART (2), pages 1192–1199.
Reichhuber, S. and Tomforde, S. (2022). Evolving gaus-
sian mixture models for classification. In ICAART (3),
pages 964–974.
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and
De Freitas, N. (2015). Taking the human out of the
loop: A review of bayesian optimization. Proceedings
of the IEEE, 104(1):148–175.
Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practi-
cal bayesian optimization of machine learning algo-
rithms. Advances in neural information processing
systems, 25.
Young, S. R., Rose, D. C., Karnowski, T. P., Lim, S.-H.,
and Patton, R. M. (2015). Optimizing deep learning
hyper-parameters through an evolutionary algorithm.
In Proceedings of the workshop on machine learning
in high-performance computing environments, pages
1–5.