A New Stopping Criterion for Genetic Algorithms
Christelle Reynes and Robert Sabatier
Laboratoire de Physique Industrielle et Traitement de l'Information, EA 2415, Université Montpellier 1,
15 Avenue Charles Flahault, 34070 Montpellier, France
Keywords:
Genetic Algorithm, Convergence, Stopping Criterion, Markov Chains, Simulation.
Abstract:
Obtaining theoretically legitimate stopping criteria is a difficult task, and being able to use such criteria, especially in a real-encoding context, remains an open problem. The proposed criterion is based on a Markov chain modelling and on the distribution of the number of occurrences of the locally best solution over several generations under the assumption of non-convergence. The algorithm stops when the probability of obtaining the observed number of occurrences becomes too small. The resulting criterion adapts to very different solution spaces and fitness functions (within the limitations studied) without requiring any user intervention.
1 INTRODUCTION
From a theoretical point of view, a Genetic Algorithm (GA) can be considered to have converged as soon as the global optimum is found. In practice, however, convergence can only be detected through the persistence of an optimum over several generations, and it is rarely soundly addressed. The proposed criterion takes into account both the number of generations without change of the current local optimum and the proportion of the population formed by the solution having the best fitness value.
Many studies have addressed the design of a theoretical framework to assess GA convergence. The most important approach to modelling GAs is probably the use of Markov chains (Davis and Principe, 1993).
The scope of the proposed criterion can be linked to approaches such as takeover time and runtime modelling. However, the former is based on what happens without crossover, whereas our objective is to model the whole behaviour of the GA. The latter concerns a much more theoretical framework than what is proposed here. Several studies have been published, such as (Storch, 2008), but their main goal is to increase knowledge about the behaviour and performance of algorithms rather than practical applications.
Some stopping criteria have been proposed, but most are based on binary encoding, with rare extensions to alphabets whose cardinality is restricted to $2^k$, as in (Aytug and Koehler, 2000). The criterion proposed in this paper follows the lead of this research, as it also uses the Markov chain formalism to derive its results by studying the expected behaviour of GAs.
The proposed criterion accepts a less rigorous theoretical framework but seeks applicability to as many cases as possible regarding encoding strategies, operator choices, fitness landscapes, etc. Of course, there are limitations, which are clearly explained in the following sections. It is important to notice that this criterion does not claim to guarantee that the optimal solution has been found; that is why it will be called a pseudo-convergence criterion. Obviously, for real optimization issues, it is impossible to ensure reaching the global optimum with heuristic methods, but it is of great interest to have criteria assessing the quality of the final solution. Here, the criterion is calibrated to detect convergence as quickly as possible, but finding a good-quality solution with high confidence is favoured over speed.
2 A NEW CRITERION FOR
PSEUDO-CONVERGENCE
2.1 Overview of the Stopping Criterion
Our starting point is the following observation: a so-
lution with a good fitness (locally or globally optimal)
is likely to gradually overrun the population. This is
due to selection, which favours survival of the best
solutions, and is likely to be strengthened by elitism.
Our stopping criterion is based on the number of occurrences of the locally best solution (denoted LBS) in the last populations. One occurrence is defined as one copy of the solution which currently achieves the best fitness value. As elitism is used, the LBS is the best solution found so far.
The principle can be illustrated through a small simulated example (described in section 3.1). Fig. 1 shows the evolution of the number of occurrences of the LBS over 400 successive generations. After the vertical line, the occurrences counted are those of the global optimum, whereas before the line, occurrences of local optima were counted. It can easily be seen that the number of occurrences of the LBS increases significantly after this appearance.
However, the convergence of the algorithm cannot be assessed by considering only one generation. Indeed, the stochastic nature of the algorithm involves constant fluctuations. Hence, the sum of the results of several successive generations will be used.
Conceptually, the criterion can be described as follows. The number of LBS occurrences will be modelled for one generation (denoted $S_1$) and for the sum of $w$ successive generations (denoted $S_w$, $w > 1$) under the hypothesis that the global optimum has not yet been found. After this modelling, it will be possible to associate a probability of obtaining an empirical value $s_{obs}$, $P(S_w = s_{obs})$, under this hypothesis. Thus, as the GA comes to convergence, the probability for $S_w$ to take the observed value will become very small (say, less than $p_{th}$). We will then consider that the underlying non-convergence hypothesis no longer holds and decide to stop the algorithm. Eventually, the criterion will be:

IF $P(S_w = s_{obs}) < p_{th}$ THEN stop the algorithm.
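In practice, the rule can be monitored with a sliding window over the per-generation occurrence counts. The following minimal Python sketch assumes that the smallest sum triggering the rule has been precomputed from the model of section 2.3 (the names `should_stop` and `min_s_obs` are ours; Table 1 gives typical values of this threshold):

```python
from collections import deque

def should_stop(occurrence_history, w, min_s_obs):
    """Return True once the occurrence counts of the LBS over the last w
    generations sum to at least min_s_obs, the smallest value s_obs for
    which P(S_w = s_obs) < p_th under the non-convergence model."""
    window = deque(occurrence_history, maxlen=w)  # keep only the last w counts
    return len(window) == w and sum(window) >= min_s_obs
```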
Figure 1: Evolution of the number of occurrences of the LBS over the first 400 generations for simulated data (x-axis: generations; y-axis: number of occurrences). The vertical line indicates the first appearance of the globally best solution in the population.
2.2 Definition of the GA Used
Real encoding will be used. Concerning selection, the fitness of the new solutions is computed and the solutions are ranked according to their fitness values (ties are averaged). Then, the selection probability of the $r$-th ranked individual is defined as
$$P[\text{select } r\text{-th ranked individual}] = \alpha \times r + \beta,$$
where $\alpha$ and $\beta$ are defined so that the selection probabilities sum to one over all individuals and so that the probability of selecting the best individual is twice that of the median-ranked individual. Moreover, elitism is used: the best solution of the current population is automatically selected for the next generation.
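For concreteness, these two constraints determine $\alpha$ and $\beta$ exactly. The short sketch below is our own derivation, assuming the best individual receives rank $T_{pop}$ and the median one rank $(T_{pop}+1)/2$:

```python
def selection_constants(T_pop):
    """Solve for alpha and beta in P(r) = alpha * r + beta such that
    (i)  sum_{r=1}^{T_pop} (alpha * r + beta) = 1, and
    (ii) P(T_pop) = 2 * P((T_pop + 1) / 2)  (best is twice the median).
    Constraint (ii) reduces to beta = -alpha; substituting into (i)
    gives alpha * T_pop * (T_pop - 1) / 2 = 1."""
    alpha = 2.0 / (T_pop * (T_pop - 1))
    return alpha, -alpha
```

With these values, the best individual is selected with probability $2/T_{pop}$ and the median one with probability $1/T_{pop}$.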
In order to ease the modelling of the GA, the mutation and crossover rates will be applied to individual solutions, and not to each encoding position.
2.3 Computation of $P(S_w = s_{obs})$
2.3.1 Modelling of the Number of LBS
Occurrences for One Generation
Let $\{Z_n\}$ denote the process counting the number of occurrences of the LBS in generation $n$. Unfortunately, $\{Z_n\}$ does not depend only on $\{Z_{n-1}\}$ but also on the quality of the other solutions constituting the previous population. Hence, in order to make it easier to use theoretical results, two hypotheses have to be assumed to consider $\{Z_n\}$ as a Markov chain.
Let $\{T_n\}$ denote a random variable which takes value 0 if the LBS has changed between generations $n-1$ and $n$ (denoting that the GA has not converged) and value 1 if the same LBS has been kept. Then the first required hypothesis is the following one:

Hypothesis 1:
$\forall n \in \mathbb{N}$, $P(T_n = 1) = \varphi$ and $P(T_n = 0) = 1 - \varphi$, for some real constant $\varphi \in [0, 1]$.

This hypothesis indicates that the probability of a local optimum change cannot be null (the global optimum has not been reached) and does NOT evolve along generations. In practice, this probability obviously changes. However, the most important point is to obtain a model which is especially precise just before convergence. When the process is far from convergence, the model will overestimate the distance to convergence, but it will fit the behaviour of the process when the situation is decisive. That is why we will choose a value for $\varphi$ which is close to 1 (see section 3.2 for more details).
ANewStoppingCriterionforGeneticAlgorithms
203
Hypothesis 2: The probability for new occurrences of the current LBS to appear from individuals which are not currently the optimum is neglected.

With this hypothesis, we consider that only selection is responsible for increasing the frequency of the local optimum. We neglect the possibility for mutation and crossover to generate new occurrences of the currently considered LBS. This hypothesis is of minor importance when the fitness function takes many different values and when the solution space is of high dimension.
Modelling:
Once these two hypotheses are assumed, the value of $Z_n$ only depends on the value of $Z_{n-1}$, and $\{Z_n\}$ can be considered as an order-1 Markov chain. Hence, its behaviour can be described through its initial state and its transition probabilities, $\pi_n(k, l) = P[Z_n = k \mid Z_{n-1} = l]$, with $(k, l) \in \{1, 2, \dots, T_{pop}\}^2$, where $T_{pop}$ is the population size. Two conditioning steps are required to compute these probabilities.
First Conditioning
$\pi_n(k, l)$ has to be split according to the two possible values of $T_n$. $Z_n$ necessarily equals 1 if $T_n = 0$. Then, if $k = 1$, it may be due to an LBS change, or the previous LBS may have been lost during mutation and crossover and retrieved thanks to elitism.
Second Conditioning
Let $Z^{mc}_{n-1}$ denote the number of instances of the LBS remaining once the mutation and crossover operators have been applied to generation $(n-1)$. If $Z_{n-1} = l$, according to the second hypothesis, $Z^{mc}_{n-1} \in \{0, 1, \dots, l\}$. We obtain
$$P[Z_n = k \mid Z_{n-1} = l, T_n = 1] = \sum_{j=1}^{l} P[Z_n = k \mid Z_{n-1} = l, T_n = 1, Z^{mc}_{n-1} = j]\, P[Z^{mc}_{n-1} = j].$$
To compute $P[Z^{mc}_{n-1} = j]$, we have to consider $p_m$ and $p_c$, the respective probabilities of mutation and crossover for one solution. Then the probability for one solution to undergo at least one change is $p = p_m + p_c - p_m \times p_c$ (as mutation and crossover are independent), and the probability to undergo no change is $q = 1 - p$. Finally, the distribution of $Z^{mc}_{n-1}$ is a binomial one with parameters $(l, q)$:
$$P[Z^{mc}_{n-1} = j] = \binom{l}{j}\, q^j (1-q)^{l-j} = b_{jlq}. \quad (1)$$
Now, elitism has to be taken into account, as it adds one occurrence of the locally best solution. Hence, only $(k-1)$ occurrences have to be selected to obtain $k$ occurrences in the next generation.
To compute the selection probability of the LBS when it has $j$ occurrences, we need the definition of the selection operator introduced in section 2.2. Since ties are averaged, the rank assigned to each occurrence of the LBS is $r = T_{pop} - \frac{j-1}{2}$. Hence, the selection probability of each occurrence of the LBS is $p_j = \alpha r + \beta$, and the selection probability of any of the $j$ occurrences of the LBS is $j \times p_j$.
Outcome
Finally, the formula for $\pi_n(k, l)$ depends on the value of $k$:

if $k = 1$,
$$\pi_n(1, l) = 1 - \varphi \left[ 1 - \sum_{j=1}^{l} (1 - j p_j)^{T_{pop}}\, b_{jlq} \right] \quad (2)$$

if $1 < k < T_{pop}$,
$$\pi_n(k, l) = \varphi \left[ \sum_{j=1}^{l} \binom{T_{pop}}{k-1} (j p_j)^{k-1} (1 - j p_j)^{T_{pop}-k+1}\, b_{jlq} \right]$$

and if $k = T_{pop}$,
$$\pi_n(T_{pop}, l) = \varphi \left[ \sum_{j=1}^{l} \left( T_{pop}\, (j p_j)^{T_{pop}-1} (1 - j p_j) + (j p_j)^{T_{pop}} \right) b_{jlq} \right].$$
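These formulas translate directly into code. Below is a minimal Python sketch of the transition matrix (the function name, the use of SciPy, and the derivation of $\alpha$ and $\beta$ from the constraints of section 2.2 are our own choices, not the authors' implementation):

```python
import numpy as np
from scipy.stats import binom

def transition_matrix(T_pop, p_m, p_c, phi):
    """pi[k-1, l-1] = P[Z_n = k | Z_{n-1} = l], transcribing eq. (2)
    and its two companion formulas."""
    p = p_m + p_c - p_m * p_c            # at least one change for one solution
    q = 1.0 - p                          # solution left intact
    alpha = 2.0 / (T_pop * (T_pop - 1))  # linear rank selection (section 2.2)
    beta = -alpha
    pi = np.zeros((T_pop, T_pop))
    for l in range(1, T_pop + 1):
        for j in range(1, l + 1):
            b_jlq = binom.pmf(j, l, q)        # eq. (1)
            r = T_pop - (j - 1) / 2.0         # averaged rank of the LBS copies
            jpj = j * (alpha * r + beta)      # selecting any of the j copies
            # k-1 successes among T_pop draws; elitism supplies the k-th copy
            for k in range(1, T_pop):
                pi[k - 1, l - 1] += phi * binom.pmf(k - 1, T_pop, jpj) * b_jlq
            # k = T_pop: T_pop-1 or T_pop successes are both capped at T_pop
            pi[T_pop - 1, l - 1] += phi * (binom.pmf(T_pop - 1, T_pop, jpj)
                                           + binom.pmf(T_pop, T_pop, jpj)) * b_jlq
        pi[0, l - 1] += 1.0 - phi             # an LBS change (T_n = 0) forces Z_n = 1
    return pi
```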
2.3.2 Modelling of the Number of LBS Instances along w Generations

Let us define:
$$S^{(t)}_w = \sum_{i=1}^{w} Z_{t+i}.$$
Hypothesis 3:
We assume that $\{S^{(t)}_w\}$ is stationary, that is to say, its characteristics do not depend on time. In this case, it means that:
$$P[S^{(u)}_w = j] = P[S^{(v)}_w = j], \quad \forall (u, v) \in \mathbb{N}^2.$$
It is then possible to simply study $S_w = \sum_{i=1}^{w} Z_i$.
Several simulation results (not shown here) showed that the stationarity hypothesis is not far from reality and takes into account that the locally best solution can still change, which is completely consistent with non-convergence.
Now, to determine the distribution of $S_w$, we first focus on the joint distribution of $(S_w, Z_w)$, which is recursively assessed by
$$\forall w,\quad P[S_{w+1} = s,\, Z_{w+1} = k] = \sum_{l=1}^{T_{pop}} \pi_{w+1}(k, l)\, P[S_w = s - k,\, Z_w = l]. \quad (3)$$
To obtain the distribution of $S_w$, it is necessary to sum over all the states of $Z_w$.
IJCCI2012-InternationalJointConferenceonComputationalIntelligence
204
2.3.3 Final Criterion

To sum up, the computation of $P(S_w = s_{obs})$ reads:
$$P[S_w = s] = \sum_{k=1}^{T_{pop}} P[S_w = s,\, Z_w = k],$$
with
$$\forall w,\quad P[S_{w+1} = s,\, Z_{w+1} = k] = \sum_{l=1}^{T_{pop}} \pi_{w+1}(k, l)\, P[S_w = s - k,\, Z_w = l]. \quad (4)$$
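The recursion (4) can be implemented in a few lines. The sketch below leaves the initial distribution of $Z_1$ as a parameter, since the paper does not fix this choice; a point mass at $Z_1 = 1$ (one fresh copy of the LBS) is one natural assumption:

```python
import numpy as np

def s_w_distribution(pi, w, z1_dist):
    """P[S_w = s] for s = 0..w*T_pop, via the joint recursion of eq. (4).
    pi[k-1, l-1] = P[Z_n = k | Z_{n-1} = l]; z1_dist[k-1] = P[Z_1 = k]."""
    T_pop = pi.shape[0]
    s_max = w * T_pop
    joint = np.zeros((s_max + 1, T_pop))      # joint[s, k-1] = P[S_i = s, Z_i = k]
    joint[1:T_pop + 1, :] = np.diag(z1_dist)  # S_1 = Z_1
    for _ in range(w - 1):
        new = np.zeros_like(joint)
        for k in range(1, T_pop + 1):
            for s in range(k, s_max + 1):
                new[s, k - 1] = pi[k - 1, :] @ joint[s - k, :]  # sum over l
        joint = new
    return joint.sum(axis=1)                  # marginalise over Z_w

# Stopping rule: stop as soon as P[S_w = s_obs] < p_th, for instance
#   psw = s_w_distribution(transition_matrix(100, 0.6, 0.5, 0.99), 20, z1)
#   stop = psw[s_obs] < 1e-5
# where z1 is the assumed initial distribution of Z_1.
```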
3 CRITERION STUDY
3.1 Threshold Determination
This illustration is a clustering problem. The goal is to conceive a GA that performs unsupervised learning with an unknown number of groups. Only a maximum allowed number of groups, $K_{max}$, has to be given a priori. The approach chosen here is to optimize the assignment of the observations to groups, making the issue a combinatorial problem.
During initialization, for each potential solution in the population, a number of groups, $k$, is uniformly randomly chosen in $\{2, \dots, K_{max}\}$. Then, an integer in $\{1, \dots, k\}$ is uniformly randomly chosen for each of the $n$ observations. Concerning mutation, three possibilities are allowed: withdrawing a group, adding a group, and changing one or more assignment(s). The crossover is a uniform one. Finally, the fitness function depends on the number of groups, $k$, and on the sum of within-group variances, in order to take into account both parsimony and precision of the model; a sketch of a fitness of this type follows.
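As an illustration only, a penalized fitness of this kind could look as follows (the penalty weight `lam` is a hypothetical placeholder; the exact trade-off used in the paper is not restated here):

```python
import numpy as np

def fitness(assignment, X, lam=1.0):
    """Clustering fitness to be minimized: sum of within-group variances
    plus a parsimony penalty on the number of groups k.
    The penalty weight lam is a hypothetical placeholder."""
    groups = np.unique(assignment)
    within = sum(X[assignment == g].var(axis=0).sum() for g in groups)
    return within + lam * len(groups)
```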
The dataset is a simulated one, so that the globally optimal solution is known. The data contain 80 observations divided into four groups and described by five features. The first two features give the location of the observations in a plane, whereas the other three are only uniform noise. For each group, the values of the first two variables are generated by a normal distribution whose mean gives the centroid location and whose variance indicates the spread; such data can be generated as sketched below.
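A minimal generator for data of this shape (the centroid locations and scales are placeholders of our own; the paper's exact values are not given):

```python
import numpy as np

rng = np.random.default_rng(0)
centroids = rng.uniform(-5, 5, size=(4, 2))       # hypothetical group centroids
X = np.vstack([
    np.hstack([rng.normal(c, 0.5, size=(20, 2)),  # two informative features
               rng.uniform(0, 1, size=(20, 3))])  # three pure-noise features
    for c in centroids
])                                                # 80 observations, 5 features
```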
To perform a first, coarse estimation of a satisfying $p_{th}$, the described GA has been applied six times to this dataset and we observed the evolution of $S_{20}$. For these runs, the global optimum was found by the GA after respectively 109, 77, 71, 98, 70 and 73 generations. If we choose $p_{th} = 10^{-3}$, the global optimum is missed in three out of six runs (results not shown). For $p_{th} = 10^{-4}$, one more run is successful. From $p_{th} = 10^{-5}$, the global optimum is always found.
In order to test $p_{th} = 10^{-5}$, the GA was run one hundred times, leading to 91% of optimum discovery (with only one misclassification in the remaining 8% of runs and two in the last 1%). It would be possible to choose a smaller threshold. However, the efficiency improvement would be small, whereas the computation time before stopping would be much lengthened. That is why we chose $p_{th} = 10^{-5}$.
By applying the formulas obtained in section 2.3 with the chosen parameters, Tab. 1 gives some minimum values of $s_{obs}$ required to stop the algorithm for usual values of $p_m$, $p_c$ and $T_{pop}$.
3.2 Influence of the Parameters
The definition of the criterion implies that it depends on the GA parameters, but also on the window size $w$, on $P[T_n = 0]$ and on the threshold studied previously. The remainder of this section deals with the influence of the first two parameters. This can be done regardless of the optimization problem, which does not interfere in the computation of the distribution.
$P[T_n = 0]$ has to be constant due to the first hypothesis, and we have established that $P[T_n = 1] < 1$. Moreover, along generations, $P[T_n = 0]$ rapidly becomes very small. Thus, small values are studied.
The results can be found in Fig. 2. The plotted value roughly shows the proportion of the population that must be filled by the LBS before deciding to stop the GA. As expected, the smaller the probability of finding a better solution, the smaller this proportion. So that the criterion is more stringent, small values of $P[T_n = 0]$ are favoured. From now on, we choose $P[T_n = 0] = 0.01$.
Concerning the window size, Fig. 2 shows values between 3 and 50. For small $w$, the filled proportion has to be larger, and it rapidly decreases as $w$ increases. From twenty generations on, the decrease slows down, which is why we chose $w = 20$ for further applications.
3.3 Limitations
The first hypothesis of section 2.3.1 has been studied in the previous paragraphs. Even if it is a strong hypothesis, taking a small value for $P[T_n = 0]$ minimizes its consequences with regards to convergence. The objective of this section is to highlight cases for which the second hypothesis cannot be assumed.
Firstly, if the considered issue deals with a solution space whose dimension is small, the probability of obtaining the same solution several times cannot be neglected.
ANewStoppingCriterionforGeneticAlgorithms
205
Table 1: Minimum values of $s_{obs}$ required to stop the algorithm for $p_{th} = 10^{-5}$, $w = 20$, $p_m \in \{0.5, 0.6, 0.7\}$, $p_c \in \{0.5, 0.6, 0.7\}$ and $T_{pop} \in \{50, 100, 200, 500\}$.

                     p_c = 0.5                p_c = 0.6                p_c = 0.7
  T_pop         50   100   200   500     50   100   200   500     50   100   200   500
  p_m = 0.5    126   130   132   134     97    99   100   101     76    77    78    78
  p_m = 0.6     97    99   100   101     80    81    82    82     66    67    67    67
  p_m = 0.7     76    77    78    78     66    67    67    67     57    57    58    58
Figure 2: Evolution of the stopping criterion with the window size and $P[T_n = 0]$ (x-axis: window size; y-axis: stopping proportion). The solid curve corresponds to $P[T_n = 0] = 0.5$, the dashed one to $P[T_n = 0] = 0.25$, the dotted one to $P[T_n = 0] = 0.1$, the bold one to $P[T_n = 0] = 0.05$ and the dot-dashed one to $P[T_n = 0] = 0.01$.
However, the optimization in such spaces is quite easy and does not require the use of a GA.
Yet, the same phenomenon would occur when many solutions have the same fitness value. For instance, this can happen if the fitness function is a misclassification rate with few individuals to classify. In such a case, if the solutions are really equivalent in the application context, the problem is likely to be solvable by a simpler optimization method; otherwise, the fitness has to be reformulated to take this variety into account.
On the other hand, when initialization is not completely random but focused around chosen points, our criterion should not be used.
4 APPLICATIONS
The first application allows a detailed study of the influence of the parameters, including function complexity and the GA parameters. Then, some usual optimization test problems are used to show the results of the criterion on varied and difficult functions.
4.1 Rastrigin’s Function
The generalized Rastrigin function (M¨uhlenbein
et al., 1991) is a usual non linear multimodal func-
tion used to test optimization methods. This function
presents many close local minima and only one global
minimum. The shape of the function is determined by
the external variables A and L, which control the am-
plitude and frequency modulations respectively. The
global minimum is 0.
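For reference, one common parameterization consistent with this description (an assumption on our part; the exact domain and scaling used in the experiments are not restated here) is
$$f(x) = \sum_{i=1}^{d} \left[ x_i^2 + A\left(1 - \cos(2\pi L\, x_i)\right) \right],$$
whose global minimum is $f(0) = 0$ for any $A$ and $L$.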
Concerning Rastrigin's parameters, integer values of A between 2 (the amplitude in the data is then about 55) and 15 (the amplitude is about 100) will be considered. The effect of L is really important: for L = 1, the solution space contains 25 local minima, and for L = 5, we find 729 minima. Thus, integer values between 1 and 5 will be considered.
For the first simulation, we studied the influence of A and L for $p_m = 0.6$, $p_c = 0.5$, $T_{pop} = 100$ and $d = 2$. For each combination of A and L, fifty runs of the GA were performed.
For all the runs, the final solution was in the deepest hole of the function, even if the average number of generations required to see the global optimum for the first time increases with both A and L.
Then, the GA parameters, $T_{pop} \in \{50, 100, 200\}$ and $p_c, p_m \in \{0.5, 0.6, 0.7\}$, are studied for $A \in \{2, 15\}$ and $L \in \{2, 5\}$. For each combination, 450 runs have been performed. Except for the case A = 15 and L = 5, the error rate is very low (0% in 59% of the combinations, less than 1% of errors in the remaining ones) and seems to be independent of the GA parameters. For complex issues, the stopping criterion is slightly less efficient (but it fails in at most one out of ten trials) and requires appropriate GA parameters. Hence, our stopping criterion is really helpful, but it does not make it any less necessary to look for appropriate parameters for the most complex problems.
The last studied parameter is the Rastrigin dimension d. Eight values have been chosen between 3 and 10. For the easiest combinations, the stopping criterion performs very well (at most, the deepest hole was missed twice). For $5 \le d \le 9$, the optimum is missed in at most 20% of the runs, but for d = 10, 17 out of 50 runs were not successful. These cases correspond to really complex situations, and it is not really surprising to miss the true optimum in certain trials. Here, it can be interesting to notice that, from a general point of view, it is always reasonable to run a GA several times to evaluate the robustness of the solution.
IJCCI2012-InternationalJointConferenceonComputationalIntelligence
206
Table 2: Convergence results. The first three columns give the range ($y_{min}$, $y_{max}$) and average $\bar{y}$ (standard error in brackets) of the objective function values. The last column indicates the theoretical optimum value $y_{opt}$.

  function    y_min     y_max    ybar (s.e.)       y_opt
  Osborne     6e-5      5e-3     2.54e-4 (5e-4)    5.46e-5
  Bard        8.2e-3    8.2e-3   8.2e-3  (4e-7)    0.008215
  Biggs       1.2e-6    5.5e-3   5.26e-4 (1e-3)    0
  Gulf        8.4e-32   8.2e-5   2.2e-6  (6e-6)    0
4.2 Application to Standard Test
Problems
A subset of the test functions in (Moré et al., 1981), consisting of sums of squares of $n_f$ functions of $n_v$ variables, is used: namely Osborne I, Bard, Biggs EXP6 and Gulf Research and Development. Results are presented in Tab. 2.
In all cases, solutions achieving very good values of the objective functions have been found during the different runs, and the worst objective function value obtained is always close to the optimum with respect to the true range of the function. Hence, running the GA a few times (which can be considered compulsory when dealing with stochastic optimizers) with the proposed stopping criterion is likely to bring much information about the true solution. When nothing is known about the behaviour of the objective function, it can be really difficult to decide to stop after any given number of generations. Indeed, on these functions, the criterion required between a few hundred and several tens of thousands of generations to stop.
5 CONCLUSIONS
Thanks to the modelling of the process describing the number of occurrences of the LBS during several successive generations, a new stopping criterion has been proposed for real-encoded GAs. The originality of our criterion lies, on one side, in the focus on the LBS occurrences and, on the other side, in the generality of its use: the operators are completely free as long as they respect the definition of the mutation and crossover rates, and the criterion has been developed to apply to real-encoded GAs. Its main advantage is to take into account all the GA operators without requiring user intervention when the problem changes. The modelling required three hypotheses, implying some cases where this stopping criterion should not be applied.
Despite the required simplifications, the theoretical developments performed in this paper provide a useful understanding of the unfolding of a GA, even if they do not capture the whole complexity of reality. This distance between the model and the real situation led us to consider a very small probability ($10^{-5}$) for stopping the algorithm. In our opinion, this distance is mainly due to the second hypothesis.
Concerning the first hypothesis, the most stringent case has been chosen. We would therefore probably be able to stop earlier without missing the global optimum. However, the main goal of this criterion is not to achieve speed performance. It is more specifically designed to enable the user to obtain a good solution without intervening in the stopping process of the GA.
Actually, even if the model does not fit perfectly, the simulations performed in this paper demonstrated the efficiency of the stopping criterion. Our stopping rule appeared to be equally efficient for completely different and very complex functions. Robustness to changes in the GA parameters was also shown.
The proposed stopping criterion should thus be used instead of arbitrary criteria for problems within the limitations of section 3.3. It obviously does not guarantee finding the global optimum; hence, the GA has to be run several times.
REFERENCES
Aytug, H. and Koehler, G. (2000). New stopping criterion for genetic algorithms. European Journal of Operational Research, 126(3):662-674.
Davis, T. and Principe, J. (1993). A Markov chain framework for the simple genetic algorithm. Evolutionary Computation, 1(3):269-288.
Moré, J., Garbow, B., and Hillstrom, K. (1981). Testing unconstrained optimization software. ACM Transactions on Mathematical Software (TOMS), 7(1):17-41.
Mühlenbein, H., Schomisch, M., and Born, J. (1991). The parallel genetic algorithm as function optimizer. Parallel Computing, 17(6-7):619-632.
Storch, T. (2008). On the choice of the parent population size. Evolutionary Computation, 16(4):557-578.
ANewStoppingCriterionforGeneticAlgorithms
207