A Comprehensive Study on Subgraph Crossover in Cartesian Genetic

Programming

Roman Kalkreuth

Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, Dortmund, Germany

Keywords:

Cartesian Genetic Programming, Crossover, Subgraph.

Abstract:

While tree-based Genetic Programming is often used with crossover, Cartesian Genetic Programming (CGP)

is mostly used only with mutation as the sole genetic operator. In contrast to comprehensive and fundamental

knowledge about crossover in tree-based GP, the state of knowledge in CGP appears to be still ambiguous

and ambivalent. Two decades after CGP was ofﬁcially introduced, the role of recombination in CGP is still

considered to be an open and remaining question. Although some promising steps have been taken in the last

years, comprehensive studies are needed to evaluate the role of crossover in CGP on a large set of problems.

In this paper, we take a step forward on the crossover issue by comparing algorithms that utilize the subgraph

crossover technique which has been proposed for CGP to the traditional mutation-only CGP. Experiments on

well-known symbolic regression and Boolean function problems demonstrate that the use of algorithms that

utilize the subgraph crossover outperform the mutation-only CGP on well-known benchmark problems.

1 INTRODUCTION

Genetic programming (GP) can be understood as

a method that enables the automatic derivation of

programs for problem-solving. First work on GP

has been done by Forsyth (1981), Cramer (1985)

and Hicklin (1986). Later work by Koza (1990, 1992,

1994) signiﬁcantly popularized the ﬁeld of GP. GP

traditionally uses trees as program representation but

is not limited to this representation model. Over

two decades ago Miller, Thompson, Kalganova, and

Fogarty presented ﬁrst publications on Cartesian Ge-

netic Programming (CGP) —an encoding model in-

spired by the two-dimensional array of functional

nodes connected by feed-forward wires of an FPGA

device (Miller et al., 1997; Kalganova, 1997; Miller,

1999). CGP offers a graph-based representation

which in addition to standard GP problem domains,

makes it easy to be applied to many graph-based ap-

plications such as electronic circuits, image process-

ing, and neural networks. In addition to tree-based

GP CGP has pivotal advantages:,

• CGP encodes a directed acyclic graph (DAG)

which allows the evolution of structures that can

be represented as DAGs. In this way, CGP also

facilitates evolving topologies.

• The maximal size of encoded solutions is

bounded, saving CGP to some extent from bloat

that is characteristic to GP

Standard CGP is mostly used with mutation as the

sole genetic operator and with a (1 + λ)-strategy. The

reason for this is that various standard genotypic

crossover techniques failed to improve the search per-

formance of standard CGP. In contrast to comprehen-

sive knowledge about crossover in tree-based GP, the

state of knowledge in CGP appears to be still ambigu-

ous and ambivalent. The state of knowledge about

crossover in CGP has been recently surveyed and the

role of crossover is still considered to be an open and

remaining question. Even if some progress has been

made in recent years, comprehensive and detailed

knowledge about crossover in CGP is still missing.

A promising step forward was made by the introduc-

tion of the subgraph crossover (Kalkreuth et al., 2017)

but this technique has not been comprehensively stud-

ied in the past. Therefore, this work follows up for-

mer work on the crosover question by investigating if

the search performance of CGP algorithms that utilize

the subgraph crossover can be more efﬁcient as the

commonly used mutation-only CGP on a set of well-

known benchmark problems. This work also surveys

and analyzes relevant work which contributed to the

knowledge about crossover in standard CGP. Based

on our experiments and their results, we formulate

Kalkreuth, R.

A Comprehensive Study on Subgraph Crossover in Cartesian Genetic Programming.

DOI: 10.5220/0010110700590070

In Proceedings of the 12th International Joint Conference on Computational Intelligence (IJCCI 2020), pages 59-70

ISBN: 978-989-758-475-6

and analyze hypotheses that address important search

performance dogmas in the ﬁeld of CGP. The primary

intention of this work is to shed more light on the role

of the subgraph crossover in CGP and to pave the way

for further analyses of the behavior of crossover based

CGP algorithms. We study the ﬁtness space on a sub-

set of problems to obtain more understanding of our

ﬁndings.

Section 2 of this paper describes CGP and the sub-

graph crossover. Section 3 surveys previous work on

crossover in CGP, outlines the need for a comprehen-

sive study and hypotheses to be examined are formu-

lated. Section 4 is devoted to the experimental results

and the description of our experiments. We analyze

hypotheses that have been formulated in Section 3. In

Section 6 we discuss the results of our experiments.

Finally, Section 7 gives a conclusion and outlines fu-

ture work.

2 RELATED WORK

2.1 Cartesian Genetic Programming

In contrast to tree-based GP, CGP represents a ge-

netic program via genotype-phenotype mapping as an

indexed, acyclic, and directed graph. Originally the

structure of the graphs was a rectangular grid of n

rows and n

columns, but later work focused on a rep-

resentation with one row. The CGP decoding proce-

dure processes groups of genes and each group refers

to a function node of the graph. An exception are the

last genes of the genotype which represent the out-

puts. Each node is represented by two types of genes

which index the function number in the GP function

set and the node inputs. These nodes are called func-

tion nodes and execute functions on the input values.

The number of input genes depends on the maximum

arity n

of the function set. Given the number of out-

puts n

, the last n

genes in the genotype represent

the indices of the nodes, which lead to the outputs.

A backward search is used to decode the correspond-

ing phenotype. An example of the backward search

of the most popular one-row integer representation is

shown in Figure 1. The backward search starts from

the program output and processes all nodes which are

linked in the genotype. In this way, only active nodes

are processed during evaluation. The genotype in Fig-

ure 1 is grouped by the function nodes.

The ﬁrst (underlined) gene of each group refers to

the function number in the corresponding function

set in the ﬁgure. The integer-based representation

of CGP phenotypes is mostly used with mutation

only. Early studies on the efﬁciency showed that sev-

Genotype

0 1  0 1 2 1 2 2  3

Phenotype

IP1

IP2

Function

Lookup Table

Index Function

Addition

Subtraction

Division

Decode

Node

Number

2 3

432

Figure 1: Example of the decoding procedure of a CGP

genotype to its corresponding phenotype. The nodes are

represented by two types of numbers which index the num-

ber in the function lookup table (underlined) and the inputs

(non-underlined) for the node. Inactive function nodes are

shown in gray color. The identiﬁers IP1 and IP2 stand for

the two input nodes with node index 0 and 1. The identiﬁer

OP stands for the output node of the graph.

eral genetic crossover operators do not contribute to

the search performance of CGP. The number of in-

puts n

, outputs n

, and the length of the genotype is

ﬁxed. Every candidate program is represented with

∗n

∗(n

+ 1) + n

integers. Even when the length

of the genotype is ﬁxed for each candidate program,

the length of the corresponding phenotype in CGP is

variable, which can be considered as an advantage of

the CGP representation.

CGP is traditionally used with a (1+λ) selection

scheme of evolutionary algorithms. The new popu-

lation in each generation consists of the best individ-

ual of the previous population and the λ created off-

spring. The breeding procedure is mostly done by a

point mutation that swaps genes in the genotype of

an individual in the valid range by chance. An exam-

ple of a point mutation is given in Figure 2. The ﬁg-

ure shows the ﬂip of the value of a connection gene,

which causes a rewiring of the corresponding func-

tion node. Another point mutation is the ﬂip of the

functional gene, which causes a change of functional

behavior of the corresponding function node.

The (1+λ)-CGP is often used with a selection strategy

called neutrality, the idea that genetic drift yields to

diverse individuals having equal ﬁtness. The genetic

drift is implemented into the selection mechanism in a

way that individuals that have the same ﬁtness as the

normally selected parent are determined, and one of

these same-ﬁtness individuals is returned uniformly

at random.

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications

0 0 0 41 1 0 2 2 3 3 3 3

Node

number

2 3 4 5 OP

IP1

IP2

Parent

0 0 0 41 1 0 2 2 2 3 3 3

Node

number

2 3 4 5 OP

IP1

IP2

Mutant

Mutation

Figure 2: Example of the standard point mutation operator

in integer-encoded CGP. Genes of the genotype are selected

by chance, and their values are randomly ﬂipped within the

legal range of possible values. The connection gene of node

4 is mutated from a value of 2 to a value of 3. This causes

a rewiring of the second input of node 4 from the output of

node 3 to the output node of node 2.

2.2 The Subgraph Crossover Technique

The subgraph crossover technique for CGP was in-

troduced by Kalkreuth et al. (2017) and is inspired

by the subtree crossover found in tree-based GP. To

recombine two directed acyclic graphs, the subgraph

recombination is performed by respecting the CGP

phenotype. Merely swapping parts of the genotype

would be a disastrous approach according to the re-

portings by Clegg et al. (2007). The phenotype of

each individual is represented by the active path of

the graph and is determined through the evaluation

process. Furthermore, the active path of a graph leads

to the semantic value of a certain individual in CGP.

As a consequence, the subgraph crossover exclusively

recombines the genetic material of the active paths.

The idea of the subgraph crossover is that it should re-

duce the disruption which is caused by the genotypic

single-point crossover in standard CGP and truly re-

combine subgraphs.

For the description of the subgraph crossover proce-

dure, let n

be the predeﬁned number of inputs and

let n

be the predeﬁned number of function nodes. In

CGP, the inputs are indexed from 0 to n

−1 and the

function nodes of each graph are indexed from n

+ n

−1. The nodes which lie between the input

and output nodes are denoted as function nodes. The

crossover is done with two parents which are denoted

as P

and P

. For the crossover procedure, the node

numbers of the active function nodes are necessary.

The node numbers of the active nodes of P

and P

are stored in two arrays M

and M

. The active nodes

are determined by the backward search in the evalua-

tion procedure.

To deﬁne one suitable crossover point, we deﬁne two

possible crossover points C

and C

of the two par-

ents. With information about the active nodes and

the length of the path, we can choose two possible

crossover points. The possible crossover points C

and C

are chosen by chance in the range of the ac-

tive function nodes which are stored in M

and M

The possible crossover points may not be input or out-

put nodes. A general crossover point C

is deﬁned

by choosing the smaller crossover point from C

and

. The reason for this is that the subgraphs of the

parents, which will be placed in front of or behind

the crossover point of the offspring’s genome should

be balanced. The representation of CGP allows ac-

tive paths of an individual, which can start in the mid-

dle or back of the graph. The subgraph which will

be placed in front of the crossover point has to start

at more leading active nodes. If C

is deﬁned as the

possible point C

, the subgraph of P

in front of C

will be placed in front of C

in the offspring genome.

The subgraph behind C

of P

will be placed behind

in the offspring genome The crossover procedure

produces a new genome that represents the offspring

involving the phenotypes of both parents. In the case

that two children should be produced, the crossover

procedure is performed twice with two different gen-

eral crossover points. Since the representation of CGP

provides connections to any of the previous function

nodes of the graph, performing only the neighbour-

hood connect could result in a monotone data ﬂow of

the resulting phenotype. An example of the crossover

procedure is illustrated in Figure 3

3 THE STATUS OF CROSSOVER

IN STANDARD CGP

According to Clegg et al. (2007), the ﬁrst attempts of

recombination in standard CGP included the testing

of different genotypic crossover techniques. For in-

stance, the genetic material was recombined by swap-

ping parts of the genotypes of the parent individuals

or randomly exchanging selected nodes. Clegg et al.

(2007) reported that all four techniques failed to im-

prove the convergence of CGP and that merely swap-

ping the integers (in whatever manner) in the CGP

representation disrupts the search performance. Com-

pared to running CGP with mutation only, the addi-

tion of these crossover techniques hindered the per-

formance. The four methods were tested on the stan-

dard integer-based representation of CGP. In one of

the ﬁrst empirical studies in CGP, Miller (1999) ana-

A Comprehensive Study on Subgraph Crossover in Cartesian Genetic Programming

0 2 1

Index Function

0 +

1 -

2 *

3 /

3  2  3

OP1

2 1 0 

2 0  0

OP2

0 2  0

0 3  3

1  1  1 

3 0  0

0 4  5

3 5 2

Parent P

:= 6

 := 3

Function Lookup Table

:= 3

Offspring

OP2

0 2  0

0 2  1

2 1 0 

2 0  0

3 5 2

Offspring

= {2,4}

= {2,3,5,6}

Subgraph S

Random Connected Edge

Neighbourhood Connected Edge

Figure 3: Example of the subgraph crossover technique.

The subgraph crossover basically works similar to the

single-point crossover except that the active nodes on both

sides of the crossover point are preserved. The crossover

point is chosen in a way that it is located between active

function nodes. At the top of the ﬁgure, the arrays with the

active nodes and crossover points are listed. Below this in-

formation, the genotypes and phenotypes of the parents and

the offspring are shown, and the parts of the crossover are

marked with dashed boxes.

lyzed the computational efﬁciency on Boolean func-

tion problems. More precisely, Miller analyzed and

studied the inﬂuence of population size on the efﬁ-

ciency of CGP. The key ﬁnding was that extremely

low populations are most effective for the tested prob-

lems. The experiments of this study also revealed that

the use of a genotypic crossover reduces the compu-

tational effort only marginally. This was the motiva-

tion for the introduction of a real-valued representa-

tion and new crossover technique for CGP by Clegg

et al. (2007). The real-valued representation of CGP

represents the directed graph as a ﬁxed-length list

of real-valued numbers in the interval [0,1]. The

genes are decoded to the integer-based representa-

tion by their normalization values (number of func-

tions or maximum input range). The recombina-

tion of two genotypes is performed by an arithmetic

crossover with a random weighting factor, which can

also be found in the ﬁeld of real-valued Genetic Algo-

rithms. Clegg et al. (2007) showed that the new repre-

sentation in combination with crossover improves the

convergence behavior of CGP. However, for the later

generations, Clegg et al. (2007) found that the use

of crossover in real-valued CGP disrupts the conver-

gence on one of the two tested problems. Later work

by Turner (2012) presented results on three classes

of computational problems, digital circuit synthesis,

function optimisation and agent-based wall avoid-

ance. On these problems, it was found that the real-

valued representation together with the crossover op-

eration performed worse than standard CGP.

Kalkreuth et al. (2017) introduced and investigated a

subgraph crossover in CGP. They evaluated the utility

of the new crossover operator for a range of crossover

rates on a suite of benchmark problems in circuit de-

sign, symbolic regression and image ﬁlter design. Of-

ten high crossover rates were beneﬁcial and in all

cases a nonzero crossover rate performed better than

without crossover. However, they did not compare

their results with standard CGP so it still remains un-

clear whether crossover can signiﬁcantly outperform

mutational CGP.

Husa and Kalkreuth (2018) proposed the block

crossover and performed parameter sweeps with the

(1 + λ)-strategy versus a genetic algorithm using

block crossover on a suite of Boolean functions and

symbolic regression problems. The results of the

comparative study demonstrated that the traditional

(1+λ)-CGP algorithm cannot be stated as the univer-

sally predominant algorithm for CGP. The outcome of

the study gave signiﬁcant evidence that the (1 + λ)-

CGP cannot be considered as the most efﬁcient CGP

algorithm in the Boolean function domain, although

it seems to be often a good choice. The experiments

on the 1-resilient Boolean function demonstrated that

the (1 + λ)-CGP may indeed be signiﬁcantly inferior

to the other CGP algorithms. In this way, the outcome

of the study gave the ﬁrst evidence, that it is possi-

ble for crossover operators to outperform the standard

(1+λ) strategy. da Silva and Bernardino (2018) intro-

duced a new form of crossover for CGP when a single

genotype representation is used and the desired model

has multiple outputs. The proposed method modiﬁes

the standard evolutionary strategy commonly adopted

in CGP by combining the subgraphs of the best out-

puts of the parent and its offspring in order to generate

a new ﬁttest individual. Recently, Kalkreuth (2019)

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications

presented a study on two phenotypic mutation tech-

niques for CGP which are called Insertion and Dele-

tion. The subgraph crossover was used within a

(µ + λ)-strategy for a search performance comparison

between CGP and Evolving Graphs with Graph Pro-

gramming (EGGP) (Atkinson et al., 2018), another

state-of-the-art technique for evolving graphs. The

search performance of the (µ + λ)-strategy with sub-

graph crossover and Insertion and Deletion mutation

has been found very effective, when compared to the

search performance of EGGP.

3.1 The Need for a Comprehensive

Study

Husa and Kalkreuth (2018) pointed out that it is difﬁ-

cult to obtain well-performing parameter settings of

crossover algorithms and complicates fair compar-

isons in CGP. The reason for that is that crossover

based CGP algorithms require the conﬁguration of

more parameters as the (1 + λ)-CGP. Husa and

Kalkreuth (2018) also stated that merely relying on

the results of the meta-evolution can lead to ineffec-

tive parameter settings which result in unfair com-

parisons. Since the meta-evolution is done by an

evolutionary algorithm, it is possible that the meta-

evolution evolves towards a local optimum, which can

result in ineffective parameter settings. Therefore, a

solid and precise parameter tuning requires additional

steps which will be described in the following section.

Another reason for a new comparative study is that

a comparison with a bigger set of benchmark prob-

lems is needed to shed more light on the question of

crossover in CGP. Husa and Kalkreuth (2018) tested

only 8 problems in two major GP benchmark ﬁelds.

In a very recent survey about the status and the fu-

ture of CGP, Miller (2020) addressed open questions

in CGP. Regarding the open question of crossover in

CGP, Miller stated that crossover is still very underde-

veloped in CGP and that the subgraph crossover needs

to be comprehensively studied in the future.

3.2 Formulation of Hypotheses

With this work we want to shed more light into some

important dogmas, claims and research questions in

the ﬁeld of CGP, which we address with the formula-

tion and analysis of the following hypotheses:

Hypothesis 1 (Crossover). Crossover does not contribute

to the search performance of integer-based standard CGP.

Hypothesis 2 (Redundancy). Extremely large genotypes

perform most effectively in CGP.

Hypothesis 3 (Population size). Small populations perform

most effectively in CGP.

4 EXPERIMENTS

4.1 Experimental Setup

We performed experiments in the problem domain

of symbolic regression and Boolean function. To

evaluate the search performance of the tested algo-

rithms, we measured the number of ﬁtness evalu-

ations until the CGP algorithm terminated success-

fully (ﬁtness-evaluations-to-success) and the best ﬁt-

ness value, which was found after a predeﬁned num-

ber of generations (best-ﬁtness-of-run). In addition

to the mean values of the measurements, we calcu-

lated the standard deviation (SD) and the standard er-

ror of the mean (SEM). To classify the signiﬁcance

of our results, we used the Mann-Whitney-U-Test.

The mean values are denoted a

†

if the p-value is less

than the signiﬁcance level 0.05 and a

‡

if the p-value

is less than the signiﬁcance level 0.01 compared to

the (1 + 4)-CGP. Note that the mean values are only

denoted with the signiﬁcance level marker if the re-

sult of a certain algorithm is better than the result of

the (1 + 4)-CGP. We performed 100 independent runs

with different random seeds.

Parameter Tuning. To compare various CGP al-

gorithms fairly, we performed meta-evolution exper-

iments. Moreover, to ﬁnd effective parameter set-

tings for the respective algorithms we used a meta-

evolutionary algorithm, which evolved sets of effec-

tive settings. These sets were validated for the par-

ticular problem, and the best collection of parameters

was selected afterward. To determine efﬁcient param-

eter settings for fair comparisons, we utilized an ap-

proach to parameter tuning for CGP, which has been

used by Kaufmann and Kalkreuth (2017). The param-

eter tuning is done in three steps: In the ﬁrst place,

a set of well-performing parameter settings is deter-

mined by a meta evolutionary algorithm. The meta-

evolution is repeated several times, and the best set of

settings is used for further validation and ﬁne-tuning.

The determined set of parameters is validated manu-

ally, and the best conﬁguration is chosen for further

tuning. Finally, the best performing conﬁguration is

manually ﬁne-tuned. The algorithms which were used

in our study are listed in Table 1.

Table 1: List of the CGP algorithms.

Identiﬁer Description

(1 + 4)-CGP Traditional (1 + 4)-CGP algorithm

(1 + λ)-CGP Traditional (1 + λ)-CGP algorithm

(µ + λ)-CGP (µ + λ)-algorithm with subgraph crossover

Canonical-CGP Canonical genetic algorithm with

tournament selection and subgraph crossover

A Comprehensive Study on Subgraph Crossover in Cartesian Genetic Programming

4.2 Benchmarks

Symbolic Regression. We chose nine symbolic re-

gression problems from the work of McDermott et al.

(2012) for better GP benchmarks. The functions of

the problems are shown in Table 2. A training data set

U[a,b,c] refers to c uniform random samples drawn

from a to b inclusive and E[a,b,c] refers to a grid

of points evenly spaced with an interval of c, from

a to b inclusive. The Koza function set consisted of

eight mathematical functions (+, −, ∗, /, sin, cos,

ln(|n|), e

) and the Keijzer function set of ﬁve math-

ematical functions (+, ∗,

, −n ,

√

n ). The ﬁtness

of the individuals was represented by a cost func-

tion value. The cost function was deﬁned by the

sum of the absolute difference between the real func-

tion values and the values of an evaluated individual.

Let T =





p=1

be a training dataset of P random

points and f

ind

) the value of an evaluated indi-

vidual and f

ref

) the true function value. Let C :=

∑

p=1

ind

) − f

ref

)| be the cost function. When

the difference of all absolute values becomes less than

0.01, the algorithm is classiﬁed as converged. We

evaluated the more simple symbolic regression prob-

lems Koza 1, 2 & 3 with the ﬁtness-evaluation-to-

termination method. We deﬁned a maximum num-

ber of 8 ·10

ﬁtness evaluations for these three ex-

periments. The reason for choosing these three prob-

lems is the fact that we can ﬁnd an ideal solution more

likely on average than the other more complex bench-

mark problems, which require a huge amount of ﬁt-

ness evaluations to ﬁnd an ideal solution. The remain-

ing more complex problems were evaluated with the

best-ﬁtness-of-run method. We measured the best ﬁt-

ness after a budget of 10000 ﬁtness evaluations.

Table 2: Symbolic regression problems of the ﬁrst experi-

ment.

Problem Objective Function Vars Training Set Function Set

Koza-1 x

+ x

+ x 1 U[-1,1,20] Koza

Koza-2 x

−2x

+ x 1 U[-1,1,20] Koza

Koza-3 x

−2x

+ x

1 U[-1,1,20] Koza

Nguyen-4 x

+ x

+ x 1 U[-1,1,20] Koza

Nguyen-5 sin(x

)cos(x)−1 1 U[-1,1,20] Koza

Nguyen-6 sin(x) + sin(x + x

) 1 U[-1,1,20] Koza

Nguyen-7 ln(x + 1) + ln(x

+ 1) 1 U[0,2,20] Koza

Keijzer-6

∑

1/i 1 E[1,50,1] Keijzer

Pagie-1 1/(1 + x

−4

) + 1/(1 + y

−4

) 2 E[-5,5,0.4] Koza

Boolean Functions. In the Boolean domain, we

chose seven Even-Parity problems with n = 3 to 9

Boolean inputs. The goal was to ﬁnd a program that

produces the value of the Boolean even parity depend-

ing on the n independent inputs. The ﬁtness was rep-

resented by the number of ﬁtness cases for which the

candidate solution failed to generate the correct value

of the Even-Parity function. We also investigated

Table 3: Boolean function problems for the search perfor-

mance evaluation.

Problem Number of Inputs Number of Outputs

Parity-3 3 1

Parity-4 4 1

Parity-5 5 1

Parity-6 6 1

Parity-7 7 1

Parity-8 8 1

Parity-9 9 1

Adder 1-Bit 3 2

Adder 2-Bit 5 3

Subtractor 2-Bit 4 4

Multiplier 2-Bit 4 4

Mulitplier 3-Bit 6 6

Table 4: Parameter space explored by meta evolution for the

fundamental CGP algorithms.

(1 + 4)-CGP

number of nodes [10,4000]

mutation rate[%] [1,20]

(1 + λ)-CGP

λ [2,150]

number of nodes [10,4000]

mutation rate[%] [1,20]

Canonical-CGP

number of nodes [10,4000]

mutation rate[%] [1,20]

crossover rate[%] [10,100]

population size [10,1000]

tournament size [2,20]

(µ + λ)-CGP

µ [2,150]

λ [2,1000]

number of nodes [10,4000]

mutation rate[%] [1,20]

crossover rate[%] 20, 50, 70, 90

multiple output problems as the 2-Bit-Adder, 2-Bit-

Subtractor, 2-Bit- and 3-Bit-Multiplier. These sorts

of problems differ markedly from the parity problems,

and the multiple output multiplier has been proposed

as a suitable alternative. As a result, we receive a di-

verse set of problems in this domain.

To evaluate the ﬁtness of the individuals on the multi-

ple output problems, we deﬁned the ﬁtness value of an

individual as the number of different bits to the corre-

sponding truth table. When this number became zero,

the algorithm terminated successfully. We evaluated

the majority of the Boolean function problems with

the ﬁtness-evaluations-to-success method. However,

since the Even-Parity-8 & 9 and the 3-Bit-Multiplier

are very complex and computing-intensive problems,

we evaluated these problems with the best-ﬁtness-of-

run method with a budget of 10000 ﬁtness evalua-

tions. The set of benchmark problems with the cor-

responding number of inputs and outputs is shown in

Table 3.

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications

4.3 Meta Evolution

We tuned signiﬁcant parameters for all utilized CGP

algorithms on the set of problems which are shown in

Table 4. The ranges for the number of nodes are ori-

ented with the parameter settings found in Kaufmann

and Kalkreuth (2017). We used the meta-evolution

extension package of the Java Evolutionary Compu-

tation Research System (ECJ)

. For the meta-level,

we used a basic genetic algorithm (GA). The setting

of the meta-level GA is shown in Table 5.

Table 5: Conﬁguration of the meta-level GA.

Property Setting

Maximum generations 200

Population size 50

Mutation rate 1/n

Mutation tape gaussian mutation

Tournament selection size 4

Crossover rate 0.7

Crossover type intermediate recombination

Evaluation method best-ﬁtness-of-run

Number of trials 4

Tables 6 and 7 show the results of the meta-evolution.

The results in the Boolean domain reveal an effective

parametrization with small population size and an ex-

tremely high number of function nodes in the geno-

type. In the symbolic regression domain, the number

of function nodes is on average smaller compared to

the Boolean function domain. For the image operator

design problems, extremely high levels of redundancy

seem to be a good choice for the mutation-only CGP

algorithms. For the Canonical-CGP a smaller number

of function nodes seems to be an appropriate choice.

4.4 Search Performance Evaluation

Tables 8 and 9 show the results of the algorithm com-

parison in the Boolean domain. As visible, the re-

sults show no overall dominant and outstanding CGP

algorithm in this problem domain. However, on the

high order Parity-even problems, the (1+ 4)-CGP and

the (1+λ)-CGP seem to be a good choice. The re-

Table 6: Results of the meta evolution for the Boolean function problems.

Problem Algorithm Number Mutation Crossover µ λ Population Tournament

of nodes rate[%] rate[%] size size

Parity-3

(1 + 4)-CGP 2000 1 – – – – –

(1 + λ)-CGP 2000 1 – – 2 – –

Canonical-CGP 2000 1 70 – – 10 2

(µ + λ)-CGP 3000 2 70 2 2 – –

Parity-4

(1 + 4)-CGP 1500 1 – – – – –

(1 + λ)-CGP 1500 1 – – 2 – –

Canonical-CGP 1500 1 70 – – 5 2

(µ + λ)-CGP 3000 1 90 2 2 – –

Parity-5

(1 + 4)-CGP 2000 1 – – – – –

(1 + λ)-CGP 2000 1 – – 2 – –

Canonical-CGP 1000 1 70 – – 5 2

(µ + λ)-CGP 2000 1 70 2 2 – –

Parity-6

(1 + 4)-CGP 2000 1 – – – – –

(1 + λ)-CGP 2000 1 – – 2 – –

Canonical-CGP 2000 1 70 – – 5 2

(µ + λ)-CGP 2000 1 90 4 2 – –

Parity-7

(1 + 4)-CGP 2500 1 – – – – –

(1 + λ)-CGP 2500 1 – – 2 – –

Canonical-CGP 2500 1 70 – – 5 2

(µ + λ)-CGP 2500 1 90 4 2 – –

Parity-8

(1 + 4)-CGP 2000 1 – – – – –

(1 + λ)-CGP 2000 1 – – 2 – –

Canonical-CGP 2000 1 70 – – 5 2

(µ + λ)-CGP 2000 1 70 2 2 – –

Parity-9

(1 + 4)-CGP 2000 1 – – – – –

(1 + λ)-CGP 2000 1 – – 2 – –

Canonical-CGP 2000 1 70 – – 5 2

(µ + λ)-CGP 2000 1 70 2 2 – –

Adder-1Bit

(1 + 4)-CGP 150 8 – – – – –

(1 + λ)-CGP 150 5 – – 2 – –

Canonical-CGP 200 6 50 – – 20 4

(µ + λ)-CGP 150 3 50 2 2 – –

Adder-2Bit

(1 + 4)-CGP 100 3 – – – – –

(1 + λ)-CGP 150 1 – – 2 – –

Canonical-CGP 150 2 90 – – 20 4

(µ + λ)-CGP 150 1 50 2 2 – –

Mult.-2Bit

(1 + 4)-CGP 1500 1 – – – – –

(1 + λ)-CGP 1500 1 – – 2 – –

Canonical-CGP 1000 1 70 – – 10 2

(µ + λ)-CGP 2000 1 70 2 8 – –

Mult.-3Bit

(1 + 4)-CGP 2000 2 – – – – –

(1 + λ)-CGP 1000 1 – – 2 – –

Canonical-CGP 1000 1 30 – – 5 2

(µ + λ)-CGP 2000 1 30 2 2 – –

Subtr.-2Bit

(1 + 4)-CGP 2000 1 – – – – –

(1 + λ)-CGP 2000 1 – – 8 – –

Canonical-CGP 1500 1 90 – – 5 2

(µ + λ)-CGP 1500 1 70 2 2 – –

https://cs.gmu.edu/ eclab/projects/ecj/

A Comprehensive Study on Subgraph Crossover in Cartesian Genetic Programming

Table 7: Results of the meta evolution for the symbolic regression problems.

Problem Algorithm Number Mutation Crossover µ λ Population Tournament

of nodes rate[%] rate[%] size size

Koza-1

(1 + 4)-CGP 10 20 – – – – –

(1 + λ)-CGP 10 20 – – 8 – –

Canonical-CGP 10 20 70 50 4

(µ + λ)-CGP 10 20 70 4 16 – –

Koza-2

(1 + 4)-CGP 10 20 – – – – –

(1 + λ)-CGP 10 20 – – 8 – –

Canonical-CGP 10 20 70 50 4

(µ + λ)-CGP 10 20 90 4 16 – –

Koza-3

(1 + 4)-CGP 10 20 – – – – –

(1 + λ)-CGP 10 20 – – 8 – –

Canonical-CGP 10 20 70 50 4

(µ + λ)-CGP 10 20 70 1 8 – –

Nguyen-4

(1 + 4)-CGP 120 10 – – – – –

(1 + λ)-CGP 100 10 – – 16 – –

Canonical-CGP 220 9 90 – – 50 5

(µ + λ)-CGP 200 1 70 10 200 – –

Nguyen-5

(1 + 4)-CGP 60 7 – – – – –

(1 + λ)-CGP 60 7 – – 16 – –

Canonical-CGP 100 6 25 – – 10 2

(µ + λ)-CGP 300 5 70 10 250 – –

Nguyen-6

(1 + 4)-CGP 100 10 – – – – –

(1 + λ)-CGP 100 10 – – 16 – –

Canonical-CGP 20 20 90 – – 50 7

(µ + λ)-CGP 300 1 70 10 250 – –

Nguyen-7

(1 + 4)-CGP 200 2 – – – – –

(1 + λ)-CGP 1000 2 – – 16 – –

Canonical-CGP 500 3 70 – – 250 7

(µ + λ)-CGP 200 10 90 10 200 – –

Keijzer-6

(1 + 4)-CGP 2000 3 – – – – –

(1 + λ)-CGP 2000 5 – – 16 – –

Canonical-CGP 100 5 70 – – 250 10

(µ + λ)-CGP 700 5 70 10 250 – –

Pagie-1

(1 + 4)-CGP 1500 7 – – – – –

(1 + λ)-CGP 1500 7 – – 16 – –

Canonical-CGP 500 5 90 – – 200 7

(µ + λ)-CGP 500 8 75 25 125 – –

Table 8: Results of the algorithm comparison for the Boolean function problem evaluated by the number of ﬁtness evaluations

(FE) to termination.

Problem Algorithm Mean FE SD SEM 1Q Median 3Q

Parity-Even-3

(1 +4)-CGP 3177 3417 ±343 1246 2136 3760

(1 +λ)-CGP 2495 2919 ±293 846 1534 2872

Canonical-CGP 3107 3070 ±307 1201 2104 3907

(µ +λ)-CGP 1565

‡

1517 ±152 602 1168 1892

Parity-Even-4

(1 +4)-CGP 15420 14152 ±1422 6292 10358 17726

(1 +λ)-CGP 16523 19168 ±1926 6095 11276 18557

Canonical-CGP 54967 47042 ±4727 24813 40612 71851

(µ +λ)-CGP 11135

‡

8447 ±845 5117 8527 14085

Parity-Even-5

(1 +4)-CGP 45542 33947 ±3411 21524 36834 61222

(1 +λ)-CGP 34375

‡

28146 ±2828 20685 27104 38941

Canonical-CGP 28413

‡

25538 ±2566 23388 19640 34876

(µ +λ)-CGP 43476 2055 ±1022 23814 36188 57182

Parity-Even-6

(1 +4)-CGP 199989 142915 ±14291 107418 163234 242573

(1 +λ)-CGP 118768

‡

73682 ±7368 65766 91577 156639

Canonical-CGP 242986 161762 ±16257 134518 200196 309346

(µ +λ)-CGP 110158

‡

75163 ±7516 63908 90676 135148

Parity-Even-7

(1 +4)-CGP 478055 301113 ±30111 268210 393362 605372

(1 +λ)-CGP 441857 328539 ±32853 226272 352254 545197

Canonical-CGP 631568 548180 ±54818 293613 453204 750792

(µ +λ)-CGP 358420

‡

246131 ±24613 189278 303988 451667

Adder-1Bit

(1 +4)-CGP 1895 1856 ±186 634 1252 2494

(1 +λ)-CGP 1415 1532 ±154 508 1057 1640

Canonical-CGP 2155 2018 ±202 882 1521 2907

(µ +λ)-CGP 1393

†

1311 ±132 496 954 1784

Adder-2Bit

(1 +4)-CGP 85667 84355 ±8478 29506 58650 110794

(1 +λ)-CGP 73417 58589 ±5888 33367 53654 94009

Canonical-CGP 225652 200384 ±20038 78247 158130 271717

(µ +λ)-CGP 68375 43229 ±5361 32998 64006 98052

Multiplier-2Bit

(1 +4)-CGP 11583 10469 ±1046 5020 8524 14498

(1 +λ)-CGP 17664 21664 ±2177 5400 9233 19262

Canonical-CGP 30489 25700 ±2582 13384 22696 22916

(µ +λ)-CGP 11055

‡

13281 ±1334 3635 6693 13220

Subtractor-2Bit

(1 +4)-CGP 11029 13975 ±13975 4642 6986 11878

(1 +λ)-CGP 8377

‡

9958 ±1000 2989 6111 9579

Canonical-CGP 35829 41822 ±4203 10346 20056 40698

(µ +λ)-CGP 15161 22388 ±2250 5291 9671 16705

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications

Table 9: Results of the algorithm comparison for the complex Booleans problems evaluated with the best-ﬁtness-of-run

method.

Problem Algorithm Mean SD SEM 1Q Median 3Q

Best Fitness

Parity-8

(1 + 4)-CGP 104, 24 8,50 ±0,86 99 105 110

(1 + λ)-CGP 92,9

‡

11,56 ±1,15 85 94,5 102,25

Canonical-CGP 113,28 5, 01 ±0, 50 110,75 114 117

(µ + λ)-CGP 107,03 7, 49 ±0, 74 103 108 112

Parity-9

(1 + 4)-CGP 230, 38 8,74 ±0,87 224,75 230 236

(1 + λ)-CGP 225,69

‡

11,45 ±1,14 220 228 234

Canonical-CGP 228,37 9, 02 ±0, 90 223,75 229 235, 25

(µ + λ)-CGP 226,63 13,71 ±1,37 220, 5 230 236

Multiplier-3Bit

(1 + 4)-CGP 46,39 7,57 ±0,75 41 47 51

(1 + λ)-CGP 42,31

‡

8,67 ±0,86 36 43 48

Canonical-CGP 52, 9 8,40 ±0,84 47 53 59

(µ + λ)-CGP 43,98

†

8,86 ±0,88 38 44 49

Table 10: Results for the algorithm comparison for the problems Koza 1, 2 & 3 evaluated by the number of ﬁtness evaluations

(FE) to termination.

Problem Algorithm Mean FE SD SEM 1Q Median 3Q Unﬁnished

runs

Koza-1

(1 +4)-CGP 8675635 16681422 ±1668142 441477 1814344 7045961 2

(1 +λ)-CGP 7370880

‡

17384354 ±1738435 204400 1050936 4294170 3

Canonical-CGP 663822

‡

838546 ±83854 135162 337950 710275 0

(µ +λ)-CGP 7780751

‡

15830735 ±1583073 197284 1830312 6318740 3

Koza-2

(1 +4)-CGP 8264426 19894512 ±1989451 150140 888884 4378756 6

(1 +λ)-CGP 8191549 20275790 ±2027579 94290 559028 4710848 1

Canonical-CGP 444118

‡

95000 ±286700 627550 29650 78800 0

(µ +λ)-CGP 5729778 11021660 ±1102166, 238156 1320880 5878696 1

Koza-3

(1 +4)-CGP 600153 1214527 ±121452 39076 177418 443038 0

(1 +λ)-CGP 753551 2535215 ±253521 29528 120368 431318 0

Canonical-CGP 32870

‡

57156 ±10435 2488 6700 32713 0

(µ +λ)-CGP 926857 3473467 ±347347 28548 121040 362180 0

Table 11: Results of the algorithm comparison algorithm for the symbolic regression problems evaluated with the best-ﬁtness-

of-run method.

Problem Algorithm Mean SD SEM 1Q Median 3Q

Best Fitness

Nguyen-4

(1 +4)-CGP 0,68 0,55 ±0,05 0,34 0, 58 0, 77

(1 +λ)-CGP 0, 61 0, 46 ±0,04 0, 35 0, 54 0, 74

Canonical-CGP 0,50

†

0,28 ±0,04 0,31 0, 47 0,60

(µ +λ)-CGP 0, 60

†

0,40 ±0,04 0,36 0, 54 0,76

Nguyen-5

(1 +4)-CGP 0,45 0,42 ±0,04 0,06 0, 32 0, 81

(1 +λ)-CGP 0, 39 0, 33 ±0,03 0, 08 0, 27 0, 63

Canonical-CGP 0,29

‡

0,27 ±0,03 0,05 0, 20 0,40

(µ +λ)-CGP 0, 28

‡

0,25 ±0,02 0,06 0, 19 0,45

Nguyen-6

(1 +4)-CGP 0,54 0,66 ±0,06 0,16 0, 29 0, 61

(1 +λ)-CGP 0, 50 0, 67 ±0,06 0, 15 0, 22 0, 50

Canonical-CGP 0,31

‡

0,31 ±0,03 0,15 0, 24 0,40

(µ +λ)-CGP 0, 61 0, 67 ±0,06 0, 16 0, 35 0, 67

Nguyen-7

(1 +4)-CGP 0,79 0,48 ±0,05 0,45 0, 67 1, 06

(1 +λ)-CGP 0, 71 0, 45 ±0,04 0, 44 0, 67 0, 76

Canonical-CGP 0,60

‡

0,35 ±0,03 0,36 0, 60 0,68

(µ +λ)-CGP 0, 62

‡

0,40 ±0,04 0,42 0, 63 0,68

Keijzer-6

(1 +4)-CGP 3,78 2,61 ±0,26 2,16 3, 24 4, 59

(1 +λ)-CGP 3, 38 2, 52 ±0,25 2, 41 3, 03 3,158

Canonical-CGP 2,81

†

1,13 ±0,11 1,78 2, 90 3,75

(µ +λ)-CGP 2, 88

†

1,09 ±0, 1 2,25 3, 14 3, 15

Pagie-1

(1 +4)-CGP 128, 18 48,19 ±4, 81 87,81 119,09 161,08

(1 +λ)-CGP 120, 75 44,95 ±4, 49 86,14 120,91 155,06

Canonical-CGP 98, 52

‡

50,57 ±5,08 59,04 85,31 130,04

(µ +λ)-CGP 99, 74

‡

41,246 ±4,12 65,32 95,79 131,76

sults of our experiments in the symbolic regression

domain are shown in Table 10 and 11. It is visi-

ble that the Canonical-CGP algorithm performs better

than the mutation-only CGP algorithms on all tested

problems. Furthermore, as visible (1 + 4)-, (1 + λ)-

and (µ + λ)-CGP reported runs in which no solution

was found.

4.5 Redundancy and Fitness Space

Analysis

We investigated the three symbolic regression prob-

lems Koza 1, 2 & 3 in more detail due to the com-

paratively low number of 10 function nodes which

has been determined with the parameter tuning. With

the experiments which are described in this subsec-

tion, we intended to investigate the role of continuous

and discrete ﬁtness spaces for the evolutionary search

CGP. We ﬁrst measured the search performance of

A Comprehensive Study on Subgraph Crossover in Cartesian Genetic Programming

Table 12: Conﬁguration of the (1 + 4)-CGP.

Property (1 + 4)-CGP

Maximum node count 10/20/50/100

Mutation rate [%] 20/10/8/6

Number of inputs 2

Number of outputs 1

Population size 5

Function set +, −, ∗, /

the three symbolic regression problems with genotype

lengths of 10, 20, 50, and 100 function nodes. The al-

gorithm conﬁguration for the experiments is shown in

Table 12. We performed 100 runs for each experiment

and measured the search performance by the number

of generations until the ideal solution was found. We

utilized the (1 + 4)-CGP algorithm. When this ﬁtness

function is used, the ﬁtness values can vary in a range

>=0

. Since the range of this ﬁtness function is not

ﬁxed, we evaluated the size of the ﬁtness space for

genotype lengths with 10, 20, 50 and 100 function

nodes by sampling 10

random genotypes. We eval-

uated the ﬁtness of each genotype and stored its fre-

quency in a hash map. Afterward, we used the size of

the hash map to conclude the magnitude of the space

of ﬁtness values. We also investigated all bench-

mark functions with discrete ﬁtness. In these types

of experiments, the ﬁtness function was similar to the

continuously ﬁtness function, with the exception that

the ﬁtness values were discretized within a range of

whole numbers from 0 to 20. In this way, the exper-

iments covered the investigation of the search perfor-

mance for different lengths of the genotype with con-

tinuous and discrete ﬁtness. The respective mutation

rates have been determined empirically.

Table 13: Number of ﬁtness values for various genotype

lenght for the problems Koza 1, 2 & 3.

Problem Number of Number of

function nodes ﬁtness values

Koza-1

10 15655

20 53695

50 172244

100 306242

Koza-2

10 15646

20 54133

50 173122

100 307665

Koza-3

10 15859

20 54823

50 173523

100 307412

Table 14 shows and Figure 4 illustrates the stylized

search performance behavior for various genotype

lengths when continuous ﬁtness is used. It is visible

that when the increase of the length of the genotype,

the search performance decreases. Table 13 shows the

results of the analysis of the ﬁtness space for the three

Koza problems. It can be seen that the size of the ﬁt-

ness space increases when the length of the genotype

is increased. Table 15 shows and Figure 15 illustrates

the search performance behavior for various genotype

lengths when discrete ﬁtness is used. In this case, it

Table 14: Results for the symbolic regression problems

Koza 1, 2 &¸ 3 when continuous ﬁtness is used.

Problem Number of Mean SD SEM Median

function nodes Fitness Evaluations

Koza-1

10 456547 749589 ±74958 47852

20 536685 814927 ±81493 56704

50 1373228 662682 ±66268 1586900

100 1407718 709660 ±70966 1934054

Koza-2

10 466726 667196 ±66719 123156

20 695164 768904 ±76890 281468

50 808779 800203 ±80020 525106

100 942653 877873 ±87787 709382

Koza-3

10 638451 ±75615 52473 259490

20 774234 812167 ±81216 476620

50 867915 826392 ±82639 540420

100 1154141 902271 ±90227 1852700

1e+05

3e+05

1e+06

10 20 50 100

Number of function nodes

Median number

of fitness evaluations

Koza−1

Koza−2

Koza−3

Figure 4: Stylized search performance behavior for various

genotype lengths when continuous ﬁtness is used.

Table 15: Results for the symbolic regression problems

Koza 1, 2 & 3 when discrete ﬁtness is used.

Problem Number of Mean SD SEM Median

function nodes ﬁtness evaluations

Koza-1

10 3694 3300 ±330 2840

20 3346 3870 ±387 2268

50 3300 4165 ±416 2114

100 2798 3812 ±381 1470

Koza-2

10 8589 8768 ±876 5984

20 5478 5518 ±552 3880

50 3076 5154 ±515 1414

100 1998 2469 ±246 1184

Koza-3

10 2500 2852 ±285 1808

20 2011 2238 ±223 1156

50 1212 1220 ±122 846

100 1145 1107 ±110 832

1000

3000

5000

10 20 50 100

Number of function nodes

Median number

of fitness evaluations

Koza−1

Koza−2

Koza−3

Figure 5: Stylized search performance behavior for various

genotype lengths when discrete ﬁtness is used.

is visible that increasing the length of the genotype

leads to an increase of the search performance.

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications

5 ANALYSIS OF HYPOTHESES

Hypothesis 1 (Crossover). Crossover does not

contribute to the search performance of integer-based

standard CGP.

The results of our experiments clearly show that two

algorithms that utilize subgraph crossover outper-

formed the (1 + 4)-CGP and (1 + λ)-CGP on various

problems in two different problem domains. Based

on these ﬁndings we can disprove that crossover can-

not contribute to the search performance of standard

integer-based CGP.

Hypothesis 2 (Redundancy). Extremely large geno-

types perform most effective in CGP.

Our redundancy and ﬁtness space analysis demon-

strated, that the rise in the genotype length increased

the size of the space of ﬁtness values rapidly on three

tested symbolic regression problems. Our experi-

ments also showed that the increase of the genotype

length led to a deterioration of the search performance

on all three tested problems. Therefore, we can state

that the dogma that extremely large genotypes per-

form most effectively in CGP cannot be generalized.

Hypothesis 3 (Population size). Small populations

perform most effective in CGP.

Our results in the symbolic regression domain show

that the claim that small population sizes generally

perform most effectively in CGP cannot be general-

ized. In the Boolean domain, the results and ﬁndings

of former studies seem to be coherent. However, it

has been demonstrated that this claim does not hold

for the symbolic regression and image operator de-

sign domain. Our results give clear evidence that the

use of medium and high population sizes can lead to a

signiﬁcantly better search performance than the tradi-

tional (1+4)-CGP in the symbolic regression domain.

6 DISCUSSION

Our experiments demonstrate that the subgraph

crossover can contribute to the search performance

by using a canonical GA or (µ + λ)-strategy. Further-

more, the results of our experiments indicate that the

predominance of the (1 + 4)-CGP and (1 + λ)-CGP

algorithms cannot be generalized in the Boolean

domain. However, for the high order parity problems,

the performance of the (1+λ)-CGP seems to be solid.

Overall, the results for the (1+λ)-CGP in the boolean

domain are coherent with previous studies (Miller and

Smith, 2006; Kaufmann and Kalkreuth, 2017). Our

experiments in the symbolic regression indicate that

the use of the subgraph crossover is beneﬁcial and can

contribute signiﬁcantly to the search performance in

these problem domains. Especially the performance

of the Canonical-CGP algorithm was superior to the

(1 + 4)-CGP on all tested problems in this problem

domain. Furthermore, our comparison of the whole

evolutionary process for the more simple benchmark

problems Koza 1, 2 & 3 revealed a big gap of the

search performance between the (1 + 4)-CGP and the

Canonical-CGP on these problems. Furthermore, the

Canonical-CGP ﬁnished all runs successfully within

the given budget of ﬁtness evaluations. The results

of our redundancy and ﬁtness space analysis indicate

that there might be a correlation between the size

of the space of ﬁtness values and the search perfor-

mance of a respective CGP algorithm. Since our

results also show that the parameter settings vary for

different problem domains it opens up the question,

which conditions or types of problems require bigger

or smaller population sizes and in which way the

subgraph crossover contributes to the evolutionary

search. The same question arises for the length of the

genotype. A preliminary assumption could be that

the ﬁtness landscape of certain problems requires

more exploration abilities to overcome local optima

or to explore a bigger and denser continuous ﬁtness

space. However, since the experiments of Miller and

Smith (2006) mostly focused on Boolean function

problems which are evaluated with discrete ﬁtness

more research is needed to investigate the behavior

of CGP algorithms in continuous ﬁtness spaces.

Our results also demonstrate that the ideal

parametrization of CGP depends on the prob-

lem. In this way our results conﬁrm the results of

a former study by Turner and Miller (2015) which

investigated different genotype lengths up to 100,000

nodes. In all cases there was an ideal number of

nodes which varied depending on the problem. The

ideal genotype lengths for the tested problems varied

between 50 and 3000 function nodes.

7 CONCLUSION AND FUTURE

WORK

A comprehensive study has been presented which

demonstrates that the use of subgraph crossover

is beneﬁcial on various problems in two different

problem domains. The results of the experiments

also clearly show that popular performance dogmas

of CGP are not coherent in another problem do-

main. However, our experiments demonstrate that

former ﬁndings of the effective use of low population

A Comprehensive Study on Subgraph Crossover in Cartesian Genetic Programming

sizes and extremely large genotypes are valid in the

Boolean domain. In the symbolic regression domain,

our experiments revealed that this dogma cannot be

generalized and showed that comparatively smaller

genotypes and bigger populations can also perform

effectively in CGP. This study paves the way for fur-

ther studies on the behavior of subgraph crossover

based CGP algorithms. Therefore, our future work

will focus on exploration analysis in ﬁtness and phe-

notype space of CGP. We will also focus on theoret-

ical work for crossover-based algorithms which will

be similar to the runtime analysis of Kalkreuth and

Droschinsky (2019) for mutational-only CGP.

REFERENCES

Atkinson, T., Plump, D., and Stepney, S. (2018). Evolving

graphs by graph programming. In Castelli, M., Sekan-

ina, L., Zhang, M., Cagnoni, S., and Garcia-Sanchez,

P., editors, EuroGP 2018: Proceedings of the 21st Eu-

ropean Conference on Genetic Programming, volume

10781 of LNCS, pages 35–51, Parma, Italy. Springer

Verlag.

Clegg, J., Walker, J. A., and Miller, J. F. (2007). A new

crossover technique for cartesian genetic program-

ming. In GECCO ’07: Proceedings of the 9th annual

conference on Genetic and evolutionary computation,

volume 2, pages 1580–1587, London. ACM Press.

Cramer, N. L. (1985). A representation for the adaptive

generation of simple sequential programs. In Grefen-

stette, J. J., editor, Proceedings of an International

Conference on Genetic Algorithms and the Applica-

tions, pages 183–187, Carnegie-Mellon University,

Pittsburgh, PA, USA.

da Silva, J. E. H. and Bernardino, H. (2018). Cartesian ge-

netic programming with crossover for designing com-

binational logic circuits. In 7th Brazilian Confer-

ence on Intelligent Systems, BRACIS 2018, S

ao Paulo,

Brazil, October 22-25, 2018, pages 145–150. IEEE

Computer Society.

Forsyth, R. (1981). BEAGLE a Darwinian approach to pat-

tern recognition. Kybernetes, 10(3):159–166.

Hicklin, J. (1986). Application of the genetic algorithm to

automatic program generation. Master’s thesis, Uni-

versity of Idaho.

Husa, J. and Kalkreuth, R. (2018). A comparative study on

crossover in cartesian genetic programming. In Eu-

roGP 2018: Proceedings of the 21st European Con-

ference on Genetic Programming, volume 10781 of

LNCS, pages 203–219, Parma, Italy. Springer Verlag.

Kalganova, T. (1997). Evolutionary approach to design

multiple-valued combinational circuits. In Proceed-

ings. of the 4th International conference on Applica-

tions of Computer Systems (ACS’97), pages 333–339,

Szczecin, Poland.

Kalkreuth, R. (2019). Two new mutation techniques for

cartesian genetic programming. In Proceedings of

the 11th International Joint Conference on Compu-

tational Intelligence, IJCCI 2019, Vienna, Austria,

September 17-19, 2019, pages 82–92. ScitePress.

Kalkreuth, R. and Droschinsky, A. (2019). On the time

complexity of simple cartesian genetic programming.

In Proceedings of the 11th International Joint Con-

ference on Computational Intelligence, IJCCI 2019,

Vienna, Austria, September 17-19, 2019, pages 172–

179. ScitePress.

Kalkreuth, R., Rudolph, G., and Droschinsky, A. (2017).

A new subgraph crossover for cartesian genetic pro-

gramming. In EuroGP 2017: Proceedings of the 20th

European Conference on Genetic Programming, vol-

ume 10196 of LNCS, pages 294–310, Amsterdam.

Springer Verlag.

Kaufmann, P. and Kalkreuth, R. (2017). Parametrizing

cartesian genetic programming: An empirical study.

In KI 2017: Advances in Artiﬁcial Intelligence - 40th

Annual German Conference on AI, Dortmund, Ger-

many, September 25-29, 2017, Proceedings, volume

10505 of Lecture Notes in Computer Science, pages

316–322. Springer.

Koza, J. (1990). Genetic Programming: A paradigm for ge-

netically breeding populations of computer programs

to solve problems. Technical Report STAN-CS-90-

1314, Dept. of Computer Science, Stanford Univer-

sity.

Koza, J. R. (1992). Genetic Programming: On the Pro-

gramming of Computers by Means of Natural Selec-

tion. MIT Press, Cambridge, MA, USA.

Koza, J. R. (1994). Genetic Programming II: Automatic

Discovery of Reusable Programs. MIT Press, Cam-

bridge Massachusetts.

McDermott, J., White, D. R., Luke, S., Manzoni, L.,

Castelli, M., Vanneschi, L., Jaskowski, W., Kraw-

iec, K., Harper, R., De Jong, K., and O’Reilly, U.-

M. (2012). Genetic programming needs better bench-

marks. In GECCO ’12: Proceedings of the fourteenth

international conference on Genetic and evolutionary

computation conference, pages 791–798, Philadel-

phia, Pennsylvania, USA. ACM.

Miller, J. F. (1999). An empirical study of the efﬁciency of

learning boolean functions using a cartesian genetic

programming approach. In Banzhaf, W., Daida, J.,

Eiben, A. E., Garzon, M. H., Honavar, V., Jakiela, M.,

and Smith, R. E., editors, Proceedings of the Genetic

and Evolutionary Computation Conference, volume 2,

pages 1135–1142, Orlando, Florida, USA. Morgan

Kaufmann.

Miller, J. F. (2020). Cartesian genetic programming: its sta-

tus and future. Genetic Programming and Evolvable

Machines, 21(1):129–168.

Miller, J. F. and Smith, S. L. (2006). Redundancy and com-

putational efﬁciency in cartesian genetic program-

ming. IEEE Transactions on Evolutionary Computa-

tion, 10(2):167–174.

Miller, J. F., Thomson, P., and Fogarty, T. (1997). Design-

ing electronic circuits using evolutionary algorithms.

arithmetic circuits: A case study. In Genetic Algo-

rithms and Evolution Strategies in Engineering and

Computer Science, pages 105–131. Wiley.

Turner, A. J. (2012). Improving crossover techniques in a

genetic program. Master’s thesis, Department of Elec-

tronics, University of York.

Turner, A. J. and Miller, J. F. (2015). Neutral genetic drift:

an investigation using cartesian genetic programming.

Genet. Program. Evolvable Mach., 16(4):531–558.

ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications