
 
set when the last training example has been 
considered in the fitness function of the algorithm 
and not anytime before. 
Before moving on to the invalidation mechanism 
of the cache, some more details regarding the 
implementation of the cache hit mechanics are 
necessary from the software coding point of view. 
When the genetic programming algorithm starts 
computing the fitness of a solution, it processes each 
gene one after the other. Before executing the 
corresponding computation it checks for a valid 
cache entry by evaluating the cache gene validity 
flag. If the flag is set then the computation is spared 
and the values for all training cases regarding the 
specific gene are retrieved from the cache. The 
location of the cache frame is found in the cache 
index table in the location that corresponds to the 
index of the solution being examined. Along with 
the values of the training cases, the algorithm 
retrieves the location of the gene in the solution that 
must be examined next. In Figure 1, assuming 
that node (3) fires a cache hit, the returning-gene 
information will point to node (5). The evaluation 
function resumes its operation from node (5), whose 
result may or may not be in the cache. If no cache hit 
occurs, the evaluation function performs 
its computations as normal. The cache offsets 
stored in the gene-index table are just ascending 
integers that keep incrementing until the allocated 
cache memory space is exhausted. At that point 
there are three replacement strategies that can be 
used: the least recently used replacement (LRU), the 
FIFO replacement and the genome simplification 
process. LRU requires keeping an aging variable 
in memory that is reset every time a hit occurs, while 
the FIFO strategy just resets the cache allocation 
index to its zero relative address and starts storing 
cache calculations from the beginning. The third and 
more appealing strategy is more appropriate for 
genetic programming cache implementations since it 
flushes the cache and builds up an updated one 
through the application of a computational 
simplification process. Complex nested genetic 
programs are replaced with shorter blocks that are 
mathematically equivalent. This reduces the average 
program length in the population and thus accelerates the 
search even more. The invalidation of the cache is 
discussed in more detail below. The pseudocode 
that should be added to 
the original fitness function of a genetic 
programming algorithm is trivial and is shown in 
Figure 2. 
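As an aside on the FIFO replacement strategy described above, the following is a minimal Python sketch of an ascending cache-offset allocator that wraps back to its zero relative address once the allocated space is exhausted. The class name `FifoFrameAllocator` and its fields are hypothetical, and clearing the validity flag of a frame that is about to be overwritten is an assumption made here for safety; the text does not state how stale flags are handled on wrap-around.

```python
class FifoFrameAllocator:
    """Hands out ascending cache frame indexes and wraps to 0 when full.

    Hypothetical helper for illustration only; not the paper's code.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.next_index = 0                # the cache allocation index
        self.valid = [False] * capacity    # one validity flag per frame

    def allocate(self):
        """Return the frame index to write next, recycling the oldest frame."""
        index = self.next_index
        # Assumption: a frame that is about to be overwritten must not be
        # served as a hit, so its validity flag is cleared first.
        self.valid[index] = False
        # Ascending integers; reset to the zero relative address when
        # the allocated space is exhausted (FIFO replacement).
        self.next_index = (index + 1) % self.capacity
        return index
```

With a capacity of three frames, five consecutive allocations would reuse the two oldest frames. An LRU variant would additionally need the aging variable mentioned in the text.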
function FitnessFunction_CACHED() 
  for all Individuals in the population 
    for all TrainingCases 
      Offset = 0 
      while Offset < length(Individual) 
        index = AUX_CACHE_INDEXES[Individual, Offset] 
        if (AUX_CACHE_VALID[index] == true)      // Cache hit 
          result = CACHE[index, TrainingCase] 
          Offset = AUX_CACHE_RETURN[Individual, Offset] 
        else                                     // No cache hit 
          result = . . . . .                     // Normal computations 
          Offset = . . . . . . 
          AUX_CACHE_INDEXES[Individual, Offset] = CurrentIndex 
          CACHE[CurrentIndex, TrainingCase] = result 
          AUX_CACHE_RETURN[Individual, Offset] = Offset 
          if TrainingCase == TRAINING_CASES_SIZE 
            AUX_CACHE_VALID[CurrentIndex++] = true 
Figure 2: The pseudo code of the modified fitness function 
that incorporates the genetic programming cache. 
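The hit and return-offset mechanics of Figure 2 can be illustrated with a small runnable Python sketch. It is not the authors' implementation: it assumes a prefix-encoded linear genome built from the tokens `+`, `-`, `*`, the variable `x` and numeric constants, it caches only function nodes, and it evaluates all training cases of a gene in one step instead of looping over the cases as Figure 2 does. The names `GeneCache` and `eval_gene` are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed function set for the sketch.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@dataclass
class GeneCache:
    frames: list = field(default_factory=list)  # frame -> one value per training case
    valid: list = field(default_factory=list)   # frame -> validity flag
    index: dict = field(default_factory=dict)   # (individual, offset) -> frame
    ret: dict = field(default_factory=dict)     # (individual, offset) -> return offset
    hits: int = 0


def eval_gene(genome, offset, cases, cache, ind_id):
    """Evaluate the gene (subtree) starting at `offset` for all training cases.

    Returns (values, next_offset), where next_offset is the location of the
    gene that has to be examined next.
    """
    key = (ind_id, offset)
    frame = cache.index.get(key)
    if frame is not None and cache.valid[frame]:
        # Cache hit: skip the computation and jump to the returning gene.
        cache.hits += 1
        return cache.frames[frame], cache.ret[key]

    token = genome[offset]
    if token in BINARY_OPS:
        left, nxt = eval_gene(genome, offset + 1, cases, cache, ind_id)
        right, nxt = eval_gene(genome, nxt, cases, cache, ind_id)
        op = BINARY_OPS[token]
        values = [op(a, b) for a, b in zip(left, right)]
        # The frame is marked valid only once every training case is stored.
        cache.frames.append(values)
        cache.valid.append(True)
        cache.index[key] = len(cache.frames) - 1
        cache.ret[key] = nxt
        return values, nxt
    if token == "x":
        return list(cases), offset + 1                  # variable terminal
    return [float(token)] * len(cases), offset + 1      # constant terminal
```

For the genome `["+", "*", "x", "x", "3"]`, a second evaluation of the same individual is answered entirely from the root frame. If only the root frame is invalidated, the evaluation still hits the cached `*` gene at offset 1 and resumes at offset 4, which mirrors the node (3) to node (5) jump described for Figure 1.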
Besides cache hit detection and data retrieval, 
a functional and sane cache mechanism requires 
data invalidation, enforced in a 
way that guarantees that cached entries stay 
synchronized with the genes they represent. Cached 
gene values are valid as 
long as the genes have not undergone any modification 
since their initial computation. In a genetic 
algorithm the genes are altered through genetic 
operations like crossover and mutation. This means 
that when two parents produce an offspring or an 
individual gets mutated then the cached values 
corresponding to the involved individuals must be 
checked for validity. Some cached values must be 
invalidated while others are not influenced by the 
way a specific genetic operation is performed. To 
detect which cache entries are invalid after a genetic 
operation and which are valid, the auxiliary cache 
table holding the returning nodes must be used. The 
cache invalidation is explained based on a two-point 
crossover scheme, which is slightly more complex 
than one-point crossover. The concept can be easily 
transferred to multi-point crossover operations and is 
very similar to the checks performed for invalidation 
after a mutation operation. As mentioned before, the 
invalidation rule is very simple: when a cached gene 
is altered, its cached value is invalidated. Gene 
alteration is detected by comparing the gene's 
returning node to the crossover point: if it is 
smaller, the entry is still valid. Conversely, 
if the gene's returning node is higher than the 
crossover point, the cached value must be 
invalidated, since the cached gene has definitely 
been altered. Figure 3 shows the procedure in 
more detail. Cached genes and their range (starting 
and ending gene offset) are shown in the parents' 
chromosomes. The crossover points define the way 
ECTA 2014 - International Conference on Evolutionary Computation Theory and Applications