
hidden-to-output weights to improve network 
performance at each step. This error signal is then 
estimated for each preceding layer, but the error 
signal attenuates. 
2.1  Extreme Learning Machine 
 
Figure 1: Simple ELM network. 
Huang et al.’s (2004) principal contribution was to 
suggest that a set of random weights in the hidden 
layer could be used as a way to provide non-linear 
mapping between the input neurons and the output 
neurons. By having a large enough number of 
neurons in the hidden layer the algorithm can map a 
small number of input neurons to an arbitrarily large 
number of output neurons in a non-linear way. 
Training is performed only on the output neurons, 
and performance similar to multi-layer feed-forward 
networks using back-propagation is achieved with 
much reduced training time.  
It is possible to train an ELM network as shown 
in Figure 1 by using back-propagation, but since the 
input-to-hidden weights are fixed, it is more efficient 
to estimate the output weights using the Moore-
Penrose pseudo-inverse (Huang et al., 2004). The 
weight matrix calculated is the best least-squares 
fit for the output layer and, in addition, has 
the smallest norm of weights, which is important for 
optimal generalisation performance (Bartlett, 1998). 
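The pseudo-inverse step can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tanh activation, the normally distributed random weights, the toy XOR data and the names train_elm and predict_elm are all assumptions made for illustration.

```python
import numpy as np

def train_elm(X, T, n_hidden, rng):
    """Fit the output weights of a single-hidden-layer ELM (illustrative)."""
    # Random input-to-hidden weights and biases: fixed, never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    # Hidden-layer output matrix, one row per training pattern.
    H = np.tanh(X @ W + b)
    # Moore-Penrose pseudo-inverse: least-squares fit with minimum weight norm.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Apply the fixed hidden layer, then the fitted output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy data (assumed): XOR, which is not linearly separable.
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])

W, b, beta = train_elm(X, T, n_hidden=20, rng=rng)
pred = predict_elm(X, W, b, beta)
```

No iterative training takes place: the only fitted quantity is beta, obtained in a single linear-algebra step.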
2.2 Cascade Correlation 
The Cascade Correlation algorithm (Cascor) 
(Fahlman and Lebiere, 1990) is a very powerful 
method for training artificial neural networks. 
Cascor is a constructive algorithm which begins 
training with a single input layer connected directly 
to the output layer. Neurons are added one at a time to 
the network and are connected to all previous hidden 
and input neurons, producing a cascade network. 
When a new neuron is to be added to the network, 
all previous network weights are 'frozen'. The input 
weights of the neuron which is about to be added are 
then trained to maximise the correlation between 
that neuron's output and the remaining network 
error. The new neuron is then inserted into the 
network, and all weights connected to the output 
neurons are then trained to minimise the error 
function.  
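The criterion used to train a candidate neuron can be sketched as follows. The score S is the covariance-based measure given by Fahlman and Lebiere (1990), summed over output neurons; it is not stated in this section. The function name candidate_score and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def candidate_score(v, E):
    """S = sum_o | sum_p (v_p - mean(v)) * (E_po - mean_p(E_o)) |.

    v: candidate neuron output, shape (patterns,)
    E: residual network error, shape (patterns, outputs)
    """
    v_centred = v - v.mean()
    E_centred = E - E.mean(axis=0)
    # Covariance with each output's error; the sign is irrelevant.
    return float(np.abs(v_centred @ E_centred).sum())

# Toy residual error for 4 patterns and 1 output neuron (assumed values).
E = np.array([[0.0], [0.5], [0.0], [0.5]])
v_good = np.array([0.0, 1.0, 0.0, 1.0])   # follows the error pattern
v_poor = np.array([0.0, 0.0, 1.0, 1.0])   # unrelated to the error pattern

s_good = candidate_score(v_good, E)
s_poor = candidate_score(v_poor, E)
```

Here s_good evaluates to 0.5 and s_poor to 0, so the first candidate would be preferred. In Cascor the candidate's input weights are adjusted by gradient ascent on S before the neuron is inserted; that step is omitted from this sketch.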
Thus there are two training phases: the training 
of the hidden neuron weights, and the training of 
output weights. A previous extension to the Cascor 
algorithm used the RPROP algorithm (Riedmiller, 
1994) to train the whole network 
(Treadgold and Gedeon, 1997), with ‘frozen’ weights 
represented by initially low learning rates. That 
model (Casper) was shown to produce more 
compact networks, which also generalise better than 
Cascor. 
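The idea of representing 'frozen' weights by initially low learning rates can be sketched with a simplified sign-based update in the spirit of RPROP, without weight backtracking. This is not the Casper implementation: the function rprop_step, the quadratic toy error and the initial step sizes 0.2 and 0.001 are hypothetical values chosen for illustration.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               inc=1.2, dec=0.5, step_min=1e-6, step_max=50.0):
    """One sign-based update; returns weights, stored gradient, step sizes."""
    agree = grad * prev_grad
    # Same gradient sign as last time: grow the step. Sign flip: shrink it.
    step = np.where(agree > 0, np.minimum(step * inc, step_max), step)
    step = np.where(agree < 0, np.maximum(step * dec, step_min), step)
    # After a sign flip, skip this weight's update and forget the gradient.
    grad = np.where(agree < 0, 0.0, grad)
    w = w - np.sign(grad) * step
    return w, grad, step

# Toy error (assumed): sum((w - target)^2), with gradient 2 * (w - target).
target = np.array([1.0, 1.0])
w = np.zeros(2)
# Hypothetical initial step sizes: weight 0 belongs to the new neuron,
# weight 1 is an older weight that should start almost frozen.
step = np.array([0.2, 0.001])
prev_grad = np.zeros(2)

first_move = None
for i in range(200):
    grad = 2.0 * (w - target)
    w_new, prev_grad, step = rprop_step(w, grad, prev_grad, step)
    if i == 0:
        first_move = np.abs(w_new - w)
    w = w_new
```

On the first update the older weight moves by only 0.001 while the new weight moves by 0.2, yet both remain trainable and eventually reach the target. This differs from Cascor, where previous weights do not change at all.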
2.3 Caveats 
We have said it is generally accepted that 3 layers of 
processing neurons are sufficient, but we must point 
out that this is not always true.  
For example, in the field of 
petroleum engineering, in order to reproduce the 
fine-scale variability known to exist in core 
porosity/permeability data, separate neural nets are 
used: one for porosity prediction, followed by another 
for permeability prediction. This produces better 
results than a single combined network (Wong, Taggart 
and Gedeon, 1995). Hierarchical data provide a further 
example (Gedeon and Kóczy, 1998).  
3 CASCADE CORRELATION AND EXTREME LEARNING MACHINE 
ELMs can be trained very quickly to solve 
classification problems. In general, the larger the 
hidden layer, the higher the learning capacity of the 
network. However, the size of the hidden layer is 
critical to performance. If it is too small, the network 
will not have sufficient capacity to learn; if it is too 
large, learning times suffer and overfitting 
occurs.  
Finding the ideal size for the layer is 
problematic. If the number of neurons is greater than or 
equal to the number of training patterns then the 
network will be able to achieve 100% learning. 
However, this is not a useful conclusion, as in most 
cases we would expect the network to achieve 
satisfactory learning with far fewer neurons than this. 
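The effect of hidden layer size on training error can be illustrated with a small NumPy experiment. It is a sketch under assumed conditions: random inputs and random targets, a tanh hidden layer, and output weights fitted by pseudo-inverse; the function name elm_training_error is hypothetical.

```python
import numpy as np

def elm_training_error(X, T, n_hidden, rng):
    """Mean squared training error of an ELM with n_hidden random neurons."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ T
    return float(np.mean((H @ beta - T) ** 2))

rng = np.random.default_rng(1)
n_patterns = 12
X = rng.normal(size=(n_patterns, 3))   # random training inputs (assumed)
T = rng.normal(size=(n_patterns, 1))   # arbitrary random targets (assumed)

# Training error for hidden layers smaller than, and equal to, the pattern count.
errors = {n: elm_training_error(X, T, n, rng) for n in (3, 6, n_patterns)}
```

With as many hidden neurons as training patterns the training error falls to numerical zero, even though the targets are random. That the network can fit arbitrary targets is why this bound says nothing about generalisation.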
SIMULTECH 2015 - 5th International Conference on Simulation and Modeling Methodologies, Technologies and Applications