On the Capacity of Hopfield Neural Networks as EDAs for Solving
Combinatorial Optimisation Problems
Kevin Swingler
Computing Science and Maths, University of Stirling, Stirling, FK9 4LA, Scotland, U.K.
Keywords:
Optimisation, Hopfield Neural Networks, Estimation of Distribution Algorithms.
Abstract:
Multi-modal optimisation problems are characterised by the presence of either local sub-optimal points or a
number of equally optimal points. These local optima can be considered as point attractors for hill climbing
search algorithms. It is desirable to be able to model them either to avoid mistaking a local optimum for
a global one or to allow the discovery of multiple equally optimal solutions. Hopfield neural networks are
capable of modelling a number of patterns as point attractors which are learned from known patterns. This
paper shows how a Hopfield network can model a number of point attractors based on non-optimal samples from an objective function. The resulting network is shown to be able to model and generate a number of locally optimal solutions up to a certain capacity. This capacity, and a method for extending it, are studied.
1 INTRODUCTION
Optimisation based on an objective function (OF) involves finding an input to the OF that maximises its output (optimisation might alternatively minimise a cost function, but in this paper we maximise). In many cases the OF has a structure that can
be exploited to speed a search considerably. There
are many algorithms that guide a search by sampling
from an objective function. Estimation of Distribution
Algorithms (EDAs) (Mühlenbein and Paaß, 1996),
(Shakya et al., 2012) build a model of the probability
of sub-patterns appearing in a good solution, and then
generate new solutions by sampling based on those
probabilities. Objective function models (Jin, 2005)
estimate the OF by fitting a model to selected samples
and then search the model. Population based meth-
ods such as Genetic Algorithms (Goldberg, 1989),
(De Jong, 2006) and Particle Swarm Optimisation
(Kennedy, 2001) do not build an explicit model, but
maintain a population of high scoring points to guide
the sampling of new points.
Any OF of interest is likely to be multi-modal, which means that it contains more than one local maximum. An algorithm that guides the sampling of
the OF towards high points is likely to reach a lo-
cal maximum before it finds the global maximum.
A local maximum is a point which has no immedi-
ate neighbours in input space with a higher output. If
we assume that the OF surface is smooth (or can be smoothed) then each local maximum sits at the top of a hill delimited on all sides by local minima. A hill climb from any point on the slopes of this hill will take you to the local maximum, so local maxima can
be considered as attractor states for a hill climbing al-
gorithm. One way to manage the problem of local
maxima is to model these attractor states. This paper
shows how a neural network can be used to explicitly
model the attractor states of an OF and investigates
the number of attractors a model can carry.
The motivation for using a Hopfield network is
that it can model multiple point attractors in high di-
mensional space. The dynamics of such networks mean that, from any starting point in input space, the network is guaranteed to settle to one of these attractor points. Hopfield
networks have been thoroughly studied so if they are
capable of modelling the attractors of an OF, then this
large body of existing research may be immediately
applied. In this paper we present evidence that Hop-
field networks can learn the attractor states of arbi-
trary OFs from a relatively small number of function
samples.
Paper Structure. This paper first describes the con-
cepts of objective functions and attractors in objec-
tive function space. Section 2 describes the standard
Hopfield neural network and Section 3 introduces the
Hopfield EDA (HEDA) and addresses its capacity to
store local maxima as attractor states. Sections 4 and
5 describe how to train and search a HEDA using two
different learning algorithms. Section 6 shows the re-
sults of some experiments investigating the capacity
of different HEDA models and Section 7 offers some
conclusions and future directions.
1.1 The Problem Space
We consider binary patterns over c where:

c = (c_1, \ldots, c_n), \quad c_i \in \{-1, 1\}    (1)

We denote the objective function as:

f(c) = x, \quad x \in \mathbb{R}    (2)
Local maxima in f(c) can be viewed as the peaks of hills with monotonically increasing slopes. Each of these hills is an attractor in the search space and a simple hill climbing search from any point on its slope will lead to the hill's attractor point. The goal is to find one or more patterns in c that maximise f(c) and to avoid local maxima. We will do this by modelling the attractor states.
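To make the notion of an attractor concrete, here is a minimal sketch (our own illustration, not from the paper; the target pattern and objective shown are hypothetical) of a greedy bit-flip hill climb that converges to the local maximum whose basin contains the starting point.

import random

def hill_climb(objective, c):
    """Greedy bit-flip hill climb over patterns in {-1, +1}^n.

    Repeatedly flips the single element that most improves the objective;
    stops at a local maximum, i.e. the attractor point of the starting basin.
    """
    current = list(c)
    improved = True
    while improved:
        improved = False
        best_score = objective(current)
        for i in range(len(current)):
            candidate = list(current)
            candidate[i] = -candidate[i]          # flip one element
            score = objective(candidate)
            if score > best_score:
                best_score, current, improved = score, candidate, True
    return current, best_score

# Hypothetical example: score is the fraction of agreements with a fixed target.
target = [1, -1, 1, 1, -1, -1, 1, -1]
objective = lambda c: sum(ci == ti for ci, ti in zip(c, target)) / len(target)
start = [random.choice([-1, 1]) for _ in range(len(target))]
print(hill_climb(objective, start))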
2 HOPFIELD NETWORKS
Hopfield networks (Hopfield, 1982) are able to store
patterns as point attractors in n dimensional binary
space. Traditionally, known patterns are loaded di-
rectly into the network (see the learning rule 5 below),
but in this paper we investigate the use of a Hopfield
network to discover point attractors by sampling from
an objective function. A Hopfield network is a neural
network consisting of n simple connected processing
units. The values the units take are represented by a
vector, u:
u = (u_1, \ldots, u_n), \quad u_i \in \{-1, 1\}    (3)
The processing units are connected by symmetric
weighted connections:
W = [w_{ij}]    (4)
where w_{ij} is the strength of the connection from unit i to unit j. Units are not connected to themselves, i.e. w_{ii} = 0, and connections are symmetrical, i.e. w_{ij} = w_{ji}. The values of the weighted connections define the point attractors. Learning in a standard Hopfield network takes place by setting the pattern to be learned using formula 6 and applying the Hebbian weight update rule:

w_{ij} \leftarrow w_{ij} + u_i u_j, \quad i \neq j    (5)
A single pattern, c, is set by

\forall i: u_i \leftarrow c_i    (6)

where \leftarrow indicates assignment. Pattern recall is performed by allowing the network to settle to an attractor state determined by the values of its weights. The unit update rule during settling is

a_i \leftarrow \sum_{j} w_{ji} u_j    (7)
where a_i is a temporary activation value, following which the unit's value is capped by a threshold, \theta, such that:

u_i \leftarrow \begin{cases} 1 & \text{if } a_i > \theta \\ -1 & \text{otherwise} \end{cases}    (8)
In this paper, we will always use θ = 0. The pro-
cess of settling repeatedly uses the unit update rule of
formulae 7 and 8 for a randomly selected unit in the
network until no update produces a change in unit val-
ues. At that point, the network is said to have settled.
The symmetrical weights and zero self-connections mean that the network's energy (equation 9 below) is a Lyapunov function, which guarantees that the network will settle to a fixed point from any starting point.
With the above restrictions in place, the network
has an energy function that determines the set of pos-
sible stable states into which it will settle. The energy
function is defined as:
E = -\frac{1}{2} \sum_{i,j} w_{ij} u_i u_j    (9)
Settling the network by formulae 7 and 8 produces a pattern corresponding to a local minimum of E in equation 9. Hopfield networks have been used to solve optimisation tasks such as the travelling salesman problem (Hopfield and Tank, 1985) but the weights
are set by an analysis of the problem rather than by
learning. In the next section, we show how random
patterns and an objective function can be used to train
a Hopfield network as a search technique.
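As a concrete illustration of the rules above, the following minimal sketch (our own numpy-based code; the class name HopfieldNetwork is not from the paper) stores patterns with the Hebbian rule 5 and recalls them by asynchronous settling with rules 7 and 8, using the energy of equation 9.

import numpy as np

class HopfieldNetwork:
    """Minimal binary Hopfield network with Hebbian storage and
    asynchronous settling (threshold theta = 0, as in the paper)."""

    def __init__(self, n):
        self.n = n
        self.W = np.zeros((n, n))

    def store(self, pattern):
        # Hebbian rule 5: w_ij <- w_ij + u_i * u_j, with no self-connections.
        u = np.asarray(pattern, dtype=float)
        self.W += np.outer(u, u)
        np.fill_diagonal(self.W, 0.0)

    def energy(self, u):
        # Energy function, equation 9.
        return -0.5 * u @ self.W @ u

    def settle(self, pattern, rng=None):
        # Repeatedly apply rules 7 and 8 to randomly chosen units
        # until a full sweep produces no change.
        rng = rng or np.random.default_rng()
        u = np.asarray(pattern, dtype=float).copy()
        changed = True
        while changed:
            changed = False
            for i in rng.permutation(self.n):
                a = self.W[:, i] @ u          # activation a_i = sum_j w_ji u_j
                new_value = 1.0 if a > 0 else -1.0
                if new_value != u[i]:
                    u[i] = new_value
                    changed = True
        return u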
3 HOPFIELD EDAS
We define a Hopfield EDA (HEDA) as an EDA imple-
mented by means of a Hopfield neural network. Al-
though a traditional Hopfield network contains only
second order connections, we will allow higher order
networks and denote a HEDA of order m by HEDA_m. We will concentrate mostly on second order models based on the traditional Hopfield network, i.e. HEDA_2 models.
3.1 HEDA Capacity
The capacity of a HEDA is the number of distinct at-
tractors it can model. As each attractor is a single lo-
cal maximum, the capacity of the network determines
the number of local maxima a HEDA can absorb on
its way to the global maximum.
OntheCapacityofHopfieldNeuralNetworksasEDAsforSolvingCombinatorialOptimisationProblems
153
There is a limit on the number of attractor states
a Hopfield network can represent. For random pat-
terns, (McEliece et al., 1987) states that the capacity
of such a network is n/(4 ln n) where n is the number
of elements in the model.
(Storkey and Valabregue, 1999) suggest an alternative to the Hebbian learning rule that increases the capacity of a Hopfield network. This new learning rule can be used to increase the number of attractors in a HEDA_2 and so increase the number of local maxima it is able to model. A Hopfield network of order 2 trained with Storkey's learning rule has a capacity of n/\sqrt{2 \ln n}.
The Hopfield approach described for HEDA_m can be extended to higher orders, with a corresponding increase in storage capacity, along with an associated exponential increase in processing time. (Kubota, 2007) states that the capacity of order m associative memories is O(n^m / \ln n).
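For orientation, a small helper (a sketch of our own; it simply evaluates the capacity estimates quoted above) compares the two bounds at a few network sizes.

import math

def hebbian_capacity(n):
    # Capacity estimate n / (4 ln n) for Hebbian learning (McEliece et al., 1987).
    return n / (4 * math.log(n))

def storkey_capacity(n):
    # Capacity estimate n / sqrt(2 ln n) for the Storkey rule.
    return n / math.sqrt(2 * math.log(n))

for n in (10, 50, 100):
    print(n, round(hebbian_capacity(n), 1), round(storkey_capacity(n), 1))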
4 TRAINING A HEDA
In this section we describe a method for training a
HEDA_2. The principles apply equally to HEDAs of
higher order. During learning, candidate solutions are
generated randomly one at a time. Each candidate so-
lution is evaluated using the objective function and the
result is used as a learning rate in the Hebbian weight
update rule (see update rule 10). Consequently, each
pattern is learned with a different strength, which re-
flects its quality as a solution.
4.1 The New Weight Update Rule
Hopfield networks have a limited capacity for storing
patterns. If a number of patterns greater than this ca-
pacity are learned, patterns interfere with each other
producing spurious states, which are a combination of
more than one pattern. To learn the point attractors of
local optima without ever sampling those points, we
need to create spurious states that are a combination
of lower points. We do this by over-filling a Hop-
field network with samples and introducing a strength
of learning so that higher scoring patterns contribute
more to the new spurious states. This yields a simple
modification to the Hebbian rule:
w_{ij} \leftarrow w_{ij} + f(c) u_i u_j    (10)

where f(c) is the objective function. This has the effect of learning high scoring second order sub-patterns more strongly than lower scoring ones. Note that due to the symmetry of the weight connections, each attractor has an associated inverse pattern that is also an attractor. This means that both the pattern and its inverse may need to be scored to tell the solutions apart from their inverse twins.
4.2 The Learning Algorithm
The learning algorithm proceeds as follows (a code sketch is given after the list):

1. Set up a Hopfield network with w_{ij} = 0 for all i, j.

2. Repeat the following until one or more stopping criteria are met:

(a) Generate a random pattern, c, where each c_i has an equal probability of being set to 1 or -1.

(b) Calculate f(c) by equation 2.

(c) Load c using formula 6.

(d) Update the weight matrix W using the learning rule in formula 10.

Stop when a pattern of the required quality has been found, when the attractor states become stable, or when the network reaches capacity.
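A minimal sketch of this training loop (our own code; the function name and the fixed sample budget used as a stopping criterion are simplifications of the criteria listed above):

import numpy as np

def train_heda2(objective, n, num_samples, rng=None):
    """Train a second order HEDA by Hebbian learning weighted by f(c), rule 10."""
    rng = rng or np.random.default_rng()
    W = np.zeros((n, n))
    for _ in range(num_samples):
        # Step (a): random pattern with elements in {-1, +1}.
        c = rng.choice([-1.0, 1.0], size=n)
        # Step (b): evaluate the objective function.
        score = objective(c)
        # Steps (c) and (d): weighted Hebbian update, rule 10.
        W += score * np.outer(c, c)
        np.fill_diagonal(W, 0.0)
    return W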
5 SEARCHING A HEDA MODEL
Once a number of solutions have been found, the net-
work will have a number of local attractors. By pre-
senting a new pattern and allowing the network to set-
tle, the closest solution (or locally optimal solution)
will be found. This settling does not require f(c) to be
evaluated. In cases where a partial solution is already
known, the HEDA may be used to find the closest lo-
cal optimum. Where the global optimum is required,
the model needs to be searched. This is quite fast be-
cause the objective function does not have to be eval-
uated very often. The search could be carried out in
a random fashion, picking points and settling to their
attractor states until an acceptable score is produced,
or using a more sophisticated search method such as
simulated annealing (Hertz et al., 1991).
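As an illustration, a random-restart search over a trained weight matrix might look like the following sketch (our own code; settle implements the settling rules 7 and 8, and both an attractor and its inverse twin are scored, as noted in Section 4.1):

import numpy as np

def search_heda2(objective, W, num_restarts, rng=None):
    """Random-restart search of a trained HEDA_2 weight matrix W."""
    rng = rng or np.random.default_rng()
    n = W.shape[0]
    best_pattern, best_score = None, -np.inf
    for _ in range(num_restarts):
        u = rng.choice([-1.0, 1.0], size=n)
        u = settle(W, u, rng)                 # settle to the nearest attractor
        for candidate in (u, -u):             # an attractor and its inverse twin
            score = objective(candidate)
            if score > best_score:
                best_pattern, best_score = candidate.copy(), score
    return best_pattern, best_score

def settle(W, u, rng):
    """Asynchronous settling, rules 7 and 8 with theta = 0."""
    u = u.copy()
    changed = True
    while changed:
        changed = False
        for i in rng.permutation(len(u)):
            new_value = 1.0 if W[:, i] @ u > 0 else -1.0
            if new_value != u[i]:
                u[i], changed = new_value, True
    return u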
5.1 Improving Capacity
Storkey (Storkey and Valabregue, 1999) introduced
a new learning rule for Hopfield networks that in-
creased the capacity of a network compared to using
the Hebbian rule. The new weight update rule is:
w_{ij} \leftarrow w^{t-1}_{ij} + \frac{1}{n} u_i u_j - \frac{1}{n} u_i h_{ji} - \frac{1}{n} u_j h_{ij}    (11)

where

h_{ij} = \sum_{k \neq i,j} w^{t-1}_{ik} u_k    (12)
IJCCI2012-InternationalJointConferenceonComputationalIntelligence
154
and w^{t-1}_{ij} is the weight at the previous time step. The new terms, h_{ji} and h_{ij}, have the effect of creating a local field around w_{ij} that reduces the lower order noise brought about by the interaction of different attractors.
To use this learning rule in a HEDA, we make the
following alterations to the update rules:
w_{ij} \leftarrow w^{t-1}_{ij} + \frac{1}{n} \left( u_i u_j - u_i h_{ji} - u_j h_{ij} \right) f(c)    (13)

and

h_{ij} = \beta \sum_{k \neq i,j} w^{t-1}_{ik} u_k    (14)

where \beta < 1 is a discount parameter that controls how much damping is applied to the learning rule.
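A sketch of a single weight update under this damped Storkey-style rule, following equations 13 and 14 as reconstructed above (the vectorised form and the example value of beta are our own choices):

import numpy as np

def storkey_heda_update(W, c, score, beta=0.5):
    """One weight update using the damped Storkey-style rule, equations 13 and 14.

    W     : current weight matrix (n x n, symmetric, zero diagonal)
    c     : candidate pattern with elements in {-1, +1}
    score : objective value f(c), used to scale the update
    beta  : discount parameter (< 1) damping the local fields
    """
    n = len(c)
    u = np.asarray(c, dtype=float)
    Wu = W @ u
    # Local fields h_ij = beta * sum_{k != i,j} w_ik u_k, equation 14.
    h = beta * (Wu[:, None] - (np.diag(W) * u)[:, None] - W * u[None, :])
    # Equation 13: w_ij <- w_ij + (u_i u_j - u_i h_ji - u_j h_ij) * f(c) / n.
    W_new = W + (np.outer(u, u) - u[:, None] * h.T - u[None, :] * h) * score / n
    np.fill_diagonal(W_new, 0.0)
    return W_new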
6 EXPERIMENTAL RESULTS
The following experimental results test the new learn-
ing rules. We first introduce the objective functions
used and then describe the experiments.
6.1 Experimental Objective Functions
We use two types of objective function: a set of dis-
tinct target patterns and a concept that contains depen-
dencies of order 2.
In the first case, the set of target patterns are de-
noted as the set t:
t = \{t_1, \ldots, t_s\}    (15)

We then define the objective function as an inverse normalised weighted Hamming distance between c and each target pattern t_j in t:
f(c \mid t_j) = 1 - \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \delta_{c_i, t_{ji}} \right)    (16)
where t_{ji} is element i of target j and \delta_{c_i, t_{ji}} is the Kronecker delta function between pattern element i in t_j and its equivalent in c. We take the score of a single pattern to be the maximal score over all the members of the target set:

f(c \mid t) = \max_{j=1 \ldots s} f(c \mid t_j)    (17)
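A sketch of this target-set objective (our own code; the score is the best fraction of matching elements over all targets, which is equivalent to equations 16 and 17):

import numpy as np

def target_set_objective(targets):
    """Build f(c | t) from a list of target patterns in {-1, +1}^n."""
    targets = [np.asarray(t) for t in targets]

    def f(c):
        c = np.asarray(c)
        # Equation 16: 1 minus the normalised Hamming distance to each target;
        # equation 17: take the maximum over the target set.
        return max(np.mean(c == t) for t in targets)

    return f

# Hypothetical usage with two small targets.
f = target_set_objective([[1, -1, 1, -1], [1, 1, -1, -1]])
print(f([1, -1, 1, 1]))   # 0.75: three of four elements match the first target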
6.2 Testing HEDA_2 Capacity by Learning and Searching
This set of experiments compares the capacity of a
normally trained Hopfield network with the search capacity of a HEDA_2. We will compare two learning
rules (Hebbian and Storkey). The experiments are re-
peated many times, all using randomly generated tar-
get patterns where each element has an equal proba-
bility of being +1 or -1.
6.2.1 Experiment 1 - Hebbian HEDA_2 Capacity
In experiment 1 we compare Hebbian trained Hopfield networks with their equivalent HEDA_2 models. The aim is to discover whether or not the HEDA_2 model can attain the capacity of the Hopfield network. Hopfield networks were trained on patterns using standard Hebbian learning, with one pattern at a time being added until the network's capacity was exceeded. At this point, the learned patterns were set to be the targets for the HEDA_2 search and the network's weights were reset.
100 repeated trials were made training HEDA_2 networks ranging in size from 10 to 100 units in steps of 5. For each trial, the capacity of the trained network, the number of those patterns discovered by random sampling, and the time taken to find them all were recorded.
Results. Regardless of the capacity or size of the Hopfield network, the HEDA_2 search was always able to discover every pattern learned during the capacity filling stage of the test. From this, we can conclude that the capacity of a HEDA_2 for storing local optima is the same as the capacity of the equivalent Hopfield network when searching for a set of random targets.
Figure 1 shows the relationship between Hopfield
network size and capacity. The spread of capacity val-
ues is wide - varying with the level of interdependence
between the random patterns. The chart shows the
range and the inter-quartile range of capacity for each
network size.
Capacity is linear with network size in terms of
units, but the number of weights in a network in-
creases quadratically with capacity.
Figure 1: The learned capacity of HEDA_2 models of various
sizes across 100 trials.
OntheCapacityofHopfieldNeuralNetworksasEDAsforSolvingCombinatorialOptimisationProblems
155
Figure 2 shows the number of samples required to
find all patterns against network capacity. By curve
fitting to the data shown, we find that the number of
samples required to find all solutions is quadratic in the number of solutions.
In fact, as the search space grows, the number of iterations required to fully characterise the search space, as a proportion of the size of the search space, diminishes exponentially. For networks of size 100, the search space has 2^{100} possible states and the HEDA_2 is able to find all of the targets in an average of around 355,000 samples. That is a sample consisting of roughly 2.8 \times 10^{-25} of all possible patterns. In this modelling
exercise, no evolution towards a target took place; the
learning took place using random sampling.
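As a quick check of that fraction (our arithmetic, using the figures quoted above):

\frac{355{,}000}{2^{100}} \approx \frac{3.55 \times 10^{5}}{1.27 \times 10^{30}} \approx 2.8 \times 10^{-25}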
Figure 2: The number of trials needed to find all local op-
tima in a Hopfield network filled to capacity, plotted against
the number of patterns to find.
6.2.2 Experiment 2 - Storkey HEDA_2 Capacity
In experiment 2 we repeat experiment 1 but use the
Storkey learning rule rather than the Hebbian version.
The experimental procedure is the same as that de-
scribed above, except that we only have samples from
networks up to size 60, as the process for larger net-
works is very slow.
Results. As with the Hebbian learning, the Storkey trained HEDA_2 search was always able to discover
every pattern learned during the capacity filling stage
of the test. This shows that the improved learning rule
will deliver the increased capacity for capturing local
optima that we sought. The cost of this capacity is a
far slower learning algorithm, however.
Figure 3 shows the relationship between Hopfield
network size and capacity for the Storkey trained net-
work.
Figure 4 shows the number of samples required
to find all patterns against network size when using
the Storkey rule. Again, we see that search iterations
increase quadratically with network capacity.
Figure 3: The learned capacity of Storkey trained HEDA_2 models.
Figure 4: The number of trials needed to find all local
optima in a Hopfield network filled to capacity using the
Storkey learning rule, plotted against the number of patterns
to find.
6.3 Experiments with Second Order
Patterns
The concept of vertical symmetry in a pattern involves
a second order rule. It is not possible to score the con-
tribution of pattern elements in isolation - each one
must be considered with its mirror partner. By design-
ing an objective function that scores a pattern in terms
of its symmetry, we show how a HEDA_2 can learn a
second order concept and so generate many patterns
that score perfectly against this concept, despite hav-
ing never seen any such patterns during training.
Patterns were generated in a 6 x 6 image of binary pixels. There are 2^{36} (68,719,476,736) possible patterns in such a matrix. Of those, there will be one vertically symmetrical pattern for every possible pattern in one half of the image. There are 2^{18} (262,144) such half patterns, representing 0.00038% of the total number of possible patterns.
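A sketch of a symmetry-scoring objective of this kind (our own formulation; the paper does not give its exact function): the score is the fraction of pixels that agree with their mirror partner about the vertical axis, so a perfectly symmetrical image scores 1.0.

import numpy as np

def symmetry_objective(c, width=6, height=6):
    """Score a flattened binary image by its symmetry about the vertical axis."""
    image = np.asarray(c).reshape(height, width)
    mirrored = image[:, ::-1]                  # flip columns left-to-right
    return float(np.mean(image == mirrored))

# Hypothetical usage on a random 6 x 6 pattern in {-1, +1}.
rng = np.random.default_rng()
pattern = rng.choice([-1, 1], size=36)
print(symmetry_objective(pattern))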
In this trial the network was allowed to learn un-
til ten different symmetrical patterns had been found.
At this point, the learning process was terminated and
the network was tested with a set of local searches de-
signed to count the number of attractor states learned.
IJCCI2012-InternationalJointConferenceonComputationalIntelligence
156
Results. Across 50 trials, the 36 node HEDA took an average of 9932 pattern evaluations before it terminated, having found 10 perfect scoring patterns. A record of the samples made showed that none of the randomly generated patterns used during training gained a perfect score, so the network was trained only on less-than-perfect patterns. The average trained model's capacity was found to be 132, which gives an average of 75 OF evaluations per pattern found. Figure 5 shows two examples of random starting patterns and their associated attractor states.
Figure 5: Two examples of fixed point attractors of the
HEDA_2 trained on the second order concept of symmetry.
The left hand column shows random starting points and the
right hand column shows the associated point attractor state.
7 CONCLUSIONS
It is possible to adapt both the Hebbian and Storkey
learning rules for Hopfield networks to allow them to
learn the attractor states corresponding to multiple lo-
cal maxima based on random samples from an objec-
tive function. We have experimentally shown that the
capacity of these networks is at least equal to the ca-
pacity of a Hopfield network trained directly on the
attractor points.
Networks trained in this way are able to find a set
of attractors by undirected sampling - that is with no
evolution of solutions - in a number of samples that is
a very small fraction of the size of the search space.
In a second order network, as the network size n varies, we have seen that the search space grows exponentially with n, the number of local optima that can be stored grows linearly with n, and the time to find all local optima grows quadratically with n.
Future work will address a number of areas including higher order networks and networks of variable
order; an evolutionary approach to training the net-
works when the number of local optima is higher than
the capacity of the network; the effects of spurious at-
tractors; the use of the Energy function of equation 9
as a proxy for objective function evaluations; and the
smoothing of noisy landscapes. Hopfield networks
have been extensively studied so there is a wealth of
research on which to draw during future work.
ACKNOWLEDGEMENTS
Thank you to David Cairns and Leslie Smith for your
helpful comments on earlier versions of this paper.
REFERENCES
De Jong, K. (2006). Evolutionary computation: a unified
approach. MIT Press, Cambridge, Mass.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Op-
timization, and Machine Learning. Addison-Wesley
Professional, 1 edition.
Hertz, J., Krogh, A., and Palmer, R. G. (1991). Introduc-
tion to the Theory of Neural Computation. Addison-
Wesley, New York.
Hopfield, J. J. (1982). Neural networks and physical sys-
tems with emergent collective computational abili-
ties. Proceedings of the National Academy of Sciences
USA, 79(8):2554–2558.
Hopfield, J. J. and Tank, D. W. (1985). Neural computa-
tion of decisions in optimization problems. Biological
Cybernetics, 52:141–152.
Jin, Y. (2005). A comprehensive survey of fitness approxi-
mation in evolutionary computation. Soft Computing
- A Fusion of Foundations, Methodologies and Appli-
cations, 9:3–12.
Kennedy, J. (2001). Swarm intelligence. Morgan Kaufmann
Publishers, San Francisco.
Kubota, T. (2007). A higher order associative memory with
McCulloch-Pitts neurons and plastic synapses. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2007), pages 1982–1989.
McEliece, R., Posner, E., Rodemich, E., and Venkatesh,
S. (1987). The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, 33(4):461–482.
Mühlenbein, H. and Paaß, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. In Voigt, H.-M., Ebeling, W., Rechenberg,
I., and Schwefel, H.-P., editors, Parallel Problem Solv-
ing from Nature PPSN IV, volume 1141 of Lecture
Notes in Computer Science, pages 178–187. Springer
Berlin / Heidelberg.
Shakya, S., McCall, J., Brownlee, A., and Owusu, G.
(2012). DEUM: Distribution estimation using Markov networks. In Shakya, S. and Santana, R., editors,
Markov Networks in Evolutionary Computation, vol-
ume 14 of Adaptation, Learning, and Optimization,
pages 55–71. Springer Berlin Heidelberg.
Storkey, A. J. and Valabregue, R. (1999). The basins of attraction of a new Hopfield learning rule. Neural Networks, 12(6):869–876.
OntheCapacityofHopfieldNeuralNetworksasEDAsforSolvingCombinatorialOptimisationProblems
157