Multiplicative Neural Network with Thresholds

Leonid Litinskii and Magomed Malsagov

Center for Optical Neural Technologies,

Scientific Research Institute for System Analysis of Russian Academy of Sciences,

Vavilov Str, Moscow, Russia

Keywords: Hopfield-type Network, Multiplicative Connections, Ground State.

Abstract: The memory of Hopfield-type neural nets is understood as the ground state of the net – a set of

configurations providing a global energy minimum. The use of thresholds allows good control over the

ground state. It is possible to build multiplicative networks with the degeneracy of the ground state

exceeding considerably the dimensionality of the problem (that is, the net memory can be much greater than

the dimensionality of the problem). The paper considers the potentials and limitations of the approach.

1 INTRODUCTION

Let us consider a Hopfield-type neural network with a multiplicative connection matrix

$$M_{ij} = (1-\delta_{ij})\,u_i u_j,\quad i,j = 1,\dots,p.$$

Here $\delta_{ij}$ is the Kronecker symbol, $p$ is the space dimensionality, and the real numbers $u_i$ are the coordinates of a normalized vector $\mathbf{u} = (u_1,\dots,u_p)$: $\|\mathbf{u}\|^2 = \sum_{i=1}^{p} u_i^2 = p$. The fixed point of the net is a configuration whose binary coordinates are the signs of the coordinates of vector $\mathbf{u}$:

$$\mathbf{s}^* = (s_1, s_2, \dots, s_p)\ \text{is a fixed point};\quad s_i = \operatorname{sgn}(u_i),\ i = 1,\dots,p.$$

The set of fixed points changes significantly if we define the dynamics of the same matrix by using thresholds $T_i$, which are not only non-zero, but also proportional to coordinates $u_i$:

$$J_{ij} = f(x)\,M_{ij},\quad T_i = g(x)\,u_i,\qquad s_i(\tau+1) = \operatorname{sgn}\Big(\sum_{j=1}^{p} J_{ij}\,s_j(\tau) + T_i\Big). \tag{1}$$

Being functions of parameter $x$, multipliers $f(x)$ and $g(x)$ themselves serve as free parameters of the model. $s_i(\tau)$ is the $i$-th coordinate of configuration $\mathbf{s}(\tau)$ determining the state of the net at time $\tau$. The

arrangement of the set of fixed points of this sort of

net is more complicated and more interesting. It

turns out to be possible to determine fully the

configuration sets that bring the energy functional to

a global minimum. Such configurations are usually

called the ground state of the net (the term is

borrowed from physics). It is the ground state that is

regarded as the memory of a net: it is also the case

with the Hebb matrix and projection connection

matrix (Hertz et al., 1991).
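As a minimal sketch of dynamics (1), the following Python fragment builds $J_{ij}$ and $T_i$ and applies the update rule. The helper names (`normalize`, `build_network`, `sweep`, `is_fixed_point`), the asynchronous update order, the tie-breaking choice `sgn(0) = +1`, the normalization $\|\mathbf{u}\|^2 = p$ and the `+` sign in front of the threshold are assumptions made for illustration; they are not fixed explicitly by the text.

```python
import math

def normalize(u):
    """Rescale u so that the sum of squared coordinates equals p = len(u) (assumed normalization)."""
    scale = math.sqrt(len(u) / sum(c * c for c in u))
    return [c * scale for c in u]

def sgn(value):
    # sgn(0) = +1 is an arbitrary tie-breaking choice.
    return 1 if value >= 0 else -1

def build_network(u, f_x, g_x):
    """J_ij = f(x) (1 - delta_ij) u_i u_j and T_i = g(x) u_i, as in model (1)."""
    p = len(u)
    J = [[0.0 if i == j else f_x * u[i] * u[j] for j in range(p)] for i in range(p)]
    T = [g_x * c for c in u]
    return J, T

def local_field(J, T, s, i):
    # Assumed sign convention: the threshold enters the local field with '+'.
    return sum(J[i][j] * s[j] for j in range(len(s))) + T[i]

def sweep(J, T, s):
    """One asynchronous sweep over all neurons: s_i <- sgn(local field)."""
    s = list(s)
    for i in range(len(s)):
        s[i] = sgn(local_field(J, T, s, i))
    return tuple(s)

def is_fixed_point(J, T, s):
    """True if no neuron changes its state under the update rule."""
    return all(sgn(local_field(J, T, s, i)) == s[i] for i in range(len(s)))
```

With $f(x) = 1$ and $g(x) = 0$ (no thresholds), `is_fixed_point` can be used to check on a small instance that the configuration of signs of $u_i$ is reproduced by the dynamics.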

Given model (1), it is possible to determine analytically the dependence of the ground state on external parameters $x$ and $\mathbf{u}$. It is possible to control the ground state by varying the external parameters. In short, the findings are as follows.

Generally speaking, the whole set of configurations $\mathbf{s}$ falls into sets of configurations that are equally distant from vector $\mathbf{u}$. Let us call such sets equidistant classes. It turns out that only equidistant classes can serve as the ground state of the net: under particular conditions all configurations of one class (and no other) provide a global minimum of the energy functional. The composition and the number of equidistant classes are defined by vector $\mathbf{u}$. The conditions that make one or another class become the ground state are determined by $f(x)$ and $g(x)$.

The possibility to make the ground state multiply degenerate by choosing vector $\mathbf{u}$ is a valuable advantage of the approach. The ground state can hold a great many configurations: the number of configurations is a polynomial function of the dimensionality $p$. That is to say, it becomes possible to build networks of very large memory.


Litinskii L. and Malsagov M.

Multiplicative Neural Network with Thresholds.

DOI: 10.5220/0004629605230528

In Proceedings of the 5th International Joint Conference on Computational Intelligence (NCTA-2013), pages 523-528

ISBN: 978-989-8565-77-8

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

The disadvantage of the method is that not every set of configurations can serve as the ground state. This state cannot consist of fully random configurations, because the configurations must be equally distant from vector $\mathbf{u}$. Equidistant configurations are located symmetrically around vector $\mathbf{u}$, and that is the limitation of the whole approach. How this restriction can be overcome is considered at the end of the paper.

In the next section we give the main results of

the work and their short explanations, and consider

one specific example. In the final section we analyze

the potentials and limitations of the approach.

2 MAIN RESULTS

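The energy of state $\mathbf{s}$ of network (1) is equal to

$$E(\mathbf{s}) \sim -\sum_{i,j=1}^{p} J_{ij}\,s_i s_j - 2\sum_{i=1}^{p} T_i s_i \sim -f(x)\,(\mathbf{u},\mathbf{s})^2 - 2g(x)\,(\mathbf{u},\mathbf{s}),$$

where $(\mathbf{u},\mathbf{s})$ is the scalar product of $p$-dimensional normalized vectors $\mathbf{u}$ and $\mathbf{s}$: $(\mathbf{u},\mathbf{s}) = \sum_{i=1}^{p} u_i s_i$. In further consideration it will be better to seek maxima of the functional $F(\mathbf{s}) \sim -E(\mathbf{s})$:

$$F(\mathbf{s}) = f(x)\,(\mathbf{u},\mathbf{s})^2 + 2g(x)\,(\mathbf{u},\mathbf{s}) \to \max. \tag{2}$$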
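The reduction of the energy to a function of the scalar product alone can be spelled out in one step. The derivation below is a sketch that assumes the normalization $\|\mathbf{u}\|^2 = p$, binary coordinates $s_i = \pm 1$, and the `+` sign in front of the threshold in the dynamics:

```latex
\sum_{i,j=1}^{p} J_{ij}\,s_i s_j
  = f(x)\sum_{i \neq j} u_i u_j s_i s_j
  = f(x)\Big[(\mathbf{u},\mathbf{s})^2 - \sum_{i=1}^{p} u_i^2 s_i^2\Big]
  = f(x)\big[(\mathbf{u},\mathbf{s})^2 - p\big],
\qquad
\sum_{i=1}^{p} T_i s_i = g(x)\,(\mathbf{u},\mathbf{s}).
```

The term $f(x)\,p$ is the same for every configuration, so it does not affect which configurations minimize the energy.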

2.1 Classes

Functional $F(\mathbf{s})$ takes the same value for all configurations $\mathbf{s}$ whose scalar products with vector $\mathbf{u}$ have the same value. Let us introduce the cosine of the angle between vectors $\mathbf{s}$ and $\mathbf{u}$: $\cos w = (\mathbf{s},\mathbf{u})/p$. When $\mathbf{s}$ runs over the $2^p$ possible configurations, $\cos w$ does not necessarily take $2^p$ different values. Let us number the different values of the cosine in descending order, starting the numbering with 0:

$$\cos w_0 > \cos w_1 > \dots \ge 0 \ge \dots > \cos w_{t-1} > \cos w_t. \tag{3}$$

The number $t+1$ of different values of the cosine does not exceed $2^p$. Let $\Sigma_k$ stand for the class of configurations $\mathbf{s}$ such that the cosine of the angle between $\mathbf{s}$ and vector $\mathbf{u}$ is $\cos w_k$:

$$\Sigma_k = \{\mathbf{s} : (\mathbf{s},\mathbf{u}) = p\cos w_k\},\quad k = 0,1,\dots,t. \tag{4}$$

Clearly, each configuration from class $\Sigma_k$ is the same distance away from vector $\mathbf{u}$, other configurations being a different distance off.

We see that functional $F(\mathbf{s})$ (2) takes $t+1$ values no matter what value $x$ takes. All we have to do to find the ground state is to find the greatest among the $t+1$ values $F_k(x)$:

$$F_k(x) \sim f(x)\cos^2 w_k + 2\,\frac{g(x)}{p}\cos w_k,\quad k = 0,1,\dots,t. \tag{5}$$
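The partition into equidistant classes can be checked by brute force for small $p$. The sketch below groups all $2^p$ configurations by $\cos w = (\mathbf{s},\mathbf{u})/p$ and evaluates functional (2) on them. The function names, the rounding of cosines to 9 digits (to make floating-point keys comparable) and the `+` sign convention in `functional` are assumptions of this illustration.

```python
from itertools import product

def equidistant_classes(u, digits=9):
    """Group all 2^p configurations by cos w = (s, u) / p, in descending order of the cosine."""
    p = len(u)
    groups = {}
    for s in product((1, -1), repeat=p):
        cos_w = round(sum(si * ui for si, ui in zip(s, u)) / p, digits)
        groups.setdefault(cos_w, []).append(s)
    cosines = sorted(groups, reverse=True)
    return cosines, [groups[c] for c in cosines]

def functional(u, s, f_x, g_x):
    """F(s) = f(x) (u, s)^2 + 2 g(x) (u, s); the '+' sign is an assumed convention."""
    us = sum(ui * si for ui, si in zip(u, s))
    return f_x * us * us + 2 * g_x * us
```

For example, for $\mathbf{u} = (0.5, 0.5, 0.5, 0.5, 2)$, which satisfies $\|\mathbf{u}\|^2 = 5 = p$, the 32 configurations fall into 9 classes, and `functional` is constant inside each of them for any choice of `f_x` and `g_x`.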

The number of classes $\Sigma_k$ and their composition are determined by vector $\mathbf{u}$

solely. With that,

are

determined by cosines



$\cos w \sim (\mathbf{s},\mathbf{u})$

for fixed

and

. We restrict our consideration to the case

when vector $\mathbf{u}$ has only nonnegative coordinates. The results can be easily extended to the case when some of the $u_i$ are negative (see below). We will assume that the $u_i$ are arranged in ascending order:

$$0 \le u_1 \le u_2 \le \dots \le u_p.$$

It is easy to see that the sequence of cosines (3) is symmetric about its middle point:

$$\cos w_k = -\cos w_{t-k},\quad \Sigma_k = -\Sigma_{t-k},\quad \forall k \le t.$$

If the number of different classes is even ($t+1 = 2l$), the cosines first go down to their positive minimum $\cos w_{l-1}$, then they become negative:

$$\cos w_0 > \cos w_1 > \dots > \cos w_{l-1} > 0 > \cos w_l > \cos w_{l+1} > \dots > \cos w_{t-1} > \cos w_t.$$

None of the cosines of the sequence is zero. On the other hand, when $t = 2l$, one of the cosines (3) is zero, and the sequence has the form:

$$\cos w_0 > \dots > \cos w_{l-1} > \cos w_l = 0 > \cos w_{l+1} > \cos w_{l+2} > \dots > \cos w_{t-1} > \cos w_t.$$

In this case $\Sigma_l$-class configurations are orthogonal to vector $\mathbf{u}$.

By way of example let us build a few starting classes $\Sigma_k$ when the coordinates of vector $\mathbf{u}$ obey the following rule:

$$0 < u_1 < u_2 = u_3 < u_4 < \dots < u_p,$$

with $u_4 > 2u_2$. Class $\Sigma_0$ holds the configurations that are nearest to vector $\mathbf{u}$, so $\Sigma_0 = \mathbf{e}$, where $\mathbf{e} = (1,1,\dots,1)$. The corresponding cosine is equal to $\cos w_0 = \sum_{i=1}^{p} u_i / p$. Class $\Sigma_1$ consists of configurations that are a bit more distant from $\mathbf{u}$ than $\Sigma_0$-class configurations. In our case it gives $\Sigma_1 = (-1,1,\dots,1)$, and $\cos w_1 = \cos w_0 - 2u_1/p$. The next class holds two configurations, $(1,-1,1,\dots,1)$ and $(1,1,-1,\dots,1)$, and $\cos w_2 = \cos w_0 - 2u_2/p$. Class $\Sigma_3$ also consists of two configurations, $(-1,-1,1,\dots,1)$ and $(-1,1,-1,\dots,1)$, and $\cos w_3 = \cos w_0 - 2(u_1+u_2)/p$. Class $\Sigma_4$ holds one configuration, $\Sigma_4 = (1,-1,-1,1,\dots,1)$, and $\cos w_4 = \cos w_0 - 4u_2/p$. So does class $\Sigma_5 = (1,1,1,-1,\dots,1)$, for which $\cos w_5 = \cos w_0 - 2u_4/p$. And so on. To distribute all configurations into classes $\Sigma_k$, it is necessary to arrange in ascending order all possible sums $\sum_{i=1}^{p} a_i u_i$, where coefficient $a_i$ can be either 0 or 1. This task is similar to the number partitioning problem (Mertens, 2001). In our case it is not necessary to try to solve the problem in general.
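The ordering of classes by subset sums can be reproduced on a small instance. The sketch below sorts all sums $\sum_i a_i u_i$ ($a_i = 1$ marks a coordinate flipped to $-1$) and collects the configurations sharing a sum. The function name and the integer weights $(2, 3, 3, 7, 20)$ are hypothetical; the weights are deliberately left unnormalized, on the assumption that rescaling $\mathbf{u}$ changes the cosines but not the composition or order of the classes.

```python
from itertools import product

def classes_by_flip_sum(u):
    """Return [(flip_sum, configurations)] sorted by ascending flip_sum = sum_i a_i * u_i."""
    groups = {}
    for a in product((0, 1), repeat=len(u)):
        flip_sum = sum(ai * ui for ai, ui in zip(a, u))
        s = tuple(1 - 2 * ai for ai in a)  # a_i = 1 means s_i = -1
        groups.setdefault(flip_sum, []).append(s)
    return [(key, groups[key]) for key in sorted(groups)]

# Hypothetical weights obeying 0 < u1 < u2 = u3 < u4 < u5 and u4 > 2*u2.
example_u = (2, 3, 3, 7, 20)
first_classes = classes_by_flip_sum(example_u)[:6]
```

For these weights the first six entries of `first_classes` have sizes 1, 1, 2, 2, 1, 1, in line with the classes $\Sigma_0,\dots,\Sigma_5$ listed above; after normalization the cosine of each class is $\cos w_0$ minus twice its flip sum divided by $p$.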

Another example. It is not difficult to describe the distribution of configurations among equidistant classes when vector $\mathbf{u} = \mathbf{e} = (1,1,\dots,1)$. It is easy to see that in this case the cosines take $p+1$ different values:

$$\cos w_k = 1 - 2k/p,\quad k = 0,1,\dots,p, \tag{6}$$

and the $k$-th class holds the configurations that have exactly $k$ negative coordinates. Let us introduce a special notation for such classes:

$$\Sigma_k^{(\mathbf{e})} = \{\mathbf{s} : (\mathbf{s},\mathbf{e}) = p - 2k\},\quad k = 0,1,\dots,p. \tag{7}$$

The number of configurations in class $\Sigma_k^{(\mathbf{e})}$ is $\binom{p}{k}$.

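In what follows, one or another configuration $\boldsymbol{\sigma}$ will often be used as vector $\mathbf{u}$. Based on classes $\Sigma_k^{(\mathbf{e})}$, it is simple to understand the structure of equidistant classes in this case. Clearly, both the number of different cosines and their values remain the same as with $\mathbf{u} = \mathbf{e}$ (see (6)). Coordinatewise multiplication of all configurations from class $\Sigma_k^{(\mathbf{e})}$ by configuration $\boldsymbol{\sigma}$ is used to obtain class $\Sigma_k^{(\boldsymbol{\sigma})}$ from class $\Sigma_k^{(\mathbf{e})}$ (7):

$$\Sigma_k^{(\boldsymbol{\sigma})} = \Big\{\mathbf{s} : (\mathbf{s},\boldsymbol{\sigma}) = \sum_{i=1}^{p} s_i\sigma_i = p - 2k\Big\},\quad k = 0,1,\dots,p.$$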
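Both statements about configuration-based classes (binomial class sizes and the coordinatewise mapping from $\mathbf{e}$-classes to $\boldsymbol{\sigma}$-classes) are easy to verify by enumeration. The following sketch does so; the function names are illustrative and not taken from the paper.

```python
from itertools import product
from math import comb

def classes_for_configuration(sigma):
    """Classes Sigma_k = {s : (s, sigma) = p - 2k}, k = 0..p, for a configuration sigma."""
    p = len(sigma)
    classes = [[] for _ in range(p + 1)]
    for s in product((1, -1), repeat=p):
        overlap = sum(si * gi for si, gi in zip(s, sigma))
        classes[(p - overlap) // 2].append(s)  # p - overlap is always even
    return classes

def expected_sizes(p):
    """Binomial coefficients C(p, k): the number of configurations with k negative coordinates."""
    return [comb(p, k) for k in range(p + 1)]

def multiply(s, sigma):
    """Coordinatewise product of two configurations."""
    return tuple(si * gi for si, gi in zip(s, sigma))
```

Applying `multiply(s, sigma)` to every `s` in the $k$-th class built for $\mathbf{e}$ yields exactly the $k$-th class built for `sigma`, because $(\mathbf{s}\circ\boldsymbol{\sigma},\boldsymbol{\sigma}) = (\mathbf{s},\mathbf{e})$.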

2.2 Functions f(x) and g(x)

Now let us consider the role of functions $f(x)$ and $g(x)$. The collection of $F_k(x)$ (5) is a family of functions of $x$. The function $F_k(x)$ that surpasses the other functions at a particular $x$ determines the ground-state class $\Sigma_k$.

Let the value of function $F_k(x)$ at some point $x$ be greater than the values of the other functions: $F_k(x) > F_l(x)$, $l \ne k$. If $f(x)$ and $g(x)$ are continuous functions, a small variation of $x$ does not, in the general case, change the superiority of $F_k(x)$ over the other functions. Class $\Sigma_k$ remains the ground state in a small vicinity of $x$. If $x$ keeps changing, it becomes almost inevitable that function $F_k(x)$ intersects another function, say, $F_l(x)$. After that it is $F_l(x)$ that starts exceeding all other functions. At the point of intersection of the functions the ground state passes to class $\Sigma_l$: the transition of the ground state $\Sigma_k \to \Sigma_l$ takes place. Of course, the transition point is defined by the forms of functions $f(x)$, $g(x)$ and cosines (3). However, something about the way the ground state changes can be understood from general considerations.

Let us rearrange formula (5) by taking $f(x)$ out of the brackets and completing the expression in the brackets to the square. Accurate to insignificant items, the formula we get is

$$F_k(x) \sim f(x)\,\big(\cos w_k - \varphi(x)\big)^2,\qquad \varphi(x) = -\frac{g(x)}{p\,f(x)},\quad k = 0,1,\dots,t. \tag{8}$$
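The "insignificant items" can be made explicit. The short derivation below is a sketch that assumes $f(x) \ne 0$ and the sign convention $F_k(x) \sim f(x)\cos^2 w_k + 2\,(g(x)/p)\cos w_k$ for (5), with $\varphi(x) = -g(x)/(p\,f(x))$:

```latex
f(x)\cos^2 w_k + \frac{2g(x)}{p}\cos w_k
  = f(x)\Big(\cos w_k + \frac{g(x)}{p\,f(x)}\Big)^2 - \frac{g^2(x)}{p^2 f(x)}
  = f(x)\big(\cos w_k - \varphi(x)\big)^2 - \frac{g^2(x)}{p^2 f(x)} .
```

The last term does not depend on $k$, so it plays no role when the values $F_k(x)$ are compared with each other at fixed $x$.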

Let us first assume that $f(x) > 0$. In this event it is necessary to maximize the modulus of the bracketed expression with respect to $k$ to find the largest $F_k(x)$:

$$\max_k\,\big|\cos w_k - \varphi(x)\big|. \tag{9}$$

If $\varphi(x)$ is negative, the maximum of modulus (9) is ensured by the greatest value of the cosine, and the solution of (9) is $k = 0$. In this case, the ground state is associated with class $\Sigma_0$. Conversely, if $\varphi(x)$ is positive, the maximum of modulus (9) is ensured by the smallest value of the cosine. The solution of (9) is $k = t$ in this event, and the ground state is attributed to class $\Sigma_t$. So, when $f(x)$ is positive, either class $\Sigma_0$ (if $g(x) > 0$) or class $\Sigma_t$ (if $g(x) < 0$) becomes the ground state.

Let us now examine what happens if $f(x) < 0$. In this case it is necessary to minimize the modulus of the bracketed expression (8) with respect to $k$ to find the largest $F_k(x)$. Generally speaking, this is not at all difficult: it is just necessary to find the $\cos w_k$ that is closest to the current value of $\varphi(x)$. The corresponding class $\Sigma_k$ will be the ground state of the net. Let us look at Figure 1 to understand the collisions that occur in this case.

In Figure 1 the vertical axis carries representative values of $\cos w_k$ for $k = l-1$, $l$ and $l+1$. The steadily decreasing curve represents function $\varphi(x)$. $c_k$ denotes the half sum of two successive values of the cosine:

$$c_k = \frac{\cos w_{k-1} + \cos w_k}{2},\quad k = 1,2,\dots,t. \tag{10}$$

The value of $x$ at which $\varphi(x) = c_k$ is denoted $x_k$:

$$\varphi(x_k) = c_k \iff x_k = \varphi^{-1}(c_k). \tag{11}$$

Let $x$ belong to interval $(x_l, x_{l+1})$ initially: $x_l < x < x_{l+1}$. It is easy to see that for any $x$ from this interval it is $\cos w_l$ that is nearest to $\varphi(x)$. So, $k = l$ is the solution of (9), and class $\Sigma_l$ serves as the ground state of the net. Note that this is true for all $x$ in the interval $(x_l, x_{l+1})$. Variable $x$ can grow (fall) until it steps over $x_{l+1}$ ($x_l$) and the ground state passes to class $\Sigma_{l+1}$ ($\Sigma_{l-1}$), and so on.
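The selection rule discussed above (farthest cosine from $\varphi(x)$ when $f(x) > 0$, nearest cosine when $f(x) < 0$) can be compared with a direct maximization of (5). The sketch below does this for given numerical values of $f(x)$ and $g(x)$; the function names, the requirement $f(x) \ne 0$ and the sign convention $\varphi = -g/(p f)$ are assumptions of this illustration.

```python
def ground_class_by_rule(cosines, f_x, g_x, p):
    """Index k of the ground-state class from the rule based on (8)-(9).

    cosines must be sorted in descending order; f_x must be non-zero.
    Assumed convention: phi = -g / (p * f).
    """
    phi = -g_x / (p * f_x)
    distance = [abs(c - phi) for c in cosines]
    indices = range(len(cosines))
    if f_x > 0:
        return max(indices, key=distance.__getitem__)  # farthest cosine wins
    return min(indices, key=distance.__getitem__)      # nearest cosine wins

def ground_class_brute(cosines, f_x, g_x, p):
    """Index k maximizing F_k = f cos^2 w_k + 2 (g / p) cos w_k directly."""
    values = [f_x * c * c + 2 * (g_x / p) * c for c in cosines]
    return max(range(len(cosines)), key=values.__getitem__)
```

For instance, with $p = 6$, $\cos w_k = 1 - 2k/6$, $f = -1$ and $g = 1.2$ one gets $\varphi = 0.2$, and both functions select $k = 2$ (the cosine $1/3$ is the nearest to $0.2$).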

Figure 1: Graphical solution of the problem (9): see body

of the paper.

We see that when $f(x) < 0$ and $\varphi(x)$ is a continuous function, a change of the ground state changes its number by 1: $k \to k \pm 1$. There is a kind of continuity in the way its number changes with parameter $x$. In principle, it is possible to organize “discontinuous” control over ground-state “jumps” $\Sigma_k \to \Sigma_l$ so that class numbers $k$ and $l$ would differ by more than 1. For this purpose one should use either a discontinuous function $\varphi(x)$, or the fact that when $f(x)$ becomes positive, the ground state passes from any class $\Sigma_k$ to either class $\Sigma_0$ or $\Sigma_t$.

2.3 Example

To exemplify the results let us consider functions $f(x)$ and $g(x)$ of the following form (Litinskii, 1999):

$$f(x) = 1 - 2x,\quad g(x) = q(1 - x),\quad q \ge 1.$$

In this case $F_k(x)$ in (5) takes the form:

$$F_k(x) \sim \cos w_k\Big(\cos w_k + \frac{2q}{p}\Big) - 2x\,\cos w_k\Big(\cos w_k + \frac{q}{p}\Big).$$

Competing functions $F_k(x)$ are a family of straight lines whose structure can be examined easily. As a result, we get the following statement.

Theorem. When $x$ grows indefinitely from the initial value of 0, the ground state of the net passes consecutively through classes $\Sigma_k$ (4): $\Sigma_0 \to \Sigma_1 \to \Sigma_2 \to \dots \to \Sigma_{k_{\max}}$. Transition $\Sigma_{k-1} \to \Sigma_k$ occurs at critical point

$$x_k = \frac{q/p + (\cos w_{k-1} + \cos w_k)/2}{q/p + \cos w_{k-1} + \cos w_k},\quad k = 1,2,\dots,k_{\max},$$

and as long as $x_k < x < x_{k+1}$, class $\Sigma_k$ is the ground state of the net. Number $k_{\max}$ of the last transition is determined by the requirement that denominator $q/p + \cos w_{k-1} + \cos w_k$ should be positive. If vector $\mathbf{u}$ is a configuration, the ground-state configurations are the only fixed points of the net.
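The critical points of the Theorem can be checked against a direct maximization of $F_k(x)$ on a small instance. The sketch below uses exact rational arithmetic and takes $\mathbf{u} = \mathbf{e}$, so that $\cos w_k = 1 - 2k/p$. The function names, the choice $p = 6$, $q = 3$ in the usage lines, and the sign convention $F_k(x) = (1-2x)\cos^2 w_k + 2(q/p)(1-x)\cos w_k$ are assumptions of this illustration.

```python
from fractions import Fraction

def critical_points(cosines, q, p):
    """Critical points x_1..x_kmax; stops at the first non-positive denominator."""
    ratio = Fraction(q, p)
    points = []
    for k in range(1, len(cosines)):
        pair_sum = cosines[k - 1] + cosines[k]
        denominator = ratio + pair_sum
        if denominator <= 0:  # k_max has been reached
            break
        points.append((ratio + pair_sum / 2) / denominator)
    return points

def class_value(cos_w, x, q, p):
    """F_k(x) for f(x) = 1 - 2x and g(x) = q(1 - x), '+' sign convention assumed."""
    return (1 - 2 * x) * cos_w ** 2 + 2 * Fraction(q, p) * (1 - x) * cos_w

def ground_class(cosines, x, q, p):
    """Index of the class with the largest F_k(x)."""
    values = [class_value(c, x, q, p) for c in cosines]
    return max(range(len(values)), key=values.__getitem__)

p, q = 6, 3
cosines = [Fraction(p - 2 * k, p) for k in range(p + 1)]
points = critical_points(cosines, q, p)
```

In this instance `points` contains four critical points, the first of which exceeds 1/2, and `ground_class` evaluated between consecutive critical points returns the class index predicted by the Theorem.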

The composition of classes $\Sigma_k$ is not detailed in the theorem at all: classes consist of configurations equally distant from vector $\mathbf{u}$. After classes $\Sigma_k$ are defined with the aid of $\mathbf{u}$, a change of parameter $x$ results in the ground state jumping from one class to another. It is possible to show that, independently of vector $\mathbf{u}$, the first transition of the ground state $\Sigma_0 \to \Sigma_1$ occurs after ½: $x_1 > 1/2$. Additionally, it turns out that $k_{\max}$ is always greater than

/2p

, and

max

x  . The use of factor

$q$ makes it possible to

regulate the total number of ground-state transitions.

IJCCI2013-InternationalJointConferenceonComputationalIntelligence

526

3 DISCUSSION AND CONCLUSIONS

The findings from the previous section allow us to control the ground state of the net to a considerable extent. Let us consider a $p$-dimensional hypercube with edge length 2 and center at the origin of coordinates. Configurations are located at the cube vertices. Symmetric directions in the hypercube must be chosen as vector $\mathbf{u}$. For each $\mathbf{u}$ of that kind the $2^p$ configurations are distributed in symmetric sets, with vector $\mathbf{u}$ being the axis of symmetry. Each set like that forms one of the $\Sigma_k$ classes. It can be turned into the ground state by using the approach offered. In particular, it is possible to create the ground state from a very large number of configurations. For example, the number of $\Sigma_k^{(\mathbf{e})}$-class configurations (7) is equal to

$$\binom{p}{k} = \frac{p!}{k!\,(p-k)!}.$$

Some coordinates of vector $\mathbf{u}$ can be zero. Let $u_1 = 0$. Then the same class will comprise not only configuration $\mathbf{s} = (s_1, s_2, \dots, s_p)$, but also configuration $\mathbf{s}' = (-s_1, s_2, \dots, s_p)$. In other words, vector $\mathbf{u}$ having a zero coordinate results in the number of configurations doubling in each class $\Sigma_k$. In this event the concluding statement of the Theorem is more general and should read: if the non-zero coordinates of vector $\mathbf{u}$ are equal to each other, the ground-state configurations are the only fixed points of the net.

What consequences the approach can have is not known yet. It is necessary to look through all symmetric directions of $\mathbf{u}$ in the hypercube and arrange the cube vertices with respect to the vertex-to-vector $\mathbf{u}$ distance in each case. It is necessary to turn to methods of group theory here (Davis, 2007).

The disadvantage of the whole approach is that the configurations comprising the ground state cannot be arbitrary. They are the same distance from vector $\mathbf{u}$ and, therefore, form a symmetric set. We hope that the following tricks (or their combinations) can help us to avoid total symmetry of the ground state. First, we can introduce a few vectors like $\mathbf{u}$ into the connection matrix and thresholds rather than just one vector. For example, let there be a vector $\mathbf{v} = (v_1, v_2, \dots, v_p)$, $\|\mathbf{v}\|^2 = p$, and let us consider a neural net similar to (1):

$$J_{ij} = f(x)(1-\delta_{ij})(u_i u_j + v_i v_j),\quad T_i = g(x)(u_i + v_i),\quad s_i(\tau+1) = \operatorname{sgn}\Big(\sum_{j=1}^{p} J_{ij}\,s_j(\tau) + T_i\Big). \tag{12}$$

If vectors $\mathbf{u}$ and $\mathbf{v}$ are configurations, it turns out that as long as $x$ does not exceed the first transition point $x_1$, the initial configurations $\mathbf{u}$ and $\mathbf{v}$ themselves are the ground state. If $x > x_1$, a set of configurations equally distant from both $\mathbf{u}$ and $\mathbf{v}$ will constitute the ground state. The net (12) will not have other fixed points. Supported by a computer simulation, this result arouses cautious optimism.
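The first part of this statement can be probed by exhaustive search on a small instance. The sketch below maximizes the functional that corresponds to net (12) under the `+` sign convention, $F(\mathbf{s}) = f\,[(\mathbf{u},\mathbf{s})^2 + (\mathbf{v},\mathbf{s})^2] + 2g\,[(\mathbf{u},\mathbf{s}) + (\mathbf{v},\mathbf{s})]$. The function names, the sign convention, and the sample values $p = 6$, $q = 1$, $x = 1/4$ (so $f = 1/2$, $g = 3/4$ for the example functions of Section 2.3) are assumptions; it checks a single instance and is not a proof.

```python
from fractions import Fraction
from itertools import product

def two_vector_functional(u, v, s, f_x, g_x):
    """F(s) for net (12): f [(u,s)^2 + (v,s)^2] + 2 g [(u,s) + (v,s)] ('+' convention assumed)."""
    us = sum(a * b for a, b in zip(u, s))
    vs = sum(a * b for a, b in zip(v, s))
    return f_x * (us * us + vs * vs) + 2 * g_x * (us + vs)

def ground_state(u, v, f_x, g_x):
    """Set of configurations maximizing the functional, found by exhaustive search."""
    best_value, best_configs = None, []
    for s in product((1, -1), repeat=len(u)):
        value = two_vector_functional(u, v, s, f_x, g_x)
        if best_value is None or value > best_value:
            best_value, best_configs = value, [s]
        elif value == best_value:
            best_configs.append(s)
    return set(best_configs)

u = (1, 1, 1, 1, 1, 1)
v = (1, 1, 1, -1, -1, 1)
state = ground_state(u, v, Fraction(1, 2), Fraction(3, 4))
```

For these sample vectors and parameter values the maximizing set consists of exactly the two configurations `u` and `v`.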

Second, it is possible to “separate” in (1) thresholds $T_i$ and numbers $u_i$ used for building the multiplicative matrix $\mathbf{M}$. Let us use the earlier-introduced vector $\mathbf{v}$ and consider a neural net

$$J_{ij} = f(x)(1-\delta_{ij})\,u_i u_j,\quad T_i = g(x)\,v_i,\quad s_i(\tau+1) = \operatorname{sgn}\Big(\sum_{j=1}^{p} J_{ij}\,s_j(\tau) + T_i\Big).$$

Tentative considerations show that its ground state is formed by $\Sigma_k$-class configurations nearest to the

vector difference



. In other words, the trick

allows us to avoid the total symmetry of the ground

state. Of course the results need closer research.

The memory of the standard Hopfield model with the Hebbian connection matrix and random and independent patterns $\{\mathbf{s}^{(\mu)}\}$ is well understood. However, if the connection matrix is of the general form, the memory of such a network is practically unknown. At the same time an arbitrary connection matrix $\mathbf{J}$ can be presented as a quasi-Hebbian one, when using: i) orthogonal vectors $\mathbf{u}^{(\mu)}$ related to the eigenvectors of the matrix $\mathbf{J}$,

$$J_{ij} = (1-\delta_{ij})\sum_{\mu} r_\mu\,u_i^{(\mu)} u_j^{(\mu)},\quad\text{i.e.}\quad \mathbf{J} \sim \sum_{\mu} r_\mu\,\mathbf{u}^{(\mu)}\mathbf{u}^{(\mu)+},$$

where $\mathbf{u}^{(\mu)} = (u_1^{(\mu)},\dots,u_p^{(\mu)})$, $r_\mu \in \mathbb{R}$, and $(\mathbf{u}^{(\mu)},\mathbf{u}^{(\nu)}) = 0$ for $\mu \ne \nu$; ii) or configuration vectors $\mathbf{s}^{(\mu)}$ with the weights $r_\mu$ (Kryzhanovsky, 2007):

$$\mathbf{J} \sim \sum_{\mu} r_\mu\,\mathbf{s}^{(\mu)}\mathbf{s}^{(\mu)+},\quad r_\mu \in \mathbb{R}.$$

Our multiplicative matrix $\mathbf{M}$ is only one term of the quasi-Hebbian expansion. We hope that a detailed analysis of the network with the connection matrix $\mathbf{M}$ will allow us to make headway on investigating a more general case.

MultiplicativeNeuralNetworkwithThresholds

527

ACKNOWLEDGEMENTS

The work was supported by Russian Basic Research

Foundation (grants 12-07-00259 and 13-01-00504).

REFERENCES

Hertz, J., Krogh, A., Palmer, R., 1991. Introduction to the

Theory of Neural Computation. Massachusetts:

Addison-Wesley.

Mertens, S., 2001. A Physicist's Approach to Number

Partitioning. Theoretical Computer Science, 265: 79-108.

Litinskii, L. B., 1999. High-symmetry Hopfield-type

neural networks. Theoretical and Mathematical

Physics, 118: 107-127.

Davis, M. W., 2007. The geometry and topology of

Coxeter groups. Princeton: Princeton University Press.

Kryzhanovsky, B. V., 2007. Expansion of a matrix in terms

of external products of configuration vectors. Optical

Memory & Neural Networks (Information Optics), 16:

187-199.

IJCCI2013-InternationalJointConferenceonComputationalIntelligence

528