SOLVING NUMBER SERIES
Architectural Properties of Successful Artificial Neural Networks
Marco Ragni and Andreas Klein
Center for Cognitive Science, Friedrichstr. 50, 79098 Freiburg, Germany
Keywords:
Number series, Artificial neural networks.
Abstract:
Any mathematical pattern can be the generation principle for number series. In contrast to most application fields of artificial neural networks (ANNs), a successful solution does not only require an approximation of the underlying function but also the correct prediction of the exact next number. We propose a dynamic learning approach and evaluate our method empirically on number series from the Online Encyclopedia of Integer Sequences. Finally, we investigate research questions about the performance of ANNs, structural properties, and the adequate architecture of the ANN to deal successfully with number series.
1 INTRODUCTION
Number series can represent any mathematical function. Take, for instance,

(i) 4, 11, 15, 26, 41, 67, 108, 175, . . .
(ii) 5, 6, 7, 8, 10, 11, 14, 15, . . .

The first problem (i) represents the Fibonacci series¹. The second problem (ii) represents two nested series², one starting with 5, the other with 6. In number series principally any computable function can be hidden, and the set of operators is not restricted. Mathematically, number series are described by a function g : N → R and can contain the periodic sine function, exponential functions, polynomial functions, and even the digits of π; there is no restriction whatsoever on the underlying function. There is an Online Encyclopedia of Integer Sequences (Sloane, 2003) (OEIS)³ available online with a database to download, and a Journal of Integer Sequences⁴. A small subset of number series problems is used in intelligence tests for determining mathematical pattern-recognition capabilities. Inevitably, identifying patterns requires learning capabilities. As artificial neural networks (ANNs) are the silver bullet in Artificial Intelligence for learning approaches, it seems only consequent to test how far we can get with a first approach to unraveling the mystery of number series.
¹ a_{n+2} := a_{n+1} + a_n with a_1 := 4 and a_2 := 11
² a_{n+2} := a_{n+1} + (a_{n+1} − a_n) + 1
³ http://oeis.org/
⁴ http://www.cs.uwaterloo.ca/journals/JIS/
Not all number series have a similar difficulty: some problems are easy to solve while others are nearly impossible to deal with. A simple measure might be to identify the underlying function and to count the number of operations. However, the number of operations is typically a symbolic measure (Marr, 1982) and does not necessarily correspond with architectural properties of the neural networks. Solving number series poses an interesting problem as it combines deductive and inductive reasoning. Deductive reasoning is necessary to identify the rule between the given numbers, while inductive reasoning is necessary to predict the next number. So far neural networks have been used for approximations, but rarely for predicting exact numbers, which is a necessary condition for solving number series. Related to these problems is the prediction of time series (Martinetz et al., 1993; Connor et al., 1994; Farmer and Sidorowich, 1987), but the numbers to be predicted here are integers (and not, as usual, reals), which can pose even more difficulty; the numbers adhere to mathematical laws and are not empirically observed. In this sense, we are much closer to the research fields trying to reproduce "logical properties" (like propositional or boolean reasoning (Franco, 2006)) than to time series analysis. Other related approaches use ANNs, for example, for teaching multiplication tables by training on empirical findings of pupils' performances to identify and provide a guideline for pupils' educational materials (Tatuzov, 2006). In this case, the networks used simulated the pupils' behavior and had to learn multiplication skills. In contrast to this approach, we
are not interested in modeling error trials.
In the following, we propose a method using artificial neural networks and, to the best of our knowledge, a novel dynamic approach to solve number series problems. The approach is evaluated against over 50.000 number series taken from the OEIS. To analyze the structural properties of successful networks and to identify the best configurations, we will systematically vary the architecture of the artificial neural networks.
2 DYNAMIC LEARNING
For our attempt to solve number series with artificial neural networks (McCulloch and Pitts, 1943; Russell and Norvig, 2003), we use three-layered networks (cf. Figure 1) with error back-propagation and the hyperbolic tangent as the activation function. If x is the length of the largest number in the given part of a number series, we use

    f_i = n / 10^x    (1)

as input function to project the numbers of the number series into the interval [−1, 1]. Analogously, we project the output of the networks back to the original scale with the function

    f_o = n · 10^x    (2)

with the same x as for the input, and round the result to get an integer value. We trained the networks on all but the last given number, which they had to predict. To achieve this goal we trained the networks to extrapolate the given numbers and extend the number series rather than to classify or interpolate them. The weights were initially assigned small random positive and negative values with magnitudes between zero and one. A momentum factor of 0.1 was used.
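As an illustration, the following Python sketch shows one possible reading of the scaling in Equations 1 and 2; the function names and the interpretation of x as the digit count of the largest given number are our assumptions, not part of the original implementation.

    def scale_in(n, x):
        # Eq. 1 (assumed reading): map a series value n into [-1, 1],
        # where x is the number of digits of the largest given number.
        return n / 10 ** x

    def scale_out(n, x):
        # Eq. 2 (assumed reading): map a network output back to the
        # original scale and round it to the nearest integer.
        return round(n * 10 ** x)

    # Example: for the series 25, 22, 19, ... the largest given number has
    # x = 2 digits, so scale_in(25, 2) == 0.25 and scale_out(0.04, 2) == 4.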
For our analysis, we systematically varied the learning rate, the number of input nodes, and the number of nodes within the hidden layer. Taken together, the ANNs can be described by the formula f(i, h, l), with the number of input nodes i, the number of nodes within the hidden layer h, and the learning rate l. These variations should allow for a comparison of the different ANNs, provide us with enough variability to successfully solve number series, and allow us to identify successful architectures for ANNs solving these problems.
Figure 1: Architecture of an ANN with 3 input nodes and 3 hidden nodes we used to predict the next value of a subsequence of a number series. The ANNs were trained on three successive numbers of a sequence a_1 to a_n (cf. Table 1).

To generate patterns for training and testing a network, we built patterns as tuples of training values and one target value. The number of training values of a pattern is equivalent to the number of input nodes i of the network used. Starting with the first number, a subsequence of training values was shifted through the number series. As corresponding target value, the next number of the subsequence was used (cf. Table 1).
Table 1: Pattern generation example for an ANN using the dynamic approach with three input nodes and a number series of seven given numbers (n_1, ..., n_7) with n_8 to predict. Four training patterns (p_1, ..., p_4) are used, each with three training values (v_i, v_{i+1}, v_{i+2}) and one target value t = n_{i+3}.

Pattern   n_1   n_2   n_3   n_4   n_5   n_6   n_7   n_8
p_1       v_1   v_2   v_3   t
p_2             v_2   v_3   v_4   t
p_3                   v_3   v_4   v_5   t
p_4                         v_4   v_5   v_6   t
p_5                               v_5   v_6   v_7   ?
Since the last given value of the number series with length n remains as target value and we need at least one training and one test pattern, the maximum length of a subsequence of training values is n − 2. Hence, for a network configuration with m input nodes exactly n − m patterns were generated. Consequently, the first (n − m) − 1 patterns were used for training, while the last one remained for testing and thus predicting the last given number of the sequence. Therefore, the last given number of the number series, the main target value, was never used for training the ANNs.
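A minimal sketch of this sliding-window pattern generation, under our reading of Table 1 (the helper name make_patterns is hypothetical):

    def make_patterns(series, num_inputs):
        # Slide a window of num_inputs successive values over the series and
        # pair each window with the value that follows it as target. The last
        # pattern, whose target is the final given number, is held out for
        # testing; the remaining (n - m) - 1 patterns are used for training.
        patterns = [(series[k:k + num_inputs], series[k + num_inputs])
                    for k in range(len(series) - num_inputs)]
        return patterns[:-1], patterns[-1]

    # Example with m = 3 input nodes and series (ii) from the introduction:
    # make_patterns([5, 6, 7, 8, 10, 11, 14, 15], 3) yields the four training
    # patterns p_1..p_4 of Table 1 and the test pattern ([10, 11, 14], 15).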
The training of the network was iterated on the patterns a varying number of times; we systematically varied the number of iterations. After the training phase we tested whether the network had captured the inherent pattern by presenting the last pattern, which has no given target value, and comparing the network's prediction to the actual next number of the number series.
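To make the whole procedure concrete, here is a compact re-implementation sketch of one network f(i, h, l) in plain NumPy. It follows the description above (tanh hidden layer, back-propagation, momentum 0.1, small random initial weights, scaling as in Equations 1 and 2); the linear output unit, the per-pattern weight updates, and all names are our assumptions, since the paper does not fix these details.

    import numpy as np

    def solve_series(series, i=3, h=3, lr=0.125, iters=1000, momentum=0.1, seed=0):
        # Scale, build sliding-window patterns, train a tanh network on all
        # but the last pattern, then predict the held-out last number.
        rng = np.random.default_rng(seed)
        x = len(str(max(abs(v) for v in series[:-1])))   # digits of largest given number
        data = np.array(series, dtype=float) / 10 ** x   # Eq. 1: project into [-1, 1]

        windows = np.array([data[k:k + i] for k in range(len(data) - i)])
        targets = data[i:]
        X_train, t_train, x_test = windows[:-1], targets[:-1], windows[-1]

        W1 = rng.uniform(-0.5, 0.5, (i, h)); b1 = np.zeros(h); vW1 = np.zeros_like(W1)
        W2 = rng.uniform(-0.5, 0.5, (h, 1)); b2 = np.zeros(1); vW2 = np.zeros_like(W2)

        for _ in range(iters):                       # iterate training on the patterns
            for xp, t in zip(X_train, t_train):
                hid = np.tanh(xp @ W1 + b1)
                out = float(hid @ W2 + b2)           # linear output unit (assumption)
                err = out - t
                gW2, gb2 = np.outer(hid, err), err   # back-propagated gradients
                delta = (W2[:, 0] * err) * (1.0 - hid ** 2)
                gW1, gb1 = np.outer(xp, delta), delta
                vW2 = momentum * vW2 - lr * gW2; W2 += vW2; b2 -= lr * gb2
                vW1 = momentum * vW1 - lr * gW1; W1 += vW1; b1 -= lr * gb1

        pred = float(np.tanh(x_test @ W1 + b1) @ W2 + b2)
        return round(pred * 10 ** x)                 # Eq. 2: back to the original scale

    # e.g. solve_series([25, 22, 19, 16, 13, 10, 7, 4], i=3, h=3, lr=0.125)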
Table 2: Results of an empirical analysis with 17 participants for 20 number series and the performance of 840 network configurations with five variants of training iterations. A step-width of 0.125 for the learning rate was used. For each series, the columns give the correct, incorrect, and missing responses (blank cells denote zero), followed by the number of solving configurations after 0.5k, 1k, 5k, 10k, and 15k training iterations.

ID    Number Series              Correct   Incorrect   No response   0.5k   1k   5k   10k   15k
E001 12,15,8,11,4,7,0,3 15 2 306 385 475 530 554
E002 148,84,52,36,28,24,22,21 12 2 3 555 637 670 689 683
E003 2,12,21,29,36,42,47,51 14 1 2 405 440 502 539 551
E004 2,3,5,9,17,33,65,129 13 1 3 3 7 61 192 218
E005 2,5,8,11,14,17,20,23 9 3 5 581 618 659 667 675
E006 2,5,9,19,37,75,149,299 6 4 7 0 0 0 0 0
E007 25,22,19,16,13,10,7,4 16 1 562 615 648 667 670
E008 28,33,31,36,34,39,37,42 17 121 183 315 332 342
E009 3,6,12,24,48,96,192,384 13 1 3 0 0 0 0 0
E010 3,7,15,31,63,127,255,511 12 3 2 0 0 0 0 0
E011 4,11,15,26,41,67,108,175 8 1 8 6 14 32 22 18
E012 5,6,7,8,10,11,14,15 10 1 6 83 91 65 114 202
E013 54,48,42,36,30,24,18,12 16 1 274 299 338 376 401
E014 6,8,5,7,4,6,3,5 16 1 134 169 198 219 213
E015 6,9,18,21,42,45,90,93 14 1 2 48 24 94 101 103
E016 7,10,9,12,11,14,13,16 14 3 111 202 380 404 409
E017 8,10,14,18,26,34,50,66 13 1 3 57 46 30 29 24
E018 8,12,10,16,12,20,14,24 17 37 75 41 51 71
E019 8,12,16,20,24,28,32,36 15 2 507 546 594 613 634
E020 9,20,6,17,3,14,0,11 16 1 255 305 397 406 411
2.1 Human Performance
In a previous experiment (Ragni and Klein, 2011) with humans we tested 20 number series to evaluate reasoning difficulty and to benchmark the results of our ANNs. All number series can be found in Table 2. We only briefly report the results here. Each of the 17 participants in this paper-and-pencil experiment received each number series in randomized order and had to fill in the last number of the series. The problems differed in the underlying construction principle and varied from simple additions and multiplications to combinations of those operations. Also nested number series like 5, 6, 7, 8, 10, 11, 14, 15, . . . (cf. (ii) from the introduction) were used.
For our analysis with ANNs we varied the learning rate of the configuration f(i, h, l) between 0.125 and 0.875 with a step-width of 0.125, and later on with a step-width of 0.1 ranging from 0.1 to 0.9 (0.125 ≤ l ≤ 0.875, or 0.1 ≤ l ≤ 0.9). The number of nodes within the hidden layer was iterated from one to twenty (1 ≤ h ≤ 20). The number of input nodes was varied between one and six nodes (1 ≤ i ≤ 6). For all configurations only one output node was used. The number of training iterations was varied in five steps, starting with as low as 500 iterations on each pattern before testing. We then raised the number in four steps over 1.000, 5.000, and 10.000, up to 15.000 iterations.
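The resulting grid sizes, 840 configurations for the 0.125 grid and 1080 for the 0.1 grid (cf. Tables 2 and 3), follow directly from this variation; a quick enumeration sketch:

    from itertools import product

    inputs = range(1, 7)                                 # 1 <= i <= 6
    hidden = range(1, 21)                                # 1 <= h <= 20
    lr_coarse = [0.125 * k for k in range(1, 8)]         # 0.125 .. 0.875, step 0.125
    lr_fine = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1 .. 0.9, step 0.1

    print(len(list(product(inputs, hidden, lr_coarse))))  # 6 * 20 * 7 = 840
    print(len(list(product(inputs, hidden, lr_fine))))    # 6 * 20 * 9 = 1080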
Table 3: Results of solving configurations with a step-width of 0.1 for the learning rate, out of 1080 configurations. Columns give the number of solving configurations after 0.5k, 1k, 5k, 10k, and 15k training iterations.

ID      0.5k   1k    5k    10k   15k
E001 410 496 626 669 715
E002 698 800 863 871 887
E003 516 575 663 708 733
E004 8 11 83 246 283
E005 723 800 843 854 882
E006 2 4 0 0 0
E007 711 756 828 845 861
E008 165 238 409 425 437
E009 0 0 0 0 0
E010 0 0 0 0 0
E011 6 16 45 32 29
E012 103 111 88 151 257
E013 334 354 437 477 506
E014 166 202 274 277 285
E015 61 48 127 127 132
E016 155 259 485 527 523
E017 78 64 41 25 25
E018 38 108 59 71 87
E019 646 667 743 775 797
E020 304 400 517 521 528
There are three number series which were not solved by any ANN configuration tested with a learning-rate step-width of 0.125. All others could be solved (cf. Table 2). With a smaller step-width for the learning rate (0.1), the number series 2, 5, 9, 19, 37, 75, 149, . . . was solved too, but the two other number series remain unsolved (cf. Table 3). Comparisons between the different configurations show a negligible difference between the different numbers of training iterations (cf. Tables 2 and 3). It seems that 1.000 iterations might already be sufficient for solving number series.
Figure 2: Results for ANN configurations with lr = 0.125, 1 ≤ i ≤ 6, and 10.000 training iterations applied to the 20 problems depicted in Table 2. (The plot shows the number of solved number series over the number of hidden nodes, with one curve per number of input nodes, 1 in to 6 in.)
The input nodes divide the number of solutions into three classes (cf. Figure 2): we get the worst results for ANNs with six input nodes. One and five input nodes span the next level, and finally two to four input nodes solve the problems considerably well, with a peak for an ANN with two input nodes and about 15 hidden nodes. On average, if we increase the learning rate, the number of solved tasks decreases with an increasing number of hidden nodes. Networks with only one input node showed a slightly different behavior: for these networks, the number of solved tasks increases with the learning rate and decreases for learning rates above 0.625. Taken together, this first analysis already shows that simple number series can be adequately solved. An interesting result is that the higher the number of input nodes, the worse the results. We will see that this does not hold for arbitrary number series from the OEIS.
2.2 The OEIS Number Series
To assess the power of our dynamic approach we chose to use the number series from the OEIS database. The OEIS database contains a total of 187.440 number series. We selected those number series which consist of at least 20 numbers (to be able to apply our dynamic training method) and whose values are smaller than or equal to ±1.000. This constraint is satisfied by exactly 57.524 number series, which were used to benchmark our ANN approach with dynamic learning.
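For reference, a sketch of this selection step; it assumes the series are available locally in the format of the OEIS 'stripped' dump (one sequence per line, an A-number followed by comma-separated values), which is an assumption about the data source rather than part of the paper.

    def select_series(path, min_length=20, max_abs=1000):
        # Keep series with at least min_length terms whose values all lie
        # within [-max_abs, max_abs]; assumes the OEIS 'stripped' file format.
        selected = {}
        with open(path) as f:
            for line in f:
                if line.startswith("#"):
                    continue
                ident, _, rest = line.partition(" ")
                values = [int(v) for v in rest.strip().strip(",").split(",") if v]
                if len(values) >= min_length and all(abs(v) <= max_abs for v in values):
                    selected[ident] = values
        return selected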
Due to the large number of problems, we varied the learning rate (0.125 ≤ l ≤ 0.875) and the number of nodes within the hidden layer (1 ≤ h ≤ 7) of the configuration f(i, h, l). We used four, eight, and twelve input nodes and only one output node. For all configurations, the number of training iterations was 1.000.
The best ANN with four input nodes, in the sense of solving the most number series (with two hidden nodes and a learning rate of .75), was able to solve 12.764 number series problems, and all 49 tested ANN configurations with four input nodes together were able to solve 26.951 of the number series. The best ANN in the sense of a minimal deviation over all tasks was a network with four nodes in the input layer, two in the hidden layer, and a learning rate of .375. The deviation between the predictions and the actual values, summed over all tasks, was 859.144. The deviation of the network solving the most tasks was 868.506.
The best ANN with 8 input nodes (with two hidden nodes and a learning rate of .75) solved 13.591 number series tasks, and all 49 settings together were able to solve 31.914 of the total of 57.524 tasks. The minimally deviating network had the analogous configuration to the case of four nodes in the input layer. Its overall deviation was 904.134. The deviation of the network solving the most tasks with eight nodes in the input layer was 926.262.
Considering the configurations with 12 input nodes, the most number series (14.021) were solved by a configuration with two nodes in the hidden layer and a learning rate of .75. Over all number series this configuration deviated by 992.875. The minimally deviating configuration in this case was the one with two nodes in the hidden layer and a learning rate of .25. Its deviation was 962.510. All configurations (with i = 12) together were able to solve 33.979 number series correctly.
The configurations solving the most number series are summed up in Table 4, showing that with an increasing number of input nodes the number of solved problems increases, too. This is also true for the number of solved number series over all tested configurations. In contrast, as shown in Table 5, with an increasing number of input nodes the deviation summed over all tasks rises. Combining these results, this means that with an increasing number of input nodes more number series could be solved, but the predictions for the unsolved problems deviate more from the correct solution.
Figure 3: Results for ANN configurations (lr = learning rate, in = input nodes) with 10.000 training iterations applied to the 20 problems depicted in Table 2. (Panels (a) to (f) show lr = 0.25, 0.375, 0.5, 0.625, 0.75, and 0.875; each plots the number of solved number series over the number of hidden nodes, with one curve per number of input nodes, 1 in to 6 in.)
To capture the performance of this approach, we additionally analyzed deviations of ±5 and ±10 from the correct solution. The number of problems solved by the configurations solving the most problems, and also the number of solved problems over all configurations, rise drastically, which is also shown in Table 4.
Table 4: Configurations with 4, 8 and 12 input nodes (in), solving the most number series of the OEIS database. Results are shown for exact solutions, ±5, and ±10 around the exact solution.

Input   Exact                     ±5                        ±10
4       12.764 (h=2, lr=.75)      39.003 (h=2, lr=.75)      44.848 (h=5, lr=.875)
8       13.591 (h=2, lr=.75)      39.065 (h=2, lr=.625)     45.064 (h=4, lr=.875)
12      14.021 (h=2, lr=.75)      39.052 (h=2, lr=.375)     45.086 (h=2, lr=.75)
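The evaluation criteria used in Tables 4, 5, and 6 can be summarized as follows; this sketch is our paraphrase of the metrics (we read the summed deviation as the sum of absolute differences), not code from the study.

    def evaluate(predictions, actuals, tolerance=0):
        # Count series whose prediction lies within `tolerance` of the true
        # next number (0 = exact), and sum the absolute deviations.
        solved = sum(abs(p - a) <= tolerance for p, a in zip(predictions, actuals))
        deviation = sum(abs(p - a) for p, a in zip(predictions, actuals))
        return solved, deviation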
Table 5: Configurations with the smallest deviation summed
over all number series of the OEIS database.
No. input nodes deviation and configuration
4 859.144 with h = 2, lr=.375
8 904.134 with h = 2, lr=.375
12 962.510 with h = 2, lr=.25
Again, solving means that the networks were able to correctly predict the next number of the series, even though they had never been trained on it. Figure 4 depicts the results of the ANNs on the OEIS problems.
3 CONCLUSIONS
The problem of selecting an adequate neural network architecture for a given problem has recently come more and more into the research focus (Gómez et al., 2009). Similar questions have lately been investigated for Boolean functions (Franco, 2006). We followed this research line in dealing with number series. To identify the adequate structures, we systematically varied the input nodes, hidden nodes, and learning rate to compare the different artificial neural network structures. Using a dynamic learning approach, we are able to correctly predict 90% (18 of 20) of simple number series (which typically appear in intelligence tests). If we use the above specified subset of the OEIS database as a benchmark, we are still able to solve about 59% of the number series correctly (about 33.979 of 57.524 number series, cf. Table 6). Relaxing the goal of computing the correct number and allowing deviations of ±10 allows us to capture 50.139 number series.
A first conclusion we can draw is that the structure of the artificial neural networks can determine the success of solving a number sequence: there is a systematic pattern between learning rate, input nodes, and the number of nodes within the hidden layer, showing that 2-4 input nodes and about 5-6 hidden nodes provide the best framework for solving typical intelligence-test number series. If the number series are mathematically much more complex, then eight input nodes seem much better than four input nodes, while the number of hidden nodes remains stable.
Figure 4: Results for ANN configurations with 1.000 training iterations applied to a subset of the OEIS database. (Panels (a) lr = 0.25 and (b) lr = 0.75 plot the number of solved number series over the number of hidden nodes, with one curve each for 4, 8, and 12 input nodes.)
Table 6: Number of solved number series of the OEIS
database over all 49 tested configurations. Results are
shown for exact solutions, ±5, and ±10 around the exact
solution.
No. input nodes exact ±5 ±10
4 26.951 44.689 48.176
8 31.914 46.349 49.580
12 33.434 47.076 49.931
The right artificial neural networks seem to be a method powerful enough to expand into one of the still privileged realms of human reasoning: to identify patterns and solve number series successfully. The structure of the ANNs used can provide useful insights.
Future work will systematically investigate to what extent other types of artificial neural networks and approaches show a better performance than the back-propagating ones used in this approach.
REFERENCES
Connor, J. T., Martin, R. D., and Atlas, L. E. (1994). Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2):240–254.

Farmer, J. D. and Sidorowich, J. J. (1987). Predicting chaotic time series. Phys. Rev. Lett., 59(8):845–848.

Franco, L. (2006). Generalization ability of boolean functions implemented in feedforward neural networks. Neurocomputing, 70:351–361.

Gómez, I., Franco, L., and Jérez, J. M. (2009). Neural network architecture selection: Can function complexity help? Neural Processing Letters, 30(2):71–87.

Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, New York.

Martinetz, T. M., Berkovich, S. G., and Schulten, K. J. (1993). "Neural-gas" network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4:558–569.

McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115–133.

Ragni, M. and Klein, A. (2011). Predicting numbers: An AI approach to solving number series. In Edelkamp, S. and Bach, J., editors, KI-2011.

Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Prentice Hall, 2nd edition.

Sloane, N. J. A. (2003). The on-line encyclopedia of integer sequences. Notices of the American Mathematical Society, 50(8):912–915.

Tatuzov, A. L. (2006). Neural network models for teaching multiplication table in primary school. In IJCNN '06: International Joint Conference on Neural Networks, pages 5212–5217, Vancouver, BC, Canada.