Combining Selective-Presentation and Selective-Learning-Rate Approaches for Neural Network Forecasting of Stock Markets

Kazuhiro Kohara

Department of Electrical, Electronics and Computer Engineering,
Chiba Institute of Technology, 2-17-1 Tsudanuma, Narashino, Chiba 275-0016, Japan

Abstract. We have investigated selective learning techniques for improving the ability of back-propagation neural networks to predict large changes. We previously proposed the selective-presentation approach, in which the training data corresponding to large changes in the prediction-target time series are presented more often, and the selective-learning-rate approach, in which the learning rate for training data corresponding to small changes is reduced. This paper proposes combining these two approaches to achieve fine-tuned, step-by-step selective learning of neural networks according to the degree of change. Daily stock prices are predicted as a noisy real-world problem. Combining the two approaches further improved performance.

1 Introduction

Prediction using back-propagation neural networks has been extensively investigated (e.g., [1-5]), and various attempts have been made to apply neural networks to financial market prediction (e.g., [6-16]), electricity load forecasting (e.g., [17]), and other areas. In the usual approach, all training data are presented to the neural network equally (i.e., in every cycle) and with the same learning rate, independently of the size of the changes in the prediction-target time series. Network learning is also usually stopped at the point of minimal mean squared error between the network's outputs and the desired outputs.

Generally, the ability to predict large changes is more important than the ability to predict small changes, as we noted in our previous paper [12]. When all training data are presented equally and with an equal learning rate, the back-propagation neural network learns small and large changes equally well, so it cannot learn the large changes more effectively. We have investigated selective learning techniques for improving the ability of neural networks to predict large changes. We previously proposed the selective-presentation and selective-learning-rate approaches and applied them to stock market prediction [12-14]. In the selective-presentation approach, the training data corresponding to large changes in the prediction-target time series are presented more often. In the selective-learning-rate approach, the learning rate for training data corresponding to small changes is reduced. The previous paper [12] also investigated another stopping criterion for financial prediction: network learning is stopped at the point having the maximum profit through experimental stock trading.

Kohara, K. (2008). Combining Selective-Presentation and Selective-Learning-Rate Approaches for Neural Network Forecasting of Stock Markets. In Proceedings of the 4th International Workshop on Artificial Neural Networks and Intelligent Information Processing, pages 3-9. DOI: 10.5220/0001508200030009. Copyright © SciTePress.

This paper proposes combining the selective-presentation and selective-learning-rate approaches. By combining the two, we can easily achieve fine-tuned, step-by-step selective learning of neural networks according to the degree of change. Daily stock prices are predicted as a noisy real-world problem.

2 Combining Selective-Presentation and Selective-Learning-Rate Approaches

To allow neural networks to learn large changes in the prediction-target time series more effectively, we separate the training data into large-change data (L-data) and small-change data (S-data). L-data (S-data) have next-day changes that are larger (smaller) than a preset value.

In the selective-presentation approach, the L-data are presented to the neural network more often than the S-data; for example, all training data are presented every fifth learning cycle, while the L-data are presented in every cycle. In the selective-learning-rate approach, all training data are presented in every cycle, but the learning rate of the back-propagation training algorithm for the S-data is reduced compared with that for the L-data. The two approaches are outlined as follows.

Selective-Presentation Approach
1. Separate the training data into L-data and S-data.
2. Train back-propagation networks with more presentations of L-data than of S-data.
3. Stop network learning at the point satisfying a certain stopping criterion (e.g., the point having the maximum profit).

Selective-Learning-Rate Approach
1. Separate the training data into L-data and S-data.
2. Train back-propagation networks with a lower learning rate for the S-data than for the L-data.
3. Stop network learning at the point satisfying a certain stopping criterion (e.g., the point having the maximum profit).
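As an illustrative sketch (not the paper's implementation), the selective-learning-rate idea can be shown with a single linear neuron trained by gradient descent, where each sample flagged as small-change receives the reduced rate; the ε values 0.7 and 0.14 follow the experiments below, while the toy data and function name are our own:

```python
# Hypothetical sketch: one linear neuron trained by gradient descent on
# squared error, with a per-sample learning rate (selective learning rate).
# S-data samples (is_large == False) move the weights less than L-data.

def train_selective_lr(data, eps_large=0.7, eps_small=0.14, cycles=100):
    """data: list of (x, target, is_large_change) with scalar x and target."""
    w, b = 0.0, 0.0
    for _ in range(cycles):
        for x, t, is_large in data:
            eps = eps_large if is_large else eps_small  # selective learning rate
            y = w * x + b
            err = t - y
            # Gradient step on squared error; S-data contributes with lower rate.
            w += eps * err * x
            b += eps * err
    return w, b

# Toy usage: learn y = 2x from a mix of large- and small-change samples.
samples = [(1.0, 2.0, True), (0.1, 0.2, False), (-1.0, -2.0, True)]
w, b = train_selective_lr(samples)
```

The same loop realizes selective presentation if, instead of scaling ε, the S-data tuples are simply skipped on four out of every five cycles.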

We combine these two approaches to achieve fine-tuned, step-by-step learning of neural networks according to the degree of change. The outline is as follows.

Combining the Selective-Presentation and Selective-Learning-Rate Approaches
1. Separate the training data into L-data and S-data.
2. Separate the L-data into two subsets, L1-data and L2-data, where changes in L2-data are larger than those in L1-data.
3. Separate the S-data into two subsets, S1-data and S2-data, where changes in S2-data are larger than those in S1-data.
4. Train back-propagation networks with more presentations of L1- and L2-data than of S1- and S2-data, and with a lower learning rate for L1- and S1-data than for L2- and S2-data.
5. Stop network learning at the point satisfying a certain stopping criterion (e.g., the point having the maximum profit).

In general, we can separate the training data into N subsets (N ≥ 2), D1-, D2-, ..., and DN-data, where changes in Di-data are larger than those in Di-1-data, and assign "selective intensities" I (number of presentations times learning rate) to the D1-, D2-, ..., and DN-data such that I1 < I2 < ... < IN.
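The general N-subset separation can be sketched as follows; this is a hypothetical illustration (the function name and sample values are our own), splitting samples at equal quantile boundaries of the absolute next-day change and pairing each subset with an increasing intensity:

```python
# Hypothetical sketch of the general scheme: rank training samples by the
# absolute next-day change, split them at quantile boundaries into N subsets,
# and assign increasing selective intensities I_1 < I_2 < ... < I_N.

def separate_by_change(changes, n_subsets):
    """changes: list of next-day changes; returns a subset index per sample
    (0 = smallest changes, n_subsets - 1 = largest changes)."""
    order = sorted(range(len(changes)), key=lambda i: abs(changes[i]))
    subsets = [0] * len(changes)
    size = len(changes) / n_subsets
    for rank, i in enumerate(order):
        subsets[i] = min(int(rank / size), n_subsets - 1)
    return subsets

changes = [1.2, -30.5, 4.0, 18.3, -0.4, 52.1, -9.9, 25.0]
labels = separate_by_change(changes, 4)
intensities = [1, 2, 5, 10]  # relative intensities, as used in Experiment 4 below
```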

3 Evaluation through Experimental Stock-Price Prediction

We considered the following types of knowledge for predicting Tokyo stock prices; they involve numerical economic indicators [12-14].
1. If interest rates decrease, stock prices tend to increase, and vice versa.
2. If the dollar-to-yen exchange rate decreases, stock prices tend to decrease, and vice versa.
3. If the price of crude oil increases, stock prices tend to decrease, and vice versa.

We used the following five indicators as inputs to the neural network.
• TOPIX: the chief Tokyo Stock Exchange price index
• EXCHANGE: the dollar-to-yen exchange rate (yen/dollar)
• INTEREST: an interest rate (3-month CD, new issue, offered rates) (%)
• OIL: the price of crude oil (dollars/barrel)
• NY: the New York Dow Jones average of the closing prices of 30 industrial stocks (dollars)

TOPIX was the prediction target. EXCHANGE, INTEREST, and OIL were chosen based on the knowledge of numerical economic indicators. The Dow Jones average was used because Tokyo stock market prices are often influenced by New York exchange prices. We assume that tomorrow's change in TOPIX is determined by today's changes in the five indicators according to this knowledge. Therefore, the daily changes in these five indicators (e.g., ΔTOPIX(t) = TOPIX(t) - TOPIX(t-1)) were input into the neural network, and the next day's change in TOPIX was presented to the network as the desired output (Figure 1). The back-propagation algorithm was used to train the network. All the daily-change data were scaled to the interval [0.1, 0.9]. A 5-5-1 multi-layered neural network was used (five neurons in the input layer, five in the hidden layer, and one in the output layer).

3.1 Experiments

We used data from a total of 409 days (August 1, 1989 to March 31, 1991): 300 days for training, 30 days for validation (used to decide when to stop network learning), and 79 days for making predictions. In Experiment 1, all training data were presented in each cycle with an equal learning rate (ε = 0.7). In Experiment 2, L-data were presented five times as often as S-data. Here, the large-change threshold was 14.78 points (about US$1.40), the median of the absolute value of the TOPIX daily changes in the training data. In Experiment 3, the learning rate for the S-data was reduced to 20% of that for the L-data (i.e., ε = 0.7 for the L-data and ε = 0.14 for the S-data). The experimental conditions in Experiments 1, 2, and 3 are shown in Tables 1, 2, and 3.

Fig. 1. Neural prediction model.

Table 1. Experimental conditions in Experiment 1: conventional technique.

                                        S-data     L-data
Range of absolute value of ΔTOPIX(t)    0 to 50%   50 to 100%
Number of data                          150        150
Relative number of presentations (P)    1          1
Learning rate (ε)                       0.7        0.7
P times ε (relative value)              0.7 (1)    0.7 (1)

Table 2. Experimental conditions in Experiment 2: selective presentation.

                                        S-data     L-data
Range of absolute value of ΔTOPIX(t)    0 to 50%   50 to 100%
Number of data                          150        150
Relative number of presentations (P)    1          5
Learning rate (ε)                       0.7        0.7
P times ε (relative value)              0.7 (1)    3.5 (5)

Table 3. Experimental conditions in Experiment 3: selective learning rate.

                                        S-data     L-data
Range of absolute value of ΔTOPIX(t)    0 to 50%   50 to 100%
Number of data                          150        150
Relative number of presentations (P)    1          1
Learning rate (ε)                       0.14       0.7
P times ε (relative value)              0.14 (1)   0.7 (5)

(Fig. 1: inputs ΔTOPIX(t), ΔEXCHANGE(t), ΔINTEREST(t), ΔOIL(t), and ΔNY(t) are fed into a 5-5-1 neural network, which outputs ΔTOPIX(t+1).)

The experimental conditions in Experiment 4 are shown in Table 4. The S-data were separated into S1- and S2-data, where changes in S2-data were larger than those in S1-data; the boundary between S1- and S2-data was the 25% point. (The 25% point means that 25% of the data lies between the minimum and the 25% point; the 50% point corresponds to the median.) The L-data were separated into L1- and L2-data, where changes in L2-data were larger than those in L1-data; the boundary between L1- and L2-data was the 75% point. The 25%, 50%, and 75% points were 5.36 (about US$0.51), 14.78 (US$1.40), and 31.04 points (US$2.94), respectively. The L1-, L2-, S1-, and S2-data each contained 75 samples. In Experiment 4, the L1- and L2-data were presented five times as often as the S1- and S2-data, and the learning rate for the L1- and S1-data was reduced to 50% (i.e., ε = 0.7 for the L2- and S2-data and ε = 0.35 for the L1- and S1-data). The relative selective intensities (number of presentations times learning rate) for the S1-, S2-, L1-, and L2-data were therefore 1, 2, 5, and 10, respectively.

Table 4. Experimental conditions in Experiment 4: the hybrid technique.

                                        S1-data    S2-data    L1-data    L2-data
Range of absolute value of ΔTOPIX(t)    0 to 25%   25 to 50%  50 to 75%  75 to 100%
Number of data                          75         75         75         75
Relative number of presentations (P)    1          1          5          5
Learning rate (ε)                       0.35       0.7        0.35       0.7
P times ε (relative value)              0.35 (1)   0.7 (2)    1.75 (5)   3.5 (10)

In each experiment, network learning was stopped at the point having the maximum profit on the validation data during 8000 learning cycles. The prediction error and profit were monitored after every hundred learning cycles.
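The stopping rule above can be sketched as follows. This is an assumed illustration: `train_one_cycle` and `validation_profit` stand in for the actual training step and profit evaluation, and the network state is modeled as a plain dict:

```python
# Hypothetical sketch of the stopping criterion: train for up to 8000 cycles,
# evaluate profit on the validation data every 100 cycles, and keep the
# network state that achieved the maximum validation profit.

def train_with_profit_stopping(net, train_one_cycle, validation_profit,
                               max_cycles=8000, check_every=100):
    best_profit, best_state, best_cycle = float("-inf"), None, 0
    for cycle in range(1, max_cycles + 1):
        train_one_cycle(net)
        if cycle % check_every == 0:
            profit = validation_profit(net)
            if profit > best_profit:
                # Snapshot the state at the current validation-profit maximum.
                best_profit, best_state, best_cycle = profit, dict(net), cycle
    return best_state, best_cycle, best_profit
```

Returning the snapshot, rather than the final weights, is what makes this a stopping criterion: training may continue past the best point, but the reported network is the one at maximum validation profit.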

When a large change in TOPIX was predicted, "profit" was calculated as follows: when the predicted direction was the same as the actual direction, the daily change in TOPIX was earned; when it was different, the daily change in TOPIX was lost. This calculation of profit corresponds to the following experimental TOPIX trading system. A buy (sell) order is issued when the predicted next-day rise (fall) in TOPIX is larger than a preset value corresponding to a large change. When a buy (sell) order is issued, the system buys (sells) TOPIX shares at the current price and sells (buys) them back at the next-day price. Transaction costs were ignored in calculating the profit. The more accurately large changes are predicted, the larger the profit.
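A minimal sketch of this profit rule (the function name and sample series are ours; the threshold value follows the experiments):

```python
# Sketch of the paper's profit rule: trade only when the predicted change
# exceeds the large-change threshold; earn the actual daily change if the
# predicted direction is correct, lose it otherwise. Transaction costs ignored.

def experimental_profit(predicted, actual, threshold):
    profit = 0.0
    for pred, act in zip(predicted, actual):
        if abs(pred) <= threshold:
            continue                 # predicted small change: no trade
        if (pred > 0) == (act > 0):
            profit += abs(act)       # direction correct: earn the change
        else:
            profit -= abs(act)       # direction wrong: lose the change
    return profit

# Toy usage with the 14.78-point threshold from the experiments.
p = experimental_profit([20.0, -16.0, 5.0, 18.0],
                        [12.0, -9.0, -30.0, -15.0], 14.78)
```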

In each experiment, the momentum parameter α was 0.7, and all the weights and biases in the neural network were initialized randomly between -0.3 and 0.3. The neural network was run four times on the same training data with different initial weights, and the results were averaged.

3.2 Results

The experimental results are shown in Table 5. Multiple regression analysis (MR) was also used in the experiments. The "prediction error for large-change data" is the mean absolute prediction error on the test L-data.

Applying the selective-presentation approach (Experiment 2) reduced the prediction error on the test L-data and improved profits compared with the usual presentation approach (Experiment 1): the prediction error on L-data was reduced by 7% (1 - 21.3/22.9) and the network's ability to make profits through experimental TOPIX trading was improved by 30% (550/422).

The prediction error and profits in Experiment 3 (selective-learning-rate approach) were comparable to those in Experiment 2 (selective-presentation approach). Combining the selective-presentation and selective-learning-rate approaches (Experiment 4) further reduced the prediction error on the test L-data and improved profits: the prediction error was reduced by 10% (1 - 20.7/22.9) and the ability to make profits was improved by 38% (581/422).

Table 5. Experimental results.

                             MR       Exp. 1   Exp. 2      Exp. 3      Exp. 4
Presentation method          equal    equal    selective   equal       selective
Learning rate                -        equal    equal       selective   selective
Prediction error for         24.3     22.9     21.3        21.3        20.7
large-change data            (1.06)   (1)      (0.93)      (0.93)      (0.90)
(relative value)
Profit on test data          265      422      550         563         581
(relative value)             (0.62)   (1)      (1.30)      (1.33)      (1.38)

4 Conclusions

We investigated selective learning techniques for forecasting. In the first approach, training data corresponding to large changes in the prediction-target time series are presented more often; in the second, the learning rate for training data corresponding to small changes is reduced; and in the third, the two techniques are combined. The results of several experiments on stock-price prediction showed that both the selective-presentation and the selective-learning-rate approach outperformed the usual presentation approach, and that combining them further improved performance. Next, we will apply these techniques to today's stock market and to other real-world forecasting problems. We also plan to develop a forecasting method that integrates statistical analysis with neural networks.


References

1. Weigend, A., Huberman, B., Rumelhart, D.: Predicting the future: a connectionist approach. International Journal of Neural Systems, Vol. 1, No. 3. (1990) 193-209
2. Vemuri, V., Rogers, R. (eds.): Artificial Neural Networks: Forecasting Time Series. IEEE Press, Los Alamitos, CA (1994)
3. Pham, D., Liu, X.: Neural Networks for Identification, Prediction and Control. Springer (1995)
4. Kil, D., Shin, F.: Pattern Recognition and Prediction with Applications to Signal Characterization. American Institute of Physics Press (1996)
5. Mandic, D., Chambers, J.: Recurrent Neural Networks for Prediction. John Wiley & Sons (2001)
6. Azoff, E.: Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, West Sussex (1994)
7. Refenes, A., Azema-Barac, M.: Neural network applications in financial asset management. Neural Computing & Applications, Vol. 2, No. 1. Springer-Verlag, London (1994) 13-39
8. White, H.: Economic prediction using neural networks: the case of IBM daily stock returns. Proceedings of the International Conference on Neural Networks. San Diego, CA (1988) II-451-II-458
9. Baba, N., Kozaki, M.: An intelligent forecasting system of stock price using neural networks. Proceedings of the International Conference on Neural Networks. Singapore (1992) I-371-I-377
10. Freisleben, B.: Stock market prediction with backpropagation networks. Lecture Notes in Computer Science, Vol. 604. Springer-Verlag, Heidelberg (1992) 451-460
11. Tang, Z., Almeida, C., Fishwick, P.: Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation, Vol. 57, No. 5. (1991) 303-310
12. Kohara, K., Fukuhara, Y., Nakamura, Y.: Selective presentation learning for neural network forecasting of stock markets. Neural Computing & Applications, Vol. 4, No. 3. Springer-Verlag, London (1996) 143-148
13. Kohara, K., Fukuhara, Y., Nakamura, Y.: Selectively intensive learning to improve large-change prediction by neural networks. Proceedings of the International Conference on Engineering Applications of Neural Networks. London (1996) 463-466
14. Kohara, K.: Selective-learning-rate approach for stock market prediction by simple recurrent neural networks. Lecture Notes in Artificial Intelligence, Vol. 2773. Springer-Verlag, Heidelberg (2003) 141-147
15. Kohara, K.: Neural networks for economic forecasting problems. In: Leondes, C. T. (ed.): Expert Systems: The Technology of Knowledge Management and Decision Making for the 21st Century. Academic Press, San Diego, CA (2002)
16. Kohara, K.: Foreign exchange rate prediction with selective learning BPNNs and SOMs. Proceedings of the World Multi-Conference on Systemics, Cybernetics and Informatics. Orlando, FL (2005) 350-354
17. Park, D., El-Sharkawi, M., Marks II, R., Atlas, L., Damborg, M.: Electric load forecasting using an artificial neural network. IEEE Transactions on Power Systems, Vol. 6, No. 2. (1991) 442-449
