A Survey on Machine Learning for Stock Price Prediction:

Algorithms and Techniques

Mehtabhorn Obthong¹ᵃ, Nongnuch Tantisantiwong²ᵇ, Watthanasak Jeamwatthanachai¹ᶜ and Gary Wills¹ᵈ

¹ School of Electronics and Computer Science, University of Southampton, Southampton, U.K.

² Nottingham Business School, Nottingham Trent University, Nottingham, U.K.

Keywords:

Machine Learning, Deep Learning, Finance, Stock Price Prediction, Time Series Analysis.

Abstract:

Stock market trading is an activity in which investors need fast and accurate information to make effective decisions. Since many stocks are traded on a stock exchange, numerous factors influence the decision-making process. Moreover, the behaviour of stock prices is uncertain and hard to predict. For these reasons, stock price prediction is an important but challenging process. This has motivated research into finding the most effective prediction model, one that generates the most accurate predictions with the lowest error percentage. This paper reviews studies on the machine learning techniques and algorithms employed to improve the accuracy of stock price prediction.

1 INTRODUCTION

In financial markets, machine learning (ML) has become a powerful analytical tool for helping to manage investments efficiently. ML has been widely used in the financial sector to provide new mechanisms that can help investors make better decisions in both investment and management, and so achieve better performance from their securities investments. Equity securities are among the most traded securities (Lin et al., 2018) as they offer an attractive return (He et al., 2015; Chou and Nguyen, 2018) and are relatively liquid assets, given that they can be resold and repurchased through stock exchanges. Despite the attractive return, equity investment carries high risk due to the uncertainty and fluctuation in the stock market (Hyndman and Athanasopoulos, 2018). Investors must, therefore, understand the nature of individual stocks and the factors that affect their prices in order to increase their chances of achieving higher returns. To do so, investors need to make effective investment decisions at the right time (Ijegwa et al., 2014) using an accurate and appropriate amount of information (Nguyen et al., 2015), e.g. investor sentiment and interest rates.

ᵃ https://orcid.org/0000-0002-3869-578X
ᵇ https://orcid.org/0000-0001-5243-2970
ᶜ https://orcid.org/0000-0002-4622-0493
ᵈ https://orcid.org/0000-0001-5771-4088

Price prediction based on only a few factors would be easy, but the result might be inaccurate because some excluded factors may also be important in explaining the movement of stock prices. The prices of individual stocks can be affected by various factors, e.g. economic growth (Selvin et al., 2017). It is difficult to analyse all factors manually (Nguyen et al., 2015; Sharma et al., 2017), so tools that support the analysis of these data in a timely manner would be valuable.

Making the right decision in a timely manner poses a number of challenges, as a large amount of information is required to predict the movement of stock market prices. This information is important for investors because stock market volatility can lead to considerable investment losses. The analysis of this large volume of information is thus useful for investors, and also for analysing the direction of stock market indexes (Kim and Kang, 2019).

With the great success of ML in many fields, research on ML in finance has gained more attention and has been studied continuously (Kim and Kang, 2019). Thus, a desktop study was conducted in this paper to explore the application of machine learning in finance, in terms of the algorithms and techniques employed, focusing exclusively on stock prediction.

Obthong, M., Tantisantiwong, N., Jeamwatthanachai, W. and Wills, G.

A Survey on Machine Learning for Stock Price Prediction: Algorithms and Techniques.

DOI: 10.5220/0009340700630071

In Proceedings of the 2nd International Conference on Finance, Economics, Management and IT Business (FEMIB 2020), pages 63-71

ISBN: 978-989-758-422-0

Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

2 FINANCIAL INSTRUMENTS

A financial instrument is a contract for tradable assets (Lehmann, 2017), such as stocks, bonds, bills, currencies, swaps, futures, and options, that gives the holder the right to own part or all of an entity or to claim the assets of the entity (Staszkiewicz and Staszkiewicz, 2014). Financial assets are claims to the income produced by real assets (e.g. selling cocoa beans, letting a building, providing a service).

2.1 Equity

An equity asset, also known as a share, is issued by a public company to represent partial ownership of the company. The individuals or groups that hold shares, known as stockholders or shareholders, have the status of company owners. When the company wishes to expand its business, more capital may be needed to finance this plan. To raise this capital, the company can issue new shares, after approval by existing shareholders (because new share issues dilute their ownership), and sell them to investors. The quoted value of the stock will increase if the company is successful. Therefore, the performance of a stock investment relates both to the success and to the real assets of the company (Bodie et al., 2013).

2.1.1 Stock Market

A stock market, also known as an equity market, is a public market where traders (investors in the financial markets) buy and sell companies' shares and derivatives, exchanging them in electronic or physical form (Göçken et al., 2016). Generally, financial instruments are traded in the capital market, which comprises a primary market and a secondary market. The primary market is the place where securities are distributed for the first time. The initial public offering (IPO) occurs here. The secondary market refers to the market for trading among investors. Examples are the New York Stock Exchange (NYSE), London Stock Exchange (LSE), Japan Exchange Group (JPX), Shanghai Stock Exchange (SSE), and NASDAQ.

2.1.2 Stock Index

A stock index represents the prices of a group of stocks. The index is computed from the prices of a defined set of stocks, and its change reflects the overall performance of the stocks included in the index. In particular, a stock index is a weighted average market value of a number of firms compared with the value on the base trading day (Bodie et al., 2013). Examples include the Financial Times Stock Exchange 100 Index (FTSE 100) and the Standard & Poor's Composite 500 Index (S&P 500).¹
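As a rough illustration of the weighted-average definition of a stock index, the sketch below computes a capitalisation-weighted index level relative to a base trading day. The function name, the base value of 100 and the sample figures are assumptions made for illustration; real indices such as the FTSE 100 or S&P 500 apply further adjustments (for example free-float factors and divisors) that are not modelled here.

```python
def cap_weighted_index(prices, shares, base_prices, base_shares, base_value=100.0):
    """Index level = current aggregate market value / base-day market value * base_value.

    All arguments are hypothetical inputs: per-firm prices and share counts
    for the current day and for the base trading day.
    """
    current_cap = sum(p * s for p, s in zip(prices, shares))
    base_cap = sum(p * s for p, s in zip(base_prices, base_shares))
    if base_cap <= 0:
        raise ValueError("base-day market value must be positive")
    return base_value * current_cap / base_cap


# Hypothetical three-firm index (illustrative numbers only)
base_prices = [10.0, 20.0, 5.0]
shares = [1000, 500, 2000]
today_prices = [11.0, 19.0, 6.0]

level = cap_weighted_index(today_prices, shares, base_prices, shares)
print(f"Index level: {level:.2f}")  # about 108.33, i.e. 8.33% above the base day
```

A level above the base value indicates that the aggregate market value of the constituent firms has risen since the base trading day.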

2.1.3 Stock Trading

Stock trading is an important challenge for investors because trading decisions and stock prices can be affected by a wide variety of complex information, including economic conditions, local politics, international politics, and social factors (Naranjo et al., 2018). Stock trading involves buying and selling shares in companies. Traders use many different trading methods, such as day trading, position trading, swing trading, and scalping (Mann and Kutz, 2016).

2.2 Other Financial Instruments

Bonds, also known as debt securities, are issued by a borrower who is obligated to make specified coupon payments to the holder, also known as a bondholder, over a specified period. Debt instruments include treasury notes and bonds, municipal bonds, corporate bonds, federal agency debt, and mortgage securities. Most of these instruments promise either fixed income streams or income streams that are determined by a specific formula, which is why they are sometimes called fixed-income securities.

Derivatives are securities whose payoffs are based on the value of other assets, the so-called underlying assets, for example stocks, currencies, bonds, and commodities (Bodie et al., 2013). Financial derivatives play an important role in the financial markets because they are used to hedge risks arising from the operational, financing and investment activities of companies (Lehmann, 2017). Four popular types of derivatives are futures, options, forwards, and swaps.

The foreign exchange rate is the price of one currency in terms of another currency. The foreign exchange market is a formal network in which banks and brokers can exchange currencies immediately or enter into a contract to exchange currencies in the future at a predetermined rate (Bodie et al., 2013). The contracts traded in the foreign exchange market are divided into three types: spot, outright forward, and swap (Brown, 2017).

Commodities are goods that are interchangeable with other goods of the same type and grade, and are usually used as raw materials (e.g. cocoa, tea, silver) to produce goods or services. Commodities can be traded at current prices in the spot market, also known as the cash market, or at a pre-specified price in the futures market (Roncoroni et al., 2015). Some commodities can be the underlying assets of derivatives. The spot market is used for immediate delivery, whereas the futures market is used for trading for delivery at an agreed date in the future (Whalley, 2016).

¹ The S&P 500 is one of the leading indicators and an important benchmark for the 500 top-traded companies (Althelaya et al., 2018b).

3 MACHINE LEARNING FOR FINANCIAL INSTRUMENTS

Over the past few years, ML has been applied in many research fields, especially finance and economics (Xu and Wunsch, 2005). Many researchers have used ML algorithms to create tools that analyse historical financial data and other related information (e.g. economic conditions) to support investment decision-making. For example, Jeong et al. (2018) used ML algorithms to support stock investment decisions using financial news data and social media data, while Chou and Nguyen (2018) forecast the stock prices of construction companies in Taiwan using a promising non-linear prediction model.

More importantly, when using historical or time series financial data, the careful selection of appropriate models, data, and features is essential to producing accurate results. Accuracy depends on efficient infrastructure, the collection of relevant information, and the algorithms employed (Alpaydin, 2014). The better the quality of the data, the more accurate the ML result.
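To make the role of data and feature selection more concrete, the following minimal sketch shows one common way of turning a historical price series into inputs for a supervised ML model: each target price is paired with the preceding prices as features, and the data are split chronologically so that the model is never evaluated on observations older than its training data. The function names, the lag length and the split fraction are assumptions for illustration and are not taken from the surveyed papers.

```python
def make_lagged_dataset(prices, n_lags):
    """Pair each price with the n_lags prices that precede it.

    Returns (X, y), where X[i] holds the lagged prices and y[i] the price to predict.
    """
    if n_lags < 1 or len(prices) <= n_lags:
        raise ValueError("need n_lags >= 1 and more prices than lags")
    X, y = [], []
    for t in range(n_lags, len(prices)):
        X.append(list(prices[t - n_lags:t]))
        y.append(prices[t])
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling, so the test period follows the training period."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


# Hypothetical closing prices (illustrative only)
closes = [100.0, 101.5, 101.0, 102.3, 103.1, 102.8, 104.0, 105.2]
X, y = make_lagged_dataset(closes, n_lags=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

Any of the prediction methods listed in Table 1 could then be fitted on the training pairs; the choice of lag length is itself one of the feature-selection decisions discussed above.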

The great success of ML over the past few years has changed the way investors use information, and ML offers optimal analytic opportunities for all types of investing. Thus, ML is a significant tool for supporting financial investment. Table 1 summarises the ML techniques that have been applied to forecasting asset returns or to finding the pattern or distribution of asset returns. These techniques include clustering, prediction, classification, and others (e.g. portfolio optimisation). Table 2 presents the advantages and disadvantages of each ML technique used in the financial field.

4 TIME SERIES DATA

Time series data are groups of continuous data collected over a period of time (T). The data may be collected yearly, monthly, weekly, daily, or every hour, minute, or second. Examples are the daily exchange rate of pounds sterling (GBP) against the US dollar (USD) between 1 January 2019 and 31 December 2019, the monthly UK unemployment rate in each year, the daily closing prices of stocks, and so on.

Time series data comprise four components (Yaffee and McGee, 2000):

Trend, or secular trend, shows the direction of movement of the data in the long term. The tendency may be stable, increasing, or decreasing during different time intervals.

Cycle refers to patterns of data movement over periods longer than one year. These fluctuations are usually affected by conditions associated with an economic or business cycle (Hyndman and Athanasopoulos, 2018). A cycle is similar to a season, but its fluctuations last longer, at least two years. Cyclical variation is periodic and repeats itself; an example is the rise and fall in the number of batteries sold by National Battery Sales, Inc. from 1984 to 2003.

Seasonality, also known as seasonal variation, seasonal fluctuation or seasonal effect, is the movement of data caused by the influence of an annual season or a specific period that is repeated at the same time each year, such as month effects and quarter effects. The influence may be driven by natural conditions, business procedures, or social and cultural behaviour.

Irregularity, or irregular variation, refers to short-period irregular movements in the time series data, possibly due to disasters, wars, or strikes. This variation usually affects business activity in the short term.

Table 1: Existing algorithms and techniques applied to financial instruments.

Methods*    Type of financial instrument
Stocks    Bonds    Derivatives    Foreign Exchange    Commodities

Clustering
K-Means X
SOM X
Hierarchical Clustering X

Prediction
RF X X X
SVM X X
MLP X X X X
LSTM X
RNN X X X X X
GAs X X X
KNN X X X X
SVR X X X X
MCS X X X X
ANNs X X X
CART X X
GP X X
BSM X X
GRNN X X
RBF X
BPNN X X X
LR X X
HMM X X X

Classification
SVM X X X
KNN X X X X
LR X
ANNs X

* Definitions of the methods are provided in the Appendix section.

Table 2: Advantages and disadvantages of each ML algorithm and technique.

Columns: Methods; Data; Purpose; Method; Advantages (+); Disadvantages (-); References.

ANNs: Artificial Neural Network
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Model
+ High ability to tackle complex nonlinear patterns
+ High accuracy for modelling the relationship in data groups
+ Model can support both linear and non-linear processes
+ Model is robust and can handle noisy and missing data
- Overfitting
- Sensitive to parameter selection
- ANNs just give predicted target values for some unknown data without any variance information to assess the prediction
References: Wang et al. (2011); Göçken et al. (2016); Zhou and Fan (2019)

ARIMA: Autoregressive integrated moving average model
Data: Time series, financial time series
Purpose: Forecasting and clustering
Method: Model
+ Works well for linear time series
+ It is the most effective forecasting technique in social science
+ For short-run forecasting, it is more robust and efficient than comparable models with more complex structures
- Does not work well for nonlinear time series
- The model determined for one series will not be suitable for another
- Requires more data
- Takes a long time to process a large dataset
- Requires parameters to be set and is based on user assumptions that may be false, making the resulting clusters inaccurate
- The forecast results are based on past values of the series and previous error terms
References: Adebiyi et al. (2014); Hyndman and Athanasopoulos (2018); Selvin et al. (2017)

BPNN: Back propagation neural network
Data: Non-time series, time series and financial time series
Purpose: Forecasting
Method: Model
+ Flexible nonlinear modelling capability
+ Strong adaptability
+ Capable of learning and massively parallel computing
+ Popular for predicting complex nonlinear systems
+ Fast response
+ High learning accuracy
- Sensitive to noise
- Actual performance depends on initial values
- Slow convergence speed
- Easily converges to a local minimum
References: Wang et al. (2015); Singh and Tripathi (2017)

CART: Classification and Regression Trees
Data: Non-time series, financial time series
Purpose: Classification and forecasting
Method: Model
+ Can model nonlinearity very well
+ Results are easily interpretable
- Unstable even when the training data change only slightly
References: Pradeepkumar and Ravi (2017)

FCM: Fuzzy c-means
Data: Non-time series, time series and financial time series
Purpose: Clustering
Method: Algorithm
+ Works well for searching spherical-shaped clusters
+ Works effectively for small to medium datasets
- Sensitive to noise
- Has problems with handling high-dimensional datasets
- The membership of a data point depends directly on the membership values of other cluster centres, which may lead to undesirable results
References: Grover (2014)

GAs: Genetic Algorithms
Data: Non-time series, time series and financial time series
Purpose: Clustering, classification and forecasting
Method: Algorithm
+ Can search for clusters with different shapes by using different criteria
+ One of the best-suited algorithms for learning time-series datasets
+ Works well for noisy data
+ Suitable for peculiarly hard problems when little or no knowledge of the optimal function is given and the search space is very large
+ Suitable for solving the issue of defining proper parameters for ANNs
- Sensitive to parameter selection
References: Alfred et al. (2015); Wang et al. (2011)

GMDH: Group Method of Data Handling
Data: Financial time series
Purpose: Forecasting
Method: Algorithm
+ Best ANN for handling incorrect, noisy, or small datasets
+ Provides higher accuracy and has a simpler structure than traditional ANN models
- Can generate a complicated polynomial even for a simple system
- Does not consider the input-output relationship well because of its limited architecture
- Inefficient for modelling nonlinear systems that have different characteristics in different environments
References: Pradeepkumar and Ravi (2017)

GP: Gaussian Processes
Data: Time series and financial time series
Purpose: Classification and forecasting
Method: Model
+ Flexible and easy computational implementation
+ Sufficiently robust to generate the automatic model
- Generates "black box" models which are difficult to interpret
- Can be computationally expensive
References: Rizvi et al. (2017)

GRNN: Generalized Regression Neural Network
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Model
+ Easy to implement because of a much faster training procedure than other ANNs
+ Useful for performing predictions in real time
+ Does not require an iterative training process
+ Can estimate any arbitrary function by adapting the function exactly from the training data
+ Quick training approach
+ Provides high accuracy for both linear and nonlinear functional regressions, based on kernel estimation theory
- Requires more memory space to store the model
- Can be computationally expensive because of its huge size
References: Pradeepkumar and Ravi (2017); Al-Mahasneh et al. (2018)

Hierarchical Clustering
Data: Non-time series, time series
Purpose: Clustering
Method: Algorithm
+ Does not need any parameters to be set, e.g. the number of clusters
- Each time series must have the same length because of the Euclidean distance calculation requirement
- Unable to handle long time series effectively because of poor scalability
- Useful only for small datasets because of its quadratic computational complexity
References: Wang et al. (2006)

HMM: Hidden Markov Model
Data: Non-time series, time series and financial time series
Purpose: Clustering, classification and forecasting
Method: Model
+ Strong statistical foundation
+ Able to model high-level information (language model, or syntactical rules)
- Requires parameters to be set and is based on user assumptions that may be false, with the result that clusters would be inaccurate
- Takes a long time to process a large dataset
References: Aghabozorgi et al. (2015); Belgacem et al. (2017)

k-Means
Data: Non-time series, time series and financial time series
Purpose: Clustering
Method: Algorithm
+ Works well for searching spherical-shaped clusters
+ Works effectively for small to medium datasets
+ Faster than hierarchical clustering
- The number of clusters must be specified in advance
- Sensitive to noise
- Only spherical shapes can be determined as clusters
- The quality of clustering is highly dependent on the selection of initial centres
- Each time series must have the same length because of the Euclidean distance calculation requirement
- Unable to handle long time series effectively because of poor scalability
References: Wang et al. (2006); Boomija and Phil (2008)

k-Medoids (PAM)
Data: Non-time series and time series
Purpose: Clustering
Method: Algorithm
+ Works well for searching spherical-shaped clusters
+ Works effectively for small to medium datasets
+ More robust to noisy data and outliers than k-means
- The number of clusters must be specified in advance
- Only spherical shapes can be determined as clusters
- Does not scale well for large datasets
References: Boomija and Phil (2008); Aghabozorgi et al. (2015)

KNN: K Nearest Neighbour
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Algorithm
+ Robust to noisy training data
+ Very efficient if the training datasets are large
- The number of nearest neighbours must first be determined
- Can be computationally expensive
- Memory limitation
- Sensitive to the local structure of the data
References: Archana and Elangovan (2014)

LR: Logistic Regression
Data: Financial time series
Purpose: Classification and forecasting
Method: Model
+ High ability to tackle complex nonlinear patterns
- Sensitive to outliers
- Strong assumptions
References: Wu and Li (2018)

LSTM: Long Short-Term Memory
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Model
+ Capable of analysing and exploiting the interactions and patterns existing in data through a self-learning process
+ Makes good predictions because it analyses the interactions and hidden patterns within the data
+ Good at remembering information for a long time
- Lacks a mechanism to index the memory while writing and reading the data
- The number of memory cells is linked to the size of the recurrent weight matrices
References: Selvin et al. (2017); Kumar et al. (2018)

MCS: Monte Carlo Simulation
Data: Financial time series
Purpose: Forecasting
Method: Model
+ Very flexible, with virtually no limit for analysis
+ Can model complex systems
+ All kinds of probability distributions can be modelled
+ Time to results is quite short
+ Easily understood by non-mathematicians
+ Easy to see which inputs had the biggest effect on the results
- No interactive link between data and parameters
- Unidirectional
- Does not allow "backward reasoning"
References: Smid et al. (2010)

MLP: Multilayer Perceptron
Data: Non-time series, time series, financial time series
Purpose: Forecasting, classification
Method: Model
+ Can yield accurate predictions for challenging problems
- Convergence is quite slow
- Local minima can affect the training process
- Hard to scale
References: Pradeepkumar and Ravi (2017)

PSO: Particle Swarm Optimization
Data: Non-time series, time series and financial time series
Purpose: Forecasting
Method: Algorithm
+ Easy to implement
+ Very few parameters to tweak
- Lacks a solid mathematical foundation for analysing future development of relevant theories
References: Pradeepkumar and Ravi (2017)

RBF: Radial Basis Function Neural Networks
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Model
+ Robust to noisy input
+ Training is faster than for the perceptron since there is no back propagation learning involved
+ Very stable, with generalization capability
+ Good comprehensive adaptive and learning abilities
+ Powerful technique for improvement in multi-dimensional space
+ Quicker in convergence and more accurate in the model than the Back Propagation Neural Network
+ Does not suffer from local minima in the same way as the multilayer perceptron
+ Has only one hidden layer, making learning faster than MLP
- Classification process is slower than MLP
References: Markopoulos et al. (2016)

RF: Random Forest
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Algorithm
+ Robust method for forecasting and classification problems, since its design combines many decision trees and the feature space is modelled randomly
+ Automatically handles missing values
+ Works well with both discrete and continuous variables
- Requires more computational power and resources because it creates a lot of trees
- Requires more time to train than decision trees
References: Pradeepkumar and Ravi (2017)

RNN: Recurrent neural networks
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Model
+ Very useful for showing the time relationships that occur between the inputs and outputs in the neural network
- Difficult to train
References: Bai et al. (2018)

SOM: Self-organizing maps
Data: Non-time series, time series and financial time series
Purpose: Clustering and classification
Method: Algorithm
+ Robust to parameter selection
+ Yields a good clustering result
+ Excellent data-exploring tool
- Does not work well for time series of unequal length because of the difficulty involved in determining the scale of weight vectors
- Sensitive to outliers
References: Aghabozorgi et al. (2015)

SVM: Support Vector Machine
Data: Non-time series, time series and financial time series
Purpose: Classification and forecasting
Method: Algorithm
+ Can provide the optimal global solution and has excellent predictive accuracy capability
+ Works well on a range of classification problems, such as those with high dimensions
- Sensitive to outliers
- Sensitive to parameter selection
References: Wang et al. (2011)

SVR: Support Vector Regression
Data: Time series and financial time series
Purpose: Forecasting
Method: Model
+ Powerful for financial time series prediction
+ Particularly suited to handling multiple inputs
+ Provides high prediction accuracy
+ Ability to tackle the overfitting problem
- Sensitive to user-defined free parameters
References: Nava et al. (2018)

Time series analysis has been applied in economics and finance research (Sharma et al., 2017), for example in economic forecasting, sales forecasting, stock market analysis, and yield projection. Indeed, many ML applications have been both proposed and adopted to cope with and solve problems in time series analysis (Siami-Namini and Namin, 2018).
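The time series components described in this section can be illustrated with a classical additive decomposition, in which a series is written as trend + seasonal + irregular. The sketch below is a simplified illustration under stated assumptions: it uses a centred moving average as the trend estimate (which in practice absorbs the cycle as well, so trend and cycle are not separated), it assumes an odd seasonal period, and the function name and synthetic data are hypothetical rather than taken from the cited literature.

```python
def additive_decompose(series, period):
    """Split a series into trend, seasonal and irregular parts (additive model).

    Assumes an odd period so that the centred moving average is symmetric.
    Positions without a full window are returned as None.
    """
    if period < 3 or period % 2 == 0:
        raise ValueError("this sketch assumes an odd period >= 3")
    n, half = len(series), period // 2
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Trend: centred moving average over one full period
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # Seasonal: average detrended value at each position within the period
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[t % period] - centre for t in range(n)]

    # Irregular: whatever is left after removing trend and seasonality
    irregular = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, irregular


# Synthetic example: linear trend plus a repeating five-step pattern
pattern = [2.0, -1.0, 0.0, -2.0, 1.0]
series = [10.0 + 0.5 * t + pattern[t % 5] for t in range(20)]
trend, seasonal, irregular = additive_decompose(series, period=5)
```

On this noise-free synthetic series the recovered seasonal effects match the repeating pattern and the irregular component is approximately zero; real financial series would leave a non-trivial irregular part.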

5 MACHINE LEARNING FOR STOCK PRICE PREDICTION

Stock price prediction plays a very important role in investment, as efficient stock price predictions can inform trading strategies. However, there is no guarantee that a stock price prediction based on historical data will be 100% accurate, owing to uncertainty about the future. For example, stock prices can fluctuate depending on political and economic conditions. Thus, investors use fundamental and technical analysis together for stock price prediction (Beyaz et al., 2018).

Fundamental analysis is a method of estimating the intrinsic value of a stock by analysing various internal and external factors that could affect the value of the stock or company (Selvin et al., 2017). The fundamental factors include the business environment, financial performance, economic data, and social and political behaviour (Beyaz et al., 2018).

Technical analysis is a method of predicting future stock prices (Selvin et al., 2017) using historical data. This method focuses on analysing trends in securities' prices, such as daily opening, high, low, and closing prices. In addition, other features, for example volume and the relative strength index (RSI), may be used in technical analysis to increase prediction accuracy.

• Opening Price is the first price of a listed stock at the start of trading on a trading day.
• High and Low Prices are the highest and lowest prices of the stock on that day. Traders generally use these data to measure the volatility of the stock.
• Closing Price is the price of the stock at the close of the trading day.
• Volume is the number of shares or contracts of a security traded in all markets during a given time period.
• Adjusted Closing Price is considered the true price of the stock, and shows the stock's value after dividends have been distributed.
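As a hedged illustration of how such price data can be turned into technical features, the sketch below derives two simple quantities: a high-low range expressed as a percentage of the close (one crude volatility proxy) and a relative strength index computed with simple averages of gains and losses. This is only one variant; the widely used Wilder formulation smooths the averages differently. The function names, the default period of 14 and the handling of a completely flat series are assumptions made for this example.

```python
def daily_range_pct(high, low, close):
    """High-low range as a percentage of the closing price."""
    if close <= 0:
        raise ValueError("close must be positive")
    return 100.0 * (high - low) / close


def simple_rsi(closes, period=14):
    """RSI from simple averages of gains and losses over the last `period` changes.

    Returns a value between 0 and 100. A flat series returns 50 by assumption.
    """
    if period < 1 or len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    start = len(closes) - period
    changes = [closes[i] - closes[i - 1] for i in range(start, len(closes))]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 50.0 if avg_gain == 0 else 100.0
    rs = avg_gain / avg_loss  # relative strength
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical closing prices (illustrative only)
closes = [10.0, 11.0, 10.0, 12.0, 11.0]
rsi = simple_rsi(closes, period=4)            # gains 3, losses 2 over 4 changes
range_pct = daily_range_pct(105.0, 95.0, 100.0)
```

Features of this kind could be appended to the price inputs of a prediction model alongside volume.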

According to the literature, many algorithms and techniques have been proposed for stock price prediction, some of which are shown in Section 3. Table 3 summarises the performance of ML algorithms and techniques (accuracy and error percentages) reported in the literature. The comparison shows that many deep learning techniques, such as ANN, RNN, LSTM, stacked long short-term memory (SLSTM), and bidirectional long short-term memory (BLSTM), performed well in terms of producing low error percentages, while a mixture of historical daily stock prices and social media data can produce an accuracy of up to 70% (Attigeri et al., 2015).

Table 3: Comparison of ML algorithms and techniques in financial stock price prediction.

Columns: Paper; Prediction Techniques; Stocks/Index; Input Data; Accuracy (%); Error (%).

Hegazy et al. (2014)
Prediction techniques: PSO, LS-SVM, ANN
Stocks/Index: S&P 500
Input data: Historical daily stock prices
Accuracy (%): N/A
Error (%): LS-SVM: 0.1147; PSO: 0.7417; ANN: 1.7212. Note: average of 13 companies which cover all stock sectors in the S&P 500 stock market

Adebiyi et al. (2014)
Prediction techniques: ARIMA, ANN
Stocks/Index: Dell index
Input data: Historical daily stock prices
Accuracy (%): N/A
Error (%): ARIMA: 0.608; ANN: 0.8614. Note: average of one month prediction

Nguyen et al. (2015)
Prediction techniques: SVM
Stocks/Index: AAPL, AMZN, BA, BAC, CSCO, DELL, EBAY, ETFC, GOOG, IBM, INTC, KO, MSFT, NVDA, ORCL, T, XOM, YHOO
Input data: Historical daily stock prices and mood information
Accuracy (%): 54.41 (average); 60.00 (few stocks)
Error (%): N/A

Patel et al. (2015)
Prediction techniques: ANN, SVM, RF, Naïve-Bayes
Stocks/Index: CNX Nifty index, S&P Bombay Stock Exchange (BSE) Sensex index, Infosys Ltd., Reliance Industries
Input data: Historical daily stock prices
Accuracy (%): Naïve-Bayes: 90.19; RF: 89.98; SVM: 89.33; ANN: 86.69
Error (%): N/A

Attigeri et al. (2015)
Prediction techniques: LR
Stocks/Index: Stock market price of two companies
Input data: Historical daily stock prices, news articles, and social media data (Twitter)
Accuracy (%): LR: 70
Error (%): N/A

Dang and Duong (2016)
Prediction techniques: SVM
Stocks/Index: VN30 Index: EIB, MSN, STB, VIC, VNM
Input data: News relating to companies in the VN30 Index
Accuracy (%): SVM: 73
Error (%): N/A

Selvin et al. (2017)
Prediction techniques: LSTM, RNN, CNN, ARIMA
Stocks/Index: NIFTY-IT index (Infosys, TCS), NIFTY-Pharma index (Cipla)
Input data: Minute by minute stock prices (day stamp, time stamp, transaction id, stock price, and volume traded)
Accuracy (%): N/A
Error (%): Infosys: CNN: 2.36; RNN: 3.9; LSTM: 4.18; ARIMA: 31.91. TCS: CNN: 8.96; RNN: 7.65; LSTM: 7.82; ARIMA: 21.16. Cipla: CNN: 3.63; RNN: 3.83; LSTM: 3.94; ARIMA: 36.53

Roncoroni et al. (2015)
Prediction techniques: LSTM
Stocks/Index: NIFTY 50
Input data: Historical daily stock prices
Accuracy (%): N/A
Error (%): 0.00859

Khare et al. (2017)
Prediction techniques: LSTM, MLP
Stocks/Index: 10 unique stocks on the New York Stock Exchange
Input data: Minute by minute stock prices
Accuracy (%): N/A
Error (%): MLP: 0.0025; LSTM: 0.048

Althelaya et al. (2018a)
Prediction techniques: MLP, LSTM, SLSTM, BLSTM
Stocks/Index: S&P 500
Input data: Historical daily stock prices (closing price)
Accuracy (%): N/A
Error (%): Short-term: BLSTM: 0.00947; SLSTM: 0.01248; LSTM: 0.01582; MLP: 0.03875. Long-term: BLSTM: 0.06055; SLSTM: 0.06637; LSTM: 0.08371; MLP: 0.09369
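The accuracy and error figures collected from the literature are not necessarily computed with one common definition across the surveyed papers, so they should be compared with care. Purely as an illustration, and as an assumption rather than a description of any particular paper's method, the sketch below shows three measures that are often used for this kind of comparison: mean absolute percentage error (MAPE), root mean squared error (RMSE), and directional accuracy (how often the predicted direction of the next price move matches the realised one). All function names and sample values are hypothetical.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    _check(actual, predicted)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def directional_accuracy(actual, predicted):
    """Percentage of steps where predicted and realised moves share a direction.

    Both moves are measured from the previous actual price.
    """
    _check(actual, predicted)
    if len(actual) < 2:
        raise ValueError("need at least two observations")
    hits = 0
    for t in range(1, len(actual)):
        realised_up = actual[t] > actual[t - 1]
        predicted_up = predicted[t] > actual[t - 1]
        hits += realised_up == predicted_up
    return 100.0 * hits / (len(actual) - 1)


# Hypothetical prices and forecasts (illustrative only)
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [100.0, 101.0, 103.0, 104.0]

print(f"MAPE: {mape(actual, predicted):.3f}%")
print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"Directional accuracy: {directional_accuracy(actual, predicted):.1f}%")
```

A model can score well on an error measure while still predicting the direction of movement poorly, which is one reason the accuracy and error columns are reported separately.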

6 CONCLUSION AND FUTURE WORK

Stock investments are of interest to many investors around the world. However, making an investment decision is a difficult and complex task because numerous factors are involved. To invest successfully, investors are keen to forecast the future state of the stock market. Even small improvements in predictive efficiency can be very profitable. A good prediction system will help investors make more accurate and more profitable investments by providing supporting information such as the future direction of stock prices. For this reason, stock price prediction is a very important process that can be beneficial for investors.

This paper reviewed and compared state-of-the-art ML algorithms and techniques that have been used in finance, especially in stock price prediction. A number of ML algorithms and techniques were discussed in terms of their input types, purposes, advantages, and disadvantages. For stock price prediction, some ML algorithms and techniques have been widely adopted because of their characteristics and the accuracy and error they achieve.

In addition to historical prices, other information, such as politics, economic growth, financial news, and social media, might affect stock prices. Many studies have shown that sentiment analysis has a high impact on the prediction of future prices. Thus, a mix of technical and fundamental analyses could make prediction more efficient, and adding it to state-of-the-art ML would be an interesting direction for future work.

REFERENCES

Adebiyi, A. A., Adewumi, A. O., and Ayo, C. K. (2014). Comparison of ARIMA and artificial neural networks models for stock price prediction. Journal of Applied Mathematics, 2014.

Aghabozorgi, S., Shirkhorshidi, A. S., and Wah, T. Y. (2015). Time-series clustering – a decade review. Information Systems, 53:16–38.

Al-Mahasneh, A. J., Anavatti, S. G., and Garratt, M. A. (2018). Review of Applications of Generalized Regression Neural Networks in Identification and Control of Dynamic Systems. arXiv preprint arXiv:1805.11236.

Alfred, R. et al. (2015). A genetic-based backpropagation neural network for forecasting in time-series data. In 2015 International Conference on Science in Information Technology (ICSITech), pages 158–163. IEEE.

Alpaydin, E. (2014). Introduction to machine learning. MIT Press.

Althelaya, K. A., El-Alfy, E.-S. M., and Mohammed, S. (2018a). Evaluation of bidirectional LSTM for short- and long-term stock market prediction. In 2018 9th International Conference on Information and Communication Systems (ICICS), pages 151–156. IEEE.

Althelaya, K. A., El-Alfy, E.-S. M., and Mohammed, S. (2018b). Stock Market Forecast Using Multivariate Analysis with Bidirectional and Stacked (LSTM, GRU). In 2018 21st Saudi Computer Society National Computer Conference (NCC), pages 1–7. IEEE.

Archana, S. and Elangovan, K. (2014). Survey of classification techniques in data mining. International Journal of Computer Science and Mobile Applications, 2(2):65–71.

Attigeri, G. V., MM, M. P., Pai, R. M., and Nayak, A. (2015). Stock market prediction: A big data approach. In TENCON 2015-2015 IEEE Region 10 Conference, pages 1–5. IEEE.

Bai, S., Kolter, J. Z., and Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271.

Belgacem, S., Chatelain, C., and Paquet, T. (2017). Gesture sequence recognition with one shot learned CRF/HMM hybrid model. Image and Vision Computing, 61:12–21.

Beyaz, E., Tekiner, F., Zeng, X.-j., and Keane, J. (2018). Comparing technical and fundamental indicators in stock price forecasting. In 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 1607–1613. IEEE.

Bodie, Z., Kane, A., and Marcus, A. J. (2013). Investments and portfolio management. McGraw Hill Education (India) Private Limited.

Boomija, M. and Phil, M. (2008). Comparison of partition based clustering algorithms. Journal of Computer Applications, 1(4):18–21.

Brown, B. (2017). The forward market in foreign exchange: a study in market-making, arbitrage and speculation. Routledge.

Chou, J.-S. and Nguyen, T.-K. (2018). Forward Forecast of Stock Price Using Sliding-Window Metaheuristic-Optimized Machine-Learning Regression. IEEE Transactions on Industrial Informatics, 14(7):3132–3142.

Dang, M. and Duong, D. (2016). Improvement methods for stock market prediction using financial news articles. In 2016 3rd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS), pages 125–129. IEEE.

Göçken, M., Özçalıcı, M., Boru, A., and Dosdoğru, A. T. (2016). Integrating metaheuristics and artificial neural networks for improved stock price prediction. Expert Systems with Applications, 44:320–331.

Grover, N. (2014). A study of various Fuzzy Clustering Algorithms. International Journal of Engineering Research, 3:177–181.

He, J., Cai, L., Cheng, P., and Fan, J. (2015). Optimal investment for retail company in electricity market. IEEE Transactions on Industrial Informatics, 11(5):1210–1219.

Hegazy, O., Soliman, O. S., and Salam, M. A. (2014). A machine learning model for stock market prediction. arXiv preprint arXiv:1402.7351.

Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: principles and practice. OTexts.

Ijegwa, A. D., Rebecca, V. O., Olusegun, F., and Isaac, O. O. (2014). A predictive stock market technical analysis using fuzzy logic. Computer and Information Science, 7(3):1.

Jeong, Y., Kim, S., and Yoon, B. (2018). An Algorithm for Supporting Decision Making in Stock Investment through Opinion Mining and Machine Learning. In 2018 Portland International Conference on Management of Engineering and Technology (PICMET), pages 1–10. IEEE.

Khare, K., Darekar, O., Gupta, P., and Attar, V. (2017). Short term stock price prediction using deep learning. In 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pages 482–486. IEEE.

FEMIB 2020 - 2nd International Conference on Finance, Economics, Management and IT Business


Kim, S. and Kang, M. (2019). Financial series prediction using Attention LSTM. arXiv preprint arXiv:1902.10877.

Kumar, J., Goomer, R., and Singh, A. K. (2018). Long short term memory recurrent neural network (LSTM-RNN) based workload forecasting model for cloud datacenters. Procedia Computer Science, 125:676–682.

Lehmann, M. (2017). Financial instruments. In Encyclopedia of Private International Law, pages 739–747. Edward Elgar Publishing Limited.

Lin, F.-L., Yang, S.-Y., Marsh, T., and Chen, Y.-F. (2018). Stock and bond return relations and stock market uncertainty: evidence from wavelet analysis. International Review of Economics & Finance, 55:285–294.

Mann, J. and Kutz, J. N. (2016). Dynamic mode decomposition for financial trading strategies. Quantitative Finance, 16(11):1643–1655.

Markopoulos, A. P., Georgiopoulos, S., and Manolakos, D. E. (2016). On the use of back propagation and radial basis function neural networks in surface roughness prediction. Journal of Industrial Engineering International, 12(3):389–400.

Naranjo, R., Arroyo, J., and Santos, M. (2018). Fuzzy modeling of stock trading with fuzzy candlesticks. Expert Systems with Applications, 93:15–27.

Nava, N., Di Matteo, T., and Aste, T. (2018). Financial time series forecasting using empirical mode decomposition and support vector regression. Risks, 6(1):7.

Nguyen, T. H., Shirai, K., and Velcin, J. (2015). Sentiment analysis on social media for stock movement prediction. Expert Systems with Applications, 42(24):9603–9611.

Patel, J., Shah, S., Thakkar, P., and Kotecha, K. (2015). Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, 42(1):259–268.

Pradeepkumar, D. and Ravi, V. (2017). Forecasting financial time series volatility using particle swarm optimization trained quantile regression neural network. Applied Soft Computing, 58:35–52.

Rizvi, S. A. A., Roberts, S. J., Osborne, M. A., and Nyikosa, F. (2017). A Novel Approach to Forecasting Financial Volatility with Gaussian Process Envelopes. arXiv preprint arXiv:1705.00891.

Roncoroni, A., Fusai, G., and Cummins, M. (2015). Handbook of multi-commodity markets and products: structuring, trading and risk management. John Wiley & Sons.

Selvin, S., Vinayakumar, R., Gopalakrishnan, E., Menon, V. K., and Soman, K. (2017). Stock price prediction using LSTM, RNN and CNN-sliding window model. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 1643–1647. IEEE.

Sharma, A., Bhuriya, D., and Singh, U. (2017). Survey of stock market prediction using machine learning approach. In 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA), volume 2, pages 506–509. IEEE.

Siami-Namini, S. and Namin, A. S. (2018). Forecasting economics and financial time series: ARIMA vs. LSTM. arXiv preprint arXiv:1803.06386.

Singh, J. and Tripathi, P. (2017). Time Series Forecasting Using Back Propagation Neural Network with ADE Algorithm. International Journal of Engineering and Technical Research, 7(5).

Smid, J., Verloo, D., Barker, G., and Havelaar, A. (2010). Strengths and weaknesses of Monte Carlo simulation models and Bayesian belief networks in microbial risk assessment. International Journal of Food Microbiology, 139:S57–S63.

Staszkiewicz, P. and Staszkiewicz, L. (2014). Finance: A Quantitative Introduction. Academic Press.

Wang, L., Zeng, Y., and Chen, T. (2015). Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Systems with Applications, 42(2):855–863.

Wang, S., Yu, L., Tang, L., and Wang, S. (2011). A novel seasonal decomposition based least squares support vector regression ensemble learning approach for hydropower consumption forecasting in China. Energy, 36(11):6542–6554.

Wang, X., Smith, K., and Hyndman, R. (2006). Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13(3):335–364.

Whalley, J. (2016). Developing Countries and the Global Trading System: Volume 1 Thematic Studies from a Ford Foundation Project. Springer.

Wu, L. and Li, M. (2018). Applying the CG-logistic Regression Method to Predict the Customer Churn Problem. In 2018 5th International Conference on Industrial Economics System and Industrial Security Engineering (IEIS), pages 1–5. IEEE.

Xu, R. and Wunsch, D. C. (2005). Survey of clustering algorithms.

Yaffee, R. A. and McGee, M. (2000). An introduction to time series analysis and forecasting: with applications of SAS® and SPSS®. Elsevier.

Zhou, J. and Fan, P. (2019). Modulation format/bit rate recognition based on principal component analysis (PCA) and artificial neural networks (ANNs). OSA Continuum, 2(3):923–937.

APPENDIX: SUPPLEMENTARY

The supplementary material for this article can be found online at ePrint, the University of Southampton: https://eprints.soton.ac.uk/437785/
