(2017), Selin (2020), and Duan et al. (2020) using
conventional time series forecasting techniques such
as univariate autoregression (AR), univariate moving
average (MA), simple exponential smoothing (SES),
and autoregressive integrated moving average
(ARIMA) (Jing, 2021). However, Cheng et al. (2010)
argued that these methods are poorly suited to this
forecasting task: the cryptocurrency market lacks
seasonality and is highly volatile, while the
underlying statistical models can capture only linear
relationships and assume normally distributed
variables (Jing, 2021). In recent years, machine
learning techniques have been applied both to
digital-currency forecasting and to the broader
problem of asset price and return forecasting. In
stock market forecasting, machine learning methods
incorporate nonlinear features into the model to
handle non-stationary financial time series, and the
findings show that they predict more effectively than
conventional approaches (Yuan et al., 2016). Dinh et al. (2018)
predicted the price of Bitcoin using recurrent neural
networks and long short-term memory (LSTM). The
results demonstrated that these approaches, with
their ability to model temporal dependencies, produce
better predictions than the conventional multi-layer
perceptron (MLP) (Jiang, 2020).
This paper investigates Bitcoin price prediction
using a machine-learning framework. Its objective is
to examine the strengths and weaknesses of several
machine learning models in forecasting Bitcoin prices
and to conduct a comparative analysis that can serve
as a reference for financial researchers seeking to
anticipate Bitcoin's future price movements.
2 DATASETS AND METHODS
2.1 Datasets
The data used in this study is taken from Kaggle and
consists of daily Bitcoin prices, including the daily
opening and closing prices, from 2014.09.17 to
2024.07.07. This paper first converts the Date column
to a date format and sorts the records by date, then
normalizes the Close column to the range [0, 1].
Finally, it defines a create_dataset function to build
a sliding-window time series dataset, splits the data
into a training set (80%) and a test set (20%), and
reshapes the windows, according to the chosen time
step, into the 3D format (samples, time steps,
features) required by the LSTM model.
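The windowing and reshaping steps above can be sketched as follows. The function name create_dataset matches the paper; the synthetic series and the split logic are illustrative assumptions, since the actual study reads the Kaggle CSV with its Date and Close columns.

```python
import numpy as np

def create_dataset(series, time_step=30):
    """Slide a window of length `time_step` over the series:
    each sample is `time_step` past values, the label is the next value."""
    X, y = [], []
    for i in range(len(series) - time_step):
        X.append(series[i:i + time_step])
        y.append(series[i + time_step])
    return np.array(X), np.array(y)

# Illustrative stand-in for the Close column after min-max scaling to [0, 1].
close = np.linspace(0.0, 1.0, 100)

X, y = create_dataset(close, time_step=30)

# Chronological 80/20 train/test split.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]

# Reshape to the 3D (samples, time steps, features) format LSTMs expect.
X_train = X_train.reshape(-1, 30, 1)
X_test = X_test.reshape(-1, 30, 1)
```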
2.2 Models
Linear regression (LR) serves as a fundamental model
for predicting continuous-valued target variables. It
postulates a direct, linear relationship between the
input features and the output targets. By minimizing
the mean square error (MSE) between the predicted and
observed values, the LR model identifies the line
that best fits the data. Regarding specific model
parameters, fit_intercept is set to True, so the
model estimates an intercept term, and normalize is
set to False, so the regressors are not normalized
before fitting.
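A minimal sketch of this configuration with scikit-learn is shown below. The toy data is illustrative; note also that the normalize parameter has been removed from LinearRegression in recent scikit-learn releases, where False simply corresponds to its former default behavior.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy feature/target with an exact linear relationship (illustrative only).
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 3.0

# fit_intercept=True: the model estimates the intercept term.
# (normalize=False was the former default; the parameter no longer exists
# in current scikit-learn versions.)
model = LinearRegression(fit_intercept=True)
model.fit(X, y)

slope, intercept = model.coef_[0], model.intercept_
```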
LSTM is a recurrent neural network (RNN) capable of
processing and predicting long-term dependencies in
time series data, which it handles over a longer time
frame through its internal memory cells. This article
uses two LSTM layers. In the first layer, the number
of LSTM cells is 50, return_sequences is True, and
input_shape is set to (30, 1); that is, the time step
is 30 and the number of features is 1. In the second
layer, return_sequences is set to False, so only the
output of the last time step is returned. A Dense(1)
fully connected layer outputs the prediction.
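The architecture described above can be sketched in Keras as follows. The input shape (30, 1) matches the paper's time step of 30 with a single feature; the unit count of the second LSTM layer and the optimizer/loss settings are not stated in the paper and are assumptions here.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    Input(shape=(30, 1)),            # time step 30, one feature
    LSTM(50, return_sequences=True),  # first layer: 50 cells, full sequence out
    LSTM(50, return_sequences=False), # second layer: last step only (50 units assumed)
    Dense(1),                         # fully connected output of the prediction
])

# Optimizer and loss are assumptions; the paper does not state them.
model.compile(optimizer="adam", loss="mse")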
The support vector machine (SVM) is a classical
supervised learning algorithm for binary and multi-
class classification problems. Its basic idea is to
find an optimal separating hyperplane in the feature
space. Support vector regression (SVR) is the variant
of SVM used for regression. In a nutshell, SVR tries
to keep the prediction error within a certain
threshold: it seeks a hyperplane with as many
training samples as possible lying within this error
margin, while a regularization parameter constrains
model complexity. Regarding specific model parameters,
the kernel of the model is set to radial basis function
(RBF). The regularization parameter has been tuned
to 100 to balance the model's complexity and training
error. This adjustment helps prevent overfitting by
penalizing complex models. Additionally, the kernel
coefficient has been set at 0.1, a value that dictates the
extent to which individual training samples influence
the shape of the decision boundary. Furthermore, an
epsilon tube of 0.1 has been established, ensuring that
the model's predictions falling within this margin of
error are not penalized. This approach allows for
flexibility in prediction accuracy, accommodating a
range of minor deviations from the exact target value.
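With scikit-learn, the configuration above can be sketched as follows; the training series is an illustrative stand-in for the scaled Bitcoin prices.

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative smooth series on [0, 1], standing in for the scaled close prices.
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = 0.5 + 0.3 * X.ravel() ** 2

# kernel='rbf': radial basis function kernel.
# C=100: regularization parameter balancing complexity and training error.
# gamma=0.1: kernel coefficient controlling each sample's influence.
# epsilon=0.1: width of the penalty-free error tube around predictions.
model = SVR(kernel="rbf", C=100, gamma=0.1, epsilon=0.1)
model.fit(X, y)

pred = model.predict(X)
```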