3 CONFIGURATIONS FOR
HYBRID MODEL
CNN is a widely used model. Its fundamental building blocks, the convolution layer and the pooling layer, automatically extract features from the input and reduce their dimensionality. This mitigates the drawbacks of traditional models and makes CNN better suited to extracting characteristics of stock information. The convolution layer extracts local features of the input stock price sequence through its convolution kernels, acting as a feature extractor. The pooling layer downsamples the stock price series and extracts secondary features, further enhancing the model's capacity for generalization. Long Short-Term Memory (LSTM) is generally limited to transmitting data in a single direction: it accepts input only from the past and cannot process input from the future. BiLSTM can take past and future data into account simultaneously. Its principle is to compute an LSTM pass starting from each end of the sequence and then merge the LSTMs of the two directions (Althelaya, 2018). The forward LSTM stores information about the past of the input sequence, while the backward LSTM contains information about its future.
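The bidirectional merge described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: a simplified tanh recurrence stands in for a full LSTM cell, and the weight shapes and function names are hypothetical.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h):
    # Simplified recurrent cell (a stand-in for a full LSTM cell).
    return np.tanh(x @ W_x + h @ W_h)

def bidirectional_pass(xs, W_x, W_h, hidden):
    # Forward pass: reads the sequence from past to future.
    h = np.zeros(hidden)
    fwd = []
    for x in xs:
        h = rnn_step(x, h, W_x, W_h)
        fwd.append(h)
    # Backward pass: reads the sequence from future to past.
    h = np.zeros(hidden)
    bwd = []
    for x in xs[::-1]:
        h = rnn_step(x, h, W_x, W_h)
        bwd.append(h)
    bwd = bwd[::-1]  # realign so step t matches the forward pass
    # Merge the two directions by concatenating their hidden states.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3))        # 5 timesteps, 3 features (e.g. prices)
W_x = rng.normal(size=(3, 4)) * 0.1  # input-to-hidden weights
W_h = rng.normal(size=(4, 4)) * 0.1  # hidden-to-hidden weights
merged = bidirectional_pass(seq, W_x, W_h, hidden=4)
print(len(merged), merged[0].shape)  # each merged state has size 2 * hidden
```

Each merged state at timestep t thus combines what has been seen up to t (forward) with what lies after t (backward).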
The Attention Mechanism (AM) simulates the human brain's capacity to focus on particular items. The fundamental tenet of AM is to give more weight to relevant information and less weight to unimportant information. The CNN-BiLSTM-AM model consists of CNN, BiLSTM, and AM components, organized into an input layer, a CNN layer, a BiLSTM layer, an AM layer, and an output layer (Lu, 2021). During training, the data are standardized to reduce differences in scale and better suit model training. Each network layer then extracts and processes features of the data, and the output layer stores the model's forecasting results. To continually improve the model's predictive power, the discrepancy between the actual values and the predicted results is computed, and backpropagation is used to update the model's weights and biases. Training continues until a set termination condition is met, such as completing a predetermined number of epochs or reaching an error threshold. After training, the model can be used for prediction: the input data are first standardized, the trained CNN-BiLSTM-AM model generates prediction results, and the output is then restored to the original data format before being returned.
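The AM layer's weighting of the BiLSTM hidden states can be sketched as softmax-normalized attention pooling. This is a minimal sketch under assumed shapes; the scoring vector `v` and function name are hypothetical stand-ins for the learned attention parameters.

```python
import numpy as np

def attention_pool(H, v):
    """Weight each timestep's hidden state and sum.

    H: (T, d) hidden states from the BiLSTM layer.
    v: (d,) learned scoring vector (hypothetical parameterization).
    """
    scores = H @ v                      # one relevance score per timestep
    scores -= scores.max()              # numerical stability for softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ H               # weighted sum of hidden states
    return context, weights

rng = np.random.default_rng(1)
H = rng.normal(size=(6, 4))  # 6 timesteps, 4-dim hidden states
v = rng.normal(size=4)
context, w = attention_pool(H, v)
print(context.shape, round(w.sum(), 6))  # weights are non-negative, sum to 1
```

Timesteps with higher scores contribute more to the context vector passed to the output layer, which is exactly the "more weight to relevant information" principle stated above.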
The LSTM model is an enhanced neural network built on the foundation of the recurrent neural network (RNN); it resolves the gradient explosion and vanishing problems of RNNs and offers a greater capacity for generalization. Compared with the RNN, the LSTM model adds input, output, and forget gates. These gate units remove or add information, so that the model retains important information as much as possible and discards interference. Among these gate units, the forget gate first discards useless historical stock information; the input gate then updates the cell state based on the input stock data and historical information; and the output gate finally outputs the current stock information based on the cell state. The design of Bidirectional Encoder Representations from Transformers (BERT) is inspired by bidirectionality and the Transformer; unlike traditional one-way language models, it takes both left and right contexts into account to capture the context of text more fully. The BERT model uses the self-attention mechanism to construct a deep neural network, with the Transformer at its core implementing bidirectional text encoding (Zhang, 2024). The BERT and BiLSTM models are used to extract the emotional features of some financial news, and BERT's self-supervision is used to predict the emotional polarity of the remaining financial news. After integrating the obtained emotional features with the stock information, the forecasting step using LSTM can proceed.
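The forget/input/output gate updates described earlier can be sketched for a single timestep in NumPy. This is an illustrative sketch, not the paper's code; the parameter layout (one weight matrix and bias per gate) and all dimensions are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM timestep with forget, input, and output gates.

    W: dict of (input_dim + hidden_dim, hidden_dim) weight matrices,
    b: dict of (hidden_dim,) biases -- hypothetical parameter layout.
    """
    z = np.concatenate([x, h_prev])
    f = sigmoid(z @ W["f"] + b["f"])  # forget gate: drop stale history
    i = sigmoid(z @ W["i"] + b["i"])  # input gate: admit new information
    g = np.tanh(z @ W["g"] + b["g"])  # candidate cell-state update
    o = sigmoid(z @ W["o"] + b["o"])  # output gate: expose the cell state
    c = f * c_prev + i * g            # updated cell state
    h = o * np.tanh(c)                # emitted hidden state
    return h, c

rng = np.random.default_rng(2)
n_in, n_hid = 3, 5
W = {k: rng.normal(size=(n_in + n_hid, n_hid)) * 0.1 for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, c.shape)
```

The multiplicative gates `f`, `i`, and `o` take values in (0, 1), which is how the cell selectively forgets useless history while admitting and exposing the relevant stock information.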
4 IMPLEMENTATION RESULTS
The Shanghai Composite Index (000001) is chosen as the experimental data in an experiment (Lu, 2021). Models are trained on the analyzed training data set. MAE, RMSE, and R-square (R²) are employed as the evaluation criteria to assess each model's ability to predict outcomes. Lower MAE and RMSE indicate better prediction performance, and the model's predictive power increases as its R² value, which ranges from zero to one, approaches one. Of the nine methods in Table 1, CNN-BiLSTM-AM performs the best since its MAE and RMSE are the lowest and its R² is closest to 1. Comparing BiLSTM with LSTM, MAE decreased by 4% and RMSE by 2%, indicating that BiLSTM performs better. BiLSTM and LSTM are each combined with CNN to form CNN-BiLSTM and CNN-LSTM, respectively. The results show that CNN-BiLSTM has a higher R² and smaller MAE and RMSE than CNN-LSTM. It shows that CNN-
Principle and Applications of Hybrid Prediction Models for Stock Price Forecasting
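The three evaluation criteria above can be computed as follows. The arrays here are illustrative values only, not the experiment's data.

```python
import numpy as np

def evaluate(y_true, y_pred):
    # MAE and RMSE: lower is better. R^2: closer to 1 is better.
    err = y_true - y_pred
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    ss_res = (err ** 2).sum()                      # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Illustrative closing prices and predictions (not the paper's data).
y_true = np.array([10.0, 11.0, 12.0, 13.0])
y_pred = np.array([10.1, 10.9, 12.2, 12.8])
mae, rmse, r2 = evaluate(y_true, y_pred)
print(round(mae, 3), round(rmse, 3), round(r2, 3))  # 0.15 0.158 0.98
```

Comparing models on the same test set with these three numbers is how Table 1 ranks the nine methods.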