Progresses of Machine Learning in Stock Market Prediction: A
Comprehensive Model Comparison
Zheng Qin
a
Zhengzhou No.9 High School, Zhengzhou, China
Keywords: Artificial Intelligence, Machine Learning, Stock Price Prediction.
Abstract: Stock price prediction is a crucial technique in investment, but traditional approaches can hardly get high
accuracy and precision on predicting as they cannot considered about the various factors in the complex
market. As a result, more advanced method is desired by the market. In this review, several major methods of
machine learning of artificial intelligence are referred, such as Linear Regression, Random Forest, Recurrent
Neural Networks (RNN) and Long Short-Term Memory (LSTM). And the basic workflow behind the
application of these methods is also introduced. Based on the cases of different models in predicting stock
prices, the advantages and disadvantages of various machine learning models in forecasting are illustrated
and compared. Traditional models like Linear Regression offer simplicity but are limited by their inability to
capture non-linear relationships. In contrast, Random Forest models improve prediction accuracy through
ensemble methods but face challenges like overfitting and lack of temporal modelling capabilities. Advanced
models such as RNNs and LSTMs excel in capturing complex temporal dependencies, making them more
effective for stock price forecasting. This review can provide a good reference for scholars in the intersection
of finance and AI.
1 INTRODUCTION
The stock market provides a platform where
individuals can buy and sell shares of companies.
Each share sold in the market represents ownership in
a company, traded under numerous factors including
economic indicators, company performance, investor
behavior, and global events that influence the price.
For investors, predicting stock prices becomes
important because it influences various investment
decisions, the management of risks, and financial
planning.
Stock price prediction is the procedure of
estimating future stock prices based on historical
data. This has traditionally been done with the help of
financial models like the Capital Asset Pricing Model
(CAPM) and Efficient Market Hypothesis (EMH).
However, in general, these kinds of models are based
on the assumptions of market behavior but mostly fail
to capture the ever-changing complexity and
dynamism of financial markets. This would mean
a
https://orcid.org/0009-0003-7651-1253
very low predictive accuracy in facing settings where
not only changes but also those happening fast should
be taken into account. Advanced computing has
facilitated the emergence of advanced methods for
predicting stock prices. These methods are data-
driven and incorporate strong algorithms to analyze
past movement in price and possibly capture any
patterns that old models may not capture. Artificial
Intelligence has been regarded as a promising
direction for better prediction of stock price. As a
whole computer science, AI can process very big
volumes of data, to identify trends and even make
predictions with very little human intervention.
AI has taken a giant leap in recent years, and it can
be observed that there are many algorithms that show
potential for predictive tasks. Examples include
decision trees, random forests, and logistic
regression, which are well known for their inherent
characteristics in dealing with different data and
prediction problems. Furthermore, the advent of large
language models like Chat Generative Pre-Trained
Transformer (ChatGPT) epitomizes the vastness of
96
Qin and Z.
Progresses of Machine Learning in Stock Market Prediction: A Comprehensive Model Comparison.
DOI: 10.5220/0013509100004619
In Proceedings of the 2nd International Conference on Data Analysis and Machine Learning (DAML 2024) , pages 96-100
ISBN: 978-989-758-754-2
Copyright © 2025 by Paper published under CC license (CC BY-NC-ND 4.0)
the boundaries crossed by AI into natural language
processing and, further, decision-making.
AI has not been staying within the world of
finance but has touched the other major fields of
biology, chemistry, and medicine as well. Due to the
somewhat general capabilities gained by AI, these
have found applications in finance, including being
applied to the problem of prediction of stock price.
There are a great amount of cases of applying
machine learning models in the financial field.
Zeinalizadeh et al. created an application using a
neural network model to accurately forecast financial
service customer behavior (Zeinalizadeh, 2015).
Zhang et al. applied an SVM model to stock price
prediction with great success (Cao, 2019). A random
forest model was also used to further investigate the
prediction of market trends by Chen et al. (Chen,
2021), while Singh et al. introduced a deep learning
approach using Recurrent Neural Networks (RNNs)
and Long Short-term Memory (LSTM) networks to
improve stock price predictions (Singh, 2022). The
results of these studies all bring into focus the interest
in using AI to tackle financial prediction problems.
There is an incessant urge given the importance of
predicting stock prices, especially with the rapid
advances in AI technology, to explore the
transformation that these technologies are bringing
about in the financial markets. As a result, this review
was aimed to assess the state of the art in AI-based
stock price prediction, effectiveness, and
identification of further research and innovation
possibilities. The more AI develops, the more it is
integrated into the fabric of financial markets, the
better it gets at predictive accuracy, benefitting
investors, and in a far greater sense, the economy.
2 METHOD
2.1 The Introduction of Machine
Learning Workflow
Usually, the development process of a machine
learning-based stock price prediction model is
executed in a structured workflow that involves
several critical steps aim to ensure the robustness and
accuracy of the models, which will deliver reliable
predictions based on financial data.
2.1.1 Data Collection
The first step is gathering relevant data, it is a quite
crucial part for building an effective model. In the
context of stock price prediction, data is gained from
many kinds of sources such as stock market databases
and financial platforms like the website Yahoo
Finance. The data consist of different stocks in
different markets and key attributes such as opening
prices, closing prices, trading volumes, and other
market indicators. It is essential to collect all-sided
data because it forms the foudation of the model.
2.1.2 Data Preprocessing
After the collection of the data is finished ,it has to
experience a preprocessing to make sure its quality
and suitability is high enough for model training. It
concludes disposing missing value with imputation,
identifying and managing those error values affecting
the effectiveness of prediction,and sort the dataset
into two sets ,one is for training and one is for testing.
Normalization is also applied to the data in this step
,it can scale the data by making sure the contribution
of all features are equal and rational.This part is able
to reduce the biases and the performance in great
extent.
2.1.3 Model Building
For building the model, suitable machine learning or
deep learning models are selected and designed.
Random Forests, Decision Trees, and Neural
Networks are the most common ones, each of them
provides unique advantages for stock price
prediction. In this phase, the models architecture is
carefully constructed, and hyperparameters are tuned
to optimize its ability to learn from the data.
2.1.4 Model Training
The training process involves feeding the
preprocessed training data into the chosen models.
During training, the models learn patterns within the
data, then fitting the parameters to avoid the
occurrence of errors. This process is iterative, and it
gives the model the ability to generalize effectively to
new, unseen data.
2.1.5 Model Testing
After training, the models are evaluated using the
testing dataset. Measurement numbers such as Mean
Squared Error (MSE), and Root Mean Squared Error
(RMSE) are worked out to assess prediction
accuracy. These metrics help in identifying the
strengths and weaknesses of the model.
Progresses of Machine Learning in Stock Market Prediction: A Comprehensive Model Comparison
97
2.1.6 Model Deployment
After the model passes the test with metrics which are
lower than the thread and show its satisfying
performance, the deployment is ready. During the
deployment the model is integrated into a real-world
system where it can make predictions based on live
data. Continuous monitoring and updating are taken
in order to remain the model is accurate while the
market condition evolves.
2.2 Linear Regression
Linear regression is a basic statistical technique
which is always utilized building the connection
among a dependent and at least one independent
variable. The main concept behind the model is to fit
the linear equation to the gathered data such that the
total of squared variances between the gathered and
the predicted values is minimized.
Linear regression can be utilized to forecast stock
prices in coming days by inputting the past data from
the source. For instance, a study carried out by Cakra
et al., employed linear regression to analyze stock
price movements and developed a model that
demonstrated how past price trends could be
leveraged to predict future performance (Cakra,
2015). It is realized easily by fitting the model and
then asking the model to generate the forecasting
result. As detailed in their findings, the potential of
linear regression for making initial predictions is
shown, although it also acknowledged the limitations
of linear models in capturing complex market
dynamics.
2.3 Random Forest
Random Forest is a method for group learning it
works by combining multiple decision trees for both
higher accuracy and less serious overfitting. Creating
a "forest" of decision trees is the main concept of the
model, every tree is formed from a subset of the data
and features randomly. After integrating the result
from all the trees, the overall prediction is worked out,
commonly using voting for classification or
averaging for regression.
In stock price prediction, Random Forest enhance
prediction performance by leveraging its ability to
handle large datasets and complex interactions
between features. A study by Khaidem et al.
employed Random Forest to predict stock prices and
demonstrated its effectiveness in capturing intricate
patterns within the data (Khaidem, 2016). Their
review illustrated how Random Forest's robustness
against overfitting and analyzed how its ability of
disposing nonlinear relationships results in more
accurate stock price forecasts.
2.4 Recurrent Neural Networks (RNN)
An artificial neural network class called RNNs is
specifically made for sequence prediction
applications. They are quite different from traditional
feedforward networks, because connections of it can
form cycles, so they have unique ability to hold the
memory for a while. That is the reason why RNNs are
particularly suitable for time series data, where past
information is crucial for predicting future values.
As for stock price prediction, RNNs can be used
to simulate temporal dependencies and trends. For
example, Lu et al developed an RNN-based model:
Time-series Recurrent Neural Network (TRNN) to
analyze stock prices, and their work successfully
demonstrates how the network work in capturing
temporal relationships between past prices could be
utilized to forecast future stock trends (Lu, 2024).
Their research emphasized the advantages of RNNs
in modeling sequential data and their creative solution
for improving prediction accuracy by transforming
the one-dimension price-volume relationship into two
dimensions ones.
2.5 Long Short-Term Memory (LSTM)
LSTM networks are improved version of RNN
because it is designed to address the drawbacks of
original RNNs, like their difficulty in learning long-
term dependencies. Improved models have an
outstanding capability that enables the network to be
able to remember and make use of information over
longer sequences by introducing memory cells and
gating mechanisms for controlling the information
flow. Because of their capacity to handle intricate
patterns and long-term dependencies in time series
data, LSTMs have shown efficacy in stock price
prediction. For instance, Achyut et al used LSTM
networks to forecast stock prices and develop a
predictive model that take the advantage of the
network's capacity for remaining information across
extended periods (Ghosh, 2019). Their findings
showcased LSTM's superior performance in
capturing long-term trends and enhancing prediction
accuracy.
3 DISCUSSIONS
A number of machine learning models have distinct
strengths and weaknesses in predicting stock prices.
This section compares the advantages and limitations
DAML 2024 - International Conference on Data Analysis and Machine Learning
98
of Linear Regression, Random Forest, RNNs, and
LSTMs, and concludes the key challenges and future
directions in this field.
3.1 Comparison of Machine Learning
Models
Linear Regression: Linear Regression is
straightforward and easy to realize. It gets a superior
performance when the relationships between
variables are approximately linear. However, in
contrast it almost does not work in capturing patterns
present in stock price movements, when they are
complex and non-linear. So, its effectiveness is
strongly limited when facing more intricate
prediction tasks.
Random Forest: Random Forest gets satisfying
performance by aggregating multiple decision trees,
which enhances robustness and reduces overfitting
compared to individual decision trees. But it may still
be overfit with too many trees or irrelevant features.
Random Forest is less suitable for capturing stock
trends because it does not inherently take account for
temporal dependencies in time series data.
Recurrent Neural Networks: RNNs are
designed to handle sequential data and can capture
temporal dependencies, driving them applicable for
time series forecasting. But they may struggle when
facing vanishing and exploding gradient problems
because they increase the difficulty of learning
because of greater complexity when facing long-term
dependencies.
Long Short-Term Memory: It can address many
of the limitations of RNNs by incorporating
mechanisms to manage long-term dependencies and
complex temporal relationships. That is the reason
why they get much better performance in predicting
compared with RNNs. But they are also not perfect,
LSTMs can be very demanding and costly in
resources because they are computationally intensive
and require careful tuning.
3.2 Challenges in Stock Price Prediction
AI Model Interpretability: Many advanced models
all have a very low interpretability which means it is
very hard to understand the algorithm behind the
model and how do they work, such as deep learning
networks, they can be quite similar to "black boxes”,
This lack of transparency can reduce trust in the
results, complicate efforts to improve or adjust the
model when predictions are inaccurate.
Model Generalizability: Models trained on one
stock often struggle to generalize to other stocks
because of significant differences in data
characteristics. So most models are not applicable to
every piece of stock. But training separate models for
each stock is impractical due to high costs and data
collection challenges,so it is necessary to find a way
to improve the adaptability of models to enable them
get satisfying performance in prediction of various
stocks.
3.3 Future Directions
Expert Systems and Explainability Methods: In
order to make it easier to understand how the
prediction is made ,it is essential to adopt some
approaches such as SHapley Additive exPlanations
(SHAP) and Local Interpretable Model-agnostic
Explanations (LIME) . SHAP values help quantify
the impact of each feature on predictions, while LIME
provides simplified, interpretable approximations of
complex models, improving transparency and
understanding.
Transfer Learning and Domain Adaptation:
As for the poor generality of most models, some
techniques can offer promising and practical
solutions. Transfer learning leverages knowledge
from one domain to enhance performance in a related
domain (Weiss, 2016; Zhuang, 2020), while domain
adaptation adjusts models to perform well across
different data distributions. Taking full advantages of
these can save a lot of effort which is originally
wasted on retraining models and facilitate more
effective predictions across diverse stocks.
In summary, while traditional models like Linear
Regression and Random Forest have their uses,
advanced models such as RNNs and LSTMs are
better suited to handle the complexities of stock price
prediction. Addressing challenges related to model
interpretability and generalizability through
innovative methods will be crucial for advancing the
field and achieving more accurate predictions.
4 CONCLUSIONS
This review summarizes the application and progress
of the AI and machine learning in stock price
prediction and mentions many practical application
cases. And introduced the machine learning
workflow, different models such as linear regression,
random forest and their application in predicting
stock prices. In this paper, the advantages and
disadvantages of various model algorithms have been
listed and compared. Different models all have some
Progresses of Machine Learning in Stock Market Prediction: A Comprehensive Model Comparison
99
specific shortages. These findings highlight the
importance of deep learning models in financial
forecasting, especially as markets become more
complex and data driven. The successful application
of these models could lead to more accurate forecasts.
And this paper can provide a good overview reference
for computing and finance. This paper only focuses
on AI models but does not investigate traditional
financial models or sequential models such as Arima,
which will be further considered and analyzed in the
future.
REFERENCES
Cakra, Y. E., & Trisedya, B. D. 2015. Stock price prediction
using linear regression based on sentiment analysis.
2015 International Conference on Advanced Computer
Science and Information Systems (ICACSIS). IEEE.
Cao, H., et al. 2019. Stock price pattern prediction based on
complex network and machine learning. Complexity,
2019.1: 4132485.
Chen, Q., et al. 2021. Mapping China’s regional economic
activity by integrating points-of-interest and remote
sensing data with random forest. Environment and
Planning B: Urban Analytics and City Science, 48.7:
1876-1894.
Ghosh, A., et al. 2019. Stock price prediction using LSTM
on Indian share market. Proceedings of the 32nd
International Conference on. Vol. 63.
Khaidem, L., Saha, S., & Roy Dey, S. 2016. Predicting the
direction of stock market prices using random forest.
arXiv preprint arXiv:1605.00003.
Lu, M., & Xu, X. 2024. TRNN: An efficient time-series
recurrent neural network for stock price prediction.
Information Sciences, 657: 119951.
Singh, N., Mohan, B. R., & Naik, N. 2022. Hybrid model
of multifactor analysis with RNN-LSTM to predict
stock price. Advanced Machine Intelligence and Signal
Processing. Singapore: Springer Nature Singapore.
Weiss, K., Khoshgoftaar, T. M., & Wang, D. 2016. A
survey of transfer learning. Journal of Big Data, 3: 1-
40.
Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., ... &
He, Q. 2020. A comprehensive survey on transfer
learning. Proceedings of the IEEE, 109(1), 43-76.
Zeinalizadeh, N., Shojaie, A. A., & Shariatmadari, M. 2015.
Modeling and analysis of bank customer satisfaction
using neural networks approach. International Journal
of Bank Marketing, 33.6: 717-732.
DAML 2024 - International Conference on Data Analysis and Machine Learning
100