2 LITERATURE REVIEW
Building on this background, this study reviewed
prior work on sentiment analysis and stock market
prediction.
2.1 Sentiment Analysis Tools
Traditional machine learning and deep learning
models are widely utilized in sentiment analysis (Tan
et al., 2023). Machine learning models, such as
Naive Bayes, SVM, and Logistic Regression, are
straightforward to deploy and offer good
interpretability but may not perform well on
complex text data. Deep learning models (CNN,
RNN, LSTM, and GRU), by contrast, require
substantial data and computing resources.
Hutto and Gilbert (2014) introduced VADER, a
rule-based sentiment analysis model. They compared its
effectiveness to eleven typical state-of-practice
benchmarks, including LIWC, ANEW, the General
Inquirer, SentiWordNet, and machine learning
techniques based on Naive Bayes, Maximum
Entropy, and Support Vector Machine (SVM)
algorithms. Although it performs well in social
media analysis, its effectiveness in stock market
analysis still needs to be verified.
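To make the idea of a rule-based scorer concrete, the following is a minimal sketch, not the model of Hutto and Gilbert (2014). The word lists, the valence values, and the function name `score_text` are hypothetical and chosen only for illustration: each known word contributes a valence, a preceding booster scales it, and a preceding negation flips its sign.

```python
# Toy rule-based sentiment scorer. The lexicon, booster, and negation lists
# are hypothetical examples, not the lexicon or rules of any published tool.
LEXICON = {"good": 2.0, "great": 3.0, "gain": 1.5,
           "bad": -2.0, "loss": -1.5, "crash": -3.0}
NEGATIONS = {"not", "no", "never"}
BOOSTERS = {"very": 1.5, "extremely": 2.0}


def score_text(text: str) -> float:
    """Sum word valences, scaled by a preceding booster and
    sign-flipped by a preceding negation."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token)
        if valence is None:
            continue  # word carries no sentiment in this toy lexicon
        prev = tokens[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            valence *= BOOSTERS[prev]
            # look one word further back for a negation ("not very good")
            prev = tokens[i - 2] if i > 1 else ""
        if prev in NEGATIONS:
            valence = -valence
        total += valence
    return total
```

A positive total suggests positive sentiment and a negative total suggests negative sentiment. A real lexicon for financial text would need domain-specific entries, which is one reason such tools require verification outside social media.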
Umar, Binji, and Balarabe (2024) reviewed
corpus-based approaches for sentiment analysis.
They noted limitations such as data sparsity and
context sensitivity, although these approaches excel
at handling complex language and domain-specific
data compared with dictionary-based and rule-based
methods.
2.2 Machine Learning and Stock
Prediction
Machine learning approaches are commonly
employed for stock price forecasting. Lumoring,
Chandra, and Gunawan (2023) conducted a
comparative study of various models, including SVM
and Random Forest, and found that Long Short-Term
Memory (LSTM) was the most effective. Beyond
advanced machine learning models, the influence
of market sentiment should not be overlooked in
stock market prediction.
Furthermore, the widely known large language
model ChatGPT achieved accuracy rates of 70%
for Microsoft and 63.88% for Google in predicting
stock trends (Mumtaz & Mumtaz, 2023). Notably,
ChatGPT has not been trained specifically for
stock market prediction and is limited to
predicting trends.
2.3 Sentiment Analysis and Stock Market
Performance
A growing number of studies focus on the relationship
between sentiment analysis and stock market
performance. Avila (2023) collected historical
stock market data for 10 major biotech companies and
used the VADER sentiment analysis tool in
conjunction with time series models for forecasting.
The study revealed a significant positive correlation
between a company’s sentiment scores and those of
its competitors.
Existing research rarely combines sentiment
analysis with stock price prediction. This paper
therefore introduces a more comprehensive model
that merges social media and financial sentiment
lexicons with a machine learning model to forecast
the stock market.
3 METHOD
3.1 Data Set
3.1.1 Stock Market Tweets Data
The study downloaded the Stock Market Tweets Data
from IEEE (Taborda et al., 2021). This open-access
dataset contains 943,672 tweets created from
April 9 to July 16, 2020. It used Twitter tags
(#SPX500, #SP500, SPX500, SP500, $SPX, #stocks,
$MSFT, $AAPL, $AMZN, $FB, $BBRK.B,
$GOOG, $JNJ, $JPM, $V, $PG, $MA, $INTC,
$UNH, $BAC, $T, $HD, $XOM, $DIS, $VZ, $KO,
$MRK, $CMCSA, $CVX, $PEP, and $PFE) as search
parameters to collect tweets mentioning the tickers
of the top 25 companies in the S&P 500 index. There
are two files in this data set. One file includes
5,000 tweets, of which 1,300 were manually annotated
and reviewed by a second independent annotator.
This article used the unlabelled tweets for ticker
classification and predicted the trend of each
company in the model.
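One plausible way to assign tweets to tickers is to match the cashtags in the tweet text, sketched below. This is an assumption about how ticker classification could be done, not the procedure used in this paper. The regular expression, the `TRACKED` subset, and the function names `extract_tickers` and `count_mentions` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical subset of the cashtags in the search parameters; extend as needed.
TRACKED = {"MSFT", "AAPL", "AMZN", "GOOG", "JPM"}

# A cashtag is "$" followed by 1-6 letters, optionally with a ".X" class
# suffix, and not immediately followed by another letter.
CASHTAG = re.compile(r"\$([A-Za-z]{1,6}(?:\.[A-Za-z])?)(?![A-Za-z])")


def extract_tickers(tweet: str) -> list[str]:
    """Return the tracked tickers mentioned in a tweet, in order of first
    appearance and without duplicates."""
    found = []
    for match in CASHTAG.finditer(tweet):
        ticker = match.group(1).upper()
        if ticker in TRACKED and ticker not in found:
            found.append(ticker)
    return found


def count_mentions(tweets: list[str]) -> Counter:
    """Count how many tweets mention each tracked ticker."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_tickers(tweet))
    return counts
```

For example, `extract_tickers("Buying $AAPL and $msft today")` returns `["AAPL", "MSFT"]`. A tweet that mentions several tickers is assigned to each of them, which is a design choice that may or may not suit per-company trend prediction.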
3.1.2 YFinance
Using the yfinance library, this paper downloaded
and normalised the historical stock prices. With an
expanding window scheme, 20% of the data was used
to train the model at the beginning, and the training
set then expanded, keeping the test set at 10%