2.2 ARIMA Model
2.2.1 Dataset Description
The ARIMA model was applied to a dataset tracking the spread of Coronavirus Disease 2019 (COVID-19), sourced from Johns Hopkins epidemiological records. This dataset includes time-stamped information on infection rates, recoveries, and other pandemic-related statistics. It is well suited to ARIMA, which requires the series to be stationary, or rendered stationary through differencing, for effective prediction. The goal is to predict the incidence and prevalence of COVID-19 over time, leveraging ARIMA's ability to model and forecast linear trends in the data. This dataset is described in the study Application of the ARIMA model on the COVID-2019 epidemic dataset (Benvenuto et al., 2020).
2.2.2 Core Technology
ARIMA is a common statistical model for analyzing time series data, especially when the series is linear and stationary. It integrates three components. First, the Auto-Regressive (AR) component describes the relationship between an observation and a fixed number of preceding observations (earlier time points). Next, the Integrated (I) component applies differencing to the raw observations to render the series stationary; this step removes trends, leaving the model to focus on variations around the mean. Lastly, the Moving Average (MA) component captures the dependence of an observation on the residual errors of a moving average model applied to lagged observations.
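To make these components concrete, the following is a minimal sketch of fitting an ARIMA model with the statsmodels library; the case counts and the order (2, 1, 1) are illustrative assumptions, not values from Benvenuto et al. (2020).

```python
# Minimal ARIMA sketch with statsmodels; the case counts and the
# order (p=2, d=1, q=1) are illustrative assumptions only.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical daily case counts standing in for the Johns Hopkins series.
cases = pd.Series(
    [100, 120, 150, 180, 230, 290, 360, 440, 530, 640],
    index=pd.date_range("2020-02-01", periods=10, freq="D"),
)

# order=(p, d, q): AR order, degree of differencing, MA order.
model = ARIMA(cases, order=(2, 1, 1))
fitted = model.fit()

# Forecast incidence for the next 5 days.
print(fitted.forecast(steps=5))
```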
For this application, ARIMA requires the tuning of three parameters: p, the order of the AR term; d, the degree of differencing; and q, the order of the MA term. These parameters are normally selected by comparing candidate models with information criteria such as the Akaike Information Criterion (AIC). Although the ARIMA model is straightforward to apply, it is restricted by its inability to handle the non-linearity often found in real-world datasets, especially those exposed to shocks such as the COVID-19 pandemic (Benvenuto et al., 2020).
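A common way to automate this selection is an exhaustive search over candidate orders, keeping the model with the lowest AIC. The sketch below assumes the statsmodels API; the search ranges are illustrative, not the grid used in the cited study.

```python
# Hedged sketch of order selection by AIC; the search ranges are
# assumptions, not taken from the cited study.
import itertools
import warnings
from statsmodels.tsa.arima.model import ARIMA

def select_order(series, max_p=3, max_d=2, max_q=3):
    """Return the (p, d, q) order with the lowest AIC on the series."""
    best_order, best_aic = None, float("inf")
    for p, d, q in itertools.product(range(max_p + 1),
                                     range(max_d + 1),
                                     range(max_q + 1)):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                aic = ARIMA(series, order=(p, d, q)).fit().aic
        except Exception:
            continue  # skip orders that fail to fit or converge
        if aic < best_aic:
            best_order, best_aic = (p, d, q), aic
    return best_order, best_aic
```

Calling select_order on the case series from the previous sketch returns the lowest-AIC order together with its AIC value.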
2.3 RNN and LSTM Models
2.3.1 Dataset Used
For the RNN and LSTM models, a stock market dataset capturing historical stock prices over time was utilized. Such series are highly volatile, with prices influenced by a variety of external factors, making the dataset ideal for testing the ability of deep learning models to capture complex, non-linear dependencies. It includes time-stamped records of stock price movements, which allow the models to learn temporal patterns and make future price predictions. This dataset is explored in detail in Predictive Data Analysis: Leveraging RNN and LSTM Techniques for Time Series Dataset (Agarwal et al., 2024).
2.3.2 Core Technology
RNNs are special types of neural networks that are
used to work with sequences of data. Neural networks
work beyond others in series where inputs are
processed independently from each other whereas
RNNs take feedback of the output and pump back into
the network. This looping mechanism empowers
RNNs to grasp temporal dependencies. Nonetheless,
RNNs face serious challenges, notably the vanishing
gradient problem where gradients reduce as they are
back-propagated through time making it hard for the
model to learn long term dependencies.
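The looping mechanism, and the recurrence it implements, can be shown in a few lines of NumPy; all dimensions and weights below are illustrative assumptions.

```python
# Minimal vanilla RNN cell sketch (NumPy); dimensions are illustrative.
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = np.zeros(W_h.shape[0])   # initial hidden state
    states = []
    for x_t in inputs:           # the feedback loop over time steps
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Example: a sequence of 5 steps with 3 input features and hidden size 4.
rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3))
out = rnn_forward(seq, rng.normal(size=(4, 3)), rng.normal(size=(4, 4)),
                  np.zeros(4))
print(out.shape)  # (5, 4): one hidden state per time step
```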
The LSTM architecture is recommended to solve the vanishing gradient issue. To overcome this challenge, LSTMs maintain a memory cell that can preserve information across the whole sequence, regulated by three gates. The input gate determines which information from the current input should be used to update the memory; the forget gate decides which parts of the cell state to retain or discard; and the output gate controls how much of the cell state is revealed in the final output. This gated structure allows an LSTM to retain the information that is important, and discard the rest, over long periods, making it ideal for estimating future values from a long range of previous and current observations. In the stock market application, LSTMs perform much better than traditional RNNs and other models, since LSTMs address the vanishing gradient problem inherent in training on highly volatile time series while incorporating long-term dependencies. In one application, an LSTM achieved a prediction accuracy of 91.97% on the testing data (Agarwal et al., 2024).
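A minimal sketch of such a forecaster using the Keras API is shown below; the window size, layer width, and training settings are illustrative assumptions, not the configuration reported by Agarwal et al. (2024).

```python
# Hedged LSTM forecaster sketch in Keras; window size, layer width,
# and training settings are illustrative assumptions.
import numpy as np
from tensorflow import keras

def make_windows(prices, window=30):
    """Slice a price series into (samples, window, 1) inputs and next-step targets."""
    X = np.stack([prices[i:i + window] for i in range(len(prices) - window)])
    return X[..., np.newaxis], prices[window:]

# Hypothetical closing prices standing in for the historical stock dataset.
prices = np.cumsum(np.random.default_rng(0).normal(size=500)) + 100.0
X, y = make_windows(prices)

model = keras.Sequential([
    keras.Input(shape=(X.shape[1], 1)),
    keras.layers.LSTM(50),   # gated memory over the input window
    keras.layers.Dense(1),   # next-step price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print(model.predict(X[-1:], verbose=0).squeeze())  # forecast the next price
```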