Comparative Analysis of Time Series Models for Forecasting the U.S.
Unemployment Rate: A Study of ARIMA, LSTM, and Intervention
Approaches
Yunpeng Li
ORCID: https://orcid.org/0009-0006-2287-6080
Bachelor of Business Administration (Hons) Programme, Hong Kong Baptist University, Kowloon, Hong Kong, China
Keywords: Time Series Forecasting, Unemployment Rate Prediction, Economic Events, COVID-19 Pandemic, Machine
Learning Techniques.
Abstract: The unemployment rate reflects the overall health of the labor market and influences monetary and fiscal
strategies. The rapidly changing economic landscape, marked by events such as the 2008 financial crisis
and the COVID-19 pandemic, highlights the need for robust forecasting models that
capture complex dynamics and structural changes. This research compares three time series
approaches for forecasting unemployment rates in the United States: ARIMA, LSTM, and intervention analysis.
The study uses the U.S. unemployment rate for the 16-24 age group from 1978 to 2023 and applies
time series visualization, seasonal decomposition, and intervention analysis to understand trends and event
impacts. ARIMA and LSTM models are developed and evaluated with error measures such as MSE, RMSE, MAE,
and MAPE. The study aims to identify which model best captures trends, seasonal patterns, and structural
changes in the labor market. Preliminary findings suggest that LSTM models outperform ARIMA in complex
scenarios due to their ability to learn long-term dependencies. The results of this research will contribute to
improved forecasting methodologies, providing policymakers with more accurate predictions to inform
decision-making processes.
1 INTRODUCTION
The unemployment rate has always been a hot topic
for economists worldwide. As it affects all countries,
forecasting the unemployment rate is critical for
policymakers, economists, and businesses (Douglas &
Zahed, 2024). It provides valuable insights
into the overall health of the economy and labor
market trends. The United States unemployment rate,
in particular, is a key economic indicator that
influences monetary policy decisions, fiscal planning,
and business strategies. Over the years, various time
series models have been developed and applied to
predict unemployment rates, each with strengths and
limitations.
Recent studies have highlighted the need for a
comprehensive comparison of different forecasting
models because the performance of these models can
vary significantly depending on economic conditions
and data characteristics. For instance, Douglas and
Zahed demonstrated the limitations of using only
ARIMA models for long-term unemployment rate
forecasts, emphasizing the need to consider
alternative approaches (Douglas & Zahed, 2024).
Similarly, Gostkowski and Rokicki compared several
predictive methods, including ARIMA and regression
models, but they did not include more advanced
machine learning techniques such as LSTM
(Gostkowski & Rokicki, 2021).
The rapidly evolving economic landscape,
particularly in light of recent global events such as
the COVID-19 pandemic, has underscored the
importance of reassessing and comparing different
forecasting models. Traditional
models that performed well in the past may no longer
be as effective in capturing the complex dynamics of
today's labor market. As Barnichon and Nekarda
noted, models incorporating labor force flow
data can significantly outperform traditional
forecasting approaches, especially during economic
downturns (Barnichon & Nekarda, 2012).
Li, Y.
Comparative Analysis of Time Series Models for Forecasting the U.S. Unemployment Rate: A Study of ARIMA, LSTM, and Intervention Approaches.
DOI: 10.5220/0013697600004670
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 2nd International Conference on Data Science and Engineering (ICDSE 2025), pages 372-378
ISBN: 978-989-758-765-8
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
Predicting unemployment levels is a critical task
for economic policymakers and researchers.
However, the data used in a number of studies are
now outdated and do not include the outliers
caused by the pandemic.
Previous researchers have used various methods
to predict unemployment trends in the United States,
including traditional ARIMA models, leading
economic indicators, and automatic time series modeling
techniques like Autometrics. Guerard, Thomakos,
and Kyriazi built upon earlier work by applying
Autometrics to improve models for real GDP and
unemployment, accounting for structural breaks and
outliers (Guerard, Thomakos & Kyriazi, 2020). Their
study emphasized the effectiveness of adaptive
learning forecasting and the significance of
incorporating leading indicators. However, their
research did not cover the COVID-19 period, which
had a large impact on the global unemployment
rate.
Zhong analyzed the U.S. real GDP and
unemployment rate data from 1948 to 2023 using
linear and nonlinear regression and ARIMA models
(Zhong, 2023). The study found that nonlinear
regression more accurately represents the relationship
between these two factors. ARIMA forecasts showed
optimistic future trends with GDP growth and low
unemployment but with wide confidence intervals.
Yurtsever proposed a hybrid model combining
LSTM and GRU deep learning techniques to forecast
unemployment rates in the U.S., U.K., France, and
Italy. Generally, the hybrid model outperformed
standalone LSTM and GRU models, except in Italy,
where GRU performed better. This study highlights
the effectiveness of combining different models to
enhance forecasting performance (Yurtsever, 2023).
Other researchers have also explored hybrid
approaches, such as combining ARIMA with
artificial intelligence methods, which have shown
promising results in reducing prediction errors
(Chakraborty et al., 2021; Ahmad et al., 2021).
Additionally, Xiao et al. revisited earlier forecasting
methodologies to explore relationships between
unemployment rates and leading economic indicators
such as data on weekly jobless claims and the U.S.
Leading Economic Indicator (LEI), demonstrating
that incorporating these variables can enhance
predictive accuracy (Xiao et al., 2022). Montgomery
et al. further emphasized that forecasting accuracy
could be improved by combining multiple time series
methods and carefully accounting for structural
breaks in historical data (Montgomery et al., 1998).
Similarly, Dritsakis and Klazoglou applied the Box-
Jenkins methodology extensively to forecast U.S.
unemployment rates, highlighting its effectiveness
but also acknowledging its limitations when
confronted with structural changes or unprecedented
economic shocks (Dritsakis & Klazoglou, 2018).
These findings collectively reinforce the necessity of
exploring diverse forecasting methodologies to better
capture complex labor market dynamics.
This study seeks to fill this research gap by
comparing three well-known time series forecasting
methods for predicting unemployment trends in the
United States: ARIMA, LSTM neural networks, and
intervention approaches. By evaluating these
models on recent data from 1978 to 2023 and
considering their performance across different
economic conditions, the study aims to identify
which model is best suited to predicting unemployment
trends and to offer new perspectives on how effectively
different forecasting approaches perform in the
current economic landscape.
2 DATA AND METHOD
2.1 Data Collection and Description
The dataset used in this analysis contains the
unemployment rate for the 16-24 age group from
December 1978 to July 2023. The data was cleaned
and preprocessed by removing missing values,
converting the date column to a date format, and
arranging the data in chronological order. The dataset
provides a comprehensive view of the trends and
patterns in youth unemployment over more than 40
years.
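The preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the function name `clean_series`, the date format, the missing-value markers, and the sample values are all assumptions made for the example.

```python
from datetime import datetime

def clean_series(rows, date_format="%Y-%m-%d"):
    """Drop rows with missing rates, parse dates, and sort chronologically."""
    cleaned = []
    for date_text, rate_text in rows:
        # Treat empty strings and common placeholders as missing (assumed markers).
        if rate_text is None or str(rate_text).strip() in ("", ".", "NA"):
            continue
        cleaned.append((datetime.strptime(date_text, date_format), float(rate_text)))
    cleaned.sort(key=lambda pair: pair[0])  # arrange in chronological order
    return cleaned

# Made-up rows for illustration only (not the real unemployment data).
raw_rows = [
    ("1979-02-01", "12.0"),
    ("1978-12-01", "11.5"),
    ("1979-01-01", ""),      # missing observation, will be removed
    ("1979-03-01", "11.8"),
]
series = clean_series(raw_rows)
```

Each element of `series` is a `(datetime, float)` pair, so later steps can rely on a sorted, gap-marked-free input.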
2.2 Methods and Principles
This study employs several methodologies to analyze
and predict unemployment rates in the United States.
The primary methods are as follows.
2.2.1 Time Series Visualization
Time series visualization is a crucial step in
understanding the behavior of the data over time. This
involves graphically representing the unemployment
rate over time to identify trends, seasonal patterns,
and significant events. Visual inspection helps in
understanding the overall behavior of the data.
First, this study presents a time series graph
of the unemployment rate for the 16-24 age
group. As shown in Figure 1, the series displays a
clear trend and seasonal patterns. A
vertical line marks the start of the
pandemic in March 2020 so that its
impact on the trend can be observed.
Figure. 1 Youth Unemployment Trends (16-24 years old). (Picture credit: Original)
2.2.2 Seasonal Decomposition
Seasonal decomposition, using techniques such as the
STL method, breaks down a time series into its
underlying components: trend, seasonal, and residual.
In this study, the time
series was decomposed into these
components using the STL method, as shown
in Figure 2, which provided insight into the
underlying structure of the data.
Figure. 2 Seasonal Decomposition by STL model. (Picture
credit: Original)
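To make the idea of splitting a series into trend, seasonal, and residual parts concrete, the sketch below implements a classical additive decomposition with a centered moving average. This is a simpler stand-in, not the LOESS-based STL procedure used in the study, and it assumes monthly data with an even period of 12; the function name and the synthetic series are hypothetical.

```python
import numpy as np

def classical_decompose(y, period=12):
    """Additive decomposition y = trend + seasonal + residual (even period assumed)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = period // 2
    # Centered 2 x period moving average: half weights on the two end points.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    detrended = y - trend
    # Average the detrended values at each position within the cycle.
    seasonal_means = np.array(
        [np.nanmean(detrended[k::period]) for k in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # centre seasonal effects on zero
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Synthetic series: linear trend plus a 12-month cycle (illustration only).
t = np.arange(48)
y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12)
trend, seasonal, residual = classical_decompose(y)
```

The trend and residual are undefined (NaN) for the first and last six observations, which is one practical reason to prefer STL on real data.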
2.2.3 Intervention Analysis
Intervention analysis examines how particular
incidents, like the financial crisis in 2008 and the
COVID-19 pandemic, influence changes in
unemployment rates. Intervention analysis helps in
quantifying the effects of these events.
In this paper, a linear regression model was built
to analyze the impact of the 2008 financial crisis
and the 2020 pandemic on the unemployment rate.
The model included dummy variables for these events
and a lagged term of the unemployment rate, which
allowed the impacts of both events on the
unemployment rate to be quantified simultaneously.
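A regression of this form can be sketched with ordinary least squares as below. The paper does not state how the dummy variables were defined, so the event windows, the single one-period lag, and the function name `fit_intervention_model` are assumptions; the data are synthetic and noise-free so that the known coefficients can be recovered.

```python
import numpy as np

def fit_intervention_model(y, crisis, pandemic):
    """OLS of y_t on an intercept, y_{t-1}, and two event dummy variables."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([
        np.ones(len(y) - 1),                   # intercept
        y[:-1],                                # lagged unemployment rate
        np.asarray(crisis, dtype=float)[1:],   # 1 during the crisis window
        np.asarray(pandemic, dtype=float)[1:], # 1 during the pandemic window
    ])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return dict(zip(["intercept", "lag1", "crisis", "pandemic"], coef))

# Synthetic series generated from known coefficients (not the paper's data).
n = 60
crisis = np.zeros(n)
crisis[20:30] = 1.0      # assumed crisis window
pandemic = np.zeros(n)
pandemic[40:45] = 1.0    # assumed pandemic window
y = np.empty(n)
y[0] = 5.0
for t in range(1, n):
    y[t] = 1.0 + 0.9 * y[t - 1] + 0.5 * crisis[t] + 0.2 * pandemic[t]

estimates = fit_intervention_model(y, crisis, pandemic)
```

The coefficients on the two dummies play the role of the event effects reported in the results; assessing their statistical significance would additionally require standard errors, which this sketch omits.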
2.2.4 ARIMA Model
The ARIMA approach is widely used for
forecasting time series data. It combines
autoregressive (AR), integrated (I, i.e. differencing), and moving
average (MA) components to identify linear
dependencies and seasonal variations within the
dataset. In this study, the ARIMA model was
applied to forecast the unemployment rate. The data
was differenced to achieve stationarity, and the
optimal ARIMA model was selected based on the
AIC values.
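As a compact reference, the general ARIMA(p, d, q) model and the AIC used for order selection can be written as follows. The notation (backshift operator B, constant c, coefficients phi_i and theta_j, innovation epsilon_t, parameter count k, maximized likelihood L-hat) follows common textbook conventions and is an assumption here, since the paper does not spell out its parameterization.

```latex
\begin{align}
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr)(1 - B)^d \, y_t
  &= c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr)\varepsilon_t,
  \qquad B\,y_t = y_{t-1}, \\
\mathrm{AIC} &= 2k - 2\ln\hat{L}.
\end{align}
```

Among the candidate orders, the one with the smallest AIC is preferred. The exact value of k depends on whether the constant and the innovation variance are counted, which varies between software packages.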
2.2.5 LSTM Model
Long Short-Term Memory (LSTM) networks are a
type of recurrent neural network (RNN) that are
particularly effective for time series forecasting. They
are capable of learning long-term dependencies in the
data. This research applied the LSTM approach to
predict unemployment rates. The dataset underwent
normalization before being divided into training and
testing subsets. The training subset was utilized to
build the model, while the testing subset was
employed to assess its performance.
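The data preparation for a recurrent model can be sketched as below. The window length of 12 months, the 80/20 chronological split, and min-max scaling are assumptions, as the paper does not report these settings. The scaling bounds are taken from the training portion only, a common precaution against information leakage that may differ from the order of operations used in the study. The series is a placeholder, and no network is trained here.

```python
import numpy as np

def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] relative to the given bounds."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def make_windows(values, window=12):
    """Build (samples, window, 1) inputs and next-step targets from a 1-D series."""
    values = np.asarray(values, dtype=float)
    X = np.stack([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X[..., np.newaxis], y

window = 12                                  # assumed look-back length
series = np.linspace(8.0, 14.0, 60)          # placeholder data, not the real series
split = int(len(series) * 0.8)               # assumed 80/20 chronological split
lo, hi = series[:split].min(), series[:split].max()
scaled = min_max_scale(series, lo, hi)

X_train, y_train = make_windows(scaled[:split], window)
# Keep the last `window` training points as context for the first test target.
X_test, y_test = make_windows(scaled[split - window:], window)
```

The three-dimensional input shape (samples, time steps, features) is the layout that common LSTM implementations expect.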
2.3 Evaluation Metrics
Mean Squared Error (MSE) measures the average
squared difference between the predicted
and actual values. It is a frequently used indicator of
how well regression models perform.
Root Mean Squared Error (RMSE) is calculated
as the square root of Mean Squared Error (MSE) and
measures the size of prediction errors, expressed in
the original units of the data.
Mean Absolute Error (MAE) calculates the mean
of the absolute differences between observed and
predicted values, and it is more robust against outliers
than MSE.
Mean Absolute Percentage Error (MAPE)
calculates the average of absolute differences
between actual and predicted values expressed as
percentages, offering a relative assessment of
forecasting accuracy.
Mean Absolute Scaled Error (MASE) evaluates a
model's accuracy by comparing its prediction errors
against errors from a simple baseline (naive forecast),
resulting in a standardized performance metric.
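The five measures above can be computed as in the following sketch. The function name `forecast_metrics` is hypothetical, MAPE is expressed in percent, and MASE is scaled by the in-sample error of a one-step naive forecast (the non-seasonal variant, which is an assumption since the paper does not specify the baseline). The numbers are made up for illustration.

```python
import numpy as np

def forecast_metrics(actual, predicted, train):
    """Return MSE, RMSE, MAE, MAPE (%) and MASE for a set of forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    # Scale for MASE: mean absolute one-step naive error on the training data.
    naive_mae = np.mean(np.abs(np.diff(np.asarray(train, dtype=float))))
    return {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(mae),
        "MAPE": float(100 * np.mean(np.abs(errors / actual))),
        "MASE": float(mae / naive_mae),
    }

# Illustrative values only.
metrics = forecast_metrics(
    actual=[10.0, 12.0, 14.0],
    predicted=[11.0, 12.0, 13.0],
    train=[8.0, 9.0, 11.0, 10.0],
)
```

A MASE below 1 means the model beats the naive forecast on average, which makes it a useful sanity check alongside RMSE.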
3 RESULTS AND DISCUSSION
3.1 Intervention Analysis
The intervention analysis model was designed to
assess the impact of significant events on the
unemployment rate. The results indicate that the 2008
financial crisis had a substantial and statistically
significant impact on the unemployment rate, with a
coefficient of 0.32035. This suggests that the crisis
led to a marked increase in unemployment.
Conversely, the 2020 pandemic had a much smaller
and statistically insignificant impact, with a
coefficient of 0.01223186. This finding may be
attributed to government interventions, such as
stimulus packages and employment support
programs, which mitigated the pandemic's effects on
employment.
Figure. 3 ACF Plot of Youth Unemployment Rate. (Picture
credit: Original)
Figure 3 presents the ACF plot for the original
youth unemployment rate data (ages 16-24). The
ACF plot visually represents the correlation between
observations at different lag intervals. The horizontal
axis represents different lag values, while the vertical
axis shows autocorrelation coefficients. The shaded
area indicates the confidence interval bounds;
correlations extending beyond these bounds are
statistically significant. Figure 3 shows an
autocorrelation of 1 at lag 0 (as expected by
definition), followed by gradually decreasing autocorrelations
at subsequent lags. This indicates strong persistence
in the unemployment rate data, meaning past
unemployment rates heavily influence current rates.
In addition, periodic spikes at
regular intervals (approximately every 12 lags)
suggest clear seasonal patterns. This aligns with
typical labor market dynamics where youth
unemployment rates fluctuate seasonally due to
school calendars, holidays, and seasonal employment
opportunities.
In summary, Figure 3 illustrates that the original
unemployment series is non-stationary due to
persistent trends and seasonal fluctuations. These
characteristics necessitate differencing or other
transformations before applying forecasting models
such as ARIMA.
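For readers who want to reproduce this kind of diagnostic, the sample autocorrelation function can be computed directly, as sketched below. The function name `sample_acf` and the synthetic 12-period series are assumptions for illustration. The bound of 1.96 divided by the square root of n is the usual large-sample approximation under white noise and may differ from the shaded band drawn by a particular plotting library.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[lag:] * centered[:len(x) - lag]) / denom
        for lag in range(max_lag + 1)
    ])

# Synthetic series with a pure 12-step cycle (illustration only).
t = np.arange(120)
x = np.sin(2 * np.pi * t / 12)
acf = sample_acf(x, max_lag=24)

# Approximate 95% bound for a white-noise series of the same length.
bound = 1.96 / np.sqrt(len(x))
significant_lags = [lag for lag in range(1, 25) if abs(acf[lag]) > bound]
```

On this cyclical series the autocorrelation peaks again at lags 12 and 24, mirroring the seasonal spikes described for Figure 3.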
Figure. 4 ACF Plot of Differenced Youth Unemployment
Rate. (Picture credit: Original)
Figure 4 shows the ACF plot after applying
first-order differencing to the youth unemployment rate
data. Differencing is a common step in achieving
stationarity by removing trends and stabilizing mean
values. Compared to Figure 3,
the autocorrelation values drop sharply after
differencing, which indicates that
differencing removed much of the trend
component present in the original data. Furthermore,
unlike Figure 3, Figure 4 displays a rapid decline of
autocorrelation values toward zero after just a few
lags. This rapid decay suggests that
differencing removed the trend-related
non-stationarity, making the series suitable for ARIMA
modeling.
In summary, Figure 4 demonstrates that
differencing has effectively addressed the
non-stationarity caused by trends but has not completely
eliminated seasonal effects. Therefore, although
intervention analysis can now be reliably performed
on the differenced series, additional modeling
adjustments may still be beneficial for capturing the
remaining seasonal dynamics accurately. These
findings justify using ARIMA models with
differencing, as done in this study.
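The effect of differencing on trend and seasonality can be illustrated on a synthetic series, as in the sketch below. First-order differencing corresponds to d = 1 in the ARIMA model; the additional lag-12 seasonal difference is not part of the model reported in this study and is shown only as one possible adjustment for the remaining seasonal pattern. The helper name `difference` and the data are hypothetical.

```python
import numpy as np

def difference(x, lag=1):
    """Return x_t - x_{t-lag}; the result is `lag` observations shorter."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

# Synthetic monthly series: linear trend plus a 12-month cycle.
t = np.arange(48)
series = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12)

first_diff = difference(series, lag=1)         # removes the linear trend
seasonal_diff = difference(first_diff, lag=12) # removes the 12-month cycle
```

After the first difference the cycle is still visible in `first_diff`, whereas `seasonal_diff` is flat for this idealized series.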
3.2 ARIMA Model
Figure 5 compares the Akaike Information
Criterion (AIC) values calculated for the different
ARIMA models tested during the model selection
process. The model with the lowest AIC value was
ARIMA(3,1,2), with an AIC
of -1178.4186. A lower
AIC value indicates a better trade-off between
goodness of fit and model complexity, so among the
candidates tested this model best balances capturing the
patterns in the youth unemployment data against
avoiding excessive complexity. The model is able
to capture the linear relationships
and seasonal patterns in the data. However, the
model's performance on the test set revealed a
relatively high RMSE of 3.51754016, indicating that
the predictions were not highly accurate. This
suggests that while ARIMA models are useful for
understanding linear trends, they may struggle with
complex or non-linear dynamics.
Figure. 5 Comparison of AIC Values for Different ARIMA
Models. (Picture credit: Original)
3.3 LSTM Model
The LSTM model's performance was evaluated using
RMSE. The RMSE for the LSTM model was
1.030382, which is substantially lower than the
ARIMA model's RMSE of 3.51754016 on the test set. This indicates
that the LSTM model yielded more accurate
predictions than the ARIMA model.
Figure. 6 LSTM Model Prediction Performance. (Picture
credit: Original)
However, the LSTM model's predictions, as
shown in Figure 6, exhibit hysteresis, that is,
a lagged response to changes in the actual
unemployment rate data. This phenomenon may be
attributed to the model's design, which prioritizes
capturing long-term dependencies over immediate
adjustments to new trends. To improve the model's
responsiveness in the future, further research could
focus on refining the model architecture or optimizing
training parameters to better adapt to rapid changes in
economic conditions.
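One way to quantify such a lagged response is to shift the predictions against the actual series and find the shift with the smallest error, as sketched below. This diagnostic is not part of the study; the function name `estimate_lag`, the maximum shift of six steps, and the synthetic data (predictions constructed to trail the actual values by one step) are assumptions for illustration.

```python
import numpy as np

def estimate_lag(actual, predicted, max_lag=6):
    """Find the shift k for which predicted[t] best matches actual[t - k]."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse_by_lag = {}
    for k in range(max_lag + 1):
        shifted_actual = actual[:len(actual) - k]  # actual[t - k]
        aligned_pred = predicted[k:]               # predicted[t]
        rmse_by_lag[k] = float(
            np.sqrt(np.mean((aligned_pred - shifted_actual) ** 2))
        )
    best_lag = min(rmse_by_lag, key=rmse_by_lag.get)
    return best_lag, rmse_by_lag

# Synthetic example: the "forecast" simply repeats the previous actual value.
actual = 10 + 2 * np.sin(np.arange(40) / 3.0)
predicted = np.r_[actual[0], actual[:-1]]
best_lag, rmse_by_lag = estimate_lag(actual, predicted)
```

A best shift greater than zero suggests that the forecasts largely echo past observations, in which case comparing the model against a naive last-value forecast would be informative.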
3.4 Comparison of Results
3.4.1 Intervention Analysis
The intervention analysis results indicated clear
differences in the impacts of major economic events
on youth unemployment rates. For the 2008
financial crisis, the unemployment rate increased
significantly, as indicated by the
coefficient of 0.32035. The considerable effect of this
event highlights the necessity
of including external shocks in predictive models. In
contrast, the coefficient of 0.01223186 for the 2020
pandemic indicates a negligible impact on the
unemployment rate. This may be due to government
interventions such as stimulus
packages, employment subsidies, and targeted
economic support programs implemented during the
pandemic period, which cushioned the
labor market from severe disruptions. This result
highlights the need for adaptive
forecasting approaches that account for such policy responses.
3.4.2 ARIMA Model
Through model selection based on AIC, the
ARIMA(3,1,2) model had the lowest AIC value, indicating it
was the best fit among the models tested and was
suitable for capturing linear relationships and
seasonal patterns present in youth unemployment
data. However, its RMSE of
3.51754016 on the test set shows that the model's
predictions were not very accurate. The high RMSE
suggests limitations in addressing complex scenarios,
particularly when nonlinearities and structural breaks
exist within the historical data.
3.4.3 LSTM Model
In comparison to ARIMA, the LSTM model achieved
a notably lower RMSE of 1.030382 on the test
data. This improved
accuracy highlights LSTM's strength in capturing
long-term nonlinear dependencies and complex
temporal patterns inherent in youth unemployment
rates. Nevertheless, despite its better overall
predictive capability, the LSTM model exhibited a
delayed response, or hysteresis effect, when reacting to
sudden shifts or structural changes in unemployment
trends.
3.5 Discussion
The results of this study highlight the importance of
selecting appropriate time series models based on the
complexity of the data. While ARIMA models are
effective for linear trends, LSTM models offer
superior performance in scenarios with non-linear
relationships. The intervention analysis underscores
the need to incorporate significant events into
forecasting models to improve accuracy. These
findings have implications for policymakers seeking
to improve their ability to foresee and make well-
informed choices regarding labor market initiatives.
4 CONCLUSIONS
This study analyzes the youth unemployment rate
data using various methods and provides valuable
insights into the trends and impacts of significant
events on the unemployment rate. The intervention
analysis highlighted the significant impact of the
2008 financial crisis, while the 2020 pandemic had a
smaller and statistically insignificant impact. The
ARIMA model provided a baseline for forecasting,
but the LSTM model outperformed it in terms of
prediction accuracy, as indicated by the lower RMSE
value. The results suggest that deep learning models
like LSTM can be more effective for time series
forecasting in complex and non-linear scenarios. A
limitation of this study is that it does not include
related control variables: the unemployment rate
differs across groups defined by factors such as
gender and race and was affected by the 2008
financial crisis period. Including these correlated
factors would likely make the results more precise
and accurate. Additionally, because the LSTM exhibits a
hysteresis effect, changes such as adjusting the
model architecture and improving data preprocessing
are required in future studies. Future work
will also examine additional factors that have
a large impact on the unemployment rate and
apply more forecasting models, such as linear and non-linear
regression models, to identify the best-performing
model for predicting the unemployment rate.
This would help governments and policymakers around the
world anticipate future changes in the
unemployment rate and introduce relevant policies in
advance to stabilize economic conditions.
REFERENCES
Barnichon, R., & Nekarda, C. J. (2012). The ins and outs of
forecasting unemployment: Using labor force flows to
forecast the labor market. Journal of Economic
Perspectives, 26(3), 45-66.
Douglas, S., & Zahed, M. (2024). Forecasting U.S.
unemployment rates using ARIMA: A time series
analysis from 1948 to 2019. SESUG 2024 Proceedings.
Retrieved from
https://sesug.org/proceedings/sesug_2024_SAAG/PresentationSummaries/Papers/148_Final_PDF.pdf
Dritsakis, N., & Klazoglou, P. (2018). Forecasting
unemployment rates in USA using Box-Jenkins
methodology. International Journal of Economics and
Financial Issues, 8(1), 9.
Gostkowski, M., & Rokicki, T. (2021). Forecasting the
unemployment rate: Application of selected prediction
methods. European Research Studies Journal,
XXIV(3), 985-1000.
Guerard, J., Thomakos, D., & Kyriazi, F. (2019). Automatic
time series modeling and forecasting: A replication case
study of forecasting real GDP, the unemployment rate,
and the impact of leading economic indicators. Cogent
Economics & Finance, 7(1), 1-20.
Guerard, J., Thomakos, D., & Kyriazi, F. (2020). Automatic
time series modeling and forecasting: A replication case
study of forecasting real GDP, the unemployment rate
and the impact of leading economic indicators. Cogent
Economics & Finance, 8(1), 1759483.
Montgomery, A. L., Zarnowitz, V., Tsay, R., & Tiao, G. C.
(1998). Forecasting the U.S. unemployment rate.
Journal of the American Statistical Association,
93(442), 478-493.
Xiao, H., Chen, R., & Guerard Jr, J. B. (2022). Forecasting
the U.S. unemployment rate: Another look. Wilmott
Magazine, November, 20-31.
Yurtsever, M. (2023). Unemployment rate forecasting:
LSTM-GRU hybrid approach. Journal for Labour
Market Research, 57(1), 1-9.
Zhong, S. (2023). The research and forecasts on real GDP
growth and the unemployment rate in the United States.
EMFRM, 2023, 159-165.