2 DATA AND METHOD
2.1 Data collection and description
The data were collected from Yahoo Finance. The dataset
contains eight variables and 5849 observations. Date
is the record date. The data cover 2000-08-30 to
2023-12-29. Dates without gold trading are not
included. Open is the opening price of gold on that
date. High and Low are the highest and lowest prices
of gold on that date. Close is the difference between
the gold price on the current date and the price on
the previous day. SP500_Close is the closing value of
the S&P 500 index on that date. The index is one of
the most important stock indices in the United States.
It tracks the performance of 500 large companies
listed on the American stock market. USD_Index_Close
is the closing value of the United States Dollar index
on that date. It represents the overall strength of
the dollar against other major currencies worldwide.
vix_data_Close is the volatility index (VIX). It
represents the level of geopolitical and economic risk
or uncertainty in the market. WTI_Crude is the price
of West Texas Intermediate crude oil, a key global oil
price benchmark.
Gold and WTI crude oil are indirectly linked
through inflation (Jain & Biswal, 2016). Oil and gold
prices have shown a positive correlation. The USD
index reflects the value of the dollar, which directly
affects the price of gold. When the US dollar weakens,
investors may look for assets other than the US dollar
to preserve value. Gold is a traditional safe-haven
asset. Thus, gold attracts more investors when the USD
weakens, and the increasing demand pushes the gold
price higher.
Similarly, because gold is a safe-haven asset, when
the VIX index increases, indicating higher risk,
investors buy gold to hedge that risk (Hapau, 2023).
The S&P 500 reflects risk sentiment and capital
allocation. When the stock market is in a bull market
period, people tend to allocate more money to the
stock market for high returns. Thus, less money is
allocated to gold. Conversely, if the stock market is
experiencing a downturn, more people are willing to
buy gold to avoid the high risk in the stock market
(Jain & Biswal, 2016).
2.2 Data processing and feature creation
To ensure stationarity, model stability, and accuracy,
this paper uses the relative return of gold instead of
the gold close price directly (absolute return). As
shown in formula (1), the relative return is calculated
as the percentage change between the current gold
price and the gold price on the previous day. P_t
represents the price of gold today and P_(t-1)
represents the price of gold on the previous day.
$$\text{Return} = \frac{P_t - P_{t-1}}{P_{t-1}} \tag{1}$$
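As a minimal sketch of formula (1), the relative return
can be computed from a list of consecutive prices as
follows. The function name relative_returns and the
example prices are hypothetical and are not taken from
the dataset.

```python
def relative_returns(prices):
    """Apply formula (1): (P_t - P_{t-1}) / P_{t-1} for each consecutive pair.

    The result has one element fewer than `prices`, because the first
    observation has no previous price.
    """
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


# Hypothetical prices, for illustration only.
example_prices = [100.0, 102.0, 99.96]
example_returns = relative_returns(example_prices)
```

The first observation has no previous price, so the
return series is one element shorter than the price
series.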
Three features are created to predict the gold return
better. The moving average convergence divergence
(MACD) difference is used as an indicator to measure
the momentum and trend strength of the gold price.
Formulas (2), (3), (4), and (5) show how to calculate
the MACD difference. The MACD difference is the
difference between the MACD line and the signal line.
The MACD line is calculated by subtracting the 26-day
exponential moving average (EMA) from the 12-day EMA.
The signal line is the 9-day EMA of the MACD line. In
formula (2), P is the gold price for the period, i is
the current period, and n is the number of
observations considered for the calculation of the
moving average (Aguirre et al., 2020).
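$$\text{EMA}_i = P_i \cdot \frac{2}{n+1} + \text{EMA}_{i-1} \cdot \left(1 - \frac{2}{n+1}\right) \tag{2}$$
$$\text{MACD line} = \text{EMA}_{12} - \text{EMA}_{26} \tag{3}$$
$$\text{Signal line} = \text{EMA}_{9}(\text{MACD line}) \tag{4}$$
$$\text{MACD difference} = \text{MACD line} - \text{Signal line} \tag{5}$$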
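Formulas (2) to (5) can be sketched in a few lines of
Python. This is an illustration under stated
assumptions, not the implementation used in this paper:
the function names are hypothetical, and the EMA
recursion is seeded with the first observation because
the text does not specify an initial value.

```python
def ema(values, n):
    """Formula (2): EMA_i = P_i * k + EMA_{i-1} * (1 - k), with k = 2 / (n + 1).

    Assumption: the recursion starts from the first observation.
    """
    k = 2.0 / (n + 1)
    out = [values[0]]
    for p in values[1:]:
        out.append(p * k + out[-1] * (1.0 - k))
    return out


def macd_difference(prices, fast=12, slow=26, signal=9):
    """Formulas (3) to (5): MACD line minus its signal line."""
    ema_fast = ema(prices, fast)
    ema_slow = ema(prices, slow)
    macd_line = [f - s for f, s in zip(ema_fast, ema_slow)]  # formula (3)
    signal_line = ema(macd_line, signal)                     # formula (4)
    return [m - s for m, s in zip(macd_line, signal_line)]   # formula (5)


# Hypothetical steadily rising prices, for illustration only.
example_prices = [float(p) for p in range(100, 140)]
example_diff = macd_difference(example_prices)
```

Because of the seeding assumption, the earliest values
depend on the starting point and are less reliable than
later ones.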
The relative strength index (RSI) is an indicator
that can identify overbought or oversold conditions in
the market. It measures the speed of change of price
movements. Formulas (6), (7), (8), (9), and (10) show
how to calculate the RSI. The first step is to
calculate the price change. The second step is to
calculate the average gain (AG) and the average loss
(AL) over n, the look-back period of 14 days. The
third step is to calculate the relative strength (RS).
Finally, the relative strength is used to calculate
the RSI. When the RSI value exceeds 70, an overbought
signal exists on the asset. When the RSI value is
lower than 30, an oversold signal exists on the asset
(Husaini et al., 2024). P_t represents the price of
gold today and P_(t-1) represents the price of gold on
the previous day.
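The four steps described above can be sketched as
follows. This is a hedged illustration rather than the
exact procedure used in this paper. It uses simple
averages over the last n price changes, as in the
description of AG and AL. The final mapping
RSI = 100 - 100 / (1 + RS) is the standard RSI
definition and is assumed here. The function name rsi
and the handling of a zero average loss are likewise
assumptions.

```python
def rsi(prices, n=14):
    """RSI computed from the last n price changes in `prices`."""
    if len(prices) < n + 1:
        raise ValueError("need at least n + 1 prices")
    window = prices[-(n + 1):]
    # Step 1: price changes between consecutive days.
    changes = [curr - prev for prev, curr in zip(window, window[1:])]
    # Step 2: average gain and average loss over the look-back period.
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    # Assumption: with no losses in the window, RSI is taken to be 100.
    if avg_loss == 0:
        return 100.0
    # Step 3: relative strength.
    rs = avg_gain / avg_loss
    # Step 4: standard RSI mapping (assumed).
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical prices alternating up and down, for illustration only.
example_prices = [10.0, 11.0] * 7 + [10.0]
example_rsi = rsi(example_prices)
```

A value above 70 would then be read as an overbought
signal and a value below 30 as an oversold signal.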
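$$\Delta P_t = P_t - P_{t-1} \tag{6}$$
$$AG = \frac{\sum_{\Delta P_t > 0} \Delta P_t}{n} \tag{7}$$
$$AL = \frac{\sum_{\Delta P_t < 0} |\Delta P_t|}{n} \tag{8}$$
$$RS = \frac{AG}{AL} \tag{9}$$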