Optimizing Sales Forecasting in e-Commerce with ARIMA and LSTM Models
Konstantinos N. Vavliakis (1,2), Andreas Siailis (1) and Andreas L. Symeonidis (1)
(1) Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, GR54124, Greece
(2) Pharm24.gr, Dafni Lakonias, GR23057, Greece
Keywords:
Sales Forecasting, e-Commerce, Neural Network, ARIMA, RNN.
Abstract:
Sales forecasting is the process of estimating future revenue by predicting the amount of products or services a sales unit will sell in the near future. Although significant advances have been made in sales forecasting techniques over the past decades, the problem is so diverse and multi-dimensional that high-accuracy predictions can be achieved only in a few cases. In this work, we propose a new hybrid model that is suitable for modeling both linear and non-linear sales trends by combining an ARIMA (autoregressive integrated moving average) model with an LSTM (long short-term memory) neural network. Since the primary focus of our work is predicting e-commerce sales, we incorporated the final sale price into our solution, as it greatly affects sales in highly competitive and price-sensitive environments such as e-commerce. We compare the proposed solution against three competing solutions using a dataset from a real-life e-commerce store, and we show that our solution outperforms all three competing models.
1 INTRODUCTION
Sales forecasting is the process that enables a business to estimate future sales. Inventory planning, production scheduling, cash flow planning, the alignment of sales quotas and revenue expectations, and other commercial decisions all depend on the precision of forecasts. Sales forecasting adds value across an organization; for retail businesses in particular, accurate demand forecasting is crucial to profitability. Accurate sales forecasting is of paramount importance to e-commerce businesses (Qi et al., 2019), as e-commerce sales are known to suffer from increased volatility, unpredictability, and sudden spikes or drops, due to abrupt changes in the various revenue source channels, such as organic traffic, paid media, and social buzz.
To produce sales forecasts, a broad range of information must be considered, such as historical trends, pricing, customer data, promotions, selling channels, and product changes. Moreover, one must successfully anticipate market trends, monitor competitors, and consider other business plans. Typically, sales go through three long-term stages, namely growth, stability, and decline (Day, 1981), while in the short term they are affected by price, promotions, season, and online ranking. Especially in e-commerce environments, sales fluctuations are sudden, sharp, and hard to predict unless all underlying information is available. Thus, even though sales may show a linear increasing or decreasing trend over a specific period, certain phases may exhibit non-linear fluctuations because of various potential uncertainties (Li et al., 2018).
Various techniques can be used for forecasting, such as qualitative techniques, time series analysis and projection, and causal models (Chambers et al., 1971). The first uses qualitative data, such as expert opinion and information about special events, and may or may not consider the past. The second, on the other hand, focuses entirely on patterns and pattern changes, and thus relies solely on historical data. The third uses highly refined and specific information about relationships between system elements and is powerful enough to take special events formally into account. As with time series analysis and projection techniques, the past is important to causal models. Selecting the appropriate forecasting technique requires evaluating various parameters (Armstrong, 2009), such as accuracy, convenience, market popularity, applications, data required, and cost of forecasting.
Towards improving sales forecasting, various top-down and bottom-up techniques have been proposed in the past (Soto-Ferrari et al., 2019). Top-down sales forecasting starts by identifying the total addressable market for each business segment, taking a higher-level view of the business. At the opposite end of the spectrum is bottom-up sales forecasting, which starts with the product instead of the market and with unit sales instead of market share. Bottom-up forecasting, as well as sales forecasting by product, is usually reserved for more mature businesses.
Vavliakis, K., Siailis, A. and Symeonidis, A.
Optimizing Sales Forecasting in e-Commerce with ARIMA and LSTM Models.
DOI: 10.5220/0010659500003058
In Proceedings of the 17th International Conference on Web Information Systems and Technologies (WEBIST 2021), pages 299-306
ISBN: 978-989-758-536-4; ISSN: 2184-3252
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
The ultimate goal of predictive analytics for sales forecasting is to fully automate the forecasting process and enable continuous forecasting with real-time data. This is done by capturing and digitizing human expertise, essentially teaching a computer system to “think” like a human sales planner and to model both linear and non-linear relationships. Towards this end, various machine learning techniques have been proposed, including statistical methods, time series analysis, neural networks, and random forests.
In this work, we propose an augmented hybrid model that handles both linear and non-linear relationships to solve the problem of automatic product sales forecasting. The proposed model predicts the future number of sales from a univariate sales time series by integrating an ARIMA model with a state-of-the-art neural network. In addition, the final retail price of the product is used as an input to the neural network, which improves the accuracy of the predictions, since the e-commerce market is usually very price-sensitive and discounts and final prices greatly affect sales.
The remainder of this paper is structured as follows. Section 2 discusses related work on sales forecasting, while Section 3 presents the proposed forecasting model. Section 4 presents the results of the evaluation process. Finally, Section 5 summarizes the work done, discusses future work, and concludes the paper.
2 RELATED WORK
Although most sales forecasting techniques are univariate methods that produce forecasts considering only the historical sales data of a single product, much more information can be used to improve forecasting models. Apart from historical trends, such as sales from previous years, extra information can be utilized, such as pricing, customer data, promotion activity, sales channel differentiation, and product changes, as well as market trends, competitor analysis, and future business plans. Towards improving sales forecasting, various top-down and bottom-up techniques have been proposed in the past.
Statistical sales forecasting models such as ARIMA (Box and Pierce, 1970) are among the most traditional and commonly used forecasting methodologies. ARIMA models are a class of statistical models for analyzing and forecasting time series data that have been widely used for sales forecasting. Ramos et al. (2015) compared the forecasting performance of state space models and ARIMA models through a case study of retail sales of five different categories of women’s footwear. Their results showed that, when an automatic algorithm is used, the overall out-of-sample forecasting performance of state space and ARIMA models, evaluated via RMSE, MAE, and MAPE (Chai and Draxler, 2014), is quite similar for both one-step and multi-step forecasts. Ramos et al. (2015) also concluded that state space and ARIMA models produce coverage probabilities that are close to the nominal rates for both one-step and multi-step forecasts. Moreover, ARIMA models have been combined with nonlinear autoregressive neural networks (ARIMA-NARNN) for forecasting e-commerce sales (Li et al., 2018). This work showed that the ARIMA-NARNN model, which combines the linear fitting of ARIMA with the non-linear mapping of NARNN, achieves better prediction performance than either the ARIMA or the NARNN method alone.
Artificial neural networks (ANNs) have also been widely used for forecasting. Doganis et al. (2006) presented a complete framework for developing nonlinear time series sales forecasting models, combining two artificial intelligence technologies, namely the radial basis function (RBF) neural network architecture and a specially designed genetic algorithm (GA). Situations where large quantities of related time series are available have also been investigated (Bandara et al., 2019), and the results showed that conditioning the forecast of an individual time series on the past behavior of similar, related time series can be beneficial. Bandara et al. (2019) attempted to incorporate the product assortment hierarchy of an e-commerce platform containing large numbers of related products into a unified model. They trained a Long Short-Term Memory network (Hochreiter and Schmidhuber, 1997) that exploited the non-linear demand relationships available in an e-commerce product assortment hierarchy. They also proposed a systematic pre-processing framework to overcome the challenges of the e-commerce business. Finally, they introduced several product grouping strategies to supplement the LSTM learning schemes in situations where the sales patterns in a product portfolio were disparate.
A novel neural network, the extreme learning machine (ELM), has also been investigated (Sun et al., 2008) in order to find the relationship between sales amount and significant factors that affect demand (such as design factors). Sun et al. (2008) used real data from a fashion retailer to demonstrate that the proposed methods outperform several sales forecasting methods based on backpropagation neural networks.
Although ARIMA has been one of the most popular linear models in time series forecasting over the past three decades, recent research on forecasting with artificial neural networks (ANNs) suggests that ANNs can be a promising alternative to traditional linear methods. Consequently, ARIMA models and ANNs are often compared, with mixed conclusions regarding their relative forecasting performance (Zhang, 2003). Since studies conflict on whether neural networks are superior to ARIMA models, hybrid methods have also been proposed.
Zhang (2003) proposed a hybrid methodology that combines ARIMA and ANN models to take advantage of their unique strengths in linear and nonlinear modeling, respectively. Experimental results with real data sets indicated that the combined model can be an effective way to improve on the forecasting accuracy achieved by either model used separately. Khandelwal et al. (2015) proposed a different hybrid forecasting method that applies ARIMA and ANN separately to model the linear and nonlinear components, respectively, after first decomposing the series into low- and high-frequency signals through the discrete wavelet transform. Their empirical results with four real-world time series demonstrated that the method yielded better forecasts than ARIMA, ANN, and Zhang’s hybrid model (Zhang, 2003).
Other techniques, such as multivariate methods, have also been used. Fan et al. (2017) used online reviews and a sentiment analysis method, the Naive Bayes algorithm, to extract a sentiment index from the content of each online review and integrated it into the imitation coefficient of the Bass/Norton model to improve forecasting accuracy. Their computational results indicated that combining the Bass/Norton model with sentiment analysis yields higher forecasting accuracy than the standard Bass/Norton model and some other sales forecasting models. Lu et al. (2012) used multivariate adaptive regression splines (MARS), a nonlinear and nonparametric regression methodology, to construct sales forecasting models for computer wholesalers. Their experimental results showed that the MARS model outperforms backpropagation neural networks, a support vector machine, a cerebellar model articulation controller neural network, an extreme learning machine, an ARIMA model, a multivariate linear regression model, and four two-stage forecasting schemes across various performance criteria. Guo et al. (2013) developed a multivariate intelligent decision-making (MID) model for the sales forecasting problem in the retail industry by integrating a data preparation and preprocessing module, a harmony search-wrapper-based variable selection (HWVS) module, and a multivariate intelligent forecaster (MIF) module. Their experimental results showed that the MID model generates significantly better forecasts, in a statistical sense, than machine learning models and generalized linear models.
Other machine learning models have also been employed frequently, as they are able to achieve better results on non-linear data. Recent research shows that deep learning models (e.g., recurrent neural networks) can provide higher prediction accuracy than other machine learning models due to their ability to persist information and identify temporal relationships. A study of deep learning-based models for forecasting future directions of car sales has also been presented (Preeti Saxena, 2020), in which ARIMA and Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) models are analyzed and used to forecast future directions. Their results showed that LSTM-RNN performs better than ARIMA on multivariate datasets.
Multi-disciplinary efforts have also been presented. Gurnani et al. (2017) evaluated and compared various machine learning models, namely ARIMA, Auto-Regressive Neural Network (ARNN), XGBoost (Chen and Guestrin, 2016), SVM (Hearst et al., 1998), hybrid models such as Hybrid ARIMA-ARNN, Hybrid ARIMA-XGBoost, and Hybrid ARIMA-SVM, and STL decomposition (Theodosiou, 2011) using ARIMA, Snaive, and XGBoost, to forecast the sales of a drug store. The accuracy of these models was measured with metrics such as MAE and RMSE. Initially, a linear model such as ARIMA was applied to forecast sales. However, ARIMA was not able to capture nonlinear patterns precisely, so nonlinear models such as Neural Network, XGBoost, and SVM were used. The nonlinear models performed better than ARIMA and gave lower RMSE. Then, to further optimize performance, composite models were designed using the hybrid technique and the decomposition technique. Hybrid ARIMA-ARNN, Hybrid ARIMA-XGBoost, and Hybrid ARIMA-SVM were used, and all of them performed better than their respective individual models. A further composite model was designed using STL decomposition, where the decomposed components, namely the seasonal, trend, and remainder components, were forecast by Snaive, ARIMA, and XGBoost. STL gave better results than the individual and hybrid models.
Clearly, considerable research effort has gone into analyzing and improving the dynamics of sales forecasting systems; however, most existing solutions focus on specific case studies or offline retailers. Although research focus has shifted to e-commerce in recent years, much progress is still needed to forecast sales accurately. Moreover, most of the proposed solutions focus on products and forecast their sales over time based on linear sales data, while the non-linear or hybrid approaches that have been proposed rely on one-dimensional data. In this work, we extend the current state of the art by proposing a) a hybrid sales forecasting model for dynamic pricing that optimally integrates ARIMA and LSTM models, and b) the integration of sales data with pricing information for improved forecasting results.
3 PROPOSED SOLUTION
Our proposed solution is a hybrid model that combines an ARIMA model for forecasting one-dimensional time series data with an LSTM neural network that models the non-linear residuals of the ARIMA model together with the final retail price (the retail price after discounts). Selling price is a major factor affecting sales, especially in highly competitive environments such as e-commerce; thus, our model captures special discounts, promotions, and sales periods by integrating the retail price after discounts. Moreover, trends and seasonality are captured by the ARIMA time series analysis component of the proposed system. Our proposed model extends the work of Zhang (2003) by a) using a state-of-the-art neural model (LSTM) and b) extending Zhang’s univariate approach into a multivariate one by using the average retail selling price.
A time series $y_t$ is composed of a linear component $L_t$ and a non-linear component $N_t$, according to Equation 1.
$$y_t = L_t + N_t \tag{1}$$
The ARIMA methodology models the $L_t$ component, and the LSTM neural network models what cannot be captured by the linear ARIMA model, that is, the $N_t$ component. We denote by $e_t$ the non-linear information at timestep $t$, so that $e_t = y_t - \hat{L}_t$, where $\hat{L}_t$ is the ARIMA prediction of $L_t$ at timestep $t$. The input of the LSTM model consists of the non-linear residuals that are not modeled by the ARIMA model. In addition, we add a second input, the average final retail price $p_t$ of each product at timestep $t$, as in Equation 2, where $f$ is the non-linear function modeled by the LSTM, with the ARIMA residuals and retail prices of the last $n$ timesteps as inputs.
$$\hat{e}_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}, p_{t-1}, p_{t-2}, \ldots, p_{t-n}) \tag{2}$$
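The following minimal Python sketch illustrates Equations 1 and 2 under stated assumptions: the ARIMA fitted values $\hat{L}_t$ are assumed to be available as a plain list, and the helper names (`arima_residuals`, `make_lstm_samples`) and toy numbers are hypothetical. It computes $e_t = y_t - \hat{L}_t$ and arranges the lagged residuals and prices into the input/target pairs an LSTM could be trained on; the exact input layout used by the authors is not specified in the text.

```python
from typing import List, Tuple


def arima_residuals(actual: List[float], arima_fitted: List[float]) -> List[float]:
    """Non-linear component left over by ARIMA: e_t = y_t - L_hat_t."""
    if len(actual) != len(arima_fitted):
        raise ValueError("series must have the same length")
    return [y - l for y, l in zip(actual, arima_fitted)]


def make_lstm_samples(
    residuals: List[float], prices: List[float], n_lags: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (input, target) pairs for Equation 2.

    Each input is a window over the previous n_lags timesteps, ordered from
    oldest to newest, where every timestep holds the feature pair
    [e_{t-k}, p_{t-k}]; the target is the residual e_t.
    """
    inputs, targets = [], []
    for t in range(n_lags, len(residuals)):
        window = [[residuals[k], prices[k]] for k in range(t - n_lags, t)]
        inputs.append(window)
        targets.append(residuals[t])
    return inputs, targets


# Hypothetical toy monthly data: quantities, ARIMA fit, average prices.
sales = [19.0, 6.0, 25.0, 19.0, 25.0]
fitted = [10.0, 12.0, 11.0, 15.0, 19.0]
prices = [9.5, 9.5, 8.9, 9.5, 9.0]

e = arima_residuals(sales, fitted)            # [9.0, -6.0, 14.0, 4.0, 6.0]
X, y = make_lstm_samples(e, prices, n_lags=2)
print(X[0], y[0])                             # [[9.0, 9.5], [-6.0, 9.5]] 14.0
```

Each input window has shape (n_lags, 2), i.e., the (timesteps, features) layout that recurrent layers such as the Keras `LSTM` layer consume once the lists are converted to an array with a leading sample dimension. The window is ordered oldest to newest, the reverse of the listing order in Equation 2, but it contains the same terms.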
Figure 1 depicts the system architecture of the proposed solution. The pre-processing phase includes data cleaning, sorting, and indexing based on the date sold. Since ARIMA models are suitable only for one-dimensional time series analysis, we use only the quantity sold over $n$ timesteps as input. The ARIMA modeling function is given in Equation 3.
$$\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} \tag{3}$$
where $y_{t-1}, \ldots, y_{t-p}$ are the quantities sold in the $p$ previous timesteps (the autoregressive terms) and $e_{t-1}, \ldots, e_{t-q}$ are the forecast errors of the $q$ previous timesteps (the moving average terms), which account for random shocks not explained by past sales. The coefficients $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are the trained autoregressive and moving average parameters, respectively. Before Equation 3 is applied, the series is differenced $d$ times (the "integrated" part of ARIMA).
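To make Equation 3 and the role of $d$ concrete, the sketch below evaluates a single one-step ARIMA forecast from coefficients that are assumed to be already fitted; the coefficient values, past errors, and function names are made up for illustration. The series is differenced $d$ times, Equation 3 is applied to the differenced values, and, for $d = 1$, the forecast is returned to the original scale by adding back the last observation. Fitting the coefficients themselves (e.g., by maximum likelihood in a statistics library) is outside the scope of this sketch.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int) -> List[float]:
    """Apply first-order differencing d times (the 'I' in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_one_step(
    mu: float,
    phi: Sequence[float],
    theta: Sequence[float],
    past_values: Sequence[float],
    past_errors: Sequence[float],
) -> float:
    """Equation 3: mu + sum(phi_i * y_{t-i}) + sum(theta_j * e_{t-j}).

    past_values and past_errors are ordered newest first, i.e.
    past_values[0] is y_{t-1} and past_errors[0] is e_{t-1}.
    """
    ar = sum(c * v for c, v in zip(phi, past_values))
    ma = sum(c * err for c, err in zip(theta, past_errors))
    return mu + ar + ma


# Hypothetical ARIMA(2, 1, 1) coefficients and data.
sales = [10.0, 12.0, 11.0, 15.0]
diffed = difference(sales, d=1)                  # [2.0, -1.0, 4.0]
newest_first = diffed[::-1]                      # y'_{t-1}, y'_{t-2}, ...
step = arma_one_step(0.5, [0.5, 0.25], [0.5], newest_first, [1.0])
forecast = sales[-1] + step                      # undo the single difference
print(step, forecast)                            # 2.75 17.75
```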
The values of (p, d, q) that lead to the best results differ for each product; thus, (p, d, q) must be optimized to discover the values that yield the lowest MSE (mean squared error). After that, the fitted model is used to compute the residuals, i.e., the differences between the actual and predicted values (Residuals = Actual - Prediction). Two factors indicate a good prediction: a) the residuals are uncorrelated, so there is no relation between them that could be used to improve the prediction results, and b) the mean of the residuals is close to zero, so the predictions are not systematically biased.
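The two residual checks can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors' procedure: it computes the residual mean and the sample autocorrelation at a given lag, whereas a formal check would typically use a portmanteau test such as Ljung-Box (available, for example, as `acorr_ljungbox` in `statsmodels.stats.diagnostic`). The toy residuals alternate in sign, so their mean is zero but their lag-1 autocorrelation is strongly negative, i.e., they would fail check a).

```python
from statistics import fmean
from typing import Sequence


def residual_mean(residuals: Sequence[float]) -> float:
    """Check b): should be close to zero for unbiased predictions."""
    return fmean(residuals)


def autocorrelation(residuals: Sequence[float], lag: int) -> float:
    """Check a): sample autocorrelation at `lag`; values near 0 suggest no linear structure is left."""
    m = fmean(residuals)
    dev = [r - m for r in residuals]
    denom = sum(x * x for x in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum(dev[i] * dev[i + lag] for i in range(len(dev) - lag))
    return num / denom


res = [1.0, -1.0, 1.0, -1.0]
print(residual_mean(res), autocorrelation(res, 1))   # 0.0 -0.75
```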
After the ARIMA model has been fitted, the residuals and the final price are normalized using the min-max scaling technique (Equation 4) and fed to the LSTM network.
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{4}$$
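A minimal sketch of Equation 4 in Python, with the inverse transform needed to bring scaled LSTM outputs back to sales units. We assume that each feature (residuals and prices) is scaled with its own minimum and maximum and that these are computed on the training portion only; the text does not state either detail. The function names are hypothetical; scikit-learn's `MinMaxScaler` provides the same transform through `fit`, `transform`, and `inverse_transform`.

```python
from typing import List, Sequence, Tuple


def fit_min_max(values: Sequence[float]) -> Tuple[float, float]:
    """Return (x_min, x_max); fit on training data only to avoid look-ahead."""
    return min(values), max(values)


def min_max_scale(values: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Equation 4: x' = (x - x_min) / (x_max - x_min)."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("a constant series cannot be min-max scaled")
    return [(x - x_min) / span for x in values]


def min_max_inverse(scaled: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Map scaled values (e.g. LSTM outputs) back to the original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]


residuals = [-6.0, 4.0, 14.0]
lo, hi = fit_min_max(residuals)
scaled = min_max_scale(residuals, lo, hi)
print(scaled, min_max_inverse(scaled, lo, hi))  # [0.0, 0.5, 1.0] [-6.0, 4.0, 14.0]
```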
Next, we determine the optimal number of time lags and hidden neurons, together with numerous other parameters such as nEpochs, nSamples, batchSize, learningRate, the loss function, and the activation function, using a grid-search technique over various parameter values, after which the LSTM network is ready for our predictions. Since the LSTM network estimates the deviation between the real sales quantity and the value predicted by the ARIMA model, the final sales prediction is given by Equation 5.
$$\text{FinalPrediction} = \text{Prediction}_{\text{ARIMA}} + \text{Prediction}_{\text{LSTM}} \tag{5}$$
Figure 1: Architectural diagram of the proposed solution.
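Equation 5 can be sketched as follows, assuming (hypothetically) that the LSTM outputs residual forecasts in the min-max scaled space of Equation 4, so they must be mapped back with the residuals' training minimum and maximum before being added to the ARIMA forecast. The function name and numbers are illustrative only.

```python
from typing import List, Sequence


def hybrid_forecast(
    arima_forecast: Sequence[float],
    lstm_scaled_residuals: Sequence[float],
    res_min: float,
    res_max: float,
) -> List[float]:
    """Equation 5: final prediction = ARIMA prediction + LSTM residual prediction."""
    if len(arima_forecast) != len(lstm_scaled_residuals):
        raise ValueError("forecast horizons must match")
    span = res_max - res_min
    final = []
    for linear_part, scaled in zip(arima_forecast, lstm_scaled_residuals):
        residual = scaled * span + res_min  # undo min-max scaling (Equation 4)
        final.append(linear_part + residual)
    return final


print(hybrid_forecast([15.0, 18.5], [0.5, 0.25], res_min=-6.0, res_max=14.0))
# [19.0, 17.5]
```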
Table 1: Example results of the grid search for the optimal (p, d, q) values.
(p, d, q) MSE (p, d, q) MSE
(1,0,0) 86.152 (4,0,0) 107.509
(1,0,1) 86.950 (4,0,1) 96.286
(1,1,0) 51.191 (4,1,0) 91.419
(2,0,0) 95.040 (4,1,1) 89.248
(2,1,0) 53.616 (4,1,2) 110.837
(3,0,0) 106.914 (5,0,0) 104.366
(3,1,0) 71.840 (5,1,0) 82.534
(3,1,1) 90.332 (5,1,1) 83.516
(3,1,2) 92.695 (5,1,2) 84.089
4 EVALUATION
4.1 Evaluation Data
For evaluation, we used an anonymized dataset from the Greek online pharmacy www.pharm24.gr. Pharm24.gr is a well-known online pharmacy in Greece with a few hundred thousand visitors per month. Although considerably smaller than the global e-commerce giants, Pharm24.gr, like many other small and medium-sized e-commerce retailers, has enough traffic and revenue to justify some research and development for optimizing sales predictions, provided the applied methods use limited resources. Our dataset contained sales data for 23,432 products, spanning six years and 1,418,480 order lines. For each product, we used the quantities sold per month and the average retail price per month.
Pre-processing is required to convert the data into the appropriate format for the ARIMA model. During pre-processing, the following steps are taken: a) sales data are ordered by date and time, b) data are reduced to one dimension, so extra information such as the average price and other product attributes is removed, and c) dates with zero sales are filled in so that all time series have equal length.
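Step c) can be sketched as below: order lines are summed per month, and months without sales are filled with zeros. We assume the input is a list of hypothetical (year, month, quantity) tuples and that gap-filling spans each product's first to last sale; the text does not specify the date range, and padding all products to a common range would work the same way with fixed start and end months.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def monthly_quantities(order_lines: Iterable[Tuple[int, int, float]]) -> List[float]:
    """Sum (year, month, quantity) order lines into a gap-free monthly series.

    Months between the first and last sale that have no order lines are
    filled with 0 so that every month is represented.
    """
    totals = defaultdict(float)
    for year, month, qty in order_lines:
        totals[(year, month)] += qty
    if not totals:
        return []
    (y, m), end = min(totals), max(totals)
    series = []
    while (y, m) <= end:
        series.append(totals.get((y, m), 0.0))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)  # advance one month
    return series


lines = [(2020, 11, 3), (2021, 2, 2), (2020, 11, 1), (2020, 12, 5)]
print(monthly_quantities(lines))   # [4.0, 5.0, 0.0, 2.0]
```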
4.2 Evaluation Metrics
For evaluation, we used three metrics: a) Mean Squared Error (MSE), b) Root Mean Squared Error (RMSE), and c) Mean Absolute Error (MAE), according to Equations 6, 7, and 8, respectively, where $pred^{final}_i$ is the final prediction value, calculated by combining the results of the ARIMA model and the LSTM network for timeframe $i$, $actual_i$ is the actual quantity sold in timeframe $i$, and $N$ is the number of forecasting timeframes.
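$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(actual_i - pred^{final}_i\right)^2 \tag{6}$$
$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(actual_i - pred^{final}_i\right)^2} \tag{7}$$
$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|actual_i - pred^{final}_i\right| \tag{8}$$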
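Equations 6 to 8 translate directly into Python; the sketch below uses only the standard library and toy numbers, not data from this study.

```python
import math
from typing import List, Sequence


def _errors(actual: Sequence[float], pred_final: Sequence[float]) -> List[float]:
    """Per-timeframe errors actual_i - pred_final_i."""
    if len(actual) != len(pred_final) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return [a - p for a, p in zip(actual, pred_final)]


def mse(actual: Sequence[float], pred_final: Sequence[float]) -> float:
    """Equation 6."""
    errs = _errors(actual, pred_final)
    return sum(err * err for err in errs) / len(errs)


def rmse(actual: Sequence[float], pred_final: Sequence[float]) -> float:
    """Equation 7: square root of the MSE."""
    return math.sqrt(mse(actual, pred_final))


def mae(actual: Sequence[float], pred_final: Sequence[float]) -> float:
    """Equation 8."""
    errs = _errors(actual, pred_final)
    return sum(abs(err) for err in errs) / len(errs)


actual = [3.0, 5.0, 8.0]
pred = [1.0, 9.0, 8.0]          # errors are 2, -4, 0 -> MSE = 20/3, MAE = 2.0
print(mse(actual, pred), rmse(actual, pred), mae(actual, pred))
```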
Table 2: Evaluation of our solution against the results of the ARIMA model, the LSTM network, and the Zhang hybrid model for one product.
Month | Actual Quantity Sold | ARIMA Prediction | LSTM Prediction | Zhang Prediction | Prediction of the Proposed Model
1 19 10.042152 9.404712 12.200262 13.566653
2 6 11.821504 8.800409 11.237559 14.334739
3 25 11.247403 10.811686 12.549486 15.364458
4 19 15.037372 10.380136 14.679003 21.467420
5 25 18.728056 13.610783 18.885234 21.285071
6 18 15.972515 18.334824 15.079795 18.489854
7 19 15.211719 15.827807 14.937994 19.230617
8 13 14.377784 16.447681 14.430467 16.174313
9 14 12.831789 13.638979 13.167471 14.847084
10 12 11.790429 12.937215 11.285167 14.855980
11 22 12.453913 11.919153 12.391926 15.389440
12 12 16.381802 14.070756 15.835681 18.792803
In order to optimize the (p, d, q) values, as discussed in Section 3, we applied a grid search (Lerman, 1980) over p = [0, 1, 2, 3, 4, 5], d = [0, 1], and q = [0, 1, 2], where p is the number of AR terms, d is the degree of differencing, and q is the number of MA terms. Table 1 shows some examples of our tests. The initial search values were carefully selected by a domain expert, and the subsequent grid search gave the optimal results for (p, d, q) = (2, 1, 2). Furthermore, we set the number of epochs to 1000 (nEpochs = 1000).
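The (p, d, q) search can be sketched as an exhaustive loop over the grid stated above. The scoring function is left as a parameter: in practice it would fit an ARIMA model for the given order (for example with `statsmodels.tsa.arima.model.ARIMA(train, order=order).fit()`) and return the validation MSE, but here a hypothetical toy score, whose minimum we place at the reported optimum (2, 1, 2), keeps the sketch self-contained. Orders whose fit fails are skipped.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Tuple

Order = Tuple[int, int, int]


def grid_search_order(
    score: Callable[[Order], float],
    p_values: Iterable[int] = range(6),   # p = [0, ..., 5]
    d_values: Iterable[int] = range(2),   # d = [0, 1]
    q_values: Iterable[int] = range(3),   # q = [0, 1, 2]
) -> Tuple[Order, float]:
    """Try every (p, d, q) combination and return the one with the lowest score."""
    best_order: Optional[Order] = None
    best_score = math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            current = score(order)
        except Exception:
            # Some orders fail to fit (e.g. non-convergence); skip them.
            continue
        if current < best_score:
            best_order, best_score = order, current
    if best_order is None:
        raise RuntimeError("no (p, d, q) order could be evaluated")
    return best_order, best_score


def toy_score(order: Order) -> float:
    """Hypothetical stand-in for 'fit ARIMA(order) and return validation MSE'."""
    p, d, q = order
    return float((p - 2) ** 2 + (d - 1) ** 2 + (q - 2) ** 2)


print(grid_search_order(toy_score))   # ((2, 1, 2), 0.0)
```

Because the per-product optimum differs, such a search would be repeated for each product's series.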
Next, we optimized the LSTM model. We considered two training methods, batch learning and online learning. Gradient descent training of neural networks can be done in either a batch or an on-line manner. Wilson and Martinez (2003) explained why batch training is almost always slower than on-line training, often by orders of magnitude, especially on large training sets. The main reason is the ability of on-line training to follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and to converge in fewer iterations through the training data. Thus, we decided to use online learning (batch size = 1).
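The difference between batch and on-line training can be illustrated with a toy problem. This sketch is a simplification, not the authors' setup: it fits a one-parameter linear model with plain gradient descent rather than an LSTM with Adam, on noiseless hypothetical data. With the same learning rate and number of epochs, on-line training (batch size 1) performs one update per sample instead of one per epoch, and here it ends up much closer to the true weight.

```python
from typing import List, Tuple


def train_linear(
    xs: List[float], ys: List[float], batch_size: int, lr: float = 0.01, epochs: int = 50
) -> Tuple[float, int]:
    """Fit y = w * x by gradient descent on the squared error.

    batch_size=len(xs) is full-batch training (one update per epoch);
    batch_size=1 is on-line training (one update per sample).
    Returns the learned weight and the total number of updates.
    """
    w, updates = 0.0, 0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx, by = xs[start:start + batch_size], ys[start:start + batch_size]
            # Gradient of the mean squared error over the current batch.
            grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= lr * grad
            updates += 1
    return w, updates


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]                    # noiseless data, true w = 2
w_online, n_online = train_linear(xs, ys, batch_size=1)
w_batch, n_batch = train_linear(xs, ys, batch_size=len(xs))
print(n_online, n_batch)                      # 200 50
print(abs(w_online - 2) < abs(w_batch - 2))   # True
```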
For optimizing the LSTM weights, we used Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments (Kingma and Ba, 2014), with the Keras (Ketkar, 2017) default values [learning rate = 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-07], and the rectified linear activation function (ReLU) (Agarap, 2019).
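For reference, the Adam update rule with the quoted default values can be sketched on a scalar problem. This follows Algorithm 1 of Kingma and Ba (2014); note that the Keras documentation describes its `epsilon` as the "epsilon hat" of the paper, so its numerical behavior may differ slightly. The function name and the toy objective $f(x) = x^2$ are assumptions for illustration.

```python
import math
from typing import Callable


def adam_minimize(
    grad: Callable[[float], float],
    x0: float,
    steps: int,
    learning_rate: float = 0.001,
    beta_1: float = 0.9,
    beta_2: float = 0.999,
    epsilon: float = 1e-07,
) -> float:
    """Minimize a scalar function with the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta_1 * m + (1 - beta_1) * g          # first-moment estimate
        v = beta_2 * v + (1 - beta_2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta_1 ** t)              # bias corrections
        v_hat = v / (1 - beta_2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Minimize f(x) = x**2 (gradient 2x) starting from x = 1.
print(adam_minimize(lambda x: 2 * x, x0=1.0, steps=1))  # close to 0.999
```

On the first step the bias-corrected moments make the update almost exactly equal to the learning rate, which is why the result is close to 0.999.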
4.3 Evaluation Results
Next, we compared the results of our solution against the results of a) the ARIMA model, b) the LSTM network, and c) Zhang’s hybrid model. Table 2 presents the results of the evaluation process for one product.
We performed the above experiment for 50 random products and then calculated MSE, RMSE, and MAE. The aggregated results are presented in Table 3. Even without the retail price, the proposed model achieved lower MSE and RMSE than the ARIMA, LSTM, and Zhang approaches, and lower MAE than ARIMA and LSTM, although its MAE was slightly higher than that of Zhang’s model (8.88266 vs 8.76454). With the retail price included, our results improved further and outperformed all three competing models in all metrics, achieving 5.82%, 13.12%, and 1.84% lower RMSE, 5.29%, 9.88%, and 0.39% lower MAE, and 11.44%, 23.67%, and 5.88% lower MSE than the ARIMA model, the LSTM network, and Zhang’s model, respectively.
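The reported percentages appear to be relative reductions of each error metric with respect to the competing model. Under that assumption, the following sketch reproduces the RMSE improvements from the Table 3 values; the function name is hypothetical.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# RMSE values from Table 3; 11.5222 is the proposed methodology with retail price.
rmse_table3 = {"ARIMA": 12.2340, "LSTM": 13.2629, "Zhang": 11.7378}
for model, value in rmse_table3.items():
    print(model, round(relative_improvement(value, 11.5222), 2))
# ARIMA 5.82
# LSTM 13.12
# Zhang 1.84
```

Applying the same formula to the MSE and MAE columns gives the other quoted figures up to rounding in the last digit.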
In our first set of experiments, we noticed that results were better for products with higher sales, which we attribute to the fact that the LSTM network requires a lot of data for proper training. Thus, we performed two more experiments in which, instead of randomly picking 50 products, we used the 10 best-selling products in the first case and the 10 worst-selling products (with a minimum of 50 items sold) in the second. For the best-selling products, the improvement of the proposed system over the baseline increased to 2.22% and 1.71% in terms of RMSE and MAE, respectively.
Finally, in order to test the adaptability of our solution, we selected 10 products with high seasonality (sunscreens). The results of all three experiments are presented in Table 4. In all three cases, the proposed solution outperformed the baseline; for the seasonal products, we achieved a 1.76% improvement in RMSE, a 0.92% improvement in MAE, and a 4.11% improvement in MSE compared with Zhang’s model.
Table 3: Evaluation results for 50 products.
Model | MSE | RMSE | MAE
LSTM | 540.76758 | 13.2629 | 9.68830
ARIMA | 466.05542 | 12.2340 | 9.21864
Baseline (Zhang) | 438.51756 | 11.7378 | 8.76454
Proposed Methodology | 415.44138 | 11.6794 | 8.88266
Proposed Methodology with Retail Price | 412.74034 | 11.5222 | 8.73078
Table 4: Evaluation results for best sellers, worst sellers, and seasonal products when compared with the baseline (Zhang’s model).
Metric | Improvement for Best Sellers | Improvement for Worst Sellers | Improvement for Seasonal Products
MSE | 5.81% | 0.27% | 4.11%
RMSE | 2.22% | 0.15% | 1.76%
MAE | 1.71% | 0.3% | 0.92%
5 CONCLUSIONS & FUTURE WORK
In this paper, we introduced a novel hybrid sales forecasting model. We combined an ARIMA model, which is suitable for linear data, with an LSTM network that analyzes the non-linear residuals of the ARIMA model. We also added an extra feature to our model, the average retail sales price, which naturally has a significant effect on sales volume, especially in highly price-sensitive environments such as e-commerce.
We compared the proposed solution with three other methods: a) the ARIMA model, b) the LSTM model, and c) Zhang’s model. Our solution outperformed all three models, achieving lower RMSE, MAE, and MSE than the ARIMA model, LSTM, and Zhang’s model. We observed that our model works better when plenty of data is available (due to the LSTM network), so we performed another experiment with the best-selling products, in which the improvement of the proposed system over Zhang’s model increased to 2.22% in terms of RMSE. Finally, we tested our system with ten seasonal products, where we achieved a 1.76% improvement in RMSE compared with Zhang’s model.
Our future work includes further testing the proposed algorithm in real-world scenarios and improving our simulation framework in terms of available configurations and extra features (e.g., out-of-stock periods, web traffic sources, customer profiles, and one-time promotional products). Finally, we plan to compare the proposed system with more sales forecasting models and on other available datasets.
ACKNOWLEDGEMENTS
This research is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning” in the context of the project “Reinforcement of Postdoctoral Researchers - 2nd Cycle” (MIS-5033021), implemented by the State Scholarships Foundation (IKY). The authors would also like to thank Kostas Nikolaros of Pharm24.gr for his valuable feedback regarding sales trends.
REFERENCES
Agarap, A. F. (2019). Deep learning using rectified linear units (relu).
Armstrong, J. (2009). Selecting forecasting methods. SSRN Electronic Journal.
Bandara, K., Shi, P., Bergmeir, C., Hewamalage, H., Tran, Q., and Seaman, B. (2019). Sales demand forecast in e-commerce using a long short-term memory neural network methodology. In Gedeon, T., Wong, K. W., and Lee, M., editors, Neural Information Processing, pages 462–474, Cham. Springer International Publishing.
Box, G. E. P. and Pierce, D. A. (1970). Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association, 65(332):1509–1526.
Chai, T. and Draxler, R. (2014). Root mean square error (rmse) or mean absolute error (mae)? – arguments against avoiding rmse in the literature. Geoscientific Model Development, 7:1247–1250.
Chambers, J. C., Mullick, S. K., and Smith, D. D. (1971). How to choose the right forecasting technique. Harvard Business Review.
Chen, T. and Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA. Association for Computing Machinery.
Day, G. S. (1981). The product life cycle: Analysis and applications issues. Journal of Marketing, 45(4):60–67.
Doganis, P., Alexandridis, A., Patrinos, P., and Sarimveis, H. (2006). Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing. Journal of Food Engineering, 75(2):196–204.
Hearst, M., Dumais, S., Osuna, E., Platt, J., and Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their Applications, 13(4):18–28.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Ketkar, N. (2017). Introduction to Keras, pages 97–111. Apress, Berkeley, CA.
Khandelwal, I., Adhikari, R., and Verma, G. (2015). Time series forecasting using hybrid arima and ann models based on dwt decomposition. Procedia Computer Science, 48:173–179. International Conference on Computer, Communication and Convergence (ICCC 2015).
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. International Conference on Learning Representations.
Lerman, P. M. (1980). Fitting segmented regression models by grid search. Journal of the Royal Statistical Society: Series C (Applied Statistics), 29(1):77–84.
Li, M., Ji, S., and Liu, G. (2018). Forecasting of Chinese E-Commerce Sales: An Empirical Comparison of ARIMA, Nonlinear Autoregressive Neural Network, and a Combined ARIMA-NARNN Model. Mathematical Problems in Engineering, 2018:1–12.
Preeti Saxena, Pritika Bahad, R. K. (2020). Long short-term memory-rnn based model for multivariate car sales forecasting. International Journal of Advanced Science and Technology, 29(04):4645 –.
Qi, Y., Li, C., Deng, H., Cai, M., Qi, Y., and Deng, Y. (2019). A deep neural framework for sales forecasting in e-commerce. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, pages 299–308, New York, NY, USA. Association for Computing Machinery.
Ramos, P., Santos, N., and Rebelo, R. (2015). Performance of state space and arima models for consumer retail sales forecasting. Robotics and Computer-Integrated Manufacturing, 34:151–163.
Soto-Ferrari, M., Chams-Anturi, O., Escorcia-Caballero, J. P., Hussain, N., and Khan, M. (2019). Evaluation of bottom-up and top-down strategies for aggregated forecasts: State space models and arima applications. In Paternina-Arboleda, C. and Voß, S., editors, Computational Logistics, pages 413–427, Cham. Springer International Publishing.
Sun, Z.-L., Choi, T.-M., Au, K.-F., and Yu, Y. (2008). Sales forecasting using extreme learning machine with applications in fashion retailing. Decision Support Systems, 46(1):411–419.
Theodosiou, M. (2011). Forecasting monthly and quarterly time series using stl decomposition. International Journal of Forecasting, 27(4):1178–1195.
Zhang, G. (2003). Time series forecasting using a hybrid arima and neural network model. Neurocomputing, 50:159–175.