Impact of Deep Learning Libraries on Online Adaptive Lightweight

Time Series Anomaly Detection

Ming-Chang Lee

1 a

and Jia-Chun Lin

2 b

Department of Computer Science, Electrical Engineering and Mathematical Sciences, Høgskulen p

a Vestlandet (HVL),

Bergen, Norway

Department of Information Security and Communication Technology, Norwegian University of Science and Technology

(NTNU), Gjøvik, Norway

Keywords:

Time Series, Univariate Time Series, Anomaly Detection, Online Model Training, Unsupervised Learning,

TensorFlow, Keras, PyTorch, Deeplearning4j.

Abstract:

Providing online adaptive lightweight time series anomaly detection without human intervention and domain

knowledge is highly valuable. Several such anomaly detection approaches have been introduced in the past

years, but all of them were only implemented in one deep learning library. With the development of deep learn-

ing libraries, it is unclear how different deep learning libraries impact these anomaly detection approaches

since there is no such evaluation available. Randomly choosing a deep learning library to implement an

anomaly detection approach might not be able to show the true performance of the approach. It might also

mislead users in believing one approach is better than another. Therefore, in this paper, we investigate the im-

pact of deep learning libraries on online adaptive lightweight time series anomaly detection by implementing

two state-of-the-art anomaly detection approaches in three well-known deep learning libraries and evaluating

how these two approaches are individually affected by the three deep learning libraries. A series of experi-

ments based on four real-world open-source time series datasets were conducted. The results provide a good

reference to select an appropriate deep learning library for online adaptive lightweight anomaly detection.

1 INTRODUCTION

A time series refers to a sequence of data points in-

dexed in time order, and it is a collection of observa-

tions obtained via repeated measurements over time

(Ahmed et al., 2016). Examples of time series in-

clude stock prices, retail sales, electricity consump-

tion, temperatures, humidity, CO2, blood pressures,

heart rates, etc. Due to the increasing prevalence of

the Internet of Things (IoT), more and more different

time series are continuously generated by diverse IoT

sensors and devices over time. Analyzing time se-

ries is valuable to businesses and organizations since

it gives insight into what has happened and identiﬁes

trends and seasonal variances to aid in the forecasting

of future events. It also enables businesses and or-

ganizations to take appropriate policies or make bet-

ter decisions (Kieu et al., 2018; Yatish and Swamy,

2020).

Time series anomaly detection is an analysis task

https://orcid.org/0000-0003-2484-4366

https://orcid.org/0000-0003-3374-8536

focusing on detecting anomalous or abnormal data

points in time series, and it has been widely used

in various applications ranging from cloud systems

(Deka et al., 2022), smart grids (Zhang et al., 2021),

healthcare (Pereira and Silveira, 2019) to agriculture

(Moso et al., 2021). Many time series anomaly de-

tection approaches have been introduced in the last

decade. Some were designed for univariate time se-

ries where there is only one time-dependent variable,

and the other approaches were designed for multivari-

ate time series that consists of more than one time-

dependent variables. In this paper, we focus on the

studies for univariate time series. To be more speciﬁc,

we focus on univariate time series anomaly detection

approaches that possess the following features: Unsu-

pervised learning, online model training, adaptability,

and lightweight since these features decide whether

an approach is practical or not. (Bl

azquez-Garc

ıa

et al., 2021).

Unsupervised learning refers to machine learning

models that have a self-learning ability to draw in-

ference from a dataset containing a small minority

106

Lee, M. and Lin, J.

Impact of Deep Learning Libraries on Online Adaptive Lightweight Time Series Anomaly Detection.

DOI: 10.5220/0012082900003538

In Proceedings of the 18th International Conference on Software Technologies (ICSOFT 2023), pages 106-116

ISBN: 978-989-758-665-1; ISSN: 2184-2833

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

of abnormal data without any label. Since most of

real-world time series data do not have any label, it is

desirable to have an unsupervised anomaly detection

approach. Conventional machine learning models are

usually trained with a pre-collected dataset in an of-

ﬂine manner. Once the models are trained, they are

used for inference without any change. Hence, they

cannot reﬂect unseen situations or adapt to changes

on time series (Eom et al., 2015). Unlike ofﬂine

model training, online model training enables a ma-

chine learning model to be trained on the ﬂy, imply-

ing that the model can adapt to changes in the pat-

tern of the time series (i.e., adaptability). This fea-

ture is getting more and more popular, and it has

been provided by some systems or approaches such

as (Lee et al., 2020b; Eom et al., 2015; Chi et al.,

2021). Finally, lightweight means that an anomaly

detection approach neither has a complex network

structure/design nor requires excessive computation

resources such as General-Purpose Graphics process-

ing units (GPGPUs) or high-performance computers.

According to our survey, only few state-of-the-art

approaches satisfy all the above-mentioned charac-

teristics, such as RePAD (Lee et al., 2020b), ReRe

(Lee et al., 2020a), SALAD (Lee et al., 2021b), and

RePAD2 (Lee and Lin, 2023). However, all of them

were only implemented in one speciﬁc deep learning

library. In fact, a number of deep learning (DL) li-

braries have been introduced and widely used, such

as TensorFlow (Abadi et al., 2016), PyTorch (Paszke

et al., 2019), and Deeplearning4j (Deeplearning4j,

2023). They have a common goal to facilitate the

complicated data analysis process and offer integrated

environments on top of standard programming lan-

guages (Nguyen et al., 2019). However, it is unclear

the impact of these DL libraries on online adaptive

lightweight anomaly detection.

Therefore, this paper focuses on investigating

how different DL libraries affect online adaptive

lightweight time series anomaly detection by imple-

menting two state-of-the-art anomaly detection ap-

proaches in three widely-used deep learning libraries.

It is worth noting that our focus is not to compare dif-

ferent time series anomaly detection approaches re-

garding their detection accuracy or response time. In-

stead, we emphasize on investigating how these ap-

proaches are individually affected by different DL li-

braries.

A series of experiments based on open-source

time series datasets were performed. The results show

that DL libraries have a great impact on not only

anomaly detection accuracy but also response time.

Therefore, it is important to take the selection of DL

libraries into consideration when one would like to

design and implement an online adaptive lightweight

time series anomaly detection approach.

The rest of the paper is organized as follows:

Section 2 describes time series anomaly detection

approaches and DL libraries. Section 3 gives an

overview of the related work. Section 4 introduces

evaluation setup. Section 5 presents the evaluation

results. Section 6 concludes this paper and outlines

future work.

2 BACKGROUND

In this section, we introduce state-of-the-art anomaly

detection approaches for univariate time series and

some well-known DL libraries.

2.1 Anomaly Detection Approaches for

Univariate Time Series

Existing anomaly detection approaches for univariate

time series can be roughly classiﬁed into two cate-

gories: statistical based and machine learning based.

Statistical-based anomaly detection approaches at-

tempt to create a statistical model for normal time

series data and use this model to determine if a data

point is anomalous or not. Example approaches in-

clude AnomalyDetectionTs and AnomalyDetection-

Vec proposed by Twitter (Twitter, 2015), and Luminol

introduced by LinkedIn (LinkedIn, 2018). However,

statistical-based approaches might not perform well if

the data does not follow a known distribution (Alimo-

hammadi and Chen, 2022).

On the other hand, machine learning based ap-

proaches attempt to detect anomalies without assum-

ing a speciﬁc generative model based on the fact

that it is unnecessary to know the underlying pro-

cess of the data (Braei and Wagner, 2020). Green-

house (Lee et al., 2018) is a time series anomaly de-

tection algorithm based on Long Short-Term Mem-

ory (LSTM), which is a special recurrent neural net-

work suitable for long-term dependent tasks (Hochre-

iter and Schmidhuber, 1997). Greenhouse adopts a

Look-Back and Predict-Forward strategy to learn the

distribution of the training data. For a given time

point, a window of most recently observed data point

values are used to predict future data point values.

However, Greenhouse is not an online approach since

its LSTM model is trained with a pre-collected train-

ing data. Besides, it requires users to determine a

proper detection threshold.

RePAD (Lee et al., 2020b) is an online real-time

lightweight unsupervised time series anomaly detec-

tion approaches based on LSTM and the Look-Back

Impact of Deep Learning Libraries on Online Adaptive Lightweight Time Series Anomaly Detection

107

and Predict-Forward strategy. RePAD utilizes a sim-

ple LSTM network (with only one hidden layer and

ten hidden units) to train a LSTM model with short-

term historical data points, predict each upcoming

data point, and then decide if each data point is

anomalous based on a dynamically calculated detec-

tion threshold. Different from Greenhouse, RePAD

does not need to go through any ofﬂine training.

Instead, RePAD trains its LSTM model on the ﬂy.

RePAD will keep using the same LSTM model if the

model predicts well. When the prediction error of the

model is higher than or equal to a dynamically calcu-

lated detection threshold, RePAD will retrain another

new model with recent data points.

ReRe (Lee et al., 2020a) is an enhanced time se-

ries anomaly detection based on RePAD, and it was

designed to further reduce false positive rates. ReRe

utilizes two LSTM models to jointly detect anoma-

lous data points. One model works exactly like

RePAD, whereas the other model works similar to

RePAD but with a stricter detection threshold. Com-

pared with RePAD, ReRe requires more compute re-

sources due to the use of two LSTM models.

SALAD (Lee et al., 2021b) is another online self-

adaptive unsupervised time series anomaly detection

approach designed for time series with a recurrent

data pattern, and it is also based on RePAD. Different

from RePAD, SALAD consists of two phases. The

ﬁrst phase converts the target time series into a series

of average absolute relative error (AARE) values on

the ﬂy. The second phase predicts an AARE value for

every upcoming data point based on short-term his-

torical AARE values. If the difference between a cal-

culated AARE value and the corresponding forecast

AARE value is higher than a self-adaptive detection

threshold, the corresponding data point is considered

anomalous.

Ziu et al. (Niu et al., 2020) introduced LSTM-

based VAE-GAN, which stands for a Long Short-

Term Memory-based variational autoencoder gener-

ation adversarial networks. This method consists of

one ofﬂine training stage to learn the distribution of

normal time series, and one anomaly detection stage

to calculate anomaly score for each data point in the

target time series. This method jointly trains the en-

coder, the generator, and the discriminator to take ad-

vantage of the mapping ability of the encoder and the

discriminatory ability of the discriminator. However,

the method requires that the training data contains no

anomalies. Besides, the method is not an online ap-

proach since its detection model will not be retrained

or updated after the training stage, meaning that it is

not adaptive.

Ibrahim et al. (Ibrahim et al., 2022) proposed

a hybrid deep learning approach that combines one-

dimensional convolutional neural network with bidi-

rectional long short-term memory (BiLSTM) for

anomaly detection in univariate time series. However,

the approach requires ofﬂine training and consider-

able training time due to parameter tuning required

by the used hybrid approach.

2.2 Deep Learning Libraries

Over the last few years, machine learning has seen

signiﬁcant advances. Many different machine learn-

ing algorithms have been introduced to address dif-

ferent problems. In the meantime, many DL libraries

have been developed by academy, industry, and open-

source communities, attempting to provide a fair ab-

straction on the ground complex tasks with simple

functions that can be used as tools for solving larger

problems (Ketkar and Santana, 2017).

TensorFlow (Abadi et al., 2016) is a popular open-

source Python-based DL library created and main-

tained by Google. It uses dataﬂow graphs to represent

both the computation in an algorithm and the state on

which the algorithm operates. TensorFlow is designed

for large-scale distributed training and inference. It

can run on a single CPU system, GPUs, mobile de-

vices, and large-scale distributed systems. However,

its low-level application programming interface (API)

makes it difﬁcult to use (Nguyen et al., 2019). Be-

cause of this, TensorFlow is usually used in combi-

nation with Keras (Keras, 2023), which is a Python

wrapper library providing high-level, highly modular,

and user-friendly API.

CNTK (CNTK, 2023) stands for Cognitive

Toolkit, and it was introduced by Microsoft and writ-

ten in C++ programming language. It supports the

Open Neural Network Exchange (ONNX) format, al-

lowing easy model transformation from one DL li-

brary to another one. As compared with TensorFlow,

CNTK is less popular (Nguyen et al., 2019). More-

over, the ofﬁcial website of CNTK shows that CNTK

is no longer actively developed.

PyTorch (Paszke et al., 2019) is an open-source

DL framework based on the Torch library. It aims to

provide an easy to use, extend, develop, and debug

framework. It is equipped with a high-performance

C++ runtime that developers can leverage for pro-

duction environments while avoiding inference via

Python (Ketkar and Santana, 2017). PyTorch sup-

ports tensor computation with strong GPU accelera-

tion and allows a network to change the way it be-

haves with small effort using dynamic computational

graphs. Similar to CNTK, it also supports the ONNX

format.

ICSOFT 2023 - 18th International Conference on Software Technologies

108

Deeplearning4j is an open source distributed deep

learning library released by a startup company called

Skymind in 2014 (Deeplearning4j, 2023)(Wang et al.,

2019). Deeplearning4j is written for java program-

ming language and java virtual machine (JVM). It is

ing library called ND4J, and it supports both CPUs

and GPUs. Deeplearning4j provides implementa-

tions of the restricted Boltzmann machine, deep be-

lief net, deep autoencoder, recurrent neural network,

word2vec, doc2vec, etc.

3 RELATED WORK

Nguyen et al. (Nguyen et al., 2019) conducted a

survey on several DL libraries. They also analyzed

strong points and weak points for each library. How-

ever, they did not conduct any experiments to com-

pare these DL libraries. Wang et al. (Wang et al.,

2019) compared several DL libraries in terms of

model design ability, interface property, deployment

ability, performance, framework design, and devel-

opment prospects by using some benchmarks. The

authors also made suggestions about how to choose

DL frameworks in different scenarios. Nevertheless,

their general evaluation and analysis are unable to

answer the speciﬁc question that this paper attempts

to answer, i.e., how DL libraries affect online adap-

tive lightweight time series anomaly detection ap-

proaches.

Kovalev et al. (Kovalev et al., 2016) evaluated the

training time, prediction time, and classiﬁcation ac-

curacy of a fully connected neural network (FCNN)

under ﬁve different DL libraries: Theano with Keras,

Torch, Caffe, Tensorﬂow, and Deeplearning4j. Ap-

parently, their results are not applicable to lightweight

anomaly detection approaches.

Zhang et al. (Zhang et al., 2018) evaluated the

performance of several state-of-the-art DL libraries,

including TensorFlow, Caffe2, MXNet, PyTorch and

TensorFlow Lite on different kinds of hardware, in-

cluding MacBook, FogNode, Jetson TX2, Raspberry

Pi, and Nexus 6P. The authors chose a large-scale

convolutional neural network (CNN) model called

AlexNet (Krizhevsky et al., 2017) and a small-scale

CNN model called SqueezeNet (Iandola et al., 2016),

and evaluated how each of them performs under dif-

ferent combination of hardware and DL libraries in

terms of latency, memory footprint, and energy con-

sumption. According to the evaluation results, there

is no single winner on every metric since each has

its own metric. Due to the fact that two used CNN

models are much complex than lightweight anomaly

detection approaches, their evaluation results and sug-

gestions may not be applicable.

Zahidi et al. (Zahidi et al., 2021) conducted an

analysis to compare different Python-based and Java-

based DL libraries and to see how they support dif-

ferent natural language processing (NLP) tasks. Due

to the difference between NLP tasks and time series

analysis, their results still cannot be applied to the

work of this paper.

Zhang et al. (Zhang et al., 2022) built a bench-

mark that includes six representative DL libraries

on mobile devices (TFLite, PyTorchMobile, ncnn,

MNN, Mace, and SNPE) and 15 DL models (10 of

them are for image classiﬁcation, 3 of them are for

object detection, 1 for semantic segmentation, and 1

for text classiﬁcation). The authors then performed a

series of experiments to evaluate the performance of

these DL libraries on the 15 DL models and different

mobile devices. According to their analysis and ob-

servation, there is no DL libraries that perform best

on all tested scenarios and that the impacts of DL

libraries may overwhelm DL algorithm design and

hardware capacity. Apparently, the target of our pa-

per is completely different from that of Zhang et al.’s

paper. Even though their results point out some use-

ful conclusions, their results cannot help us get a clear

answer about how different DL libraries affect online

adaptive lightweight anomaly detection.

4 EVALUATION SETUP

Based on the description in the Background sec-

tion, we chose RePAD and SALAD to be our target

anomaly detection approaches because both of them

possess all previously mentioned desirable features

(i.e., unsupervised learning, online model training,

adaptability, and lightweight). As for DL libraries, we

chose TensorFlow-Keras, PyTorch, and Deeplearn-

ing4j because they are popular and widely used. Re-

call both TensorFlow-Keras and PyTorch are based on

Python, it would be interesting to see how Deeplearn-

ing4j performs as compared with TensorFlow-Keras

and PyTorch. Here, the versions of TensorFlow-

Keras, PyTorch, and Deeplearning4j are 2.9.1, 1.13.1,

and 0.7-SNAPSHOT, respectively.

We implemented RePAD and SALAD in the three

DL libraries. Hence, there are six combinations as

shown in Table 1. RePAD-TFK refers to RePAD im-

plemented in TensorFlow-Keras, SALAD-PT refers

to SALAD implemented in PyTorch, and so on so

forth.

Impact of Deep Learning Libraries on Online Adaptive Lightweight Time Series Anomaly Detection

109

Table 1: The six combinations studied in this paper.

RePAD SALAD

TensorFlow-Keras RePAD-TFK SALAD-TFK

PyTorch RePAD-PT SALAD-PT

Deeplearning4j RePAD-DL4J SALAD-DL4J

4.1 Real-World Datasets

To evaluate the three RePAD combinations, two

real-world time series were used. One is called

ec2-cpu-utilization-825cc2 (CC2 for short), and the

other is called rds-cpu-utilization-e47b3b (B3B for

short). Both time series are provided by the Nu-

menta Anomaly Benchmark (NAB) (Lavin and Ah-

mad, 2015). CC2 contains two point anomalies and

one collective anomaly, whereas B3B contains one

point anomaly and one collective anomaly. Note that

a point anomaly is a single data point which is identi-

ﬁed as anomalous with respect to the rest of the time

series, whereas a collective anomaly is deﬁned as a se-

quence of data points which together form an anoma-

lous pattern (Schneider et al., 2021).

Since CC2 and B3B consist of only 4032 data

points, they are unable to show the long-term per-

formance of the three RePAD combinations. Hence,

we created two long time series called CC2-10 and

B3B-10 by individually duplicating CC2 and B3B ten

times. Table 2 lists their details. Figures 1 and 2 illus-

trate all data points in CC2-10 and B3B-10, respec-

tively. Each point anomaly is marked as a red circle,

whereas each collective anomaly is marked as a red

curve line.

Table 2: Two extended real-world time series used to eval-

uate RePAD-TFK, RePAD-PT, and RePAD-DL4J.

Name Number of data

points

Time

interval

Duration Number of anomalies

CC2-10 40,320 5 140 days

20 point and 10

collective anomalies

B3B-10 40,320 5 140 days

10 point and 10

collective anomalies

Figure 1: All data points on the CC2-10 time series. Each

anomaly is marked in red.

On the other hand, to evaluate the three SALAD

combinations, we selected another two real-world re-

Figure 2: All data points on the B3B-10 time series. Each

anomaly is marked in red.

current time series. One is Taipei Mass Rapid Transit

(TMRT for short) (Yeh et al., 2019), and the other is

New York City Taxi demand (NYC for short) from

the Numenta Anomaly Benchmark (Lavin and Ah-

mad, 2015). The former consists of 1260 data points,

whereas the latter consists of 10320 data points. Table

3 summarizes the details of TMRT and NYC. They

contain only collective anomalies.

Table 3: Two real-world time series used to evaluate

SALAD-TFK, SALAD-PT, and SALAD-DL4J.

Name Number of data

points

Interval Duration

Number of

anomalies

TMRT 1,260 1 hour 2016/02/01 00:00 to

2016/03/31 23:00

1 collective

anomaly

NYC 10,320 30 min 2014/07/01 00:00 to

2015/01/31 23:30

5 collective

anomalies

4.2 Hyperparameters, Parameters, and

Environment

To ensure a fair evaluation, the three RePAD combi-

nations were conﬁgured with the same hyperparam-

eters and parameters, as listed in Table 4, following

the setting used by RePAD (Lee et al., 2020b). Re-

call that RePAD utilizes the Look-Back and Predict-

Forward strategy to determine data size for online

model training and data size for prediction. In this

paper, we respectively set the Look-Back parameter

and the Predict-Forward parameter to 3 and 1 based

on the setting suggested by (Lee et al., 2021a). In

other words, the LSTM models used by RePAD-TFK,

RePAD-PT, and RePAD-DL4J will be always trained

with three historical data points, and the trained mod-

els will be used to predict the next upcoming data

point in the target time series.

In addition, RePAD-TFK, RePAD-PT, and

RePAD-DL4J inherited the simple LSTM structure

used by RePAD (Lee et al., 2020b), i.e., only one

hidden layer and ten hidden units. Note that Early

stopping (EarlyStopping, 2023) was not used to

automatically determine the number of epochs since

this technique is not ofﬁcially supported by PyTorch.

ICSOFT 2023 - 18th International Conference on Software Technologies

110

For fairness, the number of epochs was set to 50 for

the three RePAD combinations.

Table 4: The hyperparameter and parameter setting used by

RePAD-TFK, RePAD-PT, and RePAD-DL4J.

Hyperparameters/parameters Value

The Look-Back parameter 3

The Predict-Forward parameter 1

The number of hidden layers 1

The number of hidden units 10

The number of epochs 50

Learning rate 0.005

Activation function tanh

Random seed 140

Table 5: The hyperparameter and parameter setting used by

SALAD-TFK, SALAD-PT, and SALAD-DL4J.

Hyperparameters/parameters The conversion phase The detection phase

The Look-Back parameter

288 for NYC,

63 for TMRT

The Predict-Forward parameter 1 1

The number of hidden layers 1 1

The number of hidden units 10 10

The number of epochs 100 50

Learning rate 0.001 0.001

Activation function tanh tanh

Random seed 140 140

Similarly, to make sure a fair evaluation, the three

SALAD combinations were all conﬁgured with the

same hyperparameter and parameter setting, as listed

in Table 5. However, the setting is slightly differ-

ent when it comes to the two used time series TMRT

and NYC. Recall that SALAD consists of one con-

version phase and one detection phase. The conver-

sion phase requires more data points for model train-

ing than the detection phase does. Hence, the Look-

Back parameter for the conversion phase of SALAD-

TFK, SALAD-PT, and SALAD-DL4J were all set to

288 and 63 on NYC and TMRT, respectively. Due

to the same reason, we conﬁgured 100 and 50 epochs

for the conversion phase and the detection phase of

the three SALAD combinations, respectively. On the

other hand, the Look-Back parameter for the detec-

tion phase of the three SALAD combinations were all

set to 3 no matter the used time series is TMRT or

NYC. This is because the detection phase works ex-

actly like RePAD, and three is the recommend value

suggested by (Lee et al., 2021a) for the Look-Back

parameter of RePAD.

The evaluations for all the six combinations were

individually performed on the same laptop running

MacOS 10.15.1 with 2.6 GHz 6-Core Intel Core i7

and 16GB DDR4 SDRAM. Note that we did not

choose GPUs or high-performance computers to con-

duct the evaluation since it is interesting to know how

TensorFlow-Keras, PyTorch, and Deeplearning4j im-

pact RePAD and SALAD on a commodity computer.

5 EVALUATION RESULTS

In this section, we detail the evaluation results of

the three RePAD combinations and the three SALAD

combinations.

5.1 Three RePAD Combinations

To measure the detection accuracy for each RePAD

combination, we chose precision, recall, and F-

score. Precision is the ratio between the true pos-

itives (TP) and all the positives, i.e., precision=

TP/(TP+FP) where FP represents false positive. Re-

call is the measure of the correctly identiﬁed anoma-

lies from all the actual anomalies, i.e., recall=

TP/(TP+FN) where FN represents false negative. F-

score is a well-known composite measure to eval-

uate the accuracy of a model, and it is deﬁned

as 2·(precision·recall)/(precision+recall). A higher

value of F-score indicates better detection accuracy.

It is worth noting that we did not utilize the tra-

ditional pointwise approach to measure precision, re-

call, and F-score. Instead, we refer to the evaluation

method used by (Lee et al., 2020a). More speciﬁcally,

if a point anomaly occurring at time point Z can be

detected within a time period ranging from time point

Z−K to time point Z+K, this anomaly is considered

correctly detected. On the other hand, for any col-

lective anomaly, if it starts at time point A and ends

at time point B (B>A), and it can be detected within a

period between A−K and B, we consider this anomaly

correctly detected. In this paper, we set K to 7 follow-

ing the setting suggested by (Ren et al., 2019), i.e., K

is 7 if the measurement interval of a time series is a

minute, and K is 3 for a hourly time series.

In addition, we used three performance metrics to

evaluate the efﬁciency of each RePAD combination.

The ﬁrst one is LSTM training ratio, which is the

ratio between the number of data points that require

a new LSTM model training and the total number

of data points in the target time series. A lower ra-

tio indicates less computation resources and quicker

response time because LSTM model training takes

some time. The second one is average detection time

for each data point when LSTM model training is not

required (ADT-NT for short). According to the design

of RePAD, the LSTM model will not be replaced if it

can accurately predict the next data point, which also

means that the detection can be performed immedi-

ately without any delay. The last performance metric

is average detection time when LSTM model training

is required (ADT-T for short). When LSTM model

training is required, the time to detect if a data point is

anomalous consists of the time to train a new LSTM

Impact of Deep Learning Libraries on Online Adaptive Lightweight Time Series Anomaly Detection

111

model, the time for this new model to re-predict the

value of the data point, and the time to determine if the

data point is anomalous. Apparently, ADT-T would

be longer than ADT-NT due to LSTM model training.

Tables 6 to 9 show the performance of the three

RePAD combinations on the CC2-10 time series. It is

clear that RePAD-PT performs the best since it pro-

vides the highest detection accuracy, the least number

of LSTM training, and the shortest ADT-T. The re-

sult shows that PyTorch seems to be a good choice

for RePAD.

Although RePAD-TFK provides the second best

detection accuracy, its ADT-NT and ADT-T were ob-

viously the longest. It seems like TensorFlow-Keras

is less efﬁcient than PyTorch and Deeplearning4j.

On the other hand, we can see from Table 6 that

RePAD-DL4J provides the lowest detection accuracy

due to the lowest recall. Nevertheless, its ADT-NT

is the shortest and its ADT-T is the second shortest

with the smallest standard deviation. It seems that

Deeplearning4j offers more stable execution perfor-

mance than the other two libraries.

Table 6: The detection accuracy of the three RePAD com-

binations on the CC2-10 time series.

Combination Precision Recall F-score

RePAD-TFK 0.957 0.9 0.928

RePAD-PT 0.954 0.934 0.944

RePAD-DL4J 0.964 0.7 0.811

Table 7: The LSTM training ratio of the three RePAD com-

binations on the CC2-10 time series.

Combination LSTM training ratio

RePAD-TFK 0.0094 (379/40320)

RePAD-PT 0.0089 (357/40320)

RePAD-DL4J 0.0131 (528/40320)

Table 8: The ADT-NT of the three RePAD combinations on

the CC2-10 time series.

Combination ADT-NT (sec) Std. Dev. (sec)

RePAD-TFK 0.518 0.726

RePAD-PT 0.069 0.263

RePAD-DL4J 0.028 0.022

Tables 10 to 13 show the detection results of

the three RePAD combinations on another time se-

ries B3B-10. Apparently, RePAD-TFK has the high-

est detection accuracy and the lowest LSTM train-

ing ratio. However, its ADT-NT and ADT-T are the

longest. This result conﬁrms that TenserFlow-Keras

introduces more overhead to RePAD than the other

two libraries do.

When RePAD was implemented in PyTorch, it has

the second best detection accuracy, the second short-

Table 9: The ADT-T of the three RePAD combinations on

the CC2-10 time series.

Combination ADT-T (sec) Std. Dev. (sec)

RePAD-TFK 1.913 1.409

RePAD-PT 0.100 0.318

RePAD-DL4J 0.375 0.030

est ADT-NT, and the shortest ADT-T. In other words,

PyTorch provides a very good balance between detec-

tion accuracy and response time. On the other hand,

when RePAD-DL4J worked on B3B-10, its perfor-

mance is similar to its performance on CC2-10 (i.e.,

the lowest detection accuracy but satisfactory execu-

tion performance).

Table 10: The detection accuracy of the three RePAD com-

binations on B3B-10.

Combination Precision Recall F-score

RePAD-TFK 0.892 1 0.943

RePAD-PT 0.872 1 0.932

RePAD-DL4J 0.828 1 0.906

Table 11: The LSTM training ratio of the three RePAD

combinations on B3B-10.

Combination LSTM training ratio

RePAD-TFK 0.0026 (105/40320)

RePAD-PT 0.0028 (112/40320)

RePAD-DL4J 0.0042 (168/40320)

5.2 Three SALAD Combinations

To evaluate the detection accuracy of the three

SALAD combinations, we also used precision, recall,

and F-Score. Furthermore, we measured the average

time for each SALAD combination to process each

data point in their conversion phases and detection

phases.

Figure 3 shows the detection results of the three

SALAD combinations on the TMRT time series. Ap-

parently, all of them can detect the collective anomaly

without any false positive or false negative. Hence,

the precision, recall, and F-score of the three combi-

nations are all one as shown in Table 14.

Table 15 lists the time consumption of the three

SALAD combinations on TMRT. It is clear that

SALAD-PT has the shortest average conversion time

and average detection time, whereas SALAD-TFK

has the longest average conversion time and average

detection time. It seems like PyTorch is also the best

choice for SALAD so far.

Table 16 lists the detection results of the three

SALAD combinations on the NYC time series. We

can see that SALAD-DL4J has the best detection ac-

curacy. Recall that the conversion phase of SALAD

ICSOFT 2023 - 18th International Conference on Software Technologies

112

Table 12: The ADT-NT of the three RePAD combinations

on the B3B-10 time series.

Combination ADT-NT (sec) Std. Dev. (sec)

RePAD-TFK 0.517 0.724

RePAD-PT 0.069 0.263

RePAD-DL4J 0.028 0.015

Table 13: The ADT-T of the three RePAD combinations on

the B3B-10 time series.

Combination ADT-NT (sec) Std. Dev. (sec)

RePAD-TFK 1.989 1.436

RePAD-PT 0.105 0.325

RePAD-DL4J 0.388 0.039

Table 14: The detection accuracy of the three SALAD com-

binations on the TMRT time series.

Combination Precision Recall F-score

SALAD-TFK 1 1 1

SALAD-PT 1 1 1

SALAD-DL4J 1 1 1

(Lee et al., 2021b) aims to convert a complex time se-

ries into a less complex AARE series by predicting

the value for each future data point, measuring the

difference between every pair of predicted and actual

data points, and deriving the corresponding AARE

values. As we can see from Figure 4 that most of

the data points predicted by the conversion phase of

SALAD-DL4J matched the real data points. Conse-

quently, as shown in Figure 5, the detection phase

of SALAD-DL4J was able to detect all the collec-

tive anomalies even though there are some false posi-

tives. However, the good performance of the conver-

sion phase of SALAD-DL4J comes at the price of a

long conversion time (see Table 17) due to required

LSTM model training for many data points.

On the other hand, when SALAD-TFK and

SALAD-PT worked on NYC, they both had very

poor detection accuracy (see Table 16). SALAD-

TFK could detect only one collective anomaly, i.e.,

the snow storm. This is because the conversion phase

Figure 3: The detection results of the three SALAD combi-

nations on the TMRT time series.

Table 15: The time consumption of the three SALAD com-

binations on the TMRT time series.

Combination Average Conversion

Time/Std. Dev.(sec)

Average Detection

Time/Std. Dev. (sec)

SALAD-TFK 0.949/1.017 0.472/0.703

SALAD-PT 0.023/0.163 0.008/0.087

SALAD-DL4J 0.162/0.399 0.011/0.027

Table 16: The detection accuracy of the three SALAD com-

binations on the NYC time series.

Combination Precision Recall F-score

SALAD-TFK 0.447 0.2857 0.349

SALAD-PT 0.338 0.2857 0.310

SALAD-DL4J 0.709 1 0.830

Table 17: The time consumption of the three SALAD com-

binations on the NYC time series.

Combination Average Conversion

Time/Std. Dev.(sec)

Average Detection

Time/Std. Dev. (sec)

SALAD-TFK 0.477/0.798 0.488/0.721

SALAD-PT 0.045/0.273 0.022/0.147

SALAD-DL4J 2.306/4.969 0.018/0.042

Figure 4: The original data points in the NYC time series

versus the data points predicted by the conversion phase of

SALAD-DL4J.

Figure 5: The AARE values generated by the detection

phase of SALAD-DL4J versus the self-adaptive detection

threshold of SALAD-DL4J on the NYC time series.

of SALAD-TFK was unable to correctly predict data

points (as shown in Figure 6). This bad performance

consequently affected the detection phase of SALAD-

TFK and disabled it to detect anomalies. We can see

from Figure 7 that almost all AARE values are lower

Impact of Deep Learning Libraries on Online Adaptive Lightweight Time Series Anomaly Detection

113

Figure 6: The original data points in the NYC time series

versus the data points predicted by the conversion phase of

SALAD-TFK.

Figure 7: The AARE values generated by the detection

phase of SALAD-TFK versus the self-adaptive detection

threshold of SALAD-TFK on the NYC time series.

than the detection threshold.

If we look at Figure 7 more closely, we can see

that the detection threshold was very high in the be-

ginning due to the high AARE values, which makes

SALAD felt that its current LSTM model did not need

to be replaced. Even though the threshold dropped

afterwards, it was still much higher than many subse-

quent AARE values. This is why most of the anoma-

lies could not be detected. Since SALAD-TFK re-

quires only a few model training, its average conver-

sion time is much shorter than that of SALAD-DL4J

(see Table 17).

The same situation happened to SALAD-PT when

it worked on the NYC series. SALAD-PT has very

poor detection accuracy even though its average con-

version time and average detection time are the short-

est.

6 CONCLUSIONS AND FUTURE

WORK

In this paper, we investigated how DL libraries impact

online adaptive lightweight time series anomaly de-

tection by implementing two state-of-the-art anomaly

detection approaches (RePAD and SALAD) in three

well-known DL libraries (TensorFlow-Keras, Py-

Torch, and Deeplearning4j) and conducting a series of

experiments to evaluate their detection performance

and time consumption based on four open-source time

series. The results indicate that DL libraries have a

signiﬁcant impact on RePAD and SALAD in terms of

not only their detection accuracy but also their time

consumption and response time.

According to the results, TensorFlow-Keras is not

recommended for online adaptive lightweight time se-

ries anomaly detection because it might lead to unsta-

ble detection accuracy and more time consumption.

When it was used to implement RePAD, RePAD had

satisfactory detection accuracy. However, when it was

used to implement SALAD, SALAD had unstable de-

tection accuracy on one used time series. Besides,

TensorFlow-Keras is less efﬁcient than PyTorch and

Deeplearning4j because it causes the longest response

time for both RePAD and SALAD.

On the other hand, PyTorch is the most efﬁcient

library among the three DL libraries since it enables

RePAD and SALAD to provide real-time processing

and instant responses. It also enables RePAD to pro-

vide high detection accuracy. However, similar to

TensorFlow-Keras, it causes unstable detection ac-

curacy when it was used to implement SALAD and

worked on the NYC time series.

Deeplearning4j is considered the most stable li-

brary among the three DL libraries because it not

only enables RePAD and SALAD to provide satisfac-

tory detection accuracy, but also enables RePAD and

SALAD to have reasonable time consumption and re-

sponse time.

We found that it is very important to carefully

choose DL libraries for online adaptive lightweight

time series anomaly detection because DL libraries

might not show the true performance of an anomaly

detection approach. What makes it even worse is

that they might mislead developers or users in believ-

ing that one bad anomaly detection approach imple-

mented in a good DL library is better than a good

anomaly detection approach implemented in a bad DL

library.

In our future work, we would like to release all the

source code (i.e., RePAD and SALAD implemented

in the three DL libraries) on a public software reposi-

tory such as GitHub, GitLab, or Bitbucket.

ACKNOWLEDGEMENT

The authors want to thank the anonymous reviewers

for their reviews and suggestions for this paper.

ICSOFT 2023 - 18th International Conference on Software Technologies

114

REFERENCES

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean,

J., Devin, M., Ghemawat, S., Irving, G., Isard, M.,

et al. (2016). Tensorﬂow: a system for large-scale

machine learning. In Osdi, volume 16, pages 265–

283. Savannah, GA, USA.

Ahmed, M., Mahmood, A. N., and Hu, J. (2016). A survey

of network anomaly detection techniques. Journal of

Network and Computer Applications, 60:19–31.

Alimohammadi, H. and Chen, S. N. (2022). Perfor-

mance evaluation of outlier detection techniques in

production timeseries: A systematic review and

meta-analysis. Expert Systems with Applications,

191:116371.

azquez-Garc

ıa, A., Conde, A., Mori, U., and Lozano,

J. A. (2021). A review on outlier/anomaly detection

in time series data. ACM Computing Surveys (CSUR),

54(3):1–33.

Braei, M. and Wagner, S. (2020). Anomaly detection in

univariate time-series: A survey on the state-of-the-

art. arXiv preprint arXiv:2004.00433.

Chi, H., Zhang, Y., Tang, T. L. E., Mirabella, L., Dalloro,

L., Song, L., and Paulino, G. H. (2021). Universal

machine learning for topology optimization. Com-

puter Methods in Applied Mechanics and Engineer-

ing, 375:112739.

CNTK (2023). The microsoft cognitive toolkit is a uniﬁed

deep learning toolkit. https://github.com/microsoft/

CNTK. [Online; accessed 25-February-2023].

Deeplearning4j (2023). Introduction to core Deeplearning4j

concepts. https://deeplearning4j.konduit.ai/. [Online;

accessed 24-February-2023].

Deka, P. K., Verma, Y., Bhutto, A. B., Elmroth, E., and

Bhuyan, M. (2022). Semi-supervised range-based

anomaly detection for cloud systems. IEEE Transac-

tions on Network and Service Management.

EarlyStopping (2023). What is early stopping? https:

//deeplearning4j.konduit.ai/. [Online; accessed 24-

February-2023].

Eom, H., Figueiredo, R., Cai, H., Zhang, Y., and Huang,

G. (2015). Malmos: Machine learning-based mobile

ofﬂoading scheduler with online training. In 2015

3rd IEEE International Conference on Mobile Cloud

Computing, Services, and Engineering, pages 51–60.

IEEE.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term

memory. Neural computation, 9(8):1735–1780.

Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K.,

Dally, W. J., and Keutzer, K. (2016). Squeezenet:

Alexnet-level accuracy with 50x fewer parame-

ters and¡ 0.5 mb model size. arXiv preprint

arXiv:1602.07360.

Ibrahim, M., Badran, K. M., and Hussien, A. E. (2022).

Artiﬁcial intelligence-based approach for univari-

ate time-series anomaly detection using hybrid cnn-

bilstm model. In 2022 13th International Conference

on Electrical Engineering (ICEENG), pages 129–133.

IEEE.

Keras (2023). Keras - a deep learning API written in

python. https://keras.io/about/. [Online; accessed 25-

February-2023].

Ketkar, N. and Santana, E. (2017). Deep learning with

Python, volume 1. Springer.

Kieu, T., Yang, B., and Jensen, C. S. (2018). Outlier detec-

tion for multidimensional time series using deep neu-

ral networks. In 2018 19th IEEE international confer-

ence on mobile data management (MDM), pages 125–

134. IEEE.

Kovalev, V., Kalinovsky, A., and Kovalev, S. (2016). Deep

learning with theano, torch, caffe, tensorﬂow, and

deeplearning4j: Which one is the best in speed and

accuracy?

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Im-

agenet classiﬁcation with deep convolutional neural

networks. Communications of the ACM, 60(6):84–90.

Lavin, A. and Ahmad, S. (2015). Evaluating real-time

anomaly detection algorithms–the numenta anomaly

benchmark. In 2015 IEEE 14th international confer-

ence on machine learning and applications (ICMLA),

pages 38–44. IEEE.

Lee, M.-C. and Lin, J.-C. (2023). RePAD2: Real-time,

lightweight, and adaptive anomaly detection for open-

ended time series. In Proceedings of the 8th Inter-

national Conference on Internet of Things, Big Data

and Security - IoTBDS, pages 208–217. INSTICC,

SciTePress. arXiv preprint arXiv:2303.00409.

Lee, M.-C., Lin, J.-C., and Gan, E. G. (2020a). ReRe: A

lightweight real-time ready-to-go anomaly detection

approach for time series. In 2020 IEEE 44th Annual

Computers, Software, and Applications Conference

(COMPSAC), pages 322–327. IEEE. arXiv preprint

arXiv:2004.02319. The updated version of the ReRe

algorithm from arXiv was used here.

Lee, M.-C., Lin, J.-C., and Gran, E. G. (2020b). RePAD:

real-time proactive anomaly detection for time series.

In Advanced Information Networking and Applica-

tions: Proceedings of the 34th International Confer-

ence on Advanced Information Networking and Ap-

plications (AINA-2020), pages 1291–1302. Springer.

arXiv preprint arXiv:2001.08922. The updated ver-

sion of the RePAD algorithm from arXiv was used

here.

Lee, M.-C., Lin, J.-C., and Gran, E. G. (2021a). How far

should we look back to achieve effective real-time

time-series anomaly detection? In Advanced Infor-

mation Networking and Applications: Proceedings of

the 35th International Conference on Advanced In-

formation Networking and Applications (AINA-2021),

Volume 1, pages 136–148. Springer. arXiv preprint

arXiv:2102.06560.

Lee, M.-C., Lin, J.-C., and Gran, E. G. (2021b). SALAD:

Self-adaptive lightweight anomaly detection for real-

time recurrent time series. In 2021 IEEE 45th An-

nual Computers, Software, and Applications Confer-

ence (COMPSAC), pages 344–349. IEEE.

Lee, T. J., Gottschlich, J., Tatbul, N., Metcalf, E., and

Zdonik, S. (2018). Greenhouse: A zero-positive ma-

chine learning system for time-series anomaly detec-

tion. arXiv preprint arXiv:1801.03168.

Impact of Deep Learning Libraries on Online Adaptive Lightweight Time Series Anomaly Detection

115

LinkedIn (2018). linkedin/luminol [online code reposi-

tory]. https://github.com/linkedin/luminol. [Online;

accessed 24-February-2023].

Moso, J. C., Cormier, S., de Runz, C., Fouchal, H.,

and Wandeto, J. M. (2021). Anomaly detection

on data streams for smart agriculture. Agriculture,

11(11):1083.

Nguyen, G., Dlugolinsky, S., Bob

ak, M., Tran, V.,

opez Garc

ıa,

A., Heredia, I., Mal

ık, P., and Hluch

L. (2019). Machine learning and deep learning frame-

works and libraries for large-scale data mining: a sur-

vey. Artiﬁcial Intelligence Review, 52:77–124.

Niu, Z., Yu, K., and Wu, X. (2020). LSTM-based VAE-

GAN for time-series anomaly detection. Sensors,

20(13):3738.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,

Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,

Antiga, L., et al. (2019). Pytorch: An imperative style,

high-performance deep learning library. Advances in

neural information processing systems, 32.

Pereira, J. and Silveira, M. (2019). Learning representa-

tions from healthcare time series data for unsuper-

vised anomaly detection. In 2019 IEEE international

conference on big data and smart computing (Big-

Comp), pages 1–7. IEEE.

Ren, H., Xu, B., Wang, Y., Yi, C., Huang, C., Kou, X., Xing,

T., Yang, M., Tong, J., and Zhang, Q. (2019). Time-

series anomaly detection service at microsoft. In Pro-

ceedings of the 25th ACM SIGKDD international con-

ference on knowledge discovery & data mining, pages

3009–3017.

Schneider, J., Wenig, P., and Papenbrock, T. (2021). Dis-

tributed detection of sequential anomalies in univari-

ate time series. The VLDB Journal, 30(4):579–602.

Twitter (2015). AnomalyDetection R package [on-

line code repository]. https://github.com/twitter/

AnomalyDetection. [Online; accessed 24-February-

2023].

Wang, Z., Liu, K., Li, J., Zhu, Y., and Zhang, Y. (2019).

Various frameworks and libraries of machine learning

and deep learning: a survey. Archives of computa-

tional methods in engineering, pages 1–24.

Yatish, H. and Swamy, S. (2020). Recent trends in

time series forecasting–a survey. International Re-

search Journal of Engineering and Technology (IR-

JET), 7(04):5623–5628.

Yeh, C.-C. M., Zhu, Y., Dau, H. A., Darvishzadeh,

A., Noskov, M., and Keogh, E. (2019). Online

amnestic dynamic time warping to allow real-time

golden batch monitoring. https://sites.google.com/

view/gbatch?pli=1.

Zahidi, Y., El Younoussi, Y., and Al-Amrani, Y. (2021).

A powerful comparison of deep learning frameworks

for arabic sentiment analysis. International Journal

of Electrical & Computer Engineering (2088-8708),

11(1).

Zhang, J. E., Wu, D., and Boulet, B. (2021). Time se-

ries anomaly detection for smart grids: A survey. In

2021 IEEE Electrical Power and Energy Conference

(EPEC), pages 125–130. IEEE.

Zhang, Q., Li, X., Che, X., Ma, X., Zhou, A., Xu, M.,

Wang, S., Ma, Y., and Liu, X. (2022). A comprehen-

sive benchmark of deep learning libraries on mobile

devices. In Proceedings of the ACM Web Conference

2022, pages 3298–3307.

Zhang, X., Wang, Y., and Shi, W. (2018). pcamp: Perfor-

mance comparison of machine learning packages on

the edges. In HotEdge.

ICSOFT 2023 - 18th International Conference on Software Technologies

116