An Experimental Evaluation of the Adaptive Sampling Method for
Time Series Classification and Clustering
Muhammad Marwan Muhammad Fuad
Forskningsparken 3, Institutt for kjemi, NorStruct
Department of Chemistry, The University of Tromsø - The Arctic University of Norway, NO-9037 Tromsø, Norway
Keywords: Adaptive Sampling, Classification, Clustering, Data Mining, Time Series.
Abstract: Adaptive sampling is a dimensionality reduction technique for time series data inspired by dynamic
programming piecewise linear approximation. This technique yields a suboptimal solution of the
polygonal curve approximation problem by limiting the search space. In this paper, we
conduct extensive experiments to evaluate the performance of adaptive sampling in 1-NN classification and
k-means clustering tasks. The experiments we conducted show that adaptive sampling gives satisfactory
results in the aforementioned tasks even for relatively high compression ratios.
1 INTRODUCTION AND RELATED WORK
A time series is a sequence of indexed values:

$S = \langle S(t_1), S(t_2), \ldots, S(t_n) \rangle$ (1)
Time series data mining arises in many domains
including economics, medicine, finance, and
astronomy. For this reason, time series data mining
has received considerable attention in recent years.
The major time series data mining tasks include
query-by-content, clustering, classification, anomaly
detection, motif discovery, segmentation, and
prediction. Executing these tasks usually involves
performing another fundamental task in data mining
which is the similarity search. A similarity search
problem consists of a database $D$, a query or pattern $Q$, which does not necessarily belong to $D$, and a tolerance $\varepsilon$ that determines how close a data object must be to the query in order to qualify as an answer to that query. The principal component of the
similarity search problem is the distance metric or the
similarity measure which quantifies how much two
data objects are close to each other. The Euclidean
Distance (ED) (Figure 1) is a widely used time series
distance metric. It is defined between two time series $S = \langle s_1, s_2, \ldots, s_n \rangle$ and $T = \langle t_1, t_2, \ldots, t_n \rangle$ as:

$ED(S,T) = \sqrt{\sum_{i=1}^{n} \left| s_i - t_i \right|^{2}}$ (2)
Figure 1: The Euclidean distance.
Another popular similarity measure (not a distance
metric) used in time series data mining is the Dynamic
Time Warping (DTW) (Guo and Siegelmann, 2004)
(Figure 2). DTW is defined as:
$DTW(i,j) = d(s_i, t_j) + \min \left\{ DTW(i, j-1),\ DTW(i-1, j),\ DTW(i-1, j-1) \right\}$ (3)

where $1 \le i \le n$, $1 \le j \le m$.
Figure 2: Dynamic time warping.
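To make Eqs. (2) and (3) concrete, the following minimal Python sketch implements both; the function names and the choice of $|s_i - t_j|$ as the local distance $d$ are our assumptions, not prescriptions from the paper.

import numpy as np

def euclidean(s, t):
    # ED between two equal-length time series (Eq. 2).
    s, t = np.asarray(s, float), np.asarray(t, float)
    return np.sqrt(np.sum((s - t) ** 2))

def dtw(s, t):
    # DTW cost between two time series via the recursion of Eq. 3.
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])        # local distance d(s_i, t_j)
            D[i, j] = cost + min(D[i, j - 1],      # DTW(i, j-1)
                                 D[i - 1, j],      # DTW(i-1, j)
                                 D[i - 1, j - 1])  # DTW(i-1, j-1)
    return D[n, m]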
A trivial solution to the similarity search problem
is to compare the query $Q$ against every single time series in $D$. This is known as sequential scanning. Obviously, this
solution is not an efficient one given that modern time
series databases are usually very large.
Dimensionality Reduction Techniques, also
named Representation Methods, adopt the GEMINI
framework (Faloutsos et al, 1994) (Figure 3) to
process the similarity search problem of time series
more efficiently. In GEMINI, the original time series are mapped onto low-dimensional spaces, which reduces their dimensionality, and then the query is processed in those low-dimensional spaces.
There are quite a few dimensionality reduction
techniques in the literature. The most popular ones are
Piecewise Aggregate Approximation (PAA) (Keogh
et al, 2000) and (Yi and Faloutsos, 2000), and
Adaptive Piecewise Constant Approximation (APCA)
(Keogh et al, 2001).
One dimensionality reduction technique that is
related to our experimental study is Piecewise Linear
Approximation (PLA) (Morinaka et al, 2001) (Figure
4). PLA transforms the time series into a Δ-SEALS (Δ-bounded Sequence of Approximated Linear Segments). The basic idea of the Δ-SEALS is to approximate the time series by a sequence of linear segments. Each segment is the longest possible linear segment whose accumulated error does not exceed a given deviation bound Δ, where the error is defined by the least squares method.
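As an illustration only, here is a greedy Python sketch of the Δ-SEALS idea under our reading of the description above: each segment is extended point by point until its least-squares fitting error would exceed Δ. The name delta_seals and the details of the loop are our assumptions.

import numpy as np

def delta_seals(S, delta):
    # Greedy Δ-SEALS sketch: extend each segment while the least-squares
    # error of a single fitted line stays within the deviation bound delta.
    S = np.asarray(S, dtype=float)
    breaks, start = [0], 0
    for end in range(2, len(S)):
        t = np.arange(start, end + 1)
        coeffs = np.polyfit(t, S[start:end + 1], 1)   # least-squares line
        err = np.sum((np.polyval(coeffs, t) - S[start:end + 1]) ** 2)
        if err > delta:                               # segment ends here
            breaks.append(end - 1)
            start = end - 1
    breaks.append(len(S) - 1)
    return breaks                                     # segment end points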
Algorithm: range_query(Q, ε)
1. Transform the time series in database D from the original n-dimensional space into a lower dimensional space of N dimensions.
2. Define a lower bounding distance d_N on the reduced space: d_N(Q,S) ≤ d(Q,S), ∀ Q,S ∈ D.
3. Eliminate all the time series for which d_N(Q,S) > ε to obtain a candidate answer set.
4. Apply d to the candidate answer set and eliminate all the time series that are farther than ε from Q to get the final answer set.
Figure 3: The GEMINI algorithm for range queries.
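As an illustration of the scheme in Figure 3, here is a minimal Python sketch that uses a PAA-style mean reduction (Keogh et al, 2000) as the transform and its standard lower bound of the Euclidean distance as d_N; all names are ours, and the series length n is assumed to be divisible by N.

import numpy as np

def paa(series, N):
    # Step 1: reduce an n-point series to the means of N equal-width frames.
    return np.asarray(series, dtype=float).reshape(N, -1).mean(axis=1)

def lb_paa(q_red, s_red, n, N):
    # Step 2: lower-bounding distance on the N-dimensional reduced space.
    return np.sqrt((n / N) * np.sum((q_red - s_red) ** 2))

def range_query(D, Q, eps, N=8):
    # Steps 3-4: filter with the lower bound, then refine with the true ED.
    n, q_red = len(Q), paa(Q, N)
    candidates = [S for S in D if lb_paa(q_red, paa(S, N), n, N) <= eps]
    return [S for S in candidates
            if np.linalg.norm(np.asarray(S, float) - np.asarray(Q, float)) <= eps]

Because d_N never overestimates the true distance, the filtering step of this sketch produces no false dismissals.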
Another dimensionality reduction technique
related to the experimental section of this paper is
Discrete Fourier Transform (DFT) (Agrawal et al,
1993), (Agrawal et al, 1995) (Figure 5). The basic
idea of DFT is that a time series can be represented
using complex numbers called the Fourier
Coefficients, so a time series of 256 dimensions, for
instance, can be represented by 128 complex Fourier
coefficients. However, the first coefficients are the
most significant and the most representative ones, so
the other Fourier coefficients can be truncated
without much loss of information.
Figure 4: Δ-SEALS.
Figure 5: DFT using 8 coefficients.
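A minimal NumPy sketch of this truncation, keeping the example figures of the text (256 points, a handful of coefficients); the function names and the test series are ours.

import numpy as np

def dft_reduce(series, n_coeffs=8):
    # Keep only the first (most significant) complex Fourier coefficients.
    return np.fft.rfft(series)[:n_coeffs]

def dft_reconstruct(coeffs, n):
    # Approximate the original series from the truncated coefficients.
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[:len(coeffs)] = coeffs
    return np.fft.irfft(full, n)

# A 256-point series compressed to 8 complex coefficients and back.
x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.randn(256)
x_hat = dft_reconstruct(dft_reduce(x, 8), 256)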
In this paper we present an extensive experimental evaluation of a particular dimensionality reduction technique, the adaptive sampling method for time series. We show how this method can give good results in time series classification and clustering tasks compared with other methods.
The rest of the paper is organized as follows: in Section 2 the
adaptive sampling method is introduced. In Section 3
we present the experiments we conducted. In Section
4 we conclude this paper with final remarks and
directions for future work.
2 THE ADAPTIVE SAMPLING METHOD
In (Marteau and Ménier, 2006) the authors presented
the Adaptive Multiresolution Simplification model of
time series data, which was inspired by the Dynamic
Programming Piecewise Linear Approximation
model presented in (Marteau and Gibet, 2005) and
derived from (Perez and Vidal, 1994) and
(Kolesnikov and Franti, 2003). This adaptive model
yields a suboptimal solution of the problem of
polygonal curve approximation by limiting the search
space.
We briefly present here an outline of the model: given an n-dimensional time series $S$, the objective is to find an approximation $\hat{S}$ of $S$ that satisfies:

$\hat{S} = \arg\min_{\hat{S}} E(S, \hat{S})$ (4)
where $E$ is the root mean square error between $S$ and the model $\hat{S}$. The search is limited to the family of piecewise linear and continuous functions $\hat{S}(t)$. The successive segments have to be contiguous, so that the end of one segment is the beginning of the next one. The authors apply the dynamic programming algorithm to select the optimal set of parameters $\Theta(S)$, the time stamps of the end points of the linear segments. This is done as follows: first, we define the compression ratio of the piecewise approximation as:
=1
|
|
|
(
)|
×
+1
(5)
where
(
)
∈ℝ
,∀
Given the value of $\rho$ and the width of the time window $W = \{S(t)\}_{t \in \{1,2,\ldots,n\}}$, the number of piecewise linear segments $m = |\Theta(S)| - 1$ is known in this case.
Let (), by definition, be the parameters of a
piecewise approximation containing segments, and
let
(
,
)
be the minimal error of the best piecewise
linear approximation containing segments and
covering the time window
1,2,,
,
(
,
)
can
then be written as:
(
,
)
=
(
)

(
)
(
)
−
(
)

(6)
According to Bellman’s optimality principle (Bellman, 1957), the above term can be decomposed as:

$E(m,n) = \min_{j} \left\{ e(j,n) + E(m-1,j) \right\}$ (7)
where $e(j,n) = \sum_{t=j}^{n} \left\| L_{j,n}(t) - S(t) \right\|$ and

$L_{j,n}(t) = \left( S(n) - S(j) \right) \times \frac{t-j}{n-j} + S(j)$

is the linear segment between $S(j)$ and $S(n)$.
Recursion is initialized by observing that:

$E(i,j) = 0, \quad \forall i,j \text{ with } j < i$ (8)
At the end of the recursion, we get the optimal piecewise linear approximation, i.e. the set of time stamps of the end points of the linear segments:

$\Theta^{*}(S) = \arg\min_{\Theta(m)} \sum_{t} \left\| \hat{S}(t) - S(t) \right\|$ (9)
with the minimal error:

$E(m,n) = \sum_{t} \left\| \hat{S}(t) - S(t) \right\|$ (10)
The complexity of the algorithm is $O(m \cdot n^{2})$. In order to reduce this complexity, the search window can be limited by using a lower bound $l = \max(n - \delta, 0)$ for each step $n$, where $\delta$ is a user-defined parameter:

$E(m,n) = \min_{l \le j \le n} \left\{ e(j,n) + E(m-1,j) \right\}$ (11)
In practice, $\delta$ is chosen in proportion to the average segment length $n/m$.
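To fix ideas, the following Python sketch implements the recursion of Eqs. (6)-(11) for a 1-dimensional series; the names adaptive_sampling, seg_err and delta are ours, and the squared Euclidean deviation is used in place of the RMS criterion (both are minimized by the same break points).

import numpy as np

def seg_err(S, j, n):
    # e(j, n): accumulated deviation of the linear segment L_{j,n}
    # joining S(j) to S(n).
    if n - j < 2:
        return 0.0
    t = np.arange(j, n + 1)
    line = (S[n] - S[j]) * (t - j) / (n - j) + S[j]
    return float(np.sum((line - S[j:n + 1]) ** 2))

def adaptive_sampling(S, m, delta=None):
    # Dynamic-programming piecewise linear approximation (Eqs. 6-11).
    # Returns the m+1 time stamps of the segment end points; delta limits
    # the backward search window as in Eq. (11).
    S = np.asarray(S, dtype=float)
    n = len(S) - 1
    E = np.full((m + 1, n + 1), np.inf)   # E[k, t]: best error, k segments over [0, t]
    back = np.zeros((m + 1, n + 1), dtype=int)
    E[0, 0] = 0.0
    for k in range(1, m + 1):
        for t in range(k, n + 1):         # at least k points are needed up to t
            lo = k - 1 if delta is None else max(t - delta, k - 1)
            for j in range(lo, t):
                c = E[k - 1, j] + seg_err(S, j, t)
                if c < E[k, t]:
                    E[k, t], back[k, t] = c, j
    theta, t = [n], n                     # trace back Theta*(S)
    for k in range(m, 0, -1):
        t = back[k, t]
        theta.append(t)
    return theta[::-1]

The compression ratio ρ of Eq. (5) then follows from the size of the returned break-point set relative to |S|.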
3 EXPERIMENTS
Before we present the outcome of our experiments,
we briefly introduce the two main data mining tasks
on which we tested the adaptive sampling method.
The two tasks are classification and clustering.
Classification: The goal of classification (also called
supervised learning) is to assign an unknown object
to one out of a given number of classes or categories
(Bunke and Kraetzl, 2003). Classification is based on four fundamental components (Gorunescu, 2006):
1. Class, a categorical variable representing the ‘label’ put on the object after its classification.
2. Predictors, represented by the attributes of the data to be classified.
3. Training dataset, the set of data containing values for the two previous components, used for ‘training’ the model to recognize the appropriate class from the available predictors.
4. Testing dataset, containing new data that will be classified by the model constructed in the previous steps.
One of the most popular classification techniques for time series is k-Nearest Neighbor classification (k-NN). In k-NN the query is classified according to the majority of its k nearest neighbours (Vlachos and Gunopulos, 2003). Usually k is taken to be 1, thus applying a first nearest-neighbor (1-NN) rule using leave-one-out cross-validation. This means that every data object is compared to all the other data objects in the dataset. If the 1-NN does not belong to the same class, the error counter is incremented by 1.
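A minimal sketch of this protocol, assuming a list of series X, their labels y, and any distance function such as the euclidean or dtw sketches of Section 1; the name loo_1nn_error is ours.

def loo_1nn_error(X, y, dist):
    # Leave-one-out 1-NN error rate: classify each object by the class of
    # its nearest neighbour among all the remaining objects.
    errors = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: dist(X[i], X[j]))
        errors += int(y[nearest] != y[i])
    return errors / len(X)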
Clustering: It is the task of partitioning the data
objects into groups of similar objects. Clustering (also
called unsupervised learning) is different from
classification in that in clustering we do not have
target variables. Instead, clustering algorithms seek to
segment the entire data set into relatively
homogeneous subgroups or clusters (Larose, 2005).
There are different categories of clustering
algorithms, the one we are interested in in this paper
is Partitioning-based Clustering. In particular, we are interested in k-means clustering. In k-means clustering we have a set of n data points in d-dimensional space $\mathbb{R}^{d}$ and an integer k, and the problem is to determine a set of k points, the centroids, in $\mathbb{R}^{d}$ so as to minimize the mean distance from each data point to its nearest centroid (Kanungo et al, 2002).
Table 1: 1-NN classification errors for different compression ratios.

Dataset            Method    ρ=0%   5%     10%    25%    50%    75%    90%
Synthetic Control  DTW-PLA   0.007  0.003  0.010  0.036  0.083  0.147  0.217
                   ED-PLA    0.120  0.113  0.127  0.147  0.210  0.273  0.290
                   DFT       0.097  0.127  0.150  0.203  0.273  0.367  0.410
                   DFT-PLA   NA     0.097  0.103  0.127  0.203  0.217  0.273
Gun-Point          DTW-PLA   0.093  0.087  0.100  0.133  0.180  0.220  0.287
                   ED-PLA    0.087  0.067  0.080  0.113  0.147  0.193  0.220
                   DFT       0.087  0.113  0.133  0.160  0.200  0.233  0.307
                   DFT-PLA   NA     0.087  0.113  0.140  0.160  0.200  0.273
CBF                DTW-PLA   0.003  0.080  0.021  0.024  0.037  0.071  0.100
                   ED-PLA    0.148  0.128  0.148  0.173  0.188  0.209  0.356
                   DFT       0.112  0.147  0.184  0.209  0.234  0.382  0.398
                   DFT-PLA   NA     0.112  0.136  0.173  0.209  0.234  0.263
Trace              DTW-PLA   0.000  0.000  0.010  0.040  0.090  0.160  0.190
                   ED-PLA    0.240  0.190  0.210  0.240  0.360  0.380  0.430
                   DFT       0.186  0.210  0.350  0.380  0.410  0.430  0.480
                   DFT-PLA   NA     0.170  0.210  0.310  0.370  0.390  0.420
Lightning-2        DTW-PLA   0.131  0.117  0.131  0.164  0.183  0.217  0.262
                   ED-PLA    0.246  0.250  0.250  0.283  0.311  0.367  0.383
                   DFT       0.213  0.246  0.295  0.333  0.377  0.410  0.426
                   DFT-PLA   NA     0.213  0.217  0.233  0.283  0.317  0.367
Lightning-7        DTW-PLA   0.274  0.247  0.274  0.301  0.342  0.397  0.438
                   ED-PLA    0.425  0.425  0.429  0.443  0.466  0.486  0.514
                   DFT       0.405  0.414  0.443  0.466  0.514  0.629  0.729
                   DFT-PLA   NA     0.384  0.425  0.429  0.471  0.571  0.685
ECG                DTW-PLA   0.230  0.210  0.230  0.240  0.250  0.260  0.270
                   ED-PLA    0.120  0.090  0.110  0.130  0.160  0.200  0.230
                   DFT       0.120  0.130  0.150  0.170  0.210  0.240  0.260
                   DFT-PLA   NA     0.100  0.110  0.140  0.160  0.190  0.220
Adiac              DTW-PLA   0.396  0.389  0.399  0.427  0.484  0.574  0.608
                   ED-PLA    0.389  0.385  0.393  0.420  0.473  0.567  0.595
                   DFT       0.385  0.420  0.470  0.567  0.592  0.687  0.709
                   DFT-PLA   NA     0.385  0.413  0.475  0.560  0.575  0.673
More formally, the k-means clustering error can be measured by:

$E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} d\left( x_j^{(i)}, c_i \right)$ (12)

where $x_j^{(i)}$ is the $j^{th}$ point in the $i^{th}$ cluster, $c_i$ is that cluster's centroid, and $n_i$ is the number of points in the cluster. The quality of the k-means clustering increases as the error given in (Eq. 12) decreases.
The number of clusters k is determined by the user, is application-dependent, or is given by a clustering validity measure.
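A minimal sketch of the error of Eq. (12), assuming the points, their integer cluster labels, and the centroids are given as NumPy arrays; the name kmeans_error is ours.

import numpy as np

def kmeans_error(points, labels, centroids):
    # Eq. 12: summed distance of every point to the centroid of the
    # cluster it belongs to (a lower error means a better clustering).
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    return float(sum(np.linalg.norm(p - centroids[l])
                     for p, l in zip(points, labels)))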
We conducted experiments on classification and
clustering tasks of time series data available at (Chen
et al, 2015). This archive makes up between 90% and
100% of all publicly available, labeled time series
data sets in the world, and it represents the interest of
the data mining/database community, and not just one
group (Ding et al, 2008).
The length of the time series varies between 60 (Synthetic_control) and 637 (Lightning-2). The size of the datasets varies between 61 (Lightning-2) and 900 (CBF), so as we can see, we tested our method on a wide range of datasets of different lengths and sizes to avoid getting biased results.
Table 2: k-means clustering quality for different compression ratios.

Dataset            Method    ρ=0%   5%     10%    25%    50%    75%    90%
Synthetic Control  DTW-PLA   0.990  0.995  0.962  0.858  0.726  0.618  0.528
                   ED-PLA    0.649  0.656  0.617  0.573  0.479  0.392  0.289
                   DFT       0.723  0.719  0.685  0.623  0.543  0.441  0.378
                   DFT-PLA   NA     0.839  0.718  0.653  0.587  0.456  0.389
Gun-Point          DTW-PLA   0.879  0.896  0.814  0.684  0.527  0.438  0.373
                   ED-PLA    0.473  0.484  0.447  0.402  0.348  0.289  0.226
                   DFT       0.489  0.491  0.469  0.424  0.351  0.312  0.261
                   DFT-PLA   NA     0.890  0.806  0.657  0.511  0.423  0.351
CBF                DTW-PLA   0.983  0.985  0.916  0.847  0.683  0.513  0.426
                   ED-PLA    0.602  0.610  0.595  0.548  0.463  0.372  0.253
                   DFT       0.643  0.656  0.617  0.579  0.512  0.436  0.324
                   DFT-PLA   NA     0.787  0.746  0.675  0.610  0.476  0.358
Trace              DTW-PLA   0.843  0.835  0.802  0.736  0.619  0.494  0.327
                   ED-PLA    0.453  0.436  0.405  0.387  0.322  0.254  0.211
                   DFT       0.510  0.476  0.447  0.406  0.359  0.287  0.244
                   DFT-PLA   NA     0.675  0.634  0.576  0.468  0.329  0.259
Lightning-2        DTW-PLA   0.958  0.969  0.879  0.808  0.612  0.465  0.389
                   ED-PLA    0.589  0.602  0.566  0.526  0.445  0.361  0.222
                   DFT       0.618  0.624  0.598  0.554  0.481  0.384  0.254
                   DFT-PLA   NA     0.917  0.858  0.765  0.548  0.411  0.354
Lightning-7        DTW-PLA   0.817  0.786  0.739  0.675  0.628  0.463  0.301
                   ED-PLA    0.437  0.415  0.388  0.354  0.311  0.232  0.204
                   DFT       0.458  0.435  0.403  0.389  0.355  0.285  0.254
                   DFT-PLA   NA     0.617  0.565  0.532  0.441  0.332  0.264
ECG                DTW-PLA   0.985  0.976  0.878  0.739  0.623  0.476  0.390
                   ED-PLA    0.674  0.680  0.652  0.611  0.543  0.425  0.354
                   DFT       0.662  0.653  0.635  0.578  0.521  0.398  0.322
                   DFT-PLA   NA     0.670  0.647  0.597  0.537  0.402  0.338
Adiac              DTW-PLA   0.672  0.684  0.646  0.585  0.449  0.386  0.269
                   ED-PLA    0.362  0.380  0.343  0.321  0.287  0.224  0.189
                   DFT       0.380  0.395  0.364  0.334  0.305  0.245  0.195
                   DFT-PLA   NA     0.395  0.351  0.344  0.311  0.264  0.211
Table 1 shows the classification error (the smaller the better) in a 1-NN classification task of the adaptive sampling method applied to DFT (c.f. Section 1), and also applied using DTW and the Euclidean distance. The experiments are conducted for different compression ratios ρ, where ρ = 0% indicates that no adaptive sampling is performed (the method is turned off).
The results show that DTW is well suited to the classification task in question. Adaptive sampling gave acceptable results even for compression ratios between 25% and 50%. For the ECG dataset the results were quite acceptable even for a very high compression ratio (ρ = 90%).
ED also combined well with adaptive sampling, as the classification error was in general acceptable for a compression ratio of 50%.
When applying adaptive sampling to DFT, the results were always better than those of the original method for all datasets and for all compression ratios.
An interesting phenomenon that we noticed is that in many cases, applying adaptive sampling with a compression ratio of 5% gave better results than the raw data themselves. We believe the reason for this is that compression has the positive effect of smoothing the data.
We also conducted k-means clustering experiments on the same datasets and for the same compression ratios. Table 2 shows the k-means clustering quality (the larger the better) of the datasets we tested. As we can see from Table 2, the results of the k-means clustering are similar to those of 1-NN classification. They show that DTW is the method best suited to the k-means clustering task, and again adaptive sampling yielded acceptable results even for compression ratios between 25% and 50% for almost all the datasets tested. The results, however, degraded in most cases for high compression ratios.
ED also combined well with adaptive sampling, as the quality of the k-means clustering was still acceptable even for a compression ratio of 50%.
As was the case with classification, adaptive sampling improved the performance of DFT: the results were always better than those of the original method for all datasets and for all compression ratios.
The smoothing effect that appeared in the classification experiments for a compression ratio of 5% also appeared in the k-means clustering experiments.
4 CONCLUSIONS
In this paper, we conducted extensive experiments on the adaptive sampling method for time series in 1-NN classification and k-means clustering tasks. These experiments were conducted on a variety of time series datasets, using the Euclidean distance, dynamic time warping (DTW), and the discrete Fourier transform (DFT). The outcome of our experiments shows that even when using high compression ratios, the performance of the adaptive sampling method remains acceptable in the two aforementioned time series data mining tasks; in some cases it yielded acceptable results even for a very high compression ratio (ρ = 90%).
In the future, we intend to study the impact of
adaptive sampling on other time series data mining
tasks and also to compare it with other time series
dimensionality reduction techniques.
REFERENCES
Agrawal, R., Faloutsos, C., and Swami, A. (1993): Efficient
similarity search in sequence databases. Proceedings of
the 4th Conf. on Foundations of Data Organization and
Algorithms.
Agrawal, R., Lin, K. I., Sawhney, H. S. and Shim, K.
(1995): Fast similarity search in the presence of noise,
scaling, and translation in time-series databases. In
Proceedings of the 21st Int'l Conference on Very Large
Databases. Zurich, Switzerland.
Bellman, R., (1957): Dynamic programming. Princeton
University Press, Princeton, NJ.
Bunke, H., Kraetzl, M. (2003): Classification and detection
of abnormal events in time series of graphs. In: Last,
M., Kandel, A., Bunke, H. (eds.): Data mining in time
series databases. World Scientific.
Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A.,
Mueen, A., and Batista, G. (2015): The UCR Time
Series Classification Archive. URL:
www.cs.ucr.edu/~eamonn/time_series_data.
Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., and
Keogh, E. (2008): Querying and mining of time series
data: experimental comparison of representations and
distance measures. In Proc of the 34th VLDB.
Faloutsos, C., Ranganathan, M., and Manolopoulos, Y.
(1994): Fast subsequence matching in time-series
databases. In Proc. ACM SIGMOD Conf., Minneapolis.
Gorunescu, F. (2006): Data mining: concepts, models and
techniques, Blue Publishing House, Cluj-Napoca.
Guo, A.Y., and Siegelmann, H. (2004): Time-warped
longest common subsequence algorithm for music
retrieval, in Proc. ISMIR.
Keogh, E., Chakrabarti, K., Pazzani, M., and Mehrotra, S.
(2000): Dimensionality reduction for fast similarity
An Experimental Evaluation of the Adaptive Sampling Method for Time Series Classification and Clustering
53
search in large time series databases. J. of Know. and
Inform. Sys.
Keogh, E., Chakrabarti, K., Pazzani, M., and Mehrotra, S.
(2001): Locally adaptive dimensionality reduction for
similarity search in large time series databases.
SIGMOD pp 151-162.
Kolesnikov, A., and Franti, P. (2003): Reduced-search
dynamic programming for approximation of polygonal
curves. Pattern Recognition Letters.
Kanungo, T., Netanyahu, N.S., Wu, A.Y. (2002): An
efficient k-means clustering algorithm: analysis and
implementation. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 24(7).
Larose, D.T. (2005): Discovering knowledge in data: an
introduction to data mining. New York, Wiley.
Marteau, P.F., and Gibet, S. (2005): Adaptive sampling of
motion trajectories for discrete task-based analysis and
synthesis of gesture. In Proc. of Int. Gesture Workshop,
Vannes, France.
Marteau, P.F., Ménier, G. (2006): Adaptive multiresolution
and dedicated elastic matching in linear time
complexity for time series data mining, Sixth
International Conference on Intelligent Systems Design
and Applications (ISDA 2006), Jinan Shandong, China,
16-18 October.
Morinaka, Y., Yoshikawa, M., Amagasa, T., and Uemura,
S. (2001): The L-index: an indexing structure for
efficient subsequence matching in time sequence
databases. In Proc. 5th Pacific-Asia Conf. on Knowledge
Discovery and Data Mining, pages 51-60.
Perez, J. C., and Vidal, E. (1994): Optimum polygonal
approximation of digitized curves. Pattern Recognition
Letters.
Vlachos, M., and Gunopulos, D. (2003): Indexing time-
series under conditions of noise. In: Last, M., Kandel,
A., Bunke, H. (eds.): Data mining in time series
databases. World Scientific.
Yi, B. K., and Faloutsos, C. (2000): Fast time sequence
indexing for arbitrary Lp norms. Proceedings of the
26th International Conference on Very Large
Databases, Cairo, Egypt.