Deviation-based Dynamic Time Warping for Clustering Human Sleep
Chiying Wang¹, Sergio A. Alvarez², Carolina Ruiz¹ and Majaz Moonis³
¹ Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA 01609 U.S.A.
² Department of Computer Science, Boston College, Chestnut Hill, MA 02467 U.S.A.
³ Department of Neurology, U. of Massachusetts Medical School, Worcester, MA 01655 U.S.A.
Keywords:
Dynamic Time Warping, Deviation, Human Sleep, Clustering.
Abstract:
In this paper, we propose two versions of a modified dynamic time warping approach for comparing discrete
time series. This approach is motivated by the observation that the distribution of dynamic time warping paths
between pairs of human sleep time series is concentrated around the path of constant slope. Both versions use
a penalty term for the deviation between the warping path and the path of constant slope for a given pair of
time series. In the first version, global weighted dynamic time warping, the penalty term is added as a post-
processing step after a standard dynamic time warping computation, yielding a modified similarity metric that
can be used for time series clustering. The second version, stepwise deviation-based dynamic time warping,
incorporates the penalty term into the dynamic programming optimization itself, yielding modified optimal
warping paths, together with a similarity metric. Clustering experiments over synthetic data, as well as over
human sleep data, show that the proposed methods yield significantly improved accuracy and generative log
likelihood as compared with standard dynamic time warping.
1 INTRODUCTION
Human sleep patterns are closely associated with
overall health and quality of life, making the scientific
study of sleep an important pursuit. Sleep stage tran-
sitions (Kishi et al., 2008) and bout durations (Chu-
Shore et al., 2010) are essential indicators in charac-
terizing the structure of sleep. Typical patterns of hu-
man sleep have been found (Bianchi et al., 2010), yet
sleep microstructure varies across individuals, being
affected by age, circadian rhythms (Dijk and Lockley,
2002), and other factors.
A substantial challenge in modeling the dynamics of sleep is the scarcity of key dynamical events
such as stage transitions within sleep sequences. This scarcity yields small samples over which
dynamical models are to be trained, leading to high uncertainty in parameter estimates. An approach
known as collective dynamical modeling-clustering (CDMC) was proposed (Alvarez and Ruiz, 2013) to
address this challenge. CDMC reduces model variance through selective aggregation of instances
during a clustering phase, so that models are learned over collections of dynamically similar
instances rather than individual instances. Initializing CDMC by clustering with Dynamic Time
Warping (DTW) similarity (Oates et al., 2001) yields good convergence properties (Wang et al., 2014).
Despite promising results, standard DTW as a
similarity measure for unsupervised clustering of time
series suffers from certain problems. One of these
is the over-warping problem shown in Fig. 1. Over-
warping refers to unnatural alignment of dissimilar
segments in two time series. In Fig. 1, a subse-
quence of length over 300 in one patient is matched
by dynamic time warping to a subsequence of length
less than 10 in another. The result is unacceptable,
yet the standard dynamic time warping distance be-
tween the two segments is zero. Explicitly penalized
DTW has been developed to address over-warping.
For example, (Clifford et al., 2009) proposes variable
penalty DTW, which reduces nondiagonal moves dur-
ing alignment. However, this approach is heavily de-
pendent on a user-defined penalty function and thus
difficult to apply in practice.
An additional concern with DTW is time com-
plexity. Variants of DTW have been proposed that
focus on improving efficiency by globally constrain-
ing the warping path to a predefined geometric re-
gion such as the Sakoe-Chiba band (Sakoe and Chiba,
1978) or the Itakura parallelogram (Itakura, 1975).
However, the use of global constraints alone can lead
Wang, C., Alvarez, S., Ruiz, C. and Moonis, M.
Deviation-based Dynamic Time Warping for Clustering Human Sleep.
DOI: 10.5220/0005729200880095
In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2016) - Volume 4: BIOSIGNALS, pages 88-95
ISBN: 978-989-758-170-0
Copyright © 2016 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
[Figure 1, panel (a): Over-Warping Occurrence in Time Series. Sleep Stage (0 to 3) versus Time (0 to 800) for Patient 56 and Patient 235, with the over-warped segment marked.]
[Figure 1, panel (b): Over-Warping Path in Dynamic Time Warping. Warping space for Patient 235 and Patient 56, with a color bar from 0 to 1; the over-warping region and the diagonal line are marked.]
Figure 1: Over-warping of sleep stage sequences using dynamic time warping. (Left) Use of standard dynamic time warping
inappropriately matches dissimilar segments (a long one versus a short one) in two sequences. (Right) The same
over-warping problem described in terms of warping search area. The area circled by the dashed line indicates a large
deviation of the standard warping path from the diagonal path of constant slope. The color bar on the right indicates
the local cost measure, and the background in the right figure shows the local cost matrix of two discrete time series
(patient 56 and patient 235) in the dynamic time warping computation (see section 2.1).
[Figure 2 plot: warping space for 20 patients versus 20 patients; both axes run from 0 to 1000.]
Figure 2: Optimal time warping paths (dashed lines) be-
tween 20 pairs of human sleep recordings. DTW optimal
paths are usually close to the diagonal line from bottom-left
to top-right in the warping space. The varying boundary of
the distribution suggests the desirability of adaptively iden-
tifying warping areas locally instead of using a predefined
global constraint.
to the over-warping problem described above. Some
researchers (Ratanamahatana and Keogh, 2004) have
argued that the effect of warping band width on the
quality of the results is greatly domain dependent and
that a narrow band might be valuable. The distribu-
tion of optimal warping paths between pairs of sleep
time series in Fig. 2 (from the present authors’ own
work) likewise suggests that the use of local search
constraints would be desirable.
Main Contributions of the Present Paper
1. We propose two novel DTW variants, global weighted dynamic time warping (gwDTW) and
stepwise deviation-based dynamic time warping (sdDTW), that penalize deviations of the
warping path from the path of constant slope. This overcomes the over-warping issue in
Fig. 1, while retaining the efficiency advantages of approaches based on global constraints
such as the Sakoe-Chiba band (Sakoe and Chiba, 1978) and Itakura parallelogram (Itakura, 1975),
and without relying on domain-dependent specifics as in variable penalty DTW (Clifford et al., 2009).
2. We apply the proposed modified DTW for clustering initialization within the collective
dynamical modeling-clustering (CDMC) framework (Alvarez and Ruiz, 2013) over human sleep
time series, and show that this approach better captures the dynamics of human sleep.
Organization of the Paper. Section 2 reviews stan-
dard DTW, and describes the proposed deviation-
based dynamic time warping approach and its appli-
cation to time-series clustering. Section 3 presents
experimental results and analysis on time series clus-
tering using deviation-based dynamic time warping.
Section 4 describes conclusions and future work.
2 METHODS
We review standard dynamic time warping in sec-
tion 2.1, as that technique will serve as a baseline.
The proposed deviation-based dynamic time warping
approach is described in section 2.2.
2.1 Dynamic Time Warping (DTW)
Dynamic time warping (DTW) is a classical dynamic programming algorithm for measuring the
similarity of two time series (e.g., (Müller, 2007)). It performs an optimal alignment between
two time series by nonlinearly warping their time dimensions. DTW has been applied to speech
recognition (e.g., (Sakoe and Chiba, 1978) and (Itakura, 1975)), time series classification
(Jeong et al., 2011), and unsupervised time series clustering (Oates et al., 2001).
The following are the essentials of standard DTW, as described in (Müller, 2007).
We consider two time sequences $X = (x_1, x_2, \ldots, x_N)$ of length $N \in \mathbb{N}$ and
$Y = (y_1, y_2, \ldots, y_M)$ of length $M \in \mathbb{N}$, with individual values $x_i, y_j$ in
some feature space $\mathcal{F}$. A local cost measure is a function

$$c : \mathcal{F} \times \mathcal{F} \to \mathbb{R}_{\ge 0} \tag{1}$$

The value of $c(x_i, y_j)$ is small if $x_i, y_j$ are close to each other, and large otherwise.
For discrete time series, one can use a cost matrix to define the values $c(x, y)$ for all pairs
of values $x, y$; the simplest possibility is a 0-1 mismatch cost (the complement of the identity
matrix), that is, to let $c(x, y) = 0$ if $x = y$, otherwise $c(x, y) = 1$.
A warping path between $X$ and $Y$ is a sequence

$$p = (p_1, p_2, \ldots, p_L) \tag{2}$$

where $p_l = (n_l, m_l) \in [1:N] \times [1:M]$ for $l \in [1:L]$ (for example, the diagonal line
from bottom-left to top-right in Fig. 1(b)), and $\max\{N, M\} \le L < N + M$. It must satisfy the
following three conditions:
Boundary condition: $p_1 = (1, 1)$ and $p_L = (N, M)$ are the start and end points respectively.
Monotonicity condition: horizontal and vertical components increase monotonically:
$n_1 \le n_2 \le \ldots \le n_L$ and $m_1 \le m_2 \le \ldots \le m_L$.
Step size condition: for each $l < L$, the difference $p_{l+1} - p_l$ is one of $(1, 0)$, $(0, 1)$, $(1, 1)$.
The total cost of a warping path $p$ between $X$ and $Y$ is

$$\Phi_p(X, Y) = \sum_{l=1}^{L} c(x_{n_l}, y_{m_l}) \tag{3}$$
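[Figure 3 diagram: warping space with Time series 1 (X) and Time series 2 (Y), two candidate warping paths labeled 1 and 2, and neighboring cells labeled ψ(X_{n_{l-1}}, Y_{m_{l-1}}), ψ(X_{n_{l-1}}, Y_{m_l}), ψ(X_{n_l}, Y_{m_{l-1}}), ψ(X_{n_l}, Y_{m_l}).]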
Figure 3: Deviation (e.g., gray shaded area) of warping
path (squares with directional arrows) from path of constant
slope (solid red line). Given two warping paths (1 and 2),
the path with smaller deviation (the green one) is better.
An optimal warping path $p^*$ is one having minimum total cost $\Phi_{p^*}(X, Y)$ among all
warping paths from $p_1$ to $p_L$. $\Phi_{p^*}(X, Y)$ is referred to as the DTW distance between
sequences $X$ and $Y$.
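The recurrence behind the DTW distance can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `dtw_distance` and the padded accumulated-cost table `acc` are assumptions introduced here, and the default local cost is the 0-1 mismatch cost described above.

```python
def dtw_distance(x, y, cost=lambda a, b: 0.0 if a == b else 1.0):
    """Standard DTW distance with step sizes (1,0), (0,1), (1,1)."""
    n_len, m_len = len(x), len(y)
    inf = float("inf")
    # acc[n][m] holds the minimum total cost of a warping path from (1, 1)
    # to (n, m); row 0 and column 0 are padding so that the indices match
    # the 1-based notation used in the text.
    acc = [[inf] * (m_len + 1) for _ in range(n_len + 1)]
    acc[0][0] = 0.0
    for n in range(1, n_len + 1):
        for m in range(1, m_len + 1):
            best_prev = min(acc[n - 1][m], acc[n][m - 1], acc[n - 1][m - 1])
            acc[n][m] = cost(x[n - 1], y[m - 1]) + best_prev
    return acc[n_len][m_len]


# A long run of stage 2 matched against a single epoch of stage 2:
# the DTW distance is zero even though the segment lengths differ greatly.
print(dtw_distance([2, 2, 2, 2, 2, 2, 1], [2, 1, 1, 1, 1, 1, 1]))  # 0.0
```

The toy pair in the last line is a small-scale analogue of the over-warping situation in Fig. 1: the two sequences are aligned at zero cost although their run lengths are very different.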
2.2 Proposed Approach
2.2.1 Deviation Measure
We address the standard DTW concerns of over-
warping and time complexity described in the In-
troduction, by penalizing nondiagonal moves in the
search for an optimal warping path. This is done by
using the measure of deviation discussed below.
Deviation refers to the area $\Delta_p$ (shaded areas in Fig. 3) bounded by the warping path $p$
between two time series and the diagonal path of constant slope.
Deviation is computed by the procedure described in Algorithm 1. The following are the main steps:
Initialize the deviation $\Delta_p$ to be 0. (step 1)
Calculate the slope $k$ and the intercept $b$ of the diagonal path defined as $y = k x + b$. (steps 2-3)
Repeat until the end point $p_L$ is reached. (step 5)
- Add the absolute vertical distance between the current point $p_i$ and the diagonal to the
deviation $\Delta_p$, if $p_i$ and $p_{i-1}$ differ vertically. (steps 6-7)
- Update the counter, $i$. (step 8)
Return the deviation $\Delta_p$. (step 9)
BIOSIGNALS 2016 - 9th International Conference on Bio-inspired Systems and Signal Processing
Algorithm 1: Deviation Calculation.
Input: A warping path $p = \{p_1, p_2, \ldots, p_L\}$, each $p_i = (n_i, m_i)$; $L$: total number of points in path $p$.
Output: The deviation $\Delta_p$ of $p$ from the diagonal line through $p_1$ and $p_L$ in warping space.
ComputeDeviation(p)
1. $\Delta_p = 0$
2. $k = (M - 1)/(N - 1)$
3. $b = M - k \cdot N$
4. $i = 2$
5. While ($p_i \ne p_L$)
6.     if ($m_i \ne m_{i-1}$)
7.         $\Delta_p = \Delta_p + |m_i - (k \cdot n_i + b)|$
8.     $i = i + 1$
9. return $\Delta_p$
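Algorithm 1 translates almost line by line into Python. The sketch below is illustrative only: the function name `compute_deviation`, the explicit `n_len` and `m_len` arguments (standing for N and M), and the error raised for N < 2 are assumptions introduced here, since the pseudocode does not address that degenerate case.

```python
def compute_deviation(path, n_len, m_len):
    """Deviation of a warping path from the diagonal, following Algorithm 1.

    path is a list of 1-based points (n_i, m_i) from (1, 1) to (N, M).
    """
    if n_len < 2:
        # Assumption: the slope in step 2 is undefined when N = 1.
        raise ValueError("the diagonal slope is undefined for N < 2")
    k = (m_len - 1) / (n_len - 1)   # step 2: slope of the diagonal
    b = m_len - k * n_len           # step 3: intercept of the diagonal
    deviation = 0.0                 # step 1
    # Steps 4-8: visit p_2, ..., p_{L-1}; the end point p_L is not visited.
    for idx in range(1, len(path) - 1):
        n_i, m_i = path[idx]
        if m_i != path[idx - 1][1]:                    # step 6: vertical change
            deviation += abs(m_i - (k * n_i + b))      # step 7
    return deviation


diagonal = [(1, 1), (2, 2), (3, 3)]
detour = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3)]
print(compute_deviation(diagonal, 3, 3))  # 0.0
print(compute_deviation(detour, 3, 3))    # 3.0
```

The diagonal path has zero deviation, while the path that climbs the left edge first accumulates the vertical distances 1 and 2 before turning right.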
[Figure 4 plot: warping space for Time Series 1 and Time Series 2, showing the DTW path, the sdDTW path, and the Diagonal, with a color bar for local cost.]
Figure 4: Optimal warping paths for standard DTW and
stepwise deviation-based DTW (sdDTW). Standard DTW
allows large deviations in searching for a path of minimum
total cost. sdDTW aligns time series closer to the diagonal
line. Background shading indicates local cost in sdDTW,
which increases with distance to the diagonal.
2.2.2 Deviation-based Dynamic Time Warping
Global Weighted Dynamic Time Warping: The global weighted dynamic time warping (gwDTW) distance
between sequences $X$ and $Y$ is defined as

$$\mathrm{gwDTW}(X, Y) = \lambda_{gw} \cdot \Phi_{p^*}(X, Y) + (1 - \lambda_{gw}) \cdot \sqrt{\Delta_{p^*(X,Y)}} \tag{4}$$

where $\Phi_{p^*}(X, Y)$ is the standard DTW distance, and the deviation $\Delta_{p^*(X,Y)}$
(Algorithm 1) is added as a post-processing penalty to standard DTW. $p^*(X, Y)$ is the optimal
path between $X$ and $Y$ in standard DTW. $\lambda_{gw}$ controls the balance between the standard
DTW cost and the deviation $\Delta_{p^*(X,Y)}$, and ranges from 0 to 1. The square root of the
deviation is used because the deviation is an area, so its square root scales linearly with
Euclidean distance.
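Eq. 4 can be illustrated by combining a standard DTW computation that also recovers the optimal path with the deviation of Algorithm 1. The following is a sketch under stated assumptions rather than the authors' code: the helper names (`mismatch`, `dtw_with_path`, `deviation`, `gw_dtw`) and the tie-breaking rule used during backtracking are choices made here, and sequences are assumed to have length at least 2.

```python
import math


def mismatch(a, b):
    """0-1 local cost: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def dtw_with_path(x, y, cost=mismatch):
    """Standard DTW; returns (distance, optimal path as 1-based (n, m) pairs)."""
    n_len, m_len = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * m_len for _ in range(n_len)]
    for i in range(n_len):
        for j in range(m_len):
            if i == 0 and j == 0:
                prev = 0.0
            else:
                prev = min(
                    acc[i - 1][j] if i > 0 else inf,
                    acc[i][j - 1] if j > 0 else inf,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            acc[i][j] = cost(x[i], y[j]) + prev
    # Backtrack from (N, M) to (1, 1). Tuple ordering breaks cost ties in
    # favor of the diagonal step (an assumption; the text does not specify).
    i, j = n_len - 1, m_len - 1
    path = [(i + 1, j + 1)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i + 1, j + 1))
    path.reverse()
    return acc[-1][-1], path


def deviation(path, n_len, m_len):
    """Deviation of a path from the diagonal, as in Algorithm 1 (needs N >= 2)."""
    k = (m_len - 1) / (n_len - 1)
    b = m_len - k * n_len
    total = 0.0
    for idx in range(1, len(path) - 1):
        n_i, m_i = path[idx]
        if m_i != path[idx - 1][1]:
            total += abs(m_i - (k * n_i + b))
    return total


def gw_dtw(x, y, lam=0.83):
    """Eq. 4: weighted sum of DTW distance and square root of the deviation."""
    dist, path = dtw_with_path(x, y)
    return lam * dist + (1.0 - lam) * math.sqrt(deviation(path, len(x), len(y)))
```

For the over-warped pair `[2, 2, 2, 2, 2, 2, 1]` and `[2, 1, 1, 1, 1, 1, 1]`, standard DTW returns 0, whereas `gw_dtw` returns a positive value because the only zero-cost path runs along the boundary of the warping space.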
Stepwise Deviation-based Dynamic Time Warping: The stepwise deviation-based dynamic time warping
(sdDTW) distance between sequences $X$ and $Y$ is defined as

$$\mathrm{sdDTW}(X, Y) = \Psi_{p^*}(X, Y) \tag{5}$$

where $\Psi_{p^*}(X, Y)$ is the minimum total cost of a warping path $p^*$ in Eq. 3 obtained by
replacing the local cost measure in Eq. 1 by the modified measure $\phi(x, y)$ in Eq. 6. Thus,
sdDTW optimal warping paths are minimizers of a different cost measure than standard DTW paths.

$$\phi(x, y) = \lambda_{sd} \cdot c(x, y) + (1 - \lambda_{sd}) \cdot \sqrt{\Delta(x, y)} \tag{6}$$

where $c(x, y)$ denotes the original local cost measure in Eq. 1. $\Delta(x, y)$ is the deviation
of a position $(x, y)$ relative to the diagonal path of constant slope for sequences $X$ and $Y$
(see section 2.2.1). $\lambda_{sd}$ is a parameter that determines the relative weights of the
standard local cost measure and the deviation.
We use dynamic programming as in standard DTW to compute the sdDTW optimal warping path, based on
the modified accumulated cost matrix

$$\bar{D}(n, m) = \Psi(X(1:n), Y(1:m)) \tag{7}$$

where $X(1:n) = (x_1, \ldots, x_n)$ and $Y(1:m) = (y_1, \ldots, y_m)$, with $n \in [1:N]$ and
$m \in [1:M]$. That is, $X(1:n)$ and $Y(1:m)$ are subsequences of $X$ and $Y$.
The procedure is as follows:
Initially, $\bar{D}(n, 1) = \sum_{k=1}^{n} \phi(x_k, y_1)$ and $\bar{D}(1, m) = \sum_{k=1}^{m} \phi(x_1, y_k)$.
Iteratively, take the minimum accumulated cost from three immediately adjacent directions:
$\bar{D}(n-1, m) + \phi(x_n, y_m)$,
$\bar{D}(n, m-1) + \phi(x_n, y_m)$,
$\bar{D}(n-1, m-1) + \phi(x_n, y_m)$.
Until the final position $(N, M)$ is reached.
$\bar{D}(N, M)$ is the optimal dynamic time warping distance with respect to stepwise
deviation-based dynamic time warping.
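The dynamic programming procedure above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `sd_dtw` is introduced here, and the positional deviation is taken to be the absolute vertical distance from the cell (n, m) to the diagonal of Algorithm 1, entering the local cost through a square root. That reading of the deviation term is an assumption.

```python
import math


def sd_dtw(x, y, lam=0.67, cost=lambda a, b: 0.0 if a == b else 1.0):
    """sdDTW distance: DTW driven by a deviation-penalized local cost."""
    n_len, m_len = len(x), len(y)
    if n_len < 2:
        # Assumption: the diagonal slope is undefined when N = 1.
        raise ValueError("the diagonal slope is undefined for N < 2")
    k = (m_len - 1) / (n_len - 1)   # slope of the diagonal
    b = m_len - k * n_len           # intercept of the diagonal

    def phi(n, m):
        # Modified local cost at 1-based position (n, m). The deviation is
        # assumed to be the vertical distance from (n, m) to the diagonal.
        dev = abs(m - (k * n + b))
        return lam * cost(x[n - 1], y[m - 1]) + (1.0 - lam) * math.sqrt(dev)

    inf = float("inf")
    # Padded table: row 0 and column 0 are infinite except acc[0][0] = 0, so
    # the first row and column reduce to the cumulative sums of the
    # initialization step.
    acc = [[inf] * (m_len + 1) for _ in range(n_len + 1)]
    acc[0][0] = 0.0
    for n in range(1, n_len + 1):
        for m in range(1, m_len + 1):
            best_prev = min(acc[n - 1][m], acc[n][m - 1], acc[n - 1][m - 1])
            acc[n][m] = phi(n, m) + best_prev
    return acc[n_len][m_len]
```

With `lam=1.0` the deviation term vanishes and the function reduces to standard DTW with the 0-1 cost; with `lam` below 1, paths that stray from the diagonal accumulate a penalty at every step.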
As illustrated in Fig. 4, optimal warping paths for
the sdDTW distance metric exhibit less over-warping
than the corresponding standard DTW paths.
2.2.3 Deviation-based Dynamic Time Warping
Clustering
Deviation-based dynamic time warping clustering
(dDTWC) performs unsupervised agglomerative hier-
archical clustering of time series using the deviation-
based DTW approaches in section 2.2.2 to calculate
distances. The proposed approach is described in
pseudocode in Algorithm 2. The main steps are:
Initially each time series instance $X$ is in its own cluster (steps 1-2).
Repeat until only $k$ clusters remain (steps 3-6):
- Merge the closest clusters, $C$ and $C'$; the distance between two instances ($X$ and $Y$) is
defined by gwDTW or sdDTW in section 2.2.2; the distance between two clusters is the average
distance between pairs of instances.
Return clustering of dataset in $k$ clusters (step 7).
3 EXPERIMENTAL EVALUATION
Deviation-based DTW clustering as described in section 2.2.3 was compared with clustering using
the standard DTW distance metric. For all dynamic time warping computations, the local cost
measure in Eq. 1 was defined as c(x, y) = 1 if the elements x and y are different, otherwise
c(x, y) = 0. The weight values $\lambda_{gw}$ and $\lambda_{sd}$ in Eqs. 4 and 6, respectively,
were determined empirically in order to maximize mean accuracy over a sample of labeled synthetic
data generated as in section 3.2.1 (but separate from the synthetic data sample used for
performance evaluation in section 3.2.3): $\lambda_{sd}$ was set to 0.67 and $\lambda_{gw}$ was
set to 0.83. All experiments were performed in MATLAB® (The MathWorks, 2015).
Two sets of experiments were carried out, corre-
sponding to synthetic data and human sleep data, re-
spectively. Details specific to each of these are de-
scribed in sections 3.2.2 and 3.3.2 below.
3.1 Statistical Significance
Pairwise comparisons of median classification accu-
racy values (see section 3.2.2) of gwDTW and sd-
DTW clustering against the accuracy of standard
DTW clustering (for synthetic data) and of negative
log likelihood values of gwDTW and sdDTW cluster-
ing against that of standard DTW clustering (for hu-
man sleep data) were carried out by a non-parametric
two-sided Wilcoxon rank sum test, since a Lilliefors
normality test rejected normality at the p < 0.05 sig-
nificance level in each case. A Bonferroni correction
was performed jointly on the accuracy and log likeli-
hood Wilcoxon p-values to ensure a familywise error
rate less than 0.05.
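The testing scheme can be illustrated with a small, dependency-free Python sketch. It is not the MATLAB procedure used in the experiments: the function names `rank_sum_p_value` and `bonferroni_reject` are introduced here, and the p-value uses the large-sample normal approximation without tie or continuity corrections, so values may differ slightly from those of a statistical package.

```python
import math


def rank_sum_p_value(a, b):
    """Two-sided Wilcoxon rank sum test p-value (normal approximation)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the ranks i+1, ..., j.
        for t in range(i, j):
            ranks[t] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

For example, `bonferroni_reject([0.001, 0.04])` compares both p-values against 0.025, so only the first comparison is declared significant at a familywise error rate of 0.05.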
3.2 Synthetic Markov Mixture Data
3.2.1 Dataset Generation
A synthetic dataset of discrete sequences was gener-
ated as in (Alvarez and Ruiz, 2013), from two distinct
Markov models, each with two states. The two mod-
els differ in their transition probability matrices. Self-
transition probabilities of 0.6 in one model and 0.8 in
the other were selected. The probabilities of transi-
tioning between states were 0.4 and 0.2, respectively.
One of the two models is selected randomly and used
to generate a sequence of the desired length, L. This
process is iterated until a predetermined number of
sequences, N, is obtained. The present paper uses the
values N = 100 and L = 300 in all trials.
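The generation process can be sketched in Python as below. Several details are assumptions not stated in the text: both states of a model share the same self-transition probability, the initial state is uniform, the two models are chosen with equal probability, and the function names and `seed` parameter are introduced here for illustration.

```python
import random


def generate_sequence(self_prob, length, rng):
    """Two-state Markov chain that stays in its state with probability self_prob."""
    state = rng.randrange(2)        # assumption: uniform initial state
    sequence = [state]
    for _ in range(length - 1):
        if rng.random() >= self_prob:
            state = 1 - state       # switch to the other state
        sequence.append(state)
    return sequence


def generate_synthetic_dataset(num_sequences=100, length=300, seed=0):
    """Return a list of (sequence, model_label) pairs from the two-model mixture."""
    rng = random.Random(seed)
    self_probs = (0.6, 0.8)         # self-transition probabilities of the two models
    dataset = []
    for _ in range(num_sequences):
        label = rng.randrange(2)    # assumption: models are equally likely
        dataset.append((generate_sequence(self_probs[label], length, rng), label))
    return dataset
```

The returned labels play the role of the generating model labels used as classification targets in section 3.2.2.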
3.2.2 Experimental Procedure
Clustering Classification Accuracy. Supervised
classification via clustering was performed with the
generating model label as the classification target.
Each cluster was associated with the class c that oc-
curs most frequently among its members. The evalu-
ation metric was classification accuracy, equal to the
fraction of labeled instances (X , c) that are assigned
to a cluster in which c is the majority class. Statistical
hypothesis testing was performed using a Wilcoxon
rank sum test to compare median accuracies as de-
scribed in section 3.1. Experimental procedure was
as in the following pseudocode:
Experimental Procedure, Synthetic Data Classification:
begin
for i := 1 to TrialNum
SD = generateSyntheticDataset(N, L);
Accuracy(1, i) = evaluateByDTW(SD);
Accuracy(2, i) = evaluateBygwDTW(λ_gw, SD);
Accuracy(3, i) = evaluateBysdDTW(λ_sd, SD);
end
Perform Wilcoxon rank sum test over Accuracy.
end
Note:
generateSyntheticDataset followed the descrip-
tion in section 3.2.
evaluateByDTW refers to the clustering proce-
dure in section 2.2.3 and clustering evaluation by
classification accuracy (see above); likewise for
evaluateBygwDTW and evaluateBysdDTW.
The total number of sequences, N, was set to 100.
The length of a sequence, L, was set to be 300.
The number of trials, TrialNum, was set to 100.
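The classification accuracy metric used in this procedure can be computed directly from cluster assignments and generating-model labels. The sketch below is illustrative; the function name `clustering_accuracy` and its argument layout are assumptions made here.

```python
from collections import Counter


def clustering_accuracy(cluster_ids, labels):
    """Fraction of instances whose label is the majority label of their cluster."""
    counts_by_cluster = {}
    for cluster_id, label in zip(cluster_ids, labels):
        counts_by_cluster.setdefault(cluster_id, Counter())[label] += 1
    # Each cluster contributes the size of its majority class.
    correct = sum(max(counts.values()) for counts in counts_by_cluster.values())
    return correct / len(labels)


print(clustering_accuracy([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"]))  # 0.8
```

In the example, cluster 0 has majority class "a" (2 of 3 members) and cluster 1 has majority class "b" (2 of 2 members), giving 4 correctly assigned instances out of 5.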
Algorithm 2: Deviation-Based DTW Clustering (dDTWC).
Input: An unlabeled time series dataset $D = \{X \mid X \text{ is a time series}\}$; a positive integer, $k$, the desired number
of clusters; a predefined local cost measure $c : \mathcal{F} \times \mathcal{F} \to \mathbb{R}_{\ge 0}$, where $\mathcal{F}$ is the feature space in which the time
series in $D$ take their values. dDTW denotes the total cost measure associated with $c$, which is defined by Eq. 4
in the case of gwDTW and by Eq. 5 in the case of sdDTW.
Output: A partition of $D$ into $k$ clusters
dDTWC(D, k, c)
1. for each $i$, let $C_i$ = a cluster that contains only the $i$-th time series in $D$
2. $s$ = the number of time series in $D$ (initial number of clusters)
3. while $s > k$
4.     $(i^*, j^*) = \arg\min_{i, j \in \{1, \ldots, s\},\, i \ne j} \bar{c}(C_i, C_j)$ (where $\bar{c}$ is mean cost for instance pairs in the two clusters)
       $= \arg\min_{i, j \in \{1, \ldots, s\},\, i \ne j} \dfrac{\sum_{X \in C_i,\, Y \in C_j} \mathrm{dDTW}(X, Y, c)}{|C_i| \cdot |C_j|}$
5.     Merge $C_{i^*}$ and $C_{j^*}$ to reduce the number of clusters to $s - 1$
6.     $s = s - 1$
7. return $\{C_1, \ldots, C_k\}$
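Algorithm 2 is average-linkage agglomerative clustering with a pluggable distance. A compact Python sketch follows; the function name `ddtw_cluster` and the `dist` callback (standing in for gwDTW or sdDTW) are assumptions made here, and the brute-force search over cluster pairs is chosen for readability rather than speed.

```python
def ddtw_cluster(series, k, dist):
    """Average-linkage agglomerative clustering; returns lists of series indices."""
    count = len(series)
    # Precompute pairwise instance distances once.
    d = [[0.0 if a == b else dist(series[a], series[b]) for b in range(count)]
         for a in range(count)]
    clusters = [[i] for i in range(count)]          # steps 1-2
    while len(clusters) > k:                        # step 3
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(d[a][b] for a in clusters[i] for b in clusters[j])
                mean = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean < best[0]:  # step 4: closest pair
                    best = (mean, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]     # step 5: merge
        del clusters[j]                             # step 6: one cluster fewer
    return clusters                                 # step 7


# Toy usage with a simple numeric distance in place of gwDTW or sdDTW.
toy = [[0], [1], [10], [11]]
print(ddtw_cluster(toy, 2, lambda u, v: abs(u[0] - v[0])))  # [[0, 1], [2, 3]]
```

In practice `dist` would be one of the deviation-based distances of section 2.2.2; the toy distance is used only to keep the example self-contained.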
[Figure 5 plot: notched box plots of clustering accuracy (vertical axis from 0.5 to 1) for DTW, gwDTW, and sdDTW.]
Figure 5: Clustering accuracies using standard DTW,
gwDTW, and sdDTW as similarity measures over hidden
Markov mixture data. Non-overlapping notches indicate
significant difference in medians (p < 0.05). gwDTW and
sdDTW are significantly more accurate than standard DTW.
3.2.3 Synthetic Data Results
This section evaluates performance of clustering
over synthetic data using globally weighted DTW
(gwDTW) (Eq. 4) or stepwise deviation-based DTW
(sdDTW) as the similarity measure, as compared with
standard DTW similarity.
Clustering accuracies over the synthetic dataset appear in Fig. 5. Both gwDTW and sdDTW perform
significantly better than standard DTW, supporting the benefit of incorporating deviation into
the DTW computation for clustering of synthetic time series data. Median accuracies appear in Table 1.
Table 1: Median accuracies of clusterings based on DTW, gwDTW, and sdDTW. Asterisks denote
Bonferroni-corrected statistical significance of differences with standard DTW in Wilcoxon
rank sum test (p < 0.05).
DTW     gwDTW   sdDTW
0.65    0.92*   0.91*
3.3 Human Sleep Data
3.3.1 Datasets
A collection of 244 fully anonymized human
polysomnographic recordings was extracted from
polysomnographic overnight sleep studies performed
in the Sleep Clinic at Day Kimball Hospital in Put-
nam, Connecticut, USA. Each polysomnographic
recording is split into 30-second epochs. Lab tech-
nicians staged each 30-second epoch into one of the
sleep stages Wake, stage 1, stage 2, stage 3, and REM
(Rapid Eye Movement). Three versions of the human
sleep dataset are considered, depending on whether
these stage labels are grouped in some way:
(W5) uses the five standard stage labels Wake, 1,
2, 3, REM.
(WNR) uses the three stage labels Wake, NREM
(stages 1, 2, and 3), REM.
(WDL) uses the three stage labels Wake, Deep
(stage 3), Light (stages 1,2,REM).
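The three dataset versions differ only in how the five stage labels are grouped, which can be expressed as lookup tables. In this sketch the string encodings of the stages and the names `WNR_MAP`, `WDL_MAP`, and `regroup` are assumptions introduced for illustration.

```python
# Grouping of the five W5 stage labels into the WNR and WDL label sets.
WNR_MAP = {"Wake": "Wake", "1": "NREM", "2": "NREM", "3": "NREM", "REM": "REM"}
WDL_MAP = {"Wake": "Wake", "1": "Light", "2": "Light", "3": "Deep", "REM": "Light"}


def regroup(stages, mapping):
    """Relabel a sequence of W5 stage labels using the given grouping."""
    return [mapping[stage] for stage in stages]


epochs = ["Wake", "1", "2", "3", "2", "REM"]
print(regroup(epochs, WNR_MAP))  # ['Wake', 'NREM', 'NREM', 'NREM', 'NREM', 'REM']
print(regroup(epochs, WDL_MAP))  # ['Wake', 'Light', 'Light', 'Deep', 'Light', 'Light']
```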
3.3.2 Experimental Procedure
Unsupervised clustering was performed over the
human sleep datasets described in section 3.3.1.
The collective dynamical modeling-clustering (CDMC) algorithm (Alvarez and Ruiz, 2013) was used
for clustering, with two-state hidden semi-Markov chain models as the dynamical models. Initial
cluster labels were computed by deviation-based DTW clustering as described in Algorithm 2, with
either gwDTW (Eq. 4) or sdDTW (Eq. 5) as the distance metric. Clustering driven by the standard
DTW distance metric was used as a basis for comparison.
Generative negative log likelihood was used to measure the quality of model fit for unsupervised
clustering. Given a hidden semi-Markov model, $M$, built over a group of sequences such as human
sleep sequences, the generative negative log likelihood $-\log P(s \mid M)$ of a sequence, $s$, is
a measure of the probability that the sequence, $s$, would be produced by the model, $M$. Lower
negative log likelihood values (higher generative probabilities) imply a better model fit. The
goal of clustering was to minimize the generative negative log likelihood. Median negative log
likelihoods for the different methods were compared using a Wilcoxon rank sum test as described
in section 3.1.
Experimental Procedure, Sleep Data Clustering:
begin
D1, D2, D3 = W5, WNR, WDL datasets (3.3.1)
m1, m2, m3 = DTW, gwDTW, sdDTW
for j = 1 to 3
for k = 1 to 3
(M, nlogll(Dj, mk, s1...s244)) = CDMC(Dj, mk)
end
Perform pairwise Wilcoxon rank sum tests on
nlogll(Dj, m1...m3, s1...s244)
end
end
Notes:
The W5, WNR, WDL datasets are as in sec-
tion 3.3.1.
DTW refers to clustering using standard DTW as
the similarity metric; gwDTW and sdDTW refer
to the deviation-based clustering techniques de-
scribed in section 2.2.3.
CDMC(Dj, mk) refers to CDMC clustering (Alvarez and Ruiz, 2013) with semi-Markov cluster
models, using the given method, mk, for clustering initialization, and is assumed to return
a set of dynamical models together with negative generative log likelihoods nlogll(Dj, mk, sl)
for all input sequences, sl, l = 1, ..., 244.
Figure 6: Visualization of clusters over human sleep dataset
using gwDTW as similarity measure. Coordinates are
Weibull shape and scale parameters for Wake stage. Red
circles and blue triangles denote gwDTW clusters; back-
ground colors represent DTW clusters.
3.3.3 Human Sleep Data Results
Fig. 6 shows the two CDMC clusters (circles and tri-
angles) with coordinates equal to the Weibull scale
and shape parameters for the wake stage in the WNR
dataset. gwDTW clusters better capture the bound-
ary between the natural Weibull dynamical clusters in
the human sleep dataset, as compared with standard
DTW clusters.
Model fit was significantly better for both global weighted DTW (gwDTW) clustering and stepwise
deviation-based DTW (sdDTW) clustering as compared with standard DTW-driven clustering, as shown
in Table 2. This indicates that deviation-based DTW outperforms standard DTW as a similarity
metric for initialization of CDMC clustering over human sleep data, as well as for standalone
clustering over synthetic data as shown in section 3.2.3.
Table 2: Median negative log likelihoods of gwDTW, sdDTW, and standard DTW clusterings over WNR,
WDL, and W5 human sleep datasets in section 3.3.1. Asterisks indicate Bonferroni-corrected
significance of differences with standard DTW in Wilcoxon rank sum test (p < 0.05).
        DTW     gwDTW   sdDTW
WNR     150.7   148.2*  147.9*
WDL     159.4   157.9*  158.1*
W5      194.1   192.3*  191.9*
4 CONCLUSIONS
This paper proposes two versions of a modified dy-
namic time warping (DTW) approach for comparing
discrete time series such as human sleep sequences:
global weighted dynamic time warping (gwDTW)
and stepwise deviation-based dynamic time warping
(sdDTW). Both versions penalize deviations from the
path of constant slope in the warping space, yielding
the efficiency advantages of DTW approaches based on global constraints such as the Itakura
parallelogram or the Sakoe-Chiba band, while better accounting for local deviations. gwDTW adds
a deviation-based term to the standard DTW distance metric. sdDTW adds a deviation term into the
local cost function that drives the DTW dynamic programming optimization itself, yielding an
improved warping path together with a similarity metric. Both gwDTW and sdDTW lead to
significantly better clustering results than DTW in a classification task over labeled synthetic
Markov mixture data, as well as in unsupervised clustering of human sleep data. The authors learned
of an interesting “salient feature” approach to con-
strained DTW (Candan et al., 2012) after completing
the work reported in the present paper. The salient
feature approach extracts features of the input se-
quences that are then used to define locally adaptive
constraints on the warping path. In future work, it
would be desirable to pursue a performance compar-
ison of the salient feature approach of (Candan et al.,
2012) with that of the present paper.
ACKNOWLEDGEMENTS
The authors thank the anonymous referees for com-
ments that helped improve the legibility of the paper,
and for making us aware of (Candan et al., 2012).
REFERENCES
Alvarez, S. A. and Ruiz, C. (2013). Collective probabilis-
tic dynamical modeling of sleep stage transitions. In
Proc. Sixth International Conference on Bio-inspired
Systems and Signal Processing (BIOSIGNALS 2013),
Barcelona, Spain.
Bianchi, M. T., Cash, S. S., Mietus, J., Peng, C.-K.,
and Thomas, R. (2010). Obstructive sleep apnea al-
ters sleep stage transition dynamics. PLoS ONE,
5(6):e11356.
Candan, K. S., Rossini, R., Wang, X., and Sapino, M. L.
(2012). sDTW: computing DTW distances using
locally relevant constraints based on salient feature
alignments. Proceedings of the VLDB Endowment,
5(11):1519–1530.
Chu-Shore, J., Westover, M. B., and Bianchi, M. T. (2010).
Power law versus exponential state transition dynam-
ics: application to sleep-wake architecture. PLoS
ONE, 5(12):e14204.
Clifford, D., Stone, G., Montoliu, I., Rezzi, S., Martin, F.-P.,
Guy, P., Bruce, S., and Kochhar, S. (2009). Alignment
using variable penalty dynamic time warping. Analyt-
ical Chemistry, 81(3):1000–1007.
Dijk, D. J. and Lockley, S. W. (2002). Invited review:
Integration of human sleep-wake regulation and cir-
cadian rhythmicity. Journal of Applied Physiology,
92(2):852–862.
Itakura, F. (1975). Minimum prediction residual principle
applied to speech recognition. IEEE Transactions on
Acoustics, Speech, and Signal Processing, 23(1):67–
72.
Jeong, Y.-S., Jeong, M. K., and Omitaomu, O. A. (2011).
Weighted dynamic time warping for time series clas-
sification. Pattern Recognition, 44(9):2231–2240.
Kishi, A., Struzik, Z. R., Natelson, B. H., Togo, F., and
Yamamoto, Y. (2008). Dynamics of sleep stage tran-
sitions in healthy humans and patients with chronic
fatigue syndrome. American Journal of Physiology-
Regulatory, Integrative and Comparative Physiology,
294(6):R1980–R1987.
Müller, M. (2007). Dynamic time warping. Information
retrieval for music and motion, pages 69–84.
Oates, T., Firoiu, L., and Cohen, P. R. (2001). Using dy-
namic time warping to bootstrap HMM-based cluster-
ing of time series. In Sequence learning: Paradigms,
algorithms, and applications, pages 35–52. Springer-
Verlag.
Ratanamahatana, C. A. and Keogh, E. (2004). Making time-
series classification more accurate using learned con-
straints. In Proceedings of SIAM International Con-
ference on Data Mining (SDM ’04), pages 11–22.
Sakoe, H. and Chiba, S. (1978). Dynamic programming
algorithm optimization for spoken word recognition.
IEEE Transactions on Acoustics, Speech and Signal
Processing, 26(1):43–49.
Wang, C., Alvarez, S. A., Ruiz, C., and Moonis, M.
(2014). Semi-Markov modeling-clustering of hu-
man sleep with efficient initialization and stopping.
In Proc. Seventh International Conference on Bio-
Inspired Systems and Signal Processing (BIOSIG-
NALS 2014), Barcelona, Spain.