Actigraphic Sleep Detection for Real-World Data of Healthy Young Adults and People with Alzheimer’s Disease

Stefan Lüdtke, Albert Hein, Frank Krüger, Sebastian Bader and Thomas Kirste

Mobile Multimedia Information Systems Group, Institute of Computer Science, University of Rostock,

18051 Rostock, Germany

Keywords:

Sleep Detection, Actigraphy, Hidden Markov Model, Machine Learning, Dementia.

Abstract:

Actigraphy can be used to examine the sleep pattern of patients during the course of the day in their com-

mon environment. However, conventional sleep detection algorithms may not be appropriate for real-world

daytime sleep detection, since they tend to overestimate the sleep duration and have only been validated for

nighttime sleep in a laboratory setting. Therefore, we evaluated the performance of a set of new sleep detection

algorithms based on machine learning methods in a real-world setting and compared them to two conventional

sleep detection algorithms (Cole’s algorithm and Sadeh’s algorithm). For that, we performed two studies with

(1) healthy young adults and (2) nursing home residents with Alzheimer’s dementia. The conventional algo-

rithms performed poorly for these real-world data sets, because they are imbalanced with respect to sensitivity

and specificity. A more balanced Hidden Markov Model-based algorithm surpassed the conventional algorithms for both data sets. Using this algorithm leads to an improved accuracy of 4.1 percentage points (pp) and

23.5 pp, respectively, compared to the conventional algorithms. The Youden-Index improved by 7.3 and 7.7,

respectively. Overall, for a real-world setting, the HMM-based algorithm achieved a performance similar to

conventional algorithms in a laboratory environment.

1 INTRODUCTION

People with dementia often suffer from a disturbed

circadian rhythm manifesting in sleep disorders (Mc-

Curry and Ancoli-Israel, 2003). These sleep disor-

ders are positively correlated with poor health, cogni-

tive impairment and mortality (Ancoli-Israel, 2009).

To apply treatments, it is necessary to know the day-

time sleep pattern of the patients. For example, for

patients having a delayed circadian rhythm, morning

bright light therapy can be beneﬁcial (Mishima et al.,

1994).

However, the gold standard method for

sleep/wake scoring, polysomnography (PSG),

can only be performed in a sleep laboratory, requires

a number of electrodes to be attached to the patient’s

skin, and the data must be evaluated by a trained

expert based on standardized rules (Rechtschaffen

and Kales, 1968). Actigraphy, on the other hand, is a

noninvasive tool for sleep detection that can also be

applied in a non-clinical environment, for example in

the subjects’ homes or care facilities. Furthermore,

using actigraphy, it is possible to record sleep patterns

over longer periods of time.

A number of algorithms for actigraphic

sleep/wake detection have been proposed (Cole

et al., 1992; Sadeh et al., 1989; Kushida et al., 2001;

Paquet et al., 2007; Nakazaki et al., 2014). The

standard procedure for validating these algorithms

is comparison with PSG. However, algorithms

validated this way may not be applicable for daytime

sleep detection in a real-world environment, because

subjects undergoing PSG suffer from the so-called

ﬁrst night effect (the effect may actually last longer

than one night (Le Bon et al., 2001)). This means

that, because of the change in environment and the

knowledge of being under observation, subjects show

an aberrant sleep behaviour. Furthermore, Martin and Hakim note that the validity of daytime sleep estimation

of conventional algorithms is limited (Martin and

Hakim, 2011).

Therefore, we investigated the performance of

sleep detection algorithms in a non-laboratory en-

vironment. For this purpose, data of two differ-

ent subject groups has been recorded: (1) healthy

young adults, and (2) nursing home residents with

Alzheimer’s dementia. We proposed a set of new al-

gorithms based on different machine learning meth-

ods (Linear Discriminant Analysis (LDA), Logistic

Regression (LR), Support Vector Machine (SVM),

Lüdtke S., Hein A., Krüger F., Bader S. and Kirste T.

Actigraphic Sleep Detection for Real-World Data of Healthy Young Adults and People with Alzheimer’s Disease.

DOI: 10.5220/0006158801850192

In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017), pages 185-192

ISBN: 978-989-758-212-7


Table 1: Performance of different algorithms for sleep/wake classification. Sens.: Sensitivity, Spec.: Specificity, Acc.: Accuracy (all in %), SD: Sleep disorders.

Author                     Sens.  Spec.  Acc.  Subjects                            Algorithm
(Nakazaki et al., 2014)    90     65     85    34 healthy subjects                 Nakazaki’s
(Sadeh et al., 1989)       88     76     86    4 healthy subjects                  Sadeh’s
(Paquet et al., 2007)      95     54     91    100 healthy subjects                Kushida’s
(Cole et al., 1992)        95     65     88    15 healthy, 26 subjects with SD     Cole’s
(Hedner et al., 2004)      89     69     84    228 subjects with sleep apnea       Hedner’s
(Kushida et al., 2001)     92     48     77    100 subjects with SD                Kushida’s
(Paquet et al., 2007)      96     45     84    23 subjects with sleep deprivation  Kushida’s
(Sadeh et al., 1989)       92     56     86    25 subjects with sleep apnea        Sadeh’s
(Sadeh et al., 1989)       95     48     78    16 subjects with insomnia           Sadeh’s
(Taibi et al., 2013)       96     36     76    16 subjects with insomnia           Kushida’s
(Domingues et al., 2014)   76     82     78    29 healthy subjects                 Domingues’
(Orellana et al., 2014)    98     73     93    119 healthy adolescents             Orellana’s
(Tilmanne et al., 2009)    92     58     82    354 infants                         Tilmanne’s

Hidden Markov Model (HMM)) and compared their

performance with the performance of two conven-

tional algorithms for sleep detection (Cole’s algo-

rithm and Sadeh’s algorithm).

2 RELATED WORK

In this section, algorithms for sleep/wake scoring are

presented, as well as studies investigating the per-

formance of these algorithms with different subject

groups.

Conventional sleep detection algorithms use Ac-

tivity Counts as input. Activity Counts are arbitrary

features of a raw acceleration signal, typically com-

puted for one-minute timeframes. They are generated

as follows: First, the acceleration signal (typically

sampled at 32 - 128 Hz (Van Someren et al., 1996))

is bandpass-ﬁltered (e.g. using a 0.5 - 11 Hz band-

pass ﬁlter (Van Someren et al., 1996)). Afterwards,

Activity Counts are generated from the ﬁltered sig-

nal by either counting the number of samples above

a threshold (time above threshold, TAT), counting the

number of zero crossings (ZC) or computing the sum

of the magnitude of all signal values (digital integra-

tion, DI) (Ancoli-Israel et al., 2003). These Activity

Counts are calculated on board the proprietary acti-

graphic devices, which makes replication of results

difﬁcult when a different actigraphic device is used.

For example, de Souza et al. reimplemented two

conventional sleep detection algorithms (Cole et al.,

1992; Sadeh et al., 1989), and obtained signiﬁcantly

different results (de Souza et al., 2003).
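The three ways of deriving Activity Counts described above can be sketched as follows. This is a minimal illustration, not the firmware of any actigraphic device: the function name `activity_counts`, its parameters, and the choice to compare the absolute sample value against the threshold are assumptions made here, and the input is expected to be already bandpass-filtered.

```python
def activity_counts(filtered, samples_per_epoch, threshold):
    """Compute TAT, ZC and DI for each complete epoch of a filtered signal.

    `filtered` is a list of bandpass-filtered acceleration samples.
    All names and the use of abs() for TAT are illustrative assumptions.
    """
    counts = []
    last_start = len(filtered) - samples_per_epoch
    for start in range(0, last_start + 1, samples_per_epoch):
        epoch = filtered[start:start + samples_per_epoch]
        # Time above threshold: samples whose magnitude exceeds the threshold.
        tat = sum(1 for x in epoch if abs(x) > threshold)
        # Zero crossings: sign changes between consecutive samples.
        zc = sum(1 for a, b in zip(epoch, epoch[1:]) if (a < 0) != (b < 0))
        # Digital integration: sum of the magnitudes of all samples.
        di = sum(abs(x) for x in epoch)
        counts.append({"TAT": tat, "ZC": zc, "DI": di})
    return counts
```

For a 100 Hz signal and one-minute timeframes, `samples_per_epoch` would be 6000. Because the three modes summarize different aspects of the same signal, counts from devices using different modes are not directly comparable.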

In the following, two conventional algorithms (Cole’s algorithm and Sadeh’s algorithm) for sleep detection are presented. These are also used in this study as a reference. They have been chosen

because of their wide distribution (Martin and Hakim,

2011) and reported performance.

Cole et al. (Cole et al., 1992) proposed an algorithm that uses one-minute-timeframe Activity Counts $A_i$ for classification. For the classification of minute $i$, a linear combination of the Activity Counts from the four previous to the two following minutes is computed:

$$D_i = 0.00001\,(404A_{i-4} + 598A_{i-3} + 326A_{i-2} + 441A_{i-1} + 1408A_i + 508A_{i+1} + 350A_{i+2})$$

The coefﬁcients have been identiﬁed by linear regres-

sion.
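The weighted sum used by Cole’s algorithm can be written down directly from the coefficients above. The following is a sketch only: the function names are hypothetical, and the decision rule (score below 1.0 means sleep) is an assumption, since the text here does not state the threshold value.

```python
# Weights for A[i-4] ... A[i+2], taken from the equation above.
COLE_WEIGHTS = (404, 598, 326, 441, 1408, 508, 350)
SCALE = 0.00001


def cole_score(counts, i):
    """Weighted sum over the window from minute i-4 to minute i+2."""
    window = counts[i - 4:i + 3]
    return SCALE * sum(w * a for w, a in zip(COLE_WEIGHTS, window))


def cole_classify(counts, threshold=1.0):
    """Label each minute; minutes without a full window get None.

    Assumption: a score below `threshold` is scored as sleep.
    """
    labels = []
    for i in range(len(counts)):
        if i < 4 or i + 2 >= len(counts):
            labels.append(None)  # window would reach outside the recording
        elif cole_score(counts, i) < threshold:
            labels.append("sleep")
        else:
            labels.append("awake")
    return labels
```

The current minute carries by far the largest weight (1408), so the neighbouring minutes mainly act as a smoothing term.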

Sadeh et al. (Sadeh et al., 1989) also use one-

minute-timeframe Activity Counts for classiﬁcation,

but compute higher-level features of these Activity

Counts before computing a linear model.

$$\mathit{PS}_i = 4.532 - 0.06828A_i - 0.0385\,\mathrm{sd}(A_{i-5},\ldots,A_{i-1}) - 0.038\,\mathrm{sd}(A_{i-9},\ldots,A_{i-1}) + 0.0298\,\min(A_{i+1},A_{i+2}) - 0.0299\,\mathrm{sd}(A_{i-2},A_{i-1})$$

The classification is obtained by applying a threshold to $D_i$ or $\mathit{PS}_i$, respectively. There are other

sleep detection algorithms that rely on the same basic

ideas. The algorithms presented in (Nakazaki et al.,

2014), (Kushida et al., 2001) and (Cook et al., 2004)

work similarly to Cole’s algorithm, but use differ-

ent coefﬁcients for the linear model. Furthermore,

Cook’s algorithm allows the classification threshold to be chosen in order to adapt the algorithm to different subject

groups. For example, a lower threshold leads to a

BIOSIGNALS 2017 - 10th International Conference on Bio-inspired Systems and Signal Processing


Figure 1: Sensor Bracelet.

higher speciﬁcity for subjects with insomnia (Lich-

stein et al., 2006).

The performance achievable with these algorithms

heavily depends on the subject group (Ancoli-Israel

et al., 2003). Table 1 lists performance results for the

algorithms. For healthy subjects, an accuracy of over

85 % can be achieved. Typically, the sensitivity (frac-

tion of data with class “sleep” correctly classiﬁed as

“sleep”) of the algorithms is signiﬁcantly higher than

the speciﬁcity (fraction of data with class “awake”

correctly classified as “awake”); that is, the algorithms tend

to overestimate the sleep state (Ancoli-Israel et al.,

2003).

For subjects with sleep disorders (e.g. sleep ap-

nea), and especially for subjects with insomnia, the

achievable speciﬁcity and therefore the accuracy is

signiﬁcantly lower than for healthy subjects. The

low speciﬁcity occurs because these subjects spend

a greater portion of the night lying awake without

movement, which is difﬁcult to classify correctly.

Recently, a number of new algorithms for sleep

detection have been proposed that do not rely on lin-

ear models, but on decision trees (Taibi et al., 2013),

Artiﬁcial Neural Networks (Orellana et al., 2014) or

Hidden Markov Models (Domingues et al., 2014).

Domingues et al. address the imbalance between sensitivity and specificity by optimizing not the accuracy, but the geometric mean of sensitivity and specificity. Orellana et al. address this problem by repeating the less frequent

class in the training data until both classes have the

same frequency in the training data set.

3 METHODS

3.1 Data Acquisition

Actigraphic data has been recorded by a custom wrist-

worn device (Grey Innovation, Melbourne, Australia,

cf. Figure 1). This device contains a 3-axis accelerometer (sampled at 100 Hz), a 3-axis gyroscope

(sampled at 100 Hz) as well as two thermometers for

reference and skin temperature (sampled at 0.1 Hz).

Four healthy, young adults, as well as nine older

subjects with Alzheimer’s dementia participated in

this study. The healthy subjects (age 23.5 ± 1.9 years,

1 Female, 3 Males) participated in the study for ﬁve

days each. The sensors were worn by the subjects on

either the wrist or ankle from the afternoon until the

next morning, therefore sleep and wake periods are

present in every recording. The different recording

positions have been chosen to compare the suitability

of these recording positions for sleep detection. The

sleep/wake annotation for this data has been acquired

by a sleep diary recorded by the subjects. The dura-

tion of recorded sensor data for every subject, as well

as the duration of annotated data, is listed in Table

2. In total, 194.4 h of sensor data of healthy subjects

have been recorded. The subjects have been sleeping

45.6 % of the recording time.

The subjects with Alzheimer’s dementia (age 78.4

± 2.9 years, 6 Females, 3 Males) participated in this

study for 24 days each. All of these subjects lived in

care facilities during the course of the study. The sen-

sors have been applied to the wrist and ankle of each

subject by caregivers in the morning. The battery of

the bracelet lasted for about 8 hours, so that valid sen-

sor data are available each day from about 08:00 to

16:00. Night-time data of subjects with dementia has

not been considered, because no annotations are avail-

able at night. The sleep/wake annotation for these

subjects has been acquired by Dementia Care Map-

ping (DCM) (Sloane et al., 2007). DCM has been car-

ried out only for a fraction of the total recording time.

715.9 h of sensor data of subjects with dementia have

been recorded. DCM annotation has been performed

for 169.9 h (or 23.7 % of the total recording time). 7.1

h of the data of the subjects with dementia have been

annotated with the class sleep (4.2 % of the annotated

samples).

3.2 Preprocessing and Feature

Extraction

Two preprocessing operations are performed on the

data: First, the magnitude of the accelerometer and

gyroscope signals are computed. Subsequently, the


Table 2: Minutes of sensor data, annotated data and minutes annotated with sleep, for each healthy subject (H) and subject with dementia (D).

Subject      H1    H2    H3    H4    D1    D2    D3    D4    D5    D6    D7    D8    D9
Sensor       4955  4261  4202  4849  8762  8429  9103  9310  6211  9108  8280  6302  9880
Annotation   4955  4261  4202  4848  1174  1352  859   532   900   882   1157  1912  1424
Sleep        2279  2314  2379  1785  0     195   20    30    112   30    5     0     35

resulting signals are ﬁltered with a 0.5 - 11 Hz Butter-

worth bandpass ﬁlter. The ﬁlter bandwidth has been

chosen according to (Van Someren et al., 1996). On

this preprocessed data, the following 37 features are

calculated for one-minute timeframes (an example of

different features for one recording is depicted in Fig-

ure 2):

• Statistical Features: Mean, variance, skewness,

kurtosis, median, 10-, 25-, 75-, and 90-percentile,

Shannon Entropy and energy of the preprocessed

sensor data.

• Features based on Activity Counts conventionally

used with sleep detection algorithms (Cole et al.,

1992; Sadeh et al., 1989): Threshold crossing rate

(thresholds 0.1 g and 1.8 °/s, respectively), relative frequency of samples above threshold.

• Lowpass-filtered versions of the mean and of the Activity Count-based features, using the filter kernel (1, 2, 4, 8, 16, 8, 4, 2, 1)/46. This way, the smoothing that the conventional algorithms perform by including feature values of adjacent time frames in the linear model is applied directly to the data.

• The angle between the acceleration vector in the

beginning and at the end of a time frame (Bieber

et al., 2014).
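The lowpass filtering of a per-minute feature series with the kernel listed above can be sketched as a weighted moving average. The function name `smooth` and the edge handling (repeating the first and last value) are assumptions; the text does not specify how the borders of a recording are treated.

```python
KERNEL = (1, 2, 4, 8, 16, 8, 4, 2, 1)
NORM = sum(KERNEL)  # 46, so that the weights sum to one


def smooth(values):
    """Apply the normalized 9-tap kernel to a list of per-minute feature values.

    Assumption: indices outside the recording are clamped to the nearest
    valid index, i.e. the border values are repeated.
    """
    half = len(KERNEL) // 2
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        for k, weight in enumerate(KERNEL):
            j = min(max(i + k - half, 0), len(values) - 1)
            acc += weight * values[j]
        smoothed.append(acc / NORM)
    return smoothed
```

Each smoothed value thus depends on the four preceding and four following minutes, which gives the single-frame classifiers access to some temporal context.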

3.3 Classiﬁcation Algorithms

In this section, we describe the necessary adaptations to

Cole’s algorithm and Sadeh’s algorithm to apply them

to our data, as well as the proposed algorithms based

on machine-learning algorithms. The conventional

algorithms are based on the processing of Activity

Counts. However, the Activity Counts described in

the literature cannot be reproduced directly, because

of the different sensor modalities. Instead, all of the

features described above have been tested as input of

the algorithms. Furthermore, because we did not use

the original Activity Counts, the original coefﬁcients

of the linear model cannot be used. Therefore, the

coefﬁcients of the algorithms have been recalculated

using linear regression.

As described above, the class-distribution in our

data is imbalanced: For the subjects with dementia,

Figure 2: Example of some of the features for one recording. The different quality of the features for classification can be seen (note the differences in the features for sleep and awake periods). TAT: Time above threshold, TCR: Threshold crossing rate, 10P: 10-Percentile, Sleep: Sleep annotation. [Panels: Mean, 10P, TAT, TCR, Sleep; x-axis: Time; legend: Sleeping (No/Yes).]

the class awake occurs 95 % of the time. Algorithms

trained with these data can easily achieve a high ac-

curacy by always choosing the class awake. Prelim-

inary tests showed that the conventional algorithms,

trained with the data of the subjects with dementia,

indeed classify all samples as awake. However, this

behaviour is not desired for algorithms that should

be able to detect daytime sleep. Therefore, we use

stratiﬁed oversampling (Chawla, 2005) of the train-

ing data, similar to (Orellana et al., 2014). This means

that samples from the less frequent class are repeated,

until both classes have the same prior probability. For

the conventional algorithms, this resampling step has

to keep the sequential order of Activity Counts. This

is done by ﬁrst generating the set of activity counts

for each time frame and then resampling on these

sets. The LDA, LR and SVM are also trained using

the resampled training data. These algorithms do not

use adjacent feature values as input (as the conventional algorithms do), but only the current feature value.
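Oversampling by repetition, as described above, can be sketched in a few lines. The function name `oversample` and the use of random draws with a fixed seed are assumptions for illustration. For the conventional algorithms, each element of `samples` would be the complete window of Activity Counts belonging to one time frame, so that the sequential order inside a window is preserved.

```python
import random


def oversample(samples, labels, seed=0):
    """Repeat samples of the less frequent classes until all classes are equally frequent.

    Repeated samples are drawn at random with replacement (an assumption;
    any repetition scheme that equalizes the class frequencies would do).
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

Only the training folds should be resampled in this way; the test data keep their original class distribution.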

The HMM consists of two states, awake and

sleeping. The transition matrix is computed by count-

ing the relative frequency of state transitions in the

training data. For the observation model, a multivariate logarithmic normal distribution has been chosen.

Figure 3: Histogram of sensor data, one recording, class awake (top) and class sleeping (bottom), logarithmic scale. Red line: Empirical log-normal distribution. [Axes: log(a) versus frequency.]

This choice is based on the observation that many

processes associated with human movement are log-normally distributed (Zhang and Popp, 1994). Furthermore, the log-normal distribution is a good representation of the real distribution of the sensor data (cf.

Figure 3). The parameters of the observation distribu-

tion have been estimated from the training data using

maximum-likelihood estimators. The prior probabil-

ities of the classes are estimated as the relative fre-

quency of the classes in the training data. The classi-

ﬁcation is acquired by computing the most probable

state sequence using the Viterbi Algorithm (Viterbi,

1967). We used an HMM because of its ability to

model temporal relations. We suspect that sensor data

of the current time frame may not be sufﬁcient for

sleep/wake discrimination, because short periods of

inactivity may not automatically mean that the sub-

ject was sleeping.
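The two-state HMM described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: it uses a single positive feature per minute (a univariate log-normal observation model instead of the multivariate one), the names `fit_hmm` and `viterbi` are hypothetical, and it assumes that both states occur in the training data with non-zero variance and at least one outgoing transition.

```python
import math

STATES = ("awake", "sleeping")


def fit_hmm(features, labels):
    """Estimate priors, transition probabilities and log-normal parameters."""
    prior = {s: labels.count(s) / len(labels) for s in STATES}
    trans = {a: {b: 0 for b in STATES} for a in STATES}
    for a, b in zip(labels, labels[1:]):
        trans[a][b] += 1  # count state transitions
    for a in STATES:
        total = sum(trans[a].values())
        trans[a] = {b: trans[a][b] / total for b in STATES}
    emit = {}
    for s in STATES:
        logs = [math.log(x) for x, y in zip(features, labels) if y == s]
        mu = sum(logs) / len(logs)
        var = sum((v - mu) ** 2 for v in logs) / len(logs)  # ML estimate
        emit[s] = (mu, var)
    return prior, trans, emit


def log_lognormal(x, mu, var):
    """Log density of a log-normal distribution at x > 0."""
    lx = math.log(x)
    return -lx - 0.5 * math.log(2 * math.pi * var) - (lx - mu) ** 2 / (2 * var)


def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(features, prior, trans, emit):
    """Most probable state sequence, computed in log space."""
    score = {s: safe_log(prior[s]) + log_lognormal(features[0], *emit[s])
             for s in STATES}
    back = []
    for x in features[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: score[r] + safe_log(trans[r][s]))
            new_score[s] = (score[best] + safe_log(trans[best][s])
                            + log_lognormal(x, *emit[s]))
            pointers[s] = best
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # follow the back pointers
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Because state changes are rare in the training data, the estimated transition probabilities penalize switching, so a short period of inactivity is less likely to be labelled as sleep than with a frame-by-frame classifier.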

3.4 Performance Evaluation and

Experimental Design

In the case of imbalanced class distribution (as in

our data of the subjects with dementia), accuracy

as performance measure is not sufﬁcient (Chawla,

2005). A classiﬁer that overestimates the more fre-

quent class can achieve a high accuracy while having

a poor ability to detect the less frequent class (sleep,

in our case). Therefore, the performance measures

sensitivity (fraction of data with class sleep correctly

classiﬁed as sleep) and speciﬁcity (fraction of data

Table 3: Factors and levels of experimental design.

Factor      Levels
Subjects    H (Healthy), D (Subj. with Dementia)
Position    W (Wrist), A (Ankle)
Algorithm   Cole, Sadeh, LDA, LR, SVM, HMM
Features    Single, PCA, 5 best

with class awake correctly classiﬁed as awake) are of

greater interest. A combination of these two measures

is the Youden-Index $J = \text{sensitivity} + \text{specificity} - 1$

(Youden, 1950). Because the Youden-Index gives a

balanced impression of both the sleep and wakeful-

ness detection ability of the algorithm, it is used as

the primary performance measure in this study.
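The performance measures defined above can be computed from per-minute labels as in the following sketch. The function name and the label strings are illustrative assumptions, and the sketch assumes that both classes occur in the reference annotation.

```python
def sleep_wake_metrics(truth, predicted):
    """Sensitivity, specificity, accuracy and Youden-Index with sleep as the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == "sleep" and p == "sleep")
    fn = sum(1 for t, p in pairs if t == "sleep" and p == "awake")
    tn = sum(1 for t, p in pairs if t == "awake" and p == "awake")
    fp = sum(1 for t, p in pairs if t == "awake" and p == "sleep")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "youden": sensitivity + specificity - 1,
    }
```

A classifier that always answers awake on data with 20 % sleep reaches an accuracy of 0.8 but a Youden-Index of 0, which is why accuracy alone is not informative for imbalanced data.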

We used a factorial design for this study. The

factors and levels are depicted in Table 3. The fac-

tors Subjects and Position represent the used data set

(healthy subjects or subjects with dementia, and the

respective recording position). The factor Algorithm

represents the classiﬁcation algorithm (either Cole’s

algorithm, Sadeh’s algorithm or one of the algorithms

based on LDA, LR, SVM or HMM). The factor Fea-

tures represents the used set of features. Every feature

was tested as univariate input of the algorithms. Fur-

thermore, we computed principal components of the

features and used the ﬁrst k = 1, ...,37 components.

Moreover, we used the feature combination of the 5

features that achieved the highest Youden-Index when

used univariately with the respective algorithm.

This experimental design results in $2 \cdot 2 \cdot 6 \cdot (37 + 37 + 1) = 1800$ configurations. The performance of

every conﬁguration was assessed using leave-one-

subject-out cross validation.
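The number of configurations and the leave-one-subject-out splitting can be illustrated with the following sketch. The generator name and the subject identifiers in the usage note are hypothetical; the counts follow Table 3 and the 37 features of Section 3.2.

```python
N_FEATURES = 37
# 37 single features, 37 PCA variants (first k components), 1 "5 best" set.
N_FEATURE_SETS = N_FEATURES + N_FEATURES + 1
# 2 subject groups, 2 sensor positions, 6 algorithms.
n_configurations = 2 * 2 * 6 * N_FEATURE_SETS


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for every subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

With one subject label per minute, for example `["H1", "H1", "H2", "H3", "H3"]`, each fold trains on all minutes of the remaining subjects and tests on the minutes of the held-out subject, so no subject contributes to both training and test data of the same fold.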

4 RESULTS

In this section, the results of the experiments outlined

in section 3.4 are presented. Examples for the classiﬁ-

cation of a recording of a healthy subject and a subject

with dementia are depicted in Figures 4 and 5. The results obtained by leave-one-subject-out cross validation are summarized in Table 4. In this table, the best result for every data set and algorithm is reported,

i.e. the result of the feature combination achieving

the highest Youden-Index. For the healthy subjects

as well as for the subjects with dementia, the maxi-

mum performance of each algorithm was higher when

recording sensor data on the wrist instead of the ankle.

The reason for this is that people tend to move their

hands more than their feet in phases of rest (e.g. sitting). This result is consistent with (Middelkoop et al.,


Figure 4: Example classification of one recording period for healthy subjects. Data: Raw acceleration values (lowpass-filtered for depiction). Sleep: Sleep annotations. HMM, Cole, Sadeh: Classification obtained by respective algorithms. In this example, the HMM shows a good classification, while Cole’s and Sadeh’s algorithms overestimate the state sleeping and the number of state transitions. [Rows: Data, Sleep, HMM, Cole, Sadeh; x-axis: Time; legend: Sleeping (No/Yes).]

Figure 5: Example classification of one recording period of subjects with dementia. Data: Raw acceleration values (lowpass-filtered for depiction). Sleep: Sleep annotations. HMM, Cole, Sadeh: Classification obtained by respective algorithms. In this example, the HMM shows a balanced sleep/wake classification, while Cole and Sadeh overestimate the state sleeping. [Rows: Data, Sleep, HMM, Cole, Sadeh; x-axis: Time; legend: Sleeping (No/Yes).]

1997). Therefore, in the following, only the wrist data

sets are considered, and for the ankle data, only the

best result on each data set is reported in Table 4.

When comparing the different feature sets that

lead to the highest Youden-Indices, two classes of fea-

tures lead to particularly high performances: Features

based on Activity Counts, as well as lowpass-ﬁltered

features (lowpass-ﬁltered statistical features, and

lowpass-ﬁltered features based on Activity Counts).

Using more than one feature (i.e. a multivariate fea-

ture set) has not led to an increase in performance for

the conventional algorithms and the HMM-based al-

gorithm. However, for the other machine learning-

based algorithms, the highest performances could be

achieved using multivariate features. This can be ex-

plained by differences of the algorithms in dealing

with correlated features.

On both data sets, the HMM-based algorithm

achieved the highest accuracy and the highest

Youden-Index. For the healthy subjects, the conven-

tional algorithms achieved a very high sensitivity (>

98 %), and a specificity of 77.5 % or 86.3 %. The sensitivity and specificity of the HMM-based algorithm are more balanced. Therefore, by using the HMM-based algorithm, the Youden-Index improved by 7.3 and the accuracy improved by 4.1 percentage points (pp), compared to the conventional algorithms.

For the subjects with dementia, the performance

of all algorithms is signiﬁcantly lower. The conven-

tional algorithms could achieve a sensitivity of over

90 %, but a speciﬁcity of only ≈ 45 %. The low speci-

ﬁcity is typical for the conventional algorithms and

caused by the inability of the algorithms to distin-

guish short periods of rest from sleep. In contrast

to other studies, the subjects in this study have been

awake for most of the recording time. Therefore,

the low speciﬁcity has a great impact on the accu-

racy: The conventional algorithms could only achieve


Table 4: Performance of tested sleep/wake detection algorithms (all values in %). For every instance, the result of the feature combination achieving the highest Youden-Index is reported. HW: Healthy subjects, wrist position. HA: Healthy subjects, ankle position. DW: Subjects with dementia, wrist position. DA: Subjects with dementia, ankle position.

Data  Method  Sens.  Spec.  Acc.  Youden
HW    Cole    99.6   77.5   87.8  77.1
HW    Sadeh   98.1   86.3   91.9  84.4
HW    LDA     97.3   84.5   90.1  81.9
HW    LR      91.2   94.5   92.0  85.6
HW    SVM     98.0   89.6   93.6  87.6
HW    HMM     96.5   95.2   96.2  91.7
DW    Cole    94.7   45.8   48.6  40.5
DW    Sadeh   93.3   45.7   48.4  39.1
DW    LDA     74.6   63.5   64.7  38.1
DW    LR      71.8   65.4   66.3  37.1
DW    SVM     78.6   56.7   58.5  35.2
DW    HMM     77.0   71.2   72.1  48.2
HA    HMM     91.5   87.9   90.1  79.5
DA    HMM     82.4   58.0   59.0  40.4

an accuracy of ≈ 48 %. The HMM-based algorithm

could again achieve a more balanced result (sensitiv-

ity 77.0 %, speciﬁcity 71.2 %), which improved the

Youden-Index by 7.7 and the accuracy by 23.5 pp.

Therefore, the HMM-based algorithm is superior to

the conventional algorithms in all cases, considering

Youden-Index and accuracy.
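The link between specificity and accuracy on the dementia data can be made explicit with a rough calculation. As a sketch, assuming that sensitivity and specificity behave like pooled rates and taking the sleep prevalence $\pi \approx 0.042$ from Section 3.1 (both are simplifications introduced here, not part of the evaluation above), accuracy is a prevalence-weighted mean of the two measures:

$$\mathrm{Acc} = \pi \cdot \mathrm{Sens} + (1 - \pi) \cdot \mathrm{Spec}$$

$$\mathrm{Acc}_{\mathrm{Cole}} \approx 0.042 \cdot 0.947 + 0.958 \cdot 0.458 \approx 0.479, \qquad \mathrm{Acc}_{\mathrm{HMM}} \approx 0.042 \cdot 0.770 + 0.958 \cdot 0.712 \approx 0.714$$

These estimates are close to, but not identical with, the 48.6 % and 72.1 % reported in Table 4. Because about 96 % of the annotated minutes are awake, the accuracy on this data set is dominated by the specificity.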

5 CONCLUSION

This study investigates sleep detection in a real-world

setting, instead of the laboratory environment used

in previous studies. To compare conventional sleep

detection algorithms and proposed machine learning-

based methods, we performed two studies: One with

healthy young adults and one with nursing home res-

idents with Alzheimer’s disease. In contrast to pre-

vious studies, the data has not been annotated us-

ing PSG, but using subjective information (sleep di-

ary or DCM). This annotation may not be as accurate

as PSG, but allows new insights into how sleep algo-

rithms perform when applied in a real-world situation.

For the healthy subjects, the conventional algo-

rithms achieved a higher performance than in previ-

ous studies with healthy subjects, e.g. (Sadeh et al.,

1989), particularly a higher specificity (about 10 percentage points more for each algorithm). One explanation is the different data set, i.e. the different recording time frame (day- and nighttime) and the different annotation procedure: The subjects themselves

noted their sleep/wake periods, which will not be

completely accurate, especially during the night. This

could lead to an increased reported performance, be-

cause “difﬁcult to detect” wake periods during the

night have been omitted. On the other hand, we

trained the algorithms using features different from

the original Activity Counts, which could also lead

to an increased performance, if the features we used

were more informative for sleep/wake discrimination.

For the subjects with dementia, the sensitivity and

speciﬁcity of the conventional algorithms is compara-

ble to studies with subjects with insomnia, e.g. (Taibi

et al., 2013). This is reasonable, because people

with dementia often suffer from severe sleep disor-

ders (McCurry and Ancoli-Israel, 2003). However,

because of the low speciﬁcity, and because the sub-

jects have been awake for most of the recording time,

the accuracy of the conventional algorithms is very

low for this data set.

The HMM-based algorithm achieves a higher per-

formance (accuracy and Youden-Index) than the con-

ventional algorithms for both data sets. The reason for

this is that this algorithm is more balanced in terms

of sensitivity and speciﬁcity, which means that for a

loss in sensitivity, a higher speciﬁcity can be obtained.

Particularly, for daytime sleep detection, a high specificity is important, because of the strong impact on

accuracy. With this algorithm, on the data of the sub-

jects with dementia, an accuracy that is similar to pre-

vious studies on subjects with insomnia can be ob-

tained (with a lower sensitivity, but a higher speci-

ﬁcity than in previous studies).

Future work might include using other sensor data

than movement for sleep classiﬁcation, for example

the time of the day or the heart rate, which can be

obtained unobtrusively by a pulse oximeter.

ACKNOWLEDGEMENTS

This project was supported by the German Federal

Ministry of Education and Research (BMBF, Funding

number: 16SV7349).

REFERENCES

Ancoli-Israel, S. (2009). Sleep and its disorders in aging

populations. Sleep medicine, 10:S7–S11.

Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., and Pollak, C. (2003). The role of actigraphy in the study of sleep and circadian rhythms. American Academy of Sleep Medicine review paper. Sleep,

26(3):342–392.


Bieber, G., Kirste, T., and Gaede, M. (2014). Low sampling

rate for physical activity recognition. In Proceed-

ings of the 7th International Conference on Perva-

sive Technologies Related to Assistive Environments,

pages 15:1–15:8. ACM.

Chawla, N. V. (2005). Data mining for imbalanced datasets:

An overview. In Data mining and knowledge discov-

ery handbook, pages 853–867. Springer.

Cole, R., Kripke, D., Gruen, W., Mullaney, D. J., and

Gillin, J. C. (1992). Automatic sleep/wake identiﬁ-

cation from wrist activity. Sleep, 15(3):461–469.

Cook, K., Lichstein, K., Donaldson, J., Nau, S., Lester, K.,

and Aguillard, R. (2004). An exploratory validation of

actigraphic measures of insomnia. Sleep, 27:270–270.

de Souza, L., Benedito-Silva, A. A., Pires, M. N., Poyares,

D., Tuﬁk, S., and Calil, H. M. (2003). Further valida-

tion of actigraphy for sleep studies. Sleep, 26(1):81–

85.

Domingues, A., Paiva, T., and Sanches, J. M. (2014). Sleep

and wakefulness state detection in nocturnal actigra-

phy based on movement information. IEEE Transac-

tions on Biomedical Engineering, 61(2):426–434.

Hedner, J., Pillar, G., Pittman, S. D., Zou, D., Grote, L., and

White, D. P. (2004). A novel adaptive wrist actigraphy

algorithm for sleep-wake assessment in sleep apnea

patients. Sleep, 27(8):1560–1566.

Kushida, C. A., Chang, A., Gadkary, C., Guilleminault, C.,

Carrillo, O., and Dement, W. C. (2001). Comparison

of actigraphic, polysomnographic, and subjective as-

sessment of sleep parameters in sleep-disordered pa-

tients. Sleep medicine, 2(5):389–396.

Le Bon, O., Staner, L., Hoffmann, G., Dramaix, M., San Se-

bastian, I., Murphy, J. R., Kentos, M., Pelc, I., and

Linkowski, P. (2001). The ﬁrst-night effect may last

more than one night. Journal of psychiatric research,

35(3):165–172.

Lichstein, K. L., Stone, K. C., Donaldson, J., Nau, S. D.,

Soefﬁng, J. P., Murray, D., Lester, K. W., and Aguil-

lard, R. N. (2006). Actigraphy validation with insom-

nia. Sleep, 29(2):232.

Martin, J. L. and Hakim, A. D. (2011). Wrist actigraphy.

Chest Journal, 139(6):1514–1527.

McCurry, S. M. and Ancoli-Israel, S. (2003). Sleep dysfunction in Alzheimer’s disease and other dementias.

Current Treatment Options in Neurology, 5(3):261–

272.

Middelkoop, H. A., Dam, E. M., Smilde-Van den Doel,

D. A., and Dijk, G. (1997). 45-hour continuous

quintuple-site actimetry: Relations between trunk and

limb movements and effects of circadian sleep-wake

rhythmicity. Psychophysiology, 34(2):199–203.

Mishima, K., Okawa, M., Hishikawa, Y., Hozumi, S., Hori,

H., and Takahashi, K. (1994). Morning bright light

therapy for sleep and behavior disorders in elderly pa-

tients with dementia. Acta Psychiatrica Scandinavica,

89(1):1–7.

Nakazaki, K., Kitamura, S., Motomura, Y., Hida, A.,

Kamei, Y., Miura, N., and Mishima, K. (2014). Valid-

ity of an algorithm for determining sleep/wake states

using a new actigraph. Journal of physiological an-

thropology, 33(1):1.

Orellana, G., Held, C., Estevez, P., Perez, C., Reyes, S.,

Algarin, C., and Peirano, P. (2014). A balanced

sleep/wakefulness classiﬁcation method based on acti-

graphic data in adolescents. In 2014 36th Annual

International Conference of the IEEE Engineering

in Medicine and Biology Society, pages 4188–4191.

IEEE.

Paquet, J., Kawinska, A., and Carrier, J. (2007). Wake

detection capacity of actigraphy during sleep. Sleep,

30(10):1362.

Rechtschaffen, A. and Kales, A. (1968). A manual of stan-

dardized terminology, techniques and scoring system

for sleep stages of human subjects.

Sadeh, A., Alster, J., Urbach, D., and Lavie, P. (1989). Acti-

graphically based automatic bedtime sleep-wake scor-

ing: validity and clinical applications. Journal of Am-

bulatory Monitoring, 2(3):209–216.

Sloane, P. D., Brooker, D., Cohen, L., Douglass, C., Edel-

man, P., Fulton, B. R., Jarrott, S., Kasayka, R., Kuhn,

D., Preisser, J. S., et al. (2007). Dementia care map-

ping as a research tool. International journal of geri-

atric psychiatry, 22(6):580–589.

Taibi, D. M., Landis, C. A., and Vitiello, M. V. (2013). Con-

cordance of polysomnographic and actigraphic mea-

surement of sleep and wake in older women with in-

somnia. J Clin Sleep Med, 9(3):217–225.

Tilmanne, J., Urbain, J., Kothare, M. V., Wouwer, A. V.,

and Kothare, S. V. (2009). Algorithms for sleep–wake

identiﬁcation using actigraphy: a comparative study

and new results. Journal of sleep research, 18(1):85–

98.

Van Someren, E. J., Lazeron, R. H., Vonk, B. F., Mirmiran,

M., and Swaab, D. F. (1996). Gravitational artefact in

frequency spectra of movement acceleration: impli-

cations for actigraphy in young and elderly subjects.

Journal of neuroscience methods, 65(1):55–62.

Viterbi, A. (1967). Error bounds for convolutional codes

and an asymptotically optimum decoding algorithm.

IEEE transactions on Information Theory, 13(2):260–

269.

Youden, W. J. (1950). Index for rating diagnostic tests. Can-

cer, 3(1):32–35.

Zhang, C.-L. and Popp, F.-A. (1994). Log-normal distri-

bution of physiological parameters and the coherence

of biological systems. Medical Hypotheses, 43(1):11–

16.
