Heuristic Approximation of the MAP Estimator for Automatic Two-channel Sleep Staging

Shirin Riazy, Tilo Wendler, Jürgen Pilz, M. Glos and T. Penzel

Hochschule für Technik und Wirtschaft Berlin, Treskowallee 8, 10318 Berlin, Germany

Alpen-Adria Universität Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt am Wörthersee, Austria

Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany

Keywords:

Automatic Sleep Staging, Two-channel Measurement, Bayesian Statistics, Hidden Markov Model, MAP.

Abstract:

In this paper, we introduce an algorithm that classifies EEG data into five sleep stages, relying only on two-channel sleep measurements. The sleep of a patient (divided into intervals of 30 seconds) is assumed to be a Markov chain on the five-element state space of sleep stages, and our aim is to compute the most probable chain of this hidden Markov model by maximum a posteriori (MAP) estimation in the Bayesian framework. Both the prior distribution of the chains and the likelihood model have to be trained on manual classifications made by professionals. For this purpose, the data is first preprocessed by a Fourier transform, a log transform and a principal component analysis for dimensionality reduction. Since the number of possible chains is immense (roughly $10^{335}$), a heuristic approach for the computation of the MAP estimator is introduced that systematically discards unlikely chains. The resulting sleep stage classification is then compared to the classification of a professional, who scores according to the AASM and uses a full polysomnography. The overall structure of the hypnogram can be reconstructed adequately, with error rates around 25%.

1 INTRODUCTION

In the context of a research project of the Hochschule für Technik und Wirtschaft Berlin, the Alpen-Adria Universität Klagenfurt and the Charité Universitätsmedizin Berlin, the authors have investigated a mathematical model for the automatic classification of sleep stages using very few measurements. The measurement technique used for this classification is electroencephalography (EEG), a method for measuring differences of electrical potential on the surface of the head. This method has been widely used for the diagnosis of diseases of the central nervous system, as well as for the analysis of sleep processes. Using EEG data as one of several parameters in a full polysomnography, one can diagnose pathological states of dormancy, various forms of insomnia, as well as dysfunctions of the circadian rhythm. To do so, sleep is classified into sleep stages in order to monitor the course of the sleep throughout the night.

For the classification of sleep stages, EEG data has mostly been used for its visually distinctive signal structures, which lie in certain frequency ranges and can be measured in certain areas of the head, usually at frontal, central and occipital positions. The classification procedure is mostly done by hand, with the help of polysomnographic software. To simplify this process, we propose an algorithm for automatic sleep staging that uses only two-channel measurements as input data.

In our case, the positions $A_1$ and $A_2$, typically measured at or right behind the ears at the so-called mastoids, shall be the only input for the classification, in contrast to a full polysomnography.

1.1 Automatic Two-channel Sleep Staging

The term sleep staging describes the classification of 30-second intervals of sleep (so-called epochs) into different sleep stages. The method for assigning a sleep stage to the measurement of an epoch was first systematized by Rechtschaffen and Kales (Kales et al., 1968) and has since given rise to generations of sleep-staging manuals that contain detailed instructions on how measurements should be matched with a sleep stage. In these manuals, the percentage of certain frequencies, the appearance of certain signal structures and the location of these signal structures play a vital role in the sleep-staging process.


Riazy S., Wendler T., Pilz J., Glos M. and Penzel T.

Heuristic Approximation of the MAP Estimator for Automatic Two-channel Sleep Staging.

DOI: 10.5220/0006242802360241

In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017), pages 236-241

ISBN: 978-989-758-212-7

Figure 1: Averaged power spectrum of each sleep stage with the frequencies given in Hz.

The American Academy of Sleep Medicine’s manual

(Berry et al., 2012) differentiates between ﬁve stages

of sleep: Awake, Stage 1, Stage 2, Stage 3, REM

sleep, where the stages of light sleep (Stage 1 and

Stage 2) are sometimes called NREM 1 and NREM

2 (for Non-REM sleep 1 and 2) and the deep sleep

stage, Stage 3, is sometimes called NREM 3 (for Non-

REM sleep 3).

A measurement of a full night of sleep results in

approximately 960 epochs (8 hours) per patient. Fur-

ther, as recommended by AASM (Berry et al., 2012),

professional scorers rely on a full polysomnography,

which consists of several EEG nodes, EOG deriva-

tions, a Chin EMG and several other parameters for an

optimal rating of the sleep of the patient. The perfor-

mance of these measurements is not only painstaking

and expensive, but can also have a strong inﬂuence on

the sleep of the patient, leading to distorted data.

As the classification of sleep stages by professionals tends to be tedious, there have been various approaches to automate the procedure (Tagliazucchi et al., 2012; Redmond and Heneghan, 2003; Liang et al., 2012; Pan et al., 2012; Wang et al., 2015; Anderer et al., 2010; Gudmundsson et al., 2005), using methods such as SVMs, neural networks or hidden Markov models. Mostly, full polysomnographic data was used and reliabilities ranging between 70% and 90% were achieved (Penzel and Conradt, 2000); however, there has also been automatic single-channel or two-channel sleep staging (Zibrandtsen et al., 2016; Wang et al., 2015; Koley and Dey, 2012). In (Koley and Dey, 2012), the measurements of the electrodes at the mastoid (often referred to as $A_1$ and $A_2$, or $M_1$ and $M_2$) were used for the classification into light sleep, deep sleep and REM sleep, thus offering proof that measurements at the mastoid yield enough reliable information for the classification into three sleep stages. In this paper, we would like to take it one step further by classifying into five different sleep stages, as is usual in sleep medicine. This task proves to be highly difficult, since the transitional Stage 1 is often classified with uncertainty and can easily be mistaken for Stage 2 or Awake.

1.2 Overview and Notation

Our data set consists of $A_1$ and $A_2$ measurements over 8 hours of sleep of several persons with a sampling rate of 256 Hz. This data is divided into blocks of 30 seconds (so-called epochs) that contain $256 \cdot 30 = 7680$ measurements each for $A_1$ and $A_2$:

$$E^{(k)} = \{E^{(k)}_1, \dots, E^{(k)}_T\} \subseteq \mathbb{R}^{7680}, \qquad T \approx 960, \quad k = 1, 2.$$

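For each patient, we divide the set of epochs into a training set and a test set,

$$E = E_{\text{train}} \cup E_{\text{test}}, \qquad E_{\text{train}} = \bigcup_{k=1,2} \{E^{(k)}_1, \dots, E^{(k)}_{T_{\text{train}}}\}, \qquad E_{\text{test}} = \bigcup_{k=1,2} \{E^{(k)}_{T_{\text{train}}+1}, \dots, E^{(k)}_T\}.$$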

In our case, we divided the 8-hour measurement into two halves and used the first half as the training set and the second half as the test set (which equates to $T_{\text{train}} \approx 480$ and $T_{\text{test}} = T - T_{\text{train}} \approx 480$). This choice shall be improved in the next stages of this project, as it is known that REM sleep typically occurs more often in the second half of the night. The sequence of sleep stages corresponding to these epochs will be denoted by random variables

$$C = (\underbrace{C_1, \dots, C_{T_{\text{train}}}}_{=:\, C_{\text{train}}},\ \underbrace{C_{T_{\text{train}}+1}, \dots, C_T}_{=:\, C_{\text{test}}}) \in S^T,$$


Figure 2: PCA coefficients of the first two dominant eigenvectors of measurements of different sleep stages.

where the state space S consists of the ﬁve sleep

stages:

S = {’Awake’, ’Stage 1’, ’Stage 2’, ’Stage 3’, ’REM’}.

We will assume its classification by a professional scorer (who had access to the full polysomnography),

$$(C_{\text{train}}, C_{\text{test}}) = (c^{\text{prof}}_{\text{train}}, c^{\text{prof}}_{\text{test}}),$$

to be correct. Our aim is to implement an algorithm $\mathcal{A}$ that classifies the test set $E_{\text{test}}$ using only the professional classification $c^{\text{prof}}_{\text{train}}$ of the training set $E_{\text{train}}$,

$$\mathcal{A} : (E_{\text{train}}, E_{\text{test}}, c^{\text{prof}}_{\text{train}}) \mapsto c^{\text{algo}}_{\text{test}} \in S^{T - T_{\text{train}}},$$

such that $c^{\text{algo}}_{\text{test}} = c^{\text{prof}}_{\text{test}}$ in many components with high probability.

2 THE ALGORITHM

2.1 Preprocessing

The frequencies in a measurement play a central role in the classification of sleep stages (Berry et al., 2012; Penzel and Conradt, 2000). Signals of certain frequencies indicate conscious behaviour (such as the high-frequency beta waves) or unconsciousness (such as delta waves). Therefore, it is meaningful to utilize the power spectrum of a measurement. As can be seen in Figure 1, the average power spectra of different sleep stages differ visibly, except for those of Stages 1 and 2, which are not easily distinguishable. Furthermore, the power spectra of the measurements $E^{(k)}_t \in \mathbb{R}^{7680}$ of epochs might be easier to compare, since they are invariant under shifts in time.

Additionally, it is known that certain confounding measurements may hinder the classification of sleep stages. These signals, which are called artifacts, can be caused, for example, by sweat on the skin of the subject or by movements of body parts, eyes, etc. However, it is also known that these artifacts typically evoke signals whose frequencies differ from those of the signals used for the classification of sleep stages. Through a low-pass filter, we ignore all frequencies above a fixed threshold (approximately 66 Hertz). We do so in the following manner:

$$E^{(k)}_t \in \mathbb{R}^{7680} \mapsto F^{(k)}_t = |\mathrm{FFT}(E^{(k)}_t)| \in \mathbb{R}^{7680}$$

and

$$F^{(k)}_t \mapsto \tilde{F}^{(k)}_t = \mathrm{proj}(F^{(k)}_t) \in \mathbb{R}^{2000},$$

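where $\mathrm{proj}(\cdot)$ is the projection of a vector onto its first 2000 coordinates. Since an epoch lasts 30 seconds, neighbouring coordinates of the transform are 1/30 Hz apart, so the first 2000 coordinates cover frequencies up to $2000/30 \approx 66.7$ Hz. We assume this to be adequate, as the signals used for professional scoring lie in the same frequency interval and it is known that artifacts created by bodily movement typically involve high-frequency signals.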
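The two mappings above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `spectrum_features` and the constant names are hypothetical, and the epoch is assumed to be a plain array of 7680 samples from one channel.

```python
import numpy as np

FS = 256              # sampling rate in Hz (from the text)
EPOCH_SECONDS = 30    # epoch length in seconds (from the text)
N_KEEP = 2000         # number of retained FFT coordinates (from the text)


def spectrum_features(epoch):
    """Return proj(|FFT(E)|) for one epoch E of FS * EPOCH_SECONDS samples."""
    epoch = np.asarray(epoch, dtype=float)
    magnitudes = np.abs(np.fft.fft(epoch))  # F = |FFT(E)|, same length as E
    return magnitudes[:N_KEEP]              # keep the first 2000 coordinates


# Coordinate j corresponds to the frequency j / EPOCH_SECONDS Hz,
# so the retained band ends at roughly 66.7 Hz.
CUTOFF_HZ = N_KEEP / EPOCH_SECONDS
```

For instance, a pure 10 Hz sine sampled over one epoch would produce its largest magnitude at coordinate $10 \cdot 30 = 300$, well inside the retained range.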

We would like to remark that we have purposely

not ﬁltered out all of the frequencies that are not used

in traditional sleep-staging in an attempt to retain as

much information as possible while still removing ar-

tifacts.

To further reduce the dimension of the data and to make the data usable for classification, we used a principal component analysis (PCA), in which we projected the data onto the space spanned by the 15 dominant eigenvectors of the covariance matrix. As can be seen in Figure 2, the procedure retains a considerable amount of information from the measurements, and the differences between the sleep stages are visible even when using only the first two dominant eigenvectors. A scree plot was used to determine the number of retained eigenvectors, which was 15 in our case.

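To avoid an a priori bias towards the lower frequencies, the Fourier-transformed data is logarithmized before applying the PCA. This makes sense as higher frequencies tend to have smaller amplitudes. Let

$$M^{(k)}_t = \log \tilde{F}^{(k)}_t, \qquad \tilde{F}^{(k)}_t = \mathrm{proj}(F^{(k)}_t),$$

be the logarithmized projection of the power spectrum for a $t \in \{1, \dots, T\}$. We used the data matrix $(M^{(1)}, M^{(2)})$ for the PCA,

$$(\tilde{F}^{(1)}_t, \tilde{F}^{(2)}_t) \overset{\log}{\longmapsto} (M^{(1)}_t, M^{(2)}_t) \overset{\mathrm{PCA}}{\longmapsto} m_t \in \mathbb{R}^{15},$$

where $m_t$ is the projected data resulting from the PCA.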

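This leads to the following overall preprocessing steps:

$$(\mathbb{R}^{7680})^2 \longrightarrow (\mathbb{R}^{2000})^2 \longrightarrow (\mathbb{R}^{2000})^2 \longrightarrow \mathbb{R}^{15},$$

$$(E^{(1)}_t, E^{(2)}_t) \overset{\mathrm{proj}\,\circ\,|\mathrm{FFT}|}{\longmapsto} (\tilde{F}^{(1)}_t, \tilde{F}^{(2)}_t) \overset{\log}{\longmapsto} (M^{(1)}_t, M^{(2)}_t) \overset{\mathrm{PCA}}{\longmapsto} m_t.$$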

BIOSIGNALS 2017 - 10th International Conference on Bio-inspired Systems and Signal Processing


We then define

$$X_{\text{train}} := \{m_1, \dots, m_{T_{\text{train}}}\}, \qquad X_{\text{test}} := \{m_{T_{\text{train}}+1}, \dots, m_T\}.$$
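The log transform, the PCA and the split into training and test features could look roughly as follows. This is a hedged sketch rather than the procedure actually used: the PCA is computed here via a singular value decomposition, the small offset `eps` (guarding against the logarithm of zero) and the function names are assumptions, and the text does not state whether the PCA was fitted on all epochs or on the training half only (the sketch uses all epochs).

```python
import numpy as np


def log_pca(features_ch1, features_ch2, n_components=15, eps=1e-12):
    """Map two (T, 2000) spectrum arrays to (T, n_components) PCA scores."""
    # Data matrix (M1, M2): log spectra of both channels side by side, (T, 4000)
    M = np.log(np.hstack([features_ch1, features_ch2]) + eps)
    centered = M - M.mean(axis=0)
    # Rows of Vt are the eigenvectors of the covariance matrix,
    # ordered by decreasing eigenvalue.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T


def split_train_test(m, t_train):
    """Return (X_train, X_test): the first t_train rows and the remaining rows."""
    return m[:t_train], m[t_train:]
```

With $T \approx 960$ epochs per night, `split_train_test(m, 480)` would reproduce the half-and-half split described in Section 1.2.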

2.2 Hidden Markov Model and the MAP Estimator

We assume the sequence of sleep stages $C = (C_1, \dots, C_T) \in S^T$ introduced in Section 1.2 to be a Markov chain with transition matrix $P \in \mathbb{R}^{5 \times 5}$, which will be approximated by the empirical probabilities of all possible transitions

$$P_{s,s'} = \frac{\#\{x_{t+1} = s',\ x_t = s \mid t = 1, \dots, T_{\text{train}} - 1\}}{\#\{x_t = s \mid t = 1, \dots, T_{\text{train}} - 1\}}$$

for $s, s' \in S$ (it is thereby not considered part of the inference process). Accordingly, the invariant measure $\pi$ of the Markov chain will be approximated by the empirical probabilities

$$\pi(s) = \frac{\#\{x_\tau = s \mid \tau = 1, \dots, T_{\text{train}}\}}{T_{\text{train}}}, \qquad s \in S.$$
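Both empirical estimates amount to simple counting over the training labels. The following sketch assumes that the five stages are encoded as integers 0 to 4, that the labels are the professional scores of the training epochs, and that the transition matrix is indexed as (current stage, next stage); the function name is hypothetical.

```python
import numpy as np

N_STAGES = 5  # Awake, Stage 1, Stage 2, Stage 3, REM encoded as 0..4


def empirical_markov(labels, n_states=N_STAGES):
    """Estimate the transition matrix P and the measure pi from a label sequence."""
    labels = np.asarray(labels)
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(labels[:-1], labels[1:]):
        counts[prev, nxt] += 1          # count transitions prev -> nxt
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of stages that never occur before the last epoch stay all zero.
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    pi = np.bincount(labels, minlength=n_states) / len(labels)
    return P, pi
```

A stage that never appears in the training half receives probability zero under this estimate, which is one reason why the choice of the training set matters.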

For the inference process, our prior knowledge $\mathbb{P}(C_{\text{test}})$ about $C_{\text{test}}$ is therefore modeled by a Markov chain of length $T_{\text{test}} = T - T_{\text{train}}$ with $\mathbb{P}(C_t) = \pi$ for each $t = T_{\text{train}} + 1, \dots, T$ and transition matrix $P$.

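Bayes' rule is the proper tool to update our knowledge about $C_{\text{test}}$ given the measurements $X_{\text{test}}$:

$$\mathbb{P}(C_{\text{test}} \mid X_{\text{test}}) = \frac{\mathbb{P}(X_{\text{test}} \mid C_{\text{test}})\, \mathbb{P}(C_{\text{test}})}{\mathbb{P}(X_{\text{test}})}.$$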

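We will mainly focus on computing the maximum a posteriori (MAP) estimator of $C_{\text{test}}$,

$$c^{\text{MAP}}_{\text{test}} = \operatorname*{argmax}_{c \in S^{T_{\text{test}}}} \mathbb{P}(C_{\text{test}} = c \mid X_{\text{test}}) = \operatorname*{argmax}_{c \in S^{T_{\text{test}}}} \mathbb{P}(X_{\text{test}} \mid C_{\text{test}} = c)\, \mathbb{P}(C_{\text{test}} = c).$$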

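Since each measurement $X_t$ is assumed to depend only on $C_t$, we assume that all measurements

$$X_t \mid C_{\text{test}} = X_t \mid C_t, \qquad t = T_{\text{train}} + 1, \dots, T,$$

are independent. Therefore, the likelihood model $\mathbb{P}(X_{\text{test}} \mid C_{\text{test}})$ is given by the product of the individual likelihoods $\mathbb{P}(X_t \mid C_t)$,

$$\mathbb{P}(X_{\text{test}} \mid C_{\text{test}}) = \prod_{t = T_{\text{train}}+1}^{T} \mathbb{P}(X_t \mid C_t),$$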

the latter being trained by a pre-installed MATLAB

classiﬁer. Theoretically, we now have all the neces-

sary ingredients to compute the MAP estimator.
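Combining the Markov prior with the product likelihood, the quantity maximized by the MAP estimator can be written out factor by factor. The expansion below is only a sketch under two assumptions: the chain starts in the invariant measure $\pi$, and $P_{s,s'}$ is read as the probability of moving from stage $s$ to stage $s'$.

```latex
\mathbb{P}(X_{\text{test}} \mid C_{\text{test}} = c)\, \mathbb{P}(C_{\text{test}} = c)
  = \pi(c_{T_{\text{train}}+1})
    \prod_{t = T_{\text{train}}+2}^{T} P_{c_{t-1},\, c_t}
    \prod_{t = T_{\text{train}}+1}^{T} \mathbb{P}(X_t \mid C_t = c_t),
  \qquad c = (c_{T_{\text{train}}+1}, \dots, c_T) \in S^{T_{\text{test}}}.
```

In practice one would typically work with the logarithm of this product, which turns it into a sum and avoids numerical underflow for chains of several hundred epochs.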

2.3 Heuristic Approximation of the MAP Estimator

Though we are now able to compute the probability of each realization $C_{\text{test}} = c \in S^{T_{\text{test}}}$ given the measurements $X_{\text{test}}$, the straightforward approach of computing all these probabilities is impractical due to the huge number of possible chains,

$$|S^{T_{\text{test}}}| = 5^{T_{\text{test}}} \approx 3 \cdot 10^{335}.$$
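The order of magnitude of this count can be checked with a short computation. The sketch assumes $T_{\text{test}} = 480$, as implied by the half-and-half split of roughly 960 epochs.

```python
import math

T_TEST = 480     # number of test epochs (half of roughly 960)
N_STAGES = 5     # size of the state space S

# log10 of 5**480, split into integer exponent and leading factor
log10_count = T_TEST * math.log10(N_STAGES)
exponent = math.floor(log10_count)
leading_factor = 10 ** (log10_count - exponent)

# Python integers are exact, so the digit count can be verified directly.
n_digits = len(str(N_STAGES ** T_TEST))
```

The exponent evaluates to 335 and the leading factor lies slightly above 3, in line with the estimate in the text.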

Therefore, a heuristic approach is necessary in order to compute these probabilities $\mathbb{P}(C_{\text{test}} = c \mid X_{\text{test}})$ only for "realistic" or "probable" chains. Since we are only interested in the MAP estimator $c^{\text{MAP}}_{\text{test}}$ and not in the whole probability distribution of $C_{\text{test}}$, this restriction is meaningful. However, it will only yield an approximation of $c^{\text{MAP}}_{\text{test}}$.

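For the heuristic approach let us first introduce the notation

$$C^{(t)} := (C_{T_{\text{train}}+1}, \dots, C_{T_{\text{train}}+t}),$$

$$X^{(t)} := (X_{T_{\text{train}}+1}, \dots, X_{T_{\text{train}}+t}),$$

$$\pi_t(c^{(t)}) := \mathbb{P}(X^{(t)} \mid C^{(t)} = c^{(t)})\, \mathbb{P}(C^{(t)} = c^{(t)}),$$

the latter being the value we want to maximize for $t = T_{\text{test}}$.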

We will proceed as follows:

1. Compute all possible ($5^4 = 625$) values of $\pi_4$.

2. For $t = 5, \dots, T_{\text{test}}$, iterate

(a) Restrict the set of considered subchains $(c_1, \dots, c_{t-1})$ to the 125 subchains with the highest values of $\pi_{t-1}$.

(b) Compute $\pi_t$ for all possible subchains $(c_1, \dots, c_t)$ that contain the above subchains ($5 \cdot 125 = 625$ values).

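This way, chains with unlikely subchains are discarded successively and we end up with 625 very probable (but possibly not most probable) chains. Step (b) in the above iteration can be performed very efficiently via the formula

$$\mathbb{P}(C^{(t)} \mid X^{(t)}) \propto \mathbb{P}(C^{(t-1)} \mid X^{(t-1)})\, \mathbb{P}(C_t \mid C_{t-1})\, \mathbb{P}(X_t \mid C_t),$$

where $C^{(t)}$ and $X^{(t)}$ denote the first $t$ stages and measurements of the test set and the indices are counted from the start of the test set. Through this heuristic, the formula only has to be evaluated for the extensions of the 125 retained subchains in each time step.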
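The pruning scheme above is a form of what is commonly called beam search. The following sketch illustrates it in log space; it is not the authors' code. The inputs are assumptions: `log_lik[t, s]` stands for the logarithm of the individual likelihood of measurement $t$ given stage $s$ (however it was obtained), `log_P` for the logarithm of the transition matrix with rows indexing the current stage, and `log_pi` for the logarithm of the initial distribution.

```python
import numpy as np


def beam_map(log_lik, log_P, log_pi, beam_width=125):
    """Approximate the MAP chain, keeping only the best subchains per step.

    log_lik: (T, S) log-likelihoods, log_P: (S, S) log transition matrix,
    log_pi: (S,) log initial distribution. Returns (chain, log score).
    """
    T, S = log_lik.shape
    chains = np.arange(S).reshape(S, 1)     # all subchains of length 1
    scores = log_pi + log_lik[0]            # log pi_1 for each of them
    for t in range(1, T):
        # Step (a): keep the beam_width subchains with the highest score.
        if len(scores) > beam_width:
            keep = np.argsort(scores)[-beam_width:]
            chains, scores = chains[keep], scores[keep]
        # Step (b): extend every kept subchain by each of the S stages,
        # adding log transition and log likelihood terms to its score.
        last = chains[:, -1]
        new_scores = scores[:, None] + log_P[last, :] + log_lik[t][None, :]
        new_stage = np.tile(np.arange(S), len(chains))[:, None]
        chains = np.hstack([np.repeat(chains, S, axis=0), new_stage])
        scores = new_scores.ravel()
    best = int(np.argmax(scores))
    return chains[best], scores[best]
```

With five stages and a beam width of 125, no pruning happens until all $5^4 = 625$ subchains of length four have been scored, after which each step keeps 125 subchains and extends them to 625, mirroring steps 1 and 2 of the procedure. Choosing a beam width that covers all chains makes the search exhaustive, which is only feasible for very short sequences.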

3 RESULTS

The method was tested on ﬁve healthy subjects at

the sleep laboratory of the Advanced Sleep Research

GmbH. Each patient’s sleep was measured with a full

polysomnography for a duration of approximately 8


Figure 3: Sleep stage classiﬁcation of the second half of the measurement. Comparison of the MAP estimator method (blue)

and the classiﬁcation of a professional scorer (red).

Table 1: Confusion matrix of patient 3, where the classifications of a professional scorer are compared to the output of the algorithm.

                       Classification
Stages      Awake   Stage 1   Stage 2   Stage 3   REM
Awake          27         2         0         0     0
Stage 1        17        31        11         0     3
Stage 2         2         1       145         6     6
Stage 3         1         0         5        64     0
REM             3         7         7         0   145

hours. As stated before, the ﬁrst half of the night was

used as training data, whereas the second half of the

night was used to test the algorithm. The hypnogram

that was used as the “ground truth” to train and verify

the algorithm was scored by a professional somnolo-

gist, who used a full polysomnography.

The error rates for the five subjects were 14%, 24%, 34%, 22% and 31%, respectively. They thus range from 14% to 34%, with an average of 25%.

As one can see in Figure 3, the overall structure of the classification of the algorithm is very close to the scoring of a professional, who had the advantage of using a full polysomnography.

The evaluation of the confusion matrices showed that the classification of the Awake stage, as well as of Stages 2 and 3, is fairly adequate. Stage 1 is often mistaken for the Awake stage or Stage 2, and the REM stage is sometimes mistaken for one of the light sleep stages, as can be seen in Table 1.
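The entries of Table 1 can be turned into an overall error rate and a per-stage agreement with a few lines of NumPy. This is a small worked computation, assuming that the rows of the table are the stages assigned by the professional scorer and the columns are the classifications of the algorithm; the variable names are illustrative.

```python
import numpy as np

# Table 1, rows and columns ordered as Awake, Stage 1, Stage 2, Stage 3, REM
confusion = np.array([
    [27,  2,   0,  0,   0],
    [17, 31,  11,  0,   3],
    [ 2,  1, 145,  6,   6],
    [ 1,  0,   5, 64,   0],
    [ 3,  7,   7,  0, 145],
])

total = int(confusion.sum())          # number of scored epochs in the table
agree = int(np.trace(confusion))      # epochs where both classifications match
error_rate = 1 - agree / total

# Share of each row that lies on the diagonal
per_stage_agreement = np.diag(confusion) / confusion.sum(axis=1)
```

For this table, 412 of 483 epochs lie on the diagonal, which corresponds to an error rate of about 14.7%. Only half of the Stage 1 epochs are matched, which reflects the difficulty with Stage 1 described above.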

4 DISCUSSION

Summing up, we can conclude that the measurements at $A_1$ and $A_2$ contain relevant information for automatic sleep scoring. This can be viewed as a proof of concept, as these positions are usually only used as reference electrodes. Beyond the recognition of wakefulness and sleep, the MAP estimator offers differentiated classifications.

In contrast to earlier methods used for the classification of sleep stages, the method presented in this paper seems to prefer continuous structures. Especially in phases of transition between stages, where professional scorers seem to be uncertain, the MAP estimation method prefers a single transition, in contrast to the erratic behaviour of the professional scorer.

It could be argued that such an indecisive phase between the stages should be flattened with respect to the preceding and the following sleep stage. In this case, it is unclear whether the classification of the automatic method or that of the scorer is preferable.

Usually, patients are requested to spend two nights

at a sleep laboratory, in order to minimize confound-

ing effects. A minimized measurement system with

the patient-speciﬁc algorithm at hand could replace

the second measurement and thus halve the costs for

sleep-related diagnosis at worst. At best, the algo-

rithm could learn to transfer the classiﬁcation from

one person to another.

ACKNOWLEDGEMENTS

We would like to thank the Charité Universitätsmedizin Berlin as well as the Advanced Sleep Research GmbH for providing the data for testing the method.


REFERENCES

Anderer, P., Moreau, A., Woertz, M., Ross, M., Gruber, G., Parapatics, S., Loretz, E., Heller, E., Schmidt, A., Boeck, M., et al. (2010). Computer-assisted sleep classification according to the standard of the American Academy of Sleep Medicine: validation study of the AASM version of the Somnolyzer 24×7. Neuropsychobiology, 62(4):250–264.

Berry, R. B., Brooks, R., Gamaldo, C. E., Harding, S. M., Marcus, C., and Vaughn, B. (2012). The AASM manual for the scoring of sleep and associated events. Rules, Terminology and Technical Specifications, Darien, Illinois, American Academy of Sleep Medicine.

Gudmundsson, S., Runarsson, T. P., and Sigurdsson, S. (2005). Automatic sleep staging using support vector machines with posterior probability estimates. In International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06), volume 2, pages 366–372. IEEE.

Kales, A., Rechtschaffen, A., University of California, L. A., and (U.S.), N. N. I. N. (1968). A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. Allan Rechtschaffen and Anthony Kales, editors. U. S. National Institute of Neurological Diseases and Blindness, Neurological Information Network, Bethesda, Md.

Koley, B. and Dey, D. (2012). An ensemble system for automatic sleep stage classification using single channel EEG signal. Computers in Biology and Medicine, 42(12):1186–1195.

Liang, S.-F., Kuo, C.-E., Hu, Y.-H., and Cheng, Y.-S. (2012). A rule-based automatic sleep staging method. Journal of Neuroscience Methods, 205(1):169–176.

Pan, S.-T., Kuo, C.-E., Zeng, J.-H., and Liang, S.-F. (2012). A transition-constrained discrete hidden Markov model for automatic sleep staging. Biomedical Engineering Online, 11(1):1.

Penzel, T. and Conradt, R. (2000). Computer based sleep recording and analysis. Sleep Medicine Reviews, 4(2):131–148.

Redmond, S. and Heneghan, C. (2003). Electrocardiogram-based automatic sleep staging in sleep disordered breathing. In Computers in Cardiology, 2003, pages 609–612. IEEE.

Tagliazucchi, E., von Wegner, F., Morzelewski, A., Borisov, S., Jahnke, K., and Laufs, H. (2012). Automatic sleep staging using fMRI functional connectivity data. Neuroimage, 63(1):63–72.

Wang, Y., Loparo, K. A., Kelly, M. R., and Kaplan, R. F. (2015). Evaluation of an automated single-channel sleep staging algorithm. Nature and Science of Sleep, 7:101.

Zibrandtsen, I., Kidmose, P., Otto, M., Ibsen, J., and Kjaer, T. (2016). Case comparison of sleep features from ear-EEG and scalp-EEG. Sleep Science.
