Detecting Neonatal Seizures using Short Time Fourier Transform and
Frechet Distance
Aleksandar Jeremic (1) and Dejan Nikolic (2)
(1) Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada
(2) Physical Medicine and Rehabilitation, University Childrens Hospital, Faculty of Medicine,
Keywords:
Seizure Detection, Information Fusion, Machine Learning.
Abstract:
Recently there has been an increase in the number of long-term cot-bed EEG systems implemented in clinical practice to monitor the neurological development of neonatal patients. Consequently, a significant research effort has been devoted to automatic EEG data analysis tools, including but not limited to seizure detection, since seizure frequency and/or intensity are among the most important indicators of brain development. In this paper we propose to evaluate the time-dependent power spectral density (PSD) using the short-time Fourier transform and to use Frechet distance measures to detect the presence or absence of seizures. We use three different distance measures, as they capture different properties of the corresponding PSD matrices. We evaluate the performance of the proposed algorithms on a real data set obtained in the NICU of the McMaster University Hospital. To benchmark the performance of the proposed techniques we also trained and tested a support vector machine (SVM) classifier.
1 INTRODUCTION
Continuous EEG monitoring and analysis remain important clinical tools in neonatal intensive care units (NICU) for early-stage evaluation, detection, and diagnosis of various types of encephalopathies. In the recent decade there has been significant progress in utilizing advanced EEG techniques to improve outcomes for neonatal patients experiencing various degrees of neurological developmental issues (Faul, 2005). To this purpose long-term EEG monitoring is being applied to cot-beds for a wider spectrum of patients and not only to those with severe neurological problems (Temko et al., 2015). Consequently, the amount of data generated by such systems cannot be reviewed by experts due to limited resources (number of personnel, time, etc.). One of the most important and critical phenomena monitored in NICUs is the occurrence of seizures, as it allows better understanding of brain function in a variety of patients, from the extremely premature newborn to the term baby with acute injury. In addition, neonatal EEG facilitates rapid diagnosis of seizures and identification of epileptic encephalopathy, and may provide useful prognostic information.
A seizure is defined clinically as a paroxysmal alteration in neurologic function, i.e., behavioural, motor, or autonomic function. It is the result of excessive electrical discharges of neurones, which usually develop synchronously and happen suddenly in the central nervous system (CNS). It is critical to recognize seizures in newborns, since they are usually related to other significant illnesses. Seizures are also an initial sign of neurological disease and a potential cause of brain injury (Volpe, 2001). In a clinical setting physicians are able to detect seizures based on EEG data; however, the process may be time consuming considering the number of cot-beds in a regular-size NICU department. To this purpose the development of computer-aided diagnosis would be extremely beneficial, as such a system would be important from both an academic and a clinical standpoint. From the academic standpoint, automatic recording of seizures and subsequent analysis of these data would provide insight into the frequency of their occurrence and allow it to be correlated with the dynamics of neurological development. From the clinical standpoint, it has been demonstrated that neonatal patient outcomes can be improved by early detection of certain encephalopathies.
In the last decade a significant number of short-time Fourier transform (STFT) and wavelet transform techniques have been proposed. Wavelet transform techniques commonly attempt to decompose the signal into the frequency bands of interest and thus achieve discriminatory goals. On the other hand, the STFT attempts to analyze non-stationary properties by analyzing various features contained in the time-dependent power spectral density. To this purpose several machine learning algorithms have been implemented, but their performance relies heavily on a significant amount of data, which may not be available since the patient-to-patient variability of seizure features is significant. In our previous work, we proposed two distributed detection algorithms for neonatal seizure detection using some of the commonly used single-channel seizure detection algorithms and extended this approach to the detection of seizures using the Frechet distance measure of sample EEG covariances. In this paper we develop a distributed detection algorithm using a single-channel detection algorithm based on the STFT and three Frechet distance based detectors between the training ensemble of STFT matrices and the actual data.

Jeremic, A. and Nikolic, D.
Detecting Neonatal Seizures using Short Time Fourier Transform and Frechet Distance.
DOI: 10.5220/0009178703420347
In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 4: BIOSIGNALS, pages 342-347
ISBN: 978-989-758-398-8; ISSN: 2184-4305
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
First, we present an estimator of the Fréchet mean of the power spectral density (PSD) matrix on the manifold $\mathcal{M}$ using different measures of Riemannian distance. Then we introduce the Fréchet mean based on these Riemannian distances and discuss computational algorithms for calculating the proposed distance means. In Section 3 we illustrate the applicability of our results using a data set of NICU patients. Finally, in Section 4 we discuss future directions.
2 SIGNAL MODEL
2.1 Short-time Fourier Transform
Let f(n) denote the uniformly sampled EEG signal. Then the discrete STFT can be written as

$$F(n,k) = \sum_{m=0}^{M-1} f(n-m)\, w(m)\, e^{-j 2\pi m k / N} \qquad (1)$$

where f(n) is the EEG signal, w(m) is the window function, M is the window length, and N is the number of samples used for calculating the STFT. This algorithm can therefore be viewed as a continuous calculation of the STFT using a sliding window, and hence it captures the temporal variation of the EEG frequency spectrum. The visual representation of the matrix F is commonly referred to as the spectrogram of the corresponding signal.

[Figure 1: PSD matrices in the presence of seizures - 1s epoch with 25% overlap. Panels: (a) First quarter, (b) Second quarter, (c) Third quarter, (d) Fourth quarter. Axes: Time (horizontal), Frequency (vertical).]

[Figure 2: PSD matrices in the absence of seizures - 1s epoch. Panels: (a) Figure E, (b) Figure F, (c) Figure G, (d) Figure H. Axes: Time (horizontal), Frequency (vertical).]

In order to apply the Frechet distance measure to the ensemble of the corresponding STFT spectrograms $F_i$, we calculate the corresponding power spectral density matrix given by

$$S_i = F_i F_i^H \qquad (2)$$

where the superscript H denotes the Hermitian transpose, which is needed because the entries of F are complex numbers. In Figure 1 we illustrate the sliding-window PSD output of the STFT for an arbitrary seizure epoch, and in Figure 2 we illustrate similar results in the absence of seizures.
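The two steps of this subsection, the sliding-window STFT and the PSD matrix formed from it, can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names `stft_matrix` and `psd_matrix`, the Hann window, the hop size, and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np

def stft_matrix(f, window, n_fft, hop):
    """Sliding-window DFT; column t holds F(n_t, k) for k = 0..n_fft-1.

    Each frame is time-reversed so that it contains f(n - m), m = 0..M-1,
    which is the indexing used in the STFT sum. Assumes n_fft >= len(window).
    """
    M = len(window)
    cols = []
    for n in range(M - 1, len(f), hop):
        segment = f[n - M + 1:n + 1][::-1]            # f(n - m) for m = 0..M-1
        cols.append(np.fft.fft(segment * window, n=n_fft))
    return np.stack(cols, axis=1)

def psd_matrix(F):
    """PSD matrix S = F F^H (Hermitian, positive semi-definite)."""
    return F @ F.conj().T

# Hypothetical input: a pure tone sampled so that it falls on DFT bin 8.
f = np.cos(2 * np.pi * 8 * np.arange(256) / 64)
F = stft_matrix(f, np.hanning(64), n_fft=64, hop=48)  # hop=48 gives 25% overlap
S = psd_matrix(F)
```

Note that when the number of frames is smaller than the number of frequency bins, S is rank deficient, so any distance that needs a strictly positive definite matrix would require more frames or some form of regularization.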
2.2 Frechet Distance
To measure the distance between two $M \times M$ covariance matrices A and B on the manifold of positive definite matrices $\mathcal{M}$, we consider metrics that have been developed to measure the distance between two points on the manifold itself.

The first metric is obtained by measuring the distance between projections on the subspace spanned by unitary matrices (Li and Wong, 2013):

$$d_{R_1}(A,B) = \sqrt{\mathrm{Trace}(A) + \mathrm{Trace}(B) - 2\,\mathrm{Trace}\left[\left(A^{1/2} B A^{1/2}\right)^{1/2}\right]} \qquad (3)$$
The second metric is obtained by measuring the distance between their projections on the subspace spanned by identity matrices. It has been shown (Li and Wong, 2013) that this distance is equivalent to:

$$d_{R_2}(A,B) = \sqrt{\mathrm{Trace}(A) + \mathrm{Trace}(B) - 2\,\mathrm{Trace}\left(A^{1/2} B^{1/2}\right)} \qquad (4)$$
Let the points $A, B \in \mathcal{M}$ and let X be the point on the manifold at which we construct a tangent plane (usually denoted as $T_{\mathcal{M}}X$). According to the inner product $\langle A, B \rangle_X = \mathrm{Trace}(X^{-1} A X^{-1} B)$, the log-Riemannian metric is given as (Moakher, 2005):

$$d_{R_3}(A,B) = \left\| \log\left(A^{-1/2} B A^{-1/2}\right) \right\|_2 = \sqrt{\sum_{i=1}^{M} \log^2(\lambda_i)} \qquad (5)$$

where the $\lambda_i$ are the eigenvalues of the matrix $A^{-1}B$ (Absil et al., 2009). (The metric $d_{R_3}$ has been developed in various ways and has, for a long time, been used in theoretical physics.)
In order to solve the minimization problems associated with these distances, we presented detailed computational algorithms in (Jahromi et al., 2015). In all cases certain iterative procedures are necessary; however, we demonstrated the existence of unique solutions (means) for all the proposed distances.
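The three distances of this subsection can be evaluated with eigendecompositions alone, as in the sketch below. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, `d_r1` is written in the Bures-Wasserstein form with a matrix square root inside the trace (an assumption about Eq. (3), chosen so that the distance from a matrix to itself is zero), and `d_r3` assumes both inputs are strictly positive definite.

```python
import numpy as np

def _sqrtm_psd(A):
    """Principal square root of a Hermitian positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def d_r1(A, B):
    """Assumed form: sqrt(Tr A + Tr B - 2 Tr[(A^1/2 B A^1/2)^1/2])."""
    rA = _sqrtm_psd(A)
    inner = _sqrtm_psd(rA @ B @ rA)
    val = np.trace(A).real + np.trace(B).real - 2.0 * np.trace(inner).real
    return np.sqrt(max(val, 0.0))       # clip tiny negative round-off

def d_r2(A, B):
    """sqrt(Tr A + Tr B - 2 Tr(A^1/2 B^1/2))."""
    cross = np.trace(_sqrtm_psd(A) @ _sqrtm_psd(B)).real
    val = np.trace(A).real + np.trace(B).real - 2.0 * cross
    return np.sqrt(max(val, 0.0))

def d_r3(A, B):
    """Log-Riemannian distance; requires strictly positive definite A and B."""
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T
    # Eigenvalues of A^-1/2 B A^-1/2 coincide with those of A^-1 B.
    lam = np.linalg.eigvalsh(A_inv_sqrt @ B @ A_inv_sqrt)
    return np.sqrt(np.sum(np.log(lam) ** 2))

# Hypothetical 2 x 2 example.
A = np.diag([1.0, 4.0])
B = np.diag([4.0, 1.0])
distances = (d_r1(A, B), d_r2(A, B), d_r3(A, B))
```

For commuting matrices such as this diagonal pair, `d_r1` and `d_r2` coincide; they differ in general, which is one reason for using more than one distance.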
2.3 Local Detectors
We then construct two separate ensembles. In the absence of seizures we calculate a sequence of PSD matrices $S_i^j | H_0$ for $i = 1, \ldots, p$, where p is the total number of windows for the jth EEG channel. Similarly, when a seizure is present we calculate $S_i^j | H_1$ for $i = 1, \ldots, q$, where q is the total number of windows for the jth channel and the particular seizure epoch. Note that in this preliminary approach we use a single-channel detection system based on the C3 channel, as labelled in the 10-20 system.
For each ensemble we then calculate the centre of the ensemble using the Frechet mean, defined as the point that minimizes the sum of the squared distances (Barbaresco, 2008):

$$\hat{S}|H_l = \arg\min_{S \in \mathcal{M}} \sum_{i=1}^{n} d^2\left(S_i|H_l,\, S\right) \qquad (6)$$
where $d(\cdot,\cdot)$ denotes the metric being used. The above expression can therefore be interpreted as a way of calculating an averaged sample PSD matrix using a sliding window, where $S_i$ represents the ith PSD window. We then calculate the empirical pdf by modelling it as a set of radial basis functions using $\hat{S}|H_l$ as a centre, i.e.
$$\hat{p}_j(d|H_i) = \sum_{l=1}^{n} \alpha_l\, e^{-\left\| d - \|S\| \right\|^2_{d_j} / \beta_l^2} \qquad (7)$$
where the subscript i denotes the hypothesis (seizure present or no seizure) and the subscript j denotes which of the three aforementioned distances was used. Note that we obtain six different empirical pdfs from the two hypotheses and three distances. The unknown coefficients $\alpha_l$ and $\beta_l$ are obtained by applying a least-squares fit to the empirical counts based on the training set (expert annotations).
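The Frechet mean that serves as the centre of each ensemble generally has to be computed iteratively. As an illustration for the log-Riemannian distance, the sketch below uses a standard fixed-point iteration (map the data to the tangent space at the current estimate, average, map back). It is not claimed to be the algorithm of (Jahromi et al., 2015); the function names, the arithmetic-mean starting point, and the stopping rule are assumptions, and the inputs must be strictly positive definite.

```python
import numpy as np

def _eig_fun(A, fun):
    """Apply a scalar function to the eigenvalues of a Hermitian matrix."""
    w, V = np.linalg.eigh(A)
    return (V * fun(w)) @ V.conj().T

def frechet_mean_log_riemannian(mats, n_iter=50, tol=1e-10):
    """Fixed-point estimate of the mean of positive definite matrices."""
    S = sum(mats) / len(mats)                        # arithmetic mean as start
    for _ in range(n_iter):
        S_half = _eig_fun(S, np.sqrt)
        S_inv_half = _eig_fun(S, lambda w: 1.0 / np.sqrt(w))
        # Average of the data expressed in the tangent space at S.
        T = sum(_eig_fun(S_inv_half @ X @ S_inv_half, np.log)
                for X in mats) / len(mats)
        S = S_half @ _eig_fun(T, np.exp) @ S_half
        S = 0.5 * (S + S.conj().T)                   # remove round-off asymmetry
        if np.linalg.norm(T) < tol:                  # tangent-space mean ~ 0
            break
    return S

# Hypothetical ensemble of two commuting matrices.
ensemble = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
centre = frechet_mean_log_riemannian(ensemble)
```

For commuting matrices the result reduces to the element-wise geometric mean of the eigenvalues, which gives a simple sanity check for the iteration.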
The local decisions $u_n$, $n = 1, 2, 3$, can be expressed as

$$u_n = \begin{cases} 0, & \text{the } n\text{th detector favours } H_0 \\ 1, & \text{the } n\text{th detector favours } H_1 \end{cases} \qquad (8)$$
where "favours" should be interpreted in the following way. If the prior probabilities are known, pick i so that $P(H_i)\,\hat{p}_n(d|H_i) \ge P(H_{1-i})\,\hat{p}_n(d|H_{1-i})$ (i.e., the maximum a posteriori (MAP) detector); if the hypotheses are treated as equally likely, pick i so that $\hat{p}_n(d|H_i) \ge \hat{p}_n(d|H_{1-i})$ (i.e., the maximum likelihood (ML) detector). In the remainder of the paper we use the MAP detector, since the patients admitted to the NICU have a sufficient number of seizure epochs.
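The MAP and ML rules for a single local detector differ only in whether the pdf values are weighted by the priors. The sketch below makes this explicit; the function name `local_decision`, the convention of resolving ties in favour of H_0, and the numerical values are assumptions made for illustration.

```python
def local_decision(p_h0, p_h1, prior_h1=None):
    """Return the local decision u_n (0 or 1).

    p_h0, p_h1 : empirical pdf values at the observed distance under H_0, H_1.
    prior_h1   : P(H_1); if None the hypotheses are treated as equally likely
                 and the rule reduces to maximum likelihood.
    Ties are resolved in favour of H_0 (an arbitrary choice for this sketch).
    """
    if prior_h1 is None:
        return 1 if p_h1 > p_h0 else 0                        # ML detector
    return 1 if prior_h1 * p_h1 > (1.0 - prior_h1) * p_h0 else 0  # MAP detector

# Hypothetical pdf values: ML favours H_1, but a small seizure prior flips it.
u_ml = local_decision(0.3, 0.4)
u_map = local_decision(0.3, 0.4, prior_h1=0.1)
```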
2.4 Distributed Detection System
Each of the metric detectors presented in the previous section can be considered as a single-channel, i.e. local, detector. In order to improve on the performance of the individual detectors we propose to combine them and utilize their strengths by extending previous results on blind multichannel information fusion (Liu et al., 2007).

Figure 3 shows the structure of a typical parallel distributed detection system with N detectors. The local detectors transmit local decisions $u_n$ based on the particular metric that they are using. In our case there are three local detectors, as we are using three different metrics. All the local decisions are then sent to the fusion centre, where the global decision $u_0$ is made based on a fusion rule in order to minimize the overall probability of error. Additional detectors can be added to the system whenever more information is required to make the final decision.

BIOSIGNALS 2020 - 13th International Conference on Bio-inspired Systems and Signal Processing

[Figure 3: Parallel Distributed Detection System. The phenomenon is observed as y_1, y_2, ..., y_n by the local detectors LD_1, LD_2, ..., LD_n, whose decisions u_1, u_2, ..., u_n are sent to the fusion center, which outputs u_0.]
The local decisions $u_n$, $n = 1, 2, 3$, can be expressed as

$$u_n = \begin{cases} 0, & \text{the } n\text{th detector favours } H_0 \\ 1, & \text{the } n\text{th detector favours } H_1 \end{cases} \qquad (9)$$

where "favours" should be interpreted in the following way: pick i so that $\hat{p}_n(d|H_i) \ge \hat{p}_n(d|H_{1-i})$.
After receiving the local decisions, the fusion centre makes the global decision by applying an optimal fusion rule in order to minimize the final error probability. The optimality criterion for N local detectors, in the sense of minimum error probability, was provided in (Varshney, 1986). We recall it here for the case of N = 3:

$$u_0 = \begin{cases} 1, & \text{if } w_0 + \sum_{n=1}^{3} w_n > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$

where

$$w_0 = \log\frac{P_1}{P_0} \qquad (11)$$

and

$$w_n = \begin{cases} \log\left((1 - P_n^m)/P_n^f\right), & \text{if } u_n = 1 \\ \log\left(P_n^m/(1 - P_n^f)\right), & \text{if } u_n = 0 \end{cases} \qquad (12)$$
The probabilities of false alarm and missed detection of the nth local detector are denoted as $P_n^f$ and $P_n^m$, respectively. The optimal fusion rule tells us that the global decision $u_0$ is determined by the a priori probability and the detector performances, i.e., $P_1$, $P_n^f$ and $P_n^m$. In our previous work we considered these probabilities to be unknown (Mirjalily, 2003), (Liu et al., 2007). In the current work we assume that the prior probabilities $P(H_0)$ and $P(H_1)$ are unknown, as they can change significantly with time and depend on the neonate's state, but we assume that the detector anomalies are estimated from the empirical pdf distributions given in (7).

Figure 4: Scatter plot of detection performance using blind method.

Figure 5: Scatter plot of detection performance using MAP method.

Figure 6: Scatter plot of detection performance using ML method.
In order to make the final decision, we need to utilize the information available to us: the local binary decisions $u_n$. Note that when a training set (annotations) is available, initial guesses for the unknown parameters can be obtained from it, but these parameters are non-stationary and change with time. The details of the implementation are given in our previous work (Liu et al., 2014), (Jeremic and Nikolic, 2019).
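The fusion rule of this section amounts to a weighted vote in which each local decision contributes a log-likelihood-ratio weight determined by that detector's false-alarm and missed-detection probabilities. A minimal sketch, assuming all probabilities are known and lie strictly between 0 and 1, is given below; the function name `fuse` and the numerical values are hypothetical and are not taken from the experiments.

```python
import math

def fuse(decisions, p_false, p_miss, prior_h1):
    """Global decision u_0 from local binary decisions.

    decisions : local decisions u_n (0 or 1)
    p_false   : false-alarm probability of each local detector
    p_miss    : missed-detection probability of each local detector
    prior_h1  : a priori probability P_1 of a seizure (P_0 = 1 - P_1)
    """
    total = math.log(prior_h1 / (1.0 - prior_h1))        # w_0
    for u, pf, pm in zip(decisions, p_false, p_miss):
        if u == 1:
            total += math.log((1.0 - pm) / pf)           # w_n when u_n = 1
        else:
            total += math.log(pm / (1.0 - pf))           # w_n when u_n = 0
    return 1 if total > 0 else 0

# Hypothetical detector characteristics for three local detectors.
p_false = [0.1, 0.1, 0.2]
p_miss = [0.1, 0.2, 0.2]
u0 = fuse([1, 1, 0], p_false, p_miss, prior_h1=0.5)
```

With equal priors the rule behaves like a reliability-weighted majority vote, whereas a strong prior can override a majority of the local decisions.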
3 RESULTS
We evaluate the performance of the proposed algorithms on a data set consisting of preterm infants (gestational age (GA) less than 32 weeks) admitted to the Neonatal Intensive Care Unit at McMaster Hospital. Due to physical limitations we were able to obtain prior expert knowledge for only a very limited time length and a limited set of patients. We selected only patients with seizure epochs and obtained expert annotations of limited length (2 hours per patient).
For illustration purposes, in Figures 4-6 we plot the detection performance as a scatter diagram of windows selected from the testing data. Note that in the presence of motion artifacts the actual performance will vary significantly. Furthermore, because the original system design was based on no-seizure data, the system was calibrated so that the probability of false alarm is controlled. Due to motion artifacts and reactions to pain stimuli during medical procedures in the NICU, it is quite likely that the local detectors will identify these manifestations in the EEG as false seizures. In Table 1 we present the results of our previously proposed blind system (Jeremic and Nikolic, 2019), which uses no training, in which the detector anomalies and priors are estimated and the local detectors are based on (Rankine et al., 2007), (Gotman, 1997) and (Celka and Colditz, 2002). In Tables 2 and 3 we illustrate our two proposed algorithms, with the probability of error averaged over 1000 randomized training-set runs. As expected, the proposed system performs better due to the fact that expert annotations are available.
Table 1: Average seizure detection performance - blind.
                  d_R1   d_R2   d_R3   Fused
false seizures    0.14   0.15   0.16   0.11
missed seizures   0.17   0.14   0.16   0.15

Table 2: Average seizure detection performance - training set maximum a posteriori.
                  d_R1   d_R2   d_R3   Fused
false seizures    0.07   0.09   0.12   0.05
missed seizures   0.09   0.08   0.11   0.07

Table 3: Average seizure detection performance - training set based maximum likelihood.
                  d_R1   d_R2   d_R3   Fused
false seizures    0.09   0.11   0.15   0.09
missed seizures   0.11   0.10   0.14   0.12

For comparison purposes we also implemented a support vector machine (SVM) classifier, which attempts to find an optimal hyperplane in the feature (or reduced-dimension) space that minimizes the overall probability of classification error. To this purpose we use the PSD images and reduce their dimensionality using principal component analysis (PCA) as a feature-reduction preprocessing technique. The overall average accuracy of the SVM was 82% for seizure-free epochs and 78% for seizure epochs. The number of features was set to 20 in order to capture 85% of the variance (a threshold that was set arbitrarily). Note that the number of features can be selected optimally; this will be addressed in future work.
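Choosing the number of PCA features from a target fraction of explained variance can be sketched as follows. The paper does not state how the PCA was computed, so this is only one plausible way of doing it, via the singular values of the centred data matrix; the function name `n_components_for_variance` and the toy data are assumptions.

```python
import numpy as np

def n_components_for_variance(X, target=0.85):
    """Smallest number of principal components whose cumulative explained
    variance reaches `target`. Rows of X are samples (e.g. vectorized PSD
    images) and columns are features."""
    Xc = X - X.mean(axis=0)                          # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)          # singular values, descending
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)       # cumulative variance fraction
    return int(np.searchsorted(ratio, target) + 1)

# Hypothetical toy data with feature variances in the ratio 18 : 8 : 2.
X = np.array([[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0],
              [0.0, 2.0, 0.0], [0.0, -2.0, 0.0],
              [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
k = n_components_for_variance(X, target=0.85)
```

In this toy case the first component explains about 64% of the variance and the first two about 93%, so two components are needed to reach the 85% threshold.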
4 CONCLUSIONS
Automatic systems for seizure detection have been the subject of considerable research interest in the past. One of their main advantages lies in the fact that expert time is potentially required only during the training session. Furthermore, for newborn patients admitted to the NICU such systems enable continuous monitoring of seizure events and hence can provide better insight into neurological development. In recent years significant effort has been placed on developing systems that predict seizures in order to potentially counter them with appropriately generated electrical stimuli. To this purpose, in this paper we examined the possibility of detecting seizures by measuring different distances using the STFT. To achieve this goal we defined local detectors using empirically determined parameters and fused their local decisions using our previously developed information fusion algorithm for seizure detection. We demonstrated the applicability of the proposed algorithms using a real data set consisting of multiple NICU patients and expert annotations.
Our results indicate that training techniques offer better performance if adequate expert annotations are available. Effort should be devoted to examining the possibility of using machine learning techniques that would enable efficient management of resources. Due to patient-to-patient variability we expect that some semi-supervised approach will have to be used in order to adjust the parameters of the detection system to a particular patient, as the number of seizures may be insufficient in the beginning stage immediately after admission to the NICU. Nonetheless, we expect that semi-supervised machine learning and/or deep learning techniques could provide adequate seizure detection with acceptable error levels once the feature reduction algorithm is optimally designed.
REFERENCES
Absil, P.-A., Mahony, R., and Sepulchre, R. (2009). Optimization algorithms on matrix manifolds. Princeton University Press.
Barbaresco, F. (2008). Innovative tools for radar signal processing based on Cartan's geometry of SPD matrices & information geometry. Radar Conference, 2008. RADAR'08. IEEE, pages 1–6.
Celka, P. and Colditz, P. (2002). A computer-aided detection of EEG seizures in infants: a singular-spectrum approach and performance comparison. IEEE Trans. on Biomedical Engineering, 49(5):455–462.
Faul, S. et al. (2005). An evaluation of automated neonatal seizure detection methods. Clinical Neurophysiology, 116(7):1533–1541.
Gotman, J. et al. (1997). Automatic seizure detection in the newborn: methods and initial evaluation. Electroencephalography and Clinical Neurophysiology, 103(3):356–362.
Jahromi, M. et al. (2015). Estimating Positive Definite Matrices Using Frechet Mean. In Biosignals 2015.
Jeremic, A. and Nikolic, D. (2019). Detecting neonatal seizures using sample covariance estimation. Proc. Biosignals 2019, 4(1):225–230.
Li, Y. and Wong, K. M. (2013). Riemannian distances for EEG signal classification by power spectral density. IEEE Journal of Selected Topics in Signal Processing.
Liu, B., Jeremic, A., and Wong, K. (2007). Blind adaptive algorithm for M-ary distributed detection. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007, volume 2.
Liu, B., Jeremic, A., and Wong, K. (2014). Optimal distributed detection of multiple hypotheses using blind algorithm. IEEE Trans. on Aerospace and Electronic Systems, 50:1190–1203.
Mirjalily, G. et al. (2003). Blind adaptive decision fusion for distributed detection. IEEE Transactions on Aerospace and Electronic Systems, 39(1):34–52.
Moakher, M. (2005). A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 26(3):735–747.
Rankine, L., Stevenson, N., Mesbah, M., and Boashash, B. (2007). A nonstationary model of newborn EEG. IEEE Trans. on Biomed. Eng., 54(1):19–28.
Temko, A., Marnane, W., Boylan, G., and Lightbody, G. (2015). Clinical implementation of a neonatal seizure detection algorithm. Decision Support Systems, 70:86–96.
Varshney, P. (1986). Optimal data fusion in multiple sensor detection systems. IEEE Trans. on Aerospace and Electronic Systems, pages 98–101.
Volpe, J. (2001). Neurology of the newborn. WB Saunders Co.