Contactless Optical Detection of Nocturnal Respiratory Events
Belmin Alić¹ᵃ, Tim Zauber¹, Chen Zhang³, Wang Liao³, Alina Wildenauer⁴, Noah Leosz⁴, Torsten Eggert⁴, Sarah Dietz-Terjung⁴, Sivagurunathan Sutharsan⁵, Gerhard Weinreich⁵, Christoph Schöbel⁴, Gunther Notni³, Christian Wiede² and Karsten Seidl¹,²
¹Faculty of Engineering, University of Duisburg-Essen, Duisburg, Germany
²Fraunhofer Institute for Microelectronic Circuits and Systems, Duisburg, Germany
³Department of Mechanical Engineering, Ilmenau University of Technology, Ilmenau, Germany
⁴Center for Sleep Medicine, University Hospital Essen, Essen, Germany
⁵Department of Pneumology, University Hospital Essen, Essen, Germany
ᵃhttps://orcid.org/0000-0002-2630-3945
Keywords: Contactless, Optical, Apnea, Hypopnea, Respiration, Sleep, OSA, AHI, Multi-Spectral, Data Fusion.
Abstract: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder characterized by the collapse of the upper airway and associated with various diseases. For clinical diagnosis, a patient's sleep is recorded during the night via polysomnography (PSG) and evaluated the next day with regard to nocturnal respiratory events. The most prevalent events include obstructive apneas and hypopneas. In this paper, we introduce a fully automatic contactless optical method for the detection of nocturnal respiratory events. The goal of this study is to demonstrate how nocturnal respiratory events, such as apneas and hypopneas, can be autonomously detected through the analysis of multi-spectral image data. This represents the first step towards a fully automatic and contactless diagnosis of OSA. We conducted a trial patient study in a sleep laboratory and evaluated our results in comparison with PSG, the gold standard in sleep diagnostics. In a study sample with three patients, 24 hours of recorded video material and 245 respiratory events, we achieved a classification accuracy of 82 % with a random forest classifier.
1 INTRODUCTION
Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder characterized by the collapse of the upper airway, affecting approximately 30-50 % of the male and 15-25 % of the female population at a moderate severity level (Rundo, 2019), (Heinzer et al., 2015), (Weinreich et al., 2013). OSA events can be further distinguished into apneas and hypopneas: an obstructive apnea is defined as a reduction in airflow of more than 90 % relative to the airflow baseline for at least ten seconds while respiratory effort is maintained, whereas a hypopnea occurs if airflow falls by at least 30 % relative to baseline for at least ten seconds, accompanied by an oxygen desaturation of at least 3 % (Berry et al., 2020). For the clinical diagnosis of OSA, a patient's sleep is usually recorded via polysomnography (PSG) or polygraphy (PG) in a sleep laboratory setting.
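To make these scoring rules concrete, the following sketch applies the thresholds quoted above to a candidate event. The function and its inputs are illustrative, not part of our processing chain, and simplify the AASM rules to the criteria stated in this paragraph.

```python
def classify_event(airflow_drop: float, duration_s: float,
                   desaturation: float) -> str:
    """Score one candidate event with the simplified AASM criteria above.

    airflow_drop -- airflow reduction relative to baseline (0..1)
    duration_s   -- event duration in seconds
    desaturation -- accompanying oxygen desaturation in percentage points
    """
    if duration_s < 10:
        return "normal"    # too short to be scored as an event
    if airflow_drop > 0.9:
        return "apnea"     # >90 % airflow reduction for >=10 s
    if airflow_drop >= 0.3 and desaturation >= 3.0:
        return "hypopnea"  # >=30 % reduction with >=3 % desaturation
    return "normal"
```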
The main parameter for deciding whether OSA is present is the apnea-hypopnea index (AHI), which indicates how many apneas and hypopneas a patient has on average per hour of sleep. OSA is diagnosed either if the AHI exceeds 15 or if daytime sleepiness is reported in combination with an AHI of at least 5 (Berry et al., 2020), (Kapur et al., 2017). OSA manifests itself through various symptoms and complaints, both at night and during the day. Common symptoms include nighttime choking and snoring, daytime sleepiness, and concentration problems (Rundo, 2019), (Kapur et al., 2017). Several risk factors are associated with OSA, such as age, male gender, high BMI, or substance abuse. Likewise, OSA is linked to cardiovascular diseases such as hypertension, coronary atherosclerosis or heart failure, and to other conditions like depression and diabetes (Rundo, 2019), (Heinzer et al., 2015), (Weinreich et al., 2013). An association between OSA and an increased risk of motor vehicle accidents due to excessive daytime sleepiness points to the need for an accurate diagnosis and subsequent treatment (Karimi et al., 2015).
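The AHI itself is a simple ratio of event count to sleep time. A minimal sketch, with example values taken from Table 1:

```python
def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int,
                         sleep_time_h: float) -> float:
    """AHI = (apneas + hypopneas) per hour of sleep."""
    return (n_apneas + n_hypopneas) / sleep_time_h

# Patient 3 in Table 1: 44 obstructive apneas and 88 obstructive
# hypopneas over 6.23 h of sleep give (44 + 88) / 6.23 = 21.19,
# matching the tabulated obstructive AHI (oAHI) of 21.18 up to rounding.
print(round(apnea_hypopnea_index(44, 88, 6.23), 2))
```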
PSG is still considered the gold standard in the diagnosis of sleep-related breathing problems in sleep medicine, but this procedure usually leads to discomfort in patients and ultimately to unnatural sleep behavior and potentially biased results.
In the case of strong nocturnal movements or heavy sweating, electrodes and sensors are at risk of detaching from the patient or of causing artifacts during recording. Moreover, the application of PSG is time-consuming and requires close body contact. The introduction of a contactless alternative to PSG has the potential to reduce the discomfort in patients, the measurement artifacts due to detached electrodes, and the bias in the measurement results, owing to the absence of sensors on the body. Furthermore, a contactless alternative would require less direct contact between the patient and the medical staff and, hence, reduce both the risk of viral transmission and the workload of the medical staff.
In this work, we introduce a contactless optical system for detecting nocturnal respiratory events. The goal of this study is to demonstrate the ability to identify obstructive apneas and hypopneas in the patient's breathing pattern in order to enable a fully autonomous and completely contactless OSA diagnosis system.
2 STATE OF THE ART
Previous works dealing with the detection of nocturnal respiratory events can be categorized into the following groups: (1) neonatal apnea detection, such as in (Cattani et al., 2014) and (Lorato et al., 2021); (2) apnea detection via respiratory motion analysis, such as in (Gederi and Clifford, 2012), (Abad et al., 2016), (Akbarian et al., 2020) and (Geertsema et al., 2020); (3) apnea detection via depth camera, such as in (Yang et al., 2017) and (Veauthier et al., 2019); (4) apnea detection via combined respiratory motion and thermography analysis, such as in (Scebba et al., 2021); and (5) respiration rate measurement, such as in (Hu et al., 2018), (Vogels et al., 2018) and (Gastel et al., 2021).
From this literature overview, the following research gaps can be identified: (1) the OSA diagnosis accuracy of the contactless optical detection algorithms presented in the published literature needs to be increased further for clinical practice; considering the potential and advantages of contactless measurements during sleep, as discussed in the introduction, the need for further work in this area is even more evident; (2) very limited to no attention is given to the classification between hypopneas and apneas; and (3) a strong focus is placed on respiratory motion analysis, whereas the analysis of other biosignals, such as remote photoplethysmography (rPPG) or thermography of nose and mouth breathing, appears insufficiently researched.
3 IMPLEMENTATION
3.1 Measurement System
For the acquisition of the video data, a multi-modal measurement system is used. The sensor head of the measurement system is shown in Fig. 1 (Zhang et al., 2020). It is composed of a real-time NIR 3D sensor consisting of a NIR GOBO projector at 850 nm (Heist et al., 2018) and two NIR high-speed cameras at the same wavelength. This 3D sensor can reconstruct and transmit 3D images with very low latency at a frame rate of 15 Hz using the stereo matching acceleration algorithm BICOS (Dietrich et al., 2019) and an NVIDIA RTX 2080 graphics processing unit. The real-time 3D data enable the analysis of patients' head motion. Next to the 3D sensor, two NIR cameras at 780 nm and 940 nm as well as a thermal camera are mounted in the sensor head and synchronized with the 3D sensor for the estimation of vital signs from temporal variations of skin reflectance and temperature. For the NIR video acquisition, an LED array consisting of one 780 nm LED and three 940 nm LEDs provides irritation-free active illumination that does not disturb sleeping patients during the night. Furthermore, a color camera is included in the sensor head but not used in this nighttime application. The measurement setup in the sleep laboratory at the University Hospital Essen is shown in Fig. 2. The sensor head is placed perpendicular to the pillow at a distance of 150 cm from the mattress.
3.2 Data Processing Chain
An overview of all steps in the data processing chain is shown in Fig. 3. The data processing chain starts with the three image streams from the two band-pass filtered (central wavelengths 780 nm and 940 nm, FWHM = 10 nm) monochromatic cameras and the FIR camera. Fig. 4 illustrates the extraction of different vital signals from the multimodal 3D video data. In the video data preprocessing, all 2D and 3D multimodal images are first registered with each other via image transformations. Then, the approach proposed in (Zhang et al., 2020) is implemented with some adaptations. In the first video frame, the face region and a large set of facial landmarks are detected in the 2D image at 780 nm using the library MediaPipe (Lugaresi et al., 2019). Based on the detected facial landmarks, the eye and mouth regions are painted in black for video anonymization.
Figure 1: Sensor head composed of a GOBO projector (1), two NIR cameras at 850 nm (2, 3), a NIR camera at 780 nm (4), a NIR camera at 940 nm (5), a thermal camera (6), an LED array with LEDs at 780 nm and 940 nm (7), and a reserved color camera (8).
Figure 2: Measurement setup in the sleep laboratory.
In addition, different regions of interest (ROIs) are determined on the forehead, at the eye corner, and below the nostrils, and are then transformed into the 3D image, as shown in Fig. 4. From the second frame onwards, the current 3D face pose is modelled as a rigid-body transformation and estimated from the 3D locations of the facial landmarks tracked in the current frame and their 3D locations in the first frame. Using the estimated 3D face pose, the 3D ROIs created in the first frame are transformed into the 3D coordinate system of the current frame. This ensures that the ROIs in different frames always refer to the same skin areas. Furthermore, the 3D face poses estimated in different frames are used for motion analysis. If strong head movements are detected, the face tracking is restarted.
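As an illustration of the landmark step, the sketch below detects facial landmarks with the MediaPipe Face Mesh API (Lugaresi et al., 2019) and derives a coarse 2D forehead ROI. The input frame path and the landmark indices are illustrative assumptions; the actual chain additionally transforms the ROIs into 3D as described above.

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

with mp_face_mesh.FaceMesh(static_image_mode=False,
                           max_num_faces=1) as face_mesh:
    frame = cv2.imread("frame_780nm.png")  # hypothetical 780 nm frame
    # MediaPipe expects RGB input; the NIR frame is treated as 3-channel here.
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        landmarks = results.multi_face_landmarks[0].landmark
        h, w = frame.shape[:2]
        # A few upper-face landmarks as a coarse forehead ROI (illustrative).
        pts = [(int(landmarks[i].x * w), int(landmarks[i].y * h))
               for i in (10, 67, 297)]
        x0, x1 = min(p[0] for p in pts), max(p[0] for p in pts)
        y0, y1 = min(p[1] for p in pts), max(p[1] for p in pts)
        forehead_roi = frame[y0:y1, x0:x1]
```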
The transformed 3D ROIs are then located in the 2D images at 780 nm and 940 nm as well as in the thermal images. The forehead ROI is projected onto the NIR images, and the eye corner and nostrils ROIs are projected onto the thermal images. From the forehead ROI, rPPG signals are extracted at both wavelengths using the approach in (Zhang et al., 2020), which is based on Eulerian video magnification (Wu et al., 2012). The body temperature is extracted from the eye corner ROI, and the respiratory temperature signal is obtained from the nostrils ROI. Due to the lack of a reference measurement in this study, the temperature signal extracted from the eye corner ROI is omitted in the remainder of this study.
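The EVM-based extraction of (Zhang et al., 2020) is too involved for a short example, but the basic idea of reducing an ROI video to a one-dimensional time-series can be sketched with plain spatial averaging. This simplified stand-in is not the method used in our chain:

```python
import numpy as np

def roi_time_series(frames: np.ndarray, roi: tuple) -> np.ndarray:
    """Reduce an ROI video to a 1D signal by spatial averaging.

    One sample per frame: the mean pixel value inside the ROI. For the
    NIR cameras this yields an average gray-scale pixel value (AGPV)
    trace; for the FIR camera it yields a temperature trace.

    frames -- array of shape (n_frames, height, width)
    roi    -- (y0, y1, x0, x1) bounding box in pixel coordinates
    """
    y0, y1, x0, x1 = roi
    return frames[:, y0:y1, x0:x1].mean(axis=(1, 2))
```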
After the computed rPPG and FIR time-domain signals have been obtained, the useful signal sequences have to be extracted. The first step in this phase is the time synchronization of the measured data and the reference data. The second step is to identify all timestamps at which respiratory events occurred and to check whether both the measured and the reference data are available for the time of the event. If either the measured or the reference data are not available for the whole duration of the event, the sequence is excluded.
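A sketch of this availability check, assuming events and measured data are represented as timestamp intervals in seconds:

```python
def event_is_usable(event_start: float, event_end: float,
                    measured_intervals: list) -> bool:
    """Keep an event only if some measured interval covers it entirely.

    measured_intervals -- list of (start, end) timestamp pairs during
    which both the measured and the reference data are available.
    """
    return any(start <= event_start and event_end <= end
               for start, end in measured_intervals)
```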
The time-domain signal preprocessing starts with detrending and normalizing the three measured time-series signals. Secondly, the time-series are band-pass filtered with a fixed lower cut-off frequency of 0.0667 Hz. This lower cut-off frequency is selected based on the findings of the analysis of the breathing process in a PPG signal in (Charlton et al., 2016). The upper cut-off frequency is varied among 0.8, 1.0 and 1.5 Hz, the highest frequency range in which respiratory features are expected. Short time-series sequences of an FIR signal and of a 780 nm rPPG signal with different upper cut-off frequencies are shown in Fig. 5 and Fig. 6.
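A sketch of this preprocessing using SciPy is given below. The Butterworth filter type and its order are assumptions (the paper does not specify them), and the 15 Hz frame rate of the 3D sensor is assumed as the sampling rate:

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt

def preprocess(signal: np.ndarray, fs: float = 15.0,
               f_upper: float = 1.0) -> np.ndarray:
    """Detrend, normalize and band-pass filter one time-series.

    The fixed lower cut-off is 0.0667 Hz; the upper cut-off is varied
    among 0.8, 1.0 and 1.5 Hz in the experiments described above.
    """
    x = detrend(signal)
    x = (x - x.mean()) / x.std()
    b, a = butter(N=4, Wn=[0.0667, f_upper], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)
```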
The last step in the preprocessing stage is to identify the respiratory events in the data. During this step, it is observed that the duration of an event ranges between 10 and 70 seconds. Due to this dissimilarity in length, events that are longer than 10 seconds are divided into 10-second-long snips. These event snips are used for further data processing. Since normal breathing patterns are also required for the further analysis, an approximately equal number of snips from normal respiration is obtained.
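A minimal sketch of the snipping step; how a trailing remainder shorter than 10 seconds is handled is not stated in the text, so it is dropped here:

```python
import numpy as np

def snip_event(event_signal: np.ndarray, fs: float = 15.0,
               snip_s: float = 10.0) -> list:
    """Split one event into consecutive 10-second snips."""
    snip_len = int(snip_s * fs)
    n_snips = len(event_signal) // snip_len   # trailing remainder dropped
    return [event_signal[i * snip_len:(i + 1) * snip_len]
            for i in range(n_snips)]
```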
For the next phase in the data processing chain, the event snips from both the raw and the band-pass filtered time-series (f_{c,lower} = 0.0667 Hz, f_{c,upper} = 1.0 Hz) are forwarded. The remaining three phases in the data processing chain are elaborated in separate subsections.
3.3 Data Fusion and Feature Extraction
Data fusion and feature extraction are the two key as-
pects of this work. We aim to take advantage of the
Figure 3: Overview of the steps in the data processing chain.
Figure 4: Extraction of rPPG signals and temperature (FIR) signals from multi-modal 3D video data.
We aim to take advantage of the information contained in the multi-spectral data and fuse it in a data fusion stage in order to enhance the detection of respiratory events. This is accomplished by simultaneously feeding the multi-spectral time-series signals into the feature extraction.
In the feature extraction stage, the individual features of each time-series, as well as the correlations between them, are analyzed in order to find the best-performing features. The goal of the feature extraction stage is to determine independent signal attributes which hold information on the respiratory behavior of the patient. The features to be extracted from the multi-spectral time-series signals are determined in the following ways: (1) through discussions with sleep physicians and lung physicians on characteristic patterns and expected waveforms of respiratory signals; (2) through statistical and signal analysis methods; (3) through manual screening of the data and recognition of patterns and features; (4) through a review of the published literature; and finally (5) through feature evaluation algorithms (as presented in subsection 3.4).
A first evaluation of the usefulness of a feature is performed with histograms, in order to analyze the distribution of a certain feature across normal respiration and respiratory events. Due to space considerations, only two histograms are presented, in Figures 7 and 8. A total of 50 individual features are extracted from all three spectral signals (780 nm rPPG, 940 nm rPPG and FIR), both for the raw and for the band-pass filtered signals. For the same reason, not all features can be listed in this paper.
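The fusion itself can be pictured as the concatenation of per-signal features into one vector per snip. The sketch below computes only the five top-ranked features named in subsection 4.2 (interpreting a positive turning point as a local maximum is an assumption); the full chain uses 50 features in total:

```python
import numpy as np
from scipy.stats import skew

def features_for_signal(x: np.ndarray) -> list:
    """Five of the features, computed for one 10-second snip."""
    d = np.diff(x)
    pos_turning = int(np.sum((d[:-1] > 0) & (d[1:] < 0)))  # local maxima
    return [
        float(x.max() - x.min()),              # peak-to-peak distance
        pos_turning,                           # positive turning points
        float(np.sum(x ** 2)),                 # absolute signal energy
        float(skew(x)),                        # skewness
        float(np.mean(np.abs(x - x.mean()))),  # mean absolute deviation
    ]

def fused_feature_vector(rppg_780: np.ndarray, rppg_940: np.ndarray,
                         fir: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate per-signal features into one vector."""
    return np.concatenate([features_for_signal(s)
                           for s in (rppg_780, rppg_940, fir)])
```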
3.4 Feature Selection
The feature selection is conducted with the Sequential Forward Selection (SFS) and the Sequential Backward Selection (SBS) algorithms (Jain, 1997). In order to evaluate the selection process, a random forest classifier is used. All features mentioned in subsection 3.3 are calculated for all three time-series signals (both for the raw and the band-pass filtered version) and entered into the feature selection stage. The goal of the feature selection stage is to rank the features according to their contribution to the classification task at hand. The results of the feature selection stage are discussed in subsection 4.2.
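A sketch of the forward selection using scikit-learn's SequentialFeatureSelector wrapped around a random forest; the library choice and the placeholder data are assumptions. Note that this selector returns a feature subset rather than a full ranking; a ranking can be recovered by recording the order in which features are added.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

# Placeholder data standing in for the snip feature matrix
# (n_snips x n_features) and labels from the feature extraction stage.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))    # 50 features, as in subsection 3.3
y = rng.integers(0, 2, size=500)  # 0 = normal, 1 = respiratory event

rf = RandomForestClassifier(n_estimators=100, random_state=0)
sfs = SequentialFeatureSelector(rf, n_features_to_select=15,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support(indices=True))  # indices of the 15 kept features
```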
3.5 Classification
Two types of classification are conducted: (1) a two-class classification, which differentiates between normal and abnormal respiration (apneas and hypopneas treated as one class); and (2) a three-class classification, which differentiates between normal respiration, apneas and hypopneas. Several classifiers based on machine learning models are built and trained for both classification tasks. The included classifiers are: decision tree classifier, random forest classifier, naive Bayes classifier, linear regression analysis, quadratic regression analysis and support vector machine. For each model, a 5-fold cross-validation is implemented. Furthermore, several iterations are conducted for each model with a different number of features in order to evaluate the performance of the classifiers with respect to the model complexity. The features are ranked in the feature selection stage, and groups of features are formed starting from the best-ranked feature downwards. The ranking of the best-performing features is presented in subsection 4.2.
Figure 5: Time-series of the FIR signal with fixed lower cut-off and varied upper cut-off frequencies. From top to bottom: (1) raw signal; (2) f_{c,upper} = 1.5 Hz; (3) f_{c,upper} = 1.0 Hz; and (4) f_{c,upper} = 0.8 Hz.
Figure 6: Time-series of the 780 nm rPPG signal with fixed lower cut-off and varied upper cut-off frequencies. From top to bottom: (1) raw signal; (2) f_{c,upper} = 1.5 Hz; (3) f_{c,upper} = 1.0 Hz; and (4) f_{c,upper} = 0.8 Hz. The scale on the y-axis is the average gray-scale pixel value (AGPV).
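The iteration over feature-group sizes with 5-fold cross-validation can be sketched as follows; the data, the ranking and the scikit-learn model are placeholders for the actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholders for the snip feature matrix, labels and feature ranking.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
y = rng.integers(0, 2, size=500)
ranked = list(range(50))  # feature indices, best-ranked first

for k in range(1, 26):    # grow the group from 1 to 25 top-ranked features
    X_k = X[:, ranked[:k]]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    acc = cross_val_score(clf, X_k, y, cv=5, scoring="accuracy").mean()
    print(f"{k:2d} features: accuracy = {acc:.3f}")
```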
4 RESULTS
4.1 Study Sample
Three patients are included in this trial study. All three had been transferred to the sleep laboratory of the University Hospital Essen because of suspected OSA, and this was their initial diagnostic measurement. All three measurements resulted in a diagnosis of OSA, with two patients having a mild to moderate and one patient having a moderate to severe case of OSA. The severity grading of OSA is done according to the American Academy of Sleep Medicine (AASM) (Berry et al., 2020). The reference measurements in our study are evaluated semi-automatically. This means that the evaluation is first done automatically by the Noxturnal software from ResMed Inc. and then checked and corrected by experienced sleep physicians according to the standard provided by the AASM.
All three patients spent one night in the sleep laboratory. A parallel measurement with our multi-modal measurement device is performed for each patient. The summed sleep time equals 14.53 hours, of which 10.11 hours are successfully measured with our measurement device (ca. 70 %). The remaining 30 % were not obtained due to one of the following reasons: (1) the patient was out of the measurement device's field of view; (2) the patient was rotated completely to the left or to the right side; (3) short video intervals were excluded due to movement artifacts; or (4) a hand was covering one or more of the ROIs. According to the reference evaluations of the PSG, a total of 102 apneas and 143 hypopneas are registered over the whole sleep duration. In the time periods successfully captured by the measurement device, a total of 67 apneas and 99 hypopneas are registered. An overview of relevant patient information, sleep parameters and measurement parameters is given in Table 1. The sleep parameters include the AHI, the obstructive AHI (oAHI), the oxygen desaturation index (ODI) and the number of apneas and hypopneas. The average, standard deviation (STD) and sum are provided for the applicable parameters. The total duration of the recorded video data is 24 hours (each night between 10 PM and 6 AM).
This study is approved by the Ethics Committee of the Faculty of Medicine, University of Duisburg-Essen (approval no. 21-10312-BO).
4.2 Classification Results
Table 1: Overview of the patient sample included in the study with their personal information and relevant sleep and measurement parameters.

| Patient No. | Gender | Age (years) | Height (cm) | Weight (kg) | BMI | Total sleep time (h) | Measured sleep time (h) | AHI | oAHI | ODI | Total obstr. apneas | Measured obstr. apneas | Total obstr. hypopneas | Measured obstr. hypopneas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | male | 27 | 188 | 105 | 29.70 | 5.45 | 4.86 | 14.50 | 14.50 | 9.40 | 54 | 40 | 25 | 21 |
| 2 | male | 48 | 180 | 98 | 30.20 | 2.85 | 1.64 | 11.93 | 11.93 | 33.70 | 4 | 2 | 30 | 15 |
| 3 | female | 51 | 172 | 85 | 28.70 | 6.23 | 3.61 | 29.00 | 21.18 | 34.60 | 44 | 25 | 88 | 63 |
| Average | – | 42.00 | 180.00 | 96.00 | 29.53 | 4.84 | 3.37 | 18.47 | 15.87 | 25.90 | 34.00 | 22.33 | 47.67 | 33.00 |
| STD | – | 13.00 | 8.00 | 10.15 | 5.35 | 1.77 | 1.62 | 9.21 | 4.77 | 14.30 | 26.46 | 19.14 | 35.02 | 26.15 |
| Sum | – | – | – | – | – | 14.53 | 10.11 | – | – | – | 102 | 67 | 143 | 99 |
Figure 7: Histogram of the number of positive turning points for the filtered 780 nm rPPG signal. The curve in beige shows normal respiration, the curve in black shows apneas, and the curve in violet shows hypopneas.
Figure 8: Histogram of the peak-to-peak distance of the FIR signal. The curve in beige shows normal respiration, the curve in black shows apneas, and the curve in violet shows hypopneas.
A summary of the model accuracy with respect to the number of features for different classifiers is shown in Fig. 9 for the two-class classification and in Fig. 10 for the three-class classification. The best accuracy for the two-class classification problem is 82 %, reached with the 15 best-ranked features and the random forest classifier. Overall, it can be observed that the random forest classifier provides the highest classification accuracy for any number of features. The random forest classifier obtains an accuracy of 80 % with only 2 features. This is followed by a slight increase in accuracy until 15 features are reached, after which the accuracy stays approximately the same. A similar behavior, with slightly lower accuracy, is observed for all other classifiers except the decision tree classifier. The best accuracy for the three-class classification problem is 74.8 %, reached with the 15 best-ranked features and the linear regression analysis. For a low number of features (up to five), the random forest classifier is again the most accurate model. However, the linear regression analysis outperforms the random forest classifier in every other iteration with a feature count higher than five.
For both classification tasks, a convergence of the accuracy is observed after 15 features. Iterations are continued until 25 features are reached and then terminated, as no further increase in accuracy is detected. The top-ranked features found by the SFS algorithm are: (1) the peak-to-peak distance of the FIR signal; (2) the number of positive turning points of the filtered 780 nm and 940 nm rPPG signals; (3) the absolute signal energy of the filtered and unfiltered FIR signal; (4) the skewness of the filtered 780 nm rPPG signal; and (5) the mean absolute deviation of the filtered FIR signal.
The peak-to-peak distance is the distance between the global maximum and the global minimum of a signal. As expected, the value of this feature for the FIR signal tends to be lower for respiratory events than for normal respiration, since the breathing amplitude is lower and, hence, the temperature variations in the nostrils ROI are lower as well. This behavior is shown in Fig. 8. The number of positive turning points shows lower values for normal respiration than for respiratory events, as shown in Fig. 7. The assumed, but not verified, reason for this is the lower SNR of respiratory event signals due to their lower amplitudes. The behavior of the absolute signal energy feature is as expected: it tends to be higher for normal respiration than for respiratory events. This expectation is based on the assumption that the overall area under the curve is larger for normal respiration than for respiratory events. Skewness is a measure of signal asymmetry. In our analysis, the skewness of the filtered 780 nm rPPG signal tends to be higher for normal breathing patterns than for respiratory events. The mean absolute deviation is a measure of the average absolute deviation from the mean. In the case of the FIR signal, the expected behavior is that the value of this feature is higher for normal respiration, due to the overall higher amplitudes.
Figure 9: Accuracy of the two-class classification problem with respect to different numbers of features and different classification models. The features are ranked and selected by the SFS algorithm.
Figure 10: Accuracy of the three-class classification problem with respect to different numbers of features and different classification models. The features are ranked and selected by the SFS algorithm.
5 DISCUSSION
Within the scope of the study, interdisciplinary discussions between sleep physicians and engineers are conducted in order to understand the medical and practical requirements for an automated tool to aid the OSA diagnosis process. We concluded that the tool needs to be able to classify the OSA stage correctly. This means that a very precise estimation of the AHI score and of each individual respiratory event is not essential and lies beyond the practical requirements of the diagnosis process. For more practical context, we considered a study (Cachada et al., 2017) that evaluates the detection of respiratory events by the Noxturnal software on a sample of 120 patients. The evaluation in that study is based on the average number of respiratory events over the entire duration of sleep and not on the detection of individual events.
The results presented in (Cachada et al., 2017) for obstructive apneas are 38.95 ± 24.53 events/hour for Noxturnal and 36.92 ± 25.86 events/hour for manual evaluation, resulting in a Pearson correlation of 0.954. For hypopneas, they are 19.35 ± 12.84 events/hour for Noxturnal and 25.87 ± 17.31 events/hour for manual evaluation, resulting in a Pearson correlation of 0.84. No study evaluating the performance of software-based detection of individual respiratory events was found, which means that a direct comparison with our results is not possible. Nevertheless, the results obtained in our trial study on a small number of patients are very promising and an encouragement to conduct more patient measurements in order to gain a larger dataset and to further develop and improve our models. The currently achieved accuracy proves to be satisfactory and in line with the practical requirements for sleep diagnostics. Further comparisons with the accuracy of currently available respiratory event detectors (such as the Noxturnal software) will be performed after a larger patient dataset has been obtained.
During the feature selection and classification, we observed that features from all spectral ranges contribute to the classification process and enhance the accuracy compared to using only images from a single spectrum. This supports our premise that the use of multi-spectral image data positively influences the model accuracy. The highest share among the best-ranked features comes from the FIR signal. Although the share of 780 nm rPPG and 940 nm rPPG features is lower, they are still present among the best-ranked features and therefore contribute significantly to the classification accuracy.
6 CONCLUSION
In this work, we have introduced a novel contactless optical method for detecting nocturnal respiratory events. The method is tested on a sample of three patients and 245 respiratory events and achieves a two-class classification accuracy of 82 % with the random forest classifier and a three-class classification accuracy of 74.8 % with a linear regression analysis. An unexpectedly early convergence of the accuracy after only 15 top-ranked features is observed. Further investigations are required in order to understand this early accuracy convergence and to analyze whether the inclusion of new features (e.g., demographic patient data) will increase the classification accuracy.
This study is the first step towards a fully autonomous contactless optical diagnosis of OSA. The results are highly promising, and further improvements in model accuracy and robustness are expected after data from more patients have been obtained. Further patient measurements in the sleep laboratory are already planned. After obtaining these, we will add the final step to the data processing chain, which is the automatic diagnosis of the OSA stage. Due to the limited number of patients in this trial study, we decided not to include this stage, but rather to focus on developing an efficient detection of respiratory events.
ACKNOWLEDGEMENTS
This work is funded through a research grant (No. SE 3160/4-1 675251) from the German Research Foundation (DFG).
REFERENCES
Abad, J., Muñoz-Ferrer, A., Prada, M., López, C., Marin, A., Martínez-Rivera, C., Morera-Prat, J., and Ruiz-Manzano, J. (2016). Automatic video analysis for obstructive sleep apnea syndrome diagnosis. Sleep, 39.
Akbarian, S., Montazeri Ghahjaverestan, N., Yadollahi, A., and Taati, B. (2020). Distinguishing obstructive versus central apneas in infrared video of sleep using deep learning: Validation study. Journal of Medical Internet Research, 22(5).
Berry, R., Quan, S., Abreu, A., et al. (2020). The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications, Version 2.6. American Academy of Sleep Medicine.
Cachada, N., Thomas, M., and Wharton, S. (2017). Comparison of manual and automatic scoring of limited channel sleep studies: Noxturnal software correlates well with manual scoring in severe OSA. Page PA2301.
Cattani, L., Alinovi, D., Ferrari, G., Raheli, R., Pavlidis, E., Spagnoli, C., and Pisani, F. (2014). A wire-free, non-invasive, low-cost video processing-based approach to neonatal apnoea detection. Pages 67–73.
Charlton, P. H., Bonnici, T., Tarassenko, L., Clifton, D. A., Beale, R., and Watkinson, P. J. (2016). An assessment of algorithms to estimate respiratory rate from the electrocardiogram and photoplethysmogram. Physiological Measurement, 37:610–626.
Dietrich, P., Heist, S., Landmann, M., Kühmstedt, P., and Notni, G. (2019). BICOS - an algorithm for fast real-time correspondence search for statistical pattern projection-based active stereo sensors. Applied Sciences, 9:3330.
Gastel, M. V., Stuijk, S., Overeem, S., Dijk, J. P. V., Gilst, M. M. V., and Haan, G. D. (2021). Camera-based vital signs monitoring during sleep - a proof of concept study. IEEE Journal of Biomedical and Health Informatics, 25:1409–1418.
Gederi, E. and Clifford, G. D. (2012). Fusion of image and signal processing for the detection of obstructive sleep apnea. Pages 890–893.
Geertsema, E., Visser, G., Sander, L., and Kalitzin, S. (2020). Automated non-contact detection of central apneas using video. Biomedical Signal Processing and Control, 55:101658.
Heinzer, R., Vat, S., Marques-Vidal, P., Marti-Soler, H., Andries, D., Tobback, N., Mooser, V., Preisig, M., Malhotra, A., Waeber, G., Vollenweider, P., Tafti, M., and Haba-Rubio, J. (2015). Prevalence of sleep-disordered breathing in the general population: the HypnoLaus study. The Lancet Respiratory Medicine, 3:310–318.
Heist, S., Dietrich, P., Landmann, M., Kühmstedt, P., Notni, G., and Tünnermann, A. (2018). GOBO projection for 3D measurements at highest frame rates: a performance analysis. Light: Science and Applications, 7:1–13.
Hu, M., Zhai, G., Li, D., Fan, Y., Duan, H., Zhu, W., and Yang, X. (2018). Combination of near-infrared and thermal imaging techniques for the remote and simultaneous measurements of breathing and heart rates under sleep situation. PLOS ONE.
Jain, A. (1997). Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:153–158.
Kapur, V. K., Auckley, D. H., Chowdhuri, S., Kuhlmann, D. C., Mehra, R., Ramar, K., and Harrod, C. G. (2017). Clinical practice guideline for diagnostic testing for adult obstructive sleep apnea: An American Academy of Sleep Medicine clinical practice guideline. Journal of Clinical Sleep Medicine, 13:479–504.
Karimi, M., Hedner, J., Häbel, H., Nerman, O., and Grote, L. (2015). Sleep apnea related risk of motor vehicle accidents is reduced by continuous positive airway pressure: Swedish traffic accident registry data. Sleep, 38:341–349.
Lorato, I., Stuijk, S., Meftah, M., Kommers, D., Andriessen, P., van Pul, C., and de Haan, G. (2021). Automatic separation of respiratory flow from motion in thermal videos for infant apnea detection. Sensors, 21(18).
Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.-L., Yong, M. G., Lee, J., Chang, W.-T., Hua, W., Georg, M., and Grundmann, M. (2019). MediaPipe: A framework for building perception pipelines.
Rundo, J. V. (2019). Obstructive sleep apnea basics. Cleveland Clinic Journal of Medicine, 86:2–9.
Scebba, G., Da Poian, G., and Karlen, W. (2021). Multispectral video fusion for non-contact monitoring of respiratory rate and apnea. IEEE Transactions on Biomedical Engineering, 68(1):350–359.
Veauthier, C., Ryczewski, J., Mansow-Model, S., Otte, K., Kayser, B., Glos, M., Schöbel, C., Paul, F., Brandt, A., and Penzel, T. (2019). Contactless recording of sleep apnea and periodic leg movements by nocturnal 3-D-video and subsequent visual perceptive computing. Scientific Reports, 9:16812.
Vogels, T., Gastel, M. V., Wang, W., and Haan, G. D. (2018). Fully-automatic camera-based pulse-oximetry during sleep. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2018-June:1430–1438.
Weinreich, G., Wessendorf, T. E., Erdmann, T., Moebus, S., Dragano, N., Lehmann, N., Stang, A., Roggenbuck, U., Bauer, M., Jöckel, K. H., Erbel, R., Teschler, H., and Möhlenkamp, S. (2013). Association of obstructive sleep apnoea with subclinical coronary atherosclerosis. Atherosclerosis, 231:191–197.
Wu, H. Y., Rubinstein, M., Shih, E., Guttag, J., Durand, F., and Freeman, W. (2012). Eulerian video magnification for revealing subtle changes in the world. ACM Transactions on Graphics (TOG), 31.
Yang, C., Cheung, G., Stankovic, V., Chan, K., and Ono, N. (2017). Sleep apnea detection via depth video and audio feature learning. IEEE Transactions on Multimedia, 19(4):822–835.
Zhang, C., Gebhart, I., Kühmstedt, P., Rosenberger, M., and Notni, G. (2020). Enhanced contactless vital sign estimation from real-time multimodal 3D image data. Journal of Imaging, 6:123.