GAN-Based Data Augmentation for Improving Biometric Authentication
Using CWT Images of Blood Flow Sounds
Natasha Sahare 1,2, Patricio Fuentealba 3,a, Rutuja Salvi 4, Anja Burmann 1,b and Jasmin Henze 1,c
1 Fraunhofer Institute for Software and Systems Engineering ISST, Dortmund, Germany
2 Technical University of Dortmund, Dortmund, Germany
3 Instituto de Electricidad y Electrónica, Facultad de Ciencias de la Ingeniería, Universidad Austral de Chile, Valdivia, Chile
4 IDTM GmbH, Recklinghausen, Germany
a https://orcid.org/0000-0002-7119-0580, b https://orcid.org/0000-0002-6989-1230, c https://orcid.org/0000-0001-7180-2578
Keywords:
Data Augmentation, Generative Adversarial Networks, Continuous Wavelet Transform, Convolutional Neural
Networks, Blood Flow Sounds, Biometry.
Abstract:
Biometric identification allows securing sensitive information. Since existing biometric traits, such as fingerprints, voice, etc., are associated with different limitations, we demonstrated the potential of blood flow sounds for biometric authentication in previous work. To this end, we used measurements from seven different users acquired with a custom-built auscultation device to calculate the spectrograms of these signals for each cardiac cycle using the continuous wavelet transform (CWT). The resulting spectral images were then used for training of a convolutional neural network (CNN). In this work, we repeated the same experiment with data from twelve users by adding more data from the original seven users and data from five more users. This led to an imbalanced dataset, where the amount of available data for the new users was much smaller, e.g., U1 had more
than 900 samples per side whereas the new user U9 had less than 100 samples per side. We experienced a
lower performance for the new users, i.e. their sensitivity was 18-21% lower than the overall accuracy. Thus,
we examined whether the augmentation of data leads to better results. This analysis was performed using
generative adversarial networks (GANs). The newly generated data was then used for training of a CNN with
several different settings, revealing the potential of GAN-based data augmentation for increasing the accuracy
of biometric authentication using blood flow sounds.
1 INTRODUCTION
Biometric identification systems are used to provide
security to data, as biometric data of an individual
is unique (Babiker et al., 2017). Existing biomet-
ric traits, such as fingerprints, voice, face, iris, gait,
signature and handwriting are associated with several
limitations and drawbacks including susceptibility to
forgery, lifelong persistence issues and sensitivity to
external environmental conditions. The search for al-
ternative biometric characteristics led to the discovery
of electrocardiogram (ECG) signals and heart sounds,
which offer novel biometric information. While ECG signals present unique advantages, such as resistance to tampering and forgery, they require a complex setup with electrodes, and heart sound recordings rely on an electronic stethoscope. The exploration of these biological characteristics aims to overcome the shortcomings of traditional biometric methods and enhance the reliability of biometric sensing tools (Salvi et al., 2021b).
In previous works, we demonstrated the potential
of blood flow sounds from the carotid arteries, ac-
quired by a custom-built auscultation device, for per-
son identification (Salvi et al., 2021b; Henze et al., 2022). To this end, we analysed the sound signals in the frequency domain using the continuous wavelet transform (CWT), assuming that the spectral energies in the blood flow sound contain significant information for the identification of an individual. We then trained two simple convolutional neural networks (CNNs) on the CWT images from those
measurements, separately for the measurements from each side. Here, we chose each heart cycle within the measurements as a single sample for training and testing. Irrespective of the side on which the measure-
ments were taken, this approach achieved an over-
all accuracy of over 95% for identifying 881 samples
from seven users. The confusion matrices for the ex-
periments on both sides are shown in Figure 1. As can
be seen, the sensitivities for each single user (marked
in dark blue) vary between 0.91 and 0.97 for the ex-
periment on the left side data and between 0.94 and
0.97 on the right side data.
In this work, we add more measurements from the
same seven users but also from five additional users to
further investigate the potential of blood flow sounds
for biometric authentication, leading to a dataset of
1,765 recordings in total. After retraining the CNN on the new dataset, it achieves an overall accuracy of over 87%, with sensitivity values of over 90% for most users. However, the sensitivity clearly drops for users that contributed fewer samples to the dataset. Thus, we investigate data augmentation techniques such as generative adversarial networks (GANs) and conditional generative adversarial networks (CGANs) to examine whether the model's accuracy improves when synthetic data is added to the real data.
As a first step, we generate synthetic images us-
ing both approaches and compare the results on a
small subset of three users. An investigation using the Fréchet inception distance (FID) shows that the images generated by the GANs are more similar to the original images, i.e. they have a smaller FID, than those generated by the CGAN. In line with this, the classification results using the GAN-generated samples are slightly better than those using the CGAN-generated samples. However, training a separate GAN for each of the labels takes significantly longer than training one CGAN for all labels. Since the improvement in classification is very small, we continue the experiments on the whole dataset with the CGAN.
Strategic use of 30% CGAN-generated data and 70%
real data yields the best results, improving sensitivity
for all users and specific labels by 6-10%. This study
highlights the trade-off between reliability and train-
ing time for GANs and CGANs and shows the poten-
tial benefits of synthetic data augmentation in limited
data domains.
2 MATERIAL AND METHODS
2.1 Data
The employed data comprises 1,765 carotid sound recordings sampled at 16 kHz and acquired with a custom-built audio auscultation device (Salvi et al., 2021a; Sühn et al., 2020). Each recording is 11 s long and was collected between December 2020 and April 2022 from twelve users (U1-U12). As shown in Table 1, the numbers of signals acquired from the left and right carotid arteries are overall balanced. They are analysed independently for each side.
Figure 1: Confusion matrices from previous experiments
showing the results from CNNs trained on blood flow
sounds from seven people for differentiation between the
individuals. Separately trained on data from the left side
(top) and right side (bottom). Sensitivities for each single
user are shown on the diagonal.
All signals were recorded under controlled cessation of breathing (apnea) to avoid potential noise generated by breathing episodes. To evaluate the signal quality, we visually examined the presence of S1 and S2 episodes, the main sounds produced by the mechanical contraction (systole) and relaxation (diastole) of the ventricles. As a result of this evaluation, we included 1,674 signals considered to be of good quality in the further processing.
Table 1: Number of CWT images for each user in the dataset. Includes more data for users U1-U7, who were already part of previous work, and additional data for users U8-U12, who were not included in previous work.
User ID Left Right
U1 907 921
U2 852 855
U3 965 1015
U4 1139 1149
U5 1150 1121
U6 1023 1011
U7 337 330
U8 270 270
U9 96 80
U10 181 192
U11 138 143
U12 1103 1086
2.2 Data Preparation and Classification
After gathering all the signals, we used a discrete wavelet transform to automatically detect swallowing and coughing artifacts within the signal, as presented in (Fuentealba et al., 2021). Next, we performed a spectral analysis of the signal properties based on the CWT. The spectral dynamics for each cardiac cycle were then examined independently using the segmentation function for phonocardiogram recordings proposed by Springer et al. (2016). This tool uses a duration-dependent, logistic regression-based hidden Markov model to pinpoint the S1, systole, S2, and diastole episodes. Following the time-domain segmentation of the signal, the associated CWT spectrum was segmented accordingly. Note that the spectral analysis covered the frequency range from 0 to fs/2 = 8 kHz.
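To make this preparation step concrete, the following minimal Python sketch computes a normalised CWT scalogram for one segmented cardiac cycle; the Morlet wavelet, the number of scales and the output width are illustrative assumptions and not necessarily the settings of our processing chain.

import numpy as np
import pywt
from scipy.signal import resample

FS = 16_000  # sampling rate of the carotid recordings (Hz)

def cwt_image(cycle, n_scales=64, out_width=128):
    # 'cycle' holds the audio samples of one cardiac cycle, e.g. between two
    # consecutive S1 onsets found by the HSMM-based segmentation.
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(cycle, scales, "morl", sampling_period=1.0 / FS)
    power = np.abs(coeffs) ** 2                 # spectral energy per scale and time step
    power = resample(power, out_width, axis=1)  # fixed image width for the CNN input
    return (power - power.min()) / (power.max() - power.min() + 1e-12)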
For the classification of the prepared CWT images, we created a CNN with three convolutional layers with max pooling, followed by two fully connected layers. Rectified linear activation functions are applied to all layers. For each input sample, the CNN determines a score for each of the available classes and outputs the class with the highest score; the model is evaluated in a stratified 5-fold cross-validation with 10 repetitions. The hyperparameters in this setting were a learning rate of 0.001, a batch size of 32 and 10 training epochs. This CNN attained an overall accuracy of over 87% for all 12 users. The same neural network was reused from the previous study with seven users (Salvi et al., 2021b); only the output layer was updated with more units, as it now has to return scores for 12 instead of seven users.
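As a rough illustration of this classifier, the Keras sketch below stacks three convolution/max-pooling blocks and two fully connected layers with ReLU activations; the filter counts, kernel sizes and input resolution are our own assumptions, while the 12-class output, the learning rate of 0.001 and the use of batch size 32 over 10 epochs follow the description above.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 1), n_classes=12):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes),  # one score per user; the argmax is the predicted user
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model

# per cross-validation fold: model.fit(x_train, y_train, batch_size=32, epochs=10)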
Figure 2: Confusion matrices from this work’s experi-
ments showing the results from CNNs trained on blood flow
sounds from twelve people (including seven from previous
work) for differentiation between the individuals. Sepa-
rately trained on data from the left side (top) and right side
(bottom). Sensitivities for each single user are shown on
the diagonal. Baseline result; no synthetic data was included.
2.3 Data Augmentation
We experienced a worse performance of our classi-
fication model for twelve users, particularly for the
new users who have comparatively less data than oth-
ers, as shown in Table 1. The confusion matrices for
the experiments on both sides are shown in Figure 2.
As can be seen, the sensitivities for each single user
(marked in dark blue) vary between 0.61 and 0.96 for
the experiment on the left side data and between 0.66
and 0.95 on the right side data. For some of the new
users, such as U9 and U11, the sensitivity is 18-21%
less than the overall accuracy. These results lead us to conclude that the amount of data plays an important role in the accuracy of the model's predictions. Rather
than asking the users to record more data, we investi-
gated data augmentation techniques.
Popular data augmentation techniques include
scaling, cropping, flipping, rotating, etc. (Shorten and
Khoshgoftaar, 2019). As the CWT represents the spectrogram in the time-frequency domain, where the x-axis represents time and the y-axis the frequency (Addison, 2018), flipping or rotating would completely change the meaning of the spectrogram. Therefore, we used generative models, which can generate new examples that plausibly could have been drawn from the original dataset. Generative adversarial networks (GANs) are widely used for producing clear and discrete synthetic outputs (Goodfellow et al., 2014).
A generative adversarial network (GAN) comprises two key components: the generator model and the discriminator model. The generator is tasked with creating new synthetic data, while the discriminator focuses on distinguishing between actual and generated synthetic data. The efficacy of a GAN hinges on training both models concurrently. Initially, the generator produces lower-quality data, but with ongoing training, it progressively enhances its capacity to craft more realistic data. Conversely, the discriminator begins by effortlessly telling real and synthetic data apart. As training proceeds, it eventually reaches a point where distinguishing between real and synthetic data becomes a formidable challenge, exemplifying the equilibrium achieved within the GAN framework (Goodfellow et al., 2014). See Figure 3 for an example
of a GAN-generated sample CWT image in compari-
son to a real CWT image from user U2.
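A compressed sketch of this adversarial training loop is shown below; the latent dimension, the optimizers and the binary cross-entropy losses are typical choices rather than the exact configuration used here, and generator/discriminator stand for hypothetical Keras models.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(2e-4)
d_opt = tf.keras.optimizers.Adam(2e-4)
LATENT_DIM = 100  # assumed size of the generator's noise input

@tf.function
def train_step(generator, discriminator, real_images):
    noise = tf.random.normal([tf.shape(real_images)[0], LATENT_DIM])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # discriminator: label real images as 1 and synthetic images as 0
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # generator: try to make the discriminator accept its output as real
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss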
A standard GAN, however, offers no control over which specific label will be produced by the generator; there is no mechanism to request a particular label from the GAN (Mirza and Osindero, 2014). Covering the twelve different labels of our dataset this way would be quite time consuming, as we would have to train 12 separate GANs for the 12 users. A variant of the GAN called conditional GAN (CGAN) can address this problem. It contains an additional input layer fed with one-hot-encoded image labels. A CGAN generates output for multiple labels at once and is much more efficient (Mirza and Osindero, 2014).
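To illustrate the conditioning mechanism, the sketch below builds a CGAN-style generator whose input is a noise vector concatenated with the one-hot user label; the layer sizes and the 128x128 output resolution are assumptions for illustration only, and the discriminator would receive the label in the same way.

import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM, N_CLASSES = 100, 12

def build_cgan_generator():
    noise = layers.Input(shape=(LATENT_DIM,))
    label = layers.Input(shape=(N_CLASSES,))        # one-hot encoded user ID
    x = layers.Concatenate()([noise, label])        # condition the generator on the label
    x = layers.Dense(16 * 16 * 128, activation="relu")(x)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu")(x)
    img = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="sigmoid")(x)
    return tf.keras.Model([noise, label], img)

# requesting synthetic CWT images for a specific user, e.g. U9 (class index 8):
# labels = tf.one_hot([8] * 64, N_CLASSES)
# noise = tf.random.normal([64, LATENT_DIM])
# fake_images = build_cgan_generator()([noise, labels])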
2.4 Evaluation of Augmented Data
We employed the standard assessment metric known as the Fréchet inception distance (FID) to gauge the quality of the generated images in comparison to the authentic image set.
Figure 3: Example of a real image from user U2 from our
dataset (left) and a synthetic image generated by the GAN
trained on data from U2 (right).
Similar to the inception score, this evaluation utilizes the Inception-v3 model (Borji, 2018). Specifically, the coding layer of the model, situated just before the output classification, captures pertinent computer-vision-oriented features from the input images. These activations are computed for both the real and the synthetic images, and their mean and covariance are estimated to obtain a multivariate Gaussian representation of the activations for each image set. The distance between these two Gaussians is the FID; a perfect score of 0.0 signifies that the two image sets are alike (Borji, 2018).
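Given the Inception-v3 activations of the real and the synthetic image set, the FID has a simple closed form; a minimal sketch using numpy and scipy (our choice of tooling, not necessarily the implementation used here) is:

import numpy as np
from scipy.linalg import sqrtm

def fid(act_real, act_fake):
    # rows = images, columns = features of the Inception-v3 coding layer
    mu_r, mu_f = act_real.mean(axis=0), act_fake.mean(axis=0)
    cov_r = np.cov(act_real, rowvar=False)
    cov_f = np.cov(act_fake, rowvar=False)
    cov_mean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(cov_mean):   # numerical noise can introduce tiny imaginary parts
        cov_mean = cov_mean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2.0 * cov_mean))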
Additionally, we assessed the practicality of
GANs and CGANs in our context by comparing the
classification performance attained using synthetic
images from both methodologies. This evaluation
was conducted on a subset comprising just three users
(U1, U2, and U3). Following the outcomes outlined
in Section 3, we opted to proceed with CGANs for
subsequent experiments.
To determine the optimal ratio of synthetic data for increasing the size of the training set, and consequently the accuracy of the biometric analysis of blood flow sounds, we trained CNNs with varied synthetic-to-real data proportions: 1:9 (10% synthetic to 90% real data), 3:7 (30% synthetic), and 5:5 (50% synthetic). Additionally, we explored an approach that augments only the labels with fewer data instances using CGAN-generated synthetic data while keeping the others unchanged, thereby achieving a balanced image count. Refer to Section 3 and Tables 2 and 3 for detailed insights into the outcomes of these experiments.
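The mixing strategies can be realized, for example, as sketched below; the helper names and the sampling logic are our own illustrative assumptions, and the per-user target of 1,000 images corresponds to the balanced dataset used in the experiments reported in Section 3.

import numpy as np

def mix_with_synthetic(real_x, real_y, synth_x, synth_y, synth_fraction, seed=0):
    # build a training set in which 'synth_fraction' (e.g. 0.1, 0.3, 0.5)
    # of all samples are synthetic, drawn at random from the CGAN output
    rng = np.random.default_rng(seed)
    n_synth = int(len(real_x) * synth_fraction / (1.0 - synth_fraction))
    idx = rng.choice(len(synth_x), size=n_synth, replace=False)
    return (np.concatenate([real_x, synth_x[idx]]),
            np.concatenate([real_y, synth_y[idx]]))

def fill_gaps(real_x, real_y, synth_x, synth_y, target_per_user=1000):
    # 'gap' strategy: add synthetic images only for users with fewer than
    # 'target_per_user' real images, leaving well-represented users untouched
    xs, ys = [real_x], [real_y]
    for user in np.unique(real_y):
        missing = target_per_user - int(np.sum(real_y == user))
        if missing > 0:
            pool = np.where(synth_y == user)[0][:missing]  # assumes enough synthetic samples
            xs.append(synth_x[pool]); ys.append(synth_y[pool])
    return np.concatenate(xs), np.concatenate(ys)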
3 RESULTS
Referring to Table 4, we can see that the FID values
for CGANs (considering only the left side of users’
data) are roughly twice as large as those for the cor-
responding GANs across the three users. This shows that the images generated by the GANs are more similar to the original images.
Table 2: Sensitivities for experiments with different combinations of real and augmented data for the left side. Column "Real" contains the results from the baseline with only real data. The columns "10%", "30%" and "50%" contain the results from experiments with the corresponding amount of augmented data. "Gap" refers to the approach of only adding as much augmented data as needed to balance the dataset for those users with smaller amounts of data.
User ID Real 10% 30% 50% Gap
U1 0.96 0.96 0.96 0.94 0.96
U2 0.88 0.89 0.89 0.86 0.91
U3 0.91 0.93 0.94 0.90 0.91
U4 0.93 0.93 0.94 0.91 0.93
U5 0.96 0.97 0.97 0.96 0.97
U6 0.94 0.96 0.96 0.93 0.94
U7 0.90 0.91 0.93 0.91 0.93
U8 0.92 0.93 0.93 0.91 0.95
U9 0.69 0.69 0.70 0.70 0.74
U10 0.81 0.82 0.85 0.82 0.86
U11 0.66 0.67 0.69 0.67 0.76
U12 0.93 0.93 0.94 0.90 0.95
Table 3: Sensitivities for experiments with different combinations of real and augmented data for the right side. Column "Real" contains the results from the baseline with only real data. The columns "10%", "30%" and "50%" contain the results from experiments with the corresponding amount of augmented data. "Gap" refers to the approach of only adding as much augmented data as needed to balance the dataset for those users with smaller amounts of data.
User ID Real 10% 30% 50% Gap
U1 0.95 0.95 0.95 0.93 0.96
U2 0.93 0.94 0.94 0.89 0.95
U3 0.92 0.92 0.93 0.90 0.94
U4 0.94 0.94 0.94 0.91 0.94
U5 0.92 0.93 0.94 0.91 0.94
U6 0.93 0.94 0.95 0.90 0.96
U7 0.90 0.91 0.91 0.90 0.93
U8 0.92 0.93 0.93 0.91 0.95
U9 0.66 0.69 0.70 0.64 0.74
U10 0.83 0.83 0.85 0.82 0.86
U11 0.69 0.70 0.72 0.67 0.76
U12 0.92 0.93 0.94 0.90 0.95
Interestingly, when we an-
alyze the two resulting confusion matrices from the
second evaluation method (one using GANs and the
other using CGANs with augmented real data in a
3:7 ratio), we notice only minor differences. When the dataset is augmented with GAN-generated data, the performance of the CNN is just slightly better, by about 0-1%, than with CGAN-generated data. This is intriguing given that GANs require about 8 hours of training per label, whereas a CGAN trains across all labels simultaneously in the same time frame.
Table 4: Results from the pre-experiments investigating the difference in synthetic images generated by the GANs vs. the CGAN based on a subset of three users (U1-U3). Shows the Fréchet inception distance for each user and both kinds of generated data.
User ID GAN CGAN
U1 11 23
U2 10 18
U3 12 29
This small performance gap prompts us to consider the efficiency of these two training approaches.
For the experimentation conducted on the com-
plete dataset enriched with CGAN-generated sam-
ples, our approach began with the integration of 10%
augmented data alongside the authentic images. The
outcomes demonstrated an initial marginal enhance-
ment of 1-2% across at least 8 out of the 12 users.
To provide further insight, we present the sensitivities for the various blends of real and augmented data in Tables 2 and 3. However, as the augmenta-
tion escalated to encompass 50% augmented data and
50% real data, a decline in results became evident,
indicative of overfitting. This phenomenon was illus-
trated by user U1, where the sensitivity dropped from
96% with only real data to 94% upon introducing 50%
augmented data. This pattern was echoed across nu-
merous labels, reflecting a decrease in accuracy by
3-4%.
Interestingly, a turning point was observed when
we employed 30% augmented data and 70% real
data. This configuration yielded promising outcomes,
showcasing a consistent rise in sensitivity across all
12 users. Optimal results materialized when we
strategically utilized CGAN-generated synthetic data
to bridge gaps in labels that required additional in-
stances to achieve a balanced dataset of 1,000 images
per user. Notably, some users, like U4, U5, U6, and
U12, already possessed over 1,000 images, render-
ing augmentation unnecessary. However, users such
as U1, U2, and U3, who required a modest influx
of CGAN-generated images to attain the 1,000-image
threshold, experienced modest performance improve-
ments ranging from 0-3%.
Employing this methodology yielded a general
augmentation in sensitivity for all users. Particu-
larly remarkable were the advancements in sensitiv-
ity achieved for labels U9, U10, and U11, which ex-
hibited increases of 6-10% using this approach, as
illustrated in Tables 2 and 3. This under-
scores the efficacy of judiciously introducing CGAN-
generated data to enrich datasets, resulting in substan-
tial improvements across diverse users and labels.
4 CONCLUSIONS
This work has presented a data augmentation ap-
proach to increase the size of training data for bet-
ter accuracy in investigating biometric properties in
blood flow sounds using GANs and a CNN. Previously, the CNN model had shown lower sensitivity for classes of users with comparatively less data. Therefore, we added data generated by GANs and CGANs to evaluate whether this leads to an improvement. When comparing the results from GANs and the CGAN in a pre-test on data from three users, it turns out that the GANs generate samples that are more similar (as indicated by a lower FID) to the original samples than those generated by the CGAN. On the other hand, the train-
ing of a GAN for one label from the presented dataset
takes 8 hours, whereas a CGAN is trained for all 12
labels at the same time. So, the GAN’s output is
slightly more reliable but time consuming, whereas
the CGAN has the advantage of producing multi-
labelled output and therefore taking much less train-
ing time.
Synthetically increasing the size of data using
these presented methods can be beneficial in a lim-
ited data domain. This study was mainly focused on
its application on CWT images of audio data, but the
concept can be expanded to other data domains. For
future work, it would be interesting to also differenti-
ate between correctly and incorrectly classified gener-
ated synthetic data. For GANs and CGANs to perform
better, more measurements from the users should be
included in the analysis.
ACKNOWLEDGEMENTS
Research funding: The authors acknowledge the
financial support from the State of North Rhine-
Westphalia and the European Union (EU EFRE [LS-
2-2-038a]).
REFERENCES
Addison, P. S. (2018). Introduction to redundancy rules:
the continuous wavelet transform comes of age.
Philos. Trans. R. Soc. A Math. Phys. Eng. Sci.,
376(2126):20170258.
Babiker, A., Hassan, A., and Mustafa, H. (2017). Heart
sounds biometric system. J. Biomed. Eng. Med. De-
vices, 2(2).
Borji, A. (2018). Pros and cons of GAN evaluation mea-
sures.
Fuentealba, P. et al. (2021). Carotid sound signal artifact detection based on discrete wavelet transform decomposition. Curr. Dir. Biomed. Eng., 7(2):299–302.
Goodfellow, I. J. et al. (2014). Generative adversarial networks.
Henze, J. et al. (2022). Towards identification of biometric properties in blood flow sounds using neural networks and saliency maps. Curr. Dir. Biomed. Eng., 8(2):540–543.
Mirza, M. and Osindero, S. (2014). Conditional generative adversarial nets.
Salvi, R. et al. (2021a). Bodytune: Multi auscultation device-personal health parameter monitoring at home. Curr. Dir. Biomed. Eng., 7(2):5–8.
Salvi, R. et al. (2021b). Vascular auscultation of carotid artery: Towards biometric identification and verification of individuals. Sensors, 21(19).
Shorten, C. and Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. J. Big Data, 6(1):60.
Springer, D. B., Tarassenko, L., and Clifford, G. D. (2016). Logistic regression-HSMM-based heart sound segmentation. IEEE Trans. Biomed. Eng., 63(4):822–832.
Sühn, T. et al. (2020). Auscultation system for acquisition of vascular sounds towards sound-based monitoring of the carotid artery. Med. Devices Evid. Res., 13:349–364.