The Usage of Data Augmentation Strategies on the Detection of Murmur

Waves in a PCG Signal

Jos

e Torres

, Jorge Oliveira

and Elsa Ferreira Gomes

1,3

Instituto Superior de Engenharia do Porto, Porto, Portugal

Universidade Portucalense Infante D. Henrique, Porto, Portugal

INESC TEC, Porto, Portugal

ﬁ

Keywords:

PCG, Deep Learning, Heart Disease, CNN, SMOTE.

Abstract:

Cardiac auscultation is a key screening tool used for cardiovascular evaluation. When used properly, it speeds

up treatment and thus improving the patient’s life quality. However, the analysis and interpretation of the

heart sound signals is subjective and dependent of the physician’s experience and domain knowledge. A

computer assistant decision (CAD) system that automatically analyse heart sound signals, can not only support

physicians in their clinical decisions but also release human resources to other tasks. In this paper, and to the

best of our knowledge, for the ﬁrst time a SMOTE strategy is used to boost a Convolutional Neural Network

performance on the detection of murmur waves. Using the SMOTE strategy, a CNN achieved an overall of

88.43%.

1 INTRODUCTION

Cardiovascular Diseases (CVD) are the leading cause

of death worldwide. An estimated 17.9 million people

died from CVD in 2019, it represents 32% of the num-

ber of deaths worldwide (WHO, 2020). A common

method to detect cardiac diseases is through a car-

diac heart sound auscultation (Mustafa, 2020). Nev-

ertheless, heart sound auscultation is a difﬁcult proce-

dure, since it requires continuous training and many

heart sounds are faint and hard to detect. Fortunately,

modern stethoscopes such as the Litmann 3200 can

amplify heart sounds, reduce the environment noise,

improve the user’s perception and, more importantly,

convert an acoustic to a digital signal (Prodoctor2019,

2020). This allowed, for the ﬁrst time, the develop-

ment of computer assisted decision (CAD) systems

based on auscultation. Such systems can ﬁnd sound

pattern features related to a dysfunctional or malfunc-

tion cardiac heart valve. An early detection allows a

more accurate treatment plan and thus improving the

patient’s life expectancy (Singh and Cheema, 2013;

Latif et al., 2018). The cost of CAD system is reduced

when compared to the cost of a specialised health-

care professional and more speciﬁc exams. These sys-

tems can also be used in developing countries where

people do not have the monetary power to access

an effective and equitable health service that meets

their needs. For the creation of CAD systems, a

heart sound dataset is required. However, the exist-

ing datasets have few heart murmur samples which

makes the training of machine learning algorithms

difﬁcult. On the other hand oversampling methods

might mitigate this limitation by increasing the minor-

ity class in the dataset. In this paper, the application

data augmentation strategies on the detection of mur-

mur events are analysed, more speciﬁcally SMOTE

(synthetic minority oversampling technique) and its

effect on convolutional neural network (CNN) based

architectures. Up to our knowledge it is the ﬁrst time

that, realistic and synthetic Mel spectrograms images

are generated from abnormal heartbeat signals. These

synthetic images are similar to the images generated

using the original sounds (Figure 3). This allows to

balance the data, reduce the overﬁtting and ﬁnd a

more discriminate decision boundary. The paper is

organised as follows. Section 2 provides some back-

ground concerning the fundamental waves of each

phonocardiogram (PCG) signal . Section 3 refers to

the related works in the literature. Section 4 describes

the proposed methodology on the detection of mur-

mur waves. Section 5 presents the SMOTE oversam-

pling approach used. In Section 6 we discuss the re-

sults of our experiments. Finally, Section 7 presents

the conclusions and the future work.

128

Torres, J., Oliveira, J. and Gomes, E.

The Usage of Data Augmentation Strategies on the Detection of Murmur Waves in a PCG Signal.

DOI: 10.5220/0010784500003123

In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 4: BIOSIGNALS, pages 128-132

ISBN: 978-989-758-552-4; ISSN: 2184-4305

2 BACKGROUND

Normal heart sounds are mainly generated by the vi-

brations of cardiac valves as they open and close dur-

ing each cardiac cycle and the turbulence of the blood

in the arteries. Blood ﬂowing through these structures

creates audible sounds, which are more signiﬁcant

when the ﬂow is more turbulent (Libby et al., 2007).

The heartbeat has two basic sounds, namely S1 and

S2. Each sound corresponds to a period called systole

and diastole. Systole is caused by ventricular pres-

sure on the tricuspid and mitral valves. The ﬁrst heart

sound (S1) is audible on the chest wall and is pro-

duced by vibrations of both valves as they close in at

the beginning of the systole. (MA., 2008). Although

the mitral component of S1 is louder and occurs ear-

lier, under physiological resting conditions, both com-

ponents (mitral and tricuspid) occur closely enough,

making it hard to distinguish between them (Dorn-

bush S, 2021), an illustration of a S1 sound is pro-

vided in Figure 1. Diastole happens as the muscles

in the ventricles relax, causing pressures in the auri-

cles to be greater than those in the ventricles, forcing

the tricuspid and mitral valves to open and the pul-

monary valve and aortic valve to close. The second

heart sound (S2) is produced by the closure of the

pulmonary valve and aortic at the beginning of the

diastole. S2 is also formed by two components, with

the aortic component being louder and occurring ear-

lier than the pulmonary component (since the pressure

in the aorta is higher than in the pulmonary artery).

In contrast, unlike S1, under normal conditions, the

closure sound of the aortic and pulmonary valves can

be distinguishable. This is due to an increase in ve-

nous return during inspiration which slightly delays

the pressure increase in the pulmonary artery and con-

sequently the pulmonary valve closure (Dornbush S,

2021), an illustration of a S2 sound is provided in Fig-

ure 1.

3 RELATED WORK

Several works are available in the literature that

consider the problem of heart sound classiﬁcation.

Banerjee and Majhi (Banerjee and Majhi, 2020) used

MFCC to extract the information from the heart sound

signals in PASCAL challenge dataset. These features

are further feed into several CNN models, an accu-

racy of 83% was reported by the authors. Potes (Potes

et al., 2016) also used MFCC to train an ensemble

classiﬁer, and 86% accuracy was reported using Phy-

sioNet dataset. Khan (Khan et al., 2021) extracted

short-time Fourier transform (STFT) features as in-

Figure 1: An example of a normalized heart sound record-

ing, the position of the fundamental heart sounds S1 and

S2 are displayed and identiﬁed. Furthermore, the Systolic

(Sys) and the Diastolic (Dias) periods are also displayed and

identiﬁed.

put to CNN models. The authors reported an accu-

racy of 96.8% using PASCAL and PhysioNet dataset.

Boulares (Boulares et al., 2020) used MFCC spectro-

grams as input to pre-trained models, such as VGG-

19. The authors used PASCAL dataset and reported

an accuracy of 77%. Koike (Koike et al., 2020)

used MFCC and Mel Spectrogram to retrain a pre-

trained model entitled Pretrained Audio Neural Net-

works (PANNs). The authors used PhysioNet dataset

and reported a sensitivity of 96.9% and speciﬁcity of

88.6%. Zabihi (Zabihi et al., 2016) extracted several

features from time, frequency and time-frequency do-

main from PhysioNet dataset in order to train a feed

forward Artiﬁcial Neural Networks (ANNs). Due to

the imbalance problem between normal and abnor-

mal signals presented in the dataset, a bootstrap re-

sampling technique was used to balance the dataset.

The authors reported an overall of 85.90%. Baydoun

(Baydoun et al., 2020) extracted a set of statistical fea-

tures from PhysioNet dataset. To balance the dataset,

the authors used oversampling techniques to replicate

the information of the minority class. The best result

was obtained by combining LogitBoost and Bagging

with an overall of 86.6%.

4 METHODOLOGY

The proposed methodology for this study was tested

using PhysioNet dataset to compare our results with

current state of art approaches. Nevertheless, in the

training phase both PhysioNet and Pascal datasets

were used, as a result more murmur wave patterns are

The Usage of Data Augmentation Strategies on the Detection of Murmur Waves in a PCG Signal

129

provided during the learning process. In order to get

robust and trusty results, 10-fold stratiﬁed cross vali-

dation method was implemented. This method ensure

that the class distribution in each fold is similar to that

in the original dataset and is guaranteed that all data

is tested once.

4.1 Data Processing

Our pre-processing starts by resampling the signal

22000Hz frequency rate. After that, the sound is ﬁl-

tered using the Butterworth 4th order, with a cut-off

bandpass ﬁlter at frequencies 20Hz-400Hz. In the last

step, the signal is divided into segments of 6 seconds

long. PCG signals shorter than 6 seconds are padded

with zeros until the desired length is reached. Follow-

ing the Koike et al. (Koike et al., 2020) experiments,

Mel Spectrograms were computed using a Hamming

window of 30ms and with a stride of 15ms. This set-

ting allows 50% overlap which ensures that no rel-

evant information is lost in the process. After that,

the spectrogram is normalised to the [0,1] range. For

training the CNN model, we use spectrogram images

206px width and 92px high. The dataset contains

sounds of different patients. For each patient differ-

ent sounds were recorded. To determine if a patient

is normal, in our work, we grouped these sounds by

patient.

4.2 Model Selection and Conﬁguration

For the detection of murmur events, we used a CNN

model where all the weights were randomly initialised

from a uniform distribution. The CNN architecture

adopted from (Khan et al., 2021) is shown in Figure 2.

The optimiser selected is the Adam function with

a learning rate of 0.01 and the binnary crossentropy

is our loss function. The model weights were updated

using a batch size of 64 random patient’s data.

After the model weights are updated, the model is

tested in the test set and grouped by patient. For each

patient from test set, all of their sounds are classi-

ﬁed, and is determined whether or not the patient has

pathological heart condition. For a patient to be con-

sidered a patient with pathological heart condition, at

least one murmur wave must be detected in one of

their recordings, otherwise it is considered a normal

patient. This process is repeated until the model is

trained and tested for the number of epochs deﬁned.

In our experiments the model was trained using 30

epochs.

When applying this methodology we have ensured

that signal segments from the same patient are not

Figure 2: The adopted CNN architecture. The architecture

is composed by four convolution layers, followed by a pool-

ing layer and a dropout layer. The last output layer is a sig-

moid layer.

placed in more that one fold. Thus avoiding, over-

ﬁtting to a speciﬁc subject in our dataset.

4.3 Dataset

4.3.1 PASCAL Challenge Dataset

The dataset is composed of a total of 312 heart rates

recordings collected from children or adults in calm

or excited states, divided into two separate sets called

“dataset A” and “dataset B”. These sounds were

recorded at the Cardiology Unit of the Maternal-Fetal

Unit of the Hospital Real Portugu

es in Recife, Brazil.

These recordings were collected in children. The du-

ration ranges from 1 to 10 seconds. Each sound was

categorised into one of three classes: Normal, Mur-

mur and Extrasystole. The Normal class has 200

heart sounds, the Murmur has 97 heart sounds and

Extrasystole has 46 heart sounds.

4.3.2 PhysioNet Computing in Cardiology

Challenge

The PhysioNet dataset was made available in 2016

for a phonocardiogram classiﬁcation challenge. This

is composed by 6 datasets (A-F), with a total of

3240 cardiac sounds (2575 normal e 665 abnormal).

The recordings were collected from different research

groups. The patients include children, adults and el-

ders. The duration of recordings ranges from 5 sec-

onds to 120 seconds.

BIOSIGNALS 2022 - 15th International Conference on Bio-inspired Systems and Signal Processing

130

Figure 3: On the left, a spectrogram image from the PhysioNet database; On the right, a synthetic spectrogram image generated

by the adopted SMOTE algorithm.

4.4 Metrics

The metric used to evaluate the model’s performance

is an average between the sensitivity (1) and speci-

ﬁcity (2), commonly named ”overall” (3). Overall,

sensitivity and speciﬁcity can be calculated by using

the following formulas:

Sensitivity =

T P

T P + FN

(1)

Speci f icity =

T N

T N + FP

(2)

Overall =

Sensitivity + Speci f icity

(3)

Where TP (True Positive) is the number of sub-

jects with CVD who have been correctly identiﬁed as

subject with CVD, TN (True Negative) is the num-

ber of healthy subject correctly identiﬁed as healthy,

FP (False Positive) is the number of healthy subject

incorrectly identiﬁed as subject with CVD and FN

(False Negative) is the number of subjects with CVD

incorrectly identiﬁed as healthy subjects.

5 DATA AUGMENTATION USING

SMOTE

For training the model, we use the Pascal and Phys-

ioNet dataset. Combining both datasets results in a

dataset with approximately 74% normal sounds and

26% abnormal sounds. The existing class ratio may

result in a poor learning process. To avoid it, the

dataset was balanced using a SMOTE technique. Syn-

thetic spectrograms similar to the ones of the minority

class were generated using the following procedure:

1. Identify the minority class;

2. Select the number of nearest neighbours K;

3. Compute a new tensor from a minority tensor and

any of its neighbours and using Eq. (4);

4. Repeat step 3 for all minority tensors and their K

neighbors until the dataset is balanced.

S = C + rand(0, 1) ∗ di f f (4)

Where S represents a synthetic tensor, C is the

considered minority tensor and di f f is a difference

tensor computed from C and the selected K. To be

able to generate new images using SMOTE, initially

we convert from a 4D (batch, width, height, channels)

to 2D array. To convert an array to 2D reshape tech-

niques are used. After that, we use a SMOTE ap-

proach with K of 20 to generate the synthetic ten-

sors. To get the desired image dataset, we need to

reshape the SMOTE dataset with shapes multiplied

previously. The Fig.3 shows an example of an abnor-

mal synthetic image.

6 RESULTS

In order to make fair comparisons with current state-

of-art solutions, we have implemented and tested the

algorithm proposed by (Nogueira et al., 2019). Ta-

ble 1 reports our ﬁndings.

Table 1: Comparison of results.

Alt. Sensitivity Speciﬁcity Overall

(Nogueira et al., 2019) 87.37% 79.07% 83.22%

CNN 85.41% 90.02% 87.72%

SMOTE-CNN 84.51% 92.34% 88.43%

Both implemented CNN model’s obtained a

higher overall performance than (Nogueira et al.,

2019). This might be due to the fact that, our CNNs

models are deeper and have a larger receptive ﬁeld

than the CNN model implemented by (Nogueira

et al., 2019). Furthermore, the impact of a SMOTE

technique was evaluated and also reported in Table 1.

Using a SMOTE technique, synthetic spectrogram

images from abnormal examples are added to our

databases. As a result, abnormal patterns that have not

yet been seen by CNN models were extrapolated and

The Usage of Data Augmentation Strategies on the Detection of Murmur Waves in a PCG Signal

131

created. As a result, the network is more capable of

distinguishing normal from abnormal examples. The

application of SMOTE techniques boosted the CNN

model in overall by 0.71%.

7 CONCLUSIONS AND FUTURE

WORK

In this paper, the problematic concerning to the short-

age of abnormal heart sound examples is studied and

addressed. We proposed the usage of SMOTE tech-

niques to generate synthetic spectrogram images. The

best result was achieved by a SMOTE-CNN algo-

rithm, an overall of 88.43%, a Sensitivity of 84.51%

and a Speciﬁcity of 92.34%.

The current results, strongly indicate that the ap-

plication of oversampling techniques, such as the

SMOTE, can improve signiﬁcantly the capability of

CNN model’s to discriminate between normal and ab-

normal heart beats. The proposed SMOTE approach

can serve as a basis for other unbalanced dataset prob-

lems besides heart sounds problems.

For future work we intend to extend out experi-

ments to pre-trained models. We also intend explore

other oversampling techniques.

ACKNOWLEDGEMENTS

This work is ﬁnanced by National Funds through

the Portuguese funding agency, FCT-Fundac¸

para a Ci

encia e a Tecnologia, within project

UIDB/50014/2020.

REFERENCES

Banerjee, M. and Majhi, S. (2020). Multi-class heart sounds

classiﬁcation using 2d-convolutional neural network.

In 2020 5th International Conference on Computing,

Communication and Security (ICCCS), pages 1–6.

Baydoun, M., Safatly, L., Ghaziri, H., and El-Hajj, A.

(2020). Analysis of heart sound anomalies using en-

semble learning. Biomed. Signal Process. Control.,

62:102019.

Boulares, M., Alaﬁf, T., and Barnawi, A. (2020). Transfer

learning benchmark for cardiovascular disease recog-

nition. IEEE Access, 8:109475–109491.

Dornbush S, T. A. (2021). Physiology, heart sounds.

Khan, K. N., Khan, F. A., Abid, A., Olmez, T., Dokur,

Z., Khandakar, A., Chowdhury, M. E. H., and Khan,

M. S. (2021). Deep learning based classiﬁcation of un-

segmented phonocardiogram spectrograms leveraging

transfer learning.

Koike, T., Qian, K., Kong, Q., Plumbley, M. D., Schuller,

B. W., and Yamamoto, Y. (2020). Audio for audio is

better? an investigation on transfer learning models

for heart sound classiﬁcation. In 2020 42nd Annual

International Conference of the IEEE Engineering in

Medicine Biology Society (EMBC), pages 74–77.

Latif, S., Usman, M., Rana, R., and Qadir, J. (2018).

Phonocardiographic sensing using deep learning for

abnormal heartbeat detection. IEEE Sensors Journal,

18(22):9393–9400.

Libby, P., Bonow, R., Mann, D., and Zipes, D. (2007).

Braunwald’s Heart Disease: A Textbook of Cardio-

vascular Medicine. 8th edition. Elsevier Science.

MA., C. (2008). Cardiac auscultation: rediscovering the

lost art. Curr Probl Cardiol., 7(33):326–408.

Mustafa, M. e. a. (2020). Detection of heartbeat sounds

arrhythmia using automatic spectral methods and

cardiac auscultatory. Journal of Supercomputing,

76(8):5899–5922.

Nogueira, D. M., Ferreira, C. A., Gomes, E. F., and Jorge,

A. M. (2019). Classifying heart sounds using images

of motifs, mfcc and temporal features. J. Med. Syst.,

43(6).

Potes, C., Parvaneh, S., Rahman, A., and Conroy, B. (2016).

Ensemble of feature-based and deep learning-based

classiﬁers for detection of abnormal heart sounds. In

2016 Computing in Cardiology Conference (CinC),

pages 621–624.

Prodoctor2019 (2020). Estetosc

opio eletr

onico: tudo o que

voc

e precisa saber.

Singh, M. and Cheema, A. (2013). Heart sounds classiﬁ-

cation using feature extraction of phonocardiography

signal. International Journal of Computer Applica-

tions, 4(77):13–17.

WHO (2020). World health organization (2020) cardiovas-

cular diseases (cvds), cardiovascular diseases (cvds).

Zabihi, M., Rad, A. B., Kiranyaz, S., Gabbouj, M., and Kat-

saggelos, A. K. (2016). Heart sound anomaly and

quality detection using ensemble of neural networks

without segmentation. In 2016 Computing in Cardi-

ology Conference (CinC), pages 613–616.

BIOSIGNALS 2022 - 15th International Conference on Bio-inspired Systems and Signal Processing

132