The Impact of Class Weight Optimization on Improving Machine Learning Outcomes in Identifying COVID-19 Specific ECG Patterns

Sara Khan, Walaa N. Ismail, Shada Alsalamah, Ebtesam Mohamed and Hessah A. Alsalamah

Department of Information Systems and Technology Management, George Washington University, Washington, DC, U.S.A.

Management Information Systems Department, College of Business, Al Yamamah University, Riyadh, K.S.A.

Information Systems Department, King Saud University, Riyadh, K.S.A.

Faculty of Computer Science, Minia University, Minia, Egypt

Computer Engineering Department, College of Engineering and Architecture, Al Yamamah University, Riyadh, K.S.A.

Keywords: Convolutional Neural Network (CNN), Covid-19, Data Imbalance, Electrocardiogram (ECG), Class Weights, VGG16.

Abstract: The Covid-19 pandemic has resulted in 550 million cases and 6.3 million fatalities, with the virus severely affecting the lungs and cardiovascular system. This study uses a VGG16 model adapted for a 12-lead ECG image database to assess the disease's impact on cardiovascular health. It addresses the challenge of data imbalance by comparing three training approaches: a balanced dataset, an imbalanced dataset, and an imbalanced dataset with class weight adjustments. The models perform three-class classification of ECG images into Abnormal, Covid-19, and Normal categories. Performance is evaluated using accuracy scores, confusion matrices, and classification reports. The model trained on the balanced dataset achieved 90% accuracy. When trained on the imbalanced dataset, accuracy dropped to 82%; with class weight adjustments, it recovered to 87%. These results indicate that the adapted VGG16 model can handle both balanced and imbalanced datasets effectively. Further testing and enhancement on additional datasets could make it a valuable tool for understanding the cardiovascular implications of Covid-19.

1 INTRODUCTION

Training a model on an imbalanced dataset gives rise to the class imbalance problem, which is undesirable in supervised machine learning. During training, the model becomes biased towards the majority class. It may still achieve a high accuracy score, but only because it over-predicts the majority class while misclassifying the minority class. One way to overcome data imbalance is to balance the class weights, that is, to increase the class weight of the category with less data. By default, the learning algorithm assumes every category to be of equal weight. Learning can be influenced by passing customized class weight values during training. The minority class is given a higher class weight, and as the model processes each training sample, that sample's error is multiplied by the weight of its class. Errors on categories with higher class weights therefore contribute more to the loss that the model attempts to minimize.

https://orcid.org/0000-0002-1499-438X
https://orcid.org/0000-0002-3054-5015
https://orcid.org/0000-0002-4761-0864

2 RELATED WORK

In the ever-changing field of healthcare technology, the application of machine learning, especially deep learning, has shown significant promise. Numerous recent research efforts have covered various dimensions of its capability, from monitoring cardiovascular health amid the COVID-19 pandemic to detecting heart abnormalities.

Khan, S., Ismail, W., Alsalamah, S., Mohamed, E. and Alsalamah, H.
The Impact of Class Weight Optimization on Improving Machine Learning Outcomes in Identifying COVID-19 Specific ECG Patterns.
DOI: 10.5220/0012413100003657
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2024) - Volume 2, pages 562-567
ISBN: 978-989-758-688-0; ISSN: 2184-4305

An innovative study integrated 5G technology

into a real-time cardiovascular monitoring system

tailored for COVID-19 patients (Tan, 2021). Utilizing

a combination of convolutional neural networks

(CNNs) and long short-term memory (LSTM)

networks, the research achieved a prediction accuracy

of 99.29%, illustrating the potential of real-time

monitoring and deep learning in COVID-19 patient

care.

Another study used a one-dimensional CNN (1D-CNN) for classifying various types of ECG rhythms and beats (Darmawahyuni, 2022). The model, trained on multiple databases, achieved an accuracy of 99.98%, demonstrating the power of deep learning in diagnosing complex heart abnormalities.

Arrhythmia Classification. Focusing on the

classification of arrhythmias into five categories, a

particular study employed deep convolutional neural

networks and used a well-established arrhythmia

database for training (Raza, 2022). The model

attained an accuracy of up to 98.9% with clean data,

emphasizing the effectiveness and reliability of

machine learning in heart disease diagnosis.

COVID-19 Detection Based on ECG. Two studies

specifically tackled the early detection of COVID-19

through ECG trace images (Shahin, 2021) (Attallah,

2022). One study tested multiple CNN architectures

and found one model to outperform the others with an

89.64% accuracy rate. Another study examined a

broader array of deep learning algorithms and

achieved an accuracy rate of 98.8% in binary

classification scenarios.

Beyond ECG: Other Applications in COVID-19 Detection. Research has also extended into other

diagnostic methods for COVID-19, particularly

focusing on chest X-ray images (El-Rashidy, 2020)

(Ozturk, 2020). High levels of accuracy, surpassing

97%, were achieved using various machine learning

models, with one study notably demonstrating

consistent training and testing accuracy, which speaks

to the model's robustness.

In conclusion, these studies set robust benchmarks

and provide a solid foundation in healthcare

applications involving machine learning. The current

study aims to contribute to this body of work by

introducing a technique for optimizing class weights

in imbalanced datasets to improve machine learning

model performance.

3 METHODOLOGY

The work completed can be divided into four

sections: Dataset Gathering, Pre-processing Dataset,

Building and Training model, and Evaluating Results.

3.1 Data Gathering

The VGG16 model is trained using a publicly

available ECG image database (Khan, 2021). This

database was created by collecting 12-lead ECG

images using the "EDAN SERIES-3" ECG device,

with a sampling rate of 500 Hz. The device was

installed in the Cardiac Care and Isolation units of

various healthcare institutes across Pakistan. Initially,

the database contained the following numbers of

images: 250 for COVID-19 patients, 859 for normal

individuals, 77 for myocardial infarction patients, 203

for patients with a previous history of myocardial

infarction, and 548 for patients with abnormal

heartbeats. For the purpose of three-class multiclass

classification, images belonging to the abnormal,

COVID-19, and normal categories were selected.

To create a balanced dataset, a total of 750 images

were used, with each category containing 250 images.

For an imbalanced dataset, 1470 images were

utilized: 380 images from patients with abnormal

heartbeats, 250 from COVID-19 patients, and 840

from normal individuals.

3.2 Pre-Processing Dataset

The methods used for processing the images are important for the model to learn the features needed to classify the images accurately. The images are processed in MATLAB in three steps: gamma correction (Fig. 1B), grayscaling (Fig. 1C), and cropping (Fig. 1D). Gamma correction adjusts brightness and contrast; the gamma value used is 0.6. Setting gamma below 1 brightens the image and enhances darker regions, which decreases sensitivity to differences in lighting and makes relevant patterns easier for the model to learn. Color is not an essential feature for this problem, hence the images are grayscaled. Grayscaling reduces the computational power required, increases training speed, and simplifies the learning process. Grayscale images also consume less storage space, which should be taken into consideration when dealing with large datasets.


3.3 Building and Training Model

The VGG16 model is built, trained, and tested using Python 3.7, the TensorFlow 2.9.2 library, and the Keras interface. The model accepts a pre-processed 12-lead ECG image as input and is used in three experiments. The images are categorized into three types: Abnormal, Covid-19, and Normal. The detailed experiment architecture is depicted in Fig. 2.

Figure 1: (A) Original image, (B) image after gamma correction, (C) image after grayscaling, (D) image after cropping.

Figure 2: Experiment architecture.

Transfer learning is a deep learning approach or

machine learning method where knowledge gained

after developing and training a model for one task is

transferred to another model for some other task. In

other words, the parameters of a pre-trained model are

reused to train another model that has a different task

or dataset. Transfer learning helps achieve better

performance even with fewer data at a much faster

speed. Several pre-trained models are available for

use. In this study, the VGG16 model is used. The

updated CNN architecture of the VGG16 model is

shown in Fig. 3.

The model is given an input image of size 500 x 700 and trained for 10 epochs. The data are split into 80% for training and 20% for testing, with validation images amounting to 10% of the dataset taken from the training portion. The model is trained with a batch size of 4 and an SGD optimizer with a learning rate of 0.001, and the images are loaded in RGB color mode.

Figure 3: CNN Architecture for VGG16 network.

4 TRAINING MODEL

Three experiments were conducted as follows:

4.1 Training Model Using Balanced

Dataset

The model is trained on 525 images and validated on 75 images drawn from the Abnormal, Covid-19, and Normal categories. The accuracy and loss on the training and validation sets for each epoch are plotted in Fig. 4 (A) and Fig. 4 (B), respectively, and recorded numerically in Table 1.

Figure 4: VGG16 Balanced Dataset: A) Training accuracy

(Red) and Validation accuracy (Blue) VS Epochs. B)

Training loss (Red) and Validation loss (Blue) VS Epochs.

Table 1: Balanced dataset accuracy and loss during training.

Epoch   Training Accuracy   Training Loss   Validation Accuracy   Validation Loss
1       73%                 0.66            76%                   0.79
2       84%                 0.36            81%                   0.49
3       86%                 0.29            93%                   0.38
4       91%                 0.20            92%                   0.22
5       93%                 0.17            94%                   0.13
6       96%                 0.11            93%                   0.13
7       97%                 0.09            92%                   0.18
8       98%                 0.07            92%                   0.19
9       98%                 0.05            94%                   0.14
10      98%                 0.04            94%                   0.11

4.2 Training Model Using Imbalanced

Dataset

The model is trained on 233 Abnormal images, 103 Covid-19 images, and 691 Normal images and validated on 49 images from each category. The accuracy and loss of the training set and validation set for each epoch are visually shown in Fig. 5 (A) and Fig. 5 (B) respectively and recorded numerically in Table 2.

HEALTHINF 2024 - 17th International Conference on Health Informatics

Figure 5: VGG16 Imbalanced Dataset: A) Training

accuracy (Orange) and Validation accuracy (Blue) VS

Epochs. B) VGG16 Imbalanced Dataset: Training loss

(Orange) and Validation loss (Blue) VS Epochs.

Table 2: Imbalanced dataset accuracy and loss during training.

Epoch   Training Accuracy   Training Loss   Validation Accuracy   Validation Loss
1       83%                 0.46            47%                   0.99
2       85%                 0.34            74%                   0.71
3       89%                 0.25            82%                   0.64
4       92%                 0.18            81%                   0.53
5       93%                 0.14            81%                   0.64
6       95%                 0.12            82%                   0.53
7       95%                 0.10            80%                   0.72
8       97%                 0.07            88%                   0.50
9       97%                 0.06            89%                   0.51
10      98%                 0.04            81%                   0.72

4.3 Training Model Using Balanced Class Weights in Imbalanced Dataset

The model is trained on the imbalanced dataset while explicitly passing class weight values to the training procedure; the weights were calculated automatically using scikit-learn's utils module. The accuracy and loss on the training and validation sets for each epoch when balancing class weights in an imbalanced dataset are plotted in Fig. 6 (A) and Fig. 6 (B), respectively, and recorded numerically in Table 3.
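As an illustration of how such weights arise and act on the loss, the following minimal sketch applies the "balanced" heuristic, weight = n_samples / (n_classes * class_count), to the training counts reported in Section 4.2, and then scales a per-sample cross-entropy by the weight of the true class. Treating this heuristic as the one used in the study is an assumption (it is the formula scikit-learn applies for class_weight="balanced"); the helper names and the predicted probabilities are hypothetical.

```python
import math


def balanced_class_weights(counts):
    """Return weight_c = n_samples / (n_classes * n_c) for each class."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


def weighted_cross_entropy(true_label, predicted_probs, weights):
    """Cross-entropy of one sample, multiplied by the weight of its true class."""
    return -weights[true_label] * math.log(predicted_probs[true_label])


# Training-set counts from Section 4.2 (imbalanced dataset).
train_counts = {"Abnormal": 233, "Covid-19": 103, "Normal": 691}
weights = balanced_class_weights(train_counts)

# Hypothetical softmax output for a single image.
probs = {"Abnormal": 0.2, "Covid-19": 0.1, "Normal": 0.7}

# The same raw error costs more when the true class is under-represented.
unweighted = -math.log(probs["Abnormal"])
weighted = weighted_cross_entropy("Abnormal", probs, weights)
```

Under this heuristic the Covid-19 class receives the largest weight and the Normal class a weight below 1, so mistakes on minority-class images contribute more to the total loss. In Keras, a dictionary of this kind, keyed by integer class index, can be supplied through the class_weight argument of Model.fit.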

Figure 6: VGG16 balanced class weights (Imbalanced

dataset): A) Training accuracy (Orange) and Validation

accuracy (Blue) VS Epochs. B) VGG16 Imbalanced

Dataset with Weighted Class: Training loss (Orange) and

Validation loss (Blue) VS Epochs.

Table 3: Balancing class weights in imbalanced dataset.

Epoch   Training Accuracy   Training Loss   Validation Accuracy   Validation Loss
1       74%                 0.58            82%                   0.53
2       82%                 0.35            87%                   0.41
3       87%                 0.26            84%                   0.40
4       90%                 0.17            80%                   0.72
5       92%                 0.14            74%                   1.00
6       92%                 0.12            85%                   0.45
7       95%                 0.08            87%                   0.55
8       95%                 0.07            83%                   0.66
9       97%                 0.05            87%                   0.45
10      98%                 0.04            85%                   0.56

5 TESTING TRAINED MODELS

The confusion matrix in multiclass classification

helps in calculating the following metrics for each

category:

• True Positive (TP): The number of predictions

where the classifier correctly predicts the

positive class. For example, the classifier

predicts an image to be of the covid-19

category which in fact was of the covid-19

category.

• False Positive (FP): The number of predictions

where the classifier incorrectly predicts the

negative class as positive. For example, the

classifier incorrectly predicts an image to be of

the covid-19 category which in fact was of the

non-covid-19 category.

• False Negative (FN): The number of

predictions where the classifier incorrectly

predicts the positive class as negative. For

example, the classifier incorrectly predicts an

image to be of the non-covid-19 category

which in fact was of the covid-19 category.

• True Negative (TN): The number of

predictions where the classifier correctly

predicts the negative class. For example, the

classifier correctly predicts an image to be of

the non-covid-19 category which in fact was

of the non-covid-19 category.
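The four counts defined above can be read off a multiclass confusion matrix by treating one category as positive and the remaining categories as negative. The sketch below is one way to do this; the function name is hypothetical, and the matrix uses the Experiment 1 counts reported later in this section, with rows as true classes and columns as predicted classes (an assumed layout).

```python
def one_vs_rest_counts(matrix, labels, positive):
    """Return TP, FP, FN, TN for one class of a square confusion matrix.

    matrix[i][j] is the number of images whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    k = labels.index(positive)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # true class k, predicted otherwise
    fp = sum(row[k] for row in matrix) - tp   # predicted k, true class otherwise
    tn = total - tp - fn - fp
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


labels = ["Abnormal", "Covid-19", "Normal"]

# Experiment 1 (balanced dataset) counts as reported in the text.
experiment_1 = [
    [42, 0, 8],   # true Abnormal
    [0, 50, 0],   # true Covid-19
    [7, 0, 43],   # true Normal
]

covid_counts = one_vs_rest_counts(experiment_1, labels, "Covid-19")
abnormal_counts = one_vs_rest_counts(experiment_1, labels, "Abnormal")
```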


Table 4: Classification report for all three experiments.

(A) Balanced Dataset

            Precision   Recall   F1-score   Support
Abnormal    0.86        0.84     0.85       50
Covid-19    1.00        1.00     1.00       50
Normal      0.84        0.86     0.85       50
Accuracy                         0.90       150

(B) Imbalanced Dataset

            Precision   Recall   F1-score   Support
Abnormal    0.96        0.47     0.63       98
Covid-19    1.00        1.00     1.00       98
Normal      0.65        0.98     0.78       98
Accuracy                         0.82       294

(C) Balanced Class Weights in Imbalanced Dataset

            Precision   Recall   F1-score   Support
Abnormal    0.93        0.67     0.78       98
Covid-19    1.00        1.00     1.00       98
Normal      0.74        0.95     0.83       98
Accuracy                         0.87       294

The confusion matrices are obtained by using the test set to make predictions with the models trained in Experiments 1, 2, and 3. The Experiment 1 model, trained with a balanced dataset and shown in Fig. 7 (A), correctly classifies 42 Abnormal images, 50 Covid-19 images, and 43 Normal images. It misclassifies 8 Abnormal images as Normal and 7 Normal images as Abnormal. The Experiment 2 model, trained with an imbalanced dataset and shown in Fig. 7 (B), correctly classifies 46 Abnormal images, 98 Covid-19 images, and 96 Normal images. It misclassifies 52 Abnormal images as Normal and 2 Normal images as Abnormal. The Experiment 3 model, trained by balancing class weights in an imbalanced dataset and shown in Fig. 7 (C), correctly classifies 66 Abnormal images, 98 Covid-19 images, and 93 Normal images. It misclassifies 32 Abnormal images as Normal and 5 Normal images as Abnormal.

The classification report of all three experiments for the VGG16 model is summarized in Table 4. The report shows precision, recall, and F1-score for each category: Abnormal, Covid-19, and Normal. Support refers to the number of test images in each category: 50 images per category (150 in total) for the balanced dataset and 98 images per category (294 in total) for the imbalanced dataset. The classification report also shows the accuracy of the model on the whole test set.
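The reported precision, recall, F1-score, and accuracy values follow directly from the confusion-matrix counts. As a hedged sketch (the function name and the row/column layout are assumptions, with rows as true classes and columns as predicted classes), the following code recomputes the report for Experiments 2 and 3 from the counts given in the text.

```python
def classification_report(matrix, labels):
    """Per-class precision, recall, F1 and support, plus overall accuracy."""
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum: predicted as k
        actual = sum(matrix[k])                    # row sum: truly k (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual}
    correct = sum(matrix[k][k] for k in range(len(labels)))
    total = sum(sum(row) for row in matrix)
    report["accuracy"] = correct / total
    return report


labels = ["Abnormal", "Covid-19", "Normal"]

# Counts as reported in the text for the two imbalanced-dataset experiments.
experiment_2 = [[46, 0, 52], [0, 98, 0], [2, 0, 96]]   # no countermeasure
experiment_3 = [[66, 0, 32], [0, 98, 0], [5, 0, 93]]   # balanced class weights

report_2 = classification_report(experiment_2, labels)
report_3 = classification_report(experiment_3, labels)

# Change in Abnormal recall attributable to class weighting.
recall_gain = report_3["Abnormal"]["recall"] - report_2["Abnormal"]["recall"]
```

Rounded to two decimals, these values agree with Table 4. They also show that the gain from class weighting comes mainly from the Abnormal class, whose recall rises by about 0.20.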

Figure 7: Confusion Matrix for VGG16: A) Using Balanced

Dataset. B) Using Imbalanced Dataset. C) Balancing

weights in Imbalanced Dataset.

6 CONCLUSION

In this study, we develop a VGG16 deep learning

model using TensorFlow by modifying its final dense

layers. The model is subsequently trained on an ECG

Image Database to detect abnormalities and

successfully distinguish among three categories:

Abnormal heartbeat, COVID-19, and Normal ECG

images. All images undergo gamma correction

processing in a MATLAB environment.

We acknowledge the issue of data imbalance

present within the dataset and propose a method to

mitigate this problem by balancing the class weights.

For experimental validation, two datasets are created:

one balanced and one imbalanced. Three experiments

are conducted to demonstrate differences in training,

accuracy, and performance based on the distribution

of image categories. Model evaluation is performed

using a test set.

Results from Experiment 2 reveal that when an imbalanced dataset is used without any countermeasures, accuracy decreases from 90% to 82%. Experiment 3 shows that applying balanced class weights during training increases accuracy by 5 percentage points compared to Experiment 2, resulting in an overall accuracy of 87%.

Therefore, the method introduced improves the model's performance for this specific dataset and may be applicable to similar classification problems. In conclusion, this study demonstrates that machine learning models are not only useful for image classification but also offer utility when data availability is a constraint.


ACKNOWLEDGMENT

The Authors would like to acknowledge Professor

Haseeb Ahmad Khan's contribution in providing

guidance to the first author. His expertise,

mentorship, and constructive feedback were

instrumental in shaping this research and ensuring its

success.

REFERENCES

Attallah, O. (2022). “ECG-BiCoNet: An ECG-based pipeline for COVID-19 diagnosis using Bi-Layers of deep features integration,” Computers in Biology and Medicine.

Darmawahyuni, A., Nurmaini, S., et al. (2022). “Deep learning-based electrocardiogram rhythm and beat features for heart abnormality classification,” PeerJ Comput. Sci.

El-Rashidy, N., El-Sappagh, S., Islam, M. R., El-Bakry, H. M. and Abdelrazek, S. (2020). “End-to-end deep learning framework for Coronavirus (COVID-19) detection and monitoring,” Electronics (Basel), vol. 9, no. 9, p. 1439.

Khan, H., Hussain, M. and Malik, M. (2021). “ECG Images dataset of Cardiac and COVID-19 Patients,” Data in Brief, vol. 34.

Ozturk, T., Talo, M. E., et al. (2020). “Automated detection of COVID-19 cases using deep neural networks with X-ray images,” Comput. Biol. Med., vol. 121, p. 103792.

Raza, A., Tran, K. P., Koehl, L. and Li, S. (2022). “Designing ECG monitoring healthcare system with federated transfer learning and explainable AI,” Knowl. Based Syst., vol. 236, p. 107763.

Shahin, I., Bou Nassif, A. and Alsabek, M. (2021). “COVID-19 Electrocardiograms Classification using CNN Models,” arXiv preprint arXiv:2112.08931.

Tan, L. et al. (2021). “Toward real-time and efficient cardiovascular monitoring for COVID-19 patients by 5G-enabled wearable medical devices: a deep learning approach,” Neural Comput. Appl., pp. 1–14.
