2 LITERATURE REVIEW
As artificial intelligence and computing technology have advanced, deep learning has been adopted in a variety of industries, and convolutional neural networks (CNNs) have come to the fore in image enhancement. In 2015, Dong et al. first applied a CNN to image super-resolution reconstruction with the Super-Resolution Convolutional Neural Network (SRCNN). Because SRCNN runs slowly and is computationally expensive, Dong et al. improved it in 2016, increasing its speed, and named the new model Fast SRCNN (FSRCNN). SRCNN first enlarges the low-resolution image by interpolation and then restores detail through the network.
However, Shi et al. (2016) argued that the enlargement itself should be learned by the network rather than performed by fixed interpolation beforehand. Based on this principle, they proposed the Efficient Sub-Pixel Convolutional Network (ESPCN), an image super-resolution algorithm built around a sub-pixel convolution layer. This layer rearranges learned feature maps into a higher-resolution grid, achieving the magnification inside the network and greatly reducing the computational cost relative to SRCNN, as the sketch below illustrates.
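As a concrete illustration, a sub-pixel layer can be written in PyTorch as a convolution followed by a pixel shuffle; the channel counts below are illustrative rather than ESPCN's exact configuration.

import torch
import torch.nn as nn

# A convolution produces r*r channels per output channel, and PixelShuffle
# rearranges them into an r-times larger spatial grid.
r = 4                                    # upscaling factor
conv = nn.Conv2d(64, 1 * r * r, kernel_size=3, padding=1)
shuffle = nn.PixelShuffle(r)

x = torch.randn(1, 64, 24, 24)           # low-resolution feature maps
y = shuffle(conv(x))                     # shape: (1, 1, 96, 96)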
In 2020, Basak et al. introduced a channel attention mechanism into SRCNN and achieved good results. Both SRCNN and ESPCN use Mean Squared Error (MSE) as the loss function, which yields overly smooth images with insufficient high-frequency detail. To solve this problem, Ledig et al. (2017) proposed SRGAN, a super-resolution reconstruction technique based on a generative adversarial network (GAN), and innovatively defined a perceptual loss function. SRGAN introduces a sub-pixel convolution layer to replace the traditional deconvolution layer and uses the feature-extraction module of the VGG19 model as a content loss for comparing super-resolution images with the original high-definition images. The traditional content loss is computed with MSE, which directly compares the pixel differences between two images. However, Ledig et al. (2017) argued that this traditional method only makes the model over-learn pixel differences while ignoring the deep intrinsic features of the reconstructed image, and that networks such as VGG19, which specialize in extracting intrinsic image features, are well suited to this task; a minimal sketch of such a content loss follows.
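The sketch below shows a VGG19-based content loss in PyTorch in the spirit of SRGAN; the truncation point of the network and the omission of ImageNet input normalization are simplifying assumptions for illustration.

import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class VGGContentLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained VGG19 truncated to its deep convolutional features;
        # the exact cut-off layer is an illustrative choice.
        features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:36]
        for p in features.parameters():
            p.requires_grad = False      # fixed feature extractor
        self.features = features.eval()
        self.mse = nn.MSELoss()

    def forward(self, sr, hr):
        # Compare deep feature maps rather than raw pixels.
        return self.mse(self.features(sr), self.features(hr))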
At this point, the general framework of SRGAN is complete. In super-resolution reconstruction, the texture detail of images generated by SRGAN is far richer than that of SRCNN. Nevertheless, the super-resolution images produced by the original SRGAN still differ noticeably from the original high-resolution images. In 2018, Wang et al. proposed an enhanced version of the original SRGAN, the Enhanced SRGAN (ESRGAN), which achieved good results.
3 EXPERIMENTAL DATASETS
This research uses experimental data obtained from Kaggle, a professional machine-learning platform. The data set comes from an open-source high-definition COVID-19 data set shared by users and contains 5779 high-definition lung X-ray images. The data set is split into a training set and a test set at an approximately 9:1 ratio: the training set contains 5216 images and the test set contains 563. This paper first uses the resize function to unify the training-set images to 720 × 1280, then uses the randint function to choose a random position in each training image and crop a 96 × 96 patch there. The cropped images are converted into tensors with the torch.Tensor function and normalized to [−1, 1]. Finally, each cropped image is downsampled by a factor of four using bicubic interpolation to obtain a 24 × 24 image, which is input into the generator.
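A minimal sketch of this preprocessing pipeline in PyTorch is given below; the file path, the RGB conversion, and the use of torchvision's ToTensor in place of the torch.Tensor call are illustrative assumptions.

import random
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image

CROP, SCALE = 96, 4                      # HR patch size; 96 // 4 = 24

def make_pair(path):
    """Produce one (low-resolution, high-resolution) training pair."""
    # PIL resize takes (width, height), so this yields 720 x 1280 images.
    img = Image.open(path).convert("RGB").resize((1280, 720))
    # random.randint picks the origin of the 96 x 96 crop.
    x = random.randint(0, img.width - CROP)
    y = random.randint(0, img.height - CROP)
    patch = img.crop((x, y, x + CROP, y + CROP))
    hr = transforms.ToTensor()(patch) * 2.0 - 1.0   # normalize to [-1, 1]
    # Bicubic downsampling by a factor of 4 gives the 24 x 24 input.
    lr = F.interpolate(hr.unsqueeze(0), scale_factor=1 / SCALE,
                       mode="bicubic", align_corners=False).squeeze(0)
    return lr, hr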
4 METHODS
This paper reduces the number of residual modules in the SRGAN generator from 16 to 8 to lower model complexity and computational cost, as sketched below.
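The following sketch assumes SRGAN-style residual blocks (two 3 × 3 convolutions with batch normalization and a PReLU activation); the block internals are illustrative.

import torch.nn as nn

class ResidualBlock(nn.Module):
    """SRGAN-style residual block: conv-BN-PReLU-conv-BN plus a skip."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

# The generator trunk uses 8 residual blocks instead of SRGAN's 16.
trunk = nn.Sequential(*[ResidualBlock(64) for _ in range(8)])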
At the same time, dropout regularization is added to the discriminator to prevent overfitting and to strengthen the model's robustness and generalization ability; a sketch of one such block follows.
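The sketch below shows one way to insert dropout into a discriminator convolution block; the placement and the rate of 0.3 are illustrative assumptions rather than the paper's exact settings.

import torch.nn as nn

def disc_block(in_ch, out_ch, stride, p=0.3):
    """One discriminator conv block with dropout for regularization."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
        nn.Dropout2d(p),   # randomly zeroes feature maps during training
    )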
During training, the high-resolution image is first downsampled by bicubic interpolation to obtain a low-resolution image, which is input into the generator to produce a super-resolution image; one complete training step is sketched below and unpacked in the remainder of this section.
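Assuming the modules sketched above, a binary cross-entropy adversarial loss, and standard optimizers, one training step might look as follows; the adversarial weight of 1e-3 is an illustrative assumption.

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(generator, discriminator, opt_g, opt_d,
               lr_img, hr_img, content_loss, adv_weight=1e-3):
    # Discriminator update: real high-resolution vs. generated images.
    sr_img = generator(lr_img).detach()   # detach: no generator gradients
    d_real = discriminator(hr_img)
    d_fake = discriminator(sr_img)
    d_loss = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: content loss plus adversarial loss.
    sr_img = generator(lr_img)
    d_sr = discriminator(sr_img)
    g_loss = (content_loss(sr_img, hr_img)
              + adv_weight * bce(d_sr, torch.ones_like(d_sr)))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()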
The high-resolution image and the super-resolution image are then input into the discriminator separately; the discriminator judges their authenticity, returns the result to the generator, and optimizes its own parameters at the same time. Finally, the intrinsic feature differences between the high-resolution image and the super-resolution image are compared through the generator's loss function to optimize the generator's parameters. The specific training flowchart