A Multi-Task Learning Framework for Image Restoration Using a Novel Generative Adversarial Network

Rim Walha 1,2,a, Fadoua Drira 1,b and Rania Bedhief 1

1 REGIM-Lab, ENIS, University of Sfax, Tunisia
2 Higher Institute of Computer Science and Multimedia of Sfax, University of Sfax, Tunisia
a https://orcid.org/0000-0002-0483-6329
b https://orcid.org/0000-0001-6706-4218
Keywords: Multi-Task Learning, Deep Learning, Real-World Degradations, Image Restoration.
Abstract: In recent years, deep learning has gained growing popularity in image restoration, becoming the mainstream approach and an enabler for subsequent higher-level computer vision tasks. Image restoration remains challenging, however, owing to the wide variety of degradations encountered in real-world scenarios. In this study, we introduce an efficient multi-task generative adversarial learning framework as a practical solution for various types of image degradation. We apply recent advances in deep learning to design, build, and train a framework that can deal with several image restoration tasks simultaneously. More precisely, the main specificities of the proposed architecture are: (1) a novel generator based on a single encoder with separate decoders, (2) the use of low-level multi-scale features within the encoder component, and (3) the incorporation of multi-scale transformer blocks in each decoder to learn and share low-level feature representations among the different tasks. Our experimental study demonstrates the efficiency and robustness of the proposed framework on two image restoration tasks, image deblurring and image denoising, where it achieves results that exceed those of state-of-the-art methods evaluated on the same datasets.
1 INTRODUCTION
Nowadays, the popularity of computer vision applications reveals a pressing need for high-quality images to guarantee efficient vision-based systems. Nevertheless, with the massive use of the mobile internet and the ubiquitous presence of cameras on various devices, image quality is inevitably corrupted by several degradations during the acquisition and transmission processes. Undesirable artifacts such as blur, noise, and low resolution can adversely affect the overall performance. An adequate solution to this problem is image restoration, a process that aims to recover an image from its degraded version. For simplicity, we limit this study to two main image restoration tasks: denoising and deblurring. Indeed, the principal challenge in image denoising is to recover a clean signal A from a noisy observation B corrupted by additive noise N, namely B = A + N. Sophisticated filters have been proposed in the literature; most can
be classified into six categories: wavelet-based, linear, non-linear, adaptive, total-variation, and partial-differential-equation-based filters. Classical denoising methods mainly modify transform coefficients (Guo et al., 2019) or average neighborhood pixels (Walha et al., 2014). The major difficulty in noise removal is preserving features, edges, and textures while smoothing away noise in flat regions without introducing additional processing artifacts (Drira et al., 2012; Walha et al., 2015; Walha et al., 2018). These difficulties also concern image deblurring.
The latter is the process of removing blur. It can be blind or non-blind, according to whether blur-kernel information is used. A non-blind process restores an image given a known blur kernel, while a blind process aims to restore sharpness without prior knowledge of the blur kernel (Guemri et al., 2017). In general, a blurred image Y can be modeled as Y = K ∗ X + N, where K is the blur kernel, ∗ denotes 2D convolution, X is the sharp image, and N is additive noise.
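To make these two degradation models concrete, the following is a minimal Python sketch that synthesizes both degradations; the Gaussian noise level and the 5×5 box kernel are illustrative assumptions, not parameters from this work:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
A = rng.random((64, 64))                    # stand-in for a clean grayscale image

# Denoising model B = A + N, with N an additive (here Gaussian) noise field.
N = rng.normal(0.0, 0.05, size=A.shape)
B = A + N

# Blurring model Y = K * X + N, where * denotes 2D convolution.
K = np.ones((5, 5)) / 25.0                  # simple box-blur kernel (assumption)
Y = convolve2d(A, K, mode="same", boundary="symm") + rng.normal(0.0, 0.01, A.shape)
```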
Earlier studies formulated image restoration as an inverse problem, emphasizing the definition of a model for corrupted images while taking into account priors on clean images. This model is then exploited to
minimize an objective function, aiming to reconstruct a clear image from its degraded version (Walha et al., 2013). Owing to the outstanding performance of deep learning in various computer vision applications (Harizi et al., 2022a; Harizi et al., 2022b), recent image restoration studies have focused on solutions based on such architectures. The main advantage is that no explicit modeling of image priors is required. Well-known state-of-the-art deep learning solutions can be classified into two groups: Single-Task Learning (STL) and Multi-Task Learning (MTL) solutions. On one hand, the first group treats each degradation independently, so different networks are designed for different degradation types. On the other hand, the second group proposes single networks that can deal with a combined set of degradations, aiming to optimize performance across multiple task predictors through knowledge transfer between them (Caruana, 1997).
In this work, we propose a new deep auto-encoder-based multi-task generative adversarial learning framework for image restoration. It is inspired by the recent success of deep architectures including the Convolutional Auto-Encoder (CAE), Transformers, and Generative Adversarial Networks (GANs). More specifically, we propose a multi-task end-to-end framework based on a single encoder that learns multi-scale feature representations shared between the different tasks. The framework encompasses separate decoders containing multi-scale transformer blocks, which further analyze local image structure and fine details across multiple scales to achieve effective restoration. Each decoder focuses on its specific restoration task. The proposed framework can naturally be extended to cope with other restoration tasks.
The rest of this paper is organized as follows: Section 2 outlines related work on MTL and STL in deep-network-based image restoration. Section 3 details the proposed multi-task learning framework. Section 4 presents our experimental study. Section 5 concludes and outlines directions for future research.
2 RELATED WORK
To address image restoration, many deep neural networks adopt STL methods, emphasizing specialized architectures for individual degradation types; see (Koh et al., 2021; Wang et al., 2022; Zamir et al., 2022; Li et al., 2018) among numerous recent publications. These methods typically restore different types of degradation by retraining the same architecture per task. In (Wang et al., 2022), UFormer, a transformer-based architecture, was introduced for image restoration. It relies on a learnable multi-scale restoration modulator incorporated into the decoder. Despite their effectiveness, single-stage methods like UFormer often exhibit high network complexity. Later, Cheng et al. avoided the recourse to complicated architectures for image denoising via subspace learning (Cheng et al., 2021). A simple baseline adopting a single-stage UNet architecture was suggested in (Chen et al., 2022). Alternatively, a set of different networks can collaborate to tackle complex image restoration tasks. This involves multi-stage methods that decompose the overall task into smaller, easier sub-tasks, where each stage is a lightweight sub-network. In (Zamir et al., 2021), the authors propose MPRNet, a multi-stage progressive image restoration architecture composed of two encoder-decoder sub-networks and one original-resolution sub-network. Another multi-stage method, HiNet (Chen et al., 2021), introduces feature fusion and attention-guided maps across stages. Also, a multi-axis MLP-based architecture called MAXIM was proposed in (Tu et al., 2022).
Other recent studies use MTL methods in deep neural networks to deal with combined degradations. MTL is a subfield of machine learning useful for domain-related tasks (Crawshaw, 2020). In contrast to STL, it learns multiple tasks simultaneously using a shared model. Generally, MTL improves generalization and accuracy, mainly for correlated or related tasks. In this context, the network can benefit from domain-specific knowledge encapsulated in the training samples of the different tasks. Good representations can thus be learned with less data and reduced overfitting. For instance, Liu et al. proposed a two-step training framework, comprising MTL and fine-tuning, to restore images with unknown degradation factors (Liu et al., 2019). Martyniuk (Martyniuk, 2019) presented an end-to-end pipeline with a generic encoder and separate decoders, introducing a generator architecture inspired by feature pyramid networks to deal with deblurring, dehazing, and raindrop removal.
In conclusion, even though propositions dealing with MTL remain limited in number, MTL is an active research area with promising directions. It can be very useful, notably for real-time applications: combining multiple tasks in the same learning model reduces the computational
complexity. This justifies the motivation of our study.

Figure 1: Overview of the proposed multi-task generative adversarial network for image restoration.
3 PROPOSED MULTI-TASK GENERATIVE ADVERSARIAL LEARNING NETWORK FOR IMAGE RESTORATION
In this section, we describe the proposed image restoration framework, which relies on a multi-task GAN. An overview of the proposed framework is depicted in Fig. 1. As illustrated in this figure, our generator comprises a single multi-scale-features-based encoder and separate decoders. Separate discriminators are used to distinguish fake images from real ones. More details about each part of our proposition are given in the following subsections.
3.1 Preliminaries
In this section, we explain the preliminary works
on which is based our framework. Indeed, Genera-
tive Adversarial Network is the core of our proposed
framework. It represents a deep neural network archi-
tecture designed by Goodfellow et al. (Goodfellow
et al., 2014). Figure 2 illustrates its basic framework.
It is comprised of two neural networks a genera-
tive model and a discriminative model trained with
an adversarial loss function to generate data that re-
sembles a distribution. The first neural network, the
generator is used to generate new samples as close
as possible to given samples. The second neural net-
work, the discriminator, is to discriminate between
two different classes of data from the generator and
to determine the real or the fake.
Despite their promising results, the main limi-
tation of GANs is their unstable learning generally
caused by the gradient vanishing and the mode col-
lapse. To improve the learning stability of GAN-
based models, several variant have been proposed.
Coupling GAN with Auto-encoders (AE) as a sec-
ondary network is one among interesting proposed
solutions. Indeed, each network has its own learn-
ing process. Furthermore, AE could represent data
samples with lower dimensionality. The AE model
encompasses three components: the encoder, the de-
coder, and the loss function to compare the output to
the target image. Our investigation concerns GAN
and AE based models with a context of MTL.
Figure 2: Generative adversarial network basic framework.
3.2 Proposed Encoder Structure
In order to capture relevant features that describe the fine details of an image, we propose a multi-scale-features-based encoder, as illustrated in Fig. 3. More precisely, to deal with degradations at different scales, the proposed encoder starts by extracting features from the input degraded image x using a Feature Pyramid Network (FPN) (Lin et al., 2017) with a MobileNet-V2 backbone (Sandler et al., 2018). This network outputs k+1 multi-scale feature maps, each a 2D feature map extracted at a different scale of the image. These maps are referred to as annotation vectors:

FPN(x) = {M_0, M_1, ..., M_k}    (1)
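As a rough illustration of this extraction step, the following sketch builds an FPN over a MobileNet-V2 backbone with torchvision; the tapped layers and channel widths are our assumptions, not the exact configuration used in this work:

```python
import torch
from torchvision.models import mobilenet_v2
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.ops import FeaturePyramidNetwork

# Tap four intermediate MobileNet-V2 stages (layer choice is an assumption).
backbone = create_feature_extractor(
    mobilenet_v2(weights="IMAGENET1K_V1"),
    return_nodes={"features.3": "m1", "features.6": "m2",
                  "features.13": "m3", "features.18": "m4"},
)
fpn = FeaturePyramidNetwork(in_channels_list=[24, 32, 96, 1280], out_channels=256)

x = torch.randn(1, 3, 256, 256)             # degraded input image
maps = fpn(backbone(x))                     # dict of multi-scale maps, one per scale
```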
Figure 3: Illustration of the encoder-decoder structure within the proposed multi-decoder-based generator.

These multi-scale feature maps (except M_0) are further analyzed, as shown in Fig. 3, and passed through Convolution-Maxpooling Blocks (CMB) to extract low-level features. Each block comprises a stack of eight units, each formed by a 2D convolution layer followed by a 2D max-pooling layer and a ReLU activation layer. The number of filters used in the convolution layers ranges from 32 to 2048. Max-pooling is applied to the feature maps produced by the convolution layer to progressively reduce the spatial size of the representation, which minimizes the number of parameters and computations in the proposed network. The output of each CMB unit is transmitted to the next unit, and the final multi-scale low-level features are up-sampled and concatenated into one tensor containing relevant information at different scale levels.
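A minimal PyTorch sketch of one such Convolution-Maxpooling Block is given below; for readability only three of the eight units are instantiated, and the channel widths are illustrative choices within the stated 32-2048 range:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CMBUnit(nn.Module):
    """One CMB unit: 2D convolution -> 2D max-pooling -> ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),     # halves the spatial resolution
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class CMB(nn.Module):
    """Stack of CMB units applied to one FPN map M_i (i >= 1); the paper
    stacks eight units, three are shown here for brevity."""
    def __init__(self, in_ch=256, widths=(32, 64, 128)):
        super().__init__()
        chans = [in_ch] + list(widths)
        self.units = nn.Sequential(
            *[CMBUnit(ci, co) for ci, co in zip(chans[:-1], chans[1:])]
        )

    def forward(self, m_i):
        return self.units(m_i)

# Up-sample each per-scale output to a common size and concatenate.
feats = [CMB()(torch.randn(1, 256, 64, 64)) for _ in range(4)]   # M_1..M_4 stand-ins
ups = [F.interpolate(f, size=(64, 64), mode="bilinear", align_corners=False)
       for f in feats]
shared = torch.cat(ups, dim=1)               # shared low-level multi-scale tensor
```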
3.3 Proposed Decoder Structure
The proposed multi-task GAN-based image restoration framework involves a multi-decoder-based generator. More precisely, separate decoders are used to deal with the different image restoration tasks, with the same architecture used for each decoder. The separate decoders share the low-level multi-scale features generated by the proposed encoder. As illustrated in Fig. 3, these features are further analyzed in the decoder using a 2D convolution layer followed by batch normalization and softmax layers, then up-sampled and concatenated with the M_0 feature map. This extra skip connection with M_0 preserves the feature representation between the encoder and decoder parts, helping the decoder recover information that might have been lost during feed-forward convolutions. This leads to better sharpness and structural similarity in the generated image.
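The following sketch illustrates this decoder entry path (convolution, batch normalization, channel-wise softmax, up-sampling, and the M_0 skip concatenation); all channel sizes are assumptions made for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderHead(nn.Module):
    """Conv2d -> BatchNorm -> softmax over channels, then upsample and
    concatenate with the M_0 feature map (the extra skip connection)."""
    def __init__(self, shared_ch=512, m0_ch=256, mid_ch=128):
        super().__init__()
        self.conv = nn.Conv2d(shared_ch, mid_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(mid_ch)

    def forward(self, shared, m0):
        h = torch.softmax(self.bn(self.conv(shared)), dim=1)
        h = F.interpolate(h, size=m0.shape[-2:], mode="bilinear",
                          align_corners=False)
        return torch.cat([h, m0], dim=1)      # skip connection with M_0
```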
Motivated by the outstanding success of transformers in recent computer vision works, we integrate multi-scale transformer blocks into the decoder part. This enhances our framework's ability to handle various degradations simultaneously, yielding promising results. As shown in Fig. 3, extra skip connections between the transformer blocks are used. Inspired by (Zamir et al., 2022), each transformer block consists of a multi-deconvolution head transposed attention (MDTA) and a gated-deconvolution feed-forward network (GDFN). The transformer block is designed as a multi-scale block to adjust features in multiple layers of the decoder part. The MDTA uses depth-wise convolutions to emphasize the local context and produce the global attention map from a layer-normalized tensor: 1×1 convolutions aggregate pixel-wise cross-channel context, followed by 3×3 depth-wise convolutions that encode channel-wise spatial context. The GDFN also consists of depth-wise convolutions, which encode information from neighboring pixel positions and are useful for learning local image structure for effective restoration. It further contains the GELU non-linearity and layer normalization. The GDFN controls the information flow through the respective hierarchical scales, allowing each scale to focus on fine details complementary to the other scales, thereby enriching features with contextual information.
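To clarify how such transposed attention works, the sketch below follows the Restormer-style MDTA design (Zamir et al., 2022), computing attention across channels instead of pixels; the head count, normalization choice, and dimensions are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDTA(nn.Module):
    """Transposed attention: 1x1 convs gather cross-channel context, 3x3
    depth-wise convs encode spatial context, and the attention map has shape
    (channels x channels) per head, keeping cost linear in pixel count."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads = heads
        self.norm = nn.GroupNorm(1, dim)                 # LayerNorm-like over channels
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.dw = nn.Conv2d(dim * 3, dim * 3, kernel_size=3,
                            padding=1, groups=dim * 3)   # depth-wise
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.temperature = nn.Parameter(torch.ones(heads, 1, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.dw(self.qkv(self.norm(x))).chunk(3, dim=1)
        shape = (b, self.heads, c // self.heads, h * w)
        q, k, v = q.reshape(shape), k.reshape(shape), v.reshape(shape)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        out = (attn.softmax(dim=-1) @ v).reshape(b, c, h, w)
        return x + self.proj(out)             # residual connection
```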
3.4 Loss Function
As in other multi-scale restoration networks (Martyniuk, 2019), we use a mixed loss L that combines a content loss L_cont with an adversarial loss L_adv. The overall loss function is formulated as:

L = L_cont + β · L_adv    (2)
where β is a weighting constant. The adversarial loss corresponds to the discriminators' loss function, whereas the content loss corresponds to the generator's loss function. In this work, we use the Wasserstein GAN with Gradient Penalty, WGAN-GP (Gulrajani et al., 2017), as the discriminators' loss function. WGAN-GP has been shown to improve training stability and to be robust to the choice of generator structure. The game between the generator G and the discriminators D_t, t ∈ {1, ..., n}, each called a critic, relies on the WGAN objective function L_WGAN, which is constructed using the Kantorovich-Rubinstein duality and defined as:

L_WGAN = E_{x∼P_r}[D_t(x)] − E_{x̂∼P_g}[D_t(x̂)]    (3)
where P_r denotes the data distribution over real samples x and P_g denotes the generator's distribution, defined by x̂ = G(z), with the input z a sample from a noise distribution. The discriminator tries to maximize L_WGAN during training by maximizing the gap between its outputs on real samples and on fake samples. In this form of the Wasserstein metric, D_t is required to be K-Lipschitz continuous: there exists a real constant K ≥ 0, called a Lipschitz constant, such that the critic value approximates K · W(P_r, P_g), where W(P_r, P_g) is the Wasserstein distance between the distributions P_r and P_g. Here, each discriminator D_t approximates the distance between real and fake samples.
The WGAN formulation requires the discriminator to lie within the space of 1-Lipschitz functions. To enforce the Lipschitz constraint and maintain a stable learning process with gradient descent, Gulrajani et al. (Gulrajani et al., 2017) suggest adding to L_WGAN a gradient penalty term defined as:

η · E_{x̂∼P_x̂}[(‖∇_x̂ D_t(x̂)‖₂ − 1)²]    (4)
Thereby, the adversarial loss function consists of two parts, the WGAN loss function L_WGAN and the gradient penalty:

L_adv = L_WGAN + η · E_{x̂∼P_x̂}[(‖∇_x̂ D_t(x̂)‖₂ − 1)²]    (5)
For the content loss L_cont, the classical choice is the Mean Absolute Error (MAE) or the Mean Squared Error (MSE) computed on raw pixels, but using those functions alone leads to blurry artifacts in the generated images. In this work, the content loss has two components: the L_1 loss, which preserves colors, and the perceptual loss L_X:

L_cont = L_X + 0.5 · L_1    (6)
The perceptual loss L_X (Eq. 7) is an L2 loss (Johnson et al., 2016), but it relies on the difference between the generated and target images computed on feature maps rather than on raw pixels:

L_X = (1 / (W_{i,j} · H_{i,j})) · Σ_{x=1}^{W_{i,j}} Σ_{y=1}^{H_{i,j}} (Φ_{i,j}(I^S)_{x,y} − Φ_{i,j}(G(I^B))_{x,y})²    (7)

where I^B and I^S are respectively a degraded image and a ground-truth image, H_{i,j} and W_{i,j} denote the dimensions of the feature maps, and Φ_{i,j} represents the feature map generated by the j-th convolution layer.
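The sketch below instantiates Eqs. (6)-(7); taking Φ from an ImageNet-pretrained VGG19, as in (Johnson et al., 2016), is our assumption, since the exact feature extractor is not named here:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class ContentLoss(nn.Module):
    """L_cont = L_X + 0.5 * L_1, with L_X a feature-space MSE (Eq. 7)."""
    def __init__(self, layer=16):              # cut at an assumed VGG19 conv layer
        super().__init__()
        self.phi = vgg19(weights="IMAGENET1K_V1").features[:layer].eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)             # fixed feature extractor
        self.l1, self.mse = nn.L1Loss(), nn.MSELoss()

    def forward(self, restored, target):
        l_x = self.mse(self.phi(restored), self.phi(target))   # Eq. (7)
        return l_x + 0.5 * self.l1(restored, target)           # Eq. (6)
```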
4 EXPERIMENTAL EVALUATION
4.1 Datasets and Settings
In order to evaluate the proposed image restoration framework, two well-known datasets are used: the GoPro dataset (Nah et al., 2017) and the SIDD dataset (Abdelhamed et al., 2018). The SIDD, a Smartphone Image Denoising Dataset, contains 30,000 noisy images taken from ten real-world scenes under various lighting conditions using five smartphone cameras. This dataset addresses the problem of smartphone image denoising, where the small sensor and aperture size cause noticeable noise even in pictures taken at base ISO. Further processing is applied to provide ground-truth images along with the noisy images. The GoPro dataset, widely used for image deblurring, consists of 3,214 pairs of blurry and sharp images captured in the wild at 1280×720 resolution, divided into 2,103 training pairs and 1,111 test pairs. The images are derived from 240-frames-per-second high-speed camera videos, with blurry images obtained by averaging successive frames.
In this work, we used PyTorch for our implementation. The proposed multi-task framework was trained on randomly cropped image patches of size 256×256 for 100 iterations per task. Horizontal and vertical flips and rotations are adopted for data augmentation. The training was performed on Google Colaboratory Pro using a GPU. The parameter β in Eq. (2), which controls the relative importance of the loss terms, is set to 0.01. The FPN within the encoder generates five feature maps (M_0, ..., M_4, i.e. k = 4).
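Putting the pieces together, a hedged sketch of the generator-side objective with the stated β = 0.01 might look as follows; the critic interface and the content_loss callable (e.g. the ContentLoss sketch above) are assumptions:

```python
import torch

def generator_loss(critic, content_loss, restored, target, beta=0.01):
    """Overall loss of Eq. (2): L = L_cont + beta * L_adv, where the
    generator's adversarial term is -E[D_t(x_hat)] for its task critic."""
    l_cont = content_loss(restored, target)
    l_adv = -critic(restored).mean()          # try to fool the task-specific critic
    return l_cont + beta * l_adv
```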
4.2 Comparison with State-of-the-Art Methods
The proposed network is evaluated on natural-scene images and compared with state-of-the-art deblurring and denoising networks. Performance evaluation can be broadly categorized into quantitative and qualitative evaluations. For the quantitative evaluation, we use the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity index (SSIM).
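For reference, both metrics can be computed with scikit-image as in the short sketch below (assuming images normalized to [0, 1]):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored: np.ndarray, ground_truth: np.ndarray):
    """Return (PSNR in dB, SSIM) for a pair of HxWxC images in [0, 1]."""
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=1.0)
    ssim = structural_similarity(ground_truth, restored,
                                 channel_axis=-1, data_range=1.0)
    return psnr, ssim
```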
Table 1 and Table 2 report the PSNR and SSIM values obtained by different methods on the GoPro dataset for the deblurring task and on the SIDD dataset for the denoising task, respectively. The proposed multi-task GAN-based framework achieves a new state-of-the-art PSNR value of 36.17 dB on the GoPro dataset, as shown in Table 1. The results illustrated in Figure 4 demonstrate superior deblurring performance, with sharper images and improved preservation of edges and local details compared to state-of-the-art deblurring networks.
Our framework, capable of handling various degradations, is also tested on the SIDD dataset for denoising. The results in Table 2 show its superior performance compared to other state-of-the-art denoising networks, as measured by PSNR. The proposed framework achieves the second-best SSIM index and effectively removes real noise while preserving fine details in the denoised images, as shown in Figure 5 with magnified regions from the SIDD dataset. Our effective deblurring and denoising results stem from the use of low-level multi-scale features in our encoder, enabling detailed analysis of corrupted images. Moreover, the proposed decoder structure relies on multiple instances of the transformer block, which boosts the overall GAN performance in image restoration.
Table 1: PSNR and SSIM values from various methods on the GoPro dataset for image deblurring, with the top two results emphasized in bold and underlined.

Restoration method PSNR SSIM
MIMO-UNet (Cho et al., 2021) 32.68 0.959
HiNet (Chen et al., 2021) 32.71 0.959
MAXIM (Tu et al., 2022) 32.86 0.961
Restormer (Zamir et al., 2022) 32.92 0.961
MSCNN (Nah et al., 2017) 29.20 0.916
MPRNet (Zamir et al., 2021) 32.66 0.959
SRNet (Tao et al., 2018) 30.10 0.932
NAFNet (Chen et al., 2022) 32.88 0.961
MTLGAN (Martyniuk, 2019) 27.30 0.810
DeblurGAN-v2 (Kupyn et al., 2019) 29.55 0.934
MFC-Net (Xia et al., 2022) 31.04 0.916
SVRNN (Ren et al., 2022) 30.46 0.936
UFormer (Wang et al., 2022) 32.97 0.967
Our proposition 36.17 0.961

Table 2: PSNR and SSIM values from various methods on the SIDD dataset for image denoising. The best and second-best results are highlighted in bold and underlined, respectively.

Restoration method PSNR SSIM
MPRNet (Zamir et al., 2021) 39.71 0.958
MIRNet (Zamir et al., 2020) 39.72 0.959
NBNet (Cheng et al., 2021) 39.75 0.959
UFormer (Wang et al., 2022) 39.89 0.960
MAXIM (Tu et al., 2022) 39.96 0.960
HiNet (Chen et al., 2021) 39.99 0.958
Restormer (Zamir et al., 2022) 40.02 0.960
NAFNet (Chen et al., 2022) 40.30 0.962
Our proposition 42.41 0.961

5 CONCLUSION AND OPEN ISSUES

In this paper, we proposed an effective multi-task GAN-based image restoration framework that addresses various degradations. Its key features are: (1) a unique generator with separate decoders, (2) the use of low-level multi-scale features in the encoder, and (3) the integration of multi-scale transformer blocks in each decoder to learn and share low-level feature representations across tasks. Extensive experimental results on two datasets demonstrate the framework's superior quantitative results compared to state-of-the-art methods for various degradations. Additionally, qualitative evaluations on degraded images confirm the framework's ability to efficiently reconstruct visually plausible deblurred and denoised images. As a perspective of this work, we plan to explore our multi-task framework for various applications, with a focus on evaluating its effectiveness in tasks like image super-resolution, underwater image restoration, and image dehazing.
Figure 4: Examples of deblurring results on the GoPro dataset. Left to right: input blurry images, output images obtained by the proposed framework, ground-truth images.
Figure 5: Examples of denoising results on the SIDD dataset. Left to right: input noisy images, images reconstructed by the proposed framework, ground-truth images.
REFERENCES
Abdelhamed, A., Lin, S., and Brown, M. S. (2018). A high-
quality denoising dataset for smartphone cameras. In
CVPR, pages 1692–1700.
Caruana, R. (1997). Multitask learning. Mach. Learn.,
28(1):41–75.
Chen, L., Chu, X., Zhang, X., and Sun, J. (2022). Simple
baselines for image restoration. In ECCV, Part VII,
volume 13667, pages 17–33.
Chen, L., Lu, X., Zhang, J., Chu, X., and Chen, C. (2021).
Hinet: Half instance normalization network for image
restoration. In CVPR, pages 182–192.
Cheng, S., Wang, Y., Huang, H., Liu, D., Fan, H., and Liu,
S. (2021). Nbnet: Noise basis learning for image de-
noising with subspace projection. In CVPR, pages
4896–4906.
Cho, S., Ji, S., Hong, J., Jung, S., and Ko, S. (2021). Re-
thinking coarse-to-fine approach in single image de-
blurring. In ICCV 2021, pages 4621–4630.
Crawshaw, M. (2020). Multi-task learning with deep neural
networks: A survey. CoRR, abs/2009.09796.
Drira, F., Lebourgeois, F., and Emptoz, H. (2012). A new
pde-based approach for singularity-preserving regu-
larization: application to degraded characters restora-
tion. IJDAR, 15(3):183–212.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A. C., and Ben-
gio, Y. (2014). Generative adversarial nets. In NIPS
2014, pages 2672–2680.
Guemri, K., Drira, F., Walha, R., Alimi, A. M., and Lebour-
geois, F. (2017). Edge based blind single image de-
blurring with sparse priors. In VISAPP 2017, pages
174–181.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and
Courville, A. C. (2017). Improved training of wasser-
stein gans. In NIPS 2017, pages 5767–5777.
Guo, S., Yan, Z., Zhang, K., Zuo, W., and Zhang, L.
(2019). Toward convolutional blind denoising of real
photographs. In CVPR 2019, pages 1712–1722.
Harizi, R., Walha, R., and Drira, F. (2022a). Deep-learning
based end-to-end system for text reading in the wild.
Multim. Tools Appl., 81(17):24691–24719.
Harizi, R., Walha, R., Drira, F., and Zaied, M. (2022b).
Convolutional neural network with joint stepwise
character/word modeling based system for scene text
recognition. Multim. Tools Appl., 81(3):3091–3106.
Johnson, J., Alahi, A., and Fei-Fei, L. (2016). Per-
ceptual losses for real-time style transfer and super-
resolution. In ECCV, Part II, volume 9906, pages
694–711. Springer.
Koh, J., Lee, J., and Yoon, S. (2021). Single-image de-
blurring with neural networks: A comparative survey.
Comput. Vis. Image Underst., 203:103134.
Kupyn, O., Martyniuk, T., Wu, J., and Wang, Z.
(2019). Deblurgan-v2: Deblurring (orders-of-
magnitude) faster and better. In ICCV 2019, pages
8877–8886.
Li, X., Wu, J., Lin, Z., Liu, H., and Zha, H. (2018). Recur-
rent squeeze-and-excitation context aggregation net
for single image deraining. In ECCV, Part VII, vol-
ume 11211, pages 262–277.
Lin, T., Dollár, P., Girshick, R. B., He, K., Hariharan, B., and Belongie, S. J. (2017). Feature pyramid networks for object detection. In CVPR 2017, pages 936–944.
Liu, X., Suganuma, M., Luo, X., and Okatani, T. (2019). Restoring images with unknown degradation factors by recurrent use of a multi-branch network. arXiv preprint.
Martyniuk, T. (2019). Multi-task learning for image restoration. PhD thesis, Faculty of Applied Sciences, Ukraine.
Nah, S., Kim, T. H., and Lee, K. M. (2017). Deep multi-
scale convolutional neural network for dynamic scene
deblurring. In CVPR, pages 257–265.
Ren, W., Zhang, J., Pan, J., Liu, S., Ren, J. S., Du, J., Cao,
X., and Yang, M. (2022). Deblurring dynamic scenes
via spatially varying recurrent neural networks. IEEE
Trans. Pattern Anal. Mach. Intell., 44(8):3974–3987.
Sandler, M., Howard, A. G., Zhu, M., Zhmoginov, A., and
Chen, L. (2018). Mobilenetv2: Inverted residuals and
linear bottlenecks. In CVPR, pages 4510–4520.
Tao, X., Gao, H., Shen, X., Wang, J., and Jia, J. (2018).
Scale-recurrent network for deep image deblurring. In
CVPR, pages 8174–8182.
Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik,
A., and Li, Y. (2022). Maxim: Multi-axis mlp for
image processing. In CVPR, pages 5759–5770.
Walha, R., Drira, F., Alimi, A. M., Lebourgeois, F., and Gar-
cia, C. (2014). A sparse coding based approach for the
resolution enhancement and restoration of printed and
handwritten textual images. In ICFHR 2014, pages
696–701.
Walha, R., Drira, F., Lebourgeois, F., Garcia, C., and Alimi,
A. M. (2013). Single textual image super-resolution
using multiple learned dictionaries based sparse cod-
ing. In ICIAP 2013, Part II, volume 8157, pages 439–
448.
Walha, R., Drira, F., Lebourgeois, F., Garcia, C., and Alimi,
A. M. (2015). Joint denoising and magnification of
noisy low-resolution textual images. In ICDAR 2015,
pages 871–875.
Walha, R., Drira, F., Lebourgeois, F., Garcia, C., and Alimi,
A. M. (2018). Handling noise in textual image res-
olution enhancement using online and offline learned
dictionaries. Int. J. Document Anal. Recognit., 21(1-
2):137–157.
Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., and Li, H.
(2022). Uformer: A general u-shaped transformer for
image restoration. In CVPR, pages 17662–17672.
Xia, H., Wu, B., Tan, Y., Tang, X., and Song, S. (2022).
Mfc-net: Multi-scale fusion coding network for image
deblurring. Appl. Intell., 52(11):13232–13249.
Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan,
F. S., and Yang, M. (2022). Restormer: Efficient
transformer for high-resolution image restoration. In
CVPR, pages 5718–5729.
Zamir, S. W., Arora, A., Khan, S. H., Hayat, M., Khan, F. S.,
Yang, M., and Shao, L. (2020). Learning enriched
features for real image restoration and enhancement.
In ECCV, Part XXV, volume 12370, pages 492–511.
Zamir, S. W., Arora, A., Khan, S. H., Hayat, M., Khan, F. S.,
Yang, M., and Shao, L. (2021). Multi-stage progres-
sive image restoration. In CVPR, pages 14821–14831.