Training Methods for Regularizing Gradients on Multi-Task Image Restoration Problems

Samuel Willingham (1,2,a), Mårten Sjöström (2,b) and Christine Guillemot (1,c)

1 Inria Rennes, Rennes, France
2 Mid Sweden University, Sundsvall, Sweden
a https://orcid.org/0009-0005-1954-1143
b https://orcid.org/0000-0003-3751-6089
c https://orcid.org/0000-0003-1604-967X

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956770.
Keywords: Inverse Problems, Computer Vision, Image Restoration, Deep Equilibrium Models, Deep Priors.
Abstract: Inverse problems refer to the task of reconstructing a clean signal from a degraded observation. In imaging, this pertains to restoration problems like denoising, super-resolution or in-painting. Because inverse problems are often ill-posed, regularization based on prior information is needed. Plug-and-play (pnp) approaches take a general approach to regularization and plug a deep denoiser into an iterative solver for inverse problems. However, considering the inverse problems at hand in training could improve reconstruction performance at test-time. Deep equilibrium (DEQ) models allow for the training of multi-task priors on the reconstruction error via an estimate of the iterative method's fixed-point (FP). This paper investigates the intersection of pnp and DEQ models for the training of a regularizing gradient (RG) and derives an upper bound for the reconstruction loss of a gradient-descent (GD) procedure. Based on this upper bound, two procedures for the training of RGs are proposed and compared: one optimizes the upper bound directly, the other trains a deep equilibrium GD (DEQGD) procedure and uses the bound for regularization. The resulting regularized RG (RERG) produces consistently good reconstructions across different inverse problems, while the other RGs tend to have some inverse problems on which they provide inferior reconstructions.
1 INTRODUCTION
This paper considers image enhancement and restoration through the lens of inverse problems. Inverse problems refer to a large subset of signal processing applications where one attempts to recover a signal from a flawed or noisy observation. In imaging, this refers to the reconstruction of degraded images, e.g. images with missing pixels or low-resolution images. Because inverse problems can be ill-posed, some kind of regularization that imposes prior assumptions on the search-space is necessary.
One approach to solving the regularization problem for inverse problems is the family of pnp methods (Venkatakrishnan et al., 2013; Pesquet et al., 2021; Chan et al., 2016; Le Pendu and Guillemot, 2023; Zhang et al., 2021). These methods use pre-trained
priors in iterative algorithms like the alternating direction method of multipliers (ADMM) (Chan et al., 2016), the forward-backward algorithm (FB) (Pesquet et al., 2021), or GD. ADMM and FB use a regularizing proximal operator (PO) that represents the prior. This PO can be replaced with a deep Gaussian denoiser (Venkatakrishnan et al., 2013; Chan et al., 2016), leading to good reconstruction performance. Apart from methods based on POs, there are other approaches like regularization by denoising (Romano et al., 2017), or the pnp Regularizing Gradient (pnpReG) (Fermanian et al., 2023), which trains a RG that is used to regularize a pnp GD procedure. Overall, pnp approaches are very general in application, but because they are not trained on the inverse problems at hand, their performance can be further improved (Willingham et al., 2023).
To train a full iterative scheme directly on the inverse problem at hand, deep equilibrium (DEQ) models (Gilton et al., 2021; Bai et al., 2019; Fung et al., 2022; Winston and Kolter, 2020; Ling et al., 2022) can be used to train an entire iterative method via its FP. This can be done in a Jacobian-free manner (Fung et al., 2022), which is effectively one step of back-propagation on the iterative method, starting at the FP.
When applied to a range of inverse problems, multi-task (MT) DEQ models (Willingham et al., 2023) allow for the training of MT priors by leveraging the reconstruction error over a range of inverse problems, leading to strong MT reconstruction performance.

However, DEQ models depend heavily on finding a good approximation of the FP. This can be quite difficult, as convergence is not always guaranteed, and even if the algorithm does converge, this can take a large number of iterations or lead to bad estimates of the FP. As a result, this can lead to sub-optimal parameter updates.
The contributions of this paper are:
- This paper investigates and compares different approaches to training a RG for a range of inverse problems.
- We show that the difference between the GD and the FB reconstruction errors is bounded from above. This upper bound led us to propose a method for training the RG using a FB optimization method. The resulting RG is called RG1.
- We also propose a procedure that uses the upper bound to regularize the training of a multi-task DEQGD procedure. This regularized RG (RERG) leads to strong reconstruction performance on a range of inverse problems by taking into account both the FB and the GD reconstruction errors. In testing, RERG displayed performance close to whichever of RG1 and DEQGD was better on any given inverse problem.
- We compare four different RGs on a range of inverse problems and discuss the differences.
2 RELATED WORKS AND THEORY

2.1 Inverse Problems and Maximum a Posteriori Estimation
We consider the image formation model

$$\mathbf{y} = A\hat{\mathbf{x}} + \boldsymbol{\varepsilon}, \tag{1}$$

where $\mathbf{y}$ denotes the degraded observation, $\hat{\mathbf{x}} \in \mathbb{R}^d$ is the ground truth image, $A : \mathbb{R}^d \to \mathbb{R}^{d'}$ denotes the degradation operation and $\boldsymbol{\varepsilon} \in \mathbb{R}^{d'}$ denotes additive white Gaussian noise (AWGN) with standard deviation $\sigma \ge 0$, with $d, d' \in \mathbb{N}$.
In this paper, we consider maximum a posteriori (MAP) estimation (Venkatakrishnan et al., 2013; Zhang et al., 2021; Le Pendu and Guillemot, 2023; Fermanian et al., 2023), meaning the search for an $\mathbf{x} \in \mathbb{R}^d$ that maximizes $p(\mathbf{x} \mid \mathbf{y})$, i.e. the $\mathbf{x}$ that is most likely to have caused observation $\mathbf{y}$. Assuming uniqueness, this leads to

$$\hat{\mathbf{x}}_{\mathrm{MAP}} = \operatorname*{argmax}_{\mathbf{x}} p(\mathbf{x} \mid \mathbf{y}) \tag{2}$$

$$= \operatorname*{argmin}_{\mathbf{x}} -\log p(\mathbf{y} \mid \mathbf{x}) - \log p(\mathbf{x}) \tag{3}$$

$$= \operatorname*{argmin}_{\mathbf{x}} \frac{1}{2}\|A\mathbf{x} - \mathbf{y}\|_2^2 + \sigma^2 R(\mathbf{x}), \tag{4}$$

where we set $R(\mathbf{x}) := -\log p(\mathbf{x})$ and $\|\cdot\|_2$ denotes the 2-norm. The expression with the norm is called the data-term. $R$ is called the regularizer and represents the prior distribution.

The MAP estimation problem can be solved using iterative algorithms like the ADMM, FB or GD algorithm.
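To make the MAP formulation concrete, the following is a minimal sketch of plain gradient descent on objective (4); the degradation operator `A`, its adjoint `A_t` and the regularizer gradient `grad_R` are hypothetical callables assumed for the example, not part of any implementation described in this paper.

```python
import torch

def map_gradient_descent(y, A, A_t, grad_R, sigma, x0, eta=0.05, n_iter=500):
    """Plain gradient descent on the MAP objective (4):
    0.5 * ||A(x) - y||_2^2 + sigma^2 * R(x).
    A and A_t are the degradation operator and its adjoint;
    grad_R returns the gradient of the regularizer R."""
    x = x0.clone()
    for _ in range(n_iter):
        data_grad = A_t(A(x) - y)  # gradient of the data-term
        x = x - eta * (data_grad + sigma**2 * grad_R(x))
    return x
```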
2.2 Plug-and-Play Regularizing Gradient
PnpReG (Fermanian et al., 2023) deals with the regularization problem in equation (4) by linking the gradient of the regularizer with its proximal operator

$$\operatorname{prox}_{\sigma^2 R}(\mathbf{z}) := \operatorname*{argmin}_{\mathbf{x}} \frac{1}{2}\|\mathbf{x} - \mathbf{z}\|_2^2 + \sigma^2 R(\mathbf{x}). \tag{5}$$

As shown in (Fermanian et al., 2023), it holds for all $\mathbf{z} \in \mathbb{R}^d$ and $\sigma \ge 0$ that

$$\sigma^2 \left.\frac{\delta R(\mathbf{x})}{\delta \mathbf{x}}\right|_{\mathbf{x} = \operatorname{prox}_{\sigma^2 R}(\mathbf{z})} = \mathbf{z} - \operatorname{prox}_{\sigma^2 R}(\mathbf{z}). \tag{6}$$
Thus, the loss

$$\mathcal{L}^{\mathrm{link}}_{\sigma}(\mathbf{z}) = \left\|\sigma^2 G(P_{\sigma^2}(\mathbf{z})) - (\mathbf{z} - P_{\sigma^2}(\mathbf{z}))\right\|_2^2 \tag{7}$$

is used to train the gradient $G$ of the regularizer corresponding to the PO defined by a Gaussian denoiser. This uses the approximation $P_{\sigma^2}(\mathbf{z}) \approx \operatorname{prox}_{\sigma^2 R}(\mathbf{z})$, where $P_{\sigma^2}$ is trained as a deep denoiser. The overall training loss is

$$\mathcal{L}^{\mathrm{pnpReG}} = \delta \left\| P_{\sigma^2}(\tilde{\mathbf{z}}) - \hat{\mathbf{x}} \right\|_1 + \lambda \mathcal{L}^{\mathrm{link}}_{\sigma}(\tilde{\mathbf{z}}), \tag{8}$$

where $\|\cdot\|_1$ is the L1 norm, $\lambda > 0$, and $\tilde{\mathbf{z}} = \hat{\mathbf{x}} + \boldsymbol{\varepsilon}$, with $\boldsymbol{\varepsilon}$ being AWGN with standard deviation $\sigma_0 > 0$. Furthermore, $\delta$ is equal to 1 if $\sigma = \sigma_0$ and equal to zero otherwise. This leads to a RG that is trained jointly with a denoiser and produces strong reconstruction results when used as regularization in a GD algorithm. Pnp approaches, however, do not take the reconstruction error into account and could potentially be improved by considering the inverse problems at hand.
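As an illustration, here is a hedged PyTorch sketch of the link loss (7); the call signatures of the denoiser `P` and the gradient network `G` are assumptions made for the example.

```python
import torch

def link_loss(z, G, P, sigma):
    """Link loss from eq. (7): ties the regularizing gradient G to the
    proximal operator approximated by the denoiser P via identity (6)."""
    p = P(z, sigma**2)                    # P_{sigma^2}(z) ~ prox_{sigma^2 R}(z)
    residual = sigma**2 * G(p) - (z - p)  # vanishes if identity (6) holds exactly
    return residual.pow(2).sum()
```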
2.3 Deep Equilibrium Models
DEQ models (Gilton et al., 2021; Bai et al., 2019; Fung et al., 2022; Winston and Kolter, 2020; Ling et al., 2022) allow for the training of a whole iterative procedure $h_\theta$ via the corresponding FP $\mathbf{x}_h$ and the resulting reconstruction error. In (Bai et al., 2019), this is done by considering $h_\theta(\mathbf{x}_h) = \mathbf{x}_h$ and thus

$$\frac{\delta \mathbf{x}_h}{\delta \theta} = \left.\frac{\delta h_\theta(\mathbf{x})}{\delta \mathbf{x}}\right|_{\mathbf{x}=\mathbf{x}_h} \frac{\delta \mathbf{x}_h}{\delta \theta} + \left.\frac{\delta h_\theta(\mathbf{x})}{\delta \theta}\right|_{\mathbf{x}=\mathbf{x}_h}. \tag{9}$$
Rearranging this and plugging $\delta\mathbf{x}_h/\delta\theta$ into the derivative of a loss $l : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ with respect to $\theta$ yields

$$\frac{\delta l(\hat{\mathbf{x}}, \mathbf{x}_h)}{\delta\theta} = \left.\frac{\delta l(\hat{\mathbf{x}}, \mathbf{x})}{\delta\mathbf{x}}\right|_{\mathbf{x}=\mathbf{x}_h} \frac{\delta\mathbf{x}_h}{\delta\theta} \tag{10}$$

$$= \left.\frac{\delta l(\hat{\mathbf{x}}, \mathbf{x})}{\delta\mathbf{x}}\right|_{\mathbf{x}=\mathbf{x}_h} J^{-1} \left.\frac{\delta h_\theta(\mathbf{x})}{\delta\theta}\right|_{\mathbf{x}=\mathbf{x}_h}, \tag{11}$$

where the Jacobian $J := \mathrm{id} - \left.\frac{\delta h_\theta(\mathbf{x})}{\delta\mathbf{x}}\right|_{\mathbf{x}=\mathbf{x}_h}$ exists and is invertible (Fung et al., 2022). Assuming the inverse of $J$ to be equal to the identity still yields a direction of descent (Fung et al., 2022). This is called Jacobian-free back-propagation and can be likened to a single step of traditional back-propagation, using the FP $\mathbf{x}_h$ as input.
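The following sketch illustrates Jacobian-free back-propagation in PyTorch, under the assumption that the FP has already been estimated; detaching the FP and back-propagating through a single application of $h_\theta$ is what reduces (11) to one step.

```python
import torch

def jfb_update(h_theta, x_fp, x_hat, optimizer,
               loss_fn=torch.nn.functional.mse_loss):
    """One Jacobian-free back-propagation update (Fung et al., 2022):
    cut the graph at the fixed point, then differentiate through a
    single application of the iteration map h_theta."""
    optimizer.zero_grad()
    x_next = h_theta(x_fp.detach())  # one differentiable step from the FP
    loss = loss_fn(x_next, x_hat)
    loss.backward()                  # gradients flow through this step only
    optimizer.step()
    return loss.item()
```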
The following section will introduce important definitions that we shall use to expand existing theory and propose different training methods for RGs.
3 DEFINITIONS
As it is our intent to train and compare RGs to be used
in a GD algorithm, we shall first define the necessary
functions and expressions.
We define two iterative algorithms to solve the MAP estimation problem in (4). One is the FB algorithm (Pesquet et al., 2021) that was also used for the multi-task DEQ (MTDEQ) prior (Willingham et al., 2023). For each ground truth image $\hat{\mathbf{x}}$, we pick a degradation $A : \mathbb{R}^d \to \mathbb{R}^{d'}$ and a noise-level $\sigma$ at random and generate noise $\boldsymbol{\varepsilon}$, leading to a degraded observation $\mathbf{y} = A(\hat{\mathbf{x}}) + \boldsymbol{\varepsilon}$. For $\mathbf{z} \in \mathbb{R}^d$, one iteration of the FB algorithm takes the form

$$f_{\sigma,A,\theta}(\mathbf{z}, \mathbf{y}) := P_{\theta, \eta\sigma^2}(\mathbf{z} - \eta D(\mathbf{z}, \mathbf{y})), \tag{12}$$

where $P_{\theta,\sigma^2}$ is a regularizer in the form of a DRUNet (as in (Le Pendu and Guillemot, 2023) and (Zhang et al., 2021)) with parameters $\theta$, used at noise-level $\sigma \ge 0$ with step-size $\eta \in \mathbb{R}$. For $\mathbf{z} \in \mathbb{R}^d$, we define

$$D(\mathbf{z}, \mathbf{y}) := \left.\frac{\delta\, \frac{1}{2}\|A\mathbf{x} - \mathbf{y}\|_2^2}{\delta \mathbf{x}}\right|_{\mathbf{x} = \mathbf{z}} \tag{13}$$

as the derivative of the data-term.
Similarly, for $\mathbf{z} \in \mathbb{R}^d$, one iteration of the gradient-descent algorithm is defined as

$$g_{\sigma,A,\theta}(\mathbf{z}, \mathbf{y}) := \mathbf{z} - \eta\left(D(\mathbf{z}, \mathbf{y}) + \sigma^2 G_\theta(\mathbf{z}, \mathbf{y})\right), \tag{14}$$

where $G_\theta$ is the network representing the gradient of the regularizer in (4), which we also refer to as the RG. We use the same 3-channel DRUNet architecture that is used in (Fermanian et al., 2023). $\mathbf{x}_{g,\sigma,A,\theta}(\mathbf{y})$ and $\mathbf{x}_{f,\sigma,A,\theta}(\mathbf{y})$ are the FPs of $g_{\sigma,A,\theta}(\cdot, \mathbf{y})$ and $f_{\sigma,A,\theta}(\cdot, \mathbf{y})$, respectively. For clarity of notation, the indices $\sigma, A$ and $\theta$ as well as the argument $\mathbf{y}$ will be omitted when this is allowed by the context.
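For concreteness, a minimal sketch of one FB iteration (12) and one GD iteration (14) follows; the networks `P` and `G` and the data-term gradient `D` are assumed callables with these signatures, and the default step-sizes match those reported in section 5.1.

```python
def fb_step(z, y, P, D, sigma, eta=0.49):
    """One forward-backward iteration, eq. (12)."""
    return P(z - eta * D(z, y), eta * sigma**2)

def gd_step(z, y, G, D, sigma, eta=0.05):
    """One regularized gradient-descent iteration, eq. (14)."""
    return z - eta * (D(z, y) + sigma**2 * G(z, y))
```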
For ease of notation, we define the reconstruction errors

$$\mathcal{L}^{\mathrm{FB}}_{\sigma,A,\theta} := \|f(\mathbf{x}_f) - \hat{\mathbf{x}}\|_2^2 \tag{15}$$

$$\mathcal{L}^{\mathrm{GD}}_{\sigma,A,\theta} := \|g(\mathbf{x}_g) - \hat{\mathbf{x}}\|_2^2, \tag{16}$$

where $\mathcal{L}^{\mathrm{FB}}$ denotes the reconstruction error of the FB algorithm and $\mathcal{L}^{\mathrm{GD}}$ is the reconstruction error of the GD algorithm.
Remark 1. Note that it is our intention to find parameters $\theta$, and thus a RG $G_\theta$, such that $\mathcal{L}^{\mathrm{GD}}$ is small. This is the quantity that is evaluated when computing the peak signal-to-noise ratio (PSNR) of the reconstruction.
Definition 1 (Lipschitz continuity (Bauschke et al., 2017)). We call a function $h : \mathbb{R}^n \to \mathbb{R}^n$, with $n \in \mathbb{N}$, L-Lipschitz continuous with relation to the metric induced by $\|\cdot\|_2$ and with Lipschitz constant $L \ge 0$ if and only if for all $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^n$ it holds that

$$\|h(\mathbf{x}_1) - h(\mathbf{x}_2)\|_2 \le L \|\mathbf{x}_1 - \mathbf{x}_2\|_2. \tag{17}$$

If there exists an $L < 1$ for which this condition holds, we call $h$ a contraction.
Using these definitions, the following sections will introduce an upper bound for the GD reconstruction error, propose training approaches based on said upper bound, and compare the resulting reconstruction performance.
4 METHOD
4.1 Derivations
This section leverages the definitions introduced in section 3 to derive a relationship between the FB reconstruction error and the GD reconstruction error that leads to a new approach to realizing the goal in remark 1.
Theorem 1. If the GD procedure $g$ is a contraction with relation to $\|\cdot\|_2$ (with Lipschitz constant $L_g < 1$), then there exists an $\tilde{\mathbf{x}} \in \mathbb{R}^d$ such that

$$\|\mathbf{x}_g - \hat{\mathbf{x}}\|_2 \le \frac{1}{1 - L_g}\sqrt{\mathcal{L}^{\mathrm{link}}_{\eta\sigma^2}(\tilde{\mathbf{x}})} + \|\mathbf{x}_f - \hat{\mathbf{x}}\|_2. \tag{18}$$
Proof. By plugging $\tilde{\mathbf{x}} = \mathbf{x}_f - \eta D(\mathbf{x}_f)$ into $\mathcal{L}^{\mathrm{link}}$, we get that for all $\eta > 0$ and $\sigma \ge 0$ it holds that

$$\mathcal{L}^{\mathrm{link}}_{\eta\sigma^2}(\tilde{\mathbf{x}}) = \left\|\eta\sigma^2 G(\mathbf{x}_f) - (\mathbf{x}_f - \eta D(\mathbf{x}_f) - \mathbf{x}_f)\right\|_2^2 \tag{19}$$

$$= \left\|\eta\sigma^2 G(\mathbf{x}_f) + \eta D(\mathbf{x}_f)\right\|_2^2, \tag{20}$$

where we used that $P_{\eta\sigma^2}(\tilde{\mathbf{x}}) = f(\mathbf{x}_f) = \mathbf{x}_f$, since $\mathbf{x}_f$ is the FP of $f$. This leads to

$$\begin{aligned}
\|\mathbf{x}_f - \mathbf{x}_g\|_2 &= \big\|\mathbf{x}_f - \eta\big(D(\mathbf{x}_f) + \sigma^2 G(\mathbf{x}_f)\big) && (21)\\
&\qquad + \eta\big(D(\mathbf{x}_f) + \sigma^2 G(\mathbf{x}_f)\big) - \mathbf{x}_g\big\|_2 && (22)\\
&\le \|g(\mathbf{x}_f) - \mathbf{x}_g\|_2 + \sqrt{\mathcal{L}^{\mathrm{link}}_{\eta\sigma^2}(\tilde{\mathbf{x}})} && (23)\\
&\le L_g \|\mathbf{x}_f - \mathbf{x}_g\|_2 + \sqrt{\mathcal{L}^{\mathrm{link}}_{\eta\sigma^2}(\tilde{\mathbf{x}})}, && (24)
\end{aligned}$$

so $(1 - L_g)\|\mathbf{x}_f - \mathbf{x}_g\|_2 \le \sqrt{\mathcal{L}^{\mathrm{link}}_{\eta\sigma^2}(\tilde{\mathbf{x}})}$, resulting in the statement via the triangle inequality $\|\mathbf{x}_g - \hat{\mathbf{x}}\|_2 \le \|\mathbf{x}_g - \mathbf{x}_f\|_2 + \|\mathbf{x}_f - \hat{\mathbf{x}}\|_2$.
Remark 2. Furthermore, because for all $a, b \in \mathbb{R}$ it holds that $a^2 + b^2 \ge 2ab$, we get

$$\frac{1}{2}\|\mathbf{x}_g - \hat{\mathbf{x}}\|_2^2 \le \frac{1}{(1-L_g)^2}\mathcal{L}^{\mathrm{link}}_{\eta\sigma^2}(\tilde{\mathbf{x}}) + \|\mathbf{x}_f - \hat{\mathbf{x}}\|_2^2, \tag{25}$$

allowing us to limit the GD reconstruction error (i.e. to train a RG to be used in a GD algorithm) by using the two loss-terms on the right as training objectives.
Based on remark 2, we train a GD procedure by
training a PO and RG to minimize the bound in (25).
Remark 3. It immediately follows that for

$$\mathcal{L}^{\mathrm{bound}} := \frac{2}{(1-L_g)^2}\mathcal{L}^{\mathrm{link}}_{\eta\sigma^2}(\tilde{\mathbf{x}}) + 2\|\mathbf{x}_f - \hat{\mathbf{x}}\|_2^2, \tag{26}$$

we also get that $\|\mathbf{x}_g - \hat{\mathbf{x}}\|_2^2$ is bounded by a convex combination of itself and $\mathcal{L}^{\mathrm{bound}}$, or more generally

$$\|\mathbf{x}_g - \hat{\mathbf{x}}\|_2^2 \le \lambda\mathcal{L}^{\mathrm{bound}} + \zeta\|\mathbf{x}_g - \hat{\mathbf{x}}\|_2^2 \tag{27}$$

for all $\lambda, \zeta > 0$ with $\lambda + \zeta \ge 1$.

This allows us to constrain the hypothesis-space for the training of a deep equilibrium regularizing gradient, adding regularization for the DEQ training of a GD algorithm. This leads to the RERG, for which the training objective will be defined in the next section.
Data: Ground truth image set D, k ∈ {1, 2, 3}, set of noise-levels Σ and set of degradations 𝒜;
Result: Parameters θ for a regularizing gradient and the corresponding proximal operator;
for a number of epochs do
    for all x̂ from D do
        (σ, A) ← random choice from Σ × 𝒜;
        y ← A(x̂) + ε, with ε ∼ N(0, σ);
        Find FPs x_f of f_{σ,A,θ} and x_g of g_{σ,A,θ};
        Calculate L_k corresponding to the algorithm trained;
        Update parameters θ via loss L_k;
    end
end
Algorithm 1: Training algorithm for the RGs. k = 1 leads to RG1, k = 2 gives the DEQGD and k = 3 leads to RERG.
4.2 Algorithm
Based on the definitions in section 3 and the derivations in section 4.1, we introduce the training pipeline described in algorithm 1. This pipeline is quite similar to the one used in (Willingham et al., 2023), with only one inverse problem considered at each iteration and with different training objectives. We use this algorithm for the training of three different RGs, via the training objectives

$$\mathcal{L}_1 := 0.1\,\mathcal{L}^{\mathrm{link}}_{\sigma^2}(\mathbf{x}_f - \eta D(\mathbf{x}_f)) + \mathcal{L}^{\mathrm{FB}} \tag{28}$$

$$\mathcal{L}_2 := \mathcal{L}^{\mathrm{GD}} \tag{29}$$

$$\mathcal{L}_3 := \mathcal{L}_1 + 0.1\,\mathcal{L}^{\mathrm{GD}}, \tag{30}$$

where 0.1 is an experimentally chosen hyper-parameter, which could likely be further optimized to improve reconstruction performance at test time. The weighting in $\mathcal{L}_3$ could also be modified to increase the weight of $\mathcal{L}^{\mathrm{GD}}$ or $\mathcal{L}_1$, respectively.
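The following is a hedged sketch of how the three objectives (28)-(30) could be assembled, reusing the `fb_step`/`gd_step` sketches from section 3; the single shared step-size argument and the exact reductions are simplifying assumptions for the example.

```python
def training_objective(k, x_f, x_g, x_hat, y, G, P, D, sigma, eta):
    """Objectives (28)-(30): k=1 trains RG1, k=2 the DEQGD, k=3 the RERG."""
    l_fb = (fb_step(x_f, y, P, D, sigma, eta) - x_hat).pow(2).sum()  # eq. (15)
    l_gd = (gd_step(x_g, y, G, D, sigma, eta) - x_hat).pow(2).sum()  # eq. (16)
    if k == 2:
        return l_gd                                                  # eq. (29)
    x_tilde = x_f - eta * D(x_f, y)     # evaluation point from Theorem 1
    p = P(x_tilde, sigma**2)            # PO at the link-loss noise-level
    l_link = (sigma**2 * G(p, y) - (x_tilde - p)).pow(2).sum()       # eq. (7)
    l_1 = 0.1 * l_link + l_fb                                        # eq. (28)
    return l_1 if k == 1 else l_1 + 0.1 * l_gd                       # eq. (30)
```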
Note that for the derivative of $\mathcal{L}^{\mathrm{link}}_{\sigma^2}(\mathbf{x}_f - \eta D(\mathbf{x}_f))$ used for the update of the parameters in algorithm 1, we use Jacobian-free back-propagation (Fung et al., 2022), i.e. the assumption that the approximation

$$\frac{\delta \mathcal{L}^{\mathrm{link}}_{\sigma^2}(\mathbf{x}_f - \eta D(\mathbf{x}_f))}{\delta \theta} \approx \left.\frac{\delta \mathcal{L}^{\mathrm{link}}_{\sigma^2}(\mathbf{x})}{\delta \theta}\right|_{\mathbf{x} = \mathbf{x}_f - \eta D(\mathbf{x}_f)} \tag{31}$$

yields a direction of descent.
The use of algorithm 1 with objective $\mathcal{L}_1$ differs from pnpReG in three ways:
- We replace the denoising loss (i.e. the first summand in (8)) with the FB reconstruction loss, training the PO on the resulting FB reconstruction error rather than on a Gaussian denoising problem.
- The algorithm evaluates $\mathcal{L}^{\mathrm{link}}$ at $\tilde{\mathbf{x}}$ from Theorem 1, which depends on the equilibrium point of the FB algorithm, instead of evaluating $\mathcal{L}^{\mathrm{link}}$ on images only perturbed by AWGN, as is done in the pnpReG method of (Fermanian et al., 2023). Considering only $\mathcal{L}^{\mathrm{link}}$ at $\tilde{\mathbf{x}}$ from Theorem 1 would train a PO and RG such that the resulting FB and GD algorithms have the same FP. Looking at equation (20), this measures how "non-fixed" $\mathbf{x}_f$ is when put into the corresponding GD procedure, since the term with $\mathcal{L}^{\mathrm{link}}$ is equal to zero if $\mathbf{x}_f$ is a FP of the GD procedure.
- We only consider the case where the $\sigma$ used for $\mathcal{L}^{\mathrm{link}}$ is identical to the standard deviation of the AWGN.
Similar to (Willingham et al., 2023), we evaluate the neural nets at images that actually appear when iteratively solving the MAP estimation problem from (4) for a given inverse problem, rather than on an assumed Gaussian perturbation.

Furthermore, we can use the upper bound to regularize the training of a DEQGD procedure by considering the reconstruction error of the GD procedure as well as regularizing the RG by tying it to a PO via the bound given in (25). This constrains the hypothesis-space by using two loss-terms that can both be used to approach the training objective outlined in remark 1.

The following sections will highlight how the resulting methods were trained and tested in order to compare them to pnpReG (Fermanian et al., 2023), MTDEQ (Willingham et al., 2023) and a Gaussian denoiser plugged into the pnp ADMM algorithm (results from (Fermanian et al., 2023)).
5 EXPERIMENTS
Using the theory and algorithm introduced in the previous sections, this section discusses the details of training and the experiments made to compare the trained RGs.

For the experiments we have trained three different RGs using the pipeline outlined in algorithm 1:
- RG1, which is the prior that uses loss $\mathcal{L}_1$, i.e. directly attempts to minimize the upper bound,
- a DEQGD, which uses loss $\mathcal{L}_2$, and
- RERG, which uses loss $\mathcal{L}_3$, combining both RG1 and a DEQGD.
5.1 Step-Sizes and Degradations
We use a step-size of $\eta_{\mathrm{GD}} = 0.05$ for the GD procedure, and similar to (Fermanian et al., 2023), we use the adam optimizer (Kingma and Ba, 2014) to find the FP of the GD procedure from (14) for training. The FB procedure uses step-size $\eta_{\mathrm{FB}} = 0.49$, which is taken from (Willingham et al., 2023).
We consider the following degradations in training:
- Gaussian deblurring with the level of blur $\sigma_b$ in $[0, 4]$,
- super-resolution with factors 1, 2 and 4,
- pixel-wise completion, where each pixel has a chance $p_{\mathrm{drop}} \in [0, 0.99]$ of being dropped. Each selection of $p_{\mathrm{drop}}$ refers to a degradation.

Additionally, we use noise-levels sampled from a uniform distribution on $\Sigma := [0, 50/255]$. For each iteration, there is a 1/3 chance of choosing deblurring, super-resolution or pixel-wise completion. After this choice, the degree of the degradation is sampled via a uniform distribution on the corresponding set; a sketch of this sampling is given below.
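A hedged sketch of the sampling procedure just described; the operator constructors (`gaussian_blur_op`, `downsample_op`, `completion_op`) are hypothetical placeholders for the actual degradation implementations.

```python
import random

def sample_inverse_problem():
    """Sample (A, sigma): a 1/3 chance per degradation family, a uniformly
    drawn degradation degree, and a noise-level drawn from Sigma."""
    sigma = random.uniform(0.0, 50.0 / 255.0)
    family = random.choice(["deblurring", "super-resolution", "completion"])
    if family == "deblurring":
        A = gaussian_blur_op(sigma_b=random.uniform(0.0, 4.0))     # placeholder
    elif family == "super-resolution":
        A = downsample_op(factor=random.choice([1, 2, 4]))         # placeholder
    else:
        A = completion_op(p_drop=random.uniform(0.0, 0.99))        # placeholder
    return A, sigma
```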
5.2 Dataset and Optimizer
The networks are optimized using the adam optimizer (Kingma and Ba, 2014) on a training dataset consisting of DIV2K (Agustsson and Timofte, 2017), the training set of BSD500 (Arbelaez et al., 2011), Flickr2K (Lim et al., 2017) and the Waterloo Exploration Database (Ma et al., 2016). Overall, this dataset contains 8394 images. Each iteration takes a batch of 16 images and crops each image at a random location to a size of 128 by 128 pixels.
To find the FP, the forward iterations of GD or FB are terminated if one or more of the following three conditions hold:
- The forward iteration has run for 500 iterations.
- The absolute value of any entry of an estimate is larger than 100.
- The mean square distance between two consecutive estimates is less than $10^{-7}$.

The first condition is necessary because of time constraints; the second condition is in place to prevent the network from forgetting what has been trained previously if the iterative method starts to diverge, avoiding unreasonably large gradients in failure cases. The final condition is the proper convergence condition. Choosing a smaller margin for convergence or a higher number of maximum forward iterations tends to lead to a more accurate FP estimation. As a result, this can be expected to lead to more accurate, but slower, training.
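These three stopping conditions can be read as a simple loop; a minimal sketch with the thresholds stated above, the iteration map `step` being assumed:

```python
import torch

def find_fixed_point(step, x0, max_iter=500, blowup=100.0, tol=1e-7):
    """Iterate `step` until the budget is exhausted, an entry exceeds the
    blow-up threshold, or consecutive estimates are closer than `tol`
    in mean squared distance."""
    x = x0
    for _ in range(max_iter):
        x_next = step(x)
        if x_next.abs().max() > blowup:       # divergence guard
            return x                          # keep the last sane estimate
        if (x_next - x).pow(2).mean() < tol:  # proper convergence
            return x_next
        x = x_next
    return x
```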
In our training, the networks for the RG are initialized with the pnpReG network (Fermanian et al., 2023), while the networks representing the PO are initialized with the MTDEQ regularizer (Willingham et al., 2023). The networks are trained for 150 epochs, starting with a step-size of $10^{-5}$. The step-size is reduced by 75% every 15 epochs.
5.3 Comparisons to Other Methods
We compare our different algorithms to MTDEQ
(Willingham et al., 2023), pnp ADMM with a Gaus-
sian denoiser (called Gauss in the tables) and pn-
pReG. The PSNR values for the two latter methods
are drawn from (Fermanian et al., 2023). The com-
parisons are done on set5 (Bevilacqua et al., 2012).
Hyper-parameters used for the testing of the RGs in a
GD algorithm are taken from (Fermanian et al., 2023)
and were tuned to produce the best results for pn-
pReG, meaning they were not further optimized for
RG1, DEQGD or RERG. Note that the weight of the
regularization is given by the AWGN in the inverse
problem, but for problems with no AWGN, a weight
larger than zero was chosen to allow for regulariza-
tion.
Based on these experiments, the next section will
compare the performance of the different approaches
and discuss the results.
6 RESULTS AND DISCUSSION
To showcase the differences that appear when using the introduced bound for RG1 and RERG, this section will compare the performance of four different RGs and examine differences in reconstruction performance.
Training a DEQGD is difficult, as convergence of the forward iterations is elusive: in our training, only about half the iterations converged before reaching 500 iterations. This is why we used adam in the forward iteration as well as the stopping condition of any entry having absolute value over 100. Training RG1 was quite stable and did not lead to any larger issues, as the FB algorithm tends to be much more stable and converge faster.
The RG1 prior performs best for completion (see table 1), while the DEQGD performs better for many of the other applications (especially the noisy ones). This is likely because the GD procedure directly uses the (known) noise-level in each iteration. The FB algorithm, on the other hand, uses a proximal operator in which the given level of AWGN is processed by a neural net, so the relationship between noise-level and regularization must be learned. For problems with little to no AWGN, this learned relationship is a boon, because the regularization weight in a GD procedure may become too small to provide meaningful regularization. This is why the regularization weight is chosen to be larger than zero at test-time when a noise-less problem is considered.
On any given task, RERG performs close to whichever of RG1 and DEQGD performs best, outperforming DEQGD on most problems. This means that using the upper bound in addition to $\mathcal{L}^{\mathrm{GD}}$ in training can lead to a procedure that leverages the reconstruction errors of both a GD and a FB algorithm to perform well across all the degradations considered. Looking at the visual examples in figure 1, RERG appears to avoid some of the artifacts that appear with RG1 and DEQGD, like the blurring of the butterfly's pattern on the bottom right in figure 1 (b), or the artifacts produced by RG1 in (a). While RG1 and the DEQGD each perform well on some problems and worse on others, RERG performs more consistently across the different degradations considered.
Figure 2 shows that for inverse problems where convergence is slow (necessitating a higher step-size), DEQGD and RERG may converge faster, because they are trained with a larger step-size and a limited number of iterations.
Table 1: Comparison of methods on noise-less pixel-wise completion with 80% and 90% of the pixels missing, respectively, and deblurring results for σ_noise = 0.01 with Gaussian kernels of two different levels of blur σ_b. Results are reported as PSNR (dB) | SSIM (Structural Similarity Index Measure (Wang et al., 2004)). Both metrics are computed on the red, green and blue color channels.

Methods   Completion 80%   Completion 90%   Deblurring σ_b = 1.6   Deblurring σ_b = 2.0
Gauss     30.20 | 0.893    26.20 | 0.821    32.06 | 0.884          30.88 | 0.866
MTDEQ     30.72 | 0.897    27.09 | 0.837    32.82 | 0.898          31.83 | 0.881
pnpReG    30.36 | 0.894    26.94 | 0.830    32.51 | 0.898          31.19 | 0.884
RG1       30.58 | 0.899    27.18 | 0.837    32.31 | 0.884          31.67 | 0.877
DEQGD     29.59 | 0.892    24.34 | 0.811    32.80 | 0.901          32.03 | 0.888
RERG      30.29 | 0.894    26.86 | 0.830    32.90 | 0.902          32.17 | 0.890
Table 2: Results for super-resolution on images that were down-sampled using a Gaussian kernel with σ_b = 0.5 and a bicubic kernel, respectively, for factors 2 and 3 and two different noise-levels. Results are reported in the form PSNR | SSIM.

2x SR
Methods   Bicubic σ_noise = 0.00   Bicubic σ_noise = 0.01   Gaussian σ_noise = 0.00   Gaussian σ_noise = 0.01
Gauss     35.20 | 0.940            33.80 | 0.917            35.14 | 0.938             32.74 | 0.900
MTDEQ     35.61 | 0.942            34.43 | 0.917            35.42 | 0.939             33.45 | 0.901
pnpReG    35.34 | 0.943            34.29 | 0.922            35.30 | 0.942             33.41 | 0.906
RG1       35.66 | 0.944            33.86 | 0.908            35.57 | 0.943             32.51 | 0.876
DEQGD     35.50 | 0.941            34.40 | 0.924            35.40 | 0.940             33.66 | 0.911
RERG      35.61 | 0.943            34.42 | 0.922            35.52 | 0.941             33.69 | 0.911

3x SR
Gauss     31.49 | 0.892            30.39 | 0.861            31.45 | 0.890             29.17 | 0.819
MTDEQ     32.10 | 0.900            31.15 | 0.865            31.94 | 0.896             30.17 | 0.842
pnpReG    31.75 | 0.896            31.13 | 0.877            31.60 | 0.896             30.39 | 0.858
RG1       31.73 | 0.899            31.21 | 0.871            31.06 | 0.892             30.33 | 0.847
DEQGD     32.07 | 0.899            31.31 | 0.881            31.74 | 0.898             30.61 | 0.867
RERG      32.09 | 0.901            31.38 | 0.881            31.67 | 0.899             30.74 | 0.867
Figure 1: Image results for the GD based methods with corresponding PSNR in dB. (a) Noisy 3x Gaussian super-resolution. (b) Noise-less completion with 90% of pixels missing. (c) Deblurring for σ_b = 2.0 from table 1.
Figure 2: Average PSNR on set5 across the GD iterations for the solution of a noise-less completion problem with 90% of the pixels missing (see table 1).
The problem highlighted in figure 2 is reconstructed with a step-size of 0.025, which is the largest step-size used in testing, while most of the other problems use much smaller step-sizes and do not exhibit the same phenomenon.
One big issue with any GD-based scheme so far is that it is quite slow to test (and to train, if one uses a DEQGD approach). The tests we performed used 1500 forward iterations, as do the tests in (Fermanian et al., 2023), which is much slower than the pnp ADMM used for the MTDEQ (hyper-parameters for the pnp ADMM algorithm used can also be found in (Fermanian et al., 2023)).

Further investigation of different hyper-parameters for the training of a RERG could provide even better performance on the tasks considered and improve convergence speed at testing. There are procedures that can be used to speed up FP calculations, like the method from (Bai et al., 2021) or the correction terms from (Bai et al., 2022), that could be incorporated to speed up inference and FP estimation in training.
7 CONCLUSION
In this paper, we introduced an upper bound that can be used for the training of a GD procedure as both a training objective and a regularization. We compared four different types of RGs on a range of different inverse problems and discussed some of the differences, showing that using the upper bound as regularization to create a RERG can mitigate some of the disadvantages of the DEQGD and the RG1.

So far, RGs have received little investigation; we extended the theoretical framework introduced in (Fermanian et al., 2023) and proposed two novel ways of training RGs. We compared the resulting RGs and demonstrated that the RERG, which combines both the upper bound and a DEQGD, produces strong reconstruction results across all the inverse problems considered.
REFERENCES
Agustsson, E. and Timofte, R. (2017). NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPR workshops, pages 126–135. IEEE.

Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE TPAMI, 33(5):898–916.

Bai, S., Geng, Z., Savani, Y., and Kolter, J. Z. (2022). Deep equilibrium optical flow estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 620–630.

Bai, S., Kolter, J. Z., and Koltun, V. (2019). Deep equilibrium models. arXiv preprint arXiv:1909.01377.

Bai, S., Koltun, V., and Kolter, J. Z. (2021). Neural deep equilibrium solvers. In International Conference on Learning Representations.

Bauschke, H. H. and Combettes, P. L. (2017). Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer.

Bevilacqua, M., Roumy, A., Guillemot, C., and Alberi-Morel, M. L. (2012). Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, pages 135.1–135.10. BMVA press.

Chan, S. H., Wang, X., and Elgendy, O. A. (2016). Plug-and-play ADMM for image restoration: Fixed-point convergence and applications. IEEE TCI, 3(1):84–98.

Fermanian, R., Le Pendu, M., and Guillemot, C. (2023). PnP-ReG: Learned regularizing gradient for plug-and-play gradient descent. SIIMS, 16(2):585–613.

Fung, S. W., Heaton, H., Li, Q., McKenzie, D., Osher, S., and Yin, W. (2022). JFB: Jacobian-free back-propagation for implicit networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 6648–6656.

Gilton, D., Ongie, G., and Willett, R. (2021). Deep equilibrium architectures for inverse problems in imaging. IEEE TCI, 7:1123–1133.

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Le Pendu, M. and Guillemot, C. (2023). Preconditioned plug-and-play ADMM with locally adjustable denoiser for image restoration. SIIMS, 16(1):393–422.

Lim, B., Son, S., Kim, H., Nah, S., and Lee, K. M. (2017). Enhanced deep residual networks for single image super-resolution. In CVPR. IEEE.

Ling, Z., Xie, X., Wang, Q., Zhang, Z., and Lin, Z. (2022). Global convergence of over-parameterized deep equilibrium models. arXiv preprint arXiv:2205.13814.

Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., and Zhang, L. (2016). Waterloo Exploration Database: New challenges for image quality assessment models. IEEE TIP, 26(2):1004–1016.

Pesquet, J.-C., Repetti, A., Terris, M., and Wiaux, Y. (2021). Learning maximally monotone operators for image recovery. SIIMS, 14(3):1206–1237.

Romano, Y., Elad, M., and Milanfar, P. (2017). The little engine that could: Regularization by denoising (RED). SIIMS, 10(4):1804–1844.

Venkatakrishnan, S. V., Bouman, C. A., and Wohlberg, B. (2013). Plug-and-play priors for model based reconstruction. In GlobalSIP, pages 945–948. IEEE.

Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE TIP, 13(4):600–612.

Willingham, S., Sjöström, M., and Guillemot, C. (2023). Prior for multi-task inverse problems in image reconstruction using deep equilibrium models. In European Signal Processing Conference (EUSIPCO).

Winston, E. and Kolter, J. Z. (2020). Monotone operator equilibrium networks. arXiv preprint arXiv:2006.08591.

Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., and Timofte, R. (2021). Plug-and-play image restoration with deep denoiser prior. IEEE TPAMI.