A Learning Paradigm for Interpretable Gradients

Felipe Torres Figueroa¹, Hanwei Zhang¹, Ronan Sicre¹, Yannis Avrithis² and Stephane Ayache¹
¹Centrale Marseille, Aix Marseille Univ., CNRS, LIS, Marseille, France
²Institute of Advanced Research on Artificial Intelligence (IARAI), Austria

Keywords: Gradient, Guided Backpropagation, Class Activation Maps, Interpretability.
Abstract: This paper studies interpretability of convolutional networks by means of saliency maps. Most approaches based on Class Activation Maps (CAM) combine information from fully connected layers and gradient through variants of backpropagation. However, it is well understood that gradients are noisy, and alternatives like guided backpropagation have been proposed to obtain better visualization at inference. In this work, we present a novel training approach to improve the quality of gradients for interpretability. In particular, we introduce a regularization loss such that the gradient with respect to the input image obtained by standard backpropagation is similar to the gradient obtained by guided backpropagation. We find that the resulting gradient is qualitatively less noisy and quantitatively improves the interpretability properties of different networks, using several interpretability methods.
1 INTRODUCTION
The improvement of deep learning models in the last decade has led to their adoption and penetration into most application sectors. Since these models are highly complex and opaque, the interpretability of their predictions receives a lot of attention (Lipton, 2018). Explanation and transparency are becoming legal requirements for systems used in high-stakes and high-risk decisions.
In this work, we focus on the visual interpretabil-
ity of deep learning models. Model interpretability
is often categorized into transparency and post-hoc
methods. Transparency aims at producing models
where the inner process or part of it can be under-
stood. Post-hoc methods consider models as black-
boxes and interpret decisions mainly based on inputs
and outputs.
In visual recognition, most methods focus on
post-hoc interpretability by means of saliency maps.
These maps highlight the most important areas of
an image related to the network prediction. Initial
works focused on using gradients for visualization,
such as guided backpropagation (Springenberg et al.,
2014). CAM (Zhou et al., 2016) later proposed a
class-specific linear combination of feature maps and
opened the way to numerous weighting strategies.
Most CAM-based methods use backpropagation in one way or another. Recognizing that the gradients obtained this way are noisy, methods like SmoothGrad (Smilkov et al., 2017) and Smooth Grad-CAM++ (Omeiza et al., 2019) improve the quality of saliency maps by denoising the gradients. However, this requires several forward passes and thus comes with increased cost at inference.
In this work, we rather propose a learning paradigm for model training that regularizes gradients to improve the performance of interpretability methods. In particular, we add a regularization term to the loss function that encourages the gradient in the input space to align with the gradient obtained by guided backpropagation. This has a smoothing effect on the gradient and is shown to improve the quality of model interpretations.

Figure 1 summarizes our method. At training, each input image is forwarded through the network to compute the cross-entropy loss. Standard and guided backpropagation are performed back to the input image space, where our regularization term is computed. This term is added to the loss and backpropagated only through the standard backpropagation branch.
The key contributions of this work are as follows:
- We introduce a new learning paradigm to regularize gradients.
- Using different networks, we show that our method improves gradient quality and the performance of several interpretability methods under multiple metrics, while preserving accuracy.
Figure 1: Interpretable gradient learning. For an input image $x$, we obtain the logit vector $y = f(x; \theta)$ by a forward pass through the network $f$ with parameters $\theta$. We compute the classification loss $L_C$ by softmax and cross-entropy (6), (7). We obtain the standard gradient $\partial L_C/\partial x$ and the guided gradient $\partial^G L_C/\partial x$ by two backward passes (dashed) and compute the regularization loss $L_R$ as the error between the two (8), (10)-(12). The total loss is $L = L_C + \lambda L_R$ (9). Learning is based on $\partial L/\partial \theta$, which involves differentiation of the entire computational graph except the guided backpropagation branch (stop gradient, blue).
2 RELATED WORK
Interpretability of deep neural network decisions is
a problem that receives increasing interest. As interpretability is not simple to define, Lipton (Lipton, 2018) proposes some common ground, definitions and categorization for interpretability methods. For instance, transparency aims at making models simple enough that it is humanly possible to explain their inner mechanism. By contrast, post-hoc methods consider models as black boxes and study the activations leading to a specific output.
LIME (Ribeiro et al., 2016) and SHAP (Lundberg
and Lee, 2017) are probably the most popular post-
hoc methods that are model agnostic and provide lo-
cal information. Concerning image recognition tasks,
it is common to generate saliency maps highlighting
the areas of an image that are responsible for a spe-
cific prediction. Several of these methods are either
based on backpropagation and its variants or on Class
Activation Maps (CAM) that weigh the importance of
activation maps.
2.1 Gradient-Based Approaches
Gradient-based approaches assess the impact of dis-
tinct image regions on the prediction based on the par-
tial derivative of the model prediction function with
respect to the input. A simple saliency map can be
the partial derivative obtained by a single backward
pass through the model (Simonyan et al., 2014).
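As a concrete illustration, here is a minimal PyTorch sketch of this single-backward-pass saliency; the function name and interface are ours, not from the cited work.

```python
import torch

def gradient_saliency(model, x, target_class):
    """Saliency of Simonyan et al. (2014): partial derivative of the
    class logit w.r.t. the input, from a single backward pass."""
    model.eval()
    x = x.clone().requires_grad_(True)   # track gradients w.r.t. the input
    logits = model(x)                    # forward pass, shape (1, num_classes)
    logits[0, target_class].backward()   # one backward pass to the input
    return x.grad.abs().max(dim=1)[0]    # max magnitude over channels: (1, H, W)
```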
Guided backpropagation (Springenberg et al.,
2014) enhances explanations by removing negative
gradients through ReLU units. For better visualiza-
tion, SmoothGrad (Smilkov et al., 2017) applies noise
to the input and derives saliency maps based on the
average of resulting gradients. Layer-wise Relevance
Propagation (LRP) (Bach et al., 2015) reallocates
the prediction score through a custom backward pass
across the network.
Our method has a similar objective to SmoothGrad (Smilkov et al., 2017) but, instead of using several forward passes at inference, we regularize gradients using guided backpropagation during training. Thus we obtain better gradients without modifying the inference process, and our method can be combined with any interpretability method at inference.
2.2 CAM-Based Approaches
Class Activation Maps (CAM) (Zhou et al., 2016) produce a saliency map that highlights the areas of an image that are most responsible for a CNN decision. The saliency map is computed as a linear combination of feature maps from a given layer. Different variants of CAM have been proposed by defining different weighting coefficients. Grad-CAM (Selvaraju et al., 2017), for instance, spatially averages the gradient with respect to feature maps. Grad-CAM++ (Chattopadhay et al., 2018) improves object localization by using positive partial derivatives and measuring recognition and localization metrics.
It is possible to extend CAM to multiple lay-
ers (Jiang et al., 2021) and to improve sensi-
tivity (Sundararajan et al., 2017) and conserva-
tion (Montavon et al., 2018) by the addition of ax-
ioms (Fu et al., 2020). Score-CAM (Wang et al.,
2020) is a gradient-free method that computes weight-
ing coefficients by maximizing the Average Increase
metric (Chattopadhay et al., 2018). Further improvement can be obtained by means of test-time optimization (Zhang et al., 2023).
Some works provide explanations that not only lo-
calize salient parts of images, but also provide theo-
retical bases on the effect of modifying such regions
for a given input (Fu et al., 2020). An exhaustive al-
ternative performs ablation experiments to highlight
such parts (Ramaswamy et al., 2020).
All these approaches apply at inference, without
modifying the model or the training process. By con-
trast, our work applies at training with the objec-
tive of improving the quality of gradients, which is
much needed for gradient-based methods. Thus, our
method is orthogonal and can be used with any of
these approaches at inference.
2.3 Double Backpropagation
Double backpropagation is a general regulariza-
tion paradigm, first introduced by Drucker and Le
Cun (Drucker and Le Cun, 1991) to improve general-
ization. The idea is used to avoid overfitting (Philipp
and Carbonell, 2018), help transfer (Srinivas and
Fleuret, 2018), cope with noisy labels (Luo et al.,
2019), and more recently to increase adversarial ro-
bustness (Lyu et al., 2015; Simon-Gabriel et al., 2018;
Ross and Doshi-Velez, 2018; Seck et al., 2019; Finlay
et al., 2018). It aims at penalizing the $\ell_1$ (Seck et al., 2019), $\ell_2$, or $\ell_\infty$ norm of the gradient with respect to
Our method is related and regularizes the stan-
dard gradient by aligning it with the guided gradi-
ent, obtained by guided backpropagation (Springen-
berg et al., 2014).
3 BACKGROUND
3.1 Guided Backpropagation
The derivative of $v = \mathrm{ReLU}(u) = [u]_+ = \max(u, 0)$ with respect to $u$ is $dv/du = \mathbb{1}_{u>0}$. By the chain rule, a signal $\delta v = \partial L/\partial v$ is then propagated backwards through the ReLU unit to $\delta u = \partial L/\partial u$ as $\delta u = \mathbb{1}_{u>0} \odot \delta v$, where $\partial L/\partial v$ is the partial derivative of any scalar quantity of interest, e.g. a loss $L$, with respect to $v$.

Guided backpropagation (Springenberg et al., 2014) changes this to $\delta^G u = \mathbb{1}_{u>0} \odot [\delta v]_+$, masking out values corresponding to negative entries of both the forward ($u$) and the backward ($\delta v$) signals and thus preventing backward flow of negative gradients.

Standard backpropagation through an entire network $f$ with this particular change for ReLU units is called guided backpropagation. The corresponding guided "partial derivative" or guided gradient of a scalar quantity $L$ with respect to $v$ is denoted by $\partial^G L/\partial v$. This method allows sharp visualization of high-level activations conditioned on input images.
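For illustration, a minimal PyTorch sketch of guided backpropagation via ReLU hooks; the class and its interface are our own sketch (assuming non-inplace ReLU modules), not the paper's code.

```python
import torch
import torch.nn as nn

class GuidedBackprop:
    """During the backward pass, each ReLU propagates only positive
    gradients, and only where its forward input was positive."""

    def __init__(self, model):
        self.model = model.eval()
        for m in model.modules():
            if isinstance(m, nn.ReLU):
                m.register_forward_hook(self._save_output)
                m.register_full_backward_hook(self._guided_relu)

    @staticmethod
    def _save_output(module, inputs, output):
        module._fwd_out = output         # remember the forward activation u

    @staticmethod
    def _guided_relu(module, grad_in, grad_out):
        # delta^G u = 1[u > 0] * [delta v]_+ : mask the negative backward
        # signal and positions where the forward signal was non-positive
        return (grad_out[0].clamp(min=0.0) * (module._fwd_out > 0),)

    def __call__(self, x, target_class):
        x = x.clone().requires_grad_(True)
        logits = self.model(x)
        logits[0, target_class].backward()
        return x.grad                    # guided gradient w.r.t. the input
```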
3.2 CAM-Based Methods
CAM-based methods build a saliency map as a linear combination of feature maps. Given a target class $c$ and a set of 2D feature maps $\{A^k\}_{k=1}^K$, the saliency map is defined as
$$S^c = \mathrm{ReLU}\left(\sum_{k=1}^{K} \alpha_k^c A^k\right), \qquad (1)$$
where the weight $\alpha_k^c$ determines the contribution of channel $k$ to class $c$. The saliency map $S^c$ and the feature maps $A^k$ are both non-negative because of using ReLU activation functions. Different CAM-based methods differ primarily in the definition of the weights $\alpha_k^c$.
CAM (Zhou et al., 2016) originally defines $\alpha_k^c$ as the weight connecting channel $k$ to class $c$ in the classifier, assuming $\{A^k\}$ are the feature maps of the last convolutional layer, which is followed by global average pooling (GAP) and a fully connected layer.
Grad-CAM (Selvaraju et al., 2017) is a generalization of CAM for any network. If $y^c$ is the logit of class $c$, the weights are obtained by GAP of the partial derivatives of $y^c$ with respect to elements of feature map $A^k$ of any given layer,
$$\alpha_k^c = \frac{1}{Z} \sum_{i,j} \frac{\partial y^c}{\partial A_{ij}^k}, \qquad (2)$$
where $A_{ij}^k$ denotes the value at spatial location $(i, j)$ of feature map $A^k$ and $Z$ is the total number of locations.
Guided Grad-CAM elementwise-multiplies the
saliency maps obtained by Grad-CAM and guided
backpropagation, after adjusting spatial resolu-
tions. The resulting visualizations are both class-
discriminative (by Grad-CAM) and contain fine-
grained detail (by guided backpropagation).
Grad-CAM++ (Chattopadhay et al., 2018) is a generalization of Grad-CAM, where partial derivatives of $y^c$ with respect to $A^k$ are followed by ReLU as in guided backpropagation and GAP is replaced by a weighted average:
$$a_k^c = \sum_{i,j} w_{ij}^{kc} \,\mathrm{ReLU}\left(\frac{\partial y^c}{\partial A_{ij}^k}\right). \qquad (3)$$
The weights $w_{ij}^{kc}$ of the linear combination are
$$w_{ij}^{kc} = \frac{\dfrac{\partial^2 y^c}{(\partial A_{ij}^k)^2}}{2\dfrac{\partial^2 y^c}{(\partial A_{ij}^k)^2} + \sum_{a,b} A_{ab}^k \dfrac{\partial^3 y^c}{(\partial A_{ij}^k)^3}}. \qquad (4)$$
Score-CAM (Wang et al., 2020) computes the weights $a_k^c$ based on the increase in confidence (Chattopadhay et al., 2018) for class $c$ obtained by masking (element-wise multiplying) the input image $x$ with feature map $A^k$:
$$a_k^c = f(x \odot s(\mathrm{Up}(A^k)))_c - f(x_b)_c, \qquad (5)$$
where $\mathrm{Up}$ is upsampling to the spatial resolution of $x$, $s$ is linear normalization to range $[0, 1]$, $\odot$ is the Hadamard product, $f$ is the network mapping of an input image to a class probability vector and $x_b$ is a baseline image.

While Score-CAM does not require gradients to compute saliency maps, (5) requires one forward pass through the network $f$ for each channel $k$.
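A sketch of (5) in PyTorch (names ours), assuming a black baseline image and a `model` whose logits we pass through softmax to obtain the class probabilities that $f$ denotes.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_cam_weights(model, x, feature_maps, target_class):
    """One weight per channel k, as in (5): upsample A^k, normalize to
    [0, 1], mask the input, and score the change in class probability."""
    base = model(torch.zeros_like(x)).softmax(1)[0, target_class]  # f(x_b)_c
    weights = []
    for k in range(feature_maps.shape[1]):
        a = F.interpolate(feature_maps[:, k:k + 1], size=x.shape[-2:],
                          mode="bilinear", align_corners=False)    # Up(A^k)
        a = (a - a.min()) / (a.max() - a.min() + 1e-8)             # s(.): to [0, 1]
        weights.append(model(x * a).softmax(1)[0, target_class] - base)
    return torch.stack(weights)        # one forward pass per channel k
```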
4 METHOD
Preliminaries. We consider an image classification network $f$ with parameters $\theta$, which maps an input image $x$ to a vector of predicted class probabilities $p = f(x; \theta)$. At inference, we predict the class of maximum confidence $\arg\max_j p_j$, where $p_j$ is the probability of class $j$. At training, given training images $X = \{x_i\}_{i=1}^n$ and target labels $T = \{t_i\}_{i=1}^n$, we compute the classification loss
$$L_C(X, \theta, T) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{CE}(f(x_i; \theta), t_i), \qquad (6)$$
where CE is the cross-entropy
$$\mathrm{CE}(p, t) = -\log p_t. \qquad (7)$$
Updates of parameters $\theta$ are performed by an optimizer, based on the standard partial derivative (gradient) $\partial L_C/\partial \theta$ of the classification loss $L_C$ with respect to $\theta$, obtained by standard backpropagation.
Motivation. Due to non-linearities like ReLU activations and downsampling like max-pooling or convolution stride greater than 1, the standard gradient is noisy (Smilkov et al., 2017). This is shown by visualizing the gradient $\partial L_C/\partial x$ with respect to an input image $x$. By contrast, the guided gradient $\partial^G L_C/\partial x$ (Springenberg et al., 2014) does not suffer much from noise and preserves sharp details. The difference between the two gradients is illustrated in Figure 1.
The main motivation of this work is that introducing a regularization term during training could make the standard gradient $\partial L_C/\partial x$ behave similarly to the corresponding guided gradient $\partial^G L_C/\partial x$, while maintaining the predictive power of the classifier $f$. We hypothesize that, if this is possible, it will improve the quality of all gradients with respect to intermediate activations and therefore the quality of saliency maps obtained by CAM-based methods (Zhou et al., 2016; Selvaraju et al., 2017; Chattopadhay et al., 2018; Wang et al., 2020) and the interpretability of network $f$. The effect may be similar to that of SmoothGrad (Smilkov et al., 2017), but without the need for several forward passes at inference.
Regularization. Given an input training image $x_i$ and its target label $t_i$, we perform a forward pass through $f$ and compute the probability vector $p_i = f(x_i; \theta)$ and the contribution of $(x_i, t_i)$ to the classification loss $L_C(X, \theta, T)$ (6).

We then obtain the standard gradient $\delta x_i = \partial L_C/\partial x_i$ and the guided gradient $\delta^G x_i = \partial^G L_C/\partial x_i$ with respect to $x_i$ by two separate backward passes. Since the whole process is differentiable (w.r.t. $\theta$) at training, we stop the gradient computation of the latter, so that it only serves as a "teacher". We define the regularization loss
$$L_R(X, \theta, T) = \frac{1}{n} \sum_{i=1}^{n} E(\delta x_i, \delta^G x_i), \qquad (8)$$
where $E$ is an error function between the two gradient images, considered below.
The total loss is defined as
$$L(X, \theta, T) = L_C(X, \theta, T) + \lambda L_R(X, \theta, T), \qquad (9)$$
where $\lambda$ is a hyperparameter determining the regularization strength. $\lambda$ should be large enough to smooth the gradient without decreasing the classification accuracy or harming the training process.

Updates of the network parameters $\theta$ are based on the standard gradient $\partial L/\partial \theta$ of the total loss $L$ w.r.t. $\theta$, using any optimizer. At inference, one may use any interpretability method to obtain a saliency map at any layer.
Error Function. Given two gradient images $\delta, \delta'$ consisting of $m$ pixels each, we consider the following error functions $E$ to compute the regularization loss (8).

1. Mean absolute error (MAE):
$$E_{\mathrm{MAE}}(\delta, \delta') = \frac{1}{m} \|\delta - \delta'\|_1. \qquad (10)$$

2. Mean squared error (MSE):
$$E_{\mathrm{MSE}}(\delta, \delta') = \frac{1}{m} \|\delta - \delta'\|_2^2. \qquad (11)$$

We also consider the following two similarity functions, with a negative sign.
3. Cosine similarity:
$$E_{\cos}(\delta, \delta') = \frac{\langle \delta, \delta' \rangle}{\|\delta\|_2 \|\delta'\|_2}, \qquad (12)$$
where $\langle \cdot, \cdot \rangle$ denotes inner product.

4. Histogram intersection (HI):
$$E_{\mathrm{HI}}(\delta, \delta') = \frac{\sum_{i=1}^{m} \min(|\delta_i|, |\delta'_i|)}{\|\delta\|_1 \|\delta'\|_1}. \qquad (13)$$
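A compact sketch of these four error functions in PyTorch (ours); the two similarity functions are returned with a negative sign, so every variant is minimized.

```python
import torch
import torch.nn.functional as F

def reg_error(delta, delta_g, kind="cos"):
    """Error E between standard gradient `delta` and guided gradient
    `delta_g`, both of shape (batch, ...), flattened to m values each."""
    d, g = delta.flatten(1), delta_g.flatten(1)
    if kind == "mae":    # (10) mean absolute error
        return (d - g).abs().mean()
    if kind == "mse":    # (11) mean squared error
        return (d - g).pow(2).mean()
    if kind == "cos":    # (12) negative cosine similarity
        return -F.cosine_similarity(d, g, dim=1).mean()
    if kind == "hi":     # (13) negative histogram intersection
        hi = torch.minimum(d.abs(), g.abs()).sum(1) \
             / (d.abs().sum(1) * g.abs().sum(1) + 1e-12)
        return -hi.mean()
    raise ValueError(f"unknown error function: {kind}")
```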
Algorithm. Our method is summarized in Algorithm 1 and illustrated in Figure 1. It is interesting to note that the entire computational graph depicted in Figure 1 involves one forward and two backward passes. This graph is then differentiated again to compute $\partial L/\partial \theta$, which involves one more forward and backward pass, since the guided backpropagation branch is excluded. Thus, each training iteration requires five passes through $f$ instead of two in standard training.
Algorithm 1: Interpretable gradient loss.
    Input: network $f$, parameters $\theta$
    Input: input images $X = \{x_i\}_{i=1}^n$
    Input: target labels $T = \{t_i\}_{i=1}^n$
    Output: loss $L$

    $L_C \leftarrow \frac{1}{n} \sum_{i=1}^{n} \mathrm{CE}(f(x_i; \theta), t_i)$    ▷ class. loss (6)
    foreach $i \in \{1, \ldots, n\}$ do
        $\delta x_i \leftarrow \partial L_C/\partial x_i$    ▷ standard grad
        $\delta^G x_i \leftarrow \partial^G L_C/\partial x_i$    ▷ guided grad
        DETACH($\delta^G x_i$)    ▷ detach from graph
    $L_R \leftarrow \frac{1}{n} \sum_{i=1}^{n} E(\delta x_i, \delta^G x_i)$    ▷ reg. loss (8)
    $L \leftarrow L_C + \lambda L_R$    ▷ total loss (9)
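A PyTorch sketch of one iteration of Algorithm 1 (ours, not the authors' code). We assume `guided_model` shares parameters with `model` but backpropagates through its ReLUs in guided mode, e.g. via hooks as sketched in Section 3.1; `create_graph=True` keeps the standard input gradient differentiable w.r.t. $\theta$, while the guided gradient is detached to act as the teacher.

```python
import torch

def training_step(model, guided_model, x, t, criterion, err_fn, lam):
    """One iteration: classification loss (6), standard and guided input
    gradients, regularization (8), and total loss (9)."""
    x = x.clone().requires_grad_(True)

    # forward pass and classification loss L_C (6)
    loss_c = criterion(model(x), t)

    # standard gradient dL_C/dx; create_graph=True makes it part of the
    # computational graph so the regularizer is trainable through theta
    delta = torch.autograd.grad(loss_c, x, create_graph=True)[0]

    # guided gradient from a separate pass, detached ("stop gradient")
    delta_g = torch.autograd.grad(criterion(guided_model(x), t), x)[0].detach()

    # total loss L = L_C + lambda * L_R (9)
    return loss_c + lam * err_fn(delta, delta_g)
```

The returned loss is then optimized as usual, e.g. `optimizer.zero_grad(); loss.backward(); optimizer.step()`; differentiating it again w.r.t. $\theta$ accounts for the extra passes discussed above.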
5 EXPERIMENTS
5.1 Experimental Setup
In the following sections, we evaluate the effect of our
approach on recognition and interpretability.
Models and Datasets. We train and evaluate ResNet-18 (He et al., 2016) and MobileNet-V2 (Sandler et al., 2018) on CIFAR-100 (Krizhevsky, 2009). ResNets are among the most common CNNs, and ResNet-18 is particularly adapted to low-resolution images. MobileNet-V2 is a widely used compact CNN. CIFAR-100 contains 60,000 images of 100 categories, split into 50,000 for training and 10,000 for testing. Each image has a resolution of 32×32 pixels.
Settings. To obtain competitive performance and ensure the replicability of our method, we follow the methodology of the weiaicunzai repository (https://github.com/weiaicunzai/pytorch-cifar100). In particular, we train for 200 epochs, with a batch size of 128 images, SGD optimizer, initial learning rate $10^{-1}$ and learning rate decay by a factor of 5 at epochs 60, 120 and 160.

At inference, we generate explanations following popular attribution methods derived from CAM (Zhou et al., 2016), using the pytorch-grad-cam library by Jacob Gildenblat (https://github.com/jacobgil/pytorch-grad-cam).
5.2 Faithfulness Metrics
Faithfulness evaluation (Chattopadhay et al., 2018) offers insight on the regions of an image that are considered important for recognition, as highlighted by the saliency map $S^c$. Specifically, given a target class $c$, an image $x$ and a saliency map $S^c$ are element-wise multiplied to obtain a masked image
$$m^c = S^c \odot x.$$
This masked image is similar to the original image on the salient areas and black on the non-salient ones. To evaluate the quality of saliency maps, we forward both the original image $x$ and its masked version $m^c$ through the network to obtain the predicted probabilities $p_i^c$ and $o_i^c$ respectively. We then compute a number of metrics as defined below.
Average Drop (AD) aims at quantifying how much predictive power is lost when we consider the masked image compared to the original one. Lower is better.
$$\mathrm{AD} = \frac{1}{N} \sum_{i=1}^{N} \frac{[p_i^c - o_i^c]_+}{p_i^c}. \qquad (14)$$
Average Increase (AI), also known as Increase of Confidence, measures the percentage of examples of the dataset where the masked image offers a higher probability than the original for the target class. Higher is better.
$$\mathrm{AI} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(p_i^c < o_i^c). \qquad (15)$$
Average Gain (AG) was recently introduced in (Zhang et al., 2023) and designed to be a symmetric complement of AD, replacing AI. It aims at quantifying how much predictive power is gained when we consider the masked image compared to the original one. Higher is better.
$$\mathrm{AG} = \frac{1}{N} \sum_{i=1}^{N} \frac{[o_i^c - p_i^c]_+}{p_i^c}. \qquad (16)$$
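A sketch computing AD, AG and AI over a batch (ours), assuming saliency maps `sal` normalized to [0, 1] with a shape broadcastable over the image channels.

```python
import torch

@torch.no_grad()
def faithfulness(model, images, sal, targets):
    """p: class probability on the original image; o: on the masked image."""
    idx = torch.arange(len(targets))
    p = model(images).softmax(1)[idx, targets]
    o = model(images * sal).softmax(1)[idx, targets]   # m^c = S^c * x
    ad = ((p - o).clamp(min=0) / p).mean() * 100       # (14) lower is better
    ag = ((o - p).clamp(min=0) / p).mean() * 100       # (16) higher is better
    ai = (p < o).float().mean() * 100                  # (15) higher is better
    return ad, ag, ai
```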
Figure 2: Saliency map comparison of standard vs. our training using different CAM-based methods (Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM) on CIFAR-100 examples.
5.3 Causal Metrics
Causality evaluation (Petsiuk et al., 2018) aims at evaluating the effect of masking certain elements of the image on the predictive power of a model. Two metrics are defined as follows. Histograms and average values can be computed per image. Following most previous work, we only show average values over the test set.
Insertion starts from a blurry image and gradually inserts (unblurs) pixels of the original image, ranked by decreasing saliency as defined in a given saliency map. At each iteration, the image is passed through the network to compute the predicted probability and compare it to the original.

Deletion gradually removes pixels by replacing them with black, starting from the most salient. As for insertion, we compute the predicted probabilities at each iteration.
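A rough sketch of the deletion curve for one image (ours); insertion is analogous, starting from a blurred image and unblurring pixels in the same order. The reported score approximates the area under the probability curve.

```python
import torch

@torch.no_grad()
def deletion_score(model, x, saliency, target, steps=32):
    """Zero out pixels from most to least salient; record the class
    probability at each step and average it (AUC approximation)."""
    order = saliency.flatten().argsort(descending=True)  # most salient first
    chunk = (order.numel() + steps - 1) // steps         # pixels per step
    xt, probs = x.clone(), []
    for s in range(steps + 1):
        probs.append(model(xt).softmax(1)[0, target].item())
        dead = order[s * chunk:(s + 1) * chunk]          # next batch of pixels
        xt.view(1, x.shape[1], -1)[..., dead] = 0.0      # "delete" them (black)
    return sum(probs) / len(probs)
```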
5.4 Qualitative Results
We visualize the effect of our approach on saliency
maps and gradients, obtained for the baseline model
vs. the one trained with our approach.
Figure 2 shows saliency maps. We observe the
differences brought by our training method. The dif-
ferences are particularly important for Grad-CAM,
which directly averages the gradient to weigh feature
maps. Interestingly, the differences are smaller for
Score-CAM, which is not gradient-based but relies only on changes of predicted probabilities.

Figure 3: Gradient comparison of standard vs. our training on CIFAR-100 examples.
Figure 3 shows gradients. We observe slightly less noise with our method, while the object of interest is better covered by gradient activations.
Table 1: Accuracy of standard vs. our training using ResNet-18 and MobileNet-V2 on CIFAR-100, using the cosine error function for our training.

MODEL          METHOD     λ           ACC
ResNet-18      Baseline   –           73.42
               Ours       7.5×10⁻³    72.86
MobileNet-V2   Baseline   –           59.43
               Ours       1×10⁻³      62.36
Table 2: Interpretability metrics of standard vs. our training using ResNet-18 and MobileNet-V2 on CIFAR-100, using the cosine error function for our training.

ResNet-18
METHOD         SETTING    AD↓    AG↑    AI↑    INS↑   DEL↓
Grad-CAM       Baseline   30.16  15.23  29.99  58.47  17.47
               Ours       28.09  16.19  31.53  58.76  17.57
Grad-CAM++     Baseline   31.40  14.17  28.47  58.61  17.05
               Ours       29.78  15.07  29.60  58.90  17.22
Score-CAM      Baseline   26.49  18.62  33.84  58.42  18.31
               Ours       24.82  19.49  35.51  59.11  18.34
Ablation-CAM   Baseline   31.96  14.02  28.33  58.36  17.14
               Ours       29.90  15.03  29.61  58.70  17.37
Axiom-CAM      Baseline   30.16  15.23  29.98  58.47  17.47
               Ours       28.09  16.20  31.53  58.76  17.57

MobileNet-V2
METHOD         SETTING    AD↓    AG↑    AI↑    INS↑   DEL↓
Grad-CAM       Baseline   44.64  6.57   25.62  44.64  14.34
               Ours       40.89  7.31   27.08  45.57  15.20
Grad-CAM++     Baseline   45.98  6.12   24.10  44.72  14.76
               Ours       40.76  6.85   26.46  45.51  14.92
Score-CAM      Baseline   40.55  7.85   28.57  45.62  14.52
               Ours       36.34  9.09   30.50  46.35  14.72
Ablation-CAM   Baseline   45.15  6.38   25.32  44.62  15.03
               Ours       41.13  7.03   26.10  45.38  15.12
Axiom-CAM      Baseline   44.65  6.57   25.62  44.64  15.27
               Ours       40.89  7.31   27.08  45.57  15.20
5.5 Quantitative Results
We evaluate the effect of training a given model using our proposed approach with faithfulness and causality metrics. As shown in Table 1 and Table 2, we obtain improvements on both networks and on four out of five interpretability metrics, while remaining within half a percentage point of the accuracy of the baseline trained with standard backpropagation, or improving over it.
The improvements are higher for the faithfulness metrics AD, AG, and AI. Insertion gets a smaller but consistent improvement. Deletion is mostly worse with our method, but by a very small margin. This may be due to limitations of the metrics, as reported in previous works (Zhang et al., 2023).
Table 3: Effect of error function on our approach, using ResNet-18 and Grad-CAM attributions on CIFAR-100.

ERROR FUNCTION   ACC    AD↓    AG↑    AI↑    INS↑   DEL↓
Baseline         73.42  30.16  15.23  29.99  58.47  17.47
Cosine           72.86  28.09  16.19  31.53  58.76  17.57
Histogram        73.88  30.39  14.78  29.38  58.52  17.35
MAE              73.41  30.33  15.06  29.61  58.13  17.95
MSE              73.86  29.64  15.19  30.11  59.05  18.02
Table 4: Effect of regularization coefficient λ (9) on our approach, using ResNet-18 and Grad-CAM attributions on CIFAR-100, using the cosine error function for our training.

λ           ACC    AD↓    AG↑    AI↑    INS↑   DEL↓
0           73.42  30.16  15.23  29.99  58.47  17.47
1×10⁻³      73.71  29.52  15.17  30.03  59.23  17.45
2.5×10⁻³    72.99  30.53  15.82  30.56  59.04  17.96
5×10⁻³      72.46  30.10  16.06  30.67  57.47  17.80
7.5×10⁻³    72.86  28.09  16.20  31.53  58.76  17.57
1×10⁻²      73.28  28.97  15.75  31.16  58.99  17.50
1×10⁻¹      73.00  28.93  16.13  31.55  59.66  17.95
1           73.30  28.44  16.02  31.31  58.64  17.48
10          73.04  29.28  15.23  30.47  58.74  17.47
It is interesting to note that improvements on Score-CAM mean that our training not only improves the gradient but also builds better activation maps, since Score-CAM relies only on the latter.
5.6 Ablation Experiments
Using ResNet-18 and Grad-CAM attributions, we an-
alyze the effect of the error function and the regular-
ization coefficient λ (9) on our approach.
Error Function. As shown in Table 3, we obtain a consistent improvement on most metrics for all error functions. Accuracy remains stable within half a percentage point of the original model. However, most options have little or negative effect on deletion. Cosine similarity provides improvements on most metrics, while maintaining deletion performance. We thus choose the cosine error function by default.
Regularization Coefficient. As shown in Table 4, our method is not very sensitive to the regularization coefficient λ. The value of 7.5×10⁻³ works best in general and is thus selected as default.
6 CONCLUSION
In this paper, we propose a new training approach to improve the gradient of a CNN in terms of interpretability. Our method forces the gradient with respect to the input image obtained by backpropagation to align with the gradient coming from guided backpropagation. The results of our training are evaluated according to several interpretability methods and metrics. Our method offers consistent improvement on most metrics for two networks, while remaining within a small margin of the standard gradient in terms of accuracy.
REFERENCES

Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7).
Chattopadhay, A., Sarkar, A., Howlader, P., and Balasubramanian, V. N. (2018). Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In WACV.
Drucker, H. and Le Cun, Y. (1991). Double backpropagation increasing generalization performance. In IJCNN.
Finlay, C., Oberman, A. M., and Abbasi, B. (2018). Improved robustness to adversarial examples using Lipschitz regularization of the loss.
Fu, R., Hu, Q., Dong, X., Guo, Y., Gao, Y., and Li, B. (2020). Axiom-based Grad-CAM: Towards accurate visualization and explanation of CNNs. arXiv preprint arXiv:2008.02312.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In CVPR.
Jiang, P.-T., Zhang, C.-B., Hou, Q., Cheng, M.-M., and Wei, Y. (2021). LayerCAM: Exploring hierarchical class activation maps for localization. TIP.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Technical report, pages 32–33.
Lipton, Z. C. (2018). The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3):31–57.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In NeurIPS.
Luo, Y., Zhu, J., and Pfister, T. (2019). A simple yet effective baseline for robust deep learning with noisy labels. arXiv preprint arXiv:1909.09338.
Lyu, C., Huang, K., and Liang, H.-N. (2015). A unified gradient regularization family for adversarial examples. In International Conference on Data Mining.
Montavon, G., Samek, W., and Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15.
Omeiza, D., Speakman, S., Cintas, C., and Weldermariam, K. (2019). Smooth Grad-CAM++: An enhanced inference level visualization technique for deep convolutional neural network models. arXiv preprint arXiv:1908.01224.
Petsiuk, V., Das, A., and Saenko, K. (2018). RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421.
Philipp, G. and Carbonell, J. G. (2018). The nonlinearity coefficient - predicting generalization in deep neural networks. arXiv preprint arXiv:1806.00179.
Ramaswamy, H. G. et al. (2020). Ablation-CAM: Visual explanations for deep convolutional network via gradient-free localization. In WACV.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In SIGKDD.
Ross, A. S. and Doshi-Velez, F. (2018). Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR.
Seck, I., Loosli, G., and Canu, S. (2019). L1-norm double backpropagation adversarial defense. arXiv preprint arXiv:1903.01715.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In ICCV.
Simon-Gabriel, C.-J., Ollivier, Y., Bottou, L., Schölkopf, B., and Lopez-Paz, D. (2018). Adversarial vulnerability of neural networks increases with input dimension.
Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR Workshop.
Smilkov, D., Thorat, N., Kim, B., Viégas, F., and Wattenberg, M. (2017). SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806.
Srinivas, S. and Fleuret, F. (2018). Knowledge transfer with Jacobian matching. In ICML.
Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic attribution for deep networks. In ICML.
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., and Hu, X. (2020). Score-CAM: Score-weighted visual explanations for convolutional neural networks. In CVPR Workshops.
Zhang, H., Torres, F., Sicre, R., Avrithis, Y., and Ayache, S. (2023). Opti-CAM: Optimizing saliency maps for interpretability. arXiv preprint arXiv:2301.07002.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016). Learning deep features for discriminative localization. In CVPR.