Concept Basis Extraction for Latent Space Interpretation of Image Classifiers
Alexandros Doumanoglou 1,2 a, Dimitrios Zarpalas 1 b and Kurt Driessens 2 c
1 Information Technologies Institute, Centre for Research and Technology Hellas, 1st Km Charilaou - Thermi Road, Thessaloniki, Greece
2 Department of Advanced Computing Sciences, Maastricht University, 6200 MD Maastricht, The Netherlands
a https://orcid.org/0000-0002-4337-1720, b https://orcid.org/0000-0002-9649-9306, c https://orcid.org/0000-0001-7871-2495
Keywords: Concept Basis, Interpretable Basis, Unsupervised Learning, Explainable AI, Interpretability, Computer Vision, Deep Learning.
Abstract:
Previous research has shown that, to a large extent, deep feature representations of image-patches that belong to the same semantic concept lie in the same direction of an image classifier's feature space. Conventional approaches compute these directions using annotated data, forming an interpretable feature space basis (also referred to as a concept basis). Unsupervised Interpretable Basis Extraction (UIBE) was recently proposed as a novel method that can suggest an interpretable basis without annotations. In this work, we show that the addition of a classification loss term to the unsupervised basis search can lead to basis suggestions that align even more with interpretable concepts. This loss term enforces the basis vectors to point towards directions that maximally influence the classifier's predictions, exploiting concept knowledge encoded by the network. We evaluate our work by deriving a concept basis for three popular convolutional networks, trained on three different datasets. Experiments show that our contributions enhance the interpretability of the learned bases, according to the interpretability metrics, by up to +45.8% relative improvement. As an additional practical contribution, we report hyper-parameters, found by hyper-parameter search in controlled benchmarks, that can serve as a starting point for applications of the proposed method in real-world scenarios that lack annotations.
1 INTRODUCTION
A crucial finding of post-hoc explainable artificial intelligence (XAI) in computer vision is that, in standard convolutional image classifier architectures (Simonyan and Zisserman, 2015; Szegedy et al., 2015; He et al., 2016), deep representations of image patches corresponding to the same human-understandable concept (e.g. image patches of a cat face, a car window, or patches of a wall) lie on the same feature space direction (Szegedy et al., 2013; Kim et al., 2018; Zhou et al., 2018). Furthermore, quantitatively, this property becomes more pronounced towards the top layers of the networks (Alain and Bengio, 2016).
The conventional approach to discovering concept directions in the feature space of a pre-trained convolutional neural network (CNN) requires annotated data (Zhou et al., 2018; Kim et al., 2018). Each concept direction is defined by a concept vector which coincides with the normal of the hyperplane that separates representations of images (or image-patches) that depict the concept from representations of images (or image-patches) that depict other (negative) concepts. This approach is applicable in the same way whether the annotations are available at the image level (Kim et al., 2018) or at the pixel level (Zhou et al., 2018).
In (Zhou et al., 2018), the authors used the learned set of concept vectors to construct an interpretable feature space basis (here also referred to as a concept basis). A concept basis can provide answers to several questions regarding the CNN and its predictions (Doumanoglou et al., 2023). For instance, it can be used to explain the relationship between concepts and filters (Fong and Vedaldi, 2018), to provide local explanations by interpreting predictions of individual examples (Zhou et al., 2018), or to provide global explanations by quantifying the class sensitivity of the CNN with respect to a concept (Kim et al., 2018).
Apart from the conventional approaches (Kim et al., 2018; Zhou et al., 2018), a novel, unsupervised, post-hoc method was recently proposed that is able to derive and suggest a concept basis without the need for annotations (Doumanoglou et al., 2023). In that
case, the search for a concept basis is guided by the explanation that the basis provides for deep representations. In particular, it was shown that projecting a representation onto all the concept vectors of an interpretable basis and hard-thresholding leads to sparse binary representations; the search for the concept basis is guided by this sparsity criterion.
To suggest an interpretable basis, prior work (Doumanoglou et al., 2023) exploits structure in the feature population of the studied CNN. However, the suggested concept vectors are not explicitly linked to the prediction strategy of the studied network. Here, we argue that an explicit link between the concept vectors and the CNN classifier's output has three benefits. First, in case the network makes predictions based on human-interpretable features, the suggested basis' interpretability can be improved by this link. Second, in case the network uses a concept that is not known to humans, this link can reveal the use of this new concept, which can be understood by inspecting dataset samples that maximally activate the concept vector (Kim et al., 2018). And third, in case the network is cheating through spurious correlations in the data, this link can expose this malfunction. Overall, linking the basis search with the CNN classifier's output produces a basis that can provide explanations related to the prediction strategy of the model. Eventually, the interpretation that this basis provides can be used to encourage trust in the model or to debug it.
In this work, we make simple, yet impactful, three-fold extensions to (Doumanoglou et al., 2023). First, we empirically found that the hyper-parameters of the Inactive Classifier Loss (ICL) in (Doumanoglou et al., 2023) do not contribute much to the interpretability of the learned bases and thus, here, we propose a simplification that reduces their count. Second, and most importantly, we propose a loss term that guides the search of concept vectors towards the directions that have the strongest impact on the CNN's prediction outcomes, and we quantitatively justify that the bases learned with this term score better on the interpretability metrics. Third, we provide indicative hyper-parameter values for the proposed method, found by extensive hyper-parameter search. Those values were experimentally found to work best in our benchmarks (for which annotations are available for quantitative evaluation) and may serve as a future reference for applying the proposed method in real-world cases where annotations are absent. Finally, as a last contribution, inspired by (Zarlenga et al., 2023), we propose an additional basis interpretability metric which we also use for basis interpretability evaluation.
2 BACKGROUND
2.1 Problem Statement
Considering an intermediate layer of a pre-trained CNN, let $D \in \mathbb{N}^+$ denote the dimensionality of its feature space. Let $X \in \mathbb{R}^{H \times W \times D}$ denote the representation of an image in this layer, and $x_p \in \mathbb{R}^D$ a spatial element of this representation at location $p = (x, y)$, $x \in \{0, 1, \dots, W-1\}$, $y \in \{0, 1, \dots, H-1\}$. Previous work (Alain and Bengio, 2016; Zhou et al., 2018; Kim et al., 2018) has shown that, based on their semantic label, deep representations (either $X$ or $x_p$), especially for layers near the top, are, to a large extent, linearly separable. Based on this fact, (Zhou et al., 2018; Doumanoglou et al., 2023) were motivated to construct a feature space basis (referred to as an interpretable basis or concept basis) where each vector in the basis points towards the direction of the feature space where representations of a human-understandable concept lie. Once an interpretable basis is learned, it can be used to attribute human-understandable meaning to intermediate representations $x_p$, by projecting $x_p$ onto the interpretable basis. In both approaches (Zhou et al., 2018; Doumanoglou et al., 2023), the basis is extracted in a post-hoc manner by analyzing feature populations, and regards a specific layer of the pre-trained CNN image classifier, while the weights of the CNN classifier itself are kept frozen. In this work, we contribute solution improvements to the problem of unsupervised concept basis extraction, by suggesting simple and elegant, yet impactful, adjustments to (Doumanoglou et al., 2023).
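As a concrete illustration of this setting, the sketch below shows one way the feature population $x_p$ could be collected from an intermediate layer of a pre-trained CNN. It assumes PyTorch and torchvision, an ImageNet-pretrained ResNet18 and the choice of `layer4` as the studied layer; none of these choices are prescribed by the paper.

```python
# Minimal sketch: collect the spatial features x_p of an intermediate layer via a hook.
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
features = {}

def hook(_module, _inputs, output):
    # output: (B, D, H, W) activation map of the studied layer
    features["X"] = output.detach()

model.layer4.register_forward_hook(hook)   # hypothetical choice of studied layer

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))     # stand-in for a real image batch

X = features["X"]                           # (B, D, H, W)
B, D, H, W = X.shape
# Flatten the spatial grid: each row is one x_p in R^D, at location p = (x, y)
x_p = X.permute(0, 2, 3, 1).reshape(-1, D)
print(x_p.shape)                            # (B*H*W, D)
```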
2.2 Supervised Approach to Concept Basis Extraction
Let $w_i \in \mathbb{R}^D$, $i \in \mathcal{I}$, $\mathcal{I} = \{0, 1, \dots, I-1\}$ denote a set of $I \leq D$, $I \in \mathbb{N}^+$ concept vectors that form a concept basis for the previously mentioned feature space. In the standard supervised approach to concept basis extraction (Zhou et al., 2018), each concept vector $w_i$ is learned by training a linear classifier $\{w_i, b_i\}$, $b_i \in \mathbb{R}$ (also referred to as a concept detector (Bau et al., 2017; Doumanoglou et al., 2023)) with the objective to separate representations of examples depicting a concept (e.g. representations of patches depicting "car") from representations of examples depicting other (negative) concepts (e.g. representations of patches depicting "wall", "person", "sky", etc.). For instance, after successfully training a concept detector for the concept "car", the vector $w_i$, which coincides with the separating hyperplane's normal, points towards the feature space direction where the representations of the concept ("car") lie. Thus, projecting an arbitrary $x_p$ on $w_i$ measures how much "car" is contained in the representation, while the bias $b_i$ delimits point zero, i.e. the minimum projected quantity that is required in order to attribute the label "car" to $x_p$.

Figure 1: From left to right: a) An interpretable direction $w$ points towards the representations of a concept in the CNN's feature space. The position of the separating hyperplane delimits the subspace where those representations lie. b) UIBE finds a rotation of the feature space $\{w_i\}$ along with the positions $\{b_i\}$ of the separating hyperplanes, such that, under the new rotated basis, each feature representation is classified positively by just one (or a few) of the classifiers $\{w_i, b_i\}$. c) and d) In this work we complement UIBE with an additional loss term that enforces the new basis to point towards directions that maximally influence the CNN's predictions. An intermediate representation is manipulated towards the negative direction of the basis vector that classifies it positively, by an amount sufficient to surpass the position of the classifier's separating hyperplane, and we require that under this manipulation the prediction of the CNN becomes maximally uncertain.
For the supervised approach to work, the availability of a concept dataset containing dense per-pixel annotations is required. Correspondence between the pixel-level annotations, which are provided in the image input space, and the deep image representations at the various spatial locations $(x, y)$ is established via calculations related to the receptive field of each spatial feature $x_p$.
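For illustration, a minimal sketch of how a single supervised concept detector $\{w_i, b_i\}$ could be fitted on pre-extracted features is given below. The function name, optimizer and training settings are illustrative assumptions, not the exact procedure of (Zhou et al., 2018).

```python
# Minimal sketch: fit one concept detector {w_i, b_i} on positive / negative features.
import torch

def train_concept_detector(pos_feats, neg_feats, epochs=100, lr=1e-2):
    """pos_feats, neg_feats: (N, D) feature vectors x_p with / without the concept."""
    D = pos_feats.shape[1]
    w = torch.zeros(D, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    feats = torch.cat([pos_feats, neg_feats])
    labels = torch.cat([torch.ones(len(pos_feats)), torch.zeros(len(neg_feats))])
    for _ in range(epochs):
        logits = feats @ w - b                 # signed distance to the hyperplane
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
    # w (up to its norm) is the concept direction; b delimits "point zero"
    return w.detach(), b.detach()
```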
2.3 Unsupervised Approach to Concept Basis Extraction
Unsupervised Interpretable Basis Extraction (UIBE) (Doumanoglou et al., 2023) has been proposed as a novel method to suggest a concept basis without the need for expensive annotations. At its core, UIBE utilizes the same concept detector model (binary classifiers $\{w_i, b_i\}$) as the supervised approach, with the additional constraint that the vectors $w_i$ form an orthonormal basis (for the reasons behind this, the reader is advised to refer to (Doumanoglou et al., 2023)). For each concept detector in the basis, the sigmoid classifier rule $y_{p,i} = \sigma(s_i (w_i^T x_p - b_i)) \in (0, 1)$, $s_i \in \mathbb{R}^+$, $\|w_i\|_2 = 1$, classifies the representation $x_p$ positively whenever $y_{p,i} > 0.5$.
UIBE learns a concept basis for a CNN's intermediate layer by analyzing the feature population $x_p$, $\forall p$, in a concept dataset. During basis learning, and due to the lack of annotations, the name of the concept that each classifier can detect is unknown. This name is determined later, using a procedure to label the basis, as described in Section 2.4. Since annotations are absent, training individual concept detectors is impossible. Instead, UIBE takes a holistic approach, considering the set of concept detectors in its entirety. The basis that UIBE suggests satisfies a sparsity property that an interpretable basis meets (Doumanoglou et al., 2023). In particular, in UIBE, basis search is driven by the observation that the semantic label(s) of each image-patch is (are) just one (or just a few) out of a plethora of other possible semantic labels. Under the assumption that the pre-trained studied CNN has managed to disentangle the different semantic concepts, this implies that for the representation $x_p$ of each image-patch, only a fraction of the concept detectors (binary classifiers) associated with the concept basis would classify $x_p$ positively for the presence of the basis' concepts. This implies sparsity on the L1-normalized vectors $y_p = [y_{p,0}, y_{p,1}, \dots, y_{p,I-1}]$. For a graphical illustration please refer to Figure 1.
To find a concept basis, UIBE utilizes four kinds of losses. The first one is the Sparsity Loss (SL) which, for the reasons described above, dictates entropy minimization on the L1-normalized $y_{p,i}$, $\forall i$. The second one is the Maximum Activation Loss (MAL), which enforces the most dominant positive classification $j$ ($j : y_{p,j} > y_{p,i}\ \forall i \neq j$) to be the most confident prediction in an absolute sense, i.e. $y_{p,j} \to 1.0$. SL together with MAL guide basis search towards directions where each $x_p$ is attributed only a small fraction of concept labels, compared to the plethora of concept labels that the basis vectors are associated with. The third one is the Inactive Classifier Loss (ICL), which ensures that each one of the concept detectors in the basis is actually meaningful, by classifying positively a minimum amount of $x_p$ in the concept dataset. Finally, the Maximum Margin Loss (MML) minimizes $s_i$, and thus imposes the largest possible margin around the separating concept detector's hyperplane. In the rest of the paper we say that a pixel $p$ is explained by the basis when $y_{p,i} > 0.5$ for at least one $i$. We also say that a pixel $p$ is not explained by the basis when $y_{p,i} < 0.5\ \forall i$, i.e. none of the concept detectors in the basis classifies pixel $p$ positively, as a sample of a concept. Finally, we only consider $I = D$ (Sections 2.1, 2.2), leaving study and experiments for other values of $I$ for future work.
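The sketch below illustrates the classifier rule $y_{p,i}$ and the entropy-based Sparsity Loss described above, assuming a PyTorch implementation. The remaining UIBE terms (MAL, ICL, MML), the orthonormality constraint on the basis and the loss weights are omitted here.

```python
# Minimal sketch: UIBE detector activations and the Sparsity Loss (SL).
import torch

def concept_activations(x_p, W, b, s):
    """x_p: (N, D) spatial features, W: (D, I) concept vectors (assumed orthonormal),
    b, s: (I,) per-detector bias and scale. Returns y_p in (0, 1)^(N x I)."""
    return torch.sigmoid(s * (x_p @ W - b))

def sparsity_loss(y_p, eps=1e-8):
    """Entropy of the L1-normalized activation vectors: low when each x_p is
    classified positively by only one (or a few) of the I concept detectors."""
    p = y_p / (y_p.sum(dim=1, keepdim=True) + eps)
    return -(p * torch.log(p + eps)).sum(dim=1).mean()
```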
2.4 Basis Labeling
The term basis labeling refers to the process of at-
tributing a concept label name to each one of the vec-
tors in a concept basis. This process is applicable only
to the bases learned in an unsupervised way, since
for the bases that were learned in a supervised man-
ner, the label of each concept vector is known even
before training the respective concept-detector. In
cases where annotations are available, the basis label-
ing process can be accomplished in a systematic way.
Basis labeling methods (Bau et al., 2017; Mu and An-
dreas, 2020) can be used in tandem with an annotated
concept dataset, such as Broden (Bau et al., 2017) and
Broden Action (Ramakrishnan et al., 2019).
In this work we use (Bau et al., 2017) to label the bases extracted with the proposed method. In particular, let $\phi_i(c, \mathcal{K}) \in [0, 1]$ denote a metric score function that is used to measure the suitability of the classifier $i$ ($\{w_i, b_i\}$) to accurately detect concept $c$ in the annotated concept dataset $\mathcal{K}$. The basis vector $i$ is assigned the concept label $c_i^*$, which is the label of the concept dataset that maximizes $\phi_i(c, \mathcal{K}_{train})$, among all $c$, in the training split $\mathcal{K}_{train}$ of the concept dataset. For the choice of $\phi$ we use Intersection over Union (IoU), as originally proposed in (Bau et al., 2017) and also used in (Fong and Vedaldi, 2018; Mu and Andreas, 2020; Doumanoglou et al., 2023).
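A minimal sketch of this labeling step is shown below. It assumes that detector activations and concept annotations have already been aligned on the same grid (the receptive-field computation is not shown), and all names are illustrative.

```python
# Minimal sketch: assign the label c_i* that maximizes the IoU score of detector i.
import numpy as np

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def label_detector(detector_mask, concept_masks):
    """detector_mask: (N,) bool, positive classifications of one detector {w_i, b_i};
    concept_masks: dict mapping concept name -> (N,) bool annotation mask.
    Returns the assigned label c_i* and its score phi_i(c_i*, K_train)."""
    scores = {c: iou(detector_mask, mask) for c, mask in concept_masks.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```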
2.5 Basis Interpretability Metrics
To measure the interpretability of a basis, we actu-
ally need to measure how well each concept-detector
performs in classifying positively the concept’s repre-
sentations, and negatively, the representations of con-
cept counter-examples (Bau et al., 2017; Zhou et al.,
2018). Thus, an annotated concept dataset is re-
quired.
In (Doumanoglou et al., 2023) two basis interpretability metrics were proposed, based on ideas from other previous approaches (Bau et al., 2017; Losch et al., 2021). In this work, we use those exact same metrics, which are briefly described next. Using the validation split of the concept dataset $\mathcal{K}_{val}$, each classifier $\{w_i, b_i\}$ is assigned a validation score $\phi_i(c_i^*, \mathcal{K}_{val})$, with $c_i^*$ denoting the concept label assigned to the classifier during the previously mentioned basis labeling procedure. Then the two interpretability scores $S_1$ and $S_2$ are defined as:

$$S_1 = \int_0^1 \sum_{i=0}^{I-1} \mathbb{1}\left[ \phi_i(c_i^*, \mathcal{K}_{val}) \geq \xi \right] d\xi \quad (1)$$

$$S_2 = \int_0^1 \psi(\xi)\, d\xi \quad (2)$$

with $\mathbb{1}[\cdot]$ denoting the indicator function. The first metric, $S_1$, counts the number of concept detectors in the basis with a validation score better than a threshold $\xi$. In the second metric, $S_2$, $\psi(\xi)$ is defined as $\psi(\xi) = |\{c_i^* \mid \exists i : \phi_i(c_i^*, \mathcal{K}_{val}) \geq \xi\}|$, i.e. the number of unique concepts that can be detected with performance better than $\xi$ (Bau et al., 2017). In both cases, we make the scores threshold-agnostic by integrating across all $\xi \in [0, 1]$, as proposed in (Losch et al., 2021; Doumanoglou et al., 2023).
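For illustration, the two scores could be approximated as in the following sketch, where the integrals over $\xi$ are replaced by averages over a uniform grid of thresholds; variable names are illustrative.

```python
# Minimal sketch: approximate S1 (eq. 1) and S2 (eq. 2) from per-detector scores.
import numpy as np

def s1_s2(val_scores, labels, num_thresholds=100):
    """val_scores: phi_i(c_i*, K_val) per detector; labels: assigned concept names."""
    xis = np.linspace(0.0, 1.0, num_thresholds)
    val_scores = np.asarray(val_scores)
    labels = np.asarray(labels)
    s1 = np.mean([(val_scores >= xi).sum() for xi in xis])            # eq. (1)
    s2 = np.mean([len(set(labels[val_scores >= xi])) for xi in xis])  # eq. (2)
    return s1, s2
```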
2.6 Other Related Work
The proposed approach belongs in the same area as
other concept-based, post-hoc explainability methods
that do not require annotations for the discovery of
concepts encoded by a network. Although our ap-
proach builds upon a certain line of research (Szegedy
et al., 2013; Alain and Bengio, 2016; Bau et al., 2017;
Zhou et al., 2018; Kim et al., 2018; Doumanoglou
et al., 2023), which studies concepts as directions in
a CNN classifier’s feature space, there are other, re-
cent, approaches that study concepts from different
viewpoints. For instance, in (Zhang et al., 2021), a
concept vector points to the concept cluster’s center
and (Vielhaben et al., 2022) uses spectral clustering
to find meaningful subspaces in the CNN’s feature
space. Other than that, (Achtibat et al., 2023) extends
(Bach et al., 2015), by effectively turning a pixel at-
tribution method, to a concept discovery method. Fi-
nally, (Chormai et al., 2022) discovers disentangled
concept subspaces by taking into account the rele-
vance of neurons to the CNN’s prediction outcomes.
Our work likewise aims at finding concepts that are relevant to the CNN's prediction outcome. In that sense, it shares common ground with (Achtibat et al., 2023) and (Chormai et al., 2022), although it approaches concept discovery in a different way.
3 PROPOSED METHOD
3.1 Inactive Classifier Loss (ICL) Simplification
In UIBE, the Inactive Classifier Loss (ICL) was proposed as a loss term that would enforce the concept detectors in the basis to classify positively (Section 2.3) a non-trivial amount of $x_p$. In ICL's absence, some of the concept detectors in the basis might point towards directions where none of the $x_p$ is classified positively, introducing redundancy and a less useful basis suggestion, since SL can be fulfilled more easily when $y_{p,i} \to 0$ for most $i$.

Table 1: a) Hyper-parameter names, bounds, step and initial value that were used in hyper-parameter tuning. The table also provides the best values (suggestions) returned by the optimizer. $\lambda_s$, $\lambda_{ma}$, $\lambda_{ic}$, $\lambda_{mm}$ and $\lambda_{cc}$ are the loss weight parameters for the Sparsity Loss, Max Activation Loss, Inactive Classifier Loss, Maximum Margin Loss and CNN Classifier Loss. $\tau$ refers to the hyper-parameter of equation 3. b) Comparing interpretabilities of the natural feature space basis, the basis extracted with UIBE and the basis extracted with the proposed method for the latent space of ResNet50. UIBE's scores are set as the reference scores against which relative percentage scores are calculated. The loss term introduced in this work leads to learning a basis that is significantly more interpretable than the reference, across all metrics, and especially in terms of $S_2$.

(a)
                         λ_s     λ_ma    λ_ic    λ_mm    τ       λ_cc
Lower                    1.0     1.0     1.0     0.1     0.5     0.05
Upper                    10.0    10.0    10.0    10.0    1.0     0.5
Step                     0.5     0.5     0.5     0.5     0.1     0.01
Init                     2.0     5.0     5.0     0.5     1.0     0.2
ResNet18 / Places365
Best                     2.6     2.8     4.8     0.6     0.9     0.25
VGG16BN / ImageNet
Best                     3.7     6.7     5.0     1.0     0.95    0.05

(b)
ResNet50 / MiT
Basis       S_1 (↑)            S_2 (↑)           S_3 (↓)
Natural     90.82 (-27.1%)     16.24 (-12.0%)    0.023 (+0.44%)
UIBE        124.73 (+0.0%)     18.47 (+0.0%)     0.0229 (+0.0%)
Proposed    131.73 (+5.61%)    26.94 (+45.8%)    0.0225 (-1.75%)
Experiments with UIBE have shown that the quality of the extracted bases is mostly insensitive to the intuitive choice of hyper-parameters for ICL. Thus, in this work, we opt to reduce the complexity of the approach and simplify ICL, by removing the notion of partitions. Let $\tau \in (0, 1]$ denote a percentage over the number of pixels $p$ in the concept dataset, implicitly controlling the minimum amount of pixels to be explained by the basis (Section 2.3). Let also $\nu_i = \tau / I$ and let $\gamma > 1$, $\gamma \in \mathbb{R}^+$ be a sharpening factor.
For the inactive classifier loss we use the following equation:

$$\mathcal{L}_{ic} = \mathbb{E}_{i \in \mathcal{I}}\left[ \frac{1}{\nu_i} \mathrm{ReLU}\left( \nu_i - \mathbb{E}_p\left[ y_{p,i}^{\gamma} \right] \right) \right] \quad (3)$$
which is the same formula as in (Doumanoglou et al., 2023) but with a different definition for $\nu_i$, eliminating the hyper-parameters $\alpha_\mu$, $\omega_\mu$ and the intermediate variables $n_\mu$. Essentially, in the marginal case where $y_{p,i}$ is close to zero or one (as urged by the combination of SL and MAL), $\mathcal{L}_{ic}$ equals zero when each one of the concept detectors classifies positively at least a fraction $\nu_i$ of the pixels $x_p$, and equals one in the case where none of the concept detectors classifies positively any of them.
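A minimal sketch of eq. (3) is shown below. The values of $\tau$ and $\gamma$ used here are placeholders: the tuned $\tau$ values appear in Table 1, while $\gamma$ is not reported there.

```python
# Minimal sketch: simplified Inactive Classifier Loss of eq. (3).
import torch

def inactive_classifier_loss(y_p, tau=0.9, gamma=4.0):
    """y_p: (N, I) detector activations over all pixels; tau, gamma as in Section 3.1."""
    I = y_p.shape[1]
    nu = tau / I                                     # nu_i = tau / I for every detector
    fraction = (y_p ** gamma).mean(dim=0)            # E_p[y_{p,i}^gamma] per detector
    return (torch.relu(nu - fraction) / nu).mean()   # expectation over i in I
```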
3.2 CNN Classifier Loss (CCL)
The concept vectors in a basis derived with
(Doumanoglou et al., 2023) are not necessarily related
to the CNN’s predictions. However, as mentioned in
Section 1, this work proposes that such a link could
improve the suggested basis’ interpretability, reveal
new concepts exploited by the network, or aid model
debugging. Here we focus on the first benefit, while
leaving the study of the other two for future work.
The motivation of our second contribution is
based on the fact that, in the absence of annotations,
we may try to exploit the knowledge of concepts that
are encoded into the network, to aid the discovery
of interpretable directions. Our hypothesis is that,
in many cases, interpretable directions might maxi-
mally influence the classifier’s predictions. For in-
stance, the classification of an image as the scene
“park” might be influenced by the presence of con-
cepts “person”,“tree” and “bench”. Thus, the latent
representation of an image depicting a “park” could
have strong components across the directions of those
interpretable concepts. In that case, and as a conse-
quence, we would expect that if we manipulate the
representation of the “park” image by attenuating its
components across the previously mentioned concept
directions, the classifier would find it hard to classify
the image and thus its prediction would become highly uncertain.
More formally, we introduce the CNN Classifier Loss (CCL) as a complement to the losses of UIBE. The principal idea behind CCL is to manipulate each of the representations $x_p$ in such a way that the resulting representation $x_p'$ is not explained by the basis (Section 2.3), i.e. $y'_{p,i} < 0.5\ \forall i$, and to require that the CNN's prediction for the manipulated image representation $X'$ becomes maximally uncertain. Let $d_{p,i} = w_i^T x_p - b_i$ denote the signed distance of $x_p$ to the $i$-th concept's separating hyperplane. On the one hand, if $d_{p,i}$ is positive, this means that $x_p$ is a sample of concept $i$. Thus, according to our principal idea, we need to attenuate the presence of this concept in $x_p$ by manipulating the representation in the direction $-w_i$ and for a distance equal to $d_{p,i}$. On the other hand, if $d_{p,i}$ is negative, this implies that the concept $i$ is not present in $x_p$ and we do not need to perform any
manipulation to remove it. Let also $\alpha_{p,i} = \sigma(s' d_{p,i})$ be a coefficient in $(0, 1)$ that approaches 1 whenever we need to perform a manipulation in $x_p$ for concept $i$. The hyper-parameter $s' \in \mathbb{R}^+$ is considered fixed and its meaning is similar to the $s_i$ defined in Section 2. The manipulation $x_p'$ is given by the following formula:

$$x_p' = x_p - W v \quad (4)$$

with the columns of $W$ being equal to the concept vectors of the basis and $v \in \mathbb{R}^D$ a vector with elements $v_i = \alpha_{p,i} d_{p,i}$. Let the function $f^+$ denote the part of the CNN after the layer of study. If $f^+(X') \in (0, 1)^K$ denotes the final vector of probabilities of the CNN, the CNN Classifier Loss is defined as:

$$\mathcal{L}_{cc} = -\,\mathbb{E}_{X'}\left[ \mathcal{H}\left( f^{+}(X') \right) \right] \quad (5)$$

with $\mathcal{H}$ denoting the entropy of the CNN's prediction vector for the $K \in \mathbb{N}^+$ classes; the negative sign makes minimizing $\mathcal{L}_{cc}$ equivalent to maximizing the prediction uncertainty. An illustration of the procedure is provided in Figure 1.
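A minimal sketch of eqs. (4) and (5) is given below, assuming a PyTorch implementation in which `f_plus` stands for the (frozen) part of the CNN after the studied layer and returns class probabilities; the negative sign reflects the reading that minimizing $\mathcal{L}_{cc}$ should maximize prediction entropy.

```python
# Minimal sketch: CNN Classifier Loss (CCL) of Section 3.2.
import torch

def ccl(X, W, b, f_plus, s_prime=5.0, eps=1e-8):
    """X: (B, D, H, W) activation map; W: (D, I) basis vectors (columns); b: (I,)."""
    B, D, H, Wd = X.shape
    x_p = X.permute(0, 2, 3, 1).reshape(-1, D)      # (N, D), N = B*H*W
    d = x_p @ W - b                                  # signed distances d_{p,i}
    alpha = torch.sigmoid(s_prime * d)               # ~1 where a manipulation is needed
    v = alpha * d                                    # v_i = alpha_{p,i} * d_{p,i}
    x_manip = x_p - v @ W.T                          # eq. (4): x_p' = x_p - W v
    X_manip = x_manip.reshape(B, H, Wd, D).permute(0, 3, 1, 2)
    probs = f_plus(X_manip)                          # (B, K) class probabilities
    entropy = -(probs * torch.log(probs + eps)).sum(dim=1)
    return -entropy.mean()                           # eq. (5): reward maximal uncertainty
```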
3.3 Basis Impurity Score
Inspired by the Oracle Impurity Score (OIS) that was introduced in (Zarlenga et al., 2023), and to save computational time, this work proposes a third basis interpretability metric that complements the two metrics described in Section 2.5. This metric aims to capture how exclusive the suitability of each classifier $\{w_i, b_i\}$ in the basis is in detecting the concept associated with its assigned label. More formally, let $A \in [0, 1]^{I \times |\mathcal{C}|}$ be a matrix with elements $a_{i,c} = \phi_i(c, \mathcal{K}_{val})$, with $c \in \mathcal{C}$ and $\mathcal{C}$ denoting the set of concept labels in the annotated concept dataset. Let $P \in \{0, 1\}^{I \times |\mathcal{C}|}$ be the matrix with elements $p_{i,c} = 1$ if $c = c_i^*$ and 0 otherwise. The Basis Impurity Score is defined as:

$$S_3 = \frac{1}{\sqrt{I\,|\mathcal{C}|}}\, \| P - A \|_F \quad (6)$$

In eq. (6), $|\mathcal{C}|$ denotes the number of concepts in $\mathcal{C}$ and $\| \cdot \|_F$ the Frobenius norm of a matrix. A value of zero for $S_3$ implies that the concept detectors associated with the basis do not share suitability in detecting concepts other than the concept of their assigned label, while values closer to 1 imply less exclusivity in this suitability.
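A minimal sketch of eq. (6) follows; it assumes the score matrix $A$ and the assigned label indices are already available, and all names are illustrative.

```python
# Minimal sketch: Basis Impurity Score S3 of eq. (6).
import numpy as np

def basis_impurity_score(A, assigned):
    """A: (I, |C|) matrix with A[i, c] = phi_i(c, K_val); assigned: (I,) array with
    the column index of the label c_i* assigned to each detector."""
    I, C = A.shape
    P = np.zeros_like(A)
    P[np.arange(I), assigned] = 1.0        # one-hot "pure" target matrix
    return np.linalg.norm(P - A, ord="fro") / np.sqrt(I * C)
```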
4 EXPERIMENTAL RESULTS
4.1 Experimental Setup
Overall Evaluation Approach. We choose to eval-
uate the bases extracted with the proposed method
based on the three basis interpretability metrics that
were described in Sections 2.5 and 3. For the eval-
uation, we derive bases for the last convolutional
layers (after ReLU) of three different CNN mod-
els: ResNet18 (He et al., 2016) trained on Places365
(Zhou et al., 2017), VGG16BN (Simonyan and Zis-
serman, 2015) (VGG16 with batch normalization lay-
ers) trained on ImageNet (Deng et al., 2009) and
ResNet50 (He et al., 2016) trained on Moments in
Time (MiT) (Monfort et al., 2019). Overall, we
follow the evaluation approach that was described
in (Doumanoglou et al., 2023). First we learn the
bases using the training split of the respective con-
cept dataset. Subsequently, we use (Bau et al., 2017)
to label the bases using the training split of the same
dataset and finally we use its validation split to com-
pute concept detector scores and the basis evaluation
metrics. For the networks trained on Places365 and
ImageNet we used the Broden (Bau et al., 2017) con-
cept dataset, while for the network trained on MiT we
used Broden Action (Ramakrishnan et al., 2019).
Parameter Initialization. In all of our experiments
we initialize the basis vectors with the vectors of
the natural feature space basis (Doumanoglou et al.,
2023). We also deviate from (Doumanoglou et al., 2023) regarding the parameterization of the margin and the bias. In this work, we use an exponential parameterization ($e^t$) for both the margin $M = 1/s$ and the bias $b$ (in the standardized space). Both $M$ and $b$ are initialized with $t = \log(0.5)$. We experimentally found that the exponential parameterization stabilizes learning.
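As an illustration of this parameterization, the sketch below keeps the free parameter $t$ and recovers the positive quantity as $e^t$; it is a reading of the description above, not the authors' code.

```python
# Minimal sketch: exponential parameterization of the margin M = 1/s and the bias b.
import math
import torch

t_margin = torch.full((1,), math.log(0.5), requires_grad=True)  # so that M starts at 0.5
t_bias = torch.full((1,), math.log(0.5), requires_grad=True)    # so that b starts at 0.5

def margin_and_bias():
    M = torch.exp(t_margin)   # margin, kept positive by construction
    b = torch.exp(t_bias)     # bias in the standardized space
    return M, b
```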
Hyper-Parameters. The efficacy of the proposed method, like UIBE's, relies on minimizing the individual loss terms with the right balance. The proposed method linearly combines five individual loss terms (SL, MAL, ICL, MML, CCL) with coefficients $\lambda$. In order to evaluate the potential of the proposed approach, we perform two hyper-parameter tuning experiments, exploiting the availability of annotations in the concept datasets. For these experiments we take 70% of Broden's training set and further split it into 70%-30% train/validation splits. For the validation metric we use $S_2$. The hyper-parameter tuning experiments regard bases learned for ResNet18 and VGG16BN. We do not perform any hyper-parameter tuning regarding bases learned for ResNet50. For hyper-parameter optimization we use the Nevergrad (Rapin and Teytaud, 2018) optimization platform. We perform tuning in steps. In the first step, we use the Two Points Differential Evolution (TwoPointsDE) algorithm with a budget of 100, to tune all parameters except $\lambda_{cc}$, which is set to zero (this is the same as tuning UIBE, with the modification of the hereby proposed ICL loss). Subsequently,
Table 2: Comparing interpretabilities of the natural feature space bases and the bases extracted with either UIBE or the proposed method. We set the metric scores extracted with UIBE as the references against which relative percentage scores are calculated. According to all the metrics, the loss term introduced in this work consistently leads to basis extraction of improved interpretability compared to UIBE, both for bases extracted for ResNet18 trained on Places365 as well as for bases extracted for VGG16BN trained on ImageNet.

                 ResNet18 / Places365                                  VGG16BN / ImageNet
Basis       S_1 (↑)          S_2 (↑)          S_3 (↓)           S_1 (↑)          S_2 (↑)          S_3 (↓)
Natural     35.52 (-41.7%)   18.26 (-35.6%)   0.0272 (+5.0%)    34.72 (-27.0%)   11.86 (+1.4%)    0.0271 (+2.2%)
UIBE        60.93 (+0.0%)    28.39 (+0.0%)    0.0259 (+0.0%)    47.58 (+0.0%)    11.69 (+0.0%)    0.0265 (+0.0%)
Proposed    69.43 (+13.9%)   31.53 (+11.6%)   0.0256 (-1.1%)    48.96 (+2.9%)    12.11 (+2.9%)    0.0264 (-0.3%)
Table 3: Comparing Network Dissection results for all bases of ResNet18 and ResNet50. Two numbers are reported for each concept category: the first is the number of concept detectors that can detect a concept from the category, and the second is the number of (unique) concepts from the category that can be detected by the detectors.
ResNet 18 / Places365
Basis Object Part Scene Material Texture Color Action
Natural 116 / 45 10 / 7 261 / 121 2 / 2 50 / 26 0 / 0 -
UIBE 248 / 53 21 / 10 117 / 110 8 / 6 28 / 21 1 / 1 -
Proposed 174 / 46 11 / 8 258 / 135 7 / 6 33 / 25 1 / 1 -
ResNet50 / MiT
Natural 295 / 35 20 / 5 126 / 43 1 / 1 357 / 27 0 / 0 336 / 86
UIBE 289 / 39 49 / 4 332 / 42 8 / 3 665 / 39 99 / 3 140 / 73
Proposed 403 / 44 26 / 3 104 / 52 5 / 5 37 / 25 1 / 1 1082 / 120
we kept the rest of the hyper-parameters frozen from step one and did an initial search for $\lambda_{cc}$ with the same optimizer and a budget of 60, using a broad range of values in 0.5-5.0 (we did this step only once, for ResNet18 and not for VGG16BN, to get an indication of $\lambda_{cc}$'s magnitude). Finally, we used the OnePlusOne optimizer in order to further fine-tune $\lambda_{cc}$ in the range 0.05-0.5 with a budget of 20 trials, for both ResNet18 and VGG16BN. The exact details and the best hyper-parameter values are given in Table 1. In all experiments we set $s' = 5.0$.
Basis Learning Details. We use the Adam optimizer
to learn the concept bases, and the learning lasts for
a maximum of 300 epochs. We use a learning rate
scheduler to reduce the learning rate by a factor of 0.1
when the loss term does not improve for 10 epochs
and enable early stopping with patience of 15 epochs
and absolute minimum delta of 0.01. For the batch
size, we use the maximum number allowed by the
available GPU memory. We use the Broden (Bau
et al., 2017) and Broden Action (Ramakrishnan et al.,
2019) concept datasets to collect CNN feature popu-
lations and extract the concept bases. We use the ICL
and CCL losses proposed in this work, in addition to
the other losses from (Doumanoglou et al., 2023). For
basis learning, we linearly combine the various loss
terms based on the values suggested by the hyper-
parameter search and learn the bases using the whole
training split of the respective concept dataset (as op-
posed to the 70% of the training split that we used for
hyper-parameter search). For ResNet50, we used the
hyper-parameters suggested by the hyper-parameter
search for ResNet18.
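The following sketch mirrors the training configuration described above (Adam, plateau scheduler with factor 0.1 and patience 10, early stopping with patience 15 and minimum delta 0.01); the learning rate, parameter shapes and `total_loss` are placeholders since they are not fully specified here.

```python
# Minimal sketch: basis-learning loop configuration of Section 4.1.
import torch

# Placeholder basis parameters; in practice these are the concept vectors, biases
# and scales of Sections 2.3 and 3, with shapes depending on the studied layer.
basis_params = [torch.nn.Parameter(torch.eye(512))]

def total_loss():
    # Placeholder: in practice the weighted sum of SL, MAL, ICL, MML and CCL
    # with the lambda coefficients of Table 1.
    return (basis_params[0] ** 2).mean()

optimizer = torch.optim.Adam(basis_params, lr=1e-3)   # learning rate not reported here
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10)

best, patience, wait = float("inf"), 15, 0
for epoch in range(300):                               # maximum of 300 epochs
    loss = total_loss()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    scheduler.step(loss.item())
    if loss.item() < best - 0.01:                      # absolute minimum delta of 0.01
        best, wait = loss.item(), 0
    else:
        wait += 1
        if wait >= patience:                           # early stopping
            break
```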
4.2 Evaluation Results
In evaluation, we compare against the natural feature space basis and the basis extracted with UIBE. Comparison with the supervised approach (Zhou et al., 2018) yields similar findings as the ones reported in (Doumanoglou et al., 2023) and is omitted. In these experiments we use the full set of labels available in the concept datasets (i.e. labels from all categories: object / part / scene / material / texture / color and, when applicable, action). For the natural feature space basis we use $w_i = e_i$ and compute $b_i$ according to (Bau et al., 2017). Tables 1 and 2 depict basis interpretability results for all cases. For all three networks, the bases suggested by the proposed method score higher than UIBE across all the interpretability metrics. The most prominent case regards ResNet50 and $S_2$, for which the proposed method suggests a basis that is more interpretable than UIBE by a relative factor of +45.8%.
Regarding ResNet18 and ResNet50, we find it interesting to share statistics that are reported by Network Dissection (Bau et al., 2017) for all the concept detectors in the bases participating in the benchmark. Those statistics were obtained using a fixed value of $\xi = 0.04$, which is the value proposed in the respective paper. Two numbers are provided per concept category in Table 3: the first is the number of concept detectors that can detect a concept from the category, and the second is the number of (unique) concepts from the category that can be detected by the detectors. We find it noteworthy that for ResNet18, the basis suggested by the proposed method has more than twice as many concept detectors in the scene category than the basis suggested by UIBE. Additionally, its concept detectors are able to detect 25 more scene concepts than the detectors of UIBE. For a network that is trained to do scene classification, this suggestion looks highly plausible. An even more prominent result regards ResNet50. Compared to UIBE, the proposed method suggests a basis with 10 times more concept detectors for concepts in the action category. Additionally, the respective basis' concept detectors can detect 47 more action concepts than the detectors of UIBE. This suggestion aligns better with the goal of a network that is trained to perform action recognition.
5 CONCLUSION
In this work we proposed to complement previous work (UIBE) with a novel loss term that exploits the knowledge encoded in CNN image classifiers and suggests more interpretable bases. The proposed method demonstrates up to +45.8% relative interpretability improvement in the extracted bases, even when using hyper-parameters that were tuned for learning a basis of a different classifier trained on another task. Future work may study applications of the proposed method to debug and improve model performance.
ACKNOWLEDGEMENT
This work has been supported by the EC funded Hori-
zon Europe Framework Programme: CAVAA Grant
Agreement 101071178.
REFERENCES
Achtibat, R., Dreyer, M., Eisenbraun, I., Bosse, S., Wie-
gand, T., Samek, W., and Lapuschkin, S. (2023). From
attribution maps to human-understandable explana-
tions through concept relevance propagation. Nature
Machine Intelligence, 5(9).
Alain, G. and Bengio, Y. (2016). Understanding
intermediate layers using linear classifier probes.
arXiv:1610.01644.
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7).
Bau, D., Zhou, B., Khosla, A., Oliva, A., and Torralba,
A. (2017). Network dissection: Quantifying inter-
pretability of deep visual representations. In CVPR.
Chormai, P., Herrmann, J., Müller, K.-R., and Montavon, G. (2022). Disentangled explanations of neural network predictions by finding relevant subspaces.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In CVPR. IEEE.
Doumanoglou, A., Asteriadis, S., and Zarpalas, D. (2023). Unsupervised interpretable basis extraction for concept-based visual explanations. Accepted at IEEE TAI, currently available as arXiv preprint arXiv:2303.10523.
Fong, R. and Vedaldi, A. (2018). Net2vec: Quantifying and
explaining how concepts are encoded by filters in deep
neural networks. In CVPR.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In CVPR.
Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J.,
Viegas, F., et al. (2018). Interpretability beyond fea-
ture attribution: Quantitative testing with concept ac-
tivation vectors (TCAV). In ICML. PMLR.
Losch, M., Fritz, M., and Schiele, B. (2021). Semantic bot-
tlenecks: Quantifying and improving inspectability of
deep representations. IJCV, 129(11).
Monfort, M., Andonian, A., Zhou, B., Ramakrishnan, K.,
Bargal, S. A., Yan, T., Brown, L., Fan, Q., Gutfre-
und, D., Vondrick, C., et al. (2019). Moments in time
dataset: one million videos for event understanding.
TPAMI, 42(2).
Mu, J. and Andreas, J. (2020). Compositional explanations
of neurons. In NIPS, volume 33. Curran Associates,
Inc.
Ramakrishnan, K., Monfort, M., McNamara, B. A., Las-
celles, A., Gutfreund, D., Feris, R. S., and Oliva, A.
(2019). Identifying interpretable action concepts in
deep networks. In CVPR Workshops.
Rapin, J. and Teytaud, O. (2018). Nevergrad - A gradient-free optimization platform. https://GitHub.com/FacebookResearch/Nevergrad.
Simonyan, K. and Zisserman, A. (2015). Very deep convo-
lutional networks for large-scale image recognition. In
Bengio, Y. and LeCun, Y., editors, ICLR.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going deeper with convolutions.
In CVPR.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Er-
han, D., Goodfellow, I., and Fergus, R. (2013). In-
triguing properties of neural networks. arXiv preprint
arXiv:1312.6199.
Vielhaben, J., Blücher, S., and Strodthoff, N. (2022). Sparse subspace clustering for concept discovery (SSCCD).
Zarlenga, M. E., Barbiero, P., Shams, Z., Kazhdan, D.,
Bhatt, U., Weller, A., and Jamnik, M. (2023). Towards
robust metrics for concept representation evaluation.
arXiv preprint arXiv:2301.10367.
Zhang, R., Madumal, P., Miller, T., Ehinger, K. A., and Ru-
binstein, B. I. P. (2021). Invertible concept-based ex-
planations for cnn models with non-negative concept
activation vectors. In AAAI.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., and Tor-
ralba, A. (2017). Places: A 10 million image database
for scene recognition. IEEE TPAMI.
Zhou, B., Sun, Y., Bau, D., and Torralba, A. (2018). Inter-
pretable basis decomposition for visual explanation.
In ECCV.