Feature Selection for Unsupervised Anomaly Detection and Localization

Using Synthetic Defects

Lars Heckler

1,2

and Rebecca K

onig

MVTec Software GmbH, Germany

Technical University of Munich, Germany

Keywords:

Anomaly Detection, Feature Selection, Visual Inspection, Synthetic Defects.

Abstract:

Expressive features are crucial for unsupervised visual Anomaly Detection and Localization. State-of-the-art

methods like PatchCore or SimpleNet heavily exploit such features from pretrained extractor networks and

model their distribution or utilize them for training further parts of the model. However, the layers com-

monly used for feature extraction might not represent the optimal choice for reaching maximum performance.

Thus, we present the ﬁrst application-speciﬁc feature selection strategy for the task of unsupervised Anomaly

Detection and Localization that identiﬁes the most suitable layer of a pretrained feature extractor based on

the performance on a synthetic validation set. The proposed selection strategy is applicable to any feature

extraction-based AD method and may serve as a competitive baseline for future work by not only outperform-

ing single-layer baselines but also features ensembled from multiple layer outputs.

1 INTRODUCTION

Detecting and localizing anomalies in natural im-

ages is an important task in modern computer vi-

sion. Anomalies manifest themselves in various ways

such as structural defects (Bergmann et al., 2019a;

Zou et al., 2022) or the violation of logical con-

straints (Bergmann et al., 2022). Consequently, the

task of Anomaly Detection and Localization (AD)

has a wide range of application areas reaching from

the medical domain (Seeb

ock et al., 2020; Menze

et al., 2015) over autonomous driving (Blum et al.,

2019; Hendrycks et al., 2019) and video surveillance

(Nazare et al., 2018; Li et al., 2013) to industrial in-

spection (Bergmann et al., 2019a; Zou et al., 2022).

Especially within the industrial context AD depicts a

crucial tool for quality assurance. Since modern pro-

duction lines are highly automated and aim for maxi-

mum production quality it is difﬁcult to acquire de-

fective samples that can be used within the design

phase of an AD system. Besides, all possible defect

types that might occur for a product cannot be known

prior to the launch of fabrication. For these reasons,

the task of industrial AD is predominantly tackled in

an unsupervised way, where the training of the AD

method only requires anomaly-free samples.

Within the domain of unsupervised AD, methods

that are built on top of pretrained networks have be-

come popular. These pretrained networks such as

ResNets (He et al., 2016) trained for ImageNet classi-

Figure 1: Application-speciﬁc feature selection for Patch-

Core. Performance of a layer is shown when exclusively

utilizing its corresponding feature map from the feature ex-

tractor. Using the features of a single Wide ResNet-50 layer

chosen by the proposed application-speciﬁc selection strat-

egy can practically improve the harmonic mean P

between

anomaly classiﬁcation and localization over several base-

lines, even over features ensembled from multiple layers, as

shown here for MVTec AD object tile.

ﬁcation (Krizhevsky et al., 2012) are used for extract-

ing features from the input images and stay frozen

during both training and inference time. Here, the un-

derlying assumption for solving the task of Anomaly

Detection is that the feature extractor generates dis-

154

Heckler, L. and König, R.

Feature Selection for Unsupervised Anomaly Detection and Localization Using Synthetic Defects.

DOI: 10.5220/0012385500003660

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2024) - Volume 3: VISAPP, pages

154-165

ISBN: 978-989-758-679-8; ISSN: 2184-4321

tinct features for anomaly-free and anomalous images

that can then be exploited to differentiate between

these two classes. Moreover, extracting features on

a local patch level might allow a precise localization

of a defect (Roth et al., 2022; Defard et al., 2020).

Commonly, the extracted features originate from

the same layers of the feature extractor for every ap-

plication scenario. However, recently Heckler et al.

(2023) systematically analyzed the importance of pre-

trained feature extractors for unsupervised AD and

the corresponding choice of features. They found

that existing state-of-the-art AD methods are highly

sensitive to the particular choice of feature extrac-

tor and layer. In addition, the potential beneﬁts in

performance of using distinct feature layers for dis-

tinct application scenarios was highlighted. Moti-

vated by these ﬁndings, we ask how to determine

the features of a pretrained feature extractor that are

most suitable for a speciﬁc AD task without requir-

ing access to any real defective samples. The pro-

posed feature selection strategy builds upon the gen-

eration of synthetic defects to estimate the suitabil-

ity of a layer to provide expressive features for the

given task. Extensive experiments demonstrate the ef-

fectiveness and feasibility of application-speciﬁc fea-

ture selection for Anomaly Detection and Localiza-

tion, exemplarily shown in Figure 1. In particular, our

key contributions are:

• We present the ﬁrst feature selection strategy for

the task of unsupervised Anomaly Detection and

Localization that builds a competitive baseline for

future work within this research direction.

• Applying the proposed selection strategy that

does not require any real defective data we

demonstrate that it is feasible to select the features

of a single layer to obtain performances compara-

ble or superior to feature ensembling approaches.

• By ﬁne-grained experiments we conﬁrm the im-

portance of incorporating feature selection strate-

gies into future method development since com-

mon choices of feature layers are not guaranteed

to be optimal for a speciﬁc approach and applica-

tion scenario.

2 RELATED WORK

As outlined by recent surveys (Liu et al., 2023a;

Cui et al., 2023; Prunella et al., 2023), the task of

Anomaly Detection and Localization can be tackled

in various ways. Generally, Anomaly Detection meth-

ods can be separated into reconstruction-based and

embedding-based methods. Reconstruction-based

methods use network architectures such as convolu-

tional autoencoders (Masci et al., 2011) or Generative

Adversarial Networks (Goodfellow et al., 2014) to

learn the appearance of anomaly-free samples. After

training exclusively on such anomaly-free data their

reconstructions during inference erase the defects in

the anomalous test data. Consequently, comparing

the input image with its reconstruction results in large

deviations that indicate the presence of an anomaly

(Bergmann et al., 2019b; Akc¸ay et al., 2019; Luo

et al., 2023). Embedding-based methods try to model

the distribution of normal images in a predeﬁned la-

tent space. During inference, images outside of this

distribution are considered as anomalous (Roth et al.,

2022; Defard et al., 2020; Cohen and Hoshen, 2020).

Combinations of both approaches also exist such as

student-teacher networks. Utilizing Knowledge Dis-

tillation (Hinton et al., 2015; Ba and Caruana, 2014) a

student network learns to imitate the embedding of

a pretrained teacher network on anomaly-free data.

Consequently, at test time the student fails to imitate

the teacher for data containing defects. The discrep-

ancy in the output of the two networks is used to de-

tect the anomalies (Bergmann et al., 2020; Rudolph

et al., 2023; Bergmann et al., 2022).

Pretrained Feature Extractors. Especially

embedding-based and student-teacher approaches

extensively make use of pretrained feature extrac-

tors to obtain distinctive features for anomaly-

free and defective samples, respectively (Rip-

pel et al., 2021; Reiss et al., 2021; Defard

et al., 2020; Roth et al., 2022). These feature

extractors such as ResNets (He et al., 2016),

Wide ResNets (Zagoruyko and Komodakis, 2016),

DenseNets (Huang et al., 2017) or EfﬁcientNets

(Tan and Le, 2019) are usually pretrained on large

classiﬁcation tasks like ImageNet (Krizhevsky et al.,

2012) and remain ﬁxed during training of the AD

method. For instance, concatenating feature maps of

different hierarchy levels of the extractor PaDiM (De-

fard et al., 2020) models the distribution of normal

image patches by a multivariate Gaussian. Similarly,

Cohen and Hoshen (2020) retrieve features on image

level and use k-nearest-neighbors to determine the

average distance of the test image to the normal

training data. The state-of-the-art method PatchCore

(Roth et al., 2022) creates a feature memory bank by

subsampling the overall set of normal patch-features

with a coreset-selection strategy (Sener and Savarese,

2018). Again, the locally aware patch-features

combine feature maps from multiple layers of the

extractor. The distance of a test patch-feature to this

nominal memory bank deﬁnes the scoring function

Feature Selection for Unsupervised Anomaly Detection and Localization Using Synthetic Defects

155

for detecting and localizing anomalies.

Synthetic Defects for Unsupervised Anomaly De-

tection. The lack of defective samples for training

AD systems has led to the development of various

approaches to generate and exploit synthetic defects.

CutPaste (Li et al., 2021) cuts random image patches

from a defect-free image and pastes them into another

one. Schl

uter et al. (2022) expand this strategy by

Poisson image editing (P

erez et al., 2003) and back-

ground segmentation to obtain synthetic anomalies

with a more natural appearance. Subsequently, both

of these synthetic defect generation techniques are

integrated into a self-supervised learning framework.

Synthetic defects on image level are also exploited to

obtain more expressive features for the given AD task

(Cao et al., 2023). Likewise, perturbations in the fea-

ture space are intended to reﬂect synthetic abnormali-

ties as well (Zavrtanik et al., 2022; Liu et al., 2023b).

These perturbations may be used to train an anomaly

detection module to distinguish normal from anoma-

lous features. (Zavrtanik et al., 2022). SimpleNet (Liu

et al., 2023b) further ﬁrst utilizes a dedicated feature

adaption module before adding Gaussian noise to the

already adapted features. Subsequently, a discrimina-

tor network learns to distinguish normal from anoma-

lous features.

Feature Selection for Anomaly Detection.

Whereas the ideal properties of a feature for the

task of Anomaly Detection and Localization are

straightforward to deﬁne – similar for images or

image patches from the anomaly-free class but far

off from this distribution for anomalous samples

or patches – selecting such distinctive features

from a pretrained feature extractor is a much more

challenging problem. Therefore, concatenating

feature maps from different hierarchy levels has

become a prominent approach for building expressive

features (Roth et al., 2022; Defard et al., 2020; Liu

et al., 2023b; Yu et al., 2021). Here, ﬁne-grained

details from earlier layers are usually joined with

the high-level abstraction capability of later layers.

However, recent work demonstrates that features

extracted from a single layer of the extractor can

lead to comparable or even superior performance for

state-of-the-art AD methods (Heckler et al., 2023),

provided that these single-layer-features are selected

individually for each application scenario.

In this paper, we follow this line of work and

propose a feature selection strategy for unsupervised

AD when only considering a single layer for extract-

ing features. Here, opposed to Heckler et al. (2023)

who theoretically evaluate the potential of choosing

an application-speciﬁc feature layer, we present and

validate a strategy that can be implemented in real-

world applications. Besides, we provide more ﬁne-

grained insights into the importance of feature selec-

tion by considering a substantially larger set of layers

to choose from. To the best of our knowledge, we

are the ﬁrst to propose such a feature selection strat-

egy for the task of Anomaly Detection and Localiza-

tion. Accordingly, apart from the practical beneﬁt,

our method may also serve as a baseline for future

research in this ﬁeld.

3 METHOD

For identifying the most expressive features for a spe-

ciﬁc AD scenario we ﬁrst investigate whether certain

layers can be excluded from the overall set of layers

of a pretrained feature extractor to deﬁne the solu-

tion space (Sec. 3.1). Thereafter, the techniques used

to generate synthetic defects for our selection strat-

egy are explained in detail (Sec. 3.2). Finally, the

proposed application-speciﬁc selection strategy that

identiﬁes the most suitable layer for feature extraction

is described (Sec. 3.3).

3.1 Layers Extracting Expressive

Features

Using features from pretrained feature extractors for

the task of unsupervised AD allows for choosing from

a wide range of potential features. In theory, the fea-

ture map F ∈ R

H×W×C

generated by any layer L of

the extractor might be useful for the given application.

Commonly, features from earlier layers are assumed

to provide too little abstraction of the characteristics

of the input image, whereas layers from the end of the

extractor might be biased towards the pretraining task

such as ImageNet classiﬁcation (Roth et al., 2022).

To investigate whether these assumptions are valid we

analyze the set of potential features for an AD appli-

cation as follows. Given an AD scenario and a pre-

trained feature extractor X

pre

consisting of n layers

L = {L

}

i=1...n

, we obtain the overall set of features

F by forwarding the anomaly-free training images

AD,train

through X

pre

and extracting the features of

every layer L

∈ L, such that

F =

{

= X

pre,i

AD,train

), i = 1, . . . , n

}

. (1)

Here, X

pre,i

deﬁnes the feature extraction of the i-th

layer L

in X

pre

. Considering only the features of a

single layer we train the AD system with F

and then

evaluate the performance on the test set S

AD,test

. The

VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications

156

Figure 2: Synthetic defect types utilized within our feature

selection strategy with corresponding ground truth maps.

features for the test set are extracted from the same

layer L

∈ L of X

pre

as for S

AD,train

Overall, by this ﬁne-grained pre-analysis we ex-

plore whether it is possible to reduce the initial set

of features to F

red

⊆ F based on certain attributes of

the corresponding layers such as the layer type or its

position within the extractor network. Accordingly,

the layers L

red

⊆ L generating these features repre-

sent the set of layers to choose from for our feature

selection strategy (Sec. 3.3).

3.2 Synthetic Defect Generation

The generation of synthetic defects has become a

technique widely used in the domain of unsupervised

AD (Cao et al., 2023; Zavrtanik et al., 2021; Guo

et al., 2023; Zhang et al., 2023). We propose to exploit

this technique for feature selection. In particular, we

generate three different kinds of synthetic anomalies.

Simple Synthetic Anomalies. Given an anomaly-

free image x

good

∈ R

H×W×C

we randomly determine

an axis-aligned rectangular region of size H

rand

that ﬁts the original image dimensions. Then,

the pixel values inside this region are set to 0 in ev-

ery channel C of the image x

good

to obtain the image

containing the simple synthetic defect x

bad,syn

. Visu-

ally, this results in a black rectangular area as shown

in Fig. 2.

CutPaste for Synthetic Anomalies. Inspired by Li

et al. (2021) we build upon their proposed CutPaste

strategy to obtain images with artiﬁcial defects. Here,

a region from an anomaly-free source image x

good,src

is cut out and the resulting crop is pasted into an

anomaly-free target image x

good,tar

to obtain x

bad,syn

Both the size and position of the crop in the source

image and the paste position in the target image are

chosen randomly and independently from one an-

other. Figure 2 illustrates some synthetic anomalies

obtained through CutPaste.

Natural Synthetic Anomalies (NSA). In order to

create synthetic defects with a more realistic appear-

ance than the Simple or CutPaste defects we make

use of a technique proposed by Schl

uter et al. (2022).

Here, random crops from the anomaly-free source im-

age are scaled before pasting them into the anomaly-

free target image. To obtain a more seamless blending

Poisson image editing (P

erez et al., 2003) is utilized.

Besides, a threshold operation is applied to x

good,src

and x

good,tar

that is intended to separate the object

from the background. The crop and paste positions

need to be inside the foreground region. Thus, the

synthetic defects are likely to appear on the actual ob-

ject and seem more natural (Fig. 2). The implemen-

tation of NSA also allows multiple defective regions

in x

good,tar

in contrast to the other two defect types,

where we only generate one anomalous region in the

image.

3.3 Feature Selection Using Synthetic

Defects

Given a set of layers L of the pretrained feature ex-

tractor X

pre

, the task of feature selection is to identify

the layer L

∗

which generates the optimal features for a

speciﬁc AD scenario. Here, we consider the optimal

layer L

∗

as the layer leading to the best AD perfor-

mance when using its corresponding feature maps as

extracted features. Feature selection can thus be for-

malized as L

∗

= select (L) (Heckler et al., 2023).

We propose to utilize synthetic defects for deﬁn-

ing the select-function (Fig. 3). In particular, we

split the anomaly-free training set of an AD scenario

AD,train

into two disjoint parts D

and D

. The ﬁrst

part D

remains unchanged for training the AD sys-

tem, whereas the second part D

is split again dis-

jointly into D

V 1

and D

V 2

. Both D

V 1

and D

V 2

still

only contain anomaly-free images. Now, the images

in D

V 2

are augmented with synthetic defects, such

that D

V 2

becomes D

V,bad

. For ease of notation we ac-

cordingly denote D

V 1

as D

V,good

. In total we thus ob-

tain three non-overlapping dataset splits D

, D

V,good

and D

V,bad

from the original training set S

AD,train

. Us-

ing the features from a speciﬁc layer L the AD system

is trained using D

. The trained model M

is then

evaluated on the union of D

V,good

and D

V,bad

, the syn-

thetic validation set D

V,syn

, with corresponding per-

formance P {M

V,syn

)}. The optimal layer L

∗

be selected is the one maximizing P :

∗

= select (L) = arg max

L∈L

P {M

V,syn

)} (2)

Feature Selection for Unsupervised Anomaly Detection and Localization Using Synthetic Defects

157

Figure 3: Feature Selection for Anomaly Detection. From the original training set S

AD,train

a validation set D

and training

set D

are created. A part of D

is augmented with synthetic defects to obtain the synthetic validation set D

V,syn

that contains

both anomalous and anomaly-free images. The selected layer L

∗

is the layer leading to the best performance on D

V,syn

when

using its corresponding features for training the feature-extraction based AD method M .

We assume that the features of the layer selected

based on synthetic defects are also expressive for de-

tecting defects that occur in real-world applications

and, consequently, lead to high AD performance of

the corresponding model M

∗

4 EXPERIMENTS AND RESULTS

In this section we analyze the applicability of our pro-

posed feature selection strategy. Generally, the model

∗

using the features of the layer L

∗

selected with-

out any real defective sample is evaluated on real

test datasets D

test

. For this, we ﬁrst describe the

implementation details of our experiments, the used

datasets and evaluation metrics. Then, we conduct

a pre-analysis on the performance of layers to iden-

tify the solution space (Sec. 4.1) for our main exper-

iment that validates the proposed selection strategy

(Sec. 4.2). Further, we compare our single-layer re-

sults to multi-layer state-of-the-art performance (4.3)

and provide ablation studies on e.g. the selection cri-

terion (Sec. 4.4).

Implementation Details. In our experiments we

consider two different methods for unsupervised AD:

PatchCore (Roth et al., 2022) and SimpleNet (Liu

et al., 2023b). While PatchCore directly uses the

extracted features from a pretrained feature extrac-

tor to model the feature space without any fur-

ther training, SimpleNet tries to adapt these fea-

tures for the present AD task to the target domain.

As feature extractors we investigate the widely used

Wide ResNet-50 (Zagoruyko and Komodakis, 2016)

and DenseNet-201 (Huang et al., 2017). Since each

layer might provide expressive features for the given

AD task (Sec. 4.1), we cannot discard any layer in ad-

vance and train both methods for each layer of those

networks (L

red

≡ L). For extracting features, we use

the feature extraction package

of torchvision in Py-

Torch (Paszke et al., 2019). This results in analyz-

ing 172 layers for Wide ResNet-50 and 708 layers for

DenseNet-201. We use the ofﬁcial implementations

of PatchCore

and SimpleNet

. In the PatchCore im-

plementation we ﬁx the size of the coreset to 1000.

We train SimpleNet for 160 epochs and report the re-

sult of the ﬁnal model. For both methods we zoom

input images to 256 × 256 spatial dimensions without

center cropping.

Datasets. We train and evaluate individually on

all 15 object categories of the MVTec AD dataset

(Bergmann et al., 2019a). For the synthetic anomaly

generation, we split the original train split S

AD,train

each dataset category into a train (D

) and a valida-

tion split (D

) that contain 70% and 30% of the orig-

inal training images, respectively. Of the validation

split D

75% of the images are modiﬁed to contain

synthetic anomalies (D

V,bad

) and the remaining 25%

of the images are not changed (D

V,good

). This good to

bad ratio approximates the ratio in the real test set but

remains constant for each object category.

Evaluation Metrics. In our experiments we evalu-

ate both the quality of anomaly detection and anomaly

localization. For anomaly detection on image-level

we report the area under the receiver operator charac-

teristic curve (AU-ROC). For anomaly localization on

pytorch.org/vision/stable/feature extraction.html

github.com/amazon-science/patchcore-inspection

github.com/DonaldRR/SimpleNet

VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications

158

pixel-level we use the area under per-region-overlap

curve (AU-PRO) (Bergmann et al., 2019a). As sug-

gested by Bergmann et al. (2019a) we integrate the

PRO-curve only up to a false positive rate of 0.3

(AU-PRO

0.30

) using the ofﬁcial evaluation code

. On

image-level, we report the full area (AU-ROC

1.00

Heckler et al. (2023) showed that distinct fea-

ture layers may be optimal for the respective task,

detection and localization. However, considering

real-world applicability we want to select a layer

that solves both tasks adequately. Therefore, we

compute the harmonic mean P

of AU-ROC

1.00

and

AU-PRO

0.30

and use this metric as performance mea-

sure

2 · AU-ROC

1.00

· AU-PRO

0.30

AU-ROC

1.00

+ AU-PRO

0.30

. (3)

The harmonic mean heavily penalizes if one of the

performance metrics is small and the other one is

large, in contrast to the arithmetic mean (Komi

2011). In real-world applications, achieving satisfac-

tory results on both classifying and localizing anoma-

lies is a vital characteristic of an AD system.

4.1 Can We Reduce the Initial Set of

Layers?

In order to analyze whether it is possible to reduce

the initial set of layers and, thus, features to choose

from, we ﬁrst group all available layers of a fea-

ture extractor according to their hierarchical position

within the extractor and their layer type. Exemplary

shown for MVTec AD object grid when using Patch-

Core with a Wide ResNet-50 feature extractor, at ﬁrst

glance the results in Figure 4a indicate that later layers

containing higher-level semantics have the tendency

to perform slightly worse than earlier layers or even

layers from the input block. However, as shown in

the appendix, these patterns vary signiﬁcantly for dif-

ferent object categories. Likewise, patterns obtained

by grouping the feature performances by their corre-

sponding layer type are also very object-speciﬁc. In

addition, even for one object category we can observe

that features from any layer type might lead to high

performance (Fig. 4b).

Heckler et al. (2023) hypothesized that a pre-

selection of suitable layers based on the relative re-

ceptive ﬁeld size might be possible. Therefore, we

also determine the effective receptive ﬁeld size (Luo

et al., 2016) of each layer and divide it by the input

image size to obtain this attribute. Figure 5 shows the

AD performance over the relative receptive ﬁeld size

for PatchCore and both analyzed feature extractors,

mvtec.com/company/research/datasets/mvtec-ad

averaged over all 15 MVTec AD objects. Though an

optimal range of the receptive ﬁeld size seems to exist

here in general, the performance variations within this

range are still too large to use this attribute as a pre-

selection criterion. Moreover, the potentially ideal

range not only changes for distinct feature extractors

but also varies signiﬁcantly when using other methods

such as SimpleNet, as shown in the appendix.

Considering all results from our extensive pre-

analysis we ﬁnd that it is not possible to reduce the

initial set of all layers L based on the analyzed layer

attributes. Too many variations for distinct object-

categories, feature extractors and AD methods occur.

Therefore, rather than deﬁning application-speciﬁc

pre-selection criteria we developed a universal feature

selection strategy that can cope with all these chal-

lenges to a certain extend, as presented within the next

section.

(a) Wide ResNet-50 blocks.

(b) Layer type.

Figure 4: AD metrics on real test set of MVTec AD ob-

ject grid. Shown is for each metric the mean with standard

deviation over all layers belonging to same block (a) and

all layers of same type (b) in Wide ResNet-50, trained with

PatchCore.

4.2 Feature Selection Using Synthetic

Defects

Our proposed feature selection strategy is based on

synthetic defects. We generate three different types of

synthetic defects, namely Simple, CutPaste and Nat-

ural Synthetic Anomalies (NSA), becoming more and

more complex and realistic in this order. The idea is

to transfer the AD performance from the synthetic to

real data. For each defect type we select the layer

∗

for which the model performs best on the syn-

Feature Selection for Unsupervised Anomaly Detection and Localization Using Synthetic Defects

159

Table 1: Harmonic mean P

of AU-ROC

1.00

and AU-PRO

0.30

on the real test set of MVTec AD for the selected layers of each

defect type. The last column baseline is the average performance of all layers. For each object, feature extractor and anomaly

detection method, the best result is marked in bold.

PatchCore SimpleNet

WideResNet50 DenseNet201 WideResNet50

object Simple CutPaste NSA baseline Simple CutPaste NSA baseline Simple CutPaste NSA baseline

bottle 88.4 ± 0.9 94.1 ± 0.8 98.0 ± 0.2 91.0 ± 0.2 94.0 ± 0.2 95.0 ± 0.2 97.4 ± 0.3 90.9 ± 0.0 74.4 ± 22.2 91.0 ± 0.4 93.8 ± 0.7 42.8 ± 1.3

cable 77.1 ± 23.1 93.2 ± 1.2 93.9 ± 1.1 81.3 ± 0.2 93.5 ± 1.9 93.9 ± 0.8 94.7 ± 0.3 83.4 ± 0.1 47.7 ± 9.6 80.5 ± 2.1 87.9 ± 0.7 38.0 ± 0.5

capsule 97.4 ± 0.4 91.7 ± 0.5 97.5 ± 0.5 87.3 ± 0.1 90.8 ± 4.7 89.1 ± 1.3 92.6 ± 2.7 81.2 ± 0.1 85.5 ± 2.2 91.3 ± 1.5 77.8 ± 6.9 35.6 ± 0.6

carpet 73.1 ± 4.5 89.2 ± 8.0 90.2 ± 1.5 89.8 ± 0.1 80.5 ± 9.7 94.7 ± 0.8 85.7 ± 6.7 89.6 ± 0.0 52.8 ± 19.9 87.8 ± 0.4 81.3 ± 4.8 59.0 ± 0.2

grid 90.1 ± 1.3 84.9 ± 1.3 97.8 ± 0.8 83.7 ± 0.3 97.7 ± 0.3 87.3 ± 3.2 97.4 ± 0.8 71.1 ± 0.3 36.9 ± 7.4 80.7 ± 0.6 85.0 ± 2.2 28.4 ± 1.2

hazelnut 85.4 ± 7.5 96.3 ± 0.3 97.4 ± 0.1 93.9 ± 0.2 96.9 ± 0.2 95.8 ± 0.4 97.3 ± 0.1 91.9 ± 0.1 62.6 ± 4.1 86.9 ± 2.1 89.1 ± 3.5 33.7 ± 0.5

leather 92.2 ± 1.3 96.6 ± 0.7 98.9 ± 0.4 96.4 ± 0.0 99.2 ± 0.1 97.8 ± 0.0 98.9 ± 0.0 95.3 ± 0.0 71.1 ± 9.2 95.9 ± 1.8 98.1 ± 0.3 71.1 ± 0.4

metal nut 55.8 ± 14.4 90.1 ± 4.0 96.0 ± 1.3 86.4 ± 0.2 90.9 ± 0.1 92.3 ± 0.7 96.6 ± 0.8 85.4 ± 0.1 68.5 ± 16.2 88.6 ± 2.5 91.5± 0.4 35.2 ± 0.9

pill 95.1 ± 1.1 88.4 ± 3.4 86.0 ± 7.9 85.4 ± 0.2 90.4 ± 0.2 86.2 ± 2.6 79.2 ± 10.3 78.1 ± 0.0 74.9 ± 15.2 83.5 ± 3.7 72.5 ± 11.1 32.7 ± 0.4

screw 59.2 ± 26.2 82.7 ± 2.6 82.1 ± 11.9 72.0 ± 1.0 81.0 ± 1.0 73.5 ± 1.8 76.5 ± 2.1 62.9 ± 0.5 0.2 ± 0.2 73.0 ± 11.2 54.8 ± 18.5 28.5 ± 1.1

tile 87.5 ± 3.1 84.4 ± 1.5 97.6 ± 0.0 88.6 ± 0.0 83.4 ± 2.2 88.3 ± 4.3 84.0 ± 0.2 85.1 ± 0.0 20.1 ± 5.6 81.4 ± 3.8 88.3 ± 4.6 55.1 ± 1.0

toothbrush 94.0 ± 1.0 83.6 ± 1.4 89.7 ± 1.8 85.7 ± 0.1 91.7 ± 0.3 86.2 ± 1.2 67.3 ± 2.7 81.7 ± 0.3 45.7 ± 8.4 69.9 ± 0.8 68.5 ± 8.8 45.7 ± 1.3

transistor 58.3 ± 19.0 95.8 ± 0.2 86.1 ± 3.4 80.9 ± 0.2 79.7 ± 7.3 96.9 ± 0.2 88.6 ± 3.9 86.9 ± 0.2 43.8 ± 8.7 93.3 ± 0.5 93.7 ± 0.1 35.7 ± 0.7

wood 93.0 ± 0.7 91.0 ± 1.4 95.8 ± 0.3 88.3 ± 0.0 94.3 ± 0.2 88.8 ± 0.3 89.4 ± 0.0 80.6 ± 0.0 35.9 ± 16.4 82.8 ± 1.7 68.5 ± 13.3 53.1 ± 1.3

zipper 89.7 ± 2.2 94.8 ± 0.6 96.7 ± 0.5 90.5 ± 0.0 93.4 ± 0.5 95.6 ± 0.4 96.4 ± 0.5 88.5 ± 0.1 45.3 ± 9.1 88.7 ± 0.8 95.0 ± 0.3 40.4 ± 0.7

mean 82.4 ± 11.2 90.4 ± 2.7 93.6 ± 3.9 86.7 ± 0.3 90.5 ± 3.5 90.8 ± 1.7 89.5 ± 3.5 83.5 ± 0.2 51.0 ± 12.0 85.0 ± 3.5 83.1 ± 7.4 42.3 ± 0.9

Figure 5: AD performance on the real test set over rela-

tive receptive ﬁeld size of all layers of Wide ResNet-50 and

DenseNet-201 when using the corresponding features for

PatchCore. The mean values over all 15 MVTec AD ob-

jects are reported.

thetic validation set D

V,syn

in terms of the harmonic

mean P

of AU-ROC

1.00

and AU-PRO

0.30

. In Ta-

ble 1 we report this metric of the chosen layer on the

real test set P

∗

test

)}. We run PatchCore three

times in each case. SimpleNet is run three times for

Wide ResNet-50 but only once for DenseNet-201 due

to the large number of layers and thus long training

time. The results for SimpleNet with DenseNet-201

can be found in the appendix. The repeated experi-

ments each have a different dataset split D

and D

and different generated synthetic anomalies in D

V,bad

The reported values are the means of the three runs

and the resulting standard deviations. Table 1 also

contains a column baseline. We compute the baseline

value P

as average performance over all available

layers P

∑

l=1

test

)} since this would be

the expected value of P

when choosing a random

layer of the feature extractor.

In all experiments our feature selection method

outperforms the baseline when using CutPaste or NSA

defects. Even the very basic defect type Simple leads

to a layer choice that beneﬁts ﬁnal performance on the

real test set for many objects. As expected, the more

realistic the defect types appear, the better performs

the feature selection algorithm. For PatchCore the

layer selection based on NSA outperforms the base-

line by approximately 7 percentage points on average

both for Wide ResNet-50 and DenseNet-201. The av-

erage performance of Simple defects is on par with

those from CutPaste and NSA for DenseNet-201. It is

signiﬁcantly worse for Wide ResNet-50.

This could be due to the signiﬁcantly larger stan-

dard deviations between different runs, compared to

Table 2: Performance (P

) on the real test set of MVTec AD

of the layer L

∗

obtained by feature selection (FS) using NSA

synthetic defects compared to using features concatenated

from multiple layers (ML), both averaged over three runs.

PatchCore SimpleNet

WideResnet50 DenseNet201 WideResnet50

object FS ML FS ML FS ML

bottle 98.0 ± 0.2 97.8 ± 0.0 97.4 ± 0.3 97.9 ± 0.1 93.8 ± 0.7 71.4 ± 13.8

cable 93.9 ± 1.1 94.5 ± 0.1 94.7 ± 0.3 95.3 ± 0.2 87.9 ± 0.7 73.8 ± 8.4

capsule 97.5 ± 0.5 97.4 ± 0.2 92.6 ± 2.7 96.5 ± 0.5 77.8 ± 6.9 87.0 ± 5.1

carpet 90.2 ± 1.5 96.4 ± 0.1 85.7 ± 6.7 95.8 ± 0.3 81.3 ± 4.8 69.6 ± 23.8

grid 97.8 ± 0.8 96.0 ± 0.5 97.4 ± 0.8 95.7 ± 0.7 85.0 ± 2.2 34.7 ± 13.9

hazelnut 97.4 ± 0.1 98.0 ± 0.2 97.3 ± 0.1 97.3 ± 0.3 89.1 ± 3.5 35.5 ± 39.1

leather 98.9 ± 0.4 99.0 ± 0.0 98.9 ± 0.0 99.0 ± 0.0 98.1 ± 0.3 94.5 ± 2.9

metal nut 96.0 ± 1.3 97.3 ± 0.1 96.6 ± 0.8 97.2 ± 0.1 91.5 ± 0.4 67.7 ± 33.5

pill 86.0 ± 7.9 96.2 ± 0.2 79.2 ± 10.3 96.2 ± 0.2 72.5 ± 11.1 88.2 ± 1.0

screw 82.1 ± 11.9 94.4 ± 0.7 76.5 ± 2.1 90.8 ± 1.5 54.8 ± 18.5 14.1 ± 10.8

tile 97.6 ± 0.0 94.6 ± 0.2 84.0 ± 0.2 94.7 ± 0.1 88.3 ± 4.6 41.0 ± 20.5

toothbrush 89.7 ± 1.8 90.8 ± 0.0 67.3 ± 2.7 91.8 ± 0.6 68.5 ± 8.8 63.6 ± 4.0

transistor 86.1 ± 3.4 89.5 ± 0.7 88.6 ± 3.9 89.5 ± 0.2 93.7 ± 0.1 84.4 ± 4.3

wood 95.8 ± 0.3 95.0 ± 0.1 89.4 ± 0.0 94.9 ± 0.2 68.5 ± 13.3 65.2 ± 4.3

zipper 96.7 ± 0.5 97.6 ± 0.1 96.4 ± 0.5 97.7 ± 0.2 95.0 ± 0.3 46.8 ± 16.5

mean 93.6 ± 3.9 95.6 ± 0.3 89.5 ± 3.5 95.4 ± 0.5 83.1 ± 7.4 62.5 ± 17.5

VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications

160

those of CutPaste and NSA. The defect type Sim-

ple may give a good result with our feature selection

method, but is clearly more unstable. The reason for

this variability could be the large difference between

the Simple defects, which are only black rectangles,

and the real defects.

Looking at SimpleNet, there is a huge gap be-

tween the baseline performance and the performance

of the feature selection based on CutPaste or NSA.

Our proposed method outperforms the baseline by

more than 40 percentage points, i.e. doubling the per-

formance. Defect type Simple falls behind this perfor-

mance, but is still 9 percentage points better than the

baseline.

4.3 Comparison to Multi-Layer

Features

Table 2 compares the performance of PatchCore and

SimpleNet using multi-layer features as suggested in

the original work (Roth et al., 2022; Liu et al., 2023b)

to the results obtained by our feature selection using

NSA defects. For both feature extractors the default

multi-layer features are obtained by concatenating the

feature maps of the last layers of block 2 and 3, re-

spectively. On average, the selection of a single layer

by our method provides slightly worse features for

PatchCore than the default multi-layer setting. How-

ever, for most object categories feature selection leads

to comparable results (e.g. Wide ResNet-50: ob-

jects cable, leather, toothbrush) or even outperforms

the multi-layer baseline (e.g. Wide ResNet-50: ob-

jects bottle, capsule, tile). Likewise, for SimpleNet

feature selection leads to signiﬁcantly better results

than the multi-layer setting for nearly all object cat-

egories. Compared to the original publication (Liu

et al., 2023b) the multi-layer results for SimpleNet

are signiﬁcantly worse. This might be due to several

reasons: We disable checkpoint selection based on

the test set metrics, use a smaller number of training

images and often the poor localization performance

heavily affects the reported P

. In total, Table 2 not

only veriﬁes the feasibility of application-speciﬁc fea-

ture selection but also conﬁrms its potential beneﬁts

over state-of-the-art default feature layers.

4.4 Ablation Studies

Choice of Selection Criterion. In Sec. 4.2 we se-

lect the layer L

∗

based on the best value of the

harmonic mean P

of AU-ROC

1.00

and AU-PRO

0.30

achieved on the synthetic validation set D

V,syn

. We

want to investigate the impact of this selection crite-

rion on the ﬁnal performance on the real test set D

test

Thus, we additionally perform feature layer selection

based on anomaly classiﬁcation performance P

cls

AU-ROC

1.00

and report all metrics in Table 3. The

average performance over the three distinct dataset

splits with synthetic defects of type NSA, exemplar-

ily for PatchCore and DenseNet-201, is reported. We

see that the layer selection outperforms the baseline

for both selection criteria, and for all three evaluation

metrics. As expected, the evaluation metric P

on the

real test data is higher when selecting the layer based

on this metric (ﬁrst column), than when selecting it

based on P

cls

(second column). Interestingly, even

cls

is higher in the former case. This means, that

for the transfer from synthetic to real defective data it

is important to select features that are distinctive for

both anomaly classiﬁcation and anomaly localization.

This is in line with our motivation to also locate de-

fects precisely in real-world applications.

Table 3: Comparison of average performances P

, P

cls

and

loc

on the real test set of MVTec AD for feature layer se-

lection by harmonic mean P

and P

cls

with the baseline.

This table shows results of PatchCore on DenseNet-201

with feature selection based on defect type NSA.

selection crit. Harmonic mean AU-ROC

1.00

baseline

object P

cls

loc

cls

loc

cls

loc

bottle 97.4 99.9 95.0 83.7 98.9 72.6 90.9 99.6 84.0

cable 94.7 99.0 90.9 86.3 95.4 78.8 83.4 92.0 76.9

capsule 92.6 95.7 89.8 82.5 89.7 76.5 81.2 89.5 75.0

carpet 85.7 85.4 85.9 93.6 97.3 90.2 89.6 95.0 85.1

grid 97.4 99.4 95.5 84.1 91.0 78.1 71.1 79.9 64.6

hazelnut 97.3 100.0 94.8 96.5 99.9 93.3 91.9 96.5 87.9

leather 98.9 100.0 97.8 97.0 100.0 94.2 95.3 99.0 92.1

metal nut 96.6 99.5 93.9 82.3 93.4 73.7 85.4 93.8 78.9

pill 79.2 87.6 73.5 75.6 81.3 70.8 78.1 83.5 74.0

screw 76.5 85.0 69.8 84.2 88.3 81.0 62.9 76.7 55.0

tile 84.0 95.6 74.9 91.9 99.4 85.5 85.1 98.0 75.5

toothbrush

67.3 84.8 55.9 73.4 93.3 60.5 81.7 93.2 73.5

transistor 88.6 99.2 80.2 95.2 99.0 91.7 86.9 94.9 80.7

wood 89.4 97.0 82.8 89.4 97.0 82.8 80.6 96.9 70.1

zipper 96.4 97.7 95.2 83.8 94.7 75.7 88.5 96.0 82.6

mean 89.5 95.1 85.1 86.6 94.6 80.4 83.5 92.3 77.1

Generalization from Synthetic to Real Data. To

verify the generalization of feature selection using

synthetic defects on real test data, Table 4 reports

not only the performance on the real test set of

the model using the features of the selected layer

∗

(k = 1) but also the averaged performance

∑

l=1

P {M

test

)} over the top k layers with

best performance on the synthetic validation set

V,syn

. Especially for NSA synthetic defects the pro-

posed selection strategy always outperforms the base-

line of randomly choosing a single layer for Metal Nut

and Leather. This indicates that the best features se-

lected by our strategy using this kind of synthetic de-

Feature Selection for Unsupervised Anomaly Detection and Localization Using Synthetic Defects

161

Table 4: Average performance (P

) on the real test set of

MVTec AD of the k best layers on the synthetic validation

set for PatchCore with a Wide ResNet-50 feature extractor

using different synthetic defects. Results are averaged over

three distinct initial dataset splits D

and D

and compared

to the average performance over all layers.

object Metal Nut Leather

k Simple CutPaste NSA Simple CutPaste NSA

1 55.8 ± 0.0 90.1 ± 0.0 96.0 ± 0.0 92.2 ± 0.0 96.6 ± 0.0 98.9 ± 0.0

2 51.4 ± 16.4 91.4 ± 1.9 96.1 ± 0.9 92.9 ± 0.8 96.7 ± 1.0 98.9 ± 0.0

3 50.8 ± 14.2 91.4 ± 1.9 96.2 ± 0.8 93.9 ± 1.6 96.4 ± 1.0 99.0 ± 0.2

4 61.6 ± 22.6 90.5 ± 2.8 96.4 ± 0.8 94.0 ± 1.4 96.8 ± 1.1 98.8 ± 0.4

all 86.4 ± 12.8 96.4 ± 3.0

fect consistently generalize well to the defects occur-

ring in the real test set. This also applies to utilizing

CutPaste defects, whereas Simple defects are both in-

sufﬁcient and less stable. The complete ablation is

provided in the appendix.

5 CONCLUSION

Using features from a pretrained feature extractor has

become a popular paradigm in developing methods

for the task of unsupervised Anomaly Detection and

Localization. Commonly, for different application

scenarios features from the same layers are utilized.

Moreover, oftentimes multiple feature maps are com-

bined to obtain expressive features. However, recent

work found that an optimal, application-speciﬁc se-

lection of a single feature layer is sufﬁcient to reach

state-of-the-art performance or even outperforms the

default multi-layer setting (Heckler et al., 2023).

In this paper, we followed this line of work and

proposed the ﬁrst feature selection strategy for AD.

First, we demonstrated the necessity of consider-

ing all layers of a pretrained feature extractor in

application-speciﬁc feature selection. Thereupon, we

presented a selection strategy based on synthetic de-

fects that chooses from this overall set of layers and

does not require any real anomalous samples. Our

proposed strategy was evaluated for two state-of-the-

art AD methods, PatchCore (Roth et al., 2022) and

SimpleNet (Liu et al., 2023b), as well as two fea-

ture extractors, Wide ResNet-50 (Zagoruyko and Ko-

modakis, 2016) and DenseNet-201 (Huang et al.,

2017), both pretrained on ImageNet (Krizhevsky

et al., 2012). Especially using CutPaste or NSA syn-

thetic defects for our selection strategy consistently

led to improvements over the expected value of a ran-

domly chosen layer. Moreover, for certain object cat-

egories of the MVTec AD dataset application-speciﬁc

feature selection outperformed the default multi-layer

features as well.

Overall, our method may serve as a competitive

baseline for further research in this ﬁeld. However,

future work may reduce the computational cost of

estimating the suitability of a layer for generating

application-speciﬁc features. Other synthetic defect

types may also be considered in combination with our

strategy, potentially leading to an even more consis-

tent improvement over ensembled features from mul-

tiple layers.

REFERENCES

Akc¸ay, S., Atapour-Abarghouei, A., and Breckon, T. P.

(2019). Skip-ganomaly: Skip connected and adversar-

ially trained encoder-decoder anomaly detection. In

2019 International Joint Conference on Neural Net-

works (IJCNN), pages 1–8.

Ba, J. and Caruana, R. (2014). Do deep nets really need

to be deep? In Ghahramani, Z., Welling, M., Cortes,

C., Lawrence, N., and Weinberger, K., editors, Ad-

vances in Neural Information Processing Systems,

volume 27. Curran Associates, Inc.

Bergmann, P., Batzner, K., Fauser, M., Sattlegger, D., and

Steger, C. (2022). Beyond Dents and Scratches: Log-

ical Constraints in Unsupervised Anomaly Detection

and Localization. International Journal of Computer

Vision, 130(4):947–969.

Bergmann, P., Fauser, M., Sattlegger, D., and Steger, C.

(2019a). MVTec AD — A Comprehensive Real-

World Dataset for Unsupervised Anomaly Detection.

In IEEE Conference on Computer Vision and Pattern

Recognition (CVPR), pages 9584–9592.

Bergmann, P., Fauser, M., Sattlegger, D., and Steger,

C. (2020). Uninformed Students: Student-Teacher

Anomaly Detection With Discriminative Latent Em-

beddings. In IEEE Conference on Computer Vision

and Pattern Recognition (CVPR), pages 4182–4191.

Bergmann, P., L

owe, S., Fauser, M., Sattlegger, D., and

Steger, C. (2019b). Improving Unsupervised De-

fect Segmentation by Applying Structural Similarity

to Autoencoders. In Proceedings of the 14th Inter-

national Joint Conference on Computer Vision, Imag-

ing and Computer Graphics Theory and Applica-

tions - Volume 5: VISAPP, pages 372–380. INSTICC,

SciTePress.

Blum, H., Sarlin, P.-E., Nieto, J., Siegwart, R., and Ca-

dena, C. (2019). Fishyscapes: A Benchmark for Safe

Semantic Segmentation in Autonomous Driving. In

IEEE International Conference on Computer Vision

Workshops (ICCVW), pages 2403–2412.

Cao, Y., Xu, X., Liu, Z., and Shen, W. (2023). Collaborative

discrepancy optimization for reliable image anomaly

localization. IEEE Transactions on Industrial Infor-

matics, pages 1–10.

Cohen, N. and Hoshen, Y. (2020). Sub-image anomaly

detection with deep pyramid correspondences. arXiv

preprint arXiv:2005.02357v1.

Cui, Y., Liu, Z., and Lian, S. (2023). A survey on unsu-

pervised anomaly detection algorithms for industrial

images. IEEE Access, 11:55297–55315.

Defard, T., Setkov, A., Loesch, A., and Audigier, R. (2020).

Padim: A patch distribution modeling framework for

VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications

162

anomaly detection and localization. In Pattern Recog-

nition. ICPR International Workshops and Challenges

2021, Proceedings, Part IV, volume 12664 of Lecture

Notes in Computer Science, pages 475–489. Springer.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,

Warde-Farley, D., Ozair, S., Courville, A., and Ben-

gio, Y. (2014). Generative adversarial nets. In Ghahra-

mani, Z., Welling, M., Cortes, C., Lawrence, N., and

Weinberger, K., editors, Advances in Neural Infor-

mation Processing Systems, volume 27. Curran Asso-

ciates, Inc.

Guo, Y., Jiang, M., Huang, Q., Cheng, Y., and Gong,

J. (2023). Mldfr: A multilevel features restoration

method based on damaged images for anomaly detec-

tion and localization. IEEE Transactions on Industrial

Informatics, pages 1–10.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Resid-

ual Learning for Image Recognition. In IEEE Con-

ference on Computer Vision and Pattern Recognition

(CVPR), pages 770–778.

Heckler, L., K

onig, R., and Bergmann, P. (2023). Explor-

ing the importance of pretrained feature extractors for

unsupervised anomaly detection and localization. In

Proceedings of the IEEE/CVF Conference on Com-

puter Vision and Pattern Recognition (CVPR) Work-

shops, pages 2916–2925.

Hendrycks, D., Basart, S., Mazeika, M., Mostajabi, M.,

Steinhardt, J., and Song, D. (2019). A Bench-

mark for Anomaly Segmentation. arXiv preprint

arXiv:1911.11132v1.

Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the

knowledge in a neural network.

Huang, G., Liu, Z., van der Maaten, L., and Weinberger,

K. Q. (2017). Densely connected convolutional net-

works. In Proceedings of the IEEE Conference on

Computer Vision and Pattern Recognition (CVPR).

Komi

c, J. (2011). Harmonic mean. In International Ency-

clopedia of Statistical Science, pages 622–624, Berlin,

Heidelberg. Springer Berlin Heidelberg.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-

ageNet Classiﬁcation With Deep Convolutional Neu-

ral Networks. In Advances in Neural Information Pro-

cessing Systems, pages 1097–1105.

Li, C.-L., Sohn, K., Yoon, J., and Pﬁster, T. (2021). Cut-

paste: Self-supervised learning for anomaly detection

and localization. In 2021 IEEE/CVF Conference on

Computer Vision and Pattern Recognition (CVPR),

pages 9659–9669.

Li, W.-X., Mahadevan, V., and Vasconcelos, N. (2013).

Anomaly Detection and Localization in Crowded

Scenes. IEEE Transactions on Pattern Analysis and

Machine Intelligence (TPAMI), 36(1):18–32.

Liu, J., Xie, G., Wang, J., Li, S., Wang, C., Zheng, F., and

Jin, Y. (2023a). Deep industrial image anomaly detec-

tion: A survey. arXiv preprint arXiv:2301.11514.

Liu, Z., Zhou, Y., Xu, Y., and Wang, Z. (2023b). Simplenet:

A simple network for image anomaly detection and

localization. In Proceedings of the IEEE/CVF Con-

ference on Computer Vision and Pattern Recognition

(CVPR), pages 20402–20411.

Luo, W., Li, Y., Urtasun, R., and Zemel, R. (2016). Un-

derstanding the effective receptive ﬁeld in deep con-

volutional neural networks. In Advances in Neural

Information Processing Systems, volume 29. Curran

Associates, Inc.

Luo, W., Yao, H., and Yu, W. (2023). Normal reference

attention and defective feature perception network for

surface defect detection. IEEE Transactions on Instru-

mentation and Measurement, 72:1–14.

Masci, J., Meier, U., Cires¸an, D., and Schmidhuber, J.

(2011). Stacked Convolutional Auto-Encoders for Hi-

erarchical Feature Extraction. In Artiﬁcial Neural Net-

works and Machine Learning – ICANN 2011, pages

52–59. Springer.

Menze, B. H., Jakab, A., Bauer, S., Kalpathy-Cramer, J.,

Farahani, K., Kirby, J., et al. (2015). The Multi-

modal Brain Tumor Image Segmentation Benchmark

(BRATS). IEEE Transactions on Medical Imaging,

34(10):1993–2024.

Nazare, T. S., de Mello, R. F., and Ponti, M. A. (2018). Are

pre-trained cnns good feature extractors for anomaly

detection in surveillance videos? arXiv preprint

arXiv:1811.08495.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,

Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,

Antiga, L., Desmaison, A., Kopf, A., Yang, E., De-

Vito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,

Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).

PyTorch: An Imperative Style, High-Performance

Deep Learning Library. In Advances in Neural Infor-

mation Processing Systems, volume 32.

erez, P., Gangnet, M., and Blake, A. (2003). Poisson im-

age editing. ACM Trans. Graph., 22(3):313–318.

Prunella, M., Scardigno, R. M., Buongiorno, D., Brunetti,

A., Longo, N., Carli, R., Dotoli, M., and Bevilacqua,

V. (2023). Deep learning for automatic vision-based

recognition of industrial surface defects: A survey.

IEEE Access, 11:43370–43423.

Reiss, T., Cohen, N., Bergman, L., and Hoshen, Y. (2021).

Panda: Adapting pretrained features for anomaly de-

tection and segmentation. In Proceedings of the

IEEE/CVF Conference on Computer Vision and Pat-

tern Recognition, pages 2806–2814.

Rippel, O., Mertens, P., and Merhof, D. (2021). Model-

ing the distribution of normal data in pre-trained deep

features for anomaly detection. In 2020 25th Inter-

national Conference on Pattern Recognition (ICPR).

IEEE.

Roth, K., Pemula, L., Zepeda, J., Sch

olkopf, B., Brox, T.,

and Gehler, P. V. (2022). Towards total recall in in-

dustrial anomaly detection. In IEEE/CVF Conference

on Computer Vision and Pattern Recognition, CVPR

2022, pages 14298–14308. IEEE.

Rudolph, M., Wehrbein, T., Rosenhahn, B., and Wandt, B.

(2023). Asymmetric student-teacher networks for in-

dustrial anomaly detection. In IEEE/CVF Winter Con-

ference on Applications of Computer Vision, WACV

2023, pages 2591–2601. IEEE.

Schl

uter, H. M., Tan, J., Hou, B., and Kainz, B. (2022). Nat-

ural synthetic anomalies for self-supervised anomaly

detection and localization. In Avidan, S., Brostow,

G., Ciss

e, M., Farinella, G. M., and Hassner, T., edi-

tors, Computer Vision – ECCV 2022, pages 474–489,

Cham. Springer Nature Switzerland.

Feature Selection for Unsupervised Anomaly Detection and Localization Using Synthetic Defects

163

Seeb

ock, P., Orlando, J. I., Schlegl, T., Waldstein, S. M.,

Bogunovi

c, H., Klimscha, S., Langs, G., and Schmidt-

Erfurth, U. (2020). Exploiting epistemic uncertainty

of anatomy segmentation for anomaly detection in

retinal oct. IEEE Transactions on Medical Imaging,

39(1):87–98.

Sener, O. and Savarese, S. (2018). Active learning for con-

volutional neural networks: A core-set approach. In

6th International Conference on Learning Represen-

tations, ICLR 2018.

Tan, M. and Le, Q. (2019). Efﬁcientnet: Rethinking model

scaling for convolutional neural networks. In Interna-

tional conference on machine learning, pages 6105–

6114. PMLR.

Yu, J., Zheng, Y., Wang, X., Li, W., Wu, Y., Zhao, R.,

and Wu, L. (2021). Fastﬂow: Unsupervised anomaly

detection and localization via 2d normalizing ﬂows.

arXiv preprint arXiv:2111.07677v2.

Zagoruyko, S. and Komodakis, N. (2016). Wide residual

networks. In British Machine Vision Conference 2016.

British Machine Vision Association.

Zavrtanik, V., Kristan, M., and Sko

caj, D. (2022). Dsr

– a dual subspace re-projection network for surface

anomaly detection. In Avidan, S., Brostow, G., Ciss

M., Farinella, G. M., and Hassner, T., editors, Com-

puter Vision – ECCV 2022, pages 539–554, Cham.

Springer Nature Switzerland.

Zavrtanik, V., Kristan, M., and Sko

caj, D. (2021). Draem

- a discriminatively trained reconstruction embedding

for surface anomaly detection. In Proceedings of the

IEEE/CVF International Conference on Computer Vi-

sion (ICCV), pages 8330–8339.

Zhang, X., Li, S., Li, X., Huang, P., Shan, J., and Chen,

T. (2023). Destseg: Segmentation guided denoising

student-teacher for anomaly detection. In Proceedings

of the IEEE/CVF Conference on Computer Vision and

Pattern Recognition (CVPR), pages 3914–3923.

Zou, Y., Jeong, J., Pemula, L., Zhang, D., and Dabeer,

O. (2022). Spot-the-difference self-supervised pre-

training for anomaly detection and segmentation. In

Computer Vision – ECCV 2022, pages 392–408,

Cham. Springer Nature Switzerland.

APPENDIX

Implementation Details of Synthetic

Defects

For all three defect types D

V,bad

consists of the same

original images, only the synthetic defect type differs.

We do not put any prior knowledge on the real defects

into defect generation.

Simple and CutPaste. We parameterize the rectan-

gles such that they have a minimum distance to the

image borders of 20 pixels. Width and height are

randomly sampled and cover between ten and ﬁfty

percent of the image width and height, respectively.

Thus, it is not guaranteed that the defect appears on

the actual object in the image.

Natural Synthetic Anomalies (NSA). We make

use of the ofﬁcial implementation at https://github.

com/hmsch/natural-synthetic-anomalies with commit

id 9195916 and use all of the proposed parameters.

In particular, we distinguish between objects and tex-

tures for the Poisson blending. NSA (Schl

uter et al.,

2022) also estimates the background to ensure that de-

fects are generated on the objects. However, this pro-

cedure sometimes fails and defects can also occur in

the background. In each image, the number of gener-

ated defects is randomly sampled between 1 and 4 for

textures and objects screw and zipper and between 1

and 3 for the remaining objects.

Can We Reduce the Initial Set of Layers?

For the following experiments we trained on a re-

duced training set D

of the original training set

AD,train

and evaluate the performance on the original

test set S

AD,test

for each MVTec AD object category,

respectively. For comparability to our main experi-

ments, D

contains 70% of the original anomaly-free

training images.

Performance by Block and Layer Type. The per-

formance when exclusively using a single layer L for

feature extraction grouped by block for MVTec AD

object cable is provided in Figure A.1 for PatchCore.

In contrast to object grid (Figure 4a), features from

later layers are more suitable.

Figure A.1: AD metrics on real test set of MVTec AD ob-

ject cable. Shown is for each metric the mean with stan-

dard deviation over all layers belonging to same block in

Wide ResNet-50, trained with PatchCore.

Performance over Relative Receptive Field. The

performance averaged over all MVTec AD objects for

SimpleNet is shown in Figure A.2.

VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications

164

Table A.1: Average performance (P

) on the real test set of MVTec AD of the k best layers on the synthetic validation set for

PatchCore with a Wide ResNet-50 feature extractor using different synthetic defects. Results are averaged over three distinct

initial dataset splits D

and D

and compared to the average performance over all layers.

object Bottle Cable Capsule Hazelnut Carpet

k Simple CutPaste NSA Simple CutPaste NSA Simple CutPaste NSA Simple CutPaste NSA Simple CutPaste NSA

1 88.4 ± 0.0 94.1 ± 0.0 98.0 ± 0.0 77.1 ± 0.0 93.2 ± 0.0 93.9 ± 0.0 97.4 ± 0.0 91.7 ± 0.0 97.5 ± 0.0 85.4 ± 0.0 96.3 ± 0.0 97.4 ± 0.0 73.1 ± 0.0 89.2 ± 0.0 90.2 ± 0.0

2 72.8 ± 17.3 93.9 ± 0.4 98.1 ± 0.1 77.5 ± 19.1 92.9 ± 1.1 93.9 ± 0.7 97.2 ± 0.2 91.8 ± 0.4 97.3 ± 0.2 88.7 ± 5.1 96.4 ± 0.1 97.2 ± 0.2 71.5 ± 2.6 84.2 ± 6.2 90.9 ± 1.0

3 77.3 ± 15.9 93.9 ± 0.7 98.1 ± 0.2 82.8 ± 17.9 92.9 ± 1.0 93.5 ± 1.0 91.1 ± 10.5 91.6 ± 0.5 97.4 ± 0.3 89.3 ± 4.7 96.3 ± 0.3 97.3 ± 0.3 69.0 ± 4.3 84.9 ± 5.6 91.2 ± 1.3

4 77.5 ± 14.0 93.8 ± 0.8 98.0 ± 0.2 85.1 ± 16.3 93.2 ± 1.0 93.7 ± 0.9 92.7 ± 9.7 91.2 ± 1.4 97.5 ± 0.3 90.9 ± 5.2 96.3 ± 0.6 97.4 ± 0.3 74.3 ± 10.0 85.0 ± 10.3 91.2 ± 1.3

all 91.0 ± 8.0 81.3 ± 16.9 87.3 ± 10.7 93.9 ± 5.4 89.8 ± 6.6

object Pill Screw Toothbrush Transistor Zipper

k Simple CutPaste NSA Simple CutPaste NSA Simple CutPaste NSA Simple CutPaste NSA Simple CutPaste NSA

1 95.1 ± 0.0 88.4 ± 0.0 86.0 ± 0.0 59.2 ± 0.0 82.7 ± 0.0 82.1 ± 0.0 94.0 ± 0.0 83.6 ± 0.0 89.7 ± 0.0 58.3 ± 0.0 95.8 ± 0.0 86.1 ± 0.0 89.7 ± 0.0 94.8 ± 0.0 96.7 ± 0.0

2 95.5 ± 0.9 89.8 ± 2.1 87.9 ± 2.4 59.2 ± 13.0 82.4 ± 1.8 78.9 ± 7.8 93.5 ± 1.0 85.0 ± 1.5 88.6 ± 1.5 66.9 ± 11.7 95.8 ± 0.2 87.2 ± 3.1 88.4 ± 3.1 94.2 ± 0.6 96.6 ± 0.4

3 95.3 ± 0.8 90.4 ± 2.2 90.3 ± 5.3 64.6 ± 14.1 82.9 ± 2.4 78.8 ± 7.3 90.7 ± 5.2 85.4 ± 1.4 88.6 ± 1.5 65.9 ± 12.9 95.9 ± 0.2 86.7 ± 2.7 86.4 ± 4.5 93.9 ± 1.0 96.9 ± 0.8

4 94.7 ± 1.7 90.0 ± 2.2 91.0 ± 5.6 65.4 ± 13.6 83.2 ± 2.9 78.4 ± 6.4 91.9 ± 5.1 85.0 ± 1.7 88.2 ± 1.5 67.0 ± 15.9 95.9 ± 0.3 88.2 ± 3.5 85.6 ± 4.3 94.0 ± 0.9 96.8 ± 0.9

all 85.4 ± 10.6 72.0 ± 18.6 85.7 ± 8.4 80.9 ± 15.3 90.5 ± 8.2

object Grid Tile Wood

κ Simple CutPaste NSA Simple CutPaste NSA Simple CutPaste NSA

1 90.1 ± 0.0 84.9 ± 0.0 97.8 ± 0.0 87.5 ± 0.0 84.4 ± 0.0 97.6 ± 0.0 93.0 ± 0.0 91.0 ± 0.0 95.8 ± 0.0

2 90.6 ± 0.7 84.5 ± 0.6 97.4 ± 0.5 86.9 ± 1.8 85.0 ± 2.3 97.7 ± 0.1 92.7 ± 0.8 89.0 ± 2.7 95.7 ± 0.3

3 90.3 ± 0.8 84.4 ± 0.5 97.4 ± 0.5 89.1 ± 3.5 85.5 ± 2.3 97.1 ± 0.9 92.8 ± 0.7 89.9 ± 3.0 95.6 ± 0.3

4 89.4 ± 3.7 83.6 ± 1.6 97.2 ± 0.8 89.7 ± 4.2 85.6 ± 2.2 96.8 ± 1.0 92.7 ± 0.7 89.0 ± 3.0 95.6 ± 0.4

all 83.7 ± 12.4 88.6 ± 6.5 88.3 ± 8.6

Figure A.2: AD performance on the real test set over rela-

tive receptive ﬁeld size of all layers of Wide ResNet-50 and

DenseNet-201 when using the corresponding features for

SimpleNet. The mean values over all 15 MVTec AD ob-

jects are reported.

- Considering all experiments, the observed pat-

terns vary signiﬁcantly for different object categories

and methods such that a universal pre-selection of ex-

pressive features by block, layer type or relative re-

ceptive ﬁeld size is challenging.

Feature Selection Using Synthetic Defects

The results of our main experiment where we ap-

ply our feature selection strategy using different syn-

thetic defects for SimpleNet extracting features with

a single layer of a DenseNet-201 can be found in Ta-

ble A.2.

Table A.2: Harmonic mean P

on the real test set of

MVTec AD for the selected layers of each defect type, for

SimpleNet with a DenseNet-201 feature extractor. The last

column baseline is the average performance of all layers.

Only a single run was conducted.

object Simple CutPaste NSA baseline

bottle 94.2 93.0 96.7 87.0

cable 68.5 92.7 90.3 68.8

capsule 93.8 67.2 94.1 71.9

carpet 70.4 91.5 78.2 81.0

grid 97.0 88.3 97.8 67.2

hazelnut 93.7 96.3 96.8 79.1

leather 99.1 97.4 98.9 90.3

metal nut 86.1 92.3 93.9 73.9

pill 89.4 78.8 63.7 65.8

screw 77.4 73.9 60.0 55.4

tile 70.8 86.3 82.2 72.6

toothbrush 84.5 76.3 33.9 73.9

transistor 62.4 95.8 83.9 73.1

wood 84.9 84.4 85.6 70.6

zipper 95.5 95.4 96.9 83.7

mean 84.5 87.3 83.5 74.3

Ablation Studies

Generalization from Synthetic to Real Data. The

results of all MVTec AD objects for our ablation

study that validates the consistent generalization from

synthetic to real defects in our selection strategy, es-

pecially for defect types CutPaste and NSA, can be

found in Table A.1 for PatchCore extracting features

with a single Wide ResNet-50 layer.

Feature Selection for Unsupervised Anomaly Detection and Localization Using Synthetic Defects

165