Visual Anomaly Detection and Localization with a Patch-Wise Transformer and Convolutional Model

Afshin Dini, Esa Rahtu

2023

Abstract

We present a one-class classification approach for detecting and localizing anomalies in vision applications that combines convolutional networks and transformers. The method uses a pre-trained model with four blocks of patch-wise transformer encoders and convolutional layers to extract patch embeddings from normal samples. The patch features from the third and fourth blocks of the model are combined to form the final representations, and several multivariate Gaussian distributions are then fitted to these normal embeddings. At test time, irregularities are detected and localized by thresholding an anomaly score and an anomaly map, both computed from the Mahalanobis distances between the patch embeddings of test samples and the corresponding normal distributions. Evaluating the proposed method on the MVTec dataset, we find that it not only detects anomalies effectively, owing to the ability of the convolutional and transformer layers to capture the local and global properties of an image, respectively, but is also computationally efficient, since it skips the training phase by using a pre-trained network as the feature extractor. These properties make our method a good candidate for detecting and localizing irregularities in real-world industrial applications.
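The scoring pipeline described in the abstract (combine block-3 and block-4 patch features, fit Gaussians to normal embeddings, score test patches by Mahalanobis distance, then threshold) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the features have already been extracted as arrays, that block 4 has a coarser patch grid than block 3 and is upsampled by nearest-neighbour repetition, that one Gaussian is fitted per patch position, that a small `eps * I` term regularizes each covariance, and that the image score is the maximum patch score. The helper names `combine_block_features`, `fit_patch_gaussians`, `anomaly_map` and `detect`, as well as `eps` and `threshold`, are hypothetical.

```python
import numpy as np


def combine_block_features(f3, f4):
    """Concatenate block-3 and block-4 patch features along the channel axis.

    Hypothetical helper: assumes f3 is (N, C3, H, W) and f4 is (N, C4, h, w)
    with H and W integer multiples of h and w; f4 is upsampled by
    nearest-neighbour repetition to match f3's patch grid.
    """
    _, _, h3, w3 = f3.shape
    _, _, h4, w4 = f4.shape
    f4_up = np.repeat(np.repeat(f4, h3 // h4, axis=2), w3 // w4, axis=3)
    return np.concatenate([f3, f4_up], axis=1)


def fit_patch_gaussians(train_feats, eps=0.01):
    """Fit one multivariate Gaussian per patch position from normal samples.

    Returns per-patch means (C, P) and inverse covariances (P, C, C);
    eps * I is an assumed regularizer that keeps each covariance invertible.
    """
    n, c, h, w = train_feats.shape
    x = train_feats.reshape(n, c, h * w)
    mean = x.mean(axis=0)
    inv_cov = np.empty((h * w, c, c))
    for p in range(h * w):
        cov = np.cov(x[:, :, p], rowvar=False) + eps * np.eye(c)
        inv_cov[p] = np.linalg.inv(cov)
    return mean, inv_cov


def anomaly_map(test_feats, mean, inv_cov):
    """Mahalanobis distance of every test patch to its normal Gaussian, (N, H, W)."""
    n, c, h, w = test_feats.shape
    diff = test_feats.reshape(n, c, h * w) - mean[None]
    # Squared distance d^T Sigma^{-1} d, evaluated per image and per patch.
    m2 = np.einsum("ncp,pcd,ndp->np", diff, inv_cov, diff)
    return np.sqrt(np.maximum(m2, 0.0)).reshape(n, h, w)


def detect(amap, threshold):
    """Image-level decision from the max patch score; pixel-level mask from the map."""
    image_scores = amap.reshape(amap.shape[0], -1).max(axis=1)
    return image_scores > threshold, amap > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for block-3 (8x8 grid) and block-4 (4x4 grid) features.
    train = combine_block_features(rng.normal(size=(50, 4, 8, 8)),
                                   rng.normal(size=(50, 6, 4, 4)))
    mean, inv_cov = fit_patch_gaussians(train)
    test = train[:2].copy()
    test[1, :, 3, 3] += 10.0  # synthetic defect in one patch of image 1
    amap = anomaly_map(test, mean, inv_cov)
    print(detect(amap, threshold=15.0)[0])
```

Because only the Gaussian parameters are estimated from the normal features, no network weights are updated, which is consistent with the abstract's point that the pre-trained extractor lets the method skip a training phase. How the threshold is chosen is not specified here and would need to be set on validation data.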



Paper Citation


in Harvard Style

Dini A. and Rahtu E. (2023). Visual Anomaly Detection and Localization with a Patch-Wise Transformer and Convolutional Model. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP; ISBN 978-989-758-634-7, SciTePress, pages 144-152. DOI: 10.5220/0011669400003417


in Bibtex Style

@conference{visapp23,
author={Afshin Dini and Esa Rahtu},
title={Visual Anomaly Detection and Localization with a Patch-Wise Transformer and Convolutional Model},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP},
year={2023},
pages={144-152},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011669400003417},
isbn={978-989-758-634-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP
TI - Visual Anomaly Detection and Localization with a Patch-Wise Transformer and Convolutional Model
SN - 978-989-758-634-7
AU - Dini A.
AU - Rahtu E.
PY - 2023
SP - 144
EP - 152
DO - 10.5220/0011669400003417
PB - SciTePress