Concept-Based Visual Anomaly Detection with Semantic Explanations

Reading time: 6 minutes

📝 Abstract

In recent years, Visual Anomaly Detection (VAD) has gained significant attention due to its ability to identify anomalous images using only normal images during training. Many VAD models work without supervision but are still able to provide visual explanations by highlighting the anomalous regions within an image. However, although these visual explanations can be helpful, they lack a direct and semantically meaningful interpretation for users. To address this limitation, we propose extending Concept Bottleneck Models (CBMs) to the VAD setting. By learning meaningful concepts, the network can provide human-interpretable descriptions of anomalies, offering a novel and more insightful way to explain them. Our contributions are threefold: (i) we develop a Concept Dataset to support research on CBMs for VAD; (ii) we improve the CBM architecture to generate both concept-based and visual explanations, bridging semantic and localization interpretability; and (iii) we introduce a pipeline for synthesizing artificial anomalies, preserving the VAD paradigm of minimizing dependence on rare anomalous samples. Our approach, Concept-Aware Visual Anomaly Detection (CONVAD), achieves performance comparable to classic VAD methods while providing richer, concept-driven explanations that enhance interpretability and trust in VAD systems.

📄 Content

Visual Anomaly Detection (VAD) is a computer vision task that aims to distinguish normal from anomalous images while also highlighting the pixels responsible for the anomaly. VAD has applications in various fields, including, but not limited to, manufacturing, medicine, and surveillance [1], [2], [3]. However, collecting an annotated dataset for training VAD models in a supervised way is costly and time-consuming, especially in the manufacturing domain, since the dataset must comprise many images covering a wide range of defect types. For this reason, many recent VAD models are trained under an unsupervised paradigm, which requires the training set to contain only normal images. Although VAD models can produce anomaly segmentation masks and thus provide interpretable results, it is questionable whether these visual insights are sufficient, as they fail to deliver a human-understandable description of the anomaly and of the problematic image as a whole. In this work, we address the lack of interpretability in VAD models by considering a different VAD paradigm based on Concept Bottleneck Models (CBMs) [4]. Incorporating CBMs into the VAD framework enables it to provide not only image- and pixel-level labels but also a set of concepts that the end user can exploit to understand model predictions and enhance the decision-making process. In addition, thanks to CBMs, users can intervene on the intermediate features (concepts), helping the model reach the correct final prediction and enabling seamless human-machine collaboration, which is not possible with classic VAD models.

Fig. 1: In a fully supervised setting, CONVAD achieves performance comparable to established VAD models while additionally offering interpretable, concept-based explanations. It also remains competitive in settings where anomalous images are scarce.
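To make the CBM idea concrete, here is a minimal numpy sketch of a concept bottleneck with an intervention step. This is an illustration of the general CBM structure (features → concept scores → label), not the paper's actual architecture; the layer shapes, the sigmoid heads, and the `interventions` dictionary are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ConceptBottleneck:
    """Minimal CBM sketch: features -> concept scores -> anomaly label."""

    def __init__(self, n_features, n_concepts):
        # Hypothetical (untrained) weights, for structure only.
        self.W_c = rng.normal(0.0, 0.1, size=(n_features, n_concepts))  # feature -> concept
        self.w_y = rng.normal(0.0, 0.1, size=n_concepts)                # concept -> label

    def predict(self, x, interventions=None):
        concepts = sigmoid(x @ self.W_c)   # human-interpretable scores in [0, 1]
        if interventions:
            # A user can override a concept score before the label head runs.
            for idx, value in interventions.items():
                concepts[idx] = value
        label = sigmoid(concepts @ self.w_y)  # anomaly probability
        return concepts, label

model = ConceptBottleneck(n_features=8, n_concepts=3)
x = rng.normal(size=8)
concepts, score = model.predict(x)
# Intervention: force concept 0 to "present" and recompute the label.
_, corrected = model.predict(x, interventions={0: 1.0})
```

The key property is that the label depends on the input only through the concept scores, which is what makes user interventions on those scores meaningful.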

This work, to the best of our knowledge, is the first to test the adaptation of CBMs to the Visual Anomaly Detection scenario. This adaptation is not straightforward, and a series of challenges have been tackled: (i) CBM models require a Concept Dataset for training, where each sample is annotated with a set of concepts from a predefined concept vocabulary. However, no such dataset currently exists for VAD. Therefore, to address this gap, we propose an automated pipeline for extracting meaningful concepts and annotating state-of-the-art industrial datasets with them, leveraging the capabilities of Vision Language Models (VLMs). (ii) Unsupervised VAD approaches have the advantage of providing a pixel-level visualization of the anomalous area, which standard CBMs do not, since they are limited to sample-level predictions. We thus extend the CBM architecture to provide not only concept-based explanations but also visual explanations, using the student-teacher paradigm alongside the concept extractor. (iii) Current state-of-the-art VAD models assume that only normal images are available during training, but CBMs need both normal and anomalous images to be trained. We introduce a pipeline for generating synthetic anomalies, enabling effective concept learning while minimizing dependence on rare and difficult-to-acquire anomalous samples.
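Challenge (iii) above relies on synthesizing artificial anomalies from normal images. As a rough illustration, a simple cut-and-paste corruption in the style of common synthetic-anomaly augmentations can be sketched as follows; this is not the paper's generation pipeline, and the patch size and placement strategy are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def synthesize_anomaly(normal_img, patch_size=8):
    """Cut a random patch from a normal image and paste it at another
    location, yielding an artificial defect plus its pixel-level mask."""
    img = normal_img.copy()
    h, w = img.shape[:2]
    # Random source and destination corners for the patch.
    sy, sx = rng.integers(0, h - patch_size), rng.integers(0, w - patch_size)
    dy, dx = rng.integers(0, h - patch_size), rng.integers(0, w - patch_size)
    patch = normal_img[sy:sy + patch_size, sx:sx + patch_size]
    img[dy:dy + patch_size, dx:dx + patch_size] = patch
    # Ground-truth mask marking the synthetic anomalous region.
    mask = np.zeros((h, w), dtype=bool)
    mask[dy:dy + patch_size, dx:dx + patch_size] = True
    return img, mask

normal = rng.random((32, 32))
anomalous, mask = synthesize_anomaly(normal)
```

Because each synthetic anomaly comes with its own mask, such samples can supervise both the concept head and the pixel-level explanation without requiring real defective images.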

The rest of the paper is organized as follows: Sec. II reviews relevant VAD literature, while Sec. III explains in detail the CBM architecture and training methodologies. Sec. IV presents our method, Concept-Aware VAD (CONVAD), which builds on CBMs and adapts them to the VAD setting, solving the previously discussed problems. Sec. V outlines the experimental setting with implementation details and the considered VAD scenarios. Sec. VI discusses the results. Finally, Sec. VII suggests directions for future work.

A significant amount of research has been conducted in the Visual Anomaly Detection (VAD) domain in recent years, leading to the development of numerous VAD models. Most VAD methods can be broadly categorized into the following three groups:

1. Feature-based Methods: these approaches rely on representations extracted from pre-trained neural networks. By leveraging the rich semantic and structural information encoded in these features, anomalies can be detected as deviations from normal patterns. Well-known feature-based methods include PatchCore [5], STFPM [6], FastFlow [7], and PaDiM [8].

2. Reconstruction-based Methods: these methods operate under the assumption that models trained solely on normal data will struggle to accurately reconstruct anomalous regions. During inference, reconstruction errors highlight potential anomalies. Examples of such methods include AnoGAN [9], f-AnoGAN [10], and UniAD [11]. A downside compared with feature-based models is the need to train a generative model, which can be computationally expensive and resource-intensive.

3. Synthetic Anomaly Methods: these methods augment the original dataset by generating synthetic anomalies.
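The feature-based family above can be illustrated with a tiny numpy sketch: score a test feature by its distance to the nearest features collected from normal training images. This is only a toy in the spirit of memory-bank methods such as PatchCore, not any of the cited implementations; the feature dimensions, bank size, and `k` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(7)

def anomaly_score(feature, memory_bank, k=3):
    """Mean distance to the k nearest stored normal features:
    large distances signal a deviation from normal patterns."""
    dists = np.linalg.norm(memory_bank - feature, axis=1)
    return float(np.sort(dists)[:k].mean())

# Memory bank of features extracted from normal images (toy cluster).
bank = rng.normal(0.0, 0.1, size=(200, 16))

normal_feat = rng.normal(0.0, 0.1, size=16)     # lies inside the normal cluster
anomalous_feat = rng.normal(3.0, 0.1, size=16)  # lies far from the cluster

assert anomaly_score(anomalous_feat, bank) > anomaly_score(normal_feat, bank)
```

Applying the same scoring per spatial patch rather than per image is what lets such methods localize anomalies at the pixel level.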

This content is AI-processed based on ArXiv data.
