The MVTec AD 2 Dataset: Advanced Scenarios for Unsupervised Anomaly Detection
In recent years, performance on existing anomaly detection benchmarks such as MVTec AD and VisA has started to saturate in terms of segmentation AU-PRO, with state-of-the-art models often separated by less than one percentage point. This lack of discriminatory power prevents a meaningful comparison of models and thus hinders progress in the field, especially given the inherently stochastic nature of machine learning results. We present MVTec AD 2, a collection of eight anomaly detection scenarios with more than 8000 high-resolution images. It comprises challenging and highly relevant industrial inspection use cases that have not been considered in previous datasets, including transparent and overlapping objects, dark-field and back light illumination, objects with high variance in the normal data, and extremely small defects. We provide comprehensive evaluations of state-of-the-art methods and show that their performance remains below 60% average AU-PRO. Additionally, our dataset provides test scenarios with lighting condition changes to assess the robustness of methods under real-world distribution shifts. We host a publicly accessible evaluation server that holds the pixel-precise ground truth of the test set (https://benchmark.mvtec.com/). All image data is available at https://www.mvtec.com/company/research/datasets/mvtec-ad-2.
💡 Research Summary
The paper addresses a critical stagnation in industrial visual anomaly detection benchmarks such as MVTec AD and VisA, where state‑of‑the‑art methods now achieve near‑perfect segmentation scores (AU‑PRO above 90 %). Because performance differences are often less than one percentage point, stochastic variation in training and hyper‑parameter tuning obscures genuine methodological advances. To break this impasse, the authors introduce MVTec AD 2, a new benchmark comprising eight carefully selected industrial scenarios and a total of 8,004 high‑resolution images (2.6–5 MP).
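To make the AU‑PRO metric discussed throughout this summary concrete, the sketch below reconstructs its usual definition: sweep thresholds over an anomaly map, average the per‑region overlap (PRO) across all ground‑truth defect regions, and integrate that curve over the false‑positive rate up to a limit. This is a simplified illustration, not the official evaluation code; the 0.3 FPR limit and the threshold count are assumptions chosen for the example.

```python
import numpy as np

def au_pro(anomaly_map, gt_labels, fpr_limit=0.3, n_thresholds=200):
    """Simplified sketch of AU-PRO.  `gt_labels` holds integer defect-region
    ids, with 0 marking normal pixels; at least one defect region is assumed.
    A perfect detector scores 1 after normalization by `fpr_limit`."""
    normal = gt_labels == 0
    regions = [gt_labels == r for r in np.unique(gt_labels) if r != 0]
    # Descending thresholds, so the false-positive rate increases monotonically.
    thresholds = np.linspace(anomaly_map.max(), anomaly_map.min(), n_thresholds)
    fprs, pros = [], []
    for t in thresholds:
        pred = anomaly_map >= t
        # FPR over normal pixels; PRO = mean overlap per defect region.
        fprs.append((pred & normal).sum() / max(normal.sum(), 1))
        pros.append(np.mean([(pred & reg).sum() / reg.sum() for reg in regions]))
    fprs, pros = np.array(fprs), np.array(pros)
    keep = fprs <= fpr_limit
    if keep.sum() < 2:
        return 0.0
    f, p = fprs[keep], pros[keep]
    # Trapezoidal integration of PRO over FPR, normalized by the FPR limit.
    area = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(f))
    return float(area / fpr_limit)
```

Averaging overlap per region (rather than per pixel) is what makes AU‑PRO sensitive to small defects: a method that misses a tiny anomaly is penalized as heavily as one that misses a large one.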
Key design aspects
- Challenging visual conditions – The dataset includes transparent objects (Vial), back‑lit jelly, dark‑field illuminated sheet metal, highly reflective cans, and bulk goods where objects overlap (Wall Plugs, Walnuts, Rice). These conditions generate complex reflections, refractions, and occlusions that are rarely present in existing benchmarks.
- Extremely small defects – Defects can be only a few pixels wide, requiring methods to process the images at full resolution. This forces methods to handle memory‑intensive feature extraction or adopt efficient patch‑based strategies.
- Defect distribution across the whole frame – Unlike MVTec AD and VisA, where anomalies are concentrated near the image centre, MVTec AD 2 deliberately places defects at the borders as well. This tests the robustness of models that rely on centre cropping or padding tricks.
- Multi‑lighting design – For each object, at least four illumination setups are captured (standard, back‑light, dark‑field, spot‑light). The test split contains a “mix” subset where lighting conditions are unseen during training, enabling a systematic study of distribution shift caused by illumination changes.
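The patch‑based strategies mentioned above can be illustrated with a simple tiling helper: processing a megapixel image as overlapping crops keeps peak memory bounded while preserving full resolution for pixel‑wide defects. The tile and overlap sizes below are illustrative assumptions, and `tile_image` is not part of any official tooling.

```python
import numpy as np

def tile_image(img, tile=512, overlap=64):
    """Split an image into overlapping square tiles.  Returns a list of
    ((y0, x0), tile) pairs; per-tile anomaly maps can later be stitched
    back together, e.g. by taking the maximum over overlapping pixels."""
    assert img.shape[0] >= tile and img.shape[1] >= tile
    step = tile - overlap
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - overlap, step):
        for x in range(0, w - overlap, step):
            # Clamp the last tile so it stays inside the image bounds.
            y0, x0 = min(y, h - tile), min(x, w - tile)
            tiles.append(((y0, x0), img[y0:y0 + tile, x0:x0 + tile]))
    return tiles
```

The overlap matters here: without it, a border‑straddling defect could be cut in half at a tile seam and scored as two even smaller anomalies.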
Dataset split and evaluation protocol
- Training and validation contain only normal images captured under the standard lighting.
- The public test set (TEST_pub) provides a small number of labeled images for quick sanity checks.
- Two large private test sets (TEST_priv and TEST_priv,mix) can only be evaluated through a publicly hosted benchmark server (https://benchmark.mvtec.com/). Ground‑truth masks are hidden, preventing any test‑set over‑fitting.
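A minimal helper for iterating such a split might look as follows. The directory layout assumed here (`<root>/<object>/<split>/.../*.png`) and the split names are hypothetical placeholders, not the dataset's documented structure; adjust them to the actual download.

```python
from pathlib import Path

def collect_images(root, obj, split):
    """Gather all PNG image paths for one object and split, assuming a
    hypothetical layout <root>/<object>/<split>/.../*.png.  The real
    MVTec AD 2 directory names may differ."""
    return sorted(Path(root, obj, split).rglob("*.png"))
```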
Empirical study
Seven recent unsupervised anomaly detection methods were re‑implemented under identical conditions: PatchCore, RD, RD++, EfficientAD, MSFlow, SimpleNet, and DSR. On established benchmarks these models routinely exceed 90 % AU‑PRO, but on MVTec AD 2 their scores collapse to a range of 46 %–59 %, with EfficientAD achieving the highest mean of 58.7 %. This dramatic drop demonstrates that current approaches, which often rely on memorising normal feature distributions or simple reconstruction errors, are ill‑suited for the combined challenges of high variability, complex illumination, and border defects.
Contributions and impact
- A genuinely hard benchmark – By integrating multiple sources of difficulty (transparent/reflective surfaces, overlapping objects, tiny defects, border anomalies, and lighting shifts), MVTec AD 2 forces researchers to move beyond marginal gains and develop methods that are robust, scalable, and applicable to real production lines.
- Robustness to distribution shift – The explicit lighting‑shift test set provides a controlled yet realistic scenario for evaluating domain‑adaptation or illumination‑invariant techniques, a topic that has received little systematic attention in the anomaly‑detection literature.
- Fair, reproducible evaluation – The hidden‑ground‑truth server eliminates the common practice of tuning hyper‑parameters on the test set, ensuring that reported improvements reflect true algorithmic progress.
Future research directions suggested by the authors include:
- Multi‑scale and memory‑efficient architectures capable of processing megapixel images without sacrificing fine‑grained detail.
- Data‑augmentation or self‑supervised strategies that explicitly model illumination variations (e.g., photometric invariance, contrastive learning across lighting conditions).
- Hybrid physical‑model‑based approaches that incorporate reflectance or refraction models for transparent and metallic surfaces.
- Loss functions that give special weight to border pixels or employ boundary‑aware regularisation.
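As a sketch of the illumination‑modelling direction in the list above, a simple photometric augmentation (random gain and gamma) could generate training views that mimic lighting shifts; two such views of the same image would form a positive pair for contrastive learning across lighting conditions. The parameter ranges are illustrative assumptions, not values from the paper.

```python
import numpy as np

def photometric_jitter(img, rng, gain=(0.6, 1.4), gamma=(0.7, 1.5)):
    """Randomly rescale brightness (gain) and the tone curve (gamma) of an
    image with values in [0, 1] to roughly mimic illumination changes.
    The gain/gamma ranges are illustrative, not tuned values."""
    g = rng.uniform(*gain)
    y = rng.uniform(*gamma)
    return np.clip(img ** y * g, 0.0, 1.0)
```

Such pixel‑wise jitter only approximates global brightness changes; the dataset's back‑light and dark‑field setups alter scene appearance in ways a gain/gamma model cannot capture, which is precisely why the authors argue for explicit illumination‑invariant training.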
In summary, MVTec AD 2 constitutes a substantial step forward for the field of unsupervised industrial visual anomaly detection. It re‑defines the benchmark landscape from “high accuracy on easy data” to “robust performance under realistic, multi‑factorial challenges,” thereby opening a fertile ground for innovative algorithms that can be deployed directly in manufacturing environments.