An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination
Unsupervised anomaly detection (AD) methods typically assume clean training data, yet real-world datasets often contain undetected or mislabeled anomalies, leading to significant performance degradation. Existing solutions require access to the training pipeline, the training data, or prior knowledge of the proportion of anomalies in the data, limiting their real-world applicability. To address this challenge, we propose EPHAD, a simple yet effective test-time adaptation framework that updates the outputs of AD models trained on contaminated datasets using evidence gathered at test time. Our approach integrates the prior knowledge captured by the AD model trained on contaminated datasets with evidence derived from multimodal foundation models like Contrastive Language-Image Pre-training (CLIP), classical AD methods like the Local Outlier Factor (LOF), or domain-specific knowledge. We illustrate the intuition behind EPHAD using a synthetic toy example and validate its effectiveness through comprehensive experiments across eight visual AD datasets, twenty-six tabular AD datasets, and a real-world industrial AD dataset. Additionally, we conduct an ablation study to analyse hyperparameter influence and robustness to varying contamination levels, demonstrating the versatility and robustness of EPHAD across diverse pairings of AD models and evidence functions. To ensure reproducibility, our code is publicly available at https://github.com/sukanyapatra1997/EPHAD.
💡 Research Summary
The paper tackles a practical yet under‑explored problem in unsupervised anomaly detection (AD): training data are often contaminated with undetected anomalies, which biases the learned model and degrades detection performance. Existing remedies require access to the training pipeline, knowledge of the contamination ratio, or expensive re‑training, making them unsuitable for real‑world deployments where proprietary models are used as black boxes.
To address this gap, the authors propose EPHAD (Evidence‑Based Post‑hoc Adjustment Framework for Anomaly Detection), a test‑time adaptation method that adjusts the outputs of a pre‑trained AD model without touching its training data or architecture. The core idea is to introduce an evidence function T(x) that assigns higher scores to samples believed to be normal. Evidence can be derived from multimodal foundation models such as CLIP (image‑text similarity), classical AD techniques like Local Outlier Factor (LOF), or domain‑specific heuristics.
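To make the evidence function concrete, here is a minimal numpy-only sketch of one plausible choice: the (standardised) negative distance to the k-th nearest training point, which is higher for samples that look normal. This is a lightweight stand-in for the LOF evidence named above; all function and variable names are illustrative, not taken from the EPHAD codebase, and sklearn's `LocalOutlierFactor(novelty=True)` could be dropped in for the real LOF.

```python
import numpy as np

def knn_evidence(train_x, test_x, k=10):
    """Evidence T(x): higher scores for samples believed to be normal.

    Numpy-only stand-in for LOF: negative distance to the k-th nearest
    training point, standardised to zero mean and unit variance.
    """
    # Pairwise Euclidean distances between test and training points.
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=-1)
    kth = np.sort(d, axis=1)[:, k - 1]   # distance to the k-th neighbour
    t = -kth                             # closer to the training data = more normal
    return (t - t.mean()) / (t.std() + 1e-8)

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))            # mostly normal points
test = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),  # normal test points
                  rng.normal(6.0, 1.0, size=(5, 2))])  # clear anomalies
T = knn_evidence(train, test)
print(T[:50].mean() > T[50:].mean())   # normals receive higher evidence
```

Any scoring function with this "higher means more normal" convention, including CLIP image-text similarity against a prompt describing the normal class, can play the same role.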
Mathematically, if the contaminated model estimates a density f±(x) (or an equivalent score‑based density), EPHAD constructs a revised density f̌±(x) = f±(x) · exp(T(x)/β) / Z_β, where Z_β is a normalising constant and β is a temperature parameter controlling the trade‑off between fidelity to the original model and alignment with the evidence. The authors prove that, under a mild condition (the expected log‑evidence under the true normal distribution is positive), the KL divergence between the true normal density f₊ and the adjusted density f̌± is strictly smaller than that between f₊ and the original contaminated density f±. This guarantees that the adjustment moves the model closer to the true normal distribution.
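The KL-improvement claim can be checked numerically on a 1-D toy example in the spirit of the paper's synthetic illustration. The specific densities and evidence function below are my own assumptions for illustration: the true normal density is N(0, 1), the contaminated estimate mixes in an anomaly mode at x = 4, and the evidence is a crude proxy favouring the normal region.

```python
import numpy as np

xs = np.linspace(-8.0, 12.0, 4001)
dx = xs[1] - xs[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

f_plus = gauss(xs, 0.0, 1.0)                     # true normal density f+
f_pm = 0.8 * f_plus + 0.2 * gauss(xs, 4.0, 1.0)  # contaminated estimate f±

# Illustrative evidence favouring the normal region (an assumption).
T = -0.5 * xs ** 2
beta = 2.0

f_adj = f_pm * np.exp(T / beta)   # revised density, up to Z_beta
f_adj /= f_adj.sum() * dx         # normalise (this plays the role of Z_beta)

def kl(p, q):
    mask = p > 1e-12
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

print(kl(f_plus, f_adj) < kl(f_plus, f_pm))   # adjustment moves closer to f+
```

The adjusted density suppresses the spurious anomaly mode, so its KL divergence from the true normal density drops, exactly as the theorem predicts for positive expected log-evidence.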
For high‑dimensional data where explicit density estimation is infeasible, the framework is extended to score‑based AD. The original anomaly score s_out(x) is combined with the evidence term, yielding a new score that can be thresholded in the usual way. The temperature β again modulates the influence of the evidence.
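At the score level, one natural form consistent with the density adjustment is to subtract the scaled evidence: since the adjusted log-density is log f±(x) + T(x)/β − log Z_β, and anomaly scores behave like negative log-densities, the evidence term enters with a minus sign. The exact formula in the paper may differ; the sketch below is an illustration of that reasoning, with made-up toy scores.

```python
import numpy as np

def adjust_scores(s_out, evidence, beta=1.0):
    """Score-level analogue of the density adjustment.

    Anomaly scores act like negative log-densities, so adding T(x)/beta
    to the log-density corresponds to subtracting it from the score.
    """
    return np.asarray(s_out, dtype=float) - np.asarray(evidence, dtype=float) / beta

# Toy example: the contaminated model barely separates the last two
# (anomalous) samples, but the evidence corrects the ranking.
s_out = np.array([0.10, 0.20, 0.30, 0.25])   # last two samples are anomalies
T = np.array([1.0, 0.9, -0.8, -1.2])         # evidence: higher = more normal
s_adj = adjust_scores(s_out, T, beta=0.5)
print(s_adj)   # anomalies now score well above the normal samples
```

The adjusted scores can then be thresholded exactly as the original ones, so the method slots into an existing deployment without retraining.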
Empirical evaluation spans eight visual AD benchmarks (including MVTec‑AD), twenty‑six tabular datasets, and a real‑world industrial dataset. Five representative AD models (DeepSVDD, GOAD, CS‑Flow, PatchCore, etc.) are tested with three evidence sources (CLIP similarity, LOF, and handcrafted domain rules). Across the board, EPHAD improves the Area Under the ROC Curve (AUC) by 3–12 percentage points compared to the unadjusted model, with larger gains at higher contamination levels (≥20 %). Ablation studies confirm that the combination of model predictions and evidence outperforms either source alone, and that performance is robust to reasonable variations of β and evidence weighting. Importantly, on clean (non‑contaminated) data the method does not cause noticeable degradation, demonstrating safe applicability.
The paper positions EPHAD within the broader literature on test‑time adaptation, drawing parallels to KL‑regularized policy updates used in generative model fine‑tuning. By requiring only black‑box access to the AD model and a lightweight evidence function, EPHAD offers a practical, computationally cheap solution for scenarios where data contamination cannot be eliminated or quantified. The authors release their code on GitHub, facilitating reproducibility and encouraging adoption in both academic and industrial settings.