Vendi Novelty Scores for Out-of-Distribution Detection


Out-of-distribution (OOD) detection is critical for the safe deployment of machine learning systems. Existing post-hoc detectors typically rely on model confidence scores or likelihood estimates in feature space, often under restrictive distributional assumptions. In this work, we introduce a third paradigm and formulate OOD detection from a diversity perspective. We propose the Vendi Novelty Score (VNS), an OOD detector based on the Vendi Scores (VS), a family of similarity-based diversity metrics. VNS quantifies how much a test sample increases the VS of the in-distribution feature set, providing a principled notion of novelty that does not require density modeling. VNS is linear-time, non-parametric, and naturally combines class-conditional (local) and dataset-level (global) novelty signals. Across multiple image classification benchmarks and network architectures, VNS achieves state-of-the-art OOD detection performance. Remarkably, VNS retains this performance when computed using only 1% of the training data, enabling deployment in memory- or access-constrained settings.


💡 Research Summary

Out‑of‑distribution (OOD) detection is a prerequisite for deploying machine learning systems in safety‑critical settings, yet most post‑hoc detectors rely on either the model’s confidence scores (e.g., maximum softmax probability, energy) or on density estimation in the feature space (e.g., Mahalanobis distance, class‑conditional Gaussians). These approaches either make strong assumptions about the geometry of the learned representations or incur heavy computational overhead (e.g., k‑nearest‑neighbor searches, multiple forward‑backward passes).

The paper introduces a fundamentally different paradigm: measuring OOD novelty through diversity. The authors build on the Vendi Scores (VS), a family of similarity‑based diversity metrics originally proposed for evaluating dataset diversity. Given a set of ℓ₂‑normalized embeddings X∈ℝ^{N×D} and a positive‑semi‑definite kernel k, the VS of order q is defined as the exponential of the Rényi entropy of order q of the eigenvalues of the similarity matrix K = XXᵀ/N. For order q=2 the score simplifies to VS₂ = 1/Tr(K²), which can be computed efficiently when the cosine kernel is used because the non‑zero eigenvalues of K coincide with those of the D×D Gram matrix XᵀX/N. This reduces the computational cost from O(N³) to O(D²N + D³), and with a rank‑1 approximation even to O(D).

The Vendi Novelty Score (VNS) leverages this machinery to quantify how much a test sample x increases the diversity of the in‑distribution (ID) feature set. For each class c, the method first builds a class‑conditional density matrix ρ_c = X_cᵀX_c / N_c from the training embeddings belonging to that class. After adding the test embedding h(x) (ℓ₂‑normalized) the updated matrix is ρ’_c = (N_c ρ_c + h(x)h(x)ᵀ)/(N_c+1). The class‑conditional novelty is defined as the log‑ratio Δ_c(x) = log VS₂(ρ’_c) – log VS₂(ρ_c). By keeping only the largest eigenvalue λ_c and its eigenvector u_c of ρ_c, the authors derive a closed‑form O(D) expression for Δ_c(x) that depends on the projection α_c(x) = (u_cᵀ h(x))². This approximation avoids the instability of estimating a full eigenspectrum for small classes and empirically yields the best performance.
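One way to see where the O(D) expression comes from: for unit-norm h(x), the update gives Tr(ρ’_c²) = (N_c² Tr(ρ_c²) + 2N_c h(x)ᵀρ_c h(x) + 1)/(N_c+1)² exactly, and the rank-1 approximation replaces h(x)ᵀρ_c h(x) with λ_c α_c(x) and Tr(ρ_c²) with λ_c². The sketch below (my reading of the closed form, on synthetic data, not the paper's code) computes both the exact and the rank-1 Δ_c:

```python
import numpy as np

rng = np.random.default_rng(1)
N_c, D = 200, 16
Xc = rng.standard_normal((N_c, D))
Xc /= np.linalg.norm(Xc, axis=1, keepdims=True)
rho = Xc.T @ Xc / N_c                     # class-conditional density matrix

h = rng.standard_normal(D)
h /= np.linalg.norm(h)                    # normalized test embedding

# Exact novelty: Delta_c = log VS2(rho') - log VS2(rho), VS2(rho) = 1/Tr(rho^2)
rho_new = (N_c * rho + np.outer(h, h)) / (N_c + 1)
delta_exact = np.log(np.trace(rho @ rho)) - np.log(np.trace(rho_new @ rho_new))

# Rank-1 approximation: keep only the top eigenpair (lam, u) of rho.
# With rho ~ lam * u u^T and alpha = (u^T h)^2, Tr(rho'^2) expands to
# (N^2 lam^2 + 2 N lam alpha + 1) / (N+1)^2, an O(D) computation per class.
lam_all, U = np.linalg.eigh(rho)          # eigh returns ascending eigenvalues
lam, u = lam_all[-1], U[:, -1]
alpha = (u @ h) ** 2                      # alpha_c(x)
tr_new_approx = (N_c**2 * lam**2 + 2 * N_c * lam * alpha + 1) / (N_c + 1) ** 2
delta_approx = np.log(lam**2) - np.log(tr_new_approx)
```

For classes whose spectrum is dominated by one direction the two quantities agree closely; for flatter spectra the rank-1 form trades exactness for stability, which is the trade-off the paper argues works in its favor.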

To turn the per‑class novelty scores into a single OOD metric, the classifier’s predictive distribution p_c(x) is used as a weighting scheme. The top‑K classes with highest probabilities are selected (set T_K(x)), and a hyper‑parameter γ ≥ 0 controls the sharpness of the weighting. The local OOD score is

S_LOCAL‑OOD(x) = Σ_{c∈T_K(x)} N_c p_c(x)^γ Δ_c(x).

This formulation mirrors recent probability‑weighted aggregations (e.g., GEN) but replaces confidence‑based terms with diversity‑based novelty contributions.
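The aggregation above can be sketched as a single function; the rank-1 form of Δ_c follows my reading of the closed-form update (λ_c, u_c are each class's precomputed top eigenpair, and the argument names are illustrative, not the paper's API):

```python
import numpy as np

def local_ood_score(h, probs, lam, U, counts, K=5, gamma=1.0):
    """Probability-weighted sum of per-class rank-1 novelty scores.

    h:      (D,) l2-normalized test embedding
    probs:  (C,) classifier softmax probabilities p_c(x)
    lam:    (C,) top eigenvalue lam_c of each class density matrix rho_c
    U:      (C, D) corresponding top eigenvectors u_c
    counts: (C,) class sizes N_c
    """
    top_k = np.argsort(probs)[-K:]                      # T_K(x): top-K classes
    score = 0.0
    for c in top_k:
        alpha = (U[c] @ h) ** 2                         # alpha_c(x) = (u_c^T h)^2
        n = counts[c]
        # Rank-1 form of Tr(rho'_c^2); Delta_c is the log-ratio of VS2 values
        tr_new = (n**2 * lam[c]**2 + 2 * n * lam[c] * alpha + 1) / (n + 1) ** 2
        delta = np.log(lam[c] ** 2) - np.log(tr_new)
        score += n * probs[c] ** gamma * delta          # N_c p_c(x)^gamma Delta_c(x)
    return score
```

In use, the per-class eigenpairs are computed once from the training embeddings, so scoring a test sample costs O(KD).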

In addition to local novelty, the authors incorporate a global diversity term that captures how the test sample affects the overall dataset’s diversity. The global density matrix ρ_global = XᵀX / N is used to compute VS_∞ = 1/λ_max(ρ_global). By applying a first‑order rank‑one update (Proposition 3.1), the change in the largest eigenvalue after adding h(x) can be approximated as λ_max(ρ’_global) ≈ (N λ_max(ρ_global) + (u_maxᵀ h(x))²)/(N+1). This yields a global novelty Δ_global(x) that is scaled by the total number of training points N, leading to

S_GLOBAL‑OOD(x) = – N log ( λ_max(ρ’_global) / λ_max(ρ_global) ),

i.e., N times the gain in log VS_∞ from adding the test sample.
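A numerical sketch of the global term on synthetic data (the sign and N-scaling follow my reading of the definitions above, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 1000, 16
X = rng.standard_normal((N, D))
X /= np.linalg.norm(X, axis=1, keepdims=True)
rho = X.T @ X / N                          # global density matrix

lam_all, U = np.linalg.eigh(rho)           # ascending eigenvalues
lam_max, u_max = lam_all[-1], U[:, -1]

h = rng.standard_normal(D)
h /= np.linalg.norm(h)                     # normalized test embedding

# First-order rank-one update of the top eigenvalue after adding h:
# lam_max(rho') ~ (N lam_max + (u_max^T h)^2) / (N + 1)
lam_max_new = (N * lam_max + (u_max @ h) ** 2) / (N + 1)

# Global score via VS_inf = 1/lam_max, scaled by N (my reading of the formula)
s_global = -N * np.log(lam_max_new / lam_max)
```

Because the perturbation h(x)h(x)ᵀ/(N+1) shrinks with N, the first-order estimate tracks the exact top eigenvalue of the updated matrix closely for large training sets.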

