Extremal Contours: Gradient-driven contours for compact visual attribution


Faithful yet compact explanations for vision models remain a challenge, as commonly used dense perturbation masks are often fragmented and overfitted, requiring careful post-processing. Here, we present a training-free explanation method that replaces dense masks with smooth, tunable contours. A star-convex region is parameterized by a truncated Fourier series and optimized under an extremal preserve/delete objective using classifier gradients. The approach guarantees a single, simply connected mask, cuts the number of free parameters by orders of magnitude, and yields stable boundary updates without cleanup. Restricting solutions to low-dimensional, smooth contours makes the method robust to adversarial masking artifacts. On ImageNet classifiers, it matches the extremal fidelity of dense masks while producing compact, interpretable regions with improved run-to-run consistency. Explicit area control also enables importance contour maps, yielding transparent fidelity-area profiles. Finally, we extend the approach to multiple contours and show how it can localize multiple objects within the same framework. Across benchmarks, the method achieves higher relevance mass and lower complexity than gradient- and perturbation-based baselines, with especially strong gains on self-supervised DINO models, where it improves relevance mass by over 15% and maintains positive faithfulness correlations.


💡 Research Summary

The paper “Extremal Contours: Gradient-driven contours for compact visual attribution” introduces a novel paradigm for explaining vision models by replacing traditional dense perturbation masks with smooth, parameterized contours. The core motivation addresses the limitations of existing methods, such as fragmented, overfitted masks that require extensive post-processing and lack topological guarantees.

The proposed method represents an explanation as a single, simply connected star-convex region. This region is defined by a smooth closed contour parameterized by a low-dimensional truncated Fourier series relative to a learnable center point. This formulation reduces the number of free parameters by orders of magnitude compared to pixel-wise masks. The contour is converted into a soft mask via a sigmoid function. The mask is then used to create two perturbed versions of the input image: a “preserved” variant where the inside of the contour is kept clear and the outside is blurred, and a “deleted” variant with the opposite treatment.
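The star-convex parameterization above can be sketched in a few lines: each pixel's angle and radius relative to the center are compared against a Fourier-series boundary radius, and a sigmoid turns the signed radial gap into a soft mask. This is a minimal NumPy illustration, not the paper's implementation; the function name, coefficient layout, and the temperature `tau` are assumptions for clarity.

```python
import numpy as np

def star_convex_mask(h, w, center, coeffs, r0, tau=2.0):
    """Soft mask for a star-convex region (illustrative sketch).

    The boundary radius r(theta) is a truncated Fourier series around a
    learnable center; each pixel passes through a sigmoid of the radial
    gap r(theta) - rho, giving ~1 inside the contour and ~0 outside.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - center[0], xs - center[1]
    theta = np.arctan2(dy, dx)   # angle of each pixel w.r.t. the center
    rho = np.hypot(dy, dx)       # radial distance of each pixel
    # r(theta) = r0 + sum_k [a_k * cos(k*theta) + b_k * sin(k*theta)]
    r = np.full((h, w), r0, dtype=float)
    for k, (a_k, b_k) in enumerate(coeffs, start=1):
        r += a_k * np.cos(k * theta) + b_k * np.sin(k * theta)
    # Sigmoid of the radial gap yields a differentiable soft mask.
    return 1.0 / (1.0 + np.exp(-(r - rho) / tau))
```

With, say, two harmonics the free parameters are just the center, the base radius, and four Fourier coefficients, which is the orders-of-magnitude reduction relative to a per-pixel mask that the summary describes.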

Optimization is driven by an extremal principle objective. The loss function encourages the feature embedding of the preserved variant to remain similar to the original image’s embedding, while pushing the embedding of the deleted variant away from it. This is complemented by an adaptive area penalty term that promotes compactness and a spectral regularization term that penalizes high-frequency oscillations in the contour shape, ensuring smooth boundaries. The entire system is optimized end-to-end for a single image using gradient descent, without requiring model retraining or dataset-level optimization.
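The objective described above combines four terms. The sketch below evaluates one forward pass of such a loss with cosine similarities on precomputed embeddings; the exact weighting, similarity measure, and penalty forms are assumptions, and the real method backpropagates this loss through the classifier to update the contour parameters.

```python
import numpy as np

def cos_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def extremal_loss(emb_orig, emb_preserved, emb_deleted,
                  mask, coeffs, target_area=0.2,
                  lam_area=1.0, lam_spec=0.1):
    """Illustrative extremal preserve/delete objective (not the paper's code).

    - preserve: keep the preserved variant's embedding close to the original
    - delete:   push the deleted variant's embedding away from the original
    - area:     penalize deviation of the mask area from a target fraction
    - spectral: penalize high-frequency Fourier coefficients (smooth boundary)
    """
    preserve = 1.0 - cos_sim(emb_orig, emb_preserved)
    delete = cos_sim(emb_orig, emb_deleted)
    area = (mask.mean() - target_area) ** 2
    # Weight harmonic k by k^2 so wiggly contours cost more than smooth ones.
    spec = sum((k ** 2) * (a * a + b * b)
               for k, (a, b) in enumerate(coeffs, start=1))
    return preserve + delete + lam_area * area + lam_spec * spec
```

Because the contour has so few parameters, plain gradient descent on this loss for a single image converges without any model retraining, matching the training-free setup the summary describes.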

Comprehensive evaluations are conducted on ImageNet and COCO datasets using both supervised (ResNet-50) and self-supervised (DINO ViT-B/16) models. Quantitative metrics assessing localization (relevance rank/mass), complexity (entropy, sparseness), and faithfulness show that Extremal Contours achieve competitive or superior performance compared to strong baselines like Gradient SHAP, Integrated Gradients, Smooth Mask, and Grad-CAM++. Notably, the method delivers particularly strong gains on the self-supervised DINO model, improving relevance mass by over 15% while maintaining positive faithfulness correlations where some baselines fail. Qualitatively, the generated explanations are compact, smooth, and intuitively interpretable as they cleanly enclose the object of interest, contrasting with the often diffuse or fragmented attributions of other methods.

The paper further demonstrates the robustness of the method to initialization and hyperparameter settings, its ability to generate fidelity-area trade-off profiles by controlling the target mask area, and outlines a natural extension to multiple contours for handling images with several objects. In summary, Extremal Contours offer a structured, efficient, and robust approach for visual attribution that successfully balances explanation fidelity, compactness, and stability.
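The fidelity-area trade-off profile mentioned above amounts to re-optimizing the contour at a sweep of target areas and recording the resulting fidelity. This sketch only shows that outer loop; `optimize_contour` and `fidelity` are hypothetical stand-ins for the paper's contour optimizer and classifier-based fidelity score.

```python
def fidelity_area_profile(target_areas, optimize_contour, fidelity):
    """Sweep target mask areas and record fidelity at each (sketch).

    `optimize_contour(a)` is a stand-in that returns a mask optimized
    under target area `a`; `fidelity(mask)` is a stand-in for the
    classifier-based score of the preserved image.
    """
    profile = []
    for a in target_areas:
        mask = optimize_contour(a)        # hypothetical: re-optimize per area
        profile.append((a, fidelity(mask)))
    return profile
```

Plotting the resulting (area, fidelity) pairs gives the transparent trade-off curve the summary refers to: larger allowed areas typically preserve more of the original prediction.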

