Pulling Back the Curtain on Deep Networks
Post-hoc explainability methods typically associate each output score of a deep neural network with an input-space direction, most commonly instantiated as the gradient and visualized as a saliency map. However, these approaches often yield explanations that are noisy, lack perceptual alignment, and thus offer limited interpretability. While many explanation methods attempt to address this issue via modified backward rules or additional heuristics, such approaches are often difficult to justify theoretically and frequently fail basic sanity checks. We introduce Semantic Pullbacks (SP), a faithful and effective post-hoc explanation method for deep neural networks. Semantic Pullbacks address the limitations above by isolating the network’s effective linear action via a principled pullback formulation and refining it to recover coherent local structures learned by the target neuron. As a result, SP produces perceptually aligned, class-conditional explanations that highlight meaningful features, support compelling counterfactual perturbations, and admit a clear theoretical motivation. Across standard faithfulness benchmarks, Semantic Pullbacks significantly outperform established attribution methods on both classical convolutional architectures (ResNet50, VGG) and transformer-based models (PVT), while remaining general and computationally efficient. Our method can be easily plugged into existing deep learning pipelines and extended to other modalities.
💡 Research Summary
The paper introduces Semantic Pullbacks (SP), a novel post‑hoc attribution technique that improves upon traditional gradient‑based explanations by leveraging the adjoint (pullback) of a network’s dynamic linear component. Modern deep networks contain many input‑dependent operations—ReLU gating, batch‑norm statistics, attention weights, layer‑norm scaling—that cause gradients to incorporate unnecessary derivative terms, resulting in noisy, unstable saliency maps that often fail sanity checks.
Core Concept – Pullback:
The authors model a pretrained network (f) as a composition of dynamic linear maps (W_\ell(x_{\ell-1})). For a scalar selector (u) (e.g., the logit of class (c)), the network output can be written as (s_u(x)=\langle u, f(x)\rangle = \langle W(x)^\top u, x\rangle). The vector field (\nu_u(x)=W(x)^\top u) is called the pullback of (u). Unlike the full gradient (\nabla_x s_u(x)), the pullback only propagates the effect of the linear part, ignoring how the gating/normalisation/attention parameters themselves depend on the input. Consequently, pullbacks coincide with gradients for purely linear or piece‑wise‑linear layers (e.g., Conv, ReLU, Max‑Pool) but diverge for SiLU/GELU, LayerNorm, and self‑attention, where gradients contain extra terms from differentiating through the gates or softmax.
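The gradient/pullback distinction above can be made concrete with a toy one-dimensional "network" consisting of a single GELU unit. This is a minimal sketch, not the authors' implementation; the weights and the input value are illustrative assumptions. Writing gelu(z) = z·Φ(z) as a dynamic linear map with gate g(z) = Φ(z), the full gradient differentiates through the gate, while the pullback freezes it:

```python
import math

def phi(z):
    # standard normal pdf
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    # standard normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Toy 1-D "network": s(x) = w2 * gelu(w1 * x), with gelu(z) = z * Phi(z).
# As a dynamic linear map, the layer is z -> g(z) * z with gate g(z) = Phi(z).
w1, w2, x = 1.5, -0.8, 0.7   # illustrative weights/input, not from the paper
z = w1 * x

# Full gradient: also differentiates through the gate (extra z * phi(z) term).
grad = w2 * (Phi(z) + z * phi(z)) * w1

# Pullback: treats the gate Phi(z) as frozen, i.e. only the linear action acts.
pullback = w2 * Phi(z) * w1

# For a ReLU layer the gate is the indicator 1{z > 0}, whose derivative is zero
# almost everywhere, so gradient and pullback coincide there.
relu_gate = 1.0 if z > 0 else 0.0
relu_grad = w2 * relu_gate * w1
relu_pullback = w2 * relu_gate * w1
```

For GELU the two quantities differ by the `z * phi(z)` term; for the piecewise-linear ReLU they agree, matching the claim that pullbacks coincide with gradients on Conv/ReLU/Max-Pool layers but diverge on smooth-gated ones.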
From Single Pullback to Expected Pullback:
A single pullback samples the network’s feature direction at a single input point. The authors argue that a neuron typically encodes a feature in expectation over the data distribution; at any given image only a subset of that feature may be active, making the raw pullback appear fragmented. To capture the locally expected pullback, they propose two approximations:
- Soft Pullback (SfP): Replace hard gating (e.g., ReLU’s indicator (1{z>0})) with a smooth expectation derived from a Gaussian assumption on pre‑activations. Concretely, the hard gate is substituted by (\Phi(z/\sigma)) (the normal CDF) or a temperature‑scaled sigmoid (\sigma(z/\tau)). This “soft adjoint” yields a backward operator that passes weak but consistent signals, allowing weakly expressed feature components to accumulate in the attribution map.
- Double Pullback (DP): For layers where routing is already smooth (self‑attention, LayerNorm), a single soft pullback often highlights background regions. The authors therefore apply a two‑step procedure: first compute a pullback, add the resulting perturbation to the input, and then compute a second pullback on this perturbed input. Formally, (\nu_u^{DP}(x) = \nu_u(x + \varepsilon\,\nu_u(x))) for a small step size (\varepsilon), so the second pullback is evaluated at the input perturbed along the first.
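The gate substitution behind the Soft Pullback can be sketched in a few lines. This is an illustrative toy, not the paper's code: the pre-activations, cotangent, and temperature values are assumptions, and only the per-unit gating step of the backward pass is shown.

```python
import math

def hard_gate(z):
    # ReLU routing: the indicator 1{z > 0}
    return 1.0 if z > 0 else 0.0

def soft_gate_cdf(z, sigma=1.0):
    # Gaussian-CDF relaxation Phi(z / sigma)
    return 0.5 * (1 + math.erf(z / (sigma * math.sqrt(2))))

def soft_gate_sigmoid(z, tau=1.0):
    # temperature-scaled sigmoid relaxation sigma(z / tau)
    return 1 / (1 + math.exp(-z / tau))

def adjoint_step(cotangent, z, gate):
    # One gating layer's (soft) adjoint: multiply the incoming
    # cotangent elementwise by the gate at the pre-activations.
    return [gate(zi) * ci for zi, ci in zip(z, cotangent)]

z = [-0.2, 0.01, 1.3]    # toy pre-activations
c = [1.0, 1.0, 1.0]      # toy incoming cotangent

hard = adjoint_step(c, z, hard_gate)                           # blocks weak units
soft = adjoint_step(c, z, lambda v: soft_gate_cdf(v, 0.5))     # attenuates them
soft2 = adjoint_step(c, z, lambda v: soft_gate_sigmoid(v, 0.5))
```

The hard adjoint zeroes the negatively pre-activated unit outright, whereas both relaxations pass a weak but nonzero signal through it, which is exactly how weakly expressed feature components can accumulate in the attribution map.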
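The two-step Double Pullback procedure can likewise be sketched generically. This is a hedged toy under stated assumptions: the 1-D GELU-style pullback and the step size `eps` are illustrative stand-ins, not the paper's configuration.

```python
import math

def Phi(z):
    # standard normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Toy 1-D pullback of a single GELU-style unit s(x) = W2 * gelu(W1 * x):
# nu(x) = W1 * Phi(W1 * x) * W2, with the gate Phi frozen.
W1, W2 = 1.5, -0.8   # illustrative weights

def pullback(x):
    z = W1 * x
    return W1 * Phi(z) * W2

def double_pullback(x, eps=0.1):
    # Step 1: pullback at x gives a perturbation direction.
    v1 = pullback(x)
    # Step 2: recompute the pullback at the input perturbed along it.
    return pullback(x + eps * v1)

single = pullback(0.7)
double = double_pullback(0.7)
```

Because the gate is re-evaluated at the perturbed input, the second pullback generally differs from the first, which is what lets DP refine the routing for smooth layers such as self-attention and LayerNorm.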