The effect of whitening on explanation performance
Explainable Artificial Intelligence (XAI) aims to provide transparent insights into machine learning models, yet the reliability of many feature attribution methods remains a critical challenge. Prior research (Haufe et al., 2014; Wilming et al., 2022, 2023) has demonstrated that these methods often erroneously assign significant importance to non-informative variables, such as suppressor variables, leading to fundamental misinterpretations. Since statistical suppression is induced by feature dependencies, this study investigates whether data whitening, a common preprocessing technique for decorrelation, can mitigate such errors. Using the established XAI-TRIS benchmark (Clark et al., 2024b), which offers synthetic ground-truth data and quantitative measures of explanation correctness, we empirically evaluate 16 popular feature attribution methods in combination with five distinct whitening transforms. Additionally, we analyze a minimal linear two-dimensional classification problem (Wilming et al., 2023) to theoretically assess whether whitening can remove the impact of suppressor features from Bayes-optimal models. Our results indicate that, while specific whitening techniques can improve explanation performance, the degree of improvement varies substantially across XAI methods and model architectures. These findings highlight the complex relationship between data non-linearities, preprocessing quality, and attribution fidelity, underscoring the vital role of preprocessing techniques in enhancing model interpretability.
💡 Research Summary
This paper investigates whether data whitening—a linear decorrelation preprocessing step—can alleviate the well‑documented problem of suppressor variables causing erroneous attributions in post‑hoc explainable AI (XAI) methods. Suppressor variables arise when features are statistically dependent; they are not directly predictive but can be exploited by a model to improve performance, leading many attribution techniques to assign high importance to these irrelevant features.
The authors employ the XAI‑TRIS benchmark, which provides synthetic 8 × 8 binary image classification tasks built around tetromino shapes. Four scenarios are considered: a linear task (LIN), a multiplicative non‑linear task (MUL‑T), a rigid task with random translations/rotations (RIGID), and an XOR task. Each scenario is generated with two background types: an uncorrelated “WHITE” background (baseline) and a correlated “CORR” background obtained by Gaussian smoothing, which deliberately introduces feature correlations and thus suppressor effects.
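The CORR condition can be mimicked in a few lines of NumPy. The sketch below is illustrative only: it uses a simple 3 × 3 box blur as a stand-in for the benchmark's Gaussian smoothing, and the sample counts and kernel size are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_background(n_samples, size=8, smooth=False):
    """Draw i.i.d. Gaussian-noise backgrounds; smoothing induces spatial correlation."""
    noise = rng.standard_normal((n_samples, size, size))
    if smooth:  # "CORR" condition: blur each image (box blur stands in for Gaussian)
        pad = np.pad(noise, ((0, 0), (1, 1), (1, 1)), mode="wrap")
        out = np.zeros_like(noise)
        for di in range(3):
            for dj in range(3):
                out += pad[:, di:di + size, dj:dj + size]
        noise = out / 9.0
    return noise

def neighbour_corr(x):
    """Correlation between horizontally adjacent pixels across all images."""
    return np.corrcoef(x[:, :, :-1].ravel(), x[:, :, 1:].ravel())[0, 1]

white = make_background(1000)               # uncorrelated "WHITE" backgrounds
corr = make_background(1000, smooth=True)   # correlated "CORR" backgrounds
```

Checking `neighbour_corr` on both sets shows why the CORR condition introduces suppressor effects: neighbouring pixels become strongly dependent, so background pixels can "explain away" noise that overlaps the tetromino.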
Five whitening transforms are evaluated: (1) Sphering (eigendecomposition of the sample covariance), (2) Symmetric Orthogonalization (eigendecomposition of the overlap matrix), (3) Optimal Signal Preservation (OSP, using the correlation matrix), (4) Cholesky whitening (lower‑triangular transform based on the covariance), and (5) Partial Regression (feature‑wise residualization). All transforms are applied only to the CORR data; the WHITE data remains unwhitened as a control.
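Three of these transforms have compact closed forms. The NumPy sketch below gives the standard textbook recipes, which are not necessarily the paper's exact implementations; the overlap-matrix (Symmetric Orthogonalization) and OSP variants are omitted since they operate on different matrices.

```python
import numpy as np

def pca_whiten(X):
    """'Sphering': eigendecomposition of the sample covariance, W = D^{-1/2} V^T."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = np.diag(evals ** -0.5) @ evecs.T
    return Xc @ W.T

def zca_whiten(X):
    """Symmetric (ZCA) whitening: W = C^{-1/2}; stays closest to the original axes."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return Xc @ W.T

def cholesky_whiten(X):
    """Cholesky whitening: W = L^{-1} with C = L L^T; result depends on feature order."""
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    return np.linalg.solve(L, Xc.T).T
```

All three yield an identity sample covariance; they differ only in the rotation applied afterwards, which is exactly why Cholesky's ordering dependence matters for attribution maps.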
Three model families are trained on each (whitened or raw) dataset: a Linear Logistic Regression (LLR), a four‑layer Multi‑Layer Perceptron (MLP) with ReLU activations, and a four‑layer Convolutional Neural Network (CNN) with ReLU and max‑pooling. Models are required to achieve at least 80 % test accuracy to ensure comparable predictive performance across conditions.
Sixteen popular feature‑attribution methods are examined, including LIME, SHAP, Gradient SHAP, Integrated Gradients, Layer‑wise Relevance Propagation (LRP), DeepLIFT, and several others. For reference, four model‑agnostic baselines—Sobel edge detection, Laplace edge detection, a uniform random map, and the raw input itself—are also reported.
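As an illustration of the gradient-based family, Integrated Gradients can be sketched in a few lines. This is a minimal stand-alone version using finite-difference gradients instead of the autodiff backward pass that real toolkits rely on; step count and baseline choice are illustrative assumptions.

```python
import numpy as np

def integrated_gradients(f, x, baseline=None, steps=64):
    """Integrated Gradients via a midpoint Riemann sum along the straight path
    from baseline to x; gradients estimated by central finite differences."""
    x = np.asarray(x, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps
    eps = 1e-5
    total = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            grad[i] = (f(point + e) - f(point - e)) / (2 * eps)
        total += grad
    # Completeness: attributions sum to f(x) - f(baseline)
    return (x - baseline) * total / steps
```

The completeness property (attributions summing to the prediction difference) is what makes suppressor mis-attribution visible: weight placed on a non-informative feature must be "taken" from the informative ones.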
Explanation quality is quantified using two metrics adopted from Clark et al. (2024b): (a) Precision, defined as the proportion of correctly identified tetromino pixels among the top‑k most important pixels, and (b) Earth Mover’s Distance (EMD), measuring the optimal‑transport cost required to transform the attribution map into the ground‑truth mask (normalized so that a score of 1 indicates a perfect explanation).
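The precision metric is straightforward to implement; a sketch is below (the benchmark's exact tie-breaking and choice of k may differ). The EMD score, by contrast, requires an optimal-transport solver and is not reproduced here.

```python
import numpy as np

def precision_at_k(attribution, mask):
    """Fraction of the top-k attributed pixels lying on the ground-truth pattern,
    with k set to the number of ground-truth pixels (a common convention)."""
    attribution = np.abs(attribution).ravel()
    mask = mask.ravel().astype(bool)
    k = int(mask.sum())
    top_k = np.argsort(attribution)[-k:]
    return mask[top_k].mean()
```

For example, an attribution map whose four strongest pixels cover three of four tetromino pixels scores 0.75.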
Key empirical findings:
- On CORR data, all XAI methods suffer a marked drop in both precision and EMD compared with the WHITE baseline, confirming that correlated background noise induces suppressor‑driven mis‑attributions.
- Applying whitening partially restores performance, but the magnitude of recovery varies widely across transforms. Sphering, OSP, and especially Symmetric Orthogonalization achieve the largest gains, often bringing precision close to the WHITE level and improving the normalized EMD score by 30‑40 %.
- Cholesky whitening and Partial Regression provide modest improvements but leave noticeable residual importance in background regions, likely because Cholesky’s ordering dependence and Partial Regression’s incomplete decorrelation leave some linear dependencies intact.
- The benefit of whitening is most pronounced for the linear LLR model, where the relationship between features and predictions remains linear; for the highly non‑linear CNN, whitening still helps but cannot fully eliminate spurious attributions because the network’s internal non‑linearities generate new feature correlations.
- Gradient‑based methods (Gradient SHAP, Integrated Gradients) show a clear visual shift from diffuse to more focal heatmaps after whitening, whereas sampling‑based methods like LIME are less sensitive to the preprocessing step.
Theoretical analysis revisits the two‑dimensional suppressor example from Wilming et al. (2023). With a predictive feature $x_1$ and a non‑predictive feature $x_2$ correlated with it by $\rho$, the Bayes‑optimal classifier depends only on $x_1$, yet a standard linear regression‑based attribution allocates weight proportional to $\rho$ to $x_2$. Whitening the data (transforming the covariance to the identity matrix) removes $\rho$, aligning the regression coefficients with the Bayes‑optimal solution. The authors extend this reasoning to show that, for any linear model, whitening eliminates suppressor influence, but for non‑linear models the transformation offers no such guarantee because subsequent non‑linear layers can re‑introduce dependencies.
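A minimal numerical sketch in the spirit of this analysis is given below. It uses the classic signal-plus-distractor suppressor construction from Haufe et al. (2014) rather than the paper's exact generative model (an assumption), and it demonstrates the key property whitening buys: in whitened coordinates, regression weights coincide with feature–target covariances, so a feature uncorrelated with the target cannot receive weight through decorrelation alone.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Suppressor construction: x1 carries signal s plus a distractor d,
# x2 measures only the distractor; the target is the signal alone.
s = rng.standard_normal(n)
d = rng.standard_normal(n)
X = np.column_stack([s + d, d])   # x2 is a pure suppressor: corr(x2, y) = 0
y = s

# OLS on the raw, correlated features recovers w = (1, -1): the suppressor
# x2 gets a large (negative) weight despite zero correlation with the target,
# which is exactly what misleads weight-based attributions.
w_raw, *_ = np.linalg.lstsq(X, y, rcond=None)

# ZCA-whiten the features. With identity covariance, the OLS weights equal
# the feature-target covariances (the "activation pattern"), so a whitened
# feature's weight is zero exactly when it is uncorrelated with y.
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
W = evecs @ np.diag(evals ** -0.5) @ evecs.T
Xw = (X - X.mean(axis=0)) @ W.T
yc = y - y.mean()
w_white, *_ = np.linalg.lstsq(Xw, yc, rcond=None)
pattern = Xw.T @ yc / (n - 1)     # cov(whitened features, target)
```

Note the caveat from the paper's analysis: this weight–pattern equivalence holds for linear models only; a non-linear network trained on whitened inputs can internally re-create dependencies that whitening removed.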
Conclusions:
- Data whitening can substantially mitigate suppressor‑induced attribution errors, but its efficacy depends on (i) the specific whitening algorithm, (ii) the linearity of the downstream model, and (iii) the sensitivity of the XAI method to feature correlations.
- Symmetric Orthogonalization, Sphering, and OSP are the most reliable transforms for preserving the original feature space while achieving decorrelation.
- For practitioners, incorporating whitening into the preprocessing pipeline is advisable when dealing with correlated inputs, but one should still validate explanation quality for each model‑XAI pairing, especially when using deep non‑linear architectures.
Overall, the study provides a comprehensive empirical and theoretical assessment of whitening as a practical tool to enhance the fidelity of post‑hoc explanations, highlighting both its promise and its limits.