Conditional Feature Importance revisited: Double Robustness, Efficiency and Inference
Conditional Feature Importance (CFI) is a classical variable importance measure that accounts for the relationship between the studied feature and the others. However, CFI has not yet been studied from a theoretical perspective because the conditional sampling step has generally been considered a purely practical problem. In this article, we demonstrate that the recent Conditional Permutation Importance (CPI) is indeed a valid implementation of this concept. Under the conditional null hypothesis, we then establish a double robustness property that can be used for variable selection. With either a valid model or a valid conditional sampler, the method correctly identifies null coordinates. Under the alternative hypothesis, we study the theoretical target and link it to the popular Total Sobol Index (TSI). We introduce the Sobol-CPI, which generalizes CPI/CFI, prove that it is nonparametrically efficient, and provide a bias correction. Finally, we propose a consistent and valid type-I error test and present numerical experiments that illustrate our findings.
💡 Research Summary
This paper revisits the classical notion of Conditional Feature Importance (CFI) and provides a rigorous statistical foundation for its modern implementation, the Conditional Permutation Importance (CPI). The authors first show that CPI is indeed a valid estimator of CFI when the conditional sampling step is performed correctly. Under the conditional null hypothesis (X_j ⊥⊥ Y | X_{‑j}), they prove a double‑robustness property: accurate detection of a null feature requires only one of two components—either the predictive model m̂ or the conditional sampler P′_j—to be consistent. This contrasts with the LOCO (Leave‑One‑Covariate‑Out) approach, which needs both the full‑model and the reduced‑model estimators to be accurate.
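To make the quantity concrete, here is a minimal sketch of how a CPI score for feature j can be computed from a fitted model and a conditional sample of X_j given X_{‑j}. The function name `cpi_score` and its signature are illustrative, not the paper's API; the double‑robustness result says this score concentrates near zero under the conditional null as soon as either `predict` or the sampler that produced `X_tilde_j` is consistent.

```python
import numpy as np

def cpi_score(predict, loss, X, y, X_tilde_j, j):
    """Illustrative Conditional Permutation Importance score for feature j.

    predict   : fitted model, maps an (n, d) array to (n,) predictions
    loss      : pointwise loss, e.g. lambda y, p: (y - p) ** 2
    X_tilde_j : (n,) conditional draws of X_j given X_{-j}
    """
    X_perturbed = X.copy()
    X_perturbed[:, j] = X_tilde_j          # replace column j by the conditional sample
    base = loss(y, predict(X))             # loss with the original feature
    pert = loss(y, predict(X_perturbed))   # loss with the resampled feature
    return np.mean(pert - base)            # ~0 under the conditional null
```

If the model never uses feature j, resampling that column leaves the predictions unchanged and the score is exactly zero, illustrating the null case.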
The theoretical framework assumes an additive‑innovation model for the covariates, X_j = ν_{‑j}(X_{‑j}) + ε_j with ε_j ⊥⊥ X_{‑j}. Under this assumption, if the conditional regression ν̂_{‑j} consistently estimates ν_{‑j}, the conditional permutation constructed by adding permuted residuals converges in 2‑Wasserstein distance to the true conditional distribution. Consequently, CPI’s conditional sampler asymptotically draws from the desired distribution, establishing CPI as a bona‑fide CFI measure.
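The additive‑innovation sampler can be sketched in a few lines: fit ν̂_{‑j}, compute residuals, and add a random permutation of those residuals back to the fitted conditional mean. A plain least‑squares fit stands in for ν̂_{‑j} here purely for illustration; any consistent regressor fits the theory.

```python
import numpy as np

def conditional_permutation_sample(X, j, rng):
    """Approximate draw from P(X_j | X_{-j}) under the additive-innovation
    model X_j = nu(X_{-j}) + eps_j with eps_j independent of X_{-j}.

    A linear fit stands in for nu_hat; any consistent regressor works.
    """
    X_minus = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), X_minus])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    nu_hat = A @ coef                                # estimated conditional mean
    residuals = X[:, j] - nu_hat                     # estimated innovations eps_j
    return nu_hat + rng.permutation(residuals)       # permute innovations, keep nu_hat
```

Because permuting the residuals preserves their multiset, the sampled column keeps the empirical mean of the original one while breaking any conditional dependence of Y on X_j beyond what X_{‑j} explains.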
Beyond the null case, the authors link CPI to the Total Sobol Index (TSI), a widely used sensitivity‑analysis metric defined as the loss reduction when a feature is omitted. They demonstrate that the raw CPI does not target TSI, but a simple bias‑correction yields a new estimator, Sobol‑CPI, which directly estimates TSI. Sobol‑CPI is shown to be non‑parametrically efficient, achieving the semiparametric efficiency bound, while avoiding the computational burden of LOCO (which requires retraining a model for each feature).
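The gap between raw CPI and TSI is easy to see under squared loss: with an independent conditional redraw X̃_j, one has E[(m(X) − m(X̃))²] = 2·E[Var(m(X) | X_{‑j})], i.e., raw CPI targets twice the unnormalized TSI. The toy Monte Carlo below illustrates this factor of two and the simplest halving correction; it is a sanity check of the identity, not the paper's full Sobol‑CPI estimator (which also averages several conditional draws to reduce bias).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x_j = rng.normal(size=n)       # feature of interest, independent of the rest
x_tilde = rng.normal(size=n)   # an exact conditional (here marginal) redraw
m = lambda x: x                # toy model m(X) = X_j
y = m(x_j)                     # noiseless response for clarity

# Raw CPI under squared loss: E[(y - m(X_tilde))^2 - (y - m(X))^2] ≈ 2 Var(m(X)) = 2
cpi_raw = np.mean((y - m(x_tilde)) ** 2 - (y - m(x_j)) ** 2)
# Halving recovers the unnormalized Total Sobol Index, here Var(m(X)) = 1
sobol_cpi = cpi_raw / 2.0
```

In this toy setting the unnormalized TSI of X_j is Var(m(X)) = 1, and the raw CPI concentrates around 2, matching the identity.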
Inference is addressed by noting that, under the null, the influence function of CPI vanishes, leading to under‑estimated variance if standard asymptotic normality is used. The paper proposes a variance‑correction technique and a non‑parametric finite‑sample test that leverages the accuracy of the conditional sampler. Both methods guarantee valid type‑I error control and retain high power.
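For contrast with the paper's corrected procedures, here is the naive baseline they improve on: a one‑sided z‑test on per‑sample loss differences. The function name `cpi_test` is hypothetical; under the exact null the influence function of CPI vanishes, so the empirical variance used here degenerates, which is precisely the failure mode the paper's variance correction and finite‑sample test address.

```python
import numpy as np
from math import erfc, sqrt

def cpi_test(per_sample_diff, alpha=0.05):
    """Naive one-sided test of H0: CPI_j <= 0 from per-sample loss differences.

    Plain z-statistic with a normal approximation; under the exact null the
    variance of the differences degenerates, which is what the paper's
    corrected test is designed to handle.
    """
    n = len(per_sample_diff)
    mean = per_sample_diff.mean()
    se = per_sample_diff.std(ddof=1) / sqrt(n)   # naive standard error
    z = mean / se
    p_value = 0.5 * erfc(z / sqrt(2))            # one-sided normal p-value
    return p_value, p_value < alpha
```

With a clear signal the test rejects; with a clearly null (or negative) mean it does not, but its calibration near the degenerate null is exactly what requires the paper's correction.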
Extensive experiments on synthetic data, genomics, and image classification illustrate the theoretical claims. Results show that (i) CPI reliably identifies null features when either the predictor or the conditional sampler is well‑specified; (ii) Sobol‑CPI provides unbiased estimates of TSI with lower variance than PFI and comparable or better performance than LOCO; (iii) the proposed testing procedures maintain nominal significance levels even in high‑dimensional, highly correlated settings where PFI fails.
In summary, the paper delivers a comprehensive treatment of conditional variable importance: it formalizes CPI as a theoretically sound CFI estimator, establishes double robustness for variable selection, introduces Sobol‑CPI as an efficient TSI estimator, and supplies valid inference tools. These contributions bridge the gap between practical permutation‑based importance measures and rigorous sensitivity‑analysis theory, offering practitioners a robust, computationally efficient alternative for feature relevance assessment in complex, correlated data environments.