ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Computing the importance of features in supervised classification tasks is critical for model interpretability. Shapley values are a widely used approach for explaining model predictions, but require direct access to the underlying model, an assumption frequently violated in real-world deployments. Further, even when model access is possible, their exact computation may be prohibitively expensive. We investigate whether meaningful Shapley value estimations can be obtained in a zero-shot setting, using only the input data distribution and no evaluations of the target model. To this end, we introduce ExplainerPFN, a tabular foundation model built on TabPFN that is pretrained on synthetic datasets generated from random structural causal models and supervised using exact or near-exact Shapley values. Once trained, ExplainerPFN predicts feature attributions for unseen tabular datasets without model access, gradients, or example explanations. Our contributions are fourfold: (1) we show that few-shot learning-based explanations can achieve high fidelity to SHAP values with as few as two reference observations; (2) we propose ExplainerPFN, the first zero-shot method for estimating Shapley values without access to the underlying model or reference explanations; (3) we provide an open-source implementation of ExplainerPFN, including the full training pipeline and synthetic data generator; and (4) through extensive experiments on real and synthetic datasets, we show that ExplainerPFN achieves performance competitive with few-shot surrogate explainers that rely on 2-10 SHAP examples.


💡 Research Summary

ExplainerPFN tackles a fundamental limitation of Shapley‑based model explanations: the need for direct access to the underlying predictive model. In many real‑world deployments—credit scoring APIs, proprietary hiring tools, medical diagnosis services—models are black boxes, and querying them repeatedly for SHAP calculations is either impossible or prohibitively expensive. The authors propose a zero‑shot approach that requires only the input data distribution and the model’s predictions, without any gradients, label queries, or pre‑computed explanations.

The method builds on TabPFN, a transformer‑based tabular foundation model that learns to perform in‑context learning from synthetic data. Instead of predicting target labels, ExplainerPFN is trained to predict per‑instance Shapley values. To generate training data, the authors sample random directed acyclic graphs (DAGs) representing structural causal models (SCMs). From each DAG they generate a synthetic tabular dataset, select a subset of nodes as features, and one node as a binary target. A simple feed‑forward network is then trained on this synthetic data to serve as the “base model” f_b. Exact Shapley values for f_b are computed (using exact enumeration for low‑dimensional cases and permutation‑based SHAP for higher dimensions), yielding triplets (X, Ŷ, Φ) where X are feature vectors, Ŷ are model predictions, and Φ are the true Shapley attributions. By varying DAG topology, noise levels, feature counts, and target nodes across millions of tasks, the model learns a distribution‑level prior over the relationship between (x, ŷ) pairs and their Shapley vectors.
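The exact-enumeration step for low-dimensional cases can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the base model f_b here is a fixed linear scorer standing in for the trained feed-forward network, and "absent" features are imputed from a single background point rather than the full reference distribution used by SHAP.

```python
import itertools
import math

def exact_shapley(f, x, background):
    """Exact Shapley values for f at instance x, enumerating all
    feature subsets. Features outside the coalition are replaced by
    values from a single background point (a simplification of the
    interventional value function)."""
    d = len(x)
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            for S in itertools.combinations(others, r):
                # Shapley kernel weight: |S|! (d - |S| - 1)! / d!
                w = (math.factorial(len(S)) * math.factorial(d - len(S) - 1)
                     / math.factorial(d))
                def value(subset):
                    z = [x[k] if k in subset else background[k] for k in range(d)]
                    return f(z)
                phi[j] += w * (value(set(S) | {j}) - value(set(S)))
    return phi

# Hypothetical stand-in for the trained base model f_b.
f_b = lambda z: 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]

phi = exact_shapley(f_b, x=[1.0, 1.0, 1.0], background=[0.0, 0.0, 0.0])
# Efficiency property: the attributions sum to f(x) - f(background).
```

For this linear f_b the attributions reduce to the per-feature contributions, and the efficiency property can be checked directly; with 2^d coalitions per feature, enumeration is only viable for small d, which is why the paper falls back to permutation-based SHAP in higher dimensions.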

The architecture mirrors TabPFN’s transformer encoder: each (x_i, ŷ_i) pair becomes a token, enriched with positional information. Tokens interact through bidirectional self‑attention, allowing the model to capture both intra‑instance feature interactions and inter‑instance statistical dependencies that are crucial for attribution. A design choice forces the predicted output ŷ_i and each feature x_{j,i} to occupy the first two columns of the input matrix, ensuring a consistent mapping from these columns to the corresponding attribution while still leveraging the full context of the remaining features.
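The column-ordering convention described above can be sketched as a small preprocessing step. The exact layout inside ExplainerPFN is not specified in this summary, so the function below is an assumption: for the feature being attributed, it places the prediction ŷ_i and the target feature x_{j,i} in the first two columns and appends the remaining features as context.

```python
def build_feature_token_matrix(X, y_hat, j):
    """Arrange inputs for attributing feature j: per instance, emit
    [y_hat_i, x_{j,i}, remaining features...]. The first two columns
    thus consistently map to (prediction, attributed feature), as the
    summary describes; the trailing order is an illustrative choice."""
    rows = []
    for x_i, yh in zip(X, y_hat):
        rest = [v for k, v in enumerate(x_i) if k != j]
        rows.append([yh, x_i[j]] + rest)
    return rows
```

With this arrangement, the transformer always reads the attributed feature from a fixed position while self-attention still sees every other feature of the instance.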

Training uses a probabilistic objective. Shapley values are standardized within each task and discretized into K buckets. The model outputs a categorical distribution p(φ̃_{j,i} = b_k | x_i, ŷ_i, X_ref, Ŷ_ref). The loss is the Negative Log Predictive Density (NLPD), encouraging both accurate point estimates (the expectation over the bucket distribution) and calibrated uncertainty. Normalization of Shapley values (subtracting the task‑wise mean and dividing by the standard deviation) stabilizes training across tasks with heterogeneous scales.
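The standardize-then-bucketize target construction and the NLPD loss can be sketched in a few lines. Bucket edges and counts here are illustrative; the paper's choice of K and edge placement is not given in this summary.

```python
import math

def standardize(phis):
    """Task-wise standardization: subtract the mean, divide by the
    standard deviation (guarding against constant tasks)."""
    mu = sum(phis) / len(phis)
    var = sum((p - mu) ** 2 for p in phis) / len(phis)
    sd = math.sqrt(var) or 1.0
    return [(p - mu) / sd for p in phis]

def bucketize(v, edges):
    """Map a standardized value to the index of its bucket; `edges`
    are the K-1 right boundaries of the first K-1 buckets."""
    for k, e in enumerate(edges):
        if v <= e:
            return k
    return len(edges)

def nlpd(probs, target_bucket):
    """Negative log predictive density of the true (discretized)
    Shapley value under the model's categorical output."""
    return -math.log(probs[target_bucket])
```

Training then minimizes the mean `nlpd` over all features and instances, so the model is rewarded both for putting mass on the right bucket and for honest uncertainty.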

During inference, ExplainerPFN receives only the test set X_test and the corresponding predictions Ŷ_test from the inaccessible model. Optionally, a reference set (X_ref, Ŷ_ref) drawn from the same distribution can be supplied, but no ground‑truth attributions are required. The model then produces estimated Shapley vectors Φ̂ for every test instance, effectively acting as a conditional generative model of explanations.
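Since the network emits a categorical distribution over standardized buckets, producing a Shapley point estimate at inference time means taking the expectation over bucket centers and undoing the task-wise standardization. The helper below sketches that decoding step under those assumptions; the bucket centers and the source of the de-standardization statistics are illustrative.

```python
def point_estimate(bucket_probs, bucket_centers, task_mean, task_std):
    """Decode a categorical bucket distribution into a Shapley point
    estimate: expectation over bucket centers in standardized space,
    then invert the task-wise normalization (names are illustrative)."""
    z = sum(p * c for p, c in zip(bucket_probs, bucket_centers))
    return task_mean + task_std * z
```

The same distribution also yields uncertainty estimates for free, e.g. via its entropy or the spread of mass across buckets.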

Empirical evaluation proceeds along three axes. First, on synthetic data, the authors assess whether the model can recover the underlying DAG structure, confirming that the learned prior captures causal patterns. Second, on a suite of real‑world tabular benchmarks (UCI, OpenML), ExplainerPFN’s estimated attributions are compared against KernelSHAP, TreeSHAP, and other surrogate explainers using correlation, RMSE, and rank‑based metrics. Results show that the zero‑shot model attains performance comparable to these established methods, often outperforming them when feature correlations are strong. Third, a few‑shot scenario is examined where 2–10 true SHAP vectors are provided as references; ExplainerPFN with only two references matches or exceeds the accuracy of few‑shot surrogate models that rely on up to ten examples.
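The correlation, RMSE, and rank-based comparisons against KernelSHAP and TreeSHAP attributions can be illustrated with two standard metrics. This is a generic sketch (the paper's exact metric definitions are not given in this summary), and the Spearman implementation below ignores ties for simplicity.

```python
import math

def rmse(a, b):
    """Root mean squared error between two attribution vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def spearman(a, b):
    """Spearman rank correlation between two attribution vectors
    (no tie correction, for illustration only)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Rank-based metrics matter here because downstream uses of feature importance (auditing, feature selection) often depend only on the ordering of features, not their exact magnitudes.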

The paper’s contributions are fourfold: (1) demonstration that few‑shot explanations can achieve high fidelity with as few as two reference points; (2) the first zero‑shot method for estimating Shapley values without any model access or reference explanations; (3) an open‑source pipeline—including synthetic data generator, pre‑training code, and pretrained weights—facilitating reproducibility; (4) extensive experiments showing competitive performance against state‑of‑the‑art SHAP approximations. Limitations are acknowledged: reliance on synthetic pre‑training may not capture all complexities of real domains; the current focus is on binary classification, leaving multi‑class and regression extensions for future work; and performance can be sensitive to the choice of bucket granularity and token ordering.

In summary, ExplainerPFN opens a new research direction where foundation models learn to “explain” without ever seeing real explanations, leveraging only the statistical structure of tabular data and model predictions. This capability could dramatically broaden the applicability of explainable AI to settings where model access is legally or technically restricted, providing timely, trustworthy feature importance estimates for auditing, bias detection, and regulatory compliance. Future work may explore richer synthetic priors, multi‑task learning across different explanation paradigms, and integration with other game‑theoretic attribution concepts such as Banzhaf values.

