Fair Feature Importance Scores via Feature Occlusion and Permutation


As machine learning models increasingly impact society, their opaque nature poses challenges to trust and accountability, particularly in fairness contexts. Understanding how individual features influence model outcomes is crucial for building interpretable and equitable models. While feature importance metrics for accuracy are well-established, methods for assessing feature contributions to fairness remain underexplored. We propose two model-agnostic approaches to measure fair feature importance. First, we propose to compare model fairness before and after permuting feature values. This simple intervention-based approach decouples a feature from the model's predictions to measure its contribution to fairness. Second, we evaluate the fairness of models trained with and without a given feature. This occlusion-based score enjoys dramatic computational simplification via minipatch learning. Our empirical results demonstrate the simplicity and effectiveness of our proposed metrics for multiple predictive tasks. Both methods offer simple, scalable, and interpretable solutions to quantify the influence of features on fairness, providing new tools for responsible machine learning development.


💡 Research Summary

The paper addresses a gap in the fairness literature: while feature importance methods for predictive performance are well‑studied, there is a lack of simple, model‑agnostic tools that quantify how individual input features affect a model’s fairness. The authors propose two complementary importance scores that can be applied to any classifier or regressor and any fairness metric (e.g., demographic parity, equalized odds).

1. Permutation‑based Fair Feature Importance (ρ_perm).
For a given feature j, the method randomly permutes its values across the dataset, producing a perturbed matrix X^π(j). A model f^π(j) is trained on this permuted data, while the original model f is trained on the unaltered data. The importance score is defined as the difference in the chosen fairness metric h between the two models: ρ_perm(j) = h(y, f^π(j)(X^π(j)), z) – h(y, f(X), z). By breaking the statistical dependence between feature j and the target while preserving the marginal distribution of all features, the score isolates the contribution of j to unfairness. The authors note that computing ρ_perm for all M features requires M+1 model trainings, which can be prohibitive in high‑dimensional settings, and that strong correlations among features may dilute the interpretability of a single‑feature permutation.
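The definition of ρ_perm above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the RandomForest base learner, the demographic parity gap as the fairness metric h, and all function names are assumptions.

```python
# Sketch of the permutation-based fair importance score rho_perm(j).
# Assumptions (not from the paper): RandomForestClassifier as the model f,
# demographic parity gap as the fairness metric h.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def dp_gap(y_pred, z):
    """Demographic parity gap: |P(yhat=1 | z=0) - P(yhat=1 | z=1)|."""
    return abs(y_pred[z == 0].mean() - y_pred[z == 1].mean())

def rho_perm(X, y, z, j, seed=0):
    rng = np.random.default_rng(seed)
    # Original model f on unaltered data.
    base = RandomForestClassifier(random_state=seed).fit(X, y)
    h_full = dp_gap(base.predict(X), z)
    # Permute feature j to break its dependence with the outcome,
    # then refit on the perturbed matrix X^pi(j), as the summary describes.
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    perm_model = RandomForestClassifier(random_state=seed).fit(X_perm, y)
    h_perm = dp_gap(perm_model.predict(X_perm), z)
    return h_perm - h_full  # rho_perm(j) = h(perturbed) - h(original)
```

Note the M+1 trainings mentioned above: one fit for the base model plus one per permuted feature.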

2. Occlusion‑based Fair Feature Importance (ρ_occl).
The second score removes feature j entirely, yielding a reduced matrix X^{–j}. A model f^{–j} is trained on this reduced data, and the importance is the fairness difference relative to the full‑feature model: ρ_occl(j) = h(y, f^{–j}(X^{–j}), z) – h(y, f(X), z). This leave‑one‑out approach directly measures how the absence of a feature changes the fairness of predictions. It is particularly robust when the sample size is limited because it does not rely on a stochastic perturbation of the data.
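The leave-one-out definition of ρ_occl translates directly into code. Again a hedged sketch under the same illustrative assumptions (RandomForest model, demographic parity gap as h); the helper is repeated so the snippet stands alone.

```python
# Sketch of the occlusion-based fair importance score rho_occl(j):
# retrain without feature j and compare fairness. The model and metric
# choices are illustrative assumptions, not the paper's exact setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def dp_gap(y_pred, z):
    """Demographic parity gap: |P(yhat=1 | z=0) - P(yhat=1 | z=1)|."""
    return abs(y_pred[z == 0].mean() - y_pred[z == 1].mean())

def rho_occl(X, y, z, j, seed=0):
    # Full-feature model f.
    full = RandomForestClassifier(random_state=seed).fit(X, y)
    h_full = dp_gap(full.predict(X), z)
    # Reduced matrix X^{-j}: drop feature j entirely and refit.
    X_minus_j = np.delete(X, j, axis=1)
    reduced = RandomForestClassifier(random_state=seed).fit(X_minus_j, y)
    h_reduced = dp_gap(reduced.predict(X_minus_j), z)
    return h_reduced - h_full  # rho_occl(j) = h(without j) - h(with j)
```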

Efficient computation via Minipatch Learning.
To avoid training a separate model for every possible feature removal, the authors adopt the minipatch framework (Yao & Allen, 2020). They repeatedly sample small sub‑matrices (minipatches) of size n × m (with n ≪ N and m ≪ M), train a model on each, and evaluate the fairness metric on the held‑out portion of the data. For a given feature j, they aggregate the fairness scores only from those minipatches that do not contain j, producing an estimate b̂^(–j). The overall fairness estimate b̂ is the average over all minipatches. The occlusion importance is then approximated as b̂^(–j) – b̂. This procedure requires only K model trainings (K = number of minipatches) regardless of M, making ρ_occl scalable to thousands of features.
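The aggregation step can be sketched as follows. K, the patch fractions, the decision-tree base learner, and the demographic-parity-gap metric are all illustrative choices, not the paper's configuration.

```python
# Sketch of the minipatch approximation to the occlusion importance:
# K small random (rows x features) patches, fairness evaluated on
# held-out rows, then per-feature aggregation over patches lacking j.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dp_gap(y_pred, z):
    return abs(y_pred[z == 0].mean() - y_pred[z == 1].mean())

def minipatch_fair_importance(X, y, z, K=200, row_frac=0.2, col_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    N, M = X.shape
    n, m = max(2, int(row_frac * N)), max(1, int(col_frac * M))
    patch_scores, patch_feats = [], []
    for _ in range(K):
        rows = rng.choice(N, size=n, replace=False)
        cols = rng.choice(M, size=m, replace=False)
        oob = np.setdiff1d(np.arange(N), rows)  # held-out rows for this patch
        model = DecisionTreeClassifier(random_state=0)
        model.fit(X[np.ix_(rows, cols)], y[rows])
        pred = model.predict(X[np.ix_(oob, cols)])
        patch_scores.append(dp_gap(pred, z[oob]))
        patch_feats.append(set(cols))
    patch_scores = np.array(patch_scores)
    b_hat = patch_scores.mean()  # overall fairness estimate over all patches
    importances = np.empty(M)
    for j in range(M):
        without_j = np.array([j not in feats for feats in patch_feats])
        # Approximate rho_occl(j) as b_hat^{-j} - b_hat.
        importances[j] = patch_scores[without_j].mean() - b_hat
    return importances
```

Only K models are trained in total; every feature's score is read off from the same pool of patches, which is where the claimed scalability comes from.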

Experimental validation.
The authors conduct two sets of experiments.

Synthetic data: They generate 1,000 samples with 10 features. Features 1–2 are correlated with a binary protected attribute z (Bernoulli 0.2), while features 1–5 are predictive of the outcome (logistic for classification, linear for regression). Using a Random Forest, they compute both ρ_perm and ρ_occl. The results show strongly negative fairness scores for the two bias‑inducing features and positive accuracy scores for the predictive features, confirming that both metrics correctly identify the intended contributions.
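The synthetic design described above can be reconstructed as a short sketch for the classification case. The summary does not give exact coefficients, so the size of the bias-inducing shift (2.0) and the unit logistic weights are assumptions.

```python
# Hedged reconstruction of the synthetic setup: 1,000 samples, 10 features,
# features 1-2 correlated with a Bernoulli(0.2) protected attribute z,
# features 1-5 predictive of a logistic outcome. Coefficients are assumed.
import numpy as np

def make_synthetic(n=1000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.2, size=n)                # protected attribute
    X = rng.normal(size=(n, 10))
    X[:, :2] += 2.0 * z[:, None]                    # features 1-2 track z
    logits = X[:, :5].sum(axis=1)                   # features 1-5 predictive
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))  # logistic outcome
    return X, y, z
```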

Real‑world benchmarks: They evaluate the Adult Income and German Credit datasets, both standard fairness testbeds. The Adult data contains 96 one‑hot encoded features and roughly 45,000 instances; the protected attribute is gender. The German Credit data has 56 one‑hot features and 1,000 instances, also using gender as the protected group. Because of the high dimensionality, they employ minipatch learning with K = 2,000 patches, each covering 20% of rows and columns. Results on the Adult dataset reveal that “Relationship: Husband” receives the most negative fairness score, reflecting a direct gender bias, while “Capital Gain” is highly predictive (large positive accuracy score). “Hours‑per‑week” exhibits a fairness‑accuracy trade‑off, indicating that removing it would improve fairness at the cost of predictive performance. The model’s overall accuracy is 0.84 with a fairness score of 0.85, suggesting room for fairness improvement. In the German Credit dataset, features such as “Duration of the Loan” and “Credit Amount” have positive scores on both dimensions, indicating that this domain exhibits less tension between fairness and accuracy (overall accuracy 0.80, fairness 0.94).

Discussion and future directions.
The paper highlights several contributions: (i) introduction of two intuitive, model‑agnostic fairness importance scores; (ii) a scalable minipatch‑based implementation for the occlusion metric; (iii) empirical evidence that the scores align with known sources of bias in synthetic and real data; and (iv) a demonstration that the metrics can guide feature selection to improve fairness without severely harming performance. Limitations include the computational burden of the permutation method (requiring M+1 full model trainings) and potential sensitivity to multicollinearity. The occlusion method, while efficient, may conflate the effect of a feature with interactions that are lost when the feature is removed. The authors propose extending the framework to multi‑feature interactions, comparing with Shapley‑based fair importance, and adapting the approach to deep neural networks.

Overall assessment.
The work offers a practical addition to the fairness toolbox. Its strength lies in the simplicity of the definitions, the clear connection to established permutation and leave‑one‑out techniques, and the clever use of minipatch learning to make the occlusion score tractable for high‑dimensional, real‑world datasets. By providing both a perturbation‑based and a removal‑based perspective, the authors enable practitioners to diagnose whether a feature is directly responsible for unfair outcomes or whether its influence is mediated through correlations with other variables. The experimental results are convincing and align with prior domain knowledge, reinforcing the credibility of the proposed metrics. Future research that addresses the noted limitations—especially handling feature interactions and extending to non‑tree models—will further solidify the relevance of these methods for responsible AI development.

