TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees

Notice: This research summary and analysis were automatically generated using AI. For absolute accuracy, please refer to the original arXiv source.

We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality of feature rankings is typically assessed with the insertion and deletion metrics. Empirically, we observe that co-optimizing these two metrics is closely related to a joint optimization that selects a subset of features to maximize the local predicted value while minimizing it for the complement. However, we theoretically show that probabilistic values are generally unreliable for solving this joint optimization. Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves; these gradients include weighted Banzhaf values. Building upon TreeGrad, we introduce TreeGrad-Ranker, which aggregates the gradients while optimizing the joint objective to produce feature rankings, and TreeGrad-Shap, a numerically stable algorithm for computing Beta Shapley values with integral parameters. In particular, the feature scores computed by TreeGrad-Ranker satisfy all the axioms uniquely characterizing probabilistic values, except for linearity, which itself leads to the established unreliability. Empirically, we demonstrate that the numerical error of Linear TreeShap can be up to $10^{15}$ times larger than that of TreeGrad-Shap when computing the Shapley value. As a by-product, we also develop TreeProb, which generalizes Linear TreeShap to support all probabilistic values. In our experiments, TreeGrad-Ranker performs significantly better on both insertion and deletion metrics. Our code is available at https://github.com/watml/TreeGrad.
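The abstract's central object, the multilinear extension, can be made concrete with a brute-force sketch. The extension of a set function $f$ evaluates each feature's inclusion independently with some probability, and its partial derivative with respect to feature $i$ is the expected marginal contribution of $i$; at probability $1/2$ for every feature, this gradient is exactly the Banzhaf value. The sketch below enumerates all subsets, so it runs in exponential time rather than the paper's $O(L)$; the function names and the interface (a callable `f` over frozensets of feature indices) are illustrative assumptions, not the TreeGrad API.

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of a collection of feature indices."""
    s = list(items)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def multilinear_grad(f, n, p):
    """Gradient of the multilinear extension of f at probability vector p.

    f maps a frozenset of feature indices to a real value. The i-th partial
    derivative is the expected marginal contribution of feature i when each
    other feature j is included independently with probability p[j].
    Brute force (2^(n-1) terms per feature), for illustration only.
    """
    grad = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        g = 0.0
        for S in subsets(rest):
            # Probability weight of the coalition S among the other features.
            w = 1.0
            for j in rest:
                w *= p[j] if j in S else 1.0 - p[j]
            S = frozenset(S)
            g += w * (f(S | {i}) - f(S))
        grad.append(g)
    return grad

# At p = (1/2, ..., 1/2) the gradient coincides with the Banzhaf value.
banzhaf = multilinear_grad(lambda S: len(S) ** 2, 3, [0.5, 0.5, 0.5])
```

For the symmetric toy function $f(S) = |S|^2$ with three features, every feature's Banzhaf value is $3$, which the brute-force gradient reproduces.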


💡 Research Summary

This paper revisits the use of probabilistic values (most notably the Shapley and Banzhaf values) for ranking features in the local explanations of decision-tree models. The authors begin by observing that the two most common evaluation metrics for feature rankings, the insertion (Ins) and deletion (Del) metrics, can be jointly interpreted as a single joint optimization problem: find a subset S of features that maximizes the model's prediction f(S) while simultaneously minimizing the prediction on the complement set. Formally, this objective can be written as maximizing ½·(f(S) − f(N∖S)), where N denotes the full feature set.
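The joint objective described above can be illustrated with a brute-force search over subsets. Note that this exponential enumeration is only a sketch of the objective itself; TreeGrad-Ranker instead optimizes it via gradients of its multilinear extension. The function name and the interface (a callable `f` over frozensets of feature indices) are assumptions for illustration.

```python
from itertools import chain, combinations

def best_joint_subset(f, n):
    """Brute-force the joint objective: argmax_S 0.5 * (f(S) - f(N \\ S)).

    f maps a frozenset of feature indices to the model's local prediction
    when only those features are "present". Exponential in n; shown only
    to make the objective concrete.
    """
    features = frozenset(range(n))
    all_subsets = chain.from_iterable(
        combinations(range(n), r) for r in range(n + 1))
    best_S, best_val = None, float("-inf")
    for S in map(frozenset, all_subsets):
        # Reward a high prediction on S and a low prediction on its complement.
        val = 0.5 * (f(S) - f(features - S))
        if val > best_val:
            best_S, best_val = S, val
    return best_S, best_val
```

As a toy example, if feature 0 raises the prediction by 2 and feature 1 lowers it by 1, the optimal subset is {0}, with objective value ½·(2 − (−1)) = 1.5.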

