Uncertainty-Aware Extrapolation in Bayesian Oblique Trees

Decision trees are widely used due to their interpretability and efficiency, but they struggle in regression tasks that require reliable extrapolation and well-calibrated uncertainty. Piecewise-constant leaf predictions are bounded by the training targets and often become overconfident under distribution shift. We propose a single-tree Bayesian model that extends VSPYCT by equipping each leaf with a GP predictor. Bayesian oblique splits provide uncertainty-aware partitioning of the input space, while GP leaves model local functional behaviour and enable principled extrapolation beyond the observed target range. We present an efficient inference and prediction scheme that combines posterior sampling of split parameters with GP posterior predictions, and a gating mechanism that activates GP-based extrapolation when inputs fall outside the training support of a leaf. Experiments on benchmark regression tasks show improved predictive performance relative to standard variational oblique trees, and substantial gains in extrapolation scenarios.


💡 Research Summary

The paper addresses a fundamental limitation of tree‑based regression models: the piecewise‑constant leaf predictions are bounded by the range of training targets and become over‑confident when faced with distribution shift. While recent variational oblique predictive clustering trees (VSPYCT) introduced Bayesian treatment of split parameters, they still rely on constant prototypes at the leaves, leaving functional uncertainty unmodeled. To close this gap, the authors propose VSPYCT‑GP, a single‑tree Bayesian model that equips each leaf with a Gaussian Process (GP) regressor.

Model architecture

  • Oblique splits: Each internal node implements a probabilistic split ρ(x;θ)=σ(wᵀx+b). The weight vector w and bias b are treated as random variables with variational posteriors q(θ). During prediction, split parameters are sampled from q(θ), yielding stochastic routing that captures structural uncertainty.
  • GP leaves: For every leaf ℓ, the subset of training points routed to it (Dℓ) is used to fit a GP fℓ∼GP(mℓ,kℓ). A shared kernel family is employed across leaves, but hyper‑parameters are optimized per leaf by maximizing the marginal likelihood with Adam. Exact GP inference provides posterior mean μℓ(x) and variance σ²ℓ(x).
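The two components above can be sketched in a minimal form. This is an illustrative reconstruction, not the authors' implementation: the mean-field Gaussian posterior over split parameters is a hypothetical stand-in for the learned variational posterior, a single split is used in place of a full tree, and sklearn's `GaussianProcessRegressor` optimises kernel hyper-parameters with L-BFGS rather than the Adam optimiser described in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def sample_split(mu_w, sigma_w, mu_b, sigma_b, rng):
    """Draw split parameters theta = (w, b) from a mean-field Gaussian q(theta)."""
    w = rng.normal(mu_w, sigma_w)
    b = rng.normal(mu_b, sigma_b)
    return w, b

def route_probability(X, w, b):
    """Soft oblique routing rho(x; theta) = sigmoid(w^T x + b)."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Toy 2-D regression data partitioned by one sampled oblique hyperplane.
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

w, b = sample_split(np.array([1.0, 0.5]), 0.1, 0.0, 0.1, rng)
go_right = route_probability(X, w, b) > 0.5

# Fit an exact GP on the data routed to each leaf; kernel hyper-parameters
# are optimised per leaf by maximising the marginal likelihood.
leaves = {}
for side in (False, True):
    mask = go_right == side
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X[mask], y[mask])
    leaves[side] = gp

# Predict for a test point: sample-routed leaf supplies posterior mean and std.
x_star = np.array([[0.5, -0.2]])
side = bool(route_probability(x_star, w, b)[0] > 0.5)
mu, std = leaves[side].predict(x_star, return_std=True)
```

In the full model this routing-then-predicting step would be repeated for each Monte-Carlo draw of the split parameters, so that structural uncertainty from q(θ) propagates into the predictive distribution.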

Extrapolation‑aware gating
The authors define a leaf‑specific support region using the Mahalanobis distance d(x,ℓ) = ‖x − x̄_ℓ‖_{Σ_ℓ⁻¹} = √((x − x̄_ℓ)ᵀ Σ_ℓ⁻¹ (x − x̄_ℓ)), where x̄_ℓ and Σ_ℓ are the empirical mean and covariance of the leaf’s inputs. A soft gating weight w(x,ℓ) = σ((d − τ)/T) (σ denotes the logistic sigmoid, τ a distance threshold, T a temperature) interpolates between the leaf’s constant prototype ȳ_ℓ and the GP prediction. When d ≪ τ, w ≈ 0 and the model behaves like the original VSPYCT, preserving in‑distribution stability. When d ≫ τ, w ≈ 1 and the GP governs both mean and uncertainty, enabling calibrated extrapolation. The final predictive distribution for a test point x is obtained by Monte‑Carlo averaging over M samples of split parameters, routing, and gated GP outputs. The total predictive variance follows the law of total variance: Var[y|x] = E_m[σ²_m(x)] + Var_m[μ_m(x)], where μ_m(x) and σ²_m(x) denote the predictive mean and variance under the m‑th Monte‑Carlo sample.
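The gating rule and the Monte-Carlo variance aggregation can be sketched as follows. This is a hedged illustration under stated assumptions: the threshold τ = 3.0 and temperature T = 0.5 are placeholder values, and treating the constant prototype as contributing zero predictive variance inside the blend is a simplifying assumption, not a detail confirmed by the summary above.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Leaf-support distance d(x, leaf) = sqrt((x - mean)^T cov^{-1} (x - mean))."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def gate(d, tau, T):
    """Soft gating weight w = sigmoid((d - tau) / T): ~0 in-support, ~1 far out."""
    return 1.0 / (1.0 + np.exp(-(d - tau) / T))

def gated_prediction(x, leaf_mean_x, leaf_cov_x, y_bar, gp_mu, gp_var,
                     tau=3.0, T=0.5):
    """Blend the constant prototype ybar and the GP head by the gate weight."""
    w = gate(mahalanobis(x, leaf_mean_x, leaf_cov_x), tau, T)
    mu = (1.0 - w) * y_bar + w * gp_mu
    var = w * gp_var  # assumption: the prototype adds no predictive variance
    return mu, var

# Monte-Carlo aggregation over M sampled trees via the law of total variance:
# Var[y|x] = E_m[sigma_m^2(x)] + Var_m[mu_m(x)].
mus = np.array([1.0, 1.2, 0.9])      # per-sample predictive means mu_m(x)
vars_ = np.array([0.1, 0.15, 0.12])  # per-sample predictive variances sigma_m^2(x)
total_mean = mus.mean()
total_var = vars_.mean() + mus.var()
```

Note that the second variance term, Var_m[μ_m(x)], captures disagreement between trees sampled from q(θ), so the aggregated variance is never smaller than the average within-sample GP variance.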
