Partition Trees: Conditional Density Estimation over General Outcome Spaces
We propose Partition Trees, a tree-based framework for conditional density estimation over general outcome spaces, supporting both continuous and categorical variables within a unified formulation. Our approach models conditional distributions as piecewise-constant densities on data-adaptive partitions and learns trees by directly minimizing conditional negative log-likelihood. This yields a scalable, nonparametric alternative to existing probabilistic trees that makes no parametric assumptions about the target distribution. We further introduce Partition Forests, an ensemble extension obtained by averaging conditional densities. Empirically, we demonstrate improved probabilistic prediction over CART-style trees and competitive or superior performance compared to state-of-the-art probabilistic tree methods and Random Forests, along with robustness to redundant features and heteroscedastic noise.
💡 Research Summary
The paper introduces Partition Trees, a novel tree-based framework for conditional density estimation (CDE) that works over arbitrary outcome spaces, handling both continuous and categorical variables within a single unified formulation. The authors model the conditional distribution as a piecewise-constant density defined on a data-adaptive partition of the joint covariate-outcome space \(Z = X \times Y\). By adopting a measure-theoretic view, they define the conditional density as the Radon–Nikodym derivative \(f = dP_{XY} / d(P_X \otimes \mu_Y)\), where \(\mu_Y\) is the Lebesgue measure for continuous outcomes and the counting measure for discrete outcomes.
For any measurable cell \(A = A_X \times A_Y\) with positive mass, the optimal constant approximation of \(f\) on \(A\) is the cell average
\[
c_A = \frac{P_{XY}(A)}{P_X(A_X)\,\mu_Y(A_Y)}.
\]
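The cell-average formula can be checked on toy numbers. The values below are hypothetical (not from the paper): a cell whose joint mass is 0.2, whose covariate region has marginal mass 0.5, and whose outcome bin is an interval of Lebesgue length 2.

```python
# Hypothetical toy values, chosen only to illustrate the cell-average formula.
p_xy_A = 0.2    # P_XY(A): joint mass of the cell A = A_X x A_Y
p_x_AX = 0.5    # P_X(A_X): marginal mass of the covariate region
mu_y_AY = 2.0   # mu_Y(A_Y): Lebesgue length of the outcome bin

# c_A = P_XY(A) / (P_X(A_X) * mu_Y(A_Y))
c_A = p_xy_A / (p_x_AX * mu_y_AY)
print(c_A)  # 0.2
```

Sanity check: conditional on \(x \in A_X\), the outcome falls in \(A_Y\) with probability \(0.2 / 0.5 = 0.4\); spreading that mass uniformly over a bin of length 2 gives a density of 0.2, matching \(c_A\).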
Empirically, the authors replace the probabilities with sample frequencies, yielding the estimator
\[
\hat f_{\pi}(x, y) = \frac{n_{XY}(A)}{n_X(A_X)\,\mu_Y(A_Y)}
\]
for the cell \(A\) that contains \((x, y)\). When necessary, a normalized version \(\bar f_{\pi}\) ensures that the conditional density integrates to one.
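A minimal sketch of the empirical estimator, assuming a one-dimensional covariate and a one-dimensional continuous outcome on a *fixed* axis-aligned partition (in the paper the partition is learned by the tree; the bin edges here are hypothetical):

```python
import numpy as np

def fhat(x, y, X, Y, x_edges, y_edges):
    """Empirical piecewise-constant conditional density:
    n_XY(A) / (n_X(A_X) * mu_Y(A_Y)) for the cell A containing (x, y)."""
    i = np.searchsorted(x_edges, x, side="right") - 1  # covariate cell index
    j = np.searchsorted(y_edges, y, side="right") - 1  # outcome bin index
    in_AX = (X >= x_edges[i]) & (X < x_edges[i + 1])   # samples with x in A_X
    in_A = in_AX & (Y >= y_edges[j]) & (Y < y_edges[j + 1])  # samples in A
    n_X = in_AX.sum()
    if n_X == 0:
        return 0.0
    length = y_edges[j + 1] - y_edges[j]               # mu_Y(A_Y), Lebesgue
    return in_A.sum() / (n_X * length)

# Synthetic data (illustrative only): Y depends on X with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 1000)
Y = X + rng.normal(0.0, 0.1, 1000)
x_edges = np.array([0.0, 0.5, 1.0])           # hypothetical covariate cells
y_edges = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])  # hypothetical outcome bins
print(fhat(0.25, 0.2, X, Y, x_edges, y_edges))
```

Because the estimator divides the cell's count by \(n_X(A_X)\,\mu_Y(A_Y)\), the resulting density over the outcome bins integrates to one within each covariate cell whenever every sample falls inside the binned range, which is why the normalized \(\bar f_{\pi}\) is only needed "when necessary".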
The tree construction proceeds by recursively partitioning \(Z\). A leaf corresponds to a product cell; splits may act on any covariate or outcome coordinate. Splits on \(X\) refine the covariate region, while splits on \(Y\) refine the outcome bins inside a fixed covariate region, thereby creating a data-adaptive histogram of \(Y\) for each \(x\). The objective function is the conditional negative log-likelihood
\[
L(\pi) = -\mathbb{E}_{P_{XY}}\bigl[\log \hat f_{\pi}(X, Y)\bigr],
\]
which is estimated on the training sample and minimized directly when growing the tree.
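To make the objective concrete, here is a hedged sketch (not the paper's implementation) of scoring candidate *outcome* splits by empirical conditional negative log-likelihood, for a one-dimensional outcome inside a single fixed covariate region; the threshold grid and data are illustrative assumptions:

```python
import numpy as np

def nll_outcome_split(y, lo, hi, t):
    """Empirical NLL of the two-bin piecewise-constant density on [lo, hi]
    obtained by splitting the outcome axis at threshold t."""
    n = len(y)
    n1 = int((y < t).sum())
    n2 = n - n1
    nll = 0.0
    for count, length in ((n1, t - lo), (n2, hi - t)):
        if count > 0:
            dens = (count / n) / length   # piecewise-constant density value
            nll -= count * np.log(dens)   # -sum(log density) over the bin
    return nll / n

# Illustrative bimodal sample: mass near 0 and near 1, a gap in between.
rng = np.random.default_rng(1)
y = np.concatenate([rng.uniform(0.0, 0.2, 500), rng.uniform(0.8, 1.0, 500)])

# Greedy step: pick the threshold with the lowest empirical NLL.
ts = np.linspace(0.05, 0.95, 19)
best = min(ts, key=lambda t: nll_outcome_split(y, 0.0, 1.0, t))
print(best)
```

The selected threshold lands at a cluster boundary (0.2 or 0.8) rather than the midpoint: narrow bins around high-density regions raise the likelihood, which is exactly the data-adaptive binning behavior described above.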