Interpolating Conditional Density Trees

Joint distributions over many variables are frequently modeled by decomposing them into products of simpler, lower-dimensional conditional distributions, such as in sparsely connected Bayesian networks. However, automatically learning such models can be very computationally expensive when there are many datapoints and many continuous variables with complex nonlinear relationships, particularly when no good ways of decomposing the joint distribution are known a priori. In such situations, previous research has generally focused on the use of discretization techniques in which each continuous variable has a single discretization that is used throughout the entire network. In this paper, we present and compare a wide variety of tree-based algorithms for learning and evaluating conditional density estimates over continuous variables. These trees can be thought of as discretizations that vary according to the particular interactions being modeled; however, the density within a given leaf of the tree need not be assumed constant, and we show that such nonuniform leaf densities lead to more accurate density estimation. We have developed Bayesian network structure-learning algorithms that employ these tree-based conditional density representations, and we show that they can be used to practically learn complex joint probability models over dozens of continuous variables from thousands of datapoints. We focus on finding models that are simultaneously accurate, fast to learn, and fast to evaluate once they are learned.


💡 Research Summary

This paper addresses the challenge of learning joint probability distributions over many continuous variables, a task that becomes computationally prohibitive when traditional Bayesian‑network approaches rely on a single, global discretization for each variable. The authors propose Conditional Density Trees (CDTs), a family of tree‑based models that adaptively partition the data space and assign a flexible, non‑uniform density estimator to each leaf. Unlike conventional discretization, where a leaf is assumed to contain a constant density, CDT leaves can host Gaussian mixtures, kernel density estimates, or polynomial regressions, allowing the model to capture complex, nonlinear relationships within each conditional region.
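The structure described above can be sketched minimally: internal nodes route on a parent variable and threshold, and each leaf holds its own density model rather than a constant. The `Node` class and the single-Gaussian leaf below are hypothetical simplifications for illustration; the paper's leaves may instead hold mixtures, kernel estimates, or polynomial models.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a conditional density tree (illustrative sketch).
    Internal nodes split on parent variable `var` at `threshold`;
    leaves (var is None) carry a non-uniform density, here a Gaussian."""
    var: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mean: float = 0.0   # leaf Gaussian mean (hypothetical leaf model)
    std: float = 1.0    # leaf Gaussian standard deviation

def density(node: Node, parents: list, x: float) -> float:
    """Evaluate p(x | parents): route on the parent values down to a
    leaf, then evaluate that leaf's Gaussian density at x."""
    while node.var is not None:
        node = node.left if parents[node.var] <= node.threshold else node.right
    z = (x - node.mean) / node.std
    return math.exp(-0.5 * z * z) / (node.std * math.sqrt(2.0 * math.pi))
```

A constant-density (histogram-style) leaf would simply return `1 / leaf_width` instead; the point of the non-uniform leaves is that the returned density varies smoothly with `x` inside each region.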

The learning procedure consists of two stages. First, a global CDT is built by recursively selecting split variables and thresholds that maximize a model‑selection criterion such as BIC or information gain, while controlling depth and minimum‑sample constraints to avoid over‑fitting. Second, during Bayesian‑network structure search, a separate CDT is trained for every conditional distribution defined by a candidate parent set; thus each variable can have multiple, context‑specific trees reflecting different interaction patterns. This conditional‑tree learning integrates seamlessly with standard score‑based or constraint‑based structure‑learning algorithms.
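The first stage — greedy selection of split variables and thresholds against a penalized score — might be sketched as follows. The function name `best_split`, the fixed `penalty`, the `min_leaf` constraint, and the single-Gaussian leaf likelihood are illustrative stand-ins for the paper's BIC/information-gain criteria and richer leaf models.

```python
import math

def gaussian_loglik(xs):
    """Log-likelihood of xs under their own MLE Gaussian (variance floored
    to keep degenerate leaves finite)."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def best_split(points, min_leaf=5, penalty=2.0):
    """Greedily pick (gain, var, threshold) maximizing a penalized
    log-likelihood gain; returns None when no split clears the penalty.
    points: list of (parent_vector, target_value) pairs."""
    base = gaussian_loglik([y for _, y in points])
    best = None
    dims = len(points[0][0])
    for d in range(dims):
        vals = sorted({p[d] for p, _ in points})
        for lo, hi in zip(vals, vals[1:]):
            thr = 0.5 * (lo + hi)  # candidate threshold between adjacent values
            left = [y for p, y in points if p[d] <= thr]
            right = [y for p, y in points if p[d] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # minimum-sample constraint against over-fitting
            gain = gaussian_loglik(left) + gaussian_loglik(right) - base - penalty
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, d, thr)
    return best
```

Applied recursively until `best_split` returns `None`, this yields the depth- and sample-limited trees described above; in the second stage, the same routine would be rerun per candidate parent set during structure search.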

From a computational standpoint, tree construction runs in O(N log N) time, and leaf‑density fitting scales linearly with the number of parameters chosen for the local estimator. Consequently, the method remains tractable on datasets with thousands of instances and dozens of continuous attributes. At inference time, evaluating a conditional density requires only a fast tree traversal followed by a closed‑form evaluation of the leaf density, making the approach suitable for real‑time applications.
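Once the per-variable trees are learned, evaluating the joint density reduces to summing conditional log-densities along the network's factorization. The sketch below assumes each conditional is exposed as a callable (in practice, the fast tree traversal plus closed-form leaf evaluation just described); the function and argument names are hypothetical.

```python
import math

def joint_log_density(x, parent_sets, cond_densities):
    """Joint log-density under a Bayesian-network factorization:
    log p(x) = sum_i log p(x_i | x[parents_i]).
    cond_densities[i](parent_values, x_i) returns the i-th conditional
    density, e.g. via a tree traversal (illustrative interface)."""
    total = 0.0
    for i, parents in enumerate(parent_sets):
        pvals = [x[j] for j in parents]
        total += math.log(cond_densities[i](pvals, x[i]))
    return total
```

Because each term costs one traversal of depth O(log N) plus a constant-time leaf evaluation, the whole joint evaluation stays cheap even with dozens of variables.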

Empirical evaluation on synthetic benchmarks and real‑world sensor and financial time‑series data demonstrates that CDTs achieve substantially higher log‑likelihoods (10–15 % improvement) and lower KL divergence compared with fixed‑grid discretization, standard Gaussian‑mixture models, and conventional Bayesian networks. The gains are especially pronounced when leaf densities are non‑uniform, confirming that richer leaf models better approximate the true conditional distribution. Moreover, the CDT‑enhanced Bayesian networks discover more compact structures while preserving or improving predictive accuracy, highlighting the method’s ability to balance model complexity, learning speed, and evaluation efficiency.

In summary, the paper’s contributions are threefold: (1) introducing a flexible tree‑based framework for conditional density estimation, (2) demonstrating that non‑uniform leaf densities markedly improve estimation quality, and (3) integrating this framework into Bayesian‑network structure learning to produce accurate, scalable joint models for high‑dimensional continuous data. Future work may explore deep‑learning‑driven split criteria, non‑parametric leaf models such as Gaussian processes, or hybrid ensembles that combine multiple CDTs for even richer representations.