Learning Latent Tree Graphical Models
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups dataset.
💡 Research Summary
The paper addresses the problem of learning latent‑tree graphical models when only a subset of variables is observed. A latent tree consists of hidden (latent) nodes and observed nodes connected in a tree structure; the hidden nodes are not directly measured but explain the dependencies among the observed variables. Existing approaches either force the observed variables to be leaves or rely on global optimization over all variables, which quickly becomes computationally prohibitive as the number of variables grows.
The authors propose two novel, provably consistent, and computationally efficient algorithms that remove these restrictions.
The first algorithm, Recursive Grouping (RG), builds the tree bottom‑up using an "information distance" metric. For Gaussian variables, the information distance between two nodes is the negative logarithm of the absolute value of their correlation coefficient (a determinant‑based analogue is used in the discrete case); crucially, in a tree it is additive along the unique path connecting the two nodes. RG computes pairwise information distances among the observed variables, identifies groups of siblings (variables that share a common hidden parent) by comparing differences of distances measured against common witness nodes, introduces a new hidden node for each sibling group, and contracts the group into a single composite node. This process repeats recursively until a minimal latent tree is obtained: one that contains no redundant hidden nodes, in the sense that every hidden node has at least three neighbors. The authors prove that, given enough samples, RG recovers the true latent tree with probability approaching one.
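The two properties RG relies on can be illustrated with a minimal sketch. The example below assumes Gaussian variables and made‑up edge correlations for a toy tree (hidden parent `h` with observed children `x1`, `x2`, plus a third observed node `x3`); it checks path additivity of the information distance and the sibling test based on distance differences:

```python
import math

# Hypothetical toy tree: hidden node h with observed children x1, x2,
# and a third observed node x3 also attached to h. Edge correlations
# below are made up for illustration.
rho_x1_h, rho_x2_h, rho_x3_h = 0.9, 0.8, 0.5

def info_dist(rho):
    """Information distance for a Gaussian pair: -log |correlation|."""
    return -math.log(abs(rho))

# Path additivity: correlations multiply along a path of the tree,
# so d(x1, x2) = d(x1, h) + d(h, x2).
d12 = info_dist(rho_x1_h * rho_x2_h)
assert abs(d12 - (info_dist(rho_x1_h) + info_dist(rho_x2_h))) < 1e-12

# Sibling test used by recursive grouping: Phi(i, j; k) = d_ik - d_jk.
# If Phi is the same for every witness k and lies strictly between
# -d_ij and d_ij, then i and j are siblings; if Phi equals +/- d_ij,
# one of the two nodes is the parent of the other.
d13 = info_dist(rho_x1_h * rho_x3_h)
d23 = info_dist(rho_x2_h * rho_x3_h)
phi = d13 - d23
assert -d12 < phi < d12  # x1 and x2 pass the sibling test via witness x3
```

With real data the distances are estimated from samples, so the equalities above hold only approximately and the tests use thresholds.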
The second algorithm, CLGrouping, dramatically reduces the computational burden of RG by first performing a global preprocessing step. It constructs a tree over the observed variables alone using the Chow‑Liu algorithm: a maximum‑weight spanning tree over pairwise mutual informations, which is equivalently a minimum spanning tree with respect to the information distances. This observed‑only tree serves as a proxy for the true latent structure, clustering together observed variables that are likely to be close in the underlying latent tree. The algorithm then applies RG (or an equivalent local grouping procedure, such as neighbor‑joining) within each local neighborhood of this tree, replacing the neighborhood with the latent subtree it produces. Because each local step involves far fewer nodes than the whole dataset, the overall cost is dominated by the global tree construction, roughly O(n² log n) for n observed variables, rather than by repeated grouping over all variables at once.
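The preprocessing step can be sketched as follows. This is a minimal, self‑contained illustration with a made‑up 4×4 correlation matrix: it converts correlations to information distances and runs Prim's algorithm to get the minimum spanning tree over the observed nodes, which is the observed‑only tree that guides the subsequent local grouping:

```python
import math

# Made-up correlation matrix over four observed Gaussian variables.
rho = [
    [1.00, 0.72, 0.45, 0.40],
    [0.72, 1.00, 0.40, 0.36],
    [0.45, 0.40, 1.00, 0.80],
    [0.40, 0.36, 0.80, 1.00],
]
n = len(rho)

# Information distances: d_ij = -log |rho_ij| (0 on the diagonal).
dist = [[-math.log(abs(rho[i][j])) if i != j else 0.0 for j in range(n)]
        for i in range(n)]

# Prim's algorithm: the MST over information distances coincides with
# the Chow-Liu (maximum mutual-information) tree for this model class.
in_tree = {0}
edges = []
while len(in_tree) < n:
    i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
               key=lambda e: dist[e[0]][e[1]])
    edges.append((i, j))
    in_tree.add(j)

print(sorted(edges))  # spanning tree over the observed nodes only
```

CLGrouping would then run RG locally on small neighborhoods of this tree instead of on all pairwise distances at once.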
Both algorithms are also presented in regularized forms. By incorporating model‑selection criteria such as the Bayesian Information Criterion (BIC), the regularized versions balance fit to the data against tree complexity, preventing the proliferation of unnecessary hidden nodes and allowing the methods to produce latent‑tree approximations of arbitrary distributions.
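The trade‑off that the BIC penalty enforces can be shown with a toy calculation. The log‑likelihood values and parameter counts below are invented for illustration; the point is only that a hidden node is kept when its likelihood gain exceeds the complexity penalty it incurs:

```python
import math

def bic(log_likelihood, num_params, num_samples):
    """BIC score: log-likelihood minus (k/2) * log(n) complexity penalty."""
    return log_likelihood - 0.5 * num_params * math.log(num_samples)

n_samples = 1000

# Hypothetical candidate trees: the second adds a hidden node, gaining
# 15.0 in log-likelihood at the cost of 6 extra parameters.
score_without_hidden = bic(-5210.0, num_params=12, num_samples=n_samples)
score_with_hidden = bic(-5195.0, num_params=18, num_samples=n_samples)

# The extra node costs 3 * log(1000) ~ 20.7 in penalty but buys only
# 15.0 in likelihood, so the simpler tree scores higher here.
assert score_without_hidden > score_with_hidden
```

With more samples the same likelihood gain would eventually be worth the penalty only if it grows with n, which is what keeps spurious hidden nodes out of the learned tree.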
The experimental evaluation is extensive. Synthetic experiments involve hidden Markov models, star‑shaped graphs, and randomly generated latent trees, varying the sample size and the number of nodes. Results show that CLGrouping consistently outperforms plain RG and other baselines such as Neighbor‑Joining in both accuracy of structure recovery and runtime, especially as the number of variables grows. Real‑world case studies include modeling monthly returns of stocks in the S&P index and the word co‑occurrence structure of the 20 Newsgroups text corpus. In the financial data, the learned latent tree reveals sector‑level dependencies; in the text data, it uncovers a hierarchical organization of topics. The regularized models demonstrate that even when an exact tree representation is impossible, the algorithms still yield useful approximations that capture the dominant dependency patterns.
In summary, the paper makes three key contributions: (1) a recursive grouping algorithm that leverages information distances to construct minimal latent trees without assuming the observed nodes are leaves; (2) a clustering‑guided extension (CLGrouping) that combines a global observed‑only tree with local grouping to achieve substantially better scalability; and (3) regularized variants that enable latent‑tree approximation of general distributions. The theoretical guarantees, algorithmic simplicity, and strong empirical performance suggest that these methods will be valuable for a wide range of applications where hidden hierarchical structure must be inferred from partially observed data.