The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational inference algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model (CTM). To deal with large-scale data sets, we also develop an online inference algorithm for DILN and compare it with online HDP and online LDA on Nature magazine, which contains approximately 350,000 articles.
💡 Research Summary
The paper introduces the Discrete Infinite Logistic Normal distribution (DILN), a Bayesian non‑parametric prior that extends the Hierarchical Dirichlet Process (HDP) by allowing correlations among the group‑level mixture weights. While the HDP shares an infinite set of atoms across groups, the weights each group assigns to those atoms are governed by a Dirichlet process centered on the shared top‑level measure, which precludes any explicit modeling of dependence between components. DILN overcomes this limitation by assigning each atom a latent location in a low‑dimensional space and defining a kernel function over these locations; the kernel determines the covariance among the weights. Mathematically, DILN can be viewed as an HDP scaled by an exponentiated Gaussian process, which admits an equivalent representation as a normalized collection of gamma‑distributed random variables. This construction yields a non‑parametric analogue of the logistic‑normal prior: the infinite set of component probabilities is generated by exponentiating a multivariate Gaussian, scaling, and normalizing, thereby inheriting a full covariance structure that captures inter‑topic correlations.
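The normalized‑gamma construction described above can be illustrated with a minimal sketch. The truncation level `K`, latent dimensionality `d`, RBF kernel, and concentration `alpha` below are illustrative assumptions, not the paper's exact choices: top‑level stick‑breaking weights set the gamma shapes (the HDP part), and an exponentiated Gaussian‑process draw over latent atom locations scales them (the correlation part) before normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 20, 2  # truncation level and latent-location dimensionality (assumed)

# Top-level stick-breaking (GEM) weights shared across groups, as in the HDP
v = rng.beta(1.0, 5.0, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

# Each atom gets a latent location; an RBF kernel over these locations
# defines the covariance of the Gaussian process evaluated at the atoms
loc = rng.normal(size=(K, d))
sqdist = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(axis=-1)
cov = np.exp(-0.5 * sqdist) + 1e-6 * np.eye(K)  # jitter for stability

# Group-level weights: gamma variables (the HDP part) scaled by an
# exponentiated GP draw (the correlation part), then normalized
alpha = 5.0                                    # concentration (assumed)
f = rng.multivariate_normal(np.zeros(K), cov)  # GP values at the atoms
z = rng.gamma(shape=alpha * beta, scale=np.exp(f))
pi = z / z.sum()                               # correlated mixture weights
```

Atoms whose latent locations are close receive similar GP values, so their weights rise and fall together across groups, which is exactly the correlation structure the HDP cannot express.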
The authors develop a mean‑field variational inference algorithm based on the normalized‑gamma representation. Variational parameters include the topic‑word distributions, document‑topic proportions, and the latent locations of topics. The kernel matrix linking the locations appears in the variational updates, but because the latent space is restricted to a modest dimensionality, the computational burden remains tractable. To handle massive corpora, they further derive a stochastic variational inference (SVI) scheme that processes mini‑batches, updating the global variational parameters with natural‑gradient steps. This SVI version retains the ability to learn both the number of topics (through the non‑parametric construction) and their correlation structure.
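The natural‑gradient mini‑batch update has a standard shape for conjugate global parameters, which the sketch below illustrates. The names (`lam`, `eta`, `rho`) and the step‑size schedule are generic SVI conventions assumed for illustration, not the paper's exact notation: the batch statistics are rescaled as if the whole corpus looked like the mini‑batch, and the result is blended with the current parameter using a decaying step size.

```python
import numpy as np

def svi_step(lam, eta, suff_stats, D, batch_size, t, kappa=0.7, tau=1.0):
    """One natural-gradient SVI step on a conjugate global parameter.

    lam         current global variational parameter (array)
    eta         prior hyperparameter
    suff_stats  expected sufficient statistics from the mini-batch
    D           corpus size; batch statistics are rescaled by D / batch_size
    t           iteration counter; rho = (t + tau)**(-kappa) decays so that
                the Robbins-Monro conditions for convergence hold
    """
    rho = (t + tau) ** (-kappa)
    lam_hat = eta + (D / batch_size) * suff_stats  # noisy full-corpus estimate
    return (1.0 - rho) * lam + rho * lam_hat       # natural-gradient step
```

At `t = 0` with `tau = 1` the step size is 1, so the first update jumps straight to the noisy estimate; later iterations average it in ever more conservatively.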
Empirical evaluation is performed on four medium‑sized text collections—Wikipedia (10 k documents), Science magazine, The New York Times, and The Huffington Post—and on a large‑scale Nature corpus containing roughly 350 k articles. DILN is compared against the HDP and the Correlated Topic Model (CTM). Across all datasets, DILN achieves lower perplexity, indicating superior predictive performance. Moreover, the learned topic correlation matrices are interpretable: for example, political, military, and economic topics exhibit strong positive correlations, while topics about food show negative correlations with those same themes. Visualizations of the latent topic locations provide an intuitive map of the thematic landscape. In the large‑scale Nature experiment, the stochastic variational algorithm scales efficiently, converging faster and yielding better held‑out likelihood than online HDP and online LDA baselines.
In summary, DILN furnishes a principled way to embed correlation structure within a Bayesian non‑parametric mixed‑membership model, preserving the flexibility of the HDP while extending it with a logistic‑normal‑type covariance. The normalized‑gamma formulation enables tractable variational inference, and the stochastic extension makes the approach applicable to corpora of hundreds of thousands of documents. Future work may explore richer kernels (e.g., deep neural embeddings), extensions to multimodal data, or hierarchical DILN constructions that incorporate observed covariates.