Expectation-Propagation for the Generative Aspect Model
The generative aspect model is an extension of the multinomial model for text that allows word probabilities to vary stochastically across documents. Previous results with aspect models have been promising, but hindered by the computational difficulty of carrying out inference and learning. This paper demonstrates that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model. We develop an alternative approach that leads to higher accuracy at comparable cost. An extension of Expectation-Propagation is used for inference and then embedded in an EM algorithm for learning. Experimental results are presented for both synthetic and real data sets.
💡 Research Summary
The paper addresses inference and learning challenges in the Generative Aspect Model (GAM), an extension of the multinomial model that allows word probabilities to vary across documents through a mixture of latent “aspects” (or topics). While GAM captures document‑level thematic variability, exact posterior inference is intractable because each document’s mixing proportions are continuous Dirichlet variables coupled with per‑aspect word distributions.
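Concretely, the generative process described above can be sketched as follows. This is a minimal illustration in our own notation (`alpha`, `phi`, and the helper name are not code from the paper): a document draws its mixing proportions from a Dirichlet, then each token draws an aspect and then a word.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_document(alpha, phi, n_words):
    """Sample one document from the generative aspect model.
    alpha: Dirichlet prior over aspects; phi[a, w]: per-aspect word distributions."""
    lam = rng.dirichlet(alpha)                 # document-specific mixing proportions
    aspects = rng.choice(len(alpha), size=n_words, p=lam)
    words = np.array([rng.choice(phi.shape[1], p=phi[a]) for a in aspects])
    return lam, words
```

Inference reverses this process: given `words`, recover a posterior over `lam`, which is intractable because `lam` is continuous and coupled across all tokens.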
Previous work, most notably Blei et al. (2001), applied Variational Bayes (VB) to obtain a lower bound on the log‑likelihood and used this bound for both inference and EM‑based learning. The authors demonstrate that VB's minimisation of KL(q‖p) is zero‑forcing and systematically under‑estimates posterior uncertainty: the variational posterior tends to concentrate on a single mode, producing over‑confident aspect assignments and biased parameter updates. Empirical tests on synthetic data reveal large gaps between the variational bound and the true log‑likelihood, indicating that VB can yield inaccurate inferences and consequently biased learning.
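The asymmetry between the two KL directions can be shown with a toy numerical experiment (ours, not from the paper): fitting a single Gaussian to a bimodal "posterior", the KL(p‖q)-optimal fit (EP-style) matches the moments of p and covers both modes, while the KL(q‖p)-optimal fit (VB-style) locks onto one mode and reports a much smaller variance.

```python
import numpy as np

x = np.linspace(-6, 6, 2001)
dx = x[1] - x[0]

def gauss(m, s):
    g = np.exp(-0.5 * ((x - m) / s) ** 2)
    return g / (g * dx).sum()          # normalised on the grid

# bimodal "true posterior": equal mixture of two well-separated Gaussians
p = 0.5 * gauss(-2.0, 0.5) + 0.5 * gauss(2.0, 0.5)

def kl(a, b):
    mask = a > 1e-12
    return (a[mask] * np.log(a[mask] / np.maximum(b[mask], 1e-300)) * dx).sum()

# KL(p||q) optimum within the Gaussian family is exact moment matching (EP-style)
m_ep = (x * p * dx).sum()
s_ep = np.sqrt(((x - m_ep) ** 2 * p * dx).sum())

# KL(q||p) optimum (VB-style) found by grid search: it locks onto one mode
grid = ((m, s) for m in np.linspace(-3, 3, 61) for s in np.linspace(0.2, 3.0, 57))
m_vb, s_vb = min(grid, key=lambda ms: kl(gauss(*ms), p))

print(f"EP-style fit: mean={m_ep:.2f}, sd={s_ep:.2f}")   # broad, mass-covering
print(f"VB-style fit: mean={m_vb:.2f}, sd={s_vb:.2f}")   # narrow, mode-seeking
```

The VB-style fit reports a standard deviation near 0.5 (one mode's width), while the EP-style fit reports roughly 2, the true spread of p.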
To overcome these limitations, the authors propose an Expectation‑Propagation (EP) framework for GAM. EP decomposes the joint distribution into a product of factors (the Dirichlet prior on the mixing proportions and one likelihood term per observed word token) and iteratively refines an approximating distribution by removing each factor, incorporating the exact factor, and projecting the result back onto a tractable family. Unlike VB, EP minimises KL(p‖q) for each factor, which forces the approximating distribution to cover the mass of the true posterior and to match its moments rather than lock onto a single mode. The algorithm proceeds as follows:
- Initialise a site approximation for each observed word token, so that the product of the Dirichlet prior and the sites yields a tractable Dirichlet approximation to the posterior over the document's mixing vector.
- For each observed word token, compute the cavity distribution by dividing out the current site, multiply by the exact likelihood of that token, and then project the result back onto the Dirichlet‑multinomial family by moment matching.
- Update the site parameters with the new projection and repeat until convergence.
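The loop above can be sketched as follows. This is our own minimal sketch, not the paper's exact parameterisation: it exploits the fact that the word likelihood is linear in the mixing vector, so the moments of a Dirichlet times one word factor are available in closed form, and it uses a standard projection that matches the mean exactly and averages the per-component precision estimates.

```python
import numpy as np

def moments_after_word(gamma, phi_w):
    """Mean and second moments of lam under Dir(gamma) * sum_a lam[a]*phi_w[a].
    Closed form because the word likelihood is linear in lam."""
    g0 = gamma.sum()
    Z = phi_w @ gamma / g0                                   # normaliser
    m1 = gamma * (Z * g0 + phi_w) / (Z * g0 * (g0 + 1.0))
    m2 = (gamma * (gamma + 1.0) * (Z * g0 + 2.0 * phi_w)
          / (Z * g0 * (g0 + 1.0) * (g0 + 2.0)))
    return m1, m2

def project_to_dirichlet(m1, m2):
    """Dirichlet matching the mean exactly; precision averages the
    per-component estimates implied by the second moments."""
    precision = ((m1 - m2) / (m2 - m1 ** 2)).mean()
    return precision * m1

def ep_mixing_posterior(alpha, phi, doc, n_sweeps=20):
    """EP approximation Dir(gamma) to the posterior over a document's mixing
    vector. alpha: prior; phi[a, w]: aspect word probabilities; doc: word ids."""
    sites = np.zeros((len(doc), len(alpha)))                 # one site per token
    gamma = alpha.astype(float).copy()
    for _ in range(n_sweeps):
        for i, w in enumerate(doc):
            cavity = gamma - sites[i]                        # divide out this site
            if np.any(cavity <= 0):                          # skip invalid cavities
                continue
            m1, m2 = moments_after_word(cavity, phi[:, w])   # include exact factor
            gamma = project_to_dirichlet(m1, m2)             # moment matching
            sites[i] = gamma - cavity                        # refreshed site
    return gamma
```

The cavity guard is a common practical safeguard: site updates can transiently drive a Dirichlet parameter negative, in which case the update for that token is skipped on this sweep.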
The EP‑derived posterior moments are then embedded in an EM learning scheme. In the E‑step, the expected sufficient statistics for the mixing proportions and aspect assignments are obtained from the EP posterior. In the M‑step, these expectations are used to update the aspect word distributions (φ_k) and the Dirichlet hyper‑parameters (α) by maximising the expected complete‑data log‑likelihood (or a MAP objective). Because the E‑step now uses a more accurate posterior, the M‑step receives less biased statistics, leading to improved parameter estimates.
The authors evaluate the method on two fronts. First, on synthetic corpora where the true aspects and mixing weights are known, EP‑EM recovers the ground‑truth parameters with significantly higher log‑likelihood (5–10 % improvement) and lower reconstruction error compared with VB‑EM. Second, on real‑world text collections such as the 20 Newsgroups and Reuters datasets, EP‑EM yields higher topic coherence, lower perplexity, and better document clustering performance (approximately 3 % higher F1 score) than standard LDA implemented with variational inference. Computationally, each EP iteration sweeps over per‑token site updates, but EP converges in fewer iterations; overall runtime is comparable to, or modestly faster than, the variational baseline.
In summary, the paper demonstrates that Expectation‑Propagation provides a principled and efficient alternative to variational methods for the Generative Aspect Model. By delivering more accurate posterior approximations at similar computational cost, EP‑EM mitigates the bias inherent in VB and improves both inference quality and learned model parameters. The work opens avenues for extending EP‑based inference to multimodal data, online streaming scenarios, and deeper hierarchical topic structures.