Spectral Learning for Supervised Topic Models
Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffer from getting trapped in local optima. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA followed by a power update method to recover the regression model parameters. Then, we further present a single-phase spectral algorithm to jointly recover the topic distribution matrix as well as the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone achieves comparable or even better performance than state-of-the-art methods, whereas previous work on spectral methods has rarely reported such promising performance.
💡 Research Summary
This paper tackles the problem of learning supervised Latent Dirichlet Allocation (sLDA) parameters using spectral methods rather than the traditional variational inference or Gibbs sampling approaches. The authors first introduce a two‑stage spectral algorithm. In the first stage, they apply existing spectral techniques for unsupervised LDA to recover the topic‑word matrix O from low‑order observable moments (first, second, and third order). After whitening and robust tensor power decomposition, they obtain a canonical version of O that satisfies orthogonality conditions. In the second stage, they propose a novel "power‑update" step that leverages a newly defined third‑order moment involving the response variable y and word occurrences. This moment yields a closed‑form linear system from which the regression weight vector η and the noise variance σ² are recovered, with the previously estimated O held fixed. The two‑stage method enjoys a sample complexity comparable to that of vanilla LDA, but because supervision is not used when estimating O, its predictive performance can be slightly inferior to that of fully Bayesian methods.
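The core computational primitive behind the first stage is the robust tensor power method: once the third-order moment tensor has been whitened into an (approximately) symmetric, orthogonally decomposable form, power iteration with deflation recovers its rank-1 components, from which the topic directions are read off. The sketch below, in the spirit of the method of Anandkumar et al., illustrates this on an already-whitened k×k×k tensor; the function name, iteration counts, and restart scheme are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def tensor_power_method(T, k, n_iters=100, n_restarts=10, seed=None):
    """Recover the eigenpairs of a symmetric, orthogonally decomposable
    3-tensor T (shape k x k x k) via power iteration with deflation.
    Simplified sketch; hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    vecs, vals = [], []
    for _ in range(k):
        best_v, best_val = None, -np.inf
        # Random restarts guard against bad initializations.
        for _ in range(n_restarts):
            v = rng.standard_normal(k)
            v /= np.linalg.norm(v)
            for _ in range(n_iters):
                # Tensor power update: v <- T(I, v, v) / ||T(I, v, v)||
                v = np.einsum('ijk,j,k->i', T, v, v)
                v /= np.linalg.norm(v)
            val = np.einsum('ijk,i,j,k->', T, v, v, v)
            if val > best_val:
                best_v, best_val = v, val
        vecs.append(best_v)
        vals.append(best_val)
        # Deflate: remove the recovered rank-1 component and repeat.
        T -= best_val * np.einsum('i,j,k->ijk', best_v, best_v, best_v)
    return np.array(vecs), np.array(vals)
```

In the actual algorithm, the recovered eigenvectors are un-whitened to obtain the columns of O (up to permutation and scaling); the second-stage power update then reuses the same whitened basis when solving the linear system for η.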
To overcome this limitation, the authors develop a single‑phase (joint) spectral algorithm. They construct an extended topic vector v_i =