Higher-Order Markov Tag-Topic Models for Tagged Documents and Images
This paper studies topic modeling for tagged documents and images. Higher-order relations among tagged documents and images are ubiquitous and play a positive role in extracting reliable and interpretable topics. In this paper, we propose the tag-topic model (TTM) to depict such higher-order topic structural dependencies within the Markov random field (MRF) framework. First, we introduce a novel factor graph representation of latent Dirichlet allocation (LDA)-based topic models from the MRF perspective, and present an efficient loopy belief propagation (BP) algorithm for approximate inference and parameter estimation. Second, we propose a factor hypergraph representation of TTM that models both pairwise and higher-order relations among tagged documents and images. An efficient loopy BP algorithm is developed to learn TTM, encouraging topic labeling smoothness among tagged documents and images. Extensive experimental results confirm that incorporating higher-order relations effectively enhances overall topic modeling performance compared with current state-of-the-art topic models on many text and image mining tasks of broad interest, such as word and link prediction, document classification, and tag recommendation.
💡 Research Summary
The paper addresses the problem of topic modeling for documents and images that are annotated with multiple tags. While conventional topic models such as Latent Dirichlet Allocation (LDA) treat documents as bags of words and capture only pairwise relationships (e.g., citations, co‑authorship), they ignore the richer higher‑order structures that arise when several tags simultaneously describe a single document or image. To fill this gap, the authors propose the Tag‑Topic Model (TTM), a framework that explicitly incorporates both pairwise and higher‑order tag relations within a Markov Random Field (MRF) formulation.
First, the authors reinterpret LDA from an MRF perspective. They convert the directed generative graph of LDA into an undirected factor graph where the document‑specific topic proportion θ_d and the topic‑specific word distribution φ_j become factor nodes, and each word’s latent topic label z_{w,d} is a variable node connected to both factors. This representation allows the topic assignment problem to be viewed as a labeling task, enabling the use of smoothness or sparsity priors that are common in MRF inference.
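The message-passing view of LDA described above can be sketched in a few lines of NumPy. This is a minimal illustration of BP-style updates on the LDA factor graph, not the paper's exact algorithm: each message μ_{w,d}(k) combines the document-side factor (expected doc-topic counts excluding the current token, plus α) with the word-side factor (expected word-topic counts excluding the current token, plus β). The function name and hyperparameter defaults are illustrative.

```python
import numpy as np

def lda_bp(X, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Sketch of loopy BP for LDA on its factor graph.
    X: (D, W) term-count matrix; K: number of topics.
    mu[d, w, k] approximates the posterior that occurrences of
    word w in document d take topic k."""
    rng = np.random.default_rng(seed)
    D, W = X.shape
    mu = rng.random((D, W, K))
    mu /= mu.sum(-1, keepdims=True)
    for _ in range(iters):
        cnt = X[:, :, None] * mu          # expected topic counts per (d, w)
        theta_hat = cnt.sum(1)            # (D, K) doc-topic counts (theta factor)
        phi_hat = cnt.sum(0)              # (W, K) word-topic counts (phi factor)
        nk = phi_hat.sum(0)               # (K,) total counts per topic
        # combine both factors, excluding the current token's own contribution
        new = (theta_hat[:, None, :] - cnt + alpha) * \
              (phi_hat[None, :, :] - cnt + beta) / (nk - cnt + W * beta)
        mu = new / new.sum(-1, keepdims=True)
    # read off parameter estimates from the final expected sufficient statistics
    cnt = X[:, :, None] * mu
    theta = cnt.sum(1) + alpha
    theta /= theta.sum(1, keepdims=True)
    phi = cnt.sum(0) + beta
    phi /= phi.sum(0, keepdims=True)
    return mu, theta, phi
```

Viewing the update this way makes the labeling-task interpretation concrete: μ is a soft topic labeling of every token, and the two factors act as data-driven priors pulling each label toward its document's and its word's dominant topics.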
Second, they extend the factor graph to a factor hypergraph that mirrors the bipartite tag‑document/image structure. A hyperedge connects a set of tags {γ_1,…,γ_M} with a single document or image δ. The associated hyper‑factor encodes a higher‑order potential that favors configurations where all tags in the hyperedge jointly support a consistent topic labeling. Although a naïve higher‑order potential would require J^M possible configurations (J = number of topics, M = hyperedge order), the authors exploit the smoothness assumption to restrict attention to only J·M “representative” configurations, dramatically reducing computational complexity.
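The complexity reduction above can be illustrated with a Potts-style higher-order potential (a stand-in for the paper's hyper-factor; the exact potential form and parameter names here are assumptions). With a cost that depends only on whether all M labels in a hyperedge agree, the max-product message from the hyper-factor needs only the J homogeneous configurations plus one bound over all mixed configurations, giving O(J·M) work instead of O(J^M):

```python
import numpy as np

def pn_potts_messages(msgs, theta_same=0.1, theta_diff=2.0):
    """Max-product messages from an order-M hyper-factor under a
    P^n-Potts-style smoothness potential (illustrative, not the
    paper's exact potential). msgs: (M, J) incoming log-messages.
    Cost is theta_same if all M labels agree, theta_diff otherwise."""
    M, J = msgs.shape
    total = msgs.sum(0)      # (J,) score of the J homogeneous configurations
    free = msgs.max(1)       # (M,) each node's best label, configuration-free
    out = np.empty((M, J))
    for i in range(M):
        others_same = total - msgs[i]       # the other M-1 nodes all take k
        others_free = free.sum() - free[i]  # bound over all mixed configurations
        out[i] = np.maximum(others_same - theta_same,
                            others_free - theta_diff)
    return out
```

Because theta_same < theta_diff, the homogeneous branch dominates whenever agreement is plausible, which is exactly the smoothness behavior the hyper-factor is meant to encourage.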
Inference and parameter learning are performed using loopy belief propagation (BP). Messages are passed between variable nodes (topic labels) and factor/hyper‑factor nodes, yielding an approximation of the posterior p(z | w, tags). The BP updates naturally provide expected sufficient statistics for θ and φ, which are then updated in an EM‑like fashion. The algorithm avoids the expensive Gibbs sampling used in many Bayesian topic models and converges in far fewer iterations.
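A generic driver for such an iterative scheme might look as follows; this is a sketch of the outer loop only (the paper does not specify its exact schedule or stopping rule, so the tolerance and iteration cap here are illustrative). It runs any message-update step to a fixed point and reports how many sweeps were needed, which is the quantity behind the "far fewer iterations" comparison with Gibbs sampling:

```python
import numpy as np

def run_until_converged(step, mu0, tol=1e-4, max_iters=200):
    """Run a message-update function `step` until the mean absolute
    change in the messages falls below `tol`. Returns the converged
    messages and the number of sweeps performed."""
    mu = mu0
    for t in range(max_iters):
        new = step(mu)
        delta = np.abs(new - mu).mean()
        mu = new
        if delta < tol:
            break
    return mu, t + 1
```

Each sweep here corresponds to one full pass of BP messages, after which the expected sufficient statistics for θ and φ can be refreshed in the EM-like fashion described above.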
The experimental evaluation covers both textual corpora (e.g., DBLP, NIPS) and image collections (e.g., Flickr) where documents/images are associated with multiple author or annotation tags. Three downstream tasks are examined: (1) word and link prediction, (2) document classification, and (3) tag recommendation. TTM is compared against a suite of baselines, including plain LDA, Relational Topic Model (RTM), Author‑Topic Model (ATM), and Labeled LDA (L‑LDA). Across all tasks, TTM consistently outperforms the baselines, achieving improvements of 3–12 % in accuracy, F1, or MAP. The gains are especially pronounced when the higher‑order hyper‑factors are active, confirming that modeling multi‑tag interactions captures latent thematic structures that pairwise models miss.
The paper also discusses limitations. The design of the higher‑order potential relies on manually specified smoothness/sparsity priors, which may need adaptation for different domains. Noisy or irrelevant tags can degrade performance because the hyper‑factor may enforce inappropriate smoothness. Moreover, the current formulation assumes a static set of tags; handling dynamic tag addition or deletion remains an open problem.
In conclusion, the authors demonstrate that (i) recasting topic modeling as an MRF labeling problem enables the application of efficient graph‑based inference, and (ii) explicitly modeling higher‑order tag relations via a factor hypergraph yields measurable improvements in topic quality and downstream task performance. Future work is suggested in automatic learning of higher‑order potentials (e.g., via neural networks), integration of unstructured metadata (comments, user behavior), and hybridization with deep generative models such as variational autoencoders to further enrich the representation of tagged multimodal data.