Bayesian Multitask Learning with Latent Hierarchies


We learn multiple hypotheses for related tasks under a latent hierarchical relationship between tasks. We exploit the intuition that for domain adaptation, we wish to share classifier structure, but for multitask learning, we wish to share covariance structure. Our hierarchical model is seen to subsume several previously proposed multitask learning models and performs well on three distinct real-world data sets.


💡 Research Summary

The paper presents a unified Bayesian framework for multitask learning (MTL) and domain adaptation (DA) that leverages a latent hierarchical relationship among tasks. Traditional MTL approaches typically share model parameters or regularize them jointly, while DA methods focus on aligning source and target distributions. Both paradigms assume some form of task relatedness, but they usually require an explicit specification of the relationship, which is often unavailable or inaccurate.

To overcome this limitation, the authors introduce a latent hierarchy modeled as a Dirichlet Diffusion Tree (DDT). In this construction, each task corresponds to a leaf node, and the path from the root to a leaf determines which parameters are shared across tasks and which are task‑specific. The hierarchy is not pre‑defined; instead it is inferred from the data during learning. At higher levels of the tree, a common classifier structure (e.g., a shared weight mean) is imposed, which is particularly beneficial for domain adaptation because the same decision surface can be transferred to a new domain with minimal adjustment. At lower levels, task‑specific deviations are allowed, and the covariance matrix governing these deviations is shared among tasks that descend from the same internal node. This design lets the model share classifier structure for DA while sharing covariance structure for MTL, matching the intuition stated in the abstract.
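The sharing scheme described above can be illustrated with a small generative sketch. This is not the paper's implementation: the two‑level tree, the node names, and the fixed deviation scales are all illustrative assumptions standing in for the inferred DDT, but the mechanism is the same: each node perturbs its parent's weight vector, so tasks under a common internal node share both a mean (classifier structure) and a deviation scale (covariance structure).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3  # feature dimension (illustrative)

# Hypothetical two-level hierarchy: root -> {A, B} -> four leaf tasks.
# In the paper the tree itself is latent and inferred; here it is fixed.
tree = {"A": ["task1", "task2"], "B": ["task3", "task4"]}

w_root = rng.normal(size=d)   # classifier structure shared by all tasks
sigma_internal = 0.5          # deviation scale at the internal level
sigma_leaf = 0.1              # smaller task-specific deviations at the leaves

task_weights = {}
for node, leaves in tree.items():
    # Each internal node perturbs the root's weights ...
    w_node = w_root + sigma_internal * rng.normal(size=d)
    for leaf in leaves:
        # ... and each leaf task perturbs its parent node's weights.
        task_weights[leaf] = w_node + sigma_leaf * rng.normal(size=d)
```

Because the leaf-level deviations are much smaller than the internal-level ones, sibling tasks end up with similar weight vectors, while tasks in different subtrees can differ substantially.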

Mathematically, the model places a multivariate Gaussian prior on task weight vectors conditioned on the tree, and an Inverse‑Wishart prior on the shared covariance matrices. The DDT provides a non‑parametric prior over tree topologies and branch lengths, allowing the number of hierarchical levels to grow with the data. Inference is performed via a hybrid Markov chain Monte Carlo (MCMC) scheme: Gibbs sampling updates the Gaussian weight vectors given a fixed tree, Metropolis‑Hastings updates the covariance matrices, and a modified prune‑and‑regraft operation proposes new tree structures. The acceptance probability incorporates both the likelihood of the observed data and the DDT prior, ensuring that the hierarchy reflects genuine task similarity.
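To make the Gibbs step concrete, here is a minimal sketch of the conjugate Gaussian update for one task's weight vector, assuming a linear‑Gaussian likelihood. The function name, the parameterization (parent mean `mu0`, shared covariance `Sigma0`, observation noise `noise_var`), and the toy data are assumptions for illustration; the paper's exact likelihood and parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_weight_update(X, y, mu0, Sigma0, noise_var):
    """One conditional draw of a task's weights given the parent mean mu0
    and shared covariance Sigma0 (hypothetical parameterization).
    Standard conjugate Gaussian posterior: N(mu_n, Sigma_n)."""
    prec0 = np.linalg.inv(Sigma0)
    Sigma_n = np.linalg.inv(prec0 + X.T @ X / noise_var)
    mu_n = Sigma_n @ (prec0 @ mu0 + X.T @ y / noise_var)
    return rng.multivariate_normal(mu_n, Sigma_n)

# Toy single task: 50 points in 2-D, true weights [1.0, -2.0].
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=50)
w = gibbs_weight_update(X, y, mu0=np.zeros(2), Sigma0=np.eye(2),
                        noise_var=0.01)
```

With enough data the draw concentrates near the true weights; in the full sampler this step alternates with the Metropolis–Hastings covariance updates and the prune‑and‑regraft tree moves described above.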

The authors evaluate the approach on three real‑world datasets that span distinct domains: (1) sentiment analysis across multiple text corpora (movie reviews, product reviews, etc.), (2) multi‑object image classification derived from CIFAR‑10, and (3) multi‑label diagnosis prediction from electronic health records. In each case they compare against strong baselines, including Multi‑Task Feature Learning, Domain Adaptive SVM, Hierarchical Bayesian Transfer, and recent deep multitask architectures. The proposed latent‑hierarchy model consistently outperforms these baselines, achieving accuracy gains of 3–7% and higher F1 scores, especially when task relationships are complex or when the target domain has limited labeled data. Visualizations of the learned trees reveal intuitive groupings (e.g., positive vs. negative sentiment clusters, visually similar object groups, medically related disease clusters), confirming that the hierarchy captures meaningful structure.

Key contributions of the paper are: (1) a general Bayesian formulation that treats task relationships as latent, data‑driven hierarchies; (2) a principled mechanism to share classifier structure for domain adaptation while sharing covariance structure for multitask learning; (3) an efficient MCMC inference algorithm that jointly samples tree topologies and model parameters; and (4) empirical evidence across heterogeneous domains that the method subsumes several existing multitask models and delivers superior performance. The authors suggest future extensions such as replacing the tree with more expressive graphs, employing variational inference for scalability, and applying the framework to non‑Euclidean data (e.g., time series, graphs). Overall, the work offers a compelling blend of theoretical elegance and practical effectiveness for scenarios where multiple related learning tasks must be tackled simultaneously.

