Deep Multivariate Models with Parametric Conditionals
We consider deep multivariate models for heterogeneous collections of random variables. In computer vision, such collections may consist, for example, of images, segmentations, image attributes, and latent variables. When developing such models, most existing works start from an application task and design the model components and their dependencies to meet the needs of that task. This has the disadvantage of limiting the applicability of the resulting model to other downstream tasks. Here, instead, we propose to represent the joint probability distribution by means of conditional probability distributions for each group of variables conditioned on the rest. Such models can then be used for practically any downstream task. Learning them can be approached as training a parametrised Markov chain kernel by maximising the data likelihood of its limiting distribution. This has the additional advantage of allowing a wide range of semi-supervised learning scenarios.
💡 Research Summary
The paper addresses a fundamental limitation of current deep probabilistic models for computer‑vision tasks: they are usually built around a single factorisation of the joint distribution, chosen to suit a particular downstream task (e.g., classification, conditional generation). When the number of variables grows beyond two, the number of possible factorisations explodes, and a model that is convenient for one inference problem often cannot provide the other conditionals without expensive auxiliary networks or approximations.
To overcome this, the authors propose to represent the joint distribution of an arbitrary collection of heterogeneous variables (images, segmentations, attributes, latent codes, etc.) solely by a set of conditional distributions pθi(xi | x−i). Each conditional is parametrised by its own neural network and can be trained independently. The conditionals are combined into a single Markov‑chain transition kernel that, at each step, randomly selects a variable i, keeps all other components fixed, and samples a new value for xi from pθi. This kernel is a weighted mixture of Gibbs updates (weights αi) and defines a reversible Markov chain whose stationary distribution pθ(x) is the joint distribution implicitly defined by the conditionals.
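The transition kernel described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the conditional samplers below are plain callables standing in for the neural conditionals pθi(xi | x−i), and the toy Gaussian conditionals are placeholders of my own choosing.

```python
import numpy as np

def gibbs_kernel_step(x, cond_samplers, alphas, rng):
    """One transition of the mixture-of-Gibbs kernel: pick a variable
    group i with probability alphas[i], keep all other groups fixed,
    and resample x[i] from its conditional p(x_i | x_{-i})."""
    i = rng.choice(len(cond_samplers), p=alphas)
    x_new = dict(x)                      # copy the state; only group i changes
    x_new[i] = cond_samplers[i](x, rng)  # sample a new value for group i
    return x_new

# Toy example: two scalar groups with Gaussian "conditionals"
# (placeholders for the learned networks in the paper).
rng = np.random.default_rng(0)
samplers = [
    lambda x, rng: 0.5 * x[1] + rng.normal(),   # stand-in for p(x_0 | x_1)
    lambda x, rng: 0.5 * x[0] + rng.normal(),   # stand-in for p(x_1 | x_0)
]
state = {0: 0.0, 1: 0.0}
for _ in range(100):
    state = gibbs_kernel_step(state, samplers, alphas=[0.5, 0.5], rng=rng)
```

Iterating this kernel simulates the Markov chain; under the conditions in the paper, long runs produce samples from the joint stationary distribution defined implicitly by the conditionals.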
Learning is formulated as maximum‑likelihood estimation of the stationary distribution. Because the stationary distribution is not analytically tractable, the authors derive a variational lower bound LB(θ,q,n) that depends on an auxiliary Markov kernel q(x′|x) and on the distribution obtained after n steps of sampling from q starting from the data distribution. The bound can be written as
LB(θ,q,n) = L(θ) – n · E_{Q_n}
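The distribution Q_n entering the bound is obtained by running n steps of the auxiliary kernel q(x′|x) starting from data samples. A minimal sketch of that construction, with a hypothetical random-walk kernel standing in for q (the actual auxiliary kernel is learned alongside θ in the paper):

```python
import numpy as np

def q_step(x, rng):
    # Hypothetical auxiliary kernel q(x' | x): here a simple Gaussian
    # random walk, used only to illustrate the n-step construction.
    return x + 0.1 * rng.normal(size=x.shape)

def sample_Q_n(data_batch, n, rng):
    """Run n steps of the auxiliary kernel q starting from samples of
    the data distribution; the result is distributed according to Q_n."""
    x = data_batch.copy()
    for _ in range(n):
        x = q_step(x, rng)
    return x

rng = np.random.default_rng(0)
data = rng.normal(size=(8, 3))           # placeholder data batch
q_samples = sample_Q_n(data, n=5, rng=rng)
```

Expectations under Q_n, such as the one in the bound above, can then be estimated by averaging over such samples.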