Inference with Discriminative Posterior

Reading time: 6 minutes

📝 Original Info

  • Title: Inference with Discriminative Posterior
  • ArXiv ID: 0807.3470
  • Date: 2008-11-18
  • Authors: Jarkko Salojärvi, Kai Puolamäki, Eerika Savia, Samuel Kaski (Helsinki University of Technology)

📝 Abstract

We study Bayesian discriminative inference given a model family $p(c, x, \theta)$ that is assumed to contain all our prior information but is still known to be incorrect. This falls in between "standard" Bayesian generative modeling and Bayesian regression, where the marginal $p(x, \theta)$ is known to be uninformative about $p(c|x, \theta)$. We give an axiomatic proof that the discriminative posterior is consistent for conditional inference; using the discriminative posterior is standard practice in classical Bayesian regression, but we show that it is theoretically justified for model families of joint densities as well. A practical benefit compared to Bayesian regression is that the standard methods of handling missing values in generative modeling can be extended to discriminative inference, which is useful when the amount of data is small. Compared to standard generative modeling, the discriminative posterior results in better conditional inference if the model family is incorrect. If the model family also contains the true model, the discriminative posterior gives the same result as standard Bayesian generative modeling. Practical computation is done with Markov chain Monte Carlo.


📄 Full Content

arXiv:0807.3470v2 [stat.ML] 18 Nov 2008

Inference with Discriminative Posterior

Jarkko Salojärvi†, Kai Puolamäki‡, Eerika Savia†, Samuel Kaski†
Helsinki Institute for Information Technology, Department of Information and Computer Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
† Author belongs to the Finnish Centre of Excellence in Adaptive Informatics Research.
‡ Author belongs to the Finnish Centre of Excellence in Algorithmic Data Analysis Research.

Abstract. We study Bayesian discriminative inference given a model family p(c, x, θ) that is assumed to contain all our prior information but still known to be incorrect. This falls in between "standard" Bayesian generative modeling and Bayesian regression, where the margin p(x, θ) is known to be uninformative about p(c|x, θ). We give an axiomatic proof that discriminative posterior is consistent for conditional inference; using the discriminative posterior is standard practice in classical Bayesian regression, but we show that it is theoretically justified for model families of joint densities as well. A practical benefit compared to Bayesian regression is that the standard methods of handling missing values in generative modeling can be extended into discriminative inference, which is useful if the amount of data is small. Compared to standard generative modeling, discriminative posterior results in better conditional inference if the model family is incorrect. If the model family also contains the true model, the discriminative posterior gives the same result as standard Bayesian generative modeling. Practical computation is done with Markov chain Monte Carlo.

1 Introduction

Our aim is Bayesian discriminative inference in the case where the model family p(c, x, θ) is known to be incorrect. Here x is a data vector and c its class, and the θ are parameters of the model family. By discriminative we mean predicting the conditional distribution p(c | x).
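The two approaches being contrasted can be written out explicitly. The following is a standard formulation consistent with the paper's setup, not quoted from it: with data $D = \{(c_i, x_i)\}_{i=1}^n$, the generative (joint) posterior weights parameters by the full joint likelihood, while the discriminative posterior weights them only by the conditional likelihood,

$$p(\theta \mid D) \;\propto\; p(\theta) \prod_{i=1}^{n} p(c_i, x_i \mid \theta) \qquad \text{(generative / joint posterior)},$$

$$p_d(\theta \mid D) \;\propto\; p(\theta) \prod_{i=1}^{n} p(c_i \mid x_i, \theta), \qquad p(c \mid x, \theta) = \frac{p(c, x \mid \theta)}{\sum_{c'} p(c', x \mid \theta)} \qquad \text{(discriminative posterior)}.$$

When the model family contains the true model the two concentrate on the same predictions asymptotically, which matches the abstract's claim; they differ precisely when the family is misspecified.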
The Bayesian approach of using the posterior of the generative model family p(c, x, θ) has not been shown to be justified in this case, and it is known that it does not always generalize well to new data (in the case of point estimates, see for example [1, 2, 3]; in this paper we provide a toy example that illustrates the fact for posterior distributions). Therefore alternative approaches such as Bayesian regression are applied [4]. It can be argued that the best solution is to improve the model family by incorporating more prior knowledge. This is not always possible or feasible, however, and simplified models are generally used, often with good results. For example, it is often practical to use mixture models even if it is known a priori that the data cannot be faithfully described by them (see for example [5]). There are good reasons for still applying Bayesian-style techniques [6], but the general problem of how best to do inference with incorrect model families is still open.

In practice, the usual method for discriminative tasks is Bayesian regression. It disregards all assumptions about the distribution of x, and considers x only as covariates of the model for c. Bayesian regression may give superior results in discriminative inference, but the omission of a generative model for x (although it may be readily available) makes it difficult to handle missing values in the data. Numerous heuristic methods for imputing missing values have been suggested, see for example [7], but no theoretical arguments for their optimality have been presented. Here we assume that we are given a generative model family of the full data (x, c), and therefore have a generative mechanism readily available for imputing missing values.

From the generative modeling perspective, Bayesian regression ignores any information about c supplied by the marginal distribution of x.
This is justified if (i) the covariates are explicitly chosen when designing the experimental setting and hence are not noisy, or (ii) there is a separate set of parameters for generating x on the one hand and c given x on the other, and the sets are assumed to be independent in their prior distribution. In the latter case the posterior factors into two parts, and the parameters used for generating x are neither needed nor useful in the regression task. See for instance [4, 8] for more details. However, there has been no theoretical justification for Bayesian regression in the more general setting where the independence does not hold.

For point estimates of generative models it is well known that maximizing the joint likelihood and maximizing the conditional likelihood in general give different results. Maximum conditional likelihood asymptotically gives a better estimate of the conditional likelihood [2], and it can be optimized with expectation-maximization-type procedures [9, 10]. In this paper we extend that line of work to show that the two different approaches, joint and conditional modeling, result in different poster
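The abstract notes that practical computation is done with Markov chain Monte Carlo. As a minimal illustration of what sampling a discriminative posterior can look like, the sketch below uses a toy example of my own (not the paper's experiments): two 1-D Gaussian classes with unit variance and equal class priors, a flat prior on the means, and a random-walk Metropolis sampler targeting the product of conditional likelihoods p(c_i | x_i, θ). All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two 1-D classes with true means -1 and +1 (hypothetical example)
n = 200
c = rng.integers(0, 2, n)
x = rng.normal(np.where(c == 0, -1.0, 1.0), 1.0)

def cond_loglik(theta, x, c):
    """log prod_i p(c_i | x_i, theta) for a two-Gaussian model with
    unit variance and equal class priors; theta = (mu0, mu1)."""
    mu0, mu1 = theta
    l0 = -0.5 * (x - mu0) ** 2            # log N(x; mu0, 1) up to a constant
    l1 = -0.5 * (x - mu1) ** 2
    m = np.maximum(l0, l1)                # log-sum-exp normalizer over classes
    logz = m + np.log(np.exp(l0 - m) + np.exp(l1 - m))
    return np.sum(np.where(c == 0, l0, l1) - logz)

def metropolis(logpost, theta0, steps=3000, scale=0.1):
    """Random-walk Metropolis sampler for an unnormalized log posterior."""
    theta = np.array(theta0, dtype=float)
    lp = logpost(theta)
    samples = []
    for _ in range(steps):
        prop = theta + rng.normal(0.0, scale, size=theta.shape)
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)

# Flat prior: the discriminative posterior is proportional to the
# conditional likelihood alone.
samples = metropolis(lambda t: cond_loglik(t, x, c), [0.0, 0.0])
mu0_hat, mu1_hat = samples[1000:].mean(axis=0)    # discard burn-in
print(mu0_hat, mu1_hat)
```

Note that the conditional likelihood only constrains θ through the decision boundary it induces, so the sampled means need not coincide with the generative posterior's; that gap is exactly what widens when the model family is misspecified.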

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
