Gibbs posterior for variable selection in high-dimensional classification and data mining

Reading time: 6 minutes

📝 Original Info

  • Title: Gibbs posterior for variable selection in high-dimensional classification and data mining
  • ArXiv ID: 0810.5655
  • Date: 2008-11-03
  • Authors: Wenxin Jiang and Martin A. Tanner (Northwestern University)

📝 Abstract

In the popular approach of "Bayesian variable selection" (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims at minimizing a risk function without modeling the data probabilistically. This can improve the performance over the usual Bayesian approach, which depends on a probability model which may be misspecified. Conditions will be provided to achieve good risk performance, even in the presence of high dimensionality, when the number of candidate variables "$K$" can be much larger than the sample size "$n$." In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.
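
Concretely, the Gibbs posterior reweights the prior by an exponentiated empirical risk in place of a likelihood. The following is a sketch using the standard Gibbs-posterior convention; the temperature symbol $\psi$ and the risk notation $R_n$ are our notation, not quoted from the paper:

$$
\pi_\psi(\beta \mid D_n) \;\propto\; e^{-\psi\, n\, R_n(\beta)}\, \pi(\beta),
\qquad
R_n(\beta) \;=\; \frac{1}{n}\sum_{i=1}^{n} I\big[\, y^{(i)} \neq I\{x^{(i)T}\beta > 0\} \,\big],
$$

so the data enter only through the empirical risk $R_n$ (here the classification error), not through a probability model, and larger $\psi$ concentrates the posterior more sharply around empirical risk minimizers.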

📄 Full Content

arXiv:0810.5655v1 [stat.ME] 31 Oct 2008. The Annals of Statistics 2008, Vol. 36, No. 5, 2207–2231. DOI: 10.1214/07-AOS547. © Institute of Mathematical Statistics, 2008.

GIBBS POSTERIOR FOR VARIABLE SELECTION IN HIGH-DIMENSIONAL CLASSIFICATION AND DATA MINING

By Wenxin Jiang and Martin A. Tanner, Northwestern University. Received February 2007; revised August 2007. Supported in part by NSF Grant DMS-07-06885. AMS 2000 subject classifications: primary 62F99; secondary 82-08. Key words and phrases: data augmentation, data mining, Gibbs posterior, high-dimensional data, linear classification, Markov chain Monte Carlo, prior distribution, risk performance, sparsity, variable selection.

1. Introduction. The problem of interest here is to predict $y$, a $\{0,1\}$ response, based on $x$, a vector of predictors of dimension $\dim(x) = K$. We have $D_n = (y^{(i)}, x^{(i)})_{i=1}^{n}$, the observed data with sample size $n$, typically assumed to form $n$ i.i.d. (independent and identically distributed) copies of $(y, x)$. One is often interested in modeling the relation between $y$ and $x$, selecting the components of $x$ that are most relevant to $y$, and predicting $y$ using selected information from $x$.

In the approach of Bayesian variable selection (BVS), one chooses components of $x$ according to some probability distribution (prior and posterior). The BVS approach is very popular for handling high-dimensional data (with large dimension $K$, sometimes larger than the sample size $n$) and has had a wide range of successful applications. See, for example, Smith and Kohn (1996), George and McCulloch (1997), Gerlach, Bird and Hall (2002), Lee, Sha, Dougherty, Vannucci and Mallick (2003), Zhou, Liu and Wong (2004) and Dobra, Hans, Jones, Nevins, Yao and West (2004), among others.

For classification purposes, a regression model $p = p(y|x)$ ($y \in \{0,1\}$) is typically assumed to be logit linear or probit linear and parameterized by a parameter $\beta$, that is, $p(y|x) = \mu^{y}(1-\mu)^{1-y}$, where $\mu = \frac{\exp(x^T\beta)}{1+\exp(x^T\beta)}$ (for logistic regression) or $\mu = \int_{-\infty}^{x^T\beta} (2\pi)^{-1/2} e^{-u^2/2}\,du$ (for probit regression). A prior on $p$ is then induced by placing a prior on the parameter $\beta$, forcing most of its components to be zero, so that only a low-dimensional subset of $x$ is selected in the regression.
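
As a concrete illustration of the logit-linear setup just described, here is a minimal Python sketch; the toy $\beta$, the data, and the function names are ours, purely for illustration:

```python
import numpy as np

def mu_logistic(x, beta):
    """Logit-linear mean: mu = exp(x'beta) / (1 + exp(x'beta))."""
    return 1.0 / (1.0 + np.exp(-(x @ beta)))

def classify(x, beta):
    """Classification rule I[mu > 0.5], which equals I[x'beta > 0]."""
    return (x @ beta > 0).astype(int)

# Toy example with K = 4 candidate predictors, only two of which are
# "selected" (nonzero), mimicking the sparse beta that BVS aims for.
beta = np.array([1.5, -2.0, 0.0, 0.0])
x = np.array([[0.3, -0.1, 0.8, 0.2],
              [-1.0, 0.5, 0.0, 0.1]])
print(mu_logistic(x, beta))  # success probabilities mu
print(classify(x, beta))     # predicted labels I[x'beta > 0]
```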
The corresponding posterior follows a standard Bayesian treatment: $(\text{posterior}) \propto (\text{likelihood}) \times (\text{prior}) \propto \{\prod_{i=1}^{n} p(y^{(i)}|x^{(i)})\} \times (\text{prior})$. A number of things can be generated from this posterior: the parameter $\beta$, the conditional density $p(y|x)$, the mean function $\mu$, as well as the classification rule (for $y$) $I[\mu > 0.5] = I[x^T\beta > 0]$. Jiang (2007) has shown that under certain regularity conditions, the prior can be specified to render near-optimal posterior performance for density estimation, mean estimation and classification.

The current paper introduces a new direction to BVS. Unlike Jiang (2007), we will construct a modified posterior (called the Gibbs posterior) using a risk function of interest (such as the classification error) directly, instead of using the usual likelihood-based Bayesian posterior. We will first focus on the statistical properties (e.g., classification performance) of BVS with a Gibbs posterior. (Section 7 will handle the algorithmic aspects.)

A problem with the usual Bayesian posterior. Below, we first demonstrate by a simple example that in the case of model misspecification, the usual likelihood-based BVS can provide suboptimal performance. Later, our theory will suggest that the proposed BVS with Gibbs posterior can improve over the usual approach, since we will show that the proposed method can still achieve near-optimality in some sense, despite the potential misspecification. In Jiang (2007), it is assumed that the true model (with density $p^*$) is of a known transformed linear form, say, logit linear, so that $\ln\{p^*(y=1|x)/p^*(y$
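
The paper's own sampler (developed in Section 7, which is truncated here) uses data augmentation; as a generic illustration of sampling from a Gibbs posterior built from the classification error, here is a plain random-walk Metropolis sketch. The Gaussian prior, the temperature $\psi = 1$, and all names are our assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_risk(beta, X, y):
    """R_n(beta): fraction of points misclassified by the rule I[x'beta > 0]."""
    return np.mean(y != (X @ beta > 0))

def log_gibbs_posterior(beta, X, y, psi, tau=10.0):
    """log of exp(-psi * n * R_n(beta)) * N(beta; 0, tau^2 I), up to a constant."""
    return -psi * len(y) * empirical_risk(beta, X, y) - 0.5 * (beta @ beta) / tau**2

def metropolis(X, y, psi=1.0, n_iter=5000, step=0.3):
    """Random-walk Metropolis targeting the Gibbs posterior. The empirical
    risk is piecewise constant in beta, so the Gaussian prior keeps the
    target proper; this is a generic sketch, not the paper's sampler."""
    beta = np.zeros(X.shape[1])
    lp = log_gibbs_posterior(beta, X, y, psi)
    draws = []
    for _ in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.size)
        lp_prop = log_gibbs_posterior(proposal, X, y, psi)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            beta, lp = proposal, lp_prop
        draws.append(beta.copy())
    return np.array(draws)

# Toy data: n = 100, K = 5, with only the first predictor relevant.
X = rng.standard_normal((100, 5))
y = (X[:, 0] > 0).astype(int)
draws = metropolis(X, y)
print(draws[-1000:].mean(axis=0))  # posterior mean of beta from late draws
```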

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
