arXiv:0810.5655v1 [stat.ME] 31 Oct 2008
The Annals of Statistics
2008, Vol. 36, No. 5, 2207–2231
DOI: 10.1214/07-AOS547
© Institute of Mathematical Statistics, 2008
GIBBS POSTERIOR FOR VARIABLE SELECTION IN
HIGH-DIMENSIONAL CLASSIFICATION AND DATA MINING1
By Wenxin Jiang and Martin A. Tanner
Northwestern University
In the popular approach of “Bayesian variable selection” (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims at minimizing a risk function without modeling the data probabilistically. This can improve the performance over the usual Bayesian approach, which depends on a probability model which may be misspecified. Conditions will be provided to achieve good risk performance, even in the presence of high dimensionality, when the number of candidate variables “$K$” can be much larger than the sample size “$n$.” In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.
1. Introduction.

The problem of interest here is to predict $y$, a $\{0,1\}$ response, based on $x$, a vector of predictors of dimension $\dim(x) = K$. We have $D_n = (y^{(i)}, x^{(i)})_{i=1}^{n}$, the observed data with sample size $n$, typically assumed to form $n$ i.i.d. (independent and identically distributed) copies of $(y, x)$.

One is often interested in modeling the relation between $y$ and $x$, selecting components of $x$ that are most relevant to $y$, and predicting $y$ using selected information from $x$.
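As a concrete illustration of this setup, the sketch below simulates such a data set $D_n$ with $K > n$ under an assumed sparse logit-linear truth (as in the models discussed later); the seed, dimensions and coefficient values are arbitrary choices for illustration only.

```python
import numpy as np

# Simulate D_n = (y^(i), x^(i))_{i=1..n}: n i.i.d. copies of (y, x),
# with K candidate predictors and K > n (high-dimensional setting).
rng = np.random.default_rng(0)
n, K = 50, 200                      # sample size smaller than dimension
X = rng.normal(size=(n, K))         # rows are the x^(i)
beta_true = np.zeros(K)
beta_true[:3] = [1.5, -2.0, 1.0]    # only a sparse subset of x matters
mu = 1.0 / (1.0 + np.exp(-(X @ beta_true)))  # P(y = 1 | x), logit linear
y = rng.binomial(1, mu)             # the {0,1} responses y^(i)
```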
In the approach of Bayesian variable selection (BVS), one chooses components of $x$ according to some probability distribution (prior and posterior). The BVS approach is very popular for handling high-dimensional data (with large dimension $K$, sometimes larger than the sample size $n$), and has had a wide range of successful applications. See, for example, Smith and Kohn (1996), George and McCulloch (1997), Gerlach, Bird and Hall (2002), Lee, Sha, Dougherty, Vannucci and Mallick (2003), Zhou, Liu and Wong (2004) and Dobra, Hans, Jones, Nevins, Yao and West (2004), among others.

Received February 2007; revised August 2007.
1 Supported in part by NSF Grant DMS-07-06885.
AMS 2000 subject classifications. Primary 62F99; secondary 82-08.
Key words and phrases. Data augmentation, data mining, Gibbs posterior, high-dimensional data, linear classification, Markov chain Monte Carlo, prior distribution, risk performance, sparsity, variable selection.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2008, Vol. 36, No. 5, 2207–2231. This reprint differs from the original in pagination and typographic detail.
For classification purposes, a regression model $p = p(y|x)$ ($y \in \{0,1\}$) is typically assumed to be logit linear or probit linear and parameterized by a parameter $\beta$; that is, $p(y|x) = \mu^{y}(1-\mu)^{1-y}$, where $\mu = \exp(x^T\beta)/\{1+\exp(x^T\beta)\}$ (for logistic regression) or $\mu = \int_{-\infty}^{x^T\beta}(2\pi)^{-1/2}e^{-u^2/2}\,du$ (for probit regression). A prior on $p$ is then induced by placing a prior on the parameter $\beta$, forcing most of its components to be zero, such that only a low-dimensional subset of $x$ is selected in regression. The corresponding posterior follows a standard Bayesian treatment as $(\text{posterior}) \propto (\text{likelihood}) \times (\text{prior}) \propto \{\prod_{i=1}^{n} p(y^{(i)}|x^{(i)})\} \times (\text{prior})$. A number of things can be generated from this posterior: the parameter $\beta$, the conditional density $p(y|x)$, the mean function $\mu$, as well as the classification rule (for $y$) $I[\mu > 0.5] = I[x^T\beta > 0]$. Jiang (2007) has shown that under certain regularity conditions, the prior can be specified to render near-optimal posterior performance for density estimation, mean estimation and classification.
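The quantities above can be made concrete in a short sketch. Assuming a logistic model with a generic Gaussian prior on $\beta$ (the prior scale `prior_sd` is a hypothetical choice for illustration, not the paper's specification), the unnormalized log posterior and the induced classification rule $I[x^T\beta > 0]$ are:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Log-likelihood of the logistic model: sum_i log p(y_i | x_i, beta)."""
    eta = X @ beta                           # linear predictor x^T beta
    # log mu = eta - log(1 + e^eta); log(1 - mu) = -log(1 + e^eta)
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def log_posterior_kernel(beta, X, y, prior_sd=10.0):
    """Unnormalized log posterior: log-likelihood + Gaussian log-prior."""
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return log_likelihood(beta, X, y) + log_prior

def classify(beta, X):
    """Classification rule I[x^T beta > 0], equivalent to I[mu > 0.5]."""
    return (X @ beta > 0).astype(int)
```

An MCMC sampler would evaluate `log_posterior_kernel` at proposed values of `beta`; the normalizing constant is never needed.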
The current paper introduces a new direction to BVS. Unlike Jiang (2007), we will construct a modified posterior (called Gibbs posterior) using a risk function of interest (such as the classification error) directly, instead of using the usual likelihood-based Bayesian posterior. We will first focus on the statistical properties (e.g., classification performance) of BVS with a Gibbs posterior. (Section 7 will handle the algorithmic aspects.)
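The construction just described can be sketched as follows: the log-likelihood term of the usual posterior is replaced by a scaled, negated empirical risk, so no probability model for the data is assumed. The inverse-temperature constant `psi` and the Gaussian prior below are hypothetical illustrations, not the paper's exact specification.

```python
import numpy as np

def empirical_risk(beta, X, y):
    """Empirical classification error R_n(beta) = n^{-1} sum_i I[y_i != I(x_i^T beta > 0)]."""
    preds = (X @ beta > 0).astype(int)
    return np.mean(preds != y)

def log_gibbs_kernel(beta, X, y, psi=1.0, prior_sd=10.0):
    """Unnormalized log Gibbs posterior: -psi * n * R_n(beta) + log-prior.

    The likelihood of the usual Bayesian posterior is replaced by
    exp(-psi * n * R_n), built directly from the risk of interest;
    psi is a tuning constant (its default here is an arbitrary choice).
    """
    n = len(y)
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return -psi * n * empirical_risk(beta, X, y) + log_prior
```

Because the empirical risk, not a likelihood, drives the posterior weights, a misspecified probability model for $p(y|x)$ cannot degrade the target criterion.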
A problem with the usual Bayesian posterior.

Below, we first demonstrate by a simple example that in case of model misspecification, the usual likelihood-based BVS can provide suboptimal performance. Later our theory will suggest that the proposed BVS with Gibbs posterior can improve over the usual approach, since we will show that the proposed method can still achieve near-optimality in some sense, despite the potential misspecification.
In Jiang (2007), it is assumed that the true model (with density p∗) is
of a known transformed linear form, say, logit linear, so that ln{p∗(y =
1|x)/p∗(y
…(Full text truncated)…