Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors
We explore the use of generalized t priors on regression coefficients to help understand the nature of association signal within "hit regions" of genome-wide association studies. The particular generalized t distribution we adopt is a Student distrib…
Authors: Anthony Lee, Francois Caron, Arnaud Doucet
Ba y esian Sparsit y-P ath-Analysis of Genetic Asso ciation Signal using Generalized t Priors An thony Lee ∗ F rancois Caron † Arnaud Doucet ‡ Chris Holmes § Octob er 26, 2018 Abstract W e explore the use of generalized t priors on regression co efficients to help understand the nature of association signal within “hit regions” of genome-wide association studies. The particular generalized t distribution w e adopt is a Student distribution on the absolute v alue of its argumen t. F or lo w degrees of freedom w e sho w that the generalized t exhibits ‘sparsit y- prior’ properties with some attractiv e features o ver other common forms of sparse priors and includes the w ell kno wn double-exp onen tial distribution as the degrees of freedom tends to ∞ . W e pay particular attention to graphical represen tations of p osterior statistics obtained from sparsity-path-analysis (SP A) where w e sweep ov er the setting of the scale (shrink age / precision) parameter in the prior to explore the space of p osterior mo dels obtained ov er a range of complexities, from v ery sparse models with all co efficien t distributions hea vily concen trated around zero, to mo dels with diffuse priors and co efficien ts distributed around their maximum likelihoo d estimates. The SP A plots are akin to LASSO plots of maximum a p osteriori (MAP) estimates but they characterise the complete marginal p osterior distri- butions of the co efficien ts plotted as a function of the precision of the prior. Generating p osterior distributions ov er a range of prior precisions is computationally challenging but naturally amenable to sequential Monte Carlo (SMC) algorithms indexed on the scale pa- rameter. W e show how SMC simulation on graphic-pro cessing-units (GPUs) provides very efficien t inference for SP A. W e also presen t a scale-mixture representation of the general- ized t prior that leads to an EM algorithm to obtain MAP estimates should only these b e required. 1 In tro duction Genome-wide association studies (GW AS) ha v e presen ted a num b er of interesting c hallenges to statisticians [1]. Con ven tionally GW AS use single mark er univ ariate tests of asso ciation, testing marker by marker in order to highligh t “hit regions” of the genome showing evidence of asso ciation signal. The motiv ation for our work concerns metho ds in Bay esian sparse multiple regression analysis to characterise and decomp ose the asso ciation signal within such regions ∗ Oxford-Man Institute and Departmen t of Statistics, Universit y of Oxford, UK † INRIA Bordeaux Sud-Ouest and Institut de Math ´ ematiques de Bordeaux, Univ ersity of Bordeaux, F rance ‡ Institute of Statistical Mathematics, Japan and Universit y of British Columbia, Departmen t of Statistics and Departmen t of Computer Science, Canada. § Departmen t of Statistics and Oxford-Man Institute and W ellcome T rust Centre for Human Genetics, Uni- v ersity of Oxford, and MRC Harwell, UK 1 spanning p ossibly h undreds of mark ers. Hit-regions t ypically cov er lo ci in high link age disequi- librium (LD) which leads to strong correlation b et w een the markers making multiple regression analysis non-trivial. A priori we w ould exp ect only a small num b er of mark ers to be resp on- sible for the asso ciation and hence priors that induce sparsit y are an important to ol to aid understanding of the genotype-phenotype dep endence structure; an interesting question b eing whether the asso ciation signal is consisten t with a single causal mark er or due to multiple effects. W e restrict atten tion to case-con trol GW AS, i.e. binary phenotypes, these b eing b y far the most prev alen t although generalisations to other exp onen tial family likelihoo ds or non-linear mo dels is relatively straightforw ard. In the non-Ba y esian literature sparse regression analysis via p enalised lik eliho od has gained increasing p opularit y since the seminal pap er on the LASSO [2, 3]. The LASSO estimates hav e a Bay esian interpretation as maxim um a p osteriori (MAP) statistics under a double exp onen- tial prior on regression coefficients π ( β ) ∝ exp( − λ P j | β j | ) and it is kno wn that, unlike ridge p enalties using a normal prior, the Lasso prior tends to pro duce ‘sparse’ solutions in that as an increasing function of the regulariation p enalt y λ more and more of the MAP estimates will b e zero. F rom a Bay esian p ersp ectiv e the use of MAP estimates holds little justification, see the discussion of [3], and in [4] they explore full Bay esian p osterior analysis with Lasso double- exp onen tial priors using Mak o v c hain Mon te Carlo (MCMC) to make inference. While the Lasso p enalt y / prior is p opular there is increasing a wareness that the use of iden tical ‘p enalization’ on each coefficient, e.g. λ P p j =1 | β j | , can lead to unacceptable bias in the resulting estimates [5]. In particular co efficien ts get shrunk tow ards zero even when there is ov erwhelming evidence in the likelihoo d that they are non-zero. This has motiv ated use of sparsity-inducing non-conv ex p enalties whic h reduce bias in the estimates of large co efficien ts at the cost of increased diffi- cult y in computing. Of note we w ould mention the “adaptive” metho ds [6, 7] in the statistics literature and iteratively reweigh ted metho ds [8, 9] in the signal pro cessing literature; although these pap ers are only concerned with MAP estimation there b eing no published articles to date detailing the use of full Bay esian analysis. In the con text of GW AS there ha ve b een recen t illustrations of the use of adaptive sparsity priors and MAP estimation [10], reviewed and compared in [11]. In [10] they use the Normal- Exp onen tial-Gamma sparsity-prior of [12] to obtain sparse MAP estimates from logistic regres- sion. The use of contin uous sparse priors can b e con trasted with an alternativ e approach using t wo comp onen t mixture priors suc h as in [13] which adopt a distribution con taining a spike at zero and another comp onen t with broad scale. Co efficien ts are then a p osteriori classified into either state, relating to whether their corresponding v ariables (genotypes) are deemed predic- tiv ely irrelev an t or relev an t. In the GW AS setting the tw o-component mixture prior has b een explored by [14, 15]. F or mo del exploration the sparsity prior has some b enefits in allo wing the statisticians to visualize the regression analysis ov er a range of scales / mo del complexities. In this paper we advocate the use of the generalized t prior, first considered in [16], on the regression co efficients and a full Bay esian analysis. The generalized t prior w e adopt is a Studen t distribution on the absolute v alue of the co efficien t. W e demonstrate it has a num b er of attractiv e features as a ‘sparsity prior’ in particular due to its simple analytic form and in terpretable parameters. In applications the setting of the scale parameter in the prior is the k ey task that affects the sparsit y of the p osterior solution. W e b eliev e m uch is to b e gained in exploring the con tinuum of p osterior mo dels formed b y sweeping through the scale of the prior, ev en ov er regions of lo w p osterior probabilit y , in an exploratory approach we term sparsity- path-analysis (SP A). This leads to wards providing graphical represen tations of the p osterior densities that arise as w e mov e from the most sparse models with all co efficien t densities hea vily concen trated around the origin through to models with diffuse priors and coefficients distributed 2 around their maxim um likelihoo d estimates. W e stress that even though parts of this mo del space ha v e lo w probabilit y , they ha ve utility and are in teresting in gaining a fuller understanding of the predictor-resp onse (genot yp e-phenot yp e) dep endence structure. SP A is an exploratory to ol that is computationally c hallenging but highly suited to sequential Mon te Carlo (SMC) algorithms indexed on the scale parameter and sim ulated in parallel on graphics cards using graphical pro cessing units (GPUs). The generalized t as a prior for Bay esian regression co efficien ts was first suggested by [16] and more recently in the context of sparse MAP estimation for normal (Gaussian) likelihoo ds in [17, 18, 19], and using Mark ov chain Monte Carlo for a full Bay esian linear regression analysis in [19]. In [18, 19] the prior is referred to as a double-pareto distribution but we prefer to use the terminology of the generalized t as w e feel it is more explicit and easier to interpret as suc h. Our con tribution in this pap er is to consider a full Bay esian analysis of logistic regression mo dels with generalized t priors in the con text of GW AS applications for whic h w e dev elop SMC samplers and GPU implementation. The GPU implementation is imp ortan t in practice, in the GW AS analysis in reduces run-time from ov er five days to hours. W e pa y particular attention to developing graphical displays of summary statistics from the joint p osterior distribution ov er a wide range of prior precisions; arguing that parts of the mo del space with low probability ha ve high utility and are informative to the exploratory understanding of regression signal. In Section 2 we in tro duce the generalized t prior and sho w that it has a scale-mixture rep- resen tation and exhibits sparsity t yp e prop erties for regression analysis. Section 3 provides a brief description of the genetic asso ciation data and sim ulated phenotypes used in our metho ds dev elopment. Section 4 concerns computing p osterior distributions o ver a range of prior scale parameters using sequential Monte Carlo algorithms simulated on GPUs. Section 5 deals with graphical display of SP A statistics and plots of p osterior summary statistics ov er the path of scales. W e conclude with a discussion in Section 6. 2 Generalized t prior for sparse regression analysis In Bay esian logistic regression analysis given a data set { X , y } , of ( n × p ) predictor matrix X and ( n × 1) v ector of binary phenotypes (or resp onse v ariables) y , we adopt a prior p ( β ) on the p regression co efficien ts from whic h the log-p osterior up dates as, log p ( β | X , y , a, c ) ∝ log f ( y | X , β , θ ) + log p ( β ) where f ( · ) is the likelihoo d function, f ( y | X , β ) = n Y i =1 ( p i ) y i (1 − p i ) 1 − y i p i = log it − 1 ( x i β ) In this pap er we prop ose the generalized t distribution as an interesting prior for β , which has the form of a Studen t density on the L q -norm of its argument, [16], p ( β | µ, a, c, q ) = q 2 ca 1 /q B (1 /q , a ) 1 + | β − µ | q ac q − ( a +1 /q ) (1) where we note that the usual Studen t’s densit y is obtained with { q = 2 , c = √ 2 } ; the exp onen tial p o w er family of distributions arise as degrees of freedom a → ∞ ; and the double-exp onen tial 3 or Laplace distribution o ccurs with { q = 1 , a = ∞} . The use of this distribution as a prior on regression co efficients was first describ ed in [16]. W e shall only consider the L 1 -norm case with cen tral lo cation µ = 0, and indep enden t priors across the p co efficien ts for which we can simplify the distribution to, p ( β j | µ = 0 , a, c, q = 1) = 1 2 c 1 + | β j | ac − ( a +1) (2) where b > 0 is a scale parameter and a > 0 is related to the degrees of freedom; we will denote this cen tred L 1 generalized t as Gt ( a, c ). Note that [18, 19] refer to this distribution as a double-pareto but w e prefer to think of it as an instantiation of a generalized t following the earlier deriv ation in the context of priors on regression co efficients in [16]. This also allows for easier understanding of the parameters and direct connections with the double-exp onen tial distribution and other standard densities. It is in teresting to note that (2) can b e deriv ed from the marginal of a scale-mixture of double- exp onen tial densities, such that, for multiv ariate β = { β 1 , . . . , β p } , where, p ( β ) = Y j p ( β j | τ j ) p ( β j | τ j ) = 1 2 τ j exp − | β j | τ j τ j ∼ I G ( a, b ) (3) where I G ( · , · ) denotes the Inv erse-Gamma density , from this w e find, p ( β j ) = Z τ j p ( x j | τ j ) p ( τ j | a, b ) dτ j = Gt ( a, b/a ) . where we can set b = ac to generate a marginal Gt ( a, c ) density . Hence as a prior on the set of regression co efficients, one w ay to think of (2) is as defining a hierarchical prior whereby each co efficien t has a double-exp onen tial prior with a co efficient sp ecific scale parameter distributed a priori as I G ( a, ac ). A plot of the log densities for Gt (1 , 1), and the double-exp onen tial, D E (0 , 1) is shown in Figure 1 where we can see the greater Kurtosis and hea vier tails provided b y the generalized t. As men tioned in the Section 1, the relatively light tails of the D E ( · ) prior is unattractiv e as it tends to shrink large v alues of the co efficien ts even when there is clear evidence in the likelihoo d that they are predictively imp ortan t. W e are interested with inferring and graphing the regression co efficen ts’ p osterior distributions o ver a range of v alues for c for fixed degrees of freedom a , an exploratory pro cedure w e call sparsit y-path-analysis. But first it is instructiv e to examine the form of the MAP estimates obtained under the generalized t prior as this sheds light on their utility as sparsit y priors. 2.1 MAP estimates via the EM algorithm The optimization problem asso ciated with obtaining the MAP estimates for β under the gen- eralized t-distribution prior is not concav e. Ho wev er, one can find lo cal mo des of the p osterior b y making use of the scale-mixture represen tation (3) and using the EM algorithm with the hierarc hical scale parameters τ = τ 1: p as latent v ariables. Indeed, each iteration of EM tak es the form β ( t +1) = arg max β log f ( y | X , β ) + Z log[ p ( β | τ )] p ( τ | β ( t ) , a, b ) d τ 4 where b = ac . The conjugacy of the in v erse-gamma distribution with resp ect to the Laplace distribution gives τ j | β ( t ) j , a j , b j ∼ I G ( a j + 1 , b j + | β j | ) allo wing for differen t prior parameters for eac h co efficient, and with p ( β | τ ) = Q p j =1 p ( β j | τ j ) = Q p j =1 1 / (2 τ j ) exp( −| β j | /τ j ) yields β ( t +1) = arg max β log f ( y | X , β , θ ) − p X j =1 | β j | Z 1 τ j p ( τ j | β ( t ) j , a j , b j ) dτ j where the exp ectation of 1 /τ j giv en τ j ∼ I G ( a j + 1 , b j + | β ( t ) j | ) is ( a j + 1) / ( b j + | β ( t ) j | ). As suc h, one can find a lo cal mo de of the p osterior p ( β | y , X , β , θ ) by starting at some p oin t β (0) and then iteratively solving β ( t +1) = arg max β log f ( y | X , β , θ ) − p X j =1 w ( t ) j | β j | (4) where w ( t ) j = a j + 1 b j + | β ( t ) j | = a j + 1 a j c j + | β ( t ) j | F or logistic regression the up date (4) is a con v ex optimization problem for whic h standard tec hniques exist. W e can see the connection to the Lasso as with a → ∞ we find w ( t ) j → 1 c j , where λ = 1 /c is the usual Lasso p enalty . 2.2 Oracle prop erties of the MAP In the p enalized optimization literature, some estimators are justified at least partially b y their p ossession of the oracle prop ert y: that for appropriate parameter choices, the metho d p erforms just as well as an oracle pro cedure in terms of selecting the correct predictors and estimating the nonzero co efficien ts correctly asymptotically in n when the likelihoo d is Gaussian. Other related prop erties include asymptotic un biasedness, sparsit y and contin uit y in the data [5]. P enalization schemes with the oracle prop ert y include the smo othly clipp ed absolute deviation metho d [5], the adaptive lasso [6] and the one-step lo cal linear appro ximation metho d [7]. F or the generalized t-distribution, the MAP estimate obtained using this prior has b een exam- ined in [17, 18, 19], all of which deriv e the expectation-maximisation algorithm for finding a lo cal mo de of the p osterior. It is straightforw ard to establish that the estimate is asymptoti- cally unbiased, sparse when c < 2 p ( a + 1) /a and contin uous in the data when c = p ( a + 1) /a follo wing the conditions laid out in [5]. [19] sho ws that this scheme satisfies the oracle prop ert y if simultaneously a → ∞ , a/ p ( n ) → 0 and ac p ( n ) → C < ∞ for some constant C . It is worth remarking that the oracle prop erty requires the prior on β to dep end on the num b er of observ ations n . This is in conflict with conv entional Bay esian analysis in that the prior should represent your b eliefs ab out the data generating mechanism irresp ectiv e of the n umber of observ ations that you ma y or may not receiv e. In addition in the context of applications y our n and p may well not b e in the realm of asymptotic relev ance. 5 2.3 Comparison to other priors There ha ve been a num b er of sparsit y prior representations adv o cated recen tly in the literature, although it is important to note that most of the pap ers deal solely with MAP estimation and not full Bay esian analysis. T o name a few imp ortant con tributions we highligh t the normal-gamma, normal-exp onen tial-gamma (NEG) and the Horsesho e priors, see [20, 21, 12, 22]. Like all these priors the Gt ( · , · ) shares the prop erties of sparsit y and adaptiv e shrink age, allowing for less shrink age on large co efficients as shown in Fig.2 for a normal observ ation, and a scale-mixture represen tation which facilitates an efficient EM algorithm to obtain MAP estimates. In T able 1 w e hav e listed the log densities for the sparsity-priors mentioned ab o ve. F rom T able 1 we see an immediate b enefit of the generalized t density is that it has a simple analytic form compared with other sparsit y-priors and is well known and understo o d by statisticians, which in turn aids explanation to data o wners. The generalized t is also easy to compute relative to the NEG, normal-gamma and Horsesho e priors which all inv olv e non-standard computation, making them more c hallenging to implemen t and less amenable to fast parallel GPU sim ulation. Finally , the setting of the parameters of the generalized t is help ed as statisticians are used to thinking ab out degrees of freedom and scale parameters of Studen t’s densities. W e b eliev e this will ha ve an impact on the attractiveness of the prior to non-exp ert users outside of sparsity researc hers. ∝ log p ( β j ) double-exp onen tial − λ | β j | normal-gamma ( 1 2 − λ ) log | β j | − log K λ − 1 / 2 | β j | λ N E G − β 2 4 λ 2 − log D − 2( λ + 1 2 ) | β | λ Horsesho e log R ∞ 0 (2 τ 2 λ 2 ) − 1 / 2 exp − β 2 j 2 τ 2 λ 2 (1 + λ ) − 2 dλ Gt ( a, b ) − ( a + 1) log 1 + | β j | ab T able 1: Log densities for the double-exp onen tial and v arious sparsity-priors. K ( · ) denotes a Bessel function of the third kind and D v ( z ) is the parab olic cylinder function. The Horsesho e prior do es not hav e an analytic form. 3 Genot yp e Data In the dev elopment and testing of the metho ds in this pap er w e made use of anonymous genotype data obtained from the W ellcome T rust Centre for Human Genetics, Oxford. The data consists of a subset of single-neucleotide p olymorphisms (SNPs) from a genome-wide data set originally gathered in a case-con trol study to iden tify colorectal cancer risk alleles. The data set we consider consists of genotype data for 1859 sub jects on 184 closely spaced SNPs from a hit- region on c hromosome 18q. W e standardise the columns of X to b e mean zero and unit standard deviation. A plot of the correlation structure across the first 100 markers in the region is sho wn 6 in Fig 3. W e can see the markers are in high LD with blo c k like strong correlation structure that makes multiple regress ion analysis challenging. W e generated pseudo-phenotype data, y , by selecting at random fiv e lo ci and generating co- efficien ts, β ∼ N (0 , 0 . 2), judged to be realistic signal sizes seen in GW AS [23]. All other co efficien ts were set to zero. W e then generated phenotype, case-con trol, binary y data by sam- pling Bernoulli trials with probability giv en by the logit of the log-o dds, x i β , for an individuals genot yp e x i . W e considered tw o scenarios. The first when we construct a genotype matrix of 500 individuals using the five relev an t markers and an additional 45 “n ull” mark ers whose corresp onding coefficients w ere zero. The second data set uses all av ailable markers and all sub jects to create a (1859 × 184) design matrix. 4 Sequen tial Mon te Carlo and graphic card sim ulation W e propose to explore the posterior distributions of β | X , y , a, c with v arying c = b/a via Sequen tial Monte Carlo (SMC) sampler methodology [24]. In particular, w e let the target distribution at time t b e π t ( β ) = p ( β | X , y , a, b t /a ) with t = { 1 , . . . , T } and some specified sequence { b t } T t =1 where b i > b i +1 and b T is small enough for the prior to dominate the lik eliho od with all co efficients posterior distributions hea vily concentrated around zero, and b 1 large enough for the prior to b e diffuse and hav e limited impact on the lik eliho o d. In order to compute p osterior distributions o ver the range of { b j } T j =1 w e mak e use of sequen tial Mon te Carlo samplers indexed on this sequence of scales. It is imp ortan t to note that we cannot ev aluate π t ( β ) = p ( y | X , β ) p ( β | a, b t ) R p ( y | X , β ) p ( β | a, b t ) d β since w e cannot ev aluate the in tegral in the denominator, the marginal lik eliho o d of y . How ever, w e can ev aluate γ t ( β ) = Z t π t ( β ) where Z t = R p ( y | X , β ) p ( β | a, b t ) d β . A t time t we ha ve a collection of N particles { β ( i ) t } N i =1 and accompan ying weigh ts { W ( i ) t } N i =1 suc h that the empirical distribution ˜ π N t ( d β ) = N X i =1 W ( i ) t δ β ( i ) t ( d β ) is an approximation of π t . The w eights are giv en by W ( i ) t = W ( i ) t − 1 w t ( β ( i ) t − 1 , β ( i ) t ) P N j =1 W ( j ) t − 1 w t ( β ( j ) t − 1 , β ( j ) t ) (5) where w t ( β ( i ) t − 1 , β ( i ) t ) = γ t ( β ( i ) t ) L t − 1 ( β ( i ) t , β ( i ) t − 1 ) γ t − 1 ( β ( i ) t − 1 ) K t ( β ( i ) t − 1 , β ( i ) t ) (6) W e use an Mark ov c hain Monte Carlo (MCMC) kernel K t with inv ariant distribution π t to sample each particle β ( i ) t , ie. β ( i ) t ∼ K t ( β ( i ) t − 1 , · ). The backw ards k ernel L t − 1 used to weigh t the particles is essentially arbitrary but we use the reversal of K t L t − 1 ( β t , β t − 1 ) = π t ( β t − 1 ) K t ( β t − 1 , β t ) π t ( β t ) 7 so that the weigh ts in (6) are given by w t ( β ( i ) t − 1 , β ( i ) t ) = γ t ( β ( i ) t − 1 ) γ t − 1 ( β ( i ) t − 1 ) as suggested in [25, 26, 24]. Since the weigh ts do not dep end on the v alue of β t , it is b etter to resample the particles b efore mo ving them according to K t , when resampling is necessary . W e resample the particles when the effective sample size [27] is below 3 / 4 N . Also note that ev en if we knew the normalizing constan ts Z t − 1 and Z t , they would hav e no impact since the weigh ts are normalized in (5). W e can how ev er obtain an estimate of Z t / Z t − 1 at time t via Z t Z t − 1 = Z γ t ( β t − 1 ) γ t − 1 ( β t − 1 ) γ t − 1 ( β t − 1 ) d β t − 1 (7) ≈ N X i =1 W ( i ) t − 1 w t ( β ( i ) t − 1 , β ( i ) t ) = d Z t Z t − 1 (8) whic h allows one to estimate the unnormalized marginal density Z t / Z 1 for eac h v alue of b t via c Z t Z 1 = Q t t =2 d Z t Z t − 1 . If we additionally giv e log b a uniform prior, then these unnormalized marginal densities can also b e used to weigh t the collections of particles at each time, allowing us to approximate the distribution of β with b marginalized out. ˜ π N t ( d β ) = 1 P T t =1 c Z t Z 1 T X t =1 c Z t Z 1 N X i =1 W ( i ) t δ β ( i ) t ( d β ) (9) Note that this is equiv alent to systematic sampling of the v alues of b . 4.1 F urther Details F or the exp eriments we used the data with n = 500 and p = 50, we use N = 8192 particles and w e consider tw o settings of the degrees of freedom a = 1 and a = 4 with T = 450 and T = 350 resp ectiv ely , with the lo w er T for a = 4 explained b y the lo w marginal lik eliho o d associated with small v alues of b when a is large. W e set b t = 2(0 . 98) t − 1 for t ∈ { 1 , . . . , T } . Given the nature of the p osterior, a simple p -dimensional random w alk Metroplis-Hastings kernel do es not seem to conv erge quickly . W e found that using a cycle of kernels that eac h up date one elemen t of β using a random walk prop osal with v ariance 0 . 25 mixed b etter than a p -dimensional random w alk prop osal. T o ensure that our kernels mix fast enough to get go o d results w e construct K t b y cycling this cycle of kernels 5 times. This is computationally demanding since each step of the algorithm consists of 5 N p MCMC steps, each of which has a complexity in O ( n ), since the calculation of the likelihoo d requires us to up date X β for a change in one element of β . In fact, computation on a 2.67GHz Intel Xeon requires approximately 20 hours of computation time with these settings and T = 450. How ever, w e implemen ted the algorithm to run on an NVIDIA 8800 GT graphics card using the Compute Unified Device Architecture (CUD A) parallel computing arc hitecture, using the SMC sampler framew ork in [28] to take adv antage of the ability to compute the likelihoo d in parallel for many particles. On this hardware, the time to run the algorithm is approximately 30 minutes for T = 450, giving around a 40 fold sp eedup. 8 The initial particles for t = 1 are obtained by running an MCMC algorithm targeting π 1 for a long time and picking suitably thinned samples to initialize the particles. The w eights of eac h particle start off equal at 1 / N . While not ideal, trivial importance sampling estimates ha ve a very low effective sample size for reasonable computational p o wer and, in any case, the SMC sampler metho dology requires us to hav e a rapidly mixing MCMC kernel for our resulting estimates to b e go o d. The particle with the highest p osterior density at each time is used as an initial v alue for the MAP algorithm describ ed in Section 2.1. The densit y of the estimated MAP from this initial v alue is compared with the MAP obtained by using the previous MAP as an initial v alue and the estimate with the higher p osterior density is chosen as the estimated MAP . This has the effect of b oth removing v ariables that the original algorithm has not remov ed and k eeping v ariables in that the original algorithm has not, as it is not p ossible for the MAP algorithm to explore the multimodal p osterior density fully . While the indep endence of the w eights of the particles mak es it possible to adapt the selection of b t allo w aggressive c hanges in b while main taining a giv en effectiv e sample size [29, 30], w e found that this scheme had a p o or effect on the quality of our samples. This is primarily b ecause we disco vered that our cycle length for K t w as not large enough to get the particles to approximate π t after a large change in b , despite a small change in effective sample size. Increasing the cycle length has an essentially linear effect on the computational resources required, so this app ears to be a p oor option for reducing the computational complexity of the metho d. In addition, ha ving b t on a logarithmic schedule enables us to use the simple appro ximation to the p osterior of β | X , y , a given by (9). F or the larger experiment with n = 1859 , p = 184 and a = 4, we use the same parameters except that w e used the sc hedule b t = (0 . 98) t − 1 with T = 250. This was sufficient to capture the a wide range of mo del complexities. Nevertheless, the running time for this exp erimen t on our GPU was 7.5 hours. This demonstrates the k ey practical b enefit to using GPUs, as a similar run on a single CPU would take up wards of 5 days. 4.2 T uning the SMC sampler F ollowing exploratory analysis w e found that a sequence of scales as b t = b 1 (0 . 98) t − 1 w orked w ell, and that setting b 1 equal to one or t wo pro duced a diffuse enough prior when sample size w as in the 100s or 1000s, as is the situation in GW AS. In practice, one will ha ve to exp erimen t with the parameters of the SMC sampler in order to ensure satisfactory results. In particular, given an MCMC kernel K t , one m ust decide ho w man y cycles should b e p erformed p er time step to mix adequately in relation to how quic kly one wishes to decrease b . In an ideal scenario, K t pro duces essentially indep enden t samples from π t regardless of the current set of particles. This w ould allow one to set { b t } T t =1 suc h that computational pow er is only spent in areas of b that are of in terest. In practice, ho w ever, K t will often only mix w ell lo cally and so small steps in b are desirable to ensure that the Monte Carlo error in the sc heme do es not dominate the results. It is difficult to c haracaterize the relationship b et w een the n umber of cycles of K to p erform and ho w aggressiv ely one can decrease b in general. Ho wev er, in this con text, one might not wan t to decrease b too quic kly an ywa y , so that a smooth picture of the relationship b et w een successive p osteriors can b e visualized. The n umber of particles, N , is a crucial parameter that should b e set at such a level that some 9 fraction of N can provide a suitable approximation of eac h intermediate target densit y . In practice, the n umber of effectiv e samples is lo w er than N and the effective sample size is an attempt to ensure that this n umber is controlled. It is standard practice to choose to resample particles when the effectiv e sample size is in (2 / 3 , 1) N . W e c ho ose to resample the particles when the effective sample size fell b elo w 3 / 4 N . A recommended wa y to test whether the sampler is p erforming acceptably is to run an MCMC sc heme with fixed b = b t for some in termediate v alue of t and compare the samples one obtains after con vergence with those pro duced by the SMC sampler. This is p ossible in this con text b ecause the p osterior is not particularly troublesome to sample from for fixed b with a sensible k ernel. Finally it is w orth noting that if the aim was to infer a single distribution giv en a single setting of c , one could ac hieve b etter results via one long MCMC run. The b enefit of the SMC sampler is that it pro vides a unified metho d to obtain all of the exploratory statistics and estimates of the marginal likelihoo d in a single run. 5 Sparsit y-P ath-Analysis Lasso plots of MAP estimates as functions of the p enalt y parameter λ are a highly informativ e visual to ol for statisticians to gain an understanding of the relative predictiv e importance of co v ariates o ver a range of mo del complexities [31]. How ever, MAP statistics are single p oin t estimates that fail to conv ey the full extent of information contained within the join t p osterior distribution. One k ey aim of this report is to develop full Ba yesian exploratory tools that utilize the joint p osterior information. In this section we describ e graphical displays of summary statistics contained in SP A results b y first considering the n = 500 p = 50 data set described ab o ve. In particular the in- dices of the v ariable with non-zero β ’s are I = { 10 , 14 , 24 , 31 , 37 } with corresp onding β I = {− 0 . 2538 , 0 . 4578 , − 0 . 1873 , − 0 . 1498 , 0 . 0996 } . In the Gt ( a, c ) prior we found the Gt ( a = 1 , c ), or Gt ( a = 4 , c ), to b e go od default settings for the degrees of freedom b oth leading to heavy tails. W e ran the SMC-GPU sampler describ ed ab o ve and then generated a n um b er of plots using the set of particles and their weigh ts. 5.1 P ath plots In Fig. 4 we plot four graphs of summary statistics obtained from the SMC sampler across a range of scales log( c ) ∈ ( − 8 , 0). (a) MAP plots: In Fig. 4(a) we plot the MAP paths of all 50 coefficients moving from the most sparse mo dels with c = e − 8 where all ˆ β M AP = 0 through to c = 1 when all ˆ β M AP 6 = 0. The true non-zero co efficients are plotted in colors other than gra y . One can observe that the MAP paths are non-smo oth in log ( c ) such that the mo des suddenly jump a wa y from zero.This is a prop ert y of the non-log-concavit y of the p osterior distributions; suc h that the marginal p osterior densities can be bimodal with one mo de at β j = 0 and another a wa y from zero. As c increases the MAP mo de at zero decreases and at some p oin t the global mo de switc hes to the mo de at β j 6 = 0. This can b e seen clearly in Fig. 5 were we 10 plot the marginal p osterior distribution of β 24 o ver the range log c ∈ ( − 5 , 4); c.f. β 24 is sho wn as as the red curv e in Fig. 4(a). W e can see in Fig. 5 the non-conca vit y of the p osterior density and ho w the global mo de jumps from 0 to ≈ − . 04 as log c is around − 4 . 75. As mentioned ab o ve the MAP is found by iterating the EM algorithm b eginning from the particle with largest p osterior densit y . (b) Median plots: In Fig. 4(b) w e plot the absolute v alue of the median of the marginal distribution of β j ’s. This is a plot of z j (log[ c ]) vrs log c , for j = 1 , . . . , p , where, z j ( c ) = | ˆ F − 1 β j (0 . 5 | c ) | where F − 1 β j ( · ) is the inv erse cumulativ e p osterior distribution of β j , F β j ( x | c ) = Z x −∞ π ( β j | X , y , a, c ) dβ j and where π ( β j | X , y , a, c ) is the marginal p osterior distribution of β j giv en c ; π ( β j | X , y , a, c ) = R − j π ( β j , β − j | X , y , a, c ) d β − j where index {− j } indicates all indices other than the j th. The plot of absolute medians giv es an indication of ho w quickly the p osterior distribu- tions are mo ving out ward aw ay from the origin as the precision of the prior decreases. By plotting on the absolute scale we are b etter able to compare the co efficients with one another and we also see that, unlike the MAPs, the medians are smo oth functions of c . (c) P osterior for scale c : In Fig. 4(c) we show the marginal p osterior distribution p ( c | X , y , a ), p ( c | X , y , a ) = Z β p ( c, β | X , y , a ) d β . The p osterior on c graphs the relativ e evidence for particular prior scale. The mo de of Fig. 4(c), ˜ c = arg max c π ( c | X , y , a ) is indicated on plots (a),(b),(d) by a v ertical dashed line. (d) Concen tration plots: In Fig. 4(d) we plot the concentration of the marginal p osteriors of β j ’s around the origin. In particular, for user sp ecified tolerance ∆, this is a plot of V ( c ) vrs c where, V ( c ) = 1 − Z ∆ − ∆ π ( β j | X , y , a, c ) dβ j = 1 − P r ( β j ∈ ( − ∆ , ∆) | X , y , a, c ) This is a direct measure of the predictiv e relev ance of the corresp onding cov ariate (geno- t yp e). In Fig. 4(d) we ha ve set ∆ = 0 . 1 although this is a user set parameter that can b e sp ecified according to the appropriate level b elo w which a v ariable is not deemed imp ortan t. T aken together w e see the SP A plots highligh t the influential genotypes and the relativ e evidence for their predictive strength. W e can also gain an understanding of the supp ort in the data for differing v alues of log( c ). Ha ving generated the SP A plots it is in teresting to compare the results of c hanging the degrees of freedom from the Gt ( a = 4 , c ) to Gt ( a = 1 , c ). In Fig. 6 we show plots corresp onding to Fig. 4 but using a = 1 degrees of freedom. W e can see that, as exp ected, Gt ( a = 1 , c ) pro duces sparser solutions with only three non-zero MAP estimates at the mo de of the p osterior for ˜ c . Moreo ver, comparing the concentration plots Fig. 6(d) with Fig. 4(d) at the marginal MAP estimate of c w e see that for a = 1 w e see muc h greater disp ersion in the concen tration plot. 11 5.2 Individual co efficien t plots Examination of the plots in Fig.4 may highlight to the statistician some interesting predictors to explore in greater detail. Individual co effcien t plots of summary statistics can be pro duced to pro vide greater information on the posterior distributions. In Fig. 7 we sho w summary plots for four representativ e co efficients with their 90% credible interv als (green), median (black), mean (blue) and MAP (red). These are obtained from the set of weigh ted SMC particles. W e can see that as exp ected the mean and median are smo oth functions of the prior scale, which the MAP can exhibit the c haracteristic jumping for bimo dal densities. In Fig. 8 we sho w corresp onding densit y estimates. These co efficien ts were c hosen to represent markers with strong asso ciation Fig. 7(a), weak er asso ciation Fig. 7(b),(c) and no asso ciation Fig. 7(d). W e can see in the plots for Fig. 7(d) and Fig. 8(d) that for a n ull marker with no asso ciation signal the MAP app ears to b e m uch smother in log ( c ). Equiv alent plots but for a = 1 are shown in Fig.9 and Fig.10 where we see the greater sparsity induced by the hea vier tails of the GT ( a = 1 , c ) relative to GT ( a = 4 , c ). 5.3 Marginal plots The SMC sampler also allo ws us to estimate the marginal p osterior probability of β , using (9), in tegrating ov er the uncertaint y in c , p ( β | X , y , a ) = Z c p ( β , c, | X , y , a ) dc Moreo ver we can also calculate the marginal p osterior concentration aw a y from zero, for given tolerance ∆ as, V = 1 − Z c Z ∆ − ∆ π ( β j | X , y , a, c ) dβ j dc In Fig.11 w e plot summaries for the marginal posterior distributions of β as w ell as the marginal concen trations, for a = 4 and a = 1. W e can see in Fig.11 that the marginal plots pro vide a useful ov erview of the relative imp ortance of each v ariable. 5.4 Comparison to double-exponential prior, a → ∞ It is informative to compare the results for the Gt ( a, c ) prior abov e with that of the double- exp onen tial prior p ( β ) ∝ exp( − β /c ) which is obtained as a generalized t prior with q = 1 and a → ∞ . In Fig. 12 we show SP A plots for this case. W e can see the muc h smo other paths of the MAP compared with Fig 4. This can also b e seen in the individual co efficien t plots shown in Fig. 13 and Fig. 14. It is interesting to in vestigate in more detail the p osterior density of β 24 the co efficien t with strongest evidence of asso ciation. This is shown in Fig.15 and should b e compared to Fig.5 for a = 4. W e can clearly see that the heavier tails of a = 4 induces greater sparsit y and a rapid transition from the p osterior concen trated around zero to being distributed a wa y from zero. 12 5.5 Large Data Set W e next analysed the larger data set with n = 1859 and p = 184. The indices of the non-zero cofficients are I = { 108 , 22 , 5 , 117 , 162 } with the same v alues retained for β I = {− 0 . 2538 , 0 . 4578 , − 0 . 1873 , − 0 . 1498 , 0 . 0996 } . The SP A plots are sho wn in Fig.16 with corre- sp onding individual co efficien t plots in Fig.17 and Fig.18. W e can see in Fig.16 that there is m uch stronger evidence that co efficien ts { 108 , 22 , 5 , 117 } are imp ortan t. In terestingly v ariable 162 is still masked by other predictors. In comparison with Fig.4(c) we observe that a smaller v alue of c is supp orted by the larger data set. F rom Fig.16(d) we can see that one n ull-co efficien t, β 107 , (with true β 107 = 0) app ears to ha ve some evidence of predictive imp ortance. In Fig.19 we sho w summaries of the marginal distributions and concen tration plots ha ving in tegrated ov er the uncertain t y in c . The n ull- co efficien t SNP turns out to b e situated adjacen t to a predictive SNP β 108 = − 0 . 2538 shown as the blue line in Fig.16(a), and the tw o mark ers are in high LD with correlation co efficient C ( X 107 , X 108 ) = 0 . 87, lea ving some ambiguit y in the source of the asso ciation signal. The plot of absolute medians Fig.16(b) helps partly resolve this issue p oin ting to only four clear asso ciation signals around the mo de ˜ c of the marginal probability . W e can drill down a little further and plot summary statistics for the posterior distribution for β 107 sho wn in Fig.20. In Fig.20 w e can see that the MAP (red line) holds at zero for almost all v alues of c , adding supp ort that this is a “n ull” marker, although the credible in terv als (green lines) show there is considerable uncertain ty in the actual v alue. In suc h a w ay w e can see how complex asso ciation signal can b e explored, gaining a b etter understanding of the source of asso ciation signal. 6 Discussion W e ha v e presen ted an exploratory approach using generalized t priors that we call Bay esian sparsit y-path-analysis to aid in the understanding of genetic asso ciation data. The approac h in volv es graphical summaries of p osterior densities of co efficients obtained b y sweeping ov er a range of prior precisions, including v alues with lo w p osterior probabilit y , in order to c haracterize the space of mo dels from the sparse to the most complex. This use of SMC metho dology is ideally suited to the inference task, by indexing the SMC on the scale of the prior. The resulting collection of w eighted particles provides us with appro xi- mations for the co efficient p osterior distributions for each scale in addition to estimates of the marginal lik eliho o d and allo ws for impro ved robustness in MAP computation. The simulations are computationally demanding and w ould take da ys w orth of run-time using conv en tional single-threaded CPU pro cessing. T o alleviate this we mak e use of many-core GPU parallel pro cessing pro ducing around a 20-40 fold improv emen t in run-times. This has real b enefit in allo wing for data analysis within a w orking day for the larger data set we considered. Ac kno wledgemen ts The authors wish to ackno wledge the organisers of the CRM, Montreal, workshop on Computa- tional statistical metho ds for genomics and system biology . CH wishes to ac knowledge supp ort for this research by the MRC, OMI and W ellcome T rust Cen tre for Human Genetics, Oxford. 13 The co de is av ailable from AL or CH. References [1] D. J. Balding. A tutorial on statistical metho ds for p opulation asso ciation studies. Natur e R eviews Genetics , 7:81–791, 2006. [2] Robert Tibshirani. Regression shrink age and selection via the lasso. J. R oyal Statistic al So c. B , 58:267–288, 1996. [3] R. Tibshirani. Regression shrink age and selection via the lasso: a retrosp ectiv e. Journal of the R oyal Statistic al So ciety: Series B , 73(3):273–282, 2011. [4] T. Park and G. Casella. The Bay esian Lasso. Journal of the Americ an Statistic al Asso ci- ation , 103(482):681–686, 2008. [5] Jianqing F an and Runze Li. V ariable selection via nonconcav e p enalized likelihoo d and its oracle prop erties. Journal of th e Americ an Statistic al Asso ciation , 96:1348–1360, 2001. [6] Hui Zou. The adaptiv e lasso and its oracle prop erties. Journal of the Americ an Statistic al Asso ciation , 101:1418–1429, 2006. [7] Hui Zou and Runze Li. One-step sparse estimates in nonconcav e penalized lik eliho o d mo dels. The Annals of Statistics , 36:1509–1533, 2008. [8] Emman uel J. Cand ` es, Mic hael B. W akin, and Stephen P . Bo yd. Enhancing sparsit y by rew eighted ` 1 minimization. Journal of F ourier A nalysis and Applic ations , 14:877–905, 2008. [9] Ric k Chartrand and W otao Yin. Iteratively rew eigh ted algorithms for compressive sensing. In 33r d International Confer enc e on A c oustics, Sp e e ch, and Signal Pr o c essing (ICASSP) , 2008. [10] C. J. Hoggart, J. C. Whittaker, M. De Iorio, and D. J. B alding. Sim ultaneous analysis of all snps in genome-wide and re-sequencing asso ciation studies. PLOS Genetics , 4:e1000130, 2008. [11] K. L. Ay ers and H. J. Cordell. SNP selection in genome-wide and candidate gene studies via p enalized logistic regression. Genetic Epidemiolo gy , 34:879–891, 2010. [12] Jim E. Griffin and Philip J. Bro wn. Bay esian adaptiv e lassos with non-con v ex p enalization. T echnical rep ort, IMSAS, Universit y of Kent, 2007. [13] E. I. George and R. E. McCullo c h. V ariable selection via Gibbs sampling. Journal of the A meric an Statistic al Asso ciation , 88:881–889, 1993. [14] M.A. Wilson, E. S. Iv ersen, M.A. Clyde, S. C. Schmidler, and J. M. Sc hildkraut. Bay esian mo del searc h and multilev el inference for snp asso ciation studies. Annals of Applie d Statis- tics , 4:1342–1364, 2010. [15] B.L. F ridley . Bay esian v ariable and model selection methods for genetic asso ciation studies. Genetic epidemiolo gy , 33(1):27–37, 2009. [16] J.B. McDonald and W.K. New ey . P artially adaptiv e estimation of regression models via the generalized t distribution. Ec onometric the ory , 4(03):428–457, 1988. 14 [17] An thony Lee, F rancois Caron, Arnaud Doucet, and Chris Holmes. A hierarc hical Bay esian framew ork for constructing sparsity-inducing priors. 2010. [18] V olk an Cevher. Learning with compressible priors. In Y. Bengio, D. Sch uurmans, J. Laf- fert y , C. K. I. Williams, and A. Culotta, editors, A dvanc es in Neur al Information Pr o c essing Systems 22 , pages 261–269. 2009. [19] Artin Armagan, Da vid Dunson, and Jaeyong Lee. Generalized double Pareto shrink age. 2011. [20] J.E. Griffin and P .J. Bro wn. Inference with normal-gamma prior distributions in regression problems. Bayesian A nalysis , 5:171–188, 2010. [21] F. Caron and A. Doucet. Sparse Ba yesian nonparametric regression. In International Confer enc e on Machine L e arning , 2008. [22] C.M. Carv alho, N.G. Polson, and J.G. Scott. The horsesho e estimator for sparse signals. Biometrika , 97(2):465, 2010. [23] M. Stephens and D.J. Balding. Bay esian statistical metho ds for genetic association studies. Natur e R eviews Genetics , 10(10):681–690, 2009. [24] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers. J. R oyal Statistic al So c. B , 68(3):411–436, 2006. [25] G. E. Cro oks. Nonequilibrium measurements of free energy differences for microscopically rev ersible Marko vian systems. J. Stat. Phys. , 90:1481–1387, 1998. [26] Radford M. Neal. Annealed imp ortance sampling. Statistics and Computing , 11(2):125– 139, 2001. [27] Jun S. Liu and Rong Chen. Blind decon volution via sequential imputations. Journal of the Americ an Statistic al Asso ciation , 90:567–576, 1995. [28] An thony Lee, Christopher Y au, Mic hael B. Giles, Arnaud Doucet, and Christopher C. Holmes. On the utilit y of graphics cards to perform massively parallel sim ulation of ad- v anced Monte Carlo metho ds. Journal of Computational and Gr aphic al Statistics , 19:769– 789, 2010. [29] Pierre Del Moral, Arnaud Doucet, and Aja y Jasra. On adaptive resampling strategies for sequen tial Monte Carlo metho ds. 2008. T ec hnical Rep ort. [30] Luk e Bornn, Arnaud Doucet, and Raphael Gottardo. An efficient computational approach for prior sensitivit y analysis and cross-v alidation. Canadian Journal of Statistics , 38:47–64, 2010. [31] T. Hastie, R. Tibshirani, and J.H. F riedman. The elements of statistic al le arning: data mining, infer enc e, and pr e diction . Springer V erlag, 2009. 15 Figures Figure 1: Plot of log probabilit y under the Gt (1 , 1) (solid line) and the Lasso prior D E (0 , 1) (dashed line). Figure 2: Estimate of the p osterior mean under a single normal observ ation, y , for the Gt ( a = 1 , b − 0 . 1) (blue dashed) and Gt ( a = 4 , 0 . 1) (red dashed). W e can see the sparsity by the setting to zero around y = 0 and the approximate un biasedness with reduction in shrink age as | y | >> 0. 16 Figure 3: Plot of correlation structure across the first 100 adjacent genotype mark ers from c hromosome 18q. (a) MAPs (b) absolute medians (c) p osterior density (d) concentrations Figure 4: SP A plots with a = 4 degrees of freedom; using 50 markers around 18q. The 5 non- zero co efficien ts are indicated b y non-grey lines. The vertical dashed line indicates the mo de of the marginal likelihoo d, π ( c | X , y , a ), for c as shown in plot (c). 17 Figure 5: Plot of marginal p osterior of β 24 as a function of c , c.f. the red curv e in Fig. 4(a). W e can see the bimo dalit y and hence the jumping of the MAP mo de in Fig. 4(a) as log c changes from (-5, -4). (a) MAPs (b) absolute medians (c) p osterior density (d) concentrations Figure 6: SP A plots with a = 1 degrees of freedom; using 50 markers around 18q. The 5 non- zero co efficien ts are indicated b y non-grey lines. The vertical dashed line indicates the mo de of the marginal lik eliho o d, π ( c | X , y , a ), for c as sho wn in plot (c). This Figure should be compared with Fig. 4 using a = 4 degrees of freedom. 18 (a) a = 4, j = 14 (b) a = 4, j = 24 (c) a = 4, j = 31 (d) a = 4, j = 1 Figure 7: Stats for individual co efficien ts from Fig. 4 with a = 4. F or each co efficient we plot 90% credible interv als (green), median (blac k), mean (blue) and MAP (red). (a) a = 4, j = 14 (b) a = 4, j = 24 (c) a = 4, j = 31 (d) a = 4, j = 1 Figure 8: P osterior density plots corresp onding to Fig. 7, a = 4 19 (a) a = 1, j = 14 (b) a = 1, j = 24 (c) a = 1, j = 31 (d) a = 1, j = 1 Figure 9: Stats for individual co efficien ts from Fig. 6 with a = 1. F or each co efficient we plot 90% credible in terv als (green), median (blac k), mean (blue) and MAP (red). Compare with Fig. 7 where a = 4. (a) a = 1, j = 14 (b) a = 1, j = 24 (c) a = 1, j = 31 (d) a = 1, j = 1 Figure 10: P osterior density plots corresp onding to Fig. 9, a = 1 20 (a) (b) (c) (d) Figure 11: Marginal plots: (a) the summary stats from marginal p osterior distributions sho wing MAPs (crosses), Medians (stars), and 90% credible in terv als (bars) for a = 4 degrees of freedom; (b) the marginal concentrations ∆ = 0 . 05 red bars, ∆ = 0 . 1 (blue bars) for a = 4; (c ) marginal p osteriors as in (a) but with a = 1 prior; (d) marginal concentrations as in (b) but for a = 1, prior. 21 (a) MAPs (b) medians (c) marginal density (d) concentrations Figure 12: SP A plots for the double-exp onen tial prior, Gt ( a → ∞ , c ). (a) double-exp onen tial j = 14 (b) double-exp onen tial j = 24 (c) double-exp onen tial j = 31 (d) double-exp onen tial j = 1 Figure 13: Stats for individual co efficients from Fig. 12 with double-exp onen tial prior. F or eac h co efficien t we plot 90% credible interv als (green), median (black), mean (blue) and MAP (red). Compare with Fig. 7 where a = 4. 22 (a) double-exp onen tial j = 14 (b) double-exp onen tial j = 24 (c) double-exp onen tial j = 31 (d) double-exp onen tial j = 1 Figure 14: P osterior density plots corresp onding to Fig. 13 with double-exp onen tial prior. Figure 15: Plot of marginal p osterior of β 24 as a function of c for double-exponential prior a → ∞ . Under the double-exp onen tial prior, a → ∞ , w e see the p osterior remains log-concav e and hence the MAP is a smo oth function of log c ; as compared to Fig. 5. 23 (a) MAPs (b) medians (c) marginal density (d) concentrations Figure 16: SP A plots using the larger data set n = 1859, p = 184 (a) a = 4 , j = 22 (b) a = 4 , j = 5 (c) a = 4 , j = 117 (d) a = 4 , j = 115 Figure 17: Summary statistics for posterior distributions for individual co efficien ts, 90% credible in terv als (green), median (blac k), mean (blue) and MAP (red). 24 (a) a = 4 , j = 22 (b) a = 4 , j = 5 (c) a = 4 , j = 117 (d) a = 4 , j = 115 Figure 18: P osterior density plots of co efficients in Fig.17 (a) (b) Figure 19: Marginal plots: (a) the summary stats from marginal p osterior distributions sho wing MAPs (crosses), Medians (stars), and 90% credible in terv als (bars) for a = 4 degrees of freedom; (b) the marginal concentrations ∆ = 0 . 05 red bars, ∆ = 0 . 1 (blue bars) for a = 4 25 Figure 20: Stats for individual co efficient β 107 sho wing 90% credible interv als (green), median (blac k), mean (blue) and MAP (red). 26
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment