A Polya-Gamma Sampler for a Generalized Logistic Regression

A P´ oly a-Gamma Sampler for a Generalized Logistic Regression Luciana Dalla V alle ∗ Univ ersit y of Plymouth, Plymouth, UK F abrizio Leisen Univ ersit y of Nottingham, Nottingham, UK Luca Rossini Queen Mary Univ ersit y of London, London, UK and W eixuan Zh u Xiamen Univ ersit y , Xiamen, P eople’s Republic of China Decem b er 22, 2020 Abstract In this pap er w e in tro duce a no vel Ba yesian data augmen tation approach for estimating the parameters of the generalised logistic regression mo del. W e prop ose a P´ oly a-Gamma sampler algorithm that allows us to sample from the exact p osterior distribution, rather than relying on appro ximations. A simulation study illustrates the ﬂexibilit y and accuracy of the prop osed approach to capture heavy and ligh t tails in binary resp onse data of diﬀeren t dimensions. The methodology is applied to t wo diﬀeren t real datasets, where w e demonstrate that the P´ oly a-Gamma sampler pro vides more precise estimates than the empirical lik eliho od metho d, outp erforming approximate approaches. Keywor ds: Ba y esian inference; generalized logistic regression; P´ oly a-Gamma sampler; recidi- vism data. ∗ The authors gratefully ackno wledge Alex Diana for advising on the PGdra w sampler and Michael W¨ ogerer for useful feedbac k on a previous v ersion of the paper. The fourth author is supp orted by the Chinese F unda- men tal Research F unds for the Cen tral Universities No 20720181062. 1 1 In tro duction Data augmen tation (D A) is undoubtedly one of the most p opular Marko v Chain Monte Carlo (MCMC) metho ds. The idea is to consider the target distribution as the marginal of a join t distribution in an augmented space. Assuming it is easy to sample from the conditional distri- butions, the algorithm is a straigh tforw ard t wo steps pro cedure whic h makes use of a simple Gibbs sampling. Since the seminal work of T anner and W ong (1987), data augmentation has b een extensiv ely studied, dev elop ed and used in diﬀeren t contexts. F or instance, Meng and v an Dyk (1999), Liu and W u (1999), v an Dyk and Meng (2001) dev elop ed strategies to speed up the basic DA algorithm. Hob ert and Marc hev (2008) studied the eﬃciency of DA algorithms. Re- cen tly , Leisen et al. (2017) use a D A approach to p erform Ba yesian inference for the Y ule-Simon distribution. The idea is used in the context of inﬁnite hidden Marko v mo dels b y Hensley and Djuri ´ c (2017). The literature on D A is v ast and it is diﬃcult to list all the contributions to the ﬁeld. A review of DA algorithms can b e found in Bro oks et al. (2011), Chapter 10. In the context of binary regression, the article of Alb ert and Chib (1993) sets a milestone in the use of D A algorithms. The authors dev elop exact Ba y esian metho ds for mo deling cate- gorical resp onse data by introducing suitable auxiliary v ariables. In particular, they use a DA approac h to estimate the parameters of the probit regression mo del. Ho w ever, although some v alid approaches ha v e been prop osed (Holmes et al., 2006), Ba y esian inference for the logistic regression w as not satisfactorily addressed until the pap er of Polson et al. (2013). The authors prop osed an elegan t D A approac h which makes use of the follo wing identit y: ( e ψ ) a (1 + e ψ ) b = 2 − b e κψ Z e − ω ψ 2 / 2 p ( ω ) dω (1) where κ = a − b/ 2, a > 0, b > 0 and p ( ω ) is the densit y of a PG( b ,0) distribution. PG( b ,0) is a P´ oly a-Gamma distribution with parameters b and 0; w e refer to the article of Polson et al. (2013) for details ab out P´ olya-Gamma distributions. When ψ = x T β is a linear function of 2 predictors, the in tegrand is the kerne l of a Gaussian lik eliho o d in β . This naturally suggests a D A sc heme whic h simply requires to b e able to sample from the multiv ariate normal and the P´ oly a-Gamma distributions. This algorithm is called the P´ olya-Gamma sampler . A theoretical study of the algorithm can b e found in Choi and Hob ert (2013). In this paper, w e propose a P´ olya-Gamma sampler to estimate the parameters of the gen- eralized logistic regression mo del introduced in Dalla V alle et al. (2019). The authors studied immigration in Europ e emplo ying a nov el regression mo del which mak es use of a generalized logistic distribution. Their work is motiv ated by the need of a logistic mo del with heavy tails. The distribution considered in the pap er has the follo wing densit y function: f ( x ) = 1 B ( p, p ) e px (1 + e x ) 2 p x ∈ R , (2) where the parameter p controls the tails of the distribution. The standard logistic densit y can b e reco v ered with p = 1. Compared with the standard logistic distribution, p > 1 means lighter tails and 0 < p < 1 means heavier tails, as sho wn in Figure 1. Unfortunately , the distribution function is not explicit: F ( x ) = 1 B ( p, p ) B  e x 1 + e x ; p, p  x ∈ R (3) where B ( t ; p, q ) = R t 0 x p − 1 (1 − x ) q − 1 dx , with 0 < t < 1, is the incomplete Beta function. Therefore, a straigh tforw ard use of the metho dology of Polson et al. (2013) is not p ossible. T o o v ercome the problem, Dalla V alle et al. (2019) use an appro ximate Ba yesian computational metho d that relies on the empirical lik eliho o d (see Mengersen et al. (2013) and Karabatsos and Leisen (2018)). Ho w ever, since the empirical likelihoo d approac h is based on an appro ximation of the p osterior distribution, this metho d ma y lead to wide credible interv als and sometimes am biguous estimates, esp ecially for the tail parameter. In this pap er, we propose a nov el approach for estimating the generalised logistic regression, that allo ws us to draw samples from the exact p osterior distribution and is able to ov ercome 3 the dra wbac ks of the empirical likelihoo d metho d. W e show ho w to set a D A scheme for the generalized logistic regression whic h mak es use of the P´ oly a-Gamma identit y in equation (1). W e test the p erformance of the prop osed approach on simulated data and on tw o diﬀeren t real dataset: the ﬁrst dataset includes p eople’s opinions tow ards immigration in Europ e; the second con tains information ab out the recidivism of criminals detained in Io wa. W e show that the P´ oly a-Gamma sampler yields accurate results in high-dimension and outp erforms the empirical lik eliho o d algorithm pro viding a more precise estimation of the mo del parameters. The rest of the pap er is organised as follo ws. In Section 2, we in tro duce the generalized logistic regression framework in a Ba yesian setting. In Section 3, w e pro vide the details of the P´ olya-Gamma sampler for this mo del. Section 4 and Section 5 illustrate the algorithm p erformance with sim ulated and real data. Section 6 concludes. Figure 1: Probability density function of the generalized logistic with p = 1, p = 0 , 3, p = 0 , 7, p = 2 and p = 5. 4 2 The Generalized Ba y esian Logistic Regression Consider a binary regression set-up in whic h Y 1 , . . . , Y n are indep endent Bernoulli random v ariables such that Pr( Y i = 1 | β ) = H ( x T i β ) where x i is a k × 1 vector of known cov ariates asso ciated with Y i , i = 1 , . . . , n , β is a k × 1 vector of unkno wn regression co eﬃcien ts, and H : R → (0 , 1) is a distribution function. In this case, the lik eliho o d function is given by L ( β ) = n Y i =1 [ H ( x T i β )] y i [1 − H ( x T i β )] 1 − y i (4) If H ( x ) = Φ( x ) is the distribution function of a Gaussian, then we are in the probit regression framew ork. If H ( x ) = e x (1 + e x ) − 1 , then we are in the sp ecial case of the standard logistic regression. Albert and Chib (1993) prop osed a D A approac h for sampling from the p osterior distribution of the probit regression mo del. Their metho d makes use of auxiliary v ariables whic h allo w an easy implementation of the Gibbs sampler. P olson et al. (2013) prop osed an elegan t algorithm for tac kling the logistic regression case. The generalized logistic distribution in equation (2) has mean zero and scale one. Consider no w a generalized logistic with mean x T i β and scale one, i.e. with probability density function: f ( x ) = 1 B ( p, p ) e p ( x − x T i β )  1 + e ( x − x T i β )  2 p x ∈ R (5) W e denote the ab o ve distribution with GLog( x T i β , 1). Mimic king Alb ert and Chib (1993), supp ose to sample Z i from a GLog( x T i β , 1), i = 1 , . . . , n , and set Y i = 1 if Z i > 0. Otherwise, if Z i ≤ 0, then set Y i = 0. It is easy to see that P ( Y i = 1) = 1 B ( p, p ) B e x T i β 1 + e x T i β ; p, p ! . The joint p osterior distribution is π ( β , p, Z | y ) ∝ π ( β ) π ( p ) n Y i =1 { I ( Z i > 0) I ( y i = 1) + I ( Z i ≤ 0) I ( y i = 0) } 5 × 1 B ( p, p ) e p ( Z i − x T i β )  1 + e ( Z i − x T i β )  2 p (6) where π ( β ) and π ( p ) are the prior distributions of β and p , respectively . With the probit mo del, the tec hnique of Alb ert and Chib (1993) works b ecause the full conditional distributions can b e resorted to normal distributions (or truncated normal distributions) which are easy to sample from. In our case, the usual normal prior w ould lead to a full conditional on β which is not explicit, requiring the use of a Metrop olis step (or alternativ e algorithms). The P´ olya-Gamma iden tit y in (1) allows us to ov ercome this problem. In particular, we get e p ( Z i − x T i β )  1 + e ( Z i − x T i β )  2 p = 2 − 2 p Z + ∞ 0 e − ω i ( Z i − x T i β ) 2 / 2 p ( ω i ) dω i , where p ( ω ) is a P´ olya-Gamma, PG(2 p, 0), distribution. Note that e − ω ( x T i β ) 2 / 2 p ( ω ) is the un- normalized densit y of a PG(2 p, x T i β ) random v ariable. Therefore, the augmented p osterior distribution is π ( β , p, Z , ω | y ) ∝ π ( β ) π ( p ) n Y i =1 { I ( Z i > 0) I ( y i = 1) + I ( Z i ≤ 0) I ( y i = 0) } × 2 − 2 p e − ω i ( Z i − x T i β ) 2 / 2 p ( ω i ) . (7) 3 Inference Sampling Strategy The algorithm prop osed in Dalla V alle et al. (2019) makes use of an appro ximate lik eliho o d, whic h relies on an approximation of the p osterior distribution (see Mengersen et al. (2013) and Zhu and Leisen (2016)). The algorithm presen ted in this pap er allows us to sample from the true p osterior distribution, making it more app ealing to p erform the Ba y esian inferential exercise. 6 Starting from the augmen ted p osterior distribution in (7), this Section introduces the full conditional distributions required to run the algorithm. The v ariables in volv ed are Z , ω , β , p, where Z = ( Z 1 , . . . , Z n ), ω = ( ω 1 , . . . , ω n ) and β = ( β 1 , . . . , β k ). W e will fo cus on t wo cases: 1) the parameter p is kno wn, 2) the parameter p is unknown. In the ﬁrst case, the algorithm p erformance is very go o d and accurate. In the second case, the estimation of the parameter p introduces more noise in the algorithm leading to a less accurate, but still acceptable, estimation. Section 4 will provide a sim ulation study for b oth scenarios. The real data analysis assumes that p is unknown. The parameter p is kno wn In this case, the p osterior distribution in (7) reduces to: π ( β , Z , ω | y ) ∝ π ( β ) n Y i =1 { I ( Z i > 0) I ( y i = 1) + I ( Z i ≤ 0) I ( y i = 0) } × 2 − 2 p e − ω i ( Z i − x T i β ) 2 / 2 p ( ω i ) . (8) The full conditional distributions are derived b elo w and the v ariables of interest are: Z , ω , β . F ull conditional for Z i . The random v ariables Z 1 , . . . , Z n are indep enden t with full condi- tional distributions: π ( Z i | y , β , ω ) ∝ { I ( Z i > 0) I ( y i = 1) + I ( Z i ≤ 0) I ( y i = 0) } e − ω i ( Z i − x T i β ) 2 / 2 ∝ { I ( Z i > 0) I ( y i = 1) + I ( Z i ≤ 0) I ( y i = 0) } e − ω i 2 ( Z 2 i − 2 Z i x T i β ) whic h are truncated normal distributions. More precisely , 7 • if y i = 1, then π ( Z i | y , β , ω ) is distributed as a N ( x T i β , ω − 1 i ) truncated to the left by 0; • if y i = 0, then π ( Z i | y , β , ω ) is distributed as a N ( x T i β , ω − 1 i ) truncated to the right b y 0. F ull conditional for ω i . F ollo wing P olson et al. (2013), it is easy to see that the full conditional distribution of ω i is: π ( ω i | β , Z , y ) ∝ e − ω i ( Z i − x T i β ) 2 / 2 p ( ω i ) ∼ PG  2 p, Z i − x T i β  An eﬃcient method for sampling P´ olya-Gamma random v ariables is implemented in a mo diﬁed v ersion of the R pac k age BayesLogit (Windle et al., 2016). F ull conditional for β . Let Ω = diag( ω 1 , . . . , ω n ). Assuming a multiv ariate N ( β ∗ , B ∗ ) prior for β , the full conditional distribution of β is π ( β | Z , y , ω ) ∝ π ( β ) n Y i =1 exp n − ω i 2 ( Z i − x T i β ) 2 o = exp  − 1 2  β T  X T Ω X + B ∗− 1  β − 2 β T  X T Ω Z + B ∗− 1 β ∗   ∼ N ( ˜ β ∗ , ˜ V ∗ β ) where ˜ β ∗ =  X T Ω X + B ∗− 1  − 1  X T Ω Z + B ∗− 1 β ∗  ˜ V ∗ β =  X T Ω X + B ∗− 1  − 1 . The parameter p is unkno wn In this case, the full conditional for p is: π ( p | β , Z , ω , y ) ∝ π ( p ) n Y i =1 2 − 2 p B ( p, p ) p ( ω i ) 8 where p ( ω i ) is a PG(2 p, 0) and we assume a Gamma prior for p . Sampling from this full conditional distribution is not straightforw ard and alternativ e strategies m ust b e employ ed. One can marginalize the ab ov e full conditional by in tegrating out the ω ’s and applying the slice sampling of Neal (2003). Alternatively , one can further in tegrate out the Z and obtain the follo wing conditional π ( p | β , y ) ∝ π ( p ) n Y i =1 " 1 B ( p, p ) B e x T i β 1 + e x T i β ; p, p !# y i " 1 − 1 B ( p, p ) B e x T i β 1 + e x T i β ; p, p !# 1 − y i (9) The slice sampling algorithm requires the computation of the inv erse of the ab ov e full condi- tional distribution. W e used the function uniroot a v ailable in R to compute the numerical in v erse. The next sections illustrate the performance of the algorithm with sim ulated and real data. 4 Sim ulation Studies In this section, w e assess the performance of the algorithm b y implemen ting diﬀeren t sim ulation exp erimen ts. In particular, w e use four diﬀerent v alues of the tail parameter p : p = 0 . 3 and p = 0 . 7 (hea vy tails) and p = 1 . 5 and p = 3 (light tails). W e fo cus on tw o scenarios: 1. β = (1 , − 1 , − 3 , 1 , 3), with k = 5; 2. β = (2 . 3 , 1 , − 2 , 1 . 5 , − 2 . 7 , 0 . 2 , − 1 . 4 , 3 , − 0 . 6 , − 1 . 2), with k = 10. W e assume a standard normal distribution for the explanatory v ariables. F or each scenario and eac h v alue of p , w e generate 100 diﬀerent datasets of sample sizes n = 100 and n = 250, resp ectiv ely . W e analyse tw o diﬀeren t cases: the case with p known and the case with p unkno wn. 9 4.1 Case with p kno wn W e considered a v ague prior for β , such as the m ultiv ariate normal distribution, N ( ν , B ), with prior mean v ector ν = 0 and prior co v ariance matrix B = 100 · I k . F or each of the 100 diﬀeren t sim ulated datasets, we run 20 , 000 iterations of the MCMC algorithm and discarded the ﬁrst 5 , 000 iterations as burn-in p erio d. T able 1 lists the p osterior means of the ﬁv e-dimensional β of scenario 1 for the diﬀeren t v alues of p and n , a v eraged ov er the 100 simulated datasets. The v alues in brack ets sho w the standard deviations ov er the 100 sim ulations. W e notice that the p osterior means of β are v ery close to the true v alues, with a b etter estimation p erformance as n increases from 100 to 250. T able 3 shows the p osterior means and, in brac k ets, the standard deviations ov er the 100 simulations for the ten-dimensional scenario. These results are in line with the ﬁve-dimensional case, thus leading us to conclude that increasing the num b er of co v ariates is not an issue in terms of estimation accuracy . Moreo ver, the standard deviations in T ables 1 and 3 substanstially decrease for b oth scenarios when w e mo ve from n = 100 to n = 250. W e compare the p erformance of our P´ olya-Gamma sampler to the one of the appro ximate Ba y esian computation with empirical lik eliho o d ( B C el ) adopted in Dalla V alle et al. (2019). T able 2 and T able 4 display the p osterior means and the standard deviations o ver 100 sim ulation datasets estimated with B C el . The sampling eﬃciency of B C el dep ends heavily on the prior distribution. So w e consider a normal prior distribution N ( ν = 0 , B = 9 · I k ) for β . The results are based on 20,000 p osterior samples obtained with B C el . F rom the table, one can see that b oth metho ds achiev e similar accuracy level in terms of the estimation of the p osterior means. Ho wev er, the standard deviations given by B C el is larger than the ones obtained by P´ oly a-Gamma sampler, esp ecially when p parameter is large. Finally , consider that the prior distribution used by P´ oly a-Gamma sampler is m uch v aguer than the one used b y B C el , we conclude that P´ olya-Gamma sampler outp erforms B C el . 10 In order to illustrate the complete inferential pro cedure, w e show how the mo del parameters are estimated. Figure 2 displays the p osterior chains for a sp eciﬁc dataset with sample size n = 250, where p = 0 . 3 and β = ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). As shown in Figure 2, the chains, after a burn-in of 5 , 000 iterations, conv erge quickly to the true v alues of all parameters and give an excellent represen tation of the real parameter v alues. W e also performed a con vergence analysis for scenario 1, with ﬁv e-dimensional β 1 . The analysis w as carried out using the R coda pac k age (Plummer et al., 2006). In particular, w e used single datasets of sample sizes n = 100 and n = 250, resp ectively , where we ﬁxed p = 0 . 3. W e computed Gewek e’s con v ergence tests and the auto correlation and partial auto correlation functions for the v alues of β . T able 5 rep orts the results of Gew ek e’s conv ergence test (Gewek e, 1992), indicating no issues since the absolute v alues of the test statistics are all low er than 1 . 96. 4.2 Case with p unkno wn W e considered v ague priors for β and p . In particular, a multiv ariate normal distribution, N ( ν , B ), is c hosen for β with prior mean v ector ν = 0 and prior co v ariance matrix B = 100 · I k . F or p we chose a gamma prior, G a ( a, b ) with h yp erparameters a = b = 1. F or eac h of the 100 diﬀeren t simulated datasets, w e run 6 , 000 iterations of the MCMC algorithm and discard the ﬁrst 1 , 000 iterations as burn-in p erio d. T able 6 lists the p osterior means of p and the ﬁv e- dimensional β of scenario 1 for the diﬀerent v alues of n and the true p , av eraged o v er the 100 simulated datasets. The ﬁgures in brac k ets sho w the standard deviations ov er the 100 1 W e p erformed other conv ergence tests for the ten-dimensional case. These results are omitted for lack of space but are a v ailable on request. 11 p β 0 β 1 β 2 β 3 β 4 true values 0.3 1 -1 -3 1 3 n = 100 – 1.113 -1.128 -3.191 1.056 3.396 – (0.795) (0.666) (0.937) (0.696) (0.904) n = 250 – 1.041 -1.061 -3.173 1.027 3.273 – (0.411) (0.412) (0.548) (0.451) (0.555) true values 0.7 1 -1 -3 1 3 n = 100 – 1.179 -1.094 -3.282 1.148 3.309 – (0.448) (0.545) (0.640) (0.558) (0.659) n = 250 – 1.029 -1.110 -3.187 1.085 3.206 – (0.304) (0.295) (0.503) (0.312) (0.479) true values 1.5 1 -1 -3 1 3 n = 100 – 1.169 -1.124 -3.605 1.158 3.582 – (0.426) (0.430) (0.867) (0.470) (0.782) n = 250 – 1.096 -1.068 -3.277 1.111 3.326 – (0.258) (0.238) (0.541) (0.298) (0.518) true values 3 1 -1 -3 1 3 n = 100 – 1.183 -1.214 -3.681 1.239 3.729 – (0.417) (0.391) (0.853) (0.478) (0.984) n = 250 – 1.130 -1.154 -3.431 1.119 3.408 – (0.253) (0.284) (0.653) (0.239) (0.662) T able 1: PG sampler. Case with p known: p osterior means ov er the 100 diﬀeren t simulated datasets for n equal to 100 and 250 with true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (second panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). The v alues in brac kets are the standard deviations ov er the 100 diﬀerent sim ulations. 12 p β 0 β 1 β 2 β 3 β 4 true values 0.3 1 -1 -3 1 3 n = 100 – 1.185 -1.136 -3.356 1.159 3.403 – (0.773) (0.708) (0.900) (0.785) (0.844 ) n = 250 – 1.052 -1.067 -3.185 1.045 3.292 – (0.506) (0.435) (0.600) (0.488) (0.630) true values 0.7 1 -1 -3 1 3 n = 100 – 1.146 -1.177 -3.751 1.331 3.65 – (0.859) (0.711) (0.750) (0.778) (0.819) n = 250 – 1.315 -1.214 -3.289 1.251 3.304 – (0.511) (0.412) (0.614) (0.407) (0.687) true values 1.5 1 -1 -3 1 3 n = 100 – 1.326 -1.086 -3.747 1.187 3.714 – (1.214) (0.618) (1.382) (0.600) (0.997) n = 250 – 1.064 -1.019 -3.068 1.010 3.643 – (0.513) (0.376) (0.598) (0.399) (0.736) true values 3 1 -1 -3 1 3 n = 100 – 1.473 -1.373 -3.761 1.373 3.982 – (1.044) (1.049) (1.556) (0.913) (1.739) n = 250 – 1.180 -1.256 -3.601 1.186 3.534 – (0.513) (0.591) (1.210) (0.539) (1.111) T able 2: B C el sampler. Case with p kno wn: posterior means ov er the 100 diﬀerent sim ulated datasets for n equal to 100 and 250 with true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (second panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). The v alues in brac kets are the standard deviations ov er the 100 diﬀerent sim ulations. 13 p β 0 β 1 β 2 β 3 β 4 β 5 β 6 β 7 β 8 β 9 true values 0.3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 2.597 1.049 -2.401 1.737 -2.945 0.063 -1.617 3.366 -0.785 -1.369 – (0.760) (0.894) (0.993) (0.766) (0.836) (1.001) (0.806) (0.960) (0.762) (0.800) n = 250 – 2.479 1.098 -2.243 1.733 -3.035 0.312 -1.563 3.374 -0.709 -1.259 – (0.569) (0.445) (0.481) (0.539) (0.650) (0.507) (0.527) (0.627) (0.542) (0.486) true values 0.7 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 2.925 1.361 -2.386 1.914 -3.466 0.287 -1.736 3.788 -0.752 -1.539 – (0.719) (0.658) (0.754) (0.619) (0.861) (0.570) (0.697) (0.832) (0.606) (0.680) n = 250 – 2.573 1.098 -2.230 1.686 -3.072 0.179 -1.574 3.369 -0.667 -1.348 – (0.475) (0.370) (0.398) (0.459) (0.528) (0.310) (0.385) (0.505) (0.313) (0.384) true values 1.5 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 3.126 1.401 -2.903 2.049 -3.784 0.311 -1.977 4.220 -0.874 -1.679 – (0.678) (0.583) (0.691) (0.657) (0.767) (0.528) (0.671) (0.918) (0.554) (0.552) n = 250 – 2.816 1.180 -2.440 1.810 -3.265 0.248 -1.735 3.623 -0.680 -1.428 – (0.549) (0.321) (0.517) (0.390) (0.635) (0.264) (0.376) (0.620) (0.276) (0.378) true values 3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 3.442 1.600 -3.048 2.171 -4.101 0.227 -2.225 4.558 -1.021 -1.779 – (0.672) (0.607) (0.739) (0.577) (0.764) (0.554) (0.673) (0.870) (0.517) (0.517) n = 250 – 3.005 1.236 -2.594 1.954 -3.468 0.291 -1.833 3.864 -0.795 -1.536 – (0.609) (0.297) (0.546) (0.455) (0.750) (0.263) (0.483) (0.734) (0.309) (0.359) T able 3: PG sampler. Case with p known: p osterior means ov er the 100 diﬀeren t sim ulated datasets for n equal to 100 and 250 with true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (second panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 , β 5 , β 6 , β 7 , β 8 , β 9 ) = (2 . 3 , 1 , − 2 , 1 . 5 , − 2 . 7 , 0 . 2 , − 1 . 4 , 3 , − 0 . 6 , − 1 . 2). The v alues in brac kets are the standard deviations ov er the 100 diﬀerent sim ulations. 14 p β 0 β 1 β 2 β 3 β 4 β 5 β 6 β 7 β 8 β 9 true values 0.3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 2.868 1.058 -2.617 1.379 -3.035 -0.178 -1.687 3.656 -0.920 -0.980 – (1.046) (1.282) (1.527) (1.381) (1.500) (1.488) (1.493) (1.619) (1.261) (1.463) n = 250 – 2.875 1.221 -2.495 1.843 -3.235 0.068 -1.880 3.693 -0.772 -1.116 – (1.239) (1.206) (1.137) (1.333) (1.579) (1.199) (1.343) (1.231) (1.145) (1.313) true values 0.7 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 3.145 1.089 -2.975 1.874 -3.429 0.168 -1.879 3.598 -0.818 -1.497 – (1.819) (1.674) (1.816) (1.898) (1.901) (1.753) (1.852) (2.011) (1.844) (1.910) n = 250 – 2.855 1.150 -2.331 1.716 -3.227 0.149 -1.674 3.445 -0.750 -1.421 – (1.508) (1.501) (1.389) (1.585) (1.783) (1.501) (1.441) (1.612) (1.431) (1.451) true values 1.5 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 4.270 0.988 -1.849 1.139 -1.806 -0.181 -0.929 2.260 -0.662 -0.826 – (4.033) (2.160) (2.320) (2.242) (2.781) (2.282) (2.552) (2.663) (2.032) (1.611) n = 250 – 3.111 1.244 -2.424 1.595 -3.861 0.409 -1.883 4.051 -0.184 -1.593 – (1.083) (1.159) (1.272) (1.441) (1.571) (1.407) (1.422) (1.422) (1.264) (1.186) true values 3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 – 2.823 0.301 -0.607 0.790 -0.966 0.211 -0.270 0.907 0.015 -0.125 – (3.488) (2.364) (2.492) (2.319) (2.964) (2.425) (2.510) (2.449) (2.365) (2.791) n = 250 – 2.933 1.079 -1.769 1.670 -2.906 0.271 -1.350 3.035 -0.830 -1.172 – (1.576) (1.431) (1.929) (1.660) (1.684) (1.687) (1.740) (1.968) (1.352) (1.888) T able 4: B C el sampler.Case with p known: p osterior means ov er the 100 diﬀerent sim ulated datasets for n equal to 100 and 250 with true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (second panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 , β 5 , β 6 , β 7 , β 8 , β 9 ) = (2 . 3 , 1 , − 2 , 1 . 5 , − 2 . 7 , 0 . 2 , − 1 . 4 , 3 , − 0 . 6 , − 1 . 2). The v alues in brac kets are the standard deviations ov er the 100 diﬀerent sim ulations. 15 Figure 2: Case with p known: sample chains of the parameters p osterior distributions for the simulated data with sample size n = 250 and with true parameter v alues p = 0 . 3 and β = ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). sim ulations. W e notice that the p osterior means of p are close to their true v alues when p is less than 1, corresp onding to the “heavy tail” situation, but the estimation accuracy drops when p is greater than 1. This similar performance is also conﬁrmed for the p osterior means of the vector of unknown co eﬃcients β = ( β 0 , β 1 , β 2 , β 3 , β 4 ). Generally , increasing the sample size from 100 to 250 leads to b etter estimations of the mo del parameters. T able 8 shows the p osterior means and, in brack ets, the standard deviations ov er the 100 diﬀerent simulated datasets, for the ten-dimensional scenario. These results are in line with the ﬁv e-dimensional case, th us leading us to conclude that increasing the n umber of co v ariates is not an issue in terms of estimation accuracy . Moreo ver, most standard deviations in T ables 6 and 8 decrease for the diﬀerent scenarios and p v alues when w e mo ve from n = 100 to n = 250. W e further compare the p erformance of our P´ oly a-Gamma sampler to the one of B C el . T able 7 and T able 9 shows the p osterior means of p and β , and the standard deviations o ver 100 diﬀeren t 16 V ariables T est Statistic V ariables T est Statistic n = 100 β 0 = 1 -0.8422 n = 250 β 0 = 1 -0.0932 β 1 = − 1 -0.6155 β 1 = − 1 0.9662 β 2 = − 3 0.2379 β 2 = − 3 0.4544 β 3 = 1 -0.2488 β 3 = 1 0.4355 β 4 = 3 0.4159 β 4 = 3 -0.3423 T able 5: Case with p known: Gewek e’s test statistics for the p osterior c hain of the unkno wn co eﬃcien ts, β = ( β 0 , β 1 , β 2 , β 3 , β 4 ) for a simulated dataset of sample size n = 100 (left panel) and n = 250 (right panel) with p = 0 . 3. sim ulated datasets. As in the scenario with p kno wn, we c ho ose a m uch less v aguer normal prior distribution N ( ν = 0 , B = 9 · I k ) for β . Results are based on 20,000 p osterior samples obtained with B C el . It can b e easily seen that when p < 1, the P´ oly a-Gamma sampler outp erforms the B C el substan tially , in terms of b oth the p osterior means and standard deviations across diﬀeren t datasets. The adv antage of P´ oly a-Gamma sampler against B C el reduces when p > 1, ho w ever, the standard deviations still suggest that P´ oly a-Gamma sampler is more consisten t. As done previously with p kno wn, here, with p unknown, we display in Figure 3 the p os- terior chains for a sp eciﬁc dataset with sample size n = 250, where p = 0 . 3 and β = ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). Despite exhibiting sligh tly slo wer mixing b eha viours com- pared to Figure 2, the c hains, after a burn-in of 1 , 000 iterations, con v erge relatively quickly to the true v alues of all parameters. W e also computed Gewek e’s con vergence tests for the v alues of p and ﬁv e-dimensional β , with n = 100 and n = 250 and p = 0 . 3 2 . T able 10 rep orts the results of Gewek e’s conv ergence test (Gewek e, 1992), that indicate no issues since the absolute v alues of the test statistics are all low er than 1 . 96. 17 p β 0 β 1 β 2 β 3 β 4 true values 0.3 1 -1 -3 1 3 n = 100 0.271 1.400 -1.311 -3.882 1.330 3.946 (0.096) (0.815) (0.667) (0.629) (0.828) (0.689) n = 250 0.230 1.439 -1.480 -4.341 1.442 4.489 (0.056) (0.553) (0.583) (0.565) (0.599) (0.635) true values 0.7 1 -1 -3 1 3 n = 100 0.798 0.995 -0.976 -2.991 1.028 2.884 (0.174) (0.320) (0.394) (0.383) (0.393) (0.384) n = 250 0.821 0.927 -0.959 -3.026 0.966 3.117 (0.214) (0.271) (0.283) (0.345) (0.287) (0.345) true values 1.5 1 -1 -3 1 3 n = 100 1.129 1.388 -1.390 -4.335 1.427 4.351 (0.351) (0.382) (0.405) (0.526) (0.528) (0.490) n = 250 1.164 1.465 -1.436 -4.381 1.490 4.444 (0.265) (0.318) (0.307) (0.579) (0.382) (0.557) true values 3 1 -1 -3 1 3 n = 100 1.593 1.506 -1.576 -4.697 1.607 4.736 (0.387) (0.415) (0.424) (0.563) (0.528) (0.608) n = 250 1.691 1.570 -1.607 -4.778 1.560 4.740 (0.412) (0.267) (0.324) (0.587) (0.289) (0.586) T able 6: PG sampler. Case with p unkno wn: p osterior means ov er the 100 diﬀeren t sim ulated datasets for n equal to 100 and 250 compared with the true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (second panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). The v alues in brac kets are the standard deviations ov er the 100 diﬀerent sim ulations. 18 p β 0 β 1 β 2 β 3 β 4 true values 0.3 1 -1 -3 1 3 n = 100 0.559 0.258 -0.264 -0.868 0.345 0.930 (0.520) (0.440) (0.394) (0.577) (0.480) (0.631) n = 250 0.149 0.166 -0.165 -0.427 0.139 0.398 (0.269) (0.507) (0.445) (0.603) (0.484) (0.548) true values 0.7 1 -1 -3 1 3 n = 100 0.807 0.435 -0.576 -1.991 0.498 1.579 (0.698) (0.507) (0.438) (0.570) (0.491) (0.581) n = 250 0.754 0.483 -0.591 -2.076 0.514 1.618 (0.214) (0.471) (0.414) (0.610) (0.437) (0.519) true values 1.5 1 -1 -3 1 3 n = 100 2.464 0.563 -0.617 -1.626 0.595 1.716 (1.378) (0.558) (0.494) (0.489) (0.441) (0.556) n = 250 1.877 0.271 -0.483 -1.619 0.377 1.518 (1.687) (1.013) (0.629) (0.774) (0.583) (0.787) true values 3 1 -1 -3 1 3 n = 100 3.056 0.632 -0.391 -1.385 0.508 1.433 (1.204) (0.630) (0.601) (0.857) (0.551) (0.945) n = 250 2.641 0.478 -0.551 -1.570 0.598 1.634 (1.759) (0.750) (0.705) (0.728) (0.594) (0.723) T able 7: B C el sampler. Case with p unknown: p osterior means ov er the 100 diﬀerent sim ulated datasets for n equal to 100 and 250 compared with the true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (second panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). The v alues in brac kets are the standard deviations ov er the 100 diﬀerent sim ulations. 19 Figure 3: Case with p unknown: sample c hains of the p osterior distribution of the parameters for the simulated data with sample size n = 250 and with true parameter v alues p = 0 . 3 and β = ( β 0 , β 1 , β 2 , β 3 , β 4 ) = (1 , − 1 , − 3 , 1 , 3). 5 Real Data Applications In this Section, w e presen t t w o diﬀerent real data applications to illustrate the accurate and eﬃcien t p erformance of the P´ olya-Gamma sampler for estimating the generalized logistic re- gression. The applications analyse nov el datasets of diﬀerent dimensions, whic h are related to curren t and topical problems in the so cial sciences. In the ﬁrst application, w e consider data selected from the Europ ean So cial Surv ey (ESS), that were previously analysed in Dalla V alle et al. (2019) to identify the determinants of public opinions to wards immigration. The sec- ond application analyses the motives b ehind the recidivism of oﬀenders released from prison 2 Results for the ten-dimensional case are omitted for lac k of space but are av ailable on request. 20 p β 0 β 1 β 2 β 3 β 4 β 5 β 6 β 7 β 8 β 9 true values 0.3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 0.197 3.440 1.395 -3.179 2.271 -3.667 0.097 -2.093 4.356 -1.044 -1.820 (0.063) (0.735) (1.221) (1.128) (0.903) (0.868) (1.126) (1.025) (1.031) (0.948) (0.959) n = 250 0.186 3.685 1.656 -3.341 2.563 -4.497 0.447 -2.309 4.982 -1.070 -1.875 (0.040) (0.696) (0.648) (0.657) (0.706) (0.678) (0.757) (0.667) (0.722) (0.839) (0.646) true values 0.7 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 0.658 2.517 1.132 -2.213 1.739 -3.016 0.202 -1.608 3.264 -0.713 -1.112 (0.197) (0.644) (0.673) (0.550) (0.543) (0.465) (0.575) (0.617) (0.654) (0.642) (0.650) n = 250 0.657 2.653 1.186 -2.286 1.883 -3.172 0.248 -1.593 3.410 -0.693 -1.408 (0.138) (0.306) (0.443) (0.448) (0.435) (0.396) (0.375) (0.289) (0.494) (0.285) (0.303) true values 1.5 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 0.956 3.678 1.657 -3.351 2.397 -4.363 0.316 -2.286 4.876 -1.014 -1.974 (0.329) (0.561) (0.611) (0.548) (0.665) (0.672) (0.650) (0.697) (0.680) (0.645) (0.578) n = 250 0.986 3.818 1.611 -3.306 2.460 -4.423 0.339 -2.362 4.913 -0.921 -1.940 (0.263) (0.430) (0.372) (0.407) (0.408) (0.486) (0.358) (0.385) (0.428) (0.361) (0.405) true values 3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 1.471 3.804 1.731 -3.314 2.389 -4.510 0.308 -2.497 5.029 -1.156 -2.014 (0.378) (0.533) (0.590) (0.595) (0.646) (0.603) (0.568) (0.595) (0.674) (0.517) (0.482) n = 250 1.488 4.008 1.656 -3.449 2.595 -4.600 0.387 -2.426 5.146 -1.042 -2.054 (0.430) (0.540) (0.358) (0.480) (0.441) (0.603) (0.338) (0.434) (0.655) (0.326) (0.373) T able 8: PG sampler. Case with p unkno wn: p osterior means o v er the 100 diﬀerent simulated datasets for n equal to 100 and 250 compared with the true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (sec- ond panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 , β 5 , β 6 , β 7 , β 8 , β 9 ) = (2 . 3 , 1 , − 2 , 1 . 5 , − 2 . 7 , 0 . 2 , − 1 . 4 , 3 , − 0 . 6 , − 1 . 2). The v alues in brack ets are the standard deviations ov er the 100 diﬀerent sim ulations. 21 p β 0 β 1 β 2 β 3 β 4 β 5 β 6 β 7 β 8 β 9 true values 0.3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 0.149 0.554 0.300 -0.697 0.321 -0.690 0.207 -0.412 0.844 -0.232 -0.348 (0.164) (0.590) (0.608) (0.831) (0.568) (0.613) (0.582) (0.657) (0.629) (0.671) (0.609) n = 250 0.056 0.409 0.128 -0.334 0.242 -0.478 0.040 -0.184 0.493 -0.208 -0.178 (0.085) (0.535) (0.402) (0.470) (0.497) (0.560) (0.502) (0.583) (0.566) (0.663) (0.485) true values 0.7 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 0.885 0.847 0.385 -0.573 0.418 -0.650 0.120 -0.401 0.710 -0.258 -0.307 (0.286) (0.604) (0.649) (0.541) (0.589) (0.494) (0.580) (0.682) (0.697) (0.657) (0.662) n = 250 0.641 0.896 0.401 -0.580 0.515 -0.671 0.158 -0.534 0.790 -0.277 -0.401 (0.288) (0.708) (0.749) (0.701) (0.794) (0.755) (0.801) (0.583) (0.651) (0.497) (0.724) true values 1.5 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 2.228 0.955 0.397 -0.454 0.434 -0.631 -0.025 -0.382 0.617 -0.257 -0.147 (2.090) (0.811) (0.773) (1.079) (0.983) (1.121) (0.980) (0.926) (1.363) (0.743) (0.986) n = 250 1.118 1.191 0.423 -0.738 0.595 -0.773 0.085 -0.525 1.284 -0.303 -0.368 (1.543) (0.758) (0.885) (0.948) (0.939) (1.030) (0.970) (0.976) (1.111) (0.958) (0.925) true values 3 2.3 1 -2 1.5 -2.7 0.2 -1.4 3 -0.6 -1.2 n = 100 3.337 0.567 0.055 -0.004 0.149 0.002 -0.028 -0.198 0.005 -0.018 -0.009 (1.909) (1.020) (1.182) (1.252) (1.053) (1.269) (0.983) (1.246) (1.276) (0.976) (1.210) n = 250 2.041 0.995 0.148 -0.376 0.258 -0.379 0.186 -0.249 0.617 -0.110 -0.216 (1.912) (0.827) (1.118) (1.227) (1.087) (1.294) (0.891) (1.070) (1.231) (0.932) (1.144) T able 9: B C el sampler. Case with p unknown: p osterior means o ver the 100 diﬀerent sim ulated datasets for n equal to 100 and 250 compared with the true v alues of p = 0 . 3 (ﬁrst panel), p = 0 . 7 (sec- ond panel), p = 1 . 5 (third panel) and p = 3 (fourth panel) and ( β 0 , β 1 , β 2 , β 3 , β 4 , β 5 , β 6 , β 7 , β 8 , β 9 ) = (2 . 3 , 1 , − 2 , 1 . 5 , − 2 . 7 , 0 . 2 , − 1 . 4 , 3 , − 0 . 6 , − 1 . 2). The v alues in brack ets are the standard deviations ov er the 100 diﬀerent sim ulations. 22 V ariables T est Statistic V ariables T est Statistic n = 100 p = 0 . 3 -0.507 n = 250 p = 0 . 3 0.914 β 0 = 1 1.721 β 0 = 1 -1.394 β 1 = − 1 -1.308 β 1 = − 1 1.320 β 2 = − 3 -1.293 β 2 = − 3 1.321 β 3 = 1 1.413 β 3 = 1 -1.545 β 4 = 3 0.539 β 4 = 3 -1.259 T able 10: Case with p unkno wn: Gewek e’s test statistics for the p osterior chain of the tail parameter p and of the unknown co eﬃcients, β = ( β 0 , β 1 , β 2 , β 3 , β 4 ) for a simulated dataset of sample size n = 100 (left panel) and n = 250 (righ t panel). b et w een 2016 and 2018 in the State of Iow a, USA. 5.1 Immigration in Europ ean Countries The ﬁrst dataset fo cuses on p eople’s attitudes tow ards immigration. The data w as selected from the ESS and was collected following hour-long face-to-face interviews. W e fo cused on questions regarding immigration from the ESS8 edition 1.0 published in Octob er 2017 and collected b etw een August 2016 and March 2017 (ESS8, 2016, 2018) for Great Britain (GB), German y (DE) and F rance (FR), resp ectiv ely . F or these three coun tries, the total n um b er of observ ations is 5 , 354, of whic h 1 , 419 are for GB; 2 , 284 for DE and 1 , 651 for FR. The dep enden t v ariable is immig , that indicates whether the resp onden t would allow im- migran ts from p o orer coun tries outside Europ e. More precisely , if immig = 1 the resp onden t is against immigration, if immig = 0 the resp ondent is in fa vor of immigration. F ollowing Dalla V alle et al. (2019), w e considered as cov ariates the v ariables: pplfair , trstep , trstun , happy , agea , edulvlb and hinctnta . The ﬁrst three v ariables include answ ers to question 23 ranging from 0 (most negative) to 10 (most p ositiv e): do you think that most p eople try to tak e adv an tage of you ( pplfair )? Do y ou trust the Europ ean parliamen t ( trstep )? Do y ou trust the United Nations ( trstun )? The happy v ariable tak es v alues from 0 (unhappy) to 10 (extremely happy). The last three v ariables are sub ject-sp eciﬁc and include the age, ranging from 15 to 100 ( agea ); the highest level of education, from primary education to doctoral degree ( edulvlb ) and the household’s total net income ( hinctnta ). F or this dataset, w e run 5 , 000 iterations of the P´ olya-Gamma sampler algorithm and w e discarded the ﬁrst 1 , 000 iterations as burn-in. W e used the MLE estimates of β and p as starting p oin ts of the algorithm and w e emplo y ed a normal prior with mean 0, standard deviation 5 times greater than the one estimated with a standard logistic regression. In order to assess the p erformance of the P´ olya-Gamma sampler, w e rep ort the results ob- tained using the empirical likelihoo d approac h on the same dataset, adopting a similar setting. T able 11 lists the results of the P´ olya-Gamma sampler (left-hand side) and the empirical like- liho o d approac h (right-hand side). F or each Europ ean coun try (GB, DE and FR) and for eac h parameter, T able 11 sho ws the p osterior means, the 2.5% and 97.5% quantiles of the p osterior distributions of the P´ oly a-Gamma and the empirical likelihoo d metho ds. The eﬀects of the parameters are similar across coun tries and suggest that the higher the level of education, the higher the probabilit y for p eople to be fav ourable tow ards immigration. Also, trustful people and individuals with a strong conﬁdence in the Europ ean parliament tend to b e in fa v our of immigration. The p osterior mean estimates obtained with the P´ oly a-Gamma metho d are in line with those obtained with the empirical lik eliho o d. F or some of the co v ariates, suc h as pplfair and trstep , the empirical lik eliho o d approac h assigns posterior supp ort to the v alue zero, while the corresp onding P´ olya-Gamma credible interv als (CIs) do not include zero. In addition, b oth metho ds agree on the estimation of the p parameter for all coun tries, suggesting hea vy tails for the generalised logistic regression. Therefore, the P´ olya-Gamma sampler outp erforms the 24 empirical lik eliho o d metho d for the estimation of the generalised logistic regression, since it pro vides more precise parameter estimates as for the β . 5.2 Recidivism of Oﬀenders Released from Prison in Io w a The second dataset contains information about the recidivism of oﬀenders released from prison b et w een 2016 and 2018 in Io wa 3 . Recidivism happ ens when a prisoner is released from jail and he or she relapses bac k in to criminal b ehavior and returns to jail. W e applied the generalised logistic regression to determine whether a prisoner is lik ely to recidiv ate based on his or her c haracteristics. After removing missing v alues, the data consists of 13 , 644 records corresponding to oﬀenders detained in prison and released from 2016 to 2018. The dep enden t v ariable is recidivism , whic h is equal to 1 if the prisoner is arrested again within a 3-year tracking p eriod after b eing released, and it is equal to 0 if he or she has not returned to prison within the trac king p erio d. W e mo del the probability of b eing recidivist using a subsample of all the explanatory v ari- ables presen t in the original database: the sex of the oﬀender ( sex , which is equal to 1 for men and 0 for w omen); the age when released from prison ( age ); whether an oﬀender commit- ted a felon y or misdemeanor ( felony , whic h is equal to 1 for felony and 0 for misdemeanor); whether an oﬀender w as c harged with a drug crime ( drug ); whether an oﬀender w as c harged with a public order crime ( puborder ); whether an oﬀender was c harged with a violent crime ( violent ); and whether an oﬀender was released on disc harge or parole ( discharge , which is equal to 1 for discharge and 0 for parole). In this second example, w e run the MCMC algorithm using 5 , 000 iterations and discarding 3 See the following website for details: https://data.io wa.go v/Correctional-System/3-Y ear-Recidivism-for- Oﬀenders-Released-from-Pris/m w8r-vqy4. 25 P´ olya-Gamma Sampler Empirical Lik eliho od Country P arameters p ost. mean 2.50% 97.50% p ost. mean 2.50% 97.50% GB const -0.8 -2.441 0.541 -0.4202 -2.4377 1.4487 pplfair -0.223 -0.552 -0.068 -0.1822 -0.3176 0.0113 trstep -0.362 -0.888 -0.123 -0.1731 -0.3796 0.0447 trstun -0.075 -0.302 0.046 -0.0372 -0.2354 0.1409 happy 0.047 -0.099 0.241 -0.0133 -0.2892 0.1209 agea 0.015 -0.001 0.04 0.0094 -0.0139 0.0358 edulvlb -0.004 -0.009 -0.001 -0.0021 -0.0039 -0.0006 hinctnta 0.017 -0.105 0.148 -0.044 -0.1856 0.1228 p 0.815 0.2 1.944 0.8218 0.4284 1.5696 DE const -1.152 -2.648 0.29 -1.1452 -2.4617 0.6941 pplfair -0.301 -0.616 -0.114 -0.1612 -0.3291 -0.0093 trstep -0.429 -1 -0.148 -0.1524 -0.2839 -0.0107 trstun -0.091 -0.33 0.085 -0.0112 -0.2115 0.1847 happy -0.169 -0.46 0.003 -0.0873 -0.2438 0.0659 agea 0.036 0.013 0.077 0.0124 -0.0061 0.0375 edulvlb 0 -0.004 0.002 -0.0007 -0.0026 0.0016 hinctnta -0.2 -0.533 -0.042 -0.1171 -0.2352 0.0414 p 0.608 0.155 1.814 0.7338 0.2955 1.7036 FR const 1.054 -0.35 3.361 0.8442 -0.3443 1.4314 pplfair -0.111 -0.347 0.01 -0.0921 -0.1898 0.0193 trstep -0.364 -0.872 -0.107 -0.1912 -0.3592 -0.0114 trstun -0.122 -0.353 0.004 -0.0514 -0.2047 0.1417 happy 0.029 -0.129 0.227 0.0191 -0.0609 0.1545 agea 0.001 -0.013 0.021 -0.0043 -0.0155 0.0091 edulvlb -0.005 -0.012 -0.002 -0.0038 -0.0056 -0.0027 hinctnta -0.092 -0.275 0.016 -0.1061 -0.239 0.0286 p 0.827 0.159 2.055 0.7845 0.3991 1.2809 T able 11: EU immigration data results for Great Britain (GB), Germany (DE) and F rance (FR), obtained with the P´ oly a-Gamma (left-hand side) and the empirical likelihoo d (righ t- hand side) metho ds. F or eac h parameter, columns 3 and 6 sho w the p osterior means, columns 4 and 7 show the 2.5% quantiles of the p osterior distribution, columns 5 and 8 sho w the 97.5% quan tiles of the p osterior distribution. 26 the ﬁrst 1 , 000 iterations. As in the immigration example, we adopted v ague priors and w e c hose the initial v alues of β and p based on the corresp onding MLE estimators. P´ olya-Gamma Sampler Empirical Likelihoo d Parameters post. mean 2.50% 97.50% p ost. mean 2.50% 97.50% const -0.1443 -0.3811 0.0232 -0.4278 -0.9341 0.1719 sex 0.311 0.1468 0.5873 0.1371 -0.2583 0.4574 age -0.0004 -0.0007 -0.0002 -0.0003 -0.0005 -0.0002 felony -0.0941 -0.2521 0.0184 -0.0974 -0.4457 0.0999 drug -0.0308 -0.1496 0.0774 -0.0996 -0.4009 0.1912 puborder -0.2769 -0.5663 -0.116 -0.0694 -0.3384 0.2261 violent -0.5759 -1.0713 -0.3271 -0.4743 -0.9779 -0.2638 discharge -0.5736 -1.068 -0.3225 -0.4654 -0.6726 -0.1936 p 0.9343 0.3088 1.8439 0.5679 0.3409 1.0905 T able 12: Criminal recidivism data results obtained with the P´ oly a-Gamma (left-hand side) and the empirical likelihoo d (right-hand side) metho ds. F or eac h parameter, columns 2 and 5 show the p osterior means, columns 3 and 6 show the 2.5% quan tiles of the p osterior distribution, columns 4 and 7 show the 97.5% quantiles of the p osterior distribution. As w e did with the immigration data, we compared the p erformances of the P´ oly a-Gamma sampler and the empirical lik eliho o d approac hes, using the same settings for b oth algorithms. The results for the parameters β and p are listed in T able 12, that displa ys the P´ olya-Gamma results on the left-hand side and the empirical likelihoo d results on the right-hand side. T able 12 illustrates the p osterior means, the 2.5% and 97.5% quantiles of the p osterior distributions of the P´ olya-Gamma and the empirical lik eliho o d metho ds. The results suggest that men are more lik ely to recidiv ate than w omen, y ounger prisoners are more lik ely to b ecome rep eat oﬀenders than older prisoners, there is a low er probabilit y that the criminal is a recidivist if he or she w as charged with a public order or violen t crime and oﬀenders released on parole are more lik ely to return to jail than those released on discharge. 27 The sign of the posterior means for both approac hes is the same, suggesting a similar direction for the eﬀects obtained using the P´ olya-Gamma and the empirical lik eliho o d metho d. Ho w ever, the CIs of the ﬁrst approach are narrow er than those of second approach. Moreo ver, the P´ olya-Gamma metho d, unlike the empirical lik eliho o d, do es not giv e posterior supp ort to zero to the v ariables sex and puborder , suggesting that the inclusion of these v ariables in the mo del is appropriate. In agreement with our ﬁndings with the immigration data, the results obtained with the recidivism dataset demonstrate that the P´ oly a-Gamma sampler metho d yields a higher esti- mates precision and accuracy compared to the empirical likelihoo d metho d for the co v ariate parameters β . 6 Conclusions This pap er in tro duces a nov el D A sc heme for the generalized logistic regression mo del. This mo del is particularly ﬂexible since it is able to accommo date b oth ligh t and heavy tails in dic hotomous resp onse data. The prop osed D A scheme makes use of the P´ olya-Gamma identit y and it is strongly related to the slice sampler algorithm. The P´ oly a-Gamma sampler allows us to implemen t a Bay esian metho d able to dra w samples from the exact p osterior distribution. On the contrary , other methods, suc h as the empirical lik eliho o d approach, are based on appro xi- mations of the posterior and ma y lead to lo w precision in the estimation of the parameters. Our sim ulation study demonstrates the estimation accuracy of the newly prop osed P´ olya-Gamma sampler with datasets of diﬀeren t dimensions. W e compared the p erformances of the P´ olya- Gamma and the empirical lik eliho o d methods for mo delling new interesting datasets regarding the opinion on immigration in Europ ean countries and the probability of b eing recidivist for prisoners in Io wa, USA. Our results demonstrate the sup eriorit y of the P´ olya-Gamma sampler 28 o v er the empirical lik eliho o d in terms of parameter precision. The P´ olya-Gamma metho d al- lo ws a more accurate estimation of the tail parameter for the generalised logistic regression compared to approximate metho ds. References Alb ert, J. H. and Chib, S. (1993). Ba yesian analysis of binary and polychotomous resp onse data. Journal of the A meric an Statistic al Asso ciation , 88(422):669–679. Bro oks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011). Handb ook of mark ov chain mon te carlo. Choi, H. M. and Hob ert, J. P . (2013). The p oly a-gamma gibbs sampler for bay esian logistic regression is uniformly ergo dic. Ele ctr on. J. Statist. , 7:2054–2064. Dalla V alle, L., Leisen, F., Rossini, L., and Zh u, W. (2019). Ba y esian analysis of immigration in europ e with generalized logistic regression. Journal of Applie d Statistics , 0(0):1–15. ESS8 (2016). ESS Round 8: Europ ean Social Surv ey Round 8 Data. Data ﬁle e dition 2.0. NSD - Norwe gian Centr e for R ese ar ch Data, Norway ??? Data A r chive and distributor of ESS data for ESS ERIC . ESS8 (2018). ESS Round 8: European So cial Survey . ESS-8 2016 Do cumentation R ep ort. Edition 2.0. Ber gen, Eur op e an So cial Survey Data Ar chive, NSD - Norwe gian Centr e for R ese ar ch Data for ESS ERIC . Gew ek e, J. (1992). Ev aluating the accuracy of sampling-based approac hes to calculating p os- terior momen ts. In Bernardo, J. M., Berger, J., Da wid, A. P ., and Smith, J. F. M., editors, Bayesian Statistics 4 , pages 169–193. Oxford Universit y Press, Oxford. 29 Hensley, A. A. and Djuri ´ c, P . M. (2017). Nonparametric learning for hidden marko v mo dels with preferen tial attac hment dynamics. In 2017 IEEE International Confer enc e on A c oustics, Sp e e ch and Signal Pr o c essing (ICASSP) , pages 3854–3858. Hob ert, J. P . and Marc hev, D. (2008). A theoretical comparison of the data augmentation, marginal augmentation and px-da algorithms. A nn. Statist. , 36(2):532–554. Holmes, C. C., Held, L., et al. (2006). Ba y esian auxiliary v ariable mo dels for binary and m ultinomial regression. Bayesian analysis , 1(1):145–168. Karabatsos, G. and Leisen, F. (2018). An appro ximate likelihoo d p ersp ectiv e on ab c metho ds. Statist. Surv. , 12:66–104. Leisen, F., Rossini, L., and Villa, C. (2017). A note on the p osterior inference for the yule–simon distribution. Journal of Statistic al Computation and Simulation , 87(6):1179–1188. Liu, J. and W u, Y. (1999). P arameter expansion for data augmen tation. Journal of the Amer- ic an Statistic al Asso ciation , 94. Meng, X.-L. and v an Dyk, D. (1999). Seeking eﬃcien t data augmentation sc hemes via condi- tional and marginal augmen tation. Biometrika , 86(2):301–320. Mengersen, K. L., Pudlo, P ., and Rob ert, C. P . (2013). Bay esian computation via empirical lik eliho o d. Pr o c e e dings of the National A c ademy of Scienc es , 110(4):1321–1326. Neal, R. M. (2003). Slice sampling. A nn. Statist. , 31(3):705–767. Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). CODA: Con v ergence Diagnosis and Output Analysis for MCMC. R News , 6(1):7–11. 30 P olson, N. G., Scott, J. G., and Windle, J. (2013). Ba y esian inference for logistic mo d- els using p´ oly a–gamma latent v ariables. Journal of the Americ an Statistic al Asso ciation , 108(504):1339–1349. T anner, T. A. and W ong, W. H. (1987). The calculation of p osterior distributions b y data augmen tation. Journal of the A meric an Statistic al Asso ciation , 82(398):528–540. v an Dyk, D. A. and Meng, X.-L. (2001). The art of data augmentation. Journal of Computa- tional and Gr aphic al Statistics , 10(1):1–50. Windle, J., P olson, N. G., and Scott, J. G. (2016). Ba yesLogit: Bay esian Logistic Regression. T echnical rep ort, R Pac k age V ersion 0.6. Zh u, W. and Leisen, F. (2016). A b o otstrap likelihoo d approac h to ba y esian computation. A ustr alian and New Ze aland Journal of Statistics , 58(1):227–244. 31

A Polya-Gamma Sampler for a Generalized Logistic Regression

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment