Simultaneous adjustment of bias and coverage probabilities for confidence intervals

P. Menéndez¹, Y. Fan², P. H. Garthwaite³ and S. A. Sisson²

September 9, 2021

ABSTRACT

A new method is proposed for the correction of confidence intervals when the original interval does not have the correct nominal coverage probabilities in the frequentist sense. The proposed method is general and does not require any distributional assumptions. It can be applied to both frequentist and Bayesian inference where interval estimates are desired. We provide theoretical results for the consistency of the proposed estimator, and give two complex examples, on confidence interval correction for composite likelihood estimators and in approximate Bayesian computation (ABC), to demonstrate the wide applicability of the new method. Comparison is made with the double-bootstrap and other methods of improving confidence interval coverage.

Keywords: Confidence interval correction; Coverage probability; Composite likelihood; Approximate Bayesian computation.

1 Introduction

Interval estimates are typically intended to have a specified level of coverage. This is true, for example, of both frequentist confidence intervals and Bayesian credible intervals. However, for many problems the coverage of a confidence or credible interval will only equal its nominal value asymptotically, and coverage can be poor even for quite large samples in some situations. In many complex problems, there can be inherent bias which can be difficult to quantify or calculate. This can arise, for example, in composite likelihood problems (Varin et al. 2011) and approximate Bayesian computation (Sisson and Fan 2011). In this paper we propose a novel procedure for adjusting interval estimates that has wide application and will typically reduce the bias in their coverage.
The procedure assumes that the mechanism that generated the sample data could be simulated if population parameters were known. These parameters are estimated by sample statistics derived from real data, and then pseudo-samples are drawn from the estimated population distribution. From each pseudo-sample a confidence/credible interval is determined for the quantity of interest. The frequentist bias in these intervals is calculated and then used to adjust the interval estimate given by the real data.

1 School of Mathematics and Physics, University of Queensland, St Lucia, 4072, Australia
2 School of Mathematics and Statistics, University of New South Wales, Sydney, 2052, Australia
3 Department of Mathematics and Statistics, Open University, Milton Keynes, MK7 6AA, U.K.

The method has similarities to the double-bootstrap (Davison and Hinkley 1997), in which a bias correction is applied to a bootstrap interval by re-sampling from a bootstrap distribution. A difference in our method is that it involves only one level of sample generation, which makes it computationally less demanding, although the saving may dissipate if computationally demanding methods (such as Markov chain Monte Carlo) are used to obtain the interval estimates from sample data.

Several authors have looked at the problem of computing confidence intervals with the correct coverage properties. Much of this work is based on variations of bootstrap procedures. In particular, Hall (1986) considered the bootstrap in terms of Edgeworth expansions, and Beran (1987) provided a method for approximate confidence sets, using the bootstrap (or asymptotic theory) to estimate the relevant quantiles. This so-called pre-pivoting, based on an estimated bootstrap cumulative distribution function, is iterated to produce improved coverage.
Efron (1987) introduced a method to correct the coverage of a bootstrap confidence interval within the bootstrap itself, and Martin (1990) proposed an iterated bootstrap procedure to obtain better bootstrap coverage. The so-called double bootstrap, where one or more additional levels of bootstrap are conducted to adjust confidence limits, was discussed by DiCiccio et al. (1992). In a non-bootstrap procedure, Garthwaite and Buckland (1992) proposed a method to compute confidence intervals based on Monte Carlo simulations of the Robbins-Monro search process. In the context of autoregressive models, Hansen (1999) presented a method based on bootstrap replications over a grid to compute confidence intervals in situations where standard bootstrap methods fail. There are also several approaches for computing confidence intervals in the presence of nuisance parameters (Kabaila 1993; Kabaila and Lloyd 2000; Lloyd 2011).

In some of our examples, we apply our procedure to reduce bias in the coverage of Bayesian credible intervals. This may not seem intuitive, since coverage is a frequentist property while a Bayesian interval may reflect personal probabilities. However, there are many situations where posterior distributions should preferably be well calibrated. These include inference with objective or probability matching prior distributions, the verification of Bayesian simulation software (Cook et al. 2006), and techniques and diagnostics in likelihood-free Bayesian inference (Fearnhead and Prangle 2012; Prangle et al. 2012).

In Section 2 we describe the proposed method and give theoretical results related to it, illustrating them through simulated examples and comparing the resulting intervals with bootstrap intervals. In Section 3 we apply our method to two more complex, real analyses.
One of these involves estimation with composite likelihoods, which is known to produce confidence intervals that are too narrow, and the other involves approximate Bayesian computation, which typically gives larger posterior credibility intervals than desired. Some concluding comments are given in Section 4.

2 Coverage correction for confidence intervals

Suppose we are interested in estimating an equal-tailed 100(1 − α)% confidence interval for some parameter θ ∈ Θ ⊆ R. Thus for observed data x, we seek an estimate L_c(x) such that P(θ ≤ L_c(x)) = α/2, where L_c(x) denotes the lower limit of the interval. Similarly for the upper limit, we seek an estimate U_c(x) such that P(θ ≥ U_c(x)) = α/2.

In the frequentist setting, the parameter θ is considered a fixed quantity and the expressions above are written in terms of pivotal functions of the data, x. In the Bayesian setting, the credible interval is computed from the quantiles of the posterior distribution of θ. In an abuse of notation, we will use the above notation in both frequentist and Bayesian cases.

Suppose that we have a method of obtaining estimates, L(x) and U(x), of the correct lower and upper interval bounds, L_c(x) and U_c(x). We do not assume that these estimates produce the correct coverage probability. However, we do assume that the population parameters, θ, can be well approximated from the data. Our goal is to provide a method that gives adjustments to L(x) and U(x) that improve the interval's coverage. We first give theoretical results for the proposed methodology, and then give details of its implementation.

2.1 Theoretical results

Assumption 1. We suppose that the observed data x come from the model given by f(x | θ), θ ∈ Θ. For any θ ∈ Θ, we assume that it is possible to simulate from f(· | θ).

Assumption 2. Given θ and data x ∼ f(x | θ), there exists a consistent estimator θ̃ of θ.
Assumption 1 requires that we are able to simulate replicate data from the model given the values of the parameters. Assumption 2 requires that we have a good estimator for θ, so that interval estimates obtained using θ̃ converge to those obtained using the population parameter θ as the amount of data grows large.

In the following, we only require the lengths of the intervals to be consistent. Consequently, Assumption 2 is not always necessary. For example, this occurs if θ represents a location parameter whose confidence interval has a length that is independent of θ (see the later example). In the frequentist setting, the maximum likelihood estimator of θ is consistent and unbiased in many finite sample situations. In the Bayesian setting, the posterior distribution is consistent under mild assumptions, and the posterior mean estimate of θ is asymptotically unbiased. However, in both cases, finite sample bias in θ̃ may render our method less accurate.

Theorem 2.1. For some θ and x ∼ f(x | θ), let L(x) be an estimator of the lower limit of a 100(1 − α)% level confidence interval, and suppose that P{θ < L(x)} ≠ α/2. Let G_W denote the distribution function of a random variable W. Consider the new estimator

    L_c(x) = L(x) + ξ_{α/2},    (1)

where ξ_{α/2} is the α/2-th quantile of the distribution function G_{θ−L(x)}, so that G_{θ−L(x)}(ξ_{α/2}) = α/2. Then the new estimator, L_c(x), will have the correct coverage probability P{θ < L_c(x)} = α/2.

Proof: See Appendix.

From the above theorem, it can be seen that for the estimator of the upper limit of a 100(1 − α)% confidence interval, U(x), we can write

    U_c(x) = U(x) + ξ_{1−α/2},    (2)

where ξ_{1−α/2} is the (1 − α/2)-th quantile of the distribution function G_{θ−U(x)}, so that G_{θ−U(x)}(ξ_{1−α/2}) = 1 − α/2. In this case, we then have that P{θ > U_c(x)} = α/2.
Theorem 2.2. For some θ and observed data x ∼ f(x | θ), suppose the lower limit of a 100(1 − α)% confidence interval, L(x), can be obtained, and that this estimate does not necessarily give the correct coverage probability. Suppose that θ̃ ∈ R is a consistent estimator of θ, evaluated using the data x. Let y_1, ..., y_n be n replicate datasets simulated independently from f(· | θ̃), and denote the corresponding lower confidence limits by L_1(y_1), ..., L_n(y_n), obtained in the same manner as L(x). Define

    Ĝ_{θ̃−L(y)}(ε) = (1/n) Σ_{i=1}^{n} I{θ̃ − L_i(y_i) < ε}

as the empirical distribution of θ̃ − L(y), based on the observed values of θ̃ − L_i(y_i), i = 1, ..., n. If we define

    L̃_c(x) = L(x) + ξ̂_{α/2},    (3)

where ξ̂_{α/2} = Ĝ⁻¹_{θ̃−L(y)}(α/2), then L̃_c(x) is a consistent estimator of L_c(x), as defined in Equation (1).

Proof: See Appendix.

In combination, Theorems 2.1 and 2.2 state that if we simulate data y_1, ..., y_n ∼ f(y | θ̃) and subsequently obtain the confidence limits L_1(y_1), ..., L_n(y_n) in the same way as for the original data x, then we can correct the bias in the original lower limit estimate, L(x), by adding the α/2-th sample quantile of θ̃ − L_1(y_1), ..., θ̃ − L_n(y_n).

Corollary 1. Under the assumptions of Theorem 2.2, a central limit theorem holds for L̃_c(x). Specifically, for all α ∈ (0, 1), θ ∈ R and x ∼ f(x | θ), we have that

    √n (L̃_c(x) − L_c(x)) G′_{θ−L(x)}(ξ_α) → N(0, α(1 − α))

as n → ∞, where G′_{θ−L(x)}(ξ) = (∂/∂ξ) G_{θ−L(x)}(ξ), and ξ_α is the α-th quantile of G_{θ−L(x)}.

Proof: The result follows immediately from Equation (7) of the proof of Theorem 2.2 (see Appendix).
The above theoretical results provide a simple way of estimating corrections to the lower and upper confidence limits that will produce the correct nominal coverage probability. In addition, these estimators are consistent and asymptotically normal.

2.2 Correction procedure

In summary, the correction algorithm has the following steps:

Step 1. Obtain L(x) and U(x), the lower and upper limits of the desired 100(1 − α)% confidence interval for the parameter θ, for an observed dataset x.

Step 2. Evaluate θ̃ and generate n independent datasets y_1, ..., y_n ∼ f(y | θ̃) from the model.

Step 3. For each dataset y_i, compute the 100(1 − α)% lower and upper confidence limits, L_i(y_i) and U_i(y_i), for the parameter θ̃, using the same method as in Step 1.

Step 4. Set the corrected lower and upper limits to

    L̃_c(x) = L(x) + Ĝ⁻¹_{θ̃−L(y)}(α/2)
    Ũ_c(x) = U(x) + Ĝ⁻¹_{θ̃−U(y)}(1 − α/2),

where Ĝ⁻¹_W(α) denotes the α-th sample quantile of the random variable W.

2.3 Simple examples

We illustrate the above procedure with two simple examples. In the first, we consider confidence interval correction for the mean parameter of a normal distribution with known variance. In the second example, it is assumed that the mean is known and that we are interested in the variance parameter.

Example 1: Normal distribution with known variance

Suppose that θ is the location parameter of a normal distribution with unit variance, so that x_i ∼ N(θ, 1), where x = (x_1, ..., x_m). In this case, the maximum likelihood estimator is θ̃ = x̄ = Σ_i x_i / m. For illustration, we suppose that the confidence interval we obtain for θ does not have the correct coverage, in that we obtain the equivalent confidence interval when data are generated from x_i ∼ N(θ, (1 + ε)²) with ε ≥ 0. The value of ε controls the amount of error in the coverage probability.
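As an illustration, the four-step procedure above can be written directly in code. The following is a minimal sketch under our own naming conventions (the function and argument names are not from the paper), assuming only that numpy is available; it applies Steps 2-4 to a miswidened interval of the kind described for the normal location example.

```python
import numpy as np

def corrected_interval(L_x, U_x, theta_tilde, simulate, interval, n=2000, alpha=0.05):
    """Steps 2-4 of the correction procedure of Section 2.2.

    L_x, U_x    -- uncorrected limits for the observed data (Step 1)
    theta_tilde -- consistent estimate of theta from the observed data
    simulate    -- simulate(theta) -> one pseudo-dataset y_i (Step 2)
    interval    -- interval(y) -> (L_i, U_i), same method as Step 1 (Step 3)
    """
    diffs_L, diffs_U = [], []
    for _ in range(n):
        y = simulate(theta_tilde)
        L_i, U_i = interval(y)
        diffs_L.append(theta_tilde - L_i)
        diffs_U.append(theta_tilde - U_i)
    # Step 4: shift by the empirical quantiles of theta_tilde - L(y), theta_tilde - U(y)
    return (L_x + np.quantile(diffs_L, alpha / 2),
            U_x + np.quantile(diffs_U, 1 - alpha / 2))
```

In the normal location example with inflation ε, `interval(y)` returns ȳ ∓ z_{1−α/2}(1 + ε)/√m, and the corrected limits concentrate around the correct bounds x̄ ∓ z_{1−α/2}/√m.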
Following the usual frequentist approach, the 100(1 − α)% confidence interval for θ is given by L(x) = x̄ − z_{α/2}(1 + ε)/√m and U(x) = x̄ + z_{1−α/2}(1 + ε)/√m, where z_α is the α-th quantile of the standard normal distribution. Clearly the correction to the interval when ε > 0 is L_c(x) = L(x) + z_{α/2} ε/√m and U_c(x) = U(x) − z_{1−α/2} ε/√m.

Figure 1 displays the results of the correction procedure for a 95% confidence interval based on 100 replicate analyses. Each analysis is based on samples of size m = 20 with θ = 0, so that x_1, ..., x_m ∼ N(0, 1), and n = 100 replicated samples y_1, ..., y_n with elements drawn from N(θ̃, 1). Figure 1 (top plots) illustrates the corrected confidence limits L̃_c(x) and Ũ_c(x) for a range of error term values, ε. Clearly the correction produces an unbiased adjustment, as the boxplots are centred on the true confidence bounds (the horizontal line) in each case. Further, the method produces qualitatively the same corrected interval limits irrespective of the value of ε. The bottom plots display the corrections L̃_c(x) and Ũ_c(x) with ε = 1 fixed, for a range of values of θ̃. For this example, choosing θ̃ to be any arbitrary value will result in the same quality of unbiased correction. This arises because the distributions of θ̃ − L(y) and θ̃ − U(y) do not change with θ̃, so that the confidence intervals all have the same width as θ̃ varies. As this analysis involves a location parameter only, this is one case where Assumption 2 is not required to produce a consistent adjustment (see Section 2.1).

Example 2: Normal distribution with known mean

Suppose now that θ is the scale (variance) parameter of a normal distribution with mean zero, so that x_i ∼ N(0, θ). Here we specify θ̃ = S² = (1/(m − 1)) Σ_{i=1}^{m} (x_i − x̄)² as the sample variance.
In this setting, suppose that the regular confidence limits for θ are biased downwards by a constant value ε > 0. Specifically, the 100(1 − α)% confidence interval for θ is given by

    L(x) = (m − 1)S² / χ²_{1−α/2; m−1} − ε  and  U(x) = (m − 1)S² / χ²_{α/2; m−1} − ε,

where χ²_{α; k} denotes the α-th percentile of a χ² distribution with k degrees of freedom.

Figure 2 shows the results of the correction procedure for the lower limit of a 95% confidence interval based on 100 replicate analyses. Each analysis uses samples of size m with θ = 1, so that x_1, ..., x_m ∼ N(0, 1), and n = 2000 replicated samples y_1, ..., y_n with elements drawn from N(0, θ̃). Figure 2 (left panel) illustrates the corrected lower confidence limit, L̃_c(x), based on a sample of size m = 20, for a range of fixed values of θ̃. The extreme left and right boxplots correspond to the raw biased (L(x)) and true unbiased (L_c(x)) limits respectively. Clearly, as θ̃ changes, so does the location of the adjusted limits. This occurs because, in contrast with the previous example, the distributions of θ̃ − L(y) and θ̃ − U(y) clearly do change with θ̃. When θ̃ = θ = 1, the correction procedure produces the correct adjusted limits, as indicated by the rightmost boxplot. Hence, it is necessary to use the right value of θ̃ when making the correction.

Assumption 2 requires that θ̃ is a consistent estimator of θ. Hence we can be sure that θ̃ → θ as m → ∞, and as a result that the distribution of θ̃ − L(y) approaches that of θ − L(x), so that our correction procedure will perform correctly for large enough m. In practice, the required value of m can be moderate. Figure 2 (right panel) shows how the correction error, L̃_c(x) − L_c(x), varies as a function of m. Clearly, the median error is close to zero even for small sample sizes.
However, there is some asymmetry for small m, which is also visible in the left panel (e.g. compare the differences in bias between the boxplots with θ̃ = 0.4 and θ̃ = 1.6), although this is eliminated as m increases.

Finally, Table 1 compares the empirical coverage probabilities for 95% confidence intervals for both µ and σ² in Examples 1 and 2, using our correction procedure and the parametric bootstrap (e.g. Davison and Hinkley 1997).

Figure 1: Lower (L̃_c(x)) and upper (Ũ_c(x)) corrected confidence limit estimates for 100 replicated analyses of the normal location model. Top plots show the corrected limits for ε = 20, 13.33, 6.67, 0 with θ̃ = x̄. Bottom plots show the corrected limits for θ̃ = −2, −1, 0, 1, 2 with ε = 1. The horizontal lines represent the 0.025-th (left plots) and 0.975-th (right plots) percentiles of a standard normal distribution.

Figure 2: Left panel: Boxplots of the corrected lower confidence limit, L̃_c(x), when holding θ̃ fixed at various values θ̃ = 0.4, ..., 1.6 (the true value is θ = 1). The leftmost and rightmost boxplots correspond to the biased (L(x)) and true unbiased (L_c(x)) lower limits respectively. Right panel: Boxplots of the correction error L̃_c(x) − L_c(x) as a function of the observed data sample size m. All boxplots are based on 100 replicate analyses.

            Example 1: µ                           Example 2: σ²
    ε       P      CP     B      CB     DB     ε       P      CP     B      CB     DB
    0.00    0.956  0.951  –      –      –      0.00    0.954  0.954  –      –      –
    6.67    1.000  0.954  –      –      –      0.20    0.939  0.952  –      –      –
    –       –      –      0.946  0.949  0.952  –       –      –      0.918  0.956  0.964

    Time (s): Example 1: 0.09, 0.21, 4.53, 6.70; Example 2: 0.17, 0.27, 7.31, 8.16.

Table 1: Empirical coverage probabilities for 95% confidence intervals for the parameters in Examples 1 and 2, for various methods and differing values of the error ε. Columns denote coverage of (P) the pivot-based intervals, (CP) our correction of the pivot intervals, (B) parametric bootstrap intervals, (CB) our correction of the parametric bootstrap intervals, and (DB) the double bootstrap intervals. Coverage probabilities are based on 1,000 replicate analyses under each method, and n = 2,000 generated datasets y_1, ..., y_n for the corrected pivotal (CP) and corrected bootstrap (CB) intervals. The bootstrap results used 2,000 bootstrap samples (B), 99 bootstrap samples for the corrected bootstrap (CB), and 2000 × 44 samples for the double bootstrap (DB). Time (in seconds) indicates the time needed to produce one adjusted replicate interval.

Where there is no error in the construction of the pivot-based confidence intervals (column P), i.e. for ε = 0, the corrected intervals (column CP) retain the same correct coverage properties as before the correction. However, we note that as the corrected intervals (L̃_c(x), Ũ_c(x)) are estimated by Monte Carlo, for finite numbers of generated datasets, y_1, ..., y_n, there will be some non-zero adjustment of each individual confidence interval, even when no systematic error is present. In Table 1, n = 2,000 datasets were used for each corrected interval. However, in spite of this random adjustment, the correct coverage is retained over multiple replicates. This point is discussed further in Section 4.

The second row of Table 1 shows the same information as the first row, except with a non-zero error: ε = 6.67 (for µ) and ε = 0.20 (for σ²). Clearly the adjusted intervals have the correct nominal coverage.
The third row of Table 1 reports the empirical coverage probabilities of 95% confidence intervals based on the parametric bootstrap (B), our correction of the parametric bootstrap (CB) and the double-bootstrap (DB). Each bootstrap (B) interval was based on 2,000 bootstrap samples, with 2,000 × 44 samples for the double bootstrap (DB) (following McCullough and Vinod 1998; Booth and Hall 1994), 99 samples for our correction of the bootstrap (CB), and n = 2,000 datasets, y_1, ..., y_n, for our correction procedure. These numbers were chosen to provide broadly comparable algorithmic overheads for each method. The bootstrap calculations were implemented using the R package boot, and the double bootstrap confidence intervals were computed as in McCullough and Vinod (1998).

The parametric bootstrap has previously been observed to have lower than nominal coverage (e.g. Schenker 1985; Buckland 1984). In Table 1 this is particularly apparent for σ². Both our correction and the double bootstrap produce improved coverage in each case, although the double bootstrap requires a different specification (i.e. the number of bootstrap replicates at each of two levels) than our approach, which instead requires the number of auxiliary datasets, n, and a consistent estimator of θ. While it is difficult to provide similar algorithmic specifications to permit speed comparisons between the double bootstrap and our correction procedure, in that they possess different algorithm structures, the recorded times for each method were broadly similar (Table 1, bottom row), with our procedure slightly faster in both current examples. However, the double bootstrap algorithm has a number of optimisations available (e.g. Nankervis 2005), whereas our procedure was implemented with unoptimised code. Broadly, the two approaches are comparable in the present analyses.
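For reference, the parametric bootstrap intervals entering this comparison can be sketched as follows. This is a generic percentile-method sketch under our own naming, not the exact boot-package implementation used for Table 1.

```python
import numpy as np

def parametric_bootstrap_ci(x, fit, simulate, B=2000, alpha=0.05, rng=None):
    """Percentile-method parametric bootstrap confidence interval (minimal sketch).

    fit      -- fit(x) -> point estimate of the parameter
    simulate -- simulate(theta, m, rng) -> dataset of size m from the fitted model
    """
    rng = rng or np.random.default_rng()
    theta_hat = fit(x)
    # Refit to B datasets simulated from the fitted model, then take quantiles
    boot = np.array([fit(simulate(theta_hat, len(x), rng)) for _ in range(B)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])
```

Applied to the variance parameter of Example 2 (with fit(x) = S² and simulate drawing from N(0, θ)), such intervals exhibit the below-nominal coverage reported in column B; the correction of Section 2.2 can then be applied to their limits in exactly the same way as to the pivot-based intervals.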
3 Real examples

We now consider interval estimation in two real, complex modelling situations. The first is an application of composite likelihood techniques to the modelling of spatial extremes. With composite likelihoods, deriving unbiased confidence intervals can require a large amount of algebra, whereas biased intervals that are typically too narrow are easily computable. The second is an application of approximate Bayesian computation (ABC) methods to the modelling of a time series of g-and-k distributed observations. In most practical settings, the mechanism behind the model fitting process within the ABC framework typically gives posterior credible intervals that are too large.

In both analyses, the parameter θ is a vector of d > 1 dimensions. However, our coverage correction procedure is a univariate method, as it is based on quantiles. As such, after obtaining the consistent vector estimator θ̃, we correct the coverage probabilities of each element of the parameter vector in turn, and obtain confidence intervals that achieve the correct nominal marginal coverage probabilities.

3.1 Spatial extremes via composite likelihoods

In the context of analysing spatial extremes, Padoan et al. (2010) developed a pairwise composite likelihood model for inference using max-stable stationary processes. Specifically, for m annual maximum daily rainfall observations at each of K spatial locations, the pairwise composite likelihood was specified as

    ℓ_C(θ | x) = Σ_{i<j} w_{ij} ℓ(θ; x_i, x_j),

where the sum is over pairs of spatial locations, ℓ(θ; x_i, x_j) denotes the pairwise log-likelihood contribution of locations i and j, and the w_{ij} > 0 are weights such that Σ_{i,j} w_{ij} = 1. Under the usual regularity conditions, the maximum composite likelihood estimator, θ̂, can provide asymptotically unbiased and normally distributed parameter estimates when standard likelihood estimators are unavailable (e.g. Varin et al. 2011). Specifically, we have (e.g.
Huber 1967) that θ̂ ∼ N(θ, Ĩ⁻¹(θ̂)), with

    I(θ) = H(θ) J(θ)⁻¹ H(θ),    (4)

where H(θ) and J(θ) are respectively the expected information matrix and the covariance matrix of the score vector. In the ordinary maximum likelihood setting, H(θ) = J(θ).

In the max-stable process framework, Padoan et al. (2010) provided an analytic expression for J(θ) for a particular (Gaussian) spatial dependence model. Combined with standard numerical estimates of H(θ), this allowed the construction of standard confidence intervals for θ. However, for composite likelihood techniques in general, obtaining analytic expressions or numerical estimates of J(θ) can be challenging, whereas estimates of H(θ) are readily available. In this example, we demonstrate how our proposed method can be employed to correct the too-narrow confidence intervals that result from using I(θ) = H(θ). We then compare our results with those derived from the known maximum composite likelihood information matrix (4).

We considered four spatial models for stationary max-stable processes that describe different degrees of extremal dependence, with parameter inference based on m = 100 observations at each of K = 50 randomly generated spatial locations. Each model expresses the degree of extremal dependence via the covariance matrix

    Σ = [ σ₁²  σ₁₂ ]
        [ σ₁₂  σ₂² ],

where the values of each parameter for each model M1, ..., M4 are given in Table 2. Model M4 has an additional non-stationary spatial component that is modelled by the extra marginal parameters µ, λ and ξ (corresponding to location, scale and shape parameters) through the response surfaces

    µ = α₀ + α₁·lat + α₂·lon
    λ = β₀ + β₁·lon
    ξ = γ₀,

where lat and lon denote latitude and longitude coordinates.

    Model     σ₁²    σ₁₂    σ₂²
    M1, M4    9/8    0      9/8
    M2        2000   1200   2800
    M3        25     35     14

Table 2: Covariance matrix configurations for models M1, ...
, M4 for the extremal spatial dependence analysis.

Tables 3 and 4 summarise the empirical coverage probabilities for nominal 95% confidence intervals for models M1, ..., M4, based on 500 replicate analyses. Columns C1 give the interval coverage using the standard Hessian matrix I(θ) = H(θ), and columns C2 give the same using the composite likelihood information matrix (4), following Padoan et al. (2010). Columns C3 correspond to our correction procedure applied to the intervals in column C1, which use the standard Hessian matrix. For the correction procedure we used θ̃ = θ̂, the maximum composite likelihood estimate, and n = 500 simulated datasets to perform the adjustment.

From Table 3, confidence interval coverage based on the standard Hessian matrix (C1) is clearly too low. The coverage using the sandwich information matrix (C2) is very good, with all reported values close to 0.95. The coverage values obtained using our adjustment procedure, which is based on the intervals in column C1, are also very close to 0.95, and mostly closer than with the sandwich information matrix. Similar results are obtained for model M4 in Table 4. Taken together, these results indicate that our adjustment procedure can successfully modify the upper and lower limits of a confidence interval to achieve comparable results to established methods in complex settings. Moreover, it does not make use of the algebraic representation of J(θ) in this case, and so is more easily extended to alternative models (e.g. where J(θ) is not available), albeit at a moderate computational cost.
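The contrast between columns C1 and C2 amounts to which information matrix is inverted when forming Wald intervals. A minimal sketch of the two constructions (our own helper names; H and J are assumed to be supplied, e.g. from numerical differentiation of the composite likelihood):

```python
import numpy as np

Z975 = 1.959964  # 0.975 quantile of the standard normal distribution

def wald_ci_hessian(theta_hat, H, z=Z975):
    """Naive intervals using I(theta) = H(theta) only (column C1): too narrow
    for composite likelihoods, since the score covariance J is ignored."""
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return np.column_stack([theta_hat - z * se, theta_hat + z * se])

def wald_ci_sandwich(theta_hat, H, J, z=Z975):
    """Intervals from the sandwich information I = H J^{-1} H of Equation (4)
    (column C2): var(theta_hat) is approximated by I^{-1} = H^{-1} J H^{-1}."""
    Hinv = np.linalg.inv(H)
    se = np.sqrt(np.diag(Hinv @ J @ Hinv))
    return np.column_stack([theta_hat - z * se, theta_hat + z * se])
```

When H = J (the ordinary maximum likelihood setting) the two constructions coincide; when J dominates H, the sandwich intervals widen accordingly.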
            M1                     M2                     M3
    σ₁²     0.428  0.960  0.960    0.098  0.947  0.950    0.092  0.939  0.944
    σ₁₂     0.518  0.940  0.960    0.122  0.955  0.956    0.154  0.921  0.960
    σ₂²     0.468  0.930  0.936    0.092  0.955  0.938    0.102  0.940  0.952

Table 3: Empirical coverage probabilities for 95% confidence intervals of the parameters of models M1, M2 and M3, based on 500 replicate analyses. Within each model, columns indicate interval estimation methods using: (C1) the standard Hessian matrix I(θ) = H(θ); (C2) the sandwich information matrix I(θ) = H(θ)J⁻¹(θ)H(θ); and (C3) the standard Hessian matrix I(θ) = H(θ) followed by our correction procedure.

            σ₁²    σ₁₂    σ₂²    α₀     α₁     α₂     β₀     β₁     γ₀
    C1      0.372  0.544  0.388  0.114  0.122  0.098  0.108  0.140  0.104
    C2      0.930  0.925  0.945  0.935  0.945  0.945  0.935  0.940  0.910
    C3      0.924  0.924  0.958  0.950  0.940  0.944  0.944  0.952  0.952

Table 4: Empirical coverage probabilities for 95% confidence intervals of the parameters of model M4, based on 500 replicate analyses. Rows indicate interval estimation methods using: (C1) the standard Hessian matrix I(θ) = H(θ); (C2) the sandwich information matrix I(θ) = H(θ)J⁻¹(θ)H(θ); and (C3) the standard Hessian matrix I(θ) = H(θ) followed by our correction procedure.

3.2 Exchange rate analysis using approximate Bayesian computation

Approximate Bayesian computation (ABC) describes a family of methods for approximating a posterior distribution when the likelihood function is computationally intractable, but where sampling from the likelihood is possible (e.g. Beaumont et al. 2002; Sisson and Fan 2011). These methods can be thought of as constructing a conditional density estimate of the posterior (Blum 2010), where the scale parameter, h > 0, of the kernel density function controls both the level of accuracy of the approximation and the computation required to construct it.
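The analysis that follows uses a sequential Monte Carlo ABC sampler; for readers unfamiliar with ABC, the basic idea can be conveyed by the simplest (rejection) variant, sketched below with a hard tolerance h playing the role of the kernel scale parameter. All function names here are ours, and this is not the SMC algorithm used in the paper.

```python
import numpy as np

def abc_rejection(s_obs, prior_draw, simulate, summarise, h, n_prop=20000, rng=None):
    """Rejection ABC: keep prior draws whose simulated summary statistics
    fall within distance h of the observed summaries s_obs."""
    rng = rng or np.random.default_rng()
    kept = []
    for _ in range(n_prop):
        theta = prior_draw(rng)
        s = summarise(simulate(theta, rng))
        if np.linalg.norm(np.atleast_1d(s) - np.atleast_1d(s_obs)) <= h:
            kept.append(theta)
    return np.array(kept)  # draws from the ABC posterior approximation
```

Larger h accepts more proposals at lower cost but oversmooths the posterior approximation, which is the origin of the too-wide credible intervals discussed above.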
Lower values of h result in more accurate posterior approximations, but in return require considerably more computation. As such, moderate values of the scale parameter are often used in practice. This typically results in oversmoothed estimates of the posterior and, in turn, credible intervals that are too wide.

We consider an analysis of daily exchange rate log returns of the British pound against the Australian dollar between 2005 and 2007. Drovandi and Pettitt (2011) developed an MA(1)-type model for these data in which the individual log returns were modelled by a g-and-k distribution (Rayner and MacGillivray 2002). The g-and-k distribution is typically defined through its quantile function

    Q(z(p); θ) = a + b [1 + c (1 − exp(−g z(p))) / (1 + exp(−g z(p)))] (1 + z(p)²)^k z(p),    (5)

where θ = (a, b, g, k) are parameters controlling location, scale, skewness and kurtosis, and z(p) is the p-quantile of the standard normal distribution. The parameter c = 0.8 is typically fixed.

We used the sequential Monte Carlo-based ABC algorithm of Drovandi and Pettitt (2011), based on 2,000 particles, to fit the MA(1) model. The data-generation process, used in both ABC and our correction procedure, consists of drawing dependent quantiles

    z_i = (η_i + α η_{i−1}) / √(1 + α²),  for i = 1, ..., n,

where η_i ∼ N(0, 1) for i = 0, ..., n, and then substituting z(p) = z_i in (5).

Table 5 shows the estimated 95% central credible intervals, and their widths, for each model parameter, based on the ABC kernel scale parameter h = 0.016 (following Drovandi and Pettitt 2011) and also the lower value h = 0.009. Also shown are the intervals obtained after performing a local-linear, ridge regression-adjustment (Blum et al. 2013; Beaumont et al. 2002) on the posterior obtained with h = 0.016.
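The data-generation process just described is straightforward to implement. A minimal sketch (our own function names), with the conventional fixed value c = 0.8:

```python
import numpy as np

C = 0.8  # conventionally fixed in the g-and-k distribution

def gnk_quantile(z, a, b, g, k):
    """g-and-k quantile function Q(z(p); theta) of Equation (5)."""
    skew = (1.0 - np.exp(-g * z)) / (1.0 + np.exp(-g * z))
    return a + b * (1.0 + C * skew) * (1.0 + z ** 2) ** k * z

def simulate_ma1_gnk(n, a, b, g, k, alpha_ma, rng):
    """Draw n dependent g-and-k observations via the MA(1) dependent quantiles
    z_i = (eta_i + alpha * eta_{i-1}) / sqrt(1 + alpha^2), eta_i ~ N(0, 1)."""
    eta = rng.normal(size=n + 1)
    z = (eta[1:] + alpha_ma * eta[:-1]) / np.sqrt(1.0 + alpha_ma ** 2)
    return gnk_quantile(z, a, b, g, k)
```

Each z_i is marginally standard normal, so the simulated observations are marginally g-and-k distributed while adjacent observations are dependent; with g = k = 0 the quantile function reduces to a + b z.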
The regression-adjustment is a standard ABC technique for improving the precision of an ABC posterior approximation, which aims to estimate the posterior at h = 0 based on an assumed regression model. Clearly the parameter credible intervals obtained with h = 0.009 are narrower than those obtained with h = 0.016, indicating that the larger intervals indeed have greater than 95% coverage. The regression-adjusted intervals generally have widths somewhere between those of the intervals constructed with h = 0.016 and h = 0.009. The suggestion from Table 5 is that even narrower (i.e. more accurate) credible intervals may result if it were possible to reduce h further.

Table 6 shows the corrected 95% central credible interval estimates, obtained from the ABC posterior approximations with kernel scale parameters h = 0.016, 0.02 and 0.03. The correction was based on n = 500 simulated datasets, using the posterior mean as the estimate θ̃ of θ. The results of the correction across the three kernel scale parameter values are similar, suggesting potential computational savings in the ABC posterior simulation stage, as one may perform the analysis with larger values of h. All parameters achieve equivalent or improved precision compared to the most precise ABC posterior estimate obtained with h = 0.009.

While for a standard Bayesian analysis the posterior mean is a consistent estimator of θ, this may not be true of the ABC approximate posterior for h > 0, as the location and shape of the ABC posterior can change with h. However, the posterior mean is a consistent estimator of θ for h = 0. As such, some care may be needed when specifying θ̃ as the posterior mean in the ABC setting. In the current analysis, a preliminary investigation indicated that estimates of the posterior mean stabilised below h = 0.03, which suggests that the posterior mean is approximately consistent for h < 0.03.
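A stability check of this kind can be mimicked in a toy setting. The following rejection-ABC sketch (our own construction, not the paper's exchange rate analysis) estimates a posterior mean for a normal-mean model at several kernel scales h and examines whether the estimates settle down as h decreases.

```python
import numpy as np

def abc_posterior_mean(s_obs, m, h, n_sims=40000, rng=None):
    """Toy rejection-ABC posterior mean for a normal mean theta.
    Summary statistic: the sample mean of m observations with unit variance;
    uniform acceptance kernel with scale h."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.normal(0.0, 3.0, size=n_sims)             # vague N(0, 3^2) prior
    s_sim = rng.normal(theta, 1.0 / np.sqrt(m))           # summary of simulated data
    accepted = theta[np.abs(s_sim - s_obs) < h]           # ABC rejection step
    return accepted.mean()

rng = np.random.default_rng(2)
s_obs = 1.5                                               # observed sample mean, m = 50
means = [abc_posterior_mean(s_obs, 50, h, rng=rng) for h in (0.5, 0.2, 0.1, 0.05)]
# if the means stabilise as h shrinks, the posterior mean is plausibly an
# (approximately) consistent choice for the estimate theta_tilde
```

The same diagnostic idea carries over to more complex ABC analyses: rerun the posterior approximation at a few decreasing values of h and check that the posterior mean stops moving, at the price of extra simulation.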
While this determination is slightly ad hoc, it is practically viable and an intuitively sensible way of determining whether the computed posterior mean is a consistent estimator of θ. As such, we are confident that the posterior mean produces an effectively consistent estimate θ̃ of θ in this case.

      h = 0.016           Width    h = 0.009           Width    Reg. Adj. (h = 0.016)  Width
a     (-0.0006, 0.0002)   0.0008   (-0.0004, 0.0001)   0.0005   (-0.0006, 0.0002)      0.0008
b     ( 0.0018, 0.0028)   0.0010   ( 0.0019, 0.0026)   0.0007   ( 0.0018, 0.0027)      0.0009
g     (-0.0267, 0.2573)   0.2840   (-0.0044, 0.2138)   0.2182   (-0.0286, 0.2505)      0.2791
k     ( 0.2024, 0.5061)   0.3037   ( 0.2607, 0.5322)   0.2715   ( 0.2148, 0.5092)      0.2944
α     ( 0.1413, 0.2713)   0.1300   ( 0.1491, 0.2771)   0.1280   ( 0.1489, 0.2742)      0.1253

Table 5: 95% central credible intervals and corresponding interval widths from the g-and-k distribution MA(1) model. Results obtained using the ABC posterior approximation with kernel scale parameter h = 0.016 and h = 0.009, and following a ridge regression-adjustment based on an ABC posterior approximation with h = 0.016.

      h = 0.016           Width    h = 0.02            Width    h = 0.03             Width
a     (-0.0003, 0.0000)   0.0003   (-0.0003, 0.0000)   0.0003   (-0.0004, -0.0001)   0.0005
b     ( 0.0020, 0.0024)   0.0004   ( 0.0021, 0.0025)   0.0004   ( 0.0021, 0.0024)    0.0003
g     ( 0.0303, 0.2156)   0.1853   ( 0.0173, 0.1818)   0.1645   ( 0.0204, 0.1957)    0.1753
k     ( 0.2769, 0.4099)   0.1330   ( 0.2909, 0.4235)   0.1326   ( 0.2768, 0.4129)    0.1362
α     ( 0.1430, 0.2659)   0.1229   ( 0.1335, 0.2708)   0.1373   ( 0.1513, 0.2714)    0.1201

Table 6: Adjusted 95% central credible intervals and corresponding interval widths from the g-and-k distribution MA(1) model. Adjusted intervals based on correcting ABC posterior approximations with kernel scale parameters h = 0.016, 0.02 and 0.03.

4 Discussion

In this article we have introduced a method of adjusting confidence interval estimates to have the correct nominal coverage probability.
This method was developed in the frequentist framework, but may equally be applied to ensure that Bayesian credible intervals possess the (frequentist) coverage property. Our approach is general and makes minimal assumptions: namely, that it is possible to generate data under the same procedure (model) that produced the observed data, and that a consistent estimator is available for the parameter of interest. The correction is asymptotically unbiased, though it can also work well for moderate sample sizes (m), and there is a central limit theorem for the corrected interval limits in terms of the number (n) of auxiliary samples used to implement the correction.

As the correction is estimated by Monte Carlo, when there is no bias present, so that L(x) = L_c(x), finite sample estimates of ξ_{α/2} based on finite numbers of replicate datasets y_1, ..., y_n may be non-zero. This will result in small, non-zero adjustments to intervals that already have the correct nominal coverage. In practice, for moderate n, this is likely to have negligible effect (e.g. see the results in Table 1). However, in this and other settings where there is low bias, the central limit theorem of Corollary 1 describes the precision of the finite sample adjustment as a function of n, thereby providing a guide as to when the Monte Carlo variability of L̃_c(x) will be an improvement over the bias of L(x).

As constructed in Theorems 2.1 and 2.2, our proposed correction is for univariate parameters, θ, as it is based on quantiles. For multivariate θ, from the perspective of adjusting any given margin, the impact of the remaining (nuisance) parameters is controlled through the estimate θ̃ of θ. Asymptotically, the consistency of θ̃ means that θ̃ → θ as the sample size m → ∞, from which Theorems 2.1 and 2.2 will then hold for the margin of interest.
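To make the mechanics of the univariate correction concrete, it can be sketched as follows. The helper names, the toy normal model, and the deliberately under-covering interval are our own illustrative assumptions; the essential steps (simulate datasets at a consistent estimate θ̃, take the empirical α/2 quantile of θ̃ − L(y_i), and shift the original limit by it) follow Theorems 2.1 and 2.2.

```python
import numpy as np

def corrected_lower_limit(x, lower_fn, estimate_fn, simulate_fn,
                          alpha=0.05, n=500, rng=None):
    """Monte Carlo correction of a lower confidence limit:
    L_c(x) = L(x) + xi_hat, where xi_hat is the empirical alpha/2
    quantile of theta_tilde - L(y_i) over n datasets simulated at theta_tilde."""
    rng = np.random.default_rng() if rng is None else rng
    theta_tilde = estimate_fn(x)                     # consistent estimate of theta
    diffs = np.array([theta_tilde - lower_fn(simulate_fn(theta_tilde, len(x), rng))
                      for _ in range(n)])
    xi_hat = np.quantile(diffs, alpha / 2.0)
    return lower_fn(x) + xi_hat

# Toy normal-mean illustration with a deliberately under-covering interval
# (z = 1.0 instead of 1.96), so the correction should widen the lower tail.
def naive_lower(y):
    return y.mean() - 1.0 * y.std(ddof=1) / np.sqrt(len(y))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)
L_adj = corrected_lower_limit(
    x, naive_lower,
    estimate_fn=lambda y: y.mean(),
    simulate_fn=lambda th, m, r: r.normal(th, 1.0, size=m),
    alpha=0.05, n=500, rng=rng)
```

An upper limit is corrected in the same way, using the (1 − α/2) quantile of θ̃ − U(y_i); the same n simulated datasets can serve both limits.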
However, sub-asymptotically this is not the case, and the performance of the adjustment of any margin will depend on the quality of the estimate of θ (this is also true in the univariate setting). The results of our analyses in Sections 2.3 and 3 suggest that the procedure can work well, even for moderate m.

In practice, in the examples that we have considered, we have found that our method can produce confidence intervals which perform comparably to existing gold-standard approaches – though with greater scope for extension to more complicated models – and provide a reliable way of adjusting approximately obtained credible intervals in challenging settings.

One potential criticism of our approach is that it requires the construction of a large number (n) of confidence or credible intervals in order to correct one interval. In the case where constructing a single interval is computationally expensive, implementing the correction procedure in full can entail a large amount of computation. This was the case in our exchange rate data analysis using ABC methods, where an alternative ABC algorithm such as regression-adjustment (based on a single large number of model simulations) would have been more efficient. However, regression-adjustment can itself perform poorly if the assumed regression model is incorrect, whereas, because our correction procedure makes minimal assumptions, we may still have good confidence in the resulting adjusted intervals it provides.

Acknowledgements

We are grateful to Chris Drovandi and Tony Pettitt for kindly allowing us to use their Matlab code for the ABC analysis. Financial support for this research was provided by the Australian Research Council Discovery Project scheme (DP1092805) and the University of New South Wales.

Appendix: Proofs

Proof of Theorem 2.1. Writing L_c(x) = L(x) + δ, then

    P(θ < L_c(x)) = P(θ < L(x) + δ) = P(θ − L(x) < δ).
Hence, by definition, P(θ < L_c(x)) = α/2 if δ = ξ_{α/2}, where ξ_{α/2} is the α/2-th quantile of the distribution of θ − L(x).

Proof of Theorem 2.2. Let G_{θ−L(x)} be the distribution function of θ − L(x), which has positive first derivative, so that G'_{θ−L(x)}(ν) = (∂/∂ν) G_{θ−L(x)}(ν) > 0 for all ν ∈ ℝ. Also let Ĝ_{θ̃−L(y)} be the empirical distribution of θ̃ − L(y), based on the samples θ̃ − L(y_i), i = 1, ..., n. From Theorem 2.1 we have that L_c(x) = L(x) + ξ_{α/2} for some α ∈ (0, 1), where G_{θ−L(x)}(ξ_{α/2}) = α/2. Let ξ̂_{α/2} = Ĝ⁻¹_{θ̃−L(y)}(α/2) be the empirical estimate of ξ_{α/2}. If we define L̃_c(x) = L(x) + ξ̂_{α/2}, then for any ω ∈ ℝ we have

    Pr(√n (L̃_c(x) − L_c(x)) ≤ ω)
      = Pr(√n (ξ̂_{α/2} − ξ_{α/2}) ≤ ω)
      = Pr(ξ̂_{α/2} ≤ ξ_{α/2} + ω/√n)
      = Pr(G_{θ−L(x)}(ξ̂_{α/2}) ≤ G_{θ−L(x)}(ξ_{α/2} + ω/√n))
      = Pr(G_{θ−L(x)}(ξ̂_{α/2}) ≤ α/2 + [ω G'_{θ−L(x)}(ξ_{α/2}) + o(1)]/√n),

where the last equality follows from a first-order Taylor expansion of G at ξ_{α/2}. Let Y denote the number of the samples θ̃ − L(y_i) for which G_{θ−L(x)}(θ̃ − L(y_i)) is smaller than ζ = α/2 + [ω G'_{θ−L(x)}(ξ_{α/2}) + o(1)]/√n; then, since G(ξ) ~ U(0, 1) for a random draw ξ from G, we have Y ~ Binomial(n, ζ). Hence

    (Y − nζ) / √(nζ(1 − ζ)) → N(0, 1)    (6)

in distribution as n → ∞ (van der Vaart 2000). Let r_n be the integer rank of the α/2-th quantile in a dataset X = {X_1, ..., X_n} of length n, so that ξ̂_{α/2} = X_{(r_n)}. If we assume that (nα/2 − r_n)/√n → 0 as n → ∞, then from (6) we have

    Pr(√n (ξ̂_{α/2} − ξ_{α/2}) ≤ ω)
      = Pr(G_{θ−L(x)}(ξ̂_{α/2}) ≤ ζ)
      = Pr(Y ≥ r_n)
      = Pr((Y − nζ)/√(nζ(1 − ζ)) ≥ (r_n − nζ)/√(nζ(1 − ζ)))
      = Φ((nα/2 + √n ω G'_{θ−L(x)}(ξ_{α/2}) − r_n) / √(nζ(1 − ζ))) + o_p(1)
      = Φ((n(α/2) − r_n)/√(n(α/2)(1 − α/2)) + ω G'_{θ−L(x)}(ξ_{α/2})/√((α/2)(1 − α/2))) + o_p(1)
      = Φ(ω G'_{θ−L(x)}(ξ_{α/2}) / √((α/2)(1 − α/2))) + o_p(1).    (7)

The consistency of the estimator L̃_c then follows, since

    lim_{n→∞} Pr(√n (L̃_c(x) − L_c(x)) ≥ ε) ≤ lim_{n→∞} E[√n (L̃_c(x) − L_c(x))] / ε = 0

by the Markov inequality (Ash and Doleans-Dade 2000).

References

Ash, R. B. and C. Doleans-Dade (2000). Probability and Measure Theory. Academic Press.
Beaumont, M. A., W. Zhang, and D. J. Balding (2002). Approximate Bayesian computation in population genetics. Genetics 162(4), 2025–2035.
Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika 74(3), 457–468.
Blum, M. G. B. (2010). Approximate Bayesian computation: a nonparametric perspective. Journal of the American Statistical Association 105, 1178–1187.
Blum, M. G. B., M. A. Nunes, D. Prangle, and S. A. Sisson (2013). A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science 28, 189–208.
Booth, J. G. and P. Hall (1994). Monte Carlo approximation and the iterated bootstrap. Biometrika 81, 331–340.
Buckland, S. T. (1984). Monte Carlo confidence intervals. Biometrics 40, 811–817.
Cook, S., A. Gelman, and D. Rubin (2006). Validation of software for Bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics 15, 675–692.
Davison, A. C. and D. V. Hinkley (1997). Bootstrap Methods and their Application. Cambridge University Press.
van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press.
DiCiccio, T. J., M. A. Martin, and G. A. Young (1992). Fast and accurate approximate double bootstrap confidence intervals. Biometrika 79(2), 285–295.
Drovandi, C. C. and A. N. Pettitt (2011). Likelihood-free Bayesian estimation of multivariate quantile distributions. Computational Statistics & Data Analysis 55, 2541–2556.
Efron, B. (1987). Better bootstrap confidence intervals.
Journal of the American Statistical Association 82(397), 171–185.
Fearnhead, P. and D. Prangle (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation (with discussion). Journal of the Royal Statistical Society: Series B 74, 419–474.
Garthwaite, P. H. and S. T. Buckland (1992). Generating Monte Carlo confidence intervals by the Robbins–Monro process. Applied Statistics, 159–171.
Hall, P. (1986). On the bootstrap and confidence intervals. The Annals of Statistics, 1431–1452.
Hansen, B. E. (1999). The grid bootstrap and the autoregressive model. Review of Economics and Statistics 81(4), 594–607.
Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, pp. 221–233.
Kabaila, P. (1993). Some properties of profile bootstrap confidence intervals. Australian Journal of Statistics 35(2), 205–214.
Kabaila, P. and C. J. Lloyd (2000). A computable confidence upper limit from discrete data with good coverage properties. Statistics & Probability Letters 47(2), 189–198.
Lloyd, C. (2011). Computing highly accurate confidence limits from discrete data using importance sampling. http://works.bepress.com/chris_lloyd/23.
Martin, M. A. (1990). On bootstrap iteration for coverage correction in confidence intervals. Journal of the American Statistical Association 85(412), 1105–1118.
McCullough, B. and H. D. Vinod (1998). Implementing the double bootstrap. Computational Economics 12(1), 79–95.
Nankervis, J. C. (2005). Computational algorithms for double bootstrap confidence intervals. Computational Statistics & Data Analysis 49, 462–475.
Padoan, S. A., M. Ribatet, and S. A. Sisson (2010). Likelihood-based inference for max-stable processes. Journal of the American Statistical Association 105, 263–277.
Prangle, D., M. G. B. Blum, G.
Popovic, and S. A. Sisson (2012). Diagnostic tools for approximate Bayesian computation using the coverage property. Technical Report, http://arxiv.org/abs/1301.3166.
Rayner, G. D. and H. L. MacGillivray (2002). Numerical maximum likelihood estimation for the g-and-k and generalized g-and-h distributions. Statistics and Computing 12(1), 57–75.
Schenker, N. (1985). Qualms about bootstrap confidence intervals. Journal of the American Statistical Association 80(390), 360–361.
Sisson, S. A. and Y. Fan (2011). Likelihood-free Markov chain Monte Carlo. In S. P. Brooks, A. Gelman, G. Jones, and X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo, pp. 319–341. Chapman and Hall/CRC Press.
Varin, C., N. Reid, and D. Firth (2011). An overview of composite likelihood methods. Statistica Sinica 21(1), 5–42.
